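fix(heartbeat): align sends to interval boundaries so replicas do not each fire on their own schedule + KM vectorized count now queries the embedding column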

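- _heartbeat_loop: sleep until the next multiple of the interval before entering the loop,
  so that 3 replicas with different start times no longer produce several heartbeats within a short window
- heartbeat_report_service: km_vectorized now counts KnowledgeEntryRecord.embedding IS NOT NULL;
  it previously queried IncidentRecord.vectorized by mistake, which displayed 0/714 (0%)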

2026-04-12 ogt (ADR-073 heartbeat fix)
OG T
2026-04-12 16:33:15 +08:00
parent c8e9fbb518
commit 93f9522d5a
2 changed files with 14 additions and 6 deletions

@@ -339,15 +339,13 @@ class HeartbeatReportService:
 km_total = await db.scalar(select(func.count()).select_from(KnowledgeEntryRecord))
 stats.km_total = km_total or 0
-# Incident vectorized count
+# KM vectorized count (embedding IS NOT NULL)
+# Note: knowledge_entries has no vectorized column, so embedding is used instead
 vec_count = await db.scalar(
-    select(func.count()).select_from(IncidentRecord)
-    .where(IncidentRecord.vectorized == True)  # noqa: E712
+    select(func.count()).select_from(KnowledgeEntryRecord)
+    .where(KnowledgeEntryRecord.embedding.isnot(None))
 )
-inc_total = await db.scalar(select(func.count()).select_from(IncidentRecord))
 stats.km_vectorized = vec_count or 0
-if not stats.km_total:
-    stats.km_total = inc_total or 0
 # 24h repair statistics
 since = datetime.utcnow() - timedelta(hours=24)
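
The corrected count in the hunk above can be illustrated with a minimal, self-contained sketch. It uses the standard-library `sqlite3` module rather than the project's SQLAlchemy models. The table name `knowledge_entries` and the `embedding` column come from the diff's comment; the column types and sample rows are assumptions made only for this illustration.

```python
import sqlite3

# In-memory database standing in for the real one (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE knowledge_entries (id INTEGER PRIMARY KEY, embedding BLOB)"
)

# Three hypothetical entries: two with an embedding, one without.
conn.executemany(
    "INSERT INTO knowledge_entries (embedding) VALUES (?)",
    [(b"\x01\x02",), (None,), (b"\x03",)],
)

# Total number of knowledge entries.
km_total = conn.execute("SELECT COUNT(*) FROM knowledge_entries").fetchone()[0]

# Entries counted as vectorized: those whose embedding is not NULL.
km_vectorized = conn.execute(
    "SELECT COUNT(*) FROM knowledge_entries WHERE embedding IS NOT NULL"
).fetchone()[0]

print(f"{km_vectorized}/{km_total}")
conn.close()
```

Both numbers now come from the same table, so the ratio stays meaningful.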

@@ -4748,6 +4748,16 @@ class TelegramGateway:
 """
 interval_seconds = interval_minutes * 60
+# Align to the next multiple of the interval (e.g. interval=30 → align to :00 or :30)
+# so that replicas started at different times do not each send on their own schedule
+now_ts = datetime.now(UTC).timestamp()
+next_slot = (int(now_ts / interval_seconds) + 1) * interval_seconds
+wait_seconds = next_slot - now_ts
+try:
+    await asyncio.sleep(wait_seconds)
+except asyncio.CancelledError:
+    return
 while self._heartbeat_active:
     try:
         await self.send_heartbeat()
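
The slot arithmetic added to `_heartbeat_loop` can be checked in isolation with a small sketch. The helper name `seconds_until_next_slot` and the three start timestamps are hypothetical and chosen only for illustration; the formula mirrors the one in the diff.

```python
import math


def seconds_until_next_slot(now_ts: float, interval_seconds: int) -> float:
    """Seconds to sleep so that waking lands on the next multiple of
    interval_seconds counted from the Unix epoch (hypothetical helper)."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    next_slot = (math.floor(now_ts / interval_seconds) + 1) * interval_seconds
    return next_slot - now_ts


interval = 30 * 60  # 30-minute heartbeat interval, in seconds

# Three hypothetical replicas that started at different moments
# inside the same 30-minute window.
starts = [1_800_000_100.0, 1_800_000_460.5, 1_800_001_200.0]

# Each replica computes its own wait, yet all wake at the same timestamp.
wake_times = {s + seconds_until_next_slot(s, interval) for s in starts}
print(wake_times)
```

A replica that starts exactly on a boundary waits one full interval, because the formula always advances to the following slot. The sketch only shows that the replicas converge on the same wake-up time; any deduplication inside `send_heartbeat` is outside what this diff shows.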