fix(heartbeat): align sends to interval boundaries so replicas don't each send on their own + KM vectorization count now queries the embedding column
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 14m10s
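- _heartbeat_loop: sleep until the next whole multiple of the interval before entering the loop,
  so that 3 replicas starting at different times no longer deliver several heartbeats within a short window
- heartbeat_report_service: km_vectorized now counts KnowledgeEntryRecord.embedding IS NOT NULL;
  it previously queried IncidentRecord.vectorized by mistake, which displayed 0/714 (0%)

2026-04-12 ogt (ADR-073 heartbeat fix)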
@@ -339,15 +339,13 @@ class HeartbeatReportService:
         km_total = await db.scalar(select(func.count()).select_from(KnowledgeEntryRecord))
         stats.km_total = km_total or 0

-        # Number of vectorized incidents
+        # Number of vectorized KM entries (embedding IS NOT NULL)
+        # Note: knowledge_entries has no vectorized column, so embedding is used instead
         vec_count = await db.scalar(
-            select(func.count()).select_from(IncidentRecord)
-            .where(IncidentRecord.vectorized == True)  # noqa: E712
+            select(func.count()).select_from(KnowledgeEntryRecord)
+            .where(KnowledgeEntryRecord.embedding.isnot(None))
         )
-        inc_total = await db.scalar(select(func.count()).select_from(IncidentRecord))
         stats.km_vectorized = vec_count or 0
-        if not stats.km_total:
-            stats.km_total = inc_total or 0

         # 24h repair statistics
         since = datetime.utcnow() - timedelta(hours=24)
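The point of the hunk above is that the total and the vectorized count must come from the same table. The sketch below illustrates that with the standard-library `sqlite3` module rather than the project's SQLAlchemy models. The table layout, the sample rows, and the helper name `km_vectorization_stats` are assumptions made for illustration; only the table name `knowledge_entries` and the `embedding` column come from the commit.

```python
import sqlite3


def km_vectorization_stats(conn: sqlite3.Connection) -> tuple[int, int]:
    """Return (total, vectorized) counted from the same table.

    Hypothetical helper: an entry counts as vectorized when its
    embedding column is not NULL.
    """
    total = conn.execute("SELECT COUNT(*) FROM knowledge_entries").fetchone()[0]
    vectorized = conn.execute(
        "SELECT COUNT(*) FROM knowledge_entries WHERE embedding IS NOT NULL"
    ).fetchone()[0]
    return total, vectorized


# Assumed minimal schema and sample data, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE knowledge_entries (id INTEGER PRIMARY KEY, embedding BLOB)"
)
conn.executemany(
    "INSERT INTO knowledge_entries (embedding) VALUES (?)",
    [(b"\x00\x01",), (None,), (b"\x02\x03",)],
)

total, vectorized = km_vectorization_stats(conn)
print(f"{vectorized}/{total}")
```

Counting both figures from one table keeps the ratio meaningful. A numerator taken from an unrelated table can stay at zero no matter how many entries have embeddings, which is consistent with the 0/714 symptom the commit message describes.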
@@ -4748,6 +4748,16 @@ class TelegramGateway:
         """
         interval_seconds = interval_minutes * 60

+        # Align to the next whole multiple of the interval (e.g. interval=30 → align to :00 or :30)
+        # so replicas that started at different times do not each send on their own schedule
+        now_ts = datetime.now(UTC).timestamp()
+        next_slot = (int(now_ts / interval_seconds) + 1) * interval_seconds
+        wait_seconds = next_slot - now_ts
+        try:
+            await asyncio.sleep(wait_seconds)
+        except asyncio.CancelledError:
+            return
+
         while self._heartbeat_active:
             try:
                 await self.send_heartbeat()
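The alignment arithmetic in the hunk above can be checked in isolation. The sketch below pulls it out into two pure functions, `next_slot_ts` and `wait_until_next_slot`, which are hypothetical names not present in the commit, and feeds them three made-up replica start times inside the same 30-minute window.

```python
def next_slot_ts(now_ts: float, interval_seconds: int) -> int:
    """Return the first multiple of interval_seconds strictly after now_ts."""
    return (int(now_ts / interval_seconds) + 1) * interval_seconds


def wait_until_next_slot(now_ts: float, interval_seconds: int) -> float:
    """Return how many seconds to sleep to reach the next slot boundary."""
    return next_slot_ts(now_ts, interval_seconds) - now_ts


INTERVAL_SECONDS = 30 * 60

# Assumed start times for three replicas (Unix timestamps, illustrative only).
replica_starts = [1_000_000_000.0, 1_000_000_123.5, 1_000_000_799.0]

# Every replica computes the same target slot despite starting at different times.
slots = {next_slot_ts(ts, INTERVAL_SECONDS) for ts in replica_starts}
waits = [wait_until_next_slot(ts, INTERVAL_SECONDS) for ts in replica_starts]
print(slots, waits)
```

A replica that starts exactly on a boundary waits a full interval, because the formula always advances to the following multiple. Alignment only makes the replicas agree on when to fire; whether the simultaneous sends are collapsed into a single message depends on logic not shown in this diff.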