fix(sweeper): only sweep incidents from the last 48h so historical cases do not flood Telegram

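Problem:
  On its first deployment, the sweeper found 117 old incidents with no sweeper_done: marker
  (oldest from 2026-04-09, 7 days earlier) → LLM analysis was triggered for all of them
  Old incident data format → OPENCLAW_NEMO timeout → fallback to the Expert System
  confidence=0.2 "降級" (degraded) → Telegram flooded with a burst of identically formatted alerts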

Fix:
  Add a _MAX_INCIDENT_AGE_HOURS=48 filter
  Only process INVESTIGATING incidents created within the last 48h
  Make created_at timezone-safe (naive → UTC)

2026-04-16 Claude Sonnet 4.6 Asia/Taipei

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
OG T
2026-04-16 01:27:02 +08:00
parent 0760315059
commit 9bfa6fc045


@@ -36,6 +36,8 @@ _MAX_BATCH = 5 # at most 5 per batch
 _SEMAPHORE_LIMIT = 3 # at most 3 concurrent AI analyses
 _DONE_MARKER_PREFIX = "sweeper_done:" # lightweight marker: analysis already triggered
 _DONE_MARKER_TTL = 3600 # 1-hour TTL; after that, get_or_create handles dedup
+# 2026-04-16 ogt: only process incidents from the last 48h, so the first startup does not flood Telegram with every historical incident
+_MAX_INCIDENT_AGE_HOURS = 48
 async def run_incident_analysis_sweeper() -> None:
@@ -81,9 +83,30 @@ async def _sweep_once(sem: asyncio.Semaphore) -> None:
     if not incidents:
         return
+    # Filter: only process incidents from the last 48h, so the first startup does not flood Telegram with every historical incident
+    from datetime import datetime, timezone, timedelta
+    now_utc = datetime.now(timezone.utc)
+    cutoff = now_utc - timedelta(hours=_MAX_INCIDENT_AGE_HOURS)
+    recent_incidents = []
+    for incident in incidents:
+        created = getattr(incident, "created_at", None)
+        if created:
+            # Make sure created_at carries timezone info
+            if created.tzinfo is None:
+                created = created.replace(tzinfo=timezone.utc)
+            if created >= cutoff:
+                recent_incidents.append(incident)
+        else:
+            # Old records without created_at: skip
+            pass
+    if not recent_incidents:
+        return
     # Find incidents whose analysis has not been triggered yet (uses the lightweight marker; does not scan the full decision:DEC-* set)
     unanalyzed = []
-    for incident in incidents:
+    for incident in recent_incidents:
         done_key = f"{_DONE_MARKER_PREFIX}{incident.incident_id}"
         if not await redis.exists(done_key):
             unanalyzed.append(incident)
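
The age filter in this hunk can be checked in isolation. The following is a minimal sketch, not the project's code: `filter_recent` and the `Incident` dataclass are hypothetical stand-ins introduced for illustration, and `now_utc` is passed in as a parameter (an assumption made here so the cutoff is deterministic under test).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional

MAX_INCIDENT_AGE_HOURS = 48  # mirrors _MAX_INCIDENT_AGE_HOURS from the diff


@dataclass
class Incident:
    """Hypothetical stand-in for the real incident model."""
    incident_id: str
    created_at: Optional[datetime] = None


def filter_recent(
    incidents: Iterable[Incident],
    now_utc: datetime,
    max_age_hours: int = MAX_INCIDENT_AGE_HOURS,
) -> List[Incident]:
    """Keep incidents created at or after now_utc - max_age_hours."""
    cutoff = now_utc - timedelta(hours=max_age_hours)
    recent = []
    for incident in incidents:
        created = getattr(incident, "created_at", None)
        if created is None:
            # Old records without created_at are skipped, as in the diff
            continue
        if created.tzinfo is None:
            # Naive timestamps are assumed to be UTC, as in the diff
            created = created.replace(tzinfo=timezone.utc)
        if created >= cutoff:
            recent.append(incident)
    return recent
```

Comparing a naive datetime with an aware one raises TypeError in Python, which is why the naive case is normalized before the comparison with the cutoff.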