fix(telegram): ADR-091 forbid broadcasting "待分析" (pending analysis) zombie cards when Agent Debate analysis fails
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 10m51s
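Root cause: GET /incidents triggers the Phase 2 Agent Debate → every LLM call fails → description="待分析" ("pending analysis") + action="" → a new Telegram card is broadcast every few minutes → alert fatigue (the deadliest killer for SREs)

Architectural defect (anti-pattern): a GET request (a read operation) produces an outbound broadcast side effect → violates RESTful principles

Fix (_push_decision_to_telegram): add a gate after the DB update completes and before the Telegram push: description="待分析" AND action="" → exit silently, never broadcast

ADR-091 iron rule: only an Alertmanager Webhook POST (a real new alert) may trigger a Telegram broadcast. A failed Agent Debate analysis → silent DB update, without polluting the channel

2026-04-17 ogt + Claude Sonnet 4.6
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>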
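The ADR-091 rule about which entry point may broadcast can be sketched as a small check on the trigger source. This is only an illustration: `TriggerSource` and `may_broadcast` are hypothetical names that do not appear in this commit, and the commit itself enforces the rule through the content gate in `_push_decision_to_telegram` rather than through a source check.

```python
from enum import Enum


class TriggerSource(Enum):
    """Where a processing run originated (hypothetical names)."""

    ALERTMANAGER_WEBHOOK = "alertmanager_webhook"  # POST: a real new alert
    INCIDENT_READ = "incident_read"  # GET /incidents: read-only request


def may_broadcast(source: TriggerSource) -> bool:
    """Return True only for the one source allowed to notify Telegram.

    A read request must stay free of outbound side effects, so every
    source other than the Alertmanager webhook is limited to DB updates.
    """
    return source is TriggerSource.ALERTMANAGER_WEBHOOK
```

A caller on the GET path would pass `TriggerSource.INCIDENT_READ` and skip the push entirely, so a failed analysis could never reach the channel regardless of its content.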
@@ -264,6 +264,20 @@ async def _push_decision_to_telegram(
             error=str(_update_err),
         )
 
+    # 🚨 ADR-091 iron rule (2026-04-17 ogt + Claude Sonnet 4.6): never broadcast a failed analysis
+    # Problem: GET /incidents triggers Agent Debate → every LLM call fails → description="待分析" + action=""
+    #   → the system broadcasts a "待分析" (pending analysis) zombie card every few minutes → alert fatigue (the deadliest killer for SREs)
+    # Rule: description="待分析" + action="" = no Phase 2 agent produced valid output → exit silently
+    # Already done: DB state was updated above (update_action_by_incident_id); no Telegram broadcast is needed
+    # Do not modify: this gate protects the global alert signal-to-noise ratio; any "exception" requires written approval from the chief architect
+    if description.strip() == "待分析" and not action.strip():
+        logger.info(
+            "telegram_push_suppressed_no_analysis",
+            incident_id=incident.incident_id,
+            reason="Agent Debate produced no valid output (description=待分析, action=empty)",
+        )
+        return
+
     # Create approval_id (use incident_id for tracking)
     # 2026-03-27 ogt: fix the duplicated INC-INC-INC- prefix bug
     approval_id = incident.incident_id  # already in INC-xxx format
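The gate condition added in this hunk can be isolated as a pure predicate, which makes its edge cases easy to check. The following is a minimal sketch, not code from the commit: `is_empty_analysis` and `PENDING_PLACEHOLDER` are hypothetical names, and it assumes both fields are always strings, as the diff's use of `.strip()` implies.

```python
# Placeholder description written when no agent produced output
# ("待分析" means "pending analysis"). The constant name is hypothetical.
PENDING_PLACEHOLDER = "待分析"


def is_empty_analysis(description: str, action: str) -> bool:
    """Mirror the gate condition from the diff.

    True only when the description is exactly the placeholder and the
    action is blank, ignoring surrounding whitespace in both fields.
    """
    return description.strip() == PENDING_PLACEHOLDER and not action.strip()


# Example inputs: only the first two would be suppressed.
examples = [
    ("待分析", ""),  # no analysis, no action
    ("  待分析\n", "   "),  # same, with stray whitespace
    ("待分析", "restart pod"),  # an action exists, so the card has value
    ("Disk full on node-3", ""),  # a real description exists
]
suppressed = [is_empty_analysis(d, a) for d, a in examples]
```

Because both conditions must hold, a card that carries either a real description or a proposed action is still pushed; only the card with no content at all is dropped.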