fix(escalation): dedup escalation card by fingerprint + 24h TTL

Follow-up to b3a0f0d7 (decision card dedup). Hard evidence from the Commander
at 17:35: four ESCALATION P0 cards fired back to back (HostOutOfDiskSpace +
3×HostDiskUsageHigh, all with target=node-exporter-110, each with a different
INC ID: C9CD6E/FB7944/559B54/C1BBF3).

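The decision card was fixed, but the escalation card takes a separate code path with the same root cause:
- emergency_escalation_service.py:31 binds the dedup key to incident_id (a random uuid4)
- The 900s TTL is shorter than the sweeper's 1h re-trigger interval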
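
The two root causes above can be illustrated with a minimal sketch. The helper name `old_dedup_key` and the constants are assumptions made for illustration; only the key format, the 900s TTL, and the 1h sweeper interval come from this commit message.

```python
import uuid

SWEEPER_INTERVAL_S = 3600  # sweeper re-trigger interval (1h), per the list above
OLD_TTL_S = 900            # dedup TTL before this commit


def old_dedup_key(incident_id: str) -> str:
    """Hypothetical helper reproducing the pre-fix key format."""
    return f"auto_repair:emergency_escalated:{incident_id}"


# Two incidents opened for the same symptom still receive unrelated random IDs,
# so their dedup keys never match.
key_a = old_dedup_key(str(uuid.uuid4()))
key_b = old_dedup_key(str(uuid.uuid4()))
keys_collide = key_a == key_b

# Even with a stable key, the entry would expire before the next sweep.
key_survives_sweep = OLD_TTL_S >= SWEEPER_INTERVAL_S
```

Both `keys_collide` and `key_survives_sweep` evaluate to `False`, which is why every re-trigger produced a fresh card.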

Fix:
- escalate_auto_repair_unavailable() now dedups on an alertname+target fingerprint
- TTL 900s → 86400s, aligned with decision_manager.py:574
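
To show why the longer TTL matters, here is a rough in-memory sketch of a first-send dedup check. `FirstSendDedup` and `count_cards` are hypothetical stand-ins, not the project's `_dedup_first_send` (which is async and whose storage backend is not shown in this commit); the injected clock exists only to make the timing easy to simulate.

```python
from typing import Callable, Dict


class FirstSendDedup:
    """Hypothetical in-memory first-send check with a per-key TTL."""

    def __init__(self, clock: Callable[[], float]) -> None:
        self._clock = clock
        self._expires_at: Dict[str, float] = {}

    def first_send(self, key: str, ttl: int) -> bool:
        """Return True if the key is new or expired, and (re)arm its TTL."""
        now = self._clock()
        expiry = self._expires_at.get(key)
        if expiry is not None and now < expiry:
            return False  # still inside the dedup window: suppress
        self._expires_at[key] = now + ttl
        return True


def count_cards(ttl: int, sweeps: int = 3, interval: int = 3600) -> int:
    """Count cards sent when the same alert is re-triggered every `interval` seconds."""
    now = [0.0]
    dedup = FirstSendDedup(clock=lambda: now[0])
    key = "auto_repair:emergency_escalated:fp:hostdiskusagehigh:node-exporter-110"
    sent = 0
    for _ in range(sweeps):
        if dedup.first_send(key, ttl):
            sent += 1
        now[0] += interval  # advance the simulated clock to the next sweep
    return sent
```

Under these assumptions, `count_cards(900)` sends a card on every sweep, while `count_cards(86400)` sends only the first one.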

The drift_auto_adopt path is left unchanged for now (its TTL is already 3600s and its report_id is not random, so it is not part of the current problem).

Tests: 7 passed (escalation/emergency-related cases)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Your Name
2026-05-02 17:38:48 +08:00
parent 697e13b23a
commit 47342dfb34


@@ -28,12 +28,19 @@ async def escalate_auto_repair_unavailable(
 ) -> None:
     """Open an emergency channel when auto repair cannot safely continue."""
-    dedup_key = f"auto_repair:emergency_escalated:{incident_id}"
-    if not await _dedup_first_send(dedup_key, ttl=900, event="auto_repair"):
+    # 2026-05-02 Claude Opus 4.7 + Commander ogt: dedup key changed from incident_id → fingerprint (alertname+target).
+    # Hard evidence: 4 ESCALATION cards fired back to back at 17:35-17:36 (HostOutOfDiskSpace + 3×HostDiskUsageHigh, all target=node-exporter-110).
+    # incident_id was a random uuid4 and the 900s TTL was too short → the same symptom under a new INC ID was never deduplicated.
+    # Now uses an alertname+target fingerprint + TTL 86400s, aligned with decision_manager.py:218.
+    _alertname_fp = (alert_type or "AutoRepairBlocked").strip().lower().replace(" ", "_")[:60]
+    _target_fp = (target_resource or "unknown").lower()[:40]
+    dedup_key = f"auto_repair:emergency_escalated:fp:{_alertname_fp}:{_target_fp}"
+    if not await _dedup_first_send(dedup_key, ttl=86400, event="auto_repair"):
         logger.info(
             "auto_repair_escalation_dedup_skipped",
             incident_id=incident_id,
             approval_id=approval_id,
+            fingerprint=f"{_alertname_fp}:{_target_fp}",
         )
         return
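
As a sanity check on the new key, the fingerprint construction from this diff can be pulled out into a pure function and run against the four alerts cited in the commit message. The function name `build_escalation_dedup_key` is an assumption for illustration; the normalization steps are copied from the diff.

```python
from typing import Optional


def build_escalation_dedup_key(
    alert_type: Optional[str], target_resource: Optional[str]
) -> str:
    """Hypothetical helper mirroring the fingerprint logic in the diff."""
    # Alert name: default, trim, lowercase, spaces to underscores, cap at 60 chars.
    alertname_fp = (alert_type or "AutoRepairBlocked").strip().lower().replace(" ", "_")[:60]
    # Target: default, lowercase, cap at 40 chars (not trimmed, as in the diff).
    target_fp = (target_resource or "unknown").lower()[:40]
    return f"auto_repair:emergency_escalated:fp:{alertname_fp}:{target_fp}"


# The four escalations from the incident, each with a different INC ID.
alerts = [
    ("HostOutOfDiskSpace", "node-exporter-110"),
    ("HostDiskUsageHigh", "node-exporter-110"),
    ("HostDiskUsageHigh", "node-exporter-110"),
    ("HostDiskUsageHigh", "node-exporter-110"),
]
unique_keys = {build_escalation_dedup_key(name, target) for name, target in alerts}
```

Because the key includes the alert name, the four alerts collapse to two distinct keys (one per alert name), so under this scheme the incident would have produced two cards rather than four.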