fix(auto-repair): _extract_symptoms prefers labels.alertname for the original alertname

Problem: signal.alert_name stores the alert_type (e.g. "custom") rather than the
Prometheus alertname (e.g. "SentryDown"), so playbook Jaccard matching always fails (NO_MATCH).

Root cause: the webhook's alertname_to_type mapping converts unknown alertnames to "custom"
and stores the result in signal.alert_name, whereas the Playbook's
symptom_pattern.alert_names holds the original names.
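
To make the root cause concrete, here is a minimal sketch of why the stored value can never overlap with the playbook's names. The `jaccard` helper, the contents of `ALERTNAME_TO_TYPE`, and `to_alert_type` are hypothetical stand-ins; the real mapping table and matcher are not shown in this commit.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical mapping: only the "unknown -> custom" behaviour comes from the commit message.
ALERTNAME_TO_TYPE = {"HighCPU": "cpu"}


def to_alert_type(alertname: str) -> str:
    """Collapse any alertname missing from the mapping into the "custom" category."""
    return ALERTNAME_TO_TYPE.get(alertname, "custom")


playbook_alert_names = {"SentryDown"}

# Before the fix: the symptom set holds the category value, not the alertname.
score_before = jaccard({to_alert_type("SentryDown")}, playbook_alert_names)

# After the fix: the symptom set holds the original Prometheus alertname.
score_after = jaccard({"SentryDown"}, playbook_alert_names)
```

With the category value the intersection is empty, so the score is 0.0 whatever threshold the matcher uses; with the original alertname the two sets are identical.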

Fix: read the original Prometheus alertname from signal.labels["alertname"],
falling back to signal.alert_name (for backward compatibility).
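
The fallback rule can be sketched in isolation as follows. The `Signal` dataclass and the `resolve_alertname` helper are hypothetical names used only for illustration; the actual change is inlined in `_extract_symptoms`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signal:
    """Hypothetical stand-in for the real signal model (only the fields used here)."""
    alert_name: str
    labels: Optional[dict[str, str]] = None


def resolve_alertname(signal: Signal) -> str:
    """Prefer labels["alertname"]; fall back to alert_name if it is missing or empty."""
    raw_alertname = signal.labels.get("alertname") if signal.labels else None
    return raw_alertname or signal.alert_name


new_style = resolve_alertname(Signal("custom", {"alertname": "SentryDown"}))
no_labels = resolve_alertname(Signal("custom", None))
no_alertname_label = resolve_alertname(Signal("custom", {"severity": "critical"}))
```

Because the fallback uses `or`, an empty-string label is treated the same as a missing one, so signals without a usable `alertname` label behave exactly as they did before the change.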

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
OG T
2026-04-09 12:26:18 +08:00
parent 5bd8a8a719
commit fc03eb1f4d

@@ -517,7 +517,11 @@ class AutoRepairService:
         if incident.signals:
             for signal in incident.signals:
-                alert_names.append(signal.alert_name)
+                # Prefer labels["alertname"] (the original Prometheus alertname);
+                # fall back to signal.alert_name (which may be a category value such as "custom")
+                # (2026-04-09 Claude Sonnet 4.6 Asia/Taipei, L7 E2E fix)
+                raw_alertname = signal.labels.get("alertname") if signal.labels else None
+                alert_names.append(raw_alertname or signal.alert_name)
                 # Extract keywords from annotations
                 if signal.annotations:
                     for value in signal.annotations.values():