fix(aiops): use existing escalation event type

Your Name
2026-05-01 10:56:59 +08:00
parent 78bcc090ad
commit 2c12bce135
3 changed files with 2 additions and 2 deletions

@@ -43,7 +43,6 @@ ALERT_EVENT_TYPES = {
     "BACKUP_COMPLETED",
     "BACKUP_FAILED",
     "APPROVAL_ESCALATED",
-    "EMERGENCY_ESCALATED",
     "CHANGE_APPLIED",
     # ADR-071 notification lifecycle (2026-04-11 Claude Sonnet 4.6 Asia/Taipei)
     "NOTIFICATION_CLASSIFIED",

@@ -56,7 +56,7 @@ async def escalate_auto_repair_unavailable(
     )
     await get_alert_operation_log_repository().append(
-        "EMERGENCY_ESCALATED",
+        "APPROVAL_ESCALATED",
         incident_id=incident_id,
         approval_id=approval_id,
         actor="auto_repair",

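@@ -14,6 +14,7 @@ Live e2e fired `HostBackupFailed` at Alertmanager and found that aged backup alerts would
 - `_should_use_alertmanager_rule_first()` / `_should_bypass_alertmanager_llm()` now include `backup_failure`, so the backup-failure YAML `SSH_DIAGNOSE` action is no longer overridden by the LLM into a K8s action.
 - `AutoRepairService` adds a host/backup Playbook guard: when a host/backup incident matches a K8s rollout-type Playbook, it is blocked as `HOST_BACKUP_K8S_PLAYBOOK` and routed to emergency intervention instead.
 - `AutoRepairService` post-verification rollback guard: when verification fails for a host/backup or non-K8s Playbook, it no longer synthesizes `kubectl rollout restart deployment/{target}`; it goes to emergency escalation instead and does not auto-resolve the incident.
+- `EmergencyEscalationService` reuses the existing `APPROVAL_ESCALATED` DB enum when writing to the AOL, so the emergency channel does not fail to leave an audit record because a new enum value has not been migrated.
 - The `NodeExporterDown` Prometheus rule now sets `auto_repair` to `true`, consistent with the exporter restart strategy in the YAML rule catalog.
 - `backup_failure` NO_ACTION / SSH_DIAGNOSE unit tests.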
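
The reason for reusing `APPROVAL_ESCALATED` can be illustrated with a minimal sketch. It assumes the operation log rejects any event type outside the allowed set, the way a DB enum column without the new value would. `InMemoryOperationLog` and `UnknownEventTypeError` are hypothetical stand-ins, not the project's real repository API, and the real `append` is async while this one is synchronous for brevity.

```python
# Hypothetical sketch. Only ALERT_EVENT_TYPES and the event-type strings come
# from the diff; the classes below are illustrative assumptions.
ALERT_EVENT_TYPES = {
    "BACKUP_COMPLETED",
    "BACKUP_FAILED",
    "APPROVAL_ESCALATED",
    "CHANGE_APPLIED",
    "NOTIFICATION_CLASSIFIED",
}


class UnknownEventTypeError(ValueError):
    """Raised when an event type is not in the allowed set."""


class InMemoryOperationLog:
    """Stand-in for the alert operation log repository (assumed shape)."""

    def __init__(self, allowed):
        self._allowed = frozenset(allowed)
        self.rows = []

    def append(self, event_type, **fields):
        # Mimics a DB enum column: values missing from the enum are rejected,
        # so no audit row is written for them.
        if event_type not in self._allowed:
            raise UnknownEventTypeError(event_type)
        row = {"event_type": event_type, **fields}
        self.rows.append(row)
        return row


if __name__ == "__main__":
    log = InMemoryOperationLog(ALERT_EVENT_TYPES)
    # The existing enum value is accepted and leaves an audit row.
    log.append("APPROVAL_ESCALATED", incident_id="inc-1", actor="auto_repair")
    try:
        # The unmigrated value is rejected, so the escalation leaves no trace.
        log.append("EMERGENCY_ESCALATED", incident_id="inc-1", actor="auto_repair")
    except UnknownEventTypeError as exc:
        print(f"rejected event type: {exc}")
    print(len(log.rows))
```

Under these assumptions, writing with the existing value keeps the audit trail intact until a migration adds a dedicated enum value.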