fix(aiops): use existing escalation event type
@@ -43,7 +43,6 @@ ALERT_EVENT_TYPES = {
     "BACKUP_COMPLETED",
     "BACKUP_FAILED",
     "APPROVAL_ESCALATED",
-    "EMERGENCY_ESCALATED",
     "CHANGE_APPLIED",
     # ADR-071 notification lifecycle (2026-04-11 Claude Sonnet 4.6 Asia/Taipei)
     "NOTIFICATION_CLASSIFIED",
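Assuming `ALERT_EVENT_TYPES` serves as an allowlist that should mirror the values the database enum has already been migrated to accept, dropping `EMERGENCY_ESCALATED` keeps the Python side from advertising a value the DB would reject. A minimal sketch of that invariant; the set is only the excerpt visible in this hunk, and `validate_event_type` and `unmigrated_types` are hypothetical helpers, not functions from this repository:

```python
from typing import Iterable, List

# Excerpt of ALERT_EVENT_TYPES as it reads after this commit
# (the real set has more entries outside the hunk).
ALERT_EVENT_TYPES = frozenset({
    "BACKUP_COMPLETED",
    "BACKUP_FAILED",
    "APPROVAL_ESCALATED",
    "CHANGE_APPLIED",
    "NOTIFICATION_CLASSIFIED",
})


def validate_event_type(event_type: str) -> str:
    """Hypothetical guard: reject event types outside the allowlist."""
    if event_type not in ALERT_EVENT_TYPES:
        raise ValueError(f"unknown alert event type: {event_type!r}")
    return event_type


def unmigrated_types(app_types: Iterable[str], db_enum_values: Iterable[str]) -> List[str]:
    """Hypothetical drift check: app-side types the DB enum does not accept yet."""
    return sorted(set(app_types) - set(db_enum_values))
```

Run as a test against enum values read from the schema, `unmigrated_types` would flag a type added to the Python set before its migration: `unmigrated_types(ALERT_EVENT_TYPES | {"EMERGENCY_ESCALATED"}, ALERT_EVENT_TYPES)` returns `["EMERGENCY_ESCALATED"]`.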
@@ -56,7 +56,7 @@ async def escalate_auto_repair_unavailable(
     )
 
     await get_alert_operation_log_repository().append(
-        "EMERGENCY_ESCALATED",
+        "APPROVAL_ESCALATED",
         incident_id=incident_id,
         approval_id=approval_id,
         actor="auto_repair",
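With this change the auto-repair-unavailable path records its escalation under the already-migrated `APPROVAL_ESCALATED` value. The sketch below replays that write against an in-memory stand-in that rejects unknown values the way a DB enum column would. `LogEntry`, `InMemoryOperationLog`, and `log_auto_repair_escalation` are hypothetical, and the `append(event_type, *, incident_id, approval_id, actor)` signature is inferred only from the call visible in the hunk; the real repository method may take more arguments.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class LogEntry:
    event_type: str
    incident_id: str
    approval_id: Optional[str]
    actor: str


@dataclass
class InMemoryOperationLog:
    """Hypothetical stand-in for the alert operation log repository."""

    allowed_types: frozenset
    entries: List[LogEntry] = field(default_factory=list)

    async def append(self, event_type: str, *, incident_id: str,
                     approval_id: Optional[str], actor: str) -> None:
        # Mimic a DB enum column: values that were never migrated are rejected.
        if event_type not in self.allowed_types:
            raise ValueError(f"enum value not migrated: {event_type!r}")
        self.entries.append(LogEntry(event_type, incident_id, approval_id, actor))


async def log_auto_repair_escalation(repo: InMemoryOperationLog,
                                     incident_id: str,
                                     approval_id: Optional[str]) -> None:
    # Reuse the existing enum value instead of the unmigrated EMERGENCY_ESCALATED.
    await repo.append(
        "APPROVAL_ESCALATED",
        incident_id=incident_id,
        approval_id=approval_id,
        actor="auto_repair",
    )
```

For example, `asyncio.run(log_auto_repair_escalation(repo, "inc-1", "apr-1"))` on a repository that only allows `APPROVAL_ESCALATED` succeeds, while appending `EMERGENCY_ESCALATED` raises. Because escalation paths now share one event type, a log consumer presumably has to rely on another field, such as `actor`, to tell an auto-repair escalation apart from other approval escalations.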
@@ -14,6 +14,7 @@ Live e2e testing that fired `HostBackupFailed` at Alertmanager found that aged backup alerts would
 - `_should_use_alertmanager_rule_first()` / `_should_bypass_alertmanager_llm()` now include `backup_failure`, so the backup-failure YAML `SSH_DIAGNOSE` action is no longer overridden by the LLM with a K8s action.
 - `AutoRepairService` adds a host/backup Playbook guard: if a host/backup incident matches a K8s rollout-type Playbook, it is blocked as `HOST_BACKUP_K8S_PLAYBOOK` and routed to emergency intervention instead.
 - `AutoRepairService` post-verification rollback guard: when verification fails for a host/backup incident or a non-K8s Playbook, it no longer synthesizes `kubectl rollout restart deployment/{target}`; it goes to emergency escalation instead and does not auto-resolve the incident.
+- `EmergencyEscalationService` reuses the existing `APPROVAL_ESCALATED` DB enum value when writing the AOL, so the emergency path cannot fail to leave an audit record just because a new enum value has not been migrated.
 - The `NodeExporterDown` Prometheus rule now sets `auto_repair` to `true`, consistent with the exporter-restart strategy in the YAML rule catalog.
 - Add unit tests for `backup_failure` NO_ACTION / SSH_DIAGNOSE.
 
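The two `AutoRepairService` guards described in these notes amount to a small decision function. A minimal sketch, assuming incidents carry a category string and Playbooks a kind string such as `"k8s_rollout"` or `"ssh_script"`; `Decision`, `is_k8s_playbook`, `choose_playbook_action`, `after_failed_verification`, and the category/kind labels are hypothetical. Only the `HOST_BACKUP_K8S_PLAYBOOK` block reason, the `kubectl rollout restart deployment/{target}` command, and the escalate-without-resolving behavior come from the notes.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed incident categories that the guards treat as host/backup.
HOST_BACKUP_CATEGORIES = frozenset({"host", "backup"})


@dataclass(frozen=True)
class Decision:
    action: str                          # "run_playbook", "rollback" or "emergency_escalation"
    reason: Optional[str] = None         # block reason, when a guard fires
    command: Optional[str] = None        # synthesized rollback command, if any
    auto_resolve: Optional[bool] = None  # None = left to the caller


def is_k8s_playbook(playbook_kind: str) -> bool:
    # Assumption: K8s Playbook kinds share a "k8s" prefix.
    return playbook_kind.startswith("k8s")


def choose_playbook_action(category: str, playbook_kind: str) -> Decision:
    """Pre-execution guard: block K8s rollout Playbooks on host/backup incidents."""
    if category in HOST_BACKUP_CATEGORIES and playbook_kind == "k8s_rollout":
        return Decision("emergency_escalation", reason="HOST_BACKUP_K8S_PLAYBOOK")
    return Decision("run_playbook")


def after_failed_verification(category: str, playbook_kind: str, target: str) -> Decision:
    """Post-verification guard: synthesize a rollout restart only for K8s Playbooks on K8s incidents."""
    if category in HOST_BACKUP_CATEGORIES or not is_k8s_playbook(playbook_kind):
        # No kubectl command is synthesized and the incident is left open.
        return Decision("emergency_escalation", auto_resolve=False)
    return Decision("rollback", command=f"kubectl rollout restart deployment/{target}")
```

Returning a `Decision` value instead of executing commands keeps both guards pure, so the host/backup cases can be unit-tested without a cluster or SSH target, in the same spirit as the `backup_failure` tests in the notes.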