awoooi/apps/api
OG T 513232e90b
CD Pipeline / build-and-deploy (push) Successful in 28m30s
fix(decision_manager): overwrite the Webhook's junk action with the Agent's analysis result
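Root cause (full root-cause analysis of incident INC-20260416-C365D0):
- The Webhook's inline LLM created ApprovalRecord.action = "kubectl rollout restart awoooi-prod"
- The Agent's analysis was correct (postgres disk → NO_ACTION), but it only sent a new Telegram card and did not overwrite the DB
- The user approved the Agent's card → the system looked up incident_id → found the Webhook's old ApprovalRecord
  → executed the junk action (a rollout restart for a disk alert!)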

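Fix:
- approval_db.py: add update_action_by_incident_id() (updates PENDING records by incident_id)
- decision_manager.py: overwrite the ApprovalRecord as soon as the Agent confirms its action
  If action="" (NO_ACTION): store "NO_ACTION - {description}" so the user knows the Agent recommends observing
  On approval, what runs is the Agent's correct recommendation, not the Webhook's generic action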
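
The fix described above can be sketched roughly as follows. This is an illustration only, not the code in this commit: the in-memory `ApprovalDB` class, the `ApprovalRecord` fields other than `action`, and the helpers `action_to_store()` and `on_agent_confirmed()` are hypothetical stand-ins. Only the name `update_action_by_incident_id()`, the PENDING filter, and the "NO_ACTION - {description}" format come from the message.

```python
from dataclasses import dataclass
from typing import List

PENDING = "PENDING"


@dataclass
class ApprovalRecord:
    # Field names other than `action` are assumptions for this sketch.
    incident_id: str
    action: str
    status: str = PENDING


class ApprovalDB:
    """Hypothetical in-memory stand-in for the storage behind approval_db.py."""

    def __init__(self) -> None:
        self.records: List[ApprovalRecord] = []

    def add(self, record: ApprovalRecord) -> None:
        self.records.append(record)

    def update_action_by_incident_id(self, incident_id: str, action: str) -> int:
        """Overwrite the action on every PENDING record for this incident.

        Returns the number of records updated.
        """
        updated = 0
        for record in self.records:
            if record.incident_id == incident_id and record.status == PENDING:
                record.action = action
                updated += 1
        return updated


def action_to_store(action: str, description: str) -> str:
    # An empty action means NO_ACTION; keep the description so the user
    # can see that the Agent recommends observing instead of acting.
    return action if action else f"NO_ACTION - {description}"


def on_agent_confirmed(db: ApprovalDB, incident_id: str,
                       action: str, description: str) -> int:
    # Hypothetical hook: called once the Agent has confirmed its action,
    # so a later approval no longer picks up the Webhook's stale action.
    return db.update_action_by_incident_id(
        incident_id, action_to_store(action, description)
    )
```

With a PENDING record holding the Webhook's "kubectl rollout restart awoooi-prod", calling `on_agent_confirmed(db, "INC-20260416-C365D0", "", "<description>")` leaves that record with an action starting with "NO_ACTION - ". Records that are no longer PENDING are untouched.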

2026-04-16 ogt + Claude Sonnet 4.6 (Asia-Pacific)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 20:07:15 +08:00