All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 8m57s
L3 修復總結(2026-04-25):
【修復 1】Gitea 跨域界限 kubectl 過濾(solver_agent.py)
根因:GiteaMemoryPressure 告警觸發 Solver → LLM 生成 'kubectl scale deployment gitea'
Gitea 在主機 docker-compose,不在 awoooi-prod K8s namespace → 執行必然失敗
變更:
- 添加 _filter_non_k8s_targets() 函數,對 scale/restart/delete/patch 指令驗證 target
- 添加 _KUBECTL_MUTATING_VERBS / _KUBECTL_ROLLOUT_MUTATING_SUBVERBS 常數
- 在 _solve() 呼叫 _fetch_k8s_inventory() 獲取實際部署清單
- 後置過濾:candidates 中若 target 不在 inventory 且屬寫入動詞 → 丟棄 + 警告
預期行為:GiteaMemoryPressure → Solver 現生成調查類 kubectl(get/describe),而非 scale
【修復 2】HostBackupFailed 誤判升級(incident_service.py + webhooks.py)
根因:備份失敗 >24h 被標記 TYPE-1(純資訊),導致靜默發送無按鈕卡片,未觸發自動修復
變更:
- incident_service.py classify_alert_early() 添加 age_hours 參數
- 添加 _BACKUP_AGE_UPGRADE_NAMES + _BACKUP_AGE_THRESHOLD_HOURS=24.0
- 若 alertname in (HostBackupFailed/Stale/Missing) 且 age > 24h → TYPE-3 升級
- webhooks.py 計算 alert.startsAt → age_hours,並傳遞給 classify_alert_early()
預期行為:HostBackupFailed 25h+ → 升級為 TYPE-3,觸發 LLM 分析 + P0 自動修復建議
測試結果:
- solver_agent: 35/35 tests PASSED ✅
- incident_service: 11/11 tests PASSED ✅
- incident_api integration: 7/7 tests PASSED ✅
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>