Your Name
|
e2ab879636
|
fix(alerts): correct telegram execution truth
CD Pipeline / tests (push) Failing after 52s
CD Pipeline / build-and-deploy (push) Has been skipped
CD Pipeline / post-deploy-checks (push) Has been skipped
Code Review / ai-code-review (push) Successful in 11s
|
2026-05-31 13:58:39 +08:00 |
|
Your Name
|
0e14935351
|
fix(ops): classify systemd runner alerts as host resources
CD Pipeline / tests (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
CD Pipeline / post-deploy-checks (push) Has been cancelled
Code Review / ai-code-review (push) Has been cancelled
|
2026-05-05 14:28:18 +08:00 |
|
OG T
|
10b74affcf
|
fix(GAP-A4): 規則 Action 模板 placeholder 解析修復 — 解開 8.3h 飛輪沉默
CD Pipeline / build-and-deploy (push) Has been cancelled
🚨 真因診斷(統帥逮到):
API log 顯示最近 1 小時爆發大量 auto_execute_blocked_unresolved_placeholder:
- action: "kubectl rollout restart deployment HostHighCpuLoad" ← target=alertname
- action: "kubectl rollout restart deployment unknown"
- action: "kubectl scale deployment unknown --replicas=3"
根因:alert_rule_engine._extract_vars() target 解析邏輯不夠強健,
當 Prometheus 告警無 deployment label 時,退回 alertname 或 "unknown",
產生垃圾指令。GAP-A1 防注入閘正確攔下,但自動修復路徑因此卡死,
KM 不寫入 → 飛輪沈默。
修復(三層防護):
1. 新增 _strip_pod_suffix() — K8s Pod 名稱還原 Deployment base
- Deployment 格式: awoooi-api-7d6b776f78-4sgjl → awoooi-api
- StatefulSet: postgresql-0 → postgresql
- Legacy: my-job-x2m4k → my-job
2. 新增 _is_bad_target() — 垃圾 target 識別
- 空串 / "unknown" / "none" / "null"
- target == alertname 本身
- IP:port 格式、純 IP、含空白/括號/引號
- 未解析 {placeholder}
3. 重寫 _extract_vars() — 多層 label 查找(權威優先):
deployment > app > statefulset > pod(去後綴) > container > service > target_resource
每層都過 _is_bad_target 驗證,全失敗 → target="unknown"
4. match_rule() 後置雙驗證:
- bad target → 清空 kubectl_command (降級 LLM)
- 殘留 { or } → 清空 kubectl_command (模板未填完)
測試覆蓋:
- 33 個新單元測試(GAP-A4 四大場景全覆蓋)
- 214/214 回歸測試全過
影響:
- 原本產出「kubectl rollout restart deployment HostHighCpuLoad」的路徑
→ 現在會 `rule_kubectl_command_discarded_bad_target` 並降級 LLM
- LLM 若能從錯誤 log 推理真實 deployment,飛輪恢復正常運轉
- 若 LLM 也無解,進 TYPE-4 人工扶梯
2026-04-14 Claude Sonnet 4.6(MASTER 藍圖之外的隱性 Bug 殲滅)
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
|
2026-04-14 18:43:29 +08:00 |
|