OG T
ba18ad2ef8
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 40s
CD Pipeline / build-and-deploy (push) Successful in 8m37s
feat(hermes+rules): LLM upgrade for Hermes + deprecate PostgreSQLDiskGrowthRate per Commander's decision
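Commander's decisions, 2026-04-19:
- Rule 1 PostgreSQLDiskGrowthRate → option C: deprecate and replace with new rules
- Rule 2 NoAlertsReceived2Hours → keep (it guards the real alerting pipeline)
- Fix the noise_rate algorithm first (NO_ACTION does not count as fp), then observe and adjust dynamically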
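1. rule_stats_updater v2 noise algorithm:
Before: every EXPIRED approval counted as fp (false positive)
Problem: NO_ACTION/OBSERVE/INVESTIGATE are pure AI observations and should not count as false positives
Fix: WHERE ar.action NOT ILIKE '%NO_ACTION%' AND ar.action NOT ILIKE '%OBSERVE%' AND ...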
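The v2 noise rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the (status, action) tuple shape, and the choice of total approvals as the denominator are hypothetical and are not taken from rule_stats_updater itself.

```python
# Hypothetical sketch of the v2 noise rule; not the actual rule_stats_updater code.
# Markers that denote observation-only AI actions (listed in the commit message).
OBSERVATION_ONLY = ("NO_ACTION", "OBSERVE", "INVESTIGATE")


def is_false_positive(status: str, action: str) -> bool:
    """An EXPIRED approval counts as fp only if its action is not observation-only."""
    if status != "EXPIRED":
        return False
    upper = action.upper()  # case-insensitive substring match, like ILIKE '%...%'
    return not any(marker in upper for marker in OBSERVATION_ONLY)


def noise_rate(approvals) -> float:
    """Share of approvals counted as fp (assumed denominator: all approvals)."""
    if not approvals:
        return 0.0
    fp_count = sum(is_false_positive(status, action) for status, action in approvals)
    return fp_count / len(approvals)
```

Under the v1 rule, every EXPIRED row in such a list would have counted as fp; here only EXPIRED rows with a real remediation action do.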
2. hermes_rule_quality v2 LLM upgrade:
Add _llm_analyze_noisy_rule:
- Uses OpenClaw (Ollama/NemoTron/Gemini) to analyze each noisy rule
- JSON output: probable_root_causes/recommended_actions/confidence/should_deprecate
- 3-way parse fallback (direct / NemoTron wrapper / nested in description)
_write_advisory_aol adds llm_analysis to output_payload
_send_telegram_summary adds the AI verdict + top 2 recommendations (capped at 8 entries to keep the message short)
Follows the Commander's iron rule: AI analyzes but takes no automatic action; decisions stay with humans
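A minimal sketch of what a 3-way parse fallback could look like. The commit does not show the real wrapper shapes, so the "response" key for the NemoTron wrapper and the helper names below are assumptions made purely for illustration.

```python
import json

# Keys the commit message lists for the LLM's JSON output.
REQUIRED_KEYS = {
    "probable_root_causes",
    "recommended_actions",
    "confidence",
    "should_deprecate",
}


def _as_analysis(obj):
    """Return obj as an analysis dict if it has all required keys, else None."""
    if isinstance(obj, str):
        try:
            obj = json.loads(obj)  # the payload may itself be a JSON string
        except json.JSONDecodeError:
            return None
    if isinstance(obj, dict) and REQUIRED_KEYS.issubset(obj):
        return obj
    return None


def parse_llm_analysis(raw: str):
    """Try three shapes in order: direct, wrapper, nested in description."""
    try:
        outer = json.loads(raw)
    except json.JSONDecodeError:
        return None
    direct = _as_analysis(outer)  # path 1: the reply is the analysis itself
    if direct is not None:
        return direct
    if not isinstance(outer, dict):
        return None
    # Path 2: wrapper object (the "response" key is an assumed name).
    # Path 3: analysis nested under "description".
    for key in ("response", "description"):
        found = _as_analysis(outer.get(key))
        if found is not None:
            return found
    return None
```

Returning None on every failure path leaves the caller free to skip the LLM section of the advisory, which fits the rule that the AI only advises and a human still decides.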
3. ops/monitoring/alerts-unified.yml replaces Rule 1:
Remove PostgreSQLDiskGrowthRate (500MB/h growth threshold → false positives triggered by normal WAL behavior)
Add HostDiskUsageHigh (>80% for 10m, warning)
Add HostDiskUsageCritical (>90% for 5m, critical)
Both carry labels.supersedes='PostgreSQLDiskGrowthRate' for traceability
(Pending: the deploy-alerts workflow applies this to Prometheus on its next run)
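For reference, the two replacement rules can be described as plain data. The sketch below builds them as Python dicts using the standard Prometheus rule fields (alert, expr, for, labels). The expr is an assumption based on common node_exporter filesystem metrics; the commit message does not state the actual expression in alerts-unified.yml, and the helper name is hypothetical.

```python
# Hypothetical sketch; the real rules live in ops/monitoring/alerts-unified.yml.
def disk_usage_rule(name: str, threshold_pct: int, hold: str, severity: str) -> dict:
    """Build one disk-usage alert rule as a dict in Prometheus rule layout."""
    # Assumed expression: percentage of filesystem space in use.
    expr = (
        "(1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 "
        f"> {threshold_pct}"
    )
    return {
        "alert": name,
        "expr": expr,
        "for": hold,
        "labels": {
            "severity": severity,
            # Traceability back to the deprecated rule, as described above.
            "supersedes": "PostgreSQLDiskGrowthRate",
        },
    }


REPLACEMENT_RULES = [
    disk_usage_rule("HostDiskUsageHigh", 80, "10m", "warning"),
    disk_usage_rule("HostDiskUsageCritical", 90, "5m", "critical"),
]
```

Alerting on absolute usage rather than growth rate avoids firing on short WAL bursts, which is the false-positive pattern the deprecated rule showed.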
4. Mark deprecated in the DB immediately (so Hermes does not push it again before the alerts YAML is deployed):
UPDATE alert_rule_catalog SET review_status='deprecated' WHERE rule_name='PostgreSQLDiskGrowthRate'
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 19:39:05 +08:00