fix(governance): stabilize adr100 km growth slo
Some checks failed
Code Review / ai-code-review (push) Successful in 22s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 25s
CD Pipeline / tests (push) Successful in 1m11s
CD Pipeline / post-deploy-checks (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
@@ -45,9 +45,12 @@ groups:
           /
           sum(rate(approval_records_high_confidence_total[1h]))
 
-      # SLO 4: KM growth rate = 24h increase (absolute count, not a rate)
+      # SLO 4: KM growth rate = DB-derived 24h gauge; falls back to the legacy counter history
       - record: sli:km_growth_rate:24h
-        expr: increase(knowledge_entries_total[24h])
+        expr: |
+          max(knowledge_entries_created_24h)
+          or
+          increase(knowledge_entries_total[24h])
 
       # -----------------------------------------------------------------------
       # Error Budget Recording Rules (supporting Grafana display)
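Note on the new expression: PromQL's `or` returns every element of the left-hand vector plus those right-hand elements whose label sets have no match on the left. Because `max(...)` yields a single sample with an empty label set, the counter-based fallback is suppressed only when its own result is also label-free, which is the case in the unit test for this rule. A minimal Python sketch of that matching behaviour follows; the function `promql_or`, the dict-based vector model, and the `instance="api-1"` label are illustrative assumptions, not part of Prometheus or this repository.

```python
from typing import Dict, FrozenSet, Tuple

# An instant vector modeled as {label set -> value}. The metric name is left
# out because max() and increase() drop it before `or` compares label sets.
Labels = FrozenSet[Tuple[str, str]]
Vector = Dict[Labels, float]


def promql_or(left: Vector, right: Vector) -> Vector:
    """Model `left or right`: keep all of left, then add right-hand
    elements whose label set does not appear on the left."""
    result = dict(left)
    for labels, value in right.items():
        if labels not in left:
            result[labels] = value
    return result


NO_LABELS: Labels = frozenset()

# Gauge present: max(knowledge_entries_created_24h) takes precedence.
gauge: Vector = {NO_LABELS: 25.0}
counter_increase: Vector = {NO_LABELS: 0.0}  # e.g. a counter with no growth yet
preferred = promql_or(gauge, counter_increase)

# Gauge absent: the left side is empty, so the counter fallback is used.
fallback = promql_or({}, {NO_LABELS: 1440.0})

# Hypothetical labelled counter: its label set differs from {}, so both
# samples survive and the recording rule would emit two series.
labelled: Vector = {frozenset({("instance", "api-1")}): 3.0}
both = promql_or(gauge, labelled)
```

If `knowledge_entries_total` is ever exported with extra labels, wrapping the fallback in an aggregation so that it also yields an empty label set would keep the rule to a single series; whether that applies here depends on how the counter is exported.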
@@ -248,7 +251,7 @@ groups:
           summary: "SLO KM growth rate critically low (< 5 entries/day): suspected KM chain break"
           description: "In the past 24h, {{ $value }} KM entries were added, far below the target of 20 entries/day; the flywheel learning loop may be broken."
           runbook: |
-            1. Confirm that the knowledge_entries_total counter is increasing as expected
+            1. Confirm that the knowledge_entries_created_24h gauge and the knowledge_entries_total counter are increasing as expected
             2. Check the governance_agent logs for governance_km_growth_slo_violation
             3. Verify the KM write path after auto_execute (feedback_flywheel_km_write_gap.md)
             4. Manually run POST /api/v1/governance/check
@@ -35,7 +35,8 @@ tests:
       - expr: sli:autonomy_rate:5m
         eval_time: 15m
         exp_samples:
-          - value: 0.8
+          - labels: '{__name__="sli:autonomy_rate:5m"}'
+            value: 0.8
 
   # ---- SLI 1: autonomy rate = 100% (no human_required) ----
   - interval: 1m
@@ -47,7 +48,8 @@ tests:
       - expr: sli:autonomy_rate:5m
         eval_time: 15m
         exp_samples:
-          - value: 1.0
+          - labels: '{__name__="sli:autonomy_rate:5m"}'
+            value: 1.0
 
   # ---- SLI 2: decision accuracy = 90% (success=9, auto_executed=10) ----
   - interval: 1m
@@ -61,7 +63,8 @@ tests:
       - expr: sli:decision_accuracy:5m
         eval_time: 15m
         exp_samples:
-          - value: 0.9
+          - labels: '{__name__="sli:decision_accuracy:5m"}'
+            value: 0.9
 
   # ---- SLI 4: KM growth rate (24h increase) ----
   - interval: 1m
@@ -74,7 +77,23 @@ tests:
         eval_time: 25h
         exp_samples:
           # increase over 24h = 1440 samples × 1/min
-          - value: 1440
+          - labels: '{__name__="sli:km_growth_rate:24h"}'
+            value: 1440
+
+  # ---- SLI 4: prefer the DB-derived gauge to avoid a false 0 while a newly deployed counter warms up ----
+  - interval: 1m
+    name: "sli:km_growth_rate:24h should prefer knowledge_entries_created_24h"
+    input_series:
+      - series: "knowledge_entries_created_24h"
+        values: "25x30"
+      - series: "knowledge_entries_total"
+        values: "100x30"
+    promql_expr_test:
+      - expr: sli:km_growth_rate:24h
+        eval_time: 15m
+        exp_samples:
+          - labels: '{__name__="sli:km_growth_rate:24h"}'
+            value: 25
 
   # ============================================================
   # Alert Tests - SLO 1: autonomy rate
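Note on the gauge-priority test: it relies on promtool's expanding notation, where `a+bxn` produces `n + 1` samples starting at `a` and stepping by `b`, and `axn` is shorthand for a constant series. With `"25x30"` the gauge stays at 25 for the whole window, and with `"100x30"` the counter never grows, so its 24h increase is 0 and the expected value of 25 can only come from the gauge. A small Python sketch of the expansion is given below; `expand_series` is a hypothetical helper that handles only the plain numeric forms (not `_` or `stale`) and is not part of promtool.

```python
import re
from typing import List

# Matches 'a+bxn', 'a-bxn' and 'axn' with plain decimal numbers.
_PATTERN = re.compile(r"^(-?\d+(?:\.\d+)?)(?:([+-])(\d+(?:\.\d+)?))?x(\d+)$")


def expand_series(notation: str) -> List[float]:
    """Expand 'a+bxn', 'a-bxn' or 'axn' into n + 1 sample values."""
    match = _PATTERN.match(notation)
    if match is None:
        raise ValueError(f"unsupported notation: {notation!r}")
    start = float(match.group(1))
    # A missing step (the 'axn' form) means a constant series.
    step = float(match.group(3) or 0.0)
    if match.group(2) == "-":
        step = -step
    count = int(match.group(4))
    return [start + i * step for i in range(count + 1)]


gauge_samples = expand_series("25x30")     # constant gauge at 25
counter_samples = expand_series("100x30")  # flat counter, so no increase
ramp_samples = expand_series("0+1x3")      # one new entry per interval
```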
@@ -120,6 +139,10 @@ tests:
               burn_window: 3d
               team: ai
               auto_repair: "false"
+            exp_annotations:
+              summary: "SLO autonomy rate slow burn (long-term trend is low)"
+              description: "Autonomy rate has stayed below target for an extended period and the cumulative error budget burn rate is high; review recommended this week."
+              runbook: "Analyze the last 7d of data to decide whether retraining or a confidence threshold adjustment is needed."
 
   # ---- Negative test: autonomy rate = 85% → SlowBurn does not fire ----
   - interval: 1m
@@ -157,6 +180,10 @@ tests:
               burn_window: 3d
               team: ai
               auto_repair: "false"
+            exp_annotations:
+              summary: "SLO decision accuracy slow burn (long-term trend is low)"
+              description: "Decision accuracy has stayed below target for an extended period and the cumulative error budget burn is high."
+              runbook: "Analyze verifier failures over the last 7d; consider a playbook fine-tune."
 
   # ---- Negative test: decision accuracy = 92% → SlowBurn does not fire ----
   - interval: 1m
@@ -192,6 +219,14 @@ tests:
               slo_name: km_growth_rate
               team: ai
               auto_repair: "false"
+            exp_annotations:
+              summary: "SLO KM growth rate critically low (< 5 entries/day): suspected KM chain break"
+              description: "In the past 24h, 0 KM entries were added, far below the target of 20 entries/day; the flywheel learning loop may be broken."
+              runbook: |
+                1. Confirm that the knowledge_entries_created_24h gauge and the knowledge_entries_total counter are increasing as expected
+                2. Check the governance_agent logs for governance_km_growth_slo_violation
+                3. Verify the KM write path after auto_execute (feedback_flywheel_km_write_gap.md)
+                4. Manually run POST /api/v1/governance/check
 
   # ---- Positive test: KM growth rate = 3/day → Critical fires (< 5) ----
   - interval: 30m
@@ -210,6 +245,14 @@ tests:
               slo_name: km_growth_rate
               team: ai
               auto_repair: "false"
+            exp_annotations:
+              summary: "SLO KM growth rate critically low (< 5 entries/day): suspected KM chain break"
+              description: "In the past 24h, 2.9999999999999996 KM entries were added, far below the target of 20 entries/day; the flywheel learning loop may be broken."
+              runbook: |
+                1. Confirm that the knowledge_entries_created_24h gauge and the knowledge_entries_total counter are increasing as expected
+                2. Check the governance_agent logs for governance_km_growth_slo_violation
+                3. Verify the KM write path after auto_execute (feedback_flywheel_km_write_gap.md)
+                4. Manually run POST /api/v1/governance/check
 
   # ---- Negative test: KM growth rate = 30/day → Critical does not fire ----
   - interval: 1m
@@ -240,3 +283,7 @@ tests:
               slo_name: km_growth_rate
               team: ai
               auto_repair: "false"
+            exp_annotations:
+              summary: "SLO KM growth rate low (< 20 entries/day)"
+              description: "In the past 24h, 14.976000000000393 KM entries were added, below the target of 20 entries/day."
+              runbook: "Check the KM write path (_write_execution_result_to_km after auto_execute) and confirm the flywheel KM loop is closing correctly."