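P0/P1 observability-layer fixes triggered by the 12-Agent panoramic diagnosis.

## P0: stop false alarms (root cause of the 4-SLO alert avalanche)
- governance_agent.py:306: an empty result no longer falls back to 0.0; the loop now does `continue` and logs a warning.
  Root cause: when Prometheus returned no data (emitter not implemented / rule not deployed), the empty result was misread as SLO=0, which always set violated=True and fired 4 false alerts.

## P0: ghost-button guard
- telegram_gateway.py:1654: when Redis fails for the LLM dynamic buttons, call btn_list.clear().
  first_row (approve/reject, stateless HMAC nonce) is always kept by the caller at 1488.
  Aligns with the "one of three missing" hard rule in feedback_no_ghost_buttons.md.

## ConfigMap drift fixes (3 places)
- config.py:683 PROMETHEUS_URL: 188→110 (caught by the drift checker; part of the SPF-4 root cause)
- config.py:705 ARGOCD_URL: 125→121 (known from T0 G3)
- config.py:375 AI_FALLBACK_ORDER: add nvidia to match the ConfigMap

## P1: Alertmanager upgrade (amtool SUCCESS)
- ops/alertmanager/alertmanager.yml: deprecated syntax → new v0.27+ syntax
  - match/match_re → matchers
  - source_match/target_match → source_matchers/target_matchers
- group_by: add the team label (prevents the 4 SLO avalanche alerts from being pushed in the same second)
- PostgreSQL/Redis inhibit rules: add equal: ['instance'] (prevents over-broad inhibition)
- Add 3 causal inhibition groups:
  - OllamaInstanceDown → SLO_*/AI_* (30 minutes)
  - KMConverterDown → SLO_KMGrowthRate*
  - SLO_*_FastBurn → SLO_*_(Medium|Slow)Burn

## Governance tooling landed
- scripts/check_config_drift.py: detects drift between the ConfigMap and code defaults.
  It caught the PROMETHEUS_URL drift behind SPF-4 (governance_agent was connecting to 188 instead of 110).
- scripts/health_check_session.sh: end-to-end check of 11 services + 4 SSH + drift + git

## Verification
- 1552 unit tests all green
- amtool check-config SUCCESS (8 inhibit_rules / 2 receivers)
- drift checker: all 4 fields aligned
- health check: all 11 services reachable

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>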
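
The empty-result fix described for governance_agent.py can be sketched roughly as below. This is a minimal illustration, not the project's actual code: `evaluate_slos`, the `responses` / `targets` dictionaries, and the logger name are hypothetical. It also assumes each SLO is checked as "value below target means violated". The only thing taken from Prometheus itself is the instant-query payload shape (`data.result[*].value == [timestamp, "number"]`).

```python
import logging

logger = logging.getLogger("slo_sketch")  # hypothetical logger name


def evaluate_slos(responses, targets):
    """Return {slo_name: violated} only for SLOs that actually have data.

    responses: {slo_name: Prometheus instant-query JSON payload} (assumed shape)
    targets:   {slo_name: minimum acceptable value} (assumed semantics)
    """
    verdicts = {}
    for name, target in targets.items():
        result = responses.get(name, {}).get("data", {}).get("result", [])
        if not result:
            # No samples: the emitter or rule may not exist yet.
            # Skip instead of treating the SLO as 0.0, which would always violate.
            logger.warning("SLO %s: empty Prometheus result, skipping", name)
            continue
        # Instant-vector samples look like {"value": [<timestamp>, "<number>"]}.
        value = float(result[0]["value"][1])
        verdicts[name] = value < target
    return verdicts
```

With this shape, an SLO that has no data is simply absent from the returned verdicts, so the caller cannot mistake "no data" for "violated".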
110 lines
3.5 KiB
Python
Executable File
#!/usr/bin/env python3
# 2026-04-28 ogt + Claude Opus 4.7: P2-1 ConfigMap vs code default drift checker
# Source: tool-expert unified governance plan
# Purpose: verify at CI / pre-commit time that the k8s ConfigMap matches the defaults in apps/api/src/core/config.py
# Past violations of the "fact-driven" red line: both AI_FALLBACK_ORDER and ARGOCD_URL have drifted before
"""
ConfigMap vs Code Default Drift Checker

Usage:
    python3 scripts/check_config_drift.py
Exit codes:
    0 = everything aligned
    1 = at least one field drifted (CI should fail)
    2 = ConfigMap or config.py not found

Can be added to .pre-commit-config.yaml:
  - repo: local
    hooks:
      - id: config-drift-check
        name: ConfigMap vs code default drift
        entry: python3 scripts/check_config_drift.py
        language: python
        pass_filenames: false
        additional_dependencies: [pyyaml]
"""
from __future__ import annotations

import json
import re
import sys
from pathlib import Path

import yaml  # installed by pre-commit via additional_dependencies

ROOT = Path(__file__).resolve().parent.parent
CONFIGMAP_PATH = ROOT / "k8s" / "awoooi-prod" / "04-configmap.yaml"
CONFIG_PY_PATH = ROOT / "apps" / "api" / "src" / "core" / "config.py"

# Fields to compare.
# code_pattern: regex (searched with re.DOTALL) whose group 1 captures the default=... value in config.py
CHECK_FIELDS: dict[str, dict[str, str]] = {
    "AI_FALLBACK_ORDER": {
        "configmap_key": "AI_FALLBACK_ORDER",
        "code_pattern": r"AI_FALLBACK_ORDER:\s*list\[str\]\s*=\s*Field\([^)]*?default=(\[[^\]]+\])",
    },
    "ARGOCD_URL": {
        "configmap_key": "ARGOCD_URL",
        "code_pattern": r"ARGOCD_URL[^\n]*?\n[^)]*?default=[\"']([^\"']+)[\"']",
    },
    "PROMETHEUS_URL": {
        "configmap_key": "PROMETHEUS_URL",
        "code_pattern": r"PROMETHEUS_URL[^\n]*?\n[^)]*?default=[\"']([^\"']+)[\"']",
    },
    "OLLAMA_URL": {
        "configmap_key": "OLLAMA_URL",
        "code_pattern": r"OLLAMA_URL[^\n]*?\n[^)]*?default=[\"']([^\"']+)[\"']",
    },
}


def _normalize(raw: object) -> object:
    """Parse a list literal into a list; otherwise return the stripped string."""
    # str() guards against YAML values that were not parsed as strings.
    raw_strip = str(raw).strip().strip("'\"")
    if raw_strip.startswith("["):
        try:
            # Accept Python-style single quotes by converting them for the JSON parser.
            return json.loads(raw_strip.replace("'", '"'))
        except json.JSONDecodeError:
            return raw_strip
    return raw_strip


def main() -> int:
    if not CONFIGMAP_PATH.exists():
        print(f"[ERROR] ConfigMap not found: {CONFIGMAP_PATH}", file=sys.stderr)
        return 2
    if not CONFIG_PY_PATH.exists():
        print(f"[ERROR] config.py not found: {CONFIG_PY_PATH}", file=sys.stderr)
        return 2

    with CONFIGMAP_PATH.open(encoding="utf-8") as fh:
        # An empty file or a missing/empty "data" key yields an empty mapping.
        cm_data: dict = (yaml.safe_load(fh) or {}).get("data") or {}
    py_src = CONFIG_PY_PATH.read_text(encoding="utf-8")

    exit_code = 0
    print("=== ConfigMap ↔ code.default Drift Check ===")
    for field, spec in CHECK_FIELDS.items():
        cm_raw = cm_data.get(spec["configmap_key"], "<MISSING_IN_CONFIGMAP>")
        m = re.search(spec["code_pattern"], py_src, re.DOTALL)
        py_raw = m.group(1) if m else "<NOT_FOUND_IN_CONFIG_PY>"

        cm_val = _normalize(cm_raw)
        py_val = _normalize(py_raw)

        if cm_val == py_val:
            print(f"[OK] {field}: {cm_val}")
        else:
            print(f"[DRIFT] {field}:")
            print(f"  ConfigMap = {cm_val}")
            print(f"  config.py = {py_val}")
            exit_code = 1

    if exit_code == 0:
        print("=== All drift-check fields aligned ===")
    else:
        print("=== DRIFT detected, fix the inconsistency ===")
    return exit_code


if __name__ == "__main__":
    sys.exit(main())