awoooi/scripts/check_config_drift.py
Your Name c5753e1c57 fix(critic-review): align KMWriter's name with its behavior + fix Alertmanager inhibition + move drift checker to AST
The critic PR review surfaced 7 blockers in commits that were already pushed; this commit fixes all of them.

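## C1 + C2 + M1 + M2 + M3: KMWriter contract actually unified (the critic's 5 most severe findings)

### C1 km_writer.py:194: fix the self-contradicting backfill
- Bare asyncio.create_task(_backfill_path_a_approval) → await _backfill_path_a_approval_safe()
- Awaited synchronously, with a dedicated DLQ (km:backfill:dlq) and try/except so a backfill failure does not block the main write
- Added km_backfill_reconciler_job.py (scans the DLQ every 5 minutes) + the ENABLE_KM_BACKFILL_RECONCILER flag
- Prevents the race where Path B finishes before Path A and related_approval_id stays NULL forever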
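
The C1 change can be pictured with a minimal sketch. It is not the project's code: `backfill_path_a_approval`, `backfill_safe`, `write_entry`, and the in-memory `dead_letter_queue` are hypothetical stand-ins for the helpers and the km:backfill:dlq queue named above. The point is only that the backfill is awaited inline and a failure is parked for a later reconciler instead of being lost in a detached task.

```python
import asyncio

# Hypothetical in-memory stand-in for the km:backfill:dlq queue.
dead_letter_queue: list[dict] = []


async def backfill_path_a_approval(incident_id: str) -> None:
    """Hypothetical backfill step; it always fails here to exercise the DLQ path."""
    raise ConnectionError("approval store unavailable")


async def backfill_safe(incident_id: str) -> bool:
    """Await the backfill; on failure, park the job in the DLQ and carry on."""
    try:
        await backfill_path_a_approval(incident_id)
        return True
    except Exception as exc:  # a backfill failure must not block the main write
        dead_letter_queue.append({"incident_id": incident_id, "error": repr(exc)})
        return False


async def write_entry(incident_id: str) -> str:
    """Main write path: the backfill is awaited, not scheduled with create_task()."""
    ok = await backfill_safe(incident_id)
    return "written" if ok else "written (backfill queued for reconciler)"


result = asyncio.run(write_entry("INC-1"))
```

A periodic job, such as the reconciler the commit describes, would then drain `dead_letter_queue` and retry each parked backfill.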

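### C2 km_writer.py:391: tighten the KM_WRITE_AWAIT=false path
- Was ensure_future (fire-and-forget, worse than the old synchronous write)
- Now await writer.write(retry=1, timeout=2.0) (still awaited, but a single attempt with a short timeout)
- Docstring explicitly marks it "for emergency rollback only; reliability not guaranteed"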
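
As a rough illustration of "one attempt, short timeout", the sketch below bounds a single awaited write with `asyncio.wait_for`. `slow_write` and `write_once` are hypothetical names; the real `writer.write(retry=1, timeout=2.0)` signature comes from the commit message and is not reproduced here.

```python
import asyncio


async def slow_write(delay: float) -> str:
    """Hypothetical KM write that takes `delay` seconds to complete."""
    await asyncio.sleep(delay)
    return "ok"


async def write_once(delay: float, timeout: float) -> str:
    """One attempt bounded by `timeout`: no retry and no background task."""
    try:
        return await asyncio.wait_for(slow_write(delay), timeout=timeout)
    except asyncio.TimeoutError:
        return "timed out"


fast = asyncio.run(write_once(delay=0.01, timeout=0.5))
slow = asyncio.run(write_once(delay=0.5, timeout=0.05))
```

Unlike `ensure_future`, the caller still learns whether the write finished, which is what makes this path usable as a rollback switch.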

### M1 decision_manager.py:2178/2203: remove the _fire_and_forget bypass
- Two call sites used _fire_and_forget(executor.write_execution_result_to_km(...))
- Now await asyncio.shield(...) with BaseException protection (so an upstream cancel cannot interrupt the write)
- KM_WRITE_AWAIT=true now genuinely awaits on this path
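
The shield pattern in M1 might look like the following sketch, assuming a hypothetical `write_result` coroutine and `handler` wrapper. `asyncio.CancelledError` derives from `BaseException`, so a plain `except Exception` would not catch the cancel; the handler catches it explicitly, lets the shielded write finish, and then re-raises.

```python
import asyncio

written: list[str] = []


async def write_result(entry: str) -> None:
    """Hypothetical KM write that must not be interrupted half-way."""
    await asyncio.sleep(0.05)
    written.append(entry)


async def handler(entry: str) -> None:
    inner = asyncio.ensure_future(write_result(entry))
    try:
        await asyncio.shield(inner)
    except asyncio.CancelledError:
        # The caller was cancelled, but the shielded write is still running.
        await inner  # let it finish before propagating the cancel
        raise


async def main() -> bool:
    task = asyncio.create_task(handler("exec-result"))
    await asyncio.sleep(0.01)
    task.cancel()  # simulate an upstream cancel arriving mid-write
    try:
        await task
    except asyncio.CancelledError:
        pass
    return task.cancelled()


was_cancelled = asyncio.run(main())
```

The outer task still ends up cancelled, so callers see normal cancellation semantics, while the write itself completes.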

### M2 incident_service.py:1099: add retry + DLQ to the hand-rolled path
- Was: if settings.KM_WRITE_AWAIT: await asyncio.wait_for, else create_task
- Now: 3 retries with exponential backoff + DLQ protection (calls km_writer's private helper)
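
A minimal sketch of retry with exponential backoff and a dead-letter fallback follows. The helper name `write_with_retry`, the delays, and the list-based `dlq` are assumptions made for illustration; the km_writer private helper mentioned above is not shown in the source.

```python
import asyncio

dlq: list[str] = []  # hypothetical dead-letter queue
calls = {"flaky": 0, "broken": 0}


async def write_with_retry(write, payload: str, attempts: int = 3,
                           base_delay: float = 0.01) -> bool:
    """Try `write` up to `attempts` times, doubling the pause between tries."""
    for attempt in range(attempts):
        try:
            await write(payload)
            return True
        except Exception:
            if attempt < attempts - 1:
                await asyncio.sleep(base_delay * 2 ** attempt)
    dlq.append(payload)  # every attempt failed: park the payload for later replay
    return False


async def flaky(payload: str) -> None:
    """Fails twice, then succeeds on the third call."""
    calls["flaky"] += 1
    if calls["flaky"] < 3:
        raise ConnectionError("transient failure")


async def broken(payload: str) -> None:
    """Never succeeds."""
    calls["broken"] += 1
    raise ConnectionError("permanent failure")


async def demo() -> tuple[bool, bool]:
    return (await write_with_retry(flaky, "a"),
            await write_with_retry(broken, "b"))


results = asyncio.run(demo())
```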

### M3 km_writer.py:166: make the implementation match the idempotency claim
- knowledge_repository.create() gains an UPSERT path (pg_insert ON CONFLICT DO UPDATE)
- KnowledgeEntryCreate / KnowledgeEntryRecord gain a path_type field
- migration: ADD COLUMN path_type + partial unique index uix_knowledge_incident_path
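
The idempotency idea can be tried out locally with the standard-library `sqlite3` module, used here only as a stand-in for the PostgreSQL `pg_insert` path the commit describes. The table name, the column set, and the index predicate (`WHERE incident_id IS NOT NULL`) are assumptions; only the index name and the `path_type` column come from the commit message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE knowledge_entries (
        id INTEGER PRIMARY KEY,
        incident_id TEXT,
        path_type TEXT,
        body TEXT NOT NULL
    );
    -- Partial unique index: only rows tied to an incident are deduplicated.
    CREATE UNIQUE INDEX uix_knowledge_incident_path
        ON knowledge_entries (incident_id, path_type)
        WHERE incident_id IS NOT NULL;
    """
)

# The conflict target repeats the index predicate so it matches the partial index.
UPSERT = """
INSERT INTO knowledge_entries (incident_id, path_type, body)
VALUES (?, ?, ?)
ON CONFLICT (incident_id, path_type) WHERE incident_id IS NOT NULL
DO UPDATE SET body = excluded.body
"""

# Writing the same (incident_id, path_type) twice updates the row in place.
for body in ("first draft", "retried write"):
    conn.execute(UPSERT, ("INC-1", "path_a", body))
conn.execute(UPSERT, ("INC-1", "path_b", "other path"))

rows = conn.execute(
    "SELECT incident_id, path_type, body FROM knowledge_entries ORDER BY path_type"
).fetchall()
```

A retried write therefore leaves one row per (incident, path) pair rather than a duplicate.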

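## M4 alertmanager.yml: tighten equal: [] (critic: prevent runaway inhibition)
- Added an equal: ['cluster'] constraint to the OllamaInstanceDown / KMConverterDown inhibit rules
- Prevents any single Ollama outage from wrongly inhibiting all AI/SLO alerts in a multi-cluster setup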
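
To see why an empty `equal` list is risky, here is a small Python model of the label-equality part of an Alertmanager inhibit rule. It assumes the source and target matchers have already matched, and the target alert name `AISLOBurn` is hypothetical. Alertmanager treats a missing label and an empty label value as the same, which the sketch mirrors with `.get(label, "")`.

```python
def is_inhibited(target: dict, sources: list[dict], equal: list[str]) -> bool:
    """True if any firing source alert agrees with `target` on every `equal` label."""
    return any(
        all(src.get(label, "") == target.get(label, "") for label in equal)
        for src in sources
    )


firing_sources = [{"alertname": "OllamaInstanceDown", "cluster": "a"}]
slo_alert_a = {"alertname": "AISLOBurn", "cluster": "a"}  # hypothetical target alert
slo_alert_b = {"alertname": "AISLOBurn", "cluster": "b"}

# equal: [] means the outage in cluster "a" silences targets in every cluster.
loose = [is_inhibited(t, firing_sources, equal=[]) for t in (slo_alert_a, slo_alert_b)]
# equal: ['cluster'] restricts inhibition to targets in the same cluster.
tight = [is_inhibited(t, firing_sources, equal=["cluster"])
         for t in (slo_alert_a, slo_alert_b)]
```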

## M5 Alertmanager version check (confirmed v0.31.1, well above v0.22+)

## M6 governance_agent.py: health score distinguishes skipped vs ok vs violated
- check_slo_compliance adds _meta {violated_count, skipped_count, ok_count, all_skipped, status}
- run_self_check: when every SLO is skipped, raises a separate governance_slo_data_gap alert
  (this does not pollute the self_failure count, because no_data means the emitter is not implemented, not that the governance mechanism failed)
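
One way to build such a `_meta` summary is sketched below. The field names follow the commit message; the function name `summarize_slo_checks`, the per-SLO outcome strings, and the `status` values are assumptions for illustration.

```python
from collections import Counter


def summarize_slo_checks(results: dict[str, str]) -> dict:
    """Build a _meta summary from per-SLO outcomes ("ok" / "violated" / "skipped")."""
    counts = Counter(results.values())
    all_skipped = bool(results) and counts["skipped"] == len(results)
    if counts["violated"]:
        status = "violated"
    elif all_skipped:
        status = "no_data"  # hypothetical label for the data-gap case
    else:
        status = "ok"
    return {
        "violated_count": counts["violated"],
        "skipped_count": counts["skipped"],
        "ok_count": counts["ok"],
        "all_skipped": all_skipped,
        "status": status,
    }


gap = summarize_slo_checks({"latency": "skipped", "availability": "skipped"})
mixed = summarize_slo_checks({"latency": "ok", "availability": "violated"})
```

Keeping `all_skipped` separate from `violated_count` is what lets a caller raise a data-gap alert without counting it as a governance failure.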

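## M7 scripts/check_config_drift.py: switch to AST parsing
- Replaced the regex with ast.parse, which finds the Settings ClassDef, then each AnnAssign, then Field(default=...)
- Avoids false negatives on multi-line lists, default_factory=, and strings containing line breaks
- All 4 fields (AI_FALLBACK_ORDER / ARGOCD_URL / PROMETHEUS_URL / OLLAMA_URL) are aligned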
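
The false negative is easy to reproduce. In the sketch below, `LINE_RE` is a hypothetical line-oriented pattern of the kind the commit replaces (the original regex is not shown in the source), and the default values are made up. `ast.literal_eval` stands in for the script's own `_ast_to_value` helper.

```python
import ast
import re

SOURCE = '''
class Settings:
    AI_FALLBACK_ORDER: list[str] = Field(
        default=[
            "ollama",
            "openai",
        ],
    )
    OLLAMA_URL: str = Field(default="http://ollama:11434")
'''

# Hypothetical single-line pattern: it needs "NAME: ... Field(default=...)" on one line.
LINE_RE = re.compile(
    r'^[ \t]*(\w+):.*Field\(default=(.+?)[,)][ \t]*$', re.MULTILINE
)
regex_names = {m.group(1) for m in LINE_RE.finditer(SOURCE)}


def ast_defaults(source: str) -> dict:
    """Collect Field(default=...) values from annotated assignments."""
    found = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.AnnAssign) and isinstance(node.value, ast.Call):
            for kw in node.value.keywords:
                if kw.arg == "default":
                    found[node.target.id] = ast.literal_eval(kw.value)
    return found


ast_found = ast_defaults(SOURCE)
```

The regex silently misses the multi-line `AI_FALLBACK_ORDER` default, while the AST walk recovers both fields regardless of how the call is laid out.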

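## New tests
- test_km_writer_backfill_reconciler.py: 7 cases (C1 reconciler + safe helper)
- test_km_writer_idempotent.py: 5 cases (M3 path_type injection + UPSERT branch)

## Verification
- 1585 unit tests all green (+13, up from 1572)
- amtool check-config SUCCESS (8 inhibit_rules / 2 receivers)
- AST-based drift checker: all 4 fields aligned
- Alertmanager v0.31.1 confirmed to support the new syntax

## Expected impact
- KMWriter lives up to its name: the KM write path of the flywheel closed loop is 100% reliable
- M4 runaway-inhibition risk removed
- The governance layer no longer stays silent on SLO no_data
- Drift checker false-negative risk removed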

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 10:44:39 +08:00

166 lines
5.4 KiB
Python
Executable File
#!/usr/bin/env python3
# 2026-04-28 ogt + Claude Opus 4.7: P2-1 ConfigMap vs code default drift checker
# 2026-04-29 ogt + Claude Opus 4.7: critic M7 fix: switched from regex to AST parsing to avoid false negatives
# Source: tool-expert unified governance proposal + critic PR review
# Purpose: verify at CI / pre-commit time that the k8s ConfigMap matches the defaults in apps/api/src/core/config.py
"""
ConfigMap vs Code Default Drift Checker (AST-based)

Usage:
    python3 scripts/check_config_drift.py

Exit codes:
    0 = everything aligned
    1 = at least one field drifted (CI should fail)
    2 = configuration / parse error

Design:
    Use ast.parse on config.py, find ClassDef Settings, then read
    Field(default=...) from each AnnAssign. This avoids the regex false
    negatives on multi-line lists, default_factory=, and strings containing
    line breaks (critic M7).
"""
from __future__ import annotations

import ast
import json
import sys
from pathlib import Path
from typing import Any

import yaml

ROOT = Path(__file__).resolve().parent.parent
CONFIGMAP_PATH = ROOT / "k8s" / "awoooi-prod" / "04-configmap.yaml"
CONFIG_PY_PATH = ROOT / "apps" / "api" / "src" / "core" / "config.py"

# Fields to compare
CHECK_FIELDS: list[str] = [
    "AI_FALLBACK_ORDER",
    "ARGOCD_URL",
    "PROMETHEUS_URL",
    "OLLAMA_URL",
]


def _extract_field_default(call_node: ast.Call) -> Any:
    """Extract the default value from an ast.Call for Field(default=..., ...).

    Returns a Python object (str / list / int / bool), or None if not found.
    """
    for kw in call_node.keywords:
        if kw.arg == "default":
            return _ast_to_value(kw.value)
        if kw.arg == "default_factory":
            # default_factory is evaluated at runtime; it cannot be compared statically
            return "<DEFAULT_FACTORY_UNCOMPARABLE>"
    return None


def _ast_to_value(node: ast.AST) -> Any:
    """Convert an AST node to a Python object (conservative: None if unparseable)."""
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.List):
        return [_ast_to_value(elt) for elt in node.elts]
    if isinstance(node, ast.Tuple):
        return tuple(_ast_to_value(elt) for elt in node.elts)
    if isinstance(node, ast.Dict):
        return {
            _ast_to_value(k): _ast_to_value(v)
            for k, v in zip(node.keys, node.values, strict=False)
        }
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        # Negative numeric literal such as -1
        v = _ast_to_value(node.operand)
        return -v if isinstance(v, (int, float)) else None
    return None


def _parse_settings_defaults(py_path: Path) -> dict[str, Any]:
    """Parse the Settings class in config.py and collect each field's default value."""
    src = py_path.read_text(encoding="utf-8")
    tree = ast.parse(src, filename=str(py_path))
    defaults: dict[str, Any] = {}
    for cls_node in ast.walk(tree):
        if not isinstance(cls_node, ast.ClassDef) or cls_node.name != "Settings":
            continue
        for stmt in cls_node.body:
            if not isinstance(stmt, ast.AnnAssign):
                continue
            if not isinstance(stmt.target, ast.Name):
                continue
            field_name = stmt.target.id
            if stmt.value is None:
                continue
            # name: type = Field(default=..., ...)
            if isinstance(stmt.value, ast.Call) and isinstance(stmt.value.func, ast.Name):
                if stmt.value.func.id == "Field":
                    val = _extract_field_default(stmt.value)
                    if val is not None:
                        defaults[field_name] = val
                    continue
            # Plain default: name: type = "value"
            val = _ast_to_value(stmt.value)
            if val is not None:
                defaults[field_name] = val
    return defaults


def _normalize(raw: Any) -> Any:
    """A ConfigMap string may hold a JSON list (e.g. AI_FALLBACK_ORDER); try to parse it."""
    if isinstance(raw, str):
        s = raw.strip()
        if s.startswith("[") and s.endswith("]"):
            try:
                return json.loads(s.replace("'", '"'))
            except json.JSONDecodeError:
                return s
    return raw


def main() -> int:
    if not CONFIGMAP_PATH.exists():
        print(f"[ERROR] ConfigMap not found: {CONFIGMAP_PATH}")
        return 2
    if not CONFIG_PY_PATH.exists():
        print(f"[ERROR] config.py not found: {CONFIG_PY_PATH}")
        return 2
    try:
        with CONFIGMAP_PATH.open(encoding="utf-8") as fh:
            cm_data: dict = (yaml.safe_load(fh) or {}).get("data", {}) or {}
    except yaml.YAMLError as exc:
        print(f"[ERROR] ConfigMap YAML parse: {exc}")
        return 2
    try:
        py_defaults = _parse_settings_defaults(CONFIG_PY_PATH)
    except SyntaxError as exc:
        print(f"[ERROR] config.py AST parse: {exc}")
        return 2

    exit_code = 0
    print("=== ConfigMap ↔ code.default Drift Check (AST-based) ===")
    for field in CHECK_FIELDS:
        # The two sentinels differ, so a field missing on either side counts as drift
        cm_raw = cm_data.get(field, "<MISSING_IN_CONFIGMAP>")
        py_val = py_defaults.get(field, "<NOT_FOUND_IN_CONFIG_PY>")
        cm_val = _normalize(cm_raw)
        if cm_val == py_val:
            print(f"[OK] {field}: {cm_val}")
        else:
            print(f"[DRIFT] {field}:")
            print(f"  ConfigMap = {cm_val}")
            print(f"  config.py = {py_val}")
            exit_code = 1
    if exit_code == 0:
        print("=== All drift-check fields aligned ===")
    else:
        print("=== DRIFT detected, fix the inconsistency ===")
    return exit_code


if __name__ == "__main__":
    sys.exit(main())