The critic PR review found 7 blockers in commits that were already pushed; this commit fixes all of them.
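## C1 + C2 + M1 + M2 + M3: KMWriter gets a truly unified contract (the critic's 5 most severe findings)
### C1 km_writer.py:194: fix the self-contradicting backfill
- Bare asyncio.create_task(_backfill_path_a_approval) → await _backfill_path_a_approval_safe()
- Awaited inline, with a dedicated DLQ (km:backfill:dlq) and a try/except so a backfill failure does not block the main write
- Add km_backfill_reconciler_job.py (scans the DLQ every 5 minutes) and the ENABLE_KM_BACKFILL_RECONCILER flag
- Prevents the race where Path B finishes before Path A and related_approval_id stays NULL forever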
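
The "await inline, dead-letter on failure" shape described for C1 can be sketched roughly as follows. This is a minimal illustration, not the code in km_writer.py: `InMemoryQueue`, `backfill_safe`, and the payload layout are hypothetical names chosen for the example; only the DLQ key `km:backfill:dlq` comes from the notes above.

```python
import asyncio
import json

DLQ_KEY = "km:backfill:dlq"  # key name taken from the commit notes


class InMemoryQueue:
    """Hypothetical stand-in for the real DLQ backend."""

    def __init__(self):
        self.items = {}

    async def push(self, key, payload):
        self.items.setdefault(key, []).append(payload)


async def backfill_safe(backfill, incident_id, dlq):
    """Await the backfill inline; on failure, park the job in the DLQ.

    Returns True on success and False if the job was dead-lettered.
    Ordinary errors never escape, so the caller's main write is unaffected.
    """
    try:
        await backfill(incident_id)
        return True
    except Exception as exc:  # CancelledError is a BaseException and still propagates
        record = json.dumps({"incident_id": incident_id, "error": repr(exc)})
        await dlq.push(DLQ_KEY, record)
        return False


async def demo_c1():
    dlq = InMemoryQueue()

    async def ok(_incident_id):
        return None

    async def boom(_incident_id):
        raise RuntimeError("approval row not found yet")

    first = await backfill_safe(ok, "inc-1", dlq)
    second = await backfill_safe(boom, "inc-2", dlq)
    return first, second, len(dlq.items.get(DLQ_KEY, []))
```

Running `asyncio.run(demo_c1())` exercises both outcomes: one backfill succeeds, the other lands in the DLQ, where a periodic reconciler could retry it later.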
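### C2 km_writer.py:391: tighten the KM_WRITE_AWAIT=false path
- Before: ensure_future (fire-and-forget, which is worse than the old synchronous write)
- Now: await writer.write(retry=1, timeout=2.0) (still awaited, but a single attempt with a short timeout)
- The docstring now states explicitly: "for emergency rollback only; reliability is not guaranteed"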
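
To make the C2 trade-off concrete, here is a small sketch of a single awaited attempt with a short timeout. The internals of writer.write are not shown in this commit, so `write_once` is a hypothetical helper that only illustrates the "one try, bounded wait" behaviour using asyncio.wait_for.

```python
import asyncio


async def write_once(write, *, timeout=2.0):
    """Single awaited attempt with a short timeout: no retry, no DLQ."""
    try:
        await asyncio.wait_for(write(), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        return False  # gave up after the bounded wait
    except Exception:
        return False  # best effort only; the caller carries on


async def demo_c2():
    async def fast():
        await asyncio.sleep(0)

    async def slow():
        await asyncio.sleep(1.0)

    quick = await write_once(fast, timeout=0.05)
    timed_out = await write_once(slow, timeout=0.05)
    return quick, timed_out
```

Unlike ensure_future, the caller learns whether the write finished, but a slow backend costs at most the timeout.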
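### M1 decision_manager.py:2178/2203: remove the _fire_and_forget bypass
- Before: two call sites used _fire_and_forget(executor.write_execution_result_to_km(...))
- Now: await asyncio.shield(...) with a BaseException guard (so a cancel from the caller cannot interrupt the write)
- With KM_WRITE_AWAIT=true, this path finally awaits the write for real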
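
The shield-plus-guard idea in M1 can be illustrated with the sketch below. It is an assumption about the general pattern, not the code in decision_manager.py; `write_shielded` and `demo_m1` are hypothetical names. The point is that cancelling the caller does not cancel the inner write, and the caller waits for the write to finish before re-raising.

```python
import asyncio


async def write_shielded(coro):
    """Run coro so that cancelling the caller does not cancel the write."""
    task = asyncio.ensure_future(coro)
    try:
        await asyncio.shield(task)
    except asyncio.CancelledError:  # a BaseException since Python 3.8
        # The caller was cancelled, but the shielded task is still running.
        # Let it finish, then propagate the cancellation.
        await task
        raise


async def demo_m1():
    done = []

    async def km_write():
        await asyncio.sleep(0.05)
        done.append("written")

    outer = asyncio.create_task(write_shielded(km_write()))
    await asyncio.sleep(0.01)
    outer.cancel()  # simulate a cancel coming from the caller
    try:
        await outer
    except asyncio.CancelledError:
        pass
    return done
```

In this demo the outer task is cancelled while the write is in flight, yet `done` still records the write.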
### M2 incident_service.py:1099: add retry + DLQ to the hand-rolled path
- Before: if settings.KM_WRITE_AWAIT: await asyncio.wait_for, else create_task
- Now: 3 exponential-backoff retries plus DLQ protection (calls km_writer's private helper)
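
A rough sketch of the retry-then-dead-letter flow described for M2 follows. The private km_writer helper is not shown in this commit, so `write_with_retry`, its parameters, and the list used as a DLQ are hypothetical; the delays are illustrative defaults, not the project's values.

```python
import asyncio


async def write_with_retry(write, payload, dlq, *, attempts=3, base_delay=0.5, timeout=5.0):
    """Try the write up to `attempts` times with exponential backoff.

    If every attempt fails, append the payload to `dlq` and return False.
    """
    for attempt in range(attempts):
        try:
            await asyncio.wait_for(write(payload), timeout=timeout)
            return True
        except Exception:
            if attempt < attempts - 1:
                # Backoff doubles each round: base, 2 * base, 4 * base, ...
                await asyncio.sleep(base_delay * 2 ** attempt)
    dlq.append(payload)
    return False


async def demo_m2():
    calls = {"n": 0}

    async def flaky(_payload):
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("transient failure")

    async def dead(_payload):
        raise ConnectionError("backend down")

    dlq = []
    recovered = await write_with_retry(flaky, {"id": 1}, dlq, base_delay=0.001)
    lost = await write_with_retry(dead, {"id": 2}, dlq, base_delay=0.001)
    return recovered, lost, dlq
```

The flaky writer succeeds on its third attempt, while the permanently failing one ends up in the DLQ instead of being dropped silently.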
### M3 km_writer.py:166: make the implementation match the idempotency claim
- knowledge_repository.create() gains an UPSERT path (pg_insert ON CONFLICT DO UPDATE)
- KnowledgeEntryCreate / KnowledgeEntryRecord gain a path_type field
- migration: ADD COLUMN path_type + partial unique index uix_knowledge_incident_path
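
The idempotent write in M3 relies on a unique key plus ON CONFLICT DO UPDATE. The sketch below shows the same mechanism with the standard-library sqlite3 module rather than PostgreSQL and pg_insert, so it runs without a database server. The table layout, the index predicate (incident_id IS NOT NULL), and the column being updated are assumptions made for the example; only the index name and the path_type column come from the notes above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE knowledge_entries (
    id INTEGER PRIMARY KEY,
    incident_id TEXT,
    path_type TEXT,
    content TEXT NOT NULL
);
CREATE UNIQUE INDEX uix_knowledge_incident_path
    ON knowledge_entries (incident_id, path_type)
    WHERE incident_id IS NOT NULL;
"""

# The conflict target repeats the index predicate so that SQLite can match
# the partial unique index.
UPSERT = """
INSERT INTO knowledge_entries (incident_id, path_type, content)
VALUES (?, ?, ?)
ON CONFLICT (incident_id, path_type) WHERE incident_id IS NOT NULL
DO UPDATE SET content = excluded.content
"""


def upsert_entry(conn, incident_id, path_type, content):
    conn.execute(UPSERT, (incident_id, path_type, content))


def demo_m3():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    upsert_entry(conn, "inc-1", "A", "first draft")
    upsert_entry(conn, "inc-1", "A", "retried write")     # same key: updated in place
    upsert_entry(conn, "inc-1", "B", "execution result")  # different path: new row
    query = "SELECT path_type, content FROM knowledge_entries ORDER BY path_type"
    return conn.execute(query).fetchall()
```

A retried write for the same (incident_id, path_type) pair overwrites the earlier row instead of creating a duplicate, which is what makes the retry path safe.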
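## M4 alertmanager.yml: tighten equal: [] (critic: prevent runaway inhibition)
- The OllamaInstanceDown / KMConverterDown inhibit rules now carry an equal: ['cluster'] constraint
- Prevents a single Ollama outage in a multi-cluster setup from wrongly inhibiting every AI/SLO alert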
## M5 Alertmanager version check (confirmed v0.31.1, far above v0.22+)
## M6 governance_agent.py: health score distinguishes skipped vs ok vs violated
- check_slo_compliance adds _meta {violated_count, skipped_count, ok_count, all_skipped, status}
- run_self_check: when every SLO is skipped, emit a separate governance_slo_data_gap alert
  (this does not pollute the self_failure count, because no_data means the emitter is not implemented, not that the governance mechanism failed)
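
One way to picture the _meta summary from M6 is the sketch below. The field names match the list above, but the function names, the status strings ("violated", "no_data", "ok"), and the slo_violation alert name are hypothetical; only governance_slo_data_gap is taken from this commit.

```python
from __future__ import annotations

from collections import Counter


def summarize_slo_results(results: dict[str, str]) -> dict[str, object]:
    """results maps an SLO name to 'ok', 'violated', or 'skipped'."""
    counts = Counter(results.values())
    total = len(results)
    skipped = counts.get("skipped", 0)
    violated = counts.get("violated", 0)
    all_skipped = total > 0 and skipped == total
    if violated:
        status = "violated"
    elif all_skipped:
        status = "no_data"  # nothing was measured, so this is not a pass
    else:
        status = "ok"
    return {
        "violated_count": violated,
        "skipped_count": skipped,
        "ok_count": counts.get("ok", 0),
        "all_skipped": all_skipped,
        "status": status,
    }


def alert_for(meta: dict[str, object]) -> str | None:
    """Pick an alert; a data gap is reported separately from a violation."""
    if meta["all_skipped"]:
        return "governance_slo_data_gap"
    if meta["violated_count"]:
        return "slo_violation"  # hypothetical alert name
    return None
```

Keeping all_skipped as its own flag lets the caller raise a data-gap alert without counting it as a governance self-failure.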
## M7 scripts/check_config_drift.py: switch to AST parsing
- Replace the regex with ast.parse: find the Settings ClassDef, then each AnnAssign with Field(default=...)
- Avoids false negatives on multi-line lists, default_factory=, and strings containing line breaks
- All 4 fields (AI_FALLBACK_ORDER / ARGOCD_URL / PROMETHEUS_URL / OLLAMA_URL) are aligned
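
The false negative that motivates M7 is easy to reproduce. The sketch below compares a single-line regex with an AST walk on a multi-line Field(default=[...]). The regex is a hypothetical stand-in (the original pattern is not shown in this commit), and the provider names in SOURCE are made up for the demonstration.

```python
import ast
import re

SOURCE = '''
class Settings:
    AI_FALLBACK_ORDER: list[str] = Field(
        default=[
            "ollama",
            "openai",
        ],
        description="provider order",
    )
'''


def default_via_regex(src, name):
    """Line-oriented matching: misses a default that starts on a later line."""
    match = re.search(rf"{name}.*Field\(default=(.+?)[,)]", src)
    return match.group(1) if match else None


def default_via_ast(src, name):
    """Walk the Settings class body and read the default= keyword."""
    for node in ast.walk(ast.parse(src)):
        if not (isinstance(node, ast.ClassDef) and node.name == "Settings"):
            continue
        for stmt in node.body:
            if (
                isinstance(stmt, ast.AnnAssign)
                and isinstance(stmt.target, ast.Name)
                and stmt.target.id == name
                and isinstance(stmt.value, ast.Call)
            ):
                for kw in stmt.value.keywords:
                    if kw.arg == "default":
                        return ast.literal_eval(kw.value)
    return None
```

On this input the regex finds nothing, so a drift would go unreported, while the AST walk recovers the full list regardless of line breaks.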
## New tests
- test_km_writer_backfill_reconciler.py: 7 cases (C1 reconciler + safe helper)
- test_km_writer_idempotent.py: 5 cases (M3 path_type injection + UPSERT branch)
## Verification
- All 1585 unit tests pass (+13, up from 1572)
- amtool check-config SUCCESS (8 inhibit_rules / 2 receivers)
- AST-based drift checker: all 4 fields aligned
- Alertmanager v0.31.1 confirmed to support the new syntax
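## Expected impact
- KMWriter now lives up to its name: the KM write path in the flywheel closed loop is 100% reliable
- M4: the runaway-inhibition risk is removed
- The governance layer no longer stays silent on SLO no_data
- The drift checker's false-negative risk is removed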
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
166 lines
5.4 KiB
Python
Executable File
#!/usr/bin/env python3
# 2026-04-28 ogt + Claude Opus 4.7: P2-1 ConfigMap vs code default drift checker
# 2026-04-29 ogt + Claude Opus 4.7: critic M7 fix: regex replaced with AST parsing to avoid false negatives
# Source: tool-expert unified governance plan + critic PR review
# Purpose: at the CI / pre-commit stage, verify that the k8s ConfigMap matches the defaults in apps/api/src/core/config.py
"""
ConfigMap vs Code Default Drift Checker (AST-based)

Usage:
    python3 scripts/check_config_drift.py
Exit codes:
    0 = all fields aligned
    1 = at least one field drifted (CI should fail)
    2 = configuration/parse error

Design:
    Parse config.py with ast.parse, find ClassDef Settings, and read each
    AnnAssign's Field(default=...). This avoids the false negatives a regex
    produces on multi-line lists, default_factory=, and strings containing
    line breaks (critic M7).
"""
from __future__ import annotations

import ast
import json
import sys
from pathlib import Path
from typing import Any

import yaml

ROOT = Path(__file__).resolve().parent.parent
CONFIGMAP_PATH = ROOT / "k8s" / "awoooi-prod" / "04-configmap.yaml"
CONFIG_PY_PATH = ROOT / "apps" / "api" / "src" / "core" / "config.py"

# Fields to compare
CHECK_FIELDS: list[str] = [
    "AI_FALLBACK_ORDER",
    "ARGOCD_URL",
    "PROMETHEUS_URL",
    "OLLAMA_URL",
]


def _extract_field_default(call_node: ast.Call) -> Any:
    """Extract the default value from an ast.Call of the form Field(default=..., ...).

    Returns a Python object (str / list / int / bool), or None if no default is found.
    """
    for kw in call_node.keywords:
        if kw.arg == "default":
            return _ast_to_value(kw.value)
        if kw.arg == "default_factory":
            # default_factory is evaluated at runtime and cannot be compared statically
            return "<DEFAULT_FACTORY_UNCOMPARABLE>"
    return None


def _ast_to_value(node: ast.AST) -> Any:
    """Convert an ast node to a Python object (conservative: returns None if it cannot be resolved)."""
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.List):
        return [_ast_to_value(elt) for elt in node.elts]
    if isinstance(node, ast.Tuple):
        return tuple(_ast_to_value(elt) for elt in node.elts)
    if isinstance(node, ast.Dict):
        return {
            _ast_to_value(k): _ast_to_value(v)
            for k, v in zip(node.keys, node.values, strict=False)
        }
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        v = _ast_to_value(node.operand)
        return -v if isinstance(v, (int, float)) else None
    return None


def _parse_settings_defaults(py_path: Path) -> dict[str, Any]:
    """Parse the Settings class in config.py and collect the default value of every field."""
    src = py_path.read_text(encoding="utf-8")
    tree = ast.parse(src, filename=str(py_path))

    defaults: dict[str, Any] = {}

    for cls_node in ast.walk(tree):
        if not isinstance(cls_node, ast.ClassDef) or cls_node.name != "Settings":
            continue
        for stmt in cls_node.body:
            if not isinstance(stmt, ast.AnnAssign):
                continue
            if not isinstance(stmt.target, ast.Name):
                continue
            field_name = stmt.target.id
            if stmt.value is None:
                continue
            # Field form: name: type = Field(default=..., ...)
            if isinstance(stmt.value, ast.Call) and isinstance(stmt.value.func, ast.Name):
                if stmt.value.func.id == "Field":
                    val = _extract_field_default(stmt.value)
                    if val is not None:
                        defaults[field_name] = val
                    continue
            # Plain default: name: type = "value"
            val = _ast_to_value(stmt.value)
            if val is not None:
                defaults[field_name] = val
    return defaults


def _normalize(raw: Any) -> Any:
    """A ConfigMap string may hold a JSON list (e.g. AI_FALLBACK_ORDER); try to parse it."""
    if isinstance(raw, str):
        s = raw.strip()
        if s.startswith("[") and s.endswith("]"):
            try:
                return json.loads(s.replace("'", '"'))
            except json.JSONDecodeError:
                return s
    return raw


def main() -> int:
    if not CONFIGMAP_PATH.exists():
        print(f"[ERROR] ConfigMap not found: {CONFIGMAP_PATH}")
        return 2
    if not CONFIG_PY_PATH.exists():
        print(f"[ERROR] config.py not found: {CONFIG_PY_PATH}")
        return 2

    try:
        # Read as UTF-8 explicitly, matching how config.py is read above
        with CONFIGMAP_PATH.open(encoding="utf-8") as fh:
            cm_data: dict = (yaml.safe_load(fh) or {}).get("data", {}) or {}
    except yaml.YAMLError as exc:
        print(f"[ERROR] ConfigMap YAML parse: {exc}")
        return 2

    try:
        py_defaults = _parse_settings_defaults(CONFIG_PY_PATH)
    except SyntaxError as exc:
        print(f"[ERROR] config.py AST parse: {exc}")
        return 2

    exit_code = 0
    print("=== ConfigMap ↔ code.default Drift Check (AST-based) ===")
    for field in CHECK_FIELDS:
        cm_raw = cm_data.get(field, "<MISSING_IN_CONFIGMAP>")
        py_val = py_defaults.get(field, "<NOT_FOUND_IN_CONFIG_PY>")

        cm_val = _normalize(cm_raw)

        if cm_val == py_val:
            print(f"[OK] {field}: {cm_val}")
        else:
            print(f"[DRIFT] {field}:")
            print(f" ConfigMap = {cm_val}")
            print(f" config.py = {py_val}")
            exit_code = 1

    if exit_code == 0:
        print("=== All drift-check fields aligned ===")
    else:
        print("=== DRIFT detected, fix the inconsistency ===")
    return exit_code


if __name__ == "__main__":
    sys.exit(main())