docs(awooop): record t5 reconciliation deployment

Your Name
2026-05-13 09:14:15 +08:00
parent 631fc22090
commit 5294f0712f
2 changed files with 76 additions and 0 deletions


@@ -1,3 +1,64 @@
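## 2026-05-13 | T5 Incident / Approval / Execution reconciliation deployed
**Background:** Incidents of the B6C589 class can end up in a contradictory state: Telegram shows that approval / handling is required, while in the DB the approval is already `APPROVED` with action `NO_ACTION`, yet the incident is still `INVESTIGATING` and there is no successful automation execution / verification record. Operators should no longer have to guess by hand whether the AI actually fixed anything.
**Fix:**
- `awooop_truth_chain_service.py` adds a read-only `incident_reconciliation_v1`.
- It does not auto-close incidents, backfill approvals, or re-run executions; it only emits cross-table state consistency in machine-readable form.
- Reconciliation checks:
  - whether the incident is closed.
  - whether the latest approval has reached a terminal state.
  - whether the approval is approved but has no `automation_operation_log`.
  - whether `NO_ACTION` has no successful executor operation.
  - whether all evidence sensors failed.
  - whether the timeline is missing lifecycle entries.
- Truth-chain returns:
  - `consistency_status=consistent|degraded|blocked|not_applicable`
  - `operator_next_state=continue|investigate|manual_required|not_applicable`
  - `facts`
  - `mismatches[]`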
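
The checks above can be sketched as a pure function over a snapshot of the cross-table state. This is a minimal illustration, not the code in `awooop_truth_chain_service.py`: the `IncidentSnapshot` fields, the status sets, and the rule that maps mismatches to `blocked` versus `degraded` are all assumptions. Only the mismatch codes and the output field names come from this entry. The timeline check is left out because its mismatch code is not named here.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed status vocabularies; the real service may use different values.
CLOSED_INCIDENT = {"CLOSED", "RESOLVED"}
TERMINAL_APPROVAL = {"APPROVED", "REJECTED", "EXPIRED"}
# Assumed severity rule: these codes force manual intervention.
BLOCKING_CODES = {
    "approval_approved_without_execution_record",
    "approval_no_action_without_execution",
}


@dataclass
class IncidentSnapshot:
    """Hypothetical flattened view of incident / approval / execution rows."""
    incident_status: str
    approval_status: Optional[str]
    approval_action: Optional[str]
    automation_records: int
    successful_executions: int
    sensors_total: int
    sensors_failed: int


def reconcile(snapshot: Optional[IncidentSnapshot]) -> dict:
    """Read-only consistency check: reports mismatches, never mutates state."""
    if snapshot is None:  # source is not an incident
        return {"consistency_status": "not_applicable",
                "operator_next_state": "not_applicable",
                "facts": {}, "mismatches": []}

    incident_closed = snapshot.incident_status in CLOSED_INCIDENT
    approval_terminal = snapshot.approval_status in TERMINAL_APPROVAL
    mismatches = []
    if approval_terminal and not incident_closed:
        mismatches.append("incident_open_after_approval_resolved")
    if snapshot.approval_status == "APPROVED" and snapshot.automation_records == 0:
        mismatches.append("approval_approved_without_execution_record")
    if snapshot.approval_action == "NO_ACTION" and snapshot.successful_executions == 0:
        mismatches.append("approval_no_action_without_execution")
    if snapshot.sensors_total > 0 and snapshot.sensors_failed == snapshot.sensors_total:
        mismatches.append("evidence_all_sensors_failed")

    if any(code in BLOCKING_CODES for code in mismatches):
        status, next_state = "blocked", "manual_required"
    elif mismatches:
        status, next_state = "degraded", "investigate"
    else:
        status, next_state = "consistent", "continue"

    facts = {"incident_closed": incident_closed,
             "approval_terminal": approval_terminal,
             "automation_records": snapshot.automation_records}
    return {"consistency_status": status, "operator_next_state": next_state,
            "facts": facts, "mismatches": mismatches}
```

Keeping the function free of writes matches the read-only boundary stated above: it can run on every truth-chain read without risk of closing an incident or backfilling an approval.
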
**Verification and deployment:**
- Local:
  - `py_compile`: pass.
  - `ruff --select F,E9`: pass.
  - `pytest tests/test_awooop_truth_chain_service.py tests/test_phase25_drift_detection.py tests/test_drift_interpreter_ollama_first.py tests/test_platform_router_order.py tests/test_awooop_operator_auth.py -q`: 39 passed.
  - `git diff --check`: pass.
- Gitea:
  - `1003fa42 feat(awooop): expose incident reconciliation state` pushed to `gitea main`.
  - Code Review run `1940`: success.
  - CD run `1939`: success.
  - Deploy marker: `631fc220 chore(cd): deploy 1003fa4 [skip ci]`.
- Production:
  - API/Web/Worker images are all `1003fa4246290bec2bec4cd04caae9b8221996d9`.
  - K3s rollout status: API/Web/Worker success.
  - Health: host-local NodePort `127.0.0.1:32334` healthy / `mock_mode=false`; a direct connection from the local machine to `192.168.0.120:32334` still timed out at the time, so the host/network path needs separate investigation.
  - Truth-chain smoke `INC-20260512-B6C589`:
    - `source_type=incident`
    - `current_stage=manual_required`
    - `stage_status=blocked`
    - `needs_human=true`
    - `reconciliation_schema=incident_reconciliation_v1`
    - `consistency_status=blocked`
    - `operator_next_state=manual_required`
    - mismatch codes:
      - `incident_open_after_approval_resolved`
      - `approval_approved_without_execution_record`
      - `approval_no_action_without_execution`
      - `evidence_all_sensors_failed`
    - `automation_records=0`
    - `timeline_events=1`
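
The smoke result above could be turned into a repeatable check along these lines. This is only a sketch: it assumes the truth-chain response has already been fetched and flattened into a dict keyed by the field names listed above, and the `mismatch_codes` key is a hypothetical name for wherever the codes live in the real payload.

```python
# Expected values copied from the INC-20260512-B6C589 smoke listed above.
EXPECTED_FIELDS = {
    "source_type": "incident",
    "current_stage": "manual_required",
    "stage_status": "blocked",
    "needs_human": True,
    "reconciliation_schema": "incident_reconciliation_v1",
    "consistency_status": "blocked",
    "operator_next_state": "manual_required",
    "automation_records": 0,
    "timeline_events": 1,
}
EXPECTED_MISMATCHES = {
    "incident_open_after_approval_resolved",
    "approval_approved_without_execution_record",
    "approval_no_action_without_execution",
    "evidence_all_sensors_failed",
}


def smoke_diffs(response: dict) -> list:
    """Return human-readable differences; an empty list means the smoke passed."""
    diffs = []
    for key, expected in EXPECTED_FIELDS.items():
        actual = response.get(key)
        if actual != expected:
            diffs.append(f"{key}: expected {expected!r}, got {actual!r}")
    # "mismatch_codes" is an assumed key name for the flattened code list.
    actual_codes = set(response.get("mismatch_codes", []))
    for code in sorted(EXPECTED_MISMATCHES - actual_codes):
        diffs.append(f"missing mismatch code: {code}")
    for code in sorted(actual_codes - EXPECTED_MISMATCHES):
        diffs.append(f"unexpected mismatch code: {code}")
    return diffs
```

Once the root cause is fixed this incident should stop reporting `blocked`, so the expected values would need updating rather than the check being treated as permanent.
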
**Overall progress:**
- Wave 0: MOMO PostgreSQL backup → AwoooP failure-notification wiring completed and deployed.
- T0: Truth-chain read-only API completed, deployed, production smoke completed.
- T1: Channel Event hardening completed, deployed, production smoke completed.
- T2: legacy MCP audit bridge / backfill / truth-chain visibility completed, deployed, production smoke completed; the first-class Gateway enforced path is left for a later wave.
- T3: Ansible audit contract + decision candidate dry-run audit completed, deployed, production smoke completed.
- T4: Config Drift stable fingerprint / repeat-state / Telegram stage visibility completed, deployed, production smoke completed.
- T5: Incident / Approval / Execution reconciliation completed, deployed, production smoke completed.
- Still outstanding: first-class MCP Gateway enforcement; a real Ansible check-mode executor / diff / apply / rollback; and the display layer that pushes reconciliation results back to Telegram / Operator Console UI.
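## 2026-05-13 | T4 Config Drift fingerprint repeat-state deployed
**Background:** The Config Drift Telegram card showed only a single `report_id` and the HIGH/MEDIUM/INFO counts, so operators could not tell whether the same drift kept repeating, which pipeline stage it had reached, or whether manual action was needed. The old truth-chain repeat logic grouped only by namespace/status/counts, so it mistook drifts that "happen to have the same counts but different items" for the same drift.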
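
The difference between count-based grouping and a stable item fingerprint can be illustrated as follows. This is a sketch under assumptions: the report shape and the item fields (`kind`, `name`, `field`) are hypothetical, and hashing a sorted canonical JSON form with SHA-256 is one common way to get an order-independent fingerprint, not necessarily what T4 shipped.

```python
import hashlib
import json


def count_key(report: dict) -> tuple:
    """Old-style grouping key: namespace, status, and severity counts only."""
    return (report["namespace"], report["status"],
            report["high"], report["medium"], report["info"])


def item_fingerprint(report: dict) -> str:
    """Order-independent fingerprint over the drifted items themselves."""
    # Item fields are assumed for illustration; sort so report order is irrelevant.
    items = sorted((i["kind"], i["name"], i["field"]) for i in report["items"])
    payload = json.dumps({"namespace": report["namespace"], "items": items},
                         sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


report_a = {"namespace": "prod", "status": "drifted", "high": 1, "medium": 0, "info": 0,
            "items": [{"kind": "Deployment", "name": "api", "field": "image"}]}
# Same counts as report_a, but a different item drifted.
report_b = {"namespace": "prod", "status": "drifted", "high": 1, "medium": 0, "info": 0,
            "items": [{"kind": "ConfigMap", "name": "web", "field": "data"}]}

same_by_counts = count_key(report_a) == count_key(report_b)                 # True
same_by_fingerprint = item_fingerprint(report_a) == item_fingerprint(report_b)  # False
```

Here the two reports collide under the count key but get distinct fingerprints, which is the "same counts, different items" case described above.
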


@@ -1958,6 +1958,21 @@ After Phase 6 is complete
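- Important correction: the old count-based repeat logic saw 12 occurrences, but the new stable item fingerprint confirms that the same drift fingerprint occurred only 2 times. The 12 can only be called same-count candidates, not the same drift.
- Boundary: T4 only adds observability and repeat detection; it does not auto-adopt / rollback / ignore.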
**T5 Incident / Approval / Execution reconciliation production verified (2026-05-13, Taipei)**
- `1003fa42 feat(awooop): expose incident reconciliation state` pushed to Gitea main; Code Review run `1940` success; CD run `1939` success.
- Deploy marker: `631fc220 chore(cd): deploy 1003fa4 [skip ci]`.
- Truth-chain adds a read-only `incident_reconciliation_v1`; it does not auto-close incidents, backfill approvals, or re-run executions, and only outputs cross-table consistency.
- Reconciliation returns `consistency_status`, `operator_next_state`, `facts`, and `mismatches[]`, which Operator Console / Telegram can use to show "whether the AI really finished handling it, or human intervention is required".
- Production `INC-20260512-B6C589` smoke:
  - `current_stage=manual_required`
  - `stage_status=blocked`
  - `consistency_status=blocked`
  - `operator_next_state=manual_required`
  - mismatches: `incident_open_after_approval_resolved`, `approval_approved_without_execution_record`, `approval_no_action_without_execution`, `evidence_all_sensors_failed`
  - `automation_records=0`
- Health: K3s rollout success; host-local NodePort health 200 / `mock_mode=false`. A direct connection from the local machine to `192.168.0.120:32334` timed out at the time, so the workstation-to-node path needs separate investigation; the in-cluster and host-local APIs are healthy.
- Boundary: T5 only makes the contradictory state visible; the next step still needs to push reconciliation results back to Telegram / Operator Console UI and address the root cause (execution / incident closure).
---
### 2026-04-20 evening (Taipei) - C1-C4 end-to-end wiring - Playbook chain protection (commit de2d34d)