From b7b4eb53b5a052b02b0297de97aa4377d0a35fa2 Mon Sep 17 00:00:00 2001
From: Your Name
Date: Sun, 31 May 2026 12:43:45 +0800
Subject: [PATCH] docs(logbook): record ansible runtime readiness deploy [skip ci]

---
 docs/LOGBOOK.md | 89 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 89 insertions(+)

diff --git a/docs/LOGBOOK.md b/docs/LOGBOOK.md
index eb459a31..64230787 100644
--- a/docs/LOGBOOK.md
+++ b/docs/LOGBOOK.md
@@ -39,6 +39,95 @@ production same-shape smoke:
   -> writes_incident_state=false, writes_auto_repair_result=false, writes_ticket=false
 ```
 
+## 2026-05-31|AwoooP Truth Chain Ansible runtime gate goes live
+
+**Background**:
+
+- The AwoooP truth-chain production summary showed `ansible_considered_total=7` and `ansible_candidate_total=33` over the last 24h, but every one of them was stuck at audit/pending check-mode.
+- The live API pod already mounted `/etc/repair-ssh/id_ed25519` and `/etc/repair-known-hosts/known_hosts` but lacked `ansible-playbook`, which caused `ansible_runtime.can_run_check_mode=false` and `blockers=["ansible_playbook_binary_missing"]`.
+- This phase only clears the runtime prerequisite gate for PlayBook check-mode; it does not enable automatic apply and does not claim that full auto-repair has been achieved.
+
+**Changes in this round**:
+
+- The API image now provides the `ansible-playbook` runtime through `ansible-core>=2.16.0,<2.18.0`.
+- Ansible readiness in `AwoooPTruthChainService` was strengthened from checking only binary/catalog/inventory to also checking that the repair SSH key and known_hosts exist and are readable.
+- The truth-chain quality summary now reports:
+  - `repair_ssh_key_present/readable`
+  - `repair_known_hosts_present/readable`
+  - `can_run_check_mode`
+  - specific blockers: `ansible_repair_ssh_key_missing` / `ansible_repair_known_hosts_missing`
+- Added unit tests covering "missing SSH repair material must block" and "check-mode ready only when binary/catalog/inventory/SSH material are all present".
+
+**Deployment process**:
+
+- After `da519423 fix(api): install ansible runtime for truth chain` was pushed to Gitea main, CD run `3295` was blocked by a pre-existing Python 3.11 event loop test-compatibility issue in `tests/test_cs1_auto_execute.py`, so build/deploy was skipped.
+- The follow-up mainline commit `514c201f fix(api-tests): use asyncio run in cs1 tests` fixed that CI gate, and CD run `3299` succeeded in full.
+- Deploy marker: `ca2d95e9 chore(cd): deploy 514c201 [skip ci]`.
+
+**Verification**:
+
+```text
+local targeted pytest:
+  apps/api/tests/test_awooop_truth_chain_service.py -> 29 passed
+
+local static checks:
+  py_compile awooop_truth_chain_service.py -> pass
+  ruff E9,F401,F821 targeted files -> pass
+  pyproject ansible-core dependency parse -> pass
+  web i18n JSON parse -> pass
+  ansible inventory/playbook YAML parse -> pass
+  git diff --check -> pass
+
+local docker build:
+  skipped by environment; Docker daemon unavailable on local Mac socket
+
+production:
+  gitea cd run 3299 -> tests/build-and-deploy/post-deploy-checks success
+  awoooi-api image -> 192.168.0.110:5000/awoooi/api:514c201ff4d3de70b8b2fb1b3b87cfd7ac68f0cf
+  awoooi-worker image -> 192.168.0.110:5000/awoooi/api:514c201ff4d3de70b8b2fb1b3b87cfd7ac68f0cf
+  rollout status awoooi-api/awoooi-worker -> success
+  pod command -v ansible-playbook -> /usr/local/bin/ansible-playbook
+  ansible-playbook --version -> core 2.17.14
+  repair_ssh_key_readable -> true
+  repair_known_hosts_readable -> true
+  /api/v1/health -> healthy
+```
+
+**Live truth-chain summary**:
+
+```text
+incident_total=20
+evaluated_total=20
+verified_auto_repair_total=0
+average_score=71.2
+ansible_considered_total=7
+ansible_candidate_total=33
+ansible_check_mode_total=0
+ansible_apply_total=0
+ansible_pending_check_mode_total=7
+ansible_runtime.can_run_check_mode=true
+ansible_runtime.blockers=[]
+production_claim.can_claim_full_auto_repair=false
+production_claim.reason=some_incidents_are_not_auto_repaired_verified
+```
+
+**Current overall progress**:
+
+```text
+AwoooP truth-chain observability and truth chain path: 92%
+PlayBook / Ansible runtime readiness: 40% -> 65%
+PlayBook check-mode automatic verification: 0% -> still not started
+PlayBook apply automatic repair: 0% -> still not started
+AI automation management product overall: 99.35% -> 99.45%
+full auto-repair production claim: false
+```
+
+**Next steps**:
+
+- For `ansible_pending_check_mode_total=7`, build a safe check-mode worker/job that initially runs only `--check --diff` and writes stdout/stderr, playbook, inventory, target, and source incident back to the truth-chain DB.
+- Only after check-mode succeeds and the risk classification satisfies the safe rule does a run enter the approval flow; apply must remain disabled until approved.
+- The AwoooP frontend Runs / Work Items views must display `ansible_runtime.can_run_check_mode=true`, the pending check-mode count, and full auto-repair claim=false, so that Telegram alerts are not misread as auto-repair having completed.
+
 ## 2026-05-31|IwoooS deployment evidence de-hardcoding
 
 **Background**: