docs(logbook): record ansible runtime readiness deploy [skip ci]

Your Name
2026-05-31 12:43:45 +08:00
parent 83e27fa2b2
commit b7b4eb53b5


@@ -39,6 +39,95 @@ production same-shape smoke:
-> writes_incident_state=false, writes_auto_repair_result=false, writes_ticket=false
```
## 2026-05-31: AwoooP Truth Chain Ansible runtime gate goes live
**Background**
- The AwoooP truth-chain production summary shows `ansible_considered_total=7` and `ansible_candidate_total=33` over the last 24h, but every one of them stalls at audit / pending check-mode.
- The live API pod already mounts `/etc/repair-ssh/id_ed25519` and `/etc/repair-known-hosts/known_hosts`, but it lacks `ansible-playbook`, which left `ansible_runtime.can_run_check_mode=false` with `blockers=["ansible_playbook_binary_missing"]`.
- This phase only clears the runtime prerequisites for PlayBook check-mode. It does not enable automatic apply, and it does not claim that full auto-repair has been achieved.
**Changes in this round**
- The API image now provides the `ansible-playbook` runtime via `ansible-core>=2.16.0,<2.18.0`.
- The Ansible readiness check in `AwoooPTruthChainService` no longer looks only at binary/catalog/inventory; it now also checks that the repair SSH key and known_hosts both exist and are readable.
- The truth-chain quality summary now reports:
  - `repair_ssh_key_present/readable`
  - `repair_known_hosts_present/readable`
  - `can_run_check_mode`
  - specific blockers: `ansible_repair_ssh_key_missing` / `ansible_repair_known_hosts_missing`
- New unit tests cover two cases: missing SSH repair material must block, and check-mode is ready only when the binary, catalog, inventory, and SSH material are all present.
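The readiness rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual `AwoooPTruthChainService` code: `check_ansible_runtime` and `AnsibleRuntimeReadiness` are hypothetical names, the default paths are the mount points from the background notes, and the catalog/inventory blocker names (`ansible_catalog_missing`, `ansible_inventory_missing`) are invented for illustration because the logbook does not name them. It also assumes an unreadable file is reported under the same `*_missing` blocker as an absent one.

```python
import os
import shutil
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class AnsibleRuntimeReadiness:
    repair_ssh_key_present: bool
    repair_ssh_key_readable: bool
    repair_known_hosts_present: bool
    repair_known_hosts_readable: bool
    can_run_check_mode: bool
    blockers: list[str] = field(default_factory=list)


def _present_and_readable(path: str) -> tuple[bool, bool]:
    # "Present" means a regular file exists; "readable" means this process
    # has read permission on it.
    present = os.path.isfile(path)
    return present, present and os.access(path, os.R_OK)


def check_ansible_runtime(
    ssh_key_path: str = "/etc/repair-ssh/id_ed25519",
    known_hosts_path: str = "/etc/repair-known-hosts/known_hosts",
    catalog_ready: bool = True,
    inventory_ready: bool = True,
    which: Callable[[str], Optional[str]] = shutil.which,  # injectable for tests
) -> AnsibleRuntimeReadiness:
    """Hypothetical sketch: collect every blocker; ready only if none remain."""
    blockers: list[str] = []
    if which("ansible-playbook") is None:
        blockers.append("ansible_playbook_binary_missing")
    # Hypothetical blocker names: the logbook does not name these two.
    if not catalog_ready:
        blockers.append("ansible_catalog_missing")
    if not inventory_ready:
        blockers.append("ansible_inventory_missing")
    key_present, key_readable = _present_and_readable(ssh_key_path)
    if not key_readable:  # assumption: unreadable is reported as missing
        blockers.append("ansible_repair_ssh_key_missing")
    kh_present, kh_readable = _present_and_readable(known_hosts_path)
    if not kh_readable:
        blockers.append("ansible_repair_known_hosts_missing")
    return AnsibleRuntimeReadiness(
        repair_ssh_key_present=key_present,
        repair_ssh_key_readable=key_readable,
        repair_known_hosts_present=kh_present,
        repair_known_hosts_readable=kh_readable,
        can_run_check_mode=not blockers,
        blockers=blockers,
    )
```

Collecting all blockers instead of returning on the first one lets the quality summary report every missing prerequisite at once.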
**Deployment**
- After `da519423 fix(api): install ansible runtime for truth chain` was pushed to Gitea main, CD run `3295` was blocked by a pre-existing Python 3.11 event-loop compatibility issue in `tests/test_cs1_auto_execute.py`, so build/deploy was skipped.
- The follow-up mainline commit `514c201f fix(api-tests): use asyncio run in cs1 tests` fixed that CI gate, and CD run `3299` succeeded end to end.
- Deploy marker: `ca2d95e9 chore(cd): deploy 514c201 [skip ci]`
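The CI fix in `514c201f` is only described by its commit message. Assuming the cs1 tests previously drove coroutines through `asyncio.get_event_loop().run_until_complete(...)`, which is deprecated in recent Python versions when no event loop is running, the `asyncio.run` pattern would look roughly like this; `fake_auto_execute` is a hypothetical stand-in for the code under test, not the real cs1 coroutine.

```python
import asyncio


async def fake_auto_execute(incident_id: str) -> dict:
    # Hypothetical stand-in for the coroutine exercised by the cs1 tests.
    await asyncio.sleep(0)
    return {"incident_id": incident_id, "executed": True}


def test_auto_execute_with_asyncio_run() -> None:
    # asyncio.run() creates a fresh event loop, runs the coroutine to
    # completion, and closes the loop, so the test does not rely on an
    # ambient loop that newer Python versions no longer create implicitly.
    result = asyncio.run(fake_auto_execute("inc-1"))
    assert result == {"incident_id": "inc-1", "executed": True}
```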
**Verification**
```text
local targeted pytest:
apps/api/tests/test_awooop_truth_chain_service.py -> 29 passed
local static checks:
py_compile awooop_truth_chain_service.py -> pass
ruff E9,F401,F821 targeted files -> pass
pyproject ansible-core dependency parse -> pass
web i18n JSON parse -> pass
ansible inventory/playbook YAML parse -> pass
git diff --check -> pass
local docker build:
skipped by environment; Docker daemon unavailable on local Mac socket
production:
gitea cd run 3299 -> tests/build-and-deploy/post-deploy-checks success
awoooi-api image -> 192.168.0.110:5000/awoooi/api:514c201ff4d3de70b8b2fb1b3b87cfd7ac68f0cf
awoooi-worker image -> 192.168.0.110:5000/awoooi/api:514c201ff4d3de70b8b2fb1b3b87cfd7ac68f0cf
rollout status awoooi-api/awoooi-worker -> success
pod command -v ansible-playbook -> /usr/local/bin/ansible-playbook
ansible-playbook --version -> core 2.17.14
repair_ssh_key_readable -> true
repair_known_hosts_readable -> true
/api/v1/health -> healthy
```
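A quick way to confirm that the deployed runtime honours the dependency pin is to parse the reported core version and compare it against the pinned range as integer tuples. This sketch assumes plain `X.Y.Z` release versions (pre-release suffixes are not handled), and `parse_core_version` / `within_pin` are hypothetical helper names; the regex accepts both the logged `core 2.17.14` form and a bracketed `[core 2.17.14]` form.

```python
import re

# Bounds mirror the pin ansible-core>=2.16.0,<2.18.0 from the API image.
PIN_LOWER = (2, 16, 0)  # inclusive
PIN_UPPER = (2, 18, 0)  # exclusive


def parse_core_version(version_output: str) -> tuple[int, int, int]:
    """Extract the first 'core X.Y.Z' version from version output."""
    match = re.search(r"core\s+(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("no ansible-core version found in output")
    major, minor, patch = (int(g) for g in match.groups())
    return major, minor, patch


def within_pin(version: tuple[int, int, int]) -> bool:
    # Tuple comparison orders versions component by component.
    return PIN_LOWER <= version < PIN_UPPER
```

With the logged value, `parse_core_version("core 2.17.14")` yields `(2, 17, 14)`, which falls inside the pinned range.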
**Live truth-chain summary**
```text
incident_total=20
evaluated_total=20
verified_auto_repair_total=0
average_score=71.2
ansible_considered_total=7
ansible_candidate_total=33
ansible_check_mode_total=0
ansible_apply_total=0
ansible_pending_check_mode_total=7
ansible_runtime.can_run_check_mode=true
ansible_runtime.blockers=[]
production_claim.can_claim_full_auto_repair=false
production_claim.reason=some_incidents_are_not_auto_repaired_verified
```
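The `production_claim` fields can be read as a simple gate: full auto-repair may only be claimed when every evaluated incident has a verified auto-repair. The sketch below assumes exactly that rule, which is consistent with `verified_auto_repair_total=0` out of `incident_total=20` producing `can_claim_full_auto_repair=false`; the real service may weigh additional signals, and the `no_incidents_evaluated` reason string is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProductionClaim:
    can_claim_full_auto_repair: bool
    reason: Optional[str]


def evaluate_production_claim(
    incident_total: int, verified_auto_repair_total: int
) -> ProductionClaim:
    """Assumed rule: claim full auto-repair only if every incident is verified."""
    if incident_total == 0:
        # Hypothetical reason: an empty window proves nothing either way.
        return ProductionClaim(False, "no_incidents_evaluated")
    if verified_auto_repair_total < incident_total:
        return ProductionClaim(False, "some_incidents_are_not_auto_repaired_verified")
    return ProductionClaim(True, None)
```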
**Overall progress**
```text
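AwoooP truth-chain observability and pipeline: 92%
PlayBook / Ansible runtime readiness: 40% -> 65%
PlayBook check-mode automatic verification: 0% -> not started yet
PlayBook apply automatic repair: 0% -> not started yet
AI automation management product overall: 99.35% -> 99.45%
full auto-repair production claim: false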
```
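**Next steps**
- For `ansible_pending_check_mode_total=7`, build a safe check-mode worker/job: initially run only `--check --diff`, and write stdout/stderr, playbook, inventory, target, and source incident back to the truth-chain DB.
- Only after check-mode succeeds and the risk tier satisfies the safe rule may a run enter the approval flow; apply must remain disabled until it is approved.
- The AwoooP Runs / Work Items frontend should display `ansible_runtime.can_run_check_mode=true`, the pending check-mode count, and full auto-repair claim=false, so that Telegram alerts are not mistaken for completed auto-repair.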
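The planned safe check-mode job could take roughly this shape: it only ever builds a `--check --diff` invocation and returns a record with the fields the truth chain should store. This is a sketch under assumptions: `run_check_mode`, `CheckModeRecord`, the use of `--limit` to scope the target host, and the 600-second timeout are hypothetical choices, and persisting the record to the truth-chain DB is left out. `-i`, `--limit`, `--check`, and `--diff` are standard `ansible-playbook` options.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckModeRecord:
    # Hypothetical field names for what the truth chain should persist.
    source_incident: str
    playbook: str
    inventory: str
    target: str
    command: list[str]
    returncode: int
    stdout: str
    stderr: str


def build_check_mode_command(playbook: str, inventory: str, target: str) -> list[str]:
    # Dry run only: --check makes no changes, --diff shows what would change.
    # There is deliberately no code path that builds an apply invocation.
    return ["ansible-playbook", "-i", inventory, "--limit", target,
            "--check", "--diff", playbook]


def run_check_mode(
    source_incident: str,
    playbook: str,
    inventory: str,
    target: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> CheckModeRecord:
    cmd = build_check_mode_command(playbook, inventory, target)
    # Capture both streams as text; a non-zero exit is recorded, not raised.
    proc = run(cmd, capture_output=True, text=True, timeout=600, check=False)
    return CheckModeRecord(
        source_incident=source_incident,
        playbook=playbook,
        inventory=inventory,
        target=target,
        command=cmd,
        returncode=proc.returncode,
        stdout=proc.stdout,
        stderr=proc.stderr,
    )
```

Injecting `run` keeps the job testable without a real Ansible installation. Because the builder never emits an apply invocation, enabling apply later would require a separate, approval-gated code path.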
## 2026-05-31: IwoooS deployment evidence no longer hardcoded
**Background**