docs(logbook): record ansible runtime readiness deploy [skip ci]
@@ -39,6 +39,95 @@ production same-shape smoke:
-> writes_incident_state=false, writes_auto_repair_result=false, writes_ticket=false
```

## 2026-05-31 | AwoooP Truth Chain Ansible runtime gate goes live

**Background**:

- The AwoooP truth-chain production summary shows `ansible_considered_total=7` and `ansible_candidate_total=33` within the last 24h, but every one of them is stuck at audit/pending check-mode.
- The live API pod already mounts `/etc/repair-ssh/id_ed25519` and `/etc/repair-known-hosts/known_hosts` but lacks `ansible-playbook`, which results in `ansible_runtime.can_run_check_mode=false` and `blockers=["ansible_playbook_binary_missing"]`.
- This phase only clears the runtime prerequisite gate for PlayBook check-mode; it does not enable automatic apply, and it does not claim that full auto-repair has been achieved.

**Changes in this round**:

- The API image now provides the `ansible-playbook` runtime via `ansible-core>=2.16.0,<2.18.0`.
- The Ansible readiness check in `AwoooPTruthChainService` is strengthened from "binary/catalog/inventory only" to also verify that the repair SSH key and known_hosts exist and are readable.
- The truth-chain quality summary now reports:
  - `repair_ssh_key_present/readable`
  - `repair_known_hosts_present/readable`
  - `can_run_check_mode`
  - specific blockers: `ansible_repair_ssh_key_missing` / `ansible_repair_known_hosts_missing`
- Added unit tests covering "missing SSH repair material must block" and "check-mode is ready only when the binary, catalog, inventory, and SSH material are all present".
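
The readiness rule described above can be sketched roughly as follows. This is a minimal illustration, not the actual `AwoooPTruthChainService` code: the class and function names are hypothetical, the catalog/inventory checks are left out for brevity, and treating an unreadable file the same as a missing one is an assumption. Only the two mount paths, the summary field names, and the blocker strings come from this entry.

```python
import os
import shutil
from dataclasses import dataclass, field
from pathlib import Path

# Mount paths quoted in this logbook entry.
REPAIR_SSH_KEY = Path("/etc/repair-ssh/id_ed25519")
REPAIR_KNOWN_HOSTS = Path("/etc/repair-known-hosts/known_hosts")


@dataclass
class AnsibleRuntimeReadiness:
    """Hypothetical container mirroring the fields named in the summary."""

    repair_ssh_key_present: bool
    repair_ssh_key_readable: bool
    repair_known_hosts_present: bool
    repair_known_hosts_readable: bool
    blockers: list[str] = field(default_factory=list)

    @property
    def can_run_check_mode(self) -> bool:
        # Ready only when nothing is blocking.
        return not self.blockers


def _present_and_readable(path: Path) -> tuple[bool, bool]:
    present = path.is_file()
    return present, present and os.access(path, os.R_OK)


def check_ansible_runtime(
    ssh_key: Path = REPAIR_SSH_KEY,
    known_hosts: Path = REPAIR_KNOWN_HOSTS,
    binary: str = "ansible-playbook",
) -> AnsibleRuntimeReadiness:
    blockers: list[str] = []

    # The binary must be resolvable on PATH (or be an executable path).
    if shutil.which(binary) is None:
        blockers.append("ansible_playbook_binary_missing")

    key_present, key_readable = _present_and_readable(ssh_key)
    if not key_readable:  # assumption: unreadable is reported as missing
        blockers.append("ansible_repair_ssh_key_missing")

    hosts_present, hosts_readable = _present_and_readable(known_hosts)
    if not hosts_readable:  # assumption: unreadable is reported as missing
        blockers.append("ansible_repair_known_hosts_missing")

    return AnsibleRuntimeReadiness(
        repair_ssh_key_present=key_present,
        repair_ssh_key_readable=key_readable,
        repair_known_hosts_present=hosts_present,
        repair_known_hosts_readable=hosts_readable,
        blockers=blockers,
    )
```

Collecting every blocker instead of returning on the first failure keeps the summary useful for diagnosis: a pod missing both the binary and the SSH key would report both in one pass.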

**Deployment process**:

- After `da519423 fix(api): install ansible runtime for truth chain` was pushed to Gitea main, CD run `3295` was blocked by a pre-existing Python 3.11 event-loop compatibility problem in `tests/test_cs1_auto_execute.py`, so build/deploy was skipped.
- The follow-up mainline commit `514c201f fix(api-tests): use asyncio run in cs1 tests` fixed that CI gate, and every stage of CD run `3299` succeeded.
- Deploy marker: `ca2d95e9 chore(cd): deploy 514c201 [skip ci]`.

**Verification**:

```text
local targeted pytest:
  apps/api/tests/test_awooop_truth_chain_service.py -> 29 passed

local static checks:
  py_compile awooop_truth_chain_service.py -> pass
  ruff E9,F401,F821 targeted files -> pass
  pyproject ansible-core dependency parse -> pass
  web i18n JSON parse -> pass
  ansible inventory/playbook YAML parse -> pass
  git diff --check -> pass

local docker build:
  skipped by environment; Docker daemon unavailable on local Mac socket

production:
  gitea cd run 3299 -> tests/build-and-deploy/post-deploy-checks success
  awoooi-api image -> 192.168.0.110:5000/awoooi/api:514c201ff4d3de70b8b2fb1b3b87cfd7ac68f0cf
  awoooi-worker image -> 192.168.0.110:5000/awoooi/api:514c201ff4d3de70b8b2fb1b3b87cfd7ac68f0cf
  rollout status awoooi-api/awoooi-worker -> success
  pod command -v ansible-playbook -> /usr/local/bin/ansible-playbook
  ansible-playbook --version -> core 2.17.14
  repair_ssh_key_readable -> true
  repair_known_hosts_readable -> true
  /api/v1/health -> healthy
```

**Live truth-chain summary**:

```text
incident_total=20
evaluated_total=20
verified_auto_repair_total=0
average_score=71.2
ansible_considered_total=7
ansible_candidate_total=33
ansible_check_mode_total=0
ansible_apply_total=0
ansible_pending_check_mode_total=7
ansible_runtime.can_run_check_mode=true
ansible_runtime.blockers=[]
production_claim.can_claim_full_auto_repair=false
production_claim.reason=some_incidents_are_not_auto_repaired_verified
```
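
To make the relationship between the counters and the claim concrete, here is a hedged sketch of one way `production_claim` could be derived. The rule itself (claim only when every evaluated incident is a verified auto-repair) is an assumption inferred from the reason string, not confirmed service logic; the function name and the `no_incidents_evaluated` reason are hypothetical.

```python
from typing import Optional, TypedDict


class ProductionClaim(TypedDict):
    can_claim_full_auto_repair: bool
    reason: Optional[str]


def production_claim(
    evaluated_total: int, verified_auto_repair_total: int
) -> ProductionClaim:
    """Assumed rule: claim only if every evaluated incident is verified."""
    if evaluated_total <= 0:
        # Hypothetical reason string; not taken from the live summary.
        return {"can_claim_full_auto_repair": False,
                "reason": "no_incidents_evaluated"}
    if verified_auto_repair_total < evaluated_total:
        # Reason string as reported by the live summary.
        return {"can_claim_full_auto_repair": False,
                "reason": "some_incidents_are_not_auto_repaired_verified"}
    return {"can_claim_full_auto_repair": True, "reason": None}
```

Under this assumed rule, the live values `evaluated_total=20` and `verified_auto_repair_total=0` yield `can_claim_full_auto_repair=false`, which matches the summary. A ready runtime (`can_run_check_mode=true`) does not affect the claim.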

**Overall progress so far**:

```text
AwoooP truth-chain observability and truth-chain linkage: 92%
PlayBook / Ansible runtime readiness: 40% -> 65%
PlayBook check-mode automatic verification: 0% -> still not started
PlayBook apply automatic repair: 0% -> still not started
AI automation management product overall: 99.35% -> 99.45%
full auto-repair production claim: false
```

**Next steps**:

- For `ansible_pending_check_mode_total=7`, build a safe check-mode worker/job that initially runs only `--check --diff` and writes stdout/stderr, playbook, inventory, target, and source incident back to the truth-chain DB.
- Only after check-mode succeeds and the risk classification satisfies the safe rule does a run enter the approval flow; apply must stay disabled until approved.
- The frontend AwoooP Runs / Work Items views need to show `ansible_runtime.can_run_check_mode=true`, the pending check-mode count, and full auto-repair claim=false, so that Telegram alerts are not misread as completed auto-repair.
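
The first step above might look roughly like the sketch below. It is an illustration under stated assumptions, not the planned worker: `CheckModeRequest`, `build_check_mode_command`, and `run_check_mode` are hypothetical names, the use of `--limit` to scope the run to one target is an assumption, and persistence to the truth-chain DB is left to the caller. The `--check`, `--diff`, `-i`, and `--limit` flags are standard `ansible-playbook` options.

```python
import subprocess
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class CheckModeRequest:
    """Hypothetical request shape; fields follow the list in this entry."""

    playbook: str
    inventory: str
    target: str
    source_incident: str


def build_check_mode_command(req: CheckModeRequest) -> list[str]:
    # --check and --diff are hard-coded so no caller can request an apply.
    return [
        "ansible-playbook",
        "--check",
        "--diff",
        "-i", req.inventory,
        "--limit", req.target,  # assumption: one target per run
        req.playbook,
    ]


def run_check_mode(
    req: CheckModeRequest,
    timeout_s: int = 300,
    runner: Callable[..., Any] = subprocess.run,
) -> dict[str, Any]:
    cmd = build_check_mode_command(req)
    if "--check" not in cmd:
        raise RuntimeError("refusing to run ansible-playbook without --check")

    # Argument list, no shell: incident data never reaches a shell parser.
    proc = runner(cmd, capture_output=True, text=True,
                  timeout=timeout_s, check=False)

    # Record for the caller to persist; the DB write is out of scope here.
    return {
        "playbook": req.playbook,
        "inventory": req.inventory,
        "target": req.target,
        "source_incident": req.source_incident,
        "command": cmd,
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }
```

Keeping the command builder separate from the runner makes the "check-mode only" invariant testable without an Ansible installation, and the injectable `runner` allows a fake to stand in for `subprocess.run` in unit tests.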

## 2026-05-31 | IwoooS deployment evidence de-hardcoding

**Background**: