docs(logbook): record momo backup ansible apply proof [skip ci]

Your Name
2026-05-31 15:52:40 +08:00
parent 5ed5022cd7
commit 8d9525fb3b


@@ -84,6 +84,96 @@ git diff --check
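- Historical incidents are not repaired by batch re-runs. When old data lacks the original Telegram message id, the new version sends a standalone result notification going forward; a full historical re-send needs its own backfill strategy to avoid flooding the channel.
- For automation to claim `repair completed`, there must still be valid repair execution or auto-repair execution evidence; diagnosis, observation, dry-run, and degraded states are no longer displayed as a completed repair.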
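
The `repair completed` rule above can be sketched as a small gate. This is a minimal illustration, not the service's actual code: the `Evidence` type, the stage names, and the reading of "valid" as "the execution succeeded" are all assumptions made for this example.

```python
from dataclasses import dataclass

# Assumed stage names, for illustration only; the service's real event
# types are not shown in this logbook.
EXECUTION_STAGES = frozenset({"repair_execution", "auto_repair_execution"})


@dataclass(frozen=True)
class Evidence:
    stage: str     # e.g. "diagnosis", "observation", "dry_run", "degraded"
    success: bool  # assumption: "valid" evidence means the execution succeeded


def may_claim_repair_completed(evidence: list[Evidence]) -> bool:
    """Return True only when a successful execution-stage row exists."""
    return any(e.stage in EXECUTION_STAGES and e.success for e in evidence)
```

Under this sketch, a run that only produced diagnosis and dry-run evidence never reaches `repair completed`, however many such rows it has.
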
## 2026-05-31: MOMO PostgreSQL backup wired into AwoooP failure notifications and controlled Ansible repair
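**Background**
- The user requires that AI automation, monitoring, alerting, matching rules, PlayBooks, auto-repair, and KM show, from Telegram / AwoooP / DB, which stage was actually reached. Sending a message from which the real handling status cannot be read is not acceptable.
- The previous round enabled `188-ai-web-readonly.yml` to run read-only Ansible check-mode through `ssh_mcp` from the production API pod. This round addresses the real drift in the MOMO PostgreSQL backup: on 188, `pg_backup.sh` was not executable, and the crontab still pointed at the legacy `/home/ollama/bin/momo-pg-backup.sh`.
- This round is a "controlled non-root Ansible apply after user approval", not a fully automatic repair. This evidence must not be claimed as 24h autonomous auto-repair closure.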
**Changes in this round**
- Added `infra/ansible/playbooks/188-momo-backup-user.yml`:
  - `become: false`; manages only the `ollama`-owned `/home/ollama/momo-pro/scripts/*`, `/home/ollama/momo_backups`, and the `ollama` crontab.
  - Copies `scripts/backup/backup-momo-188-pg.sh` and `scripts/ops/notify-awoooi-ops.sh` from the API image controller to 188.
  - Removes the legacy unmanaged cron entry `/home/ollama/bin/momo-pg-backup.sh` and adds an Ansible-managed cron entry that runs `/home/ollama/momo-pro/scripts/pg_backup.sh` daily at 02:00.
- `apps/api/src/services/awooop_ansible_audit_service.py` adds a dedicated low-risk catalog entry:
  - `catalog_id=ansible:188-momo-backup-user`
  - `risk_level=low`
  - `auto_apply_enabled=false`
  - `approval_required=true`
- The truth-chain tests now have MOMO backup match the dedicated playbook first, instead of falling back loosely to the large host-wide `188-ai-web` playbook.
- Fixed the broken chain in the API image build context:
  - `.dockerignore` allowlists `scripts/backup/backup-momo-188-pg.sh`.
  - `.gitea/workflows/cd.yaml` adds `scripts/backup/backup-momo-188-pg.sh` and `scripts/ops/notify-awoooi-ops.sh` as CD triggers, so that a future repo script update cannot leave the production image stale.
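
The catalog entry and the "dedicated playbook first" matching described above could look roughly like the sketch below. Only the four field values of the MOMO entry come from this logbook. The `CatalogEntry` type, the `keywords` and `priority` fields, the second entry, and both functions are hypothetical and are not taken from `awooop_ansible_audit_service.py`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CatalogEntry:
    catalog_id: str
    risk_level: str
    auto_apply_enabled: bool
    approval_required: bool
    keywords: tuple[str, ...]  # hypothetical matching field
    priority: int              # hypothetical: the higher value wins


CATALOG = [
    # catalog_id / risk / flags are from the logbook; keywords and priority are not.
    CatalogEntry("ansible:188-momo-backup-user", "low", False, True, ("momo", "backup"), 10),
    # Entirely illustrative stand-in for the host-wide 188-ai-web playbook.
    CatalogEntry("ansible:188-ai-web", "medium", False, True, ("188",), 1),
]


def select_entry(incident_text: str, catalog: list[CatalogEntry]) -> Optional[CatalogEntry]:
    """Pick the most specific entry whose keywords all appear in the incident text."""
    text = incident_text.lower()
    matches = [e for e in catalog if all(k in text for k in e.keywords)]
    return max(matches, key=lambda e: e.priority, default=None)


def may_apply(entry: CatalogEntry, user_approved: bool) -> bool:
    """Apply needs user approval unless the entry explicitly allows auto-apply."""
    return user_approved or (entry.auto_apply_enabled and not entry.approval_required)
```

With `auto_apply_enabled=false` and `approval_required=true`, `may_apply` only returns true when the user has approved, which matches the boundary this round stays within.
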
**Verification**
```text
local:
ruby YAML parse .gitea/workflows/cd.yaml + 188-momo-backup-user.yml -> yaml ok
python3 -m py_compile awooop_ansible_audit_service.py / awooop_ansible_check_mode_service.py / test_awooop_truth_chain_service.py -> pass
ruff check ... --select E9,F401,F821 -> pass
pytest apps/api/tests/test_awooop_truth_chain_service.py -q -> 44 passed
git diff --check -> pass
Gitea / production deploy:
75f6929b fix(awooop): add momo backup user ansible repair -> pushed to gitea main
a1696695 fix(ansible): satisfy momo backup playbook lint -> ansible-lint run 3346 success
ebd9ca86 fix(api): include momo backup script in runtime image -> pushed to gitea main
run 3347 tests: 2291 passed, 23 skipped
run 3347 build-and-deploy: success
run 3347 post-deploy-checks: cancelled by later push, so final proof uses K8s image + live host + DB evidence
run 3348 ai-code-review: success
production image at verification:
awoooi-api = 192.168.0.110:5000/awoooi/api:ebd9ca865fa9a0af4ebc3470458f94b935805849
awoooi-web = 192.168.0.110:5000/awoooi/web:ebd9ca865fa9a0af4ebc3470458f94b935805849
production Ansible:
API pod ansible-playbook --syntax-check 188-momo-backup-user.yml -> pass
API pod ansible-playbook --check --diff via ssh_mcp -> success, changed=3
controlled apply via API pod -> success
automation_operation_log:
ansible_check_mode_executed op_id=1430b250-16fa-485b-bb7b-f18c829ff673 status=success returncode=0
ansible_apply_executed op_id=08f52074-7ac6-4eb7-affd-a85f1f8eb0be status=success returncode=0 parent=1430b250-16fa-485b-bb7b-f18c829ff673
ansible AOL aggregate after apply:
ansible_candidate_total=167
ansible_check_mode_total=14
ansible_apply_total=1
ansible_apply_success_total=1
momo_apply_total=1
188 host proof after apply:
/home/ollama/momo-pro/scripts/pg_backup.sh owner=ollama:ollama mode=755 size=5982
/home/ollama/momo-pro/scripts/notify-awoooi-ops.sh owner=ollama:ollama mode=755
/home/ollama/momo_backups owner=ollama:ollama mode=755
bash -n pg_backup.sh notify-awoooi-ops.sh -> pass
crontab has only Ansible managed MOMO backup cron:
#Ansible: AWOOOI momo PostgreSQL daily backup
0 2 * * * PATH=... /home/ollama/momo-pro/scripts/pg_backup.sh >> /home/ollama/momo_backups/backup.log 2>&1
end-to-end backup proof:
AWOOI_BACKUP_LOG_STDOUT=1 AWOOI_BACKUP_NOTIFY_SUCCESS=0 /home/ollama/momo-pro/scripts/pg_backup.sh
Backup success: momo_analytics_20260531_154931.sql.gz (177M, 33s)
backup_log insert success
Deleted old backups: 0
略過 AwoooP 成功通知backup-health exporter 作為健康狀態來源
file:
/home/ollama/momo_backups/momo_analytics_20260531_154931.sql.gz owner=ollama:ollama mode=640 size=177M
momo backup_log latest:
momo_analytics_20260531_154931.sql.gz | 185144359 bytes | 33s | success
```
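
Two of the host-side facts above can be re-checked mechanically. The sketch below is a hedged illustration with hypothetical function names. It assumes the `177M` figure was produced by a tool that rounds up to whole MiB, and it treats the crontab as plain text in which comment lines start with `#`. The sample crontab keeps the elided `PATH=...` exactly as recorded.

```python
import math

LEGACY_SCRIPT = "/home/ollama/bin/momo-pg-backup.sh"
MANAGED_SCRIPT = "/home/ollama/momo-pro/scripts/pg_backup.sh"


def size_in_mib_rounded_up(size_bytes: int) -> str:
    """Format a byte count as whole MiB, rounding up (assumed rounding mode)."""
    return f"{math.ceil(size_bytes / 1024 ** 2)}M"


def cron_jobs(crontab_text: str) -> list[str]:
    """Return non-empty, non-comment crontab lines."""
    jobs = []
    for line in crontab_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            jobs.append(stripped)
    return jobs


def only_managed_backup_cron(crontab_text: str) -> bool:
    """True when the legacy job is gone and exactly one managed job remains."""
    jobs = cron_jobs(crontab_text)
    legacy = [j for j in jobs if LEGACY_SCRIPT in j]
    managed = [j for j in jobs if MANAGED_SCRIPT in j]
    return not legacy and len(managed) == 1


SAMPLE_CRONTAB = """\
#Ansible: AWOOOI momo PostgreSQL daily backup
0 2 * * * PATH=... /home/ollama/momo-pro/scripts/pg_backup.sh >> /home/ollama/momo_backups/backup.log 2>&1
"""
```

The `backup_log` row records 185144359 bytes, which is about 176.57 MiB, so `size_in_mib_rounded_up(185144359)` gives `177M` and agrees with the size shown for the file.
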
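**Progress boundaries**
- MOMO PostgreSQL backup wired into AwoooP failure notifications / controlled Ansible repair: 100%.
- AwoooP truth-chain / DB traceability: about 96.5%. This round adds successful `ansible_apply_executed` evidence, but the frontend timeline and Telegram details still need a more complete drill-down into the apply row.
- Ansible check-mode runtime: about 88%. `ssh_mcp`, the production pod, the playbook catalog, and DB write-back all work end to end, but only a few playbooks have dedicated check-mode support.
- Controlled Ansible apply: about 10%. The first low-risk, user-approved, non-root apply has succeeded, but autonomous apply is still not enabled.
- 24h full auto-repair production claim: 0%. This round was a user-approved controlled apply, not an AI-autonomous apply.
- Full AI automation flywheel: about 65%. The next step is to join `ansible_apply_executed`, the backup result, the AwoooP timeline, Telegram details/history, and the frontend AwoooP Runs drill-down into a single operator truth view.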
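
As a starting point for that operator truth view, the check-mode row and the apply row can be joined through the `parent` reference recorded in `automation_operation_log`. The sketch below is an assumption-laden illustration: `OpRow` and `proven_applies` are hypothetical names, and the field layout simply mirrors the keys printed in the evidence above, not the real table schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OpRow:
    op_id: str
    event: str
    status: str
    returncode: int
    parent: Optional[str] = None


def is_ok(row: OpRow) -> bool:
    return row.status == "success" and row.returncode == 0


def proven_applies(rows: list[OpRow]) -> list[tuple[OpRow, OpRow]]:
    """Pair each successful apply with the successful check-mode run it points to."""
    by_id = {r.op_id: r for r in rows}
    proven = []
    for row in rows:
        if row.event != "ansible_apply_executed" or not is_ok(row):
            continue
        check = by_id.get(row.parent) if row.parent else None
        if check and check.event == "ansible_check_mode_executed" and is_ok(check):
            proven.append((check, row))
    return proven


# The two rows recorded in this round's evidence.
ROWS = [
    OpRow("1430b250-16fa-485b-bb7b-f18c829ff673", "ansible_check_mode_executed", "success", 0),
    OpRow("08f52074-7ac6-4eb7-affd-a85f1f8eb0be", "ansible_apply_executed", "success", 0,
          parent="1430b250-16fa-485b-bb7b-f18c829ff673"),
]
```

An apply row whose parent check-mode row is missing or failed is left out of the result, so a timeline built on this pairing would not present it as a proven repair.
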
## 2026-05-31: Sidebar nav converged to Traditional Chinese across all locales
**Background**