docs(logbook): record momo backup ansible apply proof [skip ci]
@@ -84,6 +84,96 @@ git diff --check
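- Historical incidents are not repaired by a batch re-run. Old records that lack the original Telegram message id will get a standalone result notification from the new version going forward; a full backfill of historical notifications needs its own backfill strategy to avoid flooding the channel.
- For automation to claim repair completed, there must still be valid repair execution or auto-repair execution evidence. Diagnosis, observation, dry-run, and degraded states are no longer displayed as a completed repair.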
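
The display rule in the last bullet can be sketched as a small status function. This is only an illustration of the rule as stated: the `Evidence` type, the evidence kind strings, and the returned status labels are hypothetical names, not the service's actual schema.

```python
from dataclasses import dataclass

# Hypothetical evidence kinds. Only these two may back a
# "repair completed" claim, following the rule stated above.
REPAIR_EVIDENCE = {"repair_execution", "auto_repair_execution"}


@dataclass(frozen=True)
class Evidence:
    kind: str    # e.g. "diagnosis", "observation", "dry_run", "degraded"
    valid: bool  # whether the execution record passed validation


def display_status(evidence: list[Evidence]) -> str:
    """Return the status label to show an operator for one incident."""
    if any(e.valid and e.kind in REPAIR_EVIDENCE for e in evidence):
        return "repair_completed"
    if evidence:
        # Diagnosis, observation, dry-run, or degraded evidence alone
        # never counts as a completed repair.
        return "in_progress"
    return "no_action"
```

With this shape, a dry-run plus a diagnosis stays `in_progress`, and an invalid repair execution record does not upgrade the status either.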
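
## 2026-05-31|MOMO PostgreSQL backup wired into AwoooP failure notifications and controlled Ansible repair

**Background**:

- The user requires that AI automation, monitoring, alerting, matching rules, PlayBooks, auto-repair, and KM make it visible from Telegram / AwoooP / the DB which stage was actually reached. Sending a message that does not reveal the real handling status is not acceptable.
- The previous round enabled `188-ai-web-readonly.yml` to run read-only Ansible check-mode through `ssh_mcp` from the production API pod. This round addresses the real drift in the MOMO PostgreSQL backup: on 188, `pg_backup.sh` was not executable and the crontab still pointed at the old `/home/ollama/bin/momo-pg-backup.sh`.
- This round is a controlled non-root Ansible apply performed after user approval, not a fully automatic repair. This evidence must not be claimed as 24h autonomous auto-repair closure.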

**Changes in this round**:

- Added `infra/ansible/playbooks/188-momo-backup-user.yml`:
  - `become: false`; it manages only the `ollama`-owned `/home/ollama/momo-pro/scripts/*`, `/home/ollama/momo_backups`, and the `ollama` crontab.
  - Copies `scripts/backup/backup-momo-188-pg.sh` and `scripts/ops/notify-awoooi-ops.sh` from the API image controller to 188.
  - Removes the old unmanaged cron entry `/home/ollama/bin/momo-pg-backup.sh` and adds an Ansible-managed cron entry that runs `/home/ollama/momo-pro/scripts/pg_backup.sh` daily at 02:00.
- `apps/api/src/services/awooop_ansible_audit_service.py` adds a dedicated low-risk catalog entry:
  - `catalog_id=ansible:188-momo-backup-user`
  - `risk_level=low`
  - `auto_apply_enabled=false`
  - `approval_required=true`
- The truth-chain tests now make MOMO backup match the dedicated playbook first, instead of loosely falling through to the host-wide `188-ai-web` playbook.
- Fixed the broken link in the API image build context:
  - `.dockerignore` now allowlists `scripts/backup/backup-momo-188-pg.sh`.
  - `.gitea/workflows/cd.yaml` adds `scripts/backup/backup-momo-188-pg.sh` and `scripts/ops/notify-awoooi-ops.sh` as CD triggers, so a future script update in the repo cannot leave the production image stale.
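
The catalog entry and the "dedicated playbook first" matching described above can be sketched roughly as follows. Only the four field values of the `ansible:188-momo-backup-user` entry come from this log. The `CatalogEntry` type, the `keywords` field, the `ansible:188-ai-web` entry and its values, and both functions are hypothetical stand-ins, not the actual code in `awooop_ansible_audit_service.py`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CatalogEntry:
    catalog_id: str
    risk_level: str
    auto_apply_enabled: bool
    approval_required: bool
    keywords: tuple[str, ...]  # hypothetical matching field


CATALOG = [
    # Host-wide playbook; id and values here are illustrative assumptions.
    CatalogEntry("ansible:188-ai-web", "medium", False, True, ("188",)),
    # Dedicated entry; the four flag values are the ones recorded above.
    CatalogEntry("ansible:188-momo-backup-user", "low", False, True,
                 ("188", "momo", "backup")),
]


def match_playbook(incident_tags: set[str]) -> Optional[CatalogEntry]:
    """Among entries whose keywords all appear, prefer the most specific."""
    candidates = [e for e in CATALOG if set(e.keywords) <= incident_tags]
    return max(candidates, key=lambda e: len(e.keywords), default=None)


def may_apply(entry: CatalogEntry, user_approved: bool) -> bool:
    """Autonomous apply needs the entry to opt in; otherwise a user must approve."""
    if entry.auto_apply_enabled and not entry.approval_required:
        return True
    return user_approved
```

Under these assumptions a MOMO backup incident on 188 resolves to the dedicated entry rather than the host-wide one, and because `auto_apply_enabled=false` and `approval_required=true`, an apply can proceed only with explicit user approval.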

**Verification**:

```text
local:
  ruby YAML parse .gitea/workflows/cd.yaml + 188-momo-backup-user.yml -> yaml ok
  python3 -m py_compile awooop_ansible_audit_service.py / awooop_ansible_check_mode_service.py / test_awooop_truth_chain_service.py -> pass
  ruff check ... --select E9,F401,F821 -> pass
  pytest apps/api/tests/test_awooop_truth_chain_service.py -q -> 44 passed
  git diff --check -> pass

Gitea / production deploy:
  75f6929b fix(awooop): add momo backup user ansible repair -> pushed to gitea main
  a1696695 fix(ansible): satisfy momo backup playbook lint -> ansible-lint run 3346 success
  ebd9ca86 fix(api): include momo backup script in runtime image -> pushed to gitea main
  run 3347 tests: 2291 passed, 23 skipped
  run 3347 build-and-deploy: success
  run 3347 post-deploy-checks: cancelled by later push, so final proof uses K8s image + live host + DB evidence
  run 3348 ai-code-review: success
  production image at verification:
    awoooi-api = 192.168.0.110:5000/awoooi/api:ebd9ca865fa9a0af4ebc3470458f94b935805849
    awoooi-web = 192.168.0.110:5000/awoooi/web:ebd9ca865fa9a0af4ebc3470458f94b935805849

production Ansible:
  API pod ansible-playbook --syntax-check 188-momo-backup-user.yml -> pass
  API pod ansible-playbook --check --diff via ssh_mcp -> success, changed=3
  controlled apply via API pod -> success
  automation_operation_log:
    ansible_check_mode_executed op_id=1430b250-16fa-485b-bb7b-f18c829ff673 status=success returncode=0
    ansible_apply_executed op_id=08f52074-7ac6-4eb7-affd-a85f1f8eb0be status=success returncode=0 parent=1430b250-16fa-485b-bb7b-f18c829ff673
  ansible AOL aggregate after apply:
    ansible_candidate_total=167
    ansible_check_mode_total=14
    ansible_apply_total=1
    ansible_apply_success_total=1
    momo_apply_total=1

188 host proof after apply:
  /home/ollama/momo-pro/scripts/pg_backup.sh owner=ollama:ollama mode=755 size=5982
  /home/ollama/momo-pro/scripts/notify-awoooi-ops.sh owner=ollama:ollama mode=755
  /home/ollama/momo_backups owner=ollama:ollama mode=755
  bash -n pg_backup.sh notify-awoooi-ops.sh -> pass
  crontab has only Ansible managed MOMO backup cron:
    #Ansible: AWOOOI momo PostgreSQL daily backup
    0 2 * * * PATH=... /home/ollama/momo-pro/scripts/pg_backup.sh >> /home/ollama/momo_backups/backup.log 2>&1

end-to-end backup proof:
  AWOOI_BACKUP_LOG_STDOUT=1 AWOOI_BACKUP_NOTIFY_SUCCESS=0 /home/ollama/momo-pro/scripts/pg_backup.sh
  Backup success: momo_analytics_20260531_154931.sql.gz (177M, 33s)
  backup_log insert success
  Deleted old backups: 0
  略過 AwoooP 成功通知;backup-health exporter 作為健康狀態來源  (script output: "Skipping AwoooP success notification; backup-health exporter is the health status source")
  file:
    /home/ollama/momo_backups/momo_analytics_20260531_154931.sql.gz owner=ollama:ollama mode=640 size=177M
  momo backup_log latest:
    momo_analytics_20260531_154931.sql.gz | 185144359 bytes | 33s | success
```
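
Two pieces of the evidence above can be cross-checked mechanically: that the apply row points back at the check-mode row, and that the byte count in `backup_log` is consistent with the `177M` shown for the file. The sketch below works only on the text lines as recorded in this entry. The function names are hypothetical, and it does not query the real `automation_operation_log` table.

```python
import math


def parse_aol_line(line: str) -> dict[str, str]:
    """Split 'event key=value ...' into a dict, with the event under 'event'."""
    event, *pairs = line.split()
    record = {"event": event}
    for pair in pairs:
        key, _, value = pair.partition("=")
        record[key] = value
    return record


def apply_is_backed_by_check(check: dict[str, str], apply: dict[str, str]) -> bool:
    """True when both runs succeeded and the apply row's parent is the check row."""
    return (
        check["event"] == "ansible_check_mode_executed"
        and apply["event"] == "ansible_apply_executed"
        and check["status"] == apply["status"] == "success"
        and check["returncode"] == apply["returncode"] == "0"
        and apply.get("parent") == check["op_id"]
    )


def mib_rounded_up(size_bytes: int) -> int:
    """Size in whole MiB, rounded up."""
    return math.ceil(size_bytes / 2**20)


check_row = parse_aol_line(
    "ansible_check_mode_executed op_id=1430b250-16fa-485b-bb7b-f18c829ff673 "
    "status=success returncode=0"
)
apply_row = parse_aol_line(
    "ansible_apply_executed op_id=08f52074-7ac6-4eb7-affd-a85f1f8eb0be "
    "status=success returncode=0 parent=1430b250-16fa-485b-bb7b-f18c829ff673"
)
print(apply_is_backed_by_check(check_row, apply_row))
print(mib_rounded_up(185144359))
```

The parent link is what ties the user-approved apply to a prior successful check-mode run. An apply row without a matching parent would fail this check even if its own status were `success`.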

**Progress boundaries**:

- MOMO PostgreSQL backup wired into AwoooP failure notifications / controlled Ansible repair: 100%.
- AwoooP truth-chain / DB traceability: about 96.5%. This round adds `ansible_apply_executed` success evidence, but the frontend timeline and Telegram details still need a more complete drill-down into the apply row.
- Ansible check-mode runtime: about 88%. `ssh_mcp`, the production pod, the playbook catalog, and DB write-back all work end to end, but only a few playbooks have a dedicated check-mode.
- Controlled Ansible apply: about 10%. The first low-risk, user-approved, non-root apply has succeeded, but autonomous apply is still not enabled.
- 24h fully automatic repair production claim: 0%. This round was a user-approved controlled apply, not an AI-autonomous apply.
- Complete AI automation flywheel: about 65%. The next step is to link `ansible_apply_executed`, the backup result, the AwoooP timeline, Telegram details/history, and the frontend AwoooP Runs drill-down into a single operator truth view.

## 2026-05-31|Sidebar nav converged to Traditional Chinese across all locales

**Background**: