diff --git a/docs/LOGBOOK.md b/docs/LOGBOOK.md
index 92d29dad..1e1d2aa0 100644
--- a/docs/LOGBOOK.md
+++ b/docs/LOGBOOK.md
@@ -1,3 +1,25 @@
+## 2026-06-25|09:05 live cold-start / backup / MOMO token read-only refresh
+
+**Background**: The 2026-06-24 23:33 check confirmed that the SOP correctly blocks on MOMO source absence, but today is already 2026-06-25, so yesterday's evidence cannot be reused. This round performs only a read-only refresh and documentation sync: it does not touch Wazuh / 112 and performs no Docker, Nginx, firewall, K8s, ArgoCD, or host runtime write operations.
+
+**Read-only evidence**:
+- Repo / Gitea baseline before this refresh: `bb2ad032 docs(ops): record 23:33 cold-start readback [skip ci]`; the controlled AWOOOI workspace is clean on `codex/awoooi-current-main-dev-base-20260624`.
+- `scripts/reboot-recovery/full-stack-cold-start-check.sh --monitor-read-only --no-color --watch --interval 1 --max-attempts 1` at `2026-06-25 09:05:37 CST` returned the expected exit code `2` with `PASS=87 WARN=1 BLOCKED=1`.
+- Hosts / K3s: 110 / 120 / 121 / 188 ping and SSH port OK; K3s `mon` / `mon1` both `Ready control-plane`; VIP `192.168.0.125` present; node filesystem / disk-pressure / readonly events `0`; latest `km-vectorize-29705460-55rgs` completed about 6h before the check.
+- Public routes direct smoke: `awoooi API=200`, `/zh-TW/iwooos=200`, `vibework=200`, `awooogo=200`, `mo health=200`, `stock=200`, `gitea=200`, `harbor=200`, `registry /v2=401`, `sentry=200`, `signoz=200`, `langfuse=200`, `bitan=200`, `aiops=200`.
+- AWOOOI API health: `status=healthy`, `environment=prod`, `mock_mode=false`; postgresql / redis / openclaw / signoz / gcp ollama providers are up; `ollama_local` was in a short cooldown and is not the current release blocker.
+- MOMO service health: `https://mo.wooo.work/health` returned healthy / PostgreSQL / `V10.655`.
+- MOMO data / scheduler: `MOMO_MONTHLY_SYNC 10936|10936|2026-06-01|2026-06-17|2026-06-01|2026-06-17` remains green; `MOMO_DAILY_FRESHNESS 8|2026-06-17` remains a hard blocker; latest job `56 completed` still has `sync_success=true` and bounds `2026-06-01..2026-06-17`.
+- MOMO Google Drive token metadata check, without reading token content: host path `/home/ollama/momo-pro/config/google_token.json` is `missing`; container-side `config/google_token.json` is also `missing`; the scheduler process runs as UID/GID `100000:100000`. This matches the cold-start WARN `188 momo Google Drive token ownership/writeback not confirmed` and is separate from the hard data-freshness blocker.
+- Backup read-only status from 110 `/backup/scripts/backup-status.sh --no-notify --no-refresh` at 09:05: 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, `integrity_stale=0`, `offsite_fresh=1`, `rclone_gdrive_fresh=1`, `escrow_missing=5`, last aggregate `2026-06-25 02:35:09`.
+
+**Verdict**:
+- The SOP remains valid: it correctly separates the recovered hosts / routes / K3s / backups / service health from MOMO business data freshness / source evidence, which is still blocked. Website 200s, MOMO health 200, DB parity, and backup green were not misread as full-stack green.
+- May be declared: core hosts, K3s, public routes, AWOOOI API health, MOMO service health, and backup/offsite surfaces are available for this read-only evidence set.
+- Must not be declared: full-stack green, MOMO data current, DR complete, credential escrow complete, or that the 110 live monitor is synced to repo v1.42. The Google Drive token missing / writeback-not-confirmed state also must not be backfilled by guessing or by reading the token.
+
+**Boundary**: This round made no host writes, no `scp` of the live script, no Docker / Nginx / firewall / K8s / ArgoCD operations, and no Wazuh / 112 / SOC changes; it used no passwords from chat and did not read or store any secret.
+
 ## 2026-06-25|Wazuh manager registry export pre-check formal readback
 
 **Background**: The user asked why Wazuh still does not monitor every host, why previously enrolled clients disappeared, and noted that the front end should not show work windows, internal addresses, the repo owner, or plain host names. Without touching the Wazuh runtime, hosts, Docker, Nginx, K8s, firewall, secrets, or active scans, this round turned "how the manager registry truth should be delivered" into a redacted intake Gate that can be accepted, rejected, and read back on the front end.
diff --git a/docs/runbooks/BACKUP-STATUS.md b/docs/runbooks/BACKUP-STATUS.md
index df38f106..91d7cc88 100644
--- a/docs/runbooks/BACKUP-STATUS.md
+++ b/docs/runbooks/BACKUP-STATUS.md
@@ -15,6 +15,31 @@
 > 2026-06-24 23:04 Codex cold-start gate refresh: repo-side v1.42 dry-run now emits MOMO source-absence evidence and blocks with `188 momo source file absent while daily sales data stale`; backup/offsite remains green and live 110 script deployment is not claimed.
 > 2026-06-24 23:15 Codex live-sync gate readback: read-only deploy parity check correctly blocks because repo cold-start hash `f60b81029969a527dc742ebc9558d2933f11fe24ec4f46f7a7bc6637759b7b05` differs from 110 live hash `10608873d406911a519afa96218abebc2b85ab6123bdf46b6e21eb269e554bb8`; installer remains a live write requiring explicit approval.
 > 2026-06-24 23:33 Codex backup readback: 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, `integrity_stale=0`, `offsite_fresh=1`, `rclone_gdrive_fresh=1`, `escrow_missing=5`; live cold-start still blocks only on MOMO source absence / data freshness, not backup.
+> 2026-06-25 09:05 Codex backup readback: 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, `integrity_stale=0`, `offsite_fresh=1`, `rclone_gdrive_fresh=1`, `escrow_missing=5`; live cold-start is `PASS=87 WARN=1 BLOCKED=1`, BLOCKED because MOMO business data is stale and WARN because MOMO Google Drive token metadata is missing.
+
+---
+
+## 2026-06-25 09:05 Backup / Offsite / Escrow Live Status
+
+Read-only command: `/backup/scripts/backup-status.sh --no-notify --no-refresh` from 110 at 09:05 Asia/Taipei.
+
+- 110 backup health: `13/13 fresh failed=0`.
+- 188 backup health: `2/2 fresh failed=0`.
+- Integrity / configured blockers: `core_blockers=0`, `dr_warnings=5`, `configured_missing_110=0`, `configured_missing_188=0`, `script_missing_110=0`, `script_missing_188=0`, `integrity_stale=0`.
+- Offsite / GDrive freshness: `offsite_configured=1`, `offsite_fresh=1`, `rclone_gdrive_configured=1`, `rclone_gdrive_fresh=1`.
+- Last aggregate backup: `2026-06-25 02:35:09`.
+- DR blocker remains: `escrow_missing=5`. Do not forge evidence markers, and do not paste any secret value / hash / partial token.
+- Full-stack service release blocker remains separate: cold-start `PASS=87 WARN=1 BLOCKED=1` is caused by stale MOMO business data freshness (`MOMO_DAILY_FRESHNESS 8|2026-06-17`, BLOCKED) plus Google Drive token metadata missing / writeback not confirmed (WARN). This is not a backup freshness failure.
+
+| Gate | Status | Evidence |
+|------|--------|----------|
+| 110 backup freshness | VERIFIED | 13/13 fresh, failed count 0. |
+| 188 backup freshness | VERIFIED | 2/2 fresh, failed count 0. |
+| Offsite / GDrive freshness | VERIFIED | `offsite_fresh=1`, `rclone_gdrive_fresh=1`. |
+| Backup core blockers | GREEN | `core_blockers=0`. |
+| Credential escrow | BLOCKED | `escrow_missing=5`; only real non-secret owner evidence may close this. |
+| MOMO Drive token metadata | WARN | Host and container token metadata paths are missing; no token content was read. |
+| Service full green | NO-GO | Blocked by MOMO source absence / data freshness; the token metadata warning also requires owner-gated evidence. |
 
 ---
diff --git a/docs/runbooks/FULL-STACK-COLD-START-SOP.md b/docs/runbooks/FULL-STACK-COLD-START-SOP.md
index ff2f152c..cd5ccc24 100644
--- a/docs/runbooks/FULL-STACK-COLD-START-SOP.md
+++ b/docs/runbooks/FULL-STACK-COLD-START-SOP.md
@@ -1,7 +1,7 @@
 # AWOOOI Full-Stack Cold Start and Host Reboot SOP
 
-> Version: v1.44
-> Last updated: 2026-06-24 Asia/Taipei
+> Version: v1.45
+> Last updated: 2026-06-25 Asia/Taipei
 > Scope: 110 / 120 / 121 / 188 full-stack reboot recovery. 112 Kali is recorded as P3 optional and is not part of this recovery path.
 
 ---
@@ -10,23 +10,23 @@
 
 This section is the first decision anchor after every handover, power-on, shutdown, or reboot. If the date is not today, rerun the live check first, then update this section and `docs/workplans/2026-06-04-reboot-cold-start-backup-recovery-workplan.md`.
 
-2026-06-24 23:33 live read-only refresh supersedes the earlier 22:16 / 23:04 scorecard wording. It confirms the SOP is behaving correctly: hosts, routes, K3s, AWOOOI API health, MOMO service health, and backup/offsite are available, while full-stack release remains blocked only by MOMO source absence / business data freshness and DR remains blocked by missing credential escrow evidence. The 110 live script parity blocker from 23:15 still applies.
+2026-06-25 09:05 live read-only refresh supersedes the 2026-06-24 23:33 scorecard wording. It confirms the SOP is behaving correctly: hosts, routes, K3s, AWOOOI API health, MOMO service health, and backup/offsite are available, while full-stack release remains blocked by MOMO source absence / business data freshness and DR remains blocked by missing credential escrow evidence. A new WARN shows that MOMO Google Drive token ownership/writeback is not confirmed because the token metadata path is missing from both the host and container views; handle this as a separate evidence / owner gate, never by reading token contents. The 110 live script parity blocker from 2026-06-24 23:15 still applies.
 
 ```text
 Repo-side reboot SOP / Plan B / automation contracts: COMPLETE, 100%.
-Live cold-start read-only check: 2026-06-24 23:33 PASS=88 WARN=0 BLOCKED=1, Result=BLOCKED.
+Live cold-start read-only check: 2026-06-25 09:05 PASS=87 WARN=1 BLOCKED=1, Result=BLOCKED.
 Repo-side cold-start v1.42 live read-only run: New MOMO fields remain MOMO_SOURCE_EMPTY_EVIDENCE_LINES=21, MOMO_IMPORT_CONFIG=當日業績匯入|即時業績_當日, MOMO_LATEST_IMPORT_JOB=56|completed|即時業績_當日.xlsx|2026-06-18T11:41:00.853176|2026-06-18T11:42:02.309425|10936|10936|0. The only BLOCKED text is "188 momo source file absent while daily sales data stale". Live 110 script sync is not claimed until a separate approved deployment/sync happens.
 110 live-sync parity: 2026-06-24 23:15 read-only `verify-cold-start-monitor-deploy.sh` correctly BLOCKED because repo script hash `f60b81029969a527dc742ebc9558d2933f11fe24ec4f46f7a7bc6637759b7b05` differs from 110 live hash `10608873d406911a519afa96218abebc2b85ab6123bdf46b6e21eb269e554bb8`. Do not use live 110 monitor output to prove v1.42 behavior until the approved live-sync gate in §13.3.1 passes.
 Service state: SERVICE_AVAILABLE_MOMO_SOURCE_BLOCKED_DR_ESCROW_BLOCKED; 110/120/121/188 reachable, K3s mon/mon1 Ready, ArgoCD awoooi-prod Synced/Healthy at revision 7db7800e399caed5487a705c81ec993dec76c70f, public routes/TLS green, 110/188 backup health fresh, 188 node-exporter / PostgreSQL exporter / Redis exporter restored, 188 MinIO endpoint and Velero BackupStorageLocation restored, 110 disk pressure cleared.
-Runtime release state: API/Web/Worker are ready; production API health returns healthy with `environment=prod`, `mock_mode=false`, and postgresql / redis / openclaw / signoz / ollama providers all up. 23:33 redirect-followed route batch shows awoooi API=200, `/zh-TW/iwooos`=200, vibework=200, awoooogo=200, momo health=200, stock=200, bitan=200, gitea=200, harbor=200, sentry=200, signoz=200, langfuse=200, aiops=200, registry /v2=401; cold-start raw route gate still records expected redirect statuses such as awoooi web=307, momo web=302, sentry=302.
-MOMO release state: mo.wooo.work health is healthy on version V10.653. Gitea main fast-forwarded to 84035906aba0e5e190d031a13cfd9b47a8cd1f73 and Gitea Actions cd.yaml #904 completed Success. 188 live source contains the production marker `def _table_columns`, `業績分析儀表板同步失敗`, and `保留來源檔案等待重試,不移動 Google Drive 檔案`, proving the import-boundary fix is deployed. Mac Mini and MacBook Pro controlled Codex workspaces are both on branch codex/momo-current-main-dev-base-20260624 at commit 84035906aba0e5e190d031a13cfd9b47a8cd1f73 with dirty=0.
-MOMO data state: full-table read-only DB query shows `daily_sales_snapshot=104614 rows, 2025/07/01..2026/06/17` and `realtime_sales_monthly=786621 rows, 2024/01/01..2026/06/17`. Current-month daily_sales_snapshot and realtime_sales_monthly match, but both stop at 2026-06-17. MOMO_DAILY_FRESHNESS is 7 days, which is a hard blocker because business data is not current.
-Google Drive / source-file state: momo scheduler token ownership is fixed for Docker userns, container-side Drive listing works, and import config is `gdrive_folder_path=當日業績匯入`, `gdrive_file_pattern=即時業績_當日`; however scheduler stats and logs show repeated AutoImport runs with `file_count=0`, `imported_count=0`, including 2026-06-24 21:56 where the folder had `0` matching Excel files. Latest import job 56 was already completed on 2026-06-18 with `sync_success=true`, `source_file=即時業績_當日.xlsx`, and bounds `2026-06-01..2026-06-17`. Mac Mini and MacBook candidate spreadsheets were also read-only inspected: the local current daily candidate only contains 2025-07-01..2025-07-02, the iCloud full-month candidate only contains 2025-06-01..2025-06-30, and MacBook candidates are either header-only or the same 2025-07-01..2025-07-02 dataset. These are not legitimate newer sources.
-Backup / monitoring state: backup-status core blockers are 0, 110 is 13/13 fresh failed=0, 188 is 2/2 fresh failed=0, offsite_fresh=1, rclone_gdrive_fresh=1, integrity_stale=0, last aggregate is 2026-06-24 02:28:39, 188 MinIO is healthy, Velero BackupStorageLocation default is Available, backup-health textfile reports Velero freshness green, PostgreSQL / Redis exporters are green, 188 nginx-exporter is restored with nginx_up=1, monitoring coverage is 14/14 jobs UP, and VeleroBackupNotRun / PostgreSQLDown / RedisDown / disk-pressure / nginx-exporter target-down evidence is resolved. 23:33 backup-status --no-notify --no-refresh reports 110 13/13 fresh failed=0, 188 2/2 fresh failed=0, core_blockers=0, integrity_stale=0, offsite_fresh=1, rclone_gdrive_fresh=1, escrow_missing=5.
+Runtime release state: API/Web/Worker are ready; production API health returns healthy with `environment=prod`, `mock_mode=false`, and postgresql / redis / openclaw / signoz / gcp ollama providers up. `ollama_local` had a short cooldown in the direct health JSON and is not the current release blocker. 09:05 route smoke shows awoooi API=200, `/zh-TW/iwooos`=200, vibework=200, awooogo=200, momo health=200, stock=200, bitan=200, gitea=200, harbor=200, sentry=200, signoz=200, langfuse=200, aiops=200, registry /v2=401; cold-start raw route gate still records expected redirect statuses such as awoooi web=307 and sentry=302.
+MOMO release state: mo.wooo.work health is healthy on version V10.655. Gitea main fast-forwarded to 84035906aba0e5e190d031a13cfd9b47a8cd1f73 and Gitea Actions cd.yaml #904 completed Success for the import-boundary fix; the later V10.655 version readback confirms the deployed service is current, so release health is assessed separately from business-data freshness.
+MOMO data state: current-month daily_sales_snapshot and realtime_sales_monthly still match, but both stop at 2026-06-17: `MOMO_MONTHLY_SYNC 10936|10936|2026-06-01|2026-06-17|2026-06-01|2026-06-17`. `MOMO_DAILY_FRESHNESS 8|2026-06-17` is a hard blocker because business data is not current.
+Google Drive / source-file state: 2026-06-25 09:05 cold-start reports `MOMO_GDRIVE_TOKEN_STAT missing scheduler_uid=100000`; direct metadata-only readback confirms host path `/home/ollama/momo-pro/config/google_token.json` is missing and container-side `config/google_token.json` is also missing, while the scheduler process runs as UID/GID `100000:100000`. Do not read token content, and do not recreate or chown the token file without explicit maintenance-window / owner approval. The data blocker remains source absence / stale business data; the missing token is a separate WARN until owner-provided token/writeback evidence is restored.
+Backup / monitoring state: 09:05 backup-status --no-notify --no-refresh reports core_blockers=0, 110 13/13 fresh failed=0, 188 2/2 fresh failed=0, offsite_fresh=1, rclone_gdrive_fresh=1, integrity_stale=0, escrow_missing=5; last aggregate is 2026-06-25 02:35:09.
 Notification-noise state: healthy AWOOOI heartbeat is suppressed; heartbeat warning dedupe uses stable actionable fingerprints so HTTP status / timeout / latency drift does not create a new Telegram event every 30 minutes; MOMO Pro monitor uses https://mo.wooo.work/health as primary truth and no longer checks the 188 root path; MoWoooWorkDown now labels component=momo-pro-system and requires public/local/container/data-freshness triage instead of blind restart; docker-health-monitor keeps 5-minute repair cadence but has a separate 30-minute Telegram fallback cooldown; Bitan public-content check keeps failure alerting with same-fingerprint cooldown and one recovery notice.
 Monitoring coverage recovery state: if CD post-deploy fails only because `scripts/generate_monitoring.py --check` reports `nginx-exporter` down on `192.168.0.188:9113`, first verify 188 `stub_status` and restore the stateless exporter with `scripts/ops/188-nginx-exporter-restore.sh`; do not reload Nginx or restart product containers for this symptom.
-Allowed declaration: core hosts, routes, K3s, backup/exporter surfaces are recovered; MOMO production code release includes the import-boundary fix at Gitea main 84035906aba0; both controlled Codex workspaces are aligned on the same MOMO fix branch; MOMO data pipeline is blocked waiting for a newer source file or owner-provided source evidence.
-Forbidden declaration: full-stack green, MOMO data current, DR complete, or runtime/security acceptance. Credential escrow evidence is still missing and must not be forged.
+Allowed declaration: core hosts, routes, K3s, backup/exporter surfaces, AWOOOI API health, and MOMO service health are available for the latest read-only evidence set; MOMO production code includes the import-boundary fix; MOMO data pipeline is blocked waiting for a newer source file or owner-provided source evidence; MOMO Google Drive token/writeback evidence needs a separate non-secret owner-gated repair.
+Forbidden declaration: full-stack green, MOMO data current, DR complete, credential escrow complete, 110 live monitor synced, or runtime/security acceptance. Credential escrow evidence is still missing and must not be forged.
 ```
 
 2026-06-24 22:17 Codex workstation continuity readback:
diff --git a/docs/workplans/2026-06-04-reboot-cold-start-backup-recovery-workplan.md b/docs/workplans/2026-06-04-reboot-cold-start-backup-recovery-workplan.md
index 6b514793..72ef19a9 100644
--- a/docs/workplans/2026-06-04-reboot-cold-start-backup-recovery-workplan.md
+++ b/docs/workplans/2026-06-04-reboot-cold-start-backup-recovery-workplan.md
@@ -11,13 +11,13 @@
 | Area | Status | Completion | Evidence |
 |------|--------|------------|----------|
-| Overall recovery readiness | SERVICE_AVAILABLE_MOMO_SOURCE_BLOCKED_DR_ESCROW_BLOCKED | 98% | 2026-06-24 23:33 live cold-start returned `PASS=88 WARN=0 BLOCKED=1`, result `BLOCKED` because MOMO business data freshness remains stale and no newer legitimate source file is present. 110 / 120 / 121 / 188 ping and SSH port are OK, K3s `mon` / `mon1` are Ready, public routes/TLS are green, 110 / 188 runtime and backup checks are green. 188 `node-exporter`, PostgreSQL exporter, Redis exporter, `nginx-exporter`, MinIO / Velero BSL are restored; monitoring coverage is now `14/14 UP`; 110 disk pressure cleared. Remaining service blocker is MOMO business data freshness: `MOMO_DAILY_FRESHNESS 7|2026-06-17`; 23:33 cold-start plus scheduler / DB / import metadata read-only evidence confirms Drive listing works from the scheduler container, `import_config` points to `當日業績匯入` / `即時業績_當日`, but recent scheduler runs report `file_count=0` and no newer legitimate source file exists. 2026-06-24 22:17 confirms MOMO `main` and Gitea Actions `cd.yaml #904` deployed `84035906aba0`, so monthly sync failure now fails the import job and prevents Drive file movement in production. DR remains blocked because credential escrow evidence markers are still missing and must not be forged. |
-| P0 host / K3s recovery | DONE | 100% | 120 booted after console fsck at `2026-06-12 15:13`; latest 2026-06-14 18:15 readback shows 120 is reachable, K3s is active, `mon` and `mon1` are both `Ready control-plane`, and cold-start P0/P1 checks are green. |
-| P1 backup / alert / escrow | BLOCKED_DR_ESCROW | 97% | 2026-06-24 23:33 backup / alert readback shows 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, `integrity_stale=0`, `offsite_fresh=1`, `rclone_gdrive_fresh=1`, `escrow_missing=5`. 188 `node-exporter` textfile scrape, PostgreSQL exporter, Redis exporter, `nginx-exporter`, MinIO endpoint, Velero BSL and latest completed backup freshness are restored; monitoring coverage is `14/14 UP`; `BackupHealthMonitorMissing188`, `PostgreSQLDown`, `RedisDown`, `VeleroBackupNotRun` and 110 disk-pressure alerts resolved. DR remains blocked on real non-secret credential escrow evidence IDs. |
-| P2 service / data truth | BLOCKED_MOMO_DATA_FRESHNESS | 98% | Public route/TLS, API/Web route, momo health `V10.653`, MOMO main / CD `#904` commit `84035906aba0e5e190d031a13cfd9b47a8cd1f73`, 188 live import-boundary source marker, current-month parity `10936|10936|2026-06-01|2026-06-17|2026-06-01|2026-06-17`, backup exporters, schedules, K3s node readiness/storage conditions, VIP, and 110 / 188 runtime health are green. Mac Mini / MacBook Pro controlled MOMO workspaces both point to the same codex branch commit. MOMO latest business date remains `2026-06-17`; stale age is `7` days as of 23:33. Drive pending folder has `0` matching files in repeated scheduler checks; scheduler stats show `file_count=0` / `imported_count=0` for repeated AutoImport runs; latest valid job `56` already imported `即時業績_當日.xlsx` with `sync_success=true` and bounds `2026-06-01..2026-06-17`; Mac Mini / MacBook candidate files are old or header-only, so there is no safe newer source to import. |
-| P3 docs / automation contracts | DONE_WITH_MOMO_SOURCE_ABSENCE_GATE_V142_REPO_ONLY | 100% | Workplan, SOP v1.44, BACKUP-STATUS, LOGBOOK, 120 console/fsck recovery, Gitea backup stale-dump hardening, reboot ledger/version-comparison SOP, escrow evidence audit, 188 nginx Ansible baseline, 110 cold-start detector script, startup judgment layers, GO/NO-GO tree, host recovery cards, explicit Plan B degraded-operation path, machine-readable `plan_b` baseline, readiness-audit Plan B guard, B0-B5 service levels, T+0/T+120 fallback timeline checks, host role / load-balancing assessment, CD `known_hosts` guardrail, `fwupd-refresh.timer` rollback note, K3s filesystem event blocker, AWOOOI backup no-direct-offsite-sync contract, 110/188 Ansible source-of-truth, Gitea self-hosted readiness validation workflow, post-CD no-regression readbacks, stale-vs-active K8s failed Job classification, 110 runaway browser / CI load AIOps exporter + alert + gated remediation PlayBook, Telegram / AI event packet mapping, healthy heartbeat Telegram suppression, MOMO scheduler / current-month detector fix, 188 node-exporter restore helper, 188 DB/Redis exporter restore helper, 188 MinIO/Velero restore helper, 188 nginx-exporter restore helper, 110 Docker disk pressure cleanup boundary, MOMO Google Drive token userns readback, MOMO daily freshness blocker, MOMO Pro false-noise health monitor source-of-truth, docker-health direct Telegram fallback cooldown, Bitan public-content same-fingerprint cooldown, notification-noise readback, MOMO source-file absence GO/NO-GO gate with scheduler stats / import_config / job 56 evidence, repo-side cold-start v1.42 source absence classifier, live-sync parity gate, MOMO V10.653 / Gitea main / dual-workstation Codex baseline readback, MOMO import-boundary production deploy, MacBook Pro Codex safe artifact sync readback, and MacBook Pro AwoooGo Gitea SSH / dev workspace readback are updated. Latest deploy marker `622bc372` points runtime image to `2ec7f6f4`; CD `#3294` retains a historical Failure because post-deploy monitoring coverage saw 188 `nginx-exporter` down before recovery, while manual coverage now passes `14/14 UP`. 2026-06-24 23:15 read-only verify still shows repo cold-start hash `f60b81029969a527dc742ebc9558d2933f11fe24ec4f46f7a7bc6637759b7b05` differs from 110 live hash `10608873d406911a519afa96218abebc2b85ab6123bdf46b6e21eb269e554bb8`; live 110 script sync of the v1.42 classifier is not claimed until separately approved and recorded. |
+| Overall recovery readiness | SERVICE_AVAILABLE_MOMO_SOURCE_BLOCKED_GDRIVE_TOKEN_WARN_DR_ESCROW_BLOCKED | 97% | 2026-06-25 09:05 live cold-start returned `PASS=87 WARN=1 BLOCKED=1`, result `BLOCKED` because MOMO business data freshness remains stale; the single WARN is that Google Drive token ownership/writeback metadata is not confirmed. 110 / 120 / 121 / 188 ping and SSH port are OK, K3s `mon` / `mon1` are Ready, public routes/TLS are green, AWOOOI API health is healthy/prod/mock=false, MOMO service health is healthy on `V10.655`, and 110 / 188 runtime and backup checks are green. Remaining hard service blocker is MOMO business data freshness: `MOMO_DAILY_FRESHNESS 8\|2026-06-17`; DB current-month parity remains `10936\|10936\|2026-06-01\|2026-06-17\|2026-06-01\|2026-06-17`; latest valid job `56` is still completed with `sync_success=true` and bounds `2026-06-01..2026-06-17`. New warning evidence: a metadata-only check shows `/home/ollama/momo-pro/config/google_token.json` missing on the host and `config/google_token.json` missing inside `momo-scheduler`, while the scheduler runs as UID/GID `100000:100000`; no token content was read. DR remains blocked because credential escrow evidence markers are still missing and must not be forged. |
+| P0 host / K3s recovery | DONE | 100% | 120 booted after console fsck at `2026-06-12 15:13`; latest 2026-06-25 09:05 readback shows 120 is reachable, K3s is active, `mon` and `mon1` are both `Ready control-plane`, VIP `192.168.0.125` is present, node filesystem / disk-pressure / readonly events are `0`, and latest `km-vectorize-29705460-55rgs` completed. |
+| P1 backup / alert / escrow | BLOCKED_DR_ESCROW | 97% | 2026-06-25 09:05 backup / alert readback shows 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, `integrity_stale=0`, `offsite_fresh=1`, `rclone_gdrive_fresh=1`, `escrow_missing=5`, last aggregate `2026-06-25 02:35:09`. DR remains blocked on real non-secret credential escrow evidence IDs. |
+| P2 service / data truth | BLOCKED_MOMO_DATA_FRESHNESS_WITH_GDRIVE_TOKEN_WARN | 97% | Public route/TLS, API/Web route, MOMO health `V10.655`, MOMO main / CD `#904` import-boundary fix, current-month parity `10936\|10936\|2026-06-01\|2026-06-17\|2026-06-01\|2026-06-17`, backup exporters, schedules, K3s node readiness/storage conditions, VIP, and 110 / 188 runtime health are green. MOMO latest business date remains `2026-06-17`; stale age is `8` days as of 09:05. Latest valid job `56` already imported `即時業績_當日.xlsx` with `sync_success=true` and bounds `2026-06-01..2026-06-17`. Google Drive token metadata is now a WARN because the host and container token paths are missing; this requires owner-gated metadata repair/evidence and must not be solved by reading token contents. |
+| P3 docs / automation contracts | DONE_WITH_MOMO_SOURCE_ABSENCE_GATE_V142_REPO_ONLY | 100% | Workplan, SOP v1.45, BACKUP-STATUS, LOGBOOK, 120 console/fsck recovery, Gitea backup stale-dump hardening, reboot ledger/version-comparison SOP, escrow evidence audit, 188 nginx Ansible baseline, 110 cold-start detector script, startup judgment layers, GO/NO-GO tree, host recovery cards, explicit Plan B degraded-operation path, machine-readable `plan_b` baseline, readiness-audit Plan B guard, B0-B5 service levels, T+0/T+120 fallback timeline checks, host role / load-balancing assessment, CD `known_hosts` guardrail, `fwupd-refresh.timer` rollback note, K3s filesystem event blocker, AWOOOI backup no-direct-offsite-sync contract, 110/188 Ansible source-of-truth, Gitea self-hosted readiness validation workflow, post-CD no-regression readbacks, stale-vs-active K8s failed Job classification, 110 runaway browser / CI load AIOps exporter + alert + gated remediation PlayBook, Telegram / AI event packet mapping, healthy heartbeat Telegram suppression, MOMO scheduler / current-month detector fix, 188 node-exporter restore helper, 188 DB/Redis exporter restore helper, 188 MinIO/Velero restore helper, 188 nginx-exporter restore helper, 110 Docker disk pressure cleanup boundary, MOMO Google Drive token userns readback, MOMO daily freshness blocker, MOMO Pro false-noise health monitor source-of-truth, docker-health direct Telegram fallback cooldown, Bitan public-content same-fingerprint cooldown, notification-noise readback, MOMO source-file absence decision gate with scheduler stats / import_config / job 56 evidence, repo-side cold-start v1.42 source absence classifier, live-sync parity gate, MOMO import-boundary production deploy, MacBook Pro Codex safe artifact sync readback, and 2026-06-25 live refresh with Google Drive token metadata WARN are updated. 2026-06-24 23:15 read-only verify still shows repo cold-start hash `f60b81029969a527dc742ebc9558d2933f11fe24ec4f46f7a7bc6637759b7b05` differs from 110 live hash `10608873d406911a519afa96218abebc2b85ab6123bdf46b6e21eb269e554bb8`; live 110 script sync of the v1.42 classifier is not claimed until separately approved and recorded. |
 
-Full cold-start service readiness may not be declared green for the latest verified evidence set. As of 2026-06-24 23:33, routes/hosts/K3s/backups/exporters/Velero/monitoring coverage are available, and MOMO production code has the import-boundary fix, but the latest repo-side live read-only cold-start scorecard remains `PASS=88 WARN=0 BLOCKED=1` because MOMO business data freshness is stale beyond 3 days and no newer legitimate source file is available. The blocker is explicitly `188 momo source file absent while daily sales data stale`; this is repo-side source-of-truth evidence and not yet a claim that the 110 live monitor script was deployed. Do not declare DR scorecard complete while credential escrow evidence remains blocked.
+Full cold-start service readiness must not be declared green for the latest verified evidence set. As of 2026-06-25 09:05, routes/hosts/K3s/backups/exporters/monitoring surfaces are available, AWOOOI API is healthy, and MOMO service health is healthy on `V10.655`, but the latest repo-side live read-only cold-start scorecard is `PASS=87 WARN=1 BLOCKED=1`: BLOCKED because MOMO business data is stale beyond 3 days, and WARN because Google Drive token metadata is missing / writeback not confirmed. The hard blocker remains `188 momo source file absent while daily sales data stale`; the token state is a separate WARN and not a reason to read token contents. This is repo-side source-of-truth evidence and not yet a claim that the 110 live monitor script was deployed. Do not declare the DR scorecard complete while credential escrow evidence remains blocked.
 
 2026-06-13 01:26 refresh: full cold-start is again green for the current evidence set. AWOOOI API/Web workload balancing survived the next normal CD deploy: Gitea main `e4a349bc`, ArgoCD revision `e4a349bc`, images from `414413a5`, API/Web split across `mon` / `mon1`, and global `known_hosts` retained 120 / 188 after CD fix `80e6ec1a`. Do not declare DR complete while credential escrow is missing. `km-vectorize` remediation is `90%`: schedule/label fix is live, and the remaining gate is the next official 03:00 CronJob success readback.
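The release / DR separation that this change keeps restating can be sketched as a small classifier over the readback strings. This is a hypothetical Python sketch, not the logic of the real `full-stack-cold-start-check.sh` or `backup-status.sh`. The function names, the `GateResult` type, and the field order assumed for `MOMO_MONTHLY_SYNC` (two row counts, then min/max dates for each table) are all illustrative assumptions. Only the 3-day freshness threshold and the sample strings come from the documents above.

```python
"""Hypothetical sketch of the cold-start release / DR gate separation.

Assumptions (not taken from the real scripts): function and type names are
illustrative; MOMO_MONTHLY_SYNC is assumed to be
count_a|count_b|min_a|max_a|min_b|max_b; the 3-day limit mirrors the SOP text.
"""
from dataclasses import dataclass, field
from datetime import date

FRESHNESS_LIMIT_DAYS = 3  # "stale beyond 3 days" per the workplan wording


def parse_kv(text: str) -> dict[str, str]:
    """Collect key=value tokens such as 'core_blockers=0 escrow_missing=5'."""
    pairs: dict[str, str] = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:
            pairs[key] = value
    return pairs


def parse_freshness(line: str) -> tuple[int, date]:
    """'MOMO_DAILY_FRESHNESS 8|2026-06-17' -> (8, date(2026, 6, 17))."""
    _, payload = line.split(maxsplit=1)
    age, latest = payload.split("|")
    return int(age), date.fromisoformat(latest)


def monthly_sync_matches(line: str) -> bool:
    """True when both tables report the same row count and date bounds."""
    _, payload = line.split(maxsplit=1)
    count_a, count_b, min_a, max_a, min_b, max_b = payload.split("|")
    return count_a == count_b and (min_a, max_a) == (min_b, max_b)


@dataclass
class GateResult:
    backup_green: bool
    service_release: str  # "GO" or "NO-GO"
    dr_complete: bool
    findings: list[str] = field(default_factory=list)


def evaluate(backup_summary: str, freshness_line: str, sync_line: str,
             token_metadata_present: bool) -> GateResult:
    kv = parse_kv(backup_summary)
    backup_green = (kv.get("core_blockers") == "0"
                    and kv.get("integrity_stale") == "0"
                    and kv.get("offsite_fresh") == "1"
                    and kv.get("rclone_gdrive_fresh") == "1")
    findings: list[str] = []
    age, latest = parse_freshness(freshness_line)
    if age > FRESHNESS_LIMIT_DAYS:
        findings.append(f"BLOCKED: MOMO business data stale ({age}d, latest {latest})")
    if not monthly_sync_matches(sync_line):
        findings.append("BLOCKED: MOMO current-month parity mismatch")
    if not token_metadata_present:
        # Only a WARN; the fix is owner-gated evidence, never reading the token.
        findings.append("WARN: Google Drive token metadata missing")
    blocked = any(f.startswith("BLOCKED") for f in findings)
    # Backup green alone never makes DR complete: escrow evidence must exist.
    dr_complete = backup_green and kv.get("escrow_missing") == "0"
    return GateResult(backup_green, "NO-GO" if blocked else "GO", dr_complete, findings)


# Usage with the 2026-06-25 09:05 readback values quoted above.
result = evaluate(
    "core_blockers=0 integrity_stale=0 offsite_fresh=1 rclone_gdrive_fresh=1 escrow_missing=5",
    "MOMO_DAILY_FRESHNESS 8|2026-06-17",
    "MOMO_MONTHLY_SYNC 10936|10936|2026-06-01|2026-06-17|2026-06-01|2026-06-17",
    token_metadata_present=False,
)
print(result.backup_green, result.service_release, result.dr_complete)  # prints: True NO-GO False
```

The three verdicts are computed independently on purpose. This mirrors the rule in these documents that backup green, route 200s, or DB parity never imply full-stack green or DR complete, and that a missing token is reported as a WARN rather than resolved by reading secrets.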