From 8252099d9ccab2ced9c04c9c758ae538302883e6 Mon Sep 17 00:00:00 2001
From: ogt
Date: Thu, 25 Jun 2026 19:10:33 +0800
Subject: [PATCH] docs(ops): record latest reboot recovery readback [skip ci]

---
 docs/LOGBOOK.md                               | 44 +++++++++++++++++++
 docs/runbooks/FULL-STACK-COLD-START-SOP.md    | 15 ++++---
 ...oot-cold-start-backup-recovery-workplan.md | 10 ++---
 3 files changed, 57 insertions(+), 12 deletions(-)

diff --git a/docs/LOGBOOK.md b/docs/LOGBOOK.md
index c59dddfd..c33742b7 100644
--- a/docs/LOGBOOK.md
+++ b/docs/LOGBOOK.md
@@ -32,6 +32,50 @@
 - AI Agent automation overall: stays at `42.2%` because Telegram live send, Bot API, report receipt write, runtime write, provider switch, OpenClaw replacement, secret read, and destructive operations all remain `0 / false`.
 **Boundary**: this round only added frontend visualization and public-text masking. It did not send Telegram messages, call the Telegram Bot API, write report receipts, run live queries, perform runtime writes, switch providers, replace OpenClaw, read secrets, restart services, or change K8s / Nginx / firewall, and it did not expose the Codex work window or raw conversation content on the frontend.

+## 2026-06-25|19:06 Full-stack reboot recovery readback after the latest deploy marker
+
+**Background**: after all hosts were rebooted, the user asked to confirm, as quickly, completely, and correctly as possible, that every host, service, product, website, tool, and package had returned to normal. The previous 18:53 evidence was already service green, but `gitea/main` then advanced to a new deploy marker, so this round re-ran read-only verification against the latest production truth instead of reusing the old marker.
+
+**Latest production truth**:
+- Latest `gitea/main`: `510d94d1 docs(logbook): record agent professional judgment matrix rollout [skip ci]`; docs-only, does not change the production runtime.
+- Latest production deploy marker: `d8ca8224 chore(cd): deploy 9dbe044 [skip ci]`.
+- Source commit: `9dbe044e fix(web): 遮罩 Agent readback 來源逐字內容標籤` (masks the verbatim source-content labels in Agent readback).
+- ArgoCD `awoooi-prod`: `Synced / Healthy`, revision `d8ca822422021d0fda8da8fa4c354c0c4db7ff22`.
+- Live images: `awoooi-api`, `awoooi-web`, and `awoooi-worker` use `9dbe044ea1e8e3894ccbeb5ed760bb124b87f7be`.
+- Deployments: API `2/2`, Web `2/2`, Worker `1/1`.
+
+**Host / K3s / workload evidence**:
+- `scripts/reboot-recovery/post-start-quick-check.sh --no-color` at `19:05` returned `POST_START_QUICK_CHECK PASS=18 WARN=3 BLOCKED=0`, warning split `SERVICE=0 BOUNDARY=1 EVIDENCE=2`, `RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`.
+- Delegated cold-start scorecard: `PASS=89 WARN=0 BLOCKED=0`, result `GREEN`.
+- 110 / 120 / 121 / 188 ping and SSH port: all OK.
+- K3s `mon` / `mon1` are both `Ready control-plane`, VIP `192.168.0.125` is present, and node filesystem / disk-pressure / readonly events are `0`.
+- The 19:06 readback confirms API and Web are both spread across `mon` / `mon1`, the single Worker replica is on `mon`, and `topologySpreadConstraints` is still `minDomains=2` + `DoNotSchedule`.
+
+**Public route / product route evidence**:
+- 19:05 quick-check route batch: AWOOOI API, IwoooS, MOMO health, and Stock all returned `200`; the cold-start route gate also confirmed AWOOOI web `307` and expected statuses for MOMO, Gitea, Harbor, Registry, Sentry, SigNoz, Langfuse, Bitan, and AIOps.
+- Five consecutive external readback rounds from 19:05:38 to 19:06:24: AWOOOI API, IwoooS, VibeWork, AwoooGo, MOMO health, Stock, and Bitan all returned `200`.
+- Verdict: no persistent `502` under the latest deploy marker. The earlier AwoooGo / Stock `502` reads are classified as a post-deploy upstream warmup transient; the SOP must wait for the final deploy marker, a healthy upstream, and consecutive route readbacks before declaring recovery.
+
+**MOMO / DB freshness evidence**:
+- MOMO health version: `V10.690`.
+- `momo-pro-system`, `momo-scheduler`, and `momo-telegram-bot` are all healthy; scheduler restart count `0`.
+- `daily_sales_snapshot=109061|2025-07-01|2026-06-24`.
+- `realtime_sales_monthly` / current-month snapshot parity: `15383|15383|2026-06-01|2026-06-24|2026-06-01|2026-06-24`.
+- `DB_DAILY_FRESHNESS 1|2026-06-24`.
+- Latest import job: `57 completed|即時業績_當日.xlsx|2026-06-25T13:16:47.359958|2026-06-25T13:18:02.964985|15383|15383|0`.
+
+**Backup / DR evidence**:
+- Backup status: 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, `integrity_stale=0`, `offsite_fresh=1`, `rclone_gdrive_fresh=1`.
+- Credential escrow is still `escrow_missing=5`; DR complete must not be claimed.
+
+**110 CPU / load evidence**:
+- 110 load is still elevated at about `14.51 / 12.34 / 11.42`; the main sources are Gitea Actions cache compression / `zstdmt` / `tar`, StockPlatform headless Chrome smoke / CI, Gitea, AWOOOI API, ClickHouse, Docker, and platform services.
+- This round killed no processes and performed no Docker/systemd/Nginx/firewall/K8s/ArgoCD write operations. Any later Chrome cleanup must first distinguish orphan process groups from active CI smoke runs; active CI / deploy load must not be killed directly.
+
+**Verdict**:
+- Under the latest marker, hosts, K3s, AWOOOI, MOMO, Stock, AwoooGo, VibeWork, Bitan, Gitea, Harbor, Registry, Sentry, SigNoz, Langfuse, AIOps, and the core backup/offsite/monitoring surfaces have recovered to service green.
+- May claim: `FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, i.e. the service surface is fully green but DR escrow evidence is still missing.
+- Must not claim: DR complete, credential escrow complete, Wazuh manager registry recovered, active response / host write / Kali active scan authorized, or that every future reboot is guaranteed to come up green.

 ## 2026-06-25|Wazuh owner evidence precheck adds a separate Dashboard API column

diff --git a/docs/runbooks/FULL-STACK-COLD-START-SOP.md b/docs/runbooks/FULL-STACK-COLD-START-SOP.md
index 0aea43d8..3150fca5 100644
--- a/docs/runbooks/FULL-STACK-COLD-START-SOP.md
+++ b/docs/runbooks/FULL-STACK-COLD-START-SOP.md
@@ -1,6 +1,6 @@
 # AWOOOI Full-Stack Cold Start and Host Reboot SOP

-> Version: v1.55
+> Version: v1.56
 > Last updated: 2026-06-25 Asia/Taipei
 > Scope: 110 / 120 / 121 / 188 full-stack reboot recovery. 112 Kali is recorded as P3 optional and is not part of this recovery path.

@@ -12,21 +12,22 @@

 If you only need to decide quickly after a reboot whether recovery can be claimed, first run the one-page overall check `scripts/reboot-recovery/post-start-quick-check.sh --no-color`, and use `docs/runbooks/REBOOT-POST-START-QUICK-CHECK.md` as the manual fallback. The long SOP keeps the full background, exception handling, and Plan B; the short wrapper / checklist handles the fixed verdict within T+10 minutes on every reboot.

-2026-06-25 18:23 post-CD live read-only refresh supersedes the 15:04 wrapper wording. Consecutive main pushes caused older CD runs to be replaced, so the latest production truth is the final deploy marker `2a9e816a chore(cd): deploy aa70835 [skip ci]`, Gitea CD for `aa70835` has produced deploy marker `2a9e816a`, read-only ArgoCD shows `awoooi-prod Synced / Healthy` at revision `2a9e816a9db6e428e1f497c7e4a1759bb2f63d25`; API/Web/Worker live image tag `aa70835c7177475430479d8ab68621f59ebeb9b0`; 18:23 K3s pods are Running and cold-start result is GREEN, and post-start quick check `RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`. Hosts, routes, K3s, AWOOOI API health, MOMO service health, MOMO business data freshness, backup core/offsite, and core monitoring/exporter surfaces are green for controlled runner/CD release. MOMO is healthy on `V10.690`; latest import job `57` completed cleanly; `MOMO_DAILY_FRESHNESS 1|2026-06-24`; current-month daily snapshot and realtime tables match through `2026-06-24`. `post-start-quick-check.sh` parses cold-start `PASS / WARN / BLOCKED` summary before classifying exit codes, so WARN-only rollout/stale evidence is no longer inflated into a service blocker. The wrapper returns `RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED` when service blockers are zero but `escrow_missing=5` remains. Do not turn this into a DR complete or security/runtime acceptance claim. Wazuh production routes are now `200 disabled_waiting_iwooos_wazuh_owner_gate`, but `configured=false`, manager query accepted `0`, manager registry accepted `0`, and runtime gate `0`; treat Wazuh as a security registry evidence blocker, not a reboot service blocker.
+2026-06-25 19:06 post-CD live read-only refresh supersedes the 18:53 wrapper wording. Consecutive main pushes caused older deploy markers to be replaced, so the latest production truth is deploy marker `d8ca8224 chore(cd): deploy 9dbe044 [skip ci]`. Read-only ArgoCD shows `awoooi-prod Synced / Healthy` at revision `d8ca822422021d0fda8da8fa4c354c0c4db7ff22`; API/Web/Worker live image tag `9dbe044ea1e8e3894ccbeb5ed760bb124b87f7be`; API `2/2`, Web `2/2`, Worker `1/1`. The 19:05 post-start quick check returns `RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, delegated cold-start remains `PASS=89 WARN=0 BLOCKED=0`, and 19:05-19:06 route stability checks confirm AWOOOI API, IwoooS, AwoooGo, Stock, VibeWork, Bitan, and MOMO health all return `200` for five consecutive external reads. Earlier AwoooGo / Stock `502` reads were post-deploy upstream warmup transients, not persistent service failures. Hosts, routes, K3s, AWOOOI API health, MOMO service health, MOMO business data freshness, backup core/offsite, and core monitoring/exporter surfaces are green for controlled runner/CD release. MOMO is healthy on `V10.690`; latest import job `57` completed cleanly; `MOMO_DAILY_FRESHNESS 1|2026-06-24`; current-month daily snapshot and realtime tables match through `2026-06-24`. `post-start-quick-check.sh` parses cold-start `PASS / WARN / BLOCKED` summary before classifying exit codes, so WARN-only rollout/stale evidence is no longer inflated into a service blocker. The wrapper returns `RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED` when service blockers are zero but `escrow_missing=5` remains. Do not turn this into a DR complete or security/runtime acceptance claim. Wazuh production routes are now `200 disabled_waiting_iwooos_wazuh_owner_gate`, but `configured=false`, manager query accepted `0`, manager registry accepted `0`, and runtime gate `0`; treat Wazuh as a security registry evidence blocker, not a reboot service blocker.

 ```text
 Repo-side reboot SOP / Plan B / automation contracts: COMPLETE, 100%.
-Live cold-start read-only check: 2026-06-25 18:23 wrapper delegated cold-start PASS=89 WARN=0 BLOCKED=0, Result=GREEN.
-Post-start quick check: 2026-06-25 18:23 PASS=18 WARN=3 BLOCKED=0; warning split SERVICE=0 BOUNDARY=1 EVIDENCE=2; Result=FULL_STACK_GREEN_DR_ESCROW_BLOCKED; exit code 0.
+Live cold-start read-only check: 2026-06-25 19:05 wrapper delegated cold-start PASS=89 WARN=0 BLOCKED=0, Result=GREEN.
+Post-start quick check: 2026-06-25 19:05 PASS=18 WARN=3 BLOCKED=0; warning split SERVICE=0 BOUNDARY=1 EVIDENCE=2; Result=FULL_STACK_GREEN_DR_ESCROW_BLOCKED; exit code 0.
 Repo-side cold-start v1.42+ live read-only run: MOMO source absence / stale data blocker is cleared by import job 57 and `MOMO_DAILY_FRESHNESS 1|2026-06-24`. Live 110 script sync is not claimed until a separate approved deployment/sync happens.
 110 live-sync parity: 2026-06-24 23:15 read-only `verify-cold-start-monitor-deploy.sh` correctly BLOCKED because repo script hash `f60b81029969a527dc742ebc9558d2933f11fe24ec4f46f7a7bc6637759b7b05` differs from 110 live hash `10608873d406911a519afa96218abebc2b85ab6123bdf46b6e21eb269e554bb8`. Do not use live 110 monitor output to prove v1.42 behavior until the approved live-sync gate in §13.3.1 passes.
 Service state: FULL_STACK_GREEN_DR_ESCROW_BLOCKED; 110/120/121/188 reachable, K3s mon/mon1 Ready, public routes/TLS green, MOMO data fresh, 110/188 backup health fresh, 188 node-exporter / PostgreSQL exporter / Redis exporter restored, 188 MinIO endpoint and Velero BackupStorageLocation restored, 110 disk pressure cleared.
-Runtime release state: API/Web live image tag is `aa70835c7177475430479d8ab68621f59ebeb9b0`, and 18:23 K3s readback shows API/Web/Worker pods Running; production API health returns healthy with `environment=prod`, `mock_mode=false`, and postgresql / redis / openclaw / signoz / gcp ollama providers up. 18:23 direct route smoke returned 200 for AWOOOI API, `/zh-TW/iwooos`, `/zh-TW/governance?tab=automation-inventory`, MOMO health, and Stock; cold-start raw route gate returned all expected route statuses, including redirects such as awoooi web=307 and sentry=302.
+Runtime release state: API/Web/Worker live image tag is `9dbe044ea1e8e3894ccbeb5ed760bb124b87f7be`, and 19:06 K3s readback shows API/Web/Worker pods Running; production API health returns healthy with `environment=prod`, `mock_mode=false`, and postgresql / redis / openclaw / signoz / gcp ollama providers up. 19:05 route smoke returned 200 for AWOOOI API, IwoooS, MOMO health, and Stock; cold-start route gate also returned expected statuses for AWOOOI web, MOMO, Gitea, Harbor, Registry, Sentry, SigNoz, Langfuse, Bitan, and AIOps. AwoooGo, Stock, AWOOOI API, IwoooS, VibeWork, MOMO health, and Bitan then returned 200 for five consecutive external route reads from 19:05:38 to 19:06:24. Cold-start raw route gate returned all expected route statuses, including redirects such as awoooi web=307 and sentry=302.
 MOMO release state: mo.wooo.work health is healthy on version V10.690. `momo-pro-system`, `momo-scheduler`, and `momo-telegram-bot` are healthy; scheduler `RestartCount=0`. 18:23 dedicated preflight returns PASS=19 WARN=2 BLOCKED=0, so retain recent container-replace / scheduler fail-closed / notification evidence notes, but no service blocker remains.
 MOMO data state: current-month daily_sales_snapshot and realtime_sales_monthly match through 2026-06-24: `daily_sales_snapshot=109061|2025-07-01|2026-06-24`, `MOMO_MONTHLY_SYNC 15383|15383|2026-06-01|2026-06-24|2026-06-01|2026-06-24`, and `MOMO_DAILY_FRESHNESS 1|2026-06-24`. Latest import job is `57 completed|即時業績_當日.xlsx|2026-06-25T13:16:47.359958|2026-06-25T13:18:02.964985|15383|15383|0`.
 Google Drive / source-file state: 14:16 cold-start reports `MOMO_GDRIVE_TOKEN_STAT 100000:100000:600 scheduler_uid=100000`. Dedicated preflight confirms host token metadata matches scheduler UID and restrictive mode; container token artifact exists with mode `600`. Token content was not read. Future Drive auth/API failure must still be treated as failed import evidence rather than no-file success.
-110 CPU/load readback: 2026-06-25 10:58 user-approved minimal SIGTERM targeted only orphan `stockplatform-review-bulk-ux` Chrome process groups `438005`, `471295`, `640155`, and `670628`; `OLD_GROUPS_REMAINING` returned empty. 18:23 readback shows current higher load is mainly StockPlatform `next build`, StockPlatform headless Chrome smoke, and platform services; AWOOOI CD action container was visible only as active CI/deploy load. No Docker/systemd/Nginx/firewall/K8s write was performed; do not cancel active CI/smoke unless separately approved.
-Backup / monitoring state: 18:23 wrapper readback confirms backup core blockers are 0, 110 is 13/13 fresh failed=0, 188 is 2/2 fresh failed=0, offsite_fresh=1, rclone_gdrive_fresh=1, integrity_stale=0, last aggregate is 2026-06-25 02:35:09, and escrow_missing=5.
+110 CPU/load readback: 2026-06-25 10:58 user-approved minimal SIGTERM targeted only orphan `stockplatform-review-bulk-ux` Chrome process groups `438005`, `471295`, `640155`, and `670628`; `OLD_GROUPS_REMAINING` returned empty. 19:05 readback shows current higher load is mainly Gitea Actions cache save / `zstdmt` / `tar`, StockPlatform headless Chrome smoke / CI, Gitea, AWOOOI API, ClickHouse, Docker, and platform services. No Docker/systemd/Nginx/firewall/K8s write was performed; do not cancel active CI/smoke unless separately approved. If Chrome groups are active children of Playwright / CI, observe the queue and timeouts; if they become PPID 1 orphan process groups with sustained CPU and no parent smoke run, perform a dry run first and require owner approval before a targeted `SIGTERM`.
+Backup / monitoring state: 19:05 wrapper readback confirms backup core blockers are 0, 110 is 13/13 fresh failed=0, 188 is 2/2 fresh failed=0, offsite_fresh=1, rclone_gdrive_fresh=1, integrity_stale=0, last aggregate is 2026-06-25 02:35:09, and escrow_missing=5.
+Route transient handling: post-deploy `502` on Stock or AwoooGo is a blocker only if it persists after upstream container health is ready and 3-5 consecutive external route reads still fail. For AwoooGo, live upstream is on 110 `192.168.0.110:32190`; do not test only `127.0.0.1` on 110 because the listener may bind only to the host address. For K3s workload balancing, wait for terminating pods to disappear before judging API/Web placement; the final required state for two-replica API/Web is a split across `mon` and `mon1`.
 Notification-noise state: healthy AWOOOI heartbeat is suppressed; heartbeat warning dedupe uses stable actionable fingerprints so HTTP status / timeout / latency drift does not create a new Telegram event every 30 minutes; MOMO Pro monitor uses https://mo.wooo.work/health as primary truth and no longer checks the 188 root path; MoWoooWorkDown now labels component=momo-pro-system and requires public/local/container/data-freshness triage instead of blind restart; docker-health-monitor keeps 5-minute repair cadence but has a separate 30-minute Telegram fallback cooldown; Bitan public-content check keeps failure alerting with same-fingerprint cooldown and one recovery notice.
 Deploy storm / CD replacement state: if several main commits land during recovery, older CD runs may be canceled by newer commits. Do not treat the canceled run as a service failure. Wait for the final deploy marker, verify live image tags, ArgoCD health, public routes, DB freshness, backup status, and post-start quick check before declaring latest production recovered.
 Wazuh / SOC boundary state: production Wazuh read-only route presence is not equivalent to Wazuh registry recovery. `/api/iwooos/wazuh` and `/api/v1/iwooos/wazuh` returning `200 disabled_waiting_iwooos_wazuh_owner_gate` only proves the route boundary is deployed; manager registry accepted, owner evidence accepted, active response, host write, agent re-enroll, restart, secret patch, Kali active scan, and runtime gate remain `0 / false`.
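The SOP's route transient handling rule (a post-deploy `502` counts as a blocker only if it persists after the upstream is healthy and 3-5 consecutive external reads still fail) can be sketched as a small classifier. This is a hypothetical Python illustration, not the logic inside `post-start-quick-check.sh`; the function name `classify_route`, the `WAIT` / `RECOVERED` / `BLOCKER` labels, and the default window of three reads are assumptions.

```python
from typing import Sequence

# Hypothetical default; the SOP only says "3-5 consecutive external route reads".
DEFAULT_WINDOW = 3


def classify_route(statuses: Sequence[int], upstream_ready: bool,
                   expected: int = 200, window: int = DEFAULT_WINDOW) -> str:
    """Classify external route reads taken after a deploy (oldest first).

    expected: the status the route gate treats as healthy (200, or e.g. 307
    for a redirecting web route). Returns a hypothetical label:
    "WAIT", "RECOVERED", or "BLOCKER".
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    # Upstream still warming up: a 502 here is not yet evidence of failure.
    if not upstream_ready:
        return "WAIT"
    # Too few reads to judge either way.
    if len(statuses) < window:
        return "WAIT"
    recent = statuses[-window:]
    if all(code == expected for code in recent):
        return "RECOVERED"
    if all(code != expected for code in recent):
        return "BLOCKER"
    # Mixed results: keep reading instead of declaring either outcome.
    return "WAIT"


if __name__ == "__main__":
    # Warmup 502s followed by five 200s, similar to the 19:05-19:06 readback.
    print(classify_route([502, 502, 200, 200, 200, 200, 200], upstream_ready=True))
```

Only the most recent `window` reads are judged, so warmup `502`s that precede a healthy run do not count against the route, while a run of failures after the upstream is ready does.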
diff --git a/docs/workplans/2026-06-04-reboot-cold-start-backup-recovery-workplan.md b/docs/workplans/2026-06-04-reboot-cold-start-backup-recovery-workplan.md
index c6d9fcb5..92922120 100644
--- a/docs/workplans/2026-06-04-reboot-cold-start-backup-recovery-workplan.md
+++ b/docs/workplans/2026-06-04-reboot-cold-start-backup-recovery-workplan.md
@@ -11,15 +11,15 @@

 | Area | Status | Completion | Evidence |
 |------|--------|------------|----------|
-| Overall recovery readiness | FULL_STACK_GREEN_DR_ESCROW_BLOCKED | 99% | 2026-06-25 18:23 post-CD quick check returned exit `0`, `POST_START_QUICK_CHECK PASS=18 WARN=3 BLOCKED=0`, warning split `SERVICE=0 BOUNDARY=1 EVIDENCE=2`, result `FULL_STACK_GREEN_DR_ESCROW_BLOCKED`: latest deploy marker is `2a9e816a chore(cd): deploy aa70835 [skip ci]`; latest main deploy marker `2a9e816a` is present after `aa70835`; read-only ArgoCD shows `awoooi-prod Synced / Healthy` at revision `2a9e816a9db6e428e1f497c7e4a1759bb2f63d25`; API/Web/Worker images are `aa70835c7177475430479d8ab68621f59ebeb9b0`; 110 / 120 / 121 / 188 ping and SSH port are OK, K3s `mon` / `mon1` are Ready, public routes/TLS are green, AWOOI API health is healthy/prod/mock=false, delegated cold-start is `PASS=89 WARN=0 BLOCKED=0`, MOMO service health is healthy on `V10.690`, MOMO data freshness is `1|2026-06-24`, 110 / 188 runtime and backup checks are green. MOMO latest valid job `57` completed cleanly at `2026-06-25T13:18:02`, `15383/15383/0`, and current-month snapshot / realtime bounds match through `2026-06-24`. DR remains blocked because credential escrow evidence markers are still missing (`escrow_missing=5`) and must not be forged. |
+| Overall recovery readiness | FULL_STACK_GREEN_DR_ESCROW_BLOCKED | 99% | 2026-06-25 19:05 post-CD quick check returned exit `0`, `POST_START_QUICK_CHECK PASS=18 WARN=3 BLOCKED=0`, warning split `SERVICE=0 BOUNDARY=1 EVIDENCE=2`, result `FULL_STACK_GREEN_DR_ESCROW_BLOCKED`: latest deploy marker is `d8ca8224 chore(cd): deploy 9dbe044 [skip ci]`; read-only ArgoCD shows `awoooi-prod Synced / Healthy` at revision `d8ca822422021d0fda8da8fa4c354c0c4db7ff22`; API/Web/Worker images are `9dbe044ea1e8e3894ccbeb5ed760bb124b87f7be`; 110 / 120 / 121 / 188 ping and SSH port are OK, K3s `mon` / `mon1` are Ready, public routes/TLS are green, AWOOOI API health is healthy/prod/mock=false, delegated cold-start is `PASS=89 WARN=0 BLOCKED=0`, MOMO service health is healthy on `V10.690`, MOMO data freshness is `1|2026-06-24`, 110 / 188 runtime and backup checks are green. MOMO latest valid job `57` completed cleanly at `2026-06-25T13:18:02`, `15383/15383/0`, and current-month snapshot / realtime bounds match through `2026-06-24`. AwoooGo, Stock, AWOOOI API, IwoooS, VibeWork, MOMO health, and Bitan returned `200` for 5 consecutive reads from 19:05-19:06; final K3s readback confirms API/Web split across `mon` / `mon1`. DR remains blocked because credential escrow evidence markers are still missing (`escrow_missing=5`) and must not be forged. |
 | P0 host / K3s recovery | DONE | 100% | 120 booted after console fsck at `2026-06-12 15:13`; latest 2026-06-25 09:05 readback shows 120 is reachable, K3s is active, `mon` and `mon1` are both `Ready control-plane`, VIP `192.168.0.125` is present, node filesystem / disk-pressure / readonly events are `0`, and latest `km-vectorize-29705460-55rgs` completed. |
 | P1 backup / alert / escrow | BLOCKED_DR_ESCROW | 97% | 2026-06-25 09:05 backup / alert readback shows 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, `integrity_stale=0`, `offsite_fresh=1`, `rclone_gdrive_fresh=1`, `escrow_missing=5`, last aggregate `2026-06-25 02:35:09`. DR remains blocked on real non-secret credential escrow evidence IDs. |
-| P2 service / data truth | GREEN | 100% | Public route/TLS, API/Web route, MOMO health `V10.690`, MOMO main / CD `#904` monthly-sync failure boundary, MOMO main / CD `#910` Drive-auth fail-closed boundary, direct 18:23 wrapper public route smoke all expected 2xx/3xx, current-month parity `15383|15383|2026-06-01|2026-06-24|2026-06-01|2026-06-24`, backup exporters, schedules, K3s node readiness/storage conditions, VIP, and 110 / 188 runtime health are green. 18:23 preflight confirms app / scheduler / Telegram bot healthy, scheduler restart count `0`, token metadata aligned to scheduler UID, latest job `57` completed cleanly, and `DB_DAILY_FRESHNESS 1|2026-06-24`. |
-| P3 docs / automation contracts | DONE_WITH_POST_CD_DEPLOY_STORM_READBACK | 100% | Workplan, SOP v1.55, one-page post-start quick check wrapper + fallback runbook, BACKUP-STATUS, LOGBOOK, 120 console/fsck recovery, Gitea backup stale-dump hardening, reboot ledger/version-comparison SOP, escrow evidence audit, 188 nginx Ansible baseline, 110 cold-start detector script, startup judgment layers, GO/NO-GO tree, host recovery cards, explicit Plan B degraded-operation path, machine-readable `plan_b` baseline, readiness-audit Plan B guard, B0-B5 service levels, T+0/T+120 fallback timeline checks, host role / load-balancing assessment, CD `known_hosts` guardrail, `fwupd-refresh.timer` rollback note, K3s filesystem event blocker, AWOOI backup no-direct-offsite-sync contract, 110/188 Ansible source-of-truth, Gitea self-hosted readiness validation workflow, post-CD no-regression readbacks, stale-vs-active K8s failed Job classification, 110 runaway browser / CI load AIOps exporter + alert + gated remediation PlayBook, Telegram / AI event packet mapping, healthy heartbeat Telegram suppression, MOMO scheduler / current-month detector fix, 188 node-exporter restore helper, 188 DB/Redis exporter restore helper, 188 MinIO/Velero restore helper, 188 nginx-exporter restore helper, 110 Docker disk pressure cleanup boundary, MOMO Google Drive token userns readback, MOMO data freshness hard blocker, MOMO Pro false-noise health monitor source-of-truth, docker-health direct Telegram fallback cooldown, Bitan public-content same-fingerprint cooldown, notification-noise readback, MOMO source-file absence decision gate with scheduler stats / import_config / job 56 evidence, repo-side cold-start v1.42 source absence classifier, live-sync parity gate, MOMO import-boundary production deploy, MOMO Drive-auth fail-closed production deploy, 10:04 scheduler fail-closed live proof, 10:35 route / DB / backup refresh, 11:44 MOMO dedicated preflight blocked readback, 14:16 MOMO dedicated preflight recovery on V10.674 / job 57 / freshness 1, 14:41 wrapper warning split, 15:04 cold-start WARN-only classifier fix, 18:23 deploy-storm replacement readback after marker `2a9e816a`, 10:58 user-approved 110 orphan Chrome SIGTERM evidence, MacBook Pro Codex safe artifact sync readback, and 2026-06-25 live refresh with full cold-start GREEN are updated. 2026-06-24 23:15 read-only verify still shows repo cold-start hash `f60b81029969a527dc742ebc9558d2933f11fe24ec4f46f7a7bc6637759b7b05` differs from 110 live hash `10608873d406911a519afa96218abebc2b85ab6123bdf46b6e21eb269e554bb8`; live 110 script sync of the v1.42 classifier is not claimed until separately approved and recorded. |
+| P2 service / data truth | GREEN | 100% | Public route/TLS, API/Web route, MOMO health `V10.690`, MOMO main / CD `#904` monthly-sync failure boundary, MOMO main / CD `#910` Drive-auth fail-closed boundary, direct 19:05 wrapper public route smoke all expected 2xx/3xx, five-iteration AwoooGo / Stock / AWOOOI / IwoooS / VibeWork / MOMO / Bitan route stability reads, current-month parity `15383|15383|2026-06-01|2026-06-24|2026-06-01|2026-06-24`, backup exporters, schedules, K3s node readiness/storage conditions, VIP, and 110 / 188 runtime health are green. 19:05 preflight confirms app / scheduler / Telegram bot healthy, scheduler restart count `0`, token metadata aligned to scheduler UID, latest job `57` completed cleanly, and `DB_DAILY_FRESHNESS 1|2026-06-24`. |
+| P3 docs / automation contracts | DONE_WITH_POST_CD_DEPLOY_STORM_READBACK | 100% | Workplan, SOP v1.56, one-page post-start quick check wrapper + fallback runbook, BACKUP-STATUS, LOGBOOK, 120 console/fsck recovery, Gitea backup stale-dump hardening, reboot ledger/version-comparison SOP, escrow evidence audit, 188 nginx Ansible baseline, 110 cold-start detector script, startup judgment layers, GO/NO-GO tree, host recovery cards, explicit Plan B degraded-operation path, machine-readable `plan_b` baseline, readiness-audit Plan B guard, B0-B5 service levels, T+0/T+120 fallback timeline checks, host role / load-balancing assessment, CD `known_hosts` guardrail, `fwupd-refresh.timer` rollback note, K3s filesystem event blocker, AWOOOI backup no-direct-offsite-sync contract, 110/188 Ansible source-of-truth, Gitea self-hosted readiness validation workflow, post-CD no-regression readbacks, stale-vs-active K8s failed Job classification, 110 runaway browser / CI load AIOps exporter + alert + gated remediation PlayBook, Telegram / AI event packet mapping, healthy heartbeat Telegram suppression, MOMO scheduler / current-month detector fix, 188 node-exporter restore helper, 188 DB/Redis exporter restore helper, 188 MinIO/Velero restore helper, 188 nginx-exporter restore helper, 110 Docker disk pressure cleanup boundary, MOMO Google Drive token userns readback, MOMO data freshness hard blocker, MOMO Pro false-noise health monitor source-of-truth, docker-health direct Telegram fallback cooldown, Bitan public-content same-fingerprint cooldown, notification-noise readback, MOMO source-file absence decision gate with scheduler stats / import_config / job 56 evidence, repo-side cold-start v1.42 source absence classifier, live-sync parity gate, MOMO import-boundary production deploy, MOMO Drive-auth fail-closed production deploy, 10:04 scheduler fail-closed live proof, 10:35 route / DB / backup refresh, 11:44 MOMO dedicated preflight blocked readback, 14:16 MOMO dedicated preflight recovery on V10.674 / job 57 / freshness 1, 14:41 wrapper warning split, 15:04 cold-start WARN-only classifier fix, 18:23 deploy-storm replacement readback after marker `2a9e816a`, 18:53 latest marker `cc835df5` readback, 19:06 latest marker `d8ca8224` readback, transient AwoooGo / Stock 502 warmup classification, rollout terminating-pod workload-balanced timing note, 10:58 user-approved 110 orphan Chrome SIGTERM evidence, MacBook Pro Codex safe artifact sync readback, and 2026-06-25 live refresh with full cold-start GREEN are updated. 2026-06-24 23:15 read-only verify still shows repo cold-start hash `f60b81029969a527dc742ebc9558d2933f11fe24ec4f46f7a7bc6637759b7b05` differs from 110 live hash `10608873d406911a519afa96218abebc2b85ab6123bdf46b6e21eb269e554bb8`; live 110 script sync of the v1.42 classifier is not claimed until separately approved and recorded. |

-2026-06-25 18:23 post-CD wrapper readback supersedes the 15:04 wording: consecutive main pushes created a deploy storm where `d2caa4eb` CD `#3340` and `d52583d9` CD `#3342` were superseded by later commits. Latest production truth is deploy marker `2a9e816a chore(cd): deploy aa70835 [skip ci]`, Gitea CD for `aa70835` has produced deploy marker `2a9e816a`, ArgoCD `Synced / Healthy`, API/Web/Worker image tag `aa70835c7177475430479d8ab68621f59ebeb9b0`, direct route smoke 200 for AWOOI API / IwoooS / Governance / MOMO health / Stock, and wrapper `POST_START_QUICK_CHECK PASS=18 WARN=3 BLOCKED=0`. Repo-side cold-start returns `PASS=89 WARN=0 BLOCKED=0`; `/backup/scripts/backup-status.sh --no-notify --no-refresh` reports 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, `integrity_stale=0`, `offsite_fresh=1`, `rclone_gdrive_fresh=1`, `escrow_missing=5`; MOMO dedicated preflight returns `PASS=19 WARN=2 BLOCKED=0`; MOMO health is `V10.690`; 110 load is around `15.78 / 11.19 / 9.02`, with StockPlatform `next build`, StockPlatform headless Chrome smoke, and platform services visible, not an AWOOI service blocker. Wrapper result is `FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, not `DEGRADED`, because service warnings are `0` and only DR boundary / evidence warnings remain. Wazuh route readback is now `200 disabled_waiting_iwooos_wazuh_owner_gate`, but manager registry accepted remains `0`, so Wazuh is a security registry evidence blocker rather than a reboot service blocker.
+2026-06-25 19:06 post-CD wrapper readback supersedes the 18:53 wording: consecutive main pushes created a deploy storm where older deploy markers were superseded by later commits. Latest production truth is deploy marker `d8ca8224 chore(cd): deploy 9dbe044 [skip ci]`, ArgoCD `Synced / Healthy`, API/Web/Worker image tag `9dbe044ea1e8e3894ccbeb5ed760bb124b87f7be`, direct route smoke 200 for AWOOOI API / IwoooS / VibeWork / AwoooGo / MOMO health / Stock / Bitan and expected route-gate statuses for MOMO / Gitea / Harbor / Registry / Sentry / SigNoz / Langfuse / AIOps, and wrapper `POST_START_QUICK_CHECK PASS=18 WARN=3 BLOCKED=0`. Repo-side cold-start returns `PASS=89 WARN=0 BLOCKED=0`; `/backup/scripts/backup-status.sh --no-notify --no-refresh` reports 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, `integrity_stale=0`, `offsite_fresh=1`, `rclone_gdrive_fresh=1`, `escrow_missing=5`; MOMO dedicated preflight returns `PASS=19 WARN=2 BLOCKED=0`; MOMO health is `V10.690`; AwoooGo / Stock transient 502 reads cleared after upstream warmup and five consecutive route reads returned `200`; 110 load is around `14.51 / 12.34 / 11.42`, with Gitea Actions cache save / `zstdmt` / `tar`, StockPlatform headless Chrome smoke / CI, Gitea, AWOOOI API, ClickHouse, Docker, and platform services visible, not an AWOOOI service blocker. Wrapper result is `FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, not `DEGRADED`, because service warnings are `0` and only DR boundary / evidence warnings remain. Wazuh route readback is now `200 disabled_waiting_iwooos_wazuh_owner_gate`, but manager registry accepted remains `0`, so Wazuh is a security registry evidence blocker rather than a reboot service blocker.

-Full cold-start service readiness may now be declared GREEN for the latest verified evidence set. As of 2026-06-25 18:23, routes/hosts/K3s/backups/exporters/monitoring surfaces are available, AWOOOI API is healthy, MOMO service health is `V10.690`, and MOMO business data is fresh through `2026-06-24`. The live read-only cold-start scorecard is `PASS=89 WARN=0 BLOCKED=0`, and the post-start wrapper result is `FULL_STACK_GREEN_DR_ESCROW_BLOCKED`. Do not declare DR scorecard complete while credential escrow evidence remains blocked, and do not declare Wazuh registry recovery until manager registry evidence is accepted.
+Full cold-start service readiness may now be declared GREEN for the latest verified evidence set. As of 2026-06-25 19:06, routes/hosts/K3s/backups/exporters/monitoring surfaces are available, AWOOOI API is healthy, MOMO service health is `V10.690`, and MOMO business data is fresh through `2026-06-24`. The live read-only cold-start scorecard is `PASS=89 WARN=0 BLOCKED=0`, the post-start wrapper result is `FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, AwoooGo / Stock route stability has been rechecked after transient warmup, and final API/Web workload placement is split across `mon` / `mon1`. Do not declare DR scorecard complete while credential escrow evidence remains blocked, and do not declare Wazuh registry recovery until manager registry evidence is accepted.

 2026-06-13 01:26 refresh: full cold-start is again green for the current evidence set. AWOOOI API/Web workload balancing survived the next normal CD deploy: Gitea main `e4a349bc`, ArgoCD revision `e4a349bc`, images from `414413a5`, API/Web split across `mon` / `mon1`, and global `known_hosts` retained 120 / 188 after CD fix `80e6ec1a`. Do not declare DR complete while credential escrow is missing. `km-vectorize` remediation is `90%`: schedule/label fix is live, and the remaining gate is the next official 03:00 CronJob success readback.
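The workplan states that the wrapper result is `FULL_STACK_GREEN_DR_ESCROW_BLOCKED` rather than `DEGRADED` because service warnings are `0` and only DR boundary / evidence warnings remain. A hypothetical Python sketch of that decision order follows; it is not the code in `post-start-quick-check.sh`. The `KEY=<integer>` parsing, the order of the checks, and the `BLOCKED` and `FULL_STACK_GREEN` labels are assumptions; only `DEGRADED` and `FULL_STACK_GREEN_DR_ESCROW_BLOCKED` appear in the source.

```python
import re

# Matches counters such as PASS=18, SERVICE=0, or escrow_missing=5.
COUNTER = re.compile(r"\b([A-Za-z_]+)=(\d+)\b")


def parse_counters(text: str) -> dict[str, int]:
    """Collect every KEY=<integer> pair; a later duplicate key wins."""
    return {key: int(value) for key, value in COUNTER.findall(text)}


def classify_wrapper(text: str) -> str:
    counters = parse_counters(text)
    # Any hard blocker outranks everything else (hypothetical label).
    if counters.get("BLOCKED", 0) > 0:
        return "BLOCKED"
    # Only SERVICE warnings degrade the verdict; WARN-only evidence does not.
    if counters.get("SERVICE", 0) > 0:
        return "DEGRADED"
    # Services are green, but DR stays blocked while escrow evidence is missing.
    if counters.get("escrow_missing", 0) > 0:
        return "FULL_STACK_GREEN_DR_ESCROW_BLOCKED"
    return "FULL_STACK_GREEN"  # hypothetical label


if __name__ == "__main__":
    readback = (
        "POST_START_QUICK_CHECK PASS=18 WARN=3 BLOCKED=0\n"
        "warning split SERVICE=0 BOUNDARY=1 EVIDENCE=2\n"
        "escrow_missing=5\n"
    )
    print(classify_wrapper(readback))
```

Because `parse_counters` keys on the counter name, the sketch should be fed only the quick-check summary lines; mixing in the delegated cold-start line (`PASS=89 WARN=0 BLOCKED=0`) would overwrite the quick-check `PASS` / `WARN` / `BLOCKED` values.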