docs(ops): record reboot live readback stage verdict [skip ci]

Your Name
2026-06-18 12:21:27 +08:00
parent 5013ebb770
commit 68c528f4d9
3 changed files with 76 additions and 6 deletions


@@ -1,3 +1,25 @@
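## 2026-06-18: Reboot live cold-start readback: services available, one stale failed Job warning retained
**Background**: The reboot SOP / Plan B / repo-side readiness blocker work was pushed to `gitea/main=63d8361f`. To avoid misreporting repo-side readiness as live full green, this round then immediately reran the cold-start gate in read-only mode. It also tracked the natural convergence of an AWOOOI rollout that happened to run at the same time.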
**Live readback**:
- `bash scripts/reboot-recovery/full-stack-cold-start-check.sh --monitor-read-only --no-color --watch --interval 1 --max-attempts 1` returned `PASS=83 WARN=1 BLOCKED=0` at 12:13; result `DEGRADED`.
- P0 reachability: 110 / 120 / 121 / 188 ping and SSH port are all OK; 112 Kali remains excluded per the SOP.
- 188 data layer: PostgreSQL accepting connections, Redis `PONG`, momo health / SigNoz reachable; momo current-month snapshot and realtime parity is `10936|10936|2026-06-01|2026-06-17|2026-06-01|2026-06-17`.
- 110 registry / observability: Harbor, Gitea, Prometheus, Alertmanager and Sentry reachable; 110 failed units `0`; 110 backup health `total=13 stale=0 missing_cron=0 missing_script=0 failed_count=0 config_failed=0 integrity_total=2 integrity_stale=0`; 188 backup health `total=2 stale=0`.
- K3s: `mon` / `mon1` are `Ready control-plane`; VIP `192.168.0.125` is present on 120; `NODE_FS_ERROR_EVENTS 0`.
- Public routes / TLS: awoooi API/Web, mo, momo health, Gitea, Harbor, registry, Sentry, SigNoz, stock, Langfuse, Bitan and aiops all return 2xx/3xx, and TLS verification passes.
- Only warning: K8s retains the old failed Job `km-vectorize-29689620`.
  - It was created by the official CronJob run at 2026-06-14 03:00 (image `26b67d11...`; `Pods Statuses: 0 Active / 0 Succeeded / 1 Failed`; its Events are no longer present).
  - The later Jobs `km-vectorize-29692500`, `29693940` and `29695380` are all `Complete`. The latest one succeeded about 9 hours before this check.
- Tracking the concurrent rollout:
  - At 12:14, external health briefly returned `502` and the API/Web startup probes were not yet ready.
  - From 12:15 on, API health stayed `200 healthy`.
  - The 12:16-12:17 readback shows `awoooi-api 2/2`, `awoooi-web 2/2`, `awoooi-worker 1/1` and `awoooi-auto-repair-canary 1/1`, with all pods `1/1 Running`.
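You can reproduce the stale-versus-current judgment from the warning bullet offline, using a Job listing. The sketch below makes two assumptions:
- Job names follow the Kubernetes CronJob convention of ending in the scheduled time, expressed as minutes since the Unix epoch.
- The input is a hypothetical two-column `NAME STATUS` listing, for example one derived from a read-only `kubectl get jobs` query.

The function names are illustrative and are not part of `full-stack-cold-start-check.sh`. `job_scheduled_utc` needs GNU `date`.

```bash
#!/usr/bin/env bash
# Sketch only: decide whether a failed CronJob Job is stale evidence.
# Assumption: Job names end in "-<scheduled time in minutes since epoch>",
# as generated by the Kubernetes CronJob controller.

# Print the scheduled time (UTC) encoded in a CronJob Job name.
# Needs GNU date for the "-d @<epoch-seconds>" form.
job_scheduled_utc() {
  local suffix="${1##*-}"   # e.g. km-vectorize-29689620 -> 29689620
  date -u -d "@$(( suffix * 60 ))" +%Y-%m-%dT%H:%MZ
}

# Read "NAME STATUS" lines on stdin (hypothetical input format) and print
# every Failed Job that is older than the newest Complete Job with the same
# CronJob prefix. Read-only: it only parses text.
stale_failed_jobs() {
  awk '
    {
      n = split($1, parts, "-")
      suffix = parts[n] + 0
      prefix = substr($1, 1, length($1) - length(parts[n]) - 1)
    }
    $2 == "Complete" && suffix > latest_ok[prefix] + 0 { latest_ok[prefix] = suffix }
    $2 == "Failed" { failed[$1] = suffix; owner[$1] = prefix }
    END {
      for (name in failed)
        if (failed[name] < latest_ok[owner[name]] + 0) print name
    }
  '
}
```

Decoding `km-vectorize-29689620` yields `2026-06-13T19:00Z`, which is 2026-06-14 03:00 Asia/Taipei (UTC+8). This matches the official 03:00 run named in the warning. A failed Job is reported only when a newer Job with the same prefix is `Complete`. A fresh failure is therefore not hidden as "stale".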
**Completion sync**:
- Reboot SOP / Plan B / repo-side automation contracts: `100%`.
- Live service availability after read-only check: `SERVICE_AVAILABLE_DEGRADED`; hard blockers `0`.
- Full cold-start green: `NO-GO` because `WARN=1`. This must be labeled clearly as a stale failed Job warning and must never be reported as `WARN=0`.
- DR complete: `NO-GO`. Credential escrow evidence markers must still not be forged, and the latest governance figure remains `escrow_missing=5`.
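**Boundary**: All live tracking in this round was read-only. This round did not:
- delete the failed Job, create a Job manually, patch K8s or run an ArgoCD sync;
- restart any service or change Docker / Nginx / firewall settings;
- read secrets, send Telegram messages or run an active scan.

Before the next actual reboot, the same live preflight must be rerun. Suppose only the stale failed Job warning remains and the later official Jobs succeed. Then Plan B may proceed as `B3_SERVICE_AVAILABLE_DEGRADED`, or converge to `B4_FULL_STACK_GREEN` after maintenance. DEGRADED must never be described as full green.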
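The boundary rule above reduces to a small decision over the scorecard summary line. This is a minimal sketch. It assumes the summary is printed as whitespace-separated `KEY=value` tokens such as `PASS=83 WARN=1 BLOCKED=0`. It also assumes the caller confirms separately, for example with the Job readback, whether every warning is stale evidence.

The function names and the `NO_GO_HARD_BLOCKER` / `NEEDS_TRIAGE` labels are hypothetical. The sketch never returns `B5_DR_COMPLETE`, because DR completion depends on escrow evidence outside this scorecard.

```bash
#!/usr/bin/env bash
# Sketch only: map a cold-start scorecard line to a Plan B declaration level.

# Print the integer value of KEY from a "PASS=83 WARN=1 BLOCKED=0" line.
scorecard_value() {
  local key="$1" line="$2" token
  for token in $line; do
    if [[ "$token" == "$key="* ]]; then
      printf '%s\n' "${token#*=}"
      return 0
    fi
  done
  return 1   # key not present in the summary line
}

# Args: scorecard line, stale_only flag (1 = every warning is stale evidence).
plan_b_level() {
  local line="$1" stale_only="$2" warn blocked
  warn="$(scorecard_value WARN "$line")" || return 2
  blocked="$(scorecard_value BLOCKED "$line")" || return 2
  if (( blocked > 0 )); then
    echo "NO_GO_HARD_BLOCKER"              # hypothetical label
  elif (( warn == 0 )); then
    echo "B4_FULL_STACK_GREEN"             # only when WARN=0 BLOCKED=0
  elif [[ "$stale_only" == 1 ]]; then
    echo "B3_SERVICE_AVAILABLE_DEGRADED"   # services up, stale evidence only
  else
    echo "NEEDS_TRIAGE"                    # hypothetical label
  fi
}
```

With the 12:13 result and a confirmed stale-only warning, `plan_b_level 'PASS=83 WARN=1 BLOCKED=0' 1` selects `B3_SERVICE_AVAILABLE_DEGRADED`.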
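## 2026-06-18: IwoooS SOC / SIEM / Kali 112 / Wazuh integration control verified locally
**Background**: The user asked to fully integrate overall security monitoring, alerting and the Kali 112 host, and to adopt mainstream industry security mechanisms. The previous round completed the controls for blocking externally intruding hosts. It still lacked a gate that ties the following into a single read-only SOC control line:
- Wazuh, Kali 112, Prometheus / Alertmanager, SigNoz and Sentry;
- Nginx / Gateway, host forensics and Docker / systemd;
- K8s / ArgoCD, Gitea / runner, Harbor / SBOM and backup / DR.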


@@ -1,6 +1,6 @@
# AWOOOI 全棧冷啟動與主機重啟 SOP
> Version: v1.23
> Version: v1.24
> Last updated: 2026-06-18 Asia/Taipei
> Scope: 110 / 120 / 121 / 188 full-stack reboot recovery. 112 Kali is recorded as P3 optional and is not part of this recovery path.
@@ -10,6 +10,18 @@
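This section is the first decision anchor after every handover, power-on, shutdown or reboot. If its date is not today, rerun the live check first. Then update this section and `docs/workplans/2026-06-04-reboot-cold-start-backup-recovery-workplan.md`.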
2026-06-18 12:17 live readback supersedes older service-availability wording:
```text
Repo-side reboot SOP / Plan B / automation contracts: COMPLETE, 100%.
Live cold-start read-only check: PASS=83 WARN=1 BLOCKED=0, Result=DEGRADED.
Service state: SERVICE_AVAILABLE_DEGRADED; 110/120/121/188 reachable, K3s mon/mon1 Ready, NODE_FS_ERROR_EVENTS=0, public routes/TLS green, 110/188 backup health fresh.
Rollout state after transient 12:14 startup window: awoooi-api 2/2, awoooi-web 2/2, worker 1/1, canary 1/1, public API health 200 healthy.
Only live warning: retained stale K8s Job km-vectorize-29689620 from 2026-06-14 03:00. Later official km-vectorize Jobs 29692500 / 29693940 / 29695380 are Complete.
Allowed declaration: services are available with one stale failed Job warning.
Forbidden declaration: full cold-start green, DR complete, or runtime/security acceptance.
```
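The rollout-state line above is a per-Deployment readiness comparison. This is a read-only sketch. It assumes the input is the default `kubectl get deployments --no-headers` output, whose second column is `READY` in `ready/desired` form. The function name is hypothetical, and the function only parses text.

```bash
#!/usr/bin/env bash
# Sketch only: report Deployments whose READY column is not fully converged.
# Input on stdin: "NAME READY UP-TO-DATE AVAILABLE AGE" rows without headers.
# Exit status is 0 only when every listed Deployment is n/n with n > 0.
unconverged_deployments() {
  awk '
    BEGIN { bad = 0 }
    {
      split($2, r, "/")                       # "2/2" -> r[1]=2, r[2]=2
      if (r[1] != r[2] || r[2] + 0 == 0) { print $1; bad = 1 }
    }
    END { exit bad }
  '
}
```

For example, `kubectl get deployments -n <namespace> --no-headers | unconverged_deployments` exits 0 only when every listed Deployment is fully ready. `<namespace>` is a placeholder. A transient state like the 12:14 startup window would print the lagging Deployment names instead.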
| Item | 2026-06-14 18:15 Asia/Taipei live result | Verdict |
|------|-------------------------------------------|------|
| Overall recovery readiness | `97%` | `SERVICE_AVAILABLE_KM_VECTORIZE_FAILED_DR_ESCROW_BLOCKED` |
@@ -55,6 +67,17 @@ NO-GO for any CD workflow that writes deploy host keys into `/home/wooo/.ssh/kno
Current allowed wording: "core service and backup are available; 110 failed units are cleared after intentionally disabling `fwupd-refresh.timer`; the recovery readback after the high-value config Owner Packet front-end sync shows no service regression; cold-start is degraded only by the `km-vectorize` official Job failure; DR complete still blocked by credential escrow; `km-vectorize` failed Job is retained but failed Pod/log are currently absent, so the next official 03:00 run remains the evidence gate."
```
2026-06-18 12:17 live rule:
```text
GO for controlled service availability: PASS=83 WARN=1 BLOCKED=0, public routes/TLS green, API health 200 healthy, API/Web/Worker/Canary ready after rollout convergence.
GO for repo-side reboot readiness mechanism: readiness audit PASS=185 WARN=1 BLOCKED=0; the only warning was the skipped live gate, recorded before the live check was run.
NO-GO for "full cold-start green" until the retained stale failed Job evidence is either cleared by normal K8s history policy or explicitly accepted by an owner-provided readback package.
NO-GO for "DR complete" while credential escrow evidence markers remain missing.
Do not delete the failed Job manually during routine SOP verification. Keep it as evidence unless an approved maintenance window explicitly authorizes cleanup.
Current allowed wording: "SOP / Plan B / automation contracts are complete; live services are available with one retained stale km-vectorize failed Job warning; hard blockers are zero; DR remains blocked by credential escrow evidence."
```
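The allowed and forbidden declarations in the live rule can be checked mechanically before a status message is posted. This is a minimal sketch; the function name and argument order are hypothetical. Requiring full green before `DR_COMPLETE` is an assumption based on B5 sitting above B4 in the SOP's service levels, not a documented rule.

```bash
#!/usr/bin/env bash
# Sketch only: gate a proposed status declaration on the current evidence.
# Args: declaration WARN BLOCKED ESCROW_MISSING
# Returns 0 if allowed; otherwise prints a reason and returns 1.
declaration_allowed() {
  local decl="$1" warn="$2" blocked="$3" escrow_missing="$4"
  case "$decl" in
    SERVICE_AVAILABLE_DEGRADED)
      (( blocked == 0 )) || { echo "hard blockers present"; return 1; } ;;
    FULL_STACK_GREEN)
      (( warn == 0 && blocked == 0 )) || { echo "WARN/BLOCKED not zero"; return 1; } ;;
    DR_COMPLETE)
      # Assumption: DR complete also needs full green, plus escrow evidence.
      (( warn == 0 && blocked == 0 && escrow_missing == 0 )) \
        || { echo "full green and escrow evidence required"; return 1; } ;;
    *)
      echo "unknown declaration: $decl"; return 1 ;;
  esac
}
```

With the 12:17 evidence (`WARN=1 BLOCKED=0 escrow_missing=5`), only `SERVICE_AVAILABLE_DEGRADED` passes. `FULL_STACK_GREEN` and `DR_COMPLETE` are rejected with a reason.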
After any future 120 recovery, rerun this exact chain from 110:
```bash
@@ -1481,6 +1504,23 @@ SOP update:
| Repo-side readiness audit | `PASS=185 WARN=1 BLOCKED=0`, result `READY WITH WARNINGS`; the only warning is that `--live` was not run |
| Declaration limit | May declare `REPO_SIDE_REBOOT_READINESS_READY_WITH_LIVE_CHECK_REQUIRED`; may not declare `FULL_STACK_GREEN`, `DR_COMPLETE`, or live service recovery complete |
### 14.24 2026-06-18 live cold-start readback after repo-side closure
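The 2026-06-18 12:13-12:17 readback is a same-day live verification performed after the repo-side readiness closure. It is neither a host reboot nor a runtime fix. Its purpose is to keep "mechanism complete" separate from "current live state", which avoids a false green.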
| 項目 | 2026-06-18 12:17 live baseline |
|------|--------------------------------|
| SOP version | `v1.24` |
| Cold-start read-only result | `PASS=83 WARN=1 BLOCKED=0`, result `DEGRADED` |
| Host reachability | 110 / 120 / 121 / 188 ping OK and SSH port OK |
| K3s | `mon` / `mon1` Ready control-plane; VIP `192.168.0.125` present on 120; `NODE_FS_ERROR_EVENTS 0` |
| 110 / 188 service checks | 110 Harbor / Gitea / Prometheus / Alertmanager / Sentry reachable; 188 PostgreSQL / Redis / momo / SigNoz reachable |
| Backup health | 110 backup health `total=13 stale=0 missing_cron=0 missing_script=0 failed_count=0 config_failed=0 integrity_total=2 integrity_stale=0`; 188 backup health `total=2 stale=0` |
| Public route / TLS | awoooi API/Web, mo, momo health, Gitea, Harbor, registry, Sentry, SigNoz, stock, Langfuse, Bitan and aiops all return 2xx/3xx with TLS verified |
| AWOOOI rollout convergence | After the transient 12:14 startup window, the final readback shows API `2/2`, Web `2/2`, Worker `1/1`, Canary `1/1`, API health `200 healthy` |
| Remaining warning | Retained stale Job `km-vectorize-29689620` from 2026-06-14 03:00; later official Jobs `km-vectorize-29692500`, `29693940` and `29695380` are `Complete` |
| Declaration limit | May declare `SERVICE_AVAILABLE_DEGRADED`; may not declare `FULL_STACK_GREEN` because `WARN=1`; may not declare `DR_COMPLETE` because credential escrow still requires real non-secret owner evidence |
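The backup health rows use a `key=value` summary. Below is a minimal sketch of the freshness rule this table applies. It assumes three things:
- `total` must be positive.
- Other keys ending in `total`, such as `integrity_total`, are counts rather than failures.
- Every remaining counter must be `0`.

The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: decide whether a backup-health summary line is "fresh".
# Returns 0 if fresh, 1 if any failure counter is non-zero or total is 0,
# and 2 if a token is not in KEY=<integer> form.
backup_health_fresh() {
  local line="$1" token key value seen_total=0
  for token in $line; do
    key="${token%%=*}"
    value="${token#*=}"
    [[ "$value" =~ ^[0-9]+$ ]] || return 2   # unexpected format
    if [[ "$key" == total ]]; then
      (( value > 0 )) || return 1            # no backup jobs tracked at all
      seen_total=1
    elif [[ "$key" != *total ]]; then
      (( value == 0 )) || return 1           # stale/missing/failed counters
    fi
  done
  (( seen_total == 1 ))
}
```

Both the 110 line (`total=13 ... integrity_stale=0`) and the 188 line (`total=2 stale=0`) from the table pass this check. A single `stale=1` would fail it.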
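### 14.22 Post-reboot timeline verification
After every reboot, advance through the timeline step by step; do not wait until the end to make a single judgment.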


@@ -11,13 +11,13 @@
| Area | Status | Completion | Evidence |
|------|--------|------------|----------|
| Overall recovery readiness | SERVICE_AVAILABLE_ARGOCD_HEALTHY_DR_ESCROW_BLOCKED | 98% | 2026-06-15 03:11: the official `km-vectorize` 03:00 gate succeeded. ArgoCD `awoooi-prod` is `Synced / Healthy`, CronJob `lastSuccessfulTime=2026-06-14T19:00:55Z`, Job `km-vectorize-29691060` is `Complete`, and its log is `embed-all: 200 {"total":31,"success":31,"failed":0}`. Backup core blockers remain `0` (110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`), but `escrow_missing=5`. Full cold-start still cannot be declared green because the latest scorecard is `PASS=81 WARN=2 BLOCKED=0`. The warnings come from unconfirmed 188 momo scheduler registration/activity and from old failed Job evidence still retained in K8s. |
| Overall recovery readiness | SERVICE_AVAILABLE_DEGRADED_STALE_KM_JOB_DR_ESCROW_BLOCKED | 98% | 2026-06-18 12:13 live cold-start read-only gate returned `PASS=83 WARN=1 BLOCKED=0`, result `DEGRADED`. Evidence: 110 / 120 / 121 / 188 ping and SSH port are OK; K3s `mon` / `mon1` are Ready; `NODE_FS_ERROR_EVENTS=0`; public routes/TLS are green; 110 backup health is `13/13 fresh failed=0`; 188 backup health is `2/2 fresh failed=0`. The final rollout readback at 12:17 shows API `2/2`, Web `2/2`, Worker `1/1`, Canary `1/1`, API health `200 healthy`. The only warning is the retained stale Job `km-vectorize-29689620` from 2026-06-14 03:00; later official `km-vectorize` Jobs `29692500` / `29693940` / `29695380` are `Complete`. DR remains blocked because credential escrow evidence markers are still missing and must not be forged. |
| P0 host / K3s recovery | DONE | 100% | 120 booted after console fsck at `2026-06-12 15:13`; latest 2026-06-14 18:15 readback shows 120 is reachable, K3s is active, `mon` and `mon1` are both `Ready control-plane`, and cold-start P0/P1 checks are green. |
| P1 backup / alert / escrow | BLOCKED_DR_ESCROW | 92% | 2026-06-15 03:11 `backup-status` shows 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, `escrow_missing=5`, last aggregate `2026-06-15 02:40:13`. Offsite / escrow report shows `SCRIPT_MISSING_COUNT=0`, `OFFSITE_CONFIGURED=1`, `RCLONE_CONFIGURED=1`, `ESCROW_MISSING_COUNT=5`. Owner request package is ready; actual marker write remains blocked on real non-secret evidence IDs. |
| P2 service / data truth | VERIFIED_ARGOCD_HEALTHY_WITH_RESIDUAL_WARNINGS | 99% | 2026-06-15 03:11 cold-start is degraded by two warnings only; public route/API smoke is green, VIP API/Web are reachable, momo current-month parity remains covered by the scorecard, schedules/services are mostly green, and 110 failed units remain `0`. `km-vectorize-29691060` succeeded, ArgoCD is `Healthy`, and API/Web remain split across 120 / 121. Remaining scorecard warnings are 188 momo scheduler registration/activity not confirmed and retained old K8s failed Job evidence. |
| P3 docs / automation contracts | REPO_SIDE_READY_WITH_LIVE_CHECK_REQUIRED | 100% | The following have all been updated: Workplan, SOP v1.23, BACKUP-STATUS, LOGBOOK, 120 console/fsck recovery, Gitea backup stale-dump hardening, reboot ledger/version-comparison SOP, escrow evidence audit, 188 nginx Ansible baseline, 110 cold-start detector script, startup judgment layers, GO/NO-GO tree, host recovery cards, explicit Plan B degraded-operation path, machine-readable `plan_b` baseline, readiness-audit Plan B guard, B0-B5 service levels, T+0/T+120 fallback timeline checks, host role / load-balancing assessment, CD `known_hosts` guardrail, `fwupd-refresh.timer` rollback note, K3s filesystem event blocker, AWOOOI backup no-direct-offsite-sync contract, 110/188 Ansible source-of-truth, Gitea self-hosted readiness validation workflow, post-CD no-regression readback, P2-135 deploy recovery readback, P2-136 / AI Agent activity post-production-deploy recovery readback, P2-137 / CI smoke timeout recovery readback, P2-143 post-owner-response-precheck recovery readback, P2-144 post-owner-response-readback recovery readback, P2-145 post-owner-response-acceptance-gate recovery readback, IwoooS P0 config-control prioritization recovery readback, high-value config Owner Packet front-end sync recovery readback, and the `km-vectorize` official success readback. 2026-06-18 repo-side `reboot-recovery-readiness-audit.sh --no-color` returned `PASS=185 WARN=1 BLOCKED=0`; the only warning is the skipped live gate. |
| P2 service / data truth | VERIFIED_SERVICE_AVAILABLE_WITH_STALE_JOB_WARN | 99% | 2026-06-18 12:13 cold-start verifies public route/TLS, API/Web route, momo health and current-month parity `10936\|10936\|2026-06-01\|2026-06-17\|2026-06-01\|2026-06-17`, backup exporters, schedules, K3s node readiness, VIP, and 110 / 188 runtime health. The 12:17 post-rollout readback confirms AWOOOI deployments are ready and API health is `200 healthy`. The only service warning is the old retained `km-vectorize-29689620` failed Job evidence; no hard blocker remains. |
| P3 docs / automation contracts | REPO_SIDE_READY_LIVE_CHECK_RUN_WITH_WARN | 100% | Workplan, SOP v1.24, BACKUP-STATUS, LOGBOOK, 120 console/fsck recovery, Gitea backup stale-dump hardening, reboot ledger/version-comparison SOP, escrow evidence audit, 188 nginx Ansible baseline, 110 cold-start detector script, startup judgment layers, GO/NO-GO tree, host recovery cards, explicit Plan B degraded-operation path, machine-readable `plan_b` baseline, readiness-audit Plan B guard, B0-B5 service levels, T+0/T+120 fallback timeline checks, host role / load-balancing assessment, CD `known_hosts` guardrail, `fwupd-refresh.timer` rollback note, K3s filesystem event blocker, AWOOOI backup no-direct-offsite-sync contract, 110/188 Ansible source-of-truth, Gitea self-hosted readiness validation workflow, post-CD no-regression readbacks, and 2026-06-18 live readback are updated. Repo-side `reboot-recovery-readiness-audit.sh --no-color` returned `PASS=185 WARN=1 BLOCKED=0`; live cold-start returned `PASS=83 WARN=1 BLOCKED=0`. |
Full cold-start may be declared green only for the latest verified evidence set. As of 2026-06-15 03:11, `km-vectorize` and ArgoCD are healthy, but the latest scorecard is still `DEGRADED` by residual warnings. Do not declare DR scorecard complete while credential escrow evidence remains blocked.
Full cold-start may be declared green only for the latest verified evidence set. As of 2026-06-18 12:17, services are available and hard blockers are zero, but the latest scorecard is still `DEGRADED` by one retained stale `km-vectorize` failed Job warning. Do not declare DR scorecard complete while credential escrow evidence remains blocked.
2026-06-13 01:26 refresh: full cold-start is again green for the current evidence set. AWOOOI API/Web workload balancing survived the next normal CD deploy: Gitea main `e4a349bc`, ArgoCD revision `e4a349bc`, images from `414413a5`, API/Web split across `mon` / `mon1`, and global `known_hosts` retained 120 / 188 after CD fix `80e6ec1a`. Do not declare DR complete while credential escrow is missing. `km-vectorize` remediation is `90%`: schedule/label fix is live, and the remaining gate is the next official 03:00 CronJob success readback.
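The P2 service / data truth row quotes the momo current-month parity tuple. Below is a sketch of a parity check. It assumes the six `|`-separated fields are, in order: snapshot row count, realtime row count, snapshot first date, snapshot last date, realtime first date and realtime last date. The scorecard does not confirm this field order. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only. Hypothetical field order (not confirmed by the scorecard):
# snapshot_rows|realtime_rows|snapshot_from|snapshot_to|realtime_from|realtime_to
# Returns 0 on parity, 1 on mismatch, 2 on an unexpected field count.
momo_parity_ok() {
  local snap_rows rt_rows snap_from snap_to rt_from rt_to extra
  IFS='|' read -r snap_rows rt_rows snap_from snap_to rt_from rt_to extra <<< "$1"
  [[ -z "$extra" && -n "$rt_to" ]] || return 2           # need exactly 6 fields
  [[ "$snap_rows" == "$rt_rows" ]] || return 1           # row counts must match
  [[ "$snap_from" == "$rt_from" && "$snap_to" == "$rt_to" ]]  # same date window
}
```

Under this reading, the 2026-06-18 tuple `10936|10936|2026-06-01|2026-06-17|2026-06-01|2026-06-17` is in parity: the row counts are equal and both sides cover 2026-06-01 through 2026-06-17.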
@@ -175,7 +175,7 @@ Next: <single next action>
| P3-005 | DONE | 100 | Update cold-start SOP | SOP now includes start, shutdown, reboot, record, comparison, and 120 blocker handling. | Increment SOP version after each process change. | SOP has controlled power-operation sections and ledger template. |
| P3-006 | DONE | 100 | Update backup status | Backup status now reflects current cron, rclone latest-only, failure-only alert posture, and escrow blocker. | Refresh after 120 backup rerun. | Backup status no longer claims noisy success Telegram notifications. |
| P3-007 | DONE | 100 | Harden Gitea backup stale dump handling | 2026-06-05 manual Gitea backup failed because the container retained `/tmp/gitea-dump.zip` from the 02:00 failure. `scripts/backup/backup-gitea.sh` now renames stale container dump files to timestamped evidence before running a new dump, and the live 110 script is updated. | Watch the next 02:00 Gitea backup. | `bash -n` passes locally and on 110; manual Gitea backup completed after stale evidence rename. |
| P3-008 | DONE | 100 | Continuously optimize host reboot SOP | SOP v1.23 adds startup judgment layers, GO/NO-GO decision tree, freeze execution checklist, host boot detection, 110/188/120/121 recovery cards, explicit Plan B degraded-operation path, machine-readable `plan_b` baseline, readiness-audit Plan B guard, B0-B5 service levels, T+0/T+120 fallback timeline, K3s filesystem event blocker, repo-side readiness audit blocker closure, 2026-06-12 post-reboot anchor, 2026-06-13 post-CD trust/workload anchor, 2026-06-14 110 failed-unit cleanup anchor, 2026-06-14 post-CD recovery readback, P2-135 deploy recovery readback, P2-136 / AI Agent activity post-production-deploy recovery readback, P2-137 / CI smoke timeout recovery readback, P2-143 post-owner-response-precheck recovery readback, P2-144 post-owner-response-readback recovery readback, P2-145 post-owner-response-acceptance-gate recovery readback, IwoooS P0 config-control prioritization recovery readback, high-value config Owner Packet front-end sync recovery readback, AA/AS assessment, workload distribution assessment, CD SSH trust guardrail, CronJob failure evidence retention rule, `fwupd-refresh.timer` rollback note, and allowed declaration wording. | Use v1.23 for the next reboot record, then compare actual timing, Plan B trigger, degraded level, and blockers against §1.4 plus §14.8 through §14.23. Before any real reboot, rerun same-day live cold-start / backup / offsite / alert / escrow checks. | SOP distinguishes `HOST_BOOTED`, `HOST_READY`, `SERVICE_READY`, `FULL_STACK_GREEN`, `K3S_CONTROL_PLANE_AA`, `WORKLOAD_BALANCED`, `B0_ABORTED_BEFORE_REBOOT`, `B1_HOST_RECOVERY_ONLY`, `B2_CORE_SERVICE_READY`, `B3_SERVICE_AVAILABLE_DEGRADED`, `B4_FULL_STACK_GREEN`, and `B5_DR_COMPLETE`; repo-side `reboot-recovery-readiness-audit.sh --no-color` now returns `PASS=185 WARN=1 BLOCKED=0`, and the remaining warning exists only because the live gate was intentionally skipped. |
| P3-008 | DONE | 100 | Continuously optimize host reboot SOP | SOP v1.24 adds startup judgment layers, GO/NO-GO decision tree, freeze execution checklist, host boot detection, 110/188/120/121 recovery cards, explicit Plan B degraded-operation path, machine-readable `plan_b` baseline, readiness-audit Plan B guard, B0-B5 service levels, T+0/T+120 fallback timeline, K3s filesystem event blocker, repo-side readiness audit blocker closure, 2026-06-18 live cold-start readback, post-reboot / post-CD recovery anchors, AA/AS assessment, workload distribution assessment, CD SSH trust guardrail, CronJob failure evidence retention rule, `fwupd-refresh.timer` rollback note, and allowed declaration wording. | Use v1.24 for the next reboot record, then compare actual timing, Plan B trigger, degraded level, and blockers against §1.4 plus §14.8 through §14.24. Before any real reboot, rerun same-day live cold-start / backup / offsite / alert / escrow checks. | SOP distinguishes `HOST_BOOTED`, `HOST_READY`, `SERVICE_READY`, `FULL_STACK_GREEN`, `K3S_CONTROL_PLANE_AA`, `WORKLOAD_BALANCED`, `B0_ABORTED_BEFORE_REBOOT`, `B1_HOST_RECOVERY_ONLY`, `B2_CORE_SERVICE_READY`, `B3_SERVICE_AVAILABLE_DEGRADED`, `B4_FULL_STACK_GREEN`, and `B5_DR_COMPLETE`; repo-side `reboot-recovery-readiness-audit.sh --no-color` returns `PASS=185 WARN=1 BLOCKED=0`, and live cold-start returns `PASS=83 WARN=1 BLOCKED=0` with one retained stale km-vectorize failed Job warning. |
| P3-009 | DONE | 100 | Assess 120/121 AA/AS role and host load balancing | 2026-06-12 15:19 live check confirms 120 and 121 are both `Ready control-plane`, `k3s active`, `k3s-agent inactive`, with no taints; however most AWOOOI / ArgoCD / Velero workload remains on 121 after 120 fsck recovery. New assessment defines control-plane AA vs workload AA, migration candidates from 110/188, and stateful migration blockers. | After P0 backup/offsite/cold-start green, implement topology spread for AWOOOI API/Web before moving additional services. | `docs/runbooks/HOST-ROLE-LOAD-BALANCING-ASSESSMENT.md` exists; SOP v1.6 links AA/AS and load-balancing checks; migration implementation remains explicitly `0%`. |
| P3-010 | DONE | 100 | Update workload balancing docs with 2026-06-13 live truth | Host role assessment, workplan, SOP, backup status, and LOGBOOK are refreshed with current cold-start, backup, 188 certbot degraded, ArgoCD `km-vectorize` degraded, Gitea main `acaae999`, ArgoCD sync, and final pod placement evidence. | Keep updating this file after the next reboot or deploy. | Docs separate service-green status from DR escrow, workload rollout, and non-service governance debt. |
| P3-011 | DONE | 100 | Record `km-vectorize` remediation status | LOGBOOK, this workplan, and SOP now state the schedule/label fix, ArgoCD sync evidence, the invalid manual Job boundary, and the 90% waiting-for-next-schedule gate. | After next 03:00 run, update this row and the top verdict with `lastSuccessfulTime` / ArgoCD health evidence. | No document claims ArgoCD green before official CronJob success evidence exists. |
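P3-007 describes renaming a stale dump to timestamped evidence before the next dump runs. Below is a minimal sketch of that step against a local path. The real `scripts/backup/backup-gitea.sh` works on the file inside the Gitea container and may name the evidence differently. The function name and the `.stale-<UTC timestamp>.zip` naming are assumptions.

```bash
#!/usr/bin/env bash
# Sketch only: keep a leftover dump as evidence instead of letting it block
# (or be overwritten by) the next dump. Prints the evidence path if it moved.
preserve_stale_dump() {
  local dump="$1" ts target
  [[ -e "$dump" ]] || return 0                # nothing stale to preserve
  ts="$(date -u +%Y%m%dT%H%M%SZ)"
  target="${dump%.zip}.stale-${ts}.zip"       # hypothetical naming scheme
  [[ ! -e "$target" ]] || return 1            # never overwrite older evidence
  mv -- "$dump" "$target" || return 1
  printf '%s\n' "$target"
}
```

Returning non-zero when the rename cannot happen lets a caller abort the new dump rather than run on top of an unexplained leftover file.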
@@ -214,6 +214,14 @@ Do not run `truncate`, whole DB restore, force-push, DROP, or online root filesy
## 9. Progress Updates
```text
2026-06-18 12:17 Asia/Taipei
Phase: P0/P2/P3 live readback
Before: repo-side readiness was complete, but live gate had not been rerun after the same-day push.
After: live cold-start is `PASS=83 WARN=1 BLOCKED=0`, result `DEGRADED`; final rollout readback shows API `2/2`, Web `2/2`, Worker `1/1`, Canary `1/1`, and API health `200 healthy`.
Evidence: `full-stack-cold-start-check.sh --monitor-read-only --no-color --watch --interval 1 --max-attempts 1`; read-only K8s deployment/job snapshot from 120; public API health readback.
Blocked: no hard blocker. One warning remains: stale retained Job `km-vectorize-29689620` from 2026-06-14 03:00; later official km-vectorize Jobs are Complete. DR complete is still blocked until real credential escrow evidence markers exist.
Next: before any actual reboot, rerun the same live preflight and classify as `B3_SERVICE_AVAILABLE_DEGRADED` if only stale evidence remains, or `B4_FULL_STACK_GREEN` only when `WARN=0 BLOCKED=0`.
2026-06-18 12:06 Asia/Taipei
Phase: P3
Before: repo-side readiness audit PASS=147 WARN=2 BLOCKED=37 before blocker batch; after Plan B-only guard it still had pre-existing blockers.