From 293b70a2e76bee5f23071bce34898d812a533f63 Mon Sep 17 00:00:00 2001
From: Your Name
Date: Sat, 13 Jun 2026 14:15:28 +0800
Subject: [PATCH] docs(ops): record final post-trigger deploy closeout [skip ci]

---
 docs/LOGBOOK.md | 10 ++++++++++
 docs/runbooks/FULL-STACK-COLD-START-SOP.md | 8 ++++----
 ...06-04-reboot-cold-start-backup-recovery-workplan.md | 3 ++-
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/docs/LOGBOOK.md b/docs/LOGBOOK.md
index 1c0639bd..7c1951ed 100644
--- a/docs/LOGBOOK.md
+++ b/docs/LOGBOOK.md
@@ -34055,3 +34055,13 @@ production browser smoke:
 - Guards: `SECURITY_MIRROR_PROGRESS_GUARD_OK`, `SOURCE_CONTROL_OWNER_RESPONSE_GUARD_OK`.
 - Full cold-start: `PASS=83 WARN=0 BLOCKED=0`.
 - Verdict: security mirror source / production bundle / routes / cold-start have all converged; the remaining red lights are still the `5` missing credential escrow markers and the `km-vectorize` gate, which needs the next official successful 03:00 run.
+
+**Final post-trigger deploy closeout, 14:13 Asia/Taipei**:
+- Deploy marker: `834ccdba chore(cd): deploy bf86017 [skip ci]`.
+- ArgoCD: revision `834ccdba83541ec68913324afabef3f71c6890bf`, `Synced / Degraded`; Degraded is still caused only by the `km-vectorize` official-success gate.
+- Live API/Web/Worker image: `bf86017757c457e98c8b2ce5513b77f6fbaf97f1`.
+- Public smoke: `/zh-TW/governance=200`, `/en/governance=200`, `/api/v1/health=healthy`.
+- Source guards: `SECURITY_MIRROR_PROGRESS_GUARD_OK`, `SOURCE_CONTROL_OWNER_RESPONSE_GUARD_OK`.
+- Backup status: 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, `escrow_missing=5`.
+- Full cold-start: `PASS=83 WARN=0 BLOCKED=0`.
+- Verdict: production deploy / security mirror / cold-start are all green; DR complete still waits on the five credential escrow markers, and ArgoCD fully healthy still waits on the next official successful 03:00 `km-vectorize` run.
diff --git a/docs/runbooks/FULL-STACK-COLD-START-SOP.md b/docs/runbooks/FULL-STACK-COLD-START-SOP.md
index 53028082..c82a4630 100644
--- a/docs/runbooks/FULL-STACK-COLD-START-SOP.md
+++ b/docs/runbooks/FULL-STACK-COLD-START-SOP.md
@@ -26,11 +26,11 @@
 | Backup status | 13:52 status: 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, `escrow_missing=5`, last aggregate `2026-06-13 02:34:06` | `GREEN_WITH_DR_ESCROW_WARNING` |
 | Offsite sync / verify | 01:28 textfile: `awoooi_backup_offsite_remote_verify_ok=1`, `full_verify_fresh=1`, all 13 repos have `snapshot_count=1` and `snapshot_latest_only=1`; latest scheduled verifier log is 2026-06-12 07:20 | `GREEN` |
 | Backup / cold-start alerts | 01:27 live visibility check confirms Prometheus and Alertmanager expose the 5 required credential escrow gap alerts; Prometheus rules API has all five required alert names healthy; label contract check loads 24 baseline backup alert rules | `GREEN_WITH_EXPECTED_REDLIGHTS` |
-| Cold-start scorecard | 14:10 read-only scorecard after production Web image `6cf8d3ca` verification: `PASS=83 WARN=0 BLOCKED=0` | `GREEN` |
+| Cold-start scorecard | 14:13 read-only scorecard after production image `bf860177` verification: `PASS=83 WARN=0 BLOCKED=0` | `GREEN` |
 | momo DB parity | `4571|4571|2026-06-01|2026-06-07|2026-06-01|2026-06-07` | `GREEN` |
 | momo scheduler | container healthy; scorecard reads `SCHEDULER_RECENT_ACTIVITY 1136`; detector widened and deployed to 110 | `GREEN` |
-| ArgoCD app health | 14:10 ArgoCD revision `64ea2444` is `Synced`; app remains `Degraded` only because `km-vectorize` CronJob has not completed its last execution successfully. CronJob schedule is `0 3 * * *` with `timeZone=Asia/Taipei`, `failedJobsHistoryLimit=3`, and `lastSuccessfulTime=2026-06-04T11:00:37Z`; next 03:00 run must prove success or leave inspectable failure evidence. | `GOVERNANCE_DEBT_IN_REMEDIATION` |
-| Workload balancing | Live Web image is `6cf8d3ca`, containing the security mirror messages; API/Web pods remain split across 120 / 121, Worker single replica remains healthy | `GREEN` |
+| ArgoCD app health | 14:13 ArgoCD revision `834ccdba` is `Synced`; app remains `Degraded` only because `km-vectorize` CronJob has not completed its last execution successfully. CronJob schedule is `0 3 * * *` with `timeZone=Asia/Taipei`, `failedJobsHistoryLimit=3`, and `lastSuccessfulTime=2026-06-04T11:00:37Z`; next 03:00 run must prove success or leave inspectable failure evidence. | `GOVERNANCE_DEBT_IN_REMEDIATION` |
+| Workload balancing | Live API/Web/Worker image is `bf860177`; API/Web pods remain split across 120 / 121, Worker single replica remains healthy | `GREEN` |
 | Credential escrow | 5 non-secret evidence markers missing | `BLOCKED` |
 
 Release rule:
@@ -44,7 +44,7 @@ Do not declare DR scorecard complete while credential escrow markers are missing
 2026-06-13 live rule:
 
 ```text
-110 / 120 / 121 / 188 service recovery is full-stack green after the 14:10 scorecard.
+110 / 120 / 121 / 188 service recovery is full-stack green after the 14:13 scorecard.
 GO for controlled runner/CD release; keep AI auto-remediation governed by normal gates.
 NO-GO for "DR complete" while credential escrow evidence markers are missing.
 Do not fake or silence credential escrow alerts; they are the remaining correct DR red light.
diff --git a/docs/workplans/2026-06-04-reboot-cold-start-backup-recovery-workplan.md b/docs/workplans/2026-06-04-reboot-cold-start-backup-recovery-workplan.md
index 4c141571..64eb9243 100644
--- a/docs/workplans/2026-06-04-reboot-cold-start-backup-recovery-workplan.md
+++ b/docs/workplans/2026-06-04-reboot-cold-start-backup-recovery-workplan.md
@@ -11,7 +11,7 @@
 
 | Area | Status | Completion | Evidence |
 |------|--------|------------|----------|
-| Overall recovery readiness | SERVICE_GREEN_WORKLOAD_BALANCED_DR_ESCROW_BLOCKED | 96% | 2026-06-13 14:10 final cold-start scorecard is `PASS=83 WARN=0 BLOCKED=0`; 120/121 K3s are both `Ready control-plane`, backup core blockers remain `0`, public routes/TLS/momo DB/schedules/Alertmanager are green, Web image `6cf8d3ca` containing security mirror messages is live, API/Web remain live-verified split across 120 / 121, and CD no longer clobbers global `known_hosts`. 13:10 escrow report shows offsite/rclone/script readiness green, but DR remains blocked by five missing credential escrow evidence markers; ArgoCD `km-vectorize` is tracked separately as governance health debt until its official scheduled Job refreshes `lastSuccessfulTime`. |
+| Overall recovery readiness | SERVICE_GREEN_WORKLOAD_BALANCED_DR_ESCROW_BLOCKED | 96% | 2026-06-13 14:13 final cold-start scorecard is `PASS=83 WARN=0 BLOCKED=0`; 120/121 K3s are both `Ready control-plane`, backup core blockers remain `0`, public routes/TLS/momo DB/schedules/Alertmanager are green, deploy marker `834ccdba` put API/Web/Worker image `bf860177` live, API/Web remain live-verified split across 120 / 121, and CD no longer clobbers global `known_hosts`. 13:10 escrow report shows offsite/rclone/script readiness green, but DR remains blocked by five missing credential escrow evidence markers; ArgoCD `km-vectorize` is tracked separately as governance health debt until its official scheduled Job refreshes `lastSuccessfulTime`. |
 | P0 host / K3s recovery | DONE | 100% | 120 booted after console fsck at `2026-06-12 15:13`; host is reachable, root is mounted `rw`, failed units `0`, `mon` and `mon1` are both `Ready control-plane`, and cold-start P0/P1 checks are green. |
 | P1 backup / alert / escrow | BLOCKED_DR_ESCROW | 92% | 2026-06-13 12:43 `backup-status` shows 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, `escrow_missing=5`; 13:10 escrow report shows `SCRIPT_MISSING_COUNT=0`, `OFFSITE_CONFIGURED=1`, `RCLONE_CONFIGURED=1`, `ESCROW_MISSING_COUNT=5`, `PASS=8 WARN=5 BLOCKED=0`. Owner request package is now ready; actual marker write remains blocked on real non-secret evidence IDs. |
 | P2 service / data truth | VERIFIED_WORKLOAD_BALANCED | 100% | 2026-06-13 13:52 cold-start is green; public routes/TLS are green, VIP API/Web are reachable, momo current-month parity remains covered by the scorecard, schedules/services are green. API/Web both keep 120 / 121 split placement after latest ArgoCD revision `b557a4b5`. |
@@ -60,6 +60,7 @@ Full cold-start may be declared green only for the latest verified evidence set.
 | 2026-06-13 API rollout strategy hardening | LIVE_VERIFIED | First hard-spread rollout reached ArgoCD revision `17e017f5`; `DoNotSchedule` was live, but API completed with both new pods on 121 because old 120 pods were still terminating during scheduling. Second GitOps rollout reached ArgoCD revision `60f653a0`, API/Web use `maxSurge=0`, `maxUnavailable=1`, `minDomains=2`, `DoNotSchedule`, and both deployments are split `mon` / `mon1`. Public API / governance route smoke passed and 12:59 cold-start returned `PASS=83 WARN=0 BLOCKED=0`. |
 | 2026-06-13 security mirror guard closure | LIVE_VERIFIED | Gitea main `b557a4b5` restores `apps/web/messages/en.json` as the required Traditional Chinese mirror of `zh-TW.json`; `security-mirror-progress-guard.py` now passes. ArgoCD revision `b557a4b5` is `Synced / Degraded` only by `km-vectorize`; API/Web/Worker are ready, API pods split `mon` / `mon1`, Web pods split `mon1` / `mon`, public API health is `healthy`, zh/en governance routes are `200`, backup status has `core_blockers=0`, and 13:52 cold-start is `PASS=83 WARN=0 BLOCKED=0`. |
 | 2026-06-13 security mirror production image closeout | LIVE_VERIFIED | Gitea main `64ea2444` records the Web rebuild trigger. Deploy marker `2cc02f1c chore(cd): deploy 6cf8d3c [skip ci]` put Web image `6cf8d3ca` live; ArgoCD source revision later advanced to `64ea2444` while Web image correctly remains `6cf8d3ca` because `64ea2444` is docs/changelog only. Public `/zh-TW/governance` and `/en/governance` return `200`, API health is `healthy`, `security-mirror-progress-guard.py` passes, and 14:10 cold-start is `PASS=83 WARN=0 BLOCKED=0`. |
+| 2026-06-13 final post-trigger deploy closeout | LIVE_VERIFIED | Deploy marker `834ccdba chore(cd): deploy bf86017 [skip ci]` put API/Web/Worker image `bf860177` live. ArgoCD revision `834ccdba` is `Synced / Degraded` only by `km-vectorize`; routes `/zh-TW/governance` and `/en/governance` return `200`, API health is `healthy`, source guards pass, backup status has `core_blockers=0` and `escrow_missing=5`, and 14:13 cold-start is `PASS=83 WARN=0 BLOCKED=0`. |
 
 ---