docs(ops): record km-vectorize official success [skip ci]
@@ -37160,6 +37160,26 @@ production browser smoke:
- API / Web workload balancing: `LIVE_VERIFIED`.
- DR scorecard: still cannot be declared complete; `5` credential escrow evidence markers are still missing.
## 2026-06-15 — km-vectorize official 03:00 success readback
**Live read-only evidence, 03:11 Asia/Taipei**:
- ArgoCD `awoooi-prod`: `sync=Synced`, `health=Healthy`, revision `d388e5b477333fd5e661527a729406a4e8215320`.
- CronJob `km-vectorize`: `schedule=0 3 * * *`, `timeZone=Asia/Taipei`, `suspend=false`, `lastScheduleTime=2026-06-14T19:00:00Z`, `lastSuccessfulTime=2026-06-14T19:00:55Z`.
- Job `km-vectorize-29691060`: `Complete`, `succeeded=1`, `start=2026-06-14T19:00:00Z`, `completion=2026-06-14T19:00:55Z`.
- Pod `km-vectorize-29691060-78xpz`: `Completed`, restart `0`, node `mon`.
- Job log: `embed-all: 200 {"total":31,"success":31,"failed":0}`.
- Backup status on 110: `110備份=13/13 fresh failed=0`, `188備份=2/2 fresh failed=0`, `core_blockers=0`, `escrow_missing=5`, last aggregate `2026-06-15 02:40:13`.
- Offsite / escrow report on 110: `SCRIPT_MISSING_COUNT=0`, `OFFSITE_CONFIGURED=1`, `RCLONE_CONFIGURED=1`, `READINESS_REQUIRE_CONFIGURED_BLOCKED=0`, `ESCROW_MISSING_COUNT=5`.
- Full cold-start read-only scorecard: `PASS=81 WARN=2 BLOCKED=0`, result `DEGRADED`.

**Verdict**:
- The `km-vectorize` official 03:00 success gate is closed, and the ArgoCD fully healthy gate is cleared.
- Full cold-start still cannot be declared `GREEN` because the scorecard still has two warnings: 188 momo scheduler registration/activity is unconfirmed, and K8s still retains old failed Job evidence.
- The DR scorecard still cannot be declared complete; `5` credential escrow evidence markers are still missing.

**Boundaries**:
- This round performed only read-only verification and documentation updates: no manual Job deletion, no manual Job creation, no live `kubectl patch`, no service restart, no credential escrow marker write, and no reading or storing of any secret value.
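The CronJob and Job timestamps recorded in this entry can be cross-checked offline, without touching the cluster. The sketch below assumes the naming convention of recent Kubernetes CronJob controllers, where the numeric suffix of a Job name is the scheduled time in minutes since the Unix epoch. It also assumes that Asia/Taipei is a fixed UTC+8 offset with no daylight saving. The helper names are hypothetical and are not part of any tooling referenced in this log.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Asia/Taipei is modeled as a fixed UTC+8 offset (no DST).
TAIPEI = timezone(timedelta(hours=8))


def scheduled_time_from_job_name(job_name: str) -> datetime:
    """Decode the numeric Job-name suffix as minutes since the Unix epoch (UTC).

    Assumption: the Job was created by a CronJob controller that names Jobs
    "<cronjob-name>-<scheduled time in minutes since epoch>".
    """
    minutes = int(job_name.rsplit("-", 1)[1])
    return datetime.fromtimestamp(minutes * 60, tz=timezone.utc)


def parse_k8s_time(value: str) -> datetime:
    """Parse a timestamp such as 2026-06-14T19:00:00Z into an aware datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# Values copied from the read-only evidence in this entry.
scheduled = scheduled_time_from_job_name("km-vectorize-29691060")
last_schedule = parse_k8s_time("2026-06-14T19:00:00Z")
last_success = parse_k8s_time("2026-06-14T19:00:55Z")
local = scheduled.astimezone(TAIPEI)

print(scheduled == last_schedule)      # Job suffix agrees with lastScheduleTime
print(local.isoformat())               # 2026-06-15T03:00:00+08:00
print((last_success - last_schedule).total_seconds())  # run duration in seconds
```

Under the same assumption, the suffix of the earlier failed Job `km-vectorize-29689620` decodes to exactly one day earlier, which is consistent with a daily `0 3 * * *` schedule.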
## 2026-06-13 — Credential escrow owner evidence request package
**Live read-only evidence, 13:10 Asia/Taipei**:
@@ -1330,6 +1330,21 @@ SOP update:
| Remaining gate | Official Job `km-vectorize-29689620` is still failed; credential escrow missing count is still `5` |
| SOP change | v1.19 records the no-regression readback after the IwoooS P0 configuration-control priority order, and keeps the declaration ceiling at `SERVICE_AVAILABLE_KM_VECTORIZE_FAILED_DR_ESCROW_BLOCKED` |
### 14.20 2026-06-15 km-vectorize official success readback
The 2026-06-15 03:11 change is not a host reboot; it confirms that the official `km-vectorize` 03:00 schedule succeeded and closes the ArgoCD fully healthy gate. This anchor records only the recovery / cold-start readback: no manual Job deletion, no manual Job creation, no live `kubectl patch`, and no service restart. It also does not treat any backup / restore / escrow owner acceptance ledger as authorization for a backup run, restore run, credential escrow marker write, host write, or production write.

| Item | 2026-06-15 03:11 km-vectorize official success baseline |
|------|----------------------------------------------------------|
| ArgoCD | `awoooi-prod` sync `Synced`, health `Healthy`, revision `d388e5b477333fd5e661527a729406a4e8215320` |
| CronJob readback | `km-vectorize` schedule `0 3 * * *`, `timeZone=Asia/Taipei`, `suspend=false`, `lastScheduleTime=2026-06-14T19:00:00Z`, `lastSuccessfulTime=2026-06-14T19:00:55Z` |
| Job / Pod / log | Job `km-vectorize-29691060` `Complete`, Pod `km-vectorize-29691060-78xpz` `Completed` restart `0`, log `embed-all: 200 {"total":31,"success":31,"failed":0}` |
| Cold-start | 03:11 returned `PASS=81 WARN=2 BLOCKED=0`, result `DEGRADED` |
| Backup | 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, last aggregate `2026-06-15 02:40:13` |
| Escrow | `ESCROW_MISSING_COUNT=5`; missing `restic_repository_password`, `offsite_provider_credentials`, `break_glass_admin_credentials`, `dns_registrar_recovery`, `oauth_ai_provider_recovery` |
| Remaining warnings | 188 momo scheduler registration/activity unconfirmed; K8s still retains old failed Job evidence |
| SOP change | v1.21 closes the `km-vectorize` official success gate, but the declaration ceiling remains `SERVICE_AVAILABLE_ARGOCD_HEALTHY_DR_ESCROW_BLOCKED`; `full-stack green` or `DR complete` must not be declared |
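The Job log line in the table above carries both an HTTP-style status and a JSON summary, so the success condition can be checked mechanically. The following is a minimal sketch, assuming the log line always has the shape `embed-all: <status> <json>` and that success means status `200`, `failed` equal to `0`, and `success` equal to `total`. The function name and these criteria are illustrative assumptions, not the project's actual gate logic.

```python
import json
import re

# Assumed log shape: "embed-all: <3-digit status> <JSON object>".
LOG_PATTERN = re.compile(r"^embed-all:\s+(\d{3})\s+(\{.*\})$")


def embed_all_succeeded(line: str) -> bool:
    """Return True only if the line reports status 200 and no failed items."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return False  # unknown format is treated as not verified
    status = int(match.group(1))
    try:
        summary = json.loads(match.group(2))
    except json.JSONDecodeError:
        return False
    total = summary.get("total")
    return (
        status == 200
        and summary.get("failed") == 0
        and total is not None
        and summary.get("success") == total
    )


# Log line copied from the readback table.
print(embed_all_succeeded('embed-all: 200 {"total":31,"success":31,"failed":0}'))
```

Treating an unparseable line as "not verified" rather than raising keeps the check conservative, in line with the read-only nature of this readback.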
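### 14.19 2026-06-14 Recovery readback after high-value configuration Owner Packet frontend sync

The 2026-06-14 18:15 change is not a host reboot; it confirms that the reboot recovery baseline has not regressed after the high-value configuration Owner Packet frontend sync was deployed to production. This anchor records only the recovery / cold-start readback. It does not repeat the Owner Packet frontend production verification, posture projection, or intake preflight content, and it does not treat the visibility of the frontend draft as request sent, owner response received / accepted, runtime gate, Nginx reload, DNS / TLS probe, certbot renew, workflow / secret modification, host write, active scan, or production write.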
@@ -11,13 +11,13 @@
| Area | Status | Completion | Evidence |
|------|--------|------------|----------|
| Overall recovery readiness | SERVICE_AVAILABLE_KM_VECTORIZE_FAILED_DR_ESCROW_BLOCKED | 97% | 2026-06-14 18:15: after the high-value configuration Owner Packet frontend sync, the cold-start scorecard is `PASS=82 WARN=1 BLOCKED=0`; 120/121 K3s are both `Ready control-plane`, backup core blockers remain `0`, public route/API smoke is still green, the latest repo documentation baseline is `0a4766dd`, the runtime deploy marker is `16c6b983`, the ArgoCD source revision is `0a4766ddc94b0690824ce3deba5c6b9a69764f94`, and API/Web/Worker/CronJob image `e999c16b3435f197b78fe2adfeec1c4faa6c4675` is live, with API/Web still spread across 120 / 121. The 110 `fwupd` failed-unit warning remains cleared, and `systemctl --failed` is clean. Full cold-start still cannot be declared green because the official `km-vectorize-29689620` Job is still failed; DR is still blocked by 5 missing credential escrow evidence markers. |
| Overall recovery readiness | SERVICE_AVAILABLE_ARGOCD_HEALTHY_DR_ESCROW_BLOCKED | 98% | 2026-06-15 03:11: the official `km-vectorize` 03:00 gate succeeded. ArgoCD `awoooi-prod` is `Synced / Healthy`, CronJob `lastSuccessfulTime=2026-06-14T19:00:55Z`, Job `km-vectorize-29691060` is `Complete`, and the log is `embed-all: 200 {"total":31,"success":31,"failed":0}`. Backup core blockers remain `0`, 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, but `escrow_missing=5`. Full cold-start still cannot be declared green because the latest scorecard is `PASS=81 WARN=2 BLOCKED=0`; the warnings come from unconfirmed 188 momo scheduler registration/activity and old failed Job evidence still retained in K8s. |
| P0 host / K3s recovery | DONE | 100% | 120 booted after console fsck at `2026-06-12 15:13`; latest 2026-06-14 18:15 readback shows 120 is reachable, K3s is active, `mon` and `mon1` are both `Ready control-plane`, and cold-start P0/P1 checks are green. |
| P1 backup / alert / escrow | BLOCKED_DR_ESCROW | 92% | 2026-06-14 18:14 `backup-status` shows 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, `escrow_missing=5`, last aggregate `2026-06-14 02:40:22`. Owner request package is ready; actual marker write remains blocked on real non-secret evidence IDs. |
| P2 service / data truth | VERIFIED_WORKLOAD_BALANCED_WITH_KM_WARN | 99% | 2026-06-14 18:15 cold-start is degraded by one warning only; public route/API smoke is green, VIP API/Web are reachable, momo current-month parity remains covered by the scorecard, schedules/services are mostly green, and 110 failed units remain `0`. API/Web both keep 120 / 121 split placement after latest ArgoCD source revision `0a4766ddc94b0690824ce3deba5c6b9a69764f94`, with live API/Web/Worker/CronJob image `e999c16b3435f197b78fe2adfeec1c4faa6c4675`; the remaining exception is failed `km-vectorize-29689620`. |
| P3 docs / automation contracts | DONE_WITH_VALIDATION_GAP | 100% | Workplan, SOP v1.20, BACKUP-STATUS, LOGBOOK, 120 console/fsck recovery, Gitea backup stale-dump hardening, reboot ledger/version-comparison SOP, escrow evidence audit, 188 nginx Ansible baseline, 110 cold-start detector script, startup judgment layers, GO/NO-GO tree, host recovery cards, T+0/T+60 timeline checks, host role / load-balancing assessment, CD `known_hosts` guardrail, `fwupd-refresh.timer` rollback note, post-CD no-regression readback, P2-135 deploy recovery readback, P2-136 / recovery readback after the AI Agent activity production deployment, P2-137 / CI smoke timeout recovery readback, P2-143 recovery readback after owner response preflight, P2-144 recovery readback after owner response readback, P2-145 recovery readback after owner response acceptance threshold, recovery readback after the IwoooS P0 configuration-control priority order, recovery readback after the high-value configuration Owner Packet frontend sync, and `km-vectorize` remediation tracking have all been updated; this workstation cannot run the Ansible syntax check. |
| P1 backup / alert / escrow | BLOCKED_DR_ESCROW | 92% | 2026-06-15 03:11 `backup-status` shows 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, `escrow_missing=5`, last aggregate `2026-06-15 02:40:13`. Offsite / escrow report shows `SCRIPT_MISSING_COUNT=0`, `OFFSITE_CONFIGURED=1`, `RCLONE_CONFIGURED=1`, `ESCROW_MISSING_COUNT=5`. Owner request package is ready; actual marker write remains blocked on real non-secret evidence IDs. |
| P2 service / data truth | VERIFIED_ARGOCD_HEALTHY_WITH_RESIDUAL_WARNINGS | 99% | 2026-06-15 03:11 cold-start is degraded by two warnings only; public route/API smoke is green, VIP API/Web are reachable, momo current-month parity remains covered by the scorecard, schedules/services are mostly green, and 110 failed units remain `0`. `km-vectorize-29691060` succeeded, ArgoCD is `Healthy`, and API/Web remain split across 120 / 121. Remaining scorecard warnings are 188 momo scheduler registration/activity not confirmed and retained old K8s failed Job evidence. |
| P3 docs / automation contracts | DONE_WITH_VALIDATION_GAP | 100% | Workplan, SOP v1.21, BACKUP-STATUS, LOGBOOK, 120 console/fsck recovery, Gitea backup stale-dump hardening, reboot ledger/version-comparison SOP, escrow evidence audit, 188 nginx Ansible baseline, 110 cold-start detector script, startup judgment layers, GO/NO-GO tree, host recovery cards, T+0/T+60 timeline checks, host role / load-balancing assessment, CD `known_hosts` guardrail, `fwupd-refresh.timer` rollback note, post-CD no-regression readback, P2-135 deploy recovery readback, P2-136 / recovery readback after the AI Agent activity production deployment, P2-137 / CI smoke timeout recovery readback, P2-143 recovery readback after owner response preflight, P2-144 recovery readback after owner response readback, P2-145 recovery readback after owner response acceptance threshold, recovery readback after the IwoooS P0 configuration-control priority order, recovery readback after the high-value configuration Owner Packet frontend sync, and `km-vectorize` official success readback have all been updated; this workstation cannot run the Ansible syntax check. |

Full cold-start may be declared green only for the latest verified evidence set. As of 2026-06-14 18:15, the latest evidence set is degraded by `km-vectorize` only, not green. Do not declare DR scorecard complete while credential escrow evidence remains blocked.
Full cold-start may be declared green only for the latest verified evidence set. As of 2026-06-15 03:11, `km-vectorize` and ArgoCD are healthy, but the latest scorecard is still `DEGRADED` by residual warnings. Do not declare DR scorecard complete while credential escrow evidence remains blocked.

2026-06-13 01:26 refresh: full cold-start is again green for the current evidence set. AWOOOI API/Web workload balancing survived the next normal CD deploy: Gitea main `e4a349bc`, ArgoCD revision `e4a349bc`, images from `414413a5`, API/Web split across `mon` / `mon1`, and global `known_hosts` retained 120 / 188 after CD fix `80e6ec1a`. Do not declare DR complete while credential escrow is missing. `km-vectorize` remediation is `90%`: schedule/label fix is live, and the remaining gate is the next official 03:00 CronJob success readback.
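The declaration rule in the paragraphs above (green only when the latest scorecard has no residual warnings, and DR complete only when no escrow evidence is missing) can be sketched as a small classifier. This is an illustration under stated assumptions: the source only shows that `WARN` greater than zero with `BLOCKED=0` yields `DEGRADED`. The `GREEN` condition, the `BLOCKED` result label, and all function names here are hypothetical and do not come from the actual cold-start script.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Scorecard:
    passed: int
    warn: int
    blocked: int


def parse_scorecard(text: str) -> Scorecard:
    """Parse a summary such as 'PASS=81 WARN=2 BLOCKED=0'."""
    fields = dict(re.findall(r"(PASS|WARN|BLOCKED)=(\d+)", text))
    return Scorecard(int(fields["PASS"]), int(fields["WARN"]), int(fields["BLOCKED"]))


def classify(card: Scorecard) -> str:
    """Assumed mapping: any blocker wins, then any warning, else green."""
    if card.blocked > 0:
        return "BLOCKED"   # hypothetical label, not shown in the source
    if card.warn > 0:
        return "DEGRADED"  # matches both scorecards recorded in the table
    return "GREEN"         # assumed to require zero warnings and zero blockers


def dr_complete(escrow_missing: int) -> bool:
    """DR may be declared complete only when no escrow evidence is missing."""
    return escrow_missing == 0


latest = parse_scorecard("PASS=81 WARN=2 BLOCKED=0")
print(classify(latest))   # DEGRADED
print(dr_complete(5))     # False
```

Keeping the cold-start result and the DR escrow check as separate functions mirrors how the status table tracks them as independent gates.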