docs(recovery): record post-push cold start blockers
Some checks failed
CD Pipeline / workflow-shape (push) Successful in 1s
CD Pipeline / cancel-stale-cd (push) Has been skipped
CD Pipeline / tests (push) Successful in 39s
CD Pipeline / build-and-deploy (push) Failing after 2m48s
AWOOOI Harbor 110 Local Repair / workflow-shape (push) Successful in 1s
CD Pipeline / post-deploy-checks (push) Has been skipped
AWOOOI Harbor 110 Local Repair / harbor-110-local-repair (push) Has been cancelled

Your Name
2026-06-30 23:29:05 +08:00
parent 6808447069
commit c73f83b99e
2 changed files with 20 additions and 4 deletions

@@ -1,3 +1,19 @@
## 2026-06-30 — 23:27 Post-push cold-start / Harbor / Stock readback
**Issues addressed on the mainline**
- `3de828f97 fix(agent): surface harbor cd retry receipt blocker` was pushed to Gitea `main`, which was then fast-forwarded on the same mainline to `68084470 feat(agent): expose deploy marker receipt input`; the local worktree is synced to `gitea-ssh/main`.
- After the push, the public queue returned `status=blocked_harbor_110_repair_no_matching_runner`: latest CD `#4117 Running` at commit `6808447069422989c547c5edd5e88bcb745dd16b`; Harbor repair `#4115 Waiting` with no-matching label `awoooi-host`; repair jobs still stale/mismatched.
- The live route probe still returned `registry.wooo.work/v2=502`, `192.168.0.110:5000/v2=502`, `harbor.wooo.work/api/v2.0/health=502`, and `signoz.wooo.work=502`; full-site recovery must not be claimed.
- `scripts/reboot-recovery/full-stack-cold-start-check.sh` artifact `/tmp/awoooi-cold-start-after-3de828f97.log`: `PASS=67 WARN=5 BLOCKED=4`, result `BLOCKED`. The blockers are the 110 registry external `/v2`, the 110 SSH read-only check, the K3s registry pull refused by `110:5000`, and the SignOz TLS/public route.
- The StockPlatform public route / health returned 200, but `/api/v1/system/freshness` and `/api/v1/system/ingestion` still returned `status=not_configured`, `blockers=["postgres_not_ready"]`, and `latest_trading_date=null`; product data must not be claimed as up to date.
**Verification**
- `pytest apps/api/tests/test_harbor_registry_controlled_recovery_receipt.py apps/api/tests/test_awoooi_priority_work_order_readback_api.py apps/api/tests/test_ai_agent_log_controlled_writeback_executor_readback_api.py ops/runner/test_read_public_gitea_actions_queue.py -q` passed (41 passed).
- `py_compile`, `ruff check`, `guard-gitea-runner-pressure.py --root .`, and `git diff --check` passed.
- `git ls-remote gitea-ssh refs/heads/main` returned `6808447069422989c547c5edd5e88bcb745dd16b`.
**Boundaries**: only a Gitea fast-forward, source/test/doc commits, and public queue / HTTP / cold-start readback were performed. No secret / token / `.env` / raw sessions / SQLite / auth was read; GitHub / `gh` / the GitHub API were not used; no workflow_dispatch was triggered; there were no SSH writes, no host reboot, and no restart of Docker / Nginx / K3s / DB / firewall.
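
The cold-start gate described in this entry (a `PASS=… WARN=… BLOCKED=…` summary where any blocker forbids a recovery claim) can be sketched as follows. This is a minimal illustration, not the logic of `full-stack-cold-start-check.sh`: the function names are hypothetical, and the assumption that the counters appear as space-separated `KEY=<int>` pairs is inferred only from the summary quoted above.

```python
import re


def parse_cold_start_counts(summary: str) -> dict[str, int]:
    """Extract KEY=<int> counters such as PASS=67 WARN=5 BLOCKED=4.

    Assumption: the summary uses upper-case keys followed by '=' and an integer.
    """
    return {key: int(value) for key, value in re.findall(r"\b([A-Z_]+)=(\d+)\b", summary)}


def can_claim_recovered(counts: dict[str, int]) -> bool:
    """Fail closed: a missing BLOCKED counter is treated as blocked."""
    return counts.get("BLOCKED", 1) == 0 and counts.get("PASS", 0) > 0


counts = parse_cold_start_counts("PASS=67 WARN=5 BLOCKED=4")
print(counts, can_claim_recovered(counts))
```

Defaulting a missing `BLOCKED` counter to a non-zero value mirrors the rule in this log that unknown evidence must not be treated as green.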
## 2026-06-30 — 23:18 Harbor recovery receipt absorbs live CD 502 retry
**Issues addressed on the mainline**


@@ -15,11 +15,11 @@
| Priority | Status | Work item | 2026-06-30 evidence | Next step / completion criteria |
|------|------|--------|------------------|-------------------|
| P0-1 | BLOCKED | All-host cold-start / 10-minute automatic recovery SLO | 22:55 live cold-start artifact `/tmp/awoooi-cold-start-live-after-ff.log`: `PASS=68 WARN=4 BLOCKED=4`; the blockers are the 110 registry external `/v2`, the 110 SSH read-only check, the K3s registry pull refused by `110:5000`, and SignOz TLS / 502. The 22:31 SLO scorecard `/tmp/awoooi-reboot-slo-live-20260630-2231-scorecard.json` still returned `can_claim_all_services_recovered_within_target=false`; the 22:28 post-reboot summary `/tmp/awoooi-post-reboot-readiness-20260630-222856/summary.txt` reported `SERVICE_GREEN=0`, `PRODUCT_DATA_GREEN=0`, `BACKUP_CORE_GREEN=0`, and `HOST_188_SERVICE_GREEN=0`. | Fix the first runtime blocker first: the 110 control path / Harbor registry `/v2`. Rerun the same summary / cold-start / SLO scorecard until `SERVICE_GREEN=1`, `POST_START_BLOCKED=0`, `PASS` with no BLOCKED, all-host required observed/reachable, and `awoooi_reboot_auto_recovery_slo_ready=1`; recovery must not be claimed from route 200 or CD `Running` alone. |
| P0-1 | BLOCKED | All-host cold-start / 10-minute automatic recovery SLO | 23:27 live cold-start artifact `/tmp/awoooi-cold-start-after-3de828f97.log`: `PASS=67 WARN=5 BLOCKED=4`, `Result: BLOCKED`; the blockers are the 110 registry external `/v2`, the 110 SSH read-only check, the K3s registry pull refused by `110:5000`, and the SignOz TLS / public route. The 22:31 SLO scorecard `/tmp/awoooi-reboot-slo-live-20260630-2231-scorecard.json` still returned `can_claim_all_services_recovered_within_target=false`; the 22:28 post-reboot summary `/tmp/awoooi-post-reboot-readiness-20260630-222856/summary.txt` reported `SERVICE_GREEN=0`, `PRODUCT_DATA_GREEN=0`, `BACKUP_CORE_GREEN=0`, and `HOST_188_SERVICE_GREEN=0`. | Fix the first runtime blocker first: the 110 control path / Harbor registry `/v2`. Rerun the same summary / cold-start / SLO scorecard until `SERVICE_GREEN=1`, `POST_START_BLOCKED=0`, `PASS` with no BLOCKED, all-host required observed/reachable, and `awoooi_reboot_auto_recovery_slo_ready=1`; recovery must not be claimed from route 200 or CD `Running` alone. |
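| P0-2 | DONE_THIS_INCIDENT | User-visible 502: Tsenyang | `www.tsenyang.com` / `tsenyang.com` recovered from 502 to 200; on 188 the `tsenyang-website` container is running; local `127.0.0.1:3000` returned 200. | For the next 502 of this kind, check the release symlink / image / container first; do not start by touching Nginx, DNS, or the DB, or by rebooting the host. |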
| P0-3 | BLOCKED | StockPlatform data freshness | At 22:50, public `/api/v1/system/freshness` and `/api/v1/system/ingestion` returned `status=not_configured` and `blockers=["postgres_not_ready"]`; the public route `https://stock.wooo.work/` returning 200 only shows that the site is reachable, not that the data is current. | After the 110 control path is restored, inspect the `/home/wooo/stockplatform-v2` compose / DB schema / migration status read-only; fake freshness, manual DB rows, and restore/prune are prohibited. |
| P0-4 | BLOCKED | AWOOOI production version currency | Gitea SSH `main` is now at `a7b79b7b feat(agent): expose harbor receipt input contract`. The public Gitea queue read at 23:18 showed latest CD `#4113 Running`, but with `current_main_cd_inflight_classifier=harbor_registry_public_route_unavailable_pending_retry`, latest registry `/v2` status `502`, and Harbor login attempt `8`; production cannot yet be shown to have deployed the latest source. | Supply a deploy marker / runtime SHA / endpoint readback that agree with each other. Until Harbor `/v2` recovers, CD cannot ship the latest source to production; until the three agree, AWOOOI must not be claimed to be up to date. |
| P0-5 | BLOCKED | 110 control path / Harbor registry `/v2` | The 23:18 queue readback returned `status=blocked_harbor_110_repair_no_matching_runner`: Harbor repair workflow `#4112` is `Waiting` with no-matching label `awoooi-host`, and the jobs API is still stale/mismatched (`ai-code-review`). The Harbor receipt validator returned `status=blocked_waiting_harbor_controlled_recovery_receipt` for the live queue; active blockers include `gitea_queue_current_cd_harbor_retrying_unavailable`, `gitea_queue_harbor_110_repair_no_matching_runner`, `gitea_queue_harbor_110_repair_jobs_stale_or_mismatched`, `public_registry_v2_verifier_not_green`, and `internal_registry_v2_verifier_not_green`. The 22:55 cold-start run likewise showed the 110 registry `/v2` blocked, the 110 SSH read-only check blocked, and the K3s pull refused by `110:5000`. | Get the 110-local repair workflow or a 110 console/local script to actually run `recover-110-control-path-and-harbor-local.sh --check` / `--apply-all`, and read back `200/401` from public/internal `/v2`. Stock DB, Gitea dump, and 110 backup completeness can only be verified once the SSH read-only command path is restored. |
| P0-3 | BLOCKED | StockPlatform data freshness | At 23:27, the public route / health returned 200, but `/api/v1/system/freshness` and `/api/v1/system/ingestion` returned `status=not_configured`, `blockers=["postgres_not_ready"]`, and `latest_trading_date=null`; a reachable site does not mean the data is current. | After the 110 control path is restored, inspect the `/home/wooo/stockplatform-v2` compose / DB schema / migration status read-only; fake freshness, manual DB rows, and restore/prune are prohibited. |
| P0-4 | BLOCKED | AWOOOI production version currency | Gitea SSH `main` is now at `68084470 feat(agent): expose deploy marker receipt input`, which includes `3de828f97 fix(agent): surface harbor cd retry receipt blocker`. The public Gitea queue read at 23:27 showed latest CD `#4117 Running` for `68084470`; production cannot yet be shown to have deployed the latest source. | Supply a deploy marker / runtime SHA / endpoint readback that agree with each other. Until Harbor `/v2` recovers, CD cannot ship the latest source to production; until the three agree, AWOOOI must not be claimed to be up to date. |
| P0-5 | BLOCKED | 110 control path / Harbor registry `/v2` | The 23:27 queue readback returned `status=blocked_harbor_110_repair_no_matching_runner`: Harbor repair workflow `#4115` is `Waiting` with no-matching label `awoooi-host`, and the jobs API is still stale/mismatched. The live route probe likewise returned public registry `/v2=502`, internal `192.168.0.110:5000/v2=502`, Harbor health `502`, and SignOz `502`. The Harbor receipt validator returned `status=blocked_waiting_harbor_controlled_recovery_receipt` for the live queue; active blockers include `gitea_queue_harbor_110_repair_no_matching_runner`, `gitea_queue_harbor_110_repair_jobs_stale_or_mismatched`, `public_registry_v2_verifier_not_green`, and `internal_registry_v2_verifier_not_green`. The 23:27 cold-start run likewise showed the 110 registry `/v2` blocked, the 110 SSH read-only check blocked, and the K3s pull refused by `110:5000`. | Get the 110-local repair workflow or a 110 console/local script to actually run `recover-110-control-path-and-harbor-local.sh --check` / `--apply-all`, and read back `200/401` from public/internal `/v2`. Stock DB, Gitea dump, and 110 backup completeness can only be verified once the SSH read-only command path is restored. |
| P0-6 | BLOCKED_BACKUP_COMPLETENESS | Gitea repo visibility and full backup | The Gitea version API returned 200; public repo search listed only 4 public repos; the `stockplatform-v2` public page/API returned 404 while internal `git ls-remote` succeeded; on 188, `/home/ollama/backup/110/gitea` was initially empty. A verified emergency bundle was created at `/home/ollama/backup/110/gitea/git-bundles/20260630-190931`: bundle verify + checksum succeeded for 4 public/internal repos, while `AwoooGo`, `stockplatform-v2`, and `vibework` failed closed on private auth. The 20:18 summary could not read back 110 `backup-status`, so `BACKUP_CORE_GREEN=0`, `DR_ESCROW_BLOCKED=1`, and `DR_ESCROW_EVIDENCE_UNKNOWN=1`. | The 188 `gitea_repo_mirror_from_110` subtree metric / alert has been added; the next step is still to restore the 110 SSH command path and then run a formal `gitea dump`, a non-interactive private-repo backup, and the repo count, backup-status, and restore drill readbacks. Unknown must not be treated as backup / DR green. |
| P0-7 | SOURCE_READY_RUNTIME_BLOCKED | 99 VMware / VM autostart | The repo already has `windows99-vmware-autostart.ps1`. The 22:05 host probe found 99 ping-reachable but with `boot_id=reachable_unknown_boot` / uptime unknown; 111 was unreachable; 112/120/121/188 were readable; the 188 startup unit was failed/degraded. An earlier read-only readback showed 99 RDP 3389 / SSH 22 reachable, WinRM 5985 failing, and `administrator@192.168.0.99` SSH publickey denied. | Restore a controllable channel to 99 or apply the script from the console; afterwards read back boot evidence for 111/188/120/121/112, requiring all-host required observed/reachable and 99 no longer reporting unknown uptime. |
| P0-8 | SOURCE_READY_RUNTIME_BLOCKED | 502 maintenance fallback / Telegram / backup alert | The L0/L1 fallback runbook, Nginx snippet, and reboot / backup alert rules are already in source; runtime still needs deployment and an external L1 provider readback. | L0: verify `X-AWOOOI-Fallback` with a test vhost. L1: requires an external cloud/CDN probe. Telegram: verify with a redacted alert receipt. |
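
The P0-5 completion criterion (public/internal `/v2` reading back `200/401`, with the currently observed `502` counting as blocked) can be illustrated with a small read-only probe. This is a sketch under stated assumptions, not the repo's verifier: `classify_v2_status` and `probe_status` are hypothetical names, and the URL passed to `probe_status` (including its scheme) is left to the caller because the table above quotes only host and path.

```python
import urllib.error
import urllib.request

# Per the table above, 200 or 401 from /v2 counts as a green readback;
# a registry that requires auth answers 401 rather than 200.
GREEN_V2_STATUSES = {200, 401}


def classify_v2_status(status):
    """Map an HTTP status (or None for no response) to a readback verdict."""
    if status is None:
        return "unreachable"
    return "green" if status in GREEN_V2_STATUSES else "blocked"


def probe_status(url, timeout=5.0):
    """Issue one GET and return the HTTP status, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # urllib raises on 4xx/5xx; the status code is still the evidence we want.
        return exc.code
    except (urllib.error.URLError, OSError):
        return None


# Classification of the statuses quoted in this document (no network access needed).
print(classify_v2_status(502), classify_v2_status(401), classify_v2_status(None))
```

Keeping classification separate from the network call lets the verdict logic be tested without touching the hosts, which fits the read-only boundary stated in the log entry.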