diff --git a/docs/LOGBOOK.md b/docs/LOGBOOK.md
index 85780de2..f95f11bb 100644
--- a/docs/LOGBOOK.md
+++ b/docs/LOGBOOK.md
@@ -46252,3 +46252,35 @@ production browser smoke:
 - Wazuh manager registry accepted is still `0`: do not declare Wazuh all-host enrollment recovered.
 - Owner response received / accepted is still `0 / 0`; do not treat "approved to continue", an empty template, UI visibility, route `200`, transport `6`, Dashboard index pattern `3`, or owner-packet JSON as evidence accepted.
 - Runtime action / host write / credential marker write / Wazuh active response / Kali active scan are all still `0 / false`.
+
+## 2026-06-26 - 23:56 AWOOOI API rollout warmup classifier / SOP v1.76
+
+**Time and sources**:
+- 2026-06-26 23:31-23:56 Asia/Taipei.
+- Sources: `scripts/reboot-recovery/post-start-quick-check.sh --no-color`, `scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color`, AWOOOI production `/api/v1/health`, read-only readback of K3s / ArgoCD on 120, and read-only redacted readback of the Wazuh manager / dashboard on 112.
+
+**Completed**:
+- `post-start-quick-check.sh` adds `AWOOOI_API_ROUTE_OK`. When the delegated cold-start emits a single `BLOCKED AWOOOI API not reachable` at the moment of a K3s/CD rollout, but the wrapper's public route retry has confirmed that `https://awoooi.wooo.work/api/v1/health` returns 2xx, that cold-start blocker is downgraded to a route/API warmup evidence warning. If the public API still fails, another non-route blocker exists, or the retry does not recover, it remains a hard blocker.
+- `docs/runbooks/FULL-STACK-COLD-START-SOP.md` is bumped to v1.76; `docs/runbooks/REBOOT-POST-START-QUICK-CHECK.md` is bumped to v1.16; the workplan Current Verdict is updated to `FULL_STACK_GREEN_DR_ESCROW_BLOCKED`.
+- Read-only follow-up check of K3s on 120 completed: ArgoCD `awoooi-prod` is `Synced / Healthy`, and all `awoooi-prod` Pods are `Running` or `Completed`. The historical failed Job `km-vectorize-29689620` has been superseded by subsequent successful Jobs on 2026-06-23, 2026-06-24, and 2026-06-25, so it is not a current service blocker.
+- Read-only follow-up check of Wazuh on 112 completed: systemd is `running`, manager / indexer / dashboard are `active`, the API root returns `401`, and the dashboard unauthenticated check endpoints return `401`; no secret value was read, printed, or saved. Redacted manager registry counts show local manager `1`, managed agents `5`, active managed `5`, disconnected `0`, never connected `0`. This proves the registry is not entirely empty, but SOP expected scope / Dashboard API / owner acceptance have still not been met, so `WAZUH_MANAGER_REGISTRY_ACCEPTED=0` stands.
+
+**Live verification results**:
+- 23:34 summary: `POST_START_RC=0`, `POST_START_RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, `SERVICE_GREEN=1`, `PRODUCT_DATA_GREEN=1`, `STOCK_FRESHNESS_STATUS=ok`, `STOCK_LATEST_TRADING_DATE=2026-06-26`, `STOCK_BLOCKERS=none`, `BACKUP_CORE_GREEN=1`, `ESCROW_MISSING_COUNT=5`, `HOST_188_HYGIENE_BLOCKED=0`.
+- Final 23:56 summary: `POST_START_RC=0`, `POST_START_RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, `POST_START_PASS=38`, `POST_START_WARN=3`, `POST_START_BLOCKED=0`, `SERVICE_GREEN=1`, `PRODUCT_DATA_GREEN=1`, `BACKUP_CORE_GREEN=1`, `DR_ESCROW_BLOCKED=1`, `ESCROW_MISSING_COUNT=5`, `HOST_188_HYGIENE_BLOCKED=0`, `WAZUH_MANAGER_REGISTRY_ACCEPTED=0`, `RUNTIME_ACTION_AUTHORIZED=0`, `NEXT_REQUIRED_GATES=credential_escrow_evidence,wazuh_manager_registry_export`.
+- 110 CPU attribution: the high load comes from active Gitea Actions / StockPlatform Next build / Jest workers and platform services; no orphan Chrome recurrence was observed; nothing was killed or restarted and no CI was cancelled in this round.
+
+**Command types run**:
+- Read-only: cold-start / post-start / readiness summary, public route, 112 Wazuh service / API / dashboard endpoint / registry counts, 110 CPU attribution.
+- Writes: repo script / runbook / workplan / LOGBOOK only.
+- Not done: no host / Docker / systemd / Nginx / firewall / K8s / DB runtime writes; no plaintext secret reads; no credential marker write; no Wazuh active response / agent re-enroll / restart; no Kali active scan; no CI cancellation.
+
+**Current verdict**:
+- Hosts, K3s, services, public routes, MOMO, StockPlatform freshness, backup core, 188 host hygiene: `GREEN`.
+- Overall recovery declaration: `FULL_STACK_GREEN_DR_ESCROW_BLOCKED`.
+- SOP / quick-check / route + AWOOOI API warmup classifier: v1.76.
+
+**Still blocked / must not be declared**:
+- DR credential escrow evidence is still missing `5`: do not declare `DR_COMPLETE`.
+- Wazuh manager registry accepted is still `0`: do not declare Wazuh all-host enrollment recovered; the redacted manager registry counts are only evidence, and Dashboard API / owner acceptance are still missing.
+- Runtime action / host write / credential marker write / Wazuh active response / Kali active scan are all still `0 / false`.
diff --git a/docs/runbooks/FULL-STACK-COLD-START-SOP.md b/docs/runbooks/FULL-STACK-COLD-START-SOP.md
index b0828882..7eff15c4 100644
--- a/docs/runbooks/FULL-STACK-COLD-START-SOP.md
+++ b/docs/runbooks/FULL-STACK-COLD-START-SOP.md
@@ -1,6 +1,6 @@
 # AWOOOI Full-Stack Cold Start and Host Reboot SOP
 
-> Version: v1.75
+> Version: v1.76
 > Last updated: 2026-06-26 Asia/Taipei
 > Scope: 110 / 120 / 121 / 188 full-stack reboot recovery. 112 Kali is recorded as P3 optional and is not part of this recovery path.
 
@@ -10,7 +10,11 @@
 
 This section is the first decision anchor after every handover, power-on, shutdown, or reboot. If its date is not today, rerun the live check first, then update this section and `docs/workplans/2026-06-04-reboot-cold-start-backup-recovery-workplan.md`.
 
-If you only need a quick post-reboot decision on whether recovery can be declared, first run the machine-readable summary: `scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color`. This script calls the one-page check, the 188 host hygiene checklist, and the Wazuh no-false-green repo gates, and leaves the delegated logs and a replayable `summary.txt` under `/tmp/awoooi-post-reboot-readiness-*`. From v1.75 on, later steps in the same acceptance round must consume the same `$ARTIFACT_DIR/summary.txt`, for example `scripts/reboot-recovery/post-reboot-declaration-guard.py --summary-file "$ARTIFACT_DIR/summary.txt" --no-color` and `scripts/reboot-recovery/post-reboot-next-gate-dispatch.sh --summary-file "$ARTIFACT_DIR/summary.txt" --no-color`; do not rerun live probes repeatedly within one evidence chain and then mix conclusions from different points in time. The declaration guard converts the summary into allowed / forbidden declarations, so that service green is not misreported as DR complete, 188 host hygiene, Wazuh registry recovered, or runtime authorized. If the summary shows `SERVICE_GREEN=1` while `NEXT_REQUIRED_GATES` is still non-empty, the dispatch checklist turns the unfinished blockers into an owner / evidence / forbidden-action checklist; when machine-readable intake is needed, run `scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --dispatch-file --output /tmp/awoooi-post-reboot-owner-packets.json` to produce the `awoooi_post_reboot_next_gate_owner_packets_v1` JSON, then immediately run `scripts/reboot-recovery/post-reboot-owner-packet-contract-guard.py --packet-file /tmp/awoooi-post-reboot-owner-packets.json`. Dispatch / packet / guard all pin `DISPATCH_AUTHORIZED=0`, `REQUEST_SENT_COUNT=0`, `OWNER_RESPONSE_ACCEPTED=0`, `HOST_WRITE_AUTHORIZED=0`, `SECRET_VALUE_COLLECTION_ALLOWED=0`, `RUNTIME_GATE=0`; if the guard does not pass, do not send an owner request, do not write an escrow marker, do not enter a maintenance window, and do not declare DR / Wazuh registry complete. From v1.74 on, every owner response JSON must also go through `scripts/reboot-recovery/post-reboot-owner-response-preflight.py --no-color --owner-packet-file --response-file `: an empty template, a placeholder, a secret payload, a runtime action request, a credential marker write, Wazuh active response / re-enroll / restart, a Kali active scan, or missing Dashboard API / manager registry evidence must all fail closed; passing preflight only means the response may proceed to independent reviewer acceptance, not that runtime is authorized. When manual expansion is needed, run `scripts/reboot-recovery/post-start-quick-check.sh --no-color` and use `docs/runbooks/REBOOT-POST-START-QUICK-CHECK.md` as the fallback. The long SOP keeps the full background, exception handling, and Plan B; the short wrapper / checklist owns the fixed decision within T+10 minutes each time.
+If you only need a quick post-reboot decision on whether recovery can be declared, first run the machine-readable summary: `scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color`. This script calls the one-page check, the 188 host hygiene checklist, and the Wazuh no-false-green repo gates, and leaves the delegated logs and a replayable `summary.txt` under `/tmp/awoooi-post-reboot-readiness-*`. From v1.75 on, later steps in the same acceptance round must consume the same `$ARTIFACT_DIR/summary.txt`, for example `scripts/reboot-recovery/post-reboot-declaration-guard.py --summary-file "$ARTIFACT_DIR/summary.txt" --no-color` and `scripts/reboot-recovery/post-reboot-next-gate-dispatch.sh --summary-file "$ARTIFACT_DIR/summary.txt" --no-color`; do not rerun live probes repeatedly within one evidence chain and then mix conclusions from different points in time. From v1.76 on, if the delegated cold-start emits a single `BLOCKED AWOOOI API not reachable` at the moment of a K3s rollout / CD replacement, but the wrapper's own public `https://awoooi.wooo.work/api/v1/health` route retry has returned 2xx, that blocker is listed only as a route/API warmup evidence warning; if the public API still fails, another non-route blocker exists, or the retry does not recover, it stays hard blocked. The declaration guard converts the summary into allowed / forbidden declarations, so that service green is not misreported as DR complete, 188 host hygiene, Wazuh registry recovered, or runtime authorized. If the summary shows `SERVICE_GREEN=1` while `NEXT_REQUIRED_GATES` is still non-empty, the dispatch checklist turns the unfinished blockers into an owner / evidence / forbidden-action checklist; when machine-readable intake is needed, run `scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --dispatch-file --output /tmp/awoooi-post-reboot-owner-packets.json` to produce the `awoooi_post_reboot_next_gate_owner_packets_v1` JSON, then immediately run `scripts/reboot-recovery/post-reboot-owner-packet-contract-guard.py --packet-file /tmp/awoooi-post-reboot-owner-packets.json`. Dispatch / packet / guard all pin `DISPATCH_AUTHORIZED=0`, `REQUEST_SENT_COUNT=0`, `OWNER_RESPONSE_ACCEPTED=0`, `HOST_WRITE_AUTHORIZED=0`, `SECRET_VALUE_COLLECTION_ALLOWED=0`, `RUNTIME_GATE=0`; if the guard does not pass, do not send an owner request, do not write an escrow marker, do not enter a maintenance window, and do not declare DR / Wazuh registry complete. From v1.74 on, every owner response JSON must also go through `scripts/reboot-recovery/post-reboot-owner-response-preflight.py --no-color --owner-packet-file --response-file `: an empty template, a placeholder, a secret payload, a runtime action request, a credential marker write, Wazuh active response / re-enroll / restart, a Kali active scan, or missing Dashboard API / manager registry evidence must all fail closed; passing preflight only means the response may proceed to independent reviewer acceptance, not that runtime is authorized. When manual expansion is needed, run `scripts/reboot-recovery/post-start-quick-check.sh --no-color` and use `docs/runbooks/REBOOT-POST-START-QUICK-CHECK.md` as the fallback. The long SOP keeps the full background, exception handling, and Plan B; the short wrapper / checklist owns the fixed decision within T+10 minutes each time.
+
+v1.76 owner gate replay rule: once a round's summary has been produced, owner packet generation and owner response preflight must prefer `--summary-file "$ARTIFACT_DIR/summary.txt"`, for example `scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --no-color --summary-file "$ARTIFACT_DIR/summary.txt" --output /tmp/awoooi-post-reboot-owner-packets.json` and `scripts/reboot-recovery/post-reboot-owner-response-preflight.py --no-color --summary-file "$ARTIFACT_DIR/summary.txt" --response-file `. Omitting `--summary-file` is allowed only when live evidence is deliberately being re-collected; otherwise preflight must not rerun the summary on its own and cause state drift within the same round.
+
+Latest live summary at 2026-06-26 23:56: `scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color` returned `POST_START_RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, `POST_START_PASS=38`, `POST_START_WARN=3`, `POST_START_BLOCKED=0`, `SERVICE_GREEN=1`, `PRODUCT_DATA_GREEN=1`, `STOCK_FRESHNESS_STATUS=ok`, `STOCK_LATEST_TRADING_DATE=2026-06-26`, `STOCK_BLOCKERS=none`, `BACKUP_CORE_GREEN=1`, `ESCROW_MISSING_COUNT=5`, `HOST_188_HYGIENE_BLOCKED=0`, `WAZUH_MANAGER_REGISTRY_ACCEPTED=0`, `RUNTIME_ACTION_AUTHORIZED=0`. Read-only follow-up check of 120 in the same window: ArgoCD `awoooi-prod` is `Synced / Healthy`, and all `awoooi-prod` Pods are `Running` or `Completed`; the historical failed Job `km-vectorize-29689620` has been superseded by subsequent successful Jobs on 2026-06-23, 2026-06-24, and 2026-06-25, so it is not a current service blocker. Read-only follow-up check of 112 in the same window: systemd is `running`, Wazuh manager / indexer / dashboard are `active`, the manager API root returns `401`, the Dashboard unauthenticated check endpoints return `401`, and the redacted manager registry readback shows local manager `1`, managed agents `5`, active managed `5`, disconnected `0`, never connected `0`. This evidence proves the registry is no longer "entirely empty", but Wazuh all-host enrollment still cannot be declared recovered, because the SOP expected scope is still 6, Dashboard API connection / version has not yet been accepted through login or owner evidence, and owner response accepted is still `0`.
 
 The latest live recovery truth at 2026-06-26 18:46 supersedes the 12:13 reading of today's product data: `scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color` returned `POST_START_RESULT=PRODUCT_DATA_PENDING_EOD_WINDOW`, `SERVICE_GREEN=1`, `PRODUCT_DATA_GREEN=0`, `STOCK_LATEST_TRADING_DATE=2026-06-26`, `STOCK_BLOCKERS=core_margin_short_daily_missing,ai_recommendations_stale`, `BACKUP_CORE_GREEN=1`, `ESCROW_MISSING_COUNT=5`, `WAZUH_MANAGER_REGISTRY_ACCEPTED=0`. The long live cold-start check in the same round returned `PASS=87 WARN=0 BLOCKED=0` and `Result: GREEN`, meaning the service layer of the 110 / 120 / 121 / 188 hosts, K3s, public routes, AWOOOI API, MOMO, backup core, exporters, cron, and Alertmanager has recovered; however, today's official StockPlatform margin-short data has not been published yet and AI recommendations still depend on it, so do not declare all product data up to date. At 18:43, two `stockplatform-review-bulk-ux` orphan Chrome process groups older than 6 hours on 110 were cleared with an authorized `SIGTERM`, `REMAINING=0`; at 18:44-18:46, the local AWOOOI `next build` on the 168 Mac Mini was stopped and temp/build/cache plus Antigravity backup browser recordings were cleaned up, bringing `/System/Volumes/Data` from about `1.0Gi / 100%` back to about `8.7Gi / 96%`. The failed `networking.service` on 112 Kali has been traced to a wrong shebang `#\!/bin/bash` in `/etc/network/if-up.d/wg-nat`, which causes `Exec format error`; Wazuh manager / indexer / dashboard are still active, and fixing that hook requires sudo elevation on 112, for which no password was used or saved.
 
diff --git a/docs/runbooks/REBOOT-POST-START-QUICK-CHECK.md b/docs/runbooks/REBOOT-POST-START-QUICK-CHECK.md
index b3eecb2e..ab3c3f2c 100644
--- a/docs/runbooks/REBOOT-POST-START-QUICK-CHECK.md
+++ b/docs/runbooks/REBOOT-POST-START-QUICK-CHECK.md
@@ -1,6 +1,6 @@
 # One-Page Post-Reboot Host Check
 
-> Version: v1.15
+> Version: v1.16
 > Last updated: 2026-06-26 Asia/Taipei
 > Scope: 110 / 120 / 121 / 188 post-reboot service recovery. 112 Kali / Wazuh / active scan are not part of this procedure.
 
@@ -10,7 +10,7 @@
 
 After any of the 110 / 120 / 121 / 188 hosts is powered on, shut down, rebooted, recovered from a power loss, put through a VMware console fsck, or sees large-scale Docker / K3s rescheduling, run this page first, then decide whether to declare recovery.
 
-Latest baseline: 2026-06-26 17:45 single-summary replay / route warmup classifier. `scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color` returned `SERVICE_GREEN=1`, `PRODUCT_DATA_GREEN=1`, `BACKUP_CORE_GREEN=1`, `DR_ESCROW_BLOCKED=1`, `ESCROW_MISSING_COUNT=5`, `HOST_188_HYGIENE_BLOCKED=0`, `HOST_188_RESULT=HOST_188_HYGIENE_GREEN.`, `WAZUH_MANAGER_REGISTRY_ACCEPTED=0`, `WAZUH_COVERAGE_SCOPE=6`, `WAZUH_DIRECT_ACTIVE=2`, `WAZUH_NO_TRANSPORT=1`, `WAZUH_SSH_BLOCKED=3`, `WAZUH_ROUTE_CODE=200`, `WAZUH_TRANSPORT_COUNT=6`, `WAZUH_DASHBOARD_API_CONNECTION=pending_or_spinning`, `WAZUH_DASHBOARD_INDEX_OK=3`, `RUNTIME_ACTION_AUTHORIZED=0`, `OVERALL_DECLARATION=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, and automatically wrote the same key/value set to `$ARTIFACT_DIR/summary.txt`. Later steps in the same round, `post-reboot-declaration-guard.py`, `post-reboot-next-gate-dispatch.sh`, `post-reboot-next-gate-owner-packets.py`, `post-reboot-owner-packet-contract-guard.py`, and `post-reboot-owner-response-preflight.py`, must use this `summary.txt` or the dispatch / packet generated from it, and must not mix results from multiple live probes taken at different times. `NEXT_REQUIRED_GATES=credential_escrow_evidence,wazuh_manager_registry_export` are still the only current next gates; DR still cannot be declared complete because `escrow_missing=5`; Wazuh manager registry accepted is still `0`, so do not treat route `200`, transport `6`, or Dashboard index pattern `3` as registry recovered. v1.15 also adds a route warmup classifier: if the delegated cold-start is temporarily blocked only by a single public route 502 / TLS readback, but the wrapper route retry has confirmed everything recovered, that blocker is downgraded to an evidence warning; a non-route blocker, or a failure that persists after retry, is still hard blocked.
+Latest baseline: 2026-06-26 23:56 single-summary replay / route + AWOOOI API warmup classifier. `scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color` returned `POST_START_RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, `POST_START_PASS=38`, `POST_START_WARN=3`, `POST_START_BLOCKED=0`, `SERVICE_GREEN=1`, `PRODUCT_DATA_GREEN=1`, `STOCK_FRESHNESS_STATUS=ok`, `STOCK_LATEST_TRADING_DATE=2026-06-26`, `STOCK_BLOCKERS=none`, `BACKUP_CORE_GREEN=1`, `DR_ESCROW_BLOCKED=1`, `ESCROW_MISSING_COUNT=5`, `HOST_188_HYGIENE_BLOCKED=0`, `HOST_188_RESULT=HOST_188_HYGIENE_GREEN.`, `WAZUH_MANAGER_REGISTRY_ACCEPTED=0`, `WAZUH_COVERAGE_SCOPE=6`, `WAZUH_DIRECT_ACTIVE=2`, `WAZUH_NO_TRANSPORT=1`, `WAZUH_SSH_BLOCKED=3`, `WAZUH_ROUTE_CODE=200`, `WAZUH_TRANSPORT_COUNT=6`, `WAZUH_DASHBOARD_API_CONNECTION=pending_or_spinning`, `WAZUH_DASHBOARD_INDEX_OK=3`, `RUNTIME_ACTION_AUTHORIZED=0`, `OVERALL_DECLARATION=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, and automatically wrote the same key/value set to `$ARTIFACT_DIR/summary.txt`. Later steps in the same round, `post-reboot-declaration-guard.py`, `post-reboot-next-gate-dispatch.sh`, `post-reboot-next-gate-owner-packets.py`, `post-reboot-owner-packet-contract-guard.py`, and `post-reboot-owner-response-preflight.py`, must use this `summary.txt` or the dispatch / packet generated from it, and must not mix results from multiple live probes taken at different times. `NEXT_REQUIRED_GATES=credential_escrow_evidence,wazuh_manager_registry_export` are still the only current next gates; DR still cannot be declared complete because `escrow_missing=5`; Wazuh manager registry accepted is still `0`, so do not treat route `200`, transport `6`, Dashboard index pattern `3`, or the redacted registry counts as all-host enrollment complete. v1.16 also adds a route/API warmup classifier: if the delegated cold-start is blocked only by a single public route 502 / TLS readback, or by a single `BLOCKED AWOOOI API not reachable` at the moment of a K3s rollout, but the wrapper route retry has confirmed that public AWOOOI API health is 2xx, that blocker is downgraded to an evidence warning; if the public API still fails, another non-route blocker exists, or the retry does not recover, it is still hard blocked.
 
 This page answers only four things:
 
@@ -99,7 +99,7 @@ scripts/reboot-recovery/post-reboot-next-gate-dispatch.sh --no-color --summary-f
 To hand off to AI / a ticket / owner review, generate the machine-readable owner packet:
 
 ```bash
-scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --no-color
+scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --no-color --summary-file "$ARTIFACT_DIR/summary.txt"
 ```
 
 The output JSON may serve only as an intake / review packet; it is not a sent request. It must show `request_sent_count=0`, `owner_response_accepted_count=0`, `runtime_action_authorized_count=0`, otherwise treat it as non-compliant.
@@ -107,7 +107,7 @@ scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --no-color
 Before submitting to any owner review queue, first save the JSON as an artifact and run the contract guard:
 
 ```bash
-scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --no-color --dispatch-file /tmp/awoooi-post-reboot-dispatch.txt --output /tmp/awoooi-post-reboot-owner-packets.json
+scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --no-color --summary-file "$ARTIFACT_DIR/summary.txt" --output /tmp/awoooi-post-reboot-owner-packets.json
 scripts/reboot-recovery/post-reboot-owner-packet-contract-guard.py --packet-file /tmp/awoooi-post-reboot-owner-packets.json
 ```
 
@@ -116,7 +116,7 @@ guard must output `POST_REBOOT_OWNER_PACKET_CONTRACT_GUARD_OK gates= argparse.Namespace:
         type=Path,
         help="Use an existing post-reboot-next-gate-dispatch output file.",
     )
+    parser.add_argument(
+        "--summary-file",
+        type=Path,
+        help="Pass an existing readiness summary file to the delegated dispatch checklist.",
+    )
     parser.add_argument(
         "--output",
         type=Path,
@@ -43,8 +48,10 @@ def parse_args() -> argparse.Namespace:
     return parser.parse_args()
 
 
-def run_dispatch(no_color: bool) -> str:
+def run_dispatch(no_color: bool, summary_file: Path | None) -> str:
     cmd = [str(DISPATCH_SCRIPT)]
+    if summary_file:
+        cmd.extend(["--summary-file", str(summary_file)])
     if no_color:
         cmd.append("--no-color")
     completed = subprocess.run(
@@ -65,7 +72,7 @@ def run_dispatch(no_color: bool) -> str:
 def load_dispatch(args: argparse.Namespace) -> str:
     if args.dispatch_file:
         return args.dispatch_file.read_text(encoding="utf-8")
-    return run_dispatch(no_color=args.no_color)
+    return run_dispatch(no_color=args.no_color, summary_file=args.summary_file)
 
 
 def split_csv(value: str) -> list[str]:
diff --git a/scripts/reboot-recovery/post-reboot-owner-response-preflight.py b/scripts/reboot-recovery/post-reboot-owner-response-preflight.py
index 864f8c33..bc858d38 100755
--- a/scripts/reboot-recovery/post-reboot-owner-response-preflight.py
+++ b/scripts/reboot-recovery/post-reboot-owner-response-preflight.py
@@ -97,6 +97,11 @@ def parse_args() -> argparse.Namespace:
         type=Path,
         help="Use an existing owner packet JSON instead of generating one.",
     )
+    parser.add_argument(
+        "--summary-file",
+        type=Path,
+        help="Generate owner packets from an existing readiness summary file.",
+    )
     parser.add_argument("--json", action="store_true", help="Print machine-readable JSON.")
     parser.add_argument(
         "--no-color",
@@ -118,8 +123,10 @@ def load_json(path: Path, label: str = "response_file") -> dict[str, Any]:
     return payload
 
 
-def generate_owner_packet(no_color: bool) -> dict[str, Any]:
+def generate_owner_packet(no_color: bool, summary_file: Path | None) -> dict[str, Any]:
     cmd = [str(OWNER_PACKET_GENERATOR)]
+    if summary_file:
+        cmd.extend(["--summary-file", str(summary_file)])
     if no_color:
         cmd.append("--no-color")
     completed = subprocess.run(
@@ -147,7 +154,7 @@ def generate_owner_packet(no_color: bool) -> dict[str, Any]:
 def load_owner_packet(args: argparse.Namespace) -> dict[str, Any]:
     if args.owner_packet_file:
         return load_json(args.owner_packet_file, label="owner_packet_file")
-    return generate_owner_packet(no_color=args.no_color)
+    return generate_owner_packet(no_color=args.no_color, summary_file=args.summary_file)
 
 
 def as_list(value: Any) -> list[Any]:
diff --git a/scripts/reboot-recovery/post-start-quick-check.sh b/scripts/reboot-recovery/post-start-quick-check.sh
index 77ceb6ed..d54feb7a 100755
--- a/scripts/reboot-recovery/post-start-quick-check.sh
+++ b/scripts/reboot-recovery/post-start-quick-check.sh
@@ -22,6 +22,7 @@ COLD_START_PENDING_BLOCKERS=0
 COLD_START_BLOCKED_SUMMARY=""
 COLD_START_BLOCKED_LINES=""
 ROUTE_SMOKE_BLOCKED=0
+AWOOOI_API_ROUTE_OK=0
 STOCK_EOD_WINDOW_PENDING=0
 STOCK_EOD_CLASSIFICATION="not_evaluated"
 STOCK_EOD_NEXT_ACTION="not_evaluated"
@@ -469,6 +470,9 @@ if [[ "$RUN_ROUTES" -eq 1 ]]; then
     done
     case "$code" in
       2*|3*)
+        if [[ "$url" == "https://awoooi.wooo.work/api/v1/health" && "$code" == 2* ]]; then
+          AWOOOI_API_ROUTE_OK=1
+        fi
         if [[ "$attempt" -gt 1 ]]; then
           evidence_warn "$code $url recovered_after_attempt=$attempt"
         else
@@ -485,8 +489,13 @@ fi
 
 if [[ "$COLD_START_PENDING_BLOCKERS" -gt 0 ]]; then
   non_route_cold_blockers="$(printf '%s\n' "$COLD_START_BLOCKED_LINES" | grep -Ev '^BLOCKED public route ' || true)"
+  if [[ "$RUN_ROUTES" -eq 1 && "$ROUTE_SMOKE_BLOCKED" -eq 0 && "$AWOOOI_API_ROUTE_OK" -eq 1 ]]; then
+    non_route_cold_blockers="$(
+      printf '%s\n' "$non_route_cold_blockers" | grep -Ev '^BLOCKED AWOOOI API not reachable$|^BLOCKED AWOOI API not reachable$' || true
+    )"
+  fi
   if [[ "$RUN_ROUTES" -eq 1 && "$ROUTE_SMOKE_BLOCKED" -eq 0 && -z "$non_route_cold_blockers" ]]; then
-    evidence_warn "cold-start public-route blockers recovered under wrapper route retry: $COLD_START_BLOCKED_SUMMARY"
+    evidence_warn "cold-start route/API warmup blockers recovered under wrapper route retry: $COLD_START_BLOCKED_SUMMARY"
     printf '%s\n' "$COLD_START_BLOCKED_LINES"
   else
     blocked "cold-start has blockers: $COLD_START_BLOCKED_SUMMARY"
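
The downgrade rule in the `post-start-quick-check.sh` hunk above can be restated as a small, testable function. The following is a minimal Python sketch under stated assumptions, not the repository's implementation: the function name `classify_cold_start` and its parameters are hypothetical, and only the `BLOCKED ...` line formats and the order of the checks are taken from the diff.

```python
import re

# Line formats copied from the grep patterns in the shell hunk; the second
# pattern accepts both the "AWOOOI" and "AWOOI" spellings, as the script does.
ROUTE_BLOCKER = re.compile(r"^BLOCKED public route ")
API_WARMUP_BLOCKER = re.compile(r"^BLOCKED AWOOO?I API not reachable$")


def classify_cold_start(blocked_lines, run_routes, route_smoke_blocked, api_route_ok):
    """Classify delegated cold-start blockers (hypothetical helper).

    blocked_lines: non-empty BLOCKED lines reported by the delegated cold-start.
    run_routes: True when the wrapper ran its own public route smoke.
    route_smoke_blocked: True when any wrapper route still failed after retry.
    api_route_ok: True when the wrapper saw 2xx on the AWOOOI API health URL.
    Returns "warn" when every blocker is explained by warmup, else "blocked".
    """
    # Step 1: public-route blockers are always filtered out of the remainder.
    remaining = [line for line in blocked_lines if not ROUTE_BLOCKER.search(line)]
    # Step 2: the API warmup blocker is filtered only when the wrapper's own
    # retry proved the public health endpoint is back.
    if run_routes and not route_smoke_blocked and api_route_ok:
        remaining = [line for line in remaining if not API_WARMUP_BLOCKER.search(line)]
    # Step 3: downgrade only if nothing unexplained is left.
    if run_routes and not route_smoke_blocked and not remaining:
        return "warn"
    return "blocked"
```

With this shape, a lone `BLOCKED AWOOOI API not reachable` is downgraded only when `api_route_ok` is true; any additional non-route blocker, a failed wrapper route smoke, or a skipped route check keeps the result hard blocked, which matches the rule described in the runbook text.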