Merge remote-tracking branch 'gitea/main' into codex/delivery-workbench-release-20260626-ffsync
@@ -46252,3 +46252,35 @@ production browser smoke:
- Wazuh manager registry accepted is still `0`: do not claim that full-host Wazuh enrollment has been restored.
- Owner response received / accepted is still `0 / 0`; do not treat "approval to continue", an empty template, UI visibility, route `200`, transport `6`, Dashboard index pattern `3`, or owner-packet JSON as accepted evidence.
- Runtime action / host write / credential marker write / Wazuh active response / Kali active scan all remain `0 / false`.

## 2026-06-26 — 23:56 AWOOOI API rollout warmup classifier / SOP v1.76

**Time and sources**:
- 2026-06-26 23:31-23:56 Asia/Taipei.
- Sources: `scripts/reboot-recovery/post-start-quick-check.sh --no-color`, `scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color`, AWOOOI production `/api/v1/health`, read-only readback of K3s / ArgoCD on 120, and read-only redacted readback of the Wazuh manager / dashboard on 112.

**Completed**:
- `post-start-quick-check.sh` adds `AWOOOI_API_ROUTE_OK`. When the delegated cold-start emits a single `BLOCKED AWOOOI API not reachable` at the moment of a K3s/CD rollout, but the wrapper's public route retry has confirmed that `https://awoooi.wooo.work/api/v1/health` returns 2xx, that cold-start blocker is downgraded to a route/API warmup evidence warning; if the public API still fails, another non-route blocker exists, or the retry does not recover, it remains a hard blocker.
- `docs/runbooks/FULL-STACK-COLD-START-SOP.md` is bumped to v1.76; `docs/runbooks/REBOOT-POST-START-QUICK-CHECK.md` is bumped to v1.16; the workplan Current Verdict is updated to `FULL_STACK_GREEN_DR_ESCROW_BLOCKED`.
- Read-only follow-up on 120 K3s is complete: ArgoCD `awoooi-prod` is `Synced / Healthy`, and all `awoooi-prod` Pods are `Running` or `Completed`. The historical failed Job `km-vectorize-29689620` has been superseded by later successful Jobs on 2026-06-23, 2026-06-24, and 2026-06-25 and is not a current service blocker.
- Read-only follow-up on 112 Wazuh is complete: systemd `running`, manager / indexer / dashboard `active`, API root returns `401`, and the dashboard unauthenticated check endpoints return `401`; no secret value was read, printed, or stored. Redacted manager registry statistics show local manager `1`, managed agents `5`, active managed `5`, disconnected `0`, never connected `0`. This proves the registry is not completely empty, but the SOP expected scope / Dashboard API / owner acceptance are still not met, so `WAZUH_MANAGER_REGISTRY_ACCEPTED=0` stands.

**Live verification results**:
- 23:34 summary: `POST_START_RC=0`, `POST_START_RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, `SERVICE_GREEN=1`, `PRODUCT_DATA_GREEN=1`, `STOCK_FRESHNESS_STATUS=ok`, `STOCK_LATEST_TRADING_DATE=2026-06-26`, `STOCK_BLOCKERS=none`, `BACKUP_CORE_GREEN=1`, `ESCROW_MISSING_COUNT=5`, `HOST_188_HYGIENE_BLOCKED=0`.
- Final 23:56 summary: `POST_START_RC=0`, `POST_START_RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, `POST_START_PASS=38`, `POST_START_WARN=3`, `POST_START_BLOCKED=0`, `SERVICE_GREEN=1`, `PRODUCT_DATA_GREEN=1`, `BACKUP_CORE_GREEN=1`, `DR_ESCROW_BLOCKED=1`, `ESCROW_MISSING_COUNT=5`, `HOST_188_HYGIENE_BLOCKED=0`, `WAZUH_MANAGER_REGISTRY_ACCEPTED=0`, `RUNTIME_ACTION_AUTHORIZED=0`, `NEXT_REQUIRED_GATES=credential_escrow_evidence,wazuh_manager_registry_export`.
- 110 CPU attribution: the high load comes from active Gitea Actions / StockPlatform Next build / Jest workers and platform services; no orphan Chrome recurrence was observed; nothing was killed or restarted and no CI was cancelled in this round.

**Command types run**:
- Read-only: cold-start / post-start / readiness summary, public routes, 112 Wazuh service / API / dashboard endpoint / registry counts, 110 CPU attribution.
- Writes: repo script / runbook / workplan / LOGBOOK only.
- Not done: no host / Docker / systemd / Nginx / firewall / K8s / DB runtime writes; no secret read in plaintext; no credential marker write; no Wazuh active response / agent re-enroll / restart; no Kali active scan; no CI cancellation.

**Current verdict**:
- Hosts, K3s, services, public routes, MOMO, StockPlatform freshness, backup core, 188 host hygiene: `GREEN`.
- Overall recovery declaration: `FULL_STACK_GREEN_DR_ESCROW_BLOCKED`.
- SOP / quick-check / route + AWOOOI API warmup classifier: v1.76.

**Still blocked / must not be claimed**:
- DR credential escrow evidence is still missing `5`: do not claim `DR_COMPLETE`.
- Wazuh manager registry accepted is still `0`: do not claim that full-host Wazuh enrollment has been restored; the redacted manager registry statistics are only evidence, and Dashboard API / owner acceptance are still missing.
- Runtime action / host write / credential marker write / Wazuh active response / Kali active scan all remain `0 / false`.

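The verdict in this entry is read off the `KEY=VALUE` pairs of the readiness summary. A minimal Python sketch of that reading follows. It is not the repository's `post-reboot-declaration-guard.py`: the function names and the fail-closed defaults are assumptions made for illustration, and only the key names and the three declaration names come from the entry and the workplan.

```python
def parse_summary(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines; skip blank lines and lines without '='."""
    pairs: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        pairs[key.strip()] = value.strip()
    return pairs


def forbidden_declarations(summary: dict[str, str]) -> list[str]:
    """Return declarations that must not be made (assumed fail-closed rules)."""
    forbidden: list[str] = []
    # Assumption: a missing key is treated as "evidence not present".
    if summary.get("ESCROW_MISSING_COUNT", "1") != "0":
        forbidden.append("DR_COMPLETE")
    if summary.get("WAZUH_MANAGER_REGISTRY_ACCEPTED", "0") != "1":
        forbidden.append("WAZUH_REGISTRY_RECOVERED")
    if summary.get("RUNTIME_ACTION_AUTHORIZED", "0") != "1":
        forbidden.append("RUNTIME_ACTION_AUTHORIZED")
    return forbidden


def next_gates(summary: dict[str, str]) -> list[str]:
    """Split the comma-separated NEXT_REQUIRED_GATES value into a list."""
    value = summary.get("NEXT_REQUIRED_GATES", "")
    return [gate.strip() for gate in value.split(",") if gate.strip()]
```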
@@ -1,6 +1,6 @@
# AWOOOI Full-Stack Cold Start and Host Reboot SOP

> Version: v1.75
> Version: v1.76
> Last updated: 2026-06-26 Asia/Taipei
> Scope: 110 / 120 / 121 / 188 full-stack reboot recovery. 112 Kali is recorded as P3 optional and is not part of this recovery path.

@@ -10,7 +10,11 @@

This section is the first decision anchor after every handover, power-on, shutdown, or reboot. If its date is not today, rerun the live check first, then update this section and `docs/workplans/2026-06-04-reboot-cold-start-backup-recovery-workplan.md`.

If you only need a quick post-reboot judgment on whether recovery can be declared, run the machine-readable summary first: `scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color`. The script calls the one-page overall check, the 188 host hygiene checklist, and the Wazuh no-false-green repo gates, and leaves the delegated logs and a replayable `summary.txt` under `/tmp/awoooi-post-reboot-readiness-*`. From v1.75 on, later steps in the same acceptance round must consume the same `$ARTIFACT_DIR/summary.txt`, for example `scripts/reboot-recovery/post-reboot-declaration-guard.py --summary-file "$ARTIFACT_DIR/summary.txt" --no-color` and `scripts/reboot-recovery/post-reboot-next-gate-dispatch.sh --summary-file "$ARTIFACT_DIR/summary.txt" --no-color`; do not rerun live probes repeatedly within one evidence chain and then mix conclusions from different points in time. The declaration guard converts the summary into allowed / forbidden declarations so that service green is not misreported as DR complete, 188 host hygiene, Wazuh registry recovered, or runtime authorized. If the summary shows `SERVICE_GREEN=1` while `NEXT_REQUIRED_GATES` is still non-empty, the dispatch checklist then turns the unfinished blockers into an owner / evidence / forbidden-action checklist; when machine-readable intake is needed, run `scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --dispatch-file <dispatch.txt> --output /tmp/awoooi-post-reboot-owner-packets.json` to produce the `awoooi_post_reboot_next_gate_owner_packets_v1` JSON, then immediately run `scripts/reboot-recovery/post-reboot-owner-packet-contract-guard.py --packet-file /tmp/awoooi-post-reboot-owner-packets.json`. Dispatch / packet / guard all pin `DISPATCH_AUTHORIZED=0`, `REQUEST_SENT_COUNT=0`, `OWNER_RESPONSE_ACCEPTED=0`, `HOST_WRITE_AUTHORIZED=0`, `SECRET_VALUE_COLLECTION_ALLOWED=0`, `RUNTIME_GATE=0`; if the guard does not pass, do not send owner requests, do not write escrow markers, do not enter a maintenance window, and do not claim DR / Wazuh registry complete. From v1.74 on, every owner response JSON must also pass `scripts/reboot-recovery/post-reboot-owner-response-preflight.py --no-color --owner-packet-file <owner-packets.json> --response-file <file>`: empty templates, placeholders, secret payloads, runtime action requests, credential marker writes, Wazuh active response / re-enroll / restart, Kali active scan, or missing Dashboard API / manager registry evidence must all fail closed; a passing preflight only means the response may proceed to independent reviewer acceptance and is not runtime authorization. When manual expansion is needed, run `scripts/reboot-recovery/post-start-quick-check.sh --no-color` and use `docs/runbooks/REBOOT-POST-START-QUICK-CHECK.md` as the fallback. The long SOP keeps the full background, exception handling, and Plan B; the short wrapper / checklist handles the fixed judgment within T+10 minutes each time.
If you only need a quick post-reboot judgment on whether recovery can be declared, run the machine-readable summary first: `scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color`. The script calls the one-page overall check, the 188 host hygiene checklist, and the Wazuh no-false-green repo gates, and leaves the delegated logs and a replayable `summary.txt` under `/tmp/awoooi-post-reboot-readiness-*`. From v1.75 on, later steps in the same acceptance round must consume the same `$ARTIFACT_DIR/summary.txt`, for example `scripts/reboot-recovery/post-reboot-declaration-guard.py --summary-file "$ARTIFACT_DIR/summary.txt" --no-color` and `scripts/reboot-recovery/post-reboot-next-gate-dispatch.sh --summary-file "$ARTIFACT_DIR/summary.txt" --no-color`; do not rerun live probes repeatedly within one evidence chain and then mix conclusions from different points in time. From v1.76 on, if the delegated cold-start emits a single `BLOCKED AWOOOI API not reachable` at the moment of a K3s rollout / CD replacement, but the wrapper's own retry of the public `https://awoooi.wooo.work/api/v1/health` route has returned 2xx, that blocker is listed only as a route/API warmup evidence warning; if the public API still fails, another non-route blocker exists, or the retry does not recover, it stays hard blocked. The declaration guard converts the summary into allowed / forbidden declarations so that service green is not misreported as DR complete, 188 host hygiene, Wazuh registry recovered, or runtime authorized. If the summary shows `SERVICE_GREEN=1` while `NEXT_REQUIRED_GATES` is still non-empty, the dispatch checklist then turns the unfinished blockers into an owner / evidence / forbidden-action checklist; when machine-readable intake is needed, run `scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --dispatch-file <dispatch.txt> --output /tmp/awoooi-post-reboot-owner-packets.json` to produce the `awoooi_post_reboot_next_gate_owner_packets_v1` JSON, then immediately run `scripts/reboot-recovery/post-reboot-owner-packet-contract-guard.py --packet-file /tmp/awoooi-post-reboot-owner-packets.json`. Dispatch / packet / guard all pin `DISPATCH_AUTHORIZED=0`, `REQUEST_SENT_COUNT=0`, `OWNER_RESPONSE_ACCEPTED=0`, `HOST_WRITE_AUTHORIZED=0`, `SECRET_VALUE_COLLECTION_ALLOWED=0`, `RUNTIME_GATE=0`; if the guard does not pass, do not send owner requests, do not write escrow markers, do not enter a maintenance window, and do not claim DR / Wazuh registry complete. From v1.74 on, every owner response JSON must also pass `scripts/reboot-recovery/post-reboot-owner-response-preflight.py --no-color --owner-packet-file <owner-packets.json> --response-file <file>`: empty templates, placeholders, secret payloads, runtime action requests, credential marker writes, Wazuh active response / re-enroll / restart, Kali active scan, or missing Dashboard API / manager registry evidence must all fail closed; a passing preflight only means the response may proceed to independent reviewer acceptance and is not runtime authorization. When manual expansion is needed, run `scripts/reboot-recovery/post-start-quick-check.sh --no-color` and use `docs/runbooks/REBOOT-POST-START-QUICK-CHECK.md` as the fallback. The long SOP keeps the full background, exception handling, and Plan B; the short wrapper / checklist handles the fixed judgment within T+10 minutes each time.

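The v1.76 warmup rule can be restated as a small decision function. The Python sketch below mirrors the filtering that this commit adds to `post-start-quick-check.sh`; the function name, parameter names, and the `"warning"` / `"blocked"` return values are hypothetical, and the two regular expressions are adapted from the `grep -Ev` patterns in the script. Public-route blockers are always set aside, the API blocker is set aside only when the wrapper saw a 2xx on the health route, and anything left over keeps the round hard blocked.

```python
import re

# Patterns adapted from the grep -Ev filters in post-start-quick-check.sh.
ROUTE_BLOCKER = re.compile(r"^BLOCKED public route ")
# Matches both spellings the script tolerates: AWOOOI and AWOOI.
API_BLOCKER = re.compile(r"^BLOCKED AWOOO?I API not reachable$")


def classify_cold_start(
    blocked_lines: list[str],
    run_routes: bool,
    route_smoke_blocked: bool,
    api_route_ok: bool,
) -> str:
    """Return "warning" if every blocker is a recovered warmup symptom, else "blocked"."""
    retry_clean = run_routes and not route_smoke_blocked
    remaining = [line for line in blocked_lines if line and not ROUTE_BLOCKER.search(line)]
    if retry_clean and api_route_ok:
        remaining = [line for line in remaining if not API_BLOCKER.search(line)]
    if retry_clean and not remaining:
        return "warning"
    return "blocked"
```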
v1.76 owner gate replay rule: once a round's summary has been produced, owner packet generation and owner response preflight must prefer `--summary-file "$ARTIFACT_DIR/summary.txt"`, for example `scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --no-color --summary-file "$ARTIFACT_DIR/summary.txt" --output /tmp/awoooi-post-reboot-owner-packets.json` and `scripts/reboot-recovery/post-reboot-owner-response-preflight.py --no-color --summary-file "$ARTIFACT_DIR/summary.txt" --response-file <file>`. Omitting `--summary-file` is allowed only when fresh live evidence is deliberately wanted; otherwise preflight must not rerun the summary itself and cause state drift within the same round.

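One way to keep a round on a single summary is to build every follow-up command from the same path. The sketch below only assembles argument lists and runs nothing; the helper name `replay_commands` and its parameters are hypothetical, while the script names and flags are the ones quoted in this SOP.

```python
from pathlib import Path

SCRIPT_DIR = "scripts/reboot-recovery"


def replay_commands(artifact_dir: Path, packet_out: Path, response_file: Path) -> list[list[str]]:
    """Build the follow-up commands so they all read one summary.txt."""
    summary = str(artifact_dir / "summary.txt")
    shared = ["--summary-file", summary]
    return [
        [f"{SCRIPT_DIR}/post-reboot-declaration-guard.py", *shared, "--no-color"],
        [f"{SCRIPT_DIR}/post-reboot-next-gate-dispatch.sh", *shared, "--no-color"],
        [f"{SCRIPT_DIR}/post-reboot-next-gate-owner-packets.py", "--no-color", *shared,
         "--output", str(packet_out)],
        [f"{SCRIPT_DIR}/post-reboot-owner-response-preflight.py", "--no-color", *shared,
         "--response-file", str(response_file)],
    ]
```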
2026-06-26 23:56 latest live summary: `scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color` returned `POST_START_RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, `POST_START_PASS=38`, `POST_START_WARN=3`, `POST_START_BLOCKED=0`, `SERVICE_GREEN=1`, `PRODUCT_DATA_GREEN=1`, `STOCK_FRESHNESS_STATUS=ok`, `STOCK_LATEST_TRADING_DATE=2026-06-26`, `STOCK_BLOCKERS=none`, `BACKUP_CORE_GREEN=1`, `ESCROW_MISSING_COUNT=5`, `HOST_188_HYGIENE_BLOCKED=0`, `WAZUH_MANAGER_REGISTRY_ACCEPTED=0`, `RUNTIME_ACTION_AUTHORIZED=0`. Read-only follow-up on 120 in the same window: ArgoCD `awoooi-prod` is `Synced / Healthy` and all `awoooi-prod` Pods are `Running` or `Completed`; the historical failed Job `km-vectorize-29689620` has been superseded by later successful Jobs on 2026-06-23, 2026-06-24, and 2026-06-25 and is not a current service blocker. Read-only follow-up on 112 in the same window: systemd `running`, Wazuh manager / indexer / dashboard `active`, manager API root returned `401`, Dashboard unauthenticated check endpoints returned `401`, and the redacted manager registry readback shows local manager `1`, managed agents `5`, active managed `5`, disconnected `0`, never connected `0`. This evidence shows the registry is no longer "completely empty", but full-host Wazuh enrollment still cannot be claimed as restored, because the SOP expected scope is still 6, the Dashboard API connection / version has not been accepted through login or owner evidence, and owner response accepted is still `0`.

2026-06-26 18:46 latest live recovery truth supersedes the 12:13 reading of today's product data: `scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color` returned `POST_START_RESULT=PRODUCT_DATA_PENDING_EOD_WINDOW`, `SERVICE_GREEN=1`, `PRODUCT_DATA_GREEN=0`, `STOCK_LATEST_TRADING_DATE=2026-06-26`, `STOCK_BLOCKERS=core_margin_short_daily_missing,ai_recommendations_stale`, `BACKUP_CORE_GREEN=1`, `ESCROW_MISSING_COUNT=5`, `WAZUH_MANAGER_REGISTRY_ACCEPTED=0`. The long live cold-start check in the same round returned `PASS=87 WARN=0 BLOCKED=0` and `Result: GREEN`, meaning the service layer of hosts 110 / 120 / 121 / 188, K3s, public routes, AWOOOI API, MOMO, backup core, exporters, cron, and Alertmanager has recovered; however, StockPlatform's official margin-short data for today has not been published and AI recommendations still depend on it, so it cannot be claimed that all product data is current. At 18:43 an authorized `SIGTERM` cleared two `stockplatform-review-bulk-ux` orphan Chrome process groups on 110 that had been running for more than 6 hours, leaving `REMAINING=0`; from 18:44 to 18:46 the local AWOOOI `next build` on the 168 Mac Mini was stopped and temp/build/cache files and Antigravity backup browser recordings were cleaned up, bringing `/System/Volumes/Data` from about `1.0Gi / 100%` back to about `8.7Gi / 96%`. The failed `networking.service` on 112 Kali was traced to the wrong shebang `#\!/bin/bash` in `/etc/network/if-up.d/wg-nat`, which causes `Exec format error`; Wazuh manager / indexer / dashboard remain active. Fixing the hook requires sudo elevation on 112, and no password was used or stored.

@@ -1,6 +1,6 @@
# One-Page Post-Reboot Host Check

> Version: v1.15
> Version: v1.16
> Last updated: 2026-06-26 Asia/Taipei
> Scope: 110 / 120 / 121 / 188 post-reboot service recovery. 112 Kali / Wazuh / active scan are not part of this procedure.

@@ -10,7 +10,7 @@
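
After any of hosts 110 / 120 / 121 / 188 is powered on, shut down, rebooted, recovered from a power loss, put through a VMware console fsck, or subjected to a large Docker / K3s reschedule, run this page first and only then decide whether to declare recovery.
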
Latest baseline: 2026-06-26 17:45 single-summary replay / route warmup classifier. `scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color` returned `SERVICE_GREEN=1`, `PRODUCT_DATA_GREEN=1`, `BACKUP_CORE_GREEN=1`, `DR_ESCROW_BLOCKED=1`, `ESCROW_MISSING_COUNT=5`, `HOST_188_HYGIENE_BLOCKED=0`, `HOST_188_RESULT=HOST_188_HYGIENE_GREEN.`, `WAZUH_MANAGER_REGISTRY_ACCEPTED=0`, `WAZUH_COVERAGE_SCOPE=6`, `WAZUH_DIRECT_ACTIVE=2`, `WAZUH_NO_TRANSPORT=1`, `WAZUH_SSH_BLOCKED=3`, `WAZUH_ROUTE_CODE=200`, `WAZUH_TRANSPORT_COUNT=6`, `WAZUH_DASHBOARD_API_CONNECTION=pending_or_spinning`, `WAZUH_DASHBOARD_INDEX_OK=3`, `RUNTIME_ACTION_AUTHORIZED=0`, `OVERALL_DECLARATION=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, and automatically writes the same key/value pairs to `$ARTIFACT_DIR/summary.txt`. Later runs in the same round of `post-reboot-declaration-guard.py`, `post-reboot-next-gate-dispatch.sh`, `post-reboot-next-gate-owner-packets.py`, `post-reboot-owner-packet-contract-guard.py`, and `post-reboot-owner-response-preflight.py` must use this `summary.txt` or the dispatch / packet generated from it, and must not mix results from multiple live probes taken at different times. `NEXT_REQUIRED_GATES=credential_escrow_evidence,wazuh_manager_registry_export` remain the only current next gates; DR still cannot be declared complete because `escrow_missing=5`; Wazuh manager registry accepted is still `0`, and route `200`, transport `6`, or Dashboard index pattern `3` must not be treated as registry recovered. v1.15 also adds the route warmup classifier: if the delegated cold-start is temporarily blocked only by a single public-route 502 / TLS readback, but the wrapper route retry has confirmed that everything recovered, the blocker is downgraded to an evidence warning; a non-route blocker or a failure that persists after retry remains hard blocked.
Latest baseline: 2026-06-26 23:56 single-summary replay / route + AWOOOI API warmup classifier. `scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color` returned `POST_START_RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, `POST_START_PASS=38`, `POST_START_WARN=3`, `POST_START_BLOCKED=0`, `SERVICE_GREEN=1`, `PRODUCT_DATA_GREEN=1`, `STOCK_FRESHNESS_STATUS=ok`, `STOCK_LATEST_TRADING_DATE=2026-06-26`, `STOCK_BLOCKERS=none`, `BACKUP_CORE_GREEN=1`, `DR_ESCROW_BLOCKED=1`, `ESCROW_MISSING_COUNT=5`, `HOST_188_HYGIENE_BLOCKED=0`, `HOST_188_RESULT=HOST_188_HYGIENE_GREEN.`, `WAZUH_MANAGER_REGISTRY_ACCEPTED=0`, `WAZUH_COVERAGE_SCOPE=6`, `WAZUH_DIRECT_ACTIVE=2`, `WAZUH_NO_TRANSPORT=1`, `WAZUH_SSH_BLOCKED=3`, `WAZUH_ROUTE_CODE=200`, `WAZUH_TRANSPORT_COUNT=6`, `WAZUH_DASHBOARD_API_CONNECTION=pending_or_spinning`, `WAZUH_DASHBOARD_INDEX_OK=3`, `RUNTIME_ACTION_AUTHORIZED=0`, `OVERALL_DECLARATION=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, and automatically writes the same key/value pairs to `$ARTIFACT_DIR/summary.txt`. Later runs in the same round of `post-reboot-declaration-guard.py`, `post-reboot-next-gate-dispatch.sh`, `post-reboot-next-gate-owner-packets.py`, `post-reboot-owner-packet-contract-guard.py`, and `post-reboot-owner-response-preflight.py` must use this `summary.txt` or the dispatch / packet generated from it, and must not mix results from multiple live probes taken at different times. `NEXT_REQUIRED_GATES=credential_escrow_evidence,wazuh_manager_registry_export` remain the only current next gates; DR still cannot be declared complete because `escrow_missing=5`; Wazuh manager registry accepted is still `0`, and route `200`, transport `6`, Dashboard index pattern `3`, or redacted registry counts must not be treated as completed enrollment of all hosts. v1.16 also adds the route/API warmup classifier: if the delegated cold-start is blocked only by a single public-route 502 / TLS readback, or by a single `BLOCKED AWOOOI API not reachable` at the moment of a K3s rollout, but the wrapper route retry has confirmed that public AWOOOI API health is 2xx, the blocker is downgraded to an evidence warning; if the public API still fails, another non-route blocker exists, or the retry does not recover, it remains hard blocked.

This page answers only four things:

@@ -99,7 +99,7 @@ scripts/reboot-recovery/post-reboot-next-gate-dispatch.sh --no-color --summary-f
To hand off to AI / ticket / owner review, generate the machine-readable owner packet:

```bash
scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --no-color
scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --no-color --summary-file "$ARTIFACT_DIR/summary.txt"
```

The output JSON may serve only as an intake / review packet; it is not a sent request. `request_sent_count=0`, `owner_response_accepted_count=0`, and `runtime_action_authorized_count=0` must all be present, otherwise the packet is treated as non-compliant.
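The three zero counters can be checked mechanically before a packet goes anywhere. The sketch below is not the repository's contract guard: it assumes the counters are top-level integer fields of the packet JSON, which the runbook does not confirm, and the function name is hypothetical. A missing counter is reported as a violation so the check fails closed.

```python
import json

# Counter names quoted from the runbook; their position in the JSON is an assumption.
REQUIRED_ZERO = (
    "request_sent_count",
    "owner_response_accepted_count",
    "runtime_action_authorized_count",
)


def packet_violations(packet_json: str) -> list[str]:
    """Return reasons the packet is non-compliant; an empty list means it passes."""
    payload = json.loads(packet_json)
    if not isinstance(payload, dict):
        return ["packet is not a JSON object"]
    problems: list[str] = []
    for key in REQUIRED_ZERO:
        if key not in payload:
            problems.append(f"{key} missing")
        elif type(payload[key]) is not int or payload[key] != 0:
            problems.append(f"{key}={payload[key]!r}")
    return problems
```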
@@ -107,7 +107,7 @@ scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --no-color
Before submitting to any owner review queue, save the JSON as an artifact and run the contract guard:

```bash
scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --no-color --dispatch-file /tmp/awoooi-post-reboot-dispatch.txt --output /tmp/awoooi-post-reboot-owner-packets.json
scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --no-color --summary-file "$ARTIFACT_DIR/summary.txt" --output /tmp/awoooi-post-reboot-owner-packets.json
scripts/reboot-recovery/post-reboot-owner-packet-contract-guard.py --packet-file /tmp/awoooi-post-reboot-owner-packets.json
```

@@ -116,7 +116,7 @@ guard 必須輸出 `POST_REBOOT_OWNER_PACKET_CONTRACT_GUARD_OK gates=<live_next_
Before receiving an owner response file, or any JSON that claims evidence has been supplied, run the owner response preflight:

```bash
scripts/reboot-recovery/post-reboot-owner-response-preflight.py --no-color
scripts/reboot-recovery/post-reboot-owner-response-preflight.py --no-color --summary-file "$ARTIFACT_DIR/summary.txt"
scripts/reboot-recovery/post-reboot-owner-response-preflight.py --no-color --owner-packet-file /tmp/awoooi-post-reboot-owner-packets.json
scripts/reboot-recovery/post-reboot-owner-response-preflight.py --no-color --owner-packet-file /tmp/awoooi-post-reboot-owner-packets.json --response-file docs/templates/post-reboot-next-gate-owner-response.json
```

@@ -11,11 +11,11 @@

| Area | Status | Completion | Evidence |
|------|--------|------------|----------|
| Overall recovery readiness | SERVICE_GREEN_PRODUCT_DATA_PENDING_EOD_WINDOW | 98% | The 2026-06-26 18:45 live summary supersedes the 12:13 reading of today's product data. `post-reboot-readiness-summary.sh --no-color` returned `SERVICE_GREEN=1`, `PRODUCT_DATA_GREEN=0`, `POST_START_RESULT=PRODUCT_DATA_PENDING_EOD_WINDOW`, `STOCK_LATEST_TRADING_DATE=2026-06-26`, `STOCK_BLOCKERS=core_margin_short_daily_missing,ai_recommendations_stale`, `BACKUP_CORE_GREEN=1`, `ESCROW_MISSING_COUNT=5`, `WAZUH_MANAGER_REGISTRY_ACCEPTED=0`. The long live cold-start script still returned `PASS=87 WARN=0 BLOCKED=0` / `GREEN`, so hosts / services / routes / K3s / backup core have recovered; StockPlatform's official margin trading and short selling data for today has not been published and must wait for the existing EOD retry, so it cannot be claimed that all product data is current. |
| Overall recovery readiness | FULL_STACK_GREEN_DR_ESCROW_BLOCKED | 99% | The 2026-06-26 23:56 live summary supersedes the 18:45 EOD-pending reading. `post-reboot-readiness-summary.sh --no-color` returned `SERVICE_GREEN=1`, `PRODUCT_DATA_GREEN=1`, `POST_START_RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, `POST_START_WARN=3`, `STOCK_FRESHNESS_STATUS=ok`, `STOCK_LATEST_TRADING_DATE=2026-06-26`, `STOCK_BLOCKERS=none`, `BACKUP_CORE_GREEN=1`, `ESCROW_MISSING_COUNT=5`, `WAZUH_MANAGER_REGISTRY_ACCEPTED=0`. Hosts / K3s / public routes / AWOOOI / MOMO / Stock / backup core / 188 hygiene have recovered; 120 ArgoCD is `Synced / Healthy` and all `awoooi-prod` Pods are `Running` or `Completed`; the historical failed Job `km-vectorize-29689620` has been superseded by later successful Jobs. DR still cannot be declared complete because credential escrow is missing 5 items; the Wazuh registry now has a redacted manager readback but has not yet passed Dashboard API / owner acceptance. |
| P0 host / K3s recovery | DONE | 100% | 120 booted after console fsck at `2026-06-12 15:13`; latest 2026-06-26 07:19 readback shows 120 and 121 reachable, K3s active, `mon` and `mon1` both `Ready control-plane`, AWOOOI API/Web replicas split across both nodes, ArgoCD `awoooi-prod Synced / Healthy` at revision `1fd5e2a8b0f18d24eed16aa2a44286bcbf230603`, and `km-vectorize` official 03:00 Taipei time run succeeded with `lastSuccess=2026-06-25T19:00:14Z`. |
| P1 backup / alert / escrow | BLOCKED_DR_ESCROW | 97% | 2026-06-26 06:58 backup readback shows 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, `integrity_stale=0`, `offsite_fresh=1`, `rclone_gdrive_fresh=1`, `escrow_missing=5`, last aggregate `2026-06-26 02:31:02`. DR remains blocked on real non-secret credential escrow evidence IDs; do not write placeholder markers or paste secret values. |
| P2 service / data truth | BLOCKED_STOCK_EOD_WINDOW | 90% | Public routes and service health are green, MOMO `V10.716` is healthy, and current-month parity is `15383|15383|2026-06-01|2026-06-24|2026-06-01|2026-06-24`. The StockPlatform site returns `200`, and 2026-06-26 price / chips / market index data already exists, but `/api/v1/system/freshness` is still `blocked` because `core.margin_short_daily` for 2026-06-26 is still waiting on the official source / row count `0`; `ai.recommendations` stays at 2026-06-25 by design. This is a product-data freshness blocker, not a reboot / Nginx / Docker blocker. |
| P3 docs / automation contracts | DONE_WITH_SINGLE_SUMMARY_REPLAY_V175 | 100% | Workplan, SOP v1.75, post-reboot declaration guard, machine-readable post-reboot readiness summary with Wazuh registry detail fields and auto-persisted `summary.txt`, post-reboot next-gate dispatch checklist, owner-packet JSON generator, dynamic owner-packet contract guard, post-reboot owner response preflight, owner response placeholder template, one-page post-start quick check v1.15, route retry gate, delegated cold-start public-route warmup classifier, deploy warmup classification, expanded public route list, StockPlatform freshness gate, StockPlatform cron-source recovery evidence, StockPlatform natural schedule green evidence, 110 orphan Chrome recurrence cleanup evidence, 188 fail-closed startup data recovery gate, 188 host hygiene read-only checklist, 188 PostgreSQL runtime-ready source-of-truth, 188 ACME route/timer hygiene, baseline `stockplatform_system_freshness_ok`, BACKUP-STATUS, LOGBOOK, 120 console/fsck recovery, Gitea backup stale-dump hardening, reboot ledger/version-comparison SOP, escrow evidence audit, 188 nginx Ansible baseline, 110 cold-start detector script, startup judgment layers, GO/NO-GO tree, host recovery cards, explicit Plan B degraded-operation path, machine-readable `plan_b` baseline, readiness-audit Plan B guard, B0-B5 service levels, T+0/T+120 fallback timeline checks, host role / load-balancing assessment, CD `known_hosts` guardrail, `fwupd-refresh.timer` rollback note, K3s filesystem event blocker, AWOOOI backup no-direct-offsite-sync contract, 110/188 Ansible source-of-truth, Gitea self-hosted readiness validation workflow, post-CD no-regression readbacks, stale-vs-active K8s failed Job classification, 110 runaway browser / CI load AIOps exporter + alert + gated remediation PlayBook, Telegram / AI event packet mapping, healthy heartbeat Telegram suppression, MOMO scheduler / current-month detector fix, exporter restore helpers, 110 Docker disk pressure cleanup boundary, notification-noise readback, MOMO import-boundary / Drive-auth fail-closed deploys, product version/readback matrix, and stricter product-data / route retry gates are updated. Declaration guard now machine-checks allowed / forbidden recovery statements from the same `summary.txt`: service/data/backup/188 host hygiene green may be declared when live summary says so, while `DR_COMPLETE`, `WAZUH_REGISTRY_RECOVERED`, and `RUNTIME_ACTION_AUTHORIZED` remain forbidden until evidence gates close. Owner response preflight blocks missing files, placeholder templates, secret payloads, credential marker writes, Wazuh active response / re-enroll / restart, host write, and Kali active scan before any evidence can be counted as received or accepted. Live 110 script sync remains a separate approved live-write gate; do not claim it here. |
| P2 service / data truth | DONE | 100% | Public routes and service health are green, MOMO health is `V10.719`, and current-month parity is `15383|15383|2026-06-01|2026-06-24|2026-06-01|2026-06-24`. StockPlatform `/api/v1/system/freshness` is `ok`, latest trading date `2026-06-26`, blockers `none`; the earlier Stock EOD blocker converged naturally through the official source and the regular cron. |
| P3 docs / automation contracts | DONE_WITH_API_WARMUP_CLASSIFIER_V176 | 100% | Workplan, SOP v1.76, post-reboot declaration guard, machine-readable post-reboot readiness summary with Wazuh registry detail fields and auto-persisted `summary.txt`, post-reboot next-gate dispatch checklist, owner-packet JSON generator, dynamic owner-packet contract guard, post-reboot owner response preflight, owner response placeholder template, one-page post-start quick check v1.16, route retry gate, delegated cold-start public-route / AWOOOI API warmup classifier, deploy warmup classification, expanded public route list, StockPlatform freshness gate, StockPlatform cron-source recovery evidence, StockPlatform natural schedule green evidence, 110 orphan Chrome recurrence cleanup evidence, 188 fail-closed startup data recovery gate, 188 host hygiene read-only checklist, 188 PostgreSQL runtime-ready source-of-truth, 188 ACME route/timer hygiene, baseline `stockplatform_system_freshness_ok`, BACKUP-STATUS, LOGBOOK, 120 console/fsck recovery, Gitea backup stale-dump hardening, reboot ledger/version-comparison SOP, escrow evidence audit, 188 nginx Ansible baseline, 110 cold-start detector script, startup judgment layers, GO/NO-GO tree, host recovery cards, explicit Plan B degraded-operation path, machine-readable `plan_b` baseline, readiness-audit Plan B guard, B0-B5 service levels, T+0/T+120 fallback timeline checks, host role / load-balancing assessment, CD `known_hosts` guardrail, `fwupd-refresh.timer` rollback note, K3s filesystem event blocker, AWOOOI backup no-direct-offsite-sync contract, 110/188 Ansible source-of-truth, Gitea self-hosted readiness validation workflow, post-CD no-regression readbacks, stale-vs-active K8s failed Job classification, 110 runaway browser / CI load AIOps exporter + alert + gated remediation PlayBook, Telegram / AI event packet mapping, healthy heartbeat suppression, MOMO scheduler / current-month detector fix, exporter restore helpers, 110 Docker disk pressure cleanup boundary, notification-noise readback, MOMO import-boundary / Drive-auth fail-closed deploys, product version/readback matrix, and stricter product-data / route retry gates are updated. Declaration guard now machine-checks allowed / forbidden recovery statements from the same `summary.txt`: service/data/backup/188 host hygiene green may be declared when live summary says so, while `DR_COMPLETE`, `WAZUH_REGISTRY_RECOVERED`, and `RUNTIME_ACTION_AUTHORIZED` remain forbidden until evidence gates close. |

2026-06-26 12:13 machine-readable summary baseline supersedes the 07:47 / 08:59 gate set: `scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color` stores delegated logs under `/tmp/awoooi-post-reboot-readiness-20260626-121303` and returns `SERVICE_GREEN=1`, `PRODUCT_DATA_GREEN=1`, `BACKUP_CORE_GREEN=1`, `DR_ESCROW_BLOCKED=1`, `ESCROW_MISSING_COUNT=5`, `HOST_188_SERVICE_GREEN=1`, `HOST_188_HYGIENE_BLOCKED=0`, `HOST_188_CHECK_RC=0`, `HOST_188_RESULT=HOST_188_HYGIENE_GREEN.`, `WAZUH_ROUTE_CODE=200`, `WAZUH_TRANSPORT_COUNT=6`, `WAZUH_COVERAGE_SCOPE=6`, `WAZUH_DIRECT_ACTIVE=2`, `WAZUH_NO_TRANSPORT=1`, `WAZUH_SSH_BLOCKED=3`, `WAZUH_DASHBOARD_API_CONNECTION=pending_or_spinning`, `WAZUH_DASHBOARD_INDEX_OK=3`, `WAZUH_MANAGER_REGISTRY_ACCEPTED=0`, `WAZUH_RUNTIME_GATE=0`, `RUNTIME_ACTION_AUTHORIZED=0`, `OVERALL_DECLARATION=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, and `NEXT_REQUIRED_GATES=credential_escrow_evidence,wazuh_manager_registry_export`. This is now the preferred first operator/AI-agent entrypoint after reboot because it separates service health from DR and security registry evidence; 188 host hygiene is no longer a next gate unless the live checklist regresses.
@@ -30,6 +30,11 @@ def parse_args() -> argparse.Namespace:
        type=Path,
        help="Use an existing post-reboot-next-gate-dispatch output file.",
    )
    parser.add_argument(
        "--summary-file",
        type=Path,
        help="Pass an existing readiness summary file to the delegated dispatch checklist.",
    )
    parser.add_argument(
        "--output",
        type=Path,
@@ -43,8 +48,10 @@ def parse_args() -> argparse.Namespace:
    return parser.parse_args()


def run_dispatch(no_color: bool) -> str:
def run_dispatch(no_color: bool, summary_file: Path | None) -> str:
    cmd = [str(DISPATCH_SCRIPT)]
    if summary_file:
        cmd.extend(["--summary-file", str(summary_file)])
    if no_color:
        cmd.append("--no-color")
    completed = subprocess.run(
@@ -65,7 +72,7 @@ def run_dispatch(no_color: bool) -> str:
def load_dispatch(args: argparse.Namespace) -> str:
    if args.dispatch_file:
        return args.dispatch_file.read_text(encoding="utf-8")
    return run_dispatch(no_color=args.no_color)
    return run_dispatch(no_color=args.no_color, summary_file=args.summary_file)


def split_csv(value: str) -> list[str]:
@@ -97,6 +97,11 @@ def parse_args() -> argparse.Namespace:
        type=Path,
        help="Use an existing owner packet JSON instead of generating one.",
    )
    parser.add_argument(
        "--summary-file",
        type=Path,
        help="Generate owner packets from an existing readiness summary file.",
    )
    parser.add_argument("--json", action="store_true", help="Print machine-readable JSON.")
    parser.add_argument(
        "--no-color",
@@ -118,8 +123,10 @@ def load_json(path: Path, label: str = "response_file") -> dict[str, Any]:
    return payload


def generate_owner_packet(no_color: bool) -> dict[str, Any]:
def generate_owner_packet(no_color: bool, summary_file: Path | None) -> dict[str, Any]:
    cmd = [str(OWNER_PACKET_GENERATOR)]
    if summary_file:
        cmd.extend(["--summary-file", str(summary_file)])
    if no_color:
        cmd.append("--no-color")
    completed = subprocess.run(
@@ -147,7 +154,7 @@ def generate_owner_packet(no_color: bool) -> dict[str, Any]:
def load_owner_packet(args: argparse.Namespace) -> dict[str, Any]:
    if args.owner_packet_file:
        return load_json(args.owner_packet_file, label="owner_packet_file")
    return generate_owner_packet(no_color=args.no_color)
    return generate_owner_packet(no_color=args.no_color, summary_file=args.summary_file)


def as_list(value: Any) -> list[Any]:
@@ -22,6 +22,7 @@ COLD_START_PENDING_BLOCKERS=0
COLD_START_BLOCKED_SUMMARY=""
COLD_START_BLOCKED_LINES=""
ROUTE_SMOKE_BLOCKED=0
AWOOOI_API_ROUTE_OK=0
STOCK_EOD_WINDOW_PENDING=0
STOCK_EOD_CLASSIFICATION="not_evaluated"
STOCK_EOD_NEXT_ACTION="not_evaluated"
@@ -469,6 +470,9 @@ if [[ "$RUN_ROUTES" -eq 1 ]]; then
    done
    case "$code" in
      2*|3*)
        if [[ "$url" == "https://awoooi.wooo.work/api/v1/health" && "$code" == 2* ]]; then
          AWOOOI_API_ROUTE_OK=1
        fi
        if [[ "$attempt" -gt 1 ]]; then
          evidence_warn "$code $url recovered_after_attempt=$attempt"
        else
@@ -485,8 +489,13 @@ fi

if [[ "$COLD_START_PENDING_BLOCKERS" -gt 0 ]]; then
  non_route_cold_blockers="$(printf '%s\n' "$COLD_START_BLOCKED_LINES" | grep -Ev '^BLOCKED public route ' || true)"
  if [[ "$RUN_ROUTES" -eq 1 && "$ROUTE_SMOKE_BLOCKED" -eq 0 && "$AWOOOI_API_ROUTE_OK" -eq 1 ]]; then
    non_route_cold_blockers="$(
      printf '%s\n' "$non_route_cold_blockers" | grep -Ev '^BLOCKED AWOOOI API not reachable$|^BLOCKED AWOOI API not reachable$' || true
    )"
  fi
  if [[ "$RUN_ROUTES" -eq 1 && "$ROUTE_SMOKE_BLOCKED" -eq 0 && -z "$non_route_cold_blockers" ]]; then
    evidence_warn "cold-start public-route blockers recovered under wrapper route retry: $COLD_START_BLOCKED_SUMMARY"
    evidence_warn "cold-start route/API warmup blockers recovered under wrapper route retry: $COLD_START_BLOCKED_SUMMARY"
    printf '%s\n' "$COLD_START_BLOCKED_LINES"
  else
    blocked "cold-start has blockers: $COLD_START_BLOCKED_SUMMARY"