From 35dba352538c8d8ddb156d8510fbc6430d45bf94 Mon Sep 17 00:00:00 2001
From: ogt
Date: Fri, 26 Jun 2026 17:55:45 +0800
Subject: [PATCH] ops(reboot): persist summary evidence and classify warmup routes

---
 docs/LOGBOOK.md | 38 +++++++++++++++++++
 docs/runbooks/FULL-STACK-COLD-START-SOP.md | 6 ++-
 .../runbooks/REBOOT-POST-START-QUICK-CHECK.md | 24 +++++++++---
 ...oot-cold-start-backup-recovery-workplan.md | 4 +-
 .../post-reboot-owner-response-preflight.py | 12 +++---
 .../post-reboot-readiness-summary.sh | 4 ++
 .../reboot-recovery/post-start-quick-check.sh | 21 +++++++++-
 7 files changed, 94 insertions(+), 15 deletions(-)

diff --git a/docs/LOGBOOK.md b/docs/LOGBOOK.md
index c4911a7f..57831b12 100644
--- a/docs/LOGBOOK.md
+++ b/docs/LOGBOOK.md
@@ -45664,3 +45664,41 @@ production browser smoke:
 - Wazuh manager registry accepted 仍為 `0`:不得宣稱 Wazuh 全主機納管恢復。
 - Owner response received / accepted 仍為 `0 / 0`;不得把「批准繼續」、空模板、UI 可見、route `200`、transport `6`、Dashboard index pattern `3` 或 owner-packet JSON 當成 evidence accepted。
 - Runtime action / host write / credential marker write / Wazuh active response / Kali active scan 仍全部 `0 / false`。
+
+## 2026-06-26 — 17:45 single-summary replay / route warmup classifier / SOP v1.75
+
+**時間與來源**:
+- 2026-06-26 17:39-17:45 Asia/Taipei。
+- 來源:`scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color`、`scripts/reboot-recovery/post-start-quick-check.sh --no-color`、`scripts/reboot-recovery/post-reboot-declaration-guard.py --summary-file ...`、`scripts/reboot-recovery/post-reboot-next-gate-dispatch.sh --summary-file ...`、owner packet / contract / owner response preflight。
+
+**完成內容**:
+- `post-reboot-readiness-summary.sh` 會把 stdout 的 key/value summary 同步寫入 `$ARTIFACT_DIR/summary.txt`,讓同一輪 declaration guard、next-gate dispatch、owner packet、contract guard 與 owner response preflight 都吃同一份 evidence。
+- `post-start-quick-check.sh` 新增 delegated cold-start blocker 分類:cold-start 若只因 public route 單次 502 / TLS readback 暫時 blocked,但 wrapper 自己的
route retry 已全部恢復,該 blocker 降級為 evidence warning;非 route blocker 或 retry 後仍失敗仍為 hard blocker。
+- `post-reboot-owner-response-preflight.py` 的 JSON loader 錯誤訊息已區分 `owner_packet_file_*` 與 `response_file_*`,避免 race 或缺檔時誤導 operator。
+- `docs/runbooks/FULL-STACK-COLD-START-SOP.md` 升至 v1.75;`docs/runbooks/REBOOT-POST-START-QUICK-CHECK.md` 升至 v1.15;workplan 狀態更新為 `DONE_WITH_SINGLE_SUMMARY_REPLAY_V175`。
+
+**live / replay 證據**:
+- 17:42 首輪修補後 summary 確認 `summary.txt` 已寫入 `/tmp/awoooi-post-reboot-readiness-20260626-174129/summary.txt`,但 delegated cold-start 因 `stock.wooo.work` 單次 502 / TLS check failure 產生 `POST_START_BLOCKED=1`;同一輪 wrapper route retry 已顯示 `WARN 200 https://stock.wooo.work/ recovered_after_attempt=2`,且 StockPlatform freshness 為 `status=ok`。
+- 分類修正後,`scripts/reboot-recovery/post-start-quick-check.sh --no-color` 回到 `POST_START_QUICK_CHECK PASS=38 WARN=4 BLOCKED=0`、`POST_START_QUICK_CHECK_WARNINGS SERVICE=0 BOUNDARY=1 EVIDENCE=3`、`RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`。
+- 最終 summary artifact:`/tmp/awoooi-post-reboot-readiness-20260626-174451/summary.txt`,回傳 `POST_START_RC=0`、`POST_START_RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`、`POST_START_PASS=38`、`POST_START_WARN=4`、`POST_START_BLOCKED=0`、`SERVICE_GREEN=1`、`PRODUCT_DATA_GREEN=1`、`BACKUP_CORE_GREEN=1`、`DR_ESCROW_BLOCKED=1`、`ESCROW_MISSING_COUNT=5`、`HOST_188_HYGIENE_BLOCKED=0`、`HOST_188_RESULT=HOST_188_HYGIENE_GREEN.`、`WAZUH_MANAGER_REGISTRY_ACCEPTED=0`、`RUNTIME_ACTION_AUTHORIZED=0`、`OVERALL_DECLARATION=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`、`NEXT_REQUIRED_GATES=credential_escrow_evidence,wazuh_manager_registry_export`。
+- 固定 summary 重放:`post-reboot-next-gate-dispatch.sh --summary-file /tmp/awoooi-post-reboot-readiness-20260626-174451/summary.txt` 只輸出兩個 P0 gate:`credential_escrow_evidence` 與 `wazuh_manager_registry_export`;`DISPATCH_AUTHORIZED=0`、`REQUEST_SENT_COUNT=0`、`HOST_WRITE_AUTHORIZED=0`、`SECRET_VALUE_COLLECTION_ALLOWED=0`。
+- Declaration guard 使用同一份 summary 正確拒絕 proposed
`DR_COMPLETE`、`WAZUH_REGISTRY_RECOVERED`、`RUNTIME_ACTION_AUTHORIZED`:`POST_REBOOT_DECLARATION_GUARD_REJECTED status=blocked_false_green_proposal allowed=5 forbidden=3 next_gates=2 rejected_proposed=3`。
+- Owner packet contract guard:`POST_REBOOT_OWNER_PACKET_CONTRACT_GUARD_OK gates=2 request_sent=0 accepted=0 runtime_gate=0`。
+- Owner response preflight:無 response file 維持 `blocked_waiting_owner_response_file`;placeholder template 維持 `blocked_waiting_owner_response_content`、`received=0`、`accepted=0`、`runtime_gate=0`。
+
+**做過的命令類型**:
+- 只讀:live summary、quick-check、declaration guard、dispatch replay、owner packet / contract / preflight 驗證。
+- 寫入:repo script / runbook / workplan / LOGBOOK only。
+- 未做:沒有 host / Docker / systemd / Nginx / firewall / K8s / DB / Wazuh runtime 寫操作;沒有讀 secret 明文;沒有寫 credential marker;沒有送 owner request;沒有 Wazuh active response / agent re-enroll / restart;沒有 Kali active scan。
+
+**目前判定**:
+- Reboot service / product data / backup / 188 host hygiene:`GREEN`。
+- Overall recovery declaration:`FULL_STACK_GREEN_DR_ESCROW_BLOCKED`。
+- SOP / quick-check / single-summary evidence chain:v1.75。
+- Route warmup no-false-blocker classifier:`100%`。
+
+**仍 blocked / 不得宣稱**:
+- DR credential escrow evidence 仍缺 `5`:不得宣稱 `DR_COMPLETE` 或 credential escrow complete。
+- Wazuh manager registry accepted 仍為 `0`:不得宣稱 Wazuh 全主機納管恢復。
+- Owner response received / accepted 仍為 `0 / 0`;不得把「批准繼續」、空模板、UI 可見、route `200`、transport `6`、Dashboard index pattern `3` 或 owner-packet JSON 當成 evidence accepted。
+- Runtime action / host write / credential marker write / Wazuh active response / Kali active scan 仍全部 `0 / false`。
diff --git a/docs/runbooks/FULL-STACK-COLD-START-SOP.md b/docs/runbooks/FULL-STACK-COLD-START-SOP.md
index 3629dd20..ab7e2000 100644
--- a/docs/runbooks/FULL-STACK-COLD-START-SOP.md
+++ b/docs/runbooks/FULL-STACK-COLD-START-SOP.md
@@ -1,6 +1,6 @@
 # AWOOOI 全棧冷啟動與主機重啟 SOP

-> Version: v1.74
+> Version: v1.75
 > Last updated: 2026-06-26 Asia/Taipei
 > Scope: 110 / 120 /
121 / 188 full-stack reboot recovery. 112 Kali is recorded as P3 optional and is not part of this recovery path.

@@ -10,12 +10,14 @@

 本節是每次接手、開機、關機、重啟後的第一個判定錨點。若日期不是今天,必須先重跑 live check,再更新本節與 `docs/workplans/2026-06-04-reboot-cold-start-backup-recovery-workplan.md`。

-若只是重啟後要快速判斷能不能宣稱恢復,先跑機器可讀摘要:`scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color`。此腳本會呼叫一頁式總檢查、188 host hygiene checklist 與 Wazuh no-false-green repo gates,並把 delegated logs 留在 `/tmp/awoooi-post-reboot-readiness-*`。接著跑 `scripts/reboot-recovery/post-reboot-declaration-guard.py --no-color`,把 summary 轉成 allowed / forbidden declaration,避免把服務綠誤報成 DR complete、188 host hygiene、Wazuh registry recovered 或 runtime authorized。若 summary 顯示 `SERVICE_GREEN=1` 但 `NEXT_REQUIRED_GATES` 仍非空,再跑 `scripts/reboot-recovery/post-reboot-next-gate-dispatch.sh --no-color`,把 live summary 內尚未完成的 blocker 轉成 owner / evidence / forbidden-action dispatch checklist;需要機器可讀 intake 時,再跑 `scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --no-color --output /tmp/awoooi-post-reboot-owner-packets.json` 產生 `awoooi_post_reboot_next_gate_owner_packets_v1` JSON,並立刻跑 `scripts/reboot-recovery/post-reboot-owner-packet-contract-guard.py --packet-file /tmp/awoooi-post-reboot-owner-packets.json`。dispatch / packet / guard 均固定 `DISPATCH_AUTHORIZED=0`、`REQUEST_SENT_COUNT=0`、`OWNER_RESPONSE_ACCEPTED=0`、`HOST_WRITE_AUTHORIZED=0`、`SECRET_VALUE_COLLECTION_ALLOWED=0`、`RUNTIME_GATE=0`;guard 未通過時不得送 owner request、不得寫 escrow marker、不得進維護窗口、不得宣稱 DR / Wazuh registry complete。v1.74 起,任何 owner response JSON 還必須經過 `scripts/reboot-recovery/post-reboot-owner-response-preflight.py --no-color --response-file `:空模板、placeholder、secret payload、runtime action request、credential marker write、Wazuh active response / re-enroll / restart、Kali active scan 或缺少 Dashboard API / manager registry evidence 都必須 fail-closed;preflight 通過也只表示可進入獨立 reviewer acceptance,不是 runtime 授權。需要人工展開時,再跑 `scripts/reboot-recovery/post-start-quick-check.sh --no-color` 並以
`docs/runbooks/REBOOT-POST-START-QUICK-CHECK.md` 作為 fallback。長 SOP 保留完整背景、例外處理與 Plan B;短版 wrapper / checklist 負責每次 T+10 分鐘內的固定判定。
+若只是重啟後要快速判斷能不能宣稱恢復,先跑機器可讀摘要:`scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color`。此腳本會呼叫一頁式總檢查、188 host hygiene checklist 與 Wazuh no-false-green repo gates,並把 delegated logs 和可重放的 `summary.txt` 留在 `/tmp/awoooi-post-reboot-readiness-*`。v1.75 起,同一輪驗收後續步驟必須吃同一個 `$ARTIFACT_DIR/summary.txt`,例如 `scripts/reboot-recovery/post-reboot-declaration-guard.py --summary-file "$ARTIFACT_DIR/summary.txt" --no-color` 與 `scripts/reboot-recovery/post-reboot-next-gate-dispatch.sh --summary-file "$ARTIFACT_DIR/summary.txt" --no-color`;不得在同一輪 evidence chain 反覆重跑 live probes 後混用不同時間點結論。宣告 guard 會把 summary 轉成 allowed / forbidden declaration,避免把服務綠誤報成 DR complete、188 host hygiene、Wazuh registry recovered 或 runtime authorized。若 summary 顯示 `SERVICE_GREEN=1` 但 `NEXT_REQUIRED_GATES` 仍非空,再由 dispatch checklist 把尚未完成的 blocker 轉成 owner / evidence / forbidden-action checklist;需要機器可讀 intake 時,再跑 `scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --dispatch-file --output /tmp/awoooi-post-reboot-owner-packets.json` 產生 `awoooi_post_reboot_next_gate_owner_packets_v1` JSON,並立刻跑 `scripts/reboot-recovery/post-reboot-owner-packet-contract-guard.py --packet-file /tmp/awoooi-post-reboot-owner-packets.json`。dispatch / packet / guard 均固定 `DISPATCH_AUTHORIZED=0`、`REQUEST_SENT_COUNT=0`、`OWNER_RESPONSE_ACCEPTED=0`、`HOST_WRITE_AUTHORIZED=0`、`SECRET_VALUE_COLLECTION_ALLOWED=0`、`RUNTIME_GATE=0`;guard 未通過時不得送 owner request、不得寫 escrow marker、不得進維護窗口、不得宣稱 DR / Wazuh registry complete。v1.74 起,任何 owner response JSON 還必須經過 `scripts/reboot-recovery/post-reboot-owner-response-preflight.py --no-color --owner-packet-file --response-file `:空模板、placeholder、secret payload、runtime action request、credential marker write、Wazuh active response / re-enroll / restart、Kali active scan 或缺少 Dashboard API / manager registry evidence 都必須 fail-closed;preflight 通過也只表示可進入獨立 reviewer
acceptance,不是 runtime 授權。需要人工展開時,再跑 `scripts/reboot-recovery/post-start-quick-check.sh --no-color` 並以 `docs/runbooks/REBOOT-POST-START-QUICK-CHECK.md` 作為 fallback。長 SOP 保留完整背景、例外處理與 Plan B;短版 wrapper / checklist 負責每次 T+10 分鐘內的固定判定。

 2026-06-26 12:13 latest live summary supersedes the 08:59 gate set:`scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color` 回傳 `POST_START_RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`、`POST_START_PASS=38`、`POST_START_WARN=4`、`POST_START_BLOCKED=0`、`SERVICE_GREEN=1`、`PRODUCT_DATA_GREEN=1`、`BACKUP_CORE_GREEN=1`、`DR_ESCROW_BLOCKED=1`、`ESCROW_MISSING_COUNT=5`、`HOST_188_SERVICE_GREEN=1`、`HOST_188_HYGIENE_BLOCKED=0`、`HOST_188_RESULT=HOST_188_HYGIENE_GREEN.`、`WAZUH_ROUTE_CODE=200`、`WAZUH_TRANSPORT_COUNT=6`、`WAZUH_MANAGER_REGISTRY_ACCEPTED=0`、`WAZUH_DASHBOARD_API_CONNECTION=pending_or_spinning`、`WAZUH_DASHBOARD_INDEX_OK=3`、`RUNTIME_ACTION_AUTHORIZED=0`、`OVERALL_DECLARATION=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`、`NEXT_REQUIRED_GATES=credential_escrow_evidence,wazuh_manager_registry_export`。188 host hygiene 已從 blocker 移除;目前不可宣稱完成的只剩 DR credential escrow 與 Wazuh manager registry。ACME HTTP-01 route 與 certbot timer hygiene 已修復,但不得宣稱憑證已正式 renew,需等 snap certbot timer / ACME window readback。

 2026-06-26 13:01 owner response preflight baseline:新增 `scripts/reboot-recovery/post-reboot-owner-response-preflight.py --no-color` 與 `docs/templates/post-reboot-next-gate-owner-response.json`。無 response file 時必須輸出 `POST_REBOOT_OWNER_RESPONSE_PREFLIGHT_BLOCKED status=blocked_waiting_owner_response_file expected_gates=2 received=0 accepted=0 runtime_gate=0`;直接使用模板時必須輸出 `POST_REBOOT_OWNER_RESPONSE_PREFLIGHT_BLOCKED status=blocked_waiting_owner_response_content expected_gates=2 received=0 accepted=0 runtime_gate=0`。此 gate 只驗收 `credential_escrow_evidence` 與 `wazuh_manager_registry_export` 的脫敏 owner evidence,不送 request、不寫 escrow marker、不讀 secret、不做 Wazuh / host / Kali runtime action,也不把一般批准訊息轉成 owner accepted。

+2026-06-26 17:45 single-summary replay
baseline:`scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color` 現在會自動寫入 `/tmp/awoooi-post-reboot-readiness-20260626-174451/summary.txt`,同一輪後續 `declaration guard`、`next-gate dispatch`、`owner packet`、`contract guard` 與 `owner response preflight` 均用此 summary 重放。17:45 summary 回傳 `SERVICE_GREEN=1`、`PRODUCT_DATA_GREEN=1`、`BACKUP_CORE_GREEN=1`、`DR_ESCROW_BLOCKED=1`、`ESCROW_MISSING_COUNT=5`、`HOST_188_HYGIENE_BLOCKED=0`、`WAZUH_MANAGER_REGISTRY_ACCEPTED=0`、`OVERALL_DECLARATION=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`、`NEXT_REQUIRED_GATES=credential_escrow_evidence,wazuh_manager_registry_export`。`post-start-quick-check.sh` 也已補 route warmup 分類:若 delegated cold-start 的 `BLOCKED` 全部是 public route,且 wrapper 自己的 route retry 已全部恢復,該 cold-start blocker 會降級為 evidence warning,不再把整輪服務恢復誤判成 blocked;非 route blocker 或 retry 後仍失敗仍維持 hard blocked。
+
 2026-06-26 07:47 machine-readable readiness summary retained as historical pre-repair evidence:當時 `HOST_188_HYGIENE_BLOCKED=1`、`NEXT_REQUIRED_GATES=credential_escrow_evidence,host_188_hygiene_maintenance_window,wazuh_manager_registry_export`。此段只用來比對 188 修復前後差異;現行 gate set 必須使用 12:13 baseline。

 2026-06-26 08:12 next-gate dispatch baseline retained as historical pre-repair evidence:當時 output 固定三個 P0 checklist。12:13 起 dispatch 依 live summary 動態輸出,目前 expected `NEXT_GATE_COUNT=2`,只剩 credential escrow 與 Wazuh registry。
diff --git a/docs/runbooks/REBOOT-POST-START-QUICK-CHECK.md b/docs/runbooks/REBOOT-POST-START-QUICK-CHECK.md
index 456091c7..b3eecb2e 100644
--- a/docs/runbooks/REBOOT-POST-START-QUICK-CHECK.md
+++ b/docs/runbooks/REBOOT-POST-START-QUICK-CHECK.md
@@ -1,6 +1,6 @@
 # 主機重啟後一頁式總檢查

-> Version: v1.14
+> Version: v1.15
 > Last updated: 2026-06-26 Asia/Taipei
 > Scope: 110 / 120 / 121 / 188 post-reboot service recovery.
112 Kali / Wazuh / active scan 不屬於本流程。

@@ -10,7 +10,7 @@

 每次 110 / 120 / 121 / 188 任一台主機開機、關機、重啟、斷電恢復、VMware console fsck、Docker / K3s 大量重排後,都先跑本頁,再決定是否宣稱恢復。

-最新基準:2026-06-26 13:01 post-reboot owner response preflight。`scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color` 回傳 `SERVICE_GREEN=1`、`PRODUCT_DATA_GREEN=1`、`BACKUP_CORE_GREEN=1`、`DR_ESCROW_BLOCKED=1`、`ESCROW_MISSING_COUNT=5`、`HOST_188_HYGIENE_BLOCKED=0`、`HOST_188_RESULT=HOST_188_HYGIENE_GREEN.`、`WAZUH_MANAGER_REGISTRY_ACCEPTED=0`、`WAZUH_COVERAGE_SCOPE=6`、`WAZUH_DIRECT_ACTIVE=2`、`WAZUH_NO_TRANSPORT=1`、`WAZUH_SSH_BLOCKED=3`、`WAZUH_ROUTE_CODE=200`、`WAZUH_TRANSPORT_COUNT=6`、`WAZUH_DASHBOARD_API_CONNECTION=pending_or_spinning`、`WAZUH_DASHBOARD_INDEX_OK=3`、`RUNTIME_ACTION_AUTHORIZED=0`、`OVERALL_DECLARATION=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`。`scripts/reboot-recovery/post-reboot-declaration-guard.py --no-color` 會把 summary 轉成 allowed / forbidden declaration:目前允許宣稱服務、產品資料、備份核心、188 host hygiene green 與 `FULL_STACK_GREEN_DR_ESCROW_BLOCKED`;禁止宣稱 `DR_COMPLETE`、`WAZUH_REGISTRY_RECOVERED`、`RUNTIME_ACTION_AUTHORIZED`。接著 `scripts/reboot-recovery/post-reboot-next-gate-dispatch.sh --no-color` 將 `NEXT_REQUIRED_GATES=credential_escrow_evidence,wazuh_manager_registry_export` 展成 owner / evidence / forbidden-action checklist;Wazuh checklist 的 `CURRENT_EVIDENCE` 會保留 registry accepted、coverage scope、direct active、no transport、SSH blocked、route、transport、Dashboard API 與 index pattern 狀態,避免把 route `200` 或 transport `6` 誤報成 registry recovered。`scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --no-color` 進一步轉成 `awoooi_post_reboot_next_gate_owner_packets_v1` JSON,固定 `dispatch_authorized=0`、`request_sent_count=0`、`owner_response_accepted_count=0`、`host_write_authorized=0`、`secret_value_collection_allowed=0`、`runtime_gate_count=0`;`scripts/reboot-recovery/post-reboot-owner-packet-contract-guard.py --packet-file /tmp/awoooi-post-reboot-owner-packets.json` 依 live `next_required_gates` 動態鎖定 P0 gate、所有 `0 / false`
邊界、禁用 secret payload / runtime action 與 no-false-green 規則。新增 `scripts/reboot-recovery/post-reboot-owner-response-preflight.py --no-color` 作為 owner response 收件預檢:沒有 response file 必須是 `blocked_waiting_owner_response_file`;直接套用 `docs/templates/post-reboot-next-gate-owner-response.json` 必須是 `blocked_waiting_owner_response_content`;只有具備遮罩 evidence refs、完整 owner 欄位、Wazuh registry / Dashboard API 狀態、五個 credential escrow 非 secret evidence refs,且沒有 secret value / runtime action request 的 response 才能進入下一層 reviewer acceptance。DR 仍因 `escrow_missing=5` 不可宣稱 complete;Wazuh manager registry 仍是 service green 之外的獨立 blocker。ACME HTTP-01 route / certbot timer hygiene 已修復,但憑證正式 renew 成功需等 snap certbot timer 或獨立 ACME window readback。
+最新基準:2026-06-26 17:45 single-summary replay / route warmup classifier。`scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color` 回傳 `SERVICE_GREEN=1`、`PRODUCT_DATA_GREEN=1`、`BACKUP_CORE_GREEN=1`、`DR_ESCROW_BLOCKED=1`、`ESCROW_MISSING_COUNT=5`、`HOST_188_HYGIENE_BLOCKED=0`、`HOST_188_RESULT=HOST_188_HYGIENE_GREEN.`、`WAZUH_MANAGER_REGISTRY_ACCEPTED=0`、`WAZUH_COVERAGE_SCOPE=6`、`WAZUH_DIRECT_ACTIVE=2`、`WAZUH_NO_TRANSPORT=1`、`WAZUH_SSH_BLOCKED=3`、`WAZUH_ROUTE_CODE=200`、`WAZUH_TRANSPORT_COUNT=6`、`WAZUH_DASHBOARD_API_CONNECTION=pending_or_spinning`、`WAZUH_DASHBOARD_INDEX_OK=3`、`RUNTIME_ACTION_AUTHORIZED=0`、`OVERALL_DECLARATION=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`,並自動把同一份 key/value 寫到 `$ARTIFACT_DIR/summary.txt`。同一輪後續 `post-reboot-declaration-guard.py`、`post-reboot-next-gate-dispatch.sh`、`post-reboot-next-gate-owner-packets.py`、`post-reboot-owner-packet-contract-guard.py`、`post-reboot-owner-response-preflight.py` 必須使用這份 `summary.txt` 或由它產生的 dispatch / packet,不得混用多次 live probe 的不同時間點結果。`NEXT_REQUIRED_GATES=credential_escrow_evidence,wazuh_manager_registry_export` 仍是唯一目前 next gates;DR 仍因 `escrow_missing=5` 不可宣稱 complete;Wazuh manager registry accepted 仍是 `0`,不可把 route `200`、transport `6`、Dashboard index pattern `3` 當成 registry recovered。v1.15 另補 route warmup
classifier:delegated cold-start 若只因 public route 單次 502 / TLS readback 暫時 blocked,但 wrapper route retry 已確認全部恢復,該 blocker 會降級為 evidence warning;非 route blocker 或 retry 後仍失敗仍為 hard blocked。

 本頁只回答四件事:

@@ -49,7 +49,7 @@
 scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color
 ```

-此 summary 只做 read-only 檢查,會委派一頁式總檢查、188 host hygiene checklist 與 Wazuh repo-side no-false-green gates,並將 delegated logs 保留在 `/tmp/awoooi-post-reboot-readiness-*`。第一眼先看這些欄位:
+此 summary 只做 read-only 檢查,會委派一頁式總檢查、188 host hygiene checklist 與 Wazuh repo-side no-false-green gates,並將 delegated logs 與可重放的 `summary.txt` 保留在 `/tmp/awoooi-post-reboot-readiness-*`。第一眼先看這些欄位:

 - `SERVICE_GREEN=1`:服務面可宣稱恢復。
 - `PRODUCT_DATA_GREEN=1`:MOMO / StockPlatform 主要資料 freshness 可宣稱恢復。
@@ -67,6 +67,13 @@ summary 顯示 `SERVICE_GREEN=1` 後,先跑宣告 guard,確認本輪可以
 scripts/reboot-recovery/post-reboot-declaration-guard.py --no-color
 ```

+若已經有 `$ARTIFACT_DIR/summary.txt`,同一輪建議固定使用它,避免重跑 live probes 時被 CI / route warmup 瞬間狀態影響:
+
+```bash
+SUMMARY_FILE=/tmp/awoooi-post-reboot-readiness-YYYYMMDD-HHMMSS/summary.txt
+scripts/reboot-recovery/post-reboot-declaration-guard.py --no-color --summary-file "$SUMMARY_FILE"
+```
+
 這支 guard 只讀取 summary evidence,並輸出本輪允許與禁止宣稱的邊界。若要測試某個說法是否允許,可用 `--proposed`:

 ```bash
@@ -81,6 +88,12 @@ summary 顯示 `SERVICE_GREEN=1` 但仍有 `NEXT_REQUIRED_GATES` 時,再跑下
 scripts/reboot-recovery/post-reboot-next-gate-dispatch.sh --no-color
 ```

+同一輪 evidence chain 應改用同一份 summary:
+
+```bash
+scripts/reboot-recovery/post-reboot-next-gate-dispatch.sh --no-color --summary-file "$SUMMARY_FILE" | tee /tmp/awoooi-post-reboot-dispatch.txt
+```
+
 這支腳本只把目前 live summary 內的 `NEXT_REQUIRED_GATES` 轉成 owner / required evidence / forbidden action / done criteria。2026-06-26 12:13 起通常只剩 `credential_escrow_evidence` 與 `wazuh_manager_registry_export`;若未來 188 又紅,才會重新出現 `host_188_hygiene_maintenance_window`。它不送 request、不讀 secret、不寫 marker、不 restart / reload / repair / import / delete / patch,也不授權 host / Wazuh / Nginx / K8s / DB runtime
action。若它輸出 `SERVICE_GREEN=0`,先回到服務恢復,不進入 boundary dispatch。

 若要交給 AI / 工單 / owner review 使用,產生機器可讀 owner packet:
@@ -94,7 +107,7 @@ scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --no-color
 送入任何 owner review queue 前,必須先把 JSON 存成 artifact 並跑 contract guard:

 ```bash
-scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --no-color --output /tmp/awoooi-post-reboot-owner-packets.json
+scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --no-color --dispatch-file /tmp/awoooi-post-reboot-dispatch.txt --output /tmp/awoooi-post-reboot-owner-packets.json
 scripts/reboot-recovery/post-reboot-owner-packet-contract-guard.py --packet-file /tmp/awoooi-post-reboot-owner-packets.json
 ```

@@ -104,7 +117,8 @@ guard 必須輸出 `POST_REBOOT_OWNER_PACKET_CONTRACT_GUARD_OK gates= argparse.Namespace:
     return parser.parse_args()


-def load_json(path: Path) -> dict[str, Any]:
+def load_json(path: Path, label: str = "response_file") -> dict[str, Any]:
     try:
         payload = json.loads(path.read_text(encoding="utf-8"))
     except FileNotFoundError as exc:
-        raise SystemExit(f"response_file_not_found={path}") from exc
+        raise SystemExit(f"{label}_not_found={path}") from exc
     except json.JSONDecodeError as exc:
-        raise SystemExit(f"response_json_invalid={exc}") from exc
+        raise SystemExit(f"{label}_json_invalid={exc}") from exc
     if not isinstance(payload, dict):
-        raise SystemExit("response_json_not_object")
+        raise SystemExit(f"{label}_json_not_object")
     return payload


@@ -146,7 +146,7 @@ def generate_owner_packet(no_color: bool) -> dict[str, Any]:

 def load_owner_packet(args: argparse.Namespace) -> dict[str, Any]:
     if args.owner_packet_file:
-        return load_json(args.owner_packet_file)
+        return load_json(args.owner_packet_file, label="owner_packet_file")
     return generate_owner_packet(no_color=args.no_color)


@@ -375,7 +375,7 @@ def evaluate(packet: dict[str, Any], response: dict[str, Any] | None) -> dict[st
 def main() -> int:
     args = parse_args()
     packet = load_owner_packet(args)
-    response = load_json(args.response_file) if args.response_file else None
+    response = load_json(args.response_file, label="response_file") if args.response_file else None
     result = evaluate(packet, response)

     if args.json:
diff --git a/scripts/reboot-recovery/post-reboot-readiness-summary.sh b/scripts/reboot-recovery/post-reboot-readiness-summary.sh
index 185cd880..bb76aeb2 100755
--- a/scripts/reboot-recovery/post-reboot-readiness-summary.sh
+++ b/scripts/reboot-recovery/post-reboot-readiness-summary.sh
@@ -196,6 +196,9 @@ else
   next_required_gates_csv="$(IFS=,; echo "${next_required_gates[*]}")"
 fi

+summary_file="$ARTIFACT_DIR/summary.txt"
+
+{
 cat <