ops(reboot): persist summary evidence and classify warmup routes
This commit is contained in:
@@ -45664,3 +45664,41 @@ production browser smoke:
|
||||
- Wazuh manager registry accepted 仍為 `0`:不得宣稱 Wazuh 全主機納管恢復。
|
||||
- Owner response received / accepted 仍為 `0 / 0`;不得把「批准繼續」、空模板、UI 可見、route `200`、transport `6`、Dashboard index pattern `3` 或 owner-packet JSON 當成 evidence accepted。
|
||||
- Runtime action / host write / credential marker write / Wazuh active response / Kali active scan 仍全部 `0 / false`。
|
||||
|
||||
## 2026-06-26 — 17:45 single-summary replay / route warmup classifier / SOP v1.75
|
||||
|
||||
**時間與來源**:
|
||||
- 2026-06-26 17:39-17:45 Asia/Taipei。
|
||||
- 來源:`scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color`、`scripts/reboot-recovery/post-start-quick-check.sh --no-color`、`scripts/reboot-recovery/post-reboot-declaration-guard.py --summary-file ...`、`scripts/reboot-recovery/post-reboot-next-gate-dispatch.sh --summary-file ...`、owner packet / contract / owner response preflight。
|
||||
|
||||
**完成內容**:
|
||||
- `post-reboot-readiness-summary.sh` 會把 stdout 的 key/value summary 同步寫入 `$ARTIFACT_DIR/summary.txt`,讓同一輪 declaration guard、next-gate dispatch、owner packet、contract guard 與 owner response preflight 都吃同一份 evidence。
|
||||
- `post-start-quick-check.sh` 新增 delegated cold-start blocker 分類:cold-start 若只因 public route 單次 502 / TLS readback 暫時 blocked,但 wrapper 自己的 route retry 已全部恢復,該 blocker 降級為 evidence warning;非 route blocker 或 retry 後仍失敗仍為 hard blocker。
|
||||
- `post-reboot-owner-response-preflight.py` 的 JSON loader 錯誤訊息已區分 `owner_packet_file_*` 與 `response_file_*`,避免 race 或缺檔時誤導 operator。
|
||||
- `docs/runbooks/FULL-STACK-COLD-START-SOP.md` 升至 v1.75;`docs/runbooks/REBOOT-POST-START-QUICK-CHECK.md` 升至 v1.15;workplan 狀態更新為 `DONE_WITH_SINGLE_SUMMARY_REPLAY_V175`。
|
||||
|
||||
**live / replay 證據**:
|
||||
- 17:42 首輪修補後 summary 確認 `summary.txt` 已寫入 `/tmp/awoooi-post-reboot-readiness-20260626-174129/summary.txt`,但 delegated cold-start 因 `stock.wooo.work` 單次 502 / TLS check failure 產生 `POST_START_BLOCKED=1`;同一輪 wrapper route retry 已顯示 `WARN 200 https://stock.wooo.work/ recovered_after_attempt=2`,且 StockPlatform freshness 為 `status=ok`。
|
||||
- 分類修正後,`scripts/reboot-recovery/post-start-quick-check.sh --no-color` 回到 `POST_START_QUICK_CHECK PASS=38 WARN=4 BLOCKED=0`、`POST_START_QUICK_CHECK_WARNINGS SERVICE=0 BOUNDARY=1 EVIDENCE=3`、`RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`。
|
||||
- 最終 summary artifact:`/tmp/awoooi-post-reboot-readiness-20260626-174451/summary.txt`,回傳 `POST_START_RC=0`、`POST_START_RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`、`POST_START_PASS=38`、`POST_START_WARN=4`、`POST_START_BLOCKED=0`、`SERVICE_GREEN=1`、`PRODUCT_DATA_GREEN=1`、`BACKUP_CORE_GREEN=1`、`DR_ESCROW_BLOCKED=1`、`ESCROW_MISSING_COUNT=5`、`HOST_188_HYGIENE_BLOCKED=0`、`HOST_188_RESULT=HOST_188_HYGIENE_GREEN.`、`WAZUH_MANAGER_REGISTRY_ACCEPTED=0`、`RUNTIME_ACTION_AUTHORIZED=0`、`OVERALL_DECLARATION=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`、`NEXT_REQUIRED_GATES=credential_escrow_evidence,wazuh_manager_registry_export`。
|
||||
- 固定 summary 重放:`post-reboot-next-gate-dispatch.sh --summary-file /tmp/awoooi-post-reboot-readiness-20260626-174451/summary.txt` 只輸出兩個 P0 gate:`credential_escrow_evidence` 與 `wazuh_manager_registry_export`;`DISPATCH_AUTHORIZED=0`、`REQUEST_SENT_COUNT=0`、`HOST_WRITE_AUTHORIZED=0`、`SECRET_VALUE_COLLECTION_ALLOWED=0`。
|
||||
- Declaration guard 使用同一份 summary 正確拒絕 proposed `DR_COMPLETE`、`WAZUH_REGISTRY_RECOVERED`、`RUNTIME_ACTION_AUTHORIZED`:`POST_REBOOT_DECLARATION_GUARD_REJECTED status=blocked_false_green_proposal allowed=5 forbidden=3 next_gates=2 rejected_proposed=3`。
|
||||
- Owner packet contract guard:`POST_REBOOT_OWNER_PACKET_CONTRACT_GUARD_OK gates=2 request_sent=0 accepted=0 runtime_gate=0`。
|
||||
- Owner response preflight:無 response file 維持 `blocked_waiting_owner_response_file`;placeholder template 維持 `blocked_waiting_owner_response_content`、`received=0`、`accepted=0`、`runtime_gate=0`。
|
||||
|
||||
**做過的命令類型**:
|
||||
- 只讀:live summary、quick-check、declaration guard、dispatch replay、owner packet / contract / preflight 驗證。
|
||||
- 寫入:repo script / runbook / workplan / LOGBOOK only。
|
||||
- 未做:沒有 host / Docker / systemd / Nginx / firewall / K8s / DB / Wazuh runtime 寫操作;沒有讀 secret 明文;沒有寫 credential marker;沒有送 owner request;沒有 Wazuh active response / agent re-enroll / restart;沒有 Kali active scan。
|
||||
|
||||
**目前判定**:
|
||||
- Reboot service / product data / backup / 188 host hygiene:`GREEN`。
|
||||
- Overall recovery declaration:`FULL_STACK_GREEN_DR_ESCROW_BLOCKED`。
|
||||
- SOP / quick-check / single-summary evidence chain:v1.75。
|
||||
- Route warmup no-false-blocker classifier:`100%`。
|
||||
|
||||
**仍 blocked / 不得宣稱**:
|
||||
- DR credential escrow evidence 仍缺 `5`:不得宣稱 `DR_COMPLETE` 或 credential escrow complete。
|
||||
- Wazuh manager registry accepted 仍為 `0`:不得宣稱 Wazuh 全主機納管恢復。
|
||||
- Owner response received / accepted 仍為 `0 / 0`;不得把「批准繼續」、空模板、UI 可見、route `200`、transport `6`、Dashboard index pattern `3` 或 owner-packet JSON 當成 evidence accepted。
|
||||
- Runtime action / host write / credential marker write / Wazuh active response / Kali active scan 仍全部 `0 / false`。
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
# AWOOOI 全棧冷啟動與主機重啟 SOP
|
||||
|
||||
> Version: v1.74
|
||||
> Version: v1.75
|
||||
> Last updated: 2026-06-26 Asia/Taipei
|
||||
> Scope: 110 / 120 / 121 / 188 full-stack reboot recovery. 112 Kali is recorded as P3 optional and is not part of this recovery path.
|
||||
|
||||
@@ -10,12 +10,14 @@
|
||||
|
||||
本節是每次接手、開機、關機、重啟後的第一個判定錨點。若日期不是今天,必須先重跑 live check,再更新本節與 `docs/workplans/2026-06-04-reboot-cold-start-backup-recovery-workplan.md`。
|
||||
|
||||
若只是重啟後要快速判斷能不能宣稱恢復,先跑機器可讀摘要:`scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color`。此腳本會呼叫一頁式總檢查、188 host hygiene checklist 與 Wazuh no-false-green repo gates,並把 delegated logs 留在 `/tmp/awoooi-post-reboot-readiness-*`。接著跑 `scripts/reboot-recovery/post-reboot-declaration-guard.py --no-color`,把 summary 轉成 allowed / forbidden declaration,避免把服務綠誤報成 DR complete、188 host hygiene、Wazuh registry recovered 或 runtime authorized。若 summary 顯示 `SERVICE_GREEN=1` 但 `NEXT_REQUIRED_GATES` 仍非空,再跑 `scripts/reboot-recovery/post-reboot-next-gate-dispatch.sh --no-color`,把 live summary 內尚未完成的 blocker 轉成 owner / evidence / forbidden-action dispatch checklist;需要機器可讀 intake 時,再跑 `scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --no-color --output /tmp/awoooi-post-reboot-owner-packets.json` 產生 `awoooi_post_reboot_next_gate_owner_packets_v1` JSON,並立刻跑 `scripts/reboot-recovery/post-reboot-owner-packet-contract-guard.py --packet-file /tmp/awoooi-post-reboot-owner-packets.json`。dispatch / packet / guard 均固定 `DISPATCH_AUTHORIZED=0`、`REQUEST_SENT_COUNT=0`、`OWNER_RESPONSE_ACCEPTED=0`、`HOST_WRITE_AUTHORIZED=0`、`SECRET_VALUE_COLLECTION_ALLOWED=0`、`RUNTIME_GATE=0`;guard 未通過時不得送 owner request、不得寫 escrow marker、不得進維護窗口、不得宣稱 DR / Wazuh registry complete。v1.74 起,任何 owner response JSON 還必須經過 `scripts/reboot-recovery/post-reboot-owner-response-preflight.py --no-color --response-file <file>`:空模板、placeholder、secret payload、runtime action request、credential marker write、Wazuh active response / re-enroll / restart、Kali active scan 或缺少 Dashboard API / manager registry evidence 都必須 fail-closed;preflight 通過也只表示可進入獨立 reviewer acceptance,不是 runtime 授權。需要人工展開時,再跑 `scripts/reboot-recovery/post-start-quick-check.sh --no-color` 並以 `docs/runbooks/REBOOT-POST-START-QUICK-CHECK.md` 作為 fallback。長 SOP 保留完整背景、例外處理與 Plan B;短版 wrapper / checklist 負責每次 T+10 分鐘內的固定判定。
|
||||
若只是重啟後要快速判斷能不能宣稱恢復,先跑機器可讀摘要:`scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color`。此腳本會呼叫一頁式總檢查、188 host hygiene checklist 與 Wazuh no-false-green repo gates,並把 delegated logs 和可重放的 `summary.txt` 留在 `/tmp/awoooi-post-reboot-readiness-*`。v1.75 起,同一輪驗收後續步驟必須吃同一個 `$ARTIFACT_DIR/summary.txt`,例如 `scripts/reboot-recovery/post-reboot-declaration-guard.py --summary-file "$ARTIFACT_DIR/summary.txt" --no-color` 與 `scripts/reboot-recovery/post-reboot-next-gate-dispatch.sh --summary-file "$ARTIFACT_DIR/summary.txt" --no-color`;不得在同一輪 evidence chain 反覆重跑 live probes 後混用不同時間點結論。宣告 guard 會把 summary 轉成 allowed / forbidden declaration,避免把服務綠誤報成 DR complete、188 host hygiene、Wazuh registry recovered 或 runtime authorized。若 summary 顯示 `SERVICE_GREEN=1` 但 `NEXT_REQUIRED_GATES` 仍非空,再由 dispatch checklist 把尚未完成的 blocker 轉成 owner / evidence / forbidden-action checklist;需要機器可讀 intake 時,再跑 `scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --dispatch-file <dispatch.txt> --output /tmp/awoooi-post-reboot-owner-packets.json` 產生 `awoooi_post_reboot_next_gate_owner_packets_v1` JSON,並立刻跑 `scripts/reboot-recovery/post-reboot-owner-packet-contract-guard.py --packet-file /tmp/awoooi-post-reboot-owner-packets.json`。dispatch / packet / guard 均固定 `DISPATCH_AUTHORIZED=0`、`REQUEST_SENT_COUNT=0`、`OWNER_RESPONSE_ACCEPTED=0`、`HOST_WRITE_AUTHORIZED=0`、`SECRET_VALUE_COLLECTION_ALLOWED=0`、`RUNTIME_GATE=0`;guard 未通過時不得送 owner request、不得寫 escrow marker、不得進維護窗口、不得宣稱 DR / Wazuh registry complete。v1.74 起,任何 owner response JSON 還必須經過 `scripts/reboot-recovery/post-reboot-owner-response-preflight.py --no-color --owner-packet-file <owner-packets.json> --response-file <file>`:空模板、placeholder、secret payload、runtime action request、credential marker write、Wazuh active response / re-enroll / restart、Kali active scan 或缺少 Dashboard API / manager registry evidence 都必須 fail-closed;preflight 通過也只表示可進入獨立 reviewer acceptance,不是 runtime 授權。需要人工展開時,再跑 `scripts/reboot-recovery/post-start-quick-check.sh --no-color` 並以 `docs/runbooks/REBOOT-POST-START-QUICK-CHECK.md` 作為 fallback。長 SOP 保留完整背景、例外處理與 Plan B;短版 wrapper / checklist 負責每次 T+10 分鐘內的固定判定。
|
||||
|
||||
2026-06-26 12:13 latest live summary supersedes the 08:59 gate set:`scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color` 回傳 `POST_START_RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`、`POST_START_PASS=38`、`POST_START_WARN=4`、`POST_START_BLOCKED=0`、`SERVICE_GREEN=1`、`PRODUCT_DATA_GREEN=1`、`BACKUP_CORE_GREEN=1`、`DR_ESCROW_BLOCKED=1`、`ESCROW_MISSING_COUNT=5`、`HOST_188_SERVICE_GREEN=1`、`HOST_188_HYGIENE_BLOCKED=0`、`HOST_188_RESULT=HOST_188_HYGIENE_GREEN.`、`WAZUH_ROUTE_CODE=200`、`WAZUH_TRANSPORT_COUNT=6`、`WAZUH_MANAGER_REGISTRY_ACCEPTED=0`、`WAZUH_DASHBOARD_API_CONNECTION=pending_or_spinning`、`WAZUH_DASHBOARD_INDEX_OK=3`、`RUNTIME_ACTION_AUTHORIZED=0`、`OVERALL_DECLARATION=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`、`NEXT_REQUIRED_GATES=credential_escrow_evidence,wazuh_manager_registry_export`。188 host hygiene 已從 blocker 移除;目前不可宣稱完成的只剩 DR credential escrow 與 Wazuh manager registry。ACME HTTP-01 route 與 certbot timer hygiene 已修復,但不得宣稱憑證已正式 renew,需等 snap certbot timer / ACME window readback。
|
||||
|
||||
2026-06-26 13:01 owner response preflight baseline:新增 `scripts/reboot-recovery/post-reboot-owner-response-preflight.py --no-color` 與 `docs/templates/post-reboot-next-gate-owner-response.json`。無 response file 時必須輸出 `POST_REBOOT_OWNER_RESPONSE_PREFLIGHT_BLOCKED status=blocked_waiting_owner_response_file expected_gates=2 received=0 accepted=0 runtime_gate=0`;直接使用模板時必須輸出 `POST_REBOOT_OWNER_RESPONSE_PREFLIGHT_BLOCKED status=blocked_waiting_owner_response_content expected_gates=2 received=0 accepted=0 runtime_gate=0`。此 gate 只驗收 `credential_escrow_evidence` 與 `wazuh_manager_registry_export` 的脫敏 owner evidence,不送 request、不寫 escrow marker、不讀 secret、不做 Wazuh / host / Kali runtime action,也不把一般批准訊息轉成 owner accepted。
|
||||
|
||||
2026-06-26 17:45 single-summary replay baseline:`scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color` 現在會自動寫入 `/tmp/awoooi-post-reboot-readiness-20260626-174451/summary.txt`,同一輪後續 `declaration guard`、`next-gate dispatch`、`owner packet`、`contract guard` 與 `owner response preflight` 均用此 summary 重放。17:45 summary 回傳 `SERVICE_GREEN=1`、`PRODUCT_DATA_GREEN=1`、`BACKUP_CORE_GREEN=1`、`DR_ESCROW_BLOCKED=1`、`ESCROW_MISSING_COUNT=5`、`HOST_188_HYGIENE_BLOCKED=0`、`WAZUH_MANAGER_REGISTRY_ACCEPTED=0`、`OVERALL_DECLARATION=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`、`NEXT_REQUIRED_GATES=credential_escrow_evidence,wazuh_manager_registry_export`。`post-start-quick-check.sh` 也已補 route warmup 分類:若 delegated cold-start 的 `BLOCKED` 全部是 public route,且 wrapper 自己的 route retry 已全部恢復,該 cold-start blocker 會降級為 evidence warning,不再把整輪服務恢復誤判成 blocked;非 route blocker 或 retry 後仍失敗仍維持 hard blocked。
|
||||
|
||||
2026-06-26 07:47 machine-readable readiness summary retained as historical pre-repair evidence:當時 `HOST_188_HYGIENE_BLOCKED=1`、`NEXT_REQUIRED_GATES=credential_escrow_evidence,host_188_hygiene_maintenance_window,wazuh_manager_registry_export`。此段只用來比對 188 修復前後差異;現行 gate set 必須使用 12:13 baseline。
|
||||
|
||||
2026-06-26 08:12 next-gate dispatch baseline retained as historical pre-repair evidence:當時 output 固定三個 P0 checklist。12:13 起 dispatch 依 live summary 動態輸出,目前 expected `NEXT_GATE_COUNT=2`,只剩 credential escrow 與 Wazuh registry。
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
# 主機重啟後一頁式總檢查
|
||||
|
||||
> Version: v1.14
|
||||
> Version: v1.15
|
||||
> Last updated: 2026-06-26 Asia/Taipei
|
||||
> Scope: 110 / 120 / 121 / 188 post-reboot service recovery. 112 Kali / Wazuh / active scan 不屬於本流程。
|
||||
|
||||
@@ -10,7 +10,7 @@
|
||||
|
||||
每次 110 / 120 / 121 / 188 任一台主機開機、關機、重啟、斷電恢復、VMware console fsck、Docker / K3s 大量重排後,都先跑本頁,再決定是否宣稱恢復。
|
||||
|
||||
最新基準:2026-06-26 13:01 post-reboot owner response preflight。`scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color` 回傳 `SERVICE_GREEN=1`、`PRODUCT_DATA_GREEN=1`、`BACKUP_CORE_GREEN=1`、`DR_ESCROW_BLOCKED=1`、`ESCROW_MISSING_COUNT=5`、`HOST_188_HYGIENE_BLOCKED=0`、`HOST_188_RESULT=HOST_188_HYGIENE_GREEN.`、`WAZUH_MANAGER_REGISTRY_ACCEPTED=0`、`WAZUH_COVERAGE_SCOPE=6`、`WAZUH_DIRECT_ACTIVE=2`、`WAZUH_NO_TRANSPORT=1`、`WAZUH_SSH_BLOCKED=3`、`WAZUH_ROUTE_CODE=200`、`WAZUH_TRANSPORT_COUNT=6`、`WAZUH_DASHBOARD_API_CONNECTION=pending_or_spinning`、`WAZUH_DASHBOARD_INDEX_OK=3`、`RUNTIME_ACTION_AUTHORIZED=0`、`OVERALL_DECLARATION=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`。`scripts/reboot-recovery/post-reboot-declaration-guard.py --no-color` 會把 summary 轉成 allowed / forbidden declaration:目前允許宣稱服務、產品資料、備份核心、188 host hygiene green 與 `FULL_STACK_GREEN_DR_ESCROW_BLOCKED`;禁止宣稱 `DR_COMPLETE`、`WAZUH_REGISTRY_RECOVERED`、`RUNTIME_ACTION_AUTHORIZED`。接著 `scripts/reboot-recovery/post-reboot-next-gate-dispatch.sh --no-color` 將 `NEXT_REQUIRED_GATES=credential_escrow_evidence,wazuh_manager_registry_export` 展成 owner / evidence / forbidden-action checklist;Wazuh checklist 的 `CURRENT_EVIDENCE` 會保留 registry accepted、coverage scope、direct active、no transport、SSH blocked、route、transport、Dashboard API 與 index pattern 狀態,避免把 route `200` 或 transport `6` 誤報成 registry recovered。`scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --no-color` 進一步轉成 `awoooi_post_reboot_next_gate_owner_packets_v1` JSON,固定 `dispatch_authorized=0`、`request_sent_count=0`、`owner_response_accepted_count=0`、`host_write_authorized=0`、`secret_value_collection_allowed=0`、`runtime_gate_count=0`;`scripts/reboot-recovery/post-reboot-owner-packet-contract-guard.py --packet-file /tmp/awoooi-post-reboot-owner-packets.json` 依 live `next_required_gates` 動態鎖定 P0 gate、所有 `0 / false` 邊界、禁用 secret payload / runtime action 與 no-false-green 規則。新增 `scripts/reboot-recovery/post-reboot-owner-response-preflight.py --no-color` 作為 owner response 收件預檢:沒有 response file 必須是 `blocked_waiting_owner_response_file`;直接套用 `docs/templates/post-reboot-next-gate-owner-response.json` 必須是 `blocked_waiting_owner_response_content`;只有具備遮罩 evidence refs、完整 owner 欄位、Wazuh registry / Dashboard API 狀態、五個 credential escrow 非 secret evidence refs,且沒有 secret value / runtime action request 的 response 才能進入下一層 reviewer acceptance。DR 仍因 `escrow_missing=5` 不可宣稱 complete;Wazuh manager registry 仍是 service green 之外的獨立 blocker。ACME HTTP-01 route / certbot timer hygiene 已修復,但憑證正式 renew 成功需等 snap certbot timer 或獨立 ACME window readback。
|
||||
最新基準:2026-06-26 17:45 single-summary replay / route warmup classifier。`scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color` 回傳 `SERVICE_GREEN=1`、`PRODUCT_DATA_GREEN=1`、`BACKUP_CORE_GREEN=1`、`DR_ESCROW_BLOCKED=1`、`ESCROW_MISSING_COUNT=5`、`HOST_188_HYGIENE_BLOCKED=0`、`HOST_188_RESULT=HOST_188_HYGIENE_GREEN.`、`WAZUH_MANAGER_REGISTRY_ACCEPTED=0`、`WAZUH_COVERAGE_SCOPE=6`、`WAZUH_DIRECT_ACTIVE=2`、`WAZUH_NO_TRANSPORT=1`、`WAZUH_SSH_BLOCKED=3`、`WAZUH_ROUTE_CODE=200`、`WAZUH_TRANSPORT_COUNT=6`、`WAZUH_DASHBOARD_API_CONNECTION=pending_or_spinning`、`WAZUH_DASHBOARD_INDEX_OK=3`、`RUNTIME_ACTION_AUTHORIZED=0`、`OVERALL_DECLARATION=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`,並自動把同一份 key/value 寫到 `$ARTIFACT_DIR/summary.txt`。同一輪後續 `post-reboot-declaration-guard.py`、`post-reboot-next-gate-dispatch.sh`、`post-reboot-next-gate-owner-packets.py`、`post-reboot-owner-packet-contract-guard.py`、`post-reboot-owner-response-preflight.py` 必須使用這份 `summary.txt` 或由它產生的 dispatch / packet,不得混用多次 live probe 的不同時間點結果。`NEXT_REQUIRED_GATES=credential_escrow_evidence,wazuh_manager_registry_export` 仍是唯一目前 next gates;DR 仍因 `escrow_missing=5` 不可宣稱 complete;Wazuh manager registry accepted 仍是 `0`,不可把 route `200`、transport `6`、Dashboard index pattern `3` 當成 registry recovered。v1.15 另補 route warmup classifier:delegated cold-start 若只因 public route 單次 502 / TLS readback 暫時 blocked,但 wrapper route retry 已確認全部恢復,該 blocker 會降級為 evidence warning;非 route blocker 或 retry 後仍失敗仍為 hard blocked。
|
||||
|
||||
本頁只回答四件事:
|
||||
|
||||
@@ -49,7 +49,7 @@
|
||||
scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color
|
||||
```
|
||||
|
||||
此 summary 只做 read-only 檢查,會委派一頁式總檢查、188 host hygiene checklist 與 Wazuh repo-side no-false-green gates,並將 delegated logs 保留在 `/tmp/awoooi-post-reboot-readiness-*`。第一眼先看這些欄位:
|
||||
此 summary 只做 read-only 檢查,會委派一頁式總檢查、188 host hygiene checklist 與 Wazuh repo-side no-false-green gates,並將 delegated logs 與可重放的 `summary.txt` 保留在 `/tmp/awoooi-post-reboot-readiness-*`。第一眼先看這些欄位:
|
||||
|
||||
- `SERVICE_GREEN=1`:服務面可宣稱恢復。
|
||||
- `PRODUCT_DATA_GREEN=1`:MOMO / StockPlatform 主要資料 freshness 可宣稱恢復。
|
||||
@@ -67,6 +67,13 @@ summary 顯示 `SERVICE_GREEN=1` 後,先跑宣告 guard,確認本輪可以
|
||||
scripts/reboot-recovery/post-reboot-declaration-guard.py --no-color
|
||||
```
|
||||
|
||||
若已經有 `$ARTIFACT_DIR/summary.txt`,同一輪建議固定使用它,避免重跑 live probes 時被 CI / route warmup 瞬間狀態影響:
|
||||
|
||||
```bash
|
||||
SUMMARY_FILE=/tmp/awoooi-post-reboot-readiness-YYYYMMDD-HHMMSS/summary.txt
|
||||
scripts/reboot-recovery/post-reboot-declaration-guard.py --no-color --summary-file "$SUMMARY_FILE"
|
||||
```
|
||||
|
||||
這支 guard 只讀取 summary evidence,並輸出本輪允許與禁止宣稱的邊界。若要測試某個說法是否允許,可用 `--proposed`:
|
||||
|
||||
```bash
|
||||
@@ -81,6 +88,12 @@ summary 顯示 `SERVICE_GREEN=1` 但仍有 `NEXT_REQUIRED_GATES` 時,再跑下
|
||||
scripts/reboot-recovery/post-reboot-next-gate-dispatch.sh --no-color
|
||||
```
|
||||
|
||||
同一輪 evidence chain 應改用同一份 summary:
|
||||
|
||||
```bash
|
||||
scripts/reboot-recovery/post-reboot-next-gate-dispatch.sh --no-color --summary-file "$SUMMARY_FILE" | tee /tmp/awoooi-post-reboot-dispatch.txt
|
||||
```
|
||||
|
||||
這支腳本只把目前 live summary 內的 `NEXT_REQUIRED_GATES` 轉成 owner / required evidence / forbidden action / done criteria。2026-06-26 12:13 起通常只剩 `credential_escrow_evidence` 與 `wazuh_manager_registry_export`;若未來 188 又紅,才會重新出現 `host_188_hygiene_maintenance_window`。它不送 request、不讀 secret、不寫 marker、不 restart / reload / repair / import / delete / patch,也不授權 host / Wazuh / Nginx / K8s / DB runtime action。若它輸出 `SERVICE_GREEN=0`,先回到服務恢復,不進入 boundary dispatch。
|
||||
|
||||
若要交給 AI / 工單 / owner review 使用,產生機器可讀 owner packet:
|
||||
@@ -94,7 +107,7 @@ scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --no-color
|
||||
送入任何 owner review queue 前,必須先把 JSON 存成 artifact 並跑 contract guard:
|
||||
|
||||
```bash
|
||||
scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --no-color --output /tmp/awoooi-post-reboot-owner-packets.json
|
||||
scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --no-color --dispatch-file /tmp/awoooi-post-reboot-dispatch.txt --output /tmp/awoooi-post-reboot-owner-packets.json
|
||||
scripts/reboot-recovery/post-reboot-owner-packet-contract-guard.py --packet-file /tmp/awoooi-post-reboot-owner-packets.json
|
||||
```
|
||||
|
||||
@@ -104,7 +117,8 @@ guard 必須輸出 `POST_REBOOT_OWNER_PACKET_CONTRACT_GUARD_OK gates=<live_next_
|
||||
|
||||
```bash
|
||||
scripts/reboot-recovery/post-reboot-owner-response-preflight.py --no-color
|
||||
scripts/reboot-recovery/post-reboot-owner-response-preflight.py --no-color --response-file docs/templates/post-reboot-next-gate-owner-response.json
|
||||
scripts/reboot-recovery/post-reboot-owner-response-preflight.py --no-color --owner-packet-file /tmp/awoooi-post-reboot-owner-packets.json
|
||||
scripts/reboot-recovery/post-reboot-owner-response-preflight.py --no-color --owner-packet-file /tmp/awoooi-post-reboot-owner-packets.json --response-file docs/templates/post-reboot-next-gate-owner-response.json
|
||||
```
|
||||
|
||||
第一個命令必須輸出 `POST_REBOOT_OWNER_RESPONSE_PREFLIGHT_BLOCKED status=blocked_waiting_owner_response_file expected_gates=2 received=0 accepted=0 runtime_gate=0`。第二個命令必須輸出 `POST_REBOOT_OWNER_RESPONSE_PREFLIGHT_BLOCKED status=blocked_waiting_owner_response_content expected_gates=2 received=0 accepted=0 runtime_gate=0`,證明空模板不能被算成已收件或已接受。合格 response 只能包含脫敏 evidence refs、owner role / team / decision / reviewer / followup owner、五個 escrow item 的 non-secret evidence ref,以及 Wazuh manager registry / Dashboard API readback;不得包含密碼、token、secret value、hash、prefix/suffix、raw Wazuh payload、agent 原名、內網 IP、`client.keys`、active response、host write、agent re-enroll、Wazuh restart、Kali active scan 或 credential marker write。preflight 通過也只代表可進入獨立 reviewer acceptance,不代表 `DR_COMPLETE`、`WAZUH_REGISTRY_RECOVERED` 或任何 runtime action 授權。
|
||||
|
||||
@@ -15,7 +15,7 @@
|
||||
| P0 host / K3s recovery | DONE | 100% | 120 booted after console fsck at `2026-06-12 15:13`; latest 2026-06-26 07:19 readback shows 120 and 121 reachable, K3s active, `mon` and `mon1` both `Ready control-plane`, AWOOOI API/Web replicas split across both nodes, ArgoCD `awoooi-prod Synced / Healthy` at revision `1fd5e2a8b0f18d24eed16aa2a44286bcbf230603`, and `km-vectorize` official 03:00 台北時間 run succeeded with `lastSuccess=2026-06-25T19:00:14Z`. |
|
||||
| P1 backup / alert / escrow | BLOCKED_DR_ESCROW | 97% | 2026-06-26 06:58 backup readback shows 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, `integrity_stale=0`, `offsite_fresh=1`, `rclone_gdrive_fresh=1`, `escrow_missing=5`, last aggregate `2026-06-26 02:31:02`。DR remains blocked on real non-secret credential escrow evidence IDs; do not write placeholder markers or paste secret values. |
|
||||
| P2 service / data truth | DONE | 100% | Service routes and core runtime are available, 110 current CPU pressure is attributable to active AWOOOI Web `turbo build` / Docker buildx, and previous orphan Chrome groups remain cleared. 2026-06-26 07:19 StockPlatform `/api/v1/system/freshness` returned `200`; 07:01 freshness payload was `status=ok`, `latest_trading_date=2026-06-25`, blockers `[]`; price / chips / margin / AI recommendations are all on `2026-06-25`. `ai.recommendations` row count is `2868`; `core.margin_short_daily` row count is `1976`. MOMO health `V10.699`, current-month parity `15383|15383|2026-06-01|2026-06-24|2026-06-01|2026-06-24`, and `MOMO_DAILY_FRESHNESS 1|2026-06-24` are green; expanded public routes are green. |
|
||||
| P3 docs / automation contracts | DONE_WITH_OWNER_RESPONSE_PREFLIGHT_V174 | 100% | Workplan, SOP v1.74, post-reboot declaration guard, machine-readable post-reboot readiness summary with Wazuh registry detail fields, post-reboot next-gate dispatch checklist, owner-packet JSON generator, dynamic owner-packet contract guard, post-reboot owner response preflight, owner response placeholder template, one-page post-start quick check v1.14, route retry gate, deploy warmup classification, expanded public route list, StockPlatform freshness gate, StockPlatform cron-source recovery evidence, StockPlatform natural schedule green evidence, 110 orphan Chrome recurrence cleanup evidence, 188 fail-closed startup data recovery gate, 188 host hygiene read-only checklist, 188 PostgreSQL runtime-ready source-of-truth, 188 ACME route/timer hygiene, baseline `stockplatform_system_freshness_ok`, BACKUP-STATUS, LOGBOOK, 120 console/fsck recovery, Gitea backup stale-dump hardening, reboot ledger/version-comparison SOP, escrow evidence audit, 188 nginx Ansible baseline, 110 cold-start detector script, startup judgment layers, GO/NO-GO tree, host recovery cards, explicit Plan B degraded-operation path, machine-readable `plan_b` baseline, readiness-audit Plan B guard, B0-B5 service levels, T+0/T+120 fallback timeline checks, host role / load-balancing assessment, CD `known_hosts` guardrail, `fwupd-refresh.timer` rollback note, K3s filesystem event blocker, AWOOOI backup no-direct-offsite-sync contract, 110/188 Ansible source-of-truth, Gitea self-hosted readiness validation workflow, post-CD no-regression readbacks, stale-vs-active K8s failed Job classification, 110 runaway browser / CI load AIOps exporter + alert + gated remediation PlayBook, Telegram / AI event packet mapping, healthy heartbeat Telegram suppression, MOMO scheduler / current-month detector fix, exporter restore helpers, 110 Docker disk pressure cleanup boundary, notification-noise readback, MOMO import-boundary / Drive-auth fail-closed deploys, product version/readback matrix, and stricter product-data / route retry gates are updated. Declaration guard now machine-checks allowed / forbidden recovery statements: service/data/backup/188 host hygiene green may be declared when live summary says so, while `DR_COMPLETE`、`WAZUH_REGISTRY_RECOVERED` and `RUNTIME_ACTION_AUTHORIZED` remain forbidden until evidence gates close. Owner response preflight blocks missing files, placeholder templates, secret payloads, credential marker writes, Wazuh active response / re-enroll / restart, host write, and Kali active scan before any evidence can be counted as received or accepted. Live 110 script sync remains a separate approved live-write gate; do not claim it here. |
|
||||
| P3 docs / automation contracts | DONE_WITH_SINGLE_SUMMARY_REPLAY_V175 | 100% | Workplan, SOP v1.75, post-reboot declaration guard, machine-readable post-reboot readiness summary with Wazuh registry detail fields and auto-persisted `summary.txt`, post-reboot next-gate dispatch checklist, owner-packet JSON generator, dynamic owner-packet contract guard, post-reboot owner response preflight, owner response placeholder template, one-page post-start quick check v1.15, route retry gate, delegated cold-start public-route warmup classifier, deploy warmup classification, expanded public route list, StockPlatform freshness gate, StockPlatform cron-source recovery evidence, StockPlatform natural schedule green evidence, 110 orphan Chrome recurrence cleanup evidence, 188 fail-closed startup data recovery gate, 188 host hygiene read-only checklist, 188 PostgreSQL runtime-ready source-of-truth, 188 ACME route/timer hygiene, baseline `stockplatform_system_freshness_ok`, BACKUP-STATUS, LOGBOOK, 120 console/fsck recovery, Gitea backup stale-dump hardening, reboot ledger/version-comparison SOP, escrow evidence audit, 188 nginx Ansible baseline, 110 cold-start detector script, startup judgment layers, GO/NO-GO tree, host recovery cards, explicit Plan B degraded-operation path, machine-readable `plan_b` baseline, readiness-audit Plan B guard, B0-B5 service levels, T+0/T+120 fallback timeline checks, host role / load-balancing assessment, CD `known_hosts` guardrail, `fwupd-refresh.timer` rollback note, K3s filesystem event blocker, AWOOOI backup no-direct-offsite-sync contract, 110/188 Ansible source-of-truth, Gitea self-hosted readiness validation workflow, post-CD no-regression readbacks, stale-vs-active K8s failed Job classification, 110 runaway browser / CI load AIOps exporter + alert + gated remediation PlayBook, Telegram / AI event packet mapping, healthy heartbeat Telegram suppression, MOMO scheduler / current-month detector fix, exporter restore helpers, 110 Docker disk pressure cleanup boundary, notification-noise readback, MOMO import-boundary / Drive-auth fail-closed deploys, product version/readback matrix, and stricter product-data / route retry gates are updated. Declaration guard now machine-checks allowed / forbidden recovery statements from the same `summary.txt`: service/data/backup/188 host hygiene green may be declared when live summary says so, while `DR_COMPLETE`、`WAZUH_REGISTRY_RECOVERED` and `RUNTIME_ACTION_AUTHORIZED` remain forbidden until evidence gates close. Owner response preflight blocks missing files, placeholder templates, secret payloads, credential marker writes, Wazuh active response / re-enroll / restart, host write, and Kali active scan before any evidence can be counted as received or accepted. Live 110 script sync remains a separate approved live-write gate; do not claim it here. |
|
||||
|
||||
2026-06-26 12:13 machine-readable summary baseline supersedes the 07:47 / 08:59 gate set: `scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color` stores delegated logs under `/tmp/awoooi-post-reboot-readiness-20260626-121303` and returns `SERVICE_GREEN=1`, `PRODUCT_DATA_GREEN=1`, `BACKUP_CORE_GREEN=1`, `DR_ESCROW_BLOCKED=1`, `ESCROW_MISSING_COUNT=5`, `HOST_188_SERVICE_GREEN=1`, `HOST_188_HYGIENE_BLOCKED=0`, `HOST_188_CHECK_RC=0`, `HOST_188_RESULT=HOST_188_HYGIENE_GREEN.`, `WAZUH_ROUTE_CODE=200`, `WAZUH_TRANSPORT_COUNT=6`, `WAZUH_COVERAGE_SCOPE=6`, `WAZUH_DIRECT_ACTIVE=2`, `WAZUH_NO_TRANSPORT=1`, `WAZUH_SSH_BLOCKED=3`, `WAZUH_DASHBOARD_API_CONNECTION=pending_or_spinning`, `WAZUH_DASHBOARD_INDEX_OK=3`, `WAZUH_MANAGER_REGISTRY_ACCEPTED=0`, `WAZUH_RUNTIME_GATE=0`, `RUNTIME_ACTION_AUTHORIZED=0`, `OVERALL_DECLARATION=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, and `NEXT_REQUIRED_GATES=credential_escrow_evidence,wazuh_manager_registry_export`. This is now the preferred first operator/AI-agent entrypoint after reboot because it separates service health from DR and security registry evidence; 188 host hygiene is no longer a next gate unless the live checklist regresses.
|
||||
|
||||
@@ -29,6 +29,8 @@
|
||||
|
||||
2026-06-26 13:01 owner response preflight baseline: `scripts/reboot-recovery/post-reboot-owner-response-preflight.py --no-color` validates future owner responses against the dynamic owner-packet gate set without sending requests, writing markers, reading secrets, or changing runtime. Missing response file must remain `blocked_waiting_owner_response_file`; the placeholder template `docs/templates/post-reboot-next-gate-owner-response.json` must remain `blocked_waiting_owner_response_content` with `received=0`, `accepted=0`, and `runtime_gate=0`. The only acceptable payload class is redacted owner evidence for credential escrow and Wazuh manager registry export; secret values, hash / prefix / suffix, raw Wazuh payload, agent real names, internal IPs, `client.keys`, credential marker write, host write, Wazuh active response / re-enroll / restart, and Kali active scan are rejected.
|
||||
|
||||
2026-06-26 17:45 single-summary replay baseline: `scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color` now writes the exact emitted key/value summary to `$ARTIFACT_DIR/summary.txt`; latest artifact `/tmp/awoooi-post-reboot-readiness-20260626-174451/summary.txt` returns `SERVICE_GREEN=1`, `PRODUCT_DATA_GREEN=1`, `BACKUP_CORE_GREEN=1`, `DR_ESCROW_BLOCKED=1`, `ESCROW_MISSING_COUNT=5`, `HOST_188_HYGIENE_BLOCKED=0`, `WAZUH_MANAGER_REGISTRY_ACCEPTED=0`, `RUNTIME_ACTION_AUTHORIZED=0`, `OVERALL_DECLARATION=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, and `NEXT_REQUIRED_GATES=credential_escrow_evidence,wazuh_manager_registry_export`. The same summary file drives declaration guard, next-gate dispatch, owner packet generation, contract guard, and owner response preflight. `post-start-quick-check.sh` now holds delegated cold-start blockers until wrapper route retry completes; route-only cold-start blockers that recover under wrapper retry are evidence warnings, while non-route blockers or unrecovered routes remain hard blockers.
|
||||
|
||||
2026-06-26 08:47 Wazuh registry detail summary baseline: post-reboot readiness summary now emits `WAZUH_COVERAGE_SCOPE`, `WAZUH_DIRECT_ACTIVE`, `WAZUH_NO_TRANSPORT`, `WAZUH_SSH_BLOCKED`, `WAZUH_DASHBOARD_API_CONNECTION`, and `WAZUH_DASHBOARD_INDEX_OK` alongside existing route / transport / registry fields. Current read-only truth is coverage scope `6`, direct active `2`, no transport `1`, SSH blocked `3`, route `200`, transport `6`, Dashboard API `pending_or_spinning`, index OK `3`, manager registry accepted `0`, runtime gate `0`. This is a security evidence blocker, not a reboot service blocker.
|
||||
|
||||
2026-06-26 12:13 declaration guard baseline: `scripts/reboot-recovery/post-reboot-declaration-guard.py --no-color` emits `schema_version=awoooi_post_reboot_declaration_guard_v1`, status `allowed_with_boundary_blockers`, allowed declarations including service / product data / backup / 188 host hygiene green for this evidence set, and forbidden declarations `DR_COMPLETE`、`WAZUH_REGISTRY_RECOVERED`、`RUNTIME_ACTION_AUTHORIZED`. Proposed false-green declarations are rejected before they can enter LOGBOOK / owner packets / external status updates.
|
||||
|
||||
Reference in New Issue
Block a user