ops(reboot): add post-reboot owner packet JSON [skip ci]

ogt
2026-06-26 08:32:18 +08:00
parent 229e7fc8cd
commit 02bcf0a31e
5 changed files with 269 additions and 5 deletions


@@ -45072,3 +45072,44 @@ production browser smoke:
- The 188 host hygiene maintenance window has still not been executed.
- Wazuh manager registry accepted remains `0`.
- Do not claim `DR_COMPLETE`, 188 host fully green, Wazuh registry recovered, runtime/security acceptance enabled, or owner request sent.
## 2026-06-26 — 08:29 post-reboot owner-packet JSON / SOP v1.69
**Time and source**
- 2026-06-26 08:29 Asia/Taipei.
- Source: added `scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --no-color`, which delegates to `scripts/reboot-recovery/post-reboot-next-gate-dispatch.sh --no-color` and `scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color`; summary artifact dir `/tmp/awoooi-post-reboot-readiness-20260626-082912`.
**Completed**
- Added an owner-packet JSON generator that converts the post-reboot next-gate dispatch into `awoooi_post_reboot_next_gate_owner_packets_v1`.
- The JSON contains three P0 `owner_packets`: `credential_escrow_evidence`, `host_188_hygiene_maintenance_window`, `wazuh_manager_registry_export`.
- `docs/runbooks/REBOOT-POST-START-QUICK-CHECK.md` bumped to v1.9, adding the owner-packet JSON intake step.
- `docs/runbooks/FULL-STACK-COLD-START-SOP.md` bumped to v1.69, stating that the JSON packet is an AI / operator / owner review intake, not a request sent.
- `docs/workplans/2026-06-04-reboot-cold-start-backup-recovery-workplan.md` updated to `DONE_WITH_OWNER_PACKET_JSON_V169`.
**Read-only verification results**
- `schema_version=awoooi_post_reboot_next_gate_owner_packets_v1`
- `next_gate_count=3`
- `p0_gate_count=3`
- `request_sent_count=0`
- `owner_response_received_count=0`
- `owner_response_accepted_count=0`
- `runtime_action_authorized_count=0`
- `service_green=1`
- `overall_declaration=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`
**Command types run**
- Read-only: post-reboot readiness summary, next-gate dispatch checklist, owner-packet JSON generation, source guard.
- Writes: repo script / docs-only.
- Not done: host / Docker / systemd / Nginx / firewall / K8s / DB / Wazuh runtime writes; no secret plaintext read; no owner request sent; no escrow marker written; no active response executed.
**Current assessment**
- Owner-packet JSON automation: `0% -> 100%`.
- Reboot service / data / backup readiness remains `GREEN`.
- Overall declaration remains `FULL_STACK_GREEN_DR_ESCROW_BLOCKED`.
- Runtime repair / owner request sent / credential marker write / Wazuh registry accepted: `0%`.
**Still blocked / must not be claimed**
- DR credential escrow evidence missing `5`.
- The 188 host hygiene maintenance window has still not been executed.
- Wazuh manager registry accepted remains `0`.
- Do not claim owner request sent, owner response received / accepted, runtime write approved, `DR_COMPLETE`, 188 host fully green, or Wazuh registry recovered.


@@ -1,6 +1,6 @@
# AWOOOI Full-Stack Cold-Start and Host Reboot SOP
> Version: v1.68
> Version: v1.69
> Last updated: 2026-06-26 Asia/Taipei
> Scope: 110 / 120 / 121 / 188 full-stack reboot recovery. 112 Kali is recorded as P3 optional and is not part of this recovery path.
@@ -10,12 +10,14 @@
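This section is the first decision anchor after every handover, power-on, shutdown, or reboot. If the date is not today, rerun the live check first, then update this section and `docs/workplans/2026-06-04-reboot-cold-start-backup-recovery-workplan.md`.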
If you only need to judge quickly after a reboot whether recovery can be claimed, first run the machine-readable summary: `scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color`. This script calls the one-page overall check, the 188 host hygiene checklist, and the Wazuh no-false-green repo gates, and leaves the delegated logs in `/tmp/awoooi-post-reboot-readiness-*`. If the summary shows `SERVICE_GREEN=1` and `NEXT_REQUIRED_GATES` is still non-empty, next run `scripts/reboot-recovery/post-reboot-next-gate-dispatch.sh --no-color` to turn the three blockers (DR escrow, 188 hygiene, Wazuh registry) into an owner / evidence / forbidden-action dispatch checklist; this dispatch still pins `DISPATCH_AUTHORIZED=0`, `REQUEST_SENT_COUNT=0`, `HOST_WRITE_AUTHORIZED=0`, `SECRET_VALUE_COLLECTION_ALLOWED=0`. When manual expansion is needed, run `scripts/reboot-recovery/post-start-quick-check.sh --no-color` and use `docs/runbooks/REBOOT-POST-START-QUICK-CHECK.md` as the fallback. The long SOP keeps the full background, exception handling, and Plan B; the short wrapper / checklist handles the fixed determination within T+10 minutes each time.
If you only need to judge quickly after a reboot whether recovery can be claimed, first run the machine-readable summary: `scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color`. This script calls the one-page overall check, the 188 host hygiene checklist, and the Wazuh no-false-green repo gates, and leaves the delegated logs in `/tmp/awoooi-post-reboot-readiness-*`. If the summary shows `SERVICE_GREEN=1` and `NEXT_REQUIRED_GATES` is still non-empty, next run `scripts/reboot-recovery/post-reboot-next-gate-dispatch.sh --no-color` to turn the three blockers (DR escrow, 188 hygiene, Wazuh registry) into an owner / evidence / forbidden-action dispatch checklist; when a machine-readable intake is needed, then run `scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --no-color` to produce the `awoooi_post_reboot_next_gate_owner_packets_v1` JSON. Both dispatch and packet pin `DISPATCH_AUTHORIZED=0`, `REQUEST_SENT_COUNT=0`, `OWNER_RESPONSE_ACCEPTED=0`, `HOST_WRITE_AUTHORIZED=0`, `SECRET_VALUE_COLLECTION_ALLOWED=0`. When manual expansion is needed, run `scripts/reboot-recovery/post-start-quick-check.sh --no-color` and use `docs/runbooks/REBOOT-POST-START-QUICK-CHECK.md` as the fallback. The long SOP keeps the full background, exception handling, and Plan B; the short wrapper / checklist handles the fixed determination within T+10 minutes each time.
2026-06-26 07:47 machine-readable readiness summary: `scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color` has been verified as usable; artifact dir `/tmp/awoooi-post-reboot-readiness-20260626-074702`. The summary outputs `POST_START_RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, `POST_START_PASS=38`, `POST_START_WARN=3`, `POST_START_BLOCKED=0`, `SERVICE_GREEN=1`, `PRODUCT_DATA_GREEN=1`, `BACKUP_CORE_GREEN=1`, `DR_ESCROW_BLOCKED=1`, `ESCROW_MISSING_COUNT=5`, `HOST_188_SERVICE_GREEN=1`, `HOST_188_HYGIENE_BLOCKED=1`, `WAZUH_ROUTE_CODE=200`, `WAZUH_TRANSPORT_COUNT=6`, `WAZUH_MANAGER_REGISTRY_ACCEPTED=0`, `WAZUH_RUNTIME_GATE=0`, `RUNTIME_ACTION_AUTHORIZED=0`. Currently `OVERALL_DECLARATION=FULL_STACK_GREEN_DR_ESCROW_BLOCKED` and `NEXT_REQUIRED_GATES=credential_escrow_evidence,host_188_hygiene_maintenance_window,wazuh_manager_registry_export`. This is the first-layer operator / AI agent decision format after every reboot.
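The decision rule described above (service green first, then open gates) can be sketched in a few lines. This is a minimal illustration, not the repo's implementation: it assumes the summary prints one `KEY=VALUE` pair per line, and the function names and returned step labels (`recover_services_first`, `run_next_gate_dispatch`, `no_open_gates`) are hypothetical.

```python
from __future__ import annotations


def parse_summary(text: str) -> dict[str, str]:
    """Collect KEY=VALUE lines; lines without '=' are ignored."""
    result: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        result[key.strip()] = value.strip()
    return result


def next_step(summary: dict[str, str]) -> str:
    """Hypothetical step labels; only the ordering of checks follows the SOP."""
    gates = [
        gate.strip()
        for gate in summary.get("NEXT_REQUIRED_GATES", "").split(",")
        if gate.strip()
    ]
    if summary.get("SERVICE_GREEN") != "1":
        return "recover_services_first"
    if gates:
        return "run_next_gate_dispatch"
    return "no_open_gates"


# Assumed layout: values copied from the 07:47 baseline, one pair per line.
SAMPLE = """\
SERVICE_GREEN=1
DR_ESCROW_BLOCKED=1
ESCROW_MISSING_COUNT=5
OVERALL_DECLARATION=FULL_STACK_GREEN_DR_ESCROW_BLOCKED
NEXT_REQUIRED_GATES=credential_escrow_evidence,host_188_hygiene_maintenance_window,wazuh_manager_registry_export
"""

if __name__ == "__main__":
    print(next_step(parse_summary(SAMPLE)))
```

The sketch only reads text; like the summary script it describes, it authorizes nothing and sends nothing.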
2026-06-26 08:12 next-gate dispatch baseline: `scripts/reboot-recovery/post-reboot-next-gate-dispatch.sh --no-color` has been verified against the latest summary live output. The script reads back `SERVICE_GREEN=1`, `OVERALL_DECLARATION=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, `NEXT_REQUIRED_GATES=credential_escrow_evidence,host_188_hygiene_maintenance_window,wazuh_manager_registry_export`, and outputs three P0 checklists. First, credential escrow non-secret evidence: requires evidence id / owner / reviewer for the five escrow items and forbids passwords, tokens, hashes, and prefix/suffix. Second, the 188 host PostgreSQL / certbot hygiene maintenance window: requires DB / DNS-TLS / rollback / postcheck owner decisions and forbids unapproved actions such as `pg_resetwal`, certbot renew, Nginx reload, DB restore, and Docker restart. Third, the Wazuh manager registry redacted export: requires redacted registry count, host alias status, dashboard API/version status, time window, and reviewer, and forbids agent real names, internal IPs, client.keys, raw payloads, active response, agent re-enroll, Wazuh restart, secret patch, host write, and Kali active scan. The output pins `NEXT_GATE_COUNT=3`, `NEXT_STEP=dispatch_owner_packets_manually_after_review`, `RUNTIME_ACTION_AUTHORIZED=0`; this is a dispatch checklist, not a request sent or a runtime approval.
2026-06-26 08:29 owner-packet JSON baseline: `scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --no-color` converts the dispatch output into `schema_version=awoooi_post_reboot_next_gate_owner_packets_v1`, containing three `owner_packets`, with `next_gate_count=3`, `p0_gate_count=3`, `request_sent_count=0`, `owner_response_received_count=0`, `owner_response_accepted_count=0`, `runtime_action_authorized_count=0`. This JSON is an AI / operator / owner review intake; it is neither an external request nor a maintenance-window approval.
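To make the dispatch-to-packet conversion concrete, here is a small worked sketch of the grouping rule: `KEY=VALUE` lines before the first `GATE_ID` are treated as summary fields, and every later line attaches to the most recent gate. The sample text is an assumption (the real dispatch output layout is not reproduced in this commit view), and `group_gates` is a hypothetical helper name.

```python
from __future__ import annotations

# Assumed dispatch layout for illustration only.
SAMPLE_DISPATCH = """\
SERVICE_GREEN=1
NEXT_REQUIRED_GATES=credential_escrow_evidence,wazuh_manager_registry_export
GATE_ID=credential_escrow_evidence
GATE_PRIORITY=P0
FORBIDDEN_PAYLOADS=password,token,hash
GATE_ID=wazuh_manager_registry_export
GATE_PRIORITY=P0
"""


def group_gates(text: str) -> tuple[dict[str, str], list[dict[str, str]]]:
    """Split KEY=VALUE lines into summary fields and per-gate records."""
    summary: dict[str, str] = {}
    gates: list[dict[str, str]] = []
    current: dict[str, str] | None = None
    for raw in text.splitlines():
        line = raw.strip()
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "GATE_ID":
            # A new GATE_ID starts a new record.
            current = {"gate_id": value}
            gates.append(current)
            continue
        target = summary if current is None else current
        target[key.lower()] = value
    return summary, gates


if __name__ == "__main__":
    summary, gates = group_gates(SAMPLE_DISPATCH)
    print(summary["service_green"], [gate["gate_id"] for gate in gates])
```

One consequence of this rule is that any summary-style line printed after the last gate is attached to that gate, so gate blocks are best emitted last.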
2026-06-26 07:39 live quick-check refresh: `scripts/reboot-recovery/post-start-quick-check.sh --no-color` ran to completion; ping / SSH to all four hosts OK; delegated cold-start was `PASS=89 WARN=0 BLOCKED=0`; the wrapper summary was `POST_START_QUICK_CHECK PASS=38 WARN=3 BLOCKED=0`, warning split `SERVICE=0 BOUNDARY=1 EVIDENCE=2`, `RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`. MOMO health `V10.701`; daily snapshot `109061` rows / `2025-07-01..2026-06-24`; current-month parity `15383|15383|2026-06-01|2026-06-24|2026-06-01|2026-06-24`; latest import job `57 completed`. StockPlatform freshness `status=ok`, latest trading date `2026-06-25`; price / chips / margin / AI recommendations are all at `2026-06-25`. Backup-status at 07:39 shows 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, offsite/rclone fresh, `last_backup_all=2026-06-26 02:31:02`, `escrow_missing=5`. The extended public routes list all returned the expected 2xx/3xx. 110 CPU attribution shows load around `5.19 / 4.66 / 4.91` with CPU idle at `80%+` in most samples; the current load comes from normal platform work such as Gitea / ClickHouse / Docker / Kafka / StockPlatform / AWOOOI API / Sentry, not orphan Chrome. Allowed declaration for this round: hosts, K3s, services, websites, product data freshness, backup core, and offsite freshness are green. Forbidden declaration: DR complete, credential escrow complete, 188 host fully green, Wazuh registry recovered.
2026-06-26 07:19 follow-up: `gitea/main` already contains the previous round's SOP docs commit `1fd5e2a8`; ArgoCD `awoooi-prod` reads back `Synced / Healthy`, revision `1fd5e2a8b0f18d24eed16aa2a44286bcbf230603`, API `2/2`, Web `2/2`, Worker `1/1`, pods `restart=0`. Rerunning the full cold-start still gives `PASS=87 WARN=0 BLOCKED=0`, result `GREEN`. Direct public route readback: AWOOOI API `200`, AWOOOI Web `307`, VibeWork `200`, AwoooGo `200`, MOMO health `200`, Stock freshness `200`, Bitan `200`, Gitea `200`, Harbor `200`, Registry `/v2/` expected `401`, Sentry expected `302`, SigNoz `200`, Langfuse `200`. Precise classification of the 188 blockers: `pg_lsclusters` shows host PostgreSQL `14/main` down; `systemctl status postgresql@14-main` shows `invalid primary checkpoint record` and `PANIC: could not locate a valid checkpoint record`; `certbot.service` shows the `sentry.wooo.work` renew is rate-limited; `snap.certbot.renew.service` shows challenge failed; `awoooi-startup.service` previously tried to run `pg_resetwal` as root and failed. This round does not run `pg_resetwal`, does not `reset-failed`, and does not restart services; 188 must be handled with a dedicated maintenance window, rollback owner, and restore/source-of-truth plan, see `docs/runbooks/HOST-188-HYGIENE-MAINTENANCE-RUNBOOK.md`, and `scripts/reboot-recovery/188-host-hygiene-maintenance-checklist.sh --no-color` can be run first to get a read-only preflight. 110 load has dropped to around `4.83 / 4.82 / 5.52`; top CPU is the active AWOOOI Web `turbo build` / Docker buildx; swap is still full but memory available is about `41Gi`, so swap is not cleared manually this round. The overall declaration is still `FULL_STACK_GREEN_DR_ESCROW_BLOCKED`.


@@ -1,6 +1,6 @@
# One-Page Post-Reboot Host Check
> Version: v1.8
> Version: v1.9
> Last updated: 2026-06-26 Asia/Taipei
> Scope: 110 / 120 / 121 / 188 post-reboot service recovery. 112 Kali / Wazuh / active scan are not part of this procedure.
@@ -10,7 +10,7 @@
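Whenever any of the 110 / 120 / 121 / 188 hosts is powered on, shut down, rebooted, recovered from a power loss, put through a VMware console fsck, or sees a large Docker / K3s reschedule, run this page first and only then decide whether to claim recovery.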
Latest baseline: 2026-06-26 08:12 next-gate dispatch. `scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color` returns `SERVICE_GREEN=1`, `PRODUCT_DATA_GREEN=1`, `BACKUP_CORE_GREEN=1`, `DR_ESCROW_BLOCKED=1`, `ESCROW_MISSING_COUNT=5`, `HOST_188_HYGIENE_BLOCKED=1`, `WAZUH_MANAGER_REGISTRY_ACCEPTED=0`, `RUNTIME_ACTION_AUTHORIZED=0`, `OVERALL_DECLARATION=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`. Next, `scripts/reboot-recovery/post-reboot-next-gate-dispatch.sh --no-color` expands `NEXT_REQUIRED_GATES=credential_escrow_evidence,host_188_hygiene_maintenance_window,wazuh_manager_registry_export` into three owner / evidence / forbidden-action checklists and pins `DISPATCH_AUTHORIZED=0`, `REQUEST_SENT_COUNT=0`, `HOST_WRITE_AUTHORIZED=0`, `SECRET_VALUE_COLLECTION_ALLOWED=0`. Cold-start `PASS=89 WARN=0 BLOCKED=0`; MOMO `V10.701`, latest import job `57 completed`, `DB_DAILY_FRESHNESS 1|2026-06-24`; StockPlatform `/api/v1/system/freshness` `status=ok`, `latest_trading_date=2026-06-25`, blockers `[]`; backup-status 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, `offsite_fresh=1`, `rclone_gdrive_fresh=1`, `last_backup_all=2026-06-26 02:31:02`. DR still cannot be claimed complete because of `escrow_missing=5`. 188 host hygiene and the Wazuh manager registry remain independent blockers outside service green.
Latest baseline: 2026-06-26 08:29 next-gate owner packets. `scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color` returns `SERVICE_GREEN=1`, `PRODUCT_DATA_GREEN=1`, `BACKUP_CORE_GREEN=1`, `DR_ESCROW_BLOCKED=1`, `ESCROW_MISSING_COUNT=5`, `HOST_188_HYGIENE_BLOCKED=1`, `WAZUH_MANAGER_REGISTRY_ACCEPTED=0`, `RUNTIME_ACTION_AUTHORIZED=0`, `OVERALL_DECLARATION=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`. Next, `scripts/reboot-recovery/post-reboot-next-gate-dispatch.sh --no-color` expands `NEXT_REQUIRED_GATES=credential_escrow_evidence,host_188_hygiene_maintenance_window,wazuh_manager_registry_export` into three owner / evidence / forbidden-action checklists; `scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --no-color` then converts them into the `awoooi_post_reboot_next_gate_owner_packets_v1` JSON, pinning `dispatch_authorized=0`, `request_sent_count=0`, `owner_response_accepted_count=0`, `host_write_authorized=0`, `secret_value_collection_allowed=0`, `runtime_gate_count=0`. Cold-start `PASS=89 WARN=0 BLOCKED=0`; MOMO `V10.701`, latest import job `57 completed`, `DB_DAILY_FRESHNESS 1|2026-06-24`; StockPlatform `/api/v1/system/freshness` `status=ok`, `latest_trading_date=2026-06-25`, blockers `[]`; backup-status 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, `offsite_fresh=1`, `rclone_gdrive_fresh=1`, `last_backup_all=2026-06-26 02:31:02`. DR still cannot be claimed complete because of `escrow_missing=5`. 188 host hygiene and the Wazuh manager registry remain independent blockers outside service green.
This page answers only four things:
@@ -68,6 +68,14 @@ scripts/reboot-recovery/post-reboot-next-gate-dispatch.sh --no-color
This script only turns `credential_escrow_evidence`, `host_188_hygiene_maintenance_window`, and `wazuh_manager_registry_export` into owner / required evidence / forbidden action / done criteria. It does not send requests, read secrets, write markers, or restart / reload / repair / import / delete / patch anything, and it does not authorize any host / Wazuh / Nginx / K8s / DB runtime action. If it outputs `SERVICE_GREEN=0`, go back to service recovery first and do not enter boundary dispatch.
To hand the result to an AI / ticket / owner review, generate the machine-readable owner packet:
```bash
scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --no-color
```
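The output JSON may only be used as an intake / review packet; it is not a request sent. You must see `request_sent_count=0`, `owner_response_accepted_count=0`, `runtime_action_authorized_count=0`; otherwise treat the packet as failing.
When details need to be expanded, use the repo-side wrapper: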
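The acceptance rule above can be checked mechanically. The following is a minimal sketch, not part of the repo: it assumes the three counters live under the `counts` object of the packet (as the generator in this commit emits them), and `packet_violations` is a hypothetical helper name.

```python
from __future__ import annotations

import json
from typing import Any

# Counters that must be exactly integer 0 for the packet to qualify.
REQUIRED_ZERO_COUNTS = (
    "request_sent_count",
    "owner_response_accepted_count",
    "runtime_action_authorized_count",
)


def packet_violations(packet: dict[str, Any]) -> list[str]:
    """Return one message per required counter that is missing or not integer 0."""
    counts = packet.get("counts")
    if not isinstance(counts, dict):
        return ["counts section missing"]
    problems: list[str] = []
    for key in REQUIRED_ZERO_COUNTS:
        value = counts.get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_zero = isinstance(value, int) and not isinstance(value, bool) and value == 0
        if not is_zero:
            problems.append(f"{key}={value!r}")
    return problems


if __name__ == "__main__":
    sample = json.loads(
        '{"counts": {"request_sent_count": 0,'
        ' "owner_response_accepted_count": 0,'
        ' "runtime_action_authorized_count": 0}}'
    )
    print(packet_violations(sample))
```

An empty list means the packet passes this particular check; it says nothing about whether any gate is resolved.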
```bash


@@ -15,12 +15,14 @@
| P0 host / K3s recovery | DONE | 100% | 120 booted after console fsck at `2026-06-12 15:13`; latest 2026-06-26 07:19 readback shows 120 and 121 reachable, K3s active, `mon` and `mon1` both `Ready control-plane`, AWOOOI API/Web replicas split across both nodes, ArgoCD `awoooi-prod Synced / Healthy` at revision `1fd5e2a8b0f18d24eed16aa2a44286bcbf230603`, and `km-vectorize` official 03:00 Taipei time run succeeded with `lastSuccess=2026-06-25T19:00:14Z`. |
| P1 backup / alert / escrow | BLOCKED_DR_ESCROW | 97% | 2026-06-26 06:58 backup readback shows 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, `integrity_stale=0`, `offsite_fresh=1`, `rclone_gdrive_fresh=1`, `escrow_missing=5`, last aggregate `2026-06-26 02:31:02`. DR remains blocked on real non-secret credential escrow evidence IDs; do not write placeholder markers or paste secret values. |
| P2 service / data truth | DONE | 100% | Service routes and core runtime are available, 110 current CPU pressure is attributable to active AWOOOI Web `turbo build` / Docker buildx, and previous orphan Chrome groups remain cleared. 2026-06-26 07:19 StockPlatform `/api/v1/system/freshness` returned `200`; 07:01 freshness payload was `status=ok`, `latest_trading_date=2026-06-25`, blockers `[]`; price / chips / margin / AI recommendations are all on `2026-06-25`. `ai.recommendations` row count is `2868`; `core.margin_short_daily` row count is `1976`. MOMO health `V10.699`, current-month parity `15383|15383|2026-06-01|2026-06-24|2026-06-01|2026-06-24`, and `MOMO_DAILY_FRESHNESS 1|2026-06-24` are green; expanded public routes are green. |
| P3 docs / automation contracts | DONE_WITH_NEXT_GATE_DISPATCH_V168 | 100% | Workplan, SOP v1.68, machine-readable post-reboot readiness summary, post-reboot next-gate dispatch checklist, one-page post-start quick check v1.8, route retry gate, deploy warmup classification, expanded public route list, StockPlatform freshness gate, StockPlatform cron-source recovery evidence, StockPlatform natural schedule green evidence, 110 orphan Chrome recurrence cleanup evidence, 188 fail-closed startup data recovery gate, 188 host hygiene read-only checklist, baseline `stockplatform_system_freshness_ok`, BACKUP-STATUS, LOGBOOK, 120 console/fsck recovery, Gitea backup stale-dump hardening, reboot ledger/version-comparison SOP, escrow evidence audit, 188 nginx Ansible baseline, 110 cold-start detector script, startup judgment layers, GO/NO-GO tree, host recovery cards, explicit Plan B degraded-operation path, machine-readable `plan_b` baseline, readiness-audit Plan B guard, B0-B5 service levels, T+0/T+120 fallback timeline checks, host role / load-balancing assessment, CD `known_hosts` guardrail, `fwupd-refresh.timer` rollback note, K3s filesystem event blocker, AWOOOI backup no-direct-offsite-sync contract, 110/188 Ansible source-of-truth, Gitea self-hosted readiness validation workflow, post-CD no-regression readbacks, stale-vs-active K8s failed Job classification, 110 runaway browser / CI load AIOps exporter + alert + gated remediation PlayBook, Telegram / AI event packet mapping, healthy heartbeat Telegram suppression, MOMO scheduler / current-month detector fix, exporter restore helpers, 110 Docker disk pressure cleanup boundary, notification-noise readback, MOMO import-boundary / Drive-auth fail-closed deploys, product version/readback matrix, and stricter product-data / route retry gates are updated. 
Next-gate dispatch turns `credential_escrow_evidence`, `host_188_hygiene_maintenance_window`, and `wazuh_manager_registry_export` into owner / evidence / forbidden-action checklists while keeping request sent / host write / secret collection / runtime action at `0`. Live 110 script sync remains a separate approved live-write gate; do not claim it here. |
| P3 docs / automation contracts | DONE_WITH_OWNER_PACKET_JSON_V169 | 100% | Workplan, SOP v1.69, machine-readable post-reboot readiness summary, post-reboot next-gate dispatch checklist, owner-packet JSON generator, one-page post-start quick check v1.9, route retry gate, deploy warmup classification, expanded public route list, StockPlatform freshness gate, StockPlatform cron-source recovery evidence, StockPlatform natural schedule green evidence, 110 orphan Chrome recurrence cleanup evidence, 188 fail-closed startup data recovery gate, 188 host hygiene read-only checklist, baseline `stockplatform_system_freshness_ok`, BACKUP-STATUS, LOGBOOK, 120 console/fsck recovery, Gitea backup stale-dump hardening, reboot ledger/version-comparison SOP, escrow evidence audit, 188 nginx Ansible baseline, 110 cold-start detector script, startup judgment layers, GO/NO-GO tree, host recovery cards, explicit Plan B degraded-operation path, machine-readable `plan_b` baseline, readiness-audit Plan B guard, B0-B5 service levels, T+0/T+120 fallback timeline checks, host role / load-balancing assessment, CD `known_hosts` guardrail, `fwupd-refresh.timer` rollback note, K3s filesystem event blocker, AWOOOI backup no-direct-offsite-sync contract, 110/188 Ansible source-of-truth, Gitea self-hosted readiness validation workflow, post-CD no-regression readbacks, stale-vs-active K8s failed Job classification, 110 runaway browser / CI load AIOps exporter + alert + gated remediation PlayBook, Telegram / AI event packet mapping, healthy heartbeat Telegram suppression, MOMO scheduler / current-month detector fix, exporter restore helpers, 110 Docker disk pressure cleanup boundary, notification-noise readback, MOMO import-boundary / Drive-auth fail-closed deploys, product version/readback matrix, and stricter product-data / route retry gates are updated. 
Owner-packet JSON turns `credential_escrow_evidence`, `host_188_hygiene_maintenance_window`, and `wazuh_manager_registry_export` into structured review packets while keeping request sent / owner accepted / host write / secret collection / runtime action at `0`. Live 110 script sync remains a separate approved live-write gate; do not claim it here. |
2026-06-26 07:47 machine-readable summary baseline: `scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color` stores delegated logs under `/tmp/awoooi-post-reboot-readiness-20260626-074702` and returns `SERVICE_GREEN=1`, `PRODUCT_DATA_GREEN=1`, `BACKUP_CORE_GREEN=1`, `DR_ESCROW_BLOCKED=1`, `ESCROW_MISSING_COUNT=5`, `HOST_188_SERVICE_GREEN=1`, `HOST_188_HYGIENE_BLOCKED=1`, `WAZUH_ROUTE_CODE=200`, `WAZUH_TRANSPORT_COUNT=6`, `WAZUH_MANAGER_REGISTRY_ACCEPTED=0`, `WAZUH_RUNTIME_GATE=0`, `RUNTIME_ACTION_AUTHORIZED=0`, `OVERALL_DECLARATION=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, and `NEXT_REQUIRED_GATES=credential_escrow_evidence,host_188_hygiene_maintenance_window,wazuh_manager_registry_export`. This is now the preferred first operator/AI-agent entrypoint after reboot because it separates service health from DR, host hygiene, and security registry evidence.
2026-06-26 08:12 next-gate dispatch baseline: `scripts/reboot-recovery/post-reboot-next-gate-dispatch.sh --no-color` reads the same summary path and emits three explicit P0 dispatch checklists without sending requests or changing runtime. `credential_escrow_evidence` requires non-secret evidence id / owner / reviewer for five escrow items and rejects password / token / secret value / hash / prefix / suffix / raw credential payloads. `host_188_hygiene_maintenance_window` requires PostgreSQL `14/main` decision, DNS / TLS / certbot path, startup unit source-of-truth, rollback owner, postcheck owner, and blocks unapproved `pg_resetwal` / certbot renew / Nginx reload / DB restore / Docker restart / host file writes. `wazuh_manager_registry_export` requires redacted registry counts, per-host alias status, dashboard API / version status, time window, and reviewer while blocking raw agent names, internal IPs, client keys, Wazuh payloads, active response, re-enroll, restart, secret patch, host write, and Kali active scan. Output fixed `NEXT_GATE_COUNT=3`, `REQUEST_SENT_COUNT=0`, `DISPATCH_AUTHORIZED=0`, `HOST_WRITE_AUTHORIZED=0`, `SECRET_VALUE_COLLECTION_ALLOWED=0`, `RUNTIME_ACTION_AUTHORIZED=0`.
2026-06-26 08:29 owner-packet JSON baseline: `scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --no-color` emits `schema_version=awoooi_post_reboot_next_gate_owner_packets_v1` with `next_gate_count=3`, `p0_gate_count=3`, `request_sent_count=0`, `owner_response_received_count=0`, `owner_response_accepted_count=0`, `runtime_action_authorized_count=0`. This packet is for AI / operator / owner review intake only; it does not send request, write credential marker, read secret, or authorize runtime action.
2026-06-26 07:39 live quick-check refresh supersedes the 07:19 row for current operator status. `scripts/reboot-recovery/post-start-quick-check.sh --no-color` returned `POST_START_QUICK_CHECK PASS=38 WARN=3 BLOCKED=0`, warning split `SERVICE=0 BOUNDARY=1 EVIDENCE=2`, result `FULL_STACK_GREEN_DR_ESCROW_BLOCKED`. Delegated cold-start returned `PASS=89 WARN=0 BLOCKED=0`; four reboot-scope hosts ping/SSH were OK; AWOOOI / VibeWork / AwoooGo / 2026FIFA / Agent Bounty / MOMO / Stock / Bitan / TsenYang / VTuber / Gitea / Harbor / Registry / Sentry / SigNoz / Langfuse / AIOps routes returned expected 2xx/3xx. MOMO `V10.701` has job `57 completed`, daily freshness `1|2026-06-24`, and current-month parity `15383|15383|2026-06-01|2026-06-24|2026-06-01|2026-06-24`. StockPlatform freshness is `ok` through `2026-06-25` with price / chips / margin / AI recommendations current. Backup core remains green: 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, offsite/rclone fresh, `last_backup_all=2026-06-26 02:31:02`; DR still has `escrow_missing=5`. 110 load around `5.19 / 4.66 / 4.91` is attributable to normal platform processes, not orphan Chrome. 188 host hygiene remains blocked by failed host PostgreSQL / certbot / startup units and must use the dedicated maintenance runbook and read-only checklist.
2026-06-25 19:06 post-CD wrapper readback supersedes the 18:53 wording: consecutive main pushes created a deploy storm where older deploy markers were superseded by later commits. Latest production truth is deploy marker `d8ca8224 chore(cd): deploy 9dbe044 [skip ci]`, ArgoCD `Synced / Healthy`, API/Web/Worker image tag `9dbe044ea1e8e3894ccbeb5ed760bb124b87f7be`, direct route smoke 200 for AWOOOI API / IwoooS / VibeWork / AwoooGo / MOMO health / Stock / Bitan and expected route-gate statuses for MOMO / Gitea / Harbor / Registry / Sentry / SigNoz / Langfuse / AIOps, and wrapper `POST_START_QUICK_CHECK PASS=18 WARN=3 BLOCKED=0`. Repo-side cold-start returns `PASS=89 WARN=0 BLOCKED=0`; `/backup/scripts/backup-status.sh --no-notify --no-refresh` reports 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, `integrity_stale=0`, `offsite_fresh=1`, `rclone_gdrive_fresh=1`, `escrow_missing=5`; MOMO dedicated preflight returns `PASS=19 WARN=2 BLOCKED=0`; MOMO health is `V10.690`; AwoooGo / Stock transient 502 reads cleared after upstream warmup and five consecutive route reads returned `200`; 110 load is around `14.51 / 12.34 / 11.42`, with Gitea Actions cache save / `zstdmt` / `tar`, StockPlatform headless Chrome smoke / CI, Gitea, AWOOOI API, ClickHouse, Docker, and platform services visible, not an AWOOOI service blocker. Wrapper result is `FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, not `DEGRADED`, because service warnings are `0` and only DR boundary / evidence warnings remain. Wazuh route readback is now `200 disabled_waiting_iwooos_wazuh_owner_gate`, but manager registry accepted remains `0`, so Wazuh is a security registry evidence blocker rather than a reboot service blocker.


@@ -0,0 +1,211 @@
#!/usr/bin/env python3
"""Build machine-readable owner packets from post-reboot next-gate dispatch.

Read-only by design. This script may run the read-only dispatch checklist, but
it never sends owner requests, reads secrets, writes credential markers, or
modifies host/runtime state.
"""

from __future__ import annotations

import argparse
import json
import subprocess
import sys
from datetime import datetime
from pathlib import Path
from typing import Any

ROOT = Path(__file__).resolve().parents[2]
DISPATCH_SCRIPT = ROOT / "scripts" / "reboot-recovery" / "post-reboot-next-gate-dispatch.sh"


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Convert post-reboot gate dispatch output into owner packet JSON.",
    )
    parser.add_argument(
        "--dispatch-file",
        type=Path,
        help="Use an existing post-reboot-next-gate-dispatch output file.",
    )
    parser.add_argument(
        "--output",
        type=Path,
        help="Write JSON to this path instead of stdout.",
    )
    parser.add_argument(
        "--no-color",
        action="store_true",
        help="Pass --no-color to the delegated dispatch script.",
    )
    return parser.parse_args()


def run_dispatch(no_color: bool) -> str:
    cmd = [str(DISPATCH_SCRIPT)]
    if no_color:
        cmd.append("--no-color")
    completed = subprocess.run(
        cmd,
        cwd=ROOT,
        check=False,
        text=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    if completed.returncode not in (0,):
        raise SystemExit(
            f"dispatch_failed rc={completed.returncode}\n{completed.stdout}"
        )
    return completed.stdout


def load_dispatch(args: argparse.Namespace) -> str:
    if args.dispatch_file:
        return args.dispatch_file.read_text(encoding="utf-8")
    return run_dispatch(no_color=args.no_color)


def split_csv(value: str) -> list[str]:
    if not value or value == "none":
        return []
    return [item.strip() for item in value.split(",") if item.strip()]


def parse_dispatch(text: str) -> dict[str, Any]:
    summary: dict[str, str] = {}
    gates: list[dict[str, Any]] = []
    current_gate: dict[str, Any] | None = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        value = value.strip()
        if key == "GATE_ID":
            if current_gate:
                gates.append(current_gate)
            current_gate = {"gate_id": value}
            continue
        target = current_gate if current_gate is not None else summary
        target[key.lower()] = value
    if current_gate:
        gates.append(current_gate)
    for gate in gates:
        for key in (
            "owner_group",
            "required_items",
            "required_evidence",
            "required_decisions",
            "required_export",
            "forbidden_payloads",
            "forbidden_action",
            "forbidden_actions",
            "done_criteria",
        ):
            if key in gate:
                gate[key] = split_csv(str(gate[key]))
    return {"summary": summary, "gates": gates}


def build_packet(parsed: dict[str, Any]) -> dict[str, Any]:
    summary = parsed["summary"]
    gates = parsed["gates"]
    next_required = split_csv(summary.get("next_required_gates", ""))
    return {
        "schema_version": "awoooi_post_reboot_next_gate_owner_packets_v1",
        "generated_at": datetime.now().astimezone().isoformat(timespec="seconds"),
        "source": {
            "dispatch_script": str(DISPATCH_SCRIPT.relative_to(ROOT)),
            "summary_file": summary.get("summary_file", "unknown"),
            "summary_artifact_dir": summary.get("summary_artifact_dir", "unknown"),
            "overall_declaration": summary.get("overall_declaration", "unknown"),
            "next_required_gates": next_required,
        },
        "status": {
            "service_green": summary.get("service_green", "unknown"),
            "runtime_action_authorized": 0,
            "dispatch_authorized": 0,
            "request_sent_count": 0,
            "owner_response_received_count": 0,
            "owner_response_accepted_count": 0,
            "host_write_authorized": 0,
            "secret_value_collection_allowed": 0,
            "runtime_gate_count": 0,
        },
        "owner_packets": [
            {
                "packet_id": gate.get("gate_id", "unknown"),
                "title": gate.get("gate_title", "unknown"),
                "priority": gate.get("gate_priority", "unknown"),
                "status": gate.get("gate_status", "unknown"),
                "current_evidence": gate.get("current_evidence", "unknown"),
                "owner_group": gate.get("owner_group", []),
                "required_items": gate.get("required_items", []),
                "required_evidence": gate.get("required_evidence", []),
                "required_decisions": gate.get("required_decisions", []),
                "required_export": gate.get("required_export", []),
                "allowed_action": gate.get("allowed_action", "unknown"),
                "forbidden_payloads": gate.get("forbidden_payloads", []),
                "forbidden_actions": gate.get("forbidden_actions")
                or gate.get("forbidden_action", []),
                "done_criteria": gate.get("done_criteria", []),
                "request_sent": False,
                "response_received": False,
                "response_accepted": False,
                "runtime_action_authorized": False,
            }
            for gate in gates
        ],
        "counts": {
            "next_gate_count": len(gates),
            "p0_gate_count": sum(1 for gate in gates if gate.get("gate_priority") == "P0"),
            "request_sent_count": 0,
            "owner_response_received_count": 0,
            "owner_response_accepted_count": 0,
            "runtime_action_authorized_count": 0,
        },
        "no_false_green_rules": [
            "service_green_does_not_equal_dr_complete",
            "backup_fresh_does_not_equal_credential_escrow_complete",
            "host_188_service_green_does_not_equal_host_hygiene_green",
            "wazuh_route_or_transport_does_not_equal_manager_registry_accepted",
        ],
        "forbidden_global_actions": [
            "send_owner_request_without_review",
            "write_credential_marker_without_non_secret_evidence",
            "collect_secret_value_hash_prefix_suffix_or_raw_payload",
            "pg_resetwal_or_db_restore_without_maintenance_window",
            "nginx_reload_or_certbot_renew_without_owner_gate",
            "wazuh_active_response_reenroll_restart_or_secret_patch",
            "host_write_or_kali_active_scan_without_explicit_approval",
        ],
    }


def main() -> int:
    args = parse_args()
    parsed = parse_dispatch(load_dispatch(args))
    packet = build_packet(parsed)
    payload = json.dumps(packet, ensure_ascii=False, indent=2, sort_keys=True)
    if args.output:
        args.output.parent.mkdir(parents=True, exist_ok=True)
        args.output.write_text(payload + "\n", encoding="utf-8")
    else:
        print(payload)
    return 0


if __name__ == "__main__":
    sys.exit(main())