Merge remote-tracking branch 'gitea/main' into codex/delivery-workbench-release-20260626-ffsync

ogt
2026-06-27 00:10:37 +08:00
4 changed files with 42 additions and 7 deletions


@@ -46295,3 +46295,36 @@ production browser smoke:
- DR credential escrow evidence is still missing `5`: `DR_COMPLETE` must not be declared.
- Wazuh manager registry accepted is still `0`: full-host Wazuh management must not be declared restored. The redacted manager registry statistics are evidence only; Dashboard API / owner acceptance is still missing.
- Runtime action / host write / credential marker write / Wazuh active response / Kali active scan all remain `0 / false`.
## 2026-06-27 — 00:02 post-midnight reboot SOP live refresh
**Time and sources**
- 2026-06-27 00:01-00:08 Asia/Taipei.
- Sources: `scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color`, production route smoke, AWOOOI `/api/v1/health`, Gitea main / deploy marker readback.
**Live verification results**
- `post-reboot-readiness-summary.sh --no-color` artifact: `/tmp/awoooi-post-reboot-readiness-20260627-000137/summary.txt`.
- Summary: `POST_START_RC=0`, `POST_START_RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, `POST_START_PASS=38`, `POST_START_WARN=4`, `POST_START_BLOCKED=0`, `SERVICE_GREEN=1`, `PRODUCT_DATA_GREEN=1`, `STOCK_FRESHNESS_STATUS=ok`, `STOCK_LATEST_TRADING_DATE=2026-06-26`, `STOCK_BLOCKERS=none`, `BACKUP_CORE_GREEN=1`, `DR_ESCROW_BLOCKED=1`, `ESCROW_MISSING_COUNT=5`, `HOST_188_HYGIENE_BLOCKED=0`, `WAZUH_ROUTE_CODE=200`, `WAZUH_TRANSPORT_COUNT=6`, `WAZUH_MANAGER_REGISTRY_ACCEPTED=0`, `RUNTIME_ACTION_AUTHORIZED=0`, `NEXT_REQUIRED_GATES=credential_escrow_evidence,wazuh_manager_registry_export`.
- Production route smoke: IwoooS `200`, Wazuh read-only route `/api/iwooos/wazuh` `200`, Wazuh read-only route `/api/v1/iwooos/wazuh` `200`, VibeWork `200`, AwoooGo `200`, MOMO health `200`, Stock `200`.
- AWOOOI API health: `healthy / prod / mock_mode=false`; PostgreSQL / Redis / OpenClaw / SigNoz / `ollama_gcp_a` / `ollama_gcp_b` are up; `ollama_local` is in cooldown / degraded and is covered by provider fallback, so it is not a website or API service blocker.
- Gitea main: `9c33f4b0 docs(logbook): record controlled runtime summary deployment [skip ci]`; the SOP source commit is `89b9e67a fix(ops): harden reboot API warmup evidence flow`. The latest production deploy marker is `e506b9d5 chore(cd): deploy fe74d86 [skip ci]`; `89b9e67a` is a SOP / scripts / docs source update, not a runtime bundle deploy marker.
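The summary readback above is a flat set of key/value pairs, so later checks can read it from the artifact instead of re-probing. A minimal sketch, assuming `summary.txt` holds one `KEY=VALUE` pair per line (the exact file layout is not shown in this log, and `parse_summary` is a hypothetical helper, not part of the repo scripts):

```python
from __future__ import annotations


def parse_summary(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines; skip blank lines, comments, and lines without '='."""
    result: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, value = line.partition("=")
        result[key.strip()] = value.strip()
    return result


# Values copied from the 2026-06-27 00:02 readback; the one-pair-per-line
# layout is an assumption for illustration.
SAMPLE = """\
POST_START_RC=0
POST_START_RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED
POST_START_PASS=38
POST_START_WARN=4
POST_START_BLOCKED=0
ESCROW_MISSING_COUNT=5
WAZUH_MANAGER_REGISTRY_ACCEPTED=0
NEXT_REQUIRED_GATES=credential_escrow_evidence,wazuh_manager_registry_export
"""

summary = parse_summary(SAMPLE)
next_gates = summary["NEXT_REQUIRED_GATES"].split(",")
```

Values stay as strings here; converting counts such as `ESCROW_MISSING_COUNT` to integers is left to whichever check consumes them.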
**Completed work**
- `docs/runbooks/FULL-STACK-COLD-START-SOP.md` bumped to v1.77, adding the 2026-06-27 00:02 post-midnight live baseline.
- `docs/runbooks/REBOOT-POST-START-QUICK-CHECK.md` bumped to v1.17, adding the production route smoke and the local Ollama fallback interpretation.
- `docs/workplans/2026-06-04-reboot-cold-start-backup-recovery-workplan.md` Current Verdict updated to 2026-06-27 00:02.
**Command types run**
- Read-only: Gitea main / deploy marker / route smoke / API health / post-reboot summary.
- Writes: repo docs only.
- Not done: no write operations on host / Docker / systemd / Nginx / firewall / K8s / DB / Wazuh runtime; no secret plaintext read; no credential marker write; no Wazuh active response / agent re-enroll / restart; no Kali active scan; no CI cancellation.
**Current verdict**
- Hosts, K3s, services, public routes, MOMO, StockPlatform freshness, backup core, 188 host hygiene: `GREEN`.
- Overall recovery declaration: `FULL_STACK_GREEN_DR_ESCROW_BLOCKED`.
- SOP / quick-check: v1.77 / v1.17.
**Still blocked / must not be declared**
- DR credential escrow evidence is still missing `5`: `DR_COMPLETE` must not be declared.
- Wazuh manager registry accepted is still `0`: full-host Wazuh management must not be declared restored.
- Runtime action / host write / credential marker write / Wazuh active response / Kali active scan all remain `0 / false`.
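The blocked list above amounts to a small rule set over the summary keys. The sketch below is illustrative only: it is not the logic of `post-reboot-declaration-guard.py`, which this log does not show, and apart from `DR_COMPLETE` the declaration labels are hypothetical names.

```python
from __future__ import annotations


def forbidden_declarations(summary: dict[str, str]) -> list[str]:
    """Return declarations that must not be made for this summary.

    Missing keys are treated as blocked (fail closed), which is an assumption.
    """
    blocked: list[str] = []
    # Any missing escrow evidence means DR cannot be declared complete.
    if int(summary.get("ESCROW_MISSING_COUNT", "1")) > 0:
        blocked.append("DR_COMPLETE")
    # Route 200 or transport counts do not substitute for owner acceptance.
    if int(summary.get("WAZUH_MANAGER_REGISTRY_ACCEPTED", "0")) == 0:
        blocked.append("WAZUH_FULL_HOST_MANAGEMENT_RESTORED")  # hypothetical label
    if summary.get("RUNTIME_ACTION_AUTHORIZED", "0") != "1":
        blocked.append("RUNTIME_ACTION")  # hypothetical label
    return blocked


current = {
    "ESCROW_MISSING_COUNT": "5",
    "WAZUH_MANAGER_REGISTRY_ACCEPTED": "0",
    "RUNTIME_ACTION_AUTHORIZED": "0",
}
still_blocked = forbidden_declarations(current)
```

With the values from this entry, all three declarations stay blocked; each would clear only when its own key changes.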


@@ -1,7 +1,7 @@
# AWOOOI Full-Stack Cold Start and Host Reboot SOP
> Version: v1.76
> Last updated: 2026-06-26 Asia/Taipei
> Version: v1.77
> Last updated: 2026-06-27 Asia/Taipei
> Scope: 110 / 120 / 121 / 188 full-stack reboot recovery. 112 Kali is recorded as P3 optional and is not part of this recovery path.
---
@@ -14,7 +14,9 @@
v1.76 owner gate replay rule: once a summary has been produced for a round, the owner packet and owner response preflight must prefer `--summary-file "$ARTIFACT_DIR/summary.txt"`, for example `scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py --no-color --summary-file "$ARTIFACT_DIR/summary.txt" --output /tmp/awoooi-post-reboot-owner-packets.json` and `scripts/reboot-recovery/post-reboot-owner-response-preflight.py --no-color --summary-file "$ARTIFACT_DIR/summary.txt" --response-file <file>`. Omitting `--summary-file` is allowed only when live evidence is deliberately being re-collected; otherwise the preflight must not rerun the summary on its own and cause state drift within the same round.
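One way to keep both follow-up commands pinned to the same summary is to build their argument lists from a single artifact directory. This is a sketch under assumptions: `replay_commands` is a hypothetical helper, the response file name is a placeholder, and only the flags quoted in the rule above are used. Nothing is executed.

```python
from __future__ import annotations

from pathlib import PurePosixPath

OWNER_PACKETS = "scripts/reboot-recovery/post-reboot-next-gate-owner-packets.py"
OWNER_PREFLIGHT = "scripts/reboot-recovery/post-reboot-owner-response-preflight.py"


def replay_commands(artifact_dir: str, response_file: str) -> list[list[str]]:
    """Build argv lists that all point at the same round's summary.txt."""
    summary_file = str(PurePosixPath(artifact_dir) / "summary.txt")
    pinned = ["--no-color", "--summary-file", summary_file]
    return [
        [OWNER_PACKETS, *pinned, "--output", "/tmp/awoooi-post-reboot-owner-packets.json"],
        [OWNER_PREFLIGHT, *pinned, "--response-file", response_file],
    ]


# "owner-response.json" is a placeholder; the rule above leaves the file unspecified.
commands = replay_commands(
    "/tmp/awoooi-post-reboot-readiness-20260627-000137", "owner-response.json"
)
```

Because both lists share the `pinned` prefix, neither command can fall back to a fresh live probe by accident.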
2026-06-26 23:56 latest live summary: `scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color` returned `POST_START_RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, `POST_START_PASS=38`, `POST_START_WARN=3`, `POST_START_BLOCKED=0`, `SERVICE_GREEN=1`, `PRODUCT_DATA_GREEN=1`, `STOCK_FRESHNESS_STATUS=ok`, `STOCK_LATEST_TRADING_DATE=2026-06-26`, `STOCK_BLOCKERS=none`, `BACKUP_CORE_GREEN=1`, `ESCROW_MISSING_COUNT=5`, `HOST_188_HYGIENE_BLOCKED=0`, `WAZUH_MANAGER_REGISTRY_ACCEPTED=0`, `RUNTIME_ACTION_AUTHORIZED=0`. Read-only follow-up check on 120 in the same window: ArgoCD `awoooi-prod` is `Synced / Healthy`, and all `awoooi-prod` Pods are `Running` or `Completed`; the historical failed Job `km-vectorize-29689620` has been superseded by subsequent successful Jobs on 2026-06-23, 2026-06-24, and 2026-06-25 and is not a current service blocker. Read-only follow-up check on 112 in the same window: systemd `running`; Wazuh manager / indexer / dashboard `active`; manager API root returns `401`; Dashboard unauthenticated check endpoints return `401`; the redacted manager registry readback shows local manager `1`, managed agents `5`, active managed `5`, disconnected `0`, never connected `0`. This evidence shows the registry is no longer "completely empty", but full-host Wazuh management still cannot be declared restored, because the SOP expected scope is still 6, Dashboard API connection / version has not yet been accepted via login or owner evidence, and owner response accepted is still `0`.
2026-06-27 00:02 latest live summary: `scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color` returned `POST_START_RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, `POST_START_PASS=38`, `POST_START_WARN=4`, `POST_START_BLOCKED=0`, `SERVICE_GREEN=1`, `PRODUCT_DATA_GREEN=1`, `STOCK_FRESHNESS_STATUS=ok`, `STOCK_LATEST_TRADING_DATE=2026-06-26`, `STOCK_BLOCKERS=none`, `BACKUP_CORE_GREEN=1`, `ESCROW_MISSING_COUNT=5`, `HOST_188_HYGIENE_BLOCKED=0`, `WAZUH_MANAGER_REGISTRY_ACCEPTED=0`, `RUNTIME_ACTION_AUTHORIZED=0`. The production route smoke in the same round returned: IwoooS `200`, Wazuh read-only routes `200`, VibeWork `200`, AwoooGo `200`, MOMO health `200`, Stock `200`; AWOOOI API health `healthy / prod / mock_mode=false`; PostgreSQL / Redis / OpenClaw / SigNoz / GCP Ollama providers are up; the local Ollama endpoint is still in cooldown / degraded and is covered by provider fallback, so it is not a website or API service blocker. The latest deploy marker is `e506b9d5 chore(cd): deploy fe74d86 [skip ci]`; this round's `89b9e67a` is a SOP / scripts / docs source update, not a runtime bundle deploy marker. The 23:56 redacted readbacks for 112 Wazuh and 120 K3s still serve as adjacent evidence for this round: 120 ArgoCD `Synced / Healthy`, all Pods `Running` or `Completed`; the Wazuh manager registry is not completely empty, but `WAZUH_MANAGER_REGISTRY_ACCEPTED=0` stands, so full-host management cannot be declared restored.
2026-06-26 23:56 live summary retained for comparison: `scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color` returned `POST_START_RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, `POST_START_PASS=38`, `POST_START_WARN=3`, `POST_START_BLOCKED=0`, `SERVICE_GREEN=1`, `PRODUCT_DATA_GREEN=1`, `STOCK_FRESHNESS_STATUS=ok`, `STOCK_LATEST_TRADING_DATE=2026-06-26`, `STOCK_BLOCKERS=none`, `BACKUP_CORE_GREEN=1`, `ESCROW_MISSING_COUNT=5`, `HOST_188_HYGIENE_BLOCKED=0`, `WAZUH_MANAGER_REGISTRY_ACCEPTED=0`, `RUNTIME_ACTION_AUTHORIZED=0`. Read-only follow-up check on 120 in the same window: ArgoCD `awoooi-prod` is `Synced / Healthy`, and all `awoooi-prod` Pods are `Running` or `Completed`; the historical failed Job `km-vectorize-29689620` has been superseded by subsequent successful Jobs on 2026-06-23, 2026-06-24, and 2026-06-25 and is not a current service blocker. Read-only follow-up check on 112 in the same window: systemd `running`; Wazuh manager / indexer / dashboard `active`; manager API root returns `401`; Dashboard unauthenticated check endpoints return `401`; the redacted manager registry readback shows local manager `1`, managed agents `5`, active managed `5`, disconnected `0`, never connected `0`. This evidence shows the registry is no longer "completely empty", but full-host Wazuh management still cannot be declared restored, because the SOP expected scope is still 6, Dashboard API connection / version has not yet been accepted via login or owner evidence, and owner response accepted is still `0`.
2026-06-26 18:46: the latest live recovery truth supersedes the 12:13 reading of today's product data: `scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color` returned `POST_START_RESULT=PRODUCT_DATA_PENDING_EOD_WINDOW`, `SERVICE_GREEN=1`, `PRODUCT_DATA_GREEN=0`, `STOCK_LATEST_TRADING_DATE=2026-06-26`, `STOCK_BLOCKERS=core_margin_short_daily_missing,ai_recommendations_stale`, `BACKUP_CORE_GREEN=1`, `ESCROW_MISSING_COUNT=5`, `WAZUH_MANAGER_REGISTRY_ACCEPTED=0`. The live cold-start long check in the same round returned `PASS=87 WARN=0 BLOCKED=0` and `Result: GREEN`, meaning the service layer of the 110 / 120 / 121 / 188 hosts, K3s, public routes, AWOOOI API, MOMO, backup core, exporters, cron, and Alertmanager has recovered; however, StockPlatform's official margin-short data for today had not yet been published and AI recommendations still depend on it, so it cannot be declared that all product data is up to date. At 18:43, two `stockplatform-review-bulk-ux` orphan Chrome process groups older than 6 hours on 110 were cleared with an authorized `SIGTERM`, leaving `REMAINING=0`. At 18:44-18:46, the local AWOOOI `next build` on the 168 Mac Mini was stopped and temp/build/cache files plus Antigravity backup browser recordings were cleaned up, bringing `/System/Volumes/Data` from about `1.0Gi / 100%` back to about `8.7Gi / 96%`. The failed `networking.service` on 112 Kali was traced to a malformed shebang `#\!/bin/bash` in `/etc/network/if-up.d/wg-nat`, which causes `Exec format error`; Wazuh manager / indexer / dashboard are still active. Repairing that hook requires sudo elevation on 112, and no password was used or stored.
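The `wg-nat` finding fits how Linux recognizes interpreter scripts: the file must begin with the two bytes `#!`, and a backslash-escaped `#\!` does not qualify, so the kernel rejects the file with `Exec format error`. A minimal read-only check, with hypothetical helper names; it never modifies the hook:

```python
from pathlib import Path


def has_valid_shebang(first_bytes: bytes) -> bool:
    """True only if the content starts with the literal two bytes '#!'."""
    return first_bytes.startswith(b"#!")


def hook_has_valid_shebang(path: str) -> bool:
    """Read just the first two bytes of a hook script and test them."""
    with Path(path).open("rb") as handle:
        return has_valid_shebang(handle.read(2))


# The malformed line from the log versus the corrected form.
broken = has_valid_shebang(b"#\\!/bin/bash\n")
fixed = has_valid_shebang(b"#!/bin/bash\n")
```

Running `hook_has_valid_shebang` against each file under `/etc/network/if-up.d/` would be a read-only way to spot similar hooks, assuming the account has read access.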


@@ -1,7 +1,7 @@
# One-Page Full Check After Host Reboot
> Version: v1.16
> Last updated: 2026-06-26 Asia/Taipei
> Version: v1.17
> Last updated: 2026-06-27 Asia/Taipei
> Scope: 110 / 120 / 121 / 188 post-reboot service recovery. 112 Kali / Wazuh / active scan is not part of this procedure.
---
@@ -10,7 +10,7 @@
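Whenever any of the 110 / 120 / 121 / 188 hosts is powered on, shut down, rebooted, recovered from a power loss, put through a VMware console fsck, or subjected to a large-scale Docker / K3s reschedule, run this page first and only then decide whether to declare recovery.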
Latest baseline: 2026-06-26 23:56 single-summary replay / route + AWOOOI API warmup classifier. `scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color` returned `POST_START_RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, `POST_START_PASS=38`, `POST_START_WARN=3`, `POST_START_BLOCKED=0`, `SERVICE_GREEN=1`, `PRODUCT_DATA_GREEN=1`, `STOCK_FRESHNESS_STATUS=ok`, `STOCK_LATEST_TRADING_DATE=2026-06-26`, `STOCK_BLOCKERS=none`, `BACKUP_CORE_GREEN=1`, `DR_ESCROW_BLOCKED=1`, `ESCROW_MISSING_COUNT=5`, `HOST_188_HYGIENE_BLOCKED=0`, `HOST_188_RESULT=HOST_188_HYGIENE_GREEN.`, `WAZUH_MANAGER_REGISTRY_ACCEPTED=0`, `WAZUH_COVERAGE_SCOPE=6`, `WAZUH_DIRECT_ACTIVE=2`, `WAZUH_NO_TRANSPORT=1`, `WAZUH_SSH_BLOCKED=3`, `WAZUH_ROUTE_CODE=200`, `WAZUH_TRANSPORT_COUNT=6`, `WAZUH_DASHBOARD_API_CONNECTION=pending_or_spinning`, `WAZUH_DASHBOARD_INDEX_OK=3`, `RUNTIME_ACTION_AUTHORIZED=0`, `OVERALL_DECLARATION=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, and automatically writes the same key/value set to `$ARTIFACT_DIR/summary.txt`. Subsequent runs of `post-reboot-declaration-guard.py`, `post-reboot-next-gate-dispatch.sh`, `post-reboot-next-gate-owner-packets.py`, `post-reboot-owner-packet-contract-guard.py`, and `post-reboot-owner-response-preflight.py` in the same round must use this `summary.txt` or the dispatch / packet generated from it, and must not mix results from multiple live probes taken at different times. `NEXT_REQUIRED_GATES=credential_escrow_evidence,wazuh_manager_registry_export` remain the only current next gates: DR still cannot be declared complete because `escrow_missing=5`; Wazuh manager registry accepted is still `0`, and route `200`, transport `6`, Dashboard index pattern `3`, or redacted registry counts must not be treated as full-host management being complete. v1.16 also adds the route/API warmup classifier: if a delegated cold-start is blocked only by a single public route 502 / TLS readback, or by a single transient `BLOCKED AWOOOI API not reachable` during a K3s rollout, and the wrapper route retry has confirmed public AWOOOI API health as 2xx, that blocker is downgraded to an evidence warning; if the public API still fails, other non-route blockers exist, or the route has not recovered after retry, the result remains hard blocked.
Latest baseline: 2026-06-27 00:02 single-summary replay / route + AWOOOI API warmup classifier. `scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color` returned `POST_START_RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, `POST_START_PASS=38`, `POST_START_WARN=4`, `POST_START_BLOCKED=0`, `SERVICE_GREEN=1`, `PRODUCT_DATA_GREEN=1`, `STOCK_FRESHNESS_STATUS=ok`, `STOCK_LATEST_TRADING_DATE=2026-06-26`, `STOCK_BLOCKERS=none`, `BACKUP_CORE_GREEN=1`, `DR_ESCROW_BLOCKED=1`, `ESCROW_MISSING_COUNT=5`, `HOST_188_HYGIENE_BLOCKED=0`, `HOST_188_RESULT=HOST_188_HYGIENE_GREEN.`, `WAZUH_MANAGER_REGISTRY_ACCEPTED=0`, `WAZUH_COVERAGE_SCOPE=6`, `WAZUH_DIRECT_ACTIVE=2`, `WAZUH_NO_TRANSPORT=1`, `WAZUH_SSH_BLOCKED=3`, `WAZUH_ROUTE_CODE=200`, `WAZUH_TRANSPORT_COUNT=6`, `WAZUH_DASHBOARD_API_CONNECTION=pending_or_spinning`, `WAZUH_DASHBOARD_INDEX_OK=3`, `RUNTIME_ACTION_AUTHORIZED=0`, `OVERALL_DECLARATION=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, and automatically writes the same key/value set to `$ARTIFACT_DIR/summary.txt`. The production route smoke in the same round confirmed that IwoooS, the Wazuh read-only routes, VibeWork, AwoooGo, MOMO health, and Stock all return `200`; AWOOOI API health is `healthy` overall; the local Ollama cooldown is covered by GCP provider fallback and is not a website or API service blocker. Subsequent runs of `post-reboot-declaration-guard.py`, `post-reboot-next-gate-dispatch.sh`, `post-reboot-next-gate-owner-packets.py`, `post-reboot-owner-packet-contract-guard.py`, and `post-reboot-owner-response-preflight.py` in the same round must use this `summary.txt` or the dispatch / packet generated from it, and must not mix results from multiple live probes taken at different times. `NEXT_REQUIRED_GATES=credential_escrow_evidence,wazuh_manager_registry_export` remain the only current next gates: DR still cannot be declared complete because `escrow_missing=5`; Wazuh manager registry accepted is still `0`, and route `200`, transport `6`, Dashboard index pattern `3`, or redacted registry counts must not be treated as full-host management being complete. v1.17 keeps the route/API warmup classifier: if a delegated cold-start is blocked only by a single public route 502 / TLS readback, or by a single transient `BLOCKED AWOOOI API not reachable` during a K3s rollout, and the wrapper route retry has confirmed public AWOOOI API health as 2xx, that blocker is downgraded to an evidence warning; if the public API still fails, other non-route blockers exist, or the route has not recovered after retry, the result remains hard blocked.
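The warmup classifier rule can be read as a three-way decision. The sketch below illustrates that reading only; the `Blocker` type, the `kind` labels, and the result strings are hypothetical and are not taken from the wrapper script.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Blocker:
    kind: str     # hypothetical labels: "route" for route/API warmup, anything else otherwise
    message: str


def classify(blockers: list[Blocker], api_status_after_retry: Optional[int]) -> str:
    """Downgrade route-only blockers to a warning when the retried health check is 2xx."""
    if not blockers:
        return "green"
    api_ok = api_status_after_retry is not None and 200 <= api_status_after_retry < 300
    only_route = all(b.kind == "route" for b in blockers)
    if only_route and api_ok:
        return "evidence_warning"
    # Public API still failing, a non-route blocker present, or no recovery after retry.
    return "hard_blocked"


transient = [Blocker("route", "BLOCKED AWOOOI API not reachable")]
downgraded = classify(transient, 200)
still_failing = classify(transient, 502)
mixed = classify(transient + [Blocker("backup", "core blocker")], 200)
```

The downgrade requires both conditions at once: a successful retry never clears a non-route blocker.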
This page answers only four things:


@@ -11,7 +11,7 @@
| Area | Status | Completion | Evidence |
|------|--------|------------|----------|
| Overall recovery readiness | FULL_STACK_GREEN_DR_ESCROW_BLOCKED | 99% | The 2026-06-26 23:56 live summary supersedes the 18:45 EOD pending reading. `post-reboot-readiness-summary.sh --no-color` returned `SERVICE_GREEN=1`, `PRODUCT_DATA_GREEN=1`, `POST_START_RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, `POST_START_WARN=3`, `STOCK_FRESHNESS_STATUS=ok`, `STOCK_LATEST_TRADING_DATE=2026-06-26`, `STOCK_BLOCKERS=none`, `BACKUP_CORE_GREEN=1`, `ESCROW_MISSING_COUNT=5`, `WAZUH_MANAGER_REGISTRY_ACCEPTED=0`. Hosts / K3s / public routes / AWOOOI / MOMO / Stock / backup core / 188 hygiene have recovered; 120 ArgoCD is `Synced / Healthy`, and all `awoooi-prod` Pods are `Running` or `Completed`; the historical failed Job `km-vectorize-29689620` has been superseded by subsequent successful Jobs. DR still cannot be declared complete because credential escrow is missing 5; the Wazuh registry now has a redacted manager readback but no Dashboard API / owner acceptance yet. |
| Overall recovery readiness | FULL_STACK_GREEN_DR_ESCROW_BLOCKED | 99% | The 2026-06-27 00:02 live summary supersedes the 2026-06-26 23:56 reading. `post-reboot-readiness-summary.sh --no-color` returned `SERVICE_GREEN=1`, `PRODUCT_DATA_GREEN=1`, `POST_START_RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, `POST_START_WARN=4`, `STOCK_FRESHNESS_STATUS=ok`, `STOCK_LATEST_TRADING_DATE=2026-06-26`, `STOCK_BLOCKERS=none`, `BACKUP_CORE_GREEN=1`, `ESCROW_MISSING_COUNT=5`, `WAZUH_MANAGER_REGISTRY_ACCEPTED=0`. Production route smoke: IwoooS / Wazuh read-only routes / VibeWork / AwoooGo / MOMO health / Stock `200`; AWOOOI API health `healthy / prod / mock_mode=false`; the local Ollama cooldown is covered by GCP provider fallback and is not a website or API blocker. Hosts / K3s / public routes / AWOOOI / MOMO / Stock / backup core / 188 hygiene have recovered. DR still cannot be declared complete because credential escrow is missing 5; the Wazuh registry now has a redacted manager readback but no Dashboard API / owner acceptance yet. |
| P0 host / K3s recovery | DONE | 100% | 120 booted after console fsck at `2026-06-12 15:13`; latest 2026-06-26 07:19 readback shows 120 and 121 reachable, K3s active, `mon` and `mon1` both `Ready control-plane`, AWOOOI API/Web replicas split across both nodes, ArgoCD `awoooi-prod Synced / Healthy` at revision `1fd5e2a8b0f18d24eed16aa2a44286bcbf230603`, and `km-vectorize` official 03:00 Taipei time run succeeded with `lastSuccess=2026-06-25T19:00:14Z`. |
| P1 backup / alert / escrow | BLOCKED_DR_ESCROW | 97% | 2026-06-26 06:58 backup readback shows 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, `integrity_stale=0`, `offsite_fresh=1`, `rclone_gdrive_fresh=1`, `escrow_missing=5`, last aggregate `2026-06-26 02:31:02`. DR remains blocked on real non-secret credential escrow evidence IDs; do not write placeholder markers or paste secret values. |
| P2 service / data truth | DONE | 100% | Public routes and service health are green; MOMO health `V10.719`; current-month parity is `15383|15383|2026-06-01|2026-06-24|2026-06-01|2026-06-24`. StockPlatform `/api/v1/system/freshness` is `ok`, latest trading date `2026-06-26`, blockers `none`; the earlier Stock EOD blocker converged naturally through the official source and the regular cron. |