docs(ops): record StockPlatform cron recovery in reboot SOP [skip ci]
@@ -1,3 +1,44 @@
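## 2026-06-25 | StockPlatform production cron entrypoint repair and reboot SOP v1.58 consolidation

**Background**: The user requires that, after a reboot, every site, service, product version, and dataset be judged against the latest ground truth, not just a route `200`. The 19:35 stricter product-data gate confirmed that the StockPlatform route and health routes all return `200`, but `/api/v1/system/freshness` is still BLOCKED by `core_margin_short_daily_missing` and `ai_recommendations_stale`. Further investigation this round confirmed a concurrent production source drift: the live cron referenced several nonexistent `scripts/ops/*.sh` scripts, so some schedules kept failing with `script_exit_127`.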

**StockPlatform repo / live repair**:

- Clean worktree: `/private/tmp/stockplatform-v2-cron-recovery-20260625`.
- Branch: `codex/stockplatform-cron-recovery-20260625`.
- Commit: `fb91aa4c6272469d1d26e0820169629eac17d28a fix(ops): restore production cron recovery entrypoints`.
- Pushed normally to `gitea/codex/stockplatform-cron-recovery-20260625` and fast-forward pushed to `gitea/main`; no force push.
- Repair content: added the entrypoints missing on live, `run-data-freshness-monitor.sh`, `run-intelligence-freshness-monitor.sh`, `run-market-index-ingestion.sh`, `run-official-eod-bulk-ingestion.sh`, `run-price-ingestion.sh`, and `run-source-remediation-task.sh`, and synced `install-production-cron.sh`, `run-intelligence-sync.sh`, and `run-margin-short-ingestion.sh`.
- Live write: a single `git pull --ff-only origin main` on 110 `/home/wooo/stockplatform-v2`, fast-forwarding live source from `c67a2cf5aef3f15f14c99941a1615d1c809bac33` to `fb91aa4c6272469d1d26e0820169629eac17d28a`.
- Not performed: Docker restart, systemd restart, Nginx reload, firewall / iptables, K8s / ArgoCD action, manual ingestion run, manual DB write, backup restore, secret read.

**Live evidence**:

- Live source contract: all 17 `scripts/ops/*.sh` scripts referenced by `install-production-cron.sh` now exist; the 9 shell scripts in the patch scope pass `bash -n`.
- 19:56 `source-remediation-queue`: changed from `script_exit_127` at 19:49 to `succeeded`; the log shows `STOCKPLATFORM_SOURCE_REMEDIATION_TASK_NONE`.
- 20:00 `market-index-ingestion`: `succeeded`, no longer `No such file or directory`.
- 20:02 `price-ingestion`: `succeeded`, no longer `No such file or directory`.
- 20:05 `margin-short-ingestion`: `succeeded`, but the official 2026-06-25 margin-short data is still `official_pending`, `row_count=0`.
- 20:06 `chips-ingestion`: `succeeded`.
- 20:10 `ai-recommendation-pipeline`: `succeeded` at the cron/job layer; the internal result is correctly `blocked`, with reason `core_margin_short_daily_incomplete,official_margin_short_daily_official_pending`.
- 20:11 direct DB summary: `price|2026-06-25|1976`, `chips|2026-06-25|1976`, `margin|2026-06-24|1976`, `ai_recommendations|2026-06-24|120`.
- 20:11 `/api/v1/system/freshness` is still `status=blocked`; blockers remain `core_margin_short_daily_missing` and `ai_recommendations_stale`.
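
The 20:11 direct DB summary above can be checked mechanically against the expected trading date. The following is a minimal sketch, assuming the readback is available as `source|latest_date|row_count` lines on stdin; `stale_sources` is a hypothetical helper name and not part of the StockPlatform repo.

```shell
# Sketch only: flag every source whose latest date is older than the expected
# trading date. Input format (source|latest_date|row_count) mirrors the DB
# summary quoted in this log; the helper name is hypothetical.
stale_sources() {
  local expected_date="$1"
  # ISO dates (YYYY-MM-DD) sort correctly as strings, so force a string compare.
  awk -F'|' -v want="$expected_date" '
    NF >= 3 && ($2 "") < (want "") {
      printf "STALE %s latest=%s rows=%s\n", $1, $2, $3
    }'
}
```

Fed the four readback lines with `2026-06-25` as the expected trading date, this would flag only `margin` and `ai_recommendations`, which lines up with the two freshness blockers.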

**SOP / documentation updates**:

- `docs/runbooks/FULL-STACK-COLD-START-SOP.md` updated to v1.58.
- `docs/runbooks/REBOOT-POST-START-QUICK-CHECK.md` updated to v1.3, adding the StockPlatform route / live source / cron entrypoint / official data / AI fail-closed layering.
- `docs/workplans/2026-06-04-reboot-cold-start-backup-recovery-workplan.md` updates Overall recovery readiness to `HOST_AND_CORE_SERVICE_GREEN_STOCK_CRON_SOURCE_REPAIRED_STOCK_DATA_BLOCKED_DR_ESCROW_BLOCKED` and adjusts P2 service/data truth to 94%.
- `docs/runbooks/BACKUP-STATUS.md` adds the StockPlatform cron-source / backup boundary and states explicitly that this is not a backup/restore incident.

**Latest verdict**:

- Host / K3s / AWOOOI runtime / public routes / MOMO service and data / backup / offsite: green for current evidence.
- StockPlatform source-version / cron entrypoint: repaired and naturally verified.
- StockPlatform product data freshness: still blocked because the official 2026-06-25 margin-short data has not been fully published; AI recommendation is correctly waiting on that gate.
- DR: still blocked, `escrow_missing=5`.
- Wazuh: still a security registry evidence blocker, not a reboot service blocker; manager registry accepted is still `0`.

**Next steps**:

- After 21:00, confirm read-only whether `intelligence-sync` succeeds with the restored Docker-backed `psql` shim; do not manually backfill production data.
- After 21:00 / 22:00 / 22:35 / 23:10, track whether the official margin-short source has been published; once it is, let the scheduled cron turn margin / AI recommendation freshness green, then rerun the post-start quick check.
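
For these read-only follow-up checks, a small gate like the one below may help. It is a sketch that assumes the freshness endpoint returns JSON with a string `status` field appearing before any nested `status` keys; the payload schema is not shown in this log, and `freshness_status` / `freshness_gate` are hypothetical names.

```shell
# Sketch only: extract the first "status" string from a JSON payload on stdin.
# Assumes the top-level status is the first "status" key in the document.
freshness_status() {
  grep -oE '"status"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 \
    | sed -E 's/.*"([^"]*)"$/\1/'
}

# Succeed only when the reported status is exactly "ok"; never writes anything.
freshness_gate() {
  local state
  state=$(freshness_status)
  echo "STOCK_FRESHNESS status=${state:-unknown}"
  [ "$state" = "ok" ]
}

# Usage (read-only):
#   curl -k -sS https://stock.wooo.work/api/v1/system/freshness | freshness_gate
```

Treating anything other than `ok`, including an empty or unparseable response, as a failure keeps the gate fail-closed, in line with the rule that a route `200` alone proves nothing about data freshness.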

## 2026-06-25 | Repair Candidate Draft Ready owner review state model

**Background**: Telegram `INC-20260625-977E5F`-type `node-exporter-188` alerts can already prefill `host_service_route_after_owner_review`, `systemctl restart node-exporter-188`, rollback, verifier, and an AwoooP Work Item, but the webhook / Telegram still labels them `NO_ACTION - REPAIR_CANDIDATE_MISSING`. The operator therefore sees "AI chose not to repair, manual work needed" rather than "AI has produced an owner-review draft", which makes the AI automation product look as if it only hands problems back to humans.

@@ -21,6 +21,7 @@
> 2026-06-25 10:35 Codex route / DB / backup refresh: direct public routes for AWOOOI API, IwoooS, VibeWork, AwoooGo, MOMO health, Stock, and Bitan are 200; backup remains 110 `13/13` and 188 `2/2` fresh; MOMO daily and monthly DB bounds still stop at `2026-06-17`; latest import job remains `56 completed`.
> 2026-06-25 19:17 Codex latest recovery readback: post-start quick check is `FULL_STACK_GREEN_DR_ESCROW_BLOCKED`; 110 backup `13/13 fresh failed=0`, 188 backup `2/2 fresh failed=0`, `core_blockers=0`, `integrity_stale=0`, `offsite_fresh=1`, `rclone_gdrive_fresh=1`; MOMO data freshness is recovered through `2026-06-24`; DR still blocked only by `escrow_missing=5`.
> 2026-06-25 19:35 Codex product-data gate refresh: backup/offsite remains green, but overall "all products/data latest" is blocked by StockPlatform `/api/v1/system/freshness` (`core_margin_short_daily_missing`, `ai_recommendations_stale`). This is not a backup failure; keep `escrow_missing=5` as the DR blocker and Stock freshness as a separate product-data blocker.
> 2026-06-25 20:11 Codex StockPlatform cron-source recovery: StockPlatform Gitea/live source is now `fb91aa4c6272469d1d26e0820169629eac17d28a`; six missing production cron entrypoints are restored; natural cron runs for source remediation, market index, price, margin, chips, and AI no longer fail from missing files. Backup/offsite remains green. Stock freshness still blocks because official 2026-06-25 margin-short data is pending and AI recommendations correctly stay on 2026-06-24; this is still not a backup or restore incident.
---
@@ -47,6 +48,32 @@ Read-only evidence sources: `/backup/scripts/backup-status.sh --no-notify --no-r
| Full-stack service state | GREEN_WITH_DR_ESCROW_BLOCKED | `POST_START_QUICK_CHECK PASS=18 WARN=3 BLOCKED=0`; service warnings 0. |
| Credential escrow | BLOCKED | `escrow_missing=5`; only real non-secret owner evidence may close this. |
## 2026-06-25 20:11 StockPlatform Cron Source / Backup Boundary

Read-only and minimal-write evidence sources: StockPlatform Gitea / live source readback, one fast-forward `git pull --ff-only origin main` on 110 `/home/wooo/stockplatform-v2`, natural cron logs, `ops.job_runs`, and `/api/v1/system/freshness`.

- Backup remains green from the 19:17 readback: 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, `offsite_fresh=1`, `rclone_gdrive_fresh=1`.
- DR blocker remains `escrow_missing=5`.
- StockPlatform source-version drift is repaired: live `/home/wooo/stockplatform-v2` and Gitea `main` are `fb91aa4c6272469d1d26e0820169629eac17d28a`.
- Six previously missing production cron entrypoint scripts are present, and every `scripts/ops/*.sh` referenced by `install-production-cron.sh` exists on live source.
- Natural cron evidence after source sync:
  - `source-remediation-queue` succeeded at 19:56 and 20:00.
  - `market-index-ingestion` succeeded at 20:00.
  - `price-ingestion` succeeded at 20:02.
  - `margin-short-ingestion` succeeded at 20:05 but official 2026-06-25 margin-short data remained pending, with `row_count=0`.
  - `chips-ingestion` succeeded at 20:06.
  - `ai-recommendation-pipeline` succeeded at the cron/job layer at 20:10 and correctly blocked internally on `core_margin_short_daily_incomplete,official_margin_short_daily_official_pending`.
- Stock freshness remains separate from backup: `/api/v1/system/freshness` is still `blocked` with `core_margin_short_daily_missing` and `ai_recommendations_stale`.
- No backup restore, manual DB restore, Docker restart, Nginx reload, K8s action, firewall change, or secret read was performed to address StockPlatform.
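
The claim that every referenced script exists could be re-verified with a read-only check along these lines. This is a sketch, not the contract check actually used for this recovery: `check_cron_contract` is a hypothetical name, and the regex assumes the installer references its entrypoints literally as `scripts/ops/<name>.sh`.

```shell
# Sketch only: confirm that every scripts/ops/*.sh path mentioned in the cron
# installer exists under the given repo root and passes a bash syntax check.
# Function name and the literal-path assumption are illustrative.
check_cron_contract() {
  local repo_root="$1"
  local installer="$repo_root/scripts/ops/install-production-cron.sh"
  local failures=0 refs ref

  if [ ! -f "$installer" ]; then
    echo "MISSING_INSTALLER $installer"
    return 2
  fi

  # Unique literal references such as scripts/ops/run-price-ingestion.sh.
  refs=$(grep -oE 'scripts/ops/[A-Za-z0-9._-]+\.sh' "$installer" | sort -u)

  for ref in $refs; do
    if [ ! -f "$repo_root/$ref" ]; then
      echo "MISSING $ref"
      failures=$((failures + 1))
    elif ! bash -n "$repo_root/$ref" 2>/dev/null; then
      echo "SYNTAX_ERROR $ref"
      failures=$((failures + 1))
    fi
  done

  echo "CONTRACT_FAILURES=$failures"
  [ "$failures" -eq 0 ]
}
```

Running it against the live checkout, for example `check_cron_contract /home/wooo/stockplatform-v2`, only reads files and runs `bash -n`, so it stays inside the read-only boundary described above. Entrypoints referenced through variables would not be caught by this regex.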

| Gate | Status | Evidence |
|------|--------|----------|
| Backup / offsite | VERIFIED | 19:17 backup readback remains green. |
| StockPlatform cron source | REPAIRED | Live and Gitea at `fb91aa4c6272469d1d26e0820169629eac17d28a`; missing entrypoints restored. |
| StockPlatform natural cron entrypoints | VERIFIED | 19:56-20:10 official schedule runs no longer fail with `script_exit_127`. |
| StockPlatform product data freshness | BLOCKED_EXTERNAL_SOURCE | Official 2026-06-25 margin-short source pending; AI recommendations stay stale by design. |
| Credential escrow | BLOCKED | `escrow_missing=5`; only real non-secret owner evidence may close this. |

---
## 2026-06-25 10:35 Backup / Offsite / Escrow Live Status
@@ -1,6 +1,6 @@
# AWOOOI full-stack cold-start and host reboot SOP

-> Version: v1.57
+> Version: v1.58
> Last updated: 2026-06-25 Asia/Taipei
> Scope: 110 / 120 / 121 / 188 full-stack reboot recovery. 112 Kali is recorded as P3 optional and is not part of this recovery path.
@@ -12,7 +12,9 @@
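
If you only need to decide quickly after a reboot whether recovery can be declared, run the one-page overall check first: `scripts/reboot-recovery/post-start-quick-check.sh --no-color`, and use `docs/runbooks/REBOOT-POST-START-QUICK-CHECK.md` as the manual fallback. The long SOP keeps the full background, exception handling, and Plan B; the short wrapper / checklist handles the fixed verdict within T+10 minutes of every reboot.
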
2026-06-25 19:35 product-version / data-freshness refresh supersedes the 19:06 data-complete wording. Host boot, K3s, AWOOOI runtime, MOMO service/data, backup/offsite, Bitan cleanliness, and expanded public routes are available, but the stricter post-start wrapper now checks StockPlatform `/api/v1/system/freshness` and correctly returns `RESULT=BLOCKED` when product data is not current. The 19:35 lightweight wrapper run used `--skip-cold-start --skip-backup --skip-cpu` after the 19:24 full host/cold-start/backup readback and returned `PASS=31 WARN=1 BLOCKED=1`, with the single blocker `StockPlatform freshness is blocked: core_margin_short_daily_missing,ai_recommendations_stale`. `stock.wooo.work`, `/healthz`, and `/api/healthz` all return `200`, and live source `/home/wooo/stockplatform-v2` matches Gitea `main` `c67a2cf5aef3f15f14c99941a1615d1c809bac33`, so this is a data-freshness blocker, not a route or source-version blocker. Public routes now covered by the wrapper include AWOOOI, VibeWork, AwoooGo, 2026FIFA, Agent Bounty, MOMO, Stock, Bitan, TsenYang, VTuber, Gitea, Harbor, Registry, Sentry, SigNoz, Langfuse, and AIOps. Do not declare "all products and data are latest" until StockPlatform freshness is `ok`; keep DR blocked until `escrow_missing=0`.
2026-06-25 20:11 StockPlatform cron-source recovery supersedes the 19:35 source-version wording. StockPlatform Gitea `main` and live `/home/wooo/stockplatform-v2` are now at `fb91aa4c6272469d1d26e0820169629eac17d28a fix(ops): restore production cron recovery entrypoints`; six missing production cron entrypoint scripts are restored, `run-intelligence-sync.sh` contains the Docker-backed `psql` shim, and live contract check confirms every `scripts/ops/*.sh` referenced by `install-production-cron.sh` exists. The only live write performed for StockPlatform recovery was a fast-forward `git pull --ff-only origin main` on 110; no Docker/systemd/Nginx/firewall/K8s restart, manual ingestion run, manual DB write, or secret read was performed. Natural cron evidence after the pull is now green for the repaired entrypoints: `source-remediation-queue` 19:56 and 20:00 succeeded, `market-index-ingestion` 20:00 succeeded, `price-ingestion` 20:02 succeeded, `margin-short-ingestion` 20:05 succeeded, `chips-ingestion` 20:06 succeeded, and `ai-recommendation-pipeline` 20:10 ran but correctly produced the internal blocker `core_margin_short_daily_incomplete,official_margin_short_daily_official_pending`. StockPlatform `/api/v1/system/freshness` therefore still returns `status=blocked` because the 2026-06-25 official margin-short source is pending and `ai.recommendations` must stay on 2026-06-24 until that gate clears. This is no longer a route, source-version, or missing-cron-script blocker; it is a product-data freshness blocker waiting on official source availability and the next valid AI pipeline run.
2026-06-25 19:35 product-version / data-freshness refresh supersedes the 19:06 data-complete wording. Host boot, K3s, AWOOOI runtime, MOMO service/data, backup/offsite, Bitan cleanliness, and expanded public routes are available, but the stricter post-start wrapper now checks StockPlatform `/api/v1/system/freshness` and correctly returns `RESULT=BLOCKED` when product data is not current. The 19:35 lightweight wrapper run used `--skip-cold-start --skip-backup --skip-cpu` after the 19:24 full host/cold-start/backup readback and returned `PASS=31 WARN=1 BLOCKED=1`, with the single blocker `StockPlatform freshness is blocked: core_margin_short_daily_missing,ai_recommendations_stale`. `stock.wooo.work`, `/healthz`, and `/api/healthz` all return `200`; public routes now covered by the wrapper include AWOOOI, VibeWork, AwoooGo, 2026FIFA, Agent Bounty, MOMO, Stock, Bitan, TsenYang, VTuber, Gitea, Harbor, Registry, Sentry, SigNoz, Langfuse, and AIOps. Do not declare "all products and data are latest" until StockPlatform freshness is `ok`; keep DR blocked until `escrow_missing=0`.
2026-06-25 19:06 post-CD live read-only refresh supersedes the 18:53 wrapper wording. Consecutive main pushes caused older deploy markers to be replaced, so the latest production truth is deploy marker `d8ca8224 chore(cd): deploy 9dbe044 [skip ci]`. Read-only ArgoCD shows `awoooi-prod Synced / Healthy` at revision `d8ca822422021d0fda8da8fa4c354c0c4db7ff22`; API/Web/Worker live image tag `9dbe044ea1e8e3894ccbeb5ed760bb124b87f7be`; API `2/2`, Web `2/2`, Worker `1/1`. The 19:05 post-start quick check returns `RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, delegated cold-start remains `PASS=89 WARN=0 BLOCKED=0`, and 19:05-19:06 route stability checks confirm AWOOOI API, IwoooS, AwoooGo, Stock, VibeWork, Bitan, and MOMO health all return `200` for five consecutive external reads. Earlier AwoooGo / Stock `502` reads were post-deploy upstream warmup transients, not persistent service failures. Hosts, routes, K3s, AWOOOI API health, MOMO service health, MOMO business data freshness, backup core/offsite, and core monitoring/exporter surfaces are green for controlled runner/CD release. MOMO is healthy on `V10.690`; latest import job `57` completed cleanly; `MOMO_DAILY_FRESHNESS 1|2026-06-24`; current-month daily snapshot and realtime tables match through `2026-06-24`. `post-start-quick-check.sh` parses cold-start `PASS / WARN / BLOCKED` summary before classifying exit codes, so WARN-only rollout/stale evidence is no longer inflated into a service blocker. The wrapper returns `RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED` when service blockers are zero but `escrow_missing=5` remains. Do not turn this into a DR complete or security/runtime acceptance claim. Wazuh production routes are now `200 disabled_waiting_iwooos_wazuh_owner_gate`, but `configured=false`, manager query accepted `0`, manager registry accepted `0`, and runtime gate `0`; treat Wazuh as a security registry evidence blocker, not a reboot service blocker.
@@ -22,12 +24,12 @@ Live cold-start read-only check: 2026-06-25 19:05 wrapper delegated cold-start P
Post-start quick check: 2026-06-25 19:05 PASS=18 WARN=3 BLOCKED=0; warning split SERVICE=0 BOUNDARY=1 EVIDENCE=2; Result=FULL_STACK_GREEN_DR_ESCROW_BLOCKED; exit code 0.
Repo-side cold-start v1.42+ live read-only run: MOMO source absence / stale data blocker is cleared by import job 57 and `MOMO_DAILY_FRESHNESS 1|2026-06-24`. Live 110 script sync is not claimed until a separate approved deployment/sync happens.
110 live-sync parity: 2026-06-24 23:15 read-only `verify-cold-start-monitor-deploy.sh` correctly BLOCKED because repo script hash `f60b81029969a527dc742ebc9558d2933f11fe24ec4f46f7a7bc6637759b7b05` differs from 110 live hash `10608873d406911a519afa96218abebc2b85ab6123bdf46b6e21eb269e554bb8`. Do not use live 110 monitor output to prove v1.42 behavior until the approved live-sync gate in §13.3.1 passes.
Service state: HOST_AND_CORE_SERVICE_GREEN_STOCK_DATA_BLOCKED_DR_ESCROW_BLOCKED; 110/120/121/188 reachable, K3s mon/mon1 Ready, public routes/TLS green, MOMO data fresh, 110/188 backup health fresh, 188 node-exporter / PostgreSQL exporter / Redis exporter restored, 188 MinIO endpoint and Velero BackupStorageLocation restored, 110 disk pressure cleared. Product-data completeness is blocked by StockPlatform freshness.
Service state: HOST_AND_CORE_SERVICE_GREEN_STOCK_CRON_SOURCE_REPAIRED_STOCK_DATA_BLOCKED_DR_ESCROW_BLOCKED; 110/120/121/188 reachable, K3s mon/mon1 Ready, public routes/TLS green, MOMO data fresh, 110/188 backup health fresh, 188 node-exporter / PostgreSQL exporter / Redis exporter restored, 188 MinIO endpoint and Velero BackupStorageLocation restored, 110 disk pressure cleared. StockPlatform production cron source drift is repaired and verified by natural cron runs; product-data completeness is still blocked by official margin-short source availability and AI recommendation freshness.
Runtime release state: API/Web/Worker live image tag is `9dbe044ea1e8e3894ccbeb5ed760bb124b87f7be`, and 19:06 K3s readback shows API/Web/Worker pods Running; production API health returns healthy with `environment=prod`, `mock_mode=false`, and postgresql / redis / openclaw / signoz / gcp ollama providers up. 19:05 route smoke returned 200 for AWOOOI API, IwoooS, MOMO health, and Stock; cold-start route gate also returned expected statuses for AWOOOI web, MOMO, Gitea, Harbor, Registry, Sentry, SigNoz, Langfuse, Bitan, and AIOps. AwoooGo, Stock, AWOOOI API, IwoooS, VibeWork, MOMO health, and Bitan then returned 200 for five consecutive external route reads from 19:05:38 to 19:06:24. 19:35 expanded route readback returned expected 2xx/3xx for AWOOOI, VibeWork, AwoooGo, 2026FIFA, Agent Bounty, MOMO, Stock, Bitan, TsenYang, VTuber, Gitea, Harbor, Registry, Sentry, SigNoz, Langfuse, and AIOps. Cold-start raw route gate returned all expected route statuses, including redirects such as awoooi web=307 and sentry=302.
MOMO release state: mo.wooo.work health is healthy on version V10.690. `momo-pro-system`, `momo-scheduler`, and `momo-telegram-bot` are healthy; scheduler `RestartCount=0`. 18:23 dedicated preflight returns PASS=19 WARN=2 BLOCKED=0, so retain recent container-replace / scheduler fail-closed / notification evidence notes, but no service blocker remains.
MOMO data state: current-month daily_sales_snapshot and realtime_sales_monthly match through 2026-06-24: `daily_sales_snapshot=109061|2025-07-01|2026-06-24`, `MOMO_MONTHLY_SYNC 15383|15383|2026-06-01|2026-06-24|2026-06-01|2026-06-24`, and `MOMO_DAILY_FRESHNESS 1|2026-06-24`. Latest import job is `57 completed|即時業績_當日.xlsx|2026-06-25T13:16:47.359958|2026-06-25T13:18:02.964985|15383|15383|0`.
StockPlatform data state: `/api/v1/system/freshness` returns `status=blocked`, `latest_trading_date=2026-06-25`, blockers `core_margin_short_daily_missing,ai_recommendations_stale`. Current OK sources include `core.price_daily` 2026-06-25 / 1976 rows, `core.chips_daily` 2026-06-25 / 1976 rows, `core.market_index_daily.tw` 2026-06-25 / 2 rows, and `core.market_index_daily.global` 2026-06-25 / 2001 rows. Blocked sources are `core.margin_short_daily` latest 2026-06-24 / 0 rows for 2026-06-25 and `ai.recommendations` latest 2026-06-24 / 2748 rows. Do not claim StockPlatform data or AI recommendations are latest until this endpoint is `ok`.
Product version readback: StockPlatform live source `/home/wooo/stockplatform-v2` matches Gitea `wooo/stockplatform-v2.git` main `c67a2cf5aef3f15f14c99941a1615d1c809bac33`; VibeWork live image `192.168.0.110:5000/vibework/web:76a4ee15026af278a3660ad4b4547e9308b107be` matches Gitea `wooo/vibework.git` main `76a4ee15026af278a3660ad4b4547e9308b107be`; AwoooGo live source `/home/wooo/awooogo` matches Gitea `wooo/AwoooGo` main `6897972e9820cbb96c508fa9a80c66246c973307`; MOMO runtime uses `registry.wooo.work/wooo/momo-pro-system:stable` image id `df931906e158` created `2026-06-25T13:28:59+08:00`, while Gitea `wooo/momo-pro-system.git` main is `25120cbf21ba51affc94d0220ec87e607f59a833`; 188 runtime directory is a compose/image deployment path, not a git worktree, so add image revision label evidence before declaring code-image parity.
StockPlatform data state: `/api/v1/system/freshness` returns `status=blocked`, `latest_trading_date=2026-06-25`, blockers `core_margin_short_daily_missing,ai_recommendations_stale`. Current OK sources include `core.price_daily` 2026-06-25 / 1976 rows, `core.chips_daily` 2026-06-25 / 1976 rows, `core.market_index_daily.tw` 2026-06-25 / 2 rows, and `core.market_index_daily.global` 2026-06-25 / 2001 rows. Direct DB readback is `price|2026-06-25|1976`, `chips|2026-06-25|1976`, `margin|2026-06-24|1976`, `ai_recommendations|2026-06-24|120`. The 20:05 margin ingestion ran successfully but returned `row_count=0` with official pending evidence for 2026-06-25; the 20:10 AI pipeline succeeded at the cron/job layer but remained blocked by `core_margin_short_daily_incomplete,official_margin_short_daily_official_pending`. Do not claim StockPlatform data or AI recommendations are latest until this endpoint is `ok`.
Product version readback: StockPlatform live source `/home/wooo/stockplatform-v2` matches Gitea `wooo/stockplatform-v2.git` main `fb91aa4c6272469d1d26e0820169629eac17d28a`; VibeWork live image `192.168.0.110:5000/vibework/web:76a4ee15026af278a3660ad4b4547e9308b107be` matches Gitea `wooo/vibework.git` main `76a4ee15026af278a3660ad4b4547e9308b107be`; AwoooGo live source `/home/wooo/awooogo` matches Gitea `wooo/AwoooGo` main `6897972e9820cbb96c508fa9a80c66246c973307`; MOMO runtime uses `registry.wooo.work/wooo/momo-pro-system:stable` image id `df931906e158` created `2026-06-25T13:28:59+08:00`, while Gitea `wooo/momo-pro-system.git` main is `25120cbf21ba51affc94d0220ec87e607f59a833`; 188 runtime directory is a compose/image deployment path, not a git worktree, so add image revision label evidence before declaring code-image parity.
Google Drive / source-file state: 14:16 cold-start reports `MOMO_GDRIVE_TOKEN_STAT 100000:100000:600 scheduler_uid=100000`. Dedicated preflight confirms host token metadata matches scheduler UID and restrictive mode; container token artifact exists with mode `600`. Token content was not read. Future Drive auth/API failure must still be treated as failed import evidence rather than no-file success.
110 CPU/load readback: 2026-06-25 10:58 user-approved minimal SIGTERM targeted only orphan `stockplatform-review-bulk-ux` Chrome process groups `438005`, `471295`, `640155`, and `670628`; `OLD_GROUPS_REMAINING` returned empty. 19:05 readback shows current higher load is mainly Gitea Actions cache save / `zstdmt` / `tar`, StockPlatform headless Chrome smoke / CI, Gitea, AWOOOI API, ClickHouse, Docker, and platform services. No Docker/systemd/Nginx/firewall/K8s write was performed; do not cancel active CI/smoke unless separately approved. If Chrome groups are active children of Playwright / CI, observe queue and timeout; if they become PPID 1 orphan process groups with sustained CPU and no parent smoke, run dry-run and require owner approval before targeted `SIGTERM`.
Backup / monitoring state: 19:05 wrapper readback confirms backup core blockers are 0, 110 is 13/13 fresh failed=0, 188 is 2/2 fresh failed=0, offsite_fresh=1, rclone_gdrive_fresh=1, integrity_stale=0, last aggregate is 2026-06-25 02:35:09, and escrow_missing=5.
@@ -1,6 +1,6 @@
# Post-reboot one-page overall check

-> Version: v1.2
+> Version: v1.3
> Last updated: 2026-06-25 Asia/Taipei
> Scope: 110 / 120 / 121 / 188 post-reboot service recovery. 112 Kali / Wazuh / active scan are not part of this procedure.

@@ -110,6 +110,14 @@ curl -k -sS https://stock.wooo.work/api/v1/system/freshness

`stock.wooo.work`, `/healthz`, and `/api/healthz` all returning 200 only means the service is alive; when `/api/v1/system/freshness` returns `blocked`, do not claim that StockPlatform data is latest.

If the freshness blocker is `core_margin_short_daily_missing` / `ai_recommendations_stale`, assess it layer by layer first:

- route / health 200: only means the StockPlatform service is up.
- live source must match Gitea `main`; the known recovery baseline as of 2026-06-25 20:11 is `fb91aa4c6272469d1d26e0820169629eac17d28a`.
- All production cron entrypoints referenced by `scripts/ops/install-production-cron.sh` must exist; otherwise the problem is source drift, not official data.
- Natural cron runs must no longer show `script_exit_127`. On 2026-06-25 19:56-20:10 the source remediation / market index / price / margin / chips / AI pipeline entrypoints were verified as executable.
- If margin ingestion returns status=0 but `row_count=0` with official pending shown, the blocked AI pipeline is a correct fail-closed; do not use a manual DB restore or fake freshness to force it green.
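
The layering above can be read as an ordered decision: rule out the nearest layers first, and only attribute the blocker to the official source once every earlier layer is clean. A minimal sketch, assuming the seven inputs have already been collected read-only; the function name and every label except `BLOCKED_EXTERNAL_SOURCE` are hypothetical.

```shell
# Sketch only. Arguments, in order:
#   ROUTE_HTTP LIVE_SHA MAIN_SHA MISSING_SCRIPTS EXIT_127_RUNS MARGIN_ROWS FRESHNESS_STATUS
# Prints a single label naming the first layer that fails.
classify_stock_freshness() {
  local route_http="$1" live_sha="$2" main_sha="$3"
  local missing_scripts="$4" exit_127_runs="$5" margin_rows="$6" freshness="$7"

  if [ "$route_http" != "200" ]; then echo "SERVICE_DOWN"; return; fi
  if [ "$live_sha" != "$main_sha" ]; then echo "SOURCE_DRIFT"; return; fi
  if [ "$missing_scripts" -gt 0 ]; then echo "CRON_ENTRYPOINT_MISSING"; return; fi
  if [ "$exit_127_runs" -gt 0 ]; then echo "CRON_RUN_EXIT_127"; return; fi
  if [ "$freshness" = "ok" ]; then echo "FRESH"; return; fi
  # Every local layer is clean: zero margin rows points at the official source.
  if [ "$margin_rows" -eq 0 ]; then echo "BLOCKED_EXTERNAL_SOURCE"; return; fi
  echo "BLOCKED_UNCLASSIFIED"
}
```

With the 20:11 evidence (route 200, matching commits, no missing scripts, no exit 127 runs, `row_count=0`, freshness `blocked`) this yields `BLOCKED_EXTERNAL_SOURCE`, whereas the pre-repair state would have stopped at one of the source or cron layers.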

### Step 5 - Backup / offsite / escrow

Run read-only on 110:

@@ -11,11 +11,11 @@
| Area | Status | Completion | Evidence |
|------|--------|------------|----------|
| Overall recovery readiness | HOST_AND_CORE_SERVICE_GREEN_STOCK_DATA_BLOCKED_DR_ESCROW_BLOCKED | 96% | 2026-06-25 19:24 full post-start readback showed hosts / K3s / AWOOOI / MOMO / backup / offsite service gates green and `escrow_missing=5`; 2026-06-25 19:35 stricter product-data wrapper returned `POST_START_QUICK_CHECK PASS=31 WARN=1 BLOCKED=1`, result `BLOCKED`, because StockPlatform `/api/v1/system/freshness` is `blocked` with `core_margin_short_daily_missing,ai_recommendations_stale`. Expanded public route smoke covers AWOOOI, VibeWork, AwoooGo, 2026FIFA, Agent Bounty, MOMO, Stock, Bitan, TsenYang, VTuber, Gitea, Harbor, Registry, Sentry, SigNoz, Langfuse, and AIOps; all returned expected 2xx/3xx. MOMO remains fresh through `2026-06-24` with latest job `57` completed cleanly, and Bitan public-content cleanliness direct check passed. Do not declare "all products/data latest" until StockPlatform freshness is `ok`; do not declare DR complete until `escrow_missing=0`. |
| Overall recovery readiness | HOST_AND_CORE_SERVICE_GREEN_STOCK_CRON_SOURCE_REPAIRED_STOCK_DATA_BLOCKED_DR_ESCROW_BLOCKED | 97% | 2026-06-25 19:24 full post-start readback showed hosts / K3s / AWOOOI / MOMO / backup / offsite service gates green and `escrow_missing=5`; 2026-06-25 19:35 stricter product-data wrapper returned `POST_START_QUICK_CHECK PASS=31 WARN=1 BLOCKED=1`, result `BLOCKED`, because StockPlatform `/api/v1/system/freshness` is `blocked` with `core_margin_short_daily_missing,ai_recommendations_stale`. 2026-06-25 20:11 follow-up repaired the StockPlatform production cron source drift: Gitea and live `/home/wooo/stockplatform-v2` are `fb91aa4c6272469d1d26e0820169629eac17d28a`, six missing cron entrypoints are present, live cron contract covers all referenced scripts, and natural cron runs at 19:56 / 20:00 / 20:02 / 20:05 / 20:06 / 20:10 no longer fail with `script_exit_127`. Expanded public route smoke covers AWOOOI, VibeWork, AwoooGo, 2026FIFA, Agent Bounty, MOMO, Stock, Bitan, TsenYang, VTuber, Gitea, Harbor, Registry, Sentry, SigNoz, Langfuse, and AIOps; all returned expected 2xx/3xx. MOMO remains fresh through `2026-06-24` with latest job `57` completed cleanly, and Bitan public-content cleanliness direct check passed. Do not declare "all products/data latest" until StockPlatform freshness is `ok`; do not declare DR complete until `escrow_missing=0`. |
| P0 host / K3s recovery | DONE | 100% | 120 booted after console fsck at `2026-06-12 15:13`; latest 2026-06-25 09:05 readback shows 120 is reachable, K3s is active, `mon` and `mon1` are both `Ready control-plane`, VIP `192.168.0.125` is present, node filesystem / disk-pressure / readonly events are `0`, and latest `km-vectorize-29705460-55rgs` completed. |
| P1 backup / alert / escrow | BLOCKED_DR_ESCROW | 97% | 2026-06-25 19:17 backup readback shows 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, `integrity_stale=0`, `offsite_fresh=1`, `rclone_gdrive_fresh=1`, `escrow_missing=5`, last aggregate `2026-06-25 02:35:09`. 2026-06-25 19:19 offsite escrow report shows script presence OK, rclone configured, full and partial rclone markers present, `PASS=8 WARN=5 BLOCKED=0`, `ESCROW_MISSING_COUNT=5`; DR remains blocked on real non-secret credential escrow evidence IDs. |
| P2 service / data truth | BLOCKED_STOCK_DATA_FRESHNESS | 92% | Service routes and core runtime are available, but product-data truth is not complete. 2026-06-25 19:35 StockPlatform `/api/v1/system/freshness` returned `status=blocked`, `latest_trading_date=2026-06-25`, blockers `core_margin_short_daily_missing,ai_recommendations_stale`; OK sources include price / chips / market index for `2026-06-25`, while `core.margin_short_daily` and `ai.recommendations` stop at `2026-06-24`. MOMO health `V10.690`, current-month parity `15383|15383|2026-06-01|2026-06-24|2026-06-01|2026-06-24`, and `DB_DAILY_FRESHNESS 1|2026-06-24` are green; Bitan public-content cleanliness direct check passed; expanded public routes are green. |
| P3 docs / automation contracts | DONE_WITH_PRODUCT_DATA_GATE_V157 | 100% | Workplan, SOP v1.57, one-page post-start quick check v1.2, expanded public route list, StockPlatform freshness gate, baseline `stockplatform_system_freshness_ok`, BACKUP-STATUS, LOGBOOK, 120 console/fsck recovery, Gitea backup stale-dump hardening, reboot ledger/version-comparison SOP, escrow evidence audit, 188 nginx Ansible baseline, 110 cold-start detector script, startup judgment layers, GO/NO-GO tree, host recovery cards, explicit Plan B degraded-operation path, machine-readable `plan_b` baseline, readiness-audit Plan B guard, B0-B5 service levels, T+0/T+120 fallback timeline checks, host role / load-balancing assessment, CD `known_hosts` guardrail, `fwupd-refresh.timer` rollback note, K3s filesystem event blocker, AWOOOI backup no-direct-offsite-sync contract, 110/188 Ansible source-of-truth, Gitea self-hosted readiness validation workflow, post-CD no-regression readbacks, stale-vs-active K8s failed Job classification, 110 runaway browser / CI load AIOps exporter + alert + gated remediation PlayBook, Telegram / AI event packet mapping, healthy heartbeat Telegram suppression, MOMO scheduler / current-month detector fix, exporter restore helpers, 110 Docker disk pressure cleanup boundary, notification-noise readback, MOMO import-boundary / Drive-auth fail-closed deploys, product version/readback matrix, and 2026-06-25 stricter product-data gate are updated. Live 110 script sync remains a separate approved live-write gate; do not claim it here. |
| P2 service / data truth | BLOCKED_STOCK_DATA_FRESHNESS | 94% | Service routes and core runtime are available, and StockPlatform cron-source drift is repaired, but product-data truth is not complete. 2026-06-25 20:11 StockPlatform `/api/v1/system/freshness` still returned `status=blocked`, `latest_trading_date=2026-06-25`, blockers `core_margin_short_daily_missing,ai_recommendations_stale`; OK sources include price / chips / market index for `2026-06-25`, while `core.margin_short_daily` and `ai.recommendations` stop at `2026-06-24`. Direct DB readback is `price|2026-06-25|1976`, `chips|2026-06-25|1976`, `margin|2026-06-24|1976`, `ai_recommendations|2026-06-24|120`; 20:05 margin ingestion ran successfully but official source was still pending, and 20:10 AI pipeline correctly blocked on the margin gate. MOMO health `V10.690`, current-month parity `15383|15383|2026-06-01|2026-06-24|2026-06-01|2026-06-24`, and `DB_DAILY_FRESHNESS 1|2026-06-24` are green; Bitan public-content cleanliness direct check passed; expanded public routes are green. |
|
||||
| P3 docs / automation contracts | DONE_WITH_PRODUCT_DATA_GATE_V158 | 100% | Workplan, SOP v1.58, one-page post-start quick check v1.3, expanded public route list, StockPlatform freshness gate, StockPlatform cron-source recovery evidence, baseline `stockplatform_system_freshness_ok`, BACKUP-STATUS, LOGBOOK, 120 console/fsck recovery, Gitea backup stale-dump hardening, reboot ledger/version-comparison SOP, escrow evidence audit, 188 nginx Ansible baseline, 110 cold-start detector script, startup judgment layers, GO/NO-GO tree, host recovery cards, explicit Plan B degraded-operation path, machine-readable `plan_b` baseline, readiness-audit Plan B guard, B0-B5 service levels, T+0/T+120 fallback timeline checks, host role / load-balancing assessment, CD `known_hosts` guardrail, `fwupd-refresh.timer` rollback note, K3s filesystem event blocker, AWOOOI backup no-direct-offsite-sync contract, 110/188 Ansible source-of-truth, Gitea self-hosted readiness validation workflow, post-CD no-regression readbacks, stale-vs-active K8s failed Job classification, 110 runaway browser / CI load AIOps exporter + alert + gated remediation PlayBook, Telegram / AI event packet mapping, healthy heartbeat Telegram suppression, MOMO scheduler / current-month detector fix, exporter restore helpers, 110 Docker disk pressure cleanup boundary, notification-noise readback, MOMO import-boundary / Drive-auth fail-closed deploys, product version/readback matrix, and 2026-06-25 stricter product-data gate are updated. Live 110 script sync remains a separate approved live-write gate; do not claim it here. |
|
||||
|
||||
2026-06-25 19:06 post-CD wrapper readback supersedes the 18:53 wording: consecutive main pushes created a deploy storm where older deploy markers were superseded by later commits. Latest production truth is deploy marker `d8ca8224 chore(cd): deploy 9dbe044 [skip ci]`, ArgoCD `Synced / Healthy`, API/Web/Worker image tag `9dbe044ea1e8e3894ccbeb5ed760bb124b87f7be`, direct route smoke `200` for AWOOOI API / IwoooS / VibeWork / AwoooGo / MOMO health / Stock / Bitan, expected route-gate statuses for MOMO / Gitea / Harbor / Registry / Sentry / SigNoz / Langfuse / AIOps, and wrapper `POST_START_QUICK_CHECK PASS=18 WARN=3 BLOCKED=0`. Repo-side cold-start returns `PASS=89 WARN=0 BLOCKED=0`; `/backup/scripts/backup-status.sh --no-notify --no-refresh` reports 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, `integrity_stale=0`, `offsite_fresh=1`, `rclone_gdrive_fresh=1`, `escrow_missing=5`; MOMO dedicated preflight returns `PASS=19 WARN=2 BLOCKED=0`; MOMO health is `V10.690`; AwoooGo / Stock transient 502 reads cleared after upstream warmup and five consecutive route reads returned `200`. 110 load is around `14.51 / 12.34 / 11.42`; the visible contributors are Gitea Actions cache save / `zstdmt` / `tar`, StockPlatform headless Chrome smoke / CI, Gitea, AWOOOI API, ClickHouse, Docker, and platform services, so this load is not an AWOOOI service blocker. Wrapper result is `FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, not `DEGRADED`, because service warnings are `0` and only DR boundary / evidence warnings remain. Wazuh route readback is now `200 disabled_waiting_iwooos_wazuh_owner_gate`, but manager registry accepted remains `0`, so Wazuh is a security registry evidence blocker rather than a reboot service blocker.
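
The wrapper classification described above can be sketched as follows. This is a minimal illustration, not the wrapper's actual code: the function names, the precedence order of the checks, and the `FULL_STACK_GREEN` fallback label are assumptions; only the summary-line format and the `BLOCKED`, `DEGRADED`, and `FULL_STACK_GREEN_DR_ESCROW_BLOCKED` labels come from the readbacks in this log.

```python
import re
from typing import Dict

# Matches summary lines such as "POST_START_QUICK_CHECK PASS=18 WARN=3 BLOCKED=0".
SUMMARY = re.compile(r"PASS=(\d+) WARN=(\d+) BLOCKED=(\d+)")


def parse_summary(line: str) -> Dict[str, int]:
    """Extract the PASS / WARN / BLOCKED counters from a quick-check summary line."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError(f"no PASS/WARN/BLOCKED counters in: {line!r}")
    passed, warned, blocked = (int(group) for group in match.groups())
    return {"pass": passed, "warn": warned, "blocked": blocked}


def classify(summary: Dict[str, int], service_warnings: int, escrow_missing: int) -> str:
    """Assumed precedence: blockers first, then service warnings, then DR escrow."""
    if summary["blocked"] > 0:
        return "BLOCKED"
    if service_warnings > 0:
        return "DEGRADED"
    if escrow_missing > 0:
        return "FULL_STACK_GREEN_DR_ESCROW_BLOCKED"
    return "FULL_STACK_GREEN"  # hypothetical label for the fully green case


if __name__ == "__main__":
    counters = parse_summary("POST_START_QUICK_CHECK PASS=18 WARN=3 BLOCKED=0")
    print(classify(counters, service_warnings=0, escrow_missing=5))
```

Service warnings are passed in separately from the raw `WARN` counter because the 19:06 readback has `WARN=3` with zero service warnings; the three warnings are DR boundary / evidence items.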
|
||||
|
||||
@@ -23,6 +23,8 @@ Full cold-start service readiness may now be declared GREEN for the latest verif
|
||||
|
||||
2026-06-25 19:35 stricter product-data gate readback supersedes the earlier "all product data green" interpretation. The full host/cold-start/backup layer remains green from the 19:24 read-only evidence, but the updated quick check now includes StockPlatform `/api/v1/system/freshness` and therefore blocks on product-data completeness: `POST_START_QUICK_CHECK PASS=31 WARN=1 BLOCKED=1`, `RESULT=BLOCKED`, blocker `core_margin_short_daily_missing,ai_recommendations_stale`. This is a correct no-false-green outcome: `stock.wooo.work`, `/healthz`, and `/api/healthz` all return `200`, but StockPlatform data and AI recommendations are not latest. Next action is a separate StockPlatform data freshness remediation lane; do not solve it by host reboot, Nginx reload, Docker restart, or route-only smoke.
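
A minimal sketch of the no-false-green rule described above: route status alone never produces a pass, and the freshness payload must also be clean. The payload shape (`status` and `blockers` keys), the `ok` status value, and every name in the code are assumptions for illustration; the real `/api/v1/system/freshness` response format is not shown in this log.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class GateResult:
    result: str                 # "PASS" or "BLOCKED"
    reasons: Tuple[str, ...]    # empty when the gate passes


def evaluate_product_gate(route_statuses: Dict[str, int], freshness: dict) -> GateResult:
    """Combine route smoke with data freshness; both must be clean to pass."""
    reasons: List[str] = []
    # Any non-200 route blocks the gate.
    for route, code in sorted(route_statuses.items()):
        if code != 200:
            reasons.append(f"route_not_200:{route}:{code}")
    # Assumed payload shape: {"status": "ok" | "blocked", "blockers": [...]}.
    if freshness.get("status") != "ok":
        reasons.extend(freshness.get("blockers") or ["freshness_status_not_ok"])
    return GateResult("BLOCKED" if reasons else "PASS", tuple(reasons))


if __name__ == "__main__":
    routes = {"/": 200, "/healthz": 200, "/api/healthz": 200}
    payload = {
        "status": "blocked",
        "blockers": ["core_margin_short_daily_missing", "ai_recommendations_stale"],
    }
    print(evaluate_product_gate(routes, payload))
```

With all three routes at `200` and the blocked payload from the 19:35 readback, the sketch returns `BLOCKED` with both blocker codes, which matches the outcome recorded here.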
|
||||
|
||||
2026-06-25 20:11 StockPlatform cron-source recovery closeout: one root cause of several StockPlatform stale/old-data symptoms was production source drift. Cron referenced scripts that were absent from live `/home/wooo/stockplatform-v2`, producing `script_exit_127` for source remediation, market index, price ingestion, and related monitors. Commit `fb91aa4c6272469d1d26e0820169629eac17d28a fix(ops): restore production cron recovery entrypoints` was pushed to `gitea/main` and fast-forward pulled on 110 only. Live post-pull checks confirm that all referenced cron scripts exist and that `bash -n` passes. Natural cron runs then recovered: source remediation succeeded at 19:56 and 20:00, market index at 20:00, price at 20:02, margin at 20:05, and chips at 20:06; the 20:10 AI pipeline succeeded at cron/job level while correctly blocking because the official margin-short source was still pending. The remaining blocker is the official 2026-06-25 margin-short data and the dependent AI recommendation freshness, not source-version drift. The next natural follow-up is the 21:00 `intelligence-sync` run, which should prove the restored Docker-backed `psql` shim without manual production writes.
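
The source-contract check described above (every cron-referenced script exists and parses) could be sketched like this. It is an illustration under stated assumptions, not the check that was actually run: the regex, the function names, and the idea of scanning the installer text for `scripts/ops/*.sh` references are hypothetical. `bash -n` is the standard parse-only syntax check and does not execute the script.

```python
import re
import subprocess
from pathlib import Path
from typing import List

# Assumed reference format: relative paths such as scripts/ops/run-price-ingestion.sh.
SCRIPT_REF = re.compile(r"scripts/ops/[A-Za-z0-9_.-]+\.sh")


def referenced_scripts(installer_text: str) -> List[str]:
    """Return the unique script paths mentioned in the cron installer, sorted."""
    return sorted(set(SCRIPT_REF.findall(installer_text)))


def missing_scripts(repo_root: Path, installer_text: str) -> List[str]:
    """List referenced scripts that do not exist under repo_root (exit 127 candidates)."""
    return [rel for rel in referenced_scripts(installer_text)
            if not (repo_root / rel).is_file()]


def syntax_ok(script: Path) -> bool:
    """Parse the script with `bash -n`; the script itself is never executed."""
    completed = subprocess.run(["bash", "-n", str(script)], capture_output=True)
    return completed.returncode == 0
```

A read-only run would call `missing_scripts(Path("/home/wooo/stockplatform-v2"), installer_text)` and expect an empty list, then call `syntax_ok` on each patched script. Neither step writes to the live tree.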
|
||||
|
||||
2026-06-13 01:26 refresh: full cold-start is again green for the current evidence set. AWOOOI API/Web workload balancing survived the next normal CD deploy: Gitea main `e4a349bc`, ArgoCD revision `e4a349bc`, images from `414413a5`, API/Web split across `mon` / `mon1`, and global `known_hosts` retained 120 / 188 after CD fix `80e6ec1a`. Do not declare DR complete while credential escrow is missing. `km-vectorize` remediation is `90%`: schedule/label fix is live, and the remaining gate is the next official 03:00 CronJob success readback.
|
||||
|
||||
---
|
||||
|
||||