Compare commits

3 commits · `codex/gith` ... `codex/110-`

| SHA1 |
|---|
| 03f39d3c58 |
| 073141abcb |
| 4c951b2996 |
`.gitea/workflows/ansible-lint.yml`

```diff
@@ -26,7 +26,7 @@ on:
 
 jobs:
   validate:
-    runs-on: self-hosted
+    runs-on: awoooi-ubuntu
     timeout-minutes: 15
     steps:
       - uses: actions/checkout@v4
```
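The `self-hosted` → `awoooi-ubuntu` change in `ansible-lint.yml` is the kind of label drift that the `rg "runs-on: ..."` verification in the worklog guards against. Below is a minimal Python sketch of the same allowlist idea. It assumes single-line `runs-on: <label>` values and treats `awoooi-ubuntu` / `awoooi-host` as the only allowed labels. `find_disallowed_runs_on` is a hypothetical helper and is not part of `ops/runner/audit-workflow-labels.py`.

```python
import re

# The only AWOOI runner labels left after the 2026-06-27 consolidation.
ALLOWED_LABELS = {"awoooi-ubuntu", "awoooi-host"}

# Assumption: only single-line `runs-on: <label>` values are checked.
# Multi-line list forms are not parsed. Flow lists like `[a, b]` are captured
# verbatim (e.g. "[a,") and therefore flagged.
RUNS_ON_RE = re.compile(r"^[ \t]*runs-on:[ \t]*['\"]?([^'\"\s#]+)", re.MULTILINE)


def find_disallowed_runs_on(workflow_text: str) -> list[str]:
    """Return runs-on labels that are not in the AWOOI allowlist, in file order."""
    return [
        label
        for label in RUNS_ON_RE.findall(workflow_text)
        if label not in ALLOWED_LABELS
    ]


if __name__ == "__main__":
    before = "jobs:\n  validate:\n    runs-on: self-hosted\n"
    after = "jobs:\n  validate:\n    runs-on: awoooi-ubuntu\n"
    print(find_disallowed_runs_on(before))  # ['self-hosted']
    print(find_disallowed_runs_on(after))   # []
```

A regex keeps the sketch dependency-free. A YAML parser would be the more robust choice if `runs-on` ever uses lists or expressions.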
`.gitea/workflows/cd.yaml`

```diff
@@ -1245,6 +1245,12 @@ jobs:
 
       - uses: actions/checkout@v4
 
+      - name: Wait for Host Web Build Pressure
+        # 2026-06-27 Codex: post-deploy Playwright smoke is browser-heavy too.
+        # Refuse to add another smoke run while 110 already has CI/build/smoke
+        # pressure; this gate is read-only and never kills other repo work.
+        run: bash scripts/ci/wait-host-web-build-pressure.sh
+
       - name: Get Commit Info
         id: commit
         run: |
```
```diff
@@ -1,3 +1,82 @@
+## 2026-06-27|MOMO daily-sales source absence readback and cold-start blocker
+
+**Background**: After the 110 runner / StockPlatform smoke pressure was contained, the all-host cold-start scorecard and data freshness checks were rerun. The main public routes of AWOOOI / IwoooS / Stock / 188 are available, but overall cold-start still cannot be declared full green; the main business-data blocker is currently 188 MOMO daily sales freshness.
+
+**Execution boundary**:
+- This round only did read-only preflight, log readback, source search at the filename / mtime / size level, and scorecard aggregation.
+- No DB write / truncate / restore / manual import, no Drive files moved, no Docker / Nginx / K3s / scheduler restarts, and no token values, raw sessions, SQLite, `.env`, or secrets read.
+
+**Cold-start / scorecard results**:
+- `scripts/reboot-recovery/post-reboot-readiness-summary.sh` artifact: `/tmp/awoooi-post-reboot-readiness-20260627-codex-rerun/summary.txt`.
+- `POST_START_RESULT=BLOCKED`, `POST_START_PASS=37`, `POST_START_WARN=3`, `POST_START_BLOCKED=2`, `SERVICE_GREEN=0`.
+- `PRODUCT_DATA_GREEN=1`, Stock freshness `ok`, latest trading date `2026-06-26`, `STOCK_BLOCKERS=none`.
+- `BACKUP_CORE_GREEN=1`, but `DR_ESCROW_BLOCKED=1` and `ESCROW_MISSING_COUNT=5`.
+- Wazuh route `200`, but `WAZUH_MANAGER_REGISTRY_ACCEPTED=0`, `WAZUH_RUNTIME_GATE=0`, `RUNTIME_ACTION_AUTHORIZED=0`.
+- Direct cold-start rerun: `PASS=88`, `WARN=0`, `BLOCKED=1`; the only blocker is `188 momo daily sales data stale beyond 3 days`.
+- The 20:48 next-gate dispatch, using the same summary, returned `DISPATCH_RC=2`, `SERVICE_GREEN=0`, `NEXT_REQUIRED_GATES=credential_escrow_evidence,wazuh_manager_registry_export`, `DISPATCH_AUTHORIZED=0`, `REQUEST_SENT_COUNT=0`, `HOST_WRITE_AUTHORIZED=0`, `SECRET_VALUE_COLLECTION_ALLOWED=0`, and stopped at `NEXT_STEP=restore_service_before_boundary_dispatch`; the escrow / Wazuh gates therefore cannot yet be treated as owner packets ready to send.
```
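Reading these scorecard results mechanically mostly comes down to extracting `KEY=VALUE` tokens. Below is a minimal sketch. It assumes the summary exposes whitespace-separated `KEY=VALUE` tokens (the real `summary.txt` layout is not shown in this log). It also uses a hypothetical fail-closed rule in which full green needs `POST_START_BLOCKED=0` and `SERVICE_GREEN=1`, with any missing key counting as not green. `parse_scorecard` and `is_full_green` are illustrative names.

```python
import re

# KEY=VALUE tokens as they appear in the readback, e.g. "SERVICE_GREEN=0".
TOKEN_RE = re.compile(r"\b([A-Z][A-Z0-9_]*)=(\S+)")


def parse_scorecard(text: str) -> dict[str, str]:
    """Collect KEY=VALUE tokens; a later token overwrites an earlier one."""
    return dict(TOKEN_RE.findall(text))


def is_full_green(card: dict[str, str]) -> bool:
    """Hypothetical fail-closed rule: missing keys never count as green."""
    return (
        card.get("POST_START_RESULT") not in (None, "BLOCKED")
        and card.get("POST_START_BLOCKED") == "0"
        and card.get("SERVICE_GREEN") == "1"
    )


if __name__ == "__main__":
    summary = (
        "POST_START_RESULT=BLOCKED POST_START_PASS=37 POST_START_WARN=3 "
        "POST_START_BLOCKED=2 SERVICE_GREEN=0"
    )
    card = parse_scorecard(summary)
    print(card["POST_START_BLOCKED"], is_full_green(card))  # 2 False
```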
```diff
+
+**MOMO readback results**:
+- `scripts/reboot-recovery/momo-drive-token-source-recovery-preflight.sh` result: `PASS=20`, `WARN=3`, `BLOCKED=2`.
+- MOMO health: local / public health both `200`, runtime version `V10.725`, app health `healthy`.
+- DB daily range: `109061|2025-07-01|2026-06-24`; freshness `3|2026-06-24`.
+- Current-month vs. sync snapshot parity: `15383|15383|2026-06-01|2026-06-24|2026-06-01|2026-06-24`.
+- Latest import job `57`: `completed|即時業績_當日.xlsx|15383|15383|0`, meaning the source imported on 2026-06-25 13:16-13:18 was processed cleanly, but the data still only reaches `2026-06-24`.
+- Drive pending intake: `LOCAL_EXACT_DAILY_SOURCE_COUNT=0`; archive / global latest evidence is still at `2026-06-25T04:21:47.000Z`.
+- The `momo-scheduler` 36h log shows a successful Google Drive connection and periodic checks of `當日業績匯入`, but it repeatedly reports that no Excel file was found; the scheduler is healthy / registered and is not the root cause of the current freshness blocker.
+- Filename searches within the safe scope on 188 and the local machine found only the old `即時業績_當日_20260112.xlsx` candidate; no legitimate daily-sales source usable for 2026-06-26 / 2026-06-27 was found.
+- The second preflight at 20:54 still shows `DRIVE_INTAKE_COUNT=0`, archive / global latest `2026-06-25T04:21:47.000Z`, DB daily freshness `3|2026-06-24`, and latest import job `57 completed`; the `momo-pro-system` / `momo-scheduler` containers are still healthy, and the recent scheduler log contains only schedule registration and routine warnings, with no evidence of a new incoming Excel file or a successful import.
+
+**DR / Wazuh gate readback**:
+- On 110, `/backup/scripts/offsite-escrow-evidence-report.sh --no-color` shows rclone offsite configured, full offsite marker fresh, and local backup repos checkable, but all 5 credential escrow markers are missing: `restic_repository_password`, `offsite_provider_credentials`, `break_glass_admin_credentials`, `dns_registrar_recovery`, `oauth_ai_provider_recovery`.
+- `scripts/security/wazuh-manager-registry-reviewer-validation.py` passes repo contract validation, but the snapshot is still `received=0 accepted=0 runtime_gate=0`; a working route / transport / index pattern cannot substitute for an accepted manager registry.
+- This round wrote no escrow marker, produced no owner response, and queried no Wazuh live API / secret; it also performed no Wazuh active response, agent re-enroll, restart, host write, or Kali active scan.
```
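The freshness value `3|2026-06-24` and the parity readback can both be checked by hand. The sketch below assumes freshness is counted in whole calendar days between the latest daily-sales date and the check date. That is consistent with `3|2026-06-24` read on 2026-06-27, but the preflight source does not confirm it. The parity check mirrors the rule used by `monthly_sync_ok()` in `momo-source-arrival-gate.py`. `freshness_days` and `monthly_parity_ok` are illustrative names.

```python
from datetime import date

FRESHNESS_MAX_DAYS = 2  # the "freshness <=2" unblock threshold stated in this log


def freshness_days(latest: date, checked_on: date) -> int:
    """Whole calendar days between the newest daily-sales row and the check date."""
    return (checked_on - latest).days


def monthly_parity_ok(readback: str) -> bool:
    """Check `snapshot|monthly|dmin|dmax|mmin|mmax`; raises ValueError unless 6 fields."""
    snap, monthly, dmin, dmax, mmin, mmax = readback.split("|")
    return (
        int(snap) > 0
        and snap == monthly
        and bool(dmin and dmax)
        and (dmin, dmax) == (mmin, mmax)
    )


if __name__ == "__main__":
    days = freshness_days(date(2026, 6, 24), date(2026, 6, 27))
    print(days, days <= FRESHNESS_MAX_DAYS)  # 3 False
    print(monthly_parity_ok("15383|15383|2026-06-01|2026-06-24|2026-06-01|2026-06-24"))  # True
```

Parity passing while freshness fails is exactly the pattern reported above: the last import was clean, but no newer source exists.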
```diff
+
+**2026-06-27 21:00 gate hardening**:
+- Added `scripts/reboot-recovery/momo-source-arrival-gate.py`, which only parses the log (or stdin) produced by `momo-drive-token-source-recovery-preflight.sh`; it does not connect, look up tokens, import, move Drive files, or write to the DB.
+- Verified against the real 20:54 preflight log: `MOMO_SOURCE_ARRIVAL_GATE status=blocked_source_absent_fail_closed source_intake=0 freshness=3|2026-06-24 safe_import_preflight_allowed=0 runtime_write_authorized=0 db_write_authorized=0 drive_move_authorized=0 next_step=wait_for_legitimate_daily_sales_source_then_rerun_gate`, exit code `2`.
+- Verified with a synthetic source-arrived case: with Drive intake count `1` and stale freshness, it only returns `source_arrived_ready_for_safe_import_preflight` and `safe_import_preflight_allowed=1`, while `runtime_write_authorized=0`, `db_write_authorized=0`, and `drive_move_authorized=0` stay fixed.
+- Verified with a synthetic freshness-green case: with freshness `1|2026-06-26` it returns `freshness_already_green_recheck_cold_start`; the next step is still to rerun the post-reboot summary, and full green must not be declared directly.
+
+**Conclusion**:
+- The current state is `SERVICE_BLOCKED_MOMO_SOURCE_ABSENCE` / `SOURCE_ABSENT_FAIL_CLOSED`; it is not a runner, Docker, Nginx, K3s, or scheduler incident.
+- Do not fake a green freshness result with old archives, old samples, old local files, hand-written DB rows, truncate / restore, or manual Drive movement.
+- Clearing the blocker requires a new legitimate `即時業績_當日` source to appear in `當日業績匯入`, or an owner-approved safe source evidence ref. Only then may the import run on a maintenance-safe path, and it must satisfy `sync_success=true`, move the source only after success, have matching daily snapshot / realtime monthly bounds, and reach freshness `<=2`; then rerun the cold-start scorecard.
+
+**Next steps**:
+- Stay fail-closed; once a legitimate source arrives, run a read-only preflight recheck.
+- If an owner-approved source evidence ref exists, open a separate maintenance window and use the safe import path; all-green still must not be declared without source evidence.
+
```
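The unblock condition in the MOMO conclusion is a conjunction of independently checkable facts:

- a new legitimate source is present;
- `sync_success=true`;
- the source is moved only after success;
- the daily snapshot and realtime monthly bounds match;
- freshness is `<=2`.

Below is a sketch that returns every unmet condition rather than stopping at the first one, so a report can list them all. `ImportEvidence` and its field names are hypothetical; the 2-day limit is the one stated in the log.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ImportEvidence:
    """Hypothetical evidence record; field names are illustrative only."""

    legitimate_source_present: bool  # new source in 當日業績匯入 or an owner-approved ref
    sync_success: bool
    source_moved_before_success: bool
    bounds_match: bool  # daily snapshot vs. realtime monthly bounds
    freshness_days: Optional[int]


def unmet_unblock_conditions(ev: ImportEvidence, max_days: int = 2) -> list[str]:
    """Return every unmet condition; an empty list means: rerun the cold-start scorecard."""
    unmet: list[str] = []
    if not ev.legitimate_source_present:
        unmet.append("legitimate_source_missing")
    if not ev.sync_success:
        unmet.append("sync_success_not_true")
    if ev.source_moved_before_success:
        unmet.append("source_moved_before_success")
    if not ev.bounds_match:
        unmet.append("snapshot_monthly_bounds_mismatch")
    if ev.freshness_days is None or ev.freshness_days > max_days:
        unmet.append("freshness_not_within_limit")
    return unmet


if __name__ == "__main__":
    # Illustrative state resembling the readback: no source, no new sync, freshness 3.
    state = ImportEvidence(False, False, False, True, 3)
    print(unmet_unblock_conditions(state))
    # ['legitimate_source_missing', 'sync_success_not_true', 'freshness_not_within_limit']
```

An empty list only permits rerunning the scorecard. It does not by itself declare recovery, which matches the fail-closed wording of the log.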
```diff
+## 2026-06-27|110 Gitea runner pressure relief, rebound prevention, and workflow label consolidation
+
+**Background**: The 110 CPU incident was confirmed to be caused mainly by the Gitea runner repeatedly launching the StockPlatform headless Chrome smoke. The previous round stopped `gitea-act-runner-host.service`, cleared the Actions / smoke runs, and narrowed the live runner labels to `awoooi-ubuntu` / `awoooi-host`. This round aims to keep the cold-start / startup flow from automatically bringing the runner back up, and to complete the AWOOI workflow labels and the post-deploy pressure gate.
+
+**Completed**:
+- `post-deploy-checks` in `.gitea/workflows/cd.yaml` adds `Wait for Host Web Build Pressure` after checkout, so Alert Chain / Source Link / Monitoring / Playwright smoke do not pile onto existing build / smoke / load pressure on 110.
+- `.gitea/workflows/ansible-lint.yml` moves from `self-hosted` to `awoooi-ubuntu`; AWOOI workflows now use only two labels, `awoooi-ubuntu` / `awoooi-host`.
+- `scripts/reboot-recovery/awoooi-startup-110.sh` no longer starts the Gitea host runner by default; startup may bring the runner up only when `AWOOOI_START_GITEA_RUNNER_ON_BOOT=1` is set explicitly.
+- The new version is installed at the live `/usr/local/bin/awoooi-startup-110.sh`, with the old file backed up as `/usr/local/bin/awoooi-startup-110.sh.bak-20260627-runner-inactive`; this round neither ran the startup script nor restarted the runner.
+- During closeout the Docker-wrapped `gitea-runner` briefly bounced back to running; after confirming active task containers were `0`, `docker update --restart=no` and `docker stop -t 60` were run against `gitea-runner` only, restoring `Restart=no Status=exited Running=false`.
+- `ops/runner/audit-workflow-labels.py` fixes its local fallback: without Gitea auth but with `--local-repo`, it no longer emits a false empty result.
+- `ops/runner/check-runner-isolation-readiness.sh` recognizes `awoooi-ubuntu`, so the new label is not misclassified as unknown / mixed owner.
+- `ops/runner/README.md` documents the 2026-06-27 runner pressure-relief state, the hard-fail pressure gate, the startup switch, and the workflow label boundary.
+
+**Verification**:
+- `bash -n scripts/reboot-recovery/awoooi-startup-110.sh scripts/ci/wait-host-web-build-pressure.sh ops/runner/check-runner-isolation-readiness.sh ops/runner/audit-runner-pool.sh`: passed.
+- `python3 -m py_compile ops/runner/audit-workflow-labels.py scripts/ops/host-runaway-process-exporter.py`: passed.
+- Gitea workflow YAML parse: all 10 workflows passed.
+- `rg "runs-on: (ubuntu-latest|self-hosted|ubuntu-22.04|ubuntu-24.04)" .gitea/workflows`: no matches.
+- `ops/runner/audit-workflow-labels.py --repo wooo/awoooi --local-repo wooo/awoooi=/Users/ogt/awoooi`: only the `awoooi-host` / `awoooi-ubuntu` labels remain.
+- 110 readback: `gitea-act-runner-host.service=inactive`, Actions containers `0`, active CI groups `0`, StockPlatform orphan groups `0`.
+- Docker-wrapped `gitea-runner`: `Restart=no Status=exited Running=false`.
+- 110 readiness: primary labels `awoooi-ubuntu` / `awoooi-host` are both `awoooi_dedicated`, `mixed_owner_classes=0`, active action containers `none`.
+- The 110 pressure gate currently returns `GATE_RC=1` because `load5/core 0.886667 > 0.85`; the top processes show this is mainly the 6h `restic` backup, not a recurrence of the Gitea Actions / Chrome smoke incident.
+- 110 local Gitea / Sentry / Alertmanager / Grafana health readback: `200 / 302 / 200 / 200`.
+
+**Boundaries and next steps**:
+- The inactive runner is deliberate pressure relief; do not simply restart it before rate limiting / relocation is complete.
+- This round did not restart Docker / Nginx / firewall / K3s, kill any process, or read raw sessions / SQLite / auth / secrets.
+- Next P0: put the StockPlatform smoke on a rate-limited schedule or move it to a non-110 runner; then redo the all-host cold-start scorecard and data freshness readback.
+
 ## 2026-06-27|P2-416 D1N: currently effective AI Agent autonomy control layer and Telegram Gateway wiring for daily / weekly / monthly reports
 
 **Background**: The user has explicitly asked to stop proceeding under the old no-send / no-live / high-risk-defaults-to-manual rules; the currently effective direction is for the AI Agent to handle low / medium / high risk items automatically under control, via allowlist, Ansible check-mode, controlled apply, post-apply verifier, KM / PlayBook writeback, and Telegram receipt. Critical / secret / destructive / reboot / node drain / provider switch / force push and similar actions remain hard blockers.
```
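The 110 pressure gate readback (`GATE_RC=1` because `load5/core 0.886667 > 0.85`) compares the five-minute load average per CPU with a 0.85 threshold. Below is a minimal sketch of that comparison. It assumes the gate divides `load5` by the CPU count, as the name `load5/core` suggests, and returns 1 above the threshold. The real `wait-host-web-build-pressure.sh` also inspects `ps` for build / smoke / CI processes, which this sketch omits. `load5_per_core` and `gate_rc` are illustrative names.

```python
import os

THRESHOLD = 0.85  # the load5/core limit quoted in the 110 readback


def load5_per_core(load5: float, cores: int) -> float:
    """Five-minute load average divided by the number of CPUs."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load5 / cores


def gate_rc(ratio: float, threshold: float = THRESHOLD) -> int:
    """Return 1 (blocked) when the ratio is strictly above the threshold, else 0."""
    return 1 if ratio > threshold else 0


if __name__ == "__main__" and hasattr(os, "getloadavg"):
    # os.getloadavg() is Unix-only; index 1 is the 5-minute average.
    ratio = load5_per_core(os.getloadavg()[1], os.cpu_count() or 1)
    print(f"load5/core {ratio:.6f} GATE_RC={gate_rc(ratio)}")
```

Load average counts runnable and uninterruptible tasks, so a long `restic` backup can trip this gate just as a smoke run would. This is why the readback checks the top processes before attributing the pressure.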
````diff
@@ -296,6 +296,8 @@ NO-GO: truncate, whole-DB restore, manual Drive movement, or manual import witho
 UNBLOCK: new legitimate PChome daily-sales source appears in 當日業績匯入 or an owner-approved safe import path; import job succeeds with sync_success=true; source file moves only after success; daily_sales_snapshot and realtime_sales_monthly bounds match; MOMO_DAILY_FRESHNESS <= 2.
 ```
 
+Starting 2026-06-27, if a `momo-drive-token-source-recovery-preflight.sh` log already exists, first run `python3 scripts/reboot-recovery/momo-source-arrival-gate.py --preflight-log <log>` to get a machine reading: `blocked_source_absent_fail_closed` means keep waiting for a legitimate source; `source_arrived_ready_for_safe_import_preflight` only means a separate safe import preflight may proceed, not that DB write, Drive move, manual import, or runtime write is authorized; `freshness_already_green_recheck_cold_start` still requires rerunning the post-reboot summary on the same evidence chain before the recovery declaration is updated.
+
 All reports must use this vocabulary so that "services are usable" is never misreported as "overall DR complete".
 
 ### 0.3 Codex workstation handoff criteria
````
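A caller that only has the gate's human-readable line can still branch on it, because each field is a whitespace-separated `key=value` pair. Below is a sketch that assumes the one-line format printed by `print_human()` in `momo-source-arrival-gate.py`. `parse_gate_line` and `NEXT_ACTION` are hypothetical names. `NEXT_ACTION` covers only the three statuses named in the runbook paragraph, and any other status falls back to rerunning the preflight.

```python
# Statuses named in the runbook and the only action each one permits.
NEXT_ACTION = {
    "blocked_source_absent_fail_closed": "keep waiting for a legitimate source",
    "source_arrived_ready_for_safe_import_preflight": "run a separate safe import preflight",
    "freshness_already_green_recheck_cold_start": "rerun the post-reboot summary",
}


def parse_gate_line(line: str) -> dict[str, str]:
    """Split a `MOMO_SOURCE_ARRIVAL_GATE key=value ...` line into a dict."""
    prefix, *fields = line.split()
    if prefix != "MOMO_SOURCE_ARRIVAL_GATE":
        raise ValueError(f"unexpected prefix: {prefix!r}")
    # Values may contain "|" or "=" after the first "=", so split only once.
    return dict(field.split("=", 1) for field in fields)


if __name__ == "__main__":
    sample = (
        "MOMO_SOURCE_ARRIVAL_GATE status=blocked_source_absent_fail_closed "
        "source_intake=0 freshness=3|2026-06-24 safe_import_preflight_allowed=0 "
        "runtime_write_authorized=0 db_write_authorized=0 drive_move_authorized=0 "
        "next_step=wait_for_legitimate_daily_sales_source_then_rerun_gate"
    )
    result = parse_gate_line(sample)
    print(NEXT_ACTION.get(result["status"], "rerun the preflight"))  # keep waiting for a legitimate source
    print(result["db_write_authorized"])  # 0
```

When the caller controls the invocation, the `--json` flag gives a sturdier contract than parsing this line.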
`ops/runner/README.md`

```diff
@@ -132,9 +132,9 @@ runner:
 
 | Job | runner label | Purpose |
 |-----|--------------|------|
-| `tests` | `ubuntu-latest` | API unit + B5 integration tests, still running in the ci-runner container |
+| `tests` | `awoooi-host` | API unit + B5 integration tests, running directly on the 110 host runner |
 | `build-and-deploy` | `awoooi-host` | Harbor login, API/Web image build/push, GitOps deploy, running directly on the 110 host |
-| `post-deploy-checks` | `ubuntu-latest` | Alert chain, monitoring coverage, Playwright smoke |
+| `post-deploy-checks` | `awoooi-host` | Alert chain, monitoring coverage, Playwright smoke |
 
 110 keeps only the host-level `act_runner` daemon and declares both kinds of label in the same config:
 
```
````diff
@@ -143,9 +143,7 @@ runner:
   capacity: 1
   shutdown_timeout: 1h
   labels:
-    - "ubuntu-latest:docker://192.168.0.110:5000/awoooi/ci-runner:act-22.04"
-    - "ubuntu-22.04:docker://192.168.0.110:5000/awoooi/ci-runner:act-22.04"
-    - "ubuntu-24.04:docker://192.168.0.110:5000/awoooi/ci-runner:act-22.04"
+    - "awoooi-ubuntu:docker://192.168.0.110:5000/awoooi/ci-runner:act-22.04"
     - "awoooi-host:host"
 ```
 
````
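Each label entry pairs a label name with where its jobs run: `host` for the host itself, or `docker://<image>` for a container. Below is a small parsing sketch that covers only those two forms, as they appear in this config. It splits on the first colon, which matches the shell's `${label%%:*}` name extraction. `RunnerLabel` and `parse_label` are hypothetical helpers, not part of `act_runner`.

```python
from typing import NamedTuple, Optional


class RunnerLabel(NamedTuple):
    name: str
    kind: str  # "host" or "docker"
    image: Optional[str]  # container image for docker labels, else None


def parse_label(spec: str) -> RunnerLabel:
    """Parse "name:host" or "name:docker://image"; anything else is rejected."""
    name, _, target = spec.partition(":")  # split on the first colon only
    if target == "host":
        return RunnerLabel(name, "host", None)
    if target.startswith("docker://"):
        return RunnerLabel(name, "docker", target[len("docker://"):])
    raise ValueError(f"unsupported label spec: {spec!r}")


if __name__ == "__main__":
    for spec in (
        "awoooi-ubuntu:docker://192.168.0.110:5000/awoooi/ci-runner:act-22.04",
        "awoooi-host:host",
    ):
        label = parse_label(spec)
        print(label.name, label.kind, label.image)
    # awoooi-ubuntu docker 192.168.0.110:5000/awoooi/ci-runner:act-22.04
    # awoooi-host host None
```

Splitting on the first colon matters because the registry host and image tag also contain colons.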
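````diff
@@ -208,15 +206,27 @@ AWOOI's Docker lock, which would stack with the Next production build inside the AWOOI Web image
 - Only reads `ps`; never kills / renices / resets any external process.
 - Excludes AWOOI's own checkout, local worktree, and the Web Docker build's
   `/app/apps/web` processes, to avoid misjudging its own deployment.
-- By default waits at most 60 times, 10 seconds each; if an external build is still running, it lets the job through with a warning,
-  so CD never hangs forever.
-- `HOST_WEB_BUILD_PRESSURE_WARN_ONLY=0` switches to hard fail, but only after confirming that
-  runner isolation and other repos' build schedules have converged, so shared runner pressure is not turned into
-  deployment interruptions.
+- By default waits at most 60 times, 10 seconds each; if external build / smoke / CI pressure persists,
+  it fails hard, so no new browser smoke is stacked onto the production host.
+- Only an explicit `HOST_WEB_BUILD_PRESSURE_WARN_ONLY=1` lets it pass with a warning; this may only
+  be used for controlled reruns where the pressure source is confirmed acceptable.
 
 The long-term direction is still runner isolation or build offload; until the shared runner is
 split, this gate is a conservative protection layer that keeps heavy frontend builds from trampling each other.
+
+### Layer 4 addendum: startup does not automatically restart the Gitea runner
+
+After the 2026-06-27 110 CPU incident was contained, keeping `gitea-act-runner-host.service` inactive is a
+deliberate pressure-relief state. `scripts/reboot-recovery/awoooi-startup-110.sh` can still fix the runner's
+`shutdown_timeout` and labels, and it also disables the legacy Docker runner, but by default it does not start the
+host runner. Startup may bring the runner up only when the following switch is set explicitly:
+
+```bash
+AWOOOI_START_GITEA_RUNNER_ON_BOOT=1 /usr/local/bin/awoooi-startup-110.sh
+```
+
+Do not add this switch to the systemd environment until runner rate limiting / relocation is complete.
 
 ### Layer 5 fix: legacy Docker runner drain
 
 On 2026-05-21 it was reconfirmed that 110 had two runners at the same time:
````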
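The retry policy described in the README has three parts:

- up to 60 checks, 10 seconds apart;
- a hard fail if pressure persists;
- a warning-only pass when `HOST_WEB_BUILD_PRESSURE_WARN_ONLY=1` is set.

Below is a sketch of that policy as a small loop. It is not the shell implementation in `scripts/ci/wait-host-web-build-pressure.sh`. The pressure probe is passed in as a callable, so the policy can be tested without a loaded host. `wait_for_pressure` and its parameters are hypothetical, and the choice not to sleep after the final check is an assumption of this sketch.

```python
import os
import time
from typing import Callable


def wait_for_pressure(
    under_pressure: Callable[[], bool],
    warn_only: bool = False,
    attempts: int = 60,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Return 0 once pressure clears; after `attempts` checks, 0 if warn-only else 1."""
    for attempt in range(attempts):
        if not under_pressure():
            return 0
        if attempt < attempts - 1:  # no pointless sleep after the final check
            sleep(interval_s)
    if warn_only:
        print("WARN: host still under pressure; continuing because warn-only is set")
        return 0
    print("BLOCKED: host still under pressure; refusing to stack another run")
    return 1


if __name__ == "__main__":
    warn_only = os.environ.get("HOST_WEB_BUILD_PRESSURE_WARN_ONLY", "0") == "1"
    # Fake probe: pressure for two checks, then clear; sleep is stubbed out.
    readings = iter([True, True, False])
    print(wait_for_pressure(lambda: next(readings), warn_only, sleep=lambda _: None))  # 0
```

Failing hard by default puts the cost on the new run rather than on the production host. The warn-only flag is the explicit, auditable escape hatch.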
```diff
@@ -370,6 +380,12 @@ runner registration / service:
 
 Only after all three split-runner smoke runs pass, drain the primary runner and remove the mixed labels.
 
+2026-06-27 live update: on 110, `gitea-act-runner-host.service` has been deliberately left
+`inactive`; the labels in `/home/wooo/act-runner/config.yaml` have been narrowed to
+`awoooi-ubuntu` and `awoooi-host`, and capacity is still `1`. This is a pressure-relief and label-isolation
+state; AWOOI workflows should likewise use only `awoooi-ubuntu` or `awoooi-host`, and must no longer use
+generic labels such as `ubuntu-latest` / `self-hosted`. This does not mean the runner relocation is done, nor that the runner can simply be restarted.
+
 ---
 Version: v2.0 | Updated: 2026-03-29 | Author: Claude Code
 Changes: v1.0→v2.0 sequential builds replace Job Concurrency Groups
```
`ops/runner/audit-workflow-labels.py`

```diff
@@ -179,7 +179,7 @@ def fetch_local_labels(repo: str, branch: str, repo_path: Path) -> tuple[list[Wo
 
 def label_owner(label: str) -> str:
     value = label.strip().strip("'\"")
-    if value == "awoooi-host":
+    if value in {"awoooi-host", "awoooi-ubuntu"}:
         return "awoooi_dedicated"
     if value == "ewoooc-host":
         return "foreign_dedicated"
```
```diff
@@ -234,7 +234,13 @@ def main() -> int:
         error: str | None = None
         if auth is not None:
             repo_labels, error = fetch_gitea_labels(repo, args.branch, auth)
-        elif repo not in local_paths:
+        elif repo in local_paths:
+            repo_labels, local_error = fetch_local_labels(repo, args.branch, local_paths[repo])
+            if local_error:
+                errors.append(f"{repo}: {local_error}")
+            labels.extend(repo_labels)
+            continue
+        else:
             error = "gitea_auth_unavailable"
 
         if error and repo in local_paths:
```
`ops/runner/check-runner-isolation-readiness.sh`

```diff
@@ -70,7 +70,7 @@ label_owner() {
   local label="$1"
   local label_name="${label%%:*}"
   case "$label_name" in
-    awoooi-host)
+    awoooi-host|awoooi-ubuntu|awoooi-*)
       printf 'awoooi_dedicated'
       ;;
     ewoooc-host)
```
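The shell `case` arm `awoooi-host|awoooi-ubuntu|awoooi-*` accepts any `awoooi-` label. The Python `label_owner()` in `ops/runner/audit-workflow-labels.py`, by contrast, accepts only the two exact names. Below is a sketch of a Python check that mirrors the shell matching with `fnmatch.fnmatchcase`, whose `*` behaves like the shell glob here. It covers only the arm that prints `awoooi_dedicated`, because the other shell arms are not fully shown. `is_awoooi_label` is a hypothetical helper, and `awoooi-gpu` in the example is an invented label used only to exercise the wildcard.

```python
from fnmatch import fnmatchcase

# Patterns copied from the shell `case` arm that prints 'awoooi_dedicated'.
AWOOOI_PATTERNS = ("awoooi-host", "awoooi-ubuntu", "awoooi-*")


def is_awoooi_label(label: str) -> bool:
    """Mirror `${label%%:*}` followed by the case-sensitive shell glob match."""
    name = label.split(":", 1)[0]  # keep the part before the first colon
    return any(fnmatchcase(name, pattern) for pattern in AWOOOI_PATTERNS)


if __name__ == "__main__":
    for label in ("awoooi-ubuntu:docker://img", "awoooi-gpu:host", "ubuntu-latest:docker://img"):
        print(label.split(":", 1)[0], is_awoooi_label(label))
    # awoooi-ubuntu True
    # awoooi-gpu True
    # ubuntu-latest False
```

Because `awoooi-*` subsumes the two exact names, the shell readiness check would classify a future `awoooi-` label as dedicated, while the Python audit would not. Keeping the two classifiers aligned is a maintenance choice.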
`scripts/reboot-recovery/awoooi-startup-110.sh`

```diff
@@ -184,15 +184,18 @@ fi
 # ──────────────────────────────────────────────
 # STEP 6: Gitea Act Runner (CI/CD core)
 # 2026-04-05 Claude Code: added to fix the Gitea runner going offline and CD breaking after a reboot
 # Important: the runner may only be started after the Gitea server is up
+# 2026-06-27 Codex: 110 is the production / registry / observability host;
+# the runner stays disabled by default for pressure relief and must not be auto-started at startup until rate limiting / relocation is done.
 # ──────────────────────────────────────────────
-log "[6/6] Starting Gitea Act Runner..."
+log "[6/6] Checking Gitea Act Runner (not auto-started by default)..."
 RUNNER_DIR="/home/wooo/act-runner"
 RUNNER_SERVICE="gitea-act-runner-host.service"
+START_GITEA_RUNNER_ON_BOOT="${AWOOOI_START_GITEA_RUNNER_ON_BOOT:-0}"
 if [ -x "$RUNNER_DIR/act_runner" ] && [ -f "$RUNNER_DIR/config.yaml" ]; then
-    # If the old .runner config points to a stale hostname, clear it first so the runner re-registers
+    # If the old .runner config points to a stale hostname, clear it for re-registration only when
+    # runner startup is explicitly allowed; the default pressure-relief mode must not touch registration state.
     RUNNER_FILE="$RUNNER_DIR/data/.runner"
-    if [ -f "$RUNNER_FILE" ]; then
+    if [ "$START_GITEA_RUNNER_ON_BOOT" = "1" ] && [ -f "$RUNNER_FILE" ]; then
         OLD_URL=$(python3 -c "import json; d=json.load(open('$RUNNER_FILE')); print(d.get('address',''))" 2>/dev/null || echo "")
         if [ "$OLD_URL" != "http://192.168.0.110:3001" ]; then
             log "⚠️ runner config is stale ($OLD_URL); clearing it for re-registration..."
```
```diff
@@ -248,10 +251,14 @@ while idx < len(lines):
 path.write_text("\n".join(output) + "\n")
 PY
 
-    if systemctl list-unit-files "$RUNNER_SERVICE" >/dev/null 2>&1; then
-        systemctl enable --now "$RUNNER_SERVICE" >/dev/null 2>&1 || true
-    elif ! pgrep -f "$RUNNER_DIR/act_runner daemon" >/dev/null; then
-        nohup "$RUNNER_DIR/run-host-runner.sh" >> "$RUNNER_DIR/host-runner.log" 2>&1 &
+    if [ "$START_GITEA_RUNNER_ON_BOOT" = "1" ]; then
+        if systemctl list-unit-files "$RUNNER_SERVICE" >/dev/null 2>&1; then
+            systemctl enable --now "$RUNNER_SERVICE" >/dev/null 2>&1 || true
+        elif ! pgrep -f "$RUNNER_DIR/act_runner daemon" >/dev/null; then
+            nohup "$RUNNER_DIR/run-host-runner.sh" >> "$RUNNER_DIR/host-runner.log" 2>&1 &
+        fi
+    else
+        log "⏸️ Gitea host runner stays disabled; set AWOOOI_START_GITEA_RUNNER_ON_BOOT=1 to allow startup to start it"
     fi
 
     # The Docker-wrapped runner is disabled so it cannot steal host-label jobs.
```
```diff
@@ -269,9 +276,11 @@ PY
 
     # Verify that the runner is connected to Gitea
     if pgrep -f "$RUNNER_DIR/act_runner daemon" >/dev/null; then
-        log "✅ Gitea host act_runner started"
-    else
+        log "⚠️ Gitea host act_runner is currently running; confirm this is the controlled rate-limited / post-relocation state"
+    elif [ "$START_GITEA_RUNNER_ON_BOOT" = "1" ]; then
         log "⚠️ Gitea host act_runner may not have started yet; see: $RUNNER_DIR/host-runner.log"
+    else
+        log "✅ Gitea host act_runner remains in the inactive pressure-relief state"
     fi
 else
     log "⚠️ act-runner binary/config not found: $RUNNER_DIR"
```
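The startup gate relies on shell default expansion. `${AWOOOI_START_GITEA_RUNNER_ON_BOOT:-0}` yields `0` when the variable is unset or empty, and only the exact string `1` passes the `= "1"` test. Below is a Python sketch of the same opt-in semantics, which is useful for confirming that values like `true` or `yes` do not accidentally start the runner. `runner_start_allowed` is a hypothetical helper.

```python
import os
from collections.abc import Mapping

FLAG = "AWOOOI_START_GITEA_RUNNER_ON_BOOT"


def runner_start_allowed(env: Mapping[str, str] = os.environ) -> bool:
    """Mirror `${FLAG:-0}` = "1": unset or empty means "0"; only the exact "1" opts in."""
    return (env.get(FLAG) or "0") == "1"


if __name__ == "__main__":
    for env in ({}, {FLAG: "1"}, {FLAG: "true"}, {FLAG: ""}):
        print(env.get(FLAG), runner_start_allowed(env))
    # None False
    # 1 True
    # true False
    #  False
```

An exact-match opt-in keeps the safe state as the default. Any typo or unexpected value leaves the runner inactive instead of starting it.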
`scripts/reboot-recovery/momo-source-arrival-gate.py` · new executable file · `@@ -0,0 +1,253 @@`

```python
#!/usr/bin/env python3
"""Classify MOMO daily-sales source arrival from a read-only preflight log.

This parser never connects to MOMO, never imports files, never moves Drive
artifacts, and never authorizes DB / host / Drive writes. It turns the existing
`momo-drive-token-source-recovery-preflight.sh` evidence into a compact gate so
operators can tell whether they should keep waiting for a legitimate source or
start a separate safe-import preflight.
"""

from __future__ import annotations

import argparse
import json
import re
import sys
from pathlib import Path
from typing import Any


# IMPORT_CONFIG must contain this `<intake folder>|<source file stem>` pair.
EXPECTED_IMPORT_CONFIG = "當日業績匯入|即時業績_當日"
# One-line preflight summary: PASS / WARN / BLOCKED counts, HOST, FRESHNESS_MAX_DAYS.
SUMMARY_RE = re.compile(
    r"^MOMO_DRIVE_TOKEN_SOURCE_PREFLIGHT "
    r"PASS=(?P<pass>\d+) WARN=(?P<warn>\d+) BLOCKED=(?P<blocked>\d+) "
    r"HOST=(?P<host>\S+) FRESHNESS_MAX_DAYS=(?P<freshness_max_days>\d+)"
)


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Classify MOMO source-arrival readiness from preflight output.",
    )
    parser.add_argument(
        "--preflight-log",
        required=True,
        help="Path to momo-drive-token-source-recovery-preflight output, or '-' for stdin.",
    )
    parser.add_argument("--json", action="store_true", help="Print JSON result.")
    return parser.parse_args()


def load_text(source: str) -> str:
    if source == "-":
        return sys.stdin.read()
    return Path(source).read_text(encoding="utf-8")


def parse_int(value: Any, default: int | None = None) -> int | None:
    # Lenient conversion: None, blanks, and non-numeric text return `default`.
    try:
        return int(str(value).strip())
    except (TypeError, ValueError):
        return default


def parse_pipe(value: str, expected_parts: int) -> list[str]:
    # Split an `a|b|c` readback and pad / truncate it to exactly `expected_parts` fields.
    parts = str(value or "").split("|")
    if len(parts) < expected_parts:
        parts.extend([""] * (expected_parts - len(parts)))
    return parts[:expected_parts]


def parse_preflight(text: str) -> dict[str, Any]:
    values: dict[str, str] = {}
    messages = {"ok": [], "warn": [], "blocked": []}
    summary: dict[str, Any] = {}

    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        summary_match = SUMMARY_RE.match(line)
        if summary_match:
            summary = {
                key: parse_int(value) if key != "host" else value
                for key, value in summary_match.groupdict().items()
            }
            continue
        if line.startswith("OK: "):
            messages["ok"].append(line[4:])
            continue
        if line.startswith("WARN: "):
            messages["warn"].append(line[6:])
            continue
        if line.startswith("BLOCKED: "):
            messages["blocked"].append(line[9:])
            continue
        # Any other `UPPER_CASE_KEY value` line is a readback value; the last one wins.
        if re.match(r"^[A-Z][A-Z0-9_]+(?:\s|$)", line):
            key, _, value = line.partition(" ")
            values[key] = value.strip()

    return {"values": values, "messages": messages, "summary": summary}


def monthly_sync_ok(value: str) -> bool:
    # Parity: non-zero equal counts, and d-bounds (min/max) equal m-bounds.
    snapshot_count, monthly_count, dmin, dmax, mmin, mmax = parse_pipe(value, 6)
    snapshot_n = parse_int(snapshot_count, 0) or 0
    return (
        snapshot_n > 0
        and snapshot_count == monthly_count
        and bool(dmin)
        and bool(dmax)
        and dmin == mmin
        and dmax == mmax
    )


def latest_import_clean(value: str) -> bool:
    # Clean = completed, total rows == success rows, zero errors. The distinct
    # defaults (-1 / -2) keep two unparseable counts from comparing equal.
    job_id, status, _file_name, _created, _completed, total, success, errors = parse_pipe(
        value, 8
    )
    return (
        parse_int(job_id) is not None
        and status == "completed"
        and parse_int(total, -1) == parse_int(success, -2)
        and parse_int(errors, -1) == 0
    )


def classify(parsed: dict[str, Any]) -> dict[str, Any]:
    values = parsed["values"]
    summary = parsed["summary"]
    messages = parsed["messages"]

    freshness_days_text, latest_daily_date = parse_pipe(values.get("DB_DAILY_FRESHNESS", ""), 2)
    freshness_days = parse_int(freshness_days_text)
    freshness_max_days = parse_int(summary.get("freshness_max_days"), 2) or 2
    drive_intake_count = parse_int(values.get("DRIVE_INTAKE_COUNT"), 0) or 0
    drive_failed_count = parse_int(values.get("DRIVE_FAILED_COUNT"), 0) or 0
    drive_archive_latest = values.get("DRIVE_ARCHIVE_LATEST_MODIFIED", "none") or "none"
    drive_global_latest = values.get("DRIVE_GLOBAL_LATEST_MODIFIED", "none") or "none"

    service_ready = (
        values.get("MOMO_PUBLIC_HEALTH_CODE") == "200"
        and values.get("MOMO_HEALTH_CODE") == "200"
        and values.get("MOMO_APP_HEALTH") == "healthy"
        and values.get("SCHEDULER_RUNNING") == "true"
        and values.get("SCHEDULER_HEALTH") == "healthy"
    )
    import_config_ok = EXPECTED_IMPORT_CONFIG in values.get("IMPORT_CONFIG", "")
    sync_ok = monthly_sync_ok(values.get("DB_MONTHLY_SYNC", ""))
    clean_import = latest_import_clean(values.get("DB_LATEST_DAILY_IMPORT_JOB", ""))
    freshness_green = (
        freshness_days is not None and 0 <= freshness_days <= freshness_max_days
    )
    freshness_stale = freshness_days is not None and freshness_days > freshness_max_days

    blockers: list[str] = []
    warnings: list[str] = []
    status = "blocked_preflight_evidence_incomplete"
    next_step = "rerun_momo_drive_token_source_recovery_preflight"
    safe_import_preflight_allowed = False
    exit_code = 2

    if not summary:
        blockers.append("preflight_summary_missing")
    if not service_ready:
        blockers.append("momo_service_or_scheduler_not_ready")
    if not import_config_ok:
        blockers.append("drive_import_config_not_expected_intake")
    if not sync_ok:
        blockers.append("current_month_snapshot_realtime_sync_not_proven")
    if drive_failed_count > 0:
        warnings.append("drive_failed_folder_has_matching_candidates")

    # Decision ladder: evidence blockers first, then freshness, then source intake.
    if blockers:
        status = "blocked_service_or_evidence_not_ready"
        next_step = "repair_readonly_preflight_evidence_before_source_or_import_decision"
    elif freshness_green:
        status = "freshness_already_green_recheck_cold_start"
        next_step = "rerun_post_reboot_readiness_summary_with_same_evidence_chain"
        exit_code = 0
    elif drive_intake_count > 0 and freshness_stale:
        status = "source_arrived_ready_for_safe_import_preflight"
        next_step = "run_owner_approved_safe_import_preflight_no_db_or_drive_write_yet"
        safe_import_preflight_allowed = True
        exit_code = 0
    elif drive_intake_count > 0:
        status = "source_arrived_freshness_unknown_recheck_before_import"
        next_step = "rerun_momo_preflight_and_validate_freshness_before_import"
        safe_import_preflight_allowed = True
        exit_code = 1
    elif freshness_stale:
        status = "blocked_source_absent_fail_closed"
        next_step = "wait_for_legitimate_daily_sales_source_then_rerun_gate"
    else:
        status = "blocked_freshness_unknown_fail_closed"
        next_step = "rerun_preflight_or_repair_readonly_freshness_readback"

    if not clean_import:
        warnings.append("latest_daily_import_job_not_clean_completed")

    # Write, import, and secret-collection flags are hard-coded False in every outcome.
    return {
        "schema_version": "momo_source_arrival_gate_v1",
        "status": status,
        "exit_code": exit_code,
        "next_step": next_step,
        "safe_import_preflight_allowed": safe_import_preflight_allowed,
        "runtime_write_authorized": False,
        "db_write_authorized": False,
        "drive_move_authorized": False,
        "manual_import_authorized": False,
        "secret_value_collection_allowed": False,
        "service_ready": service_ready,
        "import_config_ok": import_config_ok,
        "current_month_sync_ok": sync_ok,
        "latest_import_clean": clean_import,
        "freshness_days": freshness_days,
        "freshness_latest_date": latest_daily_date or "unknown",
        "freshness_max_days": freshness_max_days,
        "drive_intake_count": drive_intake_count,
        "drive_archive_latest_modified": drive_archive_latest,
        "drive_global_latest_modified": drive_global_latest,
        "drive_failed_count": drive_failed_count,
        "preflight_pass": summary.get("pass", 0),
        "preflight_warn": summary.get("warn", len(messages["warn"])),
        "preflight_blocked": summary.get("blocked", len(messages["blocked"])),
        "blockers": blockers,
        "warnings": warnings,
        "no_false_green_rules": [
            "source_arrived_does_not_authorize_import",
            "safe_import_preflight_allowed_does_not_authorize_db_write",
            "freshness_green_requires_post_reboot_summary_recheck",
            "archive_or_local_old_file_does_not_count_as_new_source",
        ],
    }


def print_human(result: dict[str, Any]) -> None:
    print(
        "MOMO_SOURCE_ARRIVAL_GATE "
        f"status={result['status']} "
        f"source_intake={result['drive_intake_count']} "
        f"freshness={result['freshness_days']}|{result['freshness_latest_date']} "
        f"safe_import_preflight_allowed={int(result['safe_import_preflight_allowed'])} "
        "runtime_write_authorized=0 "
        "db_write_authorized=0 "
        "drive_move_authorized=0 "
        f"next_step={result['next_step']}"
    )


def main() -> int:
    args = parse_args()
    result = classify(parse_preflight(load_text(args.preflight_log)))
    if args.json:
        print(json.dumps(result, ensure_ascii=False, indent=2, sort_keys=True))
    else:
        print_human(result)
    return int(result["exit_code"])


if __name__ == "__main__":
    raise SystemExit(main())
```
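Automation that consumes this gate should branch on both the JSON body and the exit code. The exit code alone cannot tell "blocked" apart from a usage error, because `argparse` also exits with status 2. A missing log file ends in a traceback with empty stdout.

Below is a sketch of such a wrapper. It assumes the wrapper runs from the repository root and maps exit codes as `classify()` sets them: 0 for the two proceed states, 1 for source arrived with unknown freshness, 2 for blocked. `run_gate`, `summarize`, and `EXIT_MEANING` are hypothetical names.

```python
import json
import subprocess
import sys
from typing import Any

GATE = "scripts/reboot-recovery/momo-source-arrival-gate.py"

# Exit codes as set in classify(): 0 next step allowed, 1 recheck, 2 blocked.
EXIT_MEANING = {0: "next_step_allowed", 1: "recheck_before_import", 2: "blocked_fail_closed"}


def summarize(result: dict[str, Any], returncode: int) -> str:
    """Combine the JSON verdict and exit code; refuse any result that authorizes writes."""
    if result.get("db_write_authorized") or result.get("drive_move_authorized"):
        raise RuntimeError("gate must never authorize DB or Drive writes")
    meaning = EXIT_MEANING.get(returncode, "unexpected_exit_code")
    return f"{meaning}: {result.get('status')} -> {result.get('next_step')}"


def run_gate(log_path: str) -> str:
    proc = subprocess.run(
        [sys.executable, GATE, "--preflight-log", log_path, "--json"],
        capture_output=True,
        text=True,
        check=False,
    )
    try:
        result = json.loads(proc.stdout)
    except json.JSONDecodeError as exc:
        # Usage errors and tracebacks print nothing useful on stdout.
        raise RuntimeError(
            f"gate produced no JSON (rc={proc.returncode}): {proc.stderr.strip()}"
        ) from exc
    return summarize(result, proc.returncode)


if __name__ == "__main__" and len(sys.argv) > 1:
    print(run_gate(sys.argv[1]))
```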