fix(reboot): enforce direct runner fail-closed guard [skip ci]

Your Name
2026-06-28 09:16:43 +08:00
parent f8e2b39ab3
commit 8f402983ee
5 changed files with 186 additions and 4 deletions


@@ -20,6 +20,25 @@
- OpenClaw remains the production decision core; replacing it still requires replay / shadow / canary / ADR.
- SDK install, API shadow / canary, production route, paid provider / cost route, external active security scan, secret value / credential URL / raw env, DB destructive / backup restore, and force push / repo refs deletion still must not be opened directly by this section's controlled queue.
## 2026-06-28 - 09:16 direct runner source guard implementation converged
**Background**: The live hotfix applied before 09:00 had already masked every direct / Gitea runner on 110, but `awoooi-startup-110.sh`, cold-start, and the P3 release gate had not yet brought the transient direct runner path `awoooi-direct-runner-open.service` under the source-level guard.
**Completed**:
- `scripts/reboot-recovery/awoooi-startup-110.sh` adds `RUNNER_FAIL_CLOSED_SERVICES` and `RUNNER_FAIL_CLOSED_BINARY_PATHS`. By default, unless both `AWOOOI_START_GITEA_RUNNER_ON_BOOT=1` and `/run/awoooi-runner-host-enabled` are present, it forcibly kills / disables / masks the direct runner and Gitea runner units and quarantines the live runner ELF into a 163-byte fail-closed stub.
- `scripts/reboot-recovery/full-stack-cold-start-check.sh` adds a 110 runner fail-closed readback: direct / Gitea units must report `load=masked unitfile=masked active=inactive`, the direct runner process count must be `0`, and the runner binary must not be an ELF.
- `scripts/reboot-recovery/post-start-quick-check.sh` adds a `110 runner fail-closed guard` section and reads back the pressure gate with `HOST_WEB_BUILD_PRESSURE_ATTEMPTS=1`.
- `scripts/reboot-recovery/p3-controlled-release-gate.sh` folds the direct runner fail-closed state into `BAD_RUNNER_GUARDRAILS`, so the P3 release gate no longer looks only at `actions.runner.*` and misses the transient direct runner.
- The live `/usr/local/bin/awoooi-startup-110.sh` has been updated and made immutable; readback: `LIVE_STARTUP_DIRECT_UNIT=1`, `LIVE_STARTUP_GUARD_FUNC=2`, `LIVE_STARTUP_DEFAULT=failclosed`.
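
The opt-in rule described for `awoooi-startup-110.sh` (both the environment variable and the marker file must be present, otherwise fail closed) can be sketched roughly as follows. This is an illustration under stated assumptions, not the script's actual code: `runner_boot_mode`, `plan_fail_closed`, and the overridable `RUNNER_ENABLE_MARKER` variable are hypothetical names, and the dry-run helper only prints a `systemctl mask --now` line instead of reproducing the real kill / disable / mask sequence.

```shell
#!/usr/bin/env bash
# Sketch only. Function names and RUNNER_ENABLE_MARKER are hypothetical;
# the env var name and marker path are taken from the log entry.

# Marker path, overridable so the logic can be exercised without touching /run.
RUNNER_ENABLE_MARKER="${RUNNER_ENABLE_MARKER:-/run/awoooi-runner-host-enabled}"

# Prints "enabled" only when BOTH opt-in signals are present. Anything else
# (unset variable, other value, missing marker) falls through to "failclosed".
runner_boot_mode() {
  if [ "${AWOOOI_START_GITEA_RUNNER_ON_BOOT:-0}" = "1" ] && [ -e "$RUNNER_ENABLE_MARKER" ]; then
    echo "enabled"
  else
    echo "failclosed"
  fi
}

# Dry run: print (never execute) one masking command per unit name given.
# The real script is described as also killing and disabling the units.
plan_fail_closed() {
  local unit
  for unit in "$@"; do
    echo "systemctl mask --now $unit"
  done
}

mode="$(runner_boot_mode)"
echo "runner boot mode: $mode"
if [ "$mode" = "failclosed" ]; then
  plan_fail_closed awoooi-direct-runner-open.service
fi
```

Treating "enabled" as the only branch that needs positive evidence keeps a typo in the variable or a stale environment from reopening the runner by accident.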
**Verification results**:
- Local: `bash -n` passes for `awoooi-startup-110.sh`, `full-stack-cold-start-check.sh`, `post-start-quick-check.sh`, and `p3-controlled-release-gate.sh`; `git diff --check` passes; the direct runner source invariant passes.
- quick-check, runner-only: `POST_START_QUICK_CHECK PASS=13 WARN=0 BLOCKED=0`, `RESULT=GREEN`; all six runner/direct units are masked / inactive, the runner process count is `0`, all four binary paths are shell stubs, and the pressure gate reports `RUNNER_PRESSURE_GATE_RC 0`.
- cold-start, single readback: runner guard OK; overall still `PASS=90 WARN=1 BLOCKED=1`, `Result: BLOCKED`; the blocker is `188 momo daily sales data stale beyond 3 days`, not the runner.
- P3 release gate: runner/CD guardrails show `BAD_RUNNER_GUARDRAILS 0`; overall still `HOLD_P3_RELEASE`; blockers include cold-start, 188 backup stale, and 188 litellm not running.
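
The "runner binary must not be an ELF" readback can be approximated by inspecting the first bytes of each path. The sketch below is one possible way to do it, not the check the scripts actually use: `hex_prefix` and `runner_binary_state` are hypothetical helper names, and the demo file is a throwaway temp file rather than a real runner path.

```shell
#!/usr/bin/env bash
# Sketch only: hex_prefix and runner_binary_state are hypothetical helpers.

# Prints the first N bytes of a file as lowercase hex with no separators.
hex_prefix() {
  head -c "$2" "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n'
}

# Classifies one runner binary path for a readback line.
runner_binary_state() {
  local path="$1"
  if [ ! -e "$path" ]; then
    echo "missing"
  elif [ "$(hex_prefix "$path" 4)" = "7f454c46" ]; then
    echo "elf"      # ELF magic 0x7f 'E' 'L' 'F': a real binary is still in place
  elif [ "$(hex_prefix "$path" 2)" = "2321" ]; then
    echo "stub"     # starts with "#!": consistent with a shell stub
  else
    echo "unknown"
  fi
}

# Demo against a throwaway file that stands in for a quarantined path.
demo="$(mktemp)"
printf '#!/bin/sh\nexit 1\n' > "$demo"
echo "demo state: $(runner_binary_state "$demo")"
rm -f "$demo"
```

A gate built on this would treat anything other than `stub` as a guardrail failure, so an unreadable or unexpected file counts against the release rather than passing silently.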
**Boundaries**: This section did not restart Docker / Nginx / firewall / K3s / DB, did not read raw sessions / SQLite / auth / `.env` / runner token, and did not restore the 110 runner.
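## 2026-06-28 - 08:45 110 runner pressure incident: source / live fail-closed convergence
**Background**: The commander fully authorized opening the non-incident-level gates, but the 110 Gitea runner kept relaunching the StockPlatform headless Chrome smoke, which has already caused a CPU / CI pressure incident on the production host; the runner must not be reopened directly until it has been migrated / rate-limited.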