fix(recovery): harden runner failclosed authority copy [skip ci]
@@ -46,7 +46,7 @@

 The correct behavior is for the AI to automatically fill in the target selector, source-of-truth diff, check-mode / dry-run, rollback, post-apply verifier, and KM / PlayBook trust writeback, and then proceed with an implementation that is verifiable, reversible, and has a low blast radius.

-**110 runner / controlled CD lane pressure-incident exception**: When Gitea / act-runner / direct transient runners, the generic `ubuntu-latest` label, or heavy StockPlatform / headless / Playwright-style jobs put CPU / Docker build pressure on 110, this is incident-level capacity protection. "Blanket authorization" must not be used to directly reopen the legacy runner, remove the legacy mask, restore the legacy runner binary, launch the `.real` binary directly with `systemd-run`, or switch the host pressure gate to warn-only. Until runner migration or non-110 hard limiting is complete, `awoooi-cd-lane.service`, `awoooi-cd-lane-drain.service`, the direct runner, and the Gitea runner must be kept masked / inactive / no process / no job container / root restore-source left `0` by `awoooi-runner-failclosed-enforcer.timer`, `awoooi-runner-failclosed-authority.timer`, and `/etc/cron.d/awoooi-runner-failclosed-authority`; the old `/tmp/enforce-110-runner-failclosed.sh` and `/tmp/awoooi-enforce-runner-failclosed-110.sh*` enforcer sources, startup open drop-ins, the `awoooi-runner-failclosed-opened-*`, `awoooi-runner-failclosed-*-opened-*`, `awoooi-runner-failclosed-quarantine-*`, and `failclosed-final-mask-*` disabler artifacts, and restore-sources must likewise be archived or replaced with fail-closed stubs. The Gitea `cd.yaml` / `code-review.yaml` push workflows stay manual-only.
+**110 runner / controlled CD lane pressure-incident exception**: When Gitea / act-runner / direct transient runners, the generic `ubuntu-latest` label, or heavy StockPlatform / headless / Playwright-style jobs put CPU / Docker build pressure on 110, this is incident-level capacity protection. "Blanket authorization" must not be used to directly reopen the legacy runner, remove the legacy mask, restore the legacy runner binary, launch the `.real` binary directly with `systemd-run`, or switch the host pressure gate to warn-only. Until runner migration or non-110 hard limiting is complete, `awoooi-cd-lane.service`, `awoooi-cd-lane-drain.service`, the direct runner, and the Gitea runner must be kept masked / inactive / no process / no job container / root restore-source left `0` by `awoooi-runner-failclosed-enforcer.timer`, `awoooi-runner-failclosed-authority.timer`, and `/etc/cron.d/awoooi-runner-failclosed-authority`; the cron / systemd authority must execute `/usr/local/lib/awoooi/enforce-110-runner-failclosed.authority.sh`, so that automatic repair still works when an external opener overwrites the canonical `/usr/local/lib/awoooi/enforce-110-runner-failclosed.sh`. The old `/tmp/enforce-110-runner-failclosed.sh` and `/tmp/awoooi-enforce-runner-failclosed-110.sh*` enforcer sources, startup open drop-ins, the `awoooi-runner-failclosed-opened-*`, `awoooi-runner-failclosed-*-opened-*`, `awoooi-runner-failclosed-quarantine-*`, and `failclosed-final-mask-*` disabler artifacts, and restore-sources must likewise be archived or replaced with fail-closed stubs. The Gitea `cd.yaml` / `code-review.yaml` push workflows stay manual-only.

 ---

@@ -291,7 +291,7 @@ force push / delete repo / delete refs / change repo visibility / raw runtime secret volu

 After the 2026-06-28 incident, Gitea / act-runner / direct transient runners, StockPlatform headless smoke, host-side Next builds, and Docker / BuildKit pressure on 110 are a capacity-incident protection surface. Even after receiving "approved / continue / blanket authorization", do not directly reopen the legacy runner, unmask the legacy service, restore the legacy runner binary, launch the `.real` binary directly with `systemd-run`, restore the generic `ubuntu-latest` label, or make warn-only the default for the host pressure gate.

-The permitted controlled apply is pressure reduction and recurrence prevention: stop / disable / mask the legacy runner, mask direct transient units, quarantine the legacy runner binary, narrow the labels, add source fail-closed guards, limit concurrency, move smoke tests to scheduled / non-110 runners, and run read-only pressure / cold-start verifiers. Until runner migration or non-110 hard limiting is complete, `awoooi-cd-lane.service`, `awoooi-cd-lane-drain.service`, the direct runner, and the Gitea runner must be kept masked / inactive / no process / no job container / root restore-source left `0` by `awoooi-runner-failclosed-enforcer.timer`, `awoooi-runner-failclosed-authority.timer`, and `/etc/cron.d/awoooi-runner-failclosed-authority`; if an external opener temporarily restores a unit, it may only be restored as a fail-closed stub carrying `ConditionPathExists=/run/awoooi-runner-migrated-or-hard-limited`, and the next cron authority / authority / enforcer round must converge it back to masked / inactive. The verifier must no longer accept a single `controlled_open` lane.
+The permitted controlled apply is pressure reduction and recurrence prevention: stop / disable / mask the legacy runner, mask direct transient units, quarantine the legacy runner binary, narrow the labels, add source fail-closed guards, limit concurrency, move smoke tests to scheduled / non-110 runners, and run read-only pressure / cold-start verifiers. Until runner migration or non-110 hard limiting is complete, `awoooi-cd-lane.service`, `awoooi-cd-lane-drain.service`, the direct runner, and the Gitea runner must be kept masked / inactive / no process / no job container / root restore-source left `0` by `awoooi-runner-failclosed-enforcer.timer`, `awoooi-runner-failclosed-authority.timer`, and `/etc/cron.d/awoooi-runner-failclosed-authority`; the cron / systemd authority must execute `/usr/local/lib/awoooi/enforce-110-runner-failclosed.authority.sh` and use that authority copy to repair the canonical `/usr/local/lib/awoooi/enforce-110-runner-failclosed.sh`. If an external opener temporarily restores a unit or overwrites the canonical, the unit may only be restored as a fail-closed stub carrying `ConditionPathExists=/run/awoooi-runner-migrated-or-hard-limited`, and the next cron authority / authority / enforcer round must converge it back to masked / inactive. The verifier must no longer accept a single `controlled_open` lane.
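
As a rough illustration of the fail-closed stub rule above, the following shell sketch checks whether a unit file carries the required `ConditionPathExists=` guard. The helper name `unit_is_failclosed_stub` and the scratch unit files are hypothetical and exist only for this example; only the sentinel path comes from the text. It is not the repository's verifier.

```shell
#!/usr/bin/env bash
set -eu

# Sentinel path quoted from the rule above.
SENTINEL="/run/awoooi-runner-migrated-or-hard-limited"

# Hypothetical helper: succeed only if the unit file contains an exact
# ConditionPathExists= line pointing at the sentinel.
unit_is_failclosed_stub() {
  local unit_file="$1"
  grep -Fxq -- "ConditionPathExists=${SENTINEL}" "$unit_file"
}

# Demo against scratch files (assumed contents, not real units).
scratch="$(mktemp -d)"
printf '[Unit]\nConditionPathExists=%s\n\n[Service]\nType=oneshot\nExecStart=/bin/true\n' \
  "$SENTINEL" >"$scratch/stub.service"
printf '[Unit]\nDescription=opened lane\n\n[Service]\nExecStart=/bin/true\n' \
  >"$scratch/opened.service"

for unit in "$scratch/stub.service" "$scratch/opened.service"; do
  if unit_is_failclosed_stub "$unit"; then
    echo "$(basename "$unit"): fail-closed stub"
  else
    echo "$(basename "$unit"): NOT fail-closed"
  fi
done
```

The check matches the whole line with `grep -Fx`, so a commented-out or differently pathed condition would not pass.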

 Restoring the runner requires all of the following:

@@ -1,3 +1,14 @@
+## 2026-06-28 - 16:22 110 runner fail-closed authority copy hardening
+
+**Background**: At 16:21 the P3 release gate again caught a short-lived external opener restoring `awoooi-cd-lane-drain.service` to `enabled / activating`, masking the fail-closed timers, and overwriting `/usr/local/lib/awoooi/enforce-110-runner-failclosed.sh` with a disabled stub. The original cron authority still existed, but if cron points at the overwritten canonical, it loses its ability to repair automatically.
+
+**Completed**:
+- `scripts/reboot-recovery/enforce-110-runner-failclosed.sh` adds the authority copy `/usr/local/lib/awoooi/enforce-110-runner-failclosed.authority.sh`; `--apply` installs / repairs the authority copy, the canonical, and the compatibility wrapper together.
+- `awoooi-runner-failclosed-enforcer.service`, `awoooi-runner-failclosed-authority.service`, and `/etc/cron.d/awoooi-runner-failclosed-authority` now execute the authority copy, so that when an external opener overwrites the canonical, the next cron / systemd authority round can still restore the canonical, the timers, the unit masks, the sentinel, the binary stub, and job container `0`.
+- `AGENTS.md`, `docs/HARD_RULES.md`, the MASTER spec, and `ops/runner/README.md` are updated in sync to pin this down: during a 110 runner/CD pressure incident, the canonical is not the sole root of trust; the authority copy is the automatic repair entry point.
+
+**Boundary**: Did not read runner token / secret / raw session / SQLite / auth / `.env`; did not restart Docker / Nginx / firewall / K3s / DB; did not open the legacy runner or the controlled drain lane.
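
The repair idea in this entry can be sketched in a few lines of shell. This is a simplified illustration under stated assumptions: the helper `repair_canonical_from_authority` is hypothetical, it works on scratch files rather than `/usr/local/lib/awoooi`, and it leaves out the `as_root` and `chattr` handling that the real script performs.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: reinstall the canonical script from the authority copy
# whenever the two differ or the canonical is missing.
repair_canonical_from_authority() {
  local authority="$1" canonical="$2"
  [ -f "$authority" ] || { echo "authority-missing"; return 1; }
  if cmp -s "$authority" "$canonical" 2>/dev/null; then
    echo "unchanged"
  else
    install -m 0755 "$authority" "$canonical"
    echo "repaired"
  fi
}

# Demo in a scratch directory; file names mirror the real ones for readability.
scratch="$(mktemp -d)"
authority="$scratch/enforce-110-runner-failclosed.authority.sh"
canonical="$scratch/enforce-110-runner-failclosed.sh"
printf '#!/usr/bin/env bash\necho "enforce fail-closed"\n' >"$authority"
# Simulate an external opener overwriting the canonical with a disabled stub.
printf '#!/usr/bin/env bash\nexit 0\n' >"$canonical"

first_run="$(repair_canonical_from_authority "$authority" "$canonical")"
second_run="$(repair_canonical_from_authority "$authority" "$canonical")"
echo "first run: $first_run"
echo "second run: $second_run"
```

The second call is a no-op because the files already match, which is the convergence property the cron / systemd rounds rely on.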
+
 ## 2026-06-28 - 15:20 IwoooS Wazuh live metadata owner packet no-persist validator

 **Completed**:

@@ -153,7 +153,7 @@ AWOOOI / AwoooP / IwoooS is not merely a monitoring page, alert forwarder, or security inventory
 3. When a PlayBook, rollback, verifier, source-of-truth, evidence ref, or owner field is missing, the AI Agent must automatically generate a controlled apply package containing the target selector, source diff, check-mode, rollback, post-check, and KM / PlayBook trust writeback.
 4. A guard's job is not to block all work but to steer actions toward allowlist / check-mode / controlled apply / staged rollout / verifier / rollback; a guard that can only answer "handle manually" is itself a P0/P1 fix candidate.
 5. The incident-level hard blocks that still cannot be opened directly are limited to: plaintext secret reads or exfiltration, irreversible data destruction, DB DROP / TRUNCATE / destructive restore, reboot / node drain / irreversible firewall cutover, credentialed exploit / external attack-style active scan, paid provider / cost cap / production provider route switching, OpenClaw core replacement without a completed replay / shadow / canary, force push / repo refs / visibility destruction, and raw runtime secret volume reads or writes.
-6. The 110 runner capacity incident is a hard-protection exception: do not reopen the legacy runner, lift the legacy fail-closed state, restore the generic label, or set the host pressure gate to warn-only; until runner migration or non-110 hard limiting is complete, the AWOOOI controlled CD lane / drain lane must also be kept masked / inactive / no process / no job container / root restore-source left `0` by `awoooi-runner-failclosed-enforcer.timer`, `awoooi-runner-failclosed-authority.timer`, and `/etc/cron.d/awoooi-runner-failclosed-authority`, the old `/tmp/enforce-110-runner-failclosed.sh` and `/tmp/awoooi-enforce-runner-failclosed-110.sh*` opener sources and the `awoooi-runner-failclosed-opened-*`, `awoooi-runner-failclosed-*-opened-*`, `awoooi-runner-failclosed-quarantine-*`, and `failclosed-final-mask-*` disabler artifacts must be sealed as fail-closed stubs, and the workflow push trigger stays manual-only.
+6. The 110 runner capacity incident is a hard-protection exception: do not reopen the legacy runner, lift the legacy fail-closed state, restore the generic label, or set the host pressure gate to warn-only; until runner migration or non-110 hard limiting is complete, the AWOOOI controlled CD lane / drain lane must also be kept masked / inactive / no process / no job container / root restore-source left `0` by `awoooi-runner-failclosed-enforcer.timer`, `awoooi-runner-failclosed-authority.timer`, and `/etc/cron.d/awoooi-runner-failclosed-authority`, the cron / systemd authority must execute `/usr/local/lib/awoooi/enforce-110-runner-failclosed.authority.sh` and repair the canonical `/usr/local/lib/awoooi/enforce-110-runner-failclosed.sh`, the old `/tmp/enforce-110-runner-failclosed.sh` and `/tmp/awoooi-enforce-runner-failclosed-110.sh*` opener sources and the `awoooi-runner-failclosed-opened-*`, `awoooi-runner-failclosed-*-opened-*`, `awoooi-runner-failclosed-quarantine-*`, and `failclosed-final-mask-*` disabler artifacts must be sealed as fail-closed stubs, and the workflow push trigger stays manual-only.
 7. The data freshness gate must be source-aware: if the Drive / provider source preflight proves that no source is newer than the last clean import, and the DB sync / import job is clean, stale business data is a source freshness warning; it becomes a hard blocker only when auth / source / failed-folder / DB sync shows an anomaly.
 8. The provider proxy gate must avoid opening cost or routes by mistake: proxies such as LiteLLM that are not provisioned and are already marked optional retired in the repo may only be listed as warnings; they must not be started automatically, nor may the production provider route be switched, just to pass the health gate.

@@ -418,7 +418,9 @@ quarantine restore source or `systemd-run` to bring them back to active.
 - `ops/runner/awoooi-runner-failclosed-authority.service`
 - `ops/runner/awoooi-runner-failclosed-authority.timer`

-Live 110 must install the canonical `/usr/local/lib/awoooi/enforce-110-runner-failclosed.sh`;
+Live 110 must install the authority copy `/usr/local/lib/awoooi/enforce-110-runner-failclosed.authority.sh`
+and the canonical `/usr/local/lib/awoooi/enforce-110-runner-failclosed.sh`; the cron / systemd authority always executes
+the authority copy, so that automatic repair still works when an external opener overwrites the canonical.
 `/usr/local/bin/awoooi-enforce-runner-failclosed-110.sh` serves only as a compatibility wrapper. Both
 `awoooi-runner-failclosed-enforcer.timer` and `awoooi-runner-failclosed-authority.timer` must be enabled.
 `/etc/cron.d/awoooi-runner-failclosed-authority` must exist, as the third-layer convergence authority for when the systemd timers are masked by a short-lived external opener.
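
A read-only presence check for the files named above might look like the sketch below. The function `check_failclosed_layout` and its optional root-prefix argument are hypothetical, added so the check can run against a scratch tree; it does not inspect timer state, which would additionally need something like `systemctl is-enabled`.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical read-only check: report which required files exist under an
# optional root prefix (empty prefix means the live filesystem).
check_failclosed_layout() {
  local root="${1:-}" missing=0 path
  for path in \
    /usr/local/lib/awoooi/enforce-110-runner-failclosed.authority.sh \
    /usr/local/lib/awoooi/enforce-110-runner-failclosed.sh \
    /usr/local/bin/awoooi-enforce-runner-failclosed-110.sh \
    /etc/cron.d/awoooi-runner-failclosed-authority; do
    if [ -e "${root}${path}" ]; then
      echo "ok      ${path}"
    else
      echo "MISSING ${path}"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Demo against a scratch tree in which the cron file is deliberately absent.
scratch="$(mktemp -d)"
mkdir -p "$scratch/usr/local/lib/awoooi" "$scratch/usr/local/bin" "$scratch/etc/cron.d"
touch "$scratch/usr/local/lib/awoooi/enforce-110-runner-failclosed.authority.sh" \
      "$scratch/usr/local/lib/awoooi/enforce-110-runner-failclosed.sh" \
      "$scratch/usr/local/bin/awoooi-enforce-runner-failclosed-110.sh"

if check_failclosed_layout "$scratch"; then
  echo "layout complete"
else
  echo "layout incomplete"
fi
```

Because it only tests for existence, the sketch cannot tell a genuine script from a disabled stub; content comparison is a separate concern.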
@@ -1,10 +1,10 @@
 [Unit]
 Description=AWOOOI 110 runner/CD lane fail-closed authority
-Documentation=file:/usr/local/lib/awoooi/enforce-110-runner-failclosed.sh
+Documentation=file:/usr/local/lib/awoooi/enforce-110-runner-failclosed.authority.sh
 Wants=network-online.target
 After=network-online.target docker.service

 [Service]
 Type=oneshot
-ExecStart=/usr/local/lib/awoooi/enforce-110-runner-failclosed.sh --apply
+ExecStart=/usr/local/lib/awoooi/enforce-110-runner-failclosed.authority.sh --apply
 TimeoutStartSec=180
@@ -1,10 +1,10 @@
 [Unit]
 Description=AWOOOI 110 runner/CD lane fail-closed enforcer
-Documentation=file:/usr/local/lib/awoooi/enforce-110-runner-failclosed.sh
+Documentation=file:/usr/local/lib/awoooi/enforce-110-runner-failclosed.authority.sh
 Wants=network-online.target
 After=network-online.target docker.service

 [Service]
 Type=oneshot
-ExecStart=/usr/local/lib/awoooi/enforce-110-runner-failclosed.sh --apply
+ExecStart=/usr/local/lib/awoooi/enforce-110-runner-failclosed.authority.sh --apply
 TimeoutStartSec=180
@@ -8,4 +8,8 @@ if [ -x "$SCRIPT_DIR/enforce-110-runner-failclosed.sh" ]; then
   exec "$SCRIPT_DIR/enforce-110-runner-failclosed.sh" "$@"
 fi

+if [ -x /usr/local/lib/awoooi/enforce-110-runner-failclosed.authority.sh ]; then
+  exec /usr/local/lib/awoooi/enforce-110-runner-failclosed.authority.sh "$@"
+fi
+
 exec /usr/local/lib/awoooi/enforce-110-runner-failclosed.sh "$@"
@@ -9,6 +9,7 @@ MODE="check"
 STAMP="$(date +%Y%m%dT%H%M%S%z)"
 APPLY_PERFORMED=0
 CANONICAL_ENFORCER="/usr/local/lib/awoooi/enforce-110-runner-failclosed.sh"
+AUTHORITY_ENFORCER="/usr/local/lib/awoooi/enforce-110-runner-failclosed.authority.sh"
 COMPAT_ENFORCER="/usr/local/bin/awoooi-enforce-runner-failclosed-110.sh"

 usage() {
@@ -335,16 +336,25 @@ repair_enforcer_entrypoints() {
   local tmp
   current="$(readlink -f "$0" 2>/dev/null || printf '%s' "$0")"
   as_root mkdir -p "$(dirname "$CANONICAL_ENFORCER")" >/dev/null 2>&1 || true
+  as_root mkdir -p "$(dirname "$AUTHORITY_ENFORCER")" >/dev/null 2>&1 || true
   if [ -f "$current" ] && [ "$current" != "$CANONICAL_ENFORCER" ]; then
     as_root chattr -i "$CANONICAL_ENFORCER" >/dev/null 2>&1 || true
     as_root install -o root -g root -m 0755 "$current" "$CANONICAL_ENFORCER" >/dev/null 2>&1 || true
   fi
   as_root chattr +i "$CANONICAL_ENFORCER" >/dev/null 2>&1 || true
+  if [ -f "$current" ] && [ "$current" != "$AUTHORITY_ENFORCER" ]; then
+    as_root chattr -i "$AUTHORITY_ENFORCER" >/dev/null 2>&1 || true
+    as_root install -o root -g root -m 0755 "$current" "$AUTHORITY_ENFORCER" >/dev/null 2>&1 || true
+  fi
+  as_root chattr +i "$AUTHORITY_ENFORCER" >/dev/null 2>&1 || true

   tmp="$(mktemp)"
   cat >"$tmp" <<'EOF'
 #!/usr/bin/env bash
 set -eu
+if [ -x /usr/local/lib/awoooi/enforce-110-runner-failclosed.authority.sh ]; then
+  exec /usr/local/lib/awoooi/enforce-110-runner-failclosed.authority.sh "$@"
+fi
 exec /usr/local/lib/awoooi/enforce-110-runner-failclosed.sh "$@"
 EOF
   as_root chattr -i "$COMPAT_ENFORCER" >/dev/null 2>&1 || true
@@ -365,13 +375,13 @@ repair_enforcer_systemd_units() {
   cat >"$service_tmp" <<'EOF'
 [Unit]
 Description=AWOOOI 110 runner/CD lane fail-closed enforcer
-Documentation=file:/usr/local/lib/awoooi/enforce-110-runner-failclosed.sh
+Documentation=file:/usr/local/lib/awoooi/enforce-110-runner-failclosed.authority.sh
 Wants=network-online.target
 After=network-online.target docker.service

 [Service]
 Type=oneshot
-ExecStart=/usr/local/lib/awoooi/enforce-110-runner-failclosed.sh --apply
+ExecStart=/usr/local/lib/awoooi/enforce-110-runner-failclosed.authority.sh --apply
 TimeoutStartSec=180
 EOF

@@ -395,13 +405,13 @@ EOF
   cat >"$authority_service_tmp" <<'EOF'
 [Unit]
 Description=AWOOOI 110 runner/CD lane fail-closed authority
-Documentation=file:/usr/local/lib/awoooi/enforce-110-runner-failclosed.sh
+Documentation=file:/usr/local/lib/awoooi/enforce-110-runner-failclosed.authority.sh
 Wants=network-online.target
 After=network-online.target docker.service

 [Service]
 Type=oneshot
-ExecStart=/usr/local/lib/awoooi/enforce-110-runner-failclosed.sh --apply
+ExecStart=/usr/local/lib/awoooi/enforce-110-runner-failclosed.authority.sh --apply
 TimeoutStartSec=180
 EOF

@@ -455,7 +465,7 @@ repair_enforcer_cron_authority() {
   cat >"$tmp" <<'EOF'
 SHELL=/bin/bash
 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
-* * * * * root /usr/local/lib/awoooi/enforce-110-runner-failclosed.sh --apply >>/var/log/awoooi-runner-failclosed-authority-cron.log 2>&1
+* * * * * root /usr/local/lib/awoooi/enforce-110-runner-failclosed.authority.sh --apply >>/var/log/awoooi-runner-failclosed-authority-cron.log 2>&1
 EOF
   as_root install -o root -g root -m 0644 "$tmp" /etc/cron.d/awoooi-runner-failclosed-authority >/dev/null 2>&1 || true
   rm -f "$tmp"