fix(cd): keep drain recovery enforcer on controlled profile
All checks were successful
CD Pipeline / workflow-shape (push) Successful in 0s
CD Pipeline / cancel-stale-cd (push) Has been skipped
CD Pipeline / tests (push) Successful in 53s
CD Pipeline / build-and-deploy (push) Successful in 4m18s
CD Pipeline / post-deploy-checks (push) Successful in 1m43s

Your Name
2026-07-02 01:33:00 +08:00
parent c1823b5f62
commit 0c73e98c3c
3 changed files with 32 additions and 0 deletions


@@ -254,6 +254,8 @@ jobs:
;;
docs/LOGBOOK.md)
;;
docs/runbooks/REBOOT-RECOVERY-SOP.md)
;;
docs/runbooks/REBOOT-POST-START-QUICK-CHECK.md)
;;
docs/runbooks/FULL-STACK-COLD-START-SOP.md)
@@ -580,6 +582,8 @@ jobs:
;;
scripts/reboot-recovery/deploy-to-188.sh)
;;
scripts/reboot-recovery/enforce-110-runner-failclosed.sh)
;;
scripts/reboot-recovery/recover-110-control-path-and-harbor-local.sh)
;;
scripts/reboot-recovery/apply-credential-escrow-closeout-receipt-to-110.sh)
@@ -821,6 +825,7 @@ jobs:
../../ops/runner/install-awoooi-non110-runner-user-service.sh \
../../scripts/reboot-recovery/deploy-to-110.sh \
../../scripts/reboot-recovery/deploy-to-188.sh \
../../scripts/reboot-recovery/enforce-110-runner-failclosed.sh \
../../scripts/reboot-recovery/recover-110-control-path-and-harbor-local.sh \
../../scripts/reboot-recovery/awoooi-startup.sh \
../../scripts/reboot-recovery/install-reboot-auto-recovery-slo-110.sh \
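
The hunks above add two paths to the workflow's path classifier and one script to a script list. As a rough illustration of the classification idea only, here is a minimal Python sketch. It assumes a change set stays on the controlled-runtime profile only when every changed path is allowlisted, and falls back to the full profile otherwise. The function name `select_profile`, the profile strings, and the partial allowlist are hypothetical; the real logic is the shell `case` statement in the workflow, which is not reproduced here.

```python
from typing import Iterable

# Hypothetical, partial allowlist: only paths that are visible in this diff.
CONTROLLED_RUNTIME_PATHS = frozenset({
    "docs/LOGBOOK.md",
    "docs/runbooks/REBOOT-RECOVERY-SOP.md",
    "docs/runbooks/REBOOT-POST-START-QUICK-CHECK.md",
    "docs/runbooks/FULL-STACK-COLD-START-SOP.md",
    "scripts/reboot-recovery/deploy-to-188.sh",
    "scripts/reboot-recovery/enforce-110-runner-failclosed.sh",
    "scripts/reboot-recovery/recover-110-control-path-and-harbor-local.sh",
})


def select_profile(changed_paths: Iterable[str]) -> str:
    """Return "controlled-runtime" only if every changed path is allowlisted.

    Assumption: a single non-allowlisted path (or an empty change set)
    selects the "full" profile.
    """
    paths = list(changed_paths)
    if paths and all(p in CONTROLLED_RUNTIME_PATHS for p in paths):
        return "controlled-runtime"
    return "full"


if __name__ == "__main__":
    print(select_profile([
        "docs/runbooks/REBOOT-RECOVERY-SOP.md",
        "scripts/reboot-recovery/enforce-110-runner-failclosed.sh",
    ]))
```

Under this assumption, a commit touching only the recovery SOP and the enforcer script stays on the controlled-runtime profile once both paths are allowlisted, while any path missing from the list sends the whole change set to the full profile.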


@@ -52405,6 +52405,30 @@ production browser smoke:
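**Next steps**
- After commit / push to Gitea main, read back CD, then sync the new enforcer to 110 under control and rerun the non-secret guardrail apply and `check-awoooi-110-controlled-cd-lane-readiness.sh`. The goal is for active blockers to converge to registration / service inactive only, with no more false blockers caused by the enforcer re-sealing config / binary / unit.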
## 2026-07-02 — P0 110 controlled drain live staging and CD #4341 B5 mis-run fix
**Completed**
- Source commit `c1823b5f6 fix(ops): preserve controlled drain lane staging` has been pushed to Gitea main. Live 110 has received a controlled sync of the new enforcer, the readiness verifier, `awoooi-cd-lane-drain.service`, and the narrow-label `config.yaml`, and the ELF `awoooi_cd_lane_controlled` was extracted from the existing `gitea/act_runner:latest`.
- The live apply explicitly reported `SERVICE_STARTED=0`, `REGISTRATION_TOUCHED=0`, `operation_boundary_runner_token_read=false`, and `operation_boundary_raw_runner_registration_read=false`: no runner was registered, no service was started, and the contents of `.runner` were not read.
- 110 enforcer readback: `CONTROLLED_DRAIN_STAGING_ALLOWED=1`, `RUNNER_UNITS_BAD_COUNT=0`, `awoooi-cd-lane-drain.service load=loaded active=inactive unitfile=disabled`; legacy / generic runners all remain masked / inactive.
- 110 readiness verifier readback: `CONFIG_READY=1`, `BINARY_READY=1`, `REGISTRATION_READY=0`, `SERVICE_READY=0`, `LEGACY_FAILCLOSED=1`, `PRIMARY_LANE_FAILCLOSED=1`, `BLOCKER_COUNT=2`; the only remaining blockers are `controlled_cd_lane_registration_missing` and `controlled_cd_lane_service_not_active`.
- The failure of Gitea CD `#4341` has been traced to the tests job mistakenly running full-profile B5: `BLOCKER b5_docker_socket_unavailable`. Root cause: this round of changes includes `docs/runbooks/REBOOT-RECOVERY-SOP.md` and `scripts/reboot-recovery/enforce-110-runner-failclosed.sh`, but neither path was on the controlled-runtime allowlist.
- `.gitea/workflows/cd.yaml` now adds both paths to the controlled-runtime profile and adds the enforcer to the controlled-runtime `bash -n` syntax check. `ops/runner/test_cd_controlled_runtime_profile.py` gains a regression test so that similar recovery/enforcer patches do not fall into the B5 Docker socket path again.
**Local verification results**
- `python3.11 -m pytest ops/runner/test_cd_controlled_runtime_profile.py -q`: `43 passed`.
- `python3 ops/runner/guard-gitea-runner-pressure.py --root .`: passed, `auto_branch_events_on_110=0`, `generic_runner_labels=0`.
- `node scripts/ci/check-gitea-step-env-secrets.js`: passed, `no Gitea run/with secrets or legacy Telegram routes`.
- `git diff --check`: passed.
**Still in effect**
- No secret / token / `.env` / raw sessions / SQLite / auth was read; the contents of `.runner` were not read.
- GitHub / gh / GitHub API / GitHub Actions were not used.
- No host reboot; no Docker / Nginx / K3s / DB / firewall restart; no workflow_dispatch; no DROP / TRUNCATE / restore / prune.
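**Next steps**
- Commit / push the workflow classifier fix, read back the new Gitea CD run, and confirm that tests take the controlled-runtime profile and skip B5. Runner registration still has to be completed through a token-safe path before `awoooi-cd-lane-drain.service` can be started.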
## 2026-07-01 — 08:50 P0 188 DB circuit breaker post-push readback
**Completed**


@@ -743,6 +743,7 @@ def test_post_start_recovery_verifiers_stay_on_controlled_runtime_profile() -> N
text = _workflow_text()
expected_sources = [
"docs/runbooks/REBOOT-POST-START-QUICK-CHECK.md)",
"docs/runbooks/REBOOT-RECOVERY-SOP.md)",
"docs/runbooks/FULL-STACK-COLD-START-SOP.md)",
"docs/operations/host-cpu-pressure-drain-readback-2026-07-01.snapshot.json)",
"docs/operations/post-reboot-runtime-recovery-readback-2026-07-01.snapshot.json)",
@@ -759,6 +760,7 @@ def test_post_start_recovery_verifiers_stay_on_controlled_runtime_profile() -> N
"scripts/ops/host-runaway-process-exporter.py)",
"scripts/ops/host-sustained-load-evidence.py)",
"scripts/reboot-recovery/deploy-to-110.sh)",
"scripts/reboot-recovery/enforce-110-runner-failclosed.sh)",
"scripts/reboot-recovery/recover-110-control-path-and-harbor-local.sh)",
"scripts/reboot-recovery/apply-credential-escrow-closeout-receipt-to-110.sh)",
"scripts/reboot-recovery/post-start-quick-check.sh)",
@@ -791,6 +793,7 @@ def test_post_start_recovery_verifiers_stay_on_controlled_runtime_profile() -> N
"../../ops/reboot-recovery/full-stack-cold-start-baseline.yml",
"../../ops/runner/check-awoooi-110-controlled-cd-lane-readiness.sh",
"../../scripts/reboot-recovery/deploy-to-110.sh",
"../../scripts/reboot-recovery/enforce-110-runner-failclosed.sh",
"../../scripts/reboot-recovery/recover-110-control-path-and-harbor-local.sh",
"../../scripts/reboot-recovery/apply-credential-escrow-closeout-receipt-to-110.sh",
"../../scripts/reboot-recovery/post-start-quick-check.sh",
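
The test hunks above extend lists of strings that are expected to appear in the workflow file. A minimal sketch of that style of regression check follows, assuming it reduces to a substring search over the workflow text. The helper name `missing_entries` and the inline sample text are hypothetical; the real test reads the workflow through `_workflow_text()`, whose body is not shown in this diff.

```python
def missing_entries(workflow_text: str, expected: list[str]) -> list[str]:
    """Return the expected entries that do not occur in the workflow text."""
    return [entry for entry in expected if entry not in workflow_text]


# Hypothetical stand-in for the workflow file; the real test loads it from disk.
SAMPLE_WORKFLOW = """
  docs/runbooks/REBOOT-RECOVERY-SOP.md)
    ;;
  scripts/reboot-recovery/enforce-110-runner-failclosed.sh)
    ;;
"""

EXPECTED_SOURCES = [
    "docs/runbooks/REBOOT-RECOVERY-SOP.md)",
    "scripts/reboot-recovery/enforce-110-runner-failclosed.sh)",
]


def test_recovery_paths_stay_on_controlled_runtime_profile() -> None:
    # Fails with the list of missing entries if an allowlist entry is dropped.
    assert missing_entries(SAMPLE_WORKFLOW, EXPECTED_SOURCES) == []


if __name__ == "__main__":
    test_recovery_paths_stay_on_controlled_runtime_profile()
```

Keeping the trailing `)` in each expected string ties the check to a `case` pattern line instead of any incidental mention of the path, which appears to be why the entries in the diff carry it.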