Merge remote-tracking branch 'gitea-ssh/main' into codex/github-redacted-evidence-validator-20260627
Some checks failed
Ansible / Reboot Recovery Contract / validate (push) Successful in 1m16s
CD Pipeline / tests (push) Successful in 1m42s
Code Review / ai-code-review (push) Successful in 15s
CD Pipeline / build-and-deploy (push) Failing after 15m29s
CD Pipeline / post-deploy-checks (push) Has been cancelled
@@ -46,7 +46,7 @@
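
 The correct action is for the AI to automatically fill in the target selector, source-of-truth diff, check-mode / dry-run, rollback, post-apply verifier, and KM / PlayBook trust writeback, then advance an implementation that is verifiable, rollback-capable, and low in blast radius.

-**110 runner pressure incident exception**: When Gitea / act-runner puts CPU / headless smoke pressure on 110, this is incident-level capacity protection. "Blanket authorization" must not be used to directly restart the runner, remove the mask, restore the runner binary, or change the host pressure gate to warn-only. The correct action is to first do runner migration / rate limiting / label isolation / smoke scheduling, then perform a controlled recovery with check-mode, rollback, and a post-apply verifier.
+**110 runner pressure incident exception**: When Gitea / act-runner / a direct transient runner puts CPU / headless smoke pressure on 110, this is incident-level capacity protection. "Blanket authorization" must not be used to directly restart the runner, remove the mask, restore the runner binary, launch the `.real` binary directly with `systemd-run`, or change the host pressure gate to warn-only. The correct action is to first do runner migration / rate limiting / label isolation / smoke scheduling, then perform a controlled recovery with check-mode, rollback, and a post-apply verifier.

 ---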
@@ -20568,7 +20568,7 @@
   },
   "wazuhAccepted": {
     "label": "Wazuh accepted",
-    "detail": "Manager registry accepted is still 0."
+    "detail": "Manager registry accepted readback is 6; runtime gate remains 0."
   }
 },
 "domainMetric": {
@@ -20583,6 +20583,7 @@
   "waiting_actor_before_after_and_recurrence_guard": "Waiting for actor, before / after, and recurrence-prevention evidence",
   "manifest_mapped_read_only_runtime_gate_closed": "Manifest mapped; runtime gate still closed",
   "waiting_manager_registry_readback": "Waiting for full Wazuh manager registry readback",
+  "manager_registry_readback_accepted_runtime_gate_closed": "Manager registry accepted readback is present; runtime gate remains closed",
   "draft_waiting_owner_review_runtime_gate_closed": "Waiting for owner evidence review; runtime gate still closed",
   "read_only_inventory_runtime_write_gate_closed": "Read-only inventory complete; AI runtime write gate still closed"
 },
@@ -20568,7 +20568,7 @@
   },
   "wazuhAccepted": {
     "label": "Wazuh accepted",
-    "detail": "Manager registry accepted is still 0."
+    "detail": "Manager registry accepted has been read back as 6; runtime gate is still 0."
   }
 },
 "domainMetric": {
@@ -20583,6 +20583,7 @@
   "waiting_actor_before_after_and_recurrence_guard": "Waiting for actor, before / after, and recurrence-prevention evidence",
   "manifest_mapped_read_only_runtime_gate_closed": "Manifest mapped; runtime gate still closed",
   "waiting_manager_registry_readback": "Waiting for full Wazuh manager registry readback",
+  "manager_registry_readback_accepted_runtime_gate_closed": "Manager registry accepted has been read back; runtime gate still closed",
   "draft_waiting_owner_review_runtime_gate_closed": "Waiting for owner evidence review; runtime gate still closed",
   "read_only_inventory_runtime_write_gate_closed": "Read-only inventory complete; AI runtime write gate still closed"
 },
@@ -8573,7 +8573,7 @@ function IwoooSSecurityControlCoverageBoard() {
   key: 'wazuhAccepted',
   value: summary ? String(summary.wazuh_manager_registry_accepted_count) : '...',
   icon: Radar,
-  tone: 'locked',
+  tone: summary && summary.wazuh_manager_registry_accepted_count > 0 ? 'steady' : 'locked',
 },
 ]
 const domains = data?.domains ?? []
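The new `tone` expression in the hunk above can be read as a small pure function. The following is a minimal sketch for illustration only: the `Summary` interface, the `Tone` type, and the helper name `wazuhAcceptedTone` are assumptions, and only the field name `wazuh_manager_registry_accepted_count` and the two tone strings come from the diff.

```typescript
// Hypothetical shape: only the field referenced in the hunk is modelled here.
interface Summary {
  wazuh_manager_registry_accepted_count: number
}

// Hypothetical type: the diff only shows these two tone values for this card.
type Tone = 'steady' | 'locked'

// Mirrors the new expression: 'steady' only when a summary has loaded and the
// accepted count is positive; a missing summary or a zero count stays 'locked'.
function wazuhAcceptedTone(summary: Summary | null | undefined): Tone {
  return summary && summary.wazuh_manager_registry_accepted_count > 0 ? 'steady' : 'locked'
}

console.log(wazuhAcceptedTone(undefined))
console.log(wazuhAcceptedTone({ wazuh_manager_registry_accepted_count: 0 }))
console.log(wazuhAcceptedTone({ wazuh_manager_registry_accepted_count: 6 }))
```

Under this reading the card keeps its locked tone while data is loading, which matches the `'...'` placeholder used for `value` in the same object.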
@@ -289,9 +289,9 @@ force push / delete repo / delete refs / change repo visibility / raw runtime secret volu

 ### 110 runner pressure incident exception

-After the 2026-06-28 incident, pressure on 110 from Gitea / act-runner, StockPlatform headless smoke, host-side Next builds, and Docker / BuildKit is a capacity-incident protection surface. Even on receiving "approve / continue / blanket authorization", do not directly restart the runner, remove the service mask, restore the live runner binary, restore the generic `ubuntu-latest` label, or make warn-only the default for the host pressure gate.
+After the 2026-06-28 incident, pressure on 110 from Gitea / act-runner / a direct transient runner, StockPlatform headless smoke, host-side Next builds, and Docker / BuildKit is a capacity-incident protection surface. Even on receiving "approve / continue / blanket authorization", do not directly restart the runner, remove the service mask, restore the live runner binary, launch the `.real` binary directly with `systemd-run`, restore the generic `ubuntu-latest` label, or make warn-only the default for the host pressure gate.

-The permitted controlled apply is pressure reduction and recurrence prevention: stop / disable / mask the runner, quarantine the runner binary, narrow the labels, add a source fail-closed guard, migrate the runner, limit concurrency, move smoke to a schedule / a non-110 runner, and run the read-only pressure / cold-start verifier.
+The permitted controlled apply is pressure reduction and recurrence prevention: stop / disable / mask the runner, mask the direct transient unit, quarantine the runner binary, narrow the labels, add a source fail-closed guard, migrate the runner, limit concurrency, move smoke to a schedule / a non-110 runner, and run the read-only pressure / cold-start verifier.

 Restoring the runner requires all of the following:
@@ -30,6 +30,7 @@
 - On live 110, `/usr/local/bin/awoooi-wait-host-web-build-pressure.sh` and `/usr/local/bin/awoooi-startup-110.sh` have been synced to fail-closed and marked immutable.
 - The four runner entry points on live 110 were replaced with immutable fail-closed stubs; the original ELFs were only quarantined, and their contents were not read: `/home/wooo/act-runner/act_runner`, `/home/wooo/act-runner/act_runner.real-20260628-runner-pressure-guard`, `/home/wooo/act-runner-controlled/act_runner`, `/home/wooo/awoooi-controlled-runner/awoooi_controlled_runner`.
 - System-level and user-level `gitea-act-runner-host.service`, `gitea-act-runner-awoooi-controlled.service`, `gitea-awoooi-controlled-runner.service`, and `gitea-act-runner-awoooi-open.service` all read back as inactive / masked; the original unit files were only quarantined, and a `/dev/null` mask symlink prevents direct start.
+- At 08:54, a transient `awoooi-direct-runner-open.service` was again caught launching the `.real` binary directly and restoring `/home/wooo/act-runner/act_runner` / `.real` to ELFs; it was force-killed, `/dev/null` masks were created for `awoooi-direct-runner-open.service` and `awoooi-direct-runner.service`, and the ELFs were quarantined again / the immutable fail-closed stubs restored.
 - `AGENTS.md`, `docs/HARD_RULES.md`, `ops/runner/README.md`, and the MASTER runner section now include the 110 runner pressure incident exception: blanket authorization does not mean the production host runner may be restarted.

 **Live readback**:
@@ -37,6 +38,9 @@
 - At 08:40, an orphaned `/home/wooo/act-runner/act_runner.real-20260628-runner-pressure-guard daemon --config config.yaml` with parent=1 was caught and terminated; the binary was subsequently replaced with a stub to prevent any direct exec from bypassing systemd.
 - 08:47 delayed readback: all four system runner units show `LoadState=masked`, `ActiveState=inactive`, and `UnitFileState=masked`; the user-level runner units are also masked; the exact-match runner process scan found nothing.
 - At 08:52, the live pressure gate was rerun from a clean SSH command: `/usr/local/bin/awoooi-wait-host-web-build-pressure.sh` returned `GATE_RC=0` and `no host web/build/smoke pressure detected`.
+- 08:56 readback: `awoooi-direct-runner-open.service`, `awoooi-direct-runner.service`, and the four Gitea runner units are all `masked / inactive`; all four runner binaries are 163-byte immutable shell stubs; `pgrep` finds no runner process; the pressure gate returns `GATE_RC=0`.
+- The 08:58 delayed readback showed that the transient `awoooi-direct-runner-open.service` had disappeared and become `not-found`; `systemctl mask awoooi-direct-runner-open.service` was applied immediately, confirming `/etc/systemd/system/awoooi-direct-runner-open.service -> /dev/null` with `LoadState=masked` / `UnitFileState=masked`.
+- 08:59 final short-delay readback: both direct runner units and all four Gitea runner units are `masked / inactive`; the runner process scan found nothing; the pressure gate returns `GATE_RC=0`.
 - `post-start-quick-check.sh --no-color` returned `SCORECARD_RC=2`, `POST_START_QUICK_CHECK PASS=35 WARN=4 BLOCKED=4`, and `RESULT=BLOCKED`; all public routes are HTTP OK and StockPlatform freshness is `ok` / latest trading date `2026-06-26`, but MOMO daily sales data stale `4|2026-06-24`, the backup heartbeat core blocker, and credential escrow missing `5` still block an all-host green.
 - At 08:49, a manual Web image build from `/home/wooo/awoooi-manual-deploy` was observed; its parent was another SSH session, not a Gitea runner. By 08:52 the clean gate had recovered to `GATE_RC=0`; this live-only manual deploy path still needs to be brought under the pressure lock / a non-110 build path later.
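The readback entries above accept a unit only when it is both masked and inactive, and the 08:58 entry shows that a `not-found` unit does not count as protected. A minimal sketch of that check follows, assuming the readback text is in the `KEY=VALUE` form printed by `systemctl show -p LoadState,ActiveState,UnitFileState <unit>`. The helper names and the sample strings are hypothetical, and the diff does not show how the actual readback was scripted.

```typescript
// Parse KEY=VALUE lines (the format `systemctl show` prints) into a lookup table.
function parseUnitProperties(output: string): Record<string, string> {
  const props: Record<string, string> = {}
  for (const line of output.split('\n')) {
    const idx = line.indexOf('=')
    if (idx > 0) {
      props[line.slice(0, idx).trim()] = line.slice(idx + 1).trim()
    }
  }
  return props
}

// Hypothetical policy helper: a unit passes only when all three properties
// match the masked / inactive state recorded in the readback entries.
function isMaskedAndInactive(output: string): boolean {
  const props = parseUnitProperties(output)
  return (
    props['LoadState'] === 'masked' &&
    props['ActiveState'] === 'inactive' &&
    props['UnitFileState'] === 'masked'
  )
}

// Illustrative sample strings, not captured from the host.
const maskedUnit = 'LoadState=masked\nActiveState=inactive\nUnitFileState=masked'
const vanishedUnit = 'LoadState=not-found\nActiveState=inactive\nUnitFileState='

console.log(isMaskedAndInactive(maskedUnit))
console.log(isMaskedAndInactive(vanishedUnit))
```

Treating `not-found` as a failure fits the 08:58 entry: a transient unit that has merely disappeared can be started again, whereas a `/dev/null` mask symlink blocks it.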
@@ -41,7 +41,7 @@ resources:
 images:
 - name: 192.168.0.110:5000/library/api:IMAGE_TAG_PLACEHOLDER
   newName: 192.168.0.110:5000/awoooi/api
-  newTag: d4c2cc6e200fc00e07d179ebb9a4a156cef2c6d5
+  newTag: a1f5935481ad01cc3f73ebb4354726d57e7a2e41
 - name: 192.168.0.110:5000/library/web:IMAGE_TAG_PLACEHOLDER
   newName: 192.168.0.110:5000/awoooi/web
-  newTag: d4c2cc6e200fc00e07d179ebb9a4a156cef2c6d5
+  newTag: a1f5935481ad01cc3f73ebb4354726d57e7a2e41
@@ -388,14 +388,24 @@ runner registration / service:

 2026-06-28 live update: the 110 runner pressure incident was confirmed to involve a directly invoked
 `/home/wooo/act-runner/act_runner.real-20260628-runner-pressure-guard` orphan
-daemon. The four live runner entry points are now immutable fail-closed stubs; the original ELFs are only
-quarantined, contents unread; the related systemd units remain inactive / disabled / masked:
+daemon, which at one point used the transient `awoooi-direct-runner-open.service` to bypass the existing
+Gitea service names. The four live runner entry points are now immutable fail-closed stubs;
+the original ELFs are only quarantined, contents unread; the related systemd units remain inactive / masked:

 - `/home/wooo/act-runner/act_runner`
 - `/home/wooo/act-runner/act_runner.real-20260628-runner-pressure-guard`
 - `/home/wooo/act-runner-controlled/act_runner`
 - `/home/wooo/awoooi-controlled-runner/awoooi_controlled_runner`

+Unit names that must all stay masked:
+
+- `awoooi-direct-runner-open.service`
+- `awoooi-direct-runner.service`
+- `gitea-act-runner-host.service`
+- `gitea-act-runner-awoooi-controlled.service`
+- `gitea-awoooi-controlled-runner.service`
+- `gitea-act-runner-awoooi-open.service`
+
 Until runner migration / rate limiting / smoke scheduling is complete, do not remove the mask, restore the ELFs, restore the
 generic runner label, or change the host pressure gate default to warn-only.