fix(alerts): surface legacy hitl backlog
Some checks failed
CD Pipeline / tests (push) Successful in 1m21s
Code Review / ai-code-review (push) Successful in 13s
Type Sync Check / check-type-sync (push) Failing after 40s
CD Pipeline / build-and-deploy (push) Successful in 5m22s
CD Pipeline / post-deploy-checks (push) Successful in 2m19s
Some checks failed
CD Pipeline / tests (push) Successful in 1m21s
Code Review / ai-code-review (push) Successful in 13s
Type Sync Check / check-type-sync (push) Failing after 40s
CD Pipeline / build-and-deploy (push) Successful in 5m22s
CD Pipeline / post-deploy-checks (push) Successful in 2m19s
This commit is contained in:
@@ -2665,6 +2665,12 @@ Phase 6 完成後
|
||||
- 判讀:T135 已把 runner ownership 從雙 runner 搶工收斂到 host runner 單一主控;下一段不要重新啟用 Docker-wrapped runner,而是做 runner pool / repo label 隔離、API image `apt-get` / `chown -R` 分層、Web build cache/offload、Playwright apt source-list hygiene。
|
||||
- 目前進度更新:AwoooP 告警可觀測鏈約 99.998%;Incident-level source correlation 可見性約 98.8%;Source correlation apply 狀態鏈可驗證性約 99.72%;Source correlation freshness / rolling gate 約 98.2%;前端 AI 自動化管理介面同步約 99.999%;Dashboard snapshot / SSE console noise 收斂約 99.2%;CI/CD runner hygiene 約 99.2%;Runner ownership 收斂約 96%;Build host pressure治理約 82%;完整 AI 自動化管理產品化約 99.960%。
|
||||
|
||||
**T153 Legacy HITL pending visibility + warning split(2026-05-31 台北)**:
|
||||
- 觸發:production `/api/v1/approvals/pending` 有 legacy HITL backlog `count=21`,但 `/api/v1/platform/approvals` 回 `total=0`,operator 會以為前台沒有人工待辦;同時 heartbeat 用 raw PENDING 數量觸發「PENDING 積壓,需人工處理」,把 OBSERVE / NO_ACTION 觀察卡與真正可執行審批混在一起。
|
||||
- 修正:`ApprovalRequest` / `ApprovalRequestResponse` 外露 `incident_id`、`matched_playbook_id`、`telegram_message_id`、`telegram_chat_id`;DB / repository converter 保留 legacy approval record 的 incident / playbook / Telegram delivery 欄位。`HeartbeatReportService` 將 24h pending 拆成 actionable、observe-only、without-TG;warning 只看 actionable backlog,並在訊息指向 `/awooop/approvals`,Telegram heartbeat 顯示「待審拆分」。
|
||||
- Verification:API py_compile pass;targeted ruff for new test pass;`test_approval_pending_visibility.py` 4 passed;`test_heartbeat_ollama_endpoints.py` + `test_heartbeat_pod_state_machine.py` 15 passed;`git diff --check` pass。
|
||||
- 判讀:T153 不批次 approve/reject 生產 PENDING,也不把觀察卡刪掉;它把「前台看得到 legacy HITL 事實」與「告警只針對真正人工 actionable backlog」補齊。舊 fallback kubectl / SSH action 仍需 operator 在 `/awooop/approvals` 逐筆決策;OBSERVE / NO_ACTION 類不再偽裝成 emergency manual backlog。下一段可追 LLM failure fallback 為何大量產生 `OBSERVE / medium` 卡片,但需避免破壞 agent 後續把 PENDING 更新成可執行 action 的路徑。
|
||||
|
||||
**T152 Ansible runtime readiness surfaced(2026-05-24 台北)**:
|
||||
- 觸發:T151 已讓首頁看到 execution backend / Ansible attribution,但 operator 仍看不到 runtime 端缺什麼,容易把「Ansible 有候選」誤解成「Ansible 已能自動修復」。
|
||||
- 修正:API image 複製 `infra/ansible/` 作 read-only catalog;`truth-chain/quality/summary` 新增 `ansible_runtime`,回報 playbook binary、catalog、inventory、playbook_count、can_run_check_mode、blockers。首頁 execution evidence 同步顯示 runtime 狀態;目前 production 顯示 `runtime 未就緒:ansible_playbook_binary_missing`。未安裝 `ansible-core`、未啟用 check-mode / apply。
|
||||
|
||||
Reference in New Issue
Block a user