From f5f3a10bf668fb05f3300a44a105a82c468660ce Mon Sep 17 00:00:00 2001
From: Your Name
Date: Thu, 21 May 2026 20:30:26 +0800
Subject: [PATCH] docs(awooop): record t138 cicd evidence surface [skip ci]

---
 docs/LOGBOOK.md | 107 ++++++++++++++++++
 ...-04-15-MASTER-ai-autonomous-flywheel-v2.md | 9 ++
 2 files changed, 116 insertions(+)

diff --git a/docs/LOGBOOK.md b/docs/LOGBOOK.md
index ae6544f1..e0ae4781 100644
--- a/docs/LOGBOOK.md
+++ b/docs/LOGBOOK.md
@@ -1,3 +1,110 @@
+## 2026-05-21 | T138 CI/CD evidence API + Deployments frontend surface
+
+**Trigger**:
+
+- T137 turned recovered rollout-risk from a CD log entry into an AWOOI API/AwoooP signal, but operators still need to see CI/CD, rollout-risk, and post-deploy gate status directly on the product page rather than relying only on Telegram or Actions logs.
+- Live verification showed that T137's `CI_rollout_risk_pending` was written to `alert_operation_log`, but older events did not persist `annotations`, so the rollout summary is visible only in the CD log and cannot be read back through the AwoooP API.
+
+**Fix**:
+
+- `apps/api/src/api/v1/webhooks.py`
+  - When `ALERT_RECEIVED` is written to `alert_operation_log.context`, persist `annotations` so that the summary / description of later CI/CD success, failure, and rollout-risk events can be queried.
+- `apps/api/src/services/platform_operator_service.py`
+  - Add a read-only `list_cicd_events()` that extracts `CI_*` alert evidence from `alert_operation_log`.
+  - Support `project_id`, `stage`, `status`, and `limit` filters, and output `needs_attention`, `duration_seconds`, commit, trigger, summary, description, and workflow_url.
+- `apps/api/src/api/v1/platform/operator_runs.py`
+  - Add `GET /api/v1/platform/cicd/events`; the router only delegates to the service and does not touch the DB directly.
+- `apps/web/src/components/panels/DeploymentsPanel.tsx`
+  - Add a "CI/CD 部署證據" (CI/CD deployment evidence) section to the Deployments panel, showing code-review / tests / post-deploy / rollout-risk, commit, trigger, time, duration, and summary / description.
+  - rollout-risk / failed / warning entries with `needs_attention=true` are rendered as "需注意" (needs attention).
+- `apps/web/messages/zh-TW.json`, `apps/web/messages/en.json`
+  - Add the i18n strings for the deployment evidence section.
+
+**Verification / deploy**:
+
+```text
+Local:
+python -m py_compile apps/api/src/api/v1/platform/operator_runs.py apps/api/src/services/platform_operator_service.py apps/api/src/api/v1/webhooks.py -> pass
+DATABASE_URL=postgresql+asyncpg://test:test@localhost/test PYTHONPATH=apps/api pytest apps/api/tests/test_awooop_operator_timeline_labels.py apps/api/tests/test_cicd_alertmanager_mapping.py -q
+  -> 45 passed
+pnpm --filter @awoooi/web typecheck -> pass
+pnpm --filter @awoooi/web lint -- --file src/components/panels/DeploymentsPanel.tsx -> pass
+NEXT_PUBLIC_API_URL=https://awoooi.wooo.work pnpm --filter @awoooi/web build -> pass
+node JSON parse apps/web/messages/zh-TW.json apps/web/messages/en.json -> pass
+git diff --check -> pass
+
+Code commit:
+4bdb012c feat(awooop): surface cicd rollout evidence
+
+Gitea Actions:
+#2834 ai-code-review -> success
+#2833 CD -> success
+  tests job 3667 -> success
+    pytest: 2190 passed, 23 skipped
+    B5 integration: 5 passed
+  build-and-deploy job 3668 -> success
+    API/Web/Worker image: 4bdb012caae8e000efc7d938fbdf5a65f52c0ef8
+  post-deploy-checks job 3669 -> success
+    Alert Chain Smoke: 8/8 checks passed
+    Source Link smoke: expected provider event matched
+    E2E smoke: 5 passed
+    CI/CD success notification mirrored through AWOOI API
+
+Production API readback:
+GET https://awoooi.wooo.work/api/v1/platform/cicd/events?project_id=awoooi&limit=10
+  -> includes CI_post_deploy_success for 4bdb012c
+  -> summary: AWOOOI 部署完成
+  -> description: API=✅; Web=✅; AlertChain=✅; SourceLink=✅; Monitoring=✅; Smoke=✅
+  -> duration_seconds: 97
+  -> includes older CI_rollout_risk_pending with needs_attention=true
+
+Production health:
+GET https://awoooi.wooo.work/api/v1/health
+  -> healthy, prod, mock_mode=false
+  -> api/postgresql/redis/ollama/openclaw/signoz up
+
+K8s / ArgoCD:
+  awoooi-api 192.168.0.110:5000/awoooi/api:4bdb012caae8e000efc7d938fbdf5a65f52c0ef8 2/2
+  awoooi-web 192.168.0.110:5000/awoooi/web:4bdb012caae8e000efc7d938fbdf5a65f52c0ef8 2/2
+  awoooi-worker 192.168.0.110:5000/awoooi/api:4bdb012caae8e000efc7d938fbdf5a65f52c0ef8 1/1
+  awoooi-prod -> Synced / Healthy / a5ed12937cc6e8e95a2be2c5453783f49d528f84
+
+Browser verification:
+https://awoooi.wooo.work/zh-TW/deployments
+  -> navigation visible
+  -> CI/CD 部署證據 12 筆
+  -> latest 4bdb012c code-review/tests/post-deploy rows visible
+  -> older CI_rollout_risk_pending visible as 需注意
+```
+
+**New tech debt / T139 candidates**:
+
+- The `post-deploy-checks` job of CD #2833 sat queued from 20:16:42 to 20:25:44 because the same 110 host runner (`capacity: 1`) was occupied by the `ewoooc-host` deploy job of `wooo/ewoooc`.
+- Live runner state: the Docker `gitea-runner` is still disabled (`Restart=no Status=exited Running=false`), but the user-level `gitea-act-runner-host.service` advertises both AWOOI and EwoooC labels. This lets cross-repo workloads delay AWOOI post-deploy gates.
+- The next step should address runner pool / repo label isolation, to avoid the case where the deploy has finished but verification is blocked by another repo.
+
+**Assessment**:
+
+- T138 promotes CI/CD / rollout-risk from a notification message to a queryable API and a frontend product section.
+- New CI/CD notifications persist annotations from now on; the older T137 rollout-risk event predates annotation persistence, so its summary/description remain null, but it is already shown in the frontend with `needs_attention=true`.
+- This stage adds no new auto-remediation strategy; it fills in operator judgment and audit visibility so that Telegram is not the only source of truth.
+
+**Current overall progress**:
+
+- AwoooP alert observability chain: 99.998%.
+- Incident-level source correlation visibility: 98.8%.
+- Source correlation apply state-chain verifiability: 99.72%.
+- Source correlation freshness / rolling gate: 98.2%.
+- Frontend AI automation management UI sync: 99.999%.
+- Dashboard snapshot / SSE console noise reduction: 99.2%.
+- CI/CD runner hygiene: 99.2%.
+- Runner ownership convergence: 96%.
+- API image build layer hygiene: 88%.
+- Deploy rollout-risk observability: 82% → 91%.
+- CI/CD evidence frontend visibility: 35% → 85%.
+- Build host pressure governance: 86%.
+- Full AI automation management productization: 99.962% → 99.963%.
+
 ## 2026-05-21 | T137 CD rollout-risk evidence capture
 
 **Trigger**:
diff --git a/docs/superpowers/specs/2026-04-15-MASTER-ai-autonomous-flywheel-v2.md b/docs/superpowers/specs/2026-04-15-MASTER-ai-autonomous-flywheel-v2.md
index 1249021f..86dc8097 100644
--- a/docs/superpowers/specs/2026-04-15-MASTER-ai-autonomous-flywheel-v2.md
+++ b/docs/superpowers/specs/2026-04-15-MASTER-ai-autonomous-flywheel-v2.md
@@ -2665,6 +2665,15 @@ After Phase 6 completes
 - Assessment: T135 converged runner ownership from two runners competing for jobs to a single host runner in control; the next step should not re-enable the Docker-wrapped runner, but instead do runner pool / repo label isolation, API image `apt-get` / `chown -R` layering, Web build cache/offload, and Playwright apt source-list hygiene.
 - Progress update: AwoooP alert observability chain about 99.998%; Incident-level source correlation visibility about 98.8%; Source correlation apply state-chain verifiability about 99.72%; Source correlation freshness / rolling gate about 98.2%; Frontend AI automation management UI sync about 99.999%; Dashboard snapshot / SSE console noise reduction about 99.2%; CI/CD runner hygiene about 99.2%; Runner ownership convergence about 96%; Build host pressure governance about 82%; Full AI automation management productization about 99.960%.
 
+**T138 CI/CD evidence API + Deployments frontend surface (2026-05-21 Taipei)**:
+- Trigger: T137 turned recovered rollout-risk from a CD log entry into an AWOOI API/AwoooP signal, but operators still need to see CI/CD, rollout-risk, and post-deploy gate status directly on the product page rather than relying only on Telegram or Actions logs. Live verification showed that T137's `CI_rollout_risk_pending` was written to `alert_operation_log`, but older events did not persist `annotations`, so the rollout summary is visible only in the CD log and cannot be read back through the AwoooP API.
+- Fix: `apps/api/src/api/v1/webhooks.py` persists `annotations` in the `ALERT_RECEIVED` op log context; `apps/api/src/services/platform_operator_service.py` adds a read-only `list_cicd_events()` that extracts `CI_*` alert evidence from `alert_operation_log` and outputs stage/status/severity/commit/trigger/summary/description/duration/`needs_attention`; `apps/api/src/api/v1/platform/operator_runs.py` adds `GET /api/v1/platform/cicd/events`; `apps/web/src/components/panels/DeploymentsPanel.tsx` adds a "CI/CD 部署證據" (CI/CD deployment evidence) section showing code-review/tests/post-deploy/rollout-risk and `needs_attention`; `apps/web/messages/{zh-TW,en}.json` add the i18n strings.
+- Verification: local `py_compile` pass; `DATABASE_URL=postgresql+asyncpg://test:test@localhost/test PYTHONPATH=apps/api pytest apps/api/tests/test_awooop_operator_timeline_labels.py apps/api/tests/test_cicd_alertmanager_mapping.py -q` -> `45 passed`; `pnpm --filter @awoooi/web typecheck` pass; `pnpm --filter @awoooi/web lint -- --file src/components/panels/DeploymentsPanel.tsx` pass; `NEXT_PUBLIC_API_URL=https://awoooi.wooo.work pnpm --filter @awoooi/web build` pass; JSON parse + `git diff --check` pass. `4bdb012c feat(awooop): surface cicd rollout evidence` pushed to Gitea main; Code Review #2834 success; CD #2833 success: tests job 3667 `2190 passed, 23 skipped` + B5 `5 passed`, build-and-deploy job 3668 success, post-deploy job 3669 Alert Chain `8/8`, Source Link provider event matched, E2E `5 passed`, CI/CD success notification mirrored.
+- Production readback: `GET /api/v1/platform/cicd/events?project_id=awoooi&limit=10` returns `CI_post_deploy_success` for `4bdb012caae8e000efc7d938fbdf5a65f52c0ef8`, `summary=AWOOOI 部署完成`, `description=API=✅; Web=✅; AlertChain=✅; SourceLink=✅; Monitoring=✅; Smoke=✅`, `duration_seconds=97`; the older `CI_rollout_risk_pending` still has `needs_attention=true`. Production health healthy/prod/mock_mode=false; API/Web/Worker image `4bdb012c...` ready (API 2/2, Web 2/2, Worker 1/1); ArgoCD `awoooi-prod` Synced/Healthy at `a5ed12937cc6e8e95a2be2c5453783f49d528f84`. Browser `https://awoooi.wooo.work/zh-TW/deployments`: navigation visible, "CI/CD 部署證據 12 筆" (12 entries), latest `4bdb012c` rows visible, old rollout-risk visible as "需注意" (needs attention).
+- New tech debt: the `post-deploy-checks` job of CD #2833 sat queued from 20:16:42 to 20:25:44 because the same 110 user-level `gitea-act-runner-host.service` (`capacity: 1`) was occupied by the `ewoooc-host` deploy job of `wooo/ewoooc`; the Docker `gitea-runner` is still disabled, and the problem is that the same user-level runner advertises both AWOOI and EwoooC labels. The next step should address runner pool / repo label isolation, to avoid the case where the deploy has finished but verification is blocked by another repo.
+- Assessment: T138 promotes CI/CD / rollout-risk from a notification message to a queryable API and a frontend product section. New CI/CD notifications persist annotations from now on; the older T137 rollout-risk event predates annotation persistence, so its summary/description remain null, but it is already shown in the frontend with `needs_attention=true`. This stage adds no new auto-remediation strategy; it fills in operator judgment and audit visibility so that Telegram is not the only source of truth.
+- Progress update: AwoooP alert observability chain about 99.998%; Incident-level source correlation visibility about 98.8%; Source correlation apply state-chain verifiability about 99.72%; Source correlation freshness / rolling gate about 98.2%; Frontend AI automation management UI sync about 99.999%; Dashboard snapshot / SSE console noise reduction about 99.2%; CI/CD runner hygiene about 99.2%; Runner ownership convergence about 96%; API image build layer hygiene about 88%; Deploy rollout-risk observability about 91%; CI/CD evidence frontend visibility about 85%; Build host pressure governance about 86%; Full AI automation management productization about 99.963%.
+
 **T137 CD rollout-risk evidence capture (2026-05-21 Taipei)**:
 - Trigger: during the T136 deploy, production briefly returned `502` and the 120 K8s API returned `ServiceUnavailable`, but the CD pipeline originally notified only on final success/failure; operators could not tell directly from Telegram/AwoooP whether risk had appeared mid-rollout, whether it had recovered, or whether manual intervention was needed.
 - Fix: `.gitea/workflows/cd.yaml` wraps the ArgoCD wait / rollout / final health check in `Deploy to K8s (ArgoCD GitOps)` with `ROLLOUT_LOG` capture; during the remote deploy wait it detects ArgoCD query failures, ArgoCD `Unknown` status, and non-200 responses / curl timeouts from the public `API_HEALTH_URL`. If the rollout ultimately succeeds but risk was observed, it outputs `AWOOOI_ROLLOUT_RISK=1` and `AWOOOI_ROLLOUT_SUMMARY=...` and sends exactly one `rollout-risk` pending/warning to the AWOOI API/AwoooP; if the rollout ultimately fails, the failure notification carries the rollout summary instead of only the commit message.