docs(awooop): record t138 cicd evidence surface [skip ci]
@@ -2665,6 +2665,15 @@ After Phase 6 completion
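- Assessment: T135 consolidated runner ownership from two runners competing for jobs down to the host runner as the single owner. The next stage should not re-enable the Docker-wrapped runner; it should instead cover runner pool / repo label isolation, layering of the API image `apt-get` / `chown -R` steps, Web build cache/offload, and Playwright apt source-list hygiene.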
- Current progress: AwoooP alert observability chain about 99.998%; incident-level source correlation visibility about 98.8%; source correlation apply status chain verifiability about 99.72%; source correlation freshness / rolling gate about 98.2%; frontend AI automation management interface sync about 99.999%; Dashboard snapshot / SSE console noise reduction about 99.2%; CI/CD runner hygiene about 99.2%; runner ownership consolidation about 96%; build host pressure governance about 82%; complete AI automation management productization about 99.960%.

**T138 CI/CD evidence API + Deployments frontend surface (2026-05-21, Taipei)**:
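- Trigger: T137 turned recovered rollout-risk from a CD log entry into an AWOOI API/AwoooP signal, but operators still need to see CI/CD, rollout-risk, and post-deploy gate status directly on a product page rather than relying only on Telegram or the Actions log. A live check found that T137's `CI_rollout_risk_pending` had been written to `alert_operation_log`, but older events did not store `annotations`, so the rollout summary was visible only in the CD log and could not be read back through the AwoooP API.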
- Fix: `apps/api/src/api/v1/webhooks.py` now stores `annotations` in the `ALERT_RECEIVED` op log context; `apps/api/src/services/platform_operator_service.py` adds a read-only `list_cicd_events()` that extracts `CI_*` alert evidence from `alert_operation_log` and outputs stage/status/severity/commit/trigger/summary/description/duration/`needs_attention`; `apps/api/src/api/v1/platform/operator_runs.py` adds `GET /api/v1/platform/cicd/events`; `apps/web/src/components/panels/DeploymentsPanel.tsx` adds a "CI/CD 部署證據" (CI/CD deployment evidence) section that shows code-review/tests/post-deploy/rollout-risk and `needs_attention`; `apps/web/messages/{zh-TW,en}.json` adds the i18n strings.
- Verification: local `py_compile` pass; `DATABASE_URL=postgresql+asyncpg://test:test@localhost/test PYTHONPATH=apps/api pytest apps/api/tests/test_awooop_operator_timeline_labels.py apps/api/tests/test_cicd_alertmanager_mapping.py -q` -> `45 passed`; `pnpm --filter @awoooi/web typecheck` pass; `pnpm --filter @awoooi/web lint -- --file src/components/panels/DeploymentsPanel.tsx` pass; `NEXT_PUBLIC_API_URL=https://awoooi.wooo.work pnpm --filter @awoooi/web build` pass; JSON parse + `git diff --check` pass. `4bdb012c feat(awooop): surface cicd rollout evidence` was pushed to Gitea main; Code Review #2834 success; CD #2833 success: tests job 3667 `2190 passed, 23 skipped` + B5 `5 passed`, build-and-deploy job 3668 success, post-deploy job 3669 Alert Chain `8/8`, Source Link provider event matched, E2E `5 passed`, CI/CD success notification mirrored.
- Production readback: `GET /api/v1/platform/cicd/events?project_id=awoooi&limit=10` returns `CI_post_deploy_success` for `4bdb012caae8e000efc7d938fbdf5a65f52c0ef8`, with `summary=AWOOOI 部署完成` ("AWOOOI deployment complete"), `description=API=✅; Web=✅; AlertChain=✅; SourceLink=✅; Monitoring=✅; Smoke=✅`, and `duration_seconds=97`; the old `CI_rollout_risk_pending` still has `needs_attention=true`. Production health is healthy/prod/mock_mode=false; API/Web/Worker image `4bdb012c...` is ready (API 2/2, Web 2/2, Worker 1/1); ArgoCD `awoooi-prod` is Synced/Healthy at `a5ed12937cc6e8e95a2be2c5453783f49d528f84`. Browser `https://awoooi.wooo.work/zh-TW/deployments`: navigation visible, "CI/CD 部署證據 12 筆" (12 CI/CD deployment evidence entries), latest `4bdb012c` rows visible, old rollout-risk visible as "需注意" (needs attention).
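- New technical debt: the `post-deploy-checks` job of CD #2833 was queued from 20:16:42 to 20:25:44 because the single user-level `gitea-act-runner-host.service` on 110, which runs with `capacity: 1`, was occupied by the `ewoooc-host` deploy job from `wooo/ewoooc`. The Docker `gitea-runner` remains disabled; the problem is that the same user-level runner advertises both AWOOI and EwoooC labels. The next stage should address runner pool / repo label isolation so that a finished deployment does not have its verification blocked by another repo.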
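- Assessment: T138 promotes CI/CD / rollout-risk from a notification message to a queryable API and a frontend product section. New CI/CD notifications will store annotations from now on. The old T137 rollout-risk event was written before annotations were stored, so its summary/description remain null, but it still appears in the frontend with `needs_attention=true`. This stage adds no new auto-remediation strategy; what it adds is visibility for operator judgment and audit, so that Telegram is not the only source of truth.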
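- Current progress: AwoooP alert observability chain about 99.998%; incident-level source correlation visibility about 98.8%; source correlation apply status chain verifiability about 99.72%; source correlation freshness / rolling gate about 98.2%; frontend AI automation management interface sync about 99.999%; Dashboard snapshot / SSE console noise reduction about 99.2%; CI/CD runner hygiene about 99.2%; runner ownership consolidation about 96%; API image build layer hygiene about 88%; deploy rollout-risk observability about 91%; CI/CD evidence frontend visibility about 85%; build host pressure governance about 86%; complete AI automation management productization about 99.963%.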
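
As a rough illustration of the evidence mapping in the T138 notes above, the sketch below turns `alert_operation_log`-style rows into CI/CD evidence records. The row shape, the `to_cicd_event` helper, the `CI_<stage>_<status>` name split, and the `needs_attention` rule are all assumptions made for illustration; this is not the actual `list_cicd_events()` implementation.

```python
from typing import Any, Optional


def to_cicd_event(row: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Map one log row (hypothetical shape) to a CI/CD evidence record."""
    name = row.get("alertname", "")
    if not name.startswith("CI_"):
        return None  # not CI/CD evidence, skip it

    # Assumed naming: CI_<stage>_<status>, e.g. CI_post_deploy_success.
    stage, _, status = name[len("CI_"):].rpartition("_")

    # Rows logged before annotations were stored carry none.
    annotations = row.get("annotations") or {}

    return {
        "stage": stage,
        "status": status,
        "commit": row.get("commit"),
        "summary": annotations.get("summary"),
        "description": annotations.get("description"),
        # Hypothetical rule: anything not reported as success needs a look.
        "needs_attention": status != "success",
    }


rows = [
    {"alertname": "CI_post_deploy_success", "commit": "4bdb012c",
     "annotations": {"summary": "AWOOOI 部署完成"}},
    {"alertname": "CI_rollout_risk_pending", "commit": None, "annotations": None},
    {"alertname": "HighCPU"},  # unrelated alert, filtered out
]
events = [e for e in map(to_cicd_event, rows) if e is not None]
```

Under these assumptions, the legacy rollout-risk row still produces a record with null summary/description and `needs_attention` set, which mirrors the behavior reported in the production readback.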

**T137 CD rollout-risk evidence capture (2026-05-21, Taipei)**:
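- Trigger: during the T136 deploy, production briefly returned `502` and the 120 K8s API returned `ServiceUnavailable`, but CD originally sent a notification only on final success or failure. Operators could not tell from Telegram/AwoooP whether risk had appeared mid-rollout, whether it had recovered, or whether manual intervention was needed.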
- Fix: `.gitea/workflows/cd.yaml` wraps the ArgoCD wait / rollout / final health check in `Deploy to K8s (ArgoCD GitOps)` with a `ROLLOUT_LOG` capture; during the remote deploy wait it detects ArgoCD query failures, ArgoCD `Unknown` status, and a non-200 response or curl timeout from the public `API_HEALTH_URL`. If the rollout ultimately succeeds but risk was observed, it outputs `AWOOOI_ROLLOUT_RISK=1` and `AWOOOI_ROLLOUT_SUMMARY=...` and sends exactly one `rollout-risk` pending/warning notification to AWOOI API/AwoooP; if the rollout ultimately fails, the failure notification carries the rollout summary instead of only the commit message.
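
The decision logic described in the fix above can be sketched roughly as follows. The real logic lives in the workflow's shell steps; the `Observation` type, the risk tags, and the `summarize_rollout` helper here are hypothetical names used only to illustrate the "succeeded but saw risk" versus "failed" branches.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """One poll during the deploy wait (hypothetical shape)."""
    argocd_status: Optional[str]  # None models an ArgoCD query failure
    health_code: Optional[int]    # None models a curl timeout


def risks_in(obs: Observation) -> list[str]:
    """Return the risk tags seen in a single poll."""
    risks = []
    if obs.argocd_status is None:
        risks.append("argocd-query-failure")
    elif obs.argocd_status == "Unknown":
        risks.append("argocd-unknown")
    if obs.health_code is None:
        risks.append("health-timeout")
    elif obs.health_code != 200:
        risks.append(f"health-{obs.health_code}")
    return risks


def summarize_rollout(polls: list[Observation], succeeded: bool) -> dict[str, str]:
    """Collapse all polls into at most one notification payload."""
    seen: list[str] = []
    for obs in polls:
        for risk in risks_in(obs):
            if risk not in seen:  # de-duplicate, keep first-seen order
                seen.append(risk)
    summary = ",".join(seen)
    if not succeeded:
        # Failure notification carries the rollout summary.
        return {"notify": "failure", "AWOOOI_ROLLOUT_SUMMARY": summary}
    if seen:
        # Recovered rollout: one rollout-risk warning only.
        return {"notify": "rollout-risk", "AWOOOI_ROLLOUT_RISK": "1",
                "AWOOOI_ROLLOUT_SUMMARY": summary}
    return {"notify": "success"}


polls = [Observation("Progressing", 502), Observation(None, None),
         Observation("Healthy", 200)]
result = summarize_rollout(polls, succeeded=True)
```

Collecting risks across the whole wait and emitting a single payload at the end is what keeps a noisy rollout from producing one alert per failed poll.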