docs(awooop): record t138 cicd evidence surface [skip ci]

Your Name
2026-05-21 20:30:26 +08:00
parent a5ed12937c
commit f5f3a10bf6
2 changed files with 116 additions and 0 deletions


@@ -1,3 +1,110 @@
## 2026-05-21: T138 CI/CD evidence API + Deployments frontend surface
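**Trigger**
- T137 already turned recovered rollout-risk from a CD log entry into an AWOOI API/AwoooP signal, but operators still need to see the status of CI/CD, rollout-risk, and the post-deploy gate directly in the product UI, rather than relying only on Telegram or the Actions log.
- Live verification found that T137's `CI_rollout_risk_pending` had been written to `alert_operation_log`, but older events did not persist `annotations`, so the rollout summary was visible only in the CD log and could not be read back from the AwoooP API.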
**Fix**
- `apps/api/src/api/v1/webhooks.py`
  - When `ALERT_RECEIVED` is written to `alert_operation_log.context`, persist `annotations` so that the summary / description of later CI/CD success, failure, and rollout-risk events can be queried.
- `apps/api/src/services/platform_operator_service.py`
  - Add a read-only `list_cicd_events()` that extracts `CI_*` alert evidence from `alert_operation_log`.
  - Support `project_id`, `stage`, `status`, and `limit` filters, and output `needs_attention`, `duration_seconds`, commit, trigger, summary, description, and workflow_url.
- `apps/api/src/api/v1/platform/operator_runs.py`
  - Add `GET /api/v1/platform/cicd/events`; the router only delegates to the service and does not touch the DB directly.
- `apps/web/src/components/panels/DeploymentsPanel.tsx`
  - Add a "CI/CD deployment evidence" (「CI/CD 部署證據」) section to the Deployments panel, showing code-review / tests / post-deploy / rollout-risk, commit, trigger, time, duration, and summary / description.
  - rollout-risk / failed / warning events with `needs_attention=true` are rendered as "Needs attention" (「需注意」).
- `apps/web/messages/zh-TW.json`, `apps/web/messages/en.json`
  - Complete the i18n strings for the deployment evidence section.
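
The filtering and output shape described for `list_cicd_events()` can be sketched roughly as below. This is an illustration under stated assumptions, not the service's actual code: the in-memory row layout (`alertname`, `project_id`, `stage`, `status`, `annotations`), the function name `list_cicd_events_sketch`, and the exact `needs_attention` rule are all hypothetical and inferred only from the bullets above.

```python
from __future__ import annotations

from typing import Any, Optional

# Assumption: statuses that should surface as "needs attention", taken from
# the "rollout-risk / failed / warning" wording in the changelog.
ATTENTION_STATUSES = {"failed", "warning"}


def list_cicd_events_sketch(
    rows: list[dict[str, Any]],
    project_id: Optional[str] = None,
    stage: Optional[str] = None,
    status: Optional[str] = None,
    limit: int = 50,
) -> list[dict[str, Any]]:
    """Read-only filter over op-log rows (hypothetical row shape)."""
    events: list[dict[str, Any]] = []
    for row in rows:
        # Only CI_* alerts count as CI/CD evidence.
        if not str(row.get("alertname", "")).startswith("CI_"):
            continue
        if project_id is not None and row.get("project_id") != project_id:
            continue
        if stage is not None and row.get("stage") != stage:
            continue
        if status is not None and row.get("status") != status:
            continue
        # Older rows may have no annotations, so summary/description stay None.
        annotations = row.get("annotations") or {}
        events.append(
            {
                "alertname": row.get("alertname"),
                "stage": row.get("stage"),
                "status": row.get("status"),
                "needs_attention": row.get("stage") == "rollout-risk"
                or row.get("status") in ATTENTION_STATUSES,
                "summary": annotations.get("summary"),
                "description": annotations.get("description"),
            }
        )
        if len(events) >= limit:
            break
    return events
```

Keeping the function free of writes mirrors the read-only intent of the service method; a row stored before `annotations` was persisted still appears, just with `summary` and `description` set to `None`.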
**Verification / deploy**
```text
Local:
python -m py_compile apps/api/src/api/v1/platform/operator_runs.py apps/api/src/services/platform_operator_service.py apps/api/src/api/v1/webhooks.py -> pass
DATABASE_URL=postgresql+asyncpg://test:test@localhost/test PYTHONPATH=apps/api pytest apps/api/tests/test_awooop_operator_timeline_labels.py apps/api/tests/test_cicd_alertmanager_mapping.py -q
-> 45 passed
pnpm --filter @awoooi/web typecheck -> pass
pnpm --filter @awoooi/web lint -- --file src/components/panels/DeploymentsPanel.tsx -> pass
NEXT_PUBLIC_API_URL=https://awoooi.wooo.work pnpm --filter @awoooi/web build -> pass
node JSON parse apps/web/messages/zh-TW.json apps/web/messages/en.json -> pass
git diff --check -> pass
Code commit:
4bdb012c feat(awooop): surface cicd rollout evidence
Gitea Actions:
#2834 ai-code-review -> success
#2833 CD -> success
tests job 3667 -> success
pytest: 2190 passed, 23 skipped
B5 integration: 5 passed
build-and-deploy job 3668 -> success
API/Web/Worker image: 4bdb012caae8e000efc7d938fbdf5a65f52c0ef8
post-deploy-checks job 3669 -> success
Alert Chain Smoke: 8/8 checks passed
Source Link smoke: expected provider event matched
E2E smoke: 5 passed
CI/CD success notification mirrored through AWOOI API
Production API readback:
GET https://awoooi.wooo.work/api/v1/platform/cicd/events?project_id=awoooi&limit=10
-> includes CI_post_deploy_success for 4bdb012c
-> summary: AWOOOI 部署完成
-> description: API=✅; Web=✅; AlertChain=✅; SourceLink=✅; Monitoring=✅; Smoke=✅
-> duration_seconds: 97
-> includes older CI_rollout_risk_pending with needs_attention=true
Production health:
GET https://awoooi.wooo.work/api/v1/health
-> healthy, prod, mock_mode=false
-> api/postgresql/redis/ollama/openclaw/signoz up
K8s / ArgoCD:
awoooi-api 192.168.0.110:5000/awoooi/api:4bdb012caae8e000efc7d938fbdf5a65f52c0ef8 2/2
awoooi-web 192.168.0.110:5000/awoooi/web:4bdb012caae8e000efc7d938fbdf5a65f52c0ef8 2/2
awoooi-worker 192.168.0.110:5000/awoooi/api:4bdb012caae8e000efc7d938fbdf5a65f52c0ef8 1/1
awoooi-prod -> Synced / Healthy / a5ed12937cc6e8e95a2be2c5453783f49d528f84
Browser verification:
https://awoooi.wooo.work/zh-TW/deployments
-> navigation visible
-> CI/CD 部署證據 12 筆
-> latest 4bdb012c code-review/tests/post-deploy rows visible
-> older CI_rollout_risk_pending visible as 需注意
```
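
The production readback above could be scripted along these lines. This is a minimal sketch: the endpoint path and query parameters come from the log above, but the helper names, the assumption that the response is either a bare JSON list or an object with an `events` key, and the absence of authentication are all guesses that would need checking against the real API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://awoooi.wooo.work"  # host taken from the readback above


def cicd_events_url(project_id: str, limit: int = 10, base_url: str = BASE_URL) -> str:
    """Build the readback URL with the same query parameters as in the log."""
    query = urlencode({"project_id": project_id, "limit": limit})
    return f"{base_url}/api/v1/platform/cicd/events?{query}"


def fetch_events(url: str, timeout: float = 10.0) -> list:
    """Fetch and decode events (assumes no auth header is required)."""
    with urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    # Assumption: payload is a bare list, or wraps the list under "events".
    return payload if isinstance(payload, list) else payload.get("events", [])


def attention_events(events: list) -> list:
    """Keep only the events flagged with needs_attention=true."""
    return [e for e in events if e.get("needs_attention")]
```

For example, `attention_events(fetch_events(cicd_events_url("awoooi")))` would be expected to include the older `CI_rollout_risk_pending` entry, provided the assumed response shape holds.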
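**New technical debt / T139 candidates**
- In CD #2833, `post-deploy-checks` queued from 20:16:42 to 20:25:44 because the same host runner on 110 (`capacity: 1`) was occupied by the `ewoooc-host` deploy job from `wooo/ewoooc`.
- Live runner status: the Docker `gitea-runner` is still disabled (`Restart=no Status=exited Running=false`), but the user-level `gitea-act-runner-host.service` advertises both AWOOI and EwoooC labels. This lets cross-repo workloads delay AWOOI post-deploy gates.
- The next iteration should address runner pool / repo label isolation, to avoid the case where "the deployment has finished but verification is blocked by another repo".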
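
To put a number on the queue delay quoted above, the two timestamps can be subtracted directly. The helper below is only a sketch: the function name is hypothetical, and it assumes both times fall on the same day.

```python
from datetime import datetime


def queue_delay_seconds(queued_at: str, started_at: str) -> int:
    """Seconds between two same-day HH:MM:SS timestamps."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(started_at, fmt) - datetime.strptime(queued_at, fmt)
    return int(delta.total_seconds())


# post-deploy-checks in CD #2833 waited from 20:16:42 to 20:25:44
print(queue_delay_seconds("20:16:42", "20:25:44"))  # 542
```

That is 542 seconds, or about nine minutes, during which the post-deploy gate was waiting on a runner held by another repository's job.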
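**Assessment**
- T138 promotes CI/CD / rollout-risk from a "notification message" to a queryable API and a frontend product section.
- New CI/CD notifications will persist annotations from now on; the older T137 rollout-risk event was written before annotations were stored, so its summary/description remain null, but it already appears in the frontend with `needs_attention=true`.
- This phase adds no new auto-remediation strategy; what it fills in is visibility for operator judgment and audit, so that Telegram is not the only source of truth.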
**Current overall progress**
- AwoooP alert observability chain: 99.998%.
- Incident-level source correlation visibility: 98.8%.
- Source correlation apply state-chain verifiability: 99.72%.
- Source correlation freshness / rolling gate: 98.2%.
- Frontend AI automation management UI sync: 99.999%.
- Dashboard snapshot / SSE console noise reduction: 99.2%.
- CI/CD runner hygiene: 99.2%.
- Runner ownership consolidation: 96%.
- API image build layer hygiene: 88%.
- Deploy rollout-risk observability: 82% → 91%.
- CI/CD evidence frontend visibility: 35% → 85%.
- Build host pressure governance: 86%.
- Full AI automation management productization: 99.962% → 99.963%.
## 2026-05-21: T137 CD rollout-risk evidence capture
**Trigger**


@@ -2665,6 +2665,15 @@ Phase 6 完成後
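- Assessment: T135 consolidated runner ownership from two runners competing for jobs down to a single host runner in control; the next iteration should not re-enable the Docker-wrapped runner, but instead work on runner pool / repo label isolation, API image `apt-get` / `chown -R` layering, Web build cache/offload, and Playwright apt source-list hygiene.
- Progress update: AwoooP alert observability chain about 99.998%; Incident-level source correlation visibility about 98.8%; Source correlation apply state-chain verifiability about 99.72%; Source correlation freshness / rolling gate about 98.2%; Frontend AI automation management UI sync about 99.999%; Dashboard snapshot / SSE console noise reduction about 99.2%; CI/CD runner hygiene about 99.2%; Runner ownership consolidation about 96%; Build host pressure governance about 82%; Full AI automation management productization about 99.960%.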
**T138 CI/CD evidence API + Deployments frontend surface (2026-05-21, Taipei)**
- Trigger: T137 already turned recovered rollout-risk from a CD log entry into an AWOOI API/AwoooP signal, but operators still need to see the status of CI/CD, rollout-risk, and the post-deploy gate directly in the product UI, rather than relying only on Telegram or the Actions log. Live verification found that T137's `CI_rollout_risk_pending` had been written to `alert_operation_log`, but older events did not persist `annotations`, so the rollout summary was visible only in the CD log and could not be read back from the AwoooP API.
- Fix: `apps/api/src/api/v1/webhooks.py` persists `annotations` in the `ALERT_RECEIVED` op log context; `apps/api/src/services/platform_operator_service.py` adds a read-only `list_cicd_events()` that extracts `CI_*` alert evidence from `alert_operation_log` and outputs stage/status/severity/commit/trigger/summary/description/duration/`needs_attention`; `apps/api/src/api/v1/platform/operator_runs.py` adds `GET /api/v1/platform/cicd/events`; `apps/web/src/components/panels/DeploymentsPanel.tsx` adds a "CI/CD deployment evidence" (「CI/CD 部署證據」) section showing code-review/tests/post-deploy/rollout-risk and `needs_attention`; `apps/web/messages/{zh-TW,en}.json` add the i18n strings.
- Verification: local `py_compile` pass; `DATABASE_URL=postgresql+asyncpg://test:test@localhost/test PYTHONPATH=apps/api pytest apps/api/tests/test_awooop_operator_timeline_labels.py apps/api/tests/test_cicd_alertmanager_mapping.py -q` -> `45 passed`; `pnpm --filter @awoooi/web typecheck` pass; `pnpm --filter @awoooi/web lint -- --file src/components/panels/DeploymentsPanel.tsx` pass; `NEXT_PUBLIC_API_URL=https://awoooi.wooo.work pnpm --filter @awoooi/web build` pass; JSON parse + `git diff --check` pass. `4bdb012c feat(awooop): surface cicd rollout evidence` was pushed to Gitea main; Code Review #2834 success; CD #2833 success; tests job 3667 `2190 passed, 23 skipped` + B5 `5 passed`; build-and-deploy job 3668 success; post-deploy job 3669: Alert Chain `8/8`, Source Link provider event matched, E2E `5 passed`, CI/CD success notification mirrored.
- Production readback: `GET /api/v1/platform/cicd/events?project_id=awoooi&limit=10` includes `CI_post_deploy_success` for `4bdb012caae8e000efc7d938fbdf5a65f52c0ef8`, with `summary=AWOOOI 部署完成`, `description=API=✅; Web=✅; AlertChain=✅; SourceLink=✅; Monitoring=✅; Smoke=✅`, and `duration_seconds=97`; the older `CI_rollout_risk_pending` still has `needs_attention=true`. Production health: healthy/prod/mock_mode=false; API/Web/Worker image `4bdb012c...` ready; API 2/2, Web 2/2, Worker 1/1; ArgoCD `awoooi-prod` Synced/Healthy at `a5ed12937cc6e8e95a2be2c5453783f49d528f84`. Browser `https://awoooi.wooo.work/zh-TW/deployments`: navigation visible, "CI/CD 部署證據 12 筆" (12 entries), latest `4bdb012c` rows visible, old rollout-risk visible as "需注意" (needs attention).
- New technical debt: in CD #2833, `post-deploy-checks` queued from 20:16:42 to 20:25:44 because the same user-level `gitea-act-runner-host.service` on 110 (`capacity: 1`) was occupied by the `ewoooc-host` deploy job from `wooo/ewoooc`; the Docker `gitea-runner` is still disabled, and the problem is that the same user-level runner advertises both AWOOI and EwoooC labels. The next iteration should address runner pool / repo label isolation, to avoid the case where "the deployment has finished but verification is blocked by another repo".
- Assessment: T138 promotes CI/CD / rollout-risk from a "notification message" to a queryable API and a frontend product section. New CI/CD notifications will persist annotations from now on; the older T137 rollout-risk event was written before annotations were stored, so its summary/description remain null, but it already appears in the frontend with `needs_attention=true`. This phase adds no new auto-remediation strategy; what it fills in is visibility for operator judgment and audit, so that Telegram is not the only source of truth.
- Progress update: AwoooP alert observability chain about 99.998%; Incident-level source correlation visibility about 98.8%; Source correlation apply state-chain verifiability about 99.72%; Source correlation freshness / rolling gate about 98.2%; Frontend AI automation management UI sync about 99.999%; Dashboard snapshot / SSE console noise reduction about 99.2%; CI/CD runner hygiene about 99.2%; Runner ownership consolidation about 96%; API image build layer hygiene about 88%; Deploy rollout-risk observability about 91%; CI/CD evidence frontend visibility about 85%; Build host pressure governance about 86%; Full AI automation management productization about 99.963%.
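**T137 CD rollout-risk evidence capture (2026-05-21, Taipei)**
- Trigger: during the T136 deploy, production briefly returned `502` and the 120 K8s API briefly returned `ServiceUnavailable`, but the CD pipeline originally notified only on final success/failure, so operators could not tell directly from Telegram/AwoooP whether there had been risk mid-rollout, whether it had recovered, or whether manual intervention was needed.
- Fix: in `.gitea/workflows/cd.yaml`, `Deploy to K8s (ArgoCD GitOps)` now wraps the ArgoCD wait / rollout / final health check in a `ROLLOUT_LOG` capture; while the remote deploy waits, it detects ArgoCD query failures, ArgoCD `Unknown` status, and a non-200 response or curl timeout from the public `API_HEALTH_URL`. If the rollout ultimately succeeds but risk was observed, it emits `AWOOOI_ROLLOUT_RISK=1` and `AWOOOI_ROLLOUT_SUMMARY=...` and sends exactly one `rollout-risk` pending/warning to the AWOOI API/AwoooP; if the rollout ultimately fails, the failure notification carries the rollout summary instead of only the commit message.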