docs(awooop): record t138 cicd evidence surface [skip ci]
107
docs/LOGBOOK.md
@@ -1,3 +1,110 @@
## 2026-05-21|T138 CI/CD evidence API + Deployments frontend surface

**Trigger**:

- T137 already turned recovered rollout-risk from the CD log into an AWOOI API/AwoooP signal, but operators still need to see CI/CD, rollout-risk, and post-deploy gate status directly on a product page instead of relying only on Telegram or the Actions log.
- Live verification found that T137's `CI_rollout_risk_pending` had been written to `alert_operation_log`, but older events did not persist `annotations`, so the rollout summary was visible only in the CD log and could not be read back from the AwoooP API.

**Fix**:

- `apps/api/src/api/v1/webhooks.py`
  - When `ALERT_RECEIVED` writes to `alert_operation_log.context`, it now persists `annotations`, so the summary / description of later CI/CD success, failure, and rollout-risk events can be queried.
- `apps/api/src/services/platform_operator_service.py`
  - Added a read-only `list_cicd_events()` that extracts `CI_*` alert evidence from `alert_operation_log`.
  - Supports `project_id`, `stage`, `status`, and `limit` filters, and outputs `needs_attention`, `duration_seconds`, commit, trigger, summary, description, and workflow_url.
- `apps/api/src/api/v1/platform/operator_runs.py`
  - Added `GET /api/v1/platform/cicd/events`; the router only forwards to the service and does not touch the DB directly.
- `apps/web/src/components/panels/DeploymentsPanel.tsx`
  - Added a "CI/CD 部署證據" (CI/CD deployment evidence) block to the Deployments panel, showing code-review / tests / post-deploy / rollout-risk, commit, trigger, time, duration, and summary / description.
  - rollout-risk / failed / warning events with `needs_attention=true` are shown as "需注意" (needs attention).
- `apps/web/messages/zh-TW.json`, `apps/web/messages/en.json`
  - Filled in the i18n strings for the deployment evidence block.
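
The service-side filtering described above can be illustrated with a minimal sketch. This is not the code in `platform_operator_service.py`: it works on plain dictionaries rather than the database, and the row keys (`alertname`, `stage`, `status`, `context`), the `CicdEvent` type, and the function name `list_cicd_events_sketch` are assumptions made for illustration. The `needs_attention` rule follows the entry's wording (rollout-risk, failed, or warning).

```python
from dataclasses import dataclass
from typing import Any, Iterable, Optional


@dataclass
class CicdEvent:
    """Hypothetical output shape; field names mirror the entry where it names them."""
    alertname: str
    project_id: str
    stage: str
    status: str
    summary: Optional[str]
    description: Optional[str]
    needs_attention: bool


def to_cicd_event(row: dict[str, Any]) -> Optional[CicdEvent]:
    """Map one alert_operation_log-like row to an event, or None if it is not CI_*."""
    alertname = row.get("alertname", "")
    if not alertname.startswith("CI_"):
        return None
    # Legacy rows were written before annotations were persisted, so default to {}.
    annotations = (row.get("context") or {}).get("annotations") or {}
    stage = row.get("stage", "")
    status = row.get("status", "")
    return CicdEvent(
        alertname=alertname,
        project_id=row.get("project_id", ""),
        stage=stage,
        status=status,
        summary=annotations.get("summary"),
        description=annotations.get("description"),
        # Assumed rule: rollout-risk, failed, and warning events need attention.
        needs_attention=stage == "rollout-risk" or status in {"failed", "warning"},
    )


def list_cicd_events_sketch(
    rows: Iterable[dict[str, Any]],
    project_id: Optional[str] = None,
    stage: Optional[str] = None,
    status: Optional[str] = None,
    limit: int = 50,
) -> list[CicdEvent]:
    """Read-only filter over rows; never mutates its input."""
    events: list[CicdEvent] = []
    for row in rows:
        event = to_cicd_event(row)
        if event is None:
            continue
        if project_id is not None and event.project_id != project_id:
            continue
        if stage is not None and event.stage != stage:
            continue
        if status is not None and event.status != status:
            continue
        events.append(event)
        if len(events) >= limit:
            break
    return events
```

Reading `annotations` with a fallback to an empty mapping is what lets legacy rows come back with `summary` and `description` set to `None` while still being flagged through `needs_attention`.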

**Verification / deploy**:

```text
Local:
  python -m py_compile apps/api/src/api/v1/platform/operator_runs.py apps/api/src/services/platform_operator_service.py apps/api/src/api/v1/webhooks.py -> pass
  DATABASE_URL=postgresql+asyncpg://test:test@localhost/test PYTHONPATH=apps/api pytest apps/api/tests/test_awooop_operator_timeline_labels.py apps/api/tests/test_cicd_alertmanager_mapping.py -q
    -> 45 passed
  pnpm --filter @awoooi/web typecheck -> pass
  pnpm --filter @awoooi/web lint -- --file src/components/panels/DeploymentsPanel.tsx -> pass
  NEXT_PUBLIC_API_URL=https://awoooi.wooo.work pnpm --filter @awoooi/web build -> pass
  node JSON parse apps/web/messages/zh-TW.json apps/web/messages/en.json -> pass
  git diff --check -> pass

Code commit:
  4bdb012c feat(awooop): surface cicd rollout evidence

Gitea Actions:
  #2834 ai-code-review -> success
  #2833 CD -> success
    tests job 3667 -> success
      pytest: 2190 passed, 23 skipped
      B5 integration: 5 passed
    build-and-deploy job 3668 -> success
      API/Web/Worker image: 4bdb012caae8e000efc7d938fbdf5a65f52c0ef8
    post-deploy-checks job 3669 -> success
      Alert Chain Smoke: 8/8 checks passed
      Source Link smoke: expected provider event matched
      E2E smoke: 5 passed
      CI/CD success notification mirrored through AWOOI API

Production API readback:
  GET https://awoooi.wooo.work/api/v1/platform/cicd/events?project_id=awoooi&limit=10
    -> includes CI_post_deploy_success for 4bdb012c
    -> summary: AWOOOI 部署完成
    -> description: API=✅; Web=✅; AlertChain=✅; SourceLink=✅; Monitoring=✅; Smoke=✅
    -> duration_seconds: 97
    -> includes older CI_rollout_risk_pending with needs_attention=true

Production health:
  GET https://awoooi.wooo.work/api/v1/health
    -> healthy, prod, mock_mode=false
    -> api/postgresql/redis/ollama/openclaw/signoz up

K8s / ArgoCD:
  awoooi-api 192.168.0.110:5000/awoooi/api:4bdb012caae8e000efc7d938fbdf5a65f52c0ef8 2/2
  awoooi-web 192.168.0.110:5000/awoooi/web:4bdb012caae8e000efc7d938fbdf5a65f52c0ef8 2/2
  awoooi-worker 192.168.0.110:5000/awoooi/api:4bdb012caae8e000efc7d938fbdf5a65f52c0ef8 1/1
  awoooi-prod -> Synced / Healthy / a5ed12937cc6e8e95a2be2c5453783f49d528f84

Browser verification:
  https://awoooi.wooo.work/zh-TW/deployments
    -> navigation visible
    -> "CI/CD 部署證據" (CI/CD deployment evidence): 12 entries
    -> latest 4bdb012c code-review/tests/post-deploy rows visible
    -> older CI_rollout_risk_pending visible as "需注意" (needs attention)
```
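
The production API readback step can be scripted. The sketch below only builds the request URL and checks an already-fetched list of events; fetching is left to whichever HTTP client is at hand. It assumes the endpoint returns event objects with an `alertname` key, which the entry does not confirm, and the helper names are hypothetical. The `commit` field is one of the outputs the entry lists.

```python
from typing import Any, Optional
from urllib.parse import urlencode

BASE_URL = "https://awoooi.wooo.work"


def cicd_events_url(project_id: str, limit: int) -> str:
    """Build the readback URL used in the verification log."""
    query = urlencode({"project_id": project_id, "limit": limit})
    return f"{BASE_URL}/api/v1/platform/cicd/events?{query}"


def find_event(
    events: list[dict[str, Any]], alertname: str, commit_prefix: str
) -> Optional[dict[str, Any]]:
    """Return the first event matching the alert name and commit prefix, if any.

    The "alertname" key is an assumed field name for illustration.
    """
    for event in events:
        commit = str(event.get("commit", ""))
        if event.get("alertname") == alertname and commit.startswith(commit_prefix):
            return event
    return None
```

A readback check would then assert that `find_event(events, "CI_post_deploy_success", "4bdb012c")` is not `None` after a deploy of that commit.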

**New technical debt / T139 candidates**:

- The `post-deploy-checks` job of CD #2833 sat queued from 20:16:42 to 20:25:44 because the single host runner on the 110 host, configured with `capacity: 1`, was occupied by the `ewoooc-host` deploy job from `wooo/ewoooc`.
- Live runner state: the Docker `gitea-runner` is still disabled (`Restart=no Status=exited Running=false`), but the user-level `gitea-act-runner-host.service` advertises both AWOOI and EwoooC labels. As a result, workloads from other repos can delay AWOOI post-deploy gates.
- The next step should address runner pool / repo label isolation, to avoid the case where the deployment has finished but its verification is blocked by another repo.
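
To put a number on the queueing delay above, a small worked calculation, assuming both timestamps fall on the same day (the helper name is hypothetical):

```python
from datetime import datetime


def queue_delay_seconds(queued_at: str, started_at: str) -> int:
    """Seconds between two same-day HH:MM:SS timestamps."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(started_at, fmt) - datetime.strptime(queued_at, fmt)
    return int(delta.total_seconds())


delay = queue_delay_seconds("20:16:42", "20:25:44")
print(delay)  # 542 seconds, i.e. 9 min 2 s
```

For comparison, the deploy itself reported `duration_seconds: 97`, so the wait for a free runner was several times longer than the deploy it was verifying.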
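
**Assessment**:

- T138 promotes CI/CD / rollout-risk from "notification messages" to a queryable API and a frontend product block.
- New CI/CD notifications will persist annotations from now on. The older T137 rollout-risk event was written before annotations were saved, so its summary/description remain null, but it can already be shown in the frontend with `needs_attention=true`.
- This stage adds no new auto-remediation strategy; it fills in operator judgment and audit visibility, so that Telegram is not the only source of truth.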

**Current overall progress**:

- AwoooP alert observability chain: 99.998%.
- Incident-level source correlation visibility: 98.8%.
- Source correlation apply state-chain verifiability: 99.72%.
- Source correlation freshness / rolling gate: 98.2%.
- Frontend AI automation management UI sync: 99.999%.
- Dashboard snapshot / SSE console noise reduction: 99.2%.
- CI/CD runner hygiene: 99.2%.
- Runner ownership consolidation: 96%.
- API image build layer hygiene: 88%.
- Deploy rollout-risk observability: 82% → 91%.
- CI/CD evidence frontend visibility: 35% → 85%.
- Build host pressure management: 86%.
- Full AI automation management productization: 99.962% → 99.963%.

## 2026-05-21|T137 CD rollout-risk evidence capture

**Trigger**:

@@ -2665,6 +2665,15 @@ After Phase 6 completes
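- Assessment: T135 consolidated runner ownership from two runners competing for jobs to a single host runner in control. The next step should not re-enable the Docker-wrapped runner; instead, work on runner pool / repo label isolation, layering of the API image `apt-get` / `chown -R` steps, Web build cache/offload, and Playwright apt source-list hygiene.
- Progress update: AwoooP alert observability chain approx. 99.998%; Incident-level source correlation visibility approx. 98.8%; Source correlation apply state-chain verifiability approx. 99.72%; Source correlation freshness / rolling gate approx. 98.2%; Frontend AI automation management UI sync approx. 99.999%; Dashboard snapshot / SSE console noise reduction approx. 99.2%; CI/CD runner hygiene approx. 99.2%; Runner ownership consolidation approx. 96%; Build host pressure management approx. 82%; Full AI automation management productization approx. 99.960%.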
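
**T138 CI/CD evidence API + Deployments frontend surface (2026-05-21, Taipei)**:
- Trigger: T137 already turned recovered rollout-risk from the CD log into an AWOOI API/AwoooP signal, but operators still need to see CI/CD, rollout-risk, and post-deploy gate status directly on a product page instead of relying only on Telegram or the Actions log. Live verification found that T137's `CI_rollout_risk_pending` had been written to `alert_operation_log`, but older events did not persist `annotations`, so the rollout summary was visible only in the CD log and could not be read back from the AwoooP API.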
- Fix: `apps/api/src/api/v1/webhooks.py` persists `annotations` in the `ALERT_RECEIVED` op log context; `apps/api/src/services/platform_operator_service.py` adds a read-only `list_cicd_events()` that extracts `CI_*` alert evidence from `alert_operation_log` and outputs stage/status/severity/commit/trigger/summary/description/duration/`needs_attention`; `apps/api/src/api/v1/platform/operator_runs.py` adds `GET /api/v1/platform/cicd/events`; `apps/web/src/components/panels/DeploymentsPanel.tsx` adds a "CI/CD 部署證據" (CI/CD deployment evidence) block showing code-review/tests/post-deploy/rollout-risk and `needs_attention`; `apps/web/messages/{zh-TW,en}.json` gain the i18n strings.
- Verification: local `py_compile` pass; `DATABASE_URL=postgresql+asyncpg://test:test@localhost/test PYTHONPATH=apps/api pytest apps/api/tests/test_awooop_operator_timeline_labels.py apps/api/tests/test_cicd_alertmanager_mapping.py -q` -> `45 passed`; `pnpm --filter @awoooi/web typecheck` pass; `pnpm --filter @awoooi/web lint -- --file src/components/panels/DeploymentsPanel.tsx` pass; `NEXT_PUBLIC_API_URL=https://awoooi.wooo.work pnpm --filter @awoooi/web build` pass; JSON parse + `git diff --check` pass. `4bdb012c feat(awooop): surface cicd rollout evidence` has been pushed to Gitea main; Code Review #2834 success; CD #2833 success: tests job 3667 `2190 passed, 23 skipped` + B5 `5 passed`, build-and-deploy job 3668 success, post-deploy job 3669 Alert Chain `8/8`, Source Link provider event matched, E2E `5 passed`, CI/CD success notification mirrored.
- Production readback: `GET /api/v1/platform/cicd/events?project_id=awoooi&limit=10` returns `CI_post_deploy_success` for `4bdb012caae8e000efc7d938fbdf5a65f52c0ef8`, with `summary=AWOOOI 部署完成`, `description=API=✅; Web=✅; AlertChain=✅; SourceLink=✅; Monitoring=✅; Smoke=✅`, and `duration_seconds=97`; the older `CI_rollout_risk_pending` still has `needs_attention=true`. Production health is healthy/prod/mock_mode=false; API/Web/Worker image `4bdb012c...` is ready (API 2/2, Web 2/2, Worker 1/1); ArgoCD `awoooi-prod` is Synced/Healthy at `a5ed12937cc6e8e95a2be2c5453783f49d528f84`. Browser `https://awoooi.wooo.work/zh-TW/deployments`: navigation visible, "CI/CD 部署證據" (CI/CD deployment evidence) shows 12 entries, latest `4bdb012c` rows visible, old rollout-risk visible as "需注意" (needs attention).
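- New technical debt: the `post-deploy-checks` job of CD #2833 sat queued from 20:16:42 to 20:25:44 because the user-level `gitea-act-runner-host.service` on the 110 host, configured with `capacity: 1`, was occupied by the `ewoooc-host` deploy job from `wooo/ewoooc`. The Docker `gitea-runner` is still disabled; the problem is that the same user-level runner advertises both AWOOI and EwoooC labels. The next step should address runner pool / repo label isolation, to avoid the case where the deployment has finished but its verification is blocked by another repo.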
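- Assessment: T138 promotes CI/CD / rollout-risk from "notification messages" to a queryable API and a frontend product block. New CI/CD notifications will persist annotations from now on; the older T137 rollout-risk event was written before annotations were saved, so its summary/description remain null, but it can already be shown in the frontend with `needs_attention=true`. This stage adds no new auto-remediation strategy; it fills in operator judgment and audit visibility, so that Telegram is not the only source of truth.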
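- Progress update: AwoooP alert observability chain approx. 99.998%; Incident-level source correlation visibility approx. 98.8%; Source correlation apply state-chain verifiability approx. 99.72%; Source correlation freshness / rolling gate approx. 98.2%; Frontend AI automation management UI sync approx. 99.999%; Dashboard snapshot / SSE console noise reduction approx. 99.2%; CI/CD runner hygiene approx. 99.2%; Runner ownership consolidation approx. 96%; API image build layer hygiene approx. 88%; Deploy rollout-risk observability approx. 91%; CI/CD evidence frontend visibility approx. 85%; Build host pressure management approx. 86%; Full AI automation management productization approx. 99.963%.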
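
**T137 CD rollout-risk evidence capture (2026-05-21, Taipei)**:
- Trigger: during the T136 deploy, production briefly returned `502` and the K8s API on the 120 host returned `ServiceUnavailable`, but CD originally notified only on final success/failure; operators could not tell directly from Telegram/AwoooP whether risk had appeared mid-rollout, whether it had recovered, or whether manual intervention was needed.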
- Fix: `.gitea/workflows/cd.yaml` wraps the ArgoCD wait / rollout / final health check in `Deploy to K8s (ArgoCD GitOps)` with a `ROLLOUT_LOG` capture; during the remote deploy wait it detects ArgoCD query failures, ArgoCD `Unknown` status, and non-200 responses / curl timeouts from the public `API_HEALTH_URL`. If the rollout ultimately succeeds but risk was observed, it outputs `AWOOOI_ROLLOUT_RISK=1` and `AWOOOI_ROLLOUT_SUMMARY=...` and sends a single `rollout-risk` pending/warning notification to the AWOOI API/AwoooP; if the rollout ultimately fails, the failure notification carries the rollout summary instead of only the commit message.
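
The decision logic in the T137 fix can be sketched in Python for clarity. The real logic lives in the workflow's shell steps; the `Probe` type, the risk labels, and both function names here are hypothetical, and only the two output variable names come from the entry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Probe:
    """One polling sample taken while waiting for the rollout (hypothetical shape)."""
    argocd_query_ok: bool
    argocd_status: str               # e.g. "Healthy", "Progressing", "Unknown"
    health_http_code: Optional[int]  # None models a curl timeout


def probe_risks(probe: Probe) -> list[str]:
    """Return the risk labels observed in a single sample."""
    risks: list[str] = []
    if not probe.argocd_query_ok:
        risks.append("argocd-query-failure")
    elif probe.argocd_status == "Unknown":
        risks.append("argocd-unknown")
    if probe.health_http_code is None:
        risks.append("health-timeout")
    elif probe.health_http_code != 200:
        risks.append(f"health-http-{probe.health_http_code}")
    return risks


def summarize_rollout(probes: list[Probe], final_success: bool) -> dict[str, str]:
    """Collect distinct risks in first-seen order and derive the outputs.

    A successful rollout that saw risk sets AWOOOI_ROLLOUT_RISK=1; a failed
    rollout carries only the summary, for use in the failure notification.
    """
    seen: list[str] = []
    for probe in probes:
        for risk in probe_risks(probe):
            if risk not in seen:
                seen.append(risk)
    outputs: dict[str, str] = {}
    if seen:
        outputs["AWOOOI_ROLLOUT_SUMMARY"] = ", ".join(seen)
        if final_success:
            outputs["AWOOOI_ROLLOUT_RISK"] = "1"
    return outputs
```

Deduplicating risks before building the summary matches the stated intent of sending a single `rollout-risk` notification per rollout rather than one per failed probe.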