docs(awooop): record t139 stage evidence [skip ci]
@@ -1,3 +1,97 @@
## 2026-05-21|T139 CI/CD stage transition evidence

**Trigger**:

- T138 already surfaced CI/CD evidence in AwoooP Deployments. A live check of CD #2833, however, showed that `post-deploy-checks` can be held up by other repos' jobs running on the same shared runner on host 110.
- With only the tests-running and post-deploy-success events, an operator still cannot tell where the pipeline is stuck: build, rollout, the post-deploy queue, or the post-deploy gate itself.
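
The operator's question above can be sketched as a lookup over the stage events a commit has produced so far. This is a hypothetical helper, not code from the repo. It assumes the event names recorded in this entry, including the three added by this change, and it models only the success path (no failure events). It walks from the latest stage backwards, so the first event present decides the state. Before T139 only `CI_tests_running` and `CI_post_deploy_success` existed, so every state between them collapsed into one.

```python
# Hypothetical diagnosis helper; event names are the ones recorded in this log.
# Checked from the latest stage backwards: the first event present wins.
# Success path only; failure events are not modelled here.
STAGE_CHECKS = [
    ("CI_post_deploy_success", "done: post-deploy gate passed"),
    ("CI_post_deploy_checks_running", "post-deploy gate itself is running"),
    ("CI_build_and_deploy_success", "queued: post-deploy-checks waiting for a runner"),
    ("CI_build_and_deploy_running", "build/push, ArgoCD rollout or API health in progress"),
    ("CI_tests_running", "tests running, or build-and-deploy waiting for a runner"),
]


def where_is_pipeline(event_names):
    """Return a human-readable stage for one commit's CI events."""
    seen = set(event_names)
    for event, meaning in STAGE_CHECKS:
        if event in seen:
            return meaning
    return "no CI evidence yet"


# Build finished but the post-deploy job has not started: a runner-queue wait.
print(where_is_pipeline([
    "CI_tests_running",
    "CI_build_and_deploy_running",
    "CI_build_and_deploy_success",
]))  # queued: post-deploy-checks waiting for a runner
```

No tests-success event appears in this entry, so even this sketch cannot separate "tests still running" from "build-and-deploy waiting for a runner".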

**Fix**:

- `.gitea/workflows/cd.yaml`
  - Emit `CI_build_and_deploy_running` when `build-and-deploy` starts.
  - Emit `CI_build_and_deploy_success` once `build-and-deploy` has completed the image build/push, the ArgoCD rollout, and the API health check.
  - Emit `CI_post_deploy_checks_running` when `post-deploy-checks` starts.
  - All three notifications go only to the AWOOI API/AwoooP. On failure they log a warning in the CI log and do not fall back to Telegram, to avoid flooding the channel.
- `apps/web/src/components/panels/DeploymentsPanel.tsx`
  - Add stage labels for `build-and-deploy` and `post-deploy-checks`.
- `apps/web/messages/zh-TW.json`, `apps/web/messages/en.json`
  - Add the label strings 「建置與部署 / Build and deploy」 and 「部署後驗證 / Post deploy checks」.
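
The notification contract in the `cd.yaml` bullets can be sketched in Python. The contract is: send to the AWOOI API only, warn and continue on failure, never fall back to Telegram. This sketch is not the content of `scripts/ci/notify-awoooi-cicd.sh`. The `send` callable, the payload keys (`event`, `stage`, `summary`), and the `event_name()` derivation are all illustrative assumptions. The existing `CI_post_deploy_success` event does not follow this derivation, since the derivation would produce `CI_post_deploy_checks_success`. A real script may therefore spell event names out explicitly.

```python
import logging

log = logging.getLogger("cd-notify")


def event_name(stage: str, status: str) -> str:
    """Hypothetical naming rule: 'build-and-deploy' + 'running'
    -> 'CI_build_and_deploy_running'."""
    return f"CI_{stage.replace('-', '_')}_{status}"


def notify_best_effort(send, stage: str, status: str, summary: str) -> bool:
    """Send one stage event to the AWOOI API via `send` (a hypothetical callable).

    Any failure is logged as a warning and swallowed so the CD job keeps going.
    There is deliberately no Telegram fallback.
    """
    payload = {"event": event_name(stage, status), "stage": stage, "summary": summary}
    try:
        send(payload)
    except Exception as exc:  # broad on purpose: notification must never fail CD
        log.warning("AWOOI CI/CD notify failed for %s: %s", payload["event"], exc)
        return False
    return True
```

Passing `send` in as a parameter keeps the best-effort policy testable without a network. `notify_best_effort(print, "build-and-deploy", "running", "...")` acts as a dry run, and a sender that raises simply yields `False` plus one warning line.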

**Verification / deploy**:

```text
Local:
  ruby YAML parse .gitea/workflows/cd.yaml -> yaml ok
  notify dry-run:
    CI_build_and_deploy_running stage=build-and-deploy summary=AWOOOI 建置部署開始
    CI_post_deploy_checks_running stage=post-deploy-checks summary=AWOOOI 部署後驗證開始
  node JSON parse apps/web/messages/zh-TW.json apps/web/messages/en.json -> pass
  git diff --check -> pass
  pnpm --filter @awoooi/web typecheck -> pass
  pnpm --filter @awoooi/web lint -- --file src/components/panels/DeploymentsPanel.tsx -> pass

Code commit:
  f3227817 ci(cd): expose build and post-deploy stages

Gitea Actions:
  #2841 ai-code-review -> success
  #2840 CD -> success
    tests job 3678 -> success
    build-and-deploy job 3679 -> success
    post-deploy-checks job 3680 -> success
  deploy marker: 5ed57748 chore(cd): deploy f322781 [skip ci]

Production API readback:
  GET https://awoooi.wooo.work/api/v1/platform/cicd/events?project_id=awoooi&limit=12
  -> CI_tests_running for f3227817
  -> CI_code_review_running / CI_code_review_success for f3227817
  -> CI_build_and_deploy_running for f3227817
  -> CI_build_and_deploy_success for f3227817, duration_seconds=282
  -> CI_post_deploy_checks_running for f3227817
  -> CI_post_deploy_success for f3227817, duration_seconds=74

Production health:
  GET https://awoooi.wooo.work/api/v1/health
  -> healthy, prod, mock_mode=false
  -> api/postgresql/redis/ollama/openclaw/signoz up

K8s / ArgoCD:
  awoooi-api     192.168.0.110:5000/awoooi/api:f322781798e34f1cf2084aba9cc813eb080e1a37  2/2
  awoooi-web     192.168.0.110:5000/awoooi/web:f322781798e34f1cf2084aba9cc813eb080e1a37  2/2
  awoooi-worker  192.168.0.110:5000/awoooi/api:f322781798e34f1cf2084aba9cc813eb080e1a37  1/1
  awoooi-prod -> Synced / Healthy / 5ed577481fc9e008dbb8659ca706e52aab28561a

Browser verification:
  https://awoooi.wooo.work/zh-TW/deployments
  -> navigation visible
  -> f3227817 rows visible
  -> 建置與部署 running/success visible
  -> 部署後驗證 running/success visible
```
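
The production readback can be turned into a repeatable check. Below is a minimal sketch. It assumes the endpoint returns a JSON array of objects with `event` and `commit` keys. The real response shape is not recorded in this log, so both key names are assumptions. The sketch lists which of the expected stage events are missing for a commit prefix such as `f3227817`.

```python
import json
from urllib.request import urlopen

# Stage events seen for f3227817 in the production readback.
EXPECTED = [
    "CI_tests_running",
    "CI_code_review_running",
    "CI_code_review_success",
    "CI_build_and_deploy_running",
    "CI_build_and_deploy_success",
    "CI_post_deploy_checks_running",
    "CI_post_deploy_success",
]


def fetch_events(base_url: str, project_id: str = "awoooi", limit: int = 12):
    """GET the CI/CD events endpoint; assumes a JSON array response."""
    url = f"{base_url}/api/v1/platform/cicd/events?project_id={project_id}&limit={limit}"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def missing_events(events, commit_prefix: str):
    """Expected event names with no matching row for the commit (assumed keys)."""
    seen = {
        e.get("event")
        for e in events
        if str(e.get("commit", "")).startswith(commit_prefix)
    }
    return [name for name in EXPECTED if name not in seen]
```

`missing_events(fetch_events("https://awoooi.wooo.work"), "f3227817")` should return an empty list when every stage reported in. The query uses `limit=12`. If the endpoint returns newest events first, a commit whose events have scrolled out of that window would also show up as missing.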
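
**Interpretation**:

- T139 does not fix the shared runner pool itself. It first turns pipeline stage transitions into evidence visible in AwoooP. That way a state like "build succeeded but post-deploy has not started yet" is no longer misread as a Telegram or alerting black-box problem.
- It also confirms that T138's `annotations` persistence works: the summary and description of the new events can be read back from both the API and the frontend.
- The real infrastructure fix comes next: runner pool / repo label isolation. This will stop repos such as ewoooc and stockplatform-v2 from occupying the same `capacity: 1` runner that the AWOOI post-deploy gate needs.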
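
A toy FIFO scheduler can illustrate why a `capacity: 1` shared runner delays the post-deploy gate and why label isolation helps. This is a simplified model, not Gitea act_runner's actual scheduler. In the model, all jobs are queued at t=0, each job requests one label (as with `runs-on`), and each job takes the earliest free slot on any runner carrying that label. The 600-second job from another repo and the runner names are hypothetical. The 74-second gate duration is the `CI_post_deploy_success` value from the production readback.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Runner:
    name: str
    labels: frozenset  # labels this runner accepts (cf. runs-on)
    capacity: int = 1  # concurrent job slots


@dataclass(frozen=True)
class Job:
    name: str
    label: str     # the single label this job requests
    duration: int  # seconds


def start_times(jobs, runners):
    """Toy FIFO model: every job is queued at t=0 in list order and takes the
    earliest free slot on any runner that carries its label."""
    # Per runner, a min-heap of the times at which each slot becomes free.
    free_at = {r.name: [0] * r.capacity for r in runners}
    starts = {}
    for job in jobs:
        eligible = [r for r in runners if job.label in r.labels and r.capacity > 0]
        if not eligible:
            raise ValueError(f"no runner carries label {job.label!r}")
        runner = min(eligible, key=lambda r: free_at[r.name][0])
        start = heapq.heappop(free_at[runner.name])
        starts[job.name] = start
        heapq.heappush(free_at[runner.name], start + job.duration)
    return starts


# Hypothetical 600 s job from another repo queued just ahead of AWOOI's gate.
other = Job("ewoooc-build", "shared", 600)
gate_shared = Job("awoooi-post-deploy-checks", "shared", 74)
gate_isolated = Job("awoooi-post-deploy-checks", "awoooi", 74)

shared_pool = [Runner("runner-110", frozenset({"shared"}))]
isolated_pool = shared_pool + [Runner("runner-110-awoooi", frozenset({"awoooi"}))]

print(start_times([other, gate_shared], shared_pool))
# {'ewoooc-build': 0, 'awoooi-post-deploy-checks': 600}
print(start_times([other, gate_isolated], isolated_pool))
# {'ewoooc-build': 0, 'awoooi-post-deploy-checks': 0}
```

On the shared pool the gate waits the full length of the unrelated job. Once it requests a label only an AWOOI-owned runner carries, the gate starts immediately.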

**Current overall progress**:

- AwoooP alert observability chain: 99.998%.
- Incident-level source correlation visibility: 98.8%.
- Source correlation apply state-chain verifiability: 99.72%.
- Source correlation freshness / rolling gate: 98.2%.
- Frontend AI automation management UI sync: 99.999%.
- Dashboard snapshot / SSE console noise reduction: 99.2%.
- CI/CD runner hygiene: 99.2%.
- Runner ownership consolidation: 96%.
- API image build layer hygiene: 88%.
- Deploy rollout-risk observability: 91%.
- CI/CD evidence frontend visibility: 85% → 92%.
- Pipeline stage observability: 45% → 88%.
- Build host pressure governance: 86%.
- Full AI automation management productization: 99.963% → 99.964%.

## 2026-05-21|T138 CI/CD evidence API + Deployments frontend surface

**Trigger**:
@@ -2665,6 +2665,14 @@ After Phase 6 completion

- Interpretation: T135 has consolidated runner ownership. Instead of two runners competing for jobs, a single host runner is now in control. The next step is not to re-enable the Docker-wrapped runner. Instead it covers runner pool / repo label isolation, splitting the API image's `apt-get` / `chown -R` into proper layers, Web build cache/offload, and Playwright apt source-list hygiene.
- Progress update: AwoooP alert observability chain ~99.998%; Incident-level source correlation visibility ~98.8%; Source correlation apply state-chain verifiability ~99.72%; Source correlation freshness / rolling gate ~98.2%; frontend AI automation management UI sync ~99.999%; Dashboard snapshot / SSE console noise reduction ~99.2%; CI/CD runner hygiene ~99.2%; Runner ownership consolidation ~96%; Build host pressure governance ~82%; full AI automation management productization ~99.960%.

**T139 CI/CD stage transition evidence (2026-05-21, Taipei)**:

- Trigger: T138 already surfaced CI/CD evidence in AwoooP Deployments. A live check of CD #2833, however, showed that `post-deploy-checks` can be held up by other repos' jobs on the same shared runner on host 110. With only the tests-running and post-deploy-success events, an operator still cannot tell where the pipeline is stuck: build, rollout, the post-deploy queue, or the post-deploy gate itself.
- Fix: `.gitea/workflows/cd.yaml` now emits three events:
  - `CI_build_and_deploy_running` when `build-and-deploy` starts.
  - `CI_build_and_deploy_success` after image build/push, ArgoCD rollout, and API health all succeed.
  - `CI_post_deploy_checks_running` when `post-deploy-checks` starts.

  All three notifications go only to the AWOOI API/AwoooP. On failure they only log a warning in the CI log and do not fall back to Telegram, to avoid flooding the channel. `apps/web/src/components/panels/DeploymentsPanel.tsx` and `apps/web/messages/{zh-TW,en}.json` add the `build-and-deploy` / `post-deploy-checks` stage labels.
- Verification:
  - Local checks: workflow YAML parse ok; `scripts/ci/notify-awoooi-cicd.sh` dry-run validated the `CI_build_and_deploy_running` and `CI_post_deploy_checks_running` payloads; messages JSON parse ok; `git diff --check` pass; `pnpm --filter @awoooi/web typecheck` pass; `pnpm --filter @awoooi/web lint -- --file src/components/panels/DeploymentsPanel.tsx` pass.
  - Deploy: `f3227817 ci(cd): expose build and post-deploy stages` pushed to Gitea main. Code Review #2841 success. CD #2840 success: tests job 3678 success, build-and-deploy job 3679 success, post-deploy job 3680 success. Deploy marker `5ed57748 chore(cd): deploy f322781 [skip ci]`.
- Production readback:
  - `GET /api/v1/platform/cicd/events?project_id=awoooi&limit=12` returns `CI_tests_running`, `CI_code_review_running/success`, `CI_build_and_deploy_running/success`, `CI_post_deploy_checks_running`, and `CI_post_deploy_success` for `f322781798e34f1cf2084aba9cc813eb080e1a37`, with `CI_build_and_deploy_success.duration_seconds=282` and `CI_post_deploy_success.duration_seconds=74`.
  - Production health: healthy/prod/mock_mode=false. API/Web/Worker images `f322781...` ready (API 2/2, Web 2/2, Worker 1/1). ArgoCD `awoooi-prod` Synced/Healthy at `5ed577481fc9e008dbb8659ca706e52aab28561a`.
  - Browser `https://awoooi.wooo.work/zh-TW/deployments`: navigation visible, `f3227817` rows visible, and 「建置與部署」 (Build and deploy) running/success and 「部署後驗證」 (Post deploy checks) running/success visible.
- Interpretation: T139 does not fix the shared runner pool itself. It first turns pipeline stage transitions into evidence visible in AwoooP, so "build succeeded but post-deploy has not started yet" is no longer misread as a Telegram or alerting black-box problem. It also confirms that T138's `annotations` persistence works: the summary and description of the new events can be read back from both the API and the frontend. The real infrastructure fix comes next: runner pool / repo label isolation. This will stop repos such as ewoooc and stockplatform-v2 from occupying the same `capacity: 1` runner that the AWOOI post-deploy gate needs.
- Progress update: AwoooP alert observability chain ~99.998%; Incident-level source correlation visibility ~98.8%; Source correlation apply state-chain verifiability ~99.72%; Source correlation freshness / rolling gate ~98.2%; frontend AI automation management UI sync ~99.999%; Dashboard snapshot / SSE console noise reduction ~99.2%; CI/CD runner hygiene ~99.2%; Runner ownership consolidation ~96%; API image build layer hygiene ~88%; Deploy rollout-risk observability ~91%; CI/CD evidence frontend visibility ~92%; Pipeline stage observability ~88%; Build host pressure governance ~86%; full AI automation management productization ~99.964%.

**T138 CI/CD evidence API + Deployments frontend surface (2026-05-21, Taipei)**:

- Trigger: T137 already turned recovered rollout-risk from a CD log line into an AWOOI API/AwoooP signal. Operators, however, still need to see CI/CD, rollout-risk, and post-deploy gate status directly on a product page rather than relying only on Telegram or Actions logs. A live check found that T137's `CI_rollout_risk_pending` was already written to `alert_operation_log`. Older events, however, did not persist `annotations`, so the rollout summary was visible only in the CD log and could not be queried back from the AwoooP API.
- Fix:
  - `apps/api/src/api/v1/webhooks.py` persists `annotations` in the `ALERT_RECEIVED` op log context.
  - `apps/api/src/services/platform_operator_service.py` adds a read-only `list_cicd_events()`. It extracts `CI_*` alert evidence from `alert_operation_log` and outputs stage/status/severity/commit/trigger/summary/description/duration/`needs_attention`.
  - `apps/api/src/api/v1/platform/operator_runs.py` adds `GET /api/v1/platform/cicd/events`.
  - `apps/web/src/components/panels/DeploymentsPanel.tsx` adds a 「CI/CD 部署證據」 (CI/CD deployment evidence) section showing code-review/tests/post-deploy/rollout-risk and `needs_attention`.
  - `apps/web/messages/{zh-TW,en}.json` adds the i18n strings.
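
The extraction step behind `list_cicd_events()` can be sketched as a filter over operation-log rows. This is a hypothetical reconstruction named `extract_cicd_events`, not the code in `platform_operator_service.py`. Several parts are assumptions:

- the row shape (`alertname`, `context.labels.commit`, `context.annotations.summary`);
- splitting the status off the last underscore of the event name;
- the rule that any status other than `running`/`success` sets `needs_attention`.

The sketch also shows why persisting `annotations` matters: rows written before the fix still parse, but with an empty summary.

```python
from dataclasses import dataclass


@dataclass
class CicdEvent:
    name: str
    stage: str
    status: str
    commit: str
    summary: str
    needs_attention: bool


# Hypothetical rule: any status other than these needs an operator's eye.
OK_STATUSES = {"running", "success"}


def extract_cicd_events(op_log_rows):
    """Keep only CI_* rows and derive stage/status from the event name."""
    events = []
    for row in op_log_rows:
        name = row.get("alertname", "")
        if not name.startswith("CI_"):
            continue  # ordinary alerts are not CI/CD evidence
        # 'CI_rollout_risk_pending' -> stage 'rollout_risk', status 'pending'
        stage, _, status = name[len("CI_"):].rpartition("_")
        context = row.get("context") or {}
        # Rows written before annotations were persisted have none: summary stays "".
        annotations = context.get("annotations") or {}
        labels = context.get("labels") or {}
        events.append(CicdEvent(
            name=name,
            stage=stage.replace("_", "-"),
            status=status,
            commit=labels.get("commit", ""),
            summary=annotations.get("summary", ""),
            needs_attention=status not in OK_STATUSES,
        ))
    return events
```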