docs(awooop): record t139 stage evidence [skip ci]

Your Name
2026-05-21 20:56:40 +08:00
parent 5ed577481f
commit 8ddc783af5
2 changed files with 102 additions and 0 deletions

@@ -1,3 +1,97 @@
## 2026-05-21 T139 CI/CD stage transition evidence
**Trigger**
- T138 surfaced CI/CD evidence in AwoooP Deployments, but a live test of CD #2833 showed that `post-deploy-checks` can be blocked by jobs from other repos on the same shared runner on host 110.
- With only tests running / post-deploy success events, an operator still cannot tell whether the pipeline is stuck in build, rollout, the post-deploy queue, or the post-deploy gate itself.
**Fix**
- `.gitea/workflows/cd.yaml`
  - Emit `CI_build_and_deploy_running` when `build-and-deploy` starts.
  - Emit `CI_build_and_deploy_success` after `build-and-deploy` has completed image build/push, ArgoCD rollout, and the API health check.
  - Emit `CI_post_deploy_checks_running` when `post-deploy-checks` starts.
  - All three notifications go only through the AWOOI API/AwoooP. On failure they only log a warning in the CI log and do not fall back to Telegram, to avoid flooding it.
- `apps/web/src/components/panels/DeploymentsPanel.tsx`
  - Add stage labels for `build-and-deploy` and `post-deploy-checks`.
- `apps/web/messages/zh-TW.json`, `apps/web/messages/en.json`
  - Add the copy "建置與部署 / Build and deploy" and "部署後驗證 / Post deploy checks".
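The notification policy above (AWOOI API/AwoooP only, warn on failure, no Telegram fallback) can be sketched as a best-effort sender. This is a minimal Python sketch, not the actual `cd.yaml` step or notify script: `notify_stage`, the injected `send` callable, and the payload keys are hypothetical names chosen for illustration.

```python
import logging
from typing import Callable, Mapping

logger = logging.getLogger("cd.notify")


def notify_stage(
    alertname: str,
    stage: str,
    summary: str,
    send: Callable[[Mapping[str, str]], None],
) -> bool:
    """Send one stage event; never raise and never fall back to another channel.

    `send` is a hypothetical callable that posts the payload to the AWOOI API.
    Returns True on success and False if the send failed.
    """
    # Payload keys are assumptions for this sketch.
    payload = {"alertname": alertname, "stage": stage, "summary": summary}
    try:
        send(payload)
        return True
    except Exception as exc:
        # Best effort: a notify failure is logged as a warning only,
        # so it cannot fail the CD job or trigger a second channel.
        logger.warning("stage notify failed for %s: %s", alertname, exc)
        return False
```

Returning a boolean instead of raising keeps the deploy job's result independent of whether the evidence channel is reachable.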
**Verification / deploy**
```text
Local:
  ruby YAML parse .gitea/workflows/cd.yaml -> yaml ok
  notify dry-run:
    CI_build_and_deploy_running stage=build-and-deploy summary=AWOOOI 建置部署開始
    CI_post_deploy_checks_running stage=post-deploy-checks summary=AWOOOI 部署後驗證開始
  node JSON parse apps/web/messages/zh-TW.json apps/web/messages/en.json -> pass
  git diff --check -> pass
  pnpm --filter @awoooi/web typecheck -> pass
  pnpm --filter @awoooi/web lint -- --file src/components/panels/DeploymentsPanel.tsx -> pass
Code commit:
  f3227817 ci(cd): expose build and post-deploy stages
Gitea Actions:
  #2841 ai-code-review -> success
  #2840 CD -> success
    tests job 3678 -> success
    build-and-deploy job 3679 -> success
    post-deploy-checks job 3680 -> success
  deploy marker: 5ed57748 chore(cd): deploy f322781 [skip ci]
Production API readback:
  GET https://awoooi.wooo.work/api/v1/platform/cicd/events?project_id=awoooi&limit=12
    -> CI_tests_running for f3227817
    -> CI_code_review_running / CI_code_review_success for f3227817
    -> CI_build_and_deploy_running for f3227817
    -> CI_build_and_deploy_success for f3227817, duration_seconds=282
    -> CI_post_deploy_checks_running for f3227817
    -> CI_post_deploy_success for f3227817, duration_seconds=74
Production health:
  GET https://awoooi.wooo.work/api/v1/health
    -> healthy, prod, mock_mode=false
    -> api/postgresql/redis/ollama/openclaw/signoz up
K8s / ArgoCD:
  awoooi-api 192.168.0.110:5000/awoooi/api:f322781798e34f1cf2084aba9cc813eb080e1a37 2/2
  awoooi-web 192.168.0.110:5000/awoooi/web:f322781798e34f1cf2084aba9cc813eb080e1a37 2/2
  awoooi-worker 192.168.0.110:5000/awoooi/api:f322781798e34f1cf2084aba9cc813eb080e1a37 1/1
  awoooi-prod -> Synced / Healthy / 5ed577481fc9e008dbb8659ca706e52aab28561a
Browser verification:
  https://awoooi.wooo.work/zh-TW/deployments
    -> navigation visible
    -> f3227817 rows visible
    -> 建置與部署 running/success visible
    -> 部署後驗證 running/success visible
```
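The production readback in the log above can also be checked mechanically. A minimal Python sketch, assuming each event returned by `GET /api/v1/platform/cicd/events` is a dict with `alertname` and `commit` keys; those field names and the helper `missing_stage_events` are assumptions for illustration, since the log only shows alert names and the commit.

```python
from typing import Iterable, List, Mapping, Sequence

# Alert names taken from the production readback log.
EXPECTED_EVENTS: Sequence[str] = (
    "CI_tests_running",
    "CI_build_and_deploy_running",
    "CI_build_and_deploy_success",
    "CI_post_deploy_checks_running",
    "CI_post_deploy_success",
)


def missing_stage_events(
    events: Iterable[Mapping[str, object]],
    commit_prefix: str,
    expected: Sequence[str] = EXPECTED_EVENTS,
) -> List[str]:
    """Return the expected alert names that were not seen for the given commit."""
    seen = {
        str(event.get("alertname", ""))
        for event in events
        # Match on a prefix so both short and full commit hashes work.
        if str(event.get("commit", "")).startswith(commit_prefix)
    }
    return [name for name in expected if name not in seen]
```

An empty list means every expected stage event was read back for that commit; any returned names point at the stage whose evidence is missing.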
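**Interpretation**
- T139 does not fix the shared runner pool itself. It first makes pipeline stage transitions visible as evidence in AwoooP, so that "build succeeded but post-deploy has not started yet" is no longer misread as a Telegram or alerting black box.
- It also confirms that the `annotations` persistence added in T138 is working: the summary / description of new events can be read back from both the API and the frontend.
- The real infrastructure fix comes next: runner pool / repo label isolation, so that the AWOOI post-deploy gate is not blocked by repos such as ewoooc / stockplatform-v2 occupying the same `capacity: 1` runner.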
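With the three new events, the position of a pipeline can be inferred from which alert names have been seen for a commit. A minimal Python sketch of that reasoning; the function `locate_pipeline` and the returned labels are hypothetical, and because T139 reports build and rollout through one running/success pair, the sketch cannot separate those two.

```python
from typing import Iterable


def locate_pipeline(seen_alertnames: Iterable[str]) -> str:
    """Map the set of observed CI_* alert names to a coarse pipeline position."""
    names = set(seen_alertnames)
    # Check the latest stage first so later evidence wins.
    if "CI_post_deploy_success" in names:
        return "post-deploy gate passed"
    if "CI_post_deploy_checks_running" in names:
        return "post-deploy gate running"
    if "CI_build_and_deploy_success" in names:
        # Deploy finished but the checks job has not started:
        # it is waiting for a runner, e.g. behind another repo's job.
        return "waiting in post-deploy queue"
    if "CI_build_and_deploy_running" in names:
        return "build or rollout in progress"
    return "before build-and-deploy"
```

The "waiting in post-deploy queue" case is the one that was invisible before T139, when only the tests and post-deploy success events existed.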
**Current overall progress**
- AwoooP alert observability chain: 99.998%.
- Incident-level source correlation visibility: 98.8%.
- Source correlation apply state-chain verifiability: 99.72%.
- Source correlation freshness / rolling gate: 98.2%.
- Frontend AI automation management UI sync: 99.999%.
- Dashboard snapshot / SSE console noise reduction: 99.2%.
- CI/CD runner hygiene: 99.2%.
- Runner ownership consolidation: 96%.
- API image build layer hygiene: 88%.
- Deploy rollout-risk observability: 91%.
- CI/CD evidence frontend visibility: 85% → 92%.
- Pipeline stage observability: 45% → 88%.
- Build host pressure governance: 86%.
- Full AI automation management productization: 99.963% → 99.964%.
## 2026-05-21 T138 CI/CD evidence API + Deployments frontend surface
**Trigger**

@@ -2665,6 +2665,14 @@ After Phase 6 completes
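- Interpretation: T135 consolidated runner ownership from two runners competing for jobs down to the host runner as the single owner. The next step should not re-enable the Docker-wrapped runner; instead, do runner pool / repo label isolation, API image `apt-get` / `chown -R` layering, Web build cache/offload, and Playwright apt source-list hygiene.
- Progress update: AwoooP alert observability chain about 99.998%; Incident-level source correlation visibility about 98.8%; Source correlation apply state-chain verifiability about 99.72%; Source correlation freshness / rolling gate about 98.2%; Frontend AI automation management UI sync about 99.999%; Dashboard snapshot / SSE console noise reduction about 99.2%; CI/CD runner hygiene about 99.2%; Runner ownership consolidation about 96%; Build host pressure governance about 82%; Full AI automation management productization about 99.960%.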
**T139 CI/CD stage transition evidence (2026-05-21, Taipei)**
- Trigger: T138 surfaced CI/CD evidence in AwoooP Deployments, but a live test of CD #2833 showed that `post-deploy-checks` can be blocked by jobs from other repos on the same shared runner on host 110. With only tests running / post-deploy success events, an operator still cannot tell whether the pipeline is stuck in build, rollout, the post-deploy queue, or the post-deploy gate itself.
- Fix: `.gitea/workflows/cd.yaml` emits `CI_build_and_deploy_running` when `build-and-deploy` starts, `CI_build_and_deploy_success` after image build/push + ArgoCD rollout + API health succeed, and `CI_post_deploy_checks_running` when `post-deploy-checks` starts. All three notifications go only through the AWOOI API/AwoooP; on failure they only log a warning in the CI log and do not fall back to Telegram, to avoid flooding it. `apps/web/src/components/panels/DeploymentsPanel.tsx` and `apps/web/messages/{zh-TW,en}.json` add the `build-and-deploy` / `post-deploy-checks` stage labels.
- Verification: local workflow YAML parse ok; `scripts/ci/notify-awoooi-cicd.sh` dry-run verified the `CI_build_and_deploy_running` and `CI_post_deploy_checks_running` payloads; messages JSON parse ok; `git diff --check` pass; `pnpm --filter @awoooi/web typecheck` pass; `pnpm --filter @awoooi/web lint -- --file src/components/panels/DeploymentsPanel.tsx` pass. `f3227817 ci(cd): expose build and post-deploy stages` was pushed to Gitea main; Code Review #2841 success; CD #2840 success; tests job 3678 success; build-and-deploy job 3679 success; post-deploy job 3680 success; deploy marker `5ed57748 chore(cd): deploy f322781 [skip ci]`.
- Production readback: `GET /api/v1/platform/cicd/events?project_id=awoooi&limit=12` returned `CI_tests_running`, `CI_code_review_running/success`, `CI_build_and_deploy_running/success`, `CI_post_deploy_checks_running`, and `CI_post_deploy_success` for `f322781798e34f1cf2084aba9cc813eb080e1a37`; `CI_build_and_deploy_success.duration_seconds=282`; `CI_post_deploy_success.duration_seconds=74`. Production health healthy/prod/mock_mode=false; API/Web/Worker image `f322781...` ready; API 2/2, Web 2/2, Worker 1/1; ArgoCD `awoooi-prod` Synced/Healthy at `5ed577481fc9e008dbb8659ca706e52aab28561a`. Browser `https://awoooi.wooo.work/zh-TW/deployments`: navigation visible; `f3227817` rows visible; "建置與部署" running/success and "部署後驗證" running/success visible.
- Interpretation: T139 does not fix the shared runner pool itself. It first makes pipeline stage transitions visible as evidence in AwoooP, so that "build succeeded but post-deploy has not started yet" is no longer misread as a Telegram or alerting black box. It also confirms that the `annotations` persistence added in T138 is working: the summary / description of new events can be read back from both the API and the frontend. The real infrastructure fix comes next: runner pool / repo label isolation, so that the AWOOI post-deploy gate is not blocked by repos such as ewoooc / stockplatform-v2 occupying the same `capacity: 1` runner.
- Progress update: AwoooP alert observability chain about 99.998%; Incident-level source correlation visibility about 98.8%; Source correlation apply state-chain verifiability about 99.72%; Source correlation freshness / rolling gate about 98.2%; Frontend AI automation management UI sync about 99.999%; Dashboard snapshot / SSE console noise reduction about 99.2%; CI/CD runner hygiene about 99.2%; Runner ownership consolidation about 96%; API image build layer hygiene about 88%; Deploy rollout-risk observability about 91%; CI/CD evidence frontend visibility about 92%; Pipeline stage observability about 88%; Build host pressure governance about 86%; Full AI automation management productization about 99.964%.
**T138 CI/CD evidence API + Deployments frontend surface (2026-05-21, Taipei)**
- Trigger: T137 turned recovered rollout-risk from CD log output into an AWOOI API/AwoooP signal, but operators still need to see CI/CD, rollout-risk, and post-deploy gate status directly on a product page rather than relying only on Telegram or the Actions log. A live check found that T137's `CI_rollout_risk_pending` had been written to `alert_operation_log`, but older events did not store `annotations`, so the rollout summary was visible only in the CD log and could not be queried back from the AwoooP API.
- Fix: `apps/api/src/api/v1/webhooks.py` stores `annotations` in the `ALERT_RECEIVED` op log context; `apps/api/src/services/platform_operator_service.py` adds a read-only `list_cicd_events()` that extracts `CI_*` alert evidence from `alert_operation_log` and outputs stage/status/severity/commit/trigger/summary/description/duration/`needs_attention`; `apps/api/src/api/v1/platform/operator_runs.py` adds `GET /api/v1/platform/cicd/events`; `apps/web/src/components/panels/DeploymentsPanel.tsx` adds a "CI/CD 部署證據" (CI/CD deployment evidence) section showing code-review/tests/post-deploy/rollout-risk and `needs_attention`; `apps/web/messages/{zh-TW,en}.json` adds the i18n strings.