From 8ddc783af5a52bf160d232425d27e55ffde6acba Mon Sep 17 00:00:00 2001
From: Your Name
Date: Thu, 21 May 2026 20:56:40 +0800
Subject: [PATCH] docs(awooop): record t139 stage evidence [skip ci]

---
 docs/LOGBOOK.md | 94 +++++++++++++++++++
 ...-04-15-MASTER-ai-autonomous-flywheel-v2.md | 8 ++
 2 files changed, 102 insertions(+)

diff --git a/docs/LOGBOOK.md b/docs/LOGBOOK.md
index e0ae4781..eec36ab7 100644
--- a/docs/LOGBOOK.md
+++ b/docs/LOGBOOK.md
@@ -1,3 +1,97 @@
+## 2026-05-21|T139 CI/CD stage transition evidence
+
+**Trigger**:
+
+- T138 surfaced CI/CD evidence in AwoooP Deployments, but a live test of CD #2833 showed that `post-deploy-checks` can be blocked by jobs from other repos on the same 110 shared runner.
+- With only tests running / post-deploy success events, the operator still cannot tell whether the pipeline is stuck in build, rollout, the post-deploy queue, or the post-deploy gate itself.
+
+**Fix**:
+
+- `.gitea/workflows/cd.yaml`
+  - Emit `CI_build_and_deploy_running` when `build-and-deploy` starts.
+  - Emit `CI_build_and_deploy_success` after `build-and-deploy` completes the image build/push, the ArgoCD rollout, and the API health check.
+  - Emit `CI_post_deploy_checks_running` when `post-deploy-checks` starts.
+  - All three notifications go only through AWOOI API/AwoooP; on failure they only log a warning in the CI log and do not fall back to Telegram, which would spam the channel.
+- `apps/web/src/components/panels/DeploymentsPanel.tsx`
+  - Add stage labels for `build-and-deploy` and `post-deploy-checks`.
+- `apps/web/messages/zh-TW.json`, `apps/web/messages/en.json`
+  - Add the 「建置與部署 / Build and deploy」 and 「部署後驗證 / Post deploy checks」 strings.
+
+**Verification / deploy**:
+
+```text
+Local:
+ruby YAML parse .gitea/workflows/cd.yaml -> yaml ok
+notify dry-run:
+  CI_build_and_deploy_running stage=build-and-deploy summary=AWOOOI 建置部署開始
+  CI_post_deploy_checks_running stage=post-deploy-checks summary=AWOOOI 部署後驗證開始
+node JSON parse apps/web/messages/zh-TW.json apps/web/messages/en.json -> pass
+git diff --check -> pass
+pnpm --filter @awoooi/web typecheck -> pass
+pnpm --filter @awoooi/web lint -- --file src/components/panels/DeploymentsPanel.tsx -> pass
+
+Code commit:
+f3227817 ci(cd): expose build and post-deploy stages
+
+Gitea Actions:
+#2841 ai-code-review -> success
+#2840 CD -> success
+  tests job 3678 -> success
+  build-and-deploy job 3679 -> success
+  post-deploy-checks job 3680 -> success
+  deploy marker: 5ed57748 chore(cd): deploy f322781 [skip ci]
+
+Production API readback:
+GET https://awoooi.wooo.work/api/v1/platform/cicd/events?project_id=awoooi&limit=12
+  -> CI_tests_running for f3227817
+  -> CI_code_review_running / CI_code_review_success for f3227817
+  -> CI_build_and_deploy_running for f3227817
+  -> CI_build_and_deploy_success for f3227817, duration_seconds=282
+  -> CI_post_deploy_checks_running for f3227817
+  -> CI_post_deploy_success for f3227817, duration_seconds=74
+
+Production health:
+GET https://awoooi.wooo.work/api/v1/health
+  -> healthy, prod, mock_mode=false
+  -> api/postgresql/redis/ollama/openclaw/signoz up
+
+K8s / ArgoCD:
+  awoooi-api     192.168.0.110:5000/awoooi/api:f322781798e34f1cf2084aba9cc813eb080e1a37  2/2
+  awoooi-web     192.168.0.110:5000/awoooi/web:f322781798e34f1cf2084aba9cc813eb080e1a37  2/2
+  awoooi-worker  192.168.0.110:5000/awoooi/api:f322781798e34f1cf2084aba9cc813eb080e1a37  1/1
+  awoooi-prod -> Synced / Healthy / 5ed577481fc9e008dbb8659ca706e52aab28561a
+
+Browser verification:
+https://awoooi.wooo.work/zh-TW/deployments
+  -> navigation visible
+  -> f3227817 rows visible
+  -> 建置與部署 running/success visible
+  -> 部署後驗證 running/success visible
+```
+
+**Assessment**:
+
+- T139 does not fix the shared runner pool itself; it first turns pipeline stage transitions into evidence visible in AwoooP, so that "build succeeded but post-deploy has not started yet" is no longer misread as a Telegram or alerting black box.
+- It also confirms that T138's `annotations` persistence is working: the summary / description of new events can be read back from both the API and the frontend.
+- The real infrastructure fix comes next: runner pool / repo label isolation, so that the AWOOI post-deploy gate is not blocked by repos such as ewoooc / stockplatform-v2 occupying the same `capacity: 1` runner.
+
+**Current overall progress**:
+
+- AwoooP alert observability chain: 99.998%.
+- Incident-level source correlation visibility: 98.8%.
+- Source correlation apply state-chain verifiability: 99.72%.
+- Source correlation freshness / rolling gate: 98.2%.
+- Frontend AI automation management UI sync: 99.999%.
+- Dashboard snapshot / SSE console noise reduction: 99.2%.
+- CI/CD runner hygiene: 99.2%.
+- Runner ownership consolidation: 96%.
+- API image build layer hygiene: 88%.
+- Deploy rollout-risk observability: 91%.
+- CI/CD evidence frontend visibility: 85% → 92%.
+- Pipeline stage observability: 45% → 88%.
+- Build host pressure governance: 86%.
+- Full AI automation management productization: 99.963% → 99.964%.
+
 ## 2026-05-21|T138 CI/CD evidence API + Deployments frontend surface
 
 **Trigger**:
diff --git a/docs/superpowers/specs/2026-04-15-MASTER-ai-autonomous-flywheel-v2.md b/docs/superpowers/specs/2026-04-15-MASTER-ai-autonomous-flywheel-v2.md
index 86dc8097..e3063456 100644
--- a/docs/superpowers/specs/2026-04-15-MASTER-ai-autonomous-flywheel-v2.md
+++ b/docs/superpowers/specs/2026-04-15-MASTER-ai-autonomous-flywheel-v2.md
@@ -2665,6 +2665,14 @@ After Phase 6 is complete
 - Assessment: T135 consolidated runner ownership from two runners competing for jobs down to a single host runner in control; the next step is not to re-enable the Docker-wrapped runner but to do runner pool / repo label isolation, API image `apt-get` / `chown -R` layering, Web build cache/offload, and Playwright apt source-list hygiene.
 - Progress update: AwoooP alert observability chain about 99.998%; Incident-level source correlation visibility about 98.8%; Source correlation apply state-chain verifiability about 99.72%; Source correlation freshness / rolling gate about 98.2%; Frontend AI automation management UI sync about 99.999%; Dashboard snapshot / SSE console noise reduction about 99.2%; CI/CD runner hygiene about 99.2%; Runner ownership consolidation about 96%; Build host pressure governance about 82%; Full AI automation management productization about 99.960%.
 
+**T139 CI/CD stage transition evidence (2026-05-21 Taipei)**:
+- Trigger: T138 surfaced CI/CD evidence in AwoooP Deployments, but a live test of CD #2833 showed that `post-deploy-checks` can be blocked by jobs from other repos on the same 110 shared runner. With only tests running / post-deploy success events, the operator still cannot tell whether the pipeline is stuck in build, rollout, the post-deploy queue, or the post-deploy gate itself.
+- Fix: `.gitea/workflows/cd.yaml` emits `CI_build_and_deploy_running` when `build-and-deploy` starts, `CI_build_and_deploy_success` after image build/push + ArgoCD rollout + API health succeed, and `CI_post_deploy_checks_running` when `post-deploy-checks` starts; all three notifications go only through AWOOI API/AwoooP, and on failure they only log a warning in the CI log and do not fall back to Telegram, which would spam the channel. `apps/web/src/components/panels/DeploymentsPanel.tsx` and `apps/web/messages/{zh-TW,en}.json` add the `build-and-deploy` / `post-deploy-checks` stage labels.
+- Verification: local workflow YAML parse ok; `scripts/ci/notify-awoooi-cicd.sh` dry-run verified the `CI_build_and_deploy_running` and `CI_post_deploy_checks_running` payloads; messages JSON parse ok; `git diff --check` pass; `pnpm --filter @awoooi/web typecheck` pass; `pnpm --filter @awoooi/web lint -- --file src/components/panels/DeploymentsPanel.tsx` pass. `f3227817 ci(cd): expose build and post-deploy stages` was pushed to Gitea main; Code Review #2841 success; CD #2840 success: tests job 3678 success, build-and-deploy job 3679 success, post-deploy-checks job 3680 success, deploy marker `5ed57748 chore(cd): deploy f322781 [skip ci]`.
+- Production readback: `GET /api/v1/platform/cicd/events?project_id=awoooi&limit=12` returns `CI_tests_running`, `CI_code_review_running/success`, `CI_build_and_deploy_running/success`, `CI_post_deploy_checks_running`, and `CI_post_deploy_success` for `f322781798e34f1cf2084aba9cc813eb080e1a37`; `CI_build_and_deploy_success.duration_seconds=282`, `CI_post_deploy_success.duration_seconds=74`. Production health healthy/prod/mock_mode=false; API/Web/Worker image `f322781...` ready (API 2/2, Web 2/2, Worker 1/1); ArgoCD `awoooi-prod` Synced/Healthy at `5ed577481fc9e008dbb8659ca706e52aab28561a`. Browser `https://awoooi.wooo.work/zh-TW/deployments`: navigation visible, `f3227817` rows visible, 「建置與部署」 running/success and 「部署後驗證」 running/success visible.
+- Assessment: T139 does not fix the shared runner pool itself; it first turns pipeline stage transitions into evidence visible in AwoooP, so that "build succeeded but post-deploy has not started yet" is no longer misread as a Telegram or alerting black box. It also confirms that T138's `annotations` persistence is working: the summary / description of new events can be read back from both the API and the frontend. The real infrastructure fix comes next: runner pool / repo label isolation, so that the AWOOI post-deploy gate is not blocked by repos such as ewoooc / stockplatform-v2 occupying the same `capacity: 1` runner.
+- Progress update: AwoooP alert observability chain about 99.998%; Incident-level source correlation visibility about 98.8%; Source correlation apply state-chain verifiability about 99.72%; Source correlation freshness / rolling gate about 98.2%; Frontend AI automation management UI sync about 99.999%; Dashboard snapshot / SSE console noise reduction about 99.2%; CI/CD runner hygiene about 99.2%; Runner ownership consolidation about 96%; API image build layer hygiene about 88%; Deploy rollout-risk observability about 91%; CI/CD evidence frontend visibility about 92%; Pipeline stage observability about 88%; Build host pressure governance about 86%; Full AI automation management productization about 99.964%.
+
 **T138 CI/CD evidence API + Deployments frontend surface (2026-05-21 Taipei)**:
 - Trigger: T137 turned recovered rollout-risk from the CD log into an AWOOI API/AwoooP signal, but the operator still needs to see CI/CD, rollout-risk, and post-deploy gate status directly on the product page instead of relying only on Telegram or the Actions log. A live check found that T137's `CI_rollout_risk_pending` had been written to `alert_operation_log`, but older events did not persist `annotations`, so the rollout summary was visible only in the CD log and could not be read back from the AwoooP API.
 - Fix: `apps/api/src/api/v1/webhooks.py` persists `annotations` in the `ALERT_RECEIVED` op log context; `apps/api/src/services/platform_operator_service.py` adds a read-only `list_cicd_events()` that extracts `CI_*` alert evidence from `alert_operation_log` and outputs stage/status/severity/commit/trigger/summary/description/duration/`needs_attention`; `apps/api/src/api/v1/platform/operator_runs.py` adds `GET /api/v1/platform/cicd/events`; `apps/web/src/components/panels/DeploymentsPanel.tsx` adds a 「CI/CD 部署證據」 (CI/CD deployment evidence) section showing code-review/tests/post-deploy/rollout-risk and `needs_attention`; `apps/web/messages/{zh-TW,en}.json` add i18n.