From dc34e81224630638c7a7202a12fdc17867fc1e1a Mon Sep 17 00:00:00 2001
From: Your Name
Date: Tue, 19 May 2026 14:19:34 +0800
Subject: [PATCH] docs(awooop): record ai route visibility rollout [skip ci]

---
 docs/LOGBOOK.md | 131 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 131 insertions(+)

diff --git a/docs/LOGBOOK.md b/docs/LOGBOOK.md
index 8d8bc447..a21873dd 100644
--- a/docs/LOGBOOK.md
+++ b/docs/LOGBOOK.md
@@ -1,3 +1,134 @@
+## 2026-05-19 | T79/T80 AI Provider route frontend visibility + CI/CD notification main-path fix
+
+**Background**:
+
+- Commander's correction: every Ollama-type path must be pinned to `GCP-A → GCP-B → 111 local → Gemini`, and this order must not live only in Telegram or pod smoke tests; the Operator Console must show the current primary / fallback / health.
+- T79's goal is to surface the AI provider route status as visible state in AwoooP Runs, so the Operator no longer has to guess whether a request landed on GCP-A, GCP-B, 111, or Gemini.
+- T79's production CD also exposed a piece of notification tech debt: when the post-deploy job container has no `python3`, `scripts/ci/notify-awoooi-cicd.sh` cannot build the Alertmanager JSON, so the success notification drops to the direct Telegram fallback. T80 fixes this right away and returns CI/CD success notifications to the main AWOOI API / AwoooP timeline path.
+
+**T79 changes delivered**:
+
+- Added a read-only `GET /api/v1/platform/ai-route-status?workload_type=deep_rca` endpoint.
+- The response returns:
+  - `schema_version=awooop_ai_route_status_v1`
+  - `policy_order=ollama_gcp_a → ollama_gcp_b → ollama_local → gemini`
+  - the live `selected_provider / selected_url / selected_model`
+  - `fallback_chain`
+  - a health map; while GCP-A is healthy, GCP-B / 111 are shown as `not_checked` so they are not misread as broken.
+- The AwoooP Runs frontend adds an "AI Provider 路由" (AI Provider Route) section showing the policy order, current primary, model, health, latency, URL, and active/standby state.
+- Added `zh-TW` / `en` i18n strings; the new section introduces no new literal-string warnings.
+
+**T80 changes delivered**:
+
+- `scripts/ci/notify-awoooi-cicd.sh` keeps the Python payload builder.
+- Added a Node.js payload builder fallback: when the job container has `node` but no `python3`, the script still produces the same Alertmanager/AWOOI JSON payload.
+- Only when neither `python3` nor `node` is available does it return an explicit error, so the caller can fall back to Telegram.
+
+**Local verification**:
+
+```text
+python -m py_compile
+  apps/api/src/services/platform_operator_service.py
+  apps/api/src/api/v1/platform/operator_runs.py
+  apps/api/tests/test_awooop_operator_timeline_labels.py
+  -> OK
+
+ruff check --select F,E9,I
+  touched backend files
+  -> OK
+
+pytest
+  test_awooop_operator_timeline_labels.py
+  test_ollama_endpoint_resolver.py
+  test_ollama_failover_manager.py
+  -> 76 passed
+
+jq empty apps/web/messages/zh-TW.json apps/web/messages/en.json -> OK
+pnpm --filter @awoooi/web typecheck -> OK
+pnpm --dir apps/web exec next lint --file src/app/[locale]/awooop/runs/page.tsx
+  -> exit 0; the page's pre-existing literal-string warnings remain, and the section added in this round goes through i18n
+NEXT_PUBLIC_API_URL=https://awoooi.wooo.work pnpm --filter @awoooi/web build -> OK
+
+bash -n scripts/ci/notify-awoooi-cicd.sh -> OK
+AWOOI_CICD_DRY_RUN=1 ... notify-awoooi-cicd.sh | jq
+  -> receiver=awoooi-cicd, alertname=CI_post_deploy_success, status=success
+PATH=node-only AWOOI_CICD_DRY_RUN=1 ... notify-awoooi-cicd.sh | jq
+  -> receiver=awoooi-cicd, alertname=CI_post_deploy_success, status=success
+git diff --check -> OK
+```
+
+**Commit / Deploy**:
+
+```text
+56a8085d feat(awooop): surface ai provider route status
+570b99e9 chore(cd): deploy 56a8085 [skip ci]
+
+170f927b fix(ci): build cicd notification payload without python
+815dcf37 chore(cd): deploy 170f927 [skip ci]
+```
+
+**Gitea Actions**:
+
+```text
+2445 Code Review for 56a8085d -> success
+2444 CD for 56a8085d -> success
+  tests -> success
+  build-and-deploy -> success
+  post-deploy-checks -> success
+
+2449 Code Review for 170f927b -> success
+2450 CD workflow_dispatch for 170f927b -> success
+  tests -> success
+  build-and-deploy -> success
+  post-deploy-checks -> success
+```
+
+**Production verification**:
+
+```text
+K8s image after T80:
+awoooi-api 192.168.0.110:5000/awoooi/api:170f927bc677da492d222d561504d6fe4b82c0f1
+awoooi-worker 192.168.0.110:5000/awoooi/api:170f927bc677da492d222d561504d6fe4b82c0f1
+awoooi-web 192.168.0.110:5000/awoooi/web:170f927bc677da492d222d561504d6fe4b82c0f1
+
+GET https://awoooi.wooo.work/api/v1/health
+  -> healthy, prod, mock_mode=false
+
+GET /api/v1/platform/ai-route-status?workload_type=deep_rca
+  -> selected_provider=ollama_gcp_a
+  -> selected_url=http://34.143.170.20:11434
+  -> selected_model=gemma3:4b
+  -> policy_order=ollama_gcp_a → ollama_gcp_b → ollama_local → gemini
+  -> fallback_chain=ollama_gcp_b → ollama_local → gemini
+
+Production Playwright smoke on /zh-TW/awooop/runs:
+  -> AI Provider 路由 visible
+  -> ollama_gcp_a / ollama_gcp_b / ollama_local / gemini visible
+  -> Primary=ollama_gcp_a visible
+  -> route error not visible
+
+CD post-deploy notification after T80:
+  -> AwoooP-mirrored CI/CD notification sent via http://192.168.0.125:32334/api/v1/webhooks/alertmanager
+  -> CI/CD success notification mirrored through AWOOI API
+  -> no python3 missing fallback
+```
+
+**Scope / tech debt**:
+
+- T79 only visualizes route status; it does not trigger inference, auto-remediation, approvals, or incident state changes.
+- T80 fixes the success notification falling back to Telegram because of `python3 missing`; the direct Telegram fallback is still kept as a safety net for when the API is offline.
+- The Gitea act runner still occasionally emits cleanup warnings (`__pycache__` permission / symlink cleanup), while job conclusions remain success. This is runner-hygiene tech debt and does not affect this round's delivery.
+
+**Current overall progress**:
+
+- AwoooP alert observability chain: about 96.5%.
+- Low-risk auto-remediation closed loop: about 95%.
+- Frontend AI automation management UI sync: about 92.5%.
+- CI/CD notifications on the AwoooP main path: about 99%.
+- Full productization of AI automation management: about 89%.
+
+---
+
 ## 2026-05-19 | T72 Homepage live status and flow-pipeline stabilization
 
 **Background**: The homepage `https://awoooi.wooo.work/zh-TW` already loads production data, but from the on-call perspective there are still three obvious breaks. First, the flywheel KPI card keeps retrying the `/api/v1/stats/flywheel/ws` WebSocket, which is not wired up in production, and this fills the console with noise. Second, every IncidentCard fetches its own CSRF token, which multiplies homepage network requests when many incidents are active. Third, the Lobster / OpenClaw flow pipeline only looks at `incident.status` and ignores `decision.state` / proposal evidence, so incidents that already have an AI proposal or are awaiting authorization still appear stuck at early detection.
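The T79 entry records four route-status rules. The policy order is fixed, the first healthy provider becomes the primary, providers after it are reported as `not_checked` rather than broken, and `fallback_chain` lists whatever remains after the primary. The Python below is a minimal sketch of those rules, not the code in `platform_operator_service.py`. `build_route_status`, `RouteStatus`, the `probe` callback, and the `healthy` / `unhealthy` labels are all hypothetical. The log states the `not_checked` behavior only for GCP-B and 111, so applying it to Gemini as well is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# Policy order recorded in the T79 entry: GCP-A -> GCP-B -> 111 local -> Gemini.
POLICY_ORDER = ["ollama_gcp_a", "ollama_gcp_b", "ollama_local", "gemini"]


@dataclass
class RouteStatus:
    # Field names mirror the response keys listed in the log; the class is hypothetical.
    schema_version: str
    policy_order: list[str]
    selected_provider: str | None
    fallback_chain: list[str]
    health: dict[str, str] = field(default_factory=dict)


def build_route_status(probe: Callable[[str], bool]) -> RouteStatus:
    """Probe providers in policy order and stop at the first healthy one.

    Providers after the selected one are never probed, so they are reported
    as "not_checked" (assumed label semantics) instead of "unhealthy".
    """
    health: dict[str, str] = {}
    selected: str | None = None
    for provider in POLICY_ORDER:
        if selected is not None:
            # A primary is already chosen: skip the probe, do not imply failure.
            health[provider] = "not_checked"
            continue
        if probe(provider):
            health[provider] = "healthy"
            selected = provider
        else:
            health[provider] = "unhealthy"
    # The fallback chain is every provider that comes after the primary.
    fallback = [] if selected is None else POLICY_ORDER[POLICY_ORDER.index(selected) + 1:]
    return RouteStatus(
        schema_version="awooop_ai_route_status_v1",
        policy_order=list(POLICY_ORDER),
        selected_provider=selected,
        fallback_chain=fallback,
        health=health,
    )


if __name__ == "__main__":
    # Simulated probe in which only GCP-A answers, like the production check in this entry.
    status = build_route_status(lambda p: p == "ollama_gcp_a")
    print(status.selected_provider, " → ".join(status.fallback_chain))
    print(status.health)
```

Skipping probes after the primary keeps a read-only status call cheap. The cost is that the UI cannot show whether a standby is actually reachable, which is why the entry labels those standbys `not_checked` rather than healthy or down.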