docs(awooop): record ai route visibility rollout [skip ci]
131
docs/LOGBOOK.md
@@ -1,3 +1,134 @@
## 2026-05-19 | T79/T80 AI provider route visibility in the frontend + CI/CD notification main-path fix

**Background**:

- Commander's correction: every Ollama-type path must be pinned to `GCP-A → GCP-B → 111 local → Gemini`, and this order must not live only in Telegram or the pod smoke test; the Operator Console must show the current primary / fallback / health.
- The goal of T79 is to make AI provider route status a visible state in AwoooP Runs, so the Operator no longer has to guess whether a run went to GCP-A, GCP-B, 111, or Gemini.
- The T79 production CD also exposed a piece of notification technical debt: when the post-deploy job container has no `python3`, `scripts/ci/notify-awoooi-cicd.sh` cannot generate the Alertmanager JSON, so the success notification drops back to the direct Telegram fallback. T80 fixes this immediately and returns CI/CD success notifications to the AWOOI API / AwoooP timeline main path.

**T79 completed changes**:

- Added a read-only `GET /api/v1/platform/ai-route-status?workload_type=deep_rca` endpoint.
- The response returns:
  - `schema_version=awooop_ai_route_status_v1`
  - `policy_order=ollama_gcp_a → ollama_gcp_b → ollama_local → gemini`
  - live `selected_provider / selected_url / selected_model`
  - `fallback_chain`
  - a health map; when GCP-A is healthy, GCP-B / 111 show `not_checked` so they are not misread as broken.
- The AwoooP Runs frontend adds an "AI Provider 路由" (AI provider route) block that shows the policy order, current primary, model, health, latency, URL, and active/standby state.
- i18n: added `zh-TW` / `en` strings; the new block introduces no new literal-string warnings.
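
The selection behaviour described in the list above can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: `build_route_status`, the `is_healthy` callback, the `"unhealthy"` label, and the use of lists for `policy_order` / `fallback_chain` are all assumptions made for the example. Only the provider identifiers, `schema_version`, and the `not_checked` label come from this entry.

```python
from typing import Callable, Dict, List, Optional

# Policy order as recorded in this entry.
POLICY_ORDER: List[str] = ["ollama_gcp_a", "ollama_gcp_b", "ollama_local", "gemini"]


def build_route_status(is_healthy: Callable[[str], bool]) -> Dict[str, object]:
    """Probe providers in policy order and stop at the first healthy one.

    Hypothetical helper: providers after the selected one are never probed,
    so they stay "not_checked" instead of being reported as broken.
    """
    health: Dict[str, str] = {name: "not_checked" for name in POLICY_ORDER}
    selected: Optional[str] = None
    for name in POLICY_ORDER:
        if is_healthy(name):
            health[name] = "healthy"
            selected = name
            break
        # Assumed label for a provider that was probed and failed.
        health[name] = "unhealthy"
    # Assumption: the fallback chain is whatever follows the selected provider.
    if selected is None:
        fallback: List[str] = []
    else:
        fallback = POLICY_ORDER[POLICY_ORDER.index(selected) + 1:]
    return {
        "schema_version": "awooop_ai_route_status_v1",
        "policy_order": list(POLICY_ORDER),
        "selected_provider": selected,
        "fallback_chain": fallback,
        "health": health,
    }
```

Passing the probe in as a callback keeps the sketch read-only and testable without any network access, which matches the read-only intent of the endpoint.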

**T80 completed changes**:

- `scripts/ci/notify-awoooi-cicd.sh` keeps the Python payload builder.
- Added a Node.js payload builder fallback: when the job container has no `python3` but does have `node`, the script still produces the same Alertmanager/AWOOI JSON payload.
- Only when both `python3` and `node` are missing does the script return an explicit error, so the caller falls back to Telegram.
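
The builder preference above is a simple ordered lookup. The real logic lives in the shell script; the Python model below only illustrates the decision order, and `pick_payload_builder` and its injectable `which` parameter are hypothetical names introduced for the example.

```python
import shutil
from typing import Callable, Optional


def pick_payload_builder(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return the interpreter to build the payload with, preferring python3.

    Raises RuntimeError when neither interpreter is on PATH; in the flow
    described above, that is the point where the caller falls back to
    the direct Telegram path.
    """
    for interpreter in ("python3", "node"):
        if which(interpreter):
            return interpreter
    raise RuntimeError("no payload builder available: need python3 or node")
```

Injecting `which` makes the node-only case easy to simulate, similar in spirit to the `PATH=node-only` dry run recorded in the local validation.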

**Local validation**:

```text
python -m py_compile
  apps/api/src/services/platform_operator_service.py
  apps/api/src/api/v1/platform/operator_runs.py
  apps/api/tests/test_awooop_operator_timeline_labels.py
  -> OK

ruff check --select F,E9,I
  touched backend files
  -> OK

pytest
  test_awooop_operator_timeline_labels.py
  test_ollama_endpoint_resolver.py
  test_ollama_failover_manager.py
  -> 76 passed

jq empty apps/web/messages/zh-TW.json apps/web/messages/en.json -> OK
pnpm --filter @awoooi/web typecheck -> OK
pnpm --dir apps/web exec next lint --file src/app/[locale]/awooop/runs/page.tsx
  -> exit 0; this page's pre-existing literal-string warnings remain; the block added in this round goes through i18n
NEXT_PUBLIC_API_URL=https://awoooi.wooo.work pnpm --filter @awoooi/web build -> OK

bash -n scripts/ci/notify-awoooi-cicd.sh -> OK
AWOOI_CICD_DRY_RUN=1 ... notify-awoooi-cicd.sh | jq
  -> receiver=awoooi-cicd, alertname=CI_post_deploy_success, status=success
PATH=node-only AWOOI_CICD_DRY_RUN=1 ... notify-awoooi-cicd.sh | jq
  -> receiver=awoooi-cicd, alertname=CI_post_deploy_success, status=success
git diff --check -> OK
```

**Commit / Deploy**:

```text
56a8085d feat(awooop): surface ai provider route status
570b99e9 chore(cd): deploy 56a8085 [skip ci]

170f927b fix(ci): build cicd notification payload without python
815dcf37 chore(cd): deploy 170f927 [skip ci]
```

**Gitea Actions**:

```text
2445 Code Review for 56a8085d -> success
2444 CD for 56a8085d -> success
  tests -> success
  build-and-deploy -> success
  post-deploy-checks -> success

2449 Code Review for 170f927b -> success
2450 CD workflow_dispatch for 170f927b -> success
  tests -> success
  build-and-deploy -> success
  post-deploy-checks -> success
```

**Production validation**:

```text
K8s image after T80:
  awoooi-api 192.168.0.110:5000/awoooi/api:170f927bc677da492d222d561504d6fe4b82c0f1
  awoooi-worker 192.168.0.110:5000/awoooi/api:170f927bc677da492d222d561504d6fe4b82c0f1
  awoooi-web 192.168.0.110:5000/awoooi/web:170f927bc677da492d222d561504d6fe4b82c0f1

GET https://awoooi.wooo.work/api/v1/health
  -> healthy, prod, mock_mode=false

GET /api/v1/platform/ai-route-status?workload_type=deep_rca
  -> selected_provider=ollama_gcp_a
  -> selected_url=http://34.143.170.20:11434
  -> selected_model=gemma3:4b
  -> policy_order=ollama_gcp_a → ollama_gcp_b → ollama_local → gemini
  -> fallback_chain=ollama_gcp_b → ollama_local → gemini

Production Playwright smoke on /zh-TW/awooop/runs:
  -> AI Provider 路由 visible
  -> ollama_gcp_a / ollama_gcp_b / ollama_local / gemini visible
  -> Primary=ollama_gcp_a visible
  -> route error not visible

CD post-deploy notification after T80:
  -> AwoooP-mirrored CI/CD notification sent via http://192.168.0.125:32334/api/v1/webhooks/alertmanager
  -> CI/CD success notification mirrored through AWOOI API
  -> no python3 missing fallback
```
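
The route-status checks recorded above could be automated with a small validator along these lines. This is a sketch under assumptions: `check_route_status` and `as_chain` are hypothetical helpers, the log does not say whether the API returns the chains as lists or as arrow-joined strings (so both are accepted), and the rule that `fallback_chain` equals the providers after the selected one is inferred from the single production sample above.

```python
from typing import Any, Dict, List

EXPECTED_ORDER = ["ollama_gcp_a", "ollama_gcp_b", "ollama_local", "gemini"]


def as_chain(value: Any) -> List[str]:
    """Accept either a list or an arrow-joined string such as 'a → b → c'."""
    if isinstance(value, str):
        return [part.strip() for part in value.split("→") if part.strip()]
    return list(value)


def check_route_status(body: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems: List[str] = []
    order = as_chain(body.get("policy_order", []))
    if order != EXPECTED_ORDER:
        problems.append(f"unexpected policy_order: {order}")
    selected = body.get("selected_provider")
    if selected not in order:
        problems.append(f"selected_provider {selected!r} not in policy_order")
    else:
        # Assumed invariant: fallbacks are the providers after the selected one.
        expected_fallback = order[order.index(selected) + 1:]
        if as_chain(body.get("fallback_chain", [])) != expected_fallback:
            problems.append("fallback_chain does not follow selected_provider")
    return problems
```

Returning a list of problems instead of raising on the first one lets a smoke job report every mismatch in a single run.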
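
**Boundaries / technical debt**:

- T79 is route-status visualization only; it does not trigger inference, auto-remediation, approval, or incident state changes.
- T80 fixes the success notification falling back to Telegram because of `python3 missing`; the direct Telegram fallback is kept as a safety net for when the API is offline.
- The Gitea act runner still occasionally emits cleanup warnings (`__pycache__` permission / symlink cleanup), and the job conclusion is currently success. This is runner-hygiene technical debt and does not affect this round's delivery.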

**Current overall progress**:

- AwoooP alert observability chain: about 96.5%.
- Low-risk auto-remediation closed loop: about 95%.
- Frontend AI automation management UI sync: about 92.5%.
- CI/CD notification AwoooP main path: about 99%.
- Full productization of AI automation management: about 89%.
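
---

## 2026-05-19 | T72 Homepage live status and flow-pipeline stabilization

**Background**: The homepage `https://awoooi.wooo.work/zh-TW` can already load production data, but from the on-call perspective three obvious gaps remain. First, the flywheel KPI card keeps trying the `/api/v1/stats/flywheel/ws` WebSocket, which is not connected in production, and this creates console noise. Second, every IncidentCard fetches its own CSRF token, so the homepage's network requests multiply when there are many active incidents. Third, the 小龍蝦 (Little Lobster) / OpenClaw flow pipeline looks only at `incident.status` and ignores `decision.state` / proposal evidence, so incidents that already have an AI proposal or are awaiting authorization still appear stuck at early detection.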