Commit Graph

461 Commits

Author SHA1 Message Date
OoO
4380fa641c ci(observability): gate frontend deploys with QA suite
Some checks failed
CD Pipeline / deploy (push) Has been cancelled
2026-05-05 23:39:00 +08:00
OoO
3db8f5c5b2 chore(observability): polish QA entrypoint docs
Some checks failed
CD Pipeline / deploy (push) Has been cancelled
2026-05-05 23:37:00 +08:00
OoO
7225e81c08 chore(observability): pass QA target args through quick review
All checks were successful
CD Pipeline / deploy (push) Successful in 2m7s
2026-05-05 23:32:13 +08:00
OoO
ca22b7fe7c docs(agents): index observability UI QA workflow 2026-05-05 23:29:31 +08:00
OoO
7ce74e32fe docs(memory): record observability UI QA guardrails 2026-05-05 23:28:06 +08:00
OoO
65eea5eb9a chore(observability): add noninteractive QA quick review flags
Some checks failed
CD Pipeline / deploy (push) Has been cancelled
2026-05-05 23:25:55 +08:00
OoO
ce7dd6068c docs(deploy): require observability QA for frontend changes 2026-05-05 23:24:33 +08:00
OoO
be1d1aec03 test(observability): include health in smoke suite
All checks were successful
CD Pipeline / deploy (push) Successful in 4m4s
2026-05-05 23:20:45 +08:00
OoO
cdcbcf1d80 chore(observability): centralize QA page contract
All checks were successful
CD Pipeline / deploy (push) Successful in 1m33s
2026-05-05 22:19:25 +08:00
OoO
346e9672a6 chore(observability): add CSS mirror sync helper
All checks were successful
CD Pipeline / deploy (push) Successful in 1m33s
2026-05-05 22:16:41 +08:00
OoO
15f7c8660d fix(observability): serve CSS from Flask static path
All checks were successful
CD Pipeline / deploy (push) Successful in 1m34s
2026-05-05 22:14:47 +08:00
OoO
6d015c5b6b test(observability): assert design system markers
All checks were successful
CD Pipeline / deploy (push) Successful in 2m24s
2026-05-05 22:08:44 +08:00
OoO
b21b40cae2 fix(observability): soften frontend error copy
All checks were successful
CD Pipeline / deploy (push) Successful in 1m2s
2026-05-05 21:58:49 +08:00
OoO
d93ad659ba fix(observability): polish topbar alert indicator
All checks were successful
CD Pipeline / deploy (push) Successful in 1m33s
2026-05-05 21:52:45 +08:00
OoO
422137efa8 test(observability): validate sidebar route coverage
All checks were successful
CD Pipeline / deploy (push) Successful in 1m41s
2026-05-05 21:46:28 +08:00
OoO
e7d567c6be test(observability): assert page content markers
Some checks failed
CD Pipeline / deploy (push) Failing after 4m55s
2026-05-05 15:53:39 +08:00
OoO
8643ed12ad test(observability): validate nav active page contract
Some checks failed
CD Pipeline / deploy (push) Has been cancelled
2026-05-05 15:48:39 +08:00
OoO
3fca720fa1 test(observability): guard sidebar navigation design
Some checks failed
CD Pipeline / deploy (push) Failing after 2m11s
2026-05-05 15:41:39 +08:00
OoO
6a0d5c138d test(observability): add one-shot QA suite
Some checks failed
CD Pipeline / deploy (push) Has been cancelled
2026-05-05 15:39:55 +08:00
OoO
b963dcf209 test(observability): add production page smoke check
Some checks failed
CD Pipeline / deploy (push) Has been cancelled
2026-05-05 15:35:47 +08:00
OoO
62276f8b0c chore(observability): wire UI guard into quick review
Some checks failed
CD Pipeline / deploy (push) Failing after 1m57s
2026-05-05 15:31:04 +08:00
OoO
07c9e200d0 test(observability): add UI regression guard
Some checks failed
CD Pipeline / deploy (push) Failing after 1m39s
2026-05-05 15:04:21 +08:00
OoO
fa3e0884ad docs(observability): complete UI governance guidelines
Some checks failed
CD Pipeline / deploy (push) Failing after 1m38s
2026-05-05 14:59:45 +08:00
OG T
ddcfd9603b fix(ops): cap momo runtime startup load
Some checks failed
CD Pipeline / deploy (push) Has been cancelled
2026-05-05 14:58:11 +08:00
OoO
ccd26415f3 fix(observability): introduce heading scale tokens and modal styles
Some checks failed
CD Pipeline / deploy (push) Has been cancelled
2026-05-05 14:54:17 +08:00
OoO
668d98cd3c fix(observability): clean up hardcoded styles and chart containers
Some checks failed
CD Pipeline / deploy (push) Failing after 9m49s
2026-05-05 14:41:00 +08:00
OoO
2c11a3dc81 fix(observability): strengthen cross-page responsiveness and accessibility
Some checks failed
CD Pipeline / deploy (push) Failing after 4m5s
2026-05-05 14:31:56 +08:00
OoO
4a745c27b4 fix(observability): refine visual hierarchy on data-dense pages
Some checks failed
CD Pipeline / deploy (push) Has been cancelled
2026-05-05 14:27:43 +08:00
OoO
3b9a74773c fix(observability): commit the remaining refined styles
Some checks are pending
CD Pipeline / deploy (push) Has started running
2026-05-05 14:20:49 +08:00
OoO
be986b8b97 fix(observability): fall back to a safe empty state when a table is missing
Some checks failed
CD Pipeline / deploy (push) Has been cancelled
2026-05-05 14:19:09 +08:00
OoO
e28f604ec6 fix(observability): tighten heading scale and business card layout
Some checks failed
CD Pipeline / deploy (push) Has been cancelled
2026-05-05 14:14:24 +08:00
OoO
4afcf3376b fix(observability): unify heading typeface and present business recommendations as cards
Some checks failed
CD Pipeline / deploy (push) Has been cancelled
2026-05-05 14:09:41 +08:00
OG T
c7242971e3 fix(aiops): align incidents schema with autoheal model
Some checks failed
CD Pipeline / deploy (push) Has been cancelled
2026-05-05 14:08:19 +08:00
OoO
67b93a8b50 fix(observability): unify the observability console UI design system
Some checks failed
CD Pipeline / deploy (push) Has been cancelled
2026-05-05 14:05:45 +08:00
OoO
c38f22e67a fix(observability): fix safe degradation and stylesheet loading on the war-room page
Some checks failed
CD Pipeline / deploy (push) Has been cancelled
2026-05-05 14:02:29 +08:00
OoO
505cbe20c7 fix(ui): restore the warm-tone sidebar navigation spec
Some checks failed
CD Pipeline / deploy (push) Has been cancelled
2026-05-05 13:57:41 +08:00
OoO
6f8fdc14ba fix(observability): improve sidebar submenu readability
Some checks failed
CD Pipeline / deploy (push) Has been cancelled
2026-05-05 13:56:26 +08:00
OoO
9b908ca426 fix(observability): apply observability console styles to the business war-room page
Some checks failed
CD Pipeline / deploy (push) Has been cancelled
2026-05-05 13:53:40 +08:00
OG T
f6a2a05e3f fix(aiops): treat openclaw strategy actions as advisory
Some checks failed
CD Pipeline / deploy (push) Has been cancelled
2026-05-05 13:49:36 +08:00
OoO
c57b8f40ee feat(observability): finish the Agent and business war-room pages
All checks were successful
CD Pipeline / deploy (push) Successful in 7m39s
2026-05-05 13:36:31 +08:00
OoO
054685826a feat(observability): redesign the AI observability console war-room UI
Some checks failed
CD Pipeline / deploy (push) Has been cancelled
2026-05-05 13:17:42 +08:00
OoO
2bb2e16442 feat(p56): extend deploy_doctor with two new check stages: Observability + CD Pipeline
5 stages → 7 stages:

[3/7] Ollama hosts (3 → 5 hosts)
  + 192.168.0.110:11435 (P53 K8s Nginx Proxy GCP-A)
  + 192.168.0.110:11436 (P53 K8s Nginx Proxy GCP-B)

[6/7] Observability, 11 endpoints (new)
  Full prod smoke: mo.wooo.work/observability/* + api/health_indicator
  SPA shell fingerprint detection (size=7480 / etag e167a58a... = FAIL)
  302/308/401/403 (auth redirect) counts as OK: login_required is working as intended
  The PROD_BASE_URL env var can be overridden to test staging

[7/7] CD Pipeline (new)
  Fetch the 3 most recent runs via the Gitea API and map their status to OK/WARN/FAIL
  110 unreachable → automatic WARN (does not block the deploy doctor exit code)

DB migrations table list: + 029 ollama_host_history / 030 ppt_audit_history_db.

Verified locally: all 11 endpoints green, and Gitea 110 being down correctly yields WARN.
2026-05-05 12:27:51 +08:00
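The endpoint classification rules listed for stage [6/7] of commit 2bb2e16442 can be sketched roughly as follows. This is a minimal illustration, not the actual deploy_doctor code: the function name `classify_endpoint`, the constants, and the choice to flag a response when either the size or the etag prefix matches are all assumptions. The etag is truncated in the commit message, so only the visible prefix is used.

```python
# Hypothetical sketch of the [6/7] Observability smoke classification.
# Names and the "size OR etag" rule are assumptions, not the real deploy_doctor API.

AUTH_REDIRECT_CODES = {302, 308, 401, 403}  # login_required doing its job
SPA_SHELL_SIZE = 7480                       # fingerprint from the commit message
SPA_SHELL_ETAG_PREFIX = "e167a58a"          # etag is truncated in the source


def classify_endpoint(status_code: int, body_size: int, etag: str) -> str:
    """Return "OK" or "FAIL" for one probed endpoint."""
    if status_code in AUTH_REDIRECT_CODES:
        # An auth redirect means Flask handled the route and enforced login.
        return "OK"
    if status_code == 200:
        looks_like_spa_shell = (
            body_size == SPA_SHELL_SIZE
            or etag.strip('"').startswith(SPA_SHELL_ETAG_PREFIX)
        )
        # A 200 that matches the SPA shell fingerprint means the request
        # never reached the Flask route, so it is treated as a failure.
        return "FAIL" if looks_like_spa_shell else "OK"
    return "FAIL"
```

Treating auth redirects as success keeps the check usable from an unauthenticated probe, while the fingerprint test catches the case where a static shell answers 200 for every path.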
OoO
326285d8b9 test(p55): add logged-in success-path tests for observability mutation endpoints (23/23 PASS)
As of P53, mutation endpoints were only tested for the anonymous block (302); the logged-in success path had zero coverage:
- /playbooks/toggle/<id>: is_active flip logic
- /budget/force_throttle: cost_throttle.evaluate() call
- /ai_calls/trigger_code_review: triggers code_review_pipeline
- /host_health/trigger_autoheal: triggers the autoheal playbook

5 new cases:
- test_playbook_toggle_404_on_missing: fetchone()=None must return 404
- test_playbook_toggle_flips_active_flag: False→True flip + Chinese message
- test_budget_force_throttle_invokes_evaluate: monkeypatch a fake throttle service
- test_ai_calls_trigger_code_review_returns_json: must at least return JSON without crashing
- test_host_health_trigger_autoheal_returns_json: must at least return JSON without crashing

Design note: tolerant of future service refactors (accepts status codes 200/400/500/503),
but strict on the "JSON response shape" contract, so that an HTML error page never leaks out.
2026-05-05 12:17:54 +08:00
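The design note in commit 326285d8b9 (tolerate several status codes, but insist on a JSON response) can be illustrated with a small helper. This is a sketch under assumptions: `assert_json_contract` and its arguments are hypothetical and do not come from the project's test suite, which presumably works against a Flask test client.

```python
import json

# Status codes the commit message says the tests accept.
TOLERATED_STATUS = {200, 400, 500, 503}


def assert_json_contract(status_code: int, content_type: str, body: str):
    """Hypothetical helper: loose on status code, strict on JSON shape."""
    assert status_code in TOLERATED_STATUS, f"unexpected status {status_code}"
    # Ignore parameters such as "; charset=utf-8".
    media_type = content_type.split(";")[0].strip().lower()
    assert media_type == "application/json", f"not JSON: {content_type}"
    payload = json.loads(body)  # raises ValueError on an HTML error page
    assert isinstance(payload, (dict, list)), "JSON body must be an object or array"
    return payload
```

A helper like this would let a test pass while the underlying service is refactored or temporarily unavailable, yet still fail the moment an HTML error page is returned.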
OoO
df2311d4f0 feat(p55): add the 3 remaining pie charts: promotion_review/ppt_audit/budget
All checks were successful
CD Pipeline / deploy (push) Successful in 7m39s
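S-1: promotion_review distillation pool 30d doughnut
- Replaces the original col-md-2 card grid
- 8 statuses, each with its own color:
  pending (gray) / awaiting_review (yellow) / approved (green) /
  rejected_quality (red) / rejected_hallucination (deep red) /
  rejected_duplicate (orange) / rejected_human (dark red) / expired (gray)
- Two views: pie on the left + table on the right

S-2: ppt_audit 30d results doughnut
- Replaces part of the col-md-2 card layout
- Pie of passed (green) / failed (yellow) / error (red) / skipped (gray)
- 6 KPI cards merged into the right-hand col-6 grid (total count / pass rate / passed / issue / failed / error)
- Unified visual language: two views, "chart + table"

S-3: budget doughnut of current-month cost per provider
- New query: ai_calls.cost_usd GROUP BY provider, month start to date
- 8 providers color-coded (local Ollama in greens vs. paid LLMs in oranges and purples)
- Pie on the left + table on the right (provider / cost / share) + total row

chart.js visualizations go from 7 → 10:
- hourly trend line
- 30d cost stacked bar
- three-host sparkline × 3
- RAG feedback doughnut
- KPI sparkline × 3 (calls/cost/errors)
- verdict doughnut
- heal 7d trend
- **promotion_review status doughnut (new)**
- **ppt_audit pass/fail doughnut (new)**
- **provider cost doughnut (new)**

Phase 38→55 cumulative: 20 commits / 10 observability pages / 10 chart.js / DB 100%.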

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 01:15:58 +08:00
OoO
90e8366a8d feat(p54): chart.js visual tweaks: KPI sparkline + verdict pie + heal trend
Some checks failed
CD Pipeline / deploy (push) Has been cancelled
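R-1: 24h sparklines on ai_calls KPI cards
- Call-count card gets a 24px-high mini line chart underneath (blue)
- Cost card gets a sparkline underneath (yellow)
- Error-count card gets a sparkline underneath (red)
- Token / average latency / RAG hit cards now show "avg tk/call", "cache hit count", "RAG hit rate %"
- The whole KPI row goes from bare numbers → numbers with a 24h trend visual
- Shares the existing chart.js dataset; no new query

R-2: business_intel verdict switched to two views, doughnut + table
- Replaces the original col-md-3 card grid
- Left pie: visual proportions of effective (green) / backfired (red) / neutral (gray)
- Right table: 4 columns (verdict / count / share / average Δ) with positive/negative coloring
- Visual style unified with the quality_trend RAG pie chart

R-3: 7d auto-heal success-rate sparkline on the host_health AIOps card
- routes/admin_observability_routes.py adds a heal_daily query:
  date_trunc('day') GROUP BY, daily success rate over 7 days
- 80px-high line chart added at the bottom of the AIOps 7d card
- Y axis 0-100% / X axis the 7 dates
- Tooltip shows "ok/total succeeded (rate%)"

chart.js visualizations go from 4 → 7:
hourly trend / 30d stacked / three-host sparkline / RAG doughnut /
KPI sparkline × 3 / verdict doughnut / heal trend

Phase 38→54 cumulative: 19 commits / 10 observability pages + topbar indicator / 7 chart.js.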

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 01:13:31 +08:00
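The heal_daily aggregation described in R-3 of commit 90e8366a8d is done in SQL with date_trunc('day') and GROUP BY. The same per-day success-rate computation can be sketched in plain Python. The function name `daily_success_rate`, the `(timestamp, ok)` input shape, and the returned field names are assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import datetime


def daily_success_rate(events):
    """Group (timestamp, ok) pairs by calendar day and compute success rate.

    Mirrors a date_trunc('day') GROUP BY; the input shape is hypothetical.
    """
    buckets = defaultdict(lambda: [0, 0])  # day -> [ok_count, total_count]
    for ts, ok in events:
        bucket = buckets[ts.date()]
        bucket[1] += 1
        if ok:
            bucket[0] += 1

    rows = []
    for day in sorted(buckets):
        ok_count, total = buckets[day]
        rate = round(100.0 * ok_count / total, 1)
        rows.append({
            "day": day.isoformat(),
            "ok": ok_count,
            "total": total,
            "rate": rate,
            # Same information the chart tooltip is described as showing.
            "tooltip": f"{ok_count}/{total} succeeded ({rate}%)",
        })
    return rows
```

Days with no heal attempts simply do not appear in the output, which matches GROUP BY behavior; a chart that needs a continuous 7-day axis would have to fill those gaps separately.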
OoO
118f10701b test(p54): add tests for get_host_label / get_provider_tag (20/20 PASS)
P53 commit 7a10d27 added K8s Nginx Proxy routing detection
(192.168.0.110:11435/11436 → GCP-A/B) but no unit tests.
Future IP changes or new providers could easily break it unnoticed.

Coverage:
- TestGetHostLabel × 9 cases:
  empty / GCP-A direct / GCP-B direct / Nginx 11435 / Nginx 11436 /
  111 backup / 188 local / localhost / unknown fallback
- TestGetProviderTag × 5 cases + parametrize × 6 rows:
  empty / GCP × 2 paths / Secondary × 2 paths / 111 / unknown
  + 6 rows aligned with the migration 024 ai_calls.provider CHECK allowlist

Specific regression guard: in the K8s environment, 192.168.0.110:11435 no longer falls back to "未知" (unknown),
which is exactly the problem the P53 commit fixed.
2026-05-05 01:12:35 +08:00
OoO
7a10d27d61 feat(p53): K8s Nginx Proxy support: complete host_label/provider_tag
Some checks failed
CD Pipeline / deploy (push) Has been cancelled
Problem:
The K8s internal network cannot reach GCP public port 11434 directly, so the 110 jump host runs an Nginx Proxy
that forwards 11435/11436 to GCP-A/GCP-B. But get_host_label() in services/ollama_service.py
only checks IP substrings (34.143.170.20 / 34.21.145.224), so in the
K8s environment it falls back to "未知" (unknown), scrambling host labels in the observability console.

Fix:
- services/ollama_service.py::get_host_label
  Added:
    192.168.0.110:11435 → "GCP-SSD(via Nginx 110)"
    192.168.0.110:11436 → "GCP-SSD-2(via Nginx 110)"
  Kept: existing checks for direct GCP / 111 / 188 / localhost

- services/ollama_service.py::get_provider_tag (new function)
  Centralizes provider tag detection (previously re-implemented in code_review_pipeline and several other places):
    GCP direct + Nginx 11435 → 'gcp_ollama'
    GCP-B direct + Nginx 11436 → 'ollama_secondary'
    111 → 'ollama_111'
    anything else → 'ollama_other'
  Consistent across environments: ai_calls.provider records the same tag under both docker-compose and K8s,
  so cross-environment statistics do not fragment.

- services/code_review_pipeline_service.py:233
  Switched to the shared get_provider_tag(), removing the original hardcoded if/else IP checks.

- k8s/02-configmap.yaml (already changed by the user)
  OLLAMA_HOST_PRIMARY = http://192.168.0.110:11435 (Nginx → GCP-A)
  OLLAMA_HOST_SECONDARY = http://192.168.0.110:11436 (Nginx → GCP-B)
  OLLAMA_HOST_FALLBACK = http://192.168.0.111:11434 (internal network)

driver test:
  http://34.143.170.20:11434 → GCP-SSD / gcp_ollama
  http://192.168.0.110:11435 → GCP-SSD(via Nginx 110)/ gcp_ollama
  http://192.168.0.111:11434 → 111 備援 (backup) / ollama_111

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 01:09:56 +08:00
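The provider-tag mapping spelled out in commit 7a10d27d61 can be sketched as a single substring-matching function. This is an illustration, not the code in services/ollama_service.py: the signature, the handling of an empty host (returned here as 'ollama_other'), and the assumption that 34.21.145.224 is the GCP-B address (inferred from the order in which the two IPs are listed) are not confirmed by the commit message.

```python
def get_provider_tag(host_url: str) -> str:
    """Sketch of the tag rules listed in the commit message.

    Assumptions: 34.21.145.224 is GCP-B, and an empty or unknown host
    falls through to 'ollama_other'.
    """
    url = (host_url or "").lower()

    # GCP-A, either directly or through the Nginx proxy on the 110 jump host.
    if "34.143.170.20" in url or "192.168.0.110:11435" in url:
        return "gcp_ollama"

    # GCP-B, either directly (assumed IP) or through the Nginx proxy.
    if "34.21.145.224" in url or "192.168.0.110:11436" in url:
        return "ollama_secondary"

    # Internal fallback host.
    if "192.168.0.111" in url:
        return "ollama_111"

    return "ollama_other"
```

Matching on host and port for the proxy entries is what keeps docker-compose and K8s deployments writing the same tag: the same upstream is recognized regardless of which address the environment uses to reach it.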
OoO
a142e85880 test(p53): observability smoke tests cover the 11 endpoints added in P38-P52 (18/18 PASS)
The campaign has grown from 6 routes at P27 to 20 routes at P52 (including 5 new GET / 5 new POST),
but the original 12 tests only covered P27-31, leaving the 11 endpoints from P38-P52 without regression protection.

New tests:
- test_overview_index_200: /observability/ root index
- test_overview_dashboard_200: P45 overview page
- test_rag_queries_200: P51 RAG recall details
- test_business_intel_200: P48 business × AI orchestration
- test_agent_orchestration_200: P46 Agent orchestration matrix
- test_health_indicator_api_returns_json: P52 topbar health light JSON API
- test_anon_get_redirects_to_login: all 12 GET paths enforce login (expanded 6→12)
- test_anon_post_blocked: all 8 POST mutations enforce login (expanded 3→8)

Verified in prod: all 11 mo.wooo.work endpoints are served by Flask with 200/308 (checked with curl).
20/20 routes have 100% @login_required coverage (python regex audit).
2026-05-05 01:09:52 +08:00
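Commit a142e85880 mentions a "python regex audit" confirming that every route carries @login_required. The audit script itself is not shown, so the following is only one plausible shape for such a check: it scans a source string line by line and reports route handlers whose decorator stack lacks @login_required. The function name and regular expressions are assumptions.

```python
import re

ROUTE_RE = re.compile(r"^\s*@\w+\.route\(")       # e.g. @bp.route("/x")
LOGIN_RE = re.compile(r"^\s*@login_required\b")
DEF_RE = re.compile(r"^\s*def\s+(\w+)\s*\(")


def audit_login_required(source: str) -> list:
    """Return names of route handlers that are missing @login_required."""
    unprotected = []
    has_route = False
    has_login = False
    for line in source.splitlines():
        if ROUTE_RE.match(line):
            has_route = True
        elif LOGIN_RE.match(line):
            has_login = True
        else:
            match = DEF_RE.match(line)
            if match:
                # The decorator stack ends at the function definition.
                if has_route and not has_login:
                    unprotected.append(match.group(1))
                has_route = False
                has_login = False
    return unprotected
```

An empty result over the routes module would correspond to the "20/20" figure in the commit message. A line-based scan like this does not verify decorator order, so it is a coverage check rather than a proof that login is enforced.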
OoO
2a3ea6f581 feat(p52): topbar observability health indicator + RAG feedback pie chart
All checks were successful
CD Pipeline / deploy (push) Successful in 2m30s
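P-1: topbar AI observability indicator (visible on every page)
- ewoooc_base.html topbar gains a "🛰 AI Observability Console" icon button
- Red badge shows the alert count (each of 4 dimensions counts when triggered):
  • any of the three hosts is down
  • pending-review episodes > 0
  • error rate over the past 1h ≥ 30%
  • any budget ≥ 90%
- New GET /observability/api/health_indicator
  Lightweight JSON API (4 queries across host_health_probes/learning_episodes/
  ai_calls/ai_call_budgets)
- topbar polls every 60s to refresh automatically + tooltip shows the specific alerts
- Every page (including the / product dashboard and all observability pages) shows health status in the topbar

P-2: quality_trend RAG feedback pie chart (doughnut)
- Replaces the original card grid layout
- 1-5 stars colored on a green→red gradient (5=green, 3=yellow, 1=red)
- Two views, pie + table on the right (chart paired with raw numbers)
- chart.js doughnut + tooltip showing count + share

Benefits:
- The commander can glance at the top-right corner from any page (not just the observability console) to see current AI health
- Happy path: "normal" green icon · failure path: "red badge + number" grabs attention immediately
- The pie chart conveys "distribution" more intuitively than the original grid

Phase 38→52 cumulative: 17 commits / 10 observability pages / DB 100% / 4 chart.js / all-page indicator.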

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 20:20:34 +08:00
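The four alert dimensions behind the topbar badge in commit 2a3ea6f581 can be sketched as a pure function over a snapshot of the four query results. The thresholds (30% error rate, 90% budget) come from the commit message; the `HealthSnapshot` type, the field names, and the shape of the returned dictionary are hypothetical and may differ from the real /observability/api/health_indicator response.

```python
from dataclasses import dataclass

ERROR_RATE_THRESHOLD = 0.30  # error rate over the past 1h >= 30%
BUDGET_THRESHOLD = 0.90      # any budget >= 90% used


@dataclass
class HealthSnapshot:
    """Hypothetical container for the results of the four queries."""
    hosts_down: int          # from host_health_probes
    pending_episodes: int    # from learning_episodes
    error_rate_1h: float     # from ai_calls, as a fraction 0..1
    max_budget_usage: float  # from ai_call_budgets, as a fraction 0..1


def health_indicator(snapshot: HealthSnapshot) -> dict:
    """Count one alert per triggered dimension and describe each one."""
    alerts = []
    if snapshot.hosts_down > 0:
        alerts.append(f"{snapshot.hosts_down} host(s) down")
    if snapshot.pending_episodes > 0:
        alerts.append(f"{snapshot.pending_episodes} episode(s) awaiting review")
    if snapshot.error_rate_1h >= ERROR_RATE_THRESHOLD:
        alerts.append(f"1h error rate {snapshot.error_rate_1h:.0%}")
    if snapshot.max_budget_usage >= BUDGET_THRESHOLD:
        alerts.append(f"budget usage {snapshot.max_budget_usage:.0%}")
    return {"alert_count": len(alerts), "alerts": alerts}
```

With a structure like this, the badge number is `alert_count` and the tooltip can list `alerts` directly; a count of zero corresponds to the green "normal" icon.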
OoO
e0a8d87c2c feat(p51): new RAG recall details page + 24h sparklines for the three hosts on overview
All checks were successful
CD Pipeline / deploy (push) Successful in 2m35s
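New page /observability/rag_queries: rounds out RAG observability depth.
Previously only caller-level hit rates were visible; now the actual content of each query can be inspected.

O-1: route + template
- Filters: time window (1/6/24/72/168h) / caller / saved_only flag
- 4 overall KPI cards: total queries / hit rate / saved_call rate / average feedback score
- by-caller table: query / hit / saved / feedback details per caller
- Detail table of the 50 most recent queries
- "View hits" button → opens a modal that loads a content preview from an ai_insights JOIN
  (new endpoint /observability/rag_queries/<id>/hits returns JSON)

O-2: entry points
- sidebar AI observability group gains "RAG recall details" (11b)
- /observability/overview entry cards upgraded to 9 items

O-3: overview 24h sparklines for the three hosts
- Each host card gets a 60px-high chart.js sparkline underneath
- Line: hourly uptime % buckets (0-100% Y axis hidden, purely visual)
- routes/admin_observability_routes.py::observability_overview
  adds a host_sparkline query (GROUP BY host_label, hour)
- Visual upgrade for the three host cards: previously just the text "100%", now with a trend line

Phase 38→51 cumulative: 16 commits / 10 observability pages.
The observability campaign goes from "raw stats" to a complete "visual grid UI".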

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 20:09:28 +08:00