OG T
|
890d438cdf
|
fix(group): 群組告警格式對齊 TelegramMessage 模板 + 修復 AI 討論觸發
CD Pipeline / build-and-deploy (push) Has been cancelled
- 群組告警改用 ═══ 分隔線格式,與個人 chat 一致
- 加入「OpenClaw 與 NemoClaw 正在分析中...」提示
- 加 group_msg_id 為空時的 warning log
- clawbot-v5 STANDBY_MODE: main.py 檢查條件修正
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 17:36:01 +08:00 |
|
OG T
|
c65ed5b1c9
|
feat(telegram): SRE 戰情室群組三頭政治 Triumvirate (ADR-053)
CD Pipeline / build-and-deploy (push) Successful in 7m6s
- config.py: 新增 OPENCLAW_BOT_TOKEN / NEMOTRON_BOT_TOKEN / SRE_GROUP_CHAT_ID
- telegram_gateway.py: send_to_group / send_as_openclaw / send_as_nemotron / trigger_group_ai_discussion / _send_approval_card_to_group
- send_approval_card 告警發送後非同步觸發群組 AI 雙向討論
- configmap: SRE_GROUP_CHAT_ID=-1003711974679
- secrets: OPENCLAW_BOT_TOKEN / NEMOTRON_BOT_TOKEN CHANGE_ME 佔位
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 17:16:05 +08:00 |
|
OG T
|
ff5a77f7a9
|
fix(telegram): 啟用 Polling + 修正 InfraAlertMessage 格式
CD Pipeline / build-and-deploy (push) Successful in 6m52s
1. TELEGRAM_ENABLE_POLLING: false→true
- clawbot-v5 已停止 polling (STANDBY_MODE)
- AWOOOI API 接管,統帥可與 OpenClaw/NemoClaw 雙 AI 對話
2. InfraAlertMessage.format() 加入 note 欄位
- NIM 慢屬正常不再顯示「自動修復失敗」
- 改為 💡 資訊性提示
3. NIM 探測端點改為 /v1/models (輕量,不觸發計費)
timeout: 10s → 25s (NIM 免費 tier 冷啟動)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 16:43:40 +08:00 |
|
OG T
|
15aabd6ac5
|
fix(chat+nim): 修復首席架構師 Review I1-I4 + S3 四項重要問題
CD Pipeline / build-and-deploy (push) Successful in 7m9s
I1: chat_manager._call_openclaw timeout=30.0 → 讀 settings.OPENCLAW_TIMEOUT
I2: nvidia_provider.py stale comment "45" → "55" 對齊 ConfigMap
I3: asyncio.shield 移除 — shield 超時後 task 繼續跑但無人等待 (silent leak)
I4: ChatManager.__init__ 移除 repo 實例 (leWOOOgo 禁 Service 持有 repository)
S3: _check_nemotron_health probe 10s → 25s + /v1/models 輕量端點
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 16:36:16 +08:00 |
|
OG T
|
be247d6c5c
|
fix(chat): OpenClaw timeout 30→40s,NemoClaw 50→60s
CD Pipeline / build-and-deploy (push) Successful in 6m51s
get_system_context() k8s/DB 查詢加上 _call_openclaw 30s,
總計超過外層 shield 30s 導致 OpenClaw 全部超時。
放寬 timeout 讓兩個 AI 有足夠時間回應。
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 16:27:08 +08:00 |
|
OG T
|
4284337249
|
fix(config): NEMOTRON_TIMEOUT_SECONDS 30→55 固化到 ConfigMap
CD Pipeline / build-and-deploy (push) Successful in 7m0s
NIM 免費 tier 延遲 11-45s,30s 硬編碼導致所有慢請求超時。
已同步 prod/dev ConfigMap,避免下次 CD 部署被覆蓋。
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 15:58:11 +08:00 |
|
OG T
|
ce945fe89e
|
rule(cost): 🔴🔴🔴 費用變更強制審批 — HARD_RULES v1.8 + CLAUDE.md
統帥指示 2026-04-03:
所有涉及費用產生的變更必須停下來等統帥明確批准後才可執行
新增:
- HARD_RULES.md v1.8: Cost Change Approval 章節
- 定義涉費變更範圍
- 強制流程: 識別→停→說明→等批准→執行
- 今日違規教訓記錄
- CLAUDE.md 任務前必讀新增費用變更條目
Memory 已同步:
- feedback_cost_change_approval.md (新建)
- feedback_constitution_v2.md 第五章
- MEMORY.md 索引最高鐵律區
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 15:36:47 +08:00 |
|
OG T
|
d8c9e29485
|
fix(heartbeat): 撤銷錯誤的 Nemotron 自動關閉邏輯
CD Pipeline / build-and-deploy (push) Successful in 6m53s
之前錯誤地在偵測到 Nemotron 慢時自動執行
ENABLE_NEMOTRON_COLLABORATION=false,
這等於自動關掉產品核心功能。
Nemotron NIM 免費 tier 延遲 11-45s 是已知特性(Memory 有記載),
不是需要自動修復的異常。
現在:偵測慢只發告警通知,不執行任何自動修復。
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 15:34:34 +08:00 |
|
OG T
|
1430b1283d
|
fix(chat+nvidia): 還原 OpenClaw+Nemotron 架構 + 修 30s timeout 根因
CD Pipeline / build-and-deploy (push) Has been cancelled
ChatManager 還原:
- OpenClaw (188:8088) 負責 RCA 仲裁,不改用 Gemini (未經批准)
- NemoClaw (NVIDIA NIM nemotron-mini-4b) 負責補充/評論
- 雙 AI 並行執行,OpenClaw 30s / NemoClaw 50s timeout
- 支援 @openclaw / @nemo 指定對象
nvidia_provider.py 修 timeout 根因:
- NVIDIA_TIMEOUT 從硬編碼 30.0 改為讀 NEMOTRON_TIMEOUT_SECONDS (45s)
- Memory 記載 NIM 免費 tier 延遲 11-45s,30s 硬編碼導致慢請求全超時
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 15:34:02 +08:00 |
|
OG T
|
d522c51deb
|
fix(infra-alert): Nemotron 異常告警套用標準模板 + 真正自動修復
CD Pipeline / build-and-deploy (push) Has been cancelled
1. 新增 InfraAlertMessage dataclass — 基礎設施異常的標準告警格式
(之前 Nemotron 告警是硬編碼文字,不走任何模板)
2. 偵測 Nemotron 異常時自動執行修復:
kubectl set env ENABLE_NEMOTRON_COLLABORATION=false
(之前只是把指令印在訊息裡,從未執行)
3. 告警顯示自動修復結果 (✅ 已自動修復 / ❌ 失敗)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 15:29:20 +08:00 |
|
OG T
|
e93ada0452
|
fix(chat): OpenClaw 改走 Gemini Flash,移除 Ollama 依賴
CD Pipeline / build-and-deploy (push) Successful in 7m18s
Ollama 188 完全卡死 (0 bytes/30s timeout),無法作為對話後端。
雙 AI 皆使用 Gemini Flash,靠不同 persona 和 temperature 區分:
- OpenClaw: temperature=0.5 (精準果斷)
- NemoClaw: temperature=0.9 (分析發散)
同時 kubectl set env ENABLE_NEMOTRON_COLLABORATION=false
停止每個 incident 白白等待 30s Nemotron timeout
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 15:20:23 +08:00 |
|
OG T
|
d9007e6855
|
feat(chat+monitor): 雙 AI 對話重寫 + Nemotron 健康監控告警
CD Pipeline / build-and-deploy (push) Successful in 6m56s
ChatManager 重寫 (Phase 22.6):
- @openclaw <msg> → 只有 OpenClaw 回應 (Ollama qwen2.5:7b)
- @nemo <msg> → 只有 NemoClaw 回應 (Gemini Flash)
- 無前綴 → OpenClaw 先答,NemoClaw 評論/反駁
NemoClaw 改用 Gemini Flash (棄 NIM nemotron-mini-4b 因為 15s+ 回應時間)
TelegramGateway 心跳新增 Nemotron 健康探測:
- 每次心跳探測 NVIDIA NIM API (10s timeout)
- 異常時立刻發 Telegram 告警 + 緩解指令
- 補足 Nemotron 100% 超時卻無告警的監控盲區
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 14:59:06 +08:00 |
|
OG T
|
c1834a7156
|
feat(kb+apm): KB Phase 2-A 自動萃取 + KB-D Markdown 詳情面板 + APM 趨勢圖
CD Pipeline / build-and-deploy (push) Successful in 7m28s
- KB-A: 新增 knowledge_extractor_service.py (Ollama llama3.2:3b 本地推理)
- KB-A: incident_service.py resolve hook (fire-and-forget asyncio.create_task)
- KB-D: 引入 react-markdown + remark-gfm,知識庫詳情面板 Markdown 渲染
- KB-D: 批准/封存按鈕串接 API (POST /knowledge/{id}/approve, PATCH status)
- KB-D: i18n 新增 approving/archiving 載入狀態文字
- APM: apm/page.tsx 整合 TimeSeriesChart sparkline (使用 trend[] 欄位)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 14:40:27 +08:00 |
|
OG T
|
7ff0c5c304
|
fix(i18n): MonitoringTools 硬編碼中文 → i18n keys + MTTR 趨勢改為真實計算
CD Pipeline / build-and-deploy (push) Has been cancelled
- MonitoringTools: 載入中/無法連線/觸發/正常/離線/版本/統計/更新 → useTranslations
- MTTR 趨勢: '↓2m' hardcode → 前半/後半 resolved incidents 真實比較
- zh-TW.json + en.json: 新增 connectionError/monitoringStatus.firing/metaVersion/metaStats/metaUpdatedAt
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 14:36:46 +08:00 |
|
OG T
|
778d3cc2e4
|
fix(metrics): Pod健康 extra row 對齊 figma-v2 — 改用 sub 小字取代紅色 badge
CD Pipeline / build-and-deploy (push) Successful in 6m48s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 14:12:48 +08:00 |
|
OG T
|
2e9845074e
|
fix(test): nvidia → openclaw_nemo 對齊 RATE_LIMITS/COST_LIMITS key (I3)
CD Pipeline / build-and-deploy (push) Successful in 6m57s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 14:00:21 +08:00 |
|
OG T
|
37eb17fc78
|
fix(layout): sidebar/header 對齊 — ml-[224px] + pt-[68px] 消除 32px 空隙
CD Pipeline / build-and-deploy (push) Failing after 48s
- ml-64(256px) → ml-[224px] 對齊 sidebar 實際寬度
- pt-16(64px) → pt-[68px] 對齊 header 實際高度
- calc(100vh-64px) → calc(100vh-68px)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 13:35:47 +08:00 |
|
OG T
|
dc232ebb49
|
docs: LOGBOOK 更新 — KB Phase 1 + monitoring + I1/I3 完成
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 13:22:54 +08:00 |
|
OG T
|
e60225ea29
|
fix(ai): I1+I3 — Redis TTL + openclaw_nemo 命名對齊
CD Pipeline / build-and-deploy (push) Failing after 36s
I1: ai_control.py 所有寫入 Redis 的 key 加入 30 天 TTL
防止 ai:control:* keys 永久累積造成記憶體洩漏
I3: ai_rate_limiter.py "nvidia" key → "openclaw_nemo"
對齊 Phase 24 AIProviderEnum,使 rate limit 正確作用
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 13:22:36 +08:00 |
|
OG T
|
e7b4f43b60
|
fix(knowledge): 路由改為無尾斜線避免 307 redirect
CD Pipeline / build-and-deploy (push) Successful in 6m49s
GET "" 代替 "/" 讓 /api/v1/knowledge 直接回應,
不再觸發 FastAPI trailing-slash 307 重導向。
此修正與 ProxyHeadersMiddleware 雙重保障。
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 12:55:18 +08:00 |
|
OG T
|
9cf9e851e7
|
fix(api): 修正 Nginx 反向代理 307 redirect http:// Location 問題
CD Pipeline / build-and-deploy (push) Has been cancelled
加入 ProxyHeadersMiddleware,讓 FastAPI 信任 X-Forwarded-Proto header。
解決知識庫頁面無法載入內容的問題 (HTTPS→HTTP mixed content block)。
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 12:48:36 +08:00 |
|
OG T
|
d1936d57e1
|
ci: force rebuild web — metrics trend fix
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 12:43:56 +08:00 |
|
OG T
|
b225c23ad8
|
fix(ai_router): DIAGNOSE/ALERT_TRIAGE 改用 llama3.2:3b 避免 90秒 timeout
CD Pipeline / build-and-deploy (push) Successful in 7m5s
qwen2.5:7b-instruct 在 prod 需要 >90s,導致 DIAGNOSE intent 全鏈路失敗。
llama3.2:3b (summary model) 實測 4s 回應,適合 triage 類快速判斷。
規則 3 新增特判: DIAGNOSE/ALERT_TRIAGE/QUERY → ollama summary model
不影響其他 intent 的 model 選擇邏輯。
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 12:32:01 +08:00 |
|
OG T
|
c290507878
|
fix(dashboard): metrics 完整對齊 figma-v2 — trend箭頭+value-row
CD Pipeline / build-and-deploy (push) Has been cancelled
- MetricItem 加 trend 欄位(value-row 右側箭頭,figma exact copy)
- 今日事件: value-row 顯示 ↑N 橘色
- 自動處置率: value-row 顯示 ↑N% 綠色
- MTTR均值: value-row 顯示 ↓2m 綠色
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 12:30:07 +08:00 |
|
OG T
|
6ae655d943
|
fix(dashboard): metrics strip 完整對齊 figma-v2
CD Pipeline / build-and-deploy (push) Successful in 6m44s
- background 改為 #fff(白色)
- padding 改為 8px 16px,min-width:120px
- divider 改為獨立元素(width:0.5px height:36px alignSelf:center)
- label font-size 改為 11px
- 移除 borderRight hack,使用獨立 divider
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 12:15:32 +08:00 |
|
OG T
|
59eaf5c51b
|
fix(sidebar): 從 top:68px 開始,不再蓋住 header brand area
CD Pipeline / build-and-deploy (push) Has been cancelled
sidebar 原本從 top:0 + 68px spacer 實作,z-index:40 > header:30
導致 sidebar 蓋住 header 左側 brand area (AwoooI logo 消失)
修復: 改為 top:68px bottom:0,完全在 header 下方
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 12:12:10 +08:00 |
|
OG T
|
8788cdaaa0
|
fix(dashboard): 修復 metrics strip 排版與數據問題
CD Pipeline / build-and-deploy (push) Successful in 6m50s
- 活躍事件:有 incident 時值改橘色,下方顯示 P0×N + P2×N badge
- 服務健康:固定 4 條橫條按比例顯示健康率
- 待處理授權:i18n 修正「待簽核」→「待處理授權」,badge 顯示「等待確認」
- 自動處置率:移除錯誤 sparkline 覆蓋,恢復綠色進度條
- 移除未使用的 errorRateMetric/rpsMetric
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 12:00:35 +08:00 |
|
OG T
|
cbe528b5c6
|
feat(ui): header/sidebar/openclaw 完整對齊 figma-v2
CD Pipeline / build-and-deploy (push) Successful in 6m57s
- 移除 OpenClaw "AWOOOI v1.0.0 | 正式環境" header
- 語言按鈕標籤改為 繁/EN (pill 樣式)
- header/sidebar 視覺對齊 figma-v2
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 11:36:38 +08:00 |
|
OG T
|
741a8f4917
|
feat(dashboard): 完整對齊 figma-v2 設計 — 重寫主頁
CD Pipeline / build-and-deploy (push) Successful in 6m42s
- Metrics strip 從 6 個擴展為 7 個指標,新增「今日事件」(含趨勢折線圖)
- 服務健康指標加入彩色進度條視覺 (4 格色塊)
- 自動處置率加入漸層進度條 (figma-v2 style)
- MTTR 均值加入趨勢折線圖
- 監控工具卡片全面升級為 figma-v2 設計:
左側 3px 彩色條 (Grafana=橘/Prometheus=紅/Sentry=紫/Langfuse=藍/SigNoz=藍/Gitea=綠)
clickable <a> 連結加 ↗ 開新視窗圖示
底部 meta 行顯示版本/統計/更新時間
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 01:09:41 +08:00 |
|
OG T
|
2dcbedd80f
|
fix(host-grid): 對齊 figma — 服務行去掉 port/描述,hostname 顯示末段 IP
CD Pipeline / build-and-deploy (push) Successful in 7m4s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 00:59:59 +08:00 |
|
OG T
|
702350925a
|
fix(monitoring+layout): 修復基礎架構消失 + 監控工具全線上
CD Pipeline / build-and-deploy (push) Successful in 6m47s
- page.tsx: 右側 panel overflow:hidden → overflowY:auto,基礎架構重新顯示
- page.tsx: 監控工具卡片對齊 figma (icon box + 版本/統計行 + ›箭頭)
- monitoring.py: Gitea probe 改用 /api/v1/version (/-/readiness 404)
- monitoring.py: Grafana dashboard count 加 Basic auth
- NetworkPolicy: 補開 3002/9090/3001 egress (Grafana/Prometheus/Gitea)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 00:50:53 +08:00 |
|
OG T
|
b6105b8214
|
fix(ai): 首席架構師審查修復 C1+C2 (Phase 24 C)
C1 — telegram_gateway.py Fail-Closed 白名單:
白名單為空時 'if whitelist and ...' 為 False → 任何人可執行 /ai
修復: 'if not whitelist or user_id not in whitelist' Fail-Closed
加入 whitelist_empty 欄位到 warning log
C2 — openclaw.py list comprehension await 語法錯誤:
Python 3.11 不支援 list comprehension 中使用 await
'if not await is_provider_disabled(p)' → SyntaxError
修復: 改為 for loop 明確 await
I4: 靜默 except 改為 logger.warning
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 00:42:02 +08:00 |
|
OG T
|
8bc086af58
|
feat(infra): 完整監控工具 + 主機服務清單 + K3s Cluster 突顯
CD Pipeline / build-and-deploy (push) Successful in 6m50s
監控工具 (6個):
- 加入 Grafana (110:3002), Sentry (110:9000), Langfuse (110:3100)
- 保留 Prometheus, SigNoz, Gitea
基礎架構:
- 靜態服務目錄 HOST_CATALOG:每台主機完整服務+Port+說明
- K3s Server #2 (121) 補靜態卡 (API 未回傳)
- K3s Cluster HA 獨立藍色區塊,☸ 標題 + VIP 資訊
- 所有服務含 Port 號與功能描述
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 00:36:59 +08:00 |
|
OG T
|
dbe71f82e3
|
feat(ai): Phase 24 C — Telegram /ai 動態控制 + Redis 狀態管理
CD Pipeline / build-and-deploy (push) Has been cancelled
新增 ai_control.py:
- /ai status: 所有 Provider 狀態 + 路由模式
- /ai router on/off: 動態切換 AIRouter (覆蓋 env var)
- /ai primary <provider>: 設定主要 Provider
- /ai enable/disable <provider>: 控制 Provider 啟停
- /ai cost: 費用統計
- 白名單: OPENCLAW_TG_USER_WHITELIST 保護
telegram_gateway.py:
- _handle_chat_message 加入 /ai 指令攔截路由
- 白名單未授權返回警告
openclaw.py:
- Redis 狀態覆蓋 env USE_AI_ROUTER (/ai router on/off 生效)
- Redis primary_provider 覆蓋路由決策 (/ai primary 生效)
- Redis disabled provider 過濾 (/ai disable 生效)
Redis Keys:
ai:control:use_router
ai:control:primary_provider
ai:control:disabled:<provider>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 00:34:14 +08:00 |
|
OG T
|
b4b3a457c5
|
refactor(openclaw): Phase 24 B4 — 封存舊 fallback Provider 方法
CD Pipeline / build-and-deploy (push) Has been cancelled
[ARCHIVED] _call_ollama / _call_gemini / _call_claude
- 這三個方法為 USE_AI_ROUTER=false 回滾保留路徑
- 新路徑: USE_AI_ROUTER=true → AIRouterExecutor (ai_router.py)
- 新 Provider: ai_providers/ollama.py / gemini.py / claude.py
- 封存而非刪除: 完整移除等 Phase 24 全驗收後 (ADR-052 D11)
R3 觀察結果 (通過 ✅):
- openclaw_nemo provider: 12/12 incidents 全部正確路由
- 信心度: 0.8~0.9 正常
- USE_AI_ROUTER=true 生效確認
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 00:29:56 +08:00 |
|
OG T
|
e1e89c521a
|
fix(frontend): 修復 compliance resolved_rate 百分比重複 ×100 + users executed_at→created_at
CD Pipeline / build-and-deploy (push) Has been cancelled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 00:28:22 +08:00 |
|
OG T
|
ce11fcdc3a
|
feat(monitoring): 監控工具區塊 — Grafana/Prometheus/SigNoz/Gitea 狀態
- 新增 GET /api/v1/monitoring/status,asyncio.gather 並行探測四工具
- 前端 MonitoringTools 元件,60s 輪詢顯示狀態/版本/統計
- 新增 monitoringTools i18n key
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 00:27:47 +08:00 |
|
OG T
|
30b7b10f01
|
feat(grafana): Wave D — AI監控 + 基礎設施 Dashboard (Grafana 188:3002)
新增 2 個 Dashboard,匯入既有 Nemotron Dashboard:
1. ai-monitoring.json — LLM + NVIDIA AI 監控
- LLM 呼叫速率 (req/min)
- LLM P99/P50 延遲
- Nemotron Tool Calling P99/P50 延遲
- LLM Cache 命中率 %
- LLM Fallback 次數
- Alert Chain 健康/最後成功時間
2. infra-monitoring.json — Node + K3s 基礎設施
- CPU/Memory 使用率
- K3s Pod 數量 (by namespace)
- K3s Pod 重啟次數
- Prometheus Targets UP/DOWN
- API 請求速率
3. nvidia-nemotron.json — 既有 18-panel Nemotron Dashboard (版控)
部署: 192.168.0.188:3002 (Grafana 12.4.1)
Provisioning: monitoring/grafana/provisioning/dashboards/
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 00:18:00 +08:00 |
|
OG T
|
cb0f92557d
|
feat(pages): 升級 5 個空殼頁面串接真實 API
CD Pipeline / build-and-deploy (push) Successful in 6m45s
- billing: /api/v1/audit-logs/stats (by operation/namespace)
- compliance: /api/v1/stats/incidents/summary + auto-repair/stats
- cost: /api/v1/stats/ai-performance (提案執行率/成功率)
- security: /api/v1/errors/stats + /errors/issues (Sentry BFF)
- users: /api/v1/audit-logs/stats + /audit-logs (操作稽核)
全部真實數據,無假頁面、無 mock data
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 00:11:27 +08:00 |
|
OG T
|
0b83707697
|
feat(web): APM/Apps/Deployments/Tickets 頁面升級 — 串接真實 API 數據
CD Pipeline / build-and-deploy (push) Has been cancelled
- apm/page.tsx: Golden Signals 真實數據 (SignOz ClickHouse)
- apps/page.tsx: 主機服務狀態 (/api/v1/dashboard 真實數據)
- deployments/page.tsx: K8s 部署狀態串接
- tickets/page.tsx: Incidents 列表串接
- i18n: apm/apps/deployments/tickets namespace 雙語補齊
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 00:08:11 +08:00 |
|
OG T
|
2253c1b74e
|
fix(layout): 修復主頁大空白 + Metrics Strip 右側溢出
CD Pipeline / build-and-deploy (push) Successful in 7m18s
E2E Health Check / e2e-health (push) Successful in 18s
新增 AppLayout fullBleed prop,主頁 opt-out p-6 包裝,
移除 page.tsx 的 margin: '-24px' hack。
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-02 23:58:48 +08:00 |
|
OG T
|
e93a50a4b4
|
feat(pages): 全部 ComingSoon 頁面升級為真實 UI — 串接真實 API / 空狀態頁面
CD Pipeline / build-and-deploy (push) Successful in 6m47s
- services/topology: 串接 /api/v1/dashboard,顯示服務清單表格與主機拓撲卡片 grid
- notifications: 串接 /api/v1/notifications/channels,404 時顯示空列表
- reports: 串接 /api/v1/stats/incident-summary + /api/v1/stats/resolution-stats,顯示統計卡片
- apm: 乾淨空狀態頁(SignOz 待整合)
- apps/tickets/users/deployments: 空列表表格結構
- billing/compliance/cost/security: 空狀態卡片結構
- help: 靜態系統版本資訊頁
- zh-TW.json + en.json: 新增所有頁面 i18n key(零 hardcode 字串)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-02 23:49:24 +08:00 |
|
OG T
|
6266a4fc01
|
fix(test): 更新 AIProviderEnum 測試 — NVIDIA → NEMOTRON (Phase 24 B3)
CD Pipeline / build-and-deploy (push) Successful in 7m6s
- test_nvidia_provider_in_router: 改為驗證 NEMOTRON enum
- test_tool_calling_route: 改為期望 NEMOTRON provider
- test_existing_routing_not_affected: 排除 NEMOTRON (非一般路由)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-02 23:39:46 +08:00 |
|
OG T
|
e9a1ac6276
|
fix(ui): 對齊 figma-v2 設計稿 — IncidentCard + OpenClawPanel 視覺修正
CD Pipeline / build-and-deploy (push) Failing after 35s
IncidentCard:
- 背景 #fff、圓角 12px、頂邊條 4px(對齊設計稿)
- P1 嚴重度色修正為 #F59E0B(amber,非 orange)
- Severity badge 改為 4px 圓角 uppercase 樣式
- Impact 指標列移除灰底方塊,改為細邊框分隔線
- AI 提案按鈕改為全寬居中橙色風格
OpenClawPanel:
- 移除多餘 rounded-xl/backdrop/border(由父層卡片容器提供)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-02 23:36:59 +08:00 |
|
OG T
|
97d86861ed
|
fix(ai_router): C1 修復 — AIProviderEnum 對齊 Registry 實際 Provider 名稱
CD Pipeline / build-and-deploy (push) Failing after 37s
問題: AIProviderEnum.NVIDIA = "nvidia" 在 Registry 無對應 Provider
OpenClawNemoProvider.name = "openclaw_nemo"
NemotronProvider.name = "nemotron"
→ 高複雜度/Tool Calling 路由永遠 skip,靜默 fallback 到 Gemini/Ollama
修復:
- 新增 OPENCLAW_NEMO = "openclaw_nemo" (一般推理, via .188 → NVIDIA NIM)
- 新增 NEMOTRON = "nemotron" (Tool Calling, direct NVIDIA NIM)
- 移除 NVIDIA = "nvidia" (Registry 無對應)
- 規則 4 (複雜度>=4/HIGH風險): NVIDIA → OPENCLAW_NEMO
- route_tool_calling: NVIDIA → NEMOTRON
- Rate Limiter check: "nvidia" → "openclaw_nemo"
- _full_fallback_chain: OPENCLAW_NEMO 首位
- _tool_calling_fallback_chain: NEMOTRON 首位
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-02 23:31:31 +08:00 |
|
OG T
|
a3f02888a1
|
feat(ui): 加入 chibi 龍蝦游泳列 + 主頁卡片式佈局對齊設計稿
CD Pipeline / build-and-deploy (push) Has been cancelled
- Metrics Strip 頂部加入龍蝦游泳動畫列
- 主體 Feed 和 Right Panel 改為圓角卡片式(背景白/陰影)
- Section header 加入橘點裝飾,對齊 figma-v2 設計稿
- 所有資料串接真實 API,無假資料
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-02 23:31:01 +08:00 |
|
OG T
|
ef5b1ab85a
|
fix(knowledge-base): 串接 NEXT_PUBLIC_API_URL 取代相對路徑
CD Pipeline / build-and-deploy (push) Successful in 7m6s
- /api/v1/knowledge 改用 process.env.NEXT_PUBLIC_API_URL 前綴
- 確保 Docker build 後能正確連到後端 API,不再打到 Next.js app server
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-02 23:19:14 +08:00 |
|
OG T
|
2d87eca5f6
|
fix(ci): 移除 e2e-health push 觸發 — 根治「每 commit 兩個 run」問題
CD Pipeline / build-and-deploy (push) Has been cancelled
根本原因:
cd.yaml + e2e-health.yaml 都監聽 push main
→ 每次 push 產生兩個 run,互相 cancel,code commit 被跳過
解法:
e2e-health.yaml 移除 push trigger,只保留排程(每日00:00)和手動觸發
CD 本身已有 smoke test,E2E 不需要每次 push 重複跑
Co-Authored-By: Claude Code <noreply@anthropic.com>
|
2026-04-02 23:17:31 +08:00 |
|
OG T
|
cde61b06ae
|
fix(ci): CD 改搶佔模式 — cancel-in-progress: true
CD Pipeline / build-and-deploy (push) Has been cancelled
E2E Health Check / e2e-health (push) Successful in 17s
問題: 多個 commit 快速推版時排隊堆積;docker build 卡住阻塞整條 queue
根因: cancel-in-progress:false 讓每個 commit 都排隊等,新的無法取消舊的
修復: cancel-in-progress:true — 新 push 立即取消舊 build,只部署最新 commit
安全: concurrency group 保證同時只有一個 job 跑,kubectl rollout status 防半部署
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-02 23:16:24 +08:00 |
|
OG T
|
1e1d7e34cd
|
fix(ci): 加入 timeout-minutes:45 防止 CD job 無限卡住
CD Pipeline / build-and-deploy (push) Waiting to run
E2E Health Check / e2e-health (push) Successful in 18s
問題: task 288 卡住 71 分鐘 (docker build/push Harbor 網路問題)
影響: 後續 task 排隊無法執行
修復: job 超過 45 分鐘自動 fail,下次 push 重新觸發
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-02 23:15:05 +08:00 |
|