OG T
|
890d438cdf
|
fix(group): 群組告警格式對齊 TelegramMessage 模板 + 修復 AI 討論觸發
CD Pipeline / build-and-deploy (push) Has been cancelled
- 群組告警改用 ═══ 分隔線格式,與個人 chat 一致
- 加入「OpenClaw 與 NemoClaw 正在分析中...」提示
- 加 group_msg_id 為空時的 warning log
- clawbot-v5 STANDBY_MODE: main.py 檢查條件修正
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 17:36:01 +08:00 |
|
OG T
|
c65ed5b1c9
|
feat(telegram): SRE 戰情室群組三頭政治 Triumvirate (ADR-053)
CD Pipeline / build-and-deploy (push) Successful in 7m6s
- config.py: 新增 OPENCLAW_BOT_TOKEN / NEMOTRON_BOT_TOKEN / SRE_GROUP_CHAT_ID
- telegram_gateway.py: send_to_group / send_as_openclaw / send_as_nemotron / trigger_group_ai_discussion / _send_approval_card_to_group
- send_approval_card 告警發送後非同步觸發群組 AI 雙向討論
- configmap: SRE_GROUP_CHAT_ID=-1003711974679
- secrets: OPENCLAW_BOT_TOKEN / NEMOTRON_BOT_TOKEN CHANGE_ME 佔位
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 17:16:05 +08:00 |
|
OG T
|
ff5a77f7a9
|
fix(telegram): 啟用 Polling + 修正 InfraAlertMessage 格式
CD Pipeline / build-and-deploy (push) Successful in 6m52s
1. TELEGRAM_ENABLE_POLLING: false→true
- clawbot-v5 已停止 polling (STANDBY_MODE)
- AWOOOI API 接管,統帥可與 OpenClaw/NemoClaw 雙 AI 對話
2. InfraAlertMessage.format() 加入 note 欄位
- NIM 慢屬正常不再顯示「自動修復失敗」
- 改為 💡 資訊性提示
3. NIM 探測端點改為 /v1/models (輕量,不觸發計費)
timeout: 10s → 25s (NIM 免費 tier 冷啟動)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 16:43:40 +08:00 |
|
OG T
|
15aabd6ac5
|
fix(chat+nim): 修復首席架構師 Review I1-I4 + S3 四項重要問題
CD Pipeline / build-and-deploy (push) Successful in 7m9s
I1: chat_manager._call_openclaw timeout=30.0 → 讀 settings.OPENCLAW_TIMEOUT
I2: nvidia_provider.py stale comment "45" → "55" 對齊 ConfigMap
I3: asyncio.shield 移除 — shield 超時後 task 繼續跑但無人等待 (silent leak)
I4: ChatManager.__init__ 移除 repo 實例 (leWOOOgo 禁 Service 持有 repository)
S3: _check_nemotron_health probe 10s → 25s + /v1/models 輕量端點
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 16:36:16 +08:00 |
|
OG T
|
be247d6c5c
|
fix(chat): OpenClaw timeout 30→40s,NemoClaw 50→60s
CD Pipeline / build-and-deploy (push) Successful in 6m51s
get_system_context() k8s/DB 查詢加上 _call_openclaw 30s,
總計超過外層 shield 30s 導致 OpenClaw 全部超時。
放寬 timeout 讓兩個 AI 有足夠時間回應。
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 16:27:08 +08:00 |
|
OG T
|
d8c9e29485
|
fix(heartbeat): 撤銷錯誤的 Nemotron 自動關閉邏輯
CD Pipeline / build-and-deploy (push) Successful in 6m53s
之前錯誤地在偵測到 Nemotron 慢時自動執行
ENABLE_NEMOTRON_COLLABORATION=false,
這等於自動關掉產品核心功能。
Nemotron NIM 免費 tier 延遲 11-45s 是已知特性(Memory 有記載),
不是需要自動修復的異常。
現在:偵測慢只發告警通知,不執行任何自動修復。
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 15:34:34 +08:00 |
|
OG T
|
1430b1283d
|
fix(chat+nvidia): 還原 OpenClaw+Nemotron 架構 + 修 30s timeout 根因
CD Pipeline / build-and-deploy (push) Has been cancelled
ChatManager 還原:
- OpenClaw (188:8088) 負責 RCA 仲裁,不改用 Gemini (未經批准)
- NemoClaw (NVIDIA NIM nemotron-mini-4b) 負責補充/評論
- 雙 AI 並行執行,OpenClaw 30s / NemoClaw 50s timeout
- 支援 @openclaw / @nemo 指定對象
nvidia_provider.py 修 timeout 根因:
- NVIDIA_TIMEOUT 從硬編碼 30.0 改為讀 NEMOTRON_TIMEOUT_SECONDS (45s)
- Memory 記載 NIM 免費 tier 延遲 11-45s,30s 硬編碼導致慢請求全超時
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 15:34:02 +08:00 |
|
OG T
|
d522c51deb
|
fix(infra-alert): Nemotron 異常告警套用標準模板 + 真正自動修復
CD Pipeline / build-and-deploy (push) Has been cancelled
1. 新增 InfraAlertMessage dataclass — 基礎設施異常的標準告警格式
(之前 Nemotron 告警是硬編碼文字,不走任何模板)
2. 偵測 Nemotron 異常時自動執行修復:
kubectl set env ENABLE_NEMOTRON_COLLABORATION=false
(之前只是把指令印在訊息裡,從未執行)
3. 告警顯示自動修復結果 (✅ 已自動修復 / ❌ 失敗)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 15:29:20 +08:00 |
|
OG T
|
e93ada0452
|
fix(chat): OpenClaw 改走 Gemini Flash,移除 Ollama 依賴
CD Pipeline / build-and-deploy (push) Successful in 7m18s
Ollama 188 完全卡死 (0 bytes/30s timeout),無法作為對話後端。
雙 AI 皆使用 Gemini Flash,靠不同 persona 和 temperature 區分:
- OpenClaw: temperature=0.5 (精準果斷)
- NemoClaw: temperature=0.9 (分析發散)
同時 kubectl set env ENABLE_NEMOTRON_COLLABORATION=false
停止每個 incident 白白等待 30s Nemotron timeout
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 15:20:23 +08:00 |
|
OG T
|
d9007e6855
|
feat(chat+monitor): 雙 AI 對話重寫 + Nemotron 健康監控告警
CD Pipeline / build-and-deploy (push) Successful in 6m56s
ChatManager 重寫 (Phase 22.6):
- @openclaw <msg> → 只有 OpenClaw 回應 (Ollama qwen2.5:7b)
- @nemo <msg> → 只有 NemoClaw 回應 (Gemini Flash)
- 無前綴 → OpenClaw 先答,NemoClaw 評論/反駁
NemoClaw 改用 Gemini Flash (棄 NIM nemotron-mini-4b 因為 15s+ 回應時間)
TelegramGateway 心跳新增 Nemotron 健康探測:
- 每次心跳探測 NVIDIA NIM API (10s timeout)
- 異常時立刻發 Telegram 告警 + 緩解指令
- 補足 Nemotron 100% 超時卻無告警的監控盲區
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 14:59:06 +08:00 |
|
OG T
|
c1834a7156
|
feat(kb+apm): KB Phase 2-A 自動萃取 + KB-D Markdown 詳情面板 + APM 趨勢圖
CD Pipeline / build-and-deploy (push) Successful in 7m28s
- KB-A: 新增 knowledge_extractor_service.py (Ollama llama3.2:3b 本地推理)
- KB-A: incident_service.py resolve hook (fire-and-forget asyncio.create_task)
- KB-D: 引入 react-markdown + remark-gfm,知識庫詳情面板 Markdown 渲染
- KB-D: 批准/封存按鈕串接 API (POST /knowledge/{id}/approve, PATCH status)
- KB-D: i18n 新增 approving/archiving 載入狀態文字
- APM: apm/page.tsx 整合 TimeSeriesChart sparkline (使用 trend[] 欄位)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 14:40:27 +08:00 |
|
OG T
|
2e9845074e
|
fix(test): nvidia → openclaw_nemo 對齊 RATE_LIMITS/COST_LIMITS key (I3)
CD Pipeline / build-and-deploy (push) Successful in 6m57s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 14:00:21 +08:00 |
|
OG T
|
e60225ea29
|
fix(ai): I1+I3 — Redis TTL + openclaw_nemo 命名對齊
CD Pipeline / build-and-deploy (push) Failing after 36s
I1: ai_control.py 所有寫入 Redis 的 key 加入 30 天 TTL
防止 ai:control:* keys 永久累積造成記憶體洩漏
I3: ai_rate_limiter.py "nvidia" key → "openclaw_nemo"
對齊 Phase 24 AIProviderEnum,使 rate limit 正確作用
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 13:22:36 +08:00 |
|
OG T
|
e7b4f43b60
|
fix(knowledge): 路由改為無尾斜線避免 307 redirect
CD Pipeline / build-and-deploy (push) Successful in 6m49s
GET "" 代替 "/" 讓 /api/v1/knowledge 直接回應,
不再觸發 FastAPI trailing-slash 307 重導向。
此修正與 ProxyHeadersMiddleware 雙重保障。
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 12:55:18 +08:00 |
|
OG T
|
9cf9e851e7
|
fix(api): 修正 Nginx 反向代理 307 redirect http:// Location 問題
CD Pipeline / build-and-deploy (push) Has been cancelled
加入 ProxyHeadersMiddleware,讓 FastAPI 信任 X-Forwarded-Proto header。
解決知識庫頁面無法載入內容的問題 (HTTPS→HTTP mixed content block)。
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 12:48:36 +08:00 |
|
OG T
|
b225c23ad8
|
fix(ai_router): DIAGNOSE/ALERT_TRIAGE 改用 llama3.2:3b 避免 90秒 timeout
CD Pipeline / build-and-deploy (push) Successful in 7m5s
qwen2.5:7b-instruct 在 prod 需要 >90s,導致 DIAGNOSE intent 全鏈路失敗。
llama3.2:3b (summary model) 實測 4s 回應,適合 triage 類快速判斷。
規則 3 新增特判: DIAGNOSE/ALERT_TRIAGE/QUERY → ollama summary model
不影響其他 intent 的 model 選擇邏輯。
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 12:32:01 +08:00 |
|
OG T
|
702350925a
|
fix(monitoring+layout): 修復基礎架構消失 + 監控工具全線上
CD Pipeline / build-and-deploy (push) Successful in 6m47s
- page.tsx: 右側 panel overflow:hidden → overflowY:auto,基礎架構重新顯示
- page.tsx: 監控工具卡片對齊 figma (icon box + 版本/統計行 + ›箭頭)
- monitoring.py: Gitea probe 改用 /api/v1/version (/-/readiness 404)
- monitoring.py: Grafana dashboard count 加 Basic auth
- NetworkPolicy: 補開 3002/9090/3001 egress (Grafana/Prometheus/Gitea)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 00:50:53 +08:00 |
|
OG T
|
b6105b8214
|
fix(ai): 首席架構師審查修復 C1+C2 (Phase 24 C)
C1 — telegram_gateway.py Fail-Closed 白名單:
白名單為空時 'if whitelist and ...' 為 False → 任何人可執行 /ai
修復: 'if not whitelist or user_id not in whitelist' Fail-Closed
加入 whitelist_empty 欄位到 warning log
C2 — openclaw.py list comprehension await 語法錯誤:
Python 3.11 不支援 list comprehension 中使用 await
'if not await is_provider_disabled(p)' → SyntaxError
修復: 改為 for loop 明確 await
I4: 靜默 except 改為 logger.warning
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 00:42:02 +08:00 |
|
OG T
|
8bc086af58
|
feat(infra): 完整監控工具 + 主機服務清單 + K3s Cluster 突顯
CD Pipeline / build-and-deploy (push) Successful in 6m50s
監控工具 (6個):
- 加入 Grafana (110:3002), Sentry (110:9000), Langfuse (110:3100)
- 保留 Prometheus, SigNoz, Gitea
基礎架構:
- 靜態服務目錄 HOST_CATALOG:每台主機完整服務+Port+說明
- K3s Server #2 (121) 補靜態卡 (API 未回傳)
- K3s Cluster HA 獨立藍色區塊,☸ 標題 + VIP 資訊
- 所有服務含 Port 號與功能描述
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 00:36:59 +08:00 |
|
OG T
|
dbe71f82e3
|
feat(ai): Phase 24 C — Telegram /ai 動態控制 + Redis 狀態管理
CD Pipeline / build-and-deploy (push) Has been cancelled
新增 ai_control.py:
- /ai status: 所有 Provider 狀態 + 路由模式
- /ai router on/off: 動態切換 AIRouter (覆蓋 env var)
- /ai primary <provider>: 設定主要 Provider
- /ai enable/disable <provider>: 控制 Provider 啟停
- /ai cost: 費用統計
- 白名單: OPENCLAW_TG_USER_WHITELIST 保護
telegram_gateway.py:
- _handle_chat_message 加入 /ai 指令攔截路由
- 白名單未授權返回警告
openclaw.py:
- Redis 狀態覆蓋 env USE_AI_ROUTER (/ai router on/off 生效)
- Redis primary_provider 覆蓋路由決策 (/ai primary 生效)
- Redis disabled provider 過濾 (/ai disable 生效)
Redis Keys:
ai:control:use_router
ai:control:primary_provider
ai:control:disabled:<provider>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 00:34:14 +08:00 |
|
OG T
|
b4b3a457c5
|
refactor(openclaw): Phase 24 B4 — 封存舊 fallback Provider 方法
CD Pipeline / build-and-deploy (push) Has been cancelled
[ARCHIVED] _call_ollama / _call_gemini / _call_claude
- 這三個方法為 USE_AI_ROUTER=false 回滾保留路徑
- 新路徑: USE_AI_ROUTER=true → AIRouterExecutor (ai_router.py)
- 新 Provider: ai_providers/ollama.py / gemini.py / claude.py
- 封存而非刪除: 完整移除等 Phase 24 全驗收後 (ADR-052 D11)
R3 觀察結果 (通過 ✅):
- openclaw_nemo provider: 12/12 incidents 全部正確路由
- 信心度: 0.8~0.9 正常
- USE_AI_ROUTER=true 生效確認
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 00:29:56 +08:00 |
|
OG T
|
ce11fcdc3a
|
feat(monitoring): 監控工具區塊 — Grafana/Prometheus/SigNoz/Gitea 狀態
- 新增 GET /api/v1/monitoring/status,asyncio.gather 並行探測四工具
- 前端 MonitoringTools 元件,60s 輪詢顯示狀態/版本/統計
- 新增 monitoringTools i18n key
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 00:27:47 +08:00 |
|
OG T
|
6266a4fc01
|
fix(test): 更新 AIProviderEnum 測試 — NVIDIA → NEMOTRON (Phase 24 B3)
CD Pipeline / build-and-deploy (push) Successful in 7m6s
- test_nvidia_provider_in_router: 改為驗證 NEMOTRON enum
- test_tool_calling_route: 改為期望 NEMOTRON provider
- test_existing_routing_not_affected: 排除 NEMOTRON (非一般路由)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-02 23:39:46 +08:00 |
|
OG T
|
97d86861ed
|
fix(ai_router): C1 修復 — AIProviderEnum 對齊 Registry 實際 Provider 名稱
CD Pipeline / build-and-deploy (push) Failing after 37s
問題: AIProviderEnum.NVIDIA = "nvidia" 在 Registry 無對應 Provider
OpenClawNemoProvider.name = "openclaw_nemo"
NemotronProvider.name = "nemotron"
→ 高複雜度/Tool Calling 路由永遠 skip,靜默 fallback 到 Gemini/Ollama
修復:
- 新增 OPENCLAW_NEMO = "openclaw_nemo" (一般推理, via .188 → NVIDIA NIM)
- 新增 NEMOTRON = "nemotron" (Tool Calling, direct NVIDIA NIM)
- 移除 NVIDIA = "nvidia" (Registry 無對應)
- 規則 4 (複雜度>=4/HIGH風險): NVIDIA → OPENCLAW_NEMO
- route_tool_calling: NVIDIA → NEMOTRON
- Rate Limiter check: "nvidia" → "openclaw_nemo"
- _full_fallback_chain: OPENCLAW_NEMO 首位
- _tool_calling_fallback_chain: NEMOTRON 首位
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-02 23:31:31 +08:00 |
|
OG T
|
58002e6bf4
|
feat(phase24-b3): NemotronProvider 抽取 + incident-card 重構
E2E Health Check / e2e-health (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
Phase 24 B3:
- 新增 ai_providers/nemotron.py: NemotronProvider 封裝 K8s Tool Calling
搬移自 openclaw.py _call_nemotron_tools (L1623-1785)
capabilities=tool_calling, privacy_level=cloud
- ai_router.py: 加入 NemotronProvider 到 Registry
- ai_providers/__init__.py: 匯出 NemotronProvider
Phase R-UI2 (架構師 Warning):
- incident-card.tsx: 抽取 useApprovalAction hook
handleApprove/handleReject 60行重複邏輯 → 共用 hook
行為完全不變,維護性提升
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-02 23:12:42 +08:00 |
|
OG T
|
5a8aae89c4
|
fix(phase24): 首席架構師 Review C1/C2/C3/I4 修復
CD Pipeline / build-and-deploy (push) Successful in 7m12s
E2E Health Check / e2e-health (push) Successful in 18s
C1 (P0): AIRouterExecutor.execute() 補 Langfuse Trace (D5)
- 建立 langfuse_trace("ai_router_execute") 包住整個執行鏈
- 成功時記錄 generation (model/input/output/tokens/cost)
- prod 所有 AI 呼叫現在有 LLMOps 追蹤
C2 (P0): 絞殺者改為呼叫 AIRouter.route() 智慧路由
- 先取得 RoutingDecision (意圖分類 + 複雜度評分)
- provider_order 從 selected_provider + fallback_chain 動態生成
- D1 意圖路由矩陣、D7 隱私保護 (DIAGNOSE 強制 local) 生效
C3 (P1): 型別標注 typo 修復
- AIProviderEnumEnum → AIProviderEnum
- AIProviderEnumProtocol → AIProviderProtocol
I4 (P1): interfaces.py AIProvider Protocol 補 close() 定義
S1: ai_router.py 模組版本標頭更新至 v4.0
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-02 21:47:06 +08:00 |
|
OG T
|
04978995c1
|
fix(metrics): 實際呼叫 record_alert_chain_success (Wave A.5)
CD Pipeline / build-and-deploy (push) Successful in 6m47s
E2E Health Check / e2e-health (push) Successful in 17s
alert_chain_last_success_timestamp 指標已定義但從未被 set。
在 alertmanager_webhook 兩個主要成功路徑呼叫 record_alert_chain_success():
- CI/CD 告警成功處理後
- LLM 分析完成後
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-02 20:10:58 +08:00 |
|
OG T
|
f5b8738185
|
fix(wave-a): Wave A 告警鏈路驗收修復
CD Pipeline / build-and-deploy (push) Has been cancelled
E2E Health Check / e2e-health (push) Has been cancelled
- sentry_webhook: 加入 GET /health endpoint (smoke test 探測用)
- smoke_test: alertmanager 路徑改為 /webhooks/health (已存在)
- smoke_test: Prometheus URL 改為正確的 110:9090
- smoke_test: Alert chain metric 標記 critical=False (初始化期正常)
Wave A.6 smoke test 現在 6/8 → 7/8 checks pass (sentry health deploy 後 8/8)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-02 20:08:26 +08:00 |
|
OG T
|
5a7919f55c
|
fix(test): AIProvider → AIProviderEnum (Phase 24 C1 rename fix)
CD Pipeline / build-and-deploy (push) Successful in 7m11s
E2E Health Check / e2e-health (push) Successful in 16s
C1 修復 (3ad7b60) 重命名 AIProvider Enum 為 AIProviderEnum
test_nvidia_provider.py 未同步更新,導致 CD 測試失敗。
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-02 19:38:04 +08:00 |
|
OG T
|
3ad7b60f68
|
fix(ai): Phase 24 R1+R2 首席架構師 Review 修復 (C1-C3 + I1-I5)
E2E Health Check / e2e-health (push) Successful in 18s
CD Pipeline / build-and-deploy (push) Has been cancelled
Critical 修復:
- C1: AIProvider Enum 改名為 AIProviderEnum (避免與 Protocol 同名衝突)
- C2: 共用 Circuit Breaker → per-provider _SimpleCircuitBreaker
(避免 Gemini 掛掉時 Ollama 也被擋)
- C3: cache_key 移到 try 外面 (避免 UnboundLocalError)
Important 修復:
- I1: Claude hardcode model → 用 get_model_registry()
- I2: Claude 追蹤 tokens/cost (input_tokens + output_tokens)
- I3: Ollama 追蹤 tokens (eval_count + prompt_eval_count)
- I4: Gemini temperature → 用 model_registry
- I5: AIProviderRegistry.close_all() shutdown hook
2026-04-02 ogt: Phase 24 首席架構師審查通過後修復
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-02 13:40:58 +08:00 |
|
OG T
|
73e8f8ab77
|
feat(ai): Phase 24-A+B1 — AI Provider Registry + 絞殺者包裝 (ADR-052)
E2E Health Check / e2e-health (push) Successful in 16s
CD Pipeline / build-and-deploy (push) Has been cancelled
Brain Layer 雙軌 Registry 架構:
- 新建 src/services/ai_providers/ 目錄 (interfaces + 4 providers)
- OllamaProvider (local, rca/chat/code_review)
- GeminiProvider (cloud, rca/chat)
- ClaudeProvider (cloud, rca/chat/code_review)
- OpenClawNemoProvider (cloud, rca — 委派 188→NIM)
- 擴展 ai_router.py 加入:
- AIProviderRegistry (動態註冊/啟停)
- AIRouterExecutor (Cache + 閘門 CB/RL/Sem + 執行)
- openclaw.py 絞殺者包裝: USE_AI_ROUTER=true 走新路徑
- config.py + ConfigMap 加入 USE_AI_ROUTER=false (安全預設)
- ADR-052 正式文件 (14 項決策 D1-D14)
- HARD_RULES v1.7 加入 AI Router 規範
安全: USE_AI_ROUTER=false 預設不啟用,需手動開啟觀察
回滾: kubectl set env deployment/awoooi-api USE_AI_ROUTER=false
2026-04-02 ogt: Phase 24 首批實作
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-02 13:16:09 +08:00 |
|
OG T
|
05cd9cbab4
|
fix(web): 驗收報告 6 個問題修復
E2E Health Check / e2e-health (push) Successful in 17s
CD Pipeline / build-and-deploy (push) Has been cancelled
1. [Medium] Metrics Strip [object Object] — 移除 pendingApprovals 陣列直接渲染
+ label 硬編碼改 i18n (activeIncidents/serviceHealth/todayIncidents 等)
2. [Low] KB GET /{id} 不過濾 archived — get_by_id 加 status != ARCHIVED
3. [Low] favicon.ico 404 — 新增 NemoClaw SVG favicon + layout metadata
4. [Medium] auto-repair console errors — fetchEval 加 try-catch 靜默處理
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-02 10:30:43 +08:00 |
|
OG T
|
db1aed81d9
|
fix(db): C1 時區統一遷移 — utc_now → taipei_now (全 5 table)
E2E Health Check / e2e-health (push) Successful in 18s
CD Pipeline / build-and-deploy (push) Has been cancelled
🔴 首席架構師審查 C1: 全系統禁止 UTC,必須台北時區 +8
- utc_now() → taipei_now() (調用 src.utils.timezone.now_taipei)
- 影響: ApprovalRecord, TimelineEvent, AuditLog, IncidentRecord, KnowledgeEntryRecord
- 13 處 default/onupdate 全部替換
- 移除 datetime.UTC import
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-02 09:13:36 +08:00 |
|
OG T
|
628387de8c
|
fix: risklevel migration 自動化 + Telegram Whitelist 注入
E2E Health Check / e2e-health (push) Successful in 17s
CD Pipeline / build-and-deploy (push) Has been cancelled
1. init_db() 啟動時自動確保 risklevel enum 包含 'high' 值
(Phase 23 新增,避免舊 DB 缺值導致 InvalidTextRepresentation)
2. CD Pipeline 新增 OPENCLAW_TG_USER_WHITELIST 自動注入
(之前為 CHANGE_ME,已更新為實際 user ID 5619078117)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-02 09:13:13 +08:00 |
|
OG T
|
d2bad44173
|
fix(api): KB 架構審查修復 I3-I5
E2E Health Check / e2e-health (push) Successful in 17s
CD Pipeline / build-and-deploy (push) Has been cancelled
- I3: Service 層加 IKnowledgeRepository Protocol 型別標注
- I4: search 方法加入 tags JSONB 搜尋 (cast→String→ilike)
- I5: get_categories 獨立方法,不再繞道 list_entries(limit=0)
首席架構師審查 87/100 → 全部 Important issues 已修復
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-02 09:05:54 +08:00 |
|
OG T
|
48a0bc66f7
|
fix(api): KB 首席架構師審查修復 (I1 tags filter + I2 type annotation)
E2E Health Check / e2e-health (push) Successful in 16s
CD Pipeline / build-and-deploy (push) Has been cancelled
- I1: Repository list_entries 實作 tags JSONB @> 篩選 (之前聲明未實作)
- I2: ORM tags 型別從 list[dict[str, Any]] 修正為 list[str]
首席架構師審查: 87/100
C1 時區(UTC→Taipei) 為既有系統性問題,另開 task 統一遷移
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-02 09:04:41 +08:00 |
|
OG T
|
e17248fd10
|
fix: 首席架構師審查修復 — i18n/CD/時區/死碼清理
E2E Health Check / e2e-health (push) Successful in 16s
CD Pipeline / build-and-deploy (push) Has been cancelled
P0 前端 i18n 合規 (6 檔案):
- settings/page.tsx: 全面改用 useTranslations('settings')
- auto-repair/page.tsx: 30+ 處硬編碼改用 t('autoRepair.*')
- sidebar.tsx: sectionLabel 改用 tSection(),aria-label 國際化
- openclaw-panel.tsx: STATUS_MESSAGES 改用 tPanel(),Production 改用 tBrand
- alerts/page.tsx: StatPill label 改用 t('incident.severity.*')
P1 CD Pipeline:
- cd.yaml: runs-on 改 self-hosted (ADR-039)
- Telegram Secret 注入失敗改為 exit 1 (ADR-035)
- kubectl patch op:replace → op:add (首次部署相容)
P2 後端:
- langfuse_client.py: 移除 v4.x 死碼分支 (SDK 鎖定 <3.0.0)
- ai.py: 標記 TODO(R4) Router 瘦身
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-02 09:02:41 +08:00 |
|
OG T
|
d32d84efce
|
feat(telegram): 接通 Phase 22 Nemotron 雙軌顯示 (ADR-044)
CD Pipeline / build-and-deploy (push) Has been cancelled
E2E Health Check / e2e-health (push) Has been cancelled
根本原因: format_with_nemotron() 已實作但從未被呼叫
- send_approval_card() 新增 nemotron_enabled/tools/validation/latency 參數
- TelegramMessage 建構時傳入 nemotron 欄位
- nemotron_enabled=true 時自動使用 format_with_nemotron() 格式
- _push_decision_to_telegram() 從 proposal_data 提取並傳遞 nemotron 資料
效果: Telegram 同時顯示 OpenClaw 仲裁 + Nemotron 執行方案雙區塊
2026-04-02 ogt: Phase 22 最後一哩路
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-02 08:59:03 +08:00 |
|
OG T
|
d8be78b135
|
feat(api): Knowledge Base Phase 1 後端四層架構
CD Pipeline / build-and-deploy (push) Successful in 7m0s
E2E Health Check / e2e-health (push) Successful in 17s
Type Sync Check / check-type-sync (push) Failing after 30s
- models/knowledge.py: Pydantic Schema (EntryType/Source/Status/CRUD)
- db/models.py: KnowledgeEntryRecord ORM (PostgreSQL)
- repositories/interfaces.py: IKnowledgeRepository Protocol
- repositories/knowledge_repository.py: PostgreSQL CRUD 實作
- services/knowledge_service.py: 業務邏輯 (get_db_context 內部管理 session)
- api/v1/knowledge.py: REST Router (get_knowledge_service,無直接 DB 存取)
- main.py: 掛載 Knowledge Base Router
- k8s/jobs/migrate-knowledge-entries.yaml: DB Migration Job
API 端點: GET/POST / | GET/PATCH/DELETE /{id} | POST /{id}/approve
GET /search | GET /categories
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-02 00:55:56 +08:00 |
|
OG T
|
077e1cd637
|
fix(telegram): 修復所有 webhook 路徑缺失 ai_provider → Telegram 顯示「AI 仲裁判定」
E2E Health Check / e2e-health (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
根本原因: send_approval_card() 有 ai_provider 參數,但三個 webhook 呼叫端都沒傳:
- signoz_webhook.py: 有 ai_provider 參數但未轉傳給 send_approval_card
- sentry_webhook.py: 有 analysis.analyzed_by 但未傳
- webhooks.py: _push_to_telegram_background 缺少 ai_provider 參數
修復後 Telegram 會顯示「🤖 OpenClaw Nemo 仲裁」而非「🤖 AI 仲裁判定」
2026-04-02 ogt: 三個 webhook 路徑統一修復
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-02 00:51:13 +08:00 |
|
OG T
|
46a346d948
|
fix(llmops): 鎖定 langfuse SDK v2.x (v3.x/v4.x 均已移除 client.trace() API)
CD Pipeline / build-and-deploy (push) Successful in 6m33s
E2E Health Check / e2e-health (push) Successful in 15s
排查確認:v3.x 也無 client.trace(),langfuse_client.py 依賴此 API。
鎖定 <3.0.0 確保安裝 v2.60.10 (v2 最新),trace/generation/score 均可用。
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-02 00:23:29 +08:00 |
|
OG T
|
de04de1d4f
|
fix(telegram): 新增 openclaw_nemo/nvidia_nim 顯示名稱映射
CD Pipeline / build-and-deploy (push) Has been cancelled
E2E Health Check / e2e-health (push) Has been cancelled
- format() 和 format_with_nemotron() 兩處 provider_names 均加入:
openclaw_nemo → "OpenClaw Nemo"
openclaw_nvidia_nim → "OpenClaw Nemo"
openclaw_qwen → "OpenClaw Nemo"
- 修正顯示 "OPENCLAW_NEMO" (大寫) 的問題
- 2026-04-01 ogt: 配合 AI 仲裁架構調整
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-02 00:20:37 +08:00 |
|
OG T
|
ae6af27bda
|
fix(llmops): 鎖定 langfuse SDK v2.x — pyproject.toml (實際建置來源)
E2E Health Check / e2e-health (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
requirements.txt 不是 Docker 建置來源,Dockerfile 使用 uv pip install . 從
pyproject.toml 安裝依賴。v4.x 改用 OTLP 協定,與 langfuse_client.py 的
client.trace() API 不相容,鎖定 <4.0.0。
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-01 22:58:49 +08:00 |
|
OG T
|
27b4d2a76a
|
fix(telegram): strip <placeholder> 佔位符防止 HTML parse 錯誤
CD Pipeline / build-and-deploy (push) Has been cancelled
E2E Health Check / e2e-health (push) Has been cancelled
OpenClaw 生成的 kubectl_command 含 <受影響服務名稱>
在 Telegram HTML parse mode 下造成 'Can't parse entities'
用 regex strip 所有 <...> 佔位符
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-01 22:50:07 +08:00 |
|
OG T
|
2aeec34735
|
fix(llmops): 鎖定 langfuse SDK v2.x (避免 v4.x OTLP 不相容)
CD Pipeline / build-and-deploy (push) Has been cancelled
E2E Health Check / e2e-health (push) Has been cancelled
CD Pipeline (Dev) / build-and-deploy-dev (push) Successful in 2m40s
問題: langfuse>=2.0.0 安裝了 v4.0.5,該版本移除 client.trace() 改用 OTLP
根因: Langfuse server v2.95.11 的 OTLP endpoint (/api/public/otel) 返回 404
但舊版 /api/public/ingestion endpoint 正常 (HTTP 207)
修復: 鎖定 langfuse>=2.0.0,<4.0.0,保留 client.trace() API 相容性
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-01 22:49:00 +08:00 |
|
OG T
|
88051388d4
|
fix(ai): 修復 _call_openclaw_analyze datetime 序列化失敗 → fallback Gemini
CD Pipeline / build-and-deploy (push) Has been cancelled
E2E Health Check / e2e-health (push) Has been cancelled
signals dict 內含 datetime 物件,httpx json= 無法序列化
加入 _to_serializable 遞迴轉換,datetime → str
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-01 22:37:04 +08:00 |
|
OG T
|
ff85b0581a
|
fix(ai): 修復 analyze_and_propose 方法呼叫錯誤
CD Pipeline (Dev) / build-and-deploy-dev (push) Failing after 9s
CD Pipeline / build-and-deploy (push) Has been cancelled
E2E Health Check / e2e-health (push) Has been cancelled
- OpenClawService 沒有 analyze(), 正確方法是 analyze_alert(alert_context)
- 包裝 host_statuses 為 alert_ctx 傳入
- 解包返回值 (8-tuple) 用 *_ 忽略尾端
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-01 22:33:51 +08:00 |
|
OG T
|
a1f7d1f495
|
fix(db): 固化 risklevel ADD VALUE 'high' 為正式 migration
CD Pipeline / build-and-deploy (push) Successful in 6m58s
E2E Health Check / e2e-health (push) Successful in 18s
Phase 23 緊急修復已在 prod/dev 手動執行,此檔作為正式記錄
使用 DO 塊防止重複執行錯誤
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-01 21:36:15 +08:00 |
|
OG T
|
5dd28b2fc6
|
test(telegram): add ADR-050 info action tests (detail/reanalyze/history/keyboard)
CD Pipeline / build-and-deploy (push) Successful in 6m47s
E2E Health Check / e2e-health (push) Successful in 18s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-01 21:11:45 +08:00 |
|
OG T
|
5809d3e336
|
feat(ai): 委派 Incident RCA 給 OpenClaw (Nemo) — 架構鐵律修正
E2E Health Check / e2e-health (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
架構鐵律: OpenClaw = AI 大腦,AWOOOI API 透過 HTTP 委派仲裁
修改:
- openclaw.py: 加入 _call_openclaw_analyze(),在 LLM fallback 前先呼叫 OpenClaw
- 04-configmap.yaml: OPENCLAW_URL 修正為 :8088 (新容器 port)
- AI_FALLBACK_ORDER 改為 ["ollama","claude"] (移除 Gemini 付費 API)
OpenClaw /api/v1/analyze/incident → qwen2.5:7b 本機 Ollama (Nemo)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-01 21:11:30 +08:00 |
|