Commit Graph

219 Commits

Author SHA1 Message Date
OG T
d0f09705e5 fix(auto-repair): 修復三個阻礙自動修復的根本原因
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
1. playbook_rag: Ollama embedding http_client 滾動重啟後 is_closed
   - 新增 _get_http_client() 偵測 is_closed 自動重建
   - singleton get_playbook_rag_service() 加 is_closed 重建判斷

2. telegram: 加入 ai_model 欄位顯示底層判斷模型
   - TelegramMessage.ai_model 欄位
   - format() / format_with_nemotron() 顯示 "Nemotron (nemotron-70b)"
   - openclaw proposal_dict 加入 model 欄位
   - decision_manager / send_approval_card 串接

3. DB: 清除 9 筆 3/26 殭屍 PENDING (mock_fallback CRITICAL 測試記錄)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 11:46:25 +08:00
OG T
cddc4cb1fc fix(knowledge): 首席架構師 Review 修復 C1+C2+I1+I2 (71→~88/100)
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m16s
C1: IKnowledgeRepository Protocol 補齊 save_embedding + semantic_search +
    list_unembedded_entries,恢復 Interface 先行保護層

C2: embed_all_entries Service 層 raw SQL 移至 Repository.list_unembedded_entries()
    Service 改透過 Protocol 呼叫,符合 leWOOOgo 積木化原則

I1: asyncio.create_task 加入 _pending_tasks set 持有引用,防 GC 回收與
    Shutdown 時 Task 遺失;task done 後自動 discard

I2: OllamaEmbeddingService 從每次 new 改為 KnowledgeService.__init__ 注入,
    單一實例重用

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 11:22:38 +08:00
OG T
8960bba7fe feat(knowledge): pgvector RAG — 語意搜尋 + 背景 Embedding 管線
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
- repository: save_embedding (raw SQL pgvector cast) + semantic_search (cosine <=>)
- service: create_entry 背景 embed + semantic_search + embed_all_entries 批次補 embed
- router: GET /semantic-search (q/limit/threshold) + POST /embed-all 管理端點

向量模型: nomic-embed-text (Ollama 192.168.0.188, 768 dims)
索引: ivfflat cosine (knowledge_entries.embedding vector(768))

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 11:17:24 +08:00
OG T
9e78d5222a feat(group-chat): 方案B slash commands — /status /incidents /cost /pods /help (2026-04-03 ogt)
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m5s
E2E Health Check / e2e-health (push) Successful in 17s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 20:02:27 +08:00
OG T
e833065043 feat(group-chat): Reply Bot 訊息時只有被Reply的Bot回應 (2026-04-03 ogt)
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m0s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 19:48:10 +08:00
OG T
8d09b18477 fix(group-chat): 移除雙AI互相評論 — 單獨@只有該AI回覆,雙AI路徑不再互評
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 19:44:49 +08:00
OG T
79a770ffe5 feat(group): 移除告警自動 AI 分析 — 老闆指示
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m3s
告警發到群組只顯示卡片,不自動觸發 OpenClaw/NemoClaw 分析
老闆和 AI 可手動在群組討論告警內容

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 19:36:40 +08:00
OG T
b62d7d3eb0 feat(chat): OpenClaw 改用 Gemini 2.0 Flash-Lite (最便宜)
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
Input $0.075/1M, Output $0.30/1M (比 Flash 便宜 25%)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 19:35:13 +08:00
OG T
6cd4280168 feat(chat): NemoClaw Claude API 加 token+費用統計
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
Claude Haiku 4.5: Input $0.80/1M, Output $4.00/1M
每次回覆顯示: token 數 | 本次費用 | 本月累計
Redis key: claude_cost:YYYY-MM,TTL 40 天

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 19:29:22 +08:00
OG T
781a6dac3e feat(chat): NemoClaw→Claude Haiku API + 告警只由 OpenClaw 分析
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m20s
老闆指示 (2026-04-03):
1. NemoClaw 改接 Claude API (claude-haiku-4-5),快速中文對話
2. 群組告警分析只觸發 OpenClaw,NemoClaw 不分析告警
3. OpenClaw/NemoClaw 雙向自然語言對話維持

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 19:19:56 +08:00
OG T
10ad2a67c7 fix(chat): gemini-2.0-flash 修正 + 全形小O支援 + NemoClaw 回 NIM
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
1. Gemini 模型名稱: gemini-1.5-flash → gemini-2.0-flash (404修復)
2. 費用計算: 2.0 Flash 定價 Input $0.10/1M, Output $0.40/1M
3. 全形/半形統一: unicodedata.normalize NFKC,支援「小O」全形輸入
4. NemoClaw: Ollama 188 負載高超時,暫回 NIM nemotron-mini-4b

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 19:17:08 +08:00
OG T
08b02280f8 feat(chat): Gemini 月費用上限 $10 USD + Redis 累計追蹤
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 6m55s
- 每次呼叫前檢查當月累計費用,超過 $10 USD 拒絕呼叫
- Redis key: gemini_cost:YYYY-MM,TTL 40 天
- 每次回覆顯示: token 數 | 本次費用 | 本月累計
- 超限時回傳警告訊息告知老闆

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 19:01:21 +08:00
OG T
2828cd897a feat(chat): OpenClaw→Gemini Flash + NemoClaw→Ollama llama3.2:3b
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
老闆指示 (2026-04-03):
- OpenClaw: Gemini 1.5 Flash API,每次回覆附 token+費用統計
- NemoClaw: Ollama llama3.2:3b,本地快速回應 (3-8s)
- 費用控管: Gemini 月上限 $10 USD

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 18:59:28 +08:00
OG T
fbf122fa1f fix(chat): OpenClaw 改用 NIM llama-3.1-8b 對話 + NemoClaw timeout 120s + 老闆稱謂
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m9s
1. _call_openclaw: 改用 NIM meta/llama-3.1-8b-instruct
   舊的 analyze/incident 是告警 API,回覆是告警格式,不適合對話
2. _call_nemotron: 移除 Ollama fallback,回到純 NIM
3. NEMOTRON_TIMEOUT_SECONDS: 55 → 120 (ConfigMap 已更新)
4. 修正「統帥」→「老闆」

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 18:41:15 +08:00
OG T
2da8da5a25 fix(chat): OpenClaw 改用 Ollama qwen2.5 做對話 + NemoClaw 加 Ollama fallback
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 6m51s
問題: _call_openclaw 用 analyze/incident API → 回覆是告警格式,不是自然語言
修法:
  1. OpenClaw chat → Ollama qwen2.5:7b-instruct (本地,快速,無格式污染)
  2. NemoClaw → NIM 優先,超時 fallback 到 Ollama llama3.2:3b

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 18:30:31 +08:00
OG T
d1436157b7 fix(polling): httpx client timeout 改為分開設定,read=50s > getUpdates 40s
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
根因: httpx.AsyncClient(timeout=30.0) 的 read timeout 30s
     < getUpdates 的 long polling timeout 40s
     導致每次 getUpdates 都被 client 打斷 → polling loop 無法正常收訊息

修法: httpx.Timeout(connect=10s, read=50s) 讓 long polling 正常等待

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 18:29:22 +08:00
OG T
dfc1e19c07 fix(group): 互相評論補充也加 reply_to_message_id 引用原訊息
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 18:24:51 +08:00
OG T
09241f102e fix(group): 群組訊息移到 security interceptor 前 — 修復 whitelist 擋掉所有群組訊息
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m10s
根因: intercept_telegram() 的 whitelist 是字串,user_id 是 int
      型別不匹配 → exception → telegram_chat_unauthorized → 群組訊息全被丟棄
修法: SRE 群組訊息優先路由,不走個人 whitelist
     (群組成員由 Telegram 群組管理員控制,安全邊界已存在)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 18:17:22 +08:00
OG T
203855a56e debug(group): 加 group_routing_check log 診斷 chat_id 不匹配
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 18:12:07 +08:00
OG T
63929a5e87 feat(group): 別名 小O→OpenClaw 小賀→NemoClaw + NemoClaw 強制繁中
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m6s
1. telegram_gateway.py: _handle_group_message 加入別名路由
   - 小O / 小o → 只有 OpenClaw 回應
   - 小賀 / 小贺 → 只有 NemoClaw 回應
   - clean_text 同步移除別名 token

2. chat_manager.py: NEMOCLAW_PERSONA 加強繁體中文強制指令
   - 明確「禁止使用英文或其他語言」防止 Nemotron 自動英文回應

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 18:00:51 +08:00
OG T
699e61ac87 feat(group): 群組雙向對話 + 格式選項C + 老闆稱謂
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m11s
1. _handle_group_message: SRE 群組訊息路由
   - @OpenClawAwoooI_Bot → 只有 OpenClaw 回應
   - @NemoTronAwoooI_Bot → 只有 NemoClaw 回應
   - 一般訊息 → 並行回應 + 互相評論第二輪
   - Bot 訊息自動忽略(防無限循環)

2. 告警格式改選項 C (老闆指示)
   - 【🔴 HIGH】resource_name
   - 區塊式,去掉 ═══ 長分隔線

3. AI persona 改稱呼用戶為「老闆」

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 17:51:48 +08:00
OG T
d2f02999b7 fix(alert-format): 移除 [LLM_OPENCLAW_NEMO] prefix + 擴大根因/建議字數
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m4s
- root_cause: 移除 [source.upper()] 前綴,直接顯示 AI 分析文字
- root_cause 截斷: 80→150 字
- suggested_action 截斷: 50→80 字
- AI provider 來源已在訊息標頭 「🤖 OpenClaw Nemo 仲裁」顯示,不需在根因重複

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 17:43:19 +08:00
OG T
50457675ef feat(group): OpenClaw + NemoClaw 並行分析告警 (統帥指示)
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
- 兩個 AI 同時分析,不互相影響(更客觀)
- 總等待時間 = max(OpenClaw, NemoClaw) 而非相加
- 兩者都 reply 同一條告警訊息,並排出現在群組
- 修正 unused message_id parameter noqa

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 17:41:50 +08:00
OG T
209fb8d4dc fix(group): supergroup 跨 Bot reply 改用 reply_parameters (Bot API v6.7+)
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
舊的 reply_to_message_id 在 supergroup 跨 Bot 回覆會 400
改用 reply_parameters + allow_sending_without_reply: true

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 17:39:53 +08:00
OG T
890d438cdf fix(group): 群組告警格式對齊 TelegramMessage 模板 + 修復 AI 討論觸發
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
- 群組告警改用 ═══ 分隔線格式,與個人 chat 一致
- 加入「OpenClaw 與 NemoClaw 正在分析中...」提示
- 加 group_msg_id 為空時的 warning log
- clawbot-v5 STANDBY_MODE: main.py 檢查條件修正

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 17:36:01 +08:00
OG T
c65ed5b1c9 feat(telegram): SRE 戰情室群組三頭政治 Triumvirate (ADR-053)
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m6s
- config.py: 新增 OPENCLAW_BOT_TOKEN / NEMOTRON_BOT_TOKEN / SRE_GROUP_CHAT_ID
- telegram_gateway.py: send_to_group / send_as_openclaw / send_as_nemotron / trigger_group_ai_discussion / _send_approval_card_to_group
- send_approval_card 告警發送後非同步觸發群組 AI 雙向討論
- configmap: SRE_GROUP_CHAT_ID=-1003711974679
- secrets: OPENCLAW_BOT_TOKEN / NEMOTRON_BOT_TOKEN CHANGE_ME 佔位

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 17:16:05 +08:00
OG T
ff5a77f7a9 fix(telegram): 啟用 Polling + 修正 InfraAlertMessage 格式
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 6m52s
1. TELEGRAM_ENABLE_POLLING: false→true
   - clawbot-v5 已停止 polling (STANDBY_MODE)
   - AWOOOI API 接管,統帥可與 OpenClaw/NemoClaw 雙 AI 對話

2. InfraAlertMessage.format() 加入 note 欄位
   - NIM 慢屬正常不再顯示「自動修復失敗」
   - 改為 💡 資訊性提示

3. NIM 探測端點改為 /v1/models (輕量,不觸發計費)
   timeout: 10s → 25s (NIM 免費 tier 冷啟動)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 16:43:40 +08:00
OG T
15aabd6ac5 fix(chat+nim): 修復首席架構師 Review I1-I4 + S3 四項重要問題
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m9s
I1: chat_manager._call_openclaw timeout=30.0 → 讀 settings.OPENCLAW_TIMEOUT
I2: nvidia_provider.py stale comment "45" → "55" 對齊 ConfigMap
I3: asyncio.shield 移除 — shield 超時後 task 繼續跑但無人等待 (silent leak)
I4: ChatManager.__init__ 移除 repo 實例 (leWOOOgo 禁 Service 持有 repository)
S3: _check_nemotron_health probe 10s → 25s + /v1/models 輕量端點

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 16:36:16 +08:00
OG T
be247d6c5c fix(chat): OpenClaw timeout 30→40s,NemoClaw 50→60s
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 6m51s
get_system_context() k8s/DB 查詢加上 _call_openclaw 30s,
總計超過外層 shield 30s 導致 OpenClaw 全部超時。
放寬 timeout 讓兩個 AI 有足夠時間回應。

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 16:27:08 +08:00
OG T
d8c9e29485 fix(heartbeat): 撤銷錯誤的 Nemotron 自動關閉邏輯
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 6m53s
之前錯誤地在偵測到 Nemotron 慢時自動執行
ENABLE_NEMOTRON_COLLABORATION=false,
這等於自動關掉產品核心功能。

Nemotron NIM 免費 tier 延遲 11-45s 是已知特性(Memory 有記載),
不是需要自動修復的異常。

現在:偵測慢只發告警通知,不執行任何自動修復。

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 15:34:34 +08:00
OG T
1430b1283d fix(chat+nvidia): 還原 OpenClaw+Nemotron 架構 + 修 30s timeout 根因
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
ChatManager 還原:
- OpenClaw (188:8088) 負責 RCA 仲裁,不改用 Gemini (未經批准)
- NemoClaw (NVIDIA NIM nemotron-mini-4b) 負責補充/評論
- 雙 AI 並行執行,OpenClaw 30s / NemoClaw 50s timeout
- 支援 @openclaw / @nemo 指定對象

nvidia_provider.py 修 timeout 根因:
- NVIDIA_TIMEOUT 從硬編碼 30.0 改為讀 NEMOTRON_TIMEOUT_SECONDS (45s)
- Memory 記載 NIM 免費 tier 延遲 11-45s,30s 硬編碼導致慢請求全超時

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 15:34:02 +08:00
OG T
d522c51deb fix(infra-alert): Nemotron 異常告警套用標準模板 + 真正自動修復
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
1. 新增 InfraAlertMessage dataclass — 基礎設施異常的標準告警格式
   (之前 Nemotron 告警是硬編碼文字,不走任何模板)

2. 偵測 Nemotron 異常時自動執行修復:
   kubectl set env ENABLE_NEMOTRON_COLLABORATION=false
   (之前只是把指令印在訊息裡,從未執行)

3. 告警顯示自動修復結果 ( 已自動修復 /  失敗)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 15:29:20 +08:00
OG T
e93ada0452 fix(chat): OpenClaw 改走 Gemini Flash,移除 Ollama 依賴
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m18s
Ollama 188 完全卡死 (0 bytes/30s timeout),無法作為對話後端。
雙 AI 皆使用 Gemini Flash,靠不同 persona 和 temperature 區分:
- OpenClaw: temperature=0.5 (精準果斷)
- NemoClaw: temperature=0.9 (分析發散)

同時 kubectl set env ENABLE_NEMOTRON_COLLABORATION=false
停止每個 incident 白白等待 30s Nemotron timeout

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 15:20:23 +08:00
OG T
d9007e6855 feat(chat+monitor): 雙 AI 對話重寫 + Nemotron 健康監控告警
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 6m56s
ChatManager 重寫 (Phase 22.6):
- @openclaw <msg> → 只有 OpenClaw 回應 (Ollama qwen2.5:7b)
- @nemo <msg>     → 只有 NemoClaw 回應 (Gemini Flash)
- 無前綴           → OpenClaw 先答,NemoClaw 評論/反駁

NemoClaw 改用 Gemini Flash (棄 NIM nemotron-mini-4b 因為 15s+ 回應時間)

TelegramGateway 心跳新增 Nemotron 健康探測:
- 每次心跳探測 NVIDIA NIM API (10s timeout)
- 異常時立刻發 Telegram 告警 + 緩解指令
- 補足 Nemotron 100% 超時卻無告警的監控盲區

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 14:59:06 +08:00
OG T
c1834a7156 feat(kb+apm): KB Phase 2-A 自動萃取 + KB-D Markdown 詳情面板 + APM 趨勢圖
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m28s
- KB-A: 新增 knowledge_extractor_service.py (Ollama llama3.2:3b 本地推理)
- KB-A: incident_service.py resolve hook (fire-and-forget asyncio.create_task)
- KB-D: 引入 react-markdown + remark-gfm,知識庫詳情面板 Markdown 渲染
- KB-D: 批准/封存按鈕串接 API (POST /knowledge/{id}/approve, PATCH status)
- KB-D: i18n 新增 approving/archiving 載入狀態文字
- APM: apm/page.tsx 整合 TimeSeriesChart sparkline (使用 trend[] 欄位)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 14:40:27 +08:00
OG T
e60225ea29 fix(ai): I1+I3 — Redis TTL + openclaw_nemo 命名對齊
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 36s
I1: ai_control.py 所有寫入 Redis 的 key 加入 30 天 TTL
    防止 ai:control:* keys 永久累積造成記憶體洩漏

I3: ai_rate_limiter.py "nvidia" key → "openclaw_nemo"
    對齊 Phase 24 AIProviderEnum,使 rate limit 正確作用

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 13:22:36 +08:00
OG T
b225c23ad8 fix(ai_router): DIAGNOSE/ALERT_TRIAGE 改用 llama3.2:3b 避免 90秒 timeout
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m5s
qwen2.5:7b-instruct 在 prod 需要 >90s,導致 DIAGNOSE intent 全鏈路失敗。
llama3.2:3b (summary model) 實測 4s 回應,適合 triage 類快速判斷。

規則 3 新增特判: DIAGNOSE/ALERT_TRIAGE/QUERY → ollama summary model
不影響其他 intent 的 model 選擇邏輯。

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 12:32:01 +08:00
OG T
b6105b8214 fix(ai): 首席架構師審查修復 C1+C2 (Phase 24 C)
C1 — telegram_gateway.py Fail-Closed 白名單:
  白名單為空時 'if whitelist and ...' 為 False → 任何人可執行 /ai
  修復: 'if not whitelist or user_id not in whitelist' Fail-Closed
  加入 whitelist_empty 欄位到 warning log

C2 — openclaw.py list comprehension await 語法錯誤:
  Python 3.11 不支援 list comprehension 中使用 await
  'if not await is_provider_disabled(p)' → SyntaxError
  修復: 改為 for loop 明確 await
  I4: 靜默 except 改為 logger.warning

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 00:42:02 +08:00
OG T
dbe71f82e3 feat(ai): Phase 24 C — Telegram /ai 動態控制 + Redis 狀態管理
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
新增 ai_control.py:
- /ai status: 所有 Provider 狀態 + 路由模式
- /ai router on/off: 動態切換 AIRouter (覆蓋 env var)
- /ai primary <provider>: 設定主要 Provider
- /ai enable/disable <provider>: 控制 Provider 啟停
- /ai cost: 費用統計
- 白名單: OPENCLAW_TG_USER_WHITELIST 保護

telegram_gateway.py:
- _handle_chat_message 加入 /ai 指令攔截路由
- 白名單未授權返回警告

openclaw.py:
- Redis 狀態覆蓋 env USE_AI_ROUTER (/ai router on/off 生效)
- Redis primary_provider 覆蓋路由決策 (/ai primary 生效)
- Redis disabled provider 過濾 (/ai disable 生效)

Redis Keys:
  ai:control:use_router
  ai:control:primary_provider
  ai:control:disabled:<provider>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 00:34:14 +08:00
OG T
b4b3a457c5 refactor(openclaw): Phase 24 B4 — 封存舊 fallback Provider 方法
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
[ARCHIVED] _call_ollama / _call_gemini / _call_claude
- 這三個方法為 USE_AI_ROUTER=false 回滾保留路徑
- 新路徑: USE_AI_ROUTER=true → AIRouterExecutor (ai_router.py)
- 新 Provider: ai_providers/ollama.py / gemini.py / claude.py
- 封存而非刪除: 完整移除等 Phase 24 全驗收後 (ADR-052 D11)

R3 觀察結果 (通過 ):
- openclaw_nemo provider: 12/12 incidents 全部正確路由
- 信心度: 0.8~0.9 正常
- USE_AI_ROUTER=true 生效確認

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 00:29:56 +08:00
OG T
97d86861ed fix(ai_router): C1 修復 — AIProviderEnum 對齊 Registry 實際 Provider 名稱
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 37s
問題: AIProviderEnum.NVIDIA = "nvidia" 在 Registry 無對應 Provider
      OpenClawNemoProvider.name = "openclaw_nemo"
      NemotronProvider.name = "nemotron"
      → 高複雜度/Tool Calling 路由永遠 skip,靜默 fallback 到 Gemini/Ollama

修復:
- 新增 OPENCLAW_NEMO = "openclaw_nemo" (一般推理, via .188 → NVIDIA NIM)
- 新增 NEMOTRON = "nemotron" (Tool Calling, direct NVIDIA NIM)
- 移除 NVIDIA = "nvidia" (Registry 無對應)
- 規則 4 (複雜度>=4/HIGH風險): NVIDIA → OPENCLAW_NEMO
- route_tool_calling: NVIDIA → NEMOTRON
- Rate Limiter check: "nvidia" → "openclaw_nemo"
- _full_fallback_chain: OPENCLAW_NEMO 首位
- _tool_calling_fallback_chain: NEMOTRON 首位

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 23:31:31 +08:00
OG T
58002e6bf4 feat(phase24-b3): NemotronProvider 抽取 + incident-card 重構
Some checks failed
E2E Health Check / e2e-health (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
Phase 24 B3:
- 新增 ai_providers/nemotron.py: NemotronProvider 封裝 K8s Tool Calling
  搬移自 openclaw.py _call_nemotron_tools (L1623-1785)
  capabilities=tool_calling, privacy_level=cloud
- ai_router.py: 加入 NemotronProvider 到 Registry
- ai_providers/__init__.py: 匯出 NemotronProvider

Phase R-UI2 (架構師 Warning):
- incident-card.tsx: 抽取 useApprovalAction hook
  handleApprove/handleReject 60行重複邏輯 → 共用 hook
  行為完全不變,維護性提升

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 23:12:42 +08:00
OG T
5a8aae89c4 fix(phase24): 首席架構師 Review C1/C2/C3/I4 修復
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m12s
E2E Health Check / e2e-health (push) Successful in 18s
C1 (P0): AIRouterExecutor.execute() 補 Langfuse Trace (D5)
  - 建立 langfuse_trace("ai_router_execute") 包住整個執行鏈
  - 成功時記錄 generation (model/input/output/tokens/cost)
  - prod 所有 AI 呼叫現在有 LLMOps 追蹤

C2 (P0): 絞殺者改為呼叫 AIRouter.route() 智慧路由
  - 先取得 RoutingDecision (意圖分類 + 複雜度評分)
  - provider_order 從 selected_provider + fallback_chain 動態生成
  - D1 意圖路由矩陣、D7 隱私保護 (DIAGNOSE 強制 local) 生效

C3 (P1): 型別標注 typo 修復
  - AIProviderEnumEnum → AIProviderEnum
  - AIProviderEnumProtocol → AIProviderProtocol

I4 (P1): interfaces.py AIProvider Protocol 補 close() 定義

S1: ai_router.py 模組版本標頭更新至 v4.0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 21:47:06 +08:00
OG T
3ad7b60f68 fix(ai): Phase 24 R1+R2 首席架構師 Review 修復 (C1-C3 + I1-I5)
Some checks failed
E2E Health Check / e2e-health (push) Successful in 18s
CD Pipeline / build-and-deploy (push) Has been cancelled
Critical 修復:
- C1: AIProvider Enum 改名為 AIProviderEnum (避免與 Protocol 同名衝突)
- C2: 共用 Circuit Breaker → per-provider _SimpleCircuitBreaker
  (避免 Gemini 掛掉時 Ollama 也被擋)
- C3: cache_key 移到 try 外面 (避免 UnboundLocalError)

Important 修復:
- I1: Claude hardcode model → 用 get_model_registry()
- I2: Claude 追蹤 tokens/cost (input_tokens + output_tokens)
- I3: Ollama 追蹤 tokens (eval_count + prompt_eval_count)
- I4: Gemini temperature → 用 model_registry
- I5: AIProviderRegistry.close_all() shutdown hook

2026-04-02 ogt: Phase 24 首席架構師審查通過後修復

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 13:40:58 +08:00
OG T
73e8f8ab77 feat(ai): Phase 24-A+B1 — AI Provider Registry + 絞殺者包裝 (ADR-052)
Some checks failed
E2E Health Check / e2e-health (push) Successful in 16s
CD Pipeline / build-and-deploy (push) Has been cancelled
Brain Layer 雙軌 Registry 架構:
- 新建 src/services/ai_providers/ 目錄 (interfaces + 4 providers)
  - OllamaProvider (local, rca/chat/code_review)
  - GeminiProvider (cloud, rca/chat)
  - ClaudeProvider (cloud, rca/chat/code_review)
  - OpenClawNemoProvider (cloud, rca — 委派 188→NIM)
- 擴展 ai_router.py 加入:
  - AIProviderRegistry (動態註冊/啟停)
  - AIRouterExecutor (Cache + 閘門 CB/RL/Sem + 執行)
- openclaw.py 絞殺者包裝: USE_AI_ROUTER=true 走新路徑
- config.py + ConfigMap 加入 USE_AI_ROUTER=false (安全預設)
- ADR-052 正式文件 (14 項決策 D1-D14)
- HARD_RULES v1.7 加入 AI Router 規範

安全: USE_AI_ROUTER=false 預設不啟用,需手動開啟觀察
回滾: kubectl set env deployment/awoooi-api USE_AI_ROUTER=false

2026-04-02 ogt: Phase 24 首批實作

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 13:16:09 +08:00
OG T
d2bad44173 fix(api): KB 架構審查修復 I3-I5
Some checks failed
E2E Health Check / e2e-health (push) Successful in 17s
CD Pipeline / build-and-deploy (push) Has been cancelled
- I3: Service 層加 IKnowledgeRepository Protocol 型別標注
- I4: search 方法加入 tags JSONB 搜尋 (cast→String→ilike)
- I5: get_categories 獨立方法,不再繞道 list_entries(limit=0)

首席架構師審查 87/100 → 全部 Important issues 已修復

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 09:05:54 +08:00
OG T
e17248fd10 fix: 首席架構師審查修復 — i18n/CD/時區/死碼清理
Some checks failed
E2E Health Check / e2e-health (push) Successful in 16s
CD Pipeline / build-and-deploy (push) Has been cancelled
P0 前端 i18n 合規 (6 檔案):
- settings/page.tsx: 全面改用 useTranslations('settings')
- auto-repair/page.tsx: 30+ 處硬編碼改用 t('autoRepair.*')
- sidebar.tsx: sectionLabel 改用 tSection(),aria-label 國際化
- openclaw-panel.tsx: STATUS_MESSAGES 改用 tPanel(),Production 改用 tBrand
- alerts/page.tsx: StatPill label 改用 t('incident.severity.*')

P1 CD Pipeline:
- cd.yaml: runs-on 改 self-hosted (ADR-039)
- Telegram Secret 注入失敗改為 exit 1 (ADR-035)
- kubectl patch op:replace → op:add (首次部署相容)

P2 後端:
- langfuse_client.py: 移除 v4.x 死碼分支 (SDK 鎖定 <3.0.0)
- ai.py: 標記 TODO(R4) Router 瘦身

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 09:02:41 +08:00
OG T
d32d84efce feat(telegram): 接通 Phase 22 Nemotron 雙軌顯示 (ADR-044)
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
E2E Health Check / e2e-health (push) Has been cancelled
根本原因: format_with_nemotron() 已實作但從未被呼叫
- send_approval_card() 新增 nemotron_enabled/tools/validation/latency 參數
- TelegramMessage 建構時傳入 nemotron 欄位
- nemotron_enabled=true 時自動使用 format_with_nemotron() 格式
- _push_decision_to_telegram() 從 proposal_data 提取並傳遞 nemotron 資料

效果: Telegram 同時顯示 OpenClaw 仲裁 + Nemotron 執行方案雙區塊
2026-04-02 ogt: Phase 22 最後一哩路

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 08:59:03 +08:00
OG T
d8be78b135 feat(api): Knowledge Base Phase 1 後端四層架構
Some checks failed
CD Pipeline / build-and-deploy (push) Successful in 7m0s
E2E Health Check / e2e-health (push) Successful in 17s
Type Sync Check / check-type-sync (push) Failing after 30s
- models/knowledge.py: Pydantic Schema (EntryType/Source/Status/CRUD)
- db/models.py: KnowledgeEntryRecord ORM (PostgreSQL)
- repositories/interfaces.py: IKnowledgeRepository Protocol
- repositories/knowledge_repository.py: PostgreSQL CRUD 實作
- services/knowledge_service.py: 業務邏輯 (get_db_context 內部管理 session)
- api/v1/knowledge.py: REST Router (get_knowledge_service,無直接 DB 存取)
- main.py: 掛載 Knowledge Base Router
- k8s/jobs/migrate-knowledge-entries.yaml: DB Migration Job

API 端點: GET/POST / | GET/PATCH/DELETE /{id} | POST /{id}/approve
         GET /search | GET /categories

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 00:55:56 +08:00
OG T
de04de1d4f fix(telegram): 新增 openclaw_nemo/nvidia_nim 顯示名稱映射
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
E2E Health Check / e2e-health (push) Has been cancelled
- format() 和 format_with_nemotron() 兩處 provider_names 均加入:
  openclaw_nemo → "OpenClaw Nemo"
  openclaw_nvidia_nim → "OpenClaw Nemo"
  openclaw_qwen → "OpenClaw Nemo"
- 修正顯示 "OPENCLAW_NEMO" (大寫) 的問題
- 2026-04-01 ogt: 配合 AI 仲裁架構調整

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 00:20:37 +08:00