Commit Graph

379 Commits

Author SHA1 Message Date
OG T
76f7330c9d feat(api): POST /playbooks/ 建立端點 + seed-repair-playbooks.py (Task 14)
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 57s
CD Pipeline / Deploy Prometheus Alert Rules (push) Has been skipped
- playbooks.py: 新增 POST / 端點供直接建立 Playbook (seed/管理用)
- seed-repair-playbooks.py: 5個 Host Repair Playbooks (ssh_command)
  sentry/harbor/gitea/alertmanager (110) + openclaw (188)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 11:53:49 +08:00
OG T
bf4f81412c feat(api): ActionType.SSH_COMMAND + auto_repair_service SSH分支 (Task 12)
- playbook.py: 新增 SSH_COMMAND ActionType
- auto_repair_service._execute_step: SSH_COMMAND 分支,格式 layer/component

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 11:47:00 +08:00
OG T
e7d8da85f6 feat(api): HostRepairAgent — SSH 主機層修復 (Task 11)
- host_repair_agent.py: layer路由、command injection防護、asyncio SSH執行
- 測試: 12 cases 全通過 (routing/sanitize/success/fail/timeout/denied)
- SSH key: /etc/repair-ssh/id_ed25519 (K8s secret mount)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 11:22:00 +08:00
OG T
7a6fa6359e feat(api): Sentry init 加入統一 layer/component 標籤
對齊 Prometheus 告警標籤規範 (layer/component/team)
讓 Sentry 事件與 auto_repair 路由決策保持一致

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 11:10:40 +08:00
OG T
2243a21b96 fix(ai-router): v4.3 NIM 保護 — timeout 不計 CB 失敗,每次先跑 NIM 才切 Gemini
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 20s
需求: NIM 必須等到有回應才切換,不能因為慢就被 CB 封鎖走 Gemini

變更:
- Timeout exception 不累積 CB failure(只有真實連線錯誤才計)
- NIM CB: failure_threshold=10, recovery_timeout=30s(比預設寬鬆)
- 設計文件 v4.3: 更新方向二,移除錯誤假設

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 01:51:12 +08:00
OG T
5ad403b287 fix(p0): v4.3 — 實測確認 Ollama CPU-only 不可用,DIAGNOSE 統一走 NIM
實測依據 (2026-04-05):
- Ollama llama3.2:3b CPU-only: 238s 回 {"ok":true},生產不可用
- Nemotron NIM: 2.2s~27.3s,avg 10.6s,一直是主力(Phase 22 起)
- NIM 從未有隱私問題,Incident 資料一直送雲端 GPU

變更:
- ai_router.py: _local_fallback_chain 廢棄(空 list)
- ai_router.py: DIAGNOSE route/route_sync 改回 _full_fallback_chain
- config.py: 更新 timeout 說明反映實測結果
- test_p0_diagnose_routing.py: 更新 docstring

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 01:49:06 +08:00
OG T
8300879d02 chore: trigger CD deploy (warm-up + MinIO startup)
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 24s
2026-04-05 01:00:31 +08:00
OG T
a81bf50537 feat(drift): ADR-057 adopt() Gitea PR API 實作
- DriftAdoptService: 透過 Gitea REST API 建立 branch + commit + PR
  不在 API Pod 內執行 git(修復 C2 安全漏洞)
- adopt() 端點: 501 → 真實實作(呼叫 DriftAdoptService)
- config.py: 新增 GITEA_API_URL / GITEA_API_TOKEN / GITEA_REPO_OWNER / GITEA_REPO_NAME
- K8s secret awoooi-secrets 已注入 GITEA_API_TOKEN
- drift.py: 移除 trigger_drift_scan 中未使用的 interpreter 變數

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 00:39:29 +08:00
OG T
f4f454fd98 feat(api): 重開機後自動 warm-up Redis Working Memory from PostgreSQL
- main.py lifespan: 啟動時從 DB restore INVESTIGATING/MITIGATING incidents
- scripts/reboot-recovery: 188 + 110 自動化腳本 + systemd services
- scripts/reboot-recovery: aiops-network 自動建立 (ClawBot 依賴)
- docs/runbooks/REBOOT-RECOVERY-SOP.md: 完整改寫,含自動化腳本說明

Why: 重開機後 Redis 清空導致前端 incidents 顯示 0 筆(DB 完整保存)
統帥批准: 「所有數據必須被長久記錄下來」

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 00:39:20 +08:00
OG T
96d5e18924 fix(p0): 實測修正 — timeout 依 benchmark 調整,_local_fallback_chain 移除雲端 Nemotron
- config.py: NEMOTRON_DIAGNOSE_TIMEOUT_SECONDS=60s (NIM 實測 11-45s + 15s buffer)
- config.py: OLLAMA_DIAGNOSE_TIMEOUT_SECONDS=200s (Ollama 實測 ~173s + 27s buffer)
- ollama.py: 新增 per-task timeout (diagnose/force_local 用 200s)
- ai_router.py: _local_fallback_chain 移除 Nemotron (NIM=雲端,不可進 local chain)
- ai_router.py: v4.2 — Option C 分情境路由正式確立

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 00:29:09 +08:00
OG T
4912c7f307 fix(phase25): 首席架構師 Review R2 修正 (I1/I2/I3/I4/C3/M1)
I1: auto_repair_service — 失敗分支 anti_pattern task 補齊 _pending_tasks GC 防護
C3: drift_remediator — _kubectl_apply() 實作 resource_key 範圍過濾(修復虛設參數 bug)
M1: drift_remediator — _git_push() 標記 DISABLED,防止誤啟用
I2: drift.py — Telegram 通知移除失效的 adopt() 端點連結
I3: drift/page.tsx — handleScan POST body namespace→namespaces(對齊後端 DriftScanRequest)
I4: drift/page.tsx — 移除硬編碼英文字串,改用 t('loading')/t('highCount')/t('mediumCount')
i18n: zh-TW.json + en.json 補齊 drift.loading key

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 00:22:38 +08:00
OG T
4bc4757fdc test(phase25): Phase 25 P1/P2 source code inspection tests (36 tests)
- test_phase25_auto_harvesting.py: 18 tests for NemotronRunbookGenerator,
  AntiPattern gate, fire-and-forget pattern, symptoms_hash
- test_phase25_drift_detection.py: 18 tests for DriftDetector, NemotronDriftInterpreter
  (read-only), DriftRemediator, local fallback chain for DIAGNOSE

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 00:14:50 +08:00
OG T
688146ef9c test(ai-router): test_fallback_list >= 2 改 >= 1
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
DIAGNOSE local chain 選 Nemotron 後 fallback 只剩 Ollama 一個
>= 2 斷言過嚴,與 test_query_routes_to_ollama 同樣修正

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 18:05:25 +08:00
OG T
428ed5f8cd test(ai-router): 修正 test_query_routes_to_ollama 斷言
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 41s
Phase 25 P0 後 DIAGNOSE 走 _local_fallback_chain [NEMOTRON, OLLAMA]
選 NEMOTRON 為 primary,fallback 只剩 OLLAMA 一個,
>= 2 斷言過嚴,改為 >= 1。

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 18:02:43 +08:00
OG T
a562db4048 fix(phase25): 首席架構師 Review C1/C2/I1/I3 修正
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 57s
C1: NemotronProvider.privacy_level cloud→local
    NIM 部署在 192.168.0.188 內網,非官方雲端 API
    可納入 DIAGNOSE _local_fallback_chain 隱私邊界

C2: adopt() 端點暫停,返回 501
    API Pod 執行 git add -A 有安全風險
    ADR-057 起草後改用 Gitea PR API 實作

I1: timeout log 修正,記錄實際套用的 timeout 值
    原本永遠記錄 NEMOTRON_TIMEOUT_SECONDS=45
    現在記錄依 task_type 選擇的正確值

I3: route_sync() 補 DIAGNOSE 隱私邊界
    async route() 已有 _local_fallback_chain
    sync 版本遺漏,此次補齊

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 18:00:05 +08:00
OG T
c4eafd2a5b fix(ai-router): fallback_models 排除 selected_model 避免重複
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 6m58s
DIAGNOSE intent 路由至 Nemotron 後,fallback_chain 中的 OPENCLAW_NEMO
也使用相同 model string,導致 fallback_models 包含已選模型。

修正: 過濾掉與 selected_model 相同的 model string。

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 17:43:44 +08:00
OG T
8056be5847 feat(ai-router): DIAGNOSE intent override 升級至 Nemotron (P0) 2026-04-04 17:41:45 +08:00
OG T
671974dedb test(ai-router): TestLocalFallbackChain — require_local 隱私邊界驗證 (P0)
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 43s
新增兩個測試:cloud provider 被跳過 + 全失敗回傳 local_providers_unavailable。
實作邏輯已存在於 AIRouterExecutor.execute()(2026-04-04 ogt Phase 25 P0)。

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 17:32:32 +08:00
OG T
ffd679f5d3 feat(nemotron): per-task timeout,DIAGNOSE 使用獨立 timeout 設定 (P0)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 16:58:23 +08:00
OG T
3455044457 feat(phase25): Nemotron 主動防禦三方向 P0+P1+P2 完整實作
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 38s
Type Sync Check / check-type-sync (push) Failing after 35s
P0 - DIAGNOSE Privacy-First Routing:
- ai_router.py: _local_fallback_chain [NEMOTRON→OLLAMA→REJECT]
- DIAGNOSE 意圖 override 改為 NEMOTRON (原 OLLAMA)
- DIAGNOSE fallback 使用 local-only 鏈,不觸碰雲端
- 全部失敗時 REJECT + Telegram 通知
- config.py: NEMOTRON_DIAGNOSE_TIMEOUT_SECONDS=30, OLLAMA_DIAGNOSE_TIMEOUT_SECONDS=60
- nemotron.py: 根據 context[task_type] 選擇 timeout

P1 - Knowledge Auto-Harvesting:
- models/knowledge.py: EntryType.AUTO_RUNBOOK + ANTI_PATTERN + symptoms_hash
- EntryStatus.PUBLISHED (ANTI_PATTERN 直接發布,無需審核)
- models/playbook.py: SymptomPattern.compute_hash() (16字元確定性 hash)
- services/runbook_generator.py: NemotronRunbookGenerator (v1.1)
  - generate_runbook() → AUTO_RUNBOOK (DRAFT) + Telegram 審核 card
  - generate_anti_pattern() → ANTI_PATTERN (PUBLISHED) + Telegram 通知
  - 使用 nvidia.chat() (正確介面),Nemotron 超時時 Minimal fallback
- knowledge_service.py: check_anti_pattern(symptoms_hash, days=7)
- db/models.py: symptoms_hash VARCHAR(16) + ix_knowledge_symptoms_hash
- repositories/knowledge_repository.py: create() 支援 symptoms_hash + status
- auto_repair_service.py: anti_pattern_gate 在 decide() + runbook hook 在 execute()
- migrations/phase8_symptoms_hash.sql: ALTER TABLE + partial index + PUBLISHED constraint

P2 - Config Drift Detection:
- models/drift.py: DriftItem/DriftReport/DriftLevel/DriftIntent/DriftStatus
- services/drift_detector.py: GitStateReader + K8sStateReader + DriftDetector
- services/drift_analyzer.py: 白名單過濾 + DriftLevel 分級
- services/drift_interpreter.py: NemotronDriftInterpreter(意圖分析,不生成修復指令)
- services/drift_remediator.py: rollback(kubectl apply) + adopt(git push gitea)
- api/v1/drift.py: POST /scan, GET /reports, POST /rollback, POST /adopt
- migrations/phase9_drift_reports.sql: drift_reports 表
- k8s/drift-cronjob.yaml: 每小時自動掃描 CronJob

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 12:35:05 +08:00
OG T
b6e12f74f4 test(phase22): Phase 22.4 Nemotron 協作測試 18/18 PASSED
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m12s
- 修正 file path: apps/api/src/ → src/ (從 apps/api/ 目錄執行)
- 擴大 snippet size: 800→1500 chars (docstring 過長導致 flag check 超出範圍)
- 擴大 _call_nemotron_tools snippet: 2000→5000 chars (timeout 在函數後段)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 12:16:28 +08:00
OG T
df3ef9006c fix(auto-repair): 首席架構師 Review — 4 Critical/Important 修復
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m2s
Critical #1: KM write task 移出 try/except
- _trigger_learning 的 KM 寫入原在 try 內,learning 失敗時不寫 KM
- 移至 except 後確保成功/失敗都寫入
- 移除冗餘 import asyncio(已在頂層 import)
- Minor: approval.incident_id or None 防空字串

Important #2: migration 加 PRIMARY KEY
- playbook_id 從 UNIQUE 升為 PRIMARY KEY
- prod DB 已執行 ALTER TABLE ADD PRIMARY KEY

Important #3: s.sequence→s.step_number, s.description→s.command
- embed_playbook() 使用不存在的欄位名,RAG 向量索引靜默失敗
- RepairStep 正確欄位: step_number, command

Important #1: PlaybookService._get_rag_service 不再 Service 層快取
- 改為每次呼叫工廠 get_playbook_rag_service()
- 避免舊實例繞過工廠的 is_closed 重建邏輯

冷啟動修復 (首席架構師建議B+C):
- _trigger_playbook_extraction 執行成功後自動設定
  execution_success=True, effectiveness_score=4, status=RESOLVED
- skip 路徑 logger.debug → logger.info 提升可觀測性

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 12:02:03 +08:00
OG T
f6567751a9 test(knowledge): pgvector 語意搜尋整合測試 (5 tests)
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
- test_save_embedding: CAST AS vector 語法驗證
- test_semantic_search_returns_results: cosine similarity 查詢
- test_semantic_search_threshold_filters: 正交向量被 threshold 過濾
- test_semantic_search_archived_excluded: archived 不出現
- test_list_unembedded_entries: 未 embed 條目列舉

全部 5/5 PASSED (awoooi_dev PostgreSQL + pgvector)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 11:55:09 +08:00
OG T
72d7536ead feat(auto-repair): 完整自動修復閉環 + KM 沉澱串接
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
1. DB Migration: playbooks 資料表 (phase7_playbooks_table.sql)
   - 這是自動修復無法啟動的根本原因 — table 從未建立
   - 5 個索引: status/tags/alert_names/source_incidents/created_at
   - 已在 prod DB 執行

2. playbook_service: 萃取後自動沉澱 KM
   - extract_from_incident() 完成後 fire-and-forget _write_to_km()
   - 內容含症狀模式、修復步驟、信心度、來源 Incident

3. approval_execution: 執行結果沉澱 KM
   - _trigger_learning() 後 fire-and-forget _write_execution_result_to_km()
   - 成功/失敗記錄都寫入,category=execution_result

完整閉環:
告警 → AI分析 → 查Playbook → 決策 → 執行 → 結果寫KM
                                              ↓
                              Incident解決 → KM(knowledge_extractor)
                                          → Playbook萃取 → KM

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 11:54:15 +08:00
OG T
429d81d29b fix(knowledge): I2+I3 首席架構師 Important 修復 — 依賴注入 + exception 細分
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
I2: KnowledgeService 移至 DecisionManager.__init__ 注入
    _query_kb_context_inner 使用 self._knowledge_svc,移除函數內 import 耦合

I3: _query_kb_context exception 細分
    - asyncio.TimeoutError → warning (預期降級)
    - ConnectionError/OSError → warning (Ollama 連線問題,預期降級)
    - Exception → error (非預期,提升監控可見性)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 11:51:43 +08:00
OG T
f846000c8c fix(knowledge): C1 首席架構師必修 — _query_kb_context 5秒 hard timeout
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
C1 修復 (首席架構師 Review 74/100 → 條件通過):
- 抽出 _query_kb_context_inner 含實際查詢邏輯
- _query_kb_context 用 asyncio.wait_for(timeout=5.0) 包裝
- Ollama hang/慢響應最多消耗 5s,保護 30s 決策 SLA
- timeout 時 logger.warning("kb_rag_timeout") 靜默降級

同步移除 LLM prompt 中的 emoji (## 📚 → ## Knowledge Base)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 11:48:57 +08:00
OG T
860dc1d892 feat(knowledge): KB Phase 2 — OpenClaw RAG 整合
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
_dual_engine_analyze 新增 _query_kb_context():
- Incident 分析前語意搜尋相關 KB 條目 (top-3, threshold=0.4)
- 將 KB context 注入 expert_context.diagnosis_context 傳給 LLM
- 失敗時靜默降級,不影響主分析流程
- dual_engine_llm_win log 新增 kb_rag 欄位,可觀測 RAG 命中率

架構: _query_kb_context 透過 get_knowledge_service() 呼叫 Service 層
符合 leWOOOgo 積木化 — decision_manager 不直接存取 DB/pgvector

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 11:46:47 +08:00
OG T
d0f09705e5 fix(auto-repair): 修復三個阻礙自動修復的根本原因
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
1. playbook_rag: Ollama embedding http_client 滾動重啟後 is_closed
   - 新增 _get_http_client() 偵測 is_closed 自動重建
   - singleton get_playbook_rag_service() 加 is_closed 重建判斷

2. telegram: 加入 ai_model 欄位顯示底層判斷模型
   - TelegramMessage.ai_model 欄位
   - format() / format_with_nemotron() 顯示 "Nemotron (nemotron-70b)"
   - openclaw proposal_dict 加入 model 欄位
   - decision_manager / send_approval_card 串接

3. DB: 清除 9 筆 3/26 殭屍 PENDING (mock_fallback CRITICAL 測試記錄)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 11:46:25 +08:00
OG T
12bc94796a fix(knowledge): asyncpg 不支援 :param::type,改用 CAST(:param AS vector)
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
asyncpg 使用 $1 位置參數,:emb::vector 語法導致 PostgresSyntaxError。
save_embedding 和 semantic_search 均改用 CAST(:emb AS vector) 語法。

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 11:43:59 +08:00
OG T
cddc4cb1fc fix(knowledge): 首席架構師 Review 修復 C1+C2+I1+I2 (71→~88/100)
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m16s
C1: IKnowledgeRepository Protocol 補齊 save_embedding + semantic_search +
    list_unembedded_entries,恢復 Interface 先行保護層

C2: embed_all_entries Service 層 raw SQL 移至 Repository.list_unembedded_entries()
    Service 改透過 Protocol 呼叫,符合 leWOOOgo 積木化原則

I1: asyncio.create_task 加入 _pending_tasks set 持有引用,防 GC 回收與
    Shutdown 時 Task 遺失;task done 後自動 discard

I2: OllamaEmbeddingService 從每次 new 改為 KnowledgeService.__init__ 注入,
    單一實例重用

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 11:22:38 +08:00
OG T
8960bba7fe feat(knowledge): pgvector RAG — 語意搜尋 + 背景 Embedding 管線
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
- repository: save_embedding (raw SQL pgvector cast) + semantic_search (cosine <=>)
- service: create_entry 背景 embed + semantic_search + embed_all_entries 批次補 embed
- router: GET /semantic-search (q/limit/threshold) + POST /embed-all 管理端點

向量模型: nomic-embed-text (Ollama 192.168.0.188, 768 dims)
索引: ivfflat cosine (knowledge_entries.embedding vector(768))

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 11:17:24 +08:00
OG T
200c382ca4 feat(metrics): sparklines 串接真實數據 + TOOL_LINKS 移至 API (2026-04-04 ogt)
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m6s
前端 page.tsx:
- 今日事件 sparkline: 過去 6 小時每小時事件數 (從 incidents 計算)
- MTTR sparkline: 各已解決 incident 修復時間序列 (從 incidents 計算)
- 無數據時不顯示 sparkline (undefined 渲染 nothing)
- 移除硬碼 TOOL_LINKS,改讀 API 回傳的 tool.url

後端 monitoring.py:
- 每個 probe 函數回傳 dict 加入 "url" 欄位
- 前端工具連結由後端集中管理,解決多環境問題

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 11:09:04 +08:00
OG T
5e836bde24 test(integration): 新增真實 DB 整合測試 — knowledge_repository + API E2E (2026-04-04 ogt)
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m18s
- tests/integration/conftest.py: 連接 awoooi_dev PostgreSQL,每個測試後 rollback
- tests/integration/test_knowledge_repository.py: 23 個真實 DB 測試
  - create/get_by_id/list/update/delete(軟刪除)/search/categories/view_count
- tests/integration/test_incident_api.py: 7 個 HTTPS 端點測試
  - health check + knowledge API smoke test
- 遵循禁止 Mock 鐵律 (feedback_no_mock_testing.md)
- 本地驗證: 30/30 PASSED

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 02:35:38 +08:00
OG T
9e78d5222a feat(group-chat): 方案B slash commands — /status /incidents /cost /pods /help (2026-04-03 ogt)
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m5s
E2E Health Check / e2e-health (push) Successful in 17s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 20:02:27 +08:00
OG T
e833065043 feat(group-chat): Reply Bot 訊息時只有被Reply的Bot回應 (2026-04-03 ogt)
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m0s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 19:48:10 +08:00
OG T
8d09b18477 fix(group-chat): 移除雙AI互相評論 — 單獨@只有該AI回覆,雙AI路徑不再互評
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 19:44:49 +08:00
OG T
79a770ffe5 feat(group): 移除告警自動 AI 分析 — 老闆指示
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m3s
告警發到群組只顯示卡片,不自動觸發 OpenClaw/NemoClaw 分析
老闆和 AI 可手動在群組討論告警內容

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 19:36:40 +08:00
OG T
b62d7d3eb0 feat(chat): OpenClaw 改用 Gemini 2.0 Flash-Lite (最便宜)
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
Input $0.075/1M, Output $0.30/1M (比 Flash 便宜 25%)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 19:35:13 +08:00
OG T
6cd4280168 feat(chat): NemoClaw Claude API 加 token+費用統計
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
Claude Haiku 4.5: Input $0.80/1M, Output $4.00/1M
每次回覆顯示: token 數 | 本次費用 | 本月累計
Redis key: claude_cost:YYYY-MM,TTL 40 天

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 19:29:22 +08:00
OG T
781a6dac3e feat(chat): NemoClaw→Claude Haiku API + 告警只由 OpenClaw 分析
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m20s
老闆指示 (2026-04-03):
1. NemoClaw 改接 Claude API (claude-haiku-4-5),快速中文對話
2. 群組告警分析只觸發 OpenClaw,NemoClaw 不分析告警
3. OpenClaw/NemoClaw 雙向自然語言對話維持

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 19:19:56 +08:00
OG T
10ad2a67c7 fix(chat): gemini-2.0-flash 修正 + 全形小O支援 + NemoClaw 回 NIM
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
1. Gemini 模型名稱: gemini-1.5-flash → gemini-2.0-flash (404修復)
2. 費用計算: 2.0 Flash 定價 Input $0.10/1M, Output $0.40/1M
3. 全形/半形統一: unicodedata.normalize NFKC,支援「小O」全形輸入
4. NemoClaw: Ollama 188 負載高超時,暫回 NIM nemotron-mini-4b

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 19:17:08 +08:00
OG T
08b02280f8 feat(chat): Gemini 月費用上限 $10 USD + Redis 累計追蹤
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 6m55s
- 每次呼叫前檢查當月累計費用,超過 $10 USD 拒絕呼叫
- Redis key: gemini_cost:YYYY-MM,TTL 40 天
- 每次回覆顯示: token 數 | 本次費用 | 本月累計
- 超限時回傳警告訊息告知老闆

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 19:01:21 +08:00
OG T
2828cd897a feat(chat): OpenClaw→Gemini Flash + NemoClaw→Ollama llama3.2:3b
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
老闆指示 (2026-04-03):
- OpenClaw: Gemini 1.5 Flash API,每次回覆附 token+費用統計
- NemoClaw: Ollama llama3.2:3b,本地快速回應 (3-8s)
- 費用控管: Gemini 月上限 $10 USD

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 18:59:28 +08:00
OG T
fbf122fa1f fix(chat): OpenClaw 改用 NIM llama-3.1-8b 對話 + NemoClaw timeout 120s + 老闆稱謂
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m9s
1. _call_openclaw: 改用 NIM meta/llama-3.1-8b-instruct
   舊的 analyze/incident 是告警 API,回覆是告警格式,不適合對話
2. _call_nemotron: 移除 Ollama fallback,回到純 NIM
3. NEMOTRON_TIMEOUT_SECONDS: 55 → 120 (ConfigMap 已更新)
4. 修正「統帥」→「老闆」

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 18:41:15 +08:00
OG T
2da8da5a25 fix(chat): OpenClaw 改用 Ollama qwen2.5 做對話 + NemoClaw 加 Ollama fallback
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 6m51s
問題: _call_openclaw 用 analyze/incident API → 回覆是告警格式,不是自然語言
修法:
  1. OpenClaw chat → Ollama qwen2.5:7b-instruct (本地,快速,無格式污染)
  2. NemoClaw → NIM 優先,超時 fallback 到 Ollama llama3.2:3b

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 18:30:31 +08:00
OG T
d1436157b7 fix(polling): httpx client timeout 改為分開設定,read=50s > getUpdates 40s
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
根因: httpx.AsyncClient(timeout=30.0) 的 read timeout 30s
     < getUpdates 的 long polling timeout 40s
     導致每次 getUpdates 都被 client 打斷 → polling loop 無法正常收訊息

修法: httpx.Timeout(connect=10s, read=50s) 讓 long polling 正常等待

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 18:29:22 +08:00
OG T
dfc1e19c07 fix(group): 互相評論補充也加 reply_to_message_id 引用原訊息
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 18:24:51 +08:00
OG T
09241f102e fix(group): 群組訊息移到 security interceptor 前 — 修復 whitelist 擋掉所有群組訊息
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m10s
根因: intercept_telegram() 的 whitelist 是字串,user_id 是 int
      型別不匹配 → exception → telegram_chat_unauthorized → 群組訊息全被丟棄
修法: SRE 群組訊息優先路由,不走個人 whitelist
     (群組成員由 Telegram 群組管理員控制,安全邊界已存在)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 18:17:22 +08:00
OG T
203855a56e debug(group): 加 group_routing_check log 診斷 chat_id 不匹配
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 18:12:07 +08:00
OG T
63929a5e87 feat(group): 別名 小O→OpenClaw 小賀→NemoClaw + NemoClaw 強制繁中
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m6s
1. telegram_gateway.py: _handle_group_message 加入別名路由
   - 小O / 小o → 只有 OpenClaw 回應
   - 小賀 / 小贺 → 只有 NemoClaw 回應
   - clean_text 同步移除別名 token

2. chat_manager.py: NEMOCLAW_PERSONA 加強繁體中文強制指令
   - 明確「禁止使用英文或其他語言」防止 Nemotron 自動英文回應

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 18:00:51 +08:00