Commit Graph

37 Commits

Author SHA1 Message Date
OG T
1483218bab feat(approval): 批准/拒絕後立即回應 Telegram + 持久化 message_id 到 DB
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 13m9s
問題:按下 TG 批准/拒絕按鈕後完全沒有任何回應,使用者不知道是否成功。
      Telegram message_id 只存 Redis 24h TTL,過期後無法追蹤。

修正:
- approval_records 加 telegram_message_id / telegram_chat_id 欄位(已 ALTER TABLE)
- approval_db.update_telegram_message() — 持久化 message_id 到 DB
- decision_manager: 發送告警卡片後同時寫 Redis + DB
- telegram_gateway._notify_approval_result() — 批准/拒絕後:
    1. editMessageReplyMarkup 移除批准/拒絕按鈕,保留資訊按鈕
    2. sendMessage reply_to 在原訊息下回覆狀態行
    3. fallback: send_notification 發新訊息
- _handle_group_command: chat_id 改為 _chat_id 消除 IDE lint

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 18:19:31 +08:00
OG T
d8c2969341 feat(telegram): AI 鏈路透明化 — 告警訊息顯示 OpenClaw + Tool Calling 模型/後端
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 12m12s
- nemotron.py: 偵測 OllamaToolProvider vs NvidiaProvider,記錄 tool_model/tool_backend
- openclaw.py: 傳播 nemotron_tool_model/nemotron_tool_backend 到 proposal
- decision_manager.py: 從 proposal_data 提取並傳給 send_approval_card()
- telegram_gateway.py: TelegramMessage 新增兩個欄位,format_with_nemotron 顯示
  "🔧 Tool Calling: llama3.1:8b (Ollama 本機)" 或 "NVIDIA 雲端"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 15:05:16 +08:00
OG T
4f80ba38c0 feat: 告警狀態變更在原訊息延續 (方案 B)
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 12m28s
**telegram_gateway.py**
- 新增 append_incident_update(incident_id, status_line)
  - 從 Redis tg_msg:{id} 取 message_id
  - editMessageReplyMarkup: 移除 Row1(批准/拒絕/靜默),保留 Row2(詳情/重診/歷史)
  - sendMessage reply_to_message_id: 在原訊息下方追加狀態行
  - 找不到 message_id 回傳 False(呼叫方自行 fallback)

**decision_manager.py**
- _push_decision_to_telegram: send_approval_card 後存 tg_msg:{id}=message_id (TTL 24h)
- _push_auto_repair_result: 改用 append_incident_update,找不到 message_id 才 fallback 新訊息

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 14:21:33 +08:00
OG T
2554ac1e60 fix: E2E test 告警識別 + 自動修復結果 Telegram 通知
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
**alert_rule_engine.py**
- _matches() 加入 instance_prefix 匹配(最高優先)
- match_rule() 傳入 instance label 至 _matches
- 用途: e2e-final-* / e2e-test-* instance 可被 YAML 規則識別

**alert_rules.yaml**
- 新增 e2e_smoke_test 規則 (priority=120)
- alertname: E2E_SMOKE_TEST / instance_prefix: e2e-final- / e2e-test- / test-host
- suggested_action: NO_ACTION,顯示「告警鏈路驗證成功」

**decision_manager.py**
- _auto_execute() 成功後發 Telegram 結果通知 
- _auto_execute() 失敗後發 Telegram 失敗通知 
- 新增 _push_auto_repair_result() 函數

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 14:16:15 +08:00
OG T
b66263ad36 fix(decision_manager): resolved Incident 不重送 Telegram
Some checks are pending
CD Pipeline / build-and-deploy (push) Has started running
dedup TTL 10分鐘過期後,已 resolve 的 Incident 仍被重新推送
加入狀態檢查,resolved/closed 直接跳過

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 12:00:39 +08:00
OG T
9361fd1fa7 fix(decision_manager): action 不應 strip_placeholders 避免截斷 deployment name
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
_strip_placeholders 移除 <...> 導致 kubectl rollout restart deployment/<name>
變成 kubectl rollout restart deployment/,Telegram 顯示建議指令不完整

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 11:45:33 +08:00
OG T
c5e475121a fix(telegram): 修復建議指令被截斷 + decision_manager enum string 補正
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
根因 1: telegram_gateway.py suggested_action[:35] 剛好截到 deployment/ 後
  → 改為 [:80],完整顯示 kubectl command

根因 2: 舊 Incident proposal_data 存 enum string (RESTART_DEPLOYMENT)
  → decision_manager.py 加入偵測,用規則引擎重新查 kubectl command

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 11:14:30 +08:00
OG T
4935cfc346 fix(telegram): 重設計訊息格式 + 修復 detail/reanalyze/history 按鈕失效
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 1m26s
- format() / format_with_nemotron(): 移除 ═══ 分隔符,改為簡潔換行佈局
- send_approval_card(): 新增 incident_id 參數,傳入 _build_inline_keyboard()
- decision_manager.py: 呼叫 send_approval_card() 時傳入 incident.incident_id
- 問題根因: incident_id 未傳入 _build_inline_keyboard() 導致第二排按鈕從未渲染

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 12:44:13 +08:00
OG T
429d81d29b fix(knowledge): I2+I3 首席架構師 Important 修復 — 依賴注入 + exception 細分
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
I2: KnowledgeService 移至 DecisionManager.__init__ 注入
    _query_kb_context_inner 使用 self._knowledge_svc,移除函數內 import 耦合

I3: _query_kb_context exception 細分
    - asyncio.TimeoutError → warning (預期降級)
    - ConnectionError/OSError → warning (Ollama 連線問題,預期降級)
    - Exception → error (非預期,提升監控可見性)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 11:51:43 +08:00
OG T
f846000c8c fix(knowledge): C1 首席架構師必修 — _query_kb_context 5秒 hard timeout
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
C1 修復 (首席架構師 Review 74/100 → 條件通過):
- 抽出 _query_kb_context_inner 含實際查詢邏輯
- _query_kb_context 用 asyncio.wait_for(timeout=5.0) 包裝
- Ollama hang/慢響應最多消耗 5s,保護 30s 決策 SLA
- timeout 時 logger.warning("kb_rag_timeout") 靜默降級

同步移除 LLM prompt 中的 emoji (## 📚 → ## Knowledge Base)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 11:48:57 +08:00
OG T
860dc1d892 feat(knowledge): KB Phase 2 — OpenClaw RAG 整合
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
_dual_engine_analyze 新增 _query_kb_context():
- Incident 分析前語意搜尋相關 KB 條目 (top-3, threshold=0.4)
- 將 KB context 注入 expert_context.diagnosis_context 傳給 LLM
- 失敗時靜默降級,不影響主分析流程
- dual_engine_llm_win log 新增 kb_rag 欄位,可觀測 RAG 命中率

架構: _query_kb_context 透過 get_knowledge_service() 呼叫 Service 層
符合 leWOOOgo 積木化 — decision_manager 不直接存取 DB/pgvector

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 11:46:47 +08:00
OG T
d0f09705e5 fix(auto-repair): 修復三個阻礙自動修復的根本原因
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
1. playbook_rag: Ollama embedding http_client 滾動重啟後 is_closed
   - 新增 _get_http_client() 偵測 is_closed 自動重建
   - singleton get_playbook_rag_service() 加 is_closed 重建判斷

2. telegram: 加入 ai_model 欄位顯示底層判斷模型
   - TelegramMessage.ai_model 欄位
   - format() / format_with_nemotron() 顯示 "Nemotron (nemotron-70b)"
   - openclaw proposal_dict 加入 model 欄位
   - decision_manager / send_approval_card 串接

3. DB: 清除 9 筆 3/26 殭屍 PENDING (mock_fallback CRITICAL 測試記錄)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 11:46:25 +08:00
OG T
d2f02999b7 fix(alert-format): 移除 [LLM_OPENCLAW_NEMO] prefix + 擴大根因/建議字數
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m4s
- root_cause: 移除 [source.upper()] 前綴,直接顯示 AI 分析文字
- root_cause 截斷: 80→150 字
- suggested_action 截斷: 50→80 字
- AI provider 來源已在訊息標頭 「🤖 OpenClaw Nemo 仲裁」顯示,不需在根因重複

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 17:43:19 +08:00
OG T
d32d84efce feat(telegram): 接通 Phase 22 Nemotron 雙軌顯示 (ADR-044)
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
E2E Health Check / e2e-health (push) Has been cancelled
根本原因: format_with_nemotron() 已實作但從未被呼叫
- send_approval_card() 新增 nemotron_enabled/tools/validation/latency 參數
- TelegramMessage 建構時傳入 nemotron 欄位
- nemotron_enabled=true 時自動使用 format_with_nemotron() 格式
- _push_decision_to_telegram() 從 proposal_data 提取並傳遞 nemotron 資料

效果: Telegram 同時顯示 OpenClaw 仲裁 + Nemotron 執行方案雙區塊
2026-04-02 ogt: Phase 22 最後一哩路

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 08:59:03 +08:00
OG T
27b4d2a76a fix(telegram): strip <placeholder> 佔位符防止 HTML parse 錯誤
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
E2E Health Check / e2e-health (push) Has been cancelled
OpenClaw 生成的 kubectl_command 含 <受影響服務名稱>
在 Telegram HTML parse mode 下造成 'Can't parse entities'
用 regex strip 所有 <...> 佔位符

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-01 22:50:07 +08:00
OG T
dd526684ab feat(ai): Phase 22 OpenClaw + Nemotron 協作架構 (ADR-044)
All checks were successful
E2E Health Check / e2e-health (push) Successful in 17s
統帥批准實作「仲裁-執行分工」架構:
- OpenClaw = 仲裁者 (Why + Risk Level)
- Nemotron = 執行者 (How + kubectl Command)

新增功能:
- config.py: ENABLE_NEMOTRON_COLLABORATION Feature Flag
- openclaw.py: generate_incident_proposal_with_tools()
- openclaw.py: _call_nemotron_tools() Nemotron 呼叫
- telegram_gateway.py: TelegramMessage Nemotron 欄位
- telegram_gateway.py: format_with_nemotron() 雙區塊格式
- decision_manager.py: 整合協作方法
- proposal_service.py: 整合協作方法

觸發條件:
- LOW 風險 → 僅 OpenClaw
- MEDIUM/HIGH/CRITICAL → OpenClaw + Nemotron 雙軌

首席架構師審查: 83/100 條件通過

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-31 18:52:53 +08:00
OG T
b97f9364fb feat(k8s): add Worker HPA + fix non-AI confidence values
Wave 2 Deployment:
- Worker HPA: min:1 max:3, CPU 70%, Memory 80%
- 前置條件: XCLAIM + terminationGracePeriodSeconds:90 (Wave 1 )
- 比 API/Web 更保守的擴縮策略 (120s up, 600s down)

Confidence Fix:
- 非 AI 分析來源 (fallback/playbook/historical/consensus) 設 confidence=0.0
- 避免混淆 AI 信心度與其他指標 (成功率/相似度)
- 涉及: github_webhook, decision_manager, intent_classifier, learning_service

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 16:09:37 +08:00
OG T
5f9a6a7e55 fix(ai): 移除假信心分數 + 顯示 AI 模型來源
問題: AI 仲裁顯示硬編碼信心分數 (0.75/0.88/0.92/0.70)

修復:
- decision_manager: 預設 confidence 0.75 → 0.0
- decision_manager: Expert System confidence=0.0 + is_rule_based
- openclaw: 所有 Mock Response confidence → 0.0
- telegram_gateway: 新增 ai_provider 欄位
- telegram_gateway: 動態來源標籤 (Ollama/Gemini/Claude/規則匹配)

Telegram 卡片顯示:
- confidence > 0 + provider=ollama → 🤖 Ollama 仲裁
- confidence > 0 + provider=gemini → 🤖 Gemini 仲裁
- confidence > 0 + provider=claude → 🤖 Claude 仲裁
- confidence == 0 → ⚙️ 規則匹配

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 15:19:51 +08:00
OG T
138ef0c2db fix(api): 修復 7 個 Lint 錯誤 (unused imports + zip strict + dict comprehension)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-27 14:42:47 +08:00
OG T
abc21c735e feat(api): P1 Telegram 按鈕優化 - 稍後/靜默
新增按鈕:
-  稍後 (snooze): 延遲 30 分鐘後再提醒
- 🔕 靜默 1h (silence): 同類資源告警靜默 1 小時

實作細節:
- telegram_gateway.py: 新增 _handle_snooze/_handle_silence
- decision_manager.py: 發送前檢查 silence 狀態
- Redis Key: telegram_snooze:{approval_id}, telegram_silence:{resource_name}
- Skill 03 v1.5 → v1.6

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-27 09:50:28 +08:00
OG T
e34b0f2e9a fix(api): Telegram 去重 + 修復 INC-INC-INC- 重複前綴
- 加入 Redis 去重機制 (10 分鐘 TTL)
- 修復 approval_id 重複添加 INC- 前綴

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-27 09:27:40 +08:00
OG T
855b7f5593 fix: 修復 Telegram 轟炸問題
問題: 765ee39 的修改導致 COMPLETED 狀態下 incident 未解決時
會建立新 decision,每次 poll 都觸發 Telegram 發送

修復: COMPLETED 狀態直接返回 existing_token,不建立新 decision

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 22:24:03 +08:00
OG T
ce7f8a1b23 feat(api): ADR-030 Phase 4 自動執行機制
實作低風險操作自動執行策略:

1. auto_approve.py - 自動執行策略服務
   - AutoApprovePolicy: 評估是否可自動執行
   - 條件: LOW 風險 + 信任分數 >= 5 + Playbook 成功率 >= 95%
   - CRITICAL 永遠不自動執行
   - 完整審計追蹤

2. trust_engine.py - 新增 singleton
   - get_trust_manager(): 取得全域 TrustScoreManager

3. decision_manager.py - 整合自動執行 (Tier 3 紅區)
   - Step 5 加入 AutoApprovePolicy 判斷
   - 條件滿足時跳過 Telegram,直接執行
   - _auto_execute(): 自動執行邏輯
   - 失敗時 fallback 到人工審核

流程:
Incident → 分析 → AutoApprovePolicy 評估
  ├─ 可自動執行 → 直接執行 → 完成
  └─ 需人工審核 → Telegram 通知 → 等待批准

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 22:13:10 +08:00
OG T
17ee8838be revert: 還原 Telegram + CD 到正常狀態
還原檔案到 d071019 版本:
- decision_manager.py: 移除 Redis dedup 邏輯
- telegram_gateway.py: 還原 INC- 前綴邏輯
- cd.yaml: 移除 selector immutable 處理和 Token injection

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 22:10:52 +08:00
OG T
60e9538889 feat(api): ADR-030 Phase 2 診斷資料收集強化
實作智能自動修復系統的資料收集層:

1. k8s_diagnostics.py - K8s 診斷服務
   - Pod Events/Logs/ResourceUsage 收集
   - CrashLoopBackOff/OOM/ImagePull 偵測
   - 非同步並行收集 + 錯誤容忍

2. diagnosis_aggregator.py - 診斷聚合器
   - 整合 K8s + SignOz + Expert Rules
   - DiagnosisContext 提供結構化 LLM Prompt
   - DiagnosisSignal 信號分析

3. decision_manager.py - 決策引擎整合
   - Step 2.5 加入診斷收集
   - 傳遞 diagnosis_context 給 LLM

4. openclaw.py - LLM Prompt 增強
   - 整合 K8s/SignOz 深度診斷上下文
   - 支援 diagnosis_signals 摘要

ADR-030 架構: 診斷先行,根因分析,非盲目重啟

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 21:55:50 +08:00
OG T
bb6151cf44 revert: 移除 Telegram Redis dedup 邏輯
原因: dedup 邏輯導致 Telegram 完全無法發送
保留: INC- 前綴修復 (approval_id = incident.incident_id)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 21:53:39 +08:00
OG T
d148756b67 feat(api): LLM 整合 Expert System 診斷上下文
長期方案實作: Expert 診斷 + LLM 智能分析

變更:
1. decision_manager._dual_engine_analyze():
   - 測試資源跳過 LLM (省錢)
   - 傳遞 Expert 診斷上下文給 LLM
   - LLM 失敗時根據診斷調整回應

2. openclaw.generate_incident_proposal():
   - 新增 expert_context 參數
   - Prompt 包含 Expert 診斷結果
   - 引導 LLM 基於診斷做決策

流程:
Playbook → Expert診斷 → LLM(with context) → 智能建議

這是「先診斷根因,再決定行動」的正確實作

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 21:41:26 +08:00
OG T
2ef7daccde feat(api): Expert System 智能診斷重構 - 根因優先
問題: 原本的 Expert System 只會建議「重啟」,不診斷根因

重構:
1. 分層診斷:
   - 層 1: 測試資源過濾 (test/demo/tmp 自動忽略)
   - 層 2: 規則匹配 (更精確的 pattern)
   - 層 3: 診斷指令 (提供 kubectl 診斷命令)

2. 根因優先:
   - OOM → 檢查記憶體用量,非重啟
   - CrashLoop → 查看崩潰日誌,非重啟
   - ImagePull → 檢查映像配置,非重啟
   - Default → 人工診斷,非盲目重啟

3. 人工標記:
   - 未知問題標記 human_review_required
   - 降低 confidence (0.5)

這才是正確的自動化修復:先診斷根因,再決定行動

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 21:35:20 +08:00
OG T
35aa690bf1 fix(decision_manager): 修復 Telegram 重複發送問題
問題:
- Phase 6.5 (765ee39) 修改導致每次 poll 都建立新 decision
- 觸發 Telegram 轟炸 (INC-INC-INC- prefix bug)

修復:
- 移除 INC- 重複前綴 (line 83)
- 加入 Redis 去重機制 (10 分鐘 TTL)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 19:21:36 +08:00
OG T
d1f0bbfbcd refactor(api): Phase 17 P1 Tier 3 紅區服務 Protocol 定義
新增 5 個紅區核心服務的 Protocol 介面:
- IDecisionManager: 決策狀態機
- ITrustScoreManager: 信任評分引擎
- IIncidentEngine: 事件處理引擎
- IMultiSigRedisService: 分散式鎖服務
- ITelegramSecurityInterceptor: 安全攔截器

符合 leWOOOgo 積木化規範:
- 支援依賴注入 (DI)
- 便於測試時 Mock
- 型別約束確保實作一致性

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 12:49:30 +08:00
OG T
2e75a20150 feat(api): Phase 7.5-7.6 Playbook 整合決策與自動萃取
Phase 7.5: DecisionManager 三軌決策
- 新增 Playbook 優先匹配 (similarity >= 85%)
- 三軌決策順序: Playbook > LLM > Expert System
- 整合 PlaybookService 推薦引擎

Phase 7.6: 自動萃取機制
- approval_execution.py 成功執行後觸發萃取
- 條件: RESOLVED/CLOSED + effectiveness >= 4
- 滿分 (5) 自動核准 Playbook

測試:
- 13 個 Playbook 單元測試全部通過
- 修復 Incident 模型欄位對應 (reasoning_steps)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 11:09:25 +08:00
OG T
765ee39a90 feat(api): Phase 6.5 Statistics API + Y/n 按鈕修復
新增:
- /stats/incidents/summary - 事件總覽統計
- /stats/incidents/resolution - 解決時間 P50/P95
- /stats/ai-performance - AI 提案效能
- /stats/services/affected - 受影響服務排名

修復:
- Y/n 按鈕永久禁用問題 (decision.state=completed 但 incident 未解決)
- decision_manager.py: 只有當 incident 也已解決才返回已完成的 decision

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-24 09:50:03 +08:00
OG T
4f1c8ae473 fix(ci): Resolve Python and TypeScript lint errors
- Fix 35 Python ruff errors (B904, F841, E722, E741, B007, B008)
- Add eslint config for lewooogo-core package
- Update pyproject.toml to new ruff lint config format
- Relax frontend eslint rules to warnings for unused vars
- Allow console.* for debugging (TODO: unified logger)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-24 09:20:56 +08:00
OG T
6f049877fc fix(lint): ruff auto-fix + lewooogo-core src 加入 git
- Python: ruff --fix 修復 280 個 lint 錯誤
- lewooogo-core: src/ 目錄未追蹤,導致 CI eslint 失敗

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 23:51:37 +08:00
OG T
eca3759fde fix(telegram): 修復 Signal Worker 流程 Telegram 通知斷鏈
問題:
- Phase 6 Signal Worker 新架構沒有整合 Telegram 推送
- 決策就緒時 Telegram 完全沒收到通知
- 這是嚴重的監控盲點!

修復:
- 新增 _push_decision_to_telegram() 推送函數
- DecisionManager 決策 READY 時自動推送
- 非阻塞執行 (asyncio.create_task)

Telegram 通知內容:
- 告警來源 (LLM/Expert System)
- 受影響服務
- 建議動作
- 風險等級
- 信心分數

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 23:22:26 +08:00
OG T
7478dc0254 feat(phase6-9): Complete modular architecture and Agent Teams
Phase 6.4 - Modular Architecture:
- Add lewooogo-brain adapters for LLM providers
- Add lewooogo-data dual memory (Redis + PostgreSQL)
- Implement consensus engine for multi-agent decisions
- Add incident memory service for historical context

Phase 9 - Agent Teams (Claude Agent SDK):
- Add base agent class with Claude Sonnet 4 integration
- Implement action planner, blast radius, and security agents
- Add agent API endpoints and proposal workflow
- Integrate ADR-009 OpenClaw Agent Teams architecture

DevOps & CI/CD:
- Add GitHub Actions CI/CD workflows (ci.yaml, cd.yaml)
- Add pre-commit hooks and secrets baseline
- Add docker-compose for local development
- Update Kubernetes network policies

Frontend Improvements:
- Add auto-healing error boundary component
- Update i18n messages for agent features
- Enhance dual-state incident card with execution feedback

Documentation:
- Add 7 ADRs covering MCP, design system, architecture decisions
- Update ARCHITECTURE_MEMORY.md with modular design
- Add GLOBAL_RULES.md and SOUL.md for project identity

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 18:40:36 +08:00
OG T
0aaf6a276b feat(api,web): Phase 6.5 DecisionManager with dual-engine fallback
Backend:
- Add DecisionManager with state machine (INIT→ANALYZING→READY→EXECUTING)
- Implement Expert System rules engine (100% local, never fails)
- Dual-engine: LLM (primary) + Expert System (fallback)
- Auto-generate decision_token for each incident
- 30-second timeout guarantee

Frontend:
- Use decision.state to unlock [Y/n] buttons
- Display AI action suggestion in card
- Show source indicator [AI] or [EXP]
- Generate proposal on-demand if needed

Fixes: UI locked with hourglass when LLM times out

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 13:19:55 +08:00