OG T
7768924fea
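fix(flywheel): remove Telegram buttons after auto-repair + exclude heartbeat alerts from the flywheel
...
CD Pipeline / build-and-deploy (push) Failing after 6m56s
Problem: after a successful auto-repair, the Telegram card still showed the Approve/Reject/Silence buttons
Fix 1: Telegram card feedback loop (compliant with the modular building-block rules):
- telegram_gateway.send_approval_card: automatically stores tg_approval:{id} in Redis after sending
- telegram_gateway.mark_auto_repaired(): new method that removes the buttons and replies with the result
- _try_auto_repair_background: now calls gateway.mark_auto_repaired() (Service layer)
Fix 2: exclude heartbeat/watchdog alerts from the flywheel:
- constants.py: is_heartbeat_alertname() + HEARTBEAT_ALERT_NAMES
- NoAlertsReceived2Hours and similar alerts no longer trigger _try_auto_repair_background
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>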
2026-04-10 11:52:04 +08:00
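The heartbeat exclusion described in this commit could look roughly like the sketch below. The names is_heartbeat_alertname and HEARTBEAT_ALERT_NAMES come from the commit message; the set contents beyond NoAlertsReceived2Hours and the should_trigger_auto_repair helper are assumptions made for illustration.

```python
# Only "NoAlertsReceived2Hours" is named in the commit; "Watchdog" is an
# assumed extra entry used here to illustrate a second heartbeat-style alert.
HEARTBEAT_ALERT_NAMES = frozenset({"NoAlertsReceived2Hours", "Watchdog"})


def is_heartbeat_alertname(alertname: str) -> bool:
    """Return True for heartbeat/watchdog alerts that need no repair action."""
    return alertname.strip() in HEARTBEAT_ALERT_NAMES


def should_trigger_auto_repair(alertname: str) -> bool:
    """Hypothetical gate: heartbeat alerts never enter the auto-repair flywheel."""
    return not is_heartbeat_alertname(alertname)
```

A caller would check should_trigger_auto_repair(alertname) before scheduling the background repair task.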
OG T
8c2983b70a
fix(api+web): add K3s NodePort origins to CORS + add signer_id/name to the sign request
...
CD Pipeline / build-and-deploy (push) Has been cancelled
CORS (config.py):
- Add http://192.168.0.125:32335 (K3s VIP NodePort)
- Add http://192.168.0.120:32335 + 121:32335 (K3s nodes)
- Before: opening :32335 from an intranet browser got every API call blocked by CORS
(root cause of "Failed to fetch" on incidents and "unable to connect" on monitoring)
sign body (pending-approvals-card.tsx):
- signer: 'web-ui' → signer_id: CURRENT_USER.id + signer_name: CURRENT_USER.name
- Before: POST /approvals/{id}/sign appeared to return 403 (a 422 for missing required fields was misreported as 403)
- The actual error was 422 "Field required" for signer_id + signer_name
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 19:50:48 +08:00
OG T
7857c25677
feat: local Ollama tool calling replaces NVIDIA cloud (44s→~5s)
...
CD Pipeline / build-and-deploy (push) Has been cancelled
- nvidia_provider.py: add OllamaToolProvider
- Implements the INvidiaProvider protocol and calls Ollama /v1/chat/completions
- Model: llama3.1:8b (the most stable 8B model for tool calling)
- Latency: 44s → ~5s (local M1 Pro at 192.168.0.111)
- get_nvidia_provider() switches on USE_OLLAMA_TOOL_CALLING
- config.py: USE_OLLAMA_TOOL_CALLING=True (enabled by default), OLLAMA_TOOL_MODEL=llama3.1:8b
- Rollback: USE_OLLAMA_TOOL_CALLING=False → restores the cloud NvidiaProvider
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 14:55:04 +08:00
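The flag-based switch in get_nvidia_provider() might be sketched as follows. The flag and model names are taken from the commit; the Settings dataclass and the stripped-down provider classes are stand-ins invented for this example, not the project's real classes.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    # Defaults mirror the values stated in the commit message.
    USE_OLLAMA_TOOL_CALLING: bool = True
    OLLAMA_TOOL_MODEL: str = "llama3.1:8b"


@dataclass
class OllamaToolProvider:
    # Stand-in for the real provider; only carries what the sketch needs.
    model: str
    name: str = "ollama"


@dataclass
class NvidiaProvider:
    # Stand-in for the cloud provider.
    name: str = "nvidia"


def get_nvidia_provider(settings: Settings):
    """Return the local Ollama provider when the flag is on, else the cloud one."""
    if settings.USE_OLLAMA_TOOL_CALLING:
        return OllamaToolProvider(model=settings.OLLAMA_TOOL_MODEL)
    return NvidiaProvider()
```

Keeping the factory name unchanged means callers need no edits, which is what makes the rollback a single flag flip.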
OG T
8b5db2f58e
feat(infra): switch Ollama to M1 Pro 192.168.0.111 + update NetworkPolicy
...
CD Pipeline / build-and-deploy (push) Has been cancelled
- OLLAMA_URL: 188 → 111 (M1 Pro, 40+ tok/s vs 0.45 tok/s)
- OPENCLAW_DEFAULT_MODEL: qwen2.5:7b-instruct → deepseek-r1:14b (strongest reasoning for SRE)
- OPENCLAW_TIMEOUT: 90s → 120s (deepseek-r1:14b measured at 54s in the slowest case)
- NetworkPolicy v1.3: add egress to 192.168.0.111:11434, remove the Ollama port on 188
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 22:05:14 +08:00
OG T
84f1f9f021
refactor(config): GITHUB_WEBHOOK_SECRET → GITEA_WEBHOOK_SECRET (ADR-059)
2026-04-05 14:25:47 +08:00
OG T
5ad403b287
fix(p0): v4.3 - measurements confirm CPU-only Ollama is unusable; DIAGNOSE always goes through NIM
...
Measurements (2026-04-05):
- Ollama llama3.2:3b CPU-only: 238s to return {"ok":true}; unusable in production
- Nemotron NIM: 2.2s~27.3s, avg 10.6s; it has been the primary provider all along (since Phase 22)
- NIM has never had a privacy issue; Incident data has always been sent to cloud GPUs
Changes:
- ai_router.py: _local_fallback_chain deprecated (empty list)
- ai_router.py: DIAGNOSE route/route_sync reverted to _full_fallback_chain
- config.py: update the timeout comments to reflect the measurements
- test_p0_diagnose_routing.py: update docstring
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 01:49:06 +08:00
OG T
a81bf50537
feat(drift): ADR-057 adopt() implemented via the Gitea PR API
...
- DriftAdoptService: creates branch + commit + PR through the Gitea REST API
instead of running git inside the API Pod (fixes security vulnerability C2)
- adopt() endpoint: 501 → real implementation (calls DriftAdoptService)
- config.py: add GITEA_API_URL / GITEA_API_TOKEN / GITEA_REPO_OWNER / GITEA_REPO_NAME
- K8s secret awoooi-secrets now carries GITEA_API_TOKEN
- drift.py: remove the unused interpreter variable in trigger_drift_scan
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 00:39:29 +08:00
OG T
96d5e18924
fix(p0): benchmark-driven fixes - timeouts adjusted per benchmark, cloud Nemotron removed from _local_fallback_chain
...
- config.py: NEMOTRON_DIAGNOSE_TIMEOUT_SECONDS=60s (NIM measured 11-45s + 15s buffer)
- config.py: OLLAMA_DIAGNOSE_TIMEOUT_SECONDS=200s (Ollama measured ~173s + 27s buffer)
- ollama.py: add per-task timeout (diagnose/force_local use 200s)
- ai_router.py: remove Nemotron from _local_fallback_chain (NIM is cloud, so it must not be in the local chain)
- ai_router.py: v4.2 - Option C scenario-based routing formally established
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 00:29:09 +08:00
OG T
3455044457
feat(phase25): Nemotron proactive defense, three tracks P0+P1+P2 fully implemented
...
CD Pipeline / build-and-deploy (push) Failing after 38s
Type Sync Check / check-type-sync (push) Failing after 35s
P0 - DIAGNOSE Privacy-First Routing:
- ai_router.py: _local_fallback_chain [NEMOTRON→OLLAMA→REJECT]
- DIAGNOSE intent override changed to NEMOTRON (was OLLAMA)
- DIAGNOSE fallback uses the local-only chain and never touches the cloud
- On total failure: REJECT + Telegram notification
- config.py: NEMOTRON_DIAGNOSE_TIMEOUT_SECONDS=30, OLLAMA_DIAGNOSE_TIMEOUT_SECONDS=60
- nemotron.py: choose the timeout based on context[task_type]
P1 - Knowledge Auto-Harvesting:
- models/knowledge.py: EntryType.AUTO_RUNBOOK + ANTI_PATTERN + symptoms_hash
- EntryStatus.PUBLISHED (ANTI_PATTERN is published directly, no review needed)
- models/playbook.py: SymptomPattern.compute_hash() (16-character deterministic hash)
- services/runbook_generator.py: NemotronRunbookGenerator (v1.1)
- generate_runbook() → AUTO_RUNBOOK (DRAFT) + Telegram review card
- generate_anti_pattern() → ANTI_PATTERN (PUBLISHED) + Telegram notification
- Uses nvidia.chat() (the correct interface); minimal fallback when Nemotron times out
- knowledge_service.py: check_anti_pattern(symptoms_hash, days=7)
- db/models.py: symptoms_hash VARCHAR(16) + ix_knowledge_symptoms_hash
- repositories/knowledge_repository.py: create() supports symptoms_hash + status
- auto_repair_service.py: anti_pattern_gate in decide() + runbook hook in execute()
- migrations/phase8_symptoms_hash.sql: ALTER TABLE + partial index + PUBLISHED constraint
P2 - Config Drift Detection:
- models/drift.py: DriftItem/DriftReport/DriftLevel/DriftIntent/DriftStatus
- services/drift_detector.py: GitStateReader + K8sStateReader + DriftDetector
- services/drift_analyzer.py: allowlist filtering + DriftLevel grading
- services/drift_interpreter.py: NemotronDriftInterpreter (intent analysis; does not generate remediation commands)
- services/drift_remediator.py: rollback(kubectl apply) + adopt(git push gitea)
- api/v1/drift.py: POST /scan, GET /reports, POST /rollback, POST /adopt
- migrations/phase9_drift_reports.sql: drift_reports table
- k8s/drift-cronjob.yaml: CronJob for an automatic hourly scan
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 12:35:05 +08:00
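A 16-character deterministic hash such as the one SymptomPattern.compute_hash() is said to produce could be built as below. The commit does not state the hash function or the fields hashed, so SHA-256 over a canonical JSON encoding, and the alertname/labels inputs, are assumptions for this sketch.

```python
import hashlib
import json


def compute_symptoms_hash(alertname: str, labels: dict[str, str]) -> str:
    """Return a 16-hex-character digest that fits a VARCHAR(16) column.

    The inputs are serialized with sorted keys and fixed separators so that
    dict ordering never changes the result.
    """
    canonical = json.dumps(
        {"alertname": alertname, "labels": labels},
        sort_keys=True,
        separators=(",", ":"),
    )
    # Truncating SHA-256 to 16 hex chars keeps 64 bits of the digest.
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
```

Because the digest is deterministic, the same symptom pattern maps to the same key on every run, which is what a lookup like check_anti_pattern(symptoms_hash, days=7) would rely on.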
OG T
c65ed5b1c9
feat(telegram): SRE war-room group Triumvirate (ADR-053)
...
CD Pipeline / build-and-deploy (push) Successful in 7m6s
- config.py: add OPENCLAW_BOT_TOKEN / NEMOTRON_BOT_TOKEN / SRE_GROUP_CHAT_ID
- telegram_gateway.py: send_to_group / send_as_openclaw / send_as_nemotron / trigger_group_ai_discussion / _send_approval_card_to_group
- send_approval_card: after the alert is sent, asynchronously triggers a two-way AI discussion in the group
- configmap: SRE_GROUP_CHAT_ID=-1003711974679
- secrets: OPENCLAW_BOT_TOKEN / NEMOTRON_BOT_TOKEN with CHANGE_ME placeholders
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 17:16:05 +08:00
OG T
73e8f8ab77
feat(ai): Phase 24-A+B1 - AI Provider Registry + strangler wrapper (ADR-052)
...
E2E Health Check / e2e-health (push) Successful in 16s
CD Pipeline / build-and-deploy (push) Has been cancelled
Brain Layer dual-track Registry architecture:
- New src/services/ai_providers/ directory (interfaces + 4 providers)
- OllamaProvider (local, rca/chat/code_review)
- GeminiProvider (cloud, rca/chat)
- ClaudeProvider (cloud, rca/chat/code_review)
- OpenClawNemoProvider (cloud, rca; delegates 188→NIM)
- Extend ai_router.py with:
- AIProviderRegistry (dynamic registration, enable/disable)
- AIRouterExecutor (cache + gates CB/RL/Sem + execution)
- openclaw.py strangler wrapper: USE_AI_ROUTER=true takes the new path
- config.py + ConfigMap add USE_AI_ROUTER=false (safe default)
- ADR-052 formal document (14 decisions, D1-D14)
- HARD_RULES v1.7 adds the AI Router rules
Safety: USE_AI_ROUTER=false by default; it must be enabled manually for observation
Rollback: kubectl set env deployment/awoooi-api USE_AI_ROUTER=false
2026-04-02 ogt: first batch of the Phase 24 implementation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 13:16:09 +08:00
OG T
c9c60c3a61
feat(mcp-integrations): Phase S architecture fixes + MCP integration groundwork
...
E2E Health Check / e2e-health (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
Type Sync Check / check-type-sync (push) Failing after 22s
Phase S tech-debt fixes (chief architect review 82→complete):
- S-01: move generate_alert_fingerprint to the AlertAnalyzer.generate_fingerprint() staticmethod
- S-04: remove the Pydantic v2 deprecated json_encoders (use native datetime serialization directly)
Sentry MCP integration (Phase 23):
- ADR-048: Sentry→OpenClaw AI Triage architecture decision
- sentry_webhook_service.py: parse/analyze/create_incident/build_message in the Service layer
- config.py: SENTRY_WEBHOOK_SECRET (Fail-Closed HMAC-SHA256)
Playwright MCP integration (short term):
- smoke.spec.ts: E2E smoke test for 5 pages (home/dashboard/incidents/approvals/terminal)
- cd.yaml: E2E Smoke Test step + Telegram 🎭 Smoke status notification
Long-term planning ADRs:
- ADR-049: Figma Code Connect design system sync
- ADR-050: Telegram interactive Incident 2.0 (6-button Inline Keyboard)
- ADR-051: Context7 dependency upgrade advisor (Next.js 14→15, FastAPI 0.115→0.128)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-01 16:20:57 +08:00
OG T
394f85954e
fix(api): fix Y/n 404 + disable Multi-Sig
...
E2E Health Check / e2e-health (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
1. proposal_service._load_incident() now uses incident_service.get_from_working_memory()
- brain engine uses the awoooi:incidents: prefix, but the data actually lives under the incident: prefix
- The prefix mismatch caused a permanent 404 (every Y/n button failed)
- 2026-04-02 ogt
2. trust_engine CRITICAL required_signatures 2→1
- Commander's decision: every approval needs only one level of sign-off
- 2026-04-02 ogt
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-01 16:16:28 +08:00
OG T
eccf61fbc9
fix(ai): fix fake confidence + lift Shadow Mode (Phase 22 P1)
...
CD Pipeline / build-and-deploy (push) Has been cancelled
E2E Health Check / e2e-health (push) Has been cancelled
1. openclaw.py: confidence on LLM truncation 0.82→0.0 (fabricated confidence is forbidden)
2. prompts.py: NEMOTRON schema example values replaced with placeholders so the model does not copy 0.75
3. configmap: SHADOW_MODE_ENABLED=false, allowing automatic execution for low risk
Thresholds: confidence≥90% + trust_score≥5 + playbook_success≥95%
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-01 15:59:42 +08:00
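The auto-execution thresholds listed in this commit can be read as a single conjunctive gate. The sketch below is an illustration under that reading; the RepairCandidate type, its field names, and may_auto_execute are hypothetical, and only the three thresholds, the low-risk condition, and the shadow-mode flag come from the commit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepairCandidate:
    risk: str                 # e.g. "low", "medium", "high", "critical"
    confidence: float         # model confidence in [0.0, 1.0]
    trust_score: int          # trust score as an integer
    playbook_success: float   # historical playbook success rate in [0.0, 1.0]


def may_auto_execute(candidate: RepairCandidate, shadow_mode_enabled: bool = False) -> bool:
    """Allow automatic execution only when every condition holds."""
    if shadow_mode_enabled:
        # In shadow mode nothing is executed automatically.
        return False
    return (
        candidate.risk == "low"
        and candidate.confidence >= 0.90
        and candidate.trust_score >= 5
        and candidate.playbook_success >= 0.95
    )
```

Note that a truncated LLM response, which now yields confidence 0.0, fails this gate by construction.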
OG T
0fd53422c6
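fix(openclaw): move confidence/reasoning to the front of NEMOTRON_SYSTEM_PROMPT
...
CD Pipeline / build-and-deploy (push) Failing after 5m36s
E2E Health Check / e2e-health (push) Successful in 17s
Nemo-4B is a 4B-parameter model with limited output length. With confidence/reasoning at the end of the schema,
they were often truncated, which made the fallback at openclaw.py:1045 fill in a fake 0.82.
Fix: move confidence and reasoning to the first two fields of the schema so that the most critical fields
survive even when the output is truncated. The prompt also explicitly forbids copying the example values.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>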
2026-04-01 13:19:18 +08:00
OG T
22de22c989
refactor(phase-s): Phase S tech-debt cleanup - five architecture improvements
...
S-01: move generate_alert_fingerprint() to alert_analyzer_service (Router→Service)
S-02: remove the obsolete USE_NEW_ENGINE config (Phase R has served its purpose)
S-03: github_webhook.py linter cleanup (Field unused + delivery_id noqa)
S-04: Pydantic v2 migration - approval/incident models (class Config → ConfigDict)
S-05: Skill 09 v1.1 update (USE_NEW_ENGINE deprecation note)
Tests: 393 passed, zero failures
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-01 13:12:02 +08:00
OG T
2ba61acf72
fix(api): Phase R-R2.2 P2 fixes from the chief architect review (72/100)
...
P2-01 signal_worker.py: persisted_to_pg now uses getattr to guard against a BrainIncident AttributeError
P2-02 IIncidentEngine Protocol: update_incident_status → update_status to match the brain implementation
P2-03 config.py USE_NEW_ENGINE: marked as no longer effective + rollback path corrected (git revert rather than kubectl)
ADR-046: Option B (IncidentConverter) decided; pending-implementation list updated
ADR-024: review conclusion + official rollback command updated
Skill 02: v2.5 version record
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-31 22:33:08 +08:00
OG T
dd526684ab
feat(ai): Phase 22 OpenClaw + Nemotron collaboration architecture (ADR-044)
...
E2E Health Check / e2e-health (push) Successful in 17s
The Commander approved implementing the "arbiter/executor split" architecture:
- OpenClaw = arbiter (Why + Risk Level)
- Nemotron = executor (How + kubectl Command)
New features:
- config.py: ENABLE_NEMOTRON_COLLABORATION Feature Flag
- openclaw.py: generate_incident_proposal_with_tools()
- openclaw.py: _call_nemotron_tools() Nemotron call
- telegram_gateway.py: TelegramMessage Nemotron fields
- telegram_gateway.py: format_with_nemotron() two-section format
- decision_manager.py: integrate the collaboration method
- proposal_service.py: integrate the collaboration method
Trigger conditions:
- LOW risk → OpenClaw only
- MEDIUM/HIGH/CRITICAL → OpenClaw + Nemotron dual track
Chief architect review: 83/100, conditional pass
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-31 18:52:53 +08:00
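The trigger conditions in this commit amount to a small routing rule on risk level. A minimal sketch follows, assuming a RiskLevel enum and a select_agents helper that do not appear in the commit; only the feature flag name and the LOW versus MEDIUM/HIGH/CRITICAL split are taken from it.

```python
from enum import Enum


class RiskLevel(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


def select_agents(risk: RiskLevel, enable_nemotron_collaboration: bool = True) -> tuple[str, ...]:
    """Pick which agents handle an incident.

    LOW risk, or the ENABLE_NEMOTRON_COLLABORATION flag being off, keeps
    OpenClaw alone; anything riskier adds Nemotron as the executor.
    """
    if not enable_nemotron_collaboration or risk is RiskLevel.LOW:
        return ("openclaw",)
    return ("openclaw", "nemotron")
```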
OG T
d03668669b
fix(openclaw): optimize for Nemo-4B with lightweight prompt and resilient parsing
E2E Health Check / e2e-health (push) Successful in 26s
2026-03-31 15:59:58 +08:00
OG T
fb0ddf305c
fix(api): fix dockerfile to include models.json, remove huge prompt example to fit 4K limit
E2E Health Check / e2e-health (push) Successful in 17s
2026-03-31 14:03:34 +08:00
OG T
46843c8e19
fix(nvidia): revert to nemotron-mini, truncate context for 4K limit, enforce precise confidence
E2E Health Check / e2e-health (push) Successful in 17s
2026-03-31 13:57:10 +08:00
OG T
bb85d89874
refactor(api): Phase A P1 quick wins (3 items)
...
CD Pipeline / build-and-deploy (push) Has been cancelled
E2E Health Check / e2e-health (push) Has been cancelled
1. Constant extraction: SSE_DELAY_SECONDS, MAX_APPROVAL_DISPLAY
2. Safer error messages: sanitize_error_message() strips sensitive information
3. Configurable CI/CD alertname: is_cicd_alertname() function
Chief architect review P1 improvements (non-blocking)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-30 01:44:42 +08:00
OG T
27509db212
feat(api): Wave 1 safety net - Circuit Breaker + Global Repair Cooldown
...
ADR-038: OpenClaw two-layer protection
- Layer 1: Circuit Breaker (5 failures → 60s cooldown)
- Layer 2: Concurrency Semaphore (max 3 concurrent)
- Add src/core/circuit_breaker.py
ADR-039: global repair circuit breaker
- Global Cooldown: 5 repairs/15min → freeze
- StatefulSet Blacklist: postgres/redis/clickhouse must never be auto-restarted
- Add src/services/global_repair_cooldown.py
- Integrated into auto_repair_service.py
Tests:
- test_circuit_breaker.py (state transitions + Semaphore)
- test_global_repair_cooldown.py (blacklist + count threshold)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 15:48:03 +08:00
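Layer 1 of ADR-038 (5 failures → 60s cooldown) describes a classic circuit breaker. The sketch below shows one way such a breaker can work; it is not the contents of src/core/circuit_breaker.py. Counting consecutive failures, the method names, and the injectable clock are assumptions.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after N consecutive failures, then rejects calls until the cooldown ends."""

    def __init__(
        self,
        failure_threshold: int = 5,
        cooldown_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._threshold = failure_threshold
        self._cooldown = cooldown_seconds
        self._clock = clock  # injectable so tests need not sleep
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        if self._opened_at is None:
            return True  # closed: traffic flows normally
        if self._clock() - self._opened_at >= self._cooldown:
            # Cooldown elapsed: close the breaker and start counting afresh.
            self._opened_at = None
            self._failures = 0
            return True
        return False  # still open: fail fast

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self._threshold:
            self._opened_at = self._clock()

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None
```

A caller checks allow_request() before contacting the backend and reports the outcome with record_success() or record_failure().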
OG T
d89f0520f9
fix(api): fix 34 Ruff lint errors
...
- Auto-fixed import sorting and unused imports
- Manually fixed raise from, isinstance union, and unused variable findings
- scripts/ left as-is for now (does not block CI)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 15:27:49 +08:00
OG T
b77e151387
feat(ai): ADR-036 NVIDIA Nemotron Tool Calling integration
...
Phase 20 - raise Tool Calling accuracy 50% → 83.3%
Added:
- src/models/nvidia.py: Pydantic Schema
- src/services/nvidia_provider.py: NvidiaProvider class
- tests/test_nvidia_provider.py: 15 unit tests (all passing)
Changed:
- ai_router.py: AIProvider.NVIDIA + route_tool_calling()
- ai_rate_limiter.py: NVIDIA limits (5 RPM, 100/day)
- models.json: NVIDIA configuration
- cd.yaml: inject NVIDIA_API_KEY via Secrets
Routing strategy:
- Tool Calling: Nemotron → Gemini → Claude
- General chat: Ollama → Gemini → Claude (unchanged)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 00:00:08 +08:00
OG T
d206460751
feat(security): Phase 20 CSRF protection
...
The Phase 19 chief architect review flagged that the nuclear-key UX lacked CSRF protection
Backend:
- Add src/core/csrf.py (Double Submit Cookie pattern)
- Add src/api/v1/csrf.py (GET /api/v1/csrf/token)
- Add src/models/csrf.py (CSRFTokenResponse)
- Update the approvals.py sign/reject/bulk endpoints to require CSRFToken validation
Frontend:
- Add hooks/useCSRF.ts (React Hook)
- Update approval.store.ts to pass the CSRF Token parameter
Security properties:
- 256-bit Token (secrets.token_hex)
- Constant-time comparison (secrets.compare_digest)
- SameSite=Strict Cookie
- 1-hour Token lifetime
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-28 18:31:58 +08:00
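The token generation and comparison steps of the Double Submit Cookie pattern can be illustrated with the standard library alone. The function names below are hypothetical, and the sketch leaves out cookie attributes and the 1-hour expiry that the commit also lists.

```python
import secrets
from typing import Optional


def generate_csrf_token() -> str:
    """Return a 256-bit token as 64 hex characters."""
    return secrets.token_hex(32)  # 32 random bytes = 256 bits


def verify_csrf(cookie_token: Optional[str], header_token: Optional[str]) -> bool:
    """Double-submit check: the cookie value must equal the header value.

    secrets.compare_digest runs in time independent of where the inputs
    differ, which avoids leaking the token through timing.
    """
    if not cookie_token or not header_token:
        return False
    return secrets.compare_digest(
        cookie_token.encode("utf-8"), header_token.encode("utf-8")
    )
```

The server sets the token in a cookie, the client echoes it in a request header, and the endpoint rejects the request when verify_csrf returns False.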
OG T
e5ded3b3f2
feat(phase19): OmniTerminal + GenUI + Hybrid SSE architecture (Wave 0-2)
...
Phase 19 OmniTerminal MVP complete:
- Wave 0: Backend (Hybrid SSE POST→GET architecture)
- Wave 1: Frontend (OmniTerminal state machine + GenUI Registry)
- Wave 2: UI components (8 GenUI dynamic cards)
ADR documents:
- ADR-031: OmniTerminal SSE architecture
- ADR-032: GenUI dynamic rendering framework
- ADR-033: K3s HA architecture design
GenUI components:
- GenUIRenderer, K8sPodStatusCard, SentryErrorCard
- MetricsSummaryCard, IncidentTimelineCard
- TraceWaterfallCard, ApprovalCard, NuclearKeyButton
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-28 00:17:26 +08:00
OG T
801b08a4b7
fix(api): AI_FALLBACK_ORDER failed to parse the JSON format
...
Root cause: the ConfigMap uses JSON '["gemini","ollama","claude"]'
but the validator parsed it with split(","), so no provider ever matched
and the default ["ollama","gemini","claude"] was always used
Impact: /api/v1/incidents timed out (Ollama CPU inference is slow)
Fix: add JSON format support, trying json.loads() first
This is a root-cause fix, not a restart!
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 20:10:56 +08:00
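The fix described here, trying json.loads() before falling back to a comma split, could be sketched as below. The function name and the exact fallback behaviour for malformed input are assumptions; the default order and both input formats come from the commit.

```python
import json

DEFAULT_FALLBACK_ORDER = ["ollama", "gemini", "claude"]


def parse_fallback_order(raw: str) -> list[str]:
    """Accept either a JSON array or a comma-separated string."""
    text = raw.strip()
    if not text:
        return list(DEFAULT_FALLBACK_ORDER)
    try:
        parsed = json.loads(text)  # handles '["gemini","ollama","claude"]'
    except json.JSONDecodeError:
        parsed = text.split(",")   # handles 'gemini,ollama,claude'
    if not isinstance(parsed, list):
        return list(DEFAULT_FALLBACK_ORDER)
    order = [str(item).strip().lower() for item in parsed if str(item).strip()]
    return order or list(DEFAULT_FALLBACK_ORDER)
```

With split(",") alone, the JSON form yields items such as '["gemini"' that match no provider name, which is the failure mode the commit describes.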
OG T
00e2c94a8e
ci: API layering check + LLM tests moved to Nightly
...
CI hardening:
- Add API Layer Check (#96): layering rules for services/repositories/models
- LLM tests moved to nightly-llm.yaml (CPU inference ~300s per test)
Layering rules:
- services must not import api/routers
- repositories must not import services
- models must not import the business layer
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 19:10:30 +08:00
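The layering rules above lend themselves to a static import check. The following is a rough sketch using the standard ast module, not the project's actual CI check; the FORBIDDEN table, the reading of "business layer" as services/repositories/api, and matching on dotted-path segments are all assumptions.

```python
import ast

# Layer -> package names it must not import (assumed interpretation of the rules).
FORBIDDEN = {
    "services": ("api", "routers"),
    "repositories": ("services",),
    "models": ("services", "repositories", "api"),
}


def find_layer_violations(layer: str, source: str) -> list[str]:
    """Return the imported module paths in `source` that break the rule for `layer`."""
    banned = FORBIDDEN.get(layer, ())
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]  # module is None for "from . import x"
        else:
            continue
        for module in modules:
            # A violation is any dotted-path segment naming a banned package.
            if any(part in banned for part in module.split(".")):
                violations.append(module)
    return violations
```

A CI job could run this over every file in each layer directory and fail the build when any list is non-empty.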
OG T
a9f8ad56c1
chore: tidy up uncommitted changes (API core + docs + scripts)
...
API core:
- constants.py: system constant definitions
- unit_of_work.py: Unit of Work pattern
- incident_approval_service.py: Incident-Approval sync service
Documentation updates:
- LOGBOOK.md: progress update
- AWOOOI_AGENTIC_WORKSPACE_ROADMAP.md: roadmap
- 2026-03-26_llm_testing_evaluation.md: LLM testing evaluation
- phase5_telemetry_architecture.md: telemetry architecture
- SECRETS_REFERENCE.md: secrets reference
Config/scripts:
- Skill 02 v1.x: leWOOOgo backend update
- .dependency-cruiser.cjs: dependency rules
- demo-multisig-flow.sh: demo script
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 19:10:12 +08:00
OG T
30153496d1
fix(api): fix all lint errors (ruff --fix)
...
- Import sorting (I001)
- Unused imports (F401)
- f-string without placeholders (F541)
- Loop variable unused (B007)
- zip() strict parameter (B905)
- Exception chaining (B904)
- collections.abc imports (UP035)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 16:06:20 +08:00
OG T
e58da5c534
feat(api): Phase 13.2 #83 Grafana MCP Tool
...
New MCP provider for Grafana dashboard integration:
- list_dashboards: List available dashboards with filtering
- get_dashboard: Get dashboard details by UID
- get_panel_data: Query panel data via Grafana Query API
- generate_dashboard_url: Generate shareable dashboard URLs
Security:
- API key authentication (Bearer token)
- Dashboard UID validation (alphanumeric + dash/underscore)
- Read-only operations only
- 30s request timeout
Config:
- GRAFANA_URL (default: http://192.168.0.188:3000)
- GRAFANA_API_KEY
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 15:36:17 +08:00
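The dashboard UID validation (alphanumeric + dash/underscore) and URL generation mentioned in this commit might look like the sketch below. The helper names are hypothetical, and the /d/<uid> path is an assumption about Grafana's URL layout rather than something the commit states.

```python
import re

# Letters, digits, dash and underscore only, as stated in the commit.
_UID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_dashboard_uid(uid: str) -> bool:
    """Reject anything that could smuggle path or query characters into a URL."""
    return _UID_PATTERN.fullmatch(uid) is not None


def build_dashboard_url(base_url: str, uid: str) -> str:
    """Build a dashboard link; raises ValueError for an invalid UID."""
    if not is_valid_dashboard_uid(uid):
        raise ValueError(f"invalid dashboard UID: {uid!r}")
    return f"{base_url.rstrip('/')}/d/{uid}"
```

Validating before interpolation keeps inputs such as "../admin" from ever reaching the request path.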
OG T
30f045bf28
feat: ADR-019 centralized System Prompt management + Nightly LLM Workflow
...
Added:
- docs/adr/ADR-019-system-prompt-management.md - System Prompt conventions
- apps/api/src/core/prompts.py - centralized System Prompts
- .github/workflows/nightly-llm.yaml - nightly LLM regression tests
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 12:27:47 +08:00
OG T
46ab6a838a
fix(api): fix ruff lint errors
...
- langfuse_client.py: import Callable from collections.abc
- telemetry.py: import block formatting
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 09:27:00 +08:00
OG T
b6cff31653
feat(api): Phase 15.3 Deep Linking across three systems
...
Implements zero-broken-link observability across Sentry ↔ SignOz ↔ Langfuse:
Added deep_linking.py:
- SignOz Trace URL generator
- Langfuse Trace URL generator
- Sentry Issue URL generator
- get_all_links() returns every link in one call
Integration points:
- main.py: Sentry before_send injects otel_trace_id + signoz_trace_url
- langfuse_client.py: automatically injects the OTEL trace_id into metadata
- openclaw.py: SignOz span records langfuse.trace_id as a back-link
Architecture diagram:
┌─────────┐ trace_id ┌─────────┐ trace_id ┌──────────┐
│ Sentry  │◄────────►│ SignOz  │◄────────►│ Langfuse │
│ Errors  │          │ Traces  │          │ LLMOps   │
└─────────┘          └─────────┘          └──────────┘
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 00:48:28 +08:00
OG T
0d31ccb911
feat(api): Phase 15.2 Redis Trace Context propagation
...
Implements zero-broken-link cross-service tracing over Redis Streams:
- telemetry.py: add get_trace_context() + restore_trace_context()
- webhooks.py: Producer injects _trace_id, _span_id into Redis
- signal_worker.py: Consumer restores the Trace Context and creates a child Span
Architecture: API → Redis Streams → Worker, a complete trace chain
Format: W3C Trace Context (traceparent)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 00:40:20 +08:00
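The W3C traceparent header named in this commit has the shape version-traceid-spanid-flags, with a 32-hex-digit trace ID and a 16-hex-digit span ID. The sketch below builds and parses that string with the standard library only; it does not use the OpenTelemetry SDK, and the function names are hypothetical rather than the get_trace_context()/restore_trace_context() helpers from telemetry.py.

```python
import re
from typing import Optional

_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def build_traceparent(trace_id: int, span_id: int, sampled: bool = True) -> str:
    """Producer side: serialize IDs into a version-00 traceparent string."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id:032x}-{span_id:016x}-{flags}"


def parse_traceparent(header: str) -> Optional[tuple[int, int, bool]]:
    """Consumer side: return (trace_id, span_id, sampled), or None if invalid."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    version, trace_hex, span_hex, flags = match.groups()
    trace_id, span_id = int(trace_hex, 16), int(span_hex, 16)
    # Version ff and all-zero IDs are invalid per the specification.
    if version == "ff" or trace_id == 0 or span_id == 0:
        return None
    return trace_id, span_id, bool(int(flags, 16) & 0x01)
```

A producer would store the string as a field on the Redis Stream entry, and the worker would parse it to create its span as a child of the original trace.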
OG T
1ac8965a7a
feat(api): Phase 15.1 Langfuse LLMOps integration + model upgrade
...
## New features
- Self-hosted Langfuse deployment (192.168.0.110:3100)
- langfuse_client.py - LLM call tracing wrapper
- OpenClaw integrated with Langfuse trace
## Model upgrade (approved by the Commander)
- Production default: llama3.2:3b → qwen2.5:7b-instruct
- Summarization tasks: llama3.2:3b (speed first)
## Config updates
- requirements.txt: +langfuse>=2.0.0
- config.py: +LANGFUSE_* settings
- models.json: update the Ollama model configuration
- K8s: Secret + ConfigMap updates
## Review passed
- Modularity check ✅
- Core tests 31/31 ✅
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 00:32:19 +08:00
OG T
a202a2693a
feat(api): Phase 16 R1.2 Strangler Fig Pattern
...
- Add the USE_NEW_ENGINE setting switch (default False)
- incident_memory.py dual-track switch: embedded version ↔ lewooogo-brain
- Automatic downgrade: falls back to the embedded version when lewooogo-brain is unavailable
- Rollback command: kubectl set env deployment/awoooi-api USE_NEW_ENGINE=false
Approved by the Commander 2026-03-26 for immediate execution
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-25 15:23:03 +08:00
OG T
22cada563b
fix(config): Share Redis DB 0 with OpenClaw
...
- Change REDIS_URL from DB 10 to DB 0
- AWOOOI and OpenClaw now share the same Redis database
- Incidents created by OpenClaw visible in AWOOOI UI
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-24 18:44:34 +08:00
OG T
ad05bbf64c
feat(api): Add human feedback API (#6) + async_utils module
...
Phase 6.6 human feedback API:
- PUT /api/v1/incidents/{id}/feedback endpoint
- effectiveness_score (1-5), human_feedback, learning_notes fields
- Sync to Redis (Working Memory) + PostgreSQL (Episodic Memory)
- For stats aggregation at /api/v1/stats/feedback/summary
async_utils module:
- fire_and_forget() for safe background tasks
- Prevents swallowed exceptions in asyncio.create_task()
- Addresses P2 #8 tech debt
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-24 14:16:17 +08:00
OG T
8159d22db9
refactor: global rename ClawBot → OpenClaw
...
- Delete the old clawbot.py (the new openclaw.py already exists)
- Update the type definitions in models/ai.py (ClawBotAnalysisRequest/Response)
- Update imports and comments in api/v1/ai.py
- Update the Discord username
- Update all comments and documentation
Basis: feedback_openclaw_naming.md (the Commander's official naming decision, 2026-03-20)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-24 12:57:36 +08:00
OG T
4f1c8ae473
fix(ci): Resolve Python and TypeScript lint errors
...
- Fix 35 Python ruff errors (B904, F841, E722, E741, B007, B008)
- Add eslint config for lewooogo-core package
- Update pyproject.toml to new ruff lint config format
- Relax frontend eslint rules to warnings for unused vars
- Allow console.* for debugging (TODO: unified logger)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-24 09:20:56 +08:00
OG T
6f049877fc
fix(lint): ruff auto-fix + add lewooogo-core src to git
...
- Python: ruff --fix fixed 280 lint errors
- lewooogo-core: the src/ directory was untracked, which made CI eslint fail
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 23:51:37 +08:00
OG T
e23493741a
fix(telegram): respect C-Suite decision - OpenClaw is sole brain
...
Architecture correction 2026-03-23 (following the C-Suite decision):
- Iron rule: .188 is the only brain; split-brain is forbidden
- OpenClaw (192.168.0.188) = the only Telegram Gateway
- AWOOOI API (K8s) = Web API + Sensor, no Polling
- TELEGRAM_ENABLE_POLLING defaults to False
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 19:25:08 +08:00
OG T
7478dc0254
feat(phase6-9): Complete modular architecture and Agent Teams
...
Phase 6.4 - Modular Architecture:
- Add lewooogo-brain adapters for LLM providers
- Add lewooogo-data dual memory (Redis + PostgreSQL)
- Implement consensus engine for multi-agent decisions
- Add incident memory service for historical context
Phase 9 - Agent Teams (Claude Agent SDK):
- Add base agent class with Claude Sonnet 4 integration
- Implement action planner, blast radius, and security agents
- Add agent API endpoints and proposal workflow
- Integrate ADR-009 OpenClaw Agent Teams architecture
DevOps & CI/CD:
- Add GitHub Actions CI/CD workflows (ci.yaml, cd.yaml)
- Add pre-commit hooks and secrets baseline
- Add docker-compose for local development
- Update Kubernetes network policies
Frontend Improvements:
- Add auto-healing error boundary component
- Update i18n messages for agent features
- Enhance dual-state incident card with execution feedback
Documentation:
- Add 7 ADRs covering MCP, design system, architecture decisions
- Update ARCHITECTURE_MEMORY.md with modular design
- Add GLOBAL_RULES.md and SOUL.md for project identity
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 18:40:36 +08:00
OG T
962b1e75a5
refactor: Rename ClawBot → OpenClaw across documentation
...
- Update .awoooi-agent-rules.md (4 occurrences)
- Update docs/api/openapi.yaml (all schema references)
- Update apps/web/tailwind.config.ts (comment)
- Update apps/api/src/core/config.py (comment)
Legacy CLAWBOT_URL field kept for backward compatibility.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 14:05:53 +08:00
OG T
9f353343c9
fix(worker): dedicated Redis pool with unlimited timeout for XREADGROUP
...
Root cause: Worker shared Redis pool with API (socket_timeout=5s),
but XREADGROUP blocks for 5s causing timeout errors every cycle.
Fix:
- Add init_worker_redis_pool() with socket_timeout=None
- Worker now uses get_worker_redis() for XREADGROUP operations
- API continues using get_redis() with short timeout
Also destroyed 50 zombie consumers via:
XGROUP DESTROY stream:awoooi_signals awoooi_workers
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 09:42:11 +08:00
OG T
b00f318450
fix(api): correct OTEL gRPC endpoint format and SignOz query table
...
Root cause analysis:
1. OTEL gRPC endpoint had http:// prefix which is invalid for gRPC
2. SignOz query was targeting wrong table (signoz_metrics.distributed_samples_v4)
3. Should query signoz_traces.distributed_signoz_index_v2 for trace data
Fixes:
- Remove http:// prefix from OTEL_EXPORTER_OTLP_ENDPOINT (gRPC needs host:port)
- Update SignOz client to query traces table instead of metrics table
- Fix timestamp format (nanoseconds for DateTime64(9))
- statusCode: 0=Unset, 1=Ok, 2=Error
This should enable OTEL traces to reach SigNoz and GlobalPulse to show real metrics.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 00:41:51 +08:00
OG T
21ce7056fa
fix(otel): correct OTEL endpoint to port 24317 and fix NetworkPolicy
...
- SigNoz OTEL Collector maps container:4317 to host:24317
- Updated NetworkPolicy egress to allow 24317/24318
- Updated ConfigMap with correct OTEL_EXPORTER_OTLP_ENDPOINT
- Fixed OpenClaw port from 8089 to 8088
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 00:06:07 +08:00
OG T
551a305fcf
fix(config): rename _OPENCLAW_TG_USER_WHITELIST_RAW to comply with pydantic v2
...
Pydantic v2 does not allow field names with leading underscores.
Changed from @property pattern to method pattern.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-22 23:40:09 +08:00