Your Name
7e4d995e4b
feat(aiops): add mcp agent loop foundation
CD Pipeline / tests (push) Successful in 1m59s
Code Review / ai-code-review (push) Successful in 28s
run-migration / migrate (push) Failing after 24s
CD Pipeline / post-deploy-checks (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
2026-05-01 13:21:19 +08:00
Your Name
c1a1be61bd
fix(ssh-auto): 主機告警 SSH 自動診斷授權(HostHighCpuLoad 不再卡人工審核)
...
CD Pipeline / build-and-deploy (push) Successful in 9m7s
根因:SSH_MCP_ALLOWED_HOSTS 未設定 → _ssh_execute() 全部攔截
+ auto_approve 只認 kubectl 不認 ssh → 主機告警永遠降級人工
修復:
- ConfigMap: 補 SSH_MCP_ALLOWED_HOSTS 四主機白名單
- alert_rules: HostHighCpuLoad 等從 NO_ACTION 改為 ssh_diagnose 指令
- auto_approve: _has_executable 加入 ssh 開頭識別
- decision_manager: _ssh_execute() 加入 ssh_diagnose 路由
- ssh_provider: 新增 ssh_diagnose tool(ps aux + free -h + df -h,只讀)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-27 20:13:07 +08:00
Your Name
0fd71b3e33
fix(mcp/k8s): _kubectl_scale 補 validate_deployment_exists dry-run
...
CD Pipeline / build-and-deploy (push) Has been cancelled
根因:_kubectl_restart 有 dry-run 驗證,_kubectl_scale 完全沒有
→ gitea(docker-compose,不在 K8s)直接被 kubectl scale 執行
→ Deployment 'gitea' not found in namespace 'awoooi-prod'(INC-20260425-3B6C39)
修復:_kubectl_scale 在執行前加 validate_deployment_exists,
K8s 找不到 deployment 時返回 error 而非繼續執行
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-27 15:59:37 +08:00
Your Name
7f4088bcd0
fix(aiops-p0): 六大病根 P0 全面修復(ADR-092 B4)
...
【P0.1】knowledge_extractor_service.py:210 — AttributeError 修復
- Signal.description 欄位不存在(100% 失敗,KM 每天+5 根因)
- 改用 alert_name + annotations.summary 拼接文字
【P0.2+P0.3】Gate 9+11 唯讀指令鬆綁
- blast_radius_calculator: kubectl get/top/describe/logs/version → score=1(非 50)
- operation_parser: 增加 INVESTIGATE 類型識別(唯讀 kubectl 不回 None)
- executor.py: OperationType 新增 INVESTIGATE enum
- approval_execution.py: INVESTIGATE 路徑直接呼叫 execute_kubectl_command
【P0.4】MCP SSH/K8s Provider 修復
- decision_manager: params= → parameters=(符合 MCPToolProvider.execute 簽名)
- decision_manager: MCPToolResult .get() → .success/.output(dataclass 用法)
- decision_manager + ssh_provider: 補入 hosts 120/121(原 default 缺失)
- auto_approve: phase2_agent_debate source bypass confidence 閾值
【P0.5】告警規則語義矛盾修復
- alert_rules.yaml: 8 條 kubectl 查詢規則 RESTART_DEPLOYMENT → NO_ACTION
(CrashLoopBackOff/PostgreSQL 連線/慢查詢/MinIO 磁碟/K3s 節點/告警鏈路/SSL/CoreDNS 等)
- incident_service.py: cAdvisor/CoreDNS 從 general 拆出獨立分類
【P0.6】proactive_inspector 動態基線 PromQL 全修
- 5 個 MONITORED_METRICS PromQL 全部修正(cadvisor label/datname/blackbox)
- db_connection_pool: datname="awoooi" → "awoooi_prod"
- http_error_rate: 無效 http_requests_total → blackbox probe_success
- cpu/memory: namespace label → name=~"k8s_api_awoooi-api.*"
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-24 15:32:23 +08:00
Your Name
45dbe07188
fix(flywheel): 自動化飛輪六大能力修復(ADR-092 B3)
...
run-migration / migrate (push) Failing after 22s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 53s
Type Sync Check / check-type-sync (push) Successful in 2m54s
CD Pipeline / build-and-deploy (push) Has been cancelled
Ansible Lint / lint (push) Has been cancelled
【根因鏈修復】
MCP Provider bugs → PreDecisionInvestigator 失敗 → Agent Debate 無上下文
→ LLM 逾時 → description="待分析" → ADR-091 鐵閘攔截 → tg_sent 未設
→ W-2 Watchdog 誤報「靜默故障」
【六大修復】
1. MCP Provider 三蟲修復
- ssh_provider: asyncssh.run() → conn.run()
- prometheus_provider: KeyError 'query' → .get() 容錯
- k8s_provider: 空 pod_name → 早返回錯誤字典
2. Agent Debate / 決策品質
- decision_manager: 逾時降級文字改為明確描述(繞過 ADR-091 鐵閘)
- intent_classifier: LLM 逾時降級至關鍵字分類(非 None)
3. Watchdog 誤報修復(ADR-092 B3)
- W-2: tg_sent Redis TTL → telegram_message_id IS NULL(DB 真值)
- W-5 新增: suggested_action IN 空/待分析/NO_ACTION + tg_id IS NULL
- approval_timeout_resolver: 60min → 15min,batch 50 → 200
4. Config Drift 自動化
- drift_adopt_service: auto_adopt_if_safe() 六條件安全閘
- drift.py: 背景任務先嘗試自動採納再發人工 Telegram 卡片
5. Playbook 飛輪穩定
- playbook_seed_service: 修復幂等性(deprecated 不視為缺失)
- playbook_evolver: 只載 DRAFT+APPROVED(非全部 294 筆)
6. 可觀測性
- alert_rule_engine: auto_rule 結構化日誌 + Redis 計數器(pipeline)
- auto_approve: reject 原因 Redis 計數器
- heartbeat_report_service: 新增「⚙️ 自動化統計(今日)」區塊
【待人工執行】
psql $DATABASE_URL -f apps/api/migrations/cleanup_duplicate_deprecated_playbooks.sql
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-24 10:55:50 +08:00
OG T
f3236338a5
fix(security): Code Review P0+P1+P2 全修補 — MCP Phase 2b-3 + decision_manager
...
CD Pipeline / build-and-deploy (push) Has been cancelled
P0: decision_manager _fetch_metrics_snapshot 參數型別錯誤
- prom._instant_query(str) → prom._instant_query({"query": str})
- 結果解析 r.get("status")=="success" → r.get("result", [])
P1: prometheus_provider — alertname PromQL injection 防範
- 新增 _RE_SAFE_ALERTNAME 白名單正則
P1: decision_manager — kubectl action 危險字元注入防範
- 新增 _ALLOWED_KUBECTL_PATTERN 白名單,非法指令格式直接拒絕
P1: decision_manager — 6 個 asyncio.create_task() GC 風險
- 新增 _background_tasks: set + _fire_and_forget() helper
- 所有 bare create_task 改用 _fire_and_forget
P1: ssh_provider — Group B 寫入工具強制需要 known_hosts
- known_hosts 未設定或檔案不存在時拒絕執行,防 MITM
P2: sentry_provider — query 語意白名單驗證
- 新增 _RE_SAFE_SENTRY_QUERY,拒絕含特殊字元的 query
P2: argocd_provider — verify=False 改為 ARGOCD_VERIFY_TLS 環境變數開關
- 新增 _tls_verify() helper,預設 false(self-signed cert)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-11 20:10:33 +08:00
OG T
a6e6f389e2
chore: 清理觸發 CD 的臨時注釋
CD Pipeline / build-and-deploy (push) Failing after 8m9s
2026-04-11 19:15:04 +08:00
OG T
40d6536b62
ci: 觸發 CD — MCP Phase 3/4 + SSH MCP 完整啟用 (providers注釋更新)
CD Pipeline / build-and-deploy (push) Waiting to run
2026-04-11 19:14:17 +08:00
OG T
a2cc985f60
feat(mcp-phase3): ArgoCD MCP + Sentry MCP + 完整 Provider 註冊
...
CD Pipeline / build-and-deploy (push) Has been cancelled
ArgoCDProvider (3 工具):
- argocd_list_apps: 列出所有 App + sync/health 狀態
- argocd_get_app_status: 詳細狀態 + 問題資源清單
- argocd_get_sync_history: 最近 N 筆部署記錄
- 輸入驗證: app_name 白名單 regex
- 需 ARGOCD_API_TOKEN + ARGOCD_MCP_ENABLED=true
SentryProvider (3 工具):
- sentry_list_issues: 列出最近 Issues(狀態過濾)
- sentry_get_issue: 詳情 + stacktrace 最後 5 frames
- sentry_search_issues: PromQL 風格搜尋
- issue_id 白名單驗證(只允許純數字)
- 需 SENTRY_AUTH_TOKEN + SENTRY_MCP_ENABLED=true
providers/__init__.py: 補上 Prometheus + SSH + ArgoCD + Sentry 全部 10 個 providers
config.py: 新增 ARGOCD_URL / ARGOCD_API_TOKEN / ARGOCD_MCP_ENABLED / SENTRY_MCP_ENABLED
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-11 09:11:53 +08:00
OG T
a29e5e1de2
feat(mcp-phase1): K8s MCP 強化 — 6 個新工具 + namespace 白名單
...
MCP Phase 1 (ADR-069 Sprint B 後驗收):
k8s_get_pod_logs — Pod log 取得 (tail 1-500,支援 previous)
k8s_watch_rollout — rollout 狀態監控直到完成 (timeout 10-300s)
k8s_get_events — K8s events (可過濾 resource_name / event_type)
k8s_describe_pod — 完整 Pod describe (Conditions/Volumes/Env)
k8s_get_hpa_status — HPA 副本數/CPU utilization
k8s_get_node_conditions — Node Ready/MemoryPressure/DiskPressure
安全強化:
- ALLOWED_NAMESPACES = {"awoooi-prod"} 硬編碼白名單
- _validate_namespace() + _validate_name() 參數白名單
- 數值參數上下限夾緊 (tail 1-500, timeout 10-300s)
- event_type 只允許 Warning / Normal
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-11 03:01:38 +08:00
OG T
2af4dffcc6
fix(security): Architecture Review 修復 5 項高信心問題
...
安全修復 (P0):
1. ssh_provider: 新增 _validate_param() 白名單驗證,防止 command injection
- container_name/service/filter_name: [a-zA-Z0-9._-]{1,128}
- compose_dir: 必須以 /opt/ 或 /srv/ 開頭,禁止 ..
- domain: FQDN 白名單
- tail/port/lines: int() 轉換 + 上下限夾緊
2. ssh_provider: known_hosts=None 改為讀 SSH_MCP_KNOWN_HOSTS_FILE 環境變數
- 預設仍 None(內網快速啟動),但啟動時寫入 warning log
- 設定文件:ops/runbooks/ssh-mcp-setup.md (待補)
模組化修復 (P1):
3. km_conversion_service: 移除 import 時的 ALERT_EVENT_TYPES.update() 副作用
- ADR-071 event types 移入 alert_operation_log_repository.py 靜態集合
4. telegram_gateway: create_task() 改為 await + try/except
- 避免 DB session 關閉後的競爭條件
- KM 轉換失敗記錄 warning log,不中斷主流程
5. km_conversion_service: 新增頂層 try/except,錯誤一律 error log 後 re-raise
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-11 02:50:26 +08:00
OG T
6351e9a0e9
feat(mcp-phase2): MCP Phase 2 — Prometheus MCP + SSH MCP + alert labels
...
CD Pipeline / build-and-deploy (push) Successful in 13m37s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 35s
MCP-2b: prometheus_provider.py
- prometheus_query (PromQL 即時查詢)
- prometheus_query_range (歷史趨勢,預設 15 分鐘)
- prometheus_get_alert_history (告警觸發歷史)
- config: PROMETHEUS_URL + PROMETHEUS_MCP_ENABLED
MCP-2a: ssh_provider.py
- 群組A 9 個只讀診斷工具 (top/disk/memory/logs/status/port/nginx/swap)
- 群組B 6 個安全操作工具 (restart/compose/systemctl/clear-log/ssl/nginx-reload)
- 四層安全守衛 (白名單/allowed_hosts/forbidden_patterns/trust_score)
- config: SSH_MCP_ENABLED + SSH_MCP_ALLOWED_HOSTS
K8s: 04-ssh-mcp-secret.example.yaml (ssh-mcp-key Secret 範本 + 建立步驟)
Alert labels: alerts-unified.yml 補充 mcp_provider/host_type/alert_category
覆蓋: HostHighCpuLoad/HostOutOfMemory/HostOutOfDiskSpace/DockerContainer*
SignOzDown/SentryDown/HarborDown/GiteaDown
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-11 02:35:35 +08:00
OG T
8724ed7dcf
fix(mcp): P1 修復 - DI 一致性 + 測試補充 + 配置優化
...
首席架構師審查 P1 修復清單:
P1-1 RAG Provider DI 模式一致性:
- 支援 rag_service 參數注入
- 新增 close() 方法
- TYPE_CHECKING 延遲導入
P1-3 RAG 測試補充:
- test_rag_provider.py (9 tests)
- DI 注入/Lazy Load/Tool Schema/驗證/Close
P1-4 Grafana Config 快取優化:
- URL/Key 首次查詢後快取
- 減少重複 settings 存取
P1-5 Embedding 維度配置化:
- MODEL_DIMENSIONS 字典 (qwen/llama/nomic)
- default_dimension 參數
- 支援更多模型
測試: 9/9 PASSED
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
2026-03-29 16:23:30 +08:00
OG T
f1117a3e79
chore: trigger CD build for RAGProvider
...
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
2026-03-26 18:44:30 +08:00
OG T
539f14bcd5
feat(api): Phase 13.2 #84 RAG Provider + Gemini 優先切換
...
1. 新增 RAGProvider MCP Tool Provider
- search_runbook: 語義搜尋維運手冊
- index_documents: 索引文檔
- get_index_stats: 取得索引統計
2. 更新 AI_FALLBACK_ORDER 為 Gemini 優先
- 臨時措施:Ollama CPU 推論緩慢導致 mock_fallback
- 預計 2026-03-27 切回 Ollama
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
2026-03-26 18:21:24 +08:00
OG T
30153496d1
fix(api): 修復全部 lint 錯誤 (ruff --fix)
...
- Import sorting (I001)
- Unused imports (F401)
- f-string without placeholders (F541)
- Loop variable unused (B007)
- zip() strict parameter (B905)
- Exception chaining (B904)
- collections.abc imports (UP035)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
2026-03-26 16:06:20 +08:00
OG T
e58da5c534
feat(api): Phase 13.2 #83 Grafana MCP Tool
...
New MCP provider for Grafana dashboard integration:
- list_dashboards: List available dashboards with filtering
- get_dashboard: Get dashboard details by UID
- get_panel_data: Query panel data via Grafana Query API
- generate_dashboard_url: Generate shareable dashboard URLs
Security:
- API key authentication (Bearer token)
- Dashboard UID validation (alphanumeric + dash/underscore)
- Read-only operations only
- 30s request timeout
Config:
- GRAFANA_URL (default: http://192.168.0.188:3000 )
- GRAFANA_API_KEY
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
2026-03-26 15:36:17 +08:00
OG T
579da38b8b
feat(api): Phase 13 智能路由 + CI/CD 整合 (#74-88)
...
Phase 13.1 CI/CD Integration:
- #76 workflow_run handler for CI failure diagnosis
- #77 SignOz log query (query_logs, error_logs_summary MCP)
- #78 CIAutoRepairService with risk-based execution decisions
Phase 13.3 Smart Routing:
- #85 Intent Classifier v2.0 (rule engine + LLM fallback)
- #86 Complexity Scorer (9-dimension scoring)
- #87 AI Router v3.0 (routing decision matrix)
- #88 Token Counter (OTEL + Langfuse integration)
New files:
- services/ci_auto_repair.py (risk stratification)
- services/model_registry.py (centralized model config)
- services/token_counter.py (677 lines)
- Skill 08: Model Router Expert
- Skill 09: Strangler Pattern Expert
- ADR-023: Smart Routing Architecture
- ADR-024: API Layer Architecture
Tests:
- phase11-conversational.spec.ts (E2E tests)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
2026-03-26 15:32:52 +08:00
OG T
cdbd6f0fa6
fix(api): 修復 MCP providers lint 錯誤
...
- interfaces.py: 修正 import 排序
- signoz_provider.py: 移除未使用變數
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
2026-03-25 14:44:03 +08:00
OG T
643946e60c
refactor(api): ADR-015 MCP 模組化架構重構
...
## 重構內容
符合 leWOOOgo 積木化原則:
- 新增 interfaces.py: MCPToolProvider ABC 定義
- 新增 registry.py: Provider 註冊中心 (DI 模式)
- 新增 providers/: K8s, SignOz, Database 具體實作
- 重構 mcp_bridge.py: 透過 ProviderRegistry 委派執行
## 修復 Code Review 問題
- 🔴 移除 _execute_stdio logging 敏感 parameters
- 🔴 修復 conversational-view.tsx i18n 硬編碼
## 新增檔案
- apps/api/src/plugins/mcp/interfaces.py
- apps/api/src/plugins/mcp/registry.py
- apps/api/src/plugins/mcp/providers/__init__.py
- apps/api/src/plugins/mcp/providers/k8s_provider.py
- apps/api/src/plugins/mcp/providers/signoz_provider.py
- apps/api/src/plugins/mcp/providers/database_provider.py
- docs/adr/ADR-015-mcp-modular-architecture.md
- .dependency-cruiser.cjs (Phase 14.2 準備)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
2026-03-25 14:31:32 +08:00
OG T
7a8f869104
feat(api): Phase 13.2 #81 PostgreSQL MCP Tool 整合
...
整合 Approval/Incident/Timeline 查詢到 MCP Bridge:
- list_approvals: 列出授權請求 (可依狀態篩選)
- get_approval: 取得單一授權詳情
- list_incidents: 列出 Incident (可依狀態篩選)
- list_timeline: 列出最近時間軸事件
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
2026-03-25 12:46:52 +08:00
OG T
23b753dbec
feat(api): Phase 13.2 #79 SignOz MCP Tool 整合
...
整合真實 SignOzClient 到 MCP Bridge:
- gold_metrics: RPS + Error Rate + P99 Latency
- trace_url: 動態 Trace URL 生成
- system_metrics: CPU/Disk 系統指標
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
2026-03-25 12:44:46 +08:00
OG T
d31160f4e1
feat(api): Phase 13.2 #80 Kubernetes MCP Tool real implementation
...
- Integrate real ActionExecutor instead of mock responses
- kubectl_get: Execute real kubectl get with JSON output
- kubectl_delete: Dry-run validation + actual pod deletion
- kubectl_scale: Real kubectl scale command
- kubectl_restart: Deployment rollout restart with validation
- Database query placeholder for #81
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
2026-03-25 12:30:13 +08:00
OG T
749b8bc554
fix(api): 修復時區 import 排序與未使用變數 lint 錯誤
...
- 修正 import 順序 (standard → third-party → local)
- 修復 datetime/timedelta 未定義錯誤
- 移除未使用的 imports
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
2026-03-25 09:26:58 +08:00
OG T
2a2dac865a
feat(api): 統一使用台北時區 UTC+8 (禁止 UTC)
...
- 新增 src/utils/timezone.py 時區工具函式
- 修改 11 個後端檔案,全部改用 now_taipei()
- 更新 HARD_RULES.md 加入時區鐵律章節
- 更新 Skills 02/04 加入時區禁令
🔴 HARD RULE: 禁止 datetime.utcnow() / datetime.now(UTC)
✅ 正確做法: from src.utils.timezone import now_taipei
Memory: feedback_timezone_taipei.md
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
2026-03-25 09:08:34 +08:00
OG T
8159d22db9
refactor: ClawBot → OpenClaw 全域更名
...
- 刪除舊版 clawbot.py (已有新版 openclaw.py)
- 更新 models/ai.py 類型定義 (ClawBotAnalysisRequest/Response)
- 更新 api/v1/ai.py import 與註解
- 更新 Discord username
- 更新所有註解與文檔
依據: feedback_openclaw_naming.md (統帥 2026-03-20 正式命名決議)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
2026-03-24 12:57:36 +08:00
OG T
6f049877fc
fix(lint): ruff auto-fix + lewooogo-core src 加入 git
...
- Python: ruff --fix 修復 280 個 lint 錯誤
- lewooogo-core: src/ 目錄未追蹤,導致 CI eslint 失敗
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
2026-03-23 23:51:37 +08:00
OG T
7478dc0254
feat(phase6-9): Complete modular architecture and Agent Teams
...
Phase 6.4 - Modular Architecture:
- Add lewooogo-brain adapters for LLM providers
- Add lewooogo-data dual memory (Redis + PostgreSQL)
- Implement consensus engine for multi-agent decisions
- Add incident memory service for historical context
Phase 9 - Agent Teams (Claude Agent SDK):
- Add base agent class with Claude Sonnet 4 integration
- Implement action planner, blast radius, and security agents
- Add agent API endpoints and proposal workflow
- Integrate ADR-009 OpenClaw Agent Teams architecture
DevOps & CI/CD:
- Add GitHub Actions CI/CD workflows (ci.yaml, cd.yaml)
- Add pre-commit hooks and secrets baseline
- Add docker-compose for local development
- Update Kubernetes network policies
Frontend Improvements:
- Add auto-healing error boundary component
- Update i18n messages for agent features
- Enhance dual-state incident card with execution feedback
Documentation:
- Add 7 ADRs covering MCP, design system, architecture decisions
- Update ARCHITECTURE_MEMORY.md with modular design
- Add GLOBAL_RULES.md and SOUL.md for project identity
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
2026-03-23 18:40:36 +08:00
OG T
196d269b92
feat: add all application source code
...
- apps/api: FastAPI backend with Dockerfile
- apps/web: Next.js frontend with Dockerfile
- apps/sensor: Signal collection agent
- packages: shared packages
Co-Authored-By: Claude <noreply@anthropic.com >
2026-03-22 18:57:44 +08:00