Commit Graph

21 Commits

Author SHA1 Message Date
OG T
a6e6f389e2 chore: clean up the temporary comment used to trigger CD
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 8m9s
2026-04-11 19:15:04 +08:00
OG T
40d6536b62 ci: trigger CD: MCP Phase 3/4 + SSH MCP fully enabled (providers comment update)
Some checks are pending
CD Pipeline / build-and-deploy (push) Waiting to run
2026-04-11 19:14:17 +08:00
OG T
a2cc985f60 feat(mcp-phase3): ArgoCD MCP + Sentry MCP + full Provider registration
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
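ArgoCDProvider (3 tools):
  - argocd_list_apps: list all Apps with their sync/health status
  - argocd_get_app_status: detailed status + list of problematic resources
  - argocd_get_sync_history: the most recent N deployment records
  - Input validation: allowlist regex for app_name
  - Requires ARGOCD_API_TOKEN + ARGOCD_MCP_ENABLED=true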

SentryProvider (3 tools):
  - sentry_list_issues: list recent Issues (filterable by status)
  - sentry_get_issue: details + last 5 stacktrace frames
  - sentry_search_issues: PromQL-style search
  - issue_id allowlist validation (digits only)
  - Requires SENTRY_AUTH_TOKEN + SENTRY_MCP_ENABLED=true

providers/__init__.py: add Prometheus + SSH + ArgoCD + Sentry, covering all 10 providers
config.py: add ARGOCD_URL / ARGOCD_API_TOKEN / ARGOCD_MCP_ENABLED / SENTRY_MCP_ENABLED

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 09:11:53 +08:00
OG T
a29e5e1de2 feat(mcp-phase1): K8s MCP hardening: 6 new tools + namespace allowlist
MCP Phase 1 (acceptance after ADR-069 Sprint B):
  k8s_get_pod_logs        - fetch Pod logs (tail 1-500, supports previous)
  k8s_watch_rollout       - monitor rollout status until completion (timeout 10-300s)
  k8s_get_events          - K8s events (filterable by resource_name / event_type)
  k8s_describe_pod        - full Pod describe (Conditions/Volumes/Env)
  k8s_get_hpa_status      - HPA replica count / CPU utilization
  k8s_get_node_conditions - Node Ready/MemoryPressure/DiskPressure

Security hardening:
  - ALLOWED_NAMESPACES = {"awoooi-prod"} hard-coded allowlist
  - _validate_namespace() + _validate_name() parameter allowlists
  - Numeric parameters clamped to their bounds (tail 1-500, timeout 10-300s)
  - event_type only allows Warning / Normal
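
The hardening rules above could look roughly like the following. This is a minimal sketch, not the code from the commit: only ALLOWED_NAMESPACES and the numeric bounds come from the message; the function names without a leading underscore, the name regex, and the clamp() helper are assumptions made for illustration.

```python
import re

# From the commit message: a single hard-coded namespace.
ALLOWED_NAMESPACES = frozenset({"awoooi-prod"})

# From the commit message: only these two event types are accepted.
ALLOWED_EVENT_TYPES = frozenset({"Warning", "Normal"})

# Assumption: Kubernetes-style lowercase names; the real pattern is not shown.
_NAME_RE = re.compile(r"[a-z0-9]([a-z0-9.-]{0,251}[a-z0-9])?")


def validate_namespace(namespace: str) -> str:
    """Reject any namespace that is not explicitly allowlisted."""
    if namespace not in ALLOWED_NAMESPACES:
        raise ValueError(f"namespace not allowed: {namespace!r}")
    return namespace


def validate_name(name: str) -> str:
    """Reject resource names containing anything outside the safe pattern."""
    if not _NAME_RE.fullmatch(name):
        raise ValueError(f"invalid resource name: {name!r}")
    return name


def validate_event_type(event_type: str) -> str:
    """Accept only the event types listed in the allowlist."""
    if event_type not in ALLOWED_EVENT_TYPES:
        raise ValueError(f"event_type not allowed: {event_type!r}")
    return event_type


def clamp(value, low: int, high: int) -> int:
    """Convert to int and force the result into [low, high]."""
    return max(low, min(high, int(value)))
```

Under these assumptions, a tail request of 9999 would be reduced to 500 via clamp(9999, 1, 500) rather than rejected, which matches the "clamped" wording in the message.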

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 03:01:38 +08:00
OG T
2af4dffcc6 fix(security): fix 5 high-confidence issues from Architecture Review
Security fixes (P0):
1. ssh_provider: add _validate_param() allowlist validation to prevent command injection
   - container_name/service/filter_name: [a-zA-Z0-9._-]{1,128}
   - compose_dir: must start with /opt/ or /srv/; .. is forbidden
   - domain: FQDN allowlist
   - tail/port/lines: int() conversion + clamping to bounds
2. ssh_provider: known_hosts=None replaced by reading the SSH_MCP_KNOWN_HOSTS_FILE environment variable
   - Default is still None (quick start on the internal network), but a warning is logged at startup
   - Setup doc: ops/runbooks/ssh-mcp-setup.md (to be added)
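
A minimal sketch of the kind of allowlist validation item 1 describes. The token pattern and the /opt/ and /srv/ prefixes are taken from the message; the function names and the extra character restriction on compose_dir are assumptions added for illustration, not the actual _validate_param() implementation.

```python
import re

# Pattern quoted in the commit message for container_name/service/filter_name.
_SAFE_TOKEN = re.compile(r"[a-zA-Z0-9._-]{1,128}")

# Prefixes quoted in the commit message for compose_dir.
_ALLOWED_COMPOSE_ROOTS = ("/opt/", "/srv/")

# Assumption (not in the commit): also restrict the path to a safe character
# set, so shell metacharacters cannot ride along behind a valid prefix.
_SAFE_PATH = re.compile(r"[a-zA-Z0-9._/-]+")


def validate_token(value: str) -> str:
    """Allow only short identifiers with no shell metacharacters."""
    if not _SAFE_TOKEN.fullmatch(value):
        raise ValueError(f"invalid parameter: {value!r}")
    return value


def validate_compose_dir(path: str) -> str:
    """Require an allowed root, forbid '..', and reject unsafe characters."""
    if not path.startswith(_ALLOWED_COMPOSE_ROOTS):
        raise ValueError(f"compose_dir outside allowed roots: {path!r}")
    if ".." in path:
        raise ValueError(f"compose_dir must not contain '..': {path!r}")
    if not _SAFE_PATH.fullmatch(path):
        raise ValueError(f"compose_dir has unsafe characters: {path!r}")
    return path
```

Validating with fullmatch() rather than search() matters here: a value such as "nginx; rm -rf /" contains a valid prefix but must still be rejected as a whole.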

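Modularity fixes (P1):
3. km_conversion_service: remove the import-time ALERT_EVENT_TYPES.update() side effect
   - ADR-071 event types moved into a static set in alert_operation_log_repository.py
4. telegram_gateway: create_task() changed to await + try/except
   - Avoids a race condition after the DB session has been closed
   - KM conversion failures are logged as warnings and do not interrupt the main flow
5. km_conversion_service: add a top-level try/except; errors are always logged at error level and then re-raised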
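
Item 4 can be illustrated with a small, self-contained sketch. Everything here is hypothetical (the dict standing in for a DB session, convert_to_km(), handle_message()); it only shows the await + try/except pattern the message names, not the actual telegram_gateway code.

```python
import asyncio
import logging

logger = logging.getLogger("telegram_gateway_sketch")


async def convert_to_km(session: dict, payload: str) -> str:
    """Hypothetical KM conversion that needs the session to still be open."""
    await asyncio.sleep(0)  # yield to the event loop, like real I/O would
    if not session["open"]:
        raise RuntimeError("session closed")
    if not payload:
        raise ValueError("empty payload")
    return payload.upper()


async def handle_message(payload: str) -> str:
    """Hypothetical handler that owns the session for one message."""
    session = {"open": True}  # stand-in for a real DB session
    try:
        try:
            # Awaited inline, so the conversion finishes before the session
            # is closed. With asyncio.create_task() the handler could return
            # and close the session while the task is still running.
            await convert_to_km(session, payload)
        except Exception as exc:
            # Failure is logged as a warning and the main flow continues.
            logger.warning("KM conversion failed: %s", exc)
        return "ok"
    finally:
        session["open"] = False
```

The trade-off is latency: the handler now waits for the conversion, in exchange for a deterministic session lifetime.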

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 02:50:26 +08:00
OG T
6351e9a0e9 feat(mcp-phase2): MCP Phase 2 — Prometheus MCP + SSH MCP + alert labels
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 13m37s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 35s
MCP-2b: prometheus_provider.py
  - prometheus_query (instant PromQL query)
  - prometheus_query_range (historical trend, default 15 minutes)
  - prometheus_get_alert_history (alert firing history)
  - config: PROMETHEUS_URL + PROMETHEUS_MCP_ENABLED
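
For orientation, a range query with a 15-minute default window maps onto the Prometheus HTTP API endpoint /api/v1/query_range roughly as follows. This is a sketch under assumptions: build_query_range_url(), its parameters, and the 30-second step are hypothetical and are not taken from prometheus_provider.py.

```python
from urllib.parse import urlencode


def build_query_range_url(
    base_url: str,
    promql: str,
    end_ts: float,
    window_s: int = 15 * 60,  # default window from the commit message
    step_s: int = 30,         # assumption: resolution step is not stated
) -> str:
    """Build a /api/v1/query_range URL covering [end_ts - window_s, end_ts]."""
    params = {
        "query": promql,
        "start": f"{end_ts - window_s:.0f}",  # Unix timestamps in seconds
        "end": f"{end_ts:.0f}",
        "step": f"{step_s}s",
    }
    return f"{base_url.rstrip('/')}/api/v1/query_range?{urlencode(params)}"
```

urlencode() takes care of escaping PromQL characters such as braces and quotes, so the expression can be passed through unchanged.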

MCP-2a: ssh_provider.py
  - Group A: 9 read-only diagnostic tools (top/disk/memory/logs/status/port/nginx/swap)
  - Group B: 6 safe operation tools (restart/compose/systemctl/clear-log/ssl/nginx-reload)
  - Four layers of security guards (allowlist/allowed_hosts/forbidden_patterns/trust_score)
  - config: SSH_MCP_ENABLED + SSH_MCP_ALLOWED_HOSTS

K8s: 04-ssh-mcp-secret.example.yaml (ssh-mcp-key Secret template + creation steps)

Alert labels: alerts-unified.yml adds mcp_provider/host_type/alert_category
  Covers: HostHighCpuLoad/HostOutOfMemory/HostOutOfDiskSpace/DockerContainer*
        SignOzDown/SentryDown/HarborDown/GiteaDown

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 02:35:35 +08:00
OG T
8724ed7dcf fix(mcp): P1 fixes - DI consistency + additional tests + config optimization
Chief Architect review, P1 fix list:

P1-1 RAG Provider DI pattern consistency:
- Support injection via the rag_service parameter
- Add a close() method
- Deferred import via TYPE_CHECKING
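
The three points above fit a common pattern, sketched below. The class name RAGProviderSketch, the rag_service module path, the _FallbackService class, and the search() method are all hypothetical; only the rag_service parameter, close(), and the use of TYPE_CHECKING come from the message.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated by type checkers only, never at runtime, so importing the
    # provider does not pull in the service. The module path is hypothetical.
    from rag_service import RAGService


class _FallbackService:
    """Hypothetical default service, built lazily when none is injected."""

    def search(self, query: str) -> list:
        return []


class RAGProviderSketch:
    def __init__(self, rag_service: "RAGService | None" = None) -> None:
        # Dependency injection: tests can pass a fake service here.
        self._rag_service = rag_service

    def _get_service(self):
        # Lazy load: create the default service on first use only.
        if self._rag_service is None:
            self._rag_service = _FallbackService()
        return self._rag_service

    def search_runbook(self, query: str) -> list:
        return self._get_service().search(query)

    def close(self) -> None:
        # Drop the reference so the service can be released.
        self._rag_service = None
```

The string annotation keeps the signature valid at runtime even though RAGService is never actually imported.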

P1-3 Additional RAG tests:
- test_rag_provider.py (9 tests)
- DI injection/Lazy Load/Tool Schema/validation/Close

P1-4 Grafana Config cache optimization:
- URL/Key cached after the first lookup
- Fewer repeated settings accesses

P1-5 Configurable embedding dimensions:
- MODEL_DIMENSIONS dict (qwen/llama/nomic)
- default_dimension parameter
- Support for more models

Tests: 9/9 PASSED

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 16:23:30 +08:00
OG T
f1117a3e79 chore: trigger CD build for RAGProvider
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 18:44:30 +08:00
OG T
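539f14bcd5 feat(api): Phase 13.2 #84 RAG Provider + switch to Gemini-first
1. Add RAGProvider MCP Tool Provider
   - search_runbook: semantic search over operations runbooks
   - index_documents: index documents
   - get_index_stats: get index statistics

2. Update AI_FALLBACK_ORDER to put Gemini first
   - Temporary measure: slow Ollama CPU inference was causing mock_fallback
   - Expected to switch back to Ollama on 2026-03-27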

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 18:21:24 +08:00
OG T
30153496d1 fix(api): fix all lint errors (ruff --fix)
- Import sorting (I001)
- Unused imports (F401)
- f-string without placeholders (F541)
- Loop variable unused (B007)
- zip() strict parameter (B905)
- Exception chaining (B904)
- collections.abc imports (UP035)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 16:06:20 +08:00
OG T
e58da5c534 feat(api): Phase 13.2 #83 Grafana MCP Tool
New MCP provider for Grafana dashboard integration:
- list_dashboards: List available dashboards with filtering
- get_dashboard: Get dashboard details by UID
- get_panel_data: Query panel data via Grafana Query API
- generate_dashboard_url: Generate shareable dashboard URLs

Security:
- API key authentication (Bearer token)
- Dashboard UID validation (alphanumeric + dash/underscore)
- Read-only operations only
- 30s request timeout

Config:
- GRAFANA_URL (default: http://192.168.0.188:3000)
- GRAFANA_API_KEY

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 15:36:17 +08:00
OG T
579da38b8b feat(api): Phase 13 智能路由 + CI/CD 整合 (#74-88)
Phase 13.1 CI/CD Integration:
- #76 workflow_run handler for CI failure diagnosis
- #77 SignOz log query (query_logs, error_logs_summary MCP)
- #78 CIAutoRepairService with risk-based execution decisions

Phase 13.3 Smart Routing:
- #85 Intent Classifier v2.0 (rule engine + LLM fallback)
- #86 Complexity Scorer (9-dimension scoring)
- #87 AI Router v3.0 (routing decision matrix)
- #88 Token Counter (OTEL + Langfuse integration)

New files:
- services/ci_auto_repair.py (risk stratification)
- services/model_registry.py (centralized model config)
- services/token_counter.py (677 lines)
- Skill 08: Model Router Expert
- Skill 09: Strangler Pattern Expert
- ADR-023: Smart Routing Architecture
- ADR-024: API Layer Architecture

Tests:
- phase11-conversational.spec.ts (E2E tests)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 15:32:52 +08:00
OG T
cdbd6f0fa6 fix(api): fix MCP providers lint errors
- interfaces.py: fix import ordering
- signoz_provider.py: remove unused variable

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-25 14:44:03 +08:00
OG T
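643946e60c refactor(api): ADR-015 MCP modular architecture refactor
## Refactoring

Follows the leWOOOgo building-block principle:
- Add interfaces.py: MCPToolProvider ABC definition
- Add registry.py: Provider registry (DI pattern)
- Add providers/: concrete K8s, SignOz, and Database implementations
- Refactor mcp_bridge.py: delegate execution through ProviderRegistry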
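
A rough sketch of how an ABC plus a registry can support this kind of delegation. Only the names MCPToolProvider and ProviderRegistry appear in the message; the method names (tool_names, execute, register) and the EchoProvider example are assumptions, not the interfaces defined in ADR-015.

```python
from abc import ABC, abstractmethod
from typing import Any


class MCPToolProvider(ABC):
    """Contract every provider implements (method names are assumed)."""

    @abstractmethod
    def tool_names(self) -> list:
        """Return the names of the tools this provider can execute."""

    @abstractmethod
    def execute(self, tool: str, params: dict) -> Any:
        """Run one tool with already-validated parameters."""


class ProviderRegistry:
    """Maps tool names to providers so the bridge only has to delegate."""

    def __init__(self) -> None:
        self._by_tool: dict = {}

    def register(self, provider: MCPToolProvider) -> None:
        for tool in provider.tool_names():
            if tool in self._by_tool:
                raise ValueError(f"tool already registered: {tool}")
            self._by_tool[tool] = provider

    def execute(self, tool: str, params: dict) -> Any:
        provider = self._by_tool.get(tool)
        if provider is None:
            raise KeyError(f"unknown tool: {tool}")
        return provider.execute(tool, params)


class EchoProvider(MCPToolProvider):
    """Hypothetical provider used only to exercise the registry."""

    def tool_names(self) -> list:
        return ["echo"]

    def execute(self, tool: str, params: dict) -> Any:
        return params
```

With this shape, adding a provider means registering one more object; the bridge code that calls registry.execute() does not change.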

## Code Review fixes

- 🔴 Remove sensitive parameters from _execute_stdio logging
- 🔴 Fix hard-coded i18n strings in conversational-view.tsx

## New files

- apps/api/src/plugins/mcp/interfaces.py
- apps/api/src/plugins/mcp/registry.py
- apps/api/src/plugins/mcp/providers/__init__.py
- apps/api/src/plugins/mcp/providers/k8s_provider.py
- apps/api/src/plugins/mcp/providers/signoz_provider.py
- apps/api/src/plugins/mcp/providers/database_provider.py
- docs/adr/ADR-015-mcp-modular-architecture.md
- .dependency-cruiser.cjs (preparation for Phase 14.2)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-25 14:31:32 +08:00
OG T
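7a8f869104 feat(api): Phase 13.2 #81 PostgreSQL MCP Tool integration
Integrate Approval/Incident/Timeline queries into MCP Bridge:
- list_approvals: list approval requests (filterable by status)
- get_approval: get the details of a single approval
- list_incidents: list Incidents (filterable by status)
- list_timeline: list recent timeline events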

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-25 12:46:52 +08:00
OG T
23b753dbec feat(api): Phase 13.2 #79 SignOz MCP Tool integration
Integrate the real SignOzClient into MCP Bridge:
- gold_metrics: RPS + Error Rate + P99 Latency
- trace_url: dynamic Trace URL generation
- system_metrics: CPU/Disk system metrics

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-25 12:44:46 +08:00
OG T
d31160f4e1 feat(api): Phase 13.2 #80 Kubernetes MCP Tool real implementation
- Integrate real ActionExecutor instead of mock responses
- kubectl_get: Execute real kubectl get with JSON output
- kubectl_delete: Dry-run validation + actual pod deletion
- kubectl_scale: Real kubectl scale command
- kubectl_restart: Deployment rollout restart with validation
- Database query placeholder for #81

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-25 12:30:13 +08:00
OG T
749b8bc554 fix(api): fix timezone import ordering and unused-variable lint errors
- Fix import order (standard → third-party → local)
- Fix undefined datetime/timedelta errors
- Remove unused imports

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-25 09:26:58 +08:00
OG T
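2a2dac865a feat(api): standardize on the Taipei time zone UTC+8 (UTC forbidden)
- Add src/utils/timezone.py with time zone utility functions
- Modify 11 backend files to use now_taipei() throughout
- Update HARD_RULES.md with a time zone hard-rule section
- Update Skills 02/04 with the time zone prohibition

🔴 HARD RULE: datetime.utcnow() / datetime.now(UTC) are forbidden
Correct usage: from src.utils.timezone import now_taipei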
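
A helper like now_taipei() might look as follows. This is a sketch, not the contents of src/utils/timezone.py: the TAIPEI_TZ constant and the fixed-offset approach are assumptions (a fixed +08:00 offset is adequate because Taipei does not currently observe daylight saving time).

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+8 offset; the real module may use zoneinfo instead.
TAIPEI_TZ = timezone(timedelta(hours=8), name="Asia/Taipei")


def now_taipei() -> datetime:
    """Return the current time as a timezone-aware datetime in UTC+8."""
    return datetime.now(TAIPEI_TZ)
```

Because the result is timezone-aware, it can still be compared with or converted to other aware datetimes without ambiguity.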

Memory: feedback_timezone_taipei.md

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-25 09:08:34 +08:00
OG T
6f049877fc fix(lint): ruff auto-fix + add lewooogo-core src to git
- Python: ruff --fix fixed 280 lint errors
- lewooogo-core: the src/ directory was untracked, causing CI eslint to fail

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 23:51:37 +08:00
OG T
196d269b92 feat: add all application source code
- apps/api: FastAPI backend with Dockerfile
- apps/web: Next.js frontend with Dockerfile
- apps/sensor: Signal collection agent
- packages: shared packages

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-22 18:57:44 +08:00