docs: ADR-025 CI/CD AI integration architecture + Skill 07 update
- ADR-025: document the Phase 13.1 CI/CD AI integration architecture decision
  - GitHub Webhook event-driven flow
  - Risk-tiered execution decisions (AUTO/TELEGRAM/APPROVAL/BLOCKED)
  - SignOz log integration
- Skill 07 v1.3: add Grafana MCP + SignOz query_logs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -10,16 +10,18 @@

 | Field | Value |
 |------|-----|
-| **Version** | v1.1 |
+| **Version** | v1.3 |
 | **Created** | 2026-03-25 23:30 (Taipei) |
 | **Created by** | Claude Code |
-| **Last modified** | 2026-03-26 14:20 (Taipei) |
+| **Last modified** | 2026-03-26 18:00 (Taipei) |
 | **Modified by** | Claude Code |

 ### Changelog

 | Version | Date | Author | Changes |
 |------|------|--------|----------|
+| v1.3 | 2026-03-26 18:00 | Claude Code | Added Grafana MCP (#83) + SignOz query_logs |
+| v1.2 | 2026-03-26 23:30 | Claude Code | Added Filesystem MCP Tool (#82 done) |
 | v1.1 | 2026-03-26 14:20 | Claude Code | Updated MCP Tool status (#79/#80/#81 done) |
 | v1.0 | 2026-03-25 23:30 | Claude Code | Initial version - Phase 13.2 Tool Integration Expert |

@@ -39,12 +41,12 @@ Phase 13.2 Tool implementation (P0, highest priority):

 | Tool | Status | Location | Work item |
 |------|------|------|--------|
-| **Kubernetes** | ✅ Real | `mcp_bridge.py` | #80 ✅ |
-| **Database** | ✅ Real | `mcp_bridge.py` | #81 ✅ |
-| **SignOz** | ✅ Real | `mcp_bridge.py` | #79 ✅ |
-| Filesystem | 🟡 Mock | `mcp_bridge.py` | #82 |
-| Grafana | ❌ Missing | - | #83 |
-| Operations runbook RAG | ❌ Missing | - | #84 |
+| **Kubernetes** | ✅ Real | `providers/k8s_provider.py` | #80 ✅ |
+| **Database** | ✅ Real | `providers/database_provider.py` | #81 ✅ |
+| **SignOz** | ✅ Real | `providers/signoz_provider.py` | #79 ✅ |
+| **Filesystem** | ✅ Real | `providers/filesystem_provider.py` | #82 ✅ |
+| **Grafana** | ✅ Real | `providers/grafana_provider.py` | #83 ✅ |
+| Operations runbook RAG | 📋 Design complete | - | #84 (pending implementation) |

 ### Completed Tool Features

@@ -52,6 +54,14 @@ Phase 13.2 Tool implementation (P0, highest priority):
 - `gold_metrics`: RPS, Error Rate, P99 Latency, AI Success Rate
 - `trace_url`: generates a trace query URL
 - `system_metrics`: system-level metrics
+- `query_logs`: log query (service/level/search) - Phase 13.1 #77
+- `error_logs_summary`: error summary statistics - Phase 13.1 #77

+**Grafana MCP (#83)**:
+- `list_dashboards`: lists dashboards (filtering supported)
+- `get_dashboard`: gets dashboard details (by UID)
+- `get_panel_data`: queries panel data
+- `generate_dashboard_url`: generates a shareable URL
+
 **PostgreSQL MCP (#81)**:
 - `list_approvals`: filters by status/incident
@@ -59,6 +69,11 @@ Phase 13.2 Tool implementation (P0, highest priority):
 - `list_incidents`: lists active incidents
 - `list_timeline`: timeline events

+**Filesystem MCP (#82)**:
+- `read_file`: reads file contents (supports tail_lines)
+- `list_directory`: lists a directory (supports glob patterns)
+- `search_in_file`: searches file contents (regular expressions)
+
 **Kubernetes MCP (#80)**:
 - `kubectl_get`: integrated with the real ActionExecutor
 - `kubectl_restart`: Pod/Deployment restart

docs/adr/ADR-025-cicd-ai-integration.md (new file)
@@ -0,0 +1,153 @@
# ADR-025: CI/CD AI Integration Architecture (Phase 13.1)

| Item | Details |
|------|------|
| **Status** | ✅ Accepted |
| **Date** | 2026-03-26 |
| **Decision makers** | Chief Architect + Commander |
| **Phase** | Phase 13.1 |

## Background

Phase 13.1 needs to integrate GitHub CI/CD events into the AWOOOI + OpenClaw system in order to provide:
1. Automatic code review on PR/push
2. Automatic diagnosis of CI failures
3. Risk-tiered automatic repair

## Decision

Adopt an **event-driven + risk-tiered** architecture:

```
GitHub Events
     │
     ├─ pull_request ──→ OpenClaw Code Review
     ├─ push ──────────→ OpenClaw Push Review
     └─ workflow_run ──→ CI Failure Diagnosis
                              │
                              ▼
                     ┌─────────────────┐
                     │  CIAutoRepair   │
                     │     Service     │
                     └─────────────────┘
                              │
              ┌───────────────┼───────────────┐
              ▼               ▼               ▼
        ┌───────────┐   ┌───────────┐   ┌───────────┐
        │    LOW    │   │  MEDIUM   │   │ HIGH/CRIT │
        │ Auto-exec │   │TG confirm │   │ Approval  │
        └───────────┘   └───────────┘   └───────────┘
```

## Architecture Components

### 1. GitHub Webhook Handler (github_webhook.py)

```python
# Supported event types
# - pull_request: opened, synchronize, reopened
# - push: refs/heads/main, refs/heads/master
# - workflow_run: completed + failure/timed_out

# Security mechanisms
# - HMAC-SHA256 signature verification (X-Hub-Signature-256)
# - Repository allowlist (GITHUB_ALLOWED_REPOS)
# - Fail-closed policy
```

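The security mechanisms listed for the webhook handler can be sketched as follows. This is a minimal illustration under stated assumptions, not the contents of `github_webhook.py`: the function names `verify_signature`, `is_allowed`, and `accept_webhook` are hypothetical. The only external fact relied on is that GitHub sends the signature in `X-Hub-Signature-256` as `sha256=` followed by the hex HMAC-SHA256 digest of the raw request body.

```python
import hashlib
import hmac
from typing import Optional, Set


def verify_signature(secret: str, body: bytes, signature_header: Optional[str]) -> bool:
    """Return True only if the header matches the HMAC-SHA256 of the raw body."""
    if not secret or not signature_header:
        # Fail closed: a missing secret or missing header is a rejection.
        return False
    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    expected = ("sha256=" + digest).encode()
    # Constant-time comparison to avoid leaking how many characters matched.
    return hmac.compare_digest(expected, signature_header.encode())


def is_allowed(repo_full_name: str, allowed_repos: Set[str]) -> bool:
    """Fail closed: an empty allowlist rejects every repository."""
    return repo_full_name in allowed_repos


def accept_webhook(secret: str, body: bytes, signature_header: Optional[str],
                   repo_full_name: str, allowed_repos: Set[str]) -> bool:
    """Hypothetical gate combining both checks; both must pass."""
    return (verify_signature(secret, body, signature_header)
            and is_allowed(repo_full_name, allowed_repos))
```

The signature is checked before the allowlist so that unauthenticated requests are rejected before any payload field is trusted.
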
### 2. CI Auto-Repair Service (ci_auto_repair.py)

```python
from enum import Enum


# Risk-level decision
class ExecutionDecision(Enum):
    AUTO_EXECUTE = "auto_execute"            # LOW risk
    TELEGRAM_CONFIRM = "telegram_confirm"    # MEDIUM risk
    APPROVAL_REQUIRED = "approval_required"  # HIGH risk
    BLOCKED = "blocked"                      # CRITICAL


# Integrates Phase 13.3 smart routing
# - Intent Classifier: determines the repair intent
# - Complexity Scorer: assesses complexity
```

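One way the risk level could be turned into an `ExecutionDecision` is a plain lookup table, sketched below. `RiskLevel`, `_DECISIONS`, and `decide` are hypothetical names introduced for illustration, and the fallback to `BLOCKED` for an unrecognised input is an assumption in the spirit of the fail-closed policy, not something this ADR specifies.

```python
from enum import Enum


class RiskLevel(Enum):  # hypothetical: the ADR names the levels but not this type
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


class ExecutionDecision(Enum):
    AUTO_EXECUTE = "auto_execute"
    TELEGRAM_CONFIRM = "telegram_confirm"
    APPROVAL_REQUIRED = "approval_required"
    BLOCKED = "blocked"


# Mapping taken from the comments on the ExecutionDecision members.
_DECISIONS = {
    RiskLevel.LOW: ExecutionDecision.AUTO_EXECUTE,
    RiskLevel.MEDIUM: ExecutionDecision.TELEGRAM_CONFIRM,
    RiskLevel.HIGH: ExecutionDecision.APPROVAL_REQUIRED,
    RiskLevel.CRITICAL: ExecutionDecision.BLOCKED,
}


def decide(risk):
    """Return the decision for a risk level; anything unknown is blocked (assumption)."""
    return _DECISIONS.get(risk, ExecutionDecision.BLOCKED)
```

Keeping the mapping in data rather than in branching code makes the policy easy to review in one place.
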
### 3. SignOz Log Query (signoz_client.py)

```python
# New methods
# - get_logs(): queries logs (service/level/search)
# - get_error_logs_summary(): error summary statistics

# MCP Tools
# - query_logs: general-purpose log query
# - error_logs_summary: error summary
```

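To make the service/level/search filters and the error summary concrete, here is a small sketch that works on in-memory log records. It does not call SignOz and does not reflect the real signatures of `get_logs()` or `get_error_logs_summary()`; the record shape (`service`, `level`, `message` keys) and the function names `filter_logs` and `summarize_errors` are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional


def filter_logs(records: Iterable[dict], service: Optional[str] = None,
                level: Optional[str] = None, search: Optional[str] = None) -> List[dict]:
    """Keep records matching every filter that was supplied."""
    matched = []
    for record in records:
        if service is not None and record.get("service") != service:
            continue
        if level is not None and record.get("level", "").upper() != level.upper():
            continue
        if search is not None and search.lower() not in record.get("message", "").lower():
            continue
        matched.append(record)
    return matched


def summarize_errors(records: Iterable[dict]) -> Dict[str, int]:
    """Count ERROR-level records per service."""
    errors = filter_logs(records, level="ERROR")
    return dict(Counter(r.get("service", "unknown") for r in errors))
```
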
## Data Flow

```
1. GitHub Webhook → AWOOOI API
   ├─ HMAC verification
   └─ Repository allowlist check

2. Background task processing
   ├─ Collect failure information
   ├─ Query SignOz logs
   └─ Call OpenClaw for diagnosis

3. Risk assessment
   ├─ Intent Classification
   ├─ Complexity Scoring
   └─ Generate repair suggestions

4. Execution decision
   ├─ LOW → auto-execute
   ├─ MEDIUM → quick Telegram confirmation
   ├─ HIGH → create an Approval
   └─ CRITICAL → automatic repair prohibited

5. Result notification
   ├─ Store in Redis (7-day TTL)
   └─ Telegram notification
```

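Step 5 of the data flow stores the result with a 7-day TTL. The sketch below only prepares the key, payload, and TTL so that it runs without a Redis server; the key scheme `ci_repair:result:<run_id>`, the payload fields, and the name `build_result_entry` are hypothetical.

```python
import json
from datetime import timedelta
from typing import Tuple

# 7 days expressed in seconds, as required by the data flow above.
RESULT_TTL_SECONDS = int(timedelta(days=7).total_seconds())


def build_result_entry(run_id: int, decision: str, summary: str) -> Tuple[str, str, int]:
    """Return (key, JSON payload, TTL in seconds) for a repair result."""
    key = f"ci_repair:result:{run_id}"  # hypothetical key scheme
    payload = json.dumps(
        {"run_id": run_id, "decision": decision, "summary": summary},
        ensure_ascii=False,  # keep non-ASCII summaries readable
    )
    return key, payload, RESULT_TTL_SECONDS
```

With the redis-py client, the returned triple could then be written as `client.set(key, payload, ex=ttl)`, assuming that is the client library in use.
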
## Error Type Mapping

| Error type | Repair actions | Risk level |
|---------|---------|---------|
| build | clear_cache, fix_dependency | LOW-MEDIUM |
| test | retry_workflow, fix_config | LOW-MEDIUM |
| lint | retry_workflow | LOW |
| deploy | rollback_commit, fix_config | HIGH |
| timeout | restart_runner, scale_resource | LOW-MEDIUM |
| runner | restart_runner | LOW |

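The mapping table can be held as plain data and looked up by error type, as in the sketch below. The names `REPAIR_PLAYBOOK` and `lookup_repair` are hypothetical, each risk range is written as a (lowest, highest) pair, and treating an unknown error type as CRITICAL with no actions is a conservative assumption rather than part of this ADR.

```python
from typing import Dict, Tuple

# error type -> (candidate repair actions, (lowest risk, highest risk))
REPAIR_PLAYBOOK: Dict[str, Tuple[Tuple[str, ...], Tuple[str, str]]] = {
    "build": (("clear_cache", "fix_dependency"), ("LOW", "MEDIUM")),
    "test": (("retry_workflow", "fix_config"), ("LOW", "MEDIUM")),
    "lint": (("retry_workflow",), ("LOW", "LOW")),
    "deploy": (("rollback_commit", "fix_config"), ("HIGH", "HIGH")),
    "timeout": (("restart_runner", "scale_resource"), ("LOW", "MEDIUM")),
    "runner": (("restart_runner",), ("LOW", "LOW")),
}

# Assumption: anything not in the table gets no automatic action.
_UNKNOWN = ((), ("CRITICAL", "CRITICAL"))


def lookup_repair(error_type: str) -> Tuple[Tuple[str, ...], Tuple[str, str]]:
    """Return the candidate actions and risk range for an error type."""
    return REPAIR_PLAYBOOK.get(error_type, _UNKNOWN)
```
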
## Configuration

```python
# config.py
GITHUB_WEBHOOK_SECRET: str   # Webhook signing secret
GITHUB_ALLOWED_REPOS: str    # Allowlisted repositories

# Environment variables (K8s Secret)
# GITHUB_WEBHOOK_SECRET=<secret>
# GITHUB_ALLOWED_REPOS=owner/repo1,owner/repo2
```

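Since `GITHUB_ALLOWED_REPOS` is a single comma-separated string, it has to be split before it can serve as an allowlist. A minimal sketch, assuming the two environment variable names above; `parse_allowed_repos` and `load_webhook_config` are hypothetical helpers, and refusing to start without a secret is an assumption consistent with the fail-closed policy.

```python
import os
from typing import FrozenSet, Mapping, Tuple


def parse_allowed_repos(raw: str) -> FrozenSet[str]:
    """Split 'owner/repo1,owner/repo2' into a set, ignoring blanks and whitespace."""
    return frozenset(part.strip() for part in raw.split(",") if part.strip())


def load_webhook_config(env: Mapping[str, str] = os.environ) -> Tuple[str, FrozenSet[str]]:
    """Read the webhook secret and allowlist from the environment."""
    secret = env.get("GITHUB_WEBHOOK_SECRET", "")
    if not secret:
        # Assumption: without a secret, signatures cannot be checked, so refuse to start.
        raise RuntimeError("GITHUB_WEBHOOK_SECRET is not set")
    allowed = parse_allowed_repos(env.get("GITHUB_ALLOWED_REPOS", ""))
    return secret, allowed
```

An empty `GITHUB_ALLOWED_REPOS` yields an empty set, so no repository is accepted until one is listed explicitly.
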
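## Consequences

### Positive
- CI failures are diagnosed automatically, reducing manual intervention
- Risk tiering guards against erroneous operations
- Integration with smart routing makes decisions more precise

### Negative
- Dependency on OpenClaw (the AI service must be available)
- Added latency (diagnosis requires AI processing time)
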
## Related Documents

- ADR-023: Smart Routing Architecture
- ADR-024: API Layer Architecture
- Skill 07: Tool Integration Expert
- Skill 08: Model Router Expert
- `feedback_cicd_aiops_patterns.md`