awoooi/docs/adr/ADR-009-openclaw-agent-teams.md

# ADR-009: OpenClaw Agent Teams Architecture

**Status**: **Implemented** ✅
**Date**: 2026-03-23
**Implementation completed**: 2026-03-24 (Phases 9.1-9.5 all complete)
**Decision makers**: Commander + AI Architect
**Phase**: 9.1-9.5 (SDK research → full implementation)

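## Background

AWOOOI's core value proposition is "AI Sees. AI Acts. You Approve."

OpenClaw is currently a single AI brain. When it faces a complex alert:
- A single perspective can miss problems
- It cannot analyze multiple aspects in parallel
- Decision quality depends on a single model

Anthropic has released the **Claude Agent SDK** (formerly the Claude Code SDK; v0.1.50 was published on 2026-03-20), which supports multi-agent coordination. We evaluated integrating this concept into the AWOOOI product.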

### SDK Research Findings (2026-03-23)

| Item | Finding |
|------|---------|
| **SDK name** | `claude-agent-sdk` (PyPI) |
| **Latest version** | v0.1.50 (2026-03-20) |
| **Python version** | ≥ 3.10 |
| **Core API** | `query()`, `ClaudeSDKClient` |
| **Subagent support** | ✅ Native (`AgentDefinition`) |
| **Custom tools** | ✅ `@tool` decorator + MCP integration |

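## Decision

**Adopt the Claude Agent SDK to implement OpenClaw Agent Teams, upgrading OpenClaw to a multi-expert consensus decision architecture.**

### Why the Claude Agent SDK (Rather Than Building Our Own)

| Consideration | In-house build | Claude Agent SDK |
|------|---------|------------------|
| Development time | 2-3 weeks | 2-3 days |
| Tool execution | Must implement ourselves | Built in (Read, Edit, Bash...) |
| Subagents | Must design ourselves | Native support |
| Session management | Must implement ourselves | Built in (resume, fork) |
| MCP integration | Requires a bridge | Native support |
| Maintenance cost | High | Low (tracks Anthropic updates) |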

### Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                    OpenClaw Coordinator                      │
│                    (Team Lead Agent)                         │
├─────────────────────────────────────────────────────────────┤
│                           ↓                                  │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐         │
│  │  Security   │  │ BlastRadius │  │   Action    │         │
│  │   Agent     │  │   Agent     │  │  Planner    │         │
│  │  (security) │  │   (impact)  │  │   (actions) │         │
│  └─────────────┘  └─────────────┘  └─────────────┘         │
│         ↓                ↓                ↓                  │
│  ┌─────────────────────────────────────────────────────┐   │
│  │              Consensus Engine                        │   │
│  │              (weighted voting)                       │   │
│  └─────────────────────────────────────────────────────┘   │
│                           ↓                                  │
│  ┌─────────────────────────────────────────────────────┐   │
│  │              Final Proposal                          │   │
│  │              (unified proposal → human approval)     │   │
│  └─────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
```

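### Agent Responsibilities

| Agent | Responsibility | Output |
|-------|------|------|
| **Coordinator** | Assigns tasks, aggregates consensus | Final Proposal |
| **SecurityAgent** | Assesses security risk and permission impact | Risk Score (0-10) |
| **BlastRadiusAgent** | Analyzes blast radius and dependent services | Affected Services List |
| **ActionPlannerAgent** | Plans remediation steps and rollback | Action Steps + Rollback |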

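### Consensus Mechanism

```python
class ConsensusEngine:
    # Per-agent weights; they sum to 1.0
    weights = {
        "security": 0.4,      # security carries the highest weight
        "blast_radius": 0.3,  # blast radius comes next
        "action_plan": 0.3,   # action plan
    }

    def calculate_confidence(self, results: dict) -> float:
        """Compute the overall confidence score as a weighted sum."""
        score = 0.0
        for agent, weight in self.weights.items():
            score += results[agent].confidence * weight
        return score

    def should_auto_approve(self, confidence: float) -> bool:
        """Confidence > 0.9 and no high risk → eligible for automatic execution."""
        # has_high_risk() is not shown in this ADR
        return confidence > 0.9 and not self.has_high_risk()
```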
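
As a worked example of the weighted sum, the sketch below plugs three hypothetical agent confidences into the 0.4 / 0.3 / 0.3 weights. The `AgentScore` stand-in, the sample numbers, and the `HIGH_RISK_THRESHOLD` cut-off of 7 are assumptions made for illustration; this ADR does not define what counts as high risk.

```python
from dataclasses import dataclass

# Weights copied from the ConsensusEngine definition in this ADR.
WEIGHTS = {"security": 0.4, "blast_radius": 0.3, "action_plan": 0.3}

# Assumption: a risk score of 7 or more (on the 0-10 scale) counts as high risk.
HIGH_RISK_THRESHOLD = 7.0


@dataclass
class AgentScore:
    """Hypothetical stand-in for one agent's result."""
    confidence: float        # 0-1
    risk_score: float = 0.0  # 0-10, only meaningful for the security agent


def weighted_confidence(results: dict[str, AgentScore]) -> float:
    """Weighted sum of the per-agent confidences."""
    return sum(results[name].confidence * w for name, w in WEIGHTS.items())


def auto_approvable(results: dict[str, AgentScore]) -> bool:
    """Auto-approve only above 0.9 confidence and below the risk threshold."""
    high_risk = results["security"].risk_score >= HIGH_RISK_THRESHOLD
    return weighted_confidence(results) > 0.9 and not high_risk


sample = {
    "security": AgentScore(confidence=0.95, risk_score=2.0),
    "blast_radius": AgentScore(confidence=0.90),
    "action_plan": AgentScore(confidence=0.80),
}
score = weighted_confidence(sample)  # 0.4*0.95 + 0.3*0.90 + 0.3*0.80 = 0.89
print(round(score, 2), auto_approvable(sample))
```

With these sample values the score lands just under the 0.9 bar, so the proposal would go to a human for approval even though the security risk is low.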

## Technical Implementation

### Dependencies (Phase 9.2 Research Results)

```toml
# apps/api/pyproject.toml
[project]
dependencies = [
    # Phase 9: OpenClaw Agent Teams
    "claude-agent-sdk>=0.1.50",  # Claude Agent SDK (formerly Claude Code SDK)
]
# Note: the SDK bundles the Claude Code CLI; no separate install is needed
```

#### Installation

```bash
# Using uv (recommended)
uv add claude-agent-sdk

# Using pip
pip install claude-agent-sdk

# Verify the installation
python -c "from claude_agent_sdk import query; print('OK')"
```

#### Environment Variables

```bash
# Required
export ANTHROPIC_API_KEY=sk-ant-...

# Optional (cloud fallback, see ADR-006)
export CLAUDE_CODE_USE_BEDROCK=1   # AWS Bedrock
export CLAUDE_CODE_USE_VERTEX=1    # Google Vertex AI
```

### Core Classes (Using the Claude Agent SDK)

```python
# apps/api/src/services/openclaw_team.py

import json  # needed by json.dumps() in analyze_incident
from dataclasses import dataclass

from claude_agent_sdk import (
    query,
    ClaudeAgentOptions,
    AgentDefinition,
    ResultMessage,
)


@dataclass
class AgentResult:
    agent: str
    analysis: str
    confidence: float
    risk_score: float | None = None
    affected_services: list[str] | None = None
    action_steps: list[str] | None = None


@dataclass
class Proposal:
    incident_id: str
    summary: str
    agent_results: list[AgentResult]
    consensus_score: float
    recommended_action: str
    auto_approvable: bool


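class OpenClawTeam:
    """
    Multi-expert coordinated analysis built on the Claude Agent SDK.
    Conforms to the leWOOOgo BRAIN block interface.
    """

    def __init__(self):
        # Define the expert Subagents
        self.agents = {
            "security-expert": AgentDefinition(
                description="Security expert who assesses security risk and permission impact",
                prompt="""You are AWOOOI's security expert.
                Analyze the security risk of the alert and assess:
                1. Whether sensitive data is involved
                2. Whether it could be exploited
                3. Whether a permission boundary has been breached
                Output JSON: {"risk_score": 0-10, "analysis": "...", "confidence": 0-1}""",
                tools=["Read", "Grep"],  # read-only access
            ),
            "blast-radius": AgentDefinition(
                description="Blast radius analyst who assesses dependent services and scope of impact",
                prompt="""You are AWOOOI's blast radius analyst.
                Analyze the scope of impact of the alert:
                1. Directly affected services
                2. Indirectly dependent services
                3. Estimated number of affected users
                Output JSON: {"affected_services": [...], "blast_radius": "low|medium|high", "confidence": 0-1}""",
                tools=["Read", "Glob", "Grep"],
            ),
            "action-planner": AgentDefinition(
                description="Action planner who defines remediation steps and a rollback plan",
                prompt="""You are AWOOOI's action planner.
                Draw up a remediation plan for the alert:
                1. Immediate remediation steps (kubectl commands)
                2. Verification steps
                3. Rollback plan
                Note: every kubectl command must include -n awoooi-prod
                Output JSON: {"action_steps": [...], "rollback_steps": [...], "confidence": 0-1}""",
                tools=["Read", "Glob"],
            ),
        }

        self.options = ClaudeAgentOptions(
            allowed_tools=["Read", "Glob", "Grep", "Agent"],  # Agent is used to invoke Subagents
            agents=self.agents,
            system_prompt="""You are the OpenClaw Coordinator, AWOOOI's AI decision engine.
            Your job is to coordinate multiple expert Agents to analyze an alert, aggregate their consensus, and produce the final proposal.
            Call order: security-expert → blast-radius → action-planner
            Output a single unified proposal for human approval.""",
        )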

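    async def analyze_incident(self, incident: dict) -> Proposal:
        """
        Have the Coordinator call each Subagent in order to analyze the alert.
        """
        prompt = f"""
        Analyze the following alert and produce a remediation proposal:

        ```json
        {json.dumps(incident, ensure_ascii=False, indent=2)}
        ```

        Call the following Agents in order:
        1. security-expert - assess security risk
        2. blast-radius - analyze the blast radius
        3. action-planner - plan remediation steps

        After collecting all analysis results, apply the ConsensusEngine logic (security 40%, blast_radius 30%, action 30%)
        to compute the overall confidence score, then produce the final proposal.

        Output format:
        ```json
        {{
          "summary": "one-sentence summary",
          "agent_results": [...],
          "consensus_score": 0-1,
          "recommended_action": "recommended kubectl command",
          "auto_approvable": true/false (>0.9 and no high risk)
        }}
        ```
        """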

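        result_json = None
        async for message in query(prompt=prompt, options=self.options):
            # ResultMessage.result can be empty, so guard before parsing
            if isinstance(message, ResultMessage) and message.result:
                # Parse the final result
                result_json = self._extract_json(message.result)

        if not result_json:
            raise ValueError("Agent Team failed to produce a valid proposal")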

        return Proposal(
            incident_id=incident.get("id", "unknown"),
            summary=result_json.get("summary", ""),
            agent_results=self._parse_agent_results(result_json.get("agent_results", [])),
            consensus_score=result_json.get("consensus_score", 0),
            recommended_action=result_json.get("recommended_action", ""),
            auto_approvable=result_json.get("auto_approvable", False),
        )

    def _extract_json(self, text: str) -> dict:
        """Extract JSON from the response."""
        import json
        import re
        # Prefer the body of a ```json fence; otherwise parse the whole text
        match = re.search(r'```json\s*(.*?)\s*```', text, re.DOTALL)
        if match:
            return json.loads(match.group(1))
        return json.loads(text)

    def _parse_agent_results(self, results: list) -> list[AgentResult]:
        """Parse each Agent's result."""
        return [
            AgentResult(
                agent=r.get("agent", "unknown"),
                analysis=r.get("analysis", ""),
                confidence=r.get("confidence", 0),
                risk_score=r.get("risk_score"),
                affected_services=r.get("affected_services"),
                action_steps=r.get("action_steps"),
            )
            for r in results
        ]
```
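
To make the `_extract_json` step concrete, here is a standalone sketch of the same fenced-JSON extraction that runs without the SDK. The `extract_json` function name and the sample reply are made up for illustration; the regular expression is the one used in the class above.

```python
import json
import re

# Matches the body of the first fenced json block in a model response.
FENCED_JSON = re.compile(r"```json\s*(.*?)\s*```", re.DOTALL)


def extract_json(text: str) -> dict:
    """Parse the JSON inside the first json fence, else parse the whole text."""
    match = FENCED_JSON.search(text)
    payload = match.group(1) if match else text
    return json.loads(payload)


# Hypothetical coordinator reply, built with a variable so this example
# contains no literal fence at the start of a line.
FENCE = "`" * 3
reply = (
    "Here is the proposal:\n"
    f"{FENCE}json\n"
    '{"summary": "restart the pod", "consensus_score": 0.89, "auto_approvable": false}\n'
    f"{FENCE}\n"
)

proposal = extract_json(reply)
print(proposal["consensus_score"], proposal["auto_approvable"])
```

If the reply contains neither a fence nor valid JSON, `json.loads` raises `json.JSONDecodeError` (a subclass of `ValueError`), so callers that want a fallback need to catch it.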

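### Alternative: ClaudeSDKClient (Interactive)

```python
# For scenarios that need human-in-the-loop interaction
import json

from claude_agent_sdk import ClaudeAgentOptions, ClaudeSDKClient, ResultMessage


async def collect_response(client: ClaudeSDKClient) -> str | None:
    """One possible helper: read the current turn and return its final result text."""
    async for message in client.receive_response():
        if isinstance(message, ResultMessage):
            return message.result
    return None


async def interactive_analysis(incident: dict, options: ClaudeAgentOptions):
    # `options` is the same ClaudeAgentOptions that OpenClawTeam builds
    async with ClaudeSDKClient(options=options) as client:
        # Round 1: security analysis
        await client.query(f"Use security-expert to analyze: {json.dumps(incident)}")
        security_result = await collect_response(client)

        # A human can step in here and adjust

        # Round 2: blast radius
        await client.query("Continue with blast-radius to analyze the blast radius")
        blast_result = await collect_response(client)

        # ...
```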

### API Endpoint

```python
# apps/api/src/routes/incidents.py

@router.post("/api/v1/incidents/{incident_id}/analyze")
async def analyze_with_team(incident_id: str):
    """Analyze an alert with the Agent Team."""
    incident = await get_incident(incident_id)
    team = OpenClawTeam()
    proposal = await team.analyze_incident(incident)

    return {
        "proposal": proposal,
        "agent_results": proposal.agent_results,
        "consensus_score": proposal.consensus_score,
        "auto_approvable": proposal.auto_approvable
    }
```

### UI Presentation

```tsx
// apps/web/src/components/incident/agent-team-analysis.tsx

export function AgentTeamAnalysis({ proposal }: Props) {
  return (
    <GlassCard>
      <h3>{t('incident.teamAnalysis')}</h3>

      {/* Per-Agent analysis results */}
      <div className="grid grid-cols-3 gap-4">
        {proposal.agentResults.map(result => (
          <AgentResultCard
            key={result.agent}
            agent={result.agent}
            confidence={result.confidence}
            summary={result.analysis}
          />
        ))}
      </div>

      {/* Consensus score */}
      <ConsensusScore score={proposal.consensusScore} />

      {/* Final proposal */}
      <ProposalCard proposal={proposal} />
    </GlassCard>
  )
}
```

## Mapping to leWOOOgo Blocks

| Block category | New module |
|---------|---------|
| **BRAIN** | `SecurityAgent` |
| **BRAIN** | `BlastRadiusAgent` |
| **BRAIN** | `ActionPlannerAgent` |
| **BRAIN** | `CoordinatorAgent` |
| **BRAIN** | `ConsensusEngine` |

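## Consequences

### Advantages

1. **Multi-perspective analysis** - each expert Agent covers its own area
2. **Consensus decisions** - weighted voting improves decision quality
3. **Explainability** - each Agent's analysis is transparent
4. **Extensibility** - more expert Agents can be added
5. **Differentiation** - competitors do not offer this capability

### Drawbacks

1. **Higher cost** - multiple Agent calls increase API spend
2. **Higher latency** - even with parallel analysis, the result waits on the slowest Agent
3. **Complexity** - the consensus mechanism needs tuning

### Risks

| Risk | Mitigation |
|------|---------|
| Runaway API cost | Set token limits; add a caching strategy |
| Conflicting Agent opinions | Weighted voting in the consensus engine |
| SDK instability | Prototype against the Anthropic SDK first |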
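
One way to read the caching mitigation is to reuse a proposal when the same alert fires again within a short window. The sketch below is only an illustration under stated assumptions: `ProposalCache`, the 300-second TTL, and the choice to ignore the `id` field when fingerprinting are all hypothetical and not part of the implemented system.

```python
import hashlib
import json
import time


class ProposalCache:
    """In-memory TTL cache keyed by a fingerprint of the alert payload."""

    def __init__(self, ttl_seconds: float = 300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries: dict[str, tuple[float, object]] = {}

    @staticmethod
    def fingerprint(incident: dict) -> str:
        # Assumption: "id" differs per occurrence, so it is left out of the key.
        stable = {k: v for k, v in incident.items() if k != "id"}
        canonical = json.dumps(stable, sort_keys=True, ensure_ascii=False)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get(self, incident: dict):
        entry = self._entries.get(self.fingerprint(incident))
        if entry is None:
            return None
        expires_at, proposal = entry
        if self._clock() >= expires_at:
            return None  # expired; the caller should re-run the analysis
        return proposal

    def put(self, incident: dict, proposal) -> None:
        expires_at = self._clock() + self._ttl
        self._entries[self.fingerprint(incident)] = (expires_at, proposal)


cache = ProposalCache(ttl_seconds=300)
cache.put({"id": "a-1", "severity": "P2", "service": "api"}, "cached proposal")
# A repeat of the same alert with a new id hits the cache.
print(cache.get({"id": "a-2", "severity": "P2", "service": "api"}))
```

A cache like this trades freshness for cost, so the TTL would need to be short enough that a stale proposal is never approved for a changed situation.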

## Integration with leWOOOgo (ADR-003)

OpenClaw Agent Teams plugs into the leWOOOgo architecture as a **BRAIN block**:

```
┌─────────────────────────────────────────────────────────────────┐
│                      leWOOOgo Engine                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   🧱 INPUT ──────→ 🧠 BRAIN ──────────────→ 📢 OUTPUT          │
│   (Prometheus)      │                         (Telegram)        │
│                     │                                           │
│              ┌──────┴──────┐                                    │
│              │ OpenClawTeam │  ← NEW: Agent Teams              │
│              │  (SDK-based) │                                   │
│              └──────┬──────┘                                    │
│                     │                                           │
│              ┌──────┴──────┐                                    │
│              │ 🔧 ACTION   │                                    │
│              │ K8sExecutor │                                    │
│              └─────────────┘                                    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

### BRAIN Block Interface Implementation

```python
# packages/lewooogo-brain/src/openclaw_team_plugin.py

import os

from lewooogo_core.interfaces import AgentProvider, AgentInput, AgentOutput

# OpenClawTeam is the class from apps/api/src/services/openclaw_team.py (import omitted here)


class OpenClawTeamPlugin(AgentProvider):
    """
    leWOOOgo BRAIN block: OpenClaw Agent Teams
    Conforms to the AgentProvider interface defined in ADR-003
    """

    id = "openclaw-agent-team"
    name = "OpenClaw Agent Team"
    version = "0.1.0"
    category = "BRAIN"

    def __init__(self):
        self.team = OpenClawTeam()

    async def initialize(self) -> None:
        # Verify the API key; raise instead of assert, since asserts are stripped under `python -O`
        if not os.environ.get("ANTHROPIC_API_KEY"):
            raise RuntimeError("Missing ANTHROPIC_API_KEY")

    async def process(self, input: AgentInput) -> AgentOutput:
        proposal = await self.team.analyze_incident(input.payload)
        return AgentOutput(
            result=proposal,
            confidence=proposal.consensus_score,
            metadata={"agent_count": 3, "sdk_version": "0.1.50"},
        )

    def get_capabilities(self) -> list[str]:
        return [
            "security-analysis",
            "blast-radius-analysis",
            "action-planning",
            "consensus-decision",
        ]

    async def health_check(self) -> dict:
        return {"status": "healthy", "sdk": "claude-agent-sdk"}

    async def shutdown(self) -> None:
        pass
```

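## Integration with ADR-006 (AI Fallback)

Agent Teams plugs into the existing AI fallback strategy:

```
Priority 1: Ollama (local) → simple alerts go to Ollama
Priority 2: Claude Agent SDK → complex alerts go to Agent Teams
Priority 3: Gemini API → fallback when the SDK fails
Priority 4: Static response
```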

### Routing Logic

```python
from claude_agent_sdk import ClaudeSDKError


class OpenClawRouter:
    async def route(self, incident: dict) -> Proposal:
        # Pick a handler based on alert complexity
        if self._is_simple_alert(incident):
            # Simple alert: Ollama is sufficient
            return await self.ollama_handler.analyze(incident)
        else:
            # Complex alert: use Agent Teams
            try:
                return await self.agent_team.analyze_incident(incident)
            except (ClaudeSDKError, ValueError):
                # SDK failure or an unparseable proposal: degrade to Gemini
                return await self.gemini_fallback.analyze(incident)

    def _is_simple_alert(self, incident: dict) -> bool:
        # Rule: P3/P4 and at most one affected service → simple
        severity = incident.get("severity", "P3")
        affected = incident.get("affected_services", [])
        return severity in ["P3", "P4"] and len(affected) <= 1
```
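
The router covers the first three priority levels; the fourth, the static response, could be added as the end of a generic fallback chain. The following is a minimal sketch under assumptions: `run_with_fallback`, `FallbackResult`, the stand-in handlers, and the static message are all hypothetical, and real handlers would call Ollama, Agent Teams, or Gemini.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable

Handler = Callable[[dict], Awaitable[str]]

STATIC_RESPONSE = "Automatic analysis unavailable; manual review required."


@dataclass
class FallbackResult:
    source: str    # name of the handler that produced the answer
    proposal: str


async def run_with_fallback(
    incident: dict, handlers: list[tuple[str, Handler]]
) -> FallbackResult:
    """Try each handler in priority order, ending with a static response."""
    for name, handler in handlers:
        try:
            return FallbackResult(source=name, proposal=await handler(incident))
        except Exception:
            # Any failure moves on to the next priority level.
            continue
    return FallbackResult(source="static", proposal=STATIC_RESPONSE)


# Hypothetical stand-ins for the real handlers.
async def agent_team(incident: dict) -> str:
    raise RuntimeError("SDK unavailable")


async def gemini(incident: dict) -> str:
    return f"restart {incident['service']}"


chain = [("agent_team", agent_team), ("gemini", gemini)]
result = asyncio.run(run_with_fallback({"service": "api"}, chain))
print(result.source)  # gemini
```

Catching every `Exception` keeps the sketch short; a production chain would more likely catch specific error types per handler and log each failure before moving on.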

## Implementation Acceptance (2026-03-24)

| Phase | Scope | Status | Actual |
|-------|------|------|------|
| 9.1 | ADR review + SDK research | ✅ Done | 0.5 days |
| 9.2 | SDK integration + POC | ✅ Done | 1 day |
| 9.3 | 3 expert Agents implemented | ✅ Done | 1.5 days |
| 9.4 | ConsensusEngine + leWOOOgo integration | ✅ Done | 1 day |
| 9.5 | API endpoint + UI presentation | ✅ Done | 1 day |

**Total: 5 days** (finished ahead of the estimate)

### Implementation Files

```
apps/api/src/agents/
├── __init__.py
├── security_agent.py      # SecurityAgent
├── blast_radius_agent.py  # BlastRadiusAgent
├── action_planner_agent.py # ActionPlannerAgent
└── coordinator.py         # OpenClawCoordinator

apps/api/src/services/
├── consensus_engine.py    # ConsensusEngine
└── openclaw_team.py       # OpenClawTeam integration
```

### Phase 9.2 POC Verification Steps

```bash
# 1. Install the SDK
cd apps/api && uv add claude-agent-sdk

# 2. Create the test script
cat > scripts/test-agent-team.py << 'EOF'
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions, AgentDefinition

async def main():
    # Simple Subagent test
    options = ClaudeAgentOptions(
        allowed_tools=["Agent"],
        agents={
            "test-agent": AgentDefinition(
                description="Test Agent",
                prompt="Answer the question and return JSON",
                tools=[],
            )
        },
    )

    async for msg in query(
        prompt="Use test-agent to answer: 2+2=?",
        options=options,
    ):
        print(msg)

asyncio.run(main())
EOF

# 3. Run the test
python scripts/test-agent-team.py
```

## Related ADRs

- ADR-003: leWOOOgo module architecture (BRAIN block)
- ADR-006: AI fallback strategy (fallback integration)
- ADR-001: MCP Protocol adoption (the SDK supports MCP)

## References

- [Claude Agent SDK Overview](https://platform.claude.com/docs/en/agent-sdk/overview)
- [Claude Agent SDK Quickstart](https://platform.claude.com/docs/en/agent-sdk/quickstart)
- [Claude Agent SDK Python GitHub](https://github.com/anthropics/claude-agent-sdk-python)
- [Claude Agent SDK Demos](https://github.com/anthropics/claude-agent-sdk-demos)
- [LangGraph + Claude Agent SDK Integration](https://www.mager.co/blog/2026-03-07-langgraph-claude-agent-sdk-ultimate-guide/)

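## Changelog

| Date | Version | Change | Author |
|------|------|------|------|
| 2026-03-23 | v0.1 | Initial proposal | AI Architect |
| 2026-03-23 | v0.2 | SDK research completed; added the concrete integration plan | AI Architect |
| 2026-04-24 | v0.3 | [ADR-095](ADR-095-12agent-sdk-integration.md) extends the ConsensusEngine weights to 12 agents (security=0.4 locked, the remaining 0.6 redistributed); the original 3 core agents are retained | 12-Agent panoramic analysis |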