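ADR-009: OpenClaw Agent Teams Architecture

Status: Proposed → Research complete
Date: 2026-03-23
Decision makers: Commander + AI Architect
Phase: 9.1-9.2 (SDK research + architecture design)

Background

AWOOOI's core value is "AI Sees. AI Acts. You Approve."

OpenClaw is currently a single AI brain. When it faces a complex alert:

A single perspective may miss problems
It cannot analyze several aspects in parallel
Decision quality depends on a single model

Anthropic has released the Claude Agent SDK (formerly the Claude Code SDK; v0.1.50 was published on 2026-03-20), which supports multi-agent coordination. We evaluated integrating this concept into the AWOOOI product.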

SDK Research Findings (2026-03-23)

Item	Finding
SDK name	`claude-agent-sdk` (PyPI)
Latest version	v0.1.50 (2026-03-20)
Python version	≥ 3.10
Core API	`query()`, `ClaudeSDKClient`
Subagent support	✅ Native (`AgentDefinition`)
Custom tools	✅ `@tool` decorator + MCP integration
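Decision

Adopt the Claude Agent SDK to implement OpenClaw Agent Teams, upgrading OpenClaw to a multi-expert consensus decision architecture.

Why the Claude Agent SDK (Rather Than Building Our Own)

Consideration	In-house build	Claude Agent SDK
Development time	2-3 weeks	2-3 days
Tool execution	Must be implemented ourselves	Built in (Read, Edit, Bash...)
Subagents	Must be designed ourselves	Native support
Session management	Must be implemented ourselves	Built in (resume, fork)
MCP integration	Requires a bridge	Native support
Maintenance cost	High	Low (tracks Anthropic's updates)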

Architecture Design

┌─────────────────────────────────────────────────────────────┐
│                    OpenClaw Coordinator                      │
│                    (Team Lead Agent)                         │
├─────────────────────────────────────────────────────────────┤
│                           ↓                                  │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐         │
│  │  Security   │  │ BlastRadius │  │   Action    │         │
│  │   Agent     │  │   Agent     │  │  Planner    │         │
│  │ (Assessment)│  │  (Impact)   │  │(Action plan)│         │
│  └─────────────┘  └─────────────┘  └─────────────┘         │
│         ↓                ↓                ↓                  │
│  ┌─────────────────────────────────────────────────────┐   │
│  │              Consensus Engine                        │   │
│  │              (weighted voting)                       │   │
│  └─────────────────────────────────────────────────────┘   │
│                           ↓                                  │
│  ┌─────────────────────────────────────────────────────┐   │
│  │              Final Proposal                          │   │
│  │              (unified proposal → human approval)     │   │
│  └─────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘

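Agent Responsibilities

Agent	Responsibility	Output
Coordinator	Assigns tasks and consolidates the consensus	Final Proposal
SecurityAgent	Assesses security risk and permission impact	Risk Score (0-10)
BlastRadiusAgent	Analyzes the impact scope and dependent services	Affected Services List
ActionPlannerAgent	Plans remediation steps and a rollback plan	Action Steps + Rollback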

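Consensus Mechanism

class ConsensusEngine:
    # Assumption: a risk_score at or above this value (0-10 scale) counts as
    # high risk. The ADR does not state the threshold, so tune it as needed.
    HIGH_RISK_THRESHOLD = 7.0

    weights = {
        "security": 0.4,      # security carries the highest weight
        "blast_radius": 0.3,  # impact scope comes next
        "action_plan": 0.3,   # action plan
    }

    def calculate_confidence(self, results: dict) -> float:
        """Compute the overall confidence score as a weighted sum."""
        score = 0.0
        for agent, weight in self.weights.items():
            score += results[agent].confidence * weight
        return score

    def has_high_risk(self, results: dict) -> bool:
        """Return True if the security agent reported a high risk score."""
        risk = results["security"].risk_score
        return risk is not None and risk >= self.HIGH_RISK_THRESHOLD

    def should_auto_approve(self, confidence: float, results: dict) -> bool:
        """Confidence > 0.9 and no high risk → eligible for automatic execution."""
        return confidence > 0.9 and not self.has_high_risk(results)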
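As a worked illustration of the weighted score, the sketch below applies the 0.4 / 0.3 / 0.3 weights to three agent confidences. The `weighted_confidence` helper and the sample values are assumptions made up for this example. They are not part of the planned implementation.

```python
# Weights as listed in the ConsensusEngine class.
WEIGHTS = {"security": 0.4, "blast_radius": 0.3, "action_plan": 0.3}


def weighted_confidence(confidences):
    """Return the weighted sum of per-agent confidences (each in 0..1)."""
    return sum(confidences[name] * weight for name, weight in WEIGHTS.items())


# Hypothetical sample values, chosen only to show the arithmetic.
sample = {"security": 0.95, "blast_radius": 0.90, "action_plan": 0.85}
score = weighted_confidence(sample)

# 0.4*0.95 + 0.3*0.90 + 0.3*0.85 = 0.38 + 0.27 + 0.255 = 0.905
print(round(score, 3))
```

Because the weights sum to 1, the score stays within 0..1. In this sample it lands just above the 0.9 cutoff, so only the high-risk check would decide whether the proposal can be auto-approved.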

Technical Implementation

Dependencies (Phase 9.2 research findings)

# apps/api/pyproject.toml
[project]
dependencies = [
    # Phase 9: OpenClaw Agent Teams
    "claude-agent-sdk>=0.1.50",  # Claude Agent SDK (formerly Claude Code SDK)
]
# Note: the SDK bundles the Claude Code CLI, so no separate install is needed

Installation

# Using uv (recommended)
uv add claude-agent-sdk

# Using pip
pip install claude-agent-sdk

# Verify the installation
python -c "from claude_agent_sdk import query; print('OK')"

Environment Variables

# Required
export ANTHROPIC_API_KEY=sk-ant-...

# Optional (cloud fallback, see ADR-006)
export CLAUDE_CODE_USE_BEDROCK=1   # AWS Bedrock
export CLAUDE_CODE_USE_VERTEX=1    # Google Vertex AI

Core Classes (using the Claude Agent SDK)

# apps/api/src/services/openclaw_team.py

import asyncio
import json
from claude_agent_sdk import (
    query,
    ClaudeAgentOptions,
    AgentDefinition,
    ClaudeSDKClient,
    AssistantMessage,
    ResultMessage,
)
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class AgentResult:
    agent: str
    analysis: str
    confidence: float
    risk_score: float | None = None
    affected_services: list[str] | None = None
    action_steps: list[str] | None = None


@dataclass
class Proposal:
    incident_id: str
    summary: str
    agent_results: list[AgentResult]
    consensus_score: float
    recommended_action: str
    auto_approvable: bool


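class OpenClawTeam:
    """
    Multi-expert coordinated analysis built on the Claude Agent SDK.
    Conforms to the leWOOOgo BRAIN block interface.
    """

    def __init__(self):
        # Define the expert subagents
        self.agents = {
            "security-expert": AgentDefinition(
                description="Security expert who assesses security risk and permission impact",
                prompt="""You are AWOOOI's security expert.
                Analyze the alert's security risk and assess:
                1. Whether sensitive data is involved
                2. Whether it could be exploited
                3. Whether a permission boundary has been breached
                Output JSON: {"risk_score": 0-10, "analysis": "...", "confidence": 0-1}""",
                tools=["Read", "Grep"],  # read-only access
            ),
            "blast-radius": AgentDefinition(
                description="Impact analyst who assesses dependent services and the impact scope",
                prompt="""You are AWOOOI's impact analyst.
                Analyze the alert's impact scope:
                1. Directly affected services
                2. Indirectly dependent services
                3. Estimated number of affected users
                Output JSON: {"affected_services": [...], "blast_radius": "low|medium|high", "confidence": 0-1}""",
                tools=["Read", "Glob", "Grep"],
            ),
            "action-planner": AgentDefinition(
                description="Action planner who defines remediation steps and a rollback plan",
                prompt="""You are AWOOOI's action planner.
                Draw up a remediation plan for the alert:
                1. Immediate remediation steps (kubectl commands)
                2. Verification steps
                3. Rollback plan
                Note: every kubectl command must include -n awoooi-prod
                Output JSON: {"action_steps": [...], "rollback_steps": [...], "confidence": 0-1}""",
                tools=["Read", "Glob"],
            ),
        }

        self.options = ClaudeAgentOptions(
            allowed_tools=["Read", "Glob", "Grep", "Agent"],  # Agent is used to invoke subagents
            agents=self.agents,
            system_prompt="""You are OpenClaw Coordinator, AWOOOI's AI decision engine.
            Your task is to coordinate multiple expert agents to analyze alerts, consolidate their consensus, and produce a final proposal.
            Call order: security-expert → blast-radius → action-planner
            Output a single unified proposal for human approval.""",
        )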

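    async def analyze_incident(self, incident: dict) -> Proposal:
        """
        Ask the coordinator to run each subagent on the alert, in the order listed in the prompt.
        """
        prompt = f"""
        Analyze the following alert and produce a remediation proposal:

        ```json
        {json.dumps(incident, ensure_ascii=False, indent=2)}
        ```

        Call the following agents in order:
        1. security-expert - assess security risk
        2. blast-radius - analyze the impact scope
        3. action-planner - plan remediation steps

        After collecting all analysis results, apply the ConsensusEngine logic (security 40%, blast_radius 30%, action 30%)
        to compute the overall confidence score, then produce the final proposal.

        Output format:
        ```json
        {{
          "summary": "one-sentence summary",
          "agent_results": [...],
          "consensus_score": 0-1,
          "recommended_action": "recommended kubectl command",
          "auto_approvable": true/false (>0.9 and no high risk)
        }}
        ```
        """

        result_json = None
        async for message in query(prompt=prompt, options=self.options):
            # Skip result messages that carry no text
            if isinstance(message, ResultMessage) and message.result:
                # Parse the final result
                result_json = self._extract_json(message.result)

        if not result_json:
            raise ValueError("Agent Team failed to produce a valid proposal")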

        return Proposal(
            incident_id=incident.get("id", "unknown"),
            summary=result_json.get("summary", ""),
            agent_results=self._parse_agent_results(result_json.get("agent_results", [])),
            consensus_score=result_json.get("consensus_score", 0),
            recommended_action=result_json.get("recommended_action", ""),
            auto_approvable=result_json.get("auto_approvable", False),
        )

    def _extract_json(self, text: str) -> dict:
        """Extract JSON from the response."""
        import json
        import re
        match = re.search(r'```json\s*(.*?)\s*```', text, re.DOTALL)
        if match:
            return json.loads(match.group(1))
        return json.loads(text)

    def _parse_agent_results(self, results: list) -> list[AgentResult]:
        """Parse each agent's result."""
        return [
            AgentResult(
                agent=r.get("agent", "unknown"),
                analysis=r.get("analysis", ""),
                confidence=r.get("confidence", 0),
                risk_score=r.get("risk_score"),
                affected_services=r.get("affected_services"),
                action_steps=r.get("action_steps"),
            )
            for r in results
        ]
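The JSON extraction step depends on the model wrapping its answer in a fenced `json` block, which is not guaranteed. The sketch below shows the same technique as a standalone function that returns `None` instead of raising when nothing parses. The name `extract_json` and the sample reply are assumptions for illustration only.

```python
import json
import re
from typing import Optional

FENCE = "`" * 3  # built programmatically so this example contains no nested fences
_JSON_BLOCK = re.compile(FENCE + r"json\s*(.*?)\s*" + FENCE, re.DOTALL)


def extract_json(text: str) -> Optional[dict]:
    """Return the first fenced JSON object in text, or None if nothing parses."""
    match = _JSON_BLOCK.search(text)
    # Fall back to treating the whole reply as JSON when no fence is present.
    candidate = match.group(1) if match else text
    try:
        parsed = json.loads(candidate)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


# Hypothetical model reply with prose around the fenced JSON.
reply = (
    "Here is the proposal:\n"
    f"{FENCE}json\n"
    '{"summary": "restart pod", "consensus_score": 0.82}\n'
    f"{FENCE}\n"
)
proposal = extract_json(reply)
not_json = extract_json("I could not reach a conclusion.")
print(proposal, not_json)
```

Returning `None` lets the caller choose between retrying and raising, rather than surfacing a raw `JSONDecodeError` from inside the message loop.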

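Alternative: ClaudeSDKClient (interactive)

# Suited to scenarios that need human-in-the-loop interaction
async def collect_response(client: ClaudeSDKClient) -> list:
    """Gather every message of the current turn via receive_response()."""
    return [message async for message in client.receive_response()]


async def interactive_analysis(incident: dict, options: ClaudeAgentOptions):
    async with ClaudeSDKClient(options=options) as client:
        # Round 1: security analysis
        await client.query(f"Use security-expert to analyze: {json.dumps(incident)}")
        security_result = await collect_response(client)

        # A human can step in and adjust here

        # Round 2: impact scope
        await client.query("Continue by using blast-radius to analyze the impact scope")
        blast_result = await collect_response(client)

        # ...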

API Endpoint

# apps/api/src/routes/incidents.py

@router.post("/api/v1/incidents/{incident_id}/analyze")
async def analyze_with_team(incident_id: str):
    """Analyze an alert with the Agent Team."""
    incident = await get_incident(incident_id)
    team = OpenClawTeam()
    proposal = await team.analyze_incident(incident)

    return {
        "proposal": proposal,
        "agent_results": proposal.agent_results,
        "consensus_score": proposal.consensus_score,
        "auto_approvable": proposal.auto_approvable
    }
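The endpoint returns a dataclass instance inside a dict. One way to make the serialized shape explicit is to convert it with `dataclasses.asdict` first. The sketch below uses trimmed copies of `AgentResult` and `Proposal`. The helper name `proposal_to_payload` and the sample values are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class AgentResult:
    agent: str
    analysis: str
    confidence: float
    risk_score: Optional[float] = None


@dataclass
class Proposal:
    incident_id: str
    summary: str
    consensus_score: float
    auto_approvable: bool
    agent_results: List[AgentResult] = field(default_factory=list)


def proposal_to_payload(proposal: Proposal) -> dict:
    """Convert a Proposal, including nested AgentResult items, to plain dicts."""
    return asdict(proposal)


example = Proposal(
    incident_id="INC-1",  # hypothetical ID
    summary="Restart the crashing pod",
    consensus_score=0.82,
    auto_approvable=False,
    agent_results=[AgentResult("security-expert", "No sensitive data involved", 0.9, 2.0)],
)
payload = proposal_to_payload(example)
body = json.dumps(payload, ensure_ascii=False)
print(body)
```

`asdict` recurses into the nested list, so `agent_results` appears once inside the payload and does not need to be repeated as a separate top-level key.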

UI Presentation

// apps/web/src/components/incident/agent-team-analysis.tsx

export function AgentTeamAnalysis({ proposal }: Props) {
  return (
    <GlassCard>
      <h3>{t('incident.teamAnalysis')}</h3>

      {/* Per-agent analysis results */}
      <div className="grid grid-cols-3 gap-4">
        {proposal.agentResults.map(result => (
          <AgentResultCard
            key={result.agent}
            agent={result.agent}
            confidence={result.confidence}
            summary={result.analysis}
          />
        ))}
      </div>

      {/* Consensus score */}
      <ConsensusScore score={proposal.consensusScore} />

      {/* Final proposal */}
      <ProposalCard proposal={proposal} />
    </GlassCard>
  )
}

Mapping to leWOOOgo Blocks

Block category	New module
BRAIN	`SecurityAgent`
BRAIN	`BlastRadiusAgent`
BRAIN	`ActionPlannerAgent`
BRAIN	`CoordinatorAgent`
BRAIN	`ConsensusEngine`

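Consequences

Pros

Multi-perspective analysis - each expert agent covers its own specialty
Consensus decisions - weighted voting improves decision quality
Explainability - each agent's analysis is transparent
Extensibility - more expert agents can be added
Differentiation - competing products do not offer this

Cons

Higher cost - multiple agent calls increase API spend
Higher latency - even with parallel analysis, the result waits on the slowest agent
Complexity - the consensus mechanism needs tuning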
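The latency point can be illustrated without the SDK. If the three experts were invoked concurrently from application code, the total wait would track the slowest call and not the sum of all three. The sketch below uses `asyncio.gather` with sleeps standing in for agent calls. The delays and the `run_expert` helper are assumptions, and it does not show how the SDK itself schedules subagents.

```python
import asyncio
import time

# Hypothetical per-expert latencies in seconds, scaled down for the demo.
DELAYS = {"security-expert": 0.02, "blast-radius": 0.05, "action-planner": 0.03}


async def run_expert(name: str, delay: float) -> str:
    """Stand-in for one expert call; it only sleeps for the given delay."""
    await asyncio.sleep(delay)
    return name


async def run_all():
    start = time.perf_counter()
    # gather() runs the coroutines concurrently and keeps the input order.
    finished = await asyncio.gather(*(run_expert(n, d) for n, d in DELAYS.items()))
    return list(finished), time.perf_counter() - start


finished, elapsed = asyncio.run(run_all())
print(f"experts: {finished}")
print(f"elapsed: {elapsed:.3f}s, slowest: {max(DELAYS.values()):.3f}s, sum: {sum(DELAYS.values()):.3f}s")
```

On an idle machine the elapsed time should sit close to the slowest delay. A sequential call order, such as the one the coordinator prompt requests, would instead approach the sum.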

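Risks

Risk	Mitigation
Runaway API cost	Set token limits and a caching strategy
Conflicting agent opinions	Weighted voting in the consensus engine
SDK instability	Prototype first by simulating with the Anthropic SDK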
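One possible shape for the caching mitigation is to fingerprint each alert and reuse a recent proposal instead of re-running the Agent Team. The sketch below is a minimal in-memory version. `ProposalCache`, `incident_fingerprint`, the TTL value, and the fake analyzer are all assumptions for illustration.

```python
import hashlib
import json
import time
from typing import Callable, Dict, Tuple


def incident_fingerprint(incident: dict) -> str:
    """Hash the incident's fields in a stable key order."""
    canonical = json.dumps(incident, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ProposalCache:
    """In-memory TTL cache keyed by incident fingerprint."""

    def __init__(self, ttl_seconds: float = 300.0, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get_or_compute(self, incident: dict, analyze: Callable[[dict], dict]) -> dict:
        key = incident_fingerprint(incident)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh entry: skip the expensive analysis
        result = analyze(incident)
        self._entries[key] = (now, result)
        return result


calls = []


def fake_analyze(incident: dict) -> dict:
    """Stand-in for the Agent Team call; records each invocation."""
    calls.append(incident["alertname"])
    return {"summary": f"analysis of {incident['alertname']}"}


cache = ProposalCache(ttl_seconds=60)
alert = {"alertname": "PodCrashLooping", "severity": "P2"}
first = cache.get_or_compute(alert, fake_analyze)
# Same fields in a different key order still map to the same fingerprint.
second = cache.get_or_compute(dict(reversed(list(alert.items()))), fake_analyze)
print(len(calls), first == second)
```

Real alerts usually carry timestamps or counters, so those fields would need to be removed before fingerprinting or every alert would look unique.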

Integration with leWOOOgo (ADR-003)

OpenClaw Agent Teams integrate into the leWOOOgo architecture as a BRAIN block:

┌─────────────────────────────────────────────────────────────────┐
│                      leWOOOgo Engine                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   🧱 INPUT ──────→ 🧠 BRAIN ──────────────→ 📢 OUTPUT          │
│   (Prometheus)      │                         (Telegram)        │
│                     │                                           │
│              ┌──────┴──────┐                                    │
│              │ OpenClawTeam │  ← NEW: Agent Teams              │
│              │  (SDK-based) │                                   │
│              └──────┬──────┘                                    │
│                     │                                           │
│              ┌──────┴──────┐                                    │
│              │ 🔧 ACTION   │                                    │
│              │ K8sExecutor │                                    │
│              └─────────────┘                                    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

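BRAIN Block Interface Implementation

# packages/lewooogo-brain/src/openclaw_team_plugin.py

import os

from lewooogo_core.interfaces import AgentProvider, AgentInput, AgentOutput
# OpenClawTeam is the class defined in apps/api/src/services/openclaw_team.py;
# import it from wherever that module is packaged in this repo.


class OpenClawTeamPlugin(AgentProvider):
    """
    leWOOOgo BRAIN block: OpenClaw Agent Teams.
    Conforms to the AgentProvider interface defined in ADR-003.
    """

    id = "openclaw-agent-team"
    name = "OpenClaw Agent Team"
    version = "0.1.0"
    category = "BRAIN"

    def __init__(self):
        self.team = OpenClawTeam()

    async def initialize(self) -> None:
        # Verify the API key; raise instead of assert so the check survives python -O
        if not os.environ.get("ANTHROPIC_API_KEY"):
            raise RuntimeError("Missing ANTHROPIC_API_KEY")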

    async def process(self, input: AgentInput) -> AgentOutput:
        proposal = await self.team.analyze_incident(input.payload)
        return AgentOutput(
            result=proposal,
            confidence=proposal.consensus_score,
            metadata={"agent_count": 3, "sdk_version": "0.1.50"},
        )

    def get_capabilities(self) -> list[str]:
        return [
            "security-analysis",
            "blast-radius-analysis",
            "action-planning",
            "consensus-decision",
        ]

    async def health_check(self) -> dict:
        return {"status": "healthy", "sdk": "claude-agent-sdk"}

    async def shutdown(self) -> None:
        pass

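Integration with ADR-006 (AI Fallback)

Agent Teams plug into the existing AI fallback strategy:

Priority 1: Ollama (local) → simple alerts go to Ollama
Priority 2: Claude Agent SDK → complex alerts go to Agent Teams
Priority 3: Gemini API → fallback when the SDK fails
Priority 4: Static response

Routing Logic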

from claude_agent_sdk import ClaudeSDKError


class OpenClawRouter:
    # ollama_handler, agent_team, and gemini_fallback are expected to be set on the instance.
    async def route(self, incident: dict) -> Proposal:
        # Pick a handler based on alert complexity
        if self._is_simple_alert(incident):
            # Simple alert: Ollama is sufficient
            return await self.ollama_handler.analyze(incident)
        else:
            # Complex alert: use Agent Teams
            try:
                return await self.agent_team.analyze_incident(incident)
            except ClaudeSDKError:
                # SDK failed; degrade to Gemini
                return await self.gemini_fallback.analyze(incident)

    def _is_simple_alert(self, incident: dict) -> bool:
        # Heuristic: P3/P4 affecting at most one service → simple
        severity = incident.get("severity", "P3")
        affected = incident.get("affected_services", [])
        return severity in ["P3", "P4"] and len(affected) <= 1
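The priority list ends with a static response (priority 4), which the router code does not show. The sketch below is one way to express the whole chain, using stub handlers in place of the real providers. `analyze_with_fallback`, `STATIC_RESPONSE`, and the stubs are assumptions for illustration, not the project's actual handlers.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple

Handler = Callable[[dict], Awaitable[dict]]

# Hypothetical last-resort payload (priority 4 in the list above).
STATIC_RESPONSE = {
    "summary": "AI analysis unavailable; manual triage required",
    "auto_approvable": False,
}


async def analyze_with_fallback(incident: dict, chain: List[Tuple[str, Handler]]) -> dict:
    """Try each handler in priority order; return the static response if all fail."""
    for name, handler in chain:
        try:
            result = await handler(incident)
        except Exception as exc:  # a real router would catch narrower error types
            print(f"{name} failed: {exc!r}; trying next handler")
            continue
        return {**result, "handled_by": name}
    return {**STATIC_RESPONSE, "handled_by": "static"}


# Stub handlers standing in for the real providers.
async def failing_agent_team(incident: dict) -> dict:
    raise RuntimeError("SDK unavailable")


async def gemini_stub(incident: dict) -> dict:
    return {"summary": f"fallback analysis for {incident['id']}", "auto_approvable": False}


complex_chain = [("agent-team", failing_agent_team), ("gemini", gemini_stub)]
outcome = asyncio.run(analyze_with_fallback({"id": "INC-7"}, complex_chain))
all_down = asyncio.run(analyze_with_fallback({"id": "INC-8"}, [("agent-team", failing_agent_team)]))
print(outcome["handled_by"], all_down["handled_by"])
```

Keeping the chain as data makes the order easy to change per alert class, and the static response never sets `auto_approvable`, so a total outage still ends in human review.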

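Implementation Plan (updated)

Phase	Scope	Status	Estimate
9.1	ADR review + SDK research	✅ Done	0.5 days
9.2	SDK integration + POC	🔜 Next	1 day
9.3	Implement the 3 expert agents		2 days
9.4	ConsensusEngine + leWOOOgo integration		1.5 days
9.5	API endpoint + UI presentation		1.5 days
9.6	Testing + docs + ADR-006 integration		1 day

Total: 7.5 days (originally estimated at 10 days; reduced because the SDK simplifies the work)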

Phase 9.2 POC Verification Items

# 1. Install the SDK
cd apps/api && uv add claude-agent-sdk

# 2. Create the test script
cat > scripts/test-agent-team.py << 'EOF'
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions, AgentDefinition

async def main():
    # Simple subagent test
    options = ClaudeAgentOptions(
        allowed_tools=["Agent"],
        agents={
            "test-agent": AgentDefinition(
                description="Test agent",
                prompt="Answer the question and return JSON",
                tools=[],
            )
        },
    )

    async for msg in query(
        prompt="Use test-agent to answer: 2+2=?",
        options=options,
    ):
        print(msg)

asyncio.run(main())
EOF

# 3. Run the test
python scripts/test-agent-team.py

References

Change Log

Date	Version	Change	Author
2026-03-23	v0.1	Initial draft proposal	AI Architect
2026-03-23	v0.2	SDK research complete; added the concrete integration plan	AI Architect
