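ADR-006: AI Degradation and Fallback Strategy

Status: Accepted
Date: 2026-03-20
Decision makers: CTO, CEO


Background

The AWOOOI system relies heavily on AI features, including AI Copilot, anomaly detection, and smart summarization. When the local Ollama service is unavailable, the system needs a robust degradation and fallback mechanism, and cloud API costs must stay under strict control.

CEO Directive #2

Cloud fallback goes to the Gemini API first and only then to the Claude API. API token usage must be effectively controlled and monitored, and paired with an alerting mechanism, to keep costs from spiking!


Decision

1. AI Service Priority Order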

┌─────────────────────────────────────────────────────┐
│ Priority 1: Ollama (local)                          │
│ 192.168.0.188:11434                                 │
│ Cost: $0 / Latency: ~200ms                          │
└─────────────────────────────────────────────────────┘
                        │ on failure
                        ▼
┌─────────────────────────────────────────────────────┐
│ Priority 2: Gemini API (cloud fallback, preferred)  │
│ Cost: ~$0.001/1K tokens                             │
└─────────────────────────────────────────────────────┘
                        │ on failure
                        ▼
┌─────────────────────────────────────────────────────┐
│ Priority 3: Claude API (cloud fallback, secondary)  │
│ Cost: ~$0.008/1K tokens                             │
└─────────────────────────────────────────────────────┘
                        │ on failure
                        ▼
┌─────────────────────────────────────────────────────┐
│ Priority 4: Static response (full degradation)      │
│ Return a default message; no AI is called           │
└─────────────────────────────────────────────────────┘

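2. Circuit Breaker Mechanism

# apps/api/app/services/ai/circuit_breaker.py

from enum import Enum
from datetime import datetime, timedelta
import asyncio

class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""

class CircuitState(Enum):
    CLOSED = "closed"        # normal operation
    OPEN = "open"            # tripped: calls are rejected
    HALF_OPEN = "half_open"  # probing for recovery

class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 5,      # trip after 5 consecutive failures
        recovery_timeout: int = 60,      # try to recover 60 seconds after tripping
        half_open_max_calls: int = 3     # at most 3 probe calls while half-open
    ):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.half_open_max_calls = half_open_max_calls
        self.state = CircuitState.CLOSED
        self.failure_count = 0
        self.last_failure_time = None
        self.half_open_calls = 0
        # ...

    # _should_try_recovery, _on_success and _on_failure are not shown in this ADR.

    async def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            if self._should_try_recovery():
                # Recovery timeout has elapsed: allow probe calls through.
                self.state = CircuitState.HALF_OPEN
                self.half_open_calls = 0
            else:
                raise CircuitOpenError("Circuit is open")

        try:
            result = await func(*args, **kwargs)
            self._on_success()
            return result
        except Exception:
            # Count the failure, then let the caller decide how to fall back.
            self._on_failure()
            raise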
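
The helper methods that `call` relies on are not shown in this ADR. The sketch below is one possible way to fill them in, under stated assumptions: the circuit opens after `failure_threshold` consecutive failures, probes again once `recovery_timeout` seconds have passed, and closes only after `half_open_max_calls` successful probes. That reading of `half_open_max_calls` is an assumption, and the `clock` parameter is a hypothetical hook added only so the timing can be tested.

```python
import asyncio
import time
from enum import Enum


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=60,
                 half_open_max_calls=3, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.half_open_max_calls = half_open_max_calls
        self.clock = clock  # hypothetical hook so tests can fake time
        self.state = CircuitState.CLOSED
        self.failure_count = 0
        self.last_failure_time = None
        self.half_open_calls = 0

    def _should_try_recovery(self):
        # Assumed rule: probe once recovery_timeout seconds have elapsed.
        return self.clock() - self.last_failure_time >= self.recovery_timeout

    def _on_success(self):
        if self.state == CircuitState.HALF_OPEN:
            self.half_open_calls += 1
            # Assumed rule: stay half-open until enough probes succeed.
            if self.half_open_calls < self.half_open_max_calls:
                return
        self.state = CircuitState.CLOSED
        self.failure_count = 0

    def _on_failure(self):
        self.failure_count += 1
        self.last_failure_time = self.clock()
        # A failed probe, or too many consecutive failures, opens the circuit.
        if (self.state == CircuitState.HALF_OPEN
                or self.failure_count >= self.failure_threshold):
            self.state = CircuitState.OPEN

    async def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            if not self._should_try_recovery():
                raise CircuitOpenError("Circuit is open")
            self.state = CircuitState.HALF_OPEN
            self.half_open_calls = 0
        try:
            result = await func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result


if __name__ == "__main__":
    async def flaky():
        raise ConnectionError("provider down")

    async def demo():
        breaker = CircuitBreaker(failure_threshold=2)
        for _ in range(3):
            try:
                await breaker.call(flaky)
            except ConnectionError:
                print("call failed, state:", breaker.state.value)
            except CircuitOpenError:
                print("call rejected, state:", breaker.state.value)

    asyncio.run(demo())
```

Because a success in the closed state resets `failure_count`, only consecutive failures count toward the threshold, which matches the comment on `failure_threshold`.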

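3. Token Usage Monitoring and Alerting

Daily / Monthly Quotas

| API    | Daily limit | Monthly limit | Alert threshold |
|--------|-------------|---------------|-----------------|
| Gemini | 100K tokens | 2M tokens     | 70%             |
| Claude | 50K tokens  | 500K tokens   | 70%             |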
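
The quotas above could be checked with a small tracker like the following. This is a minimal in-memory sketch, not the project's `UsageTracker` (which this ADR does not show). The names `Quota`, `QuotaTracker`, and `status` are hypothetical, and a real implementation would presumably persist counts and reset them per day and per month.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quota:
    daily_limit: int
    monthly_limit: int
    alert_ratio: float = 0.70  # alert threshold from the quota table


# Limits taken from the quota table above.
QUOTAS = {
    "gemini": Quota(daily_limit=100_000, monthly_limit=2_000_000),
    "claude": Quota(daily_limit=50_000, monthly_limit=500_000),
}


class QuotaTracker:
    """Hypothetical in-memory tracker; counters are never reset here."""

    def __init__(self, quotas=QUOTAS):
        self.quotas = quotas
        self.daily = {name: 0 for name in quotas}
        self.monthly = {name: 0 for name in quotas}

    def record(self, provider: str, tokens: int) -> None:
        self.daily[provider] += tokens
        self.monthly[provider] += tokens

    def status(self, provider: str) -> str:
        """Return "ok", "alert" (at or above the alert ratio) or "exceeded"."""
        quota = self.quotas[provider]
        # Use whichever limit (daily or monthly) is closer to being hit.
        usage = max(
            self.daily[provider] / quota.daily_limit,
            self.monthly[provider] / quota.monthly_limit,
        )
        if usage >= 1.0:
            return "exceeded"
        if usage >= quota.alert_ratio:
            return "alert"
        return "ok"

    def is_quota_exceeded(self, provider: str) -> bool:
        return self.status(provider) == "exceeded"
```

Keeping the limits in one table-like mapping means a quota change touches a single place, and the alert threshold can be tuned per provider if needed.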

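Monitoring Schema

# apps/api/app/models/ai_usage.py

from sqlalchemy import Boolean, Column, DateTime, ForeignKey, Integer, String, func
from sqlalchemy.dialects.postgresql import UUID  # assumes a PostgreSQL backend

# Base is the project's declarative base; its import path is not given in this ADR.

class AIUsageLog(Base):
    __tablename__ = "ai_usage_logs"

    id = Column(UUID, primary_key=True)
    provider = Column(String)  # ollama, gemini, claude
    model = Column(String)
    input_tokens = Column(Integer)
    output_tokens = Column(Integer)
    latency_ms = Column(Integer)
    success = Column(Boolean)
    error_message = Column(String, nullable=True)
    user_id = Column(UUID, ForeignKey("users.id"))
    created_at = Column(DateTime, default=func.now())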

Alert Rules

# k8s/monitoring/prometheus/ai-usage-alerts.yaml
groups:
  - name: ai-usage-alerts
    rules:
      # Gemini daily usage: warning at 70%
      - alert: GeminiDailyUsageWarning
        expr: |
          sum(increase(ai_tokens_total{provider="gemini"}[24h])) > 70000
        labels:
          severity: warning
        annotations:
          summary: "Gemini API daily usage has reached 70%"
          description: "Gemini has used {{ $value | humanize }} tokens in the past 24 hours"

      # Gemini daily usage: critical at 90%
      - alert: GeminiDailyUsageCritical
        expr: |
          sum(increase(ai_tokens_total{provider="gemini"}[24h])) > 90000
        labels:
          severity: critical
        annotations:
          summary: "Gemini API daily usage has reached 90%; rate limiting is about to kick in"

      # Claude daily usage: warning at 70%
      - alert: ClaudeDailyUsageWarning
        expr: |
          sum(increase(ai_tokens_total{provider="claude"}[24h])) > 35000
        labels:
          severity: warning
        annotations:
          summary: "Claude API daily usage has reached 70%"

      # Ollama repeated failures
      - alert: OllamaConsecutiveFailures
        expr: |
          increase(ai_requests_failed_total{provider="ollama"}[5m]) > 5
        labels:
          severity: critical
        annotations:
          summary: "The Ollama service may be offline"
          description: "Ollama requests failed more than 5 times in the past 5 minutes; cloud fallback has been activated"

      # Monthly budget: reminder at 50%
      - alert: MonthlyAIBudgetWarning
        expr: |
          (
            sum(increase(ai_tokens_total{provider="gemini"}[30d])) * 0.000001 +
            sum(increase(ai_tokens_total{provider="claude"}[30d])) * 0.000008
          ) > 5
        labels:
          severity: warning
        annotations:
          summary: "Monthly AI cost has reached $5 (50% of budget)"

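4. Cost Estimate

| Scenario                                      | Gemini (tokens) | Claude (tokens) | Monthly cost |
|-----------------------------------------------|-----------------|-----------------|--------------|
| Normal (Ollama 100%)                          | 0               | 0               | $0           |
| Light degradation (Ollama 90%, Gemini 10%)    | ~200K           | 0               | ~$0.20       |
| Moderate degradation (Gemini 80%, Claude 20%) | ~1.6M           | ~400K           | ~$5          |
| Full degradation (cloud 100%)                 | ~2M             | ~500K           | ~$10         |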
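
The arithmetic behind these estimates can be reproduced from the per-1K-token rates in the priority diagram (~$0.001 for Gemini, ~$0.008 for Claude). The sketch below is illustrative; `monthly_cost_usd` and `SCENARIOS` are hypothetical names. At these rates the full-degradation row works out to about $6, so the ~$10 in the table may be an upper bound that matches the monthly budget implied by the 50% alert.

```python
# Per-token prices derived from the approximate per-1K rates in this ADR.
USD_PER_TOKEN = {"gemini": 0.001 / 1000, "claude": 0.008 / 1000}


def monthly_cost_usd(gemini_tokens: int, claude_tokens: int) -> float:
    """Estimated monthly cloud cost in USD for the given token volumes."""
    return (gemini_tokens * USD_PER_TOKEN["gemini"]
            + claude_tokens * USD_PER_TOKEN["claude"])


# Token volumes (Gemini, Claude) from the scenario table.
SCENARIOS = {
    "normal": (0, 0),
    "light": (200_000, 0),
    "moderate": (1_600_000, 400_000),
    "full": (2_000_000, 500_000),
}

if __name__ == "__main__":
    for name, (gemini, claude) in SCENARIOS.items():
        print(f"{name:>8}: ${monthly_cost_usd(gemini, claude):.2f}")
```

This is the same formula the `MonthlyAIBudgetWarning` rule evaluates in PromQL, so the two can be kept in sync when rates change.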

5. Implementation Example

# apps/api/app/services/ai/router.py

import logging

from app.services.ai.providers import OllamaProvider, GeminiProvider, ClaudeProvider
from app.services.ai.circuit_breaker import CircuitBreaker, CircuitOpenError
from app.services.ai.usage_tracker import UsageTracker

# AIResponse is used below, but its import path is not given in this ADR.

logger = logging.getLogger(__name__)

class AIRouter:
    def __init__(self):
        self.ollama = OllamaProvider()
        self.gemini = GeminiProvider()
        self.claude = ClaudeProvider()

        # The local provider trips faster and retries sooner than the cloud ones.
        self.ollama_circuit = CircuitBreaker(failure_threshold=3, recovery_timeout=30)
        self.gemini_circuit = CircuitBreaker(failure_threshold=5, recovery_timeout=60)
        self.claude_circuit = CircuitBreaker(failure_threshold=5, recovery_timeout=60)

        self.usage_tracker = UsageTracker()

    async def generate(self, prompt: str, user_id: str) -> AIResponse:
        # Providers are tried in priority order: local first, then cloud.
        providers = [
            ("ollama", self.ollama, self.ollama_circuit),
            ("gemini", self.gemini, self.gemini_circuit),
            ("claude", self.claude, self.claude_circuit),
        ]

        for name, provider, circuit in providers:
            # Check the quota (cloud providers only)
            if name in ["gemini", "claude"]:
                if await self.usage_tracker.is_quota_exceeded(name):
                    logger.warning(f"{name} daily quota exceeded, skipping")
                    continue

            try:
                result = await circuit.call(provider.generate, prompt)

                # Record usage
                await self.usage_tracker.log(
                    provider=name,
                    input_tokens=result.input_tokens,
                    output_tokens=result.output_tokens,
                    user_id=user_id,
                    success=True
                )

                return result

            except CircuitOpenError:
                logger.info(f"{name} circuit is open, trying next provider")
                continue
            except Exception as e:
                logger.error(f"{name} failed: {e}, trying next provider")
                await self.usage_tracker.log(
                    provider=name,
                    error_message=str(e),
                    user_id=user_id,
                    success=False
                )
                continue

        # Every AI provider failed: return the static response
        return AIResponse(
            content="Sorry, the AI service is temporarily unavailable. Please try again later or contact an administrator.",
            provider="fallback",
            tokens=0
        )

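6. Dashboard

The AI usage monitoring panel should display:

  • Today's usage per provider (tokens)
  • Cumulative cost for the current month (USD)
  • Health status of each provider (green/yellow/red)
  • Average latency (ms)
  • Success rate (%)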
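
Most of these panel values can be derived from rows shaped like `AIUsageLog`. The following is a rough sketch of that aggregation over plain in-memory records. `UsageRecord`, `summarize`, and `health` are hypothetical names, and the green/yellow thresholds are illustrative assumptions, since this ADR does not define them.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UsageRecord:
    # Mirrors a subset of the AIUsageLog columns.
    provider: str
    input_tokens: int
    output_tokens: int
    latency_ms: int
    success: bool


def summarize(records):
    """Aggregate tokens, average latency and success rate per provider."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.provider].append(record)

    summary = {}
    for provider, rows in grouped.items():
        succeeded = sum(1 for r in rows if r.success)
        summary[provider] = {
            "tokens": sum(r.input_tokens + r.output_tokens for r in rows),
            "avg_latency_ms": sum(r.latency_ms for r in rows) / len(rows),
            "success_rate_pct": 100.0 * succeeded / len(rows),
        }
    return summary


def health(success_rate_pct: float, green: float = 99.0,
           yellow: float = 90.0) -> str:
    """Map a success rate to a status color (thresholds are assumptions)."""
    if success_rate_pct >= green:
        return "green"
    if success_rate_pct >= yellow:
        return "yellow"
    return "red"
```

In production the same aggregation would more likely run as a SQL or PromQL query; the Python form is only meant to pin down what each panel value means.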

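Impact

Positive

  • Ensures high availability of AI features
  • Costs are controllable and predictable
  • Real-time alerts prevent runaway bills

Caveats

  • Multiple API keys must be maintained
  • Response quality may differ between providers
  • API format conversion must be handled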
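
The format conversion mentioned in the last caveat could be isolated in one normalizer per provider, as sketched below. The payload shapes are assumptions based on the providers' public REST responses as commonly documented (Ollama's `response` / `prompt_eval_count` / `eval_count`, Gemini's `candidates` / `usageMetadata`, Claude's `content` / `usage`) and should be verified against the API versions actually in use. The `AIResponse` fields shown here are also hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AIResponse:
    # Hypothetical common shape; the real AIResponse is not shown in this ADR.
    content: str
    provider: str
    input_tokens: int = 0
    output_tokens: int = 0


def from_ollama(payload: dict) -> AIResponse:
    return AIResponse(
        content=payload.get("response", ""),
        provider="ollama",
        input_tokens=payload.get("prompt_eval_count", 0),
        output_tokens=payload.get("eval_count", 0),
    )


def from_gemini(payload: dict) -> AIResponse:
    parts = payload["candidates"][0]["content"]["parts"]
    usage = payload.get("usageMetadata", {})
    return AIResponse(
        content="".join(part.get("text", "") for part in parts),
        provider="gemini",
        input_tokens=usage.get("promptTokenCount", 0),
        output_tokens=usage.get("candidatesTokenCount", 0),
    )


def from_claude(payload: dict) -> AIResponse:
    usage = payload.get("usage", {})
    # Keep only text blocks; other block types are ignored in this sketch.
    text = "".join(block.get("text", "") for block in payload["content"]
                   if block.get("type") == "text")
    return AIResponse(
        content=text,
        provider="claude",
        input_tokens=usage.get("input_tokens", 0),
        output_tokens=usage.get("output_tokens", 0),
    )


NORMALIZERS = {"ollama": from_ollama, "gemini": from_gemini, "claude": from_claude}


def normalize(provider: str, payload: dict) -> AIResponse:
    """Convert a raw provider payload into the common AIResponse shape."""
    return NORMALIZERS[provider](payload)
```

With this split, the router only ever sees `AIResponse`, so adding or swapping a provider means writing one new normalizer rather than touching the fallback logic.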

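Change Log

| Date       | Version | Change          | Author |
|------------|---------|-----------------|--------|
| 2026-03-20 | v1.0    | Initial version | CTO    |

This ADR records the decision process and implementation guidelines for the AI degradation and fallback strategy.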