ADR-016: Smart Routing

Status: Accepted
Date: 2026-03-26
Deciders: CTO, CEO
Related: ADR-006 (AI Fallback Strategy)

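Background

Problem statement

ADR-006 established the AI degradation and fallback strategy, which falls back in a fixed order (Ollama → Gemini → Claude). This one-size-fits-all approach has the following problems:

Wasted resources: simple queries (for example, "What is the Pod status?") run on the 7B model and consume more local resources than they need.
Response delay: complex alert analysis runs only on the local model, whose output quality is insufficient, so a human has to step in before the alert is resolved.
Uncontrolled cost: the model cannot be chosen dynamically by task value, so high-value and low-value tasks are treated the same.

Direction

A smart routing mechanism is needed that selects the most suitable model automatically, based on the intent and complexity of each request, instead of following a fixed fallback order.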

Decision

Core strategy

Intent + Complexity → Model Selection

Architecture diagram

User Request / Alert
         │
         ▼
┌─────────────────────┐
│  Intent Classifier  │  ← keyword match first (~0 ms) + Qwen 1B fallback (< 100 ms)
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Complexity Scorer  │  ← pure rule engine (< 10 ms)
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│     AI Router       │  ← dynamic model selection
└──────────┬──────────┘
           │
     ┌─────┴─────┐
     │           │
     ▼           ▼
┌─────────┐ ┌─────────┐
│  Local  │ │  Cloud  │
│ Ollama  │ │ Gemini  │
│ 3B/7B   │ │ Claude  │
└─────────┘ └─────────┘

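Technical implementation

1. IntentClassifier

File: apps/api/src/services/intent_classifier.py

Intent types

IntentType	Description	Typical scenarios
`ALERT_TRIAGE`	Alert triage	High-load alerts, OOM, service down
`DEPLOYMENT`	Deployment operations	kubectl apply, helm upgrade
`QUERY`	Information queries	Status checks, log viewing
`MAINTENANCE`	Maintenance operations	Restart, scale-out, rollback
`CODE_REVIEW`	Code review	PR review, commit analysis
`UNKNOWN`	Unknown	Requests that cannot be classified

Classification strategy (two stages)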

async def classify(self, text: str) -> IntentType:
    # Stage 1: fast keyword match (~0 ms)
    intent = self._keyword_match(text.lower())
    if intent != IntentType.UNKNOWN:
        return intent

    # Stage 2: LLM classification (< 100 ms), used as a fallback
    # Uses the small Qwen 2.5 1B model
    return IntentType.UNKNOWN
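
The snippet above stops at the point where stage 2 would run. As a minimal sketch of how the two stages could fit together under the 100 ms budget, the function below takes both stages as parameters. The names `classify_two_stage`, `keyword_match`, `llm_classify`, and `llm_budget_s`, and the lower-case enum values, are assumptions made for illustration and are not taken from `intent_classifier.py`.

```python
import asyncio
from enum import Enum
from typing import Awaitable, Callable


class IntentType(Enum):
    # Trimmed enum; the string values are assumed, not taken from the project.
    ALERT_TRIAGE = "alert_triage"
    QUERY = "query"
    UNKNOWN = "unknown"


async def classify_two_stage(
    text: str,
    keyword_match: Callable[[str], IntentType],
    llm_classify: Callable[[str], Awaitable[IntentType]],
    llm_budget_s: float = 0.1,
) -> IntentType:
    """Try keywords first; call the LLM only when keywords give UNKNOWN."""
    intent = keyword_match(text.lower())
    if intent != IntentType.UNKNOWN:
        return intent
    try:
        # Bound the fallback so a slow model cannot stall routing.
        return await asyncio.wait_for(llm_classify(text), timeout=llm_budget_s)
    except asyncio.TimeoutError:
        return IntentType.UNKNOWN
```

With this shape, a timeout in stage 2 degrades to `UNKNOWN` rather than raising, which leaves model selection to the complexity score alone.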

Keyword mapping

INTENT_KEYWORDS = {
    IntentType.ALERT_TRIAGE: [
        "alert", "告警", "警報", "異常", "error", "critical",
        "high cpu", "memory", "oom", "crash", "down",
    ],
    IntentType.DEPLOYMENT: [
        "deploy", "部署", "rollout", "kubectl apply", "helm",
        "版本", "upgrade", "更新", "上線",
    ],
    IntentType.QUERY: [
        "查詢", "狀態", "status", "describe", "get", "list",
        "日誌", "log", "哪個", "什麼", "how many", "多少",
    ],
    # ...
}
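
How `_keyword_match` walks this mapping is not shown in the ADR. One plausible reading is a first-match substring scan in dictionary order, sketched below on a subset of the keywords. The function name `keyword_match`, the enum values, and the first-match rule are assumptions.

```python
from enum import Enum


class IntentType(Enum):
    # Trimmed enum; the string values are assumed.
    ALERT_TRIAGE = "alert_triage"
    DEPLOYMENT = "deployment"
    QUERY = "query"
    UNKNOWN = "unknown"


# Subset of the mapping above; dict order doubles as priority order here.
INTENT_KEYWORDS = {
    IntentType.ALERT_TRIAGE: ["alert", "告警", "oom", "crash", "down"],
    IntentType.DEPLOYMENT: ["deploy", "部署", "kubectl apply", "helm"],
    IntentType.QUERY: ["狀態", "status", "describe", "log"],
}


def keyword_match(text: str) -> IntentType:
    """Return the first intent that has a keyword occurring in the text."""
    lowered = text.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return intent
    return IntentType.UNKNOWN
```

Plain substring matching is cheap but coarse: under this sketch, "download the release notes" is classified as `ALERT_TRIAGE` because it contains "down". Word-boundary matching or per-keyword priorities would reduce that kind of false positive.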

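2. ComplexityScorer

File: apps/api/src/services/complexity_scorer.py

Scoring dimensions and weights

Feature	Weight	Description
`service_count`	+0.5 per service	For each additional affected service
`metric_count`	+0.3 per metric	For each additional related metric
`code_analysis`	+1.5	Code analysis is required
`cross_system`	+1.0	Cross-system problem
`has_history`	-0.5	A historical case exists (lowers complexity)
`critical_severity`	+1.0	CRITICAL severity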
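
The table gives the weights but not how they are combined into a 1 to 5 level. The sketch below sums the weights and then rounds and clamps the result. The context keys `affected_services`, `metrics`, `severity`, `cross_system`, and `requires_code_analysis` come from the usage examples in this ADR; the `has_history` key, the absence of a base score, counting every listed service and metric, and the rounding rule are all assumptions, so the levels it produces may differ from those of the real scorer.

```python
# Weights copied from the table above.
WEIGHTS = {
    "service": 0.5,
    "metric": 0.3,
    "code_analysis": 1.5,
    "cross_system": 1.0,
    "has_history": -0.5,
    "critical_severity": 1.0,
}


def raw_score(context: dict) -> float:
    """Sum the weights of every feature present in the context."""
    score = 0.0
    score += WEIGHTS["service"] * len(context.get("affected_services", []))
    score += WEIGHTS["metric"] * len(context.get("metrics", []))
    if context.get("requires_code_analysis"):
        score += WEIGHTS["code_analysis"]
    if context.get("cross_system"):
        score += WEIGHTS["cross_system"]
    if context.get("has_history"):  # assumed key name
        score += WEIGHTS["has_history"]
    if context.get("severity") == "CRITICAL":
        score += WEIGHTS["critical_severity"]
    return score


def to_level(raw: float) -> int:
    """Round to the nearest integer and clamp into 1..5 (assumed rule)."""
    return max(1, min(5, round(raw)))
```

Worked through by hand for three services, three metrics, a cross-system flag, and CRITICAL severity: 1.5 + 0.9 + 1.0 + 1.0 = 4.4, which this sketch maps to level 4.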

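Complexity → model mapping

Score	Complexity level	Recommended model	Rationale
1	Simple	`llama3.2:3b`	Fast response, saves resources
2	Medium	`qwen2.5:7b-instruct`	Balances quality and latency
3	Complex	`qwen2.5:7b-instruct`	Needs stronger reasoning
4	Highly complex	`gemini`	Needs cloud capability
5	Extremely complex	`claude`	Handled by the strongest model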

3. AIRouter

File: apps/api/src/services/ai_router.py

Routing decision flow

async def route(self, text: str, context: dict | None = None) -> RoutingDecision:
    # Step 1: classify the intent
    intent = await self._intent_classifier.classify(text)

    # Step 2: score the complexity
    complexity = self._complexity_scorer.score(context or {})

    # Step 3: select the model (intent overrides are applied here)
    model, reason = self._select_model(intent, complexity)

    # Step 4: build the fallback list
    fallbacks = self._build_fallback_list(model)

    return RoutingDecision(
        model=model,
        intent=intent,
        complexity=complexity,
        reason=reason,
        fallback_models=fallbacks,
    )

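Intent overrides

Some intents always use a specific model, regardless of complexity:

INTENT_OVERRIDES = {
    IntentType.CODE_REVIEW: "qwen2.5:7b-instruct",  # code review needs a strong model
    IntentType.QUERY: "llama3.2:3b",  # queries use the fast model
    # all other intents are selected by complexity
}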
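
To make the precedence concrete, here is a minimal sketch of a selection function in which an override wins over the complexity table. The name `select_model`, the integer `level` argument, the enum values, and the English reason strings are assumptions; the real `_select_model` receives a complexity object and may differ.

```python
from enum import Enum


class IntentType(Enum):
    # Trimmed enum; the string values are assumed.
    ALERT_TRIAGE = "alert_triage"
    CODE_REVIEW = "code_review"
    QUERY = "query"
    UNKNOWN = "unknown"


# Copied from the complexity → model mapping table.
MODEL_BY_LEVEL = {
    1: "llama3.2:3b",
    2: "qwen2.5:7b-instruct",
    3: "qwen2.5:7b-instruct",
    4: "gemini",
    5: "claude",
}

INTENT_OVERRIDES = {
    IntentType.CODE_REVIEW: "qwen2.5:7b-instruct",
    IntentType.QUERY: "llama3.2:3b",
}


def select_model(intent: IntentType, level: int) -> tuple[str, str]:
    """Return (model, reason); an intent override beats the level table."""
    override = INTENT_OVERRIDES.get(intent)
    if override is not None:
        return override, f"intent {intent.value} forces {override}"
    level = max(1, min(5, level))  # guard against out-of-range levels
    model = MODEL_BY_LEVEL[level]
    return model, f"complexity level {level} maps to {model}"
```

One consequence of this ordering is that a `QUERY` request stays on the 3B model even when its context scores at level 5, so the override table is worth reviewing whenever a query-type request needs deeper analysis.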

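Fallback order

When the selected model fails, the following models are tried in order:

FALLBACK_ORDER = [
    "qwen2.5:7b-instruct",  # primary local model
    "llama3.2:3b",          # local fallback
    "gemini",               # cloud fallback
    "claude",               # last-resort fallback
]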
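
The fallback list for a given decision appears to be this order with the selected model removed, which is consistent with the `fallback_models` value shown in the complex-alert usage example. A minimal sketch, assuming that is all `_build_fallback_list` does (the function name here is illustrative):

```python
FALLBACK_ORDER = [
    "qwen2.5:7b-instruct",
    "llama3.2:3b",
    "gemini",
    "claude",
]


def build_fallback_list(selected: str) -> list[str]:
    """Keep the global order but skip the model that was already selected."""
    return [model for model in FALLBACK_ORDER if model != selected]
```

Note that under this reading a cloud-first decision such as `gemini` falls back to the local models before trying `claude`.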

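Relationship with ADR-006

Aspect	ADR-006 (fixed fallback)	ADR-016 (smart routing)
Trigger	On service failure	On every request
Selection logic	Fixed order	Intent + complexity
Goal	High availability	Resource optimization
Status	Still in effect	Supplements ADR-006

How they work together:

ADR-016 first selects the best-fit model based on the characteristics of the request.
If that model fails, the ADR-006 fallback mechanism takes over.
Circuit Breaker and token quota monitoring continue to follow ADR-006.

Request → [ADR-016: smart selection] → Model A
                                          │
                               on failure ▼
                 [ADR-006: Fallback] → Model B → Model C → Static Response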
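
The hand-off in the diagram can be sketched as a loop over the selected model and its fallbacks that ends in a static response. The names `run_with_fallback`, `call_model`, and `STATIC_RESPONSE`, and the static text itself, are hypothetical; the Circuit Breaker and token quota checks from ADR-006 are deliberately left out.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical placeholder text for the last-resort static response.
STATIC_RESPONSE = "AI analysis is temporarily unavailable."


async def run_with_fallback(
    prompt: str,
    model: str,
    fallback_models: list[str],
    call_model: Callable[[str, str], Awaitable[str]],
) -> tuple[str, str]:
    """Try the routed model, then each fallback; return (model_used, answer)."""
    for candidate in [model, *fallback_models]:
        try:
            return candidate, await call_model(candidate, prompt)
        except Exception:
            # Any failure moves on to the next candidate in the chain.
            continue
    return "static", STATIC_RESPONSE
```

Returning the name of the model that actually answered makes it possible to compare the routed choice with the served choice when reviewing routing quality.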

Usage examples

Example 1: Simple query

from src.services.ai_router import get_ai_router

router = get_ai_router()
decision = await router.route(
    # "What is the status of the awoooi-api Pod?"
    text="awoooi-api Pod 狀態如何？",
    context={"affected_services": ["awoooi-api"]}
)

# Result:
# decision.model = "llama3.2:3b"
# decision.intent = IntentType.QUERY
# decision.complexity.score = 1
# decision.reason = "意圖 query 強制使用 llama3.2:3b"  (intent "query" forces llama3.2:3b)

Example 2: Complex alert

decision = await router.route(
    # "CRITICAL: awoooi-api OOM Killed, and the worker cannot reach Redis either"
    text="CRITICAL: awoooi-api OOM Killed，worker 也連不上 Redis",
    context={
        "affected_services": ["awoooi-api", "awoooi-worker", "redis"],
        "metrics": ["memory_usage", "connection_errors", "restart_count"],
        "severity": "CRITICAL",
        "cross_system": True,
    }
)

# Result:
# decision.model = "gemini"
# decision.intent = IntentType.ALERT_TRIAGE
# decision.complexity.score = 4
# decision.reason = "高複雜度告警 (score=4) → 使用雲端模型"  (high-complexity alert, score=4 → use a cloud model)
# decision.fallback_models = ["qwen2.5:7b-instruct", "llama3.2:3b", "claude"]

Example 3: Code review

decision = await router.route(
    # "Please review the changes in this PR"
    text="請審查這個 PR 的變更",
    context={"requires_code_analysis": True}
)

# Result:
# decision.model = "qwen2.5:7b-instruct"
# decision.intent = IntentType.CODE_REVIEW
# decision.complexity.score = 3
# decision.reason = "意圖 code_review 強制使用 qwen2.5:7b-instruct"  (intent "code_review" forces qwen2.5:7b-instruct)

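Monitoring metrics

Metric	Description	Target
`intent_classification_latency`	Intent classification latency	< 100ms
`complexity_scoring_latency`	Complexity scoring latency	< 10ms
`model_selection_distribution`	Distribution of selected models	Monitor only
`routing_decision_reason`	Counts of routing decision reasons	Monitor only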
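
The ADR does not say which metrics backend records these values. As a backend-neutral sketch, the in-memory recorder below keeps the two latency series and the two counters under the metric names from the table and counts samples that miss their target. The class `RoutingMetrics` and its methods are hypothetical.

```python
from collections import Counter

# Targets from the table above, in milliseconds.
LATENCY_TARGETS_MS = {
    "intent_classification_latency": 100.0,
    "complexity_scoring_latency": 10.0,
}


class RoutingMetrics:
    """In-memory stand-in for a real metrics backend."""

    def __init__(self) -> None:
        self.latencies_ms = {name: [] for name in LATENCY_TARGETS_MS}
        self.model_selection_distribution = Counter()
        self.routing_decision_reason = Counter()

    def observe_latency(self, name: str, elapsed_ms: float) -> None:
        self.latencies_ms[name].append(elapsed_ms)

    def record_decision(self, model: str, reason: str) -> None:
        self.model_selection_distribution[model] += 1
        self.routing_decision_reason[reason] += 1

    def over_target(self, name: str) -> int:
        """Count samples that missed the '< target' goal for this metric."""
        target = LATENCY_TARGETS_MS[name]
        return sum(1 for value in self.latencies_ms[name] if value >= target)
```

Elapsed times can be taken with `time.perf_counter()` around the classifier and scorer calls and converted to milliseconds before being passed to `observe_latency`.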

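Impact

Positive

Resource optimization: simple tasks run on small models, saving GPU resources.
Higher quality: complex tasks are escalated automatically to stronger models.
Controlled cost: cloud APIs are used only when they are really needed.
Lower latency: simple queries are answered faster.

Caveats

Classification accuracy: keyword matching may misclassify edge cases.
Complexity assessment: the rules may need ongoing tuning.
Model availability: routing must work together with the ADR-006 Circuit Breaker.

Risks

A misclassified request may receive a lower-quality answer than expected.
Data must be collected continuously to refine the classification rules.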

Change log

Date	Version	Change	Author
2026-03-26	v1.0	Initial version (Phase 13.3 #85-87)	CTO

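References

ADR-006: AI Degradation and Fallback Strategy
Phase 13.3 Smart Router design
apps/api/src/services/intent_classifier.py
apps/api/src/services/complexity_scorer.py
apps/api/src/services/ai_router.py

This ADR records the decision process and the implementation specification for the smart routing mechanism, as a supplement to ADR-006.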
