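Add the Intent + Complexity → Model Selection architecture decision record as a supplement to ADR-006 (AI Fallback), enabling dynamic model selection.

- IntentClassifier: keyword match first, with an LLM fallback
- ComplexityScorer: weighted scoring by a rule engine
- AIRouter: combines both into the routing decision

Phase 13.3 #85-87
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
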
# ADR-016: Smart Routing

> **Status**: Accepted
> **Date**: 2026-03-26
> **Deciders**: CTO, CEO
> **Related**: ADR-006 (AI Fallback Strategy)

---

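## Background

### Problem statement

ADR-006 established the AI degradation and fallback strategy, which falls back through a fixed order (Ollama → Gemini → Claude). This one-size-fits-all approach has the following problems:

1. **Wasted resources**: Simple queries (such as "What is the Pod status?") go to the 7B model and consume more local resources than they need.
2. **Response delay**: Complex alert analysis runs only on the local model, and when its output quality falls short a human has to step in.
3. **Uncontrolled cost**: The model cannot be chosen according to task value, so high-value and low-value tasks are treated the same.

### Solution direction

What is needed is a **smart routing mechanism** that automatically selects the most suitable model from the **intent** and **complexity** of each request, instead of following a fixed fallback order.

---
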
## Decision

### Core strategy

```
Intent + Complexity → Model Selection
```

### Architecture diagram

```
User Request / Alert
           │
           ▼
┌─────────────────────┐
│  Intent Classifier  │ ← keyword match first (0ms) + Qwen 1B fallback (< 100ms)
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Complexity Scorer  │ ← pure rule engine (< 10ms)
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│      AI Router      │ ← dynamic model selection
└──────────┬──────────┘
           │
     ┌─────┴─────┐
     │           │
     ▼           ▼
┌─────────┐ ┌─────────┐
│  Local  │ │  Cloud  │
│ Ollama  │ │ Gemini  │
│  3B/7B  │ │ Claude  │
└─────────┘ └─────────┘
```

---

## Technical implementation

### 1. IntentClassifier (Intent Classifier)

**File**: `apps/api/src/services/intent_classifier.py`

#### Intent types

| IntentType | Description | Typical scenarios |
|------------|-------------|-------------------|
| `ALERT_TRIAGE` | Alert triage | High-load alerts, OOM, service down |
| `DEPLOYMENT` | Deployment operations | kubectl apply, helm upgrade |
| `QUERY` | Information lookup | Status checks, log viewing |
| `MAINTENANCE` | Maintenance operations | Restart, scale-out, rollback |
| `CODE_REVIEW` | Code review | PR review, commit analysis |
| `UNKNOWN` | Unknown | Requests that cannot be classified |

#### Classification strategy (two stages)

```python
async def classify(self, text: str) -> IntentType:
    # Stage 1: fast keyword match (0ms)
    intent = self._keyword_match(text.lower())
    if intent != IntentType.UNKNOWN:
        return intent

    # Stage 2: LLM classification (< 100ms), used as the fallback
    # Uses the small Qwen 2.5 1B model
    return IntentType.UNKNOWN
```

#### Keyword mapping

```python
INTENT_KEYWORDS = {
    IntentType.ALERT_TRIAGE: [
        "alert", "告警", "警報", "異常", "error", "critical",
        "high cpu", "memory", "oom", "crash", "down",
    ],
    IntentType.DEPLOYMENT: [
        "deploy", "部署", "rollout", "kubectl apply", "helm",
        "版本", "upgrade", "更新", "上線",
    ],
    IntentType.QUERY: [
        "查詢", "狀態", "status", "describe", "get", "list",
        "日誌", "log", "哪個", "什麼", "how many", "多少",
    ],
    # ...
}
```
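
The keyword stage can be sketched as a plain substring scan over the mapping above. This is a minimal sketch and not the project's `_keyword_match`: the function name `keyword_match`, the enum string values, the trimmed keyword lists, and the first-match-wins tie-break (dictionary order) are all assumptions made for illustration. One consequence of substring matching is worth noting: a keyword also fires inside longer words, so `log` matches `login`.

```python
from enum import Enum


class IntentType(Enum):
    # String values are assumed; the ADR only names the members.
    ALERT_TRIAGE = "alert_triage"
    DEPLOYMENT = "deployment"
    QUERY = "query"
    UNKNOWN = "unknown"


# Trimmed copy of the mapping above; dict order decides which intent wins.
INTENT_KEYWORDS = {
    IntentType.ALERT_TRIAGE: ["alert", "告警", "oom", "crash", "critical"],
    IntentType.DEPLOYMENT: ["deploy", "部署", "kubectl apply", "helm"],
    IntentType.QUERY: ["查詢", "狀態", "status", "describe", "log"],
}


def keyword_match(text: str) -> IntentType:
    """Return the first intent that has a keyword contained in the text."""
    lowered = text.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return intent
    return IntentType.UNKNOWN


print(keyword_match("awoooi-api Pod 狀態如何?"))  # IntentType.QUERY
print(keyword_match("helm upgrade awoooi-api"))  # IntentType.DEPLOYMENT
print(keyword_match("hello"))                    # IntentType.UNKNOWN
```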

---

### 2. ComplexityScorer (Complexity Scorer)

**File**: `apps/api/src/services/complexity_scorer.py`

#### Scoring dimensions and weights

| Feature | Weight | Description |
|---------|--------|-------------|
| `service_count` | +0.5 per service | Each additional affected service |
| `metric_count` | +0.3 per metric | Each additional related metric |
| `code_analysis` | +1.5 | Code analysis is required |
| `cross_system` | +1.0 | Cross-system problem |
| `has_history` | -0.5 | A historical case exists (lowers complexity) |
| `critical_severity` | +1.0 | CRITICAL severity |

#### Complexity → model mapping

| Score | Complexity level | Recommended model | Rationale |
|-------|------------------|-------------------|-----------|
| 1 | Simple | `llama3.2:3b` | Fast response, low resource use |
| 2 | Medium | `qwen2.5:7b-instruct` | Balances quality and latency |
| 3 | Complex | `qwen2.5:7b-instruct` | Needs stronger reasoning |
| 4 | High | `gemini` | Needs cloud capability |
| 5 | Very high | `claude` | Handled by the strongest model |

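
The two tables above can be combined into a small rule-based scorer. The sketch below is illustrative only. The weights and the score-to-model mapping come from the tables, but the ADR does not say how the weighted sum becomes a 1 to 5 level, so the base score of 1.0, counting only services and metrics beyond the first, the round-half-up step, and the clamp are all assumptions. The `has_history` context key and the `ComplexityResult` type are also hypothetical. Because of these assumptions, the numbers it produces need not match scores quoted elsewhere in this ADR.

```python
import math
from dataclasses import dataclass

# Weights copied from the scoring table above.
WEIGHTS = {
    "service_count": 0.5,      # per additional affected service
    "metric_count": 0.3,       # per additional related metric
    "code_analysis": 1.5,
    "cross_system": 1.0,
    "has_history": -0.5,       # a known precedent lowers complexity
    "critical_severity": 1.0,
}

# Score -> model, copied from the mapping table above.
MODEL_BY_SCORE = {
    1: "llama3.2:3b",
    2: "qwen2.5:7b-instruct",
    3: "qwen2.5:7b-instruct",
    4: "gemini",
    5: "claude",
}

BASE_SCORE = 1.0  # assumption: the simplest request starts at level 1


@dataclass(frozen=True)
class ComplexityResult:  # hypothetical result type
    raw: float
    score: int
    model: str


def score_complexity(context: dict) -> ComplexityResult:
    """Sum the weighted features, round half up, and clamp to 1..5."""
    extra_services = max(len(context.get("affected_services", [])) - 1, 0)
    extra_metrics = max(len(context.get("metrics", [])) - 1, 0)
    raw = BASE_SCORE
    raw += WEIGHTS["service_count"] * extra_services
    raw += WEIGHTS["metric_count"] * extra_metrics
    if context.get("requires_code_analysis"):
        raw += WEIGHTS["code_analysis"]
    if context.get("cross_system"):
        raw += WEIGHTS["cross_system"]
    if context.get("has_history"):
        raw += WEIGHTS["has_history"]
    if str(context.get("severity", "")).upper() == "CRITICAL":
        raw += WEIGHTS["critical_severity"]
    score = min(max(math.floor(raw + 0.5), 1), 5)
    return ComplexityResult(raw=raw, score=score, model=MODEL_BY_SCORE[score])


simple = score_complexity({"affected_services": ["awoooi-api"]})
outage = score_complexity({
    "affected_services": ["awoooi-api", "redis"],
    "severity": "CRITICAL",
    "cross_system": True,
})
print(simple.score, simple.model)  # 1 llama3.2:3b
print(outage.score, outage.model)  # 4 gemini
```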

---

### 3. AIRouter (Smart Router)

**File**: `apps/api/src/services/ai_router.py`

#### Routing decision flow

```python
async def route(self, text: str, context: dict | None = None) -> RoutingDecision:
    # Step 1: classify the intent
    intent = await self._intent_classifier.classify(text)

    # Step 2: score the complexity
    complexity = self._complexity_scorer.score(context or {})

    # Step 3: select the model (intent overrides are applied here)
    model, reason = self._select_model(intent, complexity)

    # Step 4: build the fallback list
    fallbacks = self._build_fallback_list(model)

    return RoutingDecision(
        model=model,
        intent=intent,
        complexity=complexity,
        reason=reason,
        fallback_models=fallbacks,
    )
```
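
The `route` method returns a `RoutingDecision` whose definition is not shown in this ADR. A plausible shape, inferred only from the keyword arguments above and from the `decision.complexity.score` attribute used in the usage examples, might look like the following. The `ComplexityResult` name, the enum string values, and the choice of frozen dataclasses are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum


class IntentType(Enum):
    # String values are assumed; the ADR only names the members.
    ALERT_TRIAGE = "alert_triage"
    DEPLOYMENT = "deployment"
    QUERY = "query"
    MAINTENANCE = "maintenance"
    CODE_REVIEW = "code_review"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class ComplexityResult:  # hypothetical name; only `.score` appears in the ADR
    score: int


@dataclass(frozen=True)
class RoutingDecision:
    model: str
    intent: IntentType
    complexity: ComplexityResult
    reason: str
    fallback_models: list[str] = field(default_factory=list)


decision = RoutingDecision(
    model="llama3.2:3b",
    intent=IntentType.QUERY,
    complexity=ComplexityResult(score=1),
    reason="intent override",
    fallback_models=["qwen2.5:7b-instruct", "gemini", "claude"],
)
print(decision.model, decision.intent.value, decision.complexity.score)
```

Freezing the dataclass is a design choice of this sketch: a decision that cannot be reassigned after routing is easier to log and compare in tests.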

#### Intent overrides

Some intents force a specific model regardless of complexity:

```python
INTENT_OVERRIDES = {
    IntentType.CODE_REVIEW: "qwen2.5:7b-instruct",  # code review needs a strong model
    IntentType.QUERY: "llama3.2:3b",                # queries use the fast model
    # All other intents are selected by complexity
}
```
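
One way the override table and the complexity mapping could fit together is sketched below. It is not the project's `_select_model`: the function name, the integer `score` parameter, the enum string values, and the English reason strings are assumptions. The override is checked first, so a `QUERY` stays on the 3B model even at score 5, which is what "regardless of complexity" implies.

```python
from enum import Enum


class IntentType(Enum):
    # String values are assumed; the ADR only names the members.
    ALERT_TRIAGE = "alert_triage"
    QUERY = "query"
    CODE_REVIEW = "code_review"
    UNKNOWN = "unknown"


INTENT_OVERRIDES = {
    IntentType.CODE_REVIEW: "qwen2.5:7b-instruct",
    IntentType.QUERY: "llama3.2:3b",
}

# Score -> model, from the complexity mapping table in this ADR.
MODEL_BY_SCORE = {
    1: "llama3.2:3b",
    2: "qwen2.5:7b-instruct",
    3: "qwen2.5:7b-instruct",
    4: "gemini",
    5: "claude",
}


def select_model(intent: IntentType, score: int) -> tuple[str, str]:
    """Return (model, reason); an intent override beats the complexity score."""
    override = INTENT_OVERRIDES.get(intent)
    if override is not None:
        return override, f"intent {intent.value} forces {override}"
    clamped = min(max(score, 1), 5)  # guard against out-of-range scores
    model = MODEL_BY_SCORE[clamped]
    return model, f"complexity score {clamped} maps to {model}"


print(select_model(IntentType.QUERY, 5))
# ('llama3.2:3b', 'intent query forces llama3.2:3b')
print(select_model(IntentType.ALERT_TRIAGE, 4))
# ('gemini', 'complexity score 4 maps to gemini')
```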

#### Fallback order

When the selected model fails, the following models are tried in order:

```python
FALLBACK_ORDER = [
    "qwen2.5:7b-instruct",  # local primary
    "llama3.2:3b",          # local backup
    "gemini",               # cloud backup
    "claude",               # last resort
]
```
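
The fallback list for a given decision can be derived by removing the selected model from this order. The sketch below assumes that behaviour; it is inferred from the complex-alert usage example in this ADR, where selecting `gemini` yields `["qwen2.5:7b-instruct", "llama3.2:3b", "claude"]`, and the function name `build_fallback_list` is hypothetical.

```python
FALLBACK_ORDER = [
    "qwen2.5:7b-instruct",  # local primary
    "llama3.2:3b",          # local backup
    "gemini",               # cloud backup
    "claude",               # last resort
]


def build_fallback_list(selected: str) -> list[str]:
    """Keep FALLBACK_ORDER as is, minus the model that was already chosen."""
    return [model for model in FALLBACK_ORDER if model != selected]


print(build_fallback_list("gemini"))
# ['qwen2.5:7b-instruct', 'llama3.2:3b', 'claude']
```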

---

## Relationship to ADR-006

| Aspect | ADR-006 (fixed fallback) | ADR-016 (smart routing) |
|--------|--------------------------|-------------------------|
| **When it applies** | When a service fails | On every request |
| **Selection logic** | Fixed order | Intent + complexity |
| **Goal** | High availability | Resource optimization |
| **Status** | Still in effect | Supplements ADR-006 |

**How they work together**:

- ADR-016 first picks the best-suited model from the characteristics of the request
- If that model fails, the ADR-006 fallback mechanism takes over
- Circuit Breaker and token quota monitoring continue to follow ADR-006

```
Request → [ADR-016: smart selection] → Model A
                                          │
                                  failure ▼
                 [ADR-006: Fallback] → Model B → Model C → Static Response
```
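
The hand-off in the diagram above amounts to trying the selected model first, then each fallback in turn, and returning a static response when all of them fail. The sketch below illustrates only that control flow. The names `complete_with_fallback`, `fake_call`, and `STATIC_RESPONSE` are hypothetical, the model call is a stub, and the Circuit Breaker and quota checks that ADR-006 requires are left out.

```python
import asyncio
from collections.abc import Awaitable, Callable

STATIC_RESPONSE = "AI analysis is temporarily unavailable."  # hypothetical text


async def complete_with_fallback(
    prompt: str,
    models: list[str],
    call_model: Callable[[str, str], Awaitable[str]],
) -> tuple[str, str]:
    """Try each model in order; return (model_used, answer)."""
    for model in models:
        try:
            return model, await call_model(model, prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            print(f"{model} failed: {exc}")
    return "static", STATIC_RESPONSE


async def fake_call(model: str, prompt: str) -> str:
    # Stub backend: pretend the cloud model is out of quota.
    if model == "gemini":
        raise RuntimeError("quota exceeded")
    return f"[{model}] answer to: {prompt}"


# Selected model first, then its fallback list.
chain = ["gemini", "qwen2.5:7b-instruct", "llama3.2:3b", "claude"]
result = asyncio.run(complete_with_fallback("why was the pod OOM killed?", chain, fake_call))
print(result)
# ('qwen2.5:7b-instruct', '[qwen2.5:7b-instruct] answer to: why was the pod OOM killed?')
```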

---

## Usage examples

### Example 1: Simple query

```python
from src.services.ai_router import get_ai_router

# Run inside an async function; `await` is not valid at module level.
router = get_ai_router()
decision = await router.route(
    text="What is the status of the awoooi-api Pod?",
    context={"affected_services": ["awoooi-api"]},
)

# Result:
# decision.model = "llama3.2:3b"
# decision.intent = IntentType.QUERY
# decision.complexity.score = 1
# decision.reason = "意圖 query 強制使用 llama3.2:3b"  (intent query forces llama3.2:3b)
```

### Example 2: Complex alert

```python
decision = await router.route(
    text="CRITICAL: awoooi-api OOM Killed, and the worker cannot reach Redis either",
    context={
        "affected_services": ["awoooi-api", "awoooi-worker", "redis"],
        "metrics": ["memory_usage", "connection_errors", "restart_count"],
        "severity": "CRITICAL",
        "cross_system": True,
    },
)

# Result:
# decision.model = "gemini"
# decision.intent = IntentType.ALERT_TRIAGE
# decision.complexity.score = 4
# decision.reason = "高複雜度告警 (score=4) → 使用雲端模型"  (high-complexity alert, use a cloud model)
# decision.fallback_models = ["qwen2.5:7b-instruct", "llama3.2:3b", "claude"]
```

### Example 3: Code review

```python
decision = await router.route(
    text="Please review the changes in this PR",
    context={"requires_code_analysis": True},
)

# Result:
# decision.model = "qwen2.5:7b-instruct"
# decision.intent = IntentType.CODE_REVIEW
# decision.complexity.score = 3
# decision.reason = "意圖 code_review 強制使用 qwen2.5:7b-instruct"  (intent code_review forces qwen2.5:7b-instruct)
```

---

## Monitoring metrics

| Metric | Description | Target |
|--------|-------------|--------|
| `intent_classification_latency` | Intent classification latency | < 100ms |
| `complexity_scoring_latency` | Complexity scoring latency | < 10ms |
| `model_selection_distribution` | Distribution of selected models | Monitor only |
| `routing_decision_reason` | Counts of routing decision reasons | Monitor only |

---

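## Consequences

### Positive

- **Resource optimization**: Simple tasks use small models, which saves GPU resources
- **Better quality**: Complex tasks are automatically escalated to stronger models
- **Controlled cost**: Cloud APIs are used only when they are actually needed
- **Lower latency**: Simple queries get faster responses

### Caveats

- **Classification accuracy**: Keyword matching may mishandle edge cases
- **Complexity assessment**: The rules will likely need ongoing tuning
- **Model availability**: Routing must work together with the ADR-006 Circuit Breaker

### Risks

- Misclassification can lead to lower quality than expected
- Data must be collected continuously to refine the classification rules

---
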
## Changelog

| Date | Version | Change | Author |
|------|---------|--------|--------|
| 2026-03-26 | v1.0 | Initial version (Phase 13.3 #85-87) | CTO |

---

## References

- [ADR-006: AI Degradation and Fallback Strategy](./ADR-006-ai-fallback-strategy.md)
- [Phase 13.3 Smart Router Design](../../.claude/projects/-Users-ogt-awoooi/memory/project_phase13_3_smart_router.md)
- `apps/api/src/services/intent_classifier.py`
- `apps/api/src/services/complexity_scorer.py`
- `apps/api/src/services/ai_router.py`

---

*This ADR records the decision process and implementation conventions for the smart routing mechanism, as a supplement to ADR-006.*