feat(p21): Caller × Context dynamic Model Router + ADR-034
All checks were successful
CD Pipeline / deploy (push) Successful in 2m45s
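Operation Ollama-First v5.0 / Phase 21: dynamic routing governance

services/llm_model_router.py (149 lines)
- Pure rule engine, zero LLM cost (Python lambda predicates)
- 6 callers × 12 routing rules:
  • sales_copy: short copy (< 100 characters) → gemma3:4b / long copy → llama3.1:8b
  • hermes_analyst: gap > 20% or sales delta < -50% → qwen3:14b / default hermes3
  • aider_heal: diff > 200 lines → qwen2.5-coder:32b / default 7b
  • openclaw_qa: query > 200 characters or multi_turn → qwen3:14b / default qwen2.5:7b-instruct
  • ppt_vision: minicpm unhealthy → llava / default minicpm-v
  • ea_engine: require_chain_of_thought → deepseek-r1:14b / default Gemini
- Feature flag MODEL_ROUTER_ENABLED defaults to OFF (backward compatible)
- Fail-safe: a predicate that raises is skipped and the next rule is tried

tests/test_llm_model_router.py (18 tests, all passing)
- T1 flag OFF: no routing
- T2 sales_copy short / long copy routing
- T3 hermes simple / complex SKU
- T4 aider_heal simple fix / refactor
- T5 ppt_vision primary / fallback
- T6 ea_engine CoT routing
- T7 predicate exception tolerance
- T8 utility functions

ADR-034: Caller × Context dynamic Model Router
- Routing rule table for the 6 callers
- 4 rejected alternatives (LLM-based / per-caller hardcoding / config file / blanket upgrade)
- Phased migration plan for Phase 21.2-21.6
- V1-V3 acceptance checks (including SQL to observe the model distribution after caller integration)

Related: both GCP hosts (Primary and Secondary) already carry the same 10 models (~67 GB each, symmetric), which covers every routing rule; caller integration can proceed in stages (Phase 21.2-21.5).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>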
176
docs/adr/ADR-034-dynamic-model-router.md
Normal file
@@ -0,0 +1,176 @@
# ADR-034: Caller × Context Dynamic Model Router

- **Status**: Accepted (becomes Active once callers are integrated)
- **Date**: 2026-05-04
- **Decision Maker**: Commander
- **Author**: Operation Ollama-First v5.0 / Phase 21
- **Related**: ADR-028 (LLM routing), ADR-029 (twin-tower division of labor), ADR-030 (multi-vendor)

---

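## Context

Campaign v5.0 has so far brought both GCP hosts (Primary and Secondary) to 10 Ollama models each (~67 GB). Most existing callers, however, hardcode a single model (for example, sales_copy always uses `llama3.1:8b`) and cannot pick the best model for the context at hand.

**Pain points**:
1. **Wasted resources**: sales_copy runs even short copy (< 100 characters) on an 8B model → it should go to `gemma3:4b` (4 GB vs 5 GB, roughly 50% lower latency expected)
2. **Quality bottleneck**: Hermes competitive pricing still uses `hermes3:latest` (8B) on complex SKUs (gap > 20%) → it should be upgraded to `qwen3:14b`
3. **Refactoring gap**: `qwen2.5-coder:7b` is not enough for large AiderHeal refactors (diff > 200 lines) → it should be upgraded to `qwen2.5-coder:32b`
4. **Reasoning gap**: EA HITL has no deepseek-r1 path when it needs chain-of-thought

**Prerequisites already in place**:
- Primary and Secondary each hold the same 10 models (fully symmetric)
- `services/llm_caller_registry.py` centralizes 30+ callers
- `services/cost_throttle_service.py` acts as the cost gatekeeper

This ADR focuses on the design of the **dynamic routing rules**.

---
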
## Decision

### 1. Pure rule engine, zero LLM cost

```python
# services/llm_model_router.py (abridged; the shipped rules also coerce values with int()/float())
from typing import Dict

ROUTING_RULES: Dict[str, list] = {
    'sales_copy': [
        # A missing or zero expected_length must not count as "short"
        (lambda ctx: 0 < ctx.get('expected_length', 0) < 100, 'gemma3:4b'),
        (lambda ctx: True, 'llama3.1:8b'),
    ],
    'hermes_analyst': [
        # .get() with a default so that a missing key does not raise KeyError
        (lambda ctx: ctx.get('max_gap_pct', 0) > 20 or ctx.get('min_sales_delta', 0) < -50,
         'qwen3:14b'),
        (lambda ctx: True,
         'hermes3:latest'),
    ],
    # ... 12 rules in total across 6 callers
}
```

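To make the first-match semantics concrete, here is a minimal, self-contained sketch. It is not the shipped module: `RULES` and `first_match` are illustrative names, and only two of the six callers are copied, with thresholds taken from this ADR.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# (predicate, model) pairs; a model of None means "use the caller's default".
Rule = Tuple[Callable[[Dict[str, Any]], bool], Optional[str]]

# Illustrative copy of two rule lists (assumption: the real table lives in
# services/llm_model_router.py and may differ in detail).
RULES: Dict[str, List[Rule]] = {
    "sales_copy": [
        (lambda ctx: 0 < ctx.get("expected_length", 0) < 100, "gemma3:4b"),
        (lambda ctx: True, "llama3.1:8b"),
    ],
    "aider_heal": [
        (lambda ctx: ctx.get("diff_lines", 0) > 200, "qwen2.5-coder:32b"),
        (lambda ctx: True, "qwen2.5-coder:7b"),
    ],
}


def first_match(caller: str, ctx: Dict[str, Any], default: Optional[str]) -> Optional[str]:
    """Return the model of the first rule whose predicate is true."""
    for predicate, model in RULES.get(caller, []):
        if predicate(ctx):
            return model if model is not None else default
    return default  # unknown caller, or no rule matched


print(first_match("sales_copy", {"expected_length": 50}, "llama3.1:8b"))
print(first_match("aider_heal", {"diff_lines": 350}, "qwen2.5-coder:7b"))
print(first_match("unknown_caller", {}, "hermes3:latest"))
```

Because rule order decides the outcome, the catch-all `lambda ctx: True` entry has to stay last in each list.
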
### 2. Routing rule table

| Caller | Context trigger | Model on match | Default model |
|---|---|---|---|
| `sales_copy` | expected_length < 100 characters | `gemma3:4b` | `llama3.1:8b` |
| `hermes_analyst` | max_gap_pct > 20% or min_sales_delta < -50% | `qwen3:14b` | `hermes3:latest` |
| `aider_heal` | diff_lines > 200 | `qwen2.5-coder:32b` | `qwen2.5-coder:7b` |
| `openclaw_qa` | query_length > 200 or multi_turn | `qwen3:14b` | `qwen2.5:7b-instruct` |
| `ppt_vision` | minicpm_unhealthy | `llava:latest` | `minicpm-v:latest` |
| `ea_engine` | require_chain_of_thought | `deepseek-r1:14b` | (returns default = Gemini) |

### 3. Feature flag gradual rollout

- `MODEL_ROUTER_ENABLED` defaults to OFF
- Callers invoke `select_model(caller, context, default='<existing model>')`
- Flag OFF → default is returned immediately (rules are not evaluated) → behavior is identical to before this change

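The flag behavior above can be sketched as follows. This is a minimal illustration under the assumption that the flag is read from the environment on every call; `router_enabled` and `pick` are hypothetical names used only here.

```python
import os


def router_enabled(env_var: str = "MODEL_ROUTER_ENABLED") -> bool:
    """Read the flag on every call so a change takes effect without a restart."""
    raw = os.getenv(env_var, "false")
    return raw.strip().lower() in ("true", "1", "yes", "on")


def pick(default: str, routed: str) -> str:
    """Hypothetical helper: use the routed model only while the flag is ON."""
    return routed if router_enabled() else default


os.environ.pop("MODEL_ROUTER_ENABLED", None)
print(pick("llama3.1:8b", "gemma3:4b"))  # flag unset → the default model

os.environ["MODEL_ROUTER_ENABLED"] = "On "
print(pick("llama3.1:8b", "gemma3:4b"))  # flag ON (case and whitespace ignored) → routed model
```

Reading the variable at call time rather than at import time is what lets an operator flip the flag during the rollout and roll back just as quickly.
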
### 4. Fail-safe

- A predicate raises → log a warning and skip to the next rule
- Caller not in ROUTING_RULES → return default
- No rule matches → return default

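The three fallbacks can be exercised with a small stand-alone sketch. `safe_select` and the logger name are assumptions made for illustration; the first rule deliberately uses `ctx[...]` and `int()` so that bad input raises.

```python
import logging
from typing import Any, Callable, Dict, List, Optional, Tuple

logger = logging.getLogger("model_router_sketch")

Rule = Tuple[Callable[[Dict[str, Any]], bool], Optional[str]]


def safe_select(rules: List[Rule], ctx: Dict[str, Any], default: Optional[str]) -> Optional[str]:
    """Evaluate rules in order; a predicate that raises is logged and skipped."""
    for index, (predicate, model) in enumerate(rules):
        try:
            if predicate(ctx):
                return model if model is not None else default
        except Exception as exc:  # broad on purpose: routing must never break the caller
            logger.warning("rule %d failed: %s", index, exc)
    return default


rules: List[Rule] = [
    (lambda ctx: int(ctx["expected_length"]) < 100, "gemma3:4b"),  # raises on bad input
    (lambda ctx: True, "llama3.1:8b"),
]

print(safe_select(rules, {"expected_length": "abc"}, "fallback"))  # rule 0 raises, rule 1 matches
print(safe_select([], {}, "fallback"))                             # no rules at all → default
```

Catching `Exception` broadly is a deliberate trade-off here: a malformed context costs one warning log line instead of a failed request.
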
### 5. Integration (staged rollout recommended)

```python
# Caller example, abridged (e.g. ollama_service.generate_sales_copy):
from services.llm_model_router import select_model

def generate_sales_copy(self, product_name, ...):
    model = select_model(
        caller='sales_copy',
        context={'expected_length': len(product_name) * 3},
        default='llama3.1:8b',
    )
    return self.generate(prompt=..., model=model, ...)
```

**Phased migration**:
- Phase 21.1: model_router service + tests landed (this commit) ✅
- Phase 21.2: sales_copy integration (low-risk pilot) ⏳
- Phase 21.3: aider_heal integration (medium risk; needs diff_lines)
- Phase 21.4: hermes_analyst integration (high risk; touches the main tactical flow)
- Phase 21.5: remaining callers (openclaw_qa / ppt_vision / ea_engine)
- Phase 21.6: all callers migrated → MODEL_ROUTER_ENABLED defaults to ON (after one week of observation)

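Phase 21.3 depends on the caller supplying `diff_lines`, which the ADR does not define precisely. The sketch below assumes it means added plus removed lines in a unified diff; `count_diff_lines` is a hypothetical helper, and the header check is a simplification (a changed line whose text itself starts with `--` or `++` would be skipped).

```python
def count_diff_lines(unified_diff: str) -> int:
    """Count added and removed lines in a unified diff, ignoring file headers."""
    changed = 0
    for line in unified_diff.splitlines():
        if line.startswith(("+++", "---")):
            continue  # file header lines, not content changes
        if line.startswith(("+", "-")):
            changed += 1
    return changed


sample = """\
--- a/app.py
+++ b/app.py
@@ -1,3 +1,3 @@
 import os
-print("old")
+print("new")
 x = 1
"""

# The resulting dict is what a caller would pass as `context`.
context = {"diff_lines": count_diff_lines(sample)}
print(context)  # {'diff_lines': 2}
```
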
---

## Alternatives Considered

| Option | Reason rejected |
|---|---|
| **A. LLM-based routing** (an LLM decides which model to use) | Spends LLM budget to save LLM budget, and adds latency |
| **B. Each caller hardcodes multiple models** (not centralized) | Rules drift; no single source of truth |
| **C. Upgrade everything to a large model** (e.g. qwen3:14b everywhere) | Wastes resources; short copy does not need 14B |
| **D. YAML/JSON config file** (read at runtime) | Over-engineering; Python lambdas are flexible enough |

---

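## Consequences

### Positive (5)
1. **Resource savings**: short sales_copy uses the 4 GB gemma3 instead of the 5 GB llama3.1, with roughly 50% lower latency expected
2. **Quality gains**: complex cases are upgraded to larger models automatically (hermes 14B / aider 32B)
3. **Zero LLM cost**: pure Python lambda rules
4. **Fail-safe**: a rule exception never blocks the main flow
5. **Centralized governance**: a rule change only needs a PR against `llm_model_router.py`; callers stay untouched

### Negative (3)
1. **Rule maintenance cost**: new callers or new context conditions require rule updates (which is exactly the governance this ADR aims for)
2. **Context acquisition burden**: a caller must compute its context (e.g. diff_lines) before calling the router
3. **Debugging complexity**: finding out which rule matched requires reading the logger.debug output

### Risks (3)
1. **Rule design errors**: the thresholds (20% / 200 lines) may be off → mitigated by staged observation during Phase 21.2-21.5
2. **Model missing on a GCP host**: select_model returns a model that is not installed → mitigated by pulling models beforehand (10-model symmetry already done)
3. **Incomplete caller integration**: some callers still hardcode their model → mitigated by the documented migration plan

---
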
## Verification

### V1: unit tests
```bash
pytest tests/test_llm_model_router.py -v
# Expected: all 18 tests pass
```

### V2: observe ai_calls after caller integration
```sql
SELECT model, COUNT(*), AVG(duration_ms)
FROM ai_calls
WHERE caller = 'sales_copy' AND called_at > NOW() - INTERVAL '7 days'
GROUP BY model;
-- Expected: gemma3:4b (short copy) takes 60%+ of calls, llama3.1:8b (long copy) 40% or less
-- Average duration: gemma3 roughly 50% lower than llama3.1
```

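Once the V2 query has been run, checking the 60% expectation is simple arithmetic. The sketch below assumes the rows arrive as `(model, call_count, avg_duration_ms)` tuples; `model_shares` is a hypothetical helper and the numbers are made up for illustration, not measured results.

```python
from typing import Dict, List, Tuple

# (model, call_count, avg_duration_ms), shaped like the V2 query result.
Row = Tuple[str, int, float]


def model_shares(rows: List[Row]) -> Dict[str, float]:
    """Return each model's share of total calls as a fraction in [0, 1]."""
    total = sum(count for _, count, _ in rows)
    if total == 0:
        return {}
    return {model: count / total for model, count, _ in rows}


# Made-up numbers purely for illustration.
rows: List[Row] = [("gemma3:4b", 130, 900.0), ("llama3.1:8b", 70, 1800.0)]
shares = model_shares(rows)
print(shares["gemma3:4b"] >= 0.60)  # True for this sample: 130 / 200 = 0.65
```
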
### V3: cost throttle integration
```python
# Planned for Phase 22: switch to a cheaper model automatically when cost_throttle triggers
# Example: claude throttled → select_model falls back to the default Gemini Flash
```

---

## Migration Plan

| Phase | Work | Status |
|---|---|---|
| 21.1 | services/llm_model_router.py + 18 tests | ✅ this commit |
| 21.2 | sales_copy integration (add select_model to generate_sales_copy) | ⏳ |
| 21.3 | aider_heal integration (needs diff_lines context) | ⏳ |
| 21.4 | hermes_analyst integration (needs max_gap_pct context) | ⏳ |
| 21.5 | openclaw_qa / ppt_vision / ea_engine | ⏳ |
| 21.6 | MODEL_ROUTER_ENABLED defaults to ON (after one week of observation) | ⏳ |

---

## References

- `services/llm_model_router.py` (this commit)
- `tests/test_llm_model_router.py` (18 tests)
- `docs/llm_model_full_evaluation_20260504.md`: routing optimization recommendations
- ADR-028 (unified LLM routing guidelines)
- ADR-029 (Hermes-First twin-tower division of labor)
- ADR-030 (Frontier multi-vendor strategy)
@@ -55,6 +55,7 @@
| [031](ADR-031-mcp-self-hosted-stack.md) | Self-hosted MCP stack (postgres + omnisearch + firecrawl + filesystem; includes Owen guardrail #2, the Firecrawl 2g limit) | Accepted | 2026-05-04 |
| [032](ADR-032-rag-autonomous-learning-loop.md) | RAG autonomous learning loop: Distiller + PromotionGate + feedback loop (Phase 11) | Accepted | 2026-05-03 |
| [033](ADR-033-rag-three-guardrails.md) | Three RAG governance guardrails: Promotion Gate / Firecrawl resources / BGE-M3 consistency (Owen v5.0 iron rule) | Accepted | 2026-05-03 |
| [034](ADR-034-dynamic-model-router.md) | Caller × Context dynamic Model Router (short copy → gemma3 / complex SKU → qwen3:14b / refactor → coder:32b) | Accepted | 2026-05-04 |

## Conventions

149
services/llm_model_router.py
Normal file
@@ -0,0 +1,149 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
services/llm_model_router.py
Operation Ollama-First v5.0 / Phase 21: Caller × Context dynamic Model Router

Design principles:
- Each caller dynamically selects the best model for its context (same provider)
  e.g. sales_copy short copy → gemma3:4b / long copy → llama3.1:8b / Hermes complex SKU → qwen3:14b
- Pure rule engine, zero LLM cost
- Callers obtain the model name via select_model(caller, context)
- Feature flag MODEL_ROUTER_ENABLED defaults to OFF (existing defaults are unaffected)
- Fallback: no rule matches → return the caller's default model (backward compatible)

Corresponds to the ADR-028 caller whitelist and ADR-034 dynamic routing
(docs/adr/ADR-034-dynamic-model-router.md).
GCP Primary + Secondary already carry the 10 models needed by every routing rule.
"""

from __future__ import annotations
import os
import logging
from typing import Dict, Any, Optional, Callable

logger = logging.getLogger(__name__)


def is_model_router_enabled() -> bool:
    """Runtime check (read on every call, so the value is not frozen at import time)."""
    return os.getenv('MODEL_ROUTER_ENABLED', 'false').strip().lower() in ('true', '1', 'yes', 'on')


# ─────────────────────────────────────────────────────────────────────────────
# Routing rules (ADR-034 spec)
# ─────────────────────────────────────────────────────────────────────────────
# Structure: caller → list of (predicate(context), model_name) tuples
# The first predicate that returns True wins; if none match, select_model
# returns the caller-supplied default
# ─────────────────────────────────────────────────────────────────────────────

ROUTING_RULES: Dict[str, list] = {
    # Sales copy: short copy → gemma3:4b (light and fast), long copy → llama3.1:8b
    'sales_copy': [
        (lambda ctx: int(ctx.get('expected_length', 0) or 0) > 0
                     and int(ctx.get('expected_length', 0)) < 100,
         'gemma3:4b'),
        (lambda ctx: True,  # default
         'llama3.1:8b'),
    ],

    # Hermes competitive pricing: simple price comparison → hermes3;
    # complex analysis (gap > 20% or a sharp sales drop) → qwen3:14b
    'hermes_analyst': [
        (lambda ctx: float(ctx.get('max_gap_pct', 0) or 0) > 20
                     or float(ctx.get('min_sales_delta', 0) or 0) < -50,
         'qwen3:14b'),
        (lambda ctx: True,
         'hermes3:latest'),
    ],

    # AiderHeal: simple syntax fix → qwen2.5-coder:7b; refactor scale (diff > 200 lines) → 32b
    'aider_heal': [
        (lambda ctx: int(ctx.get('diff_lines', 0) or 0) > 200,
         'qwen2.5-coder:32b'),
        (lambda ctx: True,
         'qwen2.5-coder:7b'),
    ],

    # OpenClaw Q&A: simple questions → qwen2.5:7b-instruct; complex ones → qwen3:14b
    'openclaw_qa': [
        (lambda ctx: int(ctx.get('query_length', 0) or 0) > 200
                     or bool(ctx.get('multi_turn', False)),
         'qwen3:14b'),
        (lambda ctx: True,
         'qwen2.5:7b-instruct'),
    ],

    # PPT vision: minicpm-v is primary; switch to llava when it is flagged unhealthy
    'ppt_vision': [
        (lambda ctx: bool(ctx.get('minicpm_unhealthy', False)),
         'llava:latest'),
        (lambda ctx: True,
         'minicpm-v:latest'),
    ],

    # Reasoning-heavy scenarios (EA HITL strategic decisions; not enabled yet, reserved)
    'ea_engine': [
        (lambda ctx: bool(ctx.get('require_chain_of_thought', False)),
         'deepseek-r1:14b'),
        (lambda ctx: True,
         None),  # None → caller uses its own default (gemini-2.0-flash)
    ],
}


def select_model(
    caller: str,
    context: Optional[Dict[str, Any]] = None,
    default: Optional[str] = None,
) -> Optional[str]:
    """Main entry point: select a model for the given caller × context.

    Args:
        caller: routed only if it is a key of ROUTING_RULES; otherwise default is returned
        context: inputs for the routing decision (e.g. expected_length / diff_lines / max_gap_pct)
        default: returned when the caller has no rules or no rule matches

    Returns:
        A model name string, or None (None means the caller keeps its existing default)

    When the flag is OFF, default is returned immediately (rules are not evaluated; backward compatible)
    """
    if not is_model_router_enabled():
        return default

    if caller not in ROUTING_RULES:
        return default

    ctx = context or {}
    for predicate, model_name in ROUTING_RULES[caller]:
        try:
            if predicate(ctx):
                if model_name is None:
                    return default  # rule matched but asks for the default
                logger.debug("[ModelRouter] %s ctx=%s → %s", caller, ctx, model_name)
                return model_name
        except Exception as exc:
            logger.warning("[ModelRouter] %s rule eval failed: %s", caller, exc)
            continue

    # no rule matched → default
    return default


def list_routes_for_caller(caller: str) -> list:
    """Debug helper: list the model of every routing rule for a caller."""
    rules = ROUTING_RULES.get(caller, [])
    return [model for _, model in rules]


def all_callers_with_routes() -> list:
    """All callers that have dynamic routing rules."""
    return list(ROUTING_RULES.keys())


__all__ = [
    'select_model',
    'is_model_router_enabled',
    'list_routes_for_caller',
    'all_callers_with_routes',
    'ROUTING_RULES',
]
254
tests/test_llm_model_router.py
Normal file
@@ -0,0 +1,254 @@
"""
tests/test_llm_model_router.py
─────────────────────────────────────────────────────────────────
Operation Ollama-First v5.0 / Phase 21: Caller × Context dynamic routing tests
"""

import pytest


@pytest.fixture(autouse=True)
def _reset_env(monkeypatch):
    # Every test starts with the flag unset (router OFF)
    monkeypatch.delenv('MODEL_ROUTER_ENABLED', raising=False)
    yield


# ═══════════════════════════════════════════════════════════════════════════
# T1: feature flag OFF → no routing (backward compatible)
# ═══════════════════════════════════════════════════════════════════════════

def test_flag_off_returns_default():
    from services.llm_model_router import select_model

    # Flag OFF: default is returned immediately (rules are not evaluated)
    result = select_model(
        caller='sales_copy',
        context={'expected_length': 50},
        default='llama3.1:8b',
    )
    assert result == 'llama3.1:8b'


def test_flag_off_unknown_caller_returns_default():
    from services.llm_model_router import select_model

    result = select_model(caller='nonexistent', default='hermes3:latest')
    assert result == 'hermes3:latest'


# ═══════════════════════════════════════════════════════════════════════════
# T2: sales_copy routing (short vs long copy)
# ═══════════════════════════════════════════════════════════════════════════

def test_sales_copy_short_text_routes_to_gemma3(monkeypatch):
    monkeypatch.setenv('MODEL_ROUTER_ENABLED', 'true')
    from services.llm_model_router import select_model

    # 50-character short copy → lightweight gemma3:4b
    result = select_model(
        caller='sales_copy',
        context={'expected_length': 50},
        default='llama3.1:8b',
    )
    assert result == 'gemma3:4b'


def test_sales_copy_long_text_routes_to_llama(monkeypatch):
    monkeypatch.setenv('MODEL_ROUTER_ENABLED', 'true')
    from services.llm_model_router import select_model

    result = select_model(
        caller='sales_copy',
        context={'expected_length': 200},
        default='llama3.1:8b',
    )
    assert result == 'llama3.1:8b'


def test_sales_copy_no_length_falls_back_to_default(monkeypatch):
    monkeypatch.setenv('MODEL_ROUTER_ENABLED', 'true')
    from services.llm_model_router import select_model

    # expected_length missing → rule 1 does not fire → rule 2 is always True → llama3.1:8b
    result = select_model(
        caller='sales_copy',
        context={},
        default='llama3.1:8b',
    )
    assert result == 'llama3.1:8b'


# ═══════════════════════════════════════════════════════════════════════════
# T3: Hermes competitive pricing (simple vs complex SKU)
# ═══════════════════════════════════════════════════════════════════════════

def test_hermes_simple_routes_to_hermes3(monkeypatch):
    monkeypatch.setenv('MODEL_ROUTER_ENABLED', 'true')
    from services.llm_model_router import select_model

    result = select_model(
        caller='hermes_analyst',
        context={'max_gap_pct': 5.2, 'min_sales_delta': -10.0},
        default='hermes3:latest',
    )
    assert result == 'hermes3:latest'


def test_hermes_high_gap_routes_to_qwen3(monkeypatch):
    monkeypatch.setenv('MODEL_ROUTER_ENABLED', 'true')
    from services.llm_model_router import select_model

    # gap > 20% → upgrade to qwen3:14b
    result = select_model(
        caller='hermes_analyst',
        context={'max_gap_pct': 25.0, 'min_sales_delta': -5.0},
        default='hermes3:latest',
    )
    assert result == 'qwen3:14b'


def test_hermes_sales_crash_routes_to_qwen3(monkeypatch):
    monkeypatch.setenv('MODEL_ROUTER_ENABLED', 'true')
    from services.llm_model_router import select_model

    # sales delta < -50% → upgrade to qwen3:14b
    result = select_model(
        caller='hermes_analyst',
        context={'max_gap_pct': 5.0, 'min_sales_delta': -60.0},
        default='hermes3:latest',
    )
    assert result == 'qwen3:14b'


# ═══════════════════════════════════════════════════════════════════════════
# T4: AiderHeal (simple fix vs refactor)
# ═══════════════════════════════════════════════════════════════════════════

def test_aider_heal_small_diff_routes_to_7b(monkeypatch):
    monkeypatch.setenv('MODEL_ROUTER_ENABLED', 'true')
    from services.llm_model_router import select_model

    result = select_model(
        caller='aider_heal',
        context={'diff_lines': 50},
        default='qwen2.5-coder:7b',
    )
    assert result == 'qwen2.5-coder:7b'


def test_aider_heal_large_refactor_routes_to_32b(monkeypatch):
    monkeypatch.setenv('MODEL_ROUTER_ENABLED', 'true')
    from services.llm_model_router import select_model

    # diff > 200 lines → 32b for refactor-scale changes
    result = select_model(
        caller='aider_heal',
        context={'diff_lines': 350},
        default='qwen2.5-coder:7b',
    )
    assert result == 'qwen2.5-coder:32b'


# ═══════════════════════════════════════════════════════════════════════════
# T5: PPT vision (primary / fallback)
# ═══════════════════════════════════════════════════════════════════════════

def test_ppt_vision_normal_routes_to_minicpm(monkeypatch):
    monkeypatch.setenv('MODEL_ROUTER_ENABLED', 'true')
    from services.llm_model_router import select_model

    result = select_model(
        caller='ppt_vision',
        context={},
        default='minicpm-v:latest',
    )
    assert result == 'minicpm-v:latest'


def test_ppt_vision_minicpm_unhealthy_routes_to_llava(monkeypatch):
    monkeypatch.setenv('MODEL_ROUTER_ENABLED', 'true')
    from services.llm_model_router import select_model

    result = select_model(
        caller='ppt_vision',
        context={'minicpm_unhealthy': True},
        default='minicpm-v:latest',
    )
    assert result == 'llava:latest'


# ═══════════════════════════════════════════════════════════════════════════
# T6: EA engine (reasoning required → deepseek-r1)
# ═══════════════════════════════════════════════════════════════════════════

def test_ea_engine_no_cot_returns_default(monkeypatch):
    """Rule matches but model_name=None → return default (caller keeps its existing Gemini)."""
    monkeypatch.setenv('MODEL_ROUTER_ENABLED', 'true')
    from services.llm_model_router import select_model

    result = select_model(
        caller='ea_engine',
        context={'require_chain_of_thought': False},
        default='gemini-2.0-flash',
    )
    assert result == 'gemini-2.0-flash'


def test_ea_engine_cot_routes_to_deepseek_r1(monkeypatch):
    monkeypatch.setenv('MODEL_ROUTER_ENABLED', 'true')
    from services.llm_model_router import select_model

    result = select_model(
        caller='ea_engine',
        context={'require_chain_of_thought': True},
        default='gemini-2.0-flash',
    )
    assert result == 'deepseek-r1:14b'


# ═══════════════════════════════════════════════════════════════════════════
# T7: a rule exception must not block the caller (fault tolerance)
# ═══════════════════════════════════════════════════════════════════════════

def test_predicate_exception_skipped_to_next_rule(monkeypatch):
    """A predicate that raises is skipped in favor of the next rule (never raised to the caller)."""
    monkeypatch.setenv('MODEL_ROUTER_ENABLED', 'true')
    from services.llm_model_router import select_model

    # A non-numeric context value makes int() raise.
    # Rule 1 expects expected_length to be int-convertible; 'abc' is not.
    # The router should catch the error and skip to rule 2 (always True → llama3.1:8b).
    result = select_model(
        caller='sales_copy',
        context={'expected_length': 'abc'},  # deliberately bad value
        default='llama3.1:8b',
    )
    # Result: rule 1 fails (int('abc') raises) → skipped → rule 2 matches → 'llama3.1:8b'
    assert result == 'llama3.1:8b'


# ═══════════════════════════════════════════════════════════════════════════
# T8: utility functions
# ═══════════════════════════════════════════════════════════════════════════

def test_list_routes_for_known_caller():
    from services.llm_model_router import list_routes_for_caller

    sales_routes = list_routes_for_caller('sales_copy')
    assert 'gemma3:4b' in sales_routes
    assert 'llama3.1:8b' in sales_routes


def test_list_routes_for_unknown_caller():
    from services.llm_model_router import list_routes_for_caller

    assert list_routes_for_caller('nonexistent') == []


def test_all_callers_with_routes():
    from services.llm_model_router import all_callers_with_routes

    callers = all_callers_with_routes()
    expected = {'sales_copy', 'hermes_analyst', 'aider_heal',
                'openclaw_qa', 'ppt_vision', 'ea_engine'}
    assert expected.issubset(set(callers))