From 390c32b05de92951729350d53605ca5482bf5410 Mon Sep 17 00:00:00 2001
From: OoO <ooo@MacBook-Pro.local>
Date: Mon, 4 May 2026 10:54:12 +0800
Subject: [PATCH] =?UTF-8?q?feat(p21):=20Caller=20=C3=97=20Context=20?=
 =?UTF-8?q?=E5=8B=95=E6=85=8B=20Model=20Router=20+=20ADR-034?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Operation Ollama-First v5.0 / Phase 21: dynamic routing governance

services/llm_model_router.py (149 lines)
- Pure rule engine with zero LLM cost (Python lambda predicates)
- 6 callers × 12 routing rules:
  • sales_copy: short copy (< 100 chars) → gemma3:4b / long copy → llama3.1:8b
  • hermes_analyst: gap > 20% or sales delta < -50% → qwen3:14b / default hermes3
  • aider_heal: diff > 200 lines → qwen2.5-coder:32b / default 7b
  • openclaw_qa: query > 200 chars or multi_turn → qwen3:14b / default qwen2.5:7b-instruct
  • ppt_vision: minicpm unhealthy → llava / default minicpm-v
  • ea_engine: require_chain_of_thought → deepseek-r1:14b / default Gemini
- Feature flag MODEL_ROUTER_ENABLED defaults to OFF (backward compatible)
- Fail-safe: a predicate that raises is skipped and the next rule is evaluated
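
The first-match, fail-safe evaluation described above can be sketched as follows. This is a minimal illustration, not the module itself: `first_match` and `DEMO_RULES` are hypothetical names, and the two demo rules only mirror the aider_heal entry listed above.

```python
import logging
from typing import Any, Callable, Dict, List, Optional, Tuple

logger = logging.getLogger(__name__)

# A rule pairs a predicate over the context with a model name (None = "use default").
Rule = Tuple[Callable[[Dict[str, Any]], bool], Optional[str]]

# Hypothetical rule table mirroring the aider_heal entry above.
DEMO_RULES: List[Rule] = [
    (lambda ctx: int(ctx.get("diff_lines", 0) or 0) > 200, "qwen2.5-coder:32b"),
    (lambda ctx: True, "qwen2.5-coder:7b"),
]


def first_match(rules: List[Rule], ctx: Dict[str, Any],
                default: Optional[str] = None) -> Optional[str]:
    """Return the model of the first rule whose predicate is true."""
    for predicate, model in rules:
        try:
            if predicate(ctx):
                # A matching rule with model None defers to the caller's default.
                return model if model is not None else default
        except Exception as exc:
            # Fail-safe: a broken predicate is logged and skipped.
            logger.warning("rule eval failed: %s", exc)
    return default


print(first_match(DEMO_RULES, {"diff_lines": 350}))
print(first_match(DEMO_RULES, {"diff_lines": "abc"}))
```

Because the last rule of each caller is a catch-all, a predicate that raises on bad input degrades to that caller's baseline model instead of surfacing an error.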

tests/test_llm_model_router.py (18 tests, all passing)
- T1 flag OFF: no routing
- T2 sales_copy short/long copy routing
- T3 hermes simple/complex SKU
- T4 aider_heal simple fix/refactor
- T5 ppt_vision primary/fallback
- T6 ea_engine CoT routing
- T7 predicate exception tolerance
- T8 utility functions

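ADR-034: Caller × Context dynamic Model Router
- Routing rule table for the 6 callers
- 4 rejected alternatives (LLM-based / hardcode / config file / blanket upgrade)
- Phase 21.2-21.6 staged migration plan
- V1-V3 verification steps (including SQL to observe the model distribution after caller integration)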

Related: the Primary and Secondary GCP hosts both carry the same 10 models (67GB, symmetric),
covering every routing rule; callers can be integrated in stages (Phase 21.2-21.5).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 docs/adr/ADR-034-dynamic-model-router.md | 176 ++++++++++++++++
 docs/adr/README.md                       |   1 +
 services/llm_model_router.py             | 149 +++++++++++++
 tests/test_llm_model_router.py           | 254 +++++++++++++++++++++++
 4 files changed, 580 insertions(+)
 create mode 100644 docs/adr/ADR-034-dynamic-model-router.md
 create mode 100644 services/llm_model_router.py
 create mode 100644 tests/test_llm_model_router.py

diff --git a/docs/adr/ADR-034-dynamic-model-router.md b/docs/adr/ADR-034-dynamic-model-router.md
new file mode 100644
index 0000000..e2c6c76
--- /dev/null
+++ b/docs/adr/ADR-034-dynamic-model-router.md
@@ -0,0 +1,176 @@
+# ADR-034: Caller × Context Dynamic Model Router
+
+- **Status**: Accepted (becomes Active once integrated into callers)
+- **Date**: 2026-05-04
+- **Decision Maker**: Commander
+- **Author**: Operation Ollama-First v5.0 / Phase 21
+- **Related**: ADR-028 (LLM routing), ADR-029 (dual-tower division of labor), ADR-030 (multi-vendor)
+
+---
+
+## Context
+
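+Campaign v5.0 has brought the Primary and Secondary GCP hosts to 10 Ollama models each (~67GB). Most existing callers, however, hardcode a single model (sales_copy always uses `llama3.1:8b`, for example) and cannot pick the best model for the context at hand.
+
+**Pain points**:
+1. **Wasted resources**: sales_copy uses the 8B model even for short copy (< 100 characters) → should go to `gemma3:4b` (4GB vs 5GB, roughly 50% lower latency expected)
+2. **Quality bottleneck**: Hermes price analysis still uses `hermes3:latest` (8B) on complex SKUs (gap > 20%) → should upgrade to `qwen3:14b`
+3. **Refactoring gap**: `qwen2.5-coder:7b` is not enough for large AiderHeal refactors (diff > 200 lines) → should upgrade to `qwen2.5-coder:32b`
+4. **Reasoning gap**: EA HITL has no deepseek-r1 path when it needs chain-of-thought
+
+**Prerequisites already in place**:
+- Primary and Secondary each carry the same 10 models
+- `services/llm_caller_registry.py` centralizes 30+ callers
+- `services/cost_throttle_service.py` guards cost
+
+This ADR covers the design of the **dynamic routing rules** only.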
+
+---
+
+## Decision
+
+### 1. Pure rule engine, zero LLM cost
+
+```python
+# services/llm_model_router.py
+ROUTING_RULES: Dict[str, list] = {
+    'sales_copy': [
+        (lambda ctx: 0 < int(ctx.get('expected_length', 0) or 0) < 100, 'gemma3:4b'),
+        (lambda ctx: True, 'llama3.1:8b'),
+    ],
+    'hermes_analyst': [
+        (lambda ctx: float(ctx.get('max_gap_pct', 0) or 0) > 20 or float(ctx.get('min_sales_delta', 0) or 0) < -50,
+         'qwen3:14b'),
+        (lambda ctx: True,
+         'hermes3:latest'),
+    ],
+    # ... 12 rules across 6 callers
+}
+```
+
+### 2. Routing rule table
+
+| Caller | Context trigger | Model on trigger | Default model |
+|---|---|---|---|
+| `sales_copy` | 0 < expected_length < 100 characters | `gemma3:4b` | `llama3.1:8b` |
+| `hermes_analyst` | max_gap_pct > 20% or min_sales_delta < -50% | `qwen3:14b` | `hermes3:latest` |
+| `aider_heal` | diff_lines > 200 | `qwen2.5-coder:32b` | `qwen2.5-coder:7b` |
+| `openclaw_qa` | query_length > 200 or multi_turn | `qwen3:14b` | `qwen2.5:7b-instruct` |
+| `ppt_vision` | minicpm_unhealthy | `llava:latest` | `minicpm-v:latest` |
+| `ea_engine` | require_chain_of_thought | `deepseek-r1:14b` | (falls back to the caller's default, Gemini) |
+
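+### 3. Feature-flag gradual rollout
+
+- `MODEL_ROUTER_ENABLED` defaults to OFF
+- Callers invoke `select_model(caller, context, default='<existing model>')`
+- Flag OFF → return default immediately (no rules evaluated) → behavior identical to before this change
+
+### 4. Fail-safe
+
+- A predicate raises → log a warning and skip to the next rule
+- Caller not in ROUTING_RULES → return default
+- No rule matches → return default
+
+### 5. Integration (staged rollout recommended)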
+
+```python
+# Caller example (e.g. ollama_service.generate_sales_copy):
+from services.llm_model_router import select_model
+
+def generate_sales_copy(self, product_name, ...):
+    model = select_model(
+        caller='sales_copy',
+        context={'expected_length': len(product_name) * 3},
+        default='llama3.1:8b',
+    )
+    return self.generate(prompt=..., model=model, ...)
+```
+
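+**Staged migration**:
+- Phase 21.1: model_router service + tests land (this commit) ✅
+- Phase 21.2: sales_copy integration (low-risk demonstration) ⏳
+- Phase 21.3: aider_heal integration (medium risk; needs diff_lines)
+- Phase 21.4: hermes_analyst integration (high risk; touches the main tactical flow)
+- Phase 21.5-21.6: remaining callers migrated (openclaw_qa / ppt_vision / ea_engine), then MODEL_ROUTER_ENABLED defaults to ON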
+
+---
+
+## Alternatives Considered
+
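+| Option | Reason rejected |
+|---|---|
+| **A. LLM-based routing** (an LLM decides which model to use) | Spends LLM budget to save LLM budget, and adds latency |
+| **B. Each caller hardcodes multiple models** (not centralized) | Rules drift; no single source of truth |
+| **C. Upgrade everything to a large model** (e.g. qwen3:14b everywhere) | Wastes resources; short copy does not need 14B |
+| **D. YAML/JSON config file** (read at runtime) | Over-engineering; Python lambdas are flexible enough |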
+
+---
+
+## Consequences
+
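+### Positive (5)
+1. **Resource savings**: short sales_copy uses the 4GB gemma3 instead of the 5GB llama3.1, with roughly 50% lower latency expected
+2. **Quality gains**: complex cases are upgraded to larger models automatically (hermes 14B / aider 32B)
+3. **Zero LLM cost**: rules are plain Python lambdas
+4. **Fail-safe**: a failing rule never blocks the main flow
+5. **Centralized governance**: a rule change is a PR against `llm_model_router.py` only; callers stay untouched
+
+### Negative (3)
+1. **Rule maintenance cost**: new callers or new context conditions require rule updates (which is exactly what this ADR sets out to govern)
+2. **Context burden**: a caller must compute its context (e.g. diff_lines) before calling the router
+3. **Debugging complexity**: finding out which route was taken means reading logger.debug output
+
+### Risks (3)
+1. **Mis-tuned rules**: the thresholds (20% / 200 lines) may be off → mitigated by gradual-rollout observation in Phase 21.2-21.5
+2. **Model missing on a GCP host**: select_model returns a model that has not been pulled → mitigated by pulling models up front (the symmetric 10-model set is done)
+3. **Incomplete caller integration**: some callers still hardcode models → mitigated by a documented migration plan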
+
+---
+
+## Verification
+
+### V1: unit test
+```bash
+pytest tests/test_llm_model_router.py -v
+# expect all 18 tests to pass
+```
+
+### V2: observe ai_calls after caller integration
+```sql
+SELECT model, COUNT(*), AVG(duration_ms)
+FROM ai_calls
+WHERE caller = 'sales_copy' AND called_at > NOW() - INTERVAL '7 days'
+GROUP BY model;
+-- Expected: gemma3:4b (short copy) at 60%+, llama3.1:8b (long copy) at 40% or less
+-- Average duration: gemma3 about 50% below llama3.1
+```
+
+### V3: cost throttle integration
+```python
+# Planned for Phase 22: switch to a cheaper model automatically when cost_throttle fires
+# Example: claude throttled → select_model falls back to the default Gemini Flash
+```
+
+---
+
+## Migration Plan
+
+| Phase | Work | Status |
+|---|---|---|
+| 21.1 | services/llm_model_router.py + 18 tests | ✅ this commit |
+| 21.2 | sales_copy integration (add select_model to generate_sales_copy) | ⏳ |
+| 21.3 | aider_heal integration (needs diff_lines context) | ⏳ |
+| 21.4 | hermes_analyst integration (needs max_gap_pct context) | ⏳ |
+| 21.5 | openclaw_qa / ppt_vision / ea_engine | ⏳ |
+| 21.6 | MODEL_ROUTER_ENABLED defaults to ON (after 1 week of observation) | ⏳ |
+
+---
+
+## References
+
+- `services/llm_model_router.py` (this commit)
+- `tests/test_llm_model_router.py` (18 tests)
+- `docs/llm_model_full_evaluation_20260504.md`, routing optimization recommendations
+- ADR-028 (unified LLM routing guidelines)
+- ADR-029 (Hermes-First dual-tower division of labor)
+- ADR-030 (Frontier multi-vendor strategy)
diff --git a/docs/adr/README.md b/docs/adr/README.md
index 3a08672..80fe8c8 100644
--- a/docs/adr/README.md
+++ b/docs/adr/README.md
@@ -55,6 +55,7 @@
 | [031](ADR-031-mcp-self-hosted-stack.md) | MCP 自建 Stack（postgres + omnisearch + firecrawl + filesystem；含 Owen 護欄 #2 Firecrawl 2g 限制） | Accepted | 2026-05-04 |
 | [032](ADR-032-rag-autonomous-learning-loop.md) | RAG 自主學習迴圈 — Distiller + PromotionGate + 反饋環（Phase 11） | Accepted | 2026-05-03 |
 | [033](ADR-033-rag-three-guardrails.md) | RAG 治理三護欄 — Promotion Gate / Firecrawl 資源 / BGE-M3 一致性（Owen v5.0 鐵律） | Accepted | 2026-05-03 |
+| [034](ADR-034-dynamic-model-router.md) | Caller × Context dynamic Model Router (short copy → gemma3 / complex SKU → qwen3:14b / refactor → coder:32b) | Accepted | 2026-05-04 |
 
 ## 規範
 
diff --git a/services/llm_model_router.py b/services/llm_model_router.py
new file mode 100644
index 0000000..e0cf064
--- /dev/null
+++ b/services/llm_model_router.py
@@ -0,0 +1,149 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+services/llm_model_router.py
+Operation Ollama-First v5.0 / Phase 21: Caller × Context dynamic Model Router
+
+Design principles:
+- Each caller picks the best model for its context at call time (same provider)
+  e.g. sales_copy short copy → gemma3:4b / long copy → llama3.1:8b / Hermes complex SKU → qwen3:14b
+- Pure rule engine, zero LLM cost
+- Callers obtain a model name via select_model(caller, context)
+- Feature flag MODEL_ROUTER_ENABLED defaults to OFF (existing defaults are unaffected)
+- Fallback: no rule matches → return the caller's default model (backward compatible)
+
+Implements the ADR-028 caller whitelist and ADR-034 dynamic routing.
+The GCP Primary and Secondary hosts already carry the 10 models these rules need.
+"""
+
+from __future__ import annotations
+import os
+import logging
+from typing import Dict, Any, Optional, Callable
+
+logger = logging.getLogger(__name__)
+
+
+def is_model_router_enabled() -> bool:
+    """Runtime check (avoids freezing the flag at import time)."""
+    return os.getenv('MODEL_ROUTER_ENABLED', 'false').strip().lower() in ('true', '1', 'yes', 'on')
+
+
+# ─────────────────────────────────────────────────────────────────────────────
+# Routing rules (per the ADR-034 spec)
+# ─────────────────────────────────────────────────────────────────────────────
+# Structure: caller → list of (predicate(context), model_name) tuples
+# The first predicate that returns True wins; if none match, select_model returns the caller's default
+# ─────────────────────────────────────────────────────────────────────────────
+
+ROUTING_RULES: Dict[str, list] = {
+    # Sales Copy: short copy goes to gemma3:4b (light and fast), long copy to llama3.1:8b
+    'sales_copy': [
+        (lambda ctx: int(ctx.get('expected_length', 0) or 0) > 0
+                     and int(ctx.get('expected_length', 0)) < 100,
+         'gemma3:4b'),
+        (lambda ctx: True,  # default
+         'llama3.1:8b'),
+    ],
+
+    # Hermes price analysis: simple comparisons use hermes3; complex analysis (gap > 20% or a sharp sales drop) upgrades to qwen3:14b
+    'hermes_analyst': [
+        (lambda ctx: float(ctx.get('max_gap_pct', 0) or 0) > 20
+                     or float(ctx.get('min_sales_delta', 0) or 0) < -50,
+         'qwen3:14b'),
+        (lambda ctx: True,
+         'hermes3:latest'),
+    ],
+
+    # AiderHeal: simple syntax fixes use qwen2.5-coder:7b; refactor-scale changes (diff > 200 lines) upgrade to 32b
+    'aider_heal': [
+        (lambda ctx: int(ctx.get('diff_lines', 0) or 0) > 200,
+         'qwen2.5-coder:32b'),
+        (lambda ctx: True,
+         'qwen2.5-coder:7b'),
+    ],
+
+    # OpenClaw Q&A: simple questions use qwen2.5:7b-instruct, complex ones qwen3:14b
+    'openclaw_qa': [
+        (lambda ctx: int(ctx.get('query_length', 0) or 0) > 200
+                     or bool(ctx.get('multi_turn', False)),
+         'qwen3:14b'),
+        (lambda ctx: True,
+         'qwen2.5:7b-instruct'),
+    ],
+
+    # PPT vision: minicpm-v is primary; switch to llava when the context flags minicpm as unhealthy
+    'ppt_vision': [
+        (lambda ctx: bool(ctx.get('minicpm_unhealthy', False)),
+         'llava:latest'),
+        (lambda ctx: True,
+         'minicpm-v:latest'),
+    ],
+
+    # Reasoning-heavy scenarios (EA HITL strategic decisions; not enabled yet, reserved)
+    'ea_engine': [
+        (lambda ctx: bool(ctx.get('require_chain_of_thought', False)),
+         'deepseek-r1:14b'),
+        (lambda ctx: True,
+         None),  # None → caller uses its default (gemini-2.0-flash)
+    ],
+}
+
+
+def select_model(
+    caller: str,
+    context: Optional[Dict[str, Any]] = None,
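+    default: Optional[str] = None,
+) -> Optional[str]:
+    """Main entry point: pick a model from caller × context.
+
+    Args:
+        caller: routed only if it is a key in ROUTING_RULES; otherwise default is returned
+        context: inputs to the routing decision (e.g. expected_length / diff_lines / max_gap_pct)
+        default: returned when the caller has no rules or no rule matches
+
+    Returns:
+        A model name, or default (None if no default was passed, meaning the caller keeps its existing model)
+
+    When the flag is OFF, default is returned without evaluating any rule (backward compatible).
+    """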
+    if not is_model_router_enabled():
+        return default
+
+    if caller not in ROUTING_RULES:
+        return default
+
+    ctx = context or {}
+    for predicate, model_name in ROUTING_RULES[caller]:
+        try:
+            if predicate(ctx):
+                if model_name is None:
+                    return default  # rule matched but asks for the caller's default
+                logger.debug("[ModelRouter] %s ctx=%s → %s", caller, ctx, model_name)
+                return model_name
+        except Exception as exc:
+            logger.warning("[ModelRouter] %s rule eval failed: %s", caller, exc)
+            continue
+
+    # no rule matched → default
+    return default
+
+
+def list_routes_for_caller(caller: str) -> list:
+    """Debugging aid: list the models in a caller's routing rules."""
+    rules = ROUTING_RULES.get(caller, [])
+    return [model for _, model in rules]
+
+
+def all_callers_with_routes() -> list:
+    """All callers that have dynamic routing rules."""
+    return list(ROUTING_RULES.keys())
+
+
+__all__ = [
+    'select_model',
+    'is_model_router_enabled',
+    'list_routes_for_caller',
+    'all_callers_with_routes',
+    'ROUTING_RULES',
+]
diff --git a/tests/test_llm_model_router.py b/tests/test_llm_model_router.py
new file mode 100644
index 0000000..039cfe9
--- /dev/null
+++ b/tests/test_llm_model_router.py
@@ -0,0 +1,254 @@
+"""
+tests/test_llm_model_router.py
+─────────────────────────────────────────────────────────────────
+Operation Ollama-First v5.0 / Phase 21: Caller × Context dynamic routing tests
+"""
+
+import pytest
+
+
+@pytest.fixture(autouse=True)
+def _reset_env(monkeypatch):
+    monkeypatch.delenv('MODEL_ROUTER_ENABLED', raising=False)
+    yield
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# T1: no routing when the feature flag is OFF (backward compatible)
+# ═══════════════════════════════════════════════════════════════════════════
+
+def test_flag_off_returns_default():
+    from services.llm_model_router import select_model
+
+    # flag OFF returns default immediately (no rules evaluated)
+    result = select_model(
+        caller='sales_copy',
+        context={'expected_length': 50},
+        default='llama3.1:8b',
+    )
+    assert result == 'llama3.1:8b'
+
+
+def test_flag_off_unknown_caller_returns_default():
+    from services.llm_model_router import select_model
+
+    result = select_model(caller='nonexistent', default='hermes3:latest')
+    assert result == 'hermes3:latest'
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# T2: sales_copy routing (short vs long copy)
+# ═══════════════════════════════════════════════════════════════════════════
+
+def test_sales_copy_short_text_routes_to_gemma3(monkeypatch):
+    monkeypatch.setenv('MODEL_ROUTER_ENABLED', 'true')
+    from services.llm_model_router import select_model
+
+    # 50-character short copy → lightweight gemma3:4b
+    result = select_model(
+        caller='sales_copy',
+        context={'expected_length': 50},
+        default='llama3.1:8b',
+    )
+    assert result == 'gemma3:4b'
+
+
+def test_sales_copy_long_text_routes_to_llama(monkeypatch):
+    monkeypatch.setenv('MODEL_ROUTER_ENABLED', 'true')
+    from services.llm_model_router import select_model
+
+    result = select_model(
+        caller='sales_copy',
+        context={'expected_length': 200},
+        default='llama3.1:8b',
+    )
+    assert result == 'llama3.1:8b'
+
+
+def test_sales_copy_no_length_falls_back_to_default(monkeypatch):
+    monkeypatch.setenv('MODEL_ROUTER_ENABLED', 'true')
+    from services.llm_model_router import select_model
+
+    # no expected_length → rule 1 does not fire → rule 2 is always True → llama3.1:8b
+    result = select_model(
+        caller='sales_copy',
+        context={},
+        default='llama3.1:8b',
+    )
+    assert result == 'llama3.1:8b'
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# T3: Hermes price analysis (simple vs complex SKU)
+# ═══════════════════════════════════════════════════════════════════════════
+
+def test_hermes_simple_routes_to_hermes3(monkeypatch):
+    monkeypatch.setenv('MODEL_ROUTER_ENABLED', 'true')
+    from services.llm_model_router import select_model
+
+    result = select_model(
+        caller='hermes_analyst',
+        context={'max_gap_pct': 5.2, 'min_sales_delta': -10.0},
+        default='hermes3:latest',
+    )
+    assert result == 'hermes3:latest'
+
+
+def test_hermes_high_gap_routes_to_qwen3(monkeypatch):
+    monkeypatch.setenv('MODEL_ROUTER_ENABLED', 'true')
+    from services.llm_model_router import select_model
+
+    # gap > 20% → upgrade to qwen3:14b
+    result = select_model(
+        caller='hermes_analyst',
+        context={'max_gap_pct': 25.0, 'min_sales_delta': -5.0},
+        default='hermes3:latest',
+    )
+    assert result == 'qwen3:14b'
+
+
+def test_hermes_sales_crash_routes_to_qwen3(monkeypatch):
+    monkeypatch.setenv('MODEL_ROUTER_ENABLED', 'true')
+    from services.llm_model_router import select_model
+
+    # sales delta < -50% → upgrade to qwen3:14b
+    result = select_model(
+        caller='hermes_analyst',
+        context={'max_gap_pct': 5.0, 'min_sales_delta': -60.0},
+        default='hermes3:latest',
+    )
+    assert result == 'qwen3:14b'
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# T4: AiderHeal (simple fix vs refactor)
+# ═══════════════════════════════════════════════════════════════════════════
+
+def test_aider_heal_small_diff_routes_to_7b(monkeypatch):
+    monkeypatch.setenv('MODEL_ROUTER_ENABLED', 'true')
+    from services.llm_model_router import select_model
+
+    result = select_model(
+        caller='aider_heal',
+        context={'diff_lines': 50},
+        default='qwen2.5-coder:7b',
+    )
+    assert result == 'qwen2.5-coder:7b'
+
+
+def test_aider_heal_large_refactor_routes_to_32b(monkeypatch):
+    monkeypatch.setenv('MODEL_ROUTER_ENABLED', 'true')
+    from services.llm_model_router import select_model
+
+    # diff > 200 lines → refactor-grade 32b
+    result = select_model(
+        caller='aider_heal',
+        context={'diff_lines': 350},
+        default='qwen2.5-coder:7b',
+    )
+    assert result == 'qwen2.5-coder:32b'
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# T5: PPT vision (primary/fallback)
+# ═══════════════════════════════════════════════════════════════════════════
+
+def test_ppt_vision_normal_routes_to_minicpm(monkeypatch):
+    monkeypatch.setenv('MODEL_ROUTER_ENABLED', 'true')
+    from services.llm_model_router import select_model
+
+    result = select_model(
+        caller='ppt_vision',
+        context={},
+        default='minicpm-v:latest',
+    )
+    assert result == 'minicpm-v:latest'
+
+
+def test_ppt_vision_minicpm_unhealthy_routes_to_llava(monkeypatch):
+    monkeypatch.setenv('MODEL_ROUTER_ENABLED', 'true')
+    from services.llm_model_router import select_model
+
+    result = select_model(
+        caller='ppt_vision',
+        context={'minicpm_unhealthy': True},
+        default='minicpm-v:latest',
+    )
+    assert result == 'llava:latest'
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# T6: EA engine (reasoning required → deepseek-r1)
+# ═══════════════════════════════════════════════════════════════════════════
+
+def test_ea_engine_no_cot_returns_default(monkeypatch):
+    """Rule matches but model_name=None → return default (caller keeps its existing Gemini)."""
+    monkeypatch.setenv('MODEL_ROUTER_ENABLED', 'true')
+    from services.llm_model_router import select_model
+
+    result = select_model(
+        caller='ea_engine',
+        context={'require_chain_of_thought': False},
+        default='gemini-2.0-flash',
+    )
+    assert result == 'gemini-2.0-flash'
+
+
+def test_ea_engine_cot_routes_to_deepseek_r1(monkeypatch):
+    monkeypatch.setenv('MODEL_ROUTER_ENABLED', 'true')
+    from services.llm_model_router import select_model
+
+    result = select_model(
+        caller='ea_engine',
+        context={'require_chain_of_thought': True},
+        default='gemini-2.0-flash',
+    )
+    assert result == 'deepseek-r1:14b'
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# T7: a failing rule does not block routing (fault tolerance)
+# ═══════════════════════════════════════════════════════════════════════════
+
+def test_predicate_exception_skipped_to_next_rule(monkeypatch):
+    """A predicate that raises should be skipped in favor of the next rule (never raised to the caller)."""
+    monkeypatch.setenv('MODEL_ROUTER_ENABLED', 'true')
+    from services.llm_model_router import select_model
+
+    # A non-numeric context value makes int() raise.
+    # Rule 1 expects expected_length to be int-convertible; 'abc' blows up,
+    # but the router should catch it and skip to rule 2 (always True → llama3.1:8b).
+    result = select_model(
+        caller='sales_copy',
+        context={'expected_length': 'abc'},  # deliberately bad value
+        default='llama3.1:8b',
+    )
+    # Outcome: rule 1 fails (int('abc') raises) → skipped → rule 2 matches → 'llama3.1:8b'
+    assert result == 'llama3.1:8b'
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# T8: utility functions
+# ═══════════════════════════════════════════════════════════════════════════
+
+def test_list_routes_for_known_caller():
+    from services.llm_model_router import list_routes_for_caller
+
+    sales_routes = list_routes_for_caller('sales_copy')
+    assert 'gemma3:4b' in sales_routes
+    assert 'llama3.1:8b' in sales_routes
+
+
+def test_list_routes_for_unknown_caller():
+    from services.llm_model_router import list_routes_for_caller
+
+    assert list_routes_for_caller('nonexistent') == []
+
+
+def test_all_callers_with_routes():
+    from services.llm_model_router import all_callers_with_routes
+
+    callers = all_callers_with_routes()
+    expected = {'sales_copy', 'hermes_analyst', 'aider_heal',
+                'openclaw_qa', 'ppt_vision', 'ea_engine'}
+    assert expected.issubset(set(callers))