awoooi/apps/api/tests/test_p0_diagnose_routing.py
Your Name fb0c72db42
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 2m26s
feat(ai-router): overturn iron rule A2, DIAGNOSE primary switches to local-first Ollama
Commander's iron rule 2026-04-29: "Primarily prefer Ollama on host 111"
+ feedback_ai_autonomous_direction.md: rely mainly on local, free LLMs
+ feedback_ollama_111_only.md: the only Ollama host is 111

## Factual basis for overturning A2 (2026-04-27, INC-20260425)

**Old fact**: Ollama = CPU-only deepseek-r1:14b @ 238s (unusable)
**New fact**: prod Ollama 111 = M1 Pro Apple Silicon GPU + qwen2.5:7b-instruct,
           8.2GB VRAM fully loaded, ctx 32k, measured 0.54s on a "hi" prompt

**All cloud providers are down** (evidence from the 2026-04-29 prod log):
- OpenClaw 188:8088 → 500 Internal Server Error
- Gemini → 429 Too Many Requests (quota exhausted)
- Claude → 404 Not Found (model claude-3-haiku-20240307 expired)

**Without overturning A2 → 100% of incidents end in llm_failed → AI auto-remediation never starts**

## Scope of changes (minimal, safe, verifiable)

### ai_router.py
- `_diagnose_fallback_chain`: OLLAMA is now first (replaces the old "permanently excluded" comment)
  Order: [OLLAMA, OPENCLAW_NEMO, GEMINI, CLAUDE]
- `_intent_provider_overrides[DIAGNOSE]`: OPENCLAW_NEMO → OLLAMA
- `_full_fallback_chain` is untouched (avoids affecting RESTART/SCALE/CONFIG/DELETE)
- `_tool_calling_fallback_chain` is untouched
- complexity_map is untouched (critic M2 is deferred to a follow-up)
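
A minimal sketch of the routing change listed above. `Provider`, `Intent`, and `build_chain` are hypothetical stand-ins chosen for illustration; they are not the actual contents of ai_router.py, and only the provider order and the DIAGNOSE override come from this commit message:

```python
from enum import Enum


class Provider(Enum):  # hypothetical stand-in for AIProviderEnum
    OLLAMA = "ollama"
    OPENCLAW_NEMO = "openclaw_nemo"
    GEMINI = "gemini"
    CLAUDE = "claude"


class Intent(Enum):  # hypothetical stand-in for IntentType
    DIAGNOSE = "diagnose"
    RESTART = "restart"


# DIAGNOSE order from this commit: local Ollama first, cloud providers after.
DIAGNOSE_FALLBACK_CHAIN = [
    Provider.OLLAMA,
    Provider.OPENCLAW_NEMO,
    Provider.GEMINI,
    Provider.CLAUDE,
]

# Per-intent primary override: DIAGNOSE now points at OLLAMA.
INTENT_PROVIDER_OVERRIDES = {Intent.DIAGNOSE: Provider.OLLAMA}


def build_chain(intent, default_chain):
    """Return the provider order for an intent: override first, no duplicates."""
    chain = DIAGNOSE_FALLBACK_CHAIN if intent is Intent.DIAGNOSE else default_chain
    primary = INTENT_PROVIDER_OVERRIDES.get(intent)
    if primary is None:
        # Intents without an override keep their chain unchanged.
        return list(chain)
    return [primary] + [p for p in chain if p is not primary]
```

Passing the default chain in as a parameter mirrors the intent of leaving the other chains untouched: intents without an override get back exactly the order they were given.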

### openclaw.py
- Inject task_type="diagnose" into alert_context (critic C2, the true root cause)
- Fix the timeout alignment problem at ai_providers/ollama.py:77:
  - with task_type → OLLAMA_DIAGNOSE_TIMEOUT_SECONDS=200s
  - without → OPENCLAW_TIMEOUT=30s (not enough for qwen2.5:7b inference)
- This is the root cause of the latency_ms=120014 seen in the prod log
- Copy via dict(alert_context) so the original context is not polluted
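
A minimal sketch of the two behaviors listed above, assuming the timeout is chosen purely from the presence of `task_type`. The function names `with_diagnose_task_type` and `pick_timeout` are hypothetical; only the two constants and their values are taken from this commit message:

```python
OLLAMA_DIAGNOSE_TIMEOUT_SECONDS = 200  # value stated in the commit message
OPENCLAW_TIMEOUT = 30  # value stated in the commit message


def with_diagnose_task_type(alert_context):
    """Return a shallow copy of the context tagged as a diagnose task."""
    ctx = dict(alert_context)  # copy so the caller's dict is not mutated
    ctx["task_type"] = "diagnose"
    return ctx


def pick_timeout(context):
    """Hypothetical selection rule: any task_type gets the long timeout."""
    if context.get("task_type"):
        return OLLAMA_DIAGNOSE_TIMEOUT_SECONDS
    return OPENCLAW_TIMEOUT
```

Note that `dict(...)` makes a shallow copy: the top-level keys are independent, but nested values are still shared with the original context.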

## Regression tests updated in step (6)

All A2 iron-rule gatekeeper tests now reflect the new rule:

- test_p0_diagnose_routing.py::test_diagnose_override_is_ollama
  (formerly test_diagnose_override_is_openclaw_nemo)
- test_ai_router_diagnose_fallback.py::test_diagnose_fallback_chain_ollama_primary
  (formerly test_diagnose_fallback_chain_no_ollama)
- test_ai_router_diagnose_fallback.py::test_diagnose_route_primary_is_ollama
  (formerly test_diagnose_route_fallback_chain_excludes_ollama)
- test_ai_router_diagnose_fallback.py::test_diagnose_route_sync_primary_is_ollama
  (formerly test_diagnose_route_sync_fallback_chain_excludes_ollama)
- test_ai_router_diagnose_fallback.py::test_build_fallback_chain_for_intent_diagnose_with_ollama_primary
  (formerly test_build_fallback_chain_for_intent_diagnose_no_ollama)
- test_ai_router_failover_integration.py::test_router_uses_failover_for_diagnose_ollama_primary
  (formerly test_router_does_not_use_failover_for_openclaw_nemo)

Every test docstring records the historical context and the reason for the overturn.

## Verification

- All 1608 unit tests pass
- All 16 LLM-path tests pass (including the 6 updated A2 gatekeeper tests)
- complexity_scorer / failover_manager / intent_classifier are unaffected

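## Expected prod behavior (to verify after deploy)

incident arrives → DIAGNOSE intent → primary OLLAMA (qwen2.5:7b on M1 Pro GPU)
  fallback only on failure → OpenClaw 188 → Gemini → Claude
  Ollama uses a 200s timeout (the previous 30s was not enough)
  → AI auto-remediation can finally start, no more 100% llm_failed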
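
The flow above can be sketched as a first-success loop over the provider order. This is an illustration under stated assumptions, not the router's actual executor: `run_with_fallback` and `demo_call` are hypothetical names, and providers are represented as plain strings:

```python
DIAGNOSE_CHAIN = ["ollama", "openclaw_nemo", "gemini", "claude"]


def run_with_fallback(chain, call):
    """Try each provider in order; return (provider, result) for the first success.

    `call(provider)` is a hypothetical callable that returns a result or
    raises on failure (HTTP 5xx, 429, timeout, and so on).
    """
    errors = {}
    for provider in chain:
        try:
            return provider, call(provider)
        except Exception as exc:  # broad on purpose in this sketch
            errors[provider] = repr(exc)
    # Every provider failed: this is the llm_failed case described above.
    raise RuntimeError(f"llm_failed: all providers failed: {errors}")


def demo_call(provider):
    """Hypothetical stand-in: only the local provider answers."""
    if provider == "ollama":
        return "diagnosis"
    raise ConnectionError(f"{provider} unavailable")
```

With `demo_call`, `run_with_fallback(DIAGNOSE_CHAIN, demo_call)` stops at the first entry and never contacts the cloud providers, which is the behavior this commit expects once Ollama is primary.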

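## Known debt (to address later)

- models.json:21 ollama.default is still deepseek-r1:14b (critic C1, though prod already auto-routes to the model that is actually loaded)
- complexity 4/5 is still hard-coded to gemini/claude (critic M2)
- The Gemini API key appears in plaintext in the prod log (needs rotation + sanitizing)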

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 11:39:36 +08:00

"""
P0 DIAGNOSE Routing Tests
==========================
Tests AIRouter DIAGNOSE routing and require_local isolation behavior.
Created: 2026-04-04 (Taipei time zone)
Created by: Claude Code (P0 DIAGNOSE Privacy-First)
2026-04-05 v4.3: Ollama CPU-only at 238s is unusable; DIAGNOSE goes uniformly through NIM (_full_fallback_chain)
"""
import os

# Must be set before any src.* module is imported.
os.environ.setdefault("MOCK_MODE", "true")

import pytest
from unittest.mock import AsyncMock, MagicMock, patch


class TestNemotronPerTaskTimeout:
    """Nemotron supports a per-task timeout."""

    @pytest.mark.asyncio
    async def test_diagnose_uses_diagnose_timeout(self):
        """A DIAGNOSE context should use NEMOTRON_DIAGNOSE_TIMEOUT_SECONDS."""
        from src.services.ai_providers.nemotron import NemotronProvider

        provider = NemotronProvider()
        # Build a mock nvidia provider
        mock_nvidia = MagicMock()
        mock_result = MagicMock()
        mock_result.tool_calls = []
        mock_nvidia.tool_call = AsyncMock(return_value=mock_result)
        with patch.object(provider, '_get_nvidia', return_value=mock_nvidia):
            result = await provider.analyze(
                prompt="test diagnosis",
                context={"task_type": "diagnose"},
            )
        assert result.success is True
        mock_nvidia.tool_call.assert_called_once()


class TestLocalFallbackChain:
    """With require_local=True the privacy filter applies: cloud providers are not called; if everything fails → REJECT."""

    @pytest.mark.asyncio
    async def test_require_local_skips_cloud_providers(self):
        """With require_local=True, cloud providers are not called."""
        from src.services.ai_router import AIRouterExecutor, AIProviderRegistry
        from src.services.ai_providers.interfaces import AIResult

        registry = AIProviderRegistry()
        # Mock: Ollama succeeds
        mock_ollama = AsyncMock()
        mock_ollama.name = "ollama"
        mock_ollama.privacy_level = "local"
        mock_ollama.is_enabled = True
        mock_ollama.capabilities = {"rca", "chat"}
        mock_ollama.analyze = AsyncMock(return_value=AIResult(
            raw_response="local diagnosis result",
            success=True,
            provider="ollama",
        ))
        mock_ollama.health_check = AsyncMock(return_value=True)
        # Mock: Gemini (must not be called)
        mock_gemini = AsyncMock()
        mock_gemini.name = "gemini"
        mock_gemini.privacy_level = "cloud"
        mock_gemini.is_enabled = True
        mock_gemini.analyze = AsyncMock(return_value=AIResult(
            raw_response="cloud result",
            success=True,
            provider="gemini",
        ))
        registry._providers = {
            "ollama": mock_ollama,
            "gemini": mock_gemini,
        }
        executor = AIRouterExecutor(registry)
        # Temporarily disable MOCK_MODE to exercise the real execution path
        with patch("src.services.ai_router._settings") as mock_settings:
            mock_settings.MOCK_MODE = False
            result = await executor.execute(
                prompt="diagnose this problem",
                provider_order=["ollama", "gemini"],
                require_local=True,
            )
        assert result.success is True
        assert result.provider == "ollama"
        mock_gemini.analyze.assert_not_called()

    @pytest.mark.asyncio
    async def test_require_local_all_fail_returns_reject(self):
        """require_local=True and every local provider fails → return an explicit error."""
        from src.services.ai_router import AIRouterExecutor, AIProviderRegistry
        from src.services.ai_providers.interfaces import AIResult

        registry = AIProviderRegistry()
        # Mock: Ollama fails
        mock_ollama = AsyncMock()
        mock_ollama.name = "ollama"
        mock_ollama.privacy_level = "local"
        mock_ollama.is_enabled = True
        mock_ollama.capabilities = {"rca", "chat"}
        mock_ollama.analyze = AsyncMock(return_value=AIResult(
            raw_response="",
            success=False,
            provider="ollama",
            error="timeout",
        ))
        mock_ollama.health_check = AsyncMock(return_value=False)
        registry._providers = {
            "ollama": mock_ollama,
        }
        executor = AIRouterExecutor(registry)
        # Temporarily disable MOCK_MODE and let the telegram import fail (it does not affect the main flow)
        with patch("src.services.ai_router._settings") as mock_settings:
            mock_settings.MOCK_MODE = False
            result = await executor.execute(
                prompt="diagnose this problem",
                provider_order=["ollama"],
                require_local=True,
            )
        assert result.success is False
        assert result.error == "local_providers_unavailable"


class TestDiagnoseIntentOverride:
    """Verifies the DIAGNOSE intent routing configuration."""

    def test_diagnose_override_is_ollama(self):
        """_intent_provider_overrides[DIAGNOSE] should be OLLAMA (2026-04-29: A2 overturned).

        Historical context:
        - 2026-04-12 ogt: NEMOTRON routing paused; NIM tool_call has no confidence field
        - 2026-04-16 ogt: restored DIAGNOSE → OPENCLAW_NEMO; None-complexity routing fell into Rule 6
          → Ollama deepseek-r1:14b on CPU needs 238s → timeout → degraded → everything "pending analysis"
        - 2026-04-27 Claude Sonnet 4.6 A2: established "Ollama permanently excluded from the DIAGNOSE chain"

        2026-04-29 A2 iron rule overturned:
        - Commander's directive: "Primarily prefer Ollama on host 111"
        - Commander's iron rule feedback_ai_autonomous_direction.md: rely mainly on local, free LLMs
        - Commander's iron rule feedback_ollama_111_only.md: the only Ollama host is 111
        - New fact: prod Ollama 111 = M1 Pro Apple Silicon GPU + qwen2.5:7b-instruct,
          8.2GB VRAM fully loaded, measured 0.54s on a "hi" prompt
        - All cloud providers are down: OpenClaw 500 / Gemini 429 / Claude 404
        - Supporting change: openclaw.py injects task_type="diagnose" → Ollama uses the 200s timeout
        """
        from src.services.ai_router import AIRouter, AIProviderEnum
        from src.services.intent_classifier import IntentType

        router = AIRouter()
        override = router._intent_provider_overrides.get(IntentType.DIAGNOSE)
        assert override is AIProviderEnum.OLLAMA, (
            f"Commander's iron rule: DIAGNOSE should be OLLAMA (local first), got {override}"
        )