Commander's iron rule, 2026-04-29: "Prefer Ollama on host 111 as the primary"
+ feedback_ai_autonomous_direction.md: rely mainly on local, free LLMs
+ feedback_ollama_111_only.md: the only Ollama host is 111
## Factual basis for overturning A2 (2026-04-27, INC-20260425)
**Old fact**: Ollama = CPU-only deepseek-r1:14b at 238s (unusable)
**New fact**: prod Ollama on 111 = M1 Pro Apple Silicon GPU + qwen2.5:7b-instruct
8.2GB fully loaded in VRAM, ctx 32k, measured 0.54s on a "hi" prompt
**All cloud providers are down** (evidence from the 2026-04-29 prod log):
- OpenClaw 188:8088 → 500 Internal Server Error
- Gemini → 429 Too Many Requests (quota exhausted)
- Claude → 404 Not Found (model claude-3-haiku-20240307 has expired)
**Without overturning A2 → 100% of incidents end in llm_failed → AI auto-remediation never starts**
## Scope of changes (minimal, safe, verifiable)
### ai_router.py
- `_diagnose_fallback_chain`: OLLAMA moves to first position (replacing the old "permanently excluded" comment)
  Order: [OLLAMA, OPENCLAW_NEMO, GEMINI, CLAUDE]
- `_intent_provider_overrides[DIAGNOSE]`: OPENCLAW_NEMO → OLLAMA
- _full_fallback_chain is untouched (to avoid affecting RESTART/SCALE/CONFIG/DELETE)
- _tool_calling_fallback_chain is untouched
- complexity_map is untouched (critic M2 is left for a follow-up)
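
The routing change above can be sketched roughly as follows. This is a minimal illustration, not the actual ai_router.py: the enum string values, the `build_chain_for_intent` helper, and the `default_chain` parameter are hypothetical, and only the DIAGNOSE order and the override target come from this commit message.

```python
from enum import Enum


class AIProviderEnum(Enum):
    # String values are placeholders chosen for this sketch.
    OLLAMA = "ollama"
    OPENCLAW_NEMO = "openclaw_nemo"
    GEMINI = "gemini"
    CLAUDE = "claude"


class IntentType(Enum):
    DIAGNOSE = "diagnose"
    RESTART = "restart"


# DIAGNOSE-only chain: local Ollama first, the other providers as fallbacks.
DIAGNOSE_FALLBACK_CHAIN = [
    AIProviderEnum.OLLAMA,
    AIProviderEnum.OPENCLAW_NEMO,
    AIProviderEnum.GEMINI,
    AIProviderEnum.CLAUDE,
]

# Per-intent primary provider; DIAGNOSE now points at OLLAMA.
INTENT_PROVIDER_OVERRIDES = {IntentType.DIAGNOSE: AIProviderEnum.OLLAMA}


def build_chain_for_intent(intent, default_chain):
    """Return the provider order for an intent (hypothetical helper).

    DIAGNOSE uses its dedicated chain; every other intent keeps the
    caller-supplied default chain, so those intents are unaffected.
    """
    base = DIAGNOSE_FALLBACK_CHAIN if intent is IntentType.DIAGNOSE else default_chain
    override = INTENT_PROVIDER_OVERRIDES.get(intent)
    if override is None:
        return list(base)
    # Put the override first and drop its duplicate from the rest of the chain.
    return [override] + [p for p in base if p is not override]


diagnose_chain = build_chain_for_intent(IntentType.DIAGNOSE, [AIProviderEnum.GEMINI])
restart_chain = build_chain_for_intent(IntentType.RESTART, [AIProviderEnum.GEMINI])
```

Keeping the DIAGNOSE chain separate from the default chain is what lets the change stay scoped to one intent.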
### openclaw.py
- Inject task_type="diagnose" into alert_context (the true root cause per critic C2)
- Fix the timeout alignment problem at ai_providers/ollama.py:77:
  - With task_type → OLLAMA_DIAGNOSE_TIMEOUT_SECONDS=200s
  - Without it → OPENCLAW_TIMEOUT=30s (not enough for qwen2.5:7b inference)
  - This is the root cause of latency_ms=120014 seen in the prod log
- Copy with dict(alert_context) so the original context is not polluted
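
A minimal sketch of the context injection and timeout selection described above. The two constants use the values quoted in this message; the helper names `with_diagnose_task_type` and `select_timeout` are hypothetical, and the check on `task_type == "diagnose"` is an assumption about how ollama.py keys the lookup.

```python
OLLAMA_DIAGNOSE_TIMEOUT_SECONDS = 200  # value quoted in this commit message
OPENCLAW_TIMEOUT = 30                  # value quoted in this commit message


def with_diagnose_task_type(alert_context):
    """Return a shallow copy of alert_context tagged as a diagnose task."""
    context = dict(alert_context)  # copy so the caller's dict is not mutated
    context["task_type"] = "diagnose"
    return context


def select_timeout(context):
    """Pick the longer diagnose budget only when the context is tagged."""
    if context and context.get("task_type") == "diagnose":
        return OLLAMA_DIAGNOSE_TIMEOUT_SECONDS
    return OPENCLAW_TIMEOUT


alert = {"alert_name": "HighCPU"}  # illustrative alert payload
tagged = with_diagnose_task_type(alert)
```

Because `dict(...)` makes only a shallow copy, nested values are still shared with the original; that is enough here since only a top-level key is added.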
## Regression tests updated in sync (6)
All A2 iron-rule guard tests now reflect the new rule:
- test_p0_diagnose_routing.py::test_diagnose_override_is_ollama
  (formerly test_diagnose_override_is_openclaw_nemo)
- test_ai_router_diagnose_fallback.py::test_diagnose_fallback_chain_ollama_primary
  (formerly test_diagnose_fallback_chain_no_ollama)
- test_ai_router_diagnose_fallback.py::test_diagnose_route_primary_is_ollama
  (formerly test_diagnose_route_fallback_chain_excludes_ollama)
- test_ai_router_diagnose_fallback.py::test_diagnose_route_sync_primary_is_ollama
  (formerly test_diagnose_route_sync_fallback_chain_excludes_ollama)
- test_ai_router_diagnose_fallback.py::test_build_fallback_chain_for_intent_diagnose_with_ollama_primary
  (formerly test_build_fallback_chain_for_intent_diagnose_no_ollama)
- test_ai_router_failover_integration.py::test_router_uses_failover_for_diagnose_ollama_primary
  (formerly test_router_does_not_use_failover_for_openclaw_nemo)
Every test docstring records the historical context and the reason for the overturn.
## Verification
- All 1608 unit tests pass
- All 16 LLM-path tests pass (including the 6 updated A2 guard tests)
- complexity_scorer / failover_manager / intent_classifier are unaffected
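## Expected prod behavior (to verify after deployment)
incident arrives → DIAGNOSE intent → primary OLLAMA (qwen2.5:7b on M1 Pro GPU)
fallback only on failure → OpenClaw 188 → Gemini → Claude
Ollama uses a 200s timeout (the previous 30s was not enough)
→ AI auto-remediation can finally start, no more 100% llm_failed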
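
The expected flow above (primary first, fallback only on failure, per-provider timeout) can be illustrated with a small asyncio sketch. `run_with_fallback` and the two stub providers are hypothetical and stand in for the real executor, whose interface is not shown in this message.

```python
import asyncio


async def run_with_fallback(prompt, providers, timeouts, default_timeout=30.0):
    """Try each (name, coroutine function) in order; return the first success."""
    errors = {}
    for name, call in providers:
        try:
            reply = await asyncio.wait_for(
                call(prompt), timeout=timeouts.get(name, default_timeout)
            )
            return name, reply
        except Exception as exc:  # includes timeouts raised by wait_for
            errors[name] = repr(exc)
    # Every provider failed: surface one combined error.
    raise RuntimeError(f"llm_failed: {errors}")


async def ollama_down(prompt):
    # Stub simulating an unreachable primary provider.
    raise ConnectionError("ollama unreachable")


async def openclaw_ok(prompt):
    # Stub simulating a healthy fallback provider.
    return f"diagnosis for: {prompt}"


result = asyncio.run(
    run_with_fallback(
        "disk full",
        [("ollama", ollama_down), ("openclaw_nemo", openclaw_ok)],
        {"ollama": 200.0},
    )
)
```

With a healthy primary the loop returns on the first iteration, so the fallback providers are never contacted.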
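## Known debt (to follow up)
- models.json:21 ollama.default is still deepseek-r1:14b (critic C1; prod already routes automatically to the model that is actually loaded)
- Complexity 4/5 is still hard-coded to gemini/claude (critic M2)
- The Gemini API key appears in plaintext in the prod log (needs rotation + sanitizing)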
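
For the log sanitizing item, one possible shape is a `logging.Filter` that redacts key-like values before a record is emitted. This is only a sketch: it assumes the key shows up as a `key=` or `api_key=` parameter or as an `x-goog-api-key` header value, which may not match the real prod log format, and `redact_secrets` and `RedactSecretsFilter` are hypothetical names.

```python
import logging
import re

# Assumed leak shapes: "...?key=VALUE", "api_key=VALUE", "x-goog-api-key: VALUE".
_SECRET_PATTERN = re.compile(
    r"\b((?:api[_-]?)?key=|x-goog-api-key[:=]\s*)([^\s&\"']+)",
    re.IGNORECASE,
)


def redact_secrets(text):
    """Replace the value after a key-like marker with a fixed placeholder."""
    return _SECRET_PATTERN.sub(lambda m: m.group(1) + "[REDACTED]", text)


class RedactSecretsFilter(logging.Filter):
    """Logging filter that rewrites each record's message in redacted form."""

    def filter(self, record):
        # Render the message with its args first, then redact the final text.
        record.msg = redact_secrets(record.getMessage())
        record.args = None
        return True  # never drop the record, only rewrite it


sample = redact_secrets("GET https://example.invalid/v1?key=FAKE123&alt=json")
```

A filter only protects future log lines; a key that has already been logged still needs to be rotated.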
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
171 lines
6.3 KiB
Python
"""
P0 DIAGNOSE Routing Tests
==========================
Tests AIRouter DIAGNOSE routing and require_local isolation behavior.

Created: 2026-04-04 (Taipei time zone)
Created by: Claude Code (P0 DIAGNOSE Privacy-First)
2026-04-05 v4.3: CPU-only Ollama took 238s and was unusable; DIAGNOSE was routed entirely through NIM (_full_fallback_chain)
"""

import os
os.environ.setdefault("MOCK_MODE", "true")

import pytest
from unittest.mock import AsyncMock, MagicMock, patch


class TestNemotronPerTaskTimeout:
    """Nemotron supports a per-task timeout."""

    @pytest.mark.asyncio
    async def test_diagnose_uses_diagnose_timeout(self):
        """A DIAGNOSE context should use NEMOTRON_DIAGNOSE_TIMEOUT_SECONDS."""
        from src.services.ai_providers.nemotron import NemotronProvider

        provider = NemotronProvider()

        # Build a mock nvidia provider
        mock_nvidia = MagicMock()
        mock_result = MagicMock()
        mock_result.tool_calls = []
        mock_nvidia.tool_call = AsyncMock(return_value=mock_result)

        with patch.object(provider, '_get_nvidia', return_value=mock_nvidia):
            result = await provider.analyze(
                prompt="Test diagnosis",
                context={"task_type": "diagnose"},
            )

        assert result.success is True
        mock_nvidia.tool_call.assert_called_once()


class TestLocalFallbackChain:
    """With require_local=True the privacy filter applies and cloud providers are not called; if everything fails → REJECT."""

    @pytest.mark.asyncio
    async def test_require_local_skips_cloud_providers(self):
        """With require_local=True, cloud providers are not called."""
        import os
        from src.services.ai_router import AIRouterExecutor, AIProviderRegistry
        from src.services.ai_providers.interfaces import AIResult

        registry = AIProviderRegistry()

        # Mock: Ollama succeeds
        mock_ollama = AsyncMock()
        mock_ollama.name = "ollama"
        mock_ollama.privacy_level = "local"
        mock_ollama.is_enabled = True
        mock_ollama.capabilities = {"rca", "chat"}
        mock_ollama.analyze = AsyncMock(return_value=AIResult(
            raw_response="Local diagnosis result",
            success=True,
            provider="ollama",
        ))
        mock_ollama.health_check = AsyncMock(return_value=True)

        # Mock: Gemini (must not be called)
        mock_gemini = AsyncMock()
        mock_gemini.name = "gemini"
        mock_gemini.privacy_level = "cloud"
        mock_gemini.is_enabled = True
        mock_gemini.analyze = AsyncMock(return_value=AIResult(
            raw_response="Cloud result",
            success=True,
            provider="gemini",
        ))

        registry._providers = {
            "ollama": mock_ollama,
            "gemini": mock_gemini,
        }

        executor = AIRouterExecutor(registry)

        # Temporarily turn off MOCK_MODE to exercise the real execution path
        with patch("src.services.ai_router._settings") as mock_settings:
            mock_settings.MOCK_MODE = False
            result = await executor.execute(
                prompt="Diagnose this problem",
                provider_order=["ollama", "gemini"],
                require_local=True,
            )

        assert result.success is True
        assert result.provider == "ollama"
        mock_gemini.analyze.assert_not_called()

    @pytest.mark.asyncio
    async def test_require_local_all_fail_returns_reject(self):
        """With require_local=True and every local provider failing → return an explicit error."""
        import os
        from src.services.ai_router import AIRouterExecutor, AIProviderRegistry
        from src.services.ai_providers.interfaces import AIResult

        registry = AIProviderRegistry()

        # Mock: Ollama fails
        mock_ollama = AsyncMock()
        mock_ollama.name = "ollama"
        mock_ollama.privacy_level = "local"
        mock_ollama.is_enabled = True
        mock_ollama.capabilities = {"rca", "chat"}
        mock_ollama.analyze = AsyncMock(return_value=AIResult(
            raw_response="",
            success=False,
            provider="ollama",
            error="timeout",
        ))
        mock_ollama.health_check = AsyncMock(return_value=False)

        registry._providers = {
            "ollama": mock_ollama,
        }

        executor = AIRouterExecutor(registry)

        # Temporarily turn off MOCK_MODE and let the telegram import fail (this does not affect the main flow)
        with patch("src.services.ai_router._settings") as mock_settings:
            mock_settings.MOCK_MODE = False
            result = await executor.execute(
                prompt="Diagnose this problem",
                provider_order=["ollama"],
                require_local=True,
            )

        assert result.success is False
        assert result.error == "local_providers_unavailable"


class TestDiagnoseIntentOverride:
    """Verifies the DIAGNOSE intent routing configuration."""

    def test_diagnose_override_is_ollama(self):
        """_intent_provider_overrides[DIAGNOSE] should be OLLAMA (A2 overturned on 2026-04-29).

        Historical context:
        - 2026-04-12 ogt: NEMOTRON routing suspended: NIM tool_call has no confidence field
        - 2026-04-16 ogt: restored DIAGNOSE → OPENCLAW_NEMO: None-complexity routing fell into Rule 6
          → Ollama deepseek-r1:14b on CPU needed 238s → timeout → degraded → everything "pending analysis"
        - 2026-04-27 Claude Sonnet 4.6 A2: established "Ollama is permanently excluded from the DIAGNOSE chain"

        2026-04-29, A2 iron rule overturned:
        - Commander's directive: "Prefer Ollama on host 111 as the primary"
        - Commander's iron rule feedback_ai_autonomous_direction.md: rely mainly on local, free LLMs
        - Commander's iron rule feedback_ollama_111_only.md: the only Ollama host is 111
        - New fact: prod Ollama on 111 = M1 Pro Apple Silicon GPU + qwen2.5:7b-instruct
          8.2GB fully loaded in VRAM, measured 0.54s on a "hi" prompt
        - All cloud providers are down: OpenClaw 500 / Gemini 429 / Claude 404
        - Companion change: openclaw.py injects task_type="diagnose" → Ollama uses a 200s timeout
        """
        from src.services.ai_router import AIRouter, AIProviderEnum
        from src.services.intent_classifier import IntentType

        router = AIRouter()
        override = router._intent_provider_overrides.get(IntentType.DIAGNOSE)
        assert override is AIProviderEnum.OLLAMA, (
            f"Commander's iron rule: DIAGNOSE should be OLLAMA (local first), got {override}"
        )