fix(ai-router): ADR-110 GCP three-tier failover: fix root cause of Ollama skipping straight to Gemini

Root cause (why every alert skipped straight to Gemini once Ollama failed):
AIProviderEnum was missing ollama_gcp_a / ollama_gcp_b / ollama_local
→ AIProviderEnum("ollama_gcp_a") raises ValueError
→ the fallback chain is emptied (every GCP endpoint conversion fails)
→ failover_fallback = [] (an empty list, not None)
→ fallback_chain is overwritten with [] instead of taking the Gemini fallback
→ AIProviderRegistry.get("ollama_gcp_a") returns None → not_registered → skipped
→ the whole Ollama chain (GCP-A → GCP-B → 111) is skipped, going straight to Gemini
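
The first steps of the chain above can be reproduced in isolation. This is a minimal sketch, not the project's code: ProviderEnum, convert_chain, pick_chain_buggy and pick_chain_safe are hypothetical names, and the "safe" variant is an optional extra guard that this commit does not add.

```python
from enum import Enum


class ProviderEnum(str, Enum):
    """Hypothetical stand-in for AIProviderEnum before the fix."""
    OLLAMA = "ollama"
    GEMINI = "gemini"


def convert_chain(names):
    """Convert name strings to enum members, dropping names the enum rejects."""
    converted = []
    for name in names:
        try:
            converted.append(ProviderEnum(name))
        except ValueError:
            # Enum lookup by value raises ValueError for an unknown value.
            continue
    return converted


def pick_chain_buggy(failover_fallback, default_chain):
    """Treats [] as a valid override, so the default chain is lost."""
    return default_chain if failover_fallback is None else failover_fallback


def pick_chain_safe(failover_fallback, default_chain):
    """Falls back to the default chain for both None and []."""
    return failover_fallback or default_chain


default_chain = [ProviderEnum.GEMINI]
failover_fallback = convert_chain(["ollama_gcp_a", "ollama_gcp_b", "ollama_local"])

buggy = pick_chain_buggy(failover_fallback, default_chain)
safe = pick_chain_safe(failover_fallback, default_chain)
```

All three names fail conversion, so failover_fallback ends up as an empty list; a check written against None then lets that empty list replace the default chain.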

Fix:
1. AIProviderEnum: add OLLAMA_GCP_A / OLLAMA_GCP_B / OLLAMA_LOCAL
2. PROVIDER_LATENCY_BUDGET: add entries for the three new enum members
3. ollama.py: add OllamaGcpBProvider (OLLAMA_SECONDARY_URL = GCP-B 34.21.145.224)
4. _init_registry(): register the missing names:
   - "ollama_gcp_a" alias → OllamaProvider (GCP-A, OLLAMA_URL)
   - OllamaGcpBProvider ("ollama_gcp_b", OLLAMA_SECONDARY_URL)
   - "ollama_local" alias → Ollama188Provider (111, OLLAMA_FALLBACK_URL)

Routing order after the fix: GCP-A → GCP-B → Local(111) → Gemini → Claude
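
The effect of the registry change on routing can be sketched as follows. This is an illustration under assumptions, not the real AIProviderRegistry or executor: Registry, Provider and route are hypothetical, and the sketch uses an explicit alias argument where the commit writes to registry._providers directly.

```python
class Provider:
    """Hypothetical provider with a name and a fixed health state."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy


class Registry:
    """Hypothetical stand-in for AIProviderRegistry."""

    def __init__(self):
        self._providers = {}

    def register(self, provider, alias=None):
        # Store under the alias when given, else under the provider's own name.
        self._providers[alias or provider.name] = provider

    def get(self, name):
        return self._providers.get(name)


def route(registry, order):
    """Walk the order and return (chosen_name, skipped_entries)."""
    skipped = []
    for name in order:
        provider = registry.get(name)
        if provider is None:
            skipped.append((name, "not_registered"))
            continue
        if not provider.healthy:
            skipped.append((name, "unhealthy"))
            continue
        return name, skipped
    return None, skipped


ORDER = ["ollama_gcp_a", "ollama_gcp_b", "ollama_local", "gemini", "claude"]

# Before the fix: only the legacy names exist in the registry.
before = Registry()
for p in (Provider("ollama"), Provider("ollama_188"), Provider("gemini"), Provider("claude")):
    before.register(p)
before_choice, before_skipped = route(before, ORDER)

# After the fix: aliases for GCP-A and Local, plus a new GCP-B provider.
after = Registry()
gcp_a = Provider("ollama")
local = Provider("ollama_188")
after.register(gcp_a)
after.register(gcp_a, alias="ollama_gcp_a")
after.register(Provider("ollama_gcp_b"))
after.register(local)
after.register(local, alias="ollama_local")
after.register(Provider("gemini"))
after.register(Provider("claude"))
after_choice, after_skipped = route(after, ORDER)
```

In the "before" registry all three Ollama names are skipped as not_registered and Gemini is chosen; in the "after" registry the first entry resolves, and the alias points at the same object as the legacy name.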

2026-05-04 ogt + Claude Sonnet 4.6

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Your Name
2026-05-04 13:49:32 +08:00
parent 14bf86a462
commit 2b2359e367
2 changed files with 68 additions and 5 deletions


@@ -389,3 +389,38 @@ class Ollama188Provider(OllamaProvider):
             return resp.status_code == 200
         except Exception:
             return False
+
+
+class OllamaGcpBProvider(OllamaProvider):
+    """
+    GCP-B Secondary Ollama Provider
+
+    Inherits from OllamaProvider and uses OLLAMA_SECONDARY_URL (34.21.145.224:11434).
+    ADR-110 three-tier failover: GCP-A → GCP-B → Local(111).
+    Handles requests when OllamaFailoverManager returns provider_name="ollama_gcp_b".
+
+    2026-05-04 ogt + Claude Sonnet 4.6: ADR-110 GCP-B failover completion
+    Root cause: AIProviderRegistry lacked "ollama_gcp_b" → not_registered → skipped to Gemini
+    """
+
+    @property
+    def name(self) -> str:
+        return "ollama_gcp_b"
+
+    @property
+    def is_enabled(self) -> bool:
+        return bool(getattr(settings, "OLLAMA_SECONDARY_URL", ""))
+
+    def _endpoint_url(self) -> str:
+        return getattr(settings, "OLLAMA_SECONDARY_URL", "")
+
+    async def health_check(self) -> bool:
+        url = getattr(settings, "OLLAMA_SECONDARY_URL", "")
+        if not url:
+            return False
+        try:
+            client = await self._get_client()
+            resp = await client.get(f"{url}/api/tags", timeout=5.0)
+            return resp.status_code == 200
+        except Exception:
+            return False


@@ -77,6 +77,12 @@ class AIProviderEnum(str, Enum):
     # P1.1b: OllamaFailoverManager uses provider_name="ollama_188"
     # but AIProviderEnum had no such value → lookup fails during P1.2 integration
     OLLAMA_188 = "ollama_188"  # 188 CPU-only fallback node (P1.1b)
+    # 2026-05-04 ogt + Claude Sonnet 4.6: ADR-110 GCP three-tier failover
+    # OllamaFailoverManager returns provider_name="ollama_gcp_a"/"ollama_gcp_b"/"ollama_local"
+    # Missing enum values → AIProviderEnum(primary_str) raises ValueError → fallback chain emptied → skips straight to Gemini
+    OLLAMA_GCP_A = "ollama_gcp_a"  # GCP-A 34.143.170.20 Primary
+    OLLAMA_GCP_B = "ollama_gcp_b"  # GCP-B 34.21.145.224 Secondary
+    OLLAMA_LOCAL = "ollama_local"  # 192.168.0.111 Local Fallback
     GEMINI = "gemini"
     CLAUDE = "claude"
     # 2026-04-02 ogt: C1 fix: align with the actual Registry names
@@ -92,6 +98,10 @@ PROVIDER_LATENCY_BUDGET: dict[AIProviderEnum, int] = {
     AIProviderEnum.OLLAMA: 60000,  # local; allows a longer processing time
     # 2026-04-25 critic-fix Part2 B2 by Claude Engineer-C2: 188 CPU-only inference is slower
     AIProviderEnum.OLLAMA_188: 120000,  # 120s budget for CPU inference
+    # 2026-05-04 ogt: ADR-110 GCP three-tier failover: GCP NVMe SSD inference is fast, 60s is enough
+    AIProviderEnum.OLLAMA_GCP_A: 60000,
+    AIProviderEnum.OLLAMA_GCP_B: 60000,
+    AIProviderEnum.OLLAMA_LOCAL: 90000,  # 111 local HDD, slightly slower
     AIProviderEnum.GEMINI: 30000,  # cloud, lower latency
     AIProviderEnum.CLAUDE: 30000,  # cloud, lower latency
     # 2026-04-02 ogt: C1 fix: align with Registry names
@@ -1294,13 +1304,21 @@ _executor: AIRouterExecutor | None = None
 def _init_registry() -> AIProviderRegistry:
     """Initialize the Provider Registry (registers all Providers automatically on first call)"""
-    from src.services.ai_providers.ollama import OllamaProvider, Ollama188Provider  # 2026-04-26 Wave5 B1-fix by Claude Engineer-A4
+    from src.services.ai_providers.ollama import (
+        OllamaProvider,
+        Ollama188Provider,
+        OllamaGcpBProvider,  # 2026-05-04 ADR-110 GCP-B
+    )
     from src.services.ai_providers.gemini import GeminiProvider
     from src.services.ai_providers.claude import ClaudeProvider
     from src.services.ai_providers.openclaw_nemo import OpenClawNemoProvider
     registry = AIProviderRegistry()
-    registry.register(OllamaProvider())
+    # GCP-A Primary (name="ollama", OLLAMA_URL)
+    ollama_gcp_a = OllamaProvider()
+    registry.register(ollama_gcp_a)
     registry.register(GeminiProvider())
     registry.register(ClaudeProvider())
     registry.register(OpenClawNemoProvider())
@@ -1310,9 +1328,19 @@ def _init_registry() -> AIProviderRegistry:
     registry.register(NemotronProvider())
     # 2026-04-26 Wave5 B1-fix by Claude Engineer-A4: register the OLLAMA_188 fallback provider
     # Fix: the failover_manager decision returned "ollama_188", but the executor could not find it → not_registered
     # → 188 was never reached. It must be registered explicitly so executor.execute() can route to 188.
-    registry.register(Ollama188Provider())
+    ollama_local = Ollama188Provider()
+    registry.register(ollama_local)
+    # 2026-05-04 ogt + Claude Sonnet 4.6: ADR-110 GCP three-tier failover fix
+    # Root cause: OllamaFailoverManager returns "ollama_gcp_a"/"ollama_gcp_b"/"ollama_local"
+    # but the registry had none of these names → not_registered → whole Ollama chain skipped → straight to Gemini
+    # Fix:
+    #   "ollama_gcp_a" alias → same OllamaProvider (OLLAMA_URL = GCP-A)
+    #   "ollama_gcp_b" → new OllamaGcpBProvider (OLLAMA_SECONDARY_URL = GCP-B)
+    #   "ollama_local" alias → same Ollama188Provider (OLLAMA_FALLBACK_URL = 111)
+    registry._providers["ollama_gcp_a"] = ollama_gcp_a
+    registry.register(OllamaGcpBProvider())
+    registry._providers["ollama_local"] = ollama_local
     return registry
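
The fix touches three places that have to agree on provider names: the enum, the latency budget and the registry. One way to catch a future mismatch is a small consistency check, sketched below under assumptions: ProviderEnum, LATENCY_BUDGET_MS, REGISTERED and find_gaps are hypothetical stand-ins, with budget values copied from the diff above.

```python
from enum import Enum


class ProviderEnum(str, Enum):
    """Hypothetical subset of AIProviderEnum after the fix."""
    OLLAMA_GCP_A = "ollama_gcp_a"
    OLLAMA_GCP_B = "ollama_gcp_b"
    OLLAMA_LOCAL = "ollama_local"
    GEMINI = "gemini"
    CLAUDE = "claude"


# Budgets in milliseconds, as listed in the diff.
LATENCY_BUDGET_MS = {
    ProviderEnum.OLLAMA_GCP_A: 60000,
    ProviderEnum.OLLAMA_GCP_B: 60000,
    ProviderEnum.OLLAMA_LOCAL: 90000,
    ProviderEnum.GEMINI: 30000,
    ProviderEnum.CLAUDE: 30000,
}

# Names the registry would answer to (assumed for this sketch).
REGISTERED = {"ollama_gcp_a", "ollama_gcp_b", "ollama_local", "gemini", "claude"}


def find_gaps(names, enum_cls, budget, registered):
    """Return {name: [problems]} for names missing from any of the three places."""
    gaps = {}
    for name in names:
        problems = []
        try:
            member = enum_cls(name)
        except ValueError:
            member = None
            problems.append("missing_enum")
        if member is not None and member not in budget:
            problems.append("missing_budget")
        if name not in registered:
            problems.append("not_registered")
        if problems:
            gaps[name] = problems
    return gaps


route_names = ["ollama_gcp_a", "ollama_gcp_b", "ollama_local", "gemini", "claude"]
gaps_ok = find_gaps(route_names, ProviderEnum, LATENCY_BUDGET_MS, REGISTERED)
gaps_new = find_gaps(["ollama_gcp_c"], ProviderEnum, LATENCY_BUDGET_MS, REGISTERED)
```

Here "ollama_gcp_c" is a made-up name standing in for any endpoint added to the failover manager without the matching enum and registry entries.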