feat(p1): Ollama multi-layer failover system: P1.1 health checks + P1.2 ai_router integration + P1.5 failover alerts

Ollama failover subsystem for the ADR-092 P1 flywheel closed loop, completed by Engineer-A2/C/C2.

New services (1581 lines):
- ollama_health_monitor.py (356): 3-layer health check (TCP/HTTP/inference)
- ollama_failover_manager.py (571): automatic 111→188 switchover + Redis persistence + recovery callback
- ollama_auto_recovery.py (436): 30 s background monitoring; after 3 consecutive HEALTHY checks, switch back + clear_cache
- failover_alerter.py (218): P1.5 Telegram failover alerts
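The Python sketch below shows one way the 3-layer health status and the "3 consecutive HEALTHY, then switch back" rule listed above could fit together. It makes three assumptions. First, a node counts as HEALTHY only when all three probe layers pass. Second, any non-HEALTHY check of the primary triggers a switch to the fallback. Third, `Health`, `classify` and `FailoverState` are hypothetical names and are not the actual API of ollama_failover_manager.py or ollama_auto_recovery.py. The sketch leaves out Redis persistence, the 30 s scheduler and the clear_cache/recovery callbacks.

```python
from dataclasses import dataclass
from enum import Enum


class Health(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    DOWN = "down"


def classify(tcp_ok: bool, http_ok: bool, inference_ok: bool) -> Health:
    """Collapse the three probe layers (TCP -> HTTP -> inference) into one status.

    Assumed mapping: TCP failure means DOWN, all three passing means HEALTHY,
    anything in between (port open but API or test prompt failing) is DEGRADED.
    """
    if not tcp_ok:
        return Health.DOWN
    if http_ok and inference_ok:
        return Health.HEALTHY
    return Health.DEGRADED


@dataclass
class FailoverState:
    primary_url: str
    fallback_url: str          # "" means single-node mode: never fail over
    required_healthy: int = 3  # consecutive HEALTHY checks before switching back
    active_url: str = ""
    streak: int = 0

    def __post_init__(self) -> None:
        if not self.active_url:
            self.active_url = self.primary_url

    def on_primary_check(self, status: Health) -> str:
        """Record one check of the primary node and return the URL to route to."""
        on_fallback = bool(self.fallback_url) and self.active_url == self.fallback_url
        if status is Health.HEALTHY:
            if on_fallback:
                self.streak += 1
                if self.streak >= self.required_healthy:
                    # Primary is stable again: switch back (the real service
                    # would also fire recovery callbacks / clear_cache here).
                    self.active_url = self.primary_url
                    self.streak = 0
        else:
            # Any non-HEALTHY result resets the streak and, if a fallback is
            # configured, moves traffic to it.
            self.streak = 0
            if self.fallback_url and not on_fallback:
                self.active_url = self.fallback_url
        return self.active_url
```

Requiring a streak of HEALTHY results before switching back damps flapping. A single lucky probe against a half-recovered 111 node does not pull traffic off 188.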

Service integration:
- ai_router.py: AIProviderEnum.OLLAMA_188 + 120 s budget + failover fallback chain
- main.py lifespan: wire the callback and start recovery on startup; stop gracefully on shutdown
- config.py: OLLAMA_FALLBACK_URL / OLLAMA_HEALTH_CHECK_MODEL / GEMINI_DAILY_QUOTA (billing circuit breaker)
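The Python sketch below shows one way the GEMINI_DAILY_QUOTA billing circuit breaker could count calls. The key format `ollama:gemini_daily_count:{date}` and the 86400 s TTL come from the config.py comment in this commit. The following are assumptions and are not taken from the source: the INCR-then-EXPIRE counting scheme, the provider labels `"gemini"`/`"ollama_188"`, and the helper names. `InMemoryStore` stands in for a Redis client so the example runs without a server. redis-py's `Redis.incr` and `Redis.expire` accept the same positional arguments.

```python
from datetime import date
from typing import Protocol


class CounterStore(Protocol):
    """Subset of a Redis-like client used by the breaker."""

    def incr(self, name: str) -> int: ...
    def expire(self, name: str, time: int) -> bool: ...


class InMemoryStore:
    """Test stand-in for Redis; TTLs are accepted but not enforced."""

    def __init__(self) -> None:
        self.data: dict[str, int] = {}

    def incr(self, name: str) -> int:
        self.data[name] = self.data.get(name, 0) + 1
        return self.data[name]

    def expire(self, name: str, time: int) -> bool:
        return name in self.data


def pick_provider(store: CounterStore, quota: int, today: date) -> str:
    """Count one Gemini request for `today`; route to the 188 node once over quota."""
    key = f"ollama:gemini_daily_count:{today.isoformat()}"
    count = store.incr(key)
    if count == 1:
        # First call of the day starts the 24 h TTL, so old counters expire.
        store.expire(key, 86400)
    return "gemini" if count <= quota else "ollama_188"
```

Keying the counter by date means the quota resets naturally at the day boundary, and the TTL only serves to garbage-collect old keys.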

K8s configuration:
- 04-configmap.yaml.patch-188-fallback: injects OLLAMA_FALLBACK_URL=http://192.168.0.188:11434
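The Python sketch below shows the single-node vs. two-node behaviour that the configmap patch toggles. It assumes the patched ConfigMap reaches the pod as environment variables. `ollama_nodes` is a hypothetical helper; in the real service these values are read through the pydantic `Settings` class in config.py. The `OLLAMA_URL` default mirrors the value shown in the config.py diff.

```python
import os
from collections.abc import Mapping


def ollama_nodes(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the Ollama nodes to use, primary first.

    An empty (or whitespace-only) OLLAMA_FALLBACK_URL means single-node mode,
    matching the "empty string = disabled" rule in config.py.
    """
    primary = env.get("OLLAMA_URL", "http://192.168.0.111:11434")
    fallback = env.get("OLLAMA_FALLBACK_URL", "").strip()
    return [primary, fallback] if fallback else [primary]
```

Without the patch the fallback variable is unset, so the service keeps running in single-node mode instead of failing at startup.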

Tests (2082 lines):
- test_ollama_health_monitor.py (402)
- test_ollama_failover_manager.py (707)
- test_ollama_auto_recovery.py (580)
- test_ai_router_failover_integration.py (257)
- test_lifespan_failover_wiring.py (136)

Dependency chain: the three service modules, ai_router and main.py must be committed together; if any one is missing, imports fail with ImportError.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Your Name
2026-04-26 20:18:33 +08:00
parent d3a4fb4d15
commit 55c6b4e2d9
13 changed files with 3798 additions and 2 deletions


@@ -189,11 +189,29 @@ class Settings(BaseSettings):
default="http://192.168.0.111:11434", # 2026-04-08 ogt: switched to M1 Pro (40+ tok/s vs 0.45 tok/s)
description="Ollama LLM service URL",
)
# 2026-04-25 Claude Engineer-C (P1.1): Ollama 188 CPU-only fallback (option C)
# If empty, OllamaFailoverManager uses only OLLAMA_URL (single-node mode)
OLLAMA_FALLBACK_URL: str = Field(
default="",
description="Ollama CPU-only fallback URL (188 fallback, P1.1); empty string = disabled",
)
# 2026-04-25 Claude Engineer-C (P1.1): model used for the Ollama health-check inference test
OLLAMA_HEALTH_CHECK_MODEL: str = Field(
default="qwen2.5:7b-instruct",
description="Model used by OllamaHealthMonitor for the inference test (P1.1)",
)
# 2026-04-12 ogt: the heartbeat must verify the list of loaded Ollama models
OLLAMA_REQUIRED_MODELS: list[str] = Field(
default=["nomic-embed-text", "qwen2.5:7b-instruct", "deepseek-r1:14b"],
description="HeartbeatReportService 探測必要模型是否載入",
)
# 2026-04-25 critic-fix Part2 H7 by Claude Engineer-C2
# Gemini billing circuit breaker: daily call limit; beyond it, route to 188+Nemotron
# Once the limit is exceeded, writes Redis key ollama:gemini_daily_count:{date} (TTL 86400s)
GEMINI_DAILY_QUOTA: int = Field(
default=1000,
description="Daily Gemini call limit; beyond it, switch to 188+Nemotron (P1.1 billing circuit breaker)",
)
# Deprecated: use OPENCLAW_URL instead
CLAWBOT_URL: str = Field(
default="http://192.168.0.188:8088", # 🔧 Fix: OpenClaw actually listens on port 8088