fix(ops): keep Ollama health checks on alert fast model
@@ -370,9 +370,10 @@ class Settings(BaseSettings):
             )
         return v

     # 2026-04-25 Claude Engineer-C (P1.1): model used for the Ollama health-check inference test
+    # 2026-05-05 Codex: health inference must stay on the alert-fast model; qwen2.5
+    # keeps reloading a 7B model on CPU-only GCP and slows incident fallback.
     OLLAMA_HEALTH_CHECK_MODEL: str = Field(
-        default="qwen2.5:7b-instruct",
+        default="gemma3:4b",
         description="Model used by the OllamaHealthMonitor inference test (P1.1)",
     )
     # 2026-04-12 ogt: list of Ollama models the heartbeat must confirm are loaded
@@ -3201,6 +3201,7 @@ bash scripts/ops/ollama-topology-check.sh
 - `interactive` / `healthcheck` / `alert_fast` stay GCP-A first
 - `code_review` / `rag` / `embedding` / `deep_rca` / `image_analysis` / `hermes` now prefer 111
 - Fall back to GCP-B only when 111 is unavailable, so GCP-A/B are not polluted by 7B/14B/32B models during the alert canary
+- `OLLAMA_HEALTH_CHECK_MODEL` is now `gemma3:4b`, so the health probe itself no longer loads `qwen2.5:7b-instruct` onto GCP-A

 Verification:
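Review note: the routing bullets in the documentation hunk above can be sketched as a small preference table. This is only an illustration under stated assumptions, not the project's router. The lane names and the backend labels (`GCP-A`, `GCP-B`, `111`) come from the diff; `backend_order` and `pick_backend` are hypothetical helpers, the full GCP-A → GCP-B → 111 order for the fast lanes is borrowed from the manifest comment about alert diagnosis, and the heavy lanes are assumed never to fall back to GCP-A.

```python
from __future__ import annotations

# Lanes named in the documentation hunk. Grouping them into two sets is an
# assumption made for this sketch.
FAST_LANES = {"interactive", "healthcheck", "alert_fast"}
HEAVY_LANES = {"code_review", "rag", "embedding", "deep_rca", "image_analysis", "hermes"}


def backend_order(lane: str) -> list[str]:
    """Return the backends to try for a lane, most preferred first."""
    if lane in FAST_LANES:
        # Assumed order: GCP-A first, then GCP-B, then 111.
        return ["GCP-A", "GCP-B", "111"]
    if lane in HEAVY_LANES:
        # 111 first; GCP-B only when 111 is unavailable. GCP-A is left out
        # so large models never land on the alert-fast host (assumption).
        return ["111", "GCP-B"]
    raise ValueError(f"unknown lane: {lane!r}")


def pick_backend(lane: str, available: set[str]) -> str | None:
    """Pick the first preferred backend that is currently available."""
    for backend in backend_order(lane):
        if backend in available:
            return backend
    return None
```

With every backend up, `pick_backend("alert_fast", {"GCP-A", "GCP-B", "111"})` selects `GCP-A`, while `pick_backend("rag", {"GCP-A", "GCP-B"})` selects `GCP-B` because 111 is missing from the available set.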
@@ -79,6 +79,8 @@ spec:
           value: "true"  # alert diagnosis is forced through GCP-A → GCP-B → 111, in that order
         - name: ALERT_OLLAMA_MODEL
           value: "gemma3:4b"  # 2026-05-05 Codex: the alert JSON prompt on qwen3:14b drags out to a 504
+        - name: OLLAMA_HEALTH_CHECK_MODEL
+          value: "gemma3:4b"  # 2026-05-05 Codex: keep the health probe from loading qwen2.5 7B and polluting the GCP alert lane
         - name: OPENCLAW_DEFAULT_MODEL
           value: "qwen2.5:7b-instruct"
         - name: OPENCLAW_TIMEOUT