Strengthen the Ollama host health runtime probe

2026-05-25 12:53:35 +08:00
parent aad26ea87c
commit e3dadc28db
11 changed files with 127 additions and 4 deletions
--- a/.env.example
+++ b/.env.example
@@ -414,6 +414,13 @@ OLLAMA_EMBED_TIMEOUT=15
 OLLAMA_EMBED_MAX_TIMEOUT=15
 OLLAMA_EMBED_KEEP_ALIVE=1m
 OLLAMA_EMBED_MAX_CHARS=4000
+OLLAMA_EMBED_GCP_FAILURE_COOLDOWN_SEC=60
+OLLAMA_EMBED_GCP_FAILURE_NOTICE_SEC=30
+OLLAMA_HOST_HEALTH_MODEL_PROBE_ENABLED=true
+OLLAMA_HOST_HEALTH_MODEL_PROBE_INCLUDE_111=false
+OLLAMA_HOST_HEALTH_EMBED_MODEL=bge-m3:latest
+OLLAMA_HOST_HEALTH_EMBED_TIMEOUT=8
+OLLAMA_HOST_HEALTH_EMBED_KEEP_ALIVE=1m
 # 111 是 Mac final fallback，不承接 7B+ / vision / long-context / 長輸出任務；落到 111 時自動降級與縮短常駐。
 OLLAMA_111_MODEL_FALLBACK=llama3.2:latest
 OLLAMA_111_MODEL_DOWNGRADE_PATTERNS=qwen3:*,deepseek-r1:*,hermes3:*,llama3.1:*,qwen2.5:*,qwen2.5-coder:*,gemma3:*,minicpm-v:*,llava:*,*:7b*,*:8b*,*:14b*,*:32b*,*:70b*
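
The two `OLLAMA_HOST_HEALTH_MODEL_PROBE_*` flags above combine as sketched here. This is a minimal illustration that mirrors the gating in `services/ollama_health_probe.py`; the `env` parameter and the three sample dictionaries are assumptions added so the behavior can be shown without touching the real process environment.

```python
import os

TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name, default=False, env=None):
    """Read a boolean flag; `env` is a hypothetical override for os.environ."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        return default
    return str(raw).strip().lower() in TRUTHY


def model_probe_enabled(label, env=None):
    """Decide whether a host label gets the embedding runtime probe."""
    # The master switch defaults to on; any non-truthy value turns it off.
    if not env_flag("OLLAMA_HOST_HEALTH_MODEL_PROBE_ENABLED", True, env):
        return False
    # The 111 fallback host is probed only when explicitly opted in.
    if "Fallback" in label:
        return env_flag("OLLAMA_HOST_HEALTH_MODEL_PROBE_INCLUDE_111", False, env)
    return True


# Sample environments (illustrative values only).
defaults = {
    "OLLAMA_HOST_HEALTH_MODEL_PROBE_ENABLED": "true",
    "OLLAMA_HOST_HEALTH_MODEL_PROBE_INCLUDE_111": "false",
}
opt_in = {**defaults, "OLLAMA_HOST_HEALTH_MODEL_PROBE_INCLUDE_111": "true"}
disabled = {**defaults, "OLLAMA_HOST_HEALTH_MODEL_PROBE_ENABLED": "off"}

for name, env in [("defaults", defaults), ("opt_in", opt_in), ("disabled", disabled)]:
    decisions = {
        label: model_probe_enabled(label, env)
        for label in ("Primary (GCP)", "Secondary (GCP)", "Fallback (111)")
    }
    print(name, decisions)
```

With the shipped defaults only the two GCP labels are probed; the fallback Mac stays out unless `OLLAMA_HOST_HEALTH_MODEL_PROBE_INCLUDE_111` is set to a truthy value.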
--- a/TODO_NEXT_STEPS.txt
+++ b/TODO_NEXT_STEPS.txt
@@ -4,6 +4,7 @@
 ================================================================================

 【已完成】
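+   - V10.470 strengthens the Ollama host health probe; deployed to production with `/health=V10.470` confirmed: for GCP-A / GCP-B, the scheduler and the observability console's host health check now run a short `bge-m3` `/api/embed` runtime probe in addition to `/api/tags`. This catches false-healthy states such as GCP-B, where tags/version respond normally but the embedding runner times out at 8s. 111 skips the background embedding probe by default, so that a monitoring task does not load `bge-m3` onto the fallback Mac. After the production smoke test, the latest `host_health_probes` status is GCP-A unhealthy, GCP-B unhealthy, 111 healthy.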
   - V10.469 將背景 embedding 的 GCP-only 全失敗改為專業降級語意，已部署正式環境並確認 `/health=V10.469`：`allow_111_fallback=False` 時若 GCP-A/GCP-B 都不可用，開啟 failure circuit 並記 WARNING，不再把可預期的背景熔斷每分鐘打成 ERROR；同步 / 允許 fallback 的 embedding 全失敗仍保留 ERROR。Smoke 顯示 GCP-B `/api/version` 可用，但 `/api/embed` 仍可能 15s timeout，下一步需修 GCP-A primary 與 GCP-B runner/model 負載。
   - V10.468 補 Ollama import-time 防凍結與背景 embedding GCP failure circuit，已部署正式環境並確認 `/health=V10.468`：`config.OLLAMA_HOST` / `HERMES_URL` / `EMBEDDING_HOST` 舊相容常數不再於 import 時 probe network，也不會因 GCP-A/GCP-B 暫時拒連而 freeze 到 111；動態 caller 仍走 `get_*()` / `OllamaService` 三主機級聯。當 `allow_111_fallback=False` 且 GCP-A/GCP-B 皆失敗時，短暫熔斷 60 秒，不重複打兩台 GCP、不落 111，降低 app/scheduler 因連續 embedding timeout 造成的 log 與 worker 壓力；部署 smoke 時 GCP-B `/api/version` 已恢復 200 並成為動態路由落點，GCP-A `22/11434` 仍拒連，需後續用 GCP 權限修復 primary Ollama 主機。
   - V10.467 補 PChome focused exact total-price 安全通道：針對正式近門檻樣本中已確認同品牌、同品名、同規格/同入數的 3W CLINIC 粉底液 2入、花美水凝膠 3支、The Ordinary 咖啡因 EGCG 30ml、KUSSEN 屁屁膏 3入、Bone 擴香禮盒、1990 融燭燈白色款與 CANMAKE 淚袋盤，從 `exact/manual_review` 收斂為 `exact/total_price`；未放寬 `MIN_MATCH_SCORE`，DASHING DIVA、唇彩、香味、色號/款式敏感商品仍維持 variant / veto 保護。Production pilot 已將 9 筆安全 SKU 送入 `rescore_accepted_current`，`true_low_confidence` 802→793、`rescore_accepted_current` 38→47；`6101784` 即期品保留在 `true_low_confidence`。
--- a/config.py
+++ b/config.py
@@ -350,7 +350,7 @@ YOUTUBE_API_KEY = os.getenv('YOUTUBE_API_KEY', '')
 # ==========================================
 # 系統版本與路徑
 # ==========================================
-SYSTEM_VERSION = "V10.469"
+SYSTEM_VERSION = "V10.470"
 LOG_FILE_PATH = os.path.join(BASE_DIR, 'logs/system.log')
 public_url = PUBLIC_URL  # 用於模板顯示

--- a/docs/AI_INTELLIGENCE_MODULE_SOT.md
+++ b/docs/AI_INTELLIGENCE_MODULE_SOT.md
@@ -2,7 +2,7 @@

 > **最後更新**: 2026-05-25 (台北時間)
 > **狀態**: 🟢 四 AI Agent 自動化閉環已落地；LLM 路由紅線升級為 Ollama-first 三主機級聯，Gemini 備援預設關閉
-> **適用版本**: V10.469
+> **適用版本**: V10.470

 ---

@@ -25,6 +25,7 @@
 - `allow_111_fallback=False` 時，若 resolver 因 unhealthy cache 回傳 111，不得直接結束 embedding；必須強制改試尚未嘗試的 GCP-A / GCP-B，避免正式 log 出現 `tried=[]` 或只試單台 GCP-B。
 - `allow_111_fallback=False` 且 GCP-A / GCP-B 皆失敗時，背景 embedding 會開啟短暫 GCP failure circuit（預設 60 秒），期間不重複打兩台 GCP、不落 111，避免 worker 與 log 被連續失敗拖慢；GCP 恢復後會自然再試。
 - 背景 embedding 的 GCP-only 熔斷屬於可降級背景能力，應記錄為明確 WARNING 與 circuit 狀態，不應每次污染 ERROR 通道；真正允許三主機 fallback 的同步 embedding 全失敗仍保留 ERROR。
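+- The scheduler host health probe does not rely on `/api/tags` alone; GCP-A / GCP-B nodes must also pass a short `bge-m3` `/api/embed` runtime probe before they count as healthy. 111 is excluded from this background embedding probe by default, so that monitoring tasks do not load `bge-m3` onto the fallback Mac.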
 - BGE-M3 一致性檢查是監測任務，不是 fallback 壓測；預設只比對 GCP-A / GCP-B。111 Mac fallback 只有 `EMBED_CONSISTENCY_INCLUDE_111=true` 時才納入，避免每週背景檢查把 `bge-m3` 載入 111。
 - OpenClaw Telegram Q&A 主路徑也不得綁單一 host：`_call_qwen3_qa()` 必須透過 `OllamaService` 跑 GCP-A → GCP-B → 111，並把實際落點寫入 `ai_calls.provider`。
 - OpenClaw Telegram 圖片商品辨識也必須 Ollama-first：`_identify_product_name_with_ollama_vision()` 透過 `OllamaService` 嘗試 GCP-A → GCP-B → 111；Gemini 只允許以 `openclaw_bot_image_gemini` caller 作為失敗後備援。
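
The two-stage health rule in this list (`/api/tags` first, then a short `/api/embed` probe for GCP nodes) can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: `StuckRunnerClient`, `FakeResponse`, `check_host`, the host URL, and the 5-second tags timeout are all hypothetical; only the endpoints, the request body, and the 8-second embed timeout come from the diff.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    status_code: int
    payload: dict

    def json(self):
        return self.payload


class StuckRunnerClient:
    """Hypothetical stand-in for `requests`: tags answers, embed hangs."""

    @staticmethod
    def get(url, timeout=None):
        return FakeResponse(200, {"models": [{"name": "bge-m3:latest"}]})

    @staticmethod
    def post(url, json=None, timeout=None):
        raise TimeoutError("read timed out")


def check_host(client, host, probe_model_runtime):
    """Return (healthy, error) using the tags check plus an optional embed probe."""
    try:
        tags = client.get(f"{host}/api/tags", timeout=5)  # timeout is an assumption
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    if tags.status_code != 200:
        return False, f"HTTP {tags.status_code}"
    if not probe_model_runtime:
        return True, None  # tags-only verdict, as before this change
    try:
        resp = client.post(
            f"{host}/api/embed",
            json={"model": "bge-m3:latest", "input": "health", "keep_alive": "1m"},
            timeout=8,
        )
    except Exception as exc:
        return False, f"EmbedProbe {type(exc).__name__}: {exc}"
    if resp.status_code != 200:
        return False, f"EmbedProbe HTTP {resp.status_code}"
    if not resp.json().get("embeddings"):
        return False, "EmbedProbe empty embedding"
    return True, None


HOST = "http://gcp-b.example:11434"  # placeholder host, not a real node
tags_only = check_host(StuckRunnerClient, HOST, probe_model_runtime=False)
with_probe = check_host(StuckRunnerClient, HOST, probe_model_runtime=True)
print("tags only:", tags_only)
print("with embed probe:", with_probe)
```

The same simulated host passes the tags-only check and fails once the embed probe is required, which is the false-healthy case the rule is meant to close.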
--- a/docs/memory/current_execution_queue_20260524.md
+++ b/docs/memory/current_execution_queue_20260524.md
@@ -23,6 +23,7 @@
 - 2026-05-25 12:10 CST 狀態：已部署 `V10.467` 到 188，正式 `/health` 為 `V10.467`。本輪 recreate `momo-app`、`scheduler`、`telegram-bot`；未使用 `--remove-orphans`，未碰 `momo-db`。Smoke 通過：三個 app 容器 healthy、`/`、`/daily_sales`、`/growth_analysis`、`/observability/ppt_audit_history`、PChome rescore queue API HTTP 200。Production pilot 將 9 筆 focused exact total-price SKU 追加為 `rescore_accepted_current`，整體 latest counts 從 `true_low_confidence=802` / `rescore_accepted_current=38` 變為 `true_low_confidence=793` / `rescore_accepted_current=47`；目標 SKU 的 `competitor_prices` 最新 `crawled_at` 仍停在 2026-05-22～2026-05-23，確認本輪未寫正式價差表。已知後續：GCP-A / GCP-B Ollama `/api/version` 目前連線失敗，背景 embedding 正確沒有落 111，但 app/scheduler log 仍會出現 `[Embed] all 2 hosts failed`，需另開 Ollama 健康處理。
 - 2026-05-25 12:27 CST 狀態：已部署 `V10.468` 到 188，正式 `/health` 為 `V10.468`。本輪 recreate `momo-app`、`scheduler`、`telegram-bot`；未使用 `--remove-orphans`，未碰 `momo-db`。Smoke 通過：三個 app 容器 healthy、`/`、`/daily_sales`、`/growth_analysis`、`/observability/ppt_audit_history`、PChome review queue API `/api/pchome-review/queue` HTTP 200；容器內 mock smoke 證明背景 embedding 在 GCP-A / GCP-B 全失敗後會開啟 60 秒 failure circuit，第二筆不再重複打兩台 GCP，且不落 111。GCP 維運盤點：GCP-A `22/11434` refused；GCP-B `22` open 但現有 key publickey denied，部署 smoke 時 GCP-B `11434` 已恢復 200、`get_ollama_host()` 選到 GCP-B；111 `/api/version` 可用，但 111 仍不得承接背景 `bge-m3`。
 - 2026-05-25 12:39 CST 狀態：已部署 `V10.469` 到 188，正式 `/health` 為 `V10.469`。本輪 recreate `momo-app`、`scheduler`、`telegram-bot`；未使用 `--remove-orphans`，未碰 `momo-db`。Smoke 通過：三個 app 容器 healthy、首頁 / daily / growth / PChome review queue HTTP 200、Gemini hard disabled；`allow_111_fallback=False` 時 GCP-only embedding 全失敗會開啟 failure circuit 並記 WARNING，不再把預期內的背景熔斷打進 ERROR 通道。觀測到 GCP-B `/api/version` 200，但 `/api/embed` 仍可能 15s timeout，下一步需修 GCP-A primary 與 GCP-B runner/model 負載。
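+- 2026-05-25 12:53 CST status: deployed `V10.470` to 188; production `/health` reports `V10.470`. This round recreated `momo-app`, `scheduler`, and `telegram-bot`; `--remove-orphans` was not used and `momo-db` was not touched. Smoke passed: all three containers healthy, the host health page returns HTTP 200 and shows Runtime status, and the scheduler probe writes to the DB. Latest `host_health_probes`: GCP-A unhealthy (11434 refused), GCP-B unhealthy (`EmbedProbe ReadTimeout`, while `/api/tags` still lists 4 models), 111 healthy. This closes the false-healthy monitoring gap where the HTTP API is alive but the model runtime is stuck.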
 - 2026-05-25 12:05 CST 狀態：`main` 已部署到 188，正式 `/health` 為 `V10.467`，待推 Gitea。兩段變更已合併驗證：V10.466 rescore duplicate 改看 latest-state，7 筆 SKU 最新 attempt 全為 `rescore_accepted_current`，`competitor_prices` / `competitor_price_history` 目標計數未變；V10.467 focused exact matcher 在容器內回 `exact / total_price / price_alert_exact`。本輪 recreate `momo-app`、`scheduler`、`telegram-bot`；未使用 `--remove-orphans`，未碰 `momo-db`。Smoke 通過：三容器 healthy、PChome rescore queue API HTTP 200、Gemini 24 小時無 provider 紀錄、Ollama env 順序維持 GCP-A → GCP-B → 111、3 分鐘三容器 log 未見 Traceback / ERROR / CRITICAL / IntegrityError。

 ## 1. MOMO / PChome 核心比價準確率
--- a/docs/memory/history_logs.md
+++ b/docs/memory/history_logs.md
@@ -13,6 +13,7 @@
 ## 📅 詳細更新日誌 (考古存檔)

 ### 2026-05-24：PChome 近門檻身份回收第二輪
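+- **V10.470 Ollama host health runtime probe**: after `/api/tags` succeeds, `run_host_health_probe()` runs an additional short `bge-m3` `/api/embed` probe against GCP-A / GCP-B, so GCP-B is no longer marked healthy when tags/version respond normally but the embedding runner hits the 8s probe timeout. 111 skips the background embedding probe by default, so that monitoring tasks do not load `bge-m3` onto the fallback Mac.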
 - **V10.469 Background embedding 降級語意修正**: `OllamaService.generate_embedding(..., allow_111_fallback=False)` 在 GCP-A/GCP-B 全失敗時會開啟短暫 failure circuit 並記 WARNING，不再把背景 `bge-m3` 降級熔斷每分鐘寫成 ERROR；同步或允許三主機 fallback 的 embedding 全失敗仍維持 ERROR，保留真正阻塞型故障訊號。
 - **V10.468 Ollama import-time / embedding 熔斷治理**: `config.OLLAMA_HOST`、`HERMES_URL`、`EMBEDDING_HOST` 舊相容常數改成靜態核准 env reader，不再於 import 時呼叫 `resolve_ollama_host()`，避免 GCP-A/GCP-B 短暫拒連時把 process 常數 freeze 到 111。`generate_embedding(..., allow_111_fallback=False)` 在 GCP-A/GCP-B 都失敗後會開短暫 GCP embedding circuit，避免背景任務每筆重打兩台故障主機；111 仍不承接背景 `bge-m3`。維運盤點曾見 110 proxy 11435/11436 因 GCP 11434 refused 回 502；部署 smoke 時 GCP-B `/api/version` 已恢復 200 並成為動態路由落點，GCP-A 22/11434 仍 refused，後續需以 GCP 權限恢復 primary Ollama 主機或 SSH key。
 - **V10.467 Focused exact total-price 安全通道**: `marketplace_product_matcher` 新增窄範圍 `focused_exact_total_price_safe` lane，僅針對正式近門檻樣本中同品牌、同品名、同規格/同入數的 3W CLINIC 粉底液 2入、花美水凝膠 3支、The Ordinary 咖啡因 EGCG 30ml、KUSSEN 屁屁膏 3入、Bone 擴香禮盒、1990 融燭燈白色款與 CANMAKE 淚袋盤，讓 `exact/manual_review` 可升到 `exact/total_price/price_alert_exact`；未放寬 `MIN_MATCH_SCORE`，DASHING DIVA、唇彩、香味、色號/款式敏感商品仍維持 variant / veto 保護。Production pilot 已將 SKU `6101639`、`10074951`、`7760902`、`TP00074980000005`、`14774766`、`10142589`、`10262470`、`10262471`、`11308520` materialize 到人工覆核隊列，`true_low_confidence` 802→793、`rescore_accepted_current` 38→47；`6101784` 即期品因商業條件不同仍留在低信心覆核。
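
The GCP-only failure circuit described in the V10.468 and V10.469 entries can be sketched as a time-based breaker. This is a simplified illustration, not the code in `OllamaService`: `FailureCircuit`, `embed_gcp_only`, the injected clock, and the host names are hypothetical. The 60-second cooldown matches the `OLLAMA_EMBED_GCP_FAILURE_COOLDOWN_SEC=60` default.

```python
import time


class FailureCircuit:
    """Open for `cooldown_sec` after being tripped; closed otherwise."""

    def __init__(self, cooldown_sec=60.0, clock=time.monotonic):
        self.cooldown_sec = cooldown_sec
        self._clock = clock
        self._open_until = 0.0

    def is_open(self):
        return self._clock() < self._open_until

    def trip(self):
        self._open_until = self._clock() + self.cooldown_sec


def embed_gcp_only(text, hosts, call_host, circuit):
    """Try each GCP host once; never fall back to 111. Returns None on failure."""
    if circuit.is_open():
        return None  # skip both hosts while the circuit is open
    for host in hosts:
        try:
            return call_host(host, text)
        except Exception:
            continue
    circuit.trip()  # every host failed: open the circuit for the cooldown
    return None


# Demonstration with a fake clock and hosts that always refuse.
now = [1000.0]
circuit = FailureCircuit(cooldown_sec=60.0, clock=lambda: now[0])
calls = []


def always_down(host, text):
    calls.append(host)
    raise ConnectionError(f"{host} refused")


hosts = ["gcp-a", "gcp-b"]
embed_gcp_only("first", hosts, always_down, circuit)   # tries both, then trips
embed_gcp_only("second", hosts, always_down, circuit)  # skipped: circuit open
attempts_during_cooldown = len(calls)

now[0] += 61  # move the fake clock past the cooldown
embed_gcp_only("third", hosts, always_down, circuit)   # hosts are retried
attempts_after_cooldown = len(calls)
print(attempts_during_cooldown, attempts_after_cooldown)
```

Injecting the clock keeps the sketch testable without sleeping; a real caller would rely on the default `time.monotonic`.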
--- a/routes/admin_observability_routes.py
+++ b/routes/admin_observability_routes.py
@@ -3596,6 +3596,10 @@ def host_health_dashboard():
            OLLAMA_HOST_PRIMARY, OLLAMA_HOST_SECONDARY, OLLAMA_HOST_FALLBACK,
            _is_unhealthy, _unhealthy_marks,
        )
+        from services.ollama_health_probe import (
+            host_health_model_probe_enabled,
+            probe_ollama_embedding_runtime,
+        )
        import requests as _r
        for label, host in [
            ('Primary (GCP)', OLLAMA_HOST_PRIMARY),
@@ -3603,7 +3607,7 @@ def host_health_dashboard():
            ('Fallback (111)', OLLAMA_HOST_FALLBACK),
        ]:
            entry = {'label': label, 'host': host, 'healthy': False,
-                     'unhealthy_mark': _is_unhealthy(host), 'models': []}
+                     'unhealthy_mark': _is_unhealthy(host), 'models': [], 'error': None}
            t0 = _time.monotonic()
            err = None
            try:
@@ -3613,10 +3617,16 @@ def host_health_dashboard():
                    entry['models'] = [
                        m.get('name', '') for m in resp.json().get('models', [])
                    ][:15]
+                    if host_health_model_probe_enabled(label):
+                        model_ok, model_err = probe_ollama_embedding_runtime(_r, host)
+                        if not model_ok:
+                            entry['healthy'] = False
+                            err = model_err
                else:
                    err = f"HTTP {resp.status_code}"
            except Exception as e:
                err = f"{type(e).__name__}: {str(e)[:200]}"
+            entry['error'] = err
            response_ms = int((_time.monotonic() - t0) * 1000)
            probe_records.append({
                'host_label': label, 'host_url': host, 'healthy': entry['healthy'],
--- a/run_scheduler.py
+++ b/run_scheduler.py
@@ -63,6 +63,18 @@ def _env_flag(name: str, default: bool = False) -> bool:
    return str(raw).strip().lower() in {"1", "true", "yes", "on"}


+def _host_health_model_probe_enabled(label: str) -> bool:
+    from services.ollama_health_probe import host_health_model_probe_enabled
+
+    return host_health_model_probe_enabled(label)
+
+
+def _probe_ollama_embedding_runtime(requests_module, host: str) -> tuple[bool, str | None]:
+    from services.ollama_health_probe import probe_ollama_embedding_runtime
+
+    return probe_ollama_embedding_runtime(requests_module, host)
+
+
 def _legacy_edm_schedule_enabled() -> bool:
    """Legacy fixed-LPN EDM/Festival crawlers are opt-in to avoid stale campaign browser loops."""
    return _env_flag("MOMO_ENABLE_LEGACY_EDM_SCHEDULE", False)
@@ -490,6 +502,11 @@ def run_host_health_probe():
                if resp.status_code == 200:
                    healthy = True
                    models_count = len(resp.json().get('models', []) or [])
+                    if _host_health_model_probe_enabled(label):
+                        model_ok, model_err = _probe_ollama_embedding_runtime(_r, host)
+                        if not model_ok:
+                            healthy = False
+                            err = model_err
                else:
                    err = f"HTTP {resp.status_code}"
            except Exception as e:
--- a/services/ollama_health_probe.py
+++ b/services/ollama_health_probe.py
@@ -0,0 +1,46 @@
+"""Lightweight Ollama runtime health probes shared by scheduler and UI."""
+
+import os
+
+
+def _env_flag(name: str, default: bool = False) -> bool:
+    raw = os.getenv(name)
+    if raw is None:
+        return default
+    return str(raw).strip().lower() in {"1", "true", "yes", "on"}
+
+
+def host_health_model_probe_enabled(label: str) -> bool:
+    """Return whether host health should verify a tiny real model operation."""
+    if not _env_flag("OLLAMA_HOST_HEALTH_MODEL_PROBE_ENABLED", True):
+        return False
+    if "Fallback" in label:
+        return _env_flag("OLLAMA_HOST_HEALTH_MODEL_PROBE_INCLUDE_111", False)
+    return True
+
+
+def probe_ollama_embedding_runtime(requests_module, host: str) -> tuple[bool, str | None]:
+    """Verify Ollama can serve a tiny embedding, not just answer /api/tags."""
+    model = os.getenv("OLLAMA_HOST_HEALTH_EMBED_MODEL", "bge-m3:latest")
+    timeout = float(os.getenv("OLLAMA_HOST_HEALTH_EMBED_TIMEOUT", "8"))
+    keep_alive = os.getenv("OLLAMA_HOST_HEALTH_EMBED_KEEP_ALIVE", "1m")
+    try:
+        resp = requests_module.post(
+            f"{host.rstrip('/')}/api/embed",
+            json={"model": model, "input": "health", "keep_alive": keep_alive},
+            timeout=timeout,
+        )
+        if resp.status_code != 200:
+            return False, f"EmbedProbe HTTP {resp.status_code}"
+        payload = resp.json()
+        embeddings = payload.get("embeddings")
+        if isinstance(embeddings, list) and embeddings:
+            first = embeddings[0]
+            if isinstance(first, list) and first:
+                return True, None
+        embedding = payload.get("embedding")
+        if isinstance(embedding, list) and embedding:
+            return True, None
+        return False, "EmbedProbe empty embedding"
+    except Exception as exc:
+        return False, f"EmbedProbe {type(exc).__name__}: {str(exc)[:160]}"
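
The payload check inside `probe_ollama_embedding_runtime` accepts two response shapes: the `embeddings` list of vectors returned by `/api/embed`, and a single `embedding` vector as returned by the older `/api/embeddings` endpoint. The sketch below isolates that check so the accepted and rejected shapes are easy to see; `has_embedding` and the sample payloads are hypothetical names introduced only for this illustration.

```python
def has_embedding(payload: dict) -> bool:
    """Return True if the payload carries at least one non-empty vector."""
    embeddings = payload.get("embeddings")
    if isinstance(embeddings, list) and embeddings:
        first = embeddings[0]
        if isinstance(first, list) and first:
            return True
    # Fall through to the single-vector shape.
    embedding = payload.get("embedding")
    return isinstance(embedding, list) and bool(embedding)


samples = {
    "batch shape": {"embeddings": [[0.1, 0.2, 0.3]]},
    "single-vector shape": {"embedding": [0.1, 0.2]},
    "empty batch": {"embeddings": []},
    "batch with empty vector": {"embeddings": [[]]},
    "no embedding keys": {"model": "bge-m3:latest"},
}
results = {name: has_embedding(payload) for name, payload in samples.items()}
for name, ok in results.items():
    print(f"{name}: {ok}")
```

A 200 response whose payload fails this check is reported by the probe as `EmbedProbe empty embedding`, so a host that answers HTTP but returns no vector is still marked unhealthy.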
--- a/templates/admin/host_health.html
+++ b/templates/admin/host_health.html
@@ -84,9 +84,10 @@
              <div>
                <div class="host-lane-top">
                  <span class="host-name">{{ h.label }}</span>
-                  {% if h.healthy %}<span class="badge bg-success">HTTP 正常</span>{% else %}<span class="badge bg-danger">離線</span>{% endif %}
+                  {% if h.healthy %}<span class="badge bg-success">Runtime 正常</span>{% else %}<span class="badge bg-danger">異常</span>{% endif %}
                </div>
                <div class="host-url"><code>{{ h.host }}</code></div>
+                {% if h.error %}<div class="text-danger small mt-1">{{ h.error }}</div>{% endif %}
                <div class="model-cloud">
                  {% for m in h.models %}<span class="model-chip">{{ m }}</span>{% endfor %}
                  {% if not h.models %}<span class="text-muted small">無模型資料 / 未連線</span>{% endif %}
--- a/tests/test_run_scheduler_embed_consistency.py
+++ b/tests/test_run_scheduler_embed_consistency.py
@@ -133,6 +133,44 @@ def test_host_health_transition_alert_keeps_db_dedup_window(monkeypatch):
    assert "_push_host_transition_alert(tr)" in source


+def test_host_health_probe_verifies_gcp_embedding_runtime(monkeypatch):
+    run_scheduler = _load_run_scheduler(monkeypatch)
+
+    class Resp:
+        status_code = 200
+
+        def json(self):
+            return {"embeddings": [[0.1, 0.2, 0.3]]}
+
+    ok, err = run_scheduler._probe_ollama_embedding_runtime(
+        type("Requests", (), {"post": staticmethod(lambda *args, **kwargs: Resp())}),
+        "http://34.21.145.224:11434",
+    )
+
+    assert ok is True
+    assert err is None
+    assert run_scheduler._host_health_model_probe_enabled("Primary (GCP)") is True
+    assert run_scheduler._host_health_model_probe_enabled("Secondary (GCP)") is True
+    assert run_scheduler._host_health_model_probe_enabled("Fallback (111)") is False
+
+
+def test_host_health_probe_reports_embedding_runtime_failure(monkeypatch):
+    run_scheduler = _load_run_scheduler(monkeypatch)
+
+    class Requests:
+        @staticmethod
+        def post(*_args, **_kwargs):
+            raise TimeoutError("embed timeout")
+
+    ok, err = run_scheduler._probe_ollama_embedding_runtime(
+        Requests,
+        "http://34.21.145.224:11434",
+    )
+
+    assert ok is False
+    assert "EmbedProbe TimeoutError" in err
+
+
 def test_v2_cron_blind_spot_list_has_failure_notifications(monkeypatch):
    run_scheduler = _load_run_scheduler(monkeypatch)