fix(ai): broaden ElephantAlpha transient fallback

2026-04-30 13:59:12 +08:00
parent 78ec7b5b08
commit 89e7f2ccd2
10 changed files with 146 additions and 16 deletions
--- a/CONSTITUTION.md
+++ b/CONSTITUTION.md
@@ -2,7 +2,7 @@

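 > This document defines the core principles and non-negotiable rules for project development
 > **Created**: 2026-01-12
-> **Current version**: V10.19 (AI metrics baseline observability release)
+> **Current version**: V10.20 (ElephantAlpha transient fallback release)
 > **Last updated**: 2026-04-30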

 ---
--- a/TODO_NEXT_STEPS.txt
+++ b/TODO_NEXT_STEPS.txt
@@ -28,6 +28,7 @@
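   - Ollama embedding hardening: prefer `/api/embed`, fall back to `/api/embeddings` only on older nodes, and add `EMBEDDING_TIMEOUT`.
   - Scheduler exception logging hardening: removed the silent `except/pass` blocks from `scheduler.py`; resource cleanup, optional EDM fields, and backup insight/notification failures now all produce diagnosable logs.
   - AI metrics baseline observability: `/metrics` emits `momo_ai_*` zero-baseline series even before any AI automation event has occurred, so Grafana/Prometheus can still see the metric names after a restart.
+   - ElephantAlpha transient fallback: NVIDIA NIM timeouts, connection errors, and HTTP 408/409/425/429 and 500/502/503/504 now try the next fallback model (403/404 already did); HTTP 400 and other non-transient request errors are not retried.

 【Next Steps】
   - After Prometheus scrapes, check that the `momo_ai_*` baseline and non-baseline event series stay stable.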
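The fallback rule in the TODO entry reduces to a small pure decision function. This is a minimal sketch, not the code in `services/elephant_service.py`: the name `should_try_next_model` and its parameters are hypothetical, while the status set copies `ELEPHANT_FALLBACK_HTTP_STATUS_CODES` from this commit. It also shows that "5xx" really means 500/502/503/504, so a 501 or 505 fails fast.

```python
from typing import Optional

# Copied from ELEPHANT_FALLBACK_HTTP_STATUS_CODES in services/elephant_service.py (this commit).
FALLBACK_HTTP_STATUS_CODES = frozenset({403, 404, 408, 409, 425, 429, 500, 502, 503, 504})


def should_try_next_model(status_code: Optional[int] = None,
                          transport_error: bool = False,
                          has_next_model: bool = True) -> bool:
    """Hypothetical helper: decide whether a failed NIM call moves on to the next model.

    status_code: HTTP status returned by NVIDIA NIM, or None when no response arrived.
    transport_error: True for a timeout or connection error (there is no HTTP status).
    has_next_model: False when the failing model is already the last candidate.
    """
    if not has_next_model:
        # Nothing left to try, so the error is surfaced to the caller.
        return False
    if transport_error:
        return True
    # None (no status) and codes outside the set, such as 400 or 501, fail fast.
    return status_code in FALLBACK_HTTP_STATUS_CODES


if __name__ == "__main__":
    for code in (400, 401, 403, 429, 501, 503):
        print(code, should_try_next_model(status_code=code))
    print("timeout", should_try_next_model(transport_error=True))
    print("503 on last model", should_try_next_model(status_code=503, has_next_model=False))
```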
--- a/app.py
+++ b/app.py
@@ -95,8 +95,8 @@ except Exception as e:
    sys_log.error(f"Unable to check disk space: {e}")

 # 🚩 System version definition (used for backups and display)
-# 🚩 2026-04-30 V10.19: AI metrics zero-baseline export
-SYSTEM_VERSION = "V10.19"
+# 🚩 2026-04-30 V10.20: ElephantAlpha transient NIM fallback
+SYSTEM_VERSION = "V10.20"

 # ==========================================
 # 🔒 SQL injection protection functions
--- a/config.py
+++ b/config.py
@@ -254,7 +254,7 @@ YOUTUBE_API_KEY = os.getenv('YOUTUBE_API_KEY', '')
 # ==========================================
 # System version and paths
 # ==========================================
-SYSTEM_VERSION = "V10.19"
+SYSTEM_VERSION = "V10.20"
 LOG_FILE_PATH = os.path.join(BASE_DIR, 'logs/system.log')
 public_url = PUBLIC_URL  # used for template display

--- a/docs/AI_INTELLIGENCE_MODULE_SOT.md
+++ b/docs/AI_INTELLIGENCE_MODULE_SOT.md
@@ -2,7 +2,7 @@

 > **Last updated**: 2026-04-30 (Taipei time)
 > **Status**: 🟢 Closed-loop automation for the four AI agents is in place: EventRouter / AutoHeal / OpenClaw Memory / ElephantAlpha bridge / Prometheus metrics / Smoke Dashboard / Smoke Trend Management / Telegram Summary / Grafana provisioning / Prometheus scrape / CD Gunicorn mount, all with test coverage
-> **Applies to**: V10.19 AI metrics baseline observability release
+> **Applies to**: V10.20 ElephantAlpha transient fallback release

 ---

@@ -73,7 +73,7 @@ SQL funnel (~300 rows)
 - `/metrics` uses only a raw `SELECT COUNT(*)` on `realtime_sales_monthly` to get the total row count, so ORM schema drift cannot make Prometheus scrapes emit warnings.
 - `momo-app` must bind-mount `./gunicorn.conf.py:/app/gunicorn.conf.py:ro` so the Gunicorn runtime configuration stays in sync with the repo after a CD sync/rebuild.
 - In CD rebuild mode the image must build successfully first; only then are the three application containers briefly stopped, removed, and recreated, which avoids prolonged 502s during a no-cache build.
- ElephantAlpha uses the NVIDIA NIM hosted API; the production default model is `nvidia/llama-3.3-nemotron-super-49b-v1.5`, and `ELEPHANT_ALPHA_FALLBACK_MODELS` must keep at least one callable backup.
+- ElephantAlpha uses the NVIDIA NIM hosted API; the production default model is `nvidia/llama-3.3-nemotron-super-49b-v1.5`, and `ELEPHANT_ALPHA_FALLBACK_MODELS` must keep at least one callable backup. On HTTP 403/404, 408/409/425/429, or 500/502/503/504, a timeout, or a connection error, the next model must be tried.
 - OpenClaw/Hermes embedding calls Ollama `/api/embed` first and falls back to `/api/embeddings` only when an older node does not support it; the timeout is controlled by `EMBEDDING_TIMEOUT` / `OLLAMA_EMBED_TIMEOUT`.

 ---
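The embedding rule above (prefer `/api/embed`, use `/api/embeddings` only on older nodes) can be sketched with an injected transport so it runs without a live Ollama server. This is a hedged sketch, not the project's Ollama client: `embed_text`, `EmbeddingError`, and the `Transport` callable are hypothetical, and it assumes an older node signals the missing endpoint with HTTP 404. The payload and response shapes (`input` → `embeddings`, `prompt` → `embedding`) follow Ollama's documented API; the 30-second default is also an assumption.

```python
import os
from typing import Callable, List, Tuple

# Hypothetical transport signature: (path, json_payload, timeout) -> (status_code, json_body).
Transport = Callable[[str, dict, float], Tuple[int, dict]]

# Assumption: 30 s when EMBEDDING_TIMEOUT is unset; the real default is not shown in this commit.
DEFAULT_TIMEOUT = float(os.getenv("EMBEDDING_TIMEOUT", "30"))


class EmbeddingError(RuntimeError):
    """Raised when neither Ollama endpoint returns an embedding."""


def embed_text(post: Transport, model: str, text: str,
               timeout: float = DEFAULT_TIMEOUT) -> List[float]:
    # Preferred endpoint: /api/embed takes "input" and returns {"embeddings": [[...]]}.
    status, body = post("/api/embed", {"model": model, "input": text}, timeout)
    if status == 200:
        return body["embeddings"][0]
    if status != 404:
        # Other failures are reported instead of silently switching endpoints.
        raise EmbeddingError(f"/api/embed failed with HTTP {status}")
    # Assumption: an older node without /api/embed answers 404, so try the legacy path,
    # which takes "prompt" and returns {"embedding": [...]}.
    status, body = post("/api/embeddings", {"model": model, "prompt": text}, timeout)
    if status == 200:
        return body["embedding"]
    raise EmbeddingError(f"/api/embeddings failed with HTTP {status}")


if __name__ == "__main__":
    def legacy_node(path: str, payload: dict, timeout: float) -> Tuple[int, dict]:
        # Fake pre-/api/embed node used only for this demo.
        if path == "/api/embed":
            return 404, {}
        return 200, {"embedding": [0.1, 0.2, 0.3]}

    print(embed_text(legacy_node, "nomic-embed-text", "hello"))  # [0.1, 0.2, 0.3]
```

Note that Ollama can also return 404 for an unknown model, in which case this sketch tries the legacy path once and then raises.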
--- a/docs/ELEPHANT_ALPHA_SETUP.md
+++ b/docs/ELEPHANT_ALPHA_SETUP.md
@@ -47,21 +47,27 @@ Elephant Alpha (Super Orchestrator)
 cp .env.example .env
 ```

-2. **Configure OpenRouter API:**
+2. **Configure NVIDIA NIM API:**
 ```bash
-# Get API key from https://openrouter.ai/keys
-export OPENROUTER_API_KEY="sk-or-v1-your-api-key"
+# Get API key from NVIDIA NIM / build.nvidia.com
+export NVIDIA_API_KEY="nvapi-your-api-key"
 ```

 3. **Update .env file:**
 ```env
 # Elephant Alpha Configuration
-OPENROUTER_API_KEY=sk-or-v1-your-openrouter-api-key-here
-ELEPHANT_ALPHA_MODEL=openrouter/elephant-alpha
+NVIDIA_API_KEY=nvapi-your-nvidia-api-key-here
+ELEPHANT_ALPHA_NEMOTRON_NIM_ENDPOINT=https://integrate.api.nvidia.com/v1
+ELEPHANT_ALPHA_URL=https://integrate.api.nvidia.com/v1/chat/completions
+ELEPHANT_ALPHA_MODEL=nvidia/llama-3.3-nemotron-super-49b-v1.5
+ELEPHANT_ALPHA_FALLBACK_MODELS=nvidia/llama-3.3-nemotron-super-49b-v1.5,nvidia/llama-3.1-nemotron-70b-instruct,meta/llama-3.1-8b-instruct
+ELEPHANT_TIMEOUT=120
 ELEPHANT_ALPHA_CONFIDENCE_THRESHOLD=0.7
 ELEPHANT_ALPHA_MAX_AUTONOMOUS_DECISIONS_PER_HOUR=10
 ```

+Runtime fallback rule: ElephantService tries the next `ELEPHANT_ALPHA_FALLBACK_MODELS` entry when NVIDIA NIM returns HTTP 403/404, transient 408/409/425/429, or 500/502/503/504, or when the request times out or the connection fails. Non-transient client errors such as HTTP 400 fail fast so bad requests do not burn quota across all models. If the last model in the list also fails, its error is returned to the caller.
+
 ### Step 2: Install Dependencies

 ```bash
--- a/docs/memory/ai_automation_closure_20260429.md
+++ b/docs/memory/ai_automation_closure_20260429.md
@@ -28,6 +28,7 @@
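 - 2026-04-30 The OpenClaw embedding worker hit Hermes timeouts on the legacy `/api/embeddings` path; the Ollama client now prefers `/api/embed` and falls back to `/api/embeddings` only on older nodes.
 - 2026-04-30 `scheduler.py` still contained silent `except/pass` blocks; they now log at warning/debug level, and backup insight and Telegram notification failures are no longer silent.
 - 2026-04-30 `/metrics` now includes `momo_ai_*` zero-baseline series; after an app restart, Prometheus/Grafana can see the metric names even before any EventRouter / AutoHeal event.
+- 2026-04-30 ElephantAlpha NIM fallback now also covers timeouts, connection errors, and HTTP 408/409/425/429 and 500/502/503/504; when the primary model is temporarily stuck, the next `ELEPHANT_ALPHA_FALLBACK_MODELS` entry is tried.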

 ## Delivered Scope

@@ -68,6 +69,7 @@
 - 2026-04-30 Ollama embedding API migration: added `tests/test_ollama_embedding.py`.
 - 2026-04-30 Phase 3f cleanup contracts: `tests/test_phase3f_cleanup_contracts.py` covers orphan services, env examples, and silent scheduler exceptions.
 - 2026-04-30 AI metrics baseline: `tests/test_ai_automation_metrics.py` verifies that an event-free snapshot still exports the `momo_ai_*` baseline.
+- 2026-04-30 ElephantAlpha transient fallback: `tests/test_elephant_service.py` covers fallback on timeout, connection error, and HTTP 503, and no fallback on HTTP 400.
 - 2026-04-29 L2 security memory batch: `24 passed`.
 - collect-only: `48 tests collected`.
 - `git diff --check` passed.
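The scheduler change recorded above (silent `except/pass` replaced by warning/debug logs) follows a common best-effort cleanup pattern. The actual `scheduler.py` code is not part of this commit, so this is only an illustrative sketch: `remove_temp_file` and the `"scheduler"` logger name are hypothetical.

```python
import logging
import os

logger = logging.getLogger("scheduler")  # hypothetical logger name


def remove_temp_file(path: str) -> bool:
    """Hypothetical best-effort cleanup that logs instead of using `except: pass`."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        # Already gone: common and harmless, so debug level is enough.
        logger.debug("Temp file already removed: %s", path)
        return False
    except OSError as exc:
        # Unexpected failure (e.g. permissions): visible in logs, but the job keeps running.
        logger.warning("Failed to remove temp file %s: %s", path, exc)
        return False


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    remove_temp_file("does-not-exist-sketch.tmp")  # logs at DEBUG and returns False
```

Catching `FileNotFoundError` before the broader `OSError` keeps the expected case quiet while unexpected failures still surface as warnings.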
--- a/docs/memory/history_logs.md
+++ b/docs/memory/history_logs.md
@@ -41,6 +41,7 @@
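 - **Ollama embedding API migration**: the embedding client prefers the official `/api/embed` and falls back to `/api/embeddings` only on older nodes, reducing deprecated-endpoint and timeout risk.
 - **Scheduler exception logging hardening**: removed silent `except/pass` from `scheduler.py`; Chrome cleanup, optional EDM fields, and backup insight/Telegram failures all keep their logs.
 - **AI metrics baseline observability**: `/metrics` emits `momo_ai_*` zero-baseline series even before any AI automation event, so Grafana/Prometheus can still see the metric names after an app restart.
+- **ElephantAlpha transient fallback**: timeouts, connection errors, and HTTP 408/409/425/429 and 500/502/503/504 on the NVIDIA NIM primary model now try the next fallback model; HTTP 400 and other non-transient request errors are not retried.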

 ### 2026-04-28~29: Phase 3e refactoring blitz + hidden daily_sales cache bug eradicated
 - **app.py shrunk by 10.8%**: 7,386 → 6,590 lines; 11 commits, all green, zero 502s.
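The zero-baseline idea recorded in the history log (emit `momo_ai_*` series before any event exists) can be illustrated with the Prometheus text exposition format. This is a sketch under assumptions: the metric names and help texts below are hypothetical, because the real `momo_ai_*` series are not listed in this commit, and the project's actual `/metrics` code is not shown.

```python
from typing import Dict, Mapping

# Hypothetical names and help texts; the real momo_ai_* series are not listed in this commit.
BASELINE_COUNTERS: Dict[str, str] = {
    "momo_ai_events_total": "AI automation events routed by EventRouter",
    "momo_ai_autoheal_actions_total": "AutoHeal actions executed",
}


def render_ai_metrics(observed: Mapping[str, float]) -> str:
    """Render Prometheus text exposition, emitting 0 for series with no events yet."""
    lines = []
    for name, help_text in BASELINE_COUNTERS.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} counter")
        # Zero baseline: the series (and its name) exists even before the first event.
        lines.append(f"{name} {observed.get(name, 0)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Right after a restart nothing has been observed, yet every metric name is still scraped.
    print(render_ai_metrics({}), end="")
```

With `prometheus_client`, a label-less `Counter` is already exported at 0, but a labeled counter has no series until `.labels(...)` is called. Touching each expected label set once at startup is one common way to get the same effect.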
--- a/services/elephant_service.py
+++ b/services/elephant_service.py
@@ -38,6 +38,7 @@ ELEPHANT_FALLBACK_MODELS = [
    if model.strip()
 ]
 ELEPHANT_TIMEOUT = int(os.getenv('ELEPHANT_TIMEOUT', '120'))  # default: 2 minutes
+ELEPHANT_FALLBACK_HTTP_STATUS_CODES = {403, 404, 408, 409, 425, 429, 500, 502, 503, 504}

 # Elephant Alpha pricing (USD per 1M tokens), based on NVIDIA NIM pricing
 ELEPHANT_PRICING = {
@@ -115,6 +116,10 @@ class ElephantService:
                candidates.append(model_name)
        return candidates

+    @staticmethod
+    def _has_next_model(model_name: str, model_candidates: List[str]) -> bool:
+        return bool(model_candidates) and model_name != model_candidates[-1]
+
    def generate(self, prompt: str, model: str = None, 
                 system_prompt: str = None, temperature: float = 0.3,
                 json_mode: bool = False, timeout: int = None) -> ElephantResponse:
@@ -187,18 +192,26 @@ class ElephantService:

            except requests.HTTPError as e:
                status_code = e.response.status_code if e.response is not None else None
-                last_error = str(e)
-                if status_code in (404, 403) and model_name != model_candidates[-1]:
-                    logger.warning(f"[Elephant] Model unavailable, switching to fallback: {model_name} ({status_code})")
+                last_error = f"{model_name}: {e}"
+                if status_code in ELEPHANT_FALLBACK_HTTP_STATUS_CODES and self._has_next_model(model_name, model_candidates):
+                    logger.warning(f"[Elephant] NIM model/API temporarily unavailable, switching to fallback: {model_name} ({status_code})")
+                    continue
+                logger.error(f"[Elephant] Generation failed: {e}")
+                return ElephantResponse(success=False, content='', model=model_name, error=last_error)
+            except (requests.Timeout, requests.ConnectionError) as e:
+                last_error = f"{model_name}: {e}"
+                if self._has_next_model(model_name, model_candidates):
+                    logger.warning(f"[Elephant] Transient NIM connection error, switching to fallback: {model_name} ({e})")
                    continue
                logger.error(f"[Elephant] Generation failed: {e}")
                return ElephantResponse(success=False, content='', model=model_name, error=last_error)
            except Exception as e:
-                last_error = str(e)
+                last_error = f"{model_name}: {e}"
                logger.error(f"[Elephant] Generation failed: {e}")
                return ElephantResponse(success=False, content='', model=model_name, error=last_error)

-        return ElephantResponse(success=False, content='', model=primary_model, error=last_error or "All Elephant fallback models are unavailable")
+        failed_model = model_candidates[-1] if model_candidates else primary_model
+        return ElephantResponse(success=False, content='', model=failed_model, error=last_error or "All Elephant fallback models are unavailable")

 # Singleton instance
 elephant_service = ElephantService()
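The control flow of the patched `generate` loop can be modeled in isolation, without network access. This is a simplified sketch under stated assumptions, not the service code. `NimHTTPError`, `NimTransportError`, `Result`, `build_candidates`, `generate_with_fallback`, and `fake_nim` are hypothetical stand-ins for `requests.HTTPError`, `requests.Timeout`/`requests.ConnectionError`, `ElephantResponse`, and the candidate builder, whose full body is not visible in the hunk. The sketch shows why the `_has_next_model` check matters: the last candidate never `continue`s, so in this sketch the trailing return is reached only when the candidate list is empty.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Copied from ELEPHANT_FALLBACK_HTTP_STATUS_CODES in this commit.
FALLBACK_HTTP_STATUS_CODES = frozenset({403, 404, 408, 409, 425, 429, 500, 502, 503, 504})


class NimHTTPError(Exception):
    """Hypothetical stand-in for requests.HTTPError carrying a status code."""

    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


class NimTransportError(Exception):
    """Hypothetical stand-in for requests.Timeout / requests.ConnectionError."""


@dataclass
class Result:
    """Hypothetical, trimmed-down version of ElephantResponse."""
    success: bool
    content: str
    model: str
    error: Optional[str] = None


def build_candidates(primary: str, fallbacks_env: str) -> List[str]:
    # Primary model first, then comma-separated fallbacks; blanks and duplicates dropped.
    candidates: List[str] = []
    for name in [primary, *fallbacks_env.split(",")]:
        name = name.strip()
        if name and name not in candidates:
            candidates.append(name)
    return candidates


def generate_with_fallback(call: Callable[[str], str], candidates: List[str]) -> Result:
    last_error: Optional[str] = None
    for model in candidates:
        # Same check as _has_next_model: the last candidate never falls through.
        has_next = model != candidates[-1]
        try:
            return Result(True, call(model), model)
        except NimHTTPError as exc:
            last_error = f"{model}: {exc}"
            if exc.status_code in FALLBACK_HTTP_STATUS_CODES and has_next:
                continue
            return Result(False, "", model, last_error)  # e.g. HTTP 400 fails fast
        except NimTransportError as exc:
            last_error = f"{model}: {exc}"
            if has_next:
                continue
            return Result(False, "", model, last_error)
    # In this sketch, only reachable when the candidate list is empty.
    return Result(False, "", "", last_error or "no candidate models configured")


def fake_nim(model: str) -> str:
    # Demo transport: three failing models and one healthy one.
    if model == "nvidia/overloaded":
        raise NimHTTPError(503)
    if model == "nvidia/bad-request":
        raise NimHTTPError(400)
    if model == "nvidia/slow":
        raise NimTransportError("read timed out")
    return "OK"


if __name__ == "__main__":
    chain = build_candidates("nvidia/overloaded", "nvidia/slow, nvidia/available")
    print(generate_with_fallback(fake_nim, chain))
```

Deduplicating in `build_candidates` matters because the example `.env` in the setup guide lists the primary model again as the first fallback entry. Without deduplication, a comparison like `model != candidates[-1]` would be the only guard against calling the same model twice.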
--- a/tests/test_elephant_service.py
+++ b/tests/test_elephant_service.py
@@ -45,6 +45,113 @@ def test_elephant_service_falls_back_when_primary_model_is_unavailable(monkeypat
    assert calls == ["nvidia/unavailable", "nvidia/available"]


+def test_elephant_service_falls_back_when_primary_model_times_out(monkeypatch):
+    from services import elephant_service as module
+
+    calls = []
+
+    def fake_post(_url, json, headers, timeout):
+        calls.append(json["model"])
+        if json["model"] == "nvidia/slow":
+            raise requests.Timeout("read timed out")
+        return FakeResponse(
+            200,
+            {
+                "choices": [{"message": {"content": "Fallback OK"}}],
+                "usage": {"prompt_tokens": 4, "completion_tokens": 3},
+            },
+        )
+
+    monkeypatch.setattr(module, "ELEPHANT_FALLBACK_MODELS", ["nvidia/available"])
+    monkeypatch.setattr(module.requests, "post", fake_post)
+
+    service = module.ElephantService(api_key="test-key", model="nvidia/slow")
+    result = service.generate("hello", timeout=3)
+
+    assert result.success is True
+    assert result.model == "nvidia/available"
+    assert result.content == "Fallback OK"
+    assert calls == ["nvidia/slow", "nvidia/available"]
+
+
+def test_elephant_service_falls_back_when_primary_model_connection_fails(monkeypatch):
+    from services import elephant_service as module
+
+    calls = []
+
+    def fake_post(_url, json, headers, timeout):
+        calls.append(json["model"])
+        if json["model"] == "nvidia/disconnected":
+            raise requests.ConnectionError("connection reset")
+        return FakeResponse(
+            200,
+            {
+                "choices": [{"message": {"content": "Connected fallback"}}],
+                "usage": {},
+            },
+        )
+
+    monkeypatch.setattr(module, "ELEPHANT_FALLBACK_MODELS", ["nvidia/available"])
+    monkeypatch.setattr(module.requests, "post", fake_post)
+
+    service = module.ElephantService(api_key="test-key", model="nvidia/disconnected")
+    result = service.generate("hello")
+
+    assert result.success is True
+    assert result.model == "nvidia/available"
+    assert result.content == "Connected fallback"
+    assert calls == ["nvidia/disconnected", "nvidia/available"]
+
+
+def test_elephant_service_falls_back_on_transient_http_status(monkeypatch):
+    from services import elephant_service as module
+
+    calls = []
+
+    def fake_post(_url, json, headers, timeout):
+        calls.append(json["model"])
+        if json["model"] == "nvidia/overloaded":
+            return FakeResponse(503, {"detail": "temporarily unavailable"})
+        return FakeResponse(
+            200,
+            {
+                "choices": [{"message": {"content": "Recovered"}}],
+                "usage": {},
+            },
+        )
+
+    monkeypatch.setattr(module, "ELEPHANT_FALLBACK_MODELS", ["nvidia/available"])
+    monkeypatch.setattr(module.requests, "post", fake_post)
+
+    service = module.ElephantService(api_key="test-key", model="nvidia/overloaded")
+    result = service.generate("hello")
+
+    assert result.success is True
+    assert result.model == "nvidia/available"
+    assert result.content == "Recovered"
+    assert calls == ["nvidia/overloaded", "nvidia/available"]
+
+
+def test_elephant_service_does_not_fallback_on_non_transient_client_error(monkeypatch):
+    from services import elephant_service as module
+
+    calls = []
+
+    def fake_post(_url, json, headers, timeout):
+        calls.append(json["model"])
+        return FakeResponse(400, {"detail": "bad request"})
+
+    monkeypatch.setattr(module, "ELEPHANT_FALLBACK_MODELS", ["nvidia/available"])
+    monkeypatch.setattr(module.requests, "post", fake_post)
+
+    service = module.ElephantService(api_key="test-key", model="nvidia/bad-request")
+    result = service.generate("hello")
+
+    assert result.success is False
+    assert result.model == "nvidia/bad-request"
+    assert calls == ["nvidia/bad-request"]
+
+
 def test_elephant_service_uses_reasoning_content_when_content_is_empty(monkeypatch):
    from services import elephant_service as module