fix(ai): expand ElephantAlpha transient fallback
All checks were successful
CD Pipeline / deploy (push) Successful in 1m46s

OoO
2026-04-30 13:59:12 +08:00
parent 78ec7b5b08
commit 89e7f2ccd2
10 changed files with 146 additions and 16 deletions

@@ -2,7 +2,7 @@
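> This document defines the project's core development principles and the rules that must not be violated
> **Created**: 2026-01-12
> **Current version**: V10.19 (AI metrics baseline observability release)
> **Current version**: V10.20 (ElephantAlpha transient fallback release)
> **Last updated**: 2026-04-30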
---

@@ -28,6 +28,7 @@
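- Ollama embedding hardening: prefer `/api/embed`, fall back to `/api/embeddings` only on older nodes, and add `EMBEDDING_TIMEOUT`.
- Scheduler exception logging hardening: removed the silent `except/pass` blocks in `scheduler.py`; resource cleanup, EDM optional fields, and backup insight/notification failures now all emit diagnosable logs.
- AI metrics baseline observability: `/metrics` still emits `momo_ai_*` zero-baseline series when no AI automation events have occurred yet, so Grafana/Prometheus do not lose sight of the metric names after a restart.
- ElephantAlpha transient fallback: NVIDIA NIM timeouts, connection errors, 429, and 5xx responses try the next fallback model; non-transient request errors such as 400 are not retried.
[Next to-do]
- After Prometheus scrapes, watch whether the `momo_ai_*` baseline and non-baseline event series remain stable.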
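The zero-baseline idea noted above can be sketched roughly as follows. This is a minimal illustration, not the project's exporter: the metric names in `BASELINE_METRICS` and the `render_metrics` helper are hypothetical, since the source only states that the series share the `momo_ai_` prefix.

```python
from typing import Dict, Iterable

# Hypothetical metric names, chosen only for illustration.
BASELINE_METRICS = ("momo_ai_events_total", "momo_ai_autoheal_runs_total")


def render_metrics(counts: Dict[str, int],
                   baseline: Iterable[str] = BASELINE_METRICS) -> str:
    """Render counters in Prometheus text format, emitting 0 for unseen series."""
    merged = {name: 0 for name in baseline}  # start every known series at zero
    merged.update(counts)                    # observed values override the baseline
    lines = []
    for name in sorted(merged):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {merged[name]}")
    return "\n".join(lines) + "\n"


# Right after a restart there are no events yet, but the names are still exported.
print(render_metrics({}))
```

Seeding the series with zeros means a dashboard query can tell "no events yet" apart from "metric missing".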

app.py
@@ -95,8 +95,8 @@ except Exception as e:
sys_log.error(f"無法檢測磁碟空間: {e}")
# 🚩 System version definition (used for backups and display)
# 🚩 2026-04-30 V10.19: AI metrics zero-baseline export
SYSTEM_VERSION = "V10.19"
# 🚩 2026-04-30 V10.20: ElephantAlpha transient NIM fallback
SYSTEM_VERSION = "V10.20"
# ==========================================
# 🔒 SQL injection protection functions

@@ -254,7 +254,7 @@ YOUTUBE_API_KEY = os.getenv('YOUTUBE_API_KEY', '')
# ==========================================
# System version and paths
# ==========================================
SYSTEM_VERSION = "V10.19"
SYSTEM_VERSION = "V10.20"
LOG_FILE_PATH = os.path.join(BASE_DIR, 'logs/system.log')
public_url = PUBLIC_URL  # used for template display

@@ -2,7 +2,7 @@
> **Last updated**: 2026-04-30 (Taipei time)
> **Status**: 🟢 The four-AI-agent automation loop is in place: EventRouter / AutoHeal / OpenClaw Memory / ElephantAlpha bridge / Prometheus metrics / Smoke Dashboard / Smoke Trend Management / Telegram Summary / Grafana provisioning / Prometheus scrape / CD Gunicorn mount, all with test coverage
> **Applies to**: V10.19 AI metrics baseline observability
> **Applies to**: V10.20 ElephantAlpha transient fallback
---
@@ -73,7 +73,7 @@ SQL funnel (~300 rows)
- For `realtime_sales_monthly`, `/metrics` uses only a raw `SELECT COUNT(*)` to get the total row count, so ORM schema drift does not produce warnings during Prometheus scrapes.
- `momo-app` must bind mount `./gunicorn.conf.py:/app/gunicorn.conf.py:ro` so that the Gunicorn runtime configuration stays consistent with the repo after a CD sync/rebuild.
- CD rebuild mode must first build the image successfully, then briefly stop/rm/recreate the three application containers, to avoid prolonged 502s caused by a no-cache build.
- ElephantAlpha uses the NVIDIA NIM hosted API; the production default model is `nvidia/llama-3.3-nemotron-super-49b-v1.5`; `ELEPHANT_ALPHA_FALLBACK_MODELS` must keep at least one callable backup.
- ElephantAlpha uses the NVIDIA NIM hosted API; the production default model is `nvidia/llama-3.3-nemotron-super-49b-v1.5`; `ELEPHANT_ALPHA_FALLBACK_MODELS` must keep at least one callable backup; 403/404, 408/409/425/429, 5xx, timeouts, and connection errors must try the next model.
- OpenClaw/Hermes embedding calls Ollama `/api/embed` first and falls back to `/api/embeddings` only when an older node does not support it; the timeout is controlled by `EMBEDDING_TIMEOUT` / `OLLAMA_EMBED_TIMEOUT`.
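The embedding rule in the last bullet could look roughly like the sketch below. It is an illustration under stated assumptions rather than the project's client: `embed_text` and the injected `post` callable are hypothetical, and treating HTTP 404 as the signal that an older node lacks `/api/embed` is an assumption. The request and response shapes follow Ollama's documented API (`input` → `embeddings` for `/api/embed`, `prompt` → `embedding` for `/api/embeddings`).

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical transport hook: post(path, payload) -> (status_code, json_body).
Post = Callable[[str, Dict[str, Any]], Tuple[int, Dict[str, Any]]]


def embed_text(post: Post, model: str, text: str) -> List[float]:
    """Prefer /api/embed; fall back to the legacy endpoint only on 404 (assumed signal)."""
    status, body = post("/api/embed", {"model": model, "input": text})
    if status == 200:
        # /api/embed returns a list of vectors, one per input.
        return body["embeddings"][0]
    if status == 404:
        # Older nodes: the legacy endpoint takes "prompt" and returns one vector.
        status, body = post("/api/embeddings", {"model": model, "prompt": text})
        if status == 200:
            return body["embedding"]
    raise RuntimeError(f"Ollama embedding failed with HTTP {status}")
```

Injecting the transport keeps the sketch testable without a running Ollama node; a real client would also apply the `EMBEDDING_TIMEOUT` value to each request.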
---

@@ -47,21 +47,27 @@ Elephant Alpha (Super Orchestrator)
cp .env.example .env
```
2. **Configure OpenRouter API:**
2. **Configure NVIDIA NIM API:**
```bash
# Get API key from https://openrouter.ai/keys
export OPENROUTER_API_KEY="sk-or-v1-your-api-key"
# Get API key from NVIDIA NIM / build.nvidia.com
export NVIDIA_API_KEY="nvapi-your-api-key"
```
3. **Update .env file:**
```env
# Elephant Alpha Configuration
OPENROUTER_API_KEY=sk-or-v1-your-openrouter-api-key-here
ELEPHANT_ALPHA_MODEL=openrouter/elephant-alpha
NVIDIA_API_KEY=nvapi-your-nvidia-api-key-here
ELEPHANT_ALPHA_NEMOTRON_NIM_ENDPOINT=https://integrate.api.nvidia.com/v1
ELEPHANT_ALPHA_URL=https://integrate.api.nvidia.com/v1/chat/completions
ELEPHANT_ALPHA_MODEL=nvidia/llama-3.3-nemotron-super-49b-v1.5
ELEPHANT_ALPHA_FALLBACK_MODELS=nvidia/llama-3.3-nemotron-super-49b-v1.5,nvidia/llama-3.1-nemotron-70b-instruct,meta/llama-3.1-8b-instruct
ELEPHANT_TIMEOUT=120
ELEPHANT_ALPHA_CONFIDENCE_THRESHOLD=0.7
ELEPHANT_ALPHA_MAX_AUTONOMOUS_DECISIONS_PER_HOUR=10
```
Runtime fallback rule: ElephantService tries the next `ELEPHANT_ALPHA_FALLBACK_MODELS` entry when NVIDIA NIM returns 403/404, transient 408/409/425/429, or 500/502/503/504, or when the request times out or hits a connection error. Non-transient client errors such as HTTP 400 fail fast so bad requests do not burn quota across all models.
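A minimal sketch of the rule above, assuming a simplified error type in place of the `requests` exceptions the service actually handles. `ModelCallError`, `build_candidates`, `should_fall_back`, and `generate_with_fallback` are hypothetical names used only for illustration, and the de-duplication of the candidate list is an assumption; only the status-code set is taken from this commit.

```python
from typing import Callable, List, Optional, Tuple

# Mirrors ELEPHANT_FALLBACK_HTTP_STATUS_CODES from this commit.
FALLBACK_STATUS_CODES = {403, 404, 408, 409, 425, 429, 500, 502, 503, 504}


class ModelCallError(Exception):
    """Hypothetical error: status_code is None for timeouts and connection errors."""

    def __init__(self, message: str, status_code: Optional[int] = None):
        super().__init__(message)
        self.status_code = status_code


def build_candidates(primary: str, fallback_csv: str) -> List[str]:
    """Primary model first, then fallback entries, skipping blanks and duplicates."""
    candidates = [primary]
    for name in fallback_csv.split(","):
        name = name.strip()
        if name and name not in candidates:
            candidates.append(name)
    return candidates


def should_fall_back(error: ModelCallError) -> bool:
    """Connection-level failures and listed HTTP statuses may move to the next model."""
    return error.status_code is None or error.status_code in FALLBACK_STATUS_CODES


def generate_with_fallback(candidates: List[str],
                           call_model: Callable[[str], str]) -> Tuple[bool, str, str]:
    """Return (success, model, content_or_error), trying candidates in order."""
    failed_model, last_error = "", "no model candidates configured"
    for index, model in enumerate(candidates):
        try:
            return True, model, call_model(model)
        except ModelCallError as error:
            failed_model, last_error = model, f"{model}: {error}"
            has_next = index < len(candidates) - 1
            if not (should_fall_back(error) and has_next):
                break  # e.g. HTTP 400: fail fast instead of burning quota
    return False, failed_model, last_error
```

With the `.env` values shown earlier, `build_candidates` would yield the primary model once, followed by the two remaining fallback models.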
### Step 2: Install Dependencies
```bash

@@ -28,6 +28,7 @@
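- 2026-04-30 The OpenClaw embedding worker hit Hermes timeouts on the legacy `/api/embeddings` path; the Ollama client now prefers `/api/embed` and falls back to `/api/embeddings` only on older nodes.
- 2026-04-30 `scheduler.py` still contained silent `except/pass` blocks; they now emit warning/debug logs, and backup insight and Telegram notification failures are no longer silent.
- 2026-04-30 `/metrics` now includes `momo_ai_*` zero-baseline series; after an app restart, Prometheus/Grafana can see the metric names even before any EventRouter / AutoHeal events occur.
- 2026-04-30 ElephantAlpha NIM fallback now extends to timeouts, connection errors, 429, and 5xx; when the primary model is temporarily stuck, the next `ELEPHANT_ALPHA_FALLBACK_MODELS` entry is tried.
## Delivered scope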
@@ -68,6 +69,7 @@
- 2026-04-30 Ollama embedding API migration: added `tests/test_ollama_embedding.py`.
- 2026-04-30 Phase 3f cleanup contracts: `tests/test_phase3f_cleanup_contracts.py` covers orphan services, env examples, and silent scheduler exceptions.
- 2026-04-30 AI metrics baseline: `tests/test_ai_automation_metrics.py` verifies that a snapshot with no events still exports the `momo_ai_*` baseline.
- 2026-04-30 ElephantAlpha transient fallback: `tests/test_elephant_service.py` covers timeout, 503 fallback, and no fallback on 400.
- 2026-04-29 L2 secure memory batch: `24 passed`.
- collect-only: `48 tests collected`.
- `git diff --check` passed.

@@ -41,6 +41,7 @@
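- **Ollama embedding API migration**: the embedding client prefers the official `/api/embed` and falls back to `/api/embeddings` only on older nodes, reducing deprecated-endpoint and timeout risk.
- **Scheduler exception logging hardening**: removed the silent `except/pass` blocks in `scheduler.py`; Chrome cleanup, EDM optional fields, and backup insight/Telegram failures all keep their logs.
- **AI metrics baseline observability**: `/metrics` still emits `momo_ai_*` zero-baseline series when no AI automation events have occurred yet, so Grafana/Prometheus do not lose sight of the metric names after an app restart.
- **ElephantAlpha transient fallback**: when the NVIDIA NIM primary model hits a timeout, connection error, 429, or 5xx, the next fallback model is tried; non-transient request errors such as 400 are not retried.
### 2026-04-28~29: Phase 3e refactoring campaign + root-cause fix for the hidden daily_sales cache bug
- **app.py reduced by 10.8%**: 7,386 → 6,590 lines; 11 commits, all green, zero 502s.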

@@ -38,6 +38,7 @@ ELEPHANT_FALLBACK_MODELS = [
    if model.strip()
]
ELEPHANT_TIMEOUT = int(os.getenv('ELEPHANT_TIMEOUT', '120'))  # default: 2 minutes
ELEPHANT_FALLBACK_HTTP_STATUS_CODES = {403, 404, 408, 409, 425, 429, 500, 502, 503, 504}

# Elephant Alpha pricing (USD per 1M tokens) - NVIDIA NIM pricing
ELEPHANT_PRICING = {
@@ -115,6 +116,10 @@ class ElephantService:
                candidates.append(model_name)
        return candidates

    @staticmethod
    def _has_next_model(model_name: str, model_candidates: List[str]) -> bool:
        return bool(model_candidates) and model_name != model_candidates[-1]

    def generate(self, prompt: str, model: str = None,
                 system_prompt: str = None, temperature: float = 0.3,
                 json_mode: bool = False, timeout: int = None) -> ElephantResponse:
@@ -187,18 +192,26 @@ class ElephantService:
            except requests.HTTPError as e:
                status_code = e.response.status_code if e.response is not None else None
                last_error = str(e)
                if status_code in (404, 403) and model_name != model_candidates[-1]:
                    logger.warning(f"[Elephant] 模型不可用,改用 fallback: {model_name} ({status_code})")
                last_error = f"{model_name}: {e}"
                if status_code in ELEPHANT_FALLBACK_HTTP_STATUS_CODES and self._has_next_model(model_name, model_candidates):
                    logger.warning(f"[Elephant] NIM 模型/API 暫時不可用,改用 fallback: {model_name} ({status_code})")
                    continue
                logger.error(f"[Elephant] 生成失敗: {e}")
                return ElephantResponse(success=False, content='', model=model_name, error=last_error)
            except (requests.Timeout, requests.ConnectionError) as e:
                last_error = f"{model_name}: {e}"
                if self._has_next_model(model_name, model_candidates):
                    logger.warning(f"[Elephant] NIM 暫時性連線錯誤,改用 fallback: {model_name} ({e})")
                    continue
                logger.error(f"[Elephant] 生成失敗: {e}")
                return ElephantResponse(success=False, content='', model=model_name, error=last_error)
            except Exception as e:
                last_error = str(e)
                last_error = f"{model_name}: {e}"
                logger.error(f"[Elephant] 生成失敗: {e}")
                return ElephantResponse(success=False, content='', model=model_name, error=last_error)

        return ElephantResponse(success=False, content='', model=primary_model, error=last_error or "所有 Elephant fallback model 均不可用")
        failed_model = model_candidates[-1] if model_candidates else primary_model
        return ElephantResponse(success=False, content='', model=failed_model, error=last_error or "所有 Elephant fallback model 均不可用")
# Singleton instance
elephant_service = ElephantService()

@@ -45,6 +45,113 @@ def test_elephant_service_falls_back_when_primary_model_is_unavailable(monkeypat
    assert calls == ["nvidia/unavailable", "nvidia/available"]


def test_elephant_service_falls_back_when_primary_model_times_out(monkeypatch):
    from services import elephant_service as module

    calls = []

    def fake_post(_url, json, headers, timeout):
        calls.append(json["model"])
        if json["model"] == "nvidia/slow":
            raise requests.Timeout("read timed out")
        return FakeResponse(
            200,
            {
                "choices": [{"message": {"content": "Fallback OK"}}],
                "usage": {"prompt_tokens": 4, "completion_tokens": 3},
            },
        )

    monkeypatch.setattr(module, "ELEPHANT_FALLBACK_MODELS", ["nvidia/available"])
    monkeypatch.setattr(module.requests, "post", fake_post)

    service = module.ElephantService(api_key="test-key", model="nvidia/slow")
    result = service.generate("hello", timeout=3)

    assert result.success is True
    assert result.model == "nvidia/available"
    assert result.content == "Fallback OK"
    assert calls == ["nvidia/slow", "nvidia/available"]


def test_elephant_service_falls_back_when_primary_model_connection_fails(monkeypatch):
    from services import elephant_service as module

    calls = []

    def fake_post(_url, json, headers, timeout):
        calls.append(json["model"])
        if json["model"] == "nvidia/disconnected":
            raise requests.ConnectionError("connection reset")
        return FakeResponse(
            200,
            {
                "choices": [{"message": {"content": "Connected fallback"}}],
                "usage": {},
            },
        )

    monkeypatch.setattr(module, "ELEPHANT_FALLBACK_MODELS", ["nvidia/available"])
    monkeypatch.setattr(module.requests, "post", fake_post)

    service = module.ElephantService(api_key="test-key", model="nvidia/disconnected")
    result = service.generate("hello")

    assert result.success is True
    assert result.model == "nvidia/available"
    assert result.content == "Connected fallback"
    assert calls == ["nvidia/disconnected", "nvidia/available"]


def test_elephant_service_falls_back_on_transient_http_status(monkeypatch):
    from services import elephant_service as module

    calls = []

    def fake_post(_url, json, headers, timeout):
        calls.append(json["model"])
        if json["model"] == "nvidia/overloaded":
            return FakeResponse(503, {"detail": "temporarily unavailable"})
        return FakeResponse(
            200,
            {
                "choices": [{"message": {"content": "Recovered"}}],
                "usage": {},
            },
        )

    monkeypatch.setattr(module, "ELEPHANT_FALLBACK_MODELS", ["nvidia/available"])
    monkeypatch.setattr(module.requests, "post", fake_post)

    service = module.ElephantService(api_key="test-key", model="nvidia/overloaded")
    result = service.generate("hello")

    assert result.success is True
    assert result.model == "nvidia/available"
    assert result.content == "Recovered"
    assert calls == ["nvidia/overloaded", "nvidia/available"]


def test_elephant_service_does_not_fallback_on_non_transient_client_error(monkeypatch):
    from services import elephant_service as module

    calls = []

    def fake_post(_url, json, headers, timeout):
        calls.append(json["model"])
        return FakeResponse(400, {"detail": "bad request"})

    monkeypatch.setattr(module, "ELEPHANT_FALLBACK_MODELS", ["nvidia/available"])
    monkeypatch.setattr(module.requests, "post", fake_post)

    service = module.ElephantService(api_key="test-key", model="nvidia/bad-request")
    result = service.generate("hello")

    assert result.success is False
    assert result.model == "nvidia/bad-request"
    assert calls == ["nvidia/bad-request"]


def test_elephant_service_uses_reasoning_content_when_content_is_empty(monkeypatch):
    from services import elephant_service as module