fix(ai): broaden ElephantAlpha transient fallback
All checks were successful
CD Pipeline / deploy (push) Successful in 1m46s
@@ -2,7 +2,7 @@

 > This document defines the project's core development principles and non-negotiable rules
 > **Created**: 2026-01-12
-> **Current version**: V10.19 (AI metrics baseline observability release)
+> **Current version**: V10.20 (ElephantAlpha transient fallback release)
 > **Last updated**: 2026-04-30

 ---

@@ -28,6 +28,7 @@
 - Ollama embedding hardening: prefer `/api/embed`, falling back to `/api/embeddings` only on older nodes; added `EMBEDDING_TIMEOUT`.
 - Scheduler exception logging hardening: removed the silent `except/pass` blocks in `scheduler.py`; resource cleanup, optional EDM fields, and backup insight/notification failures now all emit diagnosable logs.
 - AI metrics baseline observability: `/metrics` exports `momo_ai_*` zero-baseline series even before any AI automation event occurs, so Grafana/Prometheus can still see the metric names after a restart.
+- ElephantAlpha transient fallback: NVIDIA NIM timeouts, connection errors, 429, and 5xx responses now try the next fallback model; non-transient request errors such as 400 are not retried.

 【Next TODO】
 - Watch whether the `momo_ai_*` baseline and non-baseline event series stay stable across Prometheus scrapes.

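The zero-baseline behavior described in the `/metrics` bullet above can be sketched with the standard library alone: every known series is rendered in Prometheus text exposition format, and any series with no recorded event falls back to 0, so the metric names exist from the first scrape after a restart. This is a minimal sketch under stated assumptions: the metric names (`momo_ai_events_total`, `momo_ai_autoheal_actions_total`) and the `render_metrics` helper are hypothetical, since this commit does not show the real `momo_ai_*` names or the `/metrics` handler.

```python
from typing import Dict, Mapping, Tuple

# Hypothetical metric catalogue: name -> (Prometheus type, help text).
# The real momo_ai_* series names are not part of this commit.
BASELINE_METRICS: Dict[str, Tuple[str, str]] = {
    "momo_ai_events_total": ("counter", "AI automation events routed"),
    "momo_ai_autoheal_actions_total": ("counter", "AutoHeal actions executed"),
}


def render_metrics(snapshot: Mapping[str, float]) -> str:
    """Render Prometheus text exposition, emitting 0 for series with no events yet."""
    lines = []
    for name, (metric_type, help_text) in BASELINE_METRICS.items():
        value = snapshot.get(name, 0)  # zero baseline when nothing was recorded
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {metric_type}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Right after a restart the snapshot is empty, yet every metric name is exported.
    print(render_metrics({}), end="")
```

Declaring the catalogue up front, instead of deriving names from whichever events happen to exist, is what keeps the names visible before the first event; once events arrive, the snapshot value simply replaces the zero.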
4
app.py
@@ -95,8 +95,8 @@ except Exception as e:
     sys_log.error(f"無法檢測磁碟空間: {e}")

 # 🚩 System version definition (used for backups and display)
-# 🚩 2026-04-30 V10.19: AI metrics zero-baseline export
-SYSTEM_VERSION = "V10.19"
+# 🚩 2026-04-30 V10.20: ElephantAlpha transient NIM fallback
+SYSTEM_VERSION = "V10.20"

 # ==========================================
 # 🔒 SQL injection protection functions

@@ -254,7 +254,7 @@ YOUTUBE_API_KEY = os.getenv('YOUTUBE_API_KEY', '')
 # ==========================================
 # System version and paths
 # ==========================================
-SYSTEM_VERSION = "V10.19"
+SYSTEM_VERSION = "V10.20"
 LOG_FILE_PATH = os.path.join(BASE_DIR, 'logs/system.log')
 public_url = PUBLIC_URL  # used for template display


@@ -2,7 +2,7 @@

 > **Last updated**: 2026-04-30 (Taipei time)
 > **Status**: 🟢 The four-AI-Agent automation closed loop is live: EventRouter / AutoHeal / OpenClaw Memory / ElephantAlpha bridge / Prometheus metrics / Smoke Dashboard / Smoke Trend Management / Telegram Summary / Grafana provisioning / Prometheus scrape / CD Gunicorn mount, all with test coverage
-> **Applicable version**: V10.19 AI metrics baseline observability release
+> **Applicable version**: V10.20 ElephantAlpha transient fallback release

 ---

@@ -73,7 +73,7 @@ SQL funnel (~300 rows)
 - `/metrics` counts `realtime_sales_monthly` rows with a raw `SELECT COUNT(*)` only, so ORM schema drift cannot make the Prometheus scrape emit warnings.
 - `momo-app` must bind-mount `./gunicorn.conf.py:/app/gunicorn.conf.py:ro` so the Gunicorn runtime config stays consistent with the repo after a CD sync/rebuild.
 - In CD rebuild mode, the image must build successfully before the three application containers are briefly stopped, removed, and recreated; this avoids long 502 windows during no-cache builds.
-- ElephantAlpha uses the NVIDIA NIM hosted API; the production default model is `nvidia/llama-3.3-nemotron-super-49b-v1.5`, and `ELEPHANT_ALPHA_FALLBACK_MODELS` must keep at least one callable backup.
+- ElephantAlpha uses the NVIDIA NIM hosted API; the production default model is `nvidia/llama-3.3-nemotron-super-49b-v1.5`, and `ELEPHANT_ALPHA_FALLBACK_MODELS` must keep at least one callable backup; 403/404, 408/409/425/429, 5xx (500/502/503/504), timeouts, and connection errors must try the next model.
 - OpenClaw/Hermes embedding calls Ollama `/api/embed` first and falls back to `/api/embeddings` only when an older node does not support it; the timeout is controlled by `EMBEDDING_TIMEOUT` / `OLLAMA_EMBED_TIMEOUT`.

 ---

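The ElephantAlpha fallback rule in the bullets above can be sketched as a loop over model candidates that classifies each failure. This is a hedged, simplified illustration, not the `ElephantService.generate` code: `send` is a hypothetical callable standing in for the NIM request, `HTTPStatusError` is a hypothetical exception carrying the HTTP status, and Python's built-in `TimeoutError` / `ConnectionError` stand in for `requests.Timeout` / `requests.ConnectionError`. Only the status-code set is copied from `ELEPHANT_FALLBACK_HTTP_STATUS_CODES` in this commit.

```python
from typing import Callable, List, Tuple

# Copied from ELEPHANT_FALLBACK_HTTP_STATUS_CODES in this commit.
FALLBACK_STATUS_CODES = {403, 404, 408, 409, 425, 429, 500, 502, 503, 504}


class HTTPStatusError(Exception):
    """Hypothetical error carrying the HTTP status of a failed NIM call."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def generate_with_fallback(
    models: List[str], send: Callable[[str], str]
) -> Tuple[bool, str, str]:
    """Try each model in order; return (success, model, content or error)."""
    for index, model in enumerate(models):
        has_next = index < len(models) - 1
        try:
            return True, model, send(model)
        except HTTPStatusError as e:
            error = f"{model}: {e}"
            if e.status_code in FALLBACK_STATUS_CODES and has_next:
                continue  # model unavailable or transient status: try the next one
            return False, model, error  # e.g. HTTP 400: fail fast, no retry
        except (TimeoutError, ConnectionError) as e:
            error = f"{model}: {e}"
            if has_next:
                continue  # transient network failure: try the next one
            return False, model, error
    # Only reached when no model is configured at all.
    return False, "", "no model configured"


if __name__ == "__main__":
    def fake_send(model: str) -> str:
        # Hypothetical stand-in: the primary model is overloaded.
        if model == "nvidia/overloaded":
            raise HTTPStatusError(503)
        return "Recovered"

    print(generate_with_fallback(["nvidia/overloaded", "nvidia/available"], fake_send))
```

Computing `has_next` from the loop index keeps the check correct even when a model name appears more than once in the list, whereas a name comparison like `_has_next_model` would treat an earlier duplicate of the last entry as the final candidate.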
@@ -47,21 +47,27 @@ Elephant Alpha (Super Orchestrator)
 cp .env.example .env
 ```

-2. **Configure OpenRouter API:**
+2. **Configure NVIDIA NIM API:**
 ```bash
-# Get API key from https://openrouter.ai/keys
-export OPENROUTER_API_KEY="sk-or-v1-your-api-key"
+# Get API key from NVIDIA NIM / build.nvidia.com
+export NVIDIA_API_KEY="nvapi-your-api-key"
 ```

 3. **Update .env file:**
 ```env
 # Elephant Alpha Configuration
-OPENROUTER_API_KEY=sk-or-v1-your-openrouter-api-key-here
-ELEPHANT_ALPHA_MODEL=openrouter/elephant-alpha
+NVIDIA_API_KEY=nvapi-your-nvidia-api-key-here
+ELEPHANT_ALPHA_NEMOTRON_NIM_ENDPOINT=https://integrate.api.nvidia.com/v1
+ELEPHANT_ALPHA_URL=https://integrate.api.nvidia.com/v1/chat/completions
+ELEPHANT_ALPHA_MODEL=nvidia/llama-3.3-nemotron-super-49b-v1.5
+ELEPHANT_ALPHA_FALLBACK_MODELS=nvidia/llama-3.3-nemotron-super-49b-v1.5,nvidia/llama-3.1-nemotron-70b-instruct,meta/llama-3.1-8b-instruct
 ELEPHANT_TIMEOUT=120
 ELEPHANT_ALPHA_CONFIDENCE_THRESHOLD=0.7
 ELEPHANT_ALPHA_MAX_AUTONOMOUS_DECISIONS_PER_HOUR=10
 ```

+Runtime fallback rule: ElephantService tries the next `ELEPHANT_ALPHA_FALLBACK_MODELS` entry when an NVIDIA NIM call returns HTTP 403/404, a transient 408/409/425/429, or a 5xx (500/502/503/504), or when it fails with a timeout or connection error. Non-transient client errors such as HTTP 400 fail fast, so a bad request does not burn quota across every model.
+
 ### Step 2: Install Dependencies

 ```bash

@@ -28,6 +28,7 @@
 - 2026-04-30 The OpenClaw embedding worker hit Hermes timeouts on the legacy `/api/embeddings` path; the Ollama client now prefers `/api/embed` and falls back to `/api/embeddings` only on older nodes.
 - 2026-04-30 `scheduler.py` still contained silent `except/pass` blocks; they now log at warning/debug level, and backup insight and Telegram notification failures are no longer silent.
 - 2026-04-30 `/metrics` now includes `momo_ai_*` zero-baseline series; after an app restart, Prometheus/Grafana can see the metric names even before any EventRouter / AutoHeal event occurs.
+- 2026-04-30 ElephantAlpha NIM fallback now also covers timeouts, connection errors, 429, and 5xx; when the primary model is temporarily stuck, the next `ELEPHANT_ALPHA_FALLBACK_MODELS` entry is tried.

 ## Delivered scope

@@ -68,6 +69,7 @@
 - 2026-04-30 Ollama embedding API migration: added `tests/test_ollama_embedding.py`.
 - 2026-04-30 Phase 3f cleanup contracts: `tests/test_phase3f_cleanup_contracts.py` covers orphan services, env examples, and silent scheduler exceptions.
 - 2026-04-30 AI metrics baseline: `tests/test_ai_automation_metrics.py` covers exporting the `momo_ai_*` baseline from a snapshot with no events.
+- 2026-04-30 ElephantAlpha transient fallback: `tests/test_elephant_service.py` covers fallback on timeout, connection error, and 503, and no fallback on 400.
 - 2026-04-29 L2 secure-memory batch: `24 passed`.
 - collect-only: `48 tests collected`.
 - `git diff --check` passed.

@@ -41,6 +41,7 @@
 - **Ollama embedding API migration**: the embedding client prefers the official `/api/embed` and falls back to `/api/embeddings` only on older nodes, lowering deprecated-endpoint and timeout risk.
 - **Scheduler exception logging hardening**: removed silent `except/pass` from `scheduler.py`; Chrome cleanup, optional EDM fields, and backup insight/Telegram failures all keep their logs.
 - **AI metrics baseline observability**: `/metrics` exports `momo_ai_*` zero-baseline series even before any AI automation event occurs, so Grafana/Prometheus can still see the metric names after an app restart.
+- **ElephantAlpha transient fallback**: NVIDIA NIM primary-model timeouts, connection errors, 429, and 5xx try the next fallback model; non-transient request errors such as 400 are not retried.

 ### 2026-04-28~29: Phase 3e refactoring marathon + eradicating the hidden daily_sales cache bug
 - **app.py reduced by 10.8%**: 7,386 → 6,590 lines; 11 commits, all green, zero 502s.

@@ -38,6 +38,7 @@ ELEPHANT_FALLBACK_MODELS = [
     if model.strip()
 ]
 ELEPHANT_TIMEOUT = int(os.getenv('ELEPHANT_TIMEOUT', '120'))  # default: 2 minutes
+ELEPHANT_FALLBACK_HTTP_STATUS_CODES = {403, 404, 408, 409, 425, 429, 500, 502, 503, 504}

 # Elephant Alpha pricing (USD per 1M tokens), NVIDIA NIM pricing
 ELEPHANT_PRICING = {

@@ -115,6 +116,10 @@ class ElephantService:
                 candidates.append(model_name)
         return candidates

+    @staticmethod
+    def _has_next_model(model_name: str, model_candidates: List[str]) -> bool:
+        return bool(model_candidates) and model_name != model_candidates[-1]
+
     def generate(self, prompt: str, model: str = None,
                  system_prompt: str = None, temperature: float = 0.3,
                  json_mode: bool = False, timeout: int = None) -> ElephantResponse:

@@ -187,18 +192,26 @@ class ElephantService:

             except requests.HTTPError as e:
                 status_code = e.response.status_code if e.response is not None else None
-                last_error = str(e)
-                if status_code in (404, 403) and model_name != model_candidates[-1]:
-                    logger.warning(f"[Elephant] 模型不可用,改用 fallback: {model_name} ({status_code})")
+                last_error = f"{model_name}: {e}"
+                if status_code in ELEPHANT_FALLBACK_HTTP_STATUS_CODES and self._has_next_model(model_name, model_candidates):
+                    logger.warning(f"[Elephant] NIM 模型/API 暫時不可用,改用 fallback: {model_name} ({status_code})")
                     continue
                 logger.error(f"[Elephant] 生成失敗: {e}")
                 return ElephantResponse(success=False, content='', model=model_name, error=last_error)
+            except (requests.Timeout, requests.ConnectionError) as e:
+                last_error = f"{model_name}: {e}"
+                if self._has_next_model(model_name, model_candidates):
+                    logger.warning(f"[Elephant] NIM 暫時性連線錯誤,改用 fallback: {model_name} ({e})")
+                    continue
+                logger.error(f"[Elephant] 生成失敗: {e}")
+                return ElephantResponse(success=False, content='', model=model_name, error=last_error)
             except Exception as e:
-                last_error = str(e)
+                last_error = f"{model_name}: {e}"
                 logger.error(f"[Elephant] 生成失敗: {e}")
                 return ElephantResponse(success=False, content='', model=model_name, error=last_error)

-        return ElephantResponse(success=False, content='', model=primary_model, error=last_error or "所有 Elephant fallback model 均不可用")
+        failed_model = model_candidates[-1] if model_candidates else primary_model
+        return ElephantResponse(success=False, content='', model=failed_model, error=last_error or "所有 Elephant fallback model 均不可用")

 # Singleton instance
 elephant_service = ElephantService()

@@ -45,6 +45,113 @@ def test_elephant_service_falls_back_when_primary_model_is_unavailable(monkeypat
     assert calls == ["nvidia/unavailable", "nvidia/available"]


+def test_elephant_service_falls_back_when_primary_model_times_out(monkeypatch):
+    from services import elephant_service as module
+
+    calls = []
+
+    def fake_post(_url, json, headers, timeout):
+        calls.append(json["model"])
+        if json["model"] == "nvidia/slow":
+            raise requests.Timeout("read timed out")
+        return FakeResponse(
+            200,
+            {
+                "choices": [{"message": {"content": "Fallback OK"}}],
+                "usage": {"prompt_tokens": 4, "completion_tokens": 3},
+            },
+        )
+
+    monkeypatch.setattr(module, "ELEPHANT_FALLBACK_MODELS", ["nvidia/available"])
+    monkeypatch.setattr(module.requests, "post", fake_post)
+
+    service = module.ElephantService(api_key="test-key", model="nvidia/slow")
+    result = service.generate("hello", timeout=3)
+
+    assert result.success is True
+    assert result.model == "nvidia/available"
+    assert result.content == "Fallback OK"
+    assert calls == ["nvidia/slow", "nvidia/available"]
+
+
+def test_elephant_service_falls_back_when_primary_model_connection_fails(monkeypatch):
+    from services import elephant_service as module
+
+    calls = []
+
+    def fake_post(_url, json, headers, timeout):
+        calls.append(json["model"])
+        if json["model"] == "nvidia/disconnected":
+            raise requests.ConnectionError("connection reset")
+        return FakeResponse(
+            200,
+            {
+                "choices": [{"message": {"content": "Connected fallback"}}],
+                "usage": {},
+            },
+        )
+
+    monkeypatch.setattr(module, "ELEPHANT_FALLBACK_MODELS", ["nvidia/available"])
+    monkeypatch.setattr(module.requests, "post", fake_post)
+
+    service = module.ElephantService(api_key="test-key", model="nvidia/disconnected")
+    result = service.generate("hello")
+
+    assert result.success is True
+    assert result.model == "nvidia/available"
+    assert result.content == "Connected fallback"
+    assert calls == ["nvidia/disconnected", "nvidia/available"]
+
+
+def test_elephant_service_falls_back_on_transient_http_status(monkeypatch):
+    from services import elephant_service as module
+
+    calls = []
+
+    def fake_post(_url, json, headers, timeout):
+        calls.append(json["model"])
+        if json["model"] == "nvidia/overloaded":
+            return FakeResponse(503, {"detail": "temporarily unavailable"})
+        return FakeResponse(
+            200,
+            {
+                "choices": [{"message": {"content": "Recovered"}}],
+                "usage": {},
+            },
+        )
+
+    monkeypatch.setattr(module, "ELEPHANT_FALLBACK_MODELS", ["nvidia/available"])
+    monkeypatch.setattr(module.requests, "post", fake_post)
+
+    service = module.ElephantService(api_key="test-key", model="nvidia/overloaded")
+    result = service.generate("hello")
+
+    assert result.success is True
+    assert result.model == "nvidia/available"
+    assert result.content == "Recovered"
+    assert calls == ["nvidia/overloaded", "nvidia/available"]
+
+
+def test_elephant_service_does_not_fallback_on_non_transient_client_error(monkeypatch):
+    from services import elephant_service as module
+
+    calls = []
+
+    def fake_post(_url, json, headers, timeout):
+        calls.append(json["model"])
+        return FakeResponse(400, {"detail": "bad request"})
+
+    monkeypatch.setattr(module, "ELEPHANT_FALLBACK_MODELS", ["nvidia/available"])
+    monkeypatch.setattr(module.requests, "post", fake_post)
+
+    service = module.ElephantService(api_key="test-key", model="nvidia/bad-request")
+    result = service.generate("hello")
+
+    assert result.success is False
+    assert result.model == "nvidia/bad-request"
+    assert calls == ["nvidia/bad-request"]
+
+
 def test_elephant_service_uses_reasoning_content_when_content_is_empty(monkeypatch):
     from services import elephant_service as module
