diff --git a/CONSTITUTION.md b/CONSTITUTION.md
index 5cd65a7..25a5e33 100644
--- a/CONSTITUTION.md
+++ b/CONSTITUTION.md
@@ -2,7 +2,7 @@
 
 > 本文件定義專案開發的核心準則與不可違反的規範
 > **建立日期**: 2026-01-12
-> **當前版本**: V10.19 (AI metrics baseline 觀測版)
+> **當前版本**: V10.20 (ElephantAlpha transient fallback 版)
 > **最後更新**: 2026-04-30
 
 ---
diff --git a/TODO_NEXT_STEPS.txt b/TODO_NEXT_STEPS.txt
index 1060b6f..6c65f2e 100644
--- a/TODO_NEXT_STEPS.txt
+++ b/TODO_NEXT_STEPS.txt
@@ -28,6 +28,7 @@
   - Ollama embedding 強化:改為優先 `/api/embed`,舊節點才 fallback `/api/embeddings`,並新增 `EMBEDDING_TIMEOUT`。
   - Scheduler 例外記錄強化:清除 `scheduler.py` 靜默 `except/pass`,資源清理、EDM 可選欄位、備份 insight/通知失敗全改為可診斷 log。
   - AI metrics baseline 觀測:`/metrics` 在尚無 AI 自動化事件時仍輸出 `momo_ai_*` zero-baseline series,避免重啟後 Grafana/Prometheus 看不到 metric names。
+  - ElephantAlpha transient fallback:NVIDIA NIM timeout、connection error、429 與 5xx 會嘗試下一個 fallback model;400 等非暫時性請求錯誤不重試。
 
 【下次待辦】
   - 觀察 Prometheus scrape 後 `momo_ai_*` baseline 與非 baseline 事件序列是否持續穩定。
diff --git a/app.py b/app.py
index 13fe8c1..f90af88 100644
--- a/app.py
+++ b/app.py
@@ -95,8 +95,8 @@ except Exception as e:
     sys_log.error(f"無法檢測磁碟空間: {e}")
 
 # 🚩 系統版本定義 (備份與顯示用)
-# 🚩 2026-04-30 V10.19: AI metrics zero-baseline export
-SYSTEM_VERSION = "V10.19"
+# 🚩 2026-04-30 V10.20: ElephantAlpha transient NIM fallback
+SYSTEM_VERSION = "V10.20"
 
 # ==========================================
 # 🔒 SQL Injection 防護函數
diff --git a/config.py b/config.py
index 09053e8..8cbc8f0 100644
--- a/config.py
+++ b/config.py
@@ -254,7 +254,7 @@ YOUTUBE_API_KEY = os.getenv('YOUTUBE_API_KEY', '')
 # ==========================================
 # 系統版本與路徑
 # ==========================================
-SYSTEM_VERSION = "V10.19"
+SYSTEM_VERSION = "V10.20"
 LOG_FILE_PATH = os.path.join(BASE_DIR, 'logs/system.log')
 public_url = PUBLIC_URL # 用於模板顯示
 
diff --git a/docs/AI_INTELLIGENCE_MODULE_SOT.md b/docs/AI_INTELLIGENCE_MODULE_SOT.md
index 7393405..f758bd1 100644
--- a/docs/AI_INTELLIGENCE_MODULE_SOT.md
+++ b/docs/AI_INTELLIGENCE_MODULE_SOT.md
@@ -2,7 +2,7 @@
 
 > **最後更新**: 2026-04-30 (台北時間)
 > **狀態**: 🟢 四 AI Agent 自動化閉環已落地 — EventRouter / AutoHeal / OpenClaw Memory / ElephantAlpha bridge / Prometheus metrics / Smoke Dashboard / Smoke Trend Management / Telegram Summary / Grafana provisioning / Prometheus scrape / CD Gunicorn 掛載具測試覆蓋
-> **適用版本**: V10.19 AI metrics baseline 觀測版
+> **適用版本**: V10.20 ElephantAlpha transient fallback 版
 
 ---
 
@@ -73,7 +73,7 @@ SQL漏斗(~300筆)
 - `/metrics` 對 `realtime_sales_monthly` 只用 raw `SELECT COUNT(*)` 取得總筆數,避免 ORM schema drift 讓 Prometheus scrape 產生 warning。
 - `momo-app` 必須 bind mount `./gunicorn.conf.py:/app/gunicorn.conf.py:ro`,讓 CD sync/rebuild 後的 Gunicorn runtime 設定與 repo 保持一致。
 - CD rebuild 模式必須先 build image 成功,再短暫 stop/rm/recreate 三應用容器,避免 no-cache build 造成長時間 502。
-- ElephantAlpha 使用 NVIDIA NIM hosted API;production 預設模型為 `nvidia/llama-3.3-nemotron-super-49b-v1.5`,`ELEPHANT_ALPHA_FALLBACK_MODELS` 需保留至少一個可呼叫備援。
+- ElephantAlpha 使用 NVIDIA NIM hosted API;production 預設模型為 `nvidia/llama-3.3-nemotron-super-49b-v1.5`,`ELEPHANT_ALPHA_FALLBACK_MODELS` 需保留至少一個可呼叫備援;403/404、408/409/425/429、5xx、timeout 與 connection error 必須嘗試下一個模型。
 - OpenClaw/Hermes embedding 優先呼叫 Ollama `/api/embed`,只在舊節點不支援時 fallback `/api/embeddings`;timeout 由 `EMBEDDING_TIMEOUT` / `OLLAMA_EMBED_TIMEOUT` 控制。
 
 ---
diff --git a/docs/ELEPHANT_ALPHA_SETUP.md b/docs/ELEPHANT_ALPHA_SETUP.md
index 7509fb8..5f1f6ef 100644
--- a/docs/ELEPHANT_ALPHA_SETUP.md
+++ b/docs/ELEPHANT_ALPHA_SETUP.md
@@ -47,21 +47,27 @@ Elephant Alpha (Super Orchestrator)
 cp .env.example .env
 ```
 
-2. **Configure OpenRouter API:**
+2. **Configure NVIDIA NIM API:**
 ```bash
-# Get API key from https://openrouter.ai/keys
-export OPENROUTER_API_KEY="sk-or-v1-your-api-key"
+# Get API key from NVIDIA NIM / build.nvidia.com
+export NVIDIA_API_KEY="nvapi-your-api-key"
 ```
 
 3. **Update .env file:**
 ```env
 # Elephant Alpha Configuration
-OPENROUTER_API_KEY=sk-or-v1-your-openrouter-api-key-here
-ELEPHANT_ALPHA_MODEL=openrouter/elephant-alpha
+NVIDIA_API_KEY=nvapi-your-nvidia-api-key-here
+ELEPHANT_ALPHA_NEMOTRON_NIM_ENDPOINT=https://integrate.api.nvidia.com/v1
+ELEPHANT_ALPHA_URL=https://integrate.api.nvidia.com/v1/chat/completions
+ELEPHANT_ALPHA_MODEL=nvidia/llama-3.3-nemotron-super-49b-v1.5
+ELEPHANT_ALPHA_FALLBACK_MODELS=nvidia/llama-3.3-nemotron-super-49b-v1.5,nvidia/llama-3.1-nemotron-70b-instruct,meta/llama-3.1-8b-instruct
+ELEPHANT_TIMEOUT=120
 ELEPHANT_ALPHA_CONFIDENCE_THRESHOLD=0.7
 ELEPHANT_ALPHA_MAX_AUTONOMOUS_DECISIONS_PER_HOUR=10
 ```
 
+Runtime fallback rule: ElephantService tries the next `ELEPHANT_ALPHA_FALLBACK_MODELS` entry when NVIDIA NIM returns 403/404, transient 408/409/425/429, 5xx, timeout, or connection error. Non-transient client errors such as HTTP 400 fail fast so bad requests do not burn quota across all models.
+
 ### Step 2: Install Dependencies
 
 ```bash
diff --git a/docs/memory/ai_automation_closure_20260429.md b/docs/memory/ai_automation_closure_20260429.md
index 0af04f0..f19f977 100644
--- a/docs/memory/ai_automation_closure_20260429.md
+++ b/docs/memory/ai_automation_closure_20260429.md
@@ -28,6 +28,7 @@
 - 2026-04-30 OpenClaw embedding worker 曾在舊 `/api/embeddings` 路徑遇到 Hermes timeout;Ollama client 已改為優先 `/api/embed`,舊節點才 fallback `/api/embeddings`。
 - 2026-04-30 `scheduler.py` 殘留靜默 `except/pass`;已改為 warning/debug log,備份 insight 與 Telegram 通知失敗不再靜默。
 - 2026-04-30 `/metrics` 已補 `momo_ai_*` zero-baseline series;app 重啟後即使尚無 EventRouter / AutoHeal 事件,Prometheus/Grafana 也能先看到 metric names。
+- 2026-04-30 ElephantAlpha NIM fallback 已擴大到 timeout、connection error、429 與 5xx;primary model 暫時卡住時會嘗試下一個 `ELEPHANT_ALPHA_FALLBACK_MODELS`。
 
 ## 已落地範圍
 
@@ -68,6 +69,7 @@
 - 2026-04-30 Ollama embedding API migration:新增 `tests/test_ollama_embedding.py`。
 - 2026-04-30 Phase 3f cleanup contracts:`tests/test_phase3f_cleanup_contracts.py` 覆蓋 orphan services、env 範例、scheduler 靜默例外。
 - 2026-04-30 AI metrics baseline:`tests/test_ai_automation_metrics.py` 覆蓋無事件 snapshot 仍匯出 `momo_ai_*` baseline。
+- 2026-04-30 ElephantAlpha transient fallback:`tests/test_elephant_service.py` 覆蓋 timeout、503 fallback 與 400 不 fallback。
 - 2026-04-29 L2 安全記憶批次:`24 passed`。
 - collect-only:`48 tests collected`。
 - `git diff --check` 已通過。
diff --git a/docs/memory/history_logs.md b/docs/memory/history_logs.md
index f964bed..c04a34c 100644
--- a/docs/memory/history_logs.md
+++ b/docs/memory/history_logs.md
@@ -41,6 +41,7 @@
 - **Ollama embedding API 遷移**: embedding client 優先使用官方 `/api/embed`,舊節點才 fallback `/api/embeddings`,降低 deprecated endpoint 與 timeout 風險。
 - **Scheduler 例外記錄強化**: 清除 `scheduler.py` 靜默 `except/pass`,Chrome 清理、EDM optional 欄位、備份 insight/Telegram 失敗均保留 log。
 - **AI metrics baseline 觀測**: `/metrics` 在尚無 AI 自動化事件時仍輸出 `momo_ai_*` zero-baseline series,避免 app 重啟後 Grafana/Prometheus 看不到 metric names。
+- **ElephantAlpha transient fallback**: NVIDIA NIM primary model timeout、connection error、429 與 5xx 會嘗試下一個 fallback model,400 等非暫時性請求錯誤不重試。
 
 ### 2026-04-28~29:Phase 3e 重構大戰 + daily_sales cache 隱形 bug 根除
 - **app.py 縮減 -10.8%**: 7,386 → 6,590 行,11 commits 全綠零 502。
diff --git a/services/elephant_service.py b/services/elephant_service.py
index 1e1b7b9..9502070 100644
--- a/services/elephant_service.py
+++ b/services/elephant_service.py
@@ -38,6 +38,7 @@ ELEPHANT_FALLBACK_MODELS = [
     if model.strip()
 ]
 ELEPHANT_TIMEOUT = int(os.getenv('ELEPHANT_TIMEOUT', '120')) # 預設 2 分鐘
+ELEPHANT_FALLBACK_HTTP_STATUS_CODES = {403, 404, 408, 409, 425, 429, 500, 502, 503, 504}
 
 # Elephant Alpha 定價 (USD per 1M tokens) - NVIDIA NIM 定價
 ELEPHANT_PRICING = {
@@ -115,6 +116,10 @@ class ElephantService:
                 candidates.append(model_name)
         return candidates
 
+    @staticmethod
+    def _has_next_model(model_name: str, model_candidates: List[str]) -> bool:
+        return bool(model_candidates) and model_name != model_candidates[-1]
+
     def generate(self, prompt: str, model: str = None,
                  system_prompt: str = None, temperature: float = 0.3,
                  json_mode: bool = False, timeout: int = None) -> ElephantResponse:
@@ -187,18 +192,26 @@ class ElephantService:
 
             except requests.HTTPError as e:
                 status_code = e.response.status_code if e.response is not None else None
-                last_error = str(e)
-                if status_code in (404, 403) and model_name != model_candidates[-1]:
-                    logger.warning(f"[Elephant] 模型不可用,改用 fallback: {model_name} ({status_code})")
+                last_error = f"{model_name}: {e}"
+                if status_code in ELEPHANT_FALLBACK_HTTP_STATUS_CODES and self._has_next_model(model_name, model_candidates):
+                    logger.warning(f"[Elephant] NIM 模型/API 暫時不可用,改用 fallback: {model_name} ({status_code})")
+                    continue
+                logger.error(f"[Elephant] 生成失敗: {e}")
+                return ElephantResponse(success=False, content='', model=model_name, error=last_error)
+            except (requests.Timeout, requests.ConnectionError) as e:
+                last_error = f"{model_name}: {e}"
+                if self._has_next_model(model_name, model_candidates):
+                    logger.warning(f"[Elephant] NIM 暫時性連線錯誤,改用 fallback: {model_name} ({e})")
                     continue
                 logger.error(f"[Elephant] 生成失敗: {e}")
                 return ElephantResponse(success=False, content='', model=model_name, error=last_error)
             except Exception as e:
-                last_error = str(e)
+                last_error = f"{model_name}: {e}"
                 logger.error(f"[Elephant] 生成失敗: {e}")
                 return ElephantResponse(success=False, content='', model=model_name, error=last_error)
 
-        return ElephantResponse(success=False, content='', model=primary_model, error=last_error or "所有 Elephant fallback model 均不可用")
+        failed_model = model_candidates[-1] if model_candidates else primary_model
+        return ElephantResponse(success=False, content='', model=failed_model, error=last_error or "所有 Elephant fallback model 均不可用")
 
 # 單例實例
 elephant_service = ElephantService()
diff --git a/tests/test_elephant_service.py b/tests/test_elephant_service.py
index cff7023..c2e9bbc 100644
--- a/tests/test_elephant_service.py
+++ b/tests/test_elephant_service.py
@@ -45,6 +45,113 @@ def test_elephant_service_falls_back_when_primary_model_is_unavailable(monkeypat
     assert calls == ["nvidia/unavailable", "nvidia/available"]
 
 
+def test_elephant_service_falls_back_when_primary_model_times_out(monkeypatch):
+    from services import elephant_service as module
+
+    calls = []
+
+    def fake_post(_url, json, headers, timeout):
+        calls.append(json["model"])
+        if json["model"] == "nvidia/slow":
+            raise requests.Timeout("read timed out")
+        return FakeResponse(
+            200,
+            {
+                "choices": [{"message": {"content": "Fallback OK"}}],
+                "usage": {"prompt_tokens": 4, "completion_tokens": 3},
+            },
+        )
+
+    monkeypatch.setattr(module, "ELEPHANT_FALLBACK_MODELS", ["nvidia/available"])
+    monkeypatch.setattr(module.requests, "post", fake_post)
+
+    service = module.ElephantService(api_key="test-key", model="nvidia/slow")
+    result = service.generate("hello", timeout=3)
+
+    assert result.success is True
+    assert result.model == "nvidia/available"
+    assert result.content == "Fallback OK"
+    assert calls == ["nvidia/slow", "nvidia/available"]
+
+
+def test_elephant_service_falls_back_when_primary_model_connection_fails(monkeypatch):
+    from services import elephant_service as module
+
+    calls = []
+
+    def fake_post(_url, json, headers, timeout):
+        calls.append(json["model"])
+        if json["model"] == "nvidia/disconnected":
+            raise requests.ConnectionError("connection reset")
+        return FakeResponse(
+            200,
+            {
+                "choices": [{"message": {"content": "Connected fallback"}}],
+                "usage": {},
+            },
+        )
+
+    monkeypatch.setattr(module, "ELEPHANT_FALLBACK_MODELS", ["nvidia/available"])
+    monkeypatch.setattr(module.requests, "post", fake_post)
+
+    service = module.ElephantService(api_key="test-key", model="nvidia/disconnected")
+    result = service.generate("hello")
+
+    assert result.success is True
+    assert result.model == "nvidia/available"
+    assert result.content == "Connected fallback"
+    assert calls == ["nvidia/disconnected", "nvidia/available"]
+
+
+def test_elephant_service_falls_back_on_transient_http_status(monkeypatch):
+    from services import elephant_service as module
+
+    calls = []
+
+    def fake_post(_url, json, headers, timeout):
+        calls.append(json["model"])
+        if json["model"] == "nvidia/overloaded":
+            return FakeResponse(503, {"detail": "temporarily unavailable"})
+        return FakeResponse(
+            200,
+            {
+                "choices": [{"message": {"content": "Recovered"}}],
+                "usage": {},
+            },
+        )
+
+    monkeypatch.setattr(module, "ELEPHANT_FALLBACK_MODELS", ["nvidia/available"])
+    monkeypatch.setattr(module.requests, "post", fake_post)
+
+    service = module.ElephantService(api_key="test-key", model="nvidia/overloaded")
+    result = service.generate("hello")
+
+    assert result.success is True
+    assert result.model == "nvidia/available"
+    assert result.content == "Recovered"
+    assert calls == ["nvidia/overloaded", "nvidia/available"]
+
+
+def test_elephant_service_does_not_fallback_on_non_transient_client_error(monkeypatch):
+    from services import elephant_service as module
+
+    calls = []
+
+    def fake_post(_url, json, headers, timeout):
+        calls.append(json["model"])
+        return FakeResponse(400, {"detail": "bad request"})
+
+    monkeypatch.setattr(module, "ELEPHANT_FALLBACK_MODELS", ["nvidia/available"])
+    monkeypatch.setattr(module.requests, "post", fake_post)
+
+    service = module.ElephantService(api_key="test-key", model="nvidia/bad-request")
+    result = service.generate("hello")
+
+    assert result.success is False
+    assert result.model == "nvidia/bad-request"
+    assert calls == ["nvidia/bad-request"]
+
+
 def test_elephant_service_uses_reasoning_content_when_content_is_empty(monkeypatch):
     from services import elephant_service as module
 
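The fallback policy this patch introduces in `services/elephant_service.py` can be modelled as a small, dependency-free sketch. It is a simplified illustration, not the actual `ElephantService.generate`. The names `generate_with_fallback`, `call_model`, `HTTPStatusError` and `TransientConnectionError` are hypothetical. They stand in for the real `requests.post` call and for the `requests.HTTPError`, `requests.Timeout` and `requests.ConnectionError` exceptions. The status set is copied from `ELEPHANT_FALLBACK_HTTP_STATUS_CODES`.

```python
from typing import Callable, List, Optional, Tuple

# Copied from ELEPHANT_FALLBACK_HTTP_STATUS_CODES in the patch.
FALLBACK_HTTP_STATUS_CODES = {403, 404, 408, 409, 425, 429, 500, 502, 503, 504}


class HTTPStatusError(Exception):
    """Hypothetical stand-in for requests.HTTPError carrying a status code."""

    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


class TransientConnectionError(Exception):
    """Hypothetical stand-in for requests.Timeout / requests.ConnectionError."""


def has_next_model(model_name: str, candidates: List[str]) -> bool:
    # Mirrors ElephantService._has_next_model: only the last candidate has no successor.
    return bool(candidates) and model_name != candidates[-1]


def generate_with_fallback(
    candidates: List[str],
    call_model: Callable[[str], str],
) -> Tuple[bool, str, Optional[str]]:
    """Return (success, model_used, content_or_error) for the first model that works."""
    last_error: Optional[str] = None
    for model_name in candidates:
        try:
            return True, model_name, call_model(model_name)
        except HTTPStatusError as e:
            last_error = f"{model_name}: {e}"
            if e.status_code in FALLBACK_HTTP_STATUS_CODES and has_next_model(model_name, candidates):
                continue  # retryable status and another model remains: try it
            return False, model_name, last_error  # e.g. HTTP 400 fails fast
        except TransientConnectionError as e:
            last_error = f"{model_name}: {e}"
            if has_next_model(model_name, candidates):
                continue  # timeout / connection error: try the next model
            return False, model_name, last_error
    # In this sketch, only reachable when candidates is empty.
    return False, "", last_error or "no candidate models"
```

Suppose the candidates are `["nvidia/overloaded", "nvidia/available"]`. A 503 from the first model moves on to the second. An HTTP 400 returns immediately and reports the first model's name, which mirrors what the new tests in `tests/test_elephant_service.py` check. The error string is prefixed with the model name, as in the patched `last_error = f"{model_name}: {e}"`. This prefix shows which candidate produced the final failure.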