fix(ai): migrate Ollama embedding to /api/embed
All checks were successful
CD Pipeline / deploy (push) Successful in 1m46s
@@ -102,6 +102,8 @@ HERMES_TIMEOUT=120
 
 # [default: HERMES_URL] Embedding service host (aligned with ADR-003: embedding goes through the Hermes host)
 # EMBEDDING_HOST=http://192.168.0.111:11434
+# [default: 45] Embedding API timeout (seconds); prefers Ollama /api/embed, older nodes fall back to /api/embeddings
+EMBEDDING_TIMEOUT=45
 
 # ==========================================
 # Elephant Alpha AI Agent Super Orchestrator Settings
@@ -236,6 +238,7 @@ OLLAMA_HOST=https://ollama.wooo.work/ollama
 OLLAMA_MODEL=gemma3:4b
 OLLAMA_TIMEOUT=120
 OLLAMA_COPY_TIMEOUT=180
+OLLAMA_EMBED_TIMEOUT=45
 MCP_CACHE_TTL_HOURS=24
 MCP_GEMINI_MODEL=gemini-2.0-flash
 
@@ -2,7 +2,7 @@
 
 > This document defines the project's core development principles and inviolable rules
 > **Created**: 2026-01-12
-> **Current version**: V10.16 (DatabaseManager connection pool convergence release)
+> **Current version**: V10.17 (Ollama embedding /api/embed hardening release)
 > **Last updated**: 2026-04-30
 
 ---
@@ -25,6 +25,7 @@
 - CD Rebuild cutover hardening: rebuild mode now runs `docker compose build --no-cache momo-app` to success first, then stops/rm/recreates the three application containers, avoiding prolonged 502s.
 - ElephantAlpha NIM fallback hardening: the default is now `nvidia/llama-3.3-nemotron-super-49b-v1.5`, which production can call; it falls back automatically when Ultra 253B returns a permission 404.
 - DatabaseManager pool convergence: the PostgreSQL pool per worker is set to `pool_size=2/max_overflow=3`, so routes that repeatedly create new managers do not exhaust connections.
+- Ollama embedding hardening: prefer `/api/embed`, fall back to `/api/embeddings` only on older nodes, and add `EMBEDDING_TIMEOUT`.
 
 [Next to-do]
 - Check whether `momo_ai_*` time series appear after Prometheus scrapes once events have occurred.
4
app.py
@@ -95,8 +95,8 @@ except Exception as e:
     sys_log.error(f"無法檢測磁碟空間: {e}")
 
 # 🚩 System version definition (used for backups and display)
-# 🚩 2026-04-30 V10.16: DatabaseManager PostgreSQL pool convergence
-SYSTEM_VERSION = "V10.16"
+# 🚩 2026-04-30 V10.17: Ollama embedding /api/embed hardening
+SYSTEM_VERSION = "V10.17"
 
 # ==========================================
 # 🔒 SQL injection protection functions
@@ -229,6 +229,7 @@ HERMES_TIMEOUT = int(os.getenv('HERMES_TIMEOUT', '120')) # seconds; batch of 300 records
 # Embedding service (aligned with ADR-003: embedding goes through the Hermes host, no auth on the internal network)
 # Defaults to HERMES_URL; override via env if a dedicated embedding host is needed
 EMBEDDING_HOST = os.getenv('EMBEDDING_HOST', HERMES_URL)
+EMBEDDING_TIMEOUT = int(os.getenv('EMBEDDING_TIMEOUT', os.getenv('OLLAMA_EMBED_TIMEOUT', '45')))
 
 # SSH Jump Configuration (AIOps AutoHeal)
 SSH_JUMP_HOST = os.getenv('SSH_JUMP_HOST', '192.168.0.110')
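The two timeout variables are read in opposite orders by the two modules touched in this commit: the config line above prefers `EMBEDDING_TIMEOUT`, while `EMBED_TIMEOUT` in the Ollama service module prefers `OLLAMA_EMBED_TIMEOUT`. The sketch below illustrates that resolution, assuming a plain mapping stands in for the process environment. The function names are hypothetical and not part of the codebase.

```python
from typing import Mapping

DEFAULT_EMBED_TIMEOUT = "45"  # same default string both modules use in the diff


def config_embedding_timeout(env: Mapping[str, str]) -> int:
    """Config-side order: EMBEDDING_TIMEOUT, then OLLAMA_EMBED_TIMEOUT, then 45."""
    return int(env.get("EMBEDDING_TIMEOUT",
                       env.get("OLLAMA_EMBED_TIMEOUT", DEFAULT_EMBED_TIMEOUT)))


def service_embed_timeout(env: Mapping[str, str]) -> int:
    """Service-side order: OLLAMA_EMBED_TIMEOUT, then EMBEDDING_TIMEOUT, then 45."""
    return int(env.get("OLLAMA_EMBED_TIMEOUT",
                       env.get("EMBEDDING_TIMEOUT", DEFAULT_EMBED_TIMEOUT)))


# When both variables are set to different values, the two modules disagree.
both_set = {"EMBEDDING_TIMEOUT": "30", "OLLAMA_EMBED_TIMEOUT": "60"}
config_value = config_embedding_timeout(both_set)
service_value = service_embed_timeout(both_set)
```

If only one of the two variables is set, both readers resolve to the same number, so setting a single variable (or keeping the two equal, as the `.env` example does) avoids the mismatch.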
@@ -253,7 +254,7 @@ YOUTUBE_API_KEY = os.getenv('YOUTUBE_API_KEY', '')
 # ==========================================
 # System version and paths
 # ==========================================
-SYSTEM_VERSION = "V10.16"
+SYSTEM_VERSION = "V10.17"
 LOG_FILE_PATH = os.path.join(BASE_DIR, 'logs/system.log')
 public_url = PUBLIC_URL  # used for template display
 
@@ -2,7 +2,7 @@
 
 > **Last updated**: 2026-04-30 (Taipei time)
 > **Status**: 🟢 The four-AI-Agent automation loop is in place: EventRouter / AutoHeal / OpenClaw Memory / ElephantAlpha bridge / Prometheus metrics / Smoke Dashboard / Smoke Trend Management / Telegram Summary / Grafana provisioning / Prometheus scrape / CD Gunicorn mount, all with test coverage
-> **Applicable version**: V10.15 ElephantAlpha NIM fallback hardening release
+> **Applicable version**: V10.17 Ollama embedding /api/embed hardening release
 
 ---
 
@@ -73,6 +73,7 @@ SQL funnel (~300 records)
 - `momo-app` must bind mount `./gunicorn.conf.py:/app/gunicorn.conf.py:ro` so that the Gunicorn runtime configuration stays consistent with the repo after a CD sync/rebuild.
 - CD rebuild mode must build the image successfully first, then briefly stop/rm/recreate the three application containers, so that a no-cache build does not cause prolonged 502s.
 - ElephantAlpha uses the NVIDIA NIM hosted API; the production default model is `nvidia/llama-3.3-nemotron-super-49b-v1.5`, and `ELEPHANT_ALPHA_FALLBACK_MODELS` must keep at least one callable fallback.
+- OpenClaw/Hermes embedding calls Ollama `/api/embed` first and falls back to `/api/embeddings` only when an older node does not support it; the timeout is controlled by `EMBEDDING_TIMEOUT` / `OLLAMA_EMBED_TIMEOUT`.
 
 ---
 
@@ -101,3 +101,8 @@
 - **Cause**: Flask routes create `DatabaseManager()` frequently; if every call produces a new engine/pool, multiple Gunicorn workers quickly exhaust PostgreSQL clients.
 - **Fix**: `DatabaseManager` reuses the engine/session keyed by `(DATABASE_TYPE, DATABASE_PATH)`, and the PostgreSQL pool is narrowed to `pool_size=2/max_overflow=3`.
 - **Check**: the app log should show `使用 PostgreSQL 資料庫 (連線池已收斂)` ("Using PostgreSQL database (connection pool converged)"); Gunicorn `post_fork` must still dispose inherited engines.
+
+### 10. OpenClaw embedding timeout
+- **Cause**: When Hermes/Ollama is under heavy load, or the legacy `/api/embeddings` endpoint is slow, the embedding worker accumulates retries.
+- **Check**: Inspect the `pending/processing/failed` distribution in `embedding_retry_queue`, and test `http://192.168.0.111:11434/api/embed`.
+- **Fix**: The client uses the official `/api/embed` by default and falls back to `/api/embeddings` only when an older node returns 404/405. Adjust `EMBEDDING_TIMEOUT` if needed.
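For the endpoint test mentioned in the check step, a small probe can be put together with the standard library alone. This is a sketch under assumptions: the helper names `build_embed_probe` and `embedding_dimension` are hypothetical, and the default model simply mirrors the `bge-m3:latest` default seen in `generate_embedding`.

```python
import json
import urllib.request


def build_embed_probe(host: str, model: str = "bge-m3:latest",
                      text: str = "ping") -> urllib.request.Request:
    """Build (but do not send) a POST request for Ollama's /api/embed."""
    url = f"{host.rstrip('/')}/api/embed"
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embedding_dimension(response_body: bytes) -> int:
    """Length of the first vector in an /api/embed response body, or 0 if absent."""
    payload = json.loads(response_body)
    embeddings = payload.get("embeddings") or []
    if embeddings and isinstance(embeddings[0], list):
        return len(embeddings[0])
    return 0
```

To run the probe against a node, pass the request to `urllib.request.urlopen(request, timeout=45)` and feed the response's `read()` result to `embedding_dimension`. Note that `urlopen` raises `urllib.error.HTTPError` for a 404/405, which is the signal that the node only supports the legacy endpoint.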
@@ -25,6 +25,7 @@
 - 2026-04-30 CD Rebuild mode used to stop the three application containers before the no-cache build, so the entire build time turned into 502s; it now stops/rm/recreates them briefly only after the build succeeds.
 - 2026-04-30 The production `NVIDIA_API_KEY` can list Ultra 253B, but calling `nvidia/llama-3.1-nemotron-ultra-253b-v1` returns 404; ElephantAlpha now defaults to `nvidia/llama-3.3-nemotron-super-49b-v1.5` with fallback models added.
 - 2026-04-30 Repeated `DatabaseManager()` creation across routes risked exhausting PostgreSQL clients; the engine/session is now reused and the per-worker pool narrowed to `pool_size=2/max_overflow=3`.
+- 2026-04-30 The OpenClaw embedding worker hit Hermes timeouts on the legacy `/api/embeddings` path; the Ollama client now prefers `/api/embed` and falls back to `/api/embeddings` only on older nodes.
 
 ## Delivered scope
 
@@ -62,6 +63,7 @@
 - 2026-04-30 CD rebuild cutover hardening: `tests/test_cd_health_check.py` covers the build-before-stop order.
 - 2026-04-30 ElephantAlpha NIM fallback hardening: added `tests/test_elephant_service.py`.
 - 2026-04-30 DatabaseManager pool convergence: `tests/test_database_manager_cache.py` covers pool size/overflow and engine reuse.
+- 2026-04-30 Ollama embedding API migration: added `tests/test_ollama_embedding.py`.
 - 2026-04-29 L2 safe-memory batch: `24 passed`.
 - collect-only: `48 tests collected`.
 - `git diff --check` passed.
@@ -38,6 +38,7 @@
 - **CD Rebuild cutover hardening**: rebuild mode now builds successfully first, then briefly stops/rm/recreates the three application containers, avoiding prolonged 502s during a no-cache build.
 - **ElephantAlpha NIM fallback hardening**: the production account gets a 404 when calling Ultra 253B; the default is now the callable Nemotron Super 49B v1.5, with 70B / 8B fallbacks added.
 - **DatabaseManager pool convergence**: the PostgreSQL pool per worker is narrowed to `pool_size=2/max_overflow=3`, with a cache reusing the engine/session.
+- **Ollama embedding API migration**: the embedding client prefers the official `/api/embed` and falls back to `/api/embeddings` only on older nodes, reducing deprecated-endpoint and timeout risk.
 
 ### 2026-04-28~29: Phase 3e refactoring push + hidden daily_sales cache bug eliminated
 - **app.py reduced by 10.8%**: 7,386 → 6,590 lines; 11 commits, all green, zero 502s.
@@ -21,6 +21,7 @@ OLLAMA_HOST = os.getenv('OLLAMA_HOST', 'http://192.168.0.111:11434')
 DEFAULT_MODEL = os.getenv('OLLAMA_MODEL', 'llama3.1:8b')  # a faster model
 TIMEOUT = int(os.getenv('OLLAMA_TIMEOUT', '120'))  # seconds (2 minutes)
 COPY_TIMEOUT = int(os.getenv('OLLAMA_COPY_TIMEOUT', '180'))  # timeout dedicated to copy generation (3 minutes)
+EMBED_TIMEOUT = int(os.getenv('OLLAMA_EMBED_TIMEOUT', os.getenv('EMBEDDING_TIMEOUT', '45')))
 
 
 @dataclass
@@ -505,8 +506,25 @@ class OllamaService:
 
         return self.generate(prompt, system_prompt=system_prompt, temperature=0.5, timeout=120)
 
+    @staticmethod
+    def _extract_embedding(payload: Dict[str, Any]) -> List[float]:
+        """Normalize Ollama /api/embed and legacy /api/embeddings response shapes."""
+        embeddings = payload.get("embeddings")
+        if isinstance(embeddings, list) and embeddings:
+            first = embeddings[0]
+            if isinstance(first, list):
+                return first
+            if all(isinstance(value, (int, float)) for value in embeddings):
+                return embeddings
+
+        embedding = payload.get("embedding")
+        if isinstance(embedding, list):
+            return embedding
+
+        return []
+
     def generate_embedding(self, text: str, model: str = "bge-m3:latest",
-                           host: str = None) -> List[float]:
+                           host: str = None, timeout: int = None) -> List[float]:
         """
         [ADR-007, Step 3] Call the Ollama API to convert text into an embedding vector
 
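To make the response shapes accepted by `_extract_embedding` concrete, here is a standalone copy of the same logic applied to sample payloads. The sample values are made up for illustration; only the shapes matter.

```python
from typing import Any, Dict, List


def extract_embedding(payload: Dict[str, Any]) -> List[float]:
    """Standalone copy of the normalization logic shown in the hunk above."""
    embeddings = payload.get("embeddings")
    if isinstance(embeddings, list) and embeddings:
        first = embeddings[0]
        if isinstance(first, list):
            return first  # /api/embed shape: a list of vectors
        if all(isinstance(value, (int, float)) for value in embeddings):
            return embeddings  # flat list of numbers under "embeddings"
    embedding = payload.get("embedding")
    if isinstance(embedding, list):
        return embedding  # legacy /api/embeddings shape
    return []


batched = extract_embedding({"embeddings": [[0.1, 0.2], [0.3, 0.4]]})
flat = extract_embedding({"embeddings": [0.1, 0.2]})
legacy = extract_embedding({"embedding": [0.4, 0.5]})
missing = extract_embedding({"embeddings": []})
```

One consequence worth noting: when the payload carries several vectors, only the first is returned, which matches a caller that embeds one text per request.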
@@ -516,22 +534,38 @@ class OllamaService:
         to avoid a 401 when self.host points at the public ollama.wooo.work.
         Can be overridden via the host parameter.
         """
-        target_host = host or os.getenv("EMBEDDING_HOST", "http://192.168.0.111:11434")
+        target_host = (host or os.getenv("EMBEDDING_HOST", "http://192.168.0.111:11434")).rstrip("/")
+        request_timeout = timeout or EMBED_TIMEOUT
         try:
-            payload = {"model": model, "prompt": text}
+            payload = {"model": model, "input": text}
             response = requests.post(
-                f"{target_host}/api/embeddings",
+                f"{target_host}/api/embed",
                 json=payload,
-                timeout=60,
+                timeout=request_timeout,
             )
             if response.status_code == 200:
-                data = response.json()
-                return data.get("embedding", [])
-            else:
+                vec = self._extract_embedding(response.json())
+                if vec:
+                    return vec
+                logger.warning(f"Ollama Embed Empty Response @ {target_host}/api/embed")
+            elif response.status_code not in (404, 405):
                 logger.error(
-                    f"Ollama Embed Error HTTP {response.status_code} @ {target_host}: {response.text[:200]}"
+                    f"Ollama Embed Error HTTP {response.status_code} @ {target_host}/api/embed: {response.text[:200]}"
                 )
                 return []
+
+            # V-Fix: compatibility with older Ollama; /api/embeddings is deprecated but is still the only usable path on some older nodes.
+            legacy_response = requests.post(
+                f"{target_host}/api/embeddings",
+                json={"model": model, "prompt": text},
+                timeout=request_timeout,
+            )
+            if legacy_response.status_code == 200:
+                return self._extract_embedding(legacy_response.json())
+            logger.error(
+                f"Ollama Embed Error HTTP {legacy_response.status_code} @ {target_host}/api/embeddings: {legacy_response.text[:200]}"
+            )
+            return []
         except Exception as e:
             logger.error(f"Ollama Embed Exception @ {target_host}: {e}")
             return []
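The control flow in this hunk has three outcomes for the first request: a usable 200, a 404/405 (or an empty 200) that falls through to the legacy endpoint, and any other status that stops immediately. The sketch below restates that flow with the HTTP call injected as a function so it can be exercised without a network. The `PostFn` type, `first_vector`, `embed_with_fallback`, and `old_node` are hypothetical names, and logging is omitted.

```python
from typing import Any, Callable, Dict, List, Tuple

# Assumed transport shape: (url, json_body, timeout) -> (status_code, parsed_body).
PostFn = Callable[[str, Dict[str, Any], int], Tuple[int, Dict[str, Any]]]


def first_vector(payload: Dict[str, Any]) -> List[float]:
    """Simplified normalizer: first vector of "embeddings", else legacy "embedding"."""
    embeddings = payload.get("embeddings")
    if isinstance(embeddings, list) and embeddings and isinstance(embeddings[0], list):
        return embeddings[0]
    legacy = payload.get("embedding")
    return legacy if isinstance(legacy, list) else []


def embed_with_fallback(post: PostFn, host: str, model: str, text: str,
                        timeout: int = 45) -> List[float]:
    base = host.rstrip("/")
    status, body = post(f"{base}/api/embed", {"model": model, "input": text}, timeout)
    if status == 200:
        vec = first_vector(body)
        if vec:
            return vec
        # An empty 200 falls through to the legacy endpoint, as in the diff.
    elif status not in (404, 405):
        return []  # other errors are not retried on the legacy path
    status, body = post(f"{base}/api/embeddings", {"model": model, "prompt": text}, timeout)
    return first_vector(body) if status == 200 else []


def old_node(url: str, payload: Dict[str, Any], timeout: int) -> Tuple[int, Dict[str, Any]]:
    """Fake transport imitating a node that only supports the legacy endpoint."""
    if url.endswith("/api/embed"):
        return 404, {}
    return 200, {"embedding": [0.4, 0.5]}


vector = embed_with_fallback(old_node, "http://ollama/", "bge-m3:latest", "hello")
```

Injecting the transport is only a testing convenience for this sketch; the commit itself patches `requests.post` with `monkeypatch` to achieve the same effect.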
52
tests/test_ollama_embedding.py
Normal file
@@ -0,0 +1,52 @@
+from services.ollama_service import OllamaService
+
+
+class FakeResponse:
+    def __init__(self, status_code, payload=None, text=""):
+        self.status_code = status_code
+        self._payload = payload or {}
+        self.text = text
+
+    def json(self):
+        return self._payload
+
+
+def test_generate_embedding_uses_current_embed_endpoint(monkeypatch):
+    calls = []
+
+    def fake_post(url, json, timeout):
+        calls.append((url, json, timeout))
+        return FakeResponse(200, {"embeddings": [[0.1, 0.2, 0.3]]})
+
+    monkeypatch.setattr("services.ollama_service.requests.post", fake_post)
+
+    vec = OllamaService().generate_embedding("hello", model="bge-m3:latest", host="http://ollama", timeout=7)
+
+    assert vec == [0.1, 0.2, 0.3]
+    assert calls == [
+        ("http://ollama/api/embed", {"model": "bge-m3:latest", "input": "hello"}, 7),
+    ]
+
+
+def test_generate_embedding_falls_back_to_legacy_embeddings_endpoint(monkeypatch):
+    calls = []
+
+    def fake_post(url, json, timeout):
+        calls.append((url, json, timeout))
+        if url.endswith("/api/embed"):
+            return FakeResponse(404, text="not found")
+        return FakeResponse(200, {"embedding": [0.4, 0.5]})
+
+    monkeypatch.setattr("services.ollama_service.requests.post", fake_post)
+
+    vec = OllamaService().generate_embedding("hello", model="bge-m3:latest", host="http://ollama/", timeout=9)
+
+    assert vec == [0.4, 0.5]
+    assert calls == [
+        ("http://ollama/api/embed", {"model": "bge-m3:latest", "input": "hello"}, 9),
+        ("http://ollama/api/embeddings", {"model": "bge-m3:latest", "prompt": "hello"}, 9),
+    ]
+
+
+def test_extract_embedding_accepts_flat_embeddings_shape():
+    assert OllamaService._extract_embedding({"embeddings": [0.1, 0.2]}) == [0.1, 0.2]