# Phase 0 Reconnaissance Report: Operation Ollama-First v5.0

> **Date**: 2026-05-03
> **Produced by**: A1 onboarder (LLM/MCP audit) + A2 web-researcher (replacement verification)
> **Status**: Phase 0 complete; serves as the factual baseline for Phase 1+

---

## TL;DR: Three Must-Read Conclusions

1. **At least 34 LLM call sites measured** (the campaign list originally had 26; this audit adds 8 that were missed). AIGenerationHistory coverage is only **11.8%** (4/34); the remaining 88% have no structured logging at all.
2. **A2 three-item traffic light**: Tavily+Exa 🟢 / Qwen replacement 🟡 / DeepSeek-R1 🔴 (use qwen3:14b instead)
3. **Four P0 risks**: AiderHeal hardcoded to 111, Code Review Hermes hardcoded to 111, bge-m3 `:latest` tag drift, OllamaService multi-worker race condition

---

## Section 1: Full Inventory of LLM Call Sites (34)

### 1.1 Host Tagging Rules

| Tag | Definition |
|---|---|
| `gcp_ollama` | Defaults to GCP (34.21.145.224:11434); automatically falls back to `111_ollama` on failure |
| `ollama_111` | Hardcoded to `192.168.0.111:11434` (e.g. AiderHeal, Code Review Hermes) |
| `gemini` | `google.generativeai` SDK |
| `nim` | NVIDIA NIM `https://integrate.api.nvidia.com/v1` |
| `nim_via_elephant` | `services/elephant_service.py` going through the NIM endpoint |

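The `gcp_ollama` rule in the table (try GCP first, fall back to 111 on failure) can be illustrated with a minimal sketch. The function name `pick_ollama_host`, the injected `is_alive` probe, and the `http://` scheme are assumptions made for illustration; this is not the project's actual `resolve_ollama_host()` implementation.

```python
from typing import Callable, Sequence

# Addresses copied from the table; the http:// scheme is an assumption.
GCP_OLLAMA = "http://34.21.145.224:11434"
OLLAMA_111 = "http://192.168.0.111:11434"


def pick_ollama_host(
    is_alive: Callable[[str], bool],
    candidates: Sequence[str] = (GCP_OLLAMA, OLLAMA_111),
) -> str:
    """Return the first candidate that passes the health probe, in priority order."""
    for host in candidates:
        if is_alive(host):
            return host
    raise RuntimeError("no Ollama host reachable: " + ", ".join(candidates))


# Example: GCP is down, so the 111 host is selected.
chosen = pick_ollama_host(lambda host: host == OLLAMA_111)
```

Injecting the probe keeps the selection logic testable without network access. Call sites tagged `ollama_111` bypass this kind of selection entirely, which is why they are flagged as risks.
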
### 1.2 Full Call Site Table

| ID | Feature | file:line | Model | Host | Cron / Trigger | History? |
|----|------|-----------|------|------|-----------|----------|
| 1 | Hermes competitor price analysis (batch threats) | `services/hermes_analyst_service.py:411-426` | `hermes3:latest` (keep_alive 24h) | gcp_ollama → 111 | Every 4h | ❌ |
| 2 | Hermes L1 intent classification (Telegram NLP) | `services/hermes_analyst_service.py:151-167` | `hermes3:latest` | gcp_ollama → 111 | Event-driven | ❌ |
| 3 | KM Embedding (worker queue) | `services/openclaw_learning_service.py:111` + `services/ollama_service.py:592-639` | `bge-m3:latest` | EMBEDDING_HOST → resolve | Polls every 60s | ❌ |
| 4 | KM Embedding (real-time RAG query) | `services/openclaw_learning_service.py:399` | `bge-m3:latest` | Same as above | Event-driven | ❌ |
| 5 | **AiderHeal Code Repair** ⚠️ | `services/aider_heal_executor.py:48-49` | `qwen2.5-coder:7b` | **Hardcoded 111** (violates ADR-027) | Triggered by Code Review | ❌ |
| 6 | MCP L1/L2 Gemini Grounding | `services/mcp_collector_service.py:163-167, 185-186` | `gemini-2.0-flash` → `gemini-1.5-flash` | gemini | 6 topics / 24h | ❌ |
| 7 | MCP L3 Ollama Fallback | `services/mcp_collector_service.py:205-214` | `qwen2.5-coder:7b` | gcp_ollama → 111 | Only when both Gemini attempts fail | ❌ |
| 8 | OpenClaw daily report | `services/openclaw_strategist_service.py:1093` → `_call_gemini` (L668) → `_call_nvidia_nim` (L694) | `gemini-2.5-flash` → `meta/llama-3.3-70b-instruct` | gemini → nim | Daily 09:00 | ❌ |
| 9 | OpenClaw weekly report | `services/openclaw_strategist_service.py:759` | Same as above | Same as above | Mondays 06:00 | ❌ |
| 10 | OpenClaw monthly report | `services/openclaw_strategist_service.py:1267` | Same as above | Same as above | 1st of each month 07:00 | ❌ |
| 11 | OpenClaw Meta self-review | `services/openclaw_strategist_service.py:1503` | Same as above | Same as above | Every 6h | ❌ |
| 12 | OpenClaw Q&A (Telegram NLP) | `services/openclaw_strategist_service.py:56` | Same as above | gemini → nim | Event-driven | ❌ |
| 13 | **NemoTron action dispatch** | `services/nemoton_dispatcher_service.py:101-102` | `meta/llama-3.1-8b-instruct` | nim (quota: 80 calls/day) | Every 4h | ❌ |
| 14 | **Code Review: Hermes scan** ⚠️ | `services/code_review_pipeline_service.py:218-225` | `hermes3:latest` | **Hardcoded HERMES_URL (111)** | CD deploy | ❌ |
| 15 | Code Review: OpenClaw evaluation | `services/code_review_pipeline_service.py:278-286` | `gemini-2.5-flash` | gemini | CD deploy | ❌ |
| 16 | Code Review: ElephantAlpha fallback | `services/code_review_pipeline_service.py:293-299` → `services/elephant_service.py:24-30` | `nvidia/llama-3.3-nemotron-super-49b-v1.5` (chain) | nim | CD deploy | ❌ |
| 17 | EA Autonomous Engine | `services/elephant_alpha_autonomous_engine.py:540` | ElephantService | nim | daemon thread | ❌ |
| 18 | EA HITL pre-fetch (Hermes pre-run) | `services/elephant_alpha_orchestrator.py` (line TBD) | `hermes3:latest` | gcp_ollama → 111 | EA escalation event | ❌ |
| 19 | PPT Gemini analysis | `routes/openclaw_bot_routes.py:2464-2477` `_call_gemini` | `gemini-2.0-flash` | gemini | Telegram command | ❌ |
| 20 | PPT Ollama Fallback | `routes/openclaw_bot_routes.py:2479-2500` | `qwen2.5-coder:7b` | gcp_ollama → 111 | Primary path failure | ❌ |
| 21 | **PPT NIM (deepseek-v3.2)** ⚠️ | `routes/openclaw_bot_routes.py:2513-2528` | `deepseek-ai/deepseek-v3.2` (not in the ELEPHANT_FALLBACK list) | nim | Same as above | ❌ |
| 22 | Sales Copy | `routes/ai_routes.py:650` + `services/ollama_service.py:219-308` | `llama3.1:8b` | gcp_ollama → 111 | HTTP API | ✅ |
| 23 | Trend product matching | `routes/ai_routes.py:503` | `llama3.1:8b` | gcp_ollama → 111 | HTTP API | ✅ |
| 24 | Trend Web Search Q&A | `routes/trend_routes.py:293-294` + `routes/ai_routes.py:1129` | `llama3.1:8b` | gcp_ollama → 111 | HTTP | Partial ✅ |
| 25 | Product Insights | `routes/ai_routes.py:1219` | `llama3.1:8b` | gcp_ollama → 111 | HTTP | ✅ |
| 26 | Trend Keywords | `routes/ai_routes.py:1307` | `llama3.1:8b` | gcp_ollama → 111 | HTTP | ✅ |
| 27 | Telegram Bot `/copy` | `services/telegram_bot_service.py:347-362` | `llama3.1:8b` | gcp_ollama → 111 | Telegram | ❌ |
| 28 | Telegram Bot, second call site | `services/telegram_bot_service.py:1204-1206` | `llama3.1:8b` | gcp_ollama → 111 | Telegram | ❌ |
| 29 | OpenClaw Bot Q&A primary chain (Ollama) | `routes/openclaw_bot_routes.py:6784-6824` | `llama3.1:8b` | gcp_ollama → 111 | Telegram | ❌ |
| 30 | OpenClaw Bot Q&A fallback (Gemini) | `routes/openclaw_bot_routes.py:~6843+` | `gemini-2.0-flash` | gemini | fallback | ❌ |
| 31 | OpenClaw Bot Q&A fallback (NIM) | `routes/openclaw_bot_routes.py` | `deepseek-ai/deepseek-v3.2` | nim | fallback | ❌ |
| 32 | bot_api_routes copywriting | `routes/bot_api_routes.py:673-693` | `llama3.1:8b` | gcp_ollama → 111 | Internal HTTP | ❌ |
| 33 | trend_crawler_service Ollama | `services/trend_crawler_service.py:35` | `llama3.1:8b` | gcp_ollama → 111 | Trend crawler flow | ❌ |
| 34 | ai_provider abstraction layer | `services/ai_provider.py:74` | `llama3.1:8b` | gcp_ollama → 111 | Triggered by caller | ❌ |

### 1.3 Eight Call Sites Missing from the Campaign List

- #27/#28: two call sites in `telegram_bot_service.py`
- #32 `routes/bot_api_routes.py:673`
- #33 `services/trend_crawler_service.py:35`
- #34 `services/ai_provider.py:74`
- #17 EA Engine and #18 EA HITL pre-fetch are two independent chains
- The Code Review pipeline actually **calls three independent LLMs: Hermes (#14) + Gemini (#15) + ElephantAlpha (#16)**

### 1.4 AIGenerationHistory Coverage

- Only 4 call sites, all in `routes/ai_routes.py` (L361/1163/1252/1339)
- **Coverage: 4/34 ≈ 11.8%**
- Phase 1 must create a unified `ai_calls` table and wire in the remaining 30 call sites

---

## Section 2: Traffic-Light Ratings for 13 MCP Servers

| # | MCP Server | Rating | Assessment |
|---|-----------|--------|------|
| 1 | mcp-omnisearch (Tavily/Exa) | 🟢 Adopt now | Removes the single-point dependency on Gemini Grounding |
| 2 | firecrawl-mcp (self-hosted) | 🟢 Adopt now | Strengthens handling of SPAs and anti-scraping measures; **mem_limit:2g + chrome-reaper are mandatory** |
| 3 | postgres-mcp | 🟢 Adopt now | RBAC restricted to SELECT on hot tables such as ai_insights/daily_sales/competitor_prices |
| 4 | playwright-mcp | 🟡 Evaluate later | Overlaps with firecrawl; pick one |
| 5 | memory-mcp (Anthropic KG) | 🔴 Reject | Violates ADR-002 (pgvector only) |
| 6 | fetch-mcp | 🟡 Evaluate later | Plain HTTP; a one-line `requests.get` does the job |
| 7 | sequential-thinking-mcp | 🟡 Evaluate later | Re-evaluate after Phase 11 RAG is complete |
| 8 | filesystem-mcp | 🟢 Adopt now | Development efficiency across 188/110/MacBook |
| 9 | git-mcp | 🟢 Adopt now | momo uses Gitea, so choose git-mcp (github-mcp does not apply) |
| 10 | time-mcp | 🟡 Evaluate later | TAIPEI_TZ handling already exists; low priority |
| 11 | sentry-mcp | 🔴 Reject | momo does not use Sentry; rely on the existing ADR-013 AutoHeal loop |
| 12 | slack-mcp | 🔴 Reject | The Commander uses Telegram |
| 13 | gdrive-mcp | 🟡 Evaluate later | Reconsider once PPT v3 is stable |

### 2.1 Phase 10 Adoption Order (the 5 🟢)

1. **postgres-mcp** (highest ROI: the Commander runs SQL queries daily)
2. **mcp-omnisearch** (Tavily primary + Exa backup; Tavily offers 1000 free/month; avoid Brave)
3. **filesystem-mcp** (cross-host development efficiency)
4. **firecrawl-mcp** (crawler resilience)
5. **git-mcp** (Gitea compatible)

---

## Section 3: BGE-M3 Consistency Status Report

### 3.1 Model Parameter Inventory

| Item | Actual state |
|------|------|
| Main call site | `services/ollama_service.py:592-639` `generate_embedding` |
| Default model | `bge-m3:latest` (floating tag, **risk**) |
| API endpoint | Primary: `POST /api/embed`; fallback: `POST /api/embeddings` |
| Host resolution | `host` argument > `EMBEDDING_HOST` env > `resolve_ollama_host()` |
| Timeout | env `OLLAMA_EMBED_TIMEOUT` or `EMBEDDING_TIMEOUT`; default 45s |
| **normalize parameter** | ❌ **Not passed explicitly** (relies on the server-side default) |
| **pooling strategy** | ❌ **Not passed explicitly** (relies on the server-side default, mean) |
| Dimensions | 1024 (locked by the pgvector column) |
| HNSW index | `vector_cosine_ops` (cosine distance) |

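The host and timeout precedence listed in the table can be sketched as two small pure functions. The function names, the injected `fallback` callable (standing in for `resolve_ollama_host()`), and the order in which the two timeout variables are consulted are assumptions; the report only states that either variable may be set.

```python
import os
from typing import Callable, Mapping, Optional


def resolve_embedding_host(
    host: Optional[str],
    fallback: Callable[[], str],
    env: Mapping[str, str] = os.environ,
) -> str:
    """Explicit argument wins, then EMBEDDING_HOST, then the fallback resolver."""
    return host or env.get("EMBEDDING_HOST") or fallback()


def resolve_embed_timeout(
    env: Mapping[str, str] = os.environ,
    default: float = 45.0,
) -> float:
    """Read OLLAMA_EMBED_TIMEOUT, then EMBEDDING_TIMEOUT (assumed order), else 45 s."""
    raw = env.get("OLLAMA_EMBED_TIMEOUT") or env.get("EMBEDDING_TIMEOUT")
    return float(raw) if raw else default
```

Passing `env` as a parameter lets the precedence be checked with a plain dictionary instead of mutating the process environment.
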
### 3.2 Risk Warnings

🔴 **HIGH risk 1: normalize is not enforced**
- The bge-m3 server-side default is normalize=True, but no contract in code locks it in
- **Guardrail**: record an `embedding_signature` (hash of model + normalize + dim) when writing to ai_insights

🟡 **MED risk 2: `bge-m3:latest` floating tag**
- `:latest` can move to a new version on any Ollama upgrade, so **RAG recall can degrade silently**
- **Guardrail**: pin to a specific digest or a fixed tag

🟢 **LOW risk 3: dim=1024 consistency**
- Code and schema both lock 1024; no conflict

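One possible shape for the `embedding_signature` guardrail is a hash over the settings that determine the vector space, plus a client-side check that a returned vector really is L2-normalized. This is a sketch under stated assumptions: the use of SHA-256 over canonical JSON and both function names are illustrative choices, not the format the project has committed to.

```python
import hashlib
import json
import math
from typing import Sequence


def embedding_signature(model: str, normalize: bool, dim: int) -> str:
    """Stable SHA-256 hex digest over the settings that shape an embedding."""
    payload = json.dumps(
        {"model": model, "normalize": normalize, "dim": dim},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_unit_norm(vector: Sequence[float], tol: float = 1e-3) -> bool:
    """True when the vector's L2 norm is within `tol` of 1.0."""
    norm = math.sqrt(sum(x * x for x in vector))
    return abs(norm - 1.0) <= tol


# Rows written with different settings get different signatures.
sig_a = embedding_signature("bge-m3:latest", True, 1024)
sig_b = embedding_signature("bge-m3:latest", False, 1024)
```

Storing the signature per row makes it possible to find and re-embed rows produced under an older model or normalization setting. Hashing a pinned digest instead of the `:latest` tag would also cover risk 2.
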
### 3.3 ai_insights.embedding Statistics (**pending confirmation via SSH to 188**)

```sql
SELECT
  COUNT(*) AS total,
  COUNT(embedding) AS with_embedding,
  COUNT(*) - COUNT(embedding) AS missing,
  MIN(created_at) FILTER (WHERE embedding IS NOT NULL) AS earliest,
  MAX(created_at) FILTER (WHERE embedding IS NOT NULL) AS latest,
  COUNT(DISTINCT array_length(embedding::real[], 1)) AS distinct_dims
FROM ai_insights;
```

> **These statistics are needed before Phase 11 starts**

### 3.4 Embedding Worker Liveness Check (**pending SSH to 188**)

```bash
docker logs momo-scheduler 2>&1 | grep "OCLearn"
```

If the worker has died, new ai_insights rows keep accumulating with `embedding IS NULL`, and RAG recall degrades with no alert.

---

## Section 4: A2 Replacement Verification Traffic Lights

| Task | Verdict | Tactic |
|------|------|------|
| OpenClaw Q&A: Gemini → Qwen | 🟡 Yellow | qwen3:14b + prompt forcing Traditional Chinese + Gemini fallback chain + **golden test set A/B run is mandatory** |
| Nemotron: NIM → DeepSeek-R1 | 🔴 Red | **Use qwen3:14b instead** (DeepSeek-R1 tool_calls support on Ollama is nominal only; GitHub Issue #10935 is unresolved) |
| Phase 10 Search API | 🟢 Green | Tavily primary (1000 free/month) + Exa backup (1000 free), monthly cost $0; **avoid Brave** (free tier cancelled 2026-02-12) |

### 4.1 Three Major Warnings

1. **Qwen's Traditional Chinese weakness has academic backing** (TMMLU+ paper): the golden-set A/B run is mandatory
2. **DeepSeek-R1 on Ollama is "fake support"**: the official listing advertises the tools capability, but the chat template lacks the corresponding jinja
3. **Major Brave policy change**: after 2026-02-12, new users must attach a credit card

---

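## Section 5: Recommendations for the Commander's Decision

### 5.1 Phase 1 LLM Logger: Top 5 Priority Integration Points

| Priority | Call site | Rationale |
|-----|--------|------|
| **#1** | NemoTron dispatch (#13) | NIM hard cap of 80 calls/day + structured output; quota management is a hard requirement |
| **#2** | OpenClaw's three major reports plus Meta self-review (#8/#9/#10/#11, four call sites merged) | Main Gemini workload; full trace of prompt + output + tokens |
| **#3** | Hermes competitor price analysis (#1) | Runs every 4h with ~300 products each time; need to trace back why SKUs were missed |
| **#4** | Code Review three-chain (#14/#15/#16) | ElephantAlpha 49B cost is significant and needs tracking |
| **#5** | OpenClaw Bot Q&A three-tier fallback (#29/#30/#31) | Front line of the Telegram user experience |
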
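The quota concern behind priority #1 (NIM's 80 calls/day cap) could be handled by a per-day counter checked before each call. The sketch below is illustrative only: the `DailyQuota` class is hypothetical and keeps counts in memory, whereas a real deployment would presumably derive them from the planned `ai_calls` table so that multiple processes share one count.

```python
from collections import defaultdict
from datetime import date
from typing import DefaultDict, Tuple


class DailyQuota:
    """In-memory per-day call counter keyed by (provider, day)."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self._used: DefaultDict[Tuple[str, date], int] = defaultdict(int)

    def try_consume(self, provider: str, day: date) -> bool:
        """Record one call and return True, or return False once the limit is hit."""
        key = (provider, day)
        if self._used[key] >= self.limit:
            return False
        self._used[key] += 1
        return True

    def remaining(self, provider: str, day: date) -> int:
        """Calls still available for this provider on this day."""
        return self.limit - self._used[(provider, day)]


# Example: the 80 calls/day cap from the table.
nim_quota = DailyQuota(limit=80)
today = date(2026, 5, 3)
allowed = nim_quota.try_consume("nim", today)
```

Keying by provider rather than by model matters for the two NIM chains that use different models, since both draw on the same daily allowance.
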
### 5.2 Proposed Unified Interface

```python
@llm_call_logger(provider, model, callsite)
def some_llm_call(*args, **kwargs):
    # Captured automatically: prompt/output/tokens_in/tokens_out/duration/host/error/cost
    # Dual-write: ai_calls table + structured log
    ...
```

AiderHeal (#5) is not wired into the logger for now (it runs a CLI over SSH, outside the Python process).

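A minimal sketch of how a decorator with the proposed `llm_call_logger(provider, model, callsite)` signature might be built follows. It records only duration, output, and error into an in-memory list; the `CALL_LOG` sink and the `echo` function are hypothetical stand-ins, and token counts, cost, and the dual-write to `ai_calls` are left out.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# Hypothetical in-memory sink; the plan in this report is to write to ai_calls.
CALL_LOG: List[Dict[str, Any]] = []


def llm_call_logger(provider: str, model: str, callsite: str) -> Callable:
    """Decorator factory that records duration, output, and errors for each call."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            record: Dict[str, Any] = {
                "provider": provider,
                "model": model,
                "callsite": callsite,
                "error": None,
            }
            start = time.perf_counter()
            try:
                result = func(*args, **kwargs)
                record["output"] = result
                return result
            except Exception as exc:
                record["error"] = repr(exc)
                raise  # logging must not swallow the caller's exception
            finally:
                record["duration_s"] = time.perf_counter() - start
                CALL_LOG.append(record)

        return wrapper

    return decorator


@llm_call_logger("ollama", "llama3.1:8b", "demo.echo")
def echo(prompt: str) -> str:
    return prompt.upper()  # stand-in for a real LLM call
```

Appending the record in `finally` means failed calls are logged as well, which is what quota accounting and post-mortem tracing both need.
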
### 5.3 Phase 11 RAG Consistency Guardrails (must be completed before Phase 11 starts)

1. **Lock the bge-m3 model signature**: pin the digest + add an `embedding_signature` column to ai_insights
2. **Confirm the embedding worker is alive**: SSH to 188 and verify that the retry queue worker is actually running

### 5.4 Campaign-Level Risk Disclosure (v5.1 revision)

🔴 **New Phase 2 fix items**:
- AiderHeal `services/aider_heal_executor.py:48` is hardcoded to 111 → switch to resolve_ollama_host
- Code Review Hermes `services/code_review_pipeline_service.py:218` is hardcoded to 111 → same fix

🟡 **New Phase 3 watch items**:
- PPT NIM uses deepseek-v3.2, which is not in ELEPHANT_FALLBACK_MODELS → the two NIM chains use different models, so quota usage is easy to undercount
- OllamaService global singleton + monkey-patch race risk (gunicorn with multiple workers)

---

## Appendix: Key File Paths

```
services/ollama_service.py
services/hermes_analyst_service.py
services/openclaw_strategist_service.py
services/openclaw_learning_service.py
services/mcp_collector_service.py
services/nemoton_dispatcher_service.py
services/elephant_service.py
services/elephant_alpha_autonomous_engine.py
services/elephant_alpha_orchestrator.py
services/code_review_pipeline_service.py
services/aider_heal_executor.py
services/ai_history_service.py
services/telegram_bot_service.py
services/trend_crawler_service.py
services/ai_provider.py
routes/openclaw_bot_routes.py
routes/ai_routes.py
routes/trend_routes.py
routes/bot_api_routes.py
scheduler.py
run_scheduler.py
migrations/009_pgvector_embedding.sql
migrations/011_embedding_retry_queue.sql
```

---

## Sources (A2 web research)

- [Qwen3 Technical Report (arXiv)](https://arxiv.org/pdf/2505.09388)
- [Ollama qwen3 registry](https://ollama.com/library/qwen3)
- [TMMLU+ Traditional Chinese Eval (arXiv)](https://arxiv.org/html/2403.01858v1)
- [DeepSeek-R1-0528 Release Notes](https://api-docs.deepseek.com/news/news250528)
- [Ollama Issue #10935: R1 missing tool calling](https://github.com/ollama/ollama/issues/10935)
- [Tavily Pricing](https://www.tavily.com/pricing)
- [Brave Free Tier Removal](https://www.implicator.ai/brave-drops-free-search-api-tier-puts-all-developers-on-metered-billing/)
- [Exa API Pricing](https://exa.ai/pricing)