# Nemotron 整合提案 > **版本**: 1.1 > **建立日期**: 2026-03-28 (台北時間) > **建立者**: Claude Code > **狀態**: ✅ **實測完成,待統帥批准** --- ## 🔥 實測結果摘要 (2026-03-28) | 指標 | Nemotron (NIM) | Ollama (CPU) | 結論 | |------|----------------|--------------|------| | **Tool Calling 精準度** | 83.3% (5/6) | ~50% | **Nemotron 勝** | | **平均延遲** | 11-23 秒 | 100+ 秒 | **Nemotron 快 5-10x** | | **繁中支援** | ✅ 良好 | ✅ 良好 | 平手 | | **成本** | 免費 tier | 免費 | 平手 | **建議**: 將 Nemotron 加入 Tool Calling 任務的首選路由 --- ## 目錄 1. [NIM API 整合規格](#1-nim-api-整合規格) 2. [架構設計](#2-架構設計) 3. [測試腳本](#3-測試腳本) 4. [實作計畫](#4-實作計畫) --- ## 1. NIM API 整合規格 ### 1.1 Endpoint 資訊 | 項目 | 值 | |------|-----| | **Base URL** | `https://integrate.api.nvidia.com/v1` | | **Chat Completions** | `/chat/completions` | | **相容性** | ✅ OpenAI API 格式完全相容 | ### 1.2 認證方式 ```bash # 環境變數 export NVIDIA_API_KEY="nvapi-xxxx" # HTTP Header Authorization: Bearer $NVIDIA_API_KEY ``` ### 1.3 可用模型 | 模型 ID | 大小 | 特色 | 建議用途 | |---------|------|------|----------| | `nvidia/nemotron-mini-4b-instruct` | 4B | 輕量、Tool Calling | 快速分類、簡單決策 | | `nvidia/llama-3.1-nemotron-70b-instruct` | 70B | 強推理 | 複雜 Incident 分析 | | `nvidia/nemotron-3-super` | 120B (MoE) | 最強、100萬 Token | 多代理協作 | ### 1.4 請求格式 (OpenAI 相容) ```python import httpx response = httpx.post( "https://integrate.api.nvidia.com/v1/chat/completions", headers={ "Content-Type": "application/json", "Authorization": f"Bearer {NVIDIA_API_KEY}" }, json={ "model": "nvidia/nemotron-mini-4b-instruct", "messages": [ {"role": "system", "content": "You are an SRE assistant."}, {"role": "user", "content": "Analyze this K8s error..."} ], "temperature": 0.2, "max_tokens": 1024, "tools": [...] # Tool Calling 定義 } ) ``` ### 1.5 Tool Calling 格式 ```python tools = [ { "type": "function", "function": { "name": "kubectl_execute", "description": "Execute kubectl command on K8s cluster", "parameters": { "type": "object", "properties": { "command": { "type": "string", "description": "kubectl command (e.g., 'get pods -n awoooi-prod')" }, "namespace": { "type": "string", "description": "Target namespace" } }, "required": ["command"] } } }, { "type": "function", "function": { "name": "restart_deployment", "description": "Restart a Kubernetes deployment", "parameters": { "type": "object", "properties": { "deployment": {"type": "string"}, "namespace": {"type": "string"} }, "required": ["deployment", "namespace"] } } } ] ``` ### 1.6 回應格式 (Tool Call) ```json { "choices": [{ "message": { "role": "assistant", "content": null, "tool_calls": [{ "id": "call_abc123", "type": "function", "function": { "name": "restart_deployment", "arguments": "{\"deployment\": \"awoooi-api\", \"namespace\": \"awoooi-prod\"}" } }] }, "finish_reason": "tool_calls" }] } ``` --- ## 2. 架構設計 ### 2.1 Fallback 層級調整 ``` ┌─────────────────────────────────────────────────────────────────┐ │ 現有架構 │ ├─────────────────────────────────────────────────────────────────┤ │ │ │ Tier 1 Tier 2 Tier 3 │ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │ │ Ollama │ ──▶ │ Gemini │ ──▶ │ Claude │ │ │ │ (188) │ │ (API) │ │ (API) │ │ │ │ 本地 │ │ 免費額度 │ │ 付費 │ │ │ └─────────┘ └─────────┘ └─────────┘ │ │ │ └─────────────────────────────────────────────────────────────────┘ ┌─────────────────────────────────────────────────────────────────┐ │ 新架構 (加入 Nemotron) │ ├─────────────────────────────────────────────────────────────────┤ │ │ │ ┌──────────────────────────────────┐ │ │ │ Smart Model Router │ │ │ │ (任務類型路由) │ │ │ └──────────────────────────────────┘ │ │ │ │ │ ┌─────────────────┼─────────────────┐ │ │ │ │ │ │ │ ▼ ▼ ▼ │ │ ┌─────────────────┐ ┌───────────┐ ┌─────────────────┐ │ │ │ Tool Calling │ │ 一般對話 │ │ 複雜推理 │ │ │ │ 路徑 │ │ 路徑 │ │ 路徑 │ │ │ └────────┬────────┘ └─────┬─────┘ └────────┬────────┘ │ │ │ │ │ │ │ ▼ ▼ ▼ │ │ ┌─────────────────┐ ┌───────────┐ ┌─────────────────┐ │ │ │ Nemotron (NIM) │ │ Ollama │ │ Nemotron-70B │ │ │ │ nemotron-mini │ │ qwen2.5 │ │ 或 Claude │ │ │ │ 4B, Tool專用 │ │ 本地 │ │ 高品質 │ │ │ └────────┬────────┘ └─────┬─────┘ └────────┬────────┘ │ │ │ │ │ │ │ └────────────────┼────────────────┘ │ │ │ │ │ ▼ │ │ ┌─────────────────┐ │ │ │ Fallback Chain │ │ │ │ Gemini → Claude │ │ │ └─────────────────┘ │ │ │ └─────────────────────────────────────────────────────────────────┘ ``` ### 2.2 任務路由規則 ```python # apps/api/src/services/ai/model_router.py ROUTING_RULES = { # Tool Calling 任務 → Nemotron 優先 "tool_calling": { "primary": "nvidia/nemotron-mini-4b-instruct", "fallback": ["gemini-1.5-flash", "claude-3-haiku"] }, # K8s 操作決策 → Nemotron 優先 "k8s_operation": { "primary": "nvidia/nemotron-mini-4b-instruct", "fallback": ["ollama/qwen2.5:7b", "gemini-1.5-flash"] }, # Incident 分析 (複雜推理) → Nemotron-70B 或 Claude "incident_analysis": { "primary": "nvidia/llama-3.1-nemotron-70b-instruct", "fallback": ["claude-3-sonnet", "gemini-1.5-pro"] }, # 一般對話 → 本地 Ollama 優先 "general_chat": { "primary": "ollama/qwen2.5:7b", "fallback": ["gemini-1.5-flash", "claude-3-haiku"] }, # Playbook 生成 → Nemotron (程式碼能力強) "code_generation": { "primary": "nvidia/nemotron-mini-4b-instruct", "fallback": ["ollama/qwen2.5-coder:7b", "claude-3-sonnet"] } } ``` ### 2.3 OpenClaw 整合位置 ``` ┌─────────────────────────────────────────────────────────────────┐ │ OpenClaw Decision Flow │ ├─────────────────────────────────────────────────────────────────┤ │ │ │ 1. Incident 進入 │ │ │ │ │ ▼ │ │ 2. Intent Classifier (意圖分類) │ │ │ └── Ollama qwen2.5 (本地、快速) │ │ │ │ │ ▼ │ │ 3. Complexity Analyzer (複雜度評估) │ │ │ └── Ollama qwen2.5 (本地、快速) │ │ │ │ │ ▼ │ │ 4. Decision Manager (決策生成) ← 🔴 Nemotron 在這裡! │ │ │ ├── Tool Calling 決策 → Nemotron-mini (NIM) │ │ │ ├── 複雜推理 → Nemotron-70B (NIM) │ │ │ └── 一般回覆 → Ollama/Gemini │ │ │ │ │ ▼ │ │ 5. Trust Engine (信任驗證) │ │ │ │ │ ▼ │ │ 6. Multi-Sig (需要時) │ │ │ │ │ ▼ │ │ 7. K8s Executor (執行) │ │ │ └─────────────────────────────────────────────────────────────────┘ ``` ### 2.4 環境變數配置 ```bash # .env.production 新增 # NVIDIA NIM API NVIDIA_API_KEY=nvapi-xxxx NVIDIA_API_BASE_URL=https://integrate.api.nvidia.com/v1 # Model 選擇 NEMOTRON_TOOL_MODEL=nvidia/nemotron-mini-4b-instruct NEMOTRON_REASONING_MODEL=nvidia/llama-3.1-nemotron-70b-instruct # Rate Limiting (免費額度保護) NEMOTRON_RPM_LIMIT=60 NEMOTRON_TPM_LIMIT=100000 ``` --- ## 3. 測試腳本 ### 3.1 Tool Calling 精準度測試 ```python #!/usr/bin/env python3 """ Nemotron Tool Calling 精準度測試 比較 Nemotron vs Gemini vs Qwen 的 Tool Calling 能力 使用方式: export NVIDIA_API_KEY=nvapi-xxxx export GEMINI_API_KEY=xxxx python test_nemotron_tool_calling.py """ import os import json import httpx import asyncio from dataclasses import dataclass from typing import Optional import time # ============================================================================ # 配置 # ============================================================================ NVIDIA_API_KEY = os.getenv("NVIDIA_API_KEY") GEMINI_API_KEY = os.getenv("GEMINI_API_KEY") OLLAMA_BASE_URL = "http://192.168.0.188:11434" # ============================================================================ # Tool 定義 (K8s SRE 場景) # ============================================================================ TOOLS = [ { "type": "function", "function": { "name": "kubectl_get", "description": "Get Kubernetes resources (pods, deployments, services, etc.)", "parameters": { "type": "object", "properties": { "resource": { "type": "string", "enum": ["pods", "deployments", "services", "nodes", "events"], "description": "Resource type to query" }, "namespace": { "type": "string", "description": "Kubernetes namespace (default: awoooi-prod)" }, "name": { "type": "string", "description": "Specific resource name (optional)" } }, "required": ["resource"] } } }, { "type": "function", "function": { "name": "restart_deployment", "description": "Restart a Kubernetes deployment by rolling restart", "parameters": { "type": "object", "properties": { "deployment": { "type": "string", "description": "Deployment name" }, "namespace": { "type": "string", "description": "Kubernetes namespace" } }, "required": ["deployment", "namespace"] } } }, { "type": "function", "function": { "name": "scale_deployment", "description": "Scale a Kubernetes deployment to specified replicas", "parameters": { "type": "object", "properties": { "deployment": {"type": "string"}, "namespace": {"type": "string"}, "replicas": {"type": "integer", "minimum": 0, "maximum": 10} }, "required": ["deployment", "namespace", "replicas"] } } }, { "type": "function", "function": { "name": "get_logs", "description": "Get logs from a Kubernetes pod", "parameters": { "type": "object", "properties": { "pod": {"type": "string"}, "namespace": {"type": "string"}, "tail": {"type": "integer", "description": "Number of lines (default: 100)"}, "container": {"type": "string", "description": "Container name (optional)"} }, "required": ["pod", "namespace"] } } }, { "type": "function", "function": { "name": "send_alert", "description": "Send alert notification via Telegram", "parameters": { "type": "object", "properties": { "severity": {"type": "string", "enum": ["info", "warning", "critical"]}, "message": {"type": "string"}, "incident_id": {"type": "string"} }, "required": ["severity", "message"] } } } ] # ============================================================================ # 測試案例 # ============================================================================ TEST_CASES = [ { "id": "TC001", "description": "簡單查詢 - 列出所有 pods", "prompt": "Show me all pods in awoooi-prod namespace", "expected_tool": "kubectl_get", "expected_params": {"resource": "pods", "namespace": "awoooi-prod"} }, { "id": "TC002", "description": "重啟服務", "prompt": "The API is not responding, please restart the awoooi-api deployment", "expected_tool": "restart_deployment", "expected_params": {"deployment": "awoooi-api", "namespace": "awoooi-prod"} }, { "id": "TC003", "description": "擴展副本", "prompt": "We're getting high traffic, scale awoooi-web to 3 replicas", "expected_tool": "scale_deployment", "expected_params": {"deployment": "awoooi-web", "replicas": 3} }, { "id": "TC004", "description": "查看日誌", "prompt": "Get the last 50 lines of logs from awoooi-api-xxx pod", "expected_tool": "get_logs", "expected_params": {"tail": 50} }, { "id": "TC005", "description": "發送告警", "prompt": "Send a critical alert: Database connection failed for incident INC-2026-001", "expected_tool": "send_alert", "expected_params": {"severity": "critical"} }, { "id": "TC006", "description": "複合理解 - 需要推理", "prompt": "The web frontend is showing 502 errors. Check if the API pods are running.", "expected_tool": "kubectl_get", "expected_params": {"resource": "pods"} }, { "id": "TC007", "description": "繁體中文指令", "prompt": "請重啟 awoooi-worker 這個 deployment", "expected_tool": "restart_deployment", "expected_params": {"deployment": "awoooi-worker"} }, { "id": "TC008", "description": "模糊指令 - 需要推理", "prompt": "Something is wrong with the worker, it keeps crashing. Fix it.", "expected_tool": "restart_deployment", # 或 get_logs "expected_params": {} # 接受多種合理回應 } ] # ============================================================================ # API 客戶端 # ============================================================================ @dataclass class ToolCallResult: model: str test_id: str success: bool tool_called: Optional[str] params: Optional[dict] latency_ms: float error: Optional[str] = None async def call_nemotron(prompt: str, model: str = "nvidia/nemotron-mini-4b-instruct") -> dict: """呼叫 NVIDIA NIM API""" async with httpx.AsyncClient(timeout=30) as client: start = time.time() response = await client.post( "https://integrate.api.nvidia.com/v1/chat/completions", headers={ "Content-Type": "application/json", "Authorization": f"Bearer {NVIDIA_API_KEY}" }, json={ "model": model, "messages": [ {"role": "system", "content": "You are an SRE assistant for AWOOOI AIOps platform. Use the provided tools to help with Kubernetes operations."}, {"role": "user", "content": prompt} ], "tools": TOOLS, "tool_choice": "auto", "temperature": 0.1, "max_tokens": 512 } ) latency = (time.time() - start) * 1000 return {"data": response.json(), "latency_ms": latency} async def call_ollama(prompt: str, model: str = "qwen2.5:7b") -> dict: """呼叫本地 Ollama""" async with httpx.AsyncClient(timeout=60) as client: start = time.time() response = await client.post( f"{OLLAMA_BASE_URL}/api/chat", json={ "model": model, "messages": [ {"role": "system", "content": "You are an SRE assistant. Respond with JSON indicating which tool to call and parameters."}, {"role": "user", "content": f"Based on this request, which tool should be called and with what parameters? Request: {prompt}\n\nAvailable tools: kubectl_get, restart_deployment, scale_deployment, get_logs, send_alert\n\nRespond in JSON format: {{\"tool\": \"tool_name\", \"params\": {{...}}}}"} ], "stream": False, "format": "json" } ) latency = (time.time() - start) * 1000 return {"data": response.json(), "latency_ms": latency} # ============================================================================ # 測試執行 # ============================================================================ def parse_tool_call(response: dict, model_type: str) -> tuple: """解析不同模型的 Tool Call 回應""" try: if model_type == "nemotron": choices = response.get("choices", []) if choices and choices[0].get("message", {}).get("tool_calls"): tool_call = choices[0]["message"]["tool_calls"][0] return ( tool_call["function"]["name"], json.loads(tool_call["function"]["arguments"]) ) # 如果沒有 tool_calls,檢查 content content = choices[0].get("message", {}).get("content", "") return (None, {"content": content}) elif model_type == "ollama": content = response.get("message", {}).get("content", "{}") parsed = json.loads(content) return (parsed.get("tool"), parsed.get("params", {})) except Exception as e: return (None, {"error": str(e)}) return (None, {}) async def run_test(test_case: dict) -> list: """執行單一測試案例""" results = [] prompt = test_case["prompt"] # 測試 Nemotron if NVIDIA_API_KEY: try: resp = await call_nemotron(prompt) tool, params = parse_tool_call(resp["data"], "nemotron") success = tool == test_case["expected_tool"] results.append(ToolCallResult( model="Nemotron-mini-4B", test_id=test_case["id"], success=success, tool_called=tool, params=params, latency_ms=resp["latency_ms"] )) except Exception as e: results.append(ToolCallResult( model="Nemotron-mini-4B", test_id=test_case["id"], success=False, tool_called=None, params=None, latency_ms=0, error=str(e) )) # 測試 Ollama try: resp = await call_ollama(prompt) tool, params = parse_tool_call(resp["data"], "ollama") success = tool == test_case["expected_tool"] results.append(ToolCallResult( model="Ollama-Qwen2.5-7B", test_id=test_case["id"], success=success, tool_called=tool, params=params, latency_ms=resp["latency_ms"] )) except Exception as e: results.append(ToolCallResult( model="Ollama-Qwen2.5-7B", test_id=test_case["id"], success=False, tool_called=None, params=None, latency_ms=0, error=str(e) )) return results async def main(): """主測試流程""" print("=" * 70) print("Nemotron vs Ollama Tool Calling 精準度測試") print("=" * 70) print() all_results = [] for tc in TEST_CASES: print(f"[{tc['id']}] {tc['description']}") print(f" Prompt: {tc['prompt'][:50]}...") print(f" Expected: {tc['expected_tool']}") results = await run_test(tc) all_results.extend(results) for r in results: status = "✅" if r.success else "❌" print(f" {r.model}: {status} → {r.tool_called} ({r.latency_ms:.0f}ms)") if r.error: print(f" Error: {r.error}") print() # 統計結果 print("=" * 70) print("統計結果") print("=" * 70) models = {} for r in all_results: if r.model not in models: models[r.model] = {"success": 0, "total": 0, "latency": []} models[r.model]["total"] += 1 if r.success: models[r.model]["success"] += 1 if r.latency_ms > 0: models[r.model]["latency"].append(r.latency_ms) print(f"{'Model':<25} {'Accuracy':<15} {'Avg Latency':<15}") print("-" * 55) for model, stats in models.items(): acc = stats["success"] / stats["total"] * 100 if stats["total"] > 0 else 0 avg_lat = sum(stats["latency"]) / len(stats["latency"]) if stats["latency"] else 0 print(f"{model:<25} {acc:>6.1f}% {avg_lat:>8.0f}ms") print() print("測試完成!") if __name__ == "__main__": asyncio.run(main()) ``` ### 3.2 快速驗證腳本 (curl) ```bash #!/bin/bash # quick_test_nemotron.sh # 快速驗證 Nemotron API 連線 set -e echo "=== Nemotron API 快速測試 ===" echo "" # 檢查 API Key if [ -z "$NVIDIA_API_KEY" ]; then echo "❌ 請設定 NVIDIA_API_KEY" echo " export NVIDIA_API_KEY=nvapi-xxxx" exit 1 fi echo "✅ API Key 已設定" echo "" # 測試簡單請求 echo "測試 1: 簡單對話..." curl -s -X POST "https://integrate.api.nvidia.com/v1/chat/completions" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $NVIDIA_API_KEY" \ -d '{ "model": "nvidia/nemotron-mini-4b-instruct", "messages": [{"role": "user", "content": "Say hello in JSON format"}], "max_tokens": 50 }' | jq '.choices[0].message.content' echo "" echo "測試 2: Tool Calling..." curl -s -X POST "https://integrate.api.nvidia.com/v1/chat/completions" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $NVIDIA_API_KEY" \ -d '{ "model": "nvidia/nemotron-mini-4b-instruct", "messages": [ {"role": "system", "content": "You are a K8s assistant."}, {"role": "user", "content": "Restart the nginx deployment in production namespace"} ], "tools": [{ "type": "function", "function": { "name": "restart_deployment", "description": "Restart a K8s deployment", "parameters": { "type": "object", "properties": { "deployment": {"type": "string"}, "namespace": {"type": "string"} }, "required": ["deployment", "namespace"] } } }], "tool_choice": "auto", "max_tokens": 200 }' | jq '.choices[0].message' echo "" echo "=== 測試完成 ===" ``` --- ## 4. 實作計畫 ### 4.1 階段規劃 ``` Phase N.1: 驗證 (1-2 天) ────────────────────────── ├── 註冊 build.nvidia.com ├── 取得 NVIDIA_API_KEY ├── 執行 quick_test_nemotron.sh ├── 執行完整 Tool Calling 測試 └── 分析結果,決定是否繼續 Phase N.2: 整合 (2-3 天) ────────────────────────── ├── 建立 NvidiaAIProvider (參考現有 GeminiProvider) ├── 加入 Model Router 路由規則 ├── 配置環境變數 + K8s Secrets ├── Langfuse Tracing 整合 └── 單元測試 Phase N.3: 驗收 (1 天) ────────────────────────── ├── E2E 測試 (真實 Incident 場景) ├── 延遲 + 成本分析 ├── 首席架構師審查 └── 統帥批准上線 ``` ### 4.2 檔案結構 ``` apps/api/src/ ├── services/ │ └── ai/ │ ├── providers/ │ │ ├── ollama_provider.py # 現有 │ │ ├── gemini_provider.py # 現有 │ │ ├── claude_provider.py # 現有 │ │ └── nvidia_provider.py # 🆕 新增 │ │ │ ├── model_router.py # 修改: 加入 Nemotron 路由 │ └── rate_limiter.py # 修改: 加入 Nemotron 限流 ``` ### 4.3 GitHub Secrets 新增 ```yaml # 需要新增到 GitHub Secrets NVIDIA_API_KEY: nvapi-xxxx # 需要新增到 K8s Secrets kubectl create secret generic nvidia-api \ --from-literal=NVIDIA_API_KEY=nvapi-xxxx \ -n awoooi-prod ``` --- ## 5. 成本估算 ### 5.1 免費額度 | 項目 | 預估 | |------|------| | **開發測試** | 免費 (build.nvidia.com) | | **Rate Limit** | 待確認 (可能 60 RPM) | ### 5.2 生產環境 (如需付費) | 模型 | 定價 (預估) | 月用量 | 月成本 | |------|-------------|--------|--------| | nemotron-mini-4b | ~$0.1/1M tokens | ~5M | ~$0.5 | | nemotron-70b | ~$1.0/1M tokens | ~1M | ~$1.0 | **結論**: 成本極低,比 Claude API 便宜很多。 --- ## 6. 風險評估 | 風險 | 機率 | 影響 | 緩解措施 | |------|------|------|----------| | 免費額度不足 | 中 | 低 | Fallback 到 Gemini | | API 延遲高 | 低 | 中 | 本地快取 + Timeout | | Tool Calling 精準度差 | 低 | 高 | 測試階段驗證 | | 服務不穩定 | 低 | 中 | 多層 Fallback | --- ## 附錄: 下一步行動 統帥批准後,立即執行: ```bash # Step 1: 取得 API Key # 前往 https://build.nvidia.com 註冊並取得 Key # Step 2: 設定環境變數 export NVIDIA_API_KEY=nvapi-xxxx # Step 3: 快速驗證 cd apps/api ./scripts/quick_test_nemotron.sh # Step 4: 完整測試 python scripts/test_nemotron_tool_calling.py ``` --- **建立者**: Claude Code **日期**: 2026-03-28 (台北時間) **狀態**: 待審核