# Nemotron Integration Proposal

> **Version**: 1.1
> **Created**: 2026-03-28 (Taipei time)
> **Created by**: Claude Code
> **Status**: ✅ **Benchmarks complete; awaiting the Commander's approval**

---

## 🔥 Benchmark Summary (2026-03-28)

| Metric | Nemotron (NIM) | Ollama (CPU) | Verdict |
|------|----------------|--------------|------|
| **Tool Calling accuracy** | 83.3% (5/6) | ~50% | **Nemotron wins** |
| **Average latency** | 11-23 s | 100+ s | **Nemotron ~5-10x faster** |
| **Traditional Chinese support** | ✅ Good | ✅ Good | Tie |
| **Cost** | Free tier | Free | Tie |

**Recommendation**: make Nemotron the preferred route for Tool Calling tasks.

---
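
## Table of Contents

1. [NIM API Integration Specification](#1-nim-api-integration-specification)
2. [Architecture Design](#2-architecture-design)
3. [Test Scripts](#3-test-scripts)
4. [Implementation Plan](#4-implementation-plan)
5. [Cost Estimate](#5-cost-estimate)
6. [Risk Assessment](#6-risk-assessment)

---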

## 1. NIM API Integration Specification

### 1.1 Endpoint

| Item | Value |
|------|-----|
| **Base URL** | `https://integrate.api.nvidia.com/v1` |
| **Chat Completions** | `/chat/completions` |
| **Compatibility** | ✅ OpenAI API format (request and response) |

### 1.2 Authentication

```bash
# Environment variable
export NVIDIA_API_KEY="nvapi-xxxx"

# Every request carries this HTTP header:
#   Authorization: Bearer $NVIDIA_API_KEY
```

### 1.3 Available Models

| Model ID | Size | Highlights | Suggested use |
|---------|------|------|----------|
| `nvidia/nemotron-mini-4b-instruct` | 4B | Lightweight, tool calling | Fast classification, simple decisions |
| `nvidia/llama-3.1-nemotron-70b-instruct` | 70B | Strong reasoning | Complex incident analysis |
| `nvidia/nemotron-3-super` | 120B (MoE) | Most capable, 1M-token context | Multi-agent collaboration |

### 1.4 Request Format (OpenAI-compatible)

```python
import os

import httpx

NVIDIA_API_KEY = os.environ["NVIDIA_API_KEY"]

response = httpx.post(
    "https://integrate.api.nvidia.com/v1/chat/completions",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {NVIDIA_API_KEY}",
    },
    json={
        "model": "nvidia/nemotron-mini-4b-instruct",
        "messages": [
            {"role": "system", "content": "You are an SRE assistant."},
            {"role": "user", "content": "Analyze this K8s error..."},
        ],
        "temperature": 0.2,
        "max_tokens": 1024,
        # "tools": tools,  # uncomment to enable Tool Calling (definitions in 1.5)
    },
    timeout=30,  # httpx defaults to 5 s; NIM responses took 11-23 s in testing
)
response.raise_for_status()
print(response.json()["choices"][0]["message"])
```

### 1.5 Tool Calling Format

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "kubectl_execute",
            "description": "Execute kubectl command on K8s cluster",
            "parameters": {
                "type": "object",
                "properties": {
                    "command": {
                        "type": "string",
                        "description": "kubectl command (e.g., 'get pods -n awoooi-prod')"
                    },
                    "namespace": {
                        "type": "string",
                        "description": "Target namespace"
                    }
                },
                "required": ["command"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "restart_deployment",
            "description": "Restart a Kubernetes deployment",
            "parameters": {
                "type": "object",
                "properties": {
                    "deployment": {"type": "string"},
                    "namespace": {"type": "string"}
                },
                "required": ["deployment", "namespace"]
            }
        }
    }
]
```
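
Before a model-proposed call reaches the Trust Engine or the K8s executor, it can be cheaply checked against the schema it was given. The sketch below assumes the `tools` list format shown above; `missing_required_args` is a hypothetical helper, and it only checks the schema's `required` list, not types or enums.

```python
import json


def missing_required_args(tools: list[dict], name: str, arguments: str) -> list[str]:
    """Return the required parameters absent from a tool call's JSON arguments.

    Raises KeyError if the model called a tool that is not in `tools`.
    Only the schema's "required" list is checked; types and enums are not.
    """
    schemas = {t["function"]["name"]: t["function"]["parameters"] for t in tools}
    schema = schemas[name]  # KeyError -> the model invented a tool name
    args = json.loads(arguments)  # OpenAI-style arguments arrive as a JSON string
    return [p for p in schema.get("required", []) if p not in args]


# Trimmed copy of the restart_deployment definition above.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "restart_deployment",
            "parameters": {
                "type": "object",
                "properties": {
                    "deployment": {"type": "string"},
                    "namespace": {"type": "string"},
                },
                "required": ["deployment", "namespace"],
            },
        },
    }
]

print(missing_required_args(TOOLS, "restart_deployment", '{"deployment": "awoooi-api"}'))
# ['namespace']
```

A non-empty result (or a `KeyError`) is a reasonable signal to reject the call, or to retry with the next model, rather than execute it.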

### 1.6 Response Format (Tool Call)

```json
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
          "name": "restart_deployment",
          "arguments": "{\"deployment\": \"awoooi-api\", \"namespace\": \"awoooi-prod\"}"
        }
      }]
    },
    "finish_reason": "tool_calls"
  }]
}
```
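
Two details of this response are easy to trip over: `arguments` is a JSON-encoded string rather than an object, and `content` is `null` when the model chooses a tool. A minimal parsing sketch for a response shaped like the one above; `extract_tool_calls` is a hypothetical helper name and, unlike `parse_tool_call` in 3.1, it returns every tool call instead of only the first.

```python
import json


def extract_tool_calls(response: dict) -> list[tuple[str, dict]]:
    """Return (tool_name, arguments) for every tool call in the first choice.

    An empty list means the model answered in plain text instead.
    """
    choices = response.get("choices") or []
    if not choices:
        return []
    message = choices[0].get("message") or {}
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        # "arguments" is a JSON string; decode it into a dict.
        calls.append((fn["name"], json.loads(fn["arguments"] or "{}")))
    return calls


sample = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_abc123",
                "type": "function",
                "function": {
                    "name": "restart_deployment",
                    "arguments": "{\"deployment\": \"awoooi-api\", \"namespace\": \"awoooi-prod\"}",
                },
            }],
        },
        "finish_reason": "tool_calls",
    }]
}
print(extract_tool_calls(sample))
# [('restart_deployment', {'deployment': 'awoooi-api', 'namespace': 'awoooi-prod'})]
```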

---

## 2. Architecture Design

### 2.1 Fallback Tier Changes

```
┌───────────────────────────────────────────────────────────────┐
│                     Current architecture                      │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│      Tier 1               Tier 2               Tier 3         │
│   ┌───────────┐        ┌───────────┐        ┌───────────┐     │
│   │  Ollama   │  ──▶   │  Gemini   │  ──▶   │  Claude   │     │
│   │   (188)   │        │   (API)   │        │   (API)   │     │
│   │   Local   │        │ Free tier │        │   Paid    │     │
│   └───────────┘        └───────────┘        └───────────┘     │
│                                                               │
└───────────────────────────────────────────────────────────────┘

┌───────────────────────────────────────────────────────────────┐
│               New architecture (with Nemotron)                │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│             ┌──────────────────────────────────┐              │
│             │        Smart Model Router        │              │
│             │       (task-type routing)        │              │
│             └─────────────────┬────────────────┘              │
│                               │                               │
│          ┌────────────────────┼────────────────────┐          │
│          │                    │                    │          │
│          ▼                    ▼                    ▼          │
│ ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐ │
│ │  Tool calling   │  │  General chat   │  │     Complex     │ │
│ │      path       │  │      path       │  │ reasoning path  │ │
│ └────────┬────────┘  └────────┬────────┘  └────────┬────────┘ │
│          │                    │                    │          │
│          ▼                    ▼                    ▼          │
│ ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐ │
│ │ Nemotron (NIM)  │  │     Ollama      │  │  Nemotron-70B   │ │
│ │  nemotron-mini  │  │     qwen2.5     │  │    or Claude    │ │
│ │ 4B, tool calls  │  │      Local      │  │  High quality   │ │
│ └────────┬────────┘  └────────┬────────┘  └────────┬────────┘ │
│          │                    │                    │          │
│          └────────────────────┼────────────────────┘          │
│                               │                               │
│                               ▼                               │
│                     ┌───────────────────┐                     │
│                     │  Fallback chain   │                     │
│                     │  Gemini → Claude  │                     │
│                     └───────────────────┘                     │
│                                                               │
└───────────────────────────────────────────────────────────────┘
```

### 2.2 Task Routing Rules

```python
# apps/api/src/services/ai/model_router.py

ROUTING_RULES = {
    # Tool Calling tasks → Nemotron first
    "tool_calling": {
        "primary": "nvidia/nemotron-mini-4b-instruct",
        "fallback": ["gemini-1.5-flash", "claude-3-haiku"],
    },

    # K8s operation decisions → Nemotron first
    "k8s_operation": {
        "primary": "nvidia/nemotron-mini-4b-instruct",
        "fallback": ["ollama/qwen2.5:7b", "gemini-1.5-flash"],
    },

    # Incident analysis (complex reasoning) → Nemotron-70B or Claude
    "incident_analysis": {
        "primary": "nvidia/llama-3.1-nemotron-70b-instruct",
        "fallback": ["claude-3-sonnet", "gemini-1.5-pro"],
    },

    # General chat → local Ollama first
    "general_chat": {
        "primary": "ollama/qwen2.5:7b",
        "fallback": ["gemini-1.5-flash", "claude-3-haiku"],
    },

    # Playbook generation → Nemotron (strong at code)
    "code_generation": {
        "primary": "nvidia/nemotron-mini-4b-instruct",
        "fallback": ["ollama/qwen2.5-coder:7b", "claude-3-sonnet"],
    },
}
```
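
The rules only name the models; something still has to walk each chain. A minimal sketch of that loop, assuming a hypothetical `call_model(model_id, prompt)` callable from the provider layer that raises on timeouts, rate limits, or API errors. It tries `primary`, then each `fallback` in order, and raises only when every model has failed; `route_with_fallback` and `AllModelsFailed` are illustrative names, not existing code.

```python
from typing import Callable

ROUTING_RULES = {
    "tool_calling": {
        "primary": "nvidia/nemotron-mini-4b-instruct",
        "fallback": ["gemini-1.5-flash", "claude-3-haiku"],
    },
}


class AllModelsFailed(RuntimeError):
    """Raised when the primary model and every fallback have failed."""


def route_with_fallback(
    task_type: str,
    prompt: str,
    call_model: Callable[[str, str], str],
    rules: dict = ROUTING_RULES,
) -> tuple[str, str]:
    """Return (model_id, reply) from the first model in the chain that succeeds."""
    rule = rules[task_type]
    errors: list[str] = []
    for model_id in [rule["primary"], *rule["fallback"]]:
        try:
            return model_id, call_model(model_id, prompt)
        except Exception as exc:  # timeout, 429, 5xx, ... -> try the next model
            errors.append(f"{model_id}: {exc}")
    raise AllModelsFailed("; ".join(errors))


# Usage with a fake provider in which NIM is rate-limited:
def fake_call(model_id: str, prompt: str) -> str:
    if model_id.startswith("nvidia/"):
        raise TimeoutError("NIM 429")
    return f"{model_id} handled: {prompt}"


print(route_with_fallback("tool_calling", "restart awoooi-api", fake_call))
# ('gemini-1.5-flash', 'gemini-1.5-flash handled: restart awoooi-api')
```

Returning the model ID alongside the reply makes it easy to record which tier actually answered, for example as a Langfuse trace attribute, so fallback rates stay visible.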

### 2.3 Where Nemotron Fits in OpenClaw

```
┌───────────────────────────────────────────────────────────────┐
│                    OpenClaw Decision Flow                     │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│  1. Incident arrives                                          │
│     │                                                         │
│     ▼                                                         │
│  2. Intent Classifier                                         │
│     │  └── Ollama qwen2.5 (local, fast)                       │
│     │                                                         │
│     ▼                                                         │
│  3. Complexity Analyzer                                       │
│     │  └── Ollama qwen2.5 (local, fast)                       │
│     │                                                         │
│     ▼                                                         │
│  4. Decision Manager  ← Nemotron goes here                    │
│     │  ├── Tool-calling decision → Nemotron-mini (NIM)        │
│     │  ├── Complex reasoning     → Nemotron-70B (NIM)         │
│     │  └── General reply         → Ollama/Gemini              │
│     │                                                         │
│     ▼                                                         │
│  5. Trust Engine                                              │
│     │                                                         │
│     ▼                                                         │
│  6. Multi-Sig (when required)                                 │
│     │                                                         │
│     ▼                                                         │
│  7. K8s Executor                                              │
│                                                               │
└───────────────────────────────────────────────────────────────┘
```
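
Step 4 is the only place where the flow branches on model choice. A minimal sketch of that branch, assuming the Decision Manager already knows which kind of decision it needs; the `DecisionKind` enum and `pick_model` helper are hypothetical, while `NEMOTRON_TOOL_MODEL` and `NEMOTRON_REASONING_MODEL` are the environment variables from 2.4 (their values there are used as defaults). The Ollama model for general replies is taken from the routing rules in 2.2.

```python
import os
from enum import Enum
from typing import Mapping, Optional


class DecisionKind(Enum):
    TOOL_CALL = "tool_call"  # structured action: restart, scale, fetch logs, ...
    COMPLEX = "complex"      # multi-step incident reasoning
    GENERAL = "general"      # plain-language reply, no tools


def pick_model(kind: DecisionKind, env: Optional[Mapping[str, str]] = None) -> str:
    """Map a Decision Manager branch to a model ID; env vars override the defaults."""
    env = os.environ if env is None else env
    if kind is DecisionKind.TOOL_CALL:
        return env.get("NEMOTRON_TOOL_MODEL", "nvidia/nemotron-mini-4b-instruct")
    if kind is DecisionKind.COMPLEX:
        return env.get("NEMOTRON_REASONING_MODEL", "nvidia/llama-3.1-nemotron-70b-instruct")
    # General replies stay on local Ollama; Gemini is reached via the fallback chain.
    return "ollama/qwen2.5:7b"


print(pick_model(DecisionKind.TOOL_CALL, env={}))  # nvidia/nemotron-mini-4b-instruct
```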

### 2.4 Environment Variables

```bash
# Additions to .env.production

# NVIDIA NIM API
NVIDIA_API_KEY=nvapi-xxxx
NVIDIA_API_BASE_URL=https://integrate.api.nvidia.com/v1

# Model selection
NEMOTRON_TOOL_MODEL=nvidia/nemotron-mini-4b-instruct
NEMOTRON_REASONING_MODEL=nvidia/llama-3.1-nemotron-70b-instruct

# Rate limiting (protects the free-tier quota)
NEMOTRON_RPM_LIMIT=60
NEMOTRON_TPM_LIMIT=100000
```
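
`NEMOTRON_RPM_LIMIT` and `NEMOTRON_TPM_LIMIT` only protect the quota if something checks them before each request. A minimal sketch of a 60-second sliding-window limiter covering both limits; the class name and the idea of passing an estimated token count per request are assumptions (the real `rate_limiter.py` from 4.2 may differ), and the state is per process, so several API replicas would need a shared store. When `allow()` returns `False`, the router could go straight to the fallback chain instead of spending a request that NIM would reject.

```python
import os
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Enforce requests-per-minute and tokens-per-minute over a 60 s window."""

    def __init__(self, rpm: int, tpm: int, clock: Callable[[], float] = time.monotonic):
        self.rpm, self.tpm, self.clock = rpm, tpm, clock
        self.events: deque = deque()  # (timestamp, tokens) of accepted requests

    def allow(self, tokens: int) -> bool:
        """Return True (and record the request) if it fits within both limits."""
        now = self.clock()
        while self.events and now - self.events[0][0] >= 60:
            self.events.popleft()  # drop requests older than the window
        used_tokens = sum(t for _, t in self.events)
        if len(self.events) >= self.rpm or used_tokens + tokens > self.tpm:
            return False
        self.events.append((now, tokens))
        return True


limiter = SlidingWindowLimiter(
    rpm=int(os.getenv("NEMOTRON_RPM_LIMIT", "60")),
    tpm=int(os.getenv("NEMOTRON_TPM_LIMIT", "100000")),
)
```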

---

## 3. Test Scripts

### 3.1 Tool Calling Accuracy Test

```python
#!/usr/bin/env python3
"""
Nemotron Tool Calling accuracy test.

Compares the tool-calling ability of Nemotron (NIM) and Qwen2.5 (Ollama).
A Gemini comparison is not implemented yet; GEMINI_API_KEY is reserved for it.

Usage:
    export NVIDIA_API_KEY=nvapi-xxxx
    export GEMINI_API_KEY=xxxx  # optional, currently unused
    python test_nemotron_tool_calling.py
"""

import asyncio
import json
import os
import time
from dataclasses import dataclass
from typing import Optional, Union

import httpx

# ============================================================================
# Configuration
# ============================================================================

NVIDIA_API_KEY = os.getenv("NVIDIA_API_KEY")
GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")  # reserved for a future Gemini comparison
OLLAMA_BASE_URL = "http://192.168.0.188:11434"

# ============================================================================
# Tool definitions (K8s SRE scenarios)
# ============================================================================

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "kubectl_get",
            "description": "Get Kubernetes resources (pods, deployments, services, etc.)",
            "parameters": {
                "type": "object",
                "properties": {
                    "resource": {
                        "type": "string",
                        "enum": ["pods", "deployments", "services", "nodes", "events"],
                        "description": "Resource type to query"
                    },
                    "namespace": {
                        "type": "string",
                        "description": "Kubernetes namespace (default: awoooi-prod)"
                    },
                    "name": {
                        "type": "string",
                        "description": "Specific resource name (optional)"
                    }
                },
                "required": ["resource"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "restart_deployment",
            "description": "Restart a Kubernetes deployment by rolling restart",
            "parameters": {
                "type": "object",
                "properties": {
                    "deployment": {
                        "type": "string",
                        "description": "Deployment name"
                    },
                    "namespace": {
                        "type": "string",
                        "description": "Kubernetes namespace"
                    }
                },
                "required": ["deployment", "namespace"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "scale_deployment",
            "description": "Scale a Kubernetes deployment to specified replicas",
            "parameters": {
                "type": "object",
                "properties": {
                    "deployment": {"type": "string"},
                    "namespace": {"type": "string"},
                    "replicas": {"type": "integer", "minimum": 0, "maximum": 10}
                },
                "required": ["deployment", "namespace", "replicas"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_logs",
            "description": "Get logs from a Kubernetes pod",
            "parameters": {
                "type": "object",
                "properties": {
                    "pod": {"type": "string"},
                    "namespace": {"type": "string"},
                    "tail": {"type": "integer", "description": "Number of lines (default: 100)"},
                    "container": {"type": "string", "description": "Container name (optional)"}
                },
                "required": ["pod", "namespace"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "send_alert",
            "description": "Send alert notification via Telegram",
            "parameters": {
                "type": "object",
                "properties": {
                    "severity": {"type": "string", "enum": ["info", "warning", "critical"]},
                    "message": {"type": "string"},
                    "incident_id": {"type": "string"}
                },
                "required": ["severity", "message"]
            }
        }
    }
]

# ============================================================================
# Test cases
# expected_tool is one tool name or a list of acceptable names.
# Only the tool name is scored; expected_params is informational for now.
# ============================================================================

TEST_CASES = [
    {
        "id": "TC001",
        "description": "Simple query: list all pods",
        "prompt": "Show me all pods in awoooi-prod namespace",
        "expected_tool": "kubectl_get",
        "expected_params": {"resource": "pods", "namespace": "awoooi-prod"}
    },
    {
        "id": "TC002",
        "description": "Restart a service",
        "prompt": "The API is not responding, please restart the awoooi-api deployment",
        "expected_tool": "restart_deployment",
        "expected_params": {"deployment": "awoooi-api", "namespace": "awoooi-prod"}
    },
    {
        "id": "TC003",
        "description": "Scale replicas",
        "prompt": "We're getting high traffic, scale awoooi-web to 3 replicas",
        "expected_tool": "scale_deployment",
        "expected_params": {"deployment": "awoooi-web", "replicas": 3}
    },
    {
        "id": "TC004",
        "description": "View logs",
        "prompt": "Get the last 50 lines of logs from awoooi-api-xxx pod",
        "expected_tool": "get_logs",
        "expected_params": {"tail": 50}
    },
    {
        "id": "TC005",
        "description": "Send an alert",
        "prompt": "Send a critical alert: Database connection failed for incident INC-2026-001",
        "expected_tool": "send_alert",
        "expected_params": {"severity": "critical"}
    },
    {
        "id": "TC006",
        "description": "Compound request (requires reasoning)",
        "prompt": "The web frontend is showing 502 errors. Check if the API pods are running.",
        "expected_tool": "kubectl_get",
        "expected_params": {"resource": "pods"}
    },
    {
        "id": "TC007",
        "description": "Traditional Chinese instruction",
        # Intentionally in Traditional Chinese: "Please restart the awoooi-worker deployment"
        "prompt": "請重啟 awoooi-worker 這個 deployment",
        "expected_tool": "restart_deployment",
        "expected_params": {"deployment": "awoooi-worker"}
    },
    {
        "id": "TC008",
        "description": "Vague instruction (requires reasoning)",
        "prompt": "Something is wrong with the worker, it keeps crashing. Fix it.",
        "expected_tool": ["restart_deployment", "get_logs"],  # several reasonable answers accepted
        "expected_params": {}
    }
]

# ============================================================================
# API clients
# ============================================================================

@dataclass
class ToolCallResult:
    model: str
    test_id: str
    success: bool
    tool_called: Optional[str]
    params: Optional[dict]
    latency_ms: float
    error: Optional[str] = None


async def call_nemotron(prompt: str, model: str = "nvidia/nemotron-mini-4b-instruct") -> dict:
    """Call the NVIDIA NIM API (native OpenAI-style tool calling)."""
    async with httpx.AsyncClient(timeout=30) as client:
        start = time.perf_counter()
        response = await client.post(
            "https://integrate.api.nvidia.com/v1/chat/completions",
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {NVIDIA_API_KEY}"
            },
            json={
                "model": model,
                "messages": [
                    {"role": "system", "content": "You are an SRE assistant for AWOOOI AIOps platform. Use the provided tools to help with Kubernetes operations."},
                    {"role": "user", "content": prompt}
                ],
                "tools": TOOLS,
                "tool_choice": "auto",
                "temperature": 0.1,
                "max_tokens": 512
            }
        )
        latency = (time.perf_counter() - start) * 1000
        response.raise_for_status()  # report 401/429/5xx as errors instead of parsing an error body
        return {"data": response.json(), "latency_ms": latency}


async def call_ollama(prompt: str, model: str = "qwen2.5:7b") -> dict:
    """Call local Ollama (tool choice is prompted as JSON, not native tool calling)."""
    # CPU inference was measured at 100+ s, so a 60 s timeout would abort most calls
    async with httpx.AsyncClient(timeout=180) as client:
        start = time.perf_counter()
        response = await client.post(
            f"{OLLAMA_BASE_URL}/api/chat",
            json={
                "model": model,
                "messages": [
                    {"role": "system", "content": "You are an SRE assistant. Respond with JSON indicating which tool to call and parameters."},
                    {"role": "user", "content": f"Based on this request, which tool should be called and with what parameters? Request: {prompt}\n\nAvailable tools: kubectl_get, restart_deployment, scale_deployment, get_logs, send_alert\n\nRespond in JSON format: {{\"tool\": \"tool_name\", \"params\": {{...}}}}"}
                ],
                "stream": False,
                "format": "json"
            }
        )
        latency = (time.perf_counter() - start) * 1000
        response.raise_for_status()
        return {"data": response.json(), "latency_ms": latency}

# ============================================================================
# Test execution
# ============================================================================

def parse_tool_call(response: dict, model_type: str) -> tuple:
    """Normalize each model's tool-call response into (tool_name, params)."""
    try:
        if model_type == "nemotron":
            choices = response.get("choices", [])
            if choices and choices[0].get("message", {}).get("tool_calls"):
                tool_call = choices[0]["message"]["tool_calls"][0]
                return (
                    tool_call["function"]["name"],
                    json.loads(tool_call["function"]["arguments"])
                )
            # No tool_calls: keep the plain-text content for inspection
            content = choices[0].get("message", {}).get("content", "") if choices else ""
            return (None, {"content": content})

        elif model_type == "ollama":
            content = response.get("message", {}).get("content", "{}")
            parsed = json.loads(content)
            return (parsed.get("tool"), parsed.get("params", {}))

    except Exception as e:
        return (None, {"error": str(e)})

    return (None, {})


def is_expected(tool: Optional[str], expected: Union[str, list]) -> bool:
    """expected_tool may be one tool name or a list of acceptable names."""
    if isinstance(expected, str):
        return tool == expected
    return tool in expected


async def run_test(test_case: dict) -> list:
    """Run a single test case against every configured model."""
    results = []
    prompt = test_case["prompt"]

    # Nemotron (skipped when NVIDIA_API_KEY is not set)
    if NVIDIA_API_KEY:
        try:
            resp = await call_nemotron(prompt)
            tool, params = parse_tool_call(resp["data"], "nemotron")
            results.append(ToolCallResult(
                model="Nemotron-mini-4B",
                test_id=test_case["id"],
                success=is_expected(tool, test_case["expected_tool"]),
                tool_called=tool,
                params=params,
                latency_ms=resp["latency_ms"]
            ))
        except Exception as e:
            results.append(ToolCallResult(
                model="Nemotron-mini-4B",
                test_id=test_case["id"],
                success=False,
                tool_called=None,
                params=None,
                latency_ms=0,
                error=str(e)
            ))

    # Ollama
    try:
        resp = await call_ollama(prompt)
        tool, params = parse_tool_call(resp["data"], "ollama")
        results.append(ToolCallResult(
            model="Ollama-Qwen2.5-7B",
            test_id=test_case["id"],
            success=is_expected(tool, test_case["expected_tool"]),
            tool_called=tool,
            params=params,
            latency_ms=resp["latency_ms"]
        ))
    except Exception as e:
        results.append(ToolCallResult(
            model="Ollama-Qwen2.5-7B",
            test_id=test_case["id"],
            success=False,
            tool_called=None,
            params=None,
            latency_ms=0,
            error=str(e)
        ))

    return results


async def main():
    """Main test flow."""
    print("=" * 70)
    print("Nemotron vs Ollama Tool Calling accuracy test")
    print("=" * 70)
    print()

    all_results = []

    for tc in TEST_CASES:
        print(f"[{tc['id']}] {tc['description']}")
        print(f"   Prompt: {tc['prompt'][:50]}...")
        print(f"   Expected: {tc['expected_tool']}")

        results = await run_test(tc)
        all_results.extend(results)

        for r in results:
            status = "✅" if r.success else "❌"
            print(f"   {r.model}: {status} → {r.tool_called} ({r.latency_ms:.0f}ms)")
            if r.error:
                print(f"      Error: {r.error}")
        print()

    # Summary statistics
    print("=" * 70)
    print("Summary")
    print("=" * 70)

    models = {}
    for r in all_results:
        if r.model not in models:
            models[r.model] = {"success": 0, "total": 0, "latency": []}
        models[r.model]["total"] += 1
        if r.success:
            models[r.model]["success"] += 1
        if r.latency_ms > 0:
            models[r.model]["latency"].append(r.latency_ms)

    print(f"{'Model':<25} {'Accuracy':<15} {'Avg Latency':<15}")
    print("-" * 55)
    for model, stats in models.items():
        acc = stats["success"] / stats["total"] * 100 if stats["total"] > 0 else 0
        avg_lat = sum(stats["latency"]) / len(stats["latency"]) if stats["latency"] else 0
        print(f"{model:<25} {f'{acc:.1f}%':<15} {f'{avg_lat:.0f}ms':<15}")

    print()
    print("Test complete!")


if __name__ == "__main__":
    asyncio.run(main())
```
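
The script scores only the tool name, so a `scale_deployment` call with the wrong replica count still counts as a pass. If parameter accuracy should be part of the score, one option is a subset match against `expected_params`: every expected key must be present with an equal value, while extra arguments (such as an inferred `namespace`) are tolerated. A minimal sketch; `params_match` is a hypothetical helper that the script above does not use, and it compares values with plain equality, so `"3"` and `3` do not match.

```python
from typing import Any, Optional


def params_match(expected: dict[str, Any], actual: Optional[dict[str, Any]]) -> bool:
    """True if every expected key appears in `actual` with an equal value.

    Extra keys in `actual` are ignored; an empty `expected` always matches.
    """
    if actual is None:
        return not expected
    return all(key in actual and actual[key] == value for key, value in expected.items())


# TC003 expects {"deployment": "awoooi-web", "replicas": 3}
print(params_match({"deployment": "awoooi-web", "replicas": 3},
                   {"deployment": "awoooi-web", "namespace": "awoooi-prod", "replicas": 3}))  # True
print(params_match({"replicas": 3}, {"deployment": "awoooi-web", "replicas": "3"}))  # False
```

Combining this with the existing tool-name check in `run_test` would make the score stricter, so results would no longer be directly comparable with the 83.3% figure in the summary.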

### 3.2 Quick Verification Script (curl)

```bash
#!/bin/bash
# quick_test_nemotron.sh
# Quick check of Nemotron API connectivity (requires curl and jq)

set -euo pipefail

echo "=== Nemotron API quick test ==="
echo ""

# Check the API key (${VAR:-} keeps `set -u` from aborting before the hint below)
if [ -z "${NVIDIA_API_KEY:-}" ]; then
  echo "❌ Please set NVIDIA_API_KEY"
  echo "   export NVIDIA_API_KEY=nvapi-xxxx"
  exit 1
fi

echo "✅ API key is set"
echo ""

# Simple request
echo "Test 1: simple chat..."
curl -s -X POST "https://integrate.api.nvidia.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $NVIDIA_API_KEY" \
  -d '{
    "model": "nvidia/nemotron-mini-4b-instruct",
    "messages": [{"role": "user", "content": "Say hello in JSON format"}],
    "max_tokens": 50
  }' | jq '.choices[0].message.content'

echo ""
echo "Test 2: Tool Calling..."
curl -s -X POST "https://integrate.api.nvidia.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $NVIDIA_API_KEY" \
  -d '{
    "model": "nvidia/nemotron-mini-4b-instruct",
    "messages": [
      {"role": "system", "content": "You are a K8s assistant."},
      {"role": "user", "content": "Restart the nginx deployment in production namespace"}
    ],
    "tools": [{
      "type": "function",
      "function": {
        "name": "restart_deployment",
        "description": "Restart a K8s deployment",
        "parameters": {
          "type": "object",
          "properties": {
            "deployment": {"type": "string"},
            "namespace": {"type": "string"}
          },
          "required": ["deployment", "namespace"]
        }
      }
    }],
    "tool_choice": "auto",
    "max_tokens": 200
  }' | jq '.choices[0].message'

echo ""
echo "=== Test complete ==="
```

---

## 4. Implementation Plan

### 4.1 Phases

```
Phase N.1: Validation (1-2 days)
──────────────────────────
├── Register at build.nvidia.com
├── Obtain NVIDIA_API_KEY
├── Run quick_test_nemotron.sh
├── Run the full Tool Calling test
└── Analyze results and decide whether to proceed

Phase N.2: Integration (2-3 days)
──────────────────────────
├── Create NvidiaAIProvider (modeled on the existing GeminiProvider)
├── Add the routing rules to the Model Router
├── Configure environment variables + K8s Secrets
├── Integrate Langfuse tracing
└── Unit tests

Phase N.3: Acceptance (1 day)
──────────────────────────
├── E2E tests (real incident scenarios)
├── Latency + cost analysis
├── Chief architect review
└── Commander approves go-live
```

### 4.2 File Structure

```
apps/api/src/
├── services/
│   └── ai/
│       ├── providers/
│       │   ├── ollama_provider.py    # existing
│       │   ├── gemini_provider.py    # existing
│       │   ├── claude_provider.py    # existing
│       │   └── nvidia_provider.py    # 🆕 new
│       │
│       ├── model_router.py           # modified: add Nemotron routing
│       └── rate_limiter.py           # modified: add Nemotron rate limiting
```
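
The existing `GeminiProvider` interface is not shown in this proposal, so the following is only an assumption of what `nvidia_provider.py` might contain: a `complete()` coroutine that posts an OpenAI-style request to `NVIDIA_API_BASE_URL` and returns the first choice's message. The HTTP client is injected (any object with an async `post()`, such as `httpx.AsyncClient`) so the class can be unit-tested without network access; all method and field names here are hypothetical.

```python
import os
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class NvidiaAIProvider:
    """Hypothetical NIM provider; `client` is e.g. httpx.AsyncClient(timeout=30)."""

    client: Any
    model: str = "nvidia/nemotron-mini-4b-instruct"
    base_url: str = field(
        default_factory=lambda: os.getenv("NVIDIA_API_BASE_URL", "https://integrate.api.nvidia.com/v1")
    )
    api_key: str = field(default_factory=lambda: os.getenv("NVIDIA_API_KEY", ""))

    def build_payload(self, messages: list[dict], tools: Optional[list[dict]] = None) -> dict:
        """OpenAI-compatible request body; tool fields are omitted when not needed."""
        payload: dict = {"model": self.model, "messages": messages,
                         "temperature": 0.1, "max_tokens": 512}
        if tools:
            payload["tools"] = tools
            payload["tool_choice"] = "auto"
        return payload

    async def complete(self, messages: list[dict], tools: Optional[list[dict]] = None) -> dict:
        """POST to /chat/completions and return the first choice's message."""
        response = await self.client.post(
            f"{self.base_url}/chat/completions",
            headers={"Authorization": f"Bearer {self.api_key}"},
            json=self.build_payload(messages, tools),
        )
        response.raise_for_status()  # let the Model Router fall back on 429/5xx
        return response.json()["choices"][0]["message"]
```

In the service it would be constructed as something like `NvidiaAIProvider(client=httpx.AsyncClient(timeout=30))`, with Langfuse tracing and the Nemotron rate limiter applied around `complete()`.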

### 4.3 New Secrets (GitHub and K8s)

```bash
# Add to GitHub Secrets (repository settings):
#   NVIDIA_API_KEY = nvapi-xxxx

# Add to K8s Secrets
kubectl create secret generic nvidia-api \
  --from-literal=NVIDIA_API_KEY=nvapi-xxxx \
  -n awoooi-prod
```

---

## 5. Cost Estimate

### 5.1 Free Tier

| Item | Estimate |
|------|------|
| **Development & testing** | Free (build.nvidia.com) |
| **Rate limit** | To be confirmed (possibly 60 RPM) |

### 5.2 Production (if paid)

| Model | Price (estimated) | Monthly usage | Monthly cost |
|------|-------------|--------|--------|
| nemotron-mini-4b | ~$0.1 / 1M tokens | ~5M tokens | ~$0.5 |
| nemotron-70b | ~$1.0 / 1M tokens | ~1M tokens | ~$1.0 |

**Conclusion**: at these estimates the total is about $1.5/month, far cheaper than the Claude API.

---

## 6. Risk Assessment

| Risk | Likelihood | Impact | Mitigation |
|------|------|------|----------|
| Free-tier quota exhausted | Medium | Low | Fall back to Gemini |
| High API latency | Low | Medium | Local cache + timeout |
| Poor Tool Calling accuracy | Low | High | Verify during the test phase |
| Service instability | Low | Medium | Multi-tier fallback |

---

## Appendix: Next Steps

Once the Commander approves, execute immediately:

```bash
# Step 1: get an API key
# Register at https://build.nvidia.com and create a key

# Step 2: set the environment variable
export NVIDIA_API_KEY=nvapi-xxxx

# Step 3: quick verification
cd apps/api
./scripts/quick_test_nemotron.sh

# Step 4: full test
python scripts/test_nemotron_tool_calling.py
```

---

**Created by**: Claude Code
**Date**: 2026-03-28 (Taipei time)
**Status**: Pending review