# Nemotron Integration Proposal
> **Version**: 1.1
> **Created**: 2026-03-28 (Taipei time)
> **Author**: Claude Code
> **Status**: ✅ **Testing complete, awaiting Commander approval**
---
## 🔥 Test Results Summary (2026-03-28)
| Metric | Nemotron (NIM) | Ollama (CPU) | Verdict |
|--------|----------------|--------------|---------|
| **Tool Calling accuracy** | 83.3% (5/6) | ~50% | **Nemotron wins** |
| **Average latency** | 11-23 s | 100+ s | **Nemotron is 5-10x faster** |
| **Traditional Chinese support** | ✅ Good | ✅ Good | Tie |
| **Cost** | Free tier | Free | Tie |
**Recommendation**: Make Nemotron the preferred route for Tool Calling tasks.
---
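## Table of Contents
1. [NIM API Integration Specification](#1-nim-api-integration-specification)
2. [Architecture Design](#2-architecture-design)
3. [Test Scripts](#3-test-scripts)
4. [Implementation Plan](#4-implementation-plan)
5. [Cost Estimate](#5-cost-estimate)
6. [Risk Assessment](#6-risk-assessment)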
---
## 1. NIM API Integration Specification
### 1.1 Endpoint Information
| Item | Value |
|------|-------|
| **Base URL** | `https://integrate.api.nvidia.com/v1` |
| **Chat Completions** | `/chat/completions` |
| **Compatibility** | ✅ Fully compatible with the OpenAI API format |
### 1.2 Authentication
```bash
# Environment variable
export NVIDIA_API_KEY="nvapi-xxxx"

# HTTP header sent with every request:
#   Authorization: Bearer $NVIDIA_API_KEY
```
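As a minimal sketch of the same step in Python, the helper below reads the key from the environment and builds the request headers. The name `nim_headers` is hypothetical and not part of the existing codebase; failing fast on a missing key is an assumption about the desired behavior.

```python
import os


def nim_headers(env=os.environ) -> dict:
    """Build NIM request headers from NVIDIA_API_KEY (hypothetical helper)."""
    key = env.get("NVIDIA_API_KEY", "").strip()
    if not key:
        # Fail early rather than sending an unauthenticated request.
        raise RuntimeError("NVIDIA_API_KEY is not set")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",
    }
```

Passing the environment as a parameter keeps the helper easy to unit test without touching the real process environment.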
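### 1.3 Available Models
| Model ID | Size | Highlights | Recommended use |
|----------|------|------------|-----------------|
| `nvidia/nemotron-mini-4b-instruct` | 4B | Lightweight, Tool Calling | Fast classification, simple decisions |
| `nvidia/llama-3.1-nemotron-70b-instruct` | 70B | Strong reasoning | Complex Incident analysis |
| `nvidia/nemotron-3-super` | 120B (MoE) | Most capable, 1M-token context | Multi-agent collaboration |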
### 1.4 Request Format (OpenAI-compatible)
```python
import os

import httpx

NVIDIA_API_KEY = os.environ["NVIDIA_API_KEY"]

response = httpx.post(
    "https://integrate.api.nvidia.com/v1/chat/completions",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {NVIDIA_API_KEY}",
    },
    json={
        "model": "nvidia/nemotron-mini-4b-instruct",
        "messages": [
            {"role": "system", "content": "You are an SRE assistant."},
            {"role": "user", "content": "Analyze this K8s error..."},
        ],
        "temperature": 0.2,
        "max_tokens": 1024,
        # "tools": [...],  # Tool Calling definitions (see 1.5)
    },
    # httpx defaults to a 5 s timeout; measured latency was 11-23 s.
    timeout=30,
)
```
### 1.5 Tool Calling Format
```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "kubectl_execute",
            "description": "Execute kubectl command on K8s cluster",
            "parameters": {
                "type": "object",
                "properties": {
                    "command": {
                        "type": "string",
                        "description": "kubectl command (e.g., 'get pods -n awoooi-prod')"
                    },
                    "namespace": {
                        "type": "string",
                        "description": "Target namespace"
                    }
                },
                "required": ["command"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "restart_deployment",
            "description": "Restart a Kubernetes deployment",
            "parameters": {
                "type": "object",
                "properties": {
                    "deployment": {"type": "string"},
                    "namespace": {"type": "string"}
                },
                "required": ["deployment", "namespace"]
            }
        }
    }
]
```
### 1.6 Response Format (Tool Call)
```json
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
          "name": "restart_deployment",
          "arguments": "{\"deployment\": \"awoooi-api\", \"namespace\": \"awoooi-prod\"}"
        }
      }]
    },
    "finish_reason": "tool_calls"
  }]
}
```
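Note that `arguments` in the response is a JSON-encoded string rather than an object, so it needs a second decoding step. The sketch below shows one way to pull `(name, arguments)` pairs out of such a response; `extract_tool_calls` is a hypothetical helper name, and returning `None` for malformed arguments is an assumption about how callers want to handle bad output.

```python
import json


def extract_tool_calls(response: dict) -> list:
    """Return (name, arguments) pairs from an OpenAI-style chat completion."""
    calls = []
    for choice in response.get("choices", []):
        message = choice.get("message") or {}
        for call in message.get("tool_calls") or []:
            function = call.get("function", {})
            # "arguments" arrives as a JSON-encoded string, not a dict.
            try:
                arguments = json.loads(function.get("arguments") or "{}")
            except json.JSONDecodeError:
                arguments = None  # malformed output; the caller decides what to do
            calls.append((function.get("name"), arguments))
    return calls


# Same shape as the sample response above.
sample = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_abc123",
                "type": "function",
                "function": {
                    "name": "restart_deployment",
                    "arguments": "{\"deployment\": \"awoooi-api\", \"namespace\": \"awoooi-prod\"}",
                },
            }],
        },
        "finish_reason": "tool_calls",
    }]
}
print(extract_tool_calls(sample))
```

A response that carries only text `content` yields an empty list, which a caller can treat as "the model chose not to call a tool".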
---
## 2. Architecture Design
### 2.1 Fallback Tier Changes
```
┌─────────────────────────────────────────────────────────────────┐
│                      Existing architecture                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│     Tier 1            Tier 2            Tier 3                  │
│  ┌───────────┐     ┌───────────┐     ┌───────────┐              │
│  │  Ollama   │ ──▶ │  Gemini   │ ──▶ │  Claude   │              │
│  │  (188)    │     │  (API)    │     │  (API)    │              │
│  │  local    │     │ free tier │     │   paid    │              │
│  └───────────┘     └───────────┘     └───────────┘              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│                New architecture (with Nemotron)                 │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│              ┌─────────────────────────────────┐                │
│              │       Smart Model Router        │                │
│              │       (task-type routing)       │                │
│              └─────────────────────────────────┘                │
│                               │                                 │
│           ┌───────────────────┼───────────────────┐             │
│           │                   │                   │             │
│           ▼                   ▼                   ▼             │
│  ┌─────────────────┐  ┌───────────────┐  ┌─────────────────┐    │
│  │  Tool Calling   │  │ General chat  │  │Complex reasoning│    │
│  │      path       │  │     path      │  │      path       │    │
│  └────────┬────────┘  └───────┬───────┘  └────────┬────────┘    │
│           │                   │                   │             │
│           ▼                   ▼                   ▼             │
│  ┌─────────────────┐  ┌───────────────┐  ┌─────────────────┐    │
│  │ Nemotron (NIM)  │  │    Ollama     │  │  Nemotron-70B   │    │
│  │  nemotron-mini  │  │    qwen2.5    │  │    or Claude    │    │
│  │ 4B, tool-focused│  │     local     │  │  high quality   │    │
│  └────────┬────────┘  └───────┬───────┘  └────────┬────────┘    │
│           │                   │                   │             │
│           └───────────────────┼───────────────────┘             │
│                               │                                 │
│                               ▼                                 │
│                      ┌─────────────────┐                        │
│                      │ Fallback Chain  │                        │
│                      │ Gemini → Claude │                        │
│                      └─────────────────┘                        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
### 2.2 Task Routing Rules
```python
# apps/api/src/services/ai/model_router.py
ROUTING_RULES = {
    # Tool Calling tasks → Nemotron first
    "tool_calling": {
        "primary": "nvidia/nemotron-mini-4b-instruct",
        "fallback": ["gemini-1.5-flash", "claude-3-haiku"]
    },
    # K8s operation decisions → Nemotron first
    "k8s_operation": {
        "primary": "nvidia/nemotron-mini-4b-instruct",
        "fallback": ["ollama/qwen2.5:7b", "gemini-1.5-flash"]
    },
    # Incident analysis (complex reasoning) → Nemotron-70B or Claude
    "incident_analysis": {
        "primary": "nvidia/llama-3.1-nemotron-70b-instruct",
        "fallback": ["claude-3-sonnet", "gemini-1.5-pro"]
    },
    # General chat → local Ollama first
    "general_chat": {
        "primary": "ollama/qwen2.5:7b",
        "fallback": ["gemini-1.5-flash", "claude-3-haiku"]
    },
    # Playbook generation → Nemotron (strong coding ability)
    "code_generation": {
        "primary": "nvidia/nemotron-mini-4b-instruct",
        "fallback": ["ollama/qwen2.5-coder:7b", "claude-3-sonnet"]
    }
}
```
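To illustrate how a router might consume these rules, the sketch below turns a task type into an ordered list of models to try. `candidate_models` is a hypothetical function, not part of the existing `model_router.py`, and routing unknown task types to `general_chat` is an assumption made for the example.

```python
# Trimmed copy of two entries from the rules above, so the sketch runs on its own.
ROUTING_RULES = {
    "tool_calling": {
        "primary": "nvidia/nemotron-mini-4b-instruct",
        "fallback": ["gemini-1.5-flash", "claude-3-haiku"],
    },
    "general_chat": {
        "primary": "ollama/qwen2.5:7b",
        "fallback": ["gemini-1.5-flash", "claude-3-haiku"],
    },
}


def candidate_models(task_type: str, default_task: str = "general_chat") -> list:
    """Return the primary model followed by its fallbacks, in try-order."""
    # Assumption: unknown task types use the general-chat route.
    rule = ROUTING_RULES.get(task_type) or ROUTING_RULES[default_task]
    return [rule["primary"], *rule["fallback"]]


print(candidate_models("tool_calling"))
```

Keeping the lookup separate from the actual API calls means the routing table can be tested without any network access.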
### 2.3 OpenClaw Integration Point
```
┌─────────────────────────────────────────────────────────────────┐
│                     OpenClaw Decision Flow                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. Incident received                                           │
│     │                                                           │
│     ▼                                                           │
│  2. Intent Classifier (intent classification)                   │
│     │   └── Ollama qwen2.5 (local, fast)                        │
│     │                                                           │
│     ▼                                                           │
│  3. Complexity Analyzer (complexity assessment)                 │
│     │   └── Ollama qwen2.5 (local, fast)                        │
│     │                                                           │
│     ▼                                                           │
│  4. Decision Manager (decision generation) ← 🔴 Nemotron here   │
│     │   ├── Tool Calling decisions → Nemotron-mini (NIM)        │
│     │   ├── Complex reasoning → Nemotron-70B (NIM)              │
│     │   └── General replies → Ollama/Gemini                     │
│     │                                                           │
│     ▼                                                           │
│  5. Trust Engine (trust verification)                           │
│     │                                                           │
│     ▼                                                           │
│  6. Multi-Sig (when required)                                   │
│     │                                                           │
│     ▼                                                           │
│  7. K8s Executor (execution)                                    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
### 2.4 Environment Variables
```bash
# Additions to .env.production

# NVIDIA NIM API
NVIDIA_API_KEY=nvapi-xxxx
NVIDIA_API_BASE_URL=https://integrate.api.nvidia.com/v1

# Model selection
NEMOTRON_TOOL_MODEL=nvidia/nemotron-mini-4b-instruct
NEMOTRON_REASONING_MODEL=nvidia/llama-3.1-nemotron-70b-instruct

# Rate limiting (protects the free tier quota)
NEMOTRON_RPM_LIMIT=60
NEMOTRON_TPM_LIMIT=100000
```
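One way the `NEMOTRON_RPM_LIMIT` setting could be enforced is a sliding-window counter, sketched below. `SlidingWindowLimiter` is a hypothetical class and does not describe the existing `rate_limiter.py`; it covers only the requests-per-minute limit, and token-per-minute accounting would need a separate counter.

```python
import os
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` events per `window` seconds (hypothetical helper)."""

    def __init__(self, limit: int, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self._events = deque()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Drop timestamps that have left the window.
        while self._events and now - self._events[0] >= self.window:
            self._events.popleft()
        if len(self._events) >= self.limit:
            return False  # over the limit: caller should wait or fall back
        self._events.append(now)
        return True


# Default of 60 mirrors the NEMOTRON_RPM_LIMIT value shown above.
nemotron_limiter = SlidingWindowLimiter(int(os.getenv("NEMOTRON_RPM_LIMIT", "60")))
```

When `try_acquire()` returns `False`, a router could skip Nemotron for that request and go straight to the next model in the fallback chain.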
---
## 3. Test Scripts
### 3.1 Tool Calling Accuracy Test
```python
#!/usr/bin/env python3
"""
Nemotron Tool Calling accuracy test.

Compares the Tool Calling ability of Nemotron (NIM) with a local Ollama
Qwen model. A Gemini comparison is planned but not implemented in this script.

Usage:
    export NVIDIA_API_KEY=nvapi-xxxx
    export GEMINI_API_KEY=xxxx   # optional, currently unused
    python test_nemotron_tool_calling.py
"""
import asyncio
import json
import os
import time
from dataclasses import dataclass
from typing import Optional

import httpx

# ============================================================================
# Configuration
# ============================================================================
NVIDIA_API_KEY = os.getenv("NVIDIA_API_KEY")
GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")  # reserved; no Gemini call below
OLLAMA_BASE_URL = "http://192.168.0.188:11434"

# ============================================================================
# Tool definitions (K8s SRE scenario)
# ============================================================================
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "kubectl_get",
            "description": "Get Kubernetes resources (pods, deployments, services, etc.)",
            "parameters": {
                "type": "object",
                "properties": {
                    "resource": {
                        "type": "string",
                        "enum": ["pods", "deployments", "services", "nodes", "events"],
                        "description": "Resource type to query"
                    },
                    "namespace": {
                        "type": "string",
                        "description": "Kubernetes namespace (default: awoooi-prod)"
                    },
                    "name": {
                        "type": "string",
                        "description": "Specific resource name (optional)"
                    }
                },
                "required": ["resource"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "restart_deployment",
            "description": "Restart a Kubernetes deployment by rolling restart",
            "parameters": {
                "type": "object",
                "properties": {
                    "deployment": {
                        "type": "string",
                        "description": "Deployment name"
                    },
                    "namespace": {
                        "type": "string",
                        "description": "Kubernetes namespace"
                    }
                },
                "required": ["deployment", "namespace"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "scale_deployment",
            "description": "Scale a Kubernetes deployment to specified replicas",
            "parameters": {
                "type": "object",
                "properties": {
                    "deployment": {"type": "string"},
                    "namespace": {"type": "string"},
                    "replicas": {"type": "integer", "minimum": 0, "maximum": 10}
                },
                "required": ["deployment", "namespace", "replicas"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_logs",
            "description": "Get logs from a Kubernetes pod",
            "parameters": {
                "type": "object",
                "properties": {
                    "pod": {"type": "string"},
                    "namespace": {"type": "string"},
                    "tail": {"type": "integer", "description": "Number of lines (default: 100)"},
                    "container": {"type": "string", "description": "Container name (optional)"}
                },
                "required": ["pod", "namespace"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "send_alert",
            "description": "Send alert notification via Telegram",
            "parameters": {
                "type": "object",
                "properties": {
                    "severity": {"type": "string", "enum": ["info", "warning", "critical"]},
                    "message": {"type": "string"},
                    "incident_id": {"type": "string"}
                },
                "required": ["severity", "message"]
            }
        }
    }
]

# ============================================================================
# Test cases
# ============================================================================
TEST_CASES = [
    {
        "id": "TC001",
        "description": "Simple query: list all pods",
        "prompt": "Show me all pods in awoooi-prod namespace",
        "expected_tool": "kubectl_get",
        "expected_params": {"resource": "pods", "namespace": "awoooi-prod"}
    },
    {
        "id": "TC002",
        "description": "Restart a service",
        "prompt": "The API is not responding, please restart the awoooi-api deployment",
        "expected_tool": "restart_deployment",
        "expected_params": {"deployment": "awoooi-api", "namespace": "awoooi-prod"}
    },
    {
        "id": "TC003",
        "description": "Scale replicas",
        "prompt": "We're getting high traffic, scale awoooi-web to 3 replicas",
        "expected_tool": "scale_deployment",
        "expected_params": {"deployment": "awoooi-web", "replicas": 3}
    },
    {
        "id": "TC004",
        "description": "View logs",
        "prompt": "Get the last 50 lines of logs from awoooi-api-xxx pod",
        "expected_tool": "get_logs",
        "expected_params": {"tail": 50}
    },
    {
        "id": "TC005",
        "description": "Send an alert",
        "prompt": "Send a critical alert: Database connection failed for incident INC-2026-001",
        "expected_tool": "send_alert",
        "expected_params": {"severity": "critical"}
    },
    {
        "id": "TC006",
        "description": "Compound understanding: requires reasoning",
        "prompt": "The web frontend is showing 502 errors. Check if the API pods are running.",
        "expected_tool": "kubectl_get",
        "expected_params": {"resource": "pods"}
    },
    {
        "id": "TC007",
        "description": "Traditional Chinese instruction",
        "prompt": "請重啟 awoooi-worker 這個 deployment",
        "expected_tool": "restart_deployment",
        "expected_params": {"deployment": "awoooi-worker"}
    },
    {
        "id": "TC008",
        "description": "Ambiguous instruction: requires reasoning",
        "prompt": "Something is wrong with the worker, it keeps crashing. Fix it.",
        "expected_tool": "restart_deployment",  # get_logs would also be reasonable
        "expected_params": {}  # several reasonable responses are acceptable
    }
]

# ============================================================================
# API clients
# ============================================================================
@dataclass
class ToolCallResult:
    model: str
    test_id: str
    success: bool
    tool_called: Optional[str]
    params: Optional[dict]
    latency_ms: float
    error: Optional[str] = None


async def call_nemotron(prompt: str, model: str = "nvidia/nemotron-mini-4b-instruct") -> dict:
    """Call the NVIDIA NIM API."""
    async with httpx.AsyncClient(timeout=30) as client:
        start = time.perf_counter()
        response = await client.post(
            "https://integrate.api.nvidia.com/v1/chat/completions",
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {NVIDIA_API_KEY}"
            },
            json={
                "model": model,
                "messages": [
                    {"role": "system", "content": "You are an SRE assistant for AWOOOI AIOps platform. Use the provided tools to help with Kubernetes operations."},
                    {"role": "user", "content": prompt}
                ],
                "tools": TOOLS,
                "tool_choice": "auto",
                "temperature": 0.1,
                "max_tokens": 512
            }
        )
        latency = (time.perf_counter() - start) * 1000
        # Surface HTTP errors (401, 429, ...) instead of parsing an error body.
        response.raise_for_status()
        return {"data": response.json(), "latency_ms": latency}


async def call_ollama(prompt: str, model: str = "qwen2.5:7b") -> dict:
    """Call the local Ollama server."""
    user_prompt = (
        f"Based on this request, which tool should be called and with what parameters? Request: {prompt}\n\n"
        "Available tools: kubectl_get, restart_deployment, scale_deployment, get_logs, send_alert\n\n"
        "Respond in JSON format: {\"tool\": \"tool_name\", \"params\": {...}}"
    )
    async with httpx.AsyncClient(timeout=60) as client:
        start = time.perf_counter()
        response = await client.post(
            f"{OLLAMA_BASE_URL}/api/chat",
            json={
                "model": model,
                "messages": [
                    {"role": "system", "content": "You are an SRE assistant. Respond with JSON indicating which tool to call and parameters."},
                    {"role": "user", "content": user_prompt}
                ],
                "stream": False,
                "format": "json"
            }
        )
        latency = (time.perf_counter() - start) * 1000
        response.raise_for_status()
        return {"data": response.json(), "latency_ms": latency}

# ============================================================================
# Test execution
# ============================================================================
def parse_tool_call(response: dict, model_type: str) -> tuple:
    """Parse the Tool Call response of each model type into (tool, params)."""
    try:
        if model_type == "nemotron":
            choices = response.get("choices", [])
            if choices and choices[0].get("message", {}).get("tool_calls"):
                tool_call = choices[0]["message"]["tool_calls"][0]
                return (
                    tool_call["function"]["name"],
                    json.loads(tool_call["function"]["arguments"])
                )
            # No tool_calls: fall back to the text content
            content = choices[0].get("message", {}).get("content", "")
            return (None, {"content": content})
        elif model_type == "ollama":
            content = response.get("message", {}).get("content", "{}")
            parsed = json.loads(content)
            return (parsed.get("tool"), parsed.get("params", {}))
    except Exception as e:
        return (None, {"error": str(e)})
    return (None, {})


async def run_test(test_case: dict) -> list:
    """Run a single test case against every configured model."""
    results = []
    prompt = test_case["prompt"]

    # Test Nemotron (skipped when no API key is configured)
    if NVIDIA_API_KEY:
        try:
            resp = await call_nemotron(prompt)
            tool, params = parse_tool_call(resp["data"], "nemotron")
            success = tool == test_case["expected_tool"]
            results.append(ToolCallResult(
                model="Nemotron-mini-4B",
                test_id=test_case["id"],
                success=success,
                tool_called=tool,
                params=params,
                latency_ms=resp["latency_ms"]
            ))
        except Exception as e:
            results.append(ToolCallResult(
                model="Nemotron-mini-4B",
                test_id=test_case["id"],
                success=False,
                tool_called=None,
                params=None,
                latency_ms=0,
                error=str(e)
            ))

    # Test Ollama
    try:
        resp = await call_ollama(prompt)
        tool, params = parse_tool_call(resp["data"], "ollama")
        success = tool == test_case["expected_tool"]
        results.append(ToolCallResult(
            model="Ollama-Qwen2.5-7B",
            test_id=test_case["id"],
            success=success,
            tool_called=tool,
            params=params,
            latency_ms=resp["latency_ms"]
        ))
    except Exception as e:
        results.append(ToolCallResult(
            model="Ollama-Qwen2.5-7B",
            test_id=test_case["id"],
            success=False,
            tool_called=None,
            params=None,
            latency_ms=0,
            error=str(e)
        ))
    return results


async def main():
    """Main test flow."""
    print("=" * 70)
    print("Nemotron vs Ollama Tool Calling accuracy test")
    print("=" * 70)
    print()

    all_results = []
    for tc in TEST_CASES:
        print(f"[{tc['id']}] {tc['description']}")
        print(f"  Prompt: {tc['prompt'][:50]}...")
        print(f"  Expected: {tc['expected_tool']}")
        results = await run_test(tc)
        all_results.extend(results)
        for r in results:
            status = "✅" if r.success else "❌"
            print(f"  {r.model}: {status} {r.tool_called} ({r.latency_ms:.0f}ms)")
            if r.error:
                print(f"    Error: {r.error}")
        print()

    # Aggregate results per model
    print("=" * 70)
    print("Summary")
    print("=" * 70)
    models = {}
    for r in all_results:
        if r.model not in models:
            models[r.model] = {"success": 0, "total": 0, "latency": []}
        models[r.model]["total"] += 1
        if r.success:
            models[r.model]["success"] += 1
        # Failed calls record latency 0 and are excluded from the average.
        if r.latency_ms > 0:
            models[r.model]["latency"].append(r.latency_ms)

    print(f"{'Model':<25} {'Accuracy':<15} {'Avg Latency':<15}")
    print("-" * 55)
    for model, stats in models.items():
        acc = stats["success"] / stats["total"] * 100 if stats["total"] > 0 else 0
        avg_lat = sum(stats["latency"]) / len(stats["latency"]) if stats["latency"] else 0
        print(f"{model:<25} {acc:>6.1f}% {avg_lat:>8.0f}ms")
    print()
    print("Test complete!")


if __name__ == "__main__":
    asyncio.run(main())
```
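The script above scores a test as successful when the tool name matches; the `expected_params` field in each test case is defined but never compared. If parameter checking is wanted, a subset comparison is one option, sketched here. `params_match` is a hypothetical helper, and treating extra keys in the model output as acceptable is an assumption.

```python
def params_match(expected: dict, actual) -> bool:
    """True when every expected key is present in `actual` with an equal value."""
    if not isinstance(actual, dict):
        return False
    # Extra keys in `actual` are allowed; only the expected ones are checked.
    return all(key in actual and actual[key] == value
               for key, value in expected.items())


# Values shaped like TC001 and TC003 above.
print(params_match({"resource": "pods"},
                   {"resource": "pods", "namespace": "awoooi-prod"}))
print(params_match({"replicas": 3}, {"replicas": "3"}))
```

The second call shows why a strict comparison is useful: a model that returns `"3"` as a string would fail the `integer` type declared in the `scale_deployment` schema.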
### 3.2 Quick Verification Script (curl)
```bash
#!/bin/bash
# quick_test_nemotron.sh
# Quick check of Nemotron API connectivity
set -e

echo "=== Nemotron API quick test ==="
echo ""

# Check the API key
if [ -z "$NVIDIA_API_KEY" ]; then
  echo "❌ Please set NVIDIA_API_KEY"
  echo "   export NVIDIA_API_KEY=nvapi-xxxx"
  exit 1
fi
echo "✅ API key is set"
echo ""

# Test a simple request
echo "Test 1: simple chat..."
curl -s -X POST "https://integrate.api.nvidia.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $NVIDIA_API_KEY" \
  -d '{
    "model": "nvidia/nemotron-mini-4b-instruct",
    "messages": [{"role": "user", "content": "Say hello in JSON format"}],
    "max_tokens": 50
  }' | jq '.choices[0].message.content'
echo ""

echo "Test 2: Tool Calling..."
curl -s -X POST "https://integrate.api.nvidia.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $NVIDIA_API_KEY" \
  -d '{
    "model": "nvidia/nemotron-mini-4b-instruct",
    "messages": [
      {"role": "system", "content": "You are a K8s assistant."},
      {"role": "user", "content": "Restart the nginx deployment in production namespace"}
    ],
    "tools": [{
      "type": "function",
      "function": {
        "name": "restart_deployment",
        "description": "Restart a K8s deployment",
        "parameters": {
          "type": "object",
          "properties": {
            "deployment": {"type": "string"},
            "namespace": {"type": "string"}
          },
          "required": ["deployment", "namespace"]
        }
      }
    }],
    "tool_choice": "auto",
    "max_tokens": 200
  }' | jq '.choices[0].message'
echo ""
echo "=== Test complete ==="
```
---
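## 4. Implementation Plan
### 4.1 Phases
```
Phase N.1: Validation (1-2 days)
──────────────────────────
├── Register at build.nvidia.com
├── Obtain NVIDIA_API_KEY
├── Run quick_test_nemotron.sh
├── Run the full Tool Calling test
└── Analyze the results and decide whether to continue

Phase N.2: Integration (2-3 days)
──────────────────────────
├── Build NvidiaAIProvider (modeled on the existing GeminiProvider)
├── Add routing rules to the Model Router
├── Configure environment variables + K8s Secrets
├── Integrate Langfuse Tracing
└── Unit tests

Phase N.3: Acceptance (1 day)
──────────────────────────
├── E2E tests (real Incident scenarios)
├── Latency + cost analysis
├── Chief Architect review
└── Commander approval for go-live
```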
### 4.2 File Structure
```
apps/api/src/
├── services/
│   └── ai/
│       ├── providers/
│       │   ├── ollama_provider.py    # existing
│       │   ├── gemini_provider.py    # existing
│       │   ├── claude_provider.py    # existing
│       │   └── nvidia_provider.py    # 🆕 new
│       │
│       ├── model_router.py           # modified: add Nemotron routing
│       └── rate_limiter.py           # modified: add Nemotron rate limiting
```
### 4.3 New GitHub Secrets
```bash
# Add to GitHub Secrets:
#   NVIDIA_API_KEY: nvapi-xxxx

# Add to K8s Secrets:
kubectl create secret generic nvidia-api \
  --from-literal=NVIDIA_API_KEY=nvapi-xxxx \
  -n awoooi-prod
```
---
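## 5. Cost Estimate
### 5.1 Free Tier
| Item | Estimate |
|------|----------|
| **Development and testing** | Free (build.nvidia.com) |
| **Rate Limit** | To be confirmed (possibly 60 RPM) |
### 5.2 Production (if paid usage is needed)
| Model | Price (estimated) | Monthly usage | Monthly cost |
|-------|-------------------|---------------|--------------|
| nemotron-mini-4b | ~$0.1/1M tokens | ~5M tokens | ~$0.5 |
| nemotron-70b | ~$1.0/1M tokens | ~1M tokens | ~$1.0 |
**Conclusion**: At these estimated prices the total is about $1.5 per month, which is very low and much cheaper than the Claude API.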
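The monthly figures follow from multiplying each estimated price by the estimated volume. The short calculation below reproduces them so the estimate can be rerun with different assumptions; all inputs are the estimates from the table above, not confirmed NVIDIA pricing, and the dictionary layout is illustrative only.

```python
# Estimated price (USD per 1M tokens) and monthly volume (millions of tokens).
# These are the table's estimates, not confirmed pricing.
ESTIMATES = {
    "nemotron-mini-4b": {"usd_per_million": 0.1, "million_tokens": 5},
    "nemotron-70b": {"usd_per_million": 1.0, "million_tokens": 1},
}


def monthly_costs(estimates: dict) -> dict:
    """Return the estimated monthly cost in USD for each model."""
    return {
        name: round(e["usd_per_million"] * e["million_tokens"], 2)
        for name, e in estimates.items()
    }


costs = monthly_costs(ESTIMATES)
total = round(sum(costs.values()), 2)
print(costs, total)
```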
---
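## 6. Risk Assessment
| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Free tier exhausted | Medium | Low | Fall back to Gemini |
| High API latency | Low | Medium | Local caching + timeout |
| Poor Tool Calling accuracy | Low | High | Validate during the test phase |
| Service instability | Low | Medium | Multi-layer fallback |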
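
Two of the mitigations above, a timeout and multi-layer fallback, can be combined in one small loop. The following is a sketch under stated assumptions: `call_with_fallback` and the `call_model` callback are hypothetical names, the 30-second default is illustrative, and a real provider layer may prefer to retry only on specific error types.

```python
import asyncio


async def call_with_fallback(models, call_model, timeout_s: float = 30.0):
    """Try each model in order; return (model, result) from the first success."""
    errors = {}
    for model in models:
        try:
            # A slow provider counts as a failure once the timeout expires.
            result = await asyncio.wait_for(call_model(model), timeout=timeout_s)
            return model, result
        except Exception as exc:  # includes asyncio.TimeoutError
            errors[model] = repr(exc)
    raise RuntimeError(f"all models failed: {errors}")


async def demo_call(model: str) -> str:
    """Stand-in for a provider call: one model hangs, one errors, one answers."""
    if model == "slow-model":
        await asyncio.sleep(1)
    if model == "broken-model":
        raise ValueError("simulated provider error")
    return f"ok:{model}"


print(asyncio.run(call_with_fallback(
    ["slow-model", "broken-model", "healthy-model"], demo_call, timeout_s=0.05)))
```

Collecting the per-model errors before raising makes it easier to see in logs or traces why every tier was exhausted.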
---
## Appendix: Next Steps
Once the Commander approves, run the following immediately:
```bash
# Step 1: Obtain an API key
# Register at https://build.nvidia.com and get a key

# Step 2: Set the environment variable
export NVIDIA_API_KEY=nvapi-xxxx

# Step 3: Quick verification
cd apps/api
./scripts/quick_test_nemotron.sh

# Step 4: Full test
python scripts/test_nemotron_tool_calling.py
```
---
**Author**: Claude Code
**Date**: 2026-03-28 (Taipei time)
**Status**: Pending review