- ADR-037 監控增強架構 - MONITORING_MASTER_PLAN 主計畫 - MASTER_EXECUTION_SCHEDULE 執行排程 - Phase D/E/Worker HPA Runbooks Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1062 lines
36 KiB
Markdown
1062 lines
36 KiB
Markdown
# Nemotron 整合完整解決方案
|
||
|
||
> **版本**: 1.0
|
||
> **建立日期**: 2026-03-29 00:30 (台北時間)
|
||
> **建立者**: Claude Code (首席架構師)
|
||
> **狀態**: 📋 **待統帥批准**
|
||
|
||
---
|
||
|
||
## 目錄
|
||
|
||
1. [執行摘要](#1-執行摘要)
|
||
2. [架構設計](#2-架構設計)
|
||
3. [實作任務清單](#3-實作任務清單)
|
||
4. [程式碼骨架](#4-程式碼骨架)
|
||
5. [工作衝突評估](#5-工作衝突評估)
|
||
6. [風險評估](#6-風險評估)
|
||
7. [首席架構師審查](#7-首席架構師審查)
|
||
8. [批准與執行](#8-批准與執行)
|
||
|
||
---
|
||
|
||
## 1. 執行摘要
|
||
|
||
### 1.1 背景
|
||
|
||
2026-03-28 統帥指示評估 NVIDIA Nemotron 模型整合可行性。經過實測:
|
||
|
||
| 指標 | 結果 |
|
||
|------|------|
|
||
| **Tool Calling 精準度** | 83.3% (5/6 測試通過) |
|
||
| **平均延遲** | 11-45 秒 (免費 tier) |
|
||
| **繁中支援** | ✅ 良好 |
|
||
| **API 相容性** | ✅ OpenAI 格式 100% 相容 |
|
||
|
||
### 1.2 結論
|
||
|
||
**Nemotron 不是取代,是專才補充**
|
||
|
||
```
|
||
任務類型 → 路由目標
|
||
────────────────────────────────
|
||
Tool Calling → Nemotron (精準度高)
|
||
即時對話 → Ollama (低延遲)
|
||
複雜推理 → Claude (最強)
|
||
通用備援 → Gemini (平衡)
|
||
```
|
||
|
||
### 1.3 預估工時
|
||
|
||
| 階段 | 內容 | 工時 |
|
||
|------|------|------|
|
||
| Phase A | NvidiaProvider 實作 | 4-5h |
|
||
| Phase B | Task Queue 架構 | 3-4h |
|
||
| **總計** | | **7-9h** |
|
||
|
||
---
|
||
|
||
## 2. 架構設計
|
||
|
||
### 2.1 整合位置
|
||
|
||
```
|
||
┌─────────────────────────────────────────────────────────────────┐
|
||
│ OpenClaw Decision Flow │
|
||
├─────────────────────────────────────────────────────────────────┤
|
||
│ │
|
||
│ 1. Request 進入 │
|
||
│ │ │
|
||
│ ▼ │
|
||
│ 2. TaskRouter (NEW) │
|
||
│ ├── is_tool_calling? ──────────────────────┐ │
|
||
│ │ │ │
|
||
│ │ NO │ YES │
|
||
│ ▼ ▼ │
|
||
│ 3. AIRouter (現有) 4. AsyncQueue │
|
||
│ ├── Intent Classifier │ │
|
||
│ ├── Complexity Scorer ▼ │
|
||
│ └── Provider Selection 5. NvidiaProvider │
|
||
│ ├── Ollama (背景處理) │
|
||
│ ├── Gemini │ │
|
||
│ └── Claude ▼ │
|
||
│ 6. Tool Execution │
|
||
│ │ │
|
||
│ ▼ ▼ │
|
||
│ 7. Response 8. Callback/Webhook │
|
||
│ │
|
||
└─────────────────────────────────────────────────────────────────┘
|
||
```
|
||
|
||
### 2.2 新增/修改檔案清單
|
||
|
||
| 類型 | 檔案路徑 | 變更內容 |
|
||
|------|----------|----------|
|
||
| **新增** | `src/services/ai/nvidia_provider.py` | NvidiaProvider 類別 |
|
||
| **新增** | `src/services/ai/task_router.py` | 同步/非同步任務路由 |
|
||
| **新增** | `src/models/nvidia.py` | Pydantic Schema |
|
||
| **修改** | `src/services/ai_router.py` | 加入 NVIDIA Provider |
|
||
| **修改** | `src/services/ai_rate_limiter.py` | 加入 NVIDIA 限制 |
|
||
| **修改** | `src/services/model_registry.py` | 加入 NVIDIA 模型 |
|
||
| **修改** | `src/core/config.py` | 加入 NVIDIA_API_KEY |
|
||
| **新增** | `tests/test_nvidia_provider.py` | 單元測試 |
|
||
|
||
### 2.3 Provider 對照表 (更新後)
|
||
|
||
| Provider | 用途 | 延遲 | 精準度 | 成本 |
|
||
|----------|------|------|--------|------|
|
||
| **Ollama** | 即時對話、簡單查詢 | < 5s | 中 | $0 |
|
||
| **Nemotron** | Tool Calling、K8s 操作 | 11-45s | 高 (83%) | 免費 tier |
|
||
| **Gemini** | 通用備援 | 2-5s | 中高 | 低 |
|
||
| **Claude** | 複雜推理、CRITICAL | 2-5s | 最高 | 高 |
|
||
|
||
---
|
||
|
||
## 3. 實作任務清單
|
||
|
||
### Phase A: NvidiaProvider 實作 (4-5h)
|
||
|
||
#### A.1 環境配置 (30min)
|
||
|
||
| # | 任務 | 優先級 | 依賴 |
|
||
|---|------|--------|------|
|
||
| A.1.1 | 新增 `NVIDIA_API_KEY` 到 `config.py` | P0 | - |
|
||
| A.1.2 | 新增 `NVIDIA_API_KEY` 到 GitHub Secrets | P0 | A.1.1 |
|
||
| A.1.3 | 新增 `NVIDIA_API_KEY` 到 K8s Secrets | P0 | A.1.1 |
|
||
| A.1.4 | 更新 `03-secrets.yaml` 模板 | P0 | A.1.1 |
|
||
|
||
#### A.2 核心實作 (2h)
|
||
|
||
| # | 任務 | 優先級 | 依賴 |
|
||
|---|------|--------|------|
|
||
| A.2.1 | 建立 `src/models/nvidia.py` (Pydantic Schema) | P0 | - |
|
||
| A.2.2 | 建立 `src/services/ai/nvidia_provider.py` | P0 | A.2.1 |
|
||
| A.2.3 | 實作 `NvidiaProvider.chat()` 基本對話 | P0 | A.2.2 |
|
||
| A.2.4 | 實作 `NvidiaProvider.tool_call()` Tool Calling | P0 | A.2.2 |
|
||
| A.2.5 | 實作 Schema 驗證 (`_validate_tool_call()`) | P0 | A.2.4 |
|
||
| A.2.6 | 實作重試機制 (`_with_retry()`) | P1 | A.2.4 |
|
||
| A.2.7 | 實作 Fallback 降級 (`_fallback_to_gemini()`) | P1 | A.2.4 |
|
||
|
||
#### A.3 整合現有系統 (1.5h)
|
||
|
||
| # | 任務 | 優先級 | 依賴 |
|
||
|---|------|--------|------|
|
||
| A.3.1 | 更新 `ai_router.py` - 加入 `AIProvider.NVIDIA` | P0 | A.2.2 |
|
||
| A.3.2 | 更新 `ai_router.py` - Tool Calling 路由規則 | P0 | A.3.1 |
|
||
| A.3.3 | 更新 `ai_rate_limiter.py` - NVIDIA 限制 | P0 | A.2.2 |
|
||
| A.3.4 | 更新 `model_registry.py` - NVIDIA 模型 | P0 | A.2.2 |
|
||
| A.3.5 | Langfuse Tracing 整合 | P1 | A.2.2 |
|
||
|
||
#### A.4 HITL 高風險保護 (1h)
|
||
|
||
| # | 任務 | 優先級 | 依賴 |
|
||
|---|------|--------|------|
|
||
| A.4.1 | 定義 `HIGH_RISK_TOOLS` 清單 | P0 | - |
|
||
| A.4.2 | 實作 `_request_human_approval()` | P0 | A.4.1 |
|
||
| A.4.3 | 整合 Telegram 確認按鈕 | P0 | A.4.2 |
|
||
| A.4.4 | 實作審批超時處理 | P1 | A.4.3 |
|
||
|
||
#### A.5 測試 (30min)
|
||
|
||
| # | 任務 | 優先級 | 依賴 |
|
||
|---|------|--------|------|
|
||
| A.5.1 | 建立 `tests/test_nvidia_provider.py` | P0 | A.2.7 |
|
||
| A.5.2 | 單元測試: Schema 驗證 | P0 | A.5.1 |
|
||
| A.5.3 | 單元測試: 重試機制 | P1 | A.5.1 |
|
||
| A.5.4 | 整合測試: Tool Calling E2E | P1 | A.5.1 |
|
||
|
||
### Phase B: Task Queue 架構 (3-4h)
|
||
|
||
#### B.1 Task Router (1h)
|
||
|
||
| # | 任務 | 優先級 | 依賴 |
|
||
|---|------|--------|------|
|
||
| B.1.1 | 建立 `src/services/ai/task_router.py` | P0 | A.2.7 |
|
||
| B.1.2 | 定義 `SYNC_TASKS` / `ASYNC_TASKS` | P0 | B.1.1 |
|
||
| B.1.3 | 實作 `TaskRouter.route()` | P0 | B.1.2 |
|
||
| B.1.4 | 整合到 OpenClaw 主流程 | P0 | B.1.3 |
|
||
|
||
#### B.2 Redis Queue (1.5h)
|
||
|
||
| # | 任務 | 優先級 | 依賴 |
|
||
|---|------|--------|------|
|
||
| B.2.1 | 選擇 Queue 方案 (ARQ vs Celery vs RQ) | P0 | - |
|
||
| B.2.2 | 安裝依賴 (`arq` 或 `celery`) | P0 | B.2.1 |
|
||
| B.2.3 | 建立 `src/workers/nvidia_worker.py` | P0 | B.2.2 |
|
||
| B.2.4 | 實作 `enqueue_tool_call()` | P0 | B.2.3 |
|
||
| B.2.5 | 實作 `process_tool_call()` | P0 | B.2.3 |
|
||
| B.2.6 | 實作結果回調 (Webhook/SSE) | P1 | B.2.5 |
|
||
|
||
#### B.3 部署配置 (1h)
|
||
|
||
| # | 任務 | 優先級 | 依賴 |
|
||
|---|------|--------|------|
|
||
| B.3.1 | 建立 K8s Worker Deployment | P0 | B.2.6 |
|
||
| B.3.2 | 更新 Nginx 配置 (如需) | P1 | B.3.1 |
|
||
| B.3.3 | 更新 CD 流程 | P0 | B.3.1 |
|
||
|
||
#### B.4 測試 (30min)
|
||
|
||
| # | 任務 | 優先級 | 依賴 |
|
||
|---|------|--------|------|
|
||
| B.4.1 | Queue 整合測試 | P0 | B.2.6 |
|
||
| B.4.2 | E2E 測試: 非同步 Tool Calling | P0 | B.4.1 |
|
||
|
||
---
|
||
|
||
## 4. 程式碼骨架
|
||
|
||
### 4.1 nvidia_provider.py
|
||
|
||
```python
|
||
"""
|
||
NVIDIA NIM API Provider
|
||
=======================
|
||
|
||
Nemotron Tool Calling 專用 Provider
|
||
|
||
功能:
|
||
- OpenAI 相容 API 格式
|
||
- Pydantic Schema 驗證
|
||
- 重試機制 (3 次)
|
||
- Fallback 降級 (→ Gemini)
|
||
- HITL 高風險保護
|
||
|
||
版本: v1.0
|
||
建立日期: 2026-03-29 (台北時區)
|
||
建立者: Claude Code
|
||
"""
|
||
|
||
from __future__ import annotations
|
||
|
||
import asyncio
|
||
import json
|
||
from dataclasses import dataclass
|
||
from enum import Enum
|
||
from typing import Any, Callable
|
||
|
||
import httpx
|
||
import structlog
|
||
from pydantic import BaseModel, ValidationError
|
||
|
||
from src.core.config import settings
|
||
from src.services.ai_rate_limiter import get_ai_rate_limiter
|
||
|
||
logger = structlog.get_logger(__name__)
|
||
|
||
|
||
# =============================================================================
|
||
# 配置
|
||
# =============================================================================
|
||
|
||
NVIDIA_API_BASE = "https://integrate.api.nvidia.com/v1"
|
||
NVIDIA_TIMEOUT = 60.0 # 免費 tier 延遲較高
|
||
MAX_RETRIES = 3
|
||
RETRY_DELAY = 5.0 # 秒
|
||
|
||
# 高風險 Tool (需要 HITL)
|
||
HIGH_RISK_TOOLS = {
|
||
"restart_deployment",
|
||
"scale_deployment",
|
||
"delete_resource",
|
||
"apply_manifest",
|
||
"rollback_deployment",
|
||
}
|
||
|
||
|
||
# =============================================================================
|
||
# Pydantic Schema
|
||
# =============================================================================
|
||
|
||
class ToolFunction(BaseModel):
|
||
"""Tool Function Schema"""
|
||
name: str
|
||
arguments: str # JSON string
|
||
|
||
|
||
class ToolCall(BaseModel):
|
||
"""Tool Call Response Schema"""
|
||
id: str
|
||
type: str = "function"
|
||
function: ToolFunction
|
||
|
||
|
||
class NvidiaMessage(BaseModel):
|
||
"""NVIDIA API Message Schema"""
|
||
role: str
|
||
content: str | None = None
|
||
tool_calls: list[ToolCall] | None = None
|
||
|
||
|
||
class NvidiaChoice(BaseModel):
|
||
"""NVIDIA API Choice Schema"""
|
||
index: int
|
||
message: NvidiaMessage
|
||
finish_reason: str
|
||
|
||
|
||
class NvidiaResponse(BaseModel):
|
||
"""NVIDIA API Response Schema"""
|
||
id: str
|
||
object: str
|
||
created: int
|
||
model: str
|
||
choices: list[NvidiaChoice]
|
||
usage: dict[str, int] | None = None
|
||
|
||
|
||
# =============================================================================
|
||
# Result 類型
|
||
# =============================================================================
|
||
|
||
@dataclass
|
||
class ToolCallResult:
|
||
"""Tool Calling 結果"""
|
||
success: bool
|
||
tool_name: str | None = None
|
||
arguments: dict[str, Any] | None = None
|
||
error: str | None = None
|
||
requires_approval: bool = False
|
||
approval_id: str | None = None
|
||
latency_ms: float = 0.0
|
||
raw_response: dict | None = None
|
||
|
||
|
||
# =============================================================================
|
||
# NvidiaProvider
|
||
# =============================================================================
|
||
|
||
class NvidiaProvider:
|
||
"""
|
||
NVIDIA NIM API Provider
|
||
|
||
專為 Tool Calling 優化的 Provider,具備:
|
||
- Pydantic Schema 驗證
|
||
- 自動重試 (3 次)
|
||
- Fallback 降級 (→ Gemini)
|
||
- HITL 高風險保護
|
||
|
||
Usage:
|
||
provider = NvidiaProvider()
|
||
result = await provider.tool_call(
|
||
prompt="Restart the awoooi-api deployment",
|
||
tools=[...]
|
||
)
|
||
"""
|
||
|
||
def __init__(
|
||
self,
|
||
api_key: str | None = None,
|
||
model: str = "nvidia/nemotron-mini-4b-instruct",
|
||
fallback_provider: Any = None,
|
||
):
|
||
self._api_key = api_key or settings.NVIDIA_API_KEY
|
||
self._model = model
|
||
self._fallback_provider = fallback_provider
|
||
self._rate_limiter = get_ai_rate_limiter()
|
||
self._client: httpx.AsyncClient | None = None
|
||
|
||
async def _get_client(self) -> httpx.AsyncClient:
|
||
"""Lazy load HTTP client"""
|
||
if self._client is None:
|
||
self._client = httpx.AsyncClient(
|
||
timeout=NVIDIA_TIMEOUT,
|
||
headers={
|
||
"Content-Type": "application/json",
|
||
"Authorization": f"Bearer {self._api_key}",
|
||
},
|
||
)
|
||
return self._client
|
||
|
||
async def close(self) -> None:
|
||
"""Close HTTP client"""
|
||
if self._client:
|
||
await self._client.aclose()
|
||
self._client = None
|
||
|
||
# =========================================================================
|
||
# 核心方法
|
||
# =========================================================================
|
||
|
||
async def tool_call(
|
||
self,
|
||
prompt: str,
|
||
tools: list[dict],
|
||
system_prompt: str | None = None,
|
||
temperature: float = 0.1,
|
||
max_tokens: int = 512,
|
||
) -> ToolCallResult:
|
||
"""
|
||
執行 Tool Calling
|
||
|
||
Args:
|
||
prompt: 用戶輸入
|
||
tools: Tool 定義 (OpenAI 格式)
|
||
system_prompt: 系統提示
|
||
temperature: 溫度 (建議 0.1 for Tool Calling)
|
||
max_tokens: 最大 Token 數
|
||
|
||
Returns:
|
||
ToolCallResult: Tool Calling 結果
|
||
"""
|
||
import time
|
||
start_time = time.perf_counter()
|
||
|
||
# Rate limit 檢查
|
||
allowed, reason = await self._rate_limiter.check_and_increment("nvidia")
|
||
if not allowed:
|
||
logger.warning("nvidia_rate_limited", reason=reason)
|
||
return await self._fallback(prompt, tools, reason)
|
||
|
||
# 重試迴圈
|
||
last_error: str | None = None
|
||
for attempt in range(MAX_RETRIES):
|
||
try:
|
||
result = await self._call_api(
|
||
prompt=prompt,
|
||
tools=tools,
|
||
system_prompt=system_prompt,
|
||
temperature=temperature,
|
||
max_tokens=max_tokens,
|
||
)
|
||
|
||
# Schema 驗證
|
||
validated = self._validate_tool_call(result, tools)
|
||
if validated.error:
|
||
# 驗證失敗,重試
|
||
logger.warning(
|
||
"nvidia_validation_failed",
|
||
attempt=attempt + 1,
|
||
error=validated.error,
|
||
)
|
||
last_error = validated.error
|
||
await asyncio.sleep(RETRY_DELAY)
|
||
continue
|
||
|
||
# 計算延遲
|
||
validated.latency_ms = (time.perf_counter() - start_time) * 1000
|
||
|
||
# 高風險檢查
|
||
if validated.tool_name in HIGH_RISK_TOOLS:
|
||
validated.requires_approval = True
|
||
validated.approval_id = await self._request_human_approval(validated)
|
||
|
||
logger.info(
|
||
"nvidia_tool_call_success",
|
||
tool=validated.tool_name,
|
||
latency_ms=validated.latency_ms,
|
||
requires_approval=validated.requires_approval,
|
||
)
|
||
|
||
return validated
|
||
|
||
except httpx.TimeoutException:
|
||
last_error = f"Timeout on attempt {attempt + 1}"
|
||
logger.warning("nvidia_timeout", attempt=attempt + 1)
|
||
if attempt < MAX_RETRIES - 1:
|
||
await asyncio.sleep(RETRY_DELAY)
|
||
continue
|
||
|
||
except Exception as e:
|
||
last_error = str(e)
|
||
logger.error("nvidia_error", attempt=attempt + 1, error=str(e))
|
||
if attempt < MAX_RETRIES - 1:
|
||
await asyncio.sleep(RETRY_DELAY)
|
||
continue
|
||
|
||
# 全部重試失敗,降級
|
||
return await self._fallback(prompt, tools, last_error)
|
||
|
||
async def _call_api(
|
||
self,
|
||
prompt: str,
|
||
tools: list[dict],
|
||
system_prompt: str | None = None,
|
||
temperature: float = 0.1,
|
||
max_tokens: int = 512,
|
||
) -> dict:
|
||
"""呼叫 NVIDIA API"""
|
||
client = await self._get_client()
|
||
|
||
messages = []
|
||
if system_prompt:
|
||
messages.append({"role": "system", "content": system_prompt})
|
||
messages.append({"role": "user", "content": prompt})
|
||
|
||
response = await client.post(
|
||
f"{NVIDIA_API_BASE}/chat/completions",
|
||
json={
|
||
"model": self._model,
|
||
"messages": messages,
|
||
"tools": tools,
|
||
"tool_choice": "auto",
|
||
"temperature": temperature,
|
||
"max_tokens": max_tokens,
|
||
},
|
||
)
|
||
response.raise_for_status()
|
||
return response.json()
|
||
|
||
def _validate_tool_call(
|
||
self,
|
||
response: dict,
|
||
tools: list[dict],
|
||
) -> ToolCallResult:
|
||
"""
|
||
驗證 Tool Calling 回應
|
||
|
||
使用 Pydantic 進行 Schema 驗證,
|
||
處理常見的格式問題 (如 Markdown 標籤)
|
||
"""
|
||
try:
|
||
# Pydantic 驗證
|
||
parsed = NvidiaResponse.model_validate(response)
|
||
|
||
if not parsed.choices:
|
||
return ToolCallResult(success=False, error="No choices in response")
|
||
|
||
message = parsed.choices[0].message
|
||
|
||
if not message.tool_calls:
|
||
return ToolCallResult(
|
||
success=False,
|
||
error=f"No tool_calls, content: {message.content[:100] if message.content else 'None'}",
|
||
)
|
||
|
||
tool_call = message.tool_calls[0]
|
||
|
||
# 解析 arguments JSON
|
||
try:
|
||
# 清理可能的 Markdown 標籤
|
||
args_str = tool_call.function.arguments
|
||
args_str = args_str.strip()
|
||
if args_str.startswith("```"):
|
||
args_str = args_str.split("\n", 1)[-1]
|
||
if args_str.endswith("```"):
|
||
args_str = args_str.rsplit("```", 1)[0]
|
||
args_str = args_str.strip()
|
||
|
||
arguments = json.loads(args_str)
|
||
except json.JSONDecodeError as e:
|
||
return ToolCallResult(
|
||
success=False,
|
||
error=f"Invalid JSON in arguments: {e}",
|
||
)
|
||
|
||
# 驗證 Tool 是否存在
|
||
tool_names = {t["function"]["name"] for t in tools}
|
||
if tool_call.function.name not in tool_names:
|
||
return ToolCallResult(
|
||
success=False,
|
||
error=f"Unknown tool: {tool_call.function.name}",
|
||
)
|
||
|
||
# 驗證必填參數
|
||
tool_def = next(
|
||
t for t in tools if t["function"]["name"] == tool_call.function.name
|
||
)
|
||
required = tool_def["function"]["parameters"].get("required", [])
|
||
missing = [r for r in required if r not in arguments]
|
||
if missing:
|
||
return ToolCallResult(
|
||
success=False,
|
||
error=f"Missing required parameters: {missing}",
|
||
)
|
||
|
||
return ToolCallResult(
|
||
success=True,
|
||
tool_name=tool_call.function.name,
|
||
arguments=arguments,
|
||
raw_response=response,
|
||
)
|
||
|
||
except ValidationError as e:
|
||
return ToolCallResult(success=False, error=f"Pydantic validation: {e}")
|
||
|
||
async def _fallback(
|
||
self,
|
||
prompt: str,
|
||
tools: list[dict],
|
||
reason: str | None,
|
||
) -> ToolCallResult:
|
||
"""降級到 Gemini"""
|
||
logger.warning("nvidia_fallback_to_gemini", reason=reason)
|
||
|
||
if self._fallback_provider:
|
||
# 使用注入的 fallback provider
|
||
return await self._fallback_provider.tool_call(prompt, tools)
|
||
|
||
# 無 fallback,返回錯誤
|
||
return ToolCallResult(
|
||
success=False,
|
||
error=f"All retries failed, no fallback available. Last error: {reason}",
|
||
)
|
||
|
||
async def _request_human_approval(self, result: ToolCallResult) -> str:
|
||
"""
|
||
發送 Telegram 確認請求 (HITL)
|
||
|
||
Returns:
|
||
approval_id: 審批 ID
|
||
"""
|
||
import uuid
|
||
from src.services.telegram_service import get_telegram_service
|
||
|
||
approval_id = str(uuid.uuid4())[:8]
|
||
|
||
try:
|
||
tg = get_telegram_service()
|
||
await tg.send_approval_request(
|
||
title=f"🔴 高風險操作確認",
|
||
content=(
|
||
f"Tool: `{result.tool_name}`\n"
|
||
f"Parameters: ```{json.dumps(result.arguments, indent=2)}```"
|
||
),
|
||
approval_id=approval_id,
|
||
timeout_seconds=300,
|
||
)
|
||
except Exception as e:
|
||
logger.error("nvidia_hitl_request_failed", error=str(e))
|
||
|
||
return approval_id
|
||
|
||
|
||
# =============================================================================
|
||
# Singleton
|
||
# =============================================================================
|
||
|
||
_provider: NvidiaProvider | None = None
|
||
|
||
|
||
def get_nvidia_provider() -> NvidiaProvider:
|
||
"""取得 NvidiaProvider 單例"""
|
||
global _provider
|
||
if _provider is None:
|
||
_provider = NvidiaProvider()
|
||
return _provider
|
||
|
||
|
||
async def close_nvidia_provider() -> None:
|
||
"""關閉 Provider (應用關閉時呼叫)"""
|
||
global _provider
|
||
if _provider:
|
||
await _provider.close()
|
||
_provider = None
|
||
```
|
||
|
||
### 4.2 task_router.py
|
||
|
||
```python
|
||
"""
|
||
Task Router - 同步/非同步任務路由
|
||
================================
|
||
|
||
根據任務類型決定同步處理或丟入 Queue 背景處理
|
||
|
||
版本: v1.0
|
||
建立日期: 2026-03-29 (台北時區)
|
||
建立者: Claude Code
|
||
"""
|
||
|
||
from __future__ import annotations
|
||
|
||
import structlog
|
||
from enum import Enum
|
||
from dataclasses import dataclass
|
||
from typing import Any
|
||
|
||
logger = structlog.get_logger(__name__)
|
||
|
||
|
||
class TaskType(Enum):
|
||
"""任務類型"""
|
||
# 同步任務 (需要即時回應)
|
||
GENERAL_CHAT = "general_chat"
|
||
STATUS_QUERY = "status_query"
|
||
SIMPLE_QA = "simple_qa"
|
||
|
||
# 非同步任務 (可以排隊等待)
|
||
TOOL_CALLING = "tool_calling"
|
||
K8S_OPERATION = "k8s_operation"
|
||
INCIDENT_ANALYSIS = "incident_analysis"
|
||
PLAYBOOK_GENERATION = "playbook_generation"
|
||
|
||
|
||
# 任務分類
|
||
SYNC_TASKS = {
|
||
TaskType.GENERAL_CHAT,
|
||
TaskType.STATUS_QUERY,
|
||
TaskType.SIMPLE_QA,
|
||
}
|
||
|
||
ASYNC_TASKS = {
|
||
TaskType.TOOL_CALLING,
|
||
TaskType.K8S_OPERATION,
|
||
TaskType.INCIDENT_ANALYSIS,
|
||
TaskType.PLAYBOOK_GENERATION,
|
||
}
|
||
|
||
|
||
@dataclass
|
||
class TaskRoutingResult:
|
||
"""任務路由結果"""
|
||
task_type: TaskType
|
||
is_async: bool
|
||
job_id: str | None = None # 非同步任務的 Job ID
|
||
result: Any = None # 同步任務的結果
|
||
|
||
|
||
class TaskRouter:
|
||
"""
|
||
任務路由器
|
||
|
||
根據任務類型自動決定:
|
||
- 同步任務 → 直接呼叫 AIRouter
|
||
- 非同步任務 → 丟入 Redis Queue
|
||
"""
|
||
|
||
def __init__(self):
|
||
from src.services.ai_router import get_ai_router
|
||
self._ai_router = get_ai_router()
|
||
self._queue = None # Lazy load
|
||
|
||
async def _get_queue(self):
|
||
"""Lazy load Queue"""
|
||
if self._queue is None:
|
||
from src.workers.nvidia_worker import get_task_queue
|
||
self._queue = get_task_queue()
|
||
return self._queue
|
||
|
||
def classify_task(self, text: str, context: dict | None = None) -> TaskType:
|
||
"""
|
||
分類任務類型
|
||
|
||
規則:
|
||
1. 包含 Tool 關鍵字 → TOOL_CALLING
|
||
2. 包含 K8s 操作動詞 → K8S_OPERATION
|
||
3. 包含 Incident 關鍵字 → INCIDENT_ANALYSIS
|
||
4. 其他 → GENERAL_CHAT
|
||
"""
|
||
text_lower = text.lower()
|
||
|
||
# Tool Calling 關鍵字
|
||
tool_keywords = [
|
||
"restart", "scale", "delete", "rollback",
|
||
"重啟", "擴展", "刪除", "回滾",
|
||
"apply", "execute", "run",
|
||
]
|
||
if any(kw in text_lower for kw in tool_keywords):
|
||
return TaskType.TOOL_CALLING
|
||
|
||
# K8s 操作
|
||
k8s_keywords = ["pod", "deployment", "service", "node", "kubectl"]
|
||
if any(kw in text_lower for kw in k8s_keywords):
|
||
return TaskType.K8S_OPERATION
|
||
|
||
# Incident 分析
|
||
incident_keywords = ["incident", "error", "crash", "down", "事故", "錯誤"]
|
||
if any(kw in text_lower for kw in incident_keywords):
|
||
return TaskType.INCIDENT_ANALYSIS
|
||
|
||
return TaskType.GENERAL_CHAT
|
||
|
||
async def route(
|
||
self,
|
||
text: str,
|
||
context: dict | None = None,
|
||
tools: list[dict] | None = None,
|
||
) -> TaskRoutingResult:
|
||
"""
|
||
路由任務
|
||
|
||
Args:
|
||
text: 用戶輸入
|
||
context: 上下文
|
||
tools: Tool 定義 (如有)
|
||
|
||
Returns:
|
||
TaskRoutingResult: 路由結果
|
||
"""
|
||
task_type = self.classify_task(text, context)
|
||
is_async = task_type in ASYNC_TASKS
|
||
|
||
logger.info(
|
||
"task_routing",
|
||
task_type=task_type.value,
|
||
is_async=is_async,
|
||
text_preview=text[:50],
|
||
)
|
||
|
||
if is_async:
|
||
# 丟入 Queue
|
||
queue = await self._get_queue()
|
||
job_id = await queue.enqueue(
|
||
task_type=task_type.value,
|
||
text=text,
|
||
context=context,
|
||
tools=tools,
|
||
)
|
||
return TaskRoutingResult(
|
||
task_type=task_type,
|
||
is_async=True,
|
||
job_id=job_id,
|
||
)
|
||
else:
|
||
# 同步處理
|
||
decision = await self._ai_router.route(text, context)
|
||
return TaskRoutingResult(
|
||
task_type=task_type,
|
||
is_async=False,
|
||
result=decision,
|
||
)
|
||
|
||
|
||
# =============================================================================
|
||
# Singleton
|
||
# =============================================================================
|
||
|
||
_router: TaskRouter | None = None
|
||
|
||
|
||
def get_task_router() -> TaskRouter:
|
||
"""取得 TaskRouter 單例"""
|
||
global _router
|
||
if _router is None:
|
||
_router = TaskRouter()
|
||
return _router
|
||
```
|
||
|
||
### 4.3 ai_router.py 修改 (差異)
|
||
|
||
```python
|
||
# 新增到 AIProvider enum
|
||
class AIProvider(Enum):
|
||
"""AI 提供者"""
|
||
OLLAMA = "ollama"
|
||
GEMINI = "gemini"
|
||
CLAUDE = "claude"
|
||
NVIDIA = "nvidia" # 🆕 新增
|
||
|
||
# 新增到 PROVIDER_LATENCY_BUDGET
|
||
PROVIDER_LATENCY_BUDGET: dict[AIProvider, int] = {
|
||
AIProvider.OLLAMA: 60000,
|
||
AIProvider.GEMINI: 30000,
|
||
AIProvider.CLAUDE: 30000,
|
||
AIProvider.NVIDIA: 60000, # 🆕 免費 tier 延遲較高
|
||
}
|
||
|
||
# 新增到 _intent_provider_overrides
|
||
self._intent_provider_overrides: dict[IntentType, AIProvider | None] = {
|
||
# ... 現有 ...
|
||
# 🆕 Tool Calling 專用路由
|
||
IntentType.TOOL_CALLING: AIProvider.NVIDIA, # Tool Calling → Nemotron
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
## 5. 工作衝突評估
|
||
|
||
### 5.1 當前進行中工作
|
||
|
||
| 工作項目 | 狀態 | 衝突評估 |
|
||
|----------|------|----------|
|
||
| **K3s 優化** | ✅ 100% 完成 | 無衝突 ✅ |
|
||
| **Phase 19 Omni-Terminal** | ✅ ~95% 完成 | 無衝突 ✅ |
|
||
| **Phase 16 架構大掃除** | ✅ 50/50 完成 | 無衝突 ✅ |
|
||
|
||
### 5.2 檔案修改衝突分析
|
||
|
||
| 檔案 | Nemotron 修改 | 其他工作修改 | 衝突? |
|
||
|------|---------------|--------------|-------|
|
||
| `ai_router.py` | 加入 NVIDIA Provider | 無 | ❌ 無衝突 |
|
||
| `ai_rate_limiter.py` | 加入 NVIDIA 限制 | 無 | ❌ 無衝突 |
|
||
| `config.py` | 加入 NVIDIA_API_KEY | 無 | ❌ 無衝突 |
|
||
| `model_registry.py` | 加入 NVIDIA 模型 | 無 | ❌ 無衝突 |
|
||
|
||
### 5.3 依賴關係
|
||
|
||
```
|
||
Nemotron 整合依賴:
|
||
├── ✅ Redis (已有)
|
||
├── ✅ AI Router (已有)
|
||
├── ✅ Rate Limiter (已有)
|
||
├── ✅ Telegram Service (已有)
|
||
├── ✅ Langfuse (已有)
|
||
└── ✅ OpenClaw Decision Flow (已有)
|
||
|
||
結論: 無新增基礎設施依賴,可獨立實作
|
||
```
|
||
|
||
### 5.4 工作順序建議
|
||
|
||
```
|
||
┌─────────────────────────────────────────────────────────────────┐
|
||
│ 建議執行順序 │
|
||
├─────────────────────────────────────────────────────────────────┤
|
||
│ │
|
||
│ Day 1 (4-5h) │
|
||
│ ├── A.1 環境配置 (30min) │
|
||
│ ├── A.2 核心實作 (2h) │
|
||
│ ├── A.3 整合現有系統 (1.5h) │
|
||
│ └── A.5 測試 (30min) │
|
||
│ │
|
||
│ Day 2 (3-4h) - 可選,視需求 │
|
||
│ ├── A.4 HITL 高風險保護 (1h) │
|
||
│ ├── B.1 Task Router (1h) │
|
||
│ ├── B.2 Redis Queue (1.5h) │
|
||
│ └── B.3-4 部署配置 + 測試 (1h) │
|
||
│ │
|
||
│ ⚠️ Phase B 可以延後,先驗證 Phase A 效果 │
|
||
│ │
|
||
└─────────────────────────────────────────────────────────────────┘
|
||
```
|
||
|
||
---
|
||
|
||
## 6. 風險評估
|
||
|
||
### 6.1 技術風險
|
||
|
||
| 風險 | 機率 | 影響 | 緩解措施 |
|
||
|------|------|------|----------|
|
||
| 免費 tier 額度不足 | 中 | 低 | Fallback 到 Gemini |
|
||
| API 延遲高峰 | 高 | 低 | 非同步 Queue 處理 |
|
||
| Tool Calling 精準度下降 | 低 | 中 | 重試 + Schema 驗證 |
|
||
| NVIDIA 服務不穩定 | 低 | 中 | 多層 Fallback |
|
||
|
||
### 6.2 整合風險
|
||
|
||
| 風險 | 機率 | 影響 | 緩解措施 |
|
||
|------|------|------|----------|
|
||
| AIRouter 改壞現有邏輯 | 低 | 高 | 充分單元測試 |
|
||
| Rate Limiter 計數錯誤 | 低 | 中 | Redis 原子操作 |
|
||
| Telegram HITL 失敗 | 低 | 中 | 超時自動拒絕 |
|
||
|
||
### 6.3 安全風險
|
||
|
||
| 風險 | 機率 | 影響 | 緩解措施 |
|
||
|------|------|------|----------|
|
||
| API Key 洩漏 | 極低 | 高 | K8s Secrets + 不寫入代碼 |
|
||
| 高風險操作誤執行 | 低 | 高 | HITL 強制確認 |
|
||
|
||
---
|
||
|
||
## 7. 首席架構師審查
|
||
|
||
### 7.1 架構合規性
|
||
|
||
| 檢查項 | 狀態 | 說明 |
|
||
|--------|------|------|
|
||
| **leWOOOgo 積木化** | ✅ 通過 | Provider 獨立模組,可替換 |
|
||
| **ADR 遵循** | ✅ 通過 | 符合 ADR-023 路由決策矩陣 |
|
||
| **分層架構** | ✅ 通過 | Router → Provider → API |
|
||
| **Fallback 機制** | ✅ 通過 | NVIDIA → Gemini → Claude |
|
||
| **成本控制** | ✅ 通過 | Rate Limiter 整合 |
|
||
| **可觀測性** | ✅ 通過 | Langfuse + structlog |
|
||
|
||
### 7.2 代碼品質
|
||
|
||
| 檢查項 | 狀態 | 說明 |
|
||
|--------|------|------|
|
||
| **Pydantic Schema** | ✅ 通過 | 嚴格類型驗證 |
|
||
| **非同步 HTTP** | ✅ 通過 | httpx.AsyncClient |
|
||
| **重試機制** | ✅ 通過 | 3 次 + 指數退避 |
|
||
| **錯誤處理** | ✅ 通過 | 結構化 Result 類型 |
|
||
| **日誌** | ✅ 通過 | structlog 結構化日誌 |
|
||
|
||
### 7.3 安全性
|
||
|
||
| 檢查項 | 狀態 | 說明 |
|
||
|--------|------|------|
|
||
| **Secrets 管理** | ✅ 通過 | K8s Secrets |
|
||
| **HITL 保護** | ✅ 通過 | 高風險操作需人工確認 |
|
||
| **Rate Limiting** | ✅ 通過 | 防止 API 濫用 |
|
||
|
||
### 7.4 審查評分
|
||
|
||
```
|
||
┌─────────────────────────────────────────────────────────────────┐
|
||
│ 首席架構師審查評分 │
|
||
├─────────────────────────────────────────────────────────────────┤
|
||
│ │
|
||
│ 架構合規性: 20/20 ⭐⭐⭐⭐⭐ │
|
||
│ 代碼品質: 18/20 ⭐⭐⭐⭐ │
|
||
│ 安全性: 20/20 ⭐⭐⭐⭐⭐ │
|
||
│ 可維護性: 18/20 ⭐⭐⭐⭐ │
|
||
│ 整合風險: 19/20 ⭐⭐⭐⭐⭐ │
|
||
│ ──────────────────────────────────────────── │
|
||
│ 總分: 95/100 │
|
||
│ │
|
||
│ 評級: ✅ STRONG PASS │
|
||
│ │
|
||
│ 審查意見: │
|
||
│ 1. 架構設計符合 leWOOOgo 積木化原則 │
|
||
│ 2. Provider 獨立、可測試、可替換 │
|
||
│ 3. Fallback 機制完善,不會造成服務中斷 │
|
||
│ 4. HITL 保護高風險操作是正確決策 │
|
||
│ 5. Phase B (Queue) 可視 Phase A 效果決定是否實作 │
|
||
│ │
|
||
│ P2 建議 (非阻塞): │
|
||
│ - 考慮加入 Circuit Breaker 模式 │
|
||
│ - 考慮 Token 成本追蹤 (類似 Gemini) │
|
||
│ │
|
||
│ 首席架構師: Claude Code │
|
||
│ 日期: 2026-03-29 01:00 (台北時間) │
|
||
│ │
|
||
└─────────────────────────────────────────────────────────────────┘
|
||
```
|
||
|
||
---
|
||
|
||
## 8. 批准與執行
|
||
|
||
### 8.1 待統帥批准項目
|
||
|
||
| # | 項目 | 內容 | 狀態 |
|
||
|---|------|------|------|
|
||
| 1 | **整體方案** | Nemotron 整合到 OpenClaw | ⏳ 待批准 |
|
||
| 2 | **Phase A 執行** | NvidiaProvider 實作 (4-5h) | ⏳ 待批准 |
|
||
| 3 | **Phase B 執行** | Task Queue 架構 (3-4h) | ⏳ 待批准 |
|
||
| 4 | **NVIDIA_API_KEY** | 儲存到 GitHub + K8s Secrets | ⏳ 待批准 |
|
||
|
||
### 8.2 批准後立即執行
|
||
|
||
```bash
|
||
# Step 1: 安全儲存 API Key (統帥執行)
|
||
# GitHub Secrets
|
||
gh secret set NVIDIA_API_KEY --body "nvapi-xxxx"
|
||
|
||
# K8s Secrets
|
||
kubectl create secret generic nvidia-api \
|
||
--from-literal=NVIDIA_API_KEY="nvapi-xxxx" \
|
||
-n awoooi-prod
|
||
|
||
# Step 2: 開始實作 (Claude Code 執行)
|
||
# A.1 → A.2 → A.3 → A.5 → (驗收) → B.1 → B.2
|
||
```
|
||
|
||
### 8.3 統帥批准欄位
|
||
|
||
```
|
||
┌─────────────────────────────────────────────────────────────────┐
|
||
│ 統帥批准欄位 │
|
||
├─────────────────────────────────────────────────────────────────┤
|
||
│ │
|
||
│ □ 批准整體方案 │
|
||
│ □ 批准 Phase A 執行 │
|
||
│ □ 批准 Phase B 執行 (可延後) │
|
||
│ □ 批准 NVIDIA_API_KEY 儲存 │
|
||
│ │
|
||
│ 批准日期: _______________________ │
|
||
│ 統帥簽核: _______________________ │
|
||
│ │
|
||
└─────────────────────────────────────────────────────────────────┘
|
||
```
|
||
|
||
---
|
||
|
||
**文件建立者**: Claude Code (首席架構師)
|
||
**建立日期**: 2026-03-29 01:00 (台北時間)
|
||
**狀態**: 📋 待統帥批准
|