fix(openclaw): two-layer defense against hallucinated deployment names (prompt + Python validator)

2026-04-18 evening (Taipei time), ogt + Claude Opus 4.7 (1M)

Production incident (approval f763bedf, 22:58):
- Alert: KubePodCrashLooping, labels.deployment="awoooi-api"
- NEMOTRON received the inventory "awoooi-api, awoooi-web, awoooi-worker"
  but still output kubectl_command="kubectl rollout restart deployment/awoooi-prod"
  (it mistook the namespace for a deployment name)
- Execution result: "Deployment 'awoooi-prod' not found in namespace 'awoooi-prod'"

## Layer 1: Hardened NEMOTRON_SYSTEM_PROMPT (prompts.py)
Adds a "🔒 DEPLOYMENT NAME RULE (STRICTLY ENFORCED)" block:
- A namespace is NEVER a deployment name
- "awoooi-prod" is a NAMESPACE; never write deployment/awoooi-prod
- When an inventory is provided, the deployment must match one of its names exactly
- Prefer labels.deployment; unknown target → NO_ACTION
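
The rule block above asks the model to follow a simple decision procedure. As a rough illustration only, the same procedure could be written in Python as below. The names `pick_target` and `plan` are hypothetical and do not appear in the commit; the commit itself only changes prompt text for this layer.

```python
from typing import Dict, List, Optional

NO_ACTION = "NO_ACTION"


def pick_target(labels: Dict[str, str], inventory: List[str], namespace: str) -> Optional[str]:
    """Return a deployment name the rule allows, or None when the target is unknown."""
    candidate = (labels.get("deployment") or "").strip()
    # A namespace is never a deployment name.
    if not candidate or candidate == namespace:
        return None
    # When an inventory is provided, the name must match it exactly.
    if inventory and candidate not in inventory:
        return None
    return candidate


def plan(labels: Dict[str, str], inventory: List[str], namespace: str) -> Dict[str, str]:
    """Build an action: restart a known deployment, otherwise investigate only."""
    target = pick_target(labels, inventory, namespace)
    if target is None:
        return {
            "suggested_action": NO_ACTION,
            "kubectl_command": f"kubectl get deploy -n {namespace}",
        }
    return {
        "suggested_action": "RESTART_DEPLOYMENT",
        "kubectl_command": f"kubectl rollout restart deployment/{target} -n {namespace}",
    }


inventory = ["awoooi-api", "awoooi-web", "awoooi-worker"]
print(plan({"deployment": "awoooi-api"}, inventory, "awoooi-prod"))
print(plan({"deployment": "awoooi-prod"}, inventory, "awoooi-prod"))
```

With the incident's labels, the first call targets `awoooi-api`; the second, which repeats the model's mistake, falls back to the investigation-only command.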

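## Layer 2: Python post-validation (openclaw.py:1322+)
After the LLM response is parsed, a regex extracts the deployment name from kubectl_command and checks it against _k8s_inventory:
- In the inventory → pass through
- Not in the inventory → downgrade:
    * kubectl_command → "kubectl get deploy -n {ns}" (investigation only)
    * suggested_action → NO_ACTION
    * target_resource → "unknown(hallucinated)"
    * confidence → 0.0
    * description is set to a [安全降級] (safety downgrade) note that lists the valid inventory
- A warning is logged as 'openclaw_deployment_hallucination_detected'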
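
The downgrade steps above can be sketched as a standalone function. This is a minimal sketch, not the code in openclaw.py: the `Analysis` dataclass is a hypothetical stand-in for the parsed LLM result (which the diff suggests is a Pydantic model), `validate_deployment` is an invented name, and the description text is an English paraphrase.

```python
import re
from dataclasses import dataclass

# Same pattern as in the commit: "deployment/<name>" or "deployment <name>".
DEPLOY_RE = re.compile(r"deployment[/\s]+([a-z0-9][a-z0-9-]*)")


@dataclass
class Analysis:
    """Hypothetical stand-in for the parsed LLM analysis result."""
    kubectl_command: str
    suggested_action: str
    target_resource: str
    confidence: float
    description: str = ""


def validate_deployment(result: Analysis, inventory_csv: str, namespace: str) -> bool:
    """Downgrade `result` in place if its deployment is not in the inventory.

    Returns True when a downgrade happened, False otherwise.
    """
    names = {n.strip() for n in inventory_csv.split(",") if n.strip()}
    if not names:
        return False  # no inventory, nothing to validate against
    match = DEPLOY_RE.search((result.kubectl_command or "").lower())
    if match is None or match.group(1) in names:
        return False  # no deployment named, or the name is valid
    guess = match.group(1)
    result.kubectl_command = f"kubectl get deploy -n {namespace}"
    result.suggested_action = "NO_ACTION"
    result.target_resource = "unknown(hallucinated)"
    result.confidence = 0.0
    result.description = (
        f"[safety downgrade] deployment '{guess}' is not in the cluster "
        f"inventory ({', '.join(sorted(names))})"
    )
    return True


incident = Analysis(
    kubectl_command="kubectl rollout restart deployment/awoooi-prod",
    suggested_action="RESTART_DEPLOYMENT",
    target_resource="awoooi-prod",
    confidence=0.9,
)
validate_deployment(incident, "awoooi-api, awoooi-web, awoooi-worker", "awoooi-prod")
print(incident.kubectl_command)
```

Replaying the incident's command through this sketch replaces the restart with the read-only `kubectl get deploy` command.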

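Effect: even if the LLM ignores the prompt, the Python layer blocks the command.
When an inventory is available, a destructive kubectl command that targets a deployment missing from it is downgraded instead of executed.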
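
How far this guarantee reaches depends on which command spellings the validator's regex recognizes. The sketch below applies the same pattern to a few command forms; the helper name `extracted_name` and the sample commands other than the incident's are assumptions made for illustration.

```python
import re

# Pattern copied from the validator in this commit.
PATTERN = re.compile(r"deployment[/\s]+([a-z0-9][a-z0-9-]*)")


def extracted_name(cmd: str):
    """Return the deployment name the pattern finds in `cmd`, or None."""
    match = PATTERN.search(cmd.lower())
    return match.group(1) if match else None


COMMANDS = [
    "kubectl rollout restart deployment/awoooi-prod -n awoooi-prod",
    "kubectl scale deployment awoooi-web --replicas=3",
    "kubectl get deployment -n awoooi-prod",
    "kubectl rollout restart deploy/awoooi-prod",
    "kubectl rollout restart deployment.apps/awoooi-prod",
]

for cmd in COMMANDS:
    print(f"{extracted_name(cmd)!s:12} <- {cmd}")
```

The first two forms yield a name that can be checked against the inventory. The last two (`deploy/` and `deployment.apps/`) yield `None`, so a command written that way would pass through this check unvalidated; whether the model ever emits those forms is not something the commit states.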

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
OG T
2026-04-18 23:26:08 +08:00
parent 87d0859a98
commit 7e9448f6d0
2 changed files with 50 additions and 1 deletions


@@ -155,6 +155,15 @@ OPENCLAW_TEST_PROMPT = """你是 AWOOOI AIOps 平台的智慧助手 OpenClaw。
NEMOTRON_SYSTEM_PROMPT = """# OpenClaw Lightweight (Nemo-4B Optimized)
You are an SRE AI. Analyze the alert and respond with ONLY valid JSON.
## 🔒 DEPLOYMENT NAME RULE (STRICTLY ENFORCED)
- `namespace` is NEVER a deployment name.
- "awoooi-prod" is a NAMESPACE, NOT a deployment. NEVER write `deployment/awoooi-prod`.
- When "叢集實際資源清單" is provided, `target_resource` and deployment in
`kubectl_command` MUST match one of those names exactly.
- If alert has `labels.deployment`, prefer it over guessing.
- Unknown target → suggested_action=NO_ACTION, kubectl_command=
"kubectl get deploy -n <namespace>" (investigation only).
## CRITICAL: Read alertname first
The `alertname` field tells you what kind of problem this is. Use it:
- "Disk/Storage/PVC/Volume" → suggested_action=NO_ACTION, kubectl_command="kubectl get pvc" or "kubectl exec <pod> -- df -h"
@@ -162,7 +171,7 @@ The `alertname` field tells you what kind of problem this is. Use it:
- "CrashLoop/OOM/Pod" → suggested_action=DELETE_POD or RESTART_DEPLOYMENT
- "CPU/Memory/Resource" → suggested_action=TUNE_RESOURCES or SCALE_DEPLOYMENT
- "SSL/Cert" → suggested_action=NO_ACTION
-NEVER use "kubectl rollout restart deployment/awoooi-prod" for database or storage alerts.
+NEVER use "kubectl rollout restart deployment/awoooi-prod" (that is the NAMESPACE, not a deployment).
Make action_title describe the ACTUAL problem (not generic "自動修復 AWOOOI 服務").
## Required JSON Schema:


@@ -1321,6 +1321,46 @@ Trace URL: {signoz_trace_url}
# Parse the result
result = self._parse_analysis_result(raw_response)
# 2026-04-18 ogt + Claude Opus 4.7: detect and downgrade hallucinated deployment names (Checkpoint-3)
# Root cause: even with the inventory in the prompt, NEMOTRON still uses the namespace
#   "awoooi-prod" as the deployment name
#   → at execution: kubectl rollout restart deployment/awoooi-prod → "not found"
# Fix: after the LLM responds, Python checks that the deployment named in kubectl_command is in the inventory
#   If not → downgrade to NO_ACTION and switch to kubectl get deploy (non-destructive, investigation only)
if result and _k8s_inventory:
    _inventory_names = {n.strip() for n in _k8s_inventory.split(",") if n.strip()}
    _kcmd = (result.kubectl_command or "").lower()
    import re as _re
    _m = _re.search(r"deployment[/\s]+([a-z0-9][a-z0-9-]*)", _kcmd)
    if _m:
        _deploy_guess = _m.group(1)
        if _deploy_guess not in _inventory_names:
            logger.warning(
                "openclaw_deployment_hallucination_detected",
                hallucinated=_deploy_guess,
                inventory=sorted(_inventory_names),
                original_kubectl_cmd=result.kubectl_command,
                original_action=result.suggested_action.value if hasattr(result.suggested_action, 'value') else str(result.suggested_action),
            )
            # Downgrade to a safe investigation command; do not run anything destructive
            result.kubectl_command = f"kubectl get deploy -n {_k8s_ns}"
            result.target_resource = "unknown(hallucinated)"
            # Pydantic enum handling: skip if the enum cannot be imported or assigned
            try:
                from src.models.openclaw_schema import SuggestedAction as _SA
                result.suggested_action = _SA.NO_ACTION
            except Exception:
                pass
            # Operator-facing message, kept in Chinese: "[safety downgrade] the deployment the LLM
            # suggested is not in the cluster inventory (...). Downgraded to an investigation-only
            # action; please confirm the affected resource manually."
            result.description = (
                f"[安全降級] 原 LLM 建議的 deployment '{_deploy_guess}' 不在叢集 inventory "
                f"({', '.join(sorted(_inventory_names))})。"
                f"已降級為純調查動作,請手動確認實際問題資源。"
            )
            # Reset confidence to zero
            try:
                result.confidence = 0.0
            except Exception:
                pass
if result:
    logger.info(
        "openclaw_analysis_complete",