Your Name
|
54a4e59af9
|
fix(auto-approve): 主機告警 SSH 診斷指令豁免 bad_target 驗證 — 修復 no_executable_action
根因:host_resource_alert 規則使用 {host}(由 instance label 派生),
與 {target} 無關;但 host 告警缺少 K8s deployment label 導致 target=unknown,
_is_bad_target=True → kubectl_command 被清空 → auto_approve 以
no_executable_action 拒絕 → 每日 3 次人工攔截。
修復:
- alert_rule_engine.py: SSH 指令(startswith "ssh ")跳過 bad_target 驗證
- prompts.py: 主 + Nemo prompt 補 Host* 告警 SSH 診斷規則,防 LLM fallback 路徑輸出 kubectl
- ssh_command_whitelist.py: 新建唯讀 SSH 指令白名單模組(供 _ssh_execute() 執行前驗證)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-05-04 14:15:05 +08:00 |
|
Your Name
|
bb5f16f8ef
|
fix(aiops-p2): P2.1 LLM品質三修 — Evidence-First + consensus confidence + raw_evidence注入
CD Pipeline / build-and-deploy (push) Has been cancelled
根因:
- consensus_engine 四 ExpertAgent confidence=0.0 → 加權投票 total=0 → 永遠返回 NO_ACTION
- prompts.py 無 Evidence-First 指令 → LLM 靠記憶推理,無真實環境約束
- openclaw.py analyze_alert 建 prompt 未注入 MCP evidence (diagnosis_context)
修復:
- consensus_engine: SRE/Security/Cost/Performance 依訊號強度設 0.45~0.80 confidence
- consensus_engine: _normalize_action 加「重新啟動」別名 → RESTART
- consensus_engine: SecurityAgent 移除未使用的 _target 變數
- prompts.py: 加 Evidence-First Protocol + Skepticism Rules 區塊
- openclaw.py: analyze_alert 提取 diagnosis_context → <raw_evidence> 注入 full_prompt
驗證: consensus score 從 0.0 → 0.744(CrashLoop 測試案例)
P2.1 fix 2026-04-24 ogt + Claude Sonnet 4.6
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-24 15:52:25 +08:00 |
|
OG T
|
7e9448f6d0
|
fix(openclaw): 幻覺 deployment 名雙層防禦 — Prompt + Python validator
CD Pipeline / build-and-deploy (push) Has been cancelled
2026-04-18 晚(台北時區)— ogt + Claude Opus 4.7 (1M)
生產事件 (approval f763bedf, 22:58):
- Alert: KubePodCrashLooping, labels.deployment="awoooi-api"
- NEMOTRON 雖收 inventory "awoooi-api, awoooi-web, awoooi-worker"
仍輸出 kubectl_command="kubectl rollout restart deployment/awoooi-prod"
(把 namespace 誤當 deployment 名)
- 執行結果: "Deployment 'awoooi-prod' not found in namespace 'awoooi-prod'"
## Layer 1: NEMOTRON_SYSTEM_PROMPT 強化 (prompts.py)
新增「🔒 DEPLOYMENT NAME RULE (STRICTLY ENFORCED)」區塊:
- namespace NEVER is a deployment name
- "awoooi-prod" 是 NAMESPACE,不可寫 deployment/awoooi-prod
- 若有 inventory,deployment 必須 exact match
- 優先用 labels.deployment,unknown → NO_ACTION
## Layer 2: Python 後驗證 (openclaw.py:1322+)
LLM 回應解析後 regex 抽出 deployment 名,對照 _k8s_inventory:
- 在清單內 → 通過
- 不在清單內 → 降級:
* kubectl_command → "kubectl get deploy -n {ns}"(純調查)
* suggested_action → NO_ACTION
* target_resource → "unknown(hallucinated)"
* confidence → 0.0
* description 加註 [安全降級] 並列出合法 inventory
- log 'openclaw_deployment_hallucination_detected' 記錄
效果: 就算 LLM 無視 prompt,Python 層也會擋下。
破壞性 kubectl 絕不執行於不存在的 deployment。
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-04-18 23:26:09 +08:00 |
|
OG T
|
604d8eea37
|
fix(schema-drift): 補齊 prompts.py + Claude API schema enum 同步 (ADR-090)
CD Pipeline / build-and-deploy (push) Successful in 12m27s
問題: fe77e6d 擴充了 models/ai.py enum 至 8 值,但兩個地方未同步:
1. core/prompts.py L77: 缺 INVESTIGATE、OBSERVE
2. core/prompts.py L176 (NEMOTRON_SYSTEM_PROMPT): 缺 APPLY_HPA、INVESTIGATE、OBSERVE
3. openclaw.py L564 (_call_claude tools schema): 舊 4 值 enum 約束
影響: LLM 不知道可以輸出 INVESTIGATE/OBSERVE,只能選舊 4 值
修復: 三處統一對齊 8 個 suggested_action 值
RESTART_DEPLOYMENT|DELETE_POD|SCALE_DEPLOYMENT|APPLY_HPA|TUNE_RESOURCES|INVESTIGATE|OBSERVE|NO_ACTION
Closes: ADR-090 Prompt-Model 三層同步鐵律
2026-04-17 ogt + Claude Sonnet 4.6
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-17 22:10:18 +08:00 |
|
OG T
|
a258d87767
|
fix(webhooks+prompts): 修復 LLM 對所有告警一律輸出「重啟 AWOOOI 服務」的根本問題
CD Pipeline / build-and-deploy (push) Has been cancelled
根因 (INC-20260416-C365D0 postgres 磁碟告警事故):
1. alert_context 中 alertname 埋在 labels 深處,LLM 看到 alert_type="custom" → 不知道是什麼告警
2. 快取鍵用 alert_type:target_resource → 不同 alertname 共用同一快取 → 全部回傳第一個 LLM 結果
3. 系統 Prompt 無 alert-category 指導 → LLM 永遠輸出 kubectl rollout restart
修復:
- webhooks.py: alert_context 置頂加入 alertname + alert_category + annotations
- openclaw.py: 快取鍵改用 alertname:target_resource(告警名稱才是主要識別符)
- prompts.py: OPENCLAW_SYSTEM_PROMPT + NEMOTRON_SYSTEM_PROMPT 加入 Alert-Specific Analysis Rules
database/storage 告警 → NO_ACTION + 調查指令;K8s 告警 → 對應重啟指令
禁止對非 K8s 告警輸出 kubectl rollout restart deployment/awoooi-prod
2026-04-16 ogt + Claude Sonnet 4.6(亞太)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-16 19:56:13 +08:00 |
|
OG T
|
eccf61fbc9
|
fix(ai): 修復假信心度 + 解除 Shadow Mode (Phase 22 P1)
CD Pipeline / build-and-deploy (push) Has been cancelled
E2E Health Check / e2e-health (push) Has been cancelled
1. openclaw.py: LLM 截斷時 confidence 0.82→0.0 (禁止偽造信心度)
2. prompts.py: NEMOTRON schema 範例值改用佔位符,防模型照抄 0.75
3. configmap: SHADOW_MODE_ENABLED=false,開放 low 風險自動執行
條件門檻: confidence≥90% + trust_score≥5 + playbook_success≥95%
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-01 15:59:42 +08:00 |
|
OG T
|
0fd53422c6
|
fix(openclaw): NEMOTRON_SYSTEM_PROMPT confidence/reasoning 移至最前
CD Pipeline / build-and-deploy (push) Failing after 5m36s
E2E Health Check / e2e-health (push) Successful in 17s
Nemo-4B 4B 參數模型輸出長度有限,confidence/reasoning 排在 schema 末尾
時常被截斷,導致 openclaw.py:1045 fallback 補 0.82 假數據。
修復:將 confidence 和 reasoning 移至 schema 最前兩個欄位,確保模型
輸出截斷時仍包含最關鍵欄位。同時明確禁止模型抄範例值。
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-01 13:19:18 +08:00 |
|
OG T
|
d03668669b
|
fix(openclaw): optimize for Nemo-4B with lightweight prompt and resilient parsing
E2E Health Check / e2e-health (push) Successful in 26s
|
2026-03-31 15:59:58 +08:00 |
|
OG T
|
fb0ddf305c
|
fix(api): fix dockerfile to include models.json, remove huge prompt example to fit 4K limit
E2E Health Check / e2e-health (push) Successful in 17s
|
2026-03-31 14:03:34 +08:00 |
|
OG T
|
46843c8e19
|
fix(nvidia): revert to nemotron-mini, truncate context for 4K limit, enforce precise confidence
E2E Health Check / e2e-health (push) Successful in 17s
|
2026-03-31 13:57:10 +08:00 |
|
OG T
|
30f045bf28
|
feat: ADR-019 System Prompt 集中管理 + Nightly LLM Workflow
新增:
- docs/adr/ADR-019-system-prompt-management.md - System Prompt 規範
- apps/api/src/core/prompts.py - 集中管理 System Prompts
- .github/workflows/nightly-llm.yaml - 每夜 LLM 迴歸測試
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-26 12:27:47 +08:00 |
|