awoooi

Author	SHA1	Message	Date
OG T	e6e484c1dc	fix(openclaw): correct the import path to src.models.ai (not openclaw_schema). [CI: some checks pending; CD Pipeline / build-and-deploy (push) has started running] This is a real bug the IDE caught (not a false positive): SuggestedAction lives in src/models/ai.py. _SA.NO_ACTION now downgrades correctly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 23:26:45 +08:00
OG T	7e9448f6d0	fix(openclaw): two-layer defense against hallucinated deployment names: prompt + Python validator. [CI: some checks failed; CD Pipeline / build-and-deploy (push) was cancelled] 2026-04-18 evening (Taipei time), ogt + Claude Opus 4.7 (1M). Production incident (approval f763bedf, 22:58): - Alert: KubePodCrashLooping, labels.deployment="awoooi-api" - Although NEMOTRON received the inventory "awoooi-api, awoooi-web, awoooi-worker", it still output kubectl_command="kubectl rollout restart deployment/awoooi-prod" (mistaking the namespace for a deployment name) - Execution result: "Deployment 'awoooi-prod' not found in namespace 'awoooi-prod'" ## Layer 1: NEMOTRON_SYSTEM_PROMPT hardening (prompts.py) Adds a "🔒 DEPLOYMENT NAME RULE (STRICTLY ENFORCED)" block: - namespace NEVER is a deployment name - "awoooi-prod" is a NAMESPACE; never write deployment/awoooi-prod - if an inventory is provided, the deployment must be an exact match - prefer labels.deployment; unknown → NO_ACTION ## Layer 2: Python post-validation (openclaw.py:1322+) After the LLM response is parsed, a regex extracts the deployment name and checks it against _k8s_inventory: - in the list → pass - not in the list → downgrade: * kubectl_command → "kubectl get deploy -n {ns}" (investigation only) * suggested_action → NO_ACTION * target_resource → "unknown(hallucinated)" * confidence → 0.0 * description is tagged [安全降級] (safe downgrade) and lists the valid inventory - logs 'openclaw_deployment_hallucination_detected' Effect: even if the LLM ignores the prompt, the Python layer blocks it. Destructive kubectl commands are never run against a nonexistent deployment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 23:26:09 +08:00
AWOOOI CD	87d0859a98	chore(cd): deploy `6ad73b4` [skip ci]	2026-04-18 12:22:38 +00:00
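OG T	6ad73b4834	fix(flywheel): three fixes for the broken L5/L6 chain: RBAC expansion + persisted failure reasons + verifier runs on failure too. [CI: all checks successful; CD Pipeline / build-and-deploy (push) succeeded in 11m6s] 2026-04-18 evening (Taipei time), ogt + Claude Opus 4.7 (1M). A full flywheel diagnosis exposed 3 real breaks: - L5 execution, 30d: EXECUTION_FAILED 216 / EXECUTION_SUCCESS 2 (99% failure rate) - L6 verification, 7d: verification_result all NULL (none of the 988 evidence records were verified) - all rejection_reason / error_message fields empty (nothing to diagnose from) Root cause: the awoooi-executor ServiceAccount lacked RBAC permissions, so every kubectl get nodes/HPA in executor.py returned Forbidden. Evidence could not even be collected, every subsequent repair failed, the verifier never triggered because no execution succeeded, and the evidence verification result stayed NULL. One RBAC fix resolves all 3 nodes. ## P0.1 RBAC expansion (k8s/awoooi-prod/07-rbac.yaml) Adds cluster-scope read access (list/get/watch only, zero writes): - nodes + nodes/status (required for evidence gathering) - horizontalpodautoscalers (HPA status) - metrics.k8s.io: nodes + pods (resource metrics) - statefulsets + daemonsets (complete workload view) Already applied with kubectl apply and smoke-tested: kubectl get nodes works. ## P0.2 Always write rejection_reason on failure (approval_db.py) update_execution_status() gains an error_message parameter; on failure it is written to rejection_reason (truncated to 2000 characters) so later diagnosis has something to go on. The call site in approval_execution.py is updated to match, passing result.error all the way into the DB. ## P0.3 Verifier also runs on failure (approval_execution.py) Old logic: the verifier was called only when result.success=True, so at a 99% failure rate it never ran. New logic: the failure path also uses create_task to run the verifier, with ":FAILED" appended to action_taken. The verifier captures post_state and writes verification_result='failed' back to incident_evidence. L7 learning now has failure samples to learn from, and the 2x negative decay of playbook trust actually takes effect. Expected effect: - EXECUTION_FAILED rate should drop from 99% to <30% within 30d - incident_evidence.verification_result NULL rate should drop from 100% to <10% - approval_records.rejection_reason fill rate goes from 0% to 100% Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 20:12:57 +08:00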
AWOOOI CD	1dac23fd56	chore(cd): deploy `b0d560d` [skip ci]	2026-04-18 10:21:41 +00:00
OG T	b0d560dbb3	fix(drift-narrator): shortener uses replace to tolerate the LLM hallucinating a 'Resource/Name:' prefix. [CI: all checks successful; CD Pipeline / build-and-deploy (push) succeeded in 10m50s] 2026-04-18 afternoon (Taipei time), ogt + Claude Opus 4.7. Round 4: the LLM itself prepends a resource identifier to the field: 'Deployment/awoooi-web: spec.template.spec.containers', which defeats the startswith-based shortener (the prefix is no longer at the start). Defensive fix: when startswith misses, fall back to replace to strip the prefix at any position. Result: 'Deployment/awoooi-web: containers' ✅ Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 18:12:15 +08:00
AWOOOI CD	c40f3506e3	chore(cd): deploy `b63aed7` [skip ci]	2026-04-18 09:20:51 +00:00
OG T	b63aed72df	fix(drift-narrator): strip the spec.template.spec. prefix to fix ugly Telegram auto-wrapping. [CI: all checks successful; CD Pipeline / build-and-deploy (push) succeeded in 12m1s] 2026-04-18 afternoon (Taipei time), ogt + Claude Opus 4.7. The commander's visual report from round three of live-fire testing: the field name 'spec.template.spec.volumes' is 24 characters, and together with emoji + ': ' + summary it exceeds the visual width of a Telegram <pre> block, so auto-wrapping separates the emoji from the field name and leaves it alone on its own line. Fix: _shorten_field_path() strips 3 common prefixes: - 'spec.template.spec.' → '' - 'spec.template.' → '' (fallback) - 'spec.' → '' (fallback) Before/after: Before: '🟡 spec.template.spec.affinity.podAntiAffinity.preferredDuringS: [list of 3 items]' After: '🟡 affinity.podAntiAffinity.preferredDuringS: [list of 3 items]' Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 17:10:20 +08:00
AWOOOI CD	584831bace	chore(cd): deploy `f3960f3` [skip ci]	2026-04-18 08:39:13 +00:00
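OG T	f3960f36d2	fix(drift-narrator): stronger fallback: label K8s default-value backfill + count actionable items separately. [CI: all checks successful; CD Pipeline / build-and-deploy (push) succeeded in 10m37s] 2026-04-18 afternoon (Taipei time), ogt + Claude Opus 4.7 (1M). The commander's live-fire test report: the card showed "securityContext: (unset) → {object with 0 fields}", which is meaningless. Root cause: _fallback_items treated the noise of the K8s controller auto-filling empty objects as real changes and emitted it. In addition, the "29 more items" figure included allowlisted and trivial items. Fixes (3 items): 1. _is_trivial_drift(), a new predicate: None / empty string / {} / [] / false / 0 and the like are treated as mutually equivalent ("no substantive change"), which catches the K8s controller auto-fill case 2. _summarize_item() replaces the former smart_shorten - trivial → "K8s default backfill (no substantive change)" - None → value → "added xxx" - value → None → "deleted (was: xxx)" - otherwise → "from → to" 3. _fallback_items() improvements - sorted by level (HIGH first) - allowlist + HPA allowlist filtered first 4. _count_nontrivial_drift() + Telegram rendering - adds an "actionable" count (excluding allowlisted + trivial items) - "N more items" uses the actionable count, so it no longer misleads - when items is empty, shows "all allowlisted or default backfill" Expected effect: Before: "... 29 more items" (only 1 was a real drift) Now: "... 0 more items" or "(all allowlisted or default backfill)" Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 16:29:49 +08:00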
OG T	1606093dd2	fix(drift-narrator): two hotfixes: NEMOTRON wrapper parsing + asyncpg type for tags. [CI: some checks failed; CD Pipeline / build-and-deploy (push) was cancelled] 2026-04-18 afternoon (Taipei time), ogt + Claude Opus 4.7 (1M). The live-fire test (report_id=80a34b58) exposed two bugs: ## Bug 1: LLM JSON swallowed by the NEMOTRON wrapper. Root cause: when routed through NEMOTRON, openclaw.call() is forced to return a {description,...} structure, so the {narrative, items} requested by my prompt cannot get through. (Same root cause as the raw-JSON leak hit earlier in `1ff3405`.) Fix: three-path fallback parsing - Path 1: our {narrative, items} directly (Ollama, or an LLM that follows the rules) - Path 2: NEMOTRON wrapper, where description holds nested JSON containing our structure - Path 3: description is plain narrative → use it as narrative + Python fallback_items ## Bug 2: asyncpg DataError on the tags parameter. Root cause: the literal string '{drift,type4d,llm_summary}' was passed, while asyncpg requires a Python list '(a sized iterable container expected (got type str))'. Fix: pass tags as the Python list ['drift','type4d','llm_summary'] and remove CAST AS text[]; asyncpg infers text[] automatically. Live-fire results: - narrative ✅ generated (fallback path) - items ⚠️ only 1 entry (NEMOTRON did not emit our structure) - DB write ❌ wrong tags type - Telegram ✅ sent (fallback content, but visually OK) Expected after this commit: - LLM responses take Path 2/3 → narrative + Python fallback items (5 smart summaries) - DB write succeeds → both automation_operation_log and ai_collaboration_trace have records - if the LLM later learns to take Path 1 (returning our {narrative, items}), it upgrades automatically Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 16:26:17 +08:00
AWOOOI CD	e7bd37a5ac	chore(cd): deploy `a156566` [skip ci]	2026-04-18 08:14:08 +00:00
OG T	a156566b17	feat(drift-narrator): ADR-090-C L4 audit loop closure: persist the notification_formatted op. [CI: some checks failed; CD Pipeline / build-and-deploy (push) succeeded in 10m47s; run-migration / migrate (push) failed after 14s] 2026-04-18 afternoon (Taipei time), ogt + Claude Opus 4.7 (1M). Architectural iron rule enforced: "An AI decision that is not recorded might as well never have happened." Every time drift_narrator calls the LLM to generate a summary, it must be written in full to automation_operation_log + ai_collaboration_trace, forming the L4 audit trail and the RLHF corpus. This commit does two things: 1. apps/api/migrations/adr090c_notification_formatted_op_type.sql - extends the automation_operation_log.operation_type CHECK with 'notification_formatted' - idempotent DROP + ADD CONSTRAINT pattern - already applied to prod as awoooi (the table owner) and verified 2. apps/api/src/services/drift_narrator_service.py - adds _log_ai_action_to_db() for the DB audit write - called at the end of _generate_narrative_and_items() (on both success and fallback) - automation_operation_log: * operation_type='notification_formatted' * actor='drift_narrator' * input = {report_id, namespace, counts, items_scanned} * output = {narrative, items, items_count} * duration_ms, tags=['drift','type4d','llm_summary'] * parent_op_id looks up the alert_fired chain (future drift → alert correlation) - ai_collaboration_trace: * agent='drift_narrator', model=provider (ollama / nemotron / etc.) * prompt (capped at 8000 characters) + response (JSONB) * accepted = flag for successful LLM JSON parsing (a future gold mine of RLHF training data) - Error handling: the DB write is wrapped in try/except and never breaks the main Telegram notification flow. P2.4 event correlation: - SELECT the parent op via input->>'report_id' or 'drift_report_id' - if found, bind parent_op_id (forming an alert_fired → notification_formatted trace chain) - drift itself does not currently go through alert_fired, so parent is NULL (until the chain is wired up) P2.5 RLHF corpus: - ai_collaboration_trace records with accepted=true are the "LLM parsed successfully" samples - in future, when the commander presses [✅ Accept change] / [⏪ Roll back] in Telegram, the corresponding trace can also update an outcome flag, forming a complete human-in-the-loop corpus. Technical details: - get_db_context() auto-commits (src/db/base.py:128), so no manual commit is needed - prompt is at most 8000 characters (a typical drift is about 2-3k) - the first 500 characters of raw_response are kept in the trace.response JSON. Related: - feedback_ai_autonomous_direction.md L4 north star - feedback_secrets_leak_incidents_2026-04-18.md L1-L4 layering - ADR-090 11 "neural network" tables - commit fb88512 (Plan B visual layer) The IDE may report that src.db.base cannot be found; that is a false positive (drift_repository.py uses the same path). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 16:04:23 +08:00
AWOOOI CD	4f70da027e	chore(cd): deploy `fb88512` [skip ci]	2026-04-18 08:03:46 +00:00
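OG T	fb88512fcb	fix(drift-narrator): Plan B, LLM-driven smart summary: eliminate the brute-force str()[:30] truncation. [CI: some checks failed; CD Pipeline / build-and-deploy (push) was cancelled] 2026-04-18 afternoon (Taipei time), ogt + Claude Opus 4.7 (1M). Root cause: _format_drift_summary() called str()[:30] directly on dict/list-typed git_value/actual_value, producing garbled output such as "[{'name': 'repair-ssh-key', 's" with a dict key cut in half, which squarely violates the "AI autonomy" principle. Plan B architecture decision: "Drop the hardcoded Python string-parsing logic. Feed the raw Config Diff structure directly to Hermes/NemoTron as context, use the prompt to prescribe the output format, and let the LLM digest it and emit a human-readable Top 5 summary with red/yellow severity markers." Implementation: 1. _NARRATIVE_PROMPT rewritten: requires the LLM to return {narrative, items[]} JSON - drift items are fed into the prompt JSON-serialized (keeping 200 characters of context) - items capped at 5, HIGH first - summary is 30 characters of colloquial Traditional Chinese (not a technical repr) 2. _generate_narrative_and_items(), new method: parses the LLM JSON and validates its structure 3. _format_drift_for_llm(), new method: structured JSON for the LLM (replaces the old str version) 4. _render_telegram_body(), new method: assembles a clean Telegram card. Sample output: 🤖 AI assessment <4-5 lines of LLM narrative> 📊 Drift details (HIGH: 1 | MEDIUM: 29) 🔴 spec.template.spec.volumes: added 2 repair-ssh-key mounts 🟡 spec.template.spec.serviceAccount: (unset) → awoooi-executor ... 27 more items (press 🔍 View Diff) 5. Stronger fallback: _smart_shorten() + _fallback_items(); when the LLM fails, a type-aware Python summary is used (dict/list show their size, no brute-force repr). Removed: - _format_drift_summary(), the old brute-force truncation - _generate_narrative(), the old interface that returned only a string. Kept: - _fallback_narrative() / _format_intent_summary(), still useful - Redis cache / trigger conditions / DB update, logic unchanged. MVP stage: this commit changes only the visual presentation and does not touch the automation_operation_log / ai_collaboration_trace audit writes. DB auditing will be added in Phase 2 once the Telegram visuals are verified. Related: - feedback_ai_autonomous_direction.md north-star principle - `1ff3405`, this morning's raw-JSON leak hotfix (fixed narrative only, not items) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 15:54:16 +08:00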
AWOOOI CD	7d342e3f3e	chore(cd): deploy `7542e6e` [skip ci]	2026-04-18 07:36:38 +00:00
OG T	7542e6e570	feat(cd): ADR-090-B CD injects 13 keys L2→L3, eliminating the K8s single-point blind spot. [CI: all checks successful; CD Pipeline / build-and-deploy (push) succeeded in 11m38s] 2026-04-18 afternoon (Taipei time), ogt + Claude Opus 4.7 (1M). Background: the memories feedback_secrets_leak_incidents + reference_secrets_architecture_v2 define the L1-L4 layered architecture. An inventory found 14 K8s secret keys that exist only in L3 (K8s etcd) with no L2 (Gitea Secret) backup, so an etcd failure or an accidental secret deletion would lose them permanently. This commit adds automatic L2→L3 CD injection for 13 keys (SMTP_USER/SMTP_PASSWORD are still CHANGE_ME and are skipped): DATABASE_URL / MIGRATION_DATABASE_URL (new in ADR-090-B) REDIS_URL / JWT_SECRET / JWT_ALGORITHM WEBHOOK_HMAC_SECRET (already in L2 but not referenced by CD) SENTRY_DSN / CLAUDE_API_KEY GITEA_API_TOKEN (via the AWOOOI_GITEA_API_TOKEN prefix to work around a Gitea reserved word) NEMOTRON_BOT_TOKEN / OPENCLAW_BOT_TOKEN SMTP_HOST / SRE_GROUP_CHAT_ID Pattern: follows the existing `Inject K8s Secrets` step in cd.yaml exactly: env: reference + if [ -n ] guard + kubectl patch json op=add + base64 -w 0 + echo of the result. 110 lines added, 0 deleted, YAML syntax validated. Safety: the Gitea Secret values are synced from the existing K8s secret (kept consistent), so this CD run is a no-op patch. A future accidental deletion or rebuild of the K8s secret can be restored from Gitea in one step. Related: - docs/superpowers/specs/2026-04-18-blindspot-governance-capacity-l4.md - docs/adr/ADR-090-monitoring-blindspot-governance.md - apps/api/migrations/adr090b_awoooi_migrator_role.sql Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 15:26:28 +08:00
AWOOOI CD	6768a375bd	chore(cd): deploy `2d43751` [skip ci]	2026-04-18 05:34:11 +00:00
OG T	2d43751729	feat(ops): ADR-090-B zero-trust wrap-up templates: wrapper / sudoers / migrator / CI. [CI: some checks failed; CD Pipeline / build-and-deploy (push) succeeded in 12m17s; run-migration / migrate (push) failed after 14s] 2026-04-18 (Taipei time), ogt + Claude Opus 4.7 (1M). This commit responds to the two credential-leak incidents in this session (feedback_secrets_leak_incidents_2026-04-18.md) by delivering zero-trust infrastructure templates the commander can deploy directly. Files: 1. scripts/host-ops/awoooi-hosts-add.sh - /etc/hosts allowlist wrapper for host 110 - allows only predefined hostnames, idempotent, with IP format validation - install: /usr/local/bin/awoooi-hosts-add (root:root 0755) 2. scripts/host-ops/awoooi-wrapper.sudoers - matching sudoers rules (NOPASSWD for wrapper + SIGHUP only) - install: /etc/sudoers.d/awoooi-wrapper (root:root 0440) - forbids generic shell access such as tee / bash / sh 3. apps/api/migrations/adr090b_awoooi_migrator_role.sql - restricted PG role awoooi_migrator - DDL only (CREATE/ALTER/DROP/INDEX/COMMENT) - explicitly REVOKEs all DML and locks down default privileges - this file is run by the commander (requires superuser), not by Claude 4. k8s/awoooi-prod/awoooi-migrator-secret.template.yaml - K8s Secret patch template - adds the MIGRATION_DATABASE_URL key (awoooi_migrator connection string) - kept separate from the application DATABASE_URL 5. .gitea/workflows/run-migration.yml - CI applies new migrations automatically (single transaction + ON_ERROR_STOP) - uses the Gitea secret MIGRATION_DATABASE_URL, never plaintext - writes one asset_discovery_run record per success (audit trail) Three zero-trust lines of defense (matching feedback_secrets_leak_incidents): L1 no passwords in conversation -> allowlist built into the wrapper L2 operations go through the wrapper -> sudoers + awoooi_migrator L3 output is always masked -> CI uses secrets, not env. All 3 credential leaks found in this session are logged in the feedback_secrets_leak memory, each with a corresponding P0 rotation plan. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 13:23:39 +08:00
OG T	5ae82d1d1f	feat(db): ADR-090 L4 AIOps foundation: asset inventory × 7-item automation coverage matrix persisted to the DB. [CI: some checks failed; CD Pipeline / build-and-deploy (push) was cancelled] 2026-04-18 afternoon (Taipei time), ogt + Claude Opus 4.7 (1M). The RCA of the MoWoooWorkDown false alarm exposed three structural failures: - hosts 110/188 at load 18/16 for 13 days / cadvisor 288% / K3s 120/121 unmonitored - Prometheus has only 35 targets / 58 rules (under 30% coverage) - HostHighCpuLoad measures the wrong dimension (CPU idle vs load_avg) The commander's strategic directives: - full asset view × seven automations × persistence in the DB - four-way AI division of labor (OpenClaw × NemoTron × Hermes × Claude LLM) - every automated operation's history must go into the DB, not into Markdown (Markdown drifts) This commit delivers: 1. SQL migration (apps/api/migrations/adr090_asset_inventory_foundation.sql) - 11 tables + 33 indexes + 20 CHECK + 3 UNIQUE + 16 FK - pgcrypto extension dependency - fully idempotent (CREATE IF NOT EXISTS + single transaction) - already applied to awoooi_prod (PG on 188) and verified 2. ADR-090 (docs/adr/ADR-090-monitoring-blindspot-governance.md) - decision record + mapping to the 7 engines + 4 rejected alternatives 3. Main strategy document (docs/superpowers/specs/2026-04-18-blindspot-governance-capacity-l4.md) - §0-§14: background / root cause / schema DDL / 4 layers of defense / 7-phase rollout / HARD_RULES / AI division-of-labor matrix / acceptance metrics / tech debt / rollback / handover protocol 4. MASTER §8 Living Changelog gains the Phase 7 kickoff entry. The 11 tables: asset_inventory / asset_discovery_run / asset_coverage_snapshot / asset_relationship / alert_rule_catalog / asset_change_event / asset_compliance_snapshot / host_capacity_snapshot / capacity_violation_event / automation_operation_log / ai_collaboration_trace The first bootstrap record has been seeded into asset_discovery_run (run_id=6760c5bf-57e5-4a40-b82d-31b794464652) Related memories (not committed, stored under ~/.claude/...): - project_blindspot_governance.md (cross-session pointer) - feedback_monitor_self_monitoring.md (monitoring tools must themselves be monitored) - feedback_secrets_leak_incidents_2026-04-18.md (three lines of defense against credential leaks) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 13:18:46 +08:00
OG T	fb1d101902	fix(backup): root-cause fix for HostBackupFailed P1: Prometheus textfile metric + reading via the docker socket. Problem 1: the backup_110_last_success_timestamp metric never existed. Root cause: the script only wrote a plain-text last_success file and never emitted .prom format. Fix: on success, write /home/ollama/node_exporter_textfiles/backup.prom; node_exporter adds --collector.textfile.directory=/textfile_collector with volume /home/ollama/node_exporter_textfiles:/textfile_collector. Problem 2: Harbor/Gitea rsync permission denied. Root cause: /var/lib/docker/volumes/ is 710 root:root, so the docker group cannot access the filesystem path directly. Fix: switch to docker run --rm -v <volume>:/source alpine tar czf - to read the volume contents through the docker socket (wooo is already in the docker group) and then extract them. Verification: all three backup script items are OK, node_exporter 9100/metrics exposes the metric correctly, and Prometheus absent(backup_110_last_success_timestamp) should clear after the next scrape. 2026-04-18 ogt + Claude Sonnet 4.6 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-18 10:37:23 +08:00
AWOOOI CD	d23343ac69	chore(cd): deploy `1ff3405` [skip ci]	2026-04-17 17:17:58 +00:00
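OG T	1ff3405755	fix(drift-narrator): fix the raw-JSON leak by parsing the description field from the NEMOTRON response. [CI: all checks successful; CD Pipeline / build-and-deploy (push) succeeded in 10m44s] Root cause: after routing through NEMOTRON, openclaw.call() is forced to output JSON (an iron rule of NEMOTRON_SYSTEM_PROMPT), but _generate_narrative expects plain text, so the whole JSON blob was dumped raw into the Telegram <pre> block. Fix: after receiving text, first try to parse it as JSON - success → take description / action_title / reasoning in that priority order - failure (not JSON) → use the text as-is (backward compatible with plain-text responses from Ollama qwen). Effect: the Telegram Config Drift card shows a plain-language Traditional Chinese summary instead of raw JSON. 2026-04-17 ogt + Claude Sonnet 4.6 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-18 01:08:32 +08:00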
AWOOOI CD	1de72fffe5	chore(cd): deploy `4f2e122` [skip ci]	2026-04-17 17:03:41 +00:00
OG T	4f2e122fd2	fix(openclaw): Checkpoint-2 webhook path K8s inventory injection, preventing NemoTron from hallucinating awoooi-service. [CI: all checks successful; CD Pipeline / build-and-deploy (push) succeeded in 11m39s] Root cause: on the webhook path (analyze_alert) NemoTron has no cluster context → blindly guesses deployment/awoooi-service → kubectl not found → EXECUTION_FAILED → trust score stuck at 0 forever. Fix: - analyze_alert() Step 0.5: call _fetch_k8s_inventory_for_openclaw() to fetch the real Deployment list - inject a "🔒 actual cluster resource inventory" section into full_prompt, forcing the LLM to pick resource names from the list - failure/timeout → returns an empty string → a warning hint is injected and the main flow continues - the available_len calculation now includes the k8s_section length to avoid 4K truncation. Impact: - the Solver Agent path (solver_agent.py) was already fixed in `cf50a5c` - this commit fixes the Alertmanager webhook path (analyze_alert → NemoTron) - both paths are now K8s-aware, and the LLM no longer hallucinates resource names. ADR-082: Phase 2 multi-agent collaboration 2026-04-17 ogt + Claude Sonnet 4.6 (Checkpoint-2 webhook path completion) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-18 00:53:27 +08:00
AWOOOI CD	0bde389323	chore(cd): deploy `cf50a5c` [skip ci]	2026-04-17 15:17:51 +00:00
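OG T	cf50a5ce25	fix(solver+execution): Checkpoint-1 false-success fix + Checkpoint-2 K8s environment awareness. [CI: all checks successful; CD Pipeline / build-and-deploy (push) succeeded in 10m55s] ## Checkpoint-1: root-cause fix for false success - approval_execution.py: execute_approved_action now returns bool (it used to return None, so the caller could not tell whether K8s accepted the command) - decision_manager.py auto-execute path: uses _exec_success instead of the hardcoded success=True. Fix: when K8s rejects the command, ❌ is sent correctly instead of "✅ auto-remediation complete" ## Checkpoint-2: K8s environment awareness (inventory pre-flight) - solver_agent.py: adds _fetch_k8s_inventory(), which runs kubectl get deployments,statefulsets -n awoooi-prod before generating kubectl commands and injects the list of real names into the Solver prompt; the LLM must choose from the list, which prevents hallucinations (awooiii-api with three i's) - on a 5s timeout or failure → returns "", and the prompt shows a warning without interrupting the main flow Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 23:08:23 +08:00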
AWOOOI CD	bf835e51ac	chore(cd): deploy `cbb719b` [skip ci]	2026-04-17 14:54:34 +00:00
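OG T	cbb719b4a1	fix(decision_manager): ADR-091 hotfix for the zombie-gate logic hole from d5dbfc9. [CI: all checks successful; CD Pipeline / build-and-deploy (push) succeeded in 11m9s] The gate condition `not action.strip()` introduced in d5dbfc9 evaluates to False when action="待分析" ("pending analysis", a non-empty string), so the gate failed and zombie cards were still broadcast. Root cause: the c759b4e P1 fix made suggested_action fall back to "待分析" instead of "", rendering the original empty-string check useless. Fix: switch to a set membership test `_action_text in {"", "待分析", "NO_ACTION", "待分析 - 系統自動保護"}`, which covers all known failure-state tokens and fully closes the zombie-card broadcast path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 22:44:53 +08:00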
AWOOOI CD	3c56f02954	chore(cd): deploy `af2adb5` [skip ci]	2026-04-17 14:36:03 +00:00
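OG T	af2adb5b96	fix(telegram): ADR-091 forbid broadcasting "待分析" (pending analysis) zombie cards when Agent Debate analysis fails. [CI: all checks successful; CD Pipeline / build-and-deploy (push) succeeded in 10m51s] Root cause: GET /incidents triggers Phase 2 Agent Debate → all LLMs fail → description="待分析" + action="" → a new Telegram card is broadcast every few minutes → alert fatigue (the deadliest killer for SREs). Architectural flaw (anti-pattern): a GET request (a read operation) produces an outbound broadcast side effect → violates RESTful principles. Fix (_push_decision_to_telegram): add a gate after the DB update completes and before the Telegram push: description="待分析" AND action="" → exit silently, never broadcast. ADR-091 iron rule: only an Alertmanager Webhook POST (a real new alert) may trigger a Telegram broadcast; a failed Agent Debate analysis → silent DB update, no channel pollution. 2026-04-17 ogt + Claude Sonnet 4.6 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 22:26:35 +08:00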
AWOOOI CD	f7edae78fb	chore(cd): deploy `604d8ee` [skip ci]	2026-04-17 14:21:29 +00:00
OG T	6c10c6db86	chore(types): sync auto-generated shared-types. [CI: all checks successful; Type Sync Check / check-type-sync (push) succeeded in 1m14s] Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 22:12:16 +08:00
OG T	604d8eea37	fix(schema-drift): bring prompts.py + the Claude API schema enum into sync (ADR-090). [CI: all checks successful; CD Pipeline / build-and-deploy (push) succeeded in 12m27s] Problem: `fe77e6d` expanded the models/ai.py enum to 8 values, but two places were not synced: 1. core/prompts.py L77: missing INVESTIGATE, OBSERVE 2. core/prompts.py L176 (NEMOTRON_SYSTEM_PROMPT): missing APPLY_HPA, INVESTIGATE, OBSERVE 3. openclaw.py L564 (_call_claude tools schema): constrained by the old 4-value enum. Impact: the LLM did not know it could output INVESTIGATE/OBSERVE and could only choose among the old 4 values. Fix: align all three places on the 8 suggested_action values RESTART_DEPLOYMENT|DELETE_POD|SCALE_DEPLOYMENT|APPLY_HPA|TUNE_RESOURCES|INVESTIGATE|OBSERVE|NO_ACTION Closes: ADR-090 Prompt-Model three-layer sync iron rule 2026-04-17 ogt + Claude Sonnet 4.6 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 22:10:18 +08:00
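OG T	e4bc3ec0ee	docs(hard-rules): Prompt-Model sync iron rule: ban on LLM schema drift. Lesson learned the hard way (2026-04-17): SuggestedAction enum out of sync between prompt and model → NemoTron outputs investigate → Pydantic blows up → system-wide fallback to "待分析" (pending analysis). New mandatory rules: - changes to prompts.py must be mirrored in models/ai.py - any Model that receives LLM JSON must have a validator + fallback - no silent death (the specific failing field must be logged) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 21:48:50 +08:00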
AWOOOI CD	8e43d52afb	chore(cd): deploy `fe77e6d` [skip ci]	2026-04-17 13:45:54 +00:00
OG T	fe77e6d297	fix(ai): SuggestedAction enum expansion + Pydantic fallback guard. [CI: some checks failed; CD Pipeline / build-and-deploy (push) succeeded in 10m48s; Type Sync Check / check-type-sync (push) failed after 2m52s] Root cause: NemoTron outputs "investigate" → Pydantic accepts only 4 values → blows up → openclaw_analysis_parse_failed → analysis_result=None → every fallback card shows "待分析" (pending analysis). Fix: 1. The SuggestedAction enum adds INVESTIGATE/OBSERVE/APPLY_HPA/TUNE_RESOURCES (prompt.py listed 6 values while the enum had only 4; the prompt/model mismatch is the root cause) 2. normalize_suggested_action validator: uppercase + alias mapping + unknown values fall back to NO_ACTION, ensuring no LLM output can make Pydantic blow up and leave analysis_result = None Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 21:36:36 +08:00
AWOOOI CD	5d715e16ee	chore(cd): deploy `c759b4e` [skip ci]	2026-04-17 08:38:18 +00:00
OG T	c759b4eeab	fix(webhook+decision): ADR-089 async webhook + fix for dirty data on timeout. [CI: all checks successful; CD Pipeline / build-and-deploy (push) succeeded in 10m16s] P0, Webhook async (ADR-089): - responds 202 immediately when an Alertmanager alert arrives, no longer waiting synchronously up to 90s for the LLM - adds _process_new_alert_background(): LLM analysis / Approval / Incident / Telegram all move to the background - root-cause fix for the Alertmanager fallback storm (timeout → resend → exponential-backoff storm) P1, dirty data on timeout (decision_manager): - _package_to_proposal_data: blocked_reason is barred from desc_parts (kept out of cards) - _push_decision_to_telegram: the suggested_action fallback is changed to "待分析" (pending analysis), and description is barred from flowing in Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 16:29:24 +08:00
AWOOOI CD	f2ac5d01c6	chore(cd): deploy `9d6aa7e` [skip ci]	2026-04-17 08:24:05 +00:00
OG T	9d6aa7ea45	feat(trust): ADR-088 Trust Score persistence, the core of L4 auto-approval. [CI: all checks successful; CD Pipeline / build-and-deploy (push) succeeded in 10m40s] TrustScoreManager is upgraded from in-memory to PostgreSQL persistence; trust scores no longer reset to zero after a Pod restart, so the AI can genuinely accumulate up to the L4 auto-approval threshold. Changes: - migrations/adr088_trust_score_persistence.sql: trust_records table - db/models.py: TrustRecordDB ORM model - repositories/interfaces.py: ITrustRepository Protocol - repositories/trust_repository.py: PG upsert ON CONFLICT DO UPDATE - services/trust_engine.py: bulk_load() startup warm-up - services/learning_service.py: _persist_trust() + 2 call sites - main.py: load_all() → bulk_load() at startup. Flow: 5 approvals → score=5 written to DB → Pod restart → warm-up reads it back → evaluate_adjusted_risk MEDIUM→LOW → auto-execute 2026-04-17 ogt + Claude Sonnet 4.6 (APAC): ADR-088 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 16:14:44 +08:00
AWOOOI CD	148d59a0e4	chore(cd): deploy `1ae9e9f` [skip ci]	2026-04-17 07:32:22 +00:00
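OG T	ba8cf6105d	docs(adr): ADR-086 Telegram UI sanitization spec + ADR-087 AutoApprove kubectl gate. ADR-086: Telegram notification card UI sanitization spec - design decisions for _parse_debate_summary() and the per-TYPE field sanitization rules - TYPE-3 keyboard refactor: Approve/Reject always on the first row - tech debt: hoist _parse_debate_summary to module level (P1-1) ADR-087: AutoApprove security hardening, kubectl enforcement gate - condition 1d design: _raw_action semantics + NO_EXECUTABLE_ACTION reason - kubectl validation for the Solver Nemo format - downgrade commands changed to real read-only kubectl investigation - rationale recorded for keeping min_trust_score=0 (TrustEngine in-memory persistence tech debt) - P0-2 risk recorded: kubectl exec is not yet in _DESTRUCTIVE_PATTERNS 2026-04-17 ogt + Claude Sonnet 4.6 (APAC): session tech-debt cleanup Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 15:25:34 +08:00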
OG T	1ae9e9f389	fix(code-review): P0-1 action fallback semantics fix + P1-2 reason enum + P2-2 secops sanitization. [CI: all checks successful; CD Pipeline / build-and-deploy (push) succeeded in 10m7s] Code review findings (2026-04-17, chief architect review): P0-1 auto_approve.py condition 1d semantics fix: - Before: the kubectl check used the `action` variable (already fallen back = action or kubectl_command) → action="" + kubectl_command="kubectl get pods" → action="kubectl get pods" → 1d passes → _kubectl_cmd and action hold the same value (the same source checked twice), masking the case where action itself is natural language - After: use the raw value proposal_data.get("action", "") (_raw_action) → checks the action field itself, with clear semantics P1-2 auto_approve.py adds NO_EXECUTABLE_ACTION: - new AutoApproveReason.NO_EXECUTABLE_ACTION enum value - condition 1d now uses this reason (the former NO_PLAYBOOK means "no matching Playbook" and does not fit this case) - avoids polluting the root-cause classification in the KM flywheel learning data (ADR-068) P2-2 decision_manager.py secops branch: - threat_behavior now uses _parse_debate_summary → takes the diagnosis field - consistent with the BUG-A/BUG-C fixes; no longer dumps the first 150 characters of the full debate_summary ADR-082: Phase 2 multi-agent collaboration 2026-04-17 ogt + Claude Sonnet 4.6 (APAC): post-code-review fixes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 15:23:35 +08:00
AWOOOI CD	b80836329e	chore(cd): deploy `93205ce` [skip ci]	2026-04-17 06:58:39 +00:00
OG T	93205ceab0	fix(auto_approve+solver): P1 kubectl gate + P2 enforce kubectl on the Nemo path. [CI: all checks successful; CD Pipeline / build-and-deploy (push) succeeded in 9m56s] P1 security hole (auto_approve.py): - adds condition 1d: action must contain the kubectl keyword to be auto-executed - Solver output via the OpenClaw Nemo path is natural language → passes condition 1c but cannot be executed - fix: natural-language action → downgrade to manual review (NO_PLAYBOOK reason) P2 execution blocker (solver_agent.py): - Nemo format path: action_title without kubectl → return [] → triggers _degraded_plan - _default_action_for_category: old natural language → real kubectl investigation commands - the downgrade path now emits read-only commands such as kubectl get/top/exec, which auto_approve 1d can evaluate correctly ADR-082: Phase 2 multi-agent collaboration 2026-04-17 ogt + Claude Sonnet 4.6 (APAC): P1+P2 hotfix Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 14:49:53 +08:00
OG T	f421e652d3	fix(telegram): BUG-C TYPE-3 layout sanitization + Approve/Reject always pinned on top (ADR-075 UI third wave of fixes). [CI: some checks failed; CD Pipeline / build-and-deploy (push) was cancelled] Checkpoint 1, decision_manager.py TYPE-3 root_cause sanitization: - Before: root_cause=_smt(reasoning, 500) → the full debate_summary (diagnosis/plan/review/challenge) was dumped into the AI diagnosis field - After: _parse_debate_summary takes only the diagnosis field + _smt truncates to 300 characters - removes the _requires_human variable (no longer used) Checkpoint 2, telegram_gateway.py _build_inline_keyboard button order refactor: - Before: K8s category buttons on top, Approve/Reject controlled by requires_human_approval → dead card - After: [✅ Approve][❌ Reject] always on the first row, K8s/DB/Host action buttons after - removes the requires_human_approval parameter (logic simplified to unconditional pinning) Scope of changes: decision_manager.py else routing section + _build_inline_keyboard + send_approval_card signature; telegram_gateway.py templates/message formats unchanged. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 14:42:29 +08:00
AWOOOI CD	682f974a37	chore(cd): deploy `418d735` [skip ci]	2026-04-17 06:23:07 +00:00
OG T	418d73540b	fix(telegram): BUG-A TYPE-1 + BUG-B TYPE-4D data preprocessing (ADR-075 UI second wave of fixes). [CI: all checks successful; CD Pipeline / build-and-deploy (push) succeeded in 10m25s] BUG-A (TYPE-1 informational notification): - Before: message=reasoning[:200] → the full debate_summary was dumped (diagnosis/plan/review/challenge all appear together) - After: _parse_debate_summary(reasoning) takes only the diagnosis field + _smt truncates to 200 characters BUG-B (TYPE-4D Config Drift): - Before: diff_summary=description[:500] → the LLM's raw JSON output was shown directly in the <pre> block - After: JSON Catcher: if json.loads(description) succeeds, format it as "📝 Suggested action / 📖 Explanation / ⏪ Rollback plan"; on failure (JSONDecodeError/TypeError/AttributeError) → degrade gracefully to plain-text truncation. Only the routing preparation section of decision_manager.py is modified; the telegram_gateway.py template layer is unchanged. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 14:14:10 +08:00
AWOOOI CD	f677b72114	chore(cd): deploy `6baa2e9` [skip ci]	2026-04-17 06:07:05 +00:00
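
Commit 7e9448f6d0 describes its "Layer 2" check only in prose. The following is a minimal sketch of that idea, not the code in openclaw.py: the function name, the regex, and the dict-shaped analysis are assumptions made for illustration, and only the downgrade values are taken from the commit message.

```python
import re

# Hypothetical pattern: captures the name in "deployment/<name>" or "deployment <name>".
_DEPLOY_RE = re.compile(r"\bdeploy(?:ment)?s?[/ ]([a-z0-9][a-z0-9.-]*)")


def validate_deployment(analysis: dict, inventory: set[str], namespace: str) -> dict:
    """Return `analysis` untouched if its kubectl target is known, else a downgraded copy."""
    match = _DEPLOY_RE.search(analysis.get("kubectl_command", ""))
    # Assumption: an empty inventory means the lookup failed, so nothing is checked.
    if match is None or not inventory or match.group(1) in inventory:
        return analysis
    valid = ", ".join(sorted(inventory))
    return {
        **analysis,
        # Downgrade values as listed in the commit message.
        "kubectl_command": f"kubectl get deploy -n {namespace}",
        "suggested_action": "NO_ACTION",
        "target_resource": "unknown(hallucinated)",
        "confidence": 0.0,
        "description": f"[safe downgrade] unknown deployment '{match.group(1)}'; valid: {valid}",
    }
```

With the inventory from the incident (`awoooi-api`, `awoooi-web`, `awoooi-worker`), a command targeting `deployment/awoooi-prod` is rewritten to the read-only `kubectl get deploy -n awoooi-prod`, while a command targeting `deployment/awoooi-api` passes through unchanged.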

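Commits fe77e6d297 and 604d8eea37 describe a `normalize_suggested_action` validator (uppercase, alias mapping, unknown values fall back to NO_ACTION) over an 8-value enum. A rough standard-library sketch of that behavior follows; the real validator is attached to a Pydantic model, and the alias table here is purely hypothetical.

```python
from enum import Enum


class SuggestedAction(str, Enum):
    # The 8 values listed in commit 604d8eea37.
    RESTART_DEPLOYMENT = "RESTART_DEPLOYMENT"
    DELETE_POD = "DELETE_POD"
    SCALE_DEPLOYMENT = "SCALE_DEPLOYMENT"
    APPLY_HPA = "APPLY_HPA"
    TUNE_RESOURCES = "TUNE_RESOURCES"
    INVESTIGATE = "INVESTIGATE"
    OBSERVE = "OBSERVE"
    NO_ACTION = "NO_ACTION"


# Hypothetical aliases; the commit does not list the real mapping.
_ALIASES = {"RESTART": "RESTART_DEPLOYMENT", "SCALE": "SCALE_DEPLOYMENT", "NONE": "NO_ACTION"}


def normalize_suggested_action(raw: object) -> SuggestedAction:
    """Map arbitrary LLM output to a SuggestedAction without ever raising."""
    if not isinstance(raw, str):
        return SuggestedAction.NO_ACTION
    key = raw.strip().upper().replace("-", "_").replace(" ", "_")
    key = _ALIASES.get(key, key)
    try:
        return SuggestedAction(key)
    except ValueError:
        # Unknown value: fall back rather than fail the whole parse.
        return SuggestedAction.NO_ACTION
```

Under this sketch the lowercase `"investigate"` from the incident normalizes to `SuggestedAction.INVESTIGATE`, and anything unrecognized becomes `NO_ACTION` rather than discarding the entire analysis result.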

1558 Commits