fix(drift-narrator): Plan B, LLM-driven smart summaries; eliminate brute-force str()[:30] truncation

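2026-04-18 afternoon (Taipei time), ogt + Claude Opus 4.7 (1M)

Root cause: _format_drift_summary() called str()[:30] directly on dict/list-typed git_value/actual_value. This brute-force truncation produced garbage such as "[{'name': 'repair-ssh-key', 's", cutting a dict key in half, which runs directly against the "AI autonomy" principle.

Plan B architecture decision:
"Drop the hard-coded Python string-parsing logic. Pass the raw Config Diff structure directly as context to Hermes/NemoTron, use the prompt to prescribe the output format, and let the LLM digest it and produce a human-readable Top 5 summary with red/yellow light markers."

Implementation:
1. _NARRATIVE_PROMPT rewritten: requires the LLM to return {narrative, items[]} JSON
   - drift items are JSON-serialized into the prompt (keeping 200 characters of context per value)
   - items capped at 5, HIGH first
   - summary: 30 characters of colloquial Traditional Chinese (not a technical repr)
2. _generate_narrative_and_items() (new method): parses the LLM JSON and validates its structure
3. _format_drift_for_llm() (new method): structured JSON for the LLM (replaces the old str version)
4. _render_telegram_body() (new method): assembles a clean Telegram card
   Sample output:
     🤖 AI 研判
     <4-5 line narrative from the LLM>

     📊 漂移明細 (HIGH: 1 | MEDIUM: 29)
     🔴 spec.template.spec.volumes: 新增 2 項 repair-ssh-key 掛載
     🟡 spec.template.spec.serviceAccount: (未設) → awoooi-executor
     ... 還有 27 項 (按 🔍 查看 Diff)
5. Fallback hardened: _smart_shorten() + _fallback_items()
   When the LLM fails, a type-aware Python summary is used (dict/list show their size, no brute-force repr)

Removed:
- _format_drift_summary(): the old brute-force truncation implementation
- _generate_narrative(): the old interface that returned only a string

Kept:
- _fallback_narrative() / _format_intent_summary(): still useful
- Redis cache / trigger conditions / DB update: logic unchanged

MVP stage: this commit changes only the visual presentation and does not touch the automation_operation_log / ai_collaboration_trace audit writes. DB auditing will be added in Phase 2 once the Telegram rendering has been verified.

Related:
- feedback_ai_autonomous_direction.md North Star principle
- 1ff3405: this morning's hotfix for raw JSON leaking into the output (fixed only narrative, not items)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>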
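
The parse-and-validate step that _generate_narrative_and_items() performs on the LLM reply could be sketched on its own roughly as follows. This is an illustration under assumptions, not the code in this commit: strip_code_fence and parse_llm_reply are hypothetical helper names, the 60/80 character caps mirror the values in the diff, and normalizing unknown levels to "medium" is an extra assumption.

```python
import json
import re

# Matches a reply wrapped in a Markdown code fence, with an optional language tag.
_FENCE_RE = re.compile(r"^```[A-Za-z0-9_-]*\s*\n(.*?)\n?```\s*$", re.DOTALL)


def strip_code_fence(raw: str) -> str:
    """Return the payload inside a code fence, or the trimmed text if unfenced."""
    raw = raw.strip()
    match = _FENCE_RE.match(raw)
    return match.group(1).strip() if match else raw


def parse_llm_reply(raw: str, max_items: int = 5):
    """Return (narrative, items), or None when the reply is unusable.

    A None result is the signal for the caller to use its Python fallback.
    """
    try:
        parsed = json.loads(strip_code_fence(raw))
    except (json.JSONDecodeError, ValueError):
        return None
    if not isinstance(parsed, dict):
        return None

    narrative = str(parsed.get("narrative", "")).strip()
    items = parsed.get("items", [])
    if not narrative or not isinstance(items, list):
        return None

    clean = []
    for it in items[:max_items]:
        # Keep only entries that carry both a field path and a summary.
        if isinstance(it, dict) and it.get("field") and it.get("summary"):
            level = str(it.get("level", "medium")).lower()
            clean.append({
                # Assumption: anything other than high/medium is treated as medium.
                "level": level if level in ("high", "medium") else "medium",
                "field": str(it["field"])[:60],
                "summary": str(it["summary"])[:80],
            })
    return (narrative, clean) if clean else None
```

Returning None for every unusable shape (invalid JSON, missing narrative, no valid items) keeps the caller's fallback decision in one place.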
2026-04-18 15:53:28 +08:00
parent 7d342e3f3e
commit fb88512fcb
1 changed files with 153 additions and 58 deletions
--- a/apps/api/src/services/drift_narrator_service.py
+++ b/apps/api/src/services/drift_narrator_service.py
@@ -53,22 +53,37 @@ TRIGGER_MEDIUM_MIN = 3
 # ============================================================
 # Prompt
 # ============================================================
-_NARRATIVE_PROMPT = """你是 AWOOOI SRE 維運助理，請將以下 K8s 配置漂移報告轉為繁體中文人話。
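+# 2026-04-18 ogt + Claude Opus 4.7: Plan B, LLM-driven smart summaries (replaces Python str()[:30] truncation)
+# Architecture rule: drop hard-coded Python string parsing; feed the structured diff straight to the LLM, which produces the Traditional Chinese Top 5 summary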
+_NARRATIVE_PROMPT = """你是 AWOOOI SRE 維運助理。以下是 K8s Config Drift 報告的原始結構化資料。

-## 漂移摘要
-{drift_summary}
+## 漂移項目原始資料（JSON）
+{drift_items_json}

 ## 意圖分析
 {intent_summary}

-## 要求
- 繁體中文，5 行以內
- 第 1 行：說明漂移了哪些資源（resource name）
- 第 2 行：說明嚴重程度和數量
- 第 3 行：最可能的原因（引用意圖分析）
- 第 4 行：建議的運維動作（rollback 或 adopt）
- 避免技術術語，用平實口語
- 只輸出摘要文字，不要標題或 markdown
+## 輸出規格（必須是合法 JSON，不得有任何前後文字）
+{{
+  "narrative": "4-5 行繁體中文敘述,說明漂移了哪些資源/嚴重程度/可能原因/建議動作",
+  "items": [
+    {{
+      "level": "high 或 medium",
+      "field": "簡化後的欄位路徑 (40 字內)",
+      "summary": "30 字內繁體中文口語摘要,說明從什麼變成什麼"
+    }}
+  ]
+}}
+
+## 規則
+- 繁體中文
+- items 最多挑 5 筆最重要的（HIGH 優先）
+- summary 要讓非技術人員看懂「改了什麼」,例如:
+  - "新增 repair-ssh-key secret 掛載"（而非 repr 一長串）
+  - "(未設) → awoooi-executor"
+  - "新增 pod anti-affinity 規則"
+- 禁止 markdown、反引號、emoji
+- 只輸出純 JSON,不要包在 code block 裡
 """


@@ -110,8 +125,9 @@ class DriftNarratorService:
            logger.debug("drift_narrator_cache_hit", report_id=report.report_id)
            return

-        narrative = await self._generate_narrative(report, interpretation)
-        await self._send_telegram(report, narrative)
+        # 2026-04-18 Plan B: the LLM produces both the narrative and structured items (replaces str()[:30])
+        narrative, items = await self._generate_narrative_and_items(report, interpretation)
+        await self._send_telegram(report, narrative, items)

        # Write narrative_text to the DB (Phase 30 ADR-067)
        try:
@@ -142,68 +158,115 @@ class DriftNarratorService:
        medium = sum(1 for i in non_hpa_items if i.drift_level.value == "medium")
        return high >= TRIGGER_HIGH_MIN or medium >= TRIGGER_MEDIUM_MIN

-    async def _generate_narrative(
+    async def _generate_narrative_and_items(
        self,
        report: "DriftReport",
        interpretation: "DriftInterpretation | None",
-    ) -> str:
-        """Call Ollama qwen2.5:7b-instruct to generate the summary"""
-        drift_summary = self._format_drift_summary(report)
+    ) -> tuple[str, list[dict]]:
+        """
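+        2026-04-18 ogt + Claude Opus 4.7: Plan B, the LLM produces the narrative plus structured items
+
+        Returns (narrative, items):
+          narrative: 4-5 lines of Traditional Chinese prose
+          items: [{level, field, summary}, ...], at most 5 entries
+
+        If the LLM fails, falls back to Python smart shortening (not the brute-force str()[:30] cut)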
+        """
+        import json as _json
+
+        drift_items_json = self._format_drift_for_llm(report)
        intent_summary = self._format_intent_summary(interpretation)

        prompt = _NARRATIVE_PROMPT.format(
-            drift_summary=drift_summary,
+            drift_items_json=drift_items_json,
            intent_summary=intent_summary,
        )

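-        # 2026-04-17 ogt + Claude Sonnet 4.6: use the OpenClaw AI Router instead of calling Ollama directly via httpx
-        # Root cause: calling 192.168.0.111:11434 directly bypassed the AI Router, so there was no fallback → "All connection attempts failed"
-        # Fix: route everything through openclaw.call(), which provides provider degradation and fallback automatically
-        # Same fix as in drift_interpreter.py (d952435)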
        try:
            openclaw = get_openclaw()
            text, _provider, success = await openclaw.call(prompt)

            if success and text and text.strip():
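-                # 2026-04-17 ogt + Claude Sonnet 4.6: fix raw JSON leaking into the output
-                # Root cause: openclaw.call(), once routed through NEMOTRON, is forced to return JSON (required by NEMOTRON_SYSTEM_PROMPT),
-                #       but plain-text prose is needed here → the JSON was dumped straight into the Telegram <pre> block
-                # Fix: try to parse the JSON and prefer description; otherwise treat the text as plain prose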
-                import json as _json
                _raw = text.strip()
+                # Try to strip a Markdown code fence (removeprefix drops only a literal "json" tag)
+                if _raw.startswith("```"):
+                    _raw = _raw.strip("`").removeprefix("json").strip()
                try:
                    _parsed = _json.loads(_raw)
                    if isinstance(_parsed, dict):
-                        narrative = (
-                            _parsed.get("description")
-                            or _parsed.get("action_title")
-                            or _parsed.get("reasoning")
-                            or _raw
-                        )
-                        return str(narrative).strip()
-                except (_json.JSONDecodeError, ValueError):
-                    pass
-                return _raw
+                        narrative = str(_parsed.get("narrative", "")).strip()
+                        items = _parsed.get("items", [])
+                        if isinstance(items, list) and narrative:
+                            # Validate each item's structure
+                            clean_items = []
+                            for it in items[:5]:
+                                if isinstance(it, dict) and it.get("field") and it.get("summary"):
+                                    clean_items.append({
+                                        "level": it.get("level", "medium"),
+                                        "field": str(it["field"])[:60],
+                                        "summary": str(it["summary"])[:80],
+                                    })
+                            if clean_items:
+                                return narrative, clean_items
+                except (_json.JSONDecodeError, ValueError) as e:
+                    logger.warning("drift_narrator_json_parse_fail", err=str(e), raw_prefix=_raw[:80])

            logger.warning("drift_narrator_openclaw_failed", provider=_provider)

        except Exception as e:
            logger.warning("drift_narrator_llm_error", error=str(e))

-        # Fallback: structured text summary
-        return self._fallback_narrative(report, interpretation)
+        # Fallback: Python smart shortening (not str()[:30])
+        return self._fallback_narrative(report, interpretation), self._fallback_items(report)

-    def _format_drift_summary(self, report: "DriftReport") -> str:
-        lines = []
-        for item in report.items[:8]:
+    def _format_drift_for_llm(self, report: "DriftReport") -> str:
+        """
+        2026-04-18 ogt + Claude Opus 4.7: Plan B, JSON serialization fed to the LLM
+        Keeps more of the raw context for the LLM to reason over; no brute-force 30-character truncation
+        """
+        import json as _json
+        items_for_llm = []
+        for item in report.items[:12]:
            if item.is_allowlisted or item.field_path in _HPA_ALLOWLIST_PATHS:
                continue
-            lines.append(
-                f"- [{item.drift_level.value}] {item.resource_kind}/{item.resource_name}: "
-                f"{item.field_path} "
-                f"(Git: {str(item.git_value)[:30]} → K8s: {str(item.actual_value)[:30]})"
-            )
-        return "\n".join(lines) if lines else "（均為白名單欄位）"
+            items_for_llm.append({
+                "level": item.drift_level.value,
+                "resource": f"{item.resource_kind}/{item.resource_name}",
+                "field": item.field_path,
+                "git_value": str(item.git_value)[:200] if item.git_value is not None else None,
+                "actual_value": str(item.actual_value)[:200] if item.actual_value is not None else None,
+            })
+        return _json.dumps(items_for_llm, ensure_ascii=False, indent=2)
+
+    def _smart_shorten(self, val) -> str:
+        """Type-safe summary: dict/list show their size, long strings keep their first 37 characters, None becomes (未設)"""
+        if val is None:
+            return "(未設)"
+        s = str(val)
+        # Heuristic: check whether the string form looks like a list or a dict
+        if s.startswith("[") and s.endswith("]"):
+            return f"[清單 {s.count(',')+1 if s != '[]' else 0} 項]"
+        if s.startswith("{") and s.endswith("}"):
+            # Rough estimate of the field count
+            return f"{{物件 {s.count(':')} 欄位}}"
+        if len(s) > 40:
+            return s[:37] + "..."
+        return s
+
+    def _fallback_items(self, report: "DriftReport") -> list[dict]:
+        """Python smart summary used when the LLM fails (replaces the old str()[:30])"""
+        items = []
+        for item in report.items[:5]:
+            if item.is_allowlisted or item.field_path in _HPA_ALLOWLIST_PATHS:
+                continue
+            from_val = self._smart_shorten(item.git_value)
+            to_val = self._smart_shorten(item.actual_value)
+            items.append({
+                "level": item.drift_level.value,
+                "field": item.field_path[:60],
+                "summary": f"{from_val} → {to_val}",
+            })
+        return items

    def _format_intent_summary(self, interpretation: "DriftInterpretation | None") -> str:
        if not interpretation:
@@ -234,21 +297,21 @@ class DriftNarratorService:
            f"建議：確認是否需要 rollback 回 Git 狀態。"
        )

-    async def _send_telegram(self, report: "DriftReport", narrative: str) -> None:
+    async def _send_telegram(
+        self,
+        report: "DriftReport",
+        narrative: str,
+        items: list[dict],
+    ) -> None:
        """
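-        Push the TYPE-4D Config Drift card (ADR-075)
+        Push the TYPE-4D Config Drift card (ADR-075) with the Plan B smart summary

-        Uses send_drift_card() in place of the old send_notification() to present a structured layout with action buttons.
-        diff_summary = AI assessment (narrative) + drift details (first 8 entries)
-        approval_id / incident_id both use report_id (no ApprovalRequest needs to be created)
+        2026-04-18 ogt + Claude Opus 4.7: switched to the LLM-generated structured items,
+        replacing the garbage produced by brute-force str()[:30] truncation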
        """
        from src.services.telegram_gateway import get_telegram_gateway

-        diff_summary = (
-            f"🤖 AI 研判\n{narrative}\n\n"
-            f"漂移明細（HIGH: {report.high_count} | MEDIUM: {report.medium_count}）\n"
-            f"{self._format_drift_summary(report)}"
-        )
+        diff_summary = self._render_telegram_body(report, narrative, items)

        try:
            tg = get_telegram_gateway()
@@ -262,6 +325,38 @@ class DriftNarratorService:
        except Exception as e:
            logger.warning("drift_narrator_telegram_error", error=str(e))

+    def _render_telegram_body(
+        self,
+        report: "DriftReport",
+        narrative: str,
+        items: list[dict],
+    ) -> str:
+        """
+        Assemble the Telegram card body (Plan B format)
+
+        Sample output:
+          🤖 AI 研判
+          volumes 與 affinity 被手動修改...
+
+          📊 漂移明細 (HIGH: 1 | MEDIUM: 29)
+          🔴 spec.template.spec.volumes: 新增 2 項 repair-ssh-key 掛載
+          🟡 spec.template.spec.serviceAccount: (未設) → awoooi-executor
+          🟡 spec.template.spec.affinity.podAntiAffinity: 新增 preferred 規則
+          ... 還有 27 項
+        """
+        lines = [f"🤖 AI 研判\n{narrative}\n"]
+        lines.append(f"📊 漂移明細 (HIGH: {report.high_count} | MEDIUM: {report.medium_count})")
+        for it in items:
+            emoji = "🔴" if it.get("level") == "high" else "🟡"
+            lines.append(f"{emoji} {it['field']}: {it['summary']}")
+
+        total = report.high_count + report.medium_count
+        shown = len(items)
+        if total > shown:
+            lines.append(f"... 還有 {total - shown} 項 (按 🔍 查看 Diff)")
+
+        return "\n".join(lines)
+

 # ============================================================
 # Singleton