Professionalize resource pressure alert semantics

2026-05-19 22:40:19 +08:00
parent 0fc96837f4
commit 2e5ff091fa
3 changed files with 52 additions and 8 deletions
--- a/docs/AI_INTELLIGENCE_MODULE_SOT.md
+++ b/docs/AI_INTELLIGENCE_MODULE_SOT.md
@@ -111,7 +111,7 @@ SQL funnel (~300 rows)
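 - ElephantAlpha uses the NVIDIA NIM hosted API; the production default model is `nvidia/llama-3.3-nemotron-super-49b-v1.5`, and `ELEPHANT_ALPHA_FALLBACK_MODELS` must keep at least one callable fallback; on 403/404, 408/409/425/429, 5xx, timeout, or connection error, the next model must be tried.
 - ElephantAlpha L3 HITL may only send escalation alerts that are evidence-backed, reviewable, and actionable; when a price trigger has no concrete Hermes threat, only suppressed escalation telemetry and cooldown are recorded, no pending `human_review` is written, and no empty Telegram alert is sent.
 - For ElephantAlpha price triggers, the HITL / decision prefetch must first use DB evidence from the triggering SQL and `competitor_prices` / `price_records` to generate SKUs, MOMO / PChome price gaps, and suggested action lines; the full Hermes LLM prefetch is disabled by default (`ELEPHANT_ALPHA_HERMES_LLM_PREFETCH_ENABLED=false`) so that a 5s timeout does not fall through to an evidence-free summary or a cloud fallback. Without DB evidence, only suppressed telemetry / cooldown is recorded and no empty Telegram alert is sent.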
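-- `resource_optimization` is no longer handed to the LLM to generate a 「預期效益 / 已執行」 ("expected benefit / executed") narrative. This trigger must first have the program measure the `action_plans` backlog, P1/P2 counts, pending_review, overdue items, and CPU load; a Telegram 「資源壓力告警」 (resource pressure alert) is sent only when CPU reaches the threshold, P1/P2 items are backlogged, or overdue items are backlogged. A large queue with normal CPU only records telemetry, dispatches no Hermes/NemoTron work, and claims no 48-hour benefit.
+- `resource_optimization` is no longer handed to the LLM to generate a 「預期效益 / 已執行」 ("expected benefit / executed") narrative, and its display name is standardized as 「資源壓力治理」 (resource pressure governance). This trigger must first have the program measure the `action_plans` backlog, P1/P2 counts, pending_review, overdue items, and CPU load; a Telegram 「資源壓力告警」 (resource pressure alert) is sent only when CPU reaches the threshold, P1/P2 items are backlogged, or overdue items are backlogged. A large queue with normal CPU only records telemetry, dispatches no Hermes/NemoTron work, and claims no 48-hour benefit; the Telegram section is titled 「系統處置紀錄」 (system handling record) rather than the generic 「已執行」 (executed), to avoid implying that the AI has completed unverified external actions.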
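 - `resource_optimization` first runs `ActionPlanHygieneService` to clean up stale noise: it closes only advisory action_plans of the `code_review_fix` / `openclaw_recommendation` kinds older than 72 hours, plus old NemoTron `direct_response/reply_simple` chat-reply plans; it sets their status to `auto_disabled` or `rejected` and writes `metadata_json.hygiene_history`. It deletes no data and does not touch NemoTron human_review / pricing / tool action business actions.
 - `momo-scheduler` runs `run_action_plan_hygiene_task()` every 6 hours, so closing stale advisory action_plans no longer depends on a `resource_optimization` alert firing; a schedule failure emits `action_plan_hygiene_failure` through EventRouter.
 - `action_plans` producers must deduplicate: Code Review does not recreate a plan when the same file already has an active `code_review_fix`; OpenClaw recommendation writes a text fingerprint and skips identical recommendations; AIOrchestrator no longer stores NemoTron `direct_response/reply_simple` chat replies as action plans, and only NemoTron actions that genuinely need a tool, review, or execution may enter the queue.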
@@ -339,6 +339,7 @@ LEFT JOIN competitor_prices cp
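 - Dashboard, AI pick, Hermes, Excel export, daily/growth charts, and the competitor PPT must treat `competitor_prices + competitor_price_history + competitor_match_attempts` as the sole short-term production source of truth, and consume only matches verified by the `identity_v2` matcher; legacy caches that rely on `match_score` alone must not feed decisions or presentations directly.
 - `pchome_matches` and live `pchome_batch()` are kept for legacy compatibility only and must not be the primary source for new presentations or AI decisions.
 - `services/competitor_intel_repository.py` is the shared query entry point for downstream pages, charts, and presentations; new consumers must not each hard-code a different match threshold.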
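+- The competitor PPT must not output matched rows only, which creates a false impression of coverage; `fetch_competitor_comparison_results()` must use `LEFT JOIN valid_competitor` to keep high-revenue / high-price MOMO products that have no valid match yet, and expose `match_status`, `candidate_count`, `best_match_score`, and `match_diagnostic`, so that presentations and AI copy clearly distinguish 「高信心比對」 (high-confidence match) from 「待補身份/價格」 (identity/price pending).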
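 - `services/competitor_identity_revalidator.py` can re-run `identity_v2` offline on existing legacy `competitor_prices` rows: the `identity_v2` / `legacy_revalidated` tags are added only when the new matcher score is `>= 0.76` and there is no hard veto; `expires_at` is not refreshed by default, so expired prices do not enter decisions.
 - Dashboard must split 「待比對」 (pending comparison) into diagnosable states: `價格過期待刷新` (price expired, awaiting refresh), `舊版配對待重驗` (legacy match awaiting revalidation), `低分配對待審` (low-score match awaiting review), `身份否決` (identity vetoed), `找不到同款` (no identical product found), `抓取異常` (fetch error), `尚未搜尋` (not yet searched). A single 「待比對」 must no longer mask the data-quality cause.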

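Note on the `LEFT JOIN valid_competitor` requirement in the docs hunk above: the point is that unmatched MOMO products stay in the result with a diagnostic status instead of silently disappearing. The following is only a minimal sketch of that idea using an in-memory SQLite database. The table layout, the `momo_products` table, the `'matched'` / `'pending'` status values, and the helper names `build_demo_db` and `fetch_comparison_rows` are all assumptions made for illustration; they are not the real schema or the real `fetch_competitor_comparison_results()`. The `0.76` cut-off is borrowed from the revalidator rule in the same doc and may not be what the real query uses.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory dataset (hypothetical schema, illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE momo_products (sku TEXT PRIMARY KEY, revenue REAL);
        CREATE TABLE competitor_prices (
            sku TEXT, competitor_price REAL, match_score REAL, tags TEXT
        );
        INSERT INTO momo_products VALUES ('A1', 90000), ('B2', 120000);
        INSERT INTO competitor_prices VALUES
            ('A1', 499.0, 0.91, 'identity_v2'),
            ('B2', 450.0, 0.40, '');
        """
    )
    return conn


QUERY = """
WITH valid_competitor AS (
    -- Only matches verified by identity_v2 (threshold is an assumption).
    SELECT sku, competitor_price
    FROM competitor_prices
    WHERE tags LIKE '%identity_v2%' AND match_score >= 0.76
),
candidates AS (
    -- Every attempted match, valid or not, for diagnostics.
    SELECT sku, COUNT(*) AS candidate_count, MAX(match_score) AS best_match_score
    FROM competitor_prices
    GROUP BY sku
)
SELECT p.sku,
       vc.competitor_price,
       CASE WHEN vc.sku IS NULL THEN 'pending' ELSE 'matched' END AS match_status,
       COALESCE(c.candidate_count, 0) AS candidate_count,
       c.best_match_score
FROM momo_products AS p
LEFT JOIN valid_competitor AS vc ON vc.sku = p.sku
LEFT JOIN candidates AS c ON c.sku = p.sku
ORDER BY p.revenue DESC
"""


def fetch_comparison_rows(conn: sqlite3.Connection) -> list:
    """Return one row per MOMO product, including products without a valid match."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in fetch_comparison_rows(build_demo_db()):
        print(row)
```

With an `INNER JOIN`, the higher-revenue product `B2` would drop out of the result and coverage would look complete; with the `LEFT JOIN` it remains, marked as pending, together with its candidate count and best score.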
--- a/services/elephant_alpha_autonomous_engine.py
+++ b/services/elephant_alpha_autonomous_engine.py
@@ -87,7 +87,7 @@ _TRIGGER_ZH = {
    "price_drop_alert": "價格下滑警報",
    "market_opportunity": "市場機會偵測",
    "threat_escalation": "威脅升級通報",
-    "resource_optimization": "資源調配優化",
+    "resource_optimization": "資源壓力治理",
    "code_exception": "程式碼異常偵測",
    "weekly_insight": "全景電商洞察分析",
 }
@@ -1087,21 +1087,21 @@ class ElephantAlphaAutonomousEngine:
            if metrics.get("load_pressure")
            else "主機 CPU 未達高負載門檻，這不是主機資源耗盡，而是工作隊列/人工審核積壓。"
        )
-        executed = [
+        handling_notes = [
            f"已寫入 ai_insights(resource_pressure) #{insight_id}。"
            if insight_id
            else "ai_insights(resource_pressure) 寫入未取得 id；請查看 scheduler log。"
        ]
        if new_limit != previous_limit:
-            executed.append(f"已將 ElephantAlpha 自主決策上限由 {previous_limit} 調整為 {new_limit} 次/小時。")
+            handling_notes.append(f"已將 ElephantAlpha 自主決策上限由 {previous_limit} 調整為 {new_limit} 次/小時。")
        if hygiene_count > 0:
            by_source = hygiene.get("by_source") or {}
            source_text = "、".join(f"{key} {value}" for key, value in by_source.items()) or f"{hygiene_count} 筆"
-            executed.insert(
+            handling_notes.insert(
                1,
                f"已自動關閉過期 action_plans {hygiene_count} 筆（{source_text}）；只改 status/metadata，不刪除資料。",
            )
-        executed.append("未執行外部修復、未啟動 Hermes/NemoTron 價格分析、未宣稱效益預測。")
+        handling_notes.append("邊界：未執行外部修復、未啟動 Hermes/NemoTron 價格分析、未宣稱效益預測。")

        lines = [
            "<b>Elephant Alpha · 資源壓力告警</b>",
@@ -1130,8 +1130,8 @@ class ElephantAlphaAutonomousEngine:
            f"• {escape(load_judgement)}",
            "• 這則告警只採用 action_plans 與 CPU 實測值，不採用 LLM 生成的 48 小時效益預測。",
            "",
-            "<b>已執行</b>",
-            *[f"• {escape(item)}" for item in executed],
+            "<b>系統處置紀錄</b>",
+            *[f"• {escape(item)}" for item in handling_notes],
            "",
            "<b>建議下一步</b>",
            *[f"• {escape(item)}" for item in ElephantAlphaAutonomousEngine._build_resource_pressure_actions(metrics)],
@@ -1456,6 +1456,12 @@ class ElephantAlphaAutonomousEngine:
        decision: StrategicDecision,
        trigger: AutonomousTrigger,
    ) -> None:
+        if trigger.trigger_type == "resource_optimization":
+            self._log.warning(
+                "Suppressed legacy autonomous execution Telegram template for resource_optimization; "
+                "resource pressure alerts must use the measurement-based template."
+            )
+            return
        try:
            from services.telegram_templates import _send_telegram_raw

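Note on the measurement-based template that this guard protects: the docs in this commit state that the Telegram alert fires only when CPU reaches a threshold, P1/P2 items are backlogged, or overdue items are backlogged, and that a large queue with normal CPU only records telemetry. A minimal sketch of that gating rule is shown below. The `PressureMetrics` dataclass, the function name `should_send_pressure_alert`, and every default threshold are hypothetical and chosen for illustration; the engine's actual metric keys and thresholds are not shown in this diff.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PressureMetrics:
    """Hypothetical container for measured values; field names are assumptions."""

    backlog: int         # total open action_plans
    p1_p2_count: int     # open P1/P2 items
    pending_review: int  # items waiting for human review
    overdue_count: int   # items past their deadline
    cpu_load: float      # normalized CPU load, 0.0 to 1.0


def should_send_pressure_alert(
    metrics: PressureMetrics,
    cpu_threshold: float = 0.8,   # assumed value
    p1_p2_threshold: int = 1,     # assumed value
    overdue_threshold: int = 1,   # assumed value
) -> tuple[bool, list[str]]:
    """Decide from measurements alone whether to alert, and report why."""
    reasons: list[str] = []
    if metrics.cpu_load >= cpu_threshold:
        reasons.append("cpu_load")
    if metrics.p1_p2_count >= p1_p2_threshold:
        reasons.append("p1_p2_backlog")
    if metrics.overdue_count >= overdue_threshold:
        reasons.append("overdue_backlog")
    # Note: backlog size and pending_review alone never trigger an alert.
    return bool(reasons), reasons


if __name__ == "__main__":
    big_queue_only = PressureMetrics(
        backlog=500, p1_p2_count=0, pending_review=40, overdue_count=0, cpu_load=0.2
    )
    print(should_send_pressure_alert(big_queue_only))
```

Returning the list of reasons alongside the boolean keeps the decision auditable: the alert text can cite exactly which measurement crossed its threshold, and the no-alert path can still log the metrics as telemetry.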
--- a/tests/test_elephant_alpha_engine.py
+++ b/tests/test_elephant_alpha_engine.py
@@ -297,6 +297,7 @@ def test_resource_pressure_message_is_measurement_based_not_llm_theatre():
    )

    assert "量測指標" in msg
+    assert "系統處置紀錄" in msg
    assert "主機 CPU 未達高負載門檻" in msg
    assert "未啟動 Hermes/NemoTron 價格分析" in msg
    assert "預期效益" not in msg
@@ -388,3 +389,39 @@ def test_resource_pressure_message_reports_hygiene_result():
    assert "清理前 Action queue：100" in msg
    assert "已自動關閉過期 action_plans 91 筆" in msg
    assert "只改 status/metadata，不刪除資料" in msg
+
+
+def test_resource_optimization_cannot_use_legacy_autonomous_execution_template(monkeypatch):
+    from services.elephant_alpha_autonomous_engine import (
+        AutonomousTrigger,
+        ElephantAlphaAutonomousEngine,
+    )
+    from services.elephant_alpha_orchestrator import StrategicDecision
+
+    engine = ElephantAlphaAutonomousEngine()
+    sent = []
+
+    async def _capture_send(message):
+        sent.append(message)
+
+    monkeypatch.setattr("services.telegram_templates._send_telegram_raw", _capture_send)
+
+    decision = StrategicDecision(
+        priority="medium",
+        agents_required=["elephant_alpha"],
+        reasoning="resource pressure test",
+        expected_outcome="old style outcome must not be sent",
+        confidence=0.8,
+        execution_plan=[{"agent": "Hermes", "action": "generate_resource_optimization_strategy"}],
+        resource_requirements={},
+    )
+    trigger = AutonomousTrigger(
+        trigger_type="resource_optimization",
+        conditions={},
+        threshold=0.6,
+        enabled=True,
+    )
+
+    asyncio.run(engine._notify_telegram_executed(decision, trigger))
+
+    assert sent == []