fix(telegram+approval): TG-1 + AP-1/2/3: 4 Telegram UX fixes

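2026-04-19, early morning (Taipei time), ogt + Claude Opus 4.7 (1M)

## TG-1: add `view` to INFO_ACTIONS in security_interceptor.py
The 'view' button now uses the 2-part read format and no longer mistakenly triggers the 4-part nonce write format.

## AP-1: persist approval_records.telegram_message_id
After telegram_gateway.send_approval_card sends successfully, it runs UPDATE approval_records SET telegram_message_id, telegram_chat_id at the DB layer (not only in Redis), so the original card can still be found after a Pod restart.

## AP-2: edit the original card when an approval finishes executing + KM/Playbook deltas
approval_execution._push_execution_result_to_alert now both replies to the original card and calls editMessageReplyMarkup to remove its buttons (fixes cards stuck at "executing" forever).
- Also counts knowledge_entries created and playbooks updated in the last 2 minutes and appends "📚 KM +N 🎯 Playbook 更新×M" (Playbook updated ×M) to the message
- Success: ✅ 執行成功 (execution succeeded) + action + KM delta
- Failure: ❌ 執行失敗 (execution failed) + reason + KM delta

## AP-3: normalize primary_responsibility to cut the rate of "❓ 未知" (unknown)
openclaw._parse_analysis_result: if the LLM leaves the field empty/None or outside the whitelist (FE/BE/INFRA/DB/COLLAB), force a fallback: INFRA if the payload contains the kubectl keyword, otherwise BE. Previously the code only checked `"primary_responsibility" not in data`, so None or an empty string slipped through.

## Skipped: TG-3 (refactor) + TG-5 (the webhook is a deprecated endpoint; the design uses Long Polling)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>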
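A minimal standalone sketch of the AP-3 normalization rule, assuming the parsed LLM payload is a plain `dict`. `normalize_responsibility` and `VALID_RESPONSIBILITIES` are hypothetical names for illustration, not the actual code in `openclaw._parse_analysis_result`.

```python
VALID_RESPONSIBILITIES = {"FE", "BE", "INFRA", "DB", "COLLAB"}


def normalize_responsibility(data: dict) -> str:
    """Return a whitelisted primary_responsibility for a parsed LLM result.

    Empty, None, or non-whitelisted values fall back to INFRA when the
    payload mentions kubectl anywhere, otherwise to BE.
    """
    current = str(data.get("primary_responsibility") or "").strip().upper()
    if current in VALID_RESPONSIBILITIES:
        return current
    return "INFRA" if "kubectl" in str(data) else "BE"


print(normalize_responsibility({"primary_responsibility": None, "kubectl_command": "kubectl get pods"}))  # INFRA
print(normalize_responsibility({"primary_responsibility": ""}))  # BE
print(normalize_responsibility({"primary_responsibility": " db "}))  # DB
```

Note that `"kubectl" in str(data)` also matches the key name `kubectl_command`, so any payload that carries that key, even with an empty value, falls back to INFRA.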
fix(openclaw): hallucination validation on both paths + extract shared helper
2026-04-19 01:15:58 +08:00 · 2026-04-19 01:11:09 +08:00 · 2026-04-19 01:08:16 +08:00 · 2026-04-19 01:07:13 +08:00 · 2026-04-19 01:06:30 +08:00 · 2026-04-18 16:11:01 +00:00
9 changed files with 770 additions and 48 deletions
--- a/apps/api/migrations/adr090d_kpi_data_sources.sql
+++ b/apps/api/migrations/adr090d_kpi_data_sources.sql
@@ -0,0 +1,149 @@
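+-- ADR-090-D: data sources for the MASTER §7.1 North Star KPIs
+-- Created: 2026-04-18 evening (Taipei time)
+-- Author: ogt + Claude Opus 4.7 (1M)
+--
+-- Background:
+--   Mapping the 15 KPIs in MASTER §7.1 showed that their key data-source tables were never
+--   created, so the following KPIs could never be measured:
+--     #3 fine-tune JSONL /week            → finetune_exports table
+--     #6 Declarative remediation usage    → remediation_events table
+--     #10 notification_outcomes           → notification_outcomes table
+--
+--   This migration adds the 3 missing tables (idempotent).
+--
+-- Corresponding MASTER § metrics:
+--   §3.3 D3 remediation abstraction (Imperative → Declarative)
+--   §3.4 D4 learning depth (Fine-tune)
+--   §3.6 D6 self-governance (notification quality)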
+
+-- ═══════════════════════════════════════════════════════════════════
+-- 1. finetune_exports: Phase 3 fine-tune JSONL export tracking
+-- ═══════════════════════════════════════════════════════════════════
+
+CREATE TABLE IF NOT EXISTS finetune_exports (
+    export_id         BIGSERIAL       PRIMARY KEY,
+    export_type       TEXT            NOT NULL,  -- 'evidence_snapshot' | 'agent_session' | 'decision_outcome'
+    source_table      TEXT,                      -- source table name (incidents / agent_sessions ...)
+    source_ids        TEXT[],                    -- ids of the source records covered
+    file_path         TEXT,                      -- path of the exported JSONL file
+    record_count      INT             NOT NULL DEFAULT 0,
+    size_bytes        BIGINT,
+    checksum_sha256   TEXT,
+    created_at        TIMESTAMPTZ     NOT NULL DEFAULT NOW(),
+    metadata          JSONB           NOT NULL DEFAULT '{}'::jsonb,
+    CONSTRAINT finetune_export_type_valid CHECK (export_type IN (
+        'evidence_snapshot','agent_session','decision_outcome',
+        'incident_rca','playbook_outcome','rlhf_trace'
+    ))
+);
+
+COMMENT ON TABLE finetune_exports IS
+    'ADR-090-D: MASTER §7.1 #3 fine-tune JSONL export tracking. finetune_exporter writes one row per export.';
+
+CREATE INDEX IF NOT EXISTS idx_finetune_exports_created
+    ON finetune_exports(created_at DESC);
+CREATE INDEX IF NOT EXISTS idx_finetune_exports_type
+    ON finetune_exports(export_type);
+
+
+-- ═══════════════════════════════════════════════════════════════════
+-- 2. remediation_events: Phase 5 declarative remediation tracking
+-- ═══════════════════════════════════════════════════════════════════
+
+CREATE TABLE IF NOT EXISTS remediation_events (
+    event_id              BIGSERIAL       PRIMARY KEY,
+    incident_id           TEXT,
+    approval_id           TEXT,
+    remediation_type      TEXT            NOT NULL, -- 'declarative' | 'imperative' | 'gitops_pr' | 'kubectl'
+    action_name           TEXT,
+    target_resource       TEXT,                     -- e.g. deployment/awoooi-api
+    namespace             TEXT,
+    dry_run               BOOLEAN         NOT NULL DEFAULT false,
+    status                TEXT            NOT NULL, -- 'pending' | 'success' | 'failed' | 'rolled_back'
+    error_message         TEXT,
+    blast_radius_score    INT,
+    duration_ms           INT,
+    executed_by           TEXT,                     -- 'ai_agent' | 'human:ogt' | 'cron'
+    triggered_by_op_id    UUID,                     -- references automation_operation_log.op_id
+    created_at            TIMESTAMPTZ     NOT NULL DEFAULT NOW(),
+    completed_at          TIMESTAMPTZ,
+    metadata              JSONB           NOT NULL DEFAULT '{}'::jsonb,
+    CONSTRAINT remediation_type_valid CHECK (remediation_type IN (
+        'declarative','imperative','gitops_pr','kubectl','ansible','helm','argocd_sync'
+    )),
+    CONSTRAINT remediation_status_valid CHECK (status IN (
+        'pending','success','failed','rolled_back','dry_run_ok','dry_run_failed'
+    ))
+);
+
+COMMENT ON TABLE remediation_events IS
+    'ADR-090-D: MASTER §7.1 #6 declarative remediation usage rate. One row is written each time DeclarativeRemediation evaluates an action.';
+
+CREATE INDEX IF NOT EXISTS idx_remediation_events_time
+    ON remediation_events(created_at DESC);
+CREATE INDEX IF NOT EXISTS idx_remediation_events_type
+    ON remediation_events(remediation_type);
+CREATE INDEX IF NOT EXISTS idx_remediation_events_incident
+    ON remediation_events(incident_id) WHERE incident_id IS NOT NULL;
+
+
+-- ═══════════════════════════════════════════════════════════════════
+-- 3. notification_outcomes: notification outcome tracking
+-- ═══════════════════════════════════════════════════════════════════
+
+CREATE TABLE IF NOT EXISTS notification_outcomes (
+    outcome_id            BIGSERIAL       PRIMARY KEY,
+    incident_id           TEXT,
+    approval_id           TEXT,
+    channel               TEXT            NOT NULL, -- 'telegram' | 'email' | 'slack' | 'webhook'
+    notification_type     TEXT,                     -- TYPE-1/2/3/4/4D/5S/6B/7E/8M
+    recipient             TEXT,                     -- chat_id / email / user
+    message_id            TEXT,                     -- e.g. telegram message_id
+    sent_at               TIMESTAMPTZ     NOT NULL DEFAULT NOW(),
+    delivery_status       TEXT            NOT NULL, -- 'delivered' | 'failed' | 'pending'
+    delivery_error        TEXT,
+    -- human interaction tracking (gold-standard RLHF data)
+    user_action           TEXT,                     -- 'approved' | 'rejected' | 'silenced' | 'ignored' | 'no_response'
+    user_action_at        TIMESTAMPTZ,
+    user_comment          TEXT,
+    -- notification quality
+    snoozed_count         INT             NOT NULL DEFAULT 0,
+    time_to_action_sec    INT,                       -- seconds from receipt to button press
+    metadata              JSONB           NOT NULL DEFAULT '{}'::jsonb,
+    CONSTRAINT notif_channel_valid CHECK (channel IN (
+        'telegram','email','slack','webhook','sms','discord'
+    )),
+    CONSTRAINT notif_delivery_valid CHECK (delivery_status IN (
+        'delivered','failed','pending','rate_limited'
+    ))
+);
+
+COMMENT ON TABLE notification_outcomes IS
+    'ADR-090-D: MASTER §7.1 #10 notification_outcomes tracking. telegram_gateway writes one row per push; user_action is updated when the user presses a button.';
+
+CREATE INDEX IF NOT EXISTS idx_notification_outcomes_sent
+    ON notification_outcomes(sent_at DESC);
+CREATE INDEX IF NOT EXISTS idx_notification_outcomes_incident
+    ON notification_outcomes(incident_id) WHERE incident_id IS NOT NULL;
+CREATE INDEX IF NOT EXISTS idx_notification_outcomes_approval
+    ON notification_outcomes(approval_id) WHERE approval_id IS NOT NULL;
+CREATE INDEX IF NOT EXISTS idx_notification_outcomes_pending_action
+    ON notification_outcomes(sent_at DESC)
+    WHERE user_action IS NULL AND delivery_status='delivered';
+
+
+-- ═══════════════════════════════════════════════════════════════════
+-- Verification (run manually after applying)
+-- ═══════════════════════════════════════════════════════════════════
+
+-- SELECT table_name FROM information_schema.tables
+-- WHERE table_schema='public'
+--   AND table_name IN ('finetune_exports','remediation_events','notification_outcomes')
+-- ORDER BY table_name;
+-- Expected: 3 rows
+
+-- SELECT conname FROM pg_constraint WHERE conrelid IN (
+--   'finetune_exports'::regclass,
+--   'remediation_events'::regclass,
+--   'notification_outcomes'::regclass
+-- ) AND contype='c' ORDER BY conname;
--- a/apps/api/src/services/approval_execution.py
+++ b/apps/api/src/services/approval_execution.py
@@ -144,14 +144,57 @@ class ApprovalExecutionService:
        namespace = parsed.namespace

        if operation_type is None or resource_name is None:
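+            # 2026-04-19 ogt + Claude Opus 4.7: distinguish NO_ACTION from a genuine parse failure.
+            # NO_ACTION is the AI's deliberate "investigate only, nothing destructive" choice; labeling it
+            # EXECUTION_FAILED would pollute the auto_execute success-rate KPI (MASTER §7.1 #11)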
+            _action_upper = (approval.action or "").upper()
+            _is_no_action = (
+                "NO_ACTION" in _action_upper
+                or "NO-ACTION" in _action_upper
+                or "NOACTION" in _action_upper
+                or "(未設)" in (approval.action or "")  # literal "(not set)" placeholder
+                or _action_upper.startswith("OBSERVE")
+                or _action_upper.startswith("INVESTIGATE")
+            )
+
+            if _is_no_action:
+                logger.info(
+                    "background_execution_noop",
+                    approval_id=str(approval.id),
+                    action=approval.action,
+                    reason="NO_ACTION - 純調查/觀察類,不執行破壞動作",
+                )
+                # Mark as SUCCESS (completing the observation/investigation is itself a success)
+                await service.update_execution_status(approval.id, success=True)
+                await timeline.add_event(
+                    event_type="exec",
+                    status="success",
+                    title="✅ 純觀察類動作完成 (NO_ACTION)",
+                    description=f"Action: {approval.action[:120]}",
+                    actor="leWOOOgo",
+                    actor_role="executor",
+                    approval_id=str(approval.id),
+                )
+                # Reply to the original alert card with the execution result
+                asyncio.create_task(
+                    self._push_execution_result_to_alert(
+                        approval, success=True, error=None,
+                    )
+                )
+                return True  # NO_ACTION counts as a successful completion
+
+            # Genuine parse failure (not NO_ACTION)
            logger.warning(
                "background_execution_skip",
                approval_id=str(approval.id),
                reason="Could not parse operation type from action",
                action=approval.action,
            )
-            # Phase 5: update the DB status
-            await service.update_execution_status(approval.id, success=False)
+            # Phase 5: update the DB status, now with error_message (P0.2)
+            await service.update_execution_status(
+                approval.id, success=False,
+                error_message=f"Could not parse operation type from action: {(approval.action or '')[:150]}",
+            )
            await timeline.add_event(
                event_type="exec",
                status="error",
@@ -453,11 +496,53 @@ class ApprovalExecutionService:
            settings = get_settings()
            gateway = get_telegram_gateway()

+            # 2026-04-19 ogt + Claude Opus 4.7, AP-2 fix: besides the reply, also edit the original
+            # card to remove its buttons (so the card no longer stays stuck at "executing")
+            try:
+                await gateway._send_request("editMessageReplyMarkup", {
+                    "chat_id": settings.OPENCLAW_TG_CHAT_ID,
+                    "message_id": orig_msg_id,
+                    "reply_markup": {"inline_keyboard": []},
+                })
+            except Exception as _edit_e:
+                logger.debug("push_execution_edit_buttons_failed",
+                             approval_id=str(approval.id), error=str(_edit_e))
+
+            # Append KM/Playbook deltas: global counts of rows changed in the last 2 minutes (not scoped to this incident)
+            km_info = ""
+            try:
+                from sqlalchemy import text as _sql
+                from src.db.base import get_db_context
+                async with get_db_context() as _db:
+                    _km_row = await _db.execute(
+                        _sql("""SELECT COUNT(*) FROM knowledge_entries
+                                WHERE created_at > NOW() - interval '2 minutes'"""),
+                    )
+                    _km_count = _km_row.scalar() or 0
+                    _pb_row = await _db.execute(
+                        _sql("""SELECT COUNT(*) FROM playbooks
+                                WHERE updated_at > NOW() - interval '2 minutes'"""),
+                    )
+                    _pb_count = _pb_row.scalar() or 0
+                    if _km_count or _pb_count:
+                        km_info = f"\n📚 KM +{_km_count}  🎯 Playbook 更新×{_pb_count}"
+            except Exception:
+                # Best effort: the KM/Playbook delta is decorative and must never block the result push
+                pass
+
            if success:
-                text = f"✅ <b>執行成功</b>\n<code>{(approval.action or '')[:180]}</code>"
+                text = (
+                    f"✅ <b>執行成功</b>\n"
+                    f"<code>{(approval.action or '')[:180]}</code>"
+                    f"{km_info}"
+                )
            else:
                err_short = (error or "未知錯誤")[:150]
-                text = f"❌ <b>執行失敗</b>\n<code>{(approval.action or '')[:180]}</code>\n原因: {err_short}"
+                text = (
+                    f"❌ <b>執行失敗</b>\n"
+                    f"<code>{(approval.action or '')[:180]}</code>\n"
+                    f"原因: {err_short}"
+                    f"{km_info}"
+                )

            await gateway._http_client.post(
                f"https://api.telegram.org/bot{settings.OPENCLAW_TG_BOT_TOKEN}/sendMessage",
--- a/apps/api/src/services/declarative_remediation.py
+++ b/apps/api/src/services/declarative_remediation.py
@@ -166,6 +166,16 @@ class DeclarativeRemediation:
            can_auto=spec.can_auto_execute,
            action=action[:80],
        )
+
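+        # 2026-04-18 ADR-090-D: write to remediation_events (MASTER §7.1 #6 KPI data source).
+        # Fire-and-forget so the main flow is never blocked.
+        try:
+            import asyncio as _a
+            _loop = _a.get_running_loop()
+        except RuntimeError:
+            # No running event loop (normal callers are all async); skip silently
+            _loop = None
+        if _loop is not None:
+            # Build the coroutine only once a loop exists, so it is never left un-awaited
+            _loop.create_task(_log_remediation_event(spec, action, target, namespace))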
+
        return spec


@@ -173,6 +183,54 @@ class DeclarativeRemediation:
 # Helpers
 # ─────────────────────────────────────────────────────────────────────────────

+async def _log_remediation_event(
+    spec: "DeclarativeSpec",
+    action: str,
+    target: str,
+    namespace: str,
+) -> None:
+    """
+    2026-04-18 ADR-090-D: write to remediation_events (MASTER §7.1 #6 KPI data source).
+
+    Writes one 'pending' row after each DeclarativeRemediation.evaluate() call.
+    The actual execution status will be updated later by approval_execution.py (future iteration).
+    """
+    try:
+        from sqlalchemy import text as _sql
+        from src.db.base import get_db_context
+
+        # Determine remediation_type (gitops_pr takes precedence)
+        _rt = "declarative" if spec.can_auto_execute else "imperative"
+        if spec.requires_gitops_pr:
+            _rt = "gitops_pr"
+
+        async with get_db_context() as db:
+            await db.execute(
+                _sql("""
+                    INSERT INTO remediation_events (
+                        remediation_type, action_name, target_resource, namespace,
+                        dry_run, status, blast_radius_score, executed_by,
+                        metadata
+                    ) VALUES (
+                        :rt, :an, :tr, :ns,
+                        :dr, 'pending', :br, 'ai_agent',
+                        jsonb_build_object('tier', CAST(:tier AS text))
+                    )
+                """),
+                {
+                    "rt": _rt,
+                    "an": action[:200],
+                    "tr": target[:100] if target else None,
+                    "ns": namespace[:50],
+                    "dr": spec.dry_run_required,
+                    "br": spec.blast_radius_score,
+                    "tier": str(spec.tier),  # bound param; a hand-built JSON string breaks on quotes
+                },
+            )
+    except Exception as _e:
+        logger.warning("remediation_events_db_write_failed", error=str(_e))
+
+
 def _build_constraints(action: str, namespace: str, score: int) -> list[str]:
    """Build the list of safety constraints from the action's characteristics."""
    constraints: list[str] = []
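The `remediation_type` precedence in `_log_remediation_event` is: `gitops_pr` wins, otherwise `declarative` vs `imperative` by `can_auto_execute`. A minimal sketch with a hypothetical `SpecStub` dataclass standing in for `DeclarativeSpec` (only the two fields the rule reads); it also checks each result against the values the `remediation_type_valid` CHECK constraint accepts.

```python
from dataclasses import dataclass

# Values allowed by the remediation_type_valid CHECK constraint in the migration
ALLOWED_REMEDIATION_TYPES = frozenset({
    "declarative", "imperative", "gitops_pr", "kubectl",
    "ansible", "helm", "argocd_sync",
})


@dataclass(frozen=True)
class SpecStub:
    """Hypothetical stand-in for DeclarativeSpec with only the fields used here."""
    can_auto_execute: bool
    requires_gitops_pr: bool


def classify_remediation_type(spec: SpecStub) -> str:
    """gitops_pr takes precedence; otherwise auto-executable specs are declarative."""
    if spec.requires_gitops_pr:
        rt = "gitops_pr"
    elif spec.can_auto_execute:
        rt = "declarative"
    else:
        rt = "imperative"
    # Fail fast here instead of at INSERT time against the CHECK constraint
    assert rt in ALLOWED_REMEDIATION_TYPES
    return rt


for spec in (SpecStub(True, False), SpecStub(False, False), SpecStub(True, True)):
    print(spec, "->", classify_remediation_type(spec))
```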
--- a/apps/api/src/services/finetune_exporter.py
+++ b/apps/api/src/services/finetune_exporter.py
@@ -50,7 +50,7 @@ from datetime import timedelta
 from pathlib import Path

 import structlog
-from sqlalchemy import and_, select
+from sqlalchemy import and_, select, text as sql_text

 from src.db.base import get_session_factory
 from src.db.models import AgentSession, AutoRepairExecution, IncidentEvidence
@@ -143,6 +143,40 @@ class FineTuneExporter:
            row_count=len(rows),
            path=output_path,
        )
+
+        # 2026-04-18 ADR-090-D: write to finetune_exports (MASTER §7.1 #3 KPI data source)
+        try:
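+            import hashlib
+            import os
+            _exists = bool(output_path) and os.path.exists(output_path)
+            _size = os.path.getsize(output_path) if _exists else None
+            _checksum = None
+            if _exists:
+                # Hash in 1 MiB chunks so large exports are never loaded into memory at once
+                _hasher = hashlib.sha256()
+                with open(output_path, "rb") as _f:
+                    for _chunk in iter(lambda: _f.read(1 << 20), b""):
+                        _hasher.update(_chunk)
+                _checksum = _hasher.hexdigest()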
+            _ids = [str(ev.id) for ev in evidences]
+            async with session_factory() as _db:
+                await _db.execute(
+                    sql_text("""
+                        INSERT INTO finetune_exports (
+                            export_type, source_table, source_ids,
+                            file_path, record_count, size_bytes, checksum_sha256,
+                            metadata
+                        ) VALUES (
+                            'evidence_snapshot', 'incident_evidence', :ids,
+                            :fp, :rc, :sz, :cs, CAST(:md AS jsonb)
+                        )
+                    """),
+                    {
+                        "ids": _ids,
+                        "fp": str(output_path) if output_path else None,  # the DB driver binds str, not Path
+                        "rc": len(rows),
+                        "sz": _size,
+                        "cs": _checksum,
+                        "md": json.dumps({"lookback_days": EXPORT_LOOKBACK_DAYS}),
+                    },
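+                )
+                # Sessions from the factory do not autocommit; commit so the INSERT is not rolled back on close
+                await _db.commit()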
+        except Exception as _db_e:
+            logger.warning("finetune_exports_db_write_failed", error=str(_db_e))
+
        return output_path, len(rows)

    async def _build_row(self, db, ev: IncidentEvidence) -> dict | None:
--- a/apps/api/src/services/incident_service.py
+++ b/apps/api/src/services/incident_service.py
@@ -184,6 +184,40 @@ def classify_alert_early(alertname: str, severity: str, labels: dict | None = No
    ):
        return "backup", "TYPE-1"

+    # 2026-04-18 ogt + Claude Opus 4.7: extend the rules to shrink the `general` catch-all (MASTER §7.1 #7 target <10%)
+    # Derived from the 17 alertnames that fell into general over 7 days of measurement:
+    #
+    # 5.1 Intercept test alerts (so they do not pollute production metrics)
+    #     TestAlert / FingerprintTest / E2ETestAlert / ADR089Test / L4ClosureLoop
+    #     FP[A-Z]... / *FreshUniq* → test category (TYPE-1, notification only)
+    #     TestConnectivity is excluded here: it is a probe alert handled by rule 5.3
+    if alertname != "TestConnectivity" and (
+        alertname.startswith(("Test", "FingerprintTest", "ADR089", "L4Closure", "FPTest"))
+        or "FreshUniq" in alertname
+        or alertname in ("E2ETestAlert",)
+        or (alertname.startswith("FP") and alertname[2:3].isupper())  # FPTestB, FPTestA
+    ):
+        return "test", "TYPE-1"
+
+    # 5.2 HighCPU / HighMemory / other High* host-resource alerts
+    if alertname.startswith(("HighCPU", "HighMemory", "HighMem", "HighDisk", "HighLoad")):
+        return "host_resource", "TYPE-3"
+
+    # 5.3 TLS / SSL / ProbeFailure → ssl_cert
+    if (
+        alertname.startswith(("TLS", "SSL", "Certificate"))
+        or "ProbeFailure" in alertname
+        or alertname in ("TestConnectivity",)  # synonym of ProbeFailure
+    ):
+        return "ssl_cert", "TYPE-3"
+
+    # 5.4 Databases (covers PostgreSQL* variants: the existing rule uses startswith("Postgres"), which should
+    #     already match PostgreSQLDiskGrowthRate, yet in practice it landed in general, so add a safety-net rule)
+    if (
+        alertname.startswith(("PostgreSQL", "MySQL", "MongoDB"))
+        or "DiskGrowthRate" in alertname
+    ):
+        return "database", "TYPE-3"
+
    # 6. Host resources (split out from infrastructure, per the ADR-075 commander's decision)
    if alertname.startswith("Host"):
        return "host_resource", "TYPE-3"
--- a/apps/api/src/services/openclaw.py
+++ b/apps/api/src/services/openclaw.py
@@ -1144,6 +1144,77 @@ class OpenClawService:

        return None

+    def _validate_deployment_inventory(
+        self,
+        result: "OpenClawDecision | None",
+        k8s_inventory: str,
+        k8s_ns: str,
+    ) -> None:
+        """
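+        2026-04-19 ogt + Claude Opus 4.7 (extracted from analyze_alert):
+        Detect and downgrade hallucinated deployment names. Shared by both paths
+        (analyze_alert + generate_incident_proposal).
+
+        Root cause: even with the inventory in the prompt, NEMOTRON sometimes uses the namespace
+                    as the deployment name → kubectl rollout restart deployment/awoooi-prod → "not found"
+        Fix: regex-extract the deployment name from the kubectl command and check it against the
+             inventory whitelist; if it is absent, downgrade to NO_ACTION, switch to a read-only
+             `kubectl get deploy` investigation, and set confidence to 0.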
+        """
+        if not result or not k8s_inventory:
+            return
+        _inventory_names = {n.strip() for n in k8s_inventory.split(",") if n.strip()}
+        if not _inventory_names:
+            return
+        _kcmd = (result.kubectl_command or "").lower()
+        import re as _re
+        _m = _re.search(r"deployment[/\s]+([a-z0-9][a-z0-9-]*)", _kcmd)
+        if not _m:
+            return
+        _deploy_guess = _m.group(1)
+        if _deploy_guess in _inventory_names:
+            return
+
+        logger.warning(
+            "openclaw_deployment_hallucination_detected",
+            hallucinated=_deploy_guess,
+            inventory=sorted(_inventory_names),
+            original_kubectl_cmd=result.kubectl_command,
+            original_action=(
+                result.suggested_action.value
+                if hasattr(result.suggested_action, "value")
+                else str(result.suggested_action)
+            ),
+            namespace=k8s_ns,
+        )
+        # Downgrade to a safe investigation action; never run anything destructive.
+        # Each field is set independently: a Pydantic validate_assignment error on one
+        # field must not stop the other fields from being downgraded.
+        _overrides = {
+            "kubectl_command": f"kubectl get deploy -n {k8s_ns}",
+            "target_resource": "unknown(hallucinated)",
+            "suggested_action": SuggestedAction.NO_ACTION,
+            "action_title": f"[安全降級] 調查 {k8s_ns} 真實資源狀態",
+            "description": (
+                f"[安全降級] 原 LLM 建議的 deployment '{_deploy_guess}' 不在叢集 inventory "
+                f"({', '.join(sorted(_inventory_names))})。"
+                f"已降級為純調查動作(kubectl get deploy),請手動確認實際問題資源。"
+            ),
+            "confidence": 0.0,
+        }
+        for _field, _value in _overrides.items():
+            try:
+                setattr(result, _field, _value)
+            except Exception:
+                pass
+
    def _parse_analysis_result(self, raw_response: str) -> OpenClawDecision | None:
        """
        解析 LLM 分析結果 - 使用 Pydantic Schema Enforcement
@@ -1198,7 +1269,12 @@ class OpenClawService:
                data["confidence"] = 0.0  # 截斷/缺失 → 0.0，不可偽造
            if "risk_level" not in data:
                data["risk_level"] = "low"
-            if "primary_responsibility" not in data:
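+            # 2026-04-19 ogt + Claude Opus 4.7, AP-3 fix:
+            # the LLM sometimes fills primary_responsibility with an empty string/None → resp_display shows "❓ 未知" (unknown)
+            # Normalize: empty/None/not whitelisted → infer INFRA or BE from the presence of kubectl (never "unknown")
+            _valid_resp = {"FE", "BE", "INFRA", "DB", "COLLAB"}
+            _cur_resp = str(data.get("primary_responsibility") or "").strip().upper()
+            if _cur_resp not in _valid_resp:
                 data["primary_responsibility"] = "INFRA" if "kubectl" in str(data) else "BE"
+            else:
+                # Store the normalized value so e.g. " infra " is kept as "INFRA"
+                data["primary_responsibility"] = _cur_resp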
            if "suggested_action" not in data:
                data["suggested_action"] = "RESTART_DEPLOYMENT" if "restart" in str(data).lower() else "NO_ACTION"
@@ -1322,44 +1398,8 @@ Trace URL: {signoz_trace_url}
        # 解析結果
        result = self._parse_analysis_result(raw_response)

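-        # 2026-04-18 ogt + Claude Opus 4.7: detect and downgrade hallucinated deployment names (Checkpoint-3)
-        # Root cause: even with the inventory in the prompt, NEMOTRON used the namespace "awoooi-prod" as the deployment name
-        #       → at execution time kubectl rollout restart deployment/awoooi-prod → "not found"
-        # Fix: after the LLM responds, verify in Python that the deployment name in kubectl_command is in the inventory;
-        #       if not → downgrade to NO_ACTION and switch to a read-only kubectl get deploy investigation (non-destructive)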
-        if result and _k8s_inventory:
-            _inventory_names = {n.strip() for n in _k8s_inventory.split(",") if n.strip()}
-            _kcmd = (result.kubectl_command or "").lower()
-            import re as _re
-            _m = _re.search(r"deployment[/\s]+([a-z0-9][a-z0-9-]*)", _kcmd)
-            if _m:
-                _deploy_guess = _m.group(1)
-                if _deploy_guess not in _inventory_names:
-                    logger.warning(
-                        "openclaw_deployment_hallucination_detected",
-                        hallucinated=_deploy_guess,
-                        inventory=sorted(_inventory_names),
-                        original_kubectl_cmd=result.kubectl_command,
-                        original_action=result.suggested_action.value if hasattr(result.suggested_action, 'value') else str(result.suggested_action),
-                    )
-                    # Downgrade to a safe investigation action; never run anything destructive
-                    result.kubectl_command = f"kubectl get deploy -n {_k8s_ns}"
-                    result.target_resource = "unknown(hallucinated)"
-                    # Pydantic enum handling: SuggestedAction is already imported at the top of the file (line 34)
-                    try:
-                        result.suggested_action = SuggestedAction.NO_ACTION
-                    except Exception:
-                        pass
-                    result.description = (
-                        f"[安全降級] 原 LLM 建議的 deployment '{_deploy_guess}' 不在叢集 inventory "
-                        f"({', '.join(sorted(_inventory_names))})。"
-                        f"已降級為純調查動作,請手動確認實際問題資源。"
-                    )
-                    # Reset confidence to zero
-                    try:
-                        result.confidence = 0.0
-                    except Exception:
-                        pass
+        # 2026-04-18 → 2026-04-19: hallucinated deployment name detection and downgrade (shared helper)
+        self._validate_deployment_inventory(result, _k8s_inventory, _k8s_ns)

        if result:
            logger.info(
@@ -1551,6 +1591,15 @@ Focus on:
        # 解析 LLM 結果
        result = self._parse_analysis_result(raw_response)

+        # 2026-04-19 ogt + Claude Opus 4.7: like analyze_alert, this path also needs hallucination validation.
+        # This path has no pre-fetched inventory, so fetch it on demand.
+        _k8s_ns_for_validate = alert_context.get("namespace", "awoooi-prod") if "alert_context" in dir() else "awoooi-prod"
+        try:
+            _k8s_inv = await _fetch_k8s_inventory_for_openclaw(namespace=_k8s_ns_for_validate)
+        except Exception:
+            _k8s_inv = ""
+        self._validate_deployment_inventory(result, _k8s_inv, _k8s_ns_for_validate)
+
        if result:
            logger.info(
                "proposal_generation_complete",
--- a/apps/api/src/services/pre_decision_investigator.py
+++ b/apps/api/src/services/pre_decision_investigator.py
@@ -265,6 +265,9 @@ class PreDecisionInvestigator:
        tool_name = reg.tool.name
        snapshot.mcp_health[tool_name] = False  # 預設失敗，成功後覆蓋

+        _started = asyncio.get_running_loop().time()
+        _mcp_status = "failed"
+        _mcp_error = None
        try:
            result = await asyncio.wait_for(
                reg.provider.execute(tool_name, params),
@@ -277,10 +280,12 @@ class PreDecisionInvestigator:
                    tool=tool_name,
                    error=result.error,
                )
+                _mcp_error = str(result.error)[:200] if result.error else "unknown"
                return

            snapshot.mcp_health[tool_name] = True
            snapshot.sensors_succeeded += 1
+            _mcp_status = "success"

            # 依感官維度填入對應欄位
            raw = result.output
@@ -288,8 +293,73 @@ class PreDecisionInvestigator:

        except asyncio.TimeoutError:
            logger.warning("investigator_tool_timeout", tool=tool_name, timeout=MCP_TOOL_TIMEOUT_SEC)
-        except Exception:
+            _mcp_status = "timeout"
+            _mcp_error = f"timeout {MCP_TOOL_TIMEOUT_SEC}s"
+        except Exception as _e:
            logger.exception("investigator_tool_error", tool=tool_name)
+            _mcp_status = "error"
+            _mcp_error = str(_e)[:200]
+        finally:
+            # 2026-04-18 ADR-090-D: record each MCP call in timeline_events (MASTER §7.1 #4 KPI)
+            try:
+                _duration_ms = int((asyncio.get_running_loop().time() - _started) * 1000)
+                asyncio.create_task(_log_mcp_call_to_timeline(
+                    snapshot_incident_id=getattr(snapshot, "incident_id", None),
+                    provider_name=reg.provider.name,
+                    tool_name=tool_name,
+                    status=_mcp_status,
+                    error=_mcp_error,
+                    duration_ms=_duration_ms,
+                ))
+            except Exception:
+                pass
+
+
+async def _log_mcp_call_to_timeline(
+    snapshot_incident_id: str | None,
+    provider_name: str,
+    tool_name: str,
+    status: str,
+    error: str | None,
+    duration_ms: int,
+) -> None:
+    """
+    2026-04-18 ADR-090-D: write each MCP call to timeline_events, so the MASTER §7.1 #4
+    KPI "MCP calls per 24h > 0" can be measured.
+    """
+    try:
+        from sqlalchemy import text as _sql
+        from src.db.base import get_db_context
+        import json as _json
+        _description = _json.dumps({
+            "provider": provider_name,
+            "tool": tool_name,
+            "status": status,
+            "error": error,
+            "duration_ms": duration_ms,
+        }, ensure_ascii=False)
+        async with get_db_context() as _db:
+            await _db.execute(
+                _sql("""
+                    INSERT INTO timeline_events (
+                        incident_id, event_type, status, title, description, actor,
+                        actor_role, created_at
+                    ) VALUES (
+                        :iid, 'mcp_call', :st, :tl, :desc, :actor,
+                        'mcp', NOW()
+                    )
+                """),
+                {
+                    "iid": snapshot_incident_id or "unknown",
+                    "st": status,
+                    "tl": f"MCP {provider_name}.{tool_name}"[:100],
+                    "desc": _description[:500],
+                    "actor": provider_name[:50],
+                },
+            )
+    except Exception:
+        # Fail silently: timeline_events is an audit trail and must never break the MCP main flow
+        pass


 # ─────────────────────────────────────────────────────────────────────────────
--- a/apps/api/src/services/telegram_gateway.py
+++ b/apps/api/src/services/telegram_gateway.py
@@ -1688,6 +1688,64 @@ class TelegramGateway:
            message_id=_msg_id,
        )

+        # 2026-04-18 ADR-090-D: write to notification_outcomes (MASTER §7.1 #10 KPI)
+        try:
+            from sqlalchemy import text as _sql
+            from src.db.base import get_db_context
+            _delivered = "delivered" if _msg_id else "failed"
+            _notif_type = f"TYPE-3-{alert_category}" if alert_category else "TYPE-3"
+            async with get_db_context() as _db:
+                await _db.execute(
+                    _sql("""
+                        INSERT INTO notification_outcomes (
+                            approval_id, channel, notification_type, recipient,
+                            message_id, delivery_status, metadata
+                        ) VALUES (
+                            :aid, 'telegram', :nt, :rp,
+                            :mid, :ds, jsonb_build_object('risk_level', CAST(:rl AS text))
+                        )
+                    """),
+                    {
+                        "aid": str(approval_id) if approval_id else None,
+                        "nt": _notif_type,
+                        "rp": str(settings.OPENCLAW_TG_CHAT_ID),
+                        "mid": str(_msg_id) if _msg_id else None,
+                        "ds": _delivered,
+                        "rl": str(risk_level),  # bound param; no hand-built JSON string
+                    },
+                )
+        except Exception as _db_e:
+            logger.warning("notification_outcomes_db_write_failed", error=str(_db_e))
+
+        # 2026-04-19 ogt + Claude Opus 4.7: fix AP-1. Also store message_id in
+        # approval_records.telegram_message_id, not only in Redis (lost on restart)
+        if _msg_id:
+            try:
+                # Update approval_records directly by approval id. This does not need
+                # the approval service's update_telegram_message helper (usually keyed
+                # by incident_id), so the write runs whether or not that helper exists.
+                from sqlalchemy import text as _sql2
+                from src.db.base import get_db_context as _gdc
+                async with _gdc() as _db2:
+                    await _db2.execute(
+                        _sql2("""
+                            UPDATE approval_records
+                            SET telegram_message_id = :mid,
+                                telegram_chat_id = :cid
+                            WHERE id = :aid
+                        """),
+                        {
+                            "mid": int(_msg_id),
+                            "cid": int(settings.OPENCLAW_TG_CHAT_ID),
+                            "aid": str(approval_id),
+                        },
+                    )
+            except Exception as _db_e2:
+                logger.warning("approval_tg_msg_id_db_persist_failed",
+                               approval_id=str(approval_id), error=str(_db_e2))
+
        # 2026-04-10 Claude Sonnet 4.6 Asia/Taipei: store message_id so the card can be updated after auto-remediation
        # key: tg_approval:{approval_id}, TTL 24h
        if _msg_id:
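The two writes above record delivery and card location in PostgreSQL. A delivery row goes into `notification_outcomes`. On success, the message id is written onto `approval_records` so the card can still be found after a pod restart. The block below sketches that bookkeeping with an in-memory SQLite database standing in for PostgreSQL. The trimmed schemas, `TEXT` in place of `jsonb`, and the helper names `record_delivery` and `find_card` are all hypothetical.

```python
import json
import sqlite3


def init_schema(conn: sqlite3.Connection) -> None:
    # Toy stand-ins for the real tables, trimmed to the columns the diff touches.
    conn.executescript(
        """
        CREATE TABLE approval_records (
            id TEXT PRIMARY KEY,
            telegram_message_id INTEGER,
            telegram_chat_id INTEGER
        );
        CREATE TABLE notification_outcomes (
            approval_id TEXT, channel TEXT, message_id TEXT,
            delivery_status TEXT, metadata TEXT
        );
        """
    )


def record_delivery(conn, approval_id, msg_id, chat_id, risk_level) -> None:
    status = "delivered" if msg_id else "failed"
    conn.execute(
        "INSERT INTO notification_outcomes VALUES (?, 'telegram', ?, ?, ?)",
        (approval_id, str(msg_id) if msg_id else None, status,
         json.dumps({"risk_level": str(risk_level)})),  # safe even if risk_level has quotes
    )
    if msg_id:
        # Durable copy of the card location; a Redis key with a TTL is not.
        conn.execute(
            "UPDATE approval_records SET telegram_message_id = ?, telegram_chat_id = ? WHERE id = ?",
            (int(msg_id), int(chat_id), str(approval_id)),
        )
    conn.commit()


def find_card(conn, approval_id):
    """Return (chat_id, message_id) for an approval card, or None if unknown."""
    row = conn.execute(
        "SELECT telegram_chat_id, telegram_message_id FROM approval_records WHERE id = ?",
        (approval_id,),
    ).fetchone()
    return row if row and row[1] is not None else None
```

Because `find_card` reads only the database, it works in a freshly started process. That is the property the Redis-only version lacked.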
@@ -1935,7 +1993,7 @@ class TelegramGateway:
            ]
        }

-        return await self._send_request(
+        _result = await self._send_request(
            "sendMessage",
            {
                "chat_id": settings.OPENCLAW_TG_CHAT_ID,
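This hunk stores the sendMessage reply in `_result` instead of returning it immediately. That lets the gateway read the new card's message_id before returning. The diff reads it with chained `.get()` calls inside a try/except. The block below is a defensive standalone sketch. `extract_message_id` is a hypothetical helper, and it assumes `_send_request` returns the decoded Bot API JSON body, which the diff does not show.

```python
from typing import Any, Optional


def extract_message_id(response: Any) -> Optional[int]:
    """Return result.message_id from a decoded Bot API reply, or None.

    Assumes `response` is the decoded JSON body, e.g.
    {"ok": True, "result": {"message_id": 123, ...}}; a failed call looks like
    {"ok": False, "error_code": 400, "description": "..."}.
    """
    if not isinstance(response, dict) or response.get("ok") is False:
        return None
    result = response.get("result")
    if not isinstance(result, dict):
        return None  # e.g. editMessageReplyMarkup can return True instead of a Message
    msg_id = result.get("message_id")
    if isinstance(msg_id, bool) or not isinstance(msg_id, int):
        return None
    return msg_id
```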
@@ -1945,6 +2003,176 @@ class TelegramGateway:
            },
        )

+        # 2026-04-19 ogt + Claude Opus 4.7: fix TG-4, store the drift card message_id in Redis
+        # so drift_adopt/drift_revert can edit the original card after execution
+        try:
+            _msg_id = _result.get("result", {}).get("message_id")
+            if _msg_id:
+                await get_redis().setex(
+                    f"tg_drift:{incident_id}", 86400, str(_msg_id)
+                )
+        except Exception as _e:
+            logger.warning("tg_drift_msg_id_store_failed", incident_id=incident_id, error=str(_e))
+
+        return _result
+
+    # =========================================================================
+    # 2026-04-19 ogt + Claude Opus 4.7: drift_* button handler (fixes TG-2)
+    # =========================================================================
+
+    async def _handle_drift_action(
+        self,
+        action: str,
+        approval_id: str,
+        callback_query_id: str,
+        user_id: int,
+        username: str,
+        user: dict,
+    ) -> dict:
+        """
+        Handle the drift_view / drift_adopt / drift_revert buttons.
+        For a drift card, approval_id is the report_id (by design of send_drift_card).
+        """
+        report_id = approval_id
+        logger.info(
+            "drift_callback_dispatched",
+            action=action, report_id=report_id,
+            user_id=user_id, username=username,
+        )
+        try:
+            if action == "drift_view":
+                await self._answer_callback(callback_query_id, action, text="🔍 撈全部 Diff...")
+                await self._send_drift_diff_detail(report_id)
+                return {
+                    "action": action, "approval_id": approval_id,
+                    "user": user, "success": True, "info_action": True,
+                }
+
+            if action == "drift_adopt":
+                await self._answer_callback(callback_query_id, action, text="✅ 採納中...")
+                try:
+                    from src.services.drift_adopt_service import get_drift_adopt_service
+                    _adopt_result = await get_drift_adopt_service().adopt_drift(report_id)
+                    _ok = bool(_adopt_result.get("success") if isinstance(_adopt_result, dict) else _adopt_result)
+                except Exception as _e:
+                    logger.warning("drift_adopt_failed", report_id=report_id, error=str(_e))
+                    _ok = False
+                await self._edit_drift_card_outcome(
+                    report_id=report_id, verb="已採納", by=username, ok=_ok,
+                )
+                return {"action": action, "approval_id": approval_id, "user": user, "success": _ok}
+
+            if action == "drift_revert":
+                await self._answer_callback(callback_query_id, action, text="⏪ 回滾中...")
+                try:
+                    from src.services.drift_remediator import get_drift_remediator
+                    _revert_result = await get_drift_remediator().revert(report_id)
+                    _ok = bool(_revert_result.get("success") if isinstance(_revert_result, dict) else _revert_result)
+                except Exception as _e:
+                    logger.warning("drift_revert_failed", report_id=report_id, error=str(_e))
+                    _ok = False
+                await self._edit_drift_card_outcome(
+                    report_id=report_id, verb="已回滾", by=username, ok=_ok,
+                )
+                return {"action": action, "approval_id": approval_id, "user": user, "success": _ok}
+
+        except Exception as _outer:
+            logger.exception("drift_action_handler_error", action=action, error=str(_outer))
+
+        return {"action": action, "approval_id": approval_id, "user": user, "success": False}
+
+    async def _send_drift_diff_detail(self, report_id: str) -> None:
+        """
+        Send the full drift diff to Telegram (response to the drift_view button).
+        Lists up to 50 items (HIGH, MEDIUM, and all others) with Git vs K8s values.
+        """
+        try:
+            from src.repositories.drift_repository import get_drift_repository
+            _rpt = await get_drift_repository().get_by_id(report_id)
+            if not _rpt:
+                await self._send_request("sendMessage", {
+                    "chat_id": settings.OPENCLAW_TG_CHAT_ID,
+                    "text": f"⚠️ 找不到 Drift report <code>{html.escape(report_id)}</code>",
+                    "parse_mode": "HTML",
+                })
+                return
+
+            _lines = [f"📊 <b>完整 Drift Diff</b> — <code>{html.escape(report_id)}</code>"]
+            _lines.append(f"Namespace: <code>{html.escape(_rpt.namespace)}</code>")
+            _lines.append(f"HIGH×{_rpt.high_count}  MEDIUM×{_rpt.medium_count}  INFO×{_rpt.info_count}")
+            _lines.append("━" * 20)
+            for i, _item in enumerate(_rpt.items[:50], 1):
+                _level = getattr(_item.drift_level, "value", str(_item.drift_level))
+                _emoji = "🔴" if _level == "high" else ("🟡" if _level == "medium" else "⚪")
+                _field = (_item.field_path or "")[:80]
+                _git = str(_item.git_value)[:40] if _item.git_value is not None else "(未設)"
+                _k8s = str(_item.actual_value)[:40] if _item.actual_value is not None else "(未設)"
+                _lines.append(f"{_emoji} <b>{html.escape(_field)}</b>")
+                _lines.append(f"   Git: <code>{html.escape(_git)}</code>")
+                _lines.append(f"   K8s: <code>{html.escape(_k8s)}</code>")
+            if len(_rpt.items) > 50:
+                _lines.append(f"… 還有 {len(_rpt.items) - 50} 項未顯示")
+
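+            # Telegram caps message text at 4096 characters. Cut on whole lines
+            # so the HTML never ends inside a <b>/<code> tag or an &amp; entity.
+            _kept: list[str] = []
+            _used = 0
+            for _ln in _lines:
+                if _used + len(_ln) + 1 > 3950:
+                    _kept.append("… (截斷)")
+                    break
+                _kept.append(_ln)
+                _used += len(_ln) + 1
+            _full = "\n".join(_kept)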
+
+            await self._send_request("sendMessage", {
+                "chat_id": settings.OPENCLAW_TG_CHAT_ID,
+                "text": _full,
+                "parse_mode": "HTML",
+                "disable_web_page_preview": True,
+            })
+        except Exception as _e:
+            logger.warning("drift_diff_detail_send_failed", report_id=report_id, error=str(_e))
+            await self._send_request("sendMessage", {
+                "chat_id": settings.OPENCLAW_TG_CHAT_ID,
+                "text": f"⚠️ Drift Diff 查詢失敗: <code>{html.escape(str(_e)[:150])}</code>",
+                "parse_mode": "HTML",
+            })
+
+    async def _edit_drift_card_outcome(
+        self, report_id: str, verb: str, by: str, ok: bool,
+    ) -> None:
+        """
+        After drift_adopt/drift_revert runs:
+          1. Remove the buttons from the original card (editMessageReplyMarkup)
+          2. Reply under the original card with the outcome (verb, actor, success or failure)
+        """
+        _icon = "✅" if ok else "❌"
+        _stamp = (
+            f"{_icon} <b>{html.escape(verb)}</b> by @{html.escape(by)} "
+            f"({'成功' if ok else '失敗'})\n"
+            f"Drift <code>{html.escape(report_id)}</code>"
+        )
+        _msg_id: int | None = None
+        try:
+            _msg_id_raw = await get_redis().get(f"tg_drift:{report_id}")
+            if _msg_id_raw:
+                _msg_id = int(_msg_id_raw)
+                # Remove the buttons first
+                await self._send_request("editMessageReplyMarkup", {
+                    "chat_id": settings.OPENCLAW_TG_CHAT_ID,
+                    "message_id": _msg_id,
+                    "reply_markup": {"inline_keyboard": []},
+                })
+        except Exception as _e:
+            logger.warning("drift_card_buttons_remove_failed", report_id=report_id, error=str(_e))
+
+        # Send the sign-off stamp (as a reply to the original card when msg_id is known)
+        try:
+            _payload: dict = {
+                "chat_id": settings.OPENCLAW_TG_CHAT_ID,
+                "text": _stamp,
+                "parse_mode": "HTML",
+            }
+            if _msg_id:
+                _payload["reply_to_message_id"] = _msg_id
+            await self._send_request("sendMessage", _payload)
+        except Exception as _e:
+            logger.warning("drift_outcome_stamp_send_failed", report_id=report_id, error=str(_e))
+
    # =========================================================================
    # ADR-075: TYPE-8M Meta-System 告警（飛輪/告警鏈路健康）
    # 2026-04-12 ogt
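`_send_drift_diff_detail` above escapes every interpolated value for HTML parse mode and must stay under Telegram's 4096-character message limit. The block below is a standalone sketch of the same rendering that truncates on whole lines. `DriftItem` is a hypothetical stand-in for the repository's item model and holds only the fields the diff reads. The 50-item cap and the 3950 budget are the values the diff uses.

```python
import html
from dataclasses import dataclass
from typing import Any, List, Optional

MAX_ITEMS = 50      # same item cap as _send_drift_diff_detail
TEXT_BUDGET = 3950  # same budget as the diff, below Telegram's 4096 limit


@dataclass
class DriftItem:
    """Hypothetical stand-in holding only the fields the diff reads."""
    field_path: str
    level: str  # "high", "medium", or anything else
    git_value: Optional[Any]
    actual_value: Optional[Any]


def _show(value: Optional[Any]) -> str:
    return "(unset)" if value is None else str(value)[:40]


def render_drift_diff(items: List[DriftItem]) -> str:
    lines: List[str] = []
    for item in items[:MAX_ITEMS]:
        emoji = {"high": "🔴", "medium": "🟡"}.get(item.level, "⚪")
        lines.append(f"{emoji} <b>{html.escape(item.field_path[:80])}</b>")
        lines.append(f"   Git: <code>{html.escape(_show(item.git_value))}</code>")
        lines.append(f"   K8s: <code>{html.escape(_show(item.actual_value))}</code>")
    if len(items) > MAX_ITEMS:
        lines.append(f"… {len(items) - MAX_ITEMS} more not shown")

    # Keep whole lines only, so a cut can never split a tag or an &lt; entity.
    kept: List[str] = []
    used = 0
    for line in lines:
        cost = len(line) + (1 if kept else 0)  # +1 for the joining newline
        if used + cost > TEXT_BUDGET:
            kept.append("… (truncated)")
            break
        kept.append(line)
        used += cost
    return "\n".join(kept)
```

Slicing the final string at a fixed offset could leave an unclosed `<code>` and make Telegram reject the whole message. Dropping whole lines keeps every tag balanced.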
@@ -2722,6 +2950,21 @@ class TelegramGateway:
            if guard_result is not None:
                return guard_result

+            # ===================================================================
+            # Step 1.85: 2026-04-19 ogt + Claude Opus 4.7. Handle drift_* buttons directly.
+            # Fixes Telegram subsystem bug TG-2: drift_view/drift_adopt/drift_revert
+            # previously had no handler, so a press stayed at "executing" forever or
+            # fell through and mis-triggered approve
+            if action in ("drift_view", "drift_adopt", "drift_revert"):
+                return await self._handle_drift_action(
+                    action=action,
+                    approval_id=approval_id,  # already the report_id
+                    callback_query_id=callback_query_id,
+                    user_id=user_id,
+                    username=username,
+                    user=user,
+                )
+
            # ===================================================================
            # Step 1.9: Phase 5 Sprint 5.3: routing for write-type actions from the category buttons
            # 2026-04-14 Claude Sonnet 4.6
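Step 1.85 sends the three drift_* actions to a dedicated handler before the nonce approve/reject dispatch sees them. `_handle_drift_action` then folds each service result into one boolean. The block below is a self-contained asyncio sketch of that routing. `route_callback`, `drift_handlers`, and `nonce_dispatch` are hypothetical stand-ins for the gateway's methods. The accepted result shapes (`{"success": ...}` or a bare truthy value) match what the handler in the diff accepts.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

DRIFT_ACTIONS = frozenset({"drift_view", "drift_adopt", "drift_revert"})


def normalize_success(result: Any) -> bool:
    # Services may return {"success": bool, ...} or a bare truthy/falsy value.
    if isinstance(result, dict):
        return bool(result.get("success"))
    return bool(result)


async def route_callback(
    action: str,
    report_id: str,
    drift_handlers: Dict[str, Callable[[str], Awaitable[Any]]],
    nonce_dispatch: Callable[[str, str], Awaitable[Dict[str, Any]]],
) -> Dict[str, Any]:
    """Send drift_* buttons to their own handlers; everything else falls through."""
    if action in DRIFT_ACTIONS:
        handler = drift_handlers.get(action)
        try:
            ok = normalize_success(await handler(report_id)) if handler else False
        except Exception:
            ok = False  # a failing service still yields a final, non-"executing" state
        return {"action": action, "approval_id": report_id, "success": ok}
    return await nonce_dispatch(action, report_id)
```

Checking drift actions first means no drift button can ever reach the approve path. That is the TG-2 failure mode.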
--- a/k8s/awoooi-prod/kustomization.yaml
+++ b/k8s/awoooi-prod/kustomization.yaml
@@ -39,7 +39,7 @@ resources:
 images:
 - name: 192.168.0.110:5000/library/api:IMAGE_TAG_PLACEHOLDER
   newName: 192.168.0.110:5000/awoooi/api
-  newTag: 6ad73b48345326756677d98e17bfaf72eec74f9d
+  newTag: 98aef55b3176827f9d4edfa47a70f6ba586af688
 - name: 192.168.0.110:5000/library/web:IMAGE_TAG_PLACEHOLDER
   newName: 192.168.0.110:5000/awoooi/web
-  newTag: 6ad73b48345326756677d98e17bfaf72eec74f9d
+  newTag: 98aef55b3176827f9d4edfa47a70f6ba586af688
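The CD bot's `chore(cd): deploy` commits rewrite `newTag` for both images in one change. The pipeline's actual mechanism is not shown here. The block below is a hypothetical sketch of the same edit applied to an already-parsed `images:` list, with YAML loading and dumping left out. From the command line, `kustomize edit set image` performs this kind of image override.

```python
from typing import Dict, Iterable, List


def bump_image_tags(
    images: List[Dict[str, str]], repos: Iterable[str], new_tag: str
) -> List[Dict[str, str]]:
    """Return a copy of a kustomization `images:` list with `newTag` set to
    `new_tag` on every entry whose `newName` is in `repos`."""
    wanted = set(repos)
    bumped: List[Dict[str, str]] = []
    for entry in images:
        updated = dict(entry)  # leave the caller's list untouched
        if updated.get("newName") in wanted:
            updated["newTag"] = new_tag
        bumped.append(updated)
    return bumped
```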
Author	SHA1	Message	Date
OG T	4b8be32610	fix(telegram+approval): TG-1 + AP-1/2/3: 4 Telegram UX fixes [Some checks failed: CD Pipeline / build-and-deploy (push) failing after 25m27s; Ansible Lint / lint (push) cancelled] 2026-04-19 early morning (Taipei time), ogt + Claude Opus 4.7 (1M) ## TG-1: add view to INFO_ACTIONS in security_interceptor.py. The 'view' button now uses the 2-part read format and no longer mis-triggers the 4-part nonce write format. ## AP-1: persist approval_records.telegram_message_id. After telegram_gateway.send_approval_card sends successfully, run UPDATE approval_records SET telegram_message_id, telegram_chat_id at the DB layer (not only Redis), so the original card can still be found after a pod restart. ## AP-2: edit the original card when an approval finishes, plus KM/Playbook deltas. Besides replying to the original card, approval_execution._push_execution_result_to_alert now also calls editMessageReplyMarkup to remove the buttons (fixes cards stuck at "executing"). - Also queries knowledge_entries/playbooks for additions in the last 2 min and appends "📚 KM +N 🎯 Playbook 更新×M" (KM +N, Playbook updated ×M) to the message - Success: ✅ execution succeeded + action + KM delta - Failure: ❌ execution failed + reason + KM delta ## AP-3: normalize primary_responsibility to reduce the share of "❓ unknown". openclaw._parse_analysis_result: if the LLM leaves the field empty/None or outside the whitelist (FE/BE/INFRA/DB/COLLAB), force a fallback: INFRA if a kubectl keyword is present, otherwise BE. Previously only "not in data" was checked, so None or an empty string slipped through. ## Skipped: TG-3 (refactor) + TG-5 (the webhook is a deprecated endpoint; the design uses long polling) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 01:15:58 +08:00
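OG T	68a42a3c97	fix(openclaw): hallucination validation covers both paths + extract a shared helper [Some checks failed: CD Pipeline / build-and-deploy (push) cancelled] 2026-04-19 early morning (Taipei time), ogt + Claude Opus 4.7 (1M) Root cause: the Python hallucination validator from commit `7e9448f` was only installed in `analyze_alert` (webhook path), but the incident sweeper goes through `generate_incident_proposal` (line 1552), which had no validation. As a result the 00:23 PostgreSQLDiskGrowthRate card showed a hallucinated "deployment/awoooi-prod" that was not caught. Fix: 1. Extract a shared method `_validate_deployment_inventory(result, inventory, ns)` 2. `analyze_alert` (around line 1322) calls this helper, replacing the old inline logic 3. `generate_incident_proposal` (line 1552) fetches the inventory dynamically and calls the helper 4. The helper additionally: - sets result.action_title = '[安全降級] 調查 {ns} 真實資源狀態' ("[Safe downgrade] Investigate the real resource state in {ns}"); previously only description was changed and action_title was not, so the DB action field still held the old text - wraps each field assignment in its own try/except so one failing field does not affect the others Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 01:11:09 +08:00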
OG T	fdce0a3ab9	fix(approval): NO_ACTION no longer mislabeled EXECUTION_FAILED (fixes MASTER §7.1 #11) [Some checks failed: CD Pipeline / build-and-deploy (push) cancelled] 2026-04-19 early morning (Taipei time), ogt + Claude Opus 4.7 (1M) Root cause: approval.action='NO_ACTION - 待分析' ("pending analysis", a downgrade artifact of the hallucination validator) was passed to parse_operation_from_action → operation_type=None → background_execution_skip → update_execution_status(success=False) → marked EXECUTION_FAILED. This polluted the KPI MASTER §7.1 #11, auto_execute success rate = EXECUTION_SUCCESS / (SUCCESS+FAILED). NO_ACTION should never count as a failure, yet it was counted and dragged the metric down; the measured 30-day success rate of 0.9% is largely caused by NO_ACTION mislabels. Fix: when parsing fails, first check whether the action is NO_ACTION-like (contains NO_ACTION/OBSERVE/INVESTIGATE or similar keywords) and, if so, take a dedicated noop branch: - log event=background_execution_noop (info level) - update_execution_status(success=True) → EXECUTION_SUCCESS - mark the timeline with ✅ observe-only action completed - reply to the original alert card showing success - return True. Genuine parse failures (not NO_ACTION) keep the original failure path but now set error_message (P0.2 extension), so rejection_reason reads "Could not parse operation type from action: <action>" instead of an empty string. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 01:08:16 +08:00
OG T	2e988bdb81	fix(telegram): post drift execution results back to the card + audit log user_id [Some checks failed: CD Pipeline / build-and-deploy (push) cancelled] The IDE flagged _stamp as unused (the result was never sent) and user_id as unused (an audit gap). Fix: 1. _edit_drift_card_outcome no longer only removes the buttons; it also sends a sign-off stamp message (as a reply to the original card when msg_id exists), formatted as: ✅ 已採納 by @username (成功) Drift <report_id> (i.e. "adopted by @username (succeeded)") 2. _handle_drift_action adds a drift_callback_dispatched log entry (audit) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 01:07:13 +08:00
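OG T	877c8479e0	fix(telegram): TG-2 + TG-4, fix the drift button black hole [Some checks failed: CD Pipeline / build-and-deploy (push) cancelled] 2026-04-19 early morning (Taipei time), ogt + Claude Opus 4.7 (1M) From the commander's screenshot: pressing "查看 Diff" (View Diff) turns the card into "執行中" (executing), and the remaining 21 items cannot be seen. A full audit found 9 bugs in the Telegram subsystem; this commit fixes the 2 most painful ones: ## TG-2: the 3 buttons drift_view/drift_adopt/drift_revert have no handler. A click falls through, producing a UX black hole or mis-triggering the approve path. Fix: handle_callback adds a Step 1.85 bypass after the state guard (after line 2752) that sends the 3 drift_* actions to the dedicated _handle_drift_action instead of the nonce approve/reject dispatch, so no execution flow is triggered by accident. The 3 buttons: - drift_view: read drift_reports and send a new message listing all items (HIGH/MEDIUM/INFO emoji + Git vs K8s raw values side by side, capped at 50 items / 4000 chars) - drift_adopt: call drift_adopt_service.adopt_drift() - drift_revert: call drift_remediator.revert() ## TG-4: the drift card message_id was not stored in Redis, so the card could not be edited afterwards. Fix: after send_drift_card succeeds, setex f"tg_drift:{incident_id}" with a 24h TTL, so _edit_drift_card_outcome can update the original card after adopt/revert (first remove the buttons, then add an "XX by @username (succeeded/failed)" sign-off stamp). ## Not included (follow-up): TG-1 extend INFO_ACTIONS (view): next commit; TG-3 duplicate handler dispatch: under evaluation; TG-5 Bot webhook URL not set: needs the commander's decision on a public URL; approval card NO_ACTION mislabeled FAILED: next commit; approval card description contradictions / unknown responsibility / post-execution edit. The full list of 9 bugs is in project_phase7_round3_telegram_subsystem_audit (to be created). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 01:06:30 +08:00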
AWOOOI CD	41e6b503e2	chore(cd): deploy `98aef55` [skip ci]	2026-04-18 16:11:01 +00:00
OG T	98aef55b31	feat(kpi): ADR-090-D, fix all 5 broken links in the MASTER §7.1 north-star KPIs [Some checks failed: CD Pipeline / build-and-deploy (push) successful in 11m49s; run-migration / migrate (push) failing after 15s] 2026-04-18 evening (Taipei time), ogt + Claude Opus 4.7 (1M) Measuring the 15 north-star KPIs in MASTER §7.1 against their targets found 5 broken links: #3 fine-tune JSONL /week: the finetune_exports table does not exist; #4 MCP calls/24h: timeline_events has no mcp_call event_type; #6 declarative remediation usage: the remediation_events table does not exist; #7 general catch-all at 17.3%: classify_alert_early misses 5 categories; #10 notification_outcomes /week: the table does not exist. This commit fixes all of them. ## 1. Migration: adr090d_kpi_data_sources.sql (3 tables) - finetune_exports: P3 fine-tune JSONL tracking - remediation_events: P5 declarative remediation tracking - notification_outcomes: notification quality + RLHF corpus. Idempotent (CREATE TABLE IF NOT EXISTS), already applied to prod. ## 2. classify_alert_early gains 4 rule categories (to shrink the general catch-all) - test interception: Test/FPTest/FingerprintTest/ADR089Test/L4Closure/FreshUniq* → category='test', TYPE-1 notification only - HighCPU/Memory/Disk/Load → host_resource - TLS/SSL/ProbeFailure* → ssl_cert - PostgreSQL/MySQL/MongoDB/DiskGrowthRate → database. Expected general share: 17.3% → 3-5% (target <10%). ## 3. finetune_exporter DB write: at the end of _run_export(), write one finetune_exports row with checksum/size/record_count. ## 4. declarative_remediation DB write: after evaluate(), a fire-and-forget _log_remediation_event() writes remediation_events (status='pending'; remediation_type derived from the tier as declarative/imperative/gitops_pr). ## 5. telegram_gateway DB write (send_approval_card): after _send_request returns a message_id, write one notification_outcomes row with channel='telegram', delivery_status='delivered|failed'. When a human presses a button later, user_action will be updated, which becomes gold-standard RLHF training data. ## 6. pre_decision_investigator MCP call tracking: _call_single_tool() writes a timeline_events row in its finally block with event_type='mcp_call' and provider/tool/status/duration_ms/error, so MCP calls over 24h become measurable in SQL. ## Expected quantitative improvement (KPI: before → expected 24h after the fix): #3 fine-tune /week: 0 (table missing) → >=10 (weekly cron); #4 MCP calls/24h: 0 → >0 (written to timeline); #6 declarative usage: table missing → data present (pending/success/failed distribution); #7 general catch-all: 17.3% → <10%; #10 notification_outcomes: 0 → one row per approval card. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 00:00:31 +08:00
AWOOOI CD	805230436d	chore(cd): deploy `898145d` [skip ci]	2026-04-18 15:38:17 +00:00
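The AP-3 fix in 4b8be32610 and the NO_ACTION fix in fdce0a3ab9 are both small string normalizations. The block below sketches each under stated assumptions. Taken from the log: the whitelist, the kubectl → INFRA / otherwise → BE fallback, and the three markers. Assumed here: that the kubectl check runs against the action text, the case-folding, and the marker tuple being complete (the message lists NO_ACTION/OBSERVE/INVESTIGATE followed by "etc."). The function names are hypothetical, not the ones in openclaw or approval_execution.

```python
from typing import Optional

RESPONSIBILITY_WHITELIST = frozenset({"FE", "BE", "INFRA", "DB", "COLLAB"})
# Partial list: the commit names these three and says "etc.".
NOOP_MARKERS = ("NO_ACTION", "OBSERVE", "INVESTIGATE")


def normalize_responsibility(value: Optional[str], action_text: Optional[str]) -> str:
    """Never return an empty or unknown owner; fall back on a kubectl heuristic."""
    candidate = (value or "").strip().upper()  # case-folding is an assumption
    if candidate in RESPONSIBILITY_WHITELIST:
        return candidate
    return "INFRA" if "kubectl" in (action_text or "").lower() else "BE"


def is_noop_action(action: Optional[str]) -> bool:
    """True for observe-only actions that should count as success, not failure."""
    upper = (action or "").upper()
    return any(marker in upper for marker in NOOP_MARKERS)
```

The `value or ""` guard closes the gap the AP-3 message describes: a key that exists but holds None or "" no longer passes through as a valid owner.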