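feat(drift+target): P0.1+P0.2+P0.3 three fixes: drift pagination and bucketing + AI recommendation + target source tracing

Decision on the Commander's three questions: do all three; the AI recommendation with its 0.85 threshold is display-only, never automatic; check aol first, then fix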

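## RCA: source of the awoooi-service failure
- /api/v1/aiops/kpi shows 1 playbook_executed record with actor=approval_execution status=failed in the past 24h
- grep of the codebase: no code hardcodes awoooi-service (only historical comments)
- Most likely source: alert_rule_engine._extract_vars takes the labels.service value as the Deployment name
- cf5050c/4f2e122 (2026-04-18) already fixed the two NEMOTRON hallucination paths; this commit fixes the third path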

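## Fixes
### P0.3a alert_rule_engine._extract_vars
- labels.service is demoted: a value ending in -service has the suffix stripped first and the rest is treated as the base name
- match_rule's return value gains a target_source field for tracing
- If awoooi-service recurs, the source is directly visible (label.service(stripped) and so on)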
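
The suffix handling described above can be sketched in isolation as follows. This is a minimal illustration, not the project's code: `resolve_service_label` is a hypothetical helper name, and the `_is_bad_target` check that the real function applies first is left out.

```python
def resolve_service_label(svc: str) -> tuple[str, str]:
    """Return (target, target_source) for a labels.service value.

    Hypothetical helper: mirrors the demoted service-label branch only.
    """
    suffix = "-service"
    if svc.endswith(suffix):
        # Keep the full name if stripping would leave an empty string.
        base = svc[: -len(suffix)] or svc
        return base, "label.service(stripped -service suffix)"
    return svc, "label.service(demoted)"


if __name__ == "__main__":
    print(resolve_service_label("awoooi-service"))
    print(resolve_service_label("redis"))
```

Both branches return a `target_source` that starts with `label.service`, so a downstream check can treat any such target as needing validation before execution.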

### P0.3c approval_execution._log_aol_started.input
- Adds the parsed_target / parsed_operation / parsed_namespace fields
- Future aol queries on failed records show the target directly, with no guesswork

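### P0.1 telegram_gateway._send_drift_diff_detail
- Pagination (10 items per page) replaces dumping all 30 items in one message
- Header shows counts for 3 buckets: manual high-risk / routine change / K8s automatic
- ⬅️/➡️ paging buttons at the bottom (callback: drift_view_page:{report_id}_{page})
- security_interceptor INFO_ACTIONS adds drift_view_page to the allowlist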
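
The paging arithmetic and the callback payload round trip can be sketched as below. The function names are hypothetical and the sketch runs without Telegram; it only assumes the `drift_view_page:{report_id}_{page}` format stated above.

```python
PAGE_SIZE = 10  # items per page, as stated in the commit message


def page_bounds(total: int, page: int, page_size: int = PAGE_SIZE) -> tuple[int, int, int, int]:
    """Clamp the requested page and return (page, total_pages, start, end)."""
    total_pages = max(1, (total + page_size - 1) // page_size)  # ceiling division
    page = max(0, min(page, total_pages - 1))
    start = page * page_size
    end = min(start + page_size, total)
    return page, total_pages, start, end


def build_callback(report_id: str, page: int) -> str:
    """Build the 2-part callback string: action, colon, payload."""
    return f"drift_view_page:{report_id}_{page}"


def parse_payload(payload: str) -> tuple[str, int]:
    """Split '{report_id}_{page}' on the LAST underscore; fall back to page 0."""
    report_id, _, page_str = payload.rpartition("_")
    try:
        return (report_id or payload), int(page_str)
    except ValueError:
        return payload, 0


if __name__ == "__main__":
    print(page_bounds(30, 1))
    print(parse_payload(build_callback("drift_ab_12", 2).split(":")[1]))
```

Splitting on the last underscore keeps report IDs that contain underscores intact, and an out-of-range page is clamped rather than rejected, so a stale button still shows a valid page.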

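### P0.2 drift_narrator recommendation
- LLM prompt gains a recommendation field (action/confidence/reason)
- action ∈ {adopt, revert, ignore, investigate}
- The top of the card shows 「🎯 AI 建議: 回滾 (85%) — reason」 (AI recommendation: revert)
- On LLM failure, falls back to _fallback_recommendation (rule-based mapping from intent)
- Card diff_summary limit raised from 500 to 1500 characters to fit the recommendation + narrative + items
- Commander's directive: display only, no automatic execution (the 0.85 threshold is reserved for future use)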
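
Because the recommendation comes from an LLM, it has to be validated before it is rendered. The sketch below shows one way to do that under the rules listed above; `normalize_recommendation` is a hypothetical name, and an empty dict stands for "no usable recommendation, use the rule-based fallback".

```python
ALLOWED_ACTIONS = ("adopt", "revert", "ignore", "investigate")


def normalize_recommendation(rec: object) -> dict:
    """Validate an LLM-supplied recommendation; return {} if unusable."""
    if not isinstance(rec, dict):
        return {}
    action = str(rec.get("action", "")).strip().lower()
    if action not in ALLOWED_ACTIONS:
        return {}
    try:
        confidence = float(rec.get("confidence", 0.0))
    except (TypeError, ValueError):
        confidence = 0.0
    # Clamp into [0.0, 1.0] so the rendered percentage stays within 0-100.
    confidence = max(0.0, min(1.0, confidence))
    return {
        "action": action,
        "confidence": confidence,
        "reason": str(rec.get("reason", ""))[:200],  # cap reason length
    }


if __name__ == "__main__":
    print(normalize_recommendation({"action": " Revert ", "confidence": "1.7", "reason": "x"}))
```

Nothing here acts on the confidence value; consistent with the directive above, it is only carried along for display.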

## Verification
- All 90 pytest tests pass (drift + rule_engine + approval_execution)
- AST syntax check passes on 5 files

## Next acceptance checks
1. Next drift trigger → the top of the card shows 「🎯 AI 建議」
2. Pressing drift_view → 3-bucket header + ⬅️/➡️
3. If awoooi-service recurs → query automation_operation_log.input.parsed_target directly

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Your Name
2026-04-20 04:01:30 +08:00
parent 8d40bbff2b
commit 54d60d04f5
6 changed files with 319 additions and 37 deletions


@@ -163,15 +163,19 @@ def _extract_vars(alert_context: dict) -> dict[str, str]:
"""
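Extract template variables from alert_context.
GAP-A4 (2026-04-14 Claude Sonnet 4.6): strengthened target parsing, added a multi-layer label lookup order:
GAP-A4 (2026-04-14 Claude Sonnet 4.6): strengthened target parsing, multi-layer label lookup order:
1. labels.deployment (most authoritative)
2. labels.app / labels.app.kubernetes.io/name
3. labels.statefulset
4. labels.pod → strip the replicaset/pod hash suffix
5. labels.container / labels.name
6. labels.service
6. labels.service (demoted 2026-04-20: a K8s Service name != Deployment name,
now recorded as target_source=label.service so downstream can trigger pre-flight validation on suspicious targets)
7. target_resource (excluding IP:port and alertname)
The target_source field is returned so the decision/execution layers can trace the origin.
If this P0.3 trace is not clear enough, the next trigger will have aol.context.target_source.
If all extraction fails → target="unknown", and match_rule()'s post-validation discards this rule.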
"""
labels = alert_context.get("labels", {})
@@ -184,10 +188,12 @@ def _extract_vars(alert_context: dict) -> dict[str, str]:
# GAP-A4: multi-layer label lookup, from most authoritative to weakest
target = ""
target_source = "" # 2026-04-20: track which label the target came from (kept as an aol audit trail)
for key in ("deployment", "app", "app.kubernetes.io/name", "statefulset"):
val = labels.get(key, "")
if val and not _is_bad_target(val, alertname):
target = val
target_source = f"label.{key}"
break
# The pod label needs its hash suffix stripped to recover the Deployment name
@@ -195,26 +201,45 @@ def _extract_vars(alert_context: dict) -> dict[str, str]:
pod = labels.get("pod", "")
if pod and not _is_bad_target(pod, alertname):
target = _strip_pod_suffix(pod)
target_source = "label.pod(stripped)"
# container / name are the next-best choices
if not target:
for key in ("container", "name", "service"):
for key in ("container", "name"):
val = labels.get(key, "")
if val and not _is_bad_target(val, alertname):
target = val
target_source = f"label.{key}"
break
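# service label comes last (a K8s Service name is not a Deployment name and often yields the awoooi-service hallucination)
# 2026-04-20 P0.3: if service ends with '-service', strip the suffix first and treat the rest as the base name
# target_source keeps the label.service prefix in both branches so downstream can trigger pre-flight validation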
if not target:
svc = labels.get("service", "")
if svc and not _is_bad_target(svc, alertname):
if svc.endswith("-service"):
# awoooi-service → awoooi (usually maps to awoooi-api or similar; still needs pre-flight validation)
target = svc[: -len("-service")] or svc
target_source = "label.service(stripped -service suffix)"
else:
target = svc
target_source = "label.service(demoted)"
# raw_target comes last (and must pass the bad_target check)
if not target and not _is_bad_target(raw_target, alertname):
target = raw_target
target_source = "alert_context.target_resource"
# If everything fails → keep "unknown" so the post-validation layer rejects it
if not target:
target = "unknown"
target_source = "none(fallback)"
container = labels.get("name", labels.get("container", "")) or target
return {
"target": target,
"target_source": target_source, # 2026-04-20 P0.3: new field
"host": host,
"container": container,
"instance": instance,
@@ -410,6 +435,7 @@ def match_rule(alert_context: dict) -> dict[str, Any] | None:
"suggested_action": resp["suggested_action"],
"kubectl_command": kubectl_command,
"target_resource": vars["target"],
"target_source": vars.get("target_source", ""), # 2026-04-20 P0.3: audit trail
"namespace": vars["namespace"],
"risk_level": risk,
"blast_radius": {


@@ -1096,18 +1096,37 @@ class ApprovalExecutionService:
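Writes a 'pending' record to automation_operation_log and returns the op_id for _log_aol_completed to update.
On failure (DB error) returns None and the main flow continues; aol writes never block execution.
2026-04-20 P0.3: input now includes target / operation_type / namespace
On failure, aol.input shows the target directly (a source trace for awoooi-service-style misjudgments)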
"""
try:
from sqlalchemy import text as _sql
from src.db.base import get_db_context
import json as _json
# 2026-04-20 P0.3: first try to parse target / op_type from action; a parse failure does not block
_parsed_target: str | None = None
_parsed_op: str | None = None
_parsed_ns: str | None = None
try:
_parsed = parse_operation_from_action(approval.action or "")
_parsed_target = _parsed.resource_name
_parsed_op = _parsed.operation_type.value if _parsed.operation_type else None
_parsed_ns = _parsed.namespace
except Exception:
pass
input_payload = {
"approval_id": str(approval.id),
"incident_id": approval.incident_id or "",
"action": (approval.action or "")[:500],
"risk_level": getattr(approval, "risk_level", None) or "",
"requested_by": getattr(approval, "requested_by", "") or "",
# 2026-04-20 P0.3: target source trace
"parsed_target": _parsed_target or "",
"parsed_operation": _parsed_op or "",
"parsed_namespace": _parsed_ns or "",
}
async with get_db_context() as db:


@@ -55,6 +55,10 @@ TRIGGER_MEDIUM_MIN = 3
# ============================================================
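# 2026-04-18 ogt + Claude Opus 4.7: Plan B, LLM-driven smart summary (replaces Python str()[:30] truncation)
# Architecture rule: drop hardcoded Python string parsing; feed the structured diff directly to the LLM, which produces a Traditional Chinese Top 5 summary
# 2026-04-20 P0.2 ogt + Claude Opus 4.7: add recommendation output (the LLM recommends which button to press)
# - action ∈ {adopt, revert, ignore, investigate}
# - confidence 0.0-1.0 (Commander's directive: no auto-execute for now; the 0.85 threshold is reserved for the future)
# - reason: a one-line explanation in Traditional Chinese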
_NARRATIVE_PROMPT = """你是 AWOOOI SRE 維運助理。以下是 K8s Config Drift 報告的原始結構化資料。
## 漂移項目原始資料JSON
@@ -72,9 +76,20 @@ _NARRATIVE_PROMPT = """你是 AWOOOI SRE 維運助理。以下是 K8s Config Dri
"field": "簡化後的欄位路徑 (40 字內)",
"summary": "30 字內繁體中文口語摘要,說明從什麼變成什麼"
}}
]
],
"recommendation": {{
"action": "adopt 或 revert 或 ignore 或 investigate",
"confidence": 0.85,
"reason": "一行繁體中文解釋為何推薦此動作(含關鍵證據)"
}}
}}
## recommendation action 語意
- adopt: 現狀合理,應把 K8s 狀態寫回 Git (例HPA 自動擴縮、緊急 hotfix 已驗證)
- revert: 漂移有風險,應回滾到 Git 狀態 (例image tag 被誤改、secret 被外部改)
- ignore: 噪音K8s controller 自動補齊 (例:空 list/dict 差異)
- investigate: 不確定,需要人工查清楚
## 規則
- 繁體中文
- items 最多挑 5 筆最重要的HIGH 優先)
@@ -84,6 +99,7 @@ _NARRATIVE_PROMPT = """你是 AWOOOI SRE 維運助理。以下是 K8s Config Dri
- "新增 pod anti-affinity 規則"
- 禁止 markdown、反引號、emoji
- 只輸出純 JSON,不要包在 code block 裡
- recommendation.confidence 要誠實HIGH drift 且意圖不明 → 0.3-0.5trivial noise → 0.9
"""
@@ -126,8 +142,9 @@ class DriftNarratorService:
return
# 2026-04-18 Plan B: the LLM produces the narrative + structured items together (replaces str()[:30])
narrative, items = await self._generate_narrative_and_items(report, interpretation)
await self._send_telegram(report, narrative, items)
# 2026-04-20 P0.2: also return recommendation (action/confidence/reason)
narrative, items, recommendation = await self._generate_narrative_and_items(report, interpretation)
await self._send_telegram(report, narrative, items, recommendation)
# 寫入 DB narrative_text (Phase 30 ADR-067)
try:
@@ -162,13 +179,17 @@ class DriftNarratorService:
self,
report: "DriftReport",
interpretation: "DriftInterpretation | None",
) -> tuple[str, list[dict]]:
) -> tuple[str, list[dict], dict]:
"""
2026-04-18 ogt + Claude Opus 4.7: Plan B, the LLM generates the narrative + structured items
2026-04-20 P0.2 ogt + Claude Opus 4.7: adds recommendation (AI-recommended button)
Returns (narrative, items):
Returns (narrative, items, recommendation):
narrative: 4-5 lines of Traditional Chinese prose
items: [{level, field, summary}, ...] at most 5 entries
recommendation: {action, confidence, reason}
action ∈ {adopt, revert, ignore, investigate}
confidence 0.0-1.0 (Commander's directive: no auto-execute for now; shown only for the Commander's reference)
On LLM failure, falls back to Python smart truncation (not a brute-force str()[:30] cut)
@@ -189,6 +210,7 @@ class DriftNarratorService:
started_ms = time.time()
narrative: str = ""
items: list[dict] = []
recommendation: dict = {} # 2026-04-20 P0.2
raw_response: str | None = None
provider: str = "unknown"
status: str = "failed"
@@ -260,6 +282,33 @@ class DriftNarratorService:
if not clean_items:
clean_items = self._fallback_items(report)
# 2026-04-20 P0.2: parse recommendation (if the LLM provided one)
_rec = None
try:
if isinstance(_parsed, dict):
_rec = _parsed.get("recommendation")
# Path 2 case: recommendation may also be nested inside _inner
if _rec is None and _parsed.get("description", "").startswith("{"):
_inner_txt = str(_parsed["description"]).strip()
_inner = _json.loads(_inner_txt)
if isinstance(_inner, dict):
_rec = _inner.get("recommendation")
except (_json.JSONDecodeError, ValueError, KeyError):
_rec = None
if isinstance(_rec, dict) and _rec.get("action"):
_act = str(_rec.get("action", "")).strip().lower()
if _act in ("adopt", "revert", "ignore", "investigate"):
try:
_conf = float(_rec.get("confidence", 0.0))
except (TypeError, ValueError):
_conf = 0.0
_conf = max(0.0, min(1.0, _conf))
recommendation = {
"action": _act,
"confidence": _conf,
"reason": str(_rec.get("reason", ""))[:200],
}
narrative = _parsed_narrative
items = clean_items
status = "success"
@@ -277,6 +326,10 @@ class DriftNarratorService:
items = self._fallback_items(report)
status = "failed"
# 2026-04-20 P0.2: if the LLM gave no recommendation, use the Python fallback
if not recommendation:
recommendation = self._fallback_recommendation(report, interpretation)
# ADR-090-C: also write the DB audit record (never propagate errors; protects the main flow)
duration_ms = int((time.time() - started_ms) * 1000)
try:
@@ -294,7 +347,50 @@ class DriftNarratorService:
except Exception as e:
logger.warning("drift_narrator_audit_write_failed", error=str(e))
return narrative, items
return narrative, items, recommendation
def _fallback_recommendation(
self,
report: "DriftReport",
interpretation: "DriftInterpretation | None",
) -> dict:
"""
2026-04-20 P0.2 ogt + Claude Opus 4.7: Python fallback for when the LLM gives no recommendation
Rule-based recommendation (conservative):
- everything trivial/allowlisted → ignore (0.8)
- HIGH drift + intent=emergency_hotfix → adopt (0.5) (uncertain, so lower confidence)
- HIGH drift + intent=human_error → revert (0.7)
- otherwise → investigate (0.4) (ask for manual intervention)
"""
actionable = self._count_nontrivial_drift(report)
if actionable == 0:
return {
"action": "ignore",
"confidence": 0.8,
"reason": "全部為白名單或 K8s 預設值補齊,無實質變更。",
}
_has_high = report.high_count > 0
_intent = interpretation.intent.value if interpretation else "unknown"
if _has_high and _intent == "emergency_hotfix":
return {
"action": "adopt",
"confidence": 0.5,
"reason": "HIGH drift 但意圖分析為緊急 hotfix建議採納並補 Git請人工複核",
}
if _has_high and _intent == "human_error":
return {
"action": "revert",
"confidence": 0.7,
"reason": "HIGH drift 且意圖分析為人為誤操作,建議回滾 Git 狀態。",
}
return {
"action": "investigate",
"confidence": 0.4,
"reason": f"{actionable} 項可操作漂移,意圖={_intent},需人工查清楚再決定。",
}
async def _log_ai_action_to_db(
self,
@@ -540,24 +636,29 @@ class DriftNarratorService:
report: "DriftReport",
narrative: str,
items: list[dict],
recommendation: dict | None = None,
) -> None:
"""
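Push the TYPE-4D Config Drift card (ADR-075) + Plan B smart summary
2026-04-18 ogt + Claude Opus 4.7: switched to LLM-produced structured items,
replacing the garbage produced by brute-force str()[:30] truncation
2026-04-20 P0.2 ogt + Claude Opus 4.7: recommendation is shown at the top of the card
(Commander's directive: no auto-execute for now; display only, so the reader sees at a glance which button to press)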
"""
from src.services.telegram_gateway import get_telegram_gateway
diff_summary = self._render_telegram_body(report, narrative, items)
diff_summary = self._render_telegram_body(report, narrative, items, recommendation)
try:
tg = get_telegram_gateway()
# 2026-04-20 P0.2: limit raised from 500 to 1500 characters so the AI recommendation + narrative + items all fit
# send_drift_card has been relaxed in step to a 1500-character HTML display limit
await tg.send_drift_card(
incident_id=report.report_id,
approval_id=report.report_id,
resource_name=report.namespace,
diff_summary=diff_summary[:500],
diff_summary=diff_summary[:1500],
detected_at="",
)
except Exception as e:
@@ -603,21 +704,38 @@ class DriftNarratorService:
report: "DriftReport",
narrative: str,
items: list[dict],
recommendation: dict | None = None,
) -> str:
"""
Assemble the Telegram card body (Plan B format)
Assemble the Telegram card body (Plan B format + P0.2 AI recommendation)
Example output:
🎯 AI 建議:⏪ 回滾 (85%) — image tag 被手動改到未驗證版本
🤖 AI 研判
volumes 與 affinity 被手動修改...
📊 漂移明細 (HIGH: 1 | MEDIUM: 29)
🔴 spec.template.spec.volumes: 新增 2 項 repair-ssh-key 掛載
🟡 spec.template.spec.serviceAccount: (未設) → awoooi-executor
🟡 spec.template.spec.affinity.podAntiAffinity: 新增 preferred 規則
... 還有 27 項
"""
lines = [f"🤖 AI 研判\n{narrative}\n"]
lines = []
# 2026-04-20 P0.2 AI recommendation (at the top; recommendation only, never auto-executed)
if recommendation and recommendation.get("action"):
_act = recommendation["action"]
_conf = float(recommendation.get("confidence", 0.0))
_reason = recommendation.get("reason", "")
_emoji_action = {
"adopt": "✅ 採納",
"revert": "⏪ 回滾",
"ignore": "🔕 忽略",
"investigate": "🔍 人工調查",
}.get(_act, _act)
lines.append(f"🎯 AI 建議:{_emoji_action} ({int(_conf * 100)}%) — {_reason}\n")
lines.append(f"🤖 AI 研判\n{narrative}\n")
# Display the count of actually actionable items (non-trivial and not allowlisted)
actionable = self._count_nontrivial_drift(report)


@@ -435,7 +435,9 @@ class TelegramSecurityInterceptor:
- Format 2: {action, incident_id, is_info_action: True}
"""
# 2026-04-01 Claude Code (ADR-050): support read-only info actions (2-part format)
INFO_ACTIONS = {"detail", "reanalyze", "history"}
# 2026-04-20 P0.1 ogt + Claude Opus 4.7: drift_view_page added to INFO_ACTIONS
# payload format: drift_view_page:{report_id}_{page} (underscore-separated, so it does not clash with the colon)
INFO_ACTIONS = {"detail", "reanalyze", "history", "drift_view_page"}
parts = callback_data.split(":")
if len(parts) == 2 and parts[0] in INFO_ACTIONS:
return {


@@ -1961,7 +1961,8 @@ class TelegramGateway:
# Root cause: <pre> renders as a code block in Telegram HTML mode, but diff_summary is the AI
# assessment narrative + an emoji list (not code) and should be shown as plain text
# Diff length handling (ADR-071, Section 14.9.6)
if len(diff_summary) <= 500:
# 2026-04-20 P0.2 ogt + Claude Opus 4.7: 500 → 1500 so the AI recommendation + narrative + items display in full
if len(diff_summary) <= 1500:
diff_block = f"\n━━━━━━━━━━━━━━━━━━━\n{html.escape(diff_summary)}"
else:
web_url = f"https://aiops.wooo.work/incidents/{incident_id}/drift-diff"
@@ -2084,10 +2085,37 @@ class TelegramGateway:
return {"action": action, "approval_id": approval_id, "user": user, "success": False}
async def _send_drift_diff_detail(self, report_id: str) -> None:
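# 2026-04-20 P0.1 ogt + Claude Opus 4.7: drift_view pagination + classification buckets
# Old logic: _send_drift_diff_detail listed up to 3800 characters at once → 30 items flooding the chat
# New logic: 10 items per page, header shows counts for the 3 buckets, ⬅️/➡️ buttons switch pages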
_DRIFT_PAGE_SIZE = 10
def _classify_drift_item(self, item) -> str:
"""
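Send the full Drift Diff to Telegram (response to the drift_view button)
Shows all items (HIGH + MEDIUM, grouped into actionable + trivial)
Classify a drift item into one of 3 buckets (rule-based, no LLM call, to save tokens):
- k8s_default: auto-filled by a K8s controller (allowlisted, or empty↔empty)
- human_high: HIGH level and non-trivial (e.g. image/env/ports changed by hand)
- routine_medium: MEDIUM and non-trivial (routine configuration adjustments)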
"""
level = getattr(item.drift_level, "value", str(item.drift_level))
# Allowlisted or trivial → K8s auto-fill
if item.is_allowlisted:
return "k8s_default"
_g, _a = item.git_value, item.actual_value
_empty_g = _g is None or str(_g).strip() in ("", "{}", "[]", "null", "None")
_empty_a = _a is None or str(_a).strip() in ("", "{}", "[]", "null", "None")
if _empty_g and _empty_a:
return "k8s_default"
if level == "high":
return "human_high"
return "routine_medium"
async def _send_drift_diff_detail(self, report_id: str, page: int = 0) -> None:
"""
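Send a paginated Drift Diff to Telegram (response to the drift_view / drift_view_page buttons)
Each page holds _DRIFT_PAGE_SIZE items; the header shows the 3 bucket counts + page position,
and the bottom carries the 「⬅️ 上頁 / 下頁 ➡️」 (previous / next page) buttons (callback: drift_view_page:{report_id}_{page}).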
"""
try:
from src.repositories.drift_repository import get_drift_repository
@@ -2100,25 +2128,44 @@ class TelegramGateway:
})
return
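# 2026-04-19 ogt + Claude Opus 4.7: fix for the real cause of the HTTP 400
# Old logic: _full[:3950] cut in the middle of an HTML tag/entity → rejected by Telegram parse_mode HTML
# Fix: accumulate length item by item and stop past 3800, keeping the HTML structure intact
# (3800 leaves a 250 buffer for the header + truncation notice)
_MAX_LEN = 3800
# 1. Classify & sort (HIGH first → routine → trivial)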
_classified: list[tuple[str, object]] = [
(self._classify_drift_item(_it), _it) for _it in _rpt.items
]
_bucket_order = {"human_high": 0, "routine_medium": 1, "k8s_default": 2}
_classified.sort(key=lambda x: _bucket_order[x[0]])
_bucket_counts = {"human_high": 0, "routine_medium": 0, "k8s_default": 0}
for _bk, _ in _classified:
_bucket_counts[_bk] += 1
_total = len(_classified)
_total_pages = max(1, (_total + self._DRIFT_PAGE_SIZE - 1) // self._DRIFT_PAGE_SIZE)
_page = max(0, min(page, _total_pages - 1))
_start = _page * self._DRIFT_PAGE_SIZE
_end = min(_start + self._DRIFT_PAGE_SIZE, _total)
_slice = _classified[_start:_end]
# 2. Header (AI classification buckets)
_header = [
f"📊 <b>完整 Drift Diff</b> — <code>{html.escape(report_id)}</code>",
f"📊 <b>Drift Diff (頁 {_page + 1}/{_total_pages})</b> — <code>{html.escape(report_id)[:24]}</code>",
f"Namespace: <code>{html.escape(_rpt.namespace)}</code>",
f"HIGH×{_rpt.high_count} MEDIUM×{_rpt.medium_count} INFO×{_rpt.info_count}",
(
f"🔴 人工高風險 {_bucket_counts['human_high']} | "
f"🟡 一般修改 {_bucket_counts['routine_medium']} | "
f"🔧 K8s 自動 {_bucket_counts['k8s_default']}"
),
"" * 20,
]
_lines = list(_header)
_MAX_LEN = 3800
_used_len = sum(len(s) + 1 for s in _header)
_shown = 0
for _item in _rpt.items:
_level = getattr(_item.drift_level, "value", str(_item.drift_level))
_emoji = "🔴" if _level == "high" else ("🟡" if _level == "medium" else "")
# 3. Items on this page (each still honors the _MAX_LEN cap; for extremely long values, stop early rather than flood the chat)
_rendered = 0
_bucket_emoji = {"human_high": "🔴", "routine_medium": "🟡", "k8s_default": "🔧"}
for _bk, _item in _slice:
_emoji = _bucket_emoji[_bk]
_field = (_item.field_path or "")[:80]
_git = str(_item.git_value)[:40] if _item.git_value is not None else "(未設)"
_k8s = str(_item.actual_value)[:40] if _item.actual_value is not None else "(未設)"
@@ -2131,22 +2178,42 @@ class TelegramGateway:
break
_lines.append(_block)
_used_len += len(_block) + 1
_shown += 1
_rendered += 1
_remaining = len(_rpt.items) - _shown
if _remaining > 0:
_lines.append(f"… 還有 {_remaining} 項未顯示")
_skipped_in_page = len(_slice) - _rendered
if _skipped_in_page > 0:
_lines.append(f"本頁還有 {_skipped_in_page}過長未顯示,請縮小 field 範圍")
_full = "\n".join(_lines)
await self._send_request("sendMessage", {
# 4. Paging buttons (INFO_ACTIONS 2-part format; payload uses an underscore to separate report_id and page)
_rows = []
_nav = []
if _page > 0:
_nav.append({
"text": "⬅️ 上頁",
"callback_data": f"drift_view_page:{report_id}_{_page - 1}",
})
if _page < _total_pages - 1:
_nav.append({
"text": "下頁 ➡️",
"callback_data": f"drift_view_page:{report_id}_{_page + 1}",
})
if _nav:
_rows.append(_nav)
_keyboard = {"inline_keyboard": _rows} if _rows else None
_payload = {
"chat_id": settings.OPENCLAW_TG_CHAT_ID,
"text": _full,
"parse_mode": "HTML",
"disable_web_page_preview": True,
})
}
if _keyboard:
_payload["reply_markup"] = _keyboard
await self._send_request("sendMessage", _payload)
except Exception as _e:
logger.warning("drift_diff_detail_send_failed", report_id=report_id, error=str(_e))
logger.warning("drift_diff_detail_send_failed", report_id=report_id, page=page, error=str(_e))
await self._send_request("sendMessage", {
"chat_id": settings.OPENCLAW_TG_CHAT_ID,
"text": f"⚠️ Drift Diff 查詢失敗: <code>{html.escape(str(_e)[:150])}</code>",
@@ -2986,6 +3053,18 @@ class TelegramGateway:
# 2026-04-01 Claude Code (ADR-050 P2): reanalyze button handler
await self._answer_callback(callback_query_id, action, text="🔄 重診排程中...")
await self._send_reanalyze_result(incident_id)
elif action == "drift_view_page":
# 2026-04-20 P0.1 ogt + Claude Opus 4.7: drift_view page switching
# incident_id format: {report_id}_{page} (underscore-separated)
_rid, _, _page_str = incident_id.rpartition("_")
try:
_page_num = int(_page_str)
except ValueError:
_rid, _page_num = incident_id, 0
await self._answer_callback(
callback_query_id, action, text=f"📄 切換至第 {_page_num + 1} 頁..."
)
await self._send_drift_diff_detail(_rid or incident_id, page=_page_num)
else:
# 2026-04-14 Claude Sonnet 4.6 (Phase 5 Sprint 5.1):
# Unknown action → fallback dispatcher (check whether callback_action_spec.yaml has it registered)


@@ -6,6 +6,44 @@
---
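## 📍 2026-04-20 morning: P0.1 + P0.2 + P0.3, three Drift/Target fixes
### Decisions after the Commander's three RCA questions
1. Do all of P0.1 + P0.2 + P0.3
2. An AI recommendation threshold of 0.85 is OK, **but no auto-execute for now** (recommendation only)
3. Check aol for the awoooi-service source trace first, then fix
### RCA conclusion (awoooi-service failure)
- `/api/v1/aiops/kpi` shows 1 `playbook_executed actor=approval_execution status=failed` record in the past 24h
- grep of the whole codebase: **no code hardcodes `awoooi-service`** (only historical comments)
- Most likely source: `alert_rule_engine._extract_vars` takes the `labels.service` value as the Deployment name (a K8s Service name ≠ a Deployment name)
- cf59050 / 4f2e122 (2026-04-18) already fixed the two NEMOTRON hallucination paths; this change fixes the third path (rule engine label fallback)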
### Fixes (5 files / 281 lines)
| # | File | Change |
|---|------|------|
| P0.3a | `alert_rule_engine.py` | `_extract_vars` demotes the service label: a trailing `-service` suffix is stripped first, and `target_source` is returned to trace the origin |
| P0.3c | `approval_execution.py` | `_log_aol_started` input adds `parsed_target/operation/namespace`, so the next failure can be traced directly from aol |
| P0.3b | `approval_execution.py` | The existing `_log_aol_completed` already writes `resource_name/error/stderr`, which is enough for tracing |
| P0.1 | `telegram_gateway.py` | `_send_drift_diff_detail` adds pagination (10 items per page) + a 3-bucket header (manual high-risk / routine change / K8s automatic) + ⬅️/➡️ buttons |
| P0.1 | `security_interceptor.py` | INFO_ACTIONS adds `drift_view_page` to the allowlist |
| P0.2 | `drift_narrator_service.py` | LLM prompt adds a recommendation field (adopt/revert/ignore/investigate + confidence + reason) |
| P0.2 | `drift_narrator_service.py` | `_render_telegram_body` shows 「🎯 AI 建議:⏪ 回滾 (85%) — 原因」 at the top |
| P0.2 | `drift_narrator_service.py` + `telegram_gateway.py` | Card diff_summary limit raised from 500 to 1500 characters to fit the recommendation + narrative + items |
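
The 3-bucket classification listed for P0.1 in the table can be sketched as a standalone function. `DriftItem` below is a hypothetical stand-in with only the attributes the rules need; the real item type and its enum-valued level are not reproduced here.

```python
from dataclasses import dataclass
from typing import Any

EMPTY_REPRS = ("", "{}", "[]", "null", "None")


@dataclass
class DriftItem:
    """Hypothetical stand-in for a drift report item."""
    drift_level: str          # "high", "medium", ...
    git_value: Any
    actual_value: Any
    is_allowlisted: bool = False


def _is_empty(value: Any) -> bool:
    return value is None or str(value).strip() in EMPTY_REPRS


def classify(item: DriftItem) -> str:
    """Rule-based bucketing: allowlist, then empty-vs-empty, then level."""
    if item.is_allowlisted:
        return "k8s_default"
    if _is_empty(item.git_value) and _is_empty(item.actual_value):
        return "k8s_default"
    if item.drift_level == "high":
        return "human_high"
    return "routine_medium"


if __name__ == "__main__":
    print(classify(DriftItem("high", "nginx:1.0", "nginx:2.0")))
    print(classify(DriftItem("medium", None, [])))
```

The order of the checks matters: an allowlisted or empty-to-empty change lands in `k8s_default` even when its level is HIGH, which keeps controller noise out of the high-risk count.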
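### Verification
- All 90 pytest tests pass (drift / rule_engine / approval_execution)
- AST syntax check passes on 5 files
- The AI recommendation is **display-only, never auto-executed** (per the Commander's directive)
### Next steps
1. Wait for the next real drift trigger and verify that the top of the card shows 「🎯 AI 建議」
2. Wait for the next drift_view press and verify pagination + the classification header + the ⬅️/➡️ buttons
3. If awoooi-service recurs, query `input.parsed_target` in `automation_operation_log` to trace the source directly
4. Deferred to P1: persist the drift classifier (noise/controller/human) to the DB; auto-adopt at threshold ≥0.85 + low risk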
---
## 📍 2026-04-19 evening 21:30: Gap Review + 3 Gap fixes + AI autonomy 1/9→4/9 LLM 🎖️🎖️🎖️🎖️
### Commander's core directives