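feat(drift+target): P0.1+P0.2+P0.3 three fixes: drift pagination and bucketing + AI recommendation + target source trace

Commander's three-question decision: do all three; the AI recommendation's 0.85 threshold is display-only with no auto-execution; query aol first, then fix.

## RCA: source of the awoooi-service failure
- /api/v1/aiops/kpi shows 1 record in the past 24h with playbook_executed actor=approval_execution status=failed
- grep of the codebase: no code hard-codes awoooi-service (only historical comments)
- Most likely source: alert_rule_engine._extract_vars takes the value of labels.service and uses it as the Deployment name
- cf5050c/4f2e122 (2026-04-18) already fixed two NEMOTRON hallucination paths; this commit fixes the third path

## Fixes
### P0.3a alert_rule_engine._extract_vars
- labels.service demoted: a value ending in -service has the suffix stripped and is treated as the base name
- match_rule's return value gains a target_source field for tracing
- If awoooi-service recurs, the source is directly visible (label.service(stripped), etc.)
### P0.3c approval_execution._log_aol_started.input
- Adds parsed_target/operation/namespace fields
- Future aol queries for failed records show the target directly, with no guesswork
### P0.1 telegram_gateway._send_drift_diff_detail
- Pagination (10 items/page) replaces flooding the chat with 30 items at once
- Header shows counts for 3 buckets: manual high-risk / routine change / K8s automatic
- ⬅️/➡️ paging buttons at the bottom (callback: drift_view_page:{report_id}_{page})
- security_interceptor INFO_ACTIONS allowlists drift_view_page
### P0.2 drift_narrator recommendation
- LLM prompt adds a recommendation field (action/confidence/reason)
- action ∈ {adopt, revert, ignore, investigate}
- The top of the card shows 「🎯 AI 建議:⏪ 回滾 (85%) — reason」
- On LLM failure, falls back to _fallback_recommendation (rule-based, mapped from intent)
- Card diff_summary limit raised from 500 to 1500 characters to fit recommendation + narrative + items
- Commander's directive: display-only, no auto-execution (the 0.85 threshold is reserved for the future)

## Verification
- All 90 pytest tests pass (drift + rule_engine + approval_execution)
- AST syntax check passes on all 5 files

## Next acceptance checks
1. Next drift trigger → the card shows 「🎯 AI 建議」 at the top
2. drift_view pressed → 3-bucket header + ⬅️/➡️
3. If awoooi-service recurs → query automation_operation_log.input.parsed_target directly

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>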
@@ -163,15 +163,19 @@ def _extract_vars(alert_context: dict) -> dict[str, str]:
|
||||
"""
|
||||
Extract template variables from alert_context.

GAP-A4 (2026-04-14 Claude Sonnet 4.6): strengthen target resolution; add a multi-level label lookup order:
GAP-A4 (2026-04-14 Claude Sonnet 4.6): strengthen target resolution; multi-level label lookup order:
1. labels.deployment (most authoritative)
2. labels.app / labels.app.kubernetes.io/name
3. labels.statefulset
4. labels.pod → strip the replicaset/pod hash suffix
5. labels.container / labels.name
6. labels.service
6. labels.service (2026-04-20 demoted: a K8s Service name != Deployment name;
now records target_source=label.service so downstream can treat it as suspect and trigger pre-flight validation)
7. target_resource (excluding IP:port and alertname)

The target_source field is returned so the decision/execution layers can trace the origin;
if this P0.3 trace is not clear enough, the next trigger will carry aol.context.target_source.

If every extraction fails → target="unknown", and match_rule()'s post-validation discards the rule.
|
||||
"""
|
||||
labels = alert_context.get("labels", {})
|
||||
@@ -184,10 +188,12 @@ def _extract_vars(alert_context: dict) -> dict[str, str]:
|
||||
|
||||
# GAP-A4: multi-level label lookup, from most authoritative to weakest
|
||||
target = ""
|
||||
target_source = ""  # 2026-04-20: track which label the target came from (audit trail for aol)
|
||||
for key in ("deployment", "app", "app.kubernetes.io/name", "statefulset"):
|
||||
val = labels.get(key, "")
|
||||
if val and not _is_bad_target(val, alertname):
|
||||
target = val
|
||||
target_source = f"label.{key}"
|
||||
break
|
||||
|
||||
# The pod label needs its hash suffix stripped to recover the Deployment name
|
||||
@@ -195,26 +201,45 @@ def _extract_vars(alert_context: dict) -> dict[str, str]:
|
||||
pod = labels.get("pod", "")
|
||||
if pod and not _is_bad_target(pod, alertname):
|
||||
target = _strip_pod_suffix(pod)
|
||||
target_source = "label.pod(stripped)"
|
||||
|
||||
# container / name: next best
|
||||
if not target:
|
||||
for key in ("container", "name", "service"):
|
||||
for key in ("container", "name"):
|
||||
val = labels.get(key, "")
|
||||
if val and not _is_bad_target(val, alertname):
|
||||
target = val
|
||||
target_source = f"label.{key}"
|
||||
break
|
||||
|
||||
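# service label comes last (a K8s Service name is not a Deployment name; it often yields the awoooi-service hallucination)
# 2026-04-20 P0.3: if service ends with '-service', strip the suffix and treat the rest as the base name,
# while still recording a label.service(...) target_source so downstream triggers pre-flight validation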
|
||||
if not target:
|
||||
svc = labels.get("service", "")
|
||||
if svc and not _is_bad_target(svc, alertname):
|
||||
if svc.endswith("-service"):
|
||||
# awoooi-service → awoooi (usually maps to awoooi-api or similar; still needs pre-flight validation)
|
||||
target = svc[: -len("-service")] or svc
|
||||
target_source = "label.service(stripped -service suffix)"
|
||||
else:
|
||||
target = svc
|
||||
target_source = "label.service(demoted)"
|
||||
|
||||
# raw_target comes last (and must pass the bad_target check)
|
||||
if not target and not _is_bad_target(raw_target, alertname):
|
||||
target = raw_target
|
||||
target_source = "alert_context.target_resource"
|
||||
|
||||
# If everything fails → keep "unknown" so the post-validation layer rejects it
|
||||
if not target:
|
||||
target = "unknown"
|
||||
target_source = "none(fallback)"
|
||||
|
||||
container = labels.get("name", labels.get("container", "")) or target
|
||||
return {
|
||||
"target": target,
|
"target_source": target_source,  # added 2026-04-20 P0.3
|
||||
"host": host,
|
||||
"container": container,
|
||||
"instance": instance,
|
||||
@@ -410,6 +435,7 @@ def match_rule(alert_context: dict) -> dict[str, Any] | None:
|
||||
"suggested_action": resp["suggested_action"],
|
||||
"kubectl_command": kubectl_command,
|
||||
"target_resource": vars["target"],
|
"target_source": vars.get("target_source", ""),  # 2026-04-20 P0.3 audit trail
|
||||
"namespace": vars["namespace"],
|
||||
"risk_level": risk,
|
||||
"blast_radius": {
|
||||
|
||||
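The lookup order changed in this file can be sketched in isolation. This is a simplified illustration, not the project's `_extract_vars`: the function name `pick_target` is hypothetical, and the pod, container, name and `target_resource` steps as well as the `_is_bad_target` filter are omitted for brevity.

```python
def pick_target(labels: dict[str, str]) -> tuple[str, str]:
    """Return (target, target_source) using a simplified priority order."""
    # Most authoritative labels first.
    for key in ("deployment", "app", "app.kubernetes.io/name", "statefulset"):
        if labels.get(key):
            return labels[key], f"label.{key}"

    # The service label is tried last: a Service name is not a Deployment name.
    svc = labels.get("service", "")
    if svc:
        if svc.endswith("-service"):
            # Keep the full name if stripping the suffix would leave nothing.
            base = svc[: -len("-service")] or svc
            return base, "label.service(stripped -service suffix)"
        return svc, "label.service(demoted)"

    # Nothing usable: let a later validation step reject the rule.
    return "unknown", "none(fallback)"


print(pick_target({"service": "awoooi-service"}))
print(pick_target({"deployment": "awoooi-api", "service": "awoooi-service"}))
```

Returning the source next to the value means a later failure can be attributed to a specific label without replaying the alert.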
@@ -1096,18 +1096,37 @@ class ApprovalExecutionService:
|
||||
Write a 'pending' record to automation_operation_log and return the op_id for _log_aol_completed to update.

On failure (DB error) return None and let the main flow continue; aol writes never block execution.

2026-04-20 P0.3: input now includes target / operation_type / namespace,
so on failure aol.input shows the target directly (source trace for awoooi-service-style misjudgments).
|
||||
"""
|
||||
try:
|
||||
from sqlalchemy import text as _sql
|
||||
from src.db.base import get_db_context
|
||||
import json as _json
|
||||
|
||||
# 2026-04-20 P0.3: try to parse target / op_type from the action first; a failure does not block
|
||||
_parsed_target: str | None = None
|
||||
_parsed_op: str | None = None
|
||||
_parsed_ns: str | None = None
|
||||
try:
|
||||
_parsed = parse_operation_from_action(approval.action or "")
|
||||
_parsed_target = _parsed.resource_name
|
||||
_parsed_op = _parsed.operation_type.value if _parsed.operation_type else None
|
||||
_parsed_ns = _parsed.namespace
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
input_payload = {
|
||||
"approval_id": str(approval.id),
|
||||
"incident_id": approval.incident_id or "",
|
||||
"action": (approval.action or "")[:500],
|
||||
"risk_level": getattr(approval, "risk_level", None) or "",
|
||||
"requested_by": getattr(approval, "requested_by", "") or "",
|
||||
# 2026-04-20 P0.3: target source trace
|
||||
"parsed_target": _parsed_target or "",
|
||||
"parsed_operation": _parsed_op or "",
|
||||
"parsed_namespace": _parsed_ns or "",
|
||||
}
|
||||
|
||||
async with get_db_context() as db:
|
||||
|
||||
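The best-effort enrichment of the audit payload can be sketched as follows. The parser here, `parse_kubectl_action`, is a hypothetical stand-in for the project's `parse_operation_from_action`, whose real behavior is not shown in this diff; the sketch assumes the action is a kubectl-style command string.

```python
import shlex


def parse_kubectl_action(action: str) -> dict[str, str]:
    """Hypothetical parser: extract verb, resource name and namespace."""
    tokens = shlex.split(action)
    if len(tokens) < 3 or tokens[0] != "kubectl":
        raise ValueError(f"not a kubectl command: {action!r}")
    namespace = ""
    positional: list[str] = []
    it = iter(tokens[1:])
    for tok in it:
        if tok in ("-n", "--namespace"):
            namespace = next(it, "")  # the value follows the flag
        elif not tok.startswith("-"):
            positional.append(tok)
    if len(positional) < 2:
        raise ValueError("missing verb or resource")
    return {
        "operation": positional[0],
        "target": positional[-1].split("/")[-1],  # deployment/foo -> foo
        "namespace": namespace,
    }


def build_audit_input(action: str) -> dict[str, str]:
    """Build the audit payload; parsing problems never block the caller."""
    parsed = {"operation": "", "target": "", "namespace": ""}
    try:
        parsed = parse_kubectl_action(action)
    except ValueError:
        pass  # keep empty fields; the raw action is still recorded
    return {
        "action": action[:500],
        "parsed_target": parsed["target"],
        "parsed_operation": parsed["operation"],
        "parsed_namespace": parsed["namespace"],
    }


print(build_audit_input("kubectl rollout restart deployment/awoooi-api -n prod"))
```

The parsed fields default to empty strings, so a query over failed records can filter on `parsed_target` without handling missing keys.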
@@ -55,6 +55,10 @@ TRIGGER_MEDIUM_MIN = 3
|
||||
# ============================================================
|
||||
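# 2026-04-18 ogt + Claude Opus 4.7: Plan B: LLM-driven smart summary (replaces Python str()[:30] truncation)
# Architecture rule: drop hard-coded Python string parsing; feed the structured diff straight to the LLM, which produces a Traditional Chinese Top 5 summary
# 2026-04-20 P0.2 ogt + Claude Opus 4.7: add recommendation output; the LLM recommends which button to press
# - action ∈ {adopt, revert, ignore, investigate}
# - confidence 0.0-1.0 (Commander's directive: no auto-execute for now; the 0.85 threshold is reserved for the future)
# - reason: one-line Traditional Chinese explanation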
|
||||
_NARRATIVE_PROMPT = """你是 AWOOOI SRE 維運助理。以下是 K8s Config Drift 報告的原始結構化資料。
|
||||
|
||||
## 漂移項目原始資料(JSON)
|
||||
@@ -72,9 +76,20 @@ _NARRATIVE_PROMPT = """你是 AWOOOI SRE 維運助理。以下是 K8s Config Dri
|
||||
"field": "簡化後的欄位路徑 (40 字內)",
|
||||
"summary": "30 字內繁體中文口語摘要,說明從什麼變成什麼"
|
||||
}}
|
||||
]
|
||||
],
|
||||
"recommendation": {{
|
||||
"action": "adopt 或 revert 或 ignore 或 investigate",
|
||||
"confidence": 0.85,
|
||||
"reason": "一行繁體中文解釋為何推薦此動作(含關鍵證據)"
|
||||
}}
|
||||
}}
|
||||
|
||||
## recommendation action 語意
|
||||
- adopt: 現狀合理,應把 K8s 狀態寫回 Git (例:HPA 自動擴縮、緊急 hotfix 已驗證)
|
||||
- revert: 漂移有風險,應回滾到 Git 狀態 (例:image tag 被誤改、secret 被外部改)
|
||||
- ignore: 噪音,K8s controller 自動補齊 (例:空 list/dict 差異)
|
||||
- investigate: 不確定,需要人工查清楚
|
||||
|
||||
## 規則
|
||||
- 繁體中文
|
||||
- items 最多挑 5 筆最重要的(HIGH 優先)
|
||||
@@ -84,6 +99,7 @@ _NARRATIVE_PROMPT = """你是 AWOOOI SRE 維運助理。以下是 K8s Config Dri
|
||||
- "新增 pod anti-affinity 規則"
|
||||
- 禁止 markdown、反引號、emoji
|
||||
- 只輸出純 JSON,不要包在 code block 裡
|
||||
- recommendation.confidence 要誠實(HIGH drift 且意圖不明 → 0.3-0.5;trivial noise → 0.9)
|
||||
"""
|
||||
|
||||
|
||||
@@ -126,8 +142,9 @@ class DriftNarratorService:
|
||||
return
|
||||
|
||||
# 2026-04-18 Plan B: the LLM produces narrative + structured items together (replaces str()[:30])
|
||||
narrative, items = await self._generate_narrative_and_items(report, interpretation)
|
||||
await self._send_telegram(report, narrative, items)
|
||||
# 2026-04-20 P0.2: also return recommendation (action/confidence/reason)
|
||||
narrative, items, recommendation = await self._generate_narrative_and_items(report, interpretation)
|
||||
await self._send_telegram(report, narrative, items, recommendation)
|
||||
|
||||
# 寫入 DB narrative_text (Phase 30 ADR-067)
|
||||
try:
|
||||
@@ -162,13 +179,17 @@ class DriftNarratorService:
|
||||
self,
|
||||
report: "DriftReport",
|
||||
interpretation: "DriftInterpretation | None",
|
||||
) -> tuple[str, list[dict]]:
|
||||
) -> tuple[str, list[dict], dict]:
|
||||
"""
|
||||
2026-04-18 ogt + Claude Opus 4.7: Plan B: the LLM generates narrative + structured items
2026-04-20 P0.2 ogt + Claude Opus 4.7: adds recommendation (AI-recommended button)

Returns (narrative, items):
Returns (narrative, items, recommendation):
narrative: 4-5 line Traditional Chinese narrative
items: [{level, field, summary}, ...], at most 5 entries
recommendation: {action, confidence, reason}
action ∈ {adopt, revert, ignore, investigate}
confidence 0.0-1.0 (Commander's directive: no auto-execute for now; shown only for the Commander's reference)

If the LLM fails, fall back to Python smart truncation (not a brute-force str()[:30] cut)
|
||||
|
||||
@@ -189,6 +210,7 @@ class DriftNarratorService:
|
||||
started_ms = time.time()
|
||||
narrative: str = ""
|
||||
items: list[dict] = []
|
||||
recommendation: dict = {} # 2026-04-20 P0.2
|
||||
raw_response: str | None = None
|
||||
provider: str = "unknown"
|
||||
status: str = "failed"
|
||||
@@ -260,6 +282,33 @@ class DriftNarratorService:
|
||||
if not clean_items:
|
||||
clean_items = self._fallback_items(report)
|
||||
|
||||
# 2026-04-20 P0.2: parse recommendation (if the LLM provided one)
|
||||
_rec = None
|
||||
try:
|
||||
if isinstance(_parsed, dict):
|
||||
_rec = _parsed.get("recommendation")
|
||||
# Path 2 scenario: recommendation may also be nested inside _inner
|
||||
if _rec is None and str(_parsed.get("description") or "").lstrip().startswith("{"):
|
||||
_inner_txt = str(_parsed["description"]).strip()
|
||||
_inner = _json.loads(_inner_txt)
|
||||
if isinstance(_inner, dict):
|
||||
_rec = _inner.get("recommendation")
|
||||
except (_json.JSONDecodeError, ValueError, KeyError):
|
||||
_rec = None
|
||||
if isinstance(_rec, dict) and _rec.get("action"):
|
||||
_act = str(_rec.get("action", "")).strip().lower()
|
||||
if _act in ("adopt", "revert", "ignore", "investigate"):
|
||||
try:
|
||||
_conf = float(_rec.get("confidence", 0.0))
|
||||
except (TypeError, ValueError):
|
||||
_conf = 0.0
|
||||
_conf = max(0.0, min(1.0, _conf))
|
||||
recommendation = {
|
||||
"action": _act,
|
||||
"confidence": _conf,
|
||||
"reason": str(_rec.get("reason", ""))[:200],
|
||||
}
|
||||
|
||||
narrative = _parsed_narrative
|
||||
items = clean_items
|
||||
status = "success"
|
||||
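The validation applied to the LLM's recommendation in this hunk (allowed action set, confidence clamped to [0, 1], reason capped at 200 characters) can be shown as a standalone function. The name `normalize_recommendation` is hypothetical, and the nested "Path 2" lookup inside `description` is left out of this sketch.

```python
import json

ALLOWED_ACTIONS = {"adopt", "revert", "ignore", "investigate"}


def normalize_recommendation(raw_json: str) -> dict:
    """Return a cleaned recommendation dict, or {} if it is unusable."""
    try:
        parsed = json.loads(raw_json)
    except json.JSONDecodeError:
        return {}
    rec = parsed.get("recommendation") if isinstance(parsed, dict) else None
    if not isinstance(rec, dict):
        return {}

    action = str(rec.get("action", "")).strip().lower()
    if action not in ALLOWED_ACTIONS:
        return {}  # unknown action: let the caller use its fallback

    try:
        confidence = float(rec.get("confidence", 0.0))
    except (TypeError, ValueError):
        confidence = 0.0  # non-numeric confidence is treated as no confidence
    confidence = max(0.0, min(1.0, confidence))

    return {
        "action": action,
        "confidence": confidence,
        "reason": str(rec.get("reason", ""))[:200],
    }


print(normalize_recommendation(
    '{"recommendation": {"action": " Revert ", "confidence": 1.7, "reason": "image tag changed"}}'
))
```

An empty dict signals "no usable recommendation", which matches how the caller in the diff decides whether to invoke the rule-based fallback.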
@@ -277,6 +326,10 @@ class DriftNarratorService:
|
||||
items = self._fallback_items(report)
|
||||
status = "failed"
|
||||
|
||||
# 2026-04-20 P0.2: if the LLM gave no recommendation, use the Python fallback
|
||||
if not recommendation:
|
||||
recommendation = self._fallback_recommendation(report, interpretation)
|
||||
|
||||
# ADR-090-C: also write the DB audit record (never propagate errors; protects the main flow)
|
||||
duration_ms = int((time.time() - started_ms) * 1000)
|
||||
try:
|
||||
@@ -294,7 +347,50 @@ class DriftNarratorService:
|
||||
except Exception as e:
|
||||
logger.warning("drift_narrator_audit_write_failed", error=str(e))
|
||||
|
||||
return narrative, items
|
||||
return narrative, items, recommendation
|
||||
|
||||
def _fallback_recommendation(
|
||||
self,
|
||||
report: "DriftReport",
|
||||
interpretation: "DriftInterpretation | None",
|
||||
) -> dict:
|
||||
"""
|
||||
2026-04-20 P0.2 ogt + Claude Opus 4.7: Python fallback when the LLM gives no recommendation

Rule-based recommendation (conservative):
- everything trivial/allowlisted → ignore (0.8)
- HIGH drift + intent=emergency_hotfix → adopt (0.5) (uncertain, so lower confidence)
- HIGH drift + intent=human_error → revert (0.7)
- otherwise → investigate (0.4) (ask for manual intervention)
|
||||
"""
|
||||
actionable = self._count_nontrivial_drift(report)
|
||||
if actionable == 0:
|
||||
return {
|
||||
"action": "ignore",
|
||||
"confidence": 0.8,
|
||||
"reason": "全部為白名單或 K8s 預設值補齊,無實質變更。",
|
||||
}
|
||||
|
||||
_has_high = report.high_count > 0
|
||||
_intent = interpretation.intent.value if interpretation else "unknown"
|
||||
|
||||
if _has_high and _intent == "emergency_hotfix":
|
||||
return {
|
||||
"action": "adopt",
|
||||
"confidence": 0.5,
|
||||
"reason": "HIGH drift 但意圖分析為緊急 hotfix,建議採納並補 Git(請人工複核)。",
|
||||
}
|
||||
if _has_high and _intent == "human_error":
|
||||
return {
|
||||
"action": "revert",
|
||||
"confidence": 0.7,
|
||||
"reason": "HIGH drift 且意圖分析為人為誤操作,建議回滾 Git 狀態。",
|
||||
}
|
||||
return {
|
||||
"action": "investigate",
|
||||
"confidence": 0.4,
|
||||
"reason": f"有 {actionable} 項可操作漂移,意圖={_intent},需人工查清楚再決定。",
|
||||
}
|
||||
|
||||
async def _log_ai_action_to_db(
|
||||
self,
|
||||
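The rule table in `_fallback_recommendation` reduces to a small pure function. The sketch below takes plain values instead of the project's `DriftReport` and `DriftInterpretation` objects (the function name and parameters are assumptions for illustration) and omits the `reason` text.

```python
AUTO_EXECUTE_THRESHOLD = 0.85  # reserved for the future; display-only today


def fallback_recommendation(actionable: int, high_count: int, intent: str) -> dict:
    """Conservative rule-based recommendation used when the LLM gives none."""
    if actionable == 0:
        # Only allowlisted or trivial differences remain.
        return {"action": "ignore", "confidence": 0.8}
    if high_count > 0 and intent == "emergency_hotfix":
        return {"action": "adopt", "confidence": 0.5}
    if high_count > 0 and intent == "human_error":
        return {"action": "revert", "confidence": 0.7}
    # Anything else is handed to a human.
    return {"action": "investigate", "confidence": 0.4}


for case in [(0, 0, "unknown"), (3, 1, "emergency_hotfix"), (3, 1, "human_error"), (2, 0, "unknown")]:
    rec = fallback_recommendation(*case)
    print(case, rec, rec["confidence"] >= AUTO_EXECUTE_THRESHOLD)
```

Every confidence in the table sits below 0.85, so if the reserved threshold is ever wired to auto-execution, a fallback recommendation would not qualify on its own.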
@@ -540,24 +636,29 @@ class DriftNarratorService:
|
||||
report: "DriftReport",
|
||||
narrative: str,
|
||||
items: list[dict],
|
||||
recommendation: dict | None = None,
|
||||
) -> None:
|
||||
"""
|
||||
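Send the TYPE-4D Config Drift card (ADR-075) + Plan B smart summary

2026-04-18 ogt + Claude Opus 4.7: use the LLM-produced structured items
instead of the garbage produced by brute-force str()[:30] truncation
2026-04-20 P0.2 ogt + Claude Opus 4.7: recommendation is shown at the top of the card
(Commander's directive: no auto-execute for now; display-only so the reader knows at a glance which button to press)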
|
||||
"""
|
||||
from src.services.telegram_gateway import get_telegram_gateway
|
||||
|
||||
diff_summary = self._render_telegram_body(report, narrative, items)
|
||||
diff_summary = self._render_telegram_body(report, narrative, items, recommendation)
|
||||
|
||||
try:
|
||||
tg = get_telegram_gateway()
|
||||
# 2026-04-20 P0.2: limit raised from 500 to 1500 characters so the AI recommendation + narrative + items all fit
# (send_drift_card's HTML display limit was raised to 1500 in the same change)
|
||||
await tg.send_drift_card(
|
||||
incident_id=report.report_id,
|
||||
approval_id=report.report_id,
|
||||
resource_name=report.namespace,
|
||||
diff_summary=diff_summary[:500],
|
||||
diff_summary=diff_summary[:1500],
|
||||
detected_at="",
|
||||
)
|
||||
except Exception as e:
|
||||
@@ -603,21 +704,38 @@ class DriftNarratorService:
|
||||
report: "DriftReport",
|
||||
narrative: str,
|
||||
items: list[dict],
|
||||
recommendation: dict | None = None,
|
||||
) -> str:
|
||||
"""
|
||||
Assemble the Telegram card body (Plan B format)
Assemble the Telegram card body (Plan B format + P0.2 AI recommendation)

Example output:
|
||||
🎯 AI 建議:⏪ 回滾 (85%) — image tag 被手動改到未驗證版本
|
||||
|
||||
🤖 AI 研判
|
||||
volumes 與 affinity 被手動修改...
|
||||
|
||||
📊 漂移明細 (HIGH: 1 | MEDIUM: 29)
|
||||
🔴 spec.template.spec.volumes: 新增 2 項 repair-ssh-key 掛載
|
||||
🟡 spec.template.spec.serviceAccount: (未設) → awoooi-executor
|
||||
🟡 spec.template.spec.affinity.podAntiAffinity: 新增 preferred 規則
|
||||
... 還有 27 項
|
||||
"""
|
||||
lines = [f"🤖 AI 研判\n{narrative}\n"]
|
||||
lines = []
|
||||
|
||||
# 2026-04-20 P0.2 AI recommendation (at the top; recommendation only, no auto-execution)
|
||||
if recommendation and recommendation.get("action"):
|
||||
_act = recommendation["action"]
|
||||
_conf = float(recommendation.get("confidence", 0.0))
|
||||
_reason = recommendation.get("reason", "")
|
||||
_emoji_action = {
|
||||
"adopt": "✅ 採納",
|
||||
"revert": "⏪ 回滾",
|
||||
"ignore": "🔕 忽略",
|
||||
"investigate": "🔍 人工調查",
|
||||
}.get(_act, _act)
|
||||
lines.append(f"🎯 AI 建議:{_emoji_action} ({int(_conf * 100)}%) — {_reason}\n")
|
||||
|
||||
lines.append(f"🤖 AI 研判\n{narrative}\n")
|
||||
|
||||
# Show the actually actionable count (non-trivial and not allowlisted)
|
||||
actionable = self._count_nontrivial_drift(report)
|
||||
|
||||
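Rendering the recommendation line can be sketched on its own. The function name `render_recommendation_line` is hypothetical and the separator before the reason is simplified. One deliberate difference from the diff: the percentage uses `round()` rather than `int()`, because `int()` truncates and a confidence such as 0.29 (stored as 28.999999999999996 after multiplying by 100) would display as 28%.

```python
from typing import Optional

ACTION_LABELS = {
    "adopt": "✅ 採納",
    "revert": "⏪ 回滾",
    "ignore": "🔕 忽略",
    "investigate": "🔍 人工調查",
}


def render_recommendation_line(rec: Optional[dict]) -> str:
    """Return the one-line recommendation header, or '' if there is none."""
    if not rec or not rec.get("action"):
        return ""
    # Unknown actions are shown verbatim rather than dropped.
    label = ACTION_LABELS.get(rec["action"], rec["action"])
    try:
        conf = float(rec.get("confidence") or 0.0)
    except (TypeError, ValueError):
        conf = 0.0
    pct = round(max(0.0, min(1.0, conf)) * 100)  # round, do not truncate
    return f"🎯 AI 建議:{label} ({pct}%) - {rec.get('reason', '')}"


print(render_recommendation_line(
    {"action": "revert", "confidence": 0.29, "reason": "image tag changed by hand"}
))
```

Returning an empty string for a missing recommendation lets the caller skip the line without a separate branch.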
@@ -435,7 +435,9 @@ class TelegramSecurityInterceptor:
|
||||
- Format 2: {action, incident_id, is_info_action: True}
|
||||
"""
|
||||
# 2026-04-01 Claude Code (ADR-050): support read-only info actions (2-part format)
|
||||
INFO_ACTIONS = {"detail", "reanalyze", "history"}
|
||||
# 2026-04-20 P0.1 ogt + Claude Opus 4.7: add drift_view_page to INFO_ACTIONS
# payload format: drift_view_page:{report_id}_{page} (underscore-separated, so it does not clash with the colon)
|
||||
INFO_ACTIONS = {"detail", "reanalyze", "history", "drift_view_page"}
|
||||
parts = callback_data.split(":")
|
||||
if len(parts) == 2 and parts[0] in INFO_ACTIONS:
|
||||
return {
|
||||
|
||||
@@ -1961,7 +1961,8 @@ class TelegramGateway:
|
||||
# Root cause: <pre> renders as a code block in Telegram HTML mode, but diff_summary is the AI
# assessment narrative + emoji list (not code) and should be shown as plain text
# Diff length handling (ADR-071, Section 14.9.6)
|
||||
if len(diff_summary) <= 500:
|
||||
# 2026-04-20 P0.2 ogt + Claude Opus 4.7: 500 → 1500 so the AI recommendation + narrative + items display in full
|
||||
if len(diff_summary) <= 1500:
|
||||
diff_block = f"\n━━━━━━━━━━━━━━━━━━━\n{html.escape(diff_summary)}"
|
||||
else:
|
||||
web_url = f"https://aiops.wooo.work/incidents/{incident_id}/drift-diff"
|
||||
@@ -2084,10 +2085,37 @@ class TelegramGateway:
|
||||
|
||||
return {"action": action, "approval_id": approval_id, "user": user, "success": False}
|
||||
|
||||
async def _send_drift_diff_detail(self, report_id: str) -> None:
|
||||
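# 2026-04-20 P0.1 ogt + Claude Opus 4.7: drift_view pagination + classification buckets
# Old logic: _send_drift_diff_detail listed up to 3800 characters at once → a 30-item flood
# New logic: 10 items per page, header shows counts for 3 buckets, ⬅️/➡️ buttons switch pages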
|
||||
_DRIFT_PAGE_SIZE = 10
|
||||
|
||||
def _classify_drift_item(self, item) -> str:
|
||||
"""
|
||||
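Send the full Drift Diff to Telegram (response to the drift_view button)
Show all items (HIGH + MEDIUM, grouped into actionable + trivial)
Classify a drift item into one of 3 buckets (rule-based; no LLM call, to save tokens):
- k8s_default: filled in automatically by a K8s controller (allowlisted, or empty↔empty)
- human_high: HIGH level and non-trivial (e.g. image/env/ports changed by hand)
- routine_medium: MEDIUM and non-trivial (ordinary configuration adjustment)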
|
||||
"""
|
||||
level = getattr(item.drift_level, "value", str(item.drift_level))
|
||||
# allowlisted or trivial → filled in automatically by K8s
|
||||
if item.is_allowlisted:
|
||||
return "k8s_default"
|
||||
_g, _a = item.git_value, item.actual_value
|
||||
_empty_g = _g is None or str(_g).strip() in ("", "{}", "[]", "null", "None")
|
||||
_empty_a = _a is None or str(_a).strip() in ("", "{}", "[]", "null", "None")
|
||||
if _empty_g and _empty_a:
|
||||
return "k8s_default"
|
||||
if level == "high":
|
||||
return "human_high"
|
||||
return "routine_medium"
|
||||
|
||||
async def _send_drift_diff_detail(self, report_id: str, page: int = 0) -> None:
|
||||
"""
|
||||
Send a paginated Drift Diff to Telegram (response to the drift_view / drift_view_page buttons)

Each page holds _DRIFT_PAGE_SIZE items; the header shows the 3 bucket counts + page position,
and the bottom carries 「⬅️ 上頁 / 下頁 ➡️」 buttons (callback: drift_view_page:{report_id}_{page}).
|
||||
"""
|
||||
try:
|
||||
from src.repositories.drift_repository import get_drift_repository
|
||||
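The three-bucket classification can be exercised without the gateway class. The `DriftItem` dataclass below is a hypothetical stand-in for the project's drift item type, with only the fields the rule needs.

```python
from dataclasses import dataclass
from typing import Any

# String forms that count as "empty" on either side of the diff.
EMPTY_REPRS = ("", "{}", "[]", "null", "None")


@dataclass
class DriftItem:  # hypothetical stand-in for the real item type
    level: str
    git_value: Any
    actual_value: Any
    is_allowlisted: bool = False


def is_empty(value: Any) -> bool:
    return value is None or str(value).strip() in EMPTY_REPRS


def classify(item: DriftItem) -> str:
    """Bucket an item as k8s_default, human_high or routine_medium."""
    if item.is_allowlisted:
        return "k8s_default"
    if is_empty(item.git_value) and is_empty(item.actual_value):
        return "k8s_default"  # empty on both sides: controller noise
    if item.level == "high":
        return "human_high"
    return "routine_medium"  # every non-high level lands here


items = [
    DriftItem("high", "nginx:1.0", "nginx:2.0"),
    DriftItem("high", None, []),
    DriftItem("medium", "2", "3"),
    DriftItem("info", "a", "b"),
]
print([classify(i) for i in items])
```

Note that a non-trivial item at any level other than high, including info, falls into `routine_medium`, since the rule only special-cases high.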
@@ -2100,25 +2128,44 @@ class TelegramGateway:
|
||||
})
|
||||
return
|
||||
|
||||
# 2026-04-19 ogt + Claude Opus 4.7: fix the real cause of the HTTP 400
# Old logic: _full[:3950] cut in the middle of an HTML tag/entity → rejected by Telegram parse_mode HTML
# Fix: accumulate length item by item and stop past 3800, keeping the HTML structure intact
# (3800 leaves a 250 buffer for the header + truncation notice)
|
||||
_MAX_LEN = 3800
|
||||
# 1. Classify & sort (HIGH first → routine → trivial)
|
||||
_classified: list[tuple[str, object]] = [
|
||||
(self._classify_drift_item(_it), _it) for _it in _rpt.items
|
||||
]
|
||||
_bucket_order = {"human_high": 0, "routine_medium": 1, "k8s_default": 2}
|
||||
_classified.sort(key=lambda x: _bucket_order[x[0]])
|
||||
|
||||
_bucket_counts = {"human_high": 0, "routine_medium": 0, "k8s_default": 0}
|
||||
for _bk, _ in _classified:
|
||||
_bucket_counts[_bk] += 1
|
||||
|
||||
_total = len(_classified)
|
||||
_total_pages = max(1, (_total + self._DRIFT_PAGE_SIZE - 1) // self._DRIFT_PAGE_SIZE)
|
||||
_page = max(0, min(page, _total_pages - 1))
|
||||
_start = _page * self._DRIFT_PAGE_SIZE
|
||||
_end = min(_start + self._DRIFT_PAGE_SIZE, _total)
|
||||
_slice = _classified[_start:_end]
|
||||
|
||||
# 2. Header (classification buckets)
|
||||
_header = [
|
||||
f"📊 <b>完整 Drift Diff</b> — <code>{html.escape(report_id)}</code>",
|
||||
f"📊 <b>Drift Diff (頁 {_page + 1}/{_total_pages})</b> — <code>{html.escape(report_id)[:24]}</code>",
|
||||
f"Namespace: <code>{html.escape(_rpt.namespace)}</code>",
|
||||
f"HIGH×{_rpt.high_count} MEDIUM×{_rpt.medium_count} INFO×{_rpt.info_count}",
|
||||
(
|
||||
f"🔴 人工高風險 {_bucket_counts['human_high']} | "
|
||||
f"🟡 一般修改 {_bucket_counts['routine_medium']} | "
|
||||
f"🔧 K8s 自動 {_bucket_counts['k8s_default']}"
|
||||
),
|
||||
"━" * 20,
|
||||
]
|
||||
_lines = list(_header)
|
||||
_MAX_LEN = 3800
|
||||
_used_len = sum(len(s) + 1 for s in _header)
|
||||
_shown = 0
|
||||
|
||||
for _item in _rpt.items:
|
||||
_level = getattr(_item.drift_level, "value", str(_item.drift_level))
|
||||
_emoji = "🔴" if _level == "high" else ("🟡" if _level == "medium" else "⚪")
|
||||
# 3. Items on this page (each still honors the _MAX_LEN cap; for extremely long values, stop early rather than flood the chat)
|
||||
_rendered = 0
|
||||
_bucket_emoji = {"human_high": "🔴", "routine_medium": "🟡", "k8s_default": "🔧"}
|
||||
for _bk, _item in _slice:
|
||||
_emoji = _bucket_emoji[_bk]
|
||||
_field = (_item.field_path or "")[:80]
|
||||
_git = str(_item.git_value)[:40] if _item.git_value is not None else "(未設)"
|
||||
_k8s = str(_item.actual_value)[:40] if _item.actual_value is not None else "(未設)"
|
||||
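The page arithmetic in this hunk (ceiling division for the page count, clamping the requested page, then slicing) can be checked in isolation. The helper name `page_bounds` is an assumption; the default page size of 10 mirrors `_DRIFT_PAGE_SIZE`.

```python
def page_bounds(total: int, page: int, page_size: int = 10) -> tuple[int, int, int, int]:
    """Return (clamped_page, total_pages, start, end) for a zero-based page."""
    # Ceiling division; an empty list still has one (empty) page.
    total_pages = max(1, (total + page_size - 1) // page_size)
    # Out-of-range requests snap to the first or last page.
    clamped = max(0, min(page, total_pages - 1))
    start = clamped * page_size
    end = min(start + page_size, total)
    return clamped, total_pages, start, end


items = list(range(31))
page, pages, start, end = page_bounds(len(items), page=99)
print(page, pages, items[start:end])
```

Clamping instead of raising means a stale button from an older message, or a negative page number in a callback, still produces a valid page.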
@@ -2131,22 +2178,42 @@ class TelegramGateway:
|
||||
break
|
||||
_lines.append(_block)
|
||||
_used_len += len(_block) + 1
|
||||
_shown += 1
|
||||
_rendered += 1
|
||||
|
||||
_remaining = len(_rpt.items) - _shown
|
||||
if _remaining > 0:
|
||||
_lines.append(f"… 還有 {_remaining} 項未顯示")
|
||||
_skipped_in_page = len(_slice) - _rendered
|
||||
if _skipped_in_page > 0:
|
||||
_lines.append(f"… 本頁還有 {_skipped_in_page} 項過長未顯示,請縮小 field 範圍")
|
||||
|
||||
_full = "\n".join(_lines)
|
||||
|
||||
await self._send_request("sendMessage", {
|
||||
# 4. Paging buttons (INFO_ACTIONS 2-part format; the payload separates report_id and page with an underscore)
|
||||
_rows = []
|
||||
_nav = []
|
||||
if _page > 0:
|
||||
_nav.append({
|
||||
"text": "⬅️ 上頁",
|
||||
"callback_data": f"drift_view_page:{report_id}_{_page - 1}",
|
||||
})
|
||||
if _page < _total_pages - 1:
|
||||
_nav.append({
|
||||
"text": "下頁 ➡️",
|
||||
"callback_data": f"drift_view_page:{report_id}_{_page + 1}",
|
||||
})
|
||||
if _nav:
|
||||
_rows.append(_nav)
|
||||
_keyboard = {"inline_keyboard": _rows} if _rows else None
|
||||
|
||||
_payload = {
|
||||
"chat_id": settings.OPENCLAW_TG_CHAT_ID,
|
||||
"text": _full,
|
||||
"parse_mode": "HTML",
|
||||
"disable_web_page_preview": True,
|
||||
})
|
||||
}
|
||||
if _keyboard:
|
||||
_payload["reply_markup"] = _keyboard
|
||||
await self._send_request("sendMessage", _payload)
|
||||
except Exception as _e:
|
||||
logger.warning("drift_diff_detail_send_failed", report_id=report_id, error=str(_e))
|
||||
logger.warning("drift_diff_detail_send_failed", report_id=report_id, page=page, error=str(_e))
|
||||
await self._send_request("sendMessage", {
|
||||
"chat_id": settings.OPENCLAW_TG_CHAT_ID,
|
||||
"text": f"⚠️ Drift Diff 查詢失敗: <code>{html.escape(str(_e)[:150])}</code>",
|
||||
@@ -2986,6 +3053,18 @@ class TelegramGateway:
|
||||
# 2026-04-01 Claude Code (ADR-050 P2): reanalyze button handler
|
||||
await self._answer_callback(callback_query_id, action, text="🔄 重診排程中...")
|
||||
await self._send_reanalyze_result(incident_id)
|
||||
elif action == "drift_view_page":
|
||||
# 2026-04-20 P0.1 ogt + Claude Opus 4.7: drift_view page switching
# incident_id format: {report_id}_{page} (underscore-separated)
|
||||
_rid, _, _page_str = incident_id.rpartition("_")
|
||||
try:
|
||||
_page_num = int(_page_str)
|
||||
except ValueError:
|
||||
_rid, _page_num = incident_id, 0
|
||||
await self._answer_callback(
|
||||
callback_query_id, action, text=f"📄 切換至第 {_page_num + 1} 頁..."
|
||||
)
|
||||
await self._send_drift_diff_detail(_rid or incident_id, page=_page_num)
|
||||
else:
|
||||
# 2026-04-14 Claude Sonnet 4.6 (Phase 5 Sprint 5.1):
|
||||
# Unknown action → fallback dispatcher (check whether callback_action_spec.yaml registers it)
|
||||
|
||||
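Decoding the `drift_view_page:{report_id}_{page}` payload can be sketched as a round trip. The function name `parse_drift_page_callback` is hypothetical; splitting from the right with `rpartition` follows the handler in the diff, so a report_id that itself contains underscores survives.

```python
def parse_drift_page_callback(callback_data: str) -> tuple[str, int]:
    """Split 'drift_view_page:{report_id}_{page}' into (report_id, page)."""
    action, sep, payload = callback_data.partition(":")
    if action != "drift_view_page" or not sep:
        raise ValueError(f"unexpected callback: {callback_data!r}")
    # Split on the LAST underscore so underscores inside report_id are kept.
    report_id, _, page_str = payload.rpartition("_")
    try:
        page = int(page_str)
    except ValueError:
        return payload, 0  # no numeric suffix: treat the payload as the id
    return (report_id or payload), page


print(parse_drift_page_callback("drift_view_page:drift_2026_04_20_abc_2"))
print(parse_drift_page_callback("drift_view_page:abc"))
```

Two constraints follow from this format. The interceptor splits on `:` and expects exactly two parts, so a report_id must not contain a colon. Telegram also limits `callback_data` to 64 bytes, so the scheme assumes report_id values are short enough to leave room for the prefix and the page number.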
@@ -6,6 +6,44 @@
|
||||
|
||||
---
|
||||
|
||||
## 📍 2026-04-20 morning: P0.1 + P0.2 + P0.3, three Drift/Target fixes

### Decisions after the Commander's three-question RCA
1. Do all of P0.1 + P0.2 + P0.3
2. The 0.85 AI recommendation threshold is OK, **but no auto-execute for now** (recommendation only)
3. Query aol first to find the awoooi-service source trace, then fix

### RCA conclusion (awoooi-service failure)
- `/api/v1/aiops/kpi` showed 1 record in the past 24h with `playbook_executed actor=approval_execution status=failed`
- grep of the whole codebase: **no code hard-codes `awoooi-service`** (only historical comments)
- Most likely source: `alert_rule_engine._extract_vars` takes the value of `labels.service` and uses it as the Deployment name (K8s Service name ≠ Deployment name)
- cf59050 / 4f2e122 (2026-04-18) already fixed two NEMOTRON hallucination paths; this change fixes the third path (rule engine label fallback)

### Fixes (5 files / 281 lines)
| # | File | Change |
|---|------|------|
| P0.3a | `alert_rule_engine.py` | `_extract_vars` demotes the service label: a `-service` ending has the suffix stripped, and `target_source` is returned to trace the origin |
| P0.3c | `approval_execution.py` | `_log_aol_started` input adds `parsed_target/operation/namespace`, so the next failure can be traced directly from aol |
| P0.3b | `approval_execution.py` | The existing `_log_aol_completed` already writes `resource_name/error/stderr`, which is enough for tracing |
| P0.1 | `telegram_gateway.py` | `_send_drift_diff_detail` adds pagination (10 items/page) + a 3-bucket header (manual high-risk / routine change / K8s automatic) + ⬅️/➡️ buttons |
| P0.1 | `security_interceptor.py` | INFO_ACTIONS allowlists `drift_view_page` |
| P0.2 | `drift_narrator_service.py` | LLM prompt adds a recommendation field (adopt/revert/ignore/investigate + confidence + reason) |
| P0.2 | `drift_narrator_service.py` | `_render_telegram_body` shows 「🎯 AI 建議:⏪ 回滾 (85%) — reason」 at the top |
| P0.2 | `drift_narrator_service.py` + `telegram_gateway.py` | Card diff_summary limit raised from 500 to 1500 characters to fit recommendation + narrative + items |

### Verification
- All 90 pytest tests pass (drift / rule_engine / approval_execution)
- AST syntax check passes on all 5 files
- The AI recommendation is **display-only and never auto-executed** (per the Commander's directive)

### Next steps
1. Wait for the next real drift trigger and verify the card shows 「🎯 AI 建議」 at the top
2. Wait for the next drift_view press and verify pagination + bucket header + ⬅️/➡️ buttons
3. If awoooi-service recurs, query `input.parsed_target` in `automation_operation_log` to trace the source directly
4. Deferred to P1: persist the drift classifier (noise/controller/human) to the DB; auto-adopt at threshold ≥0.85 + low risk
|
||||
|
||||
---
|
||||
|
||||
## 📍 2026-04-19 evening 21:30: Gap Review + 3 gap fixes + AI autonomy 1/9→4/9 LLM 🎖️🎖️🎖️🎖️
|
||||
|
||||
### Commander's core directives
|
||||
|
||||