feat(adr-081): Phase 1 sensory depth - 8D evidence collection + post-execution verification

Deliverables:
- IncidentEvidence DB model (8D sensors + pre/post execution state)
- EvidenceSnapshot dataclass (build_summary → LLM context)
- SanitizationService (zero tolerance for prompt injection, 12 patterns)
- MCPToolRegistry (dynamic tool registration; suggest_tools does not hard-code alert types)
- PreDecisionInvestigator (8 sensors run in parallel, P99 < 8s, 30s Redis cache)
- PostExecutionVerifier (10s warmup → post-state evaluated as success/degraded/failed)
- decision_manager + approval_execution wiring (guarded by a feature flag)

Gate 1 fixes: D4/D5/D7/D8 now apply sanitize_dict_values; removed the bare
"error" failure signal so that an error_rate key is no longer misread as a
failure; evidence_snapshot warns when rowcount is zero.
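
To illustrate the failure-signal fix, here is a minimal sketch, assuming the verifier looks for signal substrings in a serialized state. The signal lists and `has_failure_signal` are hypothetical names chosen for illustration; the actual PostExecutionVerifier logic is not shown in this excerpt.

```python
import json

# Hypothetical signal lists (assumption): the real ones are not shown in this diff.
NAIVE_SIGNALS = ("error", "crashloopbackoff", "oomkilled")
SAFER_SIGNALS = ("crashloopbackoff", "oomkilled")


def has_failure_signal(state: dict, signals: tuple) -> bool:
    """Return True if any signal occurs anywhere in the serialized state."""
    blob = json.dumps(state).lower()
    return any(signal in blob for signal in signals)


# A healthy state still contains the substring "error" through its error_rate key.
healthy = {"pod_phase": "Running", "error_rate": 0.0}
crashed = {"pod_phase": "CrashLoopBackOff", "error_rate": 0.4}

print(has_failure_signal(healthy, NAIVE_SIGNALS))  # bare "error" matches the key name
print(has_failure_signal(healthy, SAFER_SIGNALS))
print(has_failure_signal(crashed, SAFER_SIGNALS))
```

With the bare "error" entry in the list, any state that merely reports an error_rate metric is flagged, whatever its value; dropping that entry leaves only signals that name an actual failure condition.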

Tests: 130 passed (+111 new)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
OG T
2026-04-15 13:08:38 +08:00
parent db9e304a14
commit f1cbf6db7d
14 changed files with 2936 additions and 3 deletions


@@ -737,3 +737,99 @@ class KnowledgeEntryRecord(Base):
        # 2026-04-04 ogt: Phase 25 P1, Anti-Pattern fast lookup
        Index("ix_knowledge_symptoms_hash", "symptoms_hash"),
    )
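# IncidentEvidence: persistence for the ADR-081 Phase 1 EvidenceSnapshot
# 2026-04-15 ogt + Claude Sonnet 4.6: initial creation for the AI autonomy flywheel, Phase 1
class IncidentEvidence(Base):
    """
    Immutable table of incident evidence snapshots.
    Before each decision, PreDecisionInvestigator captures one EvidenceSnapshot
    and writes it to this table for:
    - Decision traceability (the full evidence context behind the LLM's reasoning)
    - Training (high-value data for the Phase 3 fine-tune pipeline)
    - Anomaly verification (pre-execution vs post-execution state diff)
    ADR-081: PreDecisionInvestigator + EvidenceSnapshot
    Design principle: append-only writes, UPDATE is forbidden (aligned with event sourcing)
    """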
    __tablename__ = "incident_evidence"
    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=generate_uuid)
    # Relations
    # Indexed through ix_incident_evidence_incident_id in __table_args__;
    # index=True here would generate a second index with the same name.
    incident_id: Mapped[str] = mapped_column(String(30), nullable=False)
    # Populated in Phase 3: matched_playbook_id is always null for now (to be fixed in Phase 3)
    matched_playbook_id: Mapped[str | None] = mapped_column(String(36), nullable=True)
    # Schema version (lets the fine-tune pipeline filter for compatible versions)
    schema_version: Mapped[str] = mapped_column(String(10), default="v1", nullable=False)
    # 8D sensor data (each dimension is nullable: some may be missing when an MCP call fails)
    k8s_state: Mapped[dict | None] = mapped_column(
        JSON, nullable=True, comment="D1: kubectl describe pod + events"
    )
    recent_logs: Mapped[str | None] = mapped_column(
        Text, nullable=True, comment="D2: container stderr tail-50, cleaned by SanitizationService"
    )
    metrics_snapshot: Mapped[dict | None] = mapped_column(
        JSON, nullable=True, comment="D3: Prometheus 5min vs 1h baseline comparison"
    )
    recent_deployments: Mapped[list | None] = mapped_column(
        JSON, nullable=True, comment="D4: ArgoCD/Gitea deployment diff for the past 1h"
    )
    business_metrics: Mapped[dict | None] = mapped_column(
        JSON, nullable=True, comment="D5: order volume / login success rate / P0 SLI"
    )
    historical_context: Mapped[str | None] = mapped_column(
        Text, nullable=True, comment="D6: summary of handling history for the same alertname over the past 30 days"
    )
    peer_health: Mapped[dict | None] = mapped_column(
        JSON, nullable=True, comment="D7: health of the other replicas in the same Deployment"
    )
    dependency_topology: Mapped[dict | None] = mapped_column(
        JSON, nullable=True, comment="D8: Istio/Service Mesh upstream/downstream latency/error rate"
    )
    # Sensor quality indicators
    mcp_health: Mapped[dict] = mapped_column(
        JSON, default=dict, nullable=False,
        comment="Success/failure of each MCP call {tool_name: bool}, used to adjust decision_fusion weights"
    )
    collection_duration_ms: Mapped[int | None] = mapped_column(
        Integer, nullable=True, comment="Total evidence collection time (ms), P99 target < 8000"
    )
    sensors_attempted: Mapped[int] = mapped_column(
        default=0, nullable=False, comment="Number of sensors attempted"
    )
    sensors_succeeded: Mapped[int] = mapped_column(
        default=0, nullable=False, comment="Number of sensors that returned data"
    )
    # LLM input summary (at most 8K tokens, compressed by the Investigator)
    evidence_summary: Mapped[str | None] = mapped_column(
        Text, nullable=True, comment="Final evidence summary fed to the LLM (UTF-8, < 8K tokens)"
    )
    # Pre/post execution state (PostExecutionVerifier fills in post_execution_state)
    pre_execution_state: Mapped[dict | None] = mapped_column(
        JSON, nullable=True, comment="Pre-execution environment state snapshot (baseline for PostExecutionVerifier)"
    )
    post_execution_state: Mapped[dict | None] = mapped_column(
        JSON, nullable=True, comment="Post-execution environment state (captured by PostExecutionVerifier, wired in Phase 1)"
    )
    verification_result: Mapped[str | None] = mapped_column(
        String(20), nullable=True, comment="success / degraded / failed / timeout (filled in by PostExecutionVerifier)"
    )
    # Timestamp (Taipei time zone)
    collected_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=taipei_now, nullable=False
    )
    __table_args__ = (
        Index("ix_incident_evidence_incident_id", "incident_id"),
        Index("ix_incident_evidence_collected_at", "collected_at"),
        Index("ix_incident_evidence_playbook_id", "matched_playbook_id"),
    )
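
The quality columns above (mcp_health, sensors_attempted, sensors_succeeded, collection_duration_ms) could be filled by a parallel collector along the following lines. This is only a sketch under stated assumptions: `collect` and the three demo sensors are hypothetical and do not reproduce the PreDecisionInvestigator implementation, and the Redis cache is left out.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict

Sensor = Callable[[], Awaitable[Any]]


async def collect(sensors: Dict[str, Sensor], timeout_s: float = 8.0) -> Dict[str, Any]:
    """Run all sensors concurrently; a failed or timed-out sensor yields None."""
    started = time.monotonic()

    async def run_one(fn: Sensor) -> Any:
        # Each sensor gets its own timeout so one slow call cannot stall the rest.
        return await asyncio.wait_for(fn(), timeout=timeout_s)

    # return_exceptions=True keeps one failure from cancelling the other sensors.
    results = await asyncio.gather(
        *(run_one(fn) for fn in sensors.values()), return_exceptions=True
    )

    data: Dict[str, Any] = {}
    health: Dict[str, bool] = {}
    for name, result in zip(sensors, results):
        ok = not isinstance(result, BaseException)
        health[name] = ok
        data[name] = result if ok else None  # maps onto the nullable columns

    return {
        **data,
        "mcp_health": health,
        "sensors_attempted": len(sensors),
        "sensors_succeeded": sum(health.values()),
        "collection_duration_ms": int((time.monotonic() - started) * 1000),
    }


# Hypothetical demo sensors: one succeeds, one raises, one exceeds the timeout.
async def ok_sensor() -> dict:
    return {"phase": "Running"}


async def broken_sensor() -> str:
    raise RuntimeError("MCP unavailable")


async def slow_sensor() -> str:
    await asyncio.sleep(1)
    return "late"


snapshot = asyncio.run(
    collect(
        {"k8s_state": ok_sensor, "recent_logs": broken_sensor, "peer_health": slow_sensor},
        timeout_s=0.05,
    )
)
print(snapshot["mcp_health"], snapshot["sensors_succeeded"], snapshot["sensors_attempted"])
```

Recording a per-sensor boolean rather than failing the whole snapshot matches the nullable 8D columns: a partially filled row is still written, and downstream consumers can weigh it by how many sensors succeeded.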