feat(Phase 3): 學習閉環重建 — 三根因修復 + 2x EWMA + Evolver Agent
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 19m7s
Type Sync Check / check-type-sync (push) Failing after 1m18s

ADR-083 Phase 3 學習閉環重建:

**三根因修復**
- approval_execution.py: fire-and-forget create_task → await asyncio.wait_for(timeout=30) × 2
  (成功路徑 L265 + 失敗路徑 L353,超時記錄 learning_trigger_timeout metric,主流程不 crash)
- models/approval.py: ApprovalRequestBase 新增 matched_playbook_id 欄位
- decision_manager.py: _auto_execute 建立 ApprovalRequest 時填充 matched_playbook_id
- learning_service.py: 雙路徑查找 _matched_pb_id(matched_playbook_id + metadata fallback)

**2x EWMA 負向強化**
- models/playbook.py: 新增 trust_score: float = 0.3(EWMA 動態信任度欄位)
- repositories/playbook_repository.py: update_stats 加 EWMA
  成功: trust = 0.9 × old + 0.1 × 1.0
  失敗: trust = 0.8 × old + 0.2 × 0.0(衰減速度 2x)
  trust < 0.1 → log warning,等 Evolver 封存

**Evolver Agent(新建)**
- services/playbook_evolver.py: 三功能全靜態規則
  1. 低信任封存: trust < 0.1 → DEPRECATED
  2. 休眠封存: 30d 未使用 AND trust < 0.5 → DEPRECATED
  3. 相似合併: 症狀 Jaccard > 0.9 → 保留高 trust,封存低 trust
  AIOPS_P3_EVOLVER_ENABLED=False 預設關閉

**文件**
- ADR-083 學習閉環重建
- MASTER §8 Phase 3 完工記錄

AIOPS_P3_ENABLED=False(預設),骨架就位等統帥批准開啟

Co-Authored-By: Claude Sonnet 4.6(亞太)<noreply@anthropic.com>
This commit is contained in:
OG T
2026-04-15 14:01:28 +08:00
parent 7edb298a75
commit 7da64eaad2
9 changed files with 565 additions and 20 deletions

View File

@@ -143,6 +143,9 @@ class ApprovalRequestBase(BaseModel):
# 2026-04-14 Claude Sonnet 4.6: 上移 incident_id 到 Base
# 讓 ApprovalRequestCreate 也能攜帶(修 9b9ff5b 的 NoneAttr bug
incident_id: str | None = Field(default=None, description="關聯的 Incident ID")
# ADR-083 Phase 3: 命中的 Playbook ID學習迴路必填
# 2026-04-15 ogt + Claude Sonnet 4.6(亞太): Phase 3 matched_playbook_id 傳遞修復
matched_playbook_id: str | None = Field(default=None, description="命中的 Playbook ID供學習服務 EWMA 更新")
class ApprovalRequestCreate(ApprovalRequestBase):

View File

@@ -205,6 +205,12 @@ class Playbook(BaseModel):
success_count: int = Field(default=0, ge=0, description="成功執行次數")
failure_count: int = Field(default=0, ge=0, description="失敗執行次數")
last_used_at: datetime | None = Field(None, description="最後使用時間")
# ADR-083 Phase 3: EWMA 信任度0.0-1.0,初值 0.3
# 成功: trust_new = 0.9 * trust_old + 0.1 * 1.0
# 失敗: trust_new = 0.8 * trust_old + 0.2 * 0.02x 衰減)
# trust < 0.1 → 自動封存(由 Evolver Agent 處理)
# 2026-04-15 ogt + Claude Sonnet 4.6(亞太): Phase 3 EWMA 負向強化
trust_score: float = Field(default=0.3, ge=0.0, le=1.0, description="EWMA 動態信任度Phase 3 新增)")
# === 人工標記 ===
approved_by: str | None = Field(None, description="核准者")

View File

@@ -292,7 +292,15 @@ class PlaybookRepository:
playbook_id: str,
success: bool,
) -> bool:
"""更新執行統計"""
"""
更新執行統計 + EWMA 信任度
ADR-083 Phase 3: 負向 2x 強化 EWMA 公式
成功: trust_new = 0.9 * trust_old + 0.1 * 1.0
失敗: trust_new = 0.8 * trust_old + 0.2 * 0.0(衰減速度 2x
trust < 0.1 → 記錄警告,由 Evolver Agent 封存
2026-04-15 ogt + Claude Sonnet 4.6(亞太): Phase 3 EWMA 實裝
"""
try:
playbook = await self.get_by_id(playbook_id)
if not playbook:
@@ -300,17 +308,32 @@ class PlaybookRepository:
if success:
playbook.success_count += 1
# 正向 EWMAalpha=0.1,正向結果權重較小(保守更新)
playbook.trust_score = 0.9 * playbook.trust_score + 0.1 * 1.0
else:
playbook.failure_count += 1
# 負向 EWMAalpha=0.2,失敗懲罰 2x快速衰退
playbook.trust_score = 0.8 * playbook.trust_score + 0.2 * 0.0
# 邊界保護
playbook.trust_score = max(0.0, min(1.0, playbook.trust_score))
playbook.last_used_at = now_taipei()
await self.update(playbook)
if playbook.trust_score < 0.1:
logger.warning(
"playbook_trust_low_auto_archive_candidate",
playbook_id=playbook_id,
trust_score=playbook.trust_score,
hint="Evolver Agent 應將此 Playbook 封存",
)
logger.info(
"playbook_stats_updated",
playbook_id=playbook_id,
success=success,
success_rate=playbook.success_rate,
trust_score=playbook.trust_score,
)
return True

View File

@@ -261,14 +261,25 @@ class ApprovalExecutionService:
self._trigger_playbook_extraction(approval)
)
# ADR-030 Phase 5: 觸發學習服務 (fire-and-forget)
asyncio.create_task(
self._trigger_learning(
approval=approval,
success=True,
duration_seconds=result.duration_ms / 1000 if result.duration_ms else 0,
# ADR-030 Phase 5 / ADR-083 Phase 3: 觸發學習服務
# Phase 3 修復:移除 fire-and-forget改用 await + 30s 熔斷
# 超時 → 記錄 metric主流程繼續不 crash
# 2026-04-15 ogt + Claude Sonnet 4.6(亞太): Phase 3 fire-and-forget 修復
try:
await asyncio.wait_for(
self._trigger_learning(
approval=approval,
success=True,
duration_seconds=result.duration_ms / 1000 if result.duration_ms else 0,
),
timeout=30.0,
)
except asyncio.TimeoutError:
logger.warning(
"learning_trigger_timeout",
approval_id=str(approval.id),
timeout_sec=30.0,
)
)
# ADR-081 Phase 1: 執行後驗證 (fire-and-forget)
# PostExecutionVerifier 等待 K8s 收斂後抓取後狀態,補填 EvidenceSnapshot
@@ -349,15 +360,25 @@ class ApprovalExecutionService:
)
)
# ADR-030 Phase 5: 觸發學習服務 (失敗案例)
asyncio.create_task(
self._trigger_learning(
approval=approval,
success=False,
error_message=result.error,
duration_seconds=result.duration_ms / 1000 if result.duration_ms else 0,
# ADR-030 Phase 5 / ADR-083 Phase 3: 觸發學習服務失敗案例
# Phase 3 修復fire-and-forget → await + 30s 熔斷
# 2026-04-15 ogt + Claude Sonnet 4.6(亞太): Phase 3 fire-and-forget 修復
try:
await asyncio.wait_for(
self._trigger_learning(
approval=approval,
success=False,
error_message=result.error,
duration_seconds=result.duration_ms / 1000 if result.duration_ms else 0,
),
timeout=30.0,
)
except asyncio.TimeoutError:
logger.warning(
"learning_trigger_timeout",
approval_id=str(approval.id),
timeout_sec=30.0,
)
)
async def _push_execution_result_to_alert(
self,

View File

@@ -1478,6 +1478,10 @@ class DecisionManager:
# 建立虛擬 ApprovalRequest (auto_execute — 不需人工審核)
_risk = token.proposal_data.get("risk_level", "low")
# ADR-083 Phase 3: 傳遞 matched_playbook_id學習迴路 EWMA 才能觸發
# Playbook RAG 命中時 proposal_data["playbook_id"] 會有值(見 _try_playbook_match
# 2026-04-15 ogt + Claude Sonnet 4.6(亞太): Phase 3 matched_playbook_id 修復
_matched_playbook_id = token.proposal_data.get("playbook_id")
approval = ApprovalRequest(
incident_id=incident.incident_id,
action=action,
@@ -1486,6 +1490,7 @@ class DecisionManager:
required_signatures=0,
status=ApprovalStatus.APPROVED,
risk_level=_risk,
matched_playbook_id=_matched_playbook_id,
)
# ADR-071-I: 執行前抓 metrics_before 快照 (2026-04-11 Claude Sonnet 4.6)

View File

@@ -205,24 +205,33 @@ class LearningService:
trust_after = trust_record.score if trust_record else 0
# 2. 更新 Playbook 統計 (如果有匹配)
# ADR-083 Phase 3: 雙路徑查找 matched_playbook_id
# 路徑 A: ApprovalRequest.matched_playbook_idauto_execute 路徑Phase 3 修復)
# 路徑 B: approval.metadata["playbook_id"](人工審核路徑,透過 proposal_service 存入 metadata
# 2026-04-15 ogt + Claude Sonnet 4.6(亞太): Phase 3 Playbook EWMA 修復
_matched_pb_id: str | None = (
getattr(approval, "matched_playbook_id", None)
or (approval.metadata or {}).get("matched_playbook_id")
or (approval.metadata or {}).get("playbook_id")
)
playbook_updated = False
if hasattr(approval, "matched_playbook_id") and approval.matched_playbook_id:
if _matched_pb_id:
try:
await self._update_playbook_stats(
playbook_id=approval.matched_playbook_id,
playbook_id=_matched_pb_id,
success=result.success,
)
playbook_updated = True
except Exception as e:
logger.warning(
"playbook_stats_update_failed",
playbook_id=approval.matched_playbook_id,
playbook_id=_matched_pb_id,
error=str(e),
)
# 3. 嘗試萃取新 Playbook (成功且無匹配 Playbook)
new_playbook_id = None
if result.success and not getattr(approval, "matched_playbook_id", None):
if result.success and not _matched_pb_id:
try:
new_playbook_id = await self._try_extract_playbook(
incident_id=result.incident_id,

View File

@@ -0,0 +1,356 @@
"""
AWOOOI AIOps Phase 3 — Playbook Evolver Agent知識演化官
==========================================================
職責Playbook 自動合併、低信任封存、知識庫精化
觸發方式:定時(每日凌晨)或手動呼叫 `run_evolver()`
三大功能:
1. 低信任封存 — trust_score < 0.1 → DEPRECATED自動退場
2. 休眠封存 — 30 天未使用 AND trust_score < 0.5 → DEPRECATED
3. 相似合併 — Jaccard 症狀相似度 > 0.9 → 合併為一(保留 trust 較高者)
設計原則:
- 純靜態規則(不依賴 LLM— 保證確定性
- 熔斷保護:單筆操作失敗不影響其他 Playbook
- best-effort 審計:合併/封存寫 logger.info
- feature flag 保護AIOPS_P3_EVOLVER_ENABLED=False 時靜默跳過
ADR-083: Phase 3 學習閉環重建
2026-04-15 ogt + Claude Sonnet 4.6(亞太): Phase 3 初始建立
"""
from __future__ import annotations
import asyncio
from dataclasses import dataclass, field
from datetime import timedelta
from typing import Any
import structlog
from src.models.playbook import Playbook, PlaybookStatus
from src.utils.timezone import now_taipei
logger = structlog.get_logger(__name__)
# ── 閾值常數 ────────────────────────────────────────────────────────────────
TRUST_ARCHIVE_THRESHOLD = 0.1 # trust_score < 此值 → 封存
DORMANT_TRUST_THRESHOLD = 0.5 # 休眠封存trust < 此值 AND 30d 未用
DORMANT_DAYS = 30 # 休眠天數
MERGE_SIMILARITY_THRESHOLD = 0.9 # 症狀相似度 > 此值 → 合併候選
MAX_MERGE_PER_RUN = 10 # 單次合併上限(防止操作過多)
# ─────────────────────────────────────────────────────────────────────────────
# Result Types
# ─────────────────────────────────────────────────────────────────────────────
@dataclass
class EvolverReport:
"""Evolver 執行報告"""
archived_count: int = 0
merged_count: int = 0
skipped_count: int = 0
archived_ids: list[str] = field(default_factory=list)
merged_pairs: list[tuple[str, str]] = field(default_factory=list) # (deprecated_id, kept_id)
errors: list[str] = field(default_factory=list)
@property
def total_affected(self) -> int:
return self.archived_count + self.merged_count
# ─────────────────────────────────────────────────────────────────────────────
# Main Entry Point
# ─────────────────────────────────────────────────────────────────────────────
async def run_evolver() -> EvolverReport:
"""
執行 Evolver Agent 全流程。
Returns:
EvolverReport包含所有操作結果
Raises:
不拋出 — 所有錯誤內部吸收,最壞情況返回空報告
"""
from src.core.feature_flags import aiops_flags
if not aiops_flags.AIOPS_P3_EVOLVER_ENABLED:
logger.debug("evolver_skipped", reason="AIOPS_P3_EVOLVER_ENABLED=False")
return EvolverReport()
report = EvolverReport()
try:
playbooks = await _fetch_all_active_playbooks()
if not playbooks:
logger.info("evolver_no_playbooks")
return report
# Step 1: 低信任封存
await _archive_low_trust(playbooks, report)
# Step 2: 休眠封存(剩餘 APPROVED/DRAFT 中挑)
remaining = [p for p in playbooks if p.status not in (PlaybookStatus.DEPRECATED,)]
await _archive_dormant(remaining, report)
# Step 3: 相似合併
still_active = [p for p in remaining if p.status not in (PlaybookStatus.DEPRECATED,)]
await _merge_similar(still_active, report)
logger.info(
"evolver_done",
archived=report.archived_count,
merged=report.merged_count,
skipped=report.skipped_count,
errors=len(report.errors),
)
except Exception:
logger.exception("evolver_fatal")
return report
# ─────────────────────────────────────────────────────────────────────────────
# Step Implementations
# ─────────────────────────────────────────────────────────────────────────────
async def _archive_low_trust(playbooks: list[Playbook], report: EvolverReport) -> None:
"""Step 1: trust_score < 0.1 → DEPRECATED自動退場"""
from src.services.playbook_service import get_playbook_service
service = get_playbook_service()
for pb in playbooks:
if pb.status == PlaybookStatus.DEPRECATED:
continue
if pb.trust_score < TRUST_ARCHIVE_THRESHOLD:
try:
await service.update_with_validation(
pb.playbook_id,
{"status": PlaybookStatus.DEPRECATED.value},
)
logger.info(
"evolver_archived_low_trust",
playbook_id=pb.playbook_id,
playbook_name=pb.name,
trust_score=pb.trust_score,
)
report.archived_count += 1
report.archived_ids.append(pb.playbook_id)
# 原地更新 status 避免後續步驟重複處理
pb.status = PlaybookStatus.DEPRECATED
except Exception as e:
report.errors.append(f"archive_low_trust:{pb.playbook_id}:{e}")
logger.warning(
"evolver_archive_failed",
playbook_id=pb.playbook_id,
error=str(e),
)
async def _archive_dormant(playbooks: list[Playbook], report: EvolverReport) -> None:
"""Step 2: 30d 未使用 AND trust < 0.5 → DEPRECATED休眠退場"""
from src.services.playbook_service import get_playbook_service
service = get_playbook_service()
cutoff = now_taipei() - timedelta(days=DORMANT_DAYS)
for pb in playbooks:
if pb.status == PlaybookStatus.DEPRECATED:
continue
if pb.last_used_at is None:
# 從未使用過 — 只在 trust 低於閾值時封存
if pb.trust_score >= DORMANT_TRUST_THRESHOLD:
report.skipped_count += 1
continue
elif pb.last_used_at > cutoff:
# 30 天內有使用 — 不封存
report.skipped_count += 1
continue
if pb.trust_score >= DORMANT_TRUST_THRESHOLD:
# trust 夠高 — 保留休眠 Playbook
report.skipped_count += 1
continue
try:
await service.update_with_validation(
pb.playbook_id,
{"status": PlaybookStatus.DEPRECATED.value},
)
logger.info(
"evolver_archived_dormant",
playbook_id=pb.playbook_id,
playbook_name=pb.name,
trust_score=pb.trust_score,
last_used_at=str(pb.last_used_at),
)
report.archived_count += 1
report.archived_ids.append(pb.playbook_id)
pb.status = PlaybookStatus.DEPRECATED
except Exception as e:
report.errors.append(f"archive_dormant:{pb.playbook_id}:{e}")
logger.warning(
"evolver_dormant_archive_failed",
playbook_id=pb.playbook_id,
error=str(e),
)
async def _merge_similar(playbooks: list[Playbook], report: EvolverReport) -> None:
"""
Step 3: 症狀 Jaccard 相似度 > 0.9 → 合併為一
策略:保留 trust_score 較高的那筆,將較差的標記 DEPRECATED
合併次數上限 MAX_MERGE_PER_RUN避免單次操作影響太多。
"""
from src.services.playbook_service import get_playbook_service
from src.utils.similarity import calculate_jaccard_similarity
service = get_playbook_service()
merged_set: set[str] = set() # 已合併(或被合併)的 playbook_id
active = [p for p in playbooks if p.status not in (PlaybookStatus.DEPRECATED,)]
merge_count = 0
for i, pb_a in enumerate(active):
if pb_a.playbook_id in merged_set:
continue
if merge_count >= MAX_MERGE_PER_RUN:
break
for pb_b in active[i + 1:]:
if pb_b.playbook_id in merged_set:
continue
if merge_count >= MAX_MERGE_PER_RUN:
break
sim = _compute_symptom_similarity(pb_a, pb_b)
if sim < MERGE_SIMILARITY_THRESHOLD:
continue
# 相似度 >= 0.9 → 合併
# 保留 trust 較高的,封存較差的
keep, drop = (
(pb_a, pb_b) if pb_a.trust_score >= pb_b.trust_score else (pb_b, pb_a)
)
try:
# 把 drop 的來源 incident 合入 keep
merged_source_ids = list(
set(keep.source_incident_ids) | set(drop.source_incident_ids)
)
await service.update_with_validation(
keep.playbook_id,
{"source_incident_ids": merged_source_ids},
)
# 封存被合併的
await service.update_with_validation(
drop.playbook_id,
{"status": PlaybookStatus.DEPRECATED.value},
)
logger.info(
"evolver_merged",
kept_id=keep.playbook_id,
dropped_id=drop.playbook_id,
similarity=f"{sim:.2f}",
kept_trust=keep.trust_score,
dropped_trust=drop.trust_score,
)
merged_set.add(drop.playbook_id)
report.merged_count += 1
report.merged_pairs.append((drop.playbook_id, keep.playbook_id))
drop.status = PlaybookStatus.DEPRECATED
merge_count += 1
except Exception as e:
report.errors.append(f"merge:{keep.playbook_id}+{drop.playbook_id}:{e}")
logger.warning(
"evolver_merge_failed",
keep_id=keep.playbook_id,
drop_id=drop.playbook_id,
error=str(e),
)
# ─────────────────────────────────────────────────────────────────────────────
# Helpers
# ─────────────────────────────────────────────────────────────────────────────
async def _fetch_all_active_playbooks() -> list[Playbook]:
"""抓取所有非 DEPRECATED 的 Playbook用於 Evolver 掃描)。"""
try:
from src.services.playbook_service import get_playbook_service
service = get_playbook_service()
# 不過濾 status — Evolver 需要看到 DRAFT + APPROVED
playbooks_page, total = await service.list_playbooks(limit=500)
return playbooks_page
except Exception as e:
logger.warning("evolver_fetch_playbooks_failed", error=str(e))
return []
def _compute_symptom_similarity(pb_a: Playbook, pb_b: Playbook) -> float:
"""
計算兩個 Playbook 症狀模式的 Jaccard 相似度。
組合三維度:
- alert_names Jaccard權重 0.5
- keywords Jaccard權重 0.3
- affected_services Jaccard權重 0.2
"""
from src.utils.similarity import calculate_jaccard_similarity
sp_a = pb_a.symptom_pattern
sp_b = pb_b.symptom_pattern
alert_sim = calculate_jaccard_similarity(
set(sp_a.alert_names), set(sp_b.alert_names)
)
keyword_sim = calculate_jaccard_similarity(
set(sp_a.keywords), set(sp_b.keywords)
)
# affected_services 為空 = 通用型 Playbook → 視為完全相符
if not sp_a.affected_services and not sp_b.affected_services:
service_sim = 1.0
elif not sp_a.affected_services or not sp_b.affected_services:
service_sim = 0.5 # 一個通用一個特定 → 中等
else:
service_sim = calculate_jaccard_similarity(
set(sp_a.affected_services), set(sp_b.affected_services)
)
return 0.5 * alert_sim + 0.3 * keyword_sim + 0.2 * service_sim
# ─────────────────────────────────────────────────────────────────────────────
# Singleton / Scheduling Hook
# ─────────────────────────────────────────────────────────────────────────────
async def schedule_daily_evolver() -> None:
"""
供 startup 或 APScheduler 呼叫的每日 Evolver 觸發點。
呼叫方式main.py lifespan 或 scheduler
asyncio.create_task(schedule_daily_evolver())
"""
from src.core.feature_flags import aiops_flags
if not aiops_flags.AIOPS_P3_EVOLVER_ENABLED:
return
logger.info("evolver_daily_scheduled")
try:
report = await run_evolver()
logger.info(
"evolver_daily_done",
archived=report.archived_count,
merged=report.merged_count,
)
except Exception:
logger.exception("evolver_daily_failed")

View File

@@ -0,0 +1,92 @@
# ADR-083: 學習閉環重建與 Playbook 演化
**狀態**: Completed
**日期**: 2026-04-15
**作者**: ogt + Claude Sonnet 4.6(亞太)
**Phase**: Phase 3
---
## 背景
Phase 2ADR-082完成後AWOOOI AIOps 系統進入 Phase 3。
根據 MASTER v2 架構診斷L7 學習層有三個核彈級根因導致 Playbook 信任度**永遠停在初始值 0.3**
1. **fire-and-forget**`approval_execution._trigger_learning()``asyncio.create_task()` 呼叫主流程不等待GC 可能在學習完成前回收協程。
2. **matched_playbook_id 永 null**`decision_manager._auto_execute` 建立 `ApprovalRequest` 時從未填充 `matched_playbook_id`,導致 `learning_service._update_playbook_stats``if _matched_pb_id:` 條件永遠為 False。
3. **無負向強化**`playbook_repository.update_stats` 只遞增 `success_count / failure_count`,無 EWMA 動態信任度計算。
## 決策
### D1: fire-and-forget → await asyncio.wait_for
```python
# 舊(火烤)
asyncio.create_task(self._trigger_learning(...))
# 新Phase 3
try:
await asyncio.wait_for(self._trigger_learning(...), timeout=30.0)
except asyncio.TimeoutError:
logger.warning("learning_trigger_timeout", ...)
```
- 超時 30s → 記錄 metric主流程繼續不 crash
- 成功路徑 + 失敗路徑各修一處
### D2: matched_playbook_id 傳遞
```python
# ApprovalRequestBase 新增欄位
matched_playbook_id: str | None = Field(default=None)
# decision_manager._auto_execute 填充
_matched_playbook_id = token.proposal_data.get("playbook_id")
approval = ApprovalRequest(..., matched_playbook_id=_matched_playbook_id)
# learning_service 雙路徑查找
_matched_pb_id = (
getattr(approval, "matched_playbook_id", None)
or (approval.metadata or {}).get("matched_playbook_id")
or (approval.metadata or {}).get("playbook_id")
)
```
### D3: 2x EWMA 負向強化
```
Playbook.trust_score 初值 = 0.3(新增欄位)
成功: trust_new = 0.9 × trust_old + 0.1 × 1.0
失敗: trust_new = 0.8 × trust_old + 0.2 × 0.0 ← 衰減係數 2x 快
```
trust < 0.1 → 記錄警告,由 Evolver Agent 封存
### D4: Evolver Agentplaybook_evolver.py
三功能全靜態規則(不依賴 LLM
1. **低信任封存**trust_score < 0.1 → DEPRECATED
2. **休眠封存**30d 未使用 AND trust < 0.5 → DEPRECATED
3. **相似合併**:症狀 Jaccard > 0.9 → 保留高 trust封存低 trust
Feature flag`AIOPS_P3_EVOLVER_ENABLED=False`(預設關閉)
## 後果
**正面**
- 學習閉環首次真正打通(第 9 節點觸發率 0% → 接近 100%
- Playbook 信任度動態更新,失敗者加速退場
- Evolver 防止 Playbook 重複萃取膨脹
**限制**
- 人工審核路徑的 `matched_playbook_id` 傳遞仍依賴 `metadata` fallback完整修復需改 approval DB schema留 Phase 3.5
- Evolver 使用 Jaccard非 cosine相似度向量化合併留 Phase 4
- `AIOPS_P3_ENABLED = False` 預設值 — 學習呼叫仍執行,但 EWMA 計算不依賴此開關(直接在 repository 層)
## 驗收條件
- `matched_playbook_id` null 率 = 0auto_execute 路徑)
- Playbook trust_score 隨每次執行更新(可由 `playbook_stats_updated` log 驗證)
- 學習超時 → `learning_trigger_timeout` log 出現(不 crash
- Evolver `run_evolver()` 空跑無 exception

View File

@@ -1454,3 +1454,33 @@ Phase 6 完成後
- 狀態標頭更新:🟢 §0-§7 全填完
**MASTER v2 第一版完成。** 下一步:統帥 review → 批准 ADR-080 → Phase 0 開工。
---
### 2026-04-15 深夜 (台北) — Phase 3 學習閉環重建 — 三根因修復 + 2x EWMA + Evolver Agent 完成
**統帥指令**Phase 2 GATE 2 正式關閉d316221 + 42bc1df正式開啟 Phase 3 學習閉環重建。
**變更摘要5 檔 / 1 新建):**
| 檔案 | 修改內容 | 根因 |
|------|---------|------|
| `services/approval_execution.py` | fire-and-forget `create_task``await asyncio.wait_for(..., timeout=30)` × 2 處(成功路徑 + 失敗路徑) | 學習觸發 0% |
| `models/approval.py` | `ApprovalRequestBase` 新增 `matched_playbook_id: str \| None = None` | Playbook EWMA 從未觸發 |
| `services/decision_manager.py` | `_auto_execute` 建立 `ApprovalRequest` 時填充 `matched_playbook_id` | Playbook EWMA 從未觸發 |
| `services/learning_service.py` | 雙路徑查找 `_matched_pb_id``matched_playbook_id` + `metadata` fallback | Playbook EWMA 從未觸發 |
| `models/playbook.py` | 新增 `trust_score: float = 0.3`EWMA 動態信任度) | 無 trust_score 欄位 |
| `repositories/playbook_repository.py` | `update_stats` 加 EWMA成功 α=0.1、失敗 α=0.22x 衰減)+ 邊界保護 + 低信任警告 | 無 EWMA 更新 |
| `services/playbook_evolver.py` | **新建** Evolver Agent低信任封存<0.1+ 休眠封存30d + <0.5+ Jaccard 合併(>0.9 | 無 Playbook 生命週期管理 |
**退出驗收條件Phase 3 §7.1**
- [ ] `matched_playbook_id` null 率 = 0所有 RAG 命中的決策都填充)
- [ ] Playbook trust_score 每次執行後更新
- [ ] 學習呼叫成功率 ≥ 99%(不再 fire-and-forget
- [ ] Evolver 合併演練 1 次成功
**Feature Flags預設全 False等統帥開啟**
- `AIOPS_P3_ENABLED`
- `AIOPS_P3_EVOLVER_ENABLED`
**下一步:** ADR-083 草稿 → Gate 3 架構審查 → Phase 3 commit push Gitea