ADR-083 Phase 3 learning-loop rebuild:

**Three root-cause fixes**
- approval_execution.py: fire-and-forget create_task → await asyncio.wait_for(timeout=30) × 2 (success path L265 + failure path L353; on timeout a learning_trigger_timeout metric is recorded and the main flow does not crash)
- models/approval.py: ApprovalRequestBase gains a matched_playbook_id field
- decision_manager.py: _auto_execute populates matched_playbook_id when it creates the ApprovalRequest
- learning_service.py: dual-path lookup of _matched_pb_id (matched_playbook_id + metadata fallback)

**2x EWMA negative reinforcement**
- models/playbook.py: adds trust_score: float = 0.3 (EWMA dynamic trust field)
- repositories/playbook_repository.py: update_stats adds EWMA
  - Success: trust = 0.9 × old + 0.1 × 1.0
  - Failure: trust = 0.8 × old + 0.2 × 0.0 (decays 2x faster)
  - trust < 0.1 → log a warning and wait for the Evolver to archive

**Evolver Agent (new)**
- services/playbook_evolver.py: three functions, all static rules
  1. Low-trust archival: trust < 0.1 → DEPRECATED
  2. Dormancy archival: unused for 30d AND trust < 0.5 → DEPRECATED
  3. Similarity merge: symptom Jaccard > 0.9 → keep the higher-trust Playbook, archive the lower-trust one
- AIOPS_P3_EVOLVER_ENABLED=False (disabled by default)

**Docs**
- ADR-083 learning-loop rebuild
- MASTER §8 Phase 3 completion record

AIOPS_P3_ENABLED=False (default); the skeleton is in place, pending the commander's approval to enable it.

Co-Authored-By: Claude Sonnet 4.6 (Asia-Pacific) <noreply@anthropic.com>

# ADR-083: Rebuilding the Learning Loop and Playbook Evolution

**Status**: Completed
**Date**: 2026-04-15
**Author**: ogt + Claude Sonnet 4.6 (Asia-Pacific)
**Phase**: Phase 3

---

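## Background

With Phase 2 (ADR-082) complete, the AWOOOI AIOps system enters Phase 3.
According to the MASTER v2 architecture diagnosis, three critical root causes in the L7 learning layer leave Playbook trust **permanently stuck at its initial value of 0.3**:

1. **fire-and-forget**: `approval_execution._trigger_learning()` is invoked via `asyncio.create_task()`. The main flow does not wait for it, and because the event loop keeps only a weak reference to the task, it may be garbage-collected before learning completes.
2. **matched_playbook_id is always null**: `decision_manager._auto_execute` never populates `matched_playbook_id` when it creates the `ApprovalRequest`, so the `if _matched_pb_id:` condition in `learning_service._update_playbook_stats` is always False.
3. **No negative reinforcement**: `playbook_repository.update_stats` only increments `success_count / failure_count`; there is no EWMA-based dynamic trust calculation.
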
## Decision

### D1: fire-and-forget → await asyncio.wait_for

```python
# Before (fire-and-forget)
asyncio.create_task(self._trigger_learning(...))

# After (Phase 3)
try:
    await asyncio.wait_for(self._trigger_learning(...), timeout=30.0)
except asyncio.TimeoutError:
    logger.warning("learning_trigger_timeout", ...)
```

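- On a 30 s timeout, a metric is recorded and the main flow continues (no crash).
- The fix is applied in two places: once on the success path and once on the failure path.
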
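As a minimal, self-contained sketch of the D1 pattern, the script below bounds a slow learning call with `asyncio.wait_for` and shows that the caller continues either way. `slow_learning`, `execute_with_learning`, `run_demo`, and the short timeouts are hypothetical stand-ins for illustration, not the actual `approval_execution.py` code.

```python
import asyncio
import logging

logger = logging.getLogger("approval_execution_sketch")


async def slow_learning(delay: float) -> str:
    """Hypothetical stand-in for _trigger_learning(): sleeps, then returns."""
    await asyncio.sleep(delay)
    return "learned"


async def execute_with_learning(delay: float, timeout: float) -> str:
    """Await learning with an upper bound; a timeout is logged, never raised."""
    try:
        await asyncio.wait_for(slow_learning(delay), timeout=timeout)
        learning = "ok"
    except asyncio.TimeoutError:
        # wait_for cancels the inner coroutine before raising TimeoutError.
        logger.warning("learning_trigger_timeout")
        learning = "timeout"
    # The main flow reaches this point in both cases.
    return f"executed, learning={learning}"


def run_demo() -> list[str]:
    """Run one fast and one slow learning call and collect both outcomes."""

    async def main() -> list[str]:
        fast = await execute_with_learning(delay=0.01, timeout=1.0)
        slow = await execute_with_learning(delay=1.0, timeout=0.05)
        return [fast, slow]

    return asyncio.run(main())


if __name__ == "__main__":
    print(run_demo())
```

The trade-off of awaiting instead of spawning a task is that the caller can be held for up to the full timeout (30 s in this ADR) whenever learning is slow.
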
### D2: Propagating matched_playbook_id

```python
# New field on ApprovalRequestBase
matched_playbook_id: str | None = Field(default=None)

# Populated in decision_manager._auto_execute
_matched_playbook_id = token.proposal_data.get("playbook_id")
approval = ApprovalRequest(..., matched_playbook_id=_matched_playbook_id)

# Dual-path lookup in learning_service
_matched_pb_id = (
    getattr(approval, "matched_playbook_id", None)
    or (approval.metadata or {}).get("matched_playbook_id")
    or (approval.metadata or {}).get("playbook_id")
)
```

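The lookup order in D2 can be exercised in isolation. The sketch below assumes a hypothetical `ApprovalSketch` dataclass in place of the real `ApprovalRequest` model, and a hypothetical helper name `resolve_matched_playbook_id`; only the fallback order is taken from the snippet above.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ApprovalSketch:
    """Hypothetical stand-in for ApprovalRequest with only the fields used here."""

    matched_playbook_id: Optional[str] = None
    metadata: Optional[dict[str, Any]] = None


def resolve_matched_playbook_id(approval: Any) -> Optional[str]:
    """Prefer the first-class field, then fall back to the two metadata keys."""
    metadata = getattr(approval, "metadata", None) or {}
    return (
        getattr(approval, "matched_playbook_id", None)
        or metadata.get("matched_playbook_id")
        or metadata.get("playbook_id")
    )


if __name__ == "__main__":
    # The dedicated field wins even when metadata carries a different ID.
    both = ApprovalSketch(matched_playbook_id="pb-1", metadata={"playbook_id": "pb-9"})
    print(resolve_matched_playbook_id(both))
    # With neither source present, the result is None and learning is skipped.
    print(resolve_matched_playbook_id(ApprovalSketch()))
```

Because the chain uses `or`, an empty string in the dedicated field is treated like a missing value and falls through to `metadata`.
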
### D3: 2x EWMA negative reinforcement

```
Playbook.trust_score initial value = 0.3 (new field)

Success: trust_new = 0.9 × trust_old + 0.1 × 1.0
Failure: trust_new = 0.8 × trust_old + 0.2 × 0.0   ← failure weight is 2x the success weight
```

When trust < 0.1, a warning is logged and the Playbook is left for the Evolver Agent to archive.

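To make the asymmetry concrete, here is a small sketch of the two update rules using the coefficients stated above. The function and constant names (`update_trust`, `failures_until_below`, `SUCCESS_ALPHA`, and so on) are hypothetical and are not taken from `playbook_repository.update_stats`.

```python
INITIAL_TRUST = 0.3
SUCCESS_ALPHA = 0.1  # weight of a success observation (target 1.0)
FAILURE_ALPHA = 0.2  # weight of a failure observation (target 0.0)
ARCHIVE_THRESHOLD = 0.1


def update_trust(trust: float, success: bool) -> float:
    """One EWMA step: blend the old trust with the observed outcome."""
    alpha = SUCCESS_ALPHA if success else FAILURE_ALPHA
    target = 1.0 if success else 0.0
    return (1.0 - alpha) * trust + alpha * target


def failures_until_below(trust: float, threshold: float = ARCHIVE_THRESHOLD) -> int:
    """Count consecutive failures needed before trust drops below threshold."""
    count = 0
    while trust >= threshold:
        trust = update_trust(trust, success=False)
        count += 1
    return count


if __name__ == "__main__":
    print(update_trust(INITIAL_TRUST, success=True))   # one success from 0.3
    print(update_trust(INITIAL_TRUST, success=False))  # one failure from 0.3
    print(failures_until_below(INITIAL_TRUST))
```

Under these rules a fresh Playbook at 0.3 crosses the 0.1 threshold after five consecutive failures, since 0.3 × 0.8⁵ ≈ 0.098.
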
### D4: Evolver Agent (playbook_evolver.py)

Three functions, all static rules (no LLM dependency):
1. **Low-trust archival**: trust_score < 0.1 → DEPRECATED
2. **Dormancy archival**: unused for 30d AND trust < 0.5 → DEPRECATED
3. **Similarity merge**: symptom Jaccard > 0.9 → keep the higher-trust Playbook, archive the lower-trust one

Feature flag: `AIOPS_P3_EVOLVER_ENABLED=False` (disabled by default)

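The three rules can be sketched as a single pass over an in-memory list. Everything below is an illustrative assumption: `PlaybookSketch`, its field names, `run_rules`, the strict "more than 30 days" reading of dormancy, and the tie-break that archives the later Playbook when trust is equal. It does not reproduce `services/playbook_evolver.py` and omits the feature-flag check.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PlaybookSketch:
    """Hypothetical minimal Playbook: only the fields the three rules need."""

    playbook_id: str
    symptoms: frozenset[str]
    trust_score: float
    last_used_at: datetime
    status: str = "ACTIVE"


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def run_rules(playbooks: list[PlaybookSketch], now: datetime) -> list[str]:
    """Apply the three static rules in order; return the IDs marked DEPRECATED."""
    deprecated: list[str] = []

    def archive(pb: PlaybookSketch) -> None:
        if pb.status != "DEPRECATED":
            pb.status = "DEPRECATED"
            deprecated.append(pb.playbook_id)

    # Rules 1 and 2: low trust, or dormant with mediocre trust.
    for pb in playbooks:
        dormant = now - pb.last_used_at > timedelta(days=30)
        if pb.trust_score < 0.1 or (dormant and pb.trust_score < 0.5):
            archive(pb)

    # Rule 3: among survivors, archive the lower-trust member of similar pairs.
    active = [pb for pb in playbooks if pb.status != "DEPRECATED"]
    for i, first in enumerate(active):
        for second in active[i + 1:]:
            if "DEPRECATED" in (first.status, second.status):
                continue
            if jaccard(first.symptoms, second.symptoms) > 0.9:
                archive(first if first.trust_score < second.trust_score else second)
    return deprecated
```

Running the pairwise merge only over Playbooks that survived rules 1 and 2 avoids keeping a "winner" that was already archived for low trust; whether the real Evolver orders the rules this way is not stated in this ADR.
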
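## Consequences

**Positive**:
- The learning loop is genuinely closed for the first time (node 9 trigger rate goes from 0% to nearly 100%).
- Playbook trust is updated dynamically, and failing Playbooks are retired faster.
- The Evolver prevents Playbook bloat caused by duplicate extraction.

**Limitations**:
- On the human-review path, `matched_playbook_id` still relies on the `metadata` fallback (a full fix requires changing the approval DB schema and is deferred to Phase 3.5).
- The Evolver uses Jaccard (not cosine) similarity; vector-based merging is deferred to Phase 4.
- `AIOPS_P3_ENABLED = False` is the default, but learning calls still run, and the EWMA calculation does not depend on this flag (it lives directly in the repository layer).
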
## Acceptance criteria

- `matched_playbook_id` null rate = 0 (auto_execute path)
- Playbook trust_score is updated on every execution (verifiable via the `playbook_stats_updated` log)
- On a learning timeout, the `learning_trigger_timeout` log appears (no crash)
- A dry run of the Evolver's `run_evolver()` raises no exception