OG T
|
df3ef9006c
|
fix(auto-repair): 首席架構師 Review — 4 Critical/Important 修復
CD Pipeline / build-and-deploy (push) Successful in 7m2s
Critical #1: KM write task 移出 try/except
- _trigger_learning 的 KM 寫入原在 try 內,learning 失敗時不寫 KM
- 移至 except 後確保成功/失敗都寫入
- 移除冗餘 import asyncio(已在頂層 import)
- Minor: approval.incident_id or None 防空字串
Important #2: migration 加 PRIMARY KEY
- playbook_id 從 UNIQUE 升為 PRIMARY KEY
- prod DB 已執行 ALTER TABLE ADD PRIMARY KEY
Important #3: s.sequence→s.step_number, s.description→s.command
- embed_playbook() 使用不存在的欄位名,RAG 向量索引靜默失敗
- RepairStep 正確欄位: step_number, command
Important #1: PlaybookService._get_rag_service 不再 Service 層快取
- 改為每次呼叫工廠 get_playbook_rag_service()
- 避免舊實例繞過工廠的 is_closed 重建邏輯
冷啟動修復 (首席架構師建議B+C):
- _trigger_playbook_extraction 執行成功後自動設定
execution_success=True, effectiveness_score=4, status=RESOLVED
- skip 路徑 logger.debug → logger.info 提升可觀測性
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 12:02:03 +08:00 |
|
OG T
|
72d7536ead
|
feat(auto-repair): 完整自動修復閉環 + KM 沉澱串接
CD Pipeline / build-and-deploy (push) Has been cancelled
1. DB Migration: playbooks 資料表 (phase7_playbooks_table.sql)
- 這是自動修復無法啟動的根本原因 — table 從未建立
- 5 個索引: status/tags/alert_names/source_incidents/created_at
- 已在 prod DB 執行
2. playbook_service: 萃取後自動沉澱 KM
- extract_from_incident() 完成後 fire-and-forget _write_to_km()
- 內容含症狀模式、修復步驟、信心度、來源 Incident
3. approval_execution: 執行結果沉澱 KM
- _trigger_learning() 後 fire-and-forget _write_execution_result_to_km()
- 成功/失敗記錄都寫入,category=execution_result
完整閉環:
告警 → AI分析 → 查Playbook → 決策 → 執行 → 結果寫KM
↓
Incident解決 → KM(knowledge_extractor)
→ Playbook萃取 → KM
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 11:54:15 +08:00 |
|
OG T
|
3256142d29
|
feat(api): ADR-030 Phase 5 持續學習迴圈
從執行結果中學習,持續優化決策:
1. learning_service.py - 持續學習服務
- process_execution_result(): 處理執行結果
- process_human_feedback(): 處理人工反饋
- 自動調整信任度 (成功+1 / 失敗歸零)
- 更新 Playbook 統計
- 成功案例自動萃取 Playbook
2. approval_execution.py - 整合學習觸發
- 執行成功後觸發學習
- 執行失敗後觸發學習
- _trigger_learning(): 非阻塞呼叫學習服務
學習流程:
執行完成 → LearningService.process_execution_result()
├─ 成功: TrustEngine +1 分 + Playbook 統計更新
└─ 失敗: TrustEngine 歸零 + 記錄失敗原因
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-26 22:19:41 +08:00 |
|
OG T
|
2e75a20150
|
feat(api): Phase 7.5-7.6 Playbook 整合決策與自動萃取
Phase 7.5: DecisionManager 三軌決策
- 新增 Playbook 優先匹配 (similarity >= 85%)
- 三軌決策順序: Playbook > LLM > Expert System
- 整合 PlaybookService 推薦引擎
Phase 7.6: 自動萃取機制
- approval_execution.py 成功執行後觸發萃取
- 條件: RESOLVED/CLOSED + effectiveness >= 4
- 滿分 (5) 自動核准 Playbook
測試:
- 13 個 Playbook 單元測試全部通過
- 修復 Incident 模型欄位對應 (reasoning_steps)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-26 11:09:25 +08:00 |
|
OG T
|
716b94f60a
|
feat(api): Phase 16 R4.2 抽取 ApprovalExecutionService
Strangler Fig Pattern: 從 approvals.py 抽取執行編排邏輯
新增:
- src/services/approval_execution.py (271 行)
- ApprovalExecutionService class
- 整合 OperationParser + Executor + Timeline + Notifications
瘦身成果:
- approvals.py: 1097 → 787 行 (-310 行)
- R4 總計: 移除 310 行內嵌業務邏輯
CI/CD 修復:
- 移除危險的 rm -f ~/actions-runner-* 指令
- 改用 checkout clean: true + workspace 內清理
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-25 22:04:15 +08:00 |
|