Commit Graph

2 Commits

Author SHA1 Message Date
OG T
df3ef9006c fix(auto-repair): 首席架構師 Review — 4 Critical/Important 修復
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m2s
Critical #1: KM write task 移出 try/except
- _trigger_learning 的 KM 寫入原在 try 內,learning 失敗時不寫 KM
- 移至 except 後確保成功/失敗都寫入
- 移除冗餘 import asyncio(已在頂層 import)
- Minor: approval.incident_id or None 防空字串

Important #2: migration 加 PRIMARY KEY
- playbook_id 從 UNIQUE 升為 PRIMARY KEY
- prod DB 已執行 ALTER TABLE ADD PRIMARY KEY

Important #3: s.sequence→s.step_number, s.description→s.command
- embed_playbook() 使用不存在的欄位名,RAG 向量索引靜默失敗
- RepairStep 正確欄位: step_number, command

Important #1: PlaybookService._get_rag_service 不再 Service 層快取
- 改為每次呼叫工廠 get_playbook_rag_service()
- 避免舊實例繞過工廠的 is_closed 重建邏輯

冷啟動修復 (首席架構師建議B+C):
- _trigger_playbook_extraction 執行成功後自動設定
  execution_success=True, effectiveness_score=4, status=RESOLVED
- skip 路徑 logger.debug → logger.info 提升可觀測性

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 12:02:03 +08:00
OG T
72d7536ead feat(auto-repair): 完整自動修復閉環 + KM 沉澱串接
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
1. DB Migration: playbooks 資料表 (phase7_playbooks_table.sql)
   - 這是自動修復無法啟動的根本原因 — table 從未建立
   - 5 個索引: status/tags/alert_names/source_incidents/created_at
   - 已在 prod DB 執行

2. playbook_service: 萃取後自動沉澱 KM
   - extract_from_incident() 完成後 fire-and-forget _write_to_km()
   - 內容含症狀模式、修復步驟、信心度、來源 Incident

3. approval_execution: 執行結果沉澱 KM
   - _trigger_learning() 後 fire-and-forget _write_execution_result_to_km()
   - 成功/失敗記錄都寫入,category=execution_result

完整閉環:
告警 → AI分析 → 查Playbook → 決策 → 執行 → 結果寫KM
                                              ↓
                              Incident解決 → KM(knowledge_extractor)
                                          → Playbook萃取 → KM

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 11:54:15 +08:00