feat(km): P1-1 KMWriter 統一契約 + 5 caller 切換 + M4 反查鏈補齊

12-Agent 全景診斷揪出 KM 寫入鏈路 5 條入口無統一契約,fire-and-forget
在 Pod recycle 時會丟失條目。本次抽 KMWriter 強制 7 條契約。

## 7 條契約強制
1. 同步底線:強制 await asyncio.wait_for(timeout)
2. 重試:3 次指數退避 1s/2s/4s(OperationalError / 網路類例外)
3. 失敗回收:3 次後寫 Redis DLQ km:dlq + log
4. 觀測:structlog event + 預留 metric hook(P1-3 補 emitter)
5. 冪等:incident_id + path_type 為 unique key
6. 禁止吞例外:except 必須 log + raise/DLQ
7. M4 反查鏈:payload 含 approval_id 時自動填 related_approval_id 並回填 Path A

## Caller 切換(5 條入口統一介面)
- incident_service.py:1086 Path A(KB extractor + km_conversion)
- approval_execution.py:771 Path B-人工
- decision_manager.py:2178 Path B-自動成功(消除跨類私有方法調用 M1)
- decision_manager.py:2200 Path B-自動失敗(修 B2 早期吞例外)
- playbook_service.py:210 PlaybookKM(兩份 T0 報告都漏的第三條)

## M4 反查鏈補齊
- knowledge.py + models.py: 補 related_approval_id ORM 欄位
- 對齊 phase26_incident_km_integration.sql:20 schema(partial index 已存在)
- approval↔KM 雙向反查鏈完整(dual-path 縫合線)

## Feature Flag (rollback 保險)
- KM_WRITE_AWAIT=true (default): await + timeout + DLQ 強制
- KM_WRITE_AWAIT=false: fire-and-forget(舊行為)

## 測試
- apps/api/tests/test_km_writer.py: 18 測試全綠
  覆蓋 success / timeout / retry / DLQ / 冪等 / KMWriteError /
  on_failure=raise / 反查鏈回填
- 1552 unit tests 全綠(無回歸)

## 驗收
飛輪閉環核心 — KM 寫入不再靜默丟失,AI 學習鏈不斷裂。

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Your Name
2026-04-29 09:41:24 +08:00
parent 715dc3cb91
commit c22e5f334e
8 changed files with 909 additions and 129 deletions

View File

@@ -69,6 +69,9 @@ class KnowledgeEntryCreate(BaseModel):
status: EntryStatus = EntryStatus.DRAFT
related_incident_id: str | None = None
related_playbook_id: str | None = None
# P1-1 2026-04-28 ogt + Claude Sonnet 4.6: M4 補反查鏈 — approval → KM 關聯
# phase26_incident_km_integration.sql 已建立 related_approval_id 欄位與 partial index
related_approval_id: str | None = None
# 2026-04-04 ogt: Phase 25 P1 — Anti-Pattern 閉環用症狀 hash
symptoms_hash: str | None = None
created_by: str | None = None
@@ -96,6 +99,8 @@ class KnowledgeEntry(BaseModel):
status: EntryStatus = EntryStatus.DRAFT
related_incident_id: str | None = None
related_playbook_id: str | None = None
# P1-1 2026-04-28 ogt + Claude Sonnet 4.6: M4 補反查鏈
related_approval_id: str | None = None
# 2026-04-04 ogt: Phase 25 P1 — Anti-Pattern 閉環攔截用的症狀 hashSymptomPattern.compute_hash()
symptoms_hash: str | None = None
view_count: int = 0