critic PR review 揭示已 push commits 的 7 個 blocker,本 commit 全部修復。
## C1 + C2 + M1 + M2 + M3 — KMWriter 真正統一契約(critic 最嚴重 5 條)
### C1 km_writer.py:194 — backfill 自打臉修
- 裸 asyncio.create_task(_backfill_path_a_approval) → await _backfill_path_a_approval_safe()
- 同步 await + 獨立 DLQ km:backfill:dlq + try/except 不阻塞主寫入
- 新增 km_backfill_reconciler_job.py(每 5 分鐘掃 DLQ)+ ENABLE_KM_BACKFILL_RECONCILER flag
- 防 Path B 比 Path A 先完成 → related_approval_id 永遠 NULL 的 race
### C2 km_writer.py:391 — KM_WRITE_AWAIT=false 路徑收緊
- 從 ensure_future(fire-and-forget 比舊版同步寫更糟)
- 改 await writer.write(retry=1, timeout=2.0)(仍 await 但只試一次、超時短)
- docstring 明確標註「緊急回滾用,不保證可靠性」
### M1 decision_manager.py:2178/2203 — 移除 _fire_and_forget 旁路
- 兩處 _fire_and_forget(executor.write_execution_result_to_km(...))
- 改 await asyncio.shield(...) + BaseException 保護(防上層 cancel 中斷)
- KM_WRITE_AWAIT=true 在這條路徑終於真正 await
### M2 incident_service.py:1099 — 自製 path 加 retry+DLQ
- 原本 if settings.KM_WRITE_AWAIT: await asyncio.wait_for else create_task
- 改 3 次指數退避 retry + DLQ 保護(呼叫 km_writer 私有 helper)
### M3 km_writer.py:166 — 冪等聲明對齊實作
- knowledge_repository.create() 加 UPSERT 路徑(pg_insert ON CONFLICT DO UPDATE)
- KnowledgeEntryCreate / KnowledgeEntryRecord 加 path_type 欄位
- migration: ADD COLUMN path_type + partial unique index uix_knowledge_incident_path
## M4 alertmanager.yml — equal: [] 收緊(critic 防爆炸抑制)
- OllamaInstanceDown / KMConverterDown 抑制加 equal: ['cluster'] 約束
- 防多 cluster 場景下任一 Ollama down 誤抑全 AI/SLO 告警
## M5 Alertmanager 版本驗證(已確認 v0.31.1,遠超 v0.22+)
## M6 governance_agent.py — health score 區分 skipped vs ok vs violated
- check_slo_compliance 加 _meta {violated_count, skipped_count, ok_count, all_skipped, status}
- run_self_check: SLO 全 skipped 時獨立發 governance_slo_data_gap 告警
(不污染 self_failure 計數,因為 no_data 是 emitter 未實作不是治理機制故障)
## M7 scripts/check_config_drift.py — 改 AST 解析
- regex 改 ast.parse 找 Settings ClassDef AnnAssign Field(default=...)
- 避免多行 list / default_factory= / 含跳行字串的 false negative
- 4 欄位(AI_FALLBACK_ORDER / ARGOCD_URL / PROMETHEUS_URL / OLLAMA_URL)全對齊
## 新增測試
- test_km_writer_backfill_reconciler.py: 7 cases(C1 reconciler + safe helper)
- test_km_writer_idempotent.py: 5 cases(M3 path_type 注入 + UPSERT 分支)
## 驗證
- 1585 unit tests 全綠(+13 從 1572)
- amtool check-config SUCCESS(8 inhibit_rules / 2 receivers)
- drift checker AST-based 4 欄位全對齊
- Alertmanager v0.31.1 確認支援新語法
## 期望影響
- KMWriter 名實統一:飛輪閉環 KM 寫入路徑 100% 可靠
- M4 抑制爆炸風險解除
- 治理層不再對 SLO no_data 靜默
- drift checker false negative 風險解除
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
196 lines
7.3 KiB
Python
196 lines
7.3 KiB
Python
"""
|
||
KM Backfill Reconciler 單元測試
|
||
================================
|
||
P1-1 C1 修復 2026-04-28 ogt + Claude Sonnet 4.6
|
||
|
||
測試範圍:
|
||
1. reconciler 從 DLQ 成功補救 → LREM 移除
|
||
2. reconciler DB 失敗 → 保留 DLQ(不移除)
|
||
3. reconciler DLQ 格式錯誤 → 移除(無法補救)
|
||
4. reconciler DLQ 空 → 0 processed
|
||
5. ENABLE_KM_BACKFILL_RECONCILER=false → 跳過
|
||
6. _backfill_path_a_approval_safe — 成功路徑不寫 DLQ
|
||
7. _backfill_path_a_approval_safe — 失敗時寫 km:backfill:dlq
|
||
|
||
建立:2026-04-28 (台北時區) ogt + Claude Sonnet 4.6
|
||
"""
|
||
|
||
import json
|
||
from unittest.mock import AsyncMock, MagicMock, patch
|
||
|
||
import pytest
|
||
|
||
from src.jobs.km_backfill_reconciler_job import (
|
||
run_km_backfill_reconciler,
|
||
)
|
||
from src.services.km_writer import (
|
||
KM_BACKFILL_DLQ_KEY,
|
||
_backfill_path_a_approval_safe,
|
||
)
|
||
|
||
|
||
# =============================================================================
|
||
# Helper
|
||
# =============================================================================
|
||
|
||
def _make_dlq_record(incident_id: str = "INC-001", approval_id: str = "AP-001") -> bytes:
|
||
return json.dumps({"incident_id": incident_id, "approval_id": approval_id}).encode()
|
||
|
||
|
||
# =============================================================================
|
||
# 1. Reconciler 成功補救
|
||
# =============================================================================
|
||
|
||
@pytest.mark.asyncio
|
||
async def test_reconciler_success_removes_from_dlq():
|
||
"""成功補救後應 LREM 從 DLQ 移除"""
|
||
record = _make_dlq_record("INC-R1", "AP-R1")
|
||
mock_redis = AsyncMock()
|
||
mock_redis.lrange = AsyncMock(return_value=[record])
|
||
mock_redis.lrem = AsyncMock()
|
||
|
||
with patch("src.jobs.km_backfill_reconciler_job.settings") as mock_settings, \
|
||
patch("src.core.redis_client.get_redis", return_value=mock_redis), \
|
||
patch("src.jobs.km_backfill_reconciler_job._do_backfill", new_callable=AsyncMock) as mock_do:
|
||
|
||
mock_settings.ENABLE_KM_BACKFILL_RECONCILER = True
|
||
result = await run_km_backfill_reconciler()
|
||
|
||
assert result["processed"] == 1
|
||
assert result["success"] == 1
|
||
assert result["failed"] == 0
|
||
mock_do.assert_called_once_with("INC-R1", "AP-R1")
|
||
mock_redis.lrem.assert_called_once_with(KM_BACKFILL_DLQ_KEY, 1, record)
|
||
|
||
|
||
# =============================================================================
|
||
# 2. Reconciler DB 失敗 → 保留 DLQ
|
||
# =============================================================================
|
||
|
||
@pytest.mark.asyncio
|
||
async def test_reconciler_db_failure_preserves_dlq():
|
||
"""DB 失敗時不應 LREM(保留 DLQ 等下次補救)"""
|
||
record = _make_dlq_record("INC-FAIL", "AP-FAIL")
|
||
mock_redis = AsyncMock()
|
||
mock_redis.lrange = AsyncMock(return_value=[record])
|
||
mock_redis.lrem = AsyncMock()
|
||
|
||
with patch("src.jobs.km_backfill_reconciler_job.settings") as mock_settings, \
|
||
patch("src.core.redis_client.get_redis", return_value=mock_redis), \
|
||
patch("src.jobs.km_backfill_reconciler_job._do_backfill",
|
||
side_effect=Exception("db connection refused")):
|
||
|
||
mock_settings.ENABLE_KM_BACKFILL_RECONCILER = True
|
||
result = await run_km_backfill_reconciler()
|
||
|
||
assert result["processed"] == 1
|
||
assert result["success"] == 0
|
||
assert result["failed"] == 1
|
||
# 失敗時不應 LREM
|
||
mock_redis.lrem.assert_not_called()
|
||
|
||
|
||
# =============================================================================
|
||
# 3. Reconciler 格式錯誤 → 移除(無法補救)
|
||
# =============================================================================
|
||
|
||
@pytest.mark.asyncio
|
||
async def test_reconciler_malformed_record_removed():
|
||
"""格式錯誤的 DLQ record 應被移除(不能卡住 DLQ)"""
|
||
malformed = b"not-json-at-all"
|
||
mock_redis = AsyncMock()
|
||
mock_redis.lrange = AsyncMock(return_value=[malformed])
|
||
mock_redis.lrem = AsyncMock()
|
||
|
||
with patch("src.jobs.km_backfill_reconciler_job.settings") as mock_settings, \
|
||
patch("src.core.redis_client.get_redis", return_value=mock_redis), \
|
||
patch("src.jobs.km_backfill_reconciler_job._do_backfill", new_callable=AsyncMock) as mock_do:
|
||
|
||
mock_settings.ENABLE_KM_BACKFILL_RECONCILER = True
|
||
await run_km_backfill_reconciler()
|
||
|
||
# 格式錯誤移除
|
||
mock_redis.lrem.assert_called_once_with(KM_BACKFILL_DLQ_KEY, 1, malformed)
|
||
# 不嘗試 DB 補救
|
||
mock_do.assert_not_called()
|
||
|
||
|
||
# =============================================================================
|
||
# 4. DLQ 空 → 0 processed
|
||
# =============================================================================
|
||
|
||
@pytest.mark.asyncio
|
||
async def test_reconciler_empty_dlq():
|
||
"""DLQ 為空時應返回 0 processed"""
|
||
mock_redis = AsyncMock()
|
||
mock_redis.lrange = AsyncMock(return_value=[])
|
||
|
||
with patch("src.jobs.km_backfill_reconciler_job.settings") as mock_settings, \
|
||
patch("src.core.redis_client.get_redis", return_value=mock_redis):
|
||
|
||
mock_settings.ENABLE_KM_BACKFILL_RECONCILER = True
|
||
result = await run_km_backfill_reconciler()
|
||
|
||
assert result["processed"] == 0
|
||
assert result["success"] == 0
|
||
assert result["failed"] == 0
|
||
|
||
|
||
# =============================================================================
|
||
# 5. ENABLE_KM_BACKFILL_RECONCILER=false → 跳過
|
||
# =============================================================================
|
||
|
||
@pytest.mark.asyncio
|
||
async def test_reconciler_disabled_skips():
|
||
"""Feature flag false 時應直接返回 0,不存取 Redis"""
|
||
with patch("src.jobs.km_backfill_reconciler_job.settings") as mock_settings, \
|
||
patch("src.core.redis_client.get_redis") as mock_get_redis:
|
||
|
||
mock_settings.ENABLE_KM_BACKFILL_RECONCILER = False
|
||
result = await run_km_backfill_reconciler()
|
||
|
||
assert result["processed"] == 0
|
||
mock_get_redis.assert_not_called()
|
||
|
||
|
||
# =============================================================================
|
||
# 6. _backfill_path_a_approval_safe — 成功路徑不寫 DLQ
|
||
# =============================================================================
|
||
|
||
@pytest.mark.asyncio
|
||
async def test_backfill_safe_success_no_dlq():
|
||
"""成功時不應寫 km:backfill:dlq"""
|
||
with patch("src.services.km_writer._backfill_path_a_approval", new_callable=AsyncMock) as mock_bf, \
|
||
patch("src.core.redis_client.get_redis") as mock_get_redis:
|
||
|
||
await _backfill_path_a_approval_safe("INC-OK", "AP-OK")
|
||
|
||
mock_bf.assert_called_once_with("INC-OK", "AP-OK")
|
||
mock_get_redis.assert_not_called()
|
||
|
||
|
||
# =============================================================================
|
||
# 7. _backfill_path_a_approval_safe — 失敗時寫 km:backfill:dlq
|
||
# =============================================================================
|
||
|
||
@pytest.mark.asyncio
|
||
async def test_backfill_safe_failure_writes_dlq():
|
||
"""失敗時應寫 km:backfill:dlq 且不拋例外"""
|
||
captured_keys = []
|
||
mock_redis = AsyncMock()
|
||
|
||
async def _capture_lpush(key, value):
|
||
captured_keys.append(key)
|
||
|
||
mock_redis.lpush.side_effect = _capture_lpush
|
||
mock_redis.ltrim = AsyncMock()
|
||
|
||
with patch("src.services.km_writer._backfill_path_a_approval",
|
||
side_effect=Exception("db error")), \
|
||
patch("src.core.redis_client.get_redis", return_value=mock_redis):
|
||
|
||
# 不應拋例外
|
||
await _backfill_path_a_approval_safe("INC-ERR", "AP-ERR")
|
||
|
||
assert KM_BACKFILL_DLQ_KEY in captured_keys
|