fix(report): 日報重發 + 自動修復 0% 兩大根因修復
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled

問題一:日度巡檢報告重複發送(多 Pod 各自跑 daily job)
  - 根因:run_daily_report_loop 沒有接 leader lock
    其他 scanner(capacity/hermes/compliance)都有呼叫
    try_acquire_daily_lock,唯獨日報 loop 缺失
  - 修法:asyncio.sleep 後加 try_acquire_daily_lock("daily_report")
    搶不到 lock 的 Pod 直接 continue,等下一個 08:00

問題二:自動修復成功率永遠 0.0%
  - 根因:_collect_repair_stats 查 incidents.outcome->>'execution_success'
    但整條執行鏈路(approval_execution.py NO_ACTION + 真實執行)
    從未將 execution_success 寫回 incidents.outcome JSON
    導致查詢永遠回 0
  - 修法:改查 approval_records.status(EXECUTION_SUCCESS / EXECUTION_FAILED)
    這是唯一被穩定寫入的 source of truth

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
Your Name
2026-04-22 09:03:40 +08:00
parent 757a58cc60
commit 6810ab359d

View File

@@ -219,33 +219,29 @@ class ReportGenerationService:
async def _collect_repair_stats(self, since: datetime) -> dict:
"""
收集自動修復統計IncidentRecord.outcome JSON
收集自動修復統計
2026-04-14 Claude Sonnet 4.6 修復 — 原本引用不存在的 ApprovalRequestRecord
實際 execution_success 儲存在 IncidentRecord.outcome JSON 欄位。
2026-04-22 Claude Sonnet 4.6 修復 — incidents.outcome JSON 在執行鏈路中從未被寫入
execution_success,導致永遠查詢到 0。改查 approval_records.status 作為 source of truth
approval_execution.py 每次執行後都會寫入 EXECUTION_SUCCESS / EXECUTION_FAILED
"""
from sqlalchemy import func, select, text
from sqlalchemy import text
from src.db.base import get_db_context
from src.db.models import IncidentRecord
async with get_db_context() as db:
# PostgreSQL JSON 路徑查詢outcome->>'execution_success'
success = await db.scalar(
select(func.count()).select_from(IncidentRecord).where(
IncidentRecord.created_at >= since,
text("outcome->>'execution_success' = 'true'"),
)
) or 0
failed = await db.scalar(
select(func.count()).select_from(IncidentRecord).where(
IncidentRecord.created_at >= since,
text("outcome->>'execution_success' = 'false'"),
)
) or 0
return {"success": success, "failed": failed}
row = await db.execute(
text("""
SELECT
COUNT(*) FILTER (WHERE status = 'execution_success') AS success,
COUNT(*) FILTER (WHERE status = 'execution_failed') AS failed
FROM approval_records
WHERE created_at >= :since
"""),
{"since": since},
)
r = row.one()
return {"success": int(r.success or 0), "failed": int(r.failed or 0)}
async def _collect_km_stats(self, since: datetime) -> int:
"""收集新增 KM 條目數"""
@@ -559,6 +555,12 @@ async def run_daily_report_loop() -> None:
)
await asyncio.sleep(sleep_seconds)
# 2026-04-22 Claude Sonnet 4.6: 多 Pod 競速保護 — 只有搶到 Redis SETNX 的 Pod 才發報告
from src.services.ai_advisory_helpers import try_acquire_daily_lock
if not await try_acquire_daily_lock("daily_report"):
logger.info("daily_report_skipped_other_pod")
continue
logger.info("daily_report_triggered")
await service.send_daily_report()