Commit Graph

7 Commits

Author SHA1 Message Date
Your Name
77fe2a85fd fix(api): 在日報月報 preview 顯示資料源沉澱
Some checks failed
Code Review / ai-code-review (push) Successful in 17s
CD Pipeline / tests (push) Successful in 1m43s
CD Pipeline / build-and-deploy (push) Failing after 8m19s
CD Pipeline / post-deploy-checks (push) Has been skipped
2026-06-18 20:01:44 +08:00
Your Name
e2ab879636 fix(alerts): correct telegram execution truth
Some checks failed
CD Pipeline / tests (push) Failing after 52s
CD Pipeline / build-and-deploy (push) Has been skipped
CD Pipeline / post-deploy-checks (push) Has been skipped
Code Review / ai-code-review (push) Successful in 11s
2026-05-31 13:58:39 +08:00
Your Name
88af639651 fix(report): 修正 approval_records.status 大小寫不一致
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 9m46s
DB 以 SQLEnum 儲存 enum name(EXECUTION_FAILED 大寫),
而非 enum value(execution_failed 小寫)。
SQL 加 UPPER(status::text) 確保不論大小寫皆能命中。

驗證:live DB 查詢 success=0, failed=2(之前永遠 0/0)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 09:10:39 +08:00
Your Name
6810ab359d fix(report): 日報重發 + 自動修復 0% 兩大根因修復
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
問題一:日度巡檢報告重複發送(多 Pod 各自跑 daily job)
  - 根因:run_daily_report_loop 沒有接 leader lock
    其他 scanner(capacity/hermes/compliance)都有呼叫
    try_acquire_daily_lock,唯獨日報 loop 缺失
  - 修法:asyncio.sleep 後加 try_acquire_daily_lock("daily_report")
    搶不到 lock 的 Pod 直接 continue,等下一個 08:00

問題二:自動修復成功率永遠 0.0%
  - 根因:_collect_repair_stats 查 incidents.outcome->>'execution_success'
    但整條執行鏈路(approval_execution.py NO_ACTION + 真實執行)
    從未將 execution_success 寫回 incidents.outcome JSON
    導致查詢永遠回 0
  - 修法:改查 approval_records.status(EXECUTION_SUCCESS / EXECUTION_FAILED)
    這是唯一被穩定寫入的 source of truth

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 09:03:44 +08:00
OG T
aa4e5757a2 fix: 技術債清理 — report_generation 重試機制 + GAP-A4 文件化
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 15m46s
技術債 #1: postmortem 發送失敗靜默吞掉
- 3 次指數退避重試 (2s → 4s → 6s)
- 全失敗後送簡化降級通知到 SRE 群組
- 防止事後檢討默默消失

技術債 #2 (QueryBuilder 抽象): DEFER
- 全專案僅 1 處用 outcome JSON path query
- 違反「Don't design for hypothetical future requirements」
- 待第二 caller 出現再抽

技術債 #3 (E2E 測試): 已涵蓋
- test_gap_a4_placeholder_resolution.py TestMatchRuleRejection
- Mission C prod 鏈路實測(KubePodCrashLooping)
- Playwright K8s/Telegram staging 留待 staging 環境就緒

新增文件:
- ADR-078-gap-a4-placeholder-resolution.md
- LOGBOOK 2026-04-14 深夜收官條目

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-14 18:46:25 +08:00
OG T
f54dea48b1 fix(GAP-D5): 日度報告 DB 欄位修正
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
兩處 import/查詢錯誤修復(統帥 E2E 預覽發現):

1. _collect_repair_stats: ApprovalRequestRecord 不存在
   → 改用 IncidentRecord + outcome JSON 路徑查詢 execution_success

2. _collect_playbook_count: PlaybookRecord 不存在
   → 改用 playbook_service.list_playbooks() (Redis 儲存)

修復前:修復成功率永遠 0.0%、活躍 Playbook 永遠 0
修復後:報告數字反映真實 DB/Redis 狀態

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-14 18:32:29 +08:00
OG T
684d6cfb43 feat(adr-076): 戰術 B 四大 Task 全部完成 — 告警聚合+重試+自動報告
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 17m34s
Task 2: AlertGroupingService — Redis 5分鐘滑動視窗,防告警風暴
- apps/api/src/services/alert_grouping_service.py (新增)
- webhooks.py 整合:指紋生成後/LLM前短路子告警
- Threshold=3,Graceful Degradation,16 tests

Task 3: approval_execution.py 執行失敗重試
- MAX_RETRY=2, RETRY_DELAY_SECONDS=30
- _is_transient_error() 瞬態/永久分類,永久錯誤不重試
- Timeline 記錄重試進度,成功後標注重試次數,29 tests

Task 4: report_generation_service.py 自動報告
- 日度巡檢報告:每日 08:00 台北時間,Telegram SRE 群組推送
- Postmortem:Incident resolved + duration > 10 分鐘自動觸發
- main.py lifespan 掛載 run_daily_report_loop(),30 tests

測試: 600 → 675 通過 (+75),0 failed

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-14 14:39:14 +08:00