P0 #3 (thorough long-term fix series): upgrade the daily report's pod health check from "alert on every ready=False" to a full K8s pod lifecycle state machine:

| Pod state | Behavior |
|-----------|----------|
| Succeeded / Completed | Skip (CronJob/Job finished normally) |
| Failed | Always alert |
| Unknown | Always alert |
| Pending <5min | Skip (just scheduled, expected) |
| Pending >=5min | Alert "image pull / scheduling stuck" |
| Running ready=True | Healthy, skip |
| Running ready=False <2min | Skip (just started, probe has not passed yet) |
| Running ready=False >=2min | Alert "readiness probe failing / startup problem" |
| restarts >=3 | Always alert (regardless of phase) |

Implementation:
- PodInfo gains start_time: Optional[str] (from .status.startTime)
- _get_pod_status adds a STARTTIME column to the kubectl custom-columns
- _build_warnings implements the full state machine plus threshold constants

Regression tests (13 in test_heartbeat_pod_state_machine.py) cover every phase plus boundary conditions, including a reproduction of the commander's 2026-05-02 screenshot evidence: 3 drift-scanner pods in Succeeded must not trigger the false alarm "3 items need attention".

Tests: 13 passed (new test_heartbeat_pod_state_machine.py)

Follows up a38d9112 (which only skipped Succeeded). This commit fully handles Pending/Failed/Unknown, adds the time thresholds, and alerts conservatively when start_time is missing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
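For reviewers, here is a minimal Python sketch of the decision order the table describes. It is not the actual _build_warnings. The PodInfo fields other than start_time, the constant names, the "<none>" handling (kubectl custom-columns prints that for missing values), and the build_warnings function itself are hypothetical illustrations.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Hypothetical threshold constants mirroring the table (names are assumptions).
PENDING_GRACE = timedelta(minutes=5)
NOT_READY_GRACE = timedelta(minutes=2)
RESTART_ALERT_THRESHOLD = 3


@dataclass
class PodInfo:
    # Field set is an assumption; only start_time is confirmed by this commit.
    name: str
    phase: str  # e.g. "Running", "Pending", "Succeeded"
    ready: bool
    restarts: int
    start_time: Optional[str] = None  # .status.startTime, RFC 3339


def _parse_start_time(raw: Optional[str]) -> Optional[datetime]:
    """Parse an RFC 3339 timestamp; treat empty or '<none>' as unknown."""
    if not raw or raw == "<none>":
        return None
    # fromisoformat() only accepts a trailing 'Z' on Python 3.11+.
    return datetime.fromisoformat(raw.replace("Z", "+00:00"))


def build_warnings(pods: List[PodInfo], now: datetime) -> List[str]:
    """Hypothetical state machine; `now` must be timezone-aware (UTC)."""
    warnings: List[str] = []
    for pod in pods:
        # Restart storms alert regardless of phase.
        if pod.restarts >= RESTART_ALERT_THRESHOLD:
            warnings.append(f"{pod.name}: restarts={pod.restarts}")
            continue
        if pod.phase in ("Succeeded", "Completed"):
            continue  # finished Job/CronJob pod
        if pod.phase in ("Failed", "Unknown"):
            warnings.append(f"{pod.name}: phase={pod.phase}")
            continue
        if pod.phase == "Running" and pod.ready:
            continue  # healthy
        started = _parse_start_time(pod.start_time)
        # No start time: alert conservatively instead of guessing an age.
        age = now - started if started else None
        if pod.phase == "Pending":
            if age is None or age >= PENDING_GRACE:
                warnings.append(f"{pod.name}: Pending (image pull / scheduling stuck?)")
        elif pod.phase == "Running":  # ready=False at this point
            if age is None or age >= NOT_READY_GRACE:
                warnings.append(f"{pod.name}: not ready (readiness probe / startup?)")
        else:
            warnings.append(f"{pod.name}: unexpected phase={pod.phase}")
    return warnings
```

The restart check runs first because the table says it applies regardless of phase. The comparisons use `>=` so that a pod exactly at 5min (Pending) or 2min (not ready) alerts, matching the table's boundaries. Under this sketch, three Succeeded drift-scanner pods yield an empty warning list.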