feat(p1.5): FailoverAlerter integration points 3+4, plus 6 new test cases
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 1m32s
P1.5 wrap-up (as specified in lines 96-99 of the status document):
Integration point 3: failover_manager triggers the Gemini quota alert:
- ollama_failover_manager.py: when _check_gemini_quota returns False, call
  alerter.alert_gemini_quota_exceeded({quota, current_count})
- Read current_count from the Redis key ollama:gemini_daily_count:{date} (fail-soft)
- The alerter dedups internally for 24h (QUOTA_DEDUP_TTL_SEC=86400), so at most one alert is sent per day
- Wrapped in try/except: an alerting failure fails open and does not block routing
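This commit does not show the alerter's internals. The sketch below is minimal and rests on an assumption: that the 24h dedup is built on Redis `SET key value NX EX ttl`, which the commit does not confirm. Under that assumption, the once-per-day behaviour and both fail-open rules could look like this. `should_send`, `FakeRedis`, and the key string are hypothetical names. `FakeRedis` imitates only the `nx`/`ex` keywords of `redis.asyncio.Redis.set` and ignores expiry.

```python
import asyncio
from typing import Any, Optional

# Mirrors the constant named in this commit: 86400 s = 24 h.
QUOTA_DEDUP_TTL_SEC = 86400


async def should_send(redis: Optional[Any], key: str, ttl_sec: int) -> bool:
    """Hypothetical dedup gate: True means 'send this alert now'."""
    if redis is None:
        # No Redis injected: fail open (send, dedup disabled).
        return True
    try:
        # SET key NX EX ttl only succeeds for the first caller inside the window;
        # redis-py returns True on success and None when the key already exists.
        return bool(await redis.set(key, "1", ex=ttl_sec, nx=True))
    except Exception:
        # Redis error: fail open rather than silently drop the alert.
        return True


class FakeRedis:
    """In-memory stand-in for the nx/ex keywords of redis.asyncio.Redis.set (no expiry)."""

    def __init__(self) -> None:
        self.store: dict[str, str] = {}

    async def set(self, name: str, value: str, ex: Optional[int] = None,
                  nx: bool = False) -> Optional[bool]:
        if nx and name in self.store:
            return None
        self.store[name] = value
        return True


async def demo() -> list[bool]:
    r = FakeRedis()
    key = "alert:gemini_quota:2026-04-26"  # hypothetical key format
    first = await should_send(r, key, QUOTA_DEDUP_TTL_SEC)
    second = await should_send(r, key, QUOTA_DEDUP_TTL_SEC)
    without_redis = await should_send(None, key, QUOTA_DEDUP_TTL_SEC)
    return [first, second, without_redis]


results = asyncio.run(demo())
print(results)  # [True, False, True]
```

`SET NX` is atomic on the server, so two workers that hit the quota at the same moment cannot both claim the dedup slot. The slot reopens only when the TTL expires, 24h after the first alert.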
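Integration point 4: main.py lifespan injects the Redis client:
- Runs after _recovery_svc.start() and before yield
- Calls configure_alerter(get_redis()) to replace the singleton so that it gains dedup capability
- Wrapped in try/except: an injection failure fails open (the alerter still works, but without dedup)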
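Integration point 4 replaces a module-level singleton at startup. The sketch below shows that pattern in minimal form. It assumes the alerter is a plain module global, and the real `src.services.failover_alerter` may differ. `FailoverAlerter.dedup_enabled` is a hypothetical attribute added for illustration.

```python
from typing import Any, Optional


class FailoverAlerter:
    """Hypothetical minimal alerter: only records whether a Redis client is present."""

    def __init__(self, redis: Optional[Any] = None) -> None:
        self.redis = redis

    @property
    def dedup_enabled(self) -> bool:
        return self.redis is not None


# Singleton created at import time, before any Redis pool exists: it works, but without dedup.
_alerter: FailoverAlerter = FailoverAlerter(redis=None)


def get_failover_alerter() -> FailoverAlerter:
    # Reads the module global at call time, so it sees later replacements.
    return _alerter


def configure_alerter(redis: Optional[Any]) -> FailoverAlerter:
    """Replace the singleton with an instance that holds the Redis client."""
    global _alerter
    _alerter = FailoverAlerter(redis=redis)
    return _alerter


before = get_failover_alerter()
configure_alerter(object())  # stand-in for get_redis()
after = get_failover_alerter()
print(before.dedup_enabled, after.dedup_enabled)  # False True
```

This design has a consequence for callers. They should fetch the instance through `get_failover_alerter()` at call time, as the manager hunk does. A reference captured at import time would keep pointing at the pre-injection instance, which has no Redis.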
New tests (174 lines, 6/6 passing):
- test_alert_failover_dedup: a second alert for the same to_provider is suppressed by the 10-min dedup ✅
- test_alert_recovery_send: sends normally, Markdown message, N consecutive HEALTHY checks ✅
- test_no_telegram_chat_id_noop: a missing chat_id fails soft without raising ✅
- test_quota_alert_dedup_24h: TTL=86400s, message includes quota and count ✅
- test_configure_alerter_replaces_singleton: redis is available after lifespan injection ✅
- test_dedup_fail_open_when_no_redis: Redis None → the alert is still sent ✅
Mock note: _send() imports telegram_gateway/get_settings inline,
so the mock target must be src.services.telegram_gateway / src.core.config,
not the alerter module itself.
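The mock note follows from how inline imports resolve. `from X import name` inside a function looks `name` up on module `X` each time the function runs, so the patch must target `X`. The sketch below is self-contained. It uses a hypothetical in-memory module `fake_telegram_gateway` with a hypothetical `send_message` function in place of `src.services.telegram_gateway`.

```python
import sys
import types
from unittest import mock

# Hypothetical stand-in for src.services.telegram_gateway, registered so it can be imported.
gateway = types.ModuleType("fake_telegram_gateway")


def _real_send(text: str) -> str:
    return f"real:{text}"


gateway.send_message = _real_send
sys.modules["fake_telegram_gateway"] = gateway


def _send(text: str) -> str:
    # Inline import, as in the alerter's _send(): resolved on the source module at call time.
    from fake_telegram_gateway import send_message
    return send_message(text)


# Patch the source module, which is where the inline import looks the name up.
with mock.patch("fake_telegram_gateway.send_message", return_value="mocked") as m:
    patched_result = _send("hi")

unpatched_result = _send("hi")  # original function restored after the with-block
print(patched_result, unpatched_result)  # mocked real:hi
```

Patching the alerter module instead would raise `AttributeError`. The inline-imported name exists only as a local variable inside `_send()` and never becomes an attribute of the alerter module.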
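Regression: the existing 37 ollama_failover_manager tests and 3 lifespan_wiring tests all pass.
Flywheel autonomy score: ~75 → estimated ~80 (quota exhaustion now raises an alert, +5 for ops visibility)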
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -563,6 +563,17 @@ async def lifespan(_app: FastAPI) -> AsyncGenerator[None, None]:
        # Start the recovery service (bootstraps current_primary from Redis and starts background monitoring)
        await _recovery_svc.start()

        # 2026-04-26 P1.5 integration point 4 by Claude Opus 4.7: inject the Redis client into the Failover Alerter
        # Must run after recovery_svc.start() (so the Redis pool is available) and before yield
        try:
            from src.services.failover_alerter import configure_alerter
            from src.core.redis_client import get_redis
            configure_alerter(get_redis())
            logger.info("failover_alerter_configured")
        except Exception as _alerter_err:
            logger.warning("failover_alerter_configure_failed", error=str(_alerter_err))

        logger.info("ollama_failover_system_started")
    except Exception as e:
        logger.warning("ollama_failover_system_start_failed", error=str(e))

@@ -232,6 +232,30 @@ class OllamaFailoverManager:
                url_111=url_111,
                url_188=url_188,
            )
            # 2026-04-26 P1.5 integration point 3 by Claude Opus 4.7: Telegram alert on quota exhaustion
            # The alerter dedups internally for 24h (QUOTA_DEDUP_TTL_SEC), so even though this runs on every
            # quota-exceeded event, at most one alert is sent per day. Failures fail open (routing is not blocked).
            try:
                from src.services.failover_alerter import get_failover_alerter
                from src.core.redis_client import get_redis
                _current_count = quota  # default to quota (once exceeded, the count is >= quota)
                try:
                    _redis = get_redis()
                    if _redis is not None:
                        _key = f"ollama:gemini_daily_count:{datetime.date.today().isoformat()}"
                        _raw = await _redis.get(_key)
                        _current_count = int(_raw or 0)
                except Exception:
                    pass
                await get_failover_alerter().alert_gemini_quota_exceeded({
                    "quota": quota,
                    "current_count": _current_count,
                })
            except Exception as _alert_err:
                logger.warning(
                    "gemini_quota_alert_dispatch_failed",
                    error=str(_alert_err),
                )

            # Write to audit_log (best-effort)
            await self._write_failover_audit(result)
