Files
awoooi/apps
Your Name 2c57b71db9
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 10m45s
feat(wave5-p2): GovernanceAgent 4 項自檢 + Ollama 健康告警規則 + Prometheus metrics 整合
MASTER plan_complete_v3.md Wave 5 P2.2 + P2.3 完成(multiple engineers 在限額前完成代碼,補 commit):

P2.2 — GovernanceAgent 4 項自檢:
- governance_agent.py (342 行) — 每 1 小時自檢循環:
  · trust_drift(信任度漂移檢測)
  · knowledge_degradation(知識退化檢測)
  · llm_hallucination(LLM 幻覺檢測)
  · execution_blast_radius(執行爆炸半徑檢測)
- main.py lifespan: asyncio.create_task(run_governance_loop()) 啟動
  try/except 包裹,schedule 失敗不阻斷主流程
- failover_alerter.py: alert_governance(event_type, payload) 1h dedup
  四類事件 → Telegram MarkdownV2 告警

P2.3 — Ollama 健康規則 + Prometheus Metrics:
- ops/monitoring/ollama_health_rules.yaml (148 行):
  · OllamaHealthDegraded / OllamaPrimaryDown
  · OllamaFailoverTriggered / GeminiQuotaExceeded
  · 補 Prometheus 取資料的 alert rules
- core/metrics.py (57 行):
  · GEMINI_DAILY_CALL_COUNT / GEMINI_DAILY_QUOTA Gauge
  · OLLAMA_FAILOVER_TRIGGERED_TOTAL Counter
  · OLLAMA_CURRENT_PRIMARY_IS_OLLAMA Gauge
- ollama_failover_manager.py:
  · _check_gemini_quota: 每次 check 同步更新 Gauge(讓 Prometheus 取最新值)
  · select_provider: failover 時 inc Counter + 切 Primary Gauge
  · try/except 包裹,metric 失敗不阻斷主路由

E2E 測試:
- test_failover_e2e_dispatch.py (365 行)
  完整 dispatch 路徑:health check → failover decide → alerter → metrics

Tests: 54 passed (e2e_dispatch + failover_manager + failover_alerter)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Multiple Engineers (上 session Wave 5) <noreply@anthropic.com>
2026-04-26 20:56:19 +08:00
..