Commit Graph

2 Commits

Author SHA1 Message Date
Your Name
32affaffeb fix(critic-hotfix): 4 fixes for critic BLOCKER + HIGH findings (CD blocked + flywheel running idle)
Issues found by the critic after a full review of 6 commits:

CD blockage fix:
- test_ai_router_failover_integration.py: 3 tests now use patch.object to mock
  _select_provider_and_model directly, forcing the initial provider to OLLAMA.
  The original IntentType.UNKNOWN mock was still reclassified inside the router
  to DIAGNOSE → openclaw_nemo, so failover never triggered.
  → 5/5 PASSED
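
A minimal sketch of the patch.object approach described above. The Router class, its route() method, and the provider strings are hypothetical stand-ins, not the project's actual ai_router code; only the idea of patching _select_provider_and_model comes from the commit message.

```python
from unittest.mock import patch


class Router:
    """Hypothetical stand-in for the real router."""

    def _select_provider_and_model(self, intent):
        # Assumed behaviour: UNKNOWN is reclassified to DIAGNOSE, which
        # selects a non-Ollama provider, so the failover path is never hit.
        if intent == "UNKNOWN":
            intent = "DIAGNOSE"
        if intent == "DIAGNOSE":
            return ("openclaw_nemo", "nemo-model")
        return ("ollama", "local-model")

    def route(self, intent):
        provider, _model = self._select_provider_and_model(intent)
        # Only an initial Ollama selection exercises the failover branch.
        return "failover-path" if provider == "ollama" else "direct-path"


def route_with_forced_ollama(intent):
    # Patch the selector itself instead of mocking the intent, so the
    # reclassification step cannot undo the test setup.
    with patch.object(
        Router, "_select_provider_and_model", return_value=("ollama", "local-model")
    ):
        return Router().route(intent)
```

Mocking the intent alone leaves the selection logic free to pick another provider; patching the selector pins the starting provider regardless of how intents are reclassified.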

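BLOCKER B1: Gitea Telegram notifications were never sent:
- apps/api/src/api/v1/gitea_webhook.py:399
  redis = await get_redis()  →  redis = get_redis()
  The original await raised a TypeError that the outer except swallowed → the
  Task C "PR merged" and workflow_run failure notifications all failed silently
  (the green CI was an illusion: the tests only check for HTTP 202, not actual delivery)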
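
The failure mode can be reproduced in isolation. This is a sketch under the assumption, implied by the fix, that get_redis() is a plain synchronous function; FakeRedis and the two notify_* helpers are hypothetical names used only for illustration.

```python
import asyncio


class FakeRedis:
    """Hypothetical placeholder for a Redis client object."""


def get_redis():
    # Assumed to be synchronous: it returns a client, not a coroutine.
    return FakeRedis()


async def notify_buggy():
    try:
        redis = await get_redis()  # awaiting a non-awaitable raises TypeError
        return "sent"
    except Exception as exc:
        # A broad except hides the bug; the caller still sees a normal return.
        return f"swallowed {type(exc).__name__}"


async def notify_fixed():
    try:
        redis = get_redis()  # no await on a synchronous call
        return "sent"
    except Exception as exc:
        return f"swallowed {type(exc).__name__}"
```

A test that only asserts on the HTTP status code would pass for both variants, which is why checking actual delivery matters here.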

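BLOCKER B2: P1.3+P1.4 learning loop closed but running idle (same bug in two places):
- apps/api/src/api/v1/webhooks.py:261
- apps/api/src/services/approval_execution.py:771 (pre-existing)
  The code called EvidenceSnapshot.get_latest_snapshot(...), but get_latest_snapshot
  is a module-level async function, not a classmethod → the AttributeError was
  swallowed by except and downgraded to a warning
  → the flywheel loop looked connected but actually ran empty (the feature flag
  defaults to off, so nothing blew up yet)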
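
A small reproduction of this class of bug, assuming a model class and a sibling module-level coroutine. The bodies, the incident_id parameter, and the load_* helpers are hypothetical; only the names EvidenceSnapshot and get_latest_snapshot come from the commit message.

```python
import asyncio


class EvidenceSnapshot:
    """Hypothetical stand-in: the class has no get_latest_snapshot attribute."""


async def get_latest_snapshot(incident_id):
    # Module-level coroutine function, defined next to the class rather than on it.
    return {"incident_id": incident_id}


async def load_buggy(incident_id):
    try:
        # AttributeError: the function lives on the module, not the class.
        return await EvidenceSnapshot.get_latest_snapshot(incident_id)
    except Exception:
        # Logged as a warning in the scenario described; the loop continues empty.
        return None


async def load_fixed(incident_id):
    # Call the module-level function directly.
    return await get_latest_snapshot(incident_id)
```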

HIGH H3: main.py lifespan ordering race:
- apps/api/src/main.py: moved configure_alerter() before _recovery_svc.start()
  Original order: start() triggers an immediate check → this may call
  alert_recovery before Redis has been injected into the alerter → dedup
  fails open, risking duplicate alerts.
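
The ordering issue can be illustrated with a toy model. Everything below is a simplified sketch: Alerter, RecoveryService, the in-memory set standing in for Redis, and the alert key are assumptions, not the project's classes.

```python
class Alerter:
    """Toy alerter whose dedup fails open when no store is configured."""

    def __init__(self):
        self.store = None  # stands in for the injected Redis client
        self.sent = []

    def configure(self, store):
        self.store = store

    def alert_recovery(self, key):
        if self.store is None:
            # Fail-open: without a store nothing is recorded, so send anyway.
            self.sent.append(key)
            return
        if key in self.store:
            return  # duplicate suppressed
        self.store.add(key)
        self.sent.append(key)


class RecoveryService:
    def __init__(self, alerter):
        self.alerter = alerter

    def start(self):
        # Assumed: starting the service performs an immediate check that may alert.
        self.alerter.alert_recovery("ollama-recovered")


def alerts_sent(configure_first):
    alerter = Alerter()
    svc = RecoveryService(alerter)
    store = set()
    if configure_first:
        alerter.configure(store)
        svc.start()
    else:
        svc.start()
        alerter.configure(store)
    # A later periodic check reports the same event again.
    alerter.alert_recovery("ollama-recovered")
    return len(alerter.sent)
```

With the alerter configured first, the repeated event is deduplicated; with the original order, the first alert is never recorded and the second one goes out as well.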

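HIGH H1: Gemini quota dedup swallowed alerts across days:
- apps/api/src/services/failover_alerter.py:89
  Added a :{YYYY-MM-DD} suffix to the dedup key so each day gets its own dedup window
  Before: an alert fired at 22:00 yesterday left the dedup entry unexpired, so a
  second alert at 21:30 today was swallowed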
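
A sketch of the per-day dedup key, using an in-memory dict in place of Redis. The 24-hour TTL, the key name gemini_quota, and the DedupStore class are assumptions for illustration; the commit message only states that a :{YYYY-MM-DD} suffix was added.

```python
from datetime import datetime, timedelta, timezone

TTL = timedelta(hours=24)  # assumed dedup window, not taken from the source


def dedup_key(base, now):
    # Append the calendar date so each day gets an independent window.
    return f"{base}:{now:%Y-%m-%d}"


class DedupStore:
    """In-memory stand-in for a key store with expiry."""

    def __init__(self):
        self._expires = {}

    def should_send(self, key, now):
        expiry = self._expires.get(key)
        if expiry is not None and expiry > now:
            return False  # still inside the dedup window
        self._expires[key] = now + TTL
        return True


tz = timezone(timedelta(hours=8))
yesterday_2200 = datetime(2026, 4, 25, 22, 0, tzinfo=tz)
today_2130 = datetime(2026, 4, 26, 21, 30, tzinfo=tz)

old = DedupStore()
old_results = [
    old.should_send("gemini_quota", yesterday_2200),
    old.should_send("gemini_quota", today_2130),  # suppressed by yesterday's entry
]

new = DedupStore()
new_results = [
    new.should_send(dedup_key("gemini_quota", yesterday_2200), yesterday_2200),
    new.should_send(dedup_key("gemini_quota", today_2130), today_2130),
]
```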

Tests: 14 passed (failover_alerter + ai_router_failover_integration + lifespan_wiring)

Deferred follow-ups:
- H2: rename the proactive_inspector memory metric + clean up the baseline
- H4: probe_success NaN fallback
- M1-M4 / S1-S2: see the critic report

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 20:39:53 +08:00
Your Name
55c6b4e2d9 feat(p1): Ollama multi-layer failover system: P1.1 health checks + P1.2 ai_router integration + P1.5 failover alerting
The Ollama failover subsystem for the ADR-092 P1 flywheel loop, all filled in by Engineer-A2/C/C2.

New services (1581 lines):
- ollama_health_monitor.py (356): 3-layer health checks (TCP/HTTP/inference)
- ollama_failover_manager.py (571): automatic 111→188 switchover + Redis persistence + recovery callback
- ollama_auto_recovery.py (436): 30s background monitoring + 3 consecutive HEALTHY results → switch back + clear_cache
- failover_alerter.py (218): P1.5 Telegram failover alerts
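
The "3 consecutive HEALTHY results → switch back" rule can be sketched as a small counter. RecoveryTracker and its observe() method are hypothetical names; the real ollama_auto_recovery.py may be structured quite differently.

```python
class RecoveryTracker:
    """Counts consecutive healthy probes and signals when to switch back."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.streak = 0

    def observe(self, healthy):
        # Any unhealthy probe resets the streak, so only an unbroken run counts.
        if not healthy:
            self.streak = 0
            return False
        self.streak += 1
        if self.streak >= self.threshold:
            self.streak = 0  # reset after signalling the switch back
            return True
        return False


tracker = RecoveryTracker()
probes = [True, True, False, True, True, True]
decisions = [tracker.observe(p) for p in probes]
```

Requiring an unbroken streak rather than a single healthy probe is a common way to avoid flapping between the primary and the fallback host.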

Service integration:
- ai_router.py: AIProviderEnum.OLLAMA_188 + 120s budget + failover fallback chain
- main.py lifespan: wire the callback + start recovery on startup, graceful stop on shutdown
- config.py: OLLAMA_FALLBACK_URL / OLLAMA_HEALTH_CHECK_MODEL / GEMINI_DAILY_QUOTA (billing circuit breaker)

K8s configuration:
- 04-configmap.yaml.patch-188-fallback: injects OLLAMA_FALLBACK_URL=http://192.168.0.188:11434

Tests (2082 lines):
- test_ollama_health_monitor.py (402)
- test_ollama_failover_manager.py (707)
- test_ollama_auto_recovery.py (580)
- test_ai_router_failover_integration.py (257)
- test_lifespan_failover_wiring.py (136)

Dependency chain: the three service modules + ai_router + main.py are committed together; leaving any one out causes an ImportError.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 20:18:33 +08:00