Commit Graph

5 Commits

Author SHA1 Message Date
Your Name
4111ea4f9f fix(ai): remove 188 ollama provider
All checks were successful
Code Review / ai-code-review (push) Successful in 12s
CD Pipeline / tests (push) Successful in 1m13s
CD Pipeline / build-and-deploy (push) Successful in 3m36s
CD Pipeline / post-deploy-checks (push) Successful in 1m20s
2026-05-06 14:34:48 +08:00
Your Name
b1ef05fa8c feat(ollama): ADR-110 GCP 三層容災架構(GCP-A → GCP-B → Local → Gemini)
Some checks failed
Code Review / ai-code-review (push) Successful in 50s
CD Pipeline / tests (push) Failing after 1m14s
CD Pipeline / build-and-deploy (push) Has been skipped
CD Pipeline / post-deploy-checks (push) Has been skipped
## 變更摘要
- Primary: http://34.143.170.20:11434 (GCP-A SSD, 9x 載速 + 2x 推理)
- Secondary: http://34.21.145.224:11434 (GCP-B SSD)
- Fallback: http://192.168.0.111:11434 (M1 Pro Local HDD,最後防線)
- 廢止 ADR-105「111 唯一鐵律」,新建 ADR-110

## 核心改動
- config.py: 新增 OLLAMA_SECONDARY_URL;validator 加 GCP IP 白名單(34.143.170.20, 34.21.145.224)
- ollama_failover_manager.py: 三層 Ollama 決策矩陣;並行健康檢查三台;health_111 → health_gcp_a
- ollama_health_monitor.py: host label 萃取改為通用版(支援 GCP 公網 IP)
- failover_alerter.py: 故障/恢復主機動態顯示,不再硬編碼「Ollama 111 (GPU)」
- ollama_auto_recovery.py: notify_recovery 改為 ollama_gcp_a;recovered_host 動態
- k8s/awoooi-prod: configmap + deployment + network-policy 同步更新(egress 加 GCP /32)
- 服務層: 10 個服務檔案硬編碼 192.168.0.111 改為讀 settings.OLLAMA_URL
- 測試: URL 常數更新,新增三層容災場景,GCP IP 白名單驗證測試

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 22:49:23 +08:00
Your Name
d845d53257 fix(security): keep Gemini key out of request URLs
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 15m5s
2026-04-29 22:56:12 +08:00
Your Name
dccdcdbaf5 fix(flywheel): unblock action safety and Claude fallback
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 9m45s
2026-04-29 21:51:18 +08:00
Your Name
ed205489c1 feat(p3.2-tests+ci-schema): model_version 測試 + CI test_schema 對齊 + Grafana SLO Dashboard
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 1m20s
P3.2 配套測試 + CI 環境同步 + ADR-100 Grafana 視覺化:

CI test_schema 補齊(解 1162-1172 阻塞之延伸):
- setup_test_schema.sql 加 ai_provider_version_history 表
- 對齊 production p3_2_provider_version_history.sql(已 K8s exec 上線)

新增測試 (636 行):
- test_model_version_probe.py (387) — Provider 探測單元測試
- test_model_version_tracker.py (249) — Tracker 整合測試
  · 4 個 DB-dependent tests 標 @pytest.mark.integration
  · 15 unit + 4 integration(unit step 跳過 integration class)

新增配套:
- ai-slo-dashboard.json (496 行) — Grafana 儀表板
  · 對應 ADR-100 SLO 規則的 4 大面板:
    自主修復成功率 / 飛輪閉環延遲 / 治理事件 / Provider 健康度

修改:
- governance_agent.py +122 行 — SLO 指標暴露 + retrieve metric 整合

Tests: 15 passed (probe + tracker unit), 4 deselected (integration class)

Production 部署狀態:
- p2_decision_fusion_columns.sql  K8s exec 完成(commit c58bdd0c)
- p3_2_provider_version_history.sql  K8s exec 完成(this commit)
- 兩個 production migration 都已上線,CI test_schema 同步補齊

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:57:16 +08:00