Your Name
2104f0f01a
fix(recovery): harden runner failclosed authority copy [skip ci]
2026-06-28 16:32:28 +08:00

Your Name
f52ec0db26
fix(recovery): add runner failclosed cron authority [skip ci]
2026-06-28 16:32:27 +08:00

Your Name
d7f56351f2
fix(recovery): reopen controlled automation after failclosed regression
CD Pipeline / workflow-shape (push) Successful in 0s
CD Pipeline / cancel-stale-cd (push) Has been skipped
Code Review / ai-code-review (push) Has been cancelled
CD Pipeline / tests (push) Failing after 14m8s
Type Sync Check / check-type-sync (push) Successful in 42s
CD Pipeline / build-and-deploy (push) Has been cancelled
CD Pipeline / post-deploy-checks (push) Has been cancelled
2026-06-28 16:01:40 +08:00

Your Name
ba054e698d
fix(recovery): seal runner failclosed disablers [skip ci]
2026-06-28 15:58:06 +08:00

Your Name
3c495bb472
fix(ci): preserve controlled cd drain lane
Code Review / ai-code-review (push) Successful in 16s
2026-06-28 14:30:50 +08:00

Your Name
4414ec991f
fix(ci): reopen hard-limited controlled cd lane
CD Pipeline / workflow-shape (push) Successful in 0s
CD Pipeline / cancel-stale-cd (push) Has been skipped
CD Pipeline / tests (push) Successful in 1m42s
Code Review / ai-code-review (push) Successful in 15s
CD Pipeline / build-and-deploy (push) Successful in 6m33s
CD Pipeline / post-deploy-checks (push) Successful in 3m10s
2026-06-28 11:53:42 +08:00

Your Name
f109b11478
fix(recovery): seal 110 cd lane restore sources [skip ci]
2026-06-28 11:37:01 +08:00

Your Name
e97b252475
fix(cd): reopen controlled runtime deploy lane
CD Pipeline / tests (push) Failing after 7s
CD Pipeline / build-and-deploy (push) Has been skipped
CD Pipeline / post-deploy-checks (push) Has been skipped
Code Review / ai-code-review (push) Successful in 17s
2026-06-28 11:09:42 +08:00

Your Name
241cbe067e
fix(recovery): freeze 110 cd lane and source-aware 188 gates [skip ci]
2026-06-28 10:58:41 +08:00

Your Name
0531050934
fix(runner): split controlled cd lane guard [skip ci]
2026-06-28 09:56:31 +08:00

Your Name
00db624e5f
fix(reboot): fail closed direct cd lane pressure path [skip ci]
2026-06-28 09:46:46 +08:00

Your Name
3200f9af97
docs(runner): add direct runner pressure exception [skip ci]
2026-06-28 09:00:26 +08:00

Your Name
899635cc63
docs(runner): record 110 fail-closed pressure exception [skip ci]
2026-06-28 08:44:45 +08:00

Your Name
4c951b2996
fix(ci): keep 110 runner inactive until pressure clears
2026-06-27 20:15:01 +08:00

ogt
5e4887d15c
fix(ops): gate reboot recovery on product freshness [skip ci]
2026-06-25 19:39:42 +08:00

ogt
6f5e22ba69
fix(ops): classify momo source absence in cold-start gate [skip ci]
2026-06-24 23:05:42 +08:00

Your Name
2ec7f6f440
fix(ops): harden heartbeat and momo alert noise
Code Review / ai-code-review (push) Successful in 14s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 31s
CD Pipeline / tests (push) Successful in 1m59s
CD Pipeline / build-and-deploy (push) Successful in 7m36s
CD Pipeline / post-deploy-checks (push) Failing after 43s
Ansible / Reboot Recovery Contract / validate (push) Has been cancelled
2026-06-24 19:38:33 +08:00

Your Name
95f442adab
fix(ops): harden 188 backup exporter recovery [skip ci]
2026-06-24 06:37:44 +08:00

Your Name
93ac6030cf
fix(ops): sync source provider freshness alert rules
Ansible / Reboot Recovery Contract / validate (push) Has been cancelled
Code Review / ai-code-review (push) Successful in 10s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 24s
2026-06-18 14:23:13 +08:00

Your Name
ff18872a23
feat(ops): add host runaway process aiops guard
Code Review / ai-code-review (push) Successful in 14s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Failing after 26s
Ansible / Reboot Recovery Contract / validate (push) Has been cancelled
2026-06-18 14:17:03 +08:00

Your Name
b997016991
docs(ops): lock in reboot Plan B mechanism checks [skip ci]
2026-06-18 11:50:53 +08:00

Your Name
6efd186750
docs(security): establish high-value configuration control inventory [skip ci]
2026-06-11 11:29:58 +08:00

Your Name
ae7b39d96a
fix(ops): harden reboot recovery and backup alerts
2026-05-29 12:41:34 +08:00

Your Name
6d2b0ed4cd
ops(runner): add isolation readiness gate [skip ci]
2026-05-24 09:56:47 +08:00

Your Name
4407b46bb6
ops(runner): inventory workflow labels [skip ci]
2026-05-24 09:52:04 +08:00

Your Name
22b45006b7
ops(runner): add pool inventory audit [skip ci]
2026-05-24 09:47:02 +08:00

Your Name
9b465ee140
ci(runner): drain legacy docker act runner safely
Code Review / ai-code-review (push) Successful in 11s
2026-05-21 18:53:45 +08:00

Your Name
b3ab4da03b
ci(cd): wait for host web build pressure
Code Review / ai-code-review (push) Successful in 17s
2026-05-21 15:51:36 +08:00

Your Name
ae9d0b7385
feat(monitoring): alert on stale source provider ingestion
Code Review / ai-code-review (push) Successful in 10s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 25s
CD Pipeline / tests (push) Successful in 3m26s
CD Pipeline / build-and-deploy (push) Successful in 3m38s
CD Pipeline / post-deploy-checks (push) Successful in 1m25s
2026-05-20 19:19:21 +08:00

Your Name
598f33ae8b
fix(monitoring): clarify alert chain smoke evidence
Code Review / ai-code-review (push) Successful in 11s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 22s
CD Pipeline / tests (push) Successful in 3m55s
CD Pipeline / build-and-deploy (push) Successful in 3m31s
CD Pipeline / post-deploy-checks (push) Successful in 1m33s
2026-05-20 13:11:44 +08:00

Your Name
21dcfbd991
fix(governance): collapse km slo fallback series
Code Review / ai-code-review (push) Successful in 11s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 22s
CD Pipeline / tests (push) Successful in 1m6s
CD Pipeline / build-and-deploy (push) Successful in 5m17s
CD Pipeline / post-deploy-checks (push) Successful in 1m38s
2026-05-14 19:37:15 +08:00

Your Name
d2a4a17969
fix(governance): stabilize adr100 km growth slo
Code Review / ai-code-review (push) Successful in 22s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 25s
CD Pipeline / tests (push) Successful in 1m11s
CD Pipeline / post-deploy-checks (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
2026-05-14 19:33:52 +08:00

Your Name
4111ea4f9f
fix(ai): remove 188 ollama provider
Code Review / ai-code-review (push) Successful in 12s
CD Pipeline / tests (push) Successful in 1m13s
CD Pipeline / build-and-deploy (push) Successful in 3m36s
CD Pipeline / post-deploy-checks (push) Successful in 1m20s
2026-05-06 14:34:48 +08:00

OG T
c4f40235f4
fix(alertmanager): gate direct telegram to alertchain emergencies
Code Review / ai-code-review (push) Successful in 10s
2026-05-06 13:45:33 +08:00

OG T
4753099155
fix(alertmanager): send direct alerts to sre group
Code Review / ai-code-review (push) Successful in 10s
2026-05-06 13:38:47 +08:00

Your Name
587551c1f1
fix(ops): monitor full-stack cold-start gates
Code Review / ai-code-review (push) Successful in 11s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 18s
2026-05-06 00:48:05 +08:00

Your Name
6e96623884
fix(ops): harden momo scheduler cold start gate
Code Review / ai-code-review (push) Successful in 10s
2026-05-06 00:15:14 +08:00

Your Name
0315c2b510
docs(ops): codify full stack cold start recovery
Code Review / ai-code-review (push) Successful in 7s
2026-05-06 00:07:57 +08:00

Your Name
23932773ef
fix(monitoring): route docker baseline alerts to ssh
Code Review / ai-code-review (push) Successful in 11s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 19s
2026-05-06 00:00:12 +08:00

Your Name
2f50c67f5c
fix(monitoring): keep host alert ssh diagnostics canonical
Code Review / ai-code-review (push) Successful in 10s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 20s
E2E Health Check / e2e-health (push) Successful in 2m35s
2026-05-05 23:57:53 +08:00

Your Name
2221fd3256
fix(ops): persist host resource guardrails
CD Pipeline / tests (push) Successful in 5m25s
Code Review / ai-code-review (push) Successful in 25s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 37s
CD Pipeline / build-and-deploy (push) Successful in 7m31s
CD Pipeline / post-deploy-checks (push) Successful in 5m10s
2026-05-05 16:13:19 +08:00

Your Name
1cc9de5722
fix(ops): point runner guardrail alerts to host script
CD Pipeline / tests (push) Successful in 5m31s
Code Review / ai-code-review (push) Successful in 30s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 37s
CD Pipeline / build-and-deploy (push) Successful in 7m45s
CD Pipeline / post-deploy-checks (push) Successful in 5m4s
2026-05-05 15:25:37 +08:00

Your Name
d08d1e4951
fix(ops): alert on missing docker resource limits
CD Pipeline / tests (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
CD Pipeline / post-deploy-checks (push) Has been cancelled
Code Review / ai-code-review (push) Successful in 23s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 38s
2026-05-05 15:01:31 +08:00

Your Name
72d66e4ae6
fix(ops): align stale job cleanup thresholds
Code Review / ai-code-review (push) Successful in 28s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 36s
2026-05-05 14:54:17 +08:00

Your Name
5e625f777d
fix(ops): add stale gitea job cleanup guard
Code Review / ai-code-review (push) Has been cancelled
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Has been cancelled
2026-05-05 14:50:47 +08:00

Your Name
7d45f0cb58
fix(ops): alert on stale gitea actions jobs
CD Pipeline / tests (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
CD Pipeline / post-deploy-checks (push) Has been cancelled
Code Review / ai-code-review (push) Has been cancelled
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Has been cancelled
2026-05-05 14:42:09 +08:00

Your Name
fe618960a8
fix(ops): monitor systemd runners in host baseline
CD Pipeline / build-and-deploy (push) Has been cancelled
CD Pipeline / tests (push) Has been cancelled
CD Pipeline / post-deploy-checks (push) Has been cancelled
Code Review / ai-code-review (push) Has been cancelled
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 39s
2026-05-05 14:08:43 +08:00

Your Name
e8e6748f70
fix(ops): add docker host resource baseline guardrails
CD Pipeline / tests (push) Failing after 1m50s
CD Pipeline / build-and-deploy (push) Has been skipped
CD Pipeline / post-deploy-checks (push) Has been skipped
Code Review / ai-code-review (push) Successful in 25s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 38s
2026-05-05 13:45:09 +08:00

Your Name
ec013f662d
fix(watchdog): fix duplicate Trust Drift alerts + set up GCP Ollama nginx proxy
Code Review / ai-code-review (push) Successful in 45s
Ansible Lint / lint (push) Has been cancelled
- ai_slo_watchdog_job: switch to the trust_drift_detector pure-statistics lib
  to avoid firing duplicate Telegram alerts alongside governance_agent's hourly self-check
- infra/ansible: set up an nginx proxy on 110 that forwards to GCP-A/B
  port 11435 -> 34.143.170.20:11434 (GCP-A)
  port 11436 -> 34.21.145.224:11434 (GCP-B)
- docs/runbooks: DEPLOY-GCP-OLLAMA-PROXY.md, a complete deployment guide
- ops/nginx: manual deployment script to run directly on 110
Prerequisite for enabling the ADR-110 three-tier failover: deploy the proxy first, then change the ConfigMap
2026-05-04 23:12:35 +08:00

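The "proxy first, then ConfigMap" ordering in the commit above could be enforced by a pre-switch check. The sketch below takes its port mapping from the commit message and rests on several assumptions:

- `PROXY_UPSTREAMS`, `proxy_url`, `http_probe` and `ready_to_switch_configmap` are hypothetical names.
- The 110 proxy's address is left as a parameter because the log does not give it.
- Probing Ollama's `/api/tags` endpoint is an assumed health check, not necessarily what the repository's deployment script does.

```python
"""Sketch: hold the ConfigMap switch until the 110 proxy answers on every port."""
from typing import Callable, Dict
import urllib.request

# Port mapping copied from the commit message (110 nginx proxy -> GCP Ollama).
PROXY_UPSTREAMS: Dict[int, str] = {
    11435: "http://34.143.170.20:11434",  # GCP-A
    11436: "http://34.21.145.224:11434",  # GCP-B
}


def proxy_url(proxy_host: str, port: int) -> str:
    """URL a client uses to reach an upstream through the proxy (hypothetical helper)."""
    if port not in PROXY_UPSTREAMS:
        raise KeyError(f"port {port} is not a configured proxy port")
    return f"http://{proxy_host}:{port}"


def http_probe(base_url: str, timeout: float = 2.0) -> bool:
    """Assumed health check: GET /api/tags must return 200 within the timeout."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):
        # URLError, HTTPError and socket timeouts are OSError subclasses;
        # ValueError covers a malformed URL.
        return False


def ready_to_switch_configmap(
    proxy_host: str, probe: Callable[[str], bool] = http_probe
) -> bool:
    """True only if every proxied port passes the probe; otherwise keep the old ConfigMap."""
    return all(probe(proxy_url(proxy_host, port)) for port in PROXY_UPSTREAMS)
```

Consider a stub probe that only accepts port 11435, for example `ready_to_switch_configmap("proxy-110.example", probe=lambda url: url.endswith(":11435"))`, where `proxy-110.example` is a placeholder host. The call returns False because the GCP-B port fails, so the ConfigMap change would be held back.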
Your Name
b1ef05fa8c
feat(ollama): ADR-110 GCP three-tier failover architecture (GCP-A → GCP-B → Local → Gemini)
Code Review / ai-code-review (push) Successful in 50s
CD Pipeline / tests (push) Failing after 1m14s
CD Pipeline / build-and-deploy (push) Has been skipped
CD Pipeline / post-deploy-checks (push) Has been skipped
## Change summary
- Primary: http://34.143.170.20:11434 (GCP-A SSD, 9x model load speed + 2x inference)
- Secondary: http://34.21.145.224:11434 (GCP-B SSD)
- Fallback: http://192.168.0.111:11434 (M1 Pro local HDD, last line of defense)
- Retire ADR-105 (the "111-only" iron rule) and add ADR-110
## Core changes
- config.py: add OLLAMA_SECONDARY_URL; the validator gains a GCP IP allowlist (34.143.170.20, 34.21.145.224)
- ollama_failover_manager.py: three-tier Ollama decision matrix; health-checks all three hosts in parallel; health_111 → health_gcp_a
- ollama_health_monitor.py: host label extraction generalized (supports GCP public IPs)
- failover_alerter.py: failed/recovered host is shown dynamically instead of the hard-coded "Ollama 111 (GPU)"
- ollama_auto_recovery.py: notify_recovery changed to ollama_gcp_a; recovered_host is now dynamic
- k8s/awoooi-prod: configmap, deployment and network-policy updated together (egress adds GCP /32 rules)
- Service layer: 10 service files that hard-coded 192.168.0.111 now read settings.OLLAMA_URL
- Tests: URL constants updated; added three-tier failover scenarios and GCP IP allowlist validation tests
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 22:49:23 +08:00

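The tier order and IP allowlist in the ADR-110 commit above can be sketched as follows. This is a minimal illustration, not the repository's `ollama_failover_manager.py` or `config.py`, and it makes these assumptions:

- The tier names, `OllamaTier`, `choose_backend` and `validate_ollama_url` are hypothetical.
- Accepting any private-network address alongside the two allowlisted GCP IPs is an assumed stand-in for the validator's other rules.

Health checks run in parallel, as the commit describes, but the result is still chosen strictly in priority order.

```python
"""Sketch of an ADR-110-style tier choice: GCP-A -> GCP-B -> Local -> Gemini."""
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from ipaddress import ip_address
from typing import Callable, Sequence
from urllib.parse import urlparse

# GCP public IPs allowlisted in the commit message.
GCP_ALLOWLIST = frozenset({"34.143.170.20", "34.21.145.224"})


@dataclass(frozen=True)
class OllamaTier:
    name: str  # hypothetical label
    url: str


# Priority order from the commit message; Gemini is the non-Ollama last resort.
TIERS = (
    OllamaTier("gcp_a", "http://34.143.170.20:11434"),
    OllamaTier("gcp_b", "http://34.21.145.224:11434"),
    OllamaTier("local_111", "http://192.168.0.111:11434"),
)


def validate_ollama_url(url: str) -> str:
    """Accept private IPs (assumed rule) or an allowlisted GCP IP; reject anything else."""
    host = urlparse(url).hostname
    if host is None:
        raise ValueError(f"no host in {url!r}")
    addr = ip_address(host)  # ValueError for hostnames: this sketch accepts IP literals only
    if addr.is_private or host in GCP_ALLOWLIST:
        return url
    raise ValueError(f"{host} is not an allowed Ollama host")


def choose_backend(
    health: Callable[[str], bool], tiers: Sequence[OllamaTier] = TIERS
) -> str:
    """Probe every tier in parallel, then return the highest-priority healthy one."""
    with ThreadPoolExecutor(max_workers=max(1, len(tiers))) as pool:
        # Executor.map yields results in input order, so priority is preserved.
        results = list(pool.map(lambda tier: health(tier.url), tiers))
    for tier, ok in zip(tiers, results):
        if ok:
            return tier.name
    return "gemini"
```

Probing all tiers concurrently bounds the decision time by the slowest probe rather than the sum of all probes. Iterating the results in tier order keeps GCP-A preferred whenever it is healthy.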