ogt
|
fcd4337b3a
|
fix(ops): expose host runner build load in exporter [skip ci]
|
2026-06-27 12:58:01 +08:00 |
|
ogt
|
7f706feded
|
fix(ops): recognize repo-scoped CI containers in load guard [skip ci]
|
2026-06-27 12:45:45 +08:00 |
|
ogt
|
8fdcc0194f
|
fix(ops): recover backup core after reboot [skip ci]
|
2026-06-27 03:06:42 +08:00 |
|
Your Name
|
b07486b7f2
|
docs(ops): record nginx exporter recovery [skip ci]
|
2026-06-24 20:19:08 +08:00 |
|
Your Name
|
35a3a59839
|
fix(ops): reduce post-reboot notification noise
Code Review / ai-code-review (push) Successful in 18s
Ansible / Reboot Recovery Contract / validate (push) Has been cancelled
|
2026-06-24 06:52:47 +08:00 |
|
Your Name
|
95f442adab
|
fix(ops): harden 188 backup exporter recovery [skip ci]
|
2026-06-24 06:37:44 +08:00 |
|
Your Name
|
271a9a526d
|
docs(ops): record 188 node exporter recovery [skip ci]
|
2026-06-24 02:28:16 +08:00 |
|
Your Name
|
ff18872a23
|
feat(ops): add host runaway process aiops guard
Code Review / ai-code-review (push) Successful in 14s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Failing after 26s
Ansible / Reboot Recovery Contract / validate (push) Has been cancelled
|
2026-06-18 14:17:03 +08:00 |
|
Your Name
|
63d8361f2a
|
docs(ops): narrow down reboot repo-side readiness blockers [skip ci]
|
2026-06-18 12:11:56 +08:00 |
|
Your Name
|
ee2cc2bfc3
|
fix(alerts): consolidate Telegram alerts into the SRE war room
CD Pipeline / tests (push) Failing after 1m23s
CD Pipeline / build-and-deploy (push) Has been skipped
CD Pipeline / post-deploy-checks (push) Has been skipped
Code Review / ai-code-review (push) Successful in 15s
|
2026-06-12 11:06:16 +08:00 |
|
Your Name
|
3418e014bc
|
fix(security): remove immediate high-risk plaintext and SSH trust gaps [skip ci]
|
2026-06-11 11:10:26 +08:00 |
|
Your Name
|
cfb866d055
|
feat(governance): add agent market automation surfaces
Ansible Lint / lint (push) Successful in 35s
CD Pipeline / tests (push) Failing after 13s
CD Pipeline / build-and-deploy (push) Has been skipped
CD Pipeline / post-deploy-checks (push) Has been skipped
Code Review / ai-code-review (push) Failing after 11s
|
2026-06-04 21:50:55 +08:00 |
|
Your Name
|
017dba8b00
|
docs(argocd): codify health persistence config [skip ci]
|
2026-06-04 09:33:45 +08:00 |
|
Your Name
|
d0163b2d69
|
docs(ops): document ollama 111 fallback diagnosis [skip ci]
|
2026-06-04 09:31:20 +08:00 |
|
Your Name
|
ae7b39d96a
|
fix(ops): harden reboot recovery and backup alerts
|
2026-05-29 12:41:34 +08:00 |
|
Your Name
|
d6d2719e02
|
fix(alerts): deploy drift guard with canonical rules
Code Review / ai-code-review (push) Has been cancelled
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 29s
|
2026-05-29 11:14:12 +08:00 |
|
Your Name
|
7d2128b53c
|
fix(alerts): keep prometheus canonical rules in sync
Code Review / ai-code-review (push) Successful in 11s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 32s
|
2026-05-29 11:09:33 +08:00 |
|
Your Name
|
ae9d0b7385
|
feat(monitoring): alert on stale source provider ingestion
Code Review / ai-code-review (push) Successful in 10s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 25s
CD Pipeline / tests (push) Successful in 3m26s
CD Pipeline / build-and-deploy (push) Successful in 3m38s
CD Pipeline / post-deploy-checks (push) Successful in 1m25s
|
2026-05-20 19:19:21 +08:00 |
|
Your Name
|
4956fbb849
|
fix(monitoring): verify alert rule deploy content
Code Review / ai-code-review (push) Successful in 11s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 23s
|
2026-05-20 13:26:24 +08:00 |
|
Your Name
|
d2a4a17969
|
fix(governance): stabilize adr100 km growth slo
Code Review / ai-code-review (push) Successful in 22s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 25s
CD Pipeline / tests (push) Successful in 1m11s
CD Pipeline / post-deploy-checks (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
|
2026-05-14 19:33:52 +08:00 |
|
Your Name
|
a0a0731cd6
|
fix(auto-repair): preserve exact playbook candidates
Code Review / ai-code-review (push) Successful in 10s
CD Pipeline / tests (push) Successful in 5m46s
CD Pipeline / build-and-deploy (push) Successful in 4m6s
CD Pipeline / post-deploy-checks (push) Successful in 1m28s
|
2026-05-13 23:38:19 +08:00 |
|
Your Name
|
7a8cbb3241
|
fix(auto-repair): prefer exact playbooks and fail failed steps
Code Review / ai-code-review (push) Successful in 11s
CD Pipeline / tests (push) Successful in 1m3s
CD Pipeline / build-and-deploy (push) Successful in 3m31s
CD Pipeline / post-deploy-checks (push) Successful in 1m32s
|
2026-05-13 23:21:17 +08:00 |
|
Your Name
|
4ee57b710d
|
fix(ops): support API image path for T16 seed script
Code Review / ai-code-review (push) Successful in 10s
|
2026-05-13 23:03:40 +08:00 |
|
Your Name
|
1778a692e0
|
feat(awooop): add auto repair canary live-fire target
Code Review / ai-code-review (push) Successful in 11s
CD Pipeline / tests (push) Successful in 1m11s
CD Pipeline / build-and-deploy (push) Failing after 6m52s
CD Pipeline / post-deploy-checks (push) Has been skipped
|
2026-05-13 22:30:20 +08:00 |
|
Your Name
|
b4d367eeb4
|
feat(awooop): expose mcp bridge truth chain
Code Review / ai-code-review (push) Successful in 13s
CD Pipeline / tests (push) Successful in 1m17s
CD Pipeline / build-and-deploy (push) Successful in 3m55s
CD Pipeline / post-deploy-checks (push) Successful in 1m45s
|
2026-05-13 03:21:31 +08:00 |
|
Your Name
|
de16c88418
|
chore(rls): apply outbound message canary
Code Review / ai-code-review (push) Successful in 11s
|
2026-05-12 21:55:23 +08:00 |
|
Your Name
|
7d92f0acd7
|
chore(rls): stage projects canary path
Code Review / ai-code-review (push) Successful in 10s
CD Pipeline / tests (push) Successful in 1m8s
CD Pipeline / build-and-deploy (push) Successful in 3m49s
CD Pipeline / post-deploy-checks (push) Successful in 1m25s
|
2026-05-12 21:25:24 +08:00 |
|
Your Name
|
b7af597459
|
chore(rls): apply tool registry canary wave1.1
Code Review / ai-code-review (push) Successful in 10s
|
2026-05-12 21:15:14 +08:00 |
|
Your Name
|
8c4dc7a5a8
|
chore(rls): add manual script gate and canary wave1
Code Review / ai-code-review (push) Successful in 10s
CD Pipeline / tests (push) Successful in 1m5s
CD Pipeline / build-and-deploy (push) Failing after 10m6s
CD Pipeline / post-deploy-checks (push) Has been skipped
|
2026-05-12 20:23:27 +08:00 |
|
Your Name
|
ff30c61c4c
|
fix(rls): consolidate API DB access context
Code Review / ai-code-review (push) Successful in 21s
CD Pipeline / tests (push) Successful in 1m20s
CD Pipeline / build-and-deploy (push) Successful in 4m15s
CD Pipeline / post-deploy-checks (push) Successful in 1m58s
|
2026-05-12 19:55:13 +08:00 |
|
Your Name
|
f0255e0300
|
chore(ops): harden RLS role bootstrap gate
Code Review / ai-code-review (push) Successful in 10s
|
2026-05-12 18:36:35 +08:00 |
|
Your Name
|
0bc1878778
|
chore(ops): add RLS preflight and registry certbot repair bundle
Code Review / ai-code-review (push) Successful in 13s
|
2026-05-12 18:25:53 +08:00 |
|
Your Name
|
1a74286dfa
|
fix(awooop): mirror ops notifications through api
Code Review / ai-code-review (push) Successful in 10s
|
2026-05-12 14:43:09 +08:00 |
|
Your Name
|
d3e1b61096
|
fix(ops): persist 188 ollama localhost binding
Code Review / ai-code-review (push) Successful in 11s
|
2026-05-06 15:27:19 +08:00 |
|
Your Name
|
f88a3a846b
|
fix(ops): contain 188 ollama gateway exposure
Code Review / ai-code-review (push) Successful in 10s
|
2026-05-06 15:18:28 +08:00 |
|
Your Name
|
d441f70693
|
fix(ai): add 188 ollama retirement gate
Code Review / ai-code-review (push) Successful in 10s
CD Pipeline / tests (push) Successful in 1m2s
CD Pipeline / build-and-deploy (push) Successful in 9m2s
CD Pipeline / post-deploy-checks (push) Successful in 1m15s
|
2026-05-06 14:55:21 +08:00 |
|
OG T
|
6e2ab7cedc
|
fix(alertmanager): make live config deployment safe
Code Review / ai-code-review (push) Successful in 10s
|
2026-05-06 13:52:57 +08:00 |
|
Your Name
|
587551c1f1
|
fix(ops): monitor full-stack cold-start gates
Code Review / ai-code-review (push) Successful in 11s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 18s
|
2026-05-06 00:48:05 +08:00 |
|
Your Name
|
ed7c6946cb
|
docs(awooop): define private Ollama mesh gateway
Code Review / ai-code-review (push) Successful in 10s
|
2026-05-05 22:56:22 +08:00 |
|
Your Name
|
72d66e4ae6
|
fix(ops): align stale job cleanup thresholds
Code Review / ai-code-review (push) Successful in 28s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 36s
|
2026-05-05 14:54:17 +08:00 |
|
Your Name
|
5e625f777d
|
fix(ops): add stale gitea job cleanup guard
Code Review / ai-code-review (push) Has been cancelled
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Has been cancelled
|
2026-05-05 14:50:47 +08:00 |
|
Your Name
|
7d45f0cb58
|
fix(ops): alert on stale gitea actions jobs
CD Pipeline / tests (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
CD Pipeline / post-deploy-checks (push) Has been cancelled
Code Review / ai-code-review (push) Has been cancelled
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Has been cancelled
|
2026-05-05 14:42:09 +08:00 |
|
Your Name
|
34d1c76be9
|
fix(ops): route systemd runner baseline alerts
CD Pipeline / build-and-deploy (push) Has been cancelled
CD Pipeline / post-deploy-checks (push) Has been cancelled
CD Pipeline / tests (push) Has been cancelled
Code Review / ai-code-review (push) Has been cancelled
|
2026-05-05 14:19:58 +08:00 |
|
Your Name
|
fe618960a8
|
fix(ops): monitor systemd runners in host baseline
CD Pipeline / build-and-deploy (push) Has been cancelled
CD Pipeline / tests (push) Has been cancelled
CD Pipeline / post-deploy-checks (push) Has been cancelled
Code Review / ai-code-review (push) Has been cancelled
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 39s
|
2026-05-05 14:08:43 +08:00 |
|
Your Name
|
e8e6748f70
|
fix(ops): add docker host resource baseline guardrails
CD Pipeline / tests (push) Failing after 1m50s
CD Pipeline / build-and-deploy (push) Has been skipped
CD Pipeline / post-deploy-checks (push) Has been skipped
Code Review / ai-code-review (push) Successful in 25s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 38s
|
2026-05-05 13:45:09 +08:00 |
|
Your Name
|
95110971f3
|
fix(telegram): close remaining DM alert routes
CD Pipeline / tests (push) Successful in 1m27s
Code Review / ai-code-review (push) Successful in 29s
CD Pipeline / post-deploy-checks (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
|
2026-04-30 23:02:17 +08:00 |
|
Your Name
|
e27b462bef
|
fix(ops): keep disabled gitea runner stopped
Code Review / ai-code-review (push) Successful in 27s
|
2026-04-30 10:59:46 +08:00 |
|
OG T
|
fb1d101902
|
fix(backup): root-cause fix for HostBackupFailed P1: Prometheus textfile metric + docker socket reads
Problem 1: the backup_110_last_success_timestamp metric never existed
Root cause: the script only wrote a plain-text last_success file and never emitted .prom format
Fix: on success, write /home/ollama/node_exporter_textfiles/backup.prom
node_exporter adds --collector.textfile.directory=/textfile_collector
volume: /home/ollama/node_exporter_textfiles:/textfile_collector
Problem 2: Harbor/Gitea rsync permission denied
Root cause: /var/lib/docker/volumes/ is mode 710 root:root, so the docker group cannot access the filesystem path directly
Fix: switch to docker run --rm -v <volume>:/source alpine tar czf -
Read the volume contents through the docker socket (wooo is already in the docker group), then extract them
Verification: all three backup script items OK; node_exporter 9100/metrics outputs the metric correctly
Prometheus absent(backup_110_last_success_timestamp) should clear after the next scrape
2026-04-18 ogt + Claude Sonnet 4.6
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
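The textfile fix described in this commit could be sketched roughly as below. This is a minimal illustration, not the actual contents of the backup script: the function name `write_backup_metric` and the `TEXTFILE_DIR` override are hypothetical, while the metric name and the default directory come from the message above. The sketch writes to a temporary file and then renames it, so node_exporter (which only parses files ending in `.prom`) should never see a half-written file.

```shell
#!/bin/sh
# Hypothetical helper: publish the last-success timestamp for the
# node_exporter textfile collector.
# TEXTFILE_DIR is an assumed override for testing; the default is the
# path named in the commit message.
write_backup_metric() {
    dir="${TEXTFILE_DIR:-/home/ollama/node_exporter_textfiles}"
    now="$(date +%s)"
    # The temp name does not end in .prom, so the collector ignores it.
    tmp="$dir/backup.prom.$$"
    mkdir -p "$dir" || return 1
    {
        echo "# HELP backup_110_last_success_timestamp Unix time of the last successful backup."
        echo "# TYPE backup_110_last_success_timestamp gauge"
        echo "backup_110_last_success_timestamp $now"
    } > "$tmp" || return 1
    # A rename within one filesystem replaces the old file atomically.
    mv "$tmp" "$dir/backup.prom"
}
```

Under these assumptions, the backup script would call `write_backup_metric` only after every backup step has succeeded, so a missing or stale timestamp keeps the HostBackupFailed alert meaningful.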
|
2026-04-18 10:37:23 +08:00 |
|
OG T
|
de055778b3
|
fix(cd): CD_PUSH_TOKEN + backup paths use the BACKUP_ROOT environment variable
CD Pipeline / build-and-deploy (push) Has been cancelled
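- cd.yaml: GITEA_CD_TOKEN → CD_PUSH_TOKEN (Gitea reserves the GITEA_ prefix)
- ADR-069: update the token name description to match
- backup-from-110.sh: switch to the BACKUP_ROOT environment variable (default /home/ollama/backup/110)
to avoid /var/log and /var/run, which require root privileges
- Deployed to 188; cron 0 1 * * * configured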
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
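The BACKUP_ROOT change above can be illustrated with plain shell parameter expansion. This is a sketch under stated assumptions rather than the real backup-from-110.sh: the function names `resolve_backup_root` and `prepare_backup_dirs` are hypothetical, and the `harbor` and `gitea` subdirectories are borrowed from the deployment commands in the earlier Sprint C commit.

```shell
#!/bin/sh
# Use BACKUP_ROOT when it is set and non-empty; otherwise fall back to
# the default named in the commit message.
resolve_backup_root() {
    printf '%s\n' "${BACKUP_ROOT:-/home/ollama/backup/110}"
}

# Hypothetical helper: create per-service directories under the
# resolved root and print the root that was used.
prepare_backup_dirs() {
    root="$(resolve_backup_root)"
    mkdir -p "$root/harbor" "$root/gitea" && printf '%s\n' "$root"
}
```

Because the default lives under the ollama user's home directory, a cron job running as that user should not need root privileges to create or write these paths.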
|
2026-04-11 09:07:47 +08:00 |
|
OG T
|
43edff184d
|
feat(dr): Sprint C: host rsync backup + DR SOP docs
C-1 Velero: confirmed running (daily-awoooi-prod schedule, 13d, MinIO Available)
C-2 Host rsync backup:
scripts/ops/backup-from-110.sh: 188 backs up 110 via rsync daily at 01:00
- Harbor registry data (highest priority)
- Gitea repos
- bitan-pharmacy.git (if present)
- On success, writes /var/run/backup-110.last_success for Prometheus monitoring
- Sends a Telegram alert on failure
ops/monitoring/alerts-unified.yml: add the HostBackupFailed alert rule
C-3 DR SOP docs:
docs/runbooks/disaster-recovery/DR-K8s-awoooi.md (<15 minutes)
docs/runbooks/disaster-recovery/DR-Nginx.md (<5 minutes)
docs/runbooks/disaster-recovery/DR-Harbor.md (<30 minutes)
docs/runbooks/disaster-recovery/DR-Bitan.md (<5 minutes)
docs/runbooks/disaster-recovery/DR-Stock.md (<5 minutes)
Backup script deployment instructions (run manually):
scp scripts/ops/backup-from-110.sh ollama@192.168.0.188:~/bin/backup-from-110.sh
ssh ollama@192.168.0.188 "chmod +x ~/bin/backup-from-110.sh && mkdir -p /backup/110/{harbor,gitea}"
ssh ollama@192.168.0.188 "echo '0 1 * * * /home/ollama/bin/backup-from-110.sh' | crontab -"
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
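One caveat with the last deployment command: `crontab -` installs its standard input as the user's entire crontab, so any entries already present for that user would be replaced. A possible alternative, sketched here with a hypothetical helper named `merge_cron_entry`, is to merge the new line into the existing table and add it only if it is not already there.

```shell
#!/bin/sh
# Hypothetical helper: read an existing crontab on stdin and print it
# with the given entry appended, unless an identical line already exists.
merge_cron_entry() {
    entry="$1"
    existing="$(cat)"
    if printf '%s\n' "$existing" | grep -Fxq -- "$entry"; then
        # Entry already present: emit the table unchanged.
        printf '%s\n' "$existing"
    elif [ -n "$existing" ]; then
        printf '%s\n%s\n' "$existing" "$entry"
    else
        printf '%s\n' "$entry"
    fi
}
```

On the backup host it might be used as `{ crontab -l 2>/dev/null || true; } | merge_cron_entry '0 1 * * * /home/ollama/bin/backup-from-110.sh' | crontab -`, which keeps existing jobs and stays idempotent when run more than once.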
|
2026-04-11 03:04:18 +08:00 |
|