Your Name
|
ee2cc2bfc3
|
fix(alerts): consolidate Telegram alerts into the SRE war room
CD Pipeline / tests (push) Failing after 1m23s
CD Pipeline / build-and-deploy (push) Has been skipped
CD Pipeline / post-deploy-checks (push) Has been skipped
Code Review / ai-code-review (push) Successful in 15s
|
2026-06-12 11:06:16 +08:00 |
|
Your Name
|
1a74286dfa
|
fix(awooop): mirror ops notifications through api
Code Review / ai-code-review (push) Successful in 10s
|
2026-05-12 14:43:09 +08:00 |
|
Your Name
|
95110971f3
|
fix(telegram): close remaining DM alert routes
CD Pipeline / tests (push) Successful in 1m27s
Code Review / ai-code-review (push) Successful in 29s
CD Pipeline / post-deploy-checks (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
|
2026-04-30 23:02:17 +08:00 |
|
OG T
|
fb1d101902
|
fix(backup): root-cause fix for HostBackupFailed P1: Prometheus textfile metric + reads via docker socket
Problem 1: the backup_110_last_success_timestamp metric never existed
Root cause: the script only wrote a plain-text last_success file and never emitted .prom format
Fix: on success, write /home/ollama/node_exporter_textfiles/backup.prom
node_exporter: add --collector.textfile.directory=/textfile_collector
volume: /home/ollama/node_exporter_textfiles:/textfile_collector
Problem 2: Harbor/Gitea rsync permission denied
Root cause: /var/lib/docker/volumes/ is mode 710 root:root, so the docker group cannot access the filesystem path directly
Fix: switch to docker run --rm -v <volume>:/source alpine tar czf -
read the volume contents through the docker socket (wooo is already in the docker group), then extract them
Verification: all three backup script items report OK; node_exporter 9100/metrics exposes the metric correctly
Prometheus absent(backup_110_last_success_timestamp) should clear after the next scrape
2026-04-18 ogt + Claude Sonnet 4.6
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
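The textfile fix in this commit can be sketched as follows. This is a minimal illustration and not the actual contents of backup-from-110.sh: the function name write_backup_metric, the TEXTFILE_DIR override, and the HELP text are assumptions; only the metric name and the default directory come from the message above. The sketch writes to a temporary file and renames it, so node_exporter never scrapes a half-written file (the textfile collector only reads files ending in .prom).

```shell
#!/bin/sh
# Sketch only: publish a "last success" timestamp for the node_exporter
# textfile collector. TEXTFILE_DIR is a hypothetical override; its default
# is the host directory named in the commit message.

write_backup_metric() {
    dir="${TEXTFILE_DIR:-/home/ollama/node_exporter_textfiles}"
    # Temporary name does not end in .prom, so the collector ignores it.
    tmp="$dir/backup.prom.$$"
    now="$(date +%s)"
    {
        echo '# HELP backup_110_last_success_timestamp Unix time of the last successful backup.'
        echo '# TYPE backup_110_last_success_timestamp gauge'
        echo "backup_110_last_success_timestamp $now"
    } > "$tmp" && mv "$tmp" "$dir/backup.prom"
}
```

Calling write_backup_metric only after every backup step has succeeded keeps the timestamp stale on failure, which is what a staleness-based alert needs to fire.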
|
2026-04-18 10:37:23 +08:00 |
|
OG T
|
de055778b3
|
fix(cd): CD_PUSH_TOKEN + backup paths use the BACKUP_ROOT environment variable
CD Pipeline / build-and-deploy (push) Has been cancelled
- cd.yaml: GITEA_CD_TOKEN → CD_PUSH_TOKEN (Gitea reserves the GITEA_ prefix)
- ADR-069: update the token name description to match
- backup-from-110.sh: use the BACKUP_ROOT environment variable (default /home/ollama/backup/110)
to avoid /var/log and /var/run, which require root privileges
- Deployed to 188; cron 0 1 * * * configured
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
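The BACKUP_ROOT change described above might look roughly like this. It is a sketch under assumptions: the helper names resolve_backup_root and prepare_backup_dirs are hypothetical, and the harbor/gitea subdirectories are taken from the deployment note in the Sprint C commit rather than from the script itself. Only the variable name and its default path come from this commit message.

```shell
#!/bin/sh
# Sketch only: resolve the backup root from the environment, falling back
# to the default stated in the commit message, so that nothing needs to be
# written under root-owned paths such as /var/log or /var/run.

resolve_backup_root() {
    printf '%s\n' "${BACKUP_ROOT:-/home/ollama/backup/110}"
}

# Create the per-service directories (names assumed) and print the root.
prepare_backup_dirs() {
    root="$(resolve_backup_root)"
    mkdir -p "$root/harbor" "$root/gitea" && printf '%s\n' "$root"
}
```

Running the script as BACKUP_ROOT=/some/other/path backup-from-110.sh would then redirect all output without editing the script.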
|
2026-04-11 09:07:47 +08:00 |
|
OG T
|
43edff184d
|
feat(dr): Sprint C: host rsync backup + DR SOP documents
C-1 Velero: confirmed running (daily-awoooi-prod schedule, 13d, MinIO Available)
C-2 Host rsync backup:
scripts/ops/backup-from-110.sh: 188 backs up 110 via rsync daily at 1:00 AM
- Harbor registry data (highest priority)
- Gitea repos
- bitan-pharmacy.git (if present)
- On success, writes /var/run/backup-110.last_success for Prometheus monitoring
- On failure, sends a Telegram alert
ops/monitoring/alerts-unified.yml: add the HostBackupFailed alert rule
C-3 DR SOP documents:
docs/runbooks/disaster-recovery/DR-K8s-awoooi.md (<15 minutes)
docs/runbooks/disaster-recovery/DR-Nginx.md (<5 minutes)
docs/runbooks/disaster-recovery/DR-Harbor.md (<30 minutes)
docs/runbooks/disaster-recovery/DR-Bitan.md (<5 minutes)
docs/runbooks/disaster-recovery/DR-Stock.md (<5 minutes)
Backup script deployment instructions (must be run manually):
scp scripts/ops/backup-from-110.sh ollama@192.168.0.188:~/bin/backup-from-110.sh
ssh ollama@192.168.0.188 "chmod +x ~/bin/backup-from-110.sh && mkdir -p /backup/110/{harbor,gitea}"
ssh ollama@192.168.0.188 "echo '0 1 * * * /home/ollama/bin/backup-from-110.sh' | crontab -"
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
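Note that piping a single line into crontab - replaces that user's entire crontab. If the ollama account on 188 may already have other entries, a merge step along these lines could be used instead. This is a sketch, not part of the commit; add_cron_entry is a hypothetical helper that works purely on text, so it can be checked without touching a real crontab.

```shell
#!/bin/sh
# Sketch only: read an existing crontab on stdin and print it with ENTRY
# appended, unless an identical line is already present (idempotent).

add_cron_entry() {
    entry="$1"
    existing="$(cat)"
    if [ -n "$existing" ]; then
        printf '%s\n' "$existing"
    fi
    # -F: fixed string (the entry contains '*'), -x: match the whole line.
    if ! printf '%s\n' "$existing" | grep -Fxq -- "$entry"; then
        printf '%s\n' "$entry"
    fi
}

# Possible usage on the backup host (not executed here):
#   crontab -l 2>/dev/null \
#     | add_cron_entry '0 1 * * * /home/ollama/bin/backup-from-110.sh' \
#     | crontab -
```

Because the helper skips an entry that already exists, rerunning the deployment step does not create duplicate jobs.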
|
2026-04-11 03:04:18 +08:00 |
|