Commit Graph

6 Commits

Author SHA1 Message Date
Your Name
ee2cc2bfc3 fix(alerts): 收斂 Telegram 告警到 SRE 戰情室
Some checks failed
CD Pipeline / tests (push) Failing after 1m23s
CD Pipeline / build-and-deploy (push) Has been skipped
CD Pipeline / post-deploy-checks (push) Has been skipped
Code Review / ai-code-review (push) Successful in 15s
2026-06-12 11:06:16 +08:00
Your Name
1a74286dfa fix(awooop): mirror ops notifications through api
All checks were successful
Code Review / ai-code-review (push) Successful in 10s
2026-05-12 14:43:09 +08:00
Your Name
95110971f3 fix(telegram): close remaining DM alert routes
Some checks failed
CD Pipeline / tests (push) Successful in 1m27s
Code Review / ai-code-review (push) Successful in 29s
CD Pipeline / post-deploy-checks (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
2026-04-30 23:02:17 +08:00
OG T
fb1d101902 fix(backup): HostBackupFailed P1 根治 — Prometheus textfile 指標 + docker socket 讀取
問題一:backup_110_last_success_timestamp 指標從未存在
根因:腳本只寫純文字 last_success 檔,從未輸出 .prom 格式
修復:成功時寫入 /home/ollama/node_exporter_textfiles/backup.prom
      node_exporter 新增 --collector.textfile.directory=/textfile_collector
      volume: /home/ollama/node_exporter_textfiles:/textfile_collector

問題二:Harbor/Gitea rsync 權限拒絕
根因:/var/lib/docker/volumes/ 是 710 root:root,docker group 無法直接存取 FS 路徑
修復:改用 docker run --rm -v <volume>:/source alpine tar czf -
      透過 docker socket(wooo 已在 docker group)讀取 volume 內容再解壓

驗證:備份腳本三項全 OK,node_exporter 9100/metrics 正確輸出指標
      Prometheus absent(backup_110_last_success_timestamp) 應在下次 scrape 後清除

2026-04-18 ogt + Claude Sonnet 4.6

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 10:37:23 +08:00
OG T
de055778b3 fix(cd): CD_PUSH_TOKEN + backup 路徑使用 BACKUP_ROOT 環境變數
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
- cd.yaml: GITEA_CD_TOKEN → CD_PUSH_TOKEN(Gitea 保留 GITEA_ 前綴)
- ADR-069: 同步更新 token 名稱說明
- backup-from-110.sh: 改用 BACKUP_ROOT 環境變數(預設 /home/ollama/backup/110)
  避免 /var/log /var/run 需要 root 權限
- 已部署到 188 + cron 0 1 * * * 設定完成

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 09:07:47 +08:00
OG T
43edff184d feat(dr): Sprint C — Host rsync 備份 + DR SOP 文件
C-1 Velero: 已確認運作中(daily-awoooi-prod schedule, 13d, MinIO Available)

C-2 Host rsync 備份:
  scripts/ops/backup-from-110.sh — 188 每日凌晨 1:00 rsync 備份 110
    - Harbor registry data(最高優先)
    - Gitea repos
    - bitan-pharmacy.git(若存在)
    - 成功寫入 /var/run/backup-110.last_success 供 Prometheus 監控
    - 失敗時 Telegram 告警
  ops/monitoring/alerts-unified.yml — 新增 HostBackupFailed 告警規則

C-3 DR SOP 文件:
  docs/runbooks/disaster-recovery/DR-K8s-awoooi.md  (<15分鐘)
  docs/runbooks/disaster-recovery/DR-Nginx.md        (<5分鐘)
  docs/runbooks/disaster-recovery/DR-Harbor.md       (<30分鐘)
  docs/runbooks/disaster-recovery/DR-Bitan.md        (<5分鐘)
  docs/runbooks/disaster-recovery/DR-Stock.md        (<5分鐘)

部署備份腳本說明 (需手動執行):
  scp scripts/ops/backup-from-110.sh ollama@192.168.0.188:~/bin/backup-from-110.sh
  ssh ollama@192.168.0.188 "chmod +x ~/bin/backup-from-110.sh && mkdir -p /backup/110/{harbor,gitea}"
  ssh ollama@192.168.0.188 "echo '0 1 * * * /home/ollama/bin/backup-from-110.sh' | crontab -"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 03:04:18 +08:00