awoooi/docs/runbooks/disaster-recovery/DR-Bitan.md
43edff184d feat(dr): Sprint C: Host rsync backup + DR SOP documents
C-1 Velero: confirmed running (daily-awoooi-prod schedule, 13d, MinIO Available)

C-2 Host rsync backup:
  scripts/ops/backup-from-110.sh: host 188 runs an rsync backup of host 110 daily at 01:00
    - Harbor registry data (highest priority)
    - Gitea repos
    - bitan-pharmacy.git (if present)
    - On success, writes /var/run/backup-110.last_success for Prometheus monitoring
    - On failure, sends a Telegram alert
  ops/monitoring/alerts-unified.yml: adds the HostBackupFailed alert rule

C-3 DR SOP documents:
  docs/runbooks/disaster-recovery/DR-K8s-awoooi.md  (< 15 min)
  docs/runbooks/disaster-recovery/DR-Nginx.md        (< 5 min)
  docs/runbooks/disaster-recovery/DR-Harbor.md       (< 30 min)
  docs/runbooks/disaster-recovery/DR-Bitan.md        (< 5 min)
  docs/runbooks/disaster-recovery/DR-Stock.md        (< 5 min)

Deploying the backup script (must be run manually):
  scp scripts/ops/backup-from-110.sh ollama@192.168.0.188:~/bin/backup-from-110.sh
  ssh ollama@192.168.0.188 "chmod +x ~/bin/backup-from-110.sh && mkdir -p /backup/110/{harbor,gitea}"
  ssh ollama@192.168.0.188 "(crontab -l 2>/dev/null; echo '0 1 * * * /home/ollama/bin/backup-from-110.sh') | crontab -"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 03:04:18 +08:00


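DR-Bitan: bitan-pharmacy container crash recovery SOP

Target time: < 5 minutes
Trigger scenarios: the bitan-pharmacy container has stopped or crashed, or did not start automatically after a Docker daemon restart
Tools: docker compose, Ansible
Last updated: 2026-04-11 (Claude Sonnet 4.6 Asia/Taipei)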


Quick recovery

ssh wooo@192.168.0.110 "cd /home/wooo/apps/bitan-pharmacy && docker compose up -d"

# Verify (within 30 seconds)
curl -s -o /dev/null -w '%{http_code}' https://bitan.wooo.work
# Expected: 200
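
The container may need a few seconds before it answers, so a single curl can report a failure that would clear on its own. The following is a minimal sketch of polling until the 30-second verification window closes. The helper name wait_for_200, the 2-second interval, and the 5-second per-request cap are assumptions for illustration, not part of the repository.

```shell
# Sketch only: wait_for_200 is a hypothetical helper, not a script in the repo.
# Usage: wait_for_200 URL [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Polls URL until it returns HTTP 200 or the timeout elapses.
wait_for_200() {
  local url="$1" timeout="${2:-30}" interval="${3:-2}"
  local deadline code
  deadline=$(( $(date +%s) + timeout ))
  while :; do
    # --max-time caps each request so one hung connection cannot use up the window.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url") || code=000
    if [ "$code" = "200" ]; then
      echo "OK: $url returned 200"
      return 0
    fi
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "FAIL: $url last returned $code after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
  done
}
```

With the function sourced, `wait_for_200 https://bitan.wooo.work 30 2` exits 0 as soon as the site answers 200 and exits 1 if it has not done so by the deadline.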

Diagnostic steps (if quick recovery fails)

# Check container status
ssh wooo@192.168.0.110 "docker ps -a | grep bitan"

# Show the most recent logs
ssh wooo@192.168.0.110 "docker logs bitan-pharmacy --tail 50"

# Common problems:
# 1. Port 3003 already in use → find the process holding it: ss -tlnp | grep 3003
# 2. Disk space exhausted → df -h
# 3. Image corrupted → docker compose build && docker compose up -d
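
The first two checks above can be combined into one pass, roughly as sketched below. The function names, the 90% disk threshold, and the choice of / as the filesystem to inspect are assumptions for illustration. The script is meant to run on 192.168.0.110 itself.

```shell
# Sketch only: these helpers are hypothetical, not scripts in the repo.

# Succeeds (exit 0) if something is listening on the given TCP port.
port_in_use() {
  # Column 4 of `ss -tln` is Local Address:Port; match the port exactly.
  ss -tln | awk -v port="$1" '$4 ~ (":" port "$") { found = 1 } END { exit !found }'
}

# Prints the used-space percentage (digits only) of the filesystem holding PATH.
disk_usage_pct() {
  # -P forces the POSIX one-line-per-filesystem layout; column 5 is Capacity.
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Usage: diagnose [PORT] [PATH] [DISK_LIMIT_PCT]
diagnose() {
  local port="${1:-3003}" path="${2:-/}" limit="${3:-90}"
  local pct
  if port_in_use "$port"; then
    echo "port $port: in use"
  else
    echo "port $port: free"
  fi
  pct=$(disk_usage_pct "$path")
  if [ "$pct" -ge "$limit" ]; then
    echo "disk $path: ${pct}% used (at or above ${limit}%)"
  else
    echo "disk $path: ${pct}% used"
  fi
}
```

A listener on port 3003 is only a problem when the bitan-pharmacy container is not the one holding it, so read the port result together with the `docker ps -a` output.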

Confirm state with Ansible

# Run on the MacBook
ansible-playbook -i infra/ansible/inventory/hosts.yml \
  infra/ansible/playbooks/110-devops.yml \
  --tags bitan