awoooi/scripts/reboot-recovery/harbor-watchdog.service
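OG T 66b12bf9eb fix(infra): eliminate the Harbor Exited(128) race condition + resident self-healing harbor-watchdog
Root cause:
  When awoooi-startup-110.sh starts Harbor, the first compose up -d starts
  all containers at once. harbor-core/db/portal try to connect to syslog:1514 (harbor-log is not ready yet),
  fail and exit(128), and restart:always retries until the backoff gives up.
  Even after harbor-log later becomes healthy, the other containers no longer retry.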

Fix 1 - startup-110.sh Harbor start ordering (4-phase strategy):
  Phase 1: Remove all Exited Harbor containers (breaks the backoff deadlock)
  Phase 2: Start harbor-log only
  Phase 3: Wait for harbor-log to become healthy (up to 90s)
  Phase 4: Start all components
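
The four phases above could be sketched roughly as follows. This is a minimal illustration, not the actual contents of startup-110.sh: the compose file path, the compose project name `harbor`, the service name `log`, and the polling interval are all assumptions, and `harbor_staged_start` is a hypothetical function name.

```shell
#!/usr/bin/env bash
# Sketch of the 4-phase start order. All values below are assumptions.
COMPOSE_FILE="${COMPOSE_FILE:-/opt/harbor/docker-compose.yml}"  # assumed path
PROJECT="${PROJECT:-harbor}"            # assumed compose project name
LOG_SERVICE="${LOG_SERVICE:-log}"       # assumed compose service name
LOG_CONTAINER="${LOG_CONTAINER:-harbor-log}"
LOG_WAIT_SECS="${LOG_WAIT_SECS:-90}"
POLL_SECS="${POLL_SECS:-3}"             # assumed polling interval

harbor_staged_start() {
  local exited status="" waited=0

  # Phase 1: remove exited containers of the project so they are recreated.
  exited="$(docker ps -aq --filter "status=exited" \
    --filter "label=com.docker.compose.project=$PROJECT")" || true
  if [ -n "$exited" ]; then
    # Intentionally unquoted: one argument per container ID.
    docker rm -f $exited >/dev/null
  fi

  # Phase 2: start only the log service.
  docker compose -f "$COMPOSE_FILE" up -d "$LOG_SERVICE" || return 1

  # Phase 3: wait until the log container reports healthy, up to the limit.
  while [ "$waited" -lt "$LOG_WAIT_SECS" ]; do
    status="$(docker inspect --format '{{.State.Health.Status}}' \
      "$LOG_CONTAINER" 2>/dev/null)" || true
    [ "$status" = "healthy" ] && break
    sleep "$POLL_SECS"
    waited=$((waited + POLL_SECS))
  done
  [ "$status" = "healthy" ] || return 1

  # Phase 4: start every remaining component.
  docker compose -f "$COMPOSE_FILE" up -d
}
```

Returning non-zero when the log container never becomes healthy keeps Phase 4 from repeating the original race.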

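Fix 2 - harbor-watchdog.service (resident self-healing):
  Type=simple long-running process that polls http://127.0.0.1:5000/v2/ every 60s
  Unhealthy → wait 5s and check again → run the full Phase 1-4 repair
  Covers the "crash while running" case that the reboot ordering fix cannot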
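
A rough sketch of the watchdog loop described above, under stated assumptions: the real harbor-watchdog.sh is not shown here, so the function names and the `REPAIR_CMD` path are hypothetical, and the repair step stands in for the Phase 1-4 sequence.

```shell
#!/usr/bin/env bash
HARBOR_URL="${HARBOR_URL:-http://127.0.0.1:5000/v2/}"
POLL_SECS="${POLL_SECS:-60}"
RECHECK_SECS="${RECHECK_SECS:-5}"
# Hypothetical path: the repair entry point is not named in the source.
REPAIR_CMD="${REPAIR_CMD:-/usr/local/bin/harbor-repair.sh}"

# Healthy means "an HTTP response came back". Without -f, a 401 still exits 0.
harbor_healthy() {
  curl -s -o /dev/null --max-time 5 "$HARBOR_URL"
}

# One watchdog iteration: probe, re-confirm after a short delay, then repair.
watchdog_once() {
  harbor_healthy && return 0
  sleep "$RECHECK_SECS"
  harbor_healthy && return 0
  echo "Harbor unhealthy, running repair" >&2
  "$REPAIR_CMD"
}

# Main loop; a failed repair is logged and retried on the next cycle.
watchdog_loop() {
  while true; do
    watchdog_once || echo "repair failed, will retry next cycle" >&2
    sleep "$POLL_SECS"
  done
}
```

A script used as the unit's ExecStart would end by calling `watchdog_loop`. The second probe keeps a single dropped request from triggering a full repair.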

Bug fix: curl -f treats HTTP 401 as a failure (exit 22);
  Harbor /v2/ normally returns 401 (authentication required), so use curl -s without -f
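
As an illustration of the 401 point, a stricter probe could read the status code instead of relying on curl's exit status. This is a swapped-in variant, not what the commit does (the commit simply drops -f); `registry_alive` is a hypothetical name, and treating only 200 and 401 as alive is an assumption.

```shell
#!/usr/bin/env bash
# Return 0 when the registry answers 200 or 401 on /v2/, non-zero otherwise.
registry_alive() {
  local url="${1:-http://127.0.0.1:5000/v2/}" code
  # -w prints the status code; curl prints 000 when no HTTP response arrives,
  # so curl's own exit status is ignored here.
  code="$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$url")" || true
  case "$code" in
    200|401) return 0 ;;   # 401 only means authentication is required
    *)       return 1 ;;   # 5xx, 000 (no response), anything else
  esac
}
```

Unlike plain `curl -s`, this variant also flags a proxy that answers 502 or 503 while the registry behind it is down.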

REBOOT-RECOVERY-SOP.md → v5.0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 12:13:21 +08:00

23 lines
650 B

[Unit]
Description=Harbor Watchdog: automatically detect and repair Harbor crashes
# 2026-04-05 Claude Code: resolve the Harbor Exited(128) deadlock
After=network-online.target docker.service awoooi-startup-110.service
Wants=network-online.target
Requires=docker.service
[Service]
Type=simple
# watchdog.sh is an infinite loop; systemd keeps it under supervision
ExecStart=/usr/local/bin/harbor-watchdog.sh
# If the watchdog exits unexpectedly (script bug), restart it after 30s
Restart=on-failure
RestartSec=30
# Logging
StandardOutput=journal
StandardError=journal
# The watchdog needs root to operate docker (consistent with the startup script)
User=root
[Install]
WantedBy=multi-user.target