BACKUP-STATUS.md: Backup Status Overview
- 2026-04-05 Claude Code: chief-architect full inventory covering full automation of all services plus the alerting mechanism.
- Backup center: 192.168.0.110 (`/backup/`), using Restic with latest-only retention and a Google Drive/rclone offsite mirror.
- 2026-06-12 Codex pre-reboot refresh: 110 cron / Google Drive rclone / Alertmanager / credential escrow / cold-start scorecard rechecked.
- 2026-06-12 Codex post-reboot refresh: 110 recovered, offsite latest-only verified, and remaining red gates narrowed to 120 config capture plus credential escrow evidence.
- 2026-06-12 Codex post-120 recovery refresh: 120 restored, backup aggregate / offsite / full cold-start green; DR still blocked only by credential escrow evidence.
- 2026-06-13 Codex live refresh: backup core remains green; DR still blocked only by credential escrow evidence.
- 2026-06-13 Codex post-CD refresh: backup/offsite/alert contracts remain green after deploy marker `e4a349bc`; global SSH trust guardrail held; DR still blocked only by credential escrow evidence.
- 2026-06-13 Codex escrow refresh: 13:10 live report confirms offsite/rclone/script readiness is green and only five non-secret credential escrow evidence markers remain missing.
- 2026-06-18 Codex cold-start refresh: full-stack service readiness is green after stale failed Job classification; backup core remains green; DR still blocked only by five credential escrow evidence markers.
- 2026-06-24 Codex Velero/exporter refresh: 188 MinIO / Velero backup freshness, 188 PostgreSQL / Redis exporters, 188 node-exporter, and 110 disk pressure are recovered; DR still blocked only by five credential escrow evidence markers and service full-green is blocked by MOMO data freshness.
- 2026-06-24 22:17 Codex backup readback: 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, `integrity_stale=0`, `offsite_fresh=1`, `rclone_gdrive_fresh=1`, `escrow_missing=5`; MOMO import-boundary fix is production-deployed, but full-stack remains blocked by MOMO data freshness.
- 2026-06-24 22:40 Codex MOMO source readback: scheduler / DB / import metadata confirm the full-stack blocker is missing upstream source data, not backup freshness; no manual import or Drive write was performed.
- 2026-06-24 23:04 Codex cold-start gate refresh: repo-side v1.42 dry-run now emits MOMO source-absence evidence and blocks with `188 momo source file absent while daily sales data stale`; backup/offsite remains green and live 110 script deployment is not claimed.
- 2026-06-24 23:15 Codex live-sync gate readback: read-only deploy parity check correctly blocks because repo cold-start hash `f60b81029969a527dc742ebc9558d2933f11fe24ec4f46f7a7bc6637759b7b05` differs from 110 live hash `10608873d406911a519afa96218abebc2b85ab6123bdf46b6e21eb269e554bb8`; installer remains a live write requiring explicit approval.
- 2026-06-24 23:33 Codex backup readback: 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, `integrity_stale=0`, `offsite_fresh=1`, `rclone_gdrive_fresh=1`, `escrow_missing=5`; live cold-start still blocks only on MOMO source absence / data freshness, not backup.
- 2026-06-25 09:05 Codex backup readback: 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, `integrity_stale=0`, `offsite_fresh=1`, `rclone_gdrive_fresh=1`, `escrow_missing=5`; live cold-start is `PASS=87 WARN=1 BLOCKED=1` because MOMO business data is stale and MOMO Google Drive token metadata is missing.
- 2026-06-25 09:37 Codex MOMO deploy readback: MOMO `main` is `e137d7a5d02a7595a44c3f3cc1cf54b766424ee7`, Gitea Actions `cd.yaml #910` succeeded, and 188 host / `momo-scheduler` source now fail closed on Google Drive auth/API failure. Backup/offsite remains green; full-stack still blocks on MOMO data freshness and `escrow_missing=5`.
- 2026-06-25 10:23 Codex MOMO fail-closed live proof: 10:04 scheduler run recorded Google Drive auth failure as `❌ 自動匯入失敗` (automatic import failed) and sent Telegram failure notification successfully; cold-start remains `PASS=87 WARN=1 BLOCKED=1` because business data is stale beyond 3 days and Drive token metadata/writeback is not confirmed. Backup/offsite remains green and `escrow_missing=5` remains the DR blocker.
- 2026-06-25 10:35 Codex route / DB / backup refresh: direct public routes for AWOOOI API, IwoooS, VibeWork, AwoooGo, MOMO health, Stock, and Bitan are 200; backup remains 110 `13/13` and 188 `2/2` fresh; MOMO daily and monthly DB bounds still stop at `2026-06-17`; latest import job remains `56 completed`.
- 2026-06-25 19:17 Codex latest recovery readback: post-start quick check is `FULL_STACK_GREEN_DR_ESCROW_BLOCKED`; 110 backup `13/13 fresh failed=0`, 188 backup `2/2 fresh failed=0`, `core_blockers=0`, `integrity_stale=0`, `offsite_fresh=1`, `rclone_gdrive_fresh=1`; MOMO data freshness is recovered through `2026-06-24`; DR still blocked only by `escrow_missing=5`.
- 2026-06-25 19:35 Codex product-data gate refresh: backup/offsite remains green, but overall "all products/data latest" is blocked by StockPlatform `/api/v1/system/freshness` (`core_margin_short_daily_missing`, `ai_recommendations_stale`). This is not a backup failure; keep `escrow_missing=5` as the DR blocker and Stock freshness as a separate product-data blocker.
- 2026-06-25 20:11 Codex StockPlatform cron-source recovery: StockPlatform Gitea/live source is now `fb91aa4c6272469d1d26e0820169629eac17d28a`; six missing production cron entrypoints are restored; natural cron runs for source remediation, market index, price, margin, chips, and AI no longer fail from missing files. Backup/offsite remains green. Stock freshness still blocks because official 2026-06-25 margin-short data is pending and AI recommendations correctly stay on 2026-06-24; this is still not a backup or restore incident.
- 2026-06-25 20:25 Codex 110 CPU cleanup: two orphan StockPlatform headless Chrome process groups were cleared by targeted approved `SIGTERM`; no Docker/systemd/Nginx/K8s/DB/backup write occurred. Backup/offsite remains green, DR still blocked by `escrow_missing=5`, and Stock freshness remains the only hard product-data blocker.
- 2026-06-25 21:14 Codex full wrapper refresh: StockPlatform 21:00 `intelligence-sync` and 21:10 AI pipeline naturally caught up; `/api/v1/system/freshness` is `status=ok` with blockers `[]`. Backup/offsite remains 110 `13/13` and 188 `2/2` fresh, `core_blockers=0`, `offsite_fresh=1`, `rclone_gdrive_fresh=1`; full-stack service/data result is `FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, with only `escrow_missing=5` blocking DR complete.
- 2026-06-26 06:28 Codex next-day backup readback: 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, `integrity_stale=0`, `offsite_fresh=1`, `rclone_gdrive_fresh=1`, `last_backup_all=2026-06-26 02:31:02`, `escrow_missing=5`; full-stack service/data result remains `FULL_STACK_GREEN_DR_ESCROW_BLOCKED`.
- 2026-06-27 00:56 Codex backup core recovery: 188 `momo_pg_daily` was fresh but temporarily false-blocked by cron/config drift (`configured_missing_188=1`). 188 crontab was backed up to `/home/ollama/momo_backups/crontab-before-momo-pg-host-owned-20260627-001925.txt`, the daily MOMO PostgreSQL backup entry was restored to host-owned `/home/ollama/bin/momo-pg-backup.sh`, and the exporter now reports `awoooi_backup_job_configured{host="188",job="momo_pg_daily"} 1`. `backup-status` now reports 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, `configured_missing_188=0`, `integrity_stale=0`, `offsite_fresh=1`, `rclone_gdrive_fresh=1`, `escrow_missing=5`; DR still blocked only by credential escrow evidence.
- 2026-06-27 02:42 Codex post-reboot revalidation: `post-reboot-readiness-summary.sh` remains `FULL_STACK_GREEN_DR_ESCROW_BLOCKED` with `SERVICE_GREEN=1`, `PRODUCT_DATA_GREEN=1`, `BACKUP_CORE_GREEN=1`, `HOST_188_HYGIENE_BLOCKED=0`, `STOCK_FRESHNESS_STATUS=ok`, and `ESCROW_MISSING_COUNT=5`. `dr-offsite-operator-checklist.sh --check` confirms `CORE_COLD_START_GREEN=1`, `RECOVERY_STATE=CORE_READY_DR_OFFSITE_PENDING`, live Prometheus `awoooi_recovery_core_ready=1`, and `awoooi_recovery_dr_offsite_ready=0`.
2026-06-27 00:56 Backup / Offsite / Escrow Live Status
Read-only and minimal-write evidence sources: 00:56 /backup/scripts/backup-status.sh --no-notify --no-refresh from 110, 188 crontab backup / controlled MOMO backup path correction, 188 textfile exporter refresh, post-start quick check at 00:57, and Prometheus recovery recording-rule readback.
- 110 backup health: `13/13 fresh failed=0`.
- 188 backup health: `2/2 fresh failed=0`.
- Integrity / configured blockers: `core_blockers=0`, `configured_missing_110=0`, `configured_missing_188=0`, `script_missing_110=0`, `script_missing_188=0`, `integrity_stale=0`.
- 188 MOMO backup config drift fix: crontab rollback file `/home/ollama/momo_backups/crontab-before-momo-pg-host-owned-20260627-001925.txt`; active cron now uses `/home/ollama/bin/momo-pg-backup.sh`; exporter reports `awoooi_backup_job_configured{host="188",job="momo_pg_daily"} 1`.
- Offsite / GDrive freshness: `offsite_configured=1`, `offsite_fresh=1`, `rclone_gdrive_configured=1`, `rclone_gdrive_fresh=1`.
- Last aggregate backup: `2026-06-26 02:31:02`.
- Prometheus recovery rules: `awoooi_recovery_core_ready=1`, `awoooi_recovery_dr_offsite_ready=0`.
- DR blocker remains: `escrow_missing=5`. Do not forge evidence markers, and do not paste any secret value, hash, or partial token.
- Full-stack service state: `FULL_STACK_GREEN_DR_ESCROW_BLOCKED`. Post-start quick check `PASS=38 WARN=3 BLOCKED=0`; StockPlatform freshness `status=ok`; MOMO daily freshness `2|2026-06-24`.
| Gate | Status | Evidence |
|---|---|---|
| 110 backup freshness | VERIFIED | 13/13 fresh, failed count 0. |
| 188 backup freshness | VERIFIED | 2/2 fresh, failed count 0. |
| 188 MOMO backup cron/config | VERIFIED | Active crontab uses /home/ollama/bin/momo-pg-backup.sh; configured_missing_188=0. |
| Offsite / GDrive freshness | VERIFIED | offsite_fresh=1, rclone_gdrive_fresh=1. |
| Backup core blockers | GREEN | core_blockers=0; Prometheus awoooi_recovery_core_ready=1. |
| Full-stack service state | FULL_STACK_GREEN_DR_ESCROW_BLOCKED | POST_START_QUICK_CHECK PASS=38 WARN=3 BLOCKED=0; service/data/backup core green. |
| Credential escrow | BLOCKED | escrow_missing=5; only real non-secret owner evidence may close this. |
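The gate table above can be derived mechanically from the `key=value` summary that the status readback reports. The following is a minimal Python sketch of that mapping, not the actual logic of `backup-status.sh`: `parse_status`, `classify`, and the three returned state names are hypothetical helpers introduced for illustration, and treating a missing field as blocked is an assumption.

```python
def parse_status(line: str) -> dict[str, int]:
    """Parse comma- or space-separated key=value pairs with integer values."""
    fields: dict[str, int] = {}
    for token in line.replace(",", " ").split():
        key, sep, value = token.partition("=")
        if sep and value.isdigit():
            fields[key] = int(value)
    return fields


def classify(fields: dict[str, int]) -> str:
    """Hypothetical gate: check the backup core first, then credential escrow."""
    core_green = (
        fields.get("core_blockers") == 0
        and fields.get("integrity_stale") == 0
        and fields.get("offsite_fresh") == 1
        and fields.get("rclone_gdrive_fresh") == 1
    )
    if not core_green:
        return "BACKUP_CORE_BLOCKED"
    # Assumption: a missing escrow_missing field fails closed (treated as blocked).
    if fields.get("escrow_missing") != 0:
        return "CORE_GREEN_DR_ESCROW_BLOCKED"
    return "CORE_GREEN_DR_READY"


sample = "core_blockers=0,integrity_stale=0,offsite_fresh=1,rclone_gdrive_fresh=1,escrow_missing=5"
print(classify(parse_status(sample)))
```

Keeping the core check ahead of the escrow check mirrors the document's distinction: backup core can be green while DR stays blocked.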
2026-06-26 06:28 Backup / Offsite / Escrow Live Status
Read-only evidence sources: 06:26 / 06:28 post-start-quick-check.sh, delegated /backup/scripts/backup-status.sh --no-notify --no-refresh, route-only wrapper retry validation, and direct StockPlatform / MOMO freshness readback.
- 110 backup health: `13/13 fresh failed=0`.
- 188 backup health: `2/2 fresh failed=0`.
- Integrity / configured blockers: `core_blockers=0`, `configured_missing_110=0`, `configured_missing_188=0`, `script_missing_110=0`, `script_missing_188=0`, `integrity_stale=0`.
- Offsite / GDrive freshness: `offsite_configured=1`, `offsite_fresh=1`, `rclone_gdrive_configured=1`, `rclone_gdrive_fresh=1`.
- Last aggregate backup: `2026-06-26 02:31:02`.
- DR blocker remains: `escrow_missing=5`. Do not forge evidence markers, and do not paste any secret value, hash, or partial token.
- Full-stack service state: `FULL_STACK_GREEN_DR_ESCROW_BLOCKED`. Cold-start `PASS=89 WARN=0 BLOCKED=0`; StockPlatform freshness `status=ok`; MOMO daily freshness `1|2026-06-24`.
- Route note: 06:26 full wrapper had one-time route `000` for IwoooS / VibeWork, but direct curl and route-only wrapper immediately returned `200` and `RESULT=GREEN`; v1.6 wrapper now retries routes before blocking.
| Gate | Status | Evidence |
|---|---|---|
| 110 backup freshness | VERIFIED | 13/13 fresh, failed count 0. |
| 188 backup freshness | VERIFIED | 2/2 fresh, failed count 0. |
| Offsite / GDrive freshness | VERIFIED | offsite_fresh=1, rclone_gdrive_fresh=1. |
| Backup core blockers | GREEN | core_blockers=0. |
| Full-stack service state | FULL_STACK_GREEN_DR_ESCROW_BLOCKED | Cold-start PASS=89 WARN=0 BLOCKED=0; core wrapper PASS=15 WARN=2 BLOCKED=0; route-only wrapper PASS=31 WARN=0 BLOCKED=0. |
| Credential escrow | BLOCKED | escrow_missing=5; only real non-secret owner evidence may close this. |
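The route note in this section says the v1.6 wrapper retries routes before blocking. A minimal Python sketch of that retry-then-block idea follows; it is not the wrapper's code. `check_route`, the `probe` callable, the attempt count, and the delay are all hypothetical, and curl's `000` is represented as the integer `0`.

```python
import time
from typing import Callable


def check_route(probe: Callable[[], int], attempts: int = 3,
                delay_s: float = 1.0) -> tuple[str, list[int]]:
    """Probe a route up to `attempts` times; block only if every attempt fails."""
    seen: list[int] = []
    for attempt in range(attempts):
        code = probe()
        seen.append(code)
        if code == 200:
            return "GREEN", seen
        if attempt < attempts - 1:
            time.sleep(delay_s)  # brief pause before the next try
    return "BLOCKED", seen


# Simulated probe: one transient failure, then a 200.
codes = iter([0, 200])
print(check_route(lambda: next(codes), delay_s=0.0))
```

Returning the list of observed codes keeps the transient failure visible as evidence even when the final result is green.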
2026-06-25 19:17 Backup / Offsite / Escrow Live Status
Read-only evidence sources: /backup/scripts/backup-status.sh --no-notify --no-refresh from 110 at 19:17 Asia/Taipei, plus 19:05 post-start quick check and 19:05-19:06 route stability readback.
- 110 backup health: `13/13 fresh failed=0`.
- 188 backup health: `2/2 fresh failed=0`.
- Integrity / configured blockers: `core_blockers=0`, `dr_warnings=5`, `configured_missing_110=0`, `configured_missing_188=0`, `script_missing_110=0`, `script_missing_188=0`, `integrity_stale=0`.
- Offsite / GDrive freshness: `offsite_configured=1`, `offsite_fresh=1`, `rclone_gdrive_configured=1`, `rclone_gdrive_fresh=1`.
- Last aggregate backup: `2026-06-25 02:35:09`.
- DR blocker remains: `escrow_missing=5`. Do not forge evidence markers, and do not paste any secret value, hash, or partial token.
- Full-stack service state: `FULL_STACK_GREEN_DR_ESCROW_BLOCKED`. This means the service layer, routes, K3s, MOMO data freshness, and backup/offsite are green; it does not mean DR is complete.
- MOMO DB readback from 19:05 wrapper: `daily_sales_snapshot=109061|2025-07-01|2026-06-24`, `DB_MONTHLY_SYNC 15383|15383|2026-06-01|2026-06-24|2026-06-01|2026-06-24`, `DB_DAILY_FRESHNESS 1|2026-06-24`, latest import job `57 completed|即時業績_當日.xlsx|2026-06-25T13:16:47.359958|2026-06-25T13:18:02.964985|15383|15383|0`.
| Gate | Status | Evidence |
|---|---|---|
| 110 backup freshness | VERIFIED | 13/13 fresh, failed count 0. |
| 188 backup freshness | VERIFIED | 2/2 fresh, failed count 0. |
| Offsite / GDrive freshness | VERIFIED | offsite_fresh=1, rclone_gdrive_fresh=1. |
| Backup core blockers | GREEN | core_blockers=0. |
| MOMO data freshness | VERIFIED | `DB_DAILY_FRESHNESS 1\|2026-06-24` from the 19:05 wrapper readback. |
| Full-stack service state | FULL_STACK_GREEN_DR_ESCROW_BLOCKED | 21:14 POST_START_QUICK_CHECK PASS=38 WARN=2 BLOCKED=0; cold-start PASS=89 WARN=0 BLOCKED=0; StockPlatform freshness is OK, and only escrow_missing=5 blocks DR complete. |
| Credential escrow | BLOCKED | escrow_missing=5; only real non-secret owner evidence may close this. |
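The MOMO freshness readings in this document use a pipe-delimited form such as `1|2026-06-24` or `8|2026-06-17`. The sketch below parses that form in Python and applies a staleness threshold. It rests on two assumptions that the source does not state outright: that the first field is the data age in days, and that the threshold is 3 days (the 2026-06-25 10:23 entry only says the data was stale "beyond 3 days"). `parse_freshness` and `is_stale` are hypothetical names.

```python
from datetime import date


def parse_freshness(value: str) -> tuple[int, date]:
    """Split '<age_days>|<latest_date>' into an int and a date."""
    age_days, latest = value.split("|")
    return int(age_days), date.fromisoformat(latest)


def is_stale(value: str, max_age_days: int = 3) -> bool:
    """Assumed rule: data is stale once its age exceeds max_age_days."""
    age_days, _latest = parse_freshness(value)
    return age_days > max_age_days


for reading in ("8|2026-06-17", "1|2026-06-24"):
    print(reading, "stale" if is_stale(reading) else "fresh")
```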
2026-06-25 20:11 StockPlatform Cron Source / Backup Boundary
Read-only and minimal-write evidence sources: StockPlatform Gitea / live source readback, one fast-forward git pull --ff-only origin main on 110 /home/wooo/stockplatform-v2, natural cron logs, ops.job_runs, and /api/v1/system/freshness.
- Backup remains green from the 19:17 readback: 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, `offsite_fresh=1`, `rclone_gdrive_fresh=1`.
- DR blocker remains `escrow_missing=5`.
- StockPlatform source-version drift is repaired: live `/home/wooo/stockplatform-v2` and Gitea `main` are `fb91aa4c6272469d1d26e0820169629eac17d28a`.
- Six previously missing production cron entrypoint scripts are present, and every `scripts/ops/*.sh` referenced by `install-production-cron.sh` exists on live source.
- Natural cron evidence after source sync:
  - `source-remediation-queue` succeeded at 19:56 and 20:00.
  - `market-index-ingestion` succeeded at 20:00.
  - `price-ingestion` succeeded at 20:02.
  - `margin-short-ingestion` succeeded at 20:05 but official 2026-06-25 margin-short data remained pending, with `row_count=0`.
  - `chips-ingestion` succeeded at 20:06.
  - `ai-recommendation-pipeline` succeeded at the cron/job layer at 20:10 and correctly blocked internally on `core_margin_short_daily_incomplete`, `official_margin_short_daily_official_pending`.
- Stock freshness remains separate from backup: `/api/v1/system/freshness` is still `blocked` with `core_margin_short_daily_missing` and `ai_recommendations_stale`.
- No backup restore, manual DB restore, Docker restart, Nginx reload, K8s action, firewall change, or secret read was performed to address StockPlatform.
| Gate | Status | Evidence |
|---|---|---|
| Backup / offsite | VERIFIED | 19:17 backup readback remains green. |
| StockPlatform cron source | REPAIRED | Live and Gitea at fb91aa4c6272469d1d26e0820169629eac17d28a; missing entrypoints restored. |
| StockPlatform natural cron entrypoints | VERIFIED | 19:56-20:10 official schedule runs no longer fail with script_exit_127. |
| StockPlatform product data freshness | BLOCKED_EXTERNAL_SOURCE | Official 2026-06-25 margin-short source pending; AI recommendations stay stale by design. |
| Credential escrow | BLOCKED | escrow_missing=5; only real non-secret owner evidence may close this. |
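The check that every `scripts/ops/*.sh` referenced by `install-production-cron.sh` exists can be sketched in a few lines of Python. This is an illustrative reconstruction, not the check that was actually run: `missing_entrypoints` is a hypothetical helper, the regex is an assumption about how paths appear in the installer, and the script file names in the demo are made up.

```python
import re
import tempfile
from pathlib import Path


def missing_entrypoints(installer_text: str, repo_root: Path) -> list[str]:
    """Return referenced scripts/ops/*.sh paths that are absent under repo_root."""
    referenced = sorted(set(re.findall(r"scripts/ops/[\w.-]+\.sh", installer_text)))
    return [rel for rel in referenced if not (repo_root / rel).is_file()]


# Demo against a throwaway directory: one script present, one missing.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "scripts/ops").mkdir(parents=True)
    (root / "scripts/ops/price-ingestion.sh").write_text("#!/bin/sh\n")
    installer = (
        "2 20 * * * scripts/ops/price-ingestion.sh\n"
        "10 20 * * * scripts/ops/ai-pipeline.sh\n"
    )
    print(missing_entrypoints(installer, root))
```

An empty result is the condition the table records as verified; a non-empty one corresponds to the `script_exit_127` failures seen before the source sync.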
2026-06-25 10:35 Backup / Offsite / Escrow Live Status
Read-only evidence sources: /backup/scripts/backup-status.sh --no-notify --no-refresh from 110 at 10:35 Asia/Taipei; scheduler log proof at 10:04; cold-start rerun at 10:35; public route and DB readback at 10:35.
- 110 backup health: `13/13 fresh failed=0`.
- 188 backup health: `2/2 fresh failed=0`.
- Integrity / configured blockers: `core_blockers=0`, `dr_warnings=5`, `configured_missing_110=0`, `configured_missing_188=0`, `script_missing_110=0`, `script_missing_188=0`, `integrity_stale=0`.
- Offsite / GDrive freshness: `offsite_configured=1`, `offsite_fresh=1`, `rclone_gdrive_configured=1`, `rclone_gdrive_fresh=1`.
- Last aggregate backup: `2026-06-25 02:35:09`.
- DR blocker remains: `escrow_missing=5`. Do not forge evidence markers, and do not paste any secret value, hash, or partial token.
- Full-stack service release blocker remains separate: cold-start `PASS=85 WARN=1 BLOCKED=1`, because MOMO business data is stale (`MOMO_DAILY_FRESHNESS 8|2026-06-17`) and Google Drive token metadata is missing with writeback not confirmed. This is not a backup freshness failure.
- MOMO code boundary now covers both failure modes: `cd.yaml #904` makes monthly sync failure fail the import job and prevents Drive file movement; `cd.yaml #910` makes Drive auth/API failure return `success=false` instead of a no-file success.
- Live scheduler proof: 2026-06-25 10:04 `auto_import_task` logs `Google Drive 認證失敗: could not locate runnable browser` (Google Drive authentication failed), then logs `❌ 自動匯入失敗` (automatic import failed) and sends Telegram failure notification successfully. Therefore the alert is now a correct failure signal, not a heartbeat / no-file false green.
- MOMO DB readback: `daily_sales_snapshot=104614|2025-07-01|2026-06-17`; current-month `realtime_sales_monthly=10936|2026/06/01|2026/06/17`; latest import job remains `56 completed` with `10936/10936/0` and no newer successful daily-sales import by 10:35.
| Gate | Status | Evidence |
|---|---|---|
| 110 backup freshness | VERIFIED | 13/13 fresh, failed count 0. |
| 188 backup freshness | VERIFIED | 2/2 fresh, failed count 0. |
| Offsite / GDrive freshness | VERIFIED | offsite_fresh=1, rclone_gdrive_fresh=1. |
| Backup core blockers | GREEN | core_blockers=0. |
| Credential escrow | BLOCKED | escrow_missing=5; only real non-secret owner evidence may close this. |
| MOMO Drive token metadata | WARN | Host and container token metadata paths are missing; no token content was read. |
| MOMO Drive auth false-green | FIX_DEPLOYED_AND_LIVE_PROVEN | Gitea Actions cd.yaml #910 success; 188 host and scheduler container source include fail-closed marker; 10:04 scheduler cycle failed closed and sent failure notification. |
| Service full green | NO-GO | Blocked by MOMO source absence / data freshness; token metadata warning also requires owner-gated evidence. |
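The fail-closed behaviour described for `cd.yaml #910` (a Drive auth/API failure returns `success=false` instead of a no-file success) can be illustrated with a small Python sketch. This is not the MOMO scheduler code: `DriveAuthError`, `ImportResult`, `run_import`, and the message strings are hypothetical, and only the distinction between the two outcomes is taken from the text.

```python
from dataclasses import dataclass
from typing import Callable


class DriveAuthError(Exception):
    """Hypothetical error raised when Drive auth or an API call fails."""


@dataclass
class ImportResult:
    success: bool
    file_count: int
    message: str


def run_import(list_files: Callable[[], list[str]]) -> ImportResult:
    """Fail closed: an auth/API failure must never look like 'no files today'."""
    try:
        files = list_files()
    except DriveAuthError as exc:
        return ImportResult(False, 0, f"Drive auth/API failure: {exc}")
    if not files:
        return ImportResult(True, 0, "no source file found")
    return ImportResult(True, len(files), "files listed for import")


def broken_listing() -> list[str]:
    raise DriveAuthError("could not locate runnable browser")


print(run_import(broken_listing))
print(run_import(lambda: []))
```

The two printed results differ in `success`, which is what lets the failure notification fire instead of a false-green heartbeat.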
2026-06-24 23:33 Backup / Offsite / Escrow Live Status
Read-only command: /backup/scripts/backup-status.sh --no-notify --no-refresh from 110 at 23:33 Asia/Taipei.
- 110 backup health: `13/13 fresh failed=0`.
- 188 backup health: `2/2 fresh failed=0`.
- Integrity / configured blockers: `core_blockers=0`, `dr_warnings=5`, `configured_missing_110=0`, `configured_missing_188=0`, `script_missing_110=0`, `script_missing_188=0`, `integrity_stale=0`.
- Offsite / GDrive freshness: `offsite_configured=1`, `offsite_fresh=1`, `rclone_gdrive_configured=1`, `rclone_gdrive_fresh=1`.
- Last aggregate backup: `2026-06-24 02:28:39`.
- DR blocker remains: `escrow_missing=5`. Do not forge evidence markers, and do not paste any secret value, hash, or partial token.
- Full-stack service release blocker remains separate: cold-start `PASS=88 WARN=0 BLOCKED=1`, because of `188 momo source file absent while daily sales data stale` / `MOMO_DAILY_FRESHNESS 7|2026-06-17`. This is not a backup freshness failure.
| Gate | Status | Evidence |
|---|---|---|
| 110 backup freshness | VERIFIED | 13/13 fresh, failed count 0. |
| 188 backup freshness | VERIFIED | 2/2 fresh, failed count 0. |
| Offsite / GDrive freshness | VERIFIED | offsite_fresh=1, rclone_gdrive_fresh=1. |
| Backup core blockers | GREEN | core_blockers=0. |
| Credential escrow | BLOCKED | escrow_missing=5; only real non-secret owner evidence may close this. |
| Service full green | NO-GO | Blocked by MOMO source absence / data freshness, not by backup. |
2026-06-24 22:17 Backup / Offsite / Escrow Live Status
Read-only command: /backup/scripts/backup-status.sh --no-notify --no-refresh from 110 at 22:17 Asia/Taipei.
- 110 backup health: `13/13 fresh failed=0`.
- 188 backup health: `2/2 fresh failed=0`.
- Integrity / configured blockers: `core_blockers=0`, `dr_warnings=5`, `configured_missing_110=0`, `configured_missing_188=0`, `script_missing_110=0`, `script_missing_188=0`.
- Offsite / GDrive freshness: `offsite_configured=1`, `offsite_fresh=1`, `rclone_gdrive_configured=1`, `rclone_gdrive_fresh=1`.
- Last aggregate backup: `2026-06-24 02:28:39`.
- DR blocker remains: `escrow_missing=5`. Do not forge evidence markers, and do not paste any secret value, hash, or partial token.
- Full-stack service release blocker remains separate: cold-start `PASS=86 WARN=0 BLOCKED=1`, because of `MOMO_DAILY_FRESHNESS 7|2026-06-17`. This is not a backup freshness failure.
- MOMO code boundary is now production-deployed: Gitea Actions `cd.yaml #904` succeeded at commit `84035906aba0e5e190d031a13cfd9b47a8cd1f73`; 188 live source marker confirms monthly sync failure now fails the job and prevents Drive file movement.
| Gate | Status | Evidence |
|---|---|---|
| 110 backup freshness | VERIFIED | 13/13 fresh, failed count 0. |
| 188 backup freshness | VERIFIED | 2/2 fresh, failed count 0. |
| Offsite / GDrive freshness | VERIFIED | offsite_fresh=1, rclone_gdrive_fresh=1. |
| Backup core blockers | GREEN | core_blockers=0. |
| Credential escrow | BLOCKED | escrow_missing=5; only real non-secret owner evidence may close this. |
| MOMO code boundary | VERIFIED | cd.yaml #904 success and 188 live import-service marker. |
| Service full green | NO-GO | Blocked by MOMO data freshness, not by backup. |
2026-06-24 22:40 MOMO source absence clarification:
- Scheduler `auto_import_task` recent runs report `file_count=0` and `imported_count=0` for the expected Drive folder / pattern.
- `import_config` remains `gdrive_folder_path=當日業績匯入` and `gdrive_file_pattern=即時業績_當日`.
- Latest valid import job `56` already completed with `sync_success=true` and bounds `2026-06-01..2026-06-17`.
- Repo-side cold-start v1.42 dry-run emits `MOMO_SOURCE_EMPTY_EVIDENCE_LINES 21`, `MOMO_IMPORT_CONFIG 當日業績匯入|即時業績_當日`, `MOMO_LATEST_IMPORT_JOB 56|completed|即時業績_當日.xlsx|2026-06-18T11:41:00.853176|2026-06-18T11:42:02.309425|10936|10936|0` and keeps the only hard blocker as source absence.
- 110 live monitor deployment is intentionally not claimed: `verify-cold-start-monitor-deploy.sh` reports repo hash `f60b81029969a527dc742ebc9558d2933f11fe24ec4f46f7a7bc6637759b7b05` vs live hash `10608873d406911a519afa96218abebc2b85ab6123bdf46b6e21eb269e554bb8`.
- Therefore backup/offsite remains green while service full-green remains blocked by business data source absence. Do not run backup restore or DB restore to solve this symptom.
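The repo-versus-live comparison above is a content-hash parity check. The two hashes quoted are 64 hex characters long, which is consistent with SHA-256, though the algorithm actually used by `verify-cold-start-monitor-deploy.sh` is an assumption here. A minimal Python sketch, with hypothetical names `sha256_hex` and `deploy_parity` and made-up script contents:

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex digest of the given bytes (64 hex characters for SHA-256)."""
    return hashlib.sha256(data).hexdigest()


def deploy_parity(repo_script: bytes, live_script: bytes) -> tuple[bool, str, str]:
    """Compare the repo copy and the live copy of a script by content hash."""
    repo_hash = sha256_hex(repo_script)
    live_hash = sha256_hex(live_script)
    return repo_hash == live_hash, repo_hash, live_hash


# Illustrative contents only; real use would read both files as bytes.
in_sync, repo_hash, live_hash = deploy_parity(b"echo v1.42\n", b"echo v1.41\n")
print("PARITY_OK" if in_sync else "PARITY_BLOCKED", repo_hash[:12], live_hash[:12])
```

The check is read-only by construction: it reports a mismatch and leaves the decision to install, which is a live write, to an explicit approval step.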
2026-06-24 21:33 Backup / Offsite / Escrow Live Status
Read-only command: /backup/scripts/backup-status.sh --no-notify --no-refresh from 110 at 21:33 Asia/Taipei.
- 110 backup health: `13/13 fresh failed=0`.
- 188 backup health: `2/2 fresh failed=0`.
- Integrity / configured blockers: `core_blockers=0`, `configured_missing_110=0`, `configured_missing_188=0`, `script_missing_110=0`, `script_missing_188=0`.
- Offsite / GDrive freshness: `offsite_configured=1`, `offsite_fresh=1`, `rclone_gdrive_configured=1`, `rclone_gdrive_fresh=1`.
- Last aggregate backup: `2026-06-24 02:28:39`.
- DR blocker remains: `escrow_missing=5`. Do not forge evidence markers, and do not paste any secret value, hash, or partial token.
- Full-stack service release blocker remains separate: cold-start `PASS=86 WARN=0 BLOCKED=1`, because of `MOMO_DAILY_FRESHNESS 7|2026-06-17`. This is not a backup freshness failure. The MOMO program version is already `V10.653`, but business data has not yet caught up to today.
| Gate | Status | Evidence |
|---|---|---|
| 110 backup freshness | VERIFIED | 13/13 fresh, failed count 0. |
| 188 backup freshness | VERIFIED | 2/2 fresh, failed count 0. |
| Offsite / GDrive freshness | VERIFIED | offsite_fresh=1, rclone_gdrive_fresh=1. |
| Backup core blockers | GREEN | core_blockers=0. |
| Credential escrow | BLOCKED | escrow_missing=5; only real non-secret owner evidence may close this. |
| Service full green | NO-GO | Blocked by MOMO data freshness, not by backup. |
2026-06-24 Velero / Exporter / Disk-Pressure Live Status
2026-06-24 06:35 refresh:
- 110 backup health remains fresh: 13 configured jobs, stale `0`, failed count `0`, config failed `0`.
- 188 backup health remains fresh: 2 configured jobs, stale `0`, missing cron/script `0`.
- 188 `node-exporter` textfile scrape is restored: Prometheus `up{job="node-exporter-188"}=1` and `awoooi_backup_health_monitor_up{host="188"}=1`.
- 188 PostgreSQL exporter and Redis exporter are restored: local metrics `pg_up=1` / `redis_up=1`; Prometheus sees `up{job="postgres-exporter"}=1`, `pg_up=1`, `up{job="redis-exporter"}=1`, `redis_up=1`.
- 188 MinIO endpoint is healthy on `192.168.0.188:9000`; 120 Velero `BackupStorageLocation/default` is `Available`.
- One-off Velero backup `reboot-recovery-202606240456` completed successfully; 110 backup-health textfile reports `awoooi_velero_latest_completed_backup_fresh=1`.
- `VeleroBackupNotRun`, `BackupHealthMonitorMissing188`, `PostgreSQLDown`, `RedisDown` and 110 disk-pressure alerts are resolved.
- 110 `/` disk use is reduced from `92%` to `73%` after Docker image/build-cache cleanup only. Docker volume prune remains forbidden without explicit owner approval.
- Credential escrow readback remains blocked: `ESCROW_MISSING_COUNT=5`.
- Full service green is still blocked by MOMO business data freshness: `MOMO_DAILY_FRESHNESS 6|2026-06-17`.
| Gate | Status | Evidence |
|---|---|---|
| 110 backup freshness | VERIFIED | 13/13 fresh, failed count 0, config failed 0. |
| 188 backup freshness | VERIFIED | 2/2 fresh, node-exporter scrape and textfile metrics restored. |
| Velero / MinIO storage | VERIFIED | MinIO health OK, BSL Available, backup reboot-recovery-202606240456 Completed, freshness metric 1. |
| PostgreSQL / Redis exporters | VERIFIED | pg_up=1, redis_up=1, Prometheus scrape up=1 for both exporters. |
| Alert chain | VERIFIED_WITH_EXPECTED_REDLIGHTS | Exporter/Velero/disk alerts resolved; escrow missing and MOMO freshness blocker remain visible. |
| Credential escrow | BLOCKED | Five non-secret evidence markers still missing. |
| DR closeout | NO-GO | Must not be declared complete until real owner-provided non-secret evidence IDs are validated and markers are written. |
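Several gates in this section are read from node-exporter textfile metrics such as `awoooi_backup_job_configured{host="188",job="momo_pg_daily"} 1`. The sketch below shows one way to parse a single sample line of that shape in Python. `parse_metric` is a hypothetical helper, and the regexes are deliberately simplified: they assume no escaped quotes or commas inside label values and no trailing timestamp.

```python
import re

# Simplified patterns for one sample line: name, optional {labels}, value.
LINE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_metric(line: str) -> tuple[str, dict[str, str], float]:
    """Parse 'name{k="v",...} value' into (name, labels, value)."""
    match = LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a metric sample line: {line!r}")
    labels = dict(LABEL.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


name, labels, value = parse_metric(
    'awoooi_backup_job_configured{host="188",job="momo_pg_daily"} 1'
)
print(name, labels, value)
```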
Operational helpers:
# 188 PostgreSQL / Redis exporters
ssh ollama@192.168.0.188 'bash /home/ollama/bin/188-db-exporters-restore.sh'
# 188 MinIO / 120 Velero BSL readback
ssh wooo@192.168.0.110 '/home/wooo/scripts/188-minio-velero-restore.sh'
# Maintenance-window one-off Velero backup + backup-health textfile refresh
ssh wooo@192.168.0.110 'CREATE_VELERO_BACKUP=true REFRESH_BACKUP_HEALTH=true /home/wooo/scripts/188-minio-velero-restore.sh'
Current policy: restore backup and monitoring red lights first; do not silence VeleroBackupNotRun or exporter-down alerts. Healthy heartbeat success messages are suppressed separately and should not be confused with real backup/data/escrow alerts.
2026-06-18 Cold-Start / Backup Live Status
2026-06-18 13:43 refresh:
- `full-stack-cold-start-check.sh --monitor-read-only --no-color --watch --interval 1 --max-attempts 1`: `PASS=84 WARN=0 BLOCKED=0`, result `GREEN`.
- K8s schedule evidence: `FAILED_JOBS=1`, `STALE_FAILED_JOBS=1`, `ACTIVE_FAILED_JOBS=0`, `BAD_PODS=0`. Retained `km-vectorize` failed Job is historical evidence only; active failed Job count is zero.
- 110 backup health: `total=13 stale=0 missing_cron=0 missing_script=0 failed_count=0 config_failed=0 integrity_total=2 integrity_stale=0`.
- 188 backup health: `total=2 stale=0 missing_cron=0 missing_script=0`.
- Public routes / TLS, momo DB parity, 120 / 121 K3s readiness, AWOOOI API/Web route checks all passed in the same cold-start run.
- Credential escrow readback remains blocked: `/backup/scripts/offsite-escrow-evidence-report.sh --no-color` reports `SCRIPT_MISSING_COUNT=0`, `OFFSITE_CONFIGURED=1`, `RCLONE_CONFIGURED=1`, `READINESS_REQUIRE_CONFIGURED_BLOCKED=0`, `ESCROW_MISSING_COUNT=5`.
| Gate | Status | Evidence |
|---|---|---|
| Full cold-start service readiness | GREEN | PASS=84 WARN=0 BLOCKED=0; stale failed Job evidence is separated from active failed Job blockers. |
| 110 backup freshness | VERIFIED | 13/13 fresh, failed count 0, config failed 0, integrity stale 0. |
| 188 backup freshness | VERIFIED | 2/2 fresh, missing cron/script 0. |
| Credential escrow | BLOCKED | Five non-secret evidence markers still missing: restic_repository_password, offsite_provider_credentials, break_glass_admin_credentials, dns_registrar_recovery, oauth_ai_provider_recovery. |
| DR closeout | NO-GO | Must not be declared complete until real owner-provided non-secret evidence IDs are validated and markers are written. |
Current policy: service recovery and backup health can be green while DR is still blocked. Do not fake escrow markers, do not paste secrets into repo/chat, and do not silence escrow alerts.
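The split between `STALE_FAILED_JOBS` and `ACTIVE_FAILED_JOBS` in this section can be sketched as a simple age-based classification. The source does not say how the cold-start check decides that a failed Job is stale, so the 24-hour cutoff below is purely an assumption, as are the helper name `classify_failed_jobs` and the example failure time for `km-vectorize`.

```python
from datetime import datetime, timedelta


def classify_failed_jobs(failed_at: dict[str, datetime], now: datetime,
                         stale_after: timedelta = timedelta(hours=24)) -> dict[str, int]:
    """Count failed Jobs, separating old (stale) failures from recent (active) ones."""
    stale = sorted(name for name, ts in failed_at.items() if now - ts > stale_after)
    active = sorted(name for name in failed_at if name not in stale)
    return {
        "FAILED_JOBS": len(failed_at),
        "STALE_FAILED_JOBS": len(stale),
        "ACTIVE_FAILED_JOBS": len(active),
    }


now = datetime(2026, 6, 18, 13, 43)
# Illustrative failure time: five days before the readback.
counts = classify_failed_jobs({"km-vectorize": now - timedelta(days=5)}, now)
print(counts, "BLOCKED" if counts["ACTIVE_FAILED_JOBS"] else "GREEN")
```

Only the active count feeds the gate, so a retained historical failure stays visible as evidence without blocking readiness.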
2026-06-13 Post-CD Live Status
2026-06-13 01:26 / 01:28 refresh:
- `/backup/scripts/backup-status.sh --no-notify`: 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `integrity_stale=0`, `offsite_fresh=1`, `rclone_gdrive_fresh=1`, `core_blockers=0`, `escrow_missing=5`, last aggregate `2026-06-12 15:54:40`.
- `/home/wooo/node_exporter_textfiles/offsite_full_sync_verify.prom`: `awoooi_backup_offsite_remote_verify_ok=1`, `awoooi_backup_offsite_full_verify_fresh=1`; all 13 repos have `snapshot_count=1` and `snapshot_latest_only=1`.
- `backup-alert-live-visibility-check.py`: both Prometheus and Alertmanager show the 5 escrow gap alerts as active/firing.
- Prometheus rules API: `BackupConfigCapturePartial`, `BackupAggregateRunFailed`, `BackupCredentialEscrowEvidenceMissing`, `ColdStartRecoveryBlocked`, and `ColdStartHost120Unreachable` all exist with health `ok`; currently only the escrow gap rule is firing, correctly, and the rest are inactive.
- `backup-alert-label-contract-check.py`: local `ops/monitoring/alerts-unified.yml` is aligned with the live Prometheus label contract, and 24 baseline backup alert rules are loaded.
- This means the backup core is still green; the remaining red light is DR credential escrow evidence, not a backup script or offsite sync failure.
| Gate | Status | Evidence |
|---|---|---|
| 110 backup cron | VERIFIED | Live crontab still has 02:00 backup-all, 03:00 sync-offsite-backups --mode sync, 06:05 backup-status, 07:20 verify-offsite-full-sync; success is summarized once daily and not sent as noisy Telegram heartbeat. |
| Backup freshness | VERIFIED | 2026-06-13 01:26 status shows 110 13/13 fresh failed=0, 188 2/2 fresh failed=0, core_blockers=0; last aggregate backup completed 2026-06-12 15:54:40. |
| 188 momo backup cron/exporter contract | VERIFIED | 188 crontab now runs /home/ollama/bin/momo-pg-backup.sh; exporter reports awoooi_backup_job_configured{host="188",job="momo_pg_daily"} 1, so configured_missing_188=0. |
| Google Drive/rclone remote latest-only | VERIFIED | 2026-06-13 01:28 textfile confirms all 13 remote repos have snapshot_count=1 and snapshot_latest_only=1; latest scheduled verifier log at 2026-06-12 07:20 returned REMOTE_LATEST_ONLY_OK=1, FULL_MARKER_FRESH=1, VERIFY_OK=1, FAILED=0. |
| Offsite gate marker | VERIFIED | /backup/offsite/enable-rclone-sync present; full marker fresh and verifier wrote /home/wooo/node_exporter_textfiles/offsite_full_sync_verify.prom. |
| Backup alert rules | VERIFIED | 2026-06-13 01:27 live check confirms Prometheus and Alertmanager expose the active BackupCredentialEscrowEvidenceMissing gap alerts for the five missing items; Prometheus rules API has all five required alert names healthy; label contract check confirms all baseline backup alert rules are loaded. |
| Backup aggregate health | VERIFIED | 2026-06-12 15:54 /backup/scripts/backup-all.sh completed 13/13 successfully; Configs captured 120 / 121 / K8s workloads / K8s secrets / Velero from source 192.168.0.120; 18:55 core_blockers=0. |
| Credential escrow | BLOCKED | Five evidence markers missing. Only write non-secret marker evidence with /backup/scripts/mark-credential-escrow-verified.sh. |
| Config backup capture | VERIFIED | 2026-06-12 15:54 Configs backup succeeded for 120-k3s-host-configs, 121-k3s-host-configs, cluster-k8s-workloads, cluster-k8s-secrets, and cluster-velero-backups; latest Configs snapshot bee9ae22. |
| Full cold-start | GREEN | 2026-06-13 01:26 read-only rerun: PASS=83 WARN=0 BLOCKED=0; result GREEN. |
| 110 -> 120 / 188 SSH trust | VERIFIED | Final trust repair backup /home/wooo/.ssh/known_hosts.before-120-188-final-refresh.20260613-011949; CD fix 80e6ec1a uses /home/wooo/.ssh/deploy_known_hosts; post-deploy marker e4a349bc did not clobber global known_hosts, and 120 / 188 entries remain present. |
| 120 console handoff | CLOSED | 120 root filesystem was repaired from console/initramfs with offline fsck, booted at 2026-06-12 15:13, SSH returned, root mounted rw, failed units 0, and K3s mon returned Ready. |
| 2026-06-05 manual backup remediation | VERIFIED with aggregate blocker | 18:40 status: stale110=none, stale188=none, configured_missing_188=0; manual snapshots: AWOOOI b7d5ee4e, Gitea ea641613, Open-WebUI d1147507, ClawBot 73ead3cc, AI artifacts b1161ab8. |
| 2026-06-06 credential escrow audit | BLOCKED | 15:03 report confirms scripts/config/rclone are present, but all five non-secret evidence markers are still missing; 15:06 safe dry-run checklist is documented below. |
Current policy: normal success should not create immediate Telegram noise. Failures and operator-action states must still alert; a single daily status summary runs at 06:05.
2026-06-13 post-CD closeout:
- 110 / 120 / 121 / 188 public/core service recovery is green.
- Backup aggregate, Google Drive/rclone offsite latest-only, and full cold-start are green after 120 recovery.
- Latest normal CD deploy preserved API/Web workload balancing and did not break cold-start SSH trust.
- Do not close DR scorecard until all five credential escrow evidence markers are written with non-secret evidence IDs.
Credential Escrow Evidence Checklist
2026-06-13 13:10 live refresh:
- /backup/scripts/mark-credential-escrow-verified.sh --status: still missing restic_repository_password, offsite_provider_credentials, break_glass_admin_credentials, dns_registrar_recovery, and oauth_ai_provider_recovery.
- /backup/scripts/offsite-escrow-evidence-report.sh --no-color: SCRIPT_MISSING_COUNT=0, OFFSITE_CONFIGURED=1, RCLONE_CONFIGURED=1, READINESS_REQUIRE_CONFIGURED_BLOCKED=0, ESCROW_MISSING_COUNT=5, SUMMARY PASS=8 WARN=5 BLOCKED=0.
- Owner request package: CREDENTIAL-ESCROW-EVIDENCE-OWNER-REQUEST.md.
- Verdict: backup core and offsite readiness are green; DR closeout remains blocked until all five markers are written with real, non-secret evidence IDs.
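The verdict can also be derived mechanically from the report text. The following is a minimal sketch, assuming the report prints the `ESCROW_MISSING_COUNT=` and `SUMMARY PASS= WARN= BLOCKED=` fields in the form quoted in this refresh. `dr_closeout_verdict` is a hypothetical helper and is not part of the deployed scripts.

```bash
#!/usr/bin/env bash
# Hypothetical helper (assumption): reads report text on stdin and prints a verdict.
# GREEN requires both BLOCKED=0 in the SUMMARY line and ESCROW_MISSING_COUNT=0.
dr_closeout_verdict() {
  local report summary escrow blocked missing
  report="$(cat)"
  # Match the full SUMMARY field so READINESS_REQUIRE_CONFIGURED_BLOCKED is not picked up.
  summary="$(printf '%s\n' "$report" | grep -oE 'SUMMARY PASS=[0-9]+ WARN=[0-9]+ BLOCKED=[0-9]+' | tail -n 1 || true)"
  escrow="$(printf '%s\n' "$report" | grep -oE 'ESCROW_MISSING_COUNT=[0-9]+' | tail -n 1 || true)"
  if [ -z "$summary" ] || [ -z "$escrow" ]; then
    # Fail closed when the expected fields are absent.
    echo "UNKNOWN"
    return 2
  fi
  blocked="${summary##*BLOCKED=}"
  missing="${escrow##*=}"
  if [ "$blocked" -eq 0 ] && [ "$missing" -eq 0 ]; then
    echo "GREEN"
  else
    echo "BLOCKED escrow_missing=$missing report_blocked=$blocked"
    return 1
  fi
}
```

In practice the input would come from `/backup/scripts/offsite-escrow-evidence-report.sh --no-color`; missing fields are treated as unknown rather than green, so a changed report format cannot silently close the DR scorecard.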
A credential escrow marker only proves that the recovery material has been verified by a human and can be retrieved; it must never contain a secret.
| Item | Acceptable evidence-id | Forbidden |
|---|---|---|
| restic_repository_password | Password manager item ID, sealed envelope ID, recovery checklist ID | Restic password, recovery code, secret URL |
| offsite_provider_credentials | Vault item ID for Google Drive/rclone or provider credential record | OAuth token, refresh token, application key |
| break_glass_admin_credentials | Break-glass credential record ID or sealed envelope ID | Admin password, SSH private key, OTP seed |
| dns_registrar_recovery | Registrar recovery checklist ID or vault item ID | Registrar password, recovery codes |
| oauth_ai_provider_recovery | Provider account recovery checklist ID or vault item ID | API key, token, client secret |
Safe flow after human verification:
# 1. Read current status; this does not expose secrets.
/backup/scripts/mark-credential-escrow-verified.sh --status
# 2. Validate the non-secret evidence-id first.
/backup/scripts/mark-credential-escrow-verified.sh \
--item <allowed-item> \
--evidence-id <non-secret-evidence-id> \
--dry-run
# 3. Only after dry-run passes, write the marker.
/backup/scripts/mark-credential-escrow-verified.sh \
--item <allowed-item> \
--evidence-id <non-secret-evidence-id>
# 4. Recheck DR metric.
/backup/scripts/offsite-escrow-evidence-report.sh --no-color
<non-secret-evidence-id> must be an existing external reference. Placeholder values such as EVIDENCE_ID_FOR_* or VAULT-ITEM-ID are rejected and must not be written.
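Before running the dry-run step, an operator could pre-check the arguments locally. This is a sketch under stated assumptions: `precheck_escrow_args` is a hypothetical helper, the item list is copied from the checklist table, and the placeholder patterns are the two named above plus an unfilled `<...>` template (an extra assumption). The real validation lives in `mark-credential-escrow-verified.sh` and may be stricter.

```bash
#!/usr/bin/env bash
# Hypothetical pre-check (assumption): mirrors the documented rules, not the real script.
precheck_escrow_args() {
  local item="$1" evidence_id="$2"
  # Only the five items from the checklist table are accepted.
  case "$item" in
    restic_repository_password|offsite_provider_credentials|break_glass_admin_credentials|dns_registrar_recovery|oauth_ai_provider_recovery) ;;
    *)
      echo "reject: unknown item '$item'"
      return 1
      ;;
  esac
  # Reject empty values, the documented placeholders, and unfilled <...> templates.
  case "$evidence_id" in
    ""|EVIDENCE_ID_FOR_*|VAULT-ITEM-ID|"<"*">")
      echo "reject: placeholder or empty evidence-id"
      return 1
      ;;
  esac
  echo "ok: $item"
}
```

A pattern check like this cannot tell whether an ID refers to an existing external record or whether a value is a secret; the human verification step remains the actual control.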
Backup Overview (Fully Automated)
| # | Data type | Backup script | Schedule | Max data loss | Status |
|---|---|---|---|---|---|
| 1 | Gitea (DB + repositories) | backup-gitea.sh | Daily 02:00 | 24h | ✅ |
| 2 | MOMO PostgreSQL | backup-momo.sh | Daily 02:00 | 24h | ✅ |
| 3 | Harbor (Registry + DB) | backup-harbor.sh | Daily 02:00 | 24h | ✅ |
| 4 | AWOOOI PostgreSQL (full) | backup-awoooi.sh | Daily 02:00 | 6h | ✅ |
| 4h | AWOOOI PostgreSQL (high-frequency) | backup-awoooi-frequent.sh | 08:00 / 14:00 / 20:00 | 6h | ✅ |
| 5 | Langfuse (AI tracing/evaluation) | backup-langfuse.sh | Daily 02:00 | 24h | ✅ |
| 6 | Monitoring (Prometheus/Grafana/Alertmanager) | backup-monitoring.sh | Daily 02:00 | 24h | ✅ |
| 7 | SignOz (ClickHouse traces/logs) | backup-signoz.sh | Daily 02:00 | 24h | ✅ |
| 8 | Open-WebUI (LLM conversation history) | backup-open-webui.sh | Daily 02:00 | 24h | ✅ |
| 9 | ClawBot Redis (state/cache) | backup-clawbot.sh | Daily 02:00 | 24h | ✅ |
| - | K8s resources (all namespaces) | Velero + MinIO | Daily 02:00 | 24h | ✅ |
Backup orchestrator: /backup/scripts/backup-all.sh v3.0, which runs all 9 backups in a single pass.
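Alerting
Backup failures and states that need operator action must be pushed to AwoooP / Telegram. Normal successes are not pushed immediately, to avoid flooding the channel; success is carried by the daily 06:05 summary and by Prometheus/textfile evidence.
| Status | Severity | Telegram delivery |
|---|---|---|
| success | info | No immediate message; included in the daily 06:05 backup status summary |
| warning | warning | ⚠️ Yellow warning |
| failed | critical | 🔴 Immediate alert |
Alert endpoint: http://192.168.0.188:8088/api/v1/webhook/custom
Test command: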
source /backup/scripts/common.sh
notify_clawbot "failed" "backup-test" "test alert" 0
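The status table in this section can be written as a small lookup. The sketch below is illustrative only: `status_to_severity` and `should_notify_now` are hypothetical names, and `notify_clawbot` in common.sh may implement the mapping differently.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (assumption): encode the documented status table.
status_to_severity() {
  case "$1" in
    success) echo "info" ;;
    warning) echo "warning" ;;
    failed)  echo "critical" ;;
    *)
      # Unknown statuses are reported as an error rather than guessed.
      echo "unknown"
      return 1
      ;;
  esac
}

# Success is deferred to the daily 06:05 summary; everything else is pushed immediately.
should_notify_now() {
  [ "$1" != "success" ]
}
```

Treating every non-success status as immediately notifiable keeps the policy fail-loud: an unexpected status string still reaches an operator.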
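Retention Policy
Since 2026-05-19, the local restic repos on 110, the MOMO file backups on 188, and the Google Drive/rclone offsite mirror all follow a latest-only policy: once a new snapshot is created successfully, only the newest one is kept. The 2026-06-13 01:28 live textfile confirmed that each of the 13 Google Drive/rclone remote repos holds exactly one snapshot and that every latest-only metric is 1.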
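A latest-only check over a node_exporter textfile can be sketched as below. The exact metric names in the live textfile are not shown in this document, so the metric name is passed in as a parameter; `latest_only_violations` is a hypothetical helper, and it assumes label values contain no spaces.

```bash
#!/usr/bin/env bash
# Hypothetical helper (assumption): reads Prometheus text-format lines on stdin and
# prints every series of the given metric whose value is not exactly 1.
# Returns non-zero when at least one violation is found.
latest_only_violations() {
  local metric="$1" bad
  bad="$(awk -v m="$metric" '
    # Skip HELP/TYPE comments; keep only series whose name is the requested metric.
    !/^#/ && index($1, m "{") == 1 && ($NF + 0) != 1 { print $1 "=" $NF }
  ')"
  if [ -n "$bad" ]; then
    printf '%s\n' "$bad"
    return 1
  fi
}
```

The helper would be fed the contents of a `.prom` file, for example the one written by the offsite verifier, together with whichever snapshot-count metric name that file actually uses.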
2026-06-04 manual refresh evidence:
- 188 momo-pg-backup.sh produced momo_analytics_20260604_154234.sql.gz and pruned old backups beyond keep-last=1.
- 110 backup-awoooi-frequent.sh completed restic snapshot 7440d75f and pruned the previous AWOOOI high-frequency DB snapshot.
- 18:54 backup-status.sh --no-notify: stale110=none, stale188=none, configured_missing_188=0, core_blockers=1, escrow_missing=5.
18:55 cold-start scorecard refresh:
- PASS=71 WARN=3 BLOCKED=3.
- Remaining hard blocks: 120 ping, 120 SSH, and 120 K3s read-only check.
- 188 backup health stale jobs are clear.
- momo current-month parity is green: 2215|2215|2026-06-01|2026-06-04|2026-06-01|2026-06-04.
19:02 120 console handoff evidence:
- local/110/121/188 cannot reach 192.168.0.120.
- K3s node lease for mon stopped renewing at 2026-05-22 02:48:36 +08.
- 120-fsck-maintenance-checklist.sh --no-color returns PASS=2 WARN=2 BLOCKED=3, so backup aggregate remains correctly blocked until console/SSH recovery.
2026-06-12 18:55 update: 120 has returned and the aggregate backup blocker is cleared. /backup/scripts/backup-all.sh completed 13/13, full offsite sync completed 13/13, full verifier returned REMOTE_LATEST_ONLY_OK=1 / VERIFY_OK=1, and backup-status.sh --no-notify reports core_blockers=0. The only remaining DR warning is escrow_missing=5.
2026-06-13 01:28 update: post-CD live readback still shows remote_verify_ok=1, full_verify_fresh=1, and all 13 repos snapshot_count=1; backup core remains green after deploy marker e4a349bc.
2026-06-05 manual remediation:
- 16:00 live check still had 120 unreachable, stale110=awoooi_db, backup_all failed=6, and escrow_missing=5.
- 14:00 AWOOOI high-frequency backup failed, then 16:01 manual rerun completed snapshot b7d5ee4e.
- 02:00 Gitea failure was caused by stale container /tmp/gitea-dump.zip; it was renamed in-container to /tmp/gitea-dump.stale.20260605_161032.zip, then Gitea backup completed snapshot ea641613.
- scripts/backup/backup-gitea.sh and live 110 /backup/scripts/backup-gitea.sh now preserve stale container dump files with timestamped names before starting a new dump.
- 110 -> 188 SSH known_hosts was refreshed after fingerprint match for 192.168.0.188; Open-WebUI backup completed snapshot d1147507.
- ClawBot backup completed snapshot 73ead3cc; BGSAVE still warned, but the Redis volume backup succeeded.
- AI artifacts backup completed snapshot b1161ab8.
- Full offsite sync was skipped by runway gate because the next scheduled backup was too close; partial sync for awoooi gitea open-webui clawbot ai-artifacts completed 5/5.
- 18:39 full verifier confirmed all 13 Google Drive/rclone repos have remote snapshots=1, REMOTE_LATEST_ONLY_OK=1, and VERIFY_OK=1.
- 18:40 backup status still reports failed=6 / core_blockers=6 because the 02:00 aggregate history remains failed until backup-all reruns after 120 returns. Do not mark aggregate backup green from individual backup success alone.
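The rule that individual successes do not make the aggregate green can be sketched as a gate over the status fields. This assumes the status is available as comma- or space-separated `key=value` pairs like those quoted above; `aggregate_is_green` is a hypothetical helper, not part of backup-status.sh.

```bash
#!/usr/bin/env bash
# Hypothetical gate (assumption): green only when the aggregate counters are both 0.
# Individual snapshot IDs are deliberately not consulted.
aggregate_is_green() {
  local line="$1" failed="" blockers="" kv
  local -a fields
  # Split on commas and spaces; stray tokens such as "/" are ignored below.
  IFS=', ' read -r -a fields <<< "$line"
  for kv in "${fields[@]}"; do
    case "$kv" in
      failed=*)        failed="${kv#*=}" ;;
      core_blockers=*) blockers="${kv#*=}" ;;
    esac
  done
  # Missing fields leave the variables empty, so the gate fails closed.
  [ "$failed" = "0" ] && [ "$blockers" = "0" ]
}
```

With the 18:40 values (`failed=6 / core_blockers=6`) the gate stays closed even though five manual snapshots succeeded that day, which matches the note above.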
2026-06-06 convergence evidence:
- 14:46 live check: 120 still ping/SSH failed and K3s mon remains NotReady,SchedulingDisabled.
- 02:00 aggregate backup failed only Configs: 全服務備份完成 (1532s) - 1 個失敗 (12/13) (log line, in English: "all-service backup finished (1532s) - 1 failed (12/13)").
- 14:58 backup-status.sh --no-notify: stale110=none, stale188=none, failed=1, core_blockers=1, escrow_missing=5.
- 14:46 verify-offsite-full-sync.sh --write-textfile --no-color: all 13 remote repos have one snapshot, REMOTE_LATEST_ONLY_OK=1, VERIFY_OK=1.
- 15:03 cold-start scorecard: PASS=71 WARN=3 BLOCKED=3; direct 188 checks still show momo-scheduler healthy with recent log activity, and the scheduler WARN is no longer present in the scorecard.
- 15:03 credential escrow report: rclone/offsite readiness is configured, but restic_repository_password, offsite_provider_credentials, break_glass_admin_credentials, dns_registrar_recovery, and oauth_ai_provider_recovery still lack non-secret evidence markers. Do not write placeholders or secrets.
Crontab Schedule (110)
0 2 * * * backup-all.sh ← full backup of 9 services
0 8,14,20 * * * backup-awoooi-frequent.sh ← AWOOOI high-frequency (every 6 hours, counting the 02:00 full backup)
0 3 * * * sync-offsite-backups.sh --mode sync ← Google Drive/rclone gated sync
5 6 * * * backup-status.sh ← one daily backup status summary, so success heartbeats do not flood the channel
20 7 * * * verify-offsite-full-sync.sh --write-textfile ← Google Drive/rclone latest-only verification
Backup Architecture
192.168.0.110 (/backup/scripts/backup-all.sh) daily 02:00
├── [1/9] backup-gitea.sh → gitea dump → /backup/gitea
├── [2/9] backup-momo.sh → SSH 188 pg_dump momo → /backup/momo
├── [3/9] backup-harbor.sh → harbor dump → /backup/harbor
├── [4/9] backup-awoooi.sh → SSH 188 pg_dump awoooi_prod/dev/k3s → /backup/awoooi
├── [5/9] backup-langfuse.sh → docker exec langfuse-db pg_dump → /backup/langfuse
├── [6/9] backup-monitoring.sh → volumes prometheus/grafana/alertmanager → /backup/monitoring
├── [7/9] backup-signoz.sh → volumes signoz-clickhouse/sqlite → /backup/signoz
├── [8/9] backup-open-webui.sh → SSH 188 volume open-webui → /backup/open-webui
└── [9/9] backup-clawbot.sh → SSH 188 volume clawbot-redis → /backup/clawbot
Backup failure → notify_clawbot("failed") → /webhook/custom or AwoooP/Alertmanager path → Telegram 🔴
Backup success → textfile / Prometheus / 06:05 status summary, no immediate message
192.168.0.188 (Velero) daily 02:00
└── K8s resource snapshot → MinIO :9000 (bucket: velero)
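The orchestration flow in the diagram (run each step in order, alert only on failure, report an aggregate count) can be sketched as follows. This is not the content of backup-all.sh: `run_all_backups` and the `notify` stub are hypothetical stand-ins, and the real script calls `notify_clawbot` from common.sh.

```bash
#!/usr/bin/env bash
# Stub notifier (assumption): stands in for notify_clawbot; writes to stderr only.
notify() {
  echo "ALERT status=$1 job=$2" >&2
}

# Hypothetical orchestrator (assumption): each argument is a command or function name.
run_all_backups() {
  local total=$# index=0 failures=0 step
  for step in "$@"; do
    index=$((index + 1))
    echo "[$index/$total] $step"
    # A failing step is counted and alerted, but the remaining steps still run.
    if ! "$step"; then
      failures=$((failures + 1))
      notify "failed" "$step"
    fi
  done
  echo "done: $((total - failures))/$total succeeded"
  # The aggregate result is non-zero if any single step failed.
  [ "$failures" -eq 0 ]
}
```

Continuing after a failed step is a design choice in this sketch: one broken service should not prevent the other repositories from getting a fresh snapshot, while the non-zero aggregate exit still keeps the run from being reported as green.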
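Not Yet Backed Up (Rationale)
| Service | Reason | Notes |
|---|---|---|
| Prometheus TSDB | Raw metric data (not configuration); the TSDB has its own 30d TTL | Low priority; Grafana configuration is already backed up |
| Sentry | Not currently running (docker ps is empty) | Volume exists; re-evaluate after redeployment |
| Redis (AWOOOI) | Cache/WorkingMemory, no persistent business data | Low priority |
| Velero MinIO data | MinIO is a backup of a backup and needs an offsite copy | Pending evaluation of B2/S3 offsite |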
Verification SOP
# Latest backup log
ssh wooo@192.168.0.110 "tail -50 /backup/logs/backup.log"
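# Snapshot count per service repo
# Counts table rows that start with an 8-character short snapshot ID.
# grep -c already prints 0 when nothing matches, so no "|| echo 0" fallback is needed.
ssh wooo@192.168.0.110 "for r in gitea momo harbor awoooi langfuse monitoring signoz open-webui clawbot; do
echo -n \"\$r: \"
restic -r /backup/\$r snapshots --password-file /backup/scripts/.restic-password 2>/dev/null | grep -cE '^[0-9a-f]{8} '
done"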
# Alert test
ssh wooo@192.168.0.110 "source /backup/scripts/common.sh && notify_clawbot 'warning' 'manual-test' 'manual alert test' 0"
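Counting snapshots from the `restic snapshots` table is easy to get wrong, because the trailing "N snapshots" summary line also contains the word "snapshot". A minimal sketch, assuming the default table output where each data row begins with an 8-character hexadecimal short ID; `count_snapshot_rows` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper (assumption): reads `restic snapshots` table output on stdin
# and prints the number of data rows, ignoring the header, rulers, and summary line.
count_snapshot_rows() {
  # grep -c prints the count itself (including 0); "|| true" only masks its
  # non-zero exit status when there are no matches.
  grep -cE '^[0-9a-f]{8} ' || true
}
```

Under the latest-only policy described in this document, each repository should report a count of 1.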
Related Documents
- REBOOT-RECOVERY-SOP.md - reboot recovery SOP
- scripts/backup/ - all backup scripts (Git-versioned)
- /backup/scripts/ (on 110) - scripts as actually deployed