IwoooS Backup / Restore / Escrow / Retention Read-Only Inventory
| Item | Details |
|---|---|
| Date | 2026-06-11 |
| Status | repo_only_inventory_ready |
| Tool | scripts/security/backup-restore-escrow-inventory.py |
| Snapshot | docs/security/backup-restore-escrow-inventory.snapshot.json |
| Schema | docs/schemas/backup_restore_escrow_inventory_v1.schema.json |
| Mode for this phase | committed repo files only; no backup / restore / offsite sync is executed |
| runtime gate | 0 |
1. Core conclusions
Backup / restore / escrow / retention has moved from an abstract to-do item to a re-runnable, repo-only inventory of high-value configuration. The inventory covers 38 surfaces, 27 of which are write-capable or apply-capable. Each of those needs an owner response, a maintenance window, a rollback owner, and validation metrics before any runtime action can proceed.
This inventory is still not live backup truth. It only shows which backup, restore, offsite sync, credential escrow, Velero, alert, and DR files in the repo need to be brought under control. It does not mean that backups have succeeded, that a restore drill has been run, that offsite sync is authorized, that the escrow marker may be written, or that the retention policy may be changed.
On 2026-06-14, docs/security/BACKUP-RESTORE-OWNER-REQUEST-DRAFT.md and docs/security/backup-restore-owner-request-draft.snapshot.json were added, turning the 38 backup / restore / escrow surfaces into owner request drafts. The fixed values are request_draft_count=38, write_capable_request_draft_count=27, live_evidence_required_request_count=38, required_owner_field_count=14, blocked_action_count=18, request_sent_count=0, owner_response_received_count=0, and runtime_gate_count=0.
This artifact only expresses the required shape of an owner request. It does not represent request sent, recipient confirmed, owner response received / accepted, live backup evidence, backup run, restore run, offsite sync, remote delete, credential escrow marker write, retention change, restic prune, rclone config, Velero restore / backup, kubectl action, SSH, secret collection, host write, production write, or runtime gate.
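The fixed counts listed above lend themselves to a mechanical check. The following is a minimal sketch, assuming the draft snapshot exposes these counters as top-level integer keys. The actual JSON layout is not shown in this document, so the flat structure and the `check_draft_counts` helper are hypothetical.

```python
# Expected values copied from the owner request draft description above.
# Assumption: the snapshot stores them as top-level keys with these names.
EXPECTED_DRAFT_COUNTS = {
    "request_draft_count": 38,
    "write_capable_request_draft_count": 27,
    "live_evidence_required_request_count": 38,
    "required_owner_field_count": 14,
    "blocked_action_count": 18,
    "request_sent_count": 0,
    "owner_response_received_count": 0,
    "runtime_gate_count": 0,
}


def check_draft_counts(snapshot: dict) -> list[str]:
    """Return human-readable mismatches; an empty list means every count matches."""
    problems = []
    for key, expected in EXPECTED_DRAFT_COUNTS.items():
        actual = snapshot.get(key)
        if actual != expected:
            problems.append(f"{key}: expected {expected}, got {actual!r}")
    return problems
```

A guard of this kind reads only a committed JSON file, so it stays inside the repo-only boundary of this phase.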
2. Coverage summary
| Metric | Current value | Boundary |
|---|---|---|
| surface count | 38 | Only what is visible in the repo source |
| source exists | 38 | Every source has a SHA256; this does not mean it is applied live |
| backup script surface | 15 | Includes the orchestrator, service backups, and config capture |
| offsite / escrow surface | 8 | Includes rclone sync, verifier, readiness, marker, and config helpers |
| Velero surface | 5 | Includes CronJob, ConfigMap script, standalone script, credential manifest, and install manifest |
| restore drill surface | 4 | Still requires manual approval; must not be drilled directly |
| retention surface | 3 | restic forget --prune and latest-only delete remain unauthorized |
| credential surface | 5 | Only metadata / evidence id is allowed; no secret value is collected |
| alert surface | 1 | Does not imply a Prometheus / Alertmanager reload |
| write-capable surface | 27 | Visible means it must be controlled, not that it may be executed |
| owner response received / accepted | 0 / 0 | Must not be falsely inflated |
| restore / offsite / escrow / retention accepted | 0 / 0 / 0 / 0 | All still pending the owner gate |
| runtime gate | 0 | No action buttons are provided |
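The maturity of the backup / restore / credential category moved from 52% to 58%. This only reflects completion of the repo-only inventory, schema, snapshot, and front-end boundary. It does not mean that live evidence, a restore drill, or offsite sync has been accepted.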
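The "source exists" and SHA256 columns describe a purely local, read-only operation. As an illustration, a per-source record could be produced roughly as follows. This is a sketch under assumptions: the `describe_source` helper and its output keys are hypothetical and are not taken from scripts/security/backup-restore-escrow-inventory.py.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large sources are never loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def describe_source(root: Path, relative_path: str) -> dict:
    """Record existence and digest of one repo file; nothing is executed or written."""
    path = root / relative_path
    exists = path.is_file()
    return {
        "path": relative_path,
        "exists": exists,
        "sha256": sha256_of_file(path) if exists else None,
    }
```

A digest of this kind identifies the committed file content only. It says nothing about what is deployed on a live host.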
3. Surface types under management
| Type | Representative source | Current status |
|---|---|---|
| Backup orchestrator | scripts/backup/backup-all.sh | The executable orchestrator is visible but is not run in this phase |
| Service backups | backup-gitea.sh, backup-awoooi.sh, backup-harbor.sh, backup-langfuse.sh, backup-monitoring.sh, backup-signoz.sh, backup-open-webui.sh, backup-clawbot.sh, backup-sentry.sh, backup-ai-artifacts.sh, backup-public-routes.sh | Each service needs an owner, freshness, restore target isolation, and secret redaction proof |
| Restic / retention | scripts/backup/common.sh, scripts/backup/enforce-latest-only-retention.sh | GFS and latest-only policies are visible; restic prune and delete remain unauthorized |
| Offsite / escrow | sync-offsite-backups.sh, verify-offsite-full-sync.sh, backup-offsite-readiness-gate.sh, offsite-escrow-evidence-report.sh, mark-credential-escrow-verified.sh | Remote sync, remote delete, and marker write all still require manual approval |
| Credential config | configure-offsite-rclone.sh, configure-offsite-b2.sh, k8s/velero/01-credentials.yaml | Only secret metadata is controlled; no value, hash, partial token, or recovery code may be collected |
| Velero restore drill | k8s/awoooi-prod/16-cronjob-backup-restore-test.yaml, 17-configmap-backup-restore-scripts.yaml, scripts/cron_backup_restore_test.sh | Manifest and script are visible; this does not mean the CronJob is live or that the restore dry-run or metric is healthy |
| Alert / health | ops/monitoring/alerts.yml, scripts/ops/backup-health-textfile-exporter.py | Only the rule / metric source is included; Alertmanager is not reloaded and no live textfile is written |
| DR / cold-start docs | BACKUP-STATUS.md, FULL-STACK-COLD-START-SOP.md, backup DR evaluation snapshots | The documents are under control; command examples must not be treated as authorization |
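The credential rule in the table (metadata only, with no value, hash, partial token, or recovery code) can be expressed as a simple key guard. This is a minimal sketch: the key names in `FORBIDDEN_CREDENTIAL_KEYS` are hypothetical stand-ins for whatever field names the real records use.

```python
# Hypothetical field names standing in for the four forbidden kinds of
# secret material named in the table above.
FORBIDDEN_CREDENTIAL_KEYS = frozenset(
    {"value", "hash", "partial_token", "recovery_code"}
)


def forbidden_keys_in(record: dict) -> list[str]:
    """Return, in sorted order, any keys in a credential record that must not appear."""
    return sorted(key for key in record if key.lower() in FORBIDDEN_CREDENTIAL_KEYS)
```

For example, a record carrying only an evidence id and an owner passes, while one carrying a `value` field is flagged before it can be stored.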
4. Current non-conformances and gaps to close
| Priority | Gap | Risk | Next step |
|---|---|---|---|
| P0 | credential_escrow_markers not yet accepted | No manual evidence that restic / offsite / break-glass / DNS / OAuth credentials can be recovered | Create an owner response based on non-secret evidence ids; do not write the marker directly |
| P0 | restore drill approval package is still a template | Cannot show that DB, config, K8s, and observability restores can be drilled safely | Add an isolated environment, observer, source backup ref, stop condition, and rollback owner |
| P0 | offsite sync is capable of remote delete | latest-only / rclone sync may delete older remote packs | Add offsite owner, runway, remote delete owner, full sync window, and verifier evidence |
| P0 | retention / restic prune has no owner gate | Snapshots could be deleted by mistake, or the recoverable window shortened | Add retention owner, restore runway, prune window, and rollback conditions |
| P0 | Velero credential / install manifest still needs a live disposition | Cluster-admin, MinIO endpoint, and credential injection carry high risk | Add RBAC owner, secret manager source, rotation owner, and least privilege review |
| P1 | restore test ConfigMap and standalone script write timestamps inconsistently | Prometheus textfile may not read a 13-digit timestamp correctly | List under owner disposition first; any fix must go through the CronJob / ConfigMap owner gate |
| P1 | backup status runbook contains a stale live refresh note | The earlier live status may be out of date | Requires an owner-provided live refresh; this inventory does not SSH on its own initiative |
| P1 | backup health exporter can write the textfile | A false-green metric would mislead alerting | Add exporter owner, metric freshness SLO, failure notification, and guard |
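The timestamp gap in the table is easiest to see with concrete widths. The sketch below assumes that the 13-digit form is milliseconds since the Unix epoch and that the 10-digit form is seconds. The document does not say which script writes which form, so this helper is illustrative and is not a proposed fix.

```python
def epoch_to_seconds(raw: str) -> float:
    """Normalise a 10-digit (seconds) or 13-digit (milliseconds) epoch string to seconds.

    Assumption: 13 digits means milliseconds. Any other width is rejected
    rather than guessed, so a malformed value cannot pass silently.
    """
    value = int(raw.strip())
    digits = len(str(abs(value)))
    if digits == 13:
        return value / 1000.0
    if digits == 10:
        return float(value)
    raise ValueError(f"unexpected epoch width ({digits} digits): {raw!r}")
```

A consumer that computes backup age as `time() - metric` would see a value in the far future if it received milliseconds where it expected seconds, which is one way a freshness alert could stay silent.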
5. Fixed 0 / false boundaries
The following flags must remain false:
runtime_execution_authorized=false
host_write_authorized=false
backup_run_authorized=false
restore_run_authorized=false
restore_drill_authorized=false
offsite_sync_authorized=false
offsite_remote_delete_authorized=false
credential_escrow_marker_write_authorized=false
retention_change_authorized=false
restic_prune_authorized=false
rclone_config_authorized=false
velero_restore_authorized=false
velero_backup_authorized=false
kubectl_action_authorized=false
ssh_read_authorized=false
ssh_write_authorized=false
secret_value_collection_allowed=false
active_scan_authorized=false
action_buttons_allowed=false
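The flag list above can be enforced with a small guard. This is a minimal sketch, assuming the flags are available as a flat mapping of name to boolean. Where the real snapshot stores them is not stated here, and `violated_flags` is a hypothetical helper.

```python
# Flag names copied verbatim from the list above.
REQUIRED_FALSE_FLAGS = (
    "runtime_execution_authorized",
    "host_write_authorized",
    "backup_run_authorized",
    "restore_run_authorized",
    "restore_drill_authorized",
    "offsite_sync_authorized",
    "offsite_remote_delete_authorized",
    "credential_escrow_marker_write_authorized",
    "retention_change_authorized",
    "restic_prune_authorized",
    "rclone_config_authorized",
    "velero_restore_authorized",
    "velero_backup_authorized",
    "kubectl_action_authorized",
    "ssh_read_authorized",
    "ssh_write_authorized",
    "secret_value_collection_allowed",
    "active_scan_authorized",
    "action_buttons_allowed",
)


def violated_flags(flags: dict) -> list[str]:
    """Return every required flag that is missing or is not exactly False."""
    return [name for name in REQUIRED_FALSE_FLAGS if flags.get(name) is not False]
```

Treating a missing flag as a violation is a deliberate choice in this sketch: an absent key should fail closed rather than be read as permission.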
6. Priorities for the next phase
- P0: Assemble the backup / restore / escrow owner response packet, with fields for owner role / team, decision, decision reason, affected scope, redacted evidence refs, followup owner, rollback owner, maintenance window, and validation plan.
- P0: Create a credential escrow review package that allows only non-secret evidence ids and writes no marker.
- P0: For offsite sync, add remote delete owner, runway, full sync window, and verifier evidence; sync must not run before acceptance.
- P0: For the restore drill, add an isolated environment, observer, source backup refs, stop condition, and rollback owner; restore must not run before acceptance.
- P1: For the timestamp difference between the Velero CronJob and the ConfigMap script, create an owner disposition; do not apply directly.
- P1: The owner provides the latest redacted evidence for backup status / offsite / escrow / Velero metrics; this phase does not SSH to obtain it.
- P1: Sync the P1-3 metrics to /zh-TW/iwooos, and verify desktop / mobile overflow and the absence of action buttons.
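The owner response packet in the first item is essentially a record with required fields. One way to sketch its completeness check is shown below. The snake_case key names are hypothetical renderings of the fields listed above, and the document does not define the packet's real serialization.

```python
# Hypothetical key names derived from the field list in the first P0 item.
OWNER_RESPONSE_FIELDS = (
    "owner_role_or_team",
    "decision",
    "decision_reason",
    "affected_scope",
    "redacted_evidence_refs",
    "followup_owner",
    "rollback_owner",
    "maintenance_window",
    "validation_plan",
)


def missing_owner_fields(packet: dict) -> list[str]:
    """Return required fields that are absent or empty in an owner response packet."""
    return [field for field in OWNER_RESPONSE_FIELDS if not packet.get(field)]
```

A packet with any missing field would stay in the "not received" state, which keeps the owner response counters at 0 until a complete response arrives.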
7. Verification commands
python3 scripts/security/backup-restore-escrow-inventory.py \
--root . \
--output /tmp/backup-restore-escrow-inventory-check.json
To pin the timestamp of the committed snapshot:
python3 scripts/security/backup-restore-escrow-inventory.py \
--root . \
--generated-at 2026-06-11T22:20:00+08:00 \
--output docs/security/backup-restore-escrow-inventory.snapshot.json
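After running the first command, the fresh output can be compared with the committed snapshot. The sketch below assumes both files are JSON objects and that the generation time is stored under a top-level `generated_at` key. That key name is inferred from the `--generated-at` flag and is not confirmed by this document.

```python
import json
from pathlib import Path


def load_without_timestamp(path: Path, timestamp_key: str = "generated_at") -> dict:
    """Load a snapshot and drop the (assumed) timestamp key so runs are comparable."""
    data = json.loads(path.read_text(encoding="utf-8"))
    data.pop(timestamp_key, None)
    return data


def snapshots_match(fresh: Path, committed: Path) -> bool:
    """True when the two snapshots are identical apart from the timestamp."""
    return load_without_timestamp(fresh) == load_without_timestamp(committed)
```

For instance, `snapshots_match(Path("/tmp/backup-restore-escrow-inventory-check.json"), Path("docs/security/backup-restore-escrow-inventory.snapshot.json"))` would report whether the committed snapshot has drifted from the current repo state.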
8. Completion
| Work item | Completion | Notes |
|---|---|---|
| repo-only surface registration | 100% | 38 source surfaces registered |
| source existence / SHA256 | 100% | 38 / 38 sources exist |
| schema / snapshot | 100% | backup_restore_escrow_inventory_v1 created |
| high-value configuration maturity | 58% | Up from 52%; reflects the read-only framework only |
| owner response received / accepted | 0% | Nothing sent, received, or accepted yet |
| live evidence collection | 0% | No SSH, rclone, kubectl, or restore |
| restore / offsite / escrow / retention gate | 0% | All remain 0 / false |
| owner request draft | 100% | 38 request drafts, snapshot, document, and guard added; request sent / received / accepted remain 0 |
9. Boundaries
This inventory did not run backup-all.sh, any service backup, restic check, restic forget --prune, or rclone sync. It did not read the remote offsite, write the escrow marker, modify rclone / B2 settings, apply any Velero manifest, run a restore dry-run, write a Prometheus textfile, reload alert rules, SSH to any host, collect any secret value, or add any front-end action button.