docs(ops): add credential escrow evidence owner request [skip ci]
@@ -33943,3 +33943,26 @@ production browser smoke:
- Service / cold-start: `GREEN`.
- API / Web workload balancing: `LIVE_VERIFIED`.
- DR scorecard: still cannot be declared complete; `5` credential escrow evidence markers are still missing.

## 2026-06-13 - Credential escrow owner evidence request package

**Live read-only evidence, 13:10 Asia/Taipei**:

- `/backup/scripts/mark-credential-escrow-verified.sh --status`: still missing `restic_repository_password`, `offsite_provider_credentials`, `break_glass_admin_credentials`, `dns_registrar_recovery`, `oauth_ai_provider_recovery`.
- `/backup/scripts/offsite-escrow-evidence-report.sh --no-color`: `SCRIPT_MISSING_COUNT=0`, `OFFSITE_CONFIGURED=1`, `RCLONE_CONFIGURED=1`, `READINESS_REQUIRE_CONFIGURED_BLOCKED=0`, `ESCROW_MISSING_COUNT=5`, `SUMMARY PASS=8 WARN=5 BLOCKED=0`.

**Docs / snapshot**:

- Added `docs/security/CREDENTIAL-ESCROW-EVIDENCE-OWNER-REQUEST.md`.
- Added `docs/security/credential-escrow-evidence-owner-request.snapshot.json`.
- Updated `docs/runbooks/BACKUP-STATUS.md` and the reboot workplan, narrowing P1 credential escrow from "are the scripts/config usable" to "waiting for the owner to provide real non-sensitive evidence-ids".

**Current progress**:

- Credential escrow owner request package: `80%`.
- Owner external verification: `0%`.
- Dry-run validation: `0%`.
- Marker write: `0%`.
- DR closeout verification: `0%`.

**Boundaries**:

- This round did not read, collect, paste, or store any secret value, hash, prefix/suffix, or partial token.
- This round did not write any live marker; `BackupCredentialEscrowEvidenceMissing` must keep firing until all five markers are filled with real non-sensitive evidence-ids.
- Service / cold-start stays `GREEN`; the DR scorecard is still `BLOCKED`.

@@ -7,6 +7,7 @@
> 2026-06-12 Codex post-120 recovery refresh: 120 restored, backup aggregate / offsite / full cold-start green; DR still blocked only by credential escrow evidence.
> 2026-06-13 Codex live refresh: backup core remains green; DR still blocked only by credential escrow evidence.
> 2026-06-13 Codex post-CD refresh: backup/offsite/alert contracts remain green after deploy marker `e4a349bc`; global SSH trust guardrail held; DR still blocked only by credential escrow evidence.
> 2026-06-13 Codex escrow refresh: 13:10 live report confirms offsite/rclone/script readiness is green and only five non-secret credential escrow evidence markers remain missing.

---

@@ -50,6 +51,13 @@ Current policy: normal success should not create immediate Telegram noise. Failu

## Credential Escrow Evidence Checklist

2026-06-13 13:10 live refresh:

- `/backup/scripts/mark-credential-escrow-verified.sh --status`: still missing `restic_repository_password`, `offsite_provider_credentials`, `break_glass_admin_credentials`, `dns_registrar_recovery`, `oauth_ai_provider_recovery`.
- `/backup/scripts/offsite-escrow-evidence-report.sh --no-color`: `SCRIPT_MISSING_COUNT=0`, `OFFSITE_CONFIGURED=1`, `RCLONE_CONFIGURED=1`, `READINESS_REQUIRE_CONFIGURED_BLOCKED=0`, `ESCROW_MISSING_COUNT=5`, `SUMMARY PASS=8 WARN=5 BLOCKED=0`.
- Owner request package: [CREDENTIAL-ESCROW-EVIDENCE-OWNER-REQUEST.md](../security/CREDENTIAL-ESCROW-EVIDENCE-OWNER-REQUEST.md).
- Verdict: backup core and offsite readiness are green; DR closeout stays blocked until all five markers are written with real non-sensitive evidence-ids.

A credential escrow marker only proves that "the recovery material has been manually verified and is retrievable"; it must not contain any secret.

| Item | Acceptable evidence-id | Forbidden |

140
docs/security/CREDENTIAL-ESCROW-EVIDENCE-OWNER-REQUEST.md
Normal file
@@ -0,0 +1,140 @@
# Credential Escrow Evidence Owner Request

> Status time: 2026-06-13 13:10 Asia/Taipei
> Scope: filling in the DR credential escrow evidence markers
> Principle: collect only non-sensitive evidence-ids; never collect, post, commit, or store any password, token, key, recovery code, secret value, hash, prefix/suffix, or partial token.

---

## 1. Current verdict

| Item | Status | Completion | Notes |
|------|------|-------:|------|
| Backup core | GREEN | 100% | 110 / 188 freshness, Google Drive rclone, the latest-only verifier, and full cold-start are all currently green. |
| Escrow script / offsite readiness | READY | 100% | `SCRIPT_MISSING_COUNT=0`, `OFFSITE_CONFIGURED=1`, `RCLONE_CONFIGURED=1`, `READINESS_REQUIRE_CONFIGURED_BLOCKED=0`. |
| Credential escrow owner request package | READY_TO_DISPATCH | 80% | This document defines the non-sensitive evidence-ids the owner must provide, the forbidden content, and the dry-run and acceptance flow. |
| Credential escrow marker | BLOCKED_WAITING_OWNER_EVIDENCE | 0% | The five markers are still unwritten; they must not be filled with placeholders or secrets. |
| DR scorecard closeout | BLOCKED | 90% | Service availability is green; DR completion still waits on `ESCROW_MISSING_COUNT=0`. |

This is not a service outage. It is a DR recovery governance gate: it must be proven that the designated owner can retrieve the critical recovery credentials during a disaster, but the proof itself must not leak the credentials.

---

## 2. Live Evidence

Read-only check on 110 at 2026-06-13 13:10:

```text
missing: restic_repository_password
missing: offsite_provider_credentials
missing: break_glass_admin_credentials
missing: dns_registrar_recovery
missing: oauth_ai_provider_recovery

SCRIPT_MISSING_COUNT=0
OFFSITE_CONFIGURED=1
RCLONE_CONFIGURED=1
B2_CONFIGURED=0
READINESS_REQUIRE_CONFIGURED_BLOCKED=0
ESCROW_MISSING_COUNT=5
PARTIAL_MARKER_PRESENT=1
FULL_MARKER_PRESENT=1
SUMMARY PASS=8 WARN=5 BLOCKED=0
```

`B2_CONFIGURED=0` means the legacy B2 target is not configured; the Google Drive / rclone provider is configured, so this is not a blocker for this round.

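A minimal sketch of reading a single field out of saved report output, assuming the report keeps printing flat `KEY=VALUE` lines as in the evidence block above. `report_value` is a hypothetical helper, not part of the escrow scripts, and the sample text is copied from the 13:10 reading.

```bash
#!/bin/sh
# Hypothetical helper (not part of the escrow scripts).
# $1 = key name; the report text is read from stdin.
# Prints the value of the first line whose key matches exactly.
report_value() {
    awk -F= -v key="$1" '$1 == key { print $2; exit }'
}

# Sample lines copied from the 13:10 evidence.
sample='SCRIPT_MISSING_COUNT=0
OFFSITE_CONFIGURED=1
RCLONE_CONFIGURED=1
B2_CONFIGURED=0
READINESS_REQUIRE_CONFIGURED_BLOCKED=0
ESCROW_MISSING_COUNT=5'

missing=$(printf '%s\n' "$sample" | report_value ESCROW_MISSING_COUNT)
case "$missing" in
    0)  echo "escrow evidence complete" ;;
    '') echo "ESCROW_MISSING_COUNT not found in report" ;;
    *)  echo "escrow evidence still missing: $missing" ;;
esac
```

With the sample above this prints `escrow evidence still missing: 5`. The `SUMMARY PASS=8 WARN=5 BLOCKED=0` line is deliberately not handled, since it packs several fields into one line.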

---

## 3. Owner Request Matrix

| Item | Recovery capability the owner must confirm | Acceptable non-sensitive evidence-id | Forbidden content |
|------|--------------------------|-----------------------------|----------|
| `restic_repository_password` | Can retrieve the restic repository password during a disaster. | Password manager item ID, sealed envelope ID, recovery checklist ID. | Restic password, recovery code, secret URL, passwords visible in screenshots. |
| `offsite_provider_credentials` | Can retrieve the Google Drive / rclone or offsite provider credentials during a disaster. | Vault item ID, provider credential record ID, offsite access checklist ID. | OAuth token, refresh token, application key, client secret, cookie. |
| `break_glass_admin_credentials` | Can obtain the break-glass admin login or an alternative recovery path during a disaster. | Break-glass credential record ID, sealed envelope ID, emergency access checklist ID. | Admin password, SSH private key, OTP seed, recovery code. |
| `dns_registrar_recovery` | Can restore DNS registrar / domain control during a disaster. | Registrar recovery checklist ID, vault item ID, domain recovery record ID. | Registrar password, recovery code, unredacted registrar session. |
| `oauth_ai_provider_recovery` | Can restore administrative control of the AI provider / OAuth provider during a disaster. | Provider recovery checklist ID, vault item ID, provider account recovery record ID. | API key, token, client secret, OAuth refresh token. |

An evidence-id must be a verifiable record identifier in an external system, for example a password manager item ID, a sealed envelope number, or an internal recovery checklist number. It must not be the credential itself, and it must not be sufficient to derive the credential.

---

## 4. Data that must not be submitted

The following must not appear in the repo, chat, issues, PRs, the LOGBOOK, snapshots, pasted terminal output, or marker notes:

- Passwords, tokens, API keys, private keys, SSH keys, cookies, sessions.
- OAuth client secrets, refresh tokens, authorization headers.
- OTP seeds, recovery codes, backup codes.
- PostgreSQL / Redis / Sentry / provider connection URLs that embed credentials.
- Secret hashes, prefixes, suffixes, partial tokens, reversibly masked values.
- Unredacted screenshots, unredacted password manager screens.
- Placeholders such as `EVIDENCE_ID_FOR_*`, `VAULT-ITEM-ID`, `TODO`, `TBD`.

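The placeholder rule above can be pre-checked locally before a value ever reaches the marker CLI. The sketch below is hypothetical: `looks_like_placeholder` is not part of `mark-credential-escrow-verified.sh`, it covers only the four placeholder patterns named in this section plus the empty string, and passing it says nothing about whether a value is a secret. The `--dry-run` step remains the actual gate.

```bash
#!/bin/sh
# Hypothetical pre-check: returns 0 (true) when the candidate evidence-id is
# empty or matches one of the placeholder patterns listed in this section.
looks_like_placeholder() {
    case "$1" in
        ''|TODO|TBD|VAULT-ITEM-ID|EVIDENCE_ID_FOR_*) return 0 ;;
        *) return 1 ;;
    esac
}

# Demonstration on the documented placeholder patterns; the last value is one
# made-up expansion of the EVIDENCE_ID_FOR_* glob.
for candidate in TODO TBD VAULT-ITEM-ID EVIDENCE_ID_FOR_restic; do
    if looks_like_placeholder "$candidate"; then
        echo "rejected placeholder: $candidate"
    fi
done
```

A value that passes this check is only "not a known placeholder"; it still has to be a real external record identifier confirmed by the owner.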

---

## 5. Safe Execution Flow

Run the commands below only after the owner has confirmed, outside the repo / chat, that the recovery material exists. `<NON_SECRET_EVIDENCE_ID>` must be replaced with a real but non-sensitive external record identifier.

```bash
# 1. Read the current marker status; this step exposes no secrets.
/backup/scripts/mark-credential-escrow-verified.sh --status

# 2. Dry-run first to validate the evidence-id format and the item name; this step writes no marker.
/backup/scripts/mark-credential-escrow-verified.sh \
  --item "<item>" \
  --evidence-id "<NON_SECRET_EVIDENCE_ID>" \
  --dry-run

# 3. Write the marker only after the dry-run passes and the owner explicitly approves.
/backup/scripts/mark-credential-escrow-verified.sh \
  --item "<item>" \
  --evidence-id "<NON_SECRET_EVIDENCE_ID>" \
  --note "<SHORT_NON_SECRET_NOTE>"

# 4. After writing, regenerate the escrow / backup / cold-start evidence.
/backup/scripts/offsite-escrow-evidence-report.sh --no-color
/backup/scripts/backup-status.sh --no-notify --no-refresh
/home/wooo/scripts/full-stack-cold-start-check.sh --monitor-read-only --no-color --watch --interval 1 --max-attempts 1
```

If the dry-run rejects a placeholder, a too-short value, a suspected secret, or an invalid item, stop and go back to the owner for a new non-sensitive evidence-id.

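The stop-on-rejection rule can be sketched as a small wrapper around the dry-run step. It assumes, without confirmation from this document, that `mark-credential-escrow-verified.sh` exits non-zero when `--dry-run` rejects an input. `dry_run_pairs` and its stdin format are hypothetical; only the `--item`, `--evidence-id`, and `--dry-run` flags come from the flow above.

```bash
#!/bin/sh
# Hypothetical wrapper: reads "<item> <evidence-id>" pairs from stdin and
# dry-runs each one, stopping at the first rejection. Nothing is written.
# $1 optionally overrides the command, which helps when rehearsing with a stub.
dry_run_pairs() {
    mark_cmd=${1:-/backup/scripts/mark-credential-escrow-verified.sh}
    while read -r item evidence_id; do
        [ -n "$item" ] || continue   # skip blank lines
        # Assumption: a non-zero exit status means the dry-run rejected the input.
        if "$mark_cmd" --item "$item" --evidence-id "$evidence_id" --dry-run </dev/null; then
            echo "dry-run ok: $item"
        else
            echo "dry-run rejected: $item; stop and return to the owner" >&2
            return 1
        fi
    done
    return 0
}
```

The pairs are meant to be typed or piped in at run time, one per line, for example `dns_registrar_recovery` followed by the owner-supplied identifier. The marker write in step 3 stays a separate, manually approved action.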

---

## 6. Acceptance criteria

| Gate | Must show |
|------|----------|
| Escrow marker | None of the five items is reported as missing. |
| Escrow report | `ESCROW_MISSING_COUNT=0`. |
| Prometheus textfile | `awoooi_backup_dr_credential_escrow_missing_count 0`. |
| Backup status | `escrow_missing=0`, with `core_blockers=0` unchanged. |
| Alertmanager | `BackupCredentialEscrowEvidenceMissing` is no longer firing. |
| Cold-start | `WARN=0 BLOCKED=0`, still green. |

Only when all of the above hold may the DR scorecard be changed from `BLOCKED` to `COMPLETE`.

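The all-gates-must-hold rule can be expressed as a single check. This is a sketch only: `dr_closeout_ready` is a hypothetical helper, and it assumes the operator has already collected each gate as a number, with the Alertmanager gate encoded as `1` while `BackupCredentialEscrowEvidenceMissing` is firing and `0` otherwise. The first gate (no item reported missing) is taken to be covered by `ESCROW_MISSING_COUNT`.

```bash
#!/bin/sh
# Hypothetical helper. Arguments, in order:
#   1 ESCROW_MISSING_COUNT from the escrow report
#   2 awoooi_backup_dr_credential_escrow_missing_count from the Prometheus textfile
#   3 escrow_missing from backup status
#   4 core_blockers from backup status
#   5 alert firing flag (1 = firing, 0 = not firing)
#   6 cold-start WARN
#   7 cold-start BLOCKED
# Returns 0 only when all seven values are present and equal to 0.
dr_closeout_ready() {
    [ "$#" -eq 7 ] || return 1
    for value in "$@"; do
        [ "$value" = "0" ] || return 1
    done
    return 0
}

# Illustrative values consistent with the 13:10 state: five markers missing and
# the alert firing (the textfile value is assumed to mirror the report).
if dr_closeout_ready 5 5 5 0 1 0 0; then
    echo "DR scorecard may move to COMPLETE"
else
    echo "DR scorecard stays BLOCKED"
fi
```

With those values it prints `DR scorecard stays BLOCKED`; a single non-zero gate is enough to keep it there.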

---

## 7. Progress by lane

| Lane | Current completion | Next step |
|------|-----------:|--------|
| Owner request package | 80% | Assign the owner role / team and hand over this document with the list of five items. |
| Owner external verification | 0% | The owner completes the external check in the password manager, sealed envelope, and registrar/provider accounts. |
| Dry-run validation | 0% | All five items pass `--dry-run` with non-sensitive evidence-ids. |
| Marker write | 0% | All five markers are written successfully. |
| DR closeout verification | 0% | The escrow report, backup status, Alertmanager, and cold-start are all rerun and green. |

---

## 8. What cannot be claimed yet

- Do not claim the DR scorecard is complete.
- Do not claim credential escrow has been filled in.
- Do not treat backup / offsite / cold-start green as credential escrow green.
- Do not fill markers with placeholders, test IDs, or secret values.
- Do not silence `BackupCredentialEscrowEvidenceMissing`; it is currently a correct red light.
@@ -0,0 +1,115 @@
{
  "schema_version": 1,
  "generated_at": "2026-06-13T13:10:53+08:00",
  "timezone": "Asia/Taipei",
  "scope": "credential_escrow_evidence_owner_request",
  "source_evidence": {
    "host": "192.168.0.110",
    "commands": [
      "/backup/scripts/mark-credential-escrow-verified.sh --status",
      "/backup/scripts/offsite-escrow-evidence-report.sh --no-color"
    ],
    "script_missing_count": 0,
    "offsite_configured": 1,
    "rclone_configured": 1,
    "b2_configured": 0,
    "readiness_require_configured_blocked": 0,
    "partial_marker_present": 1,
    "full_marker_present": 1,
    "escrow_missing_count": 5,
    "summary": {
      "pass": 8,
      "warn": 5,
      "blocked": 0
    }
  },
  "missing_items": [
    {
      "item": "restic_repository_password",
      "allowed_evidence_id_types": [
        "password_manager_item_id",
        "sealed_envelope_id",
        "recovery_checklist_id"
      ],
      "status": "missing"
    },
    {
      "item": "offsite_provider_credentials",
      "allowed_evidence_id_types": [
        "vault_item_id",
        "provider_credential_record_id",
        "offsite_access_checklist_id"
      ],
      "status": "missing"
    },
    {
      "item": "break_glass_admin_credentials",
      "allowed_evidence_id_types": [
        "break_glass_credential_record_id",
        "sealed_envelope_id",
        "emergency_access_checklist_id"
      ],
      "status": "missing"
    },
    {
      "item": "dns_registrar_recovery",
      "allowed_evidence_id_types": [
        "registrar_recovery_checklist_id",
        "vault_item_id",
        "domain_recovery_record_id"
      ],
      "status": "missing"
    },
    {
      "item": "oauth_ai_provider_recovery",
      "allowed_evidence_id_types": [
        "provider_recovery_checklist_id",
        "vault_item_id",
        "provider_account_recovery_record_id"
      ],
      "status": "missing"
    }
  ],
  "forbidden_values": [
    "password",
    "token",
    "api_key",
    "private_key",
    "ssh_key",
    "cookie",
    "session",
    "authorization_header",
    "oauth_client_secret",
    "refresh_token",
    "otp_seed",
    "recovery_code",
    "backup_code",
    "database_url_with_credentials",
    "secret_hash",
    "secret_prefix",
    "secret_suffix",
    "partial_token",
    "unredacted_screenshot",
    "placeholder"
  ],
  "progress": {
    "owner_request_package_percent": 80,
    "owner_external_verification_percent": 0,
    "dry_run_validation_percent": 0,
    "marker_write_percent": 0,
    "dr_closeout_verification_percent": 0
  },
  "gates": {
    "runtime_execution_authorized": false,
    "secret_value_collection_authorized": false,
    "marker_write_completed": false,
    "dr_scorecard_complete": false
  },
  "done_criteria": [
    "ESCROW_MISSING_COUNT=0",
    "awoooi_backup_dr_credential_escrow_missing_count=0",
    "backup-status escrow_missing=0",
    "BackupCredentialEscrowEvidenceMissing not firing",
    "cold-start WARN=0 BLOCKED=0"
  ]
}
@@ -11,9 +11,9 @@
| Area | Status | Completion | Evidence |
|------|--------|------------|----------|
| Overall recovery readiness | SERVICE_GREEN_WORKLOAD_BALANCED_DR_ESCROW_BLOCKED | 95% | 2026-06-13 01:26 final cold-start scorecard is `PASS=83 WARN=0 BLOCKED=0`; 120/121 K3s are both `Ready control-plane`, backup core blockers remain `0`, public routes/TLS/momo DB/schedules/Alertmanager are green, API/Web remain spread across 120 / 121 after deploy marker `e4a349bc`, and CD no longer clobbers global `known_hosts`. Remaining blocker is DR-only credential escrow evidence (`escrow_missing=5`); ArgoCD `km-vectorize` is tracked separately as governance health debt until its official scheduled Job refreshes `lastSuccessfulTime`. |
| Overall recovery readiness | SERVICE_GREEN_WORKLOAD_BALANCED_DR_ESCROW_BLOCKED | 95% | 2026-06-13 12:59 final cold-start scorecard is `PASS=83 WARN=0 BLOCKED=0`; 120/121 K3s are both `Ready control-plane`, backup core blockers remain `0`, public routes/TLS/momo DB/schedules/Alertmanager are green, API/Web are live-verified split across 120 / 121 after topology strategy hardening, and CD no longer clobbers global `known_hosts`. 13:10 escrow report shows offsite/rclone/script readiness green, but DR remains blocked by five missing credential escrow evidence markers; ArgoCD `km-vectorize` is tracked separately as governance health debt until its official scheduled Job refreshes `lastSuccessfulTime`. |
| P0 host / K3s recovery | DONE | 100% | 120 booted after console fsck at `2026-06-12 15:13`; host is reachable, root is mounted `rw`, failed units `0`, `mon` and `mon1` are both `Ready control-plane`, and cold-start P0/P1 checks are green. |
| P1 backup / alert / escrow | BLOCKED_DR_ESCROW | 90% | 2026-06-13 01:26 `backup-status` shows 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, `escrow_missing=5`; 01:28 offsite textfile has `remote_verify_ok=1`, `full_verify_fresh=1`, and all 13 repos `snapshot_count=1`; 01:27 Alertmanager exposes the five expected escrow gap alerts and Prometheus has all five required alert rule names healthy. |
| P1 backup / alert / escrow | BLOCKED_DR_ESCROW | 92% | 2026-06-13 12:43 `backup-status` shows 110 `13/13 fresh failed=0`, 188 `2/2 fresh failed=0`, `core_blockers=0`, `escrow_missing=5`; 13:10 escrow report shows `SCRIPT_MISSING_COUNT=0`, `OFFSITE_CONFIGURED=1`, `RCLONE_CONFIGURED=1`, `ESCROW_MISSING_COUNT=5`, `PASS=8 WARN=5 BLOCKED=0`. Owner request package is now ready; actual marker write remains blocked on real non-secret evidence IDs. |
| P2 service / data truth | VERIFIED_WORKLOAD_BALANCED | 100% | 2026-06-13 01:26 cold-start is green; public routes/TLS are green, VIP API/Web are reachable, momo current-month parity is `4571/4571` with matching date bounds, schedules/services are green. API/Web both keep 120 / 121 split placement after latest deploy marker `e4a349bc`. |
| P3 docs / automation contracts | DONE_WITH_VALIDATION_GAP | 100% | Workplan, SOP v1.8, BACKUP-STATUS, LOGBOOK, 120 console/fsck recovery, Gitea backup stale-dump hardening, reboot ledger/version-comparison SOP, escrow evidence audit, 188 nginx Ansible baseline, 110 cold-start detector script, startup judgment layers, GO/NO-GO tree, host recovery cards, T+0/T+60 timeline checks, host role / load-balancing assessment, CD `known_hosts` guardrail, and `km-vectorize` remediation tracking are updated; Ansible syntax check is unavailable on this workstation. |
@@ -121,7 +121,7 @@ Next: <single next action>
| P1-002 | VERIFIED | 100 | Confirm success-noise policy | Daily status is once at 06:05; normal backup success is not a Telegram spam path. | Keep failure-only escalation in backup docs. | Docs say failures escalate; daily status is summary only. |
| P1-003 | VERIFIED | 100 | Confirm Google Drive latest-only | 2026-06-12 18:55 verifier shows 13 repos with exactly one remote snapshot each after the post-120 aggregate backup and full offsite sync. | Record evidence in backup status. | `REMOTE_LATEST_ONLY_OK=1`, `VERIFY_OK=1`. |
| P1-004 | VERIFIED | 100 | Confirm required alerts exist | Live Prometheus rules include all five required backup/cold-start alerts. | Keep in scorecard. | All five alert names FOUND live. |
| P1-005 | BLOCKED | 5 | Fill credential escrow evidence markers | Five markers are missing. This is a DR scorecard blocker, not a service outage. Scripts/config are present and the marker CLI supports `--dry-run`; secrets must not enter repo or chat. | Human verifies vault/offline escrow, validates each non-secret evidence ID with `--dry-run`, then writes markers using `/backup/scripts/mark-credential-escrow-verified.sh`. | `awoooi_backup_dr_credential_escrow_missing_count=0`. |
| P1-005 | BLOCKED_WAITING_OWNER_EVIDENCE | 20 | Fill credential escrow evidence markers | Five markers are missing. This is a DR scorecard blocker, not a service outage. 2026-06-13 13:10 proves scripts/offsite/rclone readiness is green; the remaining blocker is owner-provided real non-secret evidence IDs. Owner request package exists at `docs/security/CREDENTIAL-ESCROW-EVIDENCE-OWNER-REQUEST.md`; secrets must not enter repo or chat. | Human verifies vault/offline escrow, validates each non-secret evidence ID with `--dry-run`, then writes markers using `/backup/scripts/mark-credential-escrow-verified.sh`. | `awoooi_backup_dr_credential_escrow_missing_count=0`. |
| P1-006 | DONE | 100 | Fix backup health failed component | 2026-06-12 18:55 backup-status shows `failed=0`, `core_blockers=0`, `config_failed=0`; 120 config capture is no longer red. | Keep normal daily backup cadence. | `failed_count=0`, `config_failed=0`. |
| P1-007 | DONE | 100 | Refresh stale backup jobs | 2026-06-04 cleared `stale188=momo_pg_daily`; 2026-06-05 cleared recurring `stale110=awoooi_db`; 2026-06-06 confirms no stale jobs after the next aggregate window. | Keep normal cron cadence; only 120-driven Configs remains red. | `stale110=none`, `stale188=none`, 110 `13/13 fresh`, 188 `2/2 fresh`. |
| P1-008 | DONE | 100 | Align 188 momo backup cron/exporter contract | 188 backup exporter expected `/home/ollama/bin/momo-pg-backup.sh`; crontab still pointed to the old app-side script. Crontab was backed up and updated to the host-owned controller script. | Keep backup controller path in future deploy docs. | `configured_missing_188=0`, `awoooi_backup_job_configured{host="188",job="momo_pg_daily"} 1`. |
@@ -129,6 +129,7 @@ Next: <single next action>
| P1-010 | DONE | 100 | Offsite sync manual backup repairs | 2026-06-12 17:37 full offsite sync completed `13/13` after controlled P0 runway override to 240m; 18:55 verifier confirmed 13 remote repos each have one snapshot. | Allow normal 03:00 full sync cadence unless another manual backup creates new snapshots. | `REMOTE_LATEST_ONLY_OK=1`, `VERIFY_OK=1`, full sync `13/13`. |
| P1-011 | DONE | 100 | Confirm 2026-06-12 backup convergence | 18:55 live check confirms the post-120 aggregate held: no stale jobs, no configured/missing script jobs, no failed components, offsite fresh, and only credential escrow remains as DR warning. | Keep escrow as explicit red gate. | `stale110=none`, `stale188=none`, `failed=0`, `config_failed=0`, `core_blockers=0`. |
| P1-012 | DONE | 100 | Audit credential escrow marker write safety | 2026-06-12 15:02 `mark-credential-escrow-verified.sh --status` reports all five allowed items missing; `offsite-escrow-evidence-report.sh --no-color` reports rclone/offsite configured and `ESCROW_MISSING_COUNT=5`; repo search found only runbooks/placeholders/rules, not real evidence IDs. | Write markers only after a real non-secret evidence ID exists for each item; never write placeholder or secret. | The marker blocker is narrowed to missing external evidence IDs, not missing script/config/offsite readiness. |
| P1-014 | DONE | 100 | Publish credential escrow owner request package | 2026-06-13 13:10 live report confirms `SCRIPT_MISSING_COUNT=0`, `OFFSITE_CONFIGURED=1`, `RCLONE_CONFIGURED=1`, `ESCROW_MISSING_COUNT=5`, `PASS=8 WARN=5 BLOCKED=0`. New owner request package defines allowed evidence-id types, forbidden secret values, safe dry-run flow, write flow, and closeout gates. | Dispatch to the credential owners without collecting secret values; keep marker write gated until owner gives real non-secret evidence IDs. | `docs/security/CREDENTIAL-ESCROW-EVIDENCE-OWNER-REQUEST.md` and snapshot exist and validate. |
| P1-013 | IN_PROGRESS | 90 | Remediate `km-vectorize` CronJob health debt | ArgoCD Degraded is isolated to `CronJob/km-vectorize`: `lastSuccessfulTime` is stale even though retained 6/2-6/4 Jobs completed, and the manifest schedule was semantically wrong (`0 19` with `timeZone: Asia/Taipei` ran at 19:00 Taipei time, not 03:00). Manual Job evidence is invalid because the controller deleted `km-vectorize-codex-002709` as `UnexpectedJob`. Gitea main `47ee96b0` is synced live and the CronJob spec is corrected. | Verify the next 03:00 official CronJob updates `lastSuccessfulTime` and ArgoCD returns `Healthy`. | `lastSuccessfulTime` is after the manifest sync, the official scheduled Job is `Complete`, and ArgoCD `awoooi-prod` health is `Healthy`. |
---