awoooi/docs/security/HOST-SERVICE-POST-INCIDENT-READBACK-PLAN.md
Your Name abda0ef617
feat(iwooos): add host service post-incident readback gate
2026-06-15 20:01:14 +08:00


IwoooS Docker / systemd / Host Service Post-Incident Readback Plan

| Item | Value |
|---|---|
| Date | 2026-06-15 |
| Status | post_incident_readback_plan_ready_no_runtime_action |
| Tool | scripts/security/host-service-post-incident-readback-plan.py |
| Snapshot | docs/security/host-service-post-incident-readback-plan.snapshot.json |
| Source acceptance | docs/security/host-service-change-evidence-acceptance.snapshot.json |
| Runtime gate | 0 |

1. Purpose

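This document builds on the Docker / systemd / host service change-evidence acceptance ledger and adds a post-incident readback plan to it. Any of the following incidents may recur:

  • a host reboot
  • a stuck Docker daemon
  • a compose stack anomaly
  • a failed systemd unit
  • repair-bot / runner contention
  • a port binding / gateway mismatch
  • a public route or AI provider health anomaly

If one does, IwoooS must first collect redacted evidence that answers seven questions. Who acted? When? What were the before and after states? Which services were affected? How were they recovered? Were related products kept in sync? How will recurrence be prevented?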

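This plan is not any of the following:

  • SSH authorization
  • a live host read
  • a docker / systemctl operation
  • a repair-bot execution
  • an Ansible apply
  • a route smoke test
  • a host restart
  • a runtime gate

It only defines the post-incident readback fields, reviewer checks, outcome lanes, and rejection conditions. The goal is to stop any of the following signals from being mistaken for "the incident's cause, accountability, impact, recovery, and recurrence prevention have all been accepted":

  • "the Docker API responds"
  • "the container is up"
  • "the route returns 200"
  • "the dashboard is visible"
  • "the service has temporarily recovered"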
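The no-false-green rule can be read as a set operation: remove every "looks healthy" signal from the offered evidence, and if nothing remains, the package is false-green. Below is a minimal sketch of that idea. The signal labels in `HEALTH_SIGNALS` and both function names are hypothetical and are not taken from the IwoooS tooling.

```python
# Hypothetical labels for "looks healthy" signals. Per the plan, none of these
# counts as incident acceptance on its own.
HEALTH_SIGNALS = frozenset({
    "docker_api_responds",
    "container_up",
    "route_200",
    "dashboard_visible",
    "service_temporarily_recovered",
})


def substantive_evidence(offered: set[str]) -> set[str]:
    """Return the offered evidence keys that are not mere health signals."""
    return set(offered) - HEALTH_SIGNALS


def is_false_green(offered: set[str]) -> bool:
    """True when evidence was offered but all of it is health signals."""
    return bool(offered) and not substantive_evidence(offered)


print(is_false_green({"route_200", "container_up"}))           # True
print(is_false_green({"route_200", "actor_attribution_ref"}))  # False
```

An empty package is deliberately not flagged here. It belongs to the "waiting" lane rather than the false-green one.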

2. Summary

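| Metric | Current value | Notes |
|---|---|---|
| readback candidate | 9 | Covers the compose, systemd, repair-bot, Ansible service role, and host config backup capture surfaces |
| write-capable readback candidate | 3 | Ansible docker compose service role, 110 repair-bot whitelist, 188 repair-bot whitelist |
| live evidence required | 8 | Every candidate except local development compose needs the owner to supply a redacted live evidence ref |
| recovery / health impact review required | 9 | Every candidate must account for service, route, AI provider, monitoring, and product impact |
| cross-project sync required | 9 | Every candidate must supply a cross-product / owner / session sync ref |
| no-false-green required | 9 | No candidate may substitute "the service turned green" for incident acceptance |
| readback field | 36 | Total number of readback fields |
| required readback field | 28 | Fields the owner / reviewer must fill in |
| reviewer check | 28 | Checks for actor, boot / recovery, before / after, daemon, compose, systemd, failed unit, port binding, impact, sync, recurrence prevention, and no-false-green |
| outcome lane | 10 | Waiting, actor supplement, before-after supplement, service state supplement, impact supplement, quarantine, reject, review, recurrence guard backfill, runtime gate |
| blocked action | 41 | SSH, Docker, systemctl, repair-bot, Ansible, route smoke, reload, restart, secret, raw log / config, active scan, production write, and more |
| post-incident readback received / accepted | 0 / 0 | Nothing received or accepted yet |
| no-false-green accepted | 0 | Route 200, container up, Docker API response, dashboard up, or service healthy is never treated as incident acceptance |
| runtime gate / action button | 0 / 0 | No operation entry point is provided |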
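The "received / accepted" and "runtime gate / action button" rows imply a few invariants that later snapshots must keep. The sketch below assumes those invariants are: accepted never exceeds received, no counter is negative, and runtime gate and action button stay at 0. It is in the spirit of the `counts_transition_safe` and `runtime_stays_zero` reviewer checks. The `ReadbackCounts` type and `count_violations` function are hypothetical and do not reflect the snapshot's real schema.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ReadbackCounts:
    """Hypothetical subset of the summary counters in the table above."""
    readback_received: int = 0
    readback_accepted: int = 0
    runtime_gate: int = 0
    action_button: int = 0


def count_violations(counts: ReadbackCounts) -> list[str]:
    """List the reasons a counts snapshot would be unsafe under this plan."""
    problems = []
    for name, value in asdict(counts).items():
        if value < 0:
            problems.append(f"{name} is negative")
    if counts.readback_accepted > counts.readback_received:
        problems.append("accepted exceeds received")
    # The plan offers no runtime authorization and no operation entry point.
    if counts.runtime_gate != 0:
        problems.append("runtime gate must stay 0")
    if counts.action_button != 0:
        problems.append("action button must stay 0")
    return problems


print(count_violations(ReadbackCounts()))  # []
```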

3. Readback Candidate Scope

| Candidate | Acceptance focus |
|---|---|
| host_service_post_incident_readback:local_dev_compose | Boundary between local compose and production, so development state is not projected onto production services |
| host_service_post_incident_readback:monitoring_110_compose | 110 monitoring compose, Prometheus / Alertmanager / exporter, route recovery, and alert false-green |
| host_service_post_incident_readback:monitoring_exporters_188_compose | 188 exporter compose, database / Redis metrics, monitoring impact, and post-check |
| host_service_post_incident_readback:sentry_110_reference_compose | Sentry self-hosted compose, core container state, and public route and admin route recovery |
| host_service_post_incident_readback:langfuse_110_compose | Langfuse compose, trace / prompt privacy boundary, route, and DB dependencies |
| host_service_post_incident_readback:ansible_docker_compose_service_role | Ansible service role, compose action permissions, maintenance window, and rollback owner |
| host_service_post_incident_readback:repair_bot_110_whitelist | 110 repair-bot whitelist and its impact on Harbor / registry / Gitea / Langfuse / Alertmanager / SigNoz |
| host_service_post_incident_readback:repair_bot_188_whitelist | 188 repair-bot whitelist and its impact on OpenClaw / MinIO / SigNoz / Redis / Nginx / Ollama |
| host_service_post_incident_readback:config_backup_host_capture | Host config backup capture, systemd / Docker / Nginx / cron / K8s sources, and the no-raw-config boundary |
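The first three summary counts (9 candidates, 3 write-capable, 8 needing live evidence) follow directly from per-candidate flags. The sketch below shows that derivation. It assumes the flags are as stated in the summary table: write-capable means the Ansible role plus the two repair-bot whitelists, and live evidence is needed everywhere except local development compose. `Candidate` and `summarize` are hypothetical names, not the generator script's API.

```python
from dataclasses import dataclass

PREFIX = "host_service_post_incident_readback:"


@dataclass(frozen=True)
class Candidate:
    name: str
    write_capable: bool
    live_evidence_required: bool

    @property
    def candidate_id(self) -> str:
        return PREFIX + self.name


# Flags assumed from the summary table, not read from the real snapshot.
CANDIDATES = [
    Candidate("local_dev_compose", False, False),
    Candidate("monitoring_110_compose", False, True),
    Candidate("monitoring_exporters_188_compose", False, True),
    Candidate("sentry_110_reference_compose", False, True),
    Candidate("langfuse_110_compose", False, True),
    Candidate("ansible_docker_compose_service_role", True, True),
    Candidate("repair_bot_110_whitelist", True, True),
    Candidate("repair_bot_188_whitelist", True, True),
    Candidate("config_backup_host_capture", False, True),
]


def summarize(candidates: list[Candidate]) -> dict[str, int]:
    """Recompute the candidate-level summary counts from the flags."""
    return {
        "readback_candidate": len(candidates),
        "write_capable_readback_candidate": sum(c.write_capable for c in candidates),
        "live_evidence_required": sum(c.live_evidence_required for c in candidates),
    }


print(summarize(CANDIDATES))
# {'readback_candidate': 9, 'write_capable_readback_candidate': 3, 'live_evidence_required': 8}
```

Deriving the counts from flags, rather than hard-coding them, means adding or reclassifying a candidate cannot silently desynchronize the summary.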

4. Required Readback Fields

  1. change_or_incident_ref
  2. actor_attribution_ref
  3. boot_time_ref
  4. restart_or_recovery_window_ref
  5. before_service_state_ref
  6. after_service_state_ref
  7. docker_daemon_state_ref
  8. compose_stack_state_ref
  9. systemd_unit_state_ref
  10. failed_unit_review_ref
  11. port_binding_state_ref
  12. dependency_impact_ref
  13. public_route_recovery_ref
  14. admin_route_recovery_ref
  15. agent_provider_health_ref
  16. monitoring_alert_ref
  17. operator_notification_ref
  18. cross_project_sync_ref
  19. restoration_evidence_ref
  20. postcheck_readback_ref
  21. recurrence_guard_ref
  22. maintenance_window
  23. rollback_owner
  24. followup_owner
  25. redacted_evidence_refs
  26. no_secret_value_attestation
  27. no_raw_log_or_config_attestation
  28. no_false_green_attestation
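A readback package can be screened for the 28 fields above before any reviewer looks at it. The sketch below assumes the package is a flat `dict`. It treats `None`, empty strings or lists, and `False` attestations as missing. The field names are copied from the list above; the function name is hypothetical. The remaining 8 of the 36 readback fields are not listed in this document, so the sketch does not cover them.

```python
REQUIRED_READBACK_FIELDS = (
    "change_or_incident_ref", "actor_attribution_ref", "boot_time_ref",
    "restart_or_recovery_window_ref", "before_service_state_ref",
    "after_service_state_ref", "docker_daemon_state_ref",
    "compose_stack_state_ref", "systemd_unit_state_ref",
    "failed_unit_review_ref", "port_binding_state_ref",
    "dependency_impact_ref", "public_route_recovery_ref",
    "admin_route_recovery_ref", "agent_provider_health_ref",
    "monitoring_alert_ref", "operator_notification_ref",
    "cross_project_sync_ref", "restoration_evidence_ref",
    "postcheck_readback_ref", "recurrence_guard_ref", "maintenance_window",
    "rollback_owner", "followup_owner", "redacted_evidence_refs",
    "no_secret_value_attestation", "no_raw_log_or_config_attestation",
    "no_false_green_attestation",
)


def missing_fields(package: dict[str, object]) -> list[str]:
    """Return required fields that are absent or falsy (None, "", [], False)."""
    return [field for field in REQUIRED_READBACK_FIELDS if not package.get(field)]


print(len(missing_fields({})))  # 28
```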

5. Reviewer Checks

  1. source_change_evidence_current
  2. incident_ref_present
  3. actor_not_anonymous
  4. boot_or_recovery_window_present
  5. before_after_service_state_present
  6. docker_daemon_state_present
  7. compose_stack_state_present
  8. systemd_unit_state_present
  9. failed_unit_review_present
  10. port_binding_state_present
  11. dependency_impact_present
  12. public_route_recovery_present
  13. admin_route_recovery_present
  14. agent_provider_health_present
  15. monitoring_alert_ref_present
  16. operator_notification_present
  17. cross_project_sync_present
  18. restoration_evidence_present
  19. postcheck_independent
  20. recurrence_guard_present
  21. runner_repair_bot_contention_present
  22. maintenance_window_present
  23. rollback_owner_present
  24. no_false_green_route_or_container
  25. raw_log_config_absent
  26. secret_or_key_value_absent
  27. counts_transition_safe
  28. runtime_stays_zero
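Most reviewer checks are simple predicates over the package, so they can be registered by name and run together. The sketch below implements three of the 28: `actor_not_anonymous`, `before_after_service_state_present`, and `secret_or_key_value_absent`. It assumes a flat `dict[str, str]` package. The anonymous-actor placeholders are hypothetical. The secret regex is a rough heuristic for `key=value` or `key: value` leaks, not a substitute for a real secret scanner.

```python
import re

# Hypothetical placeholder values that would count as an anonymous actor.
ANONYMOUS_ACTORS = {"", "unknown", "anonymous", "n/a", "none"}

# Rough heuristic only: flags values like "token=abc" or "password: hunter2".
SECRET_PATTERN = re.compile(
    r"(password|passwd|secret|token|api[_-]?key|private[_-]?key)\s*[:=]\s*\S+",
    re.IGNORECASE,
)


def actor_not_anonymous(pkg: dict[str, str]) -> bool:
    actor = str(pkg.get("actor_attribution_ref", "")).strip().lower()
    return actor not in ANONYMOUS_ACTORS


def before_after_service_state_present(pkg: dict[str, str]) -> bool:
    return bool(pkg.get("before_service_state_ref")) and bool(pkg.get("after_service_state_ref"))


def secret_or_key_value_absent(pkg: dict[str, str]) -> bool:
    return not any(SECRET_PATTERN.search(str(value)) for value in pkg.values())


CHECKS = {
    "actor_not_anonymous": actor_not_anonymous,
    "before_after_service_state_present": before_after_service_state_present,
    "secret_or_key_value_absent": secret_or_key_value_absent,
}


def run_checks(pkg: dict[str, str]) -> dict[str, bool]:
    """Run every registered check and report pass/fail by check name."""
    return {name: check(pkg) for name, check in CHECKS.items()}
```

Registering the other 25 checks in the same `CHECKS` mapping keeps the reviewer record keyed by the names listed above.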

6. Outcome Lanes

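| Lane | Description |
|---|---|
| waiting_post_incident_readback | No host service post-incident readback package received yet; all accepted / runtime counts stay at 0 |
| request_actor_supplement | Request supplementary evidence when the actor / owner / decision is missing |
| request_before_after_supplement | Request supplementary evidence when before / after state, boot time, restart window, or restoration evidence is missing |
| request_service_state_supplement | Request supplementary evidence when Docker daemon, compose, systemd, failed unit, port binding, or dependency state is missing |
| request_impact_supplement | Request supplementary evidence when public/admin route, AI provider, monitoring, operator notification, or cross-project sync is missing |
| quarantine_raw_payload | A package containing a secret, env dump, raw log, raw journal, raw compose file, or unredacted host config may only be quarantined |
| reject_unattributed_restart | A restart / kill / compose action with no actor, no affected scope, no rollback, or no notification must not be accepted |
| ready_for_host_service_post_incident_review | Once the metadata passes, the package may proceed only to reviewer review |
| recurrence_guard_backfill_required | A recurrence guard, owner review, change freeze, or automation block must be backfilled |
| waiting_runtime_gate | Even after the readback is accepted, the runtime gate still requires separate manual approval |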
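The pre-review lanes can be sketched as an ordered router: quarantine first, then rejection of unattributed restarts, then the four supplement requests, and finally review. This is a minimal sketch under several assumptions:

  • The precedence order is assumed; the table does not state one.
  • The keys `contains_raw_payload` and `action` are hypothetical, as if set by an upstream scanner.
  • `dependency_impact_ref` stands in for "affected scope".
  • The two post-review lanes, recurrence guard backfill and runtime gate, are not modeled.

```python
from typing import Optional

ACTOR_FIELDS = ("actor_attribution_ref",)
BEFORE_AFTER_FIELDS = (
    "boot_time_ref", "restart_or_recovery_window_ref",
    "before_service_state_ref", "after_service_state_ref",
    "restoration_evidence_ref",
)
SERVICE_STATE_FIELDS = (
    "docker_daemon_state_ref", "compose_stack_state_ref",
    "systemd_unit_state_ref", "failed_unit_review_ref",
    "port_binding_state_ref", "dependency_impact_ref",
)
IMPACT_FIELDS = (
    "public_route_recovery_ref", "admin_route_recovery_ref",
    "agent_provider_health_ref", "monitoring_alert_ref",
    "operator_notification_ref", "cross_project_sync_ref",
)
# Assumption: dependency_impact_ref stands in for "affected scope".
RESTART_ATTRIBUTION_FIELDS = (
    "actor_attribution_ref", "dependency_impact_ref",
    "rollback_owner", "operator_notification_ref",
)
DISRUPTIVE_ACTIONS = {"restart", "kill", "compose_action"}  # hypothetical labels


def _missing_any(pkg: dict, fields: tuple) -> bool:
    return any(not pkg.get(field) for field in fields)


def route(pkg: Optional[dict]) -> str:
    """Return the pre-review outcome lane for a readback package."""
    if pkg is None:
        return "waiting_post_incident_readback"
    if pkg.get("contains_raw_payload"):
        return "quarantine_raw_payload"
    if pkg.get("action") in DISRUPTIVE_ACTIONS and _missing_any(pkg, RESTART_ATTRIBUTION_FIELDS):
        return "reject_unattributed_restart"
    for fields, lane in (
        (ACTOR_FIELDS, "request_actor_supplement"),
        (BEFORE_AFTER_FIELDS, "request_before_after_supplement"),
        (SERVICE_STATE_FIELDS, "request_service_state_supplement"),
        (IMPACT_FIELDS, "request_impact_supplement"),
    ):
        if _missing_any(pkg, fields):
            return lane
    # Passing metadata only unlocks review; it never opens the runtime gate.
    return "ready_for_host_service_post_incident_review"
```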

7. Blocked Actions

  1. ssh_read
  2. ssh_write
  3. live_host_read
  4. docker_ps_live_read
  5. docker_restart
  6. docker_kill
  7. docker_start
  8. docker_compose_up
  9. docker_compose_down
  10. docker_compose_pull
  11. systemctl_restart
  12. systemctl_reload
  13. systemctl_kill
  14. systemctl_start
  15. repair_bot_execute
  16. ansible_apply
  17. sudo_action
  18. host_file_write
  19. firewall_change
  20. port_change
  21. route_smoke
  22. public_gateway_reload
  23. nginx_reload
  24. active_scan
  25. secret_value_collection
  26. raw_live_config_storage
  27. raw_docker_log_storage
  28. raw_journal_storage
  29. raw_env_dump_storage
  30. accept_restart_without_actor
  31. accept_recovery_without_before_after
  32. accept_service_healthy_as_config_accepted
  33. accept_route_200_as_all_green
  34. accept_container_up_as_all_green
  35. skip_dependency_map_review
  36. skip_port_binding_review
  37. hide_daemon_runner_contention
  38. mark_readback_accepted_without_reviewer_record
  39. open_runtime_gate
  40. add_action_button
  41. production_write
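Any tooling built around this plan can refuse the 41 actions above at the point of call. Below is a minimal deny-list guard. The names are copied from the list; `BlockedActionError` and `require_allowed` are hypothetical. Because the plan exposes no runtime gate at all, a stricter design would invert this into an allow-list of read-only operations.

```python
BLOCKED_ACTIONS = frozenset({
    "ssh_read", "ssh_write", "live_host_read", "docker_ps_live_read",
    "docker_restart", "docker_kill", "docker_start", "docker_compose_up",
    "docker_compose_down", "docker_compose_pull", "systemctl_restart",
    "systemctl_reload", "systemctl_kill", "systemctl_start",
    "repair_bot_execute", "ansible_apply", "sudo_action", "host_file_write",
    "firewall_change", "port_change", "route_smoke", "public_gateway_reload",
    "nginx_reload", "active_scan", "secret_value_collection",
    "raw_live_config_storage", "raw_docker_log_storage", "raw_journal_storage",
    "raw_env_dump_storage", "accept_restart_without_actor",
    "accept_recovery_without_before_after",
    "accept_service_healthy_as_config_accepted",
    "accept_route_200_as_all_green", "accept_container_up_as_all_green",
    "skip_dependency_map_review", "skip_port_binding_review",
    "hide_daemon_runner_contention",
    "mark_readback_accepted_without_reviewer_record", "open_runtime_gate",
    "add_action_button", "production_write",
})


class BlockedActionError(PermissionError):
    """Raised when a caller requests an action this plan forbids."""


def require_allowed(action: str) -> None:
    """Raise BlockedActionError if the action is on the plan's blocked list."""
    if action in BLOCKED_ACTIONS:
        raise BlockedActionError(f"{action} is blocked by the post-incident readback plan")
```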

8. Commands

Generate the committed snapshot:

python3 scripts/security/host-service-post-incident-readback-plan.py \
  --root . \
  --source-change-evidence-report docs/security/host-service-change-evidence-acceptance.snapshot.json \
  --output docs/security/host-service-post-incident-readback-plan.snapshot.json \
  --generated-at 2026-06-15T20:30:00+08:00

Run the verification guard:

python3 scripts/security/security-mirror-progress-guard.py --root .
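The two commands are meant to run in order, and the guard result is only meaningful after a fresh snapshot. A small Python wrapper can chain them and stop at the first failure. The sketch below assumes it runs from the repository root. It uses `sys.executable` instead of a literal `python3`, and the helper names are hypothetical. The script paths and flags are exactly the ones shown above.

```python
import subprocess
import sys

SOURCE_REPORT = "docs/security/host-service-change-evidence-acceptance.snapshot.json"
SNAPSHOT = "docs/security/host-service-post-incident-readback-plan.snapshot.json"


def snapshot_cmd(generated_at: str) -> list[str]:
    """Build the snapshot-generation command with the flags documented above."""
    return [
        sys.executable, "scripts/security/host-service-post-incident-readback-plan.py",
        "--root", ".",
        "--source-change-evidence-report", SOURCE_REPORT,
        "--output", SNAPSHOT,
        "--generated-at", generated_at,
    ]


def guard_cmd() -> list[str]:
    """Build the progress-guard verification command."""
    return [sys.executable, "scripts/security/security-mirror-progress-guard.py", "--root", "."]


def regenerate_and_verify(generated_at: str) -> None:
    # check=True raises CalledProcessError on the first non-zero exit,
    # so the guard never runs against a snapshot that failed to generate.
    for cmd in (snapshot_cmd(generated_at), guard_cmd()):
        subprocess.run(cmd, check=True)
```

For example, `regenerate_and_verify("2026-06-15T20:30:00+08:00")` reproduces the committed snapshot's timestamp.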

9. Completion

  • Read-only plan: 100%
  • Owner readback received: 0%
  • Reviewer accepted: 0%
  • Runtime gate: 0%
  • Action button: 0%

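The next step is to ask owners to complete the post-incident readback package for the 110 / 188 / service surfaces, using redacted evidence refs. Until that package is accepted, IwoooS will not elevate any Docker / systemd / compose / repair-bot / route / monitoring state into runtime authorization.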