IwoooS Docker / systemd / Host Service Post-Incident Readback Plan
| Item | Value |
|---|---|
| Date | 2026-06-15 |
| Status | post_incident_readback_plan_ready_no_runtime_action |
| Tool | scripts/security/host-service-post-incident-readback-plan.py |
| Snapshot | docs/security/host-service-post-incident-readback-plan.snapshot.json |
| Source acceptance | docs/security/host-service-change-evidence-acceptance.snapshot.json |
| runtime gate | 0 |
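1. Purpose
This document builds on the Docker / systemd / host service change evidence acceptance ledger and adds a post-incident readback plan. If any of the following recurs (a host reboot, a hung Docker daemon, an abnormal compose stack, a systemd failed unit, repair-bot / runner contention, a port binding / gateway mismatch, or a public route or AI provider health anomaly), IwoooS must first collect complete redacted evidence covering who made the change, when it was made, the before and after state, which services were affected, how recovery was performed, whether related products were synchronized, and how recurrence will be prevented.
This is not SSH authorization, a live host read, a docker / systemctl operation, a repair-bot execution, an Ansible apply, a route smoke test, a host restart, or a runtime gate. It only defines the fields, reviewer checks, triage lanes, and rejection conditions for post-incident readback, so that "Docker API responds", "container up", "route 200", "dashboard visible", or "service temporarily recovered" is not mistaken for "incident cause, responsibility, impact, recovery, and recurrence prevention have all been accepted".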
2. Summary
| Metric | Current value | Notes |
|---|---|---|
| readback candidate | 9 | Covers the compose, systemd, repair-bot, Ansible service role, and host config backup capture surfaces |
| write-capable readback candidate | 3 | Ansible docker compose service role, 110 repair-bot whitelist, 188 repair-bot whitelist |
| live evidence required | 8 | All except the local development compose require the owner to provide a redacted live evidence ref |
| recovery / health impact review required | 9 | All must account for service, route, AI provider, monitoring, and product impact |
| cross-project sync required | 9 | All must provide a cross-product / owner / Session sync ref |
| no-false-green required | 9 | None may substitute a service turning green for incident acceptance |
| readback field | 36 | Total number of readback fields |
| required readback field | 28 | Fields the owner / reviewer must fill in |
| reviewer check | 28 | Checks for actor, boot / recovery, before / after, daemon, compose, systemd, failed unit, port binding, impact, sync, recurrence prevention, and no-false-green |
| outcome lane | 10 | waiting, actor supplement, before-after supplement, service state supplement, impact supplement, quarantine, reject, review, recurrence guard backfill, runtime gate |
| blocked action | 41 | SSH, Docker, systemctl, repair-bot, Ansible, route smoke, reload, restart, secret, raw log / config, active scan, production write, and others |
| post-incident readback received / accepted | 0 / 0 | None received or accepted yet |
| no-false-green accepted | 0 | route 200, container up, Docker API response, dashboard up, or service healthy is not treated as incident acceptance |
| runtime gate / action button | 0 / 0 | No operational entry point is provided |
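The zero-valued counters in the summary can be read as an invariant that any later snapshot should still satisfy. The sketch below is one possible reading of the `counts_transition_safe` and `runtime_stays_zero` reviewer checks, not the tool's actual logic: the function name and parameters are hypothetical, and the rule that accepted may never exceed received is an assumption.

```python
def counts_are_safe(received: int, accepted: int,
                    runtime_gate: int, action_button: int) -> bool:
    """Hypothetical invariant check (not taken from the real tool).

    Assumed rules:
    - accepted readbacks can never exceed received readbacks;
    - runtime gate and action button counters must stay at zero.
    """
    counts_ok = 0 <= accepted <= received
    runtime_ok = runtime_gate == 0 and action_button == 0
    return counts_ok and runtime_ok


# Current state from the summary table: 0 / 0 received / accepted, 0 / 0 runtime.
print(counts_are_safe(received=0, accepted=0, runtime_gate=0, action_button=0))
```

Under these assumptions, a snapshot that reports an accepted readback without a received one, or any non-zero runtime counter, would fail the check.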
3. Readback Candidate Scope
| Candidate | Acceptance focus |
|---|---|
| host_service_post_incident_readback:local_dev_compose | Boundary between local compose and production, to avoid projecting development state onto production services |
| host_service_post_incident_readback:monitoring_110_compose | 110 monitoring compose, Prometheus / Alertmanager / exporter, route recovery, and alert false-green |
| host_service_post_incident_readback:monitoring_exporters_188_compose | 188 exporter compose, database / Redis metrics, monitoring impact, and post-check |
| host_service_post_incident_readback:sentry_110_reference_compose | Sentry self-hosted compose, core container state, public route and admin route recovery |
| host_service_post_incident_readback:langfuse_110_compose | Langfuse compose, trace / prompt privacy boundary, route and DB dependencies |
| host_service_post_incident_readback:ansible_docker_compose_service_role | Ansible service role, compose action permissions, maintenance window, and rollback owner |
| host_service_post_incident_readback:repair_bot_110_whitelist | 110 repair-bot whitelist; impact on Harbor / registry / Gitea / Langfuse / Alertmanager / SigNoz |
| host_service_post_incident_readback:repair_bot_188_whitelist | 188 repair-bot whitelist; impact on OpenClaw / MinIO / SigNoz / Redis / Nginx / Ollama |
| host_service_post_incident_readback:config_backup_host_capture | Host config backup capture; systemd / Docker / Nginx / cron / K8s sources and the no-raw-config boundary |
4. Required Readback Fields
- change_or_incident_ref
- actor_attribution_ref
- boot_time_ref
- restart_or_recovery_window_ref
- before_service_state_ref
- after_service_state_ref
- docker_daemon_state_ref
- compose_stack_state_ref
- systemd_unit_state_ref
- failed_unit_review_ref
- port_binding_state_ref
- dependency_impact_ref
- public_route_recovery_ref
- admin_route_recovery_ref
- agent_provider_health_ref
- monitoring_alert_ref
- operator_notification_ref
- cross_project_sync_ref
- restoration_evidence_ref
- postcheck_readback_ref
- recurrence_guard_ref
- maintenance_window
- rollback_owner
- followup_owner
- redacted_evidence_refs
- no_secret_value_attestation
- no_raw_log_or_config_attestation
- no_false_green_attestation
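The required fields above lend themselves to a simple completeness check before a package reaches a reviewer. The following is a minimal sketch only: it assumes a readback package is a flat dict mapping field names to redacted evidence refs (the real snapshot schema is not shown in this document), and `missing_required_fields` is a hypothetical helper, not part of the actual tool.

```python
from typing import List

# The 28 required readback fields listed in this section.
REQUIRED_FIELDS = (
    "change_or_incident_ref", "actor_attribution_ref", "boot_time_ref",
    "restart_or_recovery_window_ref", "before_service_state_ref",
    "after_service_state_ref", "docker_daemon_state_ref",
    "compose_stack_state_ref", "systemd_unit_state_ref",
    "failed_unit_review_ref", "port_binding_state_ref",
    "dependency_impact_ref", "public_route_recovery_ref",
    "admin_route_recovery_ref", "agent_provider_health_ref",
    "monitoring_alert_ref", "operator_notification_ref",
    "cross_project_sync_ref", "restoration_evidence_ref",
    "postcheck_readback_ref", "recurrence_guard_ref", "maintenance_window",
    "rollback_owner", "followup_owner", "redacted_evidence_refs",
    "no_secret_value_attestation", "no_raw_log_or_config_attestation",
    "no_false_green_attestation",
)


def missing_required_fields(package: dict) -> List[str]:
    """Return required fields that are absent or empty, in declaration order.

    Assumption: the package is a flat dict of field name -> redacted ref.
    Empty strings, None, and False all count as missing.
    """
    return [name for name in REQUIRED_FIELDS if not package.get(name)]


# Hypothetical package with only two fields supplied.
package = {"change_or_incident_ref": "ref-001", "boot_time_ref": "ref-002"}
print(len(missing_required_fields(package)))  # 26 fields still missing
```

The check only confirms that a ref is present; whether the referenced evidence is acceptable remains a reviewer decision.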
5. Reviewer Checks
- source_change_evidence_current
- incident_ref_present
- actor_not_anonymous
- boot_or_recovery_window_present
- before_after_service_state_present
- docker_daemon_state_present
- compose_stack_state_present
- systemd_unit_state_present
- failed_unit_review_present
- port_binding_state_present
- dependency_impact_present
- public_route_recovery_present
- admin_route_recovery_present
- agent_provider_health_present
- monitoring_alert_ref_present
- operator_notification_present
- cross_project_sync_present
- restoration_evidence_present
- postcheck_independent
- recurrence_guard_present
- runner_repair_bot_contention_present
- maintenance_window_present
- rollback_owner_present
- no_false_green_route_or_container
- raw_log_config_absent
- secret_or_key_value_absent
- counts_transition_safe
- runtime_stays_zero
6. Outcome Lanes
| Lane | Description |
|---|---|
| waiting_post_incident_readback | No host service incident readback package has been received yet; all accepted / runtime counts stay at 0 |
| request_actor_supplement | Request a supplement when actor / owner / decision is missing |
| request_before_after_supplement | Request a supplement when before / after, boot time, restart window, or restoration evidence is missing |
| request_service_state_supplement | Request a supplement when Docker daemon, compose, systemd, failed unit, port binding, or dependency state is missing |
| request_impact_supplement | Request a supplement when public/admin route, AI provider, monitoring, operator notification, or cross-project sync is missing |
| quarantine_raw_payload | A package containing a secret, env dump, raw log, raw journal, raw compose, or unredacted host config may only be quarantined |
| reject_unattributed_restart | A restart / kill / compose action with no actor, no affected scope, no rollback, or no notification must not be accepted |
| ready_for_host_service_post_incident_review | Once the metadata qualifies, the package may only proceed to reviewer review |
| recurrence_guard_backfill_required | A recurrence guard, owner review, change freeze, or automation block must be backfilled |
| waiting_runtime_gate | Even if the readback is accepted, the runtime gate still requires separate manual approval |
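The lanes above can be sketched as a small routing function. This is an illustration under stated assumptions, not the tool's implementation: the mapping from lanes to specific field names, the order in which lanes are tried, and the `has_raw_payload` flag are all hypothetical. `reject_unattributed_restart` and `waiting_runtime_gate` are left out because the document does not say how they rank against the supplement lanes.

```python
from typing import Optional

# Assumed mapping of supplement lanes to the fields whose absence triggers them.
LANE_FIELD_GROUPS = [
    ("request_actor_supplement", ("actor_attribution_ref",)),
    ("request_before_after_supplement", (
        "boot_time_ref", "restart_or_recovery_window_ref",
        "before_service_state_ref", "after_service_state_ref",
        "restoration_evidence_ref")),
    ("request_service_state_supplement", (
        "docker_daemon_state_ref", "compose_stack_state_ref",
        "systemd_unit_state_ref", "failed_unit_review_ref",
        "port_binding_state_ref", "dependency_impact_ref")),
    ("request_impact_supplement", (
        "public_route_recovery_ref", "admin_route_recovery_ref",
        "agent_provider_health_ref", "monitoring_alert_ref",
        "operator_notification_ref", "cross_project_sync_ref")),
    ("recurrence_guard_backfill_required", ("recurrence_guard_ref",)),
]


def choose_lane(package: Optional[dict], has_raw_payload: bool = False) -> str:
    """Pick the first matching lane; the precedence order is an assumption."""
    if package is None:
        return "waiting_post_incident_readback"
    if has_raw_payload:
        # Raw or unredacted material is quarantined before anything else.
        return "quarantine_raw_payload"
    for lane, fields in LANE_FIELD_GROUPS:
        if any(not package.get(field) for field in fields):
            return lane
    # Complete metadata only earns a reviewer review, never a runtime action.
    return "ready_for_host_service_post_incident_review"


print(choose_lane({"actor_attribution_ref": "ref-1"}))
# prints: request_before_after_supplement
```

Note that even the most favorable result is a review lane: no branch of the function returns anything that could be read as runtime authorization.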
7. Blocked Actions
- ssh_read
- ssh_write
- live_host_read
- docker_ps_live_read
- docker_restart
- docker_kill
- docker_start
- docker_compose_up
- docker_compose_down
- docker_compose_pull
- systemctl_restart
- systemctl_reload
- systemctl_kill
- systemctl_start
- repair_bot_execute
- ansible_apply
- sudo_action
- host_file_write
- firewall_change
- port_change
- route_smoke
- public_gateway_reload
- nginx_reload
- active_scan
- secret_value_collection
- raw_live_config_storage
- raw_docker_log_storage
- raw_journal_storage
- raw_env_dump_storage
- accept_restart_without_actor
- accept_recovery_without_before_after
- accept_service_healthy_as_config_accepted
- accept_route_200_as_all_green
- accept_container_up_as_all_green
- skip_dependency_map_review
- skip_port_binding_review
- hide_daemon_runner_contention
- mark_readback_accepted_without_reviewer_record
- open_runtime_gate
- add_action_button
- production_write
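A blocked-action list like the one above is typically enforced with a membership check before any step is attempted. The sketch below assumes actions are identified by the same string names used in this section; `ensure_not_blocked` is a hypothetical helper and is not taken from the actual tool.

```python
# The 41 blocked actions listed in this section.
BLOCKED_ACTIONS = frozenset({
    "ssh_read", "ssh_write", "live_host_read", "docker_ps_live_read",
    "docker_restart", "docker_kill", "docker_start", "docker_compose_up",
    "docker_compose_down", "docker_compose_pull", "systemctl_restart",
    "systemctl_reload", "systemctl_kill", "systemctl_start",
    "repair_bot_execute", "ansible_apply", "sudo_action", "host_file_write",
    "firewall_change", "port_change", "route_smoke", "public_gateway_reload",
    "nginx_reload", "active_scan", "secret_value_collection",
    "raw_live_config_storage", "raw_docker_log_storage",
    "raw_journal_storage", "raw_env_dump_storage",
    "accept_restart_without_actor", "accept_recovery_without_before_after",
    "accept_service_healthy_as_config_accepted",
    "accept_route_200_as_all_green", "accept_container_up_as_all_green",
    "skip_dependency_map_review", "skip_port_binding_review",
    "hide_daemon_runner_contention",
    "mark_readback_accepted_without_reviewer_record",
    "open_runtime_gate", "add_action_button", "production_write",
})


def ensure_not_blocked(action: str) -> None:
    """Raise PermissionError if the named action is on the blocked list."""
    if action in BLOCKED_ACTIONS:
        raise PermissionError(f"blocked by readback plan: {action}")


try:
    ensure_not_blocked("docker_restart")
except PermissionError as exc:
    print(exc)  # blocked by readback plan: docker_restart
```

This sketch only rejects names that appear on the list; a stricter design would deny everything that is not explicitly allowed, which may fit the plan's read-only intent better.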
8. Commands
Generate the committed snapshot:
python3 scripts/security/host-service-post-incident-readback-plan.py \
  --root . \
  --source-change-evidence-report docs/security/host-service-change-evidence-acceptance.snapshot.json \
  --output docs/security/host-service-post-incident-readback-plan.snapshot.json \
  --generated-at 2026-06-15T20:30:00+08:00
Run the guard check:
python3 scripts/security/security-mirror-progress-guard.py --root .
9. Completion
- Read-only plan: 100%
- owner readback received: 0%
- reviewer accepted: 0%
- runtime gate: 0%
- action button: 0%
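The next step is to ask owners to complete the incident readback packages for the 110 / 188 / service surfaces using redacted evidence refs. Until those packages are accepted, IwoooS will not elevate any Docker / systemd / compose / repair-bot / route / monitoring state into runtime authorization.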