# IwoooS SSH / network / firewall post-incident readback plan
| Item | Value |
|------|------|
| Date | 2026-06-15 |
| Status | `post_incident_readback_plan_ready_no_runtime_action` |
| Tool | `scripts/security/ssh-network-post-incident-readback-plan.py` |
| Snapshot | `docs/security/ssh-network-post-incident-readback-plan.snapshot.json` |
| Source acceptance | `docs/security/port-firewall-change-evidence-acceptance.snapshot.json` |
| Runtime gate | `0` |
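## 1. Purpose
This document builds on the port / firewall change evidence acceptance ledger and adds a post-incident readback plan. If a port is closed again; a firewall, NetworkPolicy, NodePort, or WireGuard policy is changed; deploy SSH disconnects; an AI provider route misbehaves; or a public route or monitoring path is affected, IwoooS must first collect redacted evidence that answers: who made the change, when it was made, what the state was before and after, which services were affected, whether related products were kept in sync, how service was restored, and how recurrence will be prevented.
This is not an SSH authorization, a live firewall read, a firewall / port change, a route smoke test, a host restart, or a runtime gate. It only defines the fields, reviewer checks, triage lanes, and rejection conditions for post-incident readback, so that "the service later recovered" is not mistaken for "the incident's cause, accountability, impact, and recurrence prevention have all been accepted."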
## 2. Summary
| Metric | Current value | Notes |
|------|--------|------|
| readback candidate | `14` | Carried over from the port / firewall / NodePort / NetworkPolicy / WireGuard / deploy SSH / sudo / alert action surfaces |
| write-capable readback candidate | `6` | Surfaces that may affect deploy SSH, monitoring deploy, sudoers, or the alert action catalog |
| policy / exposure readback candidate | `5` | Surfaces related to NetworkPolicy, NodePort, WireGuard, and exposure |
| health impact review required | `14` | All must account for service / AI provider / monitoring / product impact |
| cross-project sync required | `14` | All must provide a cross-product / owner / Session sync ref |
| recurrence guard required | `14` | All must propose a recurrence guard or change freeze rule |
| readback field | `30` | Total number of readback fields |
| required readback field | `24` | Fields the owner / reviewer must fill in |
| reviewer check | `24` | Checks covering actor, before / after, health impact, notification, sync, restoration, recurrence prevention, and no-false-green |
| outcome lane | `10` | waiting, actor supplement, before-after supplement, health impact supplement, quarantine, reject, review, ledger-only, recurrence guard backfill, runtime gate |
| blocked action | `34` | SSH, firewall, port, route smoke, reload, restart, secret, active scan, provider switch, prompt send, production write, and so on |
| post-incident readback received / accepted | `0 / 0` | Nothing received or accepted yet |
| no-false-green accepted | `0` | A route 200, a running service, or a visible UI does not count as incident acceptance |
| runtime gate / action button | `0 / 0` | No operational entry point is provided |
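
The relationships between these counts can be checked mechanically. The following is a minimal sketch, assuming the counts are available as a flat dictionary; the key names and the `summary_violations` helper are hypothetical, and the real snapshot schema is not shown in this document.

```python
# Hypothetical in-memory copy of the summary counts above.
# The key names are illustrative; the real snapshot schema may differ.
SUMMARY = {
    "readback_candidate": 14,
    "write_capable_readback_candidate": 6,
    "policy_exposure_readback_candidate": 5,
    "health_impact_review_required": 14,
    "cross_project_sync_required": 14,
    "recurrence_guard_required": 14,
    "readback_field": 30,
    "required_readback_field": 24,
    "reviewer_check": 24,
    "outcome_lane": 10,
    "blocked_action": 34,
    "readback_received": 0,
    "readback_accepted": 0,
    "no_false_green_accepted": 0,
    "runtime_gate": 0,
    "action_button": 0,
}


def summary_violations(summary: dict) -> list[str]:
    """Return human-readable problems; an empty list means the counts agree."""
    problems = []
    total = summary["readback_candidate"]
    # Every candidate needs impact review, cross-project sync, and a recurrence guard.
    for key in (
        "health_impact_review_required",
        "cross_project_sync_required",
        "recurrence_guard_required",
    ):
        if summary[key] != total:
            problems.append(f"{key} must equal readback_candidate ({total})")
    if summary["required_readback_field"] > summary["readback_field"]:
        problems.append("required_readback_field exceeds readback_field")
    if summary["readback_accepted"] > summary["readback_received"]:
        problems.append("readback_accepted exceeds readback_received")
    # The plan never opens a runtime gate or exposes an action button.
    for key in ("runtime_gate", "action_button"):
        if summary[key] != 0:
            problems.append(f"{key} must stay 0")
    return problems
```

Calling `summary_violations(SUMMARY)` on the values in the table returns an empty list; any non-zero `runtime_gate` or `action_button` is reported as a problem.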
## 3. Readback Candidate Scope
| Candidate | Acceptance focus |
|-----------|----------|
| `ssh_network_post_incident_readback:ansible_inventory_ssh_targets` | Host access changes, port impact, maintenance window, rollback, and post-check |
| `ssh_network_post_incident_readback:gitea_cd_deploy_ssh` | Deploy SSH reachability, restoration evidence, rollback owner, and cross-project notification |
| `ssh_network_post_incident_readback:gitea_cd_dev_ssh` | dev / prod boundary, port policy, owner decision, and recurrence prevention |
| `ssh_network_post_incident_readback:deploy_alerts_ssh_path` | Alert deploy path, notification chain, affected products, and restoration readback |
| `ssh_network_post_incident_readback:monitoring_discover_docker_ssh` | Monitoring discovery reachability, read-only window, and false-green risk |
| `ssh_network_post_incident_readback:monitoring_exporter_deploy_ssh` | Exporter deploy access, firewall owner, post-check, and rollback |
| `ssh_network_post_incident_readback:backup_config_ssh_capture` | Backup access, restore validation, service dependency, and notification |
| `ssh_network_post_incident_readback:host_ops_sudoers_wrapper` | sudo authorization boundary, break-glass, restoration responsibility, and forbidden command proof |
| `ssh_network_post_incident_readback:k8s_prod_network_policy` | ingress / egress policy, route impact, metrics / alert, and rollback |
| `ssh_network_post_incident_readback:argocd_metrics_network_policy` | Metrics scrape, NodePort exposure, source whitelist, and monitoring impact |
| `ssh_network_post_incident_readback:argocd_metrics_nodeport` | NodePort exposure, firewall owner, rollback, and public/admin route impact |
| `ssh_network_post_incident_readback:velero_metrics_nodeport` | Backup metrics exposure, access policy, and restore readiness impact |
| `ssh_network_post_incident_readback:wireguard_mesh_runbook` | Mesh cutover, firewall rule owner, canary / rollback, and maintenance window |
| `ssh_network_post_incident_readback:alert_rules_ssh_actions` | Alert action catalog, read/write/admin tiering, cooldown, and post-check |
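
Each candidate ID in the table has the shape `<prefix>:<surface>`. The sketch below shows one way a consumer might validate such an ID and extract the surface name. The `surface_of` helper is hypothetical and is not taken from the plan script.

```python
PREFIX = "ssh_network_post_incident_readback"

# Surface names copied from the candidate table above.
SURFACES = (
    "ansible_inventory_ssh_targets",
    "gitea_cd_deploy_ssh",
    "gitea_cd_dev_ssh",
    "deploy_alerts_ssh_path",
    "monitoring_discover_docker_ssh",
    "monitoring_exporter_deploy_ssh",
    "backup_config_ssh_capture",
    "host_ops_sudoers_wrapper",
    "k8s_prod_network_policy",
    "argocd_metrics_network_policy",
    "argocd_metrics_nodeport",
    "velero_metrics_nodeport",
    "wireguard_mesh_runbook",
    "alert_rules_ssh_actions",
)


def surface_of(candidate_id: str) -> str:
    """Return the surface part of a candidate ID, or raise ValueError if it is unknown."""
    prefix, sep, surface = candidate_id.partition(":")
    if prefix != PREFIX or sep != ":" or surface not in SURFACES:
        raise ValueError(f"unknown readback candidate: {candidate_id!r}")
    return surface
```

Rejecting unknown IDs rather than passing them through keeps the candidate count fixed at the 14 surfaces listed here.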
## 4. Required Readback Fields
1. `change_or_incident_ref`
2. `actor_attribution_ref`
3. `incident_detected_at_ref`
4. `change_window_ref`
5. `affected_port_or_policy_ref`
6. `before_state_ref`
7. `after_state_ref`
8. `service_dependency_ref`
9. `public_route_impact_ref`
10. `ai_provider_impact_ref`
11. `monitoring_alert_impact_ref`
12. `customer_or_product_impact_ref`
13. `operator_notification_ref`
14. `cross_project_sync_ref`
15. `restoration_evidence_ref`
16. `postcheck_readback_ref`
17. `recurrence_guard_ref`
18. `maintenance_window`
19. `rollback_owner`
20. `followup_owner`
21. `redacted_evidence_refs`
22. `no_secret_value_attestation`
23. `no_raw_firewall_dump_attestation`
24. `no_false_green_attestation`
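
A readback package can be screened for these fields before any reviewer looks at it. The following is a minimal sketch, assuming the package arrives as a dictionary keyed by the field names above; the `missing_required_fields` helper and the dictionary representation are assumptions, not part of the documented tool.

```python
# The 24 required field names, copied from the list above.
REQUIRED_FIELDS = (
    "change_or_incident_ref",
    "actor_attribution_ref",
    "incident_detected_at_ref",
    "change_window_ref",
    "affected_port_or_policy_ref",
    "before_state_ref",
    "after_state_ref",
    "service_dependency_ref",
    "public_route_impact_ref",
    "ai_provider_impact_ref",
    "monitoring_alert_impact_ref",
    "customer_or_product_impact_ref",
    "operator_notification_ref",
    "cross_project_sync_ref",
    "restoration_evidence_ref",
    "postcheck_readback_ref",
    "recurrence_guard_ref",
    "maintenance_window",
    "rollback_owner",
    "followup_owner",
    "redacted_evidence_refs",
    "no_secret_value_attestation",
    "no_raw_firewall_dump_attestation",
    "no_false_green_attestation",
)


def missing_required_fields(package: dict) -> list[str]:
    """List required fields that are absent, empty, blank, or explicitly False."""
    # `value or ""` maps None, False, and empty containers to the empty string,
    # so an attestation recorded as False counts as missing.
    return [
        name
        for name in REQUIRED_FIELDS
        if not str(package.get(name) or "").strip()
    ]
```

Treating a `False` attestation as missing is a deliberate fail-closed choice in this sketch: a package that denies an attestation should not look complete.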
## 5. Reviewer Checks
1. `source_change_evidence_current`
2. `incident_ref_present`
3. `actor_not_anonymous`
4. `before_after_state_present`
5. `port_policy_redacted`
6. `service_dependency_present`
7. `public_route_impact_present`
8. `ai_provider_impact_present`
9. `monitoring_alert_impact_present`
10. `customer_product_impact_present`
11. `operator_notification_present`
12. `cross_project_sync_present`
13. `restoration_evidence_present`
14. `postcheck_independent`
15. `recurrence_guard_present`
16. `emergency_classification_present`
17. `maintenance_window_present`
18. `rollback_owner_present`
19. `no_false_green_route_200`
20. `raw_firewall_dump_absent`
21. `secret_or_key_value_absent`
22. `hidden_impact_absent`
23. `counts_transition_safe`
24. `runtime_stays_zero`
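
One way to aggregate these checks is to fail closed: a review passes only when every check has an explicit passing result. The sketch below assumes results are recorded as a mapping from check name to boolean; the `failed_checks` helper is hypothetical, and this document does not define how each individual check is evaluated.

```python
# The 24 reviewer check names, copied from the list above.
REVIEWER_CHECKS = (
    "source_change_evidence_current",
    "incident_ref_present",
    "actor_not_anonymous",
    "before_after_state_present",
    "port_policy_redacted",
    "service_dependency_present",
    "public_route_impact_present",
    "ai_provider_impact_present",
    "monitoring_alert_impact_present",
    "customer_product_impact_present",
    "operator_notification_present",
    "cross_project_sync_present",
    "restoration_evidence_present",
    "postcheck_independent",
    "recurrence_guard_present",
    "emergency_classification_present",
    "maintenance_window_present",
    "rollback_owner_present",
    "no_false_green_route_200",
    "raw_firewall_dump_absent",
    "secret_or_key_value_absent",
    "hidden_impact_absent",
    "counts_transition_safe",
    "runtime_stays_zero",
)


def failed_checks(results: dict) -> list[str]:
    """Return checks that are not explicitly True; unrecorded checks count as failed."""
    return [name for name in REVIEWER_CHECKS if results.get(name) is not True]


def review_passes(results: dict) -> bool:
    """A review passes only when all 24 checks were recorded as True."""
    return not failed_checks(results)
```

Counting an unrecorded check as a failure means a partially filled review record can never be read as a pass.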
## 6. Outcome Lanes
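| Lane | Description |
|------|------|
| `waiting_post_incident_readback` | No incident readback package received yet; all accepted / runtime counts stay at 0 |
| `request_actor_supplement` | Request a supplement when the actor / owner / decision is missing |
| `request_before_after_supplement` | Request a supplement when before / after state or restoration evidence is missing |
| `request_health_impact_supplement` | Request a supplement when service / AI provider / monitoring / product impact is missing |
| `quarantine_raw_payload` | A raw firewall dump, secret, or key material can only be quarantined on receipt |
| `reject_unattributed_incident` | An incident readback with no actor, no affected scope, no rollback, or no notification must not be accepted |
| `ready_for_post_incident_review` | Once the metadata qualifies, the package may only proceed to reviewer review |
| `incident_readback_only_update` | Only updates to the read-only ledger are allowed; such an update must not be treated as approval of an operation |
| `recurrence_guard_backfill_required` | Requires backfilling the recurrence guard, owner review, and change freeze |
| `waiting_runtime_gate` | Even when the readback is accepted, the runtime gate still requires separate manual approval |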
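
The lane descriptions above suggest a simple triage order. The sketch below models only a subset of the lanes. The precedence order, the `contains_raw_payload` flag, and the grouping of fields into actor, state, and impact sets are all assumptions made for illustration. The `reject_unattributed_incident`, `incident_readback_only_update`, and `waiting_runtime_gate` lanes are left out because this document does not spell out exactly when they apply.

```python
from typing import Optional

# Assumed groupings of required fields; the document does not define them.
ACTOR_FIELDS = ("actor_attribution_ref", "rollback_owner", "followup_owner")
STATE_FIELDS = ("before_state_ref", "after_state_ref", "restoration_evidence_ref")
IMPACT_FIELDS = (
    "service_dependency_ref",
    "public_route_impact_ref",
    "ai_provider_impact_ref",
    "monitoring_alert_impact_ref",
    "customer_or_product_impact_ref",
)


def _any_missing(package: dict, fields: tuple) -> bool:
    """True when at least one of the named fields is absent or empty."""
    return any(not package.get(name) for name in fields)


def choose_lane(package: Optional[dict]) -> str:
    """Pick an outcome lane for a readback package (None means nothing was received)."""
    if package is None:
        return "waiting_post_incident_readback"
    # Hypothetical flag, assumed to be set by an upstream scanner that detected
    # a raw firewall dump, secret, or key material in the submission.
    if package.get("contains_raw_payload"):
        return "quarantine_raw_payload"
    if _any_missing(package, ACTOR_FIELDS):
        return "request_actor_supplement"
    if _any_missing(package, STATE_FIELDS):
        return "request_before_after_supplement"
    if _any_missing(package, IMPACT_FIELDS):
        return "request_health_impact_supplement"
    if not package.get("recurrence_guard_ref"):
        return "recurrence_guard_backfill_required"
    # Qualifying metadata only earns a reviewer review, never an approval.
    return "ready_for_post_incident_review"
```

Quarantine is checked before any completeness test in this sketch so that a package carrying raw material is never routed to a lane where its contents would be read.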
## 7. Blocked Actions
1. `ssh_read`
2. `ssh_write`
3. `live_firewall_read`
4. `firewall_change`
5. `port_change`
6. `port_close`
7. `port_open`
8. `network_policy_apply`
9. `nodeport_change`
10. `wireguard_change`
11. `sudo_action`
12. `deploy_ssh_action`
13. `route_smoke`
14. `public_gateway_reload`
15. `nginx_reload`
16. `host_restart`
17. `docker_restart`
18. `systemd_restart`
19. `secret_value_collection`
20. `ssh_key_collection`
21. `raw_firewall_dump_storage`
22. `raw_key_material_storage`
23. `mark_readback_accepted_without_reviewer_record`
24. `mark_incident_resolved_without_postcheck`
25. `hide_cross_project_impact`
26. `treat_route_200_as_all_green`
27. `treat_break_glass_as_approval`
28. `close_management_port_without_owner`
29. `open_runtime_gate`
30. `add_action_button`
31. `production_write`
32. `active_scan`
33. `provider_switch`
34. `prompt_send`
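
A deny-list like this is easy to enforce at a single choke point. The following is a minimal sketch, assuming actions are identified by the string names above; `BlockedActionError`, `ensure_allowed`, and the example action `update_readonly_ledger` are hypothetical names introduced for illustration.

```python
# The 34 blocked action names, copied from the list above.
BLOCKED_ACTIONS = frozenset({
    "ssh_read", "ssh_write", "live_firewall_read", "firewall_change",
    "port_change", "port_close", "port_open", "network_policy_apply",
    "nodeport_change", "wireguard_change", "sudo_action", "deploy_ssh_action",
    "route_smoke", "public_gateway_reload", "nginx_reload", "host_restart",
    "docker_restart", "systemd_restart", "secret_value_collection",
    "ssh_key_collection", "raw_firewall_dump_storage", "raw_key_material_storage",
    "mark_readback_accepted_without_reviewer_record",
    "mark_incident_resolved_without_postcheck",
    "hide_cross_project_impact", "treat_route_200_as_all_green",
    "treat_break_glass_as_approval", "close_management_port_without_owner",
    "open_runtime_gate", "add_action_button", "production_write",
    "active_scan", "provider_switch", "prompt_send",
})


class BlockedActionError(PermissionError):
    """Raised when a caller requests an action this plan forbids."""


def ensure_allowed(action: str) -> str:
    """Return the action unchanged, or raise if it is on the blocked list."""
    if action in BLOCKED_ACTIONS:
        raise BlockedActionError(f"action {action!r} is blocked by the readback plan")
    return action
```

For example, `ensure_allowed("route_smoke")` raises `BlockedActionError`, while a name that is not on the list, such as the hypothetical `update_readonly_ledger`, is returned unchanged.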
## 8. Commands
Generate the committed snapshot:
```bash
python3 scripts/security/ssh-network-post-incident-readback-plan.py \
    --root . \
    --source-change-evidence-report docs/security/port-firewall-change-evidence-acceptance.snapshot.json \
    --output docs/security/ssh-network-post-incident-readback-plan.snapshot.json \
    --generated-at 2026-06-15T19:16:00+08:00
```
Verify the guard:
```bash
python3 scripts/security/security-mirror-progress-guard.py --root .
```
## 9. Completion
| Work item | Completion | Notes |
|------|--------|------|
| post-incident readback plan artifact | `100%` | The 14 candidates, the snapshot, the document, and the guard are in place |
| post-incident readback received / accepted | `0%` | Not yet received, not yet accepted |
| actor / before-after / impact evidence | `0%` | No owner-provided evidence received yet |
| service / AI provider / monitoring impact | `0%` | No redacted impact refs received yet |
| cross-project sync / notification evidence | `0%` | No sync or notification evidence received yet |
| recurrence guard / no-false-green accepted | `0%` | Recurrence prevention and no-false-green not yet accepted |
| SSH / firewall / port / route / restart action | `0%` | Not authorized and not executed |
| runtime gate / production write | `0%` | Not authorized and not executed |