# IwoooS Monitoring / Alerting / Observability Repo-Only Inventory
| Item | Value |
|------|------|
| Date | 2026-06-12 |
| Status | `repo_only_inventory_ready` |
| Tool | `scripts/security/monitoring-alerting-observability-inventory.py` |
| Snapshot | `docs/security/monitoring-alerting-observability-inventory.snapshot.json` |
| Schema | `docs/schemas/monitoring_alerting_observability_inventory_v1.schema.json` |
| runtime gate | `0` |
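## 1. Purpose
This inventory brings Prometheus, Alertmanager, Grafana, SigNoz, Sentry, Langfuse, OTEL, Telegram / notification policy, deploy / reload scripts, and alert chain smoke scripts together under IwoooS high-value configuration control.

This stage is still a repo-only source inventory: it only reads committed files, computes SHA256 hashes, and organizes owner gates and live evidence gaps. It does not connect to live Prometheus, reload Alertmanager, modify Grafana, apply SigNoz rules, deploy Sentry, send Telegram messages, create silences, use SSH, run kubectl, or read secret values.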
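
The read-only hashing step described above can be sketched as follows. This is a minimal illustration, not the actual contents of `scripts/security/monitoring-alerting-observability-inventory.py`; the function names and the entry fields (`source_path`, `source_exists`, `sha256`) are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(repo_root: Path, source_paths: list[str]) -> list[dict]:
    """Build inventory entries from committed files only.

    Hypothetical entry layout: the function only reads files under
    repo_root and never writes or contacts a live system.
    """
    entries = []
    for rel in source_paths:
        full = repo_root / rel
        exists = full.is_file()
        entries.append({
            "source_path": rel,
            "source_exists": exists,
            # A missing source gets no hash instead of a placeholder value.
            "sha256": sha256_of_file(full) if exists else None,
        })
    return entries
```

Calling `build_inventory(Path("."), [...])` from the repository root would yield one entry per path. A missing file shows up as `source_exists: False` with no hash rather than being skipped silently.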
## 2. Coverage summary
| Metric | Current value | Notes |
|------|--------|------|
| surface | `60` | All are committed repo sources |
| source exists | `60` | Every source path exists and has a SHA256 |
| Prometheus config surface | `8` | Base config, remote write, generated targets, service registry, exporter queries |
| alert rule surface | `13` | Prometheus, K8s, SLO, Ollama, and app alert rules, plus Grafana alert rules |
| Alertmanager receiver surface | `1` | route / receiver / grouping source |
| Grafana surface | `6` | Alert rules and dashboard JSON |
| SigNoz surface | `3` | Alert rules, log rules, API client |
| Sentry surface | `4` | compose, deploy, webhook receiver, API client |
| Langfuse surface | `3` | compose, runbook, API client |
| notification policy surface | `4` | failure-only policy, backup policy, observability contract, notification matrix |
| Telegram surface | `3` | digest policy, receipt package, gateway service |
| OTEL surface | `1` | SigNoz OTEL collector |
| deploy / reload surface | `6` | Alertmanager / Prometheus / Sentry / exporter deploy or reload-capable scripts |
| drift guard surface | `1` | Prometheus rule drift guard |
| smoke surface | `4` | live / test alert and alert chain smoke scripts |
| write-capable surface | `11` | May reload, deploy, send notifications, fire alerts, or restart exporters |
| owner response received / accepted | `0 / 0` | Must not be artificially inflated |
| live evidence received | `0` | Live monitoring truth has not been verified yet |
| reload owner / receiver owner / route smoke accepted | `0 / 0 / 0` | Reload, route change, and live smoke are not yet authorized |
| runtime gate / action button | `0 / 0` | No frontend execution entry point may be created |
| P1-4 maturity | `56% -> 62%` | Reflects only completion of the repo-only inventory, not a passing live alert chain |
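
As a rough illustration of how counts like these could be derived from inventory entries, here is a small sketch. The entry fields (`tags`, `source_exists`, `write_capable`, `owner_response_accepted`) are hypothetical and are not taken from the snapshot schema. A surface may carry several tags, for example a Grafana alert rule that counts toward both the alert rule and the Grafana rows, so the per-tag counts need not sum to the surface total.

```python
from collections import Counter


def summarize_coverage(entries: list[dict]) -> dict:
    """Aggregate per-surface entries into summary counts.

    Field names are assumptions for this sketch, not the real schema.
    """
    tag_counts = Counter(tag for e in entries for tag in e.get("tags", []))
    return {
        "surface": len(entries),
        "source_exists": sum(1 for e in entries if e.get("source_exists")),
        "write_capable": sum(1 for e in entries if e.get("write_capable")),
        # Count only explicit acceptance; a missing field never counts.
        "owner_response_accepted": sum(
            1 for e in entries if e.get("owner_response_accepted") is True
        ),
        "by_tag": dict(tag_counts),
    }
```

Counting acceptance only when the field is explicitly `True` keeps the owner response figures at `0` until real responses are recorded.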
## 3. Write-capable surface
| surface | Risk | Required gates |
|---------|------|-----------|
| `deploy_alertmanager_config_script` | Can affect receivers / routes / reload | owner response, maintenance window, rollback owner, receiver smoke |
| `deploy_prometheus_alerts_script` | Can affect alert rules / reload | rule test, receiver mapping, reload owner, rollback owner |
| `k8s_deploy_prometheus_config_script` | Can trigger K8s apply / reload | K8s owner, ArgoCD / kubectl boundary, rollback owner |
| `api_apply_prometheus_config_script` | Can apply Prometheus config from the API side | API deploy owner, config source owner, reload proof |
| `monitoring_exporter_deploy_script` | Can affect exporters / restarts through host deploy | host owner, SSH boundary, restart window |
| `sentry_self_hosted_deploy` | Can deploy the Sentry self-hosted stack | deploy owner, backup owner, migration rollback |
| `telegram_gateway_service` | Can send Telegram messages / write to the delivery path | token injection owner, receipt owner, send approval gate |
| `notification_manager_service` | Can change notification channels and routing | channel owner, receipt owner, failure-only policy |
| `converged_alert_recurrence_notifier` | Can cause recurrence escalation or noise | noise budget owner, silence boundary, receipt proof |
| `fire_live_alert_script` | Can fire alerts into the live alert chain | allowed receiver, test window, cleanup owner |
| `fire_test_alert_script` | Can trigger test alerts and notifications | dedup proof, receiver owner, noise guard |
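
One way to read the table above is as a deny-by-default check: a write-capable surface stays blocked until every one of its required gates has been accepted. The sketch below shows that idea for two of the surfaces. The snake_case gate identifiers and the function names are assumptions for illustration, and nothing here is wired to a real execution path.

```python
# Required gates per surface, transcribed from two rows of the table.
# The snake_case gate identifiers are hypothetical.
REQUIRED_GATES: dict[str, set[str]] = {
    "fire_live_alert_script": {"allowed_receiver", "test_window", "cleanup_owner"},
    "fire_test_alert_script": {"dedup_proof", "receiver_owner", "noise_guard"},
}


def missing_gates(surface: str, accepted: set[str]) -> set[str]:
    """Return the gates still missing for a surface.

    Unknown surfaces raise KeyError so they are never allowed by accident.
    """
    if surface not in REQUIRED_GATES:
        raise KeyError(f"unregistered write-capable surface: {surface}")
    return REQUIRED_GATES[surface] - accepted


def is_action_allowed(surface: str, accepted: set[str]) -> bool:
    """Allow an action only when no required gate is missing."""
    return not missing_gates(surface, accepted)
```

With no accepted gates, as in the current inventory state, `is_action_allowed` returns `False` for every registered surface.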
## 4. Fixed 0 / false boundaries
```text
runtime_execution_authorized=false
host_write_authorized=false
prometheus_reload_authorized=false
alertmanager_reload_authorized=false
grafana_dashboard_apply_authorized=false
signoz_rule_apply_authorized=false
sentry_deploy_authorized=false
langfuse_config_change_authorized=false
otel_collector_reload_authorized=false
receiver_route_change_authorized=false
silence_policy_change_authorized=false
telegram_send_authorized=false
notification_route_change_authorized=false
webhook_receiver_change_authorized=false
remote_write_change_authorized=false
exporter_deploy_authorized=false
live_alert_fire_authorized=false
alert_chain_smoke_authorized=false
ssh_read_authorized=false
ssh_write_authorized=false
kubectl_action_authorized=false
secret_value_collection_allowed=false
active_scan_authorized=false
action_buttons_allowed=false
```
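
A guard over these flags could be sketched as below: parse the `key=value` lines and report any flag that is not `false`. This is only an assumption about how such a check might look; the source does not say whether the real tool or schema enforces the flags this way, and the function names are hypothetical.

```python
def parse_flags(text: str) -> dict[str, bool]:
    """Parse key=value boundary lines into a dict of booleans."""
    flags: dict[str, bool] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        # Reject anything that is not exactly key=true or key=false.
        if not sep or value not in ("true", "false"):
            raise ValueError(f"malformed boundary line: {raw!r}")
        flags[key] = value == "true"
    return flags


def violated_boundaries(flags: dict[str, bool]) -> list[str]:
    """Return the sorted names of flags that are set to true."""
    return sorted(name for name, enabled in flags.items() if enabled)


sample = """
telegram_send_authorized=false
kubectl_action_authorized=false
"""
assert violated_boundaries(parse_flags(sample)) == []
```

Failing on malformed lines, instead of skipping them, avoids a silently ignored flag being mistaken for a `false` one.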
## 5. Next step: owner responses
1. Prometheus owner: provide live config hash, rule diff, reload owner, rollback owner, and false-green guard.
2. Alertmanager owner: provide receiver diff, route owner, silence policy owner, reload owner, and failure-only notification proof.
3. Grafana owner: provide dashboard UID / folder owner, import owner, rollback ref, and smoke plan.
4. SigNoz / OTEL owner: provide pipeline diff, rule apply owner, data export boundary, and rollback owner.
5. Sentry / Langfuse owner: provide compose live hash, secret injection owner, upgrade / restart window, backup owner, and rollback owner.
6. Telegram / notification owner: provide receiver owner, receipt owner, redaction proof, retry boundary, and no-secret-value evidence.
7. Alert chain smoke owner: provide allowed receiver, execution window, expected receipt, noise budget, and cleanup owner.
## 6. Completion
| Work item | Completion | Notes |
|------|--------|------|
| P1-4 repo-only surface registration | `100%` | `60` surfaces are included in the snapshot |
| source existence / SHA256 | `100%` | `60 / 60` source paths exist |
| monitoring / alerting high-value configuration maturity | `56% -> 62%` | Reflects only inventory and guard readiness |
| owner response received / accepted | `0%` | No owner response has been received or accepted yet |
| live evidence collection | `0%` | The live monitoring stack has not been read |
| reload / receiver / route smoke gate | `0%` | Not authorized, not executed |
| runtime gate | `0%` | No frontend execution button |
## 7. Boundaries
Completing this inventory does not mean that live Prometheus / Alertmanager / Grafana / SigNoz / Sentry / Langfuse are consistent with it, nor that alert routes deliver or that alerts are guaranteed to arrive. Repo source visibility, the snapshot, the IwoooS UI, AwoooP approval, and the LOGBOOK must not be treated as authorization for Prometheus reload, Alertmanager reload, Grafana import, SigNoz apply, Sentry deploy, Langfuse change, OTEL reload, remote write change, silence change, Telegram send, live alert fire, alert chain smoke, SSH, kubectl, active scan, or secret collection.