# IwoooS Monitoring / Alerting / Observability repo-only inventory

| Item | Value |
|------|------|
| Date | 2026-06-12 |
| Status | `repo_only_inventory_ready` |
| Tool | `scripts/security/monitoring-alerting-observability-inventory.py` |
| Snapshot | `docs/security/monitoring-alerting-observability-inventory.snapshot.json` |
| Schema | `docs/schemas/monitoring_alerting_observability_inventory_v1.schema.json` |
| runtime gate | `0` |

## 1. Purpose

This inventory brings Prometheus, Alertmanager, Grafana, SigNoz, Sentry, Langfuse, OTEL, Telegram / notification policy, deploy / reload scripts, and alert chain smoke scripts under IwoooS high-value configuration control.

This phase is still a repo-only source inventory: it only reads committed files, computes SHA256 hashes, and records owner gates and live evidence gaps. It does not connect to live Prometheus, reload Alertmanager, modify Grafana, apply SigNoz rules, deploy Sentry, send Telegram messages, create silences, use SSH, run kubectl, or read secret values.
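
The read-only hashing step described above could look roughly like the following. This is a minimal sketch, not the actual inventory tool: the function names and the record fields (`surface_id`, `source_exists`, `owner_response_received`, and so on) are assumptions made for illustration and may not match the real snapshot schema.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def inventory_entry(surface_id: str, source_path: Path) -> dict:
    """Build one read-only record for a committed source file.

    Field names are hypothetical; only local file reads happen here.
    """
    exists = source_path.is_file()
    return {
        "surface_id": surface_id,
        "source_path": str(source_path),
        "source_exists": exists,
        "sha256": sha256_of_file(source_path) if exists else None,
        # These stay False until an owner actually responds with evidence.
        "owner_response_received": False,
        "live_evidence_received": False,
    }
```

Nothing in this sketch opens a network connection or touches a live service, which is the property the repo-only phase depends on.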

## 2. Coverage summary

| Metric | Current value | Notes |
|------|--------|------|
| surface | `60` | All are committed repo sources |
| source exists | `60` | Every source path exists and has a SHA256 |
| Prometheus config surface | `8` | Base config, remote write, generated targets, service registry, exporter queries |
| alert rule surface | `13` | Prometheus, K8s, SLO, Ollama, and app alert rules, plus Grafana alert rules |
| Alertmanager receiver surface | `1` | route / receiver / grouping source |
| Grafana surface | `6` | Alert rules and dashboard JSON |
| SigNoz surface | `3` | Alert rules, log rules, API client |
| Sentry surface | `4` | compose, deploy, webhook receiver, API client |
| Langfuse surface | `3` | compose, runbook, API client |
| notification policy surface | `4` | failure-only policy, backup policy, observability contract, notification matrix |
| Telegram surface | `3` | digest policy, receipt package, gateway service |
| OTEL surface | `1` | SigNoz OTEL collector |
| deploy / reload surface | `6` | Alertmanager / Prometheus / Sentry / exporter deploy or reload-capable scripts |
| drift guard surface | `1` | Prometheus rule drift guard |
| smoke surface | `4` | live / test alert and alert chain smoke scripts |
| write-capable surface | `11` | May reload, deploy, send notifications, fire alerts, or restart exporters |
| owner response received / accepted | `0 / 0` | Must not be falsely inflated |
| live evidence received | `0` | Live monitoring truth has not been verified yet |
| reload owner / receiver owner / route smoke accepted | `0 / 0 / 0` | Reload, route change, and live smoke are not yet authorized |
| runtime gate / action button | `0 / 0` | No frontend execution entry point may be created |
| P1-4 maturity | `56% -> 62%` | Reflects only completion of the repo-only inventory, not a passing live alert chain |
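
Counts like the ones in this table can be derived from the surface records rather than maintained by hand. The sketch below assumes each record is a dict with hypothetical `category`, `source_exists`, `write_capable`, and `owner_response_accepted` keys; the real snapshot layout may differ.

```python
from collections import Counter


def summarize(surfaces: list[dict]) -> dict:
    """Tally summary counters from surface records (hypothetical fields)."""
    return {
        "surface": len(surfaces),
        "source_exists": sum(1 for s in surfaces if s["source_exists"]),
        "write_capable": sum(1 for s in surfaces if s.get("write_capable", False)),
        "by_category": dict(Counter(s["category"] for s in surfaces)),
        # Derived only from recorded acceptances, so it cannot be inflated
        # by the mere existence of a repo source.
        "owner_response_accepted": sum(
            1 for s in surfaces if s.get("owner_response_accepted", False)
        ),
    }
```

Deriving `owner_response_accepted` from explicit per-record flags, with a default of `False`, keeps the counter at `0` until a response is actually recorded.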

## 3. Write-capable surface

| surface | Risk | Required gates |
|---------|------|-----------|
| `deploy_alertmanager_config_script` | Can affect receivers / routes / reload | owner response, maintenance window, rollback owner, receiver smoke |
| `deploy_prometheus_alerts_script` | Can affect alert rules / reload | rule test, receiver mapping, reload owner, rollback owner |
| `k8s_deploy_prometheus_config_script` | Can trigger K8s apply / reload | K8s owner, ArgoCD / kubectl boundary, rollback owner |
| `api_apply_prometheus_config_script` | Can apply Prometheus config from the API side | API deploy owner, config source owner, reload proof |
| `monitoring_exporter_deploy_script` | Can affect exporters / restart through host deploy | host owner, SSH boundary, restart window |
| `sentry_self_hosted_deploy` | Can deploy the Sentry self-hosted stack | deploy owner, backup owner, migration rollback |
| `telegram_gateway_service` | Can send Telegram messages / write to the delivery path | token injection owner, receipt owner, send approval gate |
| `notification_manager_service` | Can change notification channels and routing | channel owner, receipt owner, failure-only policy |
| `converged_alert_recurrence_notifier` | Can cause recurrence escalation or noise | noise budget owner, silence boundary, receipt proof |
| `fire_live_alert_script` | Can fire alerts into the live alert chain | allowed receiver, test window, cleanup owner |
| `fire_test_alert_script` | Can trigger test alerts and notifications | dedup proof, receiver owner, noise guard |
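
The surface-to-gate mapping in the table lends itself to a simple lookup that reports which gates are still open. The sketch below copies two rows from the table as data; the `REQUIRED_GATES` name and the `missing_gates` helper are hypothetical and not part of the inventory tool.

```python
# Two rows from the table above, used as illustrative data only.
REQUIRED_GATES: dict[str, set[str]] = {
    "fire_live_alert_script": {"allowed receiver", "test window", "cleanup owner"},
    "fire_test_alert_script": {"dedup proof", "receiver owner", "noise guard"},
}


def missing_gates(surface_id: str, satisfied: set[str]) -> list[str]:
    """Return the required gates not yet satisfied, in sorted order.

    Raises KeyError for an unregistered surface so that unknown
    write-capable scripts are never treated as gate-free.
    """
    if surface_id not in REQUIRED_GATES:
        raise KeyError(f"unregistered surface: {surface_id}")
    return sorted(REQUIRED_GATES[surface_id] - satisfied)
```

An empty result is a necessary condition only. It does not grant execution by itself, since this phase keeps every runtime authorization flag at `false`.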

## 4. Fixed 0 / false boundaries

```text
runtime_execution_authorized=false
host_write_authorized=false
prometheus_reload_authorized=false
alertmanager_reload_authorized=false
grafana_dashboard_apply_authorized=false
signoz_rule_apply_authorized=false
sentry_deploy_authorized=false
langfuse_config_change_authorized=false
otel_collector_reload_authorized=false
receiver_route_change_authorized=false
silence_policy_change_authorized=false
telegram_send_authorized=false
notification_route_change_authorized=false
webhook_receiver_change_authorized=false
remote_write_change_authorized=false
exporter_deploy_authorized=false
live_alert_fire_authorized=false
alert_chain_smoke_authorized=false
ssh_read_authorized=false
ssh_write_authorized=false
kubectl_action_authorized=false
secret_value_collection_allowed=false
active_scan_authorized=false
action_buttons_allowed=false
```
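
A guard over these flags can be as small as parsing the `key=value` lines and listing any that are not `false`. This is a sketch under the assumption that the flags are available as plain text in the format shown above; `parse_flags` and `violations` are hypothetical helper names.

```python
def parse_flags(text: str) -> dict[str, str]:
    """Parse key=value lines into a dict, skipping blank lines."""
    flags: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {line!r}")
        flags[key.strip()] = value.strip()
    return flags


def violations(flags: dict[str, str]) -> list[str]:
    """Return the sorted names of flags whose value is anything but 'false'."""
    return sorted(name for name, value in flags.items() if value.lower() != "false")
```

For example, `violations(parse_flags("telegram_send_authorized=true\nssh_read_authorized=false"))` returns `["telegram_send_authorized"]`, which a CI check could treat as a failure.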

## 5. Next step: owner responses

1. Prometheus owner: provide the live config hash, rule diff, reload owner, rollback owner, and false-green guard.
2. Alertmanager owner: provide the receiver diff, route owner, silence policy owner, reload owner, and failure-only notification proof.
3. Grafana owner: provide the dashboard UID / folder owner, import owner, rollback ref, and smoke plan.
4. SigNoz / OTEL owner: provide the pipeline diff, rule apply owner, data export boundary, and rollback owner.
5. Sentry / Langfuse owner: provide the compose live hash, secret injection owner, upgrade / restart window, backup owner, and rollback owner.
6. Telegram / notification owner: provide the receiver owner, receipt owner, redaction proof, retry boundary, and no-secret-value evidence.
7. Alert chain smoke owner: provide the allowed receiver, execution window, expected receipt, noise budget, and cleanup owner.

## 6. Completion

| Work item | Completion | Notes |
|------|--------|------|
| P1-4 repo-only surface registration | `100%` | `60` surfaces are included in the snapshot |
| source existence / SHA256 | `100%` | `60 / 60` source paths exist |
| monitoring / alerting high-value configuration maturity | `56% -> 62%` | Reflects only inventory and guard readiness |
| owner response received / accepted | `0%` | No owner response has been received or accepted yet |
| live evidence collection | `0%` | The live monitoring stack has not been read |
| reload / receiver / route smoke gate | `0%` | Not authorized, not executed |
| runtime gate | `0%` | No frontend execution button |

## 7. Boundaries

Completing this inventory does not mean that live Prometheus / Alertmanager / Grafana / SigNoz / Sentry / Langfuse are already consistent, nor that alert routes deliver or that alerts are guaranteed to arrive. Visibility of repo sources, the snapshot, the IwoooS UI, an AwoooP approval, or the LOGBOOK must not be treated as authorization for Prometheus reload, Alertmanager reload, Grafana import, SigNoz apply, Sentry deploy, Langfuse change, OTEL reload, remote write change, silence change, Telegram send, live alert fire, alert chain smoke, SSH, kubectl, active scan, or secret collection.