# Alert Chain E2E Verification Runbook

> Phase O-5 Wave 5.4 (2026-04-02 ogt)
> ADR-025 / ADR-035 / ADR-037

---

## Architecture Overview

```
Prometheus → Alertmanager → AWOOOI API → Telegram
                                ↓
                          SigNoz Trace
                                ↓
                          Langfuse (AI analysis)
```

**Endpoints**:
- Prometheus: `192.168.0.110:9090`
- Alertmanager: `192.168.0.110:9093`
- AWOOOI API: `https://awoooi.wooo.work` / `192.168.0.120:32334` (K8s)
- Webhook: `POST /api/v1/webhooks/alertmanager`

---

## Quick Smoke Test

```bash
# Run the full Wave A acceptance suite (8 checks)
python3 scripts/alert_chain_smoke_test.py

# Verify monitoring coverage
python3 scripts/generate_monitoring.py

# JSON output (for CI)
python3 scripts/generate_monitoring.py --json
python3 scripts/generate_monitoring.py --check  # exit 1 if < 70%
```

---

## Manual E2E Test Steps

### Step 1: Trigger a test alert

```bash
# Note: Alertmanager 0.27+ removed the v1 API; on those versions use /api/v2/alerts instead.
curl -X POST http://192.168.0.110:9093/api/v1/alerts \
  -H "Content-Type: application/json" \
  -d '[{
    "labels": {
      "alertname": "TestAlert",
      "severity": "warning",
      "service": "test"
    },
    "annotations": {
      "summary": "E2E test alert",
      "description": "Manually triggered to verify the alert chain"
    }
  }]'
```

### Step 2: Verify the AWOOOI API received the alert

```bash
# Check the webhook logs from the last 5 minutes
kubectl logs -n awoooi-prod deployment/awoooi-api --since=5m | grep -i "alertmanager\|webhook"
```

### Step 3: Verify the alert_chain metric was updated

```bash
# Query Prometheus and print the last-success timestamp in local time
curl -s "http://192.168.0.110:9090/api/v1/query?query=alert_chain_last_success_timestamp" \
  | python3 -c "import json,sys,datetime; d=json.load(sys.stdin); ts=float(d['data']['result'][0]['value'][1]); print(datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S'))"
```

If the metric has no samples, the `result` list is empty and the one-liner fails with an `IndexError`; treat that as a failed check.

### Step 4: Verify Telegram received the notification

Check the "AwoooI SRE戰情室" Telegram group; a formatted alert message should appear there. Production alerts no longer use a personal DM with @tsenyangbot as a delivery channel.

---

## Smoke Test Checklist (Wave A 8/8)

| # | Item | Command | Expected |
|---|------|---------|----------|
| 1 | API Health (6 components) | `GET /health` | 200 + all healthy |
| 2 | Alert Chain Metric | Prometheus query | timestamp no more than 60 minutes old |
| 3 | Alertmanager Webhook | `GET /api/v1/webhooks/health` | `{"status":"ok"}` |
| 4 | SigNoz Webhook | `GET /api/v1/webhooks/signoz/health` | `{"status":"ok"}` |
| 5 | Sentry Webhook | `GET /api/v1/webhooks/sentry/health` | `{"status":"ok"}` |
| 6 | SigNoz | `GET http://192.168.0.188:3301` | HTTP 200 |
| 7 | OTEL Collector | `kubectl get pods -n otel` | 2 Running |
| 8 | Event Exporter | `kubectl get pods -n monitoring` | 1 Running |

---

## Known Issues and Exemptions

| Target | Status | Reason |
|--------|--------|--------|
| federation-k8s | down | SigNoz-internal Prometheus; not exposed externally |
| kube-state-metrics | down | Accessible only from inside the OTEL Collector |
| node-exporter-120/121 | down | K8s node firewall rules |

---

## Rollback Commands

```bash
# Roll back the Phase 24 AI Router
kubectl set env deployment/awoooi-api -n awoooi-prod USE_AI_ROUTER=false

# Restart the API pods
kubectl rollout restart deployment/awoooi-api -n awoooi-prod

# Verify the rollout completed
kubectl rollout status deployment/awoooi-api -n awoooi-prod
```

---

## Acceptance History

| Date | Result | Notes |
|------|--------|-------|
| 2026-04-02 21:22 | 8/8 ✅ | First full pass of Wave A |
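
---

## Appendix: Scripted Freshness Check (Sketch)

The Step 3 one-liner prints the timestamp but leaves the "no more than 60 minutes old" judgement (checklist item 2) to the reader. The sketch below shows one way that check could be automated. It is not the logic of `scripts/alert_chain_smoke_test.py`, which this runbook does not show; the helper names `latest_timestamp`, `is_fresh`, and `fetch` are hypothetical, and only the Prometheus address, metric name, and 60-minute threshold come from this runbook.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://192.168.0.110:9090"  # from the endpoint list in this runbook
METRIC = "alert_chain_last_success_timestamp"
MAX_AGE_SECONDS = 60 * 60  # checklist item 2: at most 60 minutes old


def latest_timestamp(response):
    """Return the newest sample value in an instant-query response, or None."""
    if response.get("status") != "success":
        return None
    results = response.get("data", {}).get("result", [])
    # Each instant-vector sample looks like {"metric": {...}, "value": [eval_time, "value"]}.
    values = [float(sample["value"][1]) for sample in results]
    return max(values) if values else None


def is_fresh(response, now, max_age=MAX_AGE_SECONDS):
    """True if the metric exists and is at most max_age seconds older than now."""
    ts = latest_timestamp(response)
    return ts is not None and (now - ts) <= max_age


def fetch(base_url=PROMETHEUS_URL, metric=METRIC):
    """Run the instant query against Prometheus (performs a network call)."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": metric})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

A caller would evaluate `is_fresh(fetch(), time.time())` and exit non-zero on `False`. Keeping the parsing separate from the network call means an empty result set is reported as a stale chain instead of raising `IndexError`, and the threshold logic can be tested without a live Prometheus.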