ADR-024 Router layer slimming R4: move business logic out of the Router into the correct layers.

Changes:
- Add src/models/webhook.py: AlertPayload + AlertResponse moved to the models layer
- Add src/services/alert_analyzer_service.py: AlertAnalyzer (141 lines) moved to the services layer
  - RISK_MAPPING / ACTION_MAPPING / BLAST_RADIUS_MAPPING lookup tables
  - analyze() method, including K8s resource name normalization (ADR-016)
- webhooks.py: remove duplicate definitions and import them instead, -243 lines

The Router-layer webhooks.py now complies with the ADR-024 prohibited list:
AlertAnalyzer no longer exists in the Router layer.

R4 status: #127✅ #128✅ #129✅ #130✅ (all complete)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
117 lines
3.6 KiB
Python
"""
Webhook API Schema - Pydantic models for receiving alerts
=========================================================

Extracted from api/v1/webhooks.py into the models layer (ADR-024 four-layer architecture)

Design principles:
- AlertPayload: inbound format for external alerts (Prometheus, K8s, Alertmanager, etc.)
- AlertResponse: response format for alert processing
- No business logic; pure data structures

Version: v1.0
Created: 2026-04-01 (Taipei time zone)
Created by: Claude Code (R4 Router slimming #129)
"""

from typing import Literal

from pydantic import BaseModel, Field

class AlertPayload(BaseModel):
    """
    External alert payload

    Receives alert notifications from external monitoring systems such as
    Prometheus AlertManager, K8s Event Watcher, and Grafana.

    OpenClaw AI automatically analyzes the alert and creates a pending-approval card.

    Example:
    ```json
    {
        "alert_type": "k8s_pod_crash",
        "severity": "critical",
        "source": "prometheus",
        "target_resource": "harbor-core-7d4b8c9f5-xk2m3",
        "namespace": "harbor",
        "message": "Pod CrashLoopBackOff detected",
        "metrics": {"restart_count": 5, "cpu_percent": 95}
    }
    ```
    """

    alert_type: Literal[
        "k8s_node_failure",  # K8s node failure
        "k8s_pod_crash",  # Pod crash
        "db_connection_timeout",  # Database connection timeout
        "service_404",  # Service returns 404
        "high_cpu",  # CPU spike
        "high_memory",  # Memory spike
        "disk_full",  # Disk full
        "ssl_expiry",  # SSL certificate about to expire
        "custom",  # Custom alert
    ] = Field(..., description="Alert type")

    severity: Literal["info", "warning", "critical"] = Field(
        "warning",
        description="Alert severity",
    )

    source: str = Field(
        ...,
        description="Alert source (e.g. prometheus, k8s-event-watcher)",
    )

    target_resource: str = Field(
        ...,
        description="Affected resource (e.g. harbor, nginx-ingress-7d4b8c9f5-xk2m3)",
    )

    namespace: str = Field(
        "default",
        description="K8s Namespace",
    )

    message: str = Field(
        ...,
        description="Alert message",
    )

    metrics: dict | None = Field(
        None,
        description="Related metric data (e.g. {cpu_percent: 95, memory_percent: 80})",
    )

    labels: dict | None = Field(
        None,
        description="Alert labels (e.g. {app: harbor, team: devops})",
    )

class AlertResponse(BaseModel):
    """
    Alert processing response

    Contains the OpenClaw AI analysis result:
    - Risk level (risk_level)
    - Blast radius (looked up via approval_id)
    - Suggested remediation script (suggested_action)

    Added in Strategy B:
    - hit_count: number of times the alert was aggregated
    - converged: whether this is a converged duplicate alert
    """

    success: bool = Field(..., description="Whether processing succeeded")
    message: str = Field(..., description="Processing result message")
    alert_id: str | None = Field(None, description="Unique alert identifier")
    approval_created: bool = Field(False, description="Whether a pending-approval card was created")
    approval_id: str | None = Field(None, description="Pending-approval card ID (UUID)")
    risk_level: str | None = Field(None, description="AI-assessed risk level (low/medium/high/critical)")
    suggested_action: str | None = Field(None, description="AI-suggested remediation script")
    # Strategy B: alert storm convergence
    hit_count: int = Field(1, description="Alert aggregation count (times the same fingerprint fired)")
    converged: bool = Field(False, description="Whether this is a converged duplicate alert (LLM skipped)")
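As a rough illustration of how the `AlertPayload` schema above behaves, the sketch below validates one well-formed payload and one with an unsupported `alert_type`. To stay self-contained it defines `AlertPayloadSketch`, a hypothetical trimmed stand-in (a subset of fields and alert types) rather than importing the real model; it assumes Pydantic is installed and is not taken from the repository.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class AlertPayloadSketch(BaseModel):
    """Hypothetical trimmed stand-in for AlertPayload (subset of fields)."""

    alert_type: Literal["k8s_pod_crash", "high_cpu", "custom"] = Field(...)
    severity: Literal["info", "warning", "critical"] = Field("warning")
    source: str = Field(...)
    target_resource: str = Field(...)
    namespace: str = Field("default")
    message: str = Field(...)


# Optional fields that are omitted fall back to their declared defaults.
ok = AlertPayloadSketch(
    alert_type="k8s_pod_crash",
    source="prometheus",
    target_resource="harbor-core-7d4b8c9f5-xk2m3",
    message="Pod CrashLoopBackOff detected",
)

# A value outside the Literal set is rejected when the model is constructed.
try:
    AlertPayloadSketch(
        alert_type="unknown_type",
        source="prometheus",
        target_resource="harbor",
        message="test",
    )
    rejected = False
except ValidationError:
    rejected = True
```

Because `severity` and `namespace` carry defaults, a sender only has to supply `alert_type`, `source`, `target_resource`, and `message`; anything outside the declared `Literal` values fails validation before reaching downstream layers.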