SigNoz Log-Based Alert Rules
2026-04-05 Claude Code: Sprint 2, log alerting
Setup Steps
- Open http://192.168.0.188:3301/alerts
- Click "New Alert Rule" → "Logs Based Alert"
- Fill in the fields using the tables below
- Notification Channel: select awoooi-api (points to /api/v1/webhooks/signoz)
- Save and enable the rule
Verify the webhook chain:
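
One way to exercise the webhook path is to post a synthetic alert straight to the endpoint. The sketch below is a minimal example under two assumptions: that the payload follows an Alertmanager-style layout (`status`, `alerts[].labels`, `alerts[].annotations`), which may not match exactly what SigNoz sends, and that `API_BASE_URL` (a hypothetical placeholder) points at the AWOOOI API. Only the path `/api/v1/webhooks/signoz` comes from the setup steps.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Assumption: base URL of the AWOOOI API; replace with the real host and port.
API_BASE_URL = "http://localhost:8000"
WEBHOOK_PATH = "/api/v1/webhooks/signoz"  # path taken from the setup steps


def build_test_payload(alert_name: str, labels: dict, message: str) -> dict:
    """Build a synthetic alert in an Alertmanager-style layout (assumed format)."""
    return {
        "status": "firing",
        "alerts": [
            {
                "status": "firing",
                "labels": {"alertname": alert_name, **labels},
                "annotations": {"description": message},
                "startsAt": datetime.now(timezone.utc).isoformat(),
            }
        ],
    }


def send_test_alert(payload: dict, base_url: str = API_BASE_URL, timeout: float = 5.0) -> int:
    """POST the payload to the webhook and return the HTTP status code.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    request = urllib.request.Request(
        base_url + WEBHOOK_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Example call (performs a real HTTP request, so it is left commented out):
# send_test_alert(build_test_payload(
#     "APIHighErrorLogRate",
#     {"severity": "warning", "layer": "k8s", "component": "api",
#      "team": "backend", "source": "signoz"},
#     "webhook smoke test",
# ))
```

A 2xx status suggests the API accepted the request; whether the alert was routed onward still has to be checked in the API's own logs.
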
Rule 1: API High Error Log Rate

| Field | Value |
| --- | --- |
| Name | APIHighErrorLogRate |
| Type | Logs Based Alert |
| Query | service.name = "awoooi-api" AND severity_text = "ERROR" |
| Condition | Count > 10 per 5m |
| For | 5m |
| Severity | warning |
| Labels | layer=k8s, component=api, team=backend |
| Message | API error log rate is too high ({{ $value }} per 5 minutes) |
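
The condition can be read as "count the matching log lines in the trailing five-minute window and compare the total with 10". The sketch below illustrates only that reading; it is not SigNoz's evaluation engine, it does not model the "For" hold period, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def count_in_window(timestamps, now, window=timedelta(minutes=5)):
    """Count events whose timestamp t satisfies now - window < t <= now."""
    start = now - window
    return sum(1 for t in timestamps if start < t <= now)


def breaches(timestamps, now, threshold=10, window=timedelta(minutes=5)):
    """True when the windowed count is strictly greater than the threshold."""
    return count_in_window(timestamps, now, window) > threshold


now = datetime(2026, 4, 5, 12, 0, 0)
# Eleven hypothetical error logs, one every 20 seconds, ending at `now`.
errors = [now - timedelta(seconds=20 * i) for i in range(11)]
print(breaches(errors, now))       # True: 11 > 10
print(breaches(errors[:10], now))  # False: 10 is not greater than 10
```

Because the comparison is strict, exactly 10 error logs in the window do not trigger the rule.
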
Rule 2: Worker Task Failures

| Field | Value |
| --- | --- |
| Name | WorkerTaskFailed |
| Type | Logs Based Alert |
| Query | service.name = "awoooi-worker" AND (body CONTAINS "task_failed" OR body CONTAINS "Unhandled exception") |
| Condition | Count > 5 per 5m |
| For | 5m |
| Severity | warning |
| Labels | layer=k8s, component=worker, team=backend |
| Message | Worker task failure count is too high ({{ $value }} per 5 minutes) |
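
Before relying on this query, it can help to check that the worker's actual log messages contain the expected substrings. The sketch below mirrors the filter's boolean logic as a local Python predicate. It assumes case-sensitive substring matching, which may differ from how SigNoz evaluates CONTAINS, and the sample log lines are hypothetical.

```python
def matches_worker_task_failed(service_name: str, body: str) -> bool:
    """Local mirror of the Rule 2 filter (assumes case-sensitive matching)."""
    return service_name == "awoooi-worker" and (
        "task_failed" in body or "Unhandled exception" in body
    )


# Hypothetical log samples: (service.name, body)
samples = [
    ("awoooi-worker", "task_failed id=42 reason=timeout"),
    ("awoooi-worker", "Unhandled exception in job runner"),
    ("awoooi-worker", "task completed"),
    ("awoooi-api", "task_failed id=7"),
]

matched = [matches_worker_task_failed(service, body) for service, body in samples]
print(matched)  # [True, True, False, False]
```

The last sample shows why the parentheses in the query matter: the service filter applies to both substring checks, so a matching body from another service is ignored.
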
Rule 3: Pod OOM Kill

| Field | Value |
| --- | --- |
| Name | PodOOMKilled |
| Type | Logs Based Alert |
| Query | body CONTAINS "OOMKilled" OR body CONTAINS "OutOfMemory" |
| Condition | Count > 0 per 1m |
| For | 1m |
| Severity | critical |
| Labels | layer=k8s, component=k8s, team=ops |
| Message | Pod OOM Kill event detected |
Rule 4: Telegram Polling Failure

| Field | Value |
| --- | --- |
| Name | TelegramPollingFailed |
| Type | Logs Based Alert |
| Query | service.name = "awoooi-api" AND body CONTAINS "telegram_polling_error" |
| Condition | Count > 3 per 5m |
| For | 5m |
| Severity | critical |
| Labels | layer=k8s, component=api, team=platform |
| Message | Telegram polling is failing repeatedly; the bot may be unresponsive |
Rule 5: Nemotron All Timeouts

| Field | Value |
| --- | --- |
| Name | NemotronAllTimeout |
| Type | Logs Based Alert |
| Query | service.name = "awoooi-api" AND body CONTAINS "nemotron_tool_call_timeout" |
| Condition | Count > 5 per 5m |
| For | 5m |
| Severity | warning |
| Labels | layer=k8s, component=ai, team=ai |
| Message | Nemotron tool calls are timing out frequently; AI features may be degraded |
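Alignment with Prometheus Label Conventions
Every SigNoz alert must include:
- layer: k8s (logs from inside pods)
- component: the corresponding service name
- team: the owning team
- source: signoz (used for auto_repair routing decisions)
This lets the AWOOOI API's auto_repair_service apply the same layer/component logic when choosing a repair path.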
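
The label requirement can be checked mechanically before a rule is saved. The sketch below parses a Labels value written as `key=value, key=value`, reports which required labels are missing, and derives a `(layer, component)` key. All function names are hypothetical, and the routing key is only an assumption about how auto_repair_service might consume the labels, since its actual logic is not described here.

```python
REQUIRED_LABELS = ("layer", "component", "team", "source")


def parse_labels(text: str) -> dict:
    """Parse a string such as 'layer=k8s, component=api' into a dict."""
    labels = {}
    for item in text.split(","):
        if "=" in item:
            key, value = item.split("=", 1)
            labels[key.strip()] = value.strip()
    return labels


def missing_labels(labels: dict) -> list:
    """Return the required label names that are absent or empty."""
    return [name for name in REQUIRED_LABELS if not labels.get(name)]


def routing_key(labels: dict) -> tuple:
    """Return (layer, component); raise if a required label is missing."""
    missing = missing_labels(labels)
    if missing:
        raise ValueError(f"missing required labels: {', '.join(missing)}")
    return (labels["layer"], labels["component"])


rule1 = parse_labels("layer=k8s, component=api, team=backend")
print(missing_labels(rule1))  # ['source']

rule1["source"] = "signoz"
print(routing_key(rule1))     # ('k8s', 'api')
```

Applied to a Labels value that lists only layer, component, and team, `missing_labels` returns `['source']`, so `source=signoz` would still need to be added to the rule for it to pass this check.
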