awoooi/k8s/monitoring/minio-kali-alerts.yaml
OG T 99be215e83 fix(monitoring): R1 Review fixes: Blackbox DNS / PSA labels / alert thresholds
Critical: Blackbox Exporter replacement changed from K8s DNS to host IP (192.168.0.188:9115)
Important: Descheduler namespace now explicitly declares PSA restricted labels
Suggestion: failedJobsHistoryLimit 3→1; added MinioDiskUsageCritical 5% alert

R1 Review by: Chief Architect (Phase O-1)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 14:02:50 +08:00

68 lines
2.4 KiB
YAML
# =============================================================================
# MinIO + Kali alert rules - Phase O-1.3/O-1.4
# =============================================================================
# Author: Claude Code (Chief Architect)
# Date: 2026-04-02 (Taipei time)
# Deployed at: 192.168.0.188 /etc/prometheus/rules/minio-kali-alerts.yaml
# =============================================================================
groups:
  - name: minio-alerts
    rules:
      # MinIO service down
      - alert: MinioDown
        expr: up{job="minio"} == 0
        for: 5m
        labels:
          severity: critical
          service: minio
        annotations:
          summary: "MinIO storage service is down"
          description: "MinIO ({{ $labels.instance }}) has been offline for more than 5 minutes; Velero backups will fail"
      # MinIO disk usage too high
      - alert: MinioDiskUsageHigh
        expr: minio_cluster_capacity_usable_free_bytes / minio_cluster_capacity_usable_total_bytes < 0.2
        for: 10m
        labels:
          severity: warning
          service: minio
        annotations:
          summary: "MinIO free disk space below 20%"
          description: "MinIO free space: {{ $value | humanizePercentage }}"
      # R1 Review: add a 5% critical disk alert as a last line of defense
      - alert: MinioDiskUsageCritical
        expr: minio_cluster_capacity_usable_free_bytes / minio_cluster_capacity_usable_total_bytes < 0.05
        for: 5m
        labels:
          severity: critical
          service: minio
        annotations:
          summary: "MinIO free disk space below 5% (urgent)"
          description: "MinIO free space: {{ $value | humanizePercentage }}; backups are about to fail"
      # MinIO offline disks
      - alert: MinioOfflineDisk
        expr: minio_cluster_disk_offline_total > 0
        for: 5m
        labels:
          severity: critical
          service: minio
        annotations:
          summary: "MinIO has offline disks"
          description: "Offline disk count: {{ $value }}"
  - name: kali-alerts
    rules:
      # Kali Scanner unreachable
      - alert: KaliScannerDown
        expr: probe_success{job="blackbox-kali"} == 0
        for: 10m
        labels:
          severity: warning
          service: kali-scanner
        annotations:
          summary: "Kali Scanner API unreachable"
          description: "TCP probe to 192.168.0.112:8080 has failed for more than 10 minutes"
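# -----------------------------------------------------------------------------
# Sketch only (not part of the deployed rules): the `for:` windows and the
# 20% / 5% thresholds in this file could be checked with `promtool test rules`.
# The test file name and the instance label below are hypothetical, and the
# block is commented out so this rules file stays loadable. Only "must not
# fire" cases are asserted, so the expectations do not depend on the
# annotation text.
#
# # minio-kali-alerts.test.yaml (hypothetical file name)
# rule_files:
#   - minio-kali-alerts.yaml
# evaluation_interval: 1m
# tests:
#   - interval: 1m
#     input_series:
#       # MinIO target reported down for the whole window (assumed instance label)
#       - series: 'up{job="minio", instance="minio:9000"}'
#         values: '0+0x10'
#       # 10% free: below the 20% warning line, above the 5% critical line
#       - series: 'minio_cluster_capacity_usable_free_bytes'
#         values: '10+0x20'
#       - series: 'minio_cluster_capacity_usable_total_bytes'
#         values: '100+0x20'
#     alert_rule_test:
#       # 4m is shorter than `for: 5m`, so MinioDown is still pending
#       - eval_time: 4m
#         alertname: MinioDown
#         exp_alerts: []
#       # 0.10 is not below 0.05, so the critical disk alert stays silent
#       - eval_time: 15m
#         alertname: MinioDiskUsageCritical
#         exp_alerts: []
# -----------------------------------------------------------------------------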