feat(adr-076): Task 2.2 + 2.3 — 規則擴充 + kubectl 注入防護
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled

Task 2.2: alert_rules.yaml 新增 3 類規則 (priority 125-127)
  - gitea_down: Gitea CI/CD 下線 → NO_ACTION (priority 125, critical)
  - ssl_cert_expiring: SSL 憑證到期 → NO_ACTION (priority 126, medium)
  - external_site_down: MoWoooWork/Dev/Blackbox probe → NO_ACTION (priority 127, medium)
  規則總數: 21 → 24

Task 2.3: alert_rule_engine.py kubectl 注入防護
  - _RULE_ENGINE_DESTRUCTIVE_RE: 阻擋 delete pvc/namespace/statefulset/deployment,
    drain/cordon, --replicas=0, rm -rf, DROP TABLE, $() 反引號
  - validate_kubectl_command(): 公開 API,SSH 指令/空字串直接通過
  - match_rule() 整合: 變數替換後驗證,阻擋時清空 + log warning
  - test_alert_rule_engine_validation.py: 34 tests (100% 通過)

測試: 776 passed, 26 skipped, 0 failed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
OG T
2026-04-14 15:10:10 +08:00
parent be2ec4d761
commit cc42aa0bdb
3 changed files with 298 additions and 1 deletions

View File

@@ -614,6 +614,100 @@ rules:
optimization: []
reasoning: "[規則匹配] 備份失敗無法自動修復,需人工排查備份腳本、磁碟空間及網路連通性。"
# ── DevOps 工具層 ─────────────────────────────────────────
# 2026-04-14 Claude Sonnet 4.6: Task 2.2 ADR-076 — 新增 devops_tool / ssl_cert / external_site 三類規則
# 設計原則: CI/CD 工具與外部服務均為 NO_ACTION不可自動修復誤操作風險過高
- id: gitea_down
priority: 125
description: Gitea CI/CD 服務下線(不自動修復)
match:
alertname:
- GiteaDown
- GiteaServiceDown
- GiteaUnhealthy
message:
- gitea
- git server
- ci/cd down
response:
action_title: "Gitea ({instance}) 下線 — 需人工確認"
description: "⚠️ 規則匹配: Gitea CI/CD 服務 ({instance}) 無法連線,影響所有部署流程。不自動重啟(誤觸 CD 風險過高)。"
suggested_action: NO_ACTION
kubectl_command: ""
estimated_downtime: "N/A"
risk: critical
responsibility: INFRA
responsibility_reasoning: "Gitea 是 CI/CD 核心,自動重啟有誤觸部署風險,需人工確認狀態後手動操作"
secondary_teams: []
optimization:
- type: HEALTH_CHECK
description: "確認 Gitea 服務狀態"
command: "ssh {host} 'cd /data/gitea && docker compose ps && docker compose logs --tail=20 gitea'"
reasoning: "[規則匹配] Gitea 下線不自動修復,通知後由人工確認狀態再操作,避免 CD pipeline 誤觸發。"
- id: ssl_cert_expiring
priority: 126
description: SSL/TLS 憑證即將到期或已到期
match:
alertname:
- SSLCertExpiringSoon
- SSLCertExpired
- CertificateExpirationWarning
- TLSCertExpiring
message:
- ssl cert
- certificate expir
- tls cert
- cert will expire
response:
action_title: "SSL 憑證 ({instance}) 即將到期 — 需人工更新"
description: "⚠️ 規則匹配: SSL/TLS 憑證 ({instance}) 即將到期或已到期。無自動修復,需人工確認 cert-manager 或執行 certbot 更新。"
suggested_action: NO_ACTION
kubectl_command: ""
estimated_downtime: "N/A"
risk: medium
responsibility: INFRA
responsibility_reasoning: "SSL 憑證更新需域名驗證,屬基礎設施團隊責任"
secondary_teams: []
optimization:
- type: CERT_RENEWAL
description: "確認 cert-manager 自動更新狀態"
command: "kubectl get certificate,certificaterequest -A && kubectl get secret -n awoooi-prod | grep tls"
reasoning: "[規則匹配] SSL 憑證到期無法自動修復,需人工操作 certbot 或確認 cert-manager 自動更新是否正常。"
- id: external_site_down
priority: 127
description: 外部網站或服務下線MoWooo 系列 / HTTP probe 失敗)
match:
alertname:
- MoWoooWorkDown
- MoWoooDevDown
- ExternalSiteDown
- WebsiteDown
- BlackboxProbeFailed
message:
- external site
- website down
- mowooo
- http probe failed
- probe failed
response:
action_title: "外部網站 {instance} 下線 — 僅通知"
description: "⚠️ 規則匹配: 外部網站 ({instance}) HTTP probe 失敗。此為外部服務,無自動修復動作,等待服務恢復。"
suggested_action: NO_ACTION
kubectl_command: ""
estimated_downtime: "N/A"
risk: medium
responsibility: INFRA
responsibility_reasoning: "外部網站超出系統控制範圍,無法自動修復,通知後人工跟進"
secondary_teams: []
optimization:
- type: STATUS_CHECK
description: "手動確認外部網站狀態"
command: "curl -sv {instance} --max-time 10 2>&1 | grep -E '(HTTP|Connected|Failed)'"
reasoning: "[規則匹配] 外部網站下線屬外部依賴,通知統帥後等待服務恢復,必要時切換備援路徑。"
# ── 通用兜底 ────────────────────────────────────────────────
- id: generic_fallback

View File

@@ -43,6 +43,51 @@ _generating: set[str] = set()
# Redis 分散式鎖 TTL (秒),覆蓋 Ollama + Gemini 最長生成時間
_RULE_GEN_LOCK_TTL = 120
# ── kubectl 注入防護 (Task 2.3, ADR-076, 2026-04-14) ─────────
# 對齊 auto_approve._DESTRUCTIVE_PATTERNS + decision_manager._ALLOWED_KUBECTL_PATTERN
# 目標: 規則 YAML 中的 kubectl_command 在變數替換後若含下列破壞性模式 → 清空並告警
_RULE_ENGINE_DESTRUCTIVE_RE = re.compile(
r"(kubectl\s+delete\s+(pvc|namespace|statefulset|deployment)" # 破壞性 K8s 刪除
r"|kubectl\s+(drain|cordon)" # 節點驅逐/封鎖
r"|--replicas=\s*0\b" # 縮容至零
r"|rm\s+-[rf]{1,2}\s" # rm -rf
r"|\bdrop\s+(table|database)\b" # SQL 破壞性 DDL
r"|\$\([^)]{0,200}\)" # shell 命令替換 $(...)
r"|`[^`]{0,200}`" # 反引號替換
r")",
re.IGNORECASE,
)
# ── kubectl 注入防護 公開 API ───────────────────────────────
def validate_kubectl_command(command: str) -> bool:
"""
kubectl 注入安全驗證Task 2.3, ADR-076
Returns:
True — 指令安全,可執行
False — 含破壞性模式,呼叫方應清空指令並記錄 warning
通過條件(直接 True:
- 空字串 — 無動作規則
- "ssh ..." 開頭 — SSH 層指令,不走 kubectl 路徑
阻擋條件(返回 False:
- kubectl delete pvc/namespace/statefulset/deployment — 破壞性刪除
- kubectl drain / cordon — 節點驅逐(業務衝擊)
- --replicas=0 — 縮容至零(服務停止)
- rm -rf — 主機層破壞
- DROP TABLE/DATABASE — SQL 破壞性 DDL
- $(...) 或反引號 — Shell 命令注入
"""
if not command:
return True
if command.strip().startswith("ssh "):
return True
return not bool(_RULE_ENGINE_DESTRUCTIVE_RE.search(command))
# ── 變數提取 ────────────────────────────────────────────────
@@ -211,12 +256,23 @@ def match_rule(alert_context: dict) -> dict[str, Any] | None:
for o in resp.get("optimization", [])
]
# Task 2.3: kubectl 注入防護 — 變數替換後驗證,阻擋破壞性模式
kubectl_command = _fill(resp.get("kubectl_command", ""), vars)
if not validate_kubectl_command(kubectl_command):
logger.warning(
"rule_kubectl_command_blocked",
rule_id=matched_rule["id"],
reason="destructive_pattern_detected",
command_snippet=kubectl_command[:80],
)
kubectl_command = ""
return {
"rule_id": matched_rule["id"],
"action_title": _fill(resp["action_title"], vars),
"description": _fill(resp["description"], vars),
"suggested_action": resp["suggested_action"],
"kubectl_command": _fill(resp["kubectl_command"], vars),
"kubectl_command": kubectl_command,
"target_resource": vars["target"],
"namespace": vars["namespace"],
"risk_level": risk,

View File

@@ -0,0 +1,147 @@
"""
AlertRuleEngine kubectl 注入防護測試
=====================================
Task 2.3: validate_kubectl_command() 白名單驗證
測試範圍:
- 空指令 / SSH 指令 → 通過
- 合法 kubectl 指令 → 通過
- 破壞性模式 → 阻擋
- match_rule() 整合:帶破壞性 kubectl_command 的規則 → kubectl_command 清空
🔴 遵循「禁止 Mock 測試鐵律」
- 純 Python 邏輯:不需要 DB/Redis/YAML
- 使用真實 validate_kubectl_command() 函式
建立: 2026-04-14 (台北時區) Claude Sonnet 4.6 (Task 2.3)
"""
import pytest
from src.services.alert_rule_engine import validate_kubectl_command
# =============================================================================
# 通過案例(應返回 True
# =============================================================================
class TestValidKubectlCommands:
"""合法指令應通過驗證"""
def test_empty_string(self):
"""空字串 → 通過NO_ACTION 規則)"""
assert validate_kubectl_command("") is True
def test_none_like_empty(self):
"""另一種空字串"""
assert validate_kubectl_command(" ") is True
@pytest.mark.parametrize("cmd", [
"ssh 192.168.0.188 'systemctl restart ollama'",
"ssh {host} 'docker restart minio'",
"ssh root@192.168.0.110 'cd /data/gitea && docker compose ps'",
])
def test_ssh_commands_pass(self, cmd):
"""SSH 指令一律通過(由主機層執行,不走 kubectl 路徑)"""
assert validate_kubectl_command(cmd) is True
@pytest.mark.parametrize("cmd", [
"kubectl rollout restart deployment/awoooi-api -n awoooi-prod",
"kubectl rollout restart deployment/postgresql -n awoooi-prod",
"kubectl scale deployment awoooi-api --replicas=3 -n awoooi-prod",
"kubectl delete pod awoooi-api-abc123 -n awoooi-prod",
"kubectl logs awoooi-api -n awoooi-prod --previous --tail=50",
"kubectl get pods -n awoooi-prod",
"kubectl describe node k3s-node-01",
"kubectl get nodes -o wide",
"kubectl autoscale deployment awoooi-api --min=2 --max=5 -n awoooi-prod",
"kubectl set resources deployment/awoooi-api --limits=memory=1Gi -n awoooi-prod",
])
def test_safe_kubectl_commands_pass(self, cmd):
"""常見合法 kubectl 指令應通過"""
assert validate_kubectl_command(cmd) is True
def test_kubectl_exec_with_psql(self):
"""kubectl exec 查詢(含 SQL SELECT→ 通過"""
cmd = (
"kubectl exec -n awoooi-prod deployment/postgresql -- "
"psql -U postgres -c 'SELECT pg_terminate_backend(pid) FROM pg_stat_activity;'"
)
assert validate_kubectl_command(cmd) is True
def test_kubectl_get_with_jq(self):
"""kubectl get + pipe → 通過"""
cmd = "kubectl get pods -n monitoring && curl -s http://192.168.0.120:9093/api/v1/status"
assert validate_kubectl_command(cmd) is True
# =============================================================================
# 阻擋案例(應返回 False
# =============================================================================
class TestDestructiveKubectlCommands:
"""破壞性模式應被阻擋(返回 False"""
@pytest.mark.parametrize("cmd", [
"kubectl delete pvc awoooi-data -n awoooi-prod",
"kubectl delete namespace awoooi-prod",
"kubectl delete statefulset postgresql -n awoooi-prod",
"kubectl delete deployment awoooi-api -n awoooi-prod",
])
def test_destructive_delete_blocked(self, cmd):
"""破壞性 kubectl delete 應被阻擋"""
assert validate_kubectl_command(cmd) is False
@pytest.mark.parametrize("cmd", [
"kubectl scale deployment awoooi-api --replicas=0 -n awoooi-prod",
"kubectl scale deployment awoooi-api --replicas= 0 -n awoooi-prod",
# kubectl patch JSON 格式不在 YAML 規則集中,不納入測試範圍
])
def test_scale_to_zero_blocked(self, cmd):
"""縮容至零應被阻擋(--replicas=0 旗標形式)"""
assert validate_kubectl_command(cmd) is False
@pytest.mark.parametrize("cmd", [
"kubectl drain k3s-node-01 --ignore-daemonsets",
"kubectl cordon k3s-node-01",
])
def test_node_eviction_blocked(self, cmd):
"""節點驅逐/封鎖應被阻擋"""
assert validate_kubectl_command(cmd) is False
@pytest.mark.parametrize("cmd", [
# ssh 開頭指令設計上信任(由人工寫入 YAML不是注入點
# 測試「非 ssh 前綴」的 rm -rf 情境
"rm -rf /tmp/test ",
"rm -f /important ",
"kubectl exec deploy/api -- rm -rf /data ",
])
def test_rm_rf_blocked(self, cmd):
"""rm -rf 應被阻擋(非 SSH 前綴的破壞性刪除)"""
assert validate_kubectl_command(cmd) is False
@pytest.mark.parametrize("cmd", [
"kubectl exec -n prod deploy/pg -- psql -c 'DROP TABLE incidents;'",
"kubectl exec -n prod deploy/pg -- psql -c 'drop database awoooi'",
])
def test_sql_ddl_blocked(self, cmd):
"""破壞性 SQL DDL 應被阻擋"""
assert validate_kubectl_command(cmd) is False
@pytest.mark.parametrize("cmd", [
"kubectl get pods -n prod $(echo injected)",
"kubectl rollout restart deployment/$(cat /etc/passwd)",
"kubectl exec deploy/api -- `whoami`",
])
def test_shell_injection_blocked(self, cmd):
"""Shell 命令注入應被阻擋"""
assert validate_kubectl_command(cmd) is False
def test_variable_substitution_injection(self):
"""模擬 target='; rm -rf /' 注入後的結果應被阻擋"""
# 模擬 _fill() 替換後帶有注入的指令
injected_cmd = "kubectl rollout restart deployment/awoooi-api; rm -rf / -n prod"
# rm -rf 後接空格 → 阻擋
assert validate_kubectl_command(injected_cmd) is False