fix(auto-repair): Bug #5+#6 - SSH binary + affected_services matching fix
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled

Bug #5 (webhooks.py): target_resource now prefers the component label
  - The SentryDown alert carries labels.component="sentry"
  - Old logic: labels.instance="192.168.0.110:9000" → did not match the Playbook's affected_services
  - New logic: component → pod → instance → alertname
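
The new lookup order can be sketched as below. This is a minimal illustration, not the code in webhooks.py: `resolve_target_resource` is a hypothetical helper, and the labels are passed as a plain dict instead of the real alert model.

```python
def resolve_target_resource(labels: dict[str, str], alertname: str) -> str:
    """Return the first non-empty value in priority order (hypothetical helper).

    Order: component -> pod -> instance -> alertname.
    """
    return (
        labels.get("component")
        or labels.get("pod")
        or labels.get("instance")
        or alertname
    )


# Labels modeled on the SentryDown alert described above.
sentry_labels = {"component": "sentry", "instance": "192.168.0.110:9000"}
print(resolve_target_resource(sentry_labels, "SentryDown"))
```

With the component label present, the result is the service name rather than the host:port pair, which is the value a Playbook's affected_services entry is expected to match.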

Bug #6 (Dockerfile): python:3.11-slim does not ship openssh-client
  - The SSH_COMMAND Playbook execution path calls asyncio.create_subprocess_exec("ssh", ...)
  - The image has no ssh binary → every SSH repair is guaranteed to fail
  - Fix: install openssh-client in the production stage
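
The failure mode can be illustrated with a small pre-flight check. This is a sketch under assumptions, not part of this commit: `run_ssh` and its parameters are hypothetical names, and the real Playbook executor may build its command differently. Without such a check, `asyncio.create_subprocess_exec` itself raises `FileNotFoundError` when the binary is missing.

```python
import asyncio
import shutil


async def run_ssh(host: str, command: str, ssh_binary: str = "ssh") -> int:
    """Run a remote command over SSH and return its exit code (hypothetical helper).

    Fails early with a clear message when the ssh client is not installed,
    which is the situation in an image built from python:3.11-slim without
    openssh-client.
    """
    if shutil.which(ssh_binary) is None:
        raise FileNotFoundError(
            f"{ssh_binary!r} not found on PATH; install openssh-client in the image"
        )
    proc = await asyncio.create_subprocess_exec(ssh_binary, host, command)
    return await proc.wait()
```

Checking with `shutil.which` before spawning the process turns a generic spawn error into a message that points directly at the missing package.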

Service registry: add the main sentry service to service-registry.yaml (AUTO level)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
OG T
2026-04-09 14:11:50 +08:00
parent 73ef9c6b12
commit 1fb0c0ca90
3 changed files with 20 additions and 1 deletions


@@ -56,6 +56,10 @@ COPY apps/api/models.json ./models.json
# 2026-04-09 ogt: rule engine config; alert_rule_engine.py loads its rules from this file
COPY apps/api/alert_rules.yaml ./alert_rules.yaml
+# Install openssh-client: the SSH_COMMAND Playbook execution path needs the ssh binary
+# (2026-04-09 Claude Sonnet 4.6 Asia/Taipei, Bug #6 fix: python:3.11-slim lacks openssh-client)
+RUN apt-get update && apt-get install -y --no-install-recommends openssh-client && rm -rf /var/lib/apt/lists/*
# Create non-root user
RUN useradd -m -u 1000 appuser && chown -R appuser:appuser /app
USER appuser


@@ -1197,7 +1197,15 @@ async def alertmanager_webhook(
"warning"
)
-target_resource = alert.labels.get("pod") or alert.labels.get("instance") or alertname
+# Prefer the component label (Docker-level alerts carry component, e.g. SentryDown → "sentry"),
+# then pod (K8s alerts), then instance (blackbox probe), and finally alertname
+# (2026-04-09 Claude Sonnet 4.6 Asia/Taipei, Bug #5 fix: affected_services now matches the Playbook)
+target_resource = (
+    alert.labels.get("component")
+    or alert.labels.get("pod")
+    or alert.labels.get("instance")
+    or alertname
+)
namespace = alert.labels.get("namespace", "default")
message = alert.annotations.get("summary") or alert.annotations.get("description") or alertname


@@ -157,6 +157,13 @@ services:
     stateful_level: AUTO
     containers: ["blackbox-exporter"]
+  - name: sentry
+    display_name: "Sentry (Error Tracking)"
+    host: "192.168.0.110"
+    stateful_level: AUTO
+    reason: "Web server is stateless; docker compose up -d is enough to recover"
+    containers: ["sentry-web", "sentry-worker", "sentry-cron"]
   - name: langfuse
     display_name: "Langfuse (LLMOps)"
     host: "192.168.0.110"