fix(auto-repair): Bug #5+#6: SSH binary + affected_services matching fix
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
Bug #5 (webhooks.py): target_resource now prefers the component label
- The SentryDown alert carries labels.component="sentry"
- Old logic: labels.instance="192.168.0.110:9000" → does not match the Playbook's affected_services
- New logic: component → pod → instance → alertname

Bug #6 (Dockerfile): python:3.11-slim has no openssh-client
- The SSH_COMMAND Playbook execution path calls asyncio.create_subprocess_exec("ssh", ...)
- The image has no ssh binary → every SSH repair is bound to fail
- Fix: install openssh-client in the production stage

Service list: add the main sentry service to service-registry.yaml (AUTO level)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -56,6 +56,10 @@ COPY apps/api/models.json ./models.json
 # 2026-04-09 ogt: rule engine config; alert_rule_engine.py loads its rules from this file
 COPY apps/api/alert_rules.yaml ./alert_rules.yaml
 
+# Install openssh-client: the SSH_COMMAND Playbook execution path needs the ssh binary
+# (2026-04-09 Claude Sonnet 4.6 Asia/Taipei, Bug #6 fix: python:3.11-slim has no openssh-client)
+RUN apt-get update && apt-get install -y --no-install-recommends openssh-client && rm -rf /var/lib/apt/lists/*
+
 # Create non-root user
 RUN useradd -m -u 1000 appuser && chown -R appuser:appuser /app
 USER appuser
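Without an ssh executable in the image, asyncio.create_subprocess_exec("ssh", ...) fails with FileNotFoundError before any connection is attempted. The following is a minimal sketch of a pre-flight check that turns that into an explicit error. The names SshBinaryMissing, find_ssh_binary, and run_ssh_command are hypothetical and do not come from the repository; the real SSH_COMMAND execution path may be structured differently.

```python
import asyncio
import shutil


class SshBinaryMissing(RuntimeError):
    """Raised when no ssh executable can be found on PATH (hypothetical type)."""


def find_ssh_binary(name: str = "ssh") -> str:
    """Return the absolute path of the ssh client, or raise a clear error."""
    path = shutil.which(name)
    if path is None:
        raise SshBinaryMissing(
            f"{name!r} not found on PATH; install openssh-client in the image"
        )
    return path


async def run_ssh_command(host: str, command: str, binary: str = "ssh") -> tuple[int, str]:
    """Run `command` on `host` over SSH and return (exit code, stdout).

    Assumption: key-based authentication is already configured for `host`.
    """
    ssh = find_ssh_binary(binary)  # fail fast if the image lacks the binary
    proc = await asyncio.create_subprocess_exec(
        ssh, host, command,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, _ = await proc.communicate()
    return proc.returncode, stdout.decode()
```

A check like this would have surfaced Bug #6 as a single descriptive error instead of a failed repair on every SSH Playbook.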
@@ -1197,7 +1197,15 @@ async def alertmanager_webhook(
     "warning"
 )
 
-target_resource = alert.labels.get("pod") or alert.labels.get("instance") or alertname
+# Prefer the component label (Docker-level alerts use component, e.g. SentryDown → "sentry")
+# Then pod (K8s alerts), then instance (blackbox probe), and finally alertname
+# (2026-04-09 Claude Sonnet 4.6 Asia/Taipei, Bug #5 fix: match the Playbook's affected_services)
+target_resource = (
+    alert.labels.get("component")
+    or alert.labels.get("pod")
+    or alert.labels.get("instance")
+    or alertname
+)
 namespace = alert.labels.get("namespace", "default")
 message = alert.annotations.get("summary") or alert.annotations.get("description") or alertname
 
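The precedence change in this hunk can be sketched in isolation. The label values below come from the commit message. The helper name resolve_target_resource, the LABEL_PRECEDENCE tuple, and the assumption that a Playbook matches by simple membership in an affected_services list are all hypothetical; the actual matching logic is not shown in this commit.

```python
from typing import Mapping

# Precedence introduced by Bug #5's fix: component, then pod, then instance.
LABEL_PRECEDENCE = ("component", "pod", "instance")


def resolve_target_resource(labels: Mapping[str, str], alertname: str) -> str:
    """Return the first non-empty label in precedence order, else alertname."""
    for key in LABEL_PRECEDENCE:
        value = labels.get(key)
        if value:  # mirrors the `or` chain: missing and empty labels are skipped
            return value
    return alertname


# Labels of the SentryDown alert as described in the commit message.
sentry_labels = {"component": "sentry", "instance": "192.168.0.110:9000"}

# Hypothetical Playbook field; assumed here to be a plain list of service names.
affected_services = ["sentry"]

# New logic picks the component label; the old chain started at pod.
new_target = resolve_target_resource(sentry_labels, "SentryDown")
old_target = sentry_labels.get("pod") or sentry_labels.get("instance") or "SentryDown"

print(new_target in affected_services)  # membership test under the assumed matching rule
print(old_target in affected_services)
```

Under these assumptions the old chain yields the host:port string, which cannot equal a service name, while the new chain yields the service name directly.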
@@ -157,6 +157,13 @@ services:
     stateful_level: AUTO
     containers: ["blackbox-exporter"]
 
+  - name: sentry
+    display_name: "Sentry (Error Tracking)"
+    host: "192.168.0.110"
+    stateful_level: AUTO
+    reason: "Web server is stateless; docker compose up -d is enough to recover"
+    containers: ["sentry-web", "sentry-worker", "sentry-cron"]
+
   - name: langfuse
     display_name: "Langfuse (LLMOps)"
     host: "192.168.0.110"