OG T
|
a421d2c5b8
|
feat(ops): Plan A docker-health-monitor.sh — Docker 容器健康監控自動修復
- 偵測 unhealthy / exited / dead 容器
- 排除清單: DB(PG/Redis)、Gitea、監控棧
- Prometheus/Grafana/Alertmanager exited → docker start (保護 WAL)
- 必須三段式通知: Intent→Action→Result (首席架構師裁示)
- HMAC-SHA256 簽章 → AWOOOI API /api/v1/webhooks/custom-alert
- Fallback: API down → 直接 Telegram Bot API
- 冷卻期 300s,防止重複修復
部署: cron */5 * * * * on 192.168.0.110 + 192.168.0.188
設定: /etc/awoooi-ops/secrets.env
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-08 11:48:39 +08:00 |
|
OG T
|
b688eeecb7
|
fix(ops): seed 腳本支援 API_BASE 環境變數
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 12:23:55 +08:00 |
|
OG T
|
76f7330c9d
|
feat(api): POST /playbooks/ 建立端點 + seed-repair-playbooks.py (Task 14)
CD Pipeline / build-and-deploy (push) Failing after 57s
CD Pipeline / Deploy Prometheus Alert Rules (push) Has been skipped
- playbooks.py: 新增 POST / 端點供直接建立 Playbook (seed/管理用)
- seed-repair-playbooks.py: 5個 Host Repair Playbooks (ssh_command)
sentry/harbor/gitea/alertmanager (110) + openclaw (188)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 11:53:49 +08:00 |
|
OG T
|
7becdcbaf6
|
ops(scripts): 加入 deploy-alerts.sh 自動部署 Prometheus 規則
功能: 驗證 YAML → 備份 → scp → reload → 驗證規則數+關鍵規則
同步啟用 Prometheus --web.enable-lifecycle (110 docker-compose.yml)
部署驗證: 28 條規則全部 ✅,關鍵規則 SentryDown/HarborDown/GiteaDown/OpenClawDown/AlertmanagerDown/AlertChainUnhealthy 已上線
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 02:29:21 +08:00 |
|