@@ -224,17 +224,21 @@ jobs:
       # ── Health check (H3: dual verification of HTTP plus three-container status) ─────────────────────────
       - name: 健康檢查
         run: |
-          echo "⏳ 等待服務啟動(15s)..."
-          sleep 15
-          for i in $(seq 1 5); do
-            HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" https://mo.wooo.work/health --max-time 10 || echo "000")
-            if [ "$HTTP_CODE" = "200" ]; then
-              echo "✅ HTTP 健康檢查通過(HTTP $HTTP_CODE)"
+          echo "⏳ 等待服務啟動(30s)..."
+          sleep 30
+          for i in $(seq 1 12); do
+            INTERNAL_CODE=$(ssh -i ~/.ssh/id_deploy ollama@192.168.0.188 \
+              "docker exec momo-pro-system curl -s -o /dev/null -w '%{http_code}' --max-time 8 http://127.0.0.1:80/health" 2>/dev/null || true)
+            EXTERNAL_CODE=$(curl -s -o /dev/null -w "%{http_code}" https://mo.wooo.work/health --max-time 10 2>/dev/null || true)
+            INTERNAL_CODE=${INTERNAL_CODE:-000}
+            EXTERNAL_CODE=${EXTERNAL_CODE:-000}
+            if [ "$INTERNAL_CODE" = "200" ] && [ "$EXTERNAL_CODE" = "200" ]; then
+              echo "✅ HTTP 健康檢查通過(internal=$INTERNAL_CODE, external=$EXTERNAL_CODE)"
               break
             fi
-            echo "⏳ 嘗試 $i/5,HTTP $HTTP_CODE,等待 10s..."
-            [ "$i" -eq 5 ] && echo "❌ HTTP 健康檢查失敗" && exit 1
-            sleep 10
+            echo "⏳ 嘗試 $i/12,internal=$INTERNAL_CODE external=$EXTERNAL_CODE,等待 15s..."
+            [ "$i" -eq 12 ] && echo "❌ HTTP 健康檢查失敗" && exit 1
+            sleep 15
           done
           # Verify that all three application containers are in the Running state
           ssh -i ~/.ssh/id_deploy ollama@192.168.0.188 \
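As a rough check on the "about 3 minutes" retry window mentioned in the accompanying docs, the timing of the old and new loops can be worked out from the values in this hunk. This is a minimal sketch: `min_wait_seconds` and `max_wait_seconds` are hypothetical helper names, and the upper bound assumes each probe is limited only by curl's `--max-time` (SSH connection time is not bounded by that flag, so the real worst case may be longer).

```python
def min_wait_seconds(initial_sleep: int, attempts: int, retry_sleep: int) -> int:
    """Total sleep time before the loop gives up, ignoring probe duration.

    The loop sleeps after every failed attempt except the last one,
    which exits instead, so there are (attempts - 1) retry sleeps.
    """
    return initial_sleep + (attempts - 1) * retry_sleep


def max_wait_seconds(initial_sleep: int, attempts: int, retry_sleep: int,
                     per_attempt_timeout: int) -> int:
    """Upper bound, assuming every probe runs into its curl --max-time."""
    return (min_wait_seconds(initial_sleep, attempts, retry_sleep)
            + attempts * per_attempt_timeout)


# Old loop: sleep 15, 5 attempts, sleep 10, one curl with --max-time 10.
print(min_wait_seconds(15, 5, 10))          # 55
print(max_wait_seconds(15, 5, 10, 10))      # 105

# New loop: sleep 30, 12 attempts, sleep 15, two curls (--max-time 8 and 10).
print(min_wait_seconds(30, 12, 15))         # 195
print(max_wait_seconds(30, 12, 15, 8 + 10)) # 411
```

Under these assumptions the new loop waits at least 195 seconds (about 3.25 minutes) before failing, compared with 55 seconds for the old one.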
@@ -2,7 +2,7 @@
 
 > This document defines the project's core development principles and inviolable rules
 > **Created**: 2026-01-12
-> **Current version**: V10.11 (four AI Agent automation metrics scrape fix release)
+> **Current version**: V10.12 (CD health check hardening release)
 > **Last updated**: 2026-04-29
 
 ---
@@ -18,6 +18,7 @@
 - Grafana live deployment: the active Grafana on 188 has loaded 4 dashboards, and `MOMO AI Automation Overview` provisioning succeeded.
 - Prometheus scrape fix: the active monitoring stack adds a `momo-app` scrape job targeting `momo-pro-system:80/metrics`.
 - Gunicorn preload fix: `post_fork` skips Flask/Werkzeug request-bound LocalProxy objects to avoid worker boot failures.
+- CD health check hardening: switched to a dual check of internal container health plus external `mo.wooo.work`, with the retry window extended to about 3 minutes.
 
 【Next to-dos】
 - Once Prometheus is scraping, observe whether `momo_ai_*` time series appear after events occur.
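The dual-check retry behaviour described in the notes above can be sketched in Python for illustration. This is not the workflow's actual implementation (that is the shell loop in the CD workflow); `wait_until_healthy` and its probe parameters are hypothetical names, and the probes are injected so the logic can be exercised without SSH or network access.

```python
import time
from typing import Callable


def wait_until_healthy(
    probe_internal: Callable[[], str],
    probe_external: Callable[[], str],
    attempts: int = 12,
    retry_sleep: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once both probes report HTTP 200, False after all attempts."""
    for attempt in range(1, attempts + 1):
        # An empty result (probe failed outright) is treated as "000",
        # mirroring the ${VAR:-000} defaults in the shell version.
        internal = probe_internal() or "000"
        external = probe_external() or "000"
        if internal == "200" and external == "200":
            return True
        # No sleep after the final attempt: the caller should fail right away.
        if attempt < attempts:
            sleep(retry_sleep)
    return False


# Example: the container is healthy immediately, but the external URL
# returns 502 twice (a short warm-up after a rebuild) before recovering.
external_codes = iter(["502", "502", "200"])
slept = []
ok = wait_until_healthy(lambda: "200", lambda: next(external_codes),
                        sleep=slept.append)
```

In this example the check succeeds on the third attempt after two 15-second waits, which is the scenario the longer retry window is meant to tolerate.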
4
app.py
@@ -95,8 +95,8 @@ except Exception as e:
     sys_log.error(f"無法檢測磁碟空間: {e}")
 
 # 🚩 System version definition (used for backups and display)
-# 🚩 2026-04-30 V10.11: Gunicorn preload guard + AI metrics scrape
-SYSTEM_VERSION = "V10.11"
+# 🚩 2026-04-30 V10.12: CD health check internal/external hardening
+SYSTEM_VERSION = "V10.12"
 
 # ==========================================
 # 🔒 SQL injection protection functions
@@ -253,7 +253,7 @@ YOUTUBE_API_KEY = os.getenv('YOUTUBE_API_KEY', '')
 # ==========================================
 # System version and paths
 # ==========================================
-SYSTEM_VERSION = "V10.11"
+SYSTEM_VERSION = "V10.12"
 LOG_FILE_PATH = os.path.join(BASE_DIR, 'logs/system.log')
 public_url = PUBLIC_URL  # used for template display
 
@@ -63,6 +63,7 @@
 - **Cause**: The SSH tunnel between 110 and 188 is down.
 - **Check**: Run `curl -I http://127.0.0.1:5003/health` on 110.
 - **Fix**: Run `ssh -fN -L 5003:127.0.0.1:5003 ollama@192.168.0.188` on 110 to restart the tunnel.
+- **CD diagnosis**: First check the internal endpoint on 188 with `docker exec momo-pro-system curl http://127.0.0.1:80/health`, then the external `https://mo.wooo.work/health`; if internal already returns 200 but external returns 502, it is most likely a brief Nginx/tunnel delay.
 
 ### 2. CI/CD error `parent snapshot ... not found`
 - **Cause**: The Docker Buildx cache is corrupted.
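The CD diagnosis rule in the troubleshooting entry above can be expressed as a small decision function. This is a sketch only: `diagnose` is a hypothetical helper, and the final branch (internal check failing) is an assumption about what to inspect next; the source only spells out the "internal 200, external 502" case.

```python
def diagnose(internal_code: str, external_code: str) -> str:
    """Map the two health-check status codes to a likely cause."""
    if internal_code == "200" and external_code == "200":
        return "healthy"
    if internal_code == "200":
        # The app answers inside the container, so the fault sits in front of it.
        return "internal ok, external failing: likely a brief Nginx/tunnel delay"
    # Assumption: when the container itself is not answering, start there.
    return "internal failing: inspect the app container first"


print(diagnose("200", "502"))
```

Checking the internal endpoint first separates application failures from proxy or tunnel failures, which is why the entry lists it before the external URL.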
@@ -18,6 +18,7 @@
 - 2026-04-30: The active Grafana has loaded 4 dashboards; the AI dashboard file is synced to the directory actually mounted on 188, `monitoring/grafana/provisioning/dashboards/json/`.
 - 2026-04-30: The active Prometheus gains a `momo-app` scrape job targeting `momo-pro-system:80/metrics`; Prometheus must join `momo-network` to resolve the app container's DNS name.
 - 2026-04-30: Found and fixed an issue where `post_fork` in `gunicorn.conf.py` scanned Flask/Werkzeug LocalProxy objects and caused worker boot failures.
+- 2026-04-30: The CD health check used to fail too early on a brief 502 after a rebuild; it now uses a dual check, internal `docker exec momo-pro-system /health` plus external `https://mo.wooo.work/health`, retrying for about 3 minutes.
 
 ## Implemented scope
 
@@ -48,6 +49,7 @@
 - 2026-04-29 AI Grafana observability + AI core regression: `36 passed`; collect-only: `36 tests collected`.
 - 2026-04-30 Gunicorn LocalProxy fix: added `tests/test_gunicorn_config.py`.
 - 2026-04-30 Prometheus scrape fix: added `tests/test_prometheus_ai_automation_scrape.py`.
+- 2026-04-30 CD health check hardening: added `tests/test_cd_health_check.py`.
 - 2026-04-29 L2 safe memory batch: `24 passed`.
 - collect-only: `48 tests collected`.
 - `git diff --check` passed.
@@ -31,6 +31,7 @@
 - **Daily smoke summary push**: Added a manual Telegram push API and a daily 09:10 summary task in momo-scheduler that only reads smoke history.
 - **Grafana AI observability**: Added the `MOMO AI Automation Overview` provisioning dashboard, covering EventRouter, safe action, replay, and AutoHeal Prometheus metrics.
 - **Grafana live loading and scrape fix**: The active Grafana on 188 loads 4 dashboards; the active Prometheus gains a `momo-app` scrape job, and the gunicorn preload LocalProxy boot crash is fixed.
+- **CD health check hardening**: The Gitea Actions health check now performs a dual check of internal container health plus the external URL, reducing false failures from brief 502s after a rebuild.
 
 ### 2026-04-28~29: Phase 3e refactoring battle + daily_sales cache hidden bug eradicated
 - **app.py reduced by 10.8%**: 7,386 → 6,590 lines, 11 commits all green with zero 502s.
22
tests/test_cd_health_check.py
Normal file
@@ -0,0 +1,22 @@
+from pathlib import Path
+
+
+ROOT = Path(__file__).resolve().parents[1]
+CD_WORKFLOW = ROOT / ".gitea/workflows/cd.yaml"
+
+
+def test_cd_health_check_allows_slow_rebuild_warmup():
+    workflow = CD_WORKFLOW.read_text(encoding="utf-8")
+
+    assert "等待服務啟動(30s)" in workflow
+    assert "seq 1 12" in workflow
+    assert "等待 15s" in workflow
+
+
+def test_cd_health_check_validates_internal_and_external_health():
+    workflow = CD_WORKFLOW.read_text(encoding="utf-8")
+
+    assert "docker exec momo-pro-system curl" in workflow
+    assert "http://127.0.0.1:80/health" in workflow
+    assert "https://mo.wooo.work/health" in workflow
+    assert 'internal=$INTERNAL_CODE, external=$EXTERNAL_CODE' in workflow