40 Commits

Author SHA1 Message Date
OoO
938b9fe963 fix: correct CD sync detection and production version verification
All checks were successful
CD Pipeline / deploy (push) Successful in 1m5s
2026-05-17 21:01:33 +08:00
OoO
a6100a3d01 ci(observability): centralize deploy gate detection
All checks were successful
CD Pipeline / deploy (push) Successful in 3m2s
2026-05-05 23:47:34 +08:00
OoO
8cb82d4cd5 ci(observability): include QA entrypoints in deploy gate
Some checks failed
CD Pipeline / deploy (push) Has been cancelled
2026-05-05 23:43:34 +08:00
OoO
215bd9b73c ci(observability): verify CSS mirror instead of mutating runner
Some checks failed
CD Pipeline / deploy (push) Has been cancelled
2026-05-05 23:40:45 +08:00
OoO
4380fa641c ci(observability): gate frontend deploys with QA suite
Some checks failed
CD Pipeline / deploy (push) Has been cancelled
2026-05-05 23:39:00 +08:00
OoO
9bc6664dc0 fix(p37): cd.yaml SPA shadow grep pipefail bug: actually fixes the CD failure
All checks were successful
CD Pipeline / deploy (push) Successful in 2m29s
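Root cause that both P34 and P36 missed:
  ETAG=$(echo "$HDR" | grep -i '^etag:' | ...)
  When grep finds no match (mo.wooo.work /health carries no etag header),
  grep exits 1 → bash pipefail → the whole variable assignment exits 1 →
  set -e kills the entire script → runs 280/281 died at the same spot.

Fix: append `|| true` to the end of every grep pipeline as a fallback, so an empty result does not kill the script.

Local bash -eo pipefail simulation against the real prod /health response:
  ETAG=[] CLEN=[64] XPT=[]
  FLASK_OK=1 (CLEN=64 != 7480 triggers PASS)
 Expected: this step goes green on the next CD run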
2026-05-04 14:34:05 +08:00
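The `|| true` fallback described in 9bc6664dc0 can be sketched as below. This is a minimal illustration, not the actual cd.yaml step: the commit elides the tail of the pipeline, so the `awk`/`tr` stages and the sample headers here are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: tolerate "no match" from grep under `set -eo pipefail`.
set -eo pipefail

# Assumed sample headers with no etag, mirroring the case in the commit.
HDR=$(printf 'HTTP/1.1 200 OK\r\ncontent-length: 64\r\n')

# Without `|| true`, grep finding nothing exits 1, pipefail propagates that
# status to the pipeline, and set -e aborts the script at the assignment.
# The awk/tr stages are illustrative; the original pipeline tail is elided.
ETAG=$(echo "$HDR" | grep -i '^etag:' | awk '{print $2}' | tr -d '\r' || true)
CLEN=$(echo "$HDR" | grep -i '^content-length:' | awk '{print $2}' | tr -d '\r' || true)

echo "ETAG=[$ETAG] CLEN=[$CLEN]"
```

With these sample headers the script reaches the final `echo` with an empty `ETAG` instead of dying at the first assignment.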
OoO
64fe4fb651 fix(p36): fix bash -e exit bug in cd.yaml SPA shadow detection
Some checks failed
CD Pipeline / deploy (push) Failing after 2m27s
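Root cause of the run 280 failure: P34 wrote `[ -n "$XPT" ] && [ "$X" != "0" ] && FLASK_OK=1`.
With three tests chained by && under the bash -e mode of Gitea Actions, the first -n test
evaluating to false exits 1 (an empty XPT is the normal case, because mo.wooo.work /health carries no x-process-time).

Changed to an if/then/fi block: a plain conditional branch does not affect the exit code.

Verified that real prod is already working:
- mo.wooo.work/observability/ai_calls returns the 35700-byte Flask login redirect page
  (session cookie is set normally; 35700 != 7480 SPA shell)
- mo.wooo.work/admin/ai_calls returns 404 (correctly absent after the P32 rename)
My phases 27-35 are all live on prod; only the 192.168.0.188 LAN was being interfered with by another project.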
2026-05-04 14:30:18 +08:00
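The difference 64fe4fb651 relies on can be shown in isolation. The sketch below only compares exit statuses; the function names are hypothetical, and both tests use `XPT` for simplicity. Whether errexit itself fires on a failed && list depends on where the list sits in the script, but the list's own status is non-zero either way, which is enough to fail a step that ends on it.

```shell
#!/usr/bin/env bash
# Sketch: exit status of an && chain versus an if block when the first test is false.

XPT=""   # empty, as when /health carries no x-process-time header

check_with_chain() {
  FLASK_OK=0
  # The status of the list is the status of the first failing test.
  [ -n "$XPT" ] && [ "$XPT" != "0" ] && FLASK_OK=1
}

check_with_if() {
  FLASK_OK=0
  # An if with a false condition and no else branch returns 0.
  if [ -n "$XPT" ] && [ "$XPT" != "0" ]; then
    FLASK_OK=1
  fi
}

chain_status=0; check_with_chain || chain_status=$?
if_status=0;    check_with_if    || if_status=$?
echo "chain=$chain_status if=$if_status"
```

The chain reports status 1 and the if block reports 0, even though both leave `FLASK_OK` at 0.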
OoO
927d7072ce fix(p34): add SPA Shadow detection to cd.yaml to prevent false green from nginx fallback
Some checks failed
CD Pipeline / deploy (push) Has been cancelled
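The past 5 deploys (runs 273-277) all succeeded, yet Flask on prod never received a request:
nginx fell back to the SPA index.html (7480 bytes / etag e167a58a...) for every path,
and the original health check only looked at HTTP 200, so the SPA shell's 200 fooled it.

Added a third check stage (after the existing HTTP 200 retry + three-container verification):
check three fingerprints of the /health response; if any one differs from the SPA shell, Flask really received the request:
  (a) Content-Length != 7480
  (b) etag != e167a58a1baf907f55a2925a2e8665d1
  (c) x-process-time header is present (added by Flask middleware; nginx static responses do not carry it)

All three failing = SPA interception → push a Telegram alert + exit 1 (CD red).
When the TELEGRAM secrets are not set, the alert is skipped without blocking the deploy.

Fixes the past blind spot of "I push a commit, CD is all green, and prod is actually unaffected".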
2026-05-04 14:21:42 +08:00
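The three-fingerprint rule from 927d7072ce can be sketched as a small predicate. The function name `is_flask` is hypothetical, and treating an empty Content-Length or etag as "no evidence" rather than as a mismatch is an assumption of this sketch, not something the commit states.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a /health response came from Flask or from the SPA shell.

SPA_CLEN=7480
SPA_ETAG=e167a58a1baf907f55a2925a2e8665d1

# is_flask CLEN ETAG XPT
# Returns 0 if any fingerprint differs from the SPA shell, 1 if all three match it.
is_flask() {
  clen=$1; etag=$2; xpt=$3
  if [ -n "$clen" ] && [ "$clen" != "$SPA_CLEN" ]; then return 0; fi
  if [ -n "$etag" ] && [ "$etag" != "$SPA_ETAG" ]; then return 0; fi
  if [ -n "$xpt" ]; then return 0; fi   # header present at all counts as Flask
  return 1
}

if is_flask 64 "" ""; then RESULT_REAL=flask; else RESULT_REAL=spa; fi
if is_flask 7480 "$SPA_ETAG" ""; then RESULT_SHADOW=flask; else RESULT_SHADOW=spa; fi
echo "real=$RESULT_REAL shadow=$RESULT_SHADOW"
```

Using explicit if blocks keeps the predicate's exit status tied to its return statements rather than to the last test that happened to run.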
OoO
47fe375952 fix(ci): hotfix CD migration apply logic: run the full v5.0 range (024-099)
All checks were successful
CD Pipeline / deploy (push) Successful in 3m42s
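The Commander's Telegram error report on 2026-05-04: "ai_calls relation does not exist"

Root cause:
  The original cd.yaml logic, `git diff HEAD~1 HEAD -- migrations/`, only looks at a single commit.
  The v5.0 migrations 024-028 are in commit 4648673 (the earliest), and the 12 later commits
  contain no migrations → the CD "auto apply" step never triggered once.
  → ai_calls / mcp_calls / ai_call_budgets / rag_query_log /
    learning_episodes / embedding_signature are all missing tables or columns.

Fix: change the logic to run the v5.0 range, migrations/02[4-9]_*.sql + 03[0-9]_*.sql and so on
  - All v5.0 migrations are guaranteed idempotent via IF NOT EXISTS / WHERE NOT EXISTS
    (added during the critic-A11 round 1 fixes B2/H1/H2/H3/M1/M2)
  - Re-running is 100% harmless; tables that already exist are skipped by IF NOT EXISTS
  - 026 / 027 / 028 contain ivfflat/CONCURRENTLY and go through a transactionless apply

After the push, once CD finishes → the ai_calls table exists immediately → the 23:55 token daily report returns to normal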

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 23:59:04 +08:00
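The range-based selection in 47fe375952 can be sketched as follows. This only builds the list of files; applying them (for example by feeding each file to psql) is left out. The temporary directory and every file name are made up for illustration, and the second glob is widened to `0[3-9][0-9]` as one reading of the commit's "and so on".

```shell
#!/usr/bin/env bash
# Sketch: pick migrations by filename range rather than by the last commit's diff.

# Throwaway workspace; every file name below is made up for illustration.
WORK=$(mktemp -d)
mkdir -p "$WORK/migrations"
for name in 018_old 023_old 024_a 026_b 031_c; do
  : > "$WORK/migrations/$name.sql"
done

PENDING=""
for path in "$WORK"/migrations/02[4-9]_*.sql "$WORK"/migrations/0[3-9][0-9]_*.sql; do
  [ -e "$path" ] || continue   # an unmatched glob stays literal; skip it
  PENDING="${PENDING:+$PENDING }${path##*/}"
done

echo "would apply: $PENDING"
rm -rf "$WORK"
```

Because the selection no longer depends on `HEAD~1`, a migration committed many pushes ago is still picked up, which is why the approach leans on every file being idempotent.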
OoO
2b218589bd ci(cd): auto-apply pending migrations + add migrations/** to the paths trigger
All checks were successful
CD Pipeline / deploy (push) Successful in 3m23s
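- Add migrations/** to the paths trigger → DB schema changes automatically trigger CD
- Add an "apply pending migrations" step → CD automatically runs the SQL within the git diff HEAD~1 range
- 026 contains CONCURRENTLY, so it is not wrapped in a -1 transaction (consistent with the critic-A11 B2 fix)
- Failures only warn and do not abort the deploy (migrations are designed to be idempotent via IF NOT EXISTS / WHERE NOT EXISTS)

The first deploy after merge will automatically apply migrations 024/025/026,
with no need for the Commander to SSH into 188 and run psql.

Operation Ollama-First v5.0 / Phase 6 wrap-up / closing a CD automation gap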

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 23:12:20 +08:00
OoO
6bce46bbc7 fix(runtime): harden health check monitoring resilience
All checks were successful
CD Pipeline / deploy (push) Successful in 2m29s
2026-05-01 14:46:49 +08:00
OoO
f9fec4706e fix(ci): fix Gitea Actions workflow YAML
All checks were successful
CD Pipeline / deploy (push) Successful in 1m46s
2026-05-01 00:15:03 +08:00
OoO
73c7ddcee0 fix(cd): use inplace rsync to preserve the bind mount inode 2026-04-30 23:32:59 +08:00
OoO
d06c7016dc fix(cd): fix mount drift in the sync version 2026-04-30 23:24:54 +08:00
OoO
f282ddc18c fix(cd): switch sync mode to app hot reload
Some checks failed
CD Pipeline / deploy (push) Has been cancelled
2026-04-30 15:11:57 +08:00
OoO
6e480449c1 fix(ci): isolate the EWOOOC Gitea runner label
Some checks failed
CD Pipeline / deploy (push) Has been cancelled
2026-04-30 14:38:35 +08:00
OoO
3949720998 docs(ci): document how to interpret the EWOOOC runner image
All checks were successful
CD Pipeline / deploy (push) Successful in 1m34s
2026-04-30 09:30:00 +08:00
OoO
3193f1979d Shorten the downtime window during CD rebuild switchover
Some checks failed
CD Pipeline / deploy (push) Failing after 1m6s
2026-04-30 09:25:49 +08:00
OoO
8bd44b1131 Fix missing reload after CD sync
Some checks are pending
CD Pipeline / deploy (push) Waiting to run
2026-04-30 09:02:29 +08:00
OoO
5a569d1e05 Strengthen CD health check retries
All checks were successful
CD Pipeline / deploy (push) Successful in 1m32s
2026-04-30 08:58:22 +08:00
OoO
d33a59d027 ci(cd): include gunicorn config changes in the triggers
Some checks are pending
CD Pipeline / deploy (push) Waiting to run
2026-04-30 00:20:16 +08:00
OoO
832030b6de fix(cd): sync mode uses compose up -d instead of restart, eliminating recurring 502s
All checks were successful
CD Pipeline / deploy (push) Successful in 1m13s
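Root cause: cd.yaml sync mode used `docker compose restart`, which errors out directly on a
container that does not exist → once any outside force (manual docker rm, orphan cleanup, side effects from another project) removes the container,
the next sync deploy fails 100% of the time → the health check gets 5 consecutive 502s.

Fix:
- Sync mode switches to `docker compose up -d --no-deps`: when the image is unchanged it is a
  no-op for existing containers (triggering the hot mount), and it automatically creates missing ones
- The emergency rollback step likewise changes from `docker restart momo-pro-system ...` to `compose up -d`;
  otherwise even the rollback cannot recover when the container does not exist

Verification: the P0 emergency run at 2026-04-28 15:33 successfully brought 4/4 containers to healthy + HTTP 200.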

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 15:34:30 +08:00
OoO
227b114101 fix(ci): use docker compose restart instead of hardcoded container names in sync mode
All checks were successful
CD Pipeline / deploy (push) Successful in 1m13s
2026-04-28 13:36:23 +08:00
OoO
1d49c66159 fix(ci): use --no-cache for docker build to bypass cache snapshot corruption
Some checks failed
CD Pipeline / deploy (push) Failing after 57s
2026-04-28 13:15:38 +08:00
OoO
6924c8ea8a fix(ci): wrong container name in the rebuild guard, momo-postgres → momo-db
All checks were successful
CD Pipeline / deploy (push) Successful in 1m16s
2026-04-28 10:42:24 +08:00
ogt
86d80d3f2a fix: add --ignore-errors || true to the cd.yaml rsync to fully prevent code 23 from aborting deploys
All checks were successful
CD Pipeline / deploy (push) Successful in 1m44s
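Two layers of protection:
1. --ignore-errors: rsync continues instead of aborting on attr/type errors
2. || true: even if rsync exits non-zero, the step as a whole does not fail

The root cause is already fixed (the templates/components symlink was restored correctly on 188);
these two flags serve as a permanent safety valve, preventing leftover Docker run legacy debt from blocking CD again.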

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 02:13:24 +08:00
ogt
5076a9e51d fix: cd.yaml rsync code 23: exclude the root-owned alertmanager directory
Some checks failed
CD Pipeline / deploy (push) Failing after 1m0s
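Root cause: rsync -t tries to update the timestamp of monitoring/alertmanager/,
but that directory is owned by root and ollama has no write permission, which triggers code 23.

Added exclude rules:
- --exclude='monitoring/alertmanager/' (root-owned, alertmanager.yml is not in git)
- --exclude='._*' (resource fork files left behind by old macOS rsync)

Both the sync and rebuild rsync commands were updated.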

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 01:51:01 +08:00
ogt
5761aeb1ce fix(cd): fix 11 security/reliability issues in the CD Pipeline
All checks were successful
CD Pipeline / deploy (push) Successful in 1m24s
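🔴 Critical:
  C1 commit message injection: all ${{ }} values now go through an env: block for isolation instead of being embedded directly in shell
  C2 SSH StrictHostKeyChecking: switch to known_hosts verification, with support for an SSH_HOST_KEY secret

🟠 High:
  H1 rsync excludes aligned: Rebuild mode gains the 7 missing rules such as .gitea/ .claude/ docs/ *.md
  H2 --force-recreate: added to Rebuild mode to force recreation and prevent silent update failures
  H3 stronger health check: add SSH verification that the three containers are Running (scheduler/telegram-bot)
  H4 emergency rollback: on deploy failure, automatically attempt docker restart of the three containers to restore service
  H5 ADR-011 guard: confirm momo-postgres is alive before Rebuild proceeds

🟡 Medium:
  M1 add .claude/ to the rsync excludes (do not sync hook scripts to 188)
  M2 add *.md to the rsync excludes (root-level markdown does not need syncing)
  M3 workflow_dispatch gains a force_rebuild boolean input
  M4 the known cancel-in-progress risk is recorded in the file header notes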

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 01:53:19 +08:00
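The idea behind item C1 in 5761aeb1ce can be illustrated outside of any workflow file. The sketch assumes that a templated expression inside a run script is substituted into the script text before the shell parses it, which `sh -c "..."` imitates here; the hostile message and the `COMMIT_MSG` variable name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: untrusted text pasted into script source versus passed as an env var.

WORK=$(mktemp -d)
cd "$WORK" || exit 1

# Hypothetical hostile commit message carrying a command substitution.
MSG='fix: typo $(touch pwned_inline)'

# Unsafe: the message becomes part of the script text that the inner shell
# parses, so the embedded $(...) runs.
sh -c "echo $MSG" >/dev/null

# Safer: the script text is fixed; the message arrives as data in COMMIT_MSG
# and is only ever expanded inside double quotes.
COMMIT_MSG='fix: typo $(touch pwned_env)' sh -c 'echo "$COMMIT_MSG"' >/dev/null

if [ -e pwned_inline ]; then INLINE_RAN=yes; else INLINE_RAN=no; fi
if [ -e pwned_env ]; then ENV_RAN=yes; else ENV_RAN=no; fi
echo "inline executed: $INLINE_RAN, env executed: $ENV_RAN"
```

Only the inlined variant creates its marker file, which is the behaviour an env: block is meant to rule out.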
ogt
b6fa303ab3 fix(cd): add scripts/** and .claude/** to the CI/CD trigger paths
All checks were successful
CD Pipeline / deploy (push) Successful in 1m21s
Fix: changes under scripts/ and .claude/ were not triggering the CD Pipeline
With the trigger rules added, changes to review.md + tg_notify.sh also enter Actions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 01:44:47 +08:00
ogt
0099543c05 fix(security): global health check: 40 security/bug/quality fixes
Some checks failed
CD Pipeline / deploy (push) Failing after 5m18s
🔴 Critical
- auto_heal_service: add import re + sqlalchemy.text + fix the orchestrator variable name
  + autoheal_playbook→playbooks table name + fix the _alert_and_store cooldown
- aider_heal_executor: fix shell injection with shell=False + list arguments
- docker-compose: DISABLE_LOGIN becomes an env var + remove the password fallback + fix POSTGRES_HOST
- app.py: add @login_required to 6 admin APIs including /api/backup and /api/run_task
- config.py + pg_sync + e2e_test: remove the hardcoded wooo_pg_2026 password fallback
- pg_backup.sh: remove the intermediate TELEGRAM_TOKEN= variable and use $TELEGRAM_BOT_TOKEN directly
- migration 014: trigger_pattern→match_pattern + add the error_type NOT NULL column

🟡 High
- telegram_bot_service: replace str(e) with a generic message + session try/finally + remove the old pa:/pr: callbacks
- run_scheduler: ElephantAlpha thread death monitoring + automatic restart + Telegram alert
  + scheduled agent_context TTL cleanup task at 03:30
- openclaw_learning_service: add .limit(200) to both build_rag_context paths
- hooks: commit-quality + momo-prod-guard empty catch changed to stderr+exit(1)
- scripts/code_review: auto_yes default changed to false
- db_backup_service: PGPASSWORD passed via an env dict

📦 Migrations
- 013_autoheal: fix the table creation order playbooks→incidents (foreign key forward reference)
- 018_add_missing_indexes: foreign key indexes for heal_logs/incidents + cleanup_expired_agent_context()

🟢 Infrastructure
- requirements.txt: add lower version bounds, e.g. Flask>=2.3 SQLAlchemy>=1.4
- cd.yaml: add run_scheduler.py + run_telegram_bot.py to the watched paths
- .gitignore: add insert_playbook_local.py to the ignore list

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 01:12:23 +08:00
ogt
2e0de960ce feat(code-review): rebuild as a Post-Deploy AI Agent Pipeline
All checks were successful
CD Pipeline / deploy (push) Successful in 1m21s
Architecture rebuild:
- Remove the pre-commit hook (local commits are no longer blocked)
- Instead, trigger a webhook automatically after the CD health check passes

New services/code_review_pipeline_service.py:
  5-Step Pipeline (background daemon thread)
  Step1 system        read the contents of files changed by the deploy
  Step2 Hermes        code scan (bugs/security/perf, hermes3:latest)
  Step3 OpenClaw      architecture quality assessment (Gemini 2.5 Flash)
  Step4 ElephantAlpha decision coordination (severity + auto_fix discretion)
  Step5 NemoTron      write action_plans + trigger AiderHeal
  Telegram alerts throughout (start/finish/error) + ai_insights DB persistence

Rebuilt routes/code_review_routes.py:
  POST /code-review/api/internal/trigger  CD webhook (X-Internal-Token)
  GET  /code-review/api/status            real-time frontend polling
  GET  /code-review/api/history           history list
  GET  /code-review/                      frontend dashboard

Rebuilt templates/code_review.html:
  Dark dashboard, live pipeline progress + severity distribution + issue list + EA decisions
  3s polling (running) / 30s (idle)

.gitea/workflows/cd.yaml:
  Inject a "Trigger AI Code Review" step after the health check passes
  continue-on-error: true (does not affect the deploy result)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 20:55:23 +08:00
ogt
704f5b6538 fix: restore full scheduler + telegram-bot + fix momo-app network isolation
All checks were successful
CD Pipeline / deploy (push) Successful in 1m55s
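Three key fixes:
1. Add momo-app to the momo-pro_default network → fixes the momo-db DNS resolution failure (crash loop)
2. Add the telegram-bot compose service → the momo-telegram-bot container had never started, so the 小龍蝦 (Crayfish) group received zero messages
3. Rewrite run_scheduler.py → fully loads the 13 real scheduled tasks from scheduler.py
4. Add run_telegram_bot.py to the repo (previously it existed only on the server and was not under version control)
5. cd.yaml updated accordingly: restart/rebuild all three containers (app/scheduler/telegram-bot)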

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 19:48:32 +08:00
ogt
fca235eb8d fix: close missing double-quote in sync restart step (shell parse error)
All checks were successful
CD Pipeline / deploy (push) Successful in 1m18s
Line 134 was missing the closing " after the echo statement:
  echo '...'   (broken)
  echo '...'"  (fixed)

Caused: 'unexpected EOF while looking for matching"'
2026-04-20 06:49:32 +08:00
ogt
2ffbe06eab fix: resolve container name conflict in rebuild CD step
Some checks failed
CD Pipeline / deploy (push) Failing after 45s
'docker compose up --force-recreate' fails when the existing container
was started by a different compose invocation, leaving a stale container
with the same name. Error: 'container name already in use'.

Fix: explicitly stop + rm the two containers before compose build & up.
Using 2>/dev/null to ignore errors if containers are already stopped.
Removed --force-recreate (no longer needed after explicit rm).
2026-04-20 06:46:04 +08:00
ogt
456c031955 fix: remove defunct momo-telegram-bot from all CD/compose references
Some checks failed
CD Pipeline / deploy (push) Failing after 1m20s
CD was failing with 'No such container: momo-telegram-bot' because
the Gitea Actions restart step still listed all three containers.

Changes:
1. .gitea/workflows/cd.yaml:
   - Sync mode: docker restart now only targets momo-pro-system momo-scheduler
   - Rebuild mode: docker compose up no longer includes telegram-bot service

2. docker-compose.yml:
   - Removed telegram-bot service block (38 lines)
   - Syncs local repo with remote server state (already removed there)
2026-04-20 06:19:44 +08:00
ogt
69df1436b7 ci: rebuild mode also recreates the scheduler + telegram-bot containers
All checks were successful
CD Pipeline / deploy (push) Successful in 1m27s
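The three containers share one image; recreating only momo-app after a rebuild leaves
scheduler/telegram-bot running the old image (e.g. with paramiko missing).
Changed to --force-recreate momo-app scheduler telegram-bot so all three are updated together.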

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 16:43:12 +08:00
ogt
8d0b79cd00 feat(ops): restore Telegram chain + P2/P3 price decisions + ADR-011
All checks were successful
CD Pipeline / deploy (push) Successful in 1m19s
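P2 (Inline Keyboard price-cut decisions):
- routes/bot_api_routes.py: POST /bot/api/price-decision/notify
- services/telegram_bot_service.py: pa:/pr: callback handlers

P3 (OpenClaw automatic trigger):
- services/openclaw_strategist_service.py: the Gemini weekly report emits
  PRICE_DECISIONS_JSON at the end; after parsing, an inline keyboard is pushed automatically to admin

Ops fixes (root causes of cross-project isolation and container silence):
- ADR-011 comprehensively defines the boundaries for multi-project coexistence and forbids --remove-orphans
- .gitea/workflows/cd.yaml: sync mode restarts all three containers at once
  (previously only momo-pro-system; scheduler/telegram-bot silently fell behind)
- run_telegram_bot.py: copied from scripts/tools/ to the repo root
  (eliminates the trap of the docker-compose mount creating an empty directory)
- CLAUDE.md: add the core container table, the three golden diagnostic commands, and emergency commands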

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 12:25:04 +08:00
ogt
c49c2c4f6f fix: add --force-recreate to rebuild mode to avoid container name conflicts
Some checks failed
CD Pipeline / deploy (push) Has been cancelled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 11:23:26 +08:00
ogt
30e4485142 fix: add rsync+ssh install step in CD pipeline
All checks were successful
CD Pipeline / deploy (push) Successful in 1m36s
2026-04-19 01:48:40 +08:00
ogt
1b4f3a7bbe feat: EwoooC initialization: push the full project to Gitea
Some checks failed
CD Pipeline / deploy (push) Failing after 59s
- Set up the Gitea Actions CD pipeline (.gitea/workflows/cd.yaml)
- Deploy mode: rsync Python files to 188 → docker restart (volume mount)
- Automatically rebuild the Docker image when Dockerfile/requirements change
- Deploy notifications: Telegram (start/success/failure)
- Health check: https://mo.wooo.work/health (up to 5 retries)
- Sync the latest CLAUDE.md / ADR-008 / memory (2026-04-19)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 01:21:13 +08:00