14 Commits

Author SHA1 Message Date
OoO
b7838b382f V10.502 修正 AiderHeal 自動修復診斷 2026-05-31 16:49:45 +08:00
OoO
9ec7713e6f 補 AiderHeal 靜默失敗診斷
All checks were successful
CD Pipeline / deploy (push) Successful in 56s
2026-05-13 12:25:35 +08:00
OoO
b65a319cb8 固化 Ollama 三主機路由紅線
Some checks failed
CD Pipeline / deploy (push) Has been cancelled
2026-05-13 12:09:40 +08:00
OoO
c4d39cf544 串接 AiderHeal 共用 SSH helper
Some checks failed
CD Pipeline / deploy (push) Has been cancelled
2026-05-12 23:52:28 +08:00
OoO
078bf2683c fix(adr-027): Phase 2 — ADR-027 4 破洞修補 + 移除寫死 111
config.py — B1+B2 lazy resolve
- get_ollama_host() 取代 import-time freeze 的 OLLAMA_HOST
- get_embedding_host() 取代 EMBEDDING_HOST
- 主機切換時不需重啟 Python 進程

services/ollama_service.py — B3+B4 三主機級聯
- resolve_ollama_host(primary, secondary, fallback) 三主機級聯
  - Primary:   34.143.170.20 (SSD) — GCP 主主機
  - Secondary: 34.21.145.224 (SSD) — 同等效能備援
  - Fallback:  192.168.0.111 (HDD) — 最後一道防線
- _is_reachable: HTTP /api/version probe 取代 TCP socket(防 process 卡死假活)
- mark_unhealthy(host) 即時失效 cache,30s 內跳過該主機
- 14 unit tests 全綠

services/aider_heal_executor.py — N2
- 移除寫死 192.168.0.111,改用 get_ollama_host()
- AiderHeal 終於遵循 ADR-027 GCP 優先策略

Operation Ollama-First v5.0 / Phase 2 A6

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 23:05:11 +08:00
OoO
abd722986e fix(autoheal): preflight + shell || true 結合性 — 解 24h 100% no-op
All checks were successful
CD Pipeline / deploy (push) Successful in 2m23s
Debugger 五階段方法論 root cause(2026-05-03):
ADR-020 政策層 100% 達成(reasoning 全 auto_fix=enabled),但 AiderHeal
執行層 24 小時 0 次 push 成功,全部 silent fail。

兩根因疊加:
  #1 (config) AIDER_REPO_PATH=/home/wooo/ewoooc 在 110 主機不存在
            → 寫 SOP docs/runbooks/aider-heal-110-setup-sop.md 給統帥手動執行
  #2 (code)  setup_cmds 結尾 `git stash 2>&1 || true` 因 shell 結合性等同
            `(A && B && C && D) || true`,cd 失敗整 chain rc=0 被吞,
            line 261 if rc != 0 永不觸發 → setup_failed 從未被 log
  #4 (code)  缺 preflight,環境壞掉時靜默走完整 pipeline 印 no_diff

本次程式碼修復:
  • execute_code_fix 開頭加 L0 preflight(test -d $REPO/.git)
    失敗 fail-fast + Telegram 嚴重告警 + 指向 SOP
  • setup_cmds 改 `A && B && C && (D || true)` 用 subshell 限縮 || true
  • 全檔 5 處 `cd $REPO_PATH` 統一改 `cd shlex.quote(REPO_PATH)`
    避免下次有人複製 cd chain 又踩同類 shell quoting bug

SOP 同步處理 critic High-2 + Medium-6:
  • 步驟 2 改用 SSH clone(git clone gitea-autoheal:...)
    避免 HTTP clone 在 private repo 卡帳密 + 跟步驟 1 部署的 key 不關聯
  • 步驟 4b 修引號嵌套(heredoc + 單引號保護),原版永遠 false positive

Critic 審過 Approve to commit;Medium-2/3/4(速率限制 / log 加 stderr /
新增 preflight unit test)排 follow-up,不阻擋本次。

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 01:55:34 +08:00
OoO
6cad59f83e feat(code-review): ADR-020 全自動修復政策 — 拆掉 CRITICAL/HIGH HITL 閘門
All checks were successful
CD Pipeline / deploy (push) Successful in 2m23s
post-deploy code review pipeline 改為「任何 finding 一律觸發 AiderHeal」,
局部覆寫 ADR-012 L3 HITL(不影響 schema migration / 流量切換 /
customer-facing 廣播 / AIOps prod SSH 等其他 L3 場景)。安全網改為
Git revert + Gitea CI/CD 健康檢查 + 主開關 CODE_REVIEW_AUTO_FIX_ENABLED。

實作:
  • _ea_orchestrate / _guard_ea_decision / rule fallback 三條路徑統一為
    has_findings AND AUTO_FIX_ENABLED → auto_fix=true
  • _guard 強制 LLM 即使回 auto_fix=False 也升級為 true(核心保證)
  • CODE_REVIEW_AUTO_FIX_ENABLED 預設 false → true
  • Telegram 文案移除「需人工審查」,改顯示主開關狀態
  • action_plan status pending_review → auto_disabled(語意對齊)
  • aider_heal_executor 標頭 ADR-014 → ADR-020、補「直推 main」分支策略

文件:
  • 新增 docs/adr/ADR-020-code-review-full-autoheal.md
  • ADR-012 加 Note 行反向引用 ADR-020
  • README 索引收錄

測試:tests/test_code_review_pipeline_security.py 反轉 HITL 期望,
新增 5 case(含 LLM 降級被 guard 拒絕、LLM human_review_needed=true 被改 false)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:44:01 +08:00
OoO
3e9d53c98c refactor(telegram): migrate aider_heal_executor sender to EventRouter (ADR-019 Phase 5)
services/aider_heal_executor.py 的 _notify_telegram() 為 ADR-013 AutoHeal 閉環
通知出口(aider 自動修復進度)。原直接 POST sendMessage,timeout=5s(非阻塞)。
改走 services.event_router.dispatch_sync()。

行為變化:
- 失敗仍靜默 pass,caller(execute_code_fix 等)行為完全不變
- severity=warning 會被 EventRouter classify 為 L0 或 L1(無 trace 時 L0),
  輕量分流不會拖慢自動修復流程
- 享 EventRouter 內建 retry + JSONL queue replay;舊版 timeout=5 失敗即丟訊息
  的問題改善(可從 queue 重送)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 13:09:34 +08:00
OoO
bc7113bc86 fix: ElephantAlpha crash, AiderHeal Ollama host, MCP integration for Hermes/NemoTron, and MCP hallucination
All checks were successful
CD Pipeline / deploy (push) Successful in 1m18s
2026-04-28 12:11:33 +08:00
ogt
d5c0feab5e fix: Telegram bot 全功能修復 — 16個await按鈕/AI對話/模型遷移/DB schema
All checks were successful
CD Pipeline / deploy (push) Successful in 1m35s
## Telegram Bot 功能修復
- 補全 16 個 await: 按鈕的 handler(日期選擇/目標設定/促銷追蹤等),
  新增 _handle_await_callback + _process_await_input 完整狀態機
- cmd: 按鈕加入  即時回饋 + try/except 防 BadRequest
- handle_callback 加頂層 try/except 錯誤兜底
- 補 momo:cmd:suggestion + momo:menu:main callback handler
- 修復 _enhanced_keyword_matching context NameError

## AI 模型遷移(hermes3@111 → qwen2.5@188)
- hermes_analyst_service: URL 192.168.0.111→188, hermes3→qwen2.5:7b-instruct
- code_review_pipeline: 改用 HERMES_URL/HERMES_MODEL 常數
- elephant_alpha_orchestrator / nemoton_dispatcher: registry/footprint 同步
- aider_heal_executor: OLLAMA_API_BASE fallback 改 188
- ai_routes: footprint display 字串改 qwen2.5:7b-instruct

## ElephantAlpha 404 修復
- elephant_service: openrouter→NVIDIA NIM, nvidia/llama-3.1-nemotron-ultra-253b-v1
- ai_provider: 模型 ID 同步更新

## TELEGRAM_CHAT_ID 環境變數修正
- cicd_routes + aider_heal_executor: 優先讀 TELEGRAM_CHAT_IDS[0],
  fallback TELEGRAM_CHAT_ID,修復通知靜默失敗

## AI 對話 logging 改善
- telegram_ai_integration: Hermes 降級改 WARNING,OpenClaw 失敗加 exc_info
- hermes_analyst_service: 連線失敗 log 加 host/model context

## DB Schema 修復
- migrations/019: action_plans 補齊全欄位,DROP NOT NULL action_type
- autoheal_models: ActionPlan ORM 同步為超集 schema

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 03:30:14 +08:00
ogt
0099543c05 fix(security): 全域健檢 — 40 項安全/Bug/品質修復
Some checks failed
CD Pipeline / deploy (push) Failing after 5m18s
🔴 Critical
- auto_heal_service: 補 import re + sqlalchemy.text + 修正 orchestrator 變數名
  + autoheal_playbook→playbooks 表名 + _alert_and_store cooldown 修復
- aider_heal_executor: shell injection 改 shell=False + list 參數
- docker-compose: DISABLE_LOGIN 改 env var + 移除密碼 fallback + POSTGRES_HOST 修正
- app.py: /api/backup /api/run_task 等 6 個管理 API 加 @login_required
- config.py + pg_sync + e2e_test: 移除 wooo_pg_2026 hardcoded 密碼 fallback
- pg_backup.sh: 移除 TELEGRAM_TOKEN= 中間變數,直接用 $TELEGRAM_BOT_TOKEN
- migration 014: trigger_pattern→match_pattern + 補 error_type NOT NULL 欄位

🟡 High
- telegram_bot_service: str(e) 改通用訊息 + session try/finally + 移除 pa:/pr: 舊 callback
- run_scheduler: ElephantAlpha thread 死亡監控 + 自動重啟 + Telegram 告警
  + agent_context 03:30 TTL 定時清理任務
- openclaw_learning_service: build_rag_context 兩路徑加 .limit(200)
- hooks: commit-quality + momo-prod-guard 空 catch 改 stderr+exit(1)
- scripts/code_review: auto_yes 預設改 false
- db_backup_service: PGPASSWORD 透過 env dict 傳遞

📦 Migrations
- 013_autoheal: 修正建表順序 playbooks→incidents(外鍵前向引用)
- 018_add_missing_indexes: heal_logs/incidents 外鍵索引 + cleanup_expired_agent_context()

🟢 Infrastructure
- requirements.txt: 加版本下界 Flask>=2.3 SQLAlchemy>=1.4 等
- cd.yaml: 新增 run_scheduler.py + run_telegram_bot.py 監聽路徑
- .gitignore: insert_playbook_local.py 加入忽略

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 01:12:23 +08:00
ogt
31dfbcdd4d fix(i18n): 強制 Elephant Alpha Gemini 回應繁體中文
All checks were successful
CD Pipeline / deploy (push) Successful in 1m20s
- aider_heal_executor.py:全檔簡體→繁體,所有 Telegram 通知節點繁化
- elephant_alpha_orchestrator.py:system prompt 與 user prompt 雙層加入語言強制指令,確保 reasoning/expected_outcome 等欄位輸出繁體中文

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 12:22:13 +08:00
ogt
bf5f0d256a fix(aiops): resolve ADR-014 logical bugs
- Fixed target_file context passing in auto_heal_service
- Fixed docker log scanning inside momo-scheduler using SSHJumpExecutor
- Fixed AiderHealExecutor SSH key path
2026-04-20 23:25:49 +08:00
ogt
3127466a85 feat(aiops): implement ADR-014 Autonomous Code Heal Pipeline
All checks were successful
CD Pipeline / deploy (push) Successful in 1m14s
- Added AiderHealExecutor for SSH remote execution of aider-chat
- Added CODE_FIX action_type to AutoHealService
- Added code_exception trigger to Elephant Alpha engine (Traceback log scanning)
- Added 014 playbook migration script
2026-04-20 23:13:32 +08:00