1320 Commits

Author SHA1 Message Date
ogt
cb03f6b3e8 fix(ai-ops): HealLog DetachedInstanceError — expunge after commit
All checks were successful
CD Pipeline / deploy (push) Successful in 1m22s
Accessing heal_log.result after session.close() triggered a lazy reload, which failed.
Calling expunge(hl) before close detaches the object from the session with its loaded attributes intact.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 16:11:58 +08:00
ogt
e6642d5e17 fix(ai-ops): fix table creation order in _init_autoheal_tables (Playbook before the Incident FK)
All checks were successful
CD Pipeline / deploy (push) Successful in 1m23s
incidents.playbook_id → FK → playbooks.id
Playbook must be created before Incident; otherwise psycopg2 raises UndefinedTable
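
A minimal sketch of deriving a safe creation order from FK dependencies, rather than hard-coding it. The dependency map is hypothetical: only the incidents → playbooks edge comes from this commit, and the heal_logs edge is assumed for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical map: table -> set of tables it references through a foreign key.
FK_DEPENDENCIES = {
    "playbooks": set(),
    "incidents": {"playbooks"},  # incidents.playbook_id -> playbooks.id (from the commit)
    "heal_logs": {"incidents"},  # assumed edge, for illustration only
}


def creation_order(deps):
    """Return table names ordered so every referenced table is created first."""
    return list(TopologicalSorter(deps).static_order())


order = creation_order(FK_DEPENDENCIES)
print(order)
```

With this ordering, playbooks always precedes incidents, so the FK target exists when the referencing table is created.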

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 16:09:47 +08:00
ogt
77d3a1da48 feat(ai-ops): ADR-013 full implementation of the AIOps auto-heal closed loop
Some checks failed
CD Pipeline / deploy (push) Failing after 3m24s
Architecture (Exception → Incident → PlayBook → Heal → KM → Telegram):

New components:
- database/autoheal_models.py: three tables (Incident/Playbook/HealLog) + 7 seed PlayBooks
- migrations/013_autoheal.sql: table DDL + seed data (idempotent INSERT)
- services/auto_heal_service.py: core engine, 7-step closed loop
  - _classify_error: automatic classification into 8 error types (DNS_FAIL/DB_UNREACHABLE/OOM/...)
  - _match_playbook: error_type + keyword + cooldown + max_retries guard
  - _execute_playbook: DOCKER_RESTART/SSH_CMD/ALERT_ONLY/WAIT_RETRY
  - _sink_to_km: writes heal knowledge to ai_insights (auto_heal_playbook)
  - SSH whitelist: only docker restart / compose restart / docker start are allowed
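
The SSH whitelist above could be sketched roughly as follows. This is not the code in auto_heal_service.py: the names ALLOWED_PREFIXES and is_allowed_command are hypothetical, and both spellings of the compose command are included only because the commit message does not say which one is used.

```python
import shlex

# Assumed allowlist, derived from the three command forms named in the commit.
ALLOWED_PREFIXES = (
    ("docker", "restart"),
    ("docker", "start"),
    ("docker", "compose", "restart"),
    ("docker-compose", "restart"),
)
# Shell metacharacters that would let a caller chain or redirect commands.
FORBIDDEN_CHARS = set(";&|`$><\n")


def is_allowed_command(cmd: str) -> bool:
    """Accept a command only if it starts with an allowlisted token prefix."""
    if any(ch in FORBIDDEN_CHARS for ch in cmd):
        return False
    try:
        tokens = shlex.split(cmd)
    except ValueError:  # e.g. unbalanced quotes
        return False
    return any(tuple(tokens[: len(p)]) == p for p in ALLOWED_PREFIXES)


print(is_allowed_command("docker restart momo-pro-system"))  # True
print(is_allowed_command("docker restart x; rm -rf /"))      # False
```

Matching on parsed tokens rather than on a raw string prefix avoids accepting inputs such as "docker restartx".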

Modified components:
- database/manager.py: _init_autoheal_tables() creates the tables and seeds PlayBooks at startup
- scheduler.py: handle_exception hooked into 3 core tasks
  (run_auto_import_task / run_icaim_analysis_task / run_weekly_strategy_task)
- requirements.txt: paramiko (SSH jump host; falls back to subprocess + CLI ssh when unavailable)

Security design: CMD whitelist + cooldown + max_retries escalation + idempotent DB migration
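
One way the cooldown and max_retries guards might interact is sketched below. The PlaybookGuard class and its field names are hypothetical, and the exact escalation behavior in the real service is not stated in the commit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlaybookGuard:
    """Hypothetical per-playbook guard: rate-limit runs, escalate when exhausted."""
    cooldown_s: float
    max_retries: int
    attempts: int = 0
    last_run: Optional[float] = None

    def decide(self, now: float) -> str:
        # Retries exhausted: stop healing automatically and hand over to a human.
        if self.attempts >= self.max_retries:
            return "escalate"
        # Still inside the cooldown window since the last run: skip this one.
        if self.last_run is not None and now - self.last_run < self.cooldown_s:
            return "cooldown"
        self.attempts += 1
        self.last_run = now
        return "run"


guard = PlaybookGuard(cooldown_s=60, max_retries=2)
decisions = [guard.decide(t) for t in (0, 10, 70, 200)]
print(decisions)
```

Passing the clock value in as an argument keeps the guard deterministic and easy to test.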

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 16:03:49 +08:00
ogt
7fbeaaf213 fix(ai-ops): Hermes L1: remove overly tight timeout + keep model resident via keep_alive
All checks were successful
CD Pipeline / deploy (push) Successful in 1m16s
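Findings (on-site SSH check of 111:11434, 2026-04-19):
- The HERMES_TIMEOUT=30 I originally set was an artificial limit; AI inference should not be bound by it
- Actual state of Ollama on 111: 9 models share the host, and deepseek-r1:14b occupies VRAM
- hermes3 cold start takes 30+ s (model switch) vs <1 s once warm (a 40x gap)
- 30s timeout → every cold start hits it → AI misjudged as down → needless degradation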

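Fix:
- HERMES_TIMEOUT default 30 → 180 (HERMES_TIMEOUT=0 means no limit)
- Add keep_alive=24h to the payload so hermes3 stays resident in VRAM,
  avoiding cold starts triggered when other clients (deepseek-r1, etc.) switch models
- Memory reference_env_map.md updated with the actual state of 111 (list of 9 models, model-switch pitfall,
  ADR-012 call settings)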
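
A minimal sketch of the two settings described above, assuming the timeout is read from the environment and the payload targets Ollama's /api/generate endpoint. The helper names resolve_timeout and build_generate_payload are hypothetical.

```python
import os


def resolve_timeout(env=os.environ):
    """Read HERMES_TIMEOUT in seconds; 0 maps to None, i.e. no client-side limit."""
    seconds = float(env.get("HERMES_TIMEOUT", "180"))
    return None if seconds == 0 else seconds


def build_generate_payload(prompt, model="hermes3:latest"):
    """Build a request body that asks Ollama to keep the model loaded for 24 hours."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "keep_alive": "24h",
    }


print(resolve_timeout({}))
print(resolve_timeout({"HERMES_TIMEOUT": "0"}))
print(build_generate_payload("translate this stack trace")["keep_alive"])
```

Returning None for the unlimited case matches the convention of common Python HTTP clients, where a timeout of None means "wait indefinitely".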

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 14:25:28 +08:00
ogt
1fd1622007 feat(telegram): switch everything to HTML parse_mode + three-tier visual separators
All checks were successful
CD Pipeline / deploy (push) Successful in 1m12s
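Cause: the legacy Markdown parse_mode leaked backslashes in \[Demo] / task\_name,
and the dividers between the three tiers (event info / AI processing area / raw technical details) were not distinct enough.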

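Switch to HTML parse_mode (only & < > need escaping; no backslash side effects):
- telegram_templates.py: all templates rewritten as HTML
  * <b>bold</b> / <code>module</code> / <pre>trace</pre>
  * H_DIV (━×20) strong divider between sections / L_DIV (─×18) weak divider within a section
  * Add triaged_alert() implementing the ADR-012 §④ three-tier structure
    [Event info] → ━━━ → [🤖 AI analysis] → ━━━ → [🔍 Raw technical details]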
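
The three-tier layout can be sketched as below. The function signature and the exact line layout are assumptions; only the H_DIV width, the tags, and the escape set (& < >) come from the commit message.

```python
from html import escape

H_DIV = "━" * 20  # strong divider between sections


def esc(text: str) -> str:
    """Escape only &, < and >, which is all Telegram's HTML parse_mode needs."""
    return escape(text, quote=False)


def triaged_alert(title: str, module: str, ai_summary: str, trace: str) -> str:
    """Hypothetical three-tier message: event info, AI analysis, raw details."""
    return "\n".join([
        f"<b>{esc(title)}</b>",
        f"<code>{esc(module)}</code>",
        H_DIV,
        f"🤖 {esc(ai_summary)}",
        H_DIV,
        f"<pre>{esc(trace)}</pre>",
    ])


message = triaged_alert(
    "[Demo] import failed", "task_name", "DB unreachable", "KeyError: <none> & more"
)
print(message)
```

Underscores and square brackets pass through untouched, which is what removes the stray backslashes seen with the Markdown templates.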

event_router.py:
- _hermes_observe_parsed() returns a structured dict {summary, cause, actions},
  replacing the old string version
- _render_l1/l2_with_fallback now use tpl.triaged_alert() for a unified format
- _send() parse_mode changed to HTML

Call sites switched to HTML as well:
- routes/bot_api_routes.py price_decision_notify
- services/openclaw_strategist_service.py two send sites
- services/telegram_bot_service.py three edit_message_text calls
  (_handle_price_approve / _handle_price_reject / _handle_ops_callback)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 13:54:44 +08:00
ogt
bda4edd23b feat(ai-ops): ADR-012 Phase 2/3/4 full implementation
All checks were successful
CD Pipeline / deploy (push) Successful in 1m11s
Phase 2: Hermes L1 Observer wired in for real:
- services/event_router.py::_hermes_observe() calls hermes3:latest
  @192.168.0.111:11434/api/generate to translate stack traces
- Outputs JSON {summary, probable_cause, actions}; tolerates markdown fences
- scheduler.py: the two outer except blocks in run_auto_import_task / run_momo_task
  now go through event_router.dispatch() with the full trace
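
Tolerating markdown fences around model output might look like the sketch below. The helper name parse_model_json is hypothetical, and returning None on failure is an assumption about how the caller degrades.

```python
import json
import re

FENCE = "`" * 3  # a markdown code fence, built this way to keep the example readable
_FENCED = re.compile(rf"^{FENCE}\w*\s*\n(.*?)\n?{FENCE}$", re.DOTALL)


def parse_model_json(raw: str):
    """Parse a JSON object from model output, with or without a markdown fence."""
    text = raw.strip()
    match = _FENCED.match(text)
    if match:
        text = match.group(1)
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None  # the caller can then fall back to a lower tier
    return data if isinstance(data, dict) else None


fenced = FENCE + 'json\n{"summary": "DB down", "probable_cause": "DNS", "actions": []}\n' + FENCE
print(parse_model_json(fenced))
print(parse_model_json("not json at all"))
```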

Phase 3: NemoTron L2 Investigator, rule-based implementation:
- event_router._L2_RULES: rule table mapping event_type → [(action, params)]
  • db_connection_error → query_km + retry_task(60s backoff)
  • crawler_timeout    → silence_alert(30min) + retry_task(300s)
  • nim_quota_exhausted → silence_alert(720min)
  • embedding_failure   → silence_alert(10min)
- agent_actions.retry_task real implementation: threading.Timer + exponential
  backoff (60→120→240s) + _retry_state tracking + ALLOWED_RETRY_TASKS
  whitelist + returns 'deferred' in non-scheduler containers

Phase 4: L3 HITL Ops extension:
- agent_actions: pause_task / resume_task / force_retry_now / is_task_paused
- OPS_ACTIONS whitelist strictly separated from SAFE_ACTIONS (L2 cannot call L3)
- telegram_templates.ops_action_request(): 4-button inline keyboard
  (pause 1h / pause 6h / retry now / unpause)
- telegram_bot_service._handle_ops_callback(): handles momo:ops:<action>:<task>
- scheduler.py run_momo_task + run_auto_import_task: add an
  is_task_paused() check at the start (Phase 4 pause mechanism takes effect)
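
A rough sketch of parsing the momo:ops:<action>:<task> callback data against a separate ops whitelist. The set contents and the action tokens carried in callback_data are assumptions; the real handler may use different or shorter tokens.

```python
# Assumed contents; the commit only states that the two sets are strictly separate.
OPS_ACTIONS = {"pause_task", "resume_task", "force_retry_now"}
SAFE_ACTIONS = {"retry_task", "query_km", "silence_alert"}


def parse_ops_callback(data: str):
    """Return (action, task) for a valid L3 ops callback, else None."""
    parts = data.split(":", 3)
    if len(parts) != 4 or parts[:2] != ["momo", "ops"]:
        return None
    action, task = parts[2], parts[3]
    # L2 SAFE_ACTIONS are deliberately not accepted on the L3 path.
    if action not in OPS_ACTIONS or not task:
        return None
    return action, task


print(parse_ops_callback("momo:ops:pause_task:run_momo_task"))
print(parse_ops_callback("momo:ops:retry_task:run_momo_task"))
```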

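Security boundaries (ADR-012 §①):
- L1 Hermes is read-only → on failure, degrade to L0 + 🟡 marker
- L2 NemoTron only touches ai_insights + sends Telegram + SAFE_ACTIONS
- L3 OpenClaw: any action requires approval via the HITL inline keyboard
- No container-restart button (needs the docker socket; too risky)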

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 13:26:51 +08:00
ogt
0b4f80ee8a feat(ai-ops): Agent Action Ladder backbone (ADR-012 Phase 1) + weekly report uses the template
All checks were successful
CD Pipeline / deploy (push) Successful in 1m14s
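ADR-012 core design:
- 4 trust tiers: L0 direct output / L1 Hermes observation / L2 NemoTron diagnosis and execution / L3 OpenClaw HITL
- The notification chain never breaks: each tier degrades immediately on failure, falling back to the L0 template + 🟡 marker
- Audit Trail: every dispatch automatically writes to ai_insights (insight_type=agent_action)
- Security whitelist: L2 may call 6 safe actions (retry/query_km/silence + 3 existing NemoTron tools)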
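
The "notification chain never breaks" rule can be illustrated with a small fallback loop. This is a sketch under assumptions: the function name, the renderer call shape, and where exactly the 🟡 marker is placed are all hypothetical.

```python
def dispatch_with_fallback(event, tiers, l0_template):
    """Try renderers from the highest tier down; always return some message.

    tiers: callables ordered highest tier first (hypothetical interface).
    l0_template: plain template renderer that is assumed never to fail.
    """
    degraded = False
    for render in tiers:
        try:
            text = render(event)
        except Exception:
            degraded = True  # remember that a tier failed, then keep going down
            continue
        return ("🟡 " + text) if degraded else text
    text = l0_template(event)
    return ("🟡 " + text) if degraded else text


def broken_tier(event):
    raise RuntimeError("model unreachable")


print(dispatch_with_fallback("disk full", [broken_tier], lambda e: "L0: " + e))
```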

New files:
- services/event_router.py: event routing entry point; assigns a Tier by severity × event_type
- services/agent_actions.py: safe action whitelist (Phase 1 stubs + full interface)
- docs/adr/ADR-012-agent-action-ladder.md: full design + phased plan

Phase 1 status:
- L0 direct output fully usable
- L1 Hermes / L2 NemoTron are stubs (implementations filled in Phase 2/3)
- Fallback degradation chain complete
- Silence check (is_silenced) + Audit Trail ready

Existing TODO handled:
- services/openclaw_strategist_service.py::_notify_telegram_group()
  now uses telegram_templates.report() for a unified weekly report format

Full inventory (new memory):
- reference_telegram_endpoints_map.md: 21 Telegram send points
- feedback_agent_action_ladder.md: operating guidelines
  (+ the existing ADR-011 cross-project isolation rules also apply)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 12:46:51 +08:00
ogt
528a6c0468 feat(telegram): unified message format templates (six types + callback prefix)
All checks were successful
CD Pipeline / deploy (push) Successful in 1m12s
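Add services/telegram_templates.py:
- alert() 🚨 alert / warning() ⚠️ warning / info() ℹ️ info
- success() success / report() 📊 report / price_decision() 💰 decision
- decision_result() receipt (for edit_message)
- All messages carry the [EwoooC] prefix (identifies the source on a bot shared across projects; see ADR-011)
- _escape_md() handles user input to avoid breaking Markdown layout
- _tail() takes the tail of the trace, avoiding overly long stack traces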

Integration points switched to templates (P2/P3):
- routes/bot_api_routes.py price_decision_notify
- services/openclaw_strategist_service.py _send_price_decision_requests
- services/telegram_bot_service.py _handle_price_approve/reject
  callback_data now uses the momo: prefix (legacy pa:/pr: remain backward compatible)

Not yet integrated (next iteration):
- scheduler.py per-task error notifications
- _notify_telegram_group() weekly report push

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 12:28:23 +08:00
ogt
8d0b79cd00 feat(ops): restore Telegram chain + P2/P3 price decisions + ADR-011
All checks were successful
CD Pipeline / deploy (push) Successful in 1m19s
P2 (Inline Keyboard price-cut decisions):
- routes/bot_api_routes.py: POST /bot/api/price-decision/notify
- services/telegram_bot_service.py: pa:/pr: callback handlers

P3 (OpenClaw automatic trigger):
- services/openclaw_strategist_service.py: the Gemini weekly report emits
  PRICE_DECISIONS_JSON at the end; after parsing, an inline keyboard is pushed automatically to admin

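Ops fixes (cross-project isolation and root cause of containers going silent):
- ADR-011 comprehensively defines multi-project coexistence boundaries and bans --remove-orphans
- .gitea/workflows/cd.yaml: sync mode restarts all three containers at once
  (previously only momo-pro-system; scheduler/telegram-bot silently fell behind)
- run_telegram_bot.py: copied from scripts/tools/ to the repo root
  (eliminates the pitfall where a docker-compose mount creates an empty directory)
- CLAUDE.md: add core container table, three golden diagnostic lines, emergency commands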

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 12:25:04 +08:00
ogt
986908222d feat(openclaw): Sunday 02:00 Meta-Analysis + full schedule table complete
All checks were successful
CD Pipeline / deploy (push) Successful in 1m6s
openclaw_strategist_service.py:
- generate_meta_analysis_report(): extracts weekly statistics from ai_insights
  (high-frequency SKUs / relearn events / archive count) → Gemini synthesis → dual write to KM + Telegram

scheduler.py:
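- run_openclaw_meta_analysis_task() scheduling wrapper

run_scheduler.py:
- Registers run_openclaw_meta_analysis_task for Sunday 02:00

P1 three-tier Agent self-learning schedule fully complete:
  02:00 DB backup / 03:00 dedup / 04:00 quality rescore
  Monday 07:00 weekly report / Sunday 02:00 Meta-Analysis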
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 11:40:58 +08:00
ogt
2394d65634 feat(openclaw): KM citation annotations in the weekly report (citation footer)
All checks were successful
CD Pipeline / deploy (push) Successful in 1m12s
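- _build_citation_footer(): queries the current week's ai_insights citation sources,
  aggregates them by date + type, and appends a structured "📚 Sources cited in this report" block
- generate_weekly_strategy_report():
  the prompt now includes an inline citation instruction (citing insights from YYYY-MM-DD ~ YYYY-MM-DD);
  after Gemini returns, the citation footer is appended automatically and dual-written to ai_insights together with the weekly report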

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 11:39:22 +08:00
ogt
e6109c2ef8 feat(adr-005): daily dedup at 03:00 + quality score rescore batch at 04:00
All checks were successful
CD Pipeline / deploy (push) Successful in 1m8s
openclaw_learning_service.py:
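- run_dedup_batch(): for the same SKU/type/period, keep the highest avg_quality and archive the rest
- run_quality_rescore_batch(): recomputes avg_quality for all rows with the time-decay formula;
  relearn status takes an extra -20%; scores < 0.05 are archived automatically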
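
The two batches could be sketched as below. The commit does not give the decay formula, so the exponential half-life form, the 30-day default, the multiplicative reading of "-20%", and the row field names are all assumptions.

```python
def dedup(rows):
    """Within each (sku, type, period) group keep the best row, archive the rest."""
    best = {}
    for row in rows:
        key = (row["sku"], row["type"], row["period"])
        if key not in best or row["avg_quality"] > best[key]["avg_quality"]:
            best[key] = row
    keep = {id(row) for row in best.values()}
    for row in rows:
        if id(row) not in keep:
            row["status"] = "archived"
    return rows


def rescore(base_quality, age_days, status, half_life_days=30.0):
    """Assumed time decay: quality halves every half_life_days."""
    score = base_quality * 0.5 ** (age_days / half_life_days)
    if status == "relearn":
        score *= 0.8  # the extra -20%, read here as a multiplicative penalty
    new_status = "archived" if score < 0.05 else status
    return score, new_status


rows = [
    {"sku": "A1", "type": "price", "period": "2026-W16", "avg_quality": 0.9, "status": "active"},
    {"sku": "A1", "type": "price", "period": "2026-W16", "avg_quality": 0.3, "status": "active"},
]
print([r["status"] for r in dedup(rows)])
print(rescore(1.0, 30, "relearn"))
```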

scheduler.py + run_scheduler.py:
- run_dedup_batch_task()  → daily 03:00
- run_quality_rescore_task() → daily 04:00

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 11:38:01 +08:00
ogt
8c6fe961cb feat(nemoton): add route_to_km + mark_for_relearn tools
All checks were successful
CD Pipeline / deploy (push) Successful in 1m7s
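- route_to_km: after an NIM decision, silently archives the insight to the specified KM domain
  (price_competition / sales_anomaly / promotion_opportunity / market_trend)
- mark_for_relearn: when new data overturns historical insights, batch-updates ai_insights.status='relearn'
  + feedback_down+1, so the quality rescore batch can pick it up
- TOOL_MAP gains the two new handlers; the Python dictatorship layer adds route_to_km threat injection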

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 11:26:48 +08:00
ogt
709efb6e37 feat(adr-004): NIM HTTP 429 → fallback routing to the Hermes rule engine
All checks were successful
CD Pipeline / deploy (push) Successful in 1m10s
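- _call_nim(): no retry on 429; raises immediately so the caller takes over
- _hermes_rule_fallback(): deterministic four-rule routing (gap/sales/risk thresholds);
  Telegram alerts get the 🟡 degraded prefix; behavior matches the NIM system prompt
- dispatch(): catches HTTPError 429 → routes to _hermes_rule_fallback(),
  returns nim_stats.degraded=True for monitoring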
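
The 429 handling described above might be shaped like this sketch. NimHTTPError is a stand-in exception defined here for illustration (the real code catches an HTTPError from its HTTP client), and the callable parameters and result layout are hypothetical.

```python
class NimHTTPError(Exception):
    """Stand-in for the HTTP client's error type; carries the status code."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def dispatch(item, call_nim, rule_fallback):
    """Use NIM when available; on HTTP 429 route to the deterministic rules."""
    try:
        decision = call_nim(item)
        degraded = False
    except NimHTTPError as exc:
        if exc.status != 429:
            raise  # only quota exhaustion triggers the rule-engine fallback
        decision = rule_fallback(item)
        degraded = True
    return {"decision": decision, "nim_stats": {"degraded": degraded}}


def quota_exhausted(item):
    raise NimHTTPError(429)


result = dispatch({"sku": "A1"}, quota_exhausted, lambda item: "rule:hold")
print(result)
```

Surfacing degraded in the result lets monitoring count how often the fallback path is taken.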

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 11:23:59 +08:00
ogt
c49c2c4f6f fix: add --force-recreate in rebuild mode to avoid container name conflicts
Some checks failed
CD Pipeline / deploy (push) Has been cancelled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 11:23:26 +08:00
ogt
62d8504d8f docs: add ADR-010 Gitea CI/CD Pipeline, update CLAUDE.md
- ADR-010: documents the ewoooc Gitea repo setup, cd.yaml design, user-level runner pitfalls, and the fix for missing rsync
- CLAUDE.md: add CI/CD table (repo URL, pipeline mode, runner, Telegram notifications)
- docs/adr/README.md: add ADR-010 to the index

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 11:21:53 +08:00
ogt
abefca99e5 chore: docker-compose explicitly declares the EMBEDDING_HOST environment variable
Some checks failed
CD Pipeline / deploy (push) Failing after 10m59s
Both the momo-app and scheduler services now set
EMBEDDING_HOST=http://192.168.0.111:11434
so bge-m3 embedding always goes through the Hermes internal network, never public HTTPS (ADR-003)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 02:04:35 +08:00
ogt
676c711e7a feat: AI governance complete V10.3: tech debt cleared + DB backup mechanism + AI monitoring of backups
Some checks are pending
CD Pipeline / deploy (push) Waiting to run
Tech debt cleared (2026-04-19):
- migrations/010: ai_insights gains decay_exempt/avg_quality/status/ai_model/feedback columns
- migrations/011: embedding_retry_queue persistent table (ADR-009)
- migrations/012: backup_log backup record table
- services/openclaw_learning_service: in-memory Queue → DB retry queue, time-decay RAG
- services/nemoton_dispatcher_service: three tools forced to dual-write ai_insights (_sink_insight_to_km)
- services/import_service: Excel pre-check column guards (product-name columns + sales-amount columns)
- services/ollama_service: generate_embedding adds the EMBEDDING_HOST env; embedding always goes to 192.168.0.111
- SYSTEM_VERSION: V9.4 → V10.3

DB backup mechanism:
- scripts/pg_backup.sh: host-level pg_dump backup script, cron daily at 02:00, 7-day retention, Telegram notification
- services/db_backup_service.py: Python backup service, writes to backup_log
- scheduler: run_db_backup_task (02:00) + run_backup_monitor_task (AI Agent monitoring every 6h)
- Dockerfile: add postgresql-client

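Docs:
- CLAUDE.md: environment architecture rewritten from on-site verification per ADR-008, with full SSH/Docker deployment SOP
- PROJECT_CONSTITUTION.md: content merged into CLAUDE.md; duplicate file deleted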

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 02:03:45 +08:00
ogt
30e4485142 fix: add rsync+ssh install step in CD pipeline
All checks were successful
CD Pipeline / deploy (push) Successful in 1m36s
2026-04-19 01:48:40 +08:00
ogt
1b4f3a7bbe feat: EwoooC initialization: push the full project to Gitea
Some checks failed
CD Pipeline / deploy (push) Failing after 59s
- Set up the Gitea Actions CD pipeline (.gitea/workflows/cd.yaml)
- Deployment mode: rsync Python files to 188 → docker restart (volume mount)
- Docker image is rebuilt automatically when Dockerfile/requirements change
- Deployment notifications: Telegram (start/success/failure)
- Health check: https://mo.wooo.work/health (up to 5 retries)
- Sync latest CLAUDE.md / ADR-008 / memory (2026-04-19)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 01:21:13 +08:00