ogt
|
77d3a1da48
|
feat(ai-ops): ADR-013 AIOps 自動修復閉環完整實作
CD Pipeline / deploy (push) Failing after 3m24s
架構(Exception → Incident → PlayBook → Heal → KM → Telegram):
新增元件:
- database/autoheal_models.py: Incident/Playbook/HealLog 三張表 + 7 條種子 PlayBook
- migrations/013_autoheal.sql: 建表 DDL + 種子資料(冪等 INSERT)
- services/auto_heal_service.py: 核心引擎 7 步閉環
- _classify_error: 8 類錯誤自動分類 (DNS_FAIL/DB_UNREACHABLE/OOM/...)
- _match_playbook: error_type + keyword + 冷卻 + max_retries 保護
- _execute_playbook: DOCKER_RESTART/SSH_CMD/ALERT_ONLY/WAIT_RETRY
- _sink_to_km: 修復知識寫入 ai_insights (auto_heal_playbook)
- SSH 白名單:僅允許 docker restart / compose restart / docker start
修改元件:
- database/manager.py: _init_autoheal_tables() 啟動時建表+種子 PlayBook
- scheduler.py: 3 個核心任務植入 handle_exception
(run_auto_import_task / run_icaim_analysis_task / run_weekly_strategy_task)
- requirements.txt: paramiko(SSH 跳板;不可用時降級 subprocess+CLI ssh)
安全設計: CMD 白名單 + cooldown + max_retries escalation + DB 冪等 migration
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-19 16:03:49 +08:00 |
|
ogt
|
bda4edd23b
|
feat(ai-ops): ADR-012 Phase 2/3/4 完整實作
CD Pipeline / deploy (push) Successful in 1m11s
Phase 2 — Hermes L1 Observer 真實接入:
- services/event_router.py::_hermes_observe() 呼叫 hermes3:latest
@192.168.0.111:11434/api/generate,做 stack trace 翻譯
- 輸出 JSON {summary, probable_cause, actions},容錯 markdown fence
- scheduler.py run_auto_import_task / run_momo_task 兩個 outer
except 改走 event_router.dispatch(),帶完整 trace
Phase 3 — NemoTron L2 Investigator 規則式實作:
- event_router._L2_RULES: event_type → [(action, params)] 規則表
• db_connection_error → query_km + retry_task(60s backoff)
• crawler_timeout → silence_alert(30min) + retry_task(300s)
• nim_quota_exhausted → silence_alert(720min)
• embedding_failure → silence_alert(10min)
- agent_actions.retry_task 真實實作: threading.Timer + exponential
backoff (60→120→240s) + _retry_state 追蹤 + ALLOWED_RETRY_TASKS
白名單 + 非 scheduler 容器回 'deferred'
Phase 4 — L3 HITL Ops 擴充:
- agent_actions: pause_task / resume_task / force_retry_now / is_task_paused
- OPS_ACTIONS 白名單與 SAFE_ACTIONS 嚴格分離(L2 不可呼叫 L3)
- telegram_templates.ops_action_request(): 4 按鈕 inline keyboard
(暫停1h / 暫停6h / 立即重試 / 解除暫停)
- telegram_bot_service._handle_ops_callback(): 接 momo:ops:<action>:<task>
- scheduler.py run_momo_task + run_auto_import_task 開頭加
is_task_paused() 檢查(Phase 4 暫停機制生效)
安全邊界(ADR-012 §①):
- L1 Hermes 只讀 → 失敗降 L0 + 🟡 標記
- L2 NemoTron 只碰 ai_insights + 發 Telegram + SAFE_ACTIONS
- L3 OpenClaw 任意動作必經 HITL inline keyboard 批准
- 不做容器重啟按鈕(需 docker socket,風險過高)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-19 13:26:51 +08:00 |
|
ogt
|
986908222d
|
feat(openclaw): 週日 02:00 Meta-Analysis + 全排程表完成
CD Pipeline / deploy (push) Successful in 1m6s
openclaw_strategist_service.py:
- generate_meta_analysis_report(): 從 ai_insights 抽取週統計
(高頻 SKU / relearn 事件 / 歸檔數) → Gemini 綜合分析 → 雙寫 KM + Telegram
scheduler.py:
- run_openclaw_meta_analysis_task() 排程包裝
run_scheduler.py:
- 週日 02:00 掛入 run_openclaw_meta_analysis_task
P1 三層 Agent 自主學習排程全部完成:
02:00 DB備份 / 03:00 去重 / 04:00 品質重算
週一 07:00 週報 / 週日 02:00 Meta-Analysis
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-19 11:40:58 +08:00 |
|
ogt
|
e6109c2ef8
|
feat(adr-005): 每日去重 03:00 + 品質分數重算 04:00 批次
CD Pipeline / deploy (push) Successful in 1m8s
openclaw_learning_service.py:
- run_dedup_batch(): 同 SKU/type/period 保留最高 avg_quality,其餘 archived
- run_quality_rescore_batch(): 套時間衰減公式全量重算 avg_quality;
relearn 狀態額外 -20%;分數 < 0.05 自動歸檔
scheduler.py + run_scheduler.py:
- run_dedup_batch_task() → 每日 03:00
- run_quality_rescore_task() → 每日 04:00
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-19 11:38:01 +08:00 |
|
ogt
|
676c711e7a
|
feat: AI 治理完備 V10.3 — 技術債清零 + DB 備份機制 + 備份 AI 監控
CD Pipeline / deploy (push) Waiting to run
技術債清零 (2026-04-19):
- migrations/010: ai_insights 補 decay_exempt/avg_quality/status/ai_model/feedback 欄位
- migrations/011: embedding_retry_queue 持久化表 (ADR-009)
- migrations/012: backup_log 備份記錄表
- services/openclaw_learning_service: 記憶體 Queue → DB retry queue,時間衰減 RAG
- services/nemoton_dispatcher_service: 三個 tool 強制雙寫 ai_insights (_sink_insight_to_km)
- services/import_service: Excel 前置欄位防禦(商品名稱類 + 業績金額類)
- services/ollama_service: generate_embedding 新增 EMBEDDING_HOST env,embedding 永遠走 192.168.0.111
- SYSTEM_VERSION: V9.4 → V10.3
DB 備份機制:
- scripts/pg_backup.sh: host-level pg_dump 備份腳本,cron 每日 02:00,保留 7 天,Telegram 通知
- services/db_backup_service.py: Python 備份 service,寫入 backup_log
- scheduler: run_db_backup_task (02:00) + run_backup_monitor_task (每 6h AI Agent 監控)
- Dockerfile: 加入 postgresql-client
文件:
- CLAUDE.md: 環境架構依 ADR-008 實地重寫,含完整 SSH/Docker 部署 SOP
- PROJECT_CONSTITUTION.md: 內容已整合入 CLAUDE.md,刪除重複檔案
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-19 02:03:45 +08:00 |
|
ogt
|
1b4f3a7bbe
|
feat: EwoooC 初始化 — 完整專案推版至 Gitea
CD Pipeline / deploy (push) Failing after 59s
- 建立 Gitea Actions CD pipeline (.gitea/workflows/cd.yaml)
- 部署模式: rsync Python 檔案至 188 → docker restart (volume mount)
- Dockerfile/requirements 變動時自動重建 Docker image
- 部署通知: Telegram (開始/成功/失敗)
- 健康檢查: https://mo.wooo.work/health (最多 5 次重試)
- 同步最新 CLAUDE.md / ADR-008 / memory (2026-04-19)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-19 01:21:13 +08:00 |
|