OG T
|
8fdd159e6b
|
chore: trigger CD — Phase 25 P0 v4.3 benchmark fixes + NIM CB protection
|
2026-04-05 02:10:22 +08:00 |
|
OG T
|
e3b94462ca
|
fix(ci): auto-install python3-venv so that venv creation does not fail
CD Pipeline / build-and-deploy (push) Failing after 21s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 02:03:18 +08:00 |
|
OG T
|
2243a21b96
|
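fix(ai-router): v4.3 NIM protection: timeouts do not count as CB failures; always try NIM first before switching to Gemini
CD Pipeline / build-and-deploy (push) Failing after 20s
Requirement: wait for NIM to respond before switching; slowness alone must not let the circuit breaker (CB) block NIM and route to Gemini
Changes:
- Timeout exceptions do not accumulate CB failures (only real connection errors count)
- NIM CB: failure_threshold=10, recovery_timeout=30s (more lenient than the defaults)
- Design doc v4.3: update Direction 2, remove incorrect assumptions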
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 01:51:12 +08:00 |
|
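The circuit-breaker rule described in 2243a21b96 (timeouts are ignored, only real connection errors count, failure_threshold=10, recovery_timeout=30s) can be sketched roughly as below. This is a minimal illustration, not the code in ai_router.py: the class name `CircuitBreaker`, its method names, and the injectable `clock` parameter are all assumptions made for the example.

```python
import asyncio
import time


class CircuitBreaker:
    """Hypothetical breaker: opens only after repeated non-timeout errors."""

    def __init__(self, failure_threshold=10, recovery_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock          # injectable so the sketch can be tested
        self._failures = 0
        self._opened_at = None       # None means the breaker is closed

    def allow_request(self):
        if self._opened_at is None:
            return True
        # Once the recovery window has elapsed, let a trial request through.
        return self._clock() - self._opened_at >= self.recovery_timeout

    def record_success(self):
        self._failures = 0
        self._opened_at = None

    def record_error(self, exc):
        # A slow provider is not a broken provider: timeouts are not counted.
        if isinstance(exc, (TimeoutError, asyncio.TimeoutError)):
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()
```

With this shape, any number of timeouts leaves the breaker closed, so the slow provider is still tried first on every request; only an accumulation of connection-level errors opens it.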
OG T
|
5ad403b287
|
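fix(p0): v4.3: benchmarks confirm CPU-only Ollama is unusable; DIAGNOSE always routes to NIM
Benchmark evidence (2026-04-05):
- Ollama llama3.2:3b CPU-only: 238s to return {"ok":true}; not usable in production
- Nemotron NIM: 2.2s to 27.3s, avg 10.6s; has been the primary provider all along (since Phase 22)
- NIM has never had a privacy issue; Incident data has always been sent to cloud GPUs
Changes:
- ai_router.py: _local_fallback_chain deprecated (empty list)
- ai_router.py: DIAGNOSE route/route_sync reverted to _full_fallback_chain
- config.py: update timeout comments to reflect the benchmark results
- test_p0_diagnose_routing.py: update docstring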
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 01:49:06 +08:00 |
|
OG T
|
8f64affbdb
|
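docs(runbooks): REBOOT-RECOVERY-SOP v3.0, a complete reboot automation plan
## Contents
A complete inventory, across all hosts, services, tools, and monitoring, of:
- Startup order and dependency graph
- Handling procedures for normal vs. abnormal reboots
- Detailed startup sequence for each host (188/110/120/121)
- Troubleshooting guide for common failures (silent alerts / CD failure / missing data / NodePort)
- E2E verification script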
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 01:48:29 +08:00 |
|
OG T
|
ad4abefcd9
|
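fix(k8s+ops): repair the alerting chain + auto-start the Gitea runner
CD Pipeline / build-and-deploy (push) Failing after 21s
## Fixes
1. Add 192.168.0.110 to NetworkPolicy allow-nginx-ingress
- Alertmanager (on 110) needs to POST webhooks directly from 110 to the API pod
- Before: 110 was blocked by the NetworkPolicy default-deny, so webhooks timed out
- After: 110 is on the ingress allowlist and the alerting chain is restored
2. Add Gitea Act Runner to awoooi-startup-110.sh
- Step 6: start /home/wooo/act-runner (gitea-runner container)
- Before: the runner stayed offline after a reboot, taking the entire CD pipeline down
- After: the runner restarts automatically; if its configuration has expired, it is cleared and re-registered automatically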
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 01:42:52 +08:00 |
|
OG T
|
be3aa6069b
|
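feat(backup): AWOOOI high-frequency backup: back up awoooi_prod every 6 hours
awoooi_prod is the core production DB; a once-daily backup with up to 24 hours of data loss is unacceptable:
- backup-awoooi-frequent.sh: backs up awoooi_prod every 6 hours (08:00/14:00/20:00)
- 02:00: full backup by backup-all.sh (including dev/k3s)
- Total: 4 backups/day, maximum data loss ≤ 6 hours
- GFS retention: 28h high-frequency + 30 daily + 12 weekly + 24 monthly
First run: ✅ 680K, 4s, snapshot db050dbc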
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 01:14:50 +08:00 |
|
OG T
|
3136fc5ea0
|
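feat(backup): fully automated backups + AWOOOI DB + extended GFS retention
Chief Architect backup audit, with automation now complete across the board:
- backup-awoooi.sh: new AWOOOI PostgreSQL backup script
- awoooi_prod (KB/incidents/AutoRepair/Drift) + k3s_datastore
- SSH from 110 to 188 to run pg_dump, integrated into restic
- First run: 680K, 9s, snapshot 8750748f ✅
- backup-all.sh v2.0: integrates a 4th service, AWOOOI DB
- Extended GFS retention policy:
- Daily: 7→30 copies (covers the last 30 days)
- Weekly: 4→12 copies (covers the last 3 months)
- Monthly: 6→24 copies (covers the last 2 years)
- BACKUP-STATUS.md: updated to a fully automated status overview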
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 01:11:31 +08:00 |
|
OG T
|
84cfdb6195
|
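docs(backup): full backup audit inventory + new AWOOOI DB and Gitea DB backup scripts
Chief Architect backup audit conclusions:
- awoooi_prod PostgreSQL: ❌ no backup (P0 gap)
- Gitea SQLite DB: ❌ no backup (corrupted today; manual repair took 2h+)
Added:
- scripts/backup/backup-awoooi-db.sh (deployed on 188, 02:00 daily)
- scripts/backup/backup-gitea-db.sh (deployed on 110, 01:00 daily)
- docs/runbooks/BACKUP-STATUS.md (overview table + deployment steps + SOP)
- LOGBOOK.md backup audit section
Pending manual deployment: the Commander must scp the scripts to 188/110 and set up crontab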
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 01:01:58 +08:00 |
|
OG T
|
8300879d02
|
chore: trigger CD deploy (warm-up + MinIO startup)
CD Pipeline / build-and-deploy (push) Failing after 24s
|
2026-04-05 01:00:31 +08:00 |
|
OG T
|
2f44d1281e
|
chore: trigger CD — warm-up Redis working memory deploy
|
2026-04-05 01:00:24 +08:00 |
|
OG T
|
c0c903dc48
|
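fix(startup): add MinIO to the 188 startup script, fixing Velero BSL Unavailable
MinIO does not start automatically after a reboot, which leaves the Velero BackupStorageLocation Unavailable
Add MinIO docker compose up -d to the STEP 7 Docker Compose services section
⚠️ The Commander must run the following commands manually for the startup script on 188 to take effect: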
sudo cp /tmp/awoooi-startup.sh /usr/local/bin/awoooi-startup.sh
sudo chmod +x /usr/local/bin/awoooi-startup.sh
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 00:52:13 +08:00 |
|
OG T
|
45458e8f33
|
docs(adr): update ADR-057 status to approved and implemented
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 00:44:31 +08:00 |
|
OG T
|
a81bf50537
|
feat(drift): ADR-057 implement adopt() via the Gitea PR API
- DriftAdoptService: creates branch + commit + PR through the Gitea REST API
without running git inside the API Pod (fixes the C2 security vulnerability)
- adopt() endpoint: 501 → real implementation (calls DriftAdoptService)
- config.py: add GITEA_API_URL / GITEA_API_TOKEN / GITEA_REPO_OWNER / GITEA_REPO_NAME
- GITEA_API_TOKEN has been injected into the K8s secret awoooi-secrets
- drift.py: remove the unused interpreter variable in trigger_drift_scan
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 00:39:29 +08:00 |
|
OG T
|
f4f454fd98
|
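feat(api): automatically warm up Redis Working Memory from PostgreSQL after a reboot
- main.py lifespan: restore INVESTIGATING/MITIGATING incidents from the DB at startup
- scripts/reboot-recovery: automation scripts for 188 + 110 + systemd services
- scripts/reboot-recovery: create aiops-network automatically (required by ClawBot)
- docs/runbooks/REBOOT-RECOVERY-SOP.md: fully rewritten, including documentation of the automation scripts
Why: a reboot empties Redis, so the frontend showed 0 incidents (the DB still held everything)
Commander approval: "All data must be recorded for the long term"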
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 00:39:20 +08:00 |
|
OG T
|
f94000aea2
|
chore: trigger CD — Phase 25 Review R2 fixes + ADR-054~057
|
2026-04-05 00:34:35 +08:00 |
|
OG T
|
96d5e18924
|
fix(p0): benchmark-driven corrections: adjust timeouts per benchmark, remove cloud Nemotron from _local_fallback_chain
- config.py: NEMOTRON_DIAGNOSE_TIMEOUT_SECONDS=60s (NIM measured 11-45s + 15s buffer)
- config.py: OLLAMA_DIAGNOSE_TIMEOUT_SECONDS=200s (Ollama measured ~173s + 27s buffer)
- ollama.py: add per-task timeout (diagnose/force_local use 200s)
- ai_router.py: remove Nemotron from _local_fallback_chain (NIM = cloud, must not be in the local chain)
- ai_router.py: v4.2: Option C scenario-based routing formally established
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 00:29:09 +08:00 |
|
OG T
|
ddb75b69c5
|
docs(logbook): record Phase 25 Review R2 passed + ADR-054~057
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 00:25:31 +08:00 |
|
OG T
|
15c7f6fcd3
|
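docs(adr): draft ADR-054/055/056/057: Phase 25 three-direction architecture decisions
ADR-054: DIAGNOSE Privacy-First Routing (approved)
- _local_fallback_chain design decision
- Chief Architect ruling on NEMOTRON privacy_level=local
- All local providers fail → REJECT + Telegram
ADR-055: Knowledge Auto-Harvesting (approved)
- Design rationale for AUTO_RUNBOOK DRAFT + ANTI_PATTERN PUBLISHED
- Notes on compute_hash() collision risk
- Mandatory GC-protection rule for fire-and-forget tasks
ADR-056: Config Drift Detection four-layer architecture (approved)
- Detector→Analyzer→Interpreter→Remediator responsibility boundaries
- AI performs intent analysis only and makes no remediation decisions
- adopt() suspended + Phase 1 limitation of _recent_reports
ADR-057: adopt() Gitea PR API implementation path (draft, pending approval)
- Resolves the security risk of running git add -A in the API Pod
- Safeguarded by the PR review process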
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 00:24:50 +08:00 |
|
OG T
|
4912c7f307
|
fix(phase25): Chief Architect Review R2 fixes (I1/I2/I3/I4/C3/M1)
I1: auto_repair_service: add the missing _pending_tasks GC protection for the anti_pattern task on the failure branch
C3: drift_remediator: _kubectl_apply() now implements resource_key scope filtering (fixes the bug where the parameter had no effect)
M1: drift_remediator: mark _git_push() as DISABLED to prevent accidental use
I2: drift.py: remove the dead adopt() endpoint link from Telegram notifications
I3: drift/page.tsx: handleScan POST body namespace→namespaces (aligned with the backend DriftScanRequest)
I4: drift/page.tsx: remove hard-coded English strings in favor of t('loading')/t('highCount')/t('mediumCount')
i18n: zh-TW.json + en.json: add the missing drift.loading key
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 00:22:38 +08:00 |
|
OG T
|
4bc4757fdc
|
test(phase25): Phase 25 P1/P2 source code inspection tests (36 tests)
- test_phase25_auto_harvesting.py: 18 tests for NemotronRunbookGenerator,
AntiPattern gate, fire-and-forget pattern, symptoms_hash
- test_phase25_drift_detection.py: 18 tests for DriftDetector, NemotronDriftInterpreter
(read-only), DriftRemediator, local fallback chain for DIAGNOSE
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 00:14:50 +08:00 |
|
OG T
|
cd5547f5eb
|
feat(web/kb): knowledge base displays the AUTO_RUNBOOK + ANTI_PATTERN types
- KnowledgeEntry type: add auto_runbook + anti_pattern
- TYPE_COLORS: auto_runbook (purple) + anti_pattern (red)
- Type filter: add the two new type options
- i18n: add type.auto_runbook + type.anti_pattern + status.published to zh-TW + en
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 18:09:10 +08:00 |
|
OG T
|
aea16c87ce
|
feat(web/drift): Config Drift Detection page: Phase 25 P2 frontend
CD Pipeline / build-and-deploy (push) Waiting to run
- drift/page.tsx: drift detection page (report list + manual scan)
- sidebar.tsx: add drift nav item (Diff icon, ops section)
- i18n: add nav.drift + drift.* keys to zh-TW + en
Features:
- GET /api/v1/drift/reports → shows the 20 most recent reports
- POST /api/v1/drift/scan → triggers a scan manually and shows a result banner
- DriftLevelBadge: high/medium/low drift counts
- StatusBadge: pending/resolved/ignored
- Displays the Nemotron intent analysis
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 18:08:05 +08:00 |
|
OG T
|
688146ef9c
|
test(ai-router): relax test_fallback_list from >= 2 to >= 1
CD Pipeline / build-and-deploy (push) Has been cancelled
Once the DIAGNOSE local chain selects Nemotron, Ollama is the only fallback left,
so the >= 2 assertion is too strict; same fix as test_query_routes_to_ollama
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 18:05:25 +08:00 |
|
OG T
|
428ed5f8cd
|
test(ai-router): fix the test_query_routes_to_ollama assertion
CD Pipeline / build-and-deploy (push) Failing after 41s
After Phase 25 P0, DIAGNOSE uses _local_fallback_chain [NEMOTRON, OLLAMA].
With NEMOTRON selected as primary, OLLAMA is the only fallback left,
so the >= 2 assertion is too strict; changed to >= 1.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 18:02:43 +08:00 |
|
OG T
|
c4923b6908
|
docs(logbook): record that all Phase 22.4 + Phase 25 verification passed
- Phase 22.4 tests 18/18 PASSED (b6e12f7)
- embed-all 7/7 succeeded on prod
- semantic-search E2E score=0.6867 verified
- drift /scan E2E responds normally
- drift-scanner CronJob runs hourly
- dev/prod DB migration (symptoms_hash + enum) complete
- 53 integration tests PASSED
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 18:00:33 +08:00 |
|
OG T
|
a562db4048
|
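fix(phase25): Chief Architect Review C1/C2/I1/I3 fixes
CD Pipeline / build-and-deploy (push) Failing after 57s
C1: NemotronProvider.privacy_level cloud→local
NIM is deployed on the internal network at 192.168.0.188, not the official cloud API,
so it can sit inside the DIAGNOSE _local_fallback_chain privacy boundary
C2: adopt() endpoint suspended, returns 501
Running git add -A in the API Pod is a security risk
To be reimplemented via the Gitea PR API once ADR-057 is drafted
I1: fix the timeout log to record the timeout value actually applied
Previously it always logged NEMOTRON_TIMEOUT_SECONDS=45
Now it logs the correct value selected by task_type
I3: add the DIAGNOSE privacy boundary to route_sync()
async route() already had _local_fallback_chain;
the sync version was missed and is added here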
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 18:00:05 +08:00 |
|
OG T
|
c4eafd2a5b
|
fix(ai-router): exclude selected_model from fallback_models to avoid duplicates
CD Pipeline / build-and-deploy (push) Successful in 6m58s
After the DIAGNOSE intent is routed to Nemotron, OPENCLAW_NEMO in the fallback_chain
uses the same model string, so fallback_models contained the already-selected model.
Fix: filter out any model string equal to selected_model.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 17:43:44 +08:00 |
|
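The fix in c4eafd2a5b amounts to a one-line filter. A minimal sketch follows, assuming the fallback chain is a plain list of model strings; the function name `build_fallback_models` and the model names in the usage line are hypothetical and are not taken from ai_router.py.

```python
def build_fallback_models(fallback_chain, selected_model):
    """Return the fallback chain without the model that was already selected.

    Order is preserved, so the remaining fallbacks are tried in their
    original priority.
    """
    return [model for model in fallback_chain if model != selected_model]


# Usage with made-up model strings: two providers share one model string.
fallbacks = build_fallback_models(
    ["nemotron-70b", "llama3.2:3b", "nemotron-70b"], "nemotron-70b"
)
```

Comparing on the model string rather than on the provider name is what catches the case in the commit message, where two different providers resolve to the same underlying model.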
OG T
|
0c180dec86
|
docs(spec): Direction 2 implementation correction notes: Nemotron privacy_level=cloud (P0)
|
2026-04-04 17:42:53 +08:00 |
|
OG T
|
8056be5847
|
feat(ai-router): upgrade the DIAGNOSE intent override to Nemotron (P0)
|
2026-04-04 17:41:45 +08:00 |
|
OG T
|
c94cf5ac68
|
chore: trigger CD deploy Phase 25 (3455044)
|
2026-04-04 17:36:05 +08:00 |
|
OG T
|
671974dedb
|
test(ai-router): TestLocalFallbackChain: verify the require_local privacy boundary (P0)
CD Pipeline / build-and-deploy (push) Failing after 43s
Add two tests: cloud providers are skipped, and total failure returns local_providers_unavailable.
The implementation logic already exists in AIRouterExecutor.execute() (2026-04-04 ogt Phase 25 P0).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 17:32:32 +08:00 |
|
OG T
|
ffd679f5d3
|
feat(nemotron): per-task timeout; DIAGNOSE uses its own timeout setting (P0)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 16:58:23 +08:00 |
|
OG T
|
3455044457
|
feat(phase25): full implementation of the three Nemotron proactive-defense directions, P0+P1+P2
CD Pipeline / build-and-deploy (push) Failing after 38s
Type Sync Check / check-type-sync (push) Failing after 35s
P0 - DIAGNOSE Privacy-First Routing:
- ai_router.py: _local_fallback_chain [NEMOTRON→OLLAMA→REJECT]
- DIAGNOSE intent override changed to NEMOTRON (was OLLAMA)
- DIAGNOSE fallback uses a local-only chain and never touches the cloud
- On total failure: REJECT + Telegram notification
- config.py: NEMOTRON_DIAGNOSE_TIMEOUT_SECONDS=30, OLLAMA_DIAGNOSE_TIMEOUT_SECONDS=60
- nemotron.py: select the timeout based on context[task_type]
P1 - Knowledge Auto-Harvesting:
- models/knowledge.py: EntryType.AUTO_RUNBOOK + ANTI_PATTERN + symptoms_hash
- EntryStatus.PUBLISHED (ANTI_PATTERN is published directly, no review required)
- models/playbook.py: SymptomPattern.compute_hash() (16-character deterministic hash)
- services/runbook_generator.py: NemotronRunbookGenerator (v1.1)
- generate_runbook() → AUTO_RUNBOOK (DRAFT) + Telegram review card
- generate_anti_pattern() → ANTI_PATTERN (PUBLISHED) + Telegram notification
- Uses nvidia.chat() (the correct interface); Minimal fallback when Nemotron times out
- knowledge_service.py: check_anti_pattern(symptoms_hash, days=7)
- db/models.py: symptoms_hash VARCHAR(16) + ix_knowledge_symptoms_hash
- repositories/knowledge_repository.py: create() supports symptoms_hash + status
- auto_repair_service.py: anti_pattern_gate in decide() + runbook hook in execute()
- migrations/phase8_symptoms_hash.sql: ALTER TABLE + partial index + PUBLISHED constraint
P2 - Config Drift Detection:
- models/drift.py: DriftItem/DriftReport/DriftLevel/DriftIntent/DriftStatus
- services/drift_detector.py: GitStateReader + K8sStateReader + DriftDetector
- services/drift_analyzer.py: allowlist filtering + DriftLevel grading
- services/drift_interpreter.py: NemotronDriftInterpreter (intent analysis; does not generate remediation commands)
- services/drift_remediator.py: rollback(kubectl apply) + adopt(git push gitea)
- api/v1/drift.py: POST /scan, GET /reports, POST /rollback, POST /adopt
- migrations/phase9_drift_reports.sql: drift_reports table
- k8s/drift-cronjob.yaml: CronJob for hourly automatic scans
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 12:35:05 +08:00 |
|
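Commit 3455044457 describes SymptomPattern.compute_hash() only as a 16-character deterministic hash stored in a VARCHAR(16) column. One way such a hash could be built is sketched below. The use of SHA-256, the normalization steps, and the flat `symptoms` argument are all assumptions for illustration; the actual implementation in models/playbook.py is not shown in this log.

```python
import hashlib


def compute_hash(symptoms):
    """Hypothetical 16-character symptoms hash.

    Symptoms are trimmed, lower-cased, de-duplicated, and sorted before
    hashing, so the same set of symptoms yields the same hash regardless
    of input order or casing.
    """
    normalized = sorted({s.strip().lower() for s in symptoms})
    digest = hashlib.sha256("\n".join(normalized).encode("utf-8")).hexdigest()
    # Truncate to 16 hex characters (64 bits) to fit a VARCHAR(16) column.
    return digest[:16]
```

Truncating to 64 bits keeps the column small at the cost of a nonzero collision probability, which is presumably the collision risk that ADR-055 discusses.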
OG T
|
0b41df45d6
|
docs(plans): three-direction implementation plan P0/P1/P2
- P0: DIAGNOSE Privacy-First Routing (local chain isolation + REJECT protection)
- P1: Knowledge Auto-Harvesting (Anti-Pattern closed loop + Runbook generation)
- P2: Config Drift Detection (GitOps gatekeeper + Nemotron intent analysis)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 12:31:36 +08:00 |
|
OG T
|
035cb9cd0d
|
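docs(spec): design doc for the three Nemotron proactive-defense directions
- Direction 1: Knowledge Auto-Harvesting (Anti-Pattern closed loop + automatic Runbook generation)
- Direction 2: DIAGNOSE Privacy-First Routing (Local-Only Fallback Chain)
- Direction 3: Config Drift Detection (GitOps gatekeeper + Nemotron intent analysis)
100% technical endorsement from Chief Architect ogt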
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 12:18:11 +08:00 |
|
OG T
|
b6e12f74f4
|
test(phase22): Phase 22.4 Nemotron collaboration tests 18/18 PASSED
CD Pipeline / build-and-deploy (push) Successful in 7m12s
- Fix file path: apps/api/src/ → src/ (tests run from the apps/api/ directory)
- Increase snippet size: 800→1500 chars (a long docstring pushed the flag check out of range)
- Increase _call_nemotron_tools snippet: 2000→5000 chars (the timeout appears late in the function)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 12:16:28 +08:00 |
|
OG T
|
df3ef9006c
|
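fix(auto-repair): Chief Architect Review: 4 Critical/Important fixes
CD Pipeline / build-and-deploy (push) Successful in 7m2s
Critical #1: move the KM write task out of try/except
- The KM write in _trigger_learning was inside the try, so nothing was written to KM when learning failed
- Moved after the except so that both success and failure are written
- Remove redundant import asyncio (already imported at top level)
- Minor: approval.incident_id or None guards against empty strings
Important #2: add PRIMARY KEY in the migration
- playbook_id promoted from UNIQUE to PRIMARY KEY
- ALTER TABLE ADD PRIMARY KEY already run on the prod DB
Important #3: s.sequence→s.step_number, s.description→s.command
- embed_playbook() used nonexistent field names, so RAG vector indexing failed silently
- Correct RepairStep fields: step_number, command
Important #1: PlaybookService._get_rag_service no longer caches at the Service layer
- Calls the factory get_playbook_rag_service() every time instead
- Prevents a stale instance from bypassing the factory's is_closed rebuild logic
Cold-start fix (Chief Architect suggestions B+C):
- After a successful execution, _trigger_playbook_extraction automatically sets
execution_success=True, effectiveness_score=4, status=RESOLVED
- Skip path: logger.debug → logger.info for better observability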
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 12:02:03 +08:00 |
|
OG T
|
902443f376
|
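feat(knowledge): frontend semantic search UI: toggle button + similarity score display
CD Pipeline / build-and-deploy (push) Has been cancelled
- Add a semantic/keyword toggle button next to the search bar (Sparkles icon, claw-blue highlight)
- In semantic mode, call GET /api/v1/knowledge/semantic-search (500ms debounce)
- Right side of the entry card: similarity percentage in semantic mode, view_count in keyword mode
- Empty state: show hint text in semantic mode when there is no input
- i18n: add 6 keys to zh-TW + en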
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 11:58:40 +08:00 |
|
OG T
|
369413f87d
|
docs: update LOGBOOK: KB Phase 2 fixes all complete + 5 tests PASSED
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 11:55:40 +08:00 |
|
OG T
|
f6567751a9
|
test(knowledge): pgvector semantic search integration tests (5 tests)
CD Pipeline / build-and-deploy (push) Has been cancelled
- test_save_embedding: verifies the CAST AS vector syntax
- test_semantic_search_returns_results: cosine similarity query
- test_semantic_search_threshold_filters: orthogonal vectors are filtered out by the threshold
- test_semantic_search_archived_excluded: archived entries are excluded
- test_list_unembedded_entries: lists entries not yet embedded
All 5/5 PASSED (awoooi_dev PostgreSQL + pgvector)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 11:55:09 +08:00 |
|
OG T
|
72d7536ead
|
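feat(auto-repair): complete auto-repair closed loop + KM capture integration
CD Pipeline / build-and-deploy (push) Has been cancelled
1. DB Migration: playbooks table (phase7_playbooks_table.sql)
- This was the root cause of auto-repair failing to start: the table had never been created
- 5 indexes: status/tags/alert_names/source_incidents/created_at
- Already run on the prod DB
2. playbook_service: automatically capture to KM after extraction
- After extract_from_incident() completes, fire-and-forget _write_to_km()
- Content includes the symptom pattern, repair steps, confidence, and source Incident
3. approval_execution: capture execution results to KM
- After _trigger_learning(), fire-and-forget _write_execution_result_to_km()
- Both success and failure records are written, category=execution_result
Complete closed loop:
Alert → AI analysis → Playbook lookup → Decision → Execution → Write result to KM
↓
Incident resolved → KM (knowledge_extractor)
→ Playbook extraction → KM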
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 11:54:15 +08:00 |
|
OG T
|
429d81d29b
|
fix(knowledge): I2+I3 Chief Architect Important fixes: dependency injection + finer-grained exception handling
CD Pipeline / build-and-deploy (push) Has been cancelled
I2: inject KnowledgeService via DecisionManager.__init__
_query_kb_context_inner uses self._knowledge_svc, removing the in-function import coupling
I3: finer-grained exception handling in _query_kb_context
- asyncio.TimeoutError → warning (expected degradation)
- ConnectionError/OSError → warning (Ollama connection problem, expected degradation)
- Exception → error (unexpected; raises monitoring visibility)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 11:51:43 +08:00 |
|
OG T
|
69a9218723
|
docs: update LOGBOOK with KB Phase 2 + Chief Architect Review notes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 11:49:31 +08:00 |
|
OG T
|
f846000c8c
|
fix(knowledge): C1 Chief Architect must-fix: 5-second hard timeout for _query_kb_context
CD Pipeline / build-and-deploy (push) Has been cancelled
C1 fix (Chief Architect Review 74/100 → conditional pass):
- Extract _query_kb_context_inner, which holds the actual query logic
- _query_kb_context wraps it with asyncio.wait_for(timeout=5.0)
- An Ollama hang or slow response costs at most 5s, protecting the 30s decision SLA
- On timeout, logger.warning("kb_rag_timeout") and degrade silently
Also remove the emoji from the LLM prompt (## 📚 → ## Knowledge Base)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 11:48:57 +08:00 |
|
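The hard-timeout wrapper from f846000c8c can be sketched as follows. This is an illustration under stated assumptions rather than the code in decision_manager: the standalone function `query_kb_context`, its `inner` callable parameter, and the `None` return value on timeout are all hypothetical; only asyncio.wait_for, the 5.0s budget, and the "kb_rag_timeout" warning come from the commit message.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def query_kb_context(inner, timeout=5.0):
    """Run the KB lookup coroutine function `inner` under a hard timeout.

    On timeout the lookup is cancelled, a warning is logged, and None is
    returned so the caller can continue without KB context.
    """
    try:
        return await asyncio.wait_for(inner(), timeout=timeout)
    except asyncio.TimeoutError:
        logger.warning("kb_rag_timeout")
        return None
```

Because asyncio.wait_for cancels the inner coroutine when the deadline passes, a hung embedding call cannot consume more than the configured budget of the overall decision time.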
OG T
|
860dc1d892
|
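feat(knowledge): KB Phase 2: OpenClaw RAG integration
CD Pipeline / build-and-deploy (push) Has been cancelled
_dual_engine_analyze gains _query_kb_context():
- Semantic search for related KB entries before Incident analysis (top-3, threshold=0.4)
- Inject the KB context into expert_context.diagnosis_context, which is passed to the LLM
- Degrade silently on failure without affecting the main analysis flow
- The dual_engine_llm_win log gains a kb_rag field, making the RAG hit rate observable
Architecture: _query_kb_context calls the Service layer through get_knowledge_service(),
in line with leWOOOgo building-block modularity: decision_manager does not access DB/pgvector directly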
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 11:46:47 +08:00 |
|
OG T
|
d0f09705e5
|
fix(auto-repair): fix three root causes blocking auto-repair
CD Pipeline / build-and-deploy (push) Has been cancelled
1. playbook_rag: the Ollama embedding http_client is_closed after a rolling restart
- Add _get_http_client(), which detects is_closed and rebuilds the client automatically
- singleton get_playbook_rag_service() gains an is_closed rebuild check
2. telegram: add an ai_model field showing the underlying model that made the decision
- TelegramMessage.ai_model field
- format() / format_with_nemotron() display "Nemotron (nemotron-70b)"
- Add a model field to the openclaw proposal_dict
- Wired through decision_manager / send_approval_card
3. DB: delete 9 zombie PENDING records from 3/26 (mock_fallback CRITICAL test records)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 11:46:25 +08:00 |
|
OG T
|
12bc94796a
|
fix(knowledge): asyncpg does not support :param::type; use CAST(:param AS vector) instead
CD Pipeline / build-and-deploy (push) Has been cancelled
asyncpg uses $1 positional parameters, so the :emb::vector syntax raises PostgresSyntaxError.
Both save_embedding and semantic_search now use the CAST(:emb AS vector) syntax.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 11:43:59 +08:00 |
|
OG T
|
cddc4cb1fc
|
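fix(knowledge): Chief Architect Review fixes C1+C2+I1+I2 (71→~88/100)
CD Pipeline / build-and-deploy (push) Successful in 7m16s
C1: add the missing save_embedding + semantic_search +
list_unembedded_entries to the IKnowledgeRepository Protocol, restoring the interface-first protection layer
C2: move the Service-layer raw SQL in embed_all_entries into Repository.list_unembedded_entries()
The Service now calls through the Protocol, in line with the leWOOOgo building-block principle
I1: tasks from asyncio.create_task are added to a _pending_tasks set that holds a reference, preventing garbage collection and
lost Tasks at shutdown; a task is discarded automatically once it is done
I2: OllamaEmbeddingService is injected via KnowledgeService.__init__ instead of being instantiated on every call,
so a single instance is reused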
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 11:22:38 +08:00 |
|
OG T
|
8960bba7fe
|
feat(knowledge): pgvector RAG: semantic search + background embedding pipeline
CD Pipeline / build-and-deploy (push) Has been cancelled
- repository: save_embedding (raw SQL pgvector cast) + semantic_search (cosine <=>)
- service: create_entry embeds in the background + semantic_search + embed_all_entries batch backfill of embeddings
- router: GET /semantic-search (q/limit/threshold) + POST /embed-all admin endpoint
Embedding model: nomic-embed-text (Ollama 192.168.0.188, 768 dims)
Index: ivfflat cosine (knowledge_entries.embedding vector(768))
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 11:17:24 +08:00 |
|