OG T
|
f51bf5a6a8
|
feat(backup): backup coverage for every service + alerting (all 9/9 services covered)
New backups (deployed to 110; all passed on first run):
- backup-langfuse.sh: Langfuse AI tracing/evaluation DB (7238 traces)
- backup-monitoring.sh: Prometheus + Grafana + Alertmanager volumes + configs
- backup-signoz.sh: SigNoz ClickHouse + SQLite (distributed tracing/logs)
- backup-open-webui.sh: Open-WebUI LLM conversation history (volume on 188, via SSH)
- backup-clawbot.sh: ClawBot Redis state/cache (volume on 188, via SSH)
- backup-all.sh v3.0: now covers all 9/9 services
Alerting:
- common.sh: notify_clawbot now uses the correct /webhook/custom payload format
- failed → severity:critical → immediate 🔴 Telegram alert
- Alert test passed: {"status":"ok","alert_id":"878c4c59..."}
GFS retention: 30 daily / 12 weekly / 24 monthly (plus 28h of high-frequency backups for AWOOOI)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 11:12:42 +08:00 |
|
OG T
|
91564c6ea3
|
docs(sop): REBOOT-RECOVERY-SOP.md v4.0
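Updates:
- Added Sentry startup instructions for /opt/sentry (110 Step 7/9)
- New section on repairing Sentry corruption after a reboot (PostgreSQL WAL/Redis RDB/ClickHouse parts)
- Alert-silence diagnostic tree: added a "rules not deployed" branch + deploy-alerts.sh fix command
- E2E verification script now checks Sentry + the Prometheus rule count (≥25)
- Architecture diagram: added Sentry :9000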
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 03:11:27 +08:00 |
|
OG T
|
0db9b41808
|
docs(plan): full Observability + Auto-healing implementation plan (15 Tasks, 3 Sprints)
Sprint 1 (P0): unified Prometheus alert rules + Sentry startup + CD sync
Sprint 2 (P1): SigNoz log alerts + Sentry SDK tags
Sprint 3 (P2): SSH HostRepairAgent infrastructure
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 02:24:23 +08:00 |
|
OG T
|
de33abe0e3
|
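docs(spec): system-wide self-healing closed-loop design spec v1.0
A single solution to three major problems:
1. Prometheus rules not deployed (13 → 40+ rules, incl. SentryDown/AlertChain)
2. Logs are collected but there is no log-based alerting
3. Auto-repair is limited to the K8s layer; no Host Docker/systemd repair capability
Includes:
- Unified label convention (layer/component/team/host)
- Sprint 1: rule deployment + Sentry startup + CD sync
- Sprint 2: SigNoz log alerts + Sentry integration
- Sprint 3: SSH HostRepairAgent + Playbooks
- Points to integrate into SOP v4.0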
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 02:14:01 +08:00 |
|
OG T
|
2243a21b96
|
fix(ai-router): v4.3 NIM protection: timeouts no longer count as CB failures; NIM is always tried before switching to Gemini
CD Pipeline / build-and-deploy (push) Failing after 20s
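Requirement: wait for NIM to respond before switching; a slow NIM must not be blocked by the CB and diverted to Gemini
Changes:
- Timeout exceptions no longer accumulate CB failures (only genuine connection errors count)
- NIM CB: failure_threshold=10, recovery_timeout=30s (more lenient than the default)
- Design doc v4.3: updated Direction 2 and removed an incorrect assumption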
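A minimal sketch of the v4.3 rule, assuming a simple in-process breaker: timeouts propagate to the caller without touching the failure count, while connection errors count toward `failure_threshold=10` and open the breaker for `recovery_timeout=30s`. The class name `TimeoutTolerantBreaker` is hypothetical, and the built-in `TimeoutError`/`ConnectionError` stand in for whatever exceptions the real NIM client raises.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised while the breaker is open and calls are short-circuited."""


class TimeoutTolerantBreaker:
    """Hypothetical breaker: only hard connection errors count as failures."""

    def __init__(self, failure_threshold: int = 10, recovery_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.recovery_timeout:
                raise CircuitOpenError("NIM circuit is open")
            self._opened_at = None  # recovery window elapsed: allow a trial call
        try:
            result = fn()
        except TimeoutError:
            raise  # slow is not broken: surface the timeout but do not count it
        except ConnectionError:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # (re)open the breaker
            raise
        self._failures = 0  # any success resets the count
        return result
```

Injecting `clock` keeps the recovery window testable without real sleeps.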
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 01:51:12 +08:00 |
|
OG T
|
8f64affbdb
|
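docs(runbooks): REBOOT-RECOVERY-SOP v3.0, complete reboot automation plan
## Contents
Full inventory of every host, service, tool, and monitor, covering:
- Startup order and dependency graph
- Handling flows for normal vs. abnormal reboots
- Detailed startup sequence per host (188/110/120/121)
- Common troubleshooting handbook (alert silence/CD failure/data loss/NodePort)
- E2E verification script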
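A startup order can be derived mechanically from a dependency graph. The sketch below uses Python's standard `graphlib` (3.9+) with a hypothetical dependency map built from services named elsewhere in this log; the real graph lives in REBOOT-RECOVERY-SOP.md. Each wave contains services whose dependencies are already up, so services within one wave could start in parallel.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical dependency map: service -> services that must already be up.
DEPENDS_ON: Dict[str, Set[str]] = {
    "postgresql": set(),
    "redis": set(),
    "aiops-network": set(),
    "k3s": {"postgresql"},           # k3s_datastore is a PostgreSQL database
    "awoooi-api": {"k3s", "redis"},
    "clawbot": {"aiops-network", "redis"},
}


def startup_waves(depends_on: Dict[str, Set[str]]) -> List[List[str]]:
    """Group services into waves; everything in a wave can start in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # assume the whole wave started successfully
    return waves


for i, wave in enumerate(startup_waves(DEPENDS_ON), start=1):
    print(i, wave)
```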
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 01:48:29 +08:00 |
|
OG T
|
be3aa6069b
|
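feat(backup): AWOOOI high-frequency backup: awoooi_prod backed up every 6 hours
awoooi_prod is the core production DB; with one backup per day, a worst-case loss of 24 hours is unacceptable:
- backup-awoooi-frequent.sh: backs up awoooi_prod every 6 hours (08:00/14:00/20:00)
- 02:00: full backup by backup-all.sh (including dev/k3s)
- 4 runs/day in total; maximum data loss ≤ 6 hours
- GFS retention: 28h high-frequency + 30 daily + 12 weekly + 24 monthly
First run: ✅ 680K, 4s, snapshot db050dbc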
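The "≤ 6 hours" figure is the largest gap between consecutive runs on a 24-hour clock. A small check, assuming runs at 02:00, 08:00, 14:00 and 20:00 as listed above (the function name is illustrative):

```python
from typing import Iterable


def max_gap_hours(run_hours: Iterable[int]) -> int:
    """Worst-case data-loss window (hours) for a daily schedule, wrapping past midnight."""
    hours = sorted(set(h % 24 for h in run_hours))
    if not hours:
        raise ValueError("schedule has no runs")
    # Gap from each run to the next; the last run wraps to the first run of the next day.
    gaps = [(hours[(i + 1) % len(hours)] - h) % 24 or 24 for i, h in enumerate(hours)]
    return max(gaps)


print(max_gap_hours([2, 8, 14, 20]))  # 4 runs/day → 6
print(max_gap_hours([2]))             # daily only → 24
```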
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 01:14:50 +08:00 |
|
OG T
|
3136fc5ea0
|
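feat(backup): fully automated backups + AWOOOI DB + longer GFS retention
Chief architect backup audit: all automation complete:
- backup-awoooi.sh: new AWOOOI PostgreSQL backup script
- awoooi_prod (KB/incidents/AutoRepair/Drift) + k3s_datastore
- SSHes from 110 to 188 to run pg_dump; integrated into restic
- First run: 680K, 9s, snapshot 8750748f ✅
- backup-all.sh v2.0: adds AWOOOI DB as the 4th service
- GFS retention extended:
- daily 7→30 (covers the last 30 days)
- weekly 4→12 (covers the last 3 months)
- monthly 6→24 (covers the last 2 years)
- BACKUP-STATUS.md: updated into a fully automated status overview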
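A rough sketch of how a GFS policy such as 30 daily / 12 weekly / 24 monthly decides what survives, assuming restic-style semantics: keep the newest snapshot in each of the most recent N days, ISO weeks, and months that have snapshots, and keep the union. restic can apply this itself via `restic forget --keep-daily 30 --keep-weekly 12 --keep-monthly 24`; the Python below is only for reasoning about the policy.

```python
from __future__ import annotations

from datetime import datetime
from typing import Callable, Hashable, Iterable


def gfs_keep(snapshots: Iterable[datetime], daily: int, weekly: int, monthly: int) -> set[datetime]:
    """Return the snapshots a daily/weekly/monthly (GFS) policy would keep."""
    newest_first = sorted(snapshots, reverse=True)
    rules: list[tuple[int, Callable[[datetime], Hashable]]] = [
        (daily, lambda t: t.date()),                # one per calendar day
        (weekly, lambda t: t.isocalendar()[:2]),    # one per ISO (year, week)
        (monthly, lambda t: (t.year, t.month)),     # one per calendar month
    ]
    keep: set[datetime] = set()
    for limit, bucket_of in rules:
        seen: set[Hashable] = set()
        for t in newest_first:
            bucket = bucket_of(t)
            if bucket in seen:
                continue  # a newer snapshot already represents this bucket
            if len(seen) >= limit:
                break  # this rule has filled its quota
            seen.add(bucket)
            keep.add(t)
    return keep
```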
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 01:11:31 +08:00 |
|
OG T
|
84cfdb6195
|
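docs(backup): complete backup audit inventory + new backup scripts for AWOOOI DB and Gitea DB
Chief architect backup audit findings:
- awoooi_prod PostgreSQL: ❌ no backup (P0 gap)
- Gitea SQLite DB: ❌ no backup (corrupted today; manual repair took 2h+)
Added:
- scripts/backup/backup-awoooi-db.sh (deployed on 188, daily at 02:00)
- scripts/backup/backup-gitea-db.sh (deployed on 110, daily at 01:00)
- docs/runbooks/BACKUP-STATUS.md (overview table + deployment steps + SOP)
- LOGBOOK.md backup audit section
Pending manual deployment: the Commander must scp the scripts to 188/110 and set up crontab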
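The contents of backup-gitea-db.sh are not shown here. As an illustration of how to back up a live SQLite DB without copying a half-written file, here is a sketch using Python's built-in `sqlite3` online backup API followed by `PRAGMA integrity_check`; the function name and paths are hypothetical, and the real script may use the `sqlite3` CLI instead.

```python
import sqlite3


def backup_sqlite(src_path: str, dest_path: str) -> None:
    """Copy a live SQLite DB with the online backup API, then verify the copy."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        with dest:
            src.backup(dest)  # consistent page-by-page copy, safe while src is in use
        (result,) = dest.execute("PRAGMA integrity_check").fetchone()
    finally:
        dest.close()
        src.close()
    if result != "ok":
        raise RuntimeError(f"backup failed integrity check: {result}")
```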
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 01:01:58 +08:00 |
|
OG T
|
45458e8f33
|
docs(adr): ADR-057 status updated to approved and implemented
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 00:44:31 +08:00 |
|
OG T
|
f4f454fd98
|
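feat(api): automatically warm up Redis Working Memory from PostgreSQL after a reboot
- main.py lifespan: on startup, restore INVESTIGATING/MITIGATING incidents from the DB
- scripts/reboot-recovery: automation scripts + systemd services for 188 and 110
- scripts/reboot-recovery: automatically create aiops-network (a ClawBot dependency)
- docs/runbooks/REBOOT-RECOVERY-SOP.md: fully rewritten, including documentation of the automation scripts
Why: a reboot empties Redis, so the frontend showed 0 incidents (the DB still held all of them)
Approved by the Commander: "All data must be recorded permanently"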
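A minimal sketch of the warm-up step, assuming an async repository and an async cache; `IncidentStore`, `WorkingMemory`, and their methods are hypothetical stand-ins for the real PostgreSQL repository and Redis client. The lifespan function is a plain `asynccontextmanager`, the shape FastAPI accepts via `FastAPI(lifespan=...)`, so the cache is repopulated before the app serves traffic.

```python
import asyncio
from contextlib import asynccontextmanager
from dataclasses import dataclass

ACTIVE_STATUSES = ("INVESTIGATING", "MITIGATING")


@dataclass
class IncidentRow:
    id: str
    status: str


class IncidentStore:  # hypothetical stand-in for the PostgreSQL repository
    def __init__(self, rows):
        self._rows = list(rows)

    async def list_by_status(self, statuses):
        return [r for r in self._rows if r.status in statuses]


class WorkingMemory:  # hypothetical stand-in for the Redis cache
    def __init__(self):
        self.incidents = {}

    async def put(self, row: IncidentRow) -> None:
        self.incidents[row.id] = row


async def warm_up(store: IncidentStore, memory: WorkingMemory) -> int:
    """Copy every still-active incident from the DB back into working memory."""
    rows = await store.list_by_status(ACTIVE_STATUSES)
    for row in rows:
        await memory.put(row)
    return len(rows)


def make_lifespan(store: IncidentStore, memory: WorkingMemory):
    @asynccontextmanager
    async def lifespan(app):
        await warm_up(store, memory)  # runs once, before the app starts serving
        yield
    return lifespan
```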
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 00:39:20 +08:00 |
|
OG T
|
ddb75b69c5
|
docs(logbook): Phase 25 Review R2 passed + ADR-054~057 recorded
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 00:25:31 +08:00 |
|
OG T
|
15c7f6fcd3
|
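docs(adr): draft ADR-054/055/056/057, Phase 25 architecture decisions for the three directions
ADR-054: DIAGNOSE Privacy-First Routing (approved)
- _local_fallback_chain design decision
- NEMOTRON privacy_level=local per the chief architect's ruling
- All local providers fail → REJECT + Telegram
ADR-055: Knowledge Auto-Harvesting (approved)
- Rationale for AUTO_RUNBOOK as DRAFT + ANTI_PATTERN as PUBLISHED
- compute_hash() collision risk explained
- Mandatory rule: protect fire-and-forget tasks from GC
ADR-056: four-layer Config Drift Detection architecture (approved)
- Responsibility boundaries for Detector→Analyzer→Interpreter→Remediator
- AI performs intent analysis only, not repair decisions
- adopt() suspended + _recent_reports Phase 1 limitation
ADR-057: implementing adopt() via the Gitea PR API (draft, pending approval)
- Eliminates the security risk of running git add -A inside the API Pod
- PR review process as a safeguard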
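The ADR-054 behavior ("all local providers fail → REJECT + Telegram") can be sketched as a chain that skips any non-local provider, tries local ones in order, and rejects instead of falling back to the cloud. `Provider`, `diagnose`, and the `notify` callback (standing in for the Telegram alert) are hypothetical; this is not the actual `_local_fallback_chain` code.

```python
from dataclasses import dataclass
from typing import Callable, List


class DiagnoseRejected(RuntimeError):
    """Raised when no local provider could serve a DIAGNOSE request."""


@dataclass
class Provider:
    name: str
    privacy_level: str  # "local" or "cloud"
    call: Callable[[str], str]


def diagnose(prompt: str, chain: List[Provider], notify: Callable[[str], None]) -> str:
    errors: List[str] = []
    for provider in chain:
        if provider.privacy_level != "local":
            continue  # privacy-first: DIAGNOSE data never falls through to a cloud provider
        try:
            return provider.call(prompt)
        except Exception as exc:  # move on to the next local provider
            errors.append(f"{provider.name}: {exc}")
    detail = "; ".join(errors) or "no local provider available"
    notify(f"DIAGNOSE rejected: {detail}")  # e.g. a Telegram alert
    raise DiagnoseRejected(detail)
```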
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 00:24:50 +08:00 |
|
OG T
|
c4923b6908
|
docs(logbook): Phase 22.4 + Phase 25 all passed verification
- Phase 22.4 tests 18/18 PASSED (b6e12f7)
- embed-all 7/7 succeeded on prod
- semantic-search E2E verified (score=0.6867)
- drift /scan E2E responds normally
- drift-scanner CronJob runs hourly
- dev/prod DB migration (symptoms_hash + enum) completed
- 53 integration tests PASSED
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 18:00:33 +08:00 |
|
OG T
|
0c180dec86
|
docs(spec): Direction 2 implementation correction: Nemotron privacy_level=cloud (P0)
|
2026-04-04 17:42:53 +08:00 |
|
OG T
|
0b41df45d6
|
docs(plans): implementation plan for the three directions, P0/P1/P2
- P0: DIAGNOSE Privacy-First Routing (local chain isolation + REJECT safeguard)
- P1: Knowledge Auto-Harvesting (Anti-Pattern closed loop + Runbook generation)
- P2: Config Drift Detection (GitOps gatekeeper + Nemotron intent analysis)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 12:31:36 +08:00 |
|
OG T
|
035cb9cd0d
|
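docs(spec): Nemotron proactive defense design doc, three directions
- Direction 1: Knowledge Auto-Harvesting (Anti-Pattern closed loop + automatic Runbook generation)
- Direction 2: DIAGNOSE Privacy-First Routing (Local-Only Fallback Chain)
- Direction 3: Config Drift Detection (GitOps gatekeeper + Nemotron intent analysis)
100% technical endorsement from chief architect ogt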
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 12:18:11 +08:00 |
|
OG T
|
369413f87d
|
docs: update LOGBOOK: KB Phase 2 fully fixed + 5 tests PASSED
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 11:55:40 +08:00 |
|
OG T
|
69a9218723
|
docs: update LOGBOOK: KB Phase 2 + chief architect review notes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 11:49:31 +08:00 |
|
OG T
|
cddc4cb1fc
|
fix(knowledge): fix chief architect review items C1+C2+I1+I2 (71→~88/100)
CD Pipeline / build-and-deploy (push) Successful in 7m16s
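C1: IKnowledgeRepository Protocol now declares save_embedding + semantic_search +
list_unembedded_entries, restoring the interface-first protection layer
C2: raw SQL in the embed_all_entries Service layer moved into Repository.list_unembedded_entries();
the Service now calls it through the Protocol, per the leWOOOgo building-block principle
I1: tasks from asyncio.create_task are kept in a _pending_tasks set so a reference is held, preventing GC
collection and Task loss at shutdown; each task is discarded automatically once done
I2: OllamaEmbeddingService is no longer created on every call; it is injected in KnowledgeService.__init__
and a single instance is reused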
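The I1 fix follows the pattern recommended in the Python `asyncio.create_task` documentation: the event loop keeps only weak references to tasks, so the caller holds a strong reference in a set and removes it in a done callback. A sketch with a hypothetical `BackgroundTasks` holder (not the actual KnowledgeService code), plus a `drain()` for shutdown:

```python
import asyncio
from typing import Any, Coroutine


class BackgroundTasks:
    """Hypothetical holder mirroring the _pending_tasks pattern."""

    def __init__(self) -> None:
        self._pending_tasks = set()  # strong references to in-flight tasks

    def fire_and_forget(self, coro: Coroutine[Any, Any, Any]) -> "asyncio.Task[Any]":
        task = asyncio.create_task(coro)
        self._pending_tasks.add(task)  # the event loop itself only keeps a weak reference
        task.add_done_callback(self._pending_tasks.discard)  # drop it once finished
        return task

    async def drain(self) -> None:
        """Await whatever is still running, e.g. during shutdown."""
        if self._pending_tasks:
            await asyncio.gather(*self._pending_tasks, return_exceptions=True)
```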
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 11:22:38 +08:00 |
|
OG T
|
15aabd6ac5
|
fix(chat+nim): fix chief architect review items I1-I4 (important) + S3
CD Pipeline / build-and-deploy (push) Successful in 7m9s
I1: chat_manager._call_openclaw timeout=30.0 → reads settings.OPENCLAW_TIMEOUT
I2: nvidia_provider.py stale comment "45" → "55", aligned with the ConfigMap
I3: removed asyncio.shield: after a shielded call times out, the task keeps running with nothing awaiting it (silent leak)
I4: removed the repo instance from ChatManager.__init__ (leWOOOgo forbids Services from holding a repository)
S3: _check_nemotron_health probe 10s → 25s + lightweight /v1/models endpoint
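Why I3 matters: `asyncio.wait_for` cancels the awaited task on timeout, but when that task is wrapped in `asyncio.shield`, only the outer wrapper is cancelled and the inner work keeps running unobserved. A self-contained demonstration with toy timings (not the real OpenClaw call):

```python
import asyncio


async def slow_call(state: dict) -> None:
    try:
        await asyncio.sleep(0.2)  # stands in for a slow upstream request
        state["finished"] = True
    except asyncio.CancelledError:
        state["cancelled"] = True
        raise


async def call_with_timeout(shielded: bool) -> dict:
    state: dict = {}
    task = asyncio.ensure_future(slow_call(state))
    awaitable = asyncio.shield(task) if shielded else task
    try:
        await asyncio.wait_for(awaitable, timeout=0.05)
    except asyncio.TimeoutError:
        pass  # the caller has given up either way
    await asyncio.sleep(0.3)  # wait long enough to see what happened to the inner task
    return state


print(asyncio.run(call_with_timeout(shielded=True)))   # inner task ran to completion
print(asyncio.run(call_with_timeout(shielded=False)))  # inner task was cancelled
```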
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 16:36:16 +08:00 |
|
OG T
|
ce945fe89e
|
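rule(cost): 🔴🔴🔴 mandatory approval for cost-incurring changes: HARD_RULES v1.8 + CLAUDE.md
Commander's directive, 2026-04-03:
Every change that incurs costs must stop and wait for the Commander's explicit approval before it is executed
Added:
- HARD_RULES.md v1.8: Cost Change Approval section
- Defines which changes count as cost-incurring
- Mandatory flow: identify→stop→explain→wait for approval→execute
- Records today's violation and the lesson learned
- CLAUDE.md: cost-change item added to the required pre-task reading
Memory synced:
- feedback_cost_change_approval.md (new)
- feedback_constitution_v2.md, Chapter 5
- MEMORY.md index, top-priority rules section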
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 15:36:47 +08:00 |
|
OG T
|
dc232ebb49
|
docs: LOGBOOK update: KB Phase 1 + monitoring + I1/I3 done
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 13:22:54 +08:00 |
|
OG T
|
0b83707697
|
feat(web): APM/Apps/Deployments/Tickets pages upgraded to use real API data
CD Pipeline / build-and-deploy (push) Has been cancelled
- apm/page.tsx: Golden Signals from real data (SigNoz ClickHouse)
- apps/page.tsx: host service status (real data from /api/v1/dashboard)
- deployments/page.tsx: wired to K8s deployment status
- tickets/page.tsx: wired to the Incidents list
- i18n: both languages completed for the apm/apps/deployments/tickets namespaces
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-03 00:08:11 +08:00 |
|
OG T
|
2d5f1a71ad
|
chore(observability): ClickHouse TTL configured; Phase O fully accepted
CD Pipeline / build-and-deploy (push) Has been cancelled
E2E Health Check / e2e-health (push) Has been cancelled
signoz_logs: 30 days (built-in _retention_days DEFAULT 30)
signoz_metrics, 8 tables: 233280000s (2700 days) → 7776000s (90 days)
- samples_v4, samples_v4_agg_5m, samples_v4_agg_30m
- exp_hist, time_series_v4, time_series_v4_6hrs
- time_series_v4_1day, time_series_v4_1week
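The TTL values are plain seconds; a quick conversion at 86,400 s per day confirms the figures above (233,280,000 s is 2,700 days; 90 days is 7,776,000 s). The helper names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def ttl_seconds_to_days(seconds: int) -> float:
    return seconds / SECONDS_PER_DAY


def ttl_days_to_seconds(days: int) -> int:
    return days * SECONDS_PER_DAY


print(ttl_seconds_to_days(233_280_000))  # old metrics TTL → 2700.0
print(ttl_days_to_seconds(90))           # new metrics TTL → 7776000
print(ttl_days_to_seconds(30))           # logs retention → 2592000
```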
Phase O acceptance checklist fully checked ✅
Co-Authored-By: Claude Code <noreply@anthropic.com>
|
2026-04-02 21:38:39 +08:00 |
|
OG T
|
08f73dfce8
|
docs: Phase O-5 Wave 5.4 alert chain E2E verification Runbook
E2E Health Check / e2e-health (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
- Architecture diagram, manual test steps, smoke test checklist
- generate_monitoring.py usage
- Known-issue exemption list, rollback commands
- First acceptance run 2026-04-02: 8/8
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-02 21:34:43 +08:00 |
|
OG T
|
48c65756da
|
chore(config): write USE_AI_ROUTER=true into the ConfigMap (Phase 24 B2)
E2E Health Check / e2e-health (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
Prevents the next CD deploy from overwriting the value set with kubectl set env.
B2 observation period is 48h, ending 2026-04-04 18:40 Taipei time.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-02 21:26:53 +08:00 |
|
OG T
|
3f339110dd
|
fix(observability): sync the adjustments actually deployed on .188 back into the repo
CD Pipeline / build-and-deploy (push) Has been cancelled
E2E Health Check / e2e-health (push) Has been cancelled
Differences from the original plan:
1. MinIO Bearer Token authentication
- Original plan: MINIO_PROMETHEUS_AUTH_TYPE=public (not supported in this version)
- Actual: Bearer Token generated with mc admin prometheus generate
- Update: added bearer_token to prometheus-config-phase-o.yaml
2. remote_write dropped → OTEL Collector Prometheus scrape
- Original plan: Prometheus remote_write → SigNoz OTEL /api/v1/write
- Actual: the SigNoz OTEL Collector does not accept the Prometheus remote_write format (404)
- Instead: the OTEL Collector prometheus receiver scrapes node-exporter + kube-state-metrics directly
- Added: ops/signoz/otel-collector-config-phase-o.yaml (version-controlled copy)
3. ADR-053 acceptance checklist updated to actual results
Co-Authored-By: Claude Code <noreply@anthropic.com>
|
2026-04-02 21:23:47 +08:00 |
|
OG T
|
3e4612f259
|
docs(observability): ADR-053 SigNoz unified architecture + Phase O acceptance
CD Pipeline / build-and-deploy (push) Failing after 36s
E2E Health Check / e2e-health (push) Successful in 16s
- Added ADR-053: decision record for the unified observability architecture
- Updated service-registry.yaml: added MinIO/Kali monitoring entry points
- Updated LOGBOOK: Phase O completion status
Phase O acceptance checklist:
✅ kubectl on the local Mac without a password
✅ OTEL Collector 2 Pods Running
✅ Event Exporter 1 Pod Running
✅ Descheduler CronJob Completed
✅ MinIO + Kali alert rules
✅ Alert Chain Smoke Test
✅ CD Pipeline integration
⏳ ClickHouse TTL / remote_write / SigNoz rules (pending manual work on .188)
Co-Authored-By: Claude Code <noreply@anthropic.com>
|
2026-04-02 18:26:57 +08:00 |
|
OG T
|
51961b9f03
|
docs: Phase O ultimate observability completion plan, design spec
Unified SigNoz architecture closing 6 major blind spots (Event/Log/Metrics/Descheduler/kubectl/MinIO-Kali)
+ wrap-up of Monitoring Master Plan Waves A-D
+ 5 chief architect review checkpoints
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-02 13:45:23 +08:00 |
|
OG T
|
73e8f8ab77
|
feat(ai): Phase 24-A+B1: AI Provider Registry + strangler wrapper (ADR-052)
E2E Health Check / e2e-health (push) Successful in 16s
CD Pipeline / build-and-deploy (push) Has been cancelled
Brain Layer dual-track Registry architecture:
- New src/services/ai_providers/ directory (interfaces + 4 providers)
- OllamaProvider (local, rca/chat/code_review)
- GeminiProvider (cloud, rca/chat)
- ClaudeProvider (cloud, rca/chat/code_review)
- OpenClawNemoProvider (cloud, rca; delegates 188→NIM)
- Extended ai_router.py with:
- AIProviderRegistry (dynamic registration/enable/disable)
- AIRouterExecutor (Cache + CB/RL/Sem gates + execution)
- openclaw.py strangler wrapper: USE_AI_ROUTER=true takes the new path
- config.py + ConfigMap add USE_AI_ROUTER=false (safe default)
- ADR-052 formal document (14 decisions, D1-D14)
- HARD_RULES v1.7 adds AI Router rules
Safety: USE_AI_ROUTER=false by default, so the new path stays off until enabled manually for observation
回滾: kubectl set env deployment/awoooi-api USE_AI_ROUTER=false
2026-04-02 ogt: first Phase 24 implementation batch
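A sketch of the strangler wrapper: a feature flag read from the environment decides whether a call goes through the new registry or the legacy path, so rollback is just setting `USE_AI_ROUTER=false`. The `AIProviderRegistry` here is a hypothetical minimal shape; its `register`/`disable`/`pick` methods are assumptions, not the real ai_router.py API.

```python
import os
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, Mapping


def use_ai_router(env: Mapping[str, str] = os.environ) -> bool:
    """Feature flag: anything other than "true" (case-insensitive) keeps the legacy path."""
    return env.get("USE_AI_ROUTER", "false").strip().lower() == "true"


@dataclass
class Provider:
    name: str
    capabilities: FrozenSet[str]
    complete: Callable[[str], str]


@dataclass
class AIProviderRegistry:  # hypothetical minimal shape, not the real class
    _providers: Dict[str, Provider] = field(default_factory=dict)
    _disabled: set = field(default_factory=set)

    def register(self, provider: Provider) -> None:
        self._providers[provider.name] = provider

    def disable(self, name: str) -> None:
        self._disabled.add(name)

    def pick(self, capability: str) -> Provider:
        # First enabled provider (in registration order) that supports the capability.
        for name, provider in self._providers.items():
            if name not in self._disabled and capability in provider.capabilities:
                return provider
        raise LookupError(f"no enabled provider for {capability!r}")


def analyze(prompt: str, registry: AIProviderRegistry, legacy: Callable[[str], str],
            env: Mapping[str, str] = os.environ) -> str:
    """Strangler wrapper: route through the registry only when the flag is on."""
    if use_ai_router(env):
        return registry.pick("rca").complete(prompt)
    return legacy(prompt)
```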
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-02 13:16:09 +08:00 |
|
OG T
|
db2a2852b8
|
docs: frontend refactor acceptance report 87/100
E2E Health Check / e2e-health (push) Successful in 16s
CD Pipeline / build-and-deploy (push) Has been cancelled
Playwright browser screenshots + KB API endpoint tests + Console analysis
- 24/24 routes, zero 404s
- 7 complete pages + 15 ComingSoon
- All 7 KB API endpoints working
- 1 Low bug (archived entry still accessible via GET)
- Metrics Strip shows [object Object], still to be fixed
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-02 10:20:27 +08:00 |
|
OG T
|
25889d4b8e
|
docs: archive the ADR-050 reanalyze implementation plan (completed)
CD Pipeline (Dev) / build-and-deploy-dev (push) Failing after 9s
E2E Health Check / e2e-health (push) Successful in 18s
CD Pipeline / build-and-deploy (push) Has been cancelled
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-02 09:38:03 +08:00 |
|
OG T
|
5959855a71
|
feat(web): font system upgrade + NemoClaw SVG restored + Knowledge Base design doc
E2E Health Check / e2e-health (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
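- Fonts: Syne (headings) + DM Mono (body) + VT323 (pixel brand font), replacing Inter
- Tailwind: fontFamily updated + 5-level text color tokens (primary→disabled)
- Sidebar: NemoClaw white-porcelain lobster claw SVG + AWOOOI enlarged in VT323
- OpenClaw Panel: restored the NemoClaw 3D white-porcelain lobster claw (replacing NemoNodeAnimation)
- Knowledge Base design doc (B: separation / A: K8s Job / Phase 1 skips vector search)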
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-02 00:48:42 +08:00 |
|
OG T
|
8845377a6d
|
docs: update AI Center redesign spec (deprecated components + authorization logic recorded)
E2E Health Check / e2e-health (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-01 22:28:32 +08:00 |
|
OG T
|
9cf73bda4f
|
feat(llmops): enable Langfuse LLMOps tracing + automatic key injection in CD
CD Pipeline / build-and-deploy (push) Successful in 7m6s
E2E Health Check / e2e-health (push) Successful in 18s
- 04-configmap.yaml: LANGFUSE_ENABLED=true (Phase 15.1 keys already in the K8s Secret)
- cd.yaml: added automatic CD injection of the Langfuse keys (LANGFUSE_PUBLIC/SECRET_KEY)
- LOGBOOK.md: naming fix ClawBot → OpenClaw
- .gitignore: added tsconfig.tsbuildinfo + .superpowers/
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-01 22:19:22 +08:00 |
|
OG T
|
0b04abf990
|
docs(plan): add AI Center v6 redesign implementation plan (13 tasks)
|
2026-04-01 19:39:41 +08:00 |
|
OG T
|
4b84e95723
|
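docs: AI Center UI redesign spec v6
- Anthropic Warmth (#f5f4ed) + OpenClaw Blue (#4A90D9) color system
- 3-column layout: Sidebar(200px) | Feed(50%) | RightPanel(50%)
- Full sidebar: 4 sections, 19 items (consolidating every wooo-aiops menu)
- Event card flow diagram + chibi lobster (native orange-red #E85530)
- NemoClaw node animation on a white background (screenshot style)
- Rounded-corner guidelines applied throughout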
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-01 19:19:03 +08:00 |
|
OG T
|
9913f5dc6d
|
feat(infra): dev environment separation + BuildKit cache fix + circuit breaker tuning
CD Pipeline / build-and-deploy (push) Successful in 6m52s
E2E Health Check / e2e-health (push) Successful in 17s
CD Pipeline (Dev) / build-and-deploy-dev (push) Failing after 9s
1. k8s/awoooi-dev/: new dev namespace (configs 01-05)
- Namespace + ResourceQuota (cpu 2/4, mem 4Gi/8Gi)
- ConfigMap: ENVIRONMENT=dev, LOG_LEVEL=DEBUG, SHADOW_MODE=false
- Deployment: 1 replica, NodePort 32344, image dev-latest
- RBAC: awoooi-executor-dev ServiceAccount
2. .gitea/workflows/cd-dev.yaml: dev branch CD pipeline
- Trigger: push to the dev branch
- Build: --no-cache (prevents cache poisoning)
- Tag: dev-{sha} / dev-latest
- Deploy: awoooi-dev namespace, health check on 32344
- Telegram: notifications prefixed with [DEV]
3. apps/api/Dockerfile: ARG CACHE_BUST=none (prevents BuildKit cache poisoning)
- the deps layer (pip install) can still be cached
- the src/ and models.json layers are rebuilt every time
4. .gitea/workflows/cd.yaml: production API build passes CACHE_BUST=git_sha
- ensures config changes such as models.json actually make it into the image
5. apps/api/src/services/nvidia_provider.py: timeouts no longer count toward the circuit breaker
- TimeoutException → log only, no record_failure()
- Only hard errors (auth/rate limit/exception) trip the breaker
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-01 16:22:21 +08:00 |
|
OG T
|
c9c60c3a61
|
feat(mcp-integrations): Phase S architecture fixes + MCP integration groundwork
E2E Health Check / e2e-health (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
Type Sync Check / check-type-sync (push) Failing after 22s
Phase S tech-debt fixes (chief architect review 82 → complete):
- S-01: generate_alert_fingerprint moved to the AlertAnalyzer.generate_fingerprint() staticmethod
- S-04: removed the json_encoders option deprecated in Pydantic v2 (native datetime serialization is used directly)
Sentry MCP integration (Phase 23):
- ADR-048: Sentry→OpenClaw AI Triage architecture decision
- sentry_webhook_service.py: parse/analyze/create_incident/build_message Service layer
- config.py: SENTRY_WEBHOOK_SECRET (Fail-Closed HMAC-SHA256)
Playwright MCP integration (short term):
- smoke.spec.ts: 5-page E2E smoke test (home/dashboard/incidents/approvals/terminal)
- cd.yaml: E2E Smoke Test step + 🎭 Smoke status notification on Telegram
Long-term planning ADRs:
- ADR-049: Figma Code Connect design-system sync
- ADR-050: Telegram interactive Incident 2.0 (6-button Inline Keyboard)
- ADR-051: Context7 dependency-upgrade advisor (Next.js 14→15, FastAPI 0.115→0.128)
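"Fail-Closed HMAC-SHA256" means a request is rejected whenever verification cannot be completed, including when no secret is configured. A sketch using the standard `hmac` module with a constant-time comparison; the function name is hypothetical, and which request header carries the signature is not stated in this commit, so check the Sentry webhook documentation for that detail.

```python
import hashlib
import hmac
from typing import Optional


def verify_signature(body: bytes, signature: Optional[str], secret: Optional[str]) -> bool:
    """Fail-closed HMAC-SHA256 check of a webhook body (hex digest comparison)."""
    if not secret:
        return False  # no secret configured: reject instead of skipping the check
    if not signature:
        return False  # unsigned request
    expected = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    # Compare as bytes so unexpected non-ASCII input cannot raise inside compare_digest.
    return hmac.compare_digest(expected.encode("ascii"), signature.strip().lower().encode("utf-8"))
```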
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-01 16:20:57 +08:00 |
|
OG T
|
5a46998689
|
docs: Secrets management handbook (ADR-035+ single source of truth for Secrets)
CD Pipeline / build-and-deploy (push) Successful in 5m23s
E2E Health Check / e2e-health (push) Successful in 17s
Created docs/runbooks/SECRETS-MANAGEMENT.md:
- Complete list of 7 Gitea Secrets + 12 K8s Secrets
- Update SOP (API + Web UI)
- One-command status check
- How to obtain/update each key
- Emergency procedures
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-01 15:40:48 +08:00 |
|
OG T
|
22de22c989
|
refactor(phase-s): Phase S tech-debt cleanup, five architecture improvements
S-01: generate_alert_fingerprint() moved to alert_analyzer_service (Router→Service)
S-02: removed the obsolete USE_NEW_ENGINE config (Phase R has served its purpose)
S-03: github_webhook.py linter cleanup (unused Field + delivery_id noqa)
S-04: Pydantic v2 migration for approval/incident models (class Config → ConfigDict)
S-05: Skill 09 v1.1 update (USE_NEW_ENGINE deprecation note)
Tests: 393 passed, zero failures
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-01 13:12:02 +08:00 |
|
OG T
|
59902f270d
|
fix(tests): chief architect review fixes: test suite + stronger DI (96/100 OUTSTANDING)
P1 test fixes:
- test_smart_router.py: updated to the current API (IntentResult + DIAGNOSE/CONFIG normalization)
- test_auto_repair_service.py: injects a _no_cooldown fixture to isolate the Redis dependency
- test_global_repair_cooldown.py: added the @pytest.mark.integration marker
P2 architecture improvements:
- AutoRepairService: new cooldown_checker DI parameter (Callable | None)
- global_repair_cooldown: get_redis() moved inside the try-except to prevent an uncaught RuntimeError
P3 configuration:
- pyproject.toml: registered the integration pytest marker
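A sketch of the `cooldown_checker` injection point: passing `None` falls back to a Redis-backed default, while tests inject a stub (the role the `_no_cooldown` fixture plays). Everything here is hypothetical, including `get_redis`, the key layout, and the choice to treat an unavailable Redis as "in cooldown"; that last one is an assumption made for safety, not behavior confirmed by the commit.

```python
from typing import Any, Callable, Optional

CooldownChecker = Callable[[str], bool]  # True means "target is still cooling down"


def get_redis() -> Any:
    """Hypothetical stand-in for the app's Redis accessor; not initialised in this sketch."""
    raise RuntimeError("Redis client not initialised")


def redis_cooldown_checker(redis_factory: Callable[[], Any] = get_redis) -> CooldownChecker:
    def in_cooldown(target: str) -> bool:
        try:
            redis = redis_factory()  # inside the try: a missing client must not escape as RuntimeError
            return bool(redis.exists(f"repair:cooldown:{target}"))  # hypothetical key layout
        except RuntimeError:
            return True  # assumption: unknown state counts as "in cooldown", so no repair runs
    return in_cooldown


class AutoRepairService:
    def __init__(self, cooldown_checker: Optional[CooldownChecker] = None) -> None:
        # None falls back to the Redis-backed default; tests inject a stub instead.
        self._in_cooldown = cooldown_checker or redis_cooldown_checker()

    def attempt(self, target: str, repair: Callable[[], None]) -> str:
        if self._in_cooldown(target):
            return "skipped"
        repair()
        return "repaired"
```

With the checker injected, unit tests no longer need a live Redis.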
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-01 11:11:50 +08:00 |
|
OG T
|
6fed8be8c4
|
docs(adr): ADR-024 R4 Router slimming marked complete
E2E Health Check / e2e-health (push) Successful in 17s
Type Sync Check / check-type-sync (push) Failing after 22s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-01 09:27:40 +08:00 |
|
OG T
|
5086bafa36
|
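docs: ADR-045 consolidating the Telegram Gateway into the K8s AWOOOI API
Records the architecture decision already implemented on 2026-03-31:
- Consolidate Telegram onto the K8s AWOOOI API in Webhook mode
- Resolves the dual-track contention with OpenClaw (188) Long Polling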
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-01 09:17:08 +08:00 |
|
OG T
|
a94bb57d8b
|
feat(types): ADR-046 IncidentConverter + IncidentEngineAdapter
Implements ADR-046 Option B: an IncidentConverter conversion layer that resolves the
type boundary between BrainIncident (lewooogo-brain) and LocalIncident (apps/api).
Changes:
- Added src/utils/incident_converter.py
- brain_to_local(): BrainIncident → LocalIncident
- local_to_brain(): LocalIncident → BrainIncident
- ESCALATED → MITIGATING mapping (brain has no ESCALATED)
- incident_engine.py: added an IncidentEngineAdapter wrapper layer
- process_signal() / get_incident() output converted to LocalIncident
- get_incident_engine() returns IncidentEngineAdapter
- incident_memory.py: added brain_to_local import, updated the _record_to_incident description
- ADR-046: all three conversion points marked complete
Unlocks: #123 proposal_service.py cleanup (next step)
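A sketch of the converter idea: statuses that exist on both sides map by value, and the local-only ESCALATED is mapped explicitly to MITIGATING. The enum members other than ESCALATED/MITIGATING/INVESTIGATING, and the two-field dataclasses, are hypothetical simplifications of the real models.

```python
from dataclasses import dataclass
from enum import Enum


class BrainStatus(Enum):  # hypothetical subset of brain statuses
    INVESTIGATING = "INVESTIGATING"
    MITIGATING = "MITIGATING"
    RESOLVED = "RESOLVED"


class LocalStatus(Enum):  # the local model additionally has ESCALATED
    INVESTIGATING = "INVESTIGATING"
    MITIGATING = "MITIGATING"
    ESCALATED = "ESCALATED"
    RESOLVED = "RESOLVED"


@dataclass(frozen=True)
class BrainIncident:
    id: str
    status: BrainStatus


@dataclass(frozen=True)
class LocalIncident:
    id: str
    status: LocalStatus


# Local statuses with no brain counterpart, mapped explicitly.
_LOCAL_ONLY = {LocalStatus.ESCALATED: BrainStatus.MITIGATING}


def local_to_brain(incident: LocalIncident) -> BrainIncident:
    status = _LOCAL_ONLY.get(incident.status)
    if status is None:
        status = BrainStatus(incident.status.value)  # same name on both sides
    return BrainIncident(id=incident.id, status=status)


def brain_to_local(incident: BrainIncident) -> LocalIncident:
    # Every brain status exists locally, so this direction is a direct lookup.
    return LocalIncident(id=incident.id, status=LocalStatus(incident.status.value))
```

Note that the round trip is lossy for ESCALATED: it comes back as MITIGATING.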
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-03-31 22:47:54 +08:00 |
|
OG T
|
2ba61acf72
|
fix(api): Phase R-R2.2 fixes for the chief architect's 72/100 P2 findings
P2-01 signal_worker.py: read persisted_to_pg with getattr to avoid AttributeError on BrainIncident
P2-02 IIncidentEngine Protocol: update_incident_status → update_status, aligned with the brain implementation
P2-03 config.py USE_NEW_ENGINE: marked ineffective + rollback path corrected (git revert, not kubectl)
ADR-046: Option B (IncidentConverter) decided; implementation to-do list updated
ADR-024: review conclusion + formal rollback command updated
Skill 02: v2.5 version record
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-03-31 22:33:08 +08:00 |
|
OG T
|
cd91560e0b
|
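docs: Phase R-R2 completion doc updates + ADR-046 type unification
- ADR-024: updated progress (R1✅ R2✅ R3✅ R4 pending)
- ADR-046: new cross-package Incident type unification governance (pending decision)
Recommends Option B: IncidentConverter conversion layer
- Skill 02: v2.5 records Phase R-R2 + R-R2.1 + ADR-046
- LOGBOOK: updated current status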
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-03-31 22:17:44 +08:00 |
|
OG T
|
67ef98e737
|
docs: update LOGBOOK: Phase R-R2 complete (#121 #122)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-31 22:04:13 +08:00 |
|
OG T
|
a3bd0a4b45
|
docs: update LOGBOOK: Phase R-R1 strangler pattern confirmed complete
E2E Health Check / e2e-health (push) Successful in 16s
Type Sync Check / check-type-sync (push) Failing after 20s
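Confirmed items:
- #117-119: Dockerfile + strangler wrapper ✅ implemented
- USE_NEW_ENGINE switch configured (default False)
- Rollback mechanism: kubectl set env USE_NEW_ENGINE=false
- Phase 15.4 #113-114 sampling rate confirmed
Next steps:
- #120 E2E verification (test with USE_NEW_ENGINE=True)
- Phase R-R2: remove duplicated logic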
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-31 21:36:33 +08:00 |
|