Commit Graph

131 Commits

Author SHA1 Message Date
OG T
bcd33e854f docs: ADR-042 frontend performance optimization patterns (DOM Bypass + Optimistic Updates)
All checks were successful
E2E Health Check / e2e-health (push) Successful in 16s
Add ADR-042:
- Pattern 1: DOM Bypass (bypasses React rendering, 100x performance gain)
- Pattern 2: Optimistic Updates (0 ms UI latency + rollback on failure)
- Pattern 3: SSE Incremental Updates (incremental updates, fewer API requests)
- Pattern 4: AbortController (prevents memory leaks)

Update Skills 01:
- Bumped to v1.6
- Added a performance optimization patterns section
- References ADR-042

Chief Architect review: 96-98/100 OUTSTANDING

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-31 11:36:21 +08:00
OG T
e176e063d4 fix(web): #19 Action Logs AbortController to prevent memory leaks
All checks were successful
E2E Health Check / e2e-health (push) Successful in 17s
- Add an AbortController ref to track requests
- fetchLogs: cancel the previous request before each new one
- fetchStats: share the AbortController signal
- useEffect cleanup: cancel all requests on unmount
- AbortError is correctly ignored (not an error state)

Chief Architect review: 98/100 OUTSTANDING (frontend P2)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-31 11:30:39 +08:00
OG T
723e8ef251 feat(api): Phase 21.3 Weekly Report (ADR-041)
All checks were successful
E2E Health Check / e2e-health (push) Successful in 16s
- Add WeeklyReportMessage dataclass (telegram_gateway.py)
- Add WeeklyReportService (integrates StatsService + K3sMonitor)
- Add CronJob (every Friday at 18:00 Taipei time)
- Add API endpoints (/stats/weekly/preview, /stats/weekly/report)

Phase 21 periodic reporting mechanism fully complete!
- 21.1 Daily E2E Schedule
- 21.2 K3s Telegram Report
- 21.3 Weekly Report

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-31 11:28:46 +08:00
OG T
b2e41ebac6 feat(api): Phase 21.2 K3s Status Telegram Report (ADR-041)
All checks were successful
E2E Health Check / e2e-health (push) Successful in 19s
- Add K3sStatusMessage dataclass (telegram_gateway.py)
- Add K3sMonitorService (Prometheus data collection)
- Add CronJob (daily at 09:00 Taipei time)
- Add API endpoints (/stats/k3s/status, /stats/k3s/report)

Phase 21 periodic reporting mechanism (approved by the Commander)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-31 11:25:27 +08:00
OG T
ce6b1b1c64 docs: update LOGBOOK - #17 i18n Hydration complete
All frontend P1 improvements complete:
- #15 SSE + optimistic updates (8c8664c)
- #16 DOM Bypass (0b87018)
- #17 i18n Hydration (f25e94e)

Chief Architect review: 96/100 OUTSTANDING

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-31 11:23:38 +08:00
OG T
f2aa9a7c41 feat(ci): Phase 21.1 Daily E2E Schedule (ADR-041)
Some checks failed
E2E Health Check / e2e-health (push) Failing after 23s
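- Add a daily automatic run at 00:00 Taipei time (cron: '0 16 * * *')
- Add a Telegram notification on failure
- Update LOGBOOK to track status

Phase 21 periodic reporting mechanism (approved by the Commander)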

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-31 11:19:26 +08:00
OG T
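f25e94e8c4 fix(web): #17 i18n Hydration guard (NEXT_LOCALE Cookie)
Phase D #17: fix the hydration crash on i18n locale switching

Problem: a locale mismatch between client and server rendering causes a Hydration Mismatch
Solution: the middleware forcibly binds the NEXT_LOCALE cookie

Implementation:
- Extract the current locale from the URL path
- Always set the NEXT_LOCALE cookie (1-year TTL)
- Ensure the server and client locales match

@see QA Report, section 3.1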

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-31 11:18:53 +08:00
OG T
c3a2e7745b docs: update LOGBOOK - #15 #16 frontend P1 complete
- #16 ThinkingTerminal DOM Bypass (0b87018)
- #15 SSE + Optimistic Updates (8c8664c)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-31 11:13:43 +08:00
OG T
b31e079e41 docs: update LOGBOOK - Phase A/B/C P1 complete (97/100)
Some checks failed
CD Pipeline / build-and-deploy (push) Successful in 3m42s
E2E Health Check / e2e-health (push) Has been cancelled
- LOGBOOK: Phase A/B/C Chief Architect review OUTSTANDING
- Skills: DevOps Commander updated
- ADR-033: K3s HA architecture addendum

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-31 11:03:10 +08:00
OG T
5a3f539fe5 docs: full update of Memory/Skills/LOGBOOK
Some checks failed
E2E Health Check / e2e-health (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
2026-03-30 ogt: state sync after the Chief Architect review (94/100)

Updated items:
- project_current_status.md: overview of today's completed work
- LOGBOOK.md: sudoers NOPASSWD fix
- feedback_ai_fallback_order.md: NVIDIA priority order
- feedback_cd_security_nopasswd.md: new security hard rule
- MEMORY.md: new index entries
- 02-lewooogo-backend-core.md v2.3: AI Fallback section

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-30 01:42:08 +08:00
OG T
bf3a21d88e docs: Chief Architect review 94/100 OUTSTANDING
Some checks failed
E2E Health Check / e2e-health (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
- Skills v2.2: add Phase 19.4 API integration patterns
- ADR-030: add §5.3 Playbook automatic status transition thresholds
- LOGBOOK: update review results

Review scope: 18 commits (Phase 19.4 + ADR-039 + AI arbitration)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-30 01:38:41 +08:00
OG T
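4f06115497 docs: Chief Architect review - frontend internal IP ban (90/100)
Some checks failed
E2E Health Check / e2e-health (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
Review results:
- P0 security fix: sudo password now read from a secret
- P1 identified: Sentry DSN build-arg still pending
- P2 identified: 3 minor issues recorded

Updated:
- Skills 01 v1.5: internal IPs forbidden in frontend builds
- Skills 04 v2.1: CD security rules + internal IP ban
- ADR-022: add frontend internal IP ban section
- MEMORY.md: add review record index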

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-30 01:32:48 +08:00
OG T
3c3294de4b docs: update LOGBOOK - Chief Architect review (78→85/100)
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
E2E Health Check / e2e-health (push) Has been cancelled
2026-03-30 ogt: AI arbitration fix + P0 security fix

Changes:
- AI Fallback: NVIDIA first
- CI/CD alerts simplified
- P0 fix for the plaintext sudo password

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-30 01:32:15 +08:00
OG T
9145faf24b docs: frontend internal IP ban - RCA + Hard Rule v1.6
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
E2E Health Check / e2e-health (push) Has been cancelled
Incident: 2026-03-30 browser local network permission dialog
Root cause: CD built NEXT_PUBLIC_API_URL with http://192.168.0.125:32334

Updated:
- CLAUDE.md: add 🔴🔴🔴 frontend internal IP ban section
- HARD_RULES.md: v1.6 adds the Frontend Internal IP rule
- LOGBOOK.md: RCA incident retrospective

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-30 01:28:38 +08:00
OG T
1b89ef399c docs: update ARCHITECTURE_MEMORY + MASTER_EXECUTION_SCHEDULE
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-30 01:21:23 +08:00
OG T
998b5a7b5f docs: renumber ADR-039 to ADR-040 + LOGBOOK update
ADR changes:
- ADR-039 (gitea-cicd-migration) is reserved for the Gitea CI/CD migration
- The former ADR-039 (global-autorepair-governance) becomes ADR-040

LOGBOOK:
- Add Phase 19.4 Terminal Service API integration record
- Update current status

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-30 01:20:50 +08:00
OG T
78082052ee docs: mark the ADR-039 Gitea migration as complete
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 3m36s
E2E Health Check / e2e-health (push) Successful in 16s
- Telegram CI/CD alert verification passed (Raw Logs 200 OK)
- GitHub Actions disabled
- Gitea primary repository operating normally

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-30 01:08:49 +08:00
OG T
97b2e059bc docs: ADR-039 complete - Gitea CI/CD migration successful
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
E2E Health Check / e2e-health (push) Has been cancelled
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-30 01:07:51 +08:00
OG T
25e69e6870 feat(cicd): ADR-039 complete - GitHub Actions disabled, Gitea is the primary repository
Some checks failed
E2E Health Check / e2e-health (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
- Disable all GitHub Actions workflows (.disabled)
- Update CLAUDE.md with a Gitea CI/CD section
- Update LOGBOOK.md with the migration status
- Gitea version: 1.25.5
- Runner version: v0.3.1 (host network mode)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-30 01:07:32 +08:00
OG T
322a79a889 docs(review): complete chief architect review for adr-038 & adr-039
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 4m16s
E2E Health Check / e2e-health (push) Successful in 21s
2026-03-29 23:56:34 +08:00
OG T
3eb3051a73 fix(ci): fix duplicate docker socket mount (1774793847)
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 3m22s
E2E Health Check / e2e-health (push) Failing after 11s
2026-03-29 22:17:27 +08:00
OG T
f5b19cf108 feat(learning): implement Playbook confidence adjustment (ADR-030)
- Add _promote_playbook: a high rating raises confidence by 0.1
- Add _demote_playbook: a low rating lowers confidence by 0.15
- Add find_by_source_incident: look up a Playbook by incident_id
- Add adjust_confidence: confidence adjustment + automatic status transition
- Add Playbook.failure_rate property

Automatic status transitions:
- ai_confidence >= 0.9 + DRAFT → automatically APPROVED
- ai_confidence < 0.3 + failure_rate > 50% → automatically DEPRECATED

Tests: all 13 cases pass

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 22:10:49 +08:00
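The adjustment rules listed in f5b19cf108 can be sketched as below. Only the deltas (+0.1 / -0.15), the two thresholds, and the names `adjust_confidence`, `ai_confidence` and `failure_rate` come from the commit message; the class layout, the counter fields, the clamping to [0, 1] and the rounding are assumptions made so the example runs on its own.

```python
from dataclasses import dataclass
from enum import Enum


class PlaybookStatus(Enum):
    DRAFT = "draft"
    APPROVED = "approved"
    DEPRECATED = "deprecated"


@dataclass
class Playbook:
    ai_confidence: float
    status: PlaybookStatus = PlaybookStatus.DRAFT
    success_count: int = 0  # hypothetical field
    failure_count: int = 0  # hypothetical field

    @property
    def failure_rate(self) -> float:
        total = self.success_count + self.failure_count
        return self.failure_count / total if total else 0.0


PROMOTE_DELTA = 0.1    # high rating
DEMOTE_DELTA = -0.15   # low rating


def adjust_confidence(playbook: Playbook, delta: float) -> Playbook:
    # Clamp to [0, 1]; rounding avoids float drift such as 0.30000000000000004.
    new_value = min(1.0, max(0.0, playbook.ai_confidence + delta))
    playbook.ai_confidence = round(new_value, 4)
    if playbook.ai_confidence >= 0.9 and playbook.status is PlaybookStatus.DRAFT:
        playbook.status = PlaybookStatus.APPROVED
    elif playbook.ai_confidence < 0.3 and playbook.failure_rate > 0.5:
        playbook.status = PlaybookStatus.DEPRECATED
    return playbook


draft = adjust_confidence(Playbook(ai_confidence=0.85), PROMOTE_DELTA)
print(draft.status)  # a DRAFT playbook that crosses 0.9 is approved
```

Note that demotion needs both conditions: a playbook with low confidence but a failure rate of 50% or less keeps its current status in this sketch.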
OG T
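d6b8224942 feat(cicd): ADR-039 Gitea CI/CD migration
2026-03-29 Claude Code (authorized by the Commander):
- Add .gitea/workflows/cd.yaml (Build → Harbor → K8s)
- Add .gitea/workflows/e2e-health.yaml (E2E health check)
- Add the ADR-039 document recording the migration decision

Option B: GitHub → Gitea CI/CD migration
- Gitea serves as the primary repository and CI/CD
- GitHub is demoted to a read-only backup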

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 21:51:45 +08:00
OG T
feafaa90a1 fix(ci): add a retry mechanism to E2E Verification
Some checks failed
CI / Pre-flight (push) Has been cancelled
CI / Lint & Type Check (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
CI / API Lint (push) Has been cancelled
CI / API Test (push) Has been cancelled
CI / Ollama Model Test (push) Has been cancelled
CI / OpenAPI Validate (push) Has been cancelled
CI / Docker Verify (api) (push) Has been cancelled
CI / Docker Verify (web) (push) Has been cancelled
2026-03-29 Claude Code:
- The E2E script also gets 3 retries
- 5-second interval between attempts
- Update the LOGBOOK record

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 21:44:55 +08:00
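The retry policy described in feafaa90a1 (3 attempts, 5 seconds apart) could look roughly like the following. This is a sketch in Python for illustration only; the actual E2E script and its language are not shown in the log, and `run_with_retry` and `check` are hypothetical names.

```python
import time


def run_with_retry(check, attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Call `check` until it returns True, at most `attempts` times.

    `sleep` is injectable so that tests do not have to wait for real time.
    """
    for attempt in range(1, attempts + 1):
        try:
            if check():
                return True
        except Exception as exc:  # a raised error counts as a failed attempt
            print(f"attempt {attempt}/{attempts} raised: {exc}")
        if attempt < attempts:
            sleep(delay_seconds)  # wait only between attempts, not after the last
    return False
```

A health probe would be passed in as `check`, for example a function that requests a health endpoint and returns True on HTTP 200.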
OG T
4c169c2f75 docs: update LOGBOOK - E2E Health Check fix progress
- Record 8 issues and their fixes
- HMAC Secret injection + rollout restart
- VIP temporarily bypassed, to be diagnosed later

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 21:43:02 +08:00
OG T
4707102498 feat(telegram): implement 6 new message templates (ADR-038)
2026-03-29 ogt: full implementation of the Telegram message templates

New message types:
- SentryErrorMessage: Sentry error notification (with stack trace)
- ResourceWarnMessage: resource exhaustion warning (with CPU/Memory/Disk)
- RepairReportMessage: daily auto-repair report
- DailySummaryMessage: daily system status summary
- DeploySuccessMessage: CD deployment success notification
- RateLimitMessage: API rate limit warning

New send methods:
- send_sentry_error()
- send_resource_warning()
- send_repair_report()
- send_daily_summary()
- send_deploy_success()
- send_rate_limit_warning()

New buttons:
- Sentry: [🔍 View details] [🔕 Silence 1h]
- Resource: [Auto-scale] [🔕 Silence 1h]

Tests: all 14 test cases pass

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 21:23:07 +08:00
OG T
fecfc6b4af docs: update LOGBOOK - NVIDIA RCA modular refactor complete
2026-03-29 ogt: reflect the completed modular refactor

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 20:54:31 +08:00
OG T
04bfff9d19 refactor(ai): modular refactor - move NVIDIA chat into NvidiaProvider
Complies with feedback_lewooogo_modular_enforcement.md:
- Remove _call_nvidia() from openclaw.py (duplicated logic)
- Add the NvidiaProvider.chat() method
- Update the INvidiaProvider Protocol
- openclaw.py now calls get_nvidia_provider().chat()
- Move tests to test_nvidia_chat.py

Architecture layers:
- Router → Service → Provider (correct)
- The Service layer must not re-implement functionality that a Provider already offers

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 20:49:23 +08:00
OG T
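1eb0be8f3f docs: add Telegram message template specification v1.0
Defines 12 message categories:
- 6 implemented (Incident/CI/PR/Exec/Heartbeat/Silence)
- 6 pending (Sentry/Resource/Repair/Daily/Deploy/RateLimit)

Includes:
- Complete template formats
- Button function reference table
- Emoji usage guidelines
- Character limit rules
- Implementation priorities (P1: 5h, P2: 5h, P3: 1h)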

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 20:44:16 +08:00
OG T
31a6f2785d docs: update LOGBOOK - NVIDIA RCA integration + Chief Architect review
- Add NVIDIA RCA integration record (74→85/120)
- P0/P1 fix list
- ConfigMap change log
- Memory update list

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 20:36:41 +08:00
OG T
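2fde0b5724 docs: update LOGBOOK - Lint zeroed out + detailed E2E diagnosis record
- Lint 61→0, fully cleared; record the React Hook dependency fix pattern
- E2E Health Check diagnosis progress (VIP reachable, NodePort still under investigation)
- Add a standard pattern for wrapping object dependencies in useMemo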

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 20:29:43 +08:00
OG T
6a8e1bfdd1 feat(cicd): Gitea Mirror B2 backup strategy
- Add Gitea remote (192.168.0.110:3001/wooo/awoooi)
- Automatically mirror to Gitea after a successful CD run
- Add the GITEA_MIRROR_TOKEN GitHub Secret
- Update the LOGBOOK record

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 20:28:21 +08:00
OG T
5a8edd692d docs: update LOGBOOK - Lint cleanup complete
2026-03-29 16:43:49 +08:00
OG T
d68917cdac docs: Wave 3 i18n cleanup complete - all 40+ violations fixed
- TECHNICAL_DEBT_PHASE2.md: updated to fully completed status
- LOGBOOK.md: add Wave 3 completion record

Fix list:
- status-orb.tsx: status labels i18n
- OmniTerminal.tsx: SSE connection status i18n
- sse-states.ts: connection status labels changed to i18n keys
- thinking-terminal.tsx: full i18n of the terminal UI
- live-host-card.tsx: remove hardcoded defaults

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 16:35:47 +08:00
OG T
590b5c2bd5 docs: P1 fixes complete - 91/100 → 95/100
All 5/5 P1 fix items complete:
- RAG Provider DI pattern consistency
- Worker PDB (already existed)
- 9 RAG tests
- Grafana Config caching
- Configurable embedding dimension
- Embedding 維度配置化

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 16:24:25 +08:00
OG T
fc3d4a6b3a docs: Chief Architect review 91/100 + Phase 13.2 MCP Tools complete
Wave 2 + Phase 13.2 review results:
- Worker HPA: 95/100
- Grafana Provider: 92/100
- RAG Provider: 88/100
- RAG Service: 90/100

P1 recommendations (5 items):
1. RAG Provider DI pattern consistency
2. Grafana Config injection optimization
3. Additional RAG tests
4. Configurable embedding dimension
5. Worker HPA + PDB coordination

Modular compliance: Protocol/DI/Log all passed

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 16:19:24 +08:00
OG T
f6c3c7704f docs: update LOGBOOK - Wave 2 Worker HPA deployment complete
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 16:10:23 +08:00
OG T
89e05e6ea2 docs: ADR-037 + monitoring architecture proposal + Runbooks
- ADR-037 monitoring enhancement architecture
- MONITORING_MASTER_PLAN master plan
- MASTER_EXECUTION_SCHEDULE execution schedule
- Phase D/E/Worker HPA Runbooks

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 16:04:08 +08:00
OG T
95b46af986 docs: add audit report + Inspiration Lab + Runbook update
- AWOOOI_COMPREHENSIVE_AUDIT_2026Q1.md: full-dimension audit
- INSPIRATION_LAB.md: inspiration collection
- K3S-OPTIMIZATION-RUNBOOK.md: optimization guide
- ADR-006: AI Fallback strategy update

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 16:03:41 +08:00
OG T
8ba5f5c4d3 docs: Wave C-D monitoring automation confirmed complete
- C.1 generate_monitoring.py
- C.2 CI monitoring coverage check
- C.3 discover_docker.py
- D.1 NVIDIA Dashboard
- D.2 coverage_report.py

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 16:03:14 +08:00
OG T
b5602e23db docs: update LOGBOOK - Wave 1 safety net fully complete
- Circuit Breaker (ADR-038)
- Global Repair Cooldown (ADR-039)
- Signal Worker XCLAIM + Graceful Shutdown
- AnomalyCounter Graceful Degradation
- K8s terminationGracePeriodSeconds: 90

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 15:57:56 +08:00
OG T
bf06737eed docs: ADR-038/039 + LOGBOOK update
- ADR-038: OpenClaw concurrency governance architecture
- ADR-039: global auto-repair circuit breaker
- LOGBOOK: today's progress record

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 15:48:09 +08:00
OG T
5f9a6a7e55 fix(ai): remove fake confidence scores + show the AI model source
Problem: AI arbitration displayed hardcoded confidence scores (0.75/0.88/0.92/0.70)

Fixes:
- decision_manager: default confidence 0.75 → 0.0
- decision_manager: Expert System confidence=0.0 + is_rule_based
- openclaw: all Mock Response confidence → 0.0
- telegram_gateway: add the ai_provider field
- telegram_gateway: dynamic source label (Ollama/Gemini/Claude/rule match)

Telegram card display:
- confidence > 0 + provider=ollama → 🤖 Ollama arbitration
- confidence > 0 + provider=gemini → 🤖 Gemini arbitration
- confidence > 0 + provider=claude → 🤖 Claude arbitration
- confidence == 0 → ⚙️ rule match

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 15:19:51 +08:00
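The label selection described in 5f9a6a7e55 amounts to a small mapping. The sketch below is an illustration, not the code in telegram_gateway.py: the function name `source_label` is hypothetical, the label strings are English renderings of the card text, and falling back to the rule-match label for an unknown provider is an assumption.

```python
# Providers named in the commit message.
PROVIDER_NAMES = {"ollama": "Ollama", "gemini": "Gemini", "claude": "Claude"}

RULE_MATCH_LABEL = "⚙️ Rule match"


def source_label(confidence, ai_provider=None):
    """Pick the source label shown on a Telegram card."""
    if confidence > 0 and ai_provider in PROVIDER_NAMES:
        return f"🤖 {PROVIDER_NAMES[ai_provider]} arbitration"
    # confidence == 0 means the decision came from rules, not from a model.
    # Assumption: an unknown or missing provider also falls back to this label.
    return RULE_MATCH_LABEL


print(source_label(0.82, "ollama"))
print(source_label(0.0, "gemini"))
```

Because the fix sets every rule-based and mock response to a confidence of 0.0, the confidence value alone is enough to tell a real model verdict from a rule match.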
OG T
12e49d844a feat(monitoring): ADR-037 Wave B - Database Exporters + Prometheus integration
- Deploy PostgreSQL Exporter (192.168.0.188:9187)
- Deploy Redis Exporter (192.168.0.188:9121)
- Update the Prometheus scrape config
- Chief Architect review: 97% OUTSTANDING

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 15:18:54 +08:00
OG T
b55b1147e2 docs: update LOGBOOK - P1-3/P1-4 complete (32 tests)
2026-03-29 11:29:17 +08:00
OG T
50c055b547 feat(api): Phase D-G P0 fixes - Learning Repository modularization
Added:
- ILearningRepository Protocol (interfaces.py)
- LearningRepository (Redis persistence layer)
- Learning API endpoints (/api/v1/learning/*)
- LearningService.get_recommended_fix() method
- LearningService.get_learning_summary() method

Fixed:
- The Service no longer depends directly on the Redis client (it goes through the Repository)
- Complies with the leWOOOgo building-block principle
- Chief Architect review: 74/100 → 92/100

Updated:
- ADR-030: add Phase D-G P0 fixes section
- Skill 02: v1.9 → v2.0
- Runner fix: serialized builds resolve the _runner_file_commands conflict

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 11:03:51 +08:00
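The layering fix in 50c055b547 (the Service talks to a Repository Protocol instead of the Redis client) can be illustrated as follows. `ILearningRepository`, `LearningService` and `get_recommended_fix` are names from the commit; the repository methods, the in-memory stand-in and the "highest success rate wins" rule are assumptions for this sketch.

```python
from typing import Optional, Protocol


class ILearningRepository(Protocol):
    # Hypothetical method set; the real Protocol lives in interfaces.py.
    def save_outcome(self, anomaly_type: str, fix: str, success: bool) -> None: ...
    def get_outcomes(self, anomaly_type: str) -> list[tuple[str, bool]]: ...


class InMemoryLearningRepository:
    """Stand-in for the Redis-backed LearningRepository."""

    def __init__(self) -> None:
        self._rows: dict[str, list[tuple[str, bool]]] = {}

    def save_outcome(self, anomaly_type: str, fix: str, success: bool) -> None:
        self._rows.setdefault(anomaly_type, []).append((fix, success))

    def get_outcomes(self, anomaly_type: str) -> list[tuple[str, bool]]:
        return list(self._rows.get(anomaly_type, []))


class LearningService:
    def __init__(self, repository: ILearningRepository) -> None:
        # The service only knows the Protocol, never a Redis client.
        self._repository = repository

    def get_recommended_fix(self, anomaly_type: str) -> Optional[str]:
        stats: dict[str, tuple[int, int]] = {}  # fix -> (successes, attempts)
        for fix, success in self._repository.get_outcomes(anomaly_type):
            wins, total = stats.get(fix, (0, 0))
            stats[fix] = (wins + int(success), total + 1)
        if not stats:
            return None
        # Assumed rule: recommend the fix with the highest success rate.
        return max(stats, key=lambda fix: stats[fix][0] / stats[fix][1])
```

Swapping the in-memory class for a Redis-backed one requires no change to `LearningService`, which is the point of routing persistence through the Protocol.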
OG T
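7b2f585244 docs: complete monitoring implementation steps (detailed docs for 7 phases)
Phase A: AnomalyCounter service (4h)
- Sliding-window counting with a Redis Sorted Set
- Frequency threshold alerts (REPEAT/ESCALATE/PERMANENT_FIX)
- Tier decision logic integration

Phase B: Database Exporters (3h)
- pg_exporter: connection pool / slow query / lock wait / bloat monitoring
- redis_exporter: memory / hit rate / eviction monitoring
- 15+ alert rules

Phase C: Incident frequency fields (2h)
- IncidentFrequencyStats model
- Alert aggregation logic (10-minute window)
- Frontend frequency display

Phase D: Sentry Comment write-back (1h)
- Complete the TODO implementation
- Sentry API Token configuration

Phase E: SignOz alert rules (2h)
- Error Rate / Latency alerts
- Trace anomaly detection
- SignOz Webhook Handler

Phase F: Alert Chain E2E (2h)
- Smoke test script
- CD Pipeline integration
- Chain monitoring alerts

Phase G: Learning Service (3h)
- Learning from repair outcomes
- Success rate calculation
- Automatic Playbook updates

Total effort: 17h (2-3 days)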

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 10:23:04 +08:00
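Phase A in 7b2f585244 describes sliding-window counting on a Redis Sorted Set. The sketch below reproduces the same idea in memory so it runs without Redis; with Redis the three steps would map to ZADD, ZREMRANGEBYSCORE and ZCARD. The class name, the window length and the threshold values are hypothetical, since the commit names the alert levels but not their numbers.

```python
import bisect


class SlidingWindowCounter:
    """Count events per key inside a trailing time window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = {}  # key -> sorted list of event timestamps

    def record(self, key, now):
        timestamps = self._events.setdefault(key, [])
        bisect.insort(timestamps, now)                 # Redis: ZADD key now member
        cutoff = now - self.window_seconds
        # Drop events at or before the window start.   # Redis: ZREMRANGEBYSCORE
        del timestamps[: bisect.bisect_right(timestamps, cutoff)]
        return len(timestamps)                         # Redis: ZCARD


# Hypothetical thresholds, highest level first.
THRESHOLDS = [(10, "PERMANENT_FIX"), (5, "ESCALATE"), (3, "REPEAT")]


def classify(count):
    """Map an in-window count to an alert level, or None below all thresholds."""
    for minimum, level in THRESHOLDS:
        if count >= minimum:
            return level
    return None


counter = SlidingWindowCounter(window_seconds=600)
for t in (0, 100, 200):
    count = counter.record("pod-crashloop", now=t)
print(count, classify(count))
```

Timestamps are passed in explicitly rather than read from the clock, which keeps the counter deterministic and easy to test.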
OG T
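1e7c5134fe docs: add section on anomaly frequency statistics and root-cause fixes (Commander feedback)
- Anomaly frequency tracking architecture (Redis counters + sliding window)
- Tiered repair strategy (Tier 1-4: restart → mitigate → root cause → architecture)
- AI learning service (LearningService + automatic Playbook updates)
- Telegram frequency alert format (repeat count + success rate statistics)
- Implementation checklist (P0: 22h, P1: 12h, P2: 8h)

🔴 Key point: a restart only treats the symptom, not the root cause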

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 02:04:10 +08:00
OG T
56ae7290e3 docs: update LOGBOOK - complete monitoring strategy + Telegram button fix
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 02:01:06 +08:00
OG T
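40163a51b5 feat(monitoring): complete monitoring strategy and automatic integration architecture
Added:
1. MONITORING_COMPLETE_STRATEGY.md - complete monitoring strategy
   - Monitoring matrix of 5 hosts × 60+ services
   - P0/P1/P2 alert rule list
   - AI auto-repair closed-loop flow
   - Safety guardrail configuration

2. MONITORING_INTEGRATION_ARCHITECTURE.md - automatic integration architecture
   - Service registry (Single Source of Truth)
   - CI/CD automatically validates monitoring coverage
   - New services get monitoring automatically

3. ops/monitoring/service-registry.yaml - service inventory
   - K8s workloads (API/Web/Worker/ArgoCD)
   - Docker containers (Ollama/OpenClaw/Redis/Postgres)
   - Frontend page SLOs
   - API endpoint SLOs
   - Alert templates and auto-repair actions

4. ops/monitoring/validate_coverage.py - coverage validation
   - Runs in the CI stage
   - Detects unmonitored services
   - Generates a coverage report

Design principles:
- Monitoring as Code
- A new service must be registered in the registry before it can be deployed
- Automatic discovery prevents omissions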

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 01:52:08 +08:00
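The coverage validation that 40163a51b5 attributes to ops/monitoring/validate_coverage.py boils down to a set difference between registered services and monitored targets. The following is a minimal sketch of that check, not the script itself: the function names, the input shapes and the exit-code convention are assumptions.

```python
def coverage_report(registered, monitored):
    """Compare registry entries against monitored targets.

    Both arguments are iterables of service names (hypothetical shape; the
    real registry is a YAML file whose schema is not shown in the log).
    """
    registered = set(registered)
    missing = sorted(registered - set(monitored))
    covered = len(registered) - len(missing)
    percent = 100.0 * covered / len(registered) if registered else 100.0
    return {"missing": missing, "coverage_percent": round(percent, 1)}


def ci_exit_code(report):
    """Return a non-zero exit code when any registered service is unmonitored."""
    for name in report["missing"]:
        print(f"unmonitored service: {name}")
    print(f"coverage: {report['coverage_percent']}%")
    return 1 if report["missing"] else 0


report = coverage_report(
    registered=["api", "web", "worker", "redis"],
    monitored=["api", "web", "redis"],
)
print(ci_exit_code(report))
```

Failing the CI stage on any gap is one way to enforce the "register before deploy" principle; a softer variant could fail only below a coverage threshold.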