Commit Graph

112 Commits

Author SHA1 Message Date
OG T
322a79a889 docs(review): complete chief architect review for adr-038 & adr-039
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 4m16s
E2E Health Check / e2e-health (push) Successful in 21s
2026-03-29 23:56:34 +08:00
OG T
3eb3051a73 fix(ci): 修復 docker socket 重複掛載 (1774793847)
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 3m22s
E2E Health Check / e2e-health (push) Failing after 11s
2026-03-29 22:17:27 +08:00
OG T
f5b19cf108 feat(learning): 實作 Playbook 信心度調整機制 (ADR-030)
- 新增 _promote_playbook: 高評分提升信心度 +0.1
- 新增 _demote_playbook: 低評分降低信心度 -0.15
- 新增 find_by_source_incident: 按 incident_id 查詢 Playbook
- 新增 adjust_confidence: 信心度調整 + 狀態自動轉換
- 新增 Playbook.failure_rate 屬性

自動狀態轉換:
- ai_confidence >= 0.9 + DRAFT → 自動 APPROVED
- ai_confidence < 0.3 + failure_rate > 50% → 自動 DEPRECATED

測試: 13 案例全部通過

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 22:10:49 +08:00
OG T
d6b8224942 feat(cicd): ADR-039 Gitea CI/CD 遷移
2026-03-29 Claude Code (統帥授權):
- 新增 .gitea/workflows/cd.yaml (Build → Harbor → K8s)
- 新增 .gitea/workflows/e2e-health.yaml (E2E 健康檢查)
- 新增 ADR-039 文檔記錄遷移決策

方案 B: GitHub → Gitea CI/CD 遷移
- Gitea 作為主倉和 CI/CD
- GitHub 降級為只讀備份

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 21:51:45 +08:00
OG T
feafaa90a1 fix(ci): E2E Verification 添加重試機制
Some checks failed
CI / Pre-flight (push) Has been cancelled
CI / Lint & Type Check (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
CI / API Lint (push) Has been cancelled
CI / API Test (push) Has been cancelled
CI / Ollama Model Test (push) Has been cancelled
CI / OpenAPI Validate (push) Has been cancelled
CI / Docker Verify (api) (push) Has been cancelled
CI / Docker Verify (web) (push) Has been cancelled
2026-03-29 Claude Code:
- E2E 腳本也添加 3 次重試
- 間隔 5 秒
- 更新 LOGBOOK 記錄

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 21:44:55 +08:00
OG T
4c169c2f75 docs: 更新 LOGBOOK - E2E Health Check 修復進度
- 記錄 8 項問題與修復
- HMAC Secret 注入 + rollout restart
- VIP 暫時繞過,待後續診斷

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 21:43:02 +08:00
OG T
4707102498 feat(telegram): 實作 6 種新訊息模板 (ADR-038)
2026-03-29 ogt: Telegram 訊息模板完整實作

新增訊息類型:
- SentryErrorMessage: Sentry 錯誤通知 (含 Stack Trace)
- ResourceWarnMessage: 資源耗盡警告 (含 CPU/Memory/Disk)
- RepairReportMessage: 自動修復每日報告
- DailySummaryMessage: 每日系統狀態摘要
- DeploySuccessMessage: CD 部署成功通知
- RateLimitMessage: API 限額警告

新增發送方法:
- send_sentry_error()
- send_resource_warning()
- send_repair_report()
- send_daily_summary()
- send_deploy_success()
- send_rate_limit_warning()

新增按鈕:
- Sentry: [🔍 查看詳情] [🔕 靜默 1h]
- Resource: [ 自動擴展] [🔕 靜默 1h]

測試: 14 測試案例全部通過

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 21:23:07 +08:00
OG T
fecfc6b4af docs: 更新 LOGBOOK - NVIDIA RCA 模組化重構完成
2026-03-29 ogt: 反映模組化重構完成狀態

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 20:54:31 +08:00
OG T
04bfff9d19 refactor(ai): 模組化重構 - NVIDIA chat 移至 NvidiaProvider
符合 feedback_lewooogo_modular_enforcement.md 規範:
- 移除 openclaw.py 中的 _call_nvidia() (重複邏輯)
- 新增 NvidiaProvider.chat() 方法
- 更新 INvidiaProvider Protocol
- openclaw.py 改用 get_nvidia_provider().chat()
- 測試移至 test_nvidia_chat.py

架構層次:
- Router → Service → Provider (正確)
- 禁止 Service 層重複實作已存在的 Provider 功能

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 20:49:23 +08:00
OG T
1eb0be8f3f docs: 新增 Telegram 訊息模板規範 v1.0
定義 12 種訊息類別:
- 6 種已實作 (Incident/CI/PR/Exec/Heartbeat/Silence)
- 6 種待實作 (Sentry/Resource/Repair/Daily/Deploy/RateLimit)

包含:
- 完整模板格式
- 按鈕功能對照表
- Emoji 使用規範
- 字元限制規則
- 實作優先級 (P1: 5h, P2: 5h, P3: 1h)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 20:44:16 +08:00
OG T
31a6f2785d docs: 更新 LOGBOOK - NVIDIA RCA 整合 + 首席架構師審查
- 新增 NVIDIA RCA 整合記錄 (74→85/120)
- P0/P1 修復清單
- ConfigMap 變更記錄
- Memory 更新清單

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 20:36:41 +08:00
OG T
2fde0b5724 docs: 更新 LOGBOOK - Lint 清零 + E2E 診斷詳細紀錄
- Lint 61→0 完全清零,記錄 React Hook 依賴修復模式
- E2E Health Check 診斷進度 (VIP 可達,NodePort 待查)
- 新增 useMemo 包裝物件依賴的標準模式

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 20:29:43 +08:00
OG T
6a8e1bfdd1 feat(cicd): Gitea Mirror B2 備份策略
- 新增 Gitea remote (192.168.0.110:3001/wooo/awoooi)
- CD 成功後自動 mirror to Gitea
- 新增 GITEA_MIRROR_TOKEN GitHub Secret
- 更新 LOGBOOK 紀錄

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 20:28:21 +08:00
OG T
5a8edd692d docs: 更新 LOGBOOK - Lint 清理完成 2026-03-29 16:43:49 +08:00
OG T
d68917cdac docs: Wave 3 i18n 清零完成 - 40+ 違規全部修復
- TECHNICAL_DEBT_PHASE2.md: 更新為  全部完成狀態
- LOGBOOK.md: 新增 Wave 3 完成紀錄

修復清單:
- status-orb.tsx: 狀態標籤 i18n
- OmniTerminal.tsx: SSE 連線狀態 i18n
- sse-states.ts: 連線狀態 label 改 i18n key
- thinking-terminal.tsx: 終端機 UI 全面 i18n
- live-host-card.tsx: 移除 hardcoded 預設值

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 16:35:47 +08:00
OG T
590b5c2bd5 docs: P1 修復完成 - 91/100 → 95/100
5/5 P1 修復項目全部完成:
- RAG Provider DI 模式一致性
- Worker PDB (已存在)
- RAG 測試 9 項
- Grafana Config 快取
- Embedding 維度配置化

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 16:24:25 +08:00
OG T
fc3d4a6b3a docs: 首席架構師審查 91/100 + Phase 13.2 MCP Tools 完成
Wave 2 + Phase 13.2 審查結果:
- Worker HPA: 95/100
- Grafana Provider: 92/100
- RAG Provider: 88/100
- RAG Service: 90/100

P1 建議 (5項):
1. RAG Provider DI 模式一致性
2. Grafana Config 注入優化
3. RAG 測試補充
4. Embedding 維度配置化
5. Worker HPA + PDB 配合

模組化合規: Protocol/DI/Log 全部通過

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 16:19:24 +08:00
OG T
f6c3c7704f docs: 更新 LOGBOOK - Wave 2 Worker HPA 部署完成
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 16:10:23 +08:00
OG T
89e05e6ea2 docs: ADR-037 + 監控架構提案 + Runbooks
- ADR-037 監控增強架構
- MONITORING_MASTER_PLAN 主計畫
- MASTER_EXECUTION_SCHEDULE 執行排程
- Phase D/E/Worker HPA Runbooks

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 16:04:08 +08:00
OG T
95b46af986 docs: 新增稽核報告 + 靈感實驗室 + Runbook 更新
- AWOOOI_COMPREHENSIVE_AUDIT_2026Q1.md 全維度稽核
- INSPIRATION_LAB.md 靈感收集
- K3S-OPTIMIZATION-RUNBOOK.md 優化指南
- ADR-006 AI Fallback 策略更新

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 16:03:41 +08:00
OG T
8ba5f5c4d3 docs: Wave C-D 監控自動化確認完成
- C.1 generate_monitoring.py 
- C.2 CI 監控覆蓋率檢查 
- C.3 discover_docker.py 
- D.1 NVIDIA Dashboard 
- D.2 coverage_report.py 

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 16:03:14 +08:00
OG T
b5602e23db docs: 更新 LOGBOOK - Wave 1 安全網全部完成
- Circuit Breaker (ADR-038) 
- Global Repair Cooldown (ADR-039) 
- Signal Worker XCLAIM + Graceful Shutdown 
- AnomalyCounter Graceful Degradation 
- K8s terminationGracePeriodSeconds: 90 

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 15:57:56 +08:00
OG T
bf06737eed docs: ADR-038/039 + LOGBOOK 更新
- ADR-038: OpenClaw 併發治理架構
- ADR-039: 全域自動修復熔斷
- LOGBOOK: 今日進度記錄

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 15:48:09 +08:00
OG T
5f9a6a7e55 fix(ai): 移除假信心分數 + 顯示 AI 模型來源
問題: AI 仲裁顯示硬編碼信心分數 (0.75/0.88/0.92/0.70)

修復:
- decision_manager: 預設 confidence 0.75 → 0.0
- decision_manager: Expert System confidence=0.0 + is_rule_based
- openclaw: 所有 Mock Response confidence → 0.0
- telegram_gateway: 新增 ai_provider 欄位
- telegram_gateway: 動態來源標籤 (Ollama/Gemini/Claude/規則匹配)

Telegram 卡片顯示:
- confidence > 0 + provider=ollama → 🤖 Ollama 仲裁
- confidence > 0 + provider=gemini → 🤖 Gemini 仲裁
- confidence > 0 + provider=claude → 🤖 Claude 仲裁
- confidence == 0 → ⚙️ 規則匹配

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 15:19:51 +08:00
OG T
12e49d844a feat(monitoring): ADR-037 Wave B - Database Exporters + Prometheus 整合
- 部署 PostgreSQL Exporter (192.168.0.188:9187)
- 部署 Redis Exporter (192.168.0.188:9121)
- 更新 Prometheus scrape config
- 首席架構師審查: 97% OUTSTANDING

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 15:18:54 +08:00
OG T
b55b1147e2 docs: 更新 LOGBOOK - P1-3/P1-4 完成 (32 tests) 2026-03-29 11:29:17 +08:00
OG T
50c055b547 feat(api): Phase D-G P0 修正 - Learning Repository 積木化
新增:
- ILearningRepository Protocol (interfaces.py)
- LearningRepository (Redis 持久化層)
- Learning API 端點 (/api/v1/learning/*)
- LearningService.get_recommended_fix() 方法
- LearningService.get_learning_summary() 方法

修正:
- Service 不直接依賴 Redis Client (透過 Repository)
- 符合 leWOOOgo 積木化原則
- 首席架構師審查: 74/100 → 92/100

更新:
- ADR-030: 新增 Phase D-G P0 修正章節
- Skill 02: v1.9 → v2.0
- Runner 修復: 序列建構解決 _runner_file_commands 衝突

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 11:03:51 +08:00
OG T
7b2f585244 docs: 完整監控實施步驟 (7 Phase 詳細文檔)
Phase A: AnomalyCounter 服務 (4h)
- Redis Sorted Set 滑動窗口計數
- 頻率閾值告警 (REPEAT/ESCALATE/PERMANENT_FIX)
- Tier 決策邏輯整合

Phase B: Database Exporters (3h)
- pg_exporter: 連接池/慢查詢/鎖等待/膨脹監控
- redis_exporter: 記憶體/命中率/驅逐監控
- 15+ 告警規則

Phase C: Incident 頻率欄位 (2h)
- IncidentFrequencyStats 模型
- 告警聚合邏輯 (10 分鐘窗口)
- 前端頻率顯示

Phase D: Sentry Comment 回寫 (1h)
- 完成 TODO 實作
- Sentry API Token 配置

Phase E: SignOz 告警規則 (2h)
- Error Rate / Latency 告警
- Trace 異常檢測
- SignOz Webhook Handler

Phase F: Alert Chain E2E (2h)
- Smoke Test 腳本
- CD Pipeline 整合
- 鏈路監控告警

Phase G: Learning Service (3h)
- 修復效果學習
- 成功率計算
- Playbook 自動更新

總工時: 17h (2-3 天)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 10:23:04 +08:00
OG T
1e7c5134fe docs: 新增異常頻率統計與根本修復章節 (統帥反饋)
- 異常頻率追蹤架構 (Redis 計數器 + 滑動窗口)
- 修復策略分級 (Tier 1-4: 重啟→緩解→根因→架構)
- AI 學習服務 (LearningService + Playbook 自動更新)
- Telegram 頻率告警格式 (重複次數 + 成功率統計)
- 實作清單 (P0: 22h, P1: 12h, P2: 8h)

🔴 關鍵觀點: 重啟只是治標,不是治本

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 02:04:10 +08:00
OG T
56ae7290e3 docs: 更新 LOGBOOK - 完整監控策略 + Telegram 按鈕修復
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 02:01:06 +08:00
OG T
40163a51b5 feat(monitoring): 完整監控策略與自動整合架構
新增:
1. MONITORING_COMPLETE_STRATEGY.md - 完整監控策略
   - 5 主機 × 60+ 服務監控矩陣
   - P0/P1/P2 告警規則清單
   - AI 自動修復閉環流程
   - 安全護欄配置

2. MONITORING_INTEGRATION_ARCHITECTURE.md - 自動整合架構
   - 服務註冊表 (Single Source of Truth)
   - CI/CD 自動驗證監控覆蓋率
   - 新服務自動獲得監控

3. ops/monitoring/service-registry.yaml - 服務清單
   - K8s 工作負載 (API/Web/Worker/ArgoCD)
   - Docker 容器 (Ollama/OpenClaw/Redis/Postgres)
   - 前端頁面 SLO
   - API 端點 SLO
   - 告警模板與自動修復動作

4. ops/monitoring/validate_coverage.py - 覆蓋率驗證
   - CI 階段執行
   - 檢測未監控服務
   - 生成覆蓋率報告

設計原則:
- 監控即代碼 (Monitoring as Code)
- 新服務必須在 registry 註冊才能部署
- 自動發現機制防止遺漏

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 01:52:08 +08:00
OG T
94d6a0c720 docs(ai): 更新 ADR-036 和 LOGBOOK - P3 優化記錄
- ADR-036 v1.4: P3 優化完成 (95/100)
- LOGBOOK: Phase 20 P1+P2+P3 全部完成
- 測試: 34/34 PASSED

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 01:51:35 +08:00
OG T
179e659f14 chore: 清理 Playwright 產物 + kube-state-metrics 告警擴充
清理工作:
- .gitignore 新增 playwright-report/ 和 test-results/ 排除
- 保留 phase19/ 參考截圖目錄

kube-state-metrics 告警擴充 (P3):
- CronJobLastRunFailed: Job 執行失敗
- DaemonSetMissingPods: DaemonSet 缺少 Pod
- StatefulSetReplicasMismatch: StatefulSet 副本不足
- ContainerWaiting: ImagePullBackOff/CrashLoopBackOff 偵測
- PDBViolation: PDB 健康 Pod 數不足
- NodeUnschedulable: 節點標記為不可排程

新增:
- apps/api/scripts/test_nemotron_tool_calling.py (E2E 比較測試)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 01:28:35 +08:00
OG T
ee2bceefff feat(monitoring): Phase 19.6 測試文檔 + P1-P3 改進 + 首席架構師審查
Phase 19.6 測試文檔收尾:
- E2E 測試擴充至 18 項 (Terminal/GenUI 驗證)
- 新增 PHASE19-VERIFICATION-CHECKLIST.md (完整驗證清單)

P1 驗證:
- ArgoCD Metrics NodePort 監控 (30883/30884)
- TLS 證書監控 (Blackbox Exporter 9115)

P2 改進:
- waitForTimeout → waitForLoadState('networkidle')
- 跨平台快捷鍵 (Meta+J / Control+J)
- SKIP_MULTISIG_TESTS 環境變數控制
- Prometheus GitOps 部署腳本

P3 改進:
- HPA maxReplicas 4 → 6 (API/Web)

首席架構師審查: 47/50 OUTSTANDING (94%)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 01:19:26 +08:00
OG T
b77e151387 feat(ai): ADR-036 NVIDIA Nemotron Tool Calling 整合
Phase 20 - 提升 Tool Calling 精準度 50% → 83.3%

新增:
- src/models/nvidia.py: Pydantic Schema
- src/services/nvidia_provider.py: NvidiaProvider 類別
- tests/test_nvidia_provider.py: 15 項單元測試 (全部通過)

修改:
- ai_router.py: AIProvider.NVIDIA + route_tool_calling()
- ai_rate_limiter.py: NVIDIA 限制 (5 RPM, 100/day)
- models.json: NVIDIA 配置
- cd.yaml: Secrets 注入 NVIDIA_API_KEY

路由策略:
- Tool Calling: Nemotron → Gemini → Claude
- 一般對話: Ollama → Gemini → Claude (不變)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 00:00:08 +08:00
OG T
a30f766eb1 feat(monitoring): 首席架構師完整審查 + 補充告警規則
## 首席架構師審查結果: 198/200 (99%) EXCEPTIONAL

### 審查範圍
- 架構設計: 50/50 
- 安全性: 49/50
- 模組化合規: 50/50 
- 監控告警: 49/50
- E2E 測試: 49/50

### 新增補充告警 (12 條)
- RedisDown, PostgreSQLDown, OllamaDown, OpenClawDown
- HarborDown, LangfuseDown
- HPAMaxedOut, HPAScalingDisabled
- WorkerUnavailable
- NodeHighCPU, NodeHighMemory, ContainerOOMKilled

### 檔案
- k8s/monitoring/k3s-alerts-supplemental.yaml

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-28 23:30:44 +08:00
OG T
f0572ae906 feat(k4.3): Pod Security Standards + Grafana Dashboard
K4.3 Pod Security Standards:
- awoooi-prod: baseline
- kube-state-metrics: baseline
- kured: privileged (hostPID required)
- descheduler: restricted
- velero: baseline
- argocd: baseline

Grafana Dashboard:
- K3s Cluster Overview (9 panels)
- Nodes, Pods, HPA, Velero, Alerts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-28 23:16:54 +08:00
OG T
863fc5a426 docs: 新增監控告警完整流程文檔 (2026-03-29 ogt)
內容:
- 8 層架構圖 (ASCII)
- 工具/服務清單表格
- 配置/代碼檔案清單
- 完整資料流說明
- E2E 驗證機制 (ADR-025/035)
- 故障排查指南

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-28 22:25:14 +08:00
OG T
1a4be7b18a feat(k-mon): K3s monitoring integration (Phase K-MON)
- Add Velero metrics NodePort service (30885)
- Add K3s infrastructure alert rules:
  - VIP 6443 availability
  - Node ICMP checks
  - AWOOOI API/Web TCP checks
  - SignOz/Sentry availability
- Add Velero backup alerts (failed/missing)
- Add ADR-034 for ArgoCD GitOps adoption

Deployed to:
- K3s: velero-metrics service
- 188: Prometheus + Alertmanager configs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-28 21:57:57 +08:00
OG T
6a38c0c968 fix(cd): ADR-035 Telegram Secrets 自動注入三層防護
🔴 事故根因: K8s Secrets 未注入,Telegram 告警長時間失效
- kustomization.yaml 說「由 CI/CD 處理」但 CD 從未執行

🛡️ 三層防護機制:
- Layer 1: Pre-flight 檢查 GitHub Secrets 存在
- Layer 2: Deploy 時 kubectl patch secret 自動注入
- Layer 3: Post-Deploy E2E 測試告警驗證

📄 文件更新:
- ADR-035: docs/adr/ADR-035-telegram-alert-chain-enforcement.md
- DevOps Skill v1.9: 新增 Secrets 注入鐵律
- CLAUDE.md: 新增告警鏈路章節
- LOGBOOK: 事故記錄

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-28 21:47:49 +08:00
OG T
66fb56c691 feat(k8s): Phase K2 自動化維運完成
- K2.4 NPD: Node Problem Detector (DaemonSet)
- K2.3 VPA: 3 Vertical Pod Autoscaler (Off 模式)
- K2.1 ArgoCD: v3.3.6 @ :30443 (GitOps)
- K2.2 Sealed Secrets: v0.26.0 (加密 Secrets)

新增檔案:
- k8s/npd/node-problem-detector.yaml
- k8s/awoooi-prod/11-vpa.yaml

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-28 21:27:05 +08:00
OG T
d3e6b59b86 docs: K1 Velero 備份系統完成
- MinIO 部署 (192.168.0.188:9000/9001)
- Velero v1.13.0 部署到 K3s
- daily-awoooi-prod Schedule (每日 02:00)
- 測試備份成功 (153 items / 30 天保留)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-28 21:16:27 +08:00
OG T
eea6e3acc3 feat(k8s): 新增 Velero 備份系統 (K1.1)
Phase K1 災難恢復:
- MinIO 部署在 192.168.0.188:9000/9001
- Velero v1.13.0 完整安裝 manifests
- velero-backups bucket 已建立
- README 含部署與使用指南

部署方式:
  ssh wooo@192.168.0.120
  sudo kubectl apply -f k8s/velero/velero-install-full.yaml

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-28 20:53:02 +08:00
OG T
269c81bdbb fix(k8s): OpenClaw 端口統一 8088→8089
- ConfigMap: OPENCLAW_URL 更新為 8089
- NetworkPolicy: 允許 8089 出站
- SERVICE-ENDPOINTS.md: 移除 legacy 8088 引用

2026-03-28 清理舊配置,統一使用正式端口

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-28 20:32:30 +08:00
OG T
e03d99b871 docs(runbook): K3s 優化 Runbook v1.2 - 標記完成狀態
Phase 完成狀態:
- K0  Swap/PDB/備份/清理 (首席架構師 9.0/10)
- K-NET  VIP 192.168.0.125 + CI/CD 整合
- K-CLEAN  9 RS + 1 Job 清理

K-HA 📋 另案規劃 (需維護窗口)

更新:
- 版本號 1.1 → 1.2
- 目錄標記完成狀態
- 各 Phase 加入執行結果
- 附錄 A 實際執行時間線
- 問題統計 (清理前後對照)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-28 18:52:13 +08:00
OG T
9fa996c9fe fix(cicd): 修正 OTEL 端點配置 192.168.0.121→188
問題: CI/CD workflows 指向錯誤的 OTEL 端點
- ci.yaml: 121:4318 → 188:24318
- cd.yaml: 121:4318 → 188:24318

SignOz 實際運行在 192.168.0.188 (AI+Web 中心)

更新:
- Skill 04 v1.8 加入可觀測性端點規範
- LOGBOOK 記錄配置修正

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-28 18:47:23 +08:00
OG T
d206460751 feat(security): Phase 20 CSRF 防護實作
Phase 19 首席架構師審查指出: 核鑰 UX 安全性缺 CSRF 防護

後端:
- 新增 src/core/csrf.py (Double Submit Cookie 模式)
- 新增 src/api/v1/csrf.py (GET /api/v1/csrf/token)
- 新增 src/models/csrf.py (CSRFTokenResponse)
- 修改 approvals.py sign/reject/bulk 端點加入 CSRFToken 驗證

前端:
- 新增 hooks/useCSRF.ts (React Hook)
- 修改 approval.store.ts 整合 CSRF Token 參數

安全特性:
- 256-bit Token (secrets.token_hex)
- 時序安全比較 (secrets.compare_digest)
- SameSite=Strict Cookie
- 1 小時 Token 有效期

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-28 18:31:58 +08:00
OG T
7b9b0c490b feat(phase19): Omni-Terminal 100% 完成 + 首席架構師審查 47/50
## Phase 19 Omni-Terminal (Wave 0-6 全部完成)

### 核心功能
- SSE 狀態機 (7-State 設計,10/10 分)
- GenUI 動態渲染 (6 張卡片 + Zod Schema 驗證)
- 核鑰 UX (長按授權 + 風險分級)
- Terminal Telemetry (Sentry 整合)

### P0-P2 修復
- P0: Singleton → FastAPI Depends 依賴注入
- P1: Zod Schema 升級 (7 個驗證 Schema)
- P1: 錯誤分類碼聚合 (Sentry fingerprint)
- P2: Slow Query 監控 (5s 警告 / 10s 嚴重)

### 測試
- test_terminal_service.py: 54 項測試全通過
- 意圖分類: 42 個測試案例 (9 種 IntentType)

### 文檔
- ADR-031: SSE 架構實作紀錄
- ADR-032: GenUI 渲染實作紀錄
- Skills: v1.9 (後端 Terminal 章節)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-28 18:04:12 +08:00
OG T
3e5315aaf8 docs(k3s): 首席架構師審查完成 46/50 (92%)
K3s 優化工作審查完成:

- ADR-033: Phase K0 + K-NET 標記為已完成
- 09-pdb.yaml: Worker PDB 設計說明註釋
- DevOps Skill: 新增 keepalived 快速操作參考

審查結果:
- 架構合規性: 9/10
- Runbook 完整性: 10/10 
- 模組化合規: 9/10
- 風險控制: 9/10
- 文檔完整性: 9/10

P2 問題已修復,無 P0/P1 阻擋項

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-28 18:00:07 +08:00
OG T
efb80b403e feat(k8s): Phase K0.5 Startup Probe + PDB + revisionHistoryLimit
K3s 生產級優化 Phase K0 變更:

- 新增 startupProbe 到 API/Web/Worker Deployment (60s 啟動時間)
- 新增 revisionHistoryLimit: 3 (減少孤立 ReplicaSet)
- 新增 09-pdb.yaml (PodDisruptionBudget 保護)
- 新增 K3S-OPTIMIZATION-RUNBOOK.md (執行手冊)
- 修正 selector 對齊現有 Deployment (app+environment+system)

首席架構師審查: 9.0/10 

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-28 11:13:44 +08:00