Commit Graph

127 Commits

Author SHA1 Message Date
OG T
3e5315aaf8 docs(k3s): 首席架構師審查完成 46/50 (92%)
K3s 優化工作審查完成:

- ADR-033: Phase K0 + K-NET 標記為已完成
- 09-pdb.yaml: Worker PDB 設計說明註釋
- DevOps Skill: 新增 keepalived 快速操作參考

審查結果:
- 架構合規性: 9/10
- Runbook 完整性: 10/10 
- 模組化合規: 9/10
- 風險控制: 9/10
- 文檔完整性: 9/10

P2 問題已修復,無 P0/P1 阻擋項

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-28 18:00:07 +08:00
OG T
e5ded3b3f2 feat(phase19): OmniTerminal + GenUI + Hybrid SSE 架構實作 (Wave 0-2)
Phase 19 OmniTerminal MVP 完成:
- Wave 0: Backend (Hybrid SSE POST→GET 架構)
- Wave 1: Frontend (OmniTerminal 狀態機 + GenUI Registry)
- Wave 2: UI 組件 (8 個 GenUI 動態卡片)

ADR 文檔:
- ADR-031: OmniTerminal SSE 架構
- ADR-032: GenUI 動態渲染框架
- ADR-033: K3s HA 架構設計

GenUI 組件:
- GenUIRenderer, K8sPodStatusCard, SentryErrorCard
- MetricsSummaryCard, IncidentTimelineCard
- TraceWaterfallCard, ApprovalCard, NuclearKeyButton

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-28 00:17:26 +08:00
OG T
4ee5376bd1 docs: 告警機制優化計畫 + ADR-030 Phase 6 + Skill 03 v1.5
- LOGBOOK: 新增告警機制完整審查記錄
- ADR-030: 新增 Phase 6 非同步分析優化章節
- Skill 03: v1.5 Stream Key 統一 + Telegram 去重

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-27 09:42:53 +08:00
OG T
d3a0ed4253 docs(adr): ADR-030 智能自動修復系統完整設計
五階段實施計畫:
- Phase 1: 智能診斷基礎  已完成
- Phase 2: 資料收集強化 (K8s Events + SignOz 深度整合)
- Phase 3: Playbook RAG (向量化 + 語意搜尋)
- Phase 4: 自動執行機制 (信任度 + 風險評估)
- Phase 5: 持續學習迴圈 (反饋 + 信任度調整)

架構相容性分析:
- 介面擴展點定義
- 資料庫 Schema 變更
- 風險評估與回滾計畫

預計時程: 10-15 週

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 21:48:41 +08:00
OG T
309a019cc3 docs: 記錄 Telegram 告警轟炸事故修復
更新:
- ADR-027: 新增緊急事故修復章節
- LOGBOOK: 記錄 2026-03-26 事故時間線
- Skill 02 v1.6: 新增 Telegram 去重機制章節

根因: Phase 6.5 修改 + INC- 前綴重複
修復: Redis 去重 (10 分鐘) + 前綴檢查

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 20:13:07 +08:00
OG T
fb03430469 feat(api): ADR-027 Phase 2 - 簽核/拒絕後自動同步 Incident 狀態
Router 整合點:
- POST /approvals/{id}/sign → on_approval_status_change("approved")
- POST /approvals/{id}/reject → on_approval_status_change("rejected")
- POST /approvals/bulk-approve → 批次同步

變更:
- 移除舊的 resolve_incident_after_approval() 調用
- 改用 IncidentApprovalService.on_approval_status_change()
- 同步失敗不阻斷主流程 (容錯設計)

ADR-027 進度: Phase 1-2  完成

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 19:44:59 +08:00
OG T
2f5986df5c docs: ADR 整理與新增 (021-029)
ADR 編號修正:
- ADR-023 failure-auto-repair → ADR-028
- ADR-025 cicd-ai-integration → ADR-029

新增 ADR:
- ADR-021: Playbook 更新驗證
- ADR-022: Sentry 整合架構
- ADR-027: Incident-Approval 同步
- ADR-028: 失敗自動修復閉環
- ADR-029: CI/CD AI 整合 (原 ADR-025)

更新:
- ADR-018: LLM 測試策略狀態更新

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 19:09:08 +08:00
OG T
0a9d94d82b feat(k8s): CoreDNS GitOps 架構 (ADR-026)
問題: DNS 配置沒有版本控制,手動修改易遺失

架構:
- k8s/k3s-system/coredns-custom.yaml: HelmChartConfig
- CD workflow: k3s-system 路徑偵測 + 自動 apply
- ADR-026: CoreDNS GitOps 管控架構

DNS 上游:
- 使用 8.8.8.8 + 1.1.1.1
- 禁止 /etc/resolv.conf (systemd-resolved)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 18:43:28 +08:00
OG T
6e3a7fca20 docs: ADR-006 v1.2 Rate Limiter + LOGBOOK 更新
- ADR-006: 新增 Rate Limiter 實作章節 (v1.2)
- LOGBOOK: 記錄 Gemini 切換 + Rate Limiter 上線

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 18:16:45 +08:00
OG T
30145c7d7e docs: ADR-025 CI/CD AI 整合架構 + Skill 07 更新
- ADR-025: 文檔化 Phase 13.1 CI/CD AI 整合架構決策
  - GitHub Webhook 事件驅動流程
  - 風險分級執行決策 (AUTO/TELEGRAM/APPROVAL/BLOCKED)
  - SignOz Log 整合
- Skill 07 v1.3: 新增 Grafana MCP + SignOz query_logs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 15:41:26 +08:00
OG T
14c81f728f docs: 新增 ADR-025 告警鏈路 E2E 驗證 + 更新 Skills
新增:
- ADR-025: 告警鏈路 E2E 驗證架構 (2026-03-26 事故教訓)

更新:
- ADR-011: 新增 DNS 規則最佳實踐 (附錄 B)
- Skill 04: 新增 NetworkPolicy DNS 規則 + CoreDNS 設定
- Skill 05: 新增告警鏈路 Smoke Test 要求
- CLAUDE.md: 新增告警鏈路驗證到任務前必讀

事故根因:
1. URL 路徑錯誤 (webhook vs webhooks)
2. NetworkPolicy DNS 規則標籤不匹配
3. CoreDNS 上游 DNS 依賴 systemd-resolved

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 15:34:12 +08:00
OG T
579da38b8b feat(api): Phase 13 智能路由 + CI/CD 整合 (#74-88)
Phase 13.1 CI/CD Integration:
- #76 workflow_run handler for CI failure diagnosis
- #77 SignOz log query (query_logs, error_logs_summary MCP)
- #78 CIAutoRepairService with risk-based execution decisions

Phase 13.3 Smart Routing:
- #85 Intent Classifier v2.0 (rule engine + LLM fallback)
- #86 Complexity Scorer (9-dimension scoring)
- #87 AI Router v3.0 (routing decision matrix)
- #88 Token Counter (OTEL + Langfuse integration)

New files:
- services/ci_auto_repair.py (risk stratification)
- services/model_registry.py (centralized model config)
- services/token_counter.py (677 lines)
- Skill 08: Model Router Expert
- Skill 09: Strangler Pattern Expert
- ADR-023: Smart Routing Architecture
- ADR-024: API Layer Architecture

Tests:
- phase11-conversational.spec.ts (E2E tests)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 15:32:52 +08:00
OG T
30f045bf28 feat: ADR-019 System Prompt 集中管理 + Nightly LLM Workflow
新增:
- docs/adr/ADR-019-system-prompt-management.md - System Prompt 規範
- apps/api/src/core/prompts.py - 集中管理 System Prompts
- .github/workflows/nightly-llm.yaml - 每夜 LLM 迴歸測試

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 12:27:47 +08:00
OG T
edecf7a053 docs: ADR-020 E2E 驗證框架規範
Phase 18.3 配套決策文檔:
- E2E 驗證腳本架構 (5 步驟標準)
- Safe Label 防護機制
- Daily Health Check 排程規範
- 目標資源驗證要求

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 12:27:36 +08:00
OG T
96c3ddd8c4 feat(api): Phase 18.1 K8s 資源名稱驗證 (ADR-016)
三層防禦架構確保 kubectl 指令有效:
1. Webhook 入口正規化 (webhooks.py)
2. OpenClaw 產生指令前驗證 (openclaw.py)
3. 靜態映射表 + 模糊匹配 (k8s_naming.py, resource_resolver.py)

新增:
- src/utils/k8s_naming.py: RFC 1123 正規化 + 靜態映射
- src/services/resource_resolver.py: MCP K8s Tool 動態驗證
- docs/adr/ADR-016-k8s-resource-naming.md: 契約文檔
- scripts/e2e_tool_call_verification.py: E2E 驗證腳本 v2.0

修改:
- webhooks.py: Phase 18.1.7 入口正規化
- openclaw.py: Phase 18.1.6 產生指令前驗證
- Skill 03 v1.4: 新增 K8s 資源驗證章節

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 11:22:47 +08:00
OG T
fe7fd7a3e0 feat(tests): ADR-018 LLM 測試策略三層架構
問題: LLM 測試因模型波動導致 CI 失敗

解決方案: 三層測試策略
- Tier 1 (CI): Schema 驗證 + Golden Responses
- Tier 2 (Nightly): 屬性測試 + Live LLM
- Tier 3 (Weekly): 語意相似度測試

新增檔案:
- ADR-018-llm-testing-strategy.md
- tests/llm_testing/ 框架
  - schema_validators.py: Pydantic Schema 驗證
  - property_validators.py: kubectl/風險等級驗證
  - golden_responses.py: 預錄回應管理
- tests/test_llm_tier1_schema.py: 35 個 Tier 1 測試

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 11:17:00 +08:00
OG T
8a163609bf docs(adr): 更新 ADR-006/009/015 狀態
ADR-015: 標記為「已實作」 (Phase 16 R1 完成)
ADR-009: 標記為「已實作」 (Phase 9.1-9.5 全部完成)
ADR-006: 新增智能路由整合章節 (Phase 13.3)

首席架構師 ADR 審計 P0/P1 完成

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 10:45:29 +08:00
OG T
0003098c55 docs(adr): ADR-017 LLMOps Observability 三層觀測架構
建立 Phase 15 LLMOps 觀測架構決策文件,記錄:
- 三層觀測架構 (Langfuse + SignOz + Sentry)
- Langfuse 整合與 Deep Linking 實作
- Redis Streams Trace Context 傳遞機制
- 取樣率策略與成本估算

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 10:13:12 +08:00
OG T
24e35fee1b docs(adr): ADR-016 智能路由 (Smart Routing)
新增 Intent + Complexity → Model Selection 架構決策文件,
作為 ADR-006 (AI Fallback) 的補充,實現動態模型選擇。

- IntentClassifier: 關鍵字優先 + LLM 備援
- ComplexityScorer: 規則引擎加權評分
- AIRouter: 整合路由決策

Phase 13.3 #85-87

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 10:13:05 +08:00
OG T
42659a271a docs(adr): ADR-014 Dependency Governance 依賴治理
建立前端依賴治理規範文件,.dependency-cruiser.cjs 已參照此 ADR。

內容包含:
- Layer Model 四層架構定義
- Feature Isolation 規則說明
- CI 整合配置 (pnpm dep-check)
- Severity 分級策略

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 10:12:43 +08:00
OG T
604e38cf07 docs: Phase 14 紅區治理 + Skills 01/03 更新
- CLAUDE.md: 紅區治理章節
- Skills 01/03: 版本更新
- ADR/Architecture: 標準化

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 09:55:47 +08:00
OG T
643946e60c refactor(api): ADR-015 MCP 模組化架構重構
## 重構內容

符合 leWOOOgo 積木化原則:
- 新增 interfaces.py: MCPToolProvider ABC 定義
- 新增 registry.py: Provider 註冊中心 (DI 模式)
- 新增 providers/: K8s, SignOz, Database 具體實作
- 重構 mcp_bridge.py: 透過 ProviderRegistry 委派執行

## 修復 Code Review 問題

- 🔴 移除 _execute_stdio logging 敏感 parameters
- 🔴 修復 conversational-view.tsx i18n 硬編碼

## 新增檔案

- apps/api/src/plugins/mcp/interfaces.py
- apps/api/src/plugins/mcp/registry.py
- apps/api/src/plugins/mcp/providers/__init__.py
- apps/api/src/plugins/mcp/providers/k8s_provider.py
- apps/api/src/plugins/mcp/providers/signoz_provider.py
- apps/api/src/plugins/mcp/providers/database_provider.py
- docs/adr/ADR-015-mcp-modular-architecture.md
- .dependency-cruiser.cjs (Phase 14.2 準備)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-25 14:31:32 +08:00
OG T
9bff46a1b0 feat: integrate Sentry + fix CI/CD issues
Sentry Integration (補強 SignOz):
- Add @sentry/nextjs for frontend error tracking + session replay
- Add sentry-sdk[fastapi] for backend error tracking
- Create sentry.client/server/edge.config.ts
- Integrate with next.config.js + instrumentation.ts
- Add Sentry exception capture in FastAPI error handler
- Create deployment scripts for Self-Hosted @ 192.168.0.110

CI/CD Fixes:
- Fix F821 Undefined name 'Field' in incidents.py
- Add NEXT_PUBLIC_API_URL env var to CI build step
- Add build-arg to Docker build verification

E2E Test Improvements:
- Fix strict mode violations in dashboard-acceptance tests
- Add timeout increase for Phase 4 demo tests
- Make tests more resilient to UI variations

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-24 15:19:52 +08:00
OG T
2b1264df05 docs: 完整治理架構 ADR-010/011/012 + CLAUDE.md 鐵律更新
2026-03-23 重大事故修復與治理:

1. ADR-010: Secrets 集中管理 (Bitwarden + Sealed Secrets)
2. ADR-011: NetworkPolicy 變更治理 (偵測 + 告警 + 人工決策)
3. ADR-012: 危險操作治理 (Tier 分級 + CI/CD 攔截 + 審計)
4. UX-001: 告警疲勞解決方案 (時間衰減 + 智慧分組)

CLAUDE.md 更新:
- 新增最高優先級鐵律 (禁止 ClawBot、OpenClaw 核心、禁止危險 API)
- 新增任務開始前必讀 Memory 對照表

事故教訓:
- Telegram Token 連續三次被 logOut 失效
- AWOOOI API 程式碼呼叫 logOut 導致災難
- 已停用 AWOOOI API Telegram,OpenClaw 為唯一 Gateway

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 19:44:56 +08:00
OG T
7478dc0254 feat(phase6-9): Complete modular architecture and Agent Teams
Phase 6.4 - Modular Architecture:
- Add lewooogo-brain adapters for LLM providers
- Add lewooogo-data dual memory (Redis + PostgreSQL)
- Implement consensus engine for multi-agent decisions
- Add incident memory service for historical context

Phase 9 - Agent Teams (Claude Agent SDK):
- Add base agent class with Claude Sonnet 4 integration
- Implement action planner, blast radius, and security agents
- Add agent API endpoints and proposal workflow
- Integrate ADR-009 OpenClaw Agent Teams architecture

DevOps & CI/CD:
- Add GitHub Actions CI/CD workflows (ci.yaml, cd.yaml)
- Add pre-commit hooks and secrets baseline
- Add docker-compose for local development
- Update Kubernetes network policies

Frontend Improvements:
- Add auto-healing error boundary component
- Update i18n messages for agent features
- Enhance dual-state incident card with execution feedback

Documentation:
- Add 7 ADRs covering MCP, design system, architecture decisions
- Update ARCHITECTURE_MEMORY.md with modular design
- Add GLOBAL_RULES.md and SOUL.md for project identity

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 18:40:36 +08:00
OG T
80d0ef4a8f feat(packages): Phase 6.4a-c leWOOOgo modular architecture
New packages:
- packages/lewooogo-brain: AI reasoning & decision engine
  - IProposalEngine interface (ABC)
  - IIncidentProcessor interface (ABC)
  - Pydantic models: Proposal, Guardrails, Incident, Signal

- packages/lewooogo-data: Memory provider abstraction
  - IMemoryProvider interface (ABC)
  - IDualMemoryProvider for Working + Episodic memory
  - Generic type support for flexible data models

Documentation:
- ADR-008: Python modular packages architecture decision
- ARCHITECTURE_MEMORY.md: Module map index for AI developers
- LOGBOOK.md: Updated milestones and Phase 6.4 status

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 09:32:07 +08:00
OG T
ccdf757edd chore: initial commit for AWOOOI project
Phase 0 Day 1 - Project initialization:
- Independent repository (Option A)
- .awoooi-agent-rules.md (AI development contract)
- Project skeleton (apps/web, apps/api, packages, docs)
- ADR template for architecture decisions
- LOGBOOK for progress tracking

Strategic decision: 2026-03-19 Operation Cyber-Shell
Reference: /wooo-aiops/docs/meetings/2026-03-19_FRONTEND_RESTRUCTURE_STRATEGY.md

Co-Authored-By: Claude Code <noreply@anthropic.com>
2026-03-19 19:16:12 +08:00