Commit Graph

29 Commits

Author SHA1 Message Date
OG T
5ea6c3fb91 feat: alert_operation_log 查詢 API + 前端頁面 (Sprint 5.2)
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
後端:
- 新增 list_recent() 分頁方法 (alert_operation_log_repository)
- 新增 /api/v1/alert-operation-logs GET + /stats 端點
- main.py 註冊 alert_operation_logs_v1.router

前端:
- /alert-operation-logs 頁面,18 種 event_type 顏色標記
- 分頁、event_type 篩選、incident_id 篩選
- 24h 統計卡片 (總數/護欄攔截/自動修復/已解決)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 10:57:40 +08:00
OG T
59e7879dfb feat(webhook): Task 5 — tests GitHub→Gitea (ADR-059)
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
- test_gitea_webhook.py: 10 tests, X-Gitea-* headers
- conftest.py: GITEA_WEBHOOK_SECRET / GITEA_ALLOWED_REPOS

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 14:45:32 +08:00
OG T
7a6fa6359e feat(api): Sentry init 加入統一 layer/component 標籤
對齊 Prometheus 告警標籤規範 (layer/component/team)
讓 Sentry 事件與 auto_repair 路由決策保持一致

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 11:10:40 +08:00
OG T
f4f454fd98 feat(api): 重開機後自動 warm-up Redis Working Memory from PostgreSQL
- main.py lifespan: 啟動時從 DB restore INVESTIGATING/MITIGATING incidents
- scripts/reboot-recovery: 188 + 110 自動化腳本 + systemd services
- scripts/reboot-recovery: aiops-network 自動建立 (ClawBot 依賴)
- docs/runbooks/REBOOT-RECOVERY-SOP.md: 完整改寫,含自動化腳本說明

Why: 重開機後 Redis 清空導致前端 incidents 顯示 0 筆(DB 完整保存)
統帥批准: 「所有數據必須被長久記錄下來」

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 00:39:20 +08:00
OG T
3455044457 feat(phase25): Nemotron 主動防禦三方向 P0+P1+P2 完整實作
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 38s
Type Sync Check / check-type-sync (push) Failing after 35s
P0 - DIAGNOSE Privacy-First Routing:
- ai_router.py: _local_fallback_chain [NEMOTRON→OLLAMA→REJECT]
- DIAGNOSE 意圖 override 改為 NEMOTRON (原 OLLAMA)
- DIAGNOSE fallback 使用 local-only 鏈,不觸碰雲端
- 全部失敗時 REJECT + Telegram 通知
- config.py: NEMOTRON_DIAGNOSE_TIMEOUT_SECONDS=30, OLLAMA_DIAGNOSE_TIMEOUT_SECONDS=60
- nemotron.py: 根據 context[task_type] 選擇 timeout

P1 - Knowledge Auto-Harvesting:
- models/knowledge.py: EntryType.AUTO_RUNBOOK + ANTI_PATTERN + symptoms_hash
- EntryStatus.PUBLISHED (ANTI_PATTERN 直接發布,無需審核)
- models/playbook.py: SymptomPattern.compute_hash() (16字元確定性 hash)
- services/runbook_generator.py: NemotronRunbookGenerator (v1.1)
  - generate_runbook() → AUTO_RUNBOOK (DRAFT) + Telegram 審核 card
  - generate_anti_pattern() → ANTI_PATTERN (PUBLISHED) + Telegram 通知
  - 使用 nvidia.chat() (正確介面),Nemotron 超時時 Minimal fallback
- knowledge_service.py: check_anti_pattern(symptoms_hash, days=7)
- db/models.py: symptoms_hash VARCHAR(16) + ix_knowledge_symptoms_hash
- repositories/knowledge_repository.py: create() 支援 symptoms_hash + status
- auto_repair_service.py: anti_pattern_gate 在 decide() + runbook hook 在 execute()
- migrations/phase8_symptoms_hash.sql: ALTER TABLE + partial index + PUBLISHED constraint

P2 - Config Drift Detection:
- models/drift.py: DriftItem/DriftReport/DriftLevel/DriftIntent/DriftStatus
- services/drift_detector.py: GitStateReader + K8sStateReader + DriftDetector
- services/drift_analyzer.py: 白名單過濾 + DriftLevel 分級
- services/drift_interpreter.py: NemotronDriftInterpreter(意圖分析,不生成修復指令)
- services/drift_remediator.py: rollback(kubectl apply) + adopt(git push gitea)
- api/v1/drift.py: POST /scan, GET /reports, POST /rollback, POST /adopt
- migrations/phase9_drift_reports.sql: drift_reports 表
- k8s/drift-cronjob.yaml: 每小時自動掃描 CronJob

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 12:35:05 +08:00
OG T
9cf9e851e7 fix(api): 修正 Nginx 反向代理 307 redirect http:// Location 問題
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
加入 ProxyHeadersMiddleware,讓 FastAPI 信任 X-Forwarded-Proto header。
解決知識庫頁面無法載入內容的問題 (HTTPS→HTTP mixed content block)。

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 12:48:36 +08:00
OG T
ce11fcdc3a feat(monitoring): 監控工具區塊 — Grafana/Prometheus/SigNoz/Gitea 狀態
- 新增 GET /api/v1/monitoring/status,asyncio.gather 並行探測四工具
- 前端 MonitoringTools 元件,60s 輪詢顯示狀態/版本/統計
- 新增 monitoringTools i18n key

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 00:27:47 +08:00
OG T
d8be78b135 feat(api): Knowledge Base Phase 1 後端四層架構
Some checks failed
CD Pipeline / build-and-deploy (push) Successful in 7m0s
E2E Health Check / e2e-health (push) Successful in 17s
Type Sync Check / check-type-sync (push) Failing after 30s
- models/knowledge.py: Pydantic Schema (EntryType/Source/Status/CRUD)
- db/models.py: KnowledgeEntryRecord ORM (PostgreSQL)
- repositories/interfaces.py: IKnowledgeRepository Protocol
- repositories/knowledge_repository.py: PostgreSQL CRUD 實作
- services/knowledge_service.py: 業務邏輯 (get_db_context 內部管理 session)
- api/v1/knowledge.py: REST Router (get_knowledge_service,無直接 DB 存取)
- main.py: 掛載 Knowledge Base Router
- k8s/jobs/migrate-knowledge-entries.yaml: DB Migration Job

API 端點: GET/POST / | GET/PATCH/DELETE /{id} | POST /{id}/approve
         GET /search | GET /categories

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 00:55:56 +08:00
OG T
d2f4708663 feat(cicd): #46c OTEL Tracing 遷移到 Gitea workflows
All checks were successful
E2E Health Check / e2e-health (push) Successful in 18s
- CD: awoooi-cd service (192.168.0.188:24318)
- E2E: awoooi-e2e service
- 環境變數: OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_SERVICE_NAME, OTEL_RESOURCE_ATTRIBUTES

原 GitHub workflows (cd7d63e) → Gitea workflows (ADR-039)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-31 11:39:42 +08:00
OG T
da6d6ed006 chore: trigger cd pipeline directly
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 28s
E2E Health Check / e2e-health (push) Successful in 17s
2026-03-29 22:38:59 +08:00
OG T
50c055b547 feat(api): Phase D-G P0 修正 - Learning Repository 積木化
新增:
- ILearningRepository Protocol (interfaces.py)
- LearningRepository (Redis 持久化層)
- Learning API 端點 (/api/v1/learning/*)
- LearningService.get_recommended_fix() 方法
- LearningService.get_learning_summary() 方法

修正:
- Service 不直接依賴 Redis Client (透過 Repository)
- 符合 leWOOOgo 積木化原則
- 首席架構師審查: 74/100 → 92/100

更新:
- ADR-030: 新增 Phase D-G P0 修正章節
- Skill 02: v1.9 → v2.0
- Runner 修復: 序列建構解決 _runner_file_commands 衝突

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 11:03:51 +08:00
OG T
59c9eff83a fix(api): 修復 10 個 Lint 錯誤 (imports 排序 + unused imports + set comprehension)
- F401: 移除未使用的 imports (TerminalSessionStatus, AutoApproveDecision, TerminalSession)
- I001: 修正 import blocks 排序
- C401: set(generator) → {set comprehension}

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-28 18:51:52 +08:00
OG T
d206460751 feat(security): Phase 20 CSRF 防護實作
Phase 19 首席架構師審查指出: 核鑰 UX 安全性缺 CSRF 防護

後端:
- 新增 src/core/csrf.py (Double Submit Cookie 模式)
- 新增 src/api/v1/csrf.py (GET /api/v1/csrf/token)
- 新增 src/models/csrf.py (CSRFTokenResponse)
- 修改 approvals.py sign/reject/bulk 端點加入 CSRFToken 驗證

前端:
- 新增 hooks/useCSRF.ts (React Hook)
- 修改 approval.store.ts 整合 CSRF Token 參數

安全特性:
- 256-bit Token (secrets.token_hex)
- 時序安全比較 (secrets.compare_digest)
- SameSite=Strict Cookie
- 1 小時 Token 有效期

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-28 18:31:58 +08:00
OG T
e5ded3b3f2 feat(phase19): OmniTerminal + GenUI + Hybrid SSE 架構實作 (Wave 0-2)
Phase 19 OmniTerminal MVP 完成:
- Wave 0: Backend (Hybrid SSE POST→GET 架構)
- Wave 1: Frontend (OmniTerminal 狀態機 + GenUI Registry)
- Wave 2: UI 組件 (8 個 GenUI 動態卡片)

ADR 文檔:
- ADR-031: OmniTerminal SSE 架構
- ADR-032: GenUI 動態渲染框架
- ADR-033: K3s HA 架構設計

GenUI 組件:
- GenUIRenderer, K8sPodStatusCard, SentryErrorCard
- MetricsSummaryCard, IncidentTimelineCard
- TraceWaterfallCard, ApprovalCard, NuclearKeyButton

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-28 00:17:26 +08:00
OG T
7456492482 fix(api): 註冊 Sentry Webhook Router (Phase 10.2.1)
- 新增 sentry_webhook_v1 import
- include_router 註冊 /api/v1/webhooks/sentry/* 路由
- 修復 Sentry Alert Rule → AWOOOI 連線

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-27 16:13:04 +08:00
OG T
30153496d1 fix(api): 修復全部 lint 錯誤 (ruff --fix)
- Import sorting (I001)
- Unused imports (F401)
- f-string without placeholders (F541)
- Loop variable unused (B007)
- zip() strict parameter (B905)
- Exception chaining (B904)
- collections.abc imports (UP035)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 16:06:20 +08:00
OG T
698687f092 feat(api): #7 Playbook 萃取功能 (Phase 7.1-7.4)
實作內容:
- models/playbook.py: Playbook 資料模型 + Request/Response
- repositories/playbook_repository.py: Redis 雙層儲存
- repositories/interfaces.py: IPlaybookRepository Protocol
- services/playbook_service.py: 業務邏輯 (萃取/推薦/核准)
- api/v1/playbooks.py: REST API 端點

API 端點:
- POST /playbooks/extract/{incident_id} - 從成功案例萃取
- POST /playbooks/recommend - 症狀匹配推薦
- POST /playbooks/{id}/approve - 人工核准
- GET/PATCH/DELETE /playbooks/{id} - CRUD

遵循 leWOOOgo 積木化: Router → Service → Repository

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 10:54:13 +08:00
OG T
b6cff31653 feat(api): Phase 15.3 Deep Linking 三系統互連
實現 Sentry ↔ SignOz ↔ Langfuse 零斷鏈觀測:

新增 deep_linking.py:
- SignOz Trace URL 生成器
- Langfuse Trace URL 生成器
- Sentry Issue URL 生成器
- get_all_links() 統一取得所有連結

整合點:
- main.py: Sentry before_send 注入 otel_trace_id + signoz_trace_url
- langfuse_client.py: 自動注入 OTEL trace_id 到 metadata
- openclaw.py: SignOz span 記錄 langfuse.trace_id 反向連結

架構圖:
┌─────────┐ trace_id ┌─────────┐ trace_id ┌──────────┐
│ Sentry  │◄────────►│ SignOz  │◄────────►│ Langfuse │
│ Errors  │          │ Traces  │          │ LLMOps   │
└─────────┘          └─────────┘          └──────────┘

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 00:48:28 +08:00
OG T
643946e60c refactor(api): ADR-015 MCP 模組化架構重構
## 重構內容

符合 leWOOOgo 積木化原則:
- 新增 interfaces.py: MCPToolProvider ABC 定義
- 新增 registry.py: Provider 註冊中心 (DI 模式)
- 新增 providers/: K8s, SignOz, Database 具體實作
- 重構 mcp_bridge.py: 透過 ProviderRegistry 委派執行

## 修復 Code Review 問題

- 🔴 移除 _execute_stdio logging 敏感 parameters
- 🔴 修復 conversational-view.tsx i18n 硬編碼

## 新增檔案

- apps/api/src/plugins/mcp/interfaces.py
- apps/api/src/plugins/mcp/registry.py
- apps/api/src/plugins/mcp/providers/__init__.py
- apps/api/src/plugins/mcp/providers/k8s_provider.py
- apps/api/src/plugins/mcp/providers/signoz_provider.py
- apps/api/src/plugins/mcp/providers/database_provider.py
- docs/adr/ADR-015-mcp-modular-architecture.md
- .dependency-cruiser.cjs (Phase 14.2 準備)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-25 14:31:32 +08:00
OG T
75c991dbee fix(api): Sort imports to pass ruff I001 check
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-24 16:02:51 +08:00
OG T
9bff46a1b0 feat: integrate Sentry + fix CI/CD issues
Sentry Integration (補強 SignOz):
- Add @sentry/nextjs for frontend error tracking + session replay
- Add sentry-sdk[fastapi] for backend error tracking
- Create sentry.client/server/edge.config.ts
- Integrate with next.config.js + instrumentation.ts
- Add Sentry exception capture in FastAPI error handler
- Create deployment scripts for Self-Hosted @ 192.168.0.110

CI/CD Fixes:
- Fix F821 Undefined name 'Field' in incidents.py
- Add NEXT_PUBLIC_API_URL env var to CI build step
- Add build-arg to Docker build verification

E2E Test Improvements:
- Fix strict mode violations in dashboard-acceptance tests
- Add timeout increase for Phase 4 demo tests
- Make tests more resilient to UI variations

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-24 15:19:52 +08:00
OG T
765ee39a90 feat(api): Phase 6.5 Statistics API + Y/n 按鈕修復
新增:
- /stats/incidents/summary - 事件總覽統計
- /stats/incidents/resolution - 解決時間 P50/P95
- /stats/ai-performance - AI 提案效能
- /stats/services/affected - 受影響服務排名

修復:
- Y/n 按鈕永久禁用問題 (decision.state=completed 但 incident 未解決)
- decision_manager.py: 只有當 incident 也已解決才返回已完成的 decision

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-24 09:50:03 +08:00
OG T
6f049877fc fix(lint): ruff auto-fix + lewooogo-core src 加入 git
- Python: ruff --fix 修復 280 個 lint 錯誤
- lewooogo-core: src/ 目錄未追蹤,導致 CI eslint 失敗

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 23:51:37 +08:00
OG T
7d8eb26ebe feat(telegram): 新增心跳監控防止沉默盲點
功能:
- send_heartbeat(): 每 30 分鐘發送系統狀態
- start_heartbeat_monitor(): 背景心跳監控
- 沉默告警: 超過 2 小時沒訊息自動告警

目的:
- 避免 Telegram 長時間沒訊息被當成「系統穩定」
- 主動驗證告警鏈路是否正常運作

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 23:26:08 +08:00
OG T
e23493741a fix(telegram): respect C-Suite decision - OpenClaw is sole brain
架構修正 2026-03-23 (遵循 C-Suite 決議):
- 鐵律: .188 為唯一大腦,禁止腦分裂
- OpenClaw (192.168.0.188) = 唯一 Telegram Gateway
- AWOOOI API (K8s) = Web API + Sensor,不做 Polling
- TELEGRAM_ENABLE_POLLING 預設 False

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 19:25:08 +08:00
OG T
7478dc0254 feat(phase6-9): Complete modular architecture and Agent Teams
Phase 6.4 - Modular Architecture:
- Add lewooogo-brain adapters for LLM providers
- Add lewooogo-data dual memory (Redis + PostgreSQL)
- Implement consensus engine for multi-agent decisions
- Add incident memory service for historical context

Phase 9 - Agent Teams (Claude Agent SDK):
- Add base agent class with Claude Sonnet 4 integration
- Implement action planner, blast radius, and security agents
- Add agent API endpoints and proposal workflow
- Integrate ADR-009 OpenClaw Agent Teams architecture

DevOps & CI/CD:
- Add GitHub Actions CI/CD workflows (ci.yaml, cd.yaml)
- Add pre-commit hooks and secrets baseline
- Add docker-compose for local development
- Update Kubernetes network policies

Frontend Improvements:
- Add auto-healing error boundary component
- Update i18n messages for agent features
- Enhance dual-state incident card with execution feedback

Documentation:
- Add 7 ADRs covering MCP, design system, architecture decisions
- Update ARCHITECTURE_MEMORY.md with modular design
- Add GLOBAL_RULES.md and SOUL.md for project identity

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 18:40:36 +08:00
OG T
cb5d0ecfe4 feat(phase-6.4g-6.5b): API Synaptic Integration + Dual-State WarRoom UI
Phase 6.4g (API 突觸對接):
- lewooogo-brain dependency binding in apps/api/pyproject.toml
- POST /api/v1/incidents/{id}/propose route (proposals.py)
- Guardrails integration (8/8 tests passed)

Phase 6.5a (視覺皮層建置):
- DualStateIncidentCard.tsx with Nothing.tech visual compliance
- Ping radar animation for alert state
- Tier-based decision layer UI (AI 執行中 / 等待親核)

Phase 6.5b (神經網路串接):
- Main warroom page integration (page.tsx)
- IncidentResponse → DualState mapper function
- Empty state: "系統穩定。0 活躍異常。"

Tests:
- test_guardrails.py (8/8)
- test_incident_engine.py (6/6)
- test_skill_loader.py (6/6)
- Frontend build: 0 errors

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 11:58:28 +08:00
OG T
1576f2ab20 fix(db): eliminate SQLite brain-split, force PostgreSQL
Root cause: Worker used SQLITE_DATABASE_URL causing "no such table: incidents"
because each Pod had isolated SQLite file, not shared PostgreSQL.

Fixes:
- db/base.py: Use DATABASE_URL (PostgreSQL) instead of SQLITE_DATABASE_URL
- Added SQLite prohibition guard with logging
- Added pool_size and pool_pre_ping for production stability

New: packages/lewooogo-data PgMemoryProvider (Phase 6.4d)
- Episodic Memory implementation for PostgreSQL
- init_pg_engine() with auto table creation
- SQLite forbidden by Commander's decree

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 10:02:43 +08:00
OG T
196d269b92 feat: add all application source code
- apps/api: FastAPI backend with Dockerfile
- apps/web: Next.js frontend with Dockerfile
- apps/sensor: Signal collection agent
- packages: shared packages

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-22 18:57:44 +08:00