<div align="center">

```
 █████╗ ██╗    ██╗ ██████╗  ██████╗  ██████╗ ██╗
██╔══██╗██║    ██║██╔═══██╗██╔═══██╗██╔═══██╗██║
███████║██║ █╗ ██║██║   ██║██║   ██║██║   ██║██║
██╔══██║██║███╗██║██║   ██║██║   ██║██║   ██║██║
██║  ██║╚███╔███╔╝╚██████╔╝╚██████╔╝╚██████╔╝██║
╚═╝  ╚═╝ ╚══╝╚══╝  ╚═════╝  ╚═════╝  ╚═════╝ ╚═╝
```

### **Zero-Touch Ops. Human-Centric Decisions.**

*AI-Powered Intelligent Operations Platform*

[License: MIT](https://opensource.org/licenses/MIT)
[Python](https://www.python.org/downloads/)
[Next.js](https://nextjs.org/)
[TypeScript](https://www.typescriptlang.org/)

[Demo](#quick-start) · [Documentation](#lewooogo-engine-architecture) · [Contributing](#contributing)

</div>

---

## The Future of Operations is Here

> **When your system breaks at 3 AM, AWOOOI doesn't just alert you: it analyzes the blast radius, calculates how much money you're burning, and presents a one-click fix. You approve. It executes. You go back to sleep.**

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                                                                             │
│   ALERT: frontend 5xx rate > 15%                                            │
│                                                                             │
│   ┌─────────────┐     ┌─────────────┐     ┌─────────────┐                   │
│   │  GraphRAG   │ ──▶ │   Dry-Run   │ ──▶ │  Multi-Sig  │                   │
│   │  Analysis   │     │ Simulation  │     │  Approval   │                   │
│   └─────────────┘     └─────────────┘     └─────────────┘                   │
│          │                   │                   │                          │
│          ▼                   ▼                   ▼                          │
│   Root Cause:         Blast Radius:       [x] devops-alice                  │
│   postgres-db         1 pod, 0 data loss  [x] sre-bob                       │
│                                                                             │
│   Monthly Savings: $523.60 if fixed                                         │
│                                                                             │
│                            [ APPROVE & EXECUTE ]                            │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

**AWOOOI** (AI + WOOO Intelligent Operations) transforms reactive firefighting into proactive, AI-assisted decision-making while keeping humans firmly in control of critical actions.

---

## Enterprise Moats

Four pillars that make AWOOOI enterprise-ready from Day 1:

### Privacy Shield

> **Your PII never leaves your premises. Period.**

```python
# Before: Raw sensitive data
"User 192.168.1.100 with email admin@company.com triggered alert"

# After: Consistent pseudonymization
"User [IP_1] with email [EMAIL_1] triggered alert"
# Same value → Same label (AI maintains context without seeing real data)
```

- Regex-based detection: IP, Email, UUID, API Keys, JWT
- Consistent hashing: `[IP_1]` always maps to the same IP within a session
- **Rehydration Engine**: Labels restored only at MCP execution boundary
- Zero PII in logs, zero PII to cloud LLMs
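
The redact-then-rehydrate flow in the bullets above can be sketched as follows. This is a minimal illustration, not AWOOOI's actual implementation: the `PrivacyShield` class, its method names, and the two regexes are assumptions, and it uses a simple per-session lookup table with counters rather than any particular hashing scheme.

```python
import re

# Hypothetical patterns for illustration; real detectors would be stricter
# and would also cover UUIDs, API keys, and JWTs.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


class PrivacyShield:
    """Session-scoped pseudonymizer: the same value always gets the same label."""

    def __init__(self) -> None:
        self._label_for: dict[str, str] = {}   # real value -> label
        self._value_for: dict[str, str] = {}   # label -> real value
        self._counters: dict[str, int] = {}    # kind -> last index used

    def _label(self, kind: str, value: str) -> str:
        if value not in self._label_for:
            self._counters[kind] = self._counters.get(kind, 0) + 1
            label = f"[{kind}_{self._counters[kind]}]"
            self._label_for[value] = label
            self._value_for[label] = value
        return self._label_for[value]

    def redact(self, text: str) -> str:
        """Replace every detected value with its session label."""
        for kind, pattern in PATTERNS.items():
            text = pattern.sub(lambda m, k=kind: self._label(k, m.group()), text)
        return text

    def rehydrate(self, text: str) -> str:
        """Restore real values; meant to run only at the execution boundary."""
        for label, value in self._value_for.items():
            text = text.replace(label, value)
        return text


shield = PrivacyShield()
safe = shield.redact("User 192.168.1.100 with email admin@company.com triggered alert")
```

Only `safe` would be sent to the LLM or written to logs; `rehydrate` is called on the tool arguments just before execution.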

---

### GraphRAG: Topology-Aware Intelligence

> **AI that understands your microservices like a senior SRE.**

```
                  ┌─────────────────────────────────────┐
                  │        BLAST RADIUS ANALYSIS        │
                  │          (Upstream Impact)          │
                  └─────────────────────────────────────┘

                              ┌─────────────┐
                              │   ingress   │  ← Will be affected
                              └──────┬──────┘
                                     │ depends on
                                     ▼
                              ┌─────────────┐
                              │  frontend   │  ← Target service
                              └──────┬──────┘
                                     │ calls
                                     ▼
             ┌───────────────────────┼───────────────────────┐
             │                       │                       │
             ▼                       ▼                       ▼
      ┌──────────────┐        ┌──────────────┐        ┌──────────────┐
      │ auth-service │        │ product-api  │        │  order-api   │
      └──────┬───────┘        └──────┬───────┘        └──────┬───────┘
             │                       │                       │
             └───────────────────────┼───────────────────────┘
                                     ▼
                              ┌──────────────┐
                              │ postgres-db  │  X ROOT CAUSE
                              └──────────────┘
```

- **BFS-based traversal** with configurable `max_depth` (default: 3)
- **Dual-direction analysis**: Upstream (blast radius) + Downstream (root cause)
- **Priority ranking**: DATABASE > CACHE > QUEUE for root cause identification
- **Multiple root causes**: No single-point assumptions; collect ALL unhealthy dependencies
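
As a rough sketch of the traversal described above, the snippet below runs a depth-limited BFS in both directions over the topology from the diagram. The `CALLS` graph, the `KIND` map, and all function names are hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical topology matching the diagram: service -> services it calls.
CALLS = {
    "ingress": ["frontend"],
    "frontend": ["auth-service", "product-api", "order-api"],
    "auth-service": ["postgres-db"],
    "product-api": ["postgres-db"],
    "order-api": ["postgres-db"],
    "postgres-db": [],
}
KIND = {"postgres-db": "DATABASE"}            # assumed node types
PRIORITY = {"DATABASE": 0, "CACHE": 1, "QUEUE": 2}


def bfs(graph, start, max_depth=3):
    """Return nodes reachable from start within max_depth hops, in BFS order."""
    seen, order, queue = {start}, [], deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue                          # do not expand past the limit
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append((nxt, depth + 1))
    return order


def reverse(graph):
    """Invert edges so traversal walks callers instead of callees."""
    rev = {node: [] for node in graph}
    for src, dsts in graph.items():
        for dst in dsts:
            rev[dst].append(src)
    return rev


def blast_radius(target, max_depth=3):
    """Upstream services that depend on target, directly or transitively."""
    return bfs(reverse(CALLS), target, max_depth)


def root_causes(target, unhealthy, max_depth=3):
    """All unhealthy downstream dependencies, ranked DATABASE > CACHE > QUEUE."""
    hits = [n for n in bfs(CALLS, target, max_depth) if n in unhealthy]
    return sorted(hits, key=lambda n: PRIORITY.get(KIND.get(n), len(PRIORITY)))
```

Note that `root_causes` returns every unhealthy dependency it finds rather than stopping at the first one, and untyped services sort after the three ranked kinds.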

---

### Multi-Sig & Dry-Run: Defense in Depth

> **Every critical action is simulated, validated, and co-signed.**

```
┌────────────────────────────────────────────────────────────────┐
│                          RISK MATRIX                           │
├────────────┬─────────────┬─────────────────────────────────────┤
│ Risk Level │ Signatures  │ Required Roles                      │
├────────────┼─────────────┼─────────────────────────────────────┤
│ LOW        │ 0 (auto)    │ none                                │
│ MEDIUM     │ 1           │ admin, devops, sre                  │
│ HIGH       │ 2           │ admin, devops, sre                  │
│ CRITICAL   │ 2           │ CTO + CISO (mandatory)              │
└────────────┴─────────────┴─────────────────────────────────────┘
```
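
One way to encode the matrix above is a small policy table plus a check function. This is a sketch under assumptions: the `Policy` dataclass, the lowercase role strings, and `is_approved` are illustrative names, not the contents of `approval.py`.

```python
from dataclasses import dataclass

OPS_ROLES = frozenset({"admin", "devops", "sre"})


@dataclass(frozen=True)
class Policy:
    signatures: int                            # distinct signers needed
    allowed_roles: frozenset                   # roles whose signatures count
    mandatory_roles: frozenset = frozenset()   # roles that must all sign


RISK_MATRIX = {
    "LOW": Policy(0, frozenset()),
    "MEDIUM": Policy(1, OPS_ROLES),
    "HIGH": Policy(2, OPS_ROLES),
    "CRITICAL": Policy(2, frozenset({"cto", "ciso"}), frozenset({"cto", "ciso"})),
}


def is_approved(risk: str, signers: dict[str, str]) -> bool:
    """signers maps user name -> role; dict keys keep each signer distinct."""
    policy = RISK_MATRIX[risk]
    valid_roles = [role for role in signers.values() if role in policy.allowed_roles]
    return (
        len(valid_roles) >= policy.signatures
        and policy.mandatory_roles <= set(valid_roles)
    )
```

For example, `is_approved("HIGH", {"devops-alice": "devops", "sre-bob": "sre"})` passes, while two CTO signatures on a CRITICAL action do not, because the CISO is missing.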

**TOCTOU Protection** (Time-of-Check to Time-of-Use):
```
1. User clicks "Approve"
2. System re-runs Dry-Run immediately before execution
3. If state changed → Status = VOIDED (not cleared!)
4. Full audit trail preserved for compliance
```
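
The four steps above might look roughly like this in code. The approval record layout, the `fingerprint` helper, and the injected `run_dry_run` / `execute` callables are assumptions made for the sketch; comparing a hash of the dry-run result is one possible way to detect that state changed.

```python
import hashlib
import json
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    EXECUTED = "executed"
    VOIDED = "voided"


def fingerprint(dry_run_result: dict) -> str:
    """Stable hash of a dry-run result, so two runs can be compared."""
    payload = json.dumps(dry_run_result, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def approve_and_execute(approval: dict, run_dry_run, execute) -> Status:
    """Re-check state at time of use; void the approval if it drifted."""
    fresh = fingerprint(run_dry_run(approval["action"]))
    if fresh != approval["dry_run_fingerprint"]:
        approval["status"] = Status.VOIDED     # record is kept, never deleted
        approval["audit"].append("voided: state changed since dry-run")
        return Status.VOIDED
    execute(approval["action"])
    approval["status"] = Status.EXECUTED
    approval["audit"].append("executed")
    return Status.EXECUTED
```

Marking the record `VOIDED` instead of deleting it is what keeps the audit trail intact when an approval goes stale.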

**Dry-Run Checks**:
- RBAC permission validation
- Syntax & parameter validation
- Resource existence verification
- PodDisruptionBudget compliance
- Blast radius calculation

---

### Progressive Autonomy: Trust That Evolves

> **The more you approve, the less you need to.**

```
┌─────────────────────────────────────────────────────────────────┐
│                     TRUST SCORE PROGRESSION                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Score: 0 ──────────────────────────────────────────────▶ 10+   │
│         │                      │                      │         │
│         ▼                      ▼                      ▼         │
│    ┌─────────┐            ┌─────────┐            ┌─────────┐    │
│    │  HIGH   │    ──▶     │ MEDIUM  │    ──▶     │   LOW   │    │
│    │  2-sig  │    @10     │  1-sig  │    @5      │  auto   │    │
│    └─────────┘            └─────────┘            └─────────┘    │
│                                                                 │
│  ⚠️ CRITICAL operations NEVER auto-downgrade (enterprise law)   │
│                                                                 │
│  Single REJECT → Trust score resets to 0 (instant collapse)     │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

- **Approve** → +1 trust score
- **Reject** → Score resets to 0 (trust collapses instantly)
- Pattern-based: `restart_pod:nginx-*` builds trust separately from `delete_pvc:*`
- CRITICAL operations (DROP TABLE, DELETE NAMESPACE) → **Always require human dual-signature**
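
A minimal sketch of these rules follows. It assumes one reading of the diagram (HIGH drops to MEDIUM at a score of 10, MEDIUM drops to LOW at 5, one step at a time) and uses glob matching from the standard library for patterns; the `TrustEngine` class and its methods are hypothetical, not the API of `trust_engine.py`.

```python
from collections import defaultdict
from fnmatch import fnmatch

# Thresholds read off the diagram; treat them as assumptions.
DOWNGRADE = {"HIGH": ("MEDIUM", 10), "MEDIUM": ("LOW", 5)}


class TrustEngine:
    def __init__(self, patterns):
        self.patterns = list(patterns)          # e.g. "restart_pod:nginx-*"
        self.scores = defaultdict(int)          # pattern -> trust score

    def _pattern_for(self, action: str):
        """First registered pattern that matches the action, or None."""
        return next((p for p in self.patterns if fnmatch(action, p)), None)

    def record(self, action: str, approved: bool) -> None:
        pattern = self._pattern_for(action)
        if pattern is None:
            return
        # Approve adds 1; a single reject collapses the score to 0.
        self.scores[pattern] = self.scores[pattern] + 1 if approved else 0

    def effective_risk(self, action: str, base_risk: str) -> str:
        if base_risk not in DOWNGRADE:          # LOW stays LOW; CRITICAL never downgrades
            return base_risk
        lower, threshold = DOWNGRADE[base_risk]
        pattern = self._pattern_for(action)
        score = self.scores[pattern] if pattern else 0
        return lower if score >= threshold else base_risk
```

Because scores are keyed by pattern, approvals for `restart_pod:nginx-*` have no effect on how `delete_pvc:*` actions are treated.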

---

## leWOOOgo Engine Architecture

AWOOOI is built on the **leWOOOgo Engine**, a modular, plugin-based architecture inspired by LEGO blocks:

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                               leWOOOgo Engine                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐   │
│  │  INPUT   │   │  BRAIN   │   │  OUTPUT  │   │  ACTION  │   │   DATA   │   │
│  │ ──────── │   │ ──────── │   │ ──────── │   │ ──────── │   │ ──────── │   │
│  │ Webhooks │   │  Ollama  │   │  Slack   │   │   K8s    │   │ Postgres │   │
│  │  Kafka   │   │  OpenAI  │   │ Discord  │   │  Shell   │   │  Redis   │   │
│  │Prometheus│   │  Claude  │   │  Email   │   │   MCP    │   │    S3    │   │
│  └────┬─────┘   └────┬─────┘   └────┬─────┘   └────┬─────┘   └────┬─────┘   │
│       │              │              │              │              │         │
│       └──────────────┴──────────────┼──────────────┴──────────────┘         │
│                                     │                                       │
│                             ┌───────┴───────┐                               │
│                             │      UI       │                               │
│                             │ ───────────── │                               │
│                             │    Next.js    │                               │
│                             │ ApprovalCard  │                               │
│                             │ThinkingStream │                               │
│                             └───────────────┘                               │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Module Overview

| Module | Purpose | Key Components |
|--------|---------|----------------|
| **INPUT** | Event ingestion | Prometheus AlertManager, Kafka, Webhooks |
| **BRAIN** | AI reasoning | Ollama (local), OpenAI, Claude, GraphRAG |
| **OUTPUT** | Notifications | Slack, Discord, Email, Custom webhooks |
| **ACTION** | Execution | K8s API, Shell, MCP Bridge, Ansible |
| **DATA** | Persistence | PostgreSQL, Redis, S3, Vector DB |
| **UI** | Human interface | Next.js 14, ApprovalCard, ThinkingTerminal |
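
To make the plugin idea concrete, here is one possible shape for a module registry. Everything in it (`Registry`, the `register` decorator, the two output plugins) is hypothetical and only illustrates how components could snap into the six module slots from the table; it is not the leWOOOgo contract itself.

```python
from typing import Callable

MODULES = ("input", "brain", "output", "action", "data", "ui")


class Registry:
    """Maps (module, name) to a plugin factory."""

    def __init__(self) -> None:
        self._plugins: dict[tuple[str, str], Callable[[], object]] = {}

    def register(self, module: str, name: str):
        if module not in MODULES:
            raise ValueError(f"unknown module: {module}")

        def decorator(factory):
            self._plugins[(module, name)] = factory
            return factory

        return decorator

    def create(self, module: str, name: str):
        """Instantiate the plugin registered under (module, name)."""
        return self._plugins[(module, name)]()

    def names(self, module: str) -> list[str]:
        return sorted(n for m, n in self._plugins if m == module)


registry = Registry()


@registry.register("output", "slack")
class SlackOutput:
    def send(self, message: str) -> str:
        return f"slack: {message}"      # stand-in for a real API call


@registry.register("output", "email")
class EmailOutput:
    def send(self, message: str) -> str:
        return f"email: {message}"      # stand-in for a real SMTP call
```

Swapping Slack for email then becomes a configuration change (`registry.create("output", "email")`) rather than a code change in the caller.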

### MCP (Model Context Protocol) Support

```typescript
// MCP enables AI to safely interact with external tools
await mcpBridge.callTool("kubernetes", "restart_pod", {
  pod_name: "[POD_1]",     // Redacted in logs
  namespace: "production",
  graceful: true,
});
// Rehydration happens at execution boundary only
```

---

## FinOps: Day-1 ROI

> **Every wasted resource has a dollar sign. AWOOOI shows you exactly how much.**

```
┌─────────────────────────────────────────────────────────────────┐
│                      FINOPS COST ANALYSIS                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  MONTHLY WASTE DETECTED: $523.60                                │
│                                                                 │
│  ┌──────────────────┬──────────────────┬──────────────────┐     │
│  │ REALIZABLE       │ FREED            │ ANNUAL           │     │
│  │ $480.00/mo       │ $43.60/mo        │ $5,760/yr        │     │
│  │ ────────────     │ ────────────     │ ────────────     │     │
│  │ PVC deletion     │ Pod cleanup      │ if all fixed     │     │
│  │ Node resize      │ (needs scale)    │                  │     │
│  └──────────────────┴──────────────────┴──────────────────┘     │
│                                                                 │
│  TOP RECOMMENDATIONS:                                           │
│  ├─ Delete orphaned PVC 'data-postgres-backup'   -$40.00  LOW   │
│  ├─ Resize node 'worker-large-01'               -$340.00  HIGH  │
│  └─ Delete zombie Pod 'legacy-api-5d7b8'         -$76.00  MED   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

**Scan Types**:
- **Orphaned PVCs**: Storage not mounted by any Pod
- **Zombie Pods**: CPU < 1% for 7+ consecutive days
- **Over-provisioned Nodes**: High request, low actual usage

**Safety Buffer**: `wasted = requested - (actual × 1.2)` keeps 20% headroom above actual usage, so an aggressive recommendation does not shrink a workload into an OOM kill.
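
A small worked sketch of the safety-buffer formula and the zombie-Pod rule is shown below. The function names and the clamp to zero are assumptions added for illustration; only the `1.2` factor, the 1% CPU threshold, and the 7-day window come from the text above.

```python
SAFETY_FACTOR = 1.2   # 20% headroom on top of observed usage


def wasted(requested: float, actual: float, factor: float = SAFETY_FACTOR) -> float:
    """Reclaimable amount after reserving headroom.

    Clamping at zero is an assumption of this sketch: a workload that already
    uses more than requested / factor simply has nothing to reclaim.
    """
    return max(0.0, requested - actual * factor)


def is_zombie(daily_cpu_percent: list[float], threshold: float = 1.0, days: int = 7) -> bool:
    """True when the last `days` daily CPU readings are all below threshold."""
    recent = daily_cpu_percent[-days:]
    return len(recent) == days and all(cpu < threshold for cpu in recent)


# A node requesting 8 GiB while using 2 GiB: 8 - 2 * 1.2 leaves about 5.6 GiB reclaimable.
reclaimable_gib = wasted(requested=8.0, actual=2.0)
```

With fewer than seven days of readings, `is_zombie` returns `False`, which errs on the side of not flagging a recently deployed Pod.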

---

## Quick Start

### Prerequisites

- Python 3.11+
- Node.js 18+
- pnpm 8+
- Docker (optional, for local Ollama)

### Installation

```bash
# Clone the repository
git clone https://github.com/anthropics/awoooi.git
cd awoooi

# Install dependencies
pnpm install

# Setup Python environment
cd apps/api
python -m venv venv
source venv/bin/activate  # or `venv\Scripts\activate` on Windows
pip install -r requirements.txt
```

### Run Tracer Bullet 2.0 (E2E Demo)

Experience the full AWOOOI loop in 30 seconds:

```bash
cd apps/api
python scripts/tracer_bullet_2.py
```

**Expected Output**:
```
============================================================
TRACER BULLET 2.0 - FULL LOOP TEST
Test ID: tb2-20260319143052
============================================================

[x] [trigger_alert] PASS
[x] [graphrag_analysis] PASS
[x] [generate_approval] PASS
[x] [multisig_approval] PASS
[x] [mcp_execution] PASS

============================================================
TEST SUMMARY
============================================================
Total Steps: 5
Passed: 5
Failed: 0
Status: ALL PASSED
```

### Start Development Servers

```bash
# Terminal 1: API Server
cd apps/api
uvicorn src.main:app --reload --port 8000

# Terminal 2: Web Server
cd apps/web
pnpm dev
```

Open [http://localhost:3000](http://localhost:3000) to see the AWOOOI dashboard.

---

## Project Structure

```
awoooi/
├── apps/
│   ├── api/                          # FastAPI Backend
│   │   ├── src/
│   │   │   ├── services/             # Core services
│   │   │   │   ├── approval.py       # Multi-Sig engine
│   │   │   │   ├── dry_run.py        # Dry-Run engine
│   │   │   │   ├── trust_engine.py   # Progressive autonomy
│   │   │   │   └── graph_rag.py      # Topology analysis
│   │   │   └── plugins/
│   │   │       ├── security/         # Privacy Shield
│   │   │       ├── mcp/              # MCP Bridge
│   │   │       └── finops/           # Cost analyzer
│   │   └── scripts/
│   │       └── tracer_bullet_2.py    # E2E test
│   │
│   └── web/                          # Next.js Frontend
│       └── src/
│           ├── components/
│           │   └── agent/
│           │       ├── approval-card.tsx
│           │       └── thinking-terminal.tsx
│           └── stores/
│               └── agent.store.ts
│
├── packages/
│   └── lewooogo-core/                # Shared types & contracts
│
└── docs/
    └── adr/                          # Architecture Decision Records
```

---

## Roadmap

| Phase | Status | Description |
|-------|--------|-------------|
| Phase 0 | Complete | Contracts & Scaffolding |
| Phase 1 | Complete | Core Integration (Monorepo, SSE, Ollama) |
| Phase 2 | Complete | HITL (ApprovalCard, Dry-Run, Multi-Sig) |
| Phase 3 | Complete | Enterprise (Privacy Shield, GraphRAG, FinOps) |
| Phase 4 | In Progress | Production Hardening & GA Release |
| Phase 5 | Planned | Multi-cluster, Federation, SaaS |

---

## Contributing

We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.

```bash
# Run tests
pnpm test

# Run linting
pnpm lint

# Format code
pnpm format
```

---

## License

MIT License - see [LICENSE](LICENSE) for details.

---

<div align="center">

**Built with love by [岑洋國際行銷有限公司](https://wooo.tw)**

*Turning 3 AM pages into peaceful nights since 2026*

```
"The best incident is the one you never have to wake up for."
                                        - AWOOOI Philosophy
```

</div>