Complete the 3rd and 4th services of ADR-090 Phase 7, unlocking 2 zero-writer tables:
B3. apps/api/src/jobs/capacity_scanner_job.py (~300 lines)
- Pulls Prometheus node_exporter metrics daily at 02:00 Taipei time
- Writes host_capacity_snapshot (load1/5/15, cpu, iowait, mem, swap)
- Heuristic ai_verdict: cpu > 80 or mem > 85 → critical; cpu > 60 or mem > 70 → warning
- Hard threshold exceeded → writes capacity_violation_event
- Writes aol(capacity_recommendation)
B4. apps/api/src/jobs/compliance_scanner_job.py (~270 lines)
- Iterates over the active assets in asset_inventory daily at 03:00 Taipei time
- Writes a 7-dimension compliance snapshot for each asset
- secret_rotated: real check (metadata.creationTimestamp older than 90d = warning)
- The other 6 dimensions (ssl_cert_valid / cve_scan / backup_tested / audit_log_enabled / access_reviewed / encryption_at_rest) are placeholders set to 'unknown' with a TODO in detail; a later agent will fill in the logic
- Writes an aol(coverage_recalculated) summary
main.py lifespan wires up the 2 new loops in the same change
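The two real checks described for B3 and B4 can be sketched as below. This is a minimal illustration and not the code in capacity_scanner_job.py or compliance_scanner_job.py: `HostSample`, `heuristic_verdict`, `secret_rotated_status`, and the `"ok"` fallback label are hypothetical names, and treating the thresholds as percentages is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Thresholds from the B3 description; reading them as percentages is an assumption.
CRITICAL_CPU, CRITICAL_MEM = 80.0, 85.0
WARNING_CPU, WARNING_MEM = 60.0, 70.0
MAX_SECRET_AGE = timedelta(days=90)  # from the B4 secret_rotated rule


@dataclass(frozen=True)
class HostSample:
    """Hypothetical container for one host's daily readings."""
    host: str
    cpu: float
    mem: float


def heuristic_verdict(sample: HostSample) -> str:
    """Critical thresholds are checked first so they win over warning."""
    if sample.cpu > CRITICAL_CPU or sample.mem > CRITICAL_MEM:
        return "critical"
    if sample.cpu > WARNING_CPU or sample.mem > WARNING_MEM:
        return "warning"
    return "ok"  # assumed label for the healthy case


def secret_rotated_status(creation_timestamp: datetime, now: datetime) -> str:
    """Flag a secret whose creationTimestamp is more than 90 days old."""
    return "warning" if now - creation_timestamp > MAX_SECRET_AGE else "ok"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(heuristic_verdict(HostSample("node-1", cpu=72.0, mem=40.0)))
    print(secret_rotated_status(now - timedelta(days=120), now))
```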
Expected unlocks (together with B1 asset_scanner + B2 rule_catalog_sync):
- asset_inventory: 0 → several hundred (B1)
- asset_discovery_run: 0 → 1 per hour (B1)
- asset_coverage_snapshot: 0 → assets × 7 dimensions (B1)
- alert_rule_catalog: 0 → ~68 rules (B2)
- host_capacity_snapshot: 0 → daily, per host (B3)
- capacity_violation_event: 0 → on threshold breach (B3)
- asset_compliance_snapshot: 0 → assets × 7 dimensions (B4)
automation_operation_log gains 5 new op_types: asset_discovered / rule_created /
rule_updated / capacity_recommendation / coverage_recalculated
All 8 zero-writer tables now have a writer; the ADR-090 Phase 7 implementation is complete.
Refs: ADR-090 §4.2 Phase 4, MASTER §3.5 D5 (capacity-aware)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
█████╗ ██╗ ██╗ ██████╗ ██████╗ ██████╗ ██╗
██╔══██╗██║ ██║██╔═══██╗██╔═══██╗██╔═══██╗██║
███████║██║ █╗ ██║██║ ██║██║ ██║██║ ██║██║
██╔══██║██║███╗██║██║ ██║██║ ██║██║ ██║██║
██║ ██║╚███╔███╔╝╚██████╔╝╚██████╔╝╚██████╔╝██║
╚═╝ ╚═╝ ╚══╝╚══╝ ╚═════╝ ╚═════╝ ╚═════╝ ╚═╝
Zero-Touch Ops. Human-Centric Decisions.
AI-Powered Intelligent Operations Platform
The Future of Operations is Here
When your system breaks at 3 AM, AWOOOI doesn't just alert you—it analyzes the blast radius, calculates how much money you're burning, and presents a one-click fix. You approve. It executes. You go back to sleep.
┌─────────────────────────────────────────────────────────────────────────────┐
│ │
│ ALERT: frontend 5xx rate > 15% │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ GraphRAG │ ──▶ │ Dry-Run │ ──▶ │ Multi-Sig │ │
│ │ Analysis │ │ Simulation │ │ Approval │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ Root Cause: Blast Radius: [x] devops-alice │
│ postgres-db 1 pod, 0 data loss [x] sre-bob │
│ │
│ Monthly Savings: $523.60 if fixed │
│ │
│ [ APPROVE & EXECUTE ] │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
AWOOOI (AI + WOOO Intelligent Operations) transforms reactive firefighting into proactive, AI-assisted decision-making—while keeping humans firmly in control of critical actions.
Enterprise Moats
Four pillars that make AWOOOI enterprise-ready from Day 1:
Privacy Shield
Your PII never leaves your premises. Period.
# Before: Raw sensitive data
"User 192.168.1.100 with email admin@company.com triggered alert"
# After: Consistent pseudonymization
"User [IP_1] with email [EMAIL_1] triggered alert"
# Same value → Same label (AI maintains context without seeing real data)
- Regex-based detection: IP, Email, UUID, API Keys, JWT
- Consistent hashing: `[IP_1]` always maps to the same IP within a session
- Rehydration Engine: Labels restored only at MCP execution boundary
- Zero PII in logs, zero PII to cloud LLMs
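The labeling and rehydration behavior listed above can be illustrated with a small sketch. It is not AWOOOI's Privacy Shield implementation: the `Pseudonymizer` class and its two regexes are hypothetical, only IP and email are covered, and a per-session lookup table stands in for the hashing step.

```python
import re

# Illustrative patterns only; UUIDs, API keys and JWTs are left out of this sketch.
PATTERNS = {
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


class Pseudonymizer:
    """Hypothetical per-session redactor: same value in, same label out."""

    def __init__(self) -> None:
        self._label_by_value: dict[tuple[str, str], str] = {}
        self._value_by_label: dict[str, str] = {}
        self._counters: dict[str, int] = {}

    def _label_for(self, kind: str, value: str) -> str:
        key = (kind, value)
        if key not in self._label_by_value:
            # First sighting of this value: allocate the next label of its kind.
            self._counters[kind] = self._counters.get(kind, 0) + 1
            label = f"[{kind}_{self._counters[kind]}]"
            self._label_by_value[key] = label
            self._value_by_label[label] = value
        return self._label_by_value[key]

    def redact(self, text: str) -> str:
        """Replace every detected value with its session-stable label."""
        for kind, pattern in PATTERNS.items():
            text = pattern.sub(lambda m, k=kind: self._label_for(k, m.group(0)), text)
        return text

    def rehydrate(self, text: str) -> str:
        """Restore real values; meant to be called only at the execution boundary."""
        for label, value in self._value_by_label.items():
            text = text.replace(label, value)
        return text


if __name__ == "__main__":
    shield = Pseudonymizer()
    raw = "User 192.168.1.100 with email admin@company.com triggered alert"
    print(shield.redact(raw))
```

Because the mapping lives on the instance, one `Pseudonymizer` per session gives the "same value, same label" property, and dropping the instance discards the mapping.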
GraphRAG: Topology-Aware Intelligence
AI that understands your microservices like a senior SRE.
┌─────────────────────────────────────┐
│ BLAST RADIUS ANALYSIS │
│ (Upstream Impact) │
└─────────────────────────────────────┘
┌─────────────┐
│ ingress │ ← Will be affected
└──────┬──────┘
│ depends on
▼
┌─────────────┐
│ frontend │ ← Target service
└──────┬──────┘
│ calls
▼
┌───────────────────────┼───────────────────────┐
│ │ │
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ auth-service │ │ product-api │ │ order-api │
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
│ │ │
└─────────────────────┼─────────────────────┘
▼
┌──────────────┐
│ postgres-db │ X ROOT CAUSE
└──────────────┘
- BFS-based traversal with configurable `max_depth` (default: 3)
- Dual-direction analysis: Upstream (blast radius) + Downstream (root cause)
- Priority ranking: DATABASE > CACHE > QUEUE for root cause identification
- Multiple root causes: No single-point assumptions—collect ALL unhealthy dependencies
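A depth-limited BFS over the example topology shows how the two directions differ. This is a sketch under assumptions, not the code in graph_rag.py: the `DEPENDS_ON` edges are read off the diagram above, while `KIND`, `bfs`, `blast_radius`, and `root_causes` are hypothetical names.

```python
from collections import deque

# Edges point from a service to the services it depends on (read off the diagram).
DEPENDS_ON = {
    "ingress": ["frontend"],
    "frontend": ["auth-service", "product-api", "order-api"],
    "auth-service": ["postgres-db"],
    "product-api": ["postgres-db"],
    "order-api": ["postgres-db"],
    "postgres-db": [],
}
KIND = {"postgres-db": "DATABASE"}  # hypothetical node typing
PRIORITY = {"DATABASE": 0, "CACHE": 1, "QUEUE": 2}


def bfs(start, edges, max_depth=3):
    """Return {node: depth} for nodes within max_depth hops of start."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_depth:
            continue  # do not expand past the depth limit
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    del seen[start]
    return seen


def invert(edges):
    """Flip edge direction so we can walk from a service to its callers."""
    rev = {node: [] for node in edges}
    for src, dsts in edges.items():
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev


def blast_radius(target, max_depth=3):
    """Upstream walk: everything that depends on the target."""
    return sorted(bfs(target, invert(DEPENDS_ON), max_depth))


def root_causes(target, unhealthy, max_depth=3):
    """Downstream walk: collect ALL unhealthy dependencies, ranked by kind."""
    reach = bfs(target, DEPENDS_ON, max_depth)
    hits = [n for n in reach if n in unhealthy]
    rank = lambda n: (PRIORITY.get(KIND.get(n), len(PRIORITY)), reach[n], n)
    return sorted(hits, key=rank)


if __name__ == "__main__":
    print(blast_radius("frontend"))
    print(root_causes("frontend", {"postgres-db", "order-api"}))
```

Untyped nodes sort after DATABASE, CACHE, and QUEUE, so an unhealthy database is listed first even when a closer service is also unhealthy.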
Multi-Sig & Dry-Run: Defense in Depth
Every critical action is simulated, validated, and co-signed.
┌────────────────────────────────────────────────────────────────┐
│ RISK MATRIX │
├────────────┬─────────────┬─────────────────────────────────────┤
│ Risk Level │ Signatures │ Required Roles │
├────────────┼─────────────┼─────────────────────────────────────┤
│ LOW │ 0 (auto) │ — │
│ MEDIUM │ 1 │ admin, devops, sre │
│ HIGH │ 2 │ admin, devops, sre │
│ CRITICAL │ 2 │ CTO + CISO (mandatory) │
└────────────┴─────────────┴─────────────────────────────────────┘
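The matrix can be read as a lookup from risk level to a signing policy. The sketch below is one possible encoding, assuming signers are passed as a user-to-role mapping; `Policy`, `RISK_MATRIX`, and `is_approved` are hypothetical names rather than the API of approval.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    signatures: int                          # minimum number of valid signers
    allowed_roles: frozenset                 # roles whose signature counts
    mandatory_roles: frozenset = frozenset() # roles that must all be present


OPS_ROLES = frozenset({"admin", "devops", "sre"})
EXEC_ROLES = frozenset({"CTO", "CISO"})

RISK_MATRIX = {
    "LOW": Policy(0, frozenset()),
    "MEDIUM": Policy(1, OPS_ROLES),
    "HIGH": Policy(2, OPS_ROLES),
    "CRITICAL": Policy(2, EXEC_ROLES, mandatory_roles=EXEC_ROLES),
}


def is_approved(risk: str, signers: dict) -> bool:
    """signers maps user name -> role, so each user can only sign once."""
    policy = RISK_MATRIX[risk]
    valid_roles = [role for role in signers.values() if role in policy.allowed_roles]
    if len(valid_roles) < policy.signatures:
        return False
    return policy.mandatory_roles <= set(valid_roles)


if __name__ == "__main__":
    print(is_approved("HIGH", {"devops-alice": "devops", "sre-bob": "sre"}))
    print(is_approved("CRITICAL", {"carol": "CTO", "dave": "CTO"}))
```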
TOCTOU Protection (Time-of-Check to Time-of-Use):
1. User clicks "Approve"
2. System re-runs Dry-Run immediately before execution
3. If state changed → Status = VOIDED (not cleared!)
4. Full audit trail preserved for compliance
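The four steps above amount to comparing a fresh Dry-Run result against the one the approver saw. A minimal sketch, assuming the Dry-Run result is a comparable dict; `Approval`, `approve_and_execute`, and the `PENDING`/`EXECUTED` status names are hypothetical, and only `VOIDED` comes from the text.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Approval:
    action: str
    dry_run_snapshot: dict            # what the approver saw at check time
    status: str = "PENDING"
    audit: list = field(default_factory=list)


def approve_and_execute(
    approval: Approval,
    dry_run: Callable[[str], dict],
    execute: Callable[[str], None],
) -> str:
    approval.audit.append("approved")
    # Time-of-use: re-run the simulation right before acting.
    fresh = dry_run(approval.action)
    if fresh != approval.dry_run_snapshot:
        # State drifted since time-of-check: void the record, keep the trail.
        approval.status = "VOIDED"
        approval.audit.append("voided: state changed since check")
        return approval.status
    execute(approval.action)
    approval.status = "EXECUTED"
    approval.audit.append("executed")
    return approval.status


if __name__ == "__main__":
    request = Approval("restart_pod", {"pods_affected": 1})
    result = approve_and_execute(request, lambda _: {"pods_affected": 3}, print)
    print(result, request.audit)
```

Marking the record VOIDED instead of deleting it is what keeps the audit trail intact for compliance review.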
Dry-Run Checks:
- RBAC Permission validation
- Syntax & parameter validation
- Resource existence verification
- PodDisruptionBudget compliance
- Blast radius calculation
Progressive Autonomy: Trust That Evolves
The more you approve, the less you need to.
┌─────────────────────────────────────────────────────────────────┐
│ TRUST SCORE PROGRESSION │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Score: 0 ──────────────────────────────────────────────▶ 10+ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ HIGH │ ──▶ │ MEDIUM │ ──▶ │ LOW │ │
│ │ 2-sig │ @10 │ 1-sig │ @5 │ auto │ │
│ └─────────┘ └─────────┘ └─────────┘ │
│ │
│ ⚠️ CRITICAL operations NEVER auto-downgrade (enterprise law) │
│ │
│ Single REJECT → Trust score resets to 0 (instant collapse) │
│ │
└─────────────────────────────────────────────────────────────────┘
- Approve → +1 trust score
- Reject → Score resets to 0 (trust collapses instantly)
- Pattern-based: `restart_pod:nginx-*` builds trust separately from `delete_pvc:*`
- CRITICAL operations (DROP TABLE, DELETE NAMESPACE) → Always require human dual-signature
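The scoring rules can be sketched as a small per-pattern counter. The threshold reading is an assumption taken from the diagram (a HIGH action drops to MEDIUM at score 10, a MEDIUM action drops to LOW at score 5, one step at most), and `TrustEngine`, `record`, and `effective_risk` are hypothetical names, not the interface of trust_engine.py.

```python
from fnmatch import fnmatchcase

# base risk -> (downgraded risk, score needed); CRITICAL and LOW are absent on purpose.
DOWNGRADE = {"HIGH": ("MEDIUM", 10), "MEDIUM": ("LOW", 5)}


class TrustEngine:
    """Hypothetical tracker: one trust score per action pattern."""

    def __init__(self) -> None:
        self._scores: dict[str, int] = {}

    def record(self, pattern: str, approved: bool) -> int:
        """Approve adds 1; a single reject resets the pattern's score to 0."""
        current = self._scores.get(pattern, 0)
        self._scores[pattern] = (current + 1) if approved else 0
        return self._scores[pattern]

    def effective_risk(self, action: str, base_risk: str) -> str:
        """Downgrade by one level once a matching pattern has earned enough trust."""
        if base_risk not in DOWNGRADE:
            return base_risk  # CRITICAL never downgrades; LOW is already auto
        score = max(
            (s for p, s in self._scores.items() if fnmatchcase(action, p)),
            default=0,
        )
        target, threshold = DOWNGRADE[base_risk]
        return target if score >= threshold else base_risk


if __name__ == "__main__":
    engine = TrustEngine()
    for _ in range(5):
        engine.record("restart_pod:nginx-*", approved=True)
    print(engine.effective_risk("restart_pod:nginx-7f9c", "MEDIUM"))
    print(engine.effective_risk("delete_pvc:data-01", "MEDIUM"))
```

Keying scores by pattern is what keeps trust earned on `restart_pod:nginx-*` from leaking into unrelated actions such as `delete_pvc:*`.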
leWOOOgo Engine Architecture
AWOOOI is built on the leWOOOgo Engine—a modular, plugin-based architecture inspired by LEGO blocks:
┌─────────────────────────────────────────────────────────────────────────────┐
│ leWOOOgo Engine │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ INPUT │ │ BRAIN │ │ OUTPUT │ │ ACTION │ │ DATA │ │
│ │ ─────── │ │ ─────── │ │ ─────── │ │ ─────── │ │ ─────── │ │
│ │Webhooks │ │ Ollama │ │ Slack │ │ K8s │ │ Postgres│ │
│ │ Kafka │ │ OpenAI │ │ Discord │ │ Shell │ │ Redis │ │
│ │Prometheus│ │ Claude │ │ Email │ │ MCP │ │ S3 │ │
│ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │
│ │ │ │ │ │ │
│ └─────────────┴─────────────┴─────────────┴─────────────┘ │
│ │ │
│ ┌───────┴───────┐ │
│ │ UI │ │
│ │ ───────────── │ │
│ │ Next.js │ │
│ │ ApprovalCard │ │
│ │ThinkingStream │ │
│ └───────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Module Overview
| Module | Purpose | Key Components |
|---|---|---|
| INPUT | Event ingestion | Prometheus AlertManager, Kafka, Webhooks |
| BRAIN | AI reasoning | Ollama (local), OpenAI, Claude, GraphRAG |
| OUTPUT | Notifications | Slack, Discord, Email, Custom webhooks |
| ACTION | Execution | K8s API, Shell, MCP Bridge, Ansible |
| DATA | Persistence | PostgreSQL, Redis, S3, Vector DB |
| UI | Human interface | Next.js 14, ApprovalCard, ThinkingTerminal |
MCP (Model Context Protocol) Support
// MCP enables AI to safely interact with external tools
await mcpBridge.callTool("kubernetes", "restart_pod", {
pod_name: "[POD_1]", // Redacted in logs
namespace: "production",
graceful: true,
});
// Rehydration happens at execution boundary only
FinOps: Day-1 ROI
Every wasted resource has a dollar sign. AWOOOI shows you exactly how much.
┌─────────────────────────────────────────────────────────────────┐
│ FINOPS COST ANALYSIS │
├─────────────────────────────────────────────────────────────────┤
│ │
│ MONTHLY WASTE DETECTED: $523.60 │
│ │
│ ┌──────────────────┬──────────────────┬──────────────────┐ │
│ │ REALIZABLE │ FREED │ ANNUAL │ │
│ │ $480.00/mo │ $43.60/mo │ $5,760/yr │ │
│ │ ──────────── │ ──────────── │ ──────────── │ │
│ │ PVC deletion │ Pod cleanup │ if all fixed │ │
│ │ Node resize │ (needs scale) │ │ │
│ └──────────────────┴──────────────────┴──────────────────┘ │
│ │
│ TOP RECOMMENDATIONS: │
│ ├─ Delete orphaned PVC 'data-postgres-backup' -$40.00 LOW │
│ ├─ Resize node 'worker-large-01' -$340.00 HIGH│
│ └─ Delete zombie Pod 'legacy-api-5d7b8' -$76.00 MED │
│ │
└─────────────────────────────────────────────────────────────────┘
Scan Types:
- Orphaned PVCs: Storage not mounted by any Pod
- Zombie Pods: CPU < 1% for 7+ consecutive days
- Over-provisioned Nodes: High request, low actual usage
Safety Buffer: wasted = requested - (actual × 1.2). The 20% headroom over actual usage keeps aggressive recommendations from causing OOM kills.
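As a worked instance of the safety-buffer formula, the sketch below computes reclaimable capacity for a single resource. The function name `wasted` and the clamp to zero (so under-provisioned workloads report no waste) are assumptions added for illustration.

```python
SAFETY_FACTOR = 1.2  # 20% headroom over observed usage, from the formula above


def wasted(requested: float, actual: float, factor: float = SAFETY_FACTOR) -> float:
    """Reclaimable amount after reserving headroom; never negative (assumed clamp)."""
    return max(0.0, requested - actual * factor)


if __name__ == "__main__":
    # A pod requesting 4 CPU cores while using 1 core keeps 1.2 cores reserved.
    print(round(wasted(requested=4.0, actual=1.0), 2))
    # Usage close to the request leaves nothing to reclaim.
    print(wasted(requested=2.0, actual=1.9))
```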
Quick Start
Prerequisites
- Python 3.11+
- Node.js 18+
- pnpm 8+
- Docker (optional, for local Ollama)
Installation
# Clone the repository
git clone https://github.com/anthropics/awoooi.git
cd awoooi
# Install dependencies
pnpm install
# Setup Python environment
cd apps/api
python -m venv venv
source venv/bin/activate # or `venv\Scripts\activate` on Windows
pip install -r requirements.txt
Run Tracer Bullet 2.0 (E2E Demo)
Experience the full AWOOOI loop in 30 seconds:
cd apps/api
python scripts/tracer_bullet_2.py
Expected Output:
============================================================
TRACER BULLET 2.0 - FULL LOOP TEST
Test ID: tb2-20260319143052
============================================================
[x] [trigger_alert] PASS
[x] [graphrag_analysis] PASS
[x] [generate_approval] PASS
[x] [multisig_approval] PASS
[x] [mcp_execution] PASS
============================================================
TEST SUMMARY
============================================================
Total Steps: 5
Passed: 5
Failed: 0
Status: ALL PASSED
Start Development Servers
# Terminal 1: API Server
cd apps/api
uvicorn src.main:app --reload --port 8000
# Terminal 2: Web Server
cd apps/web
pnpm dev
Open http://localhost:3000 to see the AWOOOI dashboard.
Project Structure
awoooi/
├── apps/
│ ├── api/ # FastAPI Backend
│ │ ├── src/
│ │ │ ├── services/ # Core services
│ │ │ │ ├── approval.py # Multi-Sig engine
│ │ │ │ ├── dry_run.py # Dry-Run engine
│ │ │ │ ├── trust_engine.py # Progressive autonomy
│ │ │ │ └── graph_rag.py # Topology analysis
│ │ │ └── plugins/
│ │ │ ├── security/ # Privacy Shield
│ │ │ ├── mcp/ # MCP Bridge
│ │ │ └── finops/ # Cost analyzer
│ │ └── scripts/
│ │ └── tracer_bullet_2.py # E2E test
│ │
│ └── web/ # Next.js Frontend
│ └── src/
│ ├── components/
│ │ └── agent/
│ │ ├── approval-card.tsx
│ │ └── thinking-terminal.tsx
│ └── stores/
│ └── agent.store.ts
│
├── packages/
│ └── lewooogo-core/ # Shared types & contracts
│
└── docs/
└── adr/ # Architecture Decision Records
Roadmap
| Phase | Status | Description |
|---|---|---|
| Phase 0 | Complete | Contracts & Scaffolding |
| Phase 1 | Complete | Core Integration (Monorepo, SSE, Ollama) |
| Phase 2 | Complete | HITL (ApprovalCard, Dry-Run, Multi-Sig) |
| Phase 3 | Complete | Enterprise (Privacy Shield, GraphRAG, FinOps) |
| Phase 4 | In Progress | Production Hardening & GA Release |
| Phase 5 | Planned | Multi-cluster, Federation, SaaS |
Contributing
We welcome contributions! Please see our Contributing Guide for details.
# Run tests
pnpm test
# Run linting
pnpm lint
# Format code
pnpm format
License
MIT License - see LICENSE for details.
Built with love by 岑洋國際行銷有限公司
Turning 3 AM pages into peaceful nights since 2026
"The best incident is the one you never have to wake up for."
— AWOOOI Philosophy