Your Name 3668d49f2f
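feat(flywheel): three W2 PRs + KMWriter critic fixes (all 1635 tests green)
W2 (week two of the onboarder four-week flywheel 80→90 path) plus all 5 critical/major findings from the critic PR review are fixed.
Every new flag defaults to false, so the change is safe to merge with no blast risk.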

## W2: three PRs

### PR-R2: AOL → catalog confidence EWMA writeback (fixes flywheel break C2)
- New file `apps/api/src/jobs/aol_to_catalog_writeback_job.py`
- Logic: scans AOL hourly, computes an EWMA confidence (alpha=0.3), and writes it back to alert_rule_catalog
- Failure threshold: N=5 consecutive low success rates → review_status='draft'
- Hermes _fetch_noisy_rules SQL gains OR review_status='draft'
- ENABLE_AOL_WRITEBACK_JOB=false (default)
- 8 tests (mock path fix: lazy import → patch src.db.base.get_db_context)
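
The EWMA writeback and the N=5 demotion rule above can be sketched roughly as follows. This is a minimal illustration, not the code in `aol_to_catalog_writeback_job.py`: the `RuleState` type, the `apply_observation` helper, the 0.5 cutoff for a "low" success rate, and the `"approved"` starting status are all assumptions. Only `alpha=0.3`, `N=5`, and `review_status='draft'` come from the notes above.

```python
from dataclasses import dataclass

ALPHA = 0.3               # smoothing factor stated in the commit notes
FAILURE_THRESHOLD_N = 5   # consecutive low observations before demotion
LOW_SUCCESS_CUTOFF = 0.5  # assumption: what counts as a "low" success rate


@dataclass
class RuleState:
    confidence: float
    low_streak: int = 0
    review_status: str = "approved"  # assumption: any non-draft status


def apply_observation(state: RuleState, success_rate: float) -> RuleState:
    """Fold one hourly success-rate observation into a rule's state."""
    # Standard EWMA: new = alpha * observation + (1 - alpha) * previous
    confidence = ALPHA * success_rate + (1 - ALPHA) * state.confidence
    # The streak counts consecutive low observations and resets on a good one.
    if success_rate < LOW_SUCCESS_CUTOFF:
        low_streak = state.low_streak + 1
    else:
        low_streak = 0
    review_status = state.review_status
    if low_streak >= FAILURE_THRESHOLD_N:
        review_status = "draft"
    return RuleState(confidence, low_streak, review_status)
```

With alpha=0.3 a single bad hour moves the confidence only 30% of the way toward the new observation, which is why a separate consecutive-failure counter is a reasonable companion to the smoothed value.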

### PR-V1: self_healing_validator wiring (fixes flywheel break C6)
- New file `apps/api/src/services/self_healing_validator.py` (pure function assess_self_healing)
- Wired into step 5 of post_execution_verifier.py (behind a feature flag gate)
- evidence_snapshot.py gains self_healing_score / self_healing_detail fields
- db/models.py + base.py ALTER IF NOT EXISTS
- score < 0.5 → sends a rollback-proposal Telegram alert (never auto-executed)
- ENABLE_SELF_HEALING_VALIDATOR=false (default)
- 7 tests

### PR-L1: KM ↔ Playbook bidirectional loop (fixes flywheel breaks C3+C4)
- Three new pieces of logic in learning_service.py:
  1. _write_playbook_evolution_km: promote/demote writes a KM evolution entry
  2. _check_and_mark_playbook_review: an accumulated count of N=5 triggers review_required
  3. _demote_alert_rule_catalog_confidence: DEPRECATED → confidence ×= 0.5
- PlaybookRecord gains a review_required field (schema migration via base.py)
- ENABLE_KM_PLAYBOOK_FEEDBACK_LOOP=false (default)
- KM_PLAYBOOK_REVIEW_THRESHOLD=5 (configurable)
- 6 tests

## KMWriter Critic: 5 critical/major fixes (found in the earlier critic PR review)
Already fixed in the previously pushed commit c5753e1c; this commit restores the corresponding files from the stash:
- C1 km_writer.py:194 backfill contradicted itself (fixed: synchronous await + DLQ)
- C2 km_writer.py:391 tightened the KM_WRITE_AWAIT=false path
- M1 decision_manager.py:2178/2203 removed _fire_and_forget
- M2 incident_service.py:1099 added retry + DLQ to the hand-rolled path
- M3 km_writer.py:166 aligned the idempotency claim (UPSERT + partial unique index)

## Verification
- All 1635 unit tests green (+27 from 1608)
- Coexists with fb0c72db (overturns A2, Ollama primary) without conflicts
- All new jobs/services default to flag=false (no blast risk)

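## Expected impact
Flywheel breaks C2 + C3 + C4 + C6 all fixed
Flywheel autonomy score: 65 → 85 estimated (after W2 completes)

Enablement order (once prod has verified that fb0c72db runs with OLLAMA primary):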
1. ENABLE_AOL_WRITEBACK_JOB=true
2. ENABLE_KM_PLAYBOOK_FEEDBACK_LOOP=true
3. ENABLE_SELF_HEALING_VALIDATOR=true

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 19:44:04 +08:00
2026-03-22 18:57:44 +08:00

     █████╗ ██╗    ██╗ ██████╗  ██████╗  ██████╗ ██╗
    ██╔══██╗██║    ██║██╔═══██╗██╔═══██╗██╔═══██╗██║
    ███████║██║ █╗ ██║██║   ██║██║   ██║██║   ██║██║
    ██╔══██║██║███╗██║██║   ██║██║   ██║██║   ██║██║
    ██║  ██║╚███╔███╔╝╚██████╔╝╚██████╔╝╚██████╔╝██║
    ╚═╝  ╚═╝ ╚══╝╚══╝  ╚═════╝  ╚═════╝  ╚═════╝ ╚═╝

Zero-Touch Ops. Human-Centric Decisions.

AI-Powered Intelligent Operations Platform

License: MIT · Python 3.11+ · Next.js 14 · TypeScript

Demo · Documentation · Contributing


The Future of Operations is Here

When your system breaks at 3 AM, AWOOOI doesn't just alert you—it analyzes the blast radius, calculates how much money you're burning, and presents a one-click fix. You approve. It executes. You go back to sleep.

┌─────────────────────────────────────────────────────────────────────────────┐
│                                                                             │
│   ALERT: frontend 5xx rate > 15%                                            │
│                                                                             │
│   ┌─────────────┐      ┌─────────────┐      ┌─────────────┐                │
│   │  GraphRAG   │ ──▶  │  Dry-Run    │ ──▶  │  Multi-Sig  │                │
│   │  Analysis   │      │  Simulation │      │  Approval   │                │
│   └─────────────┘      └─────────────┘      └─────────────┘                │
│         │                    │                    │                        │
│         ▼                    ▼                    ▼                        │
│   Root Cause:          Blast Radius:        [x] devops-alice               │
│   postgres-db          1 pod, 0 data loss   [x] sre-bob                    │
│                                                                             │
│   Monthly Savings: $523.60 if fixed                                         │
│                                                                             │
│   [ APPROVE & EXECUTE ]                                                     │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

AWOOOI (AI + WOOO Intelligent Operations) transforms reactive firefighting into proactive, AI-assisted decision-making—while keeping humans firmly in control of critical actions.


Enterprise Moats

Four pillars that make AWOOOI enterprise-ready from Day 1:

Privacy Shield

Your PII never leaves your premises. Period.

# Before: Raw sensitive data
"User 192.168.1.100 with email admin@company.com triggered alert"

# After: Consistent pseudonymization
"User [IP_1] with email [EMAIL_1] triggered alert"
# Same value → Same label (AI maintains context without seeing real data)
  • Regex-based detection: IP, Email, UUID, API Keys, JWT
  • Consistent hashing: [IP_1] always maps to the same IP within a session
  • Rehydration Engine: Labels restored only at MCP execution boundary
  • Zero PII in logs, zero PII to cloud LLMs
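
The redact-then-rehydrate flow above can be sketched as follows. This is an illustration only: the `Pseudonymizer` class and both regexes are hypothetical and cover just IPs and emails, and a per-session counter stands in for the consistent hashing the real plugin uses. The guarantee it demonstrates is the one listed above: the same value always gets the same label, and labels are reversible only through the session's own mapping.

```python
import re

# Assumed, simplified patterns; UUIDs, API keys and JWTs are omitted here.
PATTERNS = {
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


class Pseudonymizer:
    """Session-scoped mapping between sensitive values and stable labels."""

    def __init__(self) -> None:
        self._label_by_value: dict[str, str] = {}
        self._value_by_label: dict[str, str] = {}
        self._counters: dict[str, int] = {}

    def _label_for(self, kind: str, value: str) -> str:
        # Reuse the existing label so the same value always maps to the same tag.
        if value not in self._label_by_value:
            n = self._counters.get(kind, 0) + 1
            self._counters[kind] = n
            label = f"[{kind}_{n}]"
            self._label_by_value[value] = label
            self._value_by_label[label] = value
        return self._label_by_value[value]

    def redact(self, text: str) -> str:
        for kind, pattern in PATTERNS.items():
            text = pattern.sub(
                lambda m, k=kind: self._label_for(k, m.group(0)), text
            )
        return text

    def rehydrate(self, text: str) -> str:
        # Intended to run only at the execution boundary.
        for label, value in self._value_by_label.items():
            text = text.replace(label, value)
        return text


shield = Pseudonymizer()
raw = "User 192.168.1.100 with email admin@company.com triggered alert"
safe = shield.redact(raw)
print(safe)  # User [IP_1] with email [EMAIL_1] triggered alert
```

Calling `shield.rehydrate(safe)` restores the original string, which is the step that would sit at the MCP execution boundary.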

GraphRAG: Topology-Aware Intelligence

AI that understands your microservices like a senior SRE.

                    ┌─────────────────────────────────────┐
                    │         BLAST RADIUS ANALYSIS       │
                    │         (Upstream Impact)           │
                    └─────────────────────────────────────┘

                         ┌─────────────┐
                         │   ingress   │  ← Will be affected
                         └──────┬──────┘
                                │ depends on
                                ▼
                         ┌─────────────┐
                         │  frontend   │  ← Target service
                         └──────┬──────┘
                                │ calls
                                ▼
        ┌───────────────────────┼───────────────────────┐
        │                       │                       │
        ▼                       ▼                       ▼
┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│ auth-service │      │ product-api  │      │  order-api   │
└──────┬───────┘      └──────┬───────┘      └──────┬───────┘
       │                     │                     │
       └─────────────────────┼─────────────────────┘
                             ▼
                    ┌──────────────┐
                    │ postgres-db  │ X ROOT CAUSE
                    └──────────────┘
  • BFS-based traversal with configurable max_depth (default: 3)
  • Dual-direction analysis: Upstream (blast radius) + Downstream (root cause)
  • Priority ranking: DATABASE > CACHE > QUEUE for root cause identification
  • Multiple root causes: No single-point assumptions—collect ALL unhealthy dependencies
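
A depth-limited BFS over the topology in the diagram might look like the sketch below. The graph literal, the `KIND` table, and the function names are assumptions made for illustration; only BFS, `max_depth` (default 3), the two traversal directions, and the DATABASE > CACHE > QUEUE ranking come from the list above.

```python
from collections import deque

# Assumed toy topology mirroring the diagram: caller -> its dependencies.
DEPENDS_ON = {
    "ingress": ["frontend"],
    "frontend": ["auth-service", "product-api", "order-api"],
    "auth-service": ["postgres-db"],
    "product-api": ["postgres-db"],
    "order-api": ["postgres-db"],
    "postgres-db": [],
}
KIND = {"postgres-db": "DATABASE"}  # assumed component-type lookup
PRIORITY = {"DATABASE": 0, "CACHE": 1, "QUEUE": 2}


def bfs(graph, start, max_depth=3):
    """Nodes reachable from start within max_depth hops, in BFS order."""
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand past the depth limit
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append((nxt, depth + 1))
    return order


def reverse(graph):
    """Flip every edge so traversal walks toward callers."""
    rev = {node: [] for node in graph}
    for src, dsts in graph.items():
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev


def blast_radius(target, max_depth=3):
    """Upstream: services that depend on the target."""
    return bfs(reverse(DEPENDS_ON), target, max_depth)


def root_causes(target, unhealthy, max_depth=3):
    """Downstream: every unhealthy dependency, ranked by component type."""
    hits = [n for n in bfs(DEPENDS_ON, target, max_depth) if n in unhealthy]
    return sorted(hits, key=lambda n: PRIORITY.get(KIND.get(n), len(PRIORITY)))


print(blast_radius("frontend"))                  # ['ingress']
print(root_causes("frontend", {"postgres-db"}))  # ['postgres-db']
```

Because `root_causes` filters rather than stopping at the first hit, it returns every unhealthy dependency in range, matching the "no single-point assumptions" rule.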

Multi-Sig & Dry-Run: Defense in Depth

Every critical action is simulated, validated, and co-signed.

┌────────────────────────────────────────────────────────────────┐
│                      RISK MATRIX                               │
├────────────┬─────────────┬─────────────────────────────────────┤
│ Risk Level │ Signatures  │ Required Roles                      │
├────────────┼─────────────┼─────────────────────────────────────┤
│ LOW        │ 0 (auto)    │ —                                   │
│ MEDIUM     │ 1           │ admin, devops, sre                  │
│ HIGH       │ 2           │ admin, devops, sre                  │
│ CRITICAL   │ 2           │ CTO + CISO (mandatory)              │
└────────────┴─────────────┴─────────────────────────────────────┘
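
One way to read the matrix as code is sketched below. The `RISK_MATRIX` dict and `is_approved` helper are hypothetical names, role strings are lowercased by assumption, and signatures are modeled as a signer-to-role mapping so that one person cannot sign twice.

```python
# Risk level -> (required signature count, roles allowed to sign).
RISK_MATRIX = {
    "LOW": (0, set()),
    "MEDIUM": (1, {"admin", "devops", "sre"}),
    "HIGH": (2, {"admin", "devops", "sre"}),
    "CRITICAL": (2, {"cto", "ciso"}),
}


def is_approved(risk_level: str, signatures: dict[str, str]) -> bool:
    """signatures maps signer name -> role; dict keys keep signers distinct."""
    required, roles = RISK_MATRIX[risk_level]
    eligible = [role for role in signatures.values() if role in roles]
    if risk_level == "CRITICAL":
        # Both mandatory roles must be present, not just any two signers.
        return roles <= set(eligible)
    return len(eligible) >= required


print(is_approved("HIGH", {"devops-alice": "devops", "sre-bob": "sre"}))  # True
print(is_approved("CRITICAL", {"carol": "cto", "dave": "admin"}))         # False
```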

TOCTOU Protection (Time-of-Check to Time-of-Use):

1. User clicks "Approve"
2. System re-runs Dry-Run immediately before execution
3. If state changed → Status = VOIDED (not cleared!)
4. Full audit trail preserved for compliance
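
The four steps above amount to a compare-before-execute guard. The sketch below assumes the dry-run result can be reduced to a comparable fingerprint string; `Approval`, `execute_if_unchanged`, and the `PENDING` / `EXECUTED` status names are hypothetical, while `VOIDED` is the status named above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Approval:
    action: str
    dry_run_fingerprint: str  # assumed digest of the dry-run the approvers saw
    status: str = "PENDING"


def execute_if_unchanged(
    approval: Approval,
    rerun_dry_run: Callable[[str], str],
    execute: Callable[[str], None],
) -> str:
    """Re-run the dry-run right before execution and void on any drift."""
    fresh = rerun_dry_run(approval.action)
    if fresh != approval.dry_run_fingerprint:
        # Keep the record and mark it VOIDED so the audit trail survives.
        approval.status = "VOIDED"
        return approval.status
    execute(approval.action)
    approval.status = "EXECUTED"
    return approval.status


executed: list[str] = []
stale = Approval("restart_pod", dry_run_fingerprint="abc")
print(execute_if_unchanged(stale, lambda _: "xyz", executed.append))  # VOIDED
```

Marking the record rather than deleting it is what keeps the approval history available for compliance review.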

Dry-Run Checks:

  • RBAC Permission validation
  • Syntax & parameter validation
  • Resource existence verification
  • PodDisruptionBudget compliance
  • Blast radius calculation

Progressive Autonomy: Trust That Evolves

The more you approve, the less you need to.

┌─────────────────────────────────────────────────────────────────┐
│                    TRUST SCORE PROGRESSION                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Score: 0 ──────────────────────────────────────────────▶ 10+  │
│         │                    │                          │       │
│         ▼                    ▼                          ▼       │
│    ┌─────────┐         ┌─────────┐              ┌─────────┐    │
│    │  HIGH   │   ──▶   │ MEDIUM  │    ──▶      │   LOW   │    │
│    │ 2-sig   │  @10    │  1-sig  │    @5       │  auto   │    │
│    └─────────┘         └─────────┘              └─────────┘    │
│                                                                 │
│  ⚠️  CRITICAL operations NEVER auto-downgrade (enterprise law) │
│                                                                 │
│  Single REJECT → Trust score resets to 0 (instant collapse)    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
  • Approve → +1 trust score
  • Reject → Score resets to 0 (trust collapses instantly)
  • Pattern-based: restart_pod:nginx-* builds trust separately from delete_pvc:*
  • CRITICAL operations (DROP TABLE, DELETE NAMESPACE) → always require human dual-signature
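
The scoring rules above can be sketched as a small per-pattern counter. The `TrustEngine` class here is hypothetical and is not the contents of `trust_engine.py`. It also assumes one reading of the diagram: a HIGH-risk pattern drops to MEDIUM at score 10, and a MEDIUM-risk pattern drops to LOW (auto) at score 5.

```python
# Base risk -> (downgraded risk, score needed); thresholds read off the diagram.
DOWNGRADE_AT = {"HIGH": ("MEDIUM", 10), "MEDIUM": ("LOW", 5)}


class TrustEngine:
    def __init__(self) -> None:
        self._scores: dict[str, int] = {}

    def record(self, pattern: str, approved: bool) -> int:
        """Approve adds 1; a single reject resets the pattern's score to 0."""
        if approved:
            self._scores[pattern] = self._scores.get(pattern, 0) + 1
        else:
            self._scores[pattern] = 0
        return self._scores[pattern]

    def effective_risk(self, pattern: str, base_risk: str) -> str:
        # LOW is already automatic and CRITICAL never downgrades.
        if base_risk not in DOWNGRADE_AT:
            return base_risk
        target, threshold = DOWNGRADE_AT[base_risk]
        if self._scores.get(pattern, 0) >= threshold:
            return target
        return base_risk


engine = TrustEngine()
for _ in range(5):
    engine.record("restart_pod:nginx-*", approved=True)
print(engine.effective_risk("restart_pod:nginx-*", "MEDIUM"))  # LOW
print(engine.effective_risk("delete_pvc:*", "MEDIUM"))         # MEDIUM
```

Keying the score by action pattern is what lets `restart_pod:nginx-*` earn autonomy without lending any of that trust to `delete_pvc:*`.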

leWOOOgo Engine Architecture

AWOOOI is built on the leWOOOgo Engine—a modular, plugin-based architecture inspired by LEGO blocks:

┌─────────────────────────────────────────────────────────────────────────────┐
│                           leWOOOgo Engine                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐     │
│   │  INPUT  │   │  BRAIN  │   │ OUTPUT  │   │ ACTION  │   │  DATA   │     │
│   │ ─────── │   │ ─────── │   │ ─────── │   │ ─────── │   │ ─────── │     │
│   │Webhooks │   │ Ollama  │   │  Slack  │   │   K8s   │   │ Postgres│     │
│   │  Kafka  │   │ OpenAI  │   │ Discord │   │  Shell  │   │  Redis  │     │
│   │Prometheus│   │ Claude  │   │  Email  │   │   MCP   │   │  S3     │     │
│   └────┬────┘   └────┬────┘   └────┬────┘   └────┬────┘   └────┬────┘     │
│        │             │             │             │             │           │
│        └─────────────┴─────────────┴─────────────┴─────────────┘           │
│                                    │                                        │
│                            ┌───────┴───────┐                               │
│                            │      UI       │                               │
│                            │ ───────────── │                               │
│                            │   Next.js     │                               │
│                            │ ApprovalCard  │                               │
│                            │ThinkingStream │                               │
│                            └───────────────┘                               │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Module Overview

Module | Purpose         | Key Components
INPUT  | Event ingestion | Prometheus AlertManager, Kafka, Webhooks
BRAIN  | AI reasoning    | Ollama (local), OpenAI, Claude, GraphRAG
OUTPUT | Notifications   | Slack, Discord, Email, Custom webhooks
ACTION | Execution       | K8s API, Shell, MCP Bridge, Ansible
DATA   | Persistence     | PostgreSQL, Redis, S3, Vector DB
UI     | Human interface | Next.js 14, ApprovalCard, ThinkingTerminal

MCP (Model Context Protocol) Support

// MCP enables AI to safely interact with external tools
await mcpBridge.callTool("kubernetes", "restart_pod", {
  pod_name: "[POD_1]",      // Redacted in logs
  namespace: "production",
  graceful: true,
});
// Rehydration happens at execution boundary only

FinOps: Day-1 ROI

Every wasted resource has a dollar sign. AWOOOI shows you exactly how much.

┌─────────────────────────────────────────────────────────────────┐
│                    FINOPS COST ANALYSIS                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   MONTHLY WASTE DETECTED: $523.60                               │
│                                                                 │
│   ┌──────────────────┬──────────────────┬──────────────────┐   │
│   │   REALIZABLE     │      FREED       │     ANNUAL       │   │
│   │   $480.00/mo     │    $43.60/mo     │   $5,760/yr      │   │
│   │   ────────────   │   ────────────   │   ────────────   │   │
│   │   PVC deletion   │   Pod cleanup    │   if all fixed   │   │
│   │   Node resize    │   (needs scale)  │                  │   │
│   └──────────────────┴──────────────────┴──────────────────┘   │
│                                                                 │
│   TOP RECOMMENDATIONS:                                          │
│   ├─ Delete orphaned PVC 'data-postgres-backup'    -$40.00 LOW │
│   ├─ Resize node 'worker-large-01'                -$340.00 HIGH│
│   └─ Delete zombie Pod 'legacy-api-5d7b8'          -$76.00 MED │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Scan Types:

  • Orphaned PVCs: Storage not mounted by any Pod
  • Zombie Pods: CPU < 1% for 7+ consecutive days
  • Over-provisioned Nodes: High request, low actual usage

Safety Buffer: wasted = requested - (actual × 1.2). The 20% headroom on actual usage keeps aggressive recommendations from causing OOM kills.
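
As a worked example of the safety-buffer formula, the sketch below applies it to a single resource. The `wasted` function name and the clamp to zero (for workloads already using more than their buffered request) are assumptions; the 1.2 factor is the one stated above.

```python
SAFETY_FACTOR = 1.2  # 20% headroom on observed usage


def wasted(requested: float, actual: float, factor: float = SAFETY_FACTOR) -> float:
    """Reclaimable amount after reserving headroom above actual usage."""
    # Clamp at zero (assumption) so a tight workload never reports negative waste.
    return max(0.0, requested - actual * factor)


# A pod requesting 4 CPU cores while using 1 core:
# 4 - (1 × 1.2) = 2.8 cores are reported as reclaimable.
print(round(wasted(4.0, 1.0), 2))  # 2.8
```

A workload that requests 1 core and uses 1 core reports no waste at all, because the buffered usage (1.2) already exceeds the request.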


Quick Start

Prerequisites

  • Python 3.11+
  • Node.js 18+
  • pnpm 8+
  • Docker (optional, for local Ollama)

Installation

# Clone the repository
git clone https://github.com/anthropics/awoooi.git
cd awoooi

# Install dependencies
pnpm install

# Setup Python environment
cd apps/api
python -m venv venv
source venv/bin/activate  # or `venv\Scripts\activate` on Windows
pip install -r requirements.txt

Run Tracer Bullet 2.0 (E2E Demo)

Experience the full AWOOOI loop in 30 seconds:

cd apps/api
python scripts/tracer_bullet_2.py

Expected Output:

============================================================
TRACER BULLET 2.0 - FULL LOOP TEST
Test ID: tb2-20260319143052
============================================================

[x] [trigger_alert] PASS
[x] [graphrag_analysis] PASS
[x] [generate_approval] PASS
[x] [multisig_approval] PASS
[x] [mcp_execution] PASS

============================================================
TEST SUMMARY
============================================================
  Total Steps: 5
  Passed: 5
  Failed: 0
  Status: ALL PASSED

Start Development Servers

# Terminal 1: API Server
cd apps/api
uvicorn src.main:app --reload --port 8000

# Terminal 2: Web Server
cd apps/web
pnpm dev

Open http://localhost:3000 to see the AWOOOI dashboard.


Project Structure

awoooi/
├── apps/
│   ├── api/                    # FastAPI Backend
│   │   ├── src/
│   │   │   ├── services/       # Core services
│   │   │   │   ├── approval.py     # Multi-Sig engine
│   │   │   │   ├── dry_run.py      # Dry-Run engine
│   │   │   │   ├── trust_engine.py # Progressive autonomy
│   │   │   │   └── graph_rag.py    # Topology analysis
│   │   │   └── plugins/
│   │   │       ├── security/       # Privacy Shield
│   │   │       ├── mcp/            # MCP Bridge
│   │   │       └── finops/         # Cost analyzer
│   │   └── scripts/
│   │       └── tracer_bullet_2.py  # E2E test
│   │
│   └── web/                    # Next.js Frontend
│       └── src/
│           ├── components/
│           │   └── agent/
│           │       ├── approval-card.tsx
│           │       └── thinking-terminal.tsx
│           └── stores/
│               └── agent.store.ts
│
├── packages/
│   └── lewooogo-core/          # Shared types & contracts
│
└── docs/
    └── adr/                    # Architecture Decision Records

Roadmap

Phase   | Status      | Description
Phase 0 | Complete    | Contracts & Scaffolding
Phase 1 | Complete    | Core Integration (Monorepo, SSE, Ollama)
Phase 2 | Complete    | HITL (ApprovalCard, Dry-Run, Multi-Sig)
Phase 3 | Complete    | Enterprise (Privacy Shield, GraphRAG, FinOps)
Phase 4 | In Progress | Production Hardening & GA Release
Phase 5 | Planned     | Multi-cluster, Federation, SaaS

Contributing

We welcome contributions! Please see our Contributing Guide for details.

# Run tests
pnpm test

# Run linting
pnpm lint

# Format code
pnpm format

License

MIT License - see LICENSE for details.


Built with love by 岑洋國際行銷有限公司

Turning 3 AM pages into peaceful nights since 2026

    "The best incident is the one you never have to wake up for."
                                        — AWOOOI Philosophy