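【12-expert team full scan + four parallel implementation tracks】
After the commander asked "are the 12 agents actually collaborating?", the
full delivery chain was completed under the team rules:
onboarder + critic + db-expert + debugger + frontend-designer scanned in
parallel and found 6 major gaps; fullstack-engineer × 4 and
refactor-specialist then implemented the fixes together.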
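【Track C — trust_drift dual-write consolidation】
Two independent paths wrote event_type=trust_drift without calling each
other, so downstream consumers received duplicate data and could not
determine the source of truth. The consolidation keeps
governance_agent.check_trust_drift (the more complete path: auto-deprecate +
Telegram + PG), demotes TrustDriftDetector to a pure statistics lib, and
points the W-6 watchdog at governance_agent. Adds
TestSinglePgWritePerDriftScenario to verify that a single drift scenario
triggers exactly one PG write.
Changes:
- apps/api/src/services/trust_drift_detector.py (lib only, no longer writes PG)
- apps/api/tests/test_trust_drift_watchdog.py (W-6 now mocks governance_agent)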
【Track D — governance_remediation_dispatch dispatch table】
ai_governance_events is immutable Event Sourcing and cannot hold execution
state. A new dispatch table acts as the projection layer: 1 event → 0..N
dispatches, with status that is mutable, retryable, and auditable.
- PgEnum with 5 event_type values + a 7-state state machine (pending →
  dispatched → executing → succeeded/failed/cancelled/skipped)
- A retry after failure INSERTs a new row (the old row's status is left
  unchanged, preserving the audit trail)
- Partial unique index ux_grd_one_active_per_event enforces "one active
  dispatch per event"
- 4 composite indexes support worker polling, dedup queries, and the
  observability dashboard
- FKs to playbooks / incidents / approval_records use SET NULL (avoids
  cascade locks); the FK to ai_governance_events uses RESTRICT
Changes:
- apps/api/src/db/models.py (GovernanceRemediationDispatch ORM class)
- apps/api/migrations/governance_remediation_dispatch_2026-05-03.sql
- apps/api/src/repositories/governance_remediation_dispatch_repo.py
  (6 async functions + 3 custom exceptions: DispatchAlreadyActive /
  InvalidStatusTransition / DispatchNotFound)
- apps/api/src/models/governance_dispatch.py (4 schemas, including DecisionContextV1)
- apps/api/tests/test_governance_remediation_dispatch.py (29 tests)
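The state machine and the retry-by-insert rule of Track D can be sketched in memory as follows. This is only an illustration, not the repository code: the transition table, the `Dispatch` dataclass, and the `DispatchLog` class are assumptions; only the seven status names and the two exception names come from this commit.

```python
from dataclasses import dataclass
from itertools import count

# Assumed transition table: the commit names the seven statuses but does
# not spell out every allowed edge.
ALLOWED = {
    "pending": {"dispatched", "cancelled", "skipped"},
    "dispatched": {"executing", "cancelled"},
    "executing": {"succeeded", "failed"},
    "succeeded": set(), "failed": set(), "cancelled": set(), "skipped": set(),
}
ACTIVE = {"pending", "dispatched", "executing"}  # assumed "active" set


class InvalidStatusTransition(Exception):
    pass


class DispatchAlreadyActive(Exception):
    pass


@dataclass
class Dispatch:
    id: int
    event_id: int
    status: str = "pending"


class DispatchLog:
    """In-memory stand-in for the dispatch table (hypothetical helper)."""

    def __init__(self):
        self.rows = []
        self._ids = count(1)

    def create(self, event_id):
        # Plays the role of the partial unique index: one active row per event.
        if any(r.event_id == event_id and r.status in ACTIVE for r in self.rows):
            raise DispatchAlreadyActive(event_id)
        row = Dispatch(next(self._ids), event_id)
        self.rows.append(row)
        return row

    def transition(self, row, new_status):
        if new_status not in ALLOWED[row.status]:
            raise InvalidStatusTransition(f"{row.status} -> {new_status}")
        row.status = new_status

    def retry(self, failed_row):
        # A retry inserts a fresh row; the failed row stays as audit history.
        if failed_row.status != "failed":
            raise InvalidStatusTransition(f"cannot retry from {failed_row.status}")
        return self.create(failed_row.event_id)
```

After a row reaches `failed`, `retry` returns a new `pending` row with a new id while the old row keeps its `failed` status.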
【Track B — /governance page】
Backend PR1 adds three endpoints; frontend PR2-5 delivers the full three-tab page.
PR1 backend:
- GET /api/v1/ai/governance/events (events_tab, with event_type/severity/
  status/time-range filters + pagination)
- GET /api/v1/ai/governance/queue (queue_tab, with graceful fallback:
  returns table_pending=True instead of raising 500 when the dispatch table
  does not exist)
- GET /api/v1/ai/governance/summary (slo_tab, 30d violation time-series chart)
- Severity mapping rules are hardcoded (critic suggests moving them to
  settings later)
PR2-5 frontend:
- /governance route + AppLayout + Compliance Badge banner + PageTabs
- SLO Tab: 3 KPI cards (Syne 28px + StatusOrb + 7d sparkline) +
  30d violation stacked BarChart
- Events Tab: filter bar + table + inline expandable row (JSON / remediation
  suggestions / dispatch records)
- Queue Tab: HITL to-do cards + trust progress bar + approve/reject buttons
  (console.log only in this PR)
- Sidebar gains an "AI 治理" (AI Governance) entry (ShieldCheck icon)
- Complete bilingual i18n (governance namespace + nav.governance)
- 7 new components: slo-kpi-card / slo-violation-chart / events-table /
  events-filter-bar / event-detail-drawer / queue-item-card / queue-history-tabs
Changes:
- apps/api/src/api/v1/ai_governance.py (router)
- apps/api/src/services/governance_query_service.py
- apps/api/src/models/governance.py (Pydantic V2 schemas)
- apps/api/tests/test_ai_governance_endpoints.py (21 tests)
- apps/web/src/app/[locale]/governance/ (page + 3 tabs)
- apps/web/src/components/governance/ (7 components)
- apps/web/messages/{zh-TW,en}.json (governance namespace)
- apps/web/src/components/layout/sidebar.tsx (+1 line)
- apps/api/src/main.py (router include)
【Track A — GovernanceDispatcher decision fusion】
Connects governance events to the remediation executor through decision
fusion in the North Star direction (LLM × Playbook trust × MCP), in line with
the "no hardcoded rules" iron rule.
- Design rule: DecisionFusionAdapter is a new wrapper and **does not modify
  any Tier 3 file** (decision_manager / learning_service / trust_engine); it
  only consumes existing APIs
- Three-way fusion formula: confidence = 0.4×llm + 0.3×playbook_trust + 0.3×mcp_consistency
  (the weights carry a TODO noting that AI self-learning should tune them later)
- Three decision branches:
  confidence ≥ 0.85 → auto_dispatch (status=dispatched)
  0.65 ≤ confidence < 0.85 → pending_approval (HITL)
  confidence < 0.65 → skip + log
- decision_context JSONB records a full snapshot of the three inputs (for
  future fine-tuning)
- Polls every 30s for unresolved events, modeled on the governance loop
- Deduplicates repeated events (calls get_active_for_event)
Changes:
- apps/api/src/services/governance_dispatcher.py
- apps/api/src/services/decision_fusion_adapter.py
- apps/api/tests/test_governance_dispatcher.py (14 tests)
- apps/api/src/main.py (lifespan task runs run_governance_dispatcher_loop)
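The Track A fusion formula and its three branches can be sketched as below. The weights and thresholds come from this commit; the function names `fuse_confidence` and `decide`, the assumption that each input lies in [0, 1], and the choice to reject NaN or out-of-range inputs are illustrative and not taken from DecisionFusionAdapter.

```python
import math

# Weights and thresholds as stated in the commit message.
W_LLM, W_PLAYBOOK, W_MCP = 0.4, 0.3, 0.3
AUTO_THRESHOLD = 0.85
HITL_THRESHOLD = 0.65


def fuse_confidence(llm, playbook_trust, mcp_consistency):
    """Weighted sum of the three signals (each assumed to be in [0, 1])."""
    for value in (llm, playbook_trust, mcp_consistency):
        # Assumption: bad inputs are rejected rather than silently fused.
        if math.isnan(value) or not 0.0 <= value <= 1.0:
            raise ValueError(f"signal out of range: {value!r}")
    return W_LLM * llm + W_PLAYBOOK * playbook_trust + W_MCP * mcp_consistency


def decide(confidence):
    """Map a fused confidence onto one of the three decision branches."""
    if confidence >= AUTO_THRESHOLD:
        return "auto_dispatch"
    if confidence >= HITL_THRESHOLD:
        return "pending_approval"
    return "skip"
```

For example, inputs of (0.9, 0.9, 0.8) fuse to about 0.87 and take the auto_dispatch branch, while (0.7, 0.7, 0.7) fuses to about 0.70 and goes to human approval.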
【Verification】
All 1836 unit tests pass (29 skipped because of a pre-existing PG integration env issue)
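【Orchestration lessons — recorded in memory】
- vuln-verifier should run **before** fullstack-engineer (otherwise a parallel
  run can read already-fixed code and misjudge)
- The two-round critic review cannot be skipped (round two caught the NaN
  sentinel + Prom rule chain)
- The North Star "no hardcoded rules" principle is actually enforced together
  with decision-fusion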
【Tier 3 untouched — verified】
git diff confirms that this commit does not change decision_manager.py /
learning_service.py / trust_engine.py at all; it only adds a wrapper service
that consumes existing APIs.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
█████╗ ██╗ ██╗ ██████╗ ██████╗ ██████╗ ██╗
██╔══██╗██║ ██║██╔═══██╗██╔═══██╗██╔═══██╗██║
███████║██║ █╗ ██║██║ ██║██║ ██║██║ ██║██║
██╔══██║██║███╗██║██║ ██║██║ ██║██║ ██║██║
██║ ██║╚███╔███╔╝╚██████╔╝╚██████╔╝╚██████╔╝██║
╚═╝ ╚═╝ ╚══╝╚══╝ ╚═════╝ ╚═════╝ ╚═════╝ ╚═╝
Zero-Touch Ops. Human-Centric Decisions.
AI-Powered Intelligent Operations Platform
The Future of Operations is Here
When your system breaks at 3 AM, AWOOOI doesn't just alert you—it analyzes the blast radius, calculates how much money you're burning, and presents a one-click fix. You approve. It executes. You go back to sleep.
┌─────────────────────────────────────────────────────────────────────────────┐
│ │
│ ALERT: frontend 5xx rate > 15% │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ GraphRAG │ ──▶ │ Dry-Run │ ──▶ │ Multi-Sig │ │
│ │ Analysis │ │ Simulation │ │ Approval │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ Root Cause: Blast Radius: [x] devops-alice │
│ postgres-db 1 pod, 0 data loss [x] sre-bob │
│ │
│ Monthly Savings: $523.60 if fixed │
│ │
│ [ APPROVE & EXECUTE ] │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
AWOOOI (AI + WOOO Intelligent Operations) transforms reactive firefighting into proactive, AI-assisted decision-making—while keeping humans firmly in control of critical actions.
Enterprise Moats
Four pillars that make AWOOOI enterprise-ready from Day 1:
Privacy Shield
Your PII never leaves your premises. Period.
# Before: Raw sensitive data
"User 192.168.1.100 with email admin@company.com triggered alert"
# After: Consistent pseudonymization
"User [IP_1] with email [EMAIL_1] triggered alert"
# Same value → Same label (AI maintains context without seeing real data)
- Regex-based detection: IP, Email, UUID, API Keys, JWT
- Consistent hashing: [IP_1] always maps to the same IP within a session
- Rehydration Engine: Labels restored only at MCP execution boundary
- Zero PII in logs, zero PII to cloud LLMs
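A minimal sketch of consistent pseudonymization and rehydration, assuming a per-session lookup table rather than real hashing. The `Pseudonymizer` class, its method names, and the two simplified regexes are hypothetical and cover only IPs and emails; they are not the Privacy Shield implementation.

```python
import re

# Simplified patterns for illustration only (IPv4 and email).
PATTERNS = {
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


class Pseudonymizer:
    """Session-scoped label table: same value -> same label."""

    def __init__(self):
        self.label_by_value = {}
        self.value_by_label = {}
        self.counters = {}

    def _label_for(self, kind, value):
        key = (kind, value)
        if key not in self.label_by_value:
            n = self.counters.get(kind, 0) + 1
            self.counters[kind] = n
            label = f"[{kind}_{n}]"
            self.label_by_value[key] = label
            self.value_by_label[label] = value
        return self.label_by_value[key]

    def redact(self, text):
        # Replace every match with its stable label.
        for kind, pattern in PATTERNS.items():
            text = pattern.sub(
                lambda m, k=kind: self._label_for(k, m.group(0)), text
            )
        return text

    def rehydrate(self, text):
        # Restore real values; meant to run only at the execution boundary.
        for label, value in self.value_by_label.items():
            text = text.replace(label, value)
        return text


shield = Pseudonymizer()
print(shield.redact("User 192.168.1.100 with email admin@company.com triggered alert"))
# prints: User [IP_1] with email [EMAIL_1] triggered alert
```

Because the table lives on the instance, a second `redact` call in the same session reuses `[IP_1]` for the same address and assigns `[IP_2]` to a new one.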
GraphRAG: Topology-Aware Intelligence
AI that understands your microservices like a senior SRE.
┌─────────────────────────────────────┐
│ BLAST RADIUS ANALYSIS │
│ (Upstream Impact) │
└─────────────────────────────────────┘
┌─────────────┐
│ ingress │ ← Will be affected
└──────┬──────┘
│ depends on
▼
┌─────────────┐
│ frontend │ ← Target service
└──────┬──────┘
│ calls
▼
┌───────────────────────┼───────────────────────┐
│ │ │
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ auth-service │ │ product-api │ │ order-api │
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
│ │ │
└─────────────────────┼─────────────────────┘
▼
┌──────────────┐
│ postgres-db │ X ROOT CAUSE
└──────────────┘
- BFS-based traversal with configurable max_depth (default: 3)
- Dual-direction analysis: Upstream (blast radius) + Downstream (root cause)
- Priority ranking: DATABASE > CACHE > QUEUE for root cause identification
- Multiple root causes: No single-point assumptions—collect ALL unhealthy dependencies
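The traversal described above can be sketched with a plain breadth-first search over the example topology from the diagram. The `DEPENDS_ON` map, the `KIND` table, and the function names are assumptions made for illustration; the real GraphRAG service may model the graph differently.

```python
from collections import deque

# Example topology from the diagram: service -> services it depends on.
DEPENDS_ON = {
    "ingress": ["frontend"],
    "frontend": ["auth-service", "product-api", "order-api"],
    "auth-service": ["postgres-db"],
    "product-api": ["postgres-db"],
    "order-api": ["postgres-db"],
    "postgres-db": [],
}
KIND = {"postgres-db": "DATABASE"}  # hypothetical component types
PRIORITY = {"DATABASE": 0, "CACHE": 1, "QUEUE": 2}


def bfs(graph, start, max_depth=3):
    """Return {node: depth} for nodes within max_depth hops of start."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand past the depth limit
        for nxt in graph.get(node, []):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    del depth[start]
    return depth


def reverse(graph):
    """Flip edge direction so we can walk toward dependents."""
    rev = {node: [] for node in graph}
    for src, dsts in graph.items():
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev


def blast_radius(target, max_depth=3):
    """Upstream: everything that depends on target."""
    return sorted(bfs(reverse(DEPENDS_ON), target, max_depth))


def root_causes(target, unhealthy, max_depth=3):
    """Downstream: ALL unhealthy dependencies, DATABASE > CACHE > QUEUE first."""
    found = bfs(DEPENDS_ON, target, max_depth)
    hits = [n for n in found if n in unhealthy]
    return sorted(hits, key=lambda n: (PRIORITY.get(KIND.get(n), 99), found[n], n))
```

With `frontend` as the target, `blast_radius("frontend")` yields `["ingress"]`, and `root_causes("frontend", {"postgres-db", "order-api"})` lists `postgres-db` first because of its DATABASE priority.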
Multi-Sig & Dry-Run: Defense in Depth
Every critical action is simulated, validated, and co-signed.
┌────────────────────────────────────────────────────────────────┐
│ RISK MATRIX │
├────────────┬─────────────┬─────────────────────────────────────┤
│ Risk Level │ Signatures │ Required Roles │
├────────────┼─────────────┼─────────────────────────────────────┤
│ LOW │ 0 (auto) │ — │
│ MEDIUM │ 1 │ admin, devops, sre │
│ HIGH │ 2 │ admin, devops, sre │
│ CRITICAL │ 2 │ CTO + CISO (mandatory) │
└────────────┴─────────────┴─────────────────────────────────────┘
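One way to encode the risk matrix as data is sketched here. The `Policy` dataclass, the `is_approved` helper, and the lowercase role strings are hypothetical; the signature counts and role sets mirror the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    signatures: int
    allowed_roles: frozenset
    mandatory_roles: frozenset = frozenset()


OPS = frozenset({"admin", "devops", "sre"})
EXEC = frozenset({"cto", "ciso"})

RISK_MATRIX = {
    "LOW": Policy(0, frozenset()),
    "MEDIUM": Policy(1, OPS),
    "HIGH": Policy(2, OPS),
    "CRITICAL": Policy(2, EXEC, mandatory_roles=EXEC),
}


def is_approved(risk, signers):
    """signers maps user -> role, so each user can sign only once."""
    policy = RISK_MATRIX[risk]
    valid = [role for role in signers.values() if role in policy.allowed_roles]
    if len(valid) < policy.signatures:
        return False
    # CRITICAL additionally needs every mandatory role to be present.
    return policy.mandatory_roles <= set(valid)
```

Under this sketch two ops engineers can approve a HIGH action, but a CRITICAL action stays blocked until both a CTO and a CISO have signed.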
TOCTOU Protection (Time-of-Check to Time-of-Use):
1. User clicks "Approve"
2. System re-runs Dry-Run immediately before execution
3. If state changed → Status = VOIDED (not cleared!)
4. Full audit trail preserved for compliance
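The four TOCTOU steps can be sketched as a re-check just before execution. The `Approval` dataclass, the `execute_if_unchanged` function, and the `EXECUTED` status name are assumptions for illustration; only the `VOIDED` status comes from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Approval:
    action: str
    approved_snapshot: dict  # dry-run result the signers actually saw
    status: str = "APPROVED"
    audit: list = field(default_factory=list)


def execute_if_unchanged(approval, dry_run, execute):
    """Re-run the dry-run; void the approval if the world has changed."""
    current = dry_run(approval.action)
    if current != approval.approved_snapshot:
        # Keep the record and mark it VOIDED instead of deleting it.
        approval.status = "VOIDED"
        approval.audit.append(
            {"event": "voided", "expected": approval.approved_snapshot, "actual": current}
        )
        return False
    execute(approval.action)
    approval.status = "EXECUTED"
    approval.audit.append({"event": "executed", "snapshot": current})
    return True
```

Passing the dry-run and executor as callables keeps the check testable: a stub that returns a different snapshot is enough to exercise the VOIDED path.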
Dry-Run Checks:
- RBAC Permission validation
- Syntax & parameter validation
- Resource existence verification
- PodDisruptionBudget compliance
- Blast radius calculation
Progressive Autonomy: Trust That Evolves
The more you approve, the less you need to.
┌─────────────────────────────────────────────────────────────────┐
│ TRUST SCORE PROGRESSION │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Score: 0 ──────────────────────────────────────────────▶ 10+ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ HIGH │ ──▶ │ MEDIUM │ ──▶ │ LOW │ │
│ │ 2-sig │ @10 │ 1-sig │ @5 │ auto │ │
│ └─────────┘ └─────────┘ └─────────┘ │
│ │
│ ⚠️ CRITICAL operations NEVER auto-downgrade (enterprise law) │
│ │
│ Single REJECT → Trust score resets to 0 (instant collapse) │
│ │
└─────────────────────────────────────────────────────────────────┘
- Approve → +1 trust score
- Reject → Score resets to 0 (trust collapses instantly)
- Pattern-based: restart_pod:nginx-* builds trust separately from delete_pvc:*
- CRITICAL operations (DROP TABLE, DELETE NAMESPACE) → Always requires human dual-signature
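A rough sketch of per-pattern trust scoring follows. It reads the diagram as "HIGH drops to MEDIUM at score 10, MEDIUM drops to LOW at score 5", which is one interpretation rather than a confirmed rule; `TrustLedger` and its methods are hypothetical names, not the trust_engine API.

```python
from fnmatch import fnmatch

# Assumed reading of the diagram: (score needed, risk level after downgrade).
DOWNGRADE_AT = {"HIGH": (10, "MEDIUM"), "MEDIUM": (5, "LOW")}


class TrustLedger:
    """Keeps one trust score per action pattern."""

    def __init__(self, patterns):
        self.scores = {pattern: 0 for pattern in patterns}

    def _pattern_for(self, action):
        for pattern in self.scores:
            if fnmatch(action, pattern):
                return pattern
        raise KeyError(f"no trust pattern matches {action!r}")

    def record(self, action, approved):
        pattern = self._pattern_for(action)
        # Approve adds one; a single reject collapses the score to zero.
        self.scores[pattern] = self.scores[pattern] + 1 if approved else 0

    def effective_risk(self, action, base_risk):
        if base_risk == "CRITICAL":
            return "CRITICAL"  # never auto-downgraded
        rule = DOWNGRADE_AT.get(base_risk)
        score = self.scores[self._pattern_for(action)]
        if rule and score >= rule[0]:
            return rule[1]
        return base_risk
```

Five approvals of `restart_pod:nginx-*` actions would lower a MEDIUM restart to LOW in this sketch while leaving `delete_pvc:*` untouched, and one rejection puts the restart pattern back at MEDIUM.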
leWOOOgo Engine Architecture
AWOOOI is built on the leWOOOgo Engine—a modular, plugin-based architecture inspired by LEGO blocks:
┌─────────────────────────────────────────────────────────────────────────────┐
│ leWOOOgo Engine │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ INPUT │ │ BRAIN │ │ OUTPUT │ │ ACTION │ │ DATA │ │
│ │ ─────── │ │ ─────── │ │ ─────── │ │ ─────── │ │ ─────── │ │
│ │Webhooks │ │ Ollama │ │ Slack │ │ K8s │ │ Postgres│ │
│ │ Kafka │ │ OpenAI │ │ Discord │ │ Shell │ │ Redis │ │
│ │Prometheus│ │ Claude │ │ Email │ │ MCP │ │ S3 │ │
│ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │
│ │ │ │ │ │ │
│ └─────────────┴─────────────┴─────────────┴─────────────┘ │
│ │ │
│ ┌───────┴───────┐ │
│ │ UI │ │
│ │ ───────────── │ │
│ │ Next.js │ │
│ │ ApprovalCard │ │
│ │ThinkingStream │ │
│ └───────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Module Overview
| Module | Purpose | Key Components |
|---|---|---|
| INPUT | Event ingestion | Prometheus AlertManager, Kafka, Webhooks |
| BRAIN | AI reasoning | Ollama (local), OpenAI, Claude, GraphRAG |
| OUTPUT | Notifications | Slack, Discord, Email, Custom webhooks |
| ACTION | Execution | K8s API, Shell, MCP Bridge, Ansible |
| DATA | Persistence | PostgreSQL, Redis, S3, Vector DB |
| UI | Human interface | Next.js 14, ApprovalCard, ThinkingTerminal |
MCP (Model Context Protocol) Support
// MCP enables AI to safely interact with external tools
await mcpBridge.callTool("kubernetes", "restart_pod", {
  pod_name: "[POD_1]", // Redacted in logs
  namespace: "production",
  graceful: true,
});
// Rehydration happens at execution boundary only
FinOps: Day-1 ROI
Every wasted resource has a dollar sign. AWOOOI shows you exactly how much.
┌─────────────────────────────────────────────────────────────────┐
│ FINOPS COST ANALYSIS │
├─────────────────────────────────────────────────────────────────┤
│ │
│ MONTHLY WASTE DETECTED: $523.60 │
│ │
│ ┌──────────────────┬──────────────────┬──────────────────┐ │
│ │ REALIZABLE │ FREED │ ANNUAL │ │
│ │ $480.00/mo │ $43.60/mo │ $5,760/yr │ │
│ │ ──────────── │ ──────────── │ ──────────── │ │
│ │ PVC deletion │ Pod cleanup │ if all fixed │ │
│ │ Node resize │ (needs scale) │ │ │
│ └──────────────────┴──────────────────┴──────────────────┘ │
│ │
│ TOP RECOMMENDATIONS: │
│ ├─ Delete orphaned PVC 'data-postgres-backup' -$40.00 LOW │
│ ├─ Resize node 'worker-large-01' -$340.00 HIGH│
│ └─ Delete zombie Pod 'legacy-api-5d7b8' -$76.00 MED │
│ │
└─────────────────────────────────────────────────────────────────┘
Scan Types:
- Orphaned PVCs: Storage not mounted by any Pod
- Zombie Pods: CPU < 1% for 7+ consecutive days
- Over-provisioned Nodes: High request, low actual usage
Safety Buffer: wasted = requested - (actual × 1.2) prevents OOM from aggressive recommendations.
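The safety-buffer formula can be written out as a small function. The 1.2 factor comes from the line above; clamping negative results to zero and the `monthly_summary` helper are assumptions, the latter simply reproducing the arithmetic that the cost box appears to follow (monthly waste = realizable + freed, annual = realizable × 12).

```python
import math

SAFETY_FACTOR = 1.2  # 20% headroom over observed usage


def wasted(requested, actual, safety_factor=SAFETY_FACTOR):
    """Reclaimable amount after leaving headroom; never negative (assumed)."""
    return max(0.0, requested - actual * safety_factor)


def monthly_summary(realizable, freed):
    """Totals consistent with the figures shown in the cost box."""
    return {
        "monthly_waste": round(realizable + freed, 2),
        "annual_realizable": round(realizable * 12, 2),
    }


# A node requesting 4.0 cores while using 1.5 keeps 1.8 and frees about 2.2.
assert math.isclose(wasted(4.0, 1.5), 2.2)
# A node already inside the buffer reports no waste.
assert wasted(2.0, 1.9) == 0.0
```

Calling `monthly_summary(480.00, 43.60)` gives 523.6 per month and 5760.0 per year, matching the example panel.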
Quick Start
Prerequisites
- Python 3.11+
- Node.js 18+
- pnpm 8+
- Docker (optional, for local Ollama)
Installation
# Clone the repository
git clone https://github.com/anthropics/awoooi.git
cd awoooi
# Install dependencies
pnpm install
# Setup Python environment
cd apps/api
python -m venv venv
source venv/bin/activate # or `venv\Scripts\activate` on Windows
pip install -r requirements.txt
Run Tracer Bullet 2.0 (E2E Demo)
Experience the full AWOOOI loop in 30 seconds:
cd apps/api
python scripts/tracer_bullet_2.py
Expected Output:
============================================================
TRACER BULLET 2.0 - FULL LOOP TEST
Test ID: tb2-20260319143052
============================================================
[x] [trigger_alert] PASS
[x] [graphrag_analysis] PASS
[x] [generate_approval] PASS
[x] [multisig_approval] PASS
[x] [mcp_execution] PASS
============================================================
TEST SUMMARY
============================================================
Total Steps: 5
Passed: 5
Failed: 0
Status: ALL PASSED
Start Development Servers
# Terminal 1: API Server
cd apps/api
uvicorn src.main:app --reload --port 8000
# Terminal 2: Web Server
cd apps/web
pnpm dev
Open http://localhost:3000 to see the AWOOOI dashboard.
Project Structure
awoooi/
├── apps/
│ ├── api/ # FastAPI Backend
│ │ ├── src/
│ │ │ ├── services/ # Core services
│ │ │ │ ├── approval.py # Multi-Sig engine
│ │ │ │ ├── dry_run.py # Dry-Run engine
│ │ │ │ ├── trust_engine.py # Progressive autonomy
│ │ │ │ └── graph_rag.py # Topology analysis
│ │ │ └── plugins/
│ │ │ ├── security/ # Privacy Shield
│ │ │ ├── mcp/ # MCP Bridge
│ │ │ └── finops/ # Cost analyzer
│ │ └── scripts/
│ │ └── tracer_bullet_2.py # E2E test
│ │
│ └── web/ # Next.js Frontend
│ └── src/
│ ├── components/
│ │ └── agent/
│ │ ├── approval-card.tsx
│ │ └── thinking-terminal.tsx
│ └── stores/
│ └── agent.store.ts
│
├── packages/
│ └── lewooogo-core/ # Shared types & contracts
│
└── docs/
└── adr/ # Architecture Decision Records
Roadmap
| Phase | Status | Description |
|---|---|---|
| Phase 0 | Complete | Contracts & Scaffolding |
| Phase 1 | Complete | Core Integration (Monorepo, SSE, Ollama) |
| Phase 2 | Complete | HITL (ApprovalCard, Dry-Run, Multi-Sig) |
| Phase 3 | Complete | Enterprise (Privacy Shield, GraphRAG, FinOps) |
| Phase 4 | In Progress | Production Hardening & GA Release |
| Phase 5 | Planned | Multi-cluster, Federation, SaaS |
Contributing
We welcome contributions! Please see our Contributing Guide for details.
# Run tests
pnpm test
# Run linting
pnpm lint
# Format code
pnpm format
License
MIT License - see LICENSE for details.
Built with love by 岑洋國際行銷有限公司
Turning 3 AM pages into peaceful nights since 2026
"The best incident is the one you never have to wake up for."
— AWOOOI Philosophy