Your Name e45b055e0e
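feat(governance): deliver the AI governance event-handling chain in four tracks (C/D/B/A)
【Full-scope scan by the 12-expert team + four parallel implementation tracks】

After the commander asked "are the 12 agents actually collaborating?", the full
chain was delivered according to the team rules: onboarder + critic + db-expert +
debugger + frontend-designer scanned in parallel and found 6 major gaps, which
fullstack-engineer × 4 and refactor-specialist then implemented together.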

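【Track C: trust_drift dual-write consolidation】

Two independent code paths wrote event_type=trust_drift without calling each
other, so downstream consumers received duplicate data and could not determine
the source of truth. The consolidation keeps governance_agent.check_trust_drift
(the more complete path: auto-deprecate + Telegram + PG), demotes
TrustDriftDetector to a pure statistics library, and switches the W-6 watchdog
to call governance_agent. Adds TestSinglePgWritePerDriftScenario to verify that
a single drift scenario triggers exactly one PG write.

  Changes:
    - apps/api/src/services/trust_drift_detector.py (lib only; no longer writes to PG)
    - apps/api/tests/test_trust_drift_watchdog.py (W-6 now mocks governance_agent)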

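【Track D: governance_remediation_dispatch dispatch table】

ai_governance_events is an immutable Event Sourcing log and cannot hold
execution state. A new dispatch table acts as a projection layer: 1 event →
0..N dispatches, with state that is mutable, retryable, and auditable.

  - PgEnum with 5 event_type values + a 7-state state machine (pending →
    dispatched → executing → succeeded/failed/cancelled/skipped)
  - A retry after failure INSERTs a new row (the old row's status is left
    unchanged to preserve the audit trail)
  - Partial unique index ux_grd_one_active_per_event enforces "one active
    dispatch per event"
  - 4 composite indexes support worker polling, dedup queries, and the
    observability dashboard
  - FKs to playbooks / incidents / approval_records use SET NULL (avoids
    cascade locks); the FK to ai_governance_events (governance_event) uses RESTRICT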
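
The state machine and the "retry inserts a new row" rule above can be sketched in memory as follows. This is only an illustration under stated assumptions: the commit does not list which transitions are legal, so the ALLOWED table is a guess at the flow it describes, and InMemoryDispatchTable plus the two exception classes here are hypothetical stand-ins that merely borrow names from the repository module, not its actual code.

```python
from enum import Enum
from itertools import count


class Status(str, Enum):
    PENDING = "pending"
    DISPATCHED = "dispatched"
    EXECUTING = "executing"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    CANCELLED = "cancelled"
    SKIPPED = "skipped"


TERMINAL = {Status.SUCCEEDED, Status.FAILED, Status.CANCELLED, Status.SKIPPED}

# Assumed transition table; the commit only gives the overall flow.
ALLOWED = {
    Status.PENDING: {Status.DISPATCHED, Status.CANCELLED, Status.SKIPPED},
    Status.DISPATCHED: {Status.EXECUTING, Status.CANCELLED},
    Status.EXECUTING: {Status.SUCCEEDED, Status.FAILED, Status.CANCELLED},
}


class DispatchAlreadyActive(Exception):
    """Raised when an event already has a non-terminal dispatch."""


class InvalidStatusTransition(Exception):
    """Raised when a status change is not in the ALLOWED table."""


class InMemoryDispatchTable:
    """Hypothetical stand-in for the dispatch table and its repository."""

    def __init__(self):
        self.rows = []
        self._ids = count(1)

    def get_active_for_event(self, event_id):
        # Mirrors the partial unique index: at most one non-terminal row per event.
        for row in self.rows:
            if row["event_id"] == event_id and row["status"] not in TERMINAL:
                return row
        return None

    def create(self, event_id, attempt=1):
        if self.get_active_for_event(event_id) is not None:
            raise DispatchAlreadyActive(event_id)
        row = {"id": next(self._ids), "event_id": event_id,
               "status": Status.PENDING, "attempt": attempt}
        self.rows.append(row)
        return row

    def transition(self, row, new_status):
        if new_status not in ALLOWED.get(row["status"], set()):
            raise InvalidStatusTransition(f"{row['status'].value} -> {new_status.value}")
        row["status"] = new_status

    def retry(self, failed_row):
        # A retry never mutates the failed row; it appends a fresh attempt.
        if failed_row["status"] is not Status.FAILED:
            raise InvalidStatusTransition("only failed dispatches can be retried")
        return self.create(failed_row["event_id"], attempt=failed_row["attempt"] + 1)
```

After a failed attempt, retry() leaves the old row at failed and appends a second row with attempt=2, so the history of every attempt stays queryable.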

  Changes:
    - apps/api/src/db/models.py (GovernanceRemediationDispatch ORM class)
    - apps/api/migrations/governance_remediation_dispatch_2026-05-03.sql
    - apps/api/src/repositories/governance_remediation_dispatch_repo.py
      (6 async functions + 3 custom exceptions: DispatchAlreadyActive /
       InvalidStatusTransition / DispatchNotFound)
    - apps/api/src/models/governance_dispatch.py (4 schemas, including DecisionContextV1)
    - apps/api/tests/test_governance_remediation_dispatch.py (29 tests)

【Track B: /governance page】

Backend PR1 adds three endpoints; frontend PR2-5 adds the complete three-tab page.

PR1 backend:
  - GET /api/v1/ai/governance/events (events_tab, with event_type / severity /
    status / time-range filters + pagination)
  - GET /api/v1/ai/governance/queue (queue_tab, with graceful fallback: when the
    dispatch table does not exist it returns table_pending=True instead of a 500)
  - GET /api/v1/ai/governance/summary (slo_tab, 30d violation time-series chart)
  - severity mapping rules are hard-coded (critic recommends moving them to
    settings later)

PR2-5 frontend:
  - /governance route + AppLayout + Compliance Badge banner + PageTabs
  - SLO Tab: 3 KPI cards (Syne 28px + StatusOrb + 7d sparkline) +
    30d violation stacked BarChart
  - Events Tab: filter bar + table + inline expandable rows (JSON / remediation
    suggestion / dispatch records)
  - Queue Tab: HITL to-do cards + trust progress bar + approve/reject buttons
    (console.log only in this PR)
  - Sidebar gains an "AI 治理" (AI Governance) entry (ShieldCheck icon)
  - Complete bilingual i18n (governance namespace + nav.governance)
  - 7 new components: slo-kpi-card / slo-violation-chart / events-table /
    events-filter-bar / event-detail-drawer / queue-item-card / queue-history-tabs

  Changes:
    - apps/api/src/api/v1/ai_governance.py (router)
    - apps/api/src/services/governance_query_service.py
    - apps/api/src/models/governance.py (Pydantic V2 schemas)
    - apps/api/tests/test_ai_governance_endpoints.py (21 tests)
    - apps/web/src/app/[locale]/governance/ (page + 3 tabs)
    - apps/web/src/components/governance/ (7 components)
    - apps/web/messages/{zh-TW,en}.json (governance namespace)
    - apps/web/src/components/layout/sidebar.tsx (+1 line)
    - apps/api/src/main.py (router include)

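【Track A: GovernanceDispatcher decision fusion】

Connects governance events to the remediation executor through decision fusion
(LLM × Playbook trust × MCP), the north-star direction, in line with the
"no hard-coded rules" iron rule.

  - Design rule: DecisionFusionAdapter is a new wrapper and **does not modify
    any Tier 3 file** (decision_manager / learning_service / trust_engine); it
    only consumes existing APIs
  - Three-input fusion formula: confidence = 0.4×llm + 0.3×playbook_trust + 0.3×mcp_consistency
    (the weights carry a TODO noting that AI self-learning should tune them later)
  - Three-branch decision path:
    confidence ≥ 0.85 → auto_dispatch (status=dispatched)
    0.65 ≤ confidence < 0.85 → pending_approval (HITL)
    confidence < 0.65 → skip + log
  - decision_context JSONB records a full snapshot of the three inputs (for
    future fine-tuning)
  - Polls every 30s for unresolved events, modeled on the governance loop
  - Deduplicates repeated events (by calling get_active_for_event)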
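
The fusion formula and the three decision branches above reduce to a few lines. The sketch below is illustrative only: fuse_confidence and decide are hypothetical names, not the actual DecisionFusionAdapter API, and the input validation (including rejecting NaN) is an added assumption rather than something the commit describes.

```python
# Weights and thresholds as quoted in the commit message.
WEIGHT_LLM = 0.4
WEIGHT_PLAYBOOK_TRUST = 0.3
WEIGHT_MCP_CONSISTENCY = 0.3
AUTO_DISPATCH_THRESHOLD = 0.85
APPROVAL_THRESHOLD = 0.65


def fuse_confidence(llm, playbook_trust, mcp_consistency):
    """Weighted sum of three scores, each expected in [0, 1]."""
    inputs = {
        "llm": llm,
        "playbook_trust": playbook_trust,
        "mcp_consistency": mcp_consistency,
    }
    for name, value in inputs.items():
        # NaN fails both comparisons, so it is rejected here as well.
        if not (0.0 <= value <= 1.0):
            raise ValueError(f"{name} must be within [0, 1], got {value!r}")
    return (WEIGHT_LLM * llm
            + WEIGHT_PLAYBOOK_TRUST * playbook_trust
            + WEIGHT_MCP_CONSISTENCY * mcp_consistency)


def decide(confidence):
    """Map a fused confidence onto one of the three branches."""
    if confidence >= AUTO_DISPATCH_THRESHOLD:
        return "auto_dispatch"
    if confidence >= APPROVAL_THRESHOLD:
        return "pending_approval"
    return "skip"
```

For example, inputs of 0.9 / 0.9 / 0.9 fuse to roughly 0.9 and take the auto_dispatch branch, while 0.2 / 0.5 / 0.5 fuse to roughly 0.38 and are skipped.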

  Changes:
    - apps/api/src/services/governance_dispatcher.py
    - apps/api/src/services/decision_fusion_adapter.py
    - apps/api/tests/test_governance_dispatcher.py (14 tests)
    - apps/api/src/main.py (lifespan task wires in run_governance_dispatcher_loop)

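【Verification】

All 1836 unit tests pass (29 skipped because of a pre-existing PG integration
environment issue)

【Orchestration lessons: recorded in memory】

- vuln-verifier should run **before** fullstack-engineer (running in parallel
  lets it read already-fixed code and misjudge)
- The two-round critic review cannot be skipped (the second round caught the
  NaN sentinel + Prom rule chain reaction)
- The north-star "no hard-coded rules" principle was actually applied, paired
  with decision-fusion

【Tier 3 untouched: verified】

git diff confirms this commit does not change decision_manager.py /
learning_service.py / trust_engine.py at all; it only adds a wrapper service
that consumes existing APIs.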

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 12:42:40 +08:00
2026-03-22 18:57:44 +08:00

     █████╗ ██╗    ██╗ ██████╗  ██████╗  ██████╗ ██╗
    ██╔══██╗██║    ██║██╔═══██╗██╔═══██╗██╔═══██╗██║
    ███████║██║ █╗ ██║██║   ██║██║   ██║██║   ██║██║
    ██╔══██║██║███╗██║██║   ██║██║   ██║██║   ██║██║
    ██║  ██║╚███╔███╔╝╚██████╔╝╚██████╔╝╚██████╔╝██║
    ╚═╝  ╚═╝ ╚══╝╚══╝  ╚═════╝  ╚═════╝  ╚═════╝ ╚═╝

Zero-Touch Ops. Human-Centric Decisions.

AI-Powered Intelligent Operations Platform

License: MIT · Python 3.11+ · Next.js 14 · TypeScript

Demo · Documentation · Contributing


The Future of Operations is Here

When your system breaks at 3 AM, AWOOOI doesn't just alert you—it analyzes the blast radius, calculates how much money you're burning, and presents a one-click fix. You approve. It executes. You go back to sleep.

┌─────────────────────────────────────────────────────────────────────────────┐
│                                                                             │
│   ALERT: frontend 5xx rate > 15%                                            │
│                                                                             │
│   ┌─────────────┐      ┌─────────────┐      ┌─────────────┐                │
│   │  GraphRAG   │ ──▶  │  Dry-Run    │ ──▶  │  Multi-Sig  │                │
│   │  Analysis   │      │  Simulation │      │  Approval   │                │
│   └─────────────┘      └─────────────┘      └─────────────┘                │
│         │                    │                    │                        │
│         ▼                    ▼                    ▼                        │
│   Root Cause:          Blast Radius:        [x] devops-alice               │
│   postgres-db          1 pod, 0 data loss   [x] sre-bob                    │
│                                                                             │
│   Monthly Savings: $523.60 if fixed                                         │
│                                                                             │
│   [ APPROVE & EXECUTE ]                                                     │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

AWOOOI (AI + WOOO Intelligent Operations) transforms reactive firefighting into proactive, AI-assisted decision-making—while keeping humans firmly in control of critical actions.


Enterprise Moats

Four pillars that make AWOOOI enterprise-ready from Day 1:

Privacy Shield

Your PII never leaves your premises. Period.

# Before: Raw sensitive data
"User 192.168.1.100 with email admin@company.com triggered alert"

# After: Consistent pseudonymization
"User [IP_1] with email [EMAIL_1] triggered alert"
# Same value → Same label (AI maintains context without seeing real data)

  • Regex-based detection: IP, Email, UUID, API Keys, JWT
  • Consistent hashing: [IP_1] always maps to the same IP within a session
  • Rehydration Engine: Labels restored only at MCP execution boundary
  • Zero PII in logs, zero PII to cloud LLMs
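
A minimal sketch of the redact-then-rehydrate cycle described above, assuming only two of the listed patterns (IP and email). The Pseudonymizer class and its regexes are illustrative, not the project's Privacy Shield plugin, and the sketch swaps the "consistent hashing" mentioned above for a simple per-session lookup table with counters, which gives the same "same value, same label" behavior.

```python
import re

# Simplified patterns for illustration; production detectors would be stricter.
PATTERNS = {
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


class Pseudonymizer:
    """Session-scoped mapping between sensitive values and stable labels."""

    def __init__(self):
        self._label_by_value = {}
        self._value_by_label = {}
        self._counters = {}

    def _label_for(self, kind, value):
        key = (kind, value)
        if key not in self._label_by_value:
            self._counters[kind] = self._counters.get(kind, 0) + 1
            label = f"[{kind}_{self._counters[kind]}]"
            self._label_by_value[key] = label
            self._value_by_label[label] = value
        return self._label_by_value[key]

    def redact(self, text):
        # Replace every match with its label; repeated values reuse the label.
        for kind, pattern in PATTERNS.items():
            text = pattern.sub(
                lambda match, kind=kind: self._label_for(kind, match.group(0)), text
            )
        return text

    def rehydrate(self, text):
        # Restore real values; intended to run only at the execution boundary.
        for label, value in self._value_by_label.items():
            text = text.replace(label, value)
        return text


shield = Pseudonymizer()
redacted = shield.redact(
    "User 192.168.1.100 with email admin@company.com triggered alert"
)
print(redacted)  # User [IP_1] with email [EMAIL_1] triggered alert
```

Because the mapping lives on the instance, discarding the Pseudonymizer at the end of a session also discards the only way to reverse the labels.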

GraphRAG: Topology-Aware Intelligence

AI that understands your microservices like a senior SRE.

                    ┌─────────────────────────────────────┐
                    │         BLAST RADIUS ANALYSIS       │
                    │         (Upstream Impact)           │
                    └─────────────────────────────────────┘

                         ┌─────────────┐
                         │   ingress   │  ← Will be affected
                         └──────┬──────┘
                                │ depends on
                                ▼
                         ┌─────────────┐
                         │  frontend   │  ← Target service
                         └──────┬──────┘
                                │ calls
                                ▼
        ┌───────────────────────┼───────────────────────┐
        │                       │                       │
        ▼                       ▼                       ▼
┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│ auth-service │      │ product-api  │      │  order-api   │
└──────┬───────┘      └──────┬───────┘      └──────┬───────┘
       │                     │                     │
       └─────────────────────┼─────────────────────┘
                             ▼
                    ┌──────────────┐
                    │ postgres-db  │ X ROOT CAUSE
                    └──────────────┘

  • BFS-based traversal with configurable max_depth (default: 3)
  • Dual-direction analysis: Upstream (blast radius) + Downstream (root cause)
  • Priority ranking: DATABASE > CACHE > QUEUE for root cause identification
  • Multiple root causes: No single-point assumptions—collect ALL unhealthy dependencies
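
The traversal described above can be sketched with a plain breadth-first search over the example topology from the diagram. Everything here is an assumption made for illustration: the DEPENDS_ON adjacency map, the KIND labels, and the function names are hypothetical, and the real graph_rag.py may be organized quite differently.

```python
from collections import deque

# "A depends on B" edges taken from the diagram above.
DEPENDS_ON = {
    "ingress": ["frontend"],
    "frontend": ["auth-service", "product-api", "order-api"],
    "auth-service": ["postgres-db"],
    "product-api": ["postgres-db"],
    "order-api": ["postgres-db"],
    "postgres-db": [],
}
KIND = {"postgres-db": "DATABASE"}  # unlisted nodes are treated as plain services
PRIORITY = {"DATABASE": 0, "CACHE": 1, "QUEUE": 2}


def bfs(start, neighbors, max_depth=3):
    """Return nodes reachable from start within max_depth hops, in BFS order."""
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in neighbors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append((nxt, depth + 1))
    return order


def invert(graph):
    """Flip edge direction so BFS can walk from a service to its dependents."""
    reverse = {node: [] for node in graph}
    for src, targets in graph.items():
        for dst in targets:
            reverse.setdefault(dst, []).append(src)
    return reverse


def blast_radius(target, max_depth=3):
    """Upstream: services that directly or transitively depend on target."""
    return bfs(target, invert(DEPENDS_ON), max_depth)


def root_causes(target, unhealthy, max_depth=3):
    """Downstream: ALL unhealthy dependencies, ranked DATABASE > CACHE > QUEUE."""
    candidates = [n for n in bfs(target, DEPENDS_ON, max_depth) if n in unhealthy]
    return sorted(
        candidates, key=lambda n: PRIORITY.get(KIND.get(n, "SERVICE"), len(PRIORITY))
    )


print(blast_radius("frontend"))                             # ['ingress']
print(root_causes("frontend", {"postgres-db", "order-api"}))  # ['postgres-db', 'order-api']
```

Returning every unhealthy dependency, sorted rather than truncated, reflects the "no single-point assumptions" bullet: the database is listed first, but order-api is still reported.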

Multi-Sig & Dry-Run: Defense in Depth

Every critical action is simulated, validated, and co-signed.

┌────────────────────────────────────────────────────────────────┐
│                      RISK MATRIX                               │
├────────────┬─────────────┬─────────────────────────────────────┤
│ Risk Level │ Signatures  │ Required Roles                      │
├────────────┼─────────────┼─────────────────────────────────────┤
│ LOW        │ 0 (auto)    │ —                                   │
│ MEDIUM     │ 1           │ admin, devops, sre                  │
│ HIGH       │ 2           │ admin, devops, sre                  │
│ CRITICAL   │ 2           │ CTO + CISO (mandatory)              │
└────────────┴─────────────┴─────────────────────────────────────┘
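
As a rough sketch, the matrix above can be expressed as a lookup table plus a signature check. The is_approved helper and the signer-to-role dictionary are hypothetical, and the sketch reads "CTO + CISO (mandatory)" as requiring one signature from each of the two roles, which is an interpretation rather than something the table spells out.

```python
# risk level -> (required signature count, roles allowed to sign)
RISK_MATRIX = {
    "LOW": (0, set()),
    "MEDIUM": (1, {"admin", "devops", "sre"}),
    "HIGH": (2, {"admin", "devops", "sre"}),
    "CRITICAL": (2, {"CTO", "CISO"}),
}


def is_approved(risk_level, signatures):
    """signatures maps signer name -> role, so each signer counts at most once."""
    required, allowed_roles = RISK_MATRIX[risk_level]
    if required == 0:
        return True  # LOW risk is auto-approved
    valid_roles = [role for role in signatures.values() if role in allowed_roles]
    if risk_level == "CRITICAL":
        # Assumed reading: both mandatory roles must be present.
        return allowed_roles <= set(valid_roles)
    return len(valid_roles) >= required


print(is_approved("HIGH", {"devops-alice": "devops", "sre-bob": "sre"}))  # True
print(is_approved("CRITICAL", {"carol": "CTO", "dave": "admin"}))         # False
```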

TOCTOU Protection (Time-of-Check to Time-of-Use):

1. User clicks "Approve"
2. System re-runs Dry-Run immediately before execution
3. If state changed → Status = VOIDED (not cleared!)
4. Full audit trail preserved for compliance
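
The four steps above amount to re-validating at the moment of use. A minimal sketch, assuming the dry-run result can be compared as a plain dictionary: the Approval dataclass, the injected run_dry_run / execute callables, and the PENDING and EXECUTED status names are hypothetical (only VOIDED comes from the text above).

```python
from dataclasses import dataclass, field


@dataclass
class Approval:
    action: str
    dry_run_snapshot: dict          # dry-run result the approver actually reviewed
    status: str = "PENDING"
    audit_log: list = field(default_factory=list)


def approve_and_execute(approval, run_dry_run, execute):
    """Re-run the dry-run right before executing; void the approval on drift."""
    approval.audit_log.append("approved")
    fresh = run_dry_run(approval.action)
    if fresh != approval.dry_run_snapshot:
        # State changed between check and use: keep the record, but never execute.
        approval.status = "VOIDED"
        approval.audit_log.append("voided: state changed since review")
        return approval
    execute(approval.action)
    approval.status = "EXECUTED"
    approval.audit_log.append("executed")
    return approval


executed = []
reviewed = {"pods_affected": 1, "data_loss": False}

ok = approve_and_execute(
    Approval("restart_pod", dict(reviewed)), lambda action: dict(reviewed), executed.append
)
drifted = approve_and_execute(
    Approval("restart_pod", dict(reviewed)),
    lambda action: {"pods_affected": 3, "data_loss": False},
    executed.append,
)
print(ok.status, drifted.status)  # EXECUTED VOIDED
```

The voided approval keeps its audit_log entries instead of being deleted, which is the "not cleared" behavior step 3 calls for.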

Dry-Run Checks:

  • RBAC Permission validation
  • Syntax & parameter validation
  • Resource existence verification
  • PodDisruptionBudget compliance
  • Blast radius calculation

Progressive Autonomy: Trust That Evolves

The more you approve, the less you need to.

┌─────────────────────────────────────────────────────────────────┐
│                    TRUST SCORE PROGRESSION                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Score: 0 ──────────────────────────────────────────────▶ 10+  │
│         │                    │                          │       │
│         ▼                    ▼                          ▼       │
│    ┌─────────┐         ┌─────────┐              ┌─────────┐    │
│    │  HIGH   │   ──▶   │ MEDIUM  │    ──▶      │   LOW   │    │
│    │ 2-sig   │  @10    │  1-sig  │    @5       │  auto   │    │
│    └─────────┘         └─────────┘              └─────────┘    │
│                                                                 │
│  ⚠️  CRITICAL operations NEVER auto-downgrade (enterprise law) │
│                                                                 │
│  Single REJECT → Trust score resets to 0 (instant collapse)    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

  • Approve → +1 trust score
  • Reject → Score resets to 0 (trust collapses instantly)
  • Pattern-based: restart_pod:nginx-* builds trust separately from delete_pvc:*
  • CRITICAL operations (DROP TABLE, DELETE NAMESPACE) → Always requires human dual-signature
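
The scoring rules above can be sketched as a small per-pattern ledger. This is a guess at the mechanics, not the project's trust_engine.py: TrustLedger is a hypothetical class, the thresholds follow one reading of the diagram (HIGH drops to MEDIUM at a score of 10, MEDIUM drops to LOW at a score of 5), and the sketch assumes an operation is downgraded by at most one level.

```python
# Assumed reading of the diagram: score needed before a level is downgraded.
DOWNGRADE_AT = {"HIGH": 10, "MEDIUM": 5}
NEXT_LEVEL = {"HIGH": "MEDIUM", "MEDIUM": "LOW"}


class TrustLedger:
    """Tracks one trust score per action pattern, e.g. 'restart_pod:nginx-*'."""

    def __init__(self):
        self.scores = {}

    def record(self, pattern, approved):
        if approved:
            self.scores[pattern] = self.scores.get(pattern, 0) + 1
        else:
            self.scores[pattern] = 0  # a single reject collapses trust

    def effective_risk(self, pattern, base_risk):
        if base_risk == "CRITICAL":
            return "CRITICAL"  # never auto-downgraded
        threshold = DOWNGRADE_AT.get(base_risk)
        if threshold is not None and self.scores.get(pattern, 0) >= threshold:
            return NEXT_LEVEL[base_risk]
        return base_risk


ledger = TrustLedger()
for _ in range(5):
    ledger.record("restart_pod:nginx-*", approved=True)

print(ledger.effective_risk("restart_pod:nginx-*", "MEDIUM"))  # LOW
print(ledger.effective_risk("delete_pvc:*", "MEDIUM"))         # MEDIUM
```

Keeping scores keyed by pattern is what lets trust earned on restart_pod:nginx-* stay separate from delete_pvc:*.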

leWOOOgo Engine Architecture

AWOOOI is built on the leWOOOgo Engine—a modular, plugin-based architecture inspired by LEGO blocks:

┌─────────────────────────────────────────────────────────────────────────────┐
│                           leWOOOgo Engine                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐     │
│   │  INPUT  │   │  BRAIN  │   │ OUTPUT  │   │ ACTION  │   │  DATA   │     │
│   │ ─────── │   │ ─────── │   │ ─────── │   │ ─────── │   │ ─────── │     │
│   │Webhooks │   │ Ollama  │   │  Slack  │   │   K8s   │   │ Postgres│     │
│   │  Kafka  │   │ OpenAI  │   │ Discord │   │  Shell  │   │  Redis  │     │
│   │Prometheus│   │ Claude  │   │  Email  │   │   MCP   │   │  S3     │     │
│   └────┬────┘   └────┬────┘   └────┬────┘   └────┬────┘   └────┬────┘     │
│        │             │             │             │             │           │
│        └─────────────┴─────────────┴─────────────┴─────────────┘           │
│                                    │                                        │
│                            ┌───────┴───────┐                               │
│                            │      UI       │                               │
│                            │ ───────────── │                               │
│                            │   Next.js     │                               │
│                            │ ApprovalCard  │                               │
│                            │ThinkingStream │                               │
│                            └───────────────┘                               │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Module Overview

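┌────────┬─────────────────┬────────────────────────────────────────────┐
│ Module │ Purpose         │ Key Components                             │
├────────┼─────────────────┼────────────────────────────────────────────┤
│ INPUT  │ Event ingestion │ Prometheus AlertManager, Kafka, Webhooks   │
│ BRAIN  │ AI reasoning    │ Ollama (local), OpenAI, Claude, GraphRAG   │
│ OUTPUT │ Notifications   │ Slack, Discord, Email, Custom webhooks     │
│ ACTION │ Execution       │ K8s API, Shell, MCP Bridge, Ansible        │
│ DATA   │ Persistence     │ PostgreSQL, Redis, S3, Vector DB           │
│ UI     │ Human interface │ Next.js 14, ApprovalCard, ThinkingTerminal │
└────────┴─────────────────┴────────────────────────────────────────────┘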

MCP (Model Context Protocol) Support

// MCP enables AI to safely interact with external tools
await mcpBridge.callTool("kubernetes", "restart_pod", {
  pod_name: "[POD_1]",      // Redacted in logs
  namespace: "production",
  graceful: true,
});
// Rehydration happens at execution boundary only

FinOps: Day-1 ROI

Every wasted resource has a dollar sign. AWOOOI shows you exactly how much.

┌─────────────────────────────────────────────────────────────────┐
│                    FINOPS COST ANALYSIS                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   MONTHLY WASTE DETECTED: $523.60                               │
│                                                                 │
│   ┌──────────────────┬──────────────────┬──────────────────┐   │
│   │   REALIZABLE     │      FREED       │     ANNUAL       │   │
│   │   $480.00/mo     │    $43.60/mo     │   $5,760/yr      │   │
│   │   ────────────   │   ────────────   │   ────────────   │   │
│   │   PVC deletion   │   Pod cleanup    │   if all fixed   │   │
│   │   Node resize    │   (needs scale)  │                  │   │
│   └──────────────────┴──────────────────┴──────────────────┘   │
│                                                                 │
│   TOP RECOMMENDATIONS:                                          │
│   ├─ Delete orphaned PVC 'data-postgres-backup'    -$40.00 LOW │
│   ├─ Resize node 'worker-large-01'                -$340.00 HIGH│
│   └─ Delete zombie Pod 'legacy-api-5d7b8'          -$76.00 MED │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Scan Types:

  • Orphaned PVCs: Storage not mounted by any Pod
  • Zombie Pods: CPU < 1% for 7+ consecutive days
  • Over-provisioned Nodes: High request, low actual usage

Safety Buffer: wasted = requested - (actual × 1.2). The 20% headroom above actual usage keeps recommendations from being aggressive enough to cause OOM.
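
A small sketch of the safety-buffer formula and the zombie-pod rule from the scan list above. The function names are hypothetical, clamping the result at zero is an added assumption (the formula alone would go negative when usage is close to the request), and "7+ consecutive days" is read here as the seven most recent daily samples.

```python
SAFETY_FACTOR = 1.2  # keep 20% headroom above observed usage


def wasted(requested, actual, safety_factor=SAFETY_FACTOR):
    """Reclaimable amount after reserving headroom; never negative (assumption)."""
    return max(0.0, requested - actual * safety_factor)


def is_zombie(daily_cpu_percent, threshold=1.0, min_days=7):
    """True when the most recent min_days daily samples are all below threshold."""
    recent = daily_cpu_percent[-min_days:]
    return len(recent) >= min_days and all(value < threshold for value in recent)


# 4 cores requested, 1 core used: 4 - 1 * 1.2 leaves about 2.8 cores reclaimable.
print(wasted(4.0, 1.0))
print(is_zombie([0.4, 0.2, 0.1, 0.3, 0.5, 0.2, 0.1]))  # True
```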


Quick Start

Prerequisites

  • Python 3.11+
  • Node.js 18+
  • pnpm 8+
  • Docker (optional, for local Ollama)

Installation

# Clone the repository
git clone https://github.com/anthropics/awoooi.git
cd awoooi

# Install dependencies
pnpm install

# Setup Python environment
cd apps/api
python -m venv venv
source venv/bin/activate  # or `venv\Scripts\activate` on Windows
pip install -r requirements.txt

Run Tracer Bullet 2.0 (E2E Demo)

Experience the full AWOOOI loop in 30 seconds:

cd apps/api
python scripts/tracer_bullet_2.py

Expected Output:

============================================================
TRACER BULLET 2.0 - FULL LOOP TEST
Test ID: tb2-20260319143052
============================================================

[x] [trigger_alert] PASS
[x] [graphrag_analysis] PASS
[x] [generate_approval] PASS
[x] [multisig_approval] PASS
[x] [mcp_execution] PASS

============================================================
TEST SUMMARY
============================================================
  Total Steps: 5
  Passed: 5
  Failed: 0
  Status: ALL PASSED

Start Development Servers

# Terminal 1: API Server
cd apps/api
uvicorn src.main:app --reload --port 8000

# Terminal 2: Web Server
cd apps/web
pnpm dev

Open http://localhost:3000 to see the AWOOOI dashboard.


Project Structure

awoooi/
├── apps/
│   ├── api/                    # FastAPI Backend
│   │   ├── src/
│   │   │   ├── services/       # Core services
│   │   │   │   ├── approval.py     # Multi-Sig engine
│   │   │   │   ├── dry_run.py      # Dry-Run engine
│   │   │   │   ├── trust_engine.py # Progressive autonomy
│   │   │   │   └── graph_rag.py    # Topology analysis
│   │   │   └── plugins/
│   │   │       ├── security/       # Privacy Shield
│   │   │       ├── mcp/            # MCP Bridge
│   │   │       └── finops/         # Cost analyzer
│   │   └── scripts/
│   │       └── tracer_bullet_2.py  # E2E test
│   │
│   └── web/                    # Next.js Frontend
│       └── src/
│           ├── components/
│           │   └── agent/
│           │       ├── approval-card.tsx
│           │       └── thinking-terminal.tsx
│           └── stores/
│               └── agent.store.ts
│
├── packages/
│   └── lewooogo-core/          # Shared types & contracts
│
└── docs/
    └── adr/                    # Architecture Decision Records

Roadmap

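┌─────────┬─────────────┬───────────────────────────────────────────────┐
│ Phase   │ Status      │ Description                                   │
├─────────┼─────────────┼───────────────────────────────────────────────┤
│ Phase 0 │ Complete    │ Contracts & Scaffolding                       │
│ Phase 1 │ Complete    │ Core Integration (Monorepo, SSE, Ollama)      │
│ Phase 2 │ Complete    │ HITL (ApprovalCard, Dry-Run, Multi-Sig)       │
│ Phase 3 │ Complete    │ Enterprise (Privacy Shield, GraphRAG, FinOps) │
│ Phase 4 │ In Progress │ Production Hardening & GA Release             │
│ Phase 5 │ Planned     │ Multi-cluster, Federation, SaaS               │
└─────────┴─────────────┴───────────────────────────────────────────────┘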
Contributing

We welcome contributions! Please see our Contributing Guide for details.

# Run tests
pnpm test

# Run linting
pnpm lint

# Format code
pnpm format

License

MIT License - see LICENSE for details.


Built with love by 岑洋國際行銷有限公司

Turning 3 AM pages into peaceful nights since 2026

    "The best incident is the one you never have to wake up for."
                                        — AWOOOI Philosophy
Description
AWOOOI - AI Operations Platform (Mirror from GitHub)