
Your Name 13e51802fe feat(awooop): all Phase 0 ADRs + Phase 1 control plane schema (including four critic fixes)

## Phase 0 (documentation layer, all Accepted)
- ADR-106/107: AwoooP platform architecture + storage strategy
- ADR-111~118: seven core ADRs, Bootstrap → RLS
- ADR-119~124: six ADRs, SAGA → Singleton Decomposition
- ADR-UI-01~04: four Operator Console UI ADRs

## Phase 1 (DB schema + migration)
- awooop_phase1_control_plane_2026-05-04.sql: 7 new tables + trigger + RLS
  - Step 1: three roles (platform_admin/migration with BYPASSRLS; awooop_app subject to RLS)
  - Step 13: GRANT least privilege to awooop_app (7 statements)
  - Step 14: RLS fail-closed; remove the __platform__ backdoor
- awooop_phase1_batch1_rls_2026-05-04.sql: three-step ADD COLUMN on four high-traffic tables
- awooop_phase1_batch1_backfill.py: batched backfill script using SKIP LOCKED
- awooop_models.py: 7 SQLAlchemy 2.x models

## Critic fixes (4 Critical + 3 Major)
- C-1: ADD CONSTRAINT IF NOT EXISTS → DO block + pg_constraint lookup
- C-2: string list in __mapper_args__ → primary_key=True on mapped_column
- C-3: __platform__ RLS backdoor → removed entirely, replaced by a BYPASSRLS role
- C-4: awooop_app role was never created → Step 1 + 7 GRANT statements
- M-1: active_pointer_guard SECURITY DEFINER (cross-tenant protection under FORCE RLS)
- M-2: idempotency guard added to pg_partman create_parent
- M-3: immutability trigger now also protects identity columns (project_id/family/contract_id)

## Task 1.2 fixes
- agent_loader.py: hardcoded Mac path → AGENTS_DIR environment variable
- Dockerfile: add missing COPY .claude/agents/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-04 13:37:11 +08:00

17 KiB


ADR-106: AwoooP Agent Platform Architecture and Migration Strategy

Status: Accepted
Date: 2026-05-01
Scope: Multi-tenant Agent Platform, Agent contracts, MCP Gateway, runtime state, channel events, migration strategy

Context

AWOOOI currently contains the strongest AI automation implementation in the ecosystem: OpenClaw/NemoTron/Hermes/ElephantAlpha roles, Agent Loop foundations, MCP providers, Telegram workflows, model routing, cost guards, Playbook learning, and operational audit trails.

Other products already have adjacent AI or messaging surfaces:

EwoooC / MOMO PRO has business analysis bots, ElephantAlpha orchestration, market-intelligence tools, LINE/Telegram/Email notification paths, and local AI provider selection.
Tsenyang already has Telegram webhook capability.
Bitan and future products need repeatable AI onboarding without copying AWOOOI internals.

The old choice set was too narrow:

A pure centralized HTTP hub solves governance but creates a single point of failure and makes AWOOOI look like every product's private brain.
A shared SDK reduces duplicated code but cannot solve tenant isolation, identity, budget, channel, MCP credential, or cross-project audit problems.
A light configuration plane helps routing drift but still leaves tool use, session state, and channel handling scattered across projects.

The approved direction is a fourth path:

Build AwoooP, a multi-tenant Agent Platform. AWOOOI is the first and largest tenant and first runtime host, not the platform boundary itself.

This ADR records the architecture and migration strategy only. It does not authorize runtime code changes, provider cost changes, destructive operations, or channel cutovers.

Numbering Note

ADR-105-revert-a2-ollama-primary.md previously reserved ADR-106 as a placeholder for model-catalog cleanup. No checked-in ADR-106*.md existed. This ADR consumes the ADR-106 number for the broader Agent Platform decision. The model-catalog and dynamic-routing debt is folded into the Policy / Routing contract below and should be implemented under this platform roadmap or a later non-conflicting ADR number.

Decision

D0 - Name the Platform AwoooP

The platform product name is AwoooP.

Naming rules:

Product name: AwoooP
Repository/package slug: awooop
Project/tenant id for the existing AWOOOI product remains awoooi
AwoooP is the platform boundary; AWOOOI is a tenant and initial runtime host

Do not create empty project directories just for the name. Create awooop runtime/package directories only when a concrete implementation phase owns code, schemas, clients, or workers.

Recommended future layout when implementation begins:

| Purpose | Path |
| --- | --- |
| Shared contract schemas | `packages/awooop-contracts/` |
| Python/TS client SDK | `packages/awooop-client/` |
| Platform API/runtime shell | `apps/awooop-runtime/` |
| Async run workers | `apps/awooop-worker/` |
| Detailed schema docs | `docs/awooop/` |

D1 - Adopt AwoooP as a Six-Plane Platform

AwoooP is defined as six cooperating planes:

| Plane | Responsibility | Must not do |
| --- | --- | --- |
| Project / Tenant | Tenant identity, isolation, budget, channel allowance, ACL, migration mode | Store agent prompt details |
| Agent | Agent identity, version, role, I/O contract, safety ceiling, context domain | Decide concrete model/provider credentials |
| MCP Gateway | Tool authorization, credential resolution, rate limits, approval, result sanitization, audit | Expose raw credentials to agents |
| Policy / Routing | Effective model/provider route, fallback, privacy ceiling, budget gate, generation defaults | Bypass tenant hard stops |
| Runtime / Run State | Run lifecycle, async state machine, shadow/canary/active mode, checkpoint/resume | Treat long tasks as simple HTTP request-response |
| Communication / Channel Event | Telegram/LINE/Slack/Email/API receive, verify, normalize, send | Run LLM inference or call MCP directly |

D2 - Require Platform Envelope Fields

Every platform invocation must carry the following envelope fields:

project_id
environment_id when environment-specific resources are involved
agent_id
agent_version after resolution
session_id
trace_id
run_id for runtime-managed work
policy_version after effective policy resolution

Missing project_id, trace_id, or run_id in runtime or MCP paths is a hard reject, not a warning.
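A minimal sketch of the D2 hard-reject rule, assuming a Python runtime. The `Envelope` dataclass, `EnvelopeRejected` error, and path names are hypothetical; this ADR defines none of them. The function raises rather than warns when `project_id`, `trace_id`, or `run_id` is missing on a runtime or MCP path:

```python
from dataclasses import dataclass
from typing import Optional


class EnvelopeRejected(Exception):
    """Hypothetical error type; the ADR does not name one."""


@dataclass(frozen=True)
class Envelope:
    # Hypothetical shape mirroring the D2 field list.
    project_id: Optional[str]
    agent_id: Optional[str]
    session_id: Optional[str]
    trace_id: Optional[str]
    run_id: Optional[str] = None          # required for runtime-managed work
    environment_id: Optional[str] = None  # only when environment-specific resources are involved
    agent_version: Optional[str] = None   # filled in after agent resolution
    policy_version: Optional[str] = None  # filled in after effective policy resolution


# Fields whose absence is a hard reject on runtime and MCP paths (D2).
HARD_REQUIRED = ("project_id", "trace_id", "run_id")
STRICT_PATHS = {"runtime", "mcp"}  # assumed path labels


def validate_envelope(env: Envelope, path: str) -> Envelope:
    """Raise instead of logging a warning when a hard-required field is missing."""
    if path in STRICT_PATHS:
        missing = [name for name in HARD_REQUIRED if not getattr(env, name)]
        if missing:
            raise EnvelopeRejected(f"{path} path: missing {', '.join(missing)}")
    return env
```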

D3 - Treat Project as the Smallest Isolation Unit

project_id is the platform tenant boundary. All Redis keys, session stores, RAG namespaces, KM queries, MCP tool scopes, budget ledgers, ACL checks, and channel routing must be project-scoped unless a resource is explicitly declared as platform_resource.

global:* resources are platform resources, not AWOOOI resources.
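One way to enforce D3 at the key-building layer is sketched below, with stated assumptions. The `project:{project_id}:` tenant prefix and the `scoped_key` helper are hypothetical; only the `global:` prefix for platform resources comes from this ADR. A tenant key without a `project_id` is refused instead of silently falling back to a shared namespace:

```python
from typing import Optional


def scoped_key(kind: str, name: str, *, project_id: Optional[str] = None,
               platform_resource: bool = False) -> str:
    """Hypothetical helper: project-scoped key unless explicitly a platform resource."""
    if platform_resource:
        if project_id is not None:
            raise ValueError("platform resources must not carry a project_id")
        return f"global:{kind}:{name}"
    if not project_id:
        # Tenant data without a project_id could leak across tenants; refuse it.
        raise ValueError("tenant resources require a project_id")
    if ":" in project_id:
        # Keep the prefix unambiguous so one tenant cannot forge another's key space.
        raise ValueError("project_id must not contain ':'")
    # Assumed tenant prefix format; "project:global:..." never collides with "global:...".
    return f"project:{project_id}:{kind}:{name}"
```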

D4 - Define Agents as Platform Capability Modules

Agents are not product-local prompt strings. Agents are versioned platform capabilities with:

agent_id
base_agent_ref
immutable published version
role and capability tags
privacy ceiling
payload input schema
LLM output schema
prompt artifact reference and hash
context domain constraints
MCP requirement declarations
execution profile
eval and lifecycle governance

Specialized agents must inherit from base agents without loosening safety:

openclaw-core
openclaw-sre
openclaw-biz
future openclaw-pharmacy
future openclaw-marketing

Published agent artifacts are immutable. Prompt, schema, or contract changes must publish a new version.
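The immutability rule can be enforced mechanically by storing content hashes at publish time. The sketch below uses a hypothetical in-memory `AgentRegistry` that is not part of this ADR, and SHA-256 over the prompt text and the canonical JSON of the output schema. Re-publishing identical content is a no-op. Any change under an existing version is rejected:

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class PublishedAgentVersion:
    agent_id: str
    version: str
    prompt_sha256: str
    output_schema_sha256: str


def sha256_text(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def schema_sha256(schema: Dict[str, Any]) -> str:
    # Canonical JSON (sorted keys, no whitespace) so equal schemas hash equally.
    return sha256_text(json.dumps(schema, sort_keys=True, separators=(",", ":")))


class AgentRegistry:
    """Hypothetical in-memory stand-in for a versioned agent store."""

    def __init__(self) -> None:
        self._published: Dict[Tuple[str, str], PublishedAgentVersion] = {}

    def publish(self, agent_id: str, version: str, prompt: str,
                output_schema: Dict[str, Any]) -> PublishedAgentVersion:
        record = PublishedAgentVersion(agent_id, version, sha256_text(prompt),
                                       schema_sha256(output_schema))
        existing = self._published.get((agent_id, version))
        if existing is not None:
            if existing != record:
                raise ValueError(f"{agent_id}@{version} is published and immutable; "
                                 "publish a new version instead")
            return existing  # re-publishing identical content is a no-op
        self._published[(agent_id, version)] = record
        return record
```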

D5 - Use MCP Gateway Instead of Direct Tool Calls

Agents never call MCP servers directly and never see raw credentials. Every tool call goes through MCP Gateway.

Gateway authorization is the intersection of:

Project grant AND Agent requirement AND Tool contract AND Environment boundary AND Approval state

Tool results entering model context must be sanitized first. Tool calls must be traceable through trace_id and run_id.
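The Gateway intersection in D5 reduces to a conjunction of independent checks. The minimal sketch below assumes grants are modelled as `(tool_id, verb)` pairs and that the approval state has already been evaluated to a boolean. All names, including the `k8s.pods` tool used in the usage example, are illustrative. Returning the names of the failing layers gives the audit trail something concrete to record:

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Grant = Tuple[str, str]  # (tool_id, verb); an assumed representation


@dataclass(frozen=True)
class ToolRequest:
    project_id: str
    agent_id: str
    environment_id: str
    tool_id: str
    verb: str


def authorize(req: ToolRequest,
              project_grants: Dict[str, Set[Grant]],
              agent_requirements: Dict[str, Set[Grant]],
              tool_verbs: Dict[str, Set[str]],
              tool_environments: Dict[str, Set[str]],
              approval_satisfied: bool) -> Tuple[bool, List[str]]:
    """Allow only if every layer agrees; return failing layer names for audit."""
    grant = (req.tool_id, req.verb)
    checks = {
        "project_grant": grant in project_grants.get(req.project_id, set()),
        "agent_requirement": grant in agent_requirements.get(req.agent_id, set()),
        "tool_contract": req.verb in tool_verbs.get(req.tool_id, set()),
        "environment_boundary": req.environment_id in tool_environments.get(req.tool_id, set()),
        "approval_state": approval_satisfied,
    }
    failed = [name for name, ok in checks.items() if not ok]
    return (not failed, failed)
```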

D6 - Resolve EffectivePolicy Before Every Model Call

Provider and model selection is not embedded inside Agent Contract. Runtime must derive an EffectivePolicy before each model call from:

platform policy
agent safety ceiling
project policy and budget
environment constraints
channel constraints
intent and complexity
current provider health

Budget hard stop overrides all fallback behavior. Cloud fallback must include a reason and be written to trace/audit. Any policy change that increases cost, enables a new paid provider, or raises token ceilings still requires explicit human approval under docs/HARD_RULES.md.
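The sketch below shows one way the budget hard stop could take precedence over fallback. It assumes a route plan is an ordered list of catalog refs, each tagged as local or cloud. `RouteStep`, `choose_route`, and the `catalog:` refs are hypothetical. Any step after the first counts as a fallback and carries a reason string for trace/audit:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Set


@dataclass(frozen=True)
class RouteStep:
    model_ref: str  # catalog ref (hypothetical format), never a raw provider credential
    is_cloud: bool


@dataclass
class RouteDecision:
    allowed: bool
    model_ref: Optional[str] = None
    fallback_reason: Optional[str] = None
    notes: List[str] = field(default_factory=list)


def choose_route(route_plan: Sequence[RouteStep], healthy: Set[str],
                 budget_exceeded: bool) -> RouteDecision:
    """Walk the fallback chain; the budget hard stop is checked before any cloud step."""
    notes: List[str] = []
    for position, step in enumerate(route_plan):
        if step.is_cloud and budget_exceeded:
            notes.append(f"skip {step.model_ref}: BUDGET_EXCEEDED")
            continue
        if step.model_ref not in healthy:
            notes.append(f"skip {step.model_ref}: unhealthy")
            continue
        # Any step after the first is a fallback and must carry a reason for trace/audit.
        reason = "; ".join(notes) if position > 0 else None
        return RouteDecision(True, step.model_ref, reason, notes)
    return RouteDecision(False, notes=notes)
```

If the local primary is unhealthy and budget remains, the cloud fallback is chosen and a reason is recorded. If the budget is exhausted, the call is denied instead of being routed to cloud.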

D7 - Model Runtime as Durable Runs

An agent invocation is a Run, not just an HTTP request.

Required runtime states:

CREATED -> POLICY_RESOLVED -> QUEUED -> RUNNING
RUNNING -> WAITING_TOOL -> RESUMED -> RUNNING
RUNNING -> WAITING_APPROVAL -> RESUMED -> RUNNING
RUNNING -> COMPLETED
any non-terminal -> FAILED
any non-terminal -> CANCELLED

Terminal states:

COMPLETED
FAILED
CANCELLED

WAITING_APPROVAL must be resumable without replaying the whole task. FAILED must include a structured failure_code.
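Encoding the D7 state machine as an explicit edge table makes illegal transitions fail loudly. This is a minimal sketch. The `transition` function and the audit tuple shape are assumptions, and a real runtime would need to persist both the state and the audit rows durably:

```python
from enum import Enum
from typing import Dict, List, Optional, Set, Tuple


class RunState(str, Enum):
    CREATED = "CREATED"
    POLICY_RESOLVED = "POLICY_RESOLVED"
    QUEUED = "QUEUED"
    RUNNING = "RUNNING"
    WAITING_TOOL = "WAITING_TOOL"
    WAITING_APPROVAL = "WAITING_APPROVAL"
    RESUMED = "RESUMED"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"


TERMINAL = {RunState.COMPLETED, RunState.FAILED, RunState.CANCELLED}

# Explicit edges from D7; FAILED and CANCELLED are reachable from any non-terminal state.
EDGES: Dict[RunState, Set[RunState]] = {
    RunState.CREATED: {RunState.POLICY_RESOLVED},
    RunState.POLICY_RESOLVED: {RunState.QUEUED},
    RunState.QUEUED: {RunState.RUNNING},
    RunState.RUNNING: {RunState.WAITING_TOOL, RunState.WAITING_APPROVAL, RunState.COMPLETED},
    RunState.WAITING_TOOL: {RunState.RESUMED},
    RunState.WAITING_APPROVAL: {RunState.RESUMED},
    RunState.RESUMED: {RunState.RUNNING},
}

AuditRow = Tuple[str, str, Optional[str]]  # hypothetical (from, to, failure_code)


def transition(current: RunState, target: RunState, audit: List[AuditRow],
               failure_code: Optional[str] = None) -> RunState:
    if current in TERMINAL:
        raise ValueError(f"{current.value} is terminal and cannot transition")
    if target is RunState.FAILED and not failure_code:
        raise ValueError("FAILED requires a structured failure_code")
    if target not in (RunState.FAILED, RunState.CANCELLED) and target not in EDGES[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    audit.append((current.value, target.value, failure_code))  # every transition is audited
    return target
```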

D8 - Keep Channel Adapters Thin

Telegram, LINE, Slack, Email, and API adapters only perform:

receive
verify
normalize to ConversationEvent
send OutboundMessage

Channel adapters must not call LLMs, call MCP tools, decide approval, or embed business logic. Channel escaping, token resolution, delivery retry, and provider message IDs remain adapter responsibilities.
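The sketch below illustrates how thin an adapter can stay, covering Telegram receive/verify/normalize. It relies on Telegram Bot API behavior: when a webhook is registered with `secret_token`, Telegram sends that value back in the `X-Telegram-Bot-Api-Secret-Token` header, and message updates carry `update_id`, `message.chat.id`, and `message.from.id`. The `ConversationEvent` field names and the `telegram:{update_id}` event id format are assumptions. Nothing here calls an LLM or an MCP tool:

```python
import hmac
from dataclasses import dataclass
from typing import Any, Dict, Mapping, Optional

SECRET_HEADER = "X-Telegram-Bot-Api-Secret-Token"


@dataclass(frozen=True)
class ConversationEvent:
    # Illustrative subset of C6 fields; exact names are assumptions.
    event_id: str
    trace_id: str
    project_id: str
    channel_type: str
    chat_id: str
    sender_id: str
    text: Optional[str]


def verify_telegram(headers: Mapping[str, str], expected_secret: str) -> bool:
    """Constant-time check of the webhook secret (assumes headers keep this exact casing)."""
    received = headers.get(SECRET_HEADER, "")
    return hmac.compare_digest(received.encode("utf-8"), expected_secret.encode("utf-8"))


def normalize_telegram(update: Dict[str, Any], *, project_id: str, trace_id: str) -> ConversationEvent:
    """Map a Telegram message update onto ConversationEvent; no LLM or MCP calls."""
    message = update["message"]
    return ConversationEvent(
        event_id=f"telegram:{update['update_id']}",  # hypothetical id format
        trace_id=trace_id,
        project_id=project_id,  # resolved by the adapter from its webhook config
        channel_type="telegram",
        chat_id=str(message["chat"]["id"]),
        sender_id=str(message["from"]["id"]),
        text=message.get("text"),
    )
```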

D9 - Migrate by Strangler Fig

No production path should be replaced in one cutover. Migration must progress by tenant and capability:

shadow: platform receives mirrored events, writes audit/trace only, no user response and no external side effects.
canary: platform can respond to selected low-risk traffic, side effects disabled by default.
read_only: read-only queries and business chat move first.
suggest: analysis and recommendations move next, approval still external.
auto_remediate: write/execute tools move only after Gateway, approval, replay, and audit evidence are green.

AWOOOI must also become a tenant (project_id=awoooi) instead of keeping a privileged private path forever.
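The D9 mode rules can be expressed as two small predicates: one for user-visible responses and one for write/execute side effects. This is only a sketch. The function names and flags are hypothetical, and a real gate could source them from project_migration_state:

```python
from enum import Enum


class MigrationMode(str, Enum):
    SHADOW = "shadow"
    CANARY = "canary"
    READ_ONLY = "read_only"
    SUGGEST = "suggest"
    AUTO_REMEDIATE = "auto_remediate"


def may_respond(mode: MigrationMode, *, canary_selected: bool = False) -> bool:
    """Whether the platform may send a user-visible response (hypothetical gate)."""
    if mode is MigrationMode.SHADOW:
        return False  # audit/trace only
    if mode is MigrationMode.CANARY:
        return canary_selected  # only selected low-risk traffic
    return True


def may_write(mode: MigrationMode, *, canary_side_effects_enabled: bool = False,
              evidence_green: bool = False) -> bool:
    """Whether write/execute tools may run through the platform (hypothetical gate)."""
    if mode is MigrationMode.SHADOW:
        return False
    if mode is MigrationMode.CANARY:
        return canary_side_effects_enabled  # disabled by default
    if mode is MigrationMode.AUTO_REMEDIATE:
        return evidence_green  # Gateway, approval, replay, and audit evidence all green
    return False  # read_only and suggest keep writes on the external approval path
```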

Contract Baselines

C1 - Project / Tenant Contract

Project is the smallest isolation unit.

Minimum contract fields:

project_id
display_name
status
environments
data_boundary
platform_resources
budget
rate_limits
allowed_agents
allowed_channels
approval_gates
status_policy

Invariants:

project_id is immutable after creation.
Data boundary enforcement happens at data/Gateway APIs, not in prompt text.
Budget hard stop rejects cloud model calls with BUDGET_EXCEEDED.
Agent calls outside allowed_agents return AGENT_NOT_PERMITTED.
Project overrides may tighten permissions, never loosen them.
Migration state lives in project_migration_state, not the stable project contract record.
suspended behavior must explicitly define channel, model, and MCP access.
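The sketch below expresses several C1 invariants as admission and override checks. It assumes budgets are tracked in integer cents and that error codes are raised as exceptions. `ProjectContract`, `admit_model_call`, and `apply_override` are hypothetical names. The hard stop blocks only cloud calls. An override that would widen permissions is rejected rather than merged, and `project_id` is never touched:

```python
from dataclasses import dataclass, replace
from typing import FrozenSet


class PlatformError(Exception):
    """Hypothetical carrier for C1 error codes."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


@dataclass(frozen=True)
class ProjectContract:
    project_id: str  # immutable after creation
    allowed_agents: FrozenSet[str]
    budget_hard_stop_cents: int  # assumed unit
    spent_cents: int = 0


def admit_model_call(project: ProjectContract, agent_id: str, is_cloud: bool) -> None:
    if agent_id not in project.allowed_agents:
        raise PlatformError("AGENT_NOT_PERMITTED")
    if is_cloud and project.spent_cents >= project.budget_hard_stop_cents:
        raise PlatformError("BUDGET_EXCEEDED")


def apply_override(project: ProjectContract, allowed_agents: FrozenSet[str],
                   budget_hard_stop_cents: int) -> ProjectContract:
    """Accept an override only if it tightens: a subset of agents and no higher budget."""
    if not allowed_agents <= project.allowed_agents:
        raise ValueError("override would add agents")
    if budget_hard_stop_cents > project.budget_hard_stop_cents:
        raise ValueError("override would raise the budget hard stop")
    return replace(project, allowed_agents=allowed_agents,
                   budget_hard_stop_cents=budget_hard_stop_cents)
```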

C2 - Agent Contract

Agent Contract answers who the agent is, what it can do, what its interfaces are, and its highest safety boundary.

Minimum contract fields:

agent_id
base_agent_ref
version
lifecycle_status
role
capability_tags
risk_class
privacy_ceiling
artifact_refs
interface
context_policy
mcp_requirements
execution_profile
governance

Invariants:

Published versions are immutable.
base_agent_ref must include a version range and resolve to an audited exact version at runtime.
Prompt and schema artifacts require hashes.
Agent requirements are not permissions; Gateway decides actual tool access.
Agent output is LLM payload only. Runtime attaches trace, cost, policy, and validation metadata.
requires_approval is calculated by runtime from project, agent, policy, and Gateway rules, not trusted from LLM output alone.
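This ADR does not fix a version-range syntax for `base_agent_ref`. The sketch below assumes comma-separated clauses such as `>=1.2.0,<2.0.0` over `MAJOR.MINOR.PATCH` versions and shows how runtime could resolve a range to an exact version. All names are hypothetical:

```python
import re
from typing import Iterable, Tuple

Version = Tuple[int, int, int]
# Assumed clause syntax: operator followed by MAJOR.MINOR.PATCH.
_CLAUSE = re.compile(r"^(>=|<=|==|>|<)(\d+)\.(\d+)\.(\d+)$")


def parse_version(text: str) -> Version:
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def _satisfies(version: Version, clause: str) -> bool:
    match = _CLAUSE.match(clause.strip())
    if match is None:
        raise ValueError(f"unsupported range clause: {clause!r}")
    op = match.group(1)
    bound = (int(match.group(2)), int(match.group(3)), int(match.group(4)))
    return {
        ">=": version >= bound, "<=": version <= bound, "==": version == bound,
        ">": version > bound, "<": version < bound,
    }[op]


def resolve_base_agent_ref(range_spec: str, published: Iterable[str]) -> str:
    """Pick the highest published version inside the range; the caller audits the result."""
    clauses = [c for c in range_spec.split(",") if c.strip()]
    if not clauses:
        raise ValueError("base_agent_ref must include a version range")
    candidates = [v for v in published
                  if all(_satisfies(parse_version(v), c) for c in clauses)]
    if not candidates:
        raise LookupError(f"no published version satisfies {range_spec!r}")
    return max(candidates, key=parse_version)
```

Comparing integer tuples rather than strings is what makes `1.10.3` sort above `1.2.0`. The resolved exact version is the value to record in audit.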

C3 - MCP Gateway Contract

MCP Gateway is the security boundary between agents and external systems.

Minimum contract fields:

tool_id
domain
resource
verbs
owner_project_id
tenancy_scope
side_effect_level
data_sensitivity
server
schemas
credential_policy
authorization_policy
approval_policy
rate_limits
result_policy
audit_policy

Tool call envelope requires:

trace_id
run_id
project_id
environment_id
agent_id
agent_version
session_id
tool_id
verb
idempotency_key
payload

Invariants:

Agents never see raw credentials.
Missing project_id, trace_id, or run_id is rejected.
Project grant, agent declaration, and context boundary must all pass.
Results entering context are sanitized.
Write, execute, and destructive operations require approval unless explicitly allowed by all relevant contracts.
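Retries are only safe if the Gateway deduplicates on `idempotency_key`. The minimal sketch below keeps state in memory, whereas a production store would need to be durable and shared across Gateway instances. The `IdempotentToolCalls` class is hypothetical. Reusing a key for a different request is treated as an error rather than silently returning the earlier result:

```python
import hashlib
import json
from typing import Any, Callable, Dict, Tuple

Executor = Callable[[str, str, Dict[str, Any]], Any]


class IdempotentToolCalls:
    """Hypothetical: replays the stored result when (project, tool, key) repeats."""

    def __init__(self, execute: Executor) -> None:
        self._execute = execute
        self._seen: Dict[Tuple[str, str, str], Tuple[str, Any]] = {}

    def call(self, project_id: str, tool_id: str, verb: str,
             idempotency_key: str, payload: Dict[str, Any]) -> Any:
        key = (project_id, tool_id, idempotency_key)
        fingerprint = hashlib.sha256(
            json.dumps([verb, payload], sort_keys=True).encode("utf-8")).hexdigest()
        if key in self._seen:
            seen_fingerprint, result = self._seen[key]
            if seen_fingerprint != fingerprint:
                raise ValueError("idempotency_key reused for a different request")
            return result  # retry: no second side effect
        result = self._execute(tool_id, verb, payload)
        self._seen[key] = (fingerprint, result)
        return result
```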

C4 - Policy / Routing Contract

Policy / Routing resolves the model and execution policy for a single call.

Minimum contract fields:

policy_id
version
lifecycle_status
scope
match
constraints
route_plan
generation_defaults
budget_guard
approval_guard
observability

EffectivePolicy must include:

trace_id
project_id
agent_id
agent_version
policy_version
allow/deny decision
provider/model refs
fallback chain
max tokens
privacy mode
budget state
approval requirement
matched policy layers
decision notes

Invariants:

Agent privacy ceiling beats project override.
Project/global budget hard stop beats fallback.
Providers/models are referenced through catalog refs, not hardcoded into agent contracts.
Policy merge uses strictest-wins semantics.
Effective policy must be replayable from versioned inputs.
Schema retry has a hard cap.
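Strictest-wins merging can be written so that every constraint except the matched-layer list is independent of layer order, which supports replay from versioned inputs. The sketch assumes a hypothetical three-level privacy scale (`cloud_ok` < `cloud_redacted` < `local_only`) and covers only four policy keys. Real contracts carry more:

```python
from typing import Any, Dict, Mapping, Sequence, Tuple

# Higher rank = stricter. These mode names are hypothetical.
PRIVACY_RANK = {"cloud_ok": 0, "cloud_redacted": 1, "local_only": 2}


def merge_strictest(layers: Sequence[Tuple[str, Mapping[str, Any]]]) -> Dict[str, Any]:
    """Merge policy layers so each constraint takes its strictest value."""
    merged: Dict[str, Any] = {
        "max_tokens": None,
        "privacy_mode": "cloud_ok",
        "allow_cloud": True,
        "requires_approval": False,
        "matched_layers": [],
    }
    for name, layer in layers:
        merged["matched_layers"].append(name)
        if "max_tokens" in layer:
            current = merged["max_tokens"]
            merged["max_tokens"] = layer["max_tokens"] if current is None else min(current, layer["max_tokens"])
        if "privacy_mode" in layer and PRIVACY_RANK[layer["privacy_mode"]] > PRIVACY_RANK[merged["privacy_mode"]]:
            merged["privacy_mode"] = layer["privacy_mode"]
        if "allow_cloud" in layer:
            merged["allow_cloud"] = merged["allow_cloud"] and layer["allow_cloud"]
        if "requires_approval" in layer:
            merged["requires_approval"] = merged["requires_approval"] or layer["requires_approval"]
    if merged["privacy_mode"] == "local_only":
        merged["allow_cloud"] = False  # a local-only ceiling forbids any cloud route
    return merged
```

Because a later layer can only tighten, an agent's `local_only` ceiling survives a project layer that asks for `cloud_ok`.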

C5 - Runtime / Run State Contract

Runtime owns durable execution state.

Minimum run fields:

run_id
trace_id
project_id
environment_id
agent_id
agent_version
session_id
channel_type
mode
execution_type
state
input refs
effective policy ref
checkpoint
approval
result
audit timestamps and failure details

Invariants:

Shadow mode has no external side effects and no user-visible response.
Canary mode has no side effects by default.
Every state transition writes audit.
WAITING_APPROVAL is resumable.
CANCELLED cannot resume.
Tool loop iterations have a hard cap.
Client response requires schema validation.

C6 - Communication / Channel Event Contract

Communication Hub normalizes channels without embedding AI logic.

Inbound ConversationEvent minimum fields:

event_id
trace_id
project_id
environment_id
channel metadata
sender metadata
routing metadata
payload
security metadata
delivery metadata

Outbound message minimum fields:

outbound_id
trace_id
run_id
project_id
channel target
message payload
policy
delivery state

Invariants:

Adapter does not call LLM or MCP.
Raw payloads are retained only for a short audit retention period.
Outbound messages pass ACL, redaction, and rate limit.
Bot token resolution stays inside adapter runtime.
Retry is idempotent.
Escaping/formatting is adapter responsibility.

Implementation Order

Phase 0 - Documentation and Contract Freeze

This ADR.
Follow-up schema documents or ADRs only when implementation needs field-level detail.
No runtime behavior change.

Phase 1 - Isolation Foundation

Add project_id to Redis keys, sessions, budget ledgers, dispatch logs, MCP audit snapshots, and approval records where applicable.
Define platform_resource exceptions separately from tenant resources.
Preserve existing AWOOOI behavior under project_id=awoooi.
Follow ADR-107: PostgreSQL is the source of truth for AwoooP control-plane contracts; Redis is cache/watch only; CRDs are future runtime projection.

Phase 2 - Platform Shell

Add platform APIs and data models around existing logic.
Implement envelope validation, trace propagation, audit, and effective policy calculation.
Keep existing AWOOOI and EwoooC implementations behind adapters.

Phase 3 - Shadow Mode

Mirror selected events into AwoooP.
Compare platform decisions with legacy decisions.
Do not send user-visible responses or execute side effects.

Phase 4 - Read-Only and Suggest Cutover

Move low-risk chat and read-only analysis first.
Add EwoooC business-agent traffic as the first downstream tenant validation.
Keep remediation and write tools in legacy path until Gateway evidence is sufficient.

Phase 5 - Controlled Active Runtime

Move write/execute operations only after:
- Gateway authorization is enforced.
- Approval resume is proven.
- Trace and audit can replay a complete run.
- Budget hard stop has live evidence.
- Production rollback path is documented.

Consequences

Benefits

A single agent improvement can apply to all projects or to one specialization.
Product-specific customization happens through policy, permissions, and context scope rather than copy-pasted prompts.
Tool credentials remain outside agent context.
Cross-project data leakage is blocked by data/Gateway filters.
Telegram, LINE, Slack, Email, and API can share the same agent runtime.
Long-running agent work becomes observable, resumable, and replayable.

Costs

More platform tables and contracts must exist before large runtime changes.
Early delivery focuses on audit and shadow evidence rather than visible feature changes.
Each downstream project needs explicit tenant onboarding.
Provider/model policy changes become governed artifacts, not quick env edits.

Risks

Over-building before live traffic proves value.
Contract sprawl if every detail becomes a new ADR.
Legacy paths remaining forever if strangler milestones lack deadlines.
Confusion over OpenClaw brand identity if openclaw-core, openclaw-sre, and openclaw-biz are not documented clearly.

Mitigations

Use this ADR as the index and create detailed schema docs only as needed.
Use ADR-107 for control-plane storage decisions before creating migrations or CRDs.
Require project_migration_state milestones per tenant.
Keep AWOOOI on the same tenant path as all other products.
Treat shadow/canary evidence as the gate for every cutover.
Keep cost-changing provider behavior behind explicit approval.

Non-Goals

This ADR does not deploy new providers or increase paid model usage.
This ADR does not move Telegram/LINE/Slack webhooks.
This ADR does not authorize destructive MCP tools.
This ADR does not replace ADR-105 Agent Loop governance; it generalizes the platform boundary above it.
This ADR does not require Temporal or a specific workflow engine in v1.

Acceptance Criteria

Six contract baselines are captured in this ADR.
AwoooP is named as the platform product and AWOOOI is explicitly defined as first tenant, not the whole platform.
MCP is Gateway-governed, not direct agent access.
Runtime migration is shadow/canary/active, not big-bang.
Cost and privacy hard stops are preserved.
LOGBOOK records this architecture decision.

References

docs/12-agent-game-rules.md
docs/HARD_RULES.md
docs/LOGBOOK.md
docs/adr/ADR-080-ai-autonomy-flywheel-overview.md
docs/adr/ADR-095-12agent-sdk-integration.md
docs/adr/ADR-100-ai-autonomous-slo.md
docs/adr/ADR-105-mcp-agent-loop-governance.md
docs/adr/ADR-107-awooop-control-plane-storage.md
docs/superpowers/specs/2026-04-15-MASTER-ai-autonomous-flywheel-v2.md
