## Phase 0 (documentation layer, all Accepted)
- ADR-106/107: AwoooP platform architecture + storage strategy
- ADR-111~118: Bootstrap → RLS, seven core ADRs
- ADR-119~124: SAGA → Singleton Decomposition, six ADRs
- ADR-UI-01~04: four Operator Console UI ADRs

## Phase 1 (DB schema + migration)
- awooop_phase1_control_plane_2026-05-04.sql: 7 new tables + trigger + RLS
- Step 1: three roles (platform_admin/migration BYPASSRLS, awooop_app subject to RLS)
- Step 13: GRANT least privilege to awooop_app (7 statements)
- Step 14: RLS fail-closed, __platform__ backdoor removed
- awooop_phase1_batch1_rls_2026-05-04.sql: three-step ADD COLUMN for the four high-traffic tables
- awooop_phase1_batch1_backfill.py: batched backfill script using SKIP LOCKED
- awooop_models.py: 7 SQLAlchemy 2.x models

## Critic fixes (4 Critical + 3 Major)
- C-1: ADD CONSTRAINT IF NOT EXISTS → DO block + pg_constraint lookup
- C-2: __mapper_args__ string list → primary_key=True on mapped_column
- C-3: __platform__ RLS backdoor → removed entirely, replaced by a BYPASSRLS role
- C-4: awooop_app role was never created → Step 1 + 7 GRANT statements
- M-1: active_pointer_guard SECURITY DEFINER (cross-tenant protection under FORCE RLS)
- M-2: idempotency guard added to pg_partman create_parent
- M-3: immutability trigger now also protects identity columns (project_id/family/contract_id)

## Task 1.2 patches
- agent_loader.py: hardcoded Mac path → AGENTS_DIR environment variable
- Dockerfile: add COPY .claude/agents/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ADR-106: AwoooP Agent Platform Architecture and Migration Strategy
Status: Accepted
Date: 2026-05-01
Scope: Multi-tenant Agent Platform, Agent contracts, MCP Gateway, runtime state, channel events, migration strategy
Context
AWOOOI currently contains the strongest AI automation implementation in the ecosystem: OpenClaw/NemoTron/Hermes/ElephantAlpha roles, Agent Loop foundations, MCP providers, Telegram workflows, model routing, cost guards, Playbook learning, and operational audit trails.
Other products already have adjacent AI or messaging surfaces:
- EwoooC / MOMO PRO has business analysis bots, ElephantAlpha orchestration, market-intelligence tools, LINE/Telegram/Email notification paths, and local AI provider selection.
- Tsenyang already has Telegram webhook capability.
- Bitan and future products need repeatable AI onboarding without copying AWOOOI internals.
The old choice set was too narrow:
- A pure centralized HTTP hub solves governance but creates a single point of failure and makes AWOOOI look like every product's private brain.
- A shared SDK reduces duplicated code but cannot solve tenant isolation, identity, budget, channel, MCP credential, or cross-project audit problems.
- A light configuration plane helps routing drift but still leaves tool use, session state, and channel handling scattered across projects.
The approved direction is a fourth path:
Build AwoooP, a multi-tenant Agent Platform. AWOOOI is the first and largest tenant and first runtime host, not the platform boundary itself.
This ADR records the architecture and migration strategy only. It does not authorize runtime code changes, provider cost changes, destructive operations, or channel cutovers.
Numbering Note
ADR-105-revert-a2-ollama-primary.md previously reserved ADR-106 as a
placeholder for model-catalog cleanup. No checked-in ADR-106*.md existed.
This ADR consumes the ADR-106 number for the broader Agent Platform decision.
The model-catalog and dynamic-routing debt is folded into the Policy / Routing
contract below and should be implemented under this platform roadmap or a later
non-conflicting ADR number.
Decision
D0 - Name the Platform AwoooP
The platform product name is AwoooP.
Naming rules:
- Product name: AwoooP
- Repository/package slug: awooop
- Project/tenant id for the existing AWOOOI product remains awoooi
- AwoooP is the platform boundary; AWOOOI is a tenant and initial runtime host
Do not create empty project directories just for the name. Create awooop
runtime/package directories only when a concrete implementation phase owns
code, schemas, clients, or workers.
Recommended future layout when implementation begins:
| Purpose | Path |
|---|---|
| Shared contract schemas | packages/awooop-contracts/ |
| Python/TS client SDK | packages/awooop-client/ |
| Platform API/runtime shell | apps/awooop-runtime/ |
| Async run workers | apps/awooop-worker/ |
| Detailed schema docs | docs/awooop/ |
D1 - Adopt AwoooP as a Six-Plane Platform
AwoooP is defined as six cooperating planes:
| Plane | Responsibility | Must not do |
|---|---|---|
| Project / Tenant | Tenant identity, isolation, budget, channel allowance, ACL, migration mode | Store agent prompt details |
| Agent | Agent identity, version, role, I/O contract, safety ceiling, context domain | Decide concrete model/provider credentials |
| MCP Gateway | Tool authorization, credential resolution, rate limits, approval, result sanitization, audit | Expose raw credentials to agents |
| Policy / Routing | Effective model/provider route, fallback, privacy ceiling, budget gate, generation defaults | Bypass tenant hard stops |
| Runtime / Run State | Run lifecycle, async state machine, shadow/canary/active mode, checkpoint/resume | Treat long tasks as simple HTTP request-response |
| Communication / Channel Event | Telegram/LINE/Slack/Email/API receive, verify, normalize, send | Run LLM inference or call MCP directly |
D2 - Require Platform Envelope Fields
Every platform invocation must carry the following envelope fields:
- project_id
- environment_id when environment-specific resources are involved
- agent_id
- agent_version after resolution
- session_id
- trace_id
- run_id for runtime-managed work
- policy_version after effective policy resolution
Missing project_id, trace_id, or run_id in runtime or MCP paths is a hard
reject, not a warning.
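The hard-reject rule above can be sketched as a small validator. This is a minimal illustration, not the platform's implementation: the `Envelope` dataclass, the `EnvelopeRejected` exception, and `validate_envelope` are hypothetical names; only the field names and the three hard-required fields come from this ADR.

```python
from dataclasses import dataclass
from typing import Optional


class EnvelopeRejected(ValueError):
    """Hypothetical error type: raised when a hard-required field is missing."""


@dataclass(frozen=True)
class Envelope:
    # Field names follow D2; the dataclass shape itself is an assumption.
    project_id: Optional[str] = None
    trace_id: Optional[str] = None
    run_id: Optional[str] = None
    environment_id: Optional[str] = None
    agent_id: Optional[str] = None
    agent_version: Optional[str] = None
    session_id: Optional[str] = None
    policy_version: Optional[str] = None


# Fields whose absence is a hard reject in runtime or MCP paths.
HARD_REQUIRED = ("project_id", "trace_id", "run_id")


def validate_envelope(env: Envelope) -> Envelope:
    """Return the envelope unchanged, or raise if a hard-required field is empty."""
    missing = [name for name in HARD_REQUIRED if not getattr(env, name)]
    if missing:
        raise EnvelopeRejected("missing required envelope fields: " + ", ".join(missing))
    return env
```

Raising instead of logging keeps the failure on the request path, which is one way to make a missing field a reject rather than a warning.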
D3 - Treat Project as the Smallest Isolation Unit
project_id is the platform tenant boundary. All Redis keys, session stores,
RAG namespaces, KM queries, MCP tool scopes, budget ledgers, ACL checks, and
channel routing must be project-scoped unless a resource is explicitly declared
as platform_resource.
global:* resources are platform resources, not AWOOOI resources.
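As an illustration of project scoping, the sketch below builds storage keys that are either tenant-scoped or explicitly platform-scoped. The `project:<id>:...` layout and the `scoped_key` helper are assumptions made for this example; the ADR only fixes that tenant resources carry project_id and that global:* denotes platform resources.

```python
from typing import Optional


def scoped_key(
    resource: str,
    *parts: str,
    project_id: Optional[str] = None,
    platform_resource: bool = False,
) -> str:
    """Build a key that is tenant-scoped unless explicitly declared platform-wide.

    The "project:<id>:" prefix is a hypothetical layout, not one defined by the ADR.
    """
    if platform_resource:
        if project_id is not None:
            raise ValueError("platform resources must not carry a project_id")
        prefix = "global"
    else:
        if not project_id:
            raise ValueError("project_id is required for tenant-scoped keys")
        prefix = "project:" + project_id
    return ":".join([prefix, resource, *parts])
```

For example, `scoped_key("session", "s-1", project_id="awoooi")` yields a tenant key, while a call without project_id fails unless `platform_resource=True` is passed explicitly.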
D4 - Define Agents as Platform Capability Modules
Agents are not product-local prompt strings. Agents are versioned platform capabilities with:
- agent_id
- base_agent_ref
- immutable published version
- role and capability tags
- privacy ceiling
- payload input schema
- LLM output schema
- prompt artifact reference and hash
- context domain constraints
- MCP requirement declarations
- execution profile
- eval and lifecycle governance
Specialized agents must inherit from base agents without loosening safety:
- openclaw-core
- openclaw-sre
- openclaw-biz
- future openclaw-pharmacy
- future openclaw-marketing
Published agent artifacts are immutable. Prompt, schema, or contract changes must publish a new version.
D5 - Use MCP Gateway Instead of Direct Tool Calls
Agents never call MCP servers directly and never see raw credentials. Every tool call goes through MCP Gateway.
Gateway authorization is the intersection of:
Project grant AND Agent requirement AND Tool contract AND Environment boundary AND Approval state
Tool results entering model context must be sanitized first. Tool calls must be
traceable through trace_id and run_id.
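The intersection rule can be read as a conjunction of five independent checks. The sketch below assumes each check has already been evaluated to a boolean elsewhere; `GatewayChecks` and `authorize` are hypothetical names, and returning the failed check names is an illustrative choice for audit purposes.

```python
from dataclasses import dataclass, fields
from typing import List, Tuple


@dataclass(frozen=True)
class GatewayChecks:
    # One boolean per term of the intersection in D5.
    project_grant: bool
    agent_requirement: bool
    tool_contract: bool
    environment_boundary: bool
    # True means "approved or no approval required" (an assumption of this sketch).
    approval_state: bool


def authorize(checks: GatewayChecks) -> Tuple[bool, List[str]]:
    """Allow only if every check passes; also report which checks failed."""
    failed = [f.name for f in fields(checks) if not getattr(checks, f.name)]
    return (not failed, failed)
```

Because the result is a pure AND, no single grant (for example a project grant alone) can authorize a tool call.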
D6 - Resolve EffectivePolicy Before Every Model Call
Provider and model selection is not embedded inside Agent Contract. Runtime must
derive an EffectivePolicy before each model call from:
- platform policy
- agent safety ceiling
- project policy and budget
- environment constraints
- channel constraints
- intent and complexity
- current provider health
Budget hard stop overrides all fallback behavior. Cloud fallback must include a
reason and be written to trace/audit. Any policy change that increases cost,
enables a new paid provider, or raises token ceilings still requires explicit
human approval under docs/HARD_RULES.md.
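A minimal sketch of how a budget hard stop can override fallback while still recording a fallback reason. It covers only provider health and budget, not the other inputs listed above. `Route`, `Decision`, `resolve_route`, the provider names in the usage note, and the NO_HEALTHY_PROVIDER code are assumptions; BUDGET_EXCEEDED is the code named in C1.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Route:
    provider: str
    is_cloud: bool


@dataclass(frozen=True)
class Decision:
    allowed: bool
    provider: Optional[str]
    reason: str  # intended to be written to trace/audit


def resolve_route(route_plan: List[Route], healthy: Dict[str, bool], budget_exhausted: bool) -> Decision:
    """Walk the route plan in order; never pick a cloud route past the budget hard stop."""
    skipped: List[str] = []
    for index, route in enumerate(route_plan):
        if not healthy.get(route.provider, False):
            skipped.append(route.provider + ":unhealthy")
            continue
        if route.is_cloud and budget_exhausted:
            skipped.append(route.provider + ":budget_hard_stop")
            continue
        reason = "primary" if index == 0 else "fallback after " + ", ".join(skipped)
        return Decision(True, route.provider, reason)
    # No usable route: report the budget stop if it was the blocker.
    hit_budget = any(s.endswith(":budget_hard_stop") for s in skipped)
    return Decision(False, None, "BUDGET_EXCEEDED" if hit_budget else "NO_HEALTHY_PROVIDER")
```

With a plan of `[Route("local-model", False), Route("cloud-model", True)]`, an unhealthy local provider falls back to the cloud route with a recorded reason, unless the budget is exhausted, in which case the call is denied.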
D7 - Model Runtime as Durable Runs
An agent invocation is a Run, not just an HTTP request.
Required runtime states:
CREATED -> POLICY_RESOLVED -> QUEUED -> RUNNING
RUNNING -> WAITING_TOOL -> RESUMED -> RUNNING
RUNNING -> WAITING_APPROVAL -> RESUMED -> RUNNING
RUNNING -> COMPLETED
any non-terminal -> FAILED
any non-terminal -> CANCELLED
Terminal states:
- COMPLETED
- FAILED
- CANCELLED
WAITING_APPROVAL must be resumable without replaying the whole task. FAILED
must include a structured failure_code.
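The state list above maps directly onto a transition table. The following sketch encodes it as written; `RunState`, `can_transition`, and `transition` are hypothetical names, and persistence and audit writes are out of scope here.

```python
from enum import Enum
from typing import Optional


class RunState(str, Enum):
    CREATED = "CREATED"
    POLICY_RESOLVED = "POLICY_RESOLVED"
    QUEUED = "QUEUED"
    RUNNING = "RUNNING"
    WAITING_TOOL = "WAITING_TOOL"
    WAITING_APPROVAL = "WAITING_APPROVAL"
    RESUMED = "RESUMED"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"


TERMINAL = frozenset({RunState.COMPLETED, RunState.FAILED, RunState.CANCELLED})

# Forward edges taken from the state list in D7.
_FORWARD = {
    RunState.CREATED: {RunState.POLICY_RESOLVED},
    RunState.POLICY_RESOLVED: {RunState.QUEUED},
    RunState.QUEUED: {RunState.RUNNING},
    RunState.RUNNING: {RunState.WAITING_TOOL, RunState.WAITING_APPROVAL, RunState.COMPLETED},
    RunState.WAITING_TOOL: {RunState.RESUMED},
    RunState.WAITING_APPROVAL: {RunState.RESUMED},
    RunState.RESUMED: {RunState.RUNNING},
}


def can_transition(src: RunState, dst: RunState) -> bool:
    if src in TERMINAL:
        return False  # terminal states never move again
    if dst in (RunState.FAILED, RunState.CANCELLED):
        return True  # any non-terminal state may fail or be cancelled
    return dst in _FORWARD.get(src, set())


def transition(src: RunState, dst: RunState, failure_code: Optional[str] = None) -> RunState:
    if not can_transition(src, dst):
        raise ValueError("illegal transition " + src.value + " -> " + dst.value)
    if dst is RunState.FAILED and not failure_code:
        raise ValueError("FAILED requires a structured failure_code")
    return dst
```

Keeping the edges in one table makes it straightforward to reject transitions such as CANCELLED to RESUMED at a single choke point.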
D8 - Keep Channel Adapters Thin
Telegram, LINE, Slack, Email, and API adapters only perform:
- receive
- verify
- normalize to ConversationEvent
- send OutboundMessage
Channel adapters must not call LLMs, call MCP tools, decide approval, or embed business logic. Channel escaping, token resolution, delivery retry, and provider message IDs remain adapter responsibilities.
D9 - Migrate by Strangler Fig
No production path should be replaced in one cutover. Migration must progress by tenant and capability:
- shadow: platform receives mirrored events, writes audit/trace only, no user response and no external side effects.
- canary: platform can respond to selected low-risk traffic, side effects disabled by default.
- read_only: read-only queries and business chat move first.
- suggest: analysis and recommendations move next, approval still external.
- auto_remediate: write/execute tools move only after Gateway, approval, replay, and audit evidence are green.
AWOOOI must also become a tenant (project_id=awoooi) instead of keeping a
privileged private path forever.
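One way to model the migration modes is as an ordered enum with explicit capability checks. This sketch assumes the modes form a strict sequence that advances one step at a time, which the ADR implies but does not state; `MigrationMode` and the helper functions are hypothetical, and canary's opt-in side effects are deliberately not modeled.

```python
from enum import IntEnum


class MigrationMode(IntEnum):
    # Ordering is an assumption based on the sequence listed in D9.
    SHADOW = 0
    CANARY = 1
    READ_ONLY = 2
    SUGGEST = 3
    AUTO_REMEDIATE = 4


def may_respond_to_user(mode: MigrationMode) -> bool:
    """Shadow never answers users; every later mode may."""
    return mode >= MigrationMode.CANARY


def may_execute_side_effects(mode: MigrationMode, evidence_green: bool) -> bool:
    """Write/execute tools need auto_remediate plus green Gateway/approval/replay/audit evidence."""
    return mode == MigrationMode.AUTO_REMEDIATE and evidence_green


def advance(mode: MigrationMode) -> MigrationMode:
    """Move exactly one step forward; skipping modes is not possible through this helper."""
    if mode == MigrationMode.AUTO_REMEDIATE:
        raise ValueError("already at the final migration mode")
    return MigrationMode(mode + 1)
```

A per-tenant, per-capability record of the current mode would then gate both response delivery and tool execution through these checks.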
Contract Baselines
C1 - Project / Tenant Contract
Project is the smallest isolation unit.
Minimum contract fields:
- project_id
- display_name
- status
- environments
- data_boundary
- platform_resources
- budget
- rate_limits
- allowed_agents
- allowed_channels
- approval_gates
- status_policy
Invariants:
- project_id is immutable after creation.
- Data boundary enforcement happens at data/Gateway APIs, not in prompt text.
- Budget hard stop rejects cloud model calls with BUDGET_EXCEEDED.
- Agent calls outside allowed_agents return AGENT_NOT_PERMITTED.
- Project overrides may tighten permissions, never loosen them.
- Migration state lives in project_migration_state, not the stable project contract record.
- suspended behavior must explicitly define channel, model, and MCP access.
C2 - Agent Contract
Agent Contract answers who the agent is, what it can do, what its interfaces are, and its highest safety boundary.
Minimum contract fields:
- agent_id
- base_agent_ref
- version
- lifecycle_status
- role
- capability_tags
- risk_class
- privacy_ceiling
- artifact_refs
- interface
- context_policy
- mcp_requirements
- execution_profile
- governance
Invariants:
- Published versions are immutable.
- base_agent_ref must include a version range and resolve to an audited exact version at runtime.
- Prompt and schema artifacts require hashes.
- Agent requirements are not permissions; Gateway decides actual tool access.
- Agent output is LLM payload only. Runtime attaches trace, cost, policy, and validation metadata.
- requires_approval is calculated by runtime from project, agent, policy, and Gateway rules, not trusted from LLM output alone.
C3 - MCP Gateway Contract
MCP Gateway is the security boundary between agents and external systems.
Minimum contract fields:
- tool_id
- domain
- resource
- verbs
- owner_project_id
- tenancy_scope
- side_effect_level
- data_sensitivity
- server
- schemas
- credential_policy
- authorization_policy
- approval_policy
- rate_limits
- result_policy
- audit_policy
Tool call envelope requires:
- trace_id
- run_id
- project_id
- environment_id
- agent_id
- agent_version
- session_id
- tool_id
- verb
- idempotency_key
- payload
Invariants:
- Agents never see raw credentials.
- Missing project_id, trace_id, or run_id is rejected.
- Project grant, agent declaration, and context boundary must all pass.
- Results entering context are sanitized.
- Write, execute, and destructive operations require approval unless explicitly allowed by all relevant contracts.
C4 - Policy / Routing Contract
Policy / Routing resolves the model and execution policy for a single call.
Minimum contract fields:
- policy_id
- version
- lifecycle_status
- scope
- match
- constraints
- route_plan
- generation_defaults
- budget_guard
- approval_guard
- observability
EffectivePolicy must include:
- trace_id
- project_id
- agent_id
- agent_version
- policy_version
- allow/deny decision
- provider/model refs
- fallback chain
- max tokens
- privacy mode
- budget state
- approval requirement
- matched policy layers
- decision notes
Invariants:
- Agent privacy ceiling beats project override.
- Project/global budget hard stop beats fallback.
- Providers/models are referenced through catalog refs, not hardcoded into agent contracts.
- Policy merge uses strictest-wins semantics.
- Effective policy must be replayable from versioned inputs.
- Schema retry has a hard cap.
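Strictest-wins merging can be illustrated with three fields. In this sketch, `PolicyLayer`, `merge_strictest`, and the privacy level names in `PRIVACY_ORDER` are hypothetical; real policies carry more fields, but the rule is the same: a merged value is never looser than any input layer.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical privacy levels, ordered from least to most strict.
PRIVACY_ORDER = ("cloud_allowed", "redacted_cloud", "local_only")


@dataclass(frozen=True)
class PolicyLayer:
    name: str
    allow: bool
    max_tokens: int
    privacy_mode: str


def merge_strictest(layers: Iterable[PolicyLayer]) -> PolicyLayer:
    """Merge layers so that every field takes its most restrictive value."""
    items = list(layers)
    if not items:
        raise ValueError("at least one policy layer is required")
    return PolicyLayer(
        name="+".join(layer.name for layer in items),  # records matched layers
        allow=all(layer.allow for layer in items),  # any deny wins
        max_tokens=min(layer.max_tokens for layer in items),  # lowest ceiling wins
        privacy_mode=max((layer.privacy_mode for layer in items), key=PRIVACY_ORDER.index),
    )
```

Under this rule an agent layer with a local-only privacy ceiling cannot be relaxed by a more permissive project layer, and the merge is deterministic, so the same versioned inputs always replay to the same result.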
C5 - Runtime / Run State Contract
Runtime owns durable execution state.
Minimum run fields:
- run_id
- trace_id
- project_id
- environment_id
- agent_id
- agent_version
- session_id
- channel_type
- mode
- execution_type
- state
- input refs
- effective policy ref
- checkpoint
- approval
- result
- audit timestamps and failure details
Invariants:
- Shadow mode has no external side effects and no user-visible response.
- Canary mode has no side effects by default.
- Every state transition writes audit.
- WAITING_APPROVAL is resumable.
- CANCELLED cannot resume.
- Tool loop iterations have a hard cap.
- Client response requires schema validation.
C6 - Communication / Channel Event Contract
Communication Hub normalizes channels without embedding AI logic.
Inbound ConversationEvent minimum fields:
- event_id
- trace_id
- project_id
- environment_id
- channel metadata
- sender metadata
- routing metadata
- payload
- security metadata
- delivery metadata
Outbound message minimum fields:
- outbound_id
- trace_id
- run_id
- project_id
- channel target
- message payload
- policy
- delivery state
Invariants:
- Adapter does not call LLM or MCP.
- Raw payload is saved for short audit retention.
- Outbound messages pass ACL, redaction, and rate limit.
- Bot token resolution stays inside adapter runtime.
- Retry is idempotent.
- Escaping/formatting is adapter responsibility.
Implementation Order
Phase 0 - Documentation and Contract Freeze
- This ADR.
- Follow-up schema documents or ADRs only when implementation needs field-level detail.
- No runtime behavior change.
Phase 1 - Isolation Foundation
- Add project_id to Redis keys, sessions, budget ledgers, dispatch logs, MCP audit snapshots, and approval records where applicable.
- Define platform_resource exceptions separately from tenant resources.
- Preserve existing AWOOOI behavior under project_id=awoooi.
- Follow ADR-107: PostgreSQL is the source of truth for AwoooP control-plane contracts; Redis is cache/watch only; CRDs are future runtime projection.
Phase 2 - Platform Shell
- Add platform APIs and data models around existing logic.
- Implement envelope validation, trace propagation, audit, and effective policy calculation.
- Keep existing AWOOOI and EwoooC implementations behind adapters.
Phase 3 - Shadow Mode
- Mirror selected events into AwoooP.
- Compare platform decisions with legacy decisions.
- Do not send user-visible responses or execute side effects.
Phase 4 - Read-Only and Suggest Cutover
- Move low-risk chat and read-only analysis first.
- Add EwoooC business-agent traffic as the first downstream tenant validation.
- Keep remediation and write tools in legacy path until Gateway evidence is sufficient.
Phase 5 - Controlled Active Runtime
- Move write/execute operations only after:
- Gateway authorization is enforced.
- Approval resume is proven.
- Trace and audit can replay a complete run.
- Budget hard stop has live evidence.
- Production rollback path is documented.
Consequences
Benefits
- A single agent improvement can apply to all projects or to one specialization.
- Product-specific customization happens through policy, permissions, and context scope rather than copy-pasted prompts.
- Tool credentials remain outside agent context.
- Cross-project data leakage is blocked by data/Gateway filters.
- Telegram, LINE, Slack, Email, and API can share the same agent runtime.
- Long-running agent work becomes observable, resumable, and replayable.
Costs
- More platform tables and contracts must exist before large runtime changes.
- Early delivery focuses on audit and shadow evidence rather than visible feature changes.
- Each downstream project needs explicit tenant onboarding.
- Provider/model policy changes become governed artifacts, not quick env edits.
Risks
- Over-building before live traffic proves value.
- Contract sprawl if every detail becomes a new ADR.
- Legacy paths remaining forever if strangler milestones lack deadlines.
- Confusing OpenClaw brand identity if openclaw-core, openclaw-sre, and openclaw-biz are not documented clearly.
Mitigations
- Use this ADR as the index and create detailed schema docs only as needed.
- Use ADR-107 for control-plane storage decisions before creating migrations or CRDs.
- Require project_migration_state milestones per tenant.
- Keep AWOOOI on the same tenant path as all other products.
- Treat shadow/canary evidence as the gate for every cutover.
- Keep cost-changing provider behavior behind explicit approval.
Non-Goals
- This ADR does not deploy new providers or increase paid model usage.
- This ADR does not move Telegram/LINE/Slack webhooks.
- This ADR does not authorize destructive MCP tools.
- This ADR does not replace ADR-105 Agent Loop governance; it generalizes the platform boundary above it.
- This ADR does not require Temporal or a specific workflow engine in v1.
Acceptance Criteria
- Six contract baselines are captured in this ADR.
- AwoooP is named as the platform product and AWOOOI is explicitly defined as first tenant, not the whole platform.
- MCP is Gateway-governed, not direct agent access.
- Runtime migration is shadow/canary/active, not big-bang.
- Cost and privacy hard stops are preserved.
- LOGBOOK records this architecture decision.
References
- docs/12-agent-game-rules.md
- docs/HARD_RULES.md
- docs/LOGBOOK.md
- docs/adr/ADR-080-ai-autonomy-flywheel-overview.md
- docs/adr/ADR-095-12agent-sdk-integration.md
- docs/adr/ADR-100-ai-autonomous-slo.md
- docs/adr/ADR-105-mcp-agent-loop-governance.md
- docs/adr/ADR-107-awooop-control-plane-storage.md
- docs/superpowers/specs/2026-04-15-MASTER-ai-autonomous-flywheel-v2.md