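AwoooP Complete Detailed Implementation Plan
Version: v1.0 (consolidated after the 12-Agent panoramic review)
Date: 2026-05-03 (Taipei time zone)
Author: 12-Agent joint review × Codex consolidation
Base documents: MASTER-WORKPLAN.md, ADR-106, ADR-107
⚠️ ADR numbering correction: ADR-108/109/110 are already taken by other ADRs, so AwoooP-specific ADRs start at ADR-111.
This document is the fully expanded version of MASTER-WORKPLAN.md. MASTER-WORKPLAN is the master index; this document holds the execution details. Where the two conflict, this document takes precedence (it was updated more recently).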
0. Background Overview
0.1 Current Infrastructure (as of 2026-05-03)
| Component | Current state | Notes |
|---|---|---|
| Ollama Primary | GCP-A 34.143.170.20:11434 (SSD) | ADR-110, supersedes ADR-105 |
| Ollama Secondary | GCP-B 34.21.145.224:11434 (SSD) | New; went live 2026-05-03 |
| Ollama Fallback | Local 192.168.0.111:11434 (HDD) | Last line of defense, not a Primary |
| PostgreSQL | 192.168.0.188 (private network) | Sole source of truth for the AwoooP control plane |
| Redis | 192.168.0.188 (private network) | cache/watch/counter only (ADR-107 D4) |
| K3s cluster | awoooi-prod namespace | AWOOOI first tenant |
| Gitea CI/CD | 192.168.0.110 (or Gitea Cloud) | ADR-039, all builds run from Gitea |
0.2 Summary of 12-Agent Review Findings
The original MASTER-WORKPLAN had 24 consensus issues. After 12 agents reviewed it in depth in parallel, the following were added:
| Agent | New P0/P1 issues | New ADRs required | New inventories |
|---|---|---|---|
| critic | 10 | 1 (ADR-116 Migration Discipline) | INV-5, INV-6, INV-7 |
| vuln-verifier | 8 (3 confirmed by PoC) | 2 (ADR-116/117 security series) | - |
| debugger | 12 (failure scenarios) | - | 8 runbooks |
| db-expert | 8 (table design defects) + RLS entirely absent | 1 (ADR-118 RLS strategy) | - |
| planner | 7 items too coarse-grained + 10 acceptance criteria not closed-loop | - | - |
| fullstack-engineer | 7 missing API endpoints + 9 error codes | - | - |
| frontend-designer | 8 UI modules entirely missing | ADR-UI-01~04 | - |
| refactor-specialist | 8 refactoring landmines + 11 PR plans | - | - |
| migration-engineer | 7 compatibility risks | - | version matrix |
| onboarder | 31 background loops (vs. an estimate of ~10) + 13 module conflicts | - | INV-8 |
| tool-expert | 8 tools with insufficient capacity + 8 missing tools | - | - |
| web-researcher | 5 major gaps against industry practice (SAGA/Token Kill/MCP OAuth 2.1/OTel/OWASP) | 5 (ADR-119~123) | - |
| Total added | ~70 issues | ~12 ADRs | ~4 inventories |
Conclusion: unless the Pre-flight Audit is completed first, Phase 1 is bound to blow up.
1. Full Issue List (in P0 priority order)
P0: Immediate blow-ups (must be fixed before Phase 1)
| # | Issue | Source | Impact scope |
|---|---|---|---|
| P0-01 | Redis keys renamed directly with no dual-write period (cost counters reset to zero, Telegram 409, silence stops working, dual-write cannot reach the three-tier Ollama failover topology) | critic | cost, alerting, Ollama |
| P0-02 | Migration SQL has wrong table names (incident_records / mcp_audit_snapshots), no rollback, ORM 1.x vs 2.x | critic | Phase 1 migration |
| P0-03 | project_id / tenant_id has 0 hits in the codebase; 30+ business tables lack the column | onboarder | entire system |
| P0-04 | The requires_approval field is decided by LLM output (security_interceptor.py:451-490) | vuln-verifier (PoC confirmed) | approval chain |
| P0-05 | Callback nonce forgery: the server nonce logic lets a caller construct a value that passes verification without knowing the secret (security_interceptor.py:451-490) | vuln-verifier (PoC confirmed) | Telegram approval |
| P0-06 | Webhook HMAC can be replayed; no timestamp/nonce (webhooks.py:679-728) | vuln-verifier (PoC confirmed) | all webhooks |
| P0-07 | None of the 31 background loops carries a project_id (main.py) | onboarder (measured) | multi-tenancy collapses entirely |
| P0-08 | telemetry.py:71 hard-codes if "192.168.0.188" not in endpoint: raise, so EwoooC startup always fails | onboarder | EwoooC Phase 6 |
| P0-09 | project_migration_state table missing; Strangler Fig has no data carrier | db-expert | Phase 1 |
| P0-10 | Task 9 is out of order (the agent prompt load point precedes the ConfigMap) → everything returns None | critic | any agent in Phase 1 |
| P0-11 | ollama:current_primary has a second definition at ollama_auto_recovery.py:230; the three-tier topology migration is bound to split | onboarder | GCP Ollama topology |
| P0-12 | In consensus_engine.py, CONSENSUS_PREFIX="consensus:" has no project prefix, so keys are shared across tenants under multi-tenancy | onboarder | multi-tenant consistency |
| P0-13 | kubectl calls at mcp_bridge.py:592-681 hard-code namespace="awoooi-prod" | onboarder | EwoooC K8s tool |
P1: Severe defects (must be fixed before Phase 2-4)
| # | Issue | Source | Impact scope |
|---|---|---|---|
| P1-01 | AWOOOI Bootstrap Paradox: cron/job/healthcheck all lack project_id | critic | multi-tenant startup |
| P1-02 | EwoooC onboarding has no technical path at all (it takes more than changing OLLAMA_API_BASE) | critic | Phase 6 |
| P1-03 | Strangler Fig shadow→canary→active has no quantified gate conditions | planner | cutover decisions |
| P1-04 | Layer 3 redaction has zero implementation (a helper exists but nothing enforces it) | critic | information security |
| P1-05 | The _provider attribute is public and can be used to bypass audit (mcp/registry.py:24-71) | critic | MCP security |
| P1-06 | WAITING_APPROVAL resume does not verify caller identity; no approval_token signature | critic | approval security |
| P1-07 | Redis approval state is a single point of failure with no PG sync | critic | approval reliability |
| P1-08 | The audit log itself can leak secrets (redaction must happen before the audit sink) | critic | information security |
| P1-09 | The sanitization_service.py helper has no enforcement point (neither MCP Gateway nor AgentToolExecutor uses it) | critic | tool security |
| P1-10 | Active revision switching has no transactional outbox; workers may consume a stale policy | db-expert | policy consistency |
| P1-11 | Run/Channel idempotency lacks key derivation rules and a unique index | db-expert | duplicate execution |
| P1-12 | Async workers lack lease / heartbeat / stale reaper | db-expert | worker reliability |
| P1-13 | Partitioning + retention for high-traffic tables must be decided in Phase 1 (cannot be retrofitted) | db-expert | long-term scalability |
| P1-14 | Observability metrics label cardinality (run_id/trace_id/session_id must not enter metrics) | fullstack | Prometheus |
| P1-15 | The approval flow at multi_sig_redis.py:178-205 has no trace_id at all | debugger | troubleshooting |
| P1-16 | Redis keys at hermes/nl_gateway.py:7,146,163 have no project prefix | onboarder | Hermes isolation |
| P1-17 | AnomalyCounter at anomaly_counter.py:790 is a global singleton; its 6 prefixes have no tenant isolation | onboarder | multi-tenant counting |
| P1-18 | SCAN incident:* at incident_service.py:603-615 has no project_id | onboarder | Redis data isolation |
| P1-19 | Contract publish permissions and signatures are undefined | critic | contract governance |
| P1-20 | 13 global singletons are shared across tenants (TrustEngine/ProviderRegistry/TelegramGateway/etc.) | onboarder | multi-tenant isolation |
| P1-21 | Token Budget has no Hard Kill (lesson from the $47k agent loop incident) | web-researcher | cost control |
| P1-22 | RLS (Row Level Security) is entirely absent | db-expert | DB multi-tenancy |
| P1-23 | Redis key dual-write migration for the GCP Ollama three-tier topology is unplanned (the old ollama:current_primary key knows only 1 host) | critic | Ollama failover |
| P1-24 | decision_manager.py:240 hard-codes telegram_silence:{target} without importing the gateway constant (defined in two places) | debugger | silence feature |
P2: Design gaps (must be filled before Phase 5-8)
| # | Issue | Source | Impact scope |
|---|---|---|---|
| P2-01 | Telegram/LINE/Slack/API/Internal lack a canonical principal mapping | critic | unified identity |
| P2-02 | Run FSM has zero implementation (table design only, no state machine code) | fullstack | Phase 4 |
| P2-03 | EwoooC Provider Proxy cannot just change a URL; it needs a full envelope + audit entry point | critic | Phase 6 |
| P2-04 | No industry-standard Durable Execution / SAGA compensating-transaction mechanism | web-researcher | long-running agent tool chains |
| P2-05 | MCP OAuth 2.1 (RFC 9728 + RFC 7591): no protection against Confused Deputy | web-researcher | MCP security |
| P2-06 | Not aligned with OTel GenAI Semantic Conventions (span naming / attribute rules) | web-researcher | observability |
| P2-07 | Gaps against the OWASP Agentic AI Top 10 (prompt injection, tool misuse, and so on; 7 items) | web-researcher | AI security |
| P2-08 | ISO 42001 AI management system alignment documents missing | web-researcher | compliance |
| P2-09 | 7 API endpoints missing (see the §6 fullstack list) | fullstack | API completeness |
| P2-10 | 9 error codes missing (see the §7 error code dictionary) | fullstack | client-side parsing |
| P2-11 | Progressive feedback policy (async runs send no progress notification within ≤30s) | fullstack | UX |
| P2-12 | 8 Operator Console UI modules entirely missing (see §8 frontend) | frontend-designer | operational visibility |
| P2-13 | awooop-ctl CLI tool missing (operations are currently done by hand with kubectl + curl) | tool-expert | operations experience |
| P2-14 | OPA/Cedar policy engine missing (contract authorization logic is currently scattered through the code) | tool-expert | centralized authorization |
| P2-15 | chaostoolkit / LitmusChaos missing (Strangler Fig cutover has no chaos validation) | tool-expert | disaster-recovery validation |
| P2-16 | PgBouncer missing (the PG connection pool will be exhausted with multiple AwoooP workers) | tool-expert | DB scalability |
2. Pre-flight Audit: Full Phase 0 Checklist
Phase 0 is entirely docs-only, with no runtime code changes. Only after it is complete should a new Codex conversation be opened to start Phase 1 code.
2.1 AwoooP Core ADRs (ADR-111~115)
Note: ADR-108/109/110 are already taken by incident fingerprint / telegram dedup / GCP Ollama topology, so AwoooP starts at ADR-111.
| ADR | Topic | Issues addressed | Main content |
|---|---|---|---|
| ADR-111 | AwoooP Bootstrap Order & Identity Paradox | P0-07, P0-01, P1-01 | Three markers: platform_internal / requires_project_id / legacy_awoooi_default; classification of the 31 background loops; transition exemption schedule for AWOOOI cron/jobs; platform_resource declaration for the Ollama GCP three-tier failover |
| ADR-112 | Contract Governance & Publishing Workflow | P1-19 | Who may publish / activate; CODEOWNERS; HMAC signatures; approval workflow; activation audit; isolation between draft and published |
| ADR-113 | Active Revision Invalidation & Outbox | P1-10 | awooop_contract_outbox table design; Redis pub/sub notification; worker revision-aware cache; split-brain defense; GCP Ollama topology switch events |
| ADR-114 | Idempotency, Worker Lease & Run Recovery | P1-11, P1-12 | channel event dedupe; (project_id, channel_type, provider_event_id) unique; worker lease_until / heartbeat_at / attempt_count; stale run reaper; SKIP LOCKED |
| ADR-115 | Canonical Principal Mapping & Tenant Onboarding | P2-01, P0-08 | Unified mapping Telegram/LINE/Slack/API/Internal → platform_subject; EwoooC Proxy Adapter; Tsenyang/Bitan templates; fix plan for the telemetry.py:71 IP assert |
2.2 Security Hardening ADRs
| ADR | Topic | Issues addressed | Main content |
|---|---|---|---|
| ADR-116 | AwoooP Security Hardening | P0-04, P0-05, P0-06 | Callback nonce redesign (server_secret must take part in the HMAC); timestamp/nonce added to webhooks to prevent replay; requires_approval becomes policy-derived (the LLM may not decide it); approval_token signing spec (HS256, 15min TTL, jti uniqueness) |
| ADR-117 | MCP OAuth 2.1 & Confused Deputy Prevention | P2-05 | RFC 9728 Protected Resource Metadata with Resource Indicators (RFC 8707); RFC 7591 Dynamic Client Registration; per-tenant token scope; Confused Deputy protection design; MCP Server binding PKCE flow |
2.3 Database Hardening ADRs
| ADR | Topic | Issues addressed | Main content |
|---|---|---|---|
| ADR-118 | Row-Level Security & Tenant DB Isolation | P1-22 | RLS enabled on all AwoooP tables; injection via current_setting('app.project_id'); RLS bypass role design; migration acceptance criteria |
| ADR-119 | Durable Execution & SAGA Compensation | P2-04 | Step-level journal for multi-step agent tool chains; trigger conditions for compensating transactions; checkpoint/resume design; integration with the Phase 4 run state machine |
2.4 Observability & AI Security ADRs
| ADR | Topic | Issues addressed | Main content |
|---|---|---|---|
| ADR-120 | Token Budget Hard Kill | P1-21 | Three-tier budget limit per run / per project / per tenant; hard kill (not just an alert); RCA of the $47k agent loop incident; budget_ledger table design; Redis hot counter + PG transactional hard stop |
| ADR-121 | OTel GenAI Semantic Conventions Alignment | P2-06 | Span naming rules (gen_ai.request.*); token count attributes; LLM provider attributes; integration with the existing SignOz (188:24318); metrics label cardinality rules |
| ADR-122 | OWASP Agentic AI Top 10 & ISO 42001 Alignment | P2-07, P2-08 | Item-by-item mapping of the Top 10 to the AwoooP control plane; list of documents required by the ISO 42001 AI management system; alignment acceptance items for each Phase |
2.5 Migration Discipline ADRs
| ADR | Topic | Issues addressed | Main content |
|---|---|---|---|
| ADR-123 | Background Loop project_id Migration Strategy | P0-07, P1-01 | The 31 background loops in three categories (platform_internal / legacy_awoooi_default / requires_project_id); migration strategy per category; regression test design; completion criterion (0 unmarked loops in main.py) |
| ADR-124 | Global Singleton Decomposition for Multi-tenancy | P1-20 | List of the 13 global singletons; decomposition strategy (per-project registry / factory pattern); AWOOOI 1.0 → AwoooP 1.0 migration path; dependency order for singletons that cannot be split at the same time |
2.6 Frontend Operator Console ADRs (new)
| ADR | Topic | Issues addressed | Main content |
|---|---|---|---|
| ADR-UI-01 | AwoooP Operator Console architecture | P2-12 | Specs for the 8 UI modules; integration with the existing apps/web/; multi-tenant view design; i18n (next-intl) conventions |
| ADR-UI-02 | Contract Lifecycle UI | P2-12 | draft → publish → activate workflow; revision diff visualization; contract family filter |
| ADR-UI-03 | Run State & Shadow Monitoring UI | P2-12 | shadow/canary/active switching dashboard; run FSM visualization; display of the quantified Strangler Fig gate metrics |
| ADR-UI-04 | Tenant Budget & Audit UI | P2-12 | Per-project token budget; hard kill trigger history; audit log query (with redaction masking) |
2.7 ADR-106 Supplementary Sections
ADR-106 needs the following additions:
- Strangler Fig Quantified Gates (quantified cutover conditions)
- GCP Ollama topology impact (how the three-tier failover becomes a platform_resource that belongs to no tenant)
- Bootstrap Order, referencing ADR-111
2.8 Inventory List (9 documents)
| Inventory | Location | Scope | Issues addressed |
|---|---|---|---|
| INV-1 | docs/awooop/inventory/INV-1-redis-keys.md | grep the whole codebase for redis_client.*\(["'] and similar; list the 43+ keys with namespace, TTL, purpose, write/read sites, and whether each is hard-coded | P0-01, P1-18 |
| INV-2 | docs/awooop/inventory/INV-2-repository-project-id-retrofit.md | 30+ business tables × whether project_id exists today × all repository methods × queries that need a filter × historical data that needs backfill | P0-03 |
| INV-3 | docs/awooop/inventory/INV-3-entrypoints.md | All cron jobs / schedulers / webhooks / CLI / healthchecks / internal service calls, each tagged with one of the three types | P0-07, P1-01 |
| INV-4 | docs/awooop/inventory/INV-4-hardcoded-namespace-ip.md | Hard-coded K8s namespace (awoooi-prod), SSH host IPs, allowlists (including the new GCP IPs: 34.143.170.20, 34.21.145.224) | P0-08, P0-13 |
| INV-5 | docs/awooop/inventory/INV-5-migration-compatibility-matrix.md | Version compatibility matrix: SQLAlchemy 1.x→2.x / Alembic / Pydantic v1→v2 / FastAPI 0.x / Python 3.10→3.12; each breaking change + impact scope | critic |
| INV-6 | docs/awooop/inventory/INV-6-rollback-playbook-register.md | 6 rollback playbooks: Phase 1 schema rollback, Phase 2 Redis key rollback, Phase 5 MCP Gateway rollback, Phase 6 EwoooC rollback, Ollama GCP→Local fallback rollback, approval flow rollback | migration |
| INV-7 | docs/awooop/inventory/INV-7-pr-cutting-plan.md | 11-PR cutting plan (designed by refactor-specialist): scope, prerequisites, reviewers, and merge order of each PR | refactor |
| INV-8 | docs/awooop/inventory/INV-8-background-loop-catalog.md | The 31 background loops listed one by one: name, location (main.py line number), category marker, migration strategy, target Phase | onboarder |
| INV-9 | docs/awooop/inventory/INV-9-global-singleton-catalog.md | The 13 global singletons listed one by one: name, location, dependents, decomposition strategy, migration risk | onboarder |
2.9 Phase 0 Acceptance Criteria
- ADR-111~115 (5 AwoooP core ADRs) all Accepted
- ADR-116~124 (9 hardening ADRs) all Accepted
- ADR-UI-01~04 (4 UI ADRs) all Accepted (or Proposed + approval from the commander (統帥) to start work)
- ADR-106 supplemented with the Strangler Fig Quantified Gates + GCP Ollama sections
- INV-1~INV-9 (9 inventories) first drafts complete
- No runtime code changes
- git diff --check passes
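3. Detailed Work Items for the 8 Phases
Each item covers: goal, scope (exact paths), inputs (prerequisites), outputs (deliverables), acceptance criteria, and boundaries (what must not be touched).
Phase 1: Control Plane Schema Foundation
Goal: build the minimum viable skeleton of the PostgreSQL contract control plane, fix the three major blockers in the old SQL migration, and decide the partition strategy for high-traffic tables.
Prerequisites: all of Phase 0 complete (all ADRs + inventories)
Scope (exact files):
- apps/api/migrations/: add migration files
- apps/api/src/models/: add AwoooP SQLAlchemy models
- apps/api/src/repositories/: add AwoooP repositories
- docs/runbooks/: add partition + retention runbook
Do not touch:
- any existing repository method (left for Phase 2)
- provider behavior (ai_router.py / ollama_*.py)
- Telegram/LINE webhook paths
- apps/web/
- any K8s manifest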
Work items (executed in order):
1.1 Table name verification
- grep to confirm `incidents` (not incident_records)
- grep to confirm `mcp_audit_log` (not mcp_audit_snapshots)
- fix the ORM: SQLAlchemy 2.x mapped_column; add the missing Numeric/UniqueConstraint/func imports
- every migration must have a down migration (rollback SQL)
1.2 Task 9 order fix (must be completed before Phase 1.1)
- Dockerfile: agent_loader default path points to the ConfigMap mount
- ConfigMap preload: confirm the agent prompt path already exists in the ConfigMap
- Acceptance: dry-run one agent loader; the output is not None
1.3 AwoooP control-plane tables (new migration)
- awooop_projects (tenant master table; project_id VARCHAR PK, budget, ACL)
- awooop_contract_revisions (revision table shared by the six contracts, append-only; see §4.1 for the full columns)
- awooop_active_revisions (active pointer to a specific revision_id)
- awooop_artifact_refs (prompt/schema/eval ref + sha256 + type)
- awooop_project_migration_state (Strangler Fig stage tracking, per project × per capability)
- awooop_contract_outbox (ADR-113; active revision switch events, for worker invalidation)
- awooop_channel_event_dedupe (ADR-114; idempotency, unique key)
- awooop_platform_subjects (ADR-115; canonical principal mapping)
- awooop_budget_ledger (ADR-120; token budget, per project × per period)
1.4 High-traffic tables (partitioning must already be decided when they are created in Phase 4/7; the rules are written now)
- this Phase's migration must include a partition template comment (not executed; left for Phase 4)
- awooop_run_state → range partition by created_at (monthly)
- awooop_channel_event → range partition by created_at (monthly)
- awooop_mcp_gateway_audit → range partition by created_at (monthly)
- awooop_agent_audit_log → range partition by created_at (monthly)
- retention: 90 days hot + 1 year warm (pg_partman / cron job)
- documented in docs/runbooks/awooop-partition-retention.md
1.5 AWOOOI Bootstrap (seed data)
- INSERT INTO awooop_projects(project_id='awoooi', display_name='AWOOOI', migration_mode='legacy_awoooi_default')
- Acceptance: 0 behavior changes for AWOOOI
1.6 RLS skeleton (ADR-118)
- enable RLS on all awooop_* tables
- policy: USING (project_id = current_setting('app.project_id', TRUE))
- bypass role: awooop_platform (for the platform worker only)
- Note: RLS needs a migration + tests, not just ALTER TABLE ENABLE ROW LEVEL SECURITY
1.7 Immutability tests
- attempting to UPDATE a published contract revision → must fail (trigger or check constraint)
- draft and active are isolated: the runtime read view excludes drafts
- automation: pytest + db-expert review
RACI:
- R (Responsible): fullstack-engineer
- A (Accountable): db-expert review, commander approval
- C (Consulted): refactor-specialist (migration PR cutting), critic (final review)
- I (Informed): migration-engineer (version compatibility verification)
DoD:
- All migration up/down dry-runs pass
- AWOOOI can be represented as project_id=awoooi with 0 behavior changes
- RLS test: cross-project SELECT is rejected
- partition runbook created
Phase 2: Tenant Isolation & Namespace Hardening
Goal: before opening up to any downstream tenant, turn AWOOOI itself into a clean tenant.
Prerequisite: Phase 1 complete
Scope:
- apps/api/src/services/: Redis key migration (per INV-1)
- apps/api/src/repositories/: add project_id filter (per INV-2)
- apps/api/src/services/security_interceptor.py: nonce fix (P0-05, ADR-116)
- apps/api/src/api/v1/webhooks.py: replay protection (P0-06, ADR-116)
- apps/api/src/core/telemetry.py:71: remove the hard-coded IP assert (P0-08)
- apps/api/src/services/decision_manager.py:240: turn the silence key into a constant (P1-24)
- apps/api/src/services/ollama_auto_recovery.py:230: remove the second definition (P0-11)
- apps/api/src/plugins/mcp/mcp_bridge.py:592-681: make the namespace dynamic (P0-13)
- apps/api/src/services/consensus_engine.py: add a project prefix to CONSENSUS_PREFIX (P0-12)
- apps/api/src/hermes/nl_gateway.py: add a project prefix to Redis keys (P1-16)
- apps/api/src/services/anomaly_counter.py:790: per-project rework (P1-17)
- apps/api/src/services/incident_service.py:603: add a prefix to SCAN (P1-18)
Do not touch:
- the structure of new AwoooP Phase 1 tables other than awooop_contract_revisions
- any EwoooC / Tsenyang onboarding (left for Phase 6)
- any provider routing change (the Ollama GCP topology is settled by ADR-110 and is not changed in this Phase)
Work items:
2.1 Execute the three-stage Redis dual-write migration plan (per INV-1, in three batches)
Batch A (critical path, affects the Ollama GCP topology):
- ollama:current_primary (old) → {project_id}:ollama:primary (new)
Note: all three tiers GCP-A/GCP-B/Local must be supported at once; INV-1 must confirm every write site
- delete the second definition at ollama_auto_recovery.py:230 and unify the constant
Batch B (cost + alerting critical):
- ai_rate:total_cost:gemini → {project_id}:ai_rate:total_cost:gemini
- telegram:polling:leader → platform:telegram:polling:leader (platform_resource)
- telegram_silence:{target} → {project_id}:telegram_silence:{target}
Also update decision_manager.py:240 to import the gateway constant
Batch C (working memory):
- consensus: → {project_id}:consensus: (consensus_engine.py)
- hermes Redis keys (nl_gateway.py)
- the 6 anomaly_counter prefixes
- incident:* SCAN (incident_service.py:603)
Each batch: Phase A (dual-write, 30 days) → Phase B (dual-read, 14 days) → Phase C (remove the old keys)
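The three migration stages can be illustrated with a small wrapper. This is a minimal sketch, not the project's implementation: `DualWriteStore` is a hypothetical name, a plain dict stands in for the Redis client, and it assumes that Phase B writes only the new key while still reading the old one as a fallback. It covers keys that only gain a `{project_id}:` prefix; keys that are also renamed (such as `ollama:current_primary`) would need an explicit old-to-new mapping.

```python
from typing import Dict, Optional


class DualWriteStore:
    """Hypothetical wrapper that migrates one legacy key to a project-scoped key."""

    def __init__(self, backend: Dict[str, str], phase: str) -> None:
        if phase not in ("A", "B", "C"):
            raise ValueError(f"unknown migration phase: {phase}")
        self.backend = backend  # a dict standing in for a Redis client
        self.phase = phase

    @staticmethod
    def scoped(project_id: str, legacy_key: str) -> str:
        """Build the new key by prefixing the legacy key with the project id."""
        return f"{project_id}:{legacy_key}"

    def set(self, project_id: str, legacy_key: str, value: str) -> None:
        self.backend[self.scoped(project_id, legacy_key)] = value
        if self.phase == "A":
            # Phase A: keep the legacy key in sync so old readers still work.
            self.backend[legacy_key] = value

    def get(self, project_id: str, legacy_key: str) -> Optional[str]:
        value = self.backend.get(self.scoped(project_id, legacy_key))
        if value is None and self.phase in ("A", "B"):
            # Phases A and B: fall back to the legacy key for data not yet copied.
            value = self.backend.get(legacy_key)
        return value
```

Reading the new key first means a counter such as `ai_rate:total_cost:gemini` does not appear to reset to zero during the cutover, which is the failure described in P0-01.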
2.2 Security hardening (ADR-116)
- telemetry.py:71: remove the hard-coded "192.168.0.188" assert and switch to config-driven allowed endpoints
- security_interceptor.py:451-490: redesign the nonce so that server_secret must take part in the HMAC
- webhooks.py:679-728: add a timestamp (±5min window) + nonce (Redis dedup)
- requires_approval: read it from the policy contract; LLM output may not decide it
- approval_token: HS256, 15min TTL, jti uniqueness (Redis NX)
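A minimal sketch of the webhook replay protection listed above, using only the standard library. The message layout (`timestamp.nonce.body`), the function names, and the in-memory `seen_nonces` set are assumptions for illustration; in the plan the nonce dedup lives in Redis (for example a set-if-absent with a TTL at least as long as the window).

```python
import hashlib
import hmac
import time
from typing import Optional, Set

MAX_SKEW_SECONDS = 300  # the ±5 min window from the plan


def sign_webhook(secret: bytes, timestamp: int, nonce: str, body: bytes) -> str:
    """Sign timestamp, nonce and body together so none can be swapped alone."""
    message = f"{timestamp}.{nonce}.".encode("utf-8") + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_webhook(
    secret: bytes,
    timestamp: int,
    nonce: str,
    body: bytes,
    signature: str,
    seen_nonces: Set[str],
    now: Optional[float] = None,
) -> bool:
    current = time.time() if now is None else now
    # 1. Reject requests outside the allowed clock-skew window.
    if abs(current - timestamp) > MAX_SKEW_SECONDS:
        return False
    # 2. Constant-time signature check (bytes on both sides).
    expected = sign_webhook(secret, timestamp, nonce, body)
    if not hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8")):
        return False
    # 3. Reject a nonce that was already accepted (replay).
    if nonce in seen_nonces:
        return False
    seen_nonces.add(nonce)
    return True
```

The nonce is recorded only after the signature passes, so unauthenticated traffic cannot fill the dedup store.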
2.3 Repository project_id rework (per INV-2)
- add a project_id filter to all 30+ repository methods
- K8s namespace allowlist → tenant-aware (make mcp_bridge.py:592-681 dynamic)
- SSH host allowlist → tenant-aware (per INV-4)
2.4 Background loop marking (per ADR-123, INV-3/INV-8)
- mark each of the 31 loops as platform_internal / legacy_awoooi_default / requires_project_id
- platform_internal carries project_id=__platform__
- legacy_awoooi_default falls back to project_id=awoooi, with a written retirement schedule
2.5 Global singleton decomposition, first step (per ADR-124, INV-9)
- only this: per-project rework of AnomalyCounter (fixed under P1-17)
- the remaining 13 global singletons get a retirement schedule (they are not all split in this Phase, to limit the blast radius)
2.6 Token Budget Hard Kill foundation (ADR-120)
- budget_ledger table migration (created in Phase 1; this Phase adds the write logic)
- before every LLM call: check budget → hard kill if exceeded (not just a log entry)
- Redis hot counter + PG transactional hard stop
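The "check before every LLM call, then hard kill" rule can be sketched as follows. `BudgetGuard` and `BudgetExceeded` are hypothetical names, the counters are kept in memory instead of Redis and PostgreSQL, and the sketch assumes that a project with no configured budget is denied (fail closed).

```python
from dataclasses import dataclass, field
from typing import Dict


class BudgetExceeded(Exception):
    """Raised instead of calling the LLM; the caller must stop the run."""


@dataclass
class BudgetGuard:
    limits: Dict[str, int]  # project_id -> token limit for the period
    used: Dict[str, int] = field(default_factory=dict)

    def check(self, project_id: str, requested_tokens: int) -> None:
        """Call before every LLM request; raises rather than merely logging."""
        limit = self.limits.get(project_id)
        if limit is None:
            raise BudgetExceeded(f"no budget configured for {project_id}")
        if self.used.get(project_id, 0) + requested_tokens > limit:
            raise BudgetExceeded(f"token budget exceeded for {project_id}")

    def record(self, project_id: str, actual_tokens: int) -> None:
        """Call after the request with the real consumption."""
        self.used[project_id] = self.used.get(project_id, 0) + actual_tokens
```

A typical call site would run `guard.check(...)`, make the LLM call only if no exception was raised, and then `guard.record(...)` the actual usage.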
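RACI:
- R: fullstack-engineer + refactor-specialist (large volume of repository changes)
- A: db-expert (repository change review), vuln-verifier (security hardening PoC verification)
- C: critic (overall diff review), migration-engineer (compatibility confirmation)
- I: tool-expert (for the K8s namespace changes)
DoD:
- All P0 keys in INV-1 have gone through the three-stage migration (Phase A complete; Phase B/C in the observation period)
- cross-project tests all red (cross-project access fails; covered by pytest)
- grep -r "awoooi-prod" apps/api/src/ returns 0 results
- grep -r "192.168.0.188" apps/api/src/: the telemetry assert is gone
- vuln-verifier PoC rerun: P0-05 nonce forgery fails, P0-06 webhook replay fails
- Budget hard kill test: LLM calls are rejected once the budget is exceeded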
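Phase 3: Contract Packages & Validators
Goal: upgrade the six contracts from prose to verifiable programs.
Prerequisite: Phase 1 complete (the contract_revisions table exists)
Scope:
- packages/awooop-contracts/ (created only at this point!)
- apps/api/src/services/contract_service.py (new)
- apps/api/src/repositories/contract_repository.py (new)
Do not touch:
- any existing provider / router / telegram path
- apps/web/ (UI is left until after Phase 8)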
Work items:
3.1 Create packages/awooop-contracts/ (real content exists only from this point)
- JSON Schema for the six contracts (Project/Tenant, Agent, MCP Gateway, Policy/Routing, Run State, Channel Event)
- Pydantic v2 models matching the six contracts
- envelope schemas: platform invocation, MCP tool call, run state transition, channel event
- golden fixtures (valid × 6 + invalid × 6)
3.2 Contract lifecycle service
- draft(): creates a draft revision that the runtime cannot read
- publish(): produces an immutable published revision (body_hash = sha256(body_json))
- activate(): updates the active pointer and writes to contract_outbox (ADR-113)
- get_active(): the runtime read path; returns only published + active
- every operation is recorded in the audit log
3.3 Output schema validator middleware
- LLM reply → schema validator → on failure, retry (at most 3 times) → on failure, error code (E-SCHEMA-001)
- no LLM output that fails the schema can reach the channel adapter
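The validate-then-retry flow can be sketched with the standard library alone. The reply schema here (`action` plus `confidence`) is a hypothetical stand-in for a real contract, the names `validate_reply` and `call_with_validation` are invented for illustration, and the sketch reads "at most 3 retries" as one initial attempt plus up to three retries; the real middleware would presumably validate against the Pydantic v2 contract models.

```python
import json
from typing import Callable

MAX_RETRIES = 3  # retries after the first attempt (an assumption about the wording)


class SchemaError(Exception):
    """Raised when every attempt fails; carries the plan's error code."""

    code = "E-SCHEMA-001"


def validate_reply(raw: str) -> dict:
    """Hypothetical schema: exactly {'action': str, 'confidence': number in [0, 1]}."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("not valid JSON") from exc
    if not isinstance(data, dict) or set(data) != {"action", "confidence"}:
        raise ValueError("unexpected or missing fields")
    if not isinstance(data["action"], str):
        raise ValueError("action must be a string")
    confidence = data["confidence"]
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0 <= confidence <= 1:
        raise ValueError("confidence out of range")
    return data


def call_with_validation(generate: Callable[[int], str]) -> dict:
    """Call generate(attempt) until a reply validates or retries run out."""
    last_error = None
    for attempt in range(1 + MAX_RETRIES):
        try:
            return validate_reply(generate(attempt))
        except ValueError as exc:
            last_error = exc
    raise SchemaError(f"{SchemaError.code}: {last_error}")
```

Because only a validated dict is ever returned, a channel adapter placed behind `call_with_validation` never sees a non-conforming reply.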
3.4 Contract governance (ADR-112)
- CODEOWNERS covers packages/awooop-contracts/
- publish API: HMAC signature verification
- activate API: approval workflow (the multi_sig_redis path)
3.5 SHA-256 artifact verification
- every artifact ref includes a sha256
- the runtime verifies the hash on read (compared with the DB record)
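A minimal sketch of the two hashing steps. The plan only states body_hash = sha256(body_json); the canonical form used here (sorted keys, compact separators, UTF-8) is an assumption, chosen so that semantically equal JSON bodies hash identically. Function names are illustrative.

```python
import hashlib
import hmac
import json


def body_hash(body: dict) -> str:
    """SHA-256 hex digest of a canonical JSON encoding of the contract body."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_artifact(content: bytes, recorded_sha256: str) -> bool:
    """Compare the hash of the loaded artifact with the value stored in the DB."""
    actual = hashlib.sha256(content).hexdigest()
    return hmac.compare_digest(actual.encode("ascii"), recorded_sha256.lower().encode("utf-8"))
```

Without a canonical encoding, key order or whitespace differences between the writer and the verifier would produce different hashes for the same contract.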
DoD:
- LLM output that fails the schema cannot reach the channel adapter (integration test)
- AWOOOI's first Agent contract can be published + activated (E2E)
- prompt/schema refs must include a sha256
Phase 4: Platform Shell in Shadow Mode
Goal: build the first runtime shell; run shadow only, without changing legacy behavior.
Prerequisite: Phase 3 complete
Scope:
- apps/api/src/api/v1/platform/: add the platform runs API
- apps/api/src/services/platform_runtime.py: new
- apps/api/src/services/run_state_machine.py: Run FSM implementation (P2-02)
- apps/api/src/workers/platform_worker.py: new
- apps/api/src/services/audit_sink.py: add redaction (P1-08)
Do not touch:
- any existing /v1/incidents/ or /v1/webhooks/ path
- the Telegram bot handler (legacy stays as is)
- EwoooC onboarding (left for Phase 6)
Work items:
4.1 Run API shell (shadow only)
- POST /v1/platform/runs
- generate run_id (UUID v7) and trace_id (W3C traceparent compatible)
- resolve the active revision of the project + agent contract
- resolve EffectivePolicy (6-layer merge, without changing provider behavior)
4.2 Run State Machine (ADR-114 + ADR-119)
- States: PENDING → RUNNING → WAITING_TOOL → WAITING_APPROVAL → COMPLETED / FAILED / CANCELLED
- lease_until, heartbeat_at, attempt_count columns
- SKIP LOCKED job pickup (prevents double-pickup)
- stale run reaper (scans expired leases every minute; returns runs to PENDING or FAILED)
- SAGA step journal (ADR-119): each tool call records a step_id and a compensation command
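The state list and the reaper rule can be sketched as a transition table. The plan names the states but not the allowed edges, so the `ALLOWED` table below is an assumption, as are `max_attempts=3` and the choice to reap only RUNNING runs; timestamps are plain floats and nothing here touches the database.

```python
from enum import Enum
from typing import Dict, FrozenSet


class RunState(str, Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    WAITING_TOOL = "WAITING_TOOL"
    WAITING_APPROVAL = "WAITING_APPROVAL"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"


S = RunState
# Assumed edges; RUNNING -> PENDING exists only for the stale-run reaper.
ALLOWED: Dict[RunState, FrozenSet[RunState]] = {
    S.PENDING: frozenset({S.RUNNING, S.CANCELLED}),
    S.RUNNING: frozenset(
        {S.WAITING_TOOL, S.WAITING_APPROVAL, S.COMPLETED, S.FAILED, S.CANCELLED, S.PENDING}
    ),
    S.WAITING_TOOL: frozenset({S.RUNNING, S.FAILED, S.CANCELLED}),
    S.WAITING_APPROVAL: frozenset({S.RUNNING, S.FAILED, S.CANCELLED}),
    S.COMPLETED: frozenset(),
    S.FAILED: frozenset(),
    S.CANCELLED: frozenset(),
}


class InvalidTransition(Exception):
    pass


def transition(current: RunState, target: RunState) -> RunState:
    """Return the target state, or raise if the edge is not allowed."""
    if target not in ALLOWED[current]:
        raise InvalidTransition(f"{current.value} -> {target.value}")
    return target


def reap(
    state: RunState, lease_until: float, attempt_count: int, now: float, max_attempts: int = 3
) -> RunState:
    """Stale-run reaper: an expired RUNNING lease goes back to PENDING or to FAILED."""
    if state is not S.RUNNING or lease_until > now:
        return state
    target = S.FAILED if attempt_count >= max_attempts else S.PENDING
    return transition(state, target)
```

In a real worker the same check would run inside the transaction that picks up the row with SKIP LOCKED, so two reapers cannot recover the same run.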
4.3 Idempotency (ADR-114)
- composite unique on (project_id, channel_type, provider_event_id)
- a duplicate event returns the existing run_id (no new run is created)
- two layers of protection: Redis NX + PG constraint
4.4 Audit log redaction (ADR-116)
- audit_sink passes entries through the sanitization_service pipeline before writing
- PII / secret patterns are hard-blocked (including GCP IPs, PG password, Telegram token, and so on)
- the audit log does not record raw LLM input/output, only the hash + schema validation result
4.5 Observability (ADR-121)
- OTel GenAI span naming (gen_ai.request.*)
- token count attributes (gen_ai.usage.prompt_tokens and so on)
- metrics labels: only project_id / agent_id / status / provider (run_id/trace_id/session_id are forbidden in metrics)
- run_id / trace_id go only into logs/traces (never into metrics)
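The label rule can be enforced mechanically with an allowlist. This is a small sketch with hypothetical helper names; it is independent of any particular metrics client and only shows the filtering step that would sit in front of one.

```python
from typing import Dict, Iterable, Mapping

ALLOWED_LABELS = ("project_id", "agent_id", "status", "provider")
FORBIDDEN_LABELS = frozenset({"run_id", "trace_id", "session_id"})


def metric_labels(attributes: Mapping[str, str]) -> Dict[str, str]:
    """Keep only allow-listed, low-cardinality labels for metrics."""
    return {name: attributes[name] for name in ALLOWED_LABELS if name in attributes}


def log_fields(attributes: Mapping[str, str]) -> Dict[str, str]:
    """Logs and traces may carry every attribute, including per-run identifiers."""
    return dict(attributes)


def check_labels(labels: Iterable[str]) -> None:
    """Guard for code paths that build label sets by hand."""
    leaked = FORBIDDEN_LABELS.intersection(labels)
    if leaked:
        raise ValueError(f"high-cardinality labels in metrics: {sorted(leaked)}")
```

Each distinct label value creates a new time series, so a per-run identifier in a label grows the series count with every run; keeping those identifiers in logs and traces avoids that.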
4.6 Shadow mode wiring
- mirror 3 selected AWOOOI events to shadow (no user response is sent)
- shadow runs make 0 destructive tool calls (MCP write/execute fully blocked)
4.7 Token Budget Hard Kill (ADR-120)
- per-run token budget (from EffectivePolicy)
- over budget → hard kill → FAILED state → error code E-BUDGET-001
- after each run completes, write the actual consumption to budget_ledger
RACI:
- R: fullstack-engineer (API + service), db-expert (run state schema review)
- A: critic (shadow mode design review), vuln-verifier (redaction PoC)
- C: debugger (end-to-end trace_id design), tool-expert (OTel integration)
- I: migration-engineer (worker lease compatibility)
DoD:
- shadow runs produce 0 user-visible responses and 0 destructive tool calls (verified by vuln-verifier)
- legacy AWOOOI behavior is unchanged (regression tests pass)
- after a worker crash, the stale run is reclaimed within 1 minute (automated test)
- duplicate events do not create duplicate runs (idempotency test)
- 0 secret hits in the audit log (vuln-verifier samples 100 entries)
- exceeding the token budget triggers a hard kill (integration test)
Phase 5: MCP Gateway First Slice
Goal: move tool authorization into the Gateway, bring read-only tools in first, and solve sanitization enforcement.
Prerequisite: Phase 4 complete
Scope:
- apps/api/src/plugins/mcp/gateway.py: new MCP Gateway
- apps/api/src/plugins/mcp/registry.py:24-71: _provider → __provider (P1-05)
- apps/api/src/plugins/mcp/mcp_bridge.py: wire into the Gateway
- apps/api/src/services/sanitization_service.py: enforcement point (P1-09)
Do not touch:
- MCP write/execute tools (left for Phase 8)
- the Telegram approval flow (changes happen in Phase 8)
Work items:
5.1 MCP Gateway tables
- awooop_mcp_tool_registry (tool_id, project_id, agent_id, tool_type, allowed_scopes)
- awooop_mcp_grants (grant_id, project_id, agent_id, tool_id, granted_by, expires_at)
- awooop_mcp_credential_refs (ref_id, tool_id, k8s_secret_ref, sha256)
- awooop_mcp_gateway_audit (call_id, trace_id, run_id, tool_id, credential_ref, latency_ms, result_status)
5.2 Five-gate enforcement
- Check: Project AND Agent AND Tool AND Environment AND Approval
- any mismatch → reject + record in audit + error code E-MCP-GATE-XXX
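The five-gate check is a conjunction that stops at the first failing gate. The sketch below assumes a simplified `Grant` record and a `ToolCall` request (both hypothetical shapes, not the table schemas above) and returns the name of the failing gate; mapping gate names to concrete E-MCP-GATE-XXX codes is left out because the plan does not define it here.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Grant:
    project_id: str
    agent_id: str
    tool_id: str
    environments: FrozenSet[str]
    requires_approval: bool


@dataclass(frozen=True)
class ToolCall:
    project_id: str
    agent_id: str
    tool_id: str
    environment: str
    approved: bool = False


def first_failed_gate(call: ToolCall, grant: Grant) -> Optional[str]:
    """Return the name of the first gate that rejects the call, or None if all pass."""
    gates = (
        ("project", call.project_id == grant.project_id),
        ("agent", call.agent_id == grant.agent_id),
        ("tool", call.tool_id == grant.tool_id),
        ("environment", call.environment in grant.environments),
        ("approval", call.approved or not grant.requires_approval),
    )
    for name, passed in gates:
        if not passed:
            return name
    return None
```

A gateway would reject the call, write an audit row, and return the matching error code whenever the result is not None.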
5.3 Result sanitization enforcement (P1-04, P1-09)
- every MCP tool result must pass through the sanitization_service pipeline
- MCP Gateway adds sanitization middleware (raw results may not enter the LLM context directly)
- one layer before the LLM + one layer before the audit sink (two-layer redaction)
- SAST scan of agent code paths: 0 contact with raw credentials
5.4 _provider fix (P1-05)
- registry.py: _provider → __provider (double-underscore Python name mangling)
- add a unit test: external reflective access → AttributeError
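What the double-underscore rename does and does not guarantee can be seen in a few lines. `ToolRegistry` is a hypothetical stand-in for the class in registry.py. Inside the class body Python rewrites `self.__provider` to `self._ToolRegistry__provider`, so a lookup of the literal name `__provider` from outside raises AttributeError, but the mangled name remains reachable.

```python
class ToolRegistry:
    """Hypothetical registry holding a provider behind a name-mangled attribute."""

    def __init__(self, provider: str) -> None:
        self.__provider = provider  # stored as _ToolRegistry__provider

    def describe(self) -> str:
        return f"provider={self.__provider}"


def external_read_fails(registry: ToolRegistry) -> bool:
    """True if looking up the unmangled name from outside raises AttributeError."""
    try:
        getattr(registry, "__provider")
    except AttributeError:
        return True
    return False
```

Name mangling therefore guards against accidental access and makes the intended unit test pass, but `registry._ToolRegistry__provider` still works, so it should not be treated as a security boundary on its own; the audit guarantee has to come from routing calls through the Gateway.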
5.5 Credential isolation
- agent code does not access K8s Secrets directly
- the Gateway resolves credential_ref → returns a masked result (token substitution)
- replay test of the 2026-04-18 secret leak: kubectl describe output does not appear in the LLM context
5.6 MCP OAuth 2.1 (ADR-117)
- implement per-tenant dynamic client registration (RFC 7591)
- Resource Indicators (RFC 8707, alongside RFC 9728 Protected Resource Metadata) to prevent Confused Deputy
- PKCE flow for MCP Server binding
RACI:
- R: fullstack-engineer (Gateway service)
- A: vuln-verifier (credential isolation verification), critic (architecture review)
- C: tool-expert (MCP spec confirmation), db-expert (Gateway table design review)
- I: migration-engineer (MCP registry compatibility)
DoD:
- the 2026-04-18 secret leak replay test passes (kubectl describe output does not appear in the LLM context or in any audit row)
- SAST scan: agent code paths have 0 contact with raw credentials
- the __provider double-underscore unit test passes
- all five gates are covered by integration tests
Phase 6: EwoooC Read-Only Tenant Onboarding
Goal: validate AwoooP with a real downstream tenant, entirely read-only.
Prerequisites: Phase 5 complete; the hard-coded IP assert at telemetry.py:71 has been removed (Phase 2 complete)
Scope:
- apps/api/src/: EwoooC project provisioning
- packages/awooop-contracts/: EwoooC agent contract
- apps/api/src/services/provider_proxy.py: new Provider Proxy Adapter (P1-02)
Do not touch:
- any existing AWOOOI business logic
- MCP write/execute tools
Work items:
6.1 EwoooC project provisioning
- INSERT INTO awooop_projects(project_id='ewoooc', ...)
- must not be able to read AWOOOI data (RLS verification)
6.2 openclaw-biz agent contract
- design the I/O schema for the market intelligence domain
- security ceiling: read-only only, no infra tools
6.3 Provider Proxy Adapter (P1-02, ADR-115)
- more than just changing OLLAMA_API_BASE
- the proxy entry point forcibly injects the envelope: project_id / agent_id / trace_id / run_id
- goes through EffectivePolicy + budget guard + audit
- GCP Ollama three-tier topology: EwoooC uses the same primary/secondary/fallback routing
- read-only / model-call entry points are enabled first
6.4 Market intelligence MCP tool registration
- 4 read-only tools: market_data_fetch, product_catalog_query, competitor_analysis, trend_report
- all governed by the five MCP Gateway gates
6.5 Shadow → Canary promotion plan
- 14 days of shadow first (quantified Strangler Fig gate)
- promote to canary once the conditions are met (selected responses)
- promote to read_only once canary passes
RACI:
- R: fullstack-engineer
- A: critic (EwoooC data isolation review), vuln-verifier (cross-tenant isolation PoC)
- C: db-expert (RLS verification), migration-engineer (EwoooC rollback playbook, INV-6)
- I: tool-expert (EwoooC routing configuration for the GCP Ollama topology)
DoD:
- an EwoooC SELECT cannot read AWOOOI data (RLS + cross-tenant pytest)
- Provider Proxy Adapter E2E test: the envelope is injected correctly
- budget / audit are fully project-scoped
- EwoooC startup no longer fails on the telemetry.py IP assert
Phase 7: Communication Hub Increment
Goal: standardize channels without cutting off the existing bot.
Prerequisite: Phase 6 complete
Scope:
- apps/api/src/services/channel_hub.py: new
- apps/api/src/services/telegram_gateway.py: mirror inbound events
- apps/api/src/api/v1/platform/channel.py: new
Do not touch:
- the existing telegram bot handler (legacy stays authoritative until the quantified canary gate passes)
- LINE / Slack onboarding (left for v2)
Work items:
7.1 awooop_conversation_event + awooop_outbound_message tables
- partition by created_at (monthly; strategy fixed in Phase 1)
- retention policy configuration
7.2 Telegram inbound mirror
- copy events from the existing telegram_gateway.py into awooop_conversation_event
- canonical principal mapping (ADR-115): every sender is written to awooop_platform_subjects
7.3 Progressive Feedback Policy (P2-11)
- WAITING_TOOL / RUNNING / WAITING_APPROVAL → a transient Telegram message must be sent
- update it with edit_message (not a new message, so no notification is triggered)
- first progress message within ≤ 30s
7.4 Idempotency verification (already done in Phase 4)
- a duplicate channel retry does not create a duplicate run (integration test)
7.5 Adapter-level security
- every channel adapter: escaping + redaction + idempotency + delivery audit
- channel adapters make 0 LLM calls and 0 MCP calls (covered by pytest)
7.6 Quantified gate monitoring dashboard (together with ADR-UI-03)
- Strangler Fig gate metrics: decision divergence / p95 latency / error rate
- used for the Phase 8 promotion decision
RACI:
- R: fullstack-engineer (API + channel hub)
- A: critic (channel design review), debugger (end-to-end trace_id verification)
- C: frontend-designer (progress message UX), tool-expert (Telegram API spec confirmation)
- I: migration-engineer (channel compatibility)
DoD:
- channel adapters make 0 LLM calls and 0 MCP calls
- first progress message of an async run within ≤ 30s
- a duplicate retry does not create a duplicate run
Phase 8: Suggest & Controlled Write Paths
Goal: move up from read-only to propose, and then to controlled execute.
Prerequisites: Phase 7 complete + all Strangler Fig shadow→canary gates passed
Scope:
- apps/api/src/services/multi_sig_redis.py: approval token signing (P1-06)
- apps/api/src/services/approval_timeout_resolver.py: add trace_id (P1-15)
- apps/api/src/api/v1/platform/suggest.py: suggest mode endpoint
- Feature flags for write/execute paths
Do not touch:
- default enablement of any write/execute tool
- no auto_remediate before the quantified Strangler Fig gates pass
Work items:
8.1 Approval Token hardening (P1-06, ADR-116)
- WAITING_APPROVAL resume API: approval_token verification is mandatory (HS256, 15min TTL, jti Redis NX)
- approval state: PG is the source of truth, Redis is a cache
- expired / already decided / replayed → all rejected + error code E-APPROVAL-XXX
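The token rules (HS256, 15-minute TTL, single-use jti) can be sketched with the standard library. This is an illustrative JWT-shaped token, not the project's format: the claim names other than `exp`, `iat`, `sub` and `jti`, the function names, and the in-memory `used_jtis` set (standing in for a Redis set-if-absent) are assumptions. A production service would more likely use a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time
import uuid
from typing import Optional, Set

TTL_SECONDS = 15 * 60


class ApprovalError(Exception):
    pass


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, signing_input: str) -> str:
    return _b64(hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest())


def issue_token(secret: bytes, run_id: str, approver: str, now: Optional[float] = None) -> str:
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"run_id": run_id, "sub": approver, "iat": issued,
               "exp": issued + TTL_SECONDS, "jti": uuid.uuid4().hex}
    signing_input = ".".join(
        _b64(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return f"{signing_input}.{_sign(secret, signing_input)}"


def verify_token(secret: bytes, token: str, used_jtis: Set[str],
                 now: Optional[float] = None) -> dict:
    try:
        header_b64, payload_b64, signature = token.split(".")
        # Always verify with HMAC-SHA256; the header's alg field is never trusted.
        expected = _sign(secret, f"{header_b64}.{payload_b64}")
    except ValueError as exc:
        raise ApprovalError("malformed token") from exc
    if not hmac.compare_digest(expected.encode("ascii"), signature.encode("utf-8")):
        raise ApprovalError("bad signature")
    payload = json.loads(_unb64(payload_b64))
    current = time.time() if now is None else now
    if current >= payload["exp"]:
        raise ApprovalError("expired")
    if payload["jti"] in used_jtis:
        raise ApprovalError("replayed")
    used_jtis.add(payload["jti"])
    return payload
```

The jti is recorded only after the signature and expiry checks pass, and the payload is parsed only after the signature is verified, so forged input never reaches the JSON decoder.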
8.2 Add trace_id to multi_sig_redis.py + approval_timeout_resolver.py
- every approval operation carries a trace_id (P1-15)
- the full chain is traceable (verified by debugger)
8.3 Suggest mode for AWOOOI SRE flows
- pick 3 low-risk SRE flows (e.g., alert silence suggestions, playbook recommendations)
- suggest mode: the AI outputs a suggestion and a human decides whether to execute it
- quantified gates (ADR-106 supplementary section):
* shadow → canary: ≥14 days + divergence <5% + p95 <10% + 0 P1 incident
* canary → read_only: ≥7 days + error rate <0.5% + cost diff <50%
* read_only → suggest: ≥14 days + accept rate ≥50% + 0 hallucination escalation
* suggest → auto_remediate: ≥30 days + rollback evidence ≥3 times + approval token live + dry-run ≥99%
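The four gates can be encoded as data so that a promotion decision is a lookup plus a predicate. The metric names below are hypothetical, "p95 <10%" is read as a p95 latency regression below 10%, and a missing metric or unknown step is treated as a failed gate (fail closed); all of these are assumptions on top of the thresholds listed above.

```python
from typing import Callable, Dict, Mapping

Metrics = Mapping[str, float]

# Thresholds copied from the gate list; metric names are illustrative only.
GATES: Dict[str, Callable[[Metrics], bool]] = {
    "shadow->canary": lambda m: (
        m["days"] >= 14 and m["divergence_pct"] < 5
        and m["p95_regression_pct"] < 10 and m["p1_incidents"] == 0
    ),
    "canary->read_only": lambda m: (
        m["days"] >= 7 and m["error_rate_pct"] < 0.5 and m["cost_diff_pct"] < 50
    ),
    "read_only->suggest": lambda m: (
        m["days"] >= 14 and m["accept_rate_pct"] >= 50
        and m["hallucination_escalations"] == 0
    ),
    "suggest->auto_remediate": lambda m: (
        m["days"] >= 30 and m["rollback_evidence_count"] >= 3
        and m["approval_token_live"] == 1 and m["dry_run_pass_pct"] >= 99
    ),
}


def may_promote(step: str, metrics: Metrics) -> bool:
    """True only if the step is known and every required metric meets its threshold."""
    try:
        return bool(GATES[step](metrics))
    except KeyError:
        # Unknown step or missing metric: fail closed.
        return False
```

Keeping the thresholds in one table makes it straightforward for a dashboard and the promotion check to read the same numbers.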
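8.4 Dry-run and rollback evidence gate
- Every write/execute tool must have a dry-run mode
- Rollback playbooks are recorded in INV-6 (completed in Phase 0; verified by execution at this point)
- Record the result of every rollback as Phase 8 gate evidence
8.5 Feature Flag Registry (see §10)
- suggest mode: feature flag AWOOOP_SUGGEST_MODE (default OFF)
- controlled write: feature flag AWOOOP_WRITE_MODE (default OFF)
- Enabling requires an explicit flip; a stray environment variable must not be able to turn a flag on
8.6 vuln-verifier PoC acceptance
- Resuming WAITING_APPROVAL without a token must fail
- When Redis is down, approval can still be recovered from PG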
RACI:
- R: fullstack-engineer
- A: vuln-verifier (approval security PoC), critic (write path review)
- C: debugger (trace_id verification), db-expert (approval state PG review)
- I: migration-engineer (feature flag rollback)
DoD:
- Resuming WAITING_APPROVAL without a token is rejected (vuln-verifier PoC passes)
- Approval recovers from PG after a Redis outage (integration test)
- write/execute is OFF by default and is enabled only by a manual feature flag flip
- All quantified Strangler Fig gates pass acceptance (three-party sign-off: critic + db-expert + vuln-verifier)
4. Detailed Database Schema
4.1 awooop_contract_revisions (revision table shared by the six contracts)
CREATE TABLE awooop_contract_revisions (
    revision_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    project_id VARCHAR(64) NOT NULL REFERENCES awooop_projects(project_id),
    contract_family VARCHAR(32) NOT NULL, -- project_tenant/agent/mcp_gateway/policy_routing/run_state/channel_event
    contract_id VARCHAR(128) NOT NULL,
    version VARCHAR(32) NOT NULL,
    lifecycle_status VARCHAR(16) NOT NULL DEFAULT 'draft', -- draft/published/superseded/revoked
    body_json JSONB NOT NULL,
    body_schema_version VARCHAR(32) NOT NULL,
    body_hash CHAR(64) NOT NULL, -- SHA-256 hex
    created_by VARCHAR(128) NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    published_at TIMESTAMPTZ,
    supersedes_revision_id UUID REFERENCES awooop_contract_revisions(revision_id),
    -- Immutability constraint (body_json is already NOT NULL, so this CHECK always passes;
    -- the actual freeze is enforced by the trigger below)
    CONSTRAINT published_body_immutable CHECK (
        lifecycle_status = 'draft' OR body_json IS NOT NULL
    )
);
-- View for runtime reads (published and superseded revisions only; drafts are never visible)
CREATE VIEW awooop_published_revisions AS
SELECT * FROM awooop_contract_revisions
WHERE lifecycle_status IN ('published', 'superseded');
-- Append-only trigger
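CREATE OR REPLACE FUNCTION prevent_revision_update()
RETURNS TRIGGER AS $$
BEGIN
    -- Drafts stay editable. Once a revision has left draft, its content is frozen:
    -- only lifecycle_status may change (e.g. published → superseded/revoked),
    -- and it may never return to draft.
    IF OLD.lifecycle_status <> 'draft' AND (
           NEW.lifecycle_status = 'draft'
           OR (to_jsonb(NEW) - 'lifecycle_status') IS DISTINCT FROM (to_jsonb(OLD) - 'lifecycle_status')
       ) THEN
        RAISE EXCEPTION 'Published contract revision is immutable';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;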
CREATE TRIGGER enforce_revision_immutability
BEFORE UPDATE ON awooop_contract_revisions
FOR EACH ROW EXECUTE FUNCTION prevent_revision_update();
-- RLS
ALTER TABLE awooop_contract_revisions ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON awooop_contract_revisions
USING (project_id = current_setting('app.project_id', TRUE)
OR current_user = 'awooop_platform');
4.2 awooop_run_state (with lease + SAGA journal)
CREATE TABLE awooop_run_state (
    run_id UUID NOT NULL DEFAULT gen_random_uuid(),
    project_id VARCHAR(64) NOT NULL,
    agent_id VARCHAR(128) NOT NULL,
    trace_id CHAR(32), -- W3C trace_id hex
    parent_run_id UUID,
    status VARCHAR(32) NOT NULL DEFAULT 'PENDING',
    migration_mode VARCHAR(32) NOT NULL DEFAULT 'shadow', -- shadow/canary/read_only/suggest/auto_remediate
    -- Worker lease
    lease_until TIMESTAMPTZ,
    heartbeat_at TIMESTAMPTZ,
    attempt_count INT NOT NULL DEFAULT 0,
    worker_id VARCHAR(128),
    -- Token budget
    budget_limit_tokens BIGINT,
    tokens_used BIGINT NOT NULL DEFAULT 0,
    -- Timestamps
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    completed_at TIMESTAMPTZ,
    -- SAGA journal (step-level)
    saga_steps JSONB DEFAULT '[]', -- [{step_id, tool, status, compensation_cmd, completed_at}]
    -- Metadata
    input_hash CHAR(64), -- SHA-256 of input envelope (for audit)
    effective_policy_revision_id UUID,
    -- PostgreSQL requires a primary key on a partitioned table to include the partition key
    PRIMARY KEY (run_id, created_at)
) PARTITION BY RANGE (created_at);
-- Per-project RLS
ALTER TABLE awooop_run_state ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON awooop_run_state
USING (project_id = current_setting('app.project_id', TRUE)
OR current_user = 'awooop_platform');
4.3 awooop_budget_ledger(Token Budget Hard Kill)
CREATE TABLE awooop_budget_ledger (
ledger_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
project_id VARCHAR(64) NOT NULL,
period DATE NOT NULL, -- YYYY-MM-DD (first day of the month)
provider VARCHAR(32) NOT NULL,
tokens_input BIGINT NOT NULL DEFAULT 0,
tokens_output BIGINT NOT NULL DEFAULT 0,
cost_usd NUMERIC(12, 6) NOT NULL DEFAULT 0,
hard_kill_at NUMERIC(12, 6), -- NULL = no limit
hard_killed BOOLEAN NOT NULL DEFAULT FALSE,
last_run_id UUID,
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
UNIQUE(project_id, period, provider)
);
4.4 Eight tables to add or extend (found by db-expert)
| Table | Missing columns / missing indexes | Phase |
|---|---|---|
| incidents | Add project_id, trace_id, awooop_run_id | Phase 2 |
| playbooks | Add project_id, agent_id | Phase 2 |
| km_entries | Add project_id, namespace | Phase 2 |
| mcp_audit_log | Add trace_id, run_id, project_id; add index on (run_id) | Phase 2 |
| ai_decisions | Add project_id, run_id; add index on (run_id) | Phase 2 |
| approval_records | Add trace_id, approval_token_jti; add index on (approval_token_jti) | Phase 2/8 |
| telegram_events | Add project_id, platform_subject_id | Phase 7 |
| ollama_health_checks | Add host_tier (gcp_a/gcp_b/local), project_id=__platform__ | Phase 2 |
5. Security Remediation Plan (accepted by vuln-verifier)
5.1 Three vulnerabilities confirmed by PoC
| Vulnerability | Location | PoC status | Fix | Phase |
|---|---|---|---|---|
| Nonce forgery (server nonce does not depend on server_secret) | security_interceptor.py:451-490 | PoC confirmed a forged nonce passes verification | HMAC(server_secret + nonce), with server_secret injected from a K8s Secret | Phase 2 |
| Webhook replay (no timestamp/nonce) | webhooks.py:679-728 | PoC confirmed replay is possible | Add timestamp (±5min) + nonce Redis NX | Phase 2 |
| requires_approval is decided by LLM output | decision_manager.py (approval chain) | PoC confirmed it can be bypassed | Decided by the policy contract; LLM output must not influence it | Phase 2 |
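The webhook replay fix in the table above (a timestamp within ±5min plus a one-time nonce) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `NonceStore`, `sign`, `verify_webhook`, the Redis key prefix, and the signed message layout are all hypothetical, and the in-memory store only stands in for a Redis `SET ... NX EX` call.

```python
import hashlib
import hmac
import time

MAX_SKEW_SECONDS = 300  # the ±5min window from the table above


class NonceStore:
    """In-memory stand-in for Redis SET NX EX (hypothetical helper)."""

    def __init__(self) -> None:
        self._expiry: dict[str, float] = {}

    def set_if_absent(self, key: str, ttl: float, now: float) -> bool:
        # Drop expired entries, then claim the key only if nobody holds it.
        self._expiry = {k: exp for k, exp in self._expiry.items() if exp > now}
        if key in self._expiry:
            return False
        self._expiry[key] = now + ttl
        return True


def sign(secret: bytes, timestamp: int, nonce: str, body: bytes) -> str:
    # Assumed message layout: "<timestamp>.<nonce>." followed by the raw body.
    message = f"{timestamp}.{nonce}.".encode() + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_webhook(secret, store, timestamp, nonce, body, signature, now=None) -> bool:
    now = time.time() if now is None else now
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False  # stale or future-dated request
    expected = sign(secret, timestamp, nonce, body)
    if not hmac.compare_digest(expected, signature):
        return False  # forged or tampered request
    # Claim the nonce last so that invalid requests cannot burn nonces.
    # The TTL spans the whole window in which this timestamp could be accepted.
    return store.set_if_absent(f"webhook:nonce:{nonce}", 2 * MAX_SKEW_SECONDS, now)
```

Binding the timestamp and nonce into the signed message means an attacker cannot refresh either value on a captured request without invalidating the signature.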
5.2 approval_token specification
Signing algorithm: HS256
Payload:
- jti: UUID (uniqueness, Redis NX with 15min TTL)
- iss: "awooop-platform"
- sub: "{project_id}:{run_id}"
- aud: "awooop-approval"
- exp: now + 15min
- approval_type: "human" | "system"
- decision_scope: [tool_id, ...]
Verification:
1. Verify the signature
2. exp has not passed
3. Redis NX confirms the jti has not been used (replay protection)
4. sub matches the run_id being resumed
5. decision_scope matches the run's tools
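The five verification steps above can be sketched with the standard library alone. Everything named here is an assumption made for illustration: `issue_token` and `check_approval_token` are hypothetical helpers, a Python `set` stands in for the Redis NX jti check, "scope matches" is read as "the run's tools are a subset of decision_scope", and mapping a sub or scope mismatch to E-APPROVAL-001 is a guess. The jti is deliberately claimed last so that a token failing another check is not consumed.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(secret: bytes, claims: dict) -> str:
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url(sig)}"


def check_approval_token(token: str, secret: bytes, project_id: str, run_id: str,
                         run_tools: list, used_jtis: set,
                         now: Optional[float] = None) -> Optional[str]:
    """Return None when the token is accepted, else an E-APPROVAL-* code."""
    now = time.time() if now is None else now
    try:
        header, payload, signature = token.split(".")
        given = _b64url_decode(signature)
    except ValueError:  # wrong number of parts or bad base64
        return "E-APPROVAL-001"
    # The algorithm is pinned to HS256 here; the header's "alg" is never trusted.
    expected = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, given):              # step 1: signature
        return "E-APPROVAL-001"
    claims = json.loads(_b64url_decode(payload))
    if claims["exp"] <= now:                                  # step 2: expiry
        return "E-APPROVAL-002"
    if claims["sub"] != f"{project_id}:{run_id}":             # step 4: sub
        return "E-APPROVAL-001"
    if not set(run_tools) <= set(claims["decision_scope"]):   # step 5: scope
        return "E-APPROVAL-001"
    if claims["jti"] in used_jtis:                            # step 3: replay
        return "E-APPROVAL-003"
    used_jtis.add(claims["jti"])
    return None
```

A production version would also check iss and aud and would back `used_jtis` with Redis plus the PG approval state; those parts are omitted to keep the sketch short.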
5.3 vuln-verifier acceptance checklist per Phase
- Phase 2: nonce forgery fails, webhook replay fails, requires_approval cannot be decided by the LLM
- Phase 4: 0 secret hits in the audit log (sample of 100 records)
- Phase 5: 0 raw credentials on agent code paths (SAST)
- Phase 6: cross-tenant isolation PoC (EwoooC cannot read AWOOOI)
- Phase 8: resume without an approval token is rejected; approval recovers from PG after a Redis outage
6. Complete API Endpoint List (added by fullstack)
6.1 Existing (unchanged)
- POST /v1/webhooks/telegram
- POST /v1/webhooks/alertmanager
- GET /v1/incidents/
- POST /v1/decisions/
6.2 Added in Phase 4 (Platform Shell)
- POST /v1/platform/runs: create a run (async)
- GET /v1/platform/runs/{run_id}: query run state
- GET /v1/platform/runs/{run_id}/steps: query SAGA steps
- POST /v1/platform/runs/{run_id}/cancel: cancel a run
6.3 Added in Phase 4-5 (Approval)
- POST /v1/platform/runs/{run_id}/approve: resume with an approval_token
- POST /v1/platform/runs/{run_id}/reject: reject (with a reason)
6.4 Added in Phase 6 (Tenant)
- POST /v1/platform/projects: create a project (admin only)
- GET /v1/platform/projects/{project_id}/migration_state: query Strangler Fig state
- POST /v1/platform/projects/{project_id}/contracts: create a contract draft
- POST /v1/platform/projects/{project_id}/contracts/{contract_id}/publish: publish
- POST /v1/platform/projects/{project_id}/contracts/{contract_id}/activate: activate
6.5 Added in Phase 7 (Channel Hub)
- GET /v1/platform/channel_events: query conversation events (with pagination)
- POST /v1/platform/outbound: send an outbound message (admin/test)
7. Error Code Dictionary (9 must be added)
| Error Code | HTTP Status | Description | Scenario |
|---|---|---|---|
| E-SCHEMA-001 | 422 | LLM output schema validation failed | Phase 3 contract validator |
| E-BUDGET-001 | 429 | Token budget hard kill triggered | Phase 4 budget guard |
| E-APPROVAL-001 | 401 | approval_token missing or invalid | Phase 8 approval resume |
| E-APPROVAL-002 | 401 | approval_token expired | Phase 8 |
| E-APPROVAL-003 | 409 | approval_token already used (replay) | Phase 8 |
| E-MCP-GATE-001 | 403 | MCP tool not authorized for this project | Phase 5 |
| E-MCP-GATE-002 | 403 | MCP tool not authorized for this agent | Phase 5 |
| E-MCP-GATE-003 | 403 | MCP write/execute tool blocked (not in auto_remediate mode) | Phase 5/8 |
| E-TENANT-001 | 403 | Cross-tenant data access blocked | Phase 2+ |
| E-IDEMPOTENT-001 | 200 | Duplicate event, returning existing run_id | Phase 4 |
| E-RATE-001 | 429 | Project rate limit exceeded | Phase 2+ |
| E-SAGA-001 | 500 | SAGA compensation failed, manual intervention required | Phase 4/ADR-119 |
8. Frontend Operator Console (frontend-designer, 8 modules)
Implementation comes after Phase 8 (an Operator Console prototype is possible in Phase 6). ADR-UI-01~04 define the architecture; this section is the list of work items.
| Module | Description | Priority |
|---|---|---|
| Tenant Management | project list, creation, migration_state visualization, budget settings | P1 (Phase 6 prototype) |
| Contract Lifecycle | draft/publish/activate actions, revision diff, filter by the six contract families | P1 (Phase 6 prototype) |
| Run Monitor | run FSM visualization, shadow/canary/active badges, trace_id drill-down | P1 (after Phase 4) |
| Strangler Fig Dashboard | real-time dashboard of the quantified shadow→canary gate metrics (divergence / latency / error rate) | P1 (after Phase 7) |
| Budget & Cost | per-project token budget, hard-kill trigger history, cost trend (GCP Ollama vs paid provider) | P2 |
| Audit Log Viewer | audit log search (post-redaction), secret-hit warnings, trace_id correlation | P2 |
| MCP Gateway Admin | tool registry, grants management, credential refs (masked), audit | P2 |
| Principal Directory | platform_subject lookup, Telegram/LINE/API user mapping | P3 |
Integration with the existing design system:
- Must use next-intl (no hardcoded Chinese/English strings)
- No emoji; use Lucide/SVG icons
- Follow the site-wide design rules in feedback_design_system_consistency.md
- No direct access to internal IPs (feedback_frontend_internal_ip_ban.md)
9. Refactoring Cut Plan (11 PRs, refactor-specialist)
Every PR must be independently mergeable, have a rollback path, and not depend on a later PR.
| PR# | Title | Prerequisite PR | Scope of change | Risk |
|---|---|---|---|---|
| PR-01 | Remove the hardcoded IP assert at telemetry.py:71 | None | 1 line | Low |
| PR-02 | Turn the silence key at decision_manager.py:240 into a constant | None | 2 lines | Low |
| PR-03 | Remove the second definition at ollama_auto_recovery.py:230 | None | ~5 lines | Low |
| PR-04 | _provider → __provider (registry.py) | None | ~20 lines | Low |
| PR-05 | Make the mcp_bridge.py namespace dynamic | None | ~30 lines | Medium |
| PR-06 | Add a project prefix to CONSENSUS_PREFIX in consensus_engine.py | Phase 2 Redis dual-write Phase A | ~15 lines | Medium |
| PR-07 | Nonce redesign + webhook timestamp/nonce (ADR-116) | None | ~100 lines | High (security fix) |
| PR-08 | Repository project_id filter, batch 1 (incidents/playbooks/km) | Phase 1 schema | ~200 lines | Medium |
| PR-09 | Repository project_id filter, batch 2 (mcp/ai_decisions/approval) | PR-08 | ~200 lines | Medium |
| PR-10 | Background loop tagging (31 loops, main.py) | ADR-123 | ~150 lines | Medium |
| PR-11 | AnomalyCounter per-project rework | PR-10 | ~80 lines | Medium |
PR-01~05 can run in parallel (no dependencies); whichever is ready first goes first. PR-06~07 require Redis dual-write Phase A to be finished first. PR-08~09 require the Phase 1 schema to be finished first.
10. Feature Flag / Kill-Switch Registry
| Flag | Default | Description | Enable condition |
|---|---|---|---|
| AWOOOP_SHADOW_MODE | OFF | Enable shadow runs (mirrored, no response sent) | Manual flip after Phase 4 completes |
| AWOOOP_CANARY_MODE | OFF | Enable canary (partial user-visible responses) | Shadow gate passed quantitatively over 14 days |
| AWOOOP_READ_ONLY_MODE | OFF | Move read-only queries to AwoooP | Canary gate passed quantitatively over 7 days |
| AWOOOP_SUGGEST_MODE | OFF | The AI suggests, a human decides | read_only gate passed over 14 days |
| AWOOOP_WRITE_MODE | OFF | Enable controlled write/execute tools | Suggest gate passed over 30 days + rollback evidence ≥3 |
| AWOOOP_BUDGET_HARD_KILL | ON | Terminate outright when the token budget is exceeded (not just alert) | ON by default (ADR-120) |
| AWOOOP_MCP_OAUTH21 | OFF | MCP OAuth 2.1 flow (ADR-117) | After Phase 5 completes |
| AWOOOP_RLS_STRICT | OFF | Strict RLS mode (no awooop_platform bypass) | Phase 2 complete + 30-day soak |
| AWOOOP_EWOOOC_LIVE | OFF | Switch the EwoooC tenant to live (not shadow) | Phase 6 canary passed over 7 days |
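One way to honor the "explicit flip only" rule is to resolve flags from a fixed registry and accept nothing but the exact strings ON and OFF, so that a stray value such as "true" or "1" cannot enable a path by accident. The sketch below is only an illustration: `DEFAULTS` mirrors the table above, while `resolve_flags` and the idea that overrides arrive as a reviewed mapping are assumptions.

```python
# Defaults copied from the registry table; only the hard-kill flag starts ON.
DEFAULTS = {
    "AWOOOP_SHADOW_MODE": False,
    "AWOOOP_CANARY_MODE": False,
    "AWOOOP_READ_ONLY_MODE": False,
    "AWOOOP_SUGGEST_MODE": False,
    "AWOOOP_WRITE_MODE": False,
    "AWOOOP_BUDGET_HARD_KILL": True,
    "AWOOOP_MCP_OAUTH21": False,
    "AWOOOP_RLS_STRICT": False,
    "AWOOOP_EWOOOC_LIVE": False,
}


def resolve_flags(explicit: dict) -> dict:
    """Apply explicit overrides on top of the defaults (hypothetical helper)."""
    flags = dict(DEFAULTS)
    for name, raw in explicit.items():
        if name not in DEFAULTS:
            raise KeyError(f"unknown flag: {name}")
        if raw not in ("ON", "OFF"):
            # Reject truthy look-alikes instead of guessing what they mean.
            raise ValueError(f"{name} must be exactly ON or OFF, got {raw!r}")
        flags[name] = raw == "ON"
    return flags
```

For example, `resolve_flags({"AWOOOP_SHADOW_MODE": "ON"})` turns on shadow mode and leaves every write path at its default.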
11. Runbook List (8 runbooks, requested by debugger)
| Runbook | Location | Trigger | Main steps |
|---|---|---|---|
| RB-01: AwoooP Contract Publish Failure | docs/runbooks/awooop-contract-publish-failure.md | Schema validation failure, CODEOWNERS reject | 1. Check body_hash 2. Check draft status 3. Roll back to the previous active revision |
| RB-02: Run State Stuck / Stale Lease | docs/runbooks/awooop-run-stuck.md | Run stuck in RUNNING > 10min | 1. Check lease_until 2. Run the reaper manually 3. Inspect saga_steps to decide whether to compensate or abandon |
| RB-03: Budget Hard Kill Triggered | docs/runbooks/awooop-budget-hard-kill.md | E-BUDGET-001 appears in bulk | 1. Check budget_ledger 2. Confirm the hard_kill_at threshold 3. Determine whether an incident storm is under way 4. Raise the limit temporarily or wait for next month's reset |
| RB-04: Phase Rollback (Strangler Fig) | docs/runbooks/awooop-phase-rollback.md | Canary error rate > threshold | 1. Switch project_migration_state back to the previous mode 2. Clear the Redis canary cache 3. Notify EwoooC (if affected) |
| RB-05: Approval Token Replay Alert | docs/runbooks/awooop-approval-replay.md | E-APPROVAL-003 appears | 1. Check the jti Redis key 2. Confirm IP / user 3. Revoke the token 4. Notify security |
| RB-06: Cross-Tenant Data Leak Alert | docs/runbooks/awooop-cross-tenant-leak.md | E-TENANT-001 appears in bulk | 1. Stop canary/active mode immediately 2. Check the audit log 3. Verify the RLS configuration 4. Assess a PITR restore |
| RB-07: GCP Ollama Failover Anomaly | docs/runbooks/awooop-gcp-ollama-failover.md | GCP-A/B both down and the Local fallback also down | 1. Check the platform:ollama:primary Redis key 2. Set the fallback manually 3. Confirm emergency routing to a paid provider |
| RB-08: SAGA Compensation Failure | docs/runbooks/awooop-saga-compensation-fail.md | E-SAGA-001 appears | 1. Inspect the saga_steps JSON 2. Find the failed step 3. Run the compensation command manually 4. Update the run status |
12. Tooling Reinforcement Plan (tool-expert)
| Tool | Purpose | Install location | Phase |
|---|---|---|---|
| PgBouncer | PG connection pooling so that multiple AwoooP workers do not exhaust connections | K8s sidecar or standalone Pod | Before Phase 4 |
| Sealed Secrets | Replaces plaintext K8s Secrets; CI/CD security | K3s cluster | Phase 2 (during security hardening) |
| OPA / Cedar | Policy engine; centralizes authorization logic (replacing logic scattered through the code) | As a sidecar or admission webhook | Before Phase 5 |
| chaostoolkit / LitmusChaos | Chaos verification of Strangler Fig switchovers (worker crash, Redis outage, PG timeout) | CI pipeline | After Phase 4 completes |
| awooop-ctl | AwoooP CLI (contract CRUD / run queries / migration state management) | Local CLI + CI | Before Phase 6 |
| pg_partman | Automatic PostgreSQL partition management | K8s Pod / cron | Phase 4 (before run_state goes live) |
| pgvector (already present) | KM vector search | Already in place; needs a per-project namespace | Phase 2 |
| OpenTelemetry Collector | OTel pipeline (ADR-121); currently sends directly to SigNoz at 188:24318, and will need sampling later | K8s DaemonSet | Before Phase 4 |
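13. Industry Alignment (found by web-researcher)
13.1 Lessons from the $47k agent loop incident (Token Budget Hard Kill)
Problem: alert ≠ enforcement. Only a Prometheus alert fired while the agent kept running, and a single loop burned $47k.
AwoooP approach (ADR-120):
- Three budget limit layers: per-run / per-project / per-tenant
- Hard Kill: over budget → terminate the run outright (not just log/alert)
- Redis hot counter (decremented on every call) + PG budget_ledger transaction (final decision)
- The AWOOOP_BUDGET_HARD_KILL feature flag defaults to ON (the only flag that is on by default)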
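The difference between alerting and enforcing can be shown with a small guard that refuses a call before any tokens are spent. This is a sketch under assumptions: `BudgetGuard` and `try_charge` are hypothetical names, the in-memory counters stand in for the Redis hot counter, and the PG budget_ledger transaction that makes the final decision is left out.

```python
from dataclasses import dataclass


@dataclass
class BudgetGuard:
    # Remaining tokens per layer, e.g. {"run": 100, "project": 1000, "tenant": 5000}.
    remaining: dict

    def try_charge(self, tokens: int) -> bool:
        """Reserve tokens on all three layers, or refuse without spending any."""
        # Enforcement, not alerting: if any layer cannot cover the call,
        # nothing is deducted and the caller must terminate the run (E-BUDGET-001).
        if any(tokens > left for left in self.remaining.values()):
            return False
        for layer in self.remaining:
            self.remaining[layer] -= tokens
        return True
```

A worker would call `try_charge` before each LLM or tool call and stop the run the first time it returns False, rather than logging and continuing.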
13.2 Durable Execution / SAGA compensating transactions (ADR-119)
Industry standard (Temporal / Conductor / Azure Durable Functions): a multi-step tool chain must have a step-level journal plus a compensation mechanism.
AwoooP approach:
- saga_steps JSONB column on awooop_run_state
- Each tool call records: step_id / tool / status / compensation_cmd / completed_at
- On failure, run the compensation commands (the reverse operations)
- Compensation failure → E-SAGA-001 + Runbook RB-08
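The compensation pass over saga_steps might look like the following. It is a minimal sketch: the status values ("completed", "compensated", "compensation_failed") and the `run_compensation` callback are assumptions, since the source only lists the journal fields.

```python
def compensate(saga_steps: list, run_compensation) -> "str | None":
    """Undo completed steps newest-first; return an error code or None."""
    for step in reversed(saga_steps):
        if step["status"] != "completed":
            continue  # steps that never completed have nothing to undo
        try:
            run_compensation(step["compensation_cmd"])
            step["status"] = "compensated"
        except Exception:
            # Stop at the first failure and hand over to Runbook RB-08.
            step["status"] = "compensation_failed"
            return "E-SAGA-001"
    return None
```

Compensating in reverse order matters because a later step may depend on the side effects of an earlier one.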
13.3 MCP OAuth 2.1 Confused Deputy(ADR-117)
MCP spec 2025-06-18 requires:
- per-tenant dynamic client registration (RFC 7591)
- Resource Indicators (RFC 8707): prevent a token from being used across resource servers
- PKCE (RFC 7636): prevent authorization code interception
AwoooP approach (ADR-117):
- Dynamic client registration per tenant; client_id is never shared
- The Resource Indicator must match the target URI in the tool registry
- The E-MCP-GATE-001/002/003 error codes cover Confused Deputy scenarios
13.4 OTel GenAI Semantic Conventions (ADR-121)
Official specification (opentelemetry-specification/semantic_conventions/gen-ai):
- span naming: gen_ai.{system}.{operation} (e.g., gen_ai.anthropic.chat)
- token attributes: gen_ai.usage.input_tokens / gen_ai.usage.output_tokens
- model attributes: gen_ai.request.model / gen_ai.response.model
AwoooP approach: every LLM call must emit the attributes above, sent to SigNoz (188:24318).
13.5 OWASP Agentic AI Top 10 alignment (ADR-122)
| OWASP item | AwoooP control |
|---|---|
| OAI-01 Prompt Injection | MCP Gateway result sanitization + schema validator |
| OAI-02 Insecure Tool Use | Five-gate MCP enforcement + audit |
| OAI-03 Excessive Agency | requires_approval from policy (the LLM must not decide) + write/execute feature flag |
| OAI-04 Supply Chain | contract publish HMAC + artifact SHA-256 |
| OAI-05 Data Leakage | audit log redaction + credential isolation |
| OAI-06 Insufficient Observability | OTel GenAI + audit sink + run trace_id |
| OAI-07 Unsafe Orchestration | SAGA journal + compensation + hard kill |
| OAI-08 Memory Vulnerabilities | contract revision immutability + RLS |
| OAI-09 Access Control Bypass | approval_token HS256 + jti replay prevention |
| OAI-10 Resource Exhaustion | Token Budget Hard Kill (ADR-120) |
14. Impact of the GCP Ollama Topology on AwoooP (ADR-110 integration)
14.1 New topology (ADR-110, effective 2026-05-03)
Primary  : GCP-A http://34.143.170.20:11434 (SSD, 9x load speed)
Secondary: GCP-B http://34.21.145.224:11434 (SSD, standby)
Fallback : Local http://192.168.0.111:11434 (HDD, last line of defense)
Emergency: Gemini → Nemotron → Claude (when every Ollama host is down)
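Tier selection over this topology can be sketched as a walk through the tiers in order, falling through to the emergency provider chain only when no Ollama host is healthy. The `pick_backend` function and the `is_healthy` callback are hypothetical; how health is actually probed is not specified in this section.

```python
# Tiers in priority order, copied from the ADR-110 topology.
TOPOLOGY = [
    ("primary", "http://34.143.170.20:11434"),
    ("secondary", "http://34.21.145.224:11434"),
    ("fallback", "http://192.168.0.111:11434"),
]
EMERGENCY_PROVIDERS = ["Gemini", "Nemotron", "Claude"]


def pick_backend(is_healthy):
    """Return (tier, target) for the first healthy tier, in topology order."""
    for tier, url in TOPOLOGY:
        if is_healthy(url):
            return tier, url
    # Every Ollama host is down: hand back the paid-provider chain, in order.
    return "emergency", list(EMERGENCY_PROVIDERS)
```

Keeping the order in one list mirrors the idea of storing the whole topology under a single key instead of a single primary URL.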
14.2 Impact items AwoooP must handle
| Impact | Location | Handling | Phase |
|---|---|---|---|
| ollama:current_primary Redis key dual-write (supports only 1 URL; 3 tiers are now needed) | INV-1 | Change to platform:ollama:topology (JSON: primary/secondary/fallback) | Phase 2 |
| Second definition at ollama_auto_recovery.py:230 (P0-11) | ollama_auto_recovery.py | Remove it; read from config only | Phase 2 PR-03 |
| GCP IPs added to INV-4 (34.143.170.20, 34.21.145.224) | INV-4 | Add to the allowed IP list; confirm K8s NetworkPolicy egress is configured | Phase 0 INV-4 |
| EwoooC Provider Proxy routes through GCP Ollama | Phase 6 | EwoooC shares the platform Ollama topology (platform_resource) | Phase 6 |
| IP assert at telemetry.py:71 (P0-08) | telemetry.py:71 | Once removed, GCP IPs no longer trigger the assert; make it config-driven | Phase 2 PR-01 |
| budget_ledger records Ollama usage (free GCP still needs token counting) | Phase 4 | Ollama calls must also record token consumption (budget_ledger) | Phase 4 |
| Runbook RB-07 (GCP Ollama failover anomaly) | docs/runbooks/ | Write the runbook in Phase 0; run the real E2E test after Phase 4 | Phase 0 |
14.3 GCP Ollama as a platform_resource (ADR-111)
GCP Ollama (34.143.170.20, 34.21.145.224) and Local Ollama (192.168.0.111) are all declared as platform_resource:
- They belong to no tenant
- All tenants (AWOOOI / EwoooC / Tsenyang / Bitan) share them, but audit records carry each tenant's own project_id
- The platform:ollama:topology Redis key uses the platform: prefix (not {project_id}:)
15. Work Ordering Summary (with parallel groups + Critical Path)
Critical Path (executed sequentially; no step may be skipped)
Phase 0: all ADR/INV
→ Phase 1 Schema (PR-01/02/03/04/05 can be done early, in parallel)
→ Phase 2 Security Hardening + Redis migration (PR-06~11)
→ Phase 3 Contract Packages
→ Phase 4 Platform Shell (PgBouncer + OPA/pg_partman prepared alongside)
→ Phase 5 MCP Gateway
→ Phase 6 EwoooC (14-day shadow gate)
→ Phase 7 Channel Hub (7-day canary gate)
→ Phase 8 Suggest + Write (30-day suggest gate)
Parallel work groups
| Group | Work | Can run in parallel with |
|---|---|---|
| G-A (Phase 0, parallel) | ADR-111~115, each independent | All in parallel (5 ADRs, one owner each) |
| G-B (Phase 0, parallel) | ADR-116~124 | In parallel with G-A |
| G-C (Phase 0, parallel) | INV-1~INV-9 (some depend on codebase exploration) | In parallel with G-A/G-B |
| G-D (Phase 2, parallel) | PR-01/02/03/04/05 (independent small fixes) | All in parallel |
| G-E (Phase 2, parallel) | Redis dual-write + repository rework + security hardening | Independent of each other, but security hardening takes priority |
| G-F (Phase 4, parallel) | PgBouncer install + pg_partman install + OPA install | In parallel with Phase 3 Contract Packages |
| G-G (Phase 5-6, parallel) | Operator Console prototype (ADR-UI-01~04) | In parallel with Phase 6 EwoooC shadow |
Full ordering table
| Order | Work | docs-only | Parallel group | Blocks |
|---|---|---|---|---|
| 1-A | ADR-111 Bootstrap Order | ✅ | G-A | Phase 2 |
| 1-B | ADR-112 Contract Governance | ✅ | G-A | Phase 3 |
| 1-C | ADR-113 Active Revision Outbox | ✅ | G-A | Phase 1 |
| 1-D | ADR-114 Idempotency & Worker Lease | ✅ | G-A | Phase 4 |
| 1-E | ADR-115 Principal Mapping | ✅ | G-A | Phase 6, 7 |
| 2-A | ADR-116 Security Hardening | ✅ | G-B | Phase 2 |
| 2-B | ADR-117 MCP OAuth 2.1 | ✅ | G-B | Phase 5 |
| 2-C | ADR-118 RLS Strategy | ✅ | G-B | Phase 1 |
| 2-D | ADR-119 Durable Execution SAGA | ✅ | G-B | Phase 4 |
| 2-E | ADR-120 Token Budget Hard Kill | ✅ | G-B | Phase 4 |
| 2-F | ADR-121 OTel GenAI | ✅ | G-B | Phase 4 |
| 2-G | ADR-122 OWASP Agentic AI | ✅ | G-B | All Phases |
| 2-H | ADR-123 Background Loop Migration | ✅ | G-B | Phase 2 |
| 2-I | ADR-124 Global Singleton Decomposition | ✅ | G-B | Phase 2 |
| 2-J | ADR-UI-01~04 Operator Console ADR | ✅ | G-B | Phase 6+ |
| 2-K | ADR-106 Quantified Gates addendum | ✅ | G-B | Phase 8 |
| 3-A | INV-1 Redis Keys | ✅ | G-C | Phase 2 |
| 3-B | INV-2 Repository Retrofit Map | ✅ | G-C | Phase 2 |
| 3-C | INV-3 Entrypoints | ✅ | G-C | Phase 2 |
| 3-D | INV-4 Hardcoded Namespace/IP (incl. GCP IPs) | ✅ | G-C | Phase 2 |
| 3-E | INV-5 Migration Compatibility Matrix | ✅ | G-C | Phase 1 |
| 3-F | INV-6 Rollback Playbook Register | ✅ | G-C | Phase 4 |
| 3-G | INV-7 PR Cutting Plan | ✅ | G-C | Phase 2 |
| 3-H | INV-8 Background Loop Catalog (31 loops) | ✅ | G-C | Phase 2 |
| 3-I | INV-9 Global Singleton Catalog (13 singletons) | ✅ | G-C | Phase 2 |
| 4 | Task 9 ordering fix (Dockerfile/ConfigMap) | ❌ | - | Phase 1 |
| 5 | Phase 1 Schema Migration (rewritten version) | ❌ | - | Phase 2~8 |
| 6-A | PR-01/02/03/04/05 (small fixes, in parallel) | ❌ | G-D | Phase 2 |
| 6-B | Phase 2 Security Hardening (PR-07 first) | ❌ | G-E | Phase 4 |
| 6-C | Phase 2 Redis dual-write + Repository (PR-06/08/09/10/11) | ❌ | G-E | Phase 4 |
| 7 | Phase 3 Contract Packages (packages/awooop-contracts/) | ❌ | - | Phase 4 |
| 8-A | PgBouncer + pg_partman + OPA installation | ❌ | G-F | Phase 4 |
| 8-B | Phase 4 Platform Shell + Shadow (incl. SAGA + Budget Kill) | ❌ | - | Phase 5 |
| 9 | Phase 5 MCP Gateway (incl. OAuth 2.1) | ❌ | - | Phase 6 |
| 10-A | Phase 6 EwoooC Shadow Onboarding (14-day gate) | ❌ | G-G | Phase 7 |
| 10-B | Operator Console prototype (G-G) | ❌ | G-G | Phase 7+ |
| 11 | Phase 7 Channel Hub (7-day canary gate) | ❌ | - | Phase 8 |
| 12 | Phase 8 Suggest + Controlled Write (30-day gate) | ❌ | - | AwoooP v1 GA |
Items 1-A through 3-I are all docs-only and can be completed back-to-back in the current conversation window; only after they are done is a new Codex conversation opened to begin Phase 1 code.
16. Quantified Acceptance Thresholds (full version)
Strangler Fig Gates
| Transition | Quantified conditions | Sign-off |
|---|---|---|
| pre → shadow | tenant created + agent contract published + audit/trace writes working | confirmed by critic |
| shadow → canary | ≥14 days + decision divergence < 5% + p95 regression < 10% + 0 P0/P1 incidents + 0 secrets in audit | critic + db-expert + vuln-verifier |
| canary → read_only | ≥7 days + user-visible error rate < 0.5% + cost diff < 50% of budget | critic + vuln-verifier |
| read_only → suggest | ≥14 days + suggest accept rate ≥ 50% + 0 hallucination escalations | critic |
| suggest → auto_remediate | ≥30 days + ≥ 3 successful rollback evidence records + approval token live + dry-run pass ≥ 99% | critic + db-expert + vuln-verifier |
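A gate check of this kind reduces to comparing measured values against fixed thresholds and reporting which conditions failed. The sketch below covers only the shadow → canary row; `ShadowMetrics` and `shadow_to_canary_failures` are hypothetical names, and rates are assumed to be fractions (0.05 means 5%).

```python
from dataclasses import dataclass


@dataclass
class ShadowMetrics:
    days: int
    decision_divergence: float  # fraction of decisions that differ from the legacy path
    p95_regression: float       # relative p95 latency regression
    p0_p1_incidents: int
    audit_secret_hits: int


def shadow_to_canary_failures(m: ShadowMetrics) -> list:
    """Return the names of unmet conditions; an empty list means the gate passes."""
    checks = {
        "days >= 14": m.days >= 14,
        "decision divergence < 5%": m.decision_divergence < 0.05,
        "p95 regression < 10%": m.p95_regression < 0.10,
        "0 P0/P1 incidents": m.p0_p1_incidents == 0,
        "0 secrets in audit": m.audit_secret_hits == 0,
    }
    return [name for name, ok in checks.items() if not ok]
```

Returning the failed condition names rather than a bare boolean gives the sign-off reviewers something concrete to attach to the gate record.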
Phase acceptance thresholds (quantified additions)
| Phase | Required quantified metrics |
|---|---|
| Phase 1 | migration up/down dry-run passes; RLS cross-project rejection rate 100%; 0 behavior changes in AWOOOI (regression pass rate 100%) |
| Phase 2 | INV-1 P0 key migration completion 100%; vuln-verifier PoC pass rate 3/3; hardcode grep returns 0 results |
| Phase 3 | contract schema coverage 100% (6 families); invalid fixture rejection rate 100% |
| Phase 4 | shadow runs produce 0 user-visible responses; duplicate events map to a single run 100% of the time; stale reaper reclaims 100% within 1min |
| Phase 5 | credential leak test pass rate 100%; Five-gate integration test coverage 100% |
| Phase 6 | cross-tenant data access rejection rate 100%; EwoooC 14-day shadow gate passed |
| Phase 7 | first progress message within ≤ 30s at least 99% of the time; duplicate retries create 0 duplicate runs |
| Phase 8 | approval replay rejection rate 100%; write/execute default-OFF verified |
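17. Related Document Index
- ADR-106: AwoooP architecture
- ADR-107: Control-plane storage strategy
- ADR-110: GCP Ollama three-tier disaster-recovery topology
- MASTER-WORKPLAN.md (the master index that this document expands)
- IMPLEMENTATION-ROADMAP.md (historical document, old draft)
- To be created: docs/awooop/inventory/INV-1~INV-9
- To be created: ADR-111~ADR-124 (AwoooP-specific ADR series)
- To be created: ADR-UI-01~ADR-UI-04 (Operator Console ADRs)
- To be created: docs/runbooks/RB-01~RB-08
Last updated: 2026-05-03 (Taipei time zone)
Created by: 12-Agent joint review × Codex integration
Next step: Phase 0 docs-only work (starting with ADR-111); once complete, open a new Codex conversation to begin Phase 1 code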