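# AwoooP Complete Detailed Implementation Plan
**Version**: v1.0 (consolidated after the 12-Agent panoramic review)
**Date**: 2026-05-03 (Taipei time)
**Author**: 12-Agent joint review × Codex consolidation
**Base documents**: MASTER-WORKPLAN.md, ADR-106, ADR-107
**⚠️ ADR numbering correction**: ADR-108/109/110 are already taken by other ADRs → AwoooP-specific ADRs start at ADR-111
> This document is the fully expanded version of MASTER-WORKPLAN.md.
> MASTER-WORKPLAN is the master index; this document holds the execution details.
> In case of any conflict, this document prevails (it was updated more recently).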
---
## 0. Background Overview
### 0.1 Current Infrastructure State (as of 2026-05-03)
| Component | Current state | Notes |
|------|------|------|
| Ollama Primary | GCP-A `34.143.170.20:11434` (SSD) | ADR-110 (supersedes ADR-105) |
| Ollama Secondary | GCP-B `34.21.145.224:11434` (SSD) | New; live since 2026-05-03 |
| Ollama Fallback | Local `192.168.0.111:11434` (HDD) | Last line of defense, not a primary |
| PostgreSQL | `192.168.0.188` (private network) | Sole source of truth for the AwoooP control plane |
| Redis | `192.168.0.188` (private network) | cache/watch/counter only (ADR-107 D4) |
| K3s cluster | `awoooi-prod` namespace | AWOOOI first tenant |
| Gitea CI/CD | `192.168.0.110` (or Gitea Cloud) | ADR-039: all builds run from Gitea |
### 0.2 Consolidated 12-Agent Review Findings
The original MASTER-WORKPLAN had 24 consensus issues. After 12 Agents ran deep reviews in parallel, the following were added:
| Agent | New P0/P1 issues | New ADRs needed | New inventories |
|-------|-----------------|--------------|----------------|
| critic | 10 | 1 (ADR-116 Migration Discipline) | INV-5, INV-6, INV-7 |
| vuln-verifier | 8 (3 confirmed by PoC) | 2 (ADR-116/117 security series) | - |
| debugger | 12 (failure scenarios) | - | 8 runbooks |
| db-expert | 8 (table design flaws) + RLS entirely missing | 1 (ADR-118 RLS strategy) | - |
| planner | 7 items too coarse-grained + 10 acceptance criteria not closed-loop | - | - |
| fullstack-engineer | 7 missing API endpoints + 9 error codes | - | - |
| frontend-designer | 8 UI modules entirely missing | ADR-UI-01~04 | - |
| refactor-specialist | 8 refactoring landmines + 11-PR plan | - | - |
| migration-engineer | 7 compatibility risks | - | version matrix |
| onboarder | 31 background loops (vs. an estimate of ~10) + 13 module conflicts | - | INV-8 |
| tool-expert | 8 tools with insufficient capacity + 8 missing tools | - | - |
| web-researcher | 5 major industry alignment gaps (SAGA / Token Kill / MCP OAuth 2.1 / OTel / OWASP) | 5 (ADR-119~123) | - |
| **Total added** | **~70 issues** | **~12 ADRs** | **~4 inventories** |
**Conclusion: unless the Pre-flight Audit is completed first, Phase 1 is bound to blow up.**
---
## 1. Complete Issue List (P0 first)
### P0: Immediate blow-ups (must be fixed before Phase 1)
| # | Issue | Source | Impact |
|---|------|------|---------|
| P0-01 | Redis keys renamed in place with no dual-write period (cost counters reset to zero, Telegram 409s, silence stops working, dual-write never reaches the three-tier Ollama failover topology) | critic | Cost, alerting, Ollama |
| P0-02 | Migration SQL uses wrong table names (`incident_records` / `mcp_audit_snapshots`), has no rollback, and mixes ORM 1.x vs 2.x | critic | Phase 1 migration |
| P0-03 | `project_id` / `tenant_id` have 0 hits in the codebase; 30+ business tables lack the column | onboarder | Entire system |
| P0-04 | `requires_approval` field is decided by LLM output (security_interceptor.py:451-490) | vuln-verifier (PoC confirmed) | approval chain |
| P0-05 | Callback nonce forgery: the server nonce logic can be satisfied without knowing the secret (security_interceptor.py:451-490) | vuln-verifier (PoC confirmed) | Telegram approval |
| P0-06 | Webhook HMAC replay: no timestamp/nonce (webhooks.py:679-728) | vuln-verifier (PoC confirmed) | All webhooks |
| P0-07 | None of the 31 background loops carries a project_id (main.py) | onboarder (measured) | Multi-tenancy breaks entirely |
| P0-08 | `telemetry.py:71` hardcodes `if "192.168.0.188" not in endpoint: raise`; EwoooC is guaranteed to fail at startup | onboarder | EwoooC Phase 6 |
| P0-09 | `project_migration_state` table is missing; Strangler Fig has no data carrier | db-expert | Phase 1 |
| P0-10 | Task 9 is ordered backwards (the agent prompt load point comes before the ConfigMap), so everything returns None | critic | Any agent in Phase 1 |
| P0-11 | `ollama:current_primary` has a second definition at `ollama_auto_recovery.py:230`; the three-tier topology migration is bound to split | onboarder | GCP Ollama topology |
| P0-12 | `consensus_engine.py`: `CONSENSUS_PREFIX="consensus:"` has no project prefix, so it is shared across tenants under multi-tenancy | onboarder | Multi-tenant consistency |
| P0-13 | `mcp_bridge.py:592-681` kubectl calls hardcode `namespace="awoooi-prod"` | onboarder | EwoooC K8s tool |
### P1: Severe defects (must be fixed before Phase 2-4)
| # | Issue | Source | Impact |
|---|------|------|---------|
| P1-01 | AWOOOI Bootstrap Paradox: cron/job/healthcheck all lack project_id | critic | Multi-tenant startup |
| P1-02 | EwoooC onboarding has no technical path at all (it is not just changing `OLLAMA_API_BASE`) | critic | Phase 6 |
| P1-03 | Strangler Fig shadow→canary→active has no quantified gate conditions | planner | Cutover decisions |
| P1-04 | Layer 3 redaction has zero implementation (the helper exists but nothing enforces it) | critic | Information security |
| P1-05 | `_provider` attribute is public and can bypass audit (mcp/registry.py:24-71) | critic | MCP security |
| P1-06 | `WAITING_APPROVAL` resume does not verify caller identity; no approval_token signature | critic | approval security |
| P1-07 | Redis approval state is a single point of failure with no PG sync | critic | approval reliability |
| P1-08 | The audit log itself can leak secrets (redaction must happen before the audit sink) | critic | Information security |
| P1-09 | `sanitization_service.py` helper has no enforcement point (neither MCP Gateway nor AgentToolExecutor uses it) | critic | tool security |
| P1-10 | Active revision switch has no transactional outbox; workers may consume a stale policy | db-expert | policy consistency |
| P1-11 | Run/Channel idempotency lacks key derivation rules and a unique index | db-expert | Duplicate execution |
| P1-12 | Async workers lack lease / heartbeat / stale reaper | db-expert | worker reliability |
| P1-13 | Partitioning + retention for high-traffic tables must be decided in Phase 1 (cannot be retrofitted) | db-expert | Long-term scalability |
| P1-14 | Observability metrics label cardinality (run_id/trace_id/session_id must not enter metrics) | fullstack | Prometheus |
| P1-15 | `multi_sig_redis.py:178-205` approval flow has no trace_id | debugger | Troubleshooting |
| P1-16 | `hermes/nl_gateway.py:7,146,163` Redis keys have no project prefix | onboarder | Hermes isolation |
| P1-17 | `anomaly_counter.py:790` AnomalyCounter is a global singleton; its 6 prefixes have no tenant isolation | onboarder | Multi-tenant counting |
| P1-18 | `incident_service.py:603-615` `SCAN incident:*` has no project_id | onboarder | Redis data isolation |
| P1-19 | Contract publish permissions and signing are undefined | critic | contract governance |
| P1-20 | 13 global singletons are shared across tenants (TrustEngine / ProviderRegistry / TelegramGateway / etc.) | onboarder | Multi-tenant isolation |
| P1-21 | Token Budget has no Hard Kill (lesson from the $47k agent loop incident) | web-researcher | Cost control |
| P1-22 | RLS (Row Level Security) is entirely missing | db-expert | DB multi-tenancy |
| P1-23 | Dual-write Redis key migration for the three-tier GCP Ollama topology is unplanned (the legacy `ollama:current_primary` key only knows 1 host) | critic | Ollama failover |
| P1-24 | `decision_manager.py:240` hardcodes `telegram_silence:{target}` instead of importing the gateway constant (defined in two places) | debugger | silence feature |
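### P2: Design gaps (must be filled before Phase 5-8)
| # | Issue | Source | Impact |
|---|------|------|---------|
| P2-01 | Telegram/LINE/Slack/API/Internal lack a canonical principal mapping | critic | Identity unification |
| P2-02 | Run FSM has zero implementation (table design only, no state machine code) | fullstack | Phase 4 |
| P2-03 | EwoooC Provider Proxy cannot be a URL change alone; it needs a full envelope + audit entry point | critic | Phase 6 |
| P2-04 | No industry-style Durable Execution / SAGA compensating-transaction mechanism | web-researcher | Long-running agent tool chains |
| P2-05 | MCP OAuth 2.1 (RFC 9728 + RFC 7591): no Confused Deputy protection | web-researcher | MCP security |
| P2-06 | Not aligned with OTel GenAI Semantic Conventions (span naming / attribute conventions) | web-researcher | Observability |
| P2-07 | OWASP Agentic AI Top 10 alignment gaps (7 items such as prompt injection and tool misuse) | web-researcher | AI security |
| P2-08 | ISO 42001 AI management system alignment documents are missing | web-researcher | Compliance |
| P2-09 | 7 API endpoints missing (see the fullstack list in §6) | fullstack | API completeness |
| P2-10 | 9 error codes missing (see the error code dictionary in §7) | fullstack | Client-side parsing |
| P2-11 | Progressive feedback policy (async runs give no progress notification within 30s) | fullstack | UX |
| P2-12 | 8 Operator Console UI modules entirely missing (see frontend in §8) | frontend-designer | Operational visibility |
| P2-13 | `awooop-ctl` CLI tool is missing (today it is manual kubectl + curl) | tool-expert | Ops experience |
| P2-14 | OPA/Cedar policy engine is missing (contract authorization logic is currently scattered through the code) | tool-expert | Centralized authorization |
| P2-15 | chaostoolkit / LitmusChaos is missing (Strangler Fig cutover has no chaos verification) | tool-expert | Disaster-recovery verification |
| P2-16 | PgBouncer is missing (the PG connection pool will be exhausted under multiple AwoooP workers) | tool-expert | DB scalability |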
---
## 2. Pre-flight Audit: Complete Phase 0 Checklist
> Phase 0 is entirely docs-only. No runtime code changes.
> Only after it completes is a new Codex conversation opened to start Phase 1 code.
### 2.1 AwoooP Core ADRs (ADR-111~115)
**Note: ADR-108/109/110 are already taken by incident fingerprint / telegram dedup / GCP Ollama topology; AwoooP starts at ADR-111.**
| ADR | Topic | Issues addressed | Main content |
|-----|------|---------|---------|
| **ADR-111** | AwoooP Bootstrap Order & Identity Paradox | P0-07, P0-01, P1-01 | Three markers: `platform_internal` / `requires_project_id` / `legacy_awoooi_default`; classification of the 31 background loops; transitional exemption schedule for AWOOOI cron/jobs; declaration of the three-tier Ollama GCP failover as a platform_resource |
| **ADR-112** | Contract Governance & Publishing Workflow | P1-19 | Who may publish / activate; CODEOWNERS; HMAC signing; approval workflow; activation audit; isolation of draft from published |
| **ADR-113** | Active Revision Invalidation & Outbox | P1-10 | `awooop_contract_outbox` table design; Redis pub/sub notification; worker revision-aware cache; split-brain defense; GCP Ollama topology switch events |
| **ADR-114** | Idempotency, Worker Lease & Run Recovery | P1-11, P1-12 | Channel event dedupe; `(project_id, channel_type, provider_event_id)` unique; worker `lease_until` / `heartbeat_at` / `attempt_count`; stale run reaper; SKIP LOCKED |
| **ADR-115** | Canonical Principal Mapping & Tenant Onboarding | P2-01, P0-08 | Unified mapping of Telegram/LINE/Slack/API/Internal → `platform_subject`; EwoooC Proxy Adapter; Tsenyang/Bitan templates; fix for the `telemetry.py:71` IP assert |
### 2.2 Security Hardening ADRs
| ADR | Topic | Issues addressed | Main content |
|-----|------|---------|---------|
| **ADR-116** | AwoooP Security Hardening | P0-04, P0-05, P0-06 | Callback nonce redesign (server_secret must take part in the HMAC); webhook timestamp/nonce to prevent replay; `requires_approval` becomes policy-derived (the LLM must not decide it); approval_token signing spec (HS256, 15min TTL, `jti` uniqueness) |
| **ADR-117** | MCP OAuth 2.1 & Confused Deputy Prevention | P2-05 | RFC 8707 Resource Indicators with RFC 9728 Protected Resource Metadata; RFC 7591 Dynamic Client Registration; per-tenant token scope; Confused Deputy protection design; PKCE flow for MCP Server binding |
### 2.3 Database Hardening ADRs
| ADR | Topic | Issues addressed | Main content |
|-----|------|---------|---------|
| **ADR-118** | Row-Level Security & Tenant DB Isolation | P1-22 | Enable RLS on all AwoooP tables; inject via `current_setting('app.project_id')`; RLS bypass role design; migration acceptance criteria |
| **ADR-119** | Durable Execution & SAGA Compensation | P2-04 | Step-level journal for multi-step agent tool chains; trigger conditions for compensating transactions; checkpoint/resume design; integration with the Phase 4 run state machine |
### 2.4 Observability & AI Security ADRs
| ADR | Topic | Issues addressed | Main content |
|-----|------|---------|---------|
| **ADR-120** | Token Budget Hard Kill | P1-21 | Three-tier budget limit (per run / per project / per tenant); hard kill (not just an alert); RCA of the $47k agent loop incident; `budget_ledger` table design; Redis hot counter + PG transactional hard stop |
| **ADR-121** | OTel GenAI Semantic Conventions Alignment | P2-06 | Span naming convention (`gen_ai.request.*`); token count attributes; LLM provider attributes; integration with the existing SigNoz (188:24318); metrics label cardinality rules |
| **ADR-122** | OWASP Agentic AI Top 10 & ISO 42001 Alignment | P2-07, P2-08 | Item-by-item mapping of the Top 10 onto the AwoooP control plane; list of documents required by the ISO 42001 AI management system; alignment acceptance items for each Phase |
### 2.5 Migration Discipline ADR
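| ADR | Topic | Issues addressed | Main content |
|-----|------|---------|---------|
| **ADR-123** | Background Loop project_id Migration Strategy | P0-07, P1-01 | The 31 background loops fall into three classes (platform_internal / legacy_awoooi_default / requires_project_id); migration strategy per class; regression test design; completion criterion (0 unmarked loops in main.py) |
| **ADR-124** | Global Singleton Decomposition for Multi-tenancy | P1-20 | List of the 13 global singletons; decomposition strategy (per-project registry / factory pattern); AWOOOI 1.0 → AwoooP 1.0 migration path; dependency order for singletons that cannot be split at the same time |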
### 2.6 Frontend Operator Console ADRs (new)
| ADR | Topic | Issues addressed | Main content |
|-----|------|---------|---------|
| **ADR-UI-01** | AwoooP Operator Console Architecture | P2-12 | Specs for the 8 UI modules; integration with the existing `apps/web/`; multi-tenant view design; i18n (next-intl) conventions |
| **ADR-UI-02** | Contract Lifecycle UI | P2-12 | draft → publish → activate workflow; revision diff visualization; contract family filtering |
| **ADR-UI-03** | Run State & Shadow Monitoring UI | P2-12 | shadow/canary/active switch dashboard; run FSM visualization; display of Strangler Fig quantified gate metrics |
| **ADR-UI-04** | Tenant Budget & Audit UI | P2-12 | per-project token budget; hard kill trigger history; audit log queries (with redaction masking) |
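### 2.7 ADR-106 Supplementary Sections
ADR-106 needs the following additions:
- **Strangler Fig Quantified Gates** (quantified cutover conditions)
- **GCP Ollama topology impact** (how the three-tier failover becomes a `platform_resource` that belongs to no tenant)
- **Bootstrap Order**, referencing ADR-111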
### 2.8 Inventory List (9 documents)
| Inventory | Location | Scope | Issues addressed |
|-----------|------|------|---------|
| **INV-1** | `docs/awooop/inventory/INV-1-redis-keys.md` | Grep the whole codebase for `redis_client.*\(["']` and similar; list the 43+ keys with namespace, TTL, purpose, write/read points, and whether each is hardcoded | P0-01, P1-18 |
| **INV-2** | `docs/awooop/inventory/INV-2-repository-project-id-retrofit.md` | 30+ business tables × whether `project_id` exists today × all repository methods × queries that need a filter × historical data that needs backfill | P0-03 |
| **INV-3** | `docs/awooop/inventory/INV-3-entrypoints.md` | All cron jobs / schedulers / webhooks / CLIs / healthchecks / internal service calls, each marked with one of the three types | P0-07, P1-01 |
| **INV-4** | `docs/awooop/inventory/INV-4-hardcoded-namespace-ip.md` | Hardcoded K8s namespace (`awoooi-prod`), SSH host IPs, allowlists (**including the new GCP IPs 34.143.170.20 and 34.21.145.224**) | P0-08, P0-13 |
| **INV-5** | `docs/awooop/inventory/INV-5-migration-compatibility-matrix.md` | Version compatibility matrix: SQLAlchemy 1.x→2.x / Alembic / Pydantic v1→v2 / FastAPI 0.x / Python 3.10→3.12; each breaking change + its impact | critic |
| **INV-6** | `docs/awooop/inventory/INV-6-rollback-playbook-register.md` | 6 rollback playbooks: Phase 1 schema rollback, Phase 2 Redis key rollback, Phase 5 MCP Gateway rollback, Phase 6 EwoooC rollback, Ollama GCP→Local fallback rollback, approval flow rollback | migration |
| **INV-7** | `docs/awooop/inventory/INV-7-pr-cutting-plan.md` | 11-PR cutting plan (designed by refactor-specialist): scope, prerequisites, reviewers, and merge order for each PR | refactor |
| **INV-8** | `docs/awooop/inventory/INV-8-background-loop-catalog.md` | The 31 background loops listed one by one: name, location (main.py line number), class marker, migration strategy, target Phase | onboarder |
| **INV-9** | `docs/awooop/inventory/INV-9-global-singleton-catalog.md` | The 13 global singletons listed one by one: name, location, dependents, decomposition strategy, migration risk | onboarder |
### 2.9 Phase 0 Acceptance Criteria
- [ ] ADR-111~115 (5 AwoooP core ADRs) all Accepted
- [ ] ADR-116~124 (9 hardening ADRs) all Accepted
- [ ] ADR-UI-01~04 (4 UI ADRs) all Accepted (or Proposed + commander approval to start work)
- [ ] ADR-106 updated with the Strangler Fig Quantified Gates + GCP Ollama sections
- [ ] INV-1~INV-9 (9 inventories) first drafts complete
- [ ] No runtime code changes
- [ ] `git diff --check` passes
---
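## 3. Detailed Work Items for the 8 Phases
> Each item covers: goal, scope (exact paths), inputs (prerequisites), outputs (deliverables), acceptance criteria, and boundaries (what must not be touched)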
### Phase 1 — Control Plane Schema Foundation
**目標**:建立 PostgreSQL contract control plane 最小可用骨架,修正舊 SQL migration 三大 blocker決定高流量表 partition 策略。
**前置依賴**Phase 0 全部完成(所有 ADR + Inventory
**範圍(精確檔案)**
- `apps/api/migrations/` — 新增 migration files
- `apps/api/src/models/` — 新增 AwoooP SQLAlchemy models
- `apps/api/src/repositories/` — 新增 AwoooP repositories
- `docs/runbooks/` — 新增 partition + retention runbook
**禁止碰**
- 任何既有 repository 方法(留給 Phase 2
- provider 行為(`ai_router.py` / `ollama_*.py`
- Telegram/LINE webhook 路徑
- `apps/web/`
- 任何 K8s manifest
**工作項(順序執行)**
```
1.1 Table name verification
  - grep to confirm `incidents` (not incident_records)
  - grep to confirm `mcp_audit_log` (not mcp_audit_snapshots)
  - Fix ORM: SQLAlchemy 2.x mapped_column; add the missing Numeric/UniqueConstraint/func imports
  - Every migration must have a down migration (rollback SQL)
1.2 Task 9 ordering fix (must be completed before Phase 1.1)
  - Dockerfile: agent_loader default path points to the ConfigMap mount
  - ConfigMap preload: confirm the agent prompt path already exists in the ConfigMap
  - Acceptance: dry-run one agent loader; output is not None
1.3 AwoooP control-plane tables (new migration)
  - awooop_projects (tenant master table: project_id VARCHAR PK, budget, ACL)
  - awooop_contract_revisions (shared revision table for the six contracts, append-only; see §4.1 for full columns)
  - awooop_active_revisions (active pointer to a specific revision_id)
  - awooop_artifact_refs (prompt/schema/eval ref + sha256 + type)
  - awooop_project_migration_state (Strangler Fig stage tracking, per project × per capability)
  - awooop_contract_outbox (ADR-113: active revision switch events, for worker invalidation)
  - awooop_channel_event_dedupe (ADR-114: idempotency unique key)
  - awooop_platform_subjects (ADR-115: canonical principal mapping)
  - awooop_budget_ledger (ADR-120: token budget, per project × per period)
1.4 High-traffic tables (partitioning must be settled before they are created in Phase 4/7; write the rules now)
  - Add a partition template comment in this Phase's migration (not executed; left for Phase 4)
  - awooop_run_state → range partition by created_at
  - awooop_channel_event → range partition by created_at
  - awooop_mcp_gateway_audit → range partition by created_at
  - awooop_agent_audit_log → range partition by created_at
  - retention: 90 days hot + 1 year warm (pg_partman / cron job)
  - Document in docs/runbooks/awooop-partition-retention.md
1.5 AWOOOI Bootstrap (seed data)
  - INSERT INTO awooop_projects(project_id='awoooi', display_name='AWOOOI', migration_mode='legacy_awoooi_default')
  - Acceptance: 0 behavior changes in AWOOOI
1.6 RLS skeleton (ADR-118)
  - Enable RLS on all awooop_* tables
  - policy: USING (project_id = current_setting('app.project_id', TRUE))
  - bypass role: awooop_platform (for platform workers only)
  - Note: RLS needs a migration + tests, not just ALTER TABLE ENABLE ROW LEVEL SECURITY
1.7 Immutability tests
  - Attempting to UPDATE a published contract revision → must fail (trigger or check constraint)
  - Draft is isolated from active: the runtime read view excludes drafts
  - Automation: pytest + db-expert review
```
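**RACI**:
- R (Responsible): fullstack-engineer
- A (Accountable): db-expert review; commander approval
- C (Consulted): refactor-specialist (migration PR cutting), critic (final review)
- I (Informed): migration-engineer (version compatibility verification)
**DoD**:
- All migration up/down dry-runs pass
- AWOOOI can be represented as `project_id=awoooi` with 0 behavior changes
- RLS test: cross-project SELECT is denied
- Partition runbook exists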
---
### Phase 2 — Tenant Isolation & Namespace Hardening
**目標**:在開放任何下游 tenant 之前,把 AWOOOI 自己變成乾淨的 tenant。
**前置**Phase 1 完成
**範圍**
- `apps/api/src/services/` — Redis key 遷移(依 INV-1
- `apps/api/src/repositories/` — 加 project_id filter依 INV-2
- `apps/api/src/services/security_interceptor.py` — nonce 修補P0-05ADR-116
- `apps/api/src/api/v1/webhooks.py` — replay 防護P0-06ADR-116
- `apps/api/src/core/telemetry.py:71` — 移除硬碼 IP assertP0-08
- `apps/api/src/services/decision_manager.py:240` — silence key 常數化P1-24
- `apps/api/src/services/ollama_auto_recovery.py:230` — 移除第二定義P0-11
- `apps/api/src/plugins/mcp/mcp_bridge.py:592-681` — namespace 動態化P0-13
- `apps/api/src/services/consensus_engine.py` — CONSENSUS_PREFIX 加 project 前綴P0-12
- `apps/api/src/hermes/nl_gateway.py` — Redis key 加 project 前綴P1-16
- `apps/api/src/services/anomaly_counter.py:790` — per-project 改造P1-17
- `apps/api/src/services/incident_service.py:603` — SCAN 加 prefixP1-18
**禁止碰**
- `awooop_contract_revisions` 以外的 AwoooP Phase 1 新表結構
- EwoooC / Tsenyang 任何接入(留 Phase 6
- 任何 provider routing 改動Ollama GCP 拓撲已由 ADR-110 定案,不在此 Phase 改)
**工作項**
```
2.1 Execute the three-stage dual-write Redis migration plan (per INV-1, in three batches)
  Batch A (critical path, affects the Ollama GCP topology):
    - ollama:current_primary → {project_id}:ollama:primary
      Note: all three tiers GCP-A/GCP-B/Local must be supported at once; INV-1 must confirm every write point
    - Delete the second definition at ollama_auto_recovery.py:230; unify on one constant
  Batch B (cost + alerting critical):
    - ai_rate:total_cost:gemini → {project_id}:ai_rate:total_cost:gemini
    - telegram:polling:leader → platform:telegram:polling:leader (platform_resource)
    - telegram_silence:{target} → {project_id}:telegram_silence:{target}
      Also update decision_manager.py:240 to import the gateway constant
  Batch C (working memory):
    - consensus: → {project_id}:consensus: (consensus_engine.py)
    - hermes Redis keys (nl_gateway.py)
    - the 6 anomaly_counter prefixes
    - incident:* SCAN (incident_service.py:603)
  Each batch: Phase A (dual-write, 30 days) → Phase B (dual-read, 14 days) → Phase C (remove old key)
2.2 Security hardening (ADR-116)
  - telemetry.py:71: remove the hardcoded "192.168.0.188" assert; switch to config-driven allowed endpoints
  - security_interceptor.py:451-490: nonce redesign (server_secret must take part in the HMAC)
  - webhooks.py:679-728: add timestamp (±5min window) + nonce (Redis dedup)
  - requires_approval: read from the policy contract; LLM output must not decide it
  - approval_token: HS256, 15min TTL, jti uniqueness (Redis NX)
2.3 Repository project_id rework (per INV-2)
  - Add a project_id filter to all 30+ repository methods
  - K8s namespace allowlist → tenant-aware (make mcp_bridge.py:592-681 dynamic)
  - SSH host allowlist → tenant-aware (per INV-4)
2.4 Background loop marking (per ADR-123, INV-3/INV-8)
  - Mark the 31 loops as platform_internal / legacy_awoooi_default / requires_project_id
  - platform_internal carries project_id=__platform__
  - legacy_awoooi_default falls back to project_id=awoooi (document the retirement schedule)
2.5 Global singleton decomposition, first step (per ADR-124, INV-9)
  - Only: per-project rework of AnomalyCounter (fixed under P1-17)
  - For the other 13 global singletons, list a retirement schedule (not all split in this Phase, to limit blast radius)
2.6 Token Budget Hard Kill foundation (ADR-120)
  - budget_ledger table migration (created in Phase 1; the write logic lands in this Phase)
  - Before every LLM call: check budget → hard kill if exceeded (not just a log)
  - Redis hot counter + PG transactional hard stop
```
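The three-stage key migration in item 2.1 can be sketched as follows. This is a minimal illustration, not the planned implementation: a plain `dict` stands in for Redis, and the names `Stage`, `new_key`, `write`, and `read` are hypothetical. It also assumes that reads stay on the legacy key during the dual-write stage and that writes go only to the new key from the dual-read stage onward; the plan itself does not spell out either detail.

```python
from enum import Enum


class Stage(Enum):
    DUAL_WRITE = "A"  # write old + new keys, keep reading the old key
    DUAL_READ = "B"   # write new key only, read new then fall back to old
    NEW_ONLY = "C"    # old key retired


def new_key(project_id: str, legacy_key: str) -> str:
    """Prefix a legacy key with its project, e.g. awoooi:telegram_silence:db."""
    return f"{project_id}:{legacy_key}"


def write(store: dict, stage: Stage, project_id: str, legacy_key: str, value: str) -> None:
    store[new_key(project_id, legacy_key)] = value
    if stage is Stage.DUAL_WRITE:
        store[legacy_key] = value  # legacy readers keep working


def read(store: dict, stage: Stage, project_id: str, legacy_key: str):
    if stage is Stage.DUAL_WRITE:
        return store.get(legacy_key)
    value = store.get(new_key(project_id, legacy_key))
    if value is None and stage is Stage.DUAL_READ:
        value = store.get(legacy_key)  # key not rewritten since the switch
    return value
```

Keeping the stage as an explicit value (rather than scattering `if` checks per key) would let each batch advance independently, which matches the per-batch schedule above.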
**RACI**:
- R: fullstack-engineer + refactor-specialist (large volume of repository changes)
- A: db-expert (repository change review), vuln-verifier (security hardening PoC verification)
- C: critic (overall diff review), migration-engineer (compatibility confirmation)
- I: tool-expert (K8s namespace changes)
**DoD**:
- All P0 keys in INV-1 are in the three-stage migration (Phase A complete; Phase B/C in their observation period)
- Cross-project access attempts all fail (covered by pytest)
- `grep -r "awoooi-prod" apps/api/src/` returns 0 results
- `grep -r "192.168.0.188" apps/api/src/`: the telemetry assert is gone
- vuln-verifier PoC rerun: P0-05 nonce forgery fails, P0-06 webhook replay fails
- Budget hard kill test: LLM calls are rejected once over budget
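The webhook replay protection from item 2.2 (timestamp within ±5 minutes plus a one-time nonce) might look like the sketch below. The message layout `timestamp.nonce.body`, the function names, and the use of a Python `set` in place of the Redis dedup store are all assumptions for illustration; they are not taken from `webhooks.py`.

```python
import hashlib
import hmac
import time

MAX_SKEW_SECONDS = 300  # the ±5 min window from item 2.2


def sign(secret: bytes, timestamp: int, nonce: str, body: bytes) -> str:
    # Timestamp and nonce are covered by the MAC, so neither can be swapped.
    message = f"{timestamp}.{nonce}.".encode() + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, timestamp: int, nonce: str, body: bytes,
           signature: str, seen_nonces: set, now=None) -> bool:
    now = time.time() if now is None else now
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False  # too old or too far in the future
    expected = sign(secret, timestamp, nonce, body)
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return False  # bad signature
    if nonce in seen_nonces:
        return False  # replayed request
    seen_nonces.add(nonce)  # in production: Redis SET NX with a TTL
    return True
```

The nonce is recorded only after the signature check passes, so unauthenticated callers cannot fill the dedup store.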
---
### Phase 3 — Contract Packages & Validators
**目標**:六合約從散文升級為可驗證程式。
**前置**Phase 1 完成contract_revisions 表存在)
**範圍**
- `packages/awooop-contracts/`(此時才建立!)
- `apps/api/src/services/contract_service.py`(新建)
- `apps/api/src/repositories/contract_repository.py`(新建)
**禁止碰**
- 任何既有 provider / router / telegram 路徑
- `apps/web/`UI 留 Phase 8 之後)
**工作項**
```
3.1 Create packages/awooop-contracts/ (real content only from this point)
  - JSON Schema for the six contracts (Project/Tenant, Agent, MCP Gateway, Policy/Routing, Run State, Channel Event)
  - Pydantic v2 models matching the six contracts
  - Envelope schemas (platform invocation, MCP tool call, run state transition, channel event)
  - Golden fixtures (valid × 6 + invalid × 6)
3.2 Contract lifecycle service
  - draft(): create a draft revision; not readable by the runtime
  - publish(): produce an immutable published revision; body_hash = sha256(body_json)
  - activate(): update the active pointer; write to contract_outbox (ADR-113)
  - get_active(): runtime read path; returns published + active only
  - All operations are recorded in the audit log
3.3 Output schema validator middleware
  - LLM response → schema validator → on failure retry (max 3 times) → on failure error code (E-SCHEMA-001)
  - No LLM output that violates the schema can reach a channel adapter
3.4 Contract governance (ADR-112)
  - CODEOWNERS covers packages/awooop-contracts/
  - publish API: HMAC signature verification
  - activate API: approval workflow (multi_sig_redis path)
3.5 SHA-256 artifact verification
  - Every artifact ref includes a sha256
  - The runtime verifies the hash on read (compared against the DB record)
```
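Items 3.2 and 3.5 both depend on `body_hash = sha256(body_json)` being reproducible. A sketch of one way to get a stable hash is shown below; the canonical form (sorted keys, compact separators, UTF-8) is an assumption, since the plan does not say how `body_json` is serialized, and the function names are hypothetical.

```python
import hashlib
import json


def body_hash(body: dict) -> str:
    """SHA-256 hex digest over a canonical JSON encoding of the contract body."""
    canonical = json.dumps(
        body, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_artifact(body: dict, recorded_hash: str) -> bool:
    """Runtime check: recompute the hash and compare with the DB record."""
    return body_hash(body) == recorded_hash
```

Without a fixed canonical form, two semantically identical bodies with different key order would hash differently and fail verification at read time.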
**DoD**:
- LLM output that violates the schema cannot reach a channel adapter (integration test)
- AWOOOI's first Agent contract can be published + activated (E2E)
- Every prompt/schema ref includes a sha256
---
### Phase 4 — Platform Shell in Shadow Mode
**目標**:建立第一個 runtime shell只跑 shadow不改 legacy 行為。
**前置**Phase 3 完成
**範圍**
- `apps/api/src/api/v1/platform/` — 新增 platform runs API
- `apps/api/src/services/platform_runtime.py` — 新建
- `apps/api/src/services/run_state_machine.py` — Run FSM 實作P2-02
- `apps/api/src/workers/platform_worker.py` — 新建
- `apps/api/src/services/audit_sink.py` — 加 redactionP1-08
**禁止碰**
- 任何既有 `/v1/incidents/``/v1/webhooks/` 路徑
- Telegram bot handlerlegacy 維持)
- EwoooC 接入(留 Phase 6
**工作項**
```
4.1 Run API shell (shadow only)
  - POST /v1/platform/runs
  - Generate run_id (UUID v7) and trace_id (W3C traceparent compatible)
  - Resolve the active revision of the project + agent contract
  - Resolve EffectivePolicy (6-layer merge; provider behavior unchanged)
4.2 Run State Machine (ADR-114 + ADR-119)
  - States: PENDING → RUNNING → WAITING_TOOL → WAITING_APPROVAL → COMPLETED / FAILED / CANCELLED
  - lease_until, heartbeat_at, attempt_count columns
  - SKIP LOCKED claim (prevents double-pickup)
  - Stale run reaper: scan for expired leases every minute; return runs to PENDING or FAILED
  - SAGA step journal (ADR-119): each tool call records step_id and a compensation command
4.3 Idempotency (ADR-114)
  - (project_id, channel_type, provider_event_id) composite unique
  - Duplicate events return the existing run_id; no new run is created
  - Two layers of protection: Redis NX + PG constraint
4.4 Audit log redaction (ADR-116)
  - audit_sink passes entries through the sanitization_service pipeline before writing
  - Hard-block PII / secret patterns (including GCP IPs, PG password, Telegram token, etc.)
  - The audit log does not record raw LLM input/output; only hash + schema validation result
4.5 Observability (ADR-121)
  - OTel GenAI span naming: gen_ai.request.*
  - Token count attributes (gen_ai.usage.prompt_tokens, etc.)
  - metrics labels: only project_id / agent_id / status / provider; run_id/trace_id/session_id must not enter metrics
  - run_id / trace_id go to logs/traces only, never metrics
4.6 Shadow mode wiring
  - Mirror 3 selected AWOOOI events to shadow (no user response sent)
  - Shadow runs make 0 destructive tool calls (MCP write/execute fully blocked)
4.7 Token Budget Hard Kill (ADR-120)
  - per-run token budget (from EffectivePolicy)
  - Over budget → hard kill → FAILED state → error code E-BUDGET-001
  - After each run completes, write actual consumption to budget_ledger
```
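The run state machine in item 4.2 lists the states but not the allowed edges. The sketch below shows one plausible transition table; the specific edges (for example `RUNNING → PENDING` for the stale reaper) are assumptions to be settled by ADR-114/ADR-119, and `ALLOWED`, `InvalidTransition`, and `transition` are hypothetical names.

```python
from enum import Enum


class RunStatus(str, Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    WAITING_TOOL = "WAITING_TOOL"
    WAITING_APPROVAL = "WAITING_APPROVAL"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"


S = RunStatus
# Assumed edges; terminal states have no outgoing transitions.
ALLOWED = {
    S.PENDING: {S.RUNNING, S.CANCELLED},
    S.RUNNING: {S.WAITING_TOOL, S.WAITING_APPROVAL, S.COMPLETED,
                S.FAILED, S.CANCELLED, S.PENDING},  # PENDING: stale reaper requeue
    S.WAITING_TOOL: {S.RUNNING, S.FAILED, S.CANCELLED},
    S.WAITING_APPROVAL: {S.RUNNING, S.FAILED, S.CANCELLED},
    S.COMPLETED: set(),
    S.FAILED: set(),
    S.CANCELLED: set(),
}


class InvalidTransition(Exception):
    pass


def transition(current: RunStatus, target: RunStatus) -> RunStatus:
    """Return the new status, or raise if the edge is not allowed."""
    if target not in ALLOWED[current]:
        raise InvalidTransition(f"{current.value} -> {target.value}")
    return target
```

Keeping the edges in one table gives the worker, the reaper, and the approval resume path a single place to agree on what is legal.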
**RACI**:
- R: fullstack-engineer (API + service), db-expert (run state schema review)
- A: critic (shadow mode design review), vuln-verifier (redaction PoC)
- C: debugger (end-to-end trace_id design), tool-expert (OTel integration)
- I: migration-engineer (worker lease compatibility)
**DoD**:
- Shadow runs produce 0 user-visible responses and 0 destructive tool calls (verified by vuln-verifier)
- 0 changes to legacy AWOOOI behavior (regression tests pass)
- After a worker crash, stale runs are reclaimed within 1 minute (automated test)
- Duplicate events do not create duplicate runs (idempotency test)
- 0 secret hits in the audit log (vuln-verifier samples 100 rows)
- Exceeding the token budget triggers hard kill (integration test)
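The hard-kill behavior required by item 4.7 (refuse the call, not merely log it) can be illustrated with an in-memory guard. This is a sketch under stated assumptions: `RunBudget`, `guarded_call`, and the idea of reserving an estimated token count before the call are hypothetical, and the real design pairs a Redis hot counter with a PG transaction, which is not modeled here.

```python
class BudgetExceeded(Exception):
    code = "E-BUDGET-001"  # error code named in item 4.7


class RunBudget:
    def __init__(self, limit_tokens: int):
        self.limit_tokens = limit_tokens
        self.tokens_used = 0

    def reserve(self, estimated_tokens: int) -> None:
        # Hard stop: raise before any tokens are spent.
        if self.tokens_used + estimated_tokens > self.limit_tokens:
            raise BudgetExceeded(
                f"{self.tokens_used}+{estimated_tokens} > {self.limit_tokens}"
            )

    def record(self, actual_tokens: int) -> None:
        self.tokens_used += actual_tokens


def guarded_call(budget: RunBudget, estimated_tokens: int, llm_call):
    """Run llm_call only if the budget allows it; llm_call returns (text, tokens)."""
    budget.reserve(estimated_tokens)
    text, used = llm_call()
    budget.record(used)
    return text
```

Checking before the call, rather than after, is what distinguishes a hard kill from an alert: an agent stuck in a loop is stopped at the next call instead of being reported after the spend.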
---
### Phase 5 — MCP Gateway First Slice
**Goal**: Move tool authorization into the Gateway, onboard read-only tools first, and resolve sanitization enforcement.
**Prerequisite**: Phase 4 complete
**Scope**:
- `apps/api/src/plugins/mcp/gateway.py`: new MCP Gateway
- `apps/api/src/plugins/mcp/registry.py:24-71`: `_provider` → `__provider` (P1-05)
- `apps/api/src/plugins/mcp/mcp_bridge.py`: hook into the Gateway
- `apps/api/src/services/sanitization_service.py`: enforcement point (P1-09)
**Do not touch**:
- MCP write/execute tools (left for Phase 8)
- Telegram approval flow (changes happen in Phase 8)
**Work items**:
```
5.1 MCP Gateway tables
  - awooop_mcp_tool_registry (tool_id, project_id, agent_id, tool_type, allowed_scopes)
  - awooop_mcp_grants (grant_id, project_id, agent_id, tool_id, granted_by, expires_at)
  - awooop_mcp_credential_refs (ref_id, tool_id, k8s_secret_ref, sha256)
  - awooop_mcp_gateway_audit (call_id, trace_id, run_id, tool_id, credential_ref, latency_ms, result_status)
5.2 Five-gate enforcement
  - Check: Project AND Agent AND Tool AND Environment AND Approval
  - Any mismatch → deny + record audit + error code E-MCP-GATE-XXX
5.3 Result sanitization enforcement (P1-04, P1-09)
  - Every MCP tool result must pass through the sanitization_service pipeline
  - MCP Gateway adds sanitization middleware: raw results must not go straight into the LLM context
  - One layer before the LLM + one layer before the audit sink (double redaction)
  - SAST scan of agent code paths: 0 raw credential contact
5.4 _provider fix (P1-05)
  - registry.py: _provider → __provider (double underscore, Python name mangling)
  - Add a unit test: external access via reflection → AttributeError
5.5 Credential isolation
  - Agent code does not access K8s Secrets directly
  - Gateway resolves credential_ref → returns a masked result (token substitution)
  - Replay test for the 2026-04-18 secret leak: kubectl describe output does not appear in the LLM context
5.6 MCP OAuth 2.1 (ADR-117)
  - Implement per-tenant dynamic client registration (RFC 7591)
  - Resource Indicators (RFC 8707, alongside RFC 9728 Protected Resource Metadata) to prevent Confused Deputy
  - PKCE flow for MCP Server binding
```
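The five-gate check in item 5.2 is a conjunction of five conditions with a distinct error per failing gate. A minimal sketch follows; the `Grant` and `ToolCall` shapes and the error-code suffixes (`PROJECT`, `AGENT`, and so on) are hypothetical, since the plan only specifies the `E-MCP-GATE-XXX` pattern.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Grant:
    project_id: str
    agent_id: str
    tool_id: str
    environment: str
    requires_approval: bool


@dataclass(frozen=True)
class ToolCall:
    project_id: str
    agent_id: str
    tool_id: str
    environment: str
    approved: bool = False


def check_gates(call: ToolCall, grant: Grant) -> Optional[str]:
    """Return None if all five gates pass, else the first failing gate's code."""
    gates = [
        ("PROJECT", call.project_id == grant.project_id),
        ("AGENT", call.agent_id == grant.agent_id),
        ("TOOL", call.tool_id == grant.tool_id),
        ("ENV", call.environment == grant.environment),
        ("APPROVAL", (not grant.requires_approval) or call.approved),
    ]
    for name, passed in gates:
        if not passed:
            return f"E-MCP-GATE-{name}"  # caller denies and writes an audit row
    return None
```

Returning the specific gate that failed keeps the audit trail useful without revealing anything beyond what the caller already supplied.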
**RACI**:
- R: fullstack-engineer (Gateway service)
- A: vuln-verifier (credential isolation verification), critic (architecture review)
- C: tool-expert (MCP spec confirmation), db-expert (Gateway table design review)
- I: migration-engineer (MCP registry compatibility)
**DoD**:
- The 2026-04-18 secret leak replay test passes (kubectl describe output does not appear in the LLM context or audit rows)
- SAST scan: 0 raw credential contact in agent code paths
- `__provider` double-underscore unit test passes
- All five gates are covered by integration tests
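For the `__provider` unit test, it helps to be precise about what Python name mangling does and does not prevent. In the sketch below, `Registry` is a stand-in class, not the real `registry.py`. Access by the unmangled name fails with `AttributeError`, but the attribute is still reachable under its mangled name, so mangling is a guard against accidental use rather than an access-control boundary.

```python
class Registry:
    def __init__(self, provider: str):
        # Inside the class body this is stored as _Registry__provider.
        self.__provider = provider

    def call(self, tool: str) -> str:
        # The audited path is the only intended way to reach the provider.
        return f"audited:{self.__provider}:{tool}"


def can_reach(obj: object, attr_name: str) -> bool:
    """True if getattr(obj, attr_name) would succeed."""
    return hasattr(obj, attr_name)


registry = Registry("k8s")

plain_name_reachable = can_reach(registry, "__provider")             # False
old_name_reachable = can_reach(registry, "_provider")                # False
mangled_name_reachable = can_reach(registry, "_Registry__provider")  # True
```

A test that only asserts `AttributeError` on `__provider` will pass, yet a determined caller can still use the mangled name; the audit guarantee therefore has to come from the Gateway path, with the SAST rule in the DoD catching direct access.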
---
### Phase 6 — EwoooC Read-Only Tenant Onboarding
**目標**:以真實下游 tenant 驗證 AwoooP全 read-only。
**前置**Phase 5 完成、telemetry.py:71 hardcoded IP assert 已移除Phase 2 完成)
**範圍**
- `apps/api/src/` — EwoooC project provisioning
- `packages/awooop-contracts/` — EwoooC agent contract
- `apps/api/src/services/provider_proxy.py` — 新建 Provider Proxy AdapterP1-02
**禁止碰**
- AWOOOI 任何既有業務邏輯
- MCP write/execute tools
**工作項**
```
6.1 EwoooC project provisioning
  - INSERT INTO awooop_projects(project_id='ewoooc', ...)
  - Must not be able to read AWOOOI data (RLS verification)
6.2 openclaw-biz agent contract
  - Design the I/O schema for the market intelligence domain
  - Safety ceiling: read-only only; infra tools forbidden
6.3 Provider Proxy Adapter (P1-02, ADR-115)
  - More than just changing OLLAMA_API_BASE
  - The proxy entry point forcibly injects the envelope (project_id / agent_id / trace_id / run_id)
  - Passes through EffectivePolicy + budget guard + audit
  - GCP Ollama three-tier topology: EwoooC uses the same primary/secondary/fallback routing
  - read-only / model-call entry points are enabled first
6.4 Register market intelligence MCP tools
  - 4 read-only tools: market_data_fetch, product_catalog_query, competitor_analysis, trend_report
  - All governed by the five gates of the MCP Gateway
6.5 Shadow → Canary promotion plan
  - First 14 days of shadow (Strangler Fig quantified gate)
  - Promote to canary (selected responses) once the conditions are met
  - Promote to read_only after canary passes
```
**RACI**:
- R: fullstack-engineer
- A: critic (EwoooC data isolation review), vuln-verifier (cross-tenant isolation PoC)
- C: db-expert (RLS verification), migration-engineer (EwoooC rollback playbook, INV-6)
- I: tool-expert (EwoooC routing configuration for the GCP Ollama topology)
**DoD**:
- EwoooC SELECT cannot read AWOOOI data (RLS + cross-tenant pytest)
- Provider Proxy Adapter E2E test: the envelope is injected correctly
- budget / audit are fully project-scoped
- EwoooC startup no longer fails on the telemetry.py IP assert
---
### Phase 7 — Communication Hub Increment
**目標**:標準化 channel 但不切斷既有 bot。
**前置**Phase 6 完成
**範圍**
- `apps/api/src/services/channel_hub.py` — 新建
- `apps/api/src/services/telegram_gateway.py` — mirror inbound events
- `apps/api/src/api/v1/platform/channel.py` — 新建
**禁止碰**
- 既有 telegram bot handler維持 legacy 權威,直到 canary 量化 gate 通過)
- LINE / Slack 接入(留 v2
**工作項**
```
7.1 awooop_conversation_event + awooop_outbound_message tables
  - partition by created_at (strategy set in Phase 1)
  - Configure the retention policy
7.2 Telegram inbound mirror
  - Copy events from the existing telegram_gateway.py into awooop_conversation_event
  - Canonical principal mapping (ADR-115): every sender is written to awooop_platform_subjects
7.3 Progressive Feedback Policy (P2-11)
  - WAITING_TOOL / RUNNING / WAITING_APPROVAL → must send a transient Telegram message
  - Update via edit_message (not a new message, so no notification fires)
  - First progress message within 30s
7.4 Idempotency verification (already done in Phase 4)
  - Duplicate channel retries do not create duplicate runs (integration test)
7.5 Adapter-level security
  - Every channel adapter: escaping + redaction + idempotency + delivery audit
  - Channel adapters make 0 LLM calls and 0 MCP calls (covered by pytest)
7.6 Quantified gate monitoring dashboard (with ADR-UI-03)
  - Strangler Fig gate metrics: decision divergence / p95 latency / error rate
  - Used for Phase 8 promotion decisions
```
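**RACI**:
- R: fullstack-engineer (API + channel hub)
- A: critic (channel design review), debugger (end-to-end trace_id verification)
- C: frontend-designer (progress message UX), tool-expert (Telegram API spec confirmation)
- I: migration-engineer (channel compatibility)
**DoD**:
- Channel adapters make 0 LLM calls and 0 MCP calls
- The first progress message of an async run arrives within 30s
- Duplicate retries do not create duplicate runs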
---
### Phase 8 — Suggest & Controlled Write Paths
**目標**:從 read-only 升級到 propose再到 controlled execute。
**前置**Phase 7 完成 + Strangler Fig shadow→canary gate 全通過
**範圍**
- `apps/api/src/services/multi_sig_redis.py` — approval token 簽章P1-06
- `apps/api/src/services/approval_timeout_resolver.py` — 加 trace_idP1-15
- `apps/api/src/api/v1/platform/suggest.py` — suggest mode endpoint
- Feature flags for write/execute paths
**禁止碰**
- 任何 write/execute tool 的預設啟用
- Strangler Fig 量化 gate 通過前不做 auto_remediate
**工作項**
```
8.1 Approval token security hardening (P1-06, ADR-116)
  - WAITING_APPROVAL resume API: approval_token verification is mandatory (HS256, 15min TTL, jti Redis NX)
  - approval state: PG is the source of truth, Redis is a cache
  - Expired / already decided / replayed → all rejected + error code E-APPROVAL-XXX
8.2 Add trace_id to multi_sig_redis.py + approval_timeout_resolver.py
  - Add trace_id to every approval operation (P1-15)
  - The full chain is traceable (verified by debugger)
8.3 Suggest mode for AWOOOI SRE flows
  - Select 3 low-risk SRE flows (e.g., alert silence suggestions, playbook recommendations)
  - suggest mode: the AI outputs a suggestion; a human decides whether to execute
  - Quantified gates (ADR-106 supplementary section):
    * shadow → canary: ≥14 days + divergence <5% + p95 <10% + 0 P1 incidents
    * canary → read_only: ≥7 days + error rate <0.5% + cost diff <50%
    * read_only → suggest: ≥14 days + accept rate ≥50% + 0 hallucination escalations
    * suggest → auto_remediate: ≥30 days + rollback evidence ≥3 times + approval token live + dry-run ≥99%
8.4 Dry-run and rollback evidence gate
  - Every write/execute tool must have a dry-run mode
  - Rollback playbooks are recorded in INV-6 (completed in Phase 0; verified by execution here)
  - Record each rollback result as Phase 8 gate evidence
8.5 Feature Flag Registry (see §10)
  - suggest mode: feature flag AWOOOP_SUGGEST_MODE (default OFF)
  - controlled write: feature flag AWOOOP_WRITE_MODE (default OFF)
  - Requires an explicit flip to enable; must not be switched on accidentally by an environment variable
8.6 vuln-verifier PoC acceptance
  - WAITING_APPROVAL resume without a token must fail
  - When Redis is down, approval can still be recovered from PG
```
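The approval token rules in item 8.1 (HS256, 15-minute TTL, single-use `jti`) can be sketched with the standard library alone. This is illustrative only: the claim names other than `exp` and `jti`, the `issue`/`redeem` function names, and the `set` standing in for Redis `SET NX` are assumptions, and a production system would more likely use a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import uuid


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _mac(secret: bytes, signing_input: str) -> str:
    return _b64(hmac.new(secret, signing_input.encode(), hashlib.sha256).digest())


def issue(secret: bytes, run_id: str, now: float, ttl: int = 900) -> str:
    """Build a compact HS256 token bound to one run, valid for ttl seconds."""
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = {"run_id": run_id, "exp": int(now) + ttl, "jti": uuid.uuid4().hex}
    payload = _b64(json.dumps(claims).encode())
    return f"{header}.{payload}.{_mac(secret, header + '.' + payload)}"


def redeem(secret: bytes, token: str, run_id: str, now: float, used_jti: set) -> bool:
    """Accept a token once: valid signature, matching run, not expired, unseen jti."""
    parts = token.split(".")
    if len(parts) != 3:
        return False
    header, payload, signature = parts
    expected = _mac(secret, header + "." + payload)
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return False
    claims = json.loads(_unb64(payload))
    if claims.get("run_id") != run_id or now >= claims.get("exp", 0):
        return False
    if claims["jti"] in used_jti:
        return False  # replay
    used_jti.add(claims["jti"])  # in production: Redis SET NX with TTL
    return True
```

The verifier always computes HMAC-SHA256 and never reads the algorithm from the token header, which avoids the algorithm-confusion class of JWT bugs.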
**RACI**:
- R: fullstack-engineer
- A: vuln-verifier (approval security PoC), critic (write path review)
- C: debugger (trace_id verification), db-expert (approval state PG review)
- I: migration-engineer (feature flag rollback)
**DoD**:
- WAITING_APPROVAL resume without a token is rejected (vuln-verifier PoC passes)
- After a Redis outage, approval recovers from PG (integration test)
- write/execute default OFF; enabled only by manually flipping the feature flag
- All Strangler Fig gates pass quantified acceptance (three-way sign-off by critic + db-expert + vuln-verifier)
---
## 4. 資料庫詳細 Schema
### 4.1 awooop_contract_revisions六合約共用 revision 表)
```sql
CREATE TABLE awooop_contract_revisions (
revision_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
project_id VARCHAR(64) NOT NULL REFERENCES awooop_projects(project_id),
contract_family VARCHAR(32) NOT NULL -- project_tenant/agent/mcp_gateway/policy_routing/run_state/channel_event
contract_id VARCHAR(128) NOT NULL,
version VARCHAR(32) NOT NULL,
lifecycle_status VARCHAR(16) NOT NULL DEFAULT 'draft', -- draft/published/superseded/revoked
body_json JSONB NOT NULL,
body_schema_version VARCHAR(32) NOT NULL,
body_hash CHAR(64) NOT NULL, -- SHA-256 hex
created_by VARCHAR(128) NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
published_at TIMESTAMPTZ,
supersedes_revision_id UUID REFERENCES awooop_contract_revisions(revision_id),
-- Immutability constraint
CONSTRAINT published_body_immutable CHECK (
lifecycle_status = 'draft' OR body_json IS NOT NULL
)
);
-- Runtime read view: only published and superseded revisions, never drafts
CREATE VIEW awooop_published_revisions AS
SELECT * FROM awooop_contract_revisions
WHERE lifecycle_status IN ('published', 'superseded');
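-- Immutability trigger: drafts stay editable; after publication the body is frozen.
-- Assumption: lifecycle_status transitions (published -> superseded/revoked) must stay
-- possible, otherwise those states could never be reached through UPDATE.
CREATE OR REPLACE FUNCTION prevent_revision_update()
RETURNS TRIGGER AS $$
BEGIN
  IF OLD.lifecycle_status != 'draft' AND (
       NEW.lifecycle_status = 'draft'
    OR NEW.body_json IS DISTINCT FROM OLD.body_json
    OR NEW.body_hash IS DISTINCT FROM OLD.body_hash
    OR NEW.version IS DISTINCT FROM OLD.version
  ) THEN
    RAISE EXCEPTION 'Published contract revision is immutable';
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER enforce_revision_immutability
BEFORE UPDATE ON awooop_contract_revisions
FOR EACH ROW EXECUTE FUNCTION prevent_revision_update();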
-- RLS
ALTER TABLE awooop_contract_revisions ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON awooop_contract_revisions
USING (project_id = current_setting('app.project_id', TRUE)
OR current_user = 'awooop_platform');
```
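
The `body_hash` column stores a SHA-256 hex digest of the contract body. The following is a minimal sketch of how such a digest could be computed so that it is stable across key ordering. The canonicalization rule (sorted keys, compact separators) and the function name `canonical_body_hash` are assumptions for illustration, not something the plan specifies.

```python
import hashlib
import json


def canonical_body_hash(body: dict) -> str:
    """Return the SHA-256 hex digest (64 chars) of a canonical JSON encoding."""
    # Sorted keys and fixed separators make the encoding independent of key order.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical contract bodies, for illustration only.
first = canonical_body_hash({"tool": "restart_pod", "scope": ["awoooi-prod"]})
second = canonical_body_hash({"scope": ["awoooi-prod"], "tool": "restart_pod"})
print(first == second)  # the two bodies differ only in key order
```

Whatever canonical form is chosen, the publisher and the verifier need to agree on it, since PostgreSQL's own JSONB text output is not guaranteed to match an application-side encoding.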
### 4.2 awooop_run_state (with lease + SAGA journal)
```sql
CREATE TABLE awooop_run_state (
  run_id UUID NOT NULL DEFAULT gen_random_uuid(),
  project_id VARCHAR(64) NOT NULL,
  agent_id VARCHAR(128) NOT NULL,
  trace_id CHAR(32), -- W3C trace_id hex
  parent_run_id UUID,
  status VARCHAR(32) NOT NULL DEFAULT 'PENDING',
  migration_mode VARCHAR(32) NOT NULL DEFAULT 'shadow', -- shadow/canary/read_only/suggest/auto_remediate
  -- Worker lease
  lease_until TIMESTAMPTZ,
  heartbeat_at TIMESTAMPTZ,
  attempt_count INT NOT NULL DEFAULT 0,
  worker_id VARCHAR(128),
  -- Token budget
  budget_limit_tokens BIGINT,
  tokens_used BIGINT NOT NULL DEFAULT 0,
  -- Timestamps
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  completed_at TIMESTAMPTZ,
  -- SAGA journal (step-level)
  saga_steps JSONB DEFAULT '[]', -- [{step_id, tool, status, compensation_cmd, completed_at}]
  -- Metadata
  input_hash CHAR(64), -- SHA-256 of the input envelope (for audit)
  effective_policy_revision_id UUID,
  -- A primary key on a partitioned table must include the partition key column
  PRIMARY KEY (run_id, created_at)
) PARTITION BY RANGE (created_at);
-- Per-project RLS
ALTER TABLE awooop_run_state ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON awooop_run_state
USING (project_id = current_setting('app.project_id', TRUE)
OR current_user = 'awooop_platform');
```
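
The lease columns (`lease_until`, `heartbeat_at`, `attempt_count`, `worker_id`) imply a claim, heartbeat, and takeover cycle. The sketch below models that cycle in memory to show the intended state changes. The 60-second TTL and the names `RunLease`, `try_acquire`, and `heartbeat` are assumptions; a real implementation would presumably perform the same checks in a single conditional `UPDATE` against PostgreSQL.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

LEASE_TTL = timedelta(seconds=60)  # assumed value, not taken from the plan


@dataclass
class RunLease:
    run_id: str
    status: str = "PENDING"
    worker_id: Optional[str] = None
    lease_until: Optional[datetime] = None
    heartbeat_at: Optional[datetime] = None
    attempt_count: int = 0


def try_acquire(run: RunLease, worker_id: str, now: datetime) -> bool:
    """Claim a run if it is unleased or its previous lease has expired."""
    lease_held = run.lease_until is not None and run.lease_until > now
    if run.status not in ("PENDING", "RUNNING") or lease_held:
        return False
    run.status = "RUNNING"
    run.worker_id = worker_id
    run.lease_until = now + LEASE_TTL
    run.heartbeat_at = now
    run.attempt_count += 1  # every takeover counts as a new attempt
    return True


def heartbeat(run: RunLease, worker_id: str, now: datetime) -> bool:
    """Extend the lease; fails if another worker took over or the lease lapsed."""
    if run.worker_id != worker_id or run.lease_until is None or run.lease_until <= now:
        return False
    run.heartbeat_at = now
    run.lease_until = now + LEASE_TTL
    return True


t0 = datetime(2026, 5, 3, tzinfo=timezone.utc)
run = RunLease(run_id="run-1")
print(try_acquire(run, "worker-a", t0))                          # first claim
print(try_acquire(run, "worker-b", t0 + timedelta(seconds=10)))  # lease still held
```

A worker whose heartbeat returns `False` should stop work immediately, because a second worker may already be running the same run.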
### 4.3 awooop_budget_ledger (Token Budget Hard Kill)
```sql
CREATE TABLE awooop_budget_ledger (
ledger_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
project_id VARCHAR(64) NOT NULL,
period DATE NOT NULL, -- YYYY-MM-DD (first day of the month)
provider VARCHAR(32) NOT NULL,
tokens_input BIGINT NOT NULL DEFAULT 0,
tokens_output BIGINT NOT NULL DEFAULT 0,
cost_usd NUMERIC(12, 6) NOT NULL DEFAULT 0,
hard_kill_at NUMERIC(12, 6), -- NULL = no limit
hard_killed BOOLEAN NOT NULL DEFAULT FALSE,
last_run_id UUID,
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
UNIQUE(project_id, period, provider)
);
```
### 4.4 Tables to add or extend across 8 groups (found by db-expert)
| Table | Missing columns / missing indexes | Phase |
|------|----------------------|-------|
| `incidents` | Add `project_id`, `trace_id`, `awooop_run_id` | Phase 2 |
| `playbooks` | Add `project_id`, `agent_id` | Phase 2 |
| `km_entries` | Add `project_id`, `namespace` | Phase 2 |
| `mcp_audit_log` | Add `trace_id`, `run_id`, `project_id`; add index on (run_id) | Phase 2 |
| `ai_decisions` | Add `project_id`, `run_id`; add index on (run_id) | Phase 2 |
| `approval_records` | Add `trace_id`, `approval_token_jti`; add index on (jti) | Phase 2/8 |
| `telegram_events` | Add `project_id`, `platform_subject_id` | Phase 7 |
| `ollama_health_checks` | Add `host_tier` (gcp_a/gcp_b/local), `project_id=__platform__` | Phase 2 |
---
## 5. Security Remediation Plan (vuln-verifier acceptance)
### 5.1 Three vulnerabilities confirmed by PoC
| Vulnerability | Location | PoC status | Fix | Phase |
|------|------|---------|---------|-------|
| Nonce forgery (server nonce does not depend on server_secret) | security_interceptor.py:451-490 | **PoC confirmed: forged nonce passes verification** | HMAC(server_secret + nonce), with server_secret injected from a K8s Secret | Phase 2 |
| Webhook replay (no timestamp/nonce) | webhooks.py:679-728 | **PoC confirmed: replayable** | Add timestamp (±5 min) + nonce with Redis NX | Phase 2 |
| requires_approval decided by LLM output | decision_manager.py (approval chain) | **PoC confirmed: bypassable** | Decided by the policy contract; LLM output must not influence it | Phase 2 |
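
The first two fixes can be sketched with the standard library alone. This is an illustrative sketch, not the project's implementation: it reads "HMAC(server_secret + nonce)" as an HMAC-SHA256 keyed with the server secret over the nonce, and it replaces Redis `SET NX` with an in-memory dictionary. The names `sign_nonce`, `verify_nonce`, and `ReplayGuard` are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Dict, Optional

MAX_SKEW_SECONDS = 300  # the ±5 min window from the table above


def sign_nonce(server_secret: bytes, nonce: str) -> str:
    """Bind a nonce to the server secret so clients cannot mint valid nonces."""
    return hmac.new(server_secret, nonce.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_nonce(server_secret: bytes, nonce: str, signature: str) -> bool:
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(sign_nonce(server_secret, nonce), signature)


class ReplayGuard:
    """In-memory stand-in for Redis SET NX with a TTL (assumption for the sketch)."""

    def __init__(self) -> None:
        self._seen: Dict[str, float] = {}  # nonce -> expiry time

    def accept(self, timestamp: float, nonce: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        if abs(now - timestamp) > MAX_SKEW_SECONDS:
            return False  # stale or future-dated request
        # Drop expired entries, then enforce first-use-wins.
        self._seen = {n: exp for n, exp in self._seen.items() if exp > now}
        if nonce in self._seen:
            return False  # replay
        self._seen[nonce] = now + 2 * MAX_SKEW_SECONDS
        return True


guard = ReplayGuard()
print(guard.accept(1000.0, "nonce-1", now=1010.0))  # first delivery
print(guard.accept(1000.0, "nonce-1", now=1011.0))  # same nonce again
```

The nonce entry is kept for twice the skew window so that it outlives every moment at which the same timestamp could still be accepted.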
### 5.2 approval_token specification
```
Signing algorithm: HS256
Payload:
- jti: UUID (uniqueness, Redis NX with 15 min TTL)
- iss: "awooop-platform"
- sub: "{project_id}:{run_id}"
- aud: "awooop-approval"
- exp: now + 15min
- approval_type: "human" | "system"
- decision_scope: [tool_id, ...]
Verification:
1. Verify the signature
2. exp has not passed
3. Redis NX confirms the jti is unused (replay protection)
4. sub matches the run_id being resumed
5. decision_scope matches the run's tool
```
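
The five verification steps can be sketched as follows, using only the standard library to build and check an HS256-signed compact token. This is a hedged illustration: the function names, the in-memory `used_jti` set (standing in for Redis NX), and the choice to return `E-APPROVAL-001` for subject or scope mismatches are assumptions. A production service would more likely rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import uuid


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, signing_input: str) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(secret, project_id, run_id, tools, now):
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {
        "jti": str(uuid.uuid4()),
        "iss": "awooop-platform",
        "sub": f"{project_id}:{run_id}",
        "aud": "awooop-approval",
        "exp": now + 15 * 60,
        "approval_type": "human",
        "decision_scope": tools,
    }
    signing_input = ".".join(_b64url(json.dumps(p).encode()) for p in (header, claims))
    return signing_input + "." + _b64url(_sign(secret, signing_input))


def verify_token(secret, token, project_id, run_id, tool_id, used_jti, now):
    """Return "ok" or an error code; used_jti is a set standing in for Redis NX."""
    try:
        head_b64, claims_b64, sig_b64 = token.split(".")
        expected = _sign(secret, f"{head_b64}.{claims_b64}")
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):  # 1. signature
            return "E-APPROVAL-001"
        claims = json.loads(_b64url_decode(claims_b64))
    except ValueError:  # malformed token, bad base64, or bad JSON
        return "E-APPROVAL-001"
    if claims["exp"] <= now:  # 2. expiry
        return "E-APPROVAL-002"
    if claims["jti"] in used_jti:  # 3. replay
        return "E-APPROVAL-003"
    used_jti.add(claims["jti"])
    if claims["sub"] != f"{project_id}:{run_id}":  # 4. subject matches the run
        return "E-APPROVAL-001"
    if tool_id not in claims["decision_scope"]:  # 5. tool is within scope
        return "E-APPROVAL-001"
    return "ok"
```

The sketch follows the listed order, so a token presented against the wrong run is still consumed. Checking steps 4 and 5 before recording the jti is a possible variation if that behavior is not wanted.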
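### 5.3 vuln-verifier acceptance checklist per Phase
- Phase 2: nonce forgery fails, webhook replay fails, requires_approval cannot be decided by the LLM
- Phase 4: 0 secret hits in the audit log (sample of 100 entries)
- Phase 5: 0 raw credentials in agent code paths (AST scan)
- Phase 6: cross-tenant isolation PoC (EwoooC cannot read AWOOOI)
- Phase 8: resume without an approval token is rejected; approvals recover from PG after a Redis outage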
---
## 6. Complete API Endpoint List (added by fullstack)
### 6.1 Existing (unchanged)
- `POST /v1/webhooks/telegram`
- `POST /v1/webhooks/alertmanager`
- `GET /v1/incidents/`
- `POST /v1/decisions/`
### 6.2 New in Phase 4 (Platform Shell)
- `POST /v1/platform/runs`: create a run (async)
- `GET /v1/platform/runs/{run_id}`: query run state
- `GET /v1/platform/runs/{run_id}/steps`: query SAGA steps
- `POST /v1/platform/runs/{run_id}/cancel`: cancel a run
### 6.3 New in Phase 4-5 (Approval)
- `POST /v1/platform/runs/{run_id}/approve`: resume with an approval_token
- `POST /v1/platform/runs/{run_id}/reject`: reject (with a reason)
### 6.4 New in Phase 6 (Tenant)
- `POST /v1/platform/projects`: create a project (admin only)
- `GET /v1/platform/projects/{project_id}/migration_state`: query Strangler Fig state
- `POST /v1/platform/projects/{project_id}/contracts`: create a contract draft
- `POST /v1/platform/projects/{project_id}/contracts/{contract_id}/publish`: publish
- `POST /v1/platform/projects/{project_id}/contracts/{contract_id}/activate`: activate
### 6.5 New in Phase 7 (Channel Hub)
- `GET /v1/platform/channel_events`: query conversation events (with pagination)
- `POST /v1/platform/outbound`: send an outbound message (admin/test)
---
## 7. Error Code Dictionary (9 required additions)
| Error Code | HTTP Status | Description | Scenario |
|------------|-------------|------|------|
| `E-SCHEMA-001` | 422 | LLM output schema validation failed | Phase 3 contract validator |
| `E-BUDGET-001` | 429 | Token budget hard kill triggered | Phase 4 budget guard |
| `E-APPROVAL-001` | 401 | approval_token missing or invalid | Phase 8 approval resume |
| `E-APPROVAL-002` | 401 | approval_token expired | Phase 8 |
| `E-APPROVAL-003` | 409 | approval_token already used (replay) | Phase 8 |
| `E-MCP-GATE-001` | 403 | MCP tool not authorized for this project | Phase 5 |
| `E-MCP-GATE-002` | 403 | MCP tool not authorized for this agent | Phase 5 |
| `E-MCP-GATE-003` | 403 | MCP write/execute tool blocked (not in auto_remediate mode) | Phase 5/8 |
| `E-TENANT-001` | 403 | Cross-tenant data access blocked | Phase 2+ |
| `E-IDEMPOTENT-001` | 200 | Duplicate event, returning existing run_id | Phase 4 |
| `E-RATE-001` | 429 | Project rate limit exceeded | Phase 2+ |
| `E-SAGA-001` | 500 | SAGA compensation failed, manual intervention required | Phase 4/ADR-119 |
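
One way to keep the dictionary above in a single place in code is an enum that pairs each code with its HTTP status. The sketch below assumes hypothetical names (`AwoooPError`, `to_response`) and a hypothetical response body shape; only the codes and statuses come from the table.

```python
from enum import Enum
from typing import Dict, Tuple


class AwoooPError(Enum):
    SCHEMA_001 = ("E-SCHEMA-001", 422)
    BUDGET_001 = ("E-BUDGET-001", 429)
    APPROVAL_001 = ("E-APPROVAL-001", 401)
    APPROVAL_002 = ("E-APPROVAL-002", 401)
    APPROVAL_003 = ("E-APPROVAL-003", 409)
    MCP_GATE_001 = ("E-MCP-GATE-001", 403)
    MCP_GATE_002 = ("E-MCP-GATE-002", 403)
    MCP_GATE_003 = ("E-MCP-GATE-003", 403)
    TENANT_001 = ("E-TENANT-001", 403)
    IDEMPOTENT_001 = ("E-IDEMPOTENT-001", 200)
    RATE_001 = ("E-RATE-001", 429)
    SAGA_001 = ("E-SAGA-001", 500)

    def __init__(self, code: str, http_status: int) -> None:
        self.code = code
        self.http_status = http_status


def to_response(error: AwoooPError, detail: str) -> Tuple[int, Dict[str, str]]:
    """Build an (HTTP status, JSON body) pair; the body shape is an assumption."""
    return error.http_status, {"error_code": error.code, "detail": detail}


status, body = to_response(AwoooPError.BUDGET_001, "project budget exhausted")
print(status, body["error_code"])
```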
---
## 8. Frontend Operator Console (frontend-designer, 8 modules)
> Implemented after Phase 8 (the Operator Console may be prototyped in Phase 6)
> ADR-UI-01~04 define the architecture; this section is the work item list
| Module | Description | Priority |
|------|------|---------|
| **Tenant Management** | project list, creation, migration_state visualization, budget settings | P1 (Phase 6 prototype) |
| **Contract Lifecycle** | draft/publish/activate actions, revision diff, filtering by the six contract families | P1 (Phase 6 prototype) |
| **Run Monitor** | run FSM visualization, shadow/canary/active markers, trace_id drill-down | P1 (after Phase 4) |
| **Strangler Fig Dashboard** | live dashboard of shadow→canary gate metrics (divergence / latency / error rate) | P1 (after Phase 7) |
| **Budget & Cost** | per-project token budget, hard kill trigger history, cost trends (GCP Ollama vs paid provider) | P2 |
| **Audit Log Viewer** | audit log search (post-redaction), secret hit warnings, trace_id correlation | P2 |
| **MCP Gateway Admin** | tool registry, grants management, credential refs (masked), audit | P2 |
| **Principal Directory** | platform_subject lookup, Telegram/LINE/API user mapping | P3 |
**Integration with the existing design system**:
- Must use next-intl (no hardcoded Chinese/English strings)
- No emoji; use Lucide/SVG icons
- Follow the site-wide design rules in `feedback_design_system_consistency.md`
- No direct access to internal network IPs (`feedback_frontend_internal_ip_ban.md`)
---
## 9. Refactoring Split Plan (11 PRs, refactor-specialist)
> Every PR must be independently mergeable, have a rollback path, and not depend on later PRs
| PR# | Title | Prerequisite | Scope | Risk |
|-----|------|---------|---------|------|
| PR-01 | Remove the hardcoded IP assert at `telemetry.py:71` | None | 1 line | Low |
| PR-02 | Turn the silence key at `decision_manager.py:240` into a constant | None | 2 lines | Low |
| PR-03 | Remove the second definition at `ollama_auto_recovery.py:230` | None | ~5 lines | Low |
| PR-04 | `_provider` / `__provider` (registry.py) | None | ~20 lines | Low |
| PR-05 | Make the `mcp_bridge.py` namespace dynamic | None | ~30 lines | Medium |
| PR-06 | Add a project prefix to CONSENSUS_PREFIX in `consensus_engine.py` | Phase 2 Redis dual-write Phase A | ~15 lines | Medium |
| PR-07 | nonce redesign + webhook timestamp/nonce (ADR-116) | None | ~100 lines | High (security fix) |
| PR-08 | Repository project_id filter, batch 1 (incidents/playbooks/km) | Phase 1 schema | ~200 lines | Medium |
| PR-09 | Repository project_id filter, batch 2 (mcp/ai_decisions/approval) | PR-08 | ~200 lines | Medium |
| PR-10 | Background loop tagging (31 loops, main.py) | ADR-123 | ~150 lines | Medium |
| PR-11 | Per-project rework of AnomalyCounter | PR-10 | ~80 lines | Medium |
> PR-01~05 can run in parallel (no dependencies); whichever is ready first merges first.
> PR-06~07 require Redis dual-write Phase A to be completed first.
> PR-08~09 require the Phase 1 schema to be completed first.
---
## 10. Feature Flag / Kill-Switch Registry
| Flag | Default | Description | Enable condition |
|-----------|--------|------|---------|
| `AWOOOP_SHADOW_MODE` | OFF | Enables shadow runs (mirrored, no response sent) | Manual flip after Phase 4 completes |
| `AWOOOP_CANARY_MODE` | OFF | Enables canary (some user-visible responses) | Shadow gate passes 14 days of quantified checks |
| `AWOOOP_READ_ONLY_MODE` | OFF | Moves read-only queries to AwoooP | Canary gate passes 7 days of quantified checks |
| `AWOOOP_SUGGEST_MODE` | OFF | AI suggests, humans decide | read_only gate passes 14 days |
| `AWOOOP_WRITE_MODE` | OFF | Enables controlled write/execute tools | suggest gate passes 30 days + rollback evidence ≥3 |
| `AWOOOP_BUDGET_HARD_KILL` | ON | Terminates the run when the token budget is exceeded (not just an alert) | **ON by default** (ADR-120) |
| `AWOOOP_MCP_OAUTH21` | OFF | MCP OAuth 2.1 flow (ADR-117) | After Phase 5 completes |
| `AWOOOP_RLS_STRICT` | OFF | Strict RLS mode (no awooop_platform bypass) | Phase 2 complete + 30-day soak |
| `AWOOOP_EWOOOC_LIVE` | OFF | Switches the EwoooC tenant to live (not shadow) | Phase 6 canary passes 7 days |
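
The registry above can be sketched as a fixed table of defaults plus a strict resolver, so that only an exact `ON` or `OFF` from a trusted flag store changes a value. The names `FLAG_DEFAULTS` and `resolve_flags`, and the idea of passing overrides as a plain dictionary, are assumptions for illustration. The point of the sketch is that loose truthy values such as `true` or `1` are rejected rather than silently enabling a write path.

```python
from typing import Dict

# Defaults copied from the registry table; only the hard kill starts ON.
FLAG_DEFAULTS: Dict[str, bool] = {
    "AWOOOP_SHADOW_MODE": False,
    "AWOOOP_CANARY_MODE": False,
    "AWOOOP_READ_ONLY_MODE": False,
    "AWOOOP_SUGGEST_MODE": False,
    "AWOOOP_WRITE_MODE": False,
    "AWOOOP_BUDGET_HARD_KILL": True,
    "AWOOOP_MCP_OAUTH21": False,
    "AWOOOP_RLS_STRICT": False,
    "AWOOOP_EWOOOC_LIVE": False,
}


def resolve_flags(overrides: Dict[str, str]) -> Dict[str, bool]:
    """Apply explicit overrides on top of the defaults, rejecting anything ambiguous."""
    flags = dict(FLAG_DEFAULTS)
    for name, raw in overrides.items():
        if name not in FLAG_DEFAULTS:
            raise KeyError(f"unknown flag: {name}")
        if raw not in ("ON", "OFF"):
            raise ValueError(f"{name} must be exactly ON or OFF, got {raw!r}")
        flags[name] = raw == "ON"
    return flags


print(resolve_flags({"AWOOOP_SHADOW_MODE": "ON"})["AWOOOP_SHADOW_MODE"])
```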
---
## 11. Runbook List (8 runbooks, requested by debugger)
| Runbook | Location | Trigger | Main steps |
|---------|------|---------|---------|
| **RB-01**: AwoooP Contract Publish Failure | `docs/runbooks/awooop-contract-publish-failure.md` | schema validation failure, CODEOWNERS reject | 1. Check body_hash 2. Check draft status 3. Roll back to the previous active revision |
| **RB-02**: Run State Stuck / Stale Lease | `docs/runbooks/awooop-run-stuck.md` | run stuck in RUNNING > 10 min | 1. Check lease_until 2. Run the reaper manually 3. Inspect saga_steps to decide whether to compensate or abandon |
| **RB-03**: Budget Hard Kill Triggered | `docs/runbooks/awooop-budget-hard-kill.md` | E-BUDGET-001 appears in bulk | 1. Check budget_ledger 2. Confirm the hard_kill_at threshold 3. Determine whether an incident burst is the cause 4. Raise the limit temporarily or wait for next month's reset |
| **RB-04**: Phase Rollback (Strangler Fig) | `docs/runbooks/awooop-phase-rollback.md` | canary error rate > threshold | 1. Switch project_migration_state back to the previous mode 2. Clear the Redis canary cache 3. Notify EwoooC (if affected) |
| **RB-05**: Approval Token Replay Alert | `docs/runbooks/awooop-approval-replay.md` | E-APPROVAL-003 appears | 1. Check the jti Redis key 2. Confirm IP / user 3. Revoke the token 4. Notify security |
| **RB-06**: Cross-Tenant Data Leak Alert | `docs/runbooks/awooop-cross-tenant-leak.md` | E-TENANT-001 appears in bulk | 1. Stop canary/active mode immediately 2. Check the audit log 3. Verify RLS settings 4. Evaluate a PITR restore |
| **RB-07**: GCP Ollama Failover Anomaly | `docs/runbooks/awooop-gcp-ollama-failover.md` | GCP-A/B down at the same time and Local fallback also down | 1. Check the `platform:ollama:primary` Redis key 2. Set the fallback manually 3. Confirm emergency routing to a paid provider |
| **RB-08**: SAGA Compensation Failure | `docs/runbooks/awooop-saga-compensation-fail.md` | E-SAGA-001 appears | 1. Inspect the saga_steps JSON 2. Find the failed step 3. Run the compensation command manually 4. Update the run status |
---
## 12. Tooling Reinforcement Plan (tool-expert)
| Tool | Purpose | Install location | Phase |
|------|------|---------|-------|
| **PgBouncer** | PG connection pooling so multiple AwoooP workers do not exhaust connections | K8s sidecar or standalone Pod | Before Phase 4 |
| **Sealed Secrets** | Replaces plaintext K8s Secrets; CI/CD security | K3s cluster | Phase 2 (during security hardening) |
| **OPA / Cedar** | Policy engine (centralizes authorization logic, replacing scattered code) | As a sidecar or admission webhook | Before Phase 5 |
| **chaostoolkit / LitmusChaos** | Chaos validation of Strangler Fig cutovers (worker crash, Redis outage, PG timeout) | CI pipeline | After Phase 4 completes |
| **awooop-ctl** | AwoooP CLI (contract CRUD / run queries / migration state management) | Local CLI + CI | Before Phase 6 |
| **pg_partman** | Automated PostgreSQL partition management | K8s Pod / cron | Phase 4 (before run_state goes live) |
| **pgvector (already installed)** | KM vector search | Already present; needs a per-project namespace | Phase 2 |
| **OpenTelemetry Collector** | OTel pipeline (ADR-121); currently sends directly to SigNoz at 188:24318, sampling will be needed later | K8s DaemonSet | Before Phase 4 |
---
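## 13. Industry Alignment (web-researcher findings)
### 13.1 Lessons from the $47k Agent Loop Incident (Token Budget Hard Kill)
Problem: alert ≠ enforcement. Only a Prometheus alert fired while the agent kept running, and a single loop burned $47k.
AwoooP approach (ADR-120):
- Three budget limit layers: per-run / per-project / per-tenant
- **Hard Kill**: over budget → terminate the run immediately (not just log/alert)
- Redis hot counter (decremented on every call) + PG budget_ledger transaction (final decision)
- The `AWOOOP_BUDGET_HARD_KILL` feature flag is ON by default (the only flag enabled by default)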
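
The difference between alerting and enforcement can be sketched as a charge function that refuses the call before any tokens are spent. This is a minimal in-memory sketch: the names `Budget`, `BudgetExceeded`, and `charge` are hypothetical, and the Redis counter and PG ledger transaction described above are replaced by plain objects.

```python
from dataclasses import dataclass
from typing import Dict, List


class BudgetExceeded(Exception):
    code = "E-BUDGET-001"


@dataclass
class Budget:
    limit_tokens: int
    used_tokens: int = 0


def charge(scopes: Dict[str, Budget], tokens: int, hard_kill: bool = True) -> List[str]:
    """Charge every budget layer, or raise before spending if any layer would overflow."""
    breached = [name for name, b in scopes.items() if b.used_tokens + tokens > b.limit_tokens]
    if breached and hard_kill:
        # Enforcement: the run is stopped, nothing is charged.
        raise BudgetExceeded(f"{BudgetExceeded.code}: over budget for {', '.join(breached)}")
    for budget in scopes.values():
        budget.used_tokens += tokens
    return breached  # non-empty only in alert-only mode


scopes = {"run": Budget(100), "project": Budget(1000), "tenant": Budget(5000)}
charge(scopes, 60)
try:
    charge(scopes, 60)  # would push the per-run layer to 120 of 100
except BudgetExceeded as exc:
    print(exc)
```

With `hard_kill=False` the same call only reports the breached layers and keeps charging, which is the alert-only behavior the incident illustrates.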
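### 13.2 Durable Execution / SAGA Compensating Transactions (ADR-119)
Industry standard (Temporal / Conductor / Azure Durable Functions): a multi-step tool chain must have a step-level journal plus a compensation mechanism.
AwoooP approach:
- The `saga_steps` JSONB column lives in `awooop_run_state`
- Each tool call records: step_id / tool / status / compensation_cmd / completed_at
- On failure, run the compensation commands (the reverse operations)
- Compensation failure → E-SAGA-001 + Runbook RB-08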
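
The journal-and-compensate flow can be sketched as below. The journal entries use the five fields named above; everything else (the `run_saga` function, the tuple shape of a step, the status strings, and the tool names in the example) is an assumption made for illustration.

```python
from datetime import datetime, timezone


def run_saga(steps, journal):
    """Run steps in order; on failure, compensate completed steps in reverse.

    Each step is (step_id, tool, action, compensation_cmd, compensate), where
    action and compensate are callables. Returns "COMPLETED", "COMPENSATED",
    or "E-SAGA-001" when a compensation itself fails.
    """
    done = []
    for step_id, tool, action, compensation_cmd, compensate in steps:
        entry = {"step_id": step_id, "tool": tool, "status": "RUNNING",
                 "compensation_cmd": compensation_cmd, "completed_at": None}
        journal.append(entry)  # journal first, so a crash leaves a trace
        try:
            action()
        except Exception:
            entry["status"] = "FAILED"
            break
        entry["status"] = "DONE"
        entry["completed_at"] = datetime.now(timezone.utc).isoformat()
        done.append((entry, compensate))
    else:
        return "COMPLETED"  # loop finished without a failure
    for entry, compensate in reversed(done):
        try:
            compensate()
            entry["status"] = "COMPENSATED"
        except Exception:
            entry["status"] = "COMPENSATION_FAILED"
            return "E-SAGA-001"  # manual intervention required
    return "COMPENSATED"


def failing_step():
    raise RuntimeError("tool call failed")


calls = []
journal = []
result = run_saga([
    ("s1", "scale_up", lambda: calls.append("do-1"), "scale_down", lambda: calls.append("undo-1")),
    ("s2", "restart", failing_step, "noop", lambda: calls.append("undo-2")),
], journal)
print(result, calls)
```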
### 13.3 MCP OAuth 2.1 Confused Deputy (ADR-117)
MCP spec 2025-06-18 requires:
- per-tenant dynamic client registration (RFC 7591)
- Resource Indicators (RFC 8707) to prevent a token from being used across resource servers
- PKCE (RFC 7636) to prevent authorization code interception
AwoooP approach (ADR-117):
- Dynamic client registration per tenant (no shared client_id)
- The Resource Indicator must match the target URI in the tool registry
- `E-MCP-GATE-001/002/003` error codes cover Confused Deputy scenarios
### 13.4 OTel GenAI Semantic Conventions (ADR-121)
Official specification (`open-telemetry/semantic-conventions`, `docs/gen-ai`):
- span naming: `{gen_ai.operation.name} {gen_ai.request.model}` (e.g., `chat gemma3:4b`)
- token attributes: `gen_ai.usage.input_tokens` / `gen_ai.usage.output_tokens`
- model attributes: `gen_ai.request.model` / `gen_ai.response.model`
AwoooP approach: every LLM call must emit the attributes above into SigNoz (188:24318).
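
As a small illustration, the attribute set for one LLM call could be assembled as a plain dictionary before it is attached to a span. The helper name `genai_span_attributes` and the extra `awooop.project_id` key are assumptions and not part of the OTel conventions; the four `gen_ai.*` keys are the ones listed above.

```python
from typing import Dict, Union

AttributeValue = Union[str, int]


def genai_span_attributes(
    project_id: str,
    request_model: str,
    response_model: str,
    input_tokens: int,
    output_tokens: int,
) -> Dict[str, AttributeValue]:
    """Build the attribute dict for one LLM call."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return {
        "gen_ai.request.model": request_model,
        "gen_ai.response.model": response_model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
        "awooop.project_id": project_id,  # hypothetical tenant attribute
    }


attrs = genai_span_attributes("awoooi", "gemma3:4b", "gemma3:4b", 512, 128)
print(attrs["gen_ai.usage.input_tokens"] + attrs["gen_ai.usage.output_tokens"])
```

Recording token counts on every call, including free Ollama calls, gives the budget ledger and the traces the same numbers to reconcile.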
### 13.5 OWASP Agentic AI Top 10 Alignment (ADR-122)
| OWASP item | AwoooP control |
|-----------|---------------|
| OAI-01 Prompt Injection | MCP Gateway result sanitization + schema validator |
| OAI-02 Insecure Tool Use | Five-gate MCP enforcement + audit |
| OAI-03 Excessive Agency | requires_approval from policy (the LLM must not decide) + write/execute feature flag |
| OAI-04 Supply Chain | contract publish HMAC + artifact SHA-256 |
| OAI-05 Data Leakage | audit log redaction + credential isolation |
| OAI-06 Insufficient Observability | OTel GenAI + audit sink + run trace_id |
| OAI-07 Unsafe Orchestration | SAGA journal + compensation + hard kill |
| OAI-08 Memory Vulnerabilities | contract revision immutability + RLS |
| OAI-09 Access Control Bypass | approval_token HS256 + jti replay prevention |
| OAI-10 Resource Exhaustion | Token Budget Hard Kill (ADR-120) |
---
## 14. Impact of the GCP Ollama Topology on AwoooP (ADR-110 integration)
### 14.1 New topology (ADR-110 + ADR-125, revised 2026-05-05)
```
Phase 0 bridge:
Primary  : GCP-A http://192.168.0.110:11435 (110 nginx → GCP public IP)
Secondary: GCP-B http://192.168.0.110:11436
Fallback : Local http://192.168.0.111:11434
Emergency: Gemini → Nemotron → Claude (when all Ollama tiers are down, budget gated)
Target private mesh:
Primary  : GCP-A http://10.77.114.21:11434
Secondary: GCP-B http://10.77.114.22:11434
Fallback : Local http://10.77.114.111:11434
```
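ADR-125 revises the transport layer of ADR-110: the public GCP IPs and the 110 nginx proxy are kept only as a transitional and rollback bridge. The official path is the WireGuard private mesh, and runtime routing is managed by the AwoooP Inference Gateway.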
### 14.2 Impact items AwoooP must handle
| Impact item | Location | Handling | Phase |
|--------|------|---------|-------|
| `ollama:current_primary` Redis key dual-write (supports only 1 URL; 3 tiers are now needed) | INV-1 | Replace with `platform:ollama:topology` (JSON: primary/secondary/fallback) | Phase 2 |
| Second definition at `ollama_auto_recovery.py:230` (P0-11) | ollama_auto_recovery.py | Remove it and read from config only | Phase 2 PR-03 |
| GCP public IPs recorded in INV-4 (34.143.170.20, 34.21.145.224) | INV-4 | Mark as transitional only; the official path uses the `10.77.114.21/22` mesh IPs | Phase 0 INV-4 |
| WireGuard mesh | ADR-125 / runbook | Build the `10.77.114.0/24` private transport and close public 11434 | Phase 2 prerequisite |
| AwoooP Inference Gateway | ADR-125 / runbook | Isolate the alert-fast / code-review / embedding / deep-rca lanes so heavy models do not starve the alert lane | Phase 4 |
| EwoooC Provider Proxy routes through GCP Ollama | Phase 6 | EwoooC shares the platform Ollama topology (platform_resource) | Phase 6 |
| IP assert at `telemetry.py:71` (P0-08) | telemetry.py:71 | After removal, GCP IPs no longer trigger the assert; the check becomes config-driven | Phase 2 PR-01 |
| budget_ledger records Ollama usage (free GCP still needs token counting) | Phase 4 | Ollama calls must also record token consumption (budget_ledger) | Phase 4 |
| Runbook RB-07 (GCP Ollama failover anomaly) | docs/runbooks/ | Write the runbook in Phase 0; run the real E2E test after Phase 4 | Phase 0 |
### 14.3 GCP Ollama is a platform_resource (ADR-111)
GCP Ollama (bridge: 34.143.170.20 / 34.21.145.224; target mesh: 10.77.114.21 / 10.77.114.22) and Local Ollama (192.168.0.111; target 10.77.114.111) are all declared as `platform_resource`:
- They belong to no tenant
- All tenants (AWOOOI / EwoooC / Tsenyang / Bitan) share them, but audit entries record each tenant's own project_id
- The `platform:ollama:topology` Redis key uses the `platform:` prefix (not `{project_id}:`)
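
A possible shape for the `platform:ollama:topology` value, and a tier-ordered failover pick over it, is sketched below. The JSON layout, the `name`/`url` fields, and the function `pick_endpoint` are assumptions; the plan only states that the key holds JSON with primary, secondary, and fallback entries. The URLs are the target mesh addresses from §14.1.

```python
import json
from typing import Optional, Set

TIER_ORDER = ("primary", "secondary", "fallback")

# Hypothetical value stored under the platform:ollama:topology Redis key.
TOPOLOGY_JSON = json.dumps({
    "primary": {"name": "GCP-A", "url": "http://10.77.114.21:11434"},
    "secondary": {"name": "GCP-B", "url": "http://10.77.114.22:11434"},
    "fallback": {"name": "Local", "url": "http://10.77.114.111:11434"},
})


def pick_endpoint(topology_json: str, healthy_tiers: Set[str]) -> Optional[str]:
    """Return the URL of the highest-priority healthy tier, or None if all are down."""
    topology = json.loads(topology_json)
    for tier in TIER_ORDER:
        if tier in healthy_tiers:
            return topology[tier]["url"]
    return None  # the caller would escalate to the budget-gated emergency providers


print(pick_endpoint(TOPOLOGY_JSON, {"secondary", "fallback"}))
```

Keeping all three tiers in one value means a failover decision reads a single key, instead of the one-URL `ollama:current_primary` key it replaces.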
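### 14.4 Measured limits (2026-05-05)
Measurements from `scripts/ops/ollama-topology-check.sh`:
- GCP-A `gemma3:4b`: about 2 s, `size_vram=0`
- GCP-B `gemma3:4b`: about 8.5 s, but `size_vram=0`
- 111 fallback `gemma3:4b`: about 4.9 s, `size_vram=8210446336`
Conclusion: GCP-A/B can serve as the synchronous `alert-fast` lane, but they cannot carry synchronous alert diagnosis on 14B/32B models.
Heavy models must be routed by the Inference Gateway to async / 111 / GPU nodes.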
---
## 15. Work Ordering Summary (parallel groups + Critical Path)
### Critical Path (sequential, no skipping)
```
Phase 0: all ADR/INV
→ Phase 1 Schema (PR-01/02/03/04/05 can go first in parallel)
→ Phase 2 Security Hardening + Redis migration (PR-06~11)
→ Phase 3 Contract Packages
→ Phase 4 Platform Shell (PgBouncer + OPA/pg_partman prepared in parallel)
→ Phase 5 MCP Gateway
→ Phase 6 EwoooC (14-day shadow gate)
→ Phase 7 Channel Hub (7-day canary gate)
→ Phase 8 Suggest + Write (30-day suggest gate)
```
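### Parallel work groups
| Group | Work | Can run in parallel with |
|------|------|-----------|
| G-A (Phase 0 parallel) | ADR-111~115, each independent | All in parallel (5 ADRs, one owner each) |
| G-B (Phase 0 parallel) | ADR-116~124 | In parallel with G-A |
| G-C (Phase 0 parallel) | INV-1~INV-9 (some depend on codebase exploration) | In parallel with G-A/G-B |
| G-D (Phase 2 parallel) | PR-01/02/03/04/05 (independent small fixes) | All in parallel |
| G-E (Phase 2 parallel) | Redis dual-write + repository rework + security hardening | Independent of each other, but security hardening goes first |
| G-F (Phase 4 parallel) | PgBouncer + pg_partman + OPA installation | In parallel with Phase 3 Contract Packages |
| G-G (Phase 5-6 parallel) | Operator Console prototype (ADR-UI-01~04) | In parallel with Phase 6 EwoooC shadow |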
### Full ordering table
| Order | Work | docs-only | Parallel group | Blocks |
|------|------|-----------|---------|-------|
| 1-A | ADR-111 Bootstrap Order | ✅ | G-A | Phase 2 |
| 1-B | ADR-112 Contract Governance | ✅ | G-A | Phase 3 |
| 1-C | ADR-113 Active Revision Outbox | ✅ | G-A | Phase 1 |
| 1-D | ADR-114 Idempotency & Worker Lease | ✅ | G-A | Phase 4 |
| 1-E | ADR-115 Principal Mapping | ✅ | G-A | Phase 6, 7 |
| 2-A | ADR-116 Security Hardening | ✅ | G-B | Phase 2 |
| 2-B | ADR-117 MCP OAuth 2.1 | ✅ | G-B | Phase 5 |
| 2-C | ADR-118 RLS Strategy | ✅ | G-B | Phase 1 |
| 2-D | ADR-119 Durable Execution SAGA | ✅ | G-B | Phase 4 |
| 2-E | ADR-120 Token Budget Hard Kill | ✅ | G-B | Phase 4 |
| 2-F | ADR-121 OTel GenAI | ✅ | G-B | Phase 4 |
| 2-G | ADR-122 OWASP Agentic AI | ✅ | G-B | All Phases |
| 2-H | ADR-123 Background Loop Migration | ✅ | G-B | Phase 2 |
| 2-I | ADR-124 Global Singleton Decomposition | ✅ | G-B | Phase 2 |
| 2-J | ADR-UI-01~04 Operator Console ADR | ✅ | G-B | Phase 6+ |
| 2-K | ADR-106 補 Quantified Gates | ✅ | G-B | Phase 8 |
| 3-A | INV-1 Redis Keys | ✅ | G-C | Phase 2 |
| 3-B | INV-2 Repository Retrofit Map | ✅ | G-C | Phase 2 |
| 3-C | INV-3 Entrypoints | ✅ | G-C | Phase 2 |
| 3-D | INV-4 Hardcoded Namespace/IP (incl. GCP IPs) | ✅ | G-C | Phase 2 |
| 3-E | INV-5 Migration Compatibility Matrix | ✅ | G-C | Phase 1 |
| 3-F | INV-6 Rollback Playbook Register | ✅ | G-C | Phase 4 |
| 3-G | INV-7 PR Cutting Plan | ✅ | G-C | Phase 2 |
| 3-H | INV-8 Background Loop Catalog (31 loops) | ✅ | G-C | Phase 2 |
| 3-I | INV-9 Global Singleton Catalog (13 singletons) | ✅ | G-C | Phase 2 |
| 4 | Task 9 ordering fix (Dockerfile/ConfigMap) | ❌ | - | Phase 1 |
| 5 | **Phase 1 Schema Migration** (rewritten version) | ❌ | - | Phase 2~8 |
| 6-A | PR-01/02/03/04/05 (parallel small fixes) | ❌ | G-D | Phase 2 |
| 6-B | **Phase 2 Security Hardening** (PR-07 first) | ❌ | G-E | Phase 4 |
| 6-C | Phase 2 Redis dual-write + Repository (PR-06/08/09/10/11) | ❌ | G-E | Phase 4 |
| 7 | **Phase 3 Contract Packages** (packages/awooop-contracts/) | ❌ | - | Phase 4 |
| 8-A | PgBouncer + pg_partman + OPA installation | ❌ | G-F | Phase 4 |
| 8-B | **Phase 4 Platform Shell + Shadow** (incl. SAGA + Budget Kill) | ❌ | - | Phase 5 |
| 9 | **Phase 5 MCP Gateway** (incl. OAuth 2.1) | ❌ | - | Phase 6 |
| 10-A | **Phase 6 EwoooC Shadow Onboarding** (14-day gate) | ❌ | G-G | Phase 7 |
| 10-B | Operator Console prototype (G-G) | ❌ | G-G | Phase 7+ |
| 11 | **Phase 7 Channel Hub** (7-day canary gate) | ❌ | - | Phase 8 |
| 12 | **Phase 8 Suggest + Controlled Write** (30-day gate) | ❌ | - | AwoooP v1 GA |
**Items 1-A through 3-I are all docs-only and can be completed back to back in the current conversation window; only after they are done should a new Codex conversation be opened to start Phase 1 code.**
---
## 16. Quantified Acceptance Thresholds (full version)
### Strangler Fig Gates
| Transition | Quantified conditions | Sign-off |
|------|---------|------|
| pre → shadow | tenant created + agent contract published + audit/trace writes working | critic confirms |
| shadow → canary | ≥14 days + decision divergence < 5% + p95 regression < 10% + 0 P0/P1 incidents + 0 secrets in audit | critic + db-expert + vuln-verifier |
| canary → read_only | ≥7 days + user-visible error rate < 0.5% + cost diff < 50% of budget | critic + vuln-verifier |
| read_only → suggest | ≥14 days + suggest accept rate ≥ 50% + 0 hallucination escalations | critic |
| suggest → auto_remediate | ≥30 days + ≥ 3 successful rollback evidence records + approval token live + dry-run pass rate ≥ 99% | critic + db-expert + vuln-verifier |
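
A gate of this kind can be checked mechanically before the sign-off step. The sketch below evaluates the shadow → canary row; the `ShadowMetrics` fields and the function `shadow_to_canary_blockers` are hypothetical names, and the thresholds are the ones in the table. Ratios are expressed as fractions (0.05 means 5%).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ShadowMetrics:
    days_in_mode: int
    decision_divergence: float  # fraction of decisions that differ from legacy
    p95_regression: float       # fractional p95 latency regression
    p0_p1_incidents: int
    audit_secret_hits: int


def shadow_to_canary_blockers(m: ShadowMetrics) -> List[str]:
    """Return the unmet conditions; an empty list means the gate is quantitatively open."""
    checks = [
        (m.days_in_mode >= 14, "needs >= 14 days in shadow"),
        (m.decision_divergence < 0.05, "decision divergence must be < 5%"),
        (m.p95_regression < 0.10, "p95 regression must be < 10%"),
        (m.p0_p1_incidents == 0, "no P0/P1 incidents allowed"),
        (m.audit_secret_hits == 0, "audit log must contain 0 secrets"),
    ]
    return [reason for passed, reason in checks if not passed]


metrics = ShadowMetrics(days_in_mode=15, decision_divergence=0.07,
                        p95_regression=0.04, p0_p1_incidents=0, audit_secret_hits=0)
print(shadow_to_canary_blockers(metrics))
```

Returning the list of blockers rather than a single boolean gives the signing reviewers the specific condition that is still failing.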
### Phase acceptance thresholds (quantified reinforcement)
| Phase | Required quantified metrics |
|-------|-----------|
| Phase 1 | migration up/down dry-run passes; RLS cross-project rejection rate 100%; 0 behavior changes in AWOOOI (regression pass rate 100%) |
| Phase 2 | INV-1 P0 key migration completion 100%; vuln-verifier PoC pass rate 3/3; hardcode grep returns 0 results |
| Phase 3 | contract schema coverage 100% (6 families); invalid fixture rejection rate 100% |
| Phase 4 | shadow runs produce 0 user-visible responses; duplicate events map to a unique run 100% of the time; stale reaper reclaims within 1 min at a 100% rate |
| Phase 5 | credential leak test pass rate 100%; Five-gate integration test coverage 100% |
| Phase 6 | cross-tenant data access rejection rate 100%; EwoooC 14-day shadow gate passed |
| Phase 7 | first progress message within ≤ 30 s at a rate ≥ 99%; duplicate retries create 0 duplicate runs |
| Phase 8 | approval replay rejection rate 100%; write/execute default-OFF verified |
---
## 17. Related Document Index
- [ADR-106: AwoooP Architecture](../adr/ADR-106-agent-platform-architecture.md)
- [ADR-107: Control Plane Storage Strategy](../adr/ADR-107-awooop-control-plane-storage.md)
- [ADR-110: GCP Ollama Three-Tier Disaster Recovery Topology](../adr/ADR-110-gcp-ollama-topology.md)
- [MASTER-WORKPLAN.md](MASTER-WORKPLAN.md) (the master index that this document expands)
- [IMPLEMENTATION-ROADMAP.md](IMPLEMENTATION-ROADMAP.md) (historical document, old draft)
- To be created: `docs/awooop/inventory/` INV-1~INV-9
- To be created: ADR-111~ADR-124 (AwoooP-specific ADR series)
- To be created: ADR-UI-01~ADR-UI-04 (Operator Console ADRs)
- To be created: `docs/runbooks/` RB-01~RB-08
---
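*Last updated: 2026-05-03 (Taipei time)*
*Created by: 12-Agent joint review × Codex integration*
*Next step: Phase 0 docs-only work (starting with ADR-111); once complete, open a new Codex conversation to start Phase 1 code*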