# AwoooP Complete Detailed Implementation Plan

**Version**: v1.0 (integrated edition after the 12-Agent panoramic review)
**Date**: 2026-05-03 (Taipei time zone)
**Created by**: 12-Agent joint review × Codex integration
**Base documents**: MASTER-WORKPLAN.md, ADR-106, ADR-107
**⚠️ ADR numbering correction**: ADR-108/109/110 are already taken by other ADRs → AwoooP-specific ADRs start at ADR-111

> This document is the fully expanded version of MASTER-WORKPLAN.md.
> MASTER-WORKPLAN is the master index; this document holds the execution details.
> In case of any conflict, this document prevails (it has the later update date).

---

## 0. Panoramic Background

### 0.1 Infrastructure status (as of 2026-05-03)

| Component | Current state | Notes |
|------|------|------|
| Ollama Primary | GCP-A `34.143.170.20:11434` (SSD) | ADR-110, supersedes ADR-105 |
| Ollama Secondary | GCP-B `34.21.145.224:11434` (SSD) | New; went live 2026-05-03 |
| Ollama Fallback | Local `192.168.0.111:11434` (HDD) | Last line of defense, not Primary |
| PostgreSQL | `192.168.0.188` (private network) | Sole source of truth for the AwoooP control plane |
| Redis | `192.168.0.188` (private network) | cache/watch/counter only (ADR-107 D4) |
| K3s cluster | `awoooi-prod` namespace | AWOOOI first tenant |
| Gitea CI/CD | `192.168.0.110` (or Gitea Cloud) | ADR-039; all builds run from Gitea |

### 0.2 Summary of the 12-Agent review findings

The original MASTER-WORKPLAN had 24 consensus issues. After 12 Agents reviewed it in depth in parallel, the following were added:

| Agent | New P0/P1 issues | New ADRs needed | New Inventories |
|-------|-----------------|--------------|----------------|
| critic | 10 | 1 (ADR-116 Migration Discipline) | INV-5, INV-6, INV-7 |
| vuln-verifier | 8 (3 confirmed by PoC) | 2 (ADR-116/117 security series) | - |
| debugger | 12 (failure scenarios) | - | 8 Runbooks |
| db-expert | 8 (table design defects) + RLS completely absent | 1 (ADR-118 RLS strategy) | - |
| planner | 7 too coarse-grained + 10 acceptance criteria not closed-loop | - | - |
| fullstack-engineer | 7 missing API endpoints + 9 error codes | - | - |
| frontend-designer | 8 UI modules completely missing | ADR-UI-01~04 | - |
| refactor-specialist | 8 refactoring landmines + 11 PR plans | - | - |
| migration-engineer | 7 compatibility risks | - | version matrix |
| onboarder | 31 background loops (vs ~10 estimated) + 13 module conflicts | - | INV-8 |
| tool-expert | 8 tools under capacity + 8 tools missing | - | - |
| web-researcher | 5 major industry alignment gaps (SAGA/Token Kill/MCP OAuth 2.1/OTel/OWASP) | 5 (ADR-119~123) | - |
| **Total new** | **~70 issues** | **~12 ADRs** | **~4 Inventories** |

**Conclusion: unless the Pre-flight Audit is completed first, Phase 1 is bound to blow up.**

---

## 1. Complete Issue List (in P0 priority order)

### P0: Immediate blast (must be patched before Phase 1)

| # | Issue | Source | Impact |
|---|------|------|---------|
| P0-01 | Redis keys renamed in place with no dual-write period (cost counters reset to zero, Telegram 409, silence stops working, dual-write misses the three-tier Ollama failover topology) | critic | Cost, alerting, Ollama |
| P0-02 | Migration SQL uses wrong table names (`incident_records` / `mcp_audit_snapshots`), has no rollback, and mixes ORM 1.x vs 2.x | critic | Phase 1 migration |
| P0-03 | `project_id` / `tenant_id` have 0 hits in the codebase; 30+ business tables lack the column | onboarder | Whole system |
| P0-04 | The `requires_approval` field is decided by LLM output (security_interceptor.py:451-490) | vuln-verifier (PoC confirmed) | approval chain |
| P0-05 | Callback nonce forgery: a nonce that passes the server's check can be constructed without knowing the secret (security_interceptor.py:451-490) | vuln-verifier (PoC confirmed) | Telegram approval |
| P0-06 | Webhook HMAC replay: no timestamp/nonce (webhooks.py:679-728) | vuln-verifier (PoC confirmed) | All webhooks |
| P0-07 | All 31 background loops lack project_id (main.py) | onboarder (measured) | Multi-tenancy collapses entirely |
| P0-08 | `telemetry.py:71` hardcodes `if "192.168.0.188" not in endpoint: raise`, so EwoooC is guaranteed to fail at startup | onboarder | EwoooC Phase 6 |
| P0-09 | The `project_migration_state` table is missing; Strangler Fig has no data carrier | db-expert | Phase 1 |
| P0-10 | Task 9 order is inverted (the agent prompt load point comes before the ConfigMap) → everything returns None | critic | Any agent in Phase 1 |
| P0-11 | `ollama:current_primary` has a second definition at `ollama_auto_recovery.py:230`; the three-tier topology migration is bound to split | onboarder | GCP Ollama topology |
| P0-12 | `CONSENSUS_PREFIX="consensus:"` in `consensus_engine.py` has no project prefix, so it is shared across tenants under multi-tenancy | onboarder | Multi-tenant consistency |
| P0-13 | kubectl calls at `mcp_bridge.py:592-681` hardcode `namespace="awoooi-prod"` | onboarder | EwoooC K8s tool |

### P1: Severe defects (must be fixed before Phases 2-4)

| # | Issue | Source | Impact |
|---|------|------|---------|
| P1-01 | AWOOOI Bootstrap Paradox: cron/job/healthcheck all lack project_id | critic | Multi-tenant startup |
| P1-02 | EwoooC onboarding has zero technical path (it is not just changing `OLLAMA_API_BASE`) | critic | Phase 6 |
| P1-03 | Strangler Fig shadow→canary→active has no quantified gate conditions | planner | Cutover decisions |
| P1-04 | Layer 3 redaction has zero implementation (the helper exists but there is no enforcement) | critic | Information security |
| P1-05 | The `_provider` attribute is public and can bypass audit (mcp/registry.py:24-71) | critic | MCP security |
| P1-06 | `WAITING_APPROVAL` resume does not verify caller identity; no approval_token signature | critic | approval security |
| P1-07 | Redis approval state is a single point with no PG sync | critic | approval reliability |
| P1-08 | The audit log itself can leak secrets (redaction must happen before the audit sink) | critic | Information security |
| P1-09 | The `sanitization_service.py` helper has no enforcement point (neither MCP Gateway nor AgentToolExecutor uses it) | critic | tool security |
| P1-10 | Active revision switching has no transactional outbox; workers may consume a stale policy | db-expert | policy consistency |
| P1-11 | Run/Channel idempotency lacks key derivation rules and a unique index | db-expert | Duplicate execution |
| P1-12 | Async workers lack lease / heartbeat / stale reaper | db-expert | worker reliability |
| P1-13 | Partition + retention for high-traffic tables must be decided in Phase 1 (cannot be retrofitted) | db-expert | Long-term scalability |
| P1-14 | Observability metrics label cardinality (run_id/trace_id/session_id must not enter metrics) | fullstack | Prometheus |
| P1-15 | The approval flow at `multi_sig_redis.py:178-205` has zero trace_id | debugger | Troubleshooting |
| P1-16 | Redis keys at `hermes/nl_gateway.py:7,146,163` have no project prefix | onboarder | Hermes isolation |
| P1-17 | `anomaly_counter.py:790` AnomalyCounter is a global singleton; its 6 prefixes have no tenant isolation | onboarder | Multi-tenant counting |
| P1-18 | `SCAN incident:*` at `incident_service.py:603-615` has no project_id | onboarder | Redis data isolation |
| P1-19 | Contract publish permissions and signing are undefined | critic | contract governance |
| P1-20 | 13 global singletons are shared across tenants (TrustEngine/ProviderRegistry/TelegramGateway/etc.) | onboarder | Multi-tenant isolation |
| P1-21 | Token Budget has no Hard Kill (lesson from the $47k agent loop incident) | web-researcher | Cost control |
| P1-22 | RLS (Row Level Security) is completely absent | db-expert | DB multi-tenancy |
| P1-23 | The Redis key dual-write migration for the GCP Ollama three-tier topology is unplanned (the old `ollama:current_primary` key knows only 1 host) | critic | Ollama failover |
| P1-24 | `decision_manager.py:240` hardcodes `telegram_silence:{target}` instead of importing the gateway constant (defined in two places) | debugger | silence feature |

### P2: Design gaps (must be filled before Phases 5-8)

| # | Issue | Source | Impact |
|---|------|------|---------|
| P2-01 | Telegram/LINE/Slack/API/Internal lack a canonical principal mapping | critic | Identity unification |
| P2-02 | Run FSM has zero implementation (table design only, no state machine code) | fullstack | Phase 4 |
| P2-03 | The EwoooC Provider Proxy cannot just change a URL; it needs a complete envelope+audit entry point | critic | Phase 6 |
| P2-04 | Industry-standard Durable Execution / SAGA compensating transactions are missing | web-researcher | Long-running agent tool chains |
| P2-05 | MCP OAuth 2.1 (RFC 9728 + RFC 7591): no Confused Deputy protection | web-researcher | MCP security |
| P2-06 | Not aligned with OTel GenAI Semantic Conventions (span naming / attribute conventions) | web-researcher | Observability |
| P2-07 | OWASP Agentic AI Top 10 alignment gaps (7 items such as prompt injection and tool misuse) | web-researcher | AI security |
| P2-08 | ISO 42001 AI management system alignment documents are missing | web-researcher | Compliance |
| P2-09 | 7 API endpoints are missing (see the §6 fullstack list) | fullstack | API completeness |
| P2-10 | 9 error codes are missing (see the §7 error code dictionary) | fullstack | Client-side parsing |
| P2-11 | Progressive feedback policy (async runs give no progress notification within ≤30s) | fullstack | UX |
| P2-12 | 8 Operator Console UI modules are completely missing (see §8 frontend) | frontend-designer | Operational visibility |
| P2-13 | The `awooop-ctl` CLI tool is missing (today it is manual kubectl + curl) | tool-expert | Operations experience |
| P2-14 | No OPA/Cedar policy engine (contract authorization logic is currently scattered through the code) | tool-expert | Centralized authorization |
| P2-15 | No chaostoolkit / LitmusChaos (Strangler Fig cutovers have no chaos verification) | tool-expert | Disaster-recovery verification |
| P2-16 | No PgBouncer (the PG connection pool will blow up under multiple AwoooP workers) | tool-expert | DB scalability |

---

## 2. Pre-flight Audit: Phase 0 Complete Checklist

> Phase 0 is entirely docs-only. No runtime code changes.
> Only after it is complete do we open a new Codex conversation and enter Phase 1 code.

### 2.1 AwoooP core ADRs (ADR-111~115)

**Note: ADR-108/109/110 are already taken by incident fingerprint / telegram dedup / GCP Ollama topology, so AwoooP starts at ADR-111.**

| ADR | Topic | Issues addressed | Main content |
|-----|------|---------|---------|
| **ADR-111** | AwoooP Bootstrap Order & Identity Paradox | P0-07, P0-01, P1-01 | Three markers: `platform_internal` / `requires_project_id` / `legacy_awoooi_default`; classification of the 31 background loops; transitional exemption schedule for AWOOOI cron/jobs; platform_resource declaration for the three-tier Ollama GCP failover |
| **ADR-112** | Contract Governance & Publishing Workflow | P1-19 | Who may publish / activate; CODEOWNERS; HMAC signing; approval workflow; activation audit; isolation of draft from published |
| **ADR-113** | Active Revision Invalidation & Outbox | P1-10 | `awooop_contract_outbox` table design; Redis pub/sub notification; worker revision-aware cache; split-brain defense; GCP Ollama topology switch events |
| **ADR-114** | Idempotency, Worker Lease & Run Recovery | P1-11, P1-12 | Channel event dedupe; unique on `(project_id, channel_type, provider_event_id)`; worker `lease_until` / `heartbeat_at` / `attempt_count`; stale run reaper; SKIP LOCKED |
| **ADR-115** | Canonical Principal Mapping & Tenant Onboarding | P2-01, P0-08 | Unified mapping of Telegram/LINE/Slack/API/Internal → `platform_subject`; EwoooC Proxy Adapter; Tsenyang/Bitan templates; fix plan for the `telemetry.py:71` IP assert |

### 2.2 Security hardening ADRs

| ADR | Topic | Issues addressed | Main content |
|-----|------|---------|---------|
| **ADR-116** | AwoooP Security Hardening | P0-04, P0-05, P0-06 | Callback nonce redesign (server_secret must take part in the HMAC); webhook timestamp/nonce to prevent replay; `requires_approval` becomes policy-derived (the LLM may not decide it); approval_token signing spec (HS256, 15min TTL, `jti` uniqueness) |
| **ADR-117** | MCP OAuth 2.1 & Confused Deputy Prevention | P2-05 | RFC 8707 Resource Indicators (RFC 9728 covers Protected Resource Metadata); RFC 7591 Dynamic Client Registration; per-tenant token scope; Confused Deputy protection design; PKCE flow for MCP Server binding |

### 2.3 Database hardening ADRs

| ADR | Topic | Issues addressed | Main content |
|-----|------|---------|---------|
| **ADR-118** | Row-Level Security & Tenant DB Isolation | P1-22 | Enable RLS on all AwoooP tables; inject via `current_setting('app.project_id')`; RLS bypass role design; migration acceptance criteria |
| **ADR-119** | Durable Execution & SAGA Compensation | P2-04 | Step-level journal for multi-step agent tool chains; trigger conditions for compensating transactions; checkpoint/resume design; integration with the Phase 4 run state machine |

### 2.4 Observability & AI security ADRs

| ADR | Topic | Issues addressed | Main content |
|-----|------|---------|---------|
| **ADR-120** | Token Budget Hard Kill | P1-21 | Three-level budget limit: per run / per project / per tenant; hard kill (not just an alert); RCA of the $47k agent loop incident; `budget_ledger` table design; Redis hot counter + PG transactional hard stop |
| **ADR-121** | OTel GenAI Semantic Conventions Alignment | P2-06 | Span naming convention (`gen_ai.request.*`); token count attributes; LLM provider attributes; integration with the existing SigNoz (188:24318); metrics label cardinality rules |
| **ADR-122** | OWASP Agentic AI Top 10 & ISO 42001 Alignment | P2-07, P2-08 | Item-by-item mapping of the Top 10 onto the AwoooP control plane; list of documents required by the ISO 42001 AI management system; per-Phase alignment acceptance items |

### 2.5 Migration Discipline ADRs

| ADR | Topic | Issues addressed | Main content |
|-----|------|---------|---------|
| **ADR-123** | Background Loop project_id Migration Strategy | P0-07, P1-01 | 31 background loops in three classes (platform_internal / legacy_awoooi_default / requires_project_id); migration strategy per class; regression test design; completion criterion (0 unmarked loops in main.py) |
| **ADR-124** | Global Singleton Decomposition for Multi-tenancy | P1-20 | List of the 13 global singletons; decomposition strategy (per-project registry / factory pattern); AWOOOI 1.0 → AwoooP 1.0 migration path; dependency order for those that cannot be split at the same time |

### 2.6 Frontend Operator Console ADRs (new)

| ADR | Topic | Issues addressed | Main content |
|-----|------|---------|---------|
| **ADR-UI-01** | AwoooP Operator Console architecture | P2-12 | Specs for the 8 UI modules; integration with the existing `apps/web/`; multi-tenant view design; i18n (next-intl) conventions |
| **ADR-UI-02** | Contract Lifecycle UI | P2-12 | draft → publish → activate operation flow; revision diff visualization; contract family filtering |
| **ADR-UI-03** | Run State & Shadow Monitoring UI | P2-12 | shadow/canary/active switching dashboard; run FSM visualization; display of quantified Strangler Fig gate metrics |
| **ADR-UI-04** | Tenant Budget & Audit UI | P2-12 | Per-project token budget; hard kill trigger history; audit log query (with redaction masking) |
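As an illustration of the ADR-116 webhook item in §2.2 (timestamp plus nonce to prevent replay), here is a minimal sketch. It is not the actual `webhooks.py` handler: the names `sign`, `ReplayGuard` and `MAX_SKEW_SECONDS` are hypothetical, the ±5 min window follows the Phase 2 work items, and an in-memory dict stands in for a Redis `SET ... NX EX` nonce store.

```python
import hashlib
import hmac
import time
from typing import Dict, Optional

# Hypothetical constant: the ±5 min acceptance window named in ADR-116.
MAX_SKEW_SECONDS = 300


def sign(secret: bytes, timestamp: int, nonce: str, body: bytes) -> str:
    """HMAC-SHA256 over timestamp, nonce and body, so none can be swapped."""
    message = f"{timestamp}.{nonce}.".encode() + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


class ReplayGuard:
    """Checks signature, freshness and nonce uniqueness for one webhook secret.

    Assumption: the dict below replaces Redis; a real deployment would claim
    the nonce with an atomic SET NX plus a TTL instead.
    """

    def __init__(self, secret: bytes) -> None:
        self._secret = secret
        self._seen: Dict[str, float] = {}  # nonce -> expiry time

    def verify(self, timestamp: int, nonce: str, body: bytes,
               signature: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        if abs(now - timestamp) > MAX_SKEW_SECONDS:
            return False  # stale or future-dated request
        expected = sign(self._secret, timestamp, nonce, body)
        if not hmac.compare_digest(expected, signature):
            return False  # tampered payload, swapped nonce, or wrong secret
        # Forget expired nonces, then claim this one (NX semantics).
        self._seen = {n: exp for n, exp in self._seen.items() if exp > now}
        if nonce in self._seen:
            return False  # replay of an already accepted request
        # Keep the nonce for as long as its timestamp could still be accepted.
        self._seen[nonce] = now + 2 * MAX_SKEW_SECONDS
        return True
```

Because the timestamp and nonce are inside the signed message, an attacker who captures a valid request cannot refresh either field without the secret, and an unmodified resend is rejected by the nonce check.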
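The "hard kill, not just an alert" rule of ADR-120 in §2.4 can be sketched as a guard that is consulted before every LLM call. This is only an illustration under stated assumptions: `BudgetGuard` and `BudgetExceeded` are hypothetical names, and a plain in-process counter stands in for the Redis hot counter and the PG transactional hard stop that the ADR describes.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised before an LLM call that would exceed the token budget."""


@dataclass
class BudgetGuard:
    """Per-run token budget; hypothetical stand-in for Redis + PG accounting."""

    limit_tokens: int
    used_tokens: int = 0
    killed: bool = False

    def reserve(self, estimated_tokens: int) -> None:
        """Call before each LLM call; raises instead of merely logging."""
        over = self.used_tokens + estimated_tokens > self.limit_tokens
        if self.killed or over:
            self.killed = True  # sticky: once killed, every later call fails
            raise BudgetExceeded("E-BUDGET-001")
        self.used_tokens += estimated_tokens
```

The kill flag is deliberately sticky, so a looping agent cannot slip a smaller request through after it has already hit the limit.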
### 2.7 ADR-106 補充章節 ADR-106 需新增: - **Strangler Fig Quantified Gates**(量化切換條件) - **GCP Ollama 拓撲影響**(三層 failover 如何成為 `platform_resource`,不屬於任何 tenant) - **Bootstrap Order** 參照 ADR-111 ### 2.8 Inventory 清單(9 份) | Inventory | 位置 | 範圍 | 解決問題 | |-----------|------|------|---------| | **INV-1** | `docs/awooop/inventory/INV-1-redis-keys.md` | 全 codebase grep `redis_client.*\(["']` 等,列出 43+ 個 key、命名空間、TTL、用途、寫入/讀取點、是否硬碼 | P0-01、P1-18 | | **INV-2** | `docs/awooop/inventory/INV-2-repository-project-id-retrofit.md` | 30+ 業務表 × 目前有無 `project_id` × 所有 repository 方法 × 需加 filter 的查詢 × 需 backfill 的歷史資料 | P0-03 | | **INV-3** | `docs/awooop/inventory/INV-3-entrypoints.md` | 所有 cron job / scheduler / webhook / CLI / healthcheck / internal service call,標記三種類型 | P0-07、P1-01 | | **INV-4** | `docs/awooop/inventory/INV-4-hardcoded-namespace-ip.md` | 硬碼 K8s namespace(`awoooi-prod`)、SSH 主機 IP、白名單(**含新 GCP IP:34.143.170.20、34.21.145.224**)| P0-08、P0-13 | | **INV-5** | `docs/awooop/inventory/INV-5-migration-compatibility-matrix.md` | 版本相容矩陣:SQLAlchemy 1.x→2.x / Alembic / Pydantic v1→v2 / FastAPI 0.x / Python 3.10→3.12;每個 breaking change + 影響範圍 | critic | | **INV-6** | `docs/awooop/inventory/INV-6-rollback-playbook-register.md` | 6 個 rollback playbook:Phase 1 schema rollback、Phase 2 Redis key rollback、Phase 5 MCP Gateway rollback、Phase 6 EwoooC rollback、Ollama GCP→Local fallback rollback、approval flow rollback | migration | | **INV-7** | `docs/awooop/inventory/INV-7-pr-cutting-plan.md` | 11 個 PR 切割方案(refactor-specialist 設計):每 PR 的範圍、前置依賴、review 者、合併順序 | refactor | | **INV-8** | `docs/awooop/inventory/INV-8-background-loop-catalog.md` | 31 個 background loop 逐一列出:名稱、位置(main.py 行號)、類別標記、遷移策略、預計完成 Phase | onboarder | | **INV-9** | `docs/awooop/inventory/INV-9-global-singleton-catalog.md` | 13 個全域單例逐一列出:名稱、位置、依賴方、分解策略、遷移風險 | onboarder | ### 2.9 Phase 0 驗收標準 - [ ] ADR-111~115(5 份 AwoooP 核心 ADR)全部 Accepted - [ ] ADR-116~124(9 份強化 ADR)全部 Accepted - [ ] ADR-UI-01~04(4 份 UI ADR)全部 Accepted(或 Proposed 
+ 統帥批准開工) - [ ] ADR-106 補入 Strangler Fig Quantified Gates + GCP Ollama 章節 - [ ] INV-1~INV-9(9 份 Inventory)完成初稿 - [ ] 無任何 runtime code 變動 - [ ] `git diff --check` 通過 --- ## 3. 8-Phase 詳細工作項 > 每項含:目標、範圍(精確路徑)、輸入(前置依賴)、輸出(交付物)、驗收標準、邊界(禁止碰什麼) ### Phase 1 — Control Plane Schema Foundation **目標**:建立 PostgreSQL contract control plane 最小可用骨架,修正舊 SQL migration 三大 blocker,決定高流量表 partition 策略。 **前置依賴**:Phase 0 全部完成(所有 ADR + Inventory) **範圍(精確檔案)**: - `apps/api/migrations/` — 新增 migration files - `apps/api/src/models/` — 新增 AwoooP SQLAlchemy models - `apps/api/src/repositories/` — 新增 AwoooP repositories - `docs/runbooks/` — 新增 partition + retention runbook **禁止碰**: - 任何既有 repository 方法(留給 Phase 2) - provider 行為(`ai_router.py` / `ollama_*.py`) - Telegram/LINE webhook 路徑 - `apps/web/` - 任何 K8s manifest **工作項(順序執行)**: ``` 1.1 表名核對 - grep 確認 `incidents`(非 incident_records) - grep 確認 `mcp_audit_log`(非 mcp_audit_snapshots) - 修正 ORM: SQLAlchemy 2.x mapped_column、補齊 Numeric/UniqueConstraint/func import - 每個 migration 強制有 down migration(rollback SQL) 1.2 Task 9 順序修正(必須 Phase 1.1 之前完成) - Dockerfile: agent_loader default path 指向 ConfigMap mount - ConfigMap 預載: 確認 agent prompt 路徑在 ConfigMap 已存在 - 驗收:dry-run 一個 agent loader,輸出非 None 1.3 AwoooP 控制面表(新增 migration) - awooop_projects(tenant 主表,project_id VARCHAR PK,budget,ACL) - awooop_contract_revisions(六合約共用 revision 表,append-only,見 §4.1 完整欄位) - awooop_active_revisions(active pointer,指向特定 revision_id) - awooop_artifact_refs(prompt/schema/eval ref + sha256 + type) - awooop_project_migration_state(Strangler Fig 階段追蹤,per project × per capability) - awooop_contract_outbox(ADR-113,active revision 切換事件,for worker invalidation) - awooop_channel_event_dedupe(ADR-114,idempotency,唯一鍵) - awooop_platform_subjects(ADR-115,canonical principal mapping) - awooop_budget_ledger(ADR-120,token budget,per project × per period) 1.4 高流量表(在 Phase 4/7 建立時已決定 partition,此時寫規則) - 須在本 Phase migration 中加 partition template comment(不執行,留 Phase 4) - awooop_run_state → 
range partition by created_at(月) - awooop_channel_event → range partition by created_at(月) - awooop_mcp_gateway_audit → range partition by created_at(月) - awooop_agent_audit_log → range partition by created_at(月) - retention: 90 天 hot + 1 年 warm(pg_partman / cron job) - 寫進 docs/runbooks/awooop-partition-retention.md 1.5 AWOOOI Bootstrap(seed data) - INSERT INTO awooop_projects(project_id='awoooi', display_name='AWOOOI', migration_mode='legacy_awoooi_default') - 驗收:AWOOOI 0 行為改動 1.6 RLS 骨架(ADR-118) - 所有 awooop_* 表啟用 RLS - policy: USING (project_id = current_setting('app.project_id', TRUE)) - bypass role: awooop_platform(只給 platform worker 用) - 注意:RLS 需要 migration + 測試,不只是 ALTER TABLE ENABLE ROW LEVEL SECURITY 1.7 Immutability 測試 - published contract revision 嘗試 UPDATE → 必失敗(trigger 或 check constraint) - draft 與 active 隔離:runtime 讀取 view 不含 draft - 自動化:pytest + db-expert review ``` **RACI**: - R(執行):fullstack-engineer - A(負責):db-expert review,統帥批准 - C(諮詢):refactor-specialist(migration PR 切割)、critic(最終 review) - I(通知):migration-engineer(版本相容驗證) **DoD**: - 所有 migration up/down dry-run 通過 - AWOOOI 可表示為 `project_id=awoooi`,0 行為改動 - RLS 測試:cross-project SELECT 被拒絕 - partition runbook 已建立 --- ### Phase 2 — Tenant Isolation & Namespace Hardening **目標**:在開放任何下游 tenant 之前,把 AWOOOI 自己變成乾淨的 tenant。 **前置**:Phase 1 完成 **範圍**: - `apps/api/src/services/` — Redis key 遷移(依 INV-1) - `apps/api/src/repositories/` — 加 project_id filter(依 INV-2) - `apps/api/src/services/security_interceptor.py` — nonce 修補(P0-05,ADR-116) - `apps/api/src/api/v1/webhooks.py` — replay 防護(P0-06,ADR-116) - `apps/api/src/core/telemetry.py:71` — 移除硬碼 IP assert(P0-08) - `apps/api/src/services/decision_manager.py:240` — silence key 常數化(P1-24) - `apps/api/src/services/ollama_auto_recovery.py:230` — 移除第二定義(P0-11) - `apps/api/src/plugins/mcp/mcp_bridge.py:592-681` — namespace 動態化(P0-13) - `apps/api/src/services/consensus_engine.py` — CONSENSUS_PREFIX 加 project 前綴(P0-12) - `apps/api/src/hermes/nl_gateway.py` — Redis 
key 加 project 前綴(P1-16) - `apps/api/src/services/anomaly_counter.py:790` — per-project 改造(P1-17) - `apps/api/src/services/incident_service.py:603` — SCAN 加 prefix(P1-18) **禁止碰**: - `awooop_contract_revisions` 以外的 AwoooP Phase 1 新表結構 - EwoooC / Tsenyang 任何接入(留 Phase 6) - 任何 provider routing 改動(Ollama GCP 拓撲已由 ADR-110 定案,不在此 Phase 改) **工作項**: ``` 2.1 Redis 三階段雙寫遷移計畫執行(依 INV-1,分三批) 批次 A(Critical Path,影響 Ollama GCP 拓撲): - ollama:current_primary(舊)→ {project_id}:ollama:primary(新) 注意:要同時支援三層 GCP-A/GCP-B/Local,INV-1 需確認所有寫入點 - ollama_auto_recovery.py:230 第二定義刪除,統一常數 批次 B(費用 + 告警關鍵): - ai_rate:total_cost:gemini → {project_id}:ai_rate:total_cost:gemini - telegram:polling:leader → platform:telegram:polling:leader(platform_resource) - telegram_silence:{target} → {project_id}:telegram_silence:{target} 同步更新 decision_manager.py:240 import gateway 常數 批次 C(working memory): - consensus: → {project_id}:consensus:(consensus_engine.py) - hermes Redis keys(nl_gateway.py) - anomaly_counter 6 個 prefix - incident:* SCAN(incident_service.py:603) 每批次:Phase A(雙寫 30 天)→ Phase B(雙讀 14 天)→ Phase C(移除舊 key) 2.2 Security hardening(ADR-116) - telemetry.py:71:移除 "192.168.0.188" 硬碼 assert,改為 config-driven allowed endpoints - security_interceptor.py:451-490:nonce 重設計,server_secret 必參與 HMAC - webhooks.py:679-728:加 timestamp(±5min window)+ nonce(Redis dedup) - requires_approval:改為從 policy contract 讀取,禁止 LLM output 決定 - approval_token:HS256,15min TTL,jti 唯一性(Redis NX) 2.3 Repository project_id 改造(依 INV-2) - 所有 30+ repository 方法加 project_id filter - K8s namespace 白名單 → tenant-aware(mcp_bridge.py:592-681 動態化) - SSH 主機白名單 → tenant-aware(依 INV-4) 2.4 Background loop 標記(依 ADR-123,INV-3/INV-8) - 31 個 loop 標記為 platform_internal / legacy_awoooi_default / requires_project_id - platform_internal 帶 project_id=__platform__ - legacy_awoooi_default fallback 到 project_id=awoooi,寫退場時程 2.5 Global singleton 分解第一步(依 ADR-124,INV-9) - 只做:AnomalyCounter(P1-17 已修)per-project 改造 - 其餘 13 個全域單例列出退場時程(不在此 Phase 全拆,防爆炸半徑) 2.6 
Token Budget Hard Kill 基礎(ADR-120) - budget_ledger 表 migration(Phase 1 已建,此 Phase 寫入邏輯) - 每 LLM call 前:check budget → hard kill if exceeded(不只 log) - Redis hot counter + PG 事務 hard stop ``` **RACI**: - R:fullstack-engineer + refactor-specialist(大量 repository 改動) - A:db-expert(repository 改動 review)、vuln-verifier(security hardening PoC 驗證) - C:critic(整體 diff review)、migration-engineer(相容性確認) - I:tool-expert(K8s namespace 改動相關) **DoD**: - INV-1 所有 P0 key 完成三階段遷移(Phase A 完成,Phase B/C 在觀察期) - cross-project test 全紅(pytest 覆蓋) - `grep -r "awoooi-prod" apps/api/src/` 結果為 0 - `grep -r "192.168.0.188" apps/api/src/` telemetry assert 消失 - vuln-verifier PoC 重跑:P0-05 nonce 偽造失敗、P0-06 webhook replay 失敗 - Budget hard kill 測試:超額後 LLM call 被拒絕 --- ### Phase 3 — Contract Packages & Validators **目標**:六合約從散文升級為可驗證程式。 **前置**:Phase 1 完成(contract_revisions 表存在) **範圍**: - `packages/awooop-contracts/`(此時才建立!) - `apps/api/src/services/contract_service.py`(新建) - `apps/api/src/repositories/contract_repository.py`(新建) **禁止碰**: - 任何既有 provider / router / telegram 路徑 - `apps/web/`(UI 留 Phase 8 之後) **工作項**: ``` 3.1 建立 packages/awooop-contracts/(此時才有真實內容) - 六合約 JSON Schema(Project/Tenant、Agent、MCP Gateway、Policy/Routing、Run State、Channel Event) - Pydantic v2 models 對應六合約 - envelope schema:platform invocation、MCP tool call、run state transition、channel event - golden fixtures(valid × 6 + invalid × 6) 3.2 Contract lifecycle service - draft():建立 draft revision,不可被 runtime 讀 - publish():產生 immutable published revision(body_hash = sha256(body_json)) - activate():更新 active pointer,寫入 contract_outbox(ADR-113) - get_active():runtime 讀取路徑,只返回 published + active - 全部操作記錄 audit log 3.3 Output schema validator middleware - LLM 回傳 → 過 schema validator → 失敗 → retry(上限 3 次)→ 失敗 → error code(E-SCHEMA-001) - 任何 schema 不符的 LLM 輸出無法到達 channel adapter 3.4 Contract governance(ADR-112) - CODEOWNERS 指定 packages/awooop-contracts/ - publish API:HMAC 簽章驗證 - activate API:approval workflow(multi_sig_redis 路徑) 3.5 SHA-256 
artifact 驗證 - 所有 artifact ref 含 sha256 - runtime 讀取時驗 hash(與 DB 記錄比對) ``` **DoD**: - schema 不符的 LLM 輸出無法到達 channel adapter(整合測試) - AWOOOI 第一份 Agent contract 可 publish + activate(E2E) - prompt/schema ref 必含 sha256 --- ### Phase 4 — Platform Shell in Shadow Mode **目標**:建立第一個 runtime shell,只跑 shadow,不改 legacy 行為。 **前置**:Phase 3 完成 **範圍**: - `apps/api/src/api/v1/platform/` — 新增 platform runs API - `apps/api/src/services/platform_runtime.py` — 新建 - `apps/api/src/services/run_state_machine.py` — Run FSM 實作(P2-02) - `apps/api/src/workers/platform_worker.py` — 新建 - `apps/api/src/services/audit_sink.py` — 加 redaction(P1-08) **禁止碰**: - 任何既有 `/v1/incidents/`、`/v1/webhooks/` 路徑 - Telegram bot handler(legacy 維持) - EwoooC 接入(留 Phase 6) **工作項**: ``` 4.1 Run API shell(shadow only) - POST /v1/platform/runs - 生成 run_id(UUID v7)、trace_id(W3C traceparent compatible) - 解析 project + agent contract active revision - 解析 EffectivePolicy(6 層合併,不改 provider 行為) 4.2 Run State Machine(ADR-114 + ADR-119) - States: PENDING → RUNNING → WAITING_TOOL → WAITING_APPROVAL → COMPLETED / FAILED / CANCELLED - lease_until、heartbeat_at、attempt_count 欄位 - SKIP LOCKED 取單(防 double-pickup) - stale run reaper(每分鐘掃 expired lease,回到 PENDING 或 FAILED) - SAGA step journal(ADR-119):每個 tool call 寫入 step_id、補償指令 4.3 Idempotency(ADR-114) - (project_id, channel_type, provider_event_id) 複合 unique - 重複事件 return 既有 run_id(不產生新 run) - Redis NX + PG constraint 雙層保護 4.4 Audit log redaction(ADR-116) - audit_sink 寫入前過 sanitization_service pipeline - PII / secret pattern 硬攔(含 GCP IP、PG password、Telegram token 等) - audit log 不記錄 raw LLM input/output,只記 hash + schema validation result 4.5 Observability(ADR-121) - OTel GenAI span 命名(gen_ai.request.*) - token 計數 attribute(gen_ai.usage.prompt_tokens 等) - metrics label:只 project_id / agent_id / status / provider(禁止 run_id/trace_id/session_id 進 metrics) - run_id / trace_id 只進 logs/traces(不進 metrics) 4.6 Shadow mode wiring - 選定 3 個 AWOOOI 事件 mirror 到 shadow(不發 user response) - shadow run 
0 destructive tool call(MCP write/execute 全 block) 4.7 Token Budget Hard Kill(ADR-120) - per-run token budget(from EffectivePolicy) - 超額 → hard kill → FAILED state → error code E-BUDGET-001 - 每 run 完成後寫入 budget_ledger(實際消耗) ``` **RACI**: - R:fullstack-engineer(API + service)、db-expert(run state schema review) - A:critic(shadow mode 設計 review)、vuln-verifier(redaction PoC) - C:debugger(trace_id 貫穿設計)、tool-expert(OTel 整合) - I:migration-engineer(worker lease 相容性) **DoD**: - shadow run 0 user-visible response、0 destructive tool call(vuln-verifier 驗證) - legacy AWOOOI 行為 0 改變(回歸測試通過) - worker crash 後 stale run 1 分鐘內被回收(自動化測試) - duplicate event 不產生重複 run(idempotency 測試) - audit log 0 secret 命中(vuln-verifier 抽樣 100 筆) - token budget 超額觸發 hard kill(整合測試) --- ### Phase 5 — MCP Gateway First Slice **目標**:tool 授權搬到 Gateway,read-only 工具先進,解決 sanitization enforcement。 **前置**:Phase 4 完成 **範圍**: - `apps/api/src/plugins/mcp/gateway.py` — 新建 MCP Gateway - `apps/api/src/plugins/mcp/registry.py:24-71` — `_provider` → `__provider`(P1-05) - `apps/api/src/plugins/mcp/mcp_bridge.py` — 接入 Gateway - `apps/api/src/services/sanitization_service.py` — enforcement point(P1-09) **禁止碰**: - MCP write/execute tools(寫/執行工具留 Phase 8) - Telegram approval flow(改動在 Phase 8) **工作項**: ``` 5.1 MCP Gateway 表 - awooop_mcp_tool_registry(tool_id, project_id, agent_id, tool_type, allowed_scopes) - awooop_mcp_grants(grant_id, project_id, agent_id, tool_id, granted_by, expires_at) - awooop_mcp_credential_refs(ref_id, tool_id, k8s_secret_ref, sha256) - awooop_mcp_gateway_audit(call_id, trace_id, run_id, tool_id, credential_ref, latency_ms, result_status) 5.2 Five-gate enforcement - Check: Project AND Agent AND Tool AND Environment AND Approval - 任一不符 → 拒絕 + 記錄 audit + error code E-MCP-GATE-XXX 5.3 Result sanitization enforcement(P1-04、P1-09) - 所有 MCP tool result 必經 sanitization_service pipeline - MCP Gateway 加 sanitization middleware(不允許 raw result 直接進 LLM context) - 進 LLM 前一層 + 進 audit sink 一層(雙層 redaction) - sast 
掃描 agent 程式碼路徑:0 raw credential 接觸 5.4 _provider 修正(P1-05) - registry.py: _provider → __provider(雙底線 Python name mangling) - 加 unit test:外部 reflect 取用 → AttributeError 5.5 Credential isolation - agent 程式碼不直接存取 K8s Secret - Gateway 解析 credential_ref → 回傳 masked result(token 替換) - 2026-04-18 secret leak 重演測試:kubectl describe 輸出不出現在 LLM context 5.6 MCP OAuth 2.1(ADR-117) - 實作 per-tenant dynamic client registration(RFC 7591) - Resource Indicators(RFC 9728)防 Confused Deputy - PKCE flow for MCP Server binding ``` **RACI**: - R:fullstack-engineer(Gateway service) - A:vuln-verifier(credential isolation 驗證)、critic(架構 review) - C:tool-expert(MCP spec 確認)、db-expert(Gateway 表設計 review) - I:migration-engineer(MCP registry 相容性) **DoD**: - 2026-04-18 secret leak 重演測試通過(kubectl describe 輸出不出現在 LLM context 或 audit row) - sast 掃描:agent 程式碼路徑 0 raw credential 接觸 - `__provider` 雙底線 unit test 通過 - Five-gate 全部 integration test 覆蓋 --- ### Phase 6 — EwoooC Read-Only Tenant Onboarding **目標**:以真實下游 tenant 驗證 AwoooP,全 read-only。 **前置**:Phase 5 完成、telemetry.py:71 hardcoded IP assert 已移除(Phase 2 完成) **範圍**: - `apps/api/src/` — EwoooC project provisioning - `packages/awooop-contracts/` — EwoooC agent contract - `apps/api/src/services/provider_proxy.py` — 新建 Provider Proxy Adapter(P1-02) **禁止碰**: - AWOOOI 任何既有業務邏輯 - MCP write/execute tools **工作項**: ``` 6.1 EwoooC project provisioning - INSERT INTO awooop_projects(project_id='ewoooc', ...) 
- 不可讀 AWOOOI data(RLS 驗證) 6.2 openclaw-biz agent contract - 針對市場情報 domain 設計 I/O schema - 安全 ceiling:read-only only,禁止 infra tool 6.3 Provider Proxy Adapter(P1-02,ADR-115) - 不只是改 OLLAMA_API_BASE - Proxy 入口強制注入 envelope:project_id / agent_id / trace_id / run_id - 過 EffectivePolicy + budget guard + audit - GCP Ollama 三層拓撲:EwoooC 走相同 primary/secondary/fallback 路由 - read-only / model-call 入口優先啟用 6.4 Market intelligence MCP tools 註冊 - 4 個 read-only tools:market_data_fetch、product_catalog_query、competitor_analysis、trend_report - 全部在 MCP Gateway 五重 gate 管控 6.5 Shadow → Canary 升級計畫 - 先 14 天 shadow(Strangler Fig gate 量化) - 符合條件後升 canary(selected responses) - canary 通過再升 read_only ``` **RACI**: - R:fullstack-engineer - A:critic(EwoooC 資料隔離 review)、vuln-verifier(cross-tenant isolation PoC) - C:db-expert(RLS 驗證)、migration-engineer(EwoooC rollback playbook,INV-6) - I:tool-expert(GCP Ollama 拓撲 EwoooC 路由設定) **DoD**: - EwoooC SELECT 無法讀到 AWOOOI data(RLS + cross-tenant pytest) - Provider Proxy Adapter E2E 測試:envelope 正確注入 - budget / audit 完全 project-scoped - EwoooC 啟動時 telemetry.py 不再因 IP assert 失敗 --- ### Phase 7 — Communication Hub Increment **目標**:標準化 channel 但不切斷既有 bot。 **前置**:Phase 6 完成 **範圍**: - `apps/api/src/services/channel_hub.py` — 新建 - `apps/api/src/services/telegram_gateway.py` — mirror inbound events - `apps/api/src/api/v1/platform/channel.py` — 新建 **禁止碰**: - 既有 telegram bot handler(維持 legacy 權威,直到 canary 量化 gate 通過) - LINE / Slack 接入(留 v2) **工作項**: ``` 7.1 awooop_conversation_event + awooop_outbound_message 表 - partition by created_at(月,Phase 1 已定策略) - retention policy 配置 7.2 Telegram inbound mirror - 現有 telegram_gateway.py 事件複製到 awooop_conversation_event - canonical principal mapping(ADR-115):所有 sender 寫入 awooop_platform_subjects 7.3 Progressive Feedback Policy(P2-11) - WAITING_TOOL / RUNNING / WAITING_APPROVAL → 必發 Telegram 暫態訊息 - 用 edit_message 更新(非新訊息,不觸發通知) - 首則進度訊息 ≤ 30s 7.4 Idempotency 驗證(已由 Phase 4 完成) - duplicate channel retry 不產生 duplicate run(整合測試) 7.5 
Adapter-level 安全 - 所有 channel adapter:escaping + redaction + idempotency + delivery audit - channel adapter 0 LLM 呼叫、0 MCP 呼叫(pytest 覆蓋) 7.6 量化 gate 監控儀表板(配合 ADR-UI-03) - Strangler Fig gate 指標:decision divergence / p95 latency / error rate - 供 Phase 8 升級決策用 ``` **RACI**: - R:fullstack-engineer(API + channel hub) - A:critic(channel 設計 review)、debugger(trace_id 貫穿驗證) - C:frontend-designer(進度訊息 UX)、tool-expert(Telegram API 規格確認) - I:migration-engineer(channel 相容性) **DoD**: - channel adapter 0 LLM 呼叫、0 MCP 呼叫 - async run 首則進度訊息 ≤ 30s - duplicate retry 不產生 duplicate run --- ### Phase 8 — Suggest & Controlled Write Paths **目標**:從 read-only 升級到 propose,再到 controlled execute。 **前置**:Phase 7 完成 + Strangler Fig shadow→canary gate 全通過 **範圍**: - `apps/api/src/services/multi_sig_redis.py` — approval token 簽章(P1-06) - `apps/api/src/services/approval_timeout_resolver.py` — 加 trace_id(P1-15) - `apps/api/src/api/v1/platform/suggest.py` — suggest mode endpoint - Feature flags for write/execute paths **禁止碰**: - 任何 write/execute tool 的預設啟用 - Strangler Fig 量化 gate 通過前不做 auto_remediate **工作項**: ``` 8.1 Approval Token 安全強化(P1-06,ADR-116) - WAITING_APPROVAL resume API:強制驗 approval_token(HS256,15min TTL,jti Redis NX) - approval state:PG 為 source of truth,Redis 為 cache - 過期 / 已決 / 重放 → 全部拒絕 + error code E-APPROVAL-XXX 8.2 multi_sig_redis.py + approval_timeout_resolver.py trace_id 補入 - 所有 approval 操作加 trace_id(P1-15) - 完整鏈路可追蹤(debugger 驗證) 8.3 Suggest mode for AWOOOI SRE flows - 選定低風險 3 個 SRE flow(e.g., 告警靜音建議、playbook 推薦) - suggest 模式:AI 輸出建議,人工決定執行 - 量化 gate(ADR-106 補章): * shadow → canary:≥14 天 + divergence <5% + p95 <10% + 0 P1 incident * canary → read_only:≥7 天 + error rate <0.5% + cost diff <50% * read_only → suggest:≥14 天 + accept rate ≥50% + 0 hallucination escalation * suggest → auto_remediate:≥30 天 + rollback evidence ≥3 次 + approval token live + dry-run ≥99% 8.4 Dry-run 與 rollback evidence gate - 每個 write/execute tool 必須有 dry-run mode - rollback playbook 寫入 INV-6(Phase 0 
已完成,此時執行驗證) - 記錄每次 rollback 結果作為 Phase 8 gate evidence 8.5 Feature Flag Registry(見 §10) - suggest mode:feature flag AWOOOP_SUGGEST_MODE(default OFF) - controlled write:feature flag AWOOOP_WRITE_MODE(default OFF) - 需顯式 flip 才啟用,不能環境變數意外帶入 8.6 vuln-verifier PoC 驗收 - WAITING_APPROVAL 無 token resume 必失敗 - Redis 宕機時 approval 仍可從 PG 恢復 ``` **RACI**: - R:fullstack-engineer - A:vuln-verifier(approval security PoC)、critic(write path review) - C:debugger(trace_id 驗證)、db-expert(approval state PG review) - I:migration-engineer(feature flag rollback) **DoD**: - WAITING_APPROVAL 無 token resume 被拒絕(vuln-verifier PoC 通過) - Redis 宕機後 approval 從 PG 恢復(整合測試) - write/execute 預設 OFF,feature flag 手動 flip 才啟用 - 所有 Strangler Fig gate 量化驗收通過(critic + db-expert + vuln-verifier 三方簽核) --- ## 4. 資料庫詳細 Schema ### 4.1 awooop_contract_revisions(六合約共用 revision 表) ```sql CREATE TABLE awooop_contract_revisions ( revision_id UUID PRIMARY KEY DEFAULT gen_random_uuid(), project_id VARCHAR(64) NOT NULL REFERENCES awooop_projects(project_id), contract_family VARCHAR(32) NOT NULL -- project_tenant/agent/mcp_gateway/policy_routing/run_state/channel_event contract_id VARCHAR(128) NOT NULL, version VARCHAR(32) NOT NULL, lifecycle_status VARCHAR(16) NOT NULL DEFAULT 'draft', -- draft/published/superseded/revoked body_json JSONB NOT NULL, body_schema_version VARCHAR(32) NOT NULL, body_hash CHAR(64) NOT NULL, -- SHA-256 hex created_by VARCHAR(128) NOT NULL, created_at TIMESTAMPTZ NOT NULL DEFAULT now(), published_at TIMESTAMPTZ, supersedes_revision_id UUID REFERENCES awooop_contract_revisions(revision_id), -- Immutability constraint CONSTRAINT published_body_immutable CHECK ( lifecycle_status = 'draft' OR body_json IS NOT NULL ) ); -- Runtime reads view(只看 published/active,不看 draft) CREATE VIEW awooop_published_revisions AS SELECT * FROM awooop_contract_revisions WHERE lifecycle_status IN ('published', 'superseded'); -- Append-only trigger CREATE OR REPLACE FUNCTION prevent_revision_update() RETURNS TRIGGER AS 
$$
BEGIN
    -- Any row that has left the 'draft' state is frozen
    IF OLD.lifecycle_status != 'draft' THEN
        RAISE EXCEPTION 'Published contract revision is immutable';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER enforce_revision_immutability
BEFORE UPDATE ON awooop_contract_revisions
FOR EACH ROW EXECUTE FUNCTION prevent_revision_update();

-- RLS
ALTER TABLE awooop_contract_revisions ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON awooop_contract_revisions
    USING (project_id = current_setting('app.project_id', TRUE)
           OR current_user = 'awooop_platform');
```

### 4.2 awooop_run_state (with lease + SAGA journal)

```sql
CREATE TABLE awooop_run_state (
    run_id              UUID NOT NULL DEFAULT gen_random_uuid(),
    project_id          VARCHAR(64) NOT NULL,
    agent_id            VARCHAR(128) NOT NULL,
    trace_id            CHAR(32),  -- W3C trace_id hex
    parent_run_id       UUID,
    status              VARCHAR(32) NOT NULL DEFAULT 'PENDING',
    migration_mode      VARCHAR(32) NOT NULL DEFAULT 'shadow',  -- shadow/canary/read_only/suggest/auto_remediate
    -- Worker lease
    lease_until         TIMESTAMPTZ,
    heartbeat_at        TIMESTAMPTZ,
    attempt_count       INT NOT NULL DEFAULT 0,
    worker_id           VARCHAR(128),
    -- Token budget
    budget_limit_tokens BIGINT,
    tokens_used         BIGINT NOT NULL DEFAULT 0,
    -- Timestamps
    created_at          TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at          TIMESTAMPTZ NOT NULL DEFAULT now(),
    completed_at        TIMESTAMPTZ,
    -- SAGA journal (step-level)
    saga_steps          JSONB DEFAULT '[]',  -- [{step_id, tool, status, compensation_cmd, completed_at}]
    -- Metadata
    input_hash          CHAR(64),  -- SHA-256 of input envelope (for audit)
    effective_policy_revision_id UUID,
    -- PostgreSQL requires the primary key of a partitioned table to include the partition key
    PRIMARY KEY (run_id, created_at)
) PARTITION BY RANGE (created_at);

-- Per-project RLS
ALTER TABLE awooop_run_state ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON awooop_run_state
    USING (project_id = current_setting('app.project_id', TRUE)
           OR current_user = 'awooop_platform');
```

### 4.3 awooop_budget_ledger (Token Budget Hard Kill)

```sql
CREATE TABLE awooop_budget_ledger (
    ledger_id     UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    project_id    VARCHAR(64) NOT NULL,
    period        DATE NOT NULL, --
    -- (period format: YYYY-MM-DD, first day of the month)
    provider      VARCHAR(32) NOT NULL,
    tokens_input  BIGINT NOT NULL DEFAULT 0,
    tokens_output BIGINT NOT NULL DEFAULT 0,
    cost_usd      NUMERIC(12, 6) NOT NULL DEFAULT 0,
    hard_kill_at  NUMERIC(12, 6),  -- NULL = no limit
    hard_killed   BOOLEAN NOT NULL DEFAULT FALSE,
    last_run_id   UUID,
    updated_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE(project_id, period, provider)
);
```

### 4.4 Eight groups of new/extended tables (db-expert findings)

| Table | Missing columns / missing indexes | Phase |
|------|----------------------|-------|
| `incidents` | Add `project_id`, `trace_id`, `awooop_run_id` | Phase 2 |
| `playbooks` | Add `project_id`, `agent_id` | Phase 2 |
| `km_entries` | Add `project_id`, `namespace` | Phase 2 |
| `mcp_audit_log` | Add `trace_id`, `run_id`, `project_id`; add index on (run_id) | Phase 2 |
| `ai_decisions` | Add `project_id`, `run_id`; add index on (run_id) | Phase 2 |
| `approval_records` | Add `trace_id`, `approval_token_jti`; add index on (jti) | Phase 2/8 |
| `telegram_events` | Add `project_id`, `platform_subject_id` | Phase 7 |
| `ollama_health_checks` | Add `host_tier` (gcp_a/gcp_b/local), `project_id=__platform__` | Phase 2 |

---

## 5. Security Remediation Plan (vuln-verifier acceptance)

### 5.1 Three PoC-confirmed vulnerabilities

| Vulnerability | Location | PoC status | Fix | Phase |
|------|------|---------|---------|-------|
| Nonce forgery (server nonce does not depend on server_secret) | security_interceptor.py:451-490 | **PoC confirmed: passes verification** | HMAC(server_secret + nonce), with server_secret injected from a K8s Secret | Phase 2 |
| Webhook replay (no timestamp/nonce) | webhooks.py:679-728 | **PoC confirmed: replayable** | Add timestamp (±5min) + nonce via Redis NX | Phase 2 |
| requires_approval decided by LLM output | decision_manager.py (approval chain) | **PoC confirmed: bypassable** | Decided by the policy contract; LLM output must not influence it | Phase 2 |

### 5.2 approval_token specification

```
Signature algorithm: HS256
Payload:
  - jti: UUID (uniqueness; Redis NX, 15min TTL)
  - iss: "awooop-platform"
  - sub: "{project_id}:{run_id}"
  - aud: "awooop-approval"
  - exp: now + 15min
  - approval_type: "human" | "system"
  - decision_scope: [tool_id, ...]

Verification:
  1. Verify the signature
  2. exp has not passed
  3. Redis NX confirms the jti is unused (replay protection)
  4. sub matches the run_id being resumed
  5.
     decision_scope matches the run's tool
```

### 5.3 vuln-verifier acceptance checklist per Phase

- Phase 2: nonce forgery fails, webhook replay fails, requires_approval cannot be decided by the LLM
- Phase 4: audit log has 0 secret hits (sample of 100 entries)
- Phase 5: agent code paths contain 0 raw credentials (SAST)
- Phase 6: cross-tenant isolation PoC (EwoooC cannot read AWOOOI)
- Phase 8: resuming without an approval token is rejected; approval is recovered from PG after a Redis outage

---

## 6. Complete API Endpoint List (fullstack additions)

### 6.1 Existing (unchanged)

- `POST /v1/webhooks/telegram`
- `POST /v1/webhooks/alertmanager`
- `GET /v1/incidents/`
- `POST /v1/decisions/`

### 6.2 New in Phase 4 (Platform Shell)

- `POST /v1/platform/runs`: create a run (async)
- `GET /v1/platform/runs/{run_id}`: query run state
- `GET /v1/platform/runs/{run_id}/steps`: query SAGA steps
- `POST /v1/platform/runs/{run_id}/cancel`: cancel a run

### 6.3 New in Phase 4-5 (Approval)

- `POST /v1/platform/runs/{run_id}/approve`: resume with an approval_token
- `POST /v1/platform/runs/{run_id}/reject`: reject (with a reason)

### 6.4 New in Phase 6 (Tenant)

- `POST /v1/platform/projects`: create a project (admin only)
- `GET /v1/platform/projects/{project_id}/migration_state`: query Strangler Fig state
- `POST /v1/platform/projects/{project_id}/contracts`: create a contract draft
- `POST /v1/platform/projects/{project_id}/contracts/{contract_id}/publish`: publish
- `POST /v1/platform/projects/{project_id}/contracts/{contract_id}/activate`: activate

### 6.5 New in Phase 7 (Channel Hub)

- `GET /v1/platform/channel_events`: query conversation events (with pagination)
- `POST /v1/platform/outbound`: send an outbound message (admin/test)

---

## 7.
Error Code Dictionary (9 required additions)

| Error Code | HTTP Status | Description | Scenario |
|------------|-------------|------|------|
| `E-SCHEMA-001` | 422 | LLM output schema validation failed | Phase 3 contract validator |
| `E-BUDGET-001` | 429 | Token budget hard kill triggered | Phase 4 budget guard |
| `E-APPROVAL-001` | 401 | approval_token missing or invalid | Phase 8 approval resume |
| `E-APPROVAL-002` | 401 | approval_token expired | Phase 8 |
| `E-APPROVAL-003` | 409 | approval_token already used (replay) | Phase 8 |
| `E-MCP-GATE-001` | 403 | MCP tool not authorized for this project | Phase 5 |
| `E-MCP-GATE-002` | 403 | MCP tool not authorized for this agent | Phase 5 |
| `E-MCP-GATE-003` | 403 | MCP write/execute tool blocked (not in auto_remediate mode) | Phase 5/8 |
| `E-TENANT-001` | 403 | Cross-tenant data access blocked | Phase 2+ |
| `E-IDEMPOTENT-001` | 200 | Duplicate event, returning existing run_id | Phase 4 |
| `E-RATE-001` | 429 | Project rate limit exceeded | Phase 2+ |
| `E-SAGA-001` | 500 | SAGA compensation failed, manual intervention required | Phase 4/ADR-119 |

---

## 8.
Frontend Operator Console (frontend-designer, 8 modules)

> Implemented after Phase 8 (the Operator Console may be prototyped in Phase 6).
> ADR-UI-01~04 define the architecture; this section is the work item list.

| Module | Description | Priority |
|------|------|---------|
| **Tenant Management** | Project list, creation, migration_state visualization, budget settings | P1 (Phase 6 prototype) |
| **Contract Lifecycle** | draft/publish/activate operations, revision diff, filtering by the six contract families | P1 (Phase 6 prototype) |
| **Run Monitor** | Run FSM visualization, shadow/canary/active markers, trace_id drill-down | P1 (after Phase 4) |
| **Strangler Fig Dashboard** | Live dashboard of the quantified shadow→canary gate metrics (divergence / latency / error rate) | P1 (after Phase 7) |
| **Budget & Cost** | Per-project token budget, hard kill trigger history, cost trend (GCP Ollama vs paid provider) | P2 |
| **Audit Log Viewer** | Audit log queries (post-redaction), secret-hit warnings, trace_id correlation | P2 |
| **MCP Gateway Admin** | Tool registry, grants management, credential refs (masked), audit | P2 |
| **Principal Directory** | platform_subject lookup, Telegram/LINE/API user mapping | P3 |

**Integration with the existing design system**:
- Must use next-intl (no hardcoded Chinese/English strings)
- No emoji; use Lucide/SVG icons
- Follow the site-wide design rules in `feedback_design_system_consistency.md`
- No direct access to internal IPs (`feedback_frontend_internal_ip_ban.md`)

---

## 9.
Refactoring Cut Plan (11 PRs, refactor-specialist)

> Every PR must be independently mergeable, must be rollback-capable, and must not depend on a later PR.

| PR# | Title | Prerequisite | Scope | Risk |
|-----|------|---------|---------|------|
| PR-01 | Remove the hardcoded IP assert at `telemetry.py:71` | None | 1 line | Low |
| PR-02 | Turn the silence key at `decision_manager.py:240` into a constant | None | 2 lines | Low |
| PR-03 | Remove the second definition at `ollama_auto_recovery.py:230` | None | ~5 lines | Low |
| PR-04 | `_provider` → `__provider` (registry.py) | None | ~20 lines | Low |
| PR-05 | Make the `mcp_bridge.py` namespace dynamic | None | ~30 lines | Medium |
| PR-06 | Add a project prefix to CONSENSUS_PREFIX in `consensus_engine.py` | Phase 2 Redis dual-write Phase A | ~15 lines | Medium |
| PR-07 | Nonce redesign + webhook timestamp/nonce (ADR-116) | None | ~100 lines | High (security fix) |
| PR-08 | Repository project_id filter, batch 1 (incidents/playbooks/km) | Phase 1 schema | ~200 lines | Medium |
| PR-09 | Repository project_id filter, batch 2 (mcp/ai_decisions/approval) | PR-08 | ~200 lines | Medium |
| PR-10 | Background loop tagging (31 loops, main.py) | ADR-123 | ~150 lines | Medium |
| PR-11 | Per-project AnomalyCounter rework | PR-10 | ~80 lines | Medium |

> PR-01~05 can run in parallel (no dependencies); whichever is ready first goes in first.
> PR-06 requires Redis dual-write Phase A to be completed first.
> PR-08~09 require the Phase 1 schema to be completed first.

---

## 10. Feature Flag / Kill-Switch Registry

| Flag | Default | Description | Enable condition |
|-----------|--------|------|---------|
| `AWOOOP_SHADOW_MODE` | OFF | Enables shadow runs (mirrored, no response sent) | Manual flip after Phase 4 completes |
| `AWOOOP_CANARY_MODE` | OFF | Enables canary (some user-visible responses) | Shadow gate passes its 14-day quantified check |
| `AWOOOP_READ_ONLY_MODE` | OFF | Moves read-only queries to AwoooP | Canary gate passes its 7-day quantified check |
| `AWOOOP_SUGGEST_MODE` | OFF | AI suggests, a human decides | read_only gate passes after 14 days |
| `AWOOOP_WRITE_MODE` | OFF | Enables controlled write/execute tools | Suggest gate passes after 30 days + rollback evidence ≥3 |
| `AWOOOP_BUDGET_HARD_KILL` | ON | Terminates the run when the token budget is exceeded (not just an alert) | **Default ON** (ADR-120) |
| `AWOOOP_MCP_OAUTH21` | OFF | MCP OAuth 2.1 flow (ADR-117) | After Phase 5 completes |
| `AWOOOP_RLS_STRICT` | OFF | Strict RLS mode (no awooop_platform bypass) | Phase 2 complete + 30-day soak |
| `AWOOOP_EWOOOC_LIVE` | OFF | Switches the EwoooC tenant to live (not shadow) | Phase 6 canary passes after 7 days |

---

## 11.
Runbook List (8 runbooks, requested by debugger)

| Runbook | Location | Trigger | Main steps |
|---------|------|---------|---------|
| **RB-01**: AwoooP Contract Publish Failure | `docs/runbooks/awooop-contract-publish-failure.md` | Schema validation failure, CODEOWNERS reject | 1. Check body_hash 2. Check draft status 3. Roll back to the previous active revision |
| **RB-02**: Run State Stuck / Stale Lease | `docs/runbooks/awooop-run-stuck.md` | Run stays in RUNNING > 10min | 1. Check lease_until 2. Run the reaper manually 3. Inspect saga_steps to decide between compensating and abandoning |
| **RB-03**: Budget Hard Kill Triggered | `docs/runbooks/awooop-budget-hard-kill.md` | E-BUDGET-001 appears in bulk | 1. Check budget_ledger 2. Confirm the hard_kill_at threshold 3. Determine whether an incident burst caused it 4. Raise the limit temporarily or wait for next month's reset |
| **RB-04**: Phase Rollback (Strangler Fig) | `docs/runbooks/awooop-phase-rollback.md` | Canary error rate > threshold | 1. Switch project_migration_state back to the previous mode 2. Clear the Redis canary cache 3. Notify EwoooC (if affected) |
| **RB-05**: Approval Token Replay Alert | `docs/runbooks/awooop-approval-replay.md` | E-APPROVAL-003 appears | 1. Check the jti Redis key 2. Confirm IP / user 3. Revoke the token 4. Notify security |
| **RB-06**: Cross-Tenant Data Leak Alert | `docs/runbooks/awooop-cross-tenant-leak.md` | E-TENANT-001 appears in bulk | 1. Stop canary/active mode immediately 2. Check the audit log 3. Verify the RLS configuration 4. Evaluate a PITR restore |
| **RB-07**: GCP Ollama Failover Anomaly | `docs/runbooks/awooop-gcp-ollama-failover.md` | GCP-A/B both down and the Local fallback also down | 1. Check the `platform:ollama:primary` Redis key 2. Set the fallback manually 3. Confirm emergency routing to the paid provider |
| **RB-08**: SAGA Compensation Failure | `docs/runbooks/awooop-saga-compensation-fail.md` | E-SAGA-001 appears | 1. Inspect the saga_steps JSON 2. Find the failed step 3. Run the compensation command manually 4. Update the run status |

---

## 12.
Tooling Reinforcement Plan (tool-expert)

| Tool | Purpose | Install location | Phase |
|------|------|---------|-------|
| **PgBouncer** | PG connection pooling so multiple AwoooP workers do not exhaust connections | K8s sidecar or standalone Pod | Before Phase 4 |
| **Sealed Secrets** | Replaces plaintext K8s Secrets; CI/CD security | K3s cluster | Phase 2 (during security hardening) |
| **OPA / Cedar** | Policy engine that centralizes authorization logic (replacing code scattered across the codebase) | Sidecar or admission webhook | Before Phase 5 |
| **chaostoolkit / LitmusChaos** | Chaos validation of Strangler Fig switchovers (worker crash, Redis outage, PG timeout) | CI pipeline | After Phase 4 completes |
| **awooop-ctl** | AwoooP CLI (contract CRUD / run queries / migration state management) | Local CLI + CI | Before Phase 6 |
| **pg_partman** | Automated PostgreSQL partition management | K8s Pod / cron | Phase 4 (before run_state goes live) |
| **pgvector (already present)** | KM vector search | Already installed; needs a per-project namespace | Phase 2 |
| **OpenTelemetry Collector** | OTel pipeline (ADR-121); currently sends directly to SignOz 188:24318, sampling will be needed later | K8s DaemonSet | Before Phase 4 |

---

## 13. Industry Alignment (web-researcher findings)

### 13.1 Lessons from the $47k Agent Loop Incident (Token Budget Hard Kill)

Problem: alert ≠ enforcement. Only a Prometheus alert fired while the agent kept running, and a single loop burned $47k.

AwoooP approach (ADR-120):
- Three-tier budget limit: per-run / per-project / per-tenant
- **Hard Kill**: over budget → terminate the run directly (not just log/alert)
- Redis hot counter (decremented on every call) + PG budget_ledger transaction (final decision)
- The `AWOOOP_BUDGET_HARD_KILL` feature flag defaults to ON (the only flag that is on by default)

### 13.2 Durable Execution / SAGA Compensating Transactions (ADR-119)

Industry standard (Temporal / Conductor / Azure Durable Functions): a multi-step tool chain must have a step-level journal plus a compensation mechanism.

AwoooP approach:
- A `saga_steps` JSONB column on `awooop_run_state`
- Each tool call records: step_id / tool / status / compensation_cmd / completed_at
- On failure, run the compensation commands (the reverse operations)
- Compensation failure → E-SAGA-001 + Runbook RB-08

### 13.3 MCP OAuth 2.1 Confused Deputy (ADR-117)

MCP spec 2025-06-18 requires:
- Per-tenant dynamic client registration (RFC 7591)
- Resource Indicators (RFC 8707): prevents a token from being used across resource servers
- PKCE (RFC 7636): prevents authorization code interception

AwoooP approach (ADR-117):
- Dynamic client registration per tenant; client_id is never shared
- The Resource Indicator must match the target URI in the tool registry
-
`E-MCP-GATE-001/002/003` error codes cover the Confused Deputy scenarios

### 13.4 OTel GenAI Semantic Conventions (ADR-121)

Official specification (opentelemetry-specification/semantic_conventions/gen-ai):
- Span naming: `{gen_ai.operation.name} {gen_ai.request.model}` (the operation, e.g. `chat`, followed by the requested model), with the provider carried as a span attribute rather than in the span name
- Token attributes: `gen_ai.usage.input_tokens` / `gen_ai.usage.output_tokens`
- Model attributes: `gen_ai.request.model` / `gen_ai.response.model`

AwoooP approach: every LLM call must emit the attributes above and send them to SignOz (188:24318).

### 13.5 OWASP Agentic AI Top 10 Alignment (ADR-122)

| OWASP item | AwoooP control |
|-----------|---------------|
| OAI-01 Prompt Injection | MCP Gateway result sanitization + schema validator |
| OAI-02 Insecure Tool Use | Five-gate MCP enforcement + audit |
| OAI-03 Excessive Agency | requires_approval from policy (LLM must not decide) + write/execute feature flag |
| OAI-04 Supply Chain | contract publish HMAC + artifact SHA-256 |
| OAI-05 Data Leakage | audit log redaction + credential isolation |
| OAI-06 Insufficient Observability | OTel GenAI + audit sink + run trace_id |
| OAI-07 Unsafe Orchestration | SAGA journal + compensation + hard kill |
| OAI-08 Memory Vulnerabilities | contract revision immutability + RLS |
| OAI-09 Access Control Bypass | approval_token HS256 + jti replay prevention |
| OAI-10 Resource Exhaustion | Token Budget Hard Kill (ADR-120) |

---

## 14.
Impact of the GCP Ollama Topology on AwoooP (ADR-110 integration)

### 14.1 New topology (ADR-110 + ADR-125, revised 2026-05-05)

```
Phase 0 bridge:
  Primary  : GCP-A http://192.168.0.110:11435 (110 nginx → GCP public IP)
  Secondary: GCP-B http://192.168.0.110:11436
  Fallback : Local http://192.168.0.111:11434
  Emergency: Gemini → Nemotron → Claude (when every Ollama tier is down; budget gated)

Target private mesh:
  Primary  : GCP-A http://10.77.114.21:11434
  Secondary: GCP-B http://10.77.114.22:11434
  Fallback : Local http://10.77.114.111:11434
```

ADR-125 revises the transport layer of ADR-110: the public GCP IPs and the 110 nginx proxy are kept only as a transitional and rollback bridge. The official path is the WireGuard private mesh; runtime routing is managed by the AwoooP Inference Gateway.

### 14.2 Impact items AwoooP must handle

| Impact item | Location | Handling | Phase |
|--------|------|---------|-------|
| Dual-write of the `ollama:current_primary` Redis key (holds only 1 URL; the new topology needs 3 tiers) | INV-1 | Change to `platform:ollama:topology` (JSON: primary/secondary/fallback) | Phase 2 |
| Second definition at `ollama_auto_recovery.py:230` (P0-11) | ollama_auto_recovery.py | Remove; read from config everywhere | Phase 2 PR-03 |
| GCP public IPs entered in INV-4 (34.143.170.20, 34.21.145.224) | INV-4 | Mark as transitional only; officially switch to the `10.77.114.21/22` mesh IPs | Phase 0 INV-4 |
| WireGuard mesh | ADR-125 / runbook | Build the `10.77.114.0/24` private transport; close public 11434 | Phase 2 prerequisite |
| AwoooP Inference Gateway | ADR-125 / runbook | Isolate the alert-fast / code-review / embedding / deep-rca lanes so heavy models cannot starve the alert lane | Phase 4 |
| EwoooC Provider Proxy routed through GCP Ollama | Phase 6 | EwoooC shares the platform Ollama topology (platform_resource) | Phase 6 |
| IP assert at `telemetry.py:71` (P0-08) | telemetry.py:71 | Once removed, GCP IPs no longer trip the assert; make it config-driven | Phase 2 PR-01 |
| budget_ledger records Ollama usage (free GCP capacity still needs token counting) | Phase 4 | Ollama calls must also record token consumption (budget_ledger) | Phase 4 |
| Runbook RB-07 (GCP Ollama failover anomaly) | docs/runbooks/ | Write the runbook in Phase 0; run the actual E2E test after Phase 4 | Phase 0 |

### 14.3 GCP Ollama is a platform_resource (ADR-111)

GCP Ollama (bridge: 34.143.170.20 / 34.21.145.224; target mesh: 10.77.114.21 / 10.77.114.22) and Local Ollama (192.168.0.111 / target 10.77.114.111) are all declared as
`platform_resource`:
- They belong to no tenant
- All tenants (AWOOOI / EwoooC / Tsenyang / Bitan) share them, but audit records carry each tenant's own project_id
- The `platform:ollama:topology` Redis key uses the `platform:` prefix (not `{project_id}:`)

### 14.4 Measured limits (2026-05-05)

Measured with `scripts/ops/ollama-topology-check.sh`:

- GCP-A `gemma3:4b`: about 2s, but `size_vram=0`
- GCP-B `gemma3:4b`: about 8.5s, but `size_vram=0`
- 111 fallback `gemma3:4b`: about 4.9s, `size_vram=8210446336`

Conclusion: GCP-A/B can serve as the synchronous `alert-fast` lane, but cannot carry synchronous alert diagnosis with 14B/32B models. Heavy models must be routed by the Inference Gateway to async / 111 / GPU nodes.

---

## 15. Master Work Ordering (parallel groups + Critical Path)

### Critical Path (sequential, no skipping)

```
Phase 0 all ADR/INV
  → Phase 1 Schema (PR-01/02/03/04/05 can start in parallel beforehand)
  → Phase 2 Security Hardening + Redis migration (PR-06~11)
  → Phase 3 Contract Packages
  → Phase 4 Platform Shell (PgBouncer + OPA/pg_partman prepared in parallel)
  → Phase 5 MCP Gateway
  → Phase 6 EwoooC (14-day shadow gate)
  → Phase 7 Channel Hub (7-day canary gate)
  → Phase 8 Suggest + Write (30-day suggest gate)
```

### Parallelizable work groups

| Group | Work | Parallel with |
|------|------|-----------|
| G-A (Phase 0 parallel) | ADR-111~115, each independent | All in parallel (5 ADRs, one owner each) |
| G-B (Phase 0 parallel) | ADR-116~124 | Parallel with G-A |
| G-C (Phase 0 parallel) | INV-1~INV-9 (some depend on codebase exploration) | Parallel with G-A/G-B |
| G-D (Phase 2 parallel) | PR-01/02/03/04/05 (independent small fixes) | All in parallel |
| G-E (Phase 2 parallel) | Redis dual-write + repository rework + security hardening | Independent of each other, but security hardening goes first |
| G-F (Phase 4 parallel) | PgBouncer install + pg_partman install + OPA install | Parallel with Phase 3 Contract Packages |
| G-G (Phase 5-6 parallel) | Operator Console prototype (ADR-UI-01~04) | Parallel with Phase 6 EwoooC shadow |

### Full ordering table

| Order | Work | docs-only | Parallel group | Blocks |
|------|------|-----------|---------|-------|
| 1-A | ADR-111 Bootstrap Order | ✅ | G-A | Phase 2 |
| 1-B | ADR-112 Contract Governance | ✅ | G-A | Phase 3 |
| 1-C | ADR-113 Active Revision Outbox | ✅ | G-A | Phase 1 |
| 1-D | ADR-114 Idempotency & Worker Lease | ✅ | G-A | Phase 4 |
| 1-E | ADR-115 Principal Mapping | ✅ | G-A | Phase 6, 7 |
| 2-A | ADR-116 Security Hardening | ✅ | G-B | Phase 2 |
| 2-B | ADR-117 MCP OAuth 2.1 | ✅ | G-B | Phase 5 |
| 2-C | ADR-118 RLS
Strategy | ✅ | G-B | Phase 1 |
| 2-D | ADR-119 Durable Execution SAGA | ✅ | G-B | Phase 4 |
| 2-E | ADR-120 Token Budget Hard Kill | ✅ | G-B | Phase 4 |
| 2-F | ADR-121 OTel GenAI | ✅ | G-B | Phase 4 |
| 2-G | ADR-122 OWASP Agentic AI | ✅ | G-B | All Phases |
| 2-H | ADR-123 Background Loop Migration | ✅ | G-B | Phase 2 |
| 2-I | ADR-124 Global Singleton Decomposition | ✅ | G-B | Phase 2 |
| 2-J | ADR-UI-01~04 Operator Console ADR | ✅ | G-B | Phase 6+ |
| 2-K | ADR-106 add Quantified Gates | ✅ | G-B | Phase 8 |
| 3-A | INV-1 Redis Keys | ✅ | G-C | Phase 2 |
| 3-B | INV-2 Repository Retrofit Map | ✅ | G-C | Phase 2 |
| 3-C | INV-3 Entrypoints | ✅ | G-C | Phase 2 |
| 3-D | INV-4 Hardcoded Namespace/IP (including GCP IPs) | ✅ | G-C | Phase 2 |
| 3-E | INV-5 Migration Compatibility Matrix | ✅ | G-C | Phase 1 |
| 3-F | INV-6 Rollback Playbook Register | ✅ | G-C | Phase 4 |
| 3-G | INV-7 PR Cutting Plan | ✅ | G-C | Phase 2 |
| 3-H | INV-8 Background Loop Catalog (31 loops) | ✅ | G-C | Phase 2 |
| 3-I | INV-9 Global Singleton Catalog (13 singletons) | ✅ | G-C | Phase 2 |
| 4 | Task 9 ordering fix (Dockerfile/ConfigMap) | ❌ | | Phase 1 |
| 5 | **Phase 1 Schema Migration** (rewritten version) | ❌ | | Phase 2~8 |
| 6-A | PR-01/02/03/04/05 (parallel small fixes) | ❌ | G-D | Phase 2 |
| 6-B | **Phase 2 Security Hardening** (PR-07 first) | ❌ | G-E | Phase 4 |
| 6-C | Phase 2 Redis dual-write + Repository (PR-06/08/09/10/11) | ❌ | G-E | Phase 4 |
| 7 | **Phase 3 Contract Packages** (packages/awooop-contracts/) | ❌ | | Phase 4 |
| 8-A | PgBouncer + pg_partman + OPA install | ❌ | G-F | Phase 4 |
| 8-B | **Phase 4 Platform Shell + Shadow** (including SAGA + Budget Kill) | ❌ | | Phase 5 |
| 9 | **Phase 5 MCP Gateway** (including OAuth 2.1) | ❌ | | Phase 6 |
| 10-A | **Phase 6 EwoooC Shadow Onboarding** (14-day gate) | ❌ | G-G | Phase 7 |
| 10-B | Operator Console prototype (G-G) | ❌ | G-G | Phase 7+ |
| 11 | **Phase 7 Channel Hub** (7-day canary gate) | ❌ | | Phase 8 |
| 12 | **Phase 8 Suggest + Controlled Write** (30-day gate) | ❌ | | AwoooP v1 GA |

**Items 1-A through 3-I are all docs-only and can be completed back to back in the current conversation window; only after they are done should a new
Codex conversation be opened to start Phase 1 code.**

---

## 16. Quantified Acceptance Thresholds (full version)

### Strangler Fig Gates

| Transition | Quantified conditions | Sign-off |
|------|---------|------|
| pre → shadow | Tenant created + agent contract published + audit/trace writes working | critic confirms |
| shadow → canary | ≥14 days + decision divergence < 5% + p95 degradation < 10% + 0 P0/P1 incidents + 0 secrets in audit | critic + db-expert + vuln-verifier |
| canary → read_only | ≥7 days + user-visible error rate < 0.5% + cost diff < 50% of budget | critic + vuln-verifier |
| read_only → suggest | ≥14 days + suggest accept rate ≥ 50% + 0 hallucination escalations | critic |
| suggest → auto_remediate | ≥30 days + rollback evidence ≥ 3 successes + approval token live + dry-run pass ≥ 99% | critic + db-expert + vuln-verifier |

### Phase acceptance thresholds (quantified additions)

| Phase | Required quantified metrics |
|-------|-----------|
| Phase 1 | Migration up/down dry-run passes; RLS cross-project rejection rate 100%; 0 behavior changes in AWOOOI (regression pass rate 100%) |
| Phase 2 | INV-1 P0 key migration completion 100%; vuln-verifier PoC pass rate 3/3; hardcode grep returns 0 results |
| Phase 3 | Contract schema coverage 100% (6 families); invalid fixture rejection rate 100% |
| Phase 4 | Shadow runs produce 0 user-visible responses; duplicate events map to a unique run at a 100% rate; stale reaper reclaims within 1min at a 100% rate |
| Phase 5 | Credential leak test pass rate 100%; Five-gate integration test coverage 100% |
| Phase 6 | Cross-tenant data access rejection rate 100%; EwoooC 14-day shadow gate passes |
| Phase 7 | First progress message delivered in ≤ 30s at a rate ≥ 99%; duplicate retries create 0 duplicate runs |
| Phase 8 | Approval replay rejection rate 100%; write/execute default-OFF verification passes |

---

## 17.
Related Documents Index

- [ADR-106: AwoooP Architecture](../adr/ADR-106-agent-platform-architecture.md)
- [ADR-107: Control Plane Storage Strategy](../adr/ADR-107-awooop-control-plane-storage.md)
- [ADR-110: GCP Ollama Three-Tier Disaster Recovery Topology](../adr/ADR-110-gcp-ollama-topology.md)
- [MASTER-WORKPLAN.md](MASTER-WORKPLAN.md) (the master index this document expands)
- [IMPLEMENTATION-ROADMAP.md](IMPLEMENTATION-ROADMAP.md) (historical document, old draft)
- To be created: `docs/awooop/inventory/` INV-1~INV-9
- To be created: ADR-111~ADR-124 (AwoooP-specific ADR series)
- To be created: ADR-UI-01~ADR-UI-04 (Operator Console ADRs)
- To be created: `docs/runbooks/` RB-01~RB-08

---

*Last updated: 2026-05-03 (Taipei time zone)*
*Created by: 12-Agent joint review × Codex integration*
*Next step: Phase 0 docs-only work (starting from ADR-111); once complete, open a new Codex conversation for Phase 1 code*
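
---

The five verification steps of the approval_token specification in §5.2 can be sketched as follows. This is a minimal illustration, not the platform's implementation: it uses only the Python standard library, the function names (`mint_token`, `verify_token`) and the `ApprovalError` class are hypothetical, and an in-memory `set` stands in for the Redis NX check on `jti`. Mapping a `sub` or `decision_scope` mismatch to `E-APPROVAL-001` is also an assumption; §7 does not assign a dedicated code to those cases.

```python
import base64
import hashlib
import hmac
import json
import time
import uuid


class ApprovalError(Exception):
    """Hypothetical error type carrying an error code from section 7."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, signing_input: str) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def mint_token(secret, project_id, run_id, scope, now=None, ttl=900):
    """Build an HS256 token with the payload fields listed in section 5.2."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {
        "jti": str(uuid.uuid4()),
        "iss": "awooop-platform",
        "sub": f"{project_id}:{run_id}",
        "aud": "awooop-approval",
        "exp": now + ttl,  # 15 minutes by default
        "approval_type": "human",
        "decision_scope": list(scope),
    }
    parts = [_b64url(json.dumps(p, separators=(",", ":")).encode()) for p in (header, payload)]
    signing_input = ".".join(parts)
    return f"{signing_input}.{_b64url(_sign(secret, signing_input))}"


def verify_token(token, secret, project_id, run_id, tool_id, used_jtis, now=None):
    """Run the five checks in the order given in section 5.2; return the claims."""
    now = int(time.time()) if now is None else now
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        expected = _sign(secret, f"{header_b64}.{payload_b64}")
        # 1. Signature (constant-time comparison)
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            raise ApprovalError("E-APPROVAL-001", "bad signature")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
    except (ValueError, TypeError, AttributeError) as exc:
        raise ApprovalError("E-APPROVAL-001", "malformed token") from exc
    if not isinstance(header, dict) or not isinstance(claims, dict):
        raise ApprovalError("E-APPROVAL-001", "malformed token")
    if (header.get("alg") != "HS256" or claims.get("iss") != "awooop-platform"
            or claims.get("aud") != "awooop-approval"):
        raise ApprovalError("E-APPROVAL-001", "unexpected alg/iss/aud")
    # 2. Expiry
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or now >= exp:
        raise ApprovalError("E-APPROVAL-002", "token expired")
    # 3. Single use: the set stands in for a Redis SET NX with a 15min TTL
    jti = claims.get("jti")
    if not isinstance(jti, str) or jti in used_jtis:
        raise ApprovalError("E-APPROVAL-003", "token already used")
    used_jtis.add(jti)
    # 4. The token must belong to the run being resumed
    if claims.get("sub") != f"{project_id}:{run_id}":
        raise ApprovalError("E-APPROVAL-001", "sub does not match run")
    # 5. The tool must be inside the approved scope
    scope = claims.get("decision_scope")
    if not isinstance(scope, list) or tool_id not in scope:
        raise ApprovalError("E-APPROVAL-001", "tool outside decision_scope")
    return claims
```

Because the sketch follows the step order of §5.2, a token presented for the wrong run is consumed at step 3 before it is rejected at step 4. Whether that is desirable, or whether the `jti` should be marked used only after all other checks pass, is a design choice the specification leaves open.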