- **MCP Gateway**: the only execution choke point for tool calls.
- **Approval / Trust**: the only authorization state machine.
- **KM / RAG / PlayBook**: the only knowledge substrate.
- **Channel Event / Audit**: the only observability and trace stream.

If AI automation and AwoooP are implemented as separate tracks, the system will drift into duplicate approval state machines, duplicate audit flows, duplicate routing gates, duplicate MCP and Channel decisions, and frontend states that do not match the backend execution truth.
## 2. Shared Target Loop
The target loop is:
```text
monitoring / alerting
-> event classification
-> rule match or rule creation
-> PlayBook / KM / RAG retrieval
-> AI decision
-> Approval / Trust gate
-> MCP Gateway execution
-> audit and trace
-> execution verification
-> KM / rule / PlayBook feedback
```
AwoooP surfaces and governs that loop. It does not create an independent execution lane.
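
One way to keep the loop from being short-circuited is to model its stages as an ordered enumeration and allow only single-step forward transitions. The following is a minimal sketch under that assumption; `Stage`, `next_stage`, and `can_transition` are hypothetical names for illustration and do not come from the existing codebase.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Stages of the shared target loop, in the order the plan lists them."""
    MONITORING = "monitoring / alerting"
    CLASSIFICATION = "event classification"
    RULE_MATCH = "rule match or rule creation"
    RETRIEVAL = "PlayBook / KM / RAG retrieval"
    AI_DECISION = "AI decision"
    APPROVAL = "Approval / Trust gate"
    EXECUTION = "MCP Gateway execution"
    AUDIT = "audit and trace"
    VERIFICATION = "execution verification"
    FEEDBACK = "KM / rule / PlayBook feedback"


# Enum members iterate in definition order, which mirrors the loop.
ORDER = list(Stage)


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage that follows `current`, or None after the last one."""
    idx = ORDER.index(current)
    return ORDER[idx + 1] if idx + 1 < len(ORDER) else None


def can_transition(current: Stage, target: Stage) -> bool:
    """Allow only single-step forward moves, so no stage can be skipped."""
    return next_stage(current) is target
```

With this shape, an AI decision cannot jump straight to execution: `can_transition(Stage.AI_DECISION, Stage.EXECUTION)` is false because the Approval / Trust gate sits between them.
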
## 3. Architectural Invariants
1. **One execution gate**: production MCP calls must pass through MCP Gateway. Direct provider calls are compatibility debt and must become `forbid-new`.
2. **One approval state machine**: AwoooP approvals, AWOOOI approval records, TrustEngine, and Telegram approval buttons must converge to one signed, auditable flow.
3. **One audit stream**: Channel events, runtime state, MCP audit, model routing, and approval decisions must be joinable by `project_id`, `run_id`, and `trace_id`.
4. **One knowledge substrate**: KM, RAG, and PlayBook embeddings must use consistent dimensions and model naming. Test fixtures must match production schema.
5. **One routing control plane**: provider/model fallback must resolve through EffectivePolicy or a wrapped legacy equivalent during strangler migration.
7. **No client-only identity**: any operator identity supplied by frontend body is display metadata only; authorization identity must come from the authenticated principal.
8. **No channel policy decisions**: Telegram, LINE, Slack, and email adapters deliver and track messages, but do not decide model, tool, approval, or incident policy.
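
Invariants 1 and 3 can be illustrated together: every call goes through one gateway object, and every audit record carries the three join keys. This is a minimal sketch, not the real MCP Gateway. `TraceContext`, `AuditLog`, `Gateway`, the event names, and the simple tool allowlist are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set


@dataclass(frozen=True)
class TraceContext:
    """Join keys that every audit record must carry."""
    project_id: str
    run_id: str
    trace_id: str


@dataclass
class AuditLog:
    """In-memory stand-in for the audit stream (hypothetical)."""
    records: List[Dict[str, Any]] = field(default_factory=list)

    def write(self, ctx: TraceContext, event: str, **detail: Any) -> None:
        self.records.append({
            "project_id": ctx.project_id,
            "run_id": ctx.run_id,
            "trace_id": ctx.trace_id,
            "event": event,
            **detail,
        })


class Gateway:
    """Single choke point: audits the attempt before any gate is evaluated."""

    def __init__(self, audit: AuditLog, allowed_tools: Set[str]) -> None:
        self.audit = audit
        self.allowed_tools = allowed_tools

    def call(self, ctx: TraceContext, tool: str, execute: Callable[[], Any]) -> Any:
        # Record the attempt first, so early rejections still leave a trace.
        self.audit.write(ctx, "mcp.attempt", tool=tool)
        if tool not in self.allowed_tools:
            self.audit.write(ctx, "mcp.denied", tool=tool)
            raise PermissionError(f"tool not allowed: {tool}")
        result = execute()
        self.audit.write(ctx, "mcp.executed", tool=tool)
        return result
```

Because the attempt is written before the allowlist check, a denied call produces two records (`mcp.attempt`, `mcp.denied`) that share the same `trace_id` as the rest of the run.
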
### 5.1 P0: Production-broken or Security-critical
| ID | Risk | Impact | Required direction |
| --- | --- | --- | --- |
| P0-A | MCP Gateway is not yet a production choke point; callers can still reach `provider.execute()` directly. | Gateway gates, redaction, and audit can be bypassed. | Wrap all production MCP call sites; then mark direct provider access `forbid-new`. |
| P0-B | MCP blocked-call audit has gaps when `tool_row` is missing or Gate 1/2 rejects early. | Denied or suspicious calls can disappear from audit. | Audit attempt before and after gate evaluation with safe redaction. |
| P0-C | Legacy K8s tool execution still has command/shell injection risk. | Destructive command path can be polluted by LLM or user input. | Parse command structure, avoid shell, enforce operation schema and allowlists. |
| P0-D | `MCPToolResult(data=...)` mismatches dataclass fields in some success paths. | Sentry / ArgoCD or similar tools can crash on valid results. | Normalize result schema and regression-test all MCP providers. |
| P0-E | RAG/KM/PlayBook embedding dimensions remain split between 768 and 1024. | Search or backfill can silently fail or hide production drift. | Standardize on `bge-m3` 1024 dimensions and remove stale fixtures. |
| P0-F | KM backfill reconciler is missing required async imports or runtime dependencies. | Repair path that should recover embeddings may crash. | Compile and integration-test the reconciler path. |
| P0-G | Ollama routing still has direct `OLLAMA_URL` consumers. | GCP-A/GCP-B/111 ordering and Gemini fallback policy can be bypassed. | Inventory, wrap, and migrate call sites to resolver / EffectivePolicy. |
| P0-I | Approval APIs trust frontend/body identity such as `approver_id: "operator"`. | Audit identity is not legally or operationally usable. | Decide endpoint must derive principal from auth/session/token, not body. |
| P0-J | API control plane lacks real authorization in some paths; CSRF is not authorization. | Operator APIs can be invoked without a real role boundary. | Add authenticated principal and role checks to AwoooP APIs. |
| P0-K | Alertmanager internal bypass plus `X-Forwarded-For` and broad trusted hosts may allow spoofed source identity. | Forged alert ingress can create incidents or approvals. | Require signed webhook or strict trusted proxy chain. |
| P1-B | TrustEngine uses process memory dict. | Pending approvals vanish or split across pods/restarts. | Move authoritative state to PostgreSQL; Redis only for cache/notification. |
| P1-D | Kustomize may omit service registry, HPA, VPA, backup cron, or related resources. | GitOps drift and partial deploys. | Inventory generated/live manifests and add missing resources. |
| P1-E | Hardcoded restart/SSH actions remain in alert rules. | Conflicts with AI automation direction and rule/PlayBook evidence loop. | Convert hardcoded actions into PlayBook proposals with trust gates. |
| P1-G | Tests still use stale 768-dimension vectors. | Dev tests can pass while production RAG fails. | Test fixtures must mirror production vector dimensions. |
| P1-H | GCP-B can remain passive fallback only. | Active-active design is not realized. | Use GCP-B for batch, embedding, shadow, and canary lanes via policy. |
| P2-D | Docs and runtime can drift. | Future sessions repeat the same analysis. | Keep ADR, LOGBOOK, inventory, and release checklists linked to commits. |
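
The P0-C direction ("parse command structure, avoid shell, enforce operation schema and allowlists") might look roughly like the sketch below. It assumes the legacy tool receives a `kubectl`-style command string; the read-only verb allowlist, the `SAFE_TOKEN` pattern, and the function name `parse_kubectl` are hypothetical and would need to match the real operation schema.

```python
import re
import shlex
from typing import List

# Hypothetical allowlist: read-only verbs only.
ALLOWED_VERBS = frozenset({"get", "describe", "logs"})

# Tokens may contain only characters with no special meaning to a shell.
SAFE_TOKEN = re.compile(r"[A-Za-z0-9_./=:,@-]+")


def parse_kubectl(command: str) -> List[str]:
    """Turn a command string into a validated argv list, or raise ValueError.

    The returned list is meant to be passed to subprocess.run(argv) with the
    default shell=False, so no shell ever interprets the input.
    """
    argv = shlex.split(command)  # raises ValueError on unbalanced quotes
    if len(argv) < 2 or argv[0] != "kubectl":
        raise ValueError("expected a kubectl command")
    if argv[1] not in ALLOWED_VERBS:
        raise ValueError(f"verb not allowed: {argv[1]}")
    for token in argv[2:]:
        if not SAFE_TOKEN.fullmatch(token):
            raise ValueError(f"unsafe token: {token!r}")
    return argv
```

For example, `parse_kubectl("kubectl get pods -n prod")` yields a five-element argv list, while `"kubectl get pods; rm -rf /"` is rejected because the token `pods;` contains a character outside the safe set.
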
The following corrections must be applied when consuming the earlier 12-agent inventories:
1. C21/C22 CronJob service account and image issues appear to have been partially fixed in the worktree, but must be verified against live cluster state before closure.
2. C1 `aioredis` is not necessarily a top-level import in all affected paths; at least one Gate 5 runtime import still makes the risk valid.
4. C12 is partially stale for the main embedding service, but `knowledge_rag_service.py` and `playbook_rag.py` still need verification for 1024-dimensional consistency.
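
The 1024-dimension verification mentioned for C12 could be enforced with a small guard that both services and test fixtures call before writing or querying vectors. This is a sketch under assumptions: the constants reflect the `bge-m3` / 1024 target stated in this plan, while `check_embedding` and its `source` label are hypothetical and not existing functions in either module.

```python
from typing import Sequence

# Target stated in this plan: bge-m3 with 1024-dimensional vectors.
EXPECTED_MODEL = "bge-m3"
EXPECTED_DIM = 1024


def check_embedding(vector: Sequence[float], model: str, source: str) -> None:
    """Raise ValueError if a vector does not match the agreed model and size.

    `source` is a free-form label (for example a module name) used only to
    make the error message traceable.
    """
    if model != EXPECTED_MODEL:
        raise ValueError(
            f"{source}: model {model!r} does not match {EXPECTED_MODEL!r}"
        )
    if len(vector) != EXPECTED_DIM:
        raise ValueError(
            f"{source}: got {len(vector)} dimensions, expected {EXPECTED_DIM}"
        )
```

Running the same guard in test fixtures would make a stale 768-dimension vector fail loudly in development instead of passing there and breaking in production.
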
## 7. Twelve-agent Ownership Matrix
| Owner | Scope | First verification |
| --- | --- | --- |
| Chief Architect | Total blueprint, dependency order, red-zone governance | This integration plan is linked from AwoooP master workplan and LOGBOOK. |
| Frontend / AwoooP | i18n, sidebar, approver identity, internal IP ban | `/zh-TW/awooop` renders without redirect error and no private `NEXT_PUBLIC_*` bundle leak. |
| QA / Test | Integration tests, no-mock scanning, regression matrix | High-risk paths have production-like integration tests, not only mocks. |

1. Wire `scripts/ops/deploy-alertmanager-config.sh` into the reboot/release checklist, then consider whether CD should run it for `ops/alertmanager/**` changes.
2. Continue Wave 1 with MCP Gateway bypass and MCP audit completeness, because production callers can still route around the gateway.
4. Add a Sentry/Snuba post-reboot health gate: ClickHouse table existence, Snuba migration status, and Kafka consumer offsets must be part of cold-start validation.
5. Add a post-deploy Alertmanager live check for `amtool check-config`, container status, and config-file mode; direct Telegram must remain emergency-only and target the SRE group.