chore(rls): apply tool registry canary wave1.1
All checks were successful
Code Review / ai-code-review (push) Successful in 10s
@@ -1,3 +1,62 @@
## 2026-05-12 | RLS Canary Wave1.1 Applied

**Background**: With the Wave1 empty-table canary complete, the next candidates are low-row non-empty tables. Live preflight shows `awooop_projects=2 rows` and `awooop_mcp_tool_registry=4 rows`; this round inventories the read paths first and then decides the scope.

**Scope correction**:

- `awooop_projects` is excluded for now:
  - `platform_operator_service.list_tenants()` currently uses `get_db_context("awoooi")`, but the API contract states that Operator Console must return all projects.
  - Enabling a tenant policy directly would hide the `ewoooc` row under the `awoooi` context and break the Operator Console cross-tenant view.
  - A platform-admin/bypass DB path must be built first, or the list-tenants semantics redefined.
- `awooop_mcp_tool_registry` is included in Wave1.1:
  - live data: `ewoooc=4`.
  - runtime read path: `McpGateway._gate3_tool()` queries by `ctx.project_id` + `tool_name` + `is_active`.

**Added/updated artifacts**:

- Added apply / rollback SQL:
  - `scripts/ops/awooop-rls-canary-wave1-1-tool-registry.sql`
  - `scripts/ops/awooop-rls-canary-wave1-1-tool-registry-rollback.sql`
- Added `docs/runbooks/AWOOOP-RLS-CANARY-WAVE1-1.md`.
- Updated `scripts/ops/awooop_rls_preflight.py`:
  - Added `scope=rls_filtered|global_visible` and `project_context` to the `--exact-counts` output.
  - Added `WARN exact_counts_scope` when RLS-enabled tables exist, so that an app-role tenant-visible count is not misread as a global count.

**Production apply**:

- Synced to 188 `/home/ollama/awoooi-ops/`:
  - `awooop-rls-canary-wave1-1-tool-registry.sql`
  - `awooop-rls-canary-wave1-1-tool-registry-rollback.sql`
- Executed via the postgres/operator socket path:
  - Docker image: `pgvector/pgvector:pg14`
  - UID/GID: `115:121` (`postgres:postgres`)
  - DB: `awoooi_prod`
- Apply result: `COMMIT`; `awooop_mcp_tool_registry` now has `ENABLE ROW LEVEL SECURITY` + `FORCE ROW LEVEL SECURITY` + a fail-closed `FOR ALL TO awooop_app` policy.

**Post-apply verification**:

- `awooop_mcp_tool_registry` → `rls=true force=true policies=1 fail_open=false`.
- API pod behavior test:
  - `tool_registry_no_context=0`
  - `tool_registry_ewoooc_context=4`
  - `tool_registry_awoooi_context=0`
  - `tool_registry_insert_with_context=allowed_and_rolled_back`
  - `tool_registry_probe_rows_after=0`
- operator/global count → `ewoooc=4`.
- production health `/api/v1/health` → 200 healthy.
- runtime/manual audits are unchanged:
  - runtime access audit: `BLOCKED=0 ALLOW=10`
  - manual script audit: `BLOCKED=0 REVIEW=5 PASS=13`
- Current preflight status:
  - `PASS=7 WARN=1 BLOCKED=1`
  - `WARN exact_counts_scope` is an expected warning: from the API pod, RLS-enabled tables can only yield tenant-visible counts.
  - Remaining blocker tables: `audit_logs`, `awooop_outbound_message`, `awooop_projects`, `awooop_run_state`, `incidents`, `knowledge_entries`, `playbooks`.

**Overall progress**:

- Wave 0: MOMO PostgreSQL backup → AwoooP failure-notification wiring complete.
- Wave 1: competing GitHub deploy disabled, RLS live verification, role bootstrap, API runtime access path, manual script gate, Wave1 empty-table canary, and Wave1.1 MCP tool registry canary are complete.
- Not yet done: token rotation, permanent fix for 188 certbot, remaining RLS waves, 188 local Ollama shutdown window.

**Next steps**:

- Fix the platform-admin read path for `awooop_projects` first, then consider enabling RLS on projects.
- The next RLS batch should not jump straight to high-traffic tables; query-path review and rollback rehearsal can start with `awooop_outbound_message` / `awooop_run_state`, keeping in mind that both tables receive new rows continuously.
## 2026-05-12 | RLS Canary Wave1 Applied

**Background**: The previous round produced `scripts/ops/awooop-rls-canary-wave1-empty-tables.sql` and its rollback SQL. After user approval, this round applies the Wave1 canary policy only to the six tables that live preflight shows as empty, and does not touch high-traffic or non-empty tables such as `incidents` / `knowledge_entries` / `playbooks` / `audit_logs`.
99
docs/runbooks/AWOOOP-RLS-CANARY-WAVE1-1.md
Normal file
@@ -0,0 +1,99 @@
# AwoooP RLS Canary Wave 1.1

This wave targets one low-row non-empty table:

- `awooop_mcp_tool_registry`

It does not include `awooop_projects`.

Status: applied to production on 2026-05-12.

## Why Not `awooop_projects`

`awooop_projects` has only two rows, but it is not safe to enable with a normal
tenant-only policy yet. `platform_operator_service.list_tenants()` currently
uses `get_db_context("awoooi")` while the API contract says Operator Console
returns all projects. A tenant policy on `awooop_projects` would hide the
`ewoooc` row from that endpoint.

Blocker before enabling `awooop_projects`:

- introduce an explicit platform-admin/bypass role path for Operator Console
  cross-tenant reads, or
- redesign list-tenants semantics so it is intentionally tenant-scoped.
## Scope

Latest live evidence before apply:

```text
awooop_mcp_tool_registry: ewoooc=4
```

Runtime read path:

- `McpGateway._gate3_tool()` filters by `ctx.project_id`, `tool_name`, and
  `is_active`.
## Apply

```bash
psql "$DATABASE_URL" -v ON_ERROR_STOP=1 \
  -f scripts/ops/awooop-rls-canary-wave1-1-tool-registry.sql
```

The SQL aborts if:

- the table is missing,
- the `project_id` column is missing,
- any `project_id` is NULL,
- the row count exceeds the reviewed canary cap of 20 rows.
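The abort conditions above can be modeled outside the database as a small pure function. This is a minimal Python sketch, not the shipped script: `check_canary_guards`, `CanaryGuardError`, and the in-memory inputs are hypothetical names introduced for illustration, and the check order simply follows the list above.

```python
from typing import Optional, Sequence

CANARY_ROW_CAP = 20  # reviewed cap stated in this runbook


class CanaryGuardError(Exception):
    """Hypothetical stand-in for the SQL script's RAISE EXCEPTION."""


def check_canary_guards(
    table_exists: bool,
    has_project_id_column: bool,
    project_ids: Sequence[Optional[str]],
    cap: int = CANARY_ROW_CAP,
) -> None:
    """Raise CanaryGuardError on the first failed precondition.

    `project_ids` holds one entry per row; None models a NULL project_id.
    """
    if not table_exists:
        raise CanaryGuardError("target table does not exist")
    if not has_project_id_column:
        raise CanaryGuardError("target missing project_id column")
    null_rows = sum(1 for pid in project_ids if pid is None)
    if null_rows != 0:
        raise CanaryGuardError(f"target has NULL project_id rows: nulls={null_rows}")
    if len(project_ids) > cap:
        raise CanaryGuardError(f"reviewed cap exceeded: rows={len(project_ids)}")
```

With the pre-apply evidence (`ewoooc=4`), `check_canary_guards(True, True, ["ewoooc"] * 4)` returns without raising; a table that has grown past 20 rows since review would stop the apply.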
## Verification

Expected after apply:

- no context: `awooop_mcp_tool_registry` reads/writes are denied or return no
  rows, depending on query shape and privilege path.
- `app.project_id='ewoooc'`: the four active ewoooc tools are visible.
- `app.project_id='awoooi'`: no ewoooc tools are visible.
- global RLS preflight remains blocked only by later-wave tables.
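To make the expected read outcomes concrete, the sketch below models the tenant predicate `project_id = current_setting('app.project_id', TRUE)` in plain Python. It assumes `None` stands in for SQL NULL (an unset `app.project_id`) and that any non-true predicate result hides the row. `row_visible` and `visible_count` are hypothetical helpers; the sketch covers row visibility only and does not model privilege-level denials or writes.

```python
from typing import Iterable, Optional


def row_visible(row_project_id: Optional[str], app_project_id: Optional[str]) -> bool:
    """Model the USING predicate under SQL three-valued logic.

    A comparison involving NULL yields NULL, and row-level security only
    exposes rows for which the predicate is true, so NULL hides the row.
    """
    if row_project_id is None or app_project_id is None:
        return False
    return row_project_id == app_project_id


def visible_count(rows: Iterable[Optional[str]], app_project_id: Optional[str]) -> int:
    """Count rows a session with the given app.project_id would see."""
    return sum(1 for pid in rows if row_visible(pid, app_project_id))


# Matches the live evidence in this runbook: four rows, all project_id='ewoooc'.
registry = ["ewoooc"] * 4

no_context = visible_count(registry, None)          # unset app.project_id
ewoooc_context = visible_count(registry, "ewoooc")
awoooi_context = visible_count(registry, "awoooi")
empty_context = visible_count(registry, "")         # empty string is no bypass
```

Under these assumptions the model is fail-closed: only a context that exactly matches the row's `project_id` sees anything.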
## 2026-05-12 Production Evidence

Apply completed with `COMMIT` through the 188 postgres/operator socket path.

Post-apply relation state:

```text
awooop_mcp_tool_registry|rls=true|force=true|policies=1|fail_open=false
```

API pod behavior test:

```text
tool_registry_no_context=0
tool_registry_ewoooc_context=4
tool_registry_awoooi_context=0
tool_registry_insert_with_context=allowed_and_rolled_back
tool_registry_probe_rows_after=0
```

Operator/global count:

```text
ewoooc|4
```

Note: after RLS is enabled, API-pod `--exact-counts` are tenant-visible counts,
not global counts. Use postgres/operator evidence for global row-count checks.
## Rollback

```bash
psql "$DATABASE_URL" -v ON_ERROR_STOP=1 \
  -f scripts/ops/awooop-rls-canary-wave1-1-tool-registry-rollback.sql
```

Rollback removes the Wave1.1 policy and disables RLS on
`awooop_mcp_tool_registry`. It does not modify data.
@@ -28,6 +28,19 @@ AWOOOP_RLS_SSH_TARGET=wooo@192.168.0.120 bash scripts/ops/awooop-rls-preflight.s
Exit code `2` means the gate is blocked and RLS must not be enabled yet.

## Exact Count Scope

After any target table has RLS enabled, `--exact-counts` runs as the production
app DB user and is filtered by the current `app.project_id`. The output marks
these rows with:

```text
scope=rls_filtered project_context=...
```

Treat those counts as tenant-visible evidence, not global row counts. Use a
reviewed postgres/operator path for global counts after RLS is enabled.
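When collecting evidence from preflight output, it can help to separate tenant-visible counts from global ones mechanically. The sketch below parses `count` lines of the shape the preflight script prints (`count <table> scope=... project_context=... total_rows=... null_project_id_rows=...`). The helper names and the sample lines are illustrative assumptions, not real production output.

```python
from typing import Dict, List


def parse_count_line(line: str) -> Dict[str, str]:
    """Parse one `count ...` line into a dict of string fields."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "count":
        raise ValueError(f"not a count line: {line!r}")
    fields = {"table_name": tokens[1]}
    for token in tokens[2:]:
        key, _, value = token.partition("=")
        fields[key] = value
    return fields


def tenant_scoped_tables(lines: List[str]) -> List[str]:
    """Return tables whose counts are RLS-filtered, i.e. not global."""
    parsed = [parse_count_line(line) for line in lines if line.startswith("count ")]
    return [p["table_name"] for p in parsed if p.get("scope") == "rls_filtered"]


# Illustrative sample only; `example_table` and its numbers are made up.
SAMPLE_LINES = [
    "count awooop_mcp_tool_registry scope=rls_filtered "
    "project_context=ewoooc total_rows=4 null_project_id_rows=0",
    "count example_table scope=global_visible "
    "project_context=None total_rows=10 null_project_id_rows=0",
]

needs_operator_count = tenant_scoped_tables(SAMPLE_LINES)
```

Any table returned by `tenant_scoped_tables` would need its global row count confirmed through the postgres/operator path rather than from this output.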
## 2026-05-12 Initial Production Result

`--exact-counts` returned:
@@ -0,0 +1,14 @@
-- Rollback for AwoooP RLS Canary Wave 1.1.
-- This only removes the wave1.1 policy and disables RLS on the tool registry.
-- It intentionally does not touch data.

BEGIN;

SET LOCAL lock_timeout = '5s';
SET LOCAL statement_timeout = '30s';

DROP POLICY IF EXISTS awooop_mcp_tool_registry_tenant ON awooop_mcp_tool_registry;
ALTER TABLE awooop_mcp_tool_registry NO FORCE ROW LEVEL SECURITY;
ALTER TABLE awooop_mcp_tool_registry DISABLE ROW LEVEL SECURITY;

COMMIT;
72
scripts/ops/awooop-rls-canary-wave1-1-tool-registry.sql
Normal file
@@ -0,0 +1,72 @@
-- AwoooP RLS Canary Wave 1.1: low-row MCP tool registry
-- Date: 2026-05-12
--
-- Scope:
-- - awooop_mcp_tool_registry
--
-- Why this table:
-- Latest production exact count: 4 rows, all project_id='ewoooc'.
-- Runtime read path is MCP Gateway Gate 3 and filters by ctx.project_id.
--
-- Why not awooop_projects in this wave:
-- Operator Console list_tenants() currently expects cross-tenant visibility.
-- A normal tenant policy would hide ewoooc when context is awoooi, so
-- awooop_projects remains blocked until an explicit platform-admin DB path
-- exists.
--
-- Safety:
-- - fail-closed policy only; no NULL/empty-string app.project_id bypass.
-- - aborts if target is missing project_id, has NULL project_id, or has
-- more rows than the reviewed canary cap.
-- - run with a migration/operator role, not through the production app role.

BEGIN;

SET LOCAL lock_timeout = '5s';
SET LOCAL statement_timeout = '30s';

DO $$
DECLARE
  total_rows bigint;
  null_project_rows bigint;
BEGIN
  IF to_regclass('public.awooop_mcp_tool_registry') IS NULL THEN
    RAISE EXCEPTION 'RLS canary target table does not exist: awooop_mcp_tool_registry';
  END IF;

  IF NOT EXISTS (
    SELECT 1
    FROM information_schema.columns
    WHERE table_schema = 'public'
      AND table_name = 'awooop_mcp_tool_registry'
      AND column_name = 'project_id'
  ) THEN
    RAISE EXCEPTION 'RLS canary target missing project_id: awooop_mcp_tool_registry';
  END IF;

  SELECT COUNT(*), COUNT(*) FILTER (WHERE project_id IS NULL)
  INTO total_rows, null_project_rows
  FROM awooop_mcp_tool_registry;

  IF null_project_rows <> 0 THEN
    RAISE EXCEPTION 'RLS canary target has NULL project_id rows: %, nulls=%',
      'awooop_mcp_tool_registry', null_project_rows;
  END IF;

  IF total_rows > 20 THEN
    RAISE EXCEPTION 'RLS canary wave1.1 reviewed cap exceeded: %, rows=%',
      'awooop_mcp_tool_registry', total_rows;
  END IF;
END
$$;

ALTER TABLE awooop_mcp_tool_registry ENABLE ROW LEVEL SECURITY;
ALTER TABLE awooop_mcp_tool_registry FORCE ROW LEVEL SECURITY;
DROP POLICY IF EXISTS mcp_tool_registry_tenant_isolation ON awooop_mcp_tool_registry;
DROP POLICY IF EXISTS awooop_mcp_tool_registry_tenant ON awooop_mcp_tool_registry;
CREATE POLICY awooop_mcp_tool_registry_tenant ON awooop_mcp_tool_registry
  FOR ALL TO awooop_app
  USING (project_id = current_setting('app.project_id', TRUE))
  WITH CHECK (project_id = current_setting('app.project_id', TRUE));

COMMIT;
@@ -273,16 +273,35 @@ async def collect(exact_counts: bool) -> tuple[list[Check], dict[str, Any]]:
             quoted = '"' + row["table_name"].replace('"', '""') + '"'
             count_row = await rows(
                 conn,
-                f"SELECT :table_name AS table_name, COUNT(*) AS total_rows, COUNT(*) FILTER (WHERE project_id IS NULL) AS null_project_id_rows FROM {quoted}",
-                {"table_name": row["table_name"]},
+                f"""
+                SELECT
+                    :table_name AS table_name,
+                    CAST(:rls_filtered AS boolean) AS rls_filtered,
+                    current_setting('app.project_id', TRUE) AS project_context,
+                    COUNT(*) AS total_rows,
+                    COUNT(*) FILTER (WHERE project_id IS NULL) AS null_project_id_rows
+                FROM {quoted}
+                """,
+                {
+                    "table_name": row["table_name"],
+                    "rls_filtered": bool(row["rls_enabled"]),
+                },
             )
             exact_rows.extend(count_row)
         evidence["exact_counts"] = exact_rows
         null_tables = [row["table_name"] for row in exact_rows if int(row["null_project_id_rows"]) > 0]
+        rls_filtered_tables = [row["table_name"] for row in exact_rows if row.get("rls_filtered")]
         if null_tables:
             add(checks, "project_id_backfill", "BLOCKED", f"NULL project_id remains: {', '.join(null_tables)}")
         else:
             add(checks, "project_id_backfill", "PASS", "no NULL project_id rows in counted tables")
+        if rls_filtered_tables:
+            add(
+                checks,
+                "exact_counts_scope",
+                "WARN",
+                "counts for RLS-enabled tables are tenant-visible only; use operator role for global counts",
+            )
     else:
         add(checks, "project_id_backfill", "WARN", "exact counts skipped; rerun with --exact-counts before enabling RLS")
@@ -325,9 +344,12 @@ def print_human(checks: list[Check], evidence: dict[str, Any]) -> None:
         )

     for row in evidence.get("exact_counts", []):
+        scope = "rls_filtered" if row.get("rls_filtered") else "global_visible"
         print(
             "count "
             f"{row['table_name']} "
+            f"scope={scope} "
+            f"project_context={row.get('project_context')} "
             f"total_rows={row['total_rows']} "
             f"null_project_id_rows={row['null_project_id_rows']}"
         )