chore(rls): apply tool registry canary wave1.1
Your Name
2026-05-12 21:15:14 +08:00
parent 1617b73a9d
commit b7af597459
6 changed files with 281 additions and 2 deletions


@@ -1,3 +1,62 @@
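## 2026-05-12 | RLS Canary Wave1.1 Applied
**Background**: With the Wave1 empty-table canary complete, the next candidates are low-row non-empty tables. Live preflight shows `awooop_projects=2 rows` and `awooop_mcp_tool_registry=4 rows`; this round inventories the read paths first and only then decides the scope.
**Scope correction**:
- `awooop_projects` is excluded for now:
  - `platform_operator_service.list_tenants()` currently uses `get_db_context("awoooi")`, but the API contract states that Operator Console must return all projects.
  - Enabling a tenant policy directly would hide the `ewoooc` row under the `awoooi` context and break the Operator Console cross-tenant view.
  - A platform-admin/bypass DB path must be built first, or the list-tenants semantics redefined.
- `awooop_mcp_tool_registry` is included in Wave1.1:
  - live data: `ewoooc=4`.
  - runtime read path: `McpGateway._gate3_tool()` queries by `ctx.project_id` + `tool_name` + `is_active`.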
**New/updated artifacts**:
- Added apply / rollback SQL:
  - `scripts/ops/awooop-rls-canary-wave1-1-tool-registry.sql`
  - `scripts/ops/awooop-rls-canary-wave1-1-tool-registry-rollback.sql`
- Added `docs/runbooks/AWOOOP-RLS-CANARY-WAVE1-1.md`.
- Updated `scripts/ops/awooop_rls_preflight.py`:
  - `--exact-counts` output now includes `scope=rls_filtered|global_visible` and `project_context`.
  - When any RLS-enabled table exists, a new `WARN exact_counts_scope` check is emitted so that an app-role tenant-visible count is not misread as a global count.
**Production apply**:
- Synced to 188 `/home/ollama/awoooi-ops/`:
  - `awooop-rls-canary-wave1-1-tool-registry.sql`
  - `awooop-rls-canary-wave1-1-tool-registry-rollback.sql`
- Executed through the postgres/operator socket path:
  - Docker image: `pgvector/pgvector:pg14`
  - UID/GID: `115:121` (`postgres:postgres`)
  - DB: `awoooi_prod`
- Apply result: `COMMIT`; `awooop_mcp_tool_registry` now has `ENABLE ROW LEVEL SECURITY` + `FORCE ROW LEVEL SECURITY` + a fail-closed `FOR ALL TO awooop_app` policy.
**Post-apply verification**:
- `awooop_mcp_tool_registry`: `rls=true force=true policies=1 fail_open=false`.
- API pod behavior test:
  - `tool_registry_no_context=0`
  - `tool_registry_ewoooc_context=4`
  - `tool_registry_awoooi_context=0`
  - `tool_registry_insert_with_context=allowed_and_rolled_back`
  - `tool_registry_probe_rows_after=0`
- operator/global count → `ewoooc=4`.
- production health `/api/v1/health` → 200 healthy.
- runtime/manual audits are unchanged:
  - runtime access audit: `BLOCKED=0 ALLOW=10`
  - manual script audit: `BLOCKED=0 REVIEW=5 PASS=13`
- current preflight status:
  - `PASS=7 WARN=1 BLOCKED=1`
  - `WARN exact_counts_scope` is an expected warning: inside the API pod, RLS-enabled tables can only be counted as tenant-visible rows.
  - Remaining blocker tables: `audit_logs`, `awooop_outbound_message`, `awooop_projects`, `awooop_run_state`, `incidents`, `knowledge_entries`, `playbooks`.
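**Overall progress**:
- Wave 0: MOMO PostgreSQL backup → AwoooP failure-notification wiring is complete.
- Wave 1: competing GitHub deploy disabled, RLS live verification, role bootstrap, API runtime access path, manual script gate, Wave1 empty-table canary, and Wave1.1 MCP tool registry canary are complete.
- Not yet done: token rotation, a permanent certbot fix on 188, the remaining RLS waves, and the window for disabling local Ollama on 188.
**Next steps**:
- Fix the platform-admin read path for `awooop_projects` first, then consider enabling RLS on projects.
- The next RLS candidates should not jump straight to high-traffic tables; start with a query-path review and rollback rehearsal for `awooop_outbound_message` / `awooop_run_state`, keeping in mind that both tables receive new rows continuously.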
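## 2026-05-12 | RLS Canary Wave1 Applied
**Background**: The previous round produced `scripts/ops/awooop-rls-canary-wave1-empty-tables.sql` and its rollback SQL. After user approval, this round applies the Wave1 canary policy only to the six tables that live preflight showed as empty, and does not touch high-traffic or non-empty tables such as `incidents` / `knowledge_entries` / `playbooks` / `audit_logs`.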


@@ -0,0 +1,99 @@
# AwoooP RLS Canary Wave 1.1
This wave targets one low-row non-empty table:

- `awooop_mcp_tool_registry`

It does not include `awooop_projects`.

Status: applied to production on 2026-05-12.
## Why Not `awooop_projects`
`awooop_projects` has only two rows, but it is not safe to enable with a normal
tenant-only policy yet. `platform_operator_service.list_tenants()` currently
uses `get_db_context("awoooi")` while the API contract says Operator Console
returns all projects. A tenant policy on `awooop_projects` would hide the
`ewoooc` row from that endpoint.
Blocker before enabling `awooop_projects`:
- introduce an explicit platform-admin/bypass role path for Operator Console
cross-tenant reads, or
- redesign list-tenants semantics so it is intentionally tenant-scoped.
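
The effect described in this section can be illustrated with a minimal in-memory sketch. It is not the service code: `PROJECT_ROWS` and `visible_rows` are hypothetical names, and the two row values are assumed from the contexts mentioned above. The function mimics a tenant policy of the form `project_id = current_setting('app.project_id', TRUE)`, including the SQL rule that a comparison against NULL never matches.

```python
from typing import Optional

# Hypothetical stand-in for the two rows in awooop_projects (values assumed).
PROJECT_ROWS = [
    {"project_id": "awoooi"},
    {"project_id": "ewoooc"},
]


def visible_rows(rows: list[dict], app_project_id: Optional[str]) -> list[dict]:
    """Mimic USING (project_id = current_setting('app.project_id', TRUE)).

    A missing context behaves like SQL NULL: the comparison is never true,
    so no row is visible.
    """
    if app_project_id is None:
        return []
    return [row for row in rows if row["project_id"] == app_project_id]


# What a list-tenants call would see if it keeps using the "awoooi" context.
awoooi_view = visible_rows(PROJECT_ROWS, "awoooi")
no_context_view = visible_rows(PROJECT_ROWS, None)

print([row["project_id"] for row in awoooi_view])
print(len(no_context_view))
```

Under these assumptions the `ewoooc` row never reaches the caller, which is why a cross-tenant endpoint needs its own reviewed read path before the policy is enabled.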
## Scope
Latest live evidence before apply:
```text
awooop_mcp_tool_registry: ewoooc=4
```
Runtime read path:
- `McpGateway._gate3_tool()` filters by `ctx.project_id`, `tool_name`, and
`is_active`.
## Apply
```bash
psql "$DATABASE_URL" -v ON_ERROR_STOP=1 \
-f scripts/ops/awooop-rls-canary-wave1-1-tool-registry.sql
```
The SQL aborts if:
- table is missing,
- `project_id` is missing,
- any `project_id` is NULL,
- row count exceeds the reviewed canary cap of 20 rows.
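
The abort conditions listed above can be restated as a small pure function, which may help when reviewing the guard logic separately from the SQL. This is a sketch only: `canary_guard_error` and its parameters are hypothetical names, and the messages are paraphrased rather than copied from the script.

```python
from typing import Optional

ROW_CAP = 20  # reviewed canary cap stated in this runbook


def canary_guard_error(
    table_exists: bool,
    has_project_id: bool,
    total_rows: int,
    null_project_rows: int,
) -> Optional[str]:
    """Return the reason the apply should abort, or None if it may proceed."""
    if not table_exists:
        return "target table does not exist"
    if not has_project_id:
        return "target is missing project_id"
    if null_project_rows != 0:
        return f"target has NULL project_id rows: {null_project_rows}"
    if total_rows > ROW_CAP:
        return f"reviewed cap exceeded: rows={total_rows}"
    return None


# Four rows with no NULL project_id matches the live evidence in this runbook.
print(canary_guard_error(True, True, total_rows=4, null_project_rows=0))
print(canary_guard_error(True, True, total_rows=21, null_project_rows=0))
```

The cap is checked with a strict greater-than, so a table with exactly 20 rows would still pass.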
## Verification
Expected after apply:
- no context: `awooop_mcp_tool_registry` reads/writes are denied or return no
rows, depending on query shape and privilege path.
- `app.project_id='ewoooc'`: the four active ewoooc tools are visible.
- `app.project_id='awoooi'`: no ewoooc tools are visible.
- global RLS preflight remains blocked only by later-wave tables.
## 2026-05-12 Production Evidence
Apply completed with `COMMIT` through the 188 postgres/operator socket path.
Post-apply relation state:
```text
awooop_mcp_tool_registry|rls=true|force=true|policies=1|fail_open=false
```
API pod behavior test:
```text
tool_registry_no_context=0
tool_registry_ewoooc_context=4
tool_registry_awoooi_context=0
tool_registry_insert_with_context=allowed_and_rolled_back
tool_registry_probe_rows_after=0
```
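
The behavior-test output above can also be checked mechanically. The following sketch assumes the output is captured as plain `key=value` lines; `parse_key_values` and `mismatches` are hypothetical helpers and not part of the repository.

```python
# Expected values copied from the behavior test output recorded above.
EXPECTED = {
    "tool_registry_no_context": "0",
    "tool_registry_ewoooc_context": "4",
    "tool_registry_awoooi_context": "0",
    "tool_registry_insert_with_context": "allowed_and_rolled_back",
    "tool_registry_probe_rows_after": "0",
}


def parse_key_values(text: str) -> dict[str, str]:
    """Parse key=value lines, skipping blank or malformed lines."""
    result: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        result[key] = value
    return result


def mismatches(observed: dict[str, str], expected: dict[str, str]) -> dict:
    """Map each failing key to (expected, observed); empty dict means success."""
    return {
        key: (want, observed.get(key))
        for key, want in expected.items()
        if observed.get(key) != want
    }


SAMPLE_OUTPUT = """\
tool_registry_no_context=0
tool_registry_ewoooc_context=4
tool_registry_awoooi_context=0
tool_registry_insert_with_context=allowed_and_rolled_back
tool_registry_probe_rows_after=0
"""

problems = mismatches(parse_key_values(SAMPLE_OUTPUT), EXPECTED)
print(problems or "all expectations met")
```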
Operator/global count:
```text
ewoooc|4
```
Note: after RLS is enabled, API-pod `--exact-counts` are tenant-visible counts,
not global counts. Use postgres/operator evidence for global row-count checks.
## Rollback
```bash
psql "$DATABASE_URL" -v ON_ERROR_STOP=1 \
-f scripts/ops/awooop-rls-canary-wave1-1-tool-registry-rollback.sql
```
Rollback removes the Wave1.1 policy and disables RLS on
`awooop_mcp_tool_registry`. It does not modify data.


@@ -28,6 +28,19 @@ AWOOOP_RLS_SSH_TARGET=wooo@192.168.0.120 bash scripts/ops/awooop-rls-preflight.s
Exit code `2` means the gate is blocked and RLS must not be enabled yet.
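
As a rough illustration of how a blocked gate maps to the exit code, the sketch below tallies check statuses and returns `2` when any check is `BLOCKED`. The function names are hypothetical, and returning `0` in every other case is an assumption; only the meaning of exit code `2` is documented here.

```python
from collections import Counter

BLOCKED_EXIT_CODE = 2  # documented: the gate is blocked
OK_EXIT_CODE = 0       # assumption: not stated in this document


def gate_exit_code(statuses: list[str]) -> int:
    """Return the blocked exit code if any check reports BLOCKED."""
    return BLOCKED_EXIT_CODE if "BLOCKED" in statuses else OK_EXIT_CODE


def summary_line(statuses: list[str]) -> str:
    """Format a PASS/WARN/BLOCKED tally."""
    counts = Counter(statuses)
    return " ".join(f"{name}={counts.get(name, 0)}" for name in ("PASS", "WARN", "BLOCKED"))


# Illustrative status list: seven passes, one warning, one blocker.
example = ["PASS"] * 7 + ["WARN", "BLOCKED"]
print(summary_line(example))
print(gate_exit_code(example))
```

Note that a `WARN` alone does not block the gate in this sketch.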
## Exact Count Scope
After any target table has RLS enabled, `--exact-counts` runs as the production
app DB user and is filtered by the current `app.project_id`. The output marks
these rows with:
```text
scope=rls_filtered project_context=...
```
Treat those counts as tenant-visible evidence, not global row counts. Use a
reviewed postgres/operator path for global counts after RLS is enabled.
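
When post-processing preflight output, the scope marker can be used to keep tenant-visible counts out of global totals. The sketch below assumes each count line has the form `count <table> key=value ...`; `parse_count_line` and `global_count_tables` are hypothetical helpers, and the sample lines are illustrative rather than captured output.

```python
from typing import Optional


def parse_count_line(line: str) -> Optional[dict[str, str]]:
    """Parse one 'count <table> key=value ...' line; return None for other lines."""
    parts = line.split()
    if len(parts) < 2 or parts[0] != "count":
        return None
    record = {"table_name": parts[1]}
    for token in parts[2:]:
        key, sep, value = token.partition("=")
        if sep:
            record[key] = value
    return record


def global_count_tables(records: list[dict[str, str]]) -> list[str]:
    """Tables whose counts may be read as global (not filtered by RLS)."""
    return [r["table_name"] for r in records if r.get("scope") == "global_visible"]


# Illustrative lines only; real output depends on the environment.
SAMPLE_LINES = [
    "count awooop_mcp_tool_registry scope=rls_filtered project_context=ewoooc total_rows=4 null_project_id_rows=0",
    "count awooop_projects scope=global_visible project_context=None total_rows=2 null_project_id_rows=0",
    "PASS project_id_backfill",
]

records = [r for r in (parse_count_line(line) for line in SAMPLE_LINES) if r is not None]
print(global_count_tables(records))
```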
## 2026-05-12 Initial Production Result
`--exact-counts` returned:


@@ -0,0 +1,14 @@
-- Rollback for AwoooP RLS Canary Wave 1.1.
-- This only removes the wave1.1 policy and disables RLS on the tool registry.
-- It intentionally does not touch data.
BEGIN;
SET LOCAL lock_timeout = '5s';
SET LOCAL statement_timeout = '30s';
DROP POLICY IF EXISTS awooop_mcp_tool_registry_tenant ON awooop_mcp_tool_registry;
ALTER TABLE awooop_mcp_tool_registry NO FORCE ROW LEVEL SECURITY;
ALTER TABLE awooop_mcp_tool_registry DISABLE ROW LEVEL SECURITY;
COMMIT;


@@ -0,0 +1,72 @@
-- AwoooP RLS Canary Wave 1.1: low-row MCP tool registry
-- Date: 2026-05-12
--
-- Scope:
--   - awooop_mcp_tool_registry
--
-- Why this table:
--   Latest production exact count: 4 rows, all project_id='ewoooc'.
--   Runtime read path is MCP Gateway Gate 3 and filters by ctx.project_id.
--
-- Why not awooop_projects in this wave:
--   Operator Console list_tenants() currently expects cross-tenant visibility.
--   A normal tenant policy would hide ewoooc when context is awoooi, so
--   awooop_projects remains blocked until an explicit platform-admin DB path
--   exists.
--
-- Safety:
--   - fail-closed policy only; no NULL/empty-string app.project_id bypass.
--   - aborts if target is missing project_id, has NULL project_id, or has
--     more rows than the reviewed canary cap.
--   - run with a migration/operator role, not through the production app role.
BEGIN;
SET LOCAL lock_timeout = '5s';
SET LOCAL statement_timeout = '30s';
DO $$
DECLARE
  total_rows bigint;
  null_project_rows bigint;
BEGIN
  IF to_regclass('public.awooop_mcp_tool_registry') IS NULL THEN
    RAISE EXCEPTION 'RLS canary target table does not exist: awooop_mcp_tool_registry';
  END IF;
  IF NOT EXISTS (
    SELECT 1
    FROM information_schema.columns
    WHERE table_schema = 'public'
      AND table_name = 'awooop_mcp_tool_registry'
      AND column_name = 'project_id'
  ) THEN
    RAISE EXCEPTION 'RLS canary target missing project_id: awooop_mcp_tool_registry';
  END IF;
  SELECT COUNT(*), COUNT(*) FILTER (WHERE project_id IS NULL)
    INTO total_rows, null_project_rows
  FROM awooop_mcp_tool_registry;
  IF null_project_rows <> 0 THEN
    RAISE EXCEPTION 'RLS canary target has NULL project_id rows: %, nulls=%',
      'awooop_mcp_tool_registry', null_project_rows;
  END IF;
  IF total_rows > 20 THEN
    RAISE EXCEPTION 'RLS canary wave1.1 reviewed cap exceeded: %, rows=%',
      'awooop_mcp_tool_registry', total_rows;
  END IF;
END
$$;
ALTER TABLE awooop_mcp_tool_registry ENABLE ROW LEVEL SECURITY;
ALTER TABLE awooop_mcp_tool_registry FORCE ROW LEVEL SECURITY;
DROP POLICY IF EXISTS mcp_tool_registry_tenant_isolation ON awooop_mcp_tool_registry;
DROP POLICY IF EXISTS awooop_mcp_tool_registry_tenant ON awooop_mcp_tool_registry;
CREATE POLICY awooop_mcp_tool_registry_tenant ON awooop_mcp_tool_registry
  FOR ALL TO awooop_app
  USING (project_id = current_setting('app.project_id', TRUE))
  WITH CHECK (project_id = current_setting('app.project_id', TRUE));
COMMIT;


@@ -273,16 +273,35 @@ async def collect(exact_counts: bool) -> tuple[list[Check], dict[str, Any]]:
             quoted = '"' + row["table_name"].replace('"', '""') + '"'
             count_row = await rows(
                 conn,
-                f"SELECT :table_name AS table_name, COUNT(*) AS total_rows, COUNT(*) FILTER (WHERE project_id IS NULL) AS null_project_id_rows FROM {quoted}",
-                {"table_name": row["table_name"]},
+                f"""
+                SELECT
+                    :table_name AS table_name,
+                    CAST(:rls_filtered AS boolean) AS rls_filtered,
+                    current_setting('app.project_id', TRUE) AS project_context,
+                    COUNT(*) AS total_rows,
+                    COUNT(*) FILTER (WHERE project_id IS NULL) AS null_project_id_rows
+                FROM {quoted}
+                """,
+                {
+                    "table_name": row["table_name"],
+                    "rls_filtered": bool(row["rls_enabled"]),
+                },
             )
             exact_rows.extend(count_row)
         evidence["exact_counts"] = exact_rows
         null_tables = [row["table_name"] for row in exact_rows if int(row["null_project_id_rows"]) > 0]
+        rls_filtered_tables = [row["table_name"] for row in exact_rows if row.get("rls_filtered")]
         if null_tables:
             add(checks, "project_id_backfill", "BLOCKED", f"NULL project_id remains: {', '.join(null_tables)}")
         else:
             add(checks, "project_id_backfill", "PASS", "no NULL project_id rows in counted tables")
+        if rls_filtered_tables:
+            add(
+                checks,
+                "exact_counts_scope",
+                "WARN",
+                "counts for RLS-enabled tables are tenant-visible only; use operator role for global counts",
+            )
     else:
         add(checks, "project_id_backfill", "WARN", "exact counts skipped; rerun with --exact-counts before enabling RLS")
@@ -325,9 +344,12 @@ def print_human(checks: list[Check], evidence: dict[str, Any]) -> None:
     )
     for row in evidence.get("exact_counts", []):
+        scope = "rls_filtered" if row.get("rls_filtered") else "global_visible"
         print(
             "count "
             f"{row['table_name']} "
+            f"scope={scope} "
+            f"project_context={row.get('project_context')} "
             f"total_rows={row['total_rows']} "
             f"null_project_id_rows={row['null_project_id_rows']}"
         )