diff --git a/docs/LOGBOOK.md b/docs/LOGBOOK.md
index b424c1ed..6c3f2e7c 100644
--- a/docs/LOGBOOK.md
+++ b/docs/LOGBOOK.md
@@ -1,3 +1,62 @@
+## 2026-05-12 | RLS Canary Wave1.1 applied
+
+**Background**: With the Wave1 empty-table canary complete, the next candidates are low-row non-empty tables. Live preflight shows `awooop_projects=2 rows` and `awooop_mcp_tool_registry=4 rows`; this round inventories the read paths before deciding the scope.
+
+**Scope adjustment**:
+- `awooop_projects` is deferred:
+  - `platform_operator_service.list_tenants()` currently uses `get_db_context("awoooi")`, but the API contract states that the Operator Console returns all projects.
+  - Enabling a tenant policy directly would hide the `ewoooc` row under the `awoooi` context and break the Operator Console cross-tenant view.
+  - A platform-admin/bypass DB path must be built first, or the list-tenants semantics must be redefined.
+- `awooop_mcp_tool_registry` is included in Wave1.1:
+  - live data: `ewoooc=4`.
+  - runtime read path: `McpGateway._gate3_tool()` queries by `ctx.project_id` + `tool_name` + `is_active`.
+
+**New/updated artifacts**:
+- Added apply / rollback SQL:
+  - `scripts/ops/awooop-rls-canary-wave1-1-tool-registry.sql`
+  - `scripts/ops/awooop-rls-canary-wave1-1-tool-registry-rollback.sql`
+- Added `docs/runbooks/AWOOOP-RLS-CANARY-WAVE1-1.md`.
+- Updated `scripts/ops/awooop_rls_preflight.py`:
+  - `--exact-counts` now reports `scope=rls_filtered|global_visible` and `project_context`.
+  - When any RLS-enabled table is present, it adds `WARN exact_counts_scope` so that an app-role tenant-visible count is not misread as a global count.
+
+**Production apply**:
+- Synced to 188 `/home/ollama/awoooi-ops/`:
+  - `awooop-rls-canary-wave1-1-tool-registry.sql`
+  - `awooop-rls-canary-wave1-1-tool-registry-rollback.sql`
+- Executed through the postgres/operator socket path:
+  - Docker image: `pgvector/pgvector:pg14`
+  - UID/GID: `115:121` (`postgres:postgres`)
+  - DB: `awoooi_prod`
+- Apply result: `COMMIT`; `awooop_mcp_tool_registry` now has `ENABLE ROW LEVEL SECURITY` + `FORCE ROW LEVEL SECURITY` + a fail-closed `FOR ALL TO awooop_app` policy.
+
+**Post-apply verification**:
+- `awooop_mcp_tool_registry` → `rls=true force=true policies=1 fail_open=false`.
+- API pod behavior test:
+  - `tool_registry_no_context=0`
+  - `tool_registry_ewoooc_context=4`
+  - `tool_registry_awoooi_context=0`
+  - `tool_registry_insert_with_context=allowed_and_rolled_back`
+  - `tool_registry_probe_rows_after=0`
+- operator/global count → `ewoooc=4`.
+- production health `/api/v1/health` → 200 healthy.
+- runtime/manual audits unchanged:
+  - runtime access audit: `BLOCKED=0 ALLOW=10`
+  - manual script audit: `BLOCKED=0 REVIEW=5 PASS=13`
+- preflight status:
+  - `PASS=7 WARN=1 BLOCKED=1`
+  - `WARN exact_counts_scope` is an expected warning: for RLS-enabled tables, the API pod can only produce tenant-visible counts.
+  - Remaining blocker tables: `audit_logs`, `awooop_outbound_message`, `awooop_projects`, `awooop_run_state`, `incidents`, `knowledge_entries`, `playbooks`.
+
+**Overall progress**:
+- Wave 0: MOMO PostgreSQL backup → AwoooP failure-notification wiring complete.
+- Wave 1: completed: competing GitHub deploy disabled, RLS live verification, role bootstrap, API runtime access path, manual script gate, Wave1 empty-table canary, Wave1.1 MCP tool registry canary.
+- Not yet done: token rotation, a proper certbot fix on 188, the remaining RLS waves, and the shutdown window for local Ollama on 188.
+
+**Next steps**:
+- First fix the platform-admin read path for `awooop_projects`, then consider enabling RLS on projects.
+- The next RLS batch should not jump straight to high-traffic tables; start with query-path review and rollback rehearsal for `awooop_outbound_message` / `awooop_run_state`, keeping in mind that both tables continuously receive new rows.
+
 ## 2026-05-12 | RLS Canary Wave1 applied
 
 **Background**: The previous round produced `scripts/ops/awooop-rls-canary-wave1-empty-tables.sql` and its rollback SQL. After user approval, this round applied the Wave1 canary policy only to the six tables that live preflight showed as empty, leaving high-traffic or non-empty tables such as `incidents` / `knowledge_entries` / `playbooks` / `audit_logs` untouched.
diff --git a/docs/runbooks/AWOOOP-RLS-CANARY-WAVE1-1.md b/docs/runbooks/AWOOOP-RLS-CANARY-WAVE1-1.md
new file mode 100644
index 00000000..aa91ea96
--- /dev/null
+++ b/docs/runbooks/AWOOOP-RLS-CANARY-WAVE1-1.md
@@ -0,0 +1,99 @@
+# AwoooP RLS Canary Wave 1.1
+
+This wave targets one low-row non-empty table:
+
+- `awooop_mcp_tool_registry`
+
+It does not include `awooop_projects`.
+
+Status: applied to production on 2026-05-12.
+
+## Why Not `awooop_projects`
+
+`awooop_projects` has only two rows, but it is not safe to enable with a normal
+tenant-only policy yet. `platform_operator_service.list_tenants()` currently
+uses `get_db_context("awoooi")` while the API contract says Operator Console
+returns all projects. A tenant policy on `awooop_projects` would hide the
+`ewoooc` row from that endpoint.
+
+Blocker before enabling `awooop_projects`:
+
+- introduce an explicit platform-admin/bypass role path for Operator Console
+  cross-tenant reads, or
+- redesign list-tenants semantics so it is intentionally tenant-scoped.
+
+## Scope
+
+Latest live evidence before apply:
+
+```text
+awooop_mcp_tool_registry: ewoooc=4
+```
+
+Runtime read path:
+
+- `McpGateway._gate3_tool()` filters by `ctx.project_id`, `tool_name`, and
+  `is_active`.
+
+## Apply
+
+```bash
+psql "$DATABASE_URL" -v ON_ERROR_STOP=1 \
+  -f scripts/ops/awooop-rls-canary-wave1-1-tool-registry.sql
+```
+
+The SQL aborts if:
+
+- table is missing,
+- `project_id` is missing,
+- any `project_id` is NULL,
+- row count exceeds the reviewed canary cap of 20 rows.
+
+## Verification
+
+Expected after apply:
+
+- no context: for `awooop_app`, reads, updates, and deletes match no rows, and
+  inserts are rejected by the policy's `WITH CHECK` clause.
+- `app.project_id='ewoooc'`: the four active ewoooc tools are visible.
+- `app.project_id='awoooi'`: no ewoooc tools are visible.
+- global RLS preflight remains blocked only by later-wave tables.
+
+## 2026-05-12 Production Evidence
+
+Apply completed with `COMMIT` through the 188 postgres/operator socket path.
+
+Post-apply relation state:
+
+```text
+awooop_mcp_tool_registry|rls=true|force=true|policies=1|fail_open=false
+```
+
+API pod behavior test:
+
+```text
+tool_registry_no_context=0
+tool_registry_ewoooc_context=4
+tool_registry_awoooi_context=0
+tool_registry_insert_with_context=allowed_and_rolled_back
+tool_registry_probe_rows_after=0
+```
+
+Operator/global count:
+
+```text
+ewoooc|4
+```
+
+Note: after RLS is enabled, API-pod `--exact-counts` are tenant-visible counts,
+not global counts. Use postgres/operator evidence for global row-count checks.
+
+## Rollback
+
+```bash
+psql "$DATABASE_URL" -v ON_ERROR_STOP=1 \
+  -f scripts/ops/awooop-rls-canary-wave1-1-tool-registry-rollback.sql
+```
+
+Rollback drops the Wave1.1 policy, clears FORCE, and disables RLS on
+`awooop_mcp_tool_registry`. It does not modify data.
diff --git a/docs/runbooks/AWOOOP-RLS-PREFLIGHT.md b/docs/runbooks/AWOOOP-RLS-PREFLIGHT.md
index 8cb2a184..fa0b3392 100644
--- a/docs/runbooks/AWOOOP-RLS-PREFLIGHT.md
+++ b/docs/runbooks/AWOOOP-RLS-PREFLIGHT.md
@@ -28,6 +28,19 @@ AWOOOP_RLS_SSH_TARGET=wooo@192.168.0.120 bash scripts/ops/awooop-rls-preflight.s
 
 Exit code `2` means the gate is blocked and RLS must not be enabled yet.
 
+## Exact Count Scope
+
+Once a target table has RLS enabled, `--exact-counts` (run as the production app
+DB user) sees only that table's rows matching the current `app.project_id`. The
+output marks these rows with:
+
+```text
+scope=rls_filtered project_context=...
+```
+
+Treat those counts as tenant-visible evidence, not global row counts. Use a
+reviewed postgres/operator path for global counts after RLS is enabled.
+
 ## 2026-05-12 Initial Production Result
 
 `--exact-counts` returned:
diff --git a/scripts/ops/awooop-rls-canary-wave1-1-tool-registry-rollback.sql b/scripts/ops/awooop-rls-canary-wave1-1-tool-registry-rollback.sql
new file mode 100644
index 00000000..5fb56c23
--- /dev/null
+++ b/scripts/ops/awooop-rls-canary-wave1-1-tool-registry-rollback.sql
@@ -0,0 +1,14 @@
+-- Rollback for AwoooP RLS Canary Wave 1.1.
+-- This only removes the wave1.1 policy and disables RLS on the tool registry.
+-- It intentionally does not touch data.
+
+BEGIN;
+
+SET LOCAL lock_timeout = '5s';
+SET LOCAL statement_timeout = '30s';
+
+DROP POLICY IF EXISTS awooop_mcp_tool_registry_tenant ON awooop_mcp_tool_registry;
+ALTER TABLE awooop_mcp_tool_registry NO FORCE ROW LEVEL SECURITY;
+ALTER TABLE awooop_mcp_tool_registry DISABLE ROW LEVEL SECURITY;
+
+COMMIT;
diff --git a/scripts/ops/awooop-rls-canary-wave1-1-tool-registry.sql b/scripts/ops/awooop-rls-canary-wave1-1-tool-registry.sql
new file mode 100644
index 00000000..ba64625d
--- /dev/null
+++ b/scripts/ops/awooop-rls-canary-wave1-1-tool-registry.sql
@@ -0,0 +1,72 @@
+-- AwoooP RLS Canary Wave 1.1: low-row MCP tool registry
+-- Date: 2026-05-12
+--
+-- Scope:
+--   - awooop_mcp_tool_registry
+--
+-- Why this table:
+--   Latest production exact count: 4 rows, all project_id='ewoooc'.
+--   Runtime read path is MCP Gateway Gate 3 and filters by ctx.project_id.
+--
+-- Why not awooop_projects in this wave:
+--   Operator Console list_tenants() currently expects cross-tenant visibility.
+--   A normal tenant policy would hide ewoooc when context is awoooi, so
+--   awooop_projects remains blocked until an explicit platform-admin DB path
+--   exists.
+--
+-- Safety:
+--   - fail-closed policy only; no NULL/empty-string app.project_id bypass.
+--   - aborts if target is missing project_id, has NULL project_id, or has
+--     more rows than the reviewed canary cap.
+--   - run with a migration/operator role, not through the production app role.
+
+BEGIN;
+
+SET LOCAL lock_timeout = '5s';
+SET LOCAL statement_timeout = '30s';
+
+DO $$
+DECLARE
+    total_rows bigint;
+    null_project_rows bigint;
+BEGIN
+    IF to_regclass('public.awooop_mcp_tool_registry') IS NULL THEN
+        RAISE EXCEPTION 'RLS canary target table does not exist: awooop_mcp_tool_registry';
+    END IF;
+
+    IF NOT EXISTS (
+        SELECT 1
+        FROM information_schema.columns
+        WHERE table_schema = 'public'
+          AND table_name = 'awooop_mcp_tool_registry'
+          AND column_name = 'project_id'
+    ) THEN
+        RAISE EXCEPTION 'RLS canary target missing project_id: awooop_mcp_tool_registry';
+    END IF;
+
+    SELECT COUNT(*), COUNT(*) FILTER (WHERE project_id IS NULL)
+    INTO total_rows, null_project_rows
+    FROM awooop_mcp_tool_registry;
+
+    IF null_project_rows <> 0 THEN
+        RAISE EXCEPTION 'RLS canary target has NULL project_id rows: %, nulls=%',
+            'awooop_mcp_tool_registry', null_project_rows;
+    END IF;
+
+    IF total_rows > 20 THEN
+        RAISE EXCEPTION 'RLS canary wave1.1 reviewed cap exceeded: %, rows=%',
+            'awooop_mcp_tool_registry', total_rows;
+    END IF;
+END
+$$;
+
+ALTER TABLE awooop_mcp_tool_registry ENABLE ROW LEVEL SECURITY;
+ALTER TABLE awooop_mcp_tool_registry FORCE ROW LEVEL SECURITY;
+DROP POLICY IF EXISTS mcp_tool_registry_tenant_isolation ON awooop_mcp_tool_registry;
+DROP POLICY IF EXISTS awooop_mcp_tool_registry_tenant ON awooop_mcp_tool_registry;
+CREATE POLICY awooop_mcp_tool_registry_tenant ON awooop_mcp_tool_registry
+    FOR ALL TO awooop_app
+    USING (project_id = current_setting('app.project_id', TRUE))
+    WITH CHECK (project_id = current_setting('app.project_id', TRUE));
+
+COMMIT;
diff --git a/scripts/ops/awooop_rls_preflight.py b/scripts/ops/awooop_rls_preflight.py
index 63448fdd..24441f47 100755
--- a/scripts/ops/awooop_rls_preflight.py
+++ b/scripts/ops/awooop_rls_preflight.py
@@ -273,16 +273,35 @@ async def collect(exact_counts: bool) -> tuple[list[Check], dict[str, Any]]:
             quoted = '"' + row["table_name"].replace('"', '""') + '"'
             count_row = await rows(
                 conn,
-                f"SELECT :table_name AS table_name, COUNT(*) AS total_rows, COUNT(*) FILTER (WHERE project_id IS NULL) AS null_project_id_rows FROM {quoted}",
-                {"table_name": row["table_name"]},
+                f"""
+                SELECT
+                    :table_name AS table_name,
+                    CAST(:rls_filtered AS boolean) AS rls_filtered,
+                    current_setting('app.project_id', TRUE) AS project_context,
+                    COUNT(*) AS total_rows,
+                    COUNT(*) FILTER (WHERE project_id IS NULL) AS null_project_id_rows
+                FROM {quoted}
+                """,
+                {
+                    "table_name": row["table_name"],
+                    "rls_filtered": bool(row["rls_enabled"]),
+                },
             )
             exact_rows.extend(count_row)
         evidence["exact_counts"] = exact_rows
         null_tables = [row["table_name"] for row in exact_rows if int(row["null_project_id_rows"]) > 0]
+        rls_filtered_tables = [row["table_name"] for row in exact_rows if row.get("rls_filtered")]
         if null_tables:
             add(checks, "project_id_backfill", "BLOCKED", f"NULL project_id remains: {', '.join(null_tables)}")
         else:
             add(checks, "project_id_backfill", "PASS", "no NULL project_id rows in counted tables")
+        if rls_filtered_tables:
+            add(
+                checks,
+                "exact_counts_scope",
+                "WARN",
+                "counts for RLS-enabled tables are tenant-visible only; use operator role for global counts",
+            )
     else:
         add(checks, "project_id_backfill", "WARN", "exact counts skipped; rerun with --exact-counts before enabling RLS")
 
@@ -325,9 +344,12 @@ def print_human(checks: list[Check], evidence: dict[str, Any]) -> None:
         )
 
     for row in evidence.get("exact_counts", []):
+        scope = "rls_filtered" if row.get("rls_filtered") else "global_visible"
         print(
             "count "
             f"{row['table_name']} "
+            f"scope={scope} "
+            f"project_context={row.get('project_context')} "
             f"total_rows={row['total_rows']} "
             f"null_project_id_rows={row['null_project_id_rows']}"
         )
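The 0/4/0 pattern in the API pod behavior test follows from how the Wave1.1 policy predicate `project_id = current_setting('app.project_id', TRUE)` behaves under SQL three-valued logic. The sketch below is a hypothetical Python model of that predicate, not code from this change: `sql_eq`, `visible_rows`, `insert_allowed`, and the sample tool names are illustrative assumptions. It models an unset `app.project_id` as `None`, standing in for the NULL that `current_setting(..., TRUE)` returns when the setting does not exist.

```python
# Hypothetical model of the Wave1.1 RLS predicate (not production code):
#   USING      (project_id = current_setting('app.project_id', TRUE))
#   WITH CHECK (project_id = current_setting('app.project_id', TRUE))
# PostgreSQL only lets a row through when the expression is TRUE;
# both FALSE and NULL hide the row (USING) or reject it (WITH CHECK).
from typing import Optional


def sql_eq(left: Optional[str], right: Optional[str]) -> Optional[bool]:
    """SQL '=' under three-valued logic: NULL if either operand is NULL."""
    if left is None or right is None:
        return None
    return left == right


def visible_rows(rows: list[dict], app_project_id: Optional[str]) -> list[dict]:
    """Rows awooop_app can see: only rows where the predicate IS TRUE."""
    return [r for r in rows if sql_eq(r["project_id"], app_project_id) is True]


def insert_allowed(new_row: dict, app_project_id: Optional[str]) -> bool:
    """WITH CHECK: a new row is accepted only if the predicate IS TRUE."""
    return sql_eq(new_row["project_id"], app_project_id) is True


# Illustrative data shaped like the live evidence (four ewoooc tools);
# the tool names are made up for this sketch.
registry = [{"project_id": "ewoooc", "tool_name": f"example_tool_{i}"} for i in range(4)]

# None = context never set; "" = empty string, which also matches no row.
for context in (None, "", "ewoooc", "awoooi"):
    print(repr(context), len(visible_rows(registry, context)))
# None 0
# '' 0
# 'ewoooc' 4
# 'awoooi' 0
```

Because the predicate can only be TRUE when the context names the row's own project, a missing or empty `app.project_id` hides every row instead of exposing all of them, which is the sense in which the policy is fail-closed.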
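The runbook's abort conditions (missing table, missing `project_id`, NULL `project_id`, more than 20 rows) correspond to the `DO $$ ... $$` guard in the apply script. The following Python sketch restates that guard as plain checks so the order and the threshold are easy to review. `TableSnapshot`, `CanaryAbort`, and `check_canary_target` are hypothetical names, and the snapshot stands in for the catalog lookup and `COUNT(*)` query that the real block runs. In the SQL, any `RAISE EXCEPTION` aborts the enclosing transaction, so no `ALTER TABLE` or `CREATE POLICY` statement takes effect.

```python
# Hypothetical restatement of the Wave1.1 DO-block guard (not production code).
from dataclasses import dataclass
from typing import Optional

CANARY_ROW_CAP = 20  # the SQL aborts when total_rows > 20


class CanaryAbort(Exception):
    """Stands in for RAISE EXCEPTION inside the DO block."""


@dataclass
class TableSnapshot:
    # Illustrative stand-in for what the DO block reads from the database.
    exists: bool
    has_project_id: bool
    project_ids: list[Optional[str]]


def check_canary_target(name: str, snap: TableSnapshot) -> None:
    """Raise CanaryAbort in the same order as the SQL checks; return if safe."""
    if not snap.exists:
        raise CanaryAbort(f"RLS canary target table does not exist: {name}")
    if not snap.has_project_id:
        raise CanaryAbort(f"RLS canary target missing project_id: {name}")
    total_rows = len(snap.project_ids)
    null_project_rows = sum(1 for pid in snap.project_ids if pid is None)
    if null_project_rows != 0:
        raise CanaryAbort(
            f"RLS canary target has NULL project_id rows: {name}, nulls={null_project_rows}"
        )
    if total_rows > CANARY_ROW_CAP:
        raise CanaryAbort(
            f"RLS canary wave1.1 reviewed cap exceeded: {name}, rows={total_rows}"
        )


# Example: a snapshot shaped like the live evidence passes the guard.
check_canary_target(
    "awooop_mcp_tool_registry",
    TableSnapshot(exists=True, has_project_id=True, project_ids=["ewoooc"] * 4),
)
```

Checking NULLs before the row cap mirrors the SQL, so a table that fails both checks reports the NULL problem first.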