awoooi/docs/runbooks/AWOOOP-RLS-PREFLIGHT.md
chore(rls): apply tool registry canary wave1.1
2026-05-12 21:15:14 +08:00

# AwoooP RLS Preflight Runbook
> Purpose: verify whether production is ready for PostgreSQL Row-Level Security
> without enabling RLS or changing data.
## Command
The default path runs the probe inside the production API pod, reached through
the `.120` control-plane host. `DATABASE_URL` stays inside Kubernetes and is never printed.
```bash
bash scripts/ops/awooop-rls-preflight.sh
```
Before enabling RLS, run exact backfill counts:
```bash
bash scripts/ops/awooop-rls-preflight.sh --exact-counts
```
Useful variants:
```bash
bash scripts/ops/awooop-rls-preflight.sh --json
bash scripts/ops/awooop-rls-preflight.sh --local
AWOOOP_RLS_SSH_TARGET=wooo@192.168.0.120 bash scripts/ops/awooop-rls-preflight.sh
```
Exit code `2` means the gate is blocked and RLS must not be enabled yet.
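
A wrapper can turn that exit code into an explicit gate decision before any RLS change is attempted. The sketch below is illustrative only: `rls_gate` is a hypothetical helper, and only exit code `2` is documented above. Treating `0` as clear and any other code as an error is an assumption of this sketch.

```shell
# Hypothetical helper: run the given preflight command and print a verdict.
# Only exit code 2 (blocked) is documented; the meaning assigned to 0 and
# to other codes here is an assumption.
rls_gate() {
  local rc=0
  "$@" || rc=$?
  case "$rc" in
    0) echo "CLEAR" ;;
    2) echo "BLOCKED" ;;
    *) echo "ERROR" ;;
  esac
  return "$rc"
}

# Example, commented out so the snippet can be sourced safely:
# rls_gate bash scripts/ops/awooop-rls-preflight.sh --exact-counts || exit 1
```

The verdict is printed after the command's own output, so it appears as the last line, and the original exit code is passed through to the caller.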
## Exact Count Scope
After any target table has RLS enabled, `--exact-counts` runs as the production
app DB user and is filtered by the current `app.project_id`. The output marks
these rows with:
```text
scope=rls_filtered project_context=...
```
Treat those counts as tenant-visible evidence, not global row counts. Use a
reviewed postgres/operator path for global counts after RLS is enabled.
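
To avoid mistaking filtered counts for global ones, saved preflight output can be scanned for the marker. This is a minimal sketch: `rls_filtered_contexts` is a hypothetical helper, and it assumes the marker appears on one line exactly as shown above, with a `project_context` value that contains no whitespace.

```shell
# Hypothetical helper: read preflight output on stdin and print the distinct
# project_context values found on RLS-filtered count lines.
# Assumes the "scope=rls_filtered project_context=<value>" layout shown above.
rls_filtered_contexts() {
  sed -n 's/.*scope=rls_filtered project_context=\([^[:space:]]*\).*/\1/p' | sort -u
}

# Example, commented out so the snippet can be sourced safely:
# bash scripts/ops/awooop-rls-preflight.sh --exact-counts | rls_filtered_contexts
```

Any output at all means at least some counts in that run were tenant-scoped.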
## 2026-05-12 Initial Production Result
`--exact-counts` returned:
- `PASS current_role_rls_enforced`: current DB user is `awoooi`, not superuser and not `BYPASSRLS`.
- `PASS project_context_set_config`: `set_config('app.project_id', 'awoooi', TRUE)` works in the API pod.
- `BLOCKED required_roles`: `awooop_app`, `awooop_platform_admin`, and `awooop_migration` do not exist.
- `WARN role_bootstrap_authority`: current API DB user `awoooi` is not `CREATEROLE`; role bootstrap requires `postgres` or a `CREATEROLE` operator.
- `WARN app_role_membership`: `awooop_app` is missing, so membership cannot be evaluated yet.
- `PASS project_id_columns`: every existing target table has `project_id`.
- `BLOCKED rls_enabled_forced_policy`: existing target tables are not yet RLS enabled, forced, or policied.
- `PASS fail_open_policies`: production DB currently has no fail-open policy expressions.
- `PASS project_id_backfill`: exact counts found zero `NULL project_id` rows in counted target tables.

Current blocker summary:
```text
PASS=5 WARN=2 BLOCKED=2
```
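
The summary can be cross-checked by tallying the status word of each check. The sketch below uses a hypothetical helper, `tally_checks`, and assumes each check is printed on its own line beginning with `PASS`, `WARN`, or `BLOCKED`, which the script's real output format may not match.

```shell
# Hypothetical helper: count check lines by their leading status word.
# Assumes one check per line, starting with PASS, WARN, or BLOCKED.
tally_checks() {
  awk '
    $1 == "PASS" || $1 == "WARN" || $1 == "BLOCKED" { n[$1]++ }
    END { printf "PASS=%d WARN=%d BLOCKED=%d\n", n["PASS"], n["WARN"], n["BLOCKED"] }
  '
}

# The nine checks from the initial production result, status and name only.
tally_checks <<'EOF'
PASS current_role_rls_enforced
PASS project_context_set_config
BLOCKED required_roles
WARN role_bootstrap_authority
WARN app_role_membership
PASS project_id_columns
BLOCKED rls_enabled_forced_policy
PASS fail_open_policies
PASS project_id_backfill
EOF
```

Fed the nine check lines listed above, the tally reproduces the `PASS=5 WARN=2 BLOCKED=2` summary.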
Important exact counts from the same run:
| Table | Rows | NULL project_id |
| --- | ---: | ---: |
| `audit_logs` | 686 | 0 |
| `awooop_mcp_tool_registry` | 4 | 0 |
| `awooop_outbound_message` | 235 | 0 |
| `awooop_projects` | 2 | 0 |
| `awooop_run_state` | 113 | 0 |
| `incidents` | 1518 | 0 |
| `knowledge_entries` | 2102 | 0 |
| `playbooks` | 220 | 0 |
## 2026-05-12 Role Bootstrap Applied
At `19:33 CST`, the manual role bootstrap was applied through the host
PostgreSQL socket as `postgres`. It did not enable RLS policies.
Post-bootstrap `--exact-counts` returned:
- `PASS current_role_rls_enforced`: current DB user is still `awoooi`, not
superuser and not `BYPASSRLS`.
- `PASS project_context_set_config`: `set_config('app.project_id', 'awoooi', TRUE)` works.
- `PASS required_roles`: `awooop_app`, `awooop_platform_admin`, and
`awooop_migration` now exist.
- `PASS app_role_membership`: current API DB user is a member of `awooop_app`.
- `PASS project_id_columns`: every existing target table has `project_id`.
- `BLOCKED rls_enabled_forced_policy`: target tables are still not RLS enabled,
forced, or policied.
- `PASS fail_open_policies`: no fail-open policy expressions detected.
- `PASS project_id_backfill`: exact counts found zero `NULL project_id` rows in
counted target tables.

Current blocker summary:
```text
PASS=7 WARN=0 BLOCKED=1
```
Updated exact counts:
| Table | Rows | NULL project_id |
| --- | ---: | ---: |
| `audit_logs` | 686 | 0 |
| `awooop_mcp_tool_registry` | 4 | 0 |
| `awooop_outbound_message` | 248 | 0 |
| `awooop_projects` | 2 | 0 |
| `awooop_run_state` | 126 | 0 |
| `incidents` | 1524 | 0 |
| `knowledge_entries` | 2103 | 0 |
| `playbooks` | 220 | 0 |
## Remediation Order
1. Verify all DB access paths use `get_db()` / `get_db_context()` or otherwise set
`app.project_id` before queries.
2. Apply policies first in staging or a canary DB.
3. In production, enable one batch at a time.
4. After each batch, run:
```bash
bash scripts/ops/awooop-rls-preflight.sh --exact-counts
```
5. Validate AwoooP Runs, Approvals, Monitoring, Tickets, Cost, alert ingestion,
background workers, and TelegramGateway mirror paths.
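
For step 4, it can help to keep each post-batch preflight output so results can be compared across batches. The following is a sketch only: `record_preflight`, the `RLS_LOG_DIR` variable, and the log file naming are all hypothetical and not part of the existing scripts.

```shell
# Hypothetical helper: run a command after a batch and keep its output.
# Usage: record_preflight <batch-label> <command> [args...]
# RLS_LOG_DIR and the file naming scheme are assumptions of this sketch.
record_preflight() {
  local label="$1"; shift
  local log_dir="${RLS_LOG_DIR:-./rls-preflight-logs}"
  local log_file rc=0
  mkdir -p "$log_dir"
  log_file="$log_dir/$(date -u +%Y%m%dT%H%M%SZ)-$label.log"
  # Capture stdout and stderr, then append the exit code for later review.
  "$@" >"$log_file" 2>&1 || rc=$?
  echo "exit=$rc" >>"$log_file"
  echo "$log_file"
  return "$rc"
}

# Example, commented out so the snippet can be sourced safely:
# record_preflight batch1 bash scripts/ops/awooop-rls-preflight.sh --exact-counts
```

The helper prints the log path and passes the command's exit code through, so a blocked gate (exit code `2`) stays visible to the caller.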
## Do Not
- Do not enable all policies in production before the role path is decided.
- Do not rely on fail-open `IS NULL` or empty-string policies as the target state.
- Do not run destructive rollback SQL unless the incident commander explicitly
approves it.
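
As a rough illustration of the second point, policy expressions exported as text can be scanned for the two fail-open shapes named above. This is a sketch under assumptions: `find_fail_open_policies` is a hypothetical helper, its pattern is a heuristic, and it is not the detection logic used by the `fail_open_policies` check.

```shell
# Hypothetical helper: read policy expressions on stdin (one policy per line)
# and print those containing an IS NULL test or a comparison with an empty string.
# Heuristic only; a match needs human review before it is called fail-open.
find_fail_open_policies() {
  grep -Ei "is[[:space:]]+null|=[[:space:]]*''" || true
}

# Example input, assuming a reviewed operator path with psql access
# (pg_policies is the standard PostgreSQL catalog view):
# psql -At -F'|' -c "SELECT tablename, policyname, qual, with_check FROM pg_policies" \
#   | find_fail_open_policies
```

An empty result does not prove the policies are safe; it only means neither pattern appeared in the exported text.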