diff --git a/docs/LOGBOOK.md b/docs/LOGBOOK.md
index ec0210aa..72f2f831 100644
--- a/docs/LOGBOOK.md
+++ b/docs/LOGBOOK.md
@@ -1,3 +1,52 @@
+## 2026-05-12 | 188 Ollama Gate Green and RLS Role Bootstrap Design
+
+**Background**: Wave 1 still has two items that can be closed out: whether the 188 local Ollama still has any direct caller, and how to safely add the missing RLS roles. The principle is unchanged: verify and prepare only; do not uninstall the 188 Ollama directly, and do not hot-enable RLS in production.
+
+**188 Ollama retirement gate**:
+- Ran `POST_SINCE='24 hours ago' HEALTH_SINCE='10 minutes ago' scripts/ops/ollama188-retirement-gate.sh`.
+- Result: `failures=0 warnings=0`.
+- PASS items:
+  - repo runtime no longer references `192.168.0.188:11434` / `ollama_188`.
+  - `awoooi-prod` live env: GCP-A `34.143.170.20`, GCP-B `34.21.145.224`, local fallback `192.168.0.111`; none of them point to 188.
+  - `awoooi-dev` live env: goes through the 110 proxy `11435/11436/11437`; does not point to 188.
+  - Prometheus live config no longer has a 188 Ollama target.
+  - 188 `ollama.service` is active, but `OLLAMA_HOST=127.0.0.1:11434`; LAN `192.168.0.188:11434` is refused.
+  - No `/api/generate`, `/api/chat`, or `/v1/chat/completions` inference POSTs in the last 24 hours.
+  - No recent 121/dev health check hitting 188 was observed.
+- Assessment: Claude's report that "188 Local Ollama is still running" is verified as a cleanup candidate, not a current production caller blocker; a Stop phase can be scheduled, but do not uninstall directly.
+- Updated `docs/runbooks/OLLAMA-188-RETIREMENT-GATE.md` to record the green 24h gate on 2026-05-12.
+
+**RLS role bootstrap hardening**:
+- `scripts/ops/awooop_rls_preflight.py` now also reports:
+  - `rolcreaterole` / `rolcreatedb` of the current DB user.
+  - whether the required roles exist, and whether the current user is a member.
+  - app role membership gate: guards against the app connection user not matching once policies `FOR awooop_app` are applied.
+  - target table owner, for the later owner / FORCE RLS evaluation.
+- Re-ran `scripts/ops/awooop-rls-preflight.sh --json`:
+  - current user `awoooi` is not superuser, not `CREATEROLE`, and not `BYPASSRLS`.
+  - `awooop_app` / `awooop_platform_admin` / `awooop_migration` still do not exist.
+  - New WARN: role bootstrap requires postgres / a CREATEROLE operator; `awooop_app` is missing, so app membership cannot be evaluated.
+  - target tables are mostly owned by `awoooi`; later policy/force RLS can go through the owner path, but CREATE ROLE cannot be done by the app DB user.
+- Added `scripts/ops/awooop-rls-role-bootstrap.sql`:
+  - **Not placed under `apps/api/migrations/`**, so Gitea auto-migration does not attempt CREATE ROLE / BYPASSRLS with the limited-privilege migrator.
+  - Run manually by `postgres` or a CREATEROLE operator.
+  - Creates `awooop_app`, `awooop_platform_admin`, and `awooop_migration` as NOLOGIN group roles.
+  - Sets `BYPASSRLS` on `awooop_platform_admin` / `awooop_migration`.
+  - `GRANT awooop_app TO awoooi`, so the current app connection user matches `FOR awooop_app` policies without an immediate `DATABASE_URL` rotation.
+  - If `awoooi_migrator` exists, grants it the `awooop_migration` group; creates no password and does not change the K8s Secret.
+  - Dynamically grants `SELECT/INSERT/UPDATE/DELETE` on existing target tables to `awooop_app`; does not enable any RLS policy.
+- Synced to 188 at `/home/ollama/awoooi-ops/awooop-rls-role-bootstrap.sql`; the file is only placed there, not executed.
+
+**Verification**:
+- `python3 -m py_compile scripts/ops/awooop_rls_preflight.py` → passed.
+- `bash -n scripts/ops/awooop-rls-preflight.sh scripts/ops/188-registry-certbot-fix.sh scripts/ops/ollama188-retirement-gate.sh` → passed.
+- 188 Ollama 24h gate → `failures=0 warnings=0`.
+- RLS preflight live run → blocked/warn results match expectations; the DB was not modified.
+
+**Next steps**:
+- Have someone with postgres / CREATEROLE privileges review and then run `scripts/ops/awooop-rls-role-bootstrap.sql`, then re-run `awooop-rls-preflight.sh --exact-counts`.
+- 188 Ollama can enter the Stop candidate window; the service and models must still be kept, and it must not be uninstalled.
+
 ## 2026-05-12 | RLS Preflight and 188 Registry Certbot Fix Package
 
 **Background**: Wave 1 confirmed that production RLS is P0 but must not be hot-enabled directly; certbot for `registry.wooo.work` on 188 is also confirmed broken, but the `ollama` SSH account currently has no passwordless sudo. This round turns the two red lights into a remediation pre-package that can be re-run, handed over, and approved.
diff --git a/docs/runbooks/AWOOOP-RLS-PREFLIGHT.md b/docs/runbooks/AWOOOP-RLS-PREFLIGHT.md
index 859de509..01be7b90 100644
--- a/docs/runbooks/AWOOOP-RLS-PREFLIGHT.md
+++ b/docs/runbooks/AWOOOP-RLS-PREFLIGHT.md
@@ -35,6 +35,8 @@ Exit code `2` means the gate is blocked and RLS must not be enabled yet.
 - `PASS current_role_rls_enforced`: current DB user is `awoooi`, not superuser and not `BYPASSRLS`.
 - `PASS project_context_set_config`: `set_config('app.project_id', 'awoooi', TRUE)` works in the API pod.
 - `BLOCKED required_roles`: `awooop_app`, `awooop_platform_admin`, and `awooop_migration` do not exist.
+- `WARN role_bootstrap_authority`: current API DB user `awoooi` is not `CREATEROLE`; role bootstrap requires `postgres` or a `CREATEROLE` operator.
+- `WARN app_role_membership`: `awooop_app` is missing, so membership cannot be evaluated yet.
 - `PASS project_id_columns`: every existing target table has `project_id`.
 - `BLOCKED rls_enabled_forced_policy`: existing target tables are not yet RLS enabled, forced, or policied.
 - `PASS fail_open_policies`: production DB currently has no fail-open policy expressions.
@@ -43,7 +45,7 @@ Exit code `2` means the gate is blocked and RLS must not be enabled yet.
 Current blocker summary:
 
 ```text
-PASS=5 WARN=0 BLOCKED=2
+PASS=5 WARN=2 BLOCKED=2
 ```
 
 Important exact counts from the same run:
@@ -52,11 +54,11 @@ Important exact counts from the same run:
 | --- | ---: | ---: |
 | `audit_logs` | 686 | 0 |
 | `awooop_mcp_tool_registry` | 4 | 0 |
-| `awooop_outbound_message` | 228 | 0 |
+| `awooop_outbound_message` | 235 | 0 |
 | `awooop_projects` | 2 | 0 |
-| `awooop_run_state` | 106 | 0 |
+| `awooop_run_state` | 113 | 0 |
 | `incidents` | 1518 | 0 |
-| `knowledge_entries` | 2099 | 0 |
+| `knowledge_entries` | 2102 | 0 |
 | `playbooks` | 220 | 0 |
 
 ## Remediation Order
@@ -67,6 +69,9 @@ Important exact counts from the same run:
      policies are enforced.
    - Do not create passworded LOGIN roles in a migration unless the K8s Secret
      rotation path is ready.
+   - Use `scripts/ops/awooop-rls-role-bootstrap.sql` only after review, and run
+     it manually as `postgres` or a `CREATEROLE` operator. It is intentionally
+     outside `apps/api/migrations/` so Gitea auto-migration will not run it.
 2. Verify all DB access paths use `get_db()` / `get_db_context()` or otherwise
    set `app.project_id` before queries.
 3. Apply policies first in staging or a canary DB.
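Review note: the runbook hunks above move the summary line from `WARN=0` to `WARN=2` and keep exit code `2` as the "blocked" signal. A minimal sketch of how such a summary and exit code could be derived from a list of check statuses is shown below. The helper names `summarize` and `exit_code` are hypothetical and not taken from the preflight script, and returning `0` when nothing is blocked is an assumption; the runbook only states what exit code `2` means.

```python
from collections import Counter

# Order used by the runbook's summary line.
STATUS_ORDER = ("PASS", "WARN", "BLOCKED")


def summarize(statuses):
    """Render a summary line such as 'PASS=5 WARN=2 BLOCKED=2'."""
    counts = Counter(statuses)
    return " ".join(f"{name}={counts.get(name, 0)}" for name in STATUS_ORDER)


def exit_code(statuses):
    """Return 2 when any check is BLOCKED.

    Assumption: 0 is returned otherwise; WARN alone does not block.
    """
    return 2 if "BLOCKED" in statuses else 0


if __name__ == "__main__":
    # Synthetic status list with the same tallies as the updated runbook.
    sample = ["PASS"] * 5 + ["WARN"] * 2 + ["BLOCKED"] * 2
    print(summarize(sample))
    print(exit_code(sample))
```

Keeping WARN out of the exit-code decision matches the runbook's wording, where only BLOCKED checks are described as gating RLS enablement.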
diff --git a/docs/runbooks/OLLAMA-188-RETIREMENT-GATE.md b/docs/runbooks/OLLAMA-188-RETIREMENT-GATE.md
index 63579e3e..324e5153 100644
--- a/docs/runbooks/OLLAMA-188-RETIREMENT-GATE.md
+++ b/docs/runbooks/OLLAMA-188-RETIREMENT-GATE.md
@@ -112,3 +112,4 @@ curl -sS --max-time 3 http://192.168.0.188:11434/api/tags || echo LAN_CLOSED
 - 24-hour gate not yet green: inference POSTs from `192.168.0.88` are still visible within the last 24 hours.
 - 2026-05-06 15:14 temporary lockdown applied: 188 listens only on `127.0.0.1:11434`; direct connections to `192.168.0.188:11434` from the local machine, 110, and K8s Pods are all refused.
 - 2026-05-06 15:22 permanent systemd fix applied: `ollama.service` is active, the override pins `OLLAMA_HOST=127.0.0.1:11434`, and there is no longer a systemd restart loop.
+- 2026-05-12 18:35 re-ran the 24-hour gate: repo runtime, `awoooi-prod`/`awoooi-dev` live env, Prometheus target, LAN exposure, 24-hour inference POSTs, and dev health check all PASS. Assessment: 188 Ollama is now a stop candidate, but it still must not be uninstalled directly; per the phased decision, keep the service for now and wait for an explicit stop window.
diff --git a/scripts/ops/awooop-rls-role-bootstrap.sql b/scripts/ops/awooop-rls-role-bootstrap.sql
new file mode 100644
index 00000000..160fd02e
--- /dev/null
+++ b/scripts/ops/awooop-rls-role-bootstrap.sql
@@ -0,0 +1,86 @@
+-- AwoooP RLS role bootstrap.
+--
+-- IMPORTANT:
+-- - Do not put this file under apps/api/migrations; Gitea auto-migration should
+--   not attempt CREATE ROLE / BYPASSRLS with the limited migrator account.
+-- - Run manually as postgres or a CREATEROLE-capable operator after review.
+-- - This script does not create passwords and does not change application
+--   DATABASE_URL. It creates NOLOGIN group roles and grants awooop_app to the
+--   current production app connection role (`awoooi`).
+--
+-- Suggested command on 188:
+--   sudo -u postgres psql -d awoooi_prod -v ON_ERROR_STOP=1 \
+--     -f /path/to/awooop-rls-role-bootstrap.sql
+--
+-- Post-check:
+--   bash scripts/ops/awooop-rls-preflight.sh --exact-counts
+
+BEGIN;
+
+DO $$
+BEGIN
+  IF NOT EXISTS (SELECT 1 FROM pg_roles WHERE rolname = 'awooop_app') THEN
+    EXECUTE 'CREATE ROLE awooop_app NOLOGIN';
+  END IF;
+
+  IF NOT EXISTS (SELECT 1 FROM pg_roles WHERE rolname = 'awooop_platform_admin') THEN
+    EXECUTE 'CREATE ROLE awooop_platform_admin NOLOGIN';
+  END IF;
+
+  IF NOT EXISTS (SELECT 1 FROM pg_roles WHERE rolname = 'awooop_migration') THEN
+    EXECUTE 'CREATE ROLE awooop_migration NOLOGIN';
+  END IF;
+END $$;
+
+ALTER ROLE awooop_platform_admin BYPASSRLS;
+ALTER ROLE awooop_migration BYPASSRLS;
+
+-- Current production API connects as `awoooi`. Until DATABASE_URL is split to a
+-- dedicated LOGIN role, make that role a member of the RLS-constrained group so
+-- policies written as `FOR ALL TO awooop_app` apply without secret rotation.
+GRANT awooop_app TO awoooi;
+
+-- Keep existing migration account usable without changing its password or
+-- DATABASE_URL. The group role is NOLOGIN; operators may SET ROLE during manual
+-- RLS migrations if needed.
+DO $$
+BEGIN
+  IF EXISTS (SELECT 1 FROM pg_roles WHERE rolname = 'awoooi_migrator') THEN
+    EXECUTE 'GRANT awooop_migration TO awoooi_migrator';
+  END IF;
+END $$;
+
+-- Minimum grants for existing target tables that already have project_id. RLS
+-- policies remain a separate staged migration and are not enabled here.
+GRANT USAGE ON SCHEMA public TO awooop_app;
+
+DO $$
+DECLARE
+  table_name text;
+  target_tables text[] := ARRAY[
+    'incidents',
+    'knowledge_entries',
+    'playbooks',
+    'audit_logs',
+    'budget_ledger',
+    'awooop_projects',
+    'awooop_contract_revisions',
+    'awooop_run_state',
+    'awooop_mcp_tool_registry',
+    'awooop_mcp_grants',
+    'awooop_mcp_credential_refs',
+    'awooop_mcp_gateway_audit',
+    'awooop_conversation_event',
+    'awooop_outbound_message'
+  ];
+BEGIN
+  FOREACH table_name IN ARRAY target_tables LOOP
+    IF to_regclass('public.' || table_name) IS NOT NULL THEN
+      EXECUTE format('GRANT SELECT, INSERT, UPDATE, DELETE ON public.%I TO awooop_app', table_name);
+    END IF;
+  END LOOP;
+END $$;
+
+GRANT USAGE, SELECT, UPDATE ON ALL SEQUENCES IN SCHEMA public TO awooop_app;
+
+COMMIT;
diff --git a/scripts/ops/awooop_rls_preflight.py b/scripts/ops/awooop_rls_preflight.py
index 440c4bd8..63448fdd 100755
--- a/scripts/ops/awooop_rls_preflight.py
+++ b/scripts/ops/awooop_rls_preflight.py
@@ -85,6 +85,8 @@ async def collect(exact_counts: bool) -> tuple[list[Check], dict[str, Any]]:
               current_user AS current_user,
               session_user AS session_user,
               r.rolsuper AS current_user_superuser,
+              r.rolcreaterole AS current_user_createrole,
+              r.rolcreatedb AS current_user_createdb,
               r.rolbypassrls AS current_user_bypassrls
             FROM pg_roles r
             WHERE r.rolname = current_user
@@ -125,8 +127,13 @@ async def collect(exact_counts: bool) -> tuple[list[Check], dict[str, Any]]:
             SELECT
               rr.rolname,
               r.rolsuper,
+              r.rolcreaterole,
               r.rolbypassrls,
-              r.oid IS NOT NULL AS exists
+              r.oid IS NOT NULL AS exists,
+              CASE
+                WHEN r.oid IS NULL THEN FALSE
+                ELSE pg_has_role(current_user, rr.rolname, 'member')
+              END AS current_user_is_member
             FROM required_roles rr
             LEFT JOIN pg_roles r ON r.rolname = rr.rolname
             ORDER BY rr.rolname
@@ -141,6 +148,29 @@ async def collect(exact_counts: bool) -> tuple[list[Check], dict[str, Any]]:
         else:
             add(checks, "required_roles", "PASS", "all required RLS roles exist")
 
+        if not role.get("current_user_superuser") and not role.get("current_user_createrole") and missing_roles:
+            add(
+                checks,
+                "role_bootstrap_authority",
+                "WARN",
+                "current API DB user cannot create missing roles; bootstrap requires postgres/CREATEROLE",
+            )
+        elif missing_roles:
+            add(checks, "role_bootstrap_authority", "PASS", "current DB user can create roles")
+
+        app_role = next((row for row in roles if row["rolname"] == "awooop_app" and row["exists"]), None)
+        if app_role is None:
+            add(checks, "app_role_membership", "WARN", "awooop_app role missing; membership not evaluated")
+        elif app_role["current_user_is_member"]:
+            add(checks, "app_role_membership", "PASS", "current API DB user is member of awooop_app")
+        else:
+            add(
+                checks,
+                "app_role_membership",
+                "BLOCKED",
+                "current API DB user is not a member of awooop_app; policies FOR awooop_app would not apply",
+            )
+
         table_rows = await rows(
             conn,
             """
@@ -153,6 +183,7 @@ async def collect(exact_counts: bool) -> tuple[list[Check], dict[str, Any]]:
                 c.oid,
                 c.relrowsecurity,
                 c.relforcerowsecurity,
+                pg_get_userbyid(c.relowner) AS table_owner,
                 COALESCE(c.reltuples, 0)::bigint AS estimated_rows
               FROM target t
               LEFT JOIN pg_class c
@@ -191,6 +222,7 @@ async def collect(exact_counts: bool) -> tuple[list[Check], dict[str, Any]]:
               COALESCE(ps.policy_count, 0) AS policy_count,
               COALESCE(ps.has_null_fail_open_policy, FALSE) AS has_null_fail_open_policy,
               COALESCE(ps.has_empty_string_fail_open_policy, FALSE) AS has_empty_string_fail_open_policy,
+              r.table_owner,
               r.estimated_rows
             FROM rels r
             LEFT JOIN project_columns pc ON pc.table_name = r.relname
@@ -273,6 +305,7 @@ def print_human(checks: list[Check], evidence: dict[str, Any]) -> None:
         f"current_user={role.get('current_user')} "
         f"session_user={role.get('session_user')} "
         f"superuser={role.get('current_user_superuser')} "
+        f"createrole={role.get('current_user_createrole')} "
         f"bypassrls={role.get('current_user_bypassrls')}"
     )
 
@@ -287,6 +320,7 @@ def print_human(checks: list[Check], evidence: dict[str, Any]) -> None:
             f"policies={row['policy_count']} "
             f"fail_open_null={row['has_null_fail_open_policy']} "
             f"fail_open_empty={row['has_empty_string_fail_open_policy']} "
+            f"owner={row['table_owner']} "
             f"estimated_rows={row['estimated_rows']}"
         )
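
Review note: the branching that this patch adds to `collect()` for `role_bootstrap_authority` and `app_role_membership` can be exercised without a database if it is lifted into pure functions. The sketch below does that under stated assumptions: the function names and the `(status, message)` return shape are hypothetical, while the dictionary keys and messages mirror the ones used in the patch. It is meant as a reading aid, not as the script's actual structure.

```python
from typing import Any, Optional


def role_bootstrap_authority(
    role: dict[str, Any], missing_roles: list[str]
) -> Optional[tuple[str, str]]:
    """Mirror of the patch's branch: no check is emitted when nothing is missing."""
    if not missing_roles:
        return None
    can_create = bool(
        role.get("current_user_superuser") or role.get("current_user_createrole")
    )
    if can_create:
        return ("PASS", "current DB user can create roles")
    return (
        "WARN",
        "current API DB user cannot create missing roles; "
        "bootstrap requires postgres/CREATEROLE",
    )


def app_role_membership(roles: list[dict[str, Any]]) -> tuple[str, str]:
    """WARN if awooop_app is absent, PASS if the user is a member, else BLOCKED."""
    app_role = next(
        (row for row in roles if row["rolname"] == "awooop_app" and row["exists"]),
        None,
    )
    if app_role is None:
        return ("WARN", "awooop_app role missing; membership not evaluated")
    if app_role["current_user_is_member"]:
        return ("PASS", "current API DB user is member of awooop_app")
    return (
        "BLOCKED",
        "current API DB user is not a member of awooop_app; "
        "policies FOR awooop_app would not apply",
    )


if __name__ == "__main__":
    # State described in the logbook: limited user, all three roles missing.
    limited = {"current_user_superuser": False, "current_user_createrole": False}
    missing = ["awooop_app", "awooop_platform_admin", "awooop_migration"]
    print(role_bootstrap_authority(limited, missing)[0])
    print(app_role_membership([])[0])
```

With the state described in the logbook entry (user `awoooi` without `CREATEROLE`, all three roles missing), both functions yield `WARN`, which is consistent with the `WARN=2` summary in the runbook hunk.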