awoooi/docs/runbooks/awooop-partition-retention.md
Your Name 13e51802fe feat(awooop): all Phase 0 ADRs + Phase 1 control plane schema (including the four critic fixes)
## Phase 0 (documentation layer, all Accepted)
- ADR-106/107: AwoooP platform architecture + storage strategy
- ADR-111~118: Bootstrap → RLS, seven core ADRs
- ADR-119~124: SAGA → Singleton Decomposition, six ADRs
- ADR-UI-01~04: four Operator Console UI ADRs

## Phase 1 (DB schema + migration)
- awooop_phase1_control_plane_2026-05-04.sql: 7 new tables + trigger + RLS
  - Step 1: three roles (platform_admin/migration with BYPASSRLS, awooop_app subject to RLS)
  - Step 13: minimal GRANTs for awooop_app (7 statements)
  - Step 14: RLS fail-closed, __platform__ backdoor removed
- awooop_phase1_batch1_rls_2026-05-04.sql: three-step ADD COLUMN on four high-traffic tables
- awooop_phase1_batch1_backfill.py: batched backfill script using SKIP LOCKED
- awooop_models.py: 7 SQLAlchemy 2.x models

## Critic fixes (4 Critical + 3 Major)
- C-1: ADD CONSTRAINT IF NOT EXISTS → DO block + pg_constraint lookup
- C-2: __mapper_args__ string list → primary_key=True on mapped_column
- C-3: __platform__ RLS backdoor → removed entirely, replaced by a BYPASSRLS role
- C-4: awooop_app role was never created → Step 1 + 7 GRANTs
- M-1: active_pointer_guard SECURITY DEFINER (cross-tenant protection under FORCE RLS)
- M-2: idempotency guard for pg_partman create_parent
- M-3: immutability trigger now also protects identity columns (project_id/family/contract_id)

## Task 1.2 fixes
- agent_loader.py: hard-coded Mac path → AGENTS_DIR environment variable
- Dockerfile: add the missing COPY .claude/agents/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 13:37:11 +08:00


# AwoooP Partition & Retention Runbook

Created: 2026-05-04 (Taipei) · ADR basis: ADR-114 (channel_event_dedupe), ADR-119 (run_state) · Related: Phase 1 Task 1.4; Phase 4/7 (applied when run_state / mcp_gateway_audit are created)


## Overview

| Table | Partitioning | Retention | Created in |
| --- | --- | --- | --- |
| awooop_channel_event_dedupe | RANGE by created_at (daily) | 7 days | Phase 1 |
| awooop_run_state | RANGE by created_at (monthly) | 90 days hot + 1 year warm | Phase 4 |
| awooop_mcp_gateway_audit | RANGE by created_at (monthly) | 90 days hot + 1 year warm | Phase 5 |
| awooop_agent_audit_log | RANGE by created_at (monthly) | 90 days hot + 1 year warm | Phase 7 |

## 1. awooop_channel_event_dedupe (Phase 1, completed)

### pg_partman maintenance (recommended)

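```sql
-- Confirm pg_partman is installed
SELECT extname, extversion FROM pg_extension WHERE extname = 'pg_partman';

-- Initialize (only if the Phase 1 migration has not already done so).
-- create_parent fails on a table already registered in part_config, so guard on it.
-- pg_partman 4.x syntax; pg_partman 5.x uses p_type := 'range' instead of 'native'.
DO $$
BEGIN
    IF NOT EXISTS (SELECT 1 FROM partman.part_config
                    WHERE parent_table = 'public.awooop_channel_event_dedupe') THEN
        PERFORM partman.create_parent(
            p_parent_table := 'public.awooop_channel_event_dedupe',
            p_control      := 'created_at',
            p_type         := 'native',
            p_interval     := '1 day',
            p_premake      := 4
        );
    END IF;
END $$;

-- Keep 7 days; older partitions are dropped outright
UPDATE partman.part_config
   SET retention = '7 days',
       retention_keep_table = false
 WHERE parent_table = 'public.awooop_channel_event_dedupe';
```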

### Scheduled pg_partman maintenance (CronJob, daily at 00:00)

```bash
# Run from a K8s CronJob (schedule "0 0 * * *"); with pg_cron, schedule the same SELECT via cron.schedule()
psql "$DATABASE_URL" -c "SELECT partman.run_maintenance('public.awooop_channel_event_dedupe');"
```
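If the CronJob runs a small Python job instead of a bare `psql` call, the same step could look like the sketch below. It assumes a DB-API 2.0 driver that uses `%s` placeholders (psycopg-style). The function name `run_partman_maintenance` is hypothetical. Returning the run time is also an assumption: it is one way the `awooop_partman_last_run_timestamp` metric read by the staleness alert could be fed.

```python
import time
from typing import Any, Protocol


class DbApiConnection(Protocol):
    """The slice of a DB-API 2.0 connection this sketch needs (psycopg-style %s params)."""

    def cursor(self) -> Any: ...

    def commit(self) -> None: ...


def run_partman_maintenance(conn: DbApiConnection, parent_table: str) -> float:
    """Run pg_partman maintenance for one parent table (hypothetical helper).

    Returns the Unix time of the successful run, e.g. for export as the
    awooop_partman_last_run_timestamp gauge (assumed wiring).
    """
    cur = conn.cursor()
    try:
        # The table name travels as a bind parameter, never spliced into the SQL text.
        cur.execute("SELECT partman.run_maintenance(%s)", (parent_table,))
    finally:
        cur.close()
    conn.commit()
    return time.time()
```

With psycopg 3 (assumed to be installed), usage would be `with psycopg.connect(os.environ["DATABASE_URL"]) as conn: run_partman_maintenance(conn, "public.awooop_channel_event_dedupe")`.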

### Manual maintenance (without pg_partman)

```sql
-- List existing partitions and their bounds
SELECT inhrelid::regclass AS partition_name,
       pg_get_expr(c.relpartbound, c.oid) AS bounds
  FROM pg_inherits i
  JOIN pg_class c ON c.oid = i.inhrelid
 WHERE inhparent = 'awooop_channel_event_dedupe'::regclass
 ORDER BY bounds;
```

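```sql
-- Create partitions for the next 7 days (run once a day; IF NOT EXISTS makes re-runs harmless).
-- Bounds are computed in the session TimeZone.
DO $$
DECLARE
    d DATE;
BEGIN
    FOR i IN 1..7 LOOP
        d := CURRENT_DATE + i;
        EXECUTE format(
            'CREATE TABLE IF NOT EXISTS %I
             PARTITION OF awooop_channel_event_dedupe
             FOR VALUES FROM (%L) TO (%L)',
            'awooop_channel_event_dedupe_' || to_char(d, 'YYYYMMDD'),
            d::TIMESTAMPTZ,
            (d + INTERVAL '1 day')::TIMESTAMPTZ
        );
    END LOOP;
END $$;
```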
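For a dry run or a review step, the same DDL as the DO block can be generated outside the database for any number of days. This is a minimal sketch, assuming the `<parent>_YYYYMMDD` naming used above and plain date literals as bounds. PostgreSQL interprets those literals in the session TimeZone, just as the DO block's `d::TIMESTAMPTZ` cast does. `daily_partition_ddl` is a hypothetical helper.

```python
from datetime import date, timedelta

PARENT = "awooop_channel_event_dedupe"


def daily_partition_ddl(start: date, days: int = 7, parent: str = PARENT) -> list[str]:
    """CREATE TABLE ... PARTITION OF statements for `days` consecutive days from `start`.

    Bounds are bare date literals; PostgreSQL casts them to timestamptz in the
    session TimeZone, matching the DO block's d::TIMESTAMPTZ.
    """
    statements = []
    for offset in range(days):
        lo = start + timedelta(days=offset)
        hi = lo + timedelta(days=1)  # exclusive upper bound: next midnight
        statements.append(
            f"CREATE TABLE IF NOT EXISTS {parent}_{lo:%Y%m%d} "
            f"PARTITION OF {parent} "
            f"FOR VALUES FROM ('{lo.isoformat()}') TO ('{hi.isoformat()}')"
        )
    return statements
```

Calling `daily_partition_ddl(date.today() + timedelta(days=1))` yields the statements for tomorrow onward. Month and year rollovers come for free from `timedelta`.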

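```sql
-- Drop every daily partition older than 7 days, including any left behind by a missed run
-- (dropping a partition takes milliseconds, far cheaper than DELETE)
DO $$
DECLARE
    part TEXT;
BEGIN
    FOR part IN
        SELECT c.relname
          FROM pg_inherits i
          JOIN pg_class c ON c.oid = i.inhrelid
         WHERE i.inhparent = 'awooop_channel_event_dedupe'::regclass
           AND c.relname ~ '^awooop_channel_event_dedupe_[0-9]{8}$'
           AND right(c.relname, 8) <= to_char(CURRENT_DATE - 8, 'YYYYMMDD')
    LOOP
        EXECUTE format('DROP TABLE %I', part);
        RAISE NOTICE 'Dropped partition: %', part;
    END LOOP;
END $$;
```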
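The cutoff arithmetic is easy to get off by one. A partition for day D holds [D, D+1), so it contains only rows older than 7 days once D + 1 <= today - 7, that is D <= today - 8. That is why the SQL uses `CURRENT_DATE - 8`. The sketch below performs the same selection in Python, assuming the `_YYYYMMDD` suffix. `partitions_to_drop` is a hypothetical name, and names that do not match, such as a default partition, are skipped.

```python
import re
from datetime import date, datetime, timedelta

DAILY_NAME = re.compile(r"^awooop_channel_event_dedupe_(\d{8})$")


def partitions_to_drop(names: list[str], today: date, keep_days: int = 7) -> list[str]:
    """Daily partitions whose whole [D, D+1) range is older than keep_days.

    D + 1 <= today - keep_days  <=>  D <= today - (keep_days + 1),
    i.e. CURRENT_DATE - 8 for the 7-day policy.
    """
    cutoff = today - timedelta(days=keep_days + 1)
    doomed = []
    for name in names:
        match = DAILY_NAME.match(name)
        if match is None:  # e.g. a default partition: never dropped here
            continue
        day = datetime.strptime(match.group(1), "%Y%m%d").date()
        if day <= cutoff:
            doomed.append(name)
    return sorted(doomed)
```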

## 2. awooop_run_state (apply when created in Phase 4)

Not created yet. Use the template below when the awooop_run_state table is created in Phase 4.

### Monthly partition template

```sql
-- Schema that receives partitions leaving the hot tier (must exist before retention runs)
CREATE SCHEMA IF NOT EXISTS warm_archive;

CREATE TABLE awooop_run_state (
    run_id          UUID NOT NULL DEFAULT gen_random_uuid(),
    project_id      VARCHAR(64) NOT NULL,
    -- ... other columns ...
    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    -- A primary key on a partitioned table must include the partition key
    PRIMARY KEY (run_id, created_at)
) PARTITION BY RANGE (created_at);

-- pg_partman initialization (pg_partman 4.x syntax)
SELECT partman.create_parent(
    p_parent_table := 'public.awooop_run_state',
    p_control      := 'created_at',
    p_type         := 'native',
    p_interval     := '1 month',
    p_premake      := 3
);

UPDATE partman.part_config
   SET retention              = '90 days',        -- partitions older than 90 days leave the hot tier
       retention_keep_table   = true,             -- they are detached, not dropped
       retention_schema       = 'warm_archive'    -- and moved into this schema (warm tier)
 WHERE parent_table = 'public.awooop_run_state';
```
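To check which monthly partitions ought to exist ahead of inserts (for example, against the `pg_inherits` listing in section 1), compute the bounds for the current month plus `p_premake` future months. This sketch covers only the calendar arithmetic. It does not reproduce pg_partman's internal naming, nor how many past partitions `create_parent` creates. `add_months` and `monthly_bounds` are hypothetical helpers.

```python
from datetime import date


def add_months(d: date, n: int) -> date:
    """First day of the month that is n months after d's month."""
    index = d.year * 12 + (d.month - 1) + n
    return date(index // 12, index % 12 + 1, 1)


def monthly_bounds(today: date, premake: int = 3) -> list[tuple[date, date]]:
    """[start, end) bounds for the current month and the next `premake` months."""
    return [(add_months(today, i), add_months(today, i + 1)) for i in range(premake + 1)]
```

Counting months as a single integer (`year * 12 + month`) avoids special-casing December.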

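### Retention strategy

| Tier | Data age | Location | Cleanup |
| --- | --- | --- | --- |
| Hot | 0-90 days | public.awooop_run_state_* | Managed automatically by pg_partman |
| Warm | 91 days-1 year | warm_archive.awooop_run_state_* | Kept (detached, not dropped) |
| Cold | > 1 year | S3 / GCS export (optional) | Manual COPY TO, then DROP |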
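The tier boundaries can be expressed as a small classifier, for example in a script that decides when a warm partition is due for the cold export. This sketch makes several assumptions:

- Ages follow the table's "Data age" column: hot up to 90 days, warm up to 1 year, taken as 365 days.
- pg_partman moves whole monthly partitions, so a row-level classification only approximates where the data physically sits.

`tier_for` is hypothetical.

```python
from datetime import datetime, timedelta

HOT_MAX = timedelta(days=90)
WARM_MAX = timedelta(days=365)  # "1 year" taken as 365 days (assumption)


def tier_for(created_at: datetime, now: datetime) -> str:
    """Return 'hot', 'warm' or 'cold' for a row's age, per the retention table.

    Pass timezone-aware datetimes in production (created_at is TIMESTAMPTZ).
    """
    age = now - created_at
    if age <= HOT_MAX:
        return "hot"
    if age <= WARM_MAX:
        return "warm"
    return "cold"
```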

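## 3. awooop_mcp_gateway_audit + awooop_agent_audit_log (Phase 5/7)

Same strategy as awooop_run_state (monthly partitions, 90 days hot + 1 year warm).

When these tables are created, apply the same pg_partman template directly; only the table name changes.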
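Because only the table name changes, the setup can be rendered from one template. This sketch uses `string.Template`. `render_partman_setup` is a hypothetical helper, and the template copies the awooop_run_state pg_partman settings. The name check is an added guard so that an unexpected string cannot be spliced into SQL.

```python
import re
from string import Template

# Copy of the awooop_run_state pg_partman settings; only the table name varies.
PARTMAN_MONTHLY = Template("""\
SELECT partman.create_parent(
    p_parent_table := 'public.$table',
    p_control      := 'created_at',
    p_type         := 'native',
    p_interval     := '1 month',
    p_premake      := 3
);

UPDATE partman.part_config
   SET retention            = '90 days',
       retention_keep_table = true,
       retention_schema     = 'warm_archive'
 WHERE parent_table = 'public.$table';
""")

_SAFE_NAME = re.compile(r"[a-z_][a-z0-9_]*")


def render_partman_setup(table: str) -> str:
    """Render the monthly pg_partman setup for `table` (hypothetical helper)."""
    if not _SAFE_NAME.fullmatch(table):
        # Refuse anything that is not a plain lower-case identifier.
        raise ValueError(f"unexpected table name: {table!r}")
    return PARTMAN_MONTHLY.substitute(table=table)
```

For example, `render_partman_setup("awooop_agent_audit_log")` produces the Phase 7 setup. The parent table itself still needs `PARTITION BY RANGE (created_at)` and a primary key that includes `created_at`.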


## Health checks

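```sql
-- Size and approximate row count of each partition
-- (reltuples is the planner estimate from the last VACUUM/ANALYZE; -1 means never analyzed on PG 14+)
SELECT
    child.relname AS partition,
    pg_size_pretty(pg_total_relation_size(child.oid)) AS size,
    child.reltuples::bigint AS approx_rows
FROM pg_inherits
JOIN pg_class parent ON parent.oid = pg_inherits.inhparent
JOIN pg_class child  ON child.oid  = pg_inherits.inhrelid
WHERE parent.relname = 'awooop_channel_event_dedupe'
ORDER BY child.relname;

-- Oldest and newest rows: the oldest should be no more than about 7 days old
-- (up to ~8 days just before the next daily maintenance run)
SELECT min(created_at), max(created_at), now() - min(created_at) AS oldest_age
  FROM awooop_channel_event_dedupe;
```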
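An automated version of the second check could compare `now() - min(created_at)` against the retention window. This sketch assumes:

- 7-day retention.
- One day of slack, because whole daily partitions are removed by a once-a-day job.
- An empty table (where `min()` returns NULL) counts as healthy.

`oldest_row_ok` is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def oldest_row_ok(
    oldest: Optional[datetime],
    now: datetime,
    retention: timedelta = timedelta(days=7),
    slack: timedelta = timedelta(days=1),
) -> bool:
    """True if min(created_at) lies within retention + slack of `now`.

    The slack covers whole daily partitions that are dropped by a once-a-day job.
    """
    if oldest is None:  # empty table: min() returns NULL, nothing is overdue
        return True
    return now - oldest <= retention + slack
```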

### Alert rules (recommended for Prometheus)

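```yaml
groups:
  - name: awooop-partitions
    rules:
      # Abnormal partition count. With retention = 7 days and p_premake = 4, expect roughly
      # 7 retained days + today + 4 premade = 12 children (13 with a default partition).
      # Prometheus cannot run SQL: this hypothetical gauge must come from a SQL exporter running
      #   SELECT count(*) FROM pg_inherits
      #    WHERE inhparent = 'awooop_channel_event_dedupe'::regclass;
      - alert: AwoooPDedupPartitionCountAbnormal
        expr: awooop_dedupe_partition_count < 10 or awooop_dedupe_partition_count > 14
        labels:
          severity: warning

      # pg_partman maintenance has not run for more than 25 hours (90000 s)
      - alert: AwoooPPartmanMaintenanceStale
        expr: time() - awooop_partman_last_run_timestamp > 90000
        labels:
          severity: warning
```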
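The partition-count band depends on the pg_partman settings from section 1. This rough steady-state estimate assumes pg_partman drops a daily partition once its whole range is older than the retention interval. The count is then the retained past days, plus today's partition, plus `p_premake` future partitions, plus one if a default partition exists. The helpers are hypothetical and are only meant to sanity-check whatever band the count rule uses.

```python
def expected_daily_partitions(retention_days: int = 7, premake: int = 4,
                              has_default: bool = False) -> int:
    """Rough steady-state number of child partitions for a daily pg_partman set."""
    past = retention_days          # days still inside the retention window
    current = 1                    # today's partition
    default = 1 if has_default else 0
    return past + current + premake + default


def partition_count_abnormal(observed: int, expected: int, tolerance: int = 2) -> bool:
    """True when the observed count drifts more than `tolerance` from the estimate."""
    return abs(observed - expected) > tolerance
```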