docs(adr): ADR-032 RAG autonomous learning loop + ADR-033 three guardrails

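Operation Ollama-First v5.0 / Phase 12 Wave 2 wrap-up

ADR-032: RAG autonomous learning loop
- Two-table separation: rag_query_log (audit) / learning_episodes (distillation pool) / ai_insights (knowledge base)
- Distiller rule engine (pure Hermes, zero LLM cost)
- PromotionGate four-stage promotion gate
- Telegram feedback loop (rag_feedback / promotion_review keyboards)
- Feature flag RAG_ENABLED, OFF by default
- V1-V4 acceptance SQL (hit rate / promotion pass rate / feedback distribution / embedding consistency)

ADR-033: RAG three guardrails (Owen's v5.0 iron rules)
- Guardrail #1 Promotion Gate: mandatory feedback threshold; weight>=0.8 must pass human review
- Guardrail #2 Firecrawl resources: Docker mem_limit:2g + chrome-reaper sidecar + 1.8GB alert
- Guardrail #3 BGE-M3 consistency: embedding_signature SHA1[:12] + cross-host verification at startup
- Full rejection rationale for five alternatives (including "no feedback buttons", "no resource limits", "accept drift with :latest")

Migration Plan status:
✅ migration 026/028 schema + service have landed
⏳ Phase 12+ follow-ups: embedding writes / worker cron / Telegram push / Firecrawl deployment / signature backfill

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>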
2026-05-04 00:01:19 +08:00
parent 47fe375952
commit c29ce83653
3 changed files with 513 additions and 0 deletions
--- a/docs/adr/ADR-032-rag-autonomous-learning-loop.md
+++ b/docs/adr/ADR-032-rag-autonomous-learning-loop.md
@@ -0,0 +1,249 @@
+# ADR-032: RAG Autonomous Learning Loop - Distiller + PromotionGate + Feedback Loop
+
+- **Status**: Accepted
+- **Date**: 2026-05-03
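+- **Decision Maker**: Commander
+- **Author**: Operation Ollama-First v5.0 (ratified after Phase 11 landed)
+- **Supersedes**: None
+- **Related**: ADR-002 (pgvector as the sole vector store), ADR-007 (persistent dual-write), ADR-028 (unified LLM routing principles), ADR-029 (Hermes-First twin-tower split), ADR-033 (RAG three guardrails)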
+
+---
+
+## Context
+
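+After Wave 1 of campaign v5.0, momo-pro has an observability layer (ai_calls / mcp_calls / ai_call_budgets) but is still a "stateless LLM consumer": every Hermes/OpenClaw question burns tokens from scratch, and repeated questions are never intercepted.
+
+The Phase 0 audit found:
+- More than 30% of the Commander's Telegram Q&A falls into the same few categories ("PChome subsidies", "home-appliance promotion windows", "SKU competitiveness analysis")
+- ai_insights already holds 14k+ rows (pgvector + bge-m3) but **has no RAG interception layer**, so every query goes to an LLM
+- Estimate: a RAG cache could intercept 30% of traffic, saving ~9M Hermes/OpenClaw tokens per month
+
+**Owen raised three major risks** (hardened into guardrails in v5.0):
+1. **Learning contamination**: LLM hallucinations flow into RAG automatically, creating a self-reinforcing error loop (ADR-033 guardrail #1)
+2. **Resource consumption**: the self-hosted Firecrawl Playwright pool eats memory on host 188 (ADR-033 guardrail #2)
+3. **Embedding consistency**: the floating bge-m3 tag lets RAG recall degrade silently (ADR-033 guardrail #3)
+
+This ADR fixes the design and guardrails of the autonomous learning loop: **LLM result → distillation → Promotion Gate → ai_insights → RAG**.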
+
+---
+
+## Decision
+
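+### 1. Two-Table Separation Design
+
+| Table | Purpose | Retention | PII level |
+|---|---|---|---|
+| `rag_query_log` (migration 027) | Audit log of every RAG query | 90 days | Medium (query_text may contain user questions) |
+| `learning_episodes` (migration 028) | Distillation pool for LLM/MCP results | Permanent (distillation provenance) | Low (distilled_text has already been PII-redacted) |
+| `ai_insights` (existing) | Accepted knowledge base | Permanent | Filtered by PromotionGate |
+
+### 2. Autonomous Learning Loop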
+
+```
+┌────────────────────────────────────────────────────────────┐
+│  LLM call (Hermes/OpenClaw)                                │
+│           ↓                                                │
+│  RAG-first interception (hit when cosine >= 0.85)          │
+│      ↓ hit                 ↓ miss                          │
+│  return synthesized  run the LLM                           │
+│  answer (rag_hit=true)     ↓                               │
+│                      Distiller                             │
+│                            ↓                               │
+│                      learning_episodes (pending)           │
+│                            ↓                               │
+│                      PromotionGate (4 stages)              │
+│                            ↓                               │
+│                      ai_insights (approved)                │
+│                            ↓                               │
+│                      found by the next RAG query           │
+└────────────────────────────────────────────────────────────┘
+```
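
The interception step at the top of the loop can be sketched as follows. This is a minimal in-memory illustration, not the code in `services/rag_service.py`: the `answer` function, the list-of-pairs `store`, and the `call_llm` callback are hypothetical names, and a real deployment would run the similarity search in pgvector.

```python
import math
from typing import Callable, List, Sequence, Tuple

RAG_THRESHOLD = 0.85  # hit threshold taken from the diagram


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def answer(
    query_vec: Sequence[float],
    store: List[Tuple[Sequence[float], str]],  # hypothetical (vector, text) pairs
    call_llm: Callable[[], str],
    threshold: float = RAG_THRESHOLD,
) -> Tuple[str, bool]:
    """Return (answer_text, rag_hit). The LLM runs only on a miss."""
    best_text, best_score = None, -1.0
    for vec, text in store:
        score = cosine_similarity(query_vec, vec)
        if score > best_score:
            best_text, best_score = text, score
    if best_text is not None and best_score >= threshold:
        return best_text, True   # hit: answer served from the knowledge base
    return call_llm(), False     # miss: run the LLM (distillation would follow)
```

On a miss the caller would hand the LLM result to the Distiller; that step is left out here to keep the sketch short.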
+
+### 3. Distiller Rule Engine (pure Hermes, zero LLM cost)
+
+```python
+QUALITY_RULES = {
+    # keywords() and has_zh_numbers() are helper functions (not shown here)
+    'mcp_result':  lambda c: 0.8 if len(c) > 200 and len(keywords(c)) >= 2 else 0.4,
+    'llm_json_ok': lambda c: 0.9,                       # structured JSON with status='ok'
+    'llm_text':    lambda c: 0.6 if has_zh_numbers(c) else 0.3,
+    'thumb_up':    lambda _: 1.0,                        # user 👍 feedback
+    'thumb_down':  lambda _: 0.0,                        # negative sample, never promoted
+}
+```
+
+**Why not distill with an LLM?** To avoid a cost loop: the Distiller runs frequently, so using an LLM would mean paying twice on every RAG miss.
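
To make the rule table concrete, here is a runnable sketch of rule-based scoring. The two helpers are hypothetical stand-ins, since the ADR does not show how `keywords` or `has_zh_numbers` are implemented, and returning 0.0 for an unknown source type is likewise an assumption.

```python
import re


def keywords(text: str) -> set:
    """Hypothetical stand-in: distinct ASCII tokens of 3+ characters."""
    return set(re.findall(r"[A-Za-z0-9]{3,}", text))


def has_zh_numbers(text: str) -> bool:
    """Hypothetical stand-in: text has at least one CJK character and one digit."""
    return bool(re.search(r"[\u4e00-\u9fff]", text)) and bool(re.search(r"\d", text))


QUALITY_RULES = {
    'mcp_result':  lambda c: 0.8 if len(c) > 200 and len(keywords(c)) >= 2 else 0.4,
    'llm_json_ok': lambda c: 0.9,
    'llm_text':    lambda c: 0.6 if has_zh_numbers(c) else 0.3,
    'thumb_up':    lambda _: 1.0,
    'thumb_down':  lambda _: 0.0,
}


def quality_score(source_type: str, content: str) -> float:
    """Score content with the rule for its source type (0.0 if unknown, by assumption)."""
    rule = QUALITY_RULES.get(source_type)
    return rule(content) if rule else 0.0
```

Every rule is a pure function of the text, so scoring costs no tokens and can run on every RAG miss.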
+
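+### 4. PromotionGate: Four-Stage Promotion Gate (v5.0 guardrail #1)
+
+```python
+class PromotionGate:
+    STAGE_1_AUTO_QUALITY = 0.7        # minimum distillation quality score
+    STAGE_2_HALLUCINATION_RULES = [
+        'hedging words (可能/也許/我猜測: maybe / perhaps / I guess) with no concrete numbers',
+        'self-contradiction (the same passage states A=X and A=Y)',
+        'references a non-existent SKU/brand (checked against the DB)',
+    ]
+    STAGE_3_DEDUP_THRESHOLD = 0.95    # cosine similarity
+    STAGE_4_HUMAN_REVIEW_WEIGHT = 0.8 # high-weight episodes must get 👍/👎
+```
+
+**Iron rule**: an episode with weight ≥ 0.8 **cannot skip Stage 4**; it must be pushed to Telegram and wait up to 24h for feedback.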
+
+| Outcome | promotion_status |
+|---|---|
+| All four stages pass | `approved` → written to ai_insights |
+| Stage 1 fails | `rejected_quality` |
+| Stage 2 fails | `rejected_hallucination` |
+| Stage 3 fails | `rejected_duplicate` |
+| Stage 4 pushed, waiting for feedback | `awaiting_review` |
+| No feedback within 24h | `expired` (weight lowered to 0.5; not promoted but retained) |
+| User 👎 | `rejected_human` |
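
The mapping from stage results to `promotion_status` can be sketched as a single function. The `Episode` dataclass and its field names are hypothetical; the thresholds come from the constants above. Human feedback and expiry happen later and are not modelled here.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    quality_score: float          # Stage 1 input, from the Distiller
    hallucination_flagged: bool   # Stage 2 result from the rule engine
    max_similarity: float         # Stage 3: highest cosine similarity to existing insights
    weight: float                 # Stage 4 trigger


def evaluate(ep: Episode) -> str:
    """Return the promotion_status an episode gets right after the automatic stages."""
    if ep.quality_score < 0.7:
        return 'rejected_quality'
    if ep.hallucination_flagged:
        return 'rejected_hallucination'
    if ep.max_similarity >= 0.95:
        return 'rejected_duplicate'
    if ep.weight >= 0.8:
        return 'awaiting_review'   # high weight can never skip the human check
    return 'approved'
```

The stages are checked in order, so an episode that fails Stage 1 is never compared against existing insights.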
+
+### 5. Telegram Feedback Loop (mandatory promotion threshold)
+
+```python
+# services/telegram_templates.py
+def rag_feedback_keyboard(rag_query_log_id: int) -> dict:
+    return {'inline_keyboard': [[
+        {'text': '👍 有用', 'callback_data': f'rag_fb:{rag_query_log_id}:5'},  # "Useful", score 5
+        {'text': '👎 沒用', 'callback_data': f'rag_fb:{rag_query_log_id}:1'},  # "Not useful", score 1
+    ]]}
+
+def promotion_review_keyboard(episode_id: int) -> dict:
+    return {'inline_keyboard': [[
+        {'text': '✅ 通過晉升', 'callback_data': f'pg_ok:{episode_id}'},  # "Approve promotion"
+        {'text': '❌ 拒絕', 'callback_data': f'pg_no:{episode_id}'},      # "Reject"
+    ]]}
+```
+
+`routes/openclaw_bot_routes.py` already has the three callback handlers in place (landed in Phase 11).
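
On the receiving side, the handlers have to decode the `callback_data` strings produced by the keyboards. The parser below is a hypothetical sketch of that decoding, not the code in `routes/openclaw_bot_routes.py`; the returned dictionary keys are assumptions chosen for illustration.

```python
def parse_callback(data: str) -> dict:
    """Decode 'rag_fb:<id>:<score>', 'pg_ok:<id>' or 'pg_no:<id>' into a dict."""
    parts = data.split(':')
    kind = parts[0]
    if kind == 'rag_fb' and len(parts) == 3:
        return {
            'action': 'rag_feedback',
            'rag_query_log_id': int(parts[1]),
            'score': int(parts[2]),
        }
    if kind in ('pg_ok', 'pg_no') and len(parts) == 2:
        return {
            'action': 'promotion_review',
            'episode_id': int(parts[1]),
            'approved': kind == 'pg_ok',
        }
    raise ValueError(f'unrecognized callback_data: {data!r}')
```

Telegram limits `callback_data` to 64 bytes, so compact prefixes like these leave ample room for integer ids.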
+
+### 6. Feature-Flag Gradual Rollout
+
+```python
+import os
+
+RAG_ENABLED = os.getenv('RAG_ENABLED', 'false').lower() == 'true'
+RAG_DEFAULT_THRESHOLD = float(os.getenv('RAG_DEFAULT_THRESHOLD', '0.85'))
+RAG_DEFAULT_TOP_K = int(os.getenv('RAG_DEFAULT_TOP_K', '5'))
+```
+
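+**OFF by default**: once deployed ahead of the campaign, behavior is identical to v4.x. Conditions for gradual enablement:
+1. ANTHROPIC_API_KEY is set (prepared in Phase 7)
+2. learning_episodes has accumulated 100+ rows
+3. Set RAG_ENABLED=true with threshold=0.90 (conservative start)
+4. After one week, if more than 70% of feedback has feedback_score ≥ 4, lower the threshold to 0.85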
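
The last rollout condition is a simple ratio check. A minimal sketch, assuming the week's feedback scores are available as a list; the function name and its defaults are hypothetical, and keeping the conservative threshold when there is no feedback is an assumption:

```python
from typing import Sequence


def next_threshold(
    feedback_scores: Sequence[int],
    current: float = 0.90,   # conservative starting threshold
    relaxed: float = 0.85,   # target threshold once feedback is good
) -> float:
    """Relax the threshold only if more than 70% of feedback scores are >= 4."""
    if not feedback_scores:
        return current  # assumption: no feedback means no change
    good_ratio = sum(1 for s in feedback_scores if s >= 4) / len(feedback_scores)
    return relaxed if good_ratio > 0.70 else current
```

With the two-button keyboard the only scores are 5 and 1, so the ratio is effectively the share of 👍 responses.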
+
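+### 7. Fail-Safe Behavior (fire-and-forget philosophy)
+
+| Failure mode | Behavior |
+|---|---|
+| DB write to rag_query_log fails | Main flow does not crash; logger.warning |
+| Embedding fails | Skip the DB lookup and fall back to the LLM directly |
+| Signature mismatch | Log a warning and discard that hit |
+| Distiller write to learning_episodes fails | LLM result is still returned to the caller |
+| PromotionGate Stage 1-3 fails | Episode stays in learning_episodes (simply not promoted; no DB side effects) |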
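
The common thread in the table is that a failing side effect is logged and swallowed, never propagated to the caller. One way to express that, as a hedged sketch with hypothetical function names:

```python
import logging

logger = logging.getLogger("rag")


def fire_and_forget(fn, *args, **kwargs):
    """Run a side effect; on any exception log a warning and return None."""
    try:
        return fn(*args, **kwargs)
    except Exception as exc:  # deliberately broad: side effects must never break the main flow
        logger.warning("side effect %s failed: %s", getattr(fn, "__name__", fn), exc)
        return None


def answer_with_audit(call_llm, write_audit_log):
    """Return the LLM result even if writing the audit row fails."""
    result = call_llm()
    fire_and_forget(write_audit_log, result)
    return result
```

The broad `except Exception` is intentional here; it would be a poor choice on the main path, where errors should surface.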
+
+---
+
+## Alternatives Considered
+
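+| Option | Reason rejected |
+|---|---|
+| **A. Write straight to ai_insights (no distillation pool)** | LLM hallucinations contaminate the knowledge base directly, with no PromotionGate to stop them (the core risk) |
+| **B. LLM distillation (a distill prompt run on Gemini)** | Cost loop: every RAG miss triggers an LLM call, then a second LLM call for distillation = 2× cost |
+| **C. Push only, no pull (no feedback buttons)** | The Commander cannot correct hallucinations, leading to a self-reinforcing error loop (the pain point Owen stressed) |
+| **D. Skip dedup (Stage 3)** | ai_insights accumulates many duplicates and RAG queries waste time |
+| **E. Replace pgvector with ChromaDB / Qdrant** | Violates ADR-002 (pgvector only) and adds operational surface |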
+
+---
+
+## Consequences
+
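+### Positive (5)
+1. **Repeat-question interception**: estimated savings of ~9M Hermes/OpenClaw tokens per month (Hermes traffic -30%)
+2. **Autonomous learning**: every LLM result enters the distillation pool, so the knowledge base keeps growing
+3. **Errors are correctable**: 👎 feedback directly lowers weight, preventing hallucination loops
+4. **Zero LLM distillation cost**: the Distiller is a pure rule engine
+5. **PII safety**: query_text truncated to 4KB + human_approver stored as SHA1[:8]
+
+### Negative (3)
+1. **More complexity**: two tables + a four-stage gate + a feedback loop make for a steep learning curve for newcomers
+2. **Stage 4 human-review latency**: high-weight episodes may wait up to 24h before promotion
+3. **Embedding write path still missing** (known limitation): Stage 3 dedup waits on Phase 12+
+
+### Risks (4)
+1. **Distiller rule drift**: the rule engine may miss hallucinations → mitigated by Stage 4 human review
+2. **Stage 4 reviewer fatigue**: the Commander cannot watch Telegram around the clock → mitigated by automatic `expired` downgrade
+3. **ai_insights bloat**: the learning loop accumulates quickly → mitigated by Stage 3 dedup (enabled in Phase 12+)
+4. **PromotionGate worker cron not yet scheduled** (known): needs a scheduled job in Phase 12+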
+
+---
+
+## Verification
+
+### V1: RAG interception rate (one week after deployment)
+```sql
+SELECT
+  COUNT(*) FILTER (WHERE saved_call) AS hit_count,
+  COUNT(*) AS total,
+  -- NULLIF avoids a division-by-zero error when no queries were logged
+  ROUND(100.0 * COUNT(*) FILTER (WHERE saved_call) / NULLIF(COUNT(*), 0), 1) AS hit_pct
+FROM rag_query_log
+WHERE queried_at > NOW() - INTERVAL '7 days';
+-- expect hit_pct >= 25%
+```
+
+### V2: Promotion pass rate
+```sql
+SELECT promotion_status, COUNT(*)
+FROM learning_episodes
+WHERE created_at > NOW() - INTERVAL '7 days'
+GROUP BY promotion_status;
+-- expect approved + awaiting_review to exceed 50%; rejected_hallucination < 10%
+```
+
+### V3: Feedback distribution
+```sql
+SELECT feedback_score, COUNT(*)
+FROM rag_query_log
+WHERE feedback_score IS NOT NULL
+GROUP BY feedback_score;
+-- expect score=5 share > 60%
+```
+
+### V4: Embedding consistency (v5.0 guardrail #3)
+```sql
+SELECT embedding_signature, COUNT(*)
+FROM ai_insights
+WHERE embedding IS NOT NULL
+GROUP BY embedding_signature;
+-- expect a single signature (several = model-version drift that needs handling)
+```
+
+---
+
+## Migration Plan
+
+| Phase | Work | Status |
+|---|---|---|
+| 11.1 | rag_query_log + learning_episodes schema | ✅ migration 027/028 commit 2f20d8d |
+| 11.2 | rag_service.py + learning_pipeline.py | ✅ commit c7d6db3 |
+| 11.3 | Hermes/OpenClaw RAG-first integration | ✅ commit c7d6db3 |
+| 11.4 | Telegram feedback buttons + callbacks | ✅ commit c7d6db3 |
+| 11.5 | Write path for learning_episodes.embedding | ⏳ Phase 12+ |
+| 11.6 | Schedule the PromotionGate worker cron | ⏳ Phase 12+ |
+| 11.7 | awaiting_review Telegram push | ⏳ Phase 12+ (callbacks already in place) |
+| 11.8 | Gradual enablement with RAG_ENABLED=true | ⏳ after a one-week observation period |
+
+---
+
+## References
+
+- `migrations/027_create_rag_query_log.sql`
+- `migrations/028_create_learning_episodes.sql`
+- `services/rag_service.py` (532 lines)
+- `services/learning_pipeline.py` (750 lines)
+- `tests/test_rag_service.py` + `test_learning_pipeline.py` + `test_promotion_gate.py` (70 unit tests)
+- `docs/phase11_db_design_20260503.md`
+- ADR-002 (pgvector as the sole vector store)
+- ADR-007 (dual-write guarantee)
+- ADR-029 (Hermes-First twin-tower split)
+- ADR-033 (RAG three guardrails), to be added shortly
--- a/docs/adr/ADR-033-rag-three-guardrails.md
+++ b/docs/adr/ADR-033-rag-three-guardrails.md
@@ -0,0 +1,262 @@
+# ADR-033: RAG Three Governance Guardrails - Promotion Gate / Firecrawl Resources / BGE-M3 Consistency
+
+- **Status**: Accepted
+- **Date**: 2026-05-03
+- **Decision Maker**: Commander
+- **Author**: Operation Ollama-First v5.0 (Owen's three expert observations → hardened in v5.0)
+- **Related**: ADR-032 (RAG autonomous learning loop), ADR-031 (self-hosted MCP), ADR-002 (pgvector), ADR-027 (Primary Ollama on GCP)
+
+---
+
+## Context
+
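+During campaign v4.0 Owen raised three expert observations, which v5.0 promotes to guardrail-level iron rules:
+
+1. **Learning-contamination risk**: LLM hallucinations flow into RAG automatically, creating a self-reinforcing error loop
+2. **Firecrawl resource consumption**: the self-hosted Playwright pool eats memory on host 188
+3. **BGE-M3 embedding consistency**: a floating tag lets RAG recall degrade silently
+
+These three points are **not ordinary suggestions; they decide whether the RAG system can run safely over the long term**. This ADR fixes the design decisions and acceptance criteria for the three guardrails.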
+
+---
+
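+## Decision: Three-Guardrail Architecture
+
+### Guardrail #1: Promotion Gate (learning-contamination protection)
+
+**Core principle**: feedback buttons are upgraded from "optional" to a "mandatory promotion threshold". Moving from learning_episodes to ai_insights requires passing four strict stages.
+
+#### Four-stage promotion gate
+```
+learning_episodes (pending)
+    ↓ Stage 1: quality_score >= 0.7 (scored automatically by the Distiller)
+    ↓ Stage 2: no hallucination detected (rule engine, zero LLM)
+    ↓ Stage 3: similarity to existing insights < 0.95 (dedup)
+    ↓ Stage 4: weight >= 0.8 requires human 👍/👎 review on Telegram
+ai_insights (approved)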
+```
+
+#### Stage 2 hallucination-detection rules
+```python
+HALLUCINATION_PATTERNS = [
+    # Rule 1: hedging words (maybe / perhaps / I guess / presumably) with no concrete number
+    lambda txt: any(p in txt for p in ['可能', '也許', '我猜', '推測'])
+                and not any(c.isdigit() for c in txt),
+
+    # Rule 2: self-contradiction (the same passage contains 'A=X' and 'A=Y')
+    detect_contradiction,
+
+    # Rule 3: references a non-existent SKU/brand (checked against the DB)
+    lambda txt: not _verify_skus_exist(extract_skus(txt)),
+]
+```
+
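+#### Stage 4 mandatory threshold (Owen's iron rule)
+- weight >= 0.8 → push to Telegram and wait up to 24h for 👍/👎
+- No response within 24h → `expired` (weight lowered to 0.5, not promoted)
+- User 👎 → `rejected_human` (never promoted)
+- User 👍 → `approved`, written to ai_insights
+
+**Unconditional rule**: a high-weight episode cannot skip Stage 4, even if it passes Stages 1-3.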
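
The Stage 4 outcomes listed above amount to a small decision function over the feedback received and the time elapsed. A sketch under stated assumptions: the function name, the `'up'`/`'down'` feedback encoding, and the tuple return are hypothetical, while `HUMAN_REVIEW_TIMEOUT_HOURS` is the tunable this ADR names.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

HUMAN_REVIEW_TIMEOUT_HOURS = 24  # tunable named in this ADR
EXPIRED_WEIGHT = 0.5             # weight assigned when the review window lapses


def resolve_review(
    pushed_at: datetime,
    now: datetime,
    feedback: Optional[str],  # hypothetical encoding: 'up', 'down', or None
    weight: float,
) -> Tuple[str, float]:
    """Return (promotion_status, weight) for an episode sent to human review."""
    if feedback == 'up':
        return 'approved', weight
    if feedback == 'down':
        return 'rejected_human', weight
    if now - pushed_at >= timedelta(hours=HUMAN_REVIEW_TIMEOUT_HOURS):
        return 'expired', EXPIRED_WEIGHT
    return 'awaiting_review', weight
```

Explicit feedback is checked before the timeout, so a late 👍 that arrives before the worker runs still counts; whether the real worker behaves that way is not stated in the ADR.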
+
+### Guardrail #2: Firecrawl Resource Guardrail (protecting host 188)
+
+#### Docker limits
+```yaml
+# docker-compose.mcp.yml (to be deployed in Phase 10)
+services:
+  firecrawl-self:
+    image: firecrawl/firecrawl:latest
+    deploy:
+      resources:
+        limits:
+          memory: 2g          # ⭐ hard cap required by Owen
+          cpus: '1.5'
+    environment:
+      - PLAYWRIGHT_BROWSER_POOL_MAX=3   # browser pool cap
+      - SCRAPE_TIMEOUT_MS=30000
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:3002/health"]
+      interval: 30s
+```
+
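+#### Chrome leftover-process cleanup sidecar
+```yaml
+chrome-reaper:
+  # NOTE: plain alpine:3 ships no docker CLI; this loop assumes an image with
+  # the CLI installed and /var/run/docker.sock mounted.
+  image: alpine:3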
+  command: |
+    sh -c "while true; do
+      docker exec firecrawl-self pkill -f 'chrome.*--type=zygote' 2>/dev/null;
+      docker exec firecrawl-self pkill -f 'chrome.*--type=renderer' 2>/dev/null;
+      sleep 3600;
+    done"
+```
+
+#### Telegram alerts
+- Check the firecrawl container's RSS every hour
+- RSS above 1.8GB → 🟠 P2 alert (memory is about to hit the cap)
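
The hourly check needs to turn a `docker stats` memory string into bytes and compare it with the alert line. A minimal sketch, assuming the `MemUsage` column has the usual `"<used> / <limit>"` shape with binary units, and treating the 1.8GB threshold as 1.8 GiB; the function names are hypothetical.

```python
import re

_UNITS = {'B': 1, 'KiB': 1024, 'MiB': 1024 ** 2, 'GiB': 1024 ** 3}
ALERT_BYTES = int(1.8 * 1024 ** 3)  # assumption: "1.8GB" is read as 1.8 GiB


def parse_mem_usage(mem_usage: str) -> int:
    """Return the used-memory part of e.g. '1.5GiB / 2GiB' in bytes."""
    used = mem_usage.split('/')[0].strip()
    match = re.fullmatch(r'([0-9.]+)\s*(B|KiB|MiB|GiB)', used)
    if match is None:
        raise ValueError(f'unrecognized memory string: {mem_usage!r}')
    return int(float(match.group(1)) * _UNITS[match.group(2)])


def should_alert(mem_usage: str) -> bool:
    """True when used memory is above the alert threshold."""
    return parse_mem_usage(mem_usage) > ALERT_BYTES
```

A scheduler task could feed it the output of the `docker stats --no-stream` command and send the P2 message when it returns true.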
+
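+### Guardrail #3: BGE-M3 Embedding Consistency (the lifeline of RAG)
+
+#### Sources of risk
+- `bge-m3:latest` is a floating tag → an Ollama upgrade can jump model versions
+- normalize / pooling parameters are not passed explicitly → a change in server-side defaults goes unnoticed
+- Model versions may differ across hosts (GCP / Secondary / 111)
+
+#### Signature locking mechanism
+```python
+# services/rag_service.py
+import hashlib
+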
+def get_embedding_signature(
+    model: str = 'bge-m3:latest',
+    dim: int = 1024,
+    normalize: bool = True,
+) -> str:
+    """SHA1({model}|{normalize}|{dim})[:12]"""
+    raw = f"{model}|{str(normalize).lower()}|{dim}"
+    return hashlib.sha1(raw.encode()).hexdigest()[:12]
+```
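
A short usage sketch shows what the signature does and does not detect. The function body repeats the one above so the example runs on its own; the `bge-m3:567m` tag is used purely as a hypothetical pinned tag.

```python
import hashlib


def get_embedding_signature(
    model: str = 'bge-m3:latest',
    dim: int = 1024,
    normalize: bool = True,
) -> str:
    """SHA1({model}|{normalize}|{dim})[:12]"""
    raw = f"{model}|{str(normalize).lower()}|{dim}"
    return hashlib.sha1(raw.encode()).hexdigest()[:12]


baseline = get_embedding_signature()
pinned = get_embedding_signature(model='bge-m3:567m')    # hypothetical pinned tag
unnormalized = get_embedding_signature(normalize=False)

# Any change to tag, dimension, or normalize flag yields a different signature.
print(baseline != pinned, baseline != unnormalized)
```

Because the signature hashes the tag string rather than the model weights, it changes only when the configured tag, dimension, or normalize flag changes. A silent update behind `:latest` would keep the same signature, which is presumably why the cross-host startup check complements it.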
+
+#### Schema enforcement (migration 026)
+```sql
+ALTER TABLE ai_insights
+    ADD COLUMN IF NOT EXISTS embedding_signature VARCHAR(64);
+
+CREATE INDEX CONCURRENTLY idx_ai_insights_embedding_signature
+    ON ai_insights (embedding_signature)
+    WHERE embedding IS NOT NULL;
+```
+
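+#### Startup verification (Phase 11.0 guardrail)
+```python
+import itertools
+
+def verify_embedding_consistency():
+    """Run at RAG service startup:
+    embed the same test text on all three hosts (GCP / Secondary / 111)
+    and verify that pairwise cosine distance is < 1e-4 (floating-point noise);
+    otherwise refuse to start.
+    """
+    test_text = "momo電商競品分析測試向量一致性檢查"  # fixed Chinese probe sentence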
+    embeddings = {
+        host: call_ollama(host, 'bge-m3:latest', test_text)
+        for host in [GCP_PRIMARY, GCP_SECONDARY, OLLAMA_111]
+    }
+    diffs = [cosine_distance(embeddings[a], embeddings[b])
+             for a, b in itertools.combinations(embeddings, 2)]
+    if max(diffs) > 1e-4:
+        raise EmbeddingInconsistencyError(...)
+```
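
The pairwise comparison at the heart of the startup check can be exercised without any Ollama host. The sketch below supplies a plain-Python `cosine_distance` (the ADR does not show its implementation) and a hypothetical `max_pairwise_distance` helper operating on an in-memory dict of host name to vector.

```python
import itertools
import math
from typing import Dict, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; raises on zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError('cannot compare a zero vector')
    return 1.0 - dot / norm


def max_pairwise_distance(embeddings: Dict[str, Sequence[float]]) -> float:
    """Largest cosine distance between any two hosts' embeddings."""
    return max(
        cosine_distance(embeddings[a], embeddings[b])
        for a, b in itertools.combinations(embeddings, 2)
    )
```

With three hosts there are three pairs to compare; a single drifting host shows up in two of them, so the maximum is enough to trigger the refusal to start.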
+
+#### Protection at RAG query time
+```python
+# rag_service.py:_select_hits
+for hit in candidates:
+    if hit.embedding_signature != current_signature:
+        logger.warning(f"Signature mismatch: hit={hit.id}, "
+                      f"expected={current_signature}, got={hit.embedding_signature}")
+        continue  # skip this hit
+```
+
+---
+
+## Alternatives Considered
+
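+| Option | Reason rejected |
+|---|---|
+| **A. RAG without feedback buttons (fully automatic promotion)** | Once LLM hallucinations enter RAG they reinforce themselves; this is the most dangerous failure mode of a RAG system |
+| **B. No resource limits on Firecrawl (let it run)** | Host 188 runs 5+ projects (reference_188_multi_project); an OOM would drag down the other containers |
+| **C. Use :latest for BGE-M3 and accept drift** | No alert on model upgrades, RAG recall degrades silently, and the problem is hard to trace back once it surfaces |
+| **D. Build all three guardrails on an LLM (e.g. LLM distillation, LLM hallucination detection)** | Cost loop + a new hallucination risk (an LLM checking LLM hallucinations) |
+| **E. Make Stage 4 optional (high weight goes straight to approved)** | Violates Owen's iron rule: the Commander's feedback is the last line of defense against contaminating the RAG system |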
+
+---
+
+## Consequences
+
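+### Positive (5)
+1. **Learning-contamination firewall**: four-stage gate + mandatory human review; the chance of a hallucination entering RAG is expected to stay below 5%
+2. **Predictable resources**: Firecrawl mem_limit 2g + chrome-reaper keep host 188 safe
+3. **Controlled model upgrades**: RAG only uses hits whose embedding_signature is unchanged, so model drift is visible immediately
+4. **PII safety**: human_approver stored as SHA1[:8]; feedback records do not expose Telegram usernames
+5. **Controlled cost**: pure rule engine (Stages 1-3) + 24h auto-expire (Stage 4), zero LLM cost
+
+### Negative (3)
+1. **Stage 4 Commander fatigue**: every high-weight episode needs a look on Telegram → mitigated by automatic `expired` downgrade
+2. **The 2g Firecrawl memory cap may be too small**: crawling complex SPAs may exceed it → monitoring alerts + adjustable via env
+3. **A signature change requires a full-table backfill**: ADD COLUMN is metadata-only on PG14 and holds its lock only briefly, but backfilling 14k+ rows needs a worker running for several hours
+
+### Risks (4)
+1. **Stage 2 rules miss cases**: the rule engine may let a hallucination through → mitigated by Stage 4 as the final human check
+2. **Firecrawl OOM cascade**: hitting mem_limit triggers an OOM kill → mitigated by healthcheck + restart policy
+3. **RAG fully disabled during an embedding-model upgrade**: every hit's signature mismatches → degrades safely to "LLM-only" until the backfill completes
+4. **The 24h expiry window may not fit**: users may not get to respond in time → adjustable via `HUMAN_REVIEW_TIMEOUT_HOURS`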
+
+---
+
+## Verification
+
+### V1: Promotion Gate block rate (one week after deployment)
+```sql
+SELECT promotion_status, COUNT(*)
+FROM learning_episodes
+WHERE created_at > NOW() - INTERVAL '7 days'
+GROUP BY promotion_status;
+-- expect: rejected_hallucination >= 1 (proves Stage 2 actually blocks hallucinations)
+-- expect: approved + awaiting_review > 50%
+```
+
+### V2: Stage 4 feedback rate
+```sql
+SELECT
+  COUNT(*) FILTER (WHERE promotion_status = 'awaiting_review') AS pending,
+  COUNT(*) FILTER (WHERE promotion_status = 'approved' AND human_approver IS NOT NULL) AS human_approved,
+  COUNT(*) FILTER (WHERE promotion_status = 'rejected_human') AS human_rejected,
+  COUNT(*) FILTER (WHERE promotion_status = 'expired') AS expired
+FROM learning_episodes;
+-- expect: human_approved + human_rejected > expired (the Commander really is reading Telegram)
+```
+
+### V3: Firecrawl resources (after deployment)
+```bash
+ssh ollama@192.168.0.188 'docker stats firecrawl-self --no-stream --format "{{.MemUsage}}"'
+# expect < 1.8GB (90% of the 2GB mem_limit)
+```
+
+### V4: Embedding consistency
+```sql
+SELECT embedding_signature, COUNT(*), MIN(created_at), MAX(created_at)
+FROM ai_insights
+WHERE embedding IS NOT NULL
+GROUP BY embedding_signature
+ORDER BY MAX(created_at) DESC;
+-- expect: a single signature (several = model drift)
+```
+
+---
+
+## Migration Plan
+
+| Guardrail | Component | Status |
+|---|---|---|
+| #1 PromotionGate schema | learning_episodes 8-state machine | ✅ migration 028 commit 2f20d8d |
+| #1 PromotionGate service | Four-stage logic + reject/promote | ✅ services/learning_pipeline.py commit c7d6db3 |
+| #1 Feedback buttons | rag_feedback + promotion_review | ✅ telegram_templates + bot routes commit c7d6db3 |
+| #1 awaiting_review push | Push episodes to the Commander on Telegram | ⏳ Phase 12+ |
+| #2 Firecrawl mem_limit | docker-compose.mcp.yml | ⏳ Phase 10 deployment |
+| #2 chrome-reaper sidecar | Same as above | ⏳ Phase 10 |
+| #2 RSS monitoring alert | Add an hourly task to the scheduler | ⏳ Phase 10 |
+| #3 embedding_signature column | Add column to ai_insights | ✅ migration 026 commit 4648673 |
+| #3 Signature computation | rag_service.get_embedding_signature() | ✅ commit c7d6db3 |
+| #3 Startup check verify_consistency | Cross-host cosine comparison | ⏳ Phase 11+ follow-up (Phase 11.0 spec) |
+| #3 Backfill of the existing 14k rows | UPDATE ai_insights SET embedding_signature = ... | ⏳ Phase 11+ follow-up |
+
+---
+
+## References
+
+- `migrations/026_add_embedding_signature.sql` (includes the pgcrypto extension)
+- `migrations/028_create_learning_episodes.sql` (8-state machine CHECK)
+- `services/rag_service.py:get_embedding_signature()`
+- `services/learning_pipeline.py` (PromotionGate four stages)
+- `tests/test_promotion_gate.py` (23 unit tests)
+- ADR-002 (pgvector only)
+- ADR-027 (three-host architecture)
+- ADR-032 (RAG autonomous learning loop)
+- ADR-031 (self-hosted MCP; to be added in Phase 10)
--- a/docs/adr/README.md
+++ b/docs/adr/README.md
@@ -52,6 +52,8 @@
 | [028](ADR-028-llm-routing-unified-principles.md) | Unified LLM routing principles: the five pillars of Ollama-First (addendum to ADR-027) | Accepted | 2026-05-03 |
 | [029](ADR-029-hermes-first-twin-tower.md) | Hermes-First twin-tower split (tactical main tower / strategic secondary tower, Gemini monthly spend -23%) | Accepted | 2026-05-03 |
 | [030](ADR-030-frontier-multi-vendor-strategy.md) | Frontier multi-vendor strategy (Anthropic + Google + OpenRouter; Phase 7 Code Review upgraded to Claude Opus 4.7) | Accepted | 2026-05-03 |
+| [032](ADR-032-rag-autonomous-learning-loop.md) | RAG autonomous learning loop: Distiller + PromotionGate + feedback loop (Phase 11) | Accepted | 2026-05-03 |
+| [033](ADR-033-rag-three-guardrails.md) | RAG three governance guardrails: Promotion Gate / Firecrawl resources / BGE-M3 consistency (Owen's v5.0 iron rules) | Accepted | 2026-05-03 |

 ## Conventions