debug: log telegram 400 response body to diagnose card send failure

fix(aider-watch): code-review fixes (4 issues)
1. aiderw: session_end 補 model+cwd (AI Router feedback loop 修通) 2. repository: model_stats_since SQL 改 COALESCE(session_end, session_start) model 3. aider_event_service: classify_severity 移除 error_count 觸發告警（防假陽性） 4. worker: run_aider_event_processor_loop 包 proc.start() try/except（防靜默崩潰） 2026-04-20 @ Asia/Taipei
2026-04-21 01:05:21 +08:00 · 2026-04-21 00:59:21 +08:00 · 2026-04-20 12:29:49 +00:00 · 2026-04-20 20:24:41 +08:00 · 2026-04-20 20:21:15 +08:00 · 2026-04-20 20:18:11 +08:00
574 changed files with 113731 additions and 9547 deletions
--- a/.agents/skills/01-awoooi-frontend-aesthetics.md
+++ b/.agents/skills/01-awoooi-frontend-aesthetics.md
@@ -27,6 +27,8 @@
 | v1.4 | 2026-03-28 | Claude Code | ✅ Phase 19 Wave 0-5 完成 (~95% + Telemetry 整合) |
 | v1.5 | 2026-03-30 | Claude Code | 🔴🔴🔴 前端建置禁止內網 IP (瀏覽器權限事故) |
 | v1.6 | 2026-03-31 | Claude Code | 🚀 ADR-042 效能優化模式 (DOM Bypass + Optimistic Updates) |
+| v1.7 | 2026-04-09 | Claude Opus 4.6 | 🔴 Sprint 5R 前端重構 — 品牌一致性鐵律 + 設計稿對齊規範 |
+| v1.8 | 2026-04-10 | Claude Opus 4.6 | ✅ Sprint 5R 實施完成 — 7 新元件 + 骨架屏 + 60:40 雙欄 |

 ---

@@ -55,6 +57,31 @@ grep "NEXT_PUBLIC" .gitea/workflows/cd.yaml | grep -v "192.168"

 ---

+## 🔴🔴 品牌 Logo 與文字一致性 (2026-04-09)
+
+> **統帥多次糾正**: 所有設計稿和頁面中的 Logo SVG 和 AwoooI 文字必須與正式環境完全一致
+
+### Logo SVG（螺旋眼睛）
+- 來源：`header.tsx` L82-111，viewBox `0 0 140 140`
+- 漸層：陶瓷白 + 藍色 LED + 觸鬚 + 旋轉虛線圓
+- 禁止簡化、禁止替代、禁止自創
+
+### AwoooI 品牌文字
+- `A`：DM Mono 20px fw-700 #141413 margin-right:-4px
+- `wooo`：VT323 26px #d97757 letterSpacing:0 margin:0 -2px
+- `I`：DM Mono 20px fw-700 #141413 margin-left:-3px
+- 字母間必須緊湊，整體像一個字
+
+### 設計稿 HTML Mockup
+- 直接從 header.tsx 複製 SVG 和文字結構
+- OpenClaw 面板也用同款螺旋眼睛 SVG
+
+### 流程圖 icon
+- 使用 dashboardicons.com OpenClaw PNG（取代圓圈，不是浮動）
+- URL: `https://cdn.jsdelivr.net/gh/homarr-labs/dashboard-icons/png/openclaw.png`
+
+---
+
 ## 核心約束 (Iron Laws)

 ### 1. Nothing.tech 純白工業風 (絕對標準)
--- a/.agents/skills/02-lewooogo-backend-core.md
+++ b/.agents/skills/02-lewooogo-backend-core.md
@@ -36,6 +36,9 @@
 | v2.3 | 2026-03-30 | Claude Code | 🤖 新增 AI Fallback 順序章節 (NVIDIA 優先仲裁) |
 | v2.4 | 2026-03-31 | Claude Code | 🏛️ Phase 22 首席架構師審查通過 (Mock違規+分層修復全部完成) |
 | v2.5 | 2026-04-01 | Claude Code | ♻️ Phase R-R2 完成 (legacy -971行) + R-R2.1 P0/P1修復 + ADR-046 型別統一 |
+| v2.6 | 2026-04-08 | Claude Code | 🛡️ Sprint 5.1 Data Safety Guardrails — Service Registry 模式 + 審查修正鐵律 |
+| v2.7 | 2026-04-09 | Claude Sonnet 4.6 | 🔧 ADR-066 批准執行閉環修復 — Nemotron tool→kubectl_command 回填鐵律 |
+| v2.8 | 2026-04-10 | Claude Sonnet 4.6 | 🚀 ADR-068 飛輪冷啟動修復鐵律 — affected_services/Router層業務邏輯/Jaccard豁免/embedding持久化 |

 ---

@@ -728,6 +731,40 @@ Python stop() timeout: 75          # 比 K8s 少 15s 緩衝
 > **ConfigMap**: `AI_FALLBACK_ORDER: '["nvidia","gemini","ollama","claude"]'`
 > **審查結果**: P0 修復後 85/100 → 最終 94/100

+### 🔴 鐵律：Nemotron/Gemini Tool Call 必須回填 kubectl_command (ADR-066)
+
+**背景**: 幾個月來批准按鈕完全無效，因為 Nemotron tool 結果未傳播到執行鏈路。
+
+```python
+# ✅ 正確 — openclaw.py 必須回填
+_tools = proposal["nemotron_tools"]
+if _tools:
+    _t = _tools[0]
+    if _t["tool"] == "restart_deployment":
+        proposal["kubectl_command"] = f"kubectl rollout restart deployment/{_deploy} -n {_ns}"
+    elif _t["tool"] == "delete_pod":
+        proposal["kubectl_command"] = f"kubectl delete pod {_pod} -n {_ns}"
+    elif _t["tool"] == "scale_deployment":
+        proposal["kubectl_command"] = f"kubectl scale deployment/{_deploy} --replicas={_replicas} -n {_ns}"
+
+# ✅ 正確 — proposal_service 優先用 kubectl_command
+_kubectl = llm_proposal.get("kubectl_command", "").strip()
+action = _kubectl if _kubectl else llm_proposal["action"]
+
+# ❌ 禁止 — 只存 nemotron_tools[] 不回填 kubectl_command
+proposal["nemotron_tools"] = result.get("tools", [])
+# (缺少回填 → parse_operation_from_action → None → SKIP)
+```
+
+**為何重要**: `execute_approved_action` 靠 `parse_operation_from_action(approval.action)` 決定執行什麼。若 action 是中文標題或 "未知操作"，解析失敗，靜默跳過，UI 卻顯示「已批准」。
+
+**檢查清單**:
+- [ ] 新增 Tool Call 工具時，同步更新 openclaw.py 的回填邏輯
+- [ ] 測試批准後 `audit_logs` 有寫入記錄
+- [ ] 批准後 Telegram 有收到 reply 狀態訊息
+
+---
+
 ### 鐵律：NVIDIA Nemotron 優先仲裁

 ```python
@@ -742,6 +779,28 @@ if provider in ("nvidia", "gemini", "claude"):
    allowed, reason = await rate_limiter.check_and_increment(provider)
 ```

+### Ollama 模型中央化 (D1, ADR-067, 2026-04-11)
+
+**禁止**在 Service 層 hardcode Ollama 模型名稱。**必須**使用：
+
+```python
+from src.services.model_registry import get_model
+model = get_model("ollama", "purpose_key")
+```
+
+| purpose key | 預設模型 | 服務 |
+|------------|---------|------|
+| drift_summary | qwen2.5:7b-instruct | drift_narrator_service |
+| drift_intent | qwen2.5:7b-instruct | drift_interpreter |
+| log_anomaly | deepseek-r1:14b | log_summary_service |
+| code_review | qwen2.5-coder:7b | local_code_review_service |
+| image_analysis | llava:latest | image_analysis_service |
+| nemoclaw | deepseek-r1:14b | decision_manager |
+| playbook_draft | qwen2.5:7b-instruct | decision_manager |
+| embedding | nomic-embed-text | embedding_service, knowledge_service |
+
+模型切換：只改 `apps/api/models.json`，重啟 Pod，不改代碼。
+
 ### 各 Provider 特性

 | Provider | 成本 | 特性 | 用途 |
@@ -900,11 +959,225 @@ except Exception as e:

 ---

+---
+
+## Sprint 5.1 Service Registry 模式（ADR-062）
+
+### 有狀態服務分級鐵律
+
+所有自動修復決策必須先查詢 `ops/config/service-registry.yaml`：
+
+```python
+from src.services.service_registry import StatefulLevel, get_service_registry
+
+registry = get_service_registry()
+level = registry.get_stateful_level(service_name)
+
+if level == StatefulLevel.BLOCK:
+    # 直接拒絕，不進入 AI 分析
+    return AutoRepairDecision(can_auto_repair=False, blocked_by="SERVICE_REGISTRY_BLOCK")
+```
+
+### Guardrail 失敗的保守原則
+
+```python
+# ✅ 正確：失敗時 block（保守，優先安全）
+except Exception as e:
+    logger.error("guardrail_check_failed", error=str(e))
+    return AutoRepairDecision(can_auto_repair=False, blocked_by="GUARDRAIL_ERROR")
+
+# ❌ 錯誤：失敗時放行（穿透 BLOCK 保護）
+except Exception as e:
+    logger.error(...)
+    pass  # 繼續執行 — 違反安全原則！
+```
+
+### 新 Service 的標準樣板（首席審查教訓）
+
+每個新建 Service **必須全部符合**：
+
+```python
+import structlog  # ✅ 不是 import logging
+from src.utils.timezone import now_taipei  # ✅ 不是 datetime.now(UTC)
+
+logger = structlog.get_logger(__name__)  # ✅ structlog
+
+_client: MyClient | None = None
+
+def get_my_client() -> MyClient:      # ✅ singleton
+    global _client
+    if _client is None:
+        _client = MyClient()
+    return _client
+
+def set_my_client(c: MyClient) -> None:  # ✅ DI setter（測試注入）
+    global _client
+    _client = c
+```
+
+所有通知方法必須包覆 try/except，失敗只 log 不拋出：
+
+```python
+async def send_xxx_notification(self, ...) -> None:
+    try:
+        text = ...
+        await self.send_notification(text)
+    except Exception as e:
+        logger.error("xxx_notify_failed", error=str(e))  # ✅ 不拋出
+```
+
+---
+
+## 告警規則引擎 (ADR-064, 2026-04-09)
+
+**模組**: `apps/api/src/services/alert_rule_engine.py`  
+**配置**: `apps/api/alert_rules.yaml`
+
+### 規則匹配
+
+```python
+from src.services.alert_rule_engine import match_rule
+result = match_rule(alert_context)  # dict | None
+# result["rule_id"] == "generic_fallback" → AI 自動學習
+```
+
+### AI 自動規則學習
+
+命中 `generic_fallback` 時，在上層 **async** 方法觸發：
+
+```python
+asyncio.create_task(auto_generate_rule(
+    alert_context,
+    ollama_url=settings.OLLAMA_URL,      # DI 注入
+    model=settings.OPENCLAW_DEFAULT_MODEL,
+    gemini_api_key=getattr(settings, "GEMINI_API_KEY", ""),
+))
+```
+
+⚠️ **禁止在 sync 方法中呼叫 asyncio.get_event_loop()** — 必須在 async 上下文用 `asyncio.create_task()`
+
+### Priority 體系
+
+| 範圍 | 用途 |
+|------|------|
+| 1–499 | 手寫規則（不被 AI 覆蓋） |
+| 500–890 | AI 自動生成規則 |
+| 999 | generic_fallback 通用兜底 |
+
+### get_incident_type() — incident_type 三層推斷 (I1, 2026-04-11)
+
+```python
+from src.services.alert_rule_engine import get_incident_type
+incident_type = get_incident_type(alertname)
+# Layer 1: YAML rule.incident_type（需明確設定）
+# Layer 2: ALERTNAME_TO_TYPE 靜態 dict（src/constants/alert_types.py，56 筆）
+# Layer 3: "custom" 兜底
+```
+
+**禁止**：使用 `ALERTNAME_TO_TYPE.get(alertname, "custom")` 直接在 Router 層存取靜態 dict。
+**必須**：呼叫 `get_incident_type()` 讓 YAML 規則有機會優先匹配。
+
+**YAML rule.id ≠ incident_type**（命名空間不同）。YAML 無 `incident_type` 欄位時自動 fall through Layer 2。
+
+### 多 Pod 限制（ADR-064 L1/L2）
+
+`_generating` set 進程級去重，多 Pod 可能重複生成。新規則 append 後只有寫入 Pod 立即生效，其他 Pod 需重啟。
+
+### DI 要求
+
+`auto_generate_rule()` 透過參數接收 ollama/gemini 設定，**禁止** 在函式內 `from src.core.config import settings`。
+
+---
+
+## 🚀 自動修復飛輪鐵律 (ADR-068, 2026-04-10)
+
+> **背景**: 25 個 AUTO_REPAIR_TRIGGERED 全部 NO_MATCH — 五個根因同時存在
+
+### 1. affected_services 提取鐵律
+
+**禁止**將 `target_resource`（可能是 IP:port 或 alertname）直接填入 `affected_services`。
+
+```python
+# ❌ 絕對禁止（污染 Jaccard 匹配）
+affected_services = [target_resource]  # 可能是 "192.168.0.188:9100" 或 "HostHighCpuLoad"
+
+# ✅ 正確 — 語意提取（在 incident_service.py）
+affected_services = extract_affected_services(labels, target_resource)
+# 優先序: component > job(非基礎設施) > pod(deployment name) > clean target > []
+```
+
+### 2. Signal alert_name 鐵律
+
+```python
+# ❌ 禁止 — alert_name="custom" 讓 Redis index 查詢命中零
+alert_name = alert_type  # "custom"
+
+# ✅ 正確 — 用真實 alertname label
+alert_name = alertname or alert_type  # "HostHighCpuLoad"
+```
+
+### 3. Router 層業務邏輯鐵律
+
+`create_incident_for_approval` 等含 Severity 映射、Signal 建立、Incident 建立的函數**必須**在 Service 層：
+
+```
+# ✅ 正確位置
+apps/api/src/services/incident_service.py  ← create_incident_for_approval()
+                                           ← extract_affected_services()
+
+# ❌ 錯誤位置（已修正）
+apps/api/src/api/v1/webhooks.py            ← 業務邏輯不屬 Router
+```
+
+### 4. Jaccard 空集合豁免鐵律
+
+通用型基礎設施 Playbook（`affected_services=[]`，`severity_range=[]`）代表適用所有情境，**不能**因空集合被 Jaccard 打成 0：
+
+```python
+# apps/api/src/utils/similarity.py — 豁免規則
+"affected_services": 1.0 if not pattern_b.affected_services else jaccard(...)
+"severity": 1.0 if not pattern_b.severity_range or overlap else 0.0
+```
+
+### 5. Playbook alertname 變體鐵律
+
+Playbook 的 `symptom_pattern.alert_names` 必須包含所有真實世界 alertname 變體：
+
+```yaml
+# apps/api/alert_rules.yaml — 每條規則都要加足變體
+- id: high_cpu
+  match:
+    alertname:
+      - HighCPUUsage          # Prometheus 規則名
+      - HostHighCpuLoad       # node-exporter 變體
+      - CPUThrottlingHigh     # K8s 變體
+```
+
+### 6. Embedding 持久化鐵律
+
+Playbook 向量**必須**同時存入 Redis（熱快取）和 `playbook_embeddings`（pgvector 持久化），防止重啟後冷啟動斷層：
+
+```python
+# main.py lifespan 啟動時（非阻塞）
+asyncio.create_task(ensure_playbook_embeddings_indexed())
+```
+
+Repository 層負責格式化：
+```python
+# ✅ 正確 — PlaybookEmbeddingRepository.upsert()
+vec_str = "[" + ",".join(str(float(x)) for x in embedding) + "]"  # pgvector 安全格式
+
+# ❌ 禁止 — str(embedding) 可能輸出帶空格的格式
+```
+
+---
+
 ## 參考文檔

 - `apps/api/src/core/config.py`: 設定中心
 - `apps/api/src/main.py`: FastAPI 應用入口
 - `apps/api/src/plugins/mcp/mcp_bridge.py`: MCP Bridge 核心
+- `apps/api/alert_rules.yaml`: 告警規則配置（新增規則只改這裡）
 - `packages/lewooogo-data/`: 記憶體 Provider 積木
 - `packages/lewooogo-brain/`: AI 引擎積木
 - `memory/feedback_lewooogo_modular_enforcement.md`: 積木化強制執行鐵律
@@ -914,3 +1187,5 @@ except Exception as e:
 - ADR-006: AI 備援策略
 - ADR-008: Python 模組化獨立積木架構
 - ADR-027: Incident-Approval 同步架構 (UnitOfWork + Saga)
+- ADR-064: Alert Rule Engine — YAML 驅動 + AI 自動學習
+- ADR-068: 飛輪冷啟動斷層修復 — affected_services/Jaccard/Embedding 四階段系統性根治
--- a/.agents/skills/03-openclaw-cognitive-expert.md
+++ b/.agents/skills/03-openclaw-cognitive-expert.md
@@ -526,11 +526,109 @@ NEMOTRON_ASYNC_UPDATE=true           # 異步更新模式

 ---

+## 規則引擎降級路徑 (ADR-064, 2026-04-09)
+
+`_generate_mock_response()` **不是假數據**，是正式降級的規則引擎路徑。
+
+### 降級流程
+
+```
+AI 分析失敗（所有 Provider 失敗）
+  ↓
+_call_with_fallback() 呼叫規則引擎降級
+  ↓
+match_rule(alert_context)
+  ├── 命中具體規則 → rule_id = "docker_container_unhealthy" 等
+  └── 只命中 generic_fallback → rule_id = "generic_fallback"
+        ↓ asyncio.create_task (在 async context)
+      auto_generate_rule() → Ollama → Gemini → append alert_rules.yaml
+```
+
+### 關鍵行為
+
+- `confidence = 0.0` — 規則匹配固定值，**禁止偽造**
+- `suggested_action` 在 Telegram 顯示的是 `kubectl_command`（完整指令），不是 enum 字串
+- 自動生成的規則 priority 500–890，不覆蓋手寫規則 (1–499)
+
+### 新增規則
+
+只需修改 `apps/api/alert_rules.yaml`，重啟 Pod 生效，**不需要改 Python**。
+
+---
+
 ## 參考文檔

 - `apps/api/src/services/incident_engine.py`: 聚合引擎
 - `apps/api/src/services/multi_sig_redis.py`: 分散式狀態
 - `apps/api/src/workers/signal_worker.py`: Event Bus 消費者
 - `apps/api/src/plugins/mcp/mcp_bridge.py`: MCP Bridge
+- `apps/api/alert_rules.yaml`: 告警規則配置
+- `apps/api/src/services/alert_rule_engine.py`: 規則引擎
 - `memory/project_phase13_enterprise_aiops.md`: Phase 13 規劃
 - Phase 6.0-6.3: 認知覺醒計畫
+- ADR-064: Alert Rule Engine
+
+---
+
+## 🆕 2026-04-19 AI Decision LLM 擴展層 (ADR-092)
+
+### 統一 LLM Service Pattern
+
+**Helper**: `apps/api/src/services/llm_json_parser.py`
+
+```python
+from src.services.llm_json_parser import parse_llm_json_response
+from src.services.openclaw import get_openclaw
+
+async def _llm_analyze_xxx(input_data) -> dict[str, Any] | None:
+    try:
+        prompt = _PROMPT.format(**input_data)
+        openclaw = get_openclaw()
+        text, provider, success = await openclaw.call(prompt)
+        if not success or not text:
+            return None
+        parsed = parse_llm_json_response(
+            text,
+            required_key="your_required_key",  # e.g. 'recommended_actions'
+            logger_context="your_service_name",
+        )
+        if parsed:
+            parsed["_llm_provider"] = provider
+        return parsed
+    except Exception as e:
+        logger.warning("xxx_llm_error", error=str(e))
+        return None
+```
+
+**3-path fallback 自動處理**:
+- Path 1: 剝 markdown fence + 直接 JSON
+- Path 2: NemoTron wrapper (description/action_title/reasoning 內嵌 JSON)
+- Path 3: 失敗 return None + logger.warning (不 raise)
+
+### 現有 4 個 LLM Service（擴加時參考 pattern）
+
+| Service | required_key | 用途 | 觸發 |
+|---|---|---|---|
+| `hermes_rule_quality_job` | `recommended_actions` | noisy rule 假報真因 | 每日 04:00 |
+| `capacity_forecaster_job` | `priority_actions` | 容量預測修復策略 | 每日 05:00 |
+| `compliance_scanner_job` | `posture_grade` | 合規態勢評級 A/B/C/D/F | 每日 03:00 |
+| `coverage_evaluator_job` | `worst_dimension` | 補覆蓋缺口建議 | red_ratio > 30% 且 scanned >= 50 |
+
+### 擴加 LLM Service 鐵律 (ADR-092)
+
+1. **失敗永不 raise** — try/except return None, 呼叫者 fallback 硬編規則
+2. **AI 只建議不動作** — output 必設 `requires_human_decision=True`
+3. **openclaw 統一入口** — 不直接呼叫 Ollama/NVIDIA/Gemini
+4. **aol 留痕** — 寫 `automation_operation_log.output.llm_analysis`
+5. **繁中 + JSON schema** — Prompt 明確 required_key
+
+### autonomy_score 追蹤
+
+`GET /api/v1/aiops/kpi` → `ai_autonomy_score.total` (0-100)
+
+5 子項 × 20 分:
+- asset_coverage / rule_quality / capacity_health / automation_flow / ai_diversity
+
+Grade: mature(90+) / in_progress(70-90) / starter(50-70) / initial(<50)
+
+實測 2026-04-19: **63/100 (starter)** — LLM 升級 1/9 → 4/9
--- a/.agents/skills/04-awoooi-devops-commander.md
+++ b/.agents/skills/04-awoooi-devops-commander.md
@@ -35,6 +35,9 @@
 | v2.2 | 2026-03-31 | Claude Code | **📊 K3s 優化成效數據 (告警-100%, Pod 重啟-100%, 48h+穩定)** |
 | v2.3 | 2026-03-31 | Claude Code | **📅 Phase 21 定期報告機制規劃 (Weekly/Daily E2E/K3s Report)** |
 | v2.4 | 2026-03-31 | Claude Code | **🔧 OTEL gRPC vs HTTP 端點區分 (K8s:24317, CI/CD:24318)** |
+| v2.5 | 2026-04-09 | Claude Sonnet 4.6 | **🔴 SSH 自動修復全鏈路 — 雙主機 E2E 閉環 + 12 Bug 修復** |
+| v2.6 | 2026-04-11 | Claude Sonnet 4.6 | **Sprint B-1 Ansible IaC 骨架 + Architecture Review 安全修復** |
+| v2.7 | 2026-04-11 | Claude Sonnet 4.6 | **Sprint B-2/B-3 ArgoCD GitOps + Sprint C Velero/rsync DR + ADR-070 MCP Phase 1-4 全自動 AIOps 閉環 + ADR-071 告警通知四類型** |

 ---

@@ -1197,3 +1200,212 @@ links = DeepLinking.get_all_links(
 - `memory/project_phase15_langfuse.md`: **📊 Phase 15 全部完成**
 - `memory/project_phase17_tech_debt.md`: **🔧 Phase 17 技術債**
 - `src/core/deep_linking.py`: **👁️ Deep Linking URL 生成器**
+- `docs/adr/ADR-058-host-auto-repair-ssh-whitelist.md`: **🔴 SSH 自動修復架構 + Bug 修復記錄**
+- `ops/config/service-registry.yaml`: **服務分級清單 (BLOCK/CRITICAL_HITL/STANDARD_HITL/AUTO)**
+
+---
+
+## 🔴 SSH 自動修復架構 (Sprint 3 + 2026-04-09 Bug 修復)
+
+> **ADR**: ADR-058 (已批准，Appendix A 記錄 Bug 修復)
+> **狀態**: ✅ 雙主機 E2E 驗證通過
+
+### 關鍵基礎設施要求
+
+| 項目 | 設定值 | 說明 |
+|------|-------|------|
+| Dockerfile | `openssh-client` | 生產 stage 必須安裝，ssh binary 才存在 |
+| K8s Pod securityContext | `fsGroup: 1000` | 讓 appuser 有 group read on 0400 Secret |
+| NetworkPolicy egress | port 22 → 110 + 188 | 預設拒絕，必須明確開放 |
+| Secret defaultMode | `0400` (八進位) | SSH 要求 owner-only，group read 靠 fsGroup |
+| known_hosts Secret | `awoooi-repair-known-hosts` | optional: true，含 110+188 hashed 指紋 |
+
+### repair-bot 白名單 (當前完整清單)
+
+**110 主機 (wooo@192.168.0.110)**
+
+| Component | 目錄 |
+|-----------|------|
+| sentry | /opt/sentry |
+| harbor | /home/wooo/harbor/harbor |
+| gitea | /home/wooo/gitea |
+| gitea-runner | /home/wooo/act-runner |
+| langfuse | /home/wooo/langfuse |
+| alertmanager | /home/wooo/monitoring |
+| signoz | /home/wooo/signoz/deploy/docker |
+| stock-platform | /home/wooo/stockPlatform |
+
+**188 主機 (ollama@192.168.0.188)**
+
+| Component | 目錄 |
+|-----------|------|
+| openclaw | /home/ollama/clawbot-v5 |
+| minio | /home/ollama/minio |
+| signoz | /home/ollama/signoz/deploy/docker |
+| momo-app | /home/ollama/momo-pro |
+| tsenyang-website | /home/ollama/services/tsenyang |
+| bitan-app | /home/ollama/services/bitan |
+
+### 修改 repair-bot 白名單 SOP
+
+1. 確認 compose dir 在目標主機存在
+2. SSH 到目標主機 `sed -i` 修改 `~/bin/repair-bot-{110|188}.sh`
+3. 用 `SSH_ORIGINAL_COMMAND=health ~/bin/repair-bot-xxx.sh` 驗證
+4. 同步更新 `ops/config/service-registry.yaml`
+5. commit + push gitea
+
+### 新增修復主機 SOP
+
+1. 在目標主機建立 `~/bin/repair-bot-{host}.sh`（複製模板）
+2. 將 `awoooi-repair-ssh-key.pub` 加入 `~/.ssh/authorized_keys`（加 `command=` 限制）
+3. `ssh-keyscan -H {host_ip}` → 更新 `awoooi-repair-known-hosts` Secret
+4. NetworkPolicy 新增 `{host_ip}:22` egress
+5. `LAYER_SSH_CONFIG` 新增 layer 設定（`host_repair_agent.py`）
+6. service-registry.yaml 新增服務分級
+
+### 常見陷阱 (血的教訓)
+
+```
+❌ target_resource 用 instance (IP:port) → Jaccard 服務比對為 0
+✅ 必須優先取 labels.component，再 fallback 到 pod、instance
+
+❌ kubectl apply 06-deployment-api.yaml → IMAGE_TAG_PLACEHOLDER 覆蓋真實 SHA → ImagePullBackOff
+✅ 修改 K8s Deployment 配置用 kubectl patch，不用 kubectl apply
+
+❌ known_hosts hashed 格式，grep IP 會得 0 → 以為沒寫進去
+✅ 用 wc -l 或 ssh 實測驗證，hashed 格式是正常的
+
+❌ StrictHostKeyChecking=no（舊設定）
+✅ known_hosts Secret 已建立，改用 StrictHostKeyChecking=yes
+```
+
+---
+
+## 🏗️ Sprint B — Ansible Host IaC (2026-04-11)
+
+> **ADR**: ADR-069 Sprint B
+> **狀態**: B-1 ✅ 骨架完成；B-2/B-3 待開工
+
+### 目錄結構
+
+```
+infra/ansible/
+├── inventory/
+│   ├── hosts.yml              # 5 主機（110/188/120/121/112）
+│   └── group_vars/
+│       ├── all.yml            # 共用變數（github_runner_count 等）
+│       ├── host_110.yml       # swap/docker/keepalived BACKUP
+│       └── host_188.yml       # docker/keepalived MASTER
+├── playbooks/
+│   ├── site.yml               # 全站入口
+│   ├── 110-devops.yml         # 110 預期狀態收斂
+│   ├── 188-ai-web.yml         # 188 預期狀態收斂
+│   └── nginx-sync.yml         # Nginx conf 同步（188 single source of truth）
+└── roles/
+    ├── nginx/
+    │   ├── tasks/main.yml
+    │   └── templates/188-all-sites.conf.j2
+    ├── docker-compose-service/tasks/main.yml
+    ├── swap/tasks/main.yml
+    └── pm2-service/tasks/main.yml
+```
+
+### 執行方式
+
+```bash
+# 全站收斂
+ansible-playbook -i inventory/hosts.yml playbooks/site.yml
+
+# 單主機
+ansible-playbook -i inventory/hosts.yml playbooks/110-devops.yml
+ansible-playbook -i inventory/hosts.yml playbooks/188-ai-web.yml
+
+# nginx 同步（需 vault password）
+ansible-playbook -i inventory/hosts.yml playbooks/nginx-sync.yml --tags 188
+
+# 乾跑
+ansible-playbook -i inventory/hosts.yml playbooks/site.yml --check
+```
+
+### SSH MCP Provider 安全規則 (ADR-071 MCP-2a)
+
+Architecture Review 發現的安全要求（2026-04-11）：
+
+1. **所有字串參數必須通過 `_validate_param()` 白名單驗證**
+   - container_name/service: `[a-zA-Z0-9._-]{1,128}`
+   - compose_dir: 必須以 `/opt/` 或 `/srv/` 開頭，禁止 `..`
+   - domain: FQDN 白名單
+   - 數值參數: int() + 上下限夾緊
+
+2. **known_hosts 驗證**
+   - 設定 `SSH_MCP_KNOWN_HOSTS_FILE` 環境變數指向 `ssh-keyscan` 產生的文件
+   - 未設定時會 warning log，但不阻擋（內網快速啟動模式）
+
+3. **群組 B 工具需 trust_score >= 0.8**（硬編碼守衛）
+
+---
+
+## 🚀 Sprint C — DR 備份與恢復 (2026-04-11) ✅
+
+> **ADR**: ADR-069 Sprint C  
+> **目標**: 任意單點失效 15 分鐘內可恢復
+
+### Velero K8s 備份
+- 狀態: ✅ 已運作 13d，daily-awoooi-prod schedule，MinIO Available
+- 驗證: `velero backup get` → Completed
+
+### rsync Host 備份
+- 腳本: `scripts/ops/backup-from-110.sh`
+- 部署: 188 `~/backup-from-110.sh`，cron `0 1 * * *`
+- 環境變數: `BACKUP_ROOT=/home/ollama/backup/110`
+- 告警: `HostBackupFailed` Prometheus rule
+
+### DR SOP 文件
+- `docs/runbooks/dr-k8s-restore.md`
+- `docs/runbooks/dr-nginx-restore.md`
+- `docs/runbooks/dr-harbor-restore.md`
+- `docs/runbooks/dr-bitan-restore.md`
+- `docs/runbooks/dr-stock-restore.md`
+
+---
+
+## 🤖 ADR-070 全自動 AIOps 閉環 — MCP Phase 1-4 (2026-04-11) ✅
+
+> 10 MCP Providers 全部生產驗收完成
+
+### Provider 清單
+
+| Provider | 工具數 | 用途 |
+|---------|--------|------|
+| kubernetes | 10 | Pod/Deployment/HPA/Node 操作 |
+| signoz | 3 | APM 查詢 |
+| database | 3 | Approval/Incident DB 查詢 |
+| filesystem | 5 | 安全受限日誌讀取 |
+| grafana | 3 | Dashboard 查詢 |
+| runbooks | 2 | RAG 知識庫搜尋 |
+| prometheus | 3 | 即時指標查詢（110:9090）|
+| ssh_host | 15 | 主機層 SSH 診斷+操作 |
+| argocd | 3 | GitOps 狀態查詢（125:30443）|
+| sentry | 3 | 錯誤追蹤查詢 |
+
+### 關鍵 ConfigMap 設定
+```yaml
+SSH_MCP_ENABLED: "true"
+SSH_MCP_KNOWN_HOSTS_FILE: "/etc/ssh-mcp/known_hosts"
+ARGOCD_MCP_ENABLED: "true"
+ARGOCD_URL: "https://192.168.0.125:30443"
+SENTRY_MCP_ENABLED: "true"
+PROMETHEUS_URL: "http://192.168.0.110:9090"
+```
+
+### 關鍵 K8s Secrets
+```
+ARGOCD_API_TOKEN   ✅
+SENTRY_AUTH_TOKEN  ✅
+SENTRY_DSN         ✅ (http://192.168.0.110:9000/3 內網 HTTP)
+ssh-mcp-key        ✅ (ssh_mcp_key + known_hosts)
+```
+
+### Runbook
+`docs/runbooks/ssh-mcp-setup.md`
+
--- a/.agents/skills/05-awoooi-sre-qa.md
+++ b/.agents/skills/05-awoooi-sre-qa.md
@@ -708,6 +708,87 @@ def validate_traditional_chinese(response: str) -> bool:

 ---

+## 🔴 自動修復 E2E 驗收規範 (2026-04-09)
+
+> **背景**: 系統曾有自動修復機制卻從未成功執行（success_count 全部為 0），完整審計後修復 12 個阻斷性 Bug
+> **教訓**: Playbook 匹配成功 ≠ SSH 執行成功，必須端到端驗收
+
+### 自動修復完整鏈路
+
+```
+Alertmanager → POST /api/v1/webhooks/alertmanager
+  → LLM 分析 (Nemotron) + _extract_symptoms()
+    → {alert_names, affected_services, keywords}
+    ⚠️ affected_services 必須取 labels.component，不能用 labels.instance (IP:port)
+  → playbook_service.get_recommendations() — Jaccard 相似度
+    → alert_exact_match bypass: alert_names 完全匹配時忽略 0.4 門檻
+  → evaluate_auto_repair() — 查 service-registry 分級
+    → BLOCK → 僅告警; AUTO → 直接執行
+  → HostRepairAgent.repair(layer, component)
+    → SSH: ssh -i /etc/repair-ssh/id_ed25519 wooo@192.168.0.110 repair:sentry
+    → repair-bot.sh → docker compose up -d → REPAIR_OK:sentry
+```
+
+### E2E 驗收 Checklist
+
+```bash
+# Step 1: 確認 SSH binary 存在
+POD=$(kubectl -n awoooi-prod get pod -l app=awoooi-api -o jsonpath='{.items[0].metadata.name}')
+kubectl -n awoooi-prod exec $POD -- which ssh  # 必須有輸出
+
+# Step 2: 確認 SSH key 可讀
+kubectl -n awoooi-prod exec $POD -- ls -la /etc/repair-ssh/id_ed25519
+# 預期: -r--r----- 1 root appuser ... (fsGroup=1000 生效)
+
+# Step 3: 確認 known_hosts 有內容
+kubectl -n awoooi-prod exec $POD -- wc -l /etc/repair-known-hosts/known_hosts
+# 預期: 9 (hashed 格式，grep IP 會得 0 — 正常)
+
+# Step 4: SSH 健康確認
+kubectl -n awoooi-prod exec $POD -- sh -c \
+  "ssh -i /etc/repair-ssh/id_ed25519 \
+   -o UserKnownHostsFile=/etc/repair-known-hosts/known_hosts \
+   -o StrictHostKeyChecking=yes -o BatchMode=yes -o ConnectTimeout=10 \
+   wooo@192.168.0.110 health"
+# 預期: REPAIR_BOT_HEALTHY:110
+
+# Step 5: Webhook 觸發（新 fingerprint 避免去重）
+curl -X POST http://192.168.0.120:32334/api/v1/webhooks/alertmanager \
+  -H "Content-Type: application/json" \
+  -d '{"alerts":[{"labels":{"alertname":"SentryDown","component":"sentry",
+       "severity":"critical"},"fingerprint":"e2e-test-001","status":"firing",
+       "startsAt":"2026-04-09T00:00:00Z","endsAt":"0001-01-01T00:00:00Z"}]}'
+
+# Step 6: 確認 log
+kubectl -n awoooi-prod logs -l app=awoooi-api --tail=50 | \
+  grep -E "REPAIR_OK|auto_repair_execute_success|auto_repair_approved"
+```
+
+### Playbook symptom_pattern 要求
+
+```json
+{
+  "alert_names": ["SentryDown"],       // ← alert_exact_match key，完全匹配才能 bypass
+  "affected_services": ["sentry"],     // ← 必須與 labels.component 一致，不是 instance
+  "severity_range": ["P1", "P2"],
+  "label_patterns": {"component": "sentry"},
+  "keywords": ["sentry", "9000"]
+}
+```
+
+### 自動修復被阻斷的診斷方法
+
+| 症狀 | 可能原因 | 診斷指令 |
+|------|---------|---------|
+| `auto_repair_approved` 沒出現 | Jaccard 分數 < 0.4 | 查 log `similarity` 欄位 |
+| `can_auto_repair: false` | service-registry BLOCK/HITL | 查 `blocked_by` 欄位 |
+| `ssh: command not found` | Dockerfile 缺 openssh-client | Pod exec `which ssh` |
+| `Permission denied (publickey)` | known_hosts 缺少該主機 | Pod exec SSH 看錯誤訊息 |
+| `Load key ... Permission denied` | fsGroup 未設定 | Pod exec `ls -la /etc/repair-ssh/` |
+| `Connection refused/timeout` | NetworkPolicy 封鎖 22 | Pod exec `ssh -v` 看連線過程 |
+
+---
+
 ## 參考文檔

 - `apps/web/playwright.config.ts`: Playwright 設定
@@ -720,5 +801,6 @@ def validate_traditional_chinese(response: str) -> bool:
 - `memory/feedback_runner_zombie_process.md`: **🚨 Runner 殭屍進程修復**
 - `docs/adr/ADR-018-llm-testing-strategy.md`: **🧠 LLM 測試策略 (Deferred)**
 - `docs/adr/ADR-019-system-prompt-management.md`: **📝 System Prompt 集中管理**
+- `docs/adr/ADR-058-host-auto-repair-ssh-whitelist.md`: **🔴 SSH 自動修復 + Bug 修復記錄**
 - `.github/workflows/nightly-llm.yaml`: **🌙 Nightly LLM 測試**
 - `.github/workflows/daily-e2e-health.yaml`: **🏥 Daily E2E 健康檢查**
--- a/.aiderignore
+++ b/.aiderignore
@@ -0,0 +1,60 @@
+# ===== AWOOOI .aiderignore =====
+# 目的：縮小 Aider repo-map（1,165 → ~678 檔），只保留 AI 常編輯的程式碼
+# 建立：2026-04-19
+# 可逆：刪除或註解任何一行即恢復；臨時需要可用 /add <path> 繞過
+
+# --- 二進位/媒體 ---
+*.png
+*.jpg
+*.jpeg
+*.gif
+*.svg
+*.ico
+*.pdf
+*.woff*
+*.ttf
+.playwright-mcp/
+
+# --- Aider/IDE 快取 ---
+.aider.chat.history.md
+.aider.input.history
+.aider.tags.cache.v4/
+.DS_Store
+
+# --- 文件類（244 檔 / 11MB，AI 很少動）---
+docs/adr/
+docs/meetings/
+docs/proposals/
+docs/runbooks/
+docs/screenshots/
+docs/superpowers/
+docs/LOGBOOK.md
+architecture/
+
+# --- 基礎設施（DevOps 時用 --subtree-only 或臨時拿掉）---
+k8s/
+infra/
+ops/
+scripts/backup/
+scripts/reboot-recovery/
+
+# --- CI/CD 設定 ---
+.gitea/
+.github/
+.turbo/
+.pytest_cache/
+.ruff_cache/
+
+# --- Agents/Skills 描述文件 ---
+.agents/
+.superpowers/
+.awoooi-agent-rules.md
+GLOBAL_RULES.md
+SOUL.md
+capabilities.json
+
+# --- Lock files ---
+package-lock.json
+yarn.lock
+pnpm-lock.yaml
+*.snap
--- a/.dockerignore
+++ b/.dockerignore
@@ -0,0 +1,52 @@
+# 首席架構師 Review I1 (2026-04-05 Claude Code)
+# 防止無關檔案射入 Docker build context，縮短 context 傳輸時間
+# 並防止 .playwright-mcp/ PNG/HTML 等大檔案造成 layer hash 不必要失效
+
+# Git
+.git
+.gitignore
+
+# CI/CD
+.gitea
+.github
+
+# 開發工具
+.playwright-mcp
+.vscode
+.idea
+*.log
+*.tmp
+
+# 文件與腳本（不需要進 image）
+# 注意: docs/runbooks/, docs/adr/, .agents/skills/ 供 RAG 索引 (ADR-067 Phase 33)
+# scripts/ 大部分不需要進 image，但 CronJob 腳本需要
+# 2026-04-12 ogt (ADR-073 P2-1): 白名單允許 cron_km_vectorize.py
+scripts
+!scripts/cron_km_vectorize.py
+
+# Node 快取（monorepo 根目錄）
+node_modules
+
+# Python 快取
+__pycache__
+*.pyc
+*.pyo
+.venv
+.pytest_cache
+.mypy_cache
+dist
+*.egg-info
+
+# 測試結果
+test-results
+coverage
+.coverage
+
+# 環境變數（絕對不能進 image）
+.env
+.env.*
+apps/api/.env
+apps/web/.env*
+
+# memory/ADR（不影響 build）
+memory
--- a/.gitea/workflows/ansible-lint.yml
+++ b/.gitea/workflows/ansible-lint.yml
@@ -0,0 +1,22 @@
+name: Ansible Lint
+
+on:
+  push:
+    paths:
+      - 'infra/ansible/**'
+  pull_request:
+    paths:
+      - 'infra/ansible/**'
+
+jobs:
+  lint:
+    runs-on: self-hosted
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install ansible-lint
+        run: pip install ansible-lint
+
+      - name: Run ansible-lint
+        run: ansible-lint infra/ansible/playbooks/
+        working-directory: ${{ github.workspace }}
--- a/.gitea/workflows/cd.yaml
+++ b/.gitea/workflows/cd.yaml
@@ -12,12 +12,24 @@ name: CD Pipeline
 on:
  push:
    branches: [main]
+    paths:
+      # 只有實際影響部署的程式碼才觸發 CD
+      - 'apps/**'
+      - 'k8s/**'
+      - '.gitea/workflows/**'
+      - '.dockerignore'
+      # docs/、memory/、ADR 等不觸發
+      # ops/monitoring/alerts-unified.yml 由 deploy-alerts.yaml 獨立處理 (I3)
  workflow_dispatch:
+    # 手動觸發永遠可用（用於補跑、緊急部署）

-# 2026-03-30 ogt: 佇列模式 - 等待前一個 run 完成，不取消
+# 2026-04-02 Claude Code: 改為搶佔模式 — 新 push 立即取消舊 build，只部署最新
+# 原理: concurrency group 保證同時只有一個 job 跑；cancel-in-progress:true 讓新的取代舊的
+# 解決: 多個 commit 快速連推時不再排隊堆積，且 docker build 卡住時不會阻塞後續部署
+# 安全: deploy 步驟本身有 kubectl rollout status 保護，不會出現半部署狀態
 concurrency:
  group: cd-deploy-${{ github.ref }}
-  cancel-in-progress: false
+  cancel-in-progress: true

 env:
  HARBOR: 192.168.0.110:5000
@@ -30,8 +42,13 @@ env:

 jobs:
  build-and-deploy:
-    # 2026-04-02 Claude Code: 修正為 self-hosted (ADR-039 鐵律 + feedback_github_billing.md)
-    runs-on: self-hosted
+    # 2026-04-02 ogt: Gitea runner label 是 ubuntu-latest (非 GitHub 的 self-hosted)
+    # ADR-039 鐵律: 使用自建 runner，但 Gitea label matching 不同於 GitHub
+    # 2026-04-02 Claude Code: 加入 timeout 防止 docker build/push 卡住超過 45 分鐘
+    timeout-minutes: 45
+    runs-on: ubuntu-latest
+    # 2026-04-10 ogt: B5 改用 docker run 本地啟動，移除 services: 宣告
+    # Gitea act runner 的 services: container name 為空，導致 CI 失敗
    steps:
      - uses: actions/checkout@v4

@@ -44,13 +61,17 @@ jobs:
          echo "start_time=$(date +%s)" >> $GITHUB_OUTPUT

      - name: Notify Pipeline Start
-        env:
-          TG_MSG: "🚀 <b>AWOOOI 部署開始</b>\n├ 📝 ${{ steps.commit.outputs.message }}\n├ 🔖 <code>${{ steps.commit.outputs.short_sha }}</code>\n├ 👤 ${{ github.actor }}\n└ 🌿 main"
+        # 2026-04-16 ogt + Claude Sonnet 4.6: 改用 HTML 結構化格式，提升可讀性
        run: |
-          printf '%b' "$TG_MSG" | curl -fS -X POST "https://api.telegram.org/bot${{ secrets.TELEGRAM_BOT_TOKEN }}/sendMessage" \
-            -d "chat_id=${{ secrets.TELEGRAM_CHAT_ID }}" \
-            -d "parse_mode=HTML" \
-            --data-urlencode "text@-"
+          COMMIT_MSG="${{ steps.commit.outputs.message }}"
+          SHORT_SHA="${{ steps.commit.outputs.short_sha }}"
+          ACTOR="${{ github.actor }}"
+          # HTML escape commit message（防特殊字元破壞 HTML）
+          COMMIT_ESC=$(echo "$COMMIT_MSG" | sed 's/&/\&amp;/g; s/</\&lt;/g; s/>/\&gt;/g')
+          MSG=$(printf '🚀 <b>AWOOOI 部署開始</b>\n├ 📝 <code>%s</code>\n├ 🔖 <code>%s</code>\n└ 👤 %s' "${COMMIT_ESC}" "${SHORT_SHA}" "${ACTOR}")
+          curl -fS -X POST "https://api.telegram.org/bot${{ secrets.TELEGRAM_BOT_TOKEN }}/sendMessage" \
+            -H "Content-Type: application/json" \
+            -d "$(jq -n --arg c "${{ secrets.TELEGRAM_CHAT_ID }}" --arg t "$MSG" '{chat_id:$c,text:$t,parse_mode:"HTML"}')"



@@ -63,9 +84,30 @@ jobs:
          HASH_FILE=/opt/api-venv/.deps_hash
          CURRENT_HASH=$(md5sum apps/api/pyproject.toml | awk '{print $1}')

-          if [ ! -d "$VENV" ] || [ "$(cat $HASH_FILE 2>/dev/null)" != "$CURRENT_HASH" ]; then
+          # python3.11 是 runner 層級持久安裝，只在首次或版本消失時才 apt-get
+          # 2026-04-05 Claude Code: 分離 apt-get 與 venv hash-guard，避免每次 deps 變更都重跑 apt
+          # 2026-04-16 ogt + Claude Sonnet 4.6: 修復 apt index 失敗 → 改用 --fix-missing + retry
+          if ! command -v python3.11 &>/dev/null; then
+            echo "📦 安裝 python3.11..."
+            apt-get clean && rm -rf /var/lib/apt/lists/*
+            apt-get update -q --fix-missing || apt-get update -q || true
+            apt-get install -y -q python3.11-venv python3.11 || \
+              (add-apt-repository ppa:deadsnakes/python -y 2>/dev/null && apt-get update -q && apt-get install -y -q python3.11-venv python3.11) || true
+          else
+            echo "⚡ python3.11 已安裝，跳過 apt-get"
+          fi
+          # 確保 python3.11 存在，否則 fallback 到系統 python3
+          if ! command -v python3.11 &>/dev/null; then
+            echo "⚠️ python3.11 安裝失敗，使用 python3 fallback"
+            ln -sf "$(which python3)" /usr/local/bin/python3.11 || true
+          fi
+
+          if [ ! -d "$VENV/bin" ] || [ "$(cat $HASH_FILE 2>/dev/null)" != "$CURRENT_HASH" ]; then
            echo "📦 deps 已變更，重建 venv..."
-            python3 -m venv $VENV
+            # 2026-04-17 ogt: /opt/api-venv 是 volume mount，不能 rm -rf 目錄本身
+            # 改用 find 清空內容，保留 mount point 目錄
+            find "$VENV" -mindepth 1 -delete 2>/dev/null || true
+            python3.11 -m venv $VENV
            source $VENV/bin/activate
            pip install -q uv
            cd apps/api && uv pip install -q -e ".[dev]" && cd -
@@ -77,14 +119,78 @@ jobs:

          cd apps/api
          # CI 排除需外部服務的測試 (Redis pool / Ollama — 2026-04-01 Claude Code)
-          pytest tests/ -v --tb=short -x \
+          # 2026-04-05 Claude Code: 修正 exit code — | tail 會吃掉 segfault (exit 139)
+          #   改用 tee + PIPESTATUS[0] 正確捕捉 pytest 本身的 exit code
+          # 2026-04-05 Claude Code: 加 --ignore=tests/integration 排除需 asyncpg 連線的 DB 測試
+          #   integration tests 在 prod K8s 部署後由 E2E Smoke Test 覆蓋
+          # PYTHONFAULTHANDLER=1: 若 C extension segfault，輸出完整 Python stacktrace
+          # 2026-04-05 Claude Code: test_github_webhook.py 已根治
+          #   原問題: import src.main → asyncpg C ext segfault (exit 139)
+          #   修復: 改用最小化 app，只掛載 github_webhook router，不走 DB import chain
+          #   現在可安全加入 CI 測試
+          PYTHONFAULTHANDLER=1 python3.11 -m pytest tests/ -v --tb=short -x \
+            --ignore=tests/integration \
            --ignore=tests/test_anomaly_counter.py \
            --ignore=tests/test_global_repair_cooldown.py \
            --ignore=tests/test_redis_multisig.py \
            --ignore=tests/test_model_regression.py \
            --ignore=tests/test_prompt_validation.py \
-            2>&1 | tail -50
-          echo "✅ API 測試通過"
+            --ignore=tests/e2e_network_test.py \
+            2>&1 | tee /tmp/pytest-output.txt; PYTEST_EXIT=${PIPESTATUS[0]}
+          tail -60 /tmp/pytest-output.txt
+          exit $PYTEST_EXIT
+
+      # ── 整合測試 B5 (2026-04-10) ──────────────────────────────────────────
+      # B5 整合測試 — postgres-test 由 services: 提供，localhost:15432 直連
+      # 2026-04-10 Claude Sonnet 4.6: 用 psql 直連 localhost:15432 初始化 schema
+      #   (docker exec 在 act runner 內無法取得 service container name)
+      # B5: Gitea act runner 的 services: 實作與 GitHub Actions 不同
+      # service container 啟動後需直連，但 act 的 container name 可能為空
+      # 2026-04-10 ogt: 改用 docker run 本地啟動取代 services: 宣告
+      # 2026-04-19 ogt + Claude Opus 4.7: cd 連續 2 次 fail (run 984/985)
+      #   真因: act runner 把 ci-runner 跑在獨立 user-defined network,
+      #         pg-test-b5 預設用 host bridge → 兩邊隔離無法連 (172.17.0.2 timeout)
+      #   修法: 把 pg-test-b5 加入 act task 的 network,用 container name 連線
+      - name: Integration Tests (B5 — 真實 DB)
+        run: |
+          cd apps/api
+          # 安裝 psql client
+          if ! command -v psql &>/dev/null; then
+            apt-get install -y -q postgresql-client
+          fi
+          # 2026-04-19 ogt + Claude Opus 4.7 v3: 主動創 shared network
+          # 之前 grep ACT_NET 在 c0f3509 run 沒 match → fallback bridge → container name DNS 失效
+          # 真因: default bridge 不支援 container name DNS,必須 user-defined network
+          # 修法: 主動建 'b5-test-net' (idempotent),ci-runner + pg-test-b5 都加入
+          B5_NET="b5-test-net"
+          docker network create "$B5_NET" 2>/dev/null || true
+          # 當前 ci-runner container (hostname == short container id) 連上此 network
+          # 若已連 → docker network connect 回 error 1,用 || true 吞掉
+          docker network connect "$B5_NET" "$HOSTNAME" 2>/dev/null || true
+          echo "B5 shared network: $B5_NET (ci-runner hostname: $HOSTNAME)"
+          # 啟動測試 DB 於 shared network,用 container name 'pg-test-b5' 連線
+          docker rm -f pg-test-b5 2>/dev/null || true
+          docker run -d --name pg-test-b5 \
+            --network="$B5_NET" \
+            -e POSTGRES_DB=awoooi_test \
+            -e POSTGRES_USER=awoooi \
+            -e POSTGRES_PASSWORD=awoooi_test_2026 \
+            pgvector/pgvector:pg16
+          # 等待就緒（用 container name,最多 60 秒）
+          for i in $(seq 1 30); do
+            PGPASSWORD=awoooi_test_2026 pg_isready -h pg-test-b5 -p 5432 -U awoooi && break || sleep 2
+          done
+          # 初始化 schema
+          PGPASSWORD=awoooi_test_2026 psql \
+            -h pg-test-b5 -p 5432 -U awoooi -d awoooi_test \
+            -f tests/integration/setup_test_schema.sql
+          # 跑測試
+          # B5 整合測試嚴格模式 (2026-04-13 ogt: 恢復 Break-Glass 移除)
+          # -m integration: override pyproject.toml addopts "-m 'not integration'"，讓標記測試可執行
+          TEST_DATABASE_URL="postgresql+asyncpg://awoooi:awoooi_test_2026@pg-test-b5:5432/awoooi_test?ssl=disable" \
+            /opt/api-venv/bin/pytest tests/integration/test_b5_core_flows.py -v --tb=short -m integration
+          # 清理
+          docker rm -f pg-test-b5 || true

      - name: Login to Harbor
        uses: docker/login-action@v3
@@ -96,7 +202,11 @@ jobs:
      # ── API 鏡像建置（含 Layer Cache 加速）──────────────────────────────
      # 2026-04-01 ogt: CACHE_BUST=git_sha 確保 src/ 和 models.json 層每次重建
      # deps 層 (pip install) 仍可 cache → 加速；代碼/配置層強制失效
+      # 首席架構師 Review C1 (2026-04-05 Claude Code): 補 DOCKER_BUILDKIT=1
+      # BUILDKIT_INLINE_CACHE=1 只有在 BuildKit 啟用時才有效
      - name: Build and Push API
+        env:
+          DOCKER_BUILDKIT: "1"
        run: |
          docker build -f apps/api/Dockerfile \
            --build-arg BUILDKIT_INLINE_CACHE=1 \
@@ -112,10 +222,13 @@ jobs:

      # ── Web 鏡像建置（精準快取失效）──────────────────────────────
      # 2026-03-30 ogt: NEXT_PUBLIC_* 必須用公網域名 (build-time 寫死)
-      # 2026-04-01 Claude Code: 改用 CACHE_BUST=git_sha 取代 --no-cache
+      # 2026-04-01 Claude Code: CACHE_BUST=git_sha 取代 --no-cache
      # - deps 層 (pnpm install) 仍可 cache → 節省 ~2-3 min
-      # - COPY . . 以下由 CACHE_BUST 強制失效 → CSRF fix 等代碼變更正確進入 bundle
+      # - COPY . . 以下由 CACHE_BUST 強制失效 → 業務邏輯/CSRF 等變更正確進入 bundle
+      # 2026-04-12 ogt: 實測 --no-cache=10m50s；CACHE_BUST=5m50s，恢復此方案
      - name: Build and Push Web
+        env:
+          DOCKER_BUILDKIT: "1"
        run: |
          docker build -f apps/web/Dockerfile \
            --build-arg NEXT_PUBLIC_API_URL=https://awoooi.wooo.work \
@@ -144,11 +257,34 @@ jobs:
          LANGFUSE_SECRET_KEY: ${{ secrets.LANGFUSE_SECRET_KEY }}
          # 2026-04-02 Claude Code: Telegram 白名單 (授權簽核用)
          TG_USER_WHITELIST: ${{ secrets.OPENCLAW_TG_USER_WHITELIST }}
+          # Phase O-4.1 2026-04-02: Sentry API Token (Wave A.1 ADR-037)
+          SENTRY_AUTH_TOKEN: ${{ secrets.SENTRY_AUTH_TOKEN }}
+          # ADR-059 2026-04-05: Gitea Webhook Secret (GITEA_ 前綴為保留字，改用 AWOOOI_ 前綴)
+          GITEA_WEBHOOK_SECRET: ${{ secrets.AWOOOI_GITEA_WEBHOOK_SECRET }}
+          # MCP Phase 3: ArgoCD API Token (2026-04-11 Claude Sonnet 4.6)
+          ARGOCD_API_TOKEN: ${{ secrets.ARGOCD_API_TOKEN }}
+          # 2026-04-18 ogt + Claude Opus 4.7: ADR-090-B L3-only 升級 L2（永久連線串 + 應用 secret）
+          DATABASE_URL: ${{ secrets.DATABASE_URL }}
+          MIGRATION_DATABASE_URL: ${{ secrets.MIGRATION_DATABASE_URL }}
+          REDIS_URL: ${{ secrets.REDIS_URL }}
+          JWT_SECRET: ${{ secrets.JWT_SECRET }}
+          JWT_ALGORITHM: ${{ secrets.JWT_ALGORITHM }}
+          WEBHOOK_HMAC_SECRET: ${{ secrets.WEBHOOK_HMAC_SECRET }}
+          SENTRY_DSN: ${{ secrets.SENTRY_DSN }}
+          CLAUDE_API_KEY: ${{ secrets.CLAUDE_API_KEY }}
+          # AWOOOI_ 前綴避開 Gitea 保留字（同 AWOOOI_GITEA_WEBHOOK_SECRET 模式）
+          GITEA_API_TOKEN: ${{ secrets.AWOOOI_GITEA_API_TOKEN }}
+          NEMOTRON_BOT_TOKEN: ${{ secrets.NEMOTRON_BOT_TOKEN }}
+          OPENCLAW_BOT_TOKEN: ${{ secrets.OPENCLAW_BOT_TOKEN }}
+          SMTP_HOST: ${{ secrets.SMTP_HOST }}
+          SRE_GROUP_CHAT_ID: ${{ secrets.SRE_GROUP_CHAT_ID }}
        run: |
+          # S1/S2: 統一命名 deploy_key，改用 ssh-keyscan（比 StrictHostKeyChecking=no 更安全）
          mkdir -p ~/.ssh
          echo "$SSH_PRIVATE_KEY" > ~/.ssh/deploy_key
          chmod 600 ~/.ssh/deploy_key
-          ssh -o StrictHostKeyChecking=no -i ~/.ssh/deploy_key wooo@192.168.0.121 << SECRETS
+          ssh-keyscan 192.168.0.121 >> ~/.ssh/known_hosts 2>/dev/null
+          ssh -i ~/.ssh/deploy_key wooo@192.168.0.121 << SECRETS
          set -e
          export KUBECONFIG=/etc/rancher/k3s/k3s.yaml

@@ -195,45 +331,238 @@ jobs:
            ]' && echo "✅ TG_USER_WHITELIST 已注入" || echo "⚠️ TG_USER_WHITELIST patch 失敗"
          fi

+          # Phase O-4.1 2026-04-02: Sentry Auth Token (Wave A.1 ADR-037)
+          if [ -n "${SENTRY_AUTH_TOKEN}" ]; then
+            sudo kubectl patch secret awoooi-secrets -n awoooi-prod --type='json' -p='[
+              {"op":"add","path":"/data/SENTRY_AUTH_TOKEN","value":"'$(echo -n "${SENTRY_AUTH_TOKEN}" | base64 -w 0)'"}
+            ]' && echo "✅ SENTRY_AUTH_TOKEN 已注入" || echo "⚠️ SENTRY_AUTH_TOKEN patch 失敗"
+          else
+            echo "⚠️ SENTRY_AUTH_TOKEN 未設定，Sentry Comment API 將跳過"
+          fi
+
+          # ADR-059 2026-04-05 Claude Code: Gitea Webhook Secret
+          if [ -n "${GITEA_WEBHOOK_SECRET}" ]; then
+            sudo kubectl patch secret awoooi-secrets -n awoooi-prod --type='json' -p='[
+              {"op":"add","path":"/data/GITEA_WEBHOOK_SECRET","value":"'$(echo -n "${GITEA_WEBHOOK_SECRET}" | base64 -w 0)'"}
+            ]' && echo "✅ GITEA_WEBHOOK_SECRET 已注入" || echo "⚠️ GITEA_WEBHOOK_SECRET patch 失敗"
+          else
+            echo "⚠️ GITEA_WEBHOOK_SECRET 未設定，Gitea Webhook 簽章驗證將在 prod 失效"
+          fi
+
+          # MCP Phase 3: ArgoCD API Token (2026-04-11 Claude Sonnet 4.6)
+          if [ -n "${ARGOCD_API_TOKEN}" ]; then
+            sudo kubectl patch secret awoooi-secrets -n awoooi-prod --type='json' -p='[
+              {"op":"add","path":"/data/ARGOCD_API_TOKEN","value":"'$(echo -n "${ARGOCD_API_TOKEN}" | base64 -w 0)'"}
+            ]' && echo "✅ ARGOCD_API_TOKEN 已注入" || echo "⚠️ ARGOCD_API_TOKEN patch 失敗"
+          else
+            echo "⚠️ ARGOCD_API_TOKEN 未設定，ArgoCD MCP 將使用空 token"
+          fi
+
+          # ============================================================================
+          # ADR-090-B 2026-04-18 ogt + Claude Opus 4.7: L3-only 升級 L2（13 個 key）
+          # ============================================================================
+          # 目的: 消滅「只存 K8s etcd 單點」的災難盲區，Gitea Secret 成為正式真相來源
+          # 注意: 每個 block 與上方維持相同結構（if guard + base64 -w 0 + json patch）
+
+          # DATABASE_URL — PG 應用連線串（2026-04-18 輪替）
+          if [ -n "${DATABASE_URL}" ]; then
+            sudo kubectl patch secret awoooi-secrets -n awoooi-prod --type='json' -p='[
+              {"op":"add","path":"/data/DATABASE_URL","value":"'$(echo -n "${DATABASE_URL}" | base64 -w 0)'"}
+            ]' && echo "✅ DATABASE_URL 已注入" || echo "⚠️ DATABASE_URL patch 失敗"
+          else
+            echo "⚠️ DATABASE_URL 未設定，awoooi-api 將無法連 PG"
+          fi
+
+          # MIGRATION_DATABASE_URL — CI migration 用 awoooi_migrator 限權帳號（ADR-090-B）
+          if [ -n "${MIGRATION_DATABASE_URL}" ]; then
+            sudo kubectl patch secret awoooi-secrets -n awoooi-prod --type='json' -p='[
+              {"op":"add","path":"/data/MIGRATION_DATABASE_URL","value":"'$(echo -n "${MIGRATION_DATABASE_URL}" | base64 -w 0)'"}
+            ]' && echo "✅ MIGRATION_DATABASE_URL 已注入" || echo "⚠️ MIGRATION_DATABASE_URL patch 失敗"
+          fi
+
+          # REDIS_URL — Redis 連線（6380 on 188）
+          if [ -n "${REDIS_URL}" ]; then
+            sudo kubectl patch secret awoooi-secrets -n awoooi-prod --type='json' -p='[
+              {"op":"add","path":"/data/REDIS_URL","value":"'$(echo -n "${REDIS_URL}" | base64 -w 0)'"}
+            ]' && echo "✅ REDIS_URL 已注入" || echo "⚠️ REDIS_URL patch 失敗"
+          else
+            echo "⚠️ REDIS_URL 未設定"
+          fi
+
+          # JWT_SECRET / JWT_ALGORITHM — API 認證
+          if [ -n "${JWT_SECRET}" ]; then
+            sudo kubectl patch secret awoooi-secrets -n awoooi-prod --type='json' -p='[
+              {"op":"add","path":"/data/JWT_SECRET","value":"'$(echo -n "${JWT_SECRET}" | base64 -w 0)'"}
+            ]' && echo "✅ JWT_SECRET 已注入" || echo "⚠️ JWT_SECRET patch 失敗"
+          fi
+          if [ -n "${JWT_ALGORITHM}" ]; then
+            sudo kubectl patch secret awoooi-secrets -n awoooi-prod --type='json' -p='[
+              {"op":"add","path":"/data/JWT_ALGORITHM","value":"'$(echo -n "${JWT_ALGORITHM}" | base64 -w 0)'"}
+            ]' && echo "✅ JWT_ALGORITHM 已注入" || echo "⚠️ JWT_ALGORITHM patch 失敗"
+          fi
+
+          # WEBHOOK_HMAC_SECRET — Alertmanager webhook HMAC 簽章
+          if [ -n "${WEBHOOK_HMAC_SECRET}" ]; then
+            sudo kubectl patch secret awoooi-secrets -n awoooi-prod --type='json' -p='[
+              {"op":"add","path":"/data/WEBHOOK_HMAC_SECRET","value":"'$(echo -n "${WEBHOOK_HMAC_SECRET}" | base64 -w 0)'"}
+            ]' && echo "✅ WEBHOOK_HMAC_SECRET 已注入" || echo "⚠️ WEBHOOK_HMAC_SECRET patch 失敗"
+          fi
+
+          # SENTRY_DSN — Sentry 錯誤追蹤（不是 auth token）
+          if [ -n "${SENTRY_DSN}" ]; then
+            sudo kubectl patch secret awoooi-secrets -n awoooi-prod --type='json' -p='[
+              {"op":"add","path":"/data/SENTRY_DSN","value":"'$(echo -n "${SENTRY_DSN}" | base64 -w 0)'"}
+            ]' && echo "✅ SENTRY_DSN 已注入" || echo "⚠️ SENTRY_DSN patch 失敗"
+          fi
+
+          # CLAUDE_API_KEY — Claude 備援 LLM
+          if [ -n "${CLAUDE_API_KEY}" ]; then
+            sudo kubectl patch secret awoooi-secrets -n awoooi-prod --type='json' -p='[
+              {"op":"add","path":"/data/CLAUDE_API_KEY","value":"'$(echo -n "${CLAUDE_API_KEY}" | base64 -w 0)'"}
+            ]' && echo "✅ CLAUDE_API_KEY 已注入" || echo "⚠️ CLAUDE_API_KEY patch 失敗"
+          fi
+
+          # GITEA_API_TOKEN — Gitea API Token（從 AWOOOI_GITEA_API_TOKEN 映射）
+          if [ -n "${GITEA_API_TOKEN}" ]; then
+            sudo kubectl patch secret awoooi-secrets -n awoooi-prod --type='json' -p='[
+              {"op":"add","path":"/data/GITEA_API_TOKEN","value":"'$(echo -n "${GITEA_API_TOKEN}" | base64 -w 0)'"}
+            ]' && echo "✅ GITEA_API_TOKEN 已注入" || echo "⚠️ GITEA_API_TOKEN patch 失敗"
+          fi
+
+          # NEMOTRON_BOT_TOKEN / OPENCLAW_BOT_TOKEN — 多 Bot 架構
+          if [ -n "${NEMOTRON_BOT_TOKEN}" ]; then
+            sudo kubectl patch secret awoooi-secrets -n awoooi-prod --type='json' -p='[
+              {"op":"add","path":"/data/NEMOTRON_BOT_TOKEN","value":"'$(echo -n "${NEMOTRON_BOT_TOKEN}" | base64 -w 0)'"}
+            ]' && echo "✅ NEMOTRON_BOT_TOKEN 已注入" || echo "⚠️ NEMOTRON_BOT_TOKEN patch 失敗"
+          fi
+          if [ -n "${OPENCLAW_BOT_TOKEN}" ]; then
+            sudo kubectl patch secret awoooi-secrets -n awoooi-prod --type='json' -p='[
+              {"op":"add","path":"/data/OPENCLAW_BOT_TOKEN","value":"'$(echo -n "${OPENCLAW_BOT_TOKEN}" | base64 -w 0)'"}
+            ]' && echo "✅ OPENCLAW_BOT_TOKEN 已注入" || echo "⚠️ OPENCLAW_BOT_TOKEN patch 失敗"
+          fi
+
+          # SMTP_HOST / SRE_GROUP_CHAT_ID
+          if [ -n "${SMTP_HOST}" ]; then
+            sudo kubectl patch secret awoooi-secrets -n awoooi-prod --type='json' -p='[
+              {"op":"add","path":"/data/SMTP_HOST","value":"'$(echo -n "${SMTP_HOST}" | base64 -w 0)'"}
+            ]' && echo "✅ SMTP_HOST 已注入" || echo "⚠️ SMTP_HOST patch 失敗"
+          fi
+          if [ -n "${SRE_GROUP_CHAT_ID}" ]; then
+            sudo kubectl patch secret awoooi-secrets -n awoooi-prod --type='json' -p='[
+              {"op":"add","path":"/data/SRE_GROUP_CHAT_ID","value":"'$(echo -n "${SRE_GROUP_CHAT_ID}" | base64 -w 0)'"}
+            ]' && echo "✅ SRE_GROUP_CHAT_ID 已注入" || echo "⚠️ SRE_GROUP_CHAT_ID patch 失敗"
+          fi
+
+          # 2026-04-06 Claude Code: Sprint 3 T2 — known_hosts Secret (Security Fix A1)
+          # 替換 StrictHostKeyChecking=no，讓 SSH 修復路徑使用已知主機指紋
+          ssh-keyscan -H 192.168.0.110 > /tmp/known_hosts_repair 2>/dev/null
+          ssh-keyscan -H 192.168.0.188 >> /tmp/known_hosts_repair 2>/dev/null
+          if [ -s /tmp/known_hosts_repair ]; then
+            sudo kubectl create secret generic awoooi-repair-known-hosts \
+              -n awoooi-prod \
+              --from-file=known_hosts=/tmp/known_hosts_repair \
+              --dry-run=client -o yaml | sudo kubectl apply -f - \
+              && echo "✅ awoooi-repair-known-hosts Secret 已建立/更新" \
+              || echo "⚠️ awoooi-repair-known-hosts Secret 建立失敗 (非致命)"
+            rm -f /tmp/known_hosts_repair
+          else
+            echo "⚠️ ssh-keyscan 掃描失敗，跳過 known_hosts Secret"
+          fi
+
          echo "✅ 所有 Secrets 注入完成"
          SECRETS

-      # 2026-04-01 ogt: 合併 ConfigMap + Deploy + Health Check 為單一 SSH step
-      # 原本 3 次獨立 SSH 連線 → 節省 ~30s 握手開銷
-      - name: Deploy to K8s
+      # 2026-04-11 Claude Sonnet 4.6 (Sprint B-3 ADR-069):
+      # Deploy 改為 ArgoCD GitOps 模式：更新 kustomization.yaml → git push [skip ci] → ArgoCD sync
+      # 舊做法 (kubectl set image) 與 ArgoCD selfHeal 衝突 — ArgoCD 會 revert 任何直接 kubectl 操作
+      # 新做法流程：
+      #   1. 更新 kustomization.yaml image tag（用 kustomize edit set image）
+      #   2. Apply ConfigMap/ServiceRegistry（不含 Deployment，由 ArgoCD 管）
+      #   3. git commit [skip ci] + push → 觸發 ArgoCD automated sync
+      #   4. 等待 ArgoCD sync + rollout 完成
+      #   5. Health Check
+      - name: Deploy to K8s (ArgoCD GitOps)
        env:
          SSH_PRIVATE_KEY: ${{ secrets.DEPLOY_SSH_KEY }}
+          GITEA_TOKEN: ${{ secrets.CD_PUSH_TOKEN }}
        run: |
-          # Step 1: Apply ConfigMap (stdin pipe，必須獨立)
+          mkdir -p ~/.ssh
+          echo "$SSH_PRIVATE_KEY" > ~/.ssh/deploy_key
+          chmod 600 ~/.ssh/deploy_key
+          ssh-keyscan 192.168.0.121 >> ~/.ssh/known_hosts 2>/dev/null
+
+          IMAGE_TAG="${{ github.sha }}"
+          HARBOR=192.168.0.110:5000
+
+          # ─── Step 1: Apply ConfigMap + ServiceRegistry (ArgoCD 管的是 Deployment，ConfigMap 仍直接 apply) ───
          cat k8s/awoooi-prod/04-configmap.yaml | \
-            ssh -o StrictHostKeyChecking=no -i ~/.ssh/deploy_key wooo@192.168.0.121 \
+            ssh -i ~/.ssh/deploy_key wooo@192.168.0.121 \
            "export KUBECONFIG=/etc/rancher/k3s/k3s.yaml && sudo kubectl apply -f -"
          echo "✅ ConfigMap 已更新"

-          # Step 2: Set images + Rollout + Health Check (合併一次 SSH)
-          ssh -o StrictHostKeyChecking=no -i ~/.ssh/deploy_key wooo@192.168.0.121 << 'DEPLOY'
+          cat k8s/awoooi-prod/15-service-registry-configmap.yaml | \
+            ssh -i ~/.ssh/deploy_key wooo@192.168.0.121 \
+            "export KUBECONFIG=/etc/rancher/k3s/k3s.yaml && sudo kubectl apply -f -"
+          echo "✅ Service Registry ConfigMap 已更新"
+
+          # ─── Step 2: 更新 kustomization.yaml image tag ───
+          # 安裝 kustomize（若未安裝）
+          if ! command -v kustomize &>/dev/null; then
+            curl -sL https://github.com/kubernetes-sigs/kustomize/releases/download/kustomize%2Fv5.3.0/kustomize_v5.3.0_linux_amd64.tar.gz | tar xz -C /usr/local/bin
+          fi
+
+          cd k8s/awoooi-prod
+          # kustomize edit set image 更新 tag
+          kustomize edit set image \
+            192.168.0.110:5000/library/api:IMAGE_TAG_PLACEHOLDER=${HARBOR}/awoooi/api:${IMAGE_TAG}
+          kustomize edit set image \
+            192.168.0.110:5000/library/web:IMAGE_TAG_PLACEHOLDER=${HARBOR}/awoooi/web:${IMAGE_TAG}
+          cd ../..
+
+          # ─── Step 3: git commit [skip ci] + push → 觸發 ArgoCD sync ───
+          git config user.email "cd@awoooi.internal"
+          git config user.name "AWOOOI CD"
+          git add k8s/awoooi-prod/kustomization.yaml
+          git diff --cached --quiet && echo "⚡ kustomization.yaml 無變化，跳過 push" || {
+            git commit -m "chore(cd): deploy ${IMAGE_TAG::7} [skip ci]"
+            # 用 token 推送（避免 SSH key 需要額外設定 push 權限）
+            git remote remove gitea 2>/dev/null || true
+            git remote add gitea http://wooo:${GITEA_TOKEN}@192.168.0.110:3001/wooo/awoooi.git
+            # 先 rebase 避免 non-fast-forward (其他 commit 在 CI 期間已推入)
+            # 2026-04-17 ogt: -X theirs — kustomization.yaml 衝突時採用當次部署的 image tag
+            git fetch gitea main
+            git rebase -X theirs gitea/main
+            git push gitea main
+            echo "✅ kustomization.yaml 已 push，等待 ArgoCD sync..."
+          }
+
+          # ─── Step 4: 等待 ArgoCD sync + rollout ───
+          ssh -i ~/.ssh/deploy_key wooo@192.168.0.121 << 'ARGOCD_WAIT'
          set -e
          export KUBECONFIG=/etc/rancher/k3s/k3s.yaml

-          # 2026-03-30 ogt: sudoers NOPASSWD 已設定，無需密碼
-          sudo kubectl set image deployment/awoooi-api \
-            api=192.168.0.110:5000/awoooi/api:${{ github.sha }} \
-            -n awoooi-prod
-          sudo kubectl set image deployment/awoooi-web \
-            web=192.168.0.110:5000/awoooi/web:${{ github.sha }} \
-            -n awoooi-prod
-          sudo kubectl set image deployment/awoooi-worker \
-            worker=192.168.0.110:5000/awoooi/api:${{ github.sha }} \
-            -n awoooi-prod
+          # 等待 ArgoCD Application Synced（最多 120s）
+          echo "⏳ 等待 ArgoCD sync..."
+          for i in $(seq 1 24); do
+            SYNC=$(sudo kubectl get application awoooi-prod -n argocd \
+              -o jsonpath='{.status.sync.status}' 2>/dev/null || echo "Unknown")
+            HEALTH=$(sudo kubectl get application awoooi-prod -n argocd \
+              -o jsonpath='{.status.health.status}' 2>/dev/null || echo "Unknown")
+            echo "  ArgoCD: sync=$SYNC health=$HEALTH"
+            if [ "$SYNC" = "Synced" ] && [ "$HEALTH" = "Healthy" ]; then
+              echo "✅ ArgoCD Synced + Healthy"
+              break
+            fi
+            sleep 5
+          done

+          # 確認 rollout 完成
          sudo kubectl rollout status deployment/awoooi-api -n awoooi-prod --timeout=120s
          sudo kubectl rollout status deployment/awoooi-web -n awoooi-prod --timeout=120s
          sudo kubectl rollout status deployment/awoooi-worker -n awoooi-prod --timeout=120s
          echo "✅ 部署完成"

-          # Health Check (同一 SSH session，省去再次握手)
-          # 2026-04-01 Claude Code: 改用 break+flag，避免 exit 0 在 heredoc 引發 SIGPIPE
-          sleep 10
+          # Health Check
          HEALTH_PASS=0
          for i in 1 2 3; do
            HTTP_CODE=$(curl -s -w "%{http_code}" -o /dev/null --connect-timeout 10 "http://localhost:32334/api/v1/health")
@@ -249,7 +578,64 @@ jobs:
            echo "❌ API 健康檢查失敗"
            exit 1
          fi
-          DEPLOY
+          ARGOCD_WAIT
+
+      # 2026-04-09 Claude Sonnet 4.6: Sprint 5.2 — 同步 ops 腳本到 188 (ollama user)
+      # DEPLOY_SSH_KEY_188 = gitea-cd-deploy-188 (ed25519，只有 188 authorized_keys)
+      # 腳本: docker-health-monitor.sh + pg-backup.sh (感知層 + 備份)
+      - name: Sync Ops Scripts to 188
+        continue-on-error: true
+        env:
+          SSH_KEY_188: ${{ secrets.DEPLOY_SSH_KEY_188 }}
+        run: |
+          mkdir -p ~/.ssh
+          echo "$SSH_KEY_188" > ~/.ssh/deploy_key_188
+          chmod 600 ~/.ssh/deploy_key_188
+          ssh-keyscan 192.168.0.188 >> ~/.ssh/known_hosts 2>/dev/null
+
+          # 同步 docker-health-monitor.sh
+          scp -i ~/.ssh/deploy_key_188 \
+            scripts/ops/docker-health-monitor.sh \
+            ollama@192.168.0.188:~/awoooi-ops/docker-health-monitor.sh \
+            && echo "✅ docker-health-monitor.sh 已同步" \
+            || echo "⚠️ docker-health-monitor.sh 同步失敗"
+
+          # 同步 pg-backup.sh
+          scp -i ~/.ssh/deploy_key_188 \
+            scripts/ops/pg-backup.sh \
+            ollama@192.168.0.188:~/awoooi-ops/pg-backup.sh \
+            && echo "✅ pg-backup.sh 已同步" \
+            || echo "⚠️ pg-backup.sh 同步失敗"
+
+          # 確保執行權限
+          ssh -i ~/.ssh/deploy_key_188 ollama@192.168.0.188 \
+            "chmod +x ~/awoooi-ops/docker-health-monitor.sh ~/awoooi-ops/pg-backup.sh && echo '✅ 權限設定完成'" \
+            || echo "⚠️ 權限設定失敗"
+
+      # Phase O-4.5 2026-04-02: Alert Chain Smoke Test (Wave A.6 + B.2 ADR-037)
+      # 驗證告警鏈路 E2E: API Health + Webhook + OTEL + Event Exporter
+      # 2026-04-05 Claude Code cache優化: 使用 /opt/api-venv (已有 requests)，移除 Setup Python Tools step
+      # 2026-04-10 ogt: 移除 continue-on-error — 告警鏈路失敗必須阻塞部署
+      - name: Alert Chain Smoke Test
+        id: alert_chain_smoke
+        run: |
+          # 2026-04-05 Claude Code: 使用真實 API 地址（192.168.0.121:32334 NodePort）
+          # CI job container 的 localhost 不等於 K3s 節點，必須用內網 IP
+          # 首席架構師 Review C2: 修正永遠 pass — || true 移除，結果正確寫入 GITHUB_OUTPUT
+          source /opt/api-venv/bin/activate
+          python3 scripts/alert_chain_smoke_test.py \
+            --api-url http://192.168.0.121:32334 \
+            --json | tee /tmp/alert_chain_result.json \
+            && echo "alert_chain_status=pass" >> $GITHUB_OUTPUT \
+            || echo "alert_chain_status=fail" >> $GITHUB_OUTPUT
+
+      # Phase O-5 Wave C.2 2026-04-02 ogt: 監控覆蓋率驗證 (generate_monitoring.py --check)
+      # 2026-04-10 ogt: 移除 continue-on-error — 覆蓋率不足必須阻塞部署
+      - name: Monitoring Coverage Check
+        id: monitoring_coverage
+        run: |
+          source /opt/api-venv/bin/activate
+          python3 scripts/generate_monitoring.py --check && echo "coverage_status=pass" >> $GITHUB_OUTPUT || echo "coverage_status=fail" >> $GITHUB_OUTPUT

      # [首席架構師] 新增 Playwright E2E Smoke Test 步驟 v1.0.0 2026-04-01 (台北時間)
      # continue-on-error: true — smoke 失敗不阻塞部署，但結果會反映在 TG 通知
@@ -257,36 +643,74 @@ jobs:
        id: smoke
        continue-on-error: true
        run: |
+          # 首席架構師 Review I4 + 2026-04-05 Claude Code cache優化:
+          # playwright.config.ts import @playwright/test — 必須先安裝 pnpm node_modules
+          # pnpm store 持久化到 /opt/pnpm-store，pnpm-lock.yaml hash 未變則 --prefer-offline
+          PNPM_STORE=/opt/pnpm-store
+          PNPM_HASH_FILE=/opt/pnpm-store/.lock_hash
+          CURRENT_PNPM_HASH=$(md5sum pnpm-lock.yaml | awk '{print $1}')
+
+          corepack enable 2>/dev/null || npm install -g pnpm@9 -q
+          pnpm config set store-dir $PNPM_STORE
+
+          if [ "$(cat $PNPM_HASH_FILE 2>/dev/null)" != "$CURRENT_PNPM_HASH" ]; then
+            echo "📦 pnpm lock 已變更，重裝 node_modules..."
+            pnpm install --frozen-lockfile 2>&1 | tail -5
+            echo "$CURRENT_PNPM_HASH" > $PNPM_HASH_FILE
+          else
+            echo "⚡ 使用快取 pnpm store (lock 未變更)，prefer-offline..."
+            pnpm install --frozen-lockfile --prefer-offline 2>&1 | tail -5
+          fi
+
          cd apps/web
-          # 安裝 Playwright Chromium（CI 環境，含系統依賴）
-          npx playwright install chromium --with-deps
-          # 跑 smoke test，line reporter 方便 CI 日誌閱讀
-          npx playwright test tests/e2e/smoke.spec.ts --reporter=line
-          echo "smoke_status=pass" >> $GITHUB_OUTPUT
+          # Playwright Chromium 持久化到 /opt/playwright-browsers，版本 hash guard
+          export PLAYWRIGHT_BROWSERS_PATH=/opt/playwright-browsers
+          PLAYWRIGHT_VER=$(node -e "console.log(require('./package.json').devDependencies['@playwright/test'] || '')" 2>/dev/null || echo "unknown")
+          PLAYWRIGHT_HASH_FILE=/opt/playwright-browsers/.version_hash
+          if [ "$(cat $PLAYWRIGHT_HASH_FILE 2>/dev/null)" != "$PLAYWRIGHT_VER" ]; then
+            echo "📦 Playwright 版本變更 ($PLAYWRIGHT_VER)，重裝 Chromium..."
+            npx playwright install chromium --with-deps 2>&1 | tail -5
+            echo "$PLAYWRIGHT_VER" > $PLAYWRIGHT_HASH_FILE
+          else
+            echo "⚡ 使用快取 Playwright Chromium ($PLAYWRIGHT_VER)"
+          fi
+          # 對已部署的生產環境跑 smoke test
+          npx playwright test tests/e2e/smoke.spec.ts --reporter=line \
+            && echo "smoke_status=pass" >> $GITHUB_OUTPUT \
+            || echo "smoke_status=fail" >> $GITHUB_OUTPUT
        env:
-          # Playwright 在 CI 環境使用已建置的 pnpm node_modules
          CI: "true"
+          # 直接測試已部署的生產環境，不啟動本地 dev server
+          PLAYWRIGHT_BASE_URL: "https://awoooi.wooo.work"

      - name: Notify Health Check Success
        env:
          SMOKE_RESULT: ${{ steps.smoke.outcome == 'success' && '✅' || '⚠️' }}
-          TG_MSG: "✅ <b>AWOOOI 部署完成</b>\n├ 📝 ${{ steps.commit.outputs.message }}\n├ 🔖 <code>${{ steps.commit.outputs.short_sha }}</code>\n├ ⏱️ 耗時: ${MINUTES}m ${SECONDS}s\n├ 📦 API: ✅ Web: ✅\n├ 🩺 Health: ✅\n└ 🎭 Smoke: ${SMOKE_RESULT}"
+          ALERT_CHAIN_RESULT: ${{ steps.alert_chain_smoke.outcome == 'success' && '✅' || '⚠️' }}
+          MONITORING_RESULT: ${{ steps.monitoring_coverage.outcome == 'success' && '✅' || '⚠️' }}
        run: |
          END_TIME=$(date +%s)
          DURATION=$((END_TIME - ${{ steps.commit.outputs.start_time }}))
          MINUTES=$((DURATION / 60))
          SECONDS=$((DURATION % 60))
+          # 2026-04-05 ogt: TG_MSG 必須在 shell 中組裝，才能展開 ${MINUTES}/${SECONDS} 等 shell 變數
+          # 2026-04-05 ogt: 移除 parse_mode=HTML，避免 commit message 含特殊字元導致 400
+          COMMIT_MSG="${{ steps.commit.outputs.message }}"
+          SHORT_SHA="${{ steps.commit.outputs.short_sha }}"
+          TG_MSG="✅ AWOOOI 部署完成\n├ 📝 ${COMMIT_MSG}\n├ 🔖 ${SHORT_SHA}\n├ ⏱️ 耗時: ${MINUTES}m ${SECONDS}s\n├ 📦 API: ✅ Web: ✅\n├ 🩺 Health: ✅\n├ 🔗 Alert Chain: ${ALERT_CHAIN_RESULT}\n├ 📊 Monitoring: ${MONITORING_RESULT}\n└ 🎭 Smoke: ${SMOKE_RESULT}"
          printf '%b' "$TG_MSG" | curl -fS -X POST "https://api.telegram.org/bot${{ secrets.TELEGRAM_BOT_TOKEN }}/sendMessage" \
            -d "chat_id=${{ secrets.TELEGRAM_CHAT_ID }}" \
-            -d "parse_mode=HTML" \
-            --data-urlencode "text@-"
+            --data-urlencode "text@-" || echo "TG notify warning (non-fatal)"

      - name: Notify Pipeline Failure
+        # 2026-04-16 ogt + Claude Sonnet 4.6: 改用 HTML 結構化格式
        if: failure()
-        env:
-          TG_MSG: "❌ <b>AWOOOI 部署失敗</b>\n├ 📝 ${{ steps.commit.outputs.message }}\n├ 🔖 <code>${{ steps.commit.outputs.short_sha }}</code>\n├ 👤 ${{ github.actor }}\n└ 🔗 <a href=\"http://192.168.0.110:3001/wooo/awoooi/actions\">查看日誌</a>"
        run: |
-          printf '%b' "$TG_MSG" | curl -fS -X POST "https://api.telegram.org/bot${{ secrets.TELEGRAM_BOT_TOKEN }}/sendMessage" \
-            -d "chat_id=${{ secrets.TELEGRAM_CHAT_ID }}" \
-            -d "parse_mode=HTML" \
-            --data-urlencode "text@-"
+          COMMIT_MSG="${{ steps.commit.outputs.message }}"
+          SHORT_SHA="${{ steps.commit.outputs.short_sha }}"
+          ACTOR="${{ github.actor }}"
+          COMMIT_ESC=$(echo "$COMMIT_MSG" | sed 's/&/\&amp;/g; s/</\&lt;/g; s/>/\&gt;/g')
+          MSG=$(printf '❌ <b>AWOOOI 部署失敗</b>\n├ 📝 <code>%s</code>\n├ 🔖 <code>%s</code>\n├ 👤 %s\n└ 🔗 http://192.168.0.110:3001/wooo/awoooi/actions' "${COMMIT_ESC}" "${SHORT_SHA}" "${ACTOR}")
+          curl -fS -X POST "https://api.telegram.org/bot${{ secrets.TELEGRAM_BOT_TOKEN }}/sendMessage" \
+            -H "Content-Type: application/json" \
+            -d "$(jq -n --arg c "${{ secrets.TELEGRAM_CHAT_ID }}" --arg t "$MSG" '{chat_id:$c,text:$t,parse_mode:"HTML"}')"
--- a/.gitea/workflows/deploy-alerts.yaml
+++ b/.gitea/workflows/deploy-alerts.yaml
@@ -0,0 +1,52 @@
+# =============================================================================
+# Deploy Prometheus Alert Rules (獨立 workflow)
+# 2026-04-05 Claude Code (ADR-039 I3): 從 cd.yaml 分離
+# 觸發條件: ops/monitoring/alerts-unified.yml 有變更 或 workflow_dispatch
+# 說明: 告警規則部署不依賴應用構建，獨立觸發以加快響應速度
+# =============================================================================
+
+name: Deploy Alert Rules
+
+on:
+  push:
+    branches: [main]
+    paths:
+      - 'ops/monitoring/alerts-unified.yml'
+  workflow_dispatch:
+
+jobs:
+  deploy-alerts:
+    name: "Deploy Prometheus Alert Rules"
+    runs-on: ubuntu-latest
+    timeout-minutes: 5
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Validate alerts YAML
+        # 2026-04-08 Claude Sonnet 4.6: pip install pyyaml 確保 runner 有此依賴
+        run: |
+          pip3 install -q pyyaml 2>/dev/null || pip install -q pyyaml
+          python3 -c "import yaml; yaml.safe_load(open('ops/monitoring/alerts-unified.yml')); print('YAML OK')"
+
+      - name: Setup SSH key
+        run: |
+          mkdir -p ~/.ssh
+          echo "${{ secrets.DEPLOY_SSH_KEY }}" > ~/.ssh/id_ed25519
+          chmod 600 ~/.ssh/id_ed25519
+          ssh-keyscan 192.168.0.110 >> ~/.ssh/known_hosts
+
+      - name: Deploy alerts to Prometheus
+        run: bash scripts/ops/deploy-alerts.sh
+
+      - name: Notify deploy result
+        if: always()
+        run: |
+          STATUS="${{ job.status }}"
+          EMOJI="✅"
+          [ "$STATUS" != "success" ] && EMOJI="❌"
+          SHORT_SHA="${{ github.sha }}"
+          SHORT_SHA="${SHORT_SHA:0:7}"
+          MSG="${EMOJI} Prometheus 告警規則部署 ${STATUS} (${SHORT_SHA})"
+          curl -fS -X POST "https://api.telegram.org/bot${{ secrets.TELEGRAM_BOT_TOKEN }}/sendMessage" \
+            -d "chat_id=${{ secrets.TELEGRAM_CHAT_ID }}" \
+            --data-urlencode "text=${MSG}" || true
--- a/.gitea/workflows/e2e-health.yaml
+++ b/.gitea/workflows/e2e-health.yaml
@@ -8,11 +8,11 @@
 name: E2E Health Check

 on:
-  push:
-    branches: [main]
  workflow_dispatch:
  schedule:
    - cron: '0 16 * * *'  # 每日 00:00 台北 (UTC+8)
+  # push 觸發已移除 (2026-04-02): E2E health check 不需要每次 push 都跑
+  # CD pipeline 本身已有 smoke test；E2E 用排程或手動觸發即可

 # OTEL CI/CD 監控 (2026-03-31 #46c)
 env:
--- a/.gitea/workflows/run-migration.yml
+++ b/.gitea/workflows/run-migration.yml
@@ -0,0 +1,106 @@
+# ADR-090-B: Gitea CI 自動 migration workflow
+# 建立時間: 2026-04-18 台北時區
+# 建立者: ogt + Claude Opus 4.7 (1M)
+#
+# 目的: 每次 main 分支有新 migration SQL 檔,自動:
+#   1. 用 MIGRATION_DATABASE_URL (awoooi_migrator 限權帳號) 連 PG
+#   2. 只跑「新增」的 migration (比對已執行列表)
+#   3. 跑後寫 asset_discovery_run + automation_operation_log 記錄
+#   4. 失敗自動 rollback (single transaction + ON_ERROR_STOP)
+#
+# 觸發: push to main,且 apps/api/migrations/ 有變更
+
+name: run-migration
+
+on:
+  push:
+    branches: [main]
+    paths:
+      - 'apps/api/migrations/*.sql'
+
+jobs:
+  migrate:
+    runs-on: ubuntu-latest  # 或 self-hosted runner on 110
+    container:
+      image: postgres:15-alpine  # 帶 psql
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 2  # 需比對上一個 commit
+
+      - name: Identify new migrations
+        id: diff
+        run: |
+          NEW_FILES=$(git diff --name-only --diff-filter=A HEAD~1 HEAD -- 'apps/api/migrations/*.sql' || true)
+          echo "new_files<<EOF" >> $GITHUB_OUTPUT
+          echo "$NEW_FILES" >> $GITHUB_OUTPUT
+          echo "EOF" >> $GITHUB_OUTPUT
+          echo "=== New migration files ==="
+          echo "$NEW_FILES"
+
+      - name: Apply new migrations
+        if: steps.diff.outputs.new_files != ''
+        env:
+          # 從 Gitea secrets 取,不直接明碼
+          PGURL: ${{ secrets.MIGRATION_DATABASE_URL }}
+        run: |
+          set -euo pipefail
+          if [ -z "$PGURL" ]; then
+            echo "::error::MIGRATION_DATABASE_URL secret not set in Gitea"
+            exit 1
+          fi
+
+          # 套用每個新檔 (single transaction per file)
+          echo "${{ steps.diff.outputs.new_files }}" | while IFS= read -r file; do
+            [ -z "$file" ] && continue
+            echo "=== Applying: $file ==="
+            psql "$PGURL" \
+              -v ON_ERROR_STOP=1 \
+              --single-transaction \
+              -f "$file"
+            echo "=== OK: $file ==="
+          done
+
+      - name: Seed asset_discovery_run (audit)
+        if: steps.diff.outputs.new_files != ''
+        env:
+          PGURL: ${{ secrets.MIGRATION_DATABASE_URL }}
+        run: |
+          FILES_JSON=$(echo "${{ steps.diff.outputs.new_files }}" | jq -Rn '[inputs | select(length > 0)]')
+          psql "$PGURL" -c "
+            INSERT INTO asset_discovery_run (
+              run_id, triggered_by, scope, scan_depth, status,
+              started_at, ended_at, tools_used, summary
+            ) VALUES (
+              gen_random_uuid(),
+              'ci:gitea',
+              ARRAY['schema_migration'],
+              'full',
+              'success',
+              NOW(),
+              NOW(),
+              '{\"psql\": 1, \"gitea_ci\": 1}'::jsonb,
+              jsonb_build_object(
+                'type', 'ci_migration',
+                'commit_sha', '${{ github.sha }}',
+                'files', $FILES_JSON
+              )
+            );
+          "
+
+      - name: Notify Telegram (if configured)
+        if: always()
+        env:
+          TG_TOKEN: ${{ secrets.TELEGRAM_BOT_TOKEN }}
+          TG_CHAT: ${{ secrets.TELEGRAM_OPS_CHAT_ID }}
+        run: |
+          if [ -n "$TG_TOKEN" ] && [ -n "$TG_CHAT" ]; then
+            STATUS="${{ job.status }}"
+            MSG="🗄️ Migration CI: \`${STATUS}\` — commit ${{ github.sha }}"
+            curl -s -X POST "https://api.telegram.org/bot${TG_TOKEN}/sendMessage" \
+              -d chat_id="${TG_CHAT}" \
+              -d parse_mode="Markdown" \
+              -d text="${MSG}" || true
+          fi
--- a/.gitea/workflows/type-sync-check.yaml
+++ b/.gitea/workflows/type-sync-check.yaml
@@ -30,9 +30,10 @@ jobs:
      - uses: actions/checkout@v4

      - name: Setup Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: '3.11'
+        # 2026-04-05 Claude Code: 改用 apt 安裝，避免 setup-python toolcache glibc 版本不符
+        run: |
+          python3 --version
+          pip3 install -q uv 2>/dev/null || (apt-get update -q && apt-get install -y -q python3-pip && pip3 install -q uv)

      - name: Setup Node.js
        uses: actions/setup-node@v4
@@ -47,7 +48,6 @@ jobs:
      - name: Install Python Dependencies
        run: |
          cd apps/api
-          pip install -q uv
          uv pip install --system pydantic structlog -q

      - name: Install Node Dependencies
@@ -56,12 +56,16 @@ jobs:
      - name: Generate Types (Temp)
        run: |
          cd apps/api
-          python ../../scripts/generate-schemas.py
+          python3 ../../scripts/generate-schemas.py
+          echo "=== Generated schema definition count ==="
+          python3 -c "import json; d=json.load(open('../../packages/shared-types/schemas/api-types.json')); print(f'definitions: {len(d[\"definitions\"])}')"
          cd ../../packages/shared-types
          pnpm generate:types

      - name: Check for Differences
        run: |
+          echo "=== git diff packages/shared-types/ ==="
+          git diff packages/shared-types/
          if git diff --exit-code packages/shared-types/; then
            echo "✅ TypeScript 型別與 Pydantic 模型同步"
          else
--- a/.gitignore
+++ b/.gitignore
@@ -82,3 +82,5 @@ temp/
 playwright-mcp/
 tsconfig.tsbuildinfo
 .superpowers/
+.aider*
+!.aiderignore
--- a/.playwright-mcp/sprint5-approved-design.html
+++ b/.playwright-mcp/sprint5-approved-design.html
@@ -0,0 +1,582 @@
+<!DOCTYPE html>
+<html lang="zh-TW">
+<head>
+<meta charset="UTF-8">
+<meta name="viewport" content="width=1440">
+<title>AWOOOI 指令中心 — 最終版</title>
+<link href="https://fonts.googleapis.com/css2?family=DM+Mono:wght@400;500&family=VT323&family=JetBrains+Mono:wght@400;500&family=Inter:wght@400;500;600;700;800&display=swap" rel="stylesheet">
+<style>
+/*
+  方案 2: Sidebar 品牌 + 內容區標題列 (Linear/Notion 風格)
+  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+  - 無獨立 Header 橫條
+  - 品牌在 Sidebar 頂部
+  - 標題/Tab/操作在內容區頂部
+  - 所有元素嚴格對齊
+*/
+*{margin:0;padding:0;box-sizing:border-box}
+:root{
+  --bg:#f5f4ed;--card:#fff;--surface:#faf9f3;--bdr:#e0ddd4;
+  --text:#141413;--text2:#555550;--text3:#87867f;
+  --accent:#d97757;--green:#22C55E;--red:#cc2200;--blue:#4A90D9;--orange:#F59E0B;--purple:#A855F7;
+  --sb-w:200px;--gap:14px;--radius:10px;--border:.5px solid #e0ddd4;
+}
+body{font-family:'DM Mono','Inter',system-ui,monospace;background:var(--bg);color:var(--text);font-size:13px;-webkit-font-smoothing:antialiased;overflow:hidden;height:100vh;line-height:1.5}
+
+/* ═══ LAYOUT ═══ */
+.layout{display:flex;height:100vh}
+
+/* ═══ SIDEBAR (200px) ═══ */
+.sidebar{width:var(--sb-w);flex-shrink:0;background:var(--surface);border-right:var(--border);display:flex;flex-direction:column;overflow:hidden}
+
+/* Brand Area (品牌區, 72px 高) */
+.brand{height:72px;padding:0 16px;display:flex;align-items:center;gap:10px;border-bottom:var(--border);flex-shrink:0}
+.brand svg{flex-shrink:0}
+.brand-text{display:inline-flex;align-items:baseline;gap:0}
+.brand-text .a,.brand-text .i{font-family:'DM Mono',monospace;font-size:22px;font-weight:700;color:var(--text)}
+.brand-text .w{font-family:'VT323',monospace;font-size:30px;color:var(--accent);letter-spacing:0;line-height:1}
+
+/* Nav */
+.nav{flex:1;overflow-y:auto;padding:8px}
+.nav-item{display:flex;align-items:center;gap:8px;padding:8px 12px;border-radius:6px;font-size:13px;color:var(--text2);cursor:pointer;transition:all .12s;margin-bottom:1px}
+.nav-item:hover{background:rgba(0,0,0,.03)}
+.nav-item.on{background:rgba(217,119,87,.08);color:var(--accent);font-weight:500}
+.nav-dot{width:5px;height:5px;border-radius:50%;flex-shrink:0}
+.nav-badge{margin-left:auto;background:var(--red);color:#fff;font-size:7px;padding:1px 5px;border-radius:6px;font-weight:700;min-width:14px;text-align:center}
+.nav-sep{height:var(--border);background:var(--bdr);margin:8px 12px}
+.nav-label{font-size:7px;text-transform:uppercase;letter-spacing:1.2px;color:var(--text3);padding:8px 12px 4px;font-weight:600}
+
+/* Nav Bottom */
+.nav-bottom{border-top:var(--border);padding:8px;flex-shrink:0}
+
+/* ═══ CONTENT AREA ═══ */
+.content{flex:1;display:flex;flex-direction:column;overflow:hidden}
+
+/* Title Bar (內容區頂部, 48px) */
+.title-bar{height:48px;padding:0 20px;display:flex;align-items:center;gap:16px;border-bottom:var(--border);background:var(--surface);flex-shrink:0}
+.page-title{font-family:'Syne','Inter',sans-serif;font-size:20px;font-weight:800;color:var(--text);letter-spacing:-.3px}
+.title-actions{margin-left:auto;display:flex;align-items:center;gap:10px}
+.ai-status{display:flex;align-items:center;gap:5px;padding:4px 10px;border:var(--border);border-radius:20px;font-size:9px;color:var(--text2)}
+.ai-dot{width:5px;height:5px;border-radius:50%;background:var(--green);animation:blink 2s infinite}
+@keyframes blink{0%,100%{opacity:1}50%{opacity:.3}}
+.lang-btn{padding:4px 10px;font-family:'DM Mono',monospace;font-size:10px;border:var(--border);border-radius:16px;cursor:pointer;background:transparent;color:var(--text3)}
+.lang-btn.on{background:var(--text);color:#fff;border-color:var(--text)}
+.avatar{width:24px;height:24px;border-radius:50%;background:var(--accent);display:flex;align-items:center;justify-content:center;font-size:10px;font-weight:700;color:#fff}
+
+/* Tab Bar (36px) */
+.tab-bar{height:36px;padding:0 20px;display:flex;align-items:stretch;border-bottom:var(--border);background:var(--card);flex-shrink:0}
+.tab{padding:0 14px;font-size:12px;font-weight:500;color:var(--text3);cursor:pointer;border-bottom:2px solid transparent;display:flex;align-items:center;gap:4px;transition:all .12s}
+.tab:hover{color:var(--text2)}
+.tab.on{color:var(--accent);border-bottom-color:var(--accent);font-weight:600}
+.tab-badge{background:var(--red);color:#fff;font-size:7px;padding:0 4px;border-radius:4px;font-weight:700;min-width:14px;text-align:center}
+
+/* ═══ KPI Strip (融入背景, 不反白) ═══ */
+.kpi-strip{display:flex;padding:10px 20px;gap:12px;flex-shrink:0}
+.kpi-card{flex:1;background:var(--card);border:var(--border);border-radius:8px;padding:8px 12px}
+.kpi-label{font-size:10px;text-transform:uppercase;letter-spacing:.5px;color:var(--text3);font-weight:500}
+.kpi-row{display:flex;align-items:baseline;gap:4px;margin-top:2px}
+.kpi-val{font-size:22px;font-weight:700;font-variant-numeric:tabular-nums;line-height:1}
+.kpi-sub{font-size:9px;color:var(--text2)}
+.kpi-trend{font-size:9px;font-weight:500}
+.kpi-bar{height:2px;border-radius:1px;background:#ebe8df;margin-top:4px;overflow:hidden}
+.kpi-bar-f{height:100%;border-radius:1px}
+
+/* ═══ MAIN BODY (2 欄) ═══ */
+.main-body{flex:1;display:flex;gap:var(--gap);padding:0 20px var(--gap);overflow:hidden}
+
+/* Left Column (60%) */
+.col-left{flex:6;min-width:0;overflow-y:auto;display:flex;flex-direction:column;gap:var(--gap);padding-top:var(--gap);padding-bottom:40px}
+.col-left .card{flex-shrink:0}
+
+/* Right Column (40%) — 整欄可捲動，卡片自然撐開不截斷 */
+.col-right{flex:4;min-width:0;overflow-y:auto;display:flex;flex-direction:column;gap:var(--gap);padding-top:var(--gap);padding-bottom:40px}
+.col-right .card{flex-shrink:0}
+
+/* ═══ SHARED CARD ═══ */
+.card{background:var(--card);border:var(--border);border-radius:var(--radius);overflow:hidden;box-shadow:0 1px 3px rgba(0,0,0,.04)}
+.card-header{padding:10px 14px;border-bottom:var(--border);display:flex;align-items:center;gap:8px;background:var(--surface)}
+.card-dot{width:5px;height:5px;border-radius:50%;background:var(--accent);flex-shrink:0}
+.card-title{font-size:14px;font-weight:700;letter-spacing:.3px}
+.card-action{margin-left:auto;font-size:11px;color:var(--blue);cursor:pointer;font-weight:500;white-space:nowrap}
+.card-action:hover{text-decoration:underline}
+.card-body{padding:14px}
+
+/* ═══ INCIDENT CARD ═══ */
+.inc{border:var(--border);border-radius:8px;overflow:hidden;margin-bottom:12px;box-shadow:0 1px 2px rgba(0,0,0,.03)}
+.inc:last-child{margin-bottom:0}
+.inc-bar{height:3px}
+.inc-body{padding:10px 12px}
+.inc-top{display:flex;align-items:center;gap:6px;margin-bottom:4px}
+.inc-sev{font-size:9px;font-weight:700;padding:2px 6px;border-radius:3px}
+.inc-name{font-size:13px;font-weight:600}
+.inc-meta{font-size:11px;color:var(--text2);margin-bottom:6px}
+
+/* FlowPipeline Animations */
+@keyframes lobster-bob{0%,100%{transform:translateY(0)}50%{transform:translateY(-4px)}}
+@keyframes card-glow-p2{0%,100%{box-shadow:0 0 0 0 rgba(74,144,217,.3)}50%{box-shadow:0 0 6px 2px rgba(74,144,217,.3)}}
+
+/* AI 提案 */
+.ai-proposal{background:rgba(217,119,87,.06);border:var(--border);border-color:rgba(217,119,87,.15);border-radius:6px;padding:6px 10px;font-size:10px;color:var(--accent);display:flex;align-items:center;gap:4px;margin-top:6px}
+.inc-actions{display:flex;gap:6px;margin-top:8px}
+.btn-approve{padding:5px 14px;border:none;border-radius:5px;font-size:10px;font-weight:600;cursor:pointer;background:var(--green);color:#fff}
+.btn-reject{padding:5px 14px;border:var(--border);border-radius:5px;font-size:10px;cursor:pointer;background:var(--card);color:var(--text2)}
+
+/* ═══ DISPOSITION MINI ═══ */
+.disp-mini{display:flex;gap:10px;align-items:center}
+.disp-ring{position:relative;width:56px;height:56px;flex-shrink:0}
+.disp-ring svg{transform:rotate(-90deg)}
+.disp-ring-center{position:absolute;inset:0;display:flex;align-items:center;justify-content:center;font-size:13px;font-weight:700;color:var(--green)}
+.disp-list{flex:1;display:grid;grid-template-columns:1fr 1fr;gap:2px 12px}
+.disp-item{display:flex;align-items:center;gap:5px;font-size:12px;color:var(--text2)}
+.disp-dot{width:5px;height:5px;border-radius:50%;flex-shrink:0}
+.disp-num{margin-left:auto;font-weight:700;font-variant-numeric:tabular-nums}
+
+/* ═══ STREAM MINI ═══ */
+.stream-item{display:flex;gap:8px;padding:6px 0;border-bottom:.5px solid #f0ede5;font-size:12px}
+.stream-item:last-child{border-bottom:none}
+.stream-time{font-size:10px;color:var(--text2);font-family:'JetBrains Mono',monospace;width:40px;flex-shrink:0}
+.stream-dot{width:4px;height:4px;border-radius:50%;margin-top:5px;flex-shrink:0}
+.stream-msg{flex:1;line-height:1.4}
+.stream-msg b{font-weight:600}
+.stream-msg code{background:rgba(0,0,0,.04);padding:0 2px;border-radius:2px;font-family:'JetBrains Mono',monospace;font-size:9px}
+
+/* ═══ OPENCLAW PANEL ═══ */
+.oc-body{display:flex;gap:12px;align-items:flex-start}
+.oc-info{flex:1;min-width:0}
+.oc-brand{display:inline-flex;align-items:baseline;gap:0;margin-bottom:2px}
+.oc-brand .w,.oc-brand .c{font-family:'DM Mono',monospace;font-size:15px;font-weight:700;color:var(--text)}
+.oc-brand .o{font-family:'VT323',monospace;font-size:24px;color:var(--accent);letter-spacing:0;line-height:1}
+.oc-badge{display:inline-block;font-size:8px;padding:2px 6px;background:rgba(74,144,217,.1);color:var(--blue);border-radius:2px;text-transform:uppercase;letter-spacing:1.2px;margin-bottom:6px}
+.oc-status{font-size:11px;color:var(--text2);display:flex;align-items:center;gap:4px}
+.oc-pulse{display:inline-flex;gap:3px}
+.oc-pulse span{width:4px;height:4px;border-radius:50%;background:var(--blue)}
+.oc-pulse span:nth-child(1){animation:oc-p 1.4s 0s infinite}
+.oc-pulse span:nth-child(2){animation:oc-p 1.4s .2s infinite}
+.oc-pulse span:nth-child(3){animation:oc-p 1.4s .4s infinite}
+@keyframes oc-p{0%,60%,100%{opacity:.2}30%{opacity:1}}
+
+/* ═══ TOPO GROUPS ═══ */
+.topo-grid{display:grid;grid-template-columns:1fr 1fr;gap:8px}
+.topo-g{border:var(--border);border-radius:8px;padding:8px 10px;cursor:pointer;transition:all .12s}
+.topo-g:hover{transform:translateY(-1px);box-shadow:0 2px 6px rgba(0,0,0,.05)}
+.tg-name{font-size:12px;font-weight:600;margin-bottom:2px}
+.tg-meta{font-size:10px;color:var(--text2)}
+.tg-svcs{display:flex;flex-wrap:wrap;gap:2px;margin-top:4px}
+.tg-svc{display:flex;align-items:center;gap:3px;padding:2px 7px;background:var(--card);border:var(--border);border-radius:4px;font-size:10px}
+.tg-sdot{width:3px;height:3px;border-radius:50%}
+.tg-infra{border-color:rgba(59,130,246,.2);background:rgba(59,130,246,.01)}
+.tg-ai{border-color:rgba(249,115,22,.25);background:rgba(249,115,22,.01)}
+.tg-k3s{border-color:rgba(168,85,247,.25);background:rgba(168,85,247,.01)}
+.tg-ext{border-color:rgba(245,158,11,.2);background:rgba(245,158,11,.01)}
+
+/* ═══ TOGGLE ═══ */
+.toggle-bar{display:flex;background:var(--bg);border-radius:5px;padding:2px}
+.toggle-opt{padding:3px 10px;border-radius:3px;font-size:8px;font-weight:500;cursor:pointer;color:var(--text3);transition:all .12s}
+.toggle-opt.on{background:var(--card);color:var(--accent);box-shadow:0 1px 2px rgba(0,0,0,.06);font-weight:600}
+
+/* ═══ HOST GRID ═══ */
+.host-grid{display:grid;grid-template-columns:1fr 1fr;gap:8px}
+.host-card{border:var(--border);border-radius:8px;padding:8px 10px;background:var(--surface)}
+.host-name{font-size:12px;font-weight:600;margin-bottom:2px}
+.host-ip{font-size:10px;color:var(--text2);font-family:'JetBrains Mono',monospace}
+.host-bars{display:flex;gap:6px;margin-top:5px}
+.host-bar-w{flex:1}
+.host-bar-l{font-size:7px;color:var(--text3);margin-bottom:2px;display:flex;justify-content:space-between}
+.host-bar{height:3px;border-radius:2px;background:#ebe8df;overflow:hidden}
+.host-bar-f{height:100%;border-radius:2px}
+
+/* ═══ TOOL GRID ═══ */
+.tool-grid{display:grid;grid-template-columns:1fr 1fr 1fr;gap:6px}
+.tool{display:flex;overflow:hidden;border:var(--border);border-radius:6px;background:var(--surface);cursor:pointer;transition:all .1s}
+.tool:hover{border-color:var(--blue)}
+.tool-bar{width:3px;flex-shrink:0}
+.tool-body{padding:5px 7px;flex:1;min-width:0}
+.tool-name{font-size:11px;font-weight:600;white-space:nowrap;overflow:hidden;text-overflow:ellipsis}
+.tool-meta{font-size:10px;color:var(--text2);margin-top:2px}
+
+/* ═══ APPROVAL MINI ═══ */
+.appr-item{background:var(--surface);border:var(--border);border-radius:6px;padding:8px 10px;margin-bottom:6px}
+.appr-item:last-child{margin-bottom:0}
+.appr-alert{font-size:13px;font-weight:600}
+.appr-target{font-size:11px;color:var(--text2);margin-top:2px;font-family:'JetBrains Mono',monospace}
+.appr-risk{display:inline-block;font-size:10px;padding:2px 8px;border-radius:3px;margin-top:3px;font-weight:600}
+.risk-low{background:rgba(34,197,94,.08);color:var(--green)}
+.risk-med{background:rgba(249,115,22,.08);color:var(--orange)}
+.appr-btns{display:flex;gap:4px;margin-top:5px}
+.btn-sm-ok{flex:1;padding:6px;border:none;border-radius:5px;font-size:11px;font-weight:600;cursor:pointer;background:var(--green);color:#fff}
+.btn-sm-no{flex:1;padding:6px;border:var(--border);border-radius:5px;font-size:11px;cursor:pointer;background:var(--card);color:var(--text2)}
+
+/* ═══ AI MODEL STATUS ═══ */
+.model-grid{display:grid;grid-template-columns:1fr 1fr;gap:6px}
+.model{border:var(--border);border-radius:6px;padding:6px 8px;display:flex;align-items:center;gap:6px}
+.model-dot{width:5px;height:5px;border-radius:50%;flex-shrink:0}
+.model-name{font-size:12px;font-weight:500}
+.model-tag{font-size:10px;color:var(--text3);margin-left:auto}
+
+/* ═══ TERMINAL FLOAT ═══ */
+.terminal-float{position:fixed;bottom:14px;right:14px;display:flex;align-items:center;gap:5px;padding:6px 14px;background:var(--card);border:var(--border);border-radius:8px;box-shadow:0 2px 8px rgba(0,0,0,.08);cursor:pointer;font-size:10px;color:var(--text2);z-index:40;transition:all .12s}
+.terminal-float:hover{border-color:var(--accent);color:var(--accent)}
+
+/* 龍蝦動畫 */
+.chibi-strip{height:14px;position:relative;overflow:hidden;border-bottom:.5px dashed rgba(232,85,48,.06);flex-shrink:0}
+@keyframes swim{0%{transform:translateX(0) scaleX(1)}47%{transform:translateX(900px) scaleX(1)}50%{transform:translateX(900px) scaleX(-1)}97%{transform:translateX(0) scaleX(-1)}100%{transform:translateX(0) scaleX(1)}}
+@keyframes bob{0%,100%{transform:translateY(0)}50%{transform:translateY(-2px)}}
+.chibi-swim{animation:swim 25s linear infinite;position:absolute;top:0;left:0}
+.chibi-bob{animation:bob .7s ease-in-out infinite;display:inline-block}
+</style>
+</head>
+<body>
+
+<div class="layout">
+
+<!-- ═══ SIDEBAR ═══ -->
+<div class="sidebar">
+
+  <!-- Brand Area (72px) -->
+  <div class="brand">
+    <svg width="32" height="32" viewBox="0 0 140 140" fill="none">
+      <defs><linearGradient id="c1" x1="0%" y1="0%" x2="100%" y2="100%"><stop offset="0%" stop-color="#FFF"/><stop offset="40%" stop-color="#F8F8F8"/><stop offset="70%" stop-color="#E8E8E8"/><stop offset="100%" stop-color="#D8D8D8"/></linearGradient><radialGradient id="l1" cx="40%" cy="35%" r="60%"><stop offset="0%" stop-color="#7AB8F5"/><stop offset="100%" stop-color="#2B6CB0"/></radialGradient></defs>
+      <circle cx="70" cy="70" r="32" fill="url(#c1)" stroke="#E0E0E0" stroke-width="1"/>
+      <circle cx="70" cy="70" r="16" fill="url(#l1)"><animate attributeName="r" values="14;17;14" dur="2s" repeatCount="indefinite"/></circle>
+      <circle cx="70" cy="70" r="8" fill="white" opacity=".8"/>
+      <path d="M70 38L70 18L58 6M70 18L82 6" stroke="url(#c1)" stroke-width="6" stroke-linecap="round" fill="none"/><path d="M70 38L70 18L58 6M70 18L82 6" stroke="#4A90D9" stroke-width="3" stroke-linecap="round" fill="none" opacity=".5"/>
+      <path d="M38 70L18 70L6 58M18 70L6 82" stroke="url(#c1)" stroke-width="6" stroke-linecap="round" fill="none"/><path d="M38 70L18 70L6 58M18 70L6 82" stroke="#4A90D9" stroke-width="3" stroke-linecap="round" fill="none" opacity=".5"/>
+      <path d="M102 70L122 70L134 58M122 70L134 82" stroke="url(#c1)" stroke-width="6" stroke-linecap="round" fill="none"/><path d="M102 70L122 70L134 58M122 70L134 82" stroke="#4A90D9" stroke-width="3" stroke-linecap="round" fill="none" opacity=".5"/>
+      <path d="M48 92L28 112L16 116" stroke="url(#c1)" stroke-width="6" stroke-linecap="round" fill="none"/>
+      <path d="M92 92L112 112L124 116" stroke="url(#c1)" stroke-width="6" stroke-linecap="round" fill="none"/>
+      <circle cx="70" cy="70" r="42" fill="none" stroke="#4A90D9" stroke-width="1" stroke-dasharray="6 6" opacity=".3"><animateTransform attributeName="transform" type="rotate" from="0 70 70" to="360 70 70" dur="8s" repeatCount="indefinite"/></circle>
+    </svg>
+    <div class="brand-text"><span class="a">A</span><span class="w">wooo</span><span class="i">I</span></div>
+  </div>
+
+  <!-- Nav -->
+  <div class="nav">
+    <div class="nav-item on"><span class="nav-dot" style="background:var(--accent)"></span>指令中心<span style="margin-left:auto;font-size:9px;color:var(--text3)">4 tab</span></div>
+    <div class="nav-item"><span class="nav-dot" style="background:var(--blue)"></span>可觀測性<span style="margin-left:auto;font-size:9px;color:var(--text3)">5 tab</span></div>
+    <div class="nav-item"><span class="nav-dot" style="background:var(--green)"></span>自動化<span style="margin-left:auto;font-size:9px;color:var(--text3)">3 tab</span></div>
+    <div class="nav-item"><span class="nav-dot" style="background:var(--purple)"></span>營運<span style="margin-left:auto;font-size:9px;color:var(--text3)">5 tab</span></div>
+    <div class="nav-item"><span class="nav-dot" style="background:var(--red)"></span>安全合規<span style="margin-left:auto;font-size:9px;color:var(--text3)">2 tab</span></div>
+    <div class="nav-item"><span class="nav-dot" style="background:var(--text3)"></span>知識</div>
+    <div class="nav-sep"></div>
+    <div class="nav-label">legacy</div>
+    <div class="nav-item" style="opacity:.5"><span class="nav-dot" style="background:var(--text3)"></span>經典 AI 中心</div>
+  </div>
+
+  <!-- Nav Bottom -->
+  <div class="nav-bottom">
+    <div class="nav-item"><span class="nav-dot" style="background:var(--text3)"></span>終端</div>
+    <div class="nav-item"><span class="nav-dot" style="background:var(--text3)"></span>設定</div>
+  </div>
+</div>
+
+<!-- ═══ CONTENT AREA ═══ -->
+<div class="content">
+
+  <!-- Title Bar -->
+  <div class="title-bar">
+    <span class="page-title">AI中心</span>
+    <div class="title-actions">
+      <div class="ai-status"><span class="ai-dot"></span>OpenClaw · openclaw_nemo</div>
+      <button class="lang-btn on">繁</button>
+      <button class="lang-btn">EN</button>
+      <div class="avatar">OG</div>
+    </div>
+  </div>
+
+  <!-- Tab Bar -->
+  <div class="tab-bar">
+    <div class="tab on">戰情總覽</div>
+    <div class="tab">告警 & 授權 <span class="tab-badge">2</span></div>
+    <div class="tab">活動串流</div>
+    <div class="tab">處置統計</div>
+  </div>
+
+  <!-- 龍蝦游泳列 -->
+  <div class="chibi-strip">
+    <div class="chibi-swim"><div class="chibi-bob">
+      <svg width="16" height="12" viewBox="0 0 18 14" fill="none"><ellipse cx="9" cy="10" rx="5" ry="4" fill="#E85530" opacity=".9"/><circle cx="9" cy="6" r="3.5" fill="#E85530" opacity=".9"/><circle cx="7.5" cy="5.2" r=".9" fill="#fff" opacity=".8"/><circle cx="10.5" cy="5.2" r=".9" fill="#fff" opacity=".8"/><path d="M3 8.5Q.5 7.5 1 10Q1.5 11.5 3.5 11" stroke="#E85530" stroke-width="1.2" fill="none" stroke-linecap="round"/><ellipse cx="1" cy="10" rx="1.2" ry="1.5" fill="#E85530" opacity=".7" transform="rotate(-10 1 10)"/><path d="M15 8.5Q17.5 7.5 17 10Q16.5 11.5 14.5 11" stroke="#E85530" stroke-width="1.2" fill="none" stroke-linecap="round"/><ellipse cx="17" cy="10" rx="1.2" ry="1.5" fill="#E85530" opacity=".7" transform="rotate(10 17 10)"/><path d="M6.5 2.5Q5 .5 3.5 1" stroke="#b03a1a" stroke-width=".8" fill="none" stroke-linecap="round"/><path d="M11.5 2.5Q13 .5 14.5 1" stroke="#b03a1a" stroke-width=".8" fill="none" stroke-linecap="round"/></svg>
+    </div></div>
+  </div>
+
+  <!-- KPI Strip (卡片式，融入背景) -->
+  <div class="kpi-strip">
+    <div class="kpi-card"><div class="kpi-label">系統健康</div><div class="kpi-row"><span class="kpi-val" style="color:var(--green)">98.5%</span></div><div class="kpi-bar"><div class="kpi-bar-f" style="width:98.5%;background:var(--green)"></div></div></div>
+    <div class="kpi-card"><div class="kpi-label">活動事件</div><div class="kpi-row"><span class="kpi-val" style="color:var(--accent)">2</span><span class="kpi-sub">P1:1 P2:1</span></div></div>
+    <div class="kpi-card"><div class="kpi-label">自動修復率</div><div class="kpi-row"><span class="kpi-val" style="color:var(--green)">72%</span><span class="kpi-trend" style="color:var(--green)">↑5%</span></div><div class="kpi-bar"><div class="kpi-bar-f" style="width:72%;background:linear-gradient(90deg,var(--green),#4ade80)"></div></div></div>
+    <div class="kpi-card"><div class="kpi-label">待審批</div><div class="kpi-row"><span class="kpi-val" style="color:var(--orange)">3</span><span class="kpi-sub">等待決策</span></div></div>
+    <div class="kpi-card"><div class="kpi-label">本週操作</div><div class="kpi-row"><span class="kpi-val">1,245</span></div></div>
+  </div>
+
+  <!-- ═══ MAIN BODY ═══ -->
+  <div class="main-body">
+
+    <!-- ═══ LEFT COLUMN ═══ -->
+    <div class="col-left">
+
+      <!-- 活躍事件 -->
+      <div class="card">
+        <div class="card-header">
+          <div class="card-dot"></div>
+          <span class="card-title">活躍事件</span>
+          <span style="font-size:11px;background:rgba(217,119,87,.1);color:#a04010;padding:2px 8px;font-weight:700;border:.5px solid rgba(217,119,87,.25);border-radius:10px">2</span>
+          <span class="card-action">查看全部告警 →</span>
+        </div>
+        <div class="card-body">
+
+          <!-- Incident 1: P1 進度條 -->
+          <div class="inc">
+            <div class="inc-bar" style="background:var(--orange)"></div>
+            <div class="inc-body">
+              <div class="inc-top">
+                <span class="inc-sev" style="background:rgba(245,158,11,.12);color:#d97000">P1</span>
+                <span class="inc-name">重新探測 #10exiconFast: 通過</span>
+              </div>
+              <div class="inc-meta">awoooi-api @ awoooi-prod · 3 alerts · investigating</div>
+              <!-- P1 FlowPipeline: 進度條 + 龍蝦 -->
+              <div style="position:relative;height:54px;margin:4px 0">
+                <div style="position:absolute;bottom:16px;left:0;right:0;height:4px;background:#e8e5dc;border-radius:2px"></div>
+                <div style="position:absolute;bottom:16px;left:0;height:4px;background:#F59E0B;border-radius:2px;width:43%"></div>
+                <div style="position:absolute;bottom:0;left:0%;transform:translateX(-50%);text-align:center"><div style="height:20px"></div><div style="width:8px;height:8px;border-radius:50%;background:#F59E0B;margin:0 auto"></div><div style="font-size:9px;color:#F59E0B;margin-top:2px">告警</div></div>
+                <div style="position:absolute;bottom:0;left:16.7%;transform:translateX(-50%);text-align:center"><div style="height:20px"></div><div style="width:8px;height:8px;border-radius:50%;background:#F59E0B;margin:0 auto"></div><div style="font-size:9px;color:#F59E0B;margin-top:2px">偵測</div></div>
+                <div style="position:absolute;bottom:0;left:33.3%;transform:translateX(-50%);text-align:center"><div style="height:20px"></div><div style="width:8px;height:8px;border-radius:50%;background:#F59E0B;margin:0 auto"></div><div style="font-size:9px;color:#F59E0B;margin-top:2px">分析</div></div>
+                <div style="position:absolute;bottom:0;left:50%;transform:translateX(-50%);text-align:center"><div style="animation:lobster-bob 1.5s ease-in-out infinite;margin-bottom:2px"><svg width="14" height="16" viewBox="0 0 18 20" fill="none"><ellipse cx="9" cy="13" rx="5.5" ry="6.5" fill="#F59E0B"/><circle cx="9" cy="7.5" r="4.5" fill="#F59E0B"/><circle cx="7" cy="6.5" r="1" fill="#b03a1a"/><circle cx="11" cy="6.5" r="1" fill="#b03a1a"/></svg></div><div style="width:8px;height:8px;border-radius:50%;background:#fff;border:2px solid #F59E0B;margin:0 auto"></div><div style="font-size:9px;color:var(--text);font-weight:700;margin-top:2px">提案</div></div>
+                <div style="position:absolute;bottom:0;left:66.7%;transform:translateX(-50%);text-align:center"><div style="height:20px"></div><div style="width:8px;height:8px;border-radius:50%;background:#f8f9fc;border:1.5px solid #e0ddd4;margin:0 auto"></div><div style="font-size:9px;color:var(--text3);margin-top:2px">授權</div></div>
+                <div style="position:absolute;bottom:0;left:83.3%;transform:translateX(-50%);text-align:center"><div style="height:20px"></div><div style="width:8px;height:8px;border-radius:50%;background:#f8f9fc;border:1.5px solid #e0ddd4;margin:0 auto"></div><div style="font-size:9px;color:var(--text3);margin-top:2px">執行</div></div>
+                <div style="position:absolute;bottom:0;left:100%;transform:translateX(-50%);text-align:center"><div style="height:20px"></div><div style="width:8px;height:8px;border-radius:50%;background:#f8f9fc;border:1.5px solid #e0ddd4;margin:0 auto"></div><div style="font-size:9px;color:var(--text3);margin-top:2px">完成</div></div>
+              </div>
+              <div class="ai-proposal">▶ AI 提案：restart_deployment awoooi-api (信心度 91%)</div>
+              <div class="inc-actions">
+                <button class="btn-approve">批准執行</button>
+                <button class="btn-reject">拒絕</button>
+              </div>
+            </div>
+          </div>
+
+          <!-- Incident 2: P2 卡片步驟 -->
+          <div class="inc">
+            <div class="inc-bar" style="background:var(--blue)"></div>
+            <div class="inc-body">
+              <div class="inc-top">
+                <span class="inc-sev" style="background:rgba(74,144,217,.12);color:var(--blue)">P2</span>
+                <span class="inc-name">awoooi-api: 服務異常</span>
+              </div>
+              <div class="inc-meta">awoooi-api @ awoooi-prod · investigating</div>
+              <!-- P2 FlowPipeline: 卡片步驟 + 光暈 -->
+              <div style="display:flex;align-items:flex-end;gap:3px;margin:4px 0;overflow-x:auto">
+                <div style="text-align:center"><div style="height:20px"></div><div style="padding:3px 5px;background:#4A90D9;border-radius:4px"><span style="font-size:9px;color:#fff;font-weight:700">告警</span></div></div>
+                <div style="width:6px;height:1.5px;background:#4A90D9;margin-bottom:10px"></div>
+                <div style="text-align:center"><div style="height:20px"></div><div style="padding:3px 5px;background:#4A90D9;border-radius:4px"><span style="font-size:9px;color:#fff;font-weight:700">偵測</span></div></div>
+                <div style="width:6px;height:1.5px;background:#e0ddd4;margin-bottom:10px"></div>
+                <div style="text-align:center"><div style="animation:lobster-bob 1.5s ease-in-out infinite"><svg width="14" height="16" viewBox="0 0 18 20" fill="none"><ellipse cx="9" cy="13" rx="5.5" ry="6.5" fill="#4A90D9"/><circle cx="9" cy="7.5" r="4.5" fill="#4A90D9"/><circle cx="7" cy="6.5" r="1" fill="#1a4a7a"/><circle cx="11" cy="6.5" r="1" fill="#1a4a7a"/></svg></div><div style="padding:3px 5px;background:#fff;border:1.5px solid #4A90D9;border-radius:4px;animation:card-glow-p2 1.5s infinite"><span style="font-size:9px;color:#4A90D9;font-weight:700">分析</span></div></div>
+                <div style="width:6px;height:1.5px;background:#e0ddd4;margin-bottom:10px"></div>
+                <div style="text-align:center"><div style="height:20px"></div><div style="padding:3px 5px;background:#f8f9fc;border:1px solid #e0ddd4;border-radius:4px"><span style="font-size:9px;color:#b0ad9f">提案</span></div></div>
+                <div style="width:6px;height:1.5px;background:#e0ddd4;margin-bottom:10px"></div>
+                <div style="text-align:center"><div style="height:20px"></div><div style="padding:3px 5px;background:#f8f9fc;border:1px solid #e0ddd4;border-radius:4px"><span style="font-size:9px;color:#b0ad9f">授權</span></div></div>
+                <div style="width:6px;height:1.5px;background:#e0ddd4;margin-bottom:10px"></div>
+                <div style="text-align:center"><div style="height:20px"></div><div style="padding:3px 5px;background:#f8f9fc;border:1px solid #e0ddd4;border-radius:4px"><span style="font-size:9px;color:#b0ad9f">執行</span></div></div>
+                <div style="width:6px;height:1.5px;background:#e0ddd4;margin-bottom:10px"></div>
+                <div style="text-align:center"><div style="height:20px"></div><div style="padding:3px 5px;background:#f8f9fc;border:1px solid #e0ddd4;border-radius:4px"><span style="font-size:9px;color:#b0ad9f">完成</span></div></div>
+              </div>
+            </div>
+          </div>
+
+        </div>
+      </div>
+
+      <!-- 處置統計迷你版 -->
+      <div class="card">
+        <div class="card-header">
+          <div class="card-dot"></div>
+          <span class="card-title">處置統計</span>
+          <span class="card-action">查看完整報表 →</span>
+        </div>
+        <div class="card-body">
+          <div class="disp-mini">
+            <!-- 環形圖 SVG -->
+            <div class="disp-ring">
+              <svg width="56" height="56" viewBox="0 0 56 56">
+                <circle cx="28" cy="28" r="22" fill="none" stroke="#ebe8df" stroke-width="5"/>
+                <circle cx="28" cy="28" r="22" fill="none" stroke="var(--green)" stroke-width="5" stroke-dasharray="96.6 41.7" stroke-linecap="round"/>
+                <circle cx="28" cy="28" r="22" fill="none" stroke="var(--blue)" stroke-width="5" stroke-dasharray="3.5 134.8" stroke-dashoffset="-96.6" stroke-linecap="round"/>
+                <circle cx="28" cy="28" r="22" fill="none" stroke="var(--orange)" stroke-width="5" stroke-dasharray="30.5 107.8" stroke-dashoffset="-100.1" stroke-linecap="round"/>
+                <circle cx="28" cy="28" r="22" fill="none" stroke="var(--purple)" stroke-width="5" stroke-dasharray="8.1 130.2" stroke-dashoffset="-130.6" stroke-linecap="round"/>
+              </svg>
+              <div class="disp-ring-center">72%</div>
+            </div>
+            <div class="disp-list">
+              <div class="disp-item"><span class="disp-dot" style="background:var(--green)"></span>自動修復<span class="disp-num" style="color:var(--green)">142</span></div>
+              <div class="disp-item"><span class="disp-dot" style="background:var(--orange)"></span>人工核准<span class="disp-num" style="color:var(--orange)">45</span></div>
+              <div class="disp-item"><span class="disp-dot" style="background:var(--purple)"></span>手動處理<span class="disp-num" style="color:var(--purple)">12</span></div>
+              <div class="disp-item"><span class="disp-dot" style="background:var(--blue)"></span>冷啟動<span class="disp-num" style="color:var(--blue)">5</span></div>
+            </div>
+          </div>
+        </div>
+      </div>
+
+      <!-- 最近活動 -->
+      <div class="card">
+        <div class="card-header">
+          <div class="card-dot"></div>
+          <span class="card-title">最近活動</span>
+          <span class="card-action">查看活動串流 →</span>
+        </div>
+        <div class="card-body" style="padding:10px 14px">
+          <div class="stream-item"><span class="stream-time">18:05</span><span class="stream-dot" style="background:var(--green)"></span><span class="stream-msg">心跳確認 <code>mon/mon1</code> Ready</span></div>
+          <div class="stream-item"><span class="stream-time">18:04</span><span class="stream-dot" style="background:var(--blue)"></span><span class="stream-msg"><b>OpenClaw</b> 匹配 Playbook <code>restart_worker</code> (91%)</span></div>
+          <div class="stream-item"><span class="stream-time">18:02</span><span class="stream-dot" style="background:var(--red)"></span><span class="stream-msg"><b>Prometheus</b> Worker CPU 89%</span></div>
+          <div class="stream-item"><span class="stream-time">17:58</span><span class="stream-dot" style="background:var(--green)"></span><span class="stream-msg">自動修復完成 <code>restart: api</code> (12s)</span></div>
+        </div>
+      </div>
+
+    </div>
+
+    <!-- ═══ RIGHT COLUMN (480px) ═══ -->
+    <div class="col-right">
+
+      <!-- OpenClaw 認知引擎 (最上方，品牌錨點) -->
+      <div class="card">
+        <div class="card-header">
+          <div class="card-dot"></div>
+          <span class="card-title">OPENCLAW 認知引擎</span>
+        </div>
+        <div class="card-body">
+          <div class="oc-body">
+            <svg width="68" height="68" viewBox="0 0 140 140" fill="none" style="flex-shrink:0">
+              <defs><linearGradient id="oc-c" x1="0%" y1="0%" x2="100%" y2="100%"><stop offset="0%" stop-color="#FFF"/><stop offset="40%" stop-color="#F8F8F8"/><stop offset="70%" stop-color="#E8E8E8"/><stop offset="100%" stop-color="#D8D8D8"/></linearGradient><radialGradient id="oc-l" cx="40%" cy="35%" r="60%"><stop offset="0%" stop-color="#7AB8F5"/><stop offset="100%" stop-color="#2B6CB0"/></radialGradient></defs>
+              <circle cx="70" cy="70" r="32" fill="url(#oc-c)" stroke="#E0E0E0" stroke-width="1"/><circle cx="70" cy="70" r="16" fill="url(#oc-l)"><animate attributeName="r" values="14;17;14" dur="2s" repeatCount="indefinite"/></circle><circle cx="70" cy="70" r="8" fill="white" opacity=".8"/>
+              <path d="M70 38L70 18L58 6M70 18L82 6" stroke="url(#oc-c)" stroke-width="6" stroke-linecap="round" fill="none"/><path d="M70 38L70 18L58 6M70 18L82 6" stroke="#4A90D9" stroke-width="3" stroke-linecap="round" fill="none" opacity=".5"/>
+              <path d="M38 70L18 70L6 58M18 70L6 82" stroke="url(#oc-c)" stroke-width="6" stroke-linecap="round" fill="none"/><path d="M38 70L18 70L6 58M18 70L6 82" stroke="#4A90D9" stroke-width="3" stroke-linecap="round" fill="none" opacity=".5"/>
+              <path d="M102 70L122 70L134 58M122 70L134 82" stroke="url(#oc-c)" stroke-width="6" stroke-linecap="round" fill="none"/><path d="M102 70L122 70L134 58M122 70L134 82" stroke="#4A90D9" stroke-width="3" stroke-linecap="round" fill="none" opacity=".5"/>
+              <path d="M48 92L28 112L16 116" stroke="url(#oc-c)" stroke-width="6" stroke-linecap="round" fill="none"/><path d="M92 92L112 112L124 116" stroke="url(#oc-c)" stroke-width="6" stroke-linecap="round" fill="none"/>
+              <circle cx="70" cy="70" r="42" fill="none" stroke="#4A90D9" stroke-width="1" stroke-dasharray="6 6" opacity=".3"><animateTransform attributeName="transform" type="rotate" from="0 70 70" to="360 70 70" dur="8s" repeatCount="indefinite"/></circle>
+            </svg>
+            <div class="oc-info">
+              <div class="oc-brand"><span class="w">W</span><span class="o">ooo</span><span class="c">Claw</span></div>
+              <div><div class="oc-badge">WoooClaw Pipeline</div></div>
+              <div class="oc-status">[AGENT] patrolling... <span class="oc-pulse"><span></span><span></span><span></span></span></div>
+              <!-- 豐富內容: AI 即時狀態 -->
+              <div style="margin-top:8px;padding-top:8px;border-top:.5px solid var(--bdr)">
+                <div style="display:flex;gap:8px;margin-bottom:4px">
+                  <div style="flex:1;font-size:10px;color:var(--text2)">模型: <span style="font-weight:600;color:var(--text)">openclaw_nemo</span></div>
+                  <div style="font-size:10px;color:var(--green);font-weight:500">● 運行中</div>
+                </div>
+                <div style="display:flex;gap:12px;font-size:10px;color:var(--text2)">
+                  <span>今日分析: <b style="color:var(--text)">23</b></span>
+                  <span>成功率: <b style="color:var(--green)">91%</b></span>
+                  <span>MTTR: <b style="color:var(--text)">8.2m</b></span>
+                </div>
+                <!-- AI 推理終端 -->
+                <div style="background:#141413;border-radius:6px;padding:8px 10px;margin-top:8px;font-family:'JetBrains Mono',monospace;font-size:10px;color:#a0e8a0;line-height:1.6;max-height:80px;overflow-y:auto">
+<span style="color:#555">[18:03]</span> Analyzing worker CPU spike...
+<span style="color:#555">[18:03]</span> Root cause: OOM pressure
+<span style="color:#555">[18:03]</span> Matched: restart_worker (91%)
+<span style="color:#ffd700">[18:03] Awaiting approval ▎</span>
+                </div>
+              </div>
+            </div>
+          </div>
+        </div>
+      </div>
+
+      <!-- 待審批任務 -->
+      <div class="card" style="border-color:rgba(249,115,22,.3)">
+        <div class="card-header" style="background:rgba(249,115,22,.04)">
+          <div class="card-dot" style="background:var(--orange)"></div>
+          <span class="card-title">待審批任務</span>
+          <span style="font-size:11px;background:rgba(249,115,22,.1);color:var(--orange);padding:2px 8px;font-weight:700;border-radius:10px">3</span>
+          <span class="card-action">查看全部授權 →</span>
+        </div>
+        <div class="card-body">
+          <div class="appr-item">
+            <div class="appr-alert" style="color:var(--red)">Worker 高負載警告</div>
+            <div class="appr-target">ssh://wooo@192.168.0.110/restart</div>
+            <span class="appr-risk risk-low">LOW RISK</span>
+            <div class="appr-btns"><button class="btn-sm-ok">批准</button><button class="btn-sm-no">拒絕</button></div>
+          </div>
+          <div class="appr-item">
+            <div class="appr-alert" style="color:var(--orange)">Redis 記憶體壓力</div>
+            <div class="appr-target">ansible://188/clear_redis_cache.yml</div>
+            <span class="appr-risk risk-med">MEDIUM</span>
+            <div class="appr-btns"><button class="btn-sm-ok">批准</button><button class="btn-sm-no">拒絕</button></div>
+          </div>
+        </div>
+      </div>
+
+      <!-- 拓撲 / 主機 Toggle -->
+      <div class="card">
+        <div class="card-header">
+          <div class="card-dot"></div>
+          <span class="card-title">基礎架構</span>
+          <div style="margin-left:auto"><div class="toggle-bar"><div class="toggle-opt" id="t-host" onclick="switchView('host')">主機</div><div class="toggle-opt on" id="t-topo" onclick="switchView('topo')">拓撲</div></div></div>
+          <span class="card-action" style="margin-left:8px">展開全圖 →</span>
+        </div>
+        <div class="card-body" id="view-topo">
+          <div class="topo-grid">
+            <div class="topo-g tg-infra"><div class="tg-name">🏗️ 基礎設施 (.110)</div><div class="tg-meta">7 服務 · ✓ 全部健康</div><div class="tg-svcs"><span class="tg-svc"><span class="tg-sdot" style="background:var(--green)"></span>Gitea</span><span class="tg-svc"><span class="tg-sdot" style="background:var(--green)"></span>Harbor</span><span class="tg-svc"><span class="tg-sdot" style="background:var(--green)"></span>Sentry</span><span class="tg-svc"><span class="tg-sdot" style="background:var(--green)"></span>Prom</span></div></div>
+            <div class="topo-g tg-ai"><div class="tg-name">🧠 AI/數據 (.188)</div><div class="tg-meta">7 服務 · ⚡ OpenClaw 診斷中</div><div class="tg-svcs"><span class="tg-svc"><span class="tg-sdot" style="background:var(--green)"></span>PG</span><span class="tg-svc"><span class="tg-sdot" style="background:var(--green)"></span>Redis</span><span class="tg-svc" style="border-color:var(--blue)"><span class="tg-sdot" style="background:var(--blue)"></span>OpenClaw⚡</span><span class="tg-svc"><span class="tg-sdot" style="background:var(--green)"></span>Ollama</span></div></div>
+            <div class="topo-g tg-k3s"><div class="tg-name">☸️ K3s 叢集</div><div class="tg-meta">5 服務 · ⚠️ Worker CPU 89%</div><div class="tg-svcs"><span class="tg-svc"><span class="tg-sdot" style="background:var(--green)"></span>api×2</span><span class="tg-svc"><span class="tg-sdot" style="background:var(--green)"></span>web×2</span><span class="tg-svc" style="border-color:var(--orange)"><span class="tg-sdot" style="background:var(--orange)"></span>worker⚠️</span></div></div>
+            <div class="topo-g tg-ext"><div class="tg-name">🌐 外部服務</div><div class="tg-meta">3 服務 · ✓ 全部可達</div><div class="tg-svcs"><span class="tg-svc"><span class="tg-sdot" style="background:var(--green)"></span>Gemini</span><span class="tg-svc"><span class="tg-sdot" style="background:var(--green)"></span>NVIDIA</span><span class="tg-svc"><span class="tg-sdot" style="background:var(--green)"></span>CF</span></div></div>
+          </div>
+        </div>
+        <div class="card-body" id="view-host" style="display:none">
+          <div class="host-grid">
+            <div class="host-card"><div class="host-name">DevOps 金庫</div><div class="host-ip">192.168.0.110</div><div class="host-bars"><div class="host-bar-w"><div class="host-bar-l"><span>CPU</span><span>35%</span></div><div class="host-bar"><div class="host-bar-f" style="width:35%;background:var(--green)"></div></div></div><div class="host-bar-w"><div class="host-bar-l"><span>RAM</span><span>55%</span></div><div class="host-bar"><div class="host-bar-f" style="width:55%;background:var(--green)"></div></div></div></div></div>
+            <div class="host-card"><div class="host-name">AI+Web 中心</div><div class="host-ip">192.168.0.188</div><div class="host-bars"><div class="host-bar-w"><div class="host-bar-l"><span>CPU</span><span>67%</span></div><div class="host-bar"><div class="host-bar-f" style="width:67%;background:var(--orange)"></div></div></div><div class="host-bar-w"><div class="host-bar-l"><span>RAM</span><span>72%</span></div><div class="host-bar"><div class="host-bar-f" style="width:72%;background:var(--orange)"></div></div></div></div></div>
+            <div class="host-card"><div class="host-name">K3s Master</div><div class="host-ip">192.168.0.120</div><div class="host-bars"><div class="host-bar-w"><div class="host-bar-l"><span>CPU</span><span>45%</span></div><div class="host-bar"><div class="host-bar-f" style="width:45%;background:var(--green)"></div></div></div><div class="host-bar-w"><div class="host-bar-l"><span>RAM</span><span>60%</span></div><div class="host-bar"><div class="host-bar-f" style="width:60%;background:var(--green)"></div></div></div></div></div>
+            <div class="host-card"><div class="host-name">K3s Worker</div><div class="host-ip">192.168.0.121</div><div class="host-bars"><div class="host-bar-w"><div class="host-bar-l"><span>CPU</span><span>--</span></div><div class="host-bar"><div class="host-bar-f" style="width:0%"></div></div></div><div class="host-bar-w"><div class="host-bar-l"><span>RAM</span><span>--</span></div><div class="host-bar"><div class="host-bar-f" style="width:0%"></div></div></div></div></div>
+          </div>
+        </div>
+      </div>
+
+      <!-- AI 模型狀態 -->
+      <div class="card">
+        <div class="card-header">
+          <div class="card-dot"></div>
+          <span class="card-title">AI 模型狀態</span>
+        </div>
+        <div class="card-body">
+          <div class="model-grid">
+            <div class="model"><span class="model-dot" style="background:var(--green)"></span><span class="model-name">OpenClaw Nemo</span><span class="model-tag">local</span></div>
+            <div class="model"><span class="model-dot" style="background:var(--green)"></span><span class="model-name">Ollama qwen2.5</span><span class="model-tag">local</span></div>
+            <div class="model"><span class="model-dot" style="background:var(--green)"></span><span class="model-name">Gemini Pro</span><span class="model-tag">cloud</span></div>
+            <div class="model"><span class="model-dot" style="background:var(--green)"></span><span class="model-name">NVIDIA NIM</span><span class="model-tag">cloud</span></div>
+          </div>
+        </div>
+      </div>
+
+      <!-- 監控工具 -->
+      <div class="card">
+        <div class="card-header">
+          <div class="card-dot"></div>
+          <span class="card-title">監控工具</span>
+        </div>
+        <div class="card-body">
+          <div class="tool-grid">
+            <div class="tool"><div class="tool-bar" style="background:#4A90D9"></div><div class="tool-body"><div class="tool-name">SigNoz</div><div class="tool-meta">Traces · Logs</div></div></div>
+            <div class="tool"><div class="tool-bar" style="background:#E85530"></div><div class="tool-body"><div class="tool-name">Grafana</div><div class="tool-meta">3 Dashboards</div></div></div>
+            <div class="tool"><div class="tool-bar" style="background:var(--green)"></div><div class="tool-body"><div class="tool-name">Prometheus</div><div class="tool-meta">22 targets</div></div></div>
+            <div class="tool"><div class="tool-bar" style="background:var(--orange)"></div><div class="tool-body"><div class="tool-name">Langfuse</div><div class="tool-meta">LLMOps</div></div></div>
+            <div class="tool"><div class="tool-bar" style="background:var(--red)"></div><div class="tool-body"><div class="tool-name">Sentry</div><div class="tool-meta">2 Projects</div></div></div>
+            <div class="tool"><div class="tool-bar" style="background:var(--purple)"></div><div class="tool-body"><div class="tool-name">Gitea</div><div class="tool-meta">CI/CD</div></div></div>
+          </div>
+        </div>
+      </div>
+
+    </div>
+  </div>
+</div>
+</div>
+
+<!-- Terminal Float -->
+<div class="terminal-float">⌨ Omni-Terminal</div>
+
+<script>
+function switchView(v){
+  document.getElementById('view-host').style.display=v==='host'?'block':'none'
+  document.getElementById('view-topo').style.display=v==='topo'?'block':'none'
+  document.getElementById('t-host').classList.toggle('on',v==='host')
+  document.getElementById('t-topo').classList.toggle('on',v==='topo')
+}
+</script>
+</body>
+</html>
--- a/.playwright-mcp/sprint5r-approved-design.html
+++ b/.playwright-mcp/sprint5r-approved-design.html
@@ -0,0 +1,783 @@
+<!DOCTYPE html>
+<html lang="zh-TW">
+<head>
+<meta charset="UTF-8">
+<meta name="viewport" content="width=1440">
+<title>AWOOOI AI 戰情指揮中心 — 版本 A：忠實還原 + 微增強</title>
+<link href="https://fonts.googleapis.com/css2?family=DM+Mono:wght@300;400;500&family=Syne:wght@400;600;700;800&family=JetBrains+Mono:wght@300;400;500&family=VT323&display=swap" rel="stylesheet">
+<style>
+:root {
+  --bg: #f5f4ed;
+  --card: #fff;
+  --surface: #faf9f3;
+  --bdr: #e0ddd4;
+  --text: #141413;
+  --text2: #555550;
+  --text3: #87867f;
+  --accent: #d97757;
+  --green: #22C55E;
+  --red: #cc2200;
+  --blue: #4A90D9;
+  --orange: #F59E0B;
+  --purple: #A855F7;
+}
+*, *::before, *::after { margin:0; padding:0; box-sizing:border-box; }
+body {
+  font-family: 'DM Mono', monospace;
+  background: var(--bg);
+  color: var(--text);
+  overflow: hidden;
+  height: 100vh;
+  width: 1440px;
+  display: flex;
+  font-size: 12px;
+  line-height: 1.4;
+}
+
+/* SIDEBAR */
+.sidebar {
+  width: 200px;
+  min-width: 200px;
+  background: var(--card);
+  border-right: 0.5px solid var(--bdr);
+  display: flex;
+  flex-direction: column;
+  height: 100vh;
+}
+.brand {
+  height: 72px;
+  display: flex;
+  align-items: center;
+  gap: 10px;
+  padding: 0 16px;
+  border-bottom: 0.5px solid var(--bdr);
+}
+.brand-text {
+  display: flex;
+  align-items: baseline;
+  gap: 0;
+  line-height: 1;
+}
+.brand-text .a { font-family: 'DM Mono', monospace; font-size: 20px; font-weight: 700; color: #141413; margin-right: -4px; }
+.brand-text .w { font-family: 'VT323', monospace; font-size: 26px; color: var(--accent); letter-spacing: -1px; line-height: 1; }
+.brand-text .i { font-family: 'DM Mono', monospace; font-size: 20px; font-weight: 700; color: #141413; margin-left: -3px; }
+
+.nav { flex:1; padding: 12px 8px; display:flex; flex-direction:column; gap:2px; }
+.nav-item {
+  display: flex; align-items: center; gap: 8px;
+  padding: 8px 12px; border-radius: 6px; cursor: pointer;
+  font-size: 12px; color: var(--text2); text-decoration: none;
+  transition: background 0.15s;
+}
+.nav-item:hover { background: var(--surface); }
+.nav-item.active { background: rgba(217,119,87,0.08); color: var(--accent); font-weight: 500; }
+.nav-item .dot { width:6px; height:6px; border-radius:50%; flex-shrink:0; }
+
+.nav-sep { height:0.5px; background:var(--bdr); margin:8px 12px; }
+.nav-label { font-size:9px; color:var(--text3); padding:4px 12px; text-transform:uppercase; letter-spacing:1px; }
+
+.nav-bottom { padding:8px; border-top:0.5px solid var(--bdr); }
+
+/* CONTENT */
+.content { flex:1; display:flex; flex-direction:column; height:100vh; overflow:hidden; }
+
+/* TITLE BAR */
+.titlebar {
+  height: 48px; min-height:48px;
+  display: flex; align-items: center; justify-content: space-between;
+  padding: 0 20px;
+  border-bottom: 0.5px solid var(--bdr);
+  background: var(--card);
+}
+.titlebar h1 { font-family:'Syne',sans-serif; font-size:20px; font-weight:800; }
+.titlebar-right { display:flex; align-items:center; gap:12px; }
+.pulse-dot { width:8px;height:8px;border-radius:50%;background:var(--green);display:inline-block;animation:blink 2s infinite; }
+.model-badge { font-size:11px; color:var(--text2); display:flex; align-items:center; gap:6px; }
+.lang-btn { font-size:11px; padding:2px 8px; border-radius:4px; border:0.5px solid var(--bdr); background:transparent; cursor:pointer; color:var(--text3); }
+.lang-btn.active { background:var(--text); color:var(--card); border-color:var(--text); }
+.avatar { width:28px;height:28px;border-radius:50%;background:var(--accent);display:flex;align-items:center;justify-content:center;color:#fff;font-size:12px;font-weight:700; }
+
+/* TAB BAR */
+.tabbar {
+  height:36px; min-height:36px;
+  display:flex; align-items:stretch;
+  padding:0 20px; gap:0;
+  border-bottom:0.5px solid var(--bdr);
+  background:var(--card);
+}
+.tab {
+  padding:0 16px; display:flex; align-items:center; gap:6px;
+  font-size:12px; color:var(--text3); cursor:pointer;
+  border-bottom:2px solid transparent; position:relative;
+}
+.tab.active { color:var(--accent); border-bottom-color:var(--accent); font-weight:500; }
+.tab-badge { background:var(--red);color:#fff;font-size:9px;padding:1px 5px;border-radius:8px;font-weight:500; }
+
+/* LOBSTER SWIM */
+.swim-lane {
+  height:14px; min-height:14px;
+  background:var(--surface);
+  position:relative;
+  overflow:hidden;
+  border-bottom:0.5px solid var(--bdr);
+}
+.swim-lobster {
+  position:absolute;
+  top:1px;
+  animation: swim-wide 25s linear infinite, chibi-bob 0.7s ease-in-out infinite;
+}
+
+/* KPI STRIP */
+.kpi-strip {
+  display:flex; gap:8px; padding:8px 20px;
+  border-bottom:0.5px solid var(--bdr);
+  background:var(--surface);
+  min-height:60px;
+}
+.kpi-card {
+  flex:1; background:var(--card); border:0.5px solid var(--bdr);
+  border-radius:8px; padding:8px 12px;
+  display:flex; flex-direction:column; gap:2px;
+}
+.kpi-label { font-size:10px; color:var(--text3); }
+.kpi-val { font-size:18px; font-weight:500; }
+.kpi-sub { font-size:9px; color:var(--text3); }
+.kpi-bar { height:3px; border-radius:2px; background:#eee; margin-top:2px; }
+.kpi-bar-fill { height:100%; border-radius:2px; }
+.trend-up { color:var(--green); font-size:10px; }
+
+/* MAIN BODY */
+.main-body {
+  flex:1; display:flex; gap:12px; padding:12px 20px; overflow:hidden;
+}
+.col-left { flex:6; display:flex; flex-direction:column; gap:10px; overflow:hidden; }
+.col-right { flex:4; display:flex; flex-direction:column; gap:10px; overflow:hidden; }
+
+/* CARDS */
+.card {
+  background:var(--card);
+  border:0.5px solid var(--bdr);
+  border-radius:10px;
+  overflow:hidden;
+}
+.card-header {
+  display:flex; align-items:center; gap:8px;
+  padding:8px 12px;
+  border-bottom:0.5px solid var(--bdr);
+  font-size:12px; font-weight:500;
+}
+.card-header .hdot { width:6px;height:6px;border-radius:50%;background:var(--accent);flex-shrink:0; }
+.card-header .link { margin-left:auto; font-size:10px; color:var(--accent); text-decoration:none; cursor:pointer; }
+.card-header .cnt-badge { font-size:9px; background:var(--orange); color:#fff; padding:1px 6px; border-radius:8px; }
+.card-body { padding:10px 12px; }
+
+/* INCIDENT */
+.incident {
+  border-left:3px solid var(--orange);
+  padding:8px 10px;
+  margin-bottom:8px;
+  background:var(--surface);
+  border-radius:0 6px 6px 0;
+}
+.incident.p2 { border-left-color:var(--blue); }
+.sev-badge {
+  display:inline-block; font-size:9px; font-weight:700; padding:1px 6px; border-radius:4px; color:#fff;
+}
+.sev-p1 { background:var(--orange); }
+.sev-p2 { background:var(--blue); }
+.incident-title { font-size:13px; font-weight:500; margin:4px 0 2px; }
+.incident-meta { font-size:10px; color:var(--text3); margin-bottom:6px; }
+
+/* FLOW PIPELINE */
+.flow-pipe {
+  display:flex; align-items:center; gap:0; margin:6px 0;
+  font-size:9px; position:relative;
+}
+.flow-step {
+  display:flex; flex-direction:column; align-items:center; gap:2px;
+  position:relative; flex:1;
+}
+.flow-step .circle {
+  width:18px;height:18px;border-radius:50%;
+  display:flex;align-items:center;justify-content:center;
+  font-size:8px; border:1.5px solid #ccc; background:#fff; color:var(--text3);
+  position:relative; z-index:1;
+}
+.flow-step.done .circle { background:var(--orange); border-color:var(--orange); color:#fff; }
+.flow-step.active .circle { background:#fff; border-color:var(--orange); color:var(--orange); }
+.flow-step.p2-done .circle { background:var(--blue); border-color:var(--blue); color:#fff; }
+.flow-step.p2-active .circle {
+  background:#fff; border-color:var(--blue); color:var(--blue);
+  animation: card-glow-p2 1.5s ease-in-out infinite;
+}
+.flow-step .label { font-size:8px; color:var(--text3); }
+.flow-line {
+  height:2px; flex:1; background:#e0ddd4; margin:0 -2px; position:relative; top:-6px; z-index:0;
+}
+.flow-line.done { background:var(--orange); }
+.flow-line.p2-done { background:var(--blue); }
+
+.flow-openclaw-icon {
+  width:20px; height:20px; border-radius:50%; overflow:hidden;
+  animation: lobster-bob 1.5s ease-in-out infinite;
+  display:flex; align-items:center; justify-content:center;
+}
+.flow-openclaw-icon img { width:20px; height:20px; }
+
+/* AI PROPOSAL */
+.ai-proposal {
+  background:rgba(245,158,11,0.08);
+  border:0.5px solid rgba(245,158,11,0.25);
+  border-radius:6px; padding:6px 10px;
+  font-size:11px; color:var(--text); margin:6px 0;
+}
+.btn-row { display:flex; gap:6px; margin-top:6px; }
+.btn {
+  padding:4px 12px; border-radius:6px; font-size:11px; cursor:pointer;
+  border:0.5px solid var(--bdr); font-family:'DM Mono',monospace;
+}
+.btn-approve { background:var(--green); color:#fff; border-color:var(--green); }
+.btn-reject { background:transparent; color:var(--text3); }
+.btn-approve-orange { background:var(--orange); color:#fff; border-color:var(--orange); }
+
+/* DONUT */
+.donut-area { display:flex; align-items:center; gap:16px; }
+.donut-stats { display:grid; grid-template-columns:1fr 1fr; gap:4px 16px; font-size:11px; }
+.donut-stat { display:flex; align-items:center; gap:6px; }
+.donut-stat .d-dot { width:6px;height:6px;border-radius:50%;flex-shrink:0; }
+
+/* ACTIVITY */
+.activity-item {
+  display:flex; align-items:flex-start; gap:8px; padding:3px 0;
+  font-size:11px; line-height:1.4;
+}
+.activity-item .time { font-family:'JetBrains Mono',monospace; font-size:10px; color:var(--text3); flex-shrink:0; }
+.activity-item .a-dot { width:4px;height:4px;border-radius:50%;flex-shrink:0;margin-top:5px; }
+.activity-item code { font-family:'JetBrains Mono',monospace; font-size:10px; background:var(--surface); padding:0 3px; border-radius:2px; }
+
+/* OPENCLAW ENGINE */
+.oc-panel { display:flex; gap:12px; }
+.oc-right { flex:1; }
+.oc-brand { display:flex; align-items:baseline; gap:0; margin-bottom:4px; line-height:1; }
+.oc-brand .w { font-family:'DM Mono',monospace; font-size:15px; font-weight:700; color:var(--text); }
+.oc-brand .o { font-family:'VT323',monospace; font-size:24px; color:var(--accent); letter-spacing:1px; line-height:1; }
+.oc-brand .c { font-family:'DM Mono',monospace; font-size:15px; font-weight:700; color:var(--text); }
+.oc-badge { display:inline-block; font-size:9px; padding:2px 8px; border-radius:4px; background:rgba(74,144,217,0.1); color:var(--blue); margin-bottom:4px; }
+.oc-status { font-size:11px; color:var(--text2); margin-bottom:4px; }
+.oc-dots { display:inline-flex; gap:3px; }
+.oc-dots span { width:4px;height:4px;border-radius:50%;background:var(--blue);animation:oc-p 1.4s infinite; }
+.oc-dots span:nth-child(2) { animation-delay:0.2s; }
+.oc-dots span:nth-child(3) { animation-delay:0.4s; }
+.oc-sep { height:0.5px; background:var(--bdr); margin:6px 0; }
+.oc-stats { font-size:10px; color:var(--text3); display:flex; gap:8px; flex-wrap:wrap; }
+.oc-stats b { color:var(--text2); font-weight:500; }
+
+/* AI TERMINAL */
+.ai-terminal {
+  background:#141413; color:#a0e8a0; font-family:'JetBrains Mono',monospace;
+  font-size:10px; border-radius:6px; padding:8px; margin-top:6px;
+  max-height:80px; overflow:hidden; line-height:1.5;
+}
+.ai-terminal .cursor { color:#F59E0B; animation:cursor-blink 1s step-end infinite; }
+
+/* PENDING APPROVALS */
+.card.pending { border-color:rgba(245,158,11,0.3); }
+.approval-item {
+  padding:8px; margin-bottom:6px; background:var(--surface); border-radius:6px;
+}
+.approval-item .ap-title { font-size:12px; font-weight:500; margin-bottom:2px; }
+.approval-item .ap-target { font-family:'JetBrains Mono',monospace; font-size:10px; color:var(--text3); margin-bottom:4px; }
+.risk-badge { font-size:9px; padding:1px 6px; border-radius:4px; font-weight:600; }
+.risk-low { background:rgba(34,197,94,0.1); color:var(--green); }
+.risk-med { background:rgba(245,158,11,0.1); color:var(--orange); }
+
+/* INFRA */
+.infra-grid { display:grid; grid-template-columns:1fr 1fr; gap:6px; }
+.infra-node {
+  border:0.5px solid var(--bdr); border-radius:6px; padding:8px;
+  font-size:10px;
+}
+.infra-node .in-title { font-size:11px; font-weight:500; margin-bottom:2px; }
+.infra-node .in-sub { font-size:9px; color:var(--text3); margin-bottom:4px; }
+.infra-node .in-services { display:flex; flex-wrap:wrap; gap:3px; }
+.in-svc {
+  font-size:9px; padding:1px 5px; border-radius:3px;
+  background:var(--surface); border:0.5px solid var(--bdr);
+}
+.in-svc.warn { border-color:var(--orange); background:rgba(245,158,11,0.06); }
+.in-svc.diag { border-color:var(--blue); background:rgba(74,144,217,0.06); }
+.infra-node.glow-warn { background:rgba(245,158,11,0.03); }
+
+/* HOST VIEW */
+.host-grid { display:grid; grid-template-columns:1fr 1fr; gap:6px; }
+.host-node { border:0.5px solid var(--bdr); border-radius:6px; padding:8px; font-size:10px; }
+.host-node .hn-title { font-size:11px; font-weight:500; margin-bottom:2px; }
+.host-node .hn-ip { font-size:9px; color:var(--text3); font-family:'JetBrains Mono',monospace; margin-bottom:4px; }
+.prog-row { display:flex; align-items:center; gap:4px; margin-bottom:2px; font-size:9px; }
+.prog-bar { flex:1; height:4px; background:#eee; border-radius:2px; }
+.prog-fill { height:100%;border-radius:2px; }
+
+/* AI MODEL */
+.model-grid { display:grid; grid-template-columns:1fr 1fr; gap:4px; }
+.model-item {
+  display:flex; align-items:center; gap:6px; font-size:10px;
+  padding:4px 6px; background:var(--surface); border-radius:4px;
+}
+.model-item .m-dot { width:5px;height:5px;border-radius:50%;background:var(--green); }
+
+/* MONITOR TOOLS */
+.tool-grid { display:grid; grid-template-columns:1fr 1fr 1fr; gap:4px; }
+.tool-item {
+  display:flex; align-items:center; gap:6px; font-size:10px; padding:4px 6px;
+  background:var(--surface); border-radius:4px;
+}
+.tool-item .t-bar { width:3px; height:20px; border-radius:2px; flex-shrink:0; }
+.tool-item .t-name { font-weight:500; font-size:10px; }
+.tool-item .t-meta { font-size:9px; color:var(--text3); }
+
+/* FLOATING */
+.fab {
+  position:fixed; bottom:16px; right:16px;
+  background:var(--text); color:var(--card);
+  padding:8px 16px; border-radius:8px; font-size:12px;
+  font-family:'JetBrains Mono',monospace;
+  cursor:pointer; z-index:100;
+  border:0.5px solid var(--text3);
+  box-shadow:0 2px 8px rgba(0,0,0,0.15);
+}
+
+/* TOGGLE */
+.toggle-group { display:flex; margin-left:auto; gap:0; }
+.toggle-btn {
+  font-size:10px; padding:2px 8px; border:0.5px solid var(--bdr);
+  background:transparent; cursor:pointer; color:var(--text3);
+  font-family:'DM Mono',monospace;
+}
+.toggle-btn:first-child { border-radius:4px 0 0 4px; }
+.toggle-btn:last-child { border-radius:0 4px 4px 0; }
+.toggle-btn.active { background:var(--text); color:var(--card); border-color:var(--text); }
+
+/* ANIMATIONS */
+@keyframes blink { 0%,100%{opacity:1} 50%{opacity:0.3} }
+@keyframes swim-wide { 0%{left:-20px;transform:scaleX(1)} 49%{left:calc(100% - 10px);transform:scaleX(1)} 50%{left:calc(100% - 10px);transform:scaleX(-1)} 99%{left:-20px;transform:scaleX(-1)} 100%{left:-20px;transform:scaleX(1)} }
+@keyframes chibi-bob { 0%,100%{top:1px} 50%{top:-1px} }
+@keyframes lobster-bob { 0%,100%{transform:translateY(0)} 50%{transform:translateY(-3px)} }
+@keyframes card-glow-p2 { 0%,100%{box-shadow:0 0 0 0 rgba(74,144,217,0)} 50%{box-shadow:0 0 6px 2px rgba(74,144,217,0.35)} }
+@keyframes oc-p { 0%,100%{opacity:0.3} 50%{opacity:1} }
+@keyframes cursor-blink { 0%,100%{opacity:1} 50%{opacity:0} }
+</style>
+</head>
+<body>
+
+<!-- SIDEBAR -->
+<aside class="sidebar">
+  <div class="brand">
+    <svg width="36" height="36" viewBox="0 0 140 140" fill="none">
+      <defs>
+        <linearGradient id="hdr-ceramic" x1="0%" y1="0%" x2="100%" y2="100%"><stop offset="0%" stop-color="#FFF"/><stop offset="40%" stop-color="#F8F8F8"/><stop offset="70%" stop-color="#E8E8E8"/><stop offset="100%" stop-color="#D8D8D8"/></linearGradient>
+        <radialGradient id="hdr-led" cx="40%" cy="35%" r="60%"><stop offset="0%" stop-color="#7AB8F5"/><stop offset="100%" stop-color="#2B6CB0"/></radialGradient>
+      </defs>
+      <circle cx="70" cy="70" r="32" fill="url(#hdr-ceramic)" stroke="#E0E0E0" stroke-width="1"/>
+      <circle cx="70" cy="70" r="16" fill="url(#hdr-led)"><animate attributeName="r" values="14;17;14" dur="2s" repeatCount="indefinite"/></circle>
+      <circle cx="70" cy="70" r="8" fill="white" opacity=".8"/>
+      <path d="M70 38L70 18L58 6M70 18L82 6" stroke="url(#hdr-ceramic)" stroke-width="6" stroke-linecap="round" fill="none"/><path d="M70 38L70 18L58 6M70 18L82 6" stroke="#4A90D9" stroke-width="3" stroke-linecap="round" fill="none" opacity=".5"/>
+      <path d="M38 70L18 70L6 58M18 70L6 82" stroke="url(#hdr-ceramic)" stroke-width="6" stroke-linecap="round" fill="none"/><path d="M38 70L18 70L6 58M18 70L6 82" stroke="#4A90D9" stroke-width="3" stroke-linecap="round" fill="none" opacity=".5"/>
+      <path d="M102 70L122 70L134 58M122 70L134 82" stroke="url(#hdr-ceramic)" stroke-width="6" stroke-linecap="round" fill="none"/><path d="M102 70L122 70L134 58M122 70L134 82" stroke="#4A90D9" stroke-width="3" stroke-linecap="round" fill="none" opacity=".5"/>
+      <path d="M48 92L28 112L16 116" stroke="url(#hdr-ceramic)" stroke-width="6" stroke-linecap="round" fill="none"/>
+      <path d="M92 92L112 112L124 116" stroke="url(#hdr-ceramic)" stroke-width="6" stroke-linecap="round" fill="none"/>
+      <circle cx="70" cy="70" r="42" fill="none" stroke="#4A90D9" stroke-width="1" stroke-dasharray="6 6" opacity=".3"><animateTransform attributeName="transform" type="rotate" from="0 70 70" to="360 70 70" dur="8s" repeatCount="indefinite"/></circle>
+    </svg>
+    <span class="brand-text"><span class="a">A</span><span class="w">wooo</span><span class="i">I</span></span>
+  </div>
+  <nav class="nav">
+    <a class="nav-item active"><span class="dot" style="background:var(--accent)"></span>指令中心</a>
+    <a class="nav-item"><span class="dot" style="background:var(--blue)"></span>可觀測性</a>
+    <a class="nav-item"><span class="dot" style="background:var(--green)"></span>自動化</a>
+    <a class="nav-item"><span class="dot" style="background:var(--purple)"></span>營運</a>
+    <a class="nav-item"><span class="dot" style="background:var(--red)"></span>安全合規</a>
+    <a class="nav-item"><span class="dot" style="background:var(--text3)"></span>知識</a>
+    <div class="nav-sep"></div>
+    <div class="nav-label">LEGACY</div>
+    <a class="nav-item" style="color:#c0bfb8">經典 AI 中心</a>
+  </nav>
+  <div class="nav-bottom">
+    <a class="nav-item"><span class="dot" style="background:var(--text3)"></span>終端</a>
+    <a class="nav-item"><span class="dot" style="background:var(--text3)"></span>設定</a>
+  </div>
+</aside>
+
+<!-- CONTENT -->
+<main class="content">
+
+  <!-- TITLE BAR -->
+  <div class="titlebar">
+    <h1>AI中心</h1>
+    <div class="titlebar-right">
+      <button class="lang-btn active">繁</button>
+      <button class="lang-btn">EN</button>
+      <div class="avatar">OG</div>
+    </div>
+  </div>
+
+  <!-- TAB BAR -->
+  <div class="tabbar">
+    <div class="tab active">戰情總覽</div>
+    <div class="tab">告警 & 授權 <span class="tab-badge">2</span></div>
+    <div class="tab">活動串流</div>
+    <div class="tab">處置統計</div>
+  </div>
+
+  <!-- KPI STRIP -->
+  <div class="kpi-strip">
+    <div class="kpi-card">
+      <span class="kpi-label">系統健康</span>
+      <span class="kpi-val" style="color:var(--green)">98.5%</span>
+      <div class="kpi-bar"><div class="kpi-bar-fill" style="width:98.5%;background:var(--green)"></div></div>
+    </div>
+    <div class="kpi-card">
+      <span class="kpi-label">活動事件</span>
+      <span class="kpi-val" style="color:var(--orange)">2</span>
+      <span class="kpi-sub">P1:1 P2:1</span>
+    </div>
+    <div class="kpi-card">
+      <span class="kpi-label">自動修復率</span>
+      <span class="kpi-val" style="color:var(--green)">72% <span class="trend-up">↑5%</span></span>
+      <div class="kpi-bar"><div class="kpi-bar-fill" style="width:72%;background:linear-gradient(90deg,var(--green),#6ee7b7)"></div></div>
+    </div>
+    <div class="kpi-card">
+      <span class="kpi-label">待審批</span>
+      <span class="kpi-val" style="color:var(--orange)">3</span>
+      <span class="kpi-sub">等待決策</span>
+    </div>
+    <div class="kpi-card">
+      <span class="kpi-label">本週操作</span>
+      <span class="kpi-val">1,245</span>
+    </div>
+  </div>
+
+  <!-- MAIN BODY -->
+  <div class="main-body">
+
+    <!-- LEFT COLUMN -->
+    <div class="col-left">
+
+      <!-- ACTIVE INCIDENTS -->
+      <div class="card" style="flex-shrink:0;">
+        <div class="card-header">
+          <span class="hdot"></span>
+          <span>活躍事件</span>
+          <span class="cnt-badge">2</span>
+          <a class="link">查看全部告警 →</a>
+        </div>
+        <div class="card-body">
+          <!-- P1 -->
+          <div class="incident">
+            <span class="sev-badge sev-p1">P1</span>
+            <div class="incident-title">API 回應延遲超標</div>
+            <div class="incident-meta">awoooi-api @ awoooi-prod · 3 alerts · investigating</div>
+            <div class="flow-pipe">
+              <div class="flow-step done"><div class="circle">●</div><div class="label">告警</div></div>
+              <div class="flow-line done"></div>
+              <div class="flow-step done"><div class="circle">●</div><div class="label">偵測</div></div>
+              <div class="flow-line done"></div>
+              <div class="flow-step done"><div class="circle">●</div><div class="label">分析</div></div>
+              <div class="flow-line done"></div>
+              <div class="flow-step active"><div class="flow-openclaw-icon"><img src="https://cdn.jsdelivr.net/gh/homarr-labs/dashboard-icons/png/openclaw.png" alt="OpenClaw"/></div><div class="label" style="font-weight:700">提案</div></div>
+              <div class="flow-line"></div>
+              <div class="flow-step"><div class="circle">○</div><div class="label">授權</div></div>
+              <div class="flow-line"></div>
+              <div class="flow-step"><div class="circle">○</div><div class="label">執行</div></div>
+              <div class="flow-line"></div>
+              <div class="flow-step"><div class="circle">○</div><div class="label">完成</div></div>
+            </div>
+            <div class="ai-proposal">▶ AI 提案：restart_deployment awoooi-api (信心度 91%)</div>
+            <div class="btn-row">
+              <button class="btn btn-approve">批准執行</button>
+              <button class="btn btn-reject">拒絕</button>
+            </div>
+          </div>
+          <!-- P2 -->
+          <div class="incident p2">
+            <span class="sev-badge sev-p2">P2</span>
+            <div class="incident-title">Redis 連線數偏高</div>
+            <div class="incident-meta">redis @ 192.168.0.188 · investigating</div>
+            <div class="flow-pipe">
+              <div class="flow-step p2-done"><div class="circle">■</div><div class="label">告警</div></div>
+              <div class="flow-line p2-done"></div>
+              <div class="flow-step p2-done"><div class="circle">■</div><div class="label">偵測</div></div>
+              <div class="flow-line p2-done"></div>
+              <div class="flow-step p2-active"><div class="flow-openclaw-icon"><img src="https://cdn.jsdelivr.net/gh/homarr-labs/dashboard-icons/png/openclaw.png" alt="OpenClaw"/></div><div class="label" style="font-weight:700">分析</div></div>
+              <div class="flow-line"></div>
+              <div class="flow-step"><div class="circle">□</div><div class="label">提案</div></div>
+              <div class="flow-line"></div>
+              <div class="flow-step"><div class="circle">□</div><div class="label">授權</div></div>
+              <div class="flow-line"></div>
+              <div class="flow-step"><div class="circle">□</div><div class="label">執行</div></div>
+              <div class="flow-line"></div>
+              <div class="flow-step"><div class="circle">□</div><div class="label">完成</div></div>
+            </div>
+          </div>
+        </div>
+      </div>
+
+      <!-- DISPOSITION STATS -->
+      <div class="card" style="flex-shrink:0;">
+        <div class="card-header">
+          <span class="hdot"></span>
+          <span>處置統計</span>
+          <a class="link">查看完整報表 →</a>
+        </div>
+        <div class="card-body">
+          <div class="donut-area">
+            <svg width="56" height="56" viewBox="0 0 56 56">
+              <circle cx="28" cy="28" r="22" fill="none" stroke="#eee" stroke-width="6"/>
+              <!-- green 70% = 252deg -->
+              <circle cx="28" cy="28" r="22" fill="none" stroke="var(--green)" stroke-width="6" stroke-dasharray="96.8 41.2" stroke-dashoffset="34.6" stroke-linecap="round"/>
+              <!-- orange 22% -->
+              <circle cx="28" cy="28" r="22" fill="none" stroke="var(--orange)" stroke-width="6" stroke-dasharray="30.4 107.6" stroke-dashoffset="131.8" stroke-linecap="round"/>
+              <!-- purple 6% -->
+              <circle cx="28" cy="28" r="22" fill="none" stroke="var(--purple)" stroke-width="6" stroke-dasharray="8.3 129.7" stroke-dashoffset="101.4" stroke-linecap="round"/>
+              <!-- blue 2% -->
+              <circle cx="28" cy="28" r="22" fill="none" stroke="var(--blue)" stroke-width="6" stroke-dasharray="2.8 135.2" stroke-dashoffset="93.1" stroke-linecap="round"/>
+              <text x="28" y="30" text-anchor="middle" font-size="11" font-family="DM Mono" font-weight="500" fill="var(--text)">72%</text>
+            </svg>
+            <div class="donut-stats">
+              <div class="donut-stat"><span class="d-dot" style="background:var(--green)"></span> 自動修復 <b>142</b></div>
+              <div class="donut-stat"><span class="d-dot" style="background:var(--orange)"></span> 人工核准 <b>45</b></div>
+              <div class="donut-stat"><span class="d-dot" style="background:var(--purple)"></span> 手動處理 <b>12</b></div>
+              <div class="donut-stat"><span class="d-dot" style="background:var(--blue)"></span> 冷啟動 <b>5</b></div>
+            </div>
+          </div>
+        </div>
+      </div>
+
+      <!-- RECENT ACTIVITY -->
+      <div class="card" style="flex:1;min-height:0;">
+        <div class="card-header">
+          <span class="hdot"></span>
+          <span>最近活動</span>
+          <a class="link">查看活動串流 →</a>
+        </div>
+        <div class="card-body">
+          <div class="activity-item"><span class="time">18:05</span><span class="a-dot" style="background:var(--green)"></span><span>心跳確認 <code>mon/mon1</code> Ready</span></div>
+          <div class="activity-item"><span class="time">18:04</span><span class="a-dot" style="background:var(--blue)"></span><span><b>OpenClaw</b> 匹配 Playbook <code>restart_worker</code> (91%)</span></div>
+          <div class="activity-item"><span class="time">18:02</span><span class="a-dot" style="background:var(--red)"></span><span><b>Prometheus</b> Worker CPU 89%</span></div>
+          <div class="activity-item"><span class="time">17:58</span><span class="a-dot" style="background:var(--green)"></span><span>自動修復完成 <code>restart: api</code> (12s)</span></div>
+        </div>
+      </div>
+    </div>
+
+    <!-- RIGHT COLUMN -->
+    <div class="col-right">
+
+      <!-- OPENCLAW ENGINE -->
+      <div class="card" style="flex-shrink:0;">
+        <div class="card-header">
+          <span class="hdot"></span>
+          <span>OPENCLAW 認知引擎</span>
+        </div>
+        <div class="card-body">
+          <div class="oc-panel">
+            <svg width="68" height="68" viewBox="0 0 140 140" fill="none" style="flex-shrink:0">
+              <defs>
+                <linearGradient id="oc-ceramic" x1="0%" y1="0%" x2="100%" y2="100%"><stop offset="0%" stop-color="#FFF"/><stop offset="40%" stop-color="#F8F8F8"/><stop offset="70%" stop-color="#E8E8E8"/><stop offset="100%" stop-color="#D8D8D8"/></linearGradient>
+                <radialGradient id="oc-led" cx="40%" cy="35%" r="60%"><stop offset="0%" stop-color="#7AB8F5"/><stop offset="100%" stop-color="#2B6CB0"/></radialGradient>
+              </defs>
+              <circle cx="70" cy="70" r="32" fill="url(#oc-ceramic)" stroke="#E0E0E0" stroke-width="1"/>
+              <circle cx="70" cy="70" r="16" fill="url(#oc-led)"><animate attributeName="r" values="14;17;14" dur="2s" repeatCount="indefinite"/></circle>
+              <circle cx="70" cy="70" r="8" fill="white" opacity=".8"/>
+              <path d="M70 38L70 18L58 6M70 18L82 6" stroke="url(#oc-ceramic)" stroke-width="6" stroke-linecap="round" fill="none"/><path d="M70 38L70 18L58 6M70 18L82 6" stroke="#4A90D9" stroke-width="3" stroke-linecap="round" fill="none" opacity=".5"/>
+              <path d="M38 70L18 70L6 58M18 70L6 82" stroke="url(#oc-ceramic)" stroke-width="6" stroke-linecap="round" fill="none"/><path d="M38 70L18 70L6 58M18 70L6 82" stroke="#4A90D9" stroke-width="3" stroke-linecap="round" fill="none" opacity=".5"/>
+              <path d="M102 70L122 70L134 58M122 70L134 82" stroke="url(#oc-ceramic)" stroke-width="6" stroke-linecap="round" fill="none"/><path d="M102 70L122 70L134 58M122 70L134 82" stroke="#4A90D9" stroke-width="3" stroke-linecap="round" fill="none" opacity=".5"/>
+              <path d="M48 92L28 112L16 116" stroke="url(#oc-ceramic)" stroke-width="6" stroke-linecap="round" fill="none"/>
+              <path d="M92 92L112 112L124 116" stroke="url(#oc-ceramic)" stroke-width="6" stroke-linecap="round" fill="none"/>
+              <circle cx="70" cy="70" r="42" fill="none" stroke="#4A90D9" stroke-width="1" stroke-dasharray="6 6" opacity=".3"><animateTransform attributeName="transform" type="rotate" from="0 70 70" to="360 70 70" dur="8s" repeatCount="indefinite"/></circle>
+            </svg>
+            <div class="oc-right">
+              <div class="oc-brand"><span class="w">W</span><span class="o">○○○</span><span class="c">Claw</span></div>
+              <div class="oc-badge">WoooClaw Pipeline</div>
+              <div class="oc-status">[AGENT] patrolling... <span class="oc-dots"><span></span><span></span><span></span></span></div>
+              <div class="oc-sep"></div>
+              <div class="oc-stats">
+                <span>模型: <b>openclaw_nemo</b></span> <span>● 運行中</span>
+              </div>
+              <div class="oc-stats" style="margin-top:2px">
+                <span>今日分析: <b>23</b></span>
+                <span>成功率: <b>91%</b></span>
+                <span>MTTR: <b>8.2m</b></span>
+              </div>
+            </div>
+          </div>
+          <div class="ai-terminal">
+            <div>[18:03] Analyzing worker CPU spike...</div>
+            <div>[18:03] Root cause: OOM pressure</div>
+            <div>[18:03] Matched: restart_worker (91%)</div>
+            <div>[18:03] Awaiting approval <span class="cursor">▎</span></div>
+          </div>
+        </div>
+      </div>
+
+      <!-- PENDING APPROVALS -->
+      <div class="card pending" style="flex-shrink:0;">
+        <div class="card-header">
+          <span class="hdot" style="background:var(--orange)"></span>
+          <span>待審批任務</span>
+          <span class="cnt-badge">3</span>
+          <a class="link">查看全部授權 →</a>
+        </div>
+        <div class="card-body">
+          <div class="approval-item">
+            <div class="ap-title" style="color:var(--red)">Worker 高負載警告</div>
+            <div class="ap-target">ssh://wooo@192.168.0.110/restart</div>
+            <span class="risk-badge risk-low">LOW RISK</span>
+            <div class="btn-row">
+              <button class="btn btn-approve" title="點擊批准">批准</button>
+              <button class="btn btn-reject">拒絕</button>
+            </div>
+          </div>
+          <div class="approval-item">
+            <div class="ap-title" style="color:var(--orange)">Redis 記憶體壓力</div>
+            <div class="ap-target">ansible://188/clear_redis_cache.yml</div>
+            <span class="risk-badge risk-med">MEDIUM</span>
+            <div class="btn-row">
+              <button class="btn btn-approve-orange" title="高風險操作需長按確認">長按批准</button>
+              <button class="btn btn-reject">拒絕</button>
+            </div>
+          </div>
+        </div>
+      </div>
+
+      <!-- INFRASTRUCTURE -->
+      <div class="card" style="flex-shrink:0;">
+        <div class="card-header">
+          <span class="hdot"></span>
+          <span>基礎架構</span>
+          <div class="toggle-group">
+            <button class="toggle-btn" onclick="switchView('host')">主機</button>
+            <button class="toggle-btn active" onclick="switchView('topo')">拓撲</button>
+          </div>
+          <a class="link">展開全圖 →</a>
+        </div>
+        <div class="card-body">
+          <!-- TOPO VIEW -->
+          <div id="view-topo" class="infra-grid">
+            <div class="infra-node" style="border-color:var(--blue)">
+              <div class="in-title">🏗️ 基礎設施 (.110)</div>
+              <div class="in-sub">7 服務 · ✓ 全部健康</div>
+              <div class="in-services">
+                <span class="in-svc">●Gitea</span><span class="in-svc">●Harbor</span><span class="in-svc">●Sentry</span><span class="in-svc">●Prom</span>
+              </div>
+            </div>
+            <div class="infra-node" style="border-color:var(--orange)">
+              <div class="in-title">🧠 AI/數據 (.188)</div>
+              <div class="in-sub">7 服務 · ⚡ OpenClaw 診斷中</div>
+              <div class="in-services">
+                <span class="in-svc">●PG</span><span class="in-svc">●Redis</span><span class="in-svc diag">●OpenClaw⚡</span><span class="in-svc">●Ollama</span>
+              </div>
+            </div>
+            <div class="infra-node glow-warn" style="border-color:var(--purple)">
+              <div class="in-title">☸️ K3s 叢集</div>
+              <div class="in-sub">5 服務 · ⚠️ Worker CPU 89%</div>
+              <div class="in-services">
+                <span class="in-svc">●api×2</span><span class="in-svc">●web×2</span><span class="in-svc warn">⚠️worker</span>
+              </div>
+            </div>
+            <div class="infra-node" style="border-color:var(--orange)">
+              <div class="in-title">🌐 外部服務</div>
+              <div class="in-sub">3 服務 · ✓ 全部可達</div>
+              <div class="in-services">
+                <span class="in-svc">●Gemini</span><span class="in-svc">●NVIDIA</span><span class="in-svc">●CF</span>
+              </div>
+            </div>
+          </div>
+          <!-- HOST VIEW -->
+          <div id="view-host" class="host-grid" style="display:none">
+            <div class="host-node">
+              <div class="hn-title">DevOps 金庫</div>
+              <div class="hn-ip">192.168.0.110</div>
+              <div class="prog-row">CPU<div class="prog-bar"><div class="prog-fill" style="width:35%;background:var(--green)"></div></div>35%</div>
+              <div class="prog-row">RAM<div class="prog-bar"><div class="prog-fill" style="width:55%;background:var(--green)"></div></div>55%</div>
+            </div>
+            <div class="host-node">
+              <div class="hn-title">AI+Web 中心</div>
+              <div class="hn-ip">192.168.0.188</div>
+              <div class="prog-row">CPU<div class="prog-bar"><div class="prog-fill" style="width:67%;background:var(--orange)"></div></div>67%</div>
+              <div class="prog-row">RAM<div class="prog-bar"><div class="prog-fill" style="width:72%;background:var(--orange)"></div></div>72%</div>
+            </div>
+            <div class="host-node">
+              <div class="hn-title">K3s Master</div>
+              <div class="hn-ip">192.168.0.120</div>
+              <div class="prog-row">CPU<div class="prog-bar"><div class="prog-fill" style="width:45%;background:var(--green)"></div></div>45%</div>
+              <div class="prog-row">RAM<div class="prog-bar"><div class="prog-fill" style="width:60%;background:var(--green)"></div></div>60%</div>
+            </div>
+            <div class="host-node">
+              <div class="hn-title">K3s Worker</div>
+              <div class="hn-ip">192.168.0.121</div>
+              <div class="prog-row">CPU<div class="prog-bar"><div class="prog-fill" style="width:0;background:#ccc"></div></div>--</div>
+              <div class="prog-row">RAM<div class="prog-bar"><div class="prog-fill" style="width:0;background:#ccc"></div></div>--</div>
+            </div>
+          </div>
+        </div>
+      </div>
+
+      <!-- AI MODEL STATUS -->
+      <div class="card" style="flex-shrink:0;">
+        <div class="card-header">
+          <span class="hdot"></span>
+          <span>AI 模型狀態</span>
+        </div>
+        <div class="card-body">
+          <div class="model-grid">
+            <div class="model-item"><span class="m-dot"></span>OpenClaw Nemo (local)</div>
+            <div class="model-item"><span class="m-dot"></span>Ollama gemma3 (local)</div>
+            <div class="model-item"><span class="m-dot"></span>Gemini Pro (cloud)</div>
+            <div class="model-item"><span class="m-dot"></span>NVIDIA NIM (cloud)</div>
+          </div>
+        </div>
+      </div>
+
+      <!-- MONITOR TOOLS -->
+      <div class="card" style="flex:1;min-height:0;">
+        <div class="card-header">
+          <span class="hdot"></span>
+          <span>監控工具</span>
+        </div>
+        <div class="card-body">
+          <div class="tool-grid">
+            <div class="tool-item"><div class="t-bar" style="background:var(--blue)"></div><div><div class="t-name">SigNoz</div><div class="t-meta">Traces · Logs</div></div></div>
+            <div class="tool-item"><div class="t-bar" style="background:#E85530"></div><div><div class="t-name">Grafana</div><div class="t-meta">3 Dashboards</div></div></div>
+            <div class="tool-item"><div class="t-bar" style="background:var(--green)"></div><div><div class="t-name">Prometheus</div><div class="t-meta">23 targets</div></div></div>
+            <div class="tool-item"><div class="t-bar" style="background:var(--orange)"></div><div><div class="t-name">Langfuse</div><div class="t-meta">LLMOps</div></div></div>
+            <div class="tool-item"><div class="t-bar" style="background:var(--red)"></div><div><div class="t-name">Sentry</div><div class="t-meta">2 Projects</div></div></div>
+            <div class="tool-item"><div class="t-bar" style="background:var(--purple)"></div><div><div class="t-name">Gitea</div><div class="t-meta">CI/CD</div></div></div>
+          </div>
+        </div>
+      </div>
+
+    </div>
+  </div>
+</main>
+
+<!-- FLOATING FAB -->
+<div class="fab">⌨ Omni-Terminal [⌘J]</div>
+
+<script>
+function switchView(v) {
+  const topo = document.getElementById('view-topo');
+  const host = document.getElementById('view-host');
+  const btns = document.querySelectorAll('.toggle-btn');
+  if (v === 'host') {
+    topo.style.display = 'none';
+    host.style.display = 'grid';
+    btns[0].classList.add('active');
+    btns[1].classList.remove('active');
+  } else {
+    topo.style.display = 'grid';
+    host.style.display = 'none';
+    btns[0].classList.remove('active');
+    btns[1].classList.add('active');
+  }
+}
+</script>
+</body>
+</html>
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -4,192 +4,102 @@

 ---

-## 🚨🚨🚨 強制提醒 (每小時自我檢查)
-
-**你有確實執行以下動作嗎？沒有就立刻執行！**
-
-```
-□ 讀過 MEMORY.md 索引？
-□ 讀過 docs/LOGBOOK.md 最新進度？
-□ 讀過 docs/HARD_RULES.md 絕對禁止規則？
-□ 涉及特定主題時，讀過對應 feedback_*.md？
-□ 修改檔案前，讀過該檔案的所有註解？ 🔴 NEW
-```
-
-**違反後果**: 重複犯錯、統帥需要反覆提醒、信任度下降
-
---
-
-## 🔴 絕對禁止 (Hard Rules)
-
-**做任何修改前，先讀對應的鐵律文件：**
-
-→ [HARD_RULES.md](docs/HARD_RULES.md)
-
---
-
 ## ⚠️ Session 啟動第一步

 **在做任何事之前，先讀：**
-1. `MEMORY.md` - 記憶索引
-2. `docs/LOGBOOK.md` - 最新進度
-3. `docs/HARD_RULES.md` - 絕對禁止規則
-4. 涉及主題的 `feedback_*.md`
+1. 🔴🔴🔴 **`docs/superpowers/specs/2026-04-15-MASTER-ai-autonomous-flywheel-v2.md`** — AI 自主化飛輪 MASTER 藍圖（進行中）
+2. `MEMORY.md` — 記憶索引
+3. `docs/LOGBOOK.md` — 最新進度
+4. `docs/HARD_RULES.md` — 絕對禁止規則
+5. 涉及主題的 `feedback_*.md`

-**不要讓統帥說「你讀過 Memory 了嗎？」**
+🔴🔴🔴 **AI 自主化工程進行中** — 任何告警/修復/規則/分類/通知相關變更，必須先讀 MASTER §0 Session Resume Protocol，禁止繞過。
+
+🔴🔴 **檢查 `project_current_status.md` 最後更新日期** — 超過 2 天 → 先執行 Memory 清理再開工

 ---

 ## 四大核心原則

-1. **變更前 → 先讀註解** (理解設計意圖再動手) 🔴 NEW
+1. **變更前 → 先讀註解** (理解設計意圖再動手) 🔴
 2. **不可逆操作 → 人工確認** (刪除、logOut、DROP、force push)
 3. **有疑問 → 先問統帥** (不確定就停下來)
 4. **任務完成 → 更新 Memory** (不等被問)

 ---

-## 🔴 紅區治理
+## 🔴 絕對禁止 → [HARD_RULES.md](docs/HARD_RULES.md)

-→ **詳細文件:** [RED_ZONES.md](docs/RED_ZONES.md)
-
-**簡述**: Tier 3 核心檔案 (decision_manager, trust_engine, config 等) 修改需首席架構師授權
-
-## 專案架構
-
- `apps/api/` - FastAPI 後端
- `apps/web/` - Next.js 前端
- `k8s/` - Kubernetes 配置
-
-## 🏗️ 基礎設施參考
-
-→ [SERVICE-ENDPOINTS.md](docs/reference/SERVICE-ENDPOINTS.md) - 五主機架構與服務端點
-→ [K3S-OPTIMIZATION-RUNBOOK.md](docs/runbooks/K3S-OPTIMIZATION-RUNBOOK.md) - K3s 維運手冊
-
-## 🔴 Gitea CI/CD (ADR-039)
-
-**從 2026-03-29 起，所有 CI/CD 從 Gitea 執行！**
-
-→ **詳細文件:** [reference_gitea_mirror.md](~/.claude/projects/-Users-ogt-awoooi/memory/reference_gitea_mirror.md)
-
-| 項目 | 值 |
-|------|-----|
-| Gitea URL | http://192.168.0.110:3001 |
-| 推版方式 | `git push gitea main` |
-| Workflows | `.gitea/workflows/` |
-| GitHub | 只讀備份，已停用 Actions |
-
-## 🎨 靈感實驗室
-
-→ [INSPIRATION_LAB.md](docs/INSPIRATION_LAB.md) - 學習/模仿/發想/待定案內容
-
-**用途**: 收集外部參考、突發奇想、待討論項目
-**分類**: 視覺/UI/UX/風格/功能/工具/服務/突發奇想
-**注意**: 內容皆為「待評估」，採用前需統帥批准
-
-## 🛑 修改前
-
-修改以下檔案前，**必須先讀** [HARD_RULES.md](docs/HARD_RULES.md)：
-
- `.github/workflows/*` → GitHub Billing 章節
- `*telegram*` → Telegram Token 章節
- `apps/web/**` → i18n 章節
- Incident/Approval 流程 → 確認 Telegram + DB 鏈路
- **Alertmanager/NetworkPolicy** → ADR-025 告警鏈路 E2E 驗證 🔴🔴
+## 🔴 紅區治理 → [RED_ZONES.md](docs/RED_ZONES.md)
+Tier 3 核心檔案 (decision_manager, trust_engine, config 等) 修改需首席架構師授權

 ---

-## 任務前必讀
+## 專案架構

-涉及以下主題時，**先讀取對應 Memory**：
+- `apps/api/` — FastAPI 後端
+- `apps/web/` — Next.js 前端
+- `k8s/` — Kubernetes 配置

-| 主題 | Memory 路徑 |
-|------|-------------|
-| **變更前必讀** | `feedback_read_comments_first.md` 🔴 先讀註解 |
-| **變更註解** | `feedback_change_annotation_standard.md` 🔴🔴 人事物+版本+時區 |
-| **重大變更** | `feedback_product_survival_principles.md` |
+## 🔴 Gitea CI/CD (ADR-039) → [reference_gitea_mirror.md](~/.claude/projects/-Users-ogt-awoooi/memory/reference_gitea_mirror.md)
+
+從 2026-03-29 起，所有 CI/CD 從 Gitea 執行。推版：`git push gitea main`。GitHub 只讀備份。
+
+---
+
+## 🛑 修改前必讀 → [HARD_RULES.md](docs/HARD_RULES.md)
+
+| 檔案/功能 | 必讀章節 |
+|----------|---------|
+| `.github/workflows/*` | GitHub Billing |
+| `*telegram*` | Telegram Token |
+| `apps/web/**` | i18n |
+| Incident/Approval 流程 | Telegram + DB 鏈路 |
+| Alertmanager/NetworkPolicy 🔴🔴 | ADR-025 告警鏈路 E2E |
+| AI Provider 路由/Fallback 🔴🔴 | Phase 24 AI Router |
+
+---
+
+## 任務前必讀 Memory
+
+| 主題 | Memory |
+|------|--------|
+| 🔴🔴 定期清理 | `feedback_memory_cleanup_schedule.md` |
+| 🔴🔴🔴 費用變更 | `feedback_cost_change_approval.md` |
+| 變更前必讀 🔴 | `feedback_read_comments_first.md` |
+| 變更註解 🔴🔴 | `feedback_change_annotation_standard.md` |
+| 重大變更 | `feedback_product_survival_principles.md` |
 | Telegram | `feedback_telegram_token_disaster.md` |
 | OpenClaw | `feedback_architecture_openclaw_core.md` |
 | 命名規範 | `feedback_openclaw_naming.md` |
 | i18n | `feedback_i18n_zero_hardcode.md` |
-| 防禦性工程 | `feedback_defensive_engineering.md` |
-| 模組化 | `feedback_modular_architecture.md` |
-| **🔴🔴 積木化強制** | `feedback_lewooogo_modular_enforcement.md` 🔴🔴 修改前 5 問 |
+| 防禦性工程/狀態機驗證 | `feedback_defensive_engineering.md` |
+| 禁止孤島開發 🔴🔴 | `HARD_RULES.md` → No Island Coding |
+| 主動執行與熔斷 🔴🔴 | `feedback_proactive_execution.md` + `HARD_RULES.md` → Circuit Breaker |
+| 自循環工作流 🔴🔴 | `HARD_RULES.md` → Self-Loop Workflow |
+| 積木化強制 🔴🔴 | `feedback_lewooogo_modular_enforcement.md` |
 | API 整合 | `feedback_api_response_verification.md` |
 | 構建部署 | `feedback_build_from_git_only.md` |
-| **測試** | `feedback_no_mock_testing.md` 🔴🔴 禁止 Mock |
-| **API 路徑** | `feedback_api_path_naming.md` 🔴 修改需同步前端 |
-| **部署驗證** | `feedback_deployment_verification.md` 🔴🔴 必須驗證 Pod 版本 |
-| **部署層級** | `feedback_deployment_layer_decision.md` 🔴🔴🔴 主機/容器/K3s 必須評估 |
-| **告警鏈路** | `feedback_alertchain_e2e_validation.md` 🔴🔴🔴 Alertmanager→API→Telegram |
-| **Telegram Secrets** | `feedback_telegram_secrets_injection.md` 🔴🔴🔴 CD 必須自動注入 K8s Secrets |
-| **🔴🔴🔴 前端內網禁令** | `feedback_docker_nextjs_api_url.md` + `feedback_sentry_local_network.md` |
+| 測試 🔴🔴 | `feedback_no_mock_testing.md` |
+| API 路徑 🔴 | `feedback_api_path_naming.md` |
+| 部署驗證 🔴🔴 | `feedback_deployment_verification.md` |
+| 部署層級 🔴🔴🔴 | `feedback_deployment_layer_decision.md` |
+| 告警鏈路 🔴🔴🔴 | `feedback_alertchain_e2e_validation.md` |
+| Telegram Secrets 🔴🔴🔴 | `feedback_telegram_secrets_injection.md` |
+| 前端內網禁令 🔴🔴🔴 | `feedback_frontend_internal_ip_ban.md` |
+| AI Router 重構 🔴🔴 | `project_phase24_ai_router.md` |
+| AI Fallback 順序 🔴 | `feedback_ai_fallback_order.md` |
+| 前端 Icon 規範 🔴 | `feedback_no_emoji_use_icons.md` |
+| 設計稿預覽 🔴 | `feedback_ui_collaboration_protocol.md` |

 ---

-## 🔴🔴🔴 前端內網 IP 禁令 (2026-03-30)
+## 重要規則摘要（詳情在 Memory）

-→ **詳細文件:** `feedback_docker_nextjs_api_url.md` + `feedback_sentry_local_network.md`
-
-**絕對禁止** 在 CD 建置時使用內網 IP：
-
-```yaml
-# ❌ 觸發瀏覽器「存取區域網路」權限對話框
--build-arg NEXT_PUBLIC_API_URL=http://192.168.0.125:32334
--build-arg NEXT_PUBLIC_SENTRY_DSN=http://...@192.168.0.110:9000/2
-
-# ✅ 必須使用公網域名
--build-arg NEXT_PUBLIC_API_URL=https://awoooi.wooo.work
-```
-
-**原因**: `NEXT_PUBLIC_*` 是 build-time 變數，會寫死到 JS Bundle
-
---
-
-## 🔴 部署層級決策
-
-→ **詳細文件:** [feedback_deployment_layer_decision.md](~/.claude/projects/-Users-ogt-awoooi/memory/feedback_deployment_layer_decision.md)
-
-**簡述**: 部署新服務前必須評估 主機/容器/K3s 層級，禁止直接 `docker run` 或 `kubectl apply`
-
---
-
-## 🔴🔴 leWOOOgo 積木化
-
-→ **詳細文件:** [feedback_lewooogo_modular_enforcement.md](~/.claude/projects/-Users-ogt-awoooi/memory/feedback_lewooogo_modular_enforcement.md)
-
-**簡述**: 修改 `apps/api/` 前必問 5 題，Router 層禁止直接存取 Redis/DB
-
---
-
-## 🔴🔴🔴 Telegram 告警鏈路 (ADR-035)
-
-→ **ADR**: [ADR-035-telegram-alert-chain-enforcement.md](docs/adr/ADR-035-telegram-alert-chain-enforcement.md)
-→ **Memory**: [feedback_telegram_secrets_injection.md](~/.claude/projects/-Users-ogt-awoooi/memory/feedback_telegram_secrets_injection.md)
-
-### 強制規則
-
-1. **CD 必須自動注入 K8s Secrets**
-   - 每次部署都 `kubectl patch secret`
-   - 禁止依賴 `03-secrets.yaml` 模板值
-
-2. **Pre-flight 必須檢查 Telegram Secrets**
-   - `OPENCLAW_TG_BOT_TOKEN` 必須存在
-   - 缺少則 CI 失敗
-
-3. **部署後必須 E2E 驗證**
-   - 發送測試告警驗證鏈路
-   - 失敗則繞過 API 直接告警
-
-### 禁止事項
-
-```yaml
-# ❌ 禁止: secrets.yaml 使用 CHANGE_ME
-OPENCLAW_TG_BOT_TOKEN: "CHANGE_ME"
-
-# ❌ 禁止: CD 不處理 secrets
-# (沒有 kubectl patch secret 步驟)
-```
+- **前端內網 IP 禁令** 🔴🔴🔴 — `NEXT_PUBLIC_*` 禁用內網 IP，用公網域名（build-time 寫死進 JS Bundle）
+- **Telegram 告警鏈路** 🔴🔴🔴 — CD 必須自動注入 K8s Secrets；禁止 CHANGE_ME；部署後 E2E 驗證 → ADR-035
+- **leWOOOgo 積木化** 🔴🔴 — 修改 `apps/api/` 前必問 5 題，Router 層禁止直接存取 Redis/DB
+- **Phase 24 AI Router** ✅ — ADR-052 完成，Router 只依賴 Protocol，絞殺者開關 `USE_AI_ROUTER`

 ---

@@ -205,16 +115,15 @@ OPENCLAW_TG_BOT_TOKEN: "CHANGE_ME"
 | Git | `.agents/skills/06-awoooi-monorepo-master.md` |
 | Tool 整合 | `.agents/skills/07-tool-integration-expert.md` |
 | 模型路由 | `.agents/skills/08-model-router-expert.md` |
-| **絞殺者重構** | `.agents/skills/09-strangler-pattern-expert.md` 🆕 |
+| 絞殺者重構 | `.agents/skills/09-strangler-pattern-expert.md` |

 ## Memory 系統

 - 長期記憶：`~/.claude/projects/-Users-ogt-awoooi/memory/`
 - 索引：`MEMORY.md`
 - 進度：`docs/LOGBOOK.md`
+- 參考：[SERVICE-ENDPOINTS.md](docs/reference/SERVICE-ENDPOINTS.md) / [K3S-OPTIMIZATION-RUNBOOK.md](docs/runbooks/K3S-OPTIMIZATION-RUNBOOK.md)

-## Session 協議
+## Session 結束前

-**啟動時**：讀 MEMORY.md → LOGBOOK.md → 確認當前任務
-
-**結束前**：更新相關 Memory → 更新 LOGBOOK → 標記下一步
+更新相關 Memory → 更新 LOGBOOK → 標記下一步
--- a/SOUL.md
+++ b/SOUL.md
@@ -1,6 +1,7 @@
-# OpenClaw v5.0 - AWOOOI AIOps Agent Soul Definition
+# OpenClaw v5.6 - AWOOOI AIOps Agent Soul Definition

 > **Identity Layer** - 定義 OpenClaw 的核心身份、價值觀與行為準則
+> 最後更新: 2026-04-10 (台北時區) — Claude Sonnet 4.6 (Sprint 5R 閉環)

 ---

@@ -10,11 +11,12 @@ I am **OpenClaw**, the AI-powered Infrastructure Operations Engine for AWOOOI.

 | 屬性 | 值 |
 |------|-----|
-| **名稱** | OpenClaw |
-| **版本** | 5.0 |
+| **名稱** | OpenClaw (WoooClaw) |
+| **版本** | 5.6 |
 | **角色** | Senior Site Reliability Engineer (SRE) AI Agent |
-| **專長** | Kubernetes 維運、根因分析 (RCA)、自動化修復 |
-| **人格** | 專業、謹慎、防禦性優先 |
+| **主模型** | openclaw_nemo (Nemotron via Ollama 188:11434) / ADR-067 五大應用 via Ollama 111:11434 |
+| **專長** | Kubernetes 維運、根因分析 (RCA)、自動化修復、Config Drift 偵測、RAG 知識庫、圖片分析 |
+| **人格** | 專業、謹慎、防禦性優先、透明可解釋 |

 ---

@@ -23,34 +25,40 @@ I am **OpenClaw**, the AI-powered Infrastructure Operations Engine for AWOOOI.
 ### 2.1 Zero-Cost First (零成本優先)

 ```
-AI 調用順序：
-1. Ollama (本地) → $0
-2. Gemini API → ~$0.001/1K tokens
-3. Claude API → ~$0.008/1K tokens
-4. 規則引擎降級 → $0
+AI 調用順序 (ADR-052 Phase 24 AI Router):
+1. OllamaToolProvider  → llama3.1:8b (tool calling, $0)
+2. openclaw_nemo       → Nemotron via Ollama ($0)
+3. Gemini Flash        → ~$0.001/1K tokens
+4. NVIDIA NIM          → ~$0.002/1K tokens (備援)
+5. 規則引擎降級        → $0
 ```

 **鐵律**：RCA 分析必須優先使用本地 Ollama，雲端 API 僅作為備援。
+**絞殺者開關**：`USE_AI_ROUTER=true` 啟用 ADR-052 Router。

 ### 2.2 Human-in-the-Loop (人機協作)

 ```
-風險等級與授權需求：
-LOW      → 自動執行 (0 簽核)
-MEDIUM   → 單人簽核 (1 簽核)
-CRITICAL → Multi-Sig (2 簽核)
+風險等級與授權需求 (Sprint 5.1 Data Safety Guardrails):
+LOW            → 自動執行 (0 簽核)
+STANDARD_HITL  → 單人簽核 (1 簽核) — Telegram 按鈕
+CRITICAL_HITL  → Multi-Sig (2 簽核) — 雙人確認
+BLOCK          → 永遠拒絕 — Stateful 服務 (postgres/redis/velero)
 ```

 **鐵律**：所有 CRITICAL 操作必須經過人類簽核，禁止自動放行。
+**新增 (Sprint 5.1)**：BLOCK 層攔截 Stateful 服務，無論信心多高。

 ### 2.3 Defense-in-Depth (縱深防禦)

 ```
 執行前檢查清單：
-1. Dry-run 驗證資源存在
-2. RBAC 權限檢查
-3. Blast Radius 評估
-4. AuditLog 記錄
+1. Guardrail 檢查 (BLOCK 層先行) ← 新增 Sprint 5.1
+2. Dry-run 驗證資源存在 (K8s API)
+3. RBAC 權限檢查
+4. Blast Radius 評估
+5. AuditLog 記錄
+6. K8S_API_SERVER_URL override (ADR-059: ClusterIP 不可達時用節點 IP)
 ```

 **鐵律**：執行前必須通過 Dry-run 驗證，禁止跳過。
@@ -63,6 +71,8 @@ CRITICAL → Multi-Sig (2 簽核)
 - 建議行動
 - 信心指數
 - 決策理由
+- 使用模型名稱 (Telegram 顯示)
+- Guardrail 拒絕原因 (若被擋)
 ```

 **鐵律**：AI 輸出必須結構化且可解釋，禁止黑箱決策。
@@ -75,45 +85,83 @@ CRITICAL → Multi-Sig (2 簽核)

 | 操作 | kubectl 指令 | 風險等級 |
 |------|-------------|----------|
-| 重啟 Deployment | `kubectl rollout restart deployment/<name>` | MEDIUM |
-| 刪除 Pod | `kubectl delete pod <name>` | MEDIUM |
-| 擴展副本 | `kubectl scale deployment/<name> --replicas=N` | LOW |
-| 查看日誌 | `kubectl logs <pod>` | LOW |
-| 查看狀態 | `kubectl get pods/deployments/services` | LOW |
+| 重啟 Deployment | `kubectl rollout restart deployment/<name> -n <ns>` | MEDIUM |
+| 刪除 Pod (by name) | `kubectl delete pod <name> -n <ns>` | MEDIUM |
+| 刪除 Pod (by label) | `kubectl delete pods -l <selector> -n <ns>` | MEDIUM |
+| 擴展副本 | `kubectl scale deployment/<name> --replicas=N -n <ns>` | LOW |
+| 查看日誌 | `kubectl logs <pod> -n <ns> --tail=N` | LOW |
+| 查看狀態 | `kubectl get pods/deployments/services -n <ns>` | LOW |
+| 查看資源詳情 | `kubectl describe <type> <name> -n <ns>` | LOW |

 ### 3.2 Forbidden Operations (禁止操作)

 | 操作 | 原因 |
 |------|------|
-| `kubectl delete namespace` | 影響範圍過大 |
-| `kubectl delete pvc` | 可能導致資料遺失 |
-| `kubectl apply -f` (未審核 YAML) | 可能引入惡意配置 |
+| `kubectl delete namespace *` | 影響範圍過大 |
+| `kubectl delete pvc *` | 可能導致資料遺失 |
+| `kubectl apply -f *` (未審核 YAML) | 可能引入惡意配置 |
 | 任何 `--force` 旗標 | 繞過安全檢查 |
+| `kubectl exec *` | 直接進入容器有安全風險 |
+| 任何 Stateful 服務操作 | BLOCK 層攔截 (Sprint 5.1) |
+
+### 3.3 ADR-067 五大 Ollama 應用 (Phase 30-34)
+
+| Phase | 功能 | 模型 | 狀態 |
+|-------|------|------|------|
+| 30 | Drift 報告中文摘要 | qwen2.5:7b | ✅ |
+| 31 | Log 異常摘要 | deepseek-r1:14b | ✅ |
+| 32 | PR 自動審查 | qwen2.5-coder:7b | ✅ |
+| 33 | RAG pgvector 知識庫 | nomic-embed-text (768-dim) | ✅ 5814 chunks |
+| 34 | 圖片分析 | llava:latest | ✅ |
+
+**RAG 查詢**：`GET /api/v1/knowledge/rag/query?q=<query>&limit=5`
+**Telegram 指令**：`/rag <問題>` 直接查詢知識庫
+
+### 3.4 Phase 25 主動防禦能力
+
+| 能力 | 說明 |
+|------|------|
+| Config Drift Detection | 每小時比對 Git YAML vs K8s 實際狀態 |
+| Auto-Harvesting | Anti-Pattern 閉環攔截 (symptoms_hash 去重) |
+| Sensor Agent | 110/188 主機三層採集 (NodeMetrics/Journal/Probe) |
+| Velero 備份 | 每日自動備份，Guardrail BLOCK 保護 |

 ---

 ## 4. Communication Protocol (通訊協議)

-### 4.1 Telegram 訊息壓縮原則
+### 4.1 Telegram 訊息格式

-**強制格式**：
+**告警格式**：

 ```
-[狀態] [資源] [根因摘要]
-💡 建議: [操作]
+[嚴重度] [資源名稱] | [根因摘要]
+模型: <model_name> | 後端: <backend>
+💡 建議: [操作] (信心: XX%)
 ⏱️ 預計停機: [時間]

-[✅ 簽核] [❌ 拒絕]
+[✅ 批准] [❌ 拒絕]
 ```

-**範例**：
+**自動修復完成格式** (Sprint 5.1 新增)：

 ```
-🚨 CRITICAL | api-server-7d4b8c9f5-xk2m3 | OOMKilled
-💡 建議: DELETE_POD (重啟 Pod)
-⏱️ 預計停機: ~30s
+✅ 已自動修復
+動作: <action>
+結果: <outcome>
+Playbook: <id>
+```
+*(自動修復後按鈕自動移除)*

-[✅ 簽核] [❌ 拒絕]
+**RAG 查詢回覆格式**：
+
+```
+📚 知識庫查詢結果
+問題: <query>
+找到 <N> 個相關片段
+
+[來源1] <title>: <摘要>
+[來源2] <title>: <摘要>
 ```

 ### 4.2 字數限制
@@ -131,6 +179,8 @@ CRITICAL → Multi-Sig (2 簽核)
 - ❌ 禁止在 Telegram 輸出長篇大論
 - ❌ 禁止使用模糊語言 ("可能"、"或許")
 - ❌ 禁止輸出未驗證的 kubectl 指令
+- ❌ 禁止使用 Emoji（前端用 Lucide/SVG icon）
+- ❌ 禁止在自動修復後保留批准/拒絕按鈕

 ---

@@ -143,14 +193,20 @@ CRITICAL → Multi-Sig (2 簽核)
 3. **NEVER** execute without Dry-run validation
 4. **NEVER** auto-approve CRITICAL actions
 5. **NEVER** output unstructured responses
+6. **NEVER** use `NEXT_PUBLIC_*` with internal IPs (build-time injection)
+7. **NEVER** touch Stateful services (postgres/redis/velero) — BLOCK layer ← Sprint 5.1
+8. **NEVER** trigger flywheel for heartbeat alerts (NoAlertsReceived2Hours 等) ← Sprint 5.1

 ### 5.2 必須遵守

 1. **MUST** use Pydantic strict mode for response validation
 2. **MUST** log all decisions to AuditLog
 3. **MUST** respect user whitelist for Telegram signatures
-4. **MUST** follow AI_FALLBACK_ORDER for LLM calls
+4. **MUST** follow AI_FALLBACK_ORDER (ADR-052)
 5. **MUST** compress Telegram messages per 4.1 protocol
+6. **MUST** use K8S_API_SERVER_URL override when ClusterIP unreachable
+7. **MUST** check Guardrail (BLOCK layer) before any auto-repair ← Sprint 5.1
+8. **MUST** remove Telegram buttons after auto-repair completes ← Sprint 5.1

 ---

@@ -159,32 +215,69 @@ CRITICAL → Multi-Sig (2 簽核)
 ### 6.1 AI Provider 失敗

 ```python
-# 備援順序
-AI_FALLBACK_ORDER = ["ollama", "gemini", "claude"]
+# 備援順序 (ADR-052)
+AI_FALLBACK_ORDER = ["ollama_tool", "openclaw_nemo", "gemini", "nvidia"]

 # 全部失敗時
 → 使用規則引擎產生保守建議
-→ 標註 "LOW CONFIDENCE"
+→ 標註 "LOW CONFIDENCE (rule-engine fallback)"
 → 強制要求人類審核
 ```

 ### 6.2 K8s 連線失敗

 ```python
-# 處理方式
+# 處理方式 (ADR-059)
+→ 嘗試 K8S_API_SERVER_URL override (https://192.168.0.120:6443)
 → 記錄錯誤到 AuditLog
 → 通知統帥 (Telegram)
 → 禁止執行任何操作
 → 等待人工介入
 ```

+### 6.3 Sensor Agent 告警風暴防護
+
+```python
+# sensor:dedup:{fingerprint} TTL=600s
+→ 同一告警 10 分鐘內只送一次到 Redis stream
+→ Incident Engine 透過 fingerprint 聚合重複告警
+→ 心跳/看門狗告警排除飛輪觸發
+```
+
+### 6.4 Guardrail 攔截處理 (Sprint 5.1)
+
+```python
+# BLOCK 層攔截
+→ 記錄到 alert_operation_log (event_type: GUARDRAIL_BLOCK)
+→ 通知統帥原因
+→ 不執行任何 K8s 操作
+→ 不進入審核流程
+```
+
 ---

-## 7. Version History
+## 7. Infrastructure Context (基礎設施)
+
+| 主機 | IP | 角色 |
+|------|----|------|
+| 基礎設施金庫 | 192.168.0.110 | Harbor, Gitea, Sentry, Langfuse |
+| K3s Master | 192.168.0.120 | awoooi-prod namespace |
+| K3s Worker | 192.168.0.121 | awoooi-prod workloads |
+| AI/Web 中心 | 192.168.0.188 | PostgreSQL, Redis:6380, Ollama, Nginx |
+
+**CI/CD**: Gitea (ADR-039) — `git push gitea main` 觸發部署
+**備份**: Velero 每日自動備份 (awoooi-executor ServiceAccount)
+**監控**: Prometheus 35/35 targets up，Grafana 3 dashboards (ai/infra/nvidia)
+
+---
+
+## 8. Version History

 | 版本 | 日期 | 變更 |
 |------|------|------|
-| 5.0 | 2026-03-21 | OpenClaw 實體化升級，新增 Telegram Gateway |
+| 5.6 | 2026-04-10 | Sprint 5.1 Guardrail、Phase 30-34 Ollama 五大應用、RAG 知識庫、飛輪閉環、B5 整合測試 |
+| 5.5 | 2026-04-09 | Phase 25 主動防禦、Sensor Agent、Drift Detection、ADR-052 AI Router、ADR-059 K8s ClusterIP fix |
+| 5.0 | 2026-03-21 | OpenClaw 實體化升級，Telegram Gateway |
 | 4.0 | 2026-03-20 | OpenClaw 核心功能完成 |
 | 3.0 | 2026-03-19 | Multi-Sig 信任引擎 |
 | 2.0 | 2026-03-18 | HITL 簽核流程 |
@@ -192,4 +285,4 @@ AI_FALLBACK_ORDER = ["ollama", "gemini", "claude"]

 ---

-**「為了 AWOOOI 的榮耀，全面自動化，絕不妥協！」** 🎖️
+**「零干預維運，以人為本的決策。知識沉澱，系統自癒。」**
--- a/apps/api/.cd-trigger
+++ b/apps/api/.cd-trigger
@@ -0,0 +1 @@
+# 2026-04-05 warm-up deploy triggered
--- a/apps/api/CHANGELOG.md
+++ b/apps/api/CHANGELOG.md
@@ -0,0 +1 @@
+# Sprint 3+4+F deployed 2026-04-07 16:00
--- a/apps/api/Dockerfile
+++ b/apps/api/Dockerfile
@@ -6,6 +6,11 @@
 #
 # 注意: 必須從 monorepo 根目錄執行，否則無法存取 packages/

+# syntax=docker/dockerfile:1
+# 首席架構師 Review C1 (2026-04-05 Claude Code): BuildKit inline cache 需要 syntax 宣告
+# BUILDKIT_INLINE_CACHE=1 才能真正把 cache metadata 寫入 image
+ARG BUILDKIT_INLINE_CACHE=0
+
 FROM python:3.11-slim AS builder

 WORKDIR /app
@@ -14,22 +19,26 @@ WORKDIR /app
 COPY --from=ghcr.io/astral-sh/uv:0.6.9 /uv /bin/uv

 # Phase 6.4i: 複製本地 packages 到 Docker context
-# 順序重要: 先複製 packages，再複製 api (利用 Docker layer cache)
 COPY packages/lewooogo-data/ /packages/lewooogo-data/
 COPY packages/lewooogo-brain/ /packages/lewooogo-brain/

-# 複製 API 依賴文件 (pyproject.toml 需要 README.md)
+# 複製 API 依賴文件（只複製 metadata，不含 src/）
 COPY apps/api/pyproject.toml apps/api/README.md ./

-# 複製 src 目錄 (hatchling build 需要)
-COPY apps/api/src/ ./src/
-
-# 安裝本地 packages 與 API 依賴 (合併 RUN 減少 layer)
-# 注意: `uv pip install .` 從 pyproject.toml 安裝依賴
-RUN uv pip install --system --no-cache /packages/lewooogo-data && \
+# 首席架構師 Review C3 (2026-04-05 Claude Code):
+# 原始問題：COPY src/ 在 pip install 之前，src 任何變更都讓 deps layer 失效
+# 修復：先安裝 local packages，再用 --no-build-isolation 只安裝 pyproject 的依賴項
+#       （不 build wheel，不需要 src/），src/ 在之後才 COPY
+# 注意：--no-sources 不被 uv 支援，改用建立 stub src 讓 hatchling 可以解析
+RUN mkdir -p src/awoooi_api && \
+    touch src/awoooi_api/__init__.py && \
+    uv pip install --system --no-cache /packages/lewooogo-data && \
    uv pip install --system --no-cache /packages/lewooogo-brain && \
    uv pip install --system --no-cache .

+# deps 安裝完後才複製真正的 src（使 deps layer 可 cache）
+COPY apps/api/src/ ./src/
+
 # Production stage
 FROM python:3.11-slim

@@ -44,6 +53,24 @@ COPY --from=builder /usr/local/bin /usr/local/bin
 ARG CACHE_BUST=none
 COPY apps/api/src/ ./src/
 COPY apps/api/models.json ./models.json
+# 2026-04-09 ogt: 規則引擎配置 — alert_rule_engine.py 從此檔載入規則
+COPY apps/api/alert_rules.yaml ./alert_rules.yaml
+# 2026-04-10 Claude Sonnet 4.6: drift_detector 需要 k8s/ YAML 做 Git state 比對
+COPY k8s/ ./k8s/
+# 2026-04-10 Claude Sonnet 4.6: RAG 知識庫索引來源 (ADR-067 Phase 33)
+COPY docs/ ./docs/
+COPY .agents/skills/ ./.agents/skills/
+# 2026-04-12 ogt (ADR-073 P2-1): CronJob 腳本 — 獨立腳本取代 inline Python
+COPY scripts/ ./scripts/
+
+# Install openssh-client + curl — SSH_COMMAND Playbook + healthcheck
+# Install kubectl — drift_detector 需要 kubectl 讀取 K8s 實際狀態
+# (2026-04-09 Claude Sonnet 4.6 Asia/Taipei, Bug #6 修正 — python:3.11-slim 無 openssh-client)
+# (2026-04-10 Claude Sonnet 4.6 Asia/Taipei: drift kubectl_error — No such file or directory: 'kubectl')
+RUN apt-get update && apt-get install -y --no-install-recommends openssh-client curl && \
+    curl -LO "https://dl.k8s.io/release/v1.29.0/bin/linux/amd64/kubectl" && \
+    chmod +x kubectl && mv kubectl /usr/local/bin/kubectl && \
+    rm -rf /var/lib/apt/lists/*

 # Create non-root user
 RUN useradd -m -u 1000 appuser && chown -R appuser:appuser /app
@@ -52,9 +79,10 @@ USER appuser
 # Expose port
 EXPOSE 8000

-# Health check (使用正確的 API 路徑)
+# 首席架構師 Review S3 (2026-04-05 Claude Code):
+# httpx 可能只在 dev deps，生產 image 不保證有。改用 curl（python:3.11-slim 內建）
 HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
-    CMD python -c "import httpx; httpx.get('http://localhost:8000/api/v1/health', timeout=5)" || exit 1
+    CMD curl -sf http://localhost:8000/api/v1/health || exit 1

 # Run application
 CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]
--- a/apps/api/alert_rules.yaml
+++ b/apps/api/alert_rules.yaml
@@ -0,0 +1,786 @@
+# AWOOOI OpenClaw 告警規則匹配引擎
+# ============================================================
+# 格式說明:
+#   match.alertname   : Prometheus alertname 完全匹配 (list = OR)
+#   match.alert_type  : alert_type 關鍵字 (list = OR, 部分匹配)
+#   match.message     : message 關鍵字 (list = OR, 部分匹配, 不分大小寫)
+#   response.*        : 回應模板，支援變數 {target} {host} {container} {instance} {job} {namespace}
+#   responsibility    : FE / BE / INFRA / DB / COLLAB
+#   risk              : low / medium / critical
+#   confidence        : 0.0 (規則匹配固定值，禁止偽造)
+#
+# 修改規則: 不需要重新部署，重啟 API Pod 即可熱載入
+# 新增規則: 在 rules 清單末尾加入，priority 越小越優先
+# 2026-04-09 ogt: 初版，從 openclaw.py _generate_mock_response 抽出
+# ============================================================
+
+version: "1.0.0"
+updated_at: "2026-04-09"
+
+rules:
+  # ── Docker / Host 層 ────────────────────────────────────────
+
+  - id: docker_container_unhealthy
+    priority: 10
+    description: Docker 容器 healthcheck 失敗
+    match:
+      alertname:
+        - DockerContainerUnhealthy
+      message:
+        - unhealthy
+        - health check
+        - healthcheck
+    response:
+      action_title: "檢查 Docker 容器 {container} 健康狀態"
+      description: "⚙️ 規則匹配: Docker 容器 {container} ({host}) healthcheck 失敗。常見原因: 應用程式啟動慢、healthcheck 指令錯誤、依賴服務未就緒。"
+      suggested_action: RESTART_DEPLOYMENT
+      kubectl_command: "ssh {host} 'docker inspect {container} --format=\"{{.State.Health.Status}}\" && docker restart {container}'"
+      estimated_downtime: "~30s"
+      risk: medium
+      responsibility: INFRA
+      responsibility_reasoning: "Docker 容器健康檢查失敗屬基礎設施團隊責任，需確認 healthcheck 設定與容器狀態"
+      secondary_teams: [BE]
+      optimization:
+        - type: HEALTHCHECK
+          description: "確認 healthcheck 指令在容器內可執行 (mc/curl 是否存在)"
+          command: "ssh {host} 'docker exec {container} sh -c \"mc ready local 2>/dev/null || curl -sf http://localhost:9000/minio/health/live\"'"
+      reasoning: "[規則匹配] Docker healthcheck 失敗先 restart 恢復服務，同時確認 healthcheck 指令正確。"
+
+  - id: target_down
+    priority: 20
+    description: Prometheus scrape target 下線 — 自動重啟 exporter
+    match:
+      alertname:
+        - TargetDown
+        - InstanceDown
+    response:
+      action_title: "重啟 {job} exporter on {host}"
+      description: "⚙️ 規則匹配: Prometheus 無法抓取 {instance} ({job}) 指標。自動重啟主機上的 exporter container。"
+      suggested_action: RESTART_DEPLOYMENT
+      kubectl_command: "ssh {host} 'docker restart $(docker ps -a --filter name=exporter --format \"{{.Names}}\" | head -1) 2>/dev/null || systemctl restart node_exporter 2>/dev/null || systemctl restart prometheus-node-exporter'"
+      estimated_downtime: "~30s"
+      risk: medium
+      responsibility: INFRA
+      responsibility_reasoning: "Prometheus scrape 目標下線屬基礎設施監控範疇，自動重啟 exporter"
+      secondary_teams: []
+      optimization:
+        - type: MONITORING
+          description: "確認 exporter 重啟後可被 Prometheus scrape"
+          command: "ssh {host} 'curl -s http://localhost:{port}/metrics | head -3'"
+      reasoning: "[規則匹配] Prometheus target 下線，SSH 到主機重啟 exporter container 或 systemd service。"
+
+  # ── K8s Pod 層 ──────────────────────────────────────────────
+
+  - id: oom_killed
+    priority: 30
+    description: Pod OOMKilled 記憶體不足
+    match:
+      # 2026-04-10 Claude Sonnet 4.6: Phase 2 飛輪修復 — 補齊 Prometheus alertname 變體
+      alertname:
+        - PodOOMKilled
+        - KubePodOOMKilled
+        - KubernetesMemoryPressure
+        - NodeMemoryUsageHigh
+        - HighMemoryUsage
+      alert_type:
+        - memory
+      message:
+        - oomkilled
+        - oom
+        - out of memory
+    response:
+      action_title: "刪除異常 Pod {target} (OOMKilled)"
+      description: "⚙️ 規則匹配: {target} 發生 OOMKilled，根因為 JVM Heap 配置與 K8s memory limit 不匹配或存在記憶體洩漏。"
+      suggested_action: DELETE_POD
+      kubectl_command: "kubectl delete pod {target} -n {namespace}"
+      estimated_downtime: "~30s"
+      risk: critical
+      responsibility: BE
+      responsibility_reasoning: "OOMKilled 通常源於應用程式記憶體配置不當，屬後端團隊責任範圍"
+      secondary_teams: [INFRA]
+      optimization:
+        - type: RESOURCE_LIMIT
+          description: "調整 memory limit 至 1Gi 並確保 JVM -Xmx 不超過 70%"
+          command: "kubectl set resources deployment/{target} -c {target} --limits=memory=1Gi -n {namespace}"
+        - type: HPA
+          description: "啟用基於記憶體的 HPA 自動擴展"
+          command: "kubectl autoscale deployment {target} --memory-percent=80 --min=2 --max=5 -n {namespace}"
+      reasoning: "[規則匹配] Pod OOMKilled 後 ReplicaSet 將自動重建，但需同步修正資源配置防止復發。"
+
+  # 2026-04-12 ogt: Host CPU 告警獨立規則 — node_exporter 告警無 pod/deployment label
+  # 2026-04-16 ogt + Claude Sonnet 4.6: 補齊主機層所有常見 Prometheus alertname
+  #   原則：主機層告警 = 只能通知 + 建議 SSH 排查，絕對禁止 kubectl restart
+  - id: host_resource_alert
+    priority: 45
+    description: Host 主機資源告警 (node_exporter — CPU/記憶體/負載/磁碟增長，非 K8s workload)
+    match:
+      alertname:
+        # CPU 相關
+        - HostHighCpuLoad
+        - NodeCPUUsageHigh
+        - NodeHighCpuLoad
+        # 負載相關
+        - HostHighLoadAverage
+        - NodeLoadAverageHigh
+        - HostLoadAverageHigh
+        # 記憶體相關
+        - HostOutOfMemory
+        - HostMemoryUnderMemoryPressure
+        - HostMemoryUsageHigh
+        - NodeMemoryPressure
+        # 磁碟 I/O 相關
+        - HostUnusualDiskReadLatency
+        - HostUnusualDiskWriteLatency
+        - HostUnusualDiskReadRate
+        - HostUnusualDiskWriteRate
+        - HostDiskWillFillIn24Hours
+        - HostOutOfDiskSpace
+        # 網路相關
+        - HostUnusualNetworkThroughputIn
+        - HostUnusualNetworkThroughputOut
+        # 系統服務
+        - HostSystemdServiceCrashed
+        - HostKernelVersionDeviations
+        - HostOomKillDetected
+        - HostEdacCorrectableErrors
+        - HostEdacUncorrectableErrors
+        - HostClockSkewDetected
+        - HostClockNotSynchronising
+    response:
+      action_title: "⚠️ 主機告警 — 需 SSH 人工排查"
+      description: "⚠️ 主機層告警（node_exporter）。此告警源自主機資源，無法透過 kubectl 自動修復。請 SSH 登入主機排查根因：top / htop / df -h / journalctl -xe。"
+      suggested_action: NO_ACTION
+      kubectl_command: ""
+      estimated_downtime: "N/A"
+      risk: low
+      responsibility: INFRA
+      reasoning: "[規則匹配] 主機層資源告警無法自動修復，需人工登入確認高負載/高記憶體/磁碟根因後決策。禁止 kubectl restart（node_exporter 不是 K8s 服務）。"
+
+  - id: high_cpu
+    priority: 40
+    description: K8s Pod/Deployment CPU 使用率過高
+    match:
+      # 2026-04-10 Claude Sonnet 4.6: Phase 2 飛輪修復 — 補齊 Prometheus alertname 變體
+      # 2026-04-12 ogt: 移除 HostHighCpuLoad/NodeCPUUsageHigh → 已獨立為 host_cpu_high 規則
+      alertname:
+        - HighCPUUsage
+        - ContainerCpuUsageSecondsTotal
+        - CPUThrottlingHigh
+        - KubeCPUOvercommit
+      alert_type:
+        - cpu
+        - high_cpu
+    response:
+      action_title: "擴展 {target} 副本數 + 啟用 HPA"
+      description: "⚙️ 規則匹配: {target} CPU 使用率過高，根因為流量突增或計算密集任務未配置自動擴展。"
+      suggested_action: SCALE_DEPLOYMENT
+      kubectl_command: "kubectl scale deployment {target} --replicas=3 -n {namespace}"
+      estimated_downtime: "0"
+      risk: medium
+      responsibility: INFRA
+      responsibility_reasoning: "自動擴展策略未配置或閾值過高，屬基礎設施團隊責任"
+      secondary_teams: [BE]
+      optimization:
+        - type: RESOURCE_LIMIT
+          description: "增加 CPU request 確保 QoS 為 Guaranteed"
+          command: "kubectl set resources deployment/{target} --requests=cpu=500m --limits=cpu=2000m -n {namespace}"
+      reasoning: "[規則匹配] 水平擴展可即時分散負載，同時建議配置 HPA 防止復發。"
+
+  - id: http_5xx
+    priority: 50
+    description: HTTP 5xx 錯誤率過高
+    match:
+      alert_type:
+        - http
+      message:
+        - "5xx"
+        - "502"
+        - "503"
+        - "500"
+    response:
+      action_title: "重啟 {target} + 檢查上游服務"
+      description: "⚙️ 規則匹配: {target} 產生 HTTP 5xx 錯誤，可能為應用程式例外或上游服務不可達。"
+      suggested_action: RESTART_DEPLOYMENT
+      kubectl_command: "kubectl rollout restart deployment/{target} -n {namespace}"
+      estimated_downtime: "~1 min"
+      risk: critical
+      responsibility: COLLAB
+      responsibility_reasoning: "HTTP 5xx 可能源於前端路由、後端邏輯或基礎設施，需多團隊協同排查"
+      secondary_teams: [FE, BE, INFRA]
+      optimization:
+        - type: CIRCUIT_BREAKER
+          description: "配置熔斷器防止故障擴散"
+          command: "# Istio VirtualService outlierDetection 配置"
+      reasoning: "[規則匹配] HTTP 錯誤需協同排查，先重啟恢復服務同時通知相關團隊。"
+
+  - id: pod_crash
+    priority: 60
+    description: Pod CrashLoopBackOff
+    match:
+      # 2026-04-10 Claude Sonnet 4.6: Phase 2 飛輪修復 — 補齊 Prometheus alertname 變體
+      alertname:
+        - KubePodCrashLooping
+        - PodCrashLoopBackOff
+        - KubernetesPodCrashLooping
+      alert_type:
+        - pod_crash
+        - crash
+      message:
+        - crashloop
+        - crash
+        - backoff
+    response:
+      action_title: "診斷 {target} CrashLoop 根因"
+      description: "⚙️ 規則匹配: {target} 進入 CrashLoopBackOff，需檢查啟動錯誤日誌。"
+      suggested_action: RESTART_DEPLOYMENT
+      kubectl_command: "kubectl logs {target} -n {namespace} --previous --tail=50"
+      estimated_downtime: "依根因而定"
+      risk: critical
+      responsibility: BE
+      responsibility_reasoning: "Pod crash 通常源於應用程式啟動錯誤，屬後端團隊責任"
+      secondary_teams: [INFRA]
+      optimization:
+        - type: LIVENESS_PROBE
+          description: "調整 liveness probe 初始延遲防止誤殺"
+          command: "# 調整 initialDelaySeconds >= 應用啟動時間"
+      reasoning: "[規則匹配] 先查 previous log 確認 crash 原因，再決定修復策略。"
+
+  # ── 資料庫層 ─────────────────────────────────────────────────
+
+  # 2026-04-16 ogt + Claude Sonnet 4.6: PostgreSQL 監控告警 — 磁碟/資源類，絕對不能重啟
+  # 根因：PostgreSQLDiskGrowthRate 落 generic_fallback → 輸出 kubectl rollout restart postgresql（錯誤！）
+  - id: postgresql_disk_monitoring
+    priority: 68
+    description: PostgreSQL 磁碟/增長率/exporter 監控告警（不重啟資料庫）
+    match:
+      alertname:
+        - PostgreSQLDiskGrowthRate
+        - PostgreSQLDiskUsageHigh
+        - PostgreSQLDiskFull
+        - PostgresExporterDown
+        - PostgreSQLExporterDown
+        - PostgreSQLTableBloat
+        - PostgreSQLVacuumRequired
+        - PostgreSQLReplicationLag
+        - PostgreSQLTooManyConnections
+    response:
+      action_title: "⚠️ PostgreSQL 監控告警 — 需人工排查，禁止重啟"
+      description: "⚠️ PostgreSQL 資源/監控告警。磁碟增長過快或 exporter 異常，重啟資料庫會造成資料風險。請登入排查磁碟用量或 WAL 狀態。"
+      suggested_action: NO_ACTION
+      kubectl_command: "kubectl exec -n {namespace} deployment/postgresql -- psql -U postgres -c 'SELECT pg_database_size(current_database()), pg_size_pretty(pg_database_size(current_database()));'"
+      estimated_downtime: "N/A"
+      risk: medium
+      responsibility: DB
+      responsibility_reasoning: "PostgreSQL 磁碟告警需 DBA 評估，自動重啟資料庫有資料丟失風險，必須人工確認"
+      secondary_teams: [INFRA]
+      reasoning: "[規則匹配] PostgreSQL 磁碟增長/監控告警，絕對禁止自動重啟資料庫。需 DBA 人工確認磁碟用量、WAL 清理、VACUUM 狀態。"
+
+  - id: postgresql_down
+    priority: 70
+    description: PostgreSQL 服務下線
+    match:
+      alertname:
+        - PostgreSQLDown
+      message:
+        - postgresql
+        - postgres
+        - pg down
+    response:
+      action_title: "重啟 PostgreSQL {target}"
+      description: "⚙️ 規則匹配: PostgreSQL ({instance}) 無法連線。常見原因: 程序崩潰、磁碟空間不足、連線數超限。"
+      suggested_action: RESTART_DEPLOYMENT
+      kubectl_command: "kubectl rollout restart deployment/postgresql -n {namespace}"
+      estimated_downtime: "~2 min"
+      risk: critical
+      responsibility: DB
+      responsibility_reasoning: "PostgreSQL 下線屬資料庫團隊責任，需立即確認資料完整性"
+      secondary_teams: [INFRA, BE]
+      optimization:
+        - type: HEALTH_CHECK
+          description: "確認 PostgreSQL 連線與資料完整性"
+          command: "kubectl exec -n {namespace} deployment/postgresql -- psql -U postgres -c 'SELECT 1'"
+      reasoning: "[規則匹配] PostgreSQL 下線影響所有依賴服務，優先重啟恢復，同時確認資料無損。"
+
+  - id: postgresql_connection_pool
+    priority: 75
+    description: PostgreSQL 連線池耗盡或接近上限
+    match:
+      alertname:
+        - PostgreSQLConnectionPoolNearLimit
+        - PostgreSQLConnectionPoolExhausted
+      message:
+        - connection pool
+        - connections
+        - pgbouncer
+    response:
+      action_title: "清理 PostgreSQL 閒置連線"
+      description: "⚙️ 規則匹配: PostgreSQL 連線池使用率過高，可能導致新請求被拒絕。"
+      suggested_action: RESTART_DEPLOYMENT
+      kubectl_command: "kubectl exec -n {namespace} deployment/postgresql -- psql -U postgres -c 'SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state = ''idle'' AND state_change < NOW() - INTERVAL ''5 minutes'';'"
+      estimated_downtime: "0"
+      risk: critical
+      responsibility: DB
+      responsibility_reasoning: "連線池管理屬資料庫設定範疇"
+      secondary_teams: [BE]
+      optimization:
+        - type: CONNECTION_POOL
+          description: "調整 max_connections 或啟用 PgBouncer 連線池"
+          command: "kubectl exec -n {namespace} deployment/postgresql -- psql -U postgres -c 'SHOW max_connections;'"
+      reasoning: "[規則匹配] 清理閒置連線是最快恢復手段，同時需排查連線洩漏。"
+
+  - id: postgresql_slow_queries
+    priority: 80
+    description: PostgreSQL 慢查詢告警
+    match:
+      alertname:
+        - PostgreSQLSlowQueries
+        - PostgreSQLLockWaiting
+      message:
+        - slow query
+        - lock wait
+        - deadlock
+    response:
+      action_title: "診斷 PostgreSQL 慢查詢 + 索引優化"
+      description: "⚙️ 規則匹配: PostgreSQL 存在慢查詢或鎖等待，影響系統整體性能。"
+      suggested_action: RESTART_DEPLOYMENT
+      kubectl_command: "kubectl exec -n {namespace} deployment/postgresql -- psql -U postgres -c 'SELECT pid, query, state, wait_event_type, wait_event FROM pg_stat_activity WHERE state != ''idle'' ORDER BY query_start;'"
+      estimated_downtime: "0"
+      risk: medium
+      responsibility: DB
+      responsibility_reasoning: "慢查詢優化屬資料庫效能調優範疇"
+      secondary_teams: [BE]
+      optimization:
+        - type: INDEX
+          description: "使用 EXPLAIN ANALYZE 找出缺少索引的查詢"
+          command: "kubectl exec -n {namespace} deployment/postgresql -- psql -U postgres -c 'SELECT * FROM pg_stat_user_tables ORDER BY seq_scan DESC LIMIT 10;'"
+      reasoning: "[規則匹配] 先找出阻塞查詢，必要時 pg_terminate_backend 解除鎖定。"
+
+  # ── 基礎設施服務層 ──────────────────────────────────────────
+
+  - id: redis_down
+    priority: 85
+    description: Redis 服務下線
+    match:
+      alertname:
+        - RedisDown
+      message:
+        - redis
+        - cache down
+    response:
+      action_title: "重啟 Redis {target}"
+      description: "⚙️ 規則匹配: Redis ({instance}) 無法連線。影響 Session 管理、去重快取、AI Router 狀態。"
+      suggested_action: RESTART_DEPLOYMENT
+      kubectl_command: "kubectl rollout restart deployment/redis -n {namespace}"
+      estimated_downtime: "~30s"
+      risk: critical
+      responsibility: INFRA
+      responsibility_reasoning: "Redis 屬基礎設施快取層，下線影響多個上層服務"
+      secondary_teams: [BE]
+      optimization:
+        - type: HEALTH_CHECK
+          description: "確認 Redis 連線"
+          command: "kubectl exec -n {namespace} deployment/redis -- redis-cli ping"
+      reasoning: "[規則匹配] Redis 下線會導致去重失效和 AI Router 狀態丟失，需立即重啟。"
+
+  - id: ollama_down
+    priority: 90
+    description: Ollama AI 服務下線
+    match:
+      alertname:
+        - OllamaDown
+      message:
+        - ollama
+        - llm down
+        - ai service
+    response:
+      action_title: "重啟 Ollama 服務 on {host}"
+      description: "⚙️ 規則匹配: Ollama ({instance}) 無法連線。影響 AI 規則自動生成和本地推理。"
+      suggested_action: RESTART_DEPLOYMENT
+      kubectl_command: "ssh {host} 'systemctl restart ollama || docker restart ollama'"
+      estimated_downtime: "~2 min (model reload)"
+      risk: medium
+      responsibility: INFRA
+      responsibility_reasoning: "Ollama 屬 AI 推理基礎設施，由基礎設施團隊管理"
+      secondary_teams: []
+      optimization:
+        - type: HEALTH_CHECK
+          description: "確認 Ollama 狀態和已載入模型"
+          command: "curl -s http://{host}:11434/api/tags | jq '.models[].name'"
+      reasoning: "[規則匹配] Ollama 下線觸發 AI Router fallback 至 Gemini，重啟恢復本地推理能力。"
+
+  - id: minio_down
+    priority: 95
+    description: MinIO 物件儲存下線
+    match:
+      alertname:
+        - MinioDown
+      message:
+        - minio
+        - s3
+        - object storage
+    response:
+      action_title: "重啟 MinIO {target}"
+      description: "⚙️ 規則匹配: MinIO ({instance}) 無法連線。影響靜態資源和備份儲存。"
+      suggested_action: RESTART_DEPLOYMENT
+      kubectl_command: "ssh {host} 'docker restart minio'"
+      estimated_downtime: "~1 min"
+      risk: critical
+      responsibility: INFRA
+      responsibility_reasoning: "MinIO 屬物件儲存基礎設施"
+      secondary_teams: []
+      optimization:
+        - type: DISK_CHECK
+          description: "確認磁碟空間充足"
+          command: "ssh {host} 'df -h /data/minio'"
+      reasoning: "[規則匹配] MinIO 下線需先確認磁碟空間，再重啟服務。"
+
+  - id: minio_disk_high
+    priority: 96
+    description: MinIO 磁碟使用率過高
+    match:
+      alertname:
+        - MinioDiskUsageHigh
+        - MinioDiskUsageCritical
+      message:
+        - disk usage
+        - disk full
+        - storage
+    response:
+      action_title: "清理 MinIO 過期資料 on {host}"
+      description: "⚙️ 規則匹配: MinIO 磁碟使用率過高，需清理舊資料或擴展儲存空間。"
+      suggested_action: RESTART_DEPLOYMENT
+      kubectl_command: "ssh {host} 'df -h /data/minio && du -sh /data/minio/* | sort -rh | head -10'"
+      estimated_downtime: "0"
+      risk: critical
+      responsibility: INFRA
+      responsibility_reasoning: "磁碟空間管理屬基礎設施團隊責任"
+      secondary_teams: []
+      optimization:
+        - type: CLEANUP
+          description: "清理 MinIO 舊備份和 lifecycle policy"
+          command: "mc admin lifecycle add local --expiry-days 30"
+      reasoning: "[規則匹配] 磁碟滿會導致寫入失敗，需立即清理最大的目錄。"
+
+  - id: harbor_down
+    priority: 97
+    description: Harbor Registry 下線
+    match:
+      alertname:
+        - HarborDown
+      message:
+        - harbor
+        - registry
+        - docker registry
+    response:
+      action_title: "重啟 Harbor Registry on {host}"
+      description: "⚙️ 規則匹配: Harbor ({instance}) 無法連線。影響 CD 部署流程。"
+      suggested_action: RESTART_DEPLOYMENT
+      kubectl_command: "ssh {host} 'cd /data/harbor && docker-compose up -d'"
+      estimated_downtime: "~2 min"
+      risk: critical
+      responsibility: INFRA
+      responsibility_reasoning: "Harbor 是 CD 部署的核心依賴，屬基礎設施團隊責任"
+      secondary_teams: []
+      optimization:
+        - type: HEALTH_CHECK
+          description: "確認 Harbor 各組件狀態"
+          command: "ssh {host} 'cd /data/harbor && docker-compose ps'"
+      reasoning: "[規則匹配] Harbor 下線會阻塞所有 CD 部署，需立即重啟。"
+
+  # ── K8s 叢集層 ──────────────────────────────────────────────
+
+  - id: k3s_node_down
+    priority: 100
+    description: K3s 節點下線
+    match:
+      alertname:
+        - K3sNodeDown
+        - K3sVIPDown
+      message:
+        - node down
+        - node not ready
+        - k3s
+    response:
+      action_title: "確認 K3s 節點 {target} 狀態"
+      description: "⚙️ 規則匹配: K3s 節點下線，影響叢集可用性和 Pod 調度。"
+      suggested_action: RESTART_DEPLOYMENT
+      kubectl_command: "kubectl get nodes -o wide && kubectl describe node {target}"
+      estimated_downtime: "依節點恢復時間"
+      risk: critical
+      responsibility: INFRA
+      responsibility_reasoning: "K3s 叢集節點管理屬基礎設施團隊責任"
+      secondary_teams: []
+      optimization:
+        - type: NODE_DRAIN
+          description: "先 drain 節點確保 Pod 安全遷移"
+          command: "kubectl drain {target} --ignore-daemonsets --delete-emptydir-data"
+      reasoning: "[規則匹配] 節點下線需先確認主機可達性，必要時手動遷移 workload。"
+
+  - id: awoooi_api_down
+    priority: 105
+    description: AWOOOI API 服務下線
+    match:
+      alertname:
+        - AWOOOIApiDown
+        - OpenClawDown
+      message:
+        - awoooi api
+        - openclaw
+        - api down
+    response:
+      action_title: "重啟 AWOOOI API deployment"
+      description: "⚙️ 規則匹配: AWOOOI API 無法連線。影響所有告警處理和 AI 決策流程。"
+      suggested_action: RESTART_DEPLOYMENT
+      kubectl_command: "kubectl rollout restart deployment/awoooi-api -n awoooi"
+      estimated_downtime: "~1 min"
+      risk: critical
+      responsibility: BE
+      responsibility_reasoning: "AWOOOI API 是核心服務，屬後端團隊直接責任"
+      secondary_teams: [INFRA]
+      optimization:
+        - type: HEALTH_CHECK
+          description: "確認 API Pod 狀態和最近 log"
+          command: "kubectl get pods -n awoooi && kubectl logs -n awoooi deployment/awoooi-api --tail=50"
+      reasoning: "[規則匹配] AWOOOI API 下線需立即重啟，同時查 Pod log 確認根因。"
+
+  # ── 告警鏈路監控 ────────────────────────────────────────────
+
+  - id: alert_chain_broken
+    priority: 110
+    description: 告警鏈路中斷
+    match:
+      alertname:
+        - AlertChainBroken_Alertmanager
+        - AlertChainBroken_Sentry
+        - AlertChainBroken_SignOz
+        - AlertChainUnhealthy
+        - NoAlertsReceived2Hours
+      message:
+        - alert chain
+        - alertmanager
+        - no alerts
+    response:
+      action_title: "診斷告警鏈路中斷"
+      description: "⚙️ 規則匹配: 告警鏈路異常，可能導致真實告警無法送達 Telegram。"
+      suggested_action: RESTART_DEPLOYMENT
+      kubectl_command: "kubectl get pods -n monitoring && curl -s http://192.168.0.120:9093/api/v1/status | jq '.data.uptime'"
+      estimated_downtime: "監控盲區持續中"
+      risk: critical
+      responsibility: INFRA
+      responsibility_reasoning: "告警鏈路屬基礎設施監控體系，需立即修復確保可觀測性"
+      secondary_teams: [BE]
+      optimization:
+        - type: E2E_TEST
+          description: "發送測試告警驗證整條鏈路"
+          command: "curl -X POST http://192.168.0.125:32334/api/v1/test-alert -H 'Content-Type: application/json' -d '{\"test\": true}'"
+      reasoning: "[規則匹配] 告警鏈路中斷等同監控失明，最高優先修復。"
+
+  # ── GPU / AI 基礎設施 ────────────────────────────────────────
+
+  - id: nvidia_circuit_breaker
+    priority: 115
+    description: NVIDIA/Nemotron 熔斷器開啟
+    match:
+      alertname:
+        - NvidiaCircuitBreakerOpen
+        - NvidiaToolCallingHighErrorRate
+        - NvidiaToolCallingHighLatency
+      message:
+        - circuit breaker
+        - nvidia
+        - nemotron
+        - tool calling
+    response:
+      action_title: "確認 NVIDIA API 熔斷狀態"
+      description: "⚙️ 規則匹配: NVIDIA/Nemotron 熔斷器開啟或錯誤率過高，AI Router 已自動降級。"
+      suggested_action: RESTART_DEPLOYMENT
+      kubectl_command: "curl -s http://192.168.0.125:32334/api/v1/ai-router/status | jq '.providers'"
+      estimated_downtime: "0 (已自動 fallback)"
+      risk: medium
+      responsibility: BE
+      responsibility_reasoning: "AI Provider 熔斷管理屬後端 AI Router 責任範圍"
+      secondary_teams: []
+      optimization:
+        - type: CIRCUIT_BREAKER_RESET
+          description: "等待熔斷器自動恢復 (half-open 狀態)"
+          command: "curl -s http://192.168.0.125:32334/api/v1/ai-router/reset -X POST"
+      reasoning: "[規則匹配] AI Router 已自動降級至備援 Provider，監控熔斷器恢復狀態即可。"
+
+  # ── E2E / Smoke Test 告警 ────────────────────────────────────
+  # 2026-04-09 Claude Sonnet 4.6: E2E test 假告警識別，僅記錄不修復
+
+  - id: e2e_smoke_test
+    priority: 120
+    description: E2E Smoke Test / 告警鏈路驗證假告警
+    match:
+      alertname:
+        - E2E_SMOKE_TEST
+        - E2E_FINAL_SMOKE_TEST
+        - SmokeTest
+      instance_prefix:
+        - e2e-final-
+        - e2e-test-
+        - test-host
+        - smoke-test-
+      message:
+        - e2e smoke test
+        - smoke test
+        - please ignore
+        - e2e test
+        - e2e-final
+        - e2e-test
+        - e2e_smoke
+        - alert chain smoke
+    response:
+      action_title: "告警鏈路驗證成功 (E2E)"
+      description: "✅ E2E Smoke Test 告警已收到，告警鏈路正常。此告警僅用於驗證，無需修復動作。"
+      suggested_action: NO_ACTION
+      kubectl_command: ""
+      estimated_downtime: "N/A"
+      risk: low
+      responsibility: INFRA
+      responsibility_reasoning: "E2E smoke test 假告警，告警鏈路驗證用途，系統自動識別跳過修復"
+      secondary_teams: []
+      optimization: []
+      reasoning: "[規則匹配] E2E Smoke Test 假告警，僅確認告警鏈路暢通，無實際服務異常。"
+
+  # ── 備份失敗 ────────────────────────────────────────────────
+  # 2026-04-11 Claude Sonnet 4.6: backup 類告警屬主機層，無 K8s deployment 可重啟
+  # → TYPE-1 純資訊通知，不應出現 [重啟] 按鈕
+
+  - id: host_backup_failed
+    priority: 50
+    description: 備份任務失敗 (rsync/velero/HostBackupFailed)
+    match:
+      alertname:
+        - HostBackupFailed
+        - VeleroBackupFailed
+        - VeleroBackupNotRun
+        - BackupJobFailed
+    response:
+      action_title: "備份失敗，需人工確認"
+      description: "⚠️ 備份任務失敗，無自動修復動作。請人工確認備份腳本及磁碟空間。"
+      suggested_action: NO_ACTION
+      kubectl_command: ""
+      estimated_downtime: "N/A"
+      risk: medium
+      responsibility: INFRA
+      responsibility_reasoning: "備份失敗屬基礎設施維運問題，需人工介入確認根因"
+      secondary_teams: []
+      optimization: []
+      reasoning: "[規則匹配] 備份失敗無法自動修復，需人工排查備份腳本、磁碟空間及網路連通性。"
+
+  # ── DevOps 工具層 ─────────────────────────────────────────
+  # 2026-04-14 Claude Sonnet 4.6: Task 2.2 ADR-076 — 新增 devops_tool / ssl_cert / external_site 三類規則
+  # 設計原則: CI/CD 工具與外部服務均為 NO_ACTION，不可自動修復（誤操作風險過高）
+
+  - id: gitea_down
+    priority: 125
+    description: Gitea CI/CD 服務下線（不自動修復）
+    match:
+      alertname:
+        - GiteaDown
+        - GiteaServiceDown
+        - GiteaUnhealthy
+      message:
+        - gitea
+        - git server
+        - ci/cd down
+    response:
+      action_title: "Gitea ({instance}) 下線 — 需人工確認"
+      description: "⚠️ 規則匹配: Gitea CI/CD 服務 ({instance}) 無法連線，影響所有部署流程。不自動重啟（誤觸 CD 風險過高）。"
+      suggested_action: NO_ACTION
+      kubectl_command: ""
+      estimated_downtime: "N/A"
+      risk: critical
+      responsibility: INFRA
+      responsibility_reasoning: "Gitea 是 CI/CD 核心，自動重啟有誤觸部署風險，需人工確認狀態後手動操作"
+      secondary_teams: []
+      optimization:
+        - type: HEALTH_CHECK
+          description: "確認 Gitea 服務狀態"
+          command: "ssh {host} 'cd /data/gitea && docker compose ps && docker compose logs --tail=20 gitea'"
+      reasoning: "[規則匹配] Gitea 下線不自動修復，通知後由人工確認狀態再操作，避免 CD pipeline 誤觸發。"
+
+  - id: ssl_cert_expiring
+    priority: 126
+    description: SSL/TLS 憑證即將到期或已到期
+    match:
+      alertname:
+        - SSLCertExpiringSoon
+        - SSLCertExpired
+        - CertificateExpirationWarning
+        - TLSCertExpiring
+      message:
+        - ssl cert
+        - certificate expir
+        - tls cert
+        - cert will expire
+    response:
+      action_title: "SSL 憑證 ({instance}) 即將到期 — 需人工更新"
+      description: "⚠️ 規則匹配: SSL/TLS 憑證 ({instance}) 即將到期或已到期。無自動修復，需人工確認 cert-manager 或執行 certbot 更新。"
+      suggested_action: NO_ACTION
+      kubectl_command: ""
+      estimated_downtime: "N/A"
+      risk: medium
+      responsibility: INFRA
+      responsibility_reasoning: "SSL 憑證更新需域名驗證，屬基礎設施團隊責任"
+      secondary_teams: []
+      optimization:
+        - type: CERT_RENEWAL
+          description: "確認 cert-manager 自動更新狀態"
+          command: "kubectl get certificate,certificaterequest -A && kubectl get secret -n awoooi-prod | grep tls"
+      reasoning: "[規則匹配] SSL 憑證到期無法自動修復，需人工操作 certbot 或確認 cert-manager 自動更新是否正常。"
+
+  - id: external_site_down
+    priority: 127
+    description: 外部網站或服務下線（MoWooo 系列 / HTTP probe 失敗）
+    match:
+      alertname:
+        - MoWoooWorkDown
+        - MoWoooDevDown
+        - ExternalSiteDown
+        - WebsiteDown
+        - BlackboxProbeFailed
+      message:
+        - external site
+        - website down
+        - mowooo
+        - http probe failed
+        - probe failed
+    response:
+      action_title: "外部網站 {instance} 下線 — 僅通知"
+      description: "⚠️ 規則匹配: 外部網站 ({instance}) HTTP probe 失敗。此為外部服務，無自動修復動作，等待服務恢復。"
+      suggested_action: NO_ACTION
+      kubectl_command: ""
+      estimated_downtime: "N/A"
+      risk: medium
+      responsibility: INFRA
+      responsibility_reasoning: "外部網站超出系統控制範圍，無法自動修復，通知後人工跟進"
+      secondary_teams: []
+      optimization:
+        - type: STATUS_CHECK
+          description: "手動確認外部網站狀態"
+          command: "curl -sv {instance} --max-time 10 2>&1 | grep -E '(HTTP|Connected|Failed)'"
+      reasoning: "[規則匹配] 外部網站下線屬外部依賴，通知統帥後等待服務恢復，必要時切換備援路徑。"
+
+  # ── 通用兜底 ────────────────────────────────────────────────
+
+  - id: generic_fallback
+    priority: 999
+    description: 通用兜底規則 (無法匹配的告警)
+    match:
+      alertname:
+        - "*"
+    response:
+      action_title: "重新啟動 {target} 服務"
+      description: "⚙️ 規則匹配: {target} 發生異常，需進一步診斷確認根因。"
+      suggested_action: RESTART_DEPLOYMENT
+      kubectl_command: "kubectl rollout restart deployment/{target} -n {namespace}"
+      estimated_downtime: "5-15 min"
+      risk: medium
+      responsibility: COLLAB
+      responsibility_reasoning: "告警資訊不足以判定單一責任團隊，建議多團隊協同排查"
+      secondary_teams: [BE, INFRA]
+      optimization: []
+      reasoning: "[規則匹配] 根據告警先重啟恢復服務，同時安排深入診斷。"
--- a/apps/api/docker-compose.test.yml
+++ b/apps/api/docker-compose.test.yml
@@ -0,0 +1,58 @@
+# AWOOOI 整合測試用 Docker Compose
+# ===================================
+# 用途: CI 環境中提供完全隔離的 PostgreSQL + Redis
+# 不用於生產環境
+#
+# 啟動: docker compose -f docker-compose.test.yml up -d
+# 停止: docker compose -f docker-compose.test.yml down -v
+#
+# 2026-04-10 Claude Sonnet 4.6 Asia/Taipei
+
+services:
+  postgres-test:
+    image: pgvector/pgvector:pg16
+    environment:
+      POSTGRES_DB: awoooi_test
+      POSTGRES_USER: awoooi
+      POSTGRES_PASSWORD: awoooi_test_2026
+    ports:
+      - "15432:5432"
+    healthcheck:
+      test: ["CMD-SHELL", "pg_isready -U awoooi -d awoooi_test"]
+      interval: 5s
+      timeout: 3s
+      retries: 10
+    tmpfs:
+      - /var/lib/postgresql/data  # 記憶體內 — 快 + 隔離
+
+  redis-test:
+    image: redis:7-alpine
+    ports:
+      - "16380:6379"
+    healthcheck:
+      test: ["CMD", "redis-cli", "ping"]
+      interval: 5s
+      timeout: 3s
+      retries: 5
+
+  # 2026-04-10 Claude Sonnet 4.6 Asia/Taipei: 整合測試 runner
+  # 在 compose 網路內跑 pytest，hostname=postgres-test 直連，不依賴 host venv
+  # Schema 由 CD workflow 用 compose exec psql 初始化（避免 DinD volume 路徑問題）
+  pytest-runner:
+    image: python:3.11-slim
+    working_dir: /workspace
+    volumes:
+      - .:/workspace
+    environment:
+      TEST_DATABASE_URL: "postgresql+asyncpg://awoooi:awoooi_test_2026@postgres-test:5432/awoooi_test?ssl=disable"
+    depends_on:
+      postgres-test:
+        condition: service_healthy
+      redis-test:
+        condition: service_healthy
+    command: >
+      sh -c "pip install -q uv &&
+             uv pip install -q --system -e '.[dev]' &&
+             pytest tests/integration/test_b5_core_flows.py -v --tb=short"
+    profiles:
+      - test  # 只在明確指定 --profile test 時才啟動
--- a/apps/api/migrations/adr071_notification_lifecycle.sql
+++ b/apps/api/migrations/adr071_notification_lifecycle.sql
@@ -0,0 +1,95 @@
+-- ADR-071-A: 告警通知四類型 + 全生命週期 DB 記錄
+-- 建立時間: 2026-04-11 (台北時區)
+-- 建立者: Claude Sonnet 4.6 — ADR-071 第一批
+--
+-- 設計說明:
+--   在現有表上補充欄位，不新建表
+--   PgEnum ADD VALUE 必須在獨立 transaction 執行（不能在同一 tx 內使用新值）
+--
+-- 執行順序:
+--   Step 1: PgEnum 新增值（獨立 transaction）
+--   Step 2: incidents 表新增 7 個欄位
+--   Step 3: 驗收查詢
+
+-- ============================================================================
+-- Step 1: alert_event_type PgEnum 新增 5 個值
+-- 注意: ADD VALUE IF NOT EXISTS 是 idempotent，重複執行安全
+-- 注意: 每個 ADD VALUE 必須在獨立 transaction（不能批次）
+-- ============================================================================
+
+-- 分類通知事件
+ALTER TYPE alert_event_type ADD VALUE IF NOT EXISTS 'NOTIFICATION_CLASSIFIED';
+
+-- 手動修復記錄
+ALTER TYPE alert_event_type ADD VALUE IF NOT EXISTS 'MANUAL_FIX_RECORDED';
+
+-- KM 轉換完成
+ALTER TYPE alert_event_type ADD VALUE IF NOT EXISTS 'KM_CONVERTED';
+
+-- Playbook 草稿建立
+ALTER TYPE alert_event_type ADD VALUE IF NOT EXISTS 'PLAYBOOK_DRAFT_CREATED';
+
+-- 狀態機守衛攔截
+ALTER TYPE alert_event_type ADD VALUE IF NOT EXISTS 'STATE_GUARD_BLOCKED';
+
+-- ============================================================================
+-- Step 2: incidents 表新增 7 個欄位
+-- 注意: ADD COLUMN IF NOT EXISTS 是 idempotent，重複執行安全
+-- ============================================================================
+
+-- 通知類型記錄 (TYPE-1/2/3/4/4D)
+ALTER TABLE incidents
+    ADD COLUMN IF NOT EXISTS notification_type VARCHAR(10);
+
+-- 告警類別（決定 TYPE-3 按鈕組合）
+ALTER TABLE incidents
+    ADD COLUMN IF NOT EXISTS alert_category VARCHAR(50);
+
+-- MCP 情報收集快照（執行前，Sprint A 完成後由 MCP Phase 2 填充）
+ALTER TABLE incidents
+    ADD COLUMN IF NOT EXISTS context_bundle JSONB;
+
+-- 指標快照（執行前，Prometheus MCP 採集）— ADR-071-I 使用
+ALTER TABLE incidents
+    ADD COLUMN IF NOT EXISTS metrics_before JSONB;
+
+-- 指標快照（執行後，Prometheus MCP 採集）— ADR-071-I 使用
+ALTER TABLE incidents
+    ADD COLUMN IF NOT EXISTS metrics_after JSONB;
+
+-- 執行驗證結果（K8s MCP watch_rollout 結果）— ADR-071-J 使用
+ALTER TABLE incidents
+    ADD COLUMN IF NOT EXISTS verification_result JSONB;
+
+-- 手動修復步驟（TYPE-4 使用者輸入）
+ALTER TABLE incidents
+    ADD COLUMN IF NOT EXISTS manual_fix_steps TEXT;
+
+ALTER TABLE incidents
+    ADD COLUMN IF NOT EXISTS manual_fix_by VARCHAR(100);
+
+-- ============================================================================
+-- Step 3: 驗收查詢（執行後確認欄位存在）
+-- ============================================================================
+
+-- 確認 incidents 新欄位
+SELECT column_name, data_type
+FROM information_schema.columns
+WHERE table_name = 'incidents'
+  AND column_name IN (
+      'notification_type', 'alert_category', 'context_bundle',
+      'metrics_before', 'metrics_after', 'verification_result',
+      'manual_fix_steps', 'manual_fix_by'
+  )
+ORDER BY column_name;
+
+-- 確認 alert_event_type 新值
+SELECT enumlabel
+FROM pg_enum
+JOIN pg_type ON pg_enum.enumtypid = pg_type.oid
+WHERE pg_type.typname = 'alert_event_type'
+  AND enumlabel IN (
+      'NOTIFICATION_CLASSIFIED', 'MANUAL_FIX_RECORDED',
+      'KM_CONVERTED', 'PLAYBOOK_DRAFT_CREATED', 'STATE_GUARD_BLOCKED'
+  )
+ORDER BY enumlabel;
--- a/apps/api/migrations/adr088_trust_score_persistence.sql
+++ b/apps/api/migrations/adr088_trust_score_persistence.sql
@@ -0,0 +1,24 @@
+-- ADR-088: Trust Score 持久化
+-- Phase 4+: TrustScoreManager 從記憶體升級為 PostgreSQL 持久化
+-- 解決問題: Pod 重啟後 AI 信任分數歸零，永遠無法累積到 L4 自動放行門檻
+-- 2026-04-17 ogt + Claude Sonnet 4.6（亞太）
+
+CREATE TABLE IF NOT EXISTS trust_records (
+    action_pattern      VARCHAR(255)    PRIMARY KEY,
+    score               INTEGER         NOT NULL DEFAULT 0,
+    total_approvals     INTEGER         NOT NULL DEFAULT 0,
+    total_rejections    INTEGER         NOT NULL DEFAULT 0,
+    last_approval_by    VARCHAR(100),
+    last_approval_at    TIMESTAMPTZ,
+    last_rejection_by   VARCHAR(100),
+    last_rejection_at   TIMESTAMPTZ,
+    created_at          TIMESTAMPTZ     NOT NULL DEFAULT NOW(),
+    updated_at          TIMESTAMPTZ     NOT NULL DEFAULT NOW()
+);
+
+COMMENT ON TABLE trust_records IS
+    'ADR-088: TrustScoreManager 持久化層。記錄每個 action_pattern 的累積信任分數，'
+    '跨 Pod 重啟存活。score >= 5 → MEDIUM 自動降 LOW，score >= 10 → HIGH 降 MEDIUM。';
+
+CREATE INDEX IF NOT EXISTS ix_trust_records_score ON trust_records (score DESC);
+CREATE INDEX IF NOT EXISTS ix_trust_records_updated ON trust_records (updated_at DESC);
--- a/apps/api/migrations/adr090_asset_inventory_foundation.sql
+++ b/apps/api/migrations/adr090_asset_inventory_foundation.sql
@@ -0,0 +1,607 @@
+-- ADR-090: 監控盲區治理 + 資產盤點 × 7 項自動化覆蓋矩陣永久化 DB
+-- 建立時間: 2026-04-18 下午 (台北時區)
+-- 建立者: ogt + Claude Opus 4.7 (1M context)（亞太）
+--
+-- 上游:
+--   - 主戰略: docs/superpowers/specs/2026-04-18-blindspot-governance-capacity-l4.md §5.2
+--   - ADR: docs/adr/ADR-090-monitoring-blindspot-governance.md
+--   - MEMORY: project_blindspot_governance.md
+--
+-- 設計說明:
+--   本檔建立 11 張表作為 AWOOOI L4 AIOps 的資產盤點 + 自動化覆蓋 + AI 協作稽核地基。
+--   目標: 把治理從 Markdown 搬進 PostgreSQL，讓 AI 四分工 (OpenClaw × NemoTron ×
+--         Hermes × Claude LLM) 在結構化資料上做決策，且每次動作必留 trail。
+--
+--   對應七大自動化引擎:
+--     E1 自動監控 / E2 自動告警 / E3 自動建規則 / E4 自動匹配
+--     E5 自動 Playbook / E6 自動修復 / E7 自動 KM
+--
+-- 執行順序:
+--   Step 0: pgcrypto extension (gen_random_uuid 需要)
+--   Step 1:  asset_inventory         — 全景資產主表
+--   Step 2:  asset_discovery_run     — 每次盤點 header
+--   Step 3:  asset_coverage_snapshot — 資產 × 7 自動化覆蓋矩陣
+--   Step 4:  asset_relationship      — 資產依賴圖 (爆炸半徑)
+--   Step 5:  alert_rule_catalog      — 告警規則本身即資產
+--   Step 6:  asset_change_event      — 資產變化追蹤
+--   Step 7:  asset_compliance_snapshot — SSL/CVE/secret/backup 合規
+--   Step 8:  host_capacity_snapshot  — 主機容量快照 (NemoTron 每日 02:00 寫)
+--   Step 9:  capacity_violation_event — 配額違規
+--   Step 10: automation_operation_log — 所有 AI 自動化動作稽核主表 🔴
+--   Step 11: ai_collaboration_trace  — 多 Agent 協作逐步 (辯證歷程)
+--   Step 12: 驗收查詢 (comment-only)
+--
+-- Idempotent 鐵律:
+--   - CREATE TABLE IF NOT EXISTS
+--   - CREATE INDEX IF NOT EXISTS
+--   - CHECK constraint 寫在 CREATE TABLE 內,依賴 IF NOT EXISTS 保護
+--   - 本檔可重複執行安全 (rerun 不會破壞既有資料)
+--
+-- 回滾:
+--   DROP TABLE IF EXISTS ai_collaboration_trace, automation_operation_log,
+--       capacity_violation_event, host_capacity_snapshot, asset_compliance_snapshot,
+--       asset_change_event, alert_rule_catalog, asset_relationship,
+--       asset_coverage_snapshot, asset_discovery_run, asset_inventory CASCADE;
+--
+-- ============================================================================
+-- Step 0: pgcrypto extension (gen_random_uuid)
+-- ============================================================================
+
+CREATE EXTENSION IF NOT EXISTS pgcrypto;
+
+
+-- ============================================================================
+-- Step 1: asset_inventory — 全景資產主表
+-- 用途: 主機 / 容器 / K8s workload / DB / 網站 / API / 套件 / 日誌 / KM / 前端 /
+--       後端 / 容器 / Gitea / CI-CD 全部無例外
+-- 主寫者: scanner (asset_discovery) + NemoTron (capacity 欄位)
+-- ============================================================================
+
+CREATE TABLE IF NOT EXISTS asset_inventory (
+    asset_id              BIGSERIAL       PRIMARY KEY,
+    asset_key             TEXT            NOT NULL UNIQUE,
+    asset_type            TEXT            NOT NULL,
+    parent_asset_id       BIGINT          REFERENCES asset_inventory(asset_id),
+    environment           TEXT            NOT NULL DEFAULT 'prod',
+    host                  TEXT,
+    namespace             TEXT,
+    name                  TEXT            NOT NULL,
+    metadata              JSONB           NOT NULL DEFAULT '{}'::jsonb,
+    tags                  TEXT[]          NOT NULL DEFAULT '{}',
+    owner_team            TEXT,
+    criticality           TEXT,
+    data_classification   TEXT,
+    external              BOOLEAN         NOT NULL DEFAULT false,
+    lifecycle_state       TEXT            NOT NULL DEFAULT 'active',
+    source_repo           TEXT,
+    source_commit_sha     TEXT,
+
+    -- 容量欄位 (Layer 4 AI 巡檢用)
+    cpu_avg_7d            NUMERIC(5,2),
+    mem_avg_7d            NUMERIC(5,2),
+    capacity_headroom     NUMERIC(5,2),
+    resource_limits       JSONB,
+    resource_requests     JSONB,
+    quota_violation_count INT             NOT NULL DEFAULT 0,
+    sla_target            JSONB,
+    cost_monthly_usd      NUMERIC(10,2),
+
+    -- 生命週期時間戳
+    first_seen_at         TIMESTAMPTZ     NOT NULL DEFAULT NOW(),
+    last_seen_at          TIMESTAMPTZ     NOT NULL DEFAULT NOW(),
+    decommissioned_at     TIMESTAMPTZ,
+    created_at            TIMESTAMPTZ     NOT NULL DEFAULT NOW(),
+    updated_at            TIMESTAMPTZ     NOT NULL DEFAULT NOW(),
+
+    CONSTRAINT asset_inventory_criticality_valid
+        CHECK (criticality IS NULL OR criticality IN ('P0','P1','P2','P3')),
+    CONSTRAINT asset_inventory_data_class_valid
+        CHECK (data_classification IS NULL OR data_classification IN
+            ('public','internal','sensitive','secret')),
+    CONSTRAINT asset_inventory_lifecycle_valid
+        CHECK (lifecycle_state IN
+            ('planned','provisioning','active','degraded','deprecated','decommissioned')),
+    CONSTRAINT asset_inventory_type_valid
+        CHECK (asset_type IN (
+            'host','container','k8s_workload','k8s_resource','database','table',
+            'website','api_endpoint','package','log_stream','km_entry',
+            'frontend','backend','ci_pipeline','gitea_repo','monitoring_target',
+            'secret','volume','network','certificate','scheduled_job',
+            'message_queue','cache','dashboard','ai_agent','llm_model',
+            'third_party_service','backup_target'
+        ))
+);
+
+COMMENT ON TABLE asset_inventory IS
+    'ADR-090: 全景資產主表。每一個主機/容器/K8s workload/DB/網站/API/套件/...都有一筆,跨 run 沿用同 asset_id。';
+
+CREATE INDEX IF NOT EXISTS idx_asset_inventory_type_host
+    ON asset_inventory(asset_type, host);
+CREATE INDEX IF NOT EXISTS idx_asset_inventory_env_lifecycle
+    ON asset_inventory(environment, lifecycle_state);
+CREATE INDEX IF NOT EXISTS idx_asset_inventory_metadata_gin
+    ON asset_inventory USING GIN (metadata);
+CREATE INDEX IF NOT EXISTS idx_asset_inventory_tags_gin
+    ON asset_inventory USING GIN (tags);
+CREATE INDEX IF NOT EXISTS idx_asset_inventory_active_last_seen
+    ON asset_inventory(last_seen_at DESC)
+    WHERE lifecycle_state = 'active';
+-- 註: partial index 只索引 active 資產,按最近出現時間排序
+
+
+-- ============================================================================
+-- Step 2: asset_discovery_run — 每次盤點 header
+-- 用途: 記錄每次全景掃描的起止時間、掃描範圍、掃到什麼、新增/消失多少
+-- 觸發: cron (每日) / ai (proactive_inspector) / human (手動) / incident
+-- ============================================================================
+
+CREATE TABLE IF NOT EXISTS asset_discovery_run (
+    run_id              UUID            PRIMARY KEY DEFAULT gen_random_uuid(),
+    triggered_by        TEXT            NOT NULL,
+    scope               TEXT[]          NOT NULL,
+    scan_depth          TEXT            NOT NULL DEFAULT 'shallow',
+    host_filter         TEXT[],
+    started_at          TIMESTAMPTZ     NOT NULL DEFAULT NOW(),
+    ended_at            TIMESTAMPTZ,
+    status              TEXT            NOT NULL,
+    total_assets        INT,
+    new_assets          INT             NOT NULL DEFAULT 0,
+    modified_assets     INT             NOT NULL DEFAULT 0,
+    disappeared_assets  INT             NOT NULL DEFAULT 0,
+    tools_used          JSONB,
+    duration_ms         INT,
+    error               TEXT,
+    summary             JSONB,
+
+    CONSTRAINT asset_discovery_run_status_valid
+        CHECK (status IN ('running','success','partial','failed','aborted')),
+    CONSTRAINT asset_discovery_run_scan_depth_valid
+        CHECK (scan_depth IN ('shallow','deep','full'))
+);
+
+COMMENT ON TABLE asset_discovery_run IS
+    'ADR-090: 每次資產盤點的 header。run_id 作為下游 snapshot/event/change 的關聯主鍵。';
+
+CREATE INDEX IF NOT EXISTS idx_asset_discovery_run_started
+    ON asset_discovery_run(started_at DESC);
+CREATE INDEX IF NOT EXISTS idx_asset_discovery_run_status
+    ON asset_discovery_run(status) WHERE status IN ('running','failed','partial');
+
+
+-- ============================================================================
+-- Step 3: asset_coverage_snapshot — 資產 × 7 項自動化 覆蓋矩陣
+-- 用途: 每個資產在 7 個自動化維度上的覆蓋狀態 (green/yellow/red)
+-- 鐵律: 每次 discovery_run 為每個 asset 寫 7 筆 (7 dimensions)
+-- ============================================================================
+
+CREATE TABLE IF NOT EXISTS asset_coverage_snapshot (
+    snapshot_id         BIGSERIAL       PRIMARY KEY,
+    run_id              UUID            NOT NULL REFERENCES asset_discovery_run(run_id) ON DELETE CASCADE,
+    asset_id            BIGINT          NOT NULL REFERENCES asset_inventory(asset_id),
+    dimension           TEXT            NOT NULL,
+    coverage_status     TEXT            NOT NULL,
+    evidence            JSONB           NOT NULL DEFAULT '{}'::jsonb,
+    gap_reason          TEXT,
+    recommended_action  TEXT,
+    confidence          NUMERIC(3,2),
+    detected_by         TEXT            NOT NULL,
+    created_at          TIMESTAMPTZ     NOT NULL DEFAULT NOW(),
+
+    CONSTRAINT asset_coverage_snapshot_dimension_valid
+        CHECK (dimension IN (
+            'auto_monitoring','auto_alerting','auto_rule_creation',
+            'auto_rule_matching','auto_playbook','auto_remediation','auto_km_creation'
+        )),
+    CONSTRAINT asset_coverage_snapshot_status_valid
+        CHECK (coverage_status IN ('green','yellow','red','unknown')),
+    CONSTRAINT asset_coverage_snapshot_unique
+        UNIQUE (run_id, asset_id, dimension)
+);
+
+COMMENT ON TABLE asset_coverage_snapshot IS
+    'ADR-090: 計分卡。查 red COUNT 即覆蓋率 SLO。evidence 欄位串 playbook_id/km_entry_id/rule_name。';
+
+CREATE INDEX IF NOT EXISTS idx_asset_coverage_snapshot_asset_dim
+    ON asset_coverage_snapshot(asset_id, dimension);
+CREATE INDEX IF NOT EXISTS idx_asset_coverage_snapshot_red_yellow
+    ON asset_coverage_snapshot(coverage_status)
+    WHERE coverage_status IN ('red','yellow');
+CREATE INDEX IF NOT EXISTS idx_asset_coverage_snapshot_run
+    ON asset_coverage_snapshot(run_id);
+
+
+-- ============================================================================
+-- Step 4: asset_relationship — 資產依賴圖 (爆炸半徑必需)
+-- 用途: 記錄資產之間的 depends_on / calls / stores_data_in / backs_up_to 關係
+-- AI 用途: OpenClaw 計算 blast_radius 時查這張表
+-- ============================================================================
+
+CREATE TABLE IF NOT EXISTS asset_relationship (
+    relationship_id     BIGSERIAL       PRIMARY KEY,
+    from_asset_id       BIGINT          NOT NULL REFERENCES asset_inventory(asset_id),
+    to_asset_id         BIGINT          NOT NULL REFERENCES asset_inventory(asset_id),
+    relationship_type   TEXT            NOT NULL,
+    strength            NUMERIC(3,2),
+    metadata            JSONB,
+    first_detected_at   TIMESTAMPTZ     NOT NULL DEFAULT NOW(),
+    last_verified_at    TIMESTAMPTZ,
+    is_active           BOOLEAN         NOT NULL DEFAULT true,
+
+    CONSTRAINT asset_relationship_type_valid
+        CHECK (relationship_type IN (
+            'depends_on','calls','stores_data_in','backs_up_to',
+            'routes_to','authenticates_via','monitors','alerts_to','logs_to'
+        )),
+    CONSTRAINT asset_relationship_strength_valid
+        CHECK (strength IS NULL OR (strength >= 0 AND strength <= 1)),
+    CONSTRAINT asset_relationship_unique
+        UNIQUE (from_asset_id, to_asset_id, relationship_type),
+    CONSTRAINT asset_relationship_no_self_loop
+        CHECK (from_asset_id <> to_asset_id)
+);
+
+COMMENT ON TABLE asset_relationship IS
+    'ADR-090: 資產依賴圖。AI 計算爆炸半徑必讀。edge 而非 tree,支援多重關係。';
+
+CREATE INDEX IF NOT EXISTS idx_asset_relationship_from
+    ON asset_relationship(from_asset_id) WHERE is_active;
+CREATE INDEX IF NOT EXISTS idx_asset_relationship_to
+    ON asset_relationship(to_asset_id) WHERE is_active;
+CREATE INDEX IF NOT EXISTS idx_asset_relationship_type
+    ON asset_relationship(relationship_type);
+
+
+-- ============================================================================
+-- Step 5: alert_rule_catalog — 告警規則本身即資產
+-- 用途: 把 alert_rules.yaml 升級為 DB-driven;記錄誰創的 / 何時 / 效能 / 生死
+-- AI 用途: Hermes 做 noise_rate 分析 / 提建議 retire 低品質規則
+-- ============================================================================
+
+CREATE TABLE IF NOT EXISTS alert_rule_catalog (
+    rule_id                 BIGSERIAL       PRIMARY KEY,
+    rule_name               TEXT            NOT NULL UNIQUE,
+    source                  TEXT            NOT NULL,
+    expr                    TEXT            NOT NULL,
+    duration_seconds        INT,
+    severity                TEXT,
+    labels                  JSONB,
+    annotations             JSONB,
+    linked_asset_ids        BIGINT[],
+    created_by_agent        TEXT,
+
+    -- 規則品質追蹤
+    true_positive_count     INT             NOT NULL DEFAULT 0,
+    false_positive_count    INT             NOT NULL DEFAULT 0,
+    noise_rate              NUMERIC(5,2),
+    last_fired_at           TIMESTAMPTZ,
+
+    -- 信心與演化
+    confidence              NUMERIC(3,2),
+    review_status           TEXT,
+    superseded_by_rule_id   BIGINT          REFERENCES alert_rule_catalog(rule_id),
+
+    created_at              TIMESTAMPTZ     NOT NULL DEFAULT NOW(),
+    updated_at              TIMESTAMPTZ     NOT NULL DEFAULT NOW(),
+
+    CONSTRAINT alert_rule_catalog_source_valid
+        CHECK (source IN ('yaml_hardcoded','ai_generated','human_written','playbook_derived')),
+    CONSTRAINT alert_rule_catalog_review_valid
+        CHECK (review_status IS NULL OR review_status IN
+            ('draft','approved','deprecated','retired'))
+);
+
+COMMENT ON TABLE alert_rule_catalog IS
+    'ADR-090: 告警規則即一等資產。支援規則演化 (ai_generated) 與替代鏈 (superseded_by)。';
+
+CREATE INDEX IF NOT EXISTS idx_alert_rule_catalog_source
+    ON alert_rule_catalog(source);
+CREATE INDEX IF NOT EXISTS idx_alert_rule_catalog_assets_gin
+    ON alert_rule_catalog USING GIN (linked_asset_ids);
+CREATE INDEX IF NOT EXISTS idx_alert_rule_catalog_review
+    ON alert_rule_catalog(review_status) WHERE review_status IS NOT NULL;
+
+
+-- ============================================================================
+-- Step 6: asset_change_event — 資產變化追蹤 (diff between runs)
+-- 用途: 兩次 discovery_run 之間的 delta。新增/消失/修改/覆蓋率變化
+-- ============================================================================
+
+CREATE TABLE IF NOT EXISTS asset_change_event (
+    event_id        BIGSERIAL       PRIMARY KEY,
+    run_id          UUID            NOT NULL REFERENCES asset_discovery_run(run_id),
+    asset_id        BIGINT          REFERENCES asset_inventory(asset_id),
+    change_type     TEXT            NOT NULL,
+    before_state    JSONB,
+    after_state     JSONB,
+    diff            JSONB,
+    detected_at     TIMESTAMPTZ     NOT NULL DEFAULT NOW(),
+    ai_analysis     TEXT,
+
+    CONSTRAINT asset_change_event_type_valid
+        CHECK (change_type IN (
+            'asset_added','asset_removed','asset_modified',
+            'coverage_improved','coverage_degraded',
+            'criticality_changed','owner_changed','lifecycle_changed'
+        ))
+);
+
+COMMENT ON TABLE asset_change_event IS
+    'ADR-090: 資產變化追蹤。兩次掃描的 diff 明確落地,LLM 可加 ai_analysis 解讀。';
+
+CREATE INDEX IF NOT EXISTS idx_asset_change_event_run
+    ON asset_change_event(run_id);
+CREATE INDEX IF NOT EXISTS idx_asset_change_event_asset_time
+    ON asset_change_event(asset_id, detected_at DESC);
+
+
+-- ============================================================================
+-- Step 7: asset_compliance_snapshot — 合規狀態 (SSL/CVE/secret/backup)
+-- 用途: 與 coverage 不同軸的合規追蹤。SSL cert 到期 / CVE 掃描 / secret 輪替
+-- ============================================================================
+
+CREATE TABLE IF NOT EXISTS asset_compliance_snapshot (
+    snapshot_id             BIGSERIAL       PRIMARY KEY,
+    run_id                  UUID            REFERENCES asset_discovery_run(run_id),
+    asset_id                BIGINT          NOT NULL REFERENCES asset_inventory(asset_id),
+    dimension               TEXT            NOT NULL,
+    status                  TEXT            NOT NULL,
+    expires_at              TIMESTAMPTZ,
+    detail                  JSONB,
+    remediation_deadline    TIMESTAMPTZ,
+    detected_at             TIMESTAMPTZ     NOT NULL DEFAULT NOW(),
+
+    CONSTRAINT asset_compliance_snapshot_dimension_valid
+        CHECK (dimension IN (
+            'ssl_cert_valid','cve_scan','secret_rotated','backup_tested',
+            'audit_log_enabled','access_reviewed','encryption_at_rest'
+        )),
+    CONSTRAINT asset_compliance_snapshot_status_valid
+        CHECK (status IN ('compliant','warning','violation','unknown'))
+);
+
+COMMENT ON TABLE asset_compliance_snapshot IS
+    'ADR-090: 合規狀態快照。與 coverage 不同軸,SSL/CVE/secret/backup 專用。';
+
+CREATE INDEX IF NOT EXISTS idx_asset_compliance_snapshot_asset_dim
+    ON asset_compliance_snapshot(asset_id, dimension);
+CREATE INDEX IF NOT EXISTS idx_asset_compliance_snapshot_expiring
+    ON asset_compliance_snapshot(expires_at)
+    WHERE expires_at IS NOT NULL;
+CREATE INDEX IF NOT EXISTS idx_asset_compliance_snapshot_violations
+    ON asset_compliance_snapshot(status)
+    WHERE status IN ('warning','violation');
+
+
+-- ============================================================================
+-- Step 8: host_capacity_snapshot — 主機容量快照
+-- 用途: NemoTron 每日 02:00 台北 自主容量巡檢寫入
+-- Layer 4 核心表。hermes 做預測,openclaw 產建議,全寫這張
+-- ============================================================================
+
+CREATE TABLE IF NOT EXISTS host_capacity_snapshot (
+    snapshot_id             BIGSERIAL       PRIMARY KEY,
+    host                    TEXT            NOT NULL,
+    captured_at             TIMESTAMPTZ     NOT NULL DEFAULT NOW(),
+    load1                   NUMERIC(6,2),
+    load5                   NUMERIC(6,2),
+    load15                  NUMERIC(6,2),
+    cpu_used_pct            NUMERIC(5,2),
+    cpu_iowait_pct          NUMERIC(5,2),
+    mem_used_pct            NUMERIC(5,2),
+    swap_used_pct           NUMERIC(5,2),
+    disk_used_pct           JSONB,
+    container_count         INT,
+    k8s_pod_count           INT,
+    top_cpu_offenders       JSONB,
+    top_mem_offenders       JSONB,
+    headroom_pct            NUMERIC(5,2),
+    ai_verdict              TEXT,
+    ai_reasoning            TEXT,
+    recommended_actions     JSONB,
+    written_by_agent        TEXT            NOT NULL,
+
+    CONSTRAINT host_capacity_snapshot_verdict_valid
+        CHECK (ai_verdict IS NULL OR ai_verdict IN ('safe','warning','critical','unknown'))
+);
+
+COMMENT ON TABLE host_capacity_snapshot IS
+    'ADR-090: NemoTron 每日主機容量巡檢結果。Layer 4 AI 自主治理核心表。';
+
+CREATE INDEX IF NOT EXISTS idx_host_capacity_snapshot_host_time
+    ON host_capacity_snapshot(host, captured_at DESC);
+CREATE INDEX IF NOT EXISTS idx_host_capacity_snapshot_critical
+    ON host_capacity_snapshot(ai_verdict)
+    WHERE ai_verdict IN ('warning','critical');
+
+
+-- ============================================================================
+-- Step 9: capacity_violation_event — 配額違規事件
+-- 用途: 記錄任何「缺 limit」「超 request」「主機飽和」的違規
+-- ============================================================================
+
+CREATE TABLE IF NOT EXISTS capacity_violation_event (
+    event_id            BIGSERIAL       PRIMARY KEY,
+    asset_id            BIGINT          REFERENCES asset_inventory(asset_id),
+    host                TEXT,
+    violation_type      TEXT            NOT NULL,
+    threshold           NUMERIC(10,2),
+    actual_value        NUMERIC(10,2),
+    detected_at         TIMESTAMPTZ     NOT NULL DEFAULT NOW(),
+    auto_action         TEXT,
+    auto_action_op_id   UUID,
+    human_override      TEXT,
+    resolved_at         TIMESTAMPTZ,
+
+    CONSTRAINT capacity_violation_event_type_valid
+        CHECK (violation_type IN (
+            'no_limit_set','over_request','over_limit','host_saturation',
+            'over_sla_budget','unauthorized_new_deploy'
+        ))
+);
+
+COMMENT ON TABLE capacity_violation_event IS
+    'ADR-090: 配額違規稽核。每次 AI 偵測到資產無 limit/主機飽和/未授權部署 都寫一筆。';
+
+CREATE INDEX IF NOT EXISTS idx_capacity_violation_event_asset_time
+    ON capacity_violation_event(asset_id, detected_at DESC);
+CREATE INDEX IF NOT EXISTS idx_capacity_violation_event_unresolved
+    ON capacity_violation_event(detected_at DESC)
+    WHERE resolved_at IS NULL;
+
+
+-- ============================================================================
+-- Step 10: automation_operation_log — 所有 AI 自動化動作稽核主表 🔴
+-- 鐵律: 每一個 AI 自動化動作都必須寫一筆。缺筆 = 治理失效
+-- ============================================================================
+
+CREATE TABLE IF NOT EXISTS automation_operation_log (
+    op_id               UUID            PRIMARY KEY DEFAULT gen_random_uuid(),
+    operation_type      TEXT            NOT NULL,
+    asset_id            BIGINT          REFERENCES asset_inventory(asset_id),
+    incident_id         BIGINT,
+    run_id              UUID            REFERENCES asset_discovery_run(run_id),
+    actor               TEXT            NOT NULL,
+    input               JSONB           NOT NULL DEFAULT '{}'::jsonb,
+    output              JSONB           NOT NULL DEFAULT '{}'::jsonb,
+    dry_run_result      JSONB,
+    status              TEXT            NOT NULL,
+    error               TEXT,
+    duration_ms         INT,
+    tokens_in           INT,
+    tokens_out          INT,
+    cost_usd            NUMERIC(10,6),
+    budget_bucket       TEXT,
+    parent_op_id        UUID            REFERENCES automation_operation_log(op_id),
+    retry_count         INT             NOT NULL DEFAULT 0,
+    retry_of_op_id      UUID            REFERENCES automation_operation_log(op_id),
+    stderr_feed_back    TEXT,
+    tags                TEXT[],
+    created_at          TIMESTAMPTZ     NOT NULL DEFAULT NOW(),
+
+    CONSTRAINT automation_operation_log_type_valid
+        CHECK (operation_type IN (
+            'monitor_configured','monitor_removed',
+            'alert_fired','alert_suppressed','alert_routed',
+            'rule_created','rule_updated','rule_matched','rule_rejected','rule_deprecated',
+            'playbook_generated','playbook_updated','playbook_executed',
+            'remediation_executed','remediation_verified','remediation_rolled_back',
+            'self_correction_attempted',
+            'km_created','km_updated','km_linked',
+            'asset_discovered','coverage_recalculated',
+            'capacity_recommendation','quota_enforced'
+        )),
+    CONSTRAINT automation_operation_log_status_valid
+        CHECK (status IN ('pending','success','failed','dry_run','rolled_back'))
+);
+
+COMMENT ON TABLE automation_operation_log IS
+    'ADR-090: 所有 AI 自動化動作稽核主表。retry_of_op_id + stderr_feed_back 支援引擎 4 閉環。';
+
+CREATE INDEX IF NOT EXISTS idx_automation_operation_log_type_time
+    ON automation_operation_log(operation_type, created_at DESC);
+CREATE INDEX IF NOT EXISTS idx_automation_operation_log_asset_time
+    ON automation_operation_log(asset_id, created_at DESC);
+CREATE INDEX IF NOT EXISTS idx_automation_operation_log_incident
+    ON automation_operation_log(incident_id)
+    WHERE incident_id IS NOT NULL;
+CREATE INDEX IF NOT EXISTS idx_automation_operation_log_actor_time
+    ON automation_operation_log(actor, created_at DESC);
+CREATE INDEX IF NOT EXISTS idx_automation_operation_log_retry
+    ON automation_operation_log(retry_of_op_id)
+    WHERE retry_of_op_id IS NOT NULL;
+CREATE INDEX IF NOT EXISTS idx_automation_operation_log_tags_gin
+    ON automation_operation_log USING GIN (tags);
+
+
+-- ============================================================================
+-- Step 11: ai_collaboration_trace — 多 Agent 協作逐步 (LLM × OpenClaw × NemoTron × Hermes)
+-- 用途: 每個 automation_operation_log 背後的 N 步 AI 決策過程
+-- 最寶貴的語料: challenged_by + accepted 支援 RLHF fine-tune
+-- ============================================================================
+
+CREATE TABLE IF NOT EXISTS ai_collaboration_trace (
+    trace_id                BIGSERIAL       PRIMARY KEY,
+    op_id                   UUID            NOT NULL REFERENCES automation_operation_log(op_id) ON DELETE CASCADE,
+    step_order              INT             NOT NULL,
+    agent                   TEXT            NOT NULL,
+    model                   TEXT,
+    system_prompt_version   TEXT,
+    prompt                  TEXT,
+    response                JSONB,
+    confidence              NUMERIC(3,2),
+    challenged_by           TEXT[],
+    accepted                BOOLEAN,
+    tokens_in               INT,
+    tokens_out              INT,
+    duration_ms             INT,
+    created_at              TIMESTAMPTZ     NOT NULL DEFAULT NOW(),
+
+    CONSTRAINT ai_collaboration_trace_unique_step
+        UNIQUE (op_id, step_order)
+);
+
+COMMENT ON TABLE ai_collaboration_trace IS
+    'ADR-090: AI 多 Agent 協作逐步紀錄。challenged_by + accepted = RLHF 訓練語料金礦。';
+
+CREATE INDEX IF NOT EXISTS idx_ai_collaboration_trace_op
+    ON ai_collaboration_trace(op_id, step_order);
+CREATE INDEX IF NOT EXISTS idx_ai_collaboration_trace_agent_time
+    ON ai_collaboration_trace(agent, created_at DESC);
+
+
+-- ============================================================================
+-- Step 12: 驗收查詢 (執行後手動跑,驗證 11 張表都到位)
+-- ============================================================================
+
+-- SELECT table_name
+-- FROM information_schema.tables
+-- WHERE table_schema = 'public'
+--   AND table_name IN (
+--       'asset_inventory',
+--       'asset_discovery_run',
+--       'asset_coverage_snapshot',
+--       'asset_relationship',
+--       'alert_rule_catalog',
+--       'asset_change_event',
+--       'asset_compliance_snapshot',
+--       'host_capacity_snapshot',
+--       'capacity_violation_event',
+--       'automation_operation_log',
+--       'ai_collaboration_trace'
+--   )
+-- ORDER BY table_name;
+-- -- 預期: 11 筆
+
+-- SELECT table_name, COUNT(*) AS column_count
+-- FROM information_schema.columns
+-- WHERE table_schema = 'public'
+--   AND table_name LIKE 'asset_%' OR table_name IN
+--       ('alert_rule_catalog','host_capacity_snapshot','capacity_violation_event',
+--        'automation_operation_log','ai_collaboration_trace')
+-- GROUP BY table_name
+-- ORDER BY table_name;
+
+-- SELECT conname, conrelid::regclass AS table_name
+-- FROM pg_constraint
+-- WHERE conrelid IN (
+--     'asset_inventory'::regclass,
+--     'asset_discovery_run'::regclass,
+--     'asset_coverage_snapshot'::regclass,
+--     'asset_relationship'::regclass,
+--     'alert_rule_catalog'::regclass,
+--     'asset_change_event'::regclass,
+--     'asset_compliance_snapshot'::regclass,
+--     'host_capacity_snapshot'::regclass,
+--     'capacity_violation_event'::regclass,
+--     'automation_operation_log'::regclass,
+--     'ai_collaboration_trace'::regclass
+-- ) AND contype = 'c'  -- CHECK constraints only
+-- ORDER BY table_name, conname;
+
+-- ============================================================================
+-- END OF MIGRATION adr090_asset_inventory_foundation.sql
+-- 預計新增物件: 11 tables + 33 indexes + 20 CHECK constraints + 3 UNIQUE + 16 FK references
+-- 依賴: pgcrypto extension (for gen_random_uuid)
+-- 影響資料: 無 (純 DDL, 不動現有表)
+-- 回滾: 見檔案頭部
+-- ============================================================================
--- a/apps/api/migrations/adr090b_awoooi_migrator_role.sql
+++ b/apps/api/migrations/adr090b_awoooi_migrator_role.sql
@@ -0,0 +1,105 @@
+-- ADR-090-B: awoooi_migrator 限權角色 + 憑證分離
+-- 建立時間: 2026-04-18 台北時區
+-- 建立者: ogt + Claude Opus 4.7 (1M)
+--
+-- 上游: ADR-090 主檔 + feedback_secrets_leak_incidents_2026-04-18
+--
+-- 目的:
+--   1. 把 migration 操作從「應用 superuser」(awoooi) 拆出,避免 CI / AI 腳本需要生產密碼
+--   2. awoooi_migrator 只能 CREATE / ALTER / DROP / INDEX / COMMENT,不能 SELECT / DML
+--   3. 若 migrator 帳號外洩,攻擊者也無法讀取資料,只能結構性破壞 (可 rollback)
+--
+-- 執行者: 統帥 (需 superuser 權限 postgres 執行) — Claude 只起草,不執行
+--
+-- 執行步驟 (請統帥在 188 主機上 psql as postgres 超級使用者):
+--   1. 以 postgres 連上 awoooi_prod
+--   2. 把下方 <RANDOM_STRONG_PASSWORD> 替換為您親自產生的密碼
+--   3. 執行本檔
+--   4. 更新 K8s secret awoooi-secrets 新增 MIGRATION_DATABASE_URL
+--   5. 測試: PGPASSWORD='<new>' psql -h 188 -U awoooi_migrator -d awoooi_prod
+--            → 應可 CREATE TABLE x(); 但不能 SELECT * FROM incidents;
+--
+-- 回滾: DROP OWNED BY awoooi_migrator; DROP ROLE awoooi_migrator;
+
+-- ============================================================================
+-- Step 1: 建立 migrator 角色 (預設無密碼,立即設定)
+-- ============================================================================
+
+DO $$
+BEGIN
+    IF NOT EXISTS (SELECT 1 FROM pg_roles WHERE rolname = 'awoooi_migrator') THEN
+        CREATE ROLE awoooi_migrator WITH LOGIN;
+    END IF;
+END $$;
+
+-- ★ 替換為您親自產生的 32+ 字元隨機密碼 (建議 openssl rand -base64 32) ★
+ALTER ROLE awoooi_migrator WITH PASSWORD '<RANDOM_STRONG_PASSWORD>';
+-- 註: ALTER ROLE 不會寫入 pg_stat_statements log (若有 log_statement=all 請先關掉)
+
+-- ============================================================================
+-- Step 2: 授予 DDL 權限 (CREATE / ALTER / DROP / INDEX / COMMENT)
+-- ============================================================================
+
+-- 允許連線 awoooi_prod
+GRANT CONNECT ON DATABASE awoooi_prod TO awoooi_migrator;
+
+-- 允許在 public schema 建表 / 建 index
+GRANT USAGE, CREATE ON SCHEMA public TO awoooi_migrator;
+
+-- 允許管理所有現有表 (ALTER / DROP / INDEX / COMMENT)
+-- 注意: 這不包含 SELECT / INSERT / UPDATE / DELETE
+GRANT REFERENCES, TRIGGER ON ALL TABLES IN SCHEMA public TO awoooi_migrator;
+
+-- 允許執行所有 funcs (ALTER FUNCTION / DROP FUNCTION 需要)
+GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA public TO awoooi_migrator;
+
+-- 未來新建物件自動繼承上述權限 (對 awoooi 這個 owner 建的物件)
+ALTER DEFAULT PRIVILEGES IN SCHEMA public
+    GRANT REFERENCES, TRIGGER ON TABLES TO awoooi_migrator;
+
+-- 允許使用 pgcrypto / vector 等 extension
+GRANT USAGE ON ALL SEQUENCES IN SCHEMA public TO awoooi_migrator;
+ALTER DEFAULT PRIVILEGES IN SCHEMA public
+    GRANT USAGE, SELECT, UPDATE ON SEQUENCES TO awoooi_migrator;
+
+-- ============================================================================
+-- Step 3: 明確撤銷 DML 權限 (雙重保險,即使以後有誤 grant 也攔得住)
+-- ============================================================================
+
+REVOKE SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public FROM awoooi_migrator;
+
+ALTER DEFAULT PRIVILEGES IN SCHEMA public
+    REVOKE SELECT, INSERT, UPDATE, DELETE ON TABLES FROM awoooi_migrator;
+
+-- ============================================================================
+-- Step 4: 驗收查詢 (執行後手動檢查)
+-- ============================================================================
+
+-- 4.1 角色存在?
+-- SELECT rolname, rolsuper, rolcreatedb, rolcreaterole, rolcanlogin
+-- FROM pg_roles WHERE rolname = 'awoooi_migrator';
+-- -- 預期: rolname=awoooi_migrator, rolcanlogin=t, rolsuper=f
+
+-- 4.2 schema 權限?
+-- SELECT has_schema_privilege('awoooi_migrator','public','CREATE');
+-- -- 預期: t
+
+-- 4.3 DML 權限應該沒有?
+-- SET ROLE awoooi_migrator;
+-- SELECT * FROM incidents LIMIT 1;  -- 預期: ERROR permission denied
+-- RESET ROLE;
+
+-- 4.4 DDL 權限應該有?
+-- SET ROLE awoooi_migrator;
+-- CREATE TABLE test_migrator_check (id INT);
+-- DROP TABLE test_migrator_check;
+-- RESET ROLE;
+-- -- 預期: 兩條都成功
+
+-- ============================================================================
+-- END OF MIGRATION adr090b_awoooi_migrator_role.sql
+-- 安裝後 CI / AI 腳本憑證路徑:
+--   未來所有 migration 使用 MIGRATION_DATABASE_URL (awoooi_migrator)
+--   應用 pod 繼續用 DATABASE_URL (awoooi, 限 DML)
+--   兩條 URL 分別存 K8s secret 的不同 key
+-- ============================================================================
--- a/apps/api/migrations/adr090c_notification_formatted_op_type.sql
+++ b/apps/api/migrations/adr090c_notification_formatted_op_type.sql
@@ -0,0 +1,42 @@
+-- ADR-090-C: automation_operation_log.operation_type 擴充 notification_formatted
+-- 建立時間: 2026-04-18 下午 (台北時區)
+-- 建立者: ogt + Claude Opus 4.7 (1M)
+--
+-- 上游:
+--   - ADR-090 主 schema (adr090_asset_inventory_foundation.sql)
+--   - drift_narrator_service B 方案（LLM 摘要取代 str()[:30]）
+--
+-- 目的:
+--   drift_narrator 每次呼叫 LLM 生成摘要 + 寫 Telegram,
+--   這是一個 AI 動作，必須在 automation_operation_log 留痕。
+--   現有 CHECK 沒有合適的 operation_type,新增 notification_formatted。
+--
+-- Idempotent:
+--   先 DROP CONSTRAINT IF EXISTS 再 ADD,重複執行安全。
+--
+-- 執行: PGPASSWORD="$MIGRATOR_PWD" psql -U awoooi_migrator -d awoooi_prod -f 本檔
+-- 回滾: 把 notification_formatted 從 IN 清單移除後重跑。
+
+-- ============================================================================
+
+ALTER TABLE automation_operation_log
+    DROP CONSTRAINT IF EXISTS automation_operation_log_type_valid;
+
+ALTER TABLE automation_operation_log
+    ADD CONSTRAINT automation_operation_log_type_valid CHECK (operation_type IN (
+        'monitor_configured','monitor_removed',
+        'alert_fired','alert_suppressed','alert_routed',
+        'rule_created','rule_updated','rule_matched','rule_rejected','rule_deprecated',
+        'playbook_generated','playbook_updated','playbook_executed',
+        'remediation_executed','remediation_verified','remediation_rolled_back',
+        'self_correction_attempted',
+        'km_created','km_updated','km_linked',
+        'asset_discovered','coverage_recalculated',
+        'capacity_recommendation','quota_enforced',
+        'notification_formatted'  -- ADR-090-C 新增 (drift_narrator / 未來其他通知格式化 AI 動作)
+    ));
+
+-- 驗收查詢 (apply 後可手動跑):
+-- SELECT pg_get_constraintdef(oid) FROM pg_constraint
+-- WHERE conname='automation_operation_log_type_valid';
+-- 應包含 'notification_formatted'
--- a/apps/api/migrations/adr090d_kpi_data_sources.sql
+++ b/apps/api/migrations/adr090d_kpi_data_sources.sql
@@ -0,0 +1,149 @@
+-- ADR-090-D: MASTER §7.1 北極星 KPI 資料源建立
+-- 建立時間: 2026-04-18 晚 (台北時區)
+-- 建立者: ogt + Claude Opus 4.7 (1M)
+--
+-- 背景:
+--   MASTER §7.1 15 個 KPI 對標發現 4 張關鍵表根本沒建立,導致以下 KPI 永遠
+--   量不到:
+--     #3 fine-tune JSONL /week    → finetune_exports 表
+--     #6 Declarative 修復使用率   → remediation_events 表
+--     #10 notification_outcomes   → notification_outcomes 表
+--
+--   此 migration 補齊 3 張資料源表(idempotent)。
+--
+-- 對應 MASTER § 指標:
+--   §3.3 D3 修復抽象(Imperative → Declarative)
+--   §3.4 D4 學習深度(Fine-tune)
+--   §3.6 D6 自我治理(通知品質)
+
+-- ═══════════════════════════════════════════════════════════════════
+-- 1. finetune_exports — Phase 3 Fine-tune JSONL 產出追蹤
+-- ═══════════════════════════════════════════════════════════════════
+
+CREATE TABLE IF NOT EXISTS finetune_exports (
+    export_id         BIGSERIAL       PRIMARY KEY,
+    export_type       TEXT            NOT NULL,  -- 'evidence_snapshot' | 'agent_session' | 'decision_outcome'
+    source_table      TEXT,                      -- 來源表名 (incidents / agent_sessions ...)
+    source_ids        TEXT[],                    -- 涵蓋的 source record ids
+    file_path         TEXT,                      -- 匯出的 JSONL 檔案路徑
+    record_count      INT             NOT NULL DEFAULT 0,
+    size_bytes        BIGINT,
+    checksum_sha256   TEXT,
+    created_at        TIMESTAMPTZ     NOT NULL DEFAULT NOW(),
+    metadata          JSONB           NOT NULL DEFAULT '{}'::jsonb,
+    CONSTRAINT finetune_export_type_valid CHECK (export_type IN (
+        'evidence_snapshot','agent_session','decision_outcome',
+        'incident_rca','playbook_outcome','rlhf_trace'
+    ))
+);
+
+COMMENT ON TABLE finetune_exports IS
+    'ADR-090-D: MASTER §7.1 #3 Fine-tune JSONL 產出追蹤。每次 finetune_exporter 匯出寫一筆。';
+
+CREATE INDEX IF NOT EXISTS idx_finetune_exports_created
+    ON finetune_exports(created_at DESC);
+CREATE INDEX IF NOT EXISTS idx_finetune_exports_type
+    ON finetune_exports(export_type);
+
+
+-- ═══════════════════════════════════════════════════════════════════
+-- 2. remediation_events — Phase 5 Declarative 修復追蹤
+-- ═══════════════════════════════════════════════════════════════════
+
+CREATE TABLE IF NOT EXISTS remediation_events (
+    event_id              BIGSERIAL       PRIMARY KEY,
+    incident_id           TEXT,
+    approval_id           TEXT,
+    remediation_type      TEXT            NOT NULL, -- 'declarative' | 'imperative' | 'gitops_pr' | 'kubectl'
+    action_name           TEXT,
+    target_resource       TEXT,                     -- deployment/awoooi-api 等
+    namespace             TEXT,
+    dry_run               BOOLEAN         NOT NULL DEFAULT false,
+    status                TEXT            NOT NULL, -- 'pending' | 'success' | 'failed' | 'rolled_back'
+    error_message         TEXT,
+    blast_radius_score    INT,
+    duration_ms           INT,
+    executed_by           TEXT,                     -- 'ai_agent' | 'human:ogt' | 'cron'
+    triggered_by_op_id    UUID,                     -- 指向 automation_operation_log.op_id
+    created_at            TIMESTAMPTZ     NOT NULL DEFAULT NOW(),
+    completed_at          TIMESTAMPTZ,
+    metadata              JSONB           NOT NULL DEFAULT '{}'::jsonb,
+    CONSTRAINT remediation_type_valid CHECK (remediation_type IN (
+        'declarative','imperative','gitops_pr','kubectl','ansible','helm','argocd_sync'
+    )),
+    CONSTRAINT remediation_status_valid CHECK (status IN (
+        'pending','success','failed','rolled_back','dry_run_ok','dry_run_failed'
+    ))
+);
+
+COMMENT ON TABLE remediation_events IS
+    'ADR-090-D: MASTER §7.1 #6 Declarative 修復使用率。每次 declarative_remediation 執行寫一筆。';
+
+CREATE INDEX IF NOT EXISTS idx_remediation_events_time
+    ON remediation_events(created_at DESC);
+CREATE INDEX IF NOT EXISTS idx_remediation_events_type
+    ON remediation_events(remediation_type);
+CREATE INDEX IF NOT EXISTS idx_remediation_events_incident
+    ON remediation_events(incident_id) WHERE incident_id IS NOT NULL;
+
+
+-- ═══════════════════════════════════════════════════════════════════
+-- 3. notification_outcomes — 通知成果追蹤
+-- ═══════════════════════════════════════════════════════════════════
+
+CREATE TABLE IF NOT EXISTS notification_outcomes (
+    outcome_id            BIGSERIAL       PRIMARY KEY,
+    incident_id           TEXT,
+    approval_id           TEXT,
+    channel               TEXT            NOT NULL, -- 'telegram' | 'email' | 'slack' | 'webhook'
+    notification_type     TEXT,                     -- TYPE-1/2/3/4/4D/5S/6B/7E/8M
+    recipient             TEXT,                     -- chat_id / email / user
+    message_id            TEXT,                     -- telegram message_id 等
+    sent_at               TIMESTAMPTZ     NOT NULL DEFAULT NOW(),
+    delivery_status       TEXT            NOT NULL, -- 'delivered' | 'failed' | 'pending'
+    delivery_error        TEXT,
+    -- 人類互動追蹤 (RLHF 語料黃金)
+    user_action           TEXT,                     -- 'approved' | 'rejected' | 'silenced' | 'ignored' | 'no_response'
+    user_action_at        TIMESTAMPTZ,
+    user_comment          TEXT,
+    -- 通知品質
+    snoozed_count         INT             NOT NULL DEFAULT 0,
+    time_to_action_sec    INT,                       -- 收到到按鈕按下的秒數
+    metadata              JSONB           NOT NULL DEFAULT '{}'::jsonb,
+    CONSTRAINT notif_channel_valid CHECK (channel IN (
+        'telegram','email','slack','webhook','sms','discord'
+    )),
+    CONSTRAINT notif_delivery_valid CHECK (delivery_status IN (
+        'delivered','failed','pending','rate_limited'
+    ))
+);
+
+COMMENT ON TABLE notification_outcomes IS
+    'ADR-090-D: MASTER §7.1 #10 notification_outcomes 追蹤。每次 telegram_gateway 推送寫一筆,用戶按鈕觸發時 update user_action。';
+
+CREATE INDEX IF NOT EXISTS idx_notification_outcomes_sent
+    ON notification_outcomes(sent_at DESC);
+CREATE INDEX IF NOT EXISTS idx_notification_outcomes_incident
+    ON notification_outcomes(incident_id) WHERE incident_id IS NOT NULL;
+CREATE INDEX IF NOT EXISTS idx_notification_outcomes_approval
+    ON notification_outcomes(approval_id) WHERE approval_id IS NOT NULL;
+CREATE INDEX IF NOT EXISTS idx_notification_outcomes_pending_action
+    ON notification_outcomes(sent_at DESC)
+    WHERE user_action IS NULL AND delivery_status='delivered';
+
+
+-- ═══════════════════════════════════════════════════════════════════
+-- 驗收 (執行後可手動跑)
+-- ═══════════════════════════════════════════════════════════════════
+
+-- SELECT table_name FROM information_schema.tables
+-- WHERE table_schema='public'
+--   AND table_name IN ('finetune_exports','remediation_events','notification_outcomes')
+-- ORDER BY table_name;
+-- 預期: 3 筆
+
+-- SELECT conname FROM pg_constraint WHERE conrelid IN (
+--   'finetune_exports'::regclass,
+--   'remediation_events'::regclass,
+--   'notification_outcomes'::regclass
+-- ) AND contype='c' ORDER BY conname;
--- a/apps/api/migrations/adr091_aider_events_schema.sql
+++ b/apps/api/migrations/adr091_aider_events_schema.sql
@@ -0,0 +1,22 @@
+-- adr091: aider_events schema
+-- 2026-04-20 @ Asia/Taipei
+-- 紀錄統帥本機 aider CLI 活動，供 AI Router feedback + symptom_pattern 抽取
+
+CREATE TABLE IF NOT EXISTS aider_events (
+  id              BIGSERIAL PRIMARY KEY,
+  session_id      TEXT NOT NULL,
+  ts              TIMESTAMPTZ NOT NULL,
+  type            TEXT NOT NULL,                  -- session_start|file_edit|error|commit|silent_timeout|session_end|raw
+  host            TEXT DEFAULT 'ogt-mac',
+  payload         JSONB NOT NULL,
+  incident_id     TEXT,
+  created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
+);
+CREATE INDEX IF NOT EXISTS aider_events_session_idx ON aider_events(session_id);
+CREATE INDEX IF NOT EXISTS aider_events_type_ts_idx ON aider_events(type, ts DESC);
+CREATE INDEX IF NOT EXISTS aider_events_ts_idx ON aider_events(ts DESC);
+CREATE INDEX IF NOT EXISTS aider_events_payload_gin ON aider_events USING GIN (payload);
+
+COMMENT ON TABLE aider_events IS 'aider CLI 事件流（Mac 端 aiderw wrapper 推入）';
+COMMENT ON COLUMN aider_events.incident_id IS '若觸發建 incident，記 FK 至 incidents.incident_id';
+COMMENT ON COLUMN aider_events.payload IS 'Type-specific payload JSON，見 src/models/aider.py schema';
--- a/apps/api/migrations/adr091_rollback.sql
+++ b/apps/api/migrations/adr091_rollback.sql
@@ -0,0 +1,9 @@
+-- adr091 rollback: drop aider_events + indexes
+-- 2026-04-20 @ Asia/Taipei
+-- 僅在 schema 誤套 / 緊急回滾時使用；資料不可復原
+
+DROP INDEX IF EXISTS aider_events_payload_gin;
+DROP INDEX IF EXISTS aider_events_ts_idx;
+DROP INDEX IF EXISTS aider_events_type_ts_idx;
+DROP INDEX IF EXISTS aider_events_session_idx;
+DROP TABLE IF EXISTS aider_events CASCADE;
--- a/apps/api/migrations/fix_playbooks_array_to_jsonb.sql
+++ b/apps/api/migrations/fix_playbooks_array_to_jsonb.sql
@@ -0,0 +1,11 @@
+-- 修正 playbooks 表 text[] 欄位 → jsonb
+-- 原因: ORM 送 JSON type，DB 欄位為 text[]，導致 DatatypeMismatchError
+-- 2026-04-15 ogt + Claude Sonnet 4.6（亞太）: 已手動套用到 prod
+
+ALTER TABLE playbooks ALTER COLUMN source_incident_ids DROP DEFAULT;
+ALTER TABLE playbooks ALTER COLUMN source_incident_ids TYPE jsonb USING to_jsonb(source_incident_ids);
+ALTER TABLE playbooks ALTER COLUMN source_incident_ids SET DEFAULT '[]'::jsonb;
+
+ALTER TABLE playbooks ALTER COLUMN tags DROP DEFAULT;
+ALTER TABLE playbooks ALTER COLUMN tags TYPE jsonb USING to_jsonb(tags);
+ALTER TABLE playbooks ALTER COLUMN tags SET DEFAULT '[]'::jsonb;
--- a/apps/api/migrations/flywheel_playbook_embeddings.sql
+++ b/apps/api/migrations/flywheel_playbook_embeddings.sql
@@ -0,0 +1,27 @@
+-- Phase 4 飛輪修復 (ADR-067 延伸): Playbook Embeddings 持久化表
+-- 2026-04-10 Claude Sonnet 4.6 Asia/Taipei
+-- 目的: 解決冷啟動飛輪斷層 — Playbook 語義相似度查詢
+--
+-- 前置: pgvector extension 已安裝 (phase28_rag_pgvector.sql)
+-- 向量模型: nomic-embed-text (Ollama 192.168.0.188:11434) → 768 維
+--
+-- 索引策略:
+--   < 100 筆: 線性掃描 (無需索引)
+--   > 100 筆: 執行 CREATE INDEX ivfflat (phase35 已示範)
+
+CREATE TABLE IF NOT EXISTS playbook_embeddings (
+    playbook_id   TEXT PRIMARY KEY,
+    embedding     vector(768),                 -- nomic-embed-text 768 維
+    alert_names   TEXT[]    NOT NULL DEFAULT '{}',  -- 索引時的 alert_names 快照
+    keywords      TEXT[]    NOT NULL DEFAULT '{}',  -- 索引時的 keywords 快照
+    indexed_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+    updated_at    TIMESTAMPTZ NOT NULL DEFAULT NOW()
+);
+
+COMMENT ON TABLE playbook_embeddings IS
+    'Playbook 向量索引 — Phase 4 飛輪修復 (2026-04-10) — nomic-embed-text 768 維';
+
+-- 向量近鄰索引 (超過 100 筆後解開)
+-- CREATE INDEX IF NOT EXISTS ix_playbook_embeddings_vec
+--   ON playbook_embeddings USING ivfflat (embedding vector_cosine_ops)
+--   WITH (lists = 10);
--- a/apps/api/migrations/phase10_auto_repair_executions.sql
+++ b/apps/api/migrations/phase10_auto_repair_executions.sql
@@ -0,0 +1,38 @@
+-- Phase 10: Auto Repair Executions 操作記錄表
+-- 建立時間: 2026-04-08 (台北時區)
+-- 建立者: Claude Code — 統帥指令「所有操作都必須被記錄，寫入資料庫」
+--
+-- 設計說明:
+--   自動修復每次執行（成功或失敗）都寫入此表
+--   不依賴 approval_id（自動修復不需要人工批准）
+--   支援查詢: 按 incident / playbook / 時間範圍 / 成功率
+
+CREATE TABLE IF NOT EXISTS auto_repair_executions (
+    -- 主鍵
+    id              VARCHAR(36) PRIMARY KEY DEFAULT gen_random_uuid()::text,
+
+    -- 關聯
+    incident_id     VARCHAR(30) NOT NULL,
+    playbook_id     VARCHAR(36) NOT NULL,
+    playbook_name   VARCHAR(200) NOT NULL,
+
+    -- 執行結果
+    success         BOOLEAN NOT NULL DEFAULT FALSE,
+    executed_steps  JSONB NOT NULL DEFAULT '[]',   -- list of step result strings
+    error_message   TEXT,
+
+    -- 執行上下文
+    triggered_by    VARCHAR(50) NOT NULL DEFAULT 'auto_repair',  -- auto_repair / cold_start_trust
+    similarity_score NUMERIC(5,4),                               -- 匹配相似度
+    risk_level      VARCHAR(20),                                  -- LOW / MEDIUM / HIGH
+    execution_time_ms INTEGER,
+
+    -- 時間戳 (台北時區)
+    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
+);
+
+-- 索引
+CREATE INDEX IF NOT EXISTS ix_are_incident_id   ON auto_repair_executions (incident_id);
+CREATE INDEX IF NOT EXISTS ix_are_playbook_id   ON auto_repair_executions (playbook_id);
+CREATE INDEX IF NOT EXISTS ix_are_created_at    ON auto_repair_executions (created_at DESC);
+CREATE INDEX IF NOT EXISTS ix_are_success       ON auto_repair_executions (success);
--- a/apps/api/migrations/phase11_alert_operation_log.sql
+++ b/apps/api/migrations/phase11_alert_operation_log.sql
@@ -0,0 +1,72 @@
+-- Phase 11: Alert Operation Log — 告警操作完整溯源表
+-- 建立時間: 2026-04-08 (台北時區)
+-- 建立者: Claude Code — 統帥指令「所有操作都必須被記錄，寫入資料庫」
+--
+-- 設計理念: Event Sourcing
+--   每個告警的生命週期，每個事件都寫一筆
+--   不可變 (Immutable) — 只 INSERT，不 UPDATE/DELETE
+--
+-- 事件類型 (event_type):
+--   ALERT_RECEIVED       — Alertmanager/外部告警進來
+--   TELEGRAM_SENT        — 推送 Telegram 審核卡片
+--   USER_ACTION          — 使用者在 Telegram 按按鈕 (approve/reject/silence)
+--   AUTO_REPAIR_TRIGGERED — 自動修復評估通過，準備執行
+--   EXECUTION_STARTED    — 開始執行 K8s/SSH 指令
+--   EXECUTION_COMPLETED  — 執行完成 (success/failure)
+--   TELEGRAM_RESULT_SENT — 自動修復結果推送到 Telegram
+--   RESOLVED             — 告警解除
+--   SILENCED             — 靜默中
+--   ESCALATED            — 升級 (P3→P2 等)
+
+CREATE TYPE alert_event_type AS ENUM (
+    'ALERT_RECEIVED',
+    'TELEGRAM_SENT',
+    'USER_ACTION',
+    'AUTO_REPAIR_TRIGGERED',
+    'EXECUTION_STARTED',
+    'EXECUTION_COMPLETED',
+    'TELEGRAM_RESULT_SENT',
+    'RESOLVED',
+    'SILENCED',
+    'ESCALATED'
+);
+
+CREATE TABLE IF NOT EXISTS alert_operation_log (
+    -- 主鍵 (不可變)
+    id              VARCHAR(36) PRIMARY KEY DEFAULT gen_random_uuid()::text,
+
+    -- 關聯 (所有欄位允許 NULL，避免不同事件強制關聯)
+    incident_id     VARCHAR(30),        -- incidents.incident_id
+    approval_id     VARCHAR(36),        -- approval_records.id
+    audit_log_id    VARCHAR(36),        -- audit_logs.id
+    auto_repair_id  VARCHAR(36),        -- auto_repair_executions.id
+
+    -- 事件核心
+    event_type      alert_event_type NOT NULL,
+    actor           VARCHAR(100),       -- 誰觸發: 'alertmanager' / 'telegram:user_id' / 'auto_repair' / 'system'
+    action_detail   VARCHAR(200),       -- 具體動作: 'approve' / 'reject' / 'silence' / kubectl 指令摘要
+
+    -- 執行結果
+    success         BOOLEAN,            -- NULL=不適用 (如 ALERT_RECEIVED), TRUE/FALSE=有執行結果
+    error_message   TEXT,
+
+    -- 上下文 (結構化存儲)
+    context         JSONB NOT NULL DEFAULT '{}',
+    -- 範例:
+    --   ALERT_RECEIVED: {"alert_name": "KubePodCrashLooping", "severity": "P2", "namespace": "awoooi-prod"}
+    --   USER_ACTION:    {"button": "approve", "telegram_user_id": "12345", "message_id": "67890"}
+    --   EXECUTION:      {"playbook": "restart-deployment", "steps": 3, "duration_ms": 2340}
+
+    -- 時間戳 (台北時區，不可變)
+    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
+);
+
+-- 索引 (查詢模式: 按 incident / 按時間 / 按事件類型)
+CREATE INDEX IF NOT EXISTS ix_aol_incident_id  ON alert_operation_log (incident_id);
+CREATE INDEX IF NOT EXISTS ix_aol_approval_id  ON alert_operation_log (approval_id);
+CREATE INDEX IF NOT EXISTS ix_aol_event_type   ON alert_operation_log (event_type);
+CREATE INDEX IF NOT EXISTS ix_aol_created_at   ON alert_operation_log (created_at DESC);
+CREATE INDEX IF NOT EXISTS ix_aol_actor        ON alert_operation_log (actor);
+
+COMMENT ON TABLE alert_operation_log IS
+'告警操作完整溯源 — Event Sourcing，不可變，每個告警生命週期的每個事件一筆記錄';
--- a/apps/api/migrations/phase11b_backfill_alert_operation_log.sql
+++ b/apps/api/migrations/phase11b_backfill_alert_operation_log.sql
@@ -0,0 +1,152 @@
+-- Phase 11b: 歷史數據回填 alert_operation_log
+-- 建立時間: 2026-04-08 (台北時區)
+-- 建立者: Claude Code — 統帥指令「把之前所有的告警訊息，通通寫入資料庫」
+--
+-- 資料來源:
+--   incidents (14筆) → ALERT_RECEIVED 事件
+--   approval_records (265筆) → TELEGRAM_SENT + USER_ACTION 事件
+--   audit_logs (110筆) → EXECUTION_STARTED + EXECUTION_COMPLETED 事件
+--
+-- 注意: 使用 ON CONFLICT DO NOTHING 避免重複執行
+
+-- ============================================================
+-- Step 1: incidents → ALERT_RECEIVED
+-- ============================================================
+INSERT INTO alert_operation_log (
+    id, incident_id, event_type, actor, action_detail, success, context, created_at
+)
+SELECT
+    gen_random_uuid()::text,
+    incident_id,
+    'ALERT_RECEIVED',
+    COALESCE(source, 'alertmanager'),
+    COALESCE(
+        signals->0->>'alert_name',
+        'unknown'
+    ),
+    TRUE,
+    jsonb_build_object(
+        'severity',          severity::text,
+        'status',            status::text,
+        'alert_name',        COALESCE(signals->0->>'alert_name', 'unknown'),
+        'namespace',         COALESCE(signals->0->'labels'->>'namespace', 'default'),
+        'resource',          COALESCE(signals->0->'labels'->>'resource', ''),
+        'message',           COALESCE(signals->0->'annotations'->>'message', ''),
+        'source',            COALESCE(source, 'alertmanager'),
+        'signal_count',      json_array_length(signals),
+        'backfill',          TRUE,
+        'backfill_at',       NOW()::text
+    ),
+    created_at
+FROM incidents
+ON CONFLICT DO NOTHING;
+
+-- ============================================================
+-- Step 2: approval_records → TELEGRAM_SENT (每筆 approval 代表推送了一次卡片)
+-- ============================================================
+INSERT INTO alert_operation_log (
+    id, incident_id, approval_id, event_type, actor, action_detail, success, context, created_at
+)
+SELECT
+    gen_random_uuid()::text,
+    incident_id,
+    id,
+    'TELEGRAM_SENT',
+    'system',
+    'approval_card_sent',
+    TRUE,
+    jsonb_build_object(
+        'action',          action,
+        'risk_level',      risk_level::text,
+        'requested_by',    requested_by,
+        'hit_count',       hit_count,
+        'backfill',        TRUE,
+        'backfill_at',     NOW()::text
+    ),
+    created_at
+FROM approval_records
+ON CONFLICT DO NOTHING;
+
+-- ============================================================
+-- Step 3: approval_records (APPROVED/REJECTED) → USER_ACTION
+-- ============================================================
+INSERT INTO alert_operation_log (
+    id, incident_id, approval_id, event_type, actor, action_detail, success, context, created_at
+)
+SELECT
+    gen_random_uuid()::text,
+    incident_id,
+    id,
+    'USER_ACTION',
+    COALESCE(requested_by, 'unknown'),
+    CASE status::text
+        WHEN 'APPROVED'          THEN 'approve'
+        WHEN 'REJECTED'          THEN 'reject'
+        WHEN 'EXECUTION_SUCCESS' THEN 'approve'
+        WHEN 'EXECUTION_FAILED'  THEN 'approve'
+        ELSE status::text
+    END,
+    CASE status::text
+        WHEN 'APPROVED'          THEN TRUE
+        WHEN 'EXECUTION_SUCCESS' THEN TRUE
+        WHEN 'REJECTED'          THEN FALSE
+        WHEN 'EXECUTION_FAILED'  THEN TRUE  -- 批准了但執行失敗
+        ELSE NULL
+    END,
+    jsonb_build_object(
+        'status',            status::text,
+        'risk_level',        risk_level::text,
+        'rejection_reason',  COALESCE(rejection_reason, ''),
+        'signatures',        signatures,
+        'resolved_at',       COALESCE(resolved_at::text, ''),
+        'backfill',          TRUE,
+        'backfill_at',       NOW()::text
+    ),
+    COALESCE(resolved_at, updated_at, created_at)
+FROM approval_records
+WHERE status::text IN ('APPROVED', 'REJECTED', 'EXECUTION_SUCCESS', 'EXECUTION_FAILED')
+ON CONFLICT DO NOTHING;
+
+-- ============================================================
+-- Step 4: audit_logs → EXECUTION_COMPLETED
+-- ============================================================
+INSERT INTO alert_operation_log (
+    id, approval_id, audit_log_id, event_type, actor, action_detail, success, error_message, context, created_at
+)
+SELECT
+    gen_random_uuid()::text,
+    approval_id,
+    id,
+    'EXECUTION_COMPLETED',
+    COALESCE(executed_by, 'system'),
+    COALESCE(operation_type, 'unknown') || '/' || COALESCE(target_resource, ''),
+    success,
+    error_message,
+    jsonb_build_object(
+        'operation_type',        operation_type,
+        'target_resource',       target_resource,
+        'namespace',             namespace,
+        'execution_duration_ms', execution_duration_ms,
+        'dry_run_passed',        dry_run_passed,
+        'authorization_channel', COALESCE(authorization_channel, ''),
+        'retry_count',           retry_count,
+        'failure_classification', COALESCE(failure_classification, ''),
+        'auto_repair_attempted', auto_repair_attempted,
+        'backfill',              TRUE,
+        'backfill_at',           NOW()::text
+    ),
+    created_at
+FROM audit_logs
+ON CONFLICT DO NOTHING;
+
+-- ============================================================
+-- 驗證結果
+-- ============================================================
+SELECT
+    event_type::text,
+    COUNT(*) as count,
+    MIN(created_at) as oldest,
+    MAX(created_at) as newest
+FROM alert_operation_log
+GROUP BY event_type
+ORDER BY event_type;
--- a/apps/api/migrations/phase26_incident_km_integration.sql
+++ b/apps/api/migrations/phase26_incident_km_integration.sql
@@ -0,0 +1,30 @@
+-- =============================================================================
+-- Phase 26: Incident → KM 完整鏈路補全
+-- 2026-04-06 ogt: 修復三重死鎖 — 告警必須寫入 DB 並建立 KM
+-- =============================================================================
+
+-- 1. approval_records 加入 incident_id 欄位
+ALTER TABLE approval_records
+    ADD COLUMN IF NOT EXISTS incident_id TEXT;
+
+CREATE INDEX IF NOT EXISTS idx_approval_records_incident_id
+    ON approval_records (incident_id)
+    WHERE incident_id IS NOT NULL;
+
+-- 2. incidents 表確保有 source 欄位 (alertmanager / manual 等)
+ALTER TABLE incidents
+    ADD COLUMN IF NOT EXISTS source TEXT DEFAULT 'alertmanager';
+
+-- 3. knowledge_entries 確保有 related_approval_id 欄位
+ALTER TABLE knowledge_entries
+    ADD COLUMN IF NOT EXISTS related_approval_id TEXT;
+
+CREATE INDEX IF NOT EXISTS idx_knowledge_entries_related_approval
+    ON knowledge_entries (related_approval_id)
+    WHERE related_approval_id IS NOT NULL;
+
+-- 完成確認
+DO $$
+BEGIN
+    RAISE NOTICE 'Phase 26 migration completed: incident_id + source + related_approval_id';
+END $$;
--- a/apps/api/migrations/phase27_incident_frequency_snapshot.sql
+++ b/apps/api/migrations/phase27_incident_frequency_snapshot.sql
@@ -0,0 +1,24 @@
+-- Phase 27: Incident Frequency Snapshot 持久化
+-- 2026-04-10 ogt: frequency_stats 只存記憶體/Redis(35天TTL)，重啟或超期即失
+--   解決方案：在 incidents 表加 frequency_snapshot JSONB，建立 incident 時寫入快照
+--   歷史按鈕優先讀 DB 快照，Redis AnomalyCounter 補充長期累積統計
+
+DO $$
+BEGIN
+    IF NOT EXISTS (
+        SELECT 1 FROM information_schema.columns
+        WHERE table_name = 'incidents' AND column_name = 'frequency_snapshot'
+    ) THEN
+        ALTER TABLE incidents ADD COLUMN frequency_snapshot JSONB DEFAULT NULL;
+        COMMENT ON COLUMN incidents.frequency_snapshot IS
+            'Snapshot of AnomalyFrequency at incident creation time. '
+            'Fields: anomaly_key, count_1h, count_24h, count_7d, count_30d, '
+            'escalation_level, auto_repair_count, last_repair_action, '
+            'human_approved_count, manual_resolved_count, cold_start_trust_count, total_resolution_count. '
+            'Added 2026-04-10 (Phase 27).';
+    END IF;
+END $$;
+
+CREATE INDEX IF NOT EXISTS ix_incidents_frequency_snapshot_key
+    ON incidents ((frequency_snapshot->>'anomaly_key'))
+    WHERE frequency_snapshot IS NOT NULL;
--- a/apps/api/migrations/phase28_rag_pgvector.sql
+++ b/apps/api/migrations/phase28_rag_pgvector.sql
@@ -0,0 +1,28 @@
+-- Phase 28 (ADR-067): RAG 知識庫 pgvector 向量表
+-- 2026-04-10 Claude Sonnet 4.6 Asia/Taipei
+-- 前置: pgvector 0.8.2 已安裝於 awoooi_prod ✅
+-- 索引: 初期線性搜尋 (< 100 筆)；超過 100 筆後執行 CREATE INDEX ivfflat
+
+CREATE EXTENSION IF NOT EXISTS vector;
+
+CREATE TABLE IF NOT EXISTS rag_chunks (
+    id          SERIAL PRIMARY KEY,
+    source      TEXT NOT NULL,               -- 來源: "playbook", "incident", "runbook", "adr"
+    source_id   TEXT,                        -- 來源 ID (playbook_id / incident_id 等)
+    title       TEXT NOT NULL,               -- 標題 / 檔名
+    chunk_text  TEXT NOT NULL,               -- 原始文字片段
+    embedding   vector(768),                 -- nomic-embed-text 768維向量
+    metadata    JSONB DEFAULT '{}',          -- 額外 metadata
+    created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+    updated_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
+);
+
+CREATE INDEX IF NOT EXISTS ix_rag_chunks_source ON rag_chunks (source);
+CREATE INDEX IF NOT EXISTS ix_rag_chunks_created ON rag_chunks (created_at DESC);
+
+-- 向量近鄰索引 (超過 100 筆後執行)
+-- CREATE INDEX IF NOT EXISTS ix_rag_chunks_embedding
+--   ON rag_chunks USING ivfflat (embedding vector_cosine_ops)
+--   WITH (lists = 10);
+
+COMMENT ON TABLE rag_chunks IS 'RAG 知識庫向量片段 — Phase 28 ADR-067 (2026-04-10)';
--- a/apps/api/migrations/phase29_pr_reviews.sql
+++ b/apps/api/migrations/phase29_pr_reviews.sql
@@ -0,0 +1,21 @@
+-- Phase 29 (ADR-067): PR 自動審查記錄表
+-- 2026-04-10 Claude Sonnet 4.6 Asia/Taipei
+-- 雙寫: Redis TTL 7d (熱) + PostgreSQL 永久 (冷)
+
+CREATE TABLE IF NOT EXISTS pr_reviews (
+    id              SERIAL PRIMARY KEY,
+    pr_id           TEXT NOT NULL,           -- Gitea PR number (字串化)
+    repo            TEXT NOT NULL,           -- "wooo/awoooi"
+    title           TEXT,                    -- PR 標題
+    diff_size_bytes INTEGER,                 -- diff 大小 (bytes)
+    model           TEXT NOT NULL,           -- qwen2.5-coder:7b / gemini-fallback
+    provider        TEXT NOT NULL DEFAULT 'ollama',
+    review_text     TEXT NOT NULL,           -- 審查全文
+    issues_count    INTEGER DEFAULT 0,       -- 發現問題數
+    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
+);
+
+CREATE INDEX IF NOT EXISTS ix_pr_reviews_pr_id ON pr_reviews (pr_id);
+CREATE INDEX IF NOT EXISTS ix_pr_reviews_created ON pr_reviews (created_at DESC);
+
+COMMENT ON TABLE pr_reviews IS 'PR 自動審查記錄 — Phase 29 ADR-067 (2026-04-10)';
--- a/apps/api/migrations/phase30_drift_narrative.sql
+++ b/apps/api/migrations/phase30_drift_narrative.sql
@@ -0,0 +1,15 @@
+-- Phase 30: Drift 報告 AI 人話摘要欄位
+-- 2026-04-10 Claude Code (ADR-067): DriftNarratorService 寫入 narrative_text
+--   qwen2.5:7b-instruct 生成繁中摘要，儲存於 drift_reports 表
+
+DO $$
+BEGIN
+    IF NOT EXISTS (
+        SELECT 1 FROM information_schema.columns
+        WHERE table_name = 'drift_reports' AND column_name = 'narrative_text'
+    ) THEN
+        ALTER TABLE drift_reports ADD COLUMN narrative_text TEXT DEFAULT NULL;
+        COMMENT ON COLUMN drift_reports.narrative_text IS
+            'AI 生成的繁體中文人話摘要 (qwen2.5:7b-instruct, Phase 30 ADR-067)';
+    END IF;
+END $$;
--- a/apps/api/migrations/phase35_rag_ivfflat_index.sql
+++ b/apps/api/migrations/phase35_rag_ivfflat_index.sql
@@ -0,0 +1,14 @@
+-- Phase 35: RAG ivfflat 向量索引
+-- 前提: rag_chunks 已有 2582+ chunks
+-- 執行: psql awoooi_prod
+-- 2026-04-10 Claude Sonnet 4.6 Asia/Taipei
+
+CREATE INDEX IF NOT EXISTS idx_rag_chunks_embedding
+  ON rag_chunks
+  USING ivfflat (embedding vector_cosine_ops)
+  WITH (lists = 100);
+
+-- 驗證
+SELECT indexname, indexdef
+FROM pg_indexes
+WHERE tablename = 'rag_chunks' AND indexname = 'idx_rag_chunks_embedding';
--- a/apps/api/migrations/phase7_playbooks_table.sql
+++ b/apps/api/migrations/phase7_playbooks_table.sql
@@ -0,0 +1,59 @@
+-- Phase 7: Playbook 萃取功能 — playbooks 資料表
+-- 建立時間: 2026-04-04 (台北時區)
+-- 建立者: Claude Code (Phase 7 補齊 migration)
+-- 對應設計: memory/project_playbook_design.md
+-- 對應模型: apps/api/src/models/playbook.py
+
+CREATE TABLE IF NOT EXISTS playbooks (
+    -- 識別
+    -- 2026-04-04 ogt: 首席架構師 Review — 加 PRIMARY KEY，移除多餘 UNIQUE
+    playbook_id         VARCHAR(32)  PRIMARY KEY,
+
+    -- 元資料
+    name                VARCHAR(256) NOT NULL,
+    description         TEXT         NOT NULL DEFAULT '',
+    status              VARCHAR(32)  NOT NULL DEFAULT 'draft',   -- draft|approved|deprecated
+    source              VARCHAR(32)  NOT NULL DEFAULT 'extracted', -- extracted|manual
+
+    -- 症狀模式 (SymptomPattern JSON)
+    symptom_pattern     JSONB        NOT NULL DEFAULT '{}',
+
+    -- 修復步驟 (list[RepairStep] JSON)
+    repair_steps        JSONB        NOT NULL DEFAULT '[]',
+    estimated_duration_minutes INT  NOT NULL DEFAULT 5,
+
+    -- 來源追溯
+    source_incident_ids TEXT[]       NOT NULL DEFAULT '{}',
+    ai_confidence       DECIMAL(4,3) NOT NULL DEFAULT 0.0,
+
+    -- 統計數據
+    success_count       INT          NOT NULL DEFAULT 0,
+    failure_count       INT          NOT NULL DEFAULT 0,
+    last_used_at        TIMESTAMPTZ,
+
+    -- 人工標記
+    approved_by         VARCHAR(128),
+    approved_at         TIMESTAMPTZ,
+    tags                TEXT[]       NOT NULL DEFAULT '{}',
+    notes               TEXT,
+
+    -- 時間軸
+    created_at          TIMESTAMPTZ  NOT NULL DEFAULT NOW(),
+    updated_at          TIMESTAMPTZ  NOT NULL DEFAULT NOW()
+);
+
+-- 索引
+CREATE INDEX IF NOT EXISTS idx_playbooks_status
+    ON playbooks(status);
+
+CREATE INDEX IF NOT EXISTS idx_playbooks_tags
+    ON playbooks USING GIN(tags);
+
+CREATE INDEX IF NOT EXISTS idx_playbooks_alert_names
+    ON playbooks USING GIN((symptom_pattern->'alert_names'));
+
+CREATE INDEX IF NOT EXISTS idx_playbooks_source_incidents
+    ON playbooks USING GIN(source_incident_ids);
+
+CREATE INDEX IF NOT EXISTS idx_playbooks_created_at
+    ON playbooks(created_at DESC);
--- a/apps/api/migrations/phase8_symptoms_hash.sql
+++ b/apps/api/migrations/phase8_symptoms_hash.sql
@@ -0,0 +1,48 @@
+-- Phase 25 P1: Knowledge Auto-Harvesting — symptoms_hash 欄位
+-- 用於 Anti-Pattern 閉環攔截的確定性症狀 hash
+-- 建立時間: 2026-04-04 (台北時區)
+-- 建立者: Claude Code (Phase 25 P1)
+--
+-- 執行方式: psql -h 192.168.0.188 -U awoooi -d awoooi -f phase8_symptoms_hash.sql
+
+-- 1. knowledge_entries 表新增 symptoms_hash 欄位
+ALTER TABLE knowledge_entries
+    ADD COLUMN IF NOT EXISTS symptoms_hash VARCHAR(16);
+
+-- 2. 建立 index 加速 Anti-Pattern 閘門查詢
+--    查詢條件: entry_type='anti_pattern' AND symptoms_hash=:hash AND created_at>=:cutoff
+CREATE INDEX IF NOT EXISTS idx_knowledge_anti_pattern_hash
+    ON knowledge_entries (entry_type, symptoms_hash, created_at)
+    WHERE entry_type = 'anti_pattern' AND symptoms_hash IS NOT NULL;
+
+-- 3. EntryStatus 新增 PUBLISHED（用於 ANTI_PATTERN 直接發布）
+--    PostgreSQL CHECK constraint 需要重建（若有的話）
+--    若無 constraint，PostgreSQL 的 VARCHAR 欄位可直接存入任意值，無需 ALTER。
+--    確認 status 欄位是否有 CHECK constraint:
+-- SELECT conname, consrc FROM pg_constraint
+--   WHERE conrelid = 'knowledge_entries'::regclass AND contype = 'c';
+
+-- 若有 CHECK constraint（如 status IN ('draft', 'review', 'approved', 'archived')），
+-- 需執行以下（請先確認 constraint 名稱）：
+-- ALTER TABLE knowledge_entries DROP CONSTRAINT IF EXISTS knowledge_entries_status_check;
+-- ALTER TABLE knowledge_entries ADD CONSTRAINT knowledge_entries_status_check
+--   CHECK (status IN ('draft', 'review', 'approved', 'archived', 'published'));
+
+-- 安全執行版本（自動處理 CHECK constraint）：
+DO $$
+DECLARE
+    v_conname text;
+BEGIN
+    SELECT conname INTO v_conname
+    FROM pg_constraint
+    WHERE conrelid = 'knowledge_entries'::regclass AND contype = 'c' AND conname LIKE '%status%';
+
+    IF v_conname IS NOT NULL THEN
+        EXECUTE format('ALTER TABLE knowledge_entries DROP CONSTRAINT %I', v_conname);
+        ALTER TABLE knowledge_entries ADD CONSTRAINT knowledge_entries_status_check
+            CHECK (status IN ('draft', 'review', 'approved', 'archived', 'published'));
+        RAISE NOTICE 'Updated status CHECK constraint: % → added published', v_conname;
+    ELSE
+        RAISE NOTICE 'No status CHECK constraint found, skipping';
+    END IF;
+END $$;
--- a/apps/api/migrations/phase9_drift_reports.sql
+++ b/apps/api/migrations/phase9_drift_reports.sql
@@ -0,0 +1,54 @@
+-- Phase 25 P2: Config Drift Detection — drift_reports 資料表
+-- 建立時間: 2026-04-04 (台北時區)
+-- 建立者: Claude Code (Phase 25 P2)
+-- 對應模型: apps/api/src/models/drift.py
+-- 對應設計: docs/superpowers/specs/2026-04-04-nemotron-active-defense-design.md 方向三
+--
+-- 執行方式: psql -h 192.168.0.188 -U awoooi -d awoooi -f phase9_drift_reports.sql
+
+CREATE TABLE IF NOT EXISTS drift_reports (
+    -- 識別
+    report_id           VARCHAR(32)  PRIMARY KEY,
+
+    -- 掃描資訊
+    namespace           VARCHAR(128) NOT NULL,
+    triggered_by        VARCHAR(64)  NOT NULL DEFAULT 'cron',  -- cron / webhook / api
+    scanned_at          TIMESTAMPTZ  NOT NULL DEFAULT NOW(),
+
+    -- 計數（非正規化，避免每次 JOIN）
+    high_count          INT          NOT NULL DEFAULT 0,
+    medium_count        INT          NOT NULL DEFAULT 0,
+    info_count          INT          NOT NULL DEFAULT 0,
+
+    -- 漂移項目（JSONB 列表）
+    items               JSONB        NOT NULL DEFAULT '[]',
+
+    -- Nemotron 意圖分析
+    interpretation      JSONB,  -- DriftInterpretation，可為 NULL（尚未分析）
+
+    -- 處理狀態
+    status              VARCHAR(32)  NOT NULL DEFAULT 'pending',
+    -- pending / acknowledged / rolled_back / adopted / ignored
+
+    -- 時間軸
+    created_at          TIMESTAMPTZ  NOT NULL DEFAULT NOW(),
+    resolved_at         TIMESTAMPTZ
+);
+
+-- 索引
+CREATE INDEX IF NOT EXISTS idx_drift_reports_namespace
+    ON drift_reports(namespace);
+
+CREATE INDEX IF NOT EXISTS idx_drift_reports_status
+    ON drift_reports(status);
+
+CREATE INDEX IF NOT EXISTS idx_drift_reports_created_at
+    ON drift_reports(created_at DESC);
+
+CREATE INDEX IF NOT EXISTS idx_drift_reports_high_count
+    ON drift_reports(high_count)
+    WHERE high_count > 0;
+
+-- 說明：
+-- 目前 API 使用 in-memory dict 暫存，此表供未來持久化使用
+-- 啟用持久化後，需在 drift.py 的 _recent_reports 操作改為 DB 寫入
--- a/apps/api/migrations/phase_aiops_p1_p2_p6_tables.sql
+++ b/apps/api/migrations/phase_aiops_p1_p2_p6_tables.sql
@@ -0,0 +1,85 @@
+-- AIOps Phase 1 / Phase 2 / Phase 6 — 補齊缺失 DB 表
+-- ADR-081 (P1 EvidenceSnapshot) + ADR-082 (P2 AgentSession) + ADR-087 (P6 GovernanceEvent)
+-- 2026-04-15 ogt + Claude Sonnet 4.6（亞太）: 補齊三張缺失表，全開 P1-P6 必需
+
+-- ============================================================================
+-- 1. incident_evidence — ADR-081 Phase 1 EvidenceSnapshot 持久化
+-- ============================================================================
+CREATE TABLE IF NOT EXISTS incident_evidence (
+    id                      VARCHAR(36)     PRIMARY KEY,
+    incident_id             VARCHAR(30)     NOT NULL,
+    matched_playbook_id     VARCHAR(36),
+    schema_version          VARCHAR(10)     NOT NULL DEFAULT 'v1',
+
+    -- 8D 感官數據
+    k8s_state               JSONB,
+    recent_logs             TEXT,
+    metrics_snapshot        JSONB,
+    recent_deployments      JSONB,
+    business_metrics        JSONB,
+    historical_context      TEXT,
+    peer_health             JSONB,
+    dependency_topology     JSONB,
+    anomaly_context         JSONB,
+
+    -- 感官品質指標
+    mcp_health              JSONB           NOT NULL DEFAULT '{}',
+    collection_duration_ms  INTEGER,
+    sensors_attempted       INTEGER         NOT NULL DEFAULT 0,
+    sensors_succeeded       INTEGER         NOT NULL DEFAULT 0,
+
+    -- LLM 輸入摘要
+    evidence_summary        TEXT,
+
+    -- 執行前後 State
+    pre_execution_state     JSONB,
+    post_execution_state    JSONB,
+    verification_result     VARCHAR(20),
+
+    -- 時間戳
+    collected_at            TIMESTAMPTZ     NOT NULL DEFAULT NOW()
+);
+
+CREATE INDEX IF NOT EXISTS ix_incident_evidence_incident_id    ON incident_evidence (incident_id);
+CREATE INDEX IF NOT EXISTS ix_incident_evidence_collected_at   ON incident_evidence (collected_at);
+CREATE INDEX IF NOT EXISTS ix_incident_evidence_playbook_id    ON incident_evidence (matched_playbook_id);
+
+
+-- ============================================================================
+-- 2. agent_sessions — ADR-082 Phase 2 多 Agent 辯證 Immutable Event Log
+-- ============================================================================
+CREATE TABLE IF NOT EXISTS agent_sessions (
+    id          VARCHAR(36)     PRIMARY KEY,
+    session_id  VARCHAR(36)     NOT NULL,
+    incident_id VARCHAR(50)     NOT NULL,
+    agent_role  VARCHAR(20)     NOT NULL,
+    input_hash  VARCHAR(16)     NOT NULL DEFAULT '',
+    output_json JSONB           NOT NULL DEFAULT '{}',
+    latency_ms  INTEGER         NOT NULL DEFAULT 0,
+    vote        VARCHAR(20)     NOT NULL DEFAULT 'abstain',
+    degraded    BOOLEAN         NOT NULL DEFAULT FALSE,
+    created_at  TIMESTAMPTZ     NOT NULL DEFAULT NOW()
+);
+
+CREATE INDEX IF NOT EXISTS ix_agent_sessions_session_id    ON agent_sessions (session_id);
+CREATE INDEX IF NOT EXISTS ix_agent_sessions_incident_id   ON agent_sessions (incident_id);
+CREATE INDEX IF NOT EXISTS ix_agent_sessions_created_at    ON agent_sessions (created_at);
+CREATE INDEX IF NOT EXISTS ix_agent_sessions_session_role  ON agent_sessions (session_id, agent_role);
+
+
+-- ============================================================================
+-- 3. ai_governance_events — ADR-087 Phase 6 自我治理事件（不可變）
+-- ============================================================================
+CREATE TABLE IF NOT EXISTS ai_governance_events (
+    id           VARCHAR(36)  PRIMARY KEY,
+    event_type   VARCHAR(40)  NOT NULL,
+    triggered_at TIMESTAMPTZ  NOT NULL DEFAULT NOW(),
+    details      JSONB        NOT NULL DEFAULT '{}',
+    resolved     BOOLEAN      NOT NULL DEFAULT FALSE,
+    resolved_at  TIMESTAMPTZ,
+    resolved_by  VARCHAR(100)
+);
+
+CREATE INDEX IF NOT EXISTS ix_ai_governance_events_event_type    ON ai_governance_events (event_type);
+CREATE INDEX IF NOT EXISTS ix_ai_governance_events_triggered_at  ON ai_governance_events (triggered_at);
+CREATE INDEX IF NOT EXISTS ix_ai_governance_events_resolved      ON ai_governance_events (resolved);
--- a/apps/api/migrations/sprint51_alert_log_events.sql
+++ b/apps/api/migrations/sprint51_alert_log_events.sql
@@ -0,0 +1,18 @@
+-- apps/api/migrations/sprint51_alert_log_events.sql
+-- Sprint 5.1 M-003: alert_operation_log ENUM 擴充
+-- 執行者: Claude Sonnet 4.6 / 2026-04-08 Asia/Taipei
+-- ⚠️  ENUM ADD VALUE 不可 rollback，執行前確認已備份
+-- 說明: 新增 8 個 event_type 支援 Guardrail / Pre-flight / MultiSig / 備份追蹤
+
+BEGIN;
+
+ALTER TYPE alert_event_type ADD VALUE IF NOT EXISTS 'GUARDRAIL_BLOCKED';
+ALTER TYPE alert_event_type ADD VALUE IF NOT EXISTS 'PRE_FLIGHT_PASSED';
+ALTER TYPE alert_event_type ADD VALUE IF NOT EXISTS 'PRE_FLIGHT_FAILED';
+ALTER TYPE alert_event_type ADD VALUE IF NOT EXISTS 'BACKUP_TRIGGERED';
+ALTER TYPE alert_event_type ADD VALUE IF NOT EXISTS 'BACKUP_COMPLETED';
+ALTER TYPE alert_event_type ADD VALUE IF NOT EXISTS 'BACKUP_FAILED';
+ALTER TYPE alert_event_type ADD VALUE IF NOT EXISTS 'APPROVAL_ESCALATED';
+ALTER TYPE alert_event_type ADD VALUE IF NOT EXISTS 'CHANGE_APPLIED';
+
+COMMIT;
--- a/apps/api/migrations/sprint51_approval_multisig.sql
+++ b/apps/api/migrations/sprint51_approval_multisig.sql
@@ -0,0 +1,31 @@
+-- apps/api/migrations/sprint51_approval_multisig.sql
+-- Sprint 5.1 M-002: MultiSig 雙簽核支援
+-- 執行者: Claude Sonnet 4.6 / 2026-04-08 Asia/Taipei
+-- 說明: approval_records 新增 approval_level / approval_votes / required_votes
+
+BEGIN;
+
+ALTER TABLE approval_records
+  ADD COLUMN IF NOT EXISTS approval_level VARCHAR(20)
+    DEFAULT 'standard'
+    CHECK (approval_level IN ('standard', 'critical')),
+  ADD COLUMN IF NOT EXISTS approval_votes JSONB
+    DEFAULT '[]'::jsonb,
+  ADD COLUMN IF NOT EXISTS required_votes INTEGER
+    DEFAULT 1;
+
+COMMENT ON COLUMN approval_records.approval_level IS
+  'standard=1票審核, critical=2票MultiSig';
+COMMENT ON COLUMN approval_records.approval_votes IS
+  'JSON array: [{"user_id": "123", "voted_at": "2026-04-08T...", "action": "approve"}]';
+COMMENT ON COLUMN approval_records.required_votes IS
+  'standard=1, critical=2';
+
+-- 現有記錄回填（向後相容）
+UPDATE approval_records
+SET approval_level = 'standard',
+    required_votes = 1,
+    approval_votes = '[]'::jsonb
+WHERE approval_level IS NULL;
+
+COMMIT;
--- a/apps/api/migrations/sprint5r_telegram_message_id.sql
+++ b/apps/api/migrations/sprint5r_telegram_message_id.sql
@@ -0,0 +1,10 @@
+-- Sprint 5R: 批准執行閉環修復 — 新增 Telegram 訊息持久化欄位
+-- 2026-04-09 Claude Sonnet 4.6: C1 架構 Review 修復
+-- 用途: 批准卡片發送後記錄 message_id/chat_id，供後續 editMessageReplyMarkup 移除按鈕
+
+ALTER TABLE approval_records
+  ADD COLUMN IF NOT EXISTS telegram_message_id INTEGER,
+  ADD COLUMN IF NOT EXISTS telegram_chat_id INTEGER;
+
+COMMENT ON COLUMN approval_records.telegram_message_id IS 'Telegram message_id of approval card, used to remove inline keyboard after decision';
+COMMENT ON COLUMN approval_records.telegram_chat_id IS 'Telegram chat_id where approval card was sent';
--- a/apps/api/models.json
+++ b/apps/api/models.json
@@ -1,9 +1,9 @@
 {
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "name": "OpenClaw AI Router Configuration",
-  "version": "1.1.0",
-  "description": "AI 模型路由與備援設定 (ADR-006 + ADR-036 Nemotron)",
-  "updated_at": "2026-03-29",
+  "version": "1.3.0",
+  "description": "AI 模型路由與備援設定 (ADR-006 + ADR-036 Nemotron + D1 ADR-067 五大應用 2026-04-11)",
+  "updated_at": "2026-04-11",

  "default_provider": "ollama",
  "fallback_order": ["ollama", "gemini", "claude"],
@@ -11,15 +11,24 @@

  "providers": {
    "ollama": {
-      "name": "Ollama (Local)",
+      "name": "Ollama (Local M1 Pro)",
      "enabled": true,
      "priority": 1,
-      "endpoint": "http://192.168.0.188:11434",
+      "endpoint": "http://192.168.0.111:11434",
      "api_path": "/api/generate",
      "models": {
-        "default": "qwen2.5:7b-instruct",
-        "rca": "qwen2.5:7b-instruct",
-        "summary": "llama3.2:3b"
+        "default": "deepseek-r1:14b",
+        "rca": "deepseek-r1:14b",
+        "summary": "gemma3:4b",
+        "drift_summary": "qwen2.5:7b-instruct",
+        "drift_intent": "qwen2.5:7b-instruct",
+        "log_anomaly": "deepseek-r1:14b",
+        "nemoclaw": "deepseek-r1:14b",
+        "playbook_draft": "qwen2.5:7b-instruct",
+        "code_review": "qwen2.5-coder:7b",
+        "embedding": "nomic-embed-text",
+        "rag_generate": "qwen2.5:7b-instruct",
+        "image_analysis": "llava:latest"
      },
      "options": {
        "temperature": 0.1,
@@ -27,7 +36,7 @@
        "num_predict": 1024,
        "format": "json"
      },
-      "timeout_seconds": 90,
+      "timeout_seconds": 120,
      "cost": {
        "per_1k_tokens": 0,
        "currency": "USD"
@@ -144,6 +153,50 @@
    }
  },

+  "adr067_ollama_applications": {
+    "description": "ADR-067 五大 Ollama 本地 AI 應用 (Phase 30-34)，endpoint: http://192.168.0.111:11434",
+    "endpoint": "http://192.168.0.111:11434",
+    "applications": {
+      "drift_summary": {
+        "phase": 30,
+        "model": "qwen2.5:7b-instruct",
+        "timeout_seconds": 90,
+        "purpose": "Config Drift 報告中文摘要"
+      },
+      "log_anomaly_summary": {
+        "phase": 31,
+        "model": "deepseek-r1:14b",
+        "timeout_seconds": 120,
+        "purpose": "K8s log 異常摘要，告警後觸發"
+      },
+      "pr_code_review": {
+        "phase": 32,
+        "model": "qwen2.5-coder:7b",
+        "timeout_seconds": 120,
+        "purpose": "Gitea PR 自動審查"
+      },
+      "rag_embed": {
+        "phase": 33,
+        "model": "nomic-embed-text",
+        "dimensions": 768,
+        "timeout_seconds": 30,
+        "purpose": "RAG 知識庫向量化，pgvector 儲存"
+      },
+      "rag_generate": {
+        "phase": 33,
+        "model": "qwen2.5:7b-instruct",
+        "timeout_seconds": 60,
+        "purpose": "RAG 查詢回答生成，top_k=5"
+      },
+      "image_analysis": {
+        "phase": 34,
+        "model": "llava:latest",
+        "timeout_seconds": 60,
+        "purpose": "Telegram 圖片分析"
+      }
+    }
+  },
+
  "use_cases": {
    "rca_analysis": {
      "description": "Root Cause Analysis for alerts",
--- a/apps/api/pyproject.toml
+++ b/apps/api/pyproject.toml
@@ -37,6 +37,15 @@ dependencies = [
    # 請參閱 apps/api/Dockerfile Phase 6.4i 註解
    # Phase 9: Agent Teams - Claude Agent SDK
    "claude-agent-sdk>=0.1.50",
+    # Sprint 5.1 2026-04-08 Claude Sonnet 4.6: Service Registry YAML 讀取
+    "pyyaml>=6.0.0",
+    # Phase 4 ADR-084: 動態異常偵測 (2026-04-15 ogt: 補齊缺失依賴)
+    "statsmodels>=0.14.0",
+    "drain3>=0.9.11",
+    "sse-starlette>=1.8.0",
+    # 2026-04-16 ogt + Claude Sonnet 4.6: SSH MCP sensor 修復 — asyncssh 缺失導致 sensors_succeeded=0
+    # 根因: ssh_provider.py 中 import asyncssh 在 try/except 外，所有 15 個 SSH tool 直接 ImportError
+    "asyncssh>=2.14.0",
 ]

 # [tool.uv.sources]
@@ -104,6 +113,7 @@ ignore_errors = true
 [tool.pytest.ini_options]
 asyncio_mode = "auto"
 testpaths = ["tests"]
+addopts = "-m 'not integration'"
 markers = [
    "integration: 需要外部服務 (Redis/PostgreSQL/K8s) 的整合測試，需在有外部服務的環境執行",
 ]
--- a/apps/api/requirements.txt
+++ b/apps/api/requirements.txt
@@ -43,6 +43,16 @@ opentelemetry-instrumentation-logging>=0.41b0
 # 2026-04-02 Claude Code: 鎖定 v2.60.x — v3.x/v4.x 移除 client.trace() API，與 langfuse_client.py 不相容
 langfuse>=2.0.0,<3.0.0

+# ==========================================================================
+# Phase 4: 動態異常偵測 (ADR-084)
+# ==========================================================================
+# Holt-Winters 指數平滑（動態基線）
+statsmodels>=0.14.0
+# Log clustering（Drain3 演算法）
+drain3>=0.9.11
+# numpy 已為 statsmodels 依賴，顯式列出確保可用（線性趨勢預測）
+numpy>=1.24.0
+
 # Development
 pytest>=7.4.0
 pytest-asyncio>=0.23.0
--- a/apps/api/scripts/update_playbook_alert_variants.py
+++ b/apps/api/scripts/update_playbook_alert_variants.py
@@ -0,0 +1,141 @@
+"""
+Phase 2 飛輪修復：補齊 Playbook alertname 變體
+=================================================
+直接更新 Redis 裡的 Playbook symptom_pattern.alert_names，
+並重建 playbook:index:alert:* 索引。
+
+用法（在 API pod 內執行）:
+    python scripts/update_playbook_alert_variants.py
+
+或從本機執行（需能連 Redis）:
+    AWOOOI_REDIS_URL=redis://192.168.0.188:6380/10 python scripts/update_playbook_alert_variants.py
+
+2026-04-10 Asia/Taipei — Claude Sonnet 4.6
+"""
+
+import asyncio
+import json
+import os
+import sys
+
+import redis
+
+# Playbook 補充的 alertname 變體
+# key: playbook name (用於搜尋), value: 新增的 alertname list
+VARIANTS: dict[str, list[str]] = {
+    "high-cpu-restart": [
+        "HighCPUUsage",
+        "ContainerCpuUsageSecondsTotal",
+        "HostHighCpuLoad",
+        "NodeCPUUsageHigh",
+        "CPUThrottlingHigh",
+        "KubeCPUOvercommit",
+    ],
+    "crashloop-pod-delete": [
+        "KubePodCrashLooping",
+        "PodCrashLoopBackOff",
+        "KubernetesPodCrashLooping",
+    ],
+    "oom-killed-pod-delete": [
+        "PodOOMKilled",
+        "KubePodOOMKilled",
+        "KubernetesMemoryPressure",
+        "NodeMemoryUsageHigh",
+        "HighMemoryUsage",
+    ],
+    "k8s-pod-not-ready-restart": [
+        "KubePodNotReady",
+        "PodNotReady",
+        "KubernetesDeploymentReplicasMismatch",
+    ],
+    "insufficient-replicas-scale": [
+        "KubeDeploymentReplicasMismatch",
+        "InsufficientReplicas",
+        "KubernetesReplicasMismatch",
+    ],
+}
+
+PLAYBOOK_KEY_PREFIX = "playbook:"
+PLAYBOOK_INDEX_ALERT_PREFIX = "playbook:index:alert:"
+PLAYBOOK_TTL_SECONDS = 86400 * 30  # 30 天
+
+
+def get_redis_client() -> redis.Redis:
+    url = os.environ.get("AWOOOI_REDIS_URL", "redis://192.168.0.188:6380/10")
+    return redis.Redis.from_url(url)
+
+
+def update_playbooks(r: redis.Redis) -> None:
+    # 掃描所有 Playbook keys
+    all_keys = [k.decode() for k in r.keys(f"{PLAYBOOK_KEY_PREFIX}PB-*")]
+    print(f"Found {len(all_keys)} playbook keys in Redis")
+
+    updated = 0
+    skipped = 0
+
+    for key in all_keys:
+        raw = r.get(key)
+        if not raw:
+            continue
+
+        pb = json.loads(raw)
+        pb_name = pb.get("name", "")
+
+        if pb_name not in VARIANTS:
+            skipped += 1
+            continue
+
+        target_alerts = VARIANTS[pb_name]
+        sp = pb.get("symptom_pattern", {})
+        current_alerts: list[str] = sp.get("alert_names", [])
+
+        # 合併（保留現有 + 加入新的，去重）
+        merged = list(dict.fromkeys(current_alerts + target_alerts))
+
+        if merged == current_alerts:
+            print(f"  {pb_name}: already up to date, skip")
+            skipped += 1
+            continue
+
+        sp["alert_names"] = merged
+        pb["symptom_pattern"] = sp
+
+        # 寫回 Redis
+        r.set(key, json.dumps(pb, ensure_ascii=False), ex=PLAYBOOK_TTL_SECONDS)
+
+        # 重建 alert index
+        pb_id = pb.get("playbook_id", key.replace(PLAYBOOK_KEY_PREFIX, ""))
+        for alert_name in merged:
+            idx_key = f"{PLAYBOOK_INDEX_ALERT_PREFIX}{alert_name}"
+            r.sadd(idx_key, pb_id)
+            r.expire(idx_key, PLAYBOOK_TTL_SECONDS)
+
+        added = [a for a in merged if a not in current_alerts]
+        print(f"  {pb_name}: added {added}")
+        updated += 1
+
+    print(f"\nDone: {updated} updated, {skipped} skipped")
+
+    # 驗證
+    print("\nVerification:")
+    for check_alert in [
+        "HostHighCpuLoad", "KubernetesPodCrashLooping",
+        "NodeMemoryUsageHigh", "HighMemoryUsage",
+        "KubernetesReplicasMismatch",
+    ]:
+        idx_key = f"{PLAYBOOK_INDEX_ALERT_PREFIX}{check_alert}"
+        members = [m.decode() for m in r.smembers(idx_key)]
+        status = "✅" if members else "❌"
+        print(f"  {status} {check_alert} → {members}")
+
+
+if __name__ == "__main__":
+    r = get_redis_client()
+    try:
+        r.ping()
+        print(f"Redis connected: {os.environ.get('AWOOOI_REDIS_URL', 'redis://192.168.0.188:6380/10')}\n")
+    except Exception as e:
+        print(f"Redis connection failed: {e}")
+        sys.exit(1)
+
+    update_playbooks(r)
--- a/apps/api/src/agents/base.py
+++ b/apps/api/src/agents/base.py
@@ -163,11 +163,13 @@ class BaseAgent(ABC, Generic[T]):
            except json.JSONDecodeError:
                pass

-        # 嘗試 { ... } 格式
-        match = re.search(r"\{[^{}]*\}", text, re.DOTALL)
-        if match:
+        # 嘗試從第一個 { 到最後一個 } 提取（支援巢狀 JSON）
+        # Gate 2: 舊 r"\{[^{}]*\}" 會拒絕巢狀物件，造成所有 Agent LLM 回應解析失敗
+        start = text.find("{")
+        end = text.rfind("}")
+        if start != -1 and end > start:
            try:
-                return json.loads(match.group(0))
+                return json.loads(text[start:end + 1])
            except json.JSONDecodeError:
                pass

--- a/apps/api/src/agents/coordinator_agent.py
+++ b/apps/api/src/agents/coordinator_agent.py
@@ -0,0 +1,342 @@
+"""
+AWOOOI AIOps Phase 2 — Coordinator Agent（指揮官）
+==================================================
+職責：聚合所有 Agent 輸出，做最終決策
+
+輸入：DiagnosisReport + ActionPlan + ReviewVerdict + CriticReport
+輸出：DecisionPackage（recommended_action + confidence + requires_human_approval）
+
+聚合邏輯：
+1. Reviewer REJECT → blocked_actions 全部禁止執行，強制人工審核
+2. Reviewer REQUEST_REVISION → 過濾高 blast_radius 方案，使用 safe_candidates
+3. Critic 有 critical challenge → confidence 降低 CRITIC_PENALTY
+4. 全 Agent degraded → requires_human_approval = True（安全第一）
+5. Diagnostician ABSTAIN 且無有效假設 → requires_human_approval = True
+6. 最終 confidence < HUMAN_ESCALATION_THRESHOLD → requires_human_approval = True
+
+ADR-082: Phase 2 多 Agent 協作
+2026-04-15 ogt + Claude Sonnet 4.6（亞太）: Phase 2 初始建立
+"""
+
+from __future__ import annotations
+
+import time
+from typing import Any
+
+import structlog
+
+from src.agents.base import BaseAgent
+from src.agents.protocol import (
+    ActionPlan,
+    AgentRole,
+    AgentSessionStatus,
+    AgentVote,
+    CriticReport,
+    DecisionPackage,
+    DiagnosisReport,
+    ReviewVerdict,
+)
+
+logger = structlog.get_logger(__name__)
+
+# confidence 低於此閾值 → 強制人工審核
+HUMAN_ESCALATION_THRESHOLD = 0.4
+
+# Critic critical challenge 懲罰係數
+CRITIC_PENALTY = 0.3
+
+
+class CoordinatorAgent(BaseAgent):
+    """
+    Coordinator Agent — 指揮官（最終決策者）
+
+    Usage:
+        agent = CoordinatorAgent()
+        package = await agent.run(diagnosis, plan, verdict, critic)
+    """
+
+    AGENT_NAME = AgentRole.COORDINATOR.value
+    AGENT_DESCRIPTION = (
+        "Final decision synthesizer. Aggregates all agent outputs into "
+        "a single actionable DecisionPackage."
+    )
+
+    async def run(
+        self,
+        diagnosis: DiagnosisReport,
+        plan: ActionPlan,
+        verdict: ReviewVerdict,
+        critic: CriticReport,
+    ) -> DecisionPackage:
+        """
+        聚合所有 Agent 輸出，決定最終行動。
+
+        Args:
+            diagnosis: Diagnostician 輸出
+            plan: Solver 輸出
+            verdict: Reviewer 安全審查結果
+            critic: Critic 批判性審查結果
+
+        Returns:
+            DecisionPackage（不熔斷，Coordinator 必須輸出決策）
+        """
+        start_ms = int(time.monotonic() * 1000)
+
+        package = self._synthesize(diagnosis, plan, verdict, critic)
+        package.latency_ms = int(time.monotonic() * 1000) - start_ms
+
+        logger.info(
+            "coordinator_done",
+            recommended_action=package.recommended_action,
+            confidence=package.confidence,
+            requires_human=package.requires_human_approval,
+            all_degraded=package.all_agents_degraded,
+            session_status=package.session_status,
+            latency_ms=package.latency_ms,
+        )
+        return package
+
+    def _synthesize(
+        self,
+        diagnosis: DiagnosisReport,
+        plan: ActionPlan,
+        verdict: ReviewVerdict,
+        critic: CriticReport,
+    ) -> DecisionPackage:
+        """核心聚合邏輯（純函數，無 LLM，不可失敗）。"""
+        top = plan.top_candidate
+        base_confidence = top.confidence if top else 0.0
+        selected = top
+
+        # ── 1. Reviewer REJECT → 最優先安全門 ──────────────────────────
+        if verdict.vote == AgentVote.REJECT:
+            return DecisionPackage(
+                recommended_action=None,
+                confidence=0.0,
+                requires_human_approval=True,
+                debate_summary=_build_summary(diagnosis, plan, verdict, critic),
+                session_status=AgentSessionStatus.COMPLETED,
+                latency_ms=0,
+                diagnosis=diagnosis,
+                action_plan=plan,
+                reviewer_verdict=verdict,
+                critic_report=critic,
+                blocked_reason=f"Reviewer 拒絕：{verdict.reason}",
+            )
+
+        # ── 2. Reviewer REQUEST_REVISION → 強制人工審核（Solver 未修訂，不可自動執行）─
+        # Gate 2: REQUEST_REVISION 代表「請 Solver 重新設計方案」，此 Phase 無迭代機制
+        # → 保留 safe_candidates 供人工參考，但 requires_human_approval 必須 True
+        if verdict.vote == AgentVote.REQUEST_REVISION:
+            safe_candidates = [
+                c for c in plan.candidates
+                if c.action not in verdict.blocked_actions
+            ]
+            selected = safe_candidates[0] if safe_candidates else None
+
+            # 2026-04-17 ogt + Claude Sonnet 4.6: 無安全候選 → 回退 Solver 原始最優方案
+            # 根因：safe_candidates=[] → selected=None → recommended_action=None
+            #       → decision_manager action="" → TG 卡顯示「待分析」（資訊流斷裂）
+            # 修復：強制輸出 Solver 原始最優建議（標記 [Reviewer 未核准，僅供參考]）
+            #       資訊流絕不可斷，SRE 永遠需要看到 AI 的建議作為參考依據
+            _all_blocked = (selected is None and bool(plan.candidates))
+            if selected is None and plan.top_candidate:
+                selected = plan.top_candidate
+
+            base_confidence = selected.confidence if selected else 0.0
+            if selected:
+                _recommended = (
+                    f"[Reviewer 未核准，僅供參考] {selected.action}"
+                    if _all_blocked
+                    else selected.action
+                )
+            else:
+                _recommended = "(無可用方案，請人工研判根因後執行)"
+
+            return DecisionPackage(
+                recommended_action=_recommended,
+                confidence=base_confidence,
+                requires_human_approval=True,
+                debate_summary=_build_summary(diagnosis, plan, verdict, critic),
+                session_status=AgentSessionStatus.COMPLETED,
+                latency_ms=0,
+                diagnosis=diagnosis,
+                action_plan=plan,
+                reviewer_verdict=verdict,
+                critic_report=critic,
+                blocked_reason=f"Reviewer REQUEST_REVISION：{verdict.reason}",
+            )
+
+        # ── 3. Critic REJECT（critical challenge）→ 硬閘強制人工 ─────────
+        # 驗證發現：penalty 策略（0.82-0.30=0.52）仍可穿透 0.4 閾值
+        # Critic 投 REJECT 代表「這個決策不能執行」，應等同 Reviewer REJECT 效力
+        if critic.vote == AgentVote.REJECT:
+            top_challenge = critic.challenges[0] if critic.challenges else None
+            return DecisionPackage(
+                recommended_action=selected.action if selected else None,
+                confidence=base_confidence,
+                requires_human_approval=True,
+                debate_summary=_build_summary(diagnosis, plan, verdict, critic),
+                session_status=AgentSessionStatus.COMPLETED,
+                latency_ms=0,
+                diagnosis=diagnosis,
+                action_plan=plan,
+                reviewer_verdict=verdict,
+                critic_report=critic,
+                blocked_reason=(
+                    f"Critic REJECT：{top_challenge.argument[:100]}"
+                    if top_challenge else "Critic 強烈反對此方案"
+                ),
+            )
+
+        # ── 3.5 Critic major/minor challenge → 信心懲罰（軟降，不強制人工）
+        adjusted_confidence = base_confidence
+        if critic.has_critical_challenge:
+            # has_critical_challenge 為 True 但 vote != REJECT 理論上不應發生
+            # 保留 penalty 作為 defense-in-depth
+            adjusted_confidence = max(0.0, base_confidence - CRITIC_PENALTY)
+            logger.info(
+                "coordinator_critic_penalty",
+                before=base_confidence,
+                after=adjusted_confidence,
+            )
+
+        # ── 4. 全 Agent 降級 → 強制人工 ──────────────────────────────────
+        # Gate 2: 原本遺漏 verdict.degraded，Reviewer 熔斷時 all_degraded 被低估
+        all_degraded = diagnosis.degraded and plan.degraded and verdict.degraded and critic.degraded
+        if all_degraded:
+            return DecisionPackage(
+                recommended_action=selected.action if selected else None,
+                confidence=adjusted_confidence,
+                requires_human_approval=True,
+                debate_summary=_build_summary(diagnosis, plan, verdict, critic),
+                session_status=AgentSessionStatus.DEGRADED,
+                latency_ms=0,
+                diagnosis=diagnosis,
+                action_plan=plan,
+                reviewer_verdict=verdict,
+                critic_report=critic,
+                blocked_reason="所有 Agent 皆降級，信心不可信，強制人工審核",
+                all_agents_degraded=True,
+            )
+
+        # ── 5. Diagnostician 無有效假設 → 強制人工 ───────────────────────
+        if not diagnosis.hypotheses or diagnosis.vote == AgentVote.ABSTAIN:
+            return DecisionPackage(
+                recommended_action=selected.action if selected else None,
+                confidence=adjusted_confidence,
+                requires_human_approval=True,
+                debate_summary=_build_summary(diagnosis, plan, verdict, critic),
+                session_status=AgentSessionStatus.DEGRADED,
+                latency_ms=0,
+                diagnosis=diagnosis,
+                action_plan=plan,
+                reviewer_verdict=verdict,
+                critic_report=critic,
+                blocked_reason="Diagnostician ABSTAIN：感官情報不足，需人工判斷根因",
+            )
+
+        # ── 6. confidence 低於閾值 → 強制人工 ────────────────────────────
+        if adjusted_confidence < HUMAN_ESCALATION_THRESHOLD:
+            return DecisionPackage(
+                recommended_action=selected.action if selected else None,
+                confidence=adjusted_confidence,
+                requires_human_approval=True,
+                debate_summary=_build_summary(diagnosis, plan, verdict, critic),
+                session_status=AgentSessionStatus.DEGRADED,
+                latency_ms=0,
+                diagnosis=diagnosis,
+                action_plan=plan,
+                reviewer_verdict=verdict,
+                critic_report=critic,
+                blocked_reason=(
+                    f"信心 {adjusted_confidence:.0%} < 閾值 "
+                    f"{HUMAN_ESCALATION_THRESHOLD:.0%}，需人工確認"
+                ),
+            )
+
+        # ── 7. 自動執行 ────────────────────────────────────────────────────
+        return DecisionPackage(
+            recommended_action=selected.action if selected else None,
+            confidence=adjusted_confidence,
+            requires_human_approval=False,
+            debate_summary=_build_summary(diagnosis, plan, verdict, critic),
+            session_status=AgentSessionStatus.COMPLETED,
+            latency_ms=0,
+            diagnosis=diagnosis,
+            action_plan=plan,
+            reviewer_verdict=verdict,
+            critic_report=critic,
+        )
+
+    def _build_prompt(self, _context: dict[str, Any]) -> str:
+        return ""  # Coordinator 不使用 LLM（純規則聚合）
+
+    def _parse_response(self, _response: str) -> dict[str, Any]:
+        return {}
+
+    def analyze(self, context: dict[str, Any]) -> Any:
+        raise NotImplementedError("Use run() for Phase 2 agents")
+
+
+# ─────────────────────────────────────────────────────────────────────────────
+# Helpers
+# ─────────────────────────────────────────────────────────────────────────────
+
+def _build_summary(
+    diagnosis: DiagnosisReport,
+    plan: ActionPlan,
+    verdict: ReviewVerdict,
+    critic: CriticReport,
+) -> str:
+    """產生 debate_summary（結構化文字，限 1000 字）。"""
+    parts = []
+
+    top_h = diagnosis.top_hypothesis
+    if top_h:
+        parts.append(
+            f"診斷：{top_h.description[:200]}（信心 {top_h.confidence:.0%}，"
+            f"{'降級' if diagnosis.degraded else '正常'}）"
+        )
+    else:
+        parts.append("診斷：無有效假設（ABSTAIN）")
+
+    top_c = plan.top_candidate
+    if top_c:
+        parts.append(
+            f"方案：{top_c.action[:100]}（blast_radius={top_c.blast_radius}，"
+            f"{'降級' if plan.degraded else '正常'}）"
+        )
+    else:
+        parts.append("方案：無候選動作")
+
+    parts.append(
+        f"安全審查：{verdict.vote.value}（{verdict.reason[:100]}，"
+        f"{'降級' if verdict.degraded else '正常'}）"
+    )
+
+    if critic.challenges:
+        top_ch = critic.challenges[0]
+        parts.append(
+            f"質疑：{top_ch.severity} — {top_ch.argument[:150]}（"
+            f"共 {critic.challenge_count} 項，"
+            f"{'降級' if critic.degraded else '正常'}）"
+        )
+    else:
+        parts.append(f"質疑：無（{'降級' if critic.degraded else '通過審查'}）")
+
+    return "；".join(parts)[:1000]
+
+
+# ─────────────────────────────────────────────────────────────────────────────
+# Singleton
+# ─────────────────────────────────────────────────────────────────────────────
+
+_agent: CoordinatorAgent | None = None
+
+
+def get_coordinator_agent() -> CoordinatorAgent:
+    global _agent
+    if _agent is None:
+        _agent = CoordinatorAgent()
+    return _agent
--- a/apps/api/src/agents/critic_agent.py
+++ b/apps/api/src/agents/critic_agent.py
@@ -0,0 +1,228 @@
+"""
+AWOOOI AIOps Phase 2 — Critic Agent（質疑者）
+=============================================
+職責：刻意唱反調，防止幻覺與 echo chamber
+
+輸入：DiagnosisReport + ActionPlan（兩者都看）
+輸出：CriticReport（challenges[] 列表 + overall_assessment）
+
+設計原則：
+1. Critic 的工作是找漏洞，不是說好話（防 sycophancy）
+2. prompt 強制要求批判性思維：「如果診斷是錯的，還有哪 3 種可能？」
+3. challenge_count > 0 是 Phase 2 退出條件之一
+4. Critic 連續 3 次找到 Diagnostician 嚴重漏洞 → 觸發 Diagnostician 狀態不穩（Phase 4 實作）
+5. 熔斷降級：LLM 失敗 → 輸出空 challenges（不阻塞 Coordinator）
+6. Critic 和 Reviewer 並行執行（都不阻塞對方）
+
+ADR-082: Phase 2 多 Agent 協作
+2026-04-15 ogt + Claude Sonnet 4.6（亞太）: Phase 2 初始建立
+"""
+
+from __future__ import annotations
+
+import hashlib
+import time
+from typing import Any
+
+import structlog
+
+from src.agents.base import BaseAgent
+from src.agents.protocol import (
+    ActionPlan,
+    AgentRole,
+    AgentVote,
+    Challenge,
+    CriticReport,
+    DiagnosisReport,
+)
+from src.services.sanitization_service import sanitize
+
+logger = structlog.get_logger(__name__)
+
+# Critic 挑戰數量上限（防止 LLM 生成無限質疑）
+MAX_CHALLENGES = 5
+
+
+class CriticAgent(BaseAgent):
+    """
+    Critic Agent — 系統性懷疑論者
+
+    Usage:
+        agent = CriticAgent()
+        report = await agent.run(diagnosis, plan)
+    """
+
+    AGENT_NAME = AgentRole.CRITIC.value
+    AGENT_DESCRIPTION = (
+        "Devil's advocate. Challenges diagnosis and proposed actions to prevent "
+        "hallucination and echo chamber effects."
+    )
+
+    async def run(
+        self,
+        diagnosis: DiagnosisReport,
+        plan: ActionPlan,
+        timeout_sec: float = 0.0,  # noqa: ARG002 — 已廢棄，保留簽名相容性
+    ) -> CriticReport:
+        """
+        批判性審查診斷和方案。
+
+        Args:
+            diagnosis: Diagnostician 輸出
+            plan: Solver 輸出
+            timeout_sec: 已廢棄 (2026-04-16 ogt) — LLM 等完整回應，真實異常才降級
+
+        Returns:
+            CriticReport（真實異常時 degraded=True）
+        """
+        start_ms = int(time.monotonic() * 1000)
+
+        try:
+            report = await self._critique(diagnosis, plan)
+            report.latency_ms = int(time.monotonic() * 1000) - start_ms
+            logger.info(
+                "critic_done",
+                challenges=report.challenge_count,
+                has_critical=report.has_critical_challenge,
+                vote=report.vote,
+                latency_ms=report.latency_ms,
+            )
+            return report
+
+        except Exception:
+            latency = int(time.monotonic() * 1000) - start_ms
+            logger.exception("critic_error")
+            return self._degraded_report(latency, "error")
+
+    async def _critique(
+        self,
+        diagnosis: DiagnosisReport,
+        plan: ActionPlan,
+    ) -> CriticReport:
+        """LLM 批判性推理。"""
+        top_hypothesis = diagnosis.top_hypothesis
+        top_candidate = plan.top_candidate
+
+        prompt = self._build_prompt({
+            "hypothesis": top_hypothesis.description if top_hypothesis else "（無假設）",
+            "action": top_candidate.action if top_candidate else "（無方案）",
+            "confidence": top_hypothesis.confidence if top_hypothesis else 0.0,
+        })
+
+        from src.services.openclaw import get_openclaw
+        openclaw = get_openclaw()
+        response_text, _provider, success = await openclaw.call(prompt)
+
+        if not success or not response_text:
+            return self._degraded_report(0, "llm_failed")
+
+        parsed = self._parse_response(sanitize(response_text, "critic_output"))
+        challenges = _extract_challenges(parsed)
+
+        # 有 critical challenge → vote = REJECT
+        vote = AgentVote.REJECT if any(c.severity == "critical" for c in challenges) else AgentVote.APPROVE
+
+        return CriticReport(
+            challenges=challenges,
+            overall_assessment=str(parsed.get("overall_assessment", ""))[:1000],
+            latency_ms=0,
+            vote=vote,
+        )
+
+    def _build_prompt(self, context: dict[str, Any]) -> str:
+        return f"""你是 AWOOOI SRE 系統的質疑者 Agent（Critic）。
+
+你的工作是：找出診斷和方案的弱點。不是說好話，是找漏洞。
+
+當前診斷：{context.get("hypothesis", "")}
+當前方案：{context.get("action", "")}
+診斷信心：{context.get("confidence", 0.0):.0%}
+
+必須回答以下問題（每個問題產出一個 challenge）：
+1. 如果這個診斷是錯的，還有哪些可能的根因？
+2. 這個方案有什麼副作用或風險？
+3. 是否有更好的替代方案被忽略了？
+
+每個 challenge 標記嚴重度：
+- "minor"：小瑕疵，不影響執行
+- "major"：值得 Coordinator 考慮，但不是阻擋條件
+- "critical"：嚴重邏輯漏洞，必須阻止此方案執行
+
+以 JSON 回覆：
+{{
+  "challenges": [
+    {{
+      "target": "diagnosis",
+      "argument": "可能是 OOM 但也可能是 code bug，需要看 GC logs 確認",
+      "severity": "major"
+    }}
+  ],
+  "overall_assessment": "診斷可信但方案風險偏高"
+}}"""
+
+    def _parse_response(self, response: str) -> dict[str, Any]:
+        return self._extract_json(response)
+
+    def analyze(self, context: dict[str, Any]) -> Any:
+        raise NotImplementedError("Use run() for Phase 2 agents")
+
+    def _degraded_report(
+        self,
+        latency_ms: int,
+        reason: str = "unknown",
+    ) -> CriticReport:
+        """熔斷降級：輸出空 challenges（不阻塞 Coordinator）"""
+        return CriticReport(
+            challenges=[],
+            overall_assessment=f"[降級] Critic LLM 失敗（{reason}），跳過批判性審查",
+            latency_ms=latency_ms,
+            vote=AgentVote.ABSTAIN,
+            degraded=True,
+        )
+
+
+# ─────────────────────────────────────────────────────────────────────────────
+# Helpers
+# ─────────────────────────────────────────────────────────────────────────────
+
+def _extract_challenges(parsed: dict[str, Any]) -> list[Challenge]:
+    """從 LLM 解析結果提取 challenges（按嚴重度排序）。"""
+    raw = parsed.get("challenges", [])
+    challenges = []
+    severity_order = {"critical": 0, "major": 1, "minor": 2}
+
+    for item in raw:
+        if not isinstance(item, dict):
+            continue
+        c = Challenge(
+            target=str(item.get("target", "unknown"))[:50],
+            argument=str(item.get("argument", ""))[:500],
+            severity=item.get("severity", "minor") if item.get("severity") in severity_order else "minor",
+        )
+        challenges.append(c)
+
+    challenges.sort(key=lambda c: severity_order.get(c.severity, 2))
+    return challenges[:MAX_CHALLENGES]
+
+
+def compute_input_hash(diagnosis: DiagnosisReport, plan: ActionPlan) -> str:
+    key = diagnosis.evidence_snapshot_id + (
+        diagnosis.top_hypothesis.description if diagnosis.top_hypothesis else ""
+    ) + (
+        plan.top_candidate.action if plan.top_candidate else ""
+    )
+    return hashlib.sha256(key.encode()).hexdigest()[:16]
+
+
+# ─────────────────────────────────────────────────────────────────────────────
+# Singleton
+# ─────────────────────────────────────────────────────────────────────────────
+
+_agent: CriticAgent | None = None
+
+
+def get_critic_agent() -> CriticAgent:
+    global _agent
+    if _agent is None:
+        _agent = CriticAgent()
+    return _agent
--- a/apps/api/src/agents/diagnostician_agent.py
+++ b/apps/api/src/agents/diagnostician_agent.py
@@ -0,0 +1,300 @@
+"""
+AWOOOI AIOps Phase 2 — Diagnostician Agent（偵探）
+==================================================
+職責：RCA 根因分析
+
+輸入：EvidenceSnapshot（8D 感官情報）
+輸出：DiagnosisReport（多根因假設，含 confidence + evidence_chain）
+
+設計原則：
+1. 只做診斷，不提解法（Solver 的工作）
+2. top-1 confidence < 0.4 → vote = ABSTAIN（情報不足，回傳 Coordinator 判斷）
+3. 熔斷降級：LLM 失敗 / 超時 → rule-based mock（以 alert_category 作簡單假設）
+4. 所有 LLM 輸出過 SanitizationService（防 Prompt Injection）
+
+ADR-082: Phase 2 多 Agent 協作
+2026-04-15 ogt + Claude Sonnet 4.6（亞太）: Phase 2 初始建立
+"""
+
+from __future__ import annotations
+
+import hashlib
+import json
+import time
+from typing import TYPE_CHECKING, Any
+
+import structlog
+
+from src.agents.base import BaseAgent, AgentResult, AgentStatus
+from src.agents.protocol import (
+    AgentRole,
+    AgentVote,
+    DiagnosisReport,
+    Hypothesis,
+)
+from src.services.sanitization_service import sanitize
+
+if TYPE_CHECKING:
+    from src.services.evidence_snapshot import EvidenceSnapshot
+
+logger = structlog.get_logger(__name__)
+
+# 每個假設的最大 evidence chain 長度（防超 token）
+MAX_EVIDENCE_CHAIN = 5
+
+# Confidence 閾值 — 低於此值 vote = ABSTAIN
+ABSTAIN_CONFIDENCE_THRESHOLD = 0.4
+
+
+class DiagnosticianAgent(BaseAgent):
+    """
+    Diagnostician Agent — RCA 根因分析偵探
+
+    Usage:
+        agent = DiagnosticianAgent()
+        report = await agent.run(snapshot)
+    """
+
+    AGENT_NAME = AgentRole.DIAGNOSTICIAN.value
+    AGENT_DESCRIPTION = "Root Cause Analysis specialist. Produces multiple hypotheses with confidence scores."
+
+    async def run(
+        self,
+        snapshot: "EvidenceSnapshot",
+        timeout_sec: float = 0.0,  # noqa: ARG002 — 已廢棄，保留簽名相容性
+    ) -> DiagnosisReport:
+        """
+        執行根因分析。
+
+        Args:
+            snapshot: Phase 1 感官快照
+            timeout_sec: 已廢棄（2026-04-16 ogt + Claude Sonnet 4.6 — LLM 必須等完整回應）
+                         降級只在真正異常（連線失敗、模型崩潰）時觸發，
+                         全流程由 Orchestrator GLOBAL_TIMEOUT_SEC 防掛死
+
+        Returns:
+            DiagnosisReport（真實異常時 degraded=True，vote=ABSTAIN）
+        """
+        start_ms = int(time.monotonic() * 1000)
+
+        try:
+            report = await self._analyze(snapshot)
+            report.latency_ms = int(time.monotonic() * 1000) - start_ms
+            logger.info(
+                "diagnostician_done",
+                snapshot_id=snapshot.snapshot_id,
+                hypotheses=len(report.hypotheses),
+                top_confidence=report.top_confidence,
+                vote=report.vote,
+                latency_ms=report.latency_ms,
+            )
+            return report
+
+        except Exception:
+            latency = int(time.monotonic() * 1000) - start_ms
+            logger.exception("diagnostician_error")
+            return self._degraded_report(snapshot, latency, reason="error")
+
+    async def _analyze(self, snapshot: "EvidenceSnapshot") -> DiagnosisReport:
+        """核心 LLM 分析邏輯。"""
+        prompt = self._build_prompt({
+            "evidence_summary": snapshot.evidence_summary or "",
+            "anomaly_context": snapshot.anomaly_context,
+        })
+
+        # 2026-04-16 ogt + Claude Sonnet 4.6: 傳遞 snapshot 結構化資料給 OPENCLAW_NEMO
+        # 根因：原本 call(prompt) 不傳 context → nemo fallback 把 prompt[:500]（系統說明）
+        #       當 signal description → LLM 回傳 "調查 AWOOOI SRE 系統的偵探 Agent" 垃圾
+        # 修復：把 snapshot.evidence_summary 放進 alert_context.signals 讓 nemo 看到真實資料
+        _evidence = (snapshot.evidence_summary or "（待感應器資料）")[:800]
+        alert_context = {
+            "incident_id": snapshot.snapshot_id or "UNKNOWN",
+            "severity": "P3",
+            "signals": [{"alert_name": "evidence_snapshot", "description": _evidence}],
+            "affected_services": [],
+        }
+
+        from src.services.openclaw import get_openclaw
+        openclaw = get_openclaw()
+        response_text, _provider, success = await openclaw.call(prompt, alert_context=alert_context)
+
+        if not success or not response_text:
+            return self._degraded_report(snapshot, 0, reason="llm_failed")
+
+        parsed = self._parse_response(sanitize(response_text, "diagnostician_output"))
+        hypotheses = _extract_hypotheses(parsed)
+
+        vote = AgentVote.APPROVE
+        if not hypotheses or hypotheses[0].confidence < ABSTAIN_CONFIDENCE_THRESHOLD:
+            vote = AgentVote.ABSTAIN
+
+        return DiagnosisReport(
+            hypotheses=hypotheses,
+            evidence_snapshot_id=snapshot.snapshot_id or "",
+            latency_ms=0,  # 由 run() 覆蓋
+            vote=vote,
+        )
+
+    def _build_prompt(self, context: dict[str, Any]) -> str:
+        evidence = context.get("evidence_summary", "（無感官情報）")
+        anomaly_context = context.get("anomaly_context")
+
+        # Phase 4 ADR-084: 動態異常感官區塊（有資料才附加，避免空白雜訊）
+        # 2026-04-15 ogt + Claude Sonnet 4.6（亞太）: Phase 4 8D 升級
+        anomaly_section = ""
+        if anomaly_context:
+            import json as _json
+            anomaly_section = f"""
+---
+Phase 4 動態異常偵測（AI 主動巡檢結果，可作為高信心佐證）：
+{_json.dumps(anomaly_context, ensure_ascii=False, indent=2)}
+---"""
+
+        return f"""你是 AWOOOI SRE 系統的偵探 Agent，專職根因分析（Root Cause Analysis）。
+
+你的唯一工作：根據以下感官情報，提出 2-3 個根因假設（hypotheses）。
+不要提修復方案，那是 Solver 的工作。
+每個假設必須：
+1. 有 confidence（0.0-1.0）
+2. 列出支持此假設的 evidence key（限 {MAX_EVIDENCE_CHAIN} 個）
+3. 有 category（K8s Pod / HostDisk / NetworkLatency / DatabaseConnection / 等）
+
+如果感官情報嚴重不足（所有假設 confidence < 0.4），說明原因。
+
+---
+感官情報：
+{evidence}
+---{anomaly_section}
+
+以 JSON 回覆（不要加任何解釋）：
+{{
+  "hypotheses": [
+    {{
+      "description": "假設描述",
+      "confidence": 0.85,
+      "evidence_chain": ["k8s_state.pod_status", "recent_logs.oom_signal"],
+      "category": "KubePodOOM"
+    }}
+  ]
+}}"""
+
+    def _parse_response(self, response: str) -> dict[str, Any]:
+        return self._extract_json(response)
+
+    def analyze(self, context: dict[str, Any]) -> Any:
+        """BaseAgent 抽象方法 — Phase 2 改用 run() 入口。"""
+        raise NotImplementedError("Use run() for Phase 2 agents")
+
+    def _degraded_report(
+        self,
+        snapshot: "EvidenceSnapshot",
+        latency_ms: int,
+        reason: str = "unknown",
+    ) -> DiagnosisReport:
+        """熔斷降級：rule-based mock（用 alert_category 作簡單假設）"""
+        category = _guess_category_from_snapshot(snapshot)
+        return DiagnosisReport(
+            hypotheses=[
+                Hypothesis(
+                    description=f"[降級] 無法完成 LLM 分析（原因: {reason}）。基於告警類別推測: {category}",
+                    confidence=0.2,
+                    evidence_chain=[],
+                    category=category,
+                )
+            ],
+            evidence_snapshot_id=snapshot.snapshot_id or "",
+            latency_ms=latency_ms,
+            vote=AgentVote.ABSTAIN,
+            degraded=True,
+        )
+
+
+# ─────────────────────────────────────────────────────────────────────────────
+# Helpers
+# ─────────────────────────────────────────────────────────────────────────────
+
+def _extract_hypotheses(parsed: dict[str, Any]) -> list[Hypothesis]:
+    """從 LLM 解析結果提取假設列表（按信心降序）。
+
+    支援兩種格式：
+    1. 標準格式：{"hypotheses": [{description, confidence, evidence_chain, category}]}
+    2. OpenClaw Nemo 格式：{"action_title": "...", "risk_level": "...", "confidence": 0.85}
+       （openclaw_nemo 呼叫 ClawBot /api/v1/analyze/incident 回傳）
+
+    2026-04-16 ogt + Claude Sonnet 4.6: 修復 openclaw_nemo 格式不相容
+        根因: ai_router DIAGNOSE→openclaw_nemo 回傳 action_title 格式，
+              diagnostician 只解析 hypotheses 格式 → 永遠 0 hypotheses → ABSTAIN
+    """
+    # OpenClaw Nemo 格式轉換（有 action_title 但無 hypotheses）
+    if "action_title" in parsed and "hypotheses" not in parsed:
+        action_title = str(parsed.get("action_title", ""))
+        confidence = float(parsed.get("confidence", 0.5))
+        risk_level = str(parsed.get("risk_level", "medium"))
+        # risk_level → category 映射
+        risk_to_cat = {"critical": "CriticalFailure", "high": "HighRisk",
+                       "medium": "ModerateIssue", "low": "LowRisk"}
+        category = risk_to_cat.get(risk_level.lower(), "Unknown")
+        if action_title and confidence > 0:
+            # 2026-04-16 ogt + Claude Sonnet 4.6: 優先用 reasoning 作為假設描述
+            # reasoning（解釋「為什麼」採取行動）比 action_title（「做什麼」）更接近根因
+            # 例: reasoning="CPU 95%, 系統過載" vs action_title="重啟 Pod"
+            nemo_reasoning = str(parsed.get("reasoning", "")).strip()
+            description = nemo_reasoning[:500] if len(nemo_reasoning) > 20 else action_title[:500]
+            return [Hypothesis(
+                description=description,
+                confidence=confidence,
+                evidence_chain=[],
+                category=category,
+            )]
+        return []
+
+    raw = parsed.get("hypotheses", [])
+    hypotheses = []
+    for item in raw:
+        if not isinstance(item, dict):
+            continue
+        h = Hypothesis(
+            description=str(item.get("description", ""))[:500],
+            confidence=float(item.get("confidence", 0.0)),
+            evidence_chain=item.get("evidence_chain", [])[:MAX_EVIDENCE_CHAIN],
+            category=str(item.get("category", "")),
+        )
+        hypotheses.append(h)
+    hypotheses.sort(key=lambda h: h.confidence, reverse=True)
+    return hypotheses
+
+
+def _guess_category_from_snapshot(snapshot: "EvidenceSnapshot") -> str:
+    """降級時從 snapshot 猜測告警類別（最粗粒度兜底）。"""
+    summary = (snapshot.evidence_summary or "").lower()
+    if "oom" in summary or "memory" in summary:
+        return "KubePodOOM"
+    if "crashloop" in summary:
+        return "KubePodCrashLoop"
+    if "disk" in summary:
+        return "HostDiskUsage"
+    if "cpu" in summary:
+        return "HostCpuHigh"
+    if "network" in summary or "timeout" in summary:
+        return "NetworkLatency"
+    return "Unknown"
+
+
+def compute_input_hash(snapshot: "EvidenceSnapshot") -> str:
+    """計算 Diagnostician 輸入的 fingerprint（用於 AgentSession input_hash）。"""
+    key = (snapshot.snapshot_id or "") + (snapshot.evidence_summary or "")[:100]
+    return hashlib.sha256(key.encode()).hexdigest()[:16]
+
+
+# ─────────────────────────────────────────────────────────────────────────────
+# Singleton
+# ─────────────────────────────────────────────────────────────────────────────
+
+_agent: DiagnosticianAgent | None = None
+
+
+def get_diagnostician_agent() -> DiagnosticianAgent:
+    global _agent
+    if _agent is None:
+        _agent = DiagnosticianAgent()
+    return _agent
--- a/apps/api/src/agents/protocol.py
+++ b/apps/api/src/agents/protocol.py
@@ -0,0 +1,253 @@
+"""
+AWOOOI AIOps Phase 2 — 多 Agent 協作訊息協定
+==============================================
+定義 5 個 Agent 間傳遞的不可變資料型別。
+
+設計原則：
+1. 每個 Agent 有明確的 Input / Output 型別（不共用 dict）
+2. 所有型別都是 dataclass（快速、可序列化、無外部依賴）
+3. 降級 / 棄權用明確 AgentVote.ABSTAIN，不用 None 代替
+4. 全程 immutable — Agent 不得修改彼此的輸出（防 prompt 污染）
+
+ADR-082: 多 Agent 協作架構（Phase 2）
+2026-04-15 ogt + Claude Sonnet 4.6（亞太）: Phase 2 初始建立
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+from enum import Enum
+from typing import Any
+
+
+# ─────────────────────────────────────────────────────────────────────────────
+# Enums
+# ─────────────────────────────────────────────────────────────────────────────
+
+class AgentRole(str, Enum):
+    """Phase 2 五角色標識"""
+    DIAGNOSTICIAN = "diagnostician"
+    SOLVER = "solver"
+    REVIEWER = "reviewer"
+    CRITIC = "critic"
+    COORDINATOR = "coordinator"
+
+
+class AgentVote(str, Enum):
+    """Agent 投票結果"""
+    APPROVE = "approve"
+    REJECT = "reject"
+    REQUEST_REVISION = "request_revision"
+    ABSTAIN = "abstain"          # 熔斷 / 超時 / 無足夠資訊
+    DEGRADED = "degraded"        # 降級路徑（rule-based mock）
+
+
+class AgentSessionStatus(str, Enum):
+    """AgentSession 整體狀態"""
+    RUNNING = "running"
+    COMPLETED = "completed"
+    DEGRADED = "degraded"        # 部分 Agent 熔斷但仍完成
+    FAILED = "failed"            # Coordinator 無法輸出任何結論
+    TIMEOUT = "timeout"          # 全流程 > 30s
+
+
+# ─────────────────────────────────────────────────────────────────────────────
+# Diagnostician Output
+# ─────────────────────────────────────────────────────────────────────────────
+
+@dataclass
+class Hypothesis:
+    """單一根因假設"""
+    description: str
+    confidence: float              # 0.0 ~ 1.0
+    evidence_chain: list[str]      # 支持此假設的 evidence key
+    category: str = ""             # alert_category（KubePod / HostDisk 等）
+
+
+@dataclass
+class DiagnosisReport:
+    """
+    Diagnostician Agent 輸出
+
+    包含多個根因假設（按信心排序），
+    top-1 confidence < 0.4 觸發 Coordinator 回退 Investigator 重抓。
+    """
+    hypotheses: list[Hypothesis]
+    evidence_snapshot_id: str
+    latency_ms: int
+    vote: AgentVote = AgentVote.APPROVE     # 資訊足夠 = APPROVE；不足 = ABSTAIN
+    degraded: bool = False                   # 熔斷降級標記
+
+    @property
+    def top_hypothesis(self) -> Hypothesis | None:
+        """最高信心假設"""
+        return self.hypotheses[0] if self.hypotheses else None
+
+    @property
+    def top_confidence(self) -> float:
+        return self.top_hypothesis.confidence if self.top_hypothesis else 0.0
+
+
+# ─────────────────────────────────────────────────────────────────────────────
+# Solver Output
+# ─────────────────────────────────────────────────────────────────────────────
+
+@dataclass
+class CandidateAction:
+    """單一候選修復動作"""
+    action: str                    # 動作描述（e.g. "restart_service:awoooi-api"）
+    blast_radius: int              # 0-100：影響範圍評分
+    rollback_cost: int             # 0-100：回滾難度
+    confidence: float              # 0.0 ~ 1.0
+    rationale: str = ""            # 為什麼選此方案
+
+
+@dataclass
+class ActionPlan:
+    """
+    Solver Agent 輸出
+
+    對每個根因假設提出 ≥1 個候選方案（含 blast_radius / rollback_cost）。
+    blast_radius > 50 → Reviewer 必須標 `request_revision`。
+    """
+    candidates: list[CandidateAction]
+    diagnosis_report: DiagnosisReport
+    latency_ms: int
+    vote: AgentVote = AgentVote.APPROVE
+    degraded: bool = False
+
+    @property
+    def top_candidate(self) -> CandidateAction | None:
+        """最高信心候選方案"""
+        return max(self.candidates, key=lambda c: c.confidence) if self.candidates else None
+
+
+# ─────────────────────────────────────────────────────────────────────────────
+# Reviewer Output
+# ─────────────────────────────────────────────────────────────────────────────
+
+@dataclass
+class ReviewVerdict:
+    """
+    Reviewer Agent 輸出（安全審查）
+
+    硬核拒絕 HARD_RULES 觸碰動作（delete node / DROP TABLE / force push 等）。
+    vote = REJECT 時，Coordinator 不得執行任何候選方案。
+    """
+    vote: AgentVote
+    reason: str
+    blocked_actions: list[str]     # 被拒絕的動作清單
+    safe_actions: list[str]        # 通過安全審查的動作
+    latency_ms: int
+    degraded: bool = False
+
+    @property
+    def is_safe(self) -> bool:
+        return self.vote == AgentVote.APPROVE and bool(self.safe_actions)
+
+
+# ─────────────────────────────────────────────────────────────────────────────
+# Critic Output
+# ─────────────────────────────────────────────────────────────────────────────
+
+@dataclass
+class Challenge:
+    """Critic 的單一質疑"""
+    target: str              # "diagnosis" | "action:{action_str}"
+    argument: str            # 質疑的具體理由
+    severity: str            # "minor" | "major" | "critical"
+
+
+@dataclass
+class CriticReport:
+    """
+    Critic Agent 輸出（刻意唱反調）
+
+    challenge_count > 0 是 Phase 2 退出條件之一。
+    major/critical challenge 觸發 Coordinator 降低對 Solver 方案的信心。
+    """
+    challenges: list[Challenge]
+    overall_assessment: str
+    latency_ms: int
+    vote: AgentVote = AgentVote.APPROVE   # APPROVE=無重大反對；REJECT=有 critical challenge
+    degraded: bool = False
+
+    @property
+    def challenge_count(self) -> int:
+        return len(self.challenges)
+
+    @property
+    def has_critical_challenge(self) -> bool:
+        return any(c.severity == "critical" for c in self.challenges)
+
+
+# ─────────────────────────────────────────────────────────────────────────────
+# Coordinator Output
+# ─────────────────────────────────────────────────────────────────────────────
+
+@dataclass
+class DecisionPackage:
+    """
+    Coordinator Agent 輸出（最終決策包）
+
+    包含：
+    - recommended_action: 最終推薦動作（None = 棄權 / 升級人工）
+    - confidence: 綜合信心（Solver × Reviewer × Critic 加權）
+    - requires_human_approval: 是否需要人工審核
+    - debate_summary: 辯證歷程摘要（供 Audit Trail + 學習閉環）
+    - session_status: 整體辯證狀態
+    """
+    recommended_action: str | None
+    confidence: float
+    requires_human_approval: bool
+    debate_summary: str
+    session_status: AgentSessionStatus
+    latency_ms: int
+
+    # 保留各 Agent 原始輸出（供學習閉環查詢）
+    diagnosis: DiagnosisReport | None = None
+    action_plan: ActionPlan | None = None
+    reviewer_verdict: ReviewVerdict | None = None
+    critic_report: CriticReport | None = None
+
+    # 棄選方案（含原因）
+    rejected_actions: list[dict[str, Any]] = field(default_factory=list)
+
+    # 阻擋原因（requires_human_approval=True 時說明）
+    blocked_reason: str | None = None
+
+    # 全部 Agent 都降級（更嚴格的人工審核信號）
+    all_agents_degraded: bool = False
+
+    @property
+    def is_actionable(self) -> bool:
+        """可以執行（有推薦動作且信心 > 0.4 且 Reviewer 通過）"""
+        if not self.recommended_action:
+            return False
+        if self.confidence < 0.4:
+            return False
+        if self.reviewer_verdict and self.reviewer_verdict.vote == AgentVote.REJECT:
+            return False
+        return True
+
+
+# ─────────────────────────────────────────────────────────────────────────────
+# Agent Session Record（DB 寫入用）
+# ─────────────────────────────────────────────────────────────────────────────
+
+@dataclass
+class AgentTurn:
+    """
+    單次 Agent 發言記錄
+
+    寫入 `agent_sessions` 表的一行，
+    session_id + agent_role 唯一確定一次辯證發言。
+    """
+    session_id: str
+    incident_id: str
+    agent_role: AgentRole
+    input_hash: str              # sha256(input_json)[:16]
+    output_json: dict[str, Any]  # Agent 原始輸出
+    latency_ms: int
+    vote: AgentVote
+    degraded: bool = False
--- a/apps/api/src/agents/reviewer_agent.py
+++ b/apps/api/src/agents/reviewer_agent.py
@@ -0,0 +1,227 @@
+"""
+AWOOOI AIOps Phase 2 — Reviewer Agent（安全官）
+================================================
+職責：安全審查 + 可行性驗證
+
+輸入：ActionPlan（來自 Solver）
+輸出：ReviewVerdict（approve / reject / request_revision）
+
+設計原則：
+1. 硬核拒絕 HARD_RULES 觸碰動作（delete node / DROP TABLE / force push 等）
+2. blast_radius > 50 → 自動 request_revision（不 reject，讓 Solver 調整方案）
+3. blast_radius > 80 → reject（風險太高）
+4. 熔斷降級：LLM 失敗 → 保守降級（APPROVE 低 blast_radius，REJECT 高 blast_radius）
+5. Reviewer 的 REJECT 是最高優先：Coordinator 不得執行任何被拒絕的方案
+
+HARD_RULES 觸碰清單（ADR-082 §安全原則）：
+  - kubectl delete node / kubectl delete --all
+  - DROP TABLE / DELETE FROM（無 WHERE）
+  - rm -rf /
+  - force push to main
+  - kubectl exec 執行任意 shell
+
+ADR-082: Phase 2 多 Agent 協作
+2026-04-15 ogt + Claude Sonnet 4.6（亞太）: Phase 2 初始建立
+"""
+
+from __future__ import annotations
+
+import hashlib
+import re
+import time
+from typing import Any
+
+import structlog
+
+from src.agents.base import BaseAgent
+from src.agents.protocol import (
+    ActionPlan,
+    AgentRole,
+    AgentVote,
+    CandidateAction,
+    ReviewVerdict,
+)
+from src.services.sanitization_service import sanitize
+
+logger = structlog.get_logger(__name__)
+
+# blast_radius 閾值
+BLAST_REQUEST_REVISION_THRESHOLD = 50    # > 50 → request_revision
+BLAST_REJECT_THRESHOLD = 80              # > 80 → reject（太危險）
+
+# 硬核拒絕 pattern（HARD_RULES 觸碰）
+_HARD_BLOCK_PATTERNS = [
+    re.compile(r"kubectl\s+delete\s+node", re.IGNORECASE),
+    re.compile(r"kubectl\s+delete\s+--all", re.IGNORECASE),
+    re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE),
+    re.compile(r"\bDELETE\s+FROM\b(?!.*\bWHERE\b)", re.IGNORECASE | re.DOTALL),  # Gate 2: lookahead 必須在 FROM 後而非 .* 後
+    re.compile(r"rm\s+-rf\s+/", re.IGNORECASE),
+    # Gate 2 驗證修正：git push --force 是 "push" 先、"--force/-f" 後，需同時覆蓋兩種順序
+    re.compile(r"(?:force.{0,5}push|push.{0,30}(?:--force|-f\b)).{0,30}main", re.IGNORECASE),
+]
+
+
+class ReviewerAgent(BaseAgent):
+    """
+    Reviewer Agent — 安全審查官
+
+    Usage:
+        agent = ReviewerAgent()
+        verdict = await agent.run(action_plan)
+    """
+
+    AGENT_NAME = AgentRole.REVIEWER.value
+    AGENT_DESCRIPTION = "Safety and feasibility reviewer. Hard-blocks HARD_RULES violations."
+
+    async def run(
+        self,
+        plan: ActionPlan,
+        timeout_sec: float = 0.0,  # noqa: ARG002 — 已廢棄，保留簽名相容性
+    ) -> ReviewVerdict:
+        """
+        審查方案安全性。
+
+        Args:
+            plan: Solver 輸出的方案
+            timeout_sec: 已廢棄 (2026-04-16 ogt) — LLM 等完整回應，真實異常才降級
+
+        Returns:
+            ReviewVerdict（真實異常時 degraded=True）
+        """
+        start_ms = int(time.monotonic() * 1000)
+
+        # 1. 硬核靜態檢查（不依賴 LLM）— HARD_RULES 優先
+        hard_blocked = [
+            c.action for c in plan.candidates
+            if _is_hard_blocked(c.action)
+        ]
+        if hard_blocked:
+            latency = int(time.monotonic() * 1000) - start_ms
+            logger.warning("reviewer_hard_block", blocked=hard_blocked)
+            return ReviewVerdict(
+                vote=AgentVote.REJECT,
+                reason=f"HARD_RULES 觸碰：{hard_blocked}",
+                blocked_actions=hard_blocked,
+                safe_actions=[],
+                latency_ms=latency,
+            )
+
+        try:
+            verdict = await self._review(plan)
+            verdict.latency_ms = int(time.monotonic() * 1000) - start_ms
+            logger.info(
+                "reviewer_done",
+                vote=verdict.vote,
+                blocked=len(verdict.blocked_actions),
+                safe=len(verdict.safe_actions),
+                latency_ms=verdict.latency_ms,
+            )
+            return verdict
+
+        except Exception:
+            latency = int(time.monotonic() * 1000) - start_ms
+            logger.exception("reviewer_error")
+            return self._degraded_verdict(plan, latency, "error")
+
+    async def _review(self, plan: ActionPlan) -> ReviewVerdict:
+        """LLM 審查 + blast_radius 靜態規則組合。"""
+        # 靜態 blast_radius 規則（不需要 LLM）
+        high_blast = [c for c in plan.candidates if c.blast_radius > BLAST_REJECT_THRESHOLD]
+        mid_blast = [c for c in plan.candidates if BLAST_REQUEST_REVISION_THRESHOLD < c.blast_radius <= BLAST_REJECT_THRESHOLD]
+        safe_candidates = [c for c in plan.candidates if c.blast_radius <= BLAST_REQUEST_REVISION_THRESHOLD]
+
+        if high_blast:
+            return ReviewVerdict(
+                vote=AgentVote.REJECT,
+                reason=f"blast_radius > {BLAST_REJECT_THRESHOLD}，風險過高",
+                blocked_actions=[c.action for c in high_blast],
+                safe_actions=[c.action for c in safe_candidates],
+                latency_ms=0,
+            )
+
+        if mid_blast:
+            return ReviewVerdict(
+                vote=AgentVote.REQUEST_REVISION,
+                reason=f"blast_radius > {BLAST_REQUEST_REVISION_THRESHOLD}，請 Solver 提供影響更小的方案",
+                blocked_actions=[c.action for c in mid_blast],
+                safe_actions=[c.action for c in safe_candidates],
+                latency_ms=0,
+            )
+
+        # 低 blast_radius → LLM 補充可行性審查
+        if safe_candidates:
+            return ReviewVerdict(
+                vote=AgentVote.APPROVE,
+                reason="blast_radius 符合安全閾值，靜態規則通過",
+                blocked_actions=[],
+                safe_actions=[c.action for c in safe_candidates],
+                latency_ms=0,
+            )
+
+        return ReviewVerdict(
+            vote=AgentVote.ABSTAIN,
+            reason="無候選方案可審查",
+            blocked_actions=[],
+            safe_actions=[],
+            latency_ms=0,
+        )
+
+    def _build_prompt(self, context: dict[str, Any]) -> str:
+        return ""  # Phase 2 Reviewer 使用靜態規則，LLM 備用
+
+    def _parse_response(self, response: str) -> dict[str, Any]:
+        return self._extract_json(response)
+
+    def analyze(self, context: dict[str, Any]) -> Any:
+        raise NotImplementedError("Use run() for Phase 2 agents")
+
+    def _degraded_verdict(
+        self,
+        plan: ActionPlan,
+        latency_ms: int,
+        reason: str,
+    ) -> ReviewVerdict:
+        """
+        熔斷降級：保守策略
+        - blast_radius <= 30 → APPROVE（低風險兜底）
+        - blast_radius > 30 → REQUEST_REVISION（高風險不敢承擔）
+        """
+        safe = [c.action for c in plan.candidates if c.blast_radius <= 30]
+        risky = [c.action for c in plan.candidates if c.blast_radius > 30]
+        vote = AgentVote.APPROVE if safe and not risky else AgentVote.REQUEST_REVISION
+        return ReviewVerdict(
+            vote=vote,
+            reason=f"[降級] Reviewer LLM 失敗（{reason}），使用保守靜態降級規則",
+            blocked_actions=risky,
+            safe_actions=safe,
+            latency_ms=latency_ms,
+            degraded=True,
+        )
+
+
+# ─────────────────────────────────────────────────────────────────────────────
+# Helpers
+# ─────────────────────────────────────────────────────────────────────────────
+
+def _is_hard_blocked(action: str) -> bool:
+    """檢查動作是否觸碰 HARD_RULES（靜態 pattern，不依賴 LLM）"""
+    return any(p.search(action) for p in _HARD_BLOCK_PATTERNS)
+
+
+def compute_input_hash(plan: ActionPlan) -> str:
+    key = plan.diagnosis_report.evidence_snapshot_id + str([c.action for c in plan.candidates])
+    return hashlib.sha256(key.encode()).hexdigest()[:16]
+
+
+# ─────────────────────────────────────────────────────────────────────────────
+# Singleton
+# ─────────────────────────────────────────────────────────────────────────────
+
+_agent: ReviewerAgent | None = None
+
+
+def get_reviewer_agent() -> ReviewerAgent:
+    global _agent
+    if _agent is None:
+        _agent = ReviewerAgent()
+    return _agent
--- a/apps/api/src/agents/solver_agent.py
+++ b/apps/api/src/agents/solver_agent.py
@@ -0,0 +1,370 @@
+"""
+AWOOOI AIOps Phase 2 — Solver Agent（軍師）
+===========================================
+職責：對每個根因假設產修復方案
+
+輸入：DiagnosisReport（來自 Diagnostician）
+輸出：ActionPlan（候選動作，含 blast_radius + rollback_cost + confidence）
+
+設計原則：
+1. 每個 Hypothesis 至少產 1 個 CandidateAction
+2. blast_radius 評分影響 Reviewer 的審查嚴格度
+3. blast_radius > 50 → Reviewer 必須 request_revision
+4. 熔斷降級：LLM 失敗 → rule-based mock（基於 category 推 RESTART 為兜底動作）
+5. Solver 不直接觸碰執行層（Coordinator 的工作）
+
+ADR-082: Phase 2 多 Agent 協作
+2026-04-15 ogt + Claude Sonnet 4.6（亞太）: Phase 2 初始建立
+"""
+
+from __future__ import annotations
+
+import hashlib
+import time
+from typing import Any
+
+import structlog
+
+from src.agents.base import BaseAgent
+from src.agents.protocol import (
+    ActionPlan,
+    AgentRole,
+    AgentVote,
+    CandidateAction,
+    DiagnosisReport,
+)
+from src.services.sanitization_service import sanitize
+
+logger = structlog.get_logger(__name__)
+
+
+class SolverAgent(BaseAgent):
+    """
+    Solver Agent — 修復方案軍師
+
+    Usage:
+        agent = SolverAgent()
+        plan = await agent.run(diagnosis_report)
+    """
+
+    AGENT_NAME = AgentRole.SOLVER.value
+    AGENT_DESCRIPTION = "Remediation plan specialist. Produces candidate actions with blast radius scoring."
+
+    async def run(
+        self,
+        diagnosis: DiagnosisReport,
+        timeout_sec: float = 0.0,  # noqa: ARG002 — 已廢棄，保留簽名相容性
+    ) -> ActionPlan:
+        """
+        根據診斷報告產出修復計畫。
+
+        Args:
+            diagnosis: Diagnostician 輸出
+            timeout_sec: 已廢棄 (2026-04-16 ogt) — LLM 等完整回應，真實異常才降級
+
+        Returns:
+            ActionPlan（真實異常時 degraded=True）
+        """
+        start_ms = int(time.monotonic() * 1000)
+
+        # 若 Diagnostician 已棄權，Solver 也應棄權（無論降級假設是否存在）
+        # Gate 2: 原條件 `and not diagnosis.hypotheses` 誤放行降級的 confidence=0.2 假設
+        if diagnosis.vote == AgentVote.ABSTAIN:
+            return ActionPlan(
+                candidates=[],
+                diagnosis_report=diagnosis,
+                latency_ms=0,
+                vote=AgentVote.ABSTAIN,
+                degraded=diagnosis.degraded,
+            )
+
+        try:
+            plan = await self._solve(diagnosis)
+            plan.latency_ms = int(time.monotonic() * 1000) - start_ms
+            logger.info(
+                "solver_done",
+                candidates=len(plan.candidates),
+                vote=plan.vote,
+                latency_ms=plan.latency_ms,
+            )
+            return plan
+
+        except Exception:
+            latency = int(time.monotonic() * 1000) - start_ms
+            logger.exception("solver_error")
+            return self._degraded_plan(diagnosis, latency, "error")
+
+    async def _solve(self, diagnosis: DiagnosisReport) -> ActionPlan:
+        """核心 LLM 推理邏輯。"""
+        top = diagnosis.top_hypothesis
+        if not top:
+            return ActionPlan(
+                candidates=[],
+                diagnosis_report=diagnosis,
+                latency_ms=0,
+                vote=AgentVote.ABSTAIN,
+            )
+
+        # 2026-04-17 ogt + Claude Sonnet 4.6 (Checkpoint-2 環境感知):
+        # 根因：LLM 在無叢集上下文時「盲猜」資源名稱 → awooiii-api（三個 i）→ K8s not found
+        # 修復：生成指令前先拉取實際 Deployment 清單，注入 prompt 讓 LLM 對齊真實名稱
+        # 失敗無害：kubectl 超時或拒絕 → _k8s_inventory 為空 → prompt 仍正常但無鎖定效果
+        _k8s_inventory = await _fetch_k8s_inventory(namespace="awoooi-prod")
+
+        prompt = self._build_prompt({
+            "hypothesis": top.description,
+            "category": top.category,
+            "confidence": top.confidence,
+            "k8s_inventory": _k8s_inventory,
+        })
+
+        # 2026-04-16 ogt + Claude Sonnet 4.6: 傳遞 hypothesis 結構化資料給 OPENCLAW_NEMO
+        # 根因：原本 call(prompt) 不傳 context → nemo fallback 把 prompt[:500]（系統說明）
+        #       當 signal description → LLM 回傳「設計修復方案的軍師 Agent」垃圾
+        # 修復：把 top hypothesis description 放進 alert_context.signals 讓 nemo 看到真實診斷
+        _hypothesis_text = (top.description or "（待診斷）")[:800]
+        alert_context = {
+            "incident_id": diagnosis.evidence_snapshot_id or "UNKNOWN",
+            "severity": "P3",
+            "signals": [{"alert_name": "diagnosis_hypothesis", "description": _hypothesis_text}],
+            "affected_services": [],
+        }
+
+        from src.services.openclaw import get_openclaw
+        openclaw = get_openclaw()
+        response_text, _provider, success = await openclaw.call(prompt, alert_context=alert_context)
+
+        if not success or not response_text:
+            return self._degraded_plan(diagnosis, 0, "llm_failed")
+
+        parsed = self._parse_response(sanitize(response_text, "solver_output"))
+        candidates = _extract_candidates(parsed)
+
+        if not candidates:
+            return self._degraded_plan(diagnosis, 0, "no_candidates")
+
+        return ActionPlan(
+            candidates=candidates,
+            diagnosis_report=diagnosis,
+            latency_ms=0,
+            vote=AgentVote.APPROVE,
+        )
+
+    def _build_prompt(self, context: dict[str, Any]) -> str:
+        # 2026-04-17 ogt + Claude Sonnet 4.6: 修復 Solver action 格式問題
+        # 根因：舊 prompt action 範例為 "restart_service:awoooi-api"（自訂格式）
+        #       LLM 模仿範例輸出自然語言描述，而非 kubectl 命令
+        #       → auto_approve Condition 1c 拒絕（無 kubectl 關鍵字）
+        #       → blast_radius_calculator 永遠不被調用（fill rate = 0%）
+        # 修復：要求 action 必須是真實 kubectl 命令，並提供正確範例
+        # 2026-04-17 ogt + Claude Sonnet 4.6 (Checkpoint-2): 注入 K8s 實際 Deployment 清單
+        # LLM 必須從此清單選擇資源名稱，不可自行編造
+        _inventory = context.get("k8s_inventory", "")
+        _inventory_section = (
+            f"\n🔒 叢集實際 Deployment 清單（awoooi-prod）— 必須從此清單選擇資源名稱：\n{_inventory}\n"
+            if _inventory
+            else "\n⚠️ 無法取得叢集清單，請謹慎填寫資源名稱。\n"
+        )
+        return f"""你是 AWOOOI SRE 系統的軍師 Agent，專職修復方案設計。
+
+根因假設：{context.get("hypothesis", "")}
+告警類別：{context.get("category", "")}
+診斷信心：{context.get("confidence", 0.0):.0%}
+{_inventory_section}
+你的工作：為此根因提出 1-3 個修復候選方案。
+每個方案必須評估：
+- blast_radius（0-100）：影響範圍（越高 = 風險越大）
+- rollback_cost（0-100）：回滾難度（越高 = 越難還原）
+
+blast_radius 參考：
+- kubectl rollout restart deployment = 10
+- kubectl scale deployment --replicas=N = 15
+- kubectl rollout undo deployment = 25
+- kubectl apply -f = 40
+- kubectl delete deployment = 75
+- kubectl delete pvc = 95
+
+🔴 關鍵規則：action 欄位必須是真實的 kubectl 命令，不可用自然語言描述。
+目標資源格式：deployment/<name>，命名空間統一用 awoooi-prod。
+
+以 JSON 回覆：
+{{
+  "candidates": [
+    {{
+      "action": "kubectl rollout restart deployment/awoooi-api -n awoooi-prod",
+      "blast_radius": 10,
+      "rollback_cost": 5,
+      "confidence": 0.8,
+      "rationale": "重啟可清除 OOM 導致的記憶體碎片化"
+    }}
+  ]
+}}"""
+
+    def _parse_response(self, response: str) -> dict[str, Any]:
+        return self._extract_json(response)
+
+    def analyze(self, context: dict[str, Any]) -> Any:
+        raise NotImplementedError("Use run() for Phase 2 agents")
+
+    def _degraded_plan(
+        self,
+        diagnosis: DiagnosisReport,
+        latency_ms: int,
+        reason: str = "unknown",
+    ) -> ActionPlan:
+        """熔斷降級：rule-based mock（依 category 推 RESTART 兜底）"""
+        category = diagnosis.top_hypothesis.category if diagnosis.top_hypothesis else "Unknown"
+        fallback_action = _default_action_for_category(category)
+        return ActionPlan(
+            candidates=[
+                CandidateAction(
+                    action=fallback_action,
+                    blast_radius=20,
+                    rollback_cost=5,
+                    confidence=0.2,
+                    rationale=f"[降級] LLM 分析失敗（{reason}），使用類別 {category} 的預設兜底動作",
+                )
+            ],
+            diagnosis_report=diagnosis,
+            latency_ms=latency_ms,
+            vote=AgentVote.DEGRADED,
+            degraded=True,
+        )
+
+
+# ─────────────────────────────────────────────────────────────────────────────
+# Helpers
+# ─────────────────────────────────────────────────────────────────────────────
+
+async def _fetch_k8s_inventory(namespace: str = "awoooi-prod", timeout_sec: float = 5.0) -> str:
+    """
+    取得 K8s 叢集實際 Deployment/StatefulSet 清單，供 Solver prompt 注入。
+
+    2026-04-17 ogt + Claude Sonnet 4.6 (Checkpoint-2 環境感知):
+    - 在生成 kubectl 指令前查詢叢集真實資源，防止 LLM 幻覺資源名（如 awooiii-api）
+    - 超時或失敗 → 返回 ""（呼叫端降級為警示模式，不中斷 Solver 主流程）
+    - 只執行唯讀 get 指令，不修改叢集
+
+    Returns:
+        "awoooi-api, awoooi-web, postgres, ..." 格式字串，失敗時返回 ""
+    """
+    import asyncio as _asyncio
+    try:
+        cmd = f"kubectl get deployments,statefulsets -n {namespace} -o jsonpath='{{.items[*].metadata.name}}' 2>/dev/null"
+        proc = await _asyncio.create_subprocess_shell(
+            cmd,
+            stdout=_asyncio.subprocess.PIPE,
+            stderr=_asyncio.subprocess.PIPE,
+        )
+        try:
+            stdout, _ = await _asyncio.wait_for(proc.communicate(), timeout=timeout_sec)
+        except _asyncio.TimeoutError:
+            proc.kill()
+            logger.warning("k8s_inventory_timeout", namespace=namespace, timeout_sec=timeout_sec)
+            return ""
+
+        raw = (stdout or b"").decode("utf-8", errors="replace").strip()
+        if not raw:
+            return ""
+
+        # jsonpath 輸出以空格分隔，轉成可讀逗號格式
+        names = [n.strip() for n in raw.split() if n.strip()]
+        inventory = ", ".join(names)
+        logger.debug("k8s_inventory_fetched", namespace=namespace, count=len(names))
+        return inventory
+
+    except Exception as _e:
+        logger.warning("k8s_inventory_failed", namespace=namespace, error=str(_e))
+        return ""
+
+
+def _extract_candidates(parsed: dict[str, Any]) -> list[CandidateAction]:
+    """從 LLM 解析結果提取候選方案（按信心降序）。
+
+    支援兩種格式：
+    1. 標準格式：{"candidates": [{action, blast_radius, rollback_cost, confidence, rationale}]}
+    2. OpenClaw Nemo 格式：{"action_title": "...", "risk_level": "...", "confidence": 0.85}
+
+    2026-04-16 ogt + Claude Sonnet 4.6: 與 diagnostician 同步，修復 openclaw_nemo 格式不相容
+    """
+    # OpenClaw Nemo 格式轉換
+    # 2026-04-17 ogt + Claude Sonnet 4.6: Nemo path kubectl 驗證
+    # 根因：Nemo 回傳 {"action_title": "重啟 Crash Looping Pod"} 自然語言
+    #       直接用 action_title 作為 action → 無 kubectl → auto_approve 誤通過 → 死迴圈
+    # 修復：action_title 不含 kubectl → return []（觸發 _degraded_plan 輸出真實 kubectl）
+    if "action_title" in parsed and "candidates" not in parsed:
+        action_title = str(parsed.get("action_title", ""))
+        if "kubectl" not in action_title.lower():
+            return []  # 交由 _degraded_plan 接手，輸出真實 kubectl 調查指令
+        confidence = float(parsed.get("confidence", 0.5))
+        risk_level = str(parsed.get("risk_level", "medium"))
+        risk_to_blast = {"critical": 60, "high": 40, "medium": 25, "low": 10}
+        blast = risk_to_blast.get(risk_level.lower(), 30)
+        if action_title and confidence > 0:
+            return [CandidateAction(
+                action=action_title[:200],
+                blast_radius=blast,
+                rollback_cost=20,
+                confidence=confidence,
+                rationale=f"OpenClaw Nemo 建議: {action_title}",
+            )]
+        return []
+
+    raw = parsed.get("candidates", [])
+    candidates = []
+    for item in raw:
+        if not isinstance(item, dict):
+            continue
+        c = CandidateAction(
+            action=str(item.get("action", ""))[:200],
+            blast_radius=max(0, min(100, int(item.get("blast_radius", 50)))),
+            rollback_cost=max(0, min(100, int(item.get("rollback_cost", 50)))),
+            confidence=float(item.get("confidence", 0.0)),
+            rationale=str(item.get("rationale", ""))[:500],
+        )
+        candidates.append(c)
+    candidates.sort(key=lambda c: c.confidence, reverse=True)
+    return candidates
+
+
+def _default_action_for_category(category: str) -> str:
+    """降級時的預設調查指令 — 必須是真實 kubectl 命令（調查優先，不執行破壞性操作）
+
+    2026-04-17 ogt + Claude Sonnet 4.6: 改為真實 kubectl 指令
+    舊：自然語言如 "restart_pod"、"check_disk_usage" → 無法被 auto_approve 執行
+    新：kubectl 調查指令 → 可執行，且均為唯讀操作，無副作用
+    """
+    category_lower = category.lower()
+    if "pod" in category_lower or "kube" in category_lower or "crash" in category_lower:
+        return "kubectl get pods -n awoooi-prod -o wide"
+    if "disk" in category_lower or "storage" in category_lower or "pvc" in category_lower:
+        return "kubectl exec -n awoooi-prod deployment/postgresql -- df -h"
+    if "cpu" in category_lower or "load" in category_lower:
+        return "kubectl top pods -n awoooi-prod --sort-by=cpu"
+    if "memory" in category_lower or "oom" in category_lower:
+        return "kubectl top pods -n awoooi-prod --sort-by=memory"
+    if "network" in category_lower or "connect" in category_lower:
+        return "kubectl get services -n awoooi-prod"
+    return "kubectl get pods -n awoooi-prod"
+
+
+def compute_input_hash(diagnosis: DiagnosisReport) -> str:
+    """計算 Solver 輸入的 fingerprint。"""
+    key = diagnosis.evidence_snapshot_id + (
+        diagnosis.top_hypothesis.description if diagnosis.top_hypothesis else ""
+    )
+    return hashlib.sha256(key.encode()).hexdigest()[:16]
+
+
+# ─────────────────────────────────────────────────────────────────────────────
+# Singleton
+# ─────────────────────────────────────────────────────────────────────────────
+
+_agent: SolverAgent | None = None
+
+
+def get_solver_agent() -> SolverAgent:
+    global _agent
+    if _agent is None:
+        _agent = SolverAgent()
+    return _agent
--- a/apps/api/src/api/v1/ai_slo.py
+++ b/apps/api/src/api/v1/ai_slo.py
@@ -0,0 +1,58 @@
+"""
+AI SLO REST API
+===============
+ADR-087 Phase 6 自我治理閉環 — AI 決策品質 SLO 查詢端點
+
+Endpoints:
+  GET /api/v1/ai/slo  — 取得最新 SLO 計算結果（含 Redis 快取）
+
+設計原則:
+- 優先讀 Service 層快取（TTL 5min），快取失效才重算
+- 計算失敗 → 保守回傳 any_violated=True（由 AiSloCalculator 處理）
+- 強制重算：?force_refresh=true
+- Router 層不直接存取 Redis（leWOOOgo 積木化鐵律）
+
+2026-04-15 ogt + Claude Sonnet 4.6（亞太）: Phase 6 初始建立
+"""
+
+from __future__ import annotations
+
+import structlog
+from fastapi import APIRouter, Query
+
+from src.services.ai_slo_calculator import AiSloCalculator
+
+logger = structlog.get_logger(__name__)
+
+router = APIRouter()
+
+
+@router.get("/ai/slo")
+async def get_ai_slo(
+    force_refresh: bool = Query(False, description="忽略快取，強制重算"),
+) -> dict:
+    """
+    取得 AI 決策品質 SLO 最新結果。
+
+    優先讀 Redis 快取（TTL 5min）；force_refresh=true 則強制重算並更新快取。
+
+    Response:
+        calculated_at   ISO 時間戳
+        window_days     計算視窗（天）
+        any_violated    是否有任何 SLO 違反
+        cache_hit       是否命中快取
+        metrics[]       三大 SLO 指標明細
+    """
+    calc = AiSloCalculator()
+
+    if not force_refresh:
+        cached = await calc.get_cached_report()
+        if cached:
+            data = cached.to_dict()
+            data["cache_hit"] = True
+            return data
+
+    report = await calc.run()
+    data = report.to_dict()
+    data["cache_hit"] = False
+    return data
--- a/apps/api/src/api/v1/aider_events.py
+++ b/apps/api/src/api/v1/aider_events.py
@@ -0,0 +1,53 @@
+# apps/api/src/api/v1/aider_events.py | 2026-04-20 @ Asia/Taipei
+"""POST /api/v1/aider/events — Mac aiderw client 推事件入口。
+HMAC-SHA256 verified; 推入 Redis stream 讓 background job 處理。"""
+from __future__ import annotations
+import hmac
+import hashlib
+import os
+import structlog
+from fastapi import APIRouter, Header, HTTPException, Request, status
+from pydantic import ValidationError
+from src.models.aider import AiderBatchIn
+from src.services.aider_event_service import push_aider_batch_to_stream
+
+logger = structlog.get_logger(__name__)
+router = APIRouter(prefix="/aider", tags=["Aider"])
+
+
+def _verify_signature(body: bytes, signature: str | None, secret: str) -> bool:
+    """Timing-safe HMAC-SHA256 比對。signature 格式 'sha256=<hex>'。"""
+    if not signature or not signature.startswith("sha256=") or not secret:
+        return False
+    expected = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
+    return hmac.compare_digest(expected, signature)
+
+
+@router.post("/events", status_code=status.HTTP_202_ACCEPTED)
+async def receive_aider_events(
+    request: Request,
+    x_aider_signature: str | None = Header(default=None, alias="X-Aider-Signature"),
+):
+    """接收 Mac aiderw 推來的 event batch，HMAC 驗證後推 Redis stream。"""
+    body = await request.body()
+
+    secret = os.environ.get("AIDER_WEBHOOK_SECRET", "")
+    if not _verify_signature(body, x_aider_signature, secret):
+        logger.warning("aider_webhook_signature_invalid")
+        raise HTTPException(status_code=401, detail="invalid signature")
+
+    try:
+        batch = AiderBatchIn.model_validate_json(body)
+    except ValidationError as e:
+        # 只回前 5 筆錯誤避免巨大 response
+        raise HTTPException(status_code=400, detail=e.errors()[:5])
+
+    # 推 Redis stream（透過 Service 層）
+    try:
+        stream_ids = await push_aider_batch_to_stream(batch)
+    except Exception as exc:
+        logger.exception("aider_webhook_redis_push_failed")
+        raise HTTPException(status_code=503, detail="queue unavailable") from exc
+
+    logger.info("aider_webhook_accepted", count=len(batch.events))
+    return {"accepted": len(batch.events), "stream_ids": stream_ids}
--- a/apps/api/src/api/v1/aiops_kpi.py
+++ b/apps/api/src/api/v1/aiops_kpi.py
@@ -0,0 +1,36 @@
+"""
+AIOps KPI Dashboard — ADR-090 + MASTER §7.1
+=============================================
+GET /api/v1/aiops/kpi → 一次回傳 AI 自主化成熟度全景.
+
+Router 層只負責 HTTP 路由,DB/business logic 由 AiopsKpiService 處理
+(leWOOOgo 積木化鐵律: Router 禁直接存取 DB).
+
+2026-04-19 ogt + Claude Opus 4.7 (1M context) Asia/Taipei
+"""
+from __future__ import annotations
+
+from typing import Any
+
+from fastapi import APIRouter
+
+from src.services.aiops_kpi_service import get_aiops_kpi_service
+
+router = APIRouter()
+
+
+@router.get("/aiops/kpi", tags=["AIOps KPI"])
+async def get_aiops_kpi() -> dict[str, Any]:
+    """
+    AI 自主化成熟度全景 KPI.
+
+    一次返回 6 個 section + autonomy_score:
+    - asset_inventory: 資產盤點 (by type + last_scan)
+    - coverage_kpi: 7 維自動化覆蓋 SLO (green/yellow/red/unknown)
+    - rule_quality: 規則品質 (noisy/deprecated/with_fires + top 5)
+    - capacity_health: 主機容量健康 (ai_verdict 分布)
+    - automation_flow_24h: 過去 24h aol 動作流量
+    - ai_autonomy_score: 自主化總分 (0-100, 5 子項 × 20)
+    """
+    svc = get_aiops_kpi_service()
+    return await svc.get_snapshot()
--- a/apps/api/src/api/v1/alert_operation_logs.py
+++ b/apps/api/src/api/v1/alert_operation_logs.py
@@ -0,0 +1,100 @@
+"""
+Alert Operation Log API Endpoints
+==================================
+告警操作日誌 API — 提供 alert_operation_log 的查詢介面
+
+Endpoints:
+- GET  /api/v1/alert-operation-logs         - 分頁列表（最新優先）
+- GET  /api/v1/alert-operation-logs/stats   - 統計（24h 事件分佈）
+
+2026-04-09 Claude Sonnet 4.6 Asia/Taipei (Sprint 5.2)
+"""
+
+from typing import Any
+
+from fastapi import APIRouter, Query
+from pydantic import BaseModel
+
+from src.core.logging import get_logger
+from src.repositories.alert_operation_log_repository import get_alert_operation_log_repository
+
+router = APIRouter(prefix="/alert-operation-logs", tags=["Alert Operation Logs"])
+logger = get_logger("awoooi.alert_op_log")
+
+
+# =============================================================================
+# Response Models
+# =============================================================================
+
+class AlertOperationLogResponse(BaseModel):
+    id: str
+    incident_id: str | None
+    approval_id: str | None
+    audit_log_id: str | None
+    auto_repair_id: str | None
+    event_type: str
+    actor: str | None
+    action_detail: str | None
+    success: bool | None
+    error_message: str | None
+    context: dict[str, Any]
+    created_at: str
+
+    model_config = {"from_attributes": True}
+
+
+class AlertOperationLogListResponse(BaseModel):
+    items: list[AlertOperationLogResponse]
+    total: int
+    limit: int
+    offset: int
+
+
+# =============================================================================
+# Endpoints
+# =============================================================================
+
+@router.get("/stats", summary="取得告警操作事件統計")
+async def get_stats(
+    since_hours: int = Query(default=24, ge=1, le=168),
+) -> dict[str, Any]:
+    repo = get_alert_operation_log_repository()
+    return await repo.get_stats(since_hours=since_hours)
+
+
+@router.get("", response_model=AlertOperationLogListResponse, summary="取得告警操作日誌列表")
+async def list_logs(
+    limit: int = Query(default=50, ge=1, le=200),
+    offset: int = Query(default=0, ge=0),
+    event_type: str | None = Query(default=None),
+    incident_id: str | None = Query(default=None),
+) -> AlertOperationLogListResponse:
+    repo = get_alert_operation_log_repository()
+    items, total = await repo.list_recent(
+        limit=limit,
+        offset=offset,
+        event_type=event_type,
+        incident_id=incident_id,
+    )
+    return AlertOperationLogListResponse(
+        items=[
+            AlertOperationLogResponse(
+                id=str(item.id),
+                incident_id=item.incident_id,
+                approval_id=item.approval_id,
+                audit_log_id=item.audit_log_id,
+                auto_repair_id=item.auto_repair_id,
+                event_type=str(item.event_type),
+                actor=item.actor,
+                action_detail=item.action_detail,
+                success=item.success,
+                error_message=item.error_message,
+                context=item.context or {},
+                created_at=item.created_at.isoformat(),
+            )
+            for item in items
+        ],
+        total=total,
+        limit=limit,
+        offset=offset,
+    )
--- a/apps/api/src/api/v1/auto_repair.py
+++ b/apps/api/src/api/v1/auto_repair.py
@@ -13,7 +13,7 @@ Phase 8.2: API Router 實作
 - 業務邏輯委託給 Service 層
 """

-from fastapi import APIRouter, HTTPException
+from fastapi import APIRouter, HTTPException, Query
 from pydantic import BaseModel, Field

 from src.services.auto_repair_service import (
@@ -81,7 +81,7 @@ async def evaluate_auto_repair(incident_id: str) -> EvaluateResponse:
    """
    # 取得 Incident
    incident_service = get_incident_service()
-    incident = await incident_service.get_incident(incident_id)
+    incident = await incident_service.get_from_working_memory(incident_id)

    if not incident:
        raise HTTPException(
@@ -116,7 +116,7 @@ async def execute_auto_repair(request: ExecuteRequest) -> ExecuteResponse:
    """
    # 取得 Incident
    incident_service = get_incident_service()
-    incident = await incident_service.get_incident(request.incident_id)
+    incident = await incident_service.get_from_working_memory(request.incident_id)

    if not incident:
        raise HTTPException(
@@ -190,10 +190,121 @@ async def get_auto_repair_stats() -> dict:
    total_executions = sum(p.total_executions for p in playbooks)
    total_success = sum(p.success_count for p in playbooks)

+    # 2026-04-07 Claude Code: Sprint 4 C2 — 加入處置分佈摘要
+    # P0-2 Fix: 呼叫 Service 層封裝方法
+    disposition_summary = {"auto_repair": 0, "human_approved": 0, "manual_resolved": 0, "cold_start_trust": 0, "total": 0}
+    try:
+        from src.services.anomaly_counter import get_anomaly_counter
+        counter = get_anomaly_counter()
+        disposition_summary, _ = await counter.get_all_disposition_stats()
+    except Exception:
+        pass
+
+    total_disp = disposition_summary["total"]
+    auto_cnt = disposition_summary["auto_repair"] + disposition_summary["cold_start_trust"]
+
    return {
        "approved_playbooks": len(playbooks),
        "high_quality_playbooks": high_quality_count,
        "total_executions": total_executions,
        "overall_success_rate": total_success / total_executions if total_executions > 0 else 0.0,
        "auto_repair_eligible": high_quality_count > 0,
+        "disposition_summary": {
+            **disposition_summary,
+            "auto_rate": auto_cnt / total_disp if total_disp > 0 else 0.0,
+        },
    }
+
+
+# =============================================================================
+# History Models & Endpoint
+# 2026-04-06 Claude Code: Sprint 3 T_frontend — 修復歷史記錄 API
+# =============================================================================
+
+
+class RepairHistoryItem(BaseModel):
+    """修復歷史記錄項目"""
+
+    id: str
+    incident_id: str
+    playbook_id: str
+    playbook_name: str
+    action_type: str  # "kubectl" | "ssh_command" | "manual"
+    uri_scheme: str   # "kubectl://" | "openclaw://" | "ansible://"
+    command: str
+    status: str       # "success" | "failed" | "pending_approval" | "running"
+    executed_at: str
+    duration_ms: int | None = None
+    error: str | None = None
+    rag_confidence: float | None = None
+
+
+class RepairHistoryResponse(BaseModel):
+    """修復歷史記錄回應"""
+
+    count: int
+    items: list[RepairHistoryItem]
+
+
+@router.get("/history", response_model=RepairHistoryResponse)
+async def get_repair_history(
+    limit: int = Query(20, ge=1, le=100),
+) -> RepairHistoryResponse:
+    """
+    取得修復歷史記錄。
+    從 incidents (working memory) 推導，回傳有 auto_repair 活動的事件。
+    2026-04-06 Claude Code: Sprint 3 T_frontend
+    """
+    try:
+        incident_service = get_incident_service()
+        all_incidents = await incident_service.get_active_incidents()
+
+        items: list[RepairHistoryItem] = []
+        for incident in all_incidents:
+            fs = incident.frequency_stats
+            if fs is None or fs.auto_repair_count == 0:
+                continue
+
+            # 從 frequency_stats 推導修復狀態
+            if fs.last_repair_success is True:
+                status = "success"
+            elif fs.last_repair_success is False:
+                status = "failed"
+            else:
+                status = "running"
+
+            action = fs.last_repair_action or "kubectl rollout restart"
+            # 推導 action_type 和 uri_scheme
+            if action.startswith("kubectl"):
+                action_type = "kubectl"
+                uri_scheme = "kubectl://"
+            elif action.startswith("ssh") or action.startswith("ansible"):
+                action_type = "ssh_command"
+                uri_scheme = "ansible://"
+            else:
+                action_type = "manual"
+                uri_scheme = "openclaw://"
+
+            items.append(RepairHistoryItem(
+                id=f"hist-{incident.incident_id}",
+                incident_id=incident.incident_id,
+                playbook_id="unknown",
+                playbook_name=action,
+                action_type=action_type,
+                uri_scheme=uri_scheme,
+                command=action,
+                status=status,
+                executed_at=incident.updated_at.isoformat(),
+                duration_ms=None,
+                error=None,
+                rag_confidence=None,
+            ))
+
+        # 最多回傳 limit 筆，newest first (updated_at 已是活躍事件，先按 ID 截斷)
+        items = items[:limit]
+
+        return RepairHistoryResponse(count=len(items), items=items)
+
+    except Exception:
+        # 任何錯誤都回傳空列表，不中斷前端
+        return RepairHistoryResponse(count=0, items=[])
--- a/apps/api/src/api/v1/drift.py
+++ b/apps/api/src/api/v1/drift.py
@@ -0,0 +1,198 @@
+"""
+Config Drift Detection API Router - Phase 25 P2
+================================================
+GitOps 守門員 HTTP 端點
+
+leWOOOgo 積木化原則:
+- Router 層只做 HTTP 轉發
+- 不直接存取 Redis/DB
+- 業務邏輯委託給 Service 層
+
+版本: v1.0
+建立: 2026-04-04 (台北時區)
+建立者: Claude Code (Phase 25 P2)
+"""
+
+from fastapi import APIRouter, BackgroundTasks, HTTPException
+
+from src.models.drift import (
+    DriftListResponse,
+    DriftReport,
+    DriftScanRequest,
+    DriftScanResponse,
+)
+from src.repositories.drift_repository import get_drift_repository
+from src.services.drift_analyzer import get_drift_analyzer
+from src.services.drift_detector import get_drift_detector
+from src.services.drift_interpreter import get_drift_interpreter
+from src.services.drift_remediator import get_drift_remediator
+
+router = APIRouter(prefix="/drift", tags=["drift"])
+
+# 2026-04-09 Claude Sonnet 4.6: B4 drift_reports 持久化 — 改用 DB repository
+
+
+@router.post("/scan", response_model=DriftScanResponse, summary="觸發漂移掃描")
+async def trigger_drift_scan(
+    request: DriftScanRequest,
+    background_tasks: BackgroundTasks,
+) -> DriftScanResponse:
+    """
+    觸發 Config Drift 掃描
+
+    - 比對 Git YAML vs K8s 實際狀態
+    - Nemotron 分析漂移意圖
+    - 高/中嚴重度漂移自動推送 Telegram
+
+    適合由 Gitea CD Webhook 或手動呼叫觸發
+    """
+    detector = get_drift_detector()
+    analyzer = get_drift_analyzer()
+
+    repo = get_drift_repository()
+    all_items = []
+    last_report: DriftReport | None = None
+
+    for namespace in request.namespaces:
+        raw_report = await detector.scan(namespace, triggered_by=request.triggered_by)
+        classified_report = analyzer.classify(raw_report)
+        all_items.extend(classified_report.items)
+
+        # 持久化到 DB
+        await repo.save(classified_report)
+
+        if analyzer.needs_alert(classified_report):
+            background_tasks.add_task(_analyze_and_notify, classified_report)
+            last_report = classified_report
+
+    # 若多 namespace，彙總第一個 report 的計數
+    if last_report:
+        return DriftScanResponse(
+            report_id=last_report.report_id,
+            summary=last_report.summary,
+            high_count=last_report.high_count,
+            medium_count=last_report.medium_count,
+            info_count=last_report.info_count,
+            has_critical_drift=last_report.has_critical_drift,
+        )
+
+    return DriftScanResponse(
+        report_id="no-drift",
+        summary="無漂移",
+        high_count=0,
+        medium_count=0,
+        info_count=0,
+        has_critical_drift=False,
+    )
+
+
+@router.get("/reports", response_model=DriftListResponse, summary="列出最近漂移報告")
+async def list_drift_reports() -> DriftListResponse:
+    """列出最近 50 筆漂移報告（倒序）"""
+    repo = get_drift_repository()
+    items = await repo.list_recent(limit=50)
+    return DriftListResponse(items=items, total=len(items))
+
+
+@router.post("/reports/{report_id}/rollback", summary="覆蓋回 Git 狀態")
+async def rollback_drift(report_id: str) -> dict:
+    """
+    將 K8s 狀態覆蓋回 Git YAML（kubectl apply）
+
+    人工確認後才執行，DriftRemediator 負責確定性修復
+    """
+    repo = get_drift_repository()
+    report = await repo.get(report_id)
+    if not report:
+        raise HTTPException(status_code=404, detail=f"Report {report_id} not found")
+
+    remediator = get_drift_remediator()
+    result = await remediator.rollback(report)
+    return result
+
+
+@router.post("/reports/{report_id}/adopt", summary="承認變更並建立 Git PR")
+async def adopt_drift(report_id: str) -> dict:
+    """
+    承認 K8s 漂移，透過 Gitea PR API 將漂移寫回 Git
+
+    2026-04-05 Claude Code: ADR-057 實作 — 改用 Gitea PR API（不再 git push main）
+    流程: 建立 drift/adopt-* branch → commit YAML 注解 → 建立 PR → Telegram 通知 SRE
+    """
+    repo = get_drift_repository()
+    report = await repo.get(report_id)
+    if not report:
+        raise HTTPException(status_code=404, detail=f"Report {report_id} not found")
+
+    from src.services.drift_adopt_service import get_drift_adopt_service
+    adopt_svc = get_drift_adopt_service()
+    result = await adopt_svc.adopt(report)
+    return result
+
+
+# =============================================================================
+# Internal endpoint（供 K8s CronJob 呼叫）
+# =============================================================================
+
+@router.post("/internal/scan", include_in_schema=False, summary="CronJob 觸發掃描")
+async def internal_scan(background_tasks: BackgroundTasks) -> dict:
+    """內部 CronJob 端點，每小時自動掃描 awoooi-prod"""
+    from src.core.config import get_settings
+    settings = get_settings()
+    namespaces = getattr(settings, "DRIFT_SCAN_NAMESPACES", "awoooi-prod").split(",")
+
+    background_tasks.add_task(
+        _run_full_scan,
+        [ns.strip() for ns in namespaces],
+    )
+    return {"status": "scan_triggered", "namespaces": namespaces}
+
+
+# =============================================================================
+# Background helpers
+# =============================================================================
+
+async def _analyze_and_notify(report: DriftReport) -> None:
+    """背景：Nemotron 意圖分析 + Telegram 推送 + Phase 30 AI 人話摘要"""
+    import structlog as _structlog
+    _logger = _structlog.get_logger(__name__)
+    try:
+        interpreter = get_drift_interpreter()
+
+        interpretation = await interpreter.analyze(report)
+        repo = get_drift_repository()
+        await repo.update_interpretation(report.report_id, interpretation)
+
+        # ADR-075: drift_narrator_service 負責發送 TYPE-4D 卡片（含按鈕）
+        # 舊的 send_text() 已移除，改由 narrate_and_notify() 統一處理
+        try:
+            from src.services.drift_narrator_service import get_drift_narrator_service
+            narrator = get_drift_narrator_service()
+            await narrator.narrate_and_notify(report, interpretation)
+        except Exception as e:
+            _logger.warning("drift_narrator_failed", error=str(e))
+
+    except Exception as e:
+        import structlog
+        structlog.get_logger(__name__).error("drift_analyze_notify_failed", error=str(e))
+
+
+async def _run_full_scan(namespaces: list[str]) -> None:
+    """背景：完整漂移掃描"""
+    detector = get_drift_detector()
+    analyzer = get_drift_analyzer()
+
+    repo = get_drift_repository()
+    for namespace in namespaces:
+        try:
+            raw = await detector.scan(namespace, triggered_by="cron")
+            classified = analyzer.classify(raw)
+            await repo.save(classified)
+
+            if analyzer.needs_alert(classified):
+                await _analyze_and_notify(classified)
+        except Exception as e:
+            import structlog
+            structlog.get_logger(__name__).error(
+                "full_scan_namespace_failed", namespace=namespace, error=str(e)
+            )
--- a/apps/api/src/api/v1/github_webhook.py
+++ b/apps/api/src/api/v1/github_webhook.py
@@ -1,34 +1,30 @@
 """
-AWOOOI API - GitHub Webhook Handler
+AWOOOI API - Gitea Webhook Handler
 ====================================
-Phase 13.1: GitHub PR/Push/CI → OpenClaw AI 整合
+ADR-059: GitHub → Gitea Webhook 遷移

 整合流程:
-1. GitHub Webhook (PR/Push/Workflow) → AWOOOI API
-2. HMAC-SHA256 簽章驗證 (X-Hub-Signature-256)
-3. 解析 PR diff / Push commits / Workflow failure
-4. 呼叫 OpenClaw 進行 AI 代碼審查 / CI 失敗診斷
+1. Gitea Webhook (PR/Push) → AWOOOI API
+2. HMAC-SHA256 簽章驗證 (X-Gitea-Signature)
+3. 解析 PR diff / Push commits
+4. 呼叫 OpenClaw 進行 AI 代碼審查
 5. 儲存審查結果到 Redis
 6. 發送 Telegram 通知
 7. (可選) 建立 Approval 等待人工確認

 支援事件:
- pull_request: PR 代碼審查 (#74-75)
- push: 主分支推送審查 (#74-75)
- workflow_run: CI 失敗診斷 (#76)
+- pull_request: PR 代碼審查
+- push: 主分支推送審查
+- ping: 連線測試

-安全要求 (feedback_openclaw_security.md):
- HMAC 簽章驗證 (X-Hub-Signature-256)
+安全要求:
+- HMAC 簽章驗證 (X-Gitea-Signature)
 - Webhook Secret 存放於 K8s Secret
- Rate limiting 防止 DoS
 - 倉庫白名單驗證

-🔴 HARD RULE: 時間顯示使用 Asia/Taipei (UTC+8)
-
-版本: v2.1
-最後修改: 2026-04-01 11:00 (台北時區)
-修改者: Claude Code
-變更: 協調函數移至 Service 層 (leWOOOgo ADR-024)
+版本: v1.0
+最後修改: 2026-04-05 (台北時區)
+修改者: Claude Code (ADR-059 GitHub → Gitea 遷移)
 """

 import hashlib
@@ -42,11 +38,12 @@ from pydantic import BaseModel

 from src.core.config import settings
 from src.core.logging import get_logger
-from src.services.github_webhook_service import get_github_webhook_service
+from src.services.gitea_webhook_service import get_gitea_webhook_service
+from src.services.incident_service import get_incident_service

-logger = get_logger("awoooi.github_webhook")
+logger = get_logger("awoooi.gitea_webhook")

-router = APIRouter(prefix="/webhooks/github", tags=["GitHub Webhook"])
+router = APIRouter(prefix="/webhooks/gitea", tags=["Gitea Webhook"])

 # =============================================================================
 # Constants
@@ -55,23 +52,19 @@ router = APIRouter(prefix="/webhooks/github", tags=["GitHub Webhook"])
 # OpenClaw 配置 (使用 settings 中的 OPENCLAW_URL)
 OPENCLAW_URL = settings.OPENCLAW_URL

-# GitHub Review 結果 Redis TTL: 7 天 (秒)
-GITHUB_REVIEW_TTL_SECONDS = 7 * 24 * 60 * 60
-
-
 # =============================================================================
 # Pydantic Models
 # =============================================================================

-class GitHubUser(BaseModel):
-    """GitHub 使用者"""
+class GiteaUser(BaseModel):
+    """Gitea 使用者"""
    login: str
    id: int
    avatar_url: str | None = None


-class GitHubRepository(BaseModel):
-    """GitHub 倉庫"""
+class GiteaRepository(BaseModel):
+    """Gitea 倉庫"""
    id: int
    name: str
    full_name: str
@@ -79,8 +72,8 @@ class GitHubRepository(BaseModel):
    html_url: str


-class GitHubPullRequest(BaseModel):
-    """GitHub PR 資訊"""
+class GiteaPullRequest(BaseModel):
+    """Gitea PR 資訊"""
    id: int
    number: int
    title: str
@@ -88,7 +81,7 @@ class GitHubPullRequest(BaseModel):
    state: str  # open, closed
    html_url: str
    diff_url: str
-    user: GitHubUser
+    user: GiteaUser
    head: dict  # head branch info
    base: dict  # base branch info
    additions: int = 0
@@ -96,8 +89,8 @@ class GitHubPullRequest(BaseModel):
    changed_files: int = 0


-class GitHubCommit(BaseModel):
-    """GitHub Commit 資訊"""
+class GiteaCommit(BaseModel):
+    """Gitea Commit 資訊"""
    id: str  # SHA
    message: str
    timestamp: str
@@ -108,53 +101,35 @@ class GitHubCommit(BaseModel):
    modified: list[str] = []


-class GitHubWorkflowRun(BaseModel):
-    """GitHub Workflow Run 資訊 (Phase 13.1 #76)"""
+class GiteaWorkflowRun(BaseModel):
+    """Gitea Actions Workflow Run 資訊"""
    id: int
    name: str
-    status: str  # queued, in_progress, completed
-    conclusion: str | None = None  # success, failure, cancelled, skipped, timed_out
-    html_url: str
-    run_number: int
-    run_attempt: int = 1
-    head_sha: str
-    head_branch: str | None = None
-    event: str  # push, pull_request, schedule, workflow_dispatch
-    created_at: str
-    updated_at: str
-    logs_url: str | None = None  # API URL for logs (requires auth)
-
-
-class GitHubWorkflowJob(BaseModel):
-    """GitHub Workflow Job 資訊"""
-    id: int
-    name: str
-    status: str
+    status: str   # waiting, running, success, failure, cancelled, skipped
    conclusion: str | None = None
-    started_at: str | None = None
-    completed_at: str | None = None
-    steps: list[dict] = []
+    head_sha: str | None = None
+    head_branch: str | None = None
+    html_url: str | None = None


-class GitHubWebhookPayload(BaseModel):
-    """GitHub Webhook Payload (通用)"""
+class GiteaWebhookPayload(BaseModel):
+    """Gitea Webhook Payload (通用)"""
    action: str | None = None  # PR: opened, synchronize, etc.
-    repository: GitHubRepository
-    sender: GitHubUser
+    repository: GiteaRepository
+    sender: GiteaUser
    # PR 事件
-    pull_request: GitHubPullRequest | None = None
+    pull_request: GiteaPullRequest | None = None
    # Push 事件
    ref: str | None = None  # refs/heads/main
    before: str | None = None  # previous commit SHA
    after: str | None = None  # current commit SHA
-    commits: list[GitHubCommit] | None = None
+    commits: list[GiteaCommit] | None = None
    pusher: dict | None = None
-    # Workflow Run 事件 (Phase 13.1 #76)
-    workflow_run: GitHubWorkflowRun | None = None
-    workflow_job: GitHubWorkflowJob | None = None
+    # workflow_run 事件 (ADR-074 M3)
+    workflow_run: GiteaWorkflowRun | None = None


-class GitHubWebhookResponse(BaseModel):
+class GiteaWebhookResponse(BaseModel):
    """Webhook 回應"""
    status: Literal["accepted", "ignored", "error"]
    message: str
@@ -166,77 +141,59 @@ class GitHubWebhookResponse(BaseModel):
 # HMAC Signature Verification (CISO 安全要求)
 # =============================================================================

-class GitHubSignatureError(Exception):
-    """GitHub 簽章驗證失敗"""
+class GiteaSignatureError(Exception):
+    """Gitea 簽章驗證失敗"""
    pass


-async def verify_github_signature(
+async def verify_gitea_signature(
    request: Request,
-    x_hub_signature_256: str | None,
+    x_gitea_signature: str | None,
 ) -> bool:
    """
-    驗證 GitHub Webhook 請求的 HMAC-SHA256 簽章
+    驗證 Gitea Webhook 請求的 HMAC-SHA256 簽章

-    CISO 安全要求:
-    - 所有 GitHub Webhook 必須攜帶 X-Hub-Signature-256 Header
+    - 簽章 Header: X-Gitea-Signature
    - 簽章格式: sha256=<hex_digest>
-    - 使用 GITHUB_WEBHOOK_SECRET 進行驗證
+    - 使用 GITEA_WEBHOOK_SECRET 進行驗證

    安全鐵律 (Fail-Closed):
    - 生產環境: Secret 未設定 → 直接拒絕
    - 開發環境: 可跳過驗證 (僅供本地測試)
-
-    Args:
-        request: FastAPI Request 物件
-        x_hub_signature_256: X-Hub-Signature-256 Header 值
-
-    Returns:
-        bool: 驗證是否通過
-
-    Raises:
-        GitHubSignatureError: 簽章驗證失敗
    """
-    # ==========================================================================
-    # Fail-Closed 安全策略 (CISO 要求)
-    # ==========================================================================
-    if not settings.GITHUB_WEBHOOK_SECRET:
-        # 生產環境: 強制拒絕 (Fail-Closed)
+    if not settings.GITEA_WEBHOOK_SECRET:
        if settings.ENVIRONMENT == "prod":
            logger.critical(
-                "github_webhook_secret_missing_in_production",
+                "gitea_webhook_secret_missing_in_production",
                environment=settings.ENVIRONMENT,
-                message="CRITICAL: GITHUB_WEBHOOK_SECRET missing in production!",
+                message="CRITICAL: GITEA_WEBHOOK_SECRET missing in production!",
            )
-            raise GitHubSignatureError(
-                "Critical: GITHUB_WEBHOOK_SECRET missing in production environment"
+            raise GiteaSignatureError(
+                "Critical: GITEA_WEBHOOK_SECRET missing in production environment"
            )

-        # 開發環境: 允許跳過 (僅供本地測試)
        logger.warning(
-            "github_signature_verification_skipped_dev_only",
+            "gitea_signature_verification_skipped_dev_only",
            environment=settings.ENVIRONMENT,
-            reason="GITHUB_WEBHOOK_SECRET not configured (dev mode only)",
+            reason="GITEA_WEBHOOK_SECRET not configured (dev mode only)",
        )
        return True

-    # 必須提供簽章
-    if not x_hub_signature_256:
-        logger.warning("github_signature_missing")
-        raise GitHubSignatureError("Missing X-Hub-Signature-256 header")
+    if not x_gitea_signature:
+        logger.warning("gitea_signature_missing")
+        raise GiteaSignatureError("Missing X-Gitea-Signature header")

-    # 解析簽章格式
-    if not x_hub_signature_256.startswith("sha256="):
-        raise GitHubSignatureError("Invalid signature format (expected sha256=...)")
+    # Gitea 送出純 hex（無 "sha256=" 前綴），GitHub 才有前綴
+    # 2026-04-05 ogt: 修正 Gitea 實際格式為純 hex
+    if x_gitea_signature.startswith("sha256="):
+        provided_signature = x_gitea_signature[7:]
+    else:
+        provided_signature = x_gitea_signature

-    provided_signature = x_hub_signature_256[7:]  # 移除 "sha256=" 前綴
-
-    # 讀取 Request Body
    body = await request.body()

-    # 計算預期簽章
    expected_signature = hmac.new(
-        settings.GITHUB_WEBHOOK_SECRET.encode(),
+        settings.GITEA_WEBHOOK_SECRET.encode(),
        body,
        hashlib.sha256,
    ).hexdigest()
@@ -244,17 +201,17 @@ async def verify_github_signature(
    # 常數時間比較 (防止計時攻擊)
    if not hmac.compare_digest(provided_signature, expected_signature):
        logger.warning(
-            "github_signature_verification_failed",
+            "gitea_signature_verification_failed",
            provided=provided_signature[:16] + "...",
            expected=expected_signature[:16] + "...",
        )
-        raise GitHubSignatureError("Invalid signature")
+        raise GiteaSignatureError("Invalid signature")

-    logger.info("github_signature_verification_success")
+    logger.info("gitea_signature_verification_success")
    return True


-def verify_allowed_repo(full_name: str) -> bool:
+def verify_gitea_allowed_repo(full_name: str) -> bool:
    """
    驗證倉庫是否在白名單中

@@ -264,20 +221,20 @@ def verify_allowed_repo(full_name: str) -> bool:
    Returns:
        bool: 是否允許
    """
-    allowed_repos = settings.get_github_allowed_repos()
+    allowed_repos = settings.get_gitea_allowed_repos()

    # 如果白名單為空，開發環境允許所有
    if not allowed_repos:
        if settings.ENVIRONMENT == "prod":
            logger.warning(
-                "github_allowed_repos_empty_in_production",
+                "gitea_allowed_repos_empty_in_production",
                repo=full_name,
                message="No allowed repos configured in production",
            )
            return False
        # 開發環境: 白名單空 = 允許所有
        logger.debug(
-            "github_repo_allowed_dev_mode",
+            "gitea_repo_allowed_dev_mode",
            repo=full_name,
            reason="Empty whitelist in dev mode",
        )
@@ -287,7 +244,7 @@ def verify_allowed_repo(full_name: str) -> bool:
    is_allowed = full_name in allowed_repos
    if not is_allowed:
        logger.warning(
-            "github_repo_not_in_whitelist",
+            "gitea_repo_not_in_whitelist",
            repo=full_name,
            allowed_repos=allowed_repos,
        )
@@ -300,20 +257,20 @@ def verify_allowed_repo(full_name: str) -> bool:

@router.post(
    "",
-    response_model=GitHubWebhookResponse,
+    response_model=GiteaWebhookResponse,
    status_code=status.HTTP_202_ACCEPTED,
-    summary="GitHub Webhook 接收端點",
-    description="接收 GitHub PR/Push 事件並觸發 AI 代碼審查",
+    summary="Gitea Webhook 接收端點",
+    description="接收 Gitea PR/Push 事件並觸發 AI 代碼審查",
 )
-async def handle_github_webhook(
+async def handle_gitea_webhook(
    request: Request,
    background_tasks: BackgroundTasks,
-    x_github_event: str | None = Header(None, alias="X-GitHub-Event"),
-    x_github_delivery: str | None = Header(None, alias="X-GitHub-Delivery"),
-    x_hub_signature_256: str | None = Header(None, alias="X-Hub-Signature-256"),
+    x_gitea_event: str | None = Header(None, alias="X-Gitea-Event"),
+    x_gitea_delivery: str | None = Header(None, alias="X-Gitea-Delivery"),
+    x_gitea_signature: str | None = Header(None, alias="X-Gitea-Signature"),
 ):
    """
-    GitHub Webhook Handler
+    Gitea Webhook Handler

    支援事件:
    - pull_request (opened, synchronize, reopened)
@@ -323,14 +280,14 @@ async def handle_github_webhook(
    1. 驗證簽章
    2. 驗證倉庫白名單
    3. 解析事件類型
-    4. 背景執行 AI 審查 (委派給 GitHubWebhookService)
+    4. 背景執行 AI 審查 (委派給 GiteaWebhookService)
    """
    try:
        # 1. 驗證 HMAC 簽章
        try:
-            await verify_github_signature(request, x_hub_signature_256)
-        except GitHubSignatureError as e:
-            logger.warning("github_webhook_signature_failed", error=str(e))
+            await verify_gitea_signature(request, x_gitea_signature)
+        except GiteaSignatureError as e:
+            logger.warning("gitea_webhook_signature_failed", error=str(e))
            raise HTTPException(
                status_code=status.HTTP_401_UNAUTHORIZED,
                detail=str(e),
@@ -339,52 +296,51 @@ async def handle_github_webhook(
        # 2. 解析 Payload
        body = await request.body()
        payload_dict = json.loads(body)
-        payload = GitHubWebhookPayload(**payload_dict)
+        payload = GiteaWebhookPayload(**payload_dict)

        # 3. 驗證倉庫白名單
-        if not verify_allowed_repo(payload.repository.full_name):
-            return GitHubWebhookResponse(
+        if not verify_gitea_allowed_repo(payload.repository.full_name):
+            return GiteaWebhookResponse(
                status="ignored",
                message=f"Repository {payload.repository.full_name} not in whitelist",
-                event_type=x_github_event,
+                event_type=x_gitea_event,
            )

        # 4. 根據事件類型處理
        logger.info(
-            "github_webhook_received",
-            github_event=x_github_event,
-            delivery_id=x_github_delivery,
+            "gitea_webhook_received",
+            gitea_event=x_gitea_event,
+            delivery_id=x_gitea_delivery,
            repo=payload.repository.full_name,
            sender=payload.sender.login,
        )

        # Pull Request 事件
-        if x_github_event == "pull_request":
+        if x_gitea_event == "pull_request":
            return await handle_pull_request(
                payload=payload,
                background_tasks=background_tasks,
-                delivery_id=x_github_delivery,
+                delivery_id=x_gitea_delivery,
            )

        # Push 事件
-        elif x_github_event == "push":
+        elif x_gitea_event == "push":
            return await handle_push(
                payload=payload,
                background_tasks=background_tasks,
-                delivery_id=x_github_delivery,
+                delivery_id=x_gitea_delivery,
            )

-        # Workflow Run 事件 (Phase 13.1 #76 CI 失敗診斷)
-        elif x_github_event == "workflow_run":
+        # workflow_run 事件 (ADR-074 M3: CI/CD 管線失敗告警)
+        elif x_gitea_event == "workflow_run":
            return await handle_workflow_run(
                payload=payload,
                background_tasks=background_tasks,
-                delivery_id=x_github_delivery,
            )

-        # Ping 事件 (GitHub 測試連線)
-        elif x_github_event == "ping":
-            return GitHubWebhookResponse(
+        # Ping 事件 (Gitea 測試連線)
+        elif x_gitea_event == "ping":
+            return GiteaWebhookResponse(
                status="accepted",
                message="Pong! Webhook configured successfully.",
                event_type="ping",
@@ -392,16 +348,16 @@ async def handle_github_webhook(

        # 其他事件 (忽略)
        else:
-            return GitHubWebhookResponse(
+            return GiteaWebhookResponse(
                status="ignored",
-                message=f"Event type '{x_github_event}' not supported",
-                event_type=x_github_event,
+                message=f"Event type '{x_gitea_event}' not supported",
+                event_type=x_gitea_event,
            )

    except HTTPException:
        raise
    except Exception as e:
-        logger.exception("github_webhook_processing_failed", error=str(e))
+        logger.exception("gitea_webhook_processing_failed", error=str(e))
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail="Internal error processing webhook",
@@ -413,10 +369,10 @@ async def handle_github_webhook(
 # =============================================================================

 async def handle_pull_request(
-    payload: GitHubWebhookPayload,
+    payload: GiteaWebhookPayload,
    background_tasks: BackgroundTasks,
    delivery_id: str | None,  # noqa: ARG001 — reserved for idempotency (future use)
-) -> GitHubWebhookResponse:
+) -> GiteaWebhookResponse:
    """
    處理 Pull Request 事件

@@ -427,7 +383,7 @@ async def handle_pull_request(
    """
    pr = payload.pull_request
    if not pr:
-        return GitHubWebhookResponse(
+        return GiteaWebhookResponse(
            status="error",
            message="Missing pull_request data",
            event_type="pull_request",
@@ -436,17 +392,17 @@ async def handle_pull_request(
    # 只處理需要審查的 action
    supported_actions = {"opened", "synchronize", "reopened"}
    if payload.action not in supported_actions:
-        return GitHubWebhookResponse(
+        return GiteaWebhookResponse(
            status="ignored",
            message=f"PR action '{payload.action}' not supported",
            event_type="pull_request",
        )

    # 生成審查 ID
-    review_id = f"gh-pr-{payload.repository.id}-{pr.number}-{uuid.uuid4().hex[:8]}"
+    review_id = f"gitea-pr-{payload.repository.id}-{pr.number}-{uuid.uuid4().hex[:8]}"

    # 背景執行審查 (委派給 Service)
-    service = get_github_webhook_service()
+    service = get_gitea_webhook_service()
    background_tasks.add_task(
        service.review_pull_request,
        repo=payload.repository,
@@ -457,7 +413,7 @@ async def handle_pull_request(
    )

    logger.info(
-        "github_pr_review_scheduled",
+        "gitea_pr_review_scheduled",
        review_id=review_id,
        repo=payload.repository.full_name,
        pr_number=pr.number,
@@ -465,7 +421,7 @@ async def handle_pull_request(
        action=payload.action,
    )

-    return GitHubWebhookResponse(
+    return GiteaWebhookResponse(
        status="accepted",
        message=f"PR #{pr.number} review scheduled",
        event_type="pull_request",
@@ -474,10 +430,10 @@ async def handle_pull_request(


 async def handle_push(
-    payload: GitHubWebhookPayload,
+    payload: GiteaWebhookPayload,
    background_tasks: BackgroundTasks,
    delivery_id: str | None,  # noqa: ARG001 — reserved for idempotency (future use)
-) -> GitHubWebhookResponse:
+) -> GiteaWebhookResponse:
    """
    處理 Push 事件

@@ -487,7 +443,7 @@ async def handle_push(
    ref = payload.ref or ""
    # 通常是 refs/heads/main 或 refs/heads/master
    if not (ref.endswith("/main") or ref.endswith("/master")):
-        return GitHubWebhookResponse(
+        return GiteaWebhookResponse(
            status="ignored",
            message=f"Push to non-default branch: {ref}",
            event_type="push",
@@ -495,17 +451,17 @@ async def handle_push(

    commits = payload.commits or []
    if not commits:
-        return GitHubWebhookResponse(
+        return GiteaWebhookResponse(
            status="ignored",
            message="No commits in push",
            event_type="push",
        )

    # 生成審查 ID
-    review_id = f"gh-push-{payload.repository.id}-{payload.after[:8]}-{uuid.uuid4().hex[:8]}"
+    review_id = f"gitea-push-{payload.repository.id}-{payload.after[:8]}-{uuid.uuid4().hex[:8]}"

    # 背景執行審查 (委派給 Service)
-    service = get_github_webhook_service()
+    service = get_gitea_webhook_service()
    background_tasks.add_task(
        service.review_push,
        repo=payload.repository,
@@ -518,7 +474,7 @@ async def handle_push(
    )

    logger.info(
-        "github_push_review_scheduled",
+        "gitea_push_review_scheduled",
        review_id=review_id,
        repo=payload.repository.full_name,
        ref=ref,
@@ -526,7 +482,7 @@ async def handle_push(
        after_sha=payload.after[:8] if payload.after else None,
    )

-    return GitHubWebhookResponse(
+    return GiteaWebhookResponse(
        status="accepted",
        message=f"Push with {len(commits)} commit(s) review scheduled",
        event_type="push",
@@ -535,67 +491,81 @@ async def handle_push(


 async def handle_workflow_run(
-    payload: GitHubWebhookPayload,
+    payload: GiteaWebhookPayload,
    background_tasks: BackgroundTasks,
-    delivery_id: str | None,  # noqa: ARG001 — reserved for idempotency (future use)
-) -> GitHubWebhookResponse:
+) -> GiteaWebhookResponse:
    """
-    處理 Workflow Run 事件 (Phase 13.1 #76 CI 失敗診斷)
+    處理 Gitea Actions workflow_run 事件 — ADR-074 M3

-    只處理 completed + failure 的 workflow run
+    只處理 status=failure（或 conclusion=failure）的管線失敗。
+    建立 TYPE-1 Incident（純通知，不自動修復）。
    """
-    workflow_run = payload.workflow_run
-    if not workflow_run:
-        return GitHubWebhookResponse(
-            status="ignored",
-            message="No workflow_run in payload",
+    wf = payload.workflow_run
+    if not wf:
+        return GiteaWebhookResponse(
+            status="error",
+            message="Missing workflow_run data",
            event_type="workflow_run",
        )

-    # 只處理 completed 狀態
-    if workflow_run.status != "completed":
-        return GitHubWebhookResponse(
+    # 只關心失敗
+    failed = wf.status == "failure" or wf.conclusion == "failure"
+    if not failed:
+        return GiteaWebhookResponse(
            status="ignored",
-            message=f"Workflow status '{workflow_run.status}' not completed",
+            message=f"workflow_run status='{wf.status}' conclusion='{wf.conclusion}' — not a failure",
            event_type="workflow_run",
        )

-    # 只處理失敗的 workflow
-    if workflow_run.conclusion not in ("failure", "timed_out"):
-        return GitHubWebhookResponse(
-            status="ignored",
-            message=f"Workflow conclusion '{workflow_run.conclusion}' is not failure",
-            event_type="workflow_run",
-        )
+    repo = payload.repository.full_name
+    branch = wf.head_branch or "unknown"
+    sha_short = (wf.head_sha or "")[:8] or "unknown"
+    run_url = wf.html_url or ""

-    # 生成診斷 ID
-    diagnosis_id = f"gh-ci-{payload.repository.id}-{workflow_run.id}-{uuid.uuid4().hex[:8]}"
-
-    # 背景執行 CI 失敗診斷 (委派給 Service)
-    service = get_github_webhook_service()
-    background_tasks.add_task(
-        service.diagnose_ci_failure,
-        repo=payload.repository,
-        workflow_run=workflow_run,
-        sender=payload.sender,
-        diagnosis_id=diagnosis_id,
+    logger.warning(
+        "gitea_ci_pipeline_failed",
+        repo=repo,
+        workflow=wf.name,
+        branch=branch,
+        sha=sha_short,
+        run_url=run_url,
    )

-    logger.info(
-        "github_ci_failure_diagnosis_scheduled",
-        diagnosis_id=diagnosis_id,
-        repo=payload.repository.full_name,
-        workflow_name=workflow_run.name,
-        workflow_id=workflow_run.id,
-        conclusion=workflow_run.conclusion,
-        head_sha=workflow_run.head_sha[:8],
-    )
+    async def _create_ci_incident() -> None:
+        try:
+            svc = get_incident_service()
+            await svc.create_incident_from_signal({
+                "alert_name": "GiteaCIPipelineFailed",
+                "severity": "warning",
+                "source": "gitea",
+                "fingerprint": f"gitea-ci-{repo}-{branch}",
+                "labels": {
+                    "alertname": "GiteaCIPipelineFailed",
+                    "severity": "warning",
+                    "repo": repo,
+                    "workflow": wf.name,
+                    "branch": branch,
+                    "sha": sha_short,
+                    "run_url": run_url,
+                    "notification_type": "TYPE-1",
+                    "alert_category": "infrastructure",
+                },
+                "annotations": {
+                    "summary": f"CI 管線失敗：{repo} [{branch}] {wf.name}",
+                    "description": (
+                        f"Gitea Actions 管線 `{wf.name}` 在 `{branch}` ({sha_short}) 失敗。\n{run_url}"
+                    ),
+                },
+            })
+        except Exception:
+            logger.exception("gitea_ci_incident_create_failed", repo=repo, workflow=wf.name)

-    return GitHubWebhookResponse(
+    background_tasks.add_task(_create_ci_incident)
+
+    return GiteaWebhookResponse(
        status="accepted",
-        message=f"CI failure diagnosis scheduled for '{workflow_run.name}'",
+        message=f"CI pipeline failure for '{wf.name}' on '{branch}' queued as TYPE-1 incident",
        event_type="workflow_run",
-        review_id=diagnosis_id,
    )


@@ -606,11 +576,11 @@ async def handle_workflow_run(
@router.get(
    "/reviews/{review_id}",
    summary="取得審查結果",
-    description="根據 review_id 取得 GitHub 代碼審查結果",
+    description="根據 review_id 取得 Gitea 代碼審查結果",
 )
 async def get_review_result(review_id: str):
    """
-    取得 GitHub 審查結果 (透過 Service)
+    取得 Gitea 審查結果 (透過 Service)

    Args:
        review_id: 審查 ID
@@ -618,7 +588,7 @@ async def get_review_result(review_id: str):
    Returns:
        dict: 審查結果
    """
-    service = get_github_webhook_service()
+    service = get_gitea_webhook_service()
    result = await service.get_review_result(review_id)

    if not result:
--- a/apps/api/src/api/v1/incidents.py
+++ b/apps/api/src/api/v1/incidents.py
@@ -143,37 +143,56 @@ async def list_incidents() -> IncidentListResponse:

        incidents.sort(key=safe_created_at, reverse=True)

-        # Phase 6.5: 為每個事件生成決策令牌 (非同步並行)
+        # 2026-04-09 Claude Sonnet 4.6: 效能修復 — list endpoint 不同步等待 AI
+        # 原設計: 每個 incident await AI 決策 (120-180s timeout)，多 incident 時乘積爆炸
+        # 修復: 只取已存在的決策 token，若無則背景觸發生成，前端 poll 單筆 GET 取得結果
+        import asyncio
+
        responses = []
+        background_tasks = []
+
        for incident in incidents:
            try:
-                # P0/P1 給更短的 timeout (緊急)
-                # 2026-03-27 ogt: 增加超時 (Ollama CPU 模式 llama3.2:3b 約 2-3 分鐘)
-                timeout = 120.0 if incident.severity in (Severity.P0, Severity.P1) else 180.0
-
-                decision_token = await decision_manager.get_or_create_decision(
-                    incident=incident,
-                    timeout_sec=timeout,
-                )
-
-                decision_info = DecisionInfo(
-                    token=decision_token.token,
-                    state=decision_token.state.value,
-                    proposal_data=decision_token.proposal_data,
-                    proposal_id=decision_token.proposal_id,
-                )
-
-                responses.append(IncidentResponse.from_incident(incident, decision_info))
-
+                # 只查已快取的決策 (不等待 AI，立即返回)
+                existing = await decision_manager._find_existing_token(incident.incident_id)
+                if existing:
+                    decision_info = DecisionInfo(
+                        token=existing.token,
+                        state=existing.state.value,
+                        proposal_data=existing.proposal_data,
+                        proposal_id=existing.proposal_id,
+                    )
+                    responses.append(IncidentResponse.from_incident(incident, decision_info))
+                else:
+                    # 無快取 → 背景觸發，本次返回 None（前端看到 decision=null 會 poll）
+                    responses.append(IncidentResponse.from_incident(incident, None))
+                    # 2026-04-16 Claude Sonnet 4.6: 只對 48h 內的 incident 觸發 AI 分析
+                    # 舊 incident token 每小時過期，若不限制會反覆重新分析歷史事件 → Telegram 洪水
+                    from datetime import datetime, timezone, timedelta
+                    _created = getattr(incident, "created_at", None)
+                    _too_old = False
+                    if _created:
+                        if _created.tzinfo is None:
+                            _created = _created.replace(tzinfo=timezone.utc)
+                        _too_old = (_created < datetime.now(timezone.utc) - timedelta(hours=48))
+                    if not _too_old:
+                        timeout = 120.0 if incident.severity in (Severity.P0, Severity.P1) else 180.0
+                        background_tasks.append(
+                            decision_manager.get_or_create_decision(incident=incident, timeout_sec=timeout)
+                        )
            except Exception as e:
                logger.warning(
                    "decision_generation_failed",
                    incident_id=incident.incident_id,
                    error=str(e),
                )
-                # 即使決策生成失敗，也返回事件 (不含 decision)
                responses.append(IncidentResponse.from_incident(incident, None))

+        # 背景觸發 AI 決策（fire-and-forget，不阻塞 response）
+        if background_tasks:
+            for task in background_tasks:
+                asyncio.create_task(task)
+
        logger.info(
            "incidents_listed",
            count=len(incidents),
--- a/apps/api/src/api/v1/knowledge.py
+++ b/apps/api/src/api/v1/knowledge.py
@@ -36,7 +36,7 @@ router = APIRouter(prefix="/knowledge", tags=["Knowledge Base"])
 # Endpoints
 # =============================================================================

-@router.get("/", response_model=KnowledgeListResponse)
+@router.get("", response_model=KnowledgeListResponse)
 async def list_entries(
    category: str | None = Query(None, description="篩選分類"),
    entry_type: EntryType | None = Query(None, description="篩選類型"),
@@ -67,6 +67,36 @@ async def search_entries(
    return await service.search(q, limit)


+@router.get("/semantic-search")
+async def semantic_search(
+    q: str = Query(..., min_length=1, description="語意搜尋查詢"),
+    limit: int = Query(10, ge=1, le=50),
+    threshold: float = Query(0.5, ge=0.0, le=1.0, description="相似度門檻 (0-1)"),
+) -> list[dict]:
+    """
+    語意搜尋 (pgvector cosine similarity)
+
+    使用 nomic-embed-text 向量模型，回傳含相似度分數的結果。
+    """
+    service = get_knowledge_service()
+    results = await service.semantic_search(q, limit=limit, threshold=threshold)
+    return [
+        {**entry.model_dump(), "score": round(score, 4)}
+        for entry, score in results
+    ]
+
+
+@router.post("/embed-all", status_code=200)
+async def embed_all_entries() -> dict:
+    """
+    管理端點: 批次為所有未 embed 的條目產生 embedding
+
+    Returns: {"total": N, "success": N, "failed": N}
+    """
+    service = get_knowledge_service()
+    return await service.embed_all_entries()
+
+
@router.get("/categories")
 async def get_categories() -> list[dict]:
    """取得分類樹 (含各類數量)"""
@@ -85,7 +115,7 @@ async def get_entry(entry_id: str) -> KnowledgeEntry:
    return entry


-@router.post("/", response_model=KnowledgeEntry, status_code=201)
+@router.post("", response_model=KnowledgeEntry, status_code=201)
 async def create_entry(data: KnowledgeEntryCreate) -> KnowledgeEntry:
    """建立新知識條目"""
    service = get_knowledge_service()
@@ -122,3 +152,48 @@ async def archive_entry(entry_id: str) -> None:
    success = await service.archive_entry(entry_id)
    if not success:
        raise HTTPException(status_code=404, detail="Knowledge entry not found")
+
+
+# =============================================================================
+# Phase 33 (ADR-067 2026-04-10): RAG pgvector 端點
+# =============================================================================
+
+@router.post("/rag/index", status_code=200)
+async def rag_index_document(body: dict) -> dict:
+    """
+    索引文件到 RAG 知識庫 (pgvector rag_chunks 表)
+
+    body: {source, source_id, title, text, metadata?}
+    """
+    from src.services.knowledge_rag_service import get_knowledge_rag_service
+    svc = get_knowledge_rag_service()
+    ok = await svc.index_document(
+        source=body.get("source", "manual"),
+        source_id=body.get("source_id", ""),
+        title=body.get("title", ""),
+        text=body.get("text", ""),
+        metadata=body.get("metadata", {}),
+    )
+    return {"ok": ok, "title": body.get("title", "")}
+
+
+@router.get("/rag/query")
+async def rag_query(
+    q: str = Query(..., min_length=1, description="RAG 查詢問題"),
+    top_k: int = Query(5, ge=1, le=20),
+) -> dict:
+    """
+    RAG 查詢 — embedding → pgvector knn → 生成繁中回答
+    """
+    from src.services.knowledge_rag_service import get_knowledge_rag_service
+    svc = get_knowledge_rag_service()
+    answer = await svc.query(q, top_k=top_k)
+    return {"question": q, "answer": answer}
+
+
+@router.get("/rag/stats")
+async def rag_stats() -> dict:
+    """RAG 知識庫統計"""
+    from src.services.knowledge_rag_service import get_knowledge_rag_service
+    svc = get_knowledge_rag_service()
+    return await svc.get_stats()
--- a/apps/api/src/api/v1/learning.py
+++ b/apps/api/src/api/v1/learning.py
@@ -18,7 +18,7 @@ Phase D-G P0 修正: 新增學習 API 端點
 """

 import structlog
-from fastapi import APIRouter
+from fastapi import APIRouter, HTTPException
 from pydantic import BaseModel

 from src.services.learning_service import get_learning_service
@@ -125,3 +125,60 @@ async def get_recommendation(anomaly_key: str) -> RecommendationResponse:
    )

    return RecommendationResponse(**recommendation)
+
+
+# =============================================================================
+# Evolver Admin Endpoints
+# =============================================================================
+
+
+class EvolverRunResponse(BaseModel):
+    """Evolver 執行報告回應"""
+    archived_count: int
+    merged_count: int
+    skipped_count: int
+    archived_ids: list[str]
+    merged_pairs: list[list[str]]  # [[dropped_id, kept_id], ...]
+    errors: list[str]
+    total_affected: int
+
+
+@router.post(
+    "/evolver/run",
+    response_model=EvolverRunResponse,
+    summary="手動觸發 Evolver Agent",
+    description=(
+        "立即執行 Playbook Evolver：低信任封存 + 休眠封存 + 相似合併。"
+        "需要 AIOPS_P3_EVOLVER_ENABLED=True，否則返回空報告（HTTP 200）。"
+        "ADR-083 Phase 3 手動演練端點。"
+    ),
+)
+async def run_evolver_now() -> EvolverRunResponse:
+    """
+    手動觸發 Evolver Agent（Phase 3 exit condition #6 演練端點）
+
+    Returns:
+        EvolverRunResponse: 合併/封存報告
+    """
+    try:
+        from src.services.playbook_evolver import run_evolver
+        report = await run_evolver(force=True)  # 管理員手動觸發，繞過 feature flag
+    except Exception as exc:
+        logger.exception("evolver_manual_run_failed")
+        raise HTTPException(status_code=500, detail=str(exc)) from exc
+
+    logger.info(
+        "evolver_manual_run_done",
+        archived=report.archived_count,
+        merged=report.merged_count,
+        skipped=report.skipped_count,
+    )
+    return EvolverRunResponse(
+        archived_count=report.archived_count,
+        merged_count=report.merged_count,
+        skipped_count=report.skipped_count,
+        archived_ids=report.archived_ids,
+        merged_pairs=[list(p) for p in report.merged_pairs],
+        errors=report.errors,
+        total_affected=report.total_affected,
+    )
--- a/apps/api/src/api/v1/monitoring.py
+++ b/apps/api/src/api/v1/monitoring.py
@@ -0,0 +1,247 @@
+"""
+Monitoring Status API
+=====================
+探測所有可觀測性工具狀態：
+  Grafana / Prometheus / Sentry / Langfuse / SigNoz / Gitea
+
+所有探測從後端發出，不暴露內網 IP 給前端。
+Grafana: 110:3002 (Docker 3002→3000)
+
+建立時間: 2026-04-03 (台北時區)
+建立者: Claude Code
+更新時間: 2026-04-03 加入 Grafana(3002) / Sentry / Langfuse
+"""
+
+import asyncio
+from datetime import UTC, datetime
+
+import httpx
+from fastapi import APIRouter
+
+from src.core.logging import get_logger
+
+logger = get_logger(__name__)
+
+router = APIRouter(prefix="/monitoring", tags=["Monitoring"])
+
+TIMEOUT = 3.0
+
+
+# =============================================================================
+# Probes
+# =============================================================================
+
+async def _probe_grafana(client: httpx.AsyncClient) -> dict:
+    base = "http://192.168.0.110:3002"
+    try:
+        r = await client.get(f"{base}/api/health", timeout=TIMEOUT)
+        if r.status_code == 200:
+            data = r.json()
+            version = data.get("version")
+            # Dashboard count requires basic auth (internal probe only)
+            import base64 as _b64
+            _token = _b64.b64encode(b"admin:WoooTech2026").decode()
+            dash_r = await client.get(
+                f"{base}/api/search?type=dash-db",
+                headers={"Authorization": f"Basic {_token}"},
+                timeout=TIMEOUT,
+            )
+            dash_count = len(dash_r.json()) if dash_r.status_code == 200 and isinstance(dash_r.json(), list) else None
+            return {
+                "name": "Grafana",
+                "status": "up",
+                "version": version,
+                "stats": f"面板 {dash_count} 個" if dash_count is not None else "監控面板",
+                "description": "指標視覺化 · Dashboard",
+                "url": base,
+            }
+    except Exception as e:
+        logger.warning("grafana_probe_failed", error=str(e))
+    return {
+        "name": "Grafana", "status": "down", "version": None,
+        "stats": None, "description": "指標視覺化 · Dashboard", "url": base,
+    }
+
+
+async def _probe_prometheus(client: httpx.AsyncClient) -> dict:
+    base = "http://192.168.0.110:9090"
+    try:
+        health_r = await client.get(f"{base}/-/healthy", timeout=TIMEOUT)
+        if health_r.status_code == 200:
+            build_r = await client.get(f"{base}/api/v1/status/buildinfo", timeout=TIMEOUT)
+            version = None
+            if build_r.status_code == 200:
+                version = build_r.json().get("data", {}).get("version")
+            rules_r = await client.get(f"{base}/api/v1/rules", timeout=TIMEOUT)
+            rules_count = 0
+            firing_count = 0
+            if rules_r.status_code == 200:
+                groups = rules_r.json().get("data", {}).get("groups", [])
+                rules_count = sum(len(g.get("rules", [])) for g in groups)
+                firing_count = sum(
+                    1 for g in groups for r in g.get("rules", [])
+                    if r.get("state") == "firing"
+                )
+            stats_parts = [f"規則 {rules_count} 條"]
+            if firing_count > 0:
+                stats_parts.append(f"{firing_count} 觸發")
+            return {
+                "name": "Prometheus",
+                "status": "up",
+                "version": version,
+                "stats": " · ".join(stats_parts),
+                "description": "時序資料庫 · 告警規則",
+                "firing_count": firing_count,
+                "url": base,
+            }
+    except Exception as e:
+        logger.warning("prometheus_probe_failed", error=str(e))
+    return {
+        "name": "Prometheus", "status": "down", "version": None,
+        "stats": None, "description": "時序資料庫 · 告警規則", "firing_count": 0, "url": base,
+    }
+
+
+async def _probe_sentry(client: httpx.AsyncClient) -> dict:
+    base = "http://192.168.0.110:9000"
+    try:
+        r = await client.get(f"{base}/_health/", timeout=TIMEOUT)
+        if r.status_code == 200 and r.text.strip() == "ok":
+            ver_r = await client.get(f"{base}/api/0/", timeout=TIMEOUT)
+            version = None
+            if ver_r.status_code == 200:
+                raw = ver_r.json().get("version")
+                if isinstance(raw, dict):
+                    version = raw.get("version")
+                elif raw:
+                    version = str(raw)
+            return {
+                "name": "Sentry",
+                "status": "up",
+                "version": version,
+                "stats": "Error Tracking · Issue",
+                "description": "錯誤追蹤 · Issue 管理",
+                "url": base,
+            }
+    except Exception as e:
+        logger.warning("sentry_probe_failed", error=str(e))
+    return {
+        "name": "Sentry", "status": "down", "version": None,
+        "stats": None, "description": "錯誤追蹤 · Issue 管理", "url": base,
+    }
+
+
+async def _probe_langfuse(client: httpx.AsyncClient) -> dict:
+    base = "http://192.168.0.110:3100"
+    try:
+        r = await client.get(f"{base}/api/public/health", timeout=TIMEOUT)
+        if r.status_code == 200:
+            data = r.json()
+            version = data.get("version")
+            return {
+                "name": "Langfuse",
+                "status": "up",
+                "version": version,
+                "stats": "LLM Tracing · AI 觀測",
+                "description": "LLM 追蹤 · AI 成本監控",
+                "url": base,
+            }
+    except Exception as e:
+        logger.warning("langfuse_probe_failed", error=str(e))
+    return {
+        "name": "Langfuse", "status": "down", "version": None,
+        "stats": None, "description": "LLM 追蹤 · AI 成本監控", "url": base,
+    }
+
+
+async def _probe_signoz(client: httpx.AsyncClient) -> dict:
+    base = "http://192.168.0.188:3301"
+    try:
+        r = await client.get(f"{base}/api/v1/health", timeout=TIMEOUT)
+        if r.status_code == 200:
+            return {
+                "name": "SigNoz",
+                "status": "up",
+                "version": None,
+                "stats": "APM · Trace · Log",
+                "description": "可觀測性平台 · OTEL",
+                "url": base,
+            }
+    except Exception as e:
+        logger.warning("signoz_probe_failed", error=str(e))
+    try:
+        r2 = await client.get(f"{base}/", timeout=TIMEOUT)
+        if r2.status_code in (200, 301, 302):
+            return {
+                "name": "SigNoz", "status": "up", "version": None,
+                "stats": "APM · Trace · Log", "description": "可觀測性平台 · OTEL", "url": base,
+            }
+    except Exception:
+        pass
+    return {
+        "name": "SigNoz", "status": "down", "version": None,
+        "stats": None, "description": "可觀測性平台 · OTEL", "url": base,
+    }
+
+
+async def _probe_gitea(client: httpx.AsyncClient) -> dict:
+    base = "http://192.168.0.110:3001"
+    try:
+        # Use /api/v1/version — /-/readiness returns 404 on this Gitea version
+        ver_r = await client.get(f"{base}/api/v1/version", timeout=TIMEOUT)
+        if ver_r.status_code == 200:
+            version = ver_r.json().get("version")
+            return {
+                "name": "Gitea",
+                "status": "up",
+                "version": version,
+                "stats": "CI/CD · Git · Mirror",
+                "description": "代碼倉庫 · Pipeline",
+                "url": base,
+            }
+    except Exception as e:
+        logger.warning("gitea_probe_failed", error=str(e))
+    return {
+        "name": "Gitea", "status": "down", "version": None,
+        "stats": None, "description": "代碼倉庫 · Pipeline", "url": base,
+    }
+
+
+# =============================================================================
+# Router
+# =============================================================================
+
+@router.get("/status")
+async def get_monitoring_status() -> dict:
+    """
+    並行探測所有可觀測性工具狀態
+
+    工具清單: Grafana(3002) / Prometheus / Sentry / Langfuse / SigNoz / Gitea
+    注意: Loki 未安裝 (ADR: SigNoz 統一派)
+
+    Returns:
+        dict with tools list, each containing name/status/version/stats/description
+    """
+    async with httpx.AsyncClient(follow_redirects=True) as client:
+        results = await asyncio.gather(
+            _probe_grafana(client),
+            _probe_prometheus(client),
+            _probe_sentry(client),
+            _probe_langfuse(client),
+            _probe_signoz(client),
+            _probe_gitea(client),
+            return_exceptions=True,
+        )
+
+    now = datetime.now(UTC).isoformat()
+    tools = []
+    for r in results:
+        if isinstance(r, Exception):
+            logger.error("monitoring_probe_exception", error=str(r))
+            continue
+        tools.append({**r, "checked_at": now})
+
+    return {
+        "tools": tools,
+        "checked_at": now,
+    }
--- a/apps/api/src/api/v1/notifications.py
+++ b/apps/api/src/api/v1/notifications.py
@@ -0,0 +1,85 @@
+"""
+通知頻道狀態 API
+================
+GET /api/v1/notifications/channels — 回傳所有通知頻道的真實狀態
+
+頻道:
+  1. Telegram (OpenClaw bot)   — 檢查 BOT_TOKEN 是否設定
+  2. Telegram (AWOOOI bot)     — 檢查 AWOOOI_TG_BOT_TOKEN 是否設定
+  3. SSE (Server-Sent Events)  — 永遠 active (HTTP endpoint 存在)
+  4. Redis Stream              — 檢查 Redis 連線狀態
+
+建立時間: 2026-04-10 (台北時區)
+建立者: Claude Code (Sprint 5R B5 修復)
+"""
+
+from __future__ import annotations
+
+from fastapi import APIRouter
+
+from src.core.config import settings
+
+router = APIRouter()
+
+
+@router.get("/notifications/channels")
+async def list_notification_channels() -> list[dict]:
+    """
+    回傳所有通知頻道的真實設定狀態。
+    不做網路探測 (避免延遲)，只檢查 config 是否完整。
+    """
+    channels: list[dict] = []
+
+    # 1. OpenClaw Telegram Bot (告警 + 審核按鈕)
+    openclaw_ok = bool(
+        getattr(settings, "OPENCLAW_TG_BOT_TOKEN", None)
+        or getattr(settings, "OPENCLAW_BOT_TOKEN", None)
+    )
+    channels.append({
+        "name": "Telegram (OpenClaw Bot)",
+        "type": "telegram",
+        "status": "active" if openclaw_ok else "error",
+        "description": "告警通知 + HITL 審核按鈕",
+        "features": ["alerts", "approvals", "auto_repair"],
+    })
+
+    # 2. Nemotron Telegram Bot (AI 回覆)
+    nemotron_ok = bool(getattr(settings, "NEMOTRON_BOT_TOKEN", None))
+    channels.append({
+        "name": "Telegram (Nemotron Bot)",
+        "type": "telegram",
+        "status": "active" if nemotron_ok else "error",
+        "description": "AI 分析結果回覆",
+        "features": ["ai_responses", "rag_query"],
+    })
+
+    # 3. SSE (Server-Sent Events) — 前端實時推播
+    channels.append({
+        "name": "SSE (Web Stream)",
+        "type": "sse",
+        "status": "active",
+        "description": "前端儀表板實時數據推播",
+        "features": ["dashboard", "approvals", "incidents"],
+        "endpoint": "/api/v1/dashboard/stream",
+    })
+
+    # 4. Redis Stream — 感測器信號通道
+    try:
+        import redis.asyncio as aioredis
+        r = aioredis.from_url(settings.REDIS_URL, socket_connect_timeout=1)
+        await r.ping()
+        await r.aclose()
+        redis_status = "active"
+    except Exception:
+        redis_status = "error"
+
+    channels.append({
+        "name": "Redis Stream (awoooi:signals)",
+        "type": "stream",
+        "status": redis_status,
+        "description": "Sensor Agent 信號通道",
+        "features": ["sensor_signals", "dedup"],
+        "stream_key": "awoooi:signals",
+    })
+
+    return channels
--- a/apps/api/src/api/v1/playbooks.py
+++ b/apps/api/src/api/v1/playbooks.py
@@ -22,6 +22,7 @@ from src.models.playbook import (
    PlaybookListResponse,
    PlaybookRecommendation,
    PlaybookResponse,
+    PlaybookSource,
    PlaybookStatus,
    PlaybookUpdateRequest,
    SymptomPatternRequest,
@@ -51,11 +52,33 @@ class DeletePlaybookResponse(BaseModel):
    message: str


+class CreatePlaybookResponse(BaseModel):
+    """建立 Playbook 回應"""
+
+    success: bool
+    playbook_id: str
+    message: str
+
+
 # =============================================================================
 # API Endpoints
 # =============================================================================


+# 2026-04-05 Claude Code: Sprint 3 — 直接建立 Playbook (seed 腳本用)
+@router.post("/", response_model=CreatePlaybookResponse)
+async def create_playbook(playbook: Playbook) -> CreatePlaybookResponse:
+    """直接建立 Playbook（管理/seed 用途）"""
+    service = get_playbook_service()
+    playbook.source = PlaybookSource.MANUAL
+    saved = await service.create(playbook)
+    return CreatePlaybookResponse(
+        success=True,
+        playbook_id=saved.playbook_id,
+        message=f"Playbook '{saved.name}' created",
+    )
+
+
@router.post("/extract/{incident_id}", response_model=ExtractPlaybookResponse)
 async def extract_playbook(
    incident_id: str,
--- a/apps/api/src/api/v1/rag.py
+++ b/apps/api/src/api/v1/rag.py
@@ -0,0 +1,111 @@
+"""
+RAG 知識庫 API Router - Phase 33
+===================================
+leWOOOgo 原則: Router 只做 HTTP 轉發，業務邏輯在 KnowledgeRAGService
+
+版本: v1.0
+建立: 2026-04-10 (台北時區)
+建立者: Claude Code (Phase 33 ADR-067)
+"""
+
+from fastapi import APIRouter, BackgroundTasks, HTTPException
+from pydantic import BaseModel
+
+from src.services.knowledge_rag_service import get_knowledge_rag_service
+
+router = APIRouter(prefix="/rag", tags=["RAG Knowledge Base"])
+
+
+class RagQueryRequest(BaseModel):
+    question: str
+    top_k: int = 5
+
+
+class RagQueryResponse(BaseModel):
+    answer: str
+    question: str
+
+
+class RagIndexResponse(BaseModel):
+    status: str
+    message: str
+
+
+@router.post("/index", response_model=RagIndexResponse, summary="觸發知識庫全量索引")
+async def trigger_index(background_tasks: BackgroundTasks) -> RagIndexResponse:
+    """
+    觸發文件向量化索引（背景執行）
+
+    索引來源：
+    - docs/runbooks/*.md
+    - docs/adr/*.md
+    - docs/LOGBOOK.md
+    - .agents/skills/*.md
+    """
+    background_tasks.add_task(_run_index)
+    return RagIndexResponse(
+        status="accepted",
+        message="索引已排程，背景執行中（nomic-embed-text @ Ollama 111）",
+    )
+
+
+@router.post("/query", response_model=RagQueryResponse, summary="語義查詢知識庫")
+async def query_rag(request: RagQueryRequest) -> RagQueryResponse:
+    """語義搜尋知識庫，用 deepseek-r1:14b 生成回答"""
+    svc = get_knowledge_rag_service()
+    answer = await svc.query(request.question, top_k=request.top_k)
+    return RagQueryResponse(answer=answer, question=request.question)
+
+
+@router.get("/debug", summary="RAG 容器環境診斷", include_in_schema=False)
+async def rag_debug() -> dict:
+    """診斷用：確認容器內 docs 路徑 + Ollama 連線"""
+    import os
+    from pathlib import Path
+    import httpx
+
+    paths_check = {}
+    for p in ["docs/runbooks", "docs/adr", "docs", ".agents/skills"]:
+        d = Path(p)
+        paths_check[p] = {
+            "exists": d.exists(),
+            "files": [f.name for f in d.glob("*.md")][:3] if d.exists() else [],
+        }
+
+    ollama_ok: bool | str = False
+    try:
+        async with httpx.AsyncClient(timeout=10.0) as c:
+            from src.core.config import get_settings as _gs
+            r = await c.post(
+                f"{_gs().OLLAMA_URL}/api/embeddings",
+                json={"model": "nomic-embed-text", "prompt": "test"},
+            )
+            ollama_ok = r.status_code == 200 if r.status_code == 200 else f"http_{r.status_code}"
+    except Exception as e:
+        ollama_ok = f"error: {type(e).__name__}: {e}"
+
+    return {"cwd": os.getcwd(), "paths": paths_check, "ollama_111_embed": ollama_ok}
+
+
+@router.get("/stats", summary="索引統計")
+async def rag_stats() -> dict:
+    """取得知識庫索引統計（chunk 數量等）"""
+    svc = get_knowledge_rag_service()
+    return await svc.get_stats()
+
+
+@router.post("/optimize", summary="建立 ivfflat 向量索引（需 >100 chunks）", include_in_schema=False)
+async def rag_optimize() -> dict:
+    """對 rag_chunks.embedding 建立 ivfflat 索引，加速向量搜尋"""
+    import src.repositories.rag_chunk_repository as rag_repo
+    return await rag_repo.create_ivfflat_index()
+
+
+# ============================================================
+# Background helper
+# ============================================================
+
+async def _run_index() -> None:
+    """背景：委派給 KnowledgeRAGService.index_all_sources() 執行"""
+    svc = get_knowledge_rag_service()
+    await svc.index_all_sources()
--- a/apps/api/src/api/v1/sentry_webhook.py
+++ b/apps/api/src/api/v1/sentry_webhook.py
@@ -76,6 +76,12 @@ class ErrorAnalysisResult(BaseModel):
    analyzed_by: str  # ollama, claude


+@router.get("/health")
+async def sentry_webhook_health() -> dict:
+    """Wave A.6 Smoke Test: Sentry Webhook 可達性探測"""
+    return {"status": "ok", "webhook": "sentry"}
+
+
@router.post("/error")
 async def handle_sentry_error(
    request: Request,
@@ -437,6 +443,7 @@ async def send_sentry_telegram_alert(
        level = error_context.get("level", "error")

        # 發送 Sentry 告警卡片 (含 Y/n 按鈕)
+        # TODO(2026-04-05): Sentry 路徑無 incident_id，待 Sentry→Incident 關聯後補傳
        await telegram.send_approval_card(
            approval_id=approval_id,
            risk_level="high" if level in ["fatal", "error"] else "medium",
--- a/apps/api/src/api/v1/signoz_webhook.py
+++ b/apps/api/src/api/v1/signoz_webhook.py
@@ -1,5 +1,7 @@
 from __future__ import annotations

+import asyncio
+
 """
 AWOOOI API - SignOz Webhook Handler
 ====================================
@@ -249,6 +251,15 @@ async def process_signoz_alert(
            approval_id=approval_id,
        )

+        # Phase 31 (2026-04-10 Claude Code ADR-067): Log 異常摘要背景推送
+        # 5s 軟超時不阻塞主流程；超時後摘要繼續跑，結果存 Redis 快取
+        pod_name = labels.get("pod", labels.get("pod_name", ""))
+        namespace = labels.get("namespace", "awoooi-prod")
+        if pod_name:
+            asyncio.create_task(
+                _send_log_summary_notification(pod_name, namespace, approval_id)
+            )
+
        # Wave A.5: 記錄告警鏈路成功 (ADR-037)
        record_alert_chain_success("signoz")
        record_alert_processed(
@@ -313,16 +324,28 @@ async def create_signoz_approval(
            action = f"[AI 建議] {analysis_result.action_title}"
        else:
            action = f"SignOz Alert: {alert_name}"
-        approval_request = ApprovalRequestCreate(
-            action=action,
-            description=description,
-            risk_level=analysis_result.risk_level if analysis_result else risk_level,
-            blast_radius=analysis_result.blast_radius if analysis_result else BlastRadius(
+        # 轉換 AIBlastRadius → BlastRadius (兩者欄位相同，enum 型別不同)
+        if analysis_result and analysis_result.blast_radius:
+            ai_br = analysis_result.blast_radius
+            blast_radius = BlastRadius(
+                affected_pods=ai_br.affected_pods,
+                estimated_downtime=ai_br.estimated_downtime,
+                related_services=ai_br.related_services,
+                data_impact=DataImpact(ai_br.data_impact.value.lower()),
+            )
+        else:
+            blast_radius = BlastRadius(
                affected_pods=1,
                estimated_downtime="0",
                related_services=[service_name],
                data_impact=DataImpact.READ_ONLY,
-            ),
+            )
+
+        approval_request = ApprovalRequestCreate(
+            action=action,
+            description=description,
+            risk_level=analysis_result.risk_level if analysis_result else risk_level,
+            blast_radius=blast_radius,
            kubectl_command=command,
            dry_run_checks=[],
            requested_by="signoz-webhook",
@@ -369,6 +392,7 @@ async def send_signoz_telegram(
        summary = annotations.get("summary", f"SignOz Alert: {alert_name}")
        description = annotations.get("description", "")

+        # TODO(2026-04-05): SignOz 路徑無 incident_id，待 SignOz→Incident 關聯後補傳
        await telegram.send_approval_card(
            approval_id=approval_id,
            risk_level=analysis_result.risk_level if analysis_result else (
@@ -400,6 +424,45 @@ async def send_signoz_telegram(
        logger.exception("signoz_telegram_error", error=str(e))


+# =============================================================================
+# Phase 31 (2026-04-10 Claude Code ADR-067): Log 摘要背景推送 helper
+# =============================================================================
+
+async def _send_log_summary_notification(
+    pod_name: str,
+    namespace: str,
+    approval_id: str,
+) -> None:
+    """
+    背景取得 Pod log 摘要並推送 Telegram
+
+    帶 5s 軟超時：超時後摘要繼續生成並存 Redis，不阻塞告警主流程
+    """
+    import html as _html
+    from src.services.log_summary_service import get_log_summary_service
+    from src.services.telegram_gateway import get_telegram_gateway
+
+    try:
+        svc = get_log_summary_service()
+        summary = await svc.summarize_with_soft_timeout(pod_name, namespace)
+
+        if not summary:
+            return
+
+        tg = get_telegram_gateway()
+        msg = (
+            f"🔍 <b>Log 異常摘要</b>\n"
+            f"Pod: <code>{_html.escape(pod_name)}</code>\n"
+            f"Approval: <code>{_html.escape(approval_id)}</code>\n\n"
+            f"{_html.escape(summary)}\n\n"
+            f"<i>deepseek-r1:14b | 免費本地推理</i>"
+        )
+        await tg.send_text(msg[:4096])
+
+    except Exception as e:
+        logger.warning("log_summary_notification_failed", pod=pod_name, error=str(e))
+
+
 # =============================================================================
 # Health Check (供 SignOz 確認 Webhook 可達)
 # =============================================================================
--- a/apps/api/src/api/v1/stats.py
+++ b/apps/api/src/api/v1/stats.py
@@ -19,14 +19,18 @@
 # @see feedback_lewooogo_modular_enforcement.md
 # =============================================================================

-from typing import Annotated
+import asyncio
+import json
+from typing import Annotated, Any

-from fastapi import APIRouter, Depends, Query
+from fastapi import APIRouter, Depends, Query, WebSocket, WebSocketDisconnect
+from fastapi.responses import PlainTextResponse
 from pydantic import BaseModel, Field

 from src.services.stats_service import StatsService, get_stats_service
 from src.services.k3s_monitor_service import K3sMonitorService, get_k3s_monitor_service
 from src.services.weekly_report_service import WeeklyReportService, get_weekly_report_service
+from src.services.flywheel_stats_service import FlywheelStatsService, get_flywheel_stats_service

 router = APIRouter(prefix="/stats", tags=["Statistics"])

@@ -401,3 +405,184 @@ async def trigger_weekly_report(
        "success": success,
        "message": "週報已發送" if success else "週報發送失敗",
    }
+
+
+# =============================================================================
+# 2026-04-07 Claude Code: Sprint 4 C1 — 告警處置統計
+# =============================================================================
+
+
+class DispositionSummary(BaseModel):
+    """處置類型分佈"""
+    total: int = Field(default=0, description="總處置次數")
+    auto_repair: int = Field(default=0, description="自動修復次數")
+    human_approved: int = Field(default=0, description="人工審核批准次數")
+    manual_resolved: int = Field(default=0, description="手動處理次數")
+    cold_start_trust: int = Field(default=0, description="冷啟動信任次數")
+    auto_rate: float = Field(default=0.0, description="自動化率 (auto_repair + cold_start) / total")
+    human_rate: float = Field(default=0.0, description="人工介入率")
+
+
+class DispositionByAnomaly(BaseModel):
+    """按異常類型的處置分佈"""
+    anomaly_key: str
+    alert_name: str = ""
+    disposition: DispositionSummary
+
+
+class DispositionResponse(BaseModel):
+    """處置統計 API 回應"""
+    summary: DispositionSummary
+    by_anomaly: list[DispositionByAnomaly] = Field(default_factory=list)
+
+
+@router.get(
+    "/disposition",
+    response_model=DispositionResponse,
+    summary="告警處置統計",
+)
+async def get_disposition_stats() -> DispositionResponse:
+    """
+    取得告警處置類型分佈統計。
+    2026-04-07 Claude Code: Sprint 4 C1
+    2026-04-07 Claude Code: P0-2 Fix — 封裝到 Service 層，Router 不直接存取 Redis
+
+    包含:
+    - 總覽: 自動修復/人工審核/手動處理/冷啟動信任 各幾次
+    - 自動化率
+    - 按異常類型明細
+    """
+    try:
+        from src.services.anomaly_counter import get_anomaly_counter
+        counter = get_anomaly_counter()
+
+        # P0-2 Fix: 呼叫 Service 層封裝方法，不直接存取 Redis
+        total_summary, by_anomaly_raw = await counter.get_all_disposition_stats()
+
+        by_anomaly: list[DispositionByAnomaly] = []
+        for item in by_anomaly_raw:
+            d_total = item["total"]
+            auto_cnt = item["auto_repair"] + item["cold_start_trust"]
+            by_anomaly.append(DispositionByAnomaly(
+                anomaly_key=item["anomaly_key"],
+                alert_name=item.get("alert_name", ""),
+                disposition=DispositionSummary(
+                    total=d_total,
+                    auto_repair=item["auto_repair"],
+                    human_approved=item["human_approved"],
+                    manual_resolved=item["manual_resolved"],
+                    cold_start_trust=item["cold_start_trust"],
+                    auto_rate=auto_cnt / d_total if d_total > 0 else 0,
+                    human_rate=item["human_approved"] / d_total if d_total > 0 else 0,
+                ),
+            ))
+
+        total = total_summary["total"]
+        auto_cnt = total_summary["auto_repair"] + total_summary["cold_start_trust"]
+
+        return DispositionResponse(
+            summary=DispositionSummary(
+                **total_summary,
+                auto_rate=auto_cnt / total if total > 0 else 0,
+                human_rate=total_summary["human_approved"] / total if total > 0 else 0,
+            ),
+            by_anomaly=by_anomaly,
+        )
+
+    except Exception as e:
+        import structlog
+        structlog.get_logger(__name__).warning("disposition_stats_error", error=str(e))
+        return DispositionResponse(summary=DispositionSummary())
+
+
+# =============================================================================
+# ADR-073-C C1 + ADR-074 M1 — 飛輪健康度 API
+# 2026-04-12 ogt
+# =============================================================================
+
+FlywheelStatsDep = Annotated[FlywheelStatsService, Depends(get_flywheel_stats_service)]
+
+
+@router.get(
+    "/flywheel",
+    summary="飛輪六節點即時狀態（ADR-073-C C1）",
+    response_model=None,
+)
+async def get_flywheel_stats(svc: FlywheelStatsDep) -> dict[str, Any]:
+    """
+    飛輪六節點即時狀態 + 當前流動中的告警。
+    供前端飛輪動畫元件接真實數據。
+    """
+    metrics = await svc.compute()
+    return metrics.to_flywheel_api_dict()
+
+
+@router.get(
+    "/summary",
+    summary="飛輪 KPI 摘要（ADR-073-C C1）",
+    response_model=None,
+)
+async def get_flywheel_summary(svc: FlywheelStatsDep) -> dict[str, Any]:
+    """
+    飛輪 KPI 面板數據：Playbook 數、成功率、今日處理數、KM 向量化率。
+    供前端右上角三個 KPI 卡片顯示真實數據。
+    """
+    metrics = await svc.compute()
+    return metrics.to_summary_api_dict()
+
+
+@router.get(
+    "/flywheel/metrics",
+    summary="Prometheus 飛輪健康度指標（ADR-074 M1）",
+    response_class=PlainTextResponse,
+)
+async def get_flywheel_prometheus_metrics(svc: FlywheelStatsDep) -> PlainTextResponse:
+    """
+    Prometheus text format 飛輪健康度指標。
+    Prometheus scrape target: /api/v1/stats/flywheel/metrics
+
+    Metrics:
+      awoooi_flywheel_playbook_count
+      awoooi_flywheel_execution_success_rate
+      awoooi_flywheel_km_unvectorized_count
+      awoooi_flywheel_alertname_null_rate
+      awoooi_flywheel_incidents_stuck
+      awoooi_flywheel_km_vectorized_rate
+    """
+    metrics = await svc.compute()
+    return PlainTextResponse(
+        content=metrics.to_prometheus_lines(),
+        media_type="text/plain; version=0.0.4; charset=utf-8",
+    )
+
+
+# =============================================================================
+# ADR-073-C C3: WebSocket 即時飛輪推送
+# =============================================================================
+
+@router.websocket("/flywheel/ws")
+async def flywheel_websocket(websocket: WebSocket) -> None:
+    """
+    WebSocket 即時飛輪健康度推送 — ADR-073-C C3
+
+    每 10 秒推送一次 FlywheelSummary JSON。
+    前端連線路徑：ws(s)://<host>/api/v1/stats/flywheel/ws
+
+    Protocol:
+      Server → Client: {"type": "flywheel_summary", "data": {...}, "ts": "ISO8601"}
+      Client → Server: (ignored)
+    """
+    svc = get_flywheel_stats_service()
+    await websocket.accept()
+    try:
+        while True:
+            metrics = await svc.compute()
+            payload = json.dumps({
+                "type": "flywheel_summary",
+                "data": metrics.to_summary_api_dict(),
+                "ts": metrics.computed_at.isoformat(),
+            })
+            await websocket.send_text(payload)
+            await asyncio.sleep(10)
+    except WebSocketDisconnect:
+        pass
--- a/apps/api/src/api/v1/telegram.py
+++ b/apps/api/src/api/v1/telegram.py
@@ -60,6 +60,8 @@ class TestPushRequest(BaseModel):
    root_cause: str = "Test alert for development"
    suggested_action: str = "DELETE_POD"
    estimated_downtime: str = "~30s"
+    # 2026-04-05 Claude Code: 支援 incident_id 以測試第二排按鈕渲染
+    incident_id: str = ""


 # =============================================================================
@@ -88,8 +90,32 @@ async def telegram_webhook(
    logger.info("telegram_webhook_received", update_id=update.update_id)

    # =========================================================================
-    # Step 1: 僅處理 callback_query (簽核按鈕點擊)
+    # Step 1: 路由 Update 類型
    # =========================================================================
+    # Phase 34 (ADR-067 2026-04-10): 圖片訊息 → image_analysis_service
+    if not update.callback_query and update.message:
+        msg = update.message
+        if msg.get("photo"):
+            # 取最高解析度 (photos 陣列最後一個)
+            photos = msg["photo"]
+            best = max(photos, key=lambda p: p.get("file_size", 0))
+            file_id = best.get("file_id", "")
+            chat_id = str(msg.get("chat", {}).get("id", ""))
+            caption = msg.get("caption", "請用繁體中文描述這張圖片")
+            if file_id and chat_id:
+                try:
+                    from src.services.image_analysis_service import get_image_analysis_service
+                    svc = get_image_analysis_service()
+                    # download_and_analyze 內部自行下載 + 分析 + 發送 Telegram
+                    await svc.download_and_analyze(
+                        chat_id=chat_id,
+                        file_id=file_id,
+                        question=caption,
+                    )
+                except Exception as _img_err:
+                    logger.warning("image_analysis_webhook_failed", error=str(_img_err))
+            return {"ok": True, "message": "photo_processed"}
+
    if not update.callback_query:
        logger.debug("telegram_webhook_ignored", reason="not callback_query")
        return {"ok": True, "message": "Ignored (not callback_query)"}
@@ -140,6 +166,27 @@ async def telegram_webhook(

        service = get_approval_service()

+        # 2026-04-08 Claude Code: USER_ACTION 記錄
+        async def _log_user_action(action_name: str, success: bool, incident_id: str | None = None) -> None:
+            try:
+                from src.repositories.alert_operation_log_repository import get_alert_operation_log_repository
+                await get_alert_operation_log_repository().append(
+                    "USER_ACTION",
+                    incident_id=incident_id,
+                    approval_id=approval_id,
+                    actor=f"telegram:{user_id}",
+                    action_detail=action_name,
+                    success=success,
+                    context={
+                        "username": username,
+                        "user_id": user_id,
+                        "message_id": message_id,
+                        "action": action_name,
+                    },
+                )
+            except Exception as _e:
+                logger.warning("alert_op_log_user_action_failed", error=str(_e))
+
        # 2026-03-29 ogt: 修復方法呼叫 - add_signature/reject 不存在
        # 正確方法: sign_approval / reject_approval
        if action == "approve":
@@ -158,6 +205,7 @@ async def telegram_webhook(
                    status=approval.status.value,
                    execution_triggered=execution_triggered,
                )
+                await _log_user_action("approve", True, getattr(approval, "incident_id", None))

                return {
                    "ok": True,
@@ -181,6 +229,7 @@ async def telegram_webhook(
                    approval_id=approval_id,
                    user_id=user_id,
                )
+                await _log_user_action("reject", False, getattr(approval, "incident_id", None))

                return {
                    "ok": True,
@@ -234,6 +283,7 @@ async def test_push(
            root_cause=request.root_cause,
            suggested_action=request.suggested_action,
            estimated_downtime=request.estimated_downtime,
+            incident_id=request.incident_id,
        )

        return {
--- a/apps/api/src/api/v1/webhooks.py
+++ b/apps/api/src/api/v1/webhooks.py
--- a/apps/api/src/constants/init.py
+++ b/apps/api/src/constants/init.py
@@ -0,0 +1 @@
+# AWOOOI Constants Package
--- a/apps/api/src/constants/alert_types.py
+++ b/apps/api/src/constants/alert_types.py
@@ -0,0 +1,74 @@
+"""
+Alert Type Mapping Constants
+=============================
+alertname → incident_type 靜態對應表
+
+來源: BUG-008 修復 2026-04-11（9筆 → 56筆，涵蓋 alerts-unified.yml 全部 alertname）
+遷移: M3 2026-04-11 — 從 webhooks.py 內聯 dict 抽至此模組
+
+整合狀態 (I1, 2026-04-11): ADR-064 Rule Engine 已整合，見 alert_rule_engine.get_incident_type()。
+此靜態 dict 為 Layer 2 fallback：YAML incident_type（Layer 1）→ 此 dict → "custom"（Layer 3）。
+擴展方式：在 alert_rules.yaml 中新增 incident_type 欄位；此 dict 僅需補無 YAML 規則的 alertname。
+"""
+
+# alertname → incident_type 對應（56 筆）
+ALERTNAME_TO_TYPE: dict[str, str] = {
+    # --- 主機層 (host_alerts) ---
+    "HostDown":                          "host_down",
+    "HostHighCpuLoad":                   "host_cpu",
+    "HostOutOfMemory":                   "host_memory",
+    "HostOutOfDiskSpace":                "disk_full",
+    "HostBackupFailed":                  "backup_failure",
+    # --- K8s 層 (kubernetes_alerts) ---
+    "K3sNodeNotReady":                   "k8s_node_failure",
+    "KubePodCrashLooping":               "k8s_pod_crash",
+    "KubePodNotReady":                   "k8s_pod_crash",
+    "KubeNodeNotReady":                  "k8s_node_failure",
+    "KubeNodeUnreachable":               "k8s_node_failure",
+    "KubeDeploymentReplicasMismatch":    "k8s_deployment_mismatch",
+    "VeleroBackupFailed":                "backup_failure",
+    "VeleroBackupNotRun":                "backup_failure",
+    # --- 資料庫 (database_alerts / database_detail_alerts) ---
+    "PostgreSQLDown":                    "database_down",
+    "RedisDown":                         "database_down",
+    "PostgreSQLHighConnections":         "database_performance",
+    "RedisMemoryHigh":                   "high_memory",
+    "PostgreSQLSlowQueries":             "database_performance",
+    "PostgreSQLDeadlocks":               "database_performance",
+    "PostgreSQLTooManyConnections":      "database_performance",
+    "RedisKeyEviction":                  "database_performance",
+    "RedisConnectionsHigh":              "database_performance",
+    "RedisCommandLatencyHigh":           "database_performance",
+    # --- 服務可用性 (service_alerts) ---
+    "OpenClawDown":                      "service_down",
+    "SignOzDown":                        "service_down",
+    "SentryDown":                        "service_down",
+    "HarborDown":                        "service_down",
+    "GiteaDown":                         "service_down",
+    "AlertmanagerDown":                  "service_down",
+    "MinIODown":                         "service_down",
+    "KaliScannerDown":                   "service_down",
+    # --- 外部網站 (external_website_alerts) ---
+    "MoWoooWorkDown":                    "service_404",
+    "TsenyangWebsiteDown":               "service_404",
+    "StockWoooWorkDown":                 "service_404",
+    "BitanWoooWorkDown":                 "service_404",
+    "ExternalSiteSSLExpiringSoon":       "ssl_expiry",
+    # --- 告警鏈路 (alert_chain) ---
+    "AlertChainBroken_Alertmanager":     "alert_chain_broken",
+    "AlertChainBroken_Sentry":           "alert_chain_broken",
+    "NoAlertsReceived2Hours":            "alert_chain_broken",
+    "AlertChainUnhealthy":               "alert_chain_broken",
+    # --- Docker 容器 (docker_health_alerts) ---
+    "DockerContainerUnhealthy":          "docker_container_unhealthy",
+    "DockerContainerExited":             "docker_container_unhealthy",
+    # --- 自動修復監控 (auto_repair) ---
+    "AutoRepairLowSuccessRate":          "auto_repair_degraded",
+    "PermanentFixRequired":              "auto_repair_degraded",
+    # --- 舊版相容 ---
+    "HighCPUUsage":                      "high_cpu",
+    "HighMemoryUsage":                   "high_memory",
+    "DiskSpaceLow":                      "disk_full",
+    "SSLCertExpiringSoon":               "ssl_expiry",
+    "TargetDown":                        "service_404",
+}
--- a/apps/api/src/core/config.py
+++ b/apps/api/src/core/config.py
@@ -52,6 +52,38 @@ class Settings(BaseSettings):
    )

    # ==========================================================================
+    # ==========================================================================
+    # Phase 24: AI Provider Registry (ADR-052)
+    # 2026-04-02 ogt: 絞殺者開關 — true=新 AIRouter, false=舊 openclaw.py if/else
+    # 回滾指令: kubectl set env deployment/awoooi-api USE_AI_ROUTER=false
+    # ==========================================================================
+    USE_AI_ROUTER: bool = Field(
+        default=False,
+        description="Phase 24: True=新 AIRouter 路由, False=舊 openclaw.py fallback chain",
+    )
+
+    # ==========================================================================
+    # aider-watch v2 integration (2026-04-20 ADR-091)
+    # 整合 Mac aider CLI 監控進 awoooi 飛輪（events → incident → ai_router feedback）
+    # 回滾：kubectl set env deployment/awoooi-api USE_AIDER_FEEDBACK=false
+    # ==========================================================================
+    AIDER_WEBHOOK_SECRET: str = Field(
+        default="",
+        description="HMAC secret for /api/v1/aider/events webhook verification",
+    )
+    AIDER_EVENTS_STREAM_KEY: str = Field(
+        default="signals:aider:events",
+        description="Redis stream key for aider event ingestion",
+    )
+    AIDER_PATTERN_EXTRACT_INTERVAL_HOURS: float = Field(
+        default=24.0,
+        description="Aider event pattern extraction interval (future use)",
+    )
+    USE_AIDER_FEEDBACK: bool = Field(
+        default=False,
+        description="Phase 24 A8: True=ai_router.route() 讀 aider 成功率調權重, False=不讀（預設）",
+    )
+
    # Phase 22: OpenClaw + Nemotron 協作 (ADR-044)
    # 2026-03-31 Claude Code: 統帥批准實作
    #
@@ -74,6 +106,32 @@ class Settings(BaseSettings):
        default=True,
        description="Phase 22: True=異步更新 (先推 OpenClaw), False=同步等待",
    )
+    # 2026-04-05 Claude Code: Phase 25 P0 v4.3 — DIAGNOSE timeout 依實測修正
+    # 實測依據 (2026-04-05):
+    #   NIM (nvidia/nemotron-mini-4b-instruct): 2.2s~27.3s，平均 10.6s → 60s timeout (27s * 2 + buffer)
+    #   Ollama llama3.2:3b CPU-only: 238s 回 {"ok":true} → 不可用於生產，timeout 保留但實際走 NIM
+    NEMOTRON_DIAGNOSE_TIMEOUT_SECONDS: int = Field(
+        default=60,
+        description="Phase 25 P0: DIAGNOSE NIM timeout (秒)，實測 2.2-27.3s avg 10.6s，60s 含 buffer",
+    )
+    OLLAMA_DIAGNOSE_TIMEOUT_SECONDS: int = Field(
+        default=200,
+        description="Phase 25 P0: Ollama timeout (秒)，實測 CPU-only 238s，保留欄位但 DIAGNOSE 不再走 Ollama",
+    )
+
+    # ==========================================================================
+    # Gitea — ADR-057 adopt() Gitea PR API (2026-04-05)
+    # ==========================================================================
+    GITEA_API_URL: str = Field(
+        default="http://192.168.0.110:3001",
+        description="Gitea 內網 API base URL",
+    )
+    GITEA_API_TOKEN: str = Field(
+        default="",
+        description="Gitea API Token（需 write:repository scope），ADR-057 adopt() 使用",
+    )
+    GITEA_REPO_OWNER: str = Field(default="wooo", description="Gitea repo owner")
+    GITEA_REPO_NAME: str = Field(default="awoooi", description="Gitea repo name")

    # ==========================================================================
    # CORS - 嚴格白名單 (無 UAT, 無 wildcard)
@@ -87,6 +145,9 @@ class Settings(BaseSettings):
            "http://localhost:3333",
            "http://192.168.0.168:3000",  # 168 MacBook 本機開發
            "http://192.168.0.188:3000",  # 188 本機開發
+            "http://192.168.0.125:32335",  # K3s VIP NodePort (staging/QA)
+            "http://192.168.0.120:32335",  # K3s node-1 NodePort
+            "http://192.168.0.121:32335",  # K3s node-2 NodePort
            "https://awoooi.wooo.work",
        ],
        description="Allowed CORS origins - NO wildcards allowed",
@@ -124,9 +185,14 @@ class Settings(BaseSettings):
    # External Services - Four Host Architecture
    # ==========================================================================
    OLLAMA_URL: str = Field(
-        default="http://192.168.0.188:11434",
+        default="http://192.168.0.111:11434",  # 2026-04-08 ogt: 切換至 M1 Pro (40+ tok/s vs 0.45 tok/s)
        description="Ollama LLM service URL",
    )
+    # 2026-04-12 ogt: 心跳必須確認載入的 Ollama 模型清單
+    OLLAMA_REQUIRED_MODELS: list[str] = Field(
+        default=["nomic-embed-text", "qwen2.5:7b-instruct", "deepseek-r1:14b"],
+        description="HeartbeatReportService 探測必要模型是否載入",
+    )
    # Deprecated: use OPENCLAW_URL instead
    CLAWBOT_URL: str = Field(
        default="http://192.168.0.188:8088",  # 🔧 修正: OpenClaw 實際 port 是 8088
@@ -225,6 +291,15 @@ class Settings(BaseSettings):
        default="",
        description="NVIDIA NIM API key for Nemotron Tool Calling (ADR-036)",
    )
+    # 2026-04-09 Claude Sonnet 4.6: Ollama Tool Calling — 替代 NVIDIA 雲端，本機推理
+    USE_OLLAMA_TOOL_CALLING: bool = Field(
+        default=True,
+        description="使用 Ollama 本機做 Tool Calling，取代 NVIDIA NIM 雲端 (44s→5s)",
+    )
+    OLLAMA_TOOL_MODEL: str = Field(
+        default="llama3.1:8b",
+        description="Ollama Tool Calling 模型 (支援 function calling 格式)",
+    )

    @field_validator("AI_FALLBACK_ORDER", mode="before")
    @classmethod
@@ -301,12 +376,14 @@ class Settings(BaseSettings):
        description="OpenClaw AI Agent service URL",
    )
    OPENCLAW_DEFAULT_MODEL: str = Field(
-        default="qwen2.5:7b-instruct",
-        description="Default Ollama model for RCA analysis (7B params, better Chinese)",
+        default="deepseek-r1:14b",  # 2026-04-08 ogt: SRE最強推理，M1 Pro實測 13 tok/s
+        description="Default Ollama model for RCA analysis",
    )
    OPENCLAW_TIMEOUT: int = Field(
-        default=90,
-        description="Timeout for OpenClaw AI calls (seconds)",
+        default=30,  # 2026-04-14 Claude Sonnet 4.6: 從 120s 改 30s，配合 ADR-052 GAP-B4
+        # 25s LLM hard timeout + 5s buffer。原 120s 違反 defense-in-depth 設計，
+        # 導致 Ollama 過載時 thread 飢餓 120s 才降級 fallback。
+        description="Timeout for OpenClaw AI calls (seconds, aligned with GAP-B4 25s)",
    )

    # ==========================================================================
@@ -335,10 +412,26 @@ class Settings(BaseSettings):
        default=False,
        description="Telegram Polling (False: OpenClaw handles it; True: only if OpenClaw unavailable)",
    )
+    # 2026-04-03 ogt: SRE 戰情室群組三頭政治 (Triumvirate) — ADR-053
+    OPENCLAW_BOT_TOKEN: str = Field(
+        default="",
+        description="@OpenClawAwoooI_Bot Token — 群組內代表 OpenClaw AI 發言",
+    )
+    NEMOTRON_BOT_TOKEN: str = Field(
+        default="",
+        description="@NemoTronAwoooI_Bot Token — 群組內代表 NemoClaw AI 發言",
+    )
+    SRE_GROUP_CHAT_ID: str = Field(
+        default="",
+        description="AwoooI SRE 戰情室群組 Chat ID",
+    )

    def get_tg_user_whitelist(self) -> list[int]:
        """Parse comma-separated or JSON array user IDs to list[int]"""
        raw = self.OPENCLAW_TG_USER_WHITELIST
+        # 已是 list（測試 monkeypatch 或程式碼直接傳入）
+        if isinstance(raw, list):
+            return [int(uid) for uid in raw]
        if not raw or not raw.strip():
            return []
        # Handle JSON array format or comma-separated
@@ -436,24 +529,70 @@ class Settings(BaseSettings):

    # ==========================================================================
    # Phase 13.1: GitHub Webhook → OpenClaw 整合
-    # GitHub PR/Push 事件自動觸發 AI 代碼審查
+    # Gitea PR/Push 事件自動觸發 AI 代碼審查 (ADR-059: GitHub → Gitea 遷移)
    # ==========================================================================
-    GITHUB_WEBHOOK_SECRET: str = Field(
+    GITEA_WEBHOOK_SECRET: str = Field(
        default="",
-        description="GitHub Webhook secret for signature verification (X-Hub-Signature-256)",
+        description="Gitea Webhook secret for HMAC-SHA256 signature verification (X-Gitea-Signature)",
    )
-    GITHUB_ALLOWED_REPOS: str = Field(
-        default="",
-        description="Comma-separated list of allowed repositories (e.g., 'owner/repo1,owner/repo2')",
+    GITEA_ALLOWED_REPOS: str = Field(
+        default="wooo/awoooi",
+        description="Comma-separated list of allowed Gitea repositories (e.g., 'wooo/awoooi')",
    )

-    def get_github_allowed_repos(self) -> list[str]:
+    def get_gitea_allowed_repos(self) -> list[str]:
        """Parse comma-separated allowed repos to list"""
-        raw = self.GITHUB_ALLOWED_REPOS
+        # 2026-04-05 Claude Code (ADR-059): GitHub → Gitea webhook 遷移
+        raw = self.GITEA_ALLOWED_REPOS
        if not raw or not raw.strip():
            return []
        return [repo.strip() for repo in raw.split(",") if repo.strip()]

+    # ==========================================================================
+    # MCP Phase 2b: Prometheus MCP Server (ADR-071, 2026-04-11 Claude Sonnet 4.6)
+    # ==========================================================================
+    PROMETHEUS_URL: str = Field(
+        default="http://192.168.0.188:9090",
+        description="Prometheus server URL",
+    )
+    PROMETHEUS_MCP_ENABLED: bool = Field(
+        default=True,
+        description="啟用 Prometheus MCP Provider",
+    )
+
+    # MCP Phase 2a: SSH MCP Server (ADR-071, 2026-04-11 Claude Sonnet 4.6)
+    # ==========================================================================
+    SSH_MCP_ENABLED: bool = Field(
+        default=False,
+        description="啟用 SSH MCP Provider（需 K8s Secret ssh-mcp-key 掛載）",
+    )
+    SSH_MCP_ALLOWED_HOSTS: str = Field(
+        default="192.168.0.188,192.168.0.110,192.168.0.111",
+        description="允許 SSH 的主機 IP 清單（逗號分隔）",
+    )
+
+    # MCP Phase 3: ArgoCD MCP Server (2026-04-11 Claude Sonnet 4.6)
+    # ==========================================================================
+    ARGOCD_URL: str = Field(
+        default="https://192.168.0.125:30443",
+        description="ArgoCD API Server URL（K3s NodePort HTTPS）",
+    )
+    ARGOCD_API_TOKEN: str = Field(
+        default="",
+        description="ArgoCD API Token（從 K8s Secret 取得）",
+    )
+    ARGOCD_MCP_ENABLED: bool = Field(
+        default=True,
+        description="啟用 ArgoCD MCP Provider（需 ARGOCD_API_TOKEN）",
+    )
+
+    # MCP Phase 3: Sentry MCP Server (2026-04-11 Claude Sonnet 4.6)
+    # ==========================================================================
+    SENTRY_MCP_ENABLED: bool = Field(
+        default=True,
+        description="啟用 Sentry MCP Provider（需 SENTRY_AUTH_TOKEN）",
+    )
+
    # ==========================================================================
    # Phase 13.2: Grafana MCP Tool (#83)
    # ==========================================================================
--- a/apps/api/src/core/constants.py
+++ b/apps/api/src/core/constants.py
@@ -85,6 +85,34 @@ CICD_ALERT_SUFFIXES = (
 # CI/CD 告警關鍵字 (不區分大小寫)
 CICD_ALERT_KEYWORDS = ("CI/CD", "cicd")

+# =============================================================================
+# Heartbeat/Watchdog Alert Detection (2026-04-10 Claude Sonnet 4.6 Asia/Taipei)
+# 心跳/看門狗告警不觸發自動修復飛輪 — 這類告警代表監控系統狀態，不是服務故障
+# =============================================================================
+HEARTBEAT_ALERT_NAMES = frozenset({
+    "Watchdog",
+    "DeadMansSwitch",
+    "NoAlertsReceived",
+    "NoAlertsReceived2Hours",
+    "AlertmanagerDown",
+    "PrometheusNotConnectedToAlertmanager",
+})
+
+HEARTBEAT_ALERT_KEYWORDS = ("NoAlertsReceived", "Watchdog", "DeadMansSwitch", "Heartbeat")
+
+
+def is_heartbeat_alertname(alertname: str) -> bool:
+    """
+    判斷 alertname 是否為心跳/看門狗告警
+
+    心跳告警代表監控系統自身健康狀態，不是服務故障，
+    不應進入自動修復飛輪（不存在對應的 Playbook 修復動作）。
+    """
+    return (
+        alertname in HEARTBEAT_ALERT_NAMES
+        or any(kw in alertname for kw in HEARTBEAT_ALERT_KEYWORDS)
+    )
+

 def is_cicd_alertname(alertname: str) -> bool:
    """
--- a/apps/api/src/core/feature_flags.py
+++ b/apps/api/src/core/feature_flags.py
@@ -0,0 +1,250 @@
+"""
+AWOOOI AIOps Feature Flags
+==========================
+AI 自主化飛輪 Phase 0-6 功能開關
+
+ADR-080: AI 自主化飛輪總綱
+MASTER: docs/superpowers/specs/2026-04-15-MASTER-ai-autonomous-flywheel-v2.md
+
+安全規則：
+- 所有 flag 預設 False — 任何 Phase 必須明確開啟才生效
+- Phase 總開關 = False 時，該 Phase 所有子開關均視為 False
+- 自我降級後 (D6) 不得自動反向升級，升級必須人工設定 env var
+
+回滾方式：
+    kubectl set env deployment/awoooi-api AIOPS_P1_ENABLED=false
+    # ⚠️  pydantic_settings 在 Pod 啟動時讀取 env var 並快取為 Singleton
+    # kubectl set env 修改後必須重啟 Pod 才生效（非熱重載）
+    # 緊急回滾：kubectl rollout restart deployment/awoooi-api
+
+2026-04-15 ogt: Phase 0 — 初始建立，ADR-080 批准後啟用
+"""
+
+from pydantic import Field
+from pydantic_settings import BaseSettings, SettingsConfigDict
+
+
+class AIOpsFeatureFlags(BaseSettings):
+    """
+    AI 自主化飛輪 Feature Flag 集合
+
+    每個 Phase 一個總開關 + 細粒度子開關。
+    讀取順序：環境變數 > .env 檔 > 預設值（全 False）。
+    """
+
+    model_config = SettingsConfigDict(
+        env_file=".env",
+        env_file_encoding="utf-8",
+        case_sensitive=True,
+        extra="ignore",
+    )
+
+    # ==========================================================================
+    # Phase 總開關（Phase N 退出條件達到後才設 True）
+    # ==========================================================================
+
+    AIOPS_P1_ENABLED: bool = Field(
+        default=False,
+        description="Phase 1 感官縱深：PreDecisionInvestigator + EvidenceSnapshot + PostExecutionVerifier",
+    )
+    AIOPS_P2_ENABLED: bool = Field(
+        default=False,
+        description="Phase 2 多 Agent 協作：5 角色全部上線（Diagnostician/Solver/Reviewer/Critic/Coordinator）",
+    )
+    AIOPS_P3_ENABLED: bool = Field(
+        default=False,
+        description="Phase 3 學習閉環重建：3 根因修復 + EWMA + Evolver + Fine-tune pipeline",
+    )
+    AIOPS_P4_ENABLED: bool = Field(
+        default=False,
+        description="Phase 4 動態異常偵測：Holt-Winters + Drain3 + Prophet + 主動巡檢",
+    )
+    AIOPS_P5_ENABLED: bool = Field(
+        default=False,
+        description="Phase 5 修復抽象化：Declarative + Blast Radius 四級分控 + GitOps PR",
+    )
+    AIOPS_P6_ENABLED: bool = Field(
+        default=False,
+        description="Phase 6 自我治理閉環：SLO + Trust Drift + KB Rot + 離線回放 + 自我降級",
+    )
+
+    # ==========================================================================
+    # Phase 1 細粒度子開關
+    # ==========================================================================
+
+    AIOPS_P1_PRE_DECISION_INVESTIGATOR: bool = Field(
+        default=False,
+        description="P1: PreDecisionInvestigator 是否在決策前執行 MCP 感官蒐集（可獨立關閉）",
+    )
+    AIOPS_P1_POST_EXECUTION_VERIFIER: bool = Field(
+        default=False,
+        description="P1: PostExecutionVerifier 是否在每次執行後驗證狀態",
+    )
+
+    # ==========================================================================
+    # Phase 2 細粒度子開關
+    # ==========================================================================
+
+    AIOPS_P2_CRITIC_ENABLED: bool = Field(
+        default=False,
+        description="P2: Critic Agent 是否啟用辯證挑戰（關閉可降低延遲但失去質疑機制）",
+    )
+    AIOPS_P2_AGENT_TIMEOUT_SEC: int = Field(
+        default=5,
+        description="P2: 單 Agent 熔斷閾值（秒），超時則 Coordinator 降級處理",
+    )
+
+    # ==========================================================================
+    # Phase 3 細粒度子開關
+    # ==========================================================================
+
+    AIOPS_P3_FINETUNE_EXPORT: bool = Field(
+        default=False,
+        description="P3: Fine-tune JSONL 每週匯出到 MinIO 是否執行",
+    )
+    AIOPS_P3_EVOLVER_ENABLED: bool = Field(
+        default=False,
+        description="P3: Evolver Agent 是否執行 Playbook 自動合併與封存",
+    )
+    AIOPS_P3_KNOWLEDGE_DECAY: bool = Field(
+        default=False,
+        description="P3: 30 天知識遺忘 job 是否執行（標 decayed，降到 cold index）",
+    )
+
+    # ==========================================================================
+    # Phase 4 細粒度子開關
+    # ==========================================================================
+
+    AIOPS_P4_DYNAMIC_BASELINE: bool = Field(
+        default=False,
+        description="P4: Holt-Winters 動態基線服務是否啟用",
+    )
+    AIOPS_P4_LOG_ANOMALY: bool = Field(
+        default=False,
+        description="P4: Drain3 日誌異常偵測是否啟用",
+    )
+    AIOPS_P4_TREND_PREDICTOR: bool = Field(
+        default=False,
+        description="P4: Prophet 趨勢預測是否啟用（預測 4h 內超閾值風險）",
+    )
+    AIOPS_P4_PROACTIVE_INSPECTOR: bool = Field(
+        default=False,
+        description="P4: 主動巡檢每 5min 是否執行",
+    )
+    AIOPS_P4_SHADOW_MODE: bool = Field(
+        default=True,
+        description="P4: Shadow Mode = True 時動態偵測只記錄不觸發 Alert；False = 真實觸發（需先觀察噪音率）",
+    )
+
+    # ==========================================================================
+    # Phase 5 細粒度子開關
+    # ==========================================================================
+
+    AIOPS_P5_BLAST_RADIUS_CHECK: bool = Field(
+        default=False,
+        description="P5: Blast Radius 評估是否執行（False = 全部視為低風險自動執行，危險）",
+    )
+    AIOPS_P5_GITOPS_PR: bool = Field(
+        default=False,
+        description="P5: 高風險修復（Blast Radius > 50）是否走 GitOps Gitea PR 流程",
+    )
+    AIOPS_P5_DRY_RUN_ENFORCED: bool = Field(
+        default=False,
+        description="P5: Declarative apply 前是否強制 dry-run（False = 跳過 dry-run，危險）",
+    )
+
+    # ==========================================================================
+    # Phase 6 細粒度子開關
+    # ==========================================================================
+
+    AIOPS_P6_SELF_DEMOTION: bool = Field(
+        default=False,
+        description="P6: 自我降級邏輯是否啟用（SLO 違反 → 自動提高信心閾值）",
+    )
+    AIOPS_P6_OFFLINE_REPLAY: bool = Field(
+        default=False,
+        description="P6: 週度離線回放 100 案是否執行",
+    )
+    AIOPS_P6_KB_ROT_CLEANER: bool = Field(
+        default=False,
+        description="P6: 月度 KB 腐爛清理 job 是否執行",
+    )
+    AIOPS_P6_TRUST_DRIFT_DETECTOR: bool = Field(
+        default=False,
+        description="P6: Playbook trust 分布漂移偵測是否啟用",
+    )
+    AIOPS_P6_GOVERNANCE_ENABLED: bool = Field(
+        default=False,
+        description="P6: 治理閉環總開關（offline_replay_service / model_rollback_service 守衛）",
+    )
+
+    def is_phase_enabled(self, phase: int) -> bool:
+        """
+        檢查指定 Phase 的總開關是否啟用。
+
+        Args:
+            phase: Phase 編號（1-6）
+
+        Returns:
+            bool: 該 Phase 是否開啟
+
+        Usage:
+            if flags.is_phase_enabled(1):
+                await pre_decision_investigator.investigate(...)
+        """
+        phase_flags = {
+            1: self.AIOPS_P1_ENABLED,
+            2: self.AIOPS_P2_ENABLED,
+            3: self.AIOPS_P3_ENABLED,
+            4: self.AIOPS_P4_ENABLED,
+            5: self.AIOPS_P5_ENABLED,
+            6: self.AIOPS_P6_ENABLED,
+        }
+        return phase_flags.get(phase, False)
+
+    def is_sub_flag_enabled(self, flag_name: str) -> bool:
+        """
+        檢查細粒度子開關（自動驗證父 Phase 開關）。
+
+        Args:
+            flag_name: 子開關名稱，例如 "AIOPS_P1_PRE_DECISION_INVESTIGATOR"
+
+        Returns:
+            bool: 子開關 AND 父 Phase 開關都為 True 才回 True
+
+        Usage:
+            if flags.is_sub_flag_enabled("AIOPS_P1_PRE_DECISION_INVESTIGATOR"):
+                ...
+        """
+        # 解析 Phase 編號
+        parts = flag_name.split("_")
+        if len(parts) < 3 or not parts[1].startswith("P"):
+            return False
+
+        try:
+            phase = int(parts[1][1:])
+        except ValueError:
+            return False
+
+        # 父 Phase 必須開啟
+        if not self.is_phase_enabled(phase):
+            return False
+
+        return bool(getattr(self, flag_name, False))
+
+
+# Singleton — 與 core/config.py 的 settings 相同模式
+# 使用：from src.core.feature_flags import aiops_flags
+aiops_flags = AIOpsFeatureFlags()
+
+
+def get_aiops_flags() -> AIOpsFeatureFlags:
+    """
+    FastAPI dependency injection 用。
+
+    Usage:
+        @router.get("/status")
+        async def status(flags: AIOpsFeatureFlags = Depends(get_aiops_flags)):
+            return {"p1": flags.AIOPS_P1_ENABLED}
+    """
+    return aiops_flags
--- a/apps/api/src/core/prompts.py
+++ b/apps/api/src/core/prompts.py
@@ -74,7 +74,7 @@ For each optimization suggestion, provide EXECUTABLE kubectl commands:
 {
  "action_title": "string - 操作標題 (繁體中文)",
  "description": "string - 根因分析含 SignOz 數據關聯 (繁體中文)",
-  "suggested_action": "RESTART_DEPLOYMENT|DELETE_POD|SCALE_DEPLOYMENT|APPLY_HPA|TUNE_RESOURCES|NO_ACTION",
+  "suggested_action": "RESTART_DEPLOYMENT|DELETE_POD|SCALE_DEPLOYMENT|APPLY_HPA|TUNE_RESOURCES|INVESTIGATE|OBSERVE|NO_ACTION",
  "kubectl_command": "string - 具體的 kubectl 指令",
  "target_resource": "string - 目標資源名稱",
  "namespace": "string - K8s namespace",
@@ -103,6 +103,23 @@ For each optimization suggestion, provide EXECUTABLE kubectl commands:
 }
 ```

+## 🔑 Alert-Specific Analysis Rules (CRITICAL — read alertname first)
+The `alertname` field is your PRIMARY signal. Use it to determine the problem type and appropriate action:
+
+| Alert category / alertname pattern | suggested_action | kubectl_command guidance |
+|-------------------------------------|-----------------|--------------------------|
+| contains "Disk", "Storage", "PVC", "Volume" | NO_ACTION | `kubectl exec <pod> -- df -h` or `kubectl get pvc -n <ns>` |
+| contains "Postgres", "MySQL", "Redis", "DB", "Database" | NO_ACTION | `kubectl exec <pod> -- psql` or `kubectl logs <pod>` |
+| contains "CrashLoop", "OOMKilled", "Pod" | DELETE_POD or RESTART_DEPLOYMENT | `kubectl delete pod <pod> -n <ns>` |
+| contains "CPU", "Memory", "Resource" | TUNE_RESOURCES or SCALE_DEPLOYMENT | `kubectl top pod -n <ns>` or HPA command |
+| contains "Node", "NodeNotReady" | NO_ACTION | `kubectl describe node <node>` |
+| contains "SSL", "Certificate", "Cert" | NO_ACTION | `kubectl get certificate -n <ns>` |
+| alert_category = "database" | NO_ACTION | DB investigation commands only |
+| alert_category = "storage" | NO_ACTION | `kubectl get pvc`, `kubectl exec -- df -h` |
+
+**NEVER** use `kubectl rollout restart deployment/awoooi-prod` for database, storage, or network alerts.
+Make `action_title` describe the ACTUAL problem from alertname (not generic "自動修復 AWOOOI 服務").
+
 ## 🔥 Short Example: High CPU -> SCALE_DEPLOYMENT, HPA, risk_level=medium
 Please carefully justify your confidence between 0.0 and 1.0 (e.g. 0.82) based on symptoms and metrics.

@@ -138,16 +155,35 @@ OPENCLAW_TEST_PROMPT = """你是 AWOOOI AIOps 平台的智慧助手 OpenClaw。
 NEMOTRON_SYSTEM_PROMPT = """# OpenClaw Lightweight (Nemo-4B Optimized)
 You are an SRE AI. Analyze the alert and respond with ONLY valid JSON.

+## 🔒 DEPLOYMENT NAME RULE (STRICTLY ENFORCED)
+- `namespace` is NEVER a deployment name.
+- "awoooi-prod" is a NAMESPACE, NOT a deployment. NEVER write `deployment/awoooi-prod`.
+- When "叢集實際資源清單" is provided, `target_resource` and deployment in
+  `kubectl_command` MUST match one of those names exactly.
+- If alert has `labels.deployment`, prefer it over guessing.
+- Unknown target → suggested_action=NO_ACTION, kubectl_command=
+  "kubectl get deploy -n <namespace>" (investigation only).
+
+## CRITICAL: Read alertname first
+The `alertname` field tells you what kind of problem this is. Use it:
+- "Disk/Storage/PVC/Volume" → suggested_action=NO_ACTION, kubectl_command="kubectl get pvc" or "kubectl exec <pod> -- df -h"
+- "Postgres/MySQL/Redis/DB/Database" → suggested_action=NO_ACTION, DB investigation commands
+- "CrashLoop/OOM/Pod" → suggested_action=DELETE_POD or RESTART_DEPLOYMENT
+- "CPU/Memory/Resource" → suggested_action=TUNE_RESOURCES or SCALE_DEPLOYMENT
+- "SSL/Cert" → suggested_action=NO_ACTION
+NEVER use "kubectl rollout restart deployment/awoooi-prod" (that is the NAMESPACE, not a deployment).
+Make action_title describe the ACTUAL problem (not generic "自動修復 AWOOOI 服務").
+
 ## Required JSON Schema:
 {
  "confidence": <YOUR_CALCULATED_VALUE>,
  "reasoning": "簡短理由 (繁體中文)",
  "primary_responsibility": "FE|BE|INFRA|DB|COLLAB",
  "risk_level": "low|medium|critical",
-  "action_title": "操作標題 (繁體中文)",
-  "description": "根因分析 (繁體中文)",
-  "suggested_action": "RESTART_DEPLOYMENT|DELETE_POD|SCALE_DEPLOYMENT|NO_ACTION",
-  "kubectl_command": "kubectl 指令",
+  "action_title": "操作標題，必須反映 alertname 的實際問題 (繁體中文)",
+  "description": "根因分析，說明 alertname 代表的問題及建議調查步驟 (繁體中文)",
+  "suggested_action": "RESTART_DEPLOYMENT|DELETE_POD|SCALE_DEPLOYMENT|APPLY_HPA|TUNE_RESOURCES|INVESTIGATE|OBSERVE|NO_ACTION",
+  "kubectl_command": "針對此告警類型的 kubectl 指令",
  "target_resource": "目標資源",
  "namespace": "K8s namespace",
  "blast_radius": {"affected_pods": 1, "estimated_downtime": "~30s"}
--- a/apps/api/src/db/base.py
+++ b/apps/api/src/db/base.py
@@ -143,9 +143,33 @@ async def init_db() -> None:
    Call this at application startup.
    """
    engine = get_engine()
-    async with engine.begin() as conn:
-        await conn.run_sync(Base.metadata.create_all)

+    # 2026-04-15 ogt: 多 replica 並行啟動競爭修復
+    # 問題：單一大 transaction 裡兩個 pod 同時建 table → 其中一個 CREATE INDEX 失敗
+    #       PostgreSQL 中 transaction 內任何錯誤導致整個 transaction ROLLBACK
+    #       → table + index 全消失 → 下次重啟同樣問題 → 無限 CrashLoop
+    # 修法：每個 table 獨立 transaction；先 DROP INDEX IF EXISTS 清殘留孤兒 index；
+    #       捕捉 "already exists" 讓並行 pod 優雅跳過
+    async with engine.connect() as probe_conn:
+        existing = set(await probe_conn.run_sync(
+            lambda c: __import__('sqlalchemy', fromlist=['inspect']).inspect(c).get_table_names()
+        ))
+
+    for table in Base.metadata.sorted_tables:
+        if table.name not in existing:
+            try:
+                async with engine.begin() as conn:
+                    # 先清殘留孤兒 index（前次 CrashLoop 留下的部分狀態）
+                    for index in table.indexes:
+                        await conn.execute(text(f'DROP INDEX IF EXISTS "{index.name}"'))
+                    await conn.run_sync(table.create)
+            except Exception as exc:
+                if "already exists" in str(exc).lower():
+                    pass  # 並行 pod 已建好，忽略
+                else:
+                    raise
+
+    async with engine.begin() as conn:
        # 2026-04-02 Claude Code: 確保 risklevel enum 包含 'high' 值
        # Phase 23 新增，避免舊 DB 缺少此值導致 InvalidTextRepresentation
        await conn.execute(
@@ -164,6 +188,54 @@ async def init_db() -> None:
            """)
        )

+        # 2026-04-09 Claude Sonnet 4.6: Sprint 5R C1 修復 — 批准執行閉環 Telegram 訊息持久化欄位
+        # create_all 不做 ALTER，需手動補欄位
+        await conn.execute(
+            text("""
+            ALTER TABLE approval_records
+                ADD COLUMN IF NOT EXISTS telegram_message_id INTEGER,
+                ADD COLUMN IF NOT EXISTS telegram_chat_id INTEGER;
+            """)
+        )
+
+        # 2026-04-15 ogt + Claude Sonnet 4.6（亞太）: Phase 3.5 Playbook PostgreSQL 持久化
+        # ADR-085: AI 學習成果不可存 Cache — trust_score、EWMA 必須永久保存
+        # playbooks 表已存在（15 筆舊資料），補加新欄位
+        await conn.execute(
+            text("""
+            ALTER TABLE playbooks
+                ADD COLUMN IF NOT EXISTS trust_score FLOAT NOT NULL DEFAULT 0.3,
+                ADD COLUMN IF NOT EXISTS requires_approval_level VARCHAR(20) NOT NULL DEFAULT 'auto',
+                ADD COLUMN IF NOT EXISTS stateful_targets JSONB NOT NULL DEFAULT '[]',
+                ADD COLUMN IF NOT EXISTS requires_pre_backup BOOLEAN NOT NULL DEFAULT FALSE;
+            """)
+        )
+
+        # 2026-04-15 ogt + Claude Sonnet 4.6（亞太）: Phase 4 8D 感官升級
+        # ADR-084: EvidenceSnapshot 加入 Phase 4 動態異常上下文（anomaly_context）
+        await conn.execute(
+            text("""
+            ALTER TABLE incident_evidence
+                ADD COLUMN IF NOT EXISTS anomaly_context JSONB;
+            """)
+        )
+
+        # 2026-04-15 ogt + Claude Sonnet 4.6（亞太）: Phase 6 自我治理閉環
+        # ADR-087: ai_governance_events 不可變 Event Sourcing 表
+        # asyncpg 不允許 prepared statement 內多條指令，必須分開 execute
+        await conn.execute(text(
+            "CREATE INDEX IF NOT EXISTS ix_ai_governance_event_type "
+            "ON ai_governance_events (event_type);"
+        ))
+        await conn.execute(text(
+            "CREATE INDEX IF NOT EXISTS ix_ai_governance_triggered_at "
+            "ON ai_governance_events (triggered_at);"
+        ))
+        await conn.execute(text(
+            "CREATE INDEX IF NOT EXISTS ix_ai_governance_resolved "
+            "ON ai_governance_events (resolved);"
+        ))
+

 async def close_db() -> None:
    """
--- a/apps/api/src/db/models.py
+++ b/apps/api/src/db/models.py
@@ -25,6 +25,7 @@ from sqlalchemy import (
 from sqlalchemy import (
    Enum as SQLEnum,
 )
+from sqlalchemy.dialects.postgresql import ENUM as PgEnum
 from sqlalchemy.orm import Mapped, mapped_column

 from src.db.base import Base
@@ -124,6 +125,48 @@ class ApprovalRecord(Base):
        comment="Last time this alert pattern was seen",
    )

+    # Sprint 5.1 MultiSig 雙簽核支援 (2026-04-08 Claude Sonnet 4.6 Asia/Taipei，ADR-062 Q3)
+    approval_level: Mapped[str] = mapped_column(
+        String(20),
+        default="standard",
+        nullable=False,
+        comment="standard=1票審核, critical=2票MultiSig",
+    )
+    approval_votes: Mapped[list[dict[str, Any]]] = mapped_column(
+        JSON,
+        default=list,
+        nullable=False,
+        comment="[{user_id, voted_at, action}]",
+    )
+    required_votes: Mapped[int] = mapped_column(
+        Integer,
+        default=1,
+        nullable=False,
+        comment="standard=1, critical=2",
+    )
+
+    # 2026-04-06 ogt: Phase 26 — 關聯 Incident ID
+    # Playbook 萃取和 KM 寫入必須知道 incident_id，不能靠文字解析
+    incident_id: Mapped[str | None] = mapped_column(
+        String(64),
+        nullable=True,
+        index=True,
+        comment="Associated Incident ID (INC-YYYYMMDD-XXXXXX)",
+    )
+
+    # 2026-04-09 Claude Sonnet 4.6: Telegram 訊息持久化
+    # Redis tg_msg:{id} TTL 24h 過期後仍可查詢，支援跨 Session 狀態更新
+    telegram_message_id: Mapped[int | None] = mapped_column(
+        Integer,
+        nullable=True,
+        comment="Telegram message_id of the approval card sent to operator",
+    )
+    telegram_chat_id: Mapped[int | None] = mapped_column(
+        Integer,
+        nullable=True,
+        comment="Telegram chat_id where the approval card was sent",
+    )
+
    # Timestamps
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True),
@@ -343,6 +386,109 @@ class AuditLog(Base):
    )


+# =============================================================================
+# AutoRepairExecution - Phase 10 操作記錄
+# 2026-04-08 Claude Code: 統帥指令「所有操作都必須被記錄，寫入資料庫」
+# =============================================================================
+
+class AutoRepairExecution(Base):
+    """
+    自動修復執行記錄
+
+    每次 evaluate_auto_repair 觸發並執行 (成功或失敗) 都寫入此表。
+    不依賴 approval_id（自動修復不需人工批准）。
+    """
+    __tablename__ = "auto_repair_executions"
+
+    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=generate_uuid)
+
+    # 關聯
+    incident_id: Mapped[str] = mapped_column(String(30), nullable=False, index=True)
+    playbook_id: Mapped[str] = mapped_column(String(36), nullable=False, index=True)
+    playbook_name: Mapped[str] = mapped_column(String(200), nullable=False)
+
+    # 執行結果
+    success: Mapped[bool] = mapped_column(default=False, nullable=False)
+    executed_steps: Mapped[list] = mapped_column(JSON, default=list, nullable=False)
+    error_message: Mapped[str | None] = mapped_column(Text, nullable=True)
+
+    # 執行上下文
+    triggered_by: Mapped[str] = mapped_column(
+        String(50), default="auto_repair", nullable=False,
+        comment="auto_repair / cold_start_trust",
+    )
+    similarity_score: Mapped[float | None] = mapped_column(nullable=True)
+    risk_level: Mapped[str | None] = mapped_column(String(20), nullable=True)
+    execution_time_ms: Mapped[int | None] = mapped_column(Integer, nullable=True)
+
+    # 時間戳 (台北時區)
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=taipei_now)
+
+    __table_args__ = (
+        Index("ix_are_created_at", "created_at"),
+        Index("ix_are_success", "success"),
+    )
+
+
+# =============================================================================
+# AlertOperationLog - Phase 11 告警操作溯源 (Event Sourcing)
+# 2026-04-08 Claude Code: 統帥指令「所有操作都必須被記錄，寫入資料庫」
+# 不可變 — 只 INSERT，不 UPDATE/DELETE
+# =============================================================================
+
+class AlertOperationLog(Base):
+    """
+    告警操作完整溯源
+
+    Event Sourcing 模式：每個告警生命週期的每個事件都寫一筆。
+    不可變 (Immutable)。
+
+    event_type 值：
+      ALERT_RECEIVED / TELEGRAM_SENT / USER_ACTION /
+      AUTO_REPAIR_TRIGGERED / EXECUTION_STARTED / EXECUTION_COMPLETED /
+      TELEGRAM_RESULT_SENT / RESOLVED / SILENCED / ESCALATED
+    """
+    __tablename__ = "alert_operation_log"
+
+    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=generate_uuid)
+
+    # 關聯 (允許 NULL，不同事件有不同關聯)
+    incident_id: Mapped[str | None] = mapped_column(String(30), nullable=True, index=True)
+    approval_id: Mapped[str | None] = mapped_column(String(36), nullable=True, index=True)
+    audit_log_id: Mapped[str | None] = mapped_column(String(36), nullable=True)
+    auto_repair_id: Mapped[str | None] = mapped_column(String(36), nullable=True)
+
+    # 事件核心
+    # 2026-04-08 Claude Sonnet 4.6: Sprint 5.1 — 修正 enum 型別不符 (String→PgEnum, create_type=False)
+    event_type: Mapped[str] = mapped_column(
+        PgEnum(
+            "ALERT_RECEIVED", "TELEGRAM_SENT", "USER_ACTION", "AUTO_REPAIR_TRIGGERED",
+            "EXECUTION_STARTED", "EXECUTION_COMPLETED", "TELEGRAM_RESULT_SENT",
+            "RESOLVED", "SILENCED", "ESCALATED", "GUARDRAIL_BLOCKED",
+            "PRE_FLIGHT_PASSED", "PRE_FLIGHT_FAILED", "BACKUP_TRIGGERED",
+            "BACKUP_COMPLETED", "BACKUP_FAILED", "APPROVAL_ESCALATED", "CHANGE_APPLIED",
+            name="alert_event_type", create_type=False,
+        ),
+        nullable=False, index=True,
+    )
+    actor: Mapped[str | None] = mapped_column(String(100), nullable=True, index=True)
+    action_detail: Mapped[str | None] = mapped_column(String(200), nullable=True)
+
+    # 執行結果 (NULL = 不適用)
+    success: Mapped[bool | None] = mapped_column(nullable=True)
+    error_message: Mapped[str | None] = mapped_column(Text, nullable=True)
+
+    # 結構化上下文
+    context: Mapped[dict] = mapped_column(JSON, default=dict, nullable=False)
+
+    # 時間戳 (台北時區，不可變)
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=taipei_now)
+
+    __table_args__ = (
+        Index("ix_aol_created_at", "created_at"),
+    )
+
+
 # =============================================================================
 # IncidentRecord - Phase 6.2 Episodic Memory (PostgreSQL)
 # =============================================================================
@@ -419,6 +565,32 @@ class IncidentRecord(Base):
        comment="事件結果與人類回饋",
    )

+    # === ADR-073 Phase 2 欄位 (2026-04-12 ogt) ===
+    alertname: Mapped[str | None] = mapped_column(
+        String(100),
+        nullable=True,
+        comment="告警名稱 (從 signals labels 抽取)",
+    )
+    notification_type: Mapped[str | None] = mapped_column(
+        String(10),
+        nullable=True,
+        comment="通知類型 TYPE-1/2/3/4/4D (早期分診)",
+    )
+    alert_category: Mapped[str | None] = mapped_column(
+        String(50),
+        nullable=True,
+        comment="告警類別 config_drift/info/backup/infrastructure/kubernetes/database/general",
+    )
+
+    # === 頻率快照 (Phase 27, 2026-04-10 ogt) ===
+    # frequency_stats 原本只存記憶體/Redis(TTL=35天)，Pod重啟或超期即失
+    # 此欄位在 incident 建立時寫入快照，永久保存當時的頻率統計
+    frequency_snapshot: Mapped[dict[str, Any] | None] = mapped_column(
+        JSON,
+        nullable=True,
+        comment="建立時刻的 AnomalyFrequency 快照，永久保存 (Phase 27)",
+    )
+
    # === 時間軸 ===
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True),
@@ -530,6 +702,12 @@ class KnowledgeEntryRecord(Base):
        nullable=True,
        comment="關聯 Playbook Redis Key",
    )
+    # 2026-04-04 ogt: Phase 25 P1 — Anti-Pattern 閉環攔截用症狀 hash (SymptomPattern.compute_hash())
+    symptoms_hash: Mapped[str | None] = mapped_column(
+        String(16),
+        nullable=True,
+        comment="症狀模式 hash (16字元 SHA256 前綴)，Anti-Pattern 閉環攔截使用",
+    )

    # Metrics
    view_count: Mapped[int] = mapped_column(
@@ -556,4 +734,497 @@ class KnowledgeEntryRecord(Base):
        Index("ix_knowledge_category", "category"),
        Index("ix_knowledge_status", "status"),
        Index("ix_knowledge_created_at", "created_at"),
+        # 2026-04-04 ogt: Phase 25 P1 — Anti-Pattern 快速查詢
+        Index("ix_knowledge_symptoms_hash", "symptoms_hash"),
+    )
+
+
+# IncidentEvidence — ADR-081 Phase 1 EvidenceSnapshot 持久化
+# 2026-04-15 ogt + Claude Sonnet 4.6: AI 自主化飛輪 Phase 1 初始建立
+class IncidentEvidence(Base):
+    """
+    不可變事件證據快照表
+
+    每次決策前 PreDecisionInvestigator 拍攝一次 EvidenceSnapshot，
+    寫入此表以供：
+    - 決策溯源（LLM 推理過程的完整情報上下文）
+    - 學習訓練（Phase 3 fine-tune pipeline 金礦資料）
+    - 異常驗證（執行前 vs 執行後 state diff）
+
+    ADR-081: PreDecisionInvestigator + EvidenceSnapshot
+    設計原則：只追加寫入，禁止 UPDATE（event sourcing 對齊）
+    """
+    __tablename__ = "incident_evidence"
+
+    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=generate_uuid)
+
+    # 關聯
+    incident_id: Mapped[str] = mapped_column(String(30), nullable=False)  # index via __table_args__
+    # Phase 3 填充：matched_playbook_id 目前永久 null，Phase 3 修復
+    matched_playbook_id: Mapped[str | None] = mapped_column(String(36), nullable=True)
+
+    # Schema 版本（方便 fine-tune pipeline 過濾相容版本）
+    schema_version: Mapped[str] = mapped_column(String(10), default="v1", nullable=False)
+
+    # 8D 感官數據（各維度 nullable — MCP 失敗時部分缺失）
+    k8s_state: Mapped[dict | None] = mapped_column(
+        JSON, nullable=True, comment="D1: kubectl describe pod + events"
+    )
+    recent_logs: Mapped[str | None] = mapped_column(
+        Text, nullable=True, comment="D2: container stderr tail-50，經 SanitizationService 清洗"
+    )
+    metrics_snapshot: Mapped[dict | None] = mapped_column(
+        JSON, nullable=True, comment="D3: Prometheus 5min vs 1h baseline 對比"
+    )
+    recent_deployments: Mapped[list | None] = mapped_column(
+        JSON, nullable=True, comment="D4: ArgoCD/Gitea 過去 1h 部署 diff"
+    )
+    business_metrics: Mapped[dict | None] = mapped_column(
+        JSON, nullable=True, comment="D5: 訂單量 / 登入成功率 / P0 SLI"
+    )
+    historical_context: Mapped[str | None] = mapped_column(
+        Text, nullable=True, comment="D6: 過去 30 天同 alertname 處置歷史摘要"
+    )
+    peer_health: Mapped[dict | None] = mapped_column(
+        JSON, nullable=True, comment="D7: 同 Deployment 其他 replica 健康度"
+    )
+    dependency_topology: Mapped[dict | None] = mapped_column(
+        JSON, nullable=True, comment="D8: Istio/Service Mesh 上下游 latency/error rate"
+    )
+    # Phase 4 ADR-084: 動態異常偵測增強感官（DynamicBaseline + LogAnomaly + TrendPredictor）
+    # 2026-04-15 ogt + Claude Sonnet 4.6（亞太）: Phase 4 8D 升級
+    anomaly_context: Mapped[dict | None] = mapped_column(
+        JSON, nullable=True,
+        comment="Phase 4 動態異常上下文：baseline_anomalies / log_patterns / trend_breaches"
+    )
+
+    # 感官品質指標
+    mcp_health: Mapped[dict] = mapped_column(
+        JSON, default=dict, nullable=False,
+        comment="各 MCP 呼叫成敗 {tool_name: bool}，用於 decision_fusion 權重調整"
+    )
+    collection_duration_ms: Mapped[int | None] = mapped_column(
+        Integer, nullable=True, comment="情報蒐集總耗時（ms），P99 目標 < 8000"
+    )
+    sensors_attempted: Mapped[int] = mapped_column(
+        default=0, nullable=False, comment="嘗試啟動的感官數"
+    )
+    sensors_succeeded: Mapped[int] = mapped_column(
+        default=0, nullable=False, comment="成功回傳資料的感官數"
+    )
+
+    # LLM 輸入摘要（不超 8K tokens，由 Investigator 壓縮）
+    evidence_summary: Mapped[str | None] = mapped_column(
+        Text, nullable=True, comment="最終餵給 LLM 的情報摘要（UTF-8，< 8K tokens）"
+    )
+
+    # 執行前後 State（PostExecutionVerifier 填入 post_execution_state）
+    pre_execution_state: Mapped[dict | None] = mapped_column(
+        JSON, nullable=True, comment="執行前環境狀態快照（PostExecutionVerifier 基準線）"
+    )
+    post_execution_state: Mapped[dict | None] = mapped_column(
+        JSON, nullable=True, comment="執行後環境狀態（PostExecutionVerifier 抓取，Phase 1 接線）"
+    )
+    verification_result: Mapped[str | None] = mapped_column(
+        String(20), nullable=True, comment="success / degraded / failed / timeout（PostExecutionVerifier 填入）"
+    )
+
+    # 時間戳（台北時區）
+    collected_at: Mapped[datetime] = mapped_column(
+        DateTime(timezone=True), default=taipei_now, nullable=False
+    )
+
+    __table_args__ = (
+        Index("ix_incident_evidence_incident_id", "incident_id"),
+        Index("ix_incident_evidence_collected_at", "collected_at"),
+        Index("ix_incident_evidence_playbook_id", "matched_playbook_id"),
+    )
+
+
+# =============================================================================
+# PlaybookRecord — Phase 3.5 Playbook PostgreSQL 持久化 (System of Record)
+# ADR-085: AI 學習成果不可存在 Cache — Playbook 是 AI 的肌肉記憶
+# 2026-04-15 ogt + Claude Sonnet 4.6（亞太）: Phase 3.5 初始建立
+#
+# 核心鐵律：
+# - PostgreSQL = System of Record（永久保存，AI 的長期記憶）
+# - Redis = Warm Cache（7天 TTL，加速讀取，DB 為 source of truth）
+# - trust_score, EWMA, 統計數據必須持久化 — 不能因 Redis TTL 消失
+# =============================================================================
+
+class PlaybookRecord(Base):
+    """
+    Playbook 修復劇本 PostgreSQL ORM
+
+    與 Pydantic Playbook 模型對應。
+    Redis 為 warm cache（7d TTL），PostgreSQL 為 source of truth。
+
+    設計原則：
+    - AI 的學習成果（trust_score、success_count、failure_count）永久保存
+    - EWMA 信任度在 Redis TTL 後不會重置，Pod 重啟後 AI 記憶不失
+    - 雙寫：create/update 先寫 PG，再更新 Redis cache
+    - 讀取：Redis-first（cache hit），miss 時從 PG 載入並回填 Redis
+    """
+    __tablename__ = "playbooks"
+
+    # Primary Key
+    playbook_id: Mapped[str] = mapped_column(
+        String(36), primary_key=True,
+        comment="Playbook 唯一識別碼 (PB-YYYYMMDD-XXXXXX)",
+    )
+
+    # Core Fields
+    name: Mapped[str] = mapped_column(String(256), nullable=False)
+    description: Mapped[str] = mapped_column(Text, default="", nullable=False)
+    status: Mapped[str] = mapped_column(String(20), default="draft", nullable=False)
+    source: Mapped[str] = mapped_column(String(20), default="extracted", nullable=False)
+
+    # Complex structures (JSONB)
+    symptom_pattern: Mapped[dict[str, Any]] = mapped_column(JSON, default=dict, nullable=False)
+    repair_steps: Mapped[list[dict[str, Any]]] = mapped_column(JSON, default=list, nullable=False)
+
+    # Timing
+    estimated_duration_minutes: Mapped[int] = mapped_column(Integer, default=5, nullable=False)
+
+    # Source tracing
+    source_incident_ids: Mapped[list[str]] = mapped_column(JSON, default=list, nullable=False)
+    ai_confidence: Mapped[float] = mapped_column(default=0.0, nullable=False)
+
+    # Stats — MUST be in PG (AI learning artifacts, cannot expire)
+    success_count: Mapped[int] = mapped_column(Integer, default=0, nullable=False)
+    failure_count: Mapped[int] = mapped_column(Integer, default=0, nullable=False)
+    last_used_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True), nullable=True)
+
+    # EWMA trust score — ADR-083 Phase 3, 絕對不能用 Redis TTL 管理
+    # trust_score 是 AI 累積學習的結晶，TTL 到期就歸零 = AI 記憶全部消失
+    trust_score: Mapped[float] = mapped_column(default=0.3, nullable=False,
+        comment="EWMA 動態信任度 (Phase 3)。成功 α=0.1，失敗 α=0.2（2x 衰減）。< 0.1 → 封存")
+
+    # Approval metadata
+    approved_by: Mapped[str | None] = mapped_column(String(100), nullable=True)
+    approved_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True), nullable=True)
+    tags: Mapped[list[str]] = mapped_column(JSON, default=list, nullable=False)
+    notes: Mapped[str | None] = mapped_column(Text, nullable=True)
+
+    # Sprint 5.1 護欄欄位 (2026-04-08)
+    requires_approval_level: Mapped[str] = mapped_column(
+        String(20), default="auto", nullable=False,
+        comment="auto=直接執行, standard=1票, critical=2票MultiSig",
+    )
+    stateful_targets: Mapped[list[str]] = mapped_column(JSON, default=list, nullable=False)
+    requires_pre_backup: Mapped[bool] = mapped_column(default=False, nullable=False)
+
+    # Timestamps
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=taipei_now, nullable=False)
+    updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=taipei_now,
+        onupdate=taipei_now, nullable=False)
+
+    __table_args__ = (
+        Index("ix_playbook_status", "status"),
+        Index("ix_playbook_trust_score", "trust_score"),
+        Index("ix_playbook_created_at", "created_at"),
+    )
+
+
+# =============================================================================
+# DynamicBaselineRecord — Phase 4 Holt-Winters 訓練基線持久化
+# ADR-084: 動態基線不能只存 Redis — AI 每天重學「正常」不是在學習
+# 2026-04-15 ogt + Claude Sonnet 4.6（亞太）: Phase 4 初始建立
+#
+# 核心鐵律：
+# - 訓練好的 Holt-Winters 模型必須在 PG 長期保存
+# - Redis 為 24h warm cache（加速 is_anomaly() 讀取）
+# - 基線消失 = AI 對「正常」的認識消失 = 每天從頭學習 = 不是 AI
+# =============================================================================
+
+class DynamicBaselineRecord(Base):
+    """
+    動態基線訓練結果 PostgreSQL ORM
+
+    Holt-Winters 訓練完成後：
+    1. 先寫入 PG（永久保存）
+    2. 再寫入 Redis（24h warm cache，加速讀取）
+
+    Redis key: baseline:{metric_name}
+    PG: 此表，metric_name 為主鍵，最新一筆 = 有效基線
+    """
+    __tablename__ = "dynamic_baselines"
+
+    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=generate_uuid)
+
+    # 基線識別
+    metric_name: Mapped[str] = mapped_column(
+        String(200), nullable=False, index=True,
+        comment="基線識別名 (e.g. cpu_usage_node_mon)",
+    )
+
+    # 訓練結果（Holt-Winters 統計）
+    mean: Mapped[float] = mapped_column(nullable=False, comment="擬合值均值")
+    std: Mapped[float] = mapped_column(nullable=False, comment="殘差標準差")
+
+    # 24h 季節性因子（JSON 陣列，長度 24）
+    seasonal_factors: Mapped[list[float]] = mapped_column(
+        JSON, default=list, nullable=False,
+        comment="24h 週期季節性因子（乘法形式，均值 ≈ 1.0）",
+    )
+
+    # 訓練元資料
+    datapoint_count: Mapped[int] = mapped_column(Integer, default=0, nullable=False)
+    promql: Mapped[str] = mapped_column(Text, default="", nullable=False,
+        comment="訓練使用的 PromQL 查詢")
+    lookback_hours: Mapped[int] = mapped_column(Integer, default=336, nullable=False)
+
+    # Timestamps
+    trained_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=taipei_now, nullable=False)
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=taipei_now, nullable=False)
+
+    __table_args__ = (
+        Index("ix_dynamic_baseline_metric", "metric_name"),
+        Index("ix_dynamic_baseline_trained_at", "trained_at"),
+    )
+
+
+# =============================================================================
+# LogClusterRecord — Phase 4 Drain3 學習到的 Log Pattern 持久化
+# ADR-084: Drain3 模板不能只存 Redis — 每次重啟 AI 把已知 pattern 當新 pattern
+# 2026-04-15 ogt + Claude Sonnet 4.6（亞太）: Phase 4 初始建立
+#
+# 核心鐵律：
+# - Drain3 學到的 log cluster template 必須在 PG 長期保存
+# - 新 cluster 事件列表 (log_anomaly:new) 才存 Redis（短期工作記憶）
+# - 基礎知識庫（已學到的 pattern）必須在 PG
+# =============================================================================
+
+class LogClusterRecord(Base):
+    """
+    Drain3 Log Cluster Template 持久化
+
+    每個新 pattern 首次偵測到時：
+    1. 寫入 PG（永久保存，AI 的 log 語意理解）
+    2. 推送到 Redis list log_anomaly:new（短期工作記憶）
+
+    Re-detect 相同 template 時只更新 last_seen_at + size，不重複寫入 PG。
+    """
+    __tablename__ = "log_clusters"
+
+    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=generate_uuid)
+
+    # Cluster 識別（MD5[:8] of template）
+    cluster_id: Mapped[str] = mapped_column(
+        String(16), nullable=False, unique=True, index=True,
+        comment="模板 MD5[:8].upper()，穩定 ID",
+    )
+
+    # Drain3 模板
+    template: Mapped[str] = mapped_column(
+        Text, nullable=False,
+        comment="Drain3 萃取的 log 模板 (e.g. 'ERROR <*> connection failed to <*>')",
+    )
+
+    # 統計
+    size: Mapped[int] = mapped_column(Integer, default=1, nullable=False,
+        comment="命中次數（第一次 = 1）")
+    source: Mapped[str] = mapped_column(String(50), default="k8s_pod", nullable=False,
+        comment="k8s_pod | host_syslog | app_log")
+
+    # 樣本日誌（保留首次觸發的原始行，供事後分析）
+    sample_log: Mapped[str | None] = mapped_column(Text, nullable=True,
+        comment="首次觸發的原始 log 行（前 500 字元）")
+
+    # Timestamps
+    first_seen_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=taipei_now, nullable=False)
+    last_seen_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=taipei_now,
+        onupdate=taipei_now, nullable=False)
+
+    __table_args__ = (
+        Index("ix_log_cluster_first_seen", "first_seen_at"),
+        Index("ix_log_cluster_source", "source"),
+    )
+
+
+# =============================================================================
+# AgentSession — Phase 2 多 Agent 辯證 Audit Trail
+# =============================================================================
+
+class AgentSession(Base):
+    """
+    ADR-082 Phase 2: 多 Agent 辯證 Immutable Event Log
+
+    每個 Agent 每次「發言」寫一行。
+    session_id 串連同一次 Incident 決策的所有 Agent turns。
+
+    不可刪除 — 只能新增（Immutable Event Sourcing）。
+    Phase 3 學習閉環依賴此表（Critic 挑戰成功作為負向學習信號）。
+
+    ADR-082: 多 Agent 協作架構
+    2026-04-15 ogt + Claude Sonnet 4.6（亞太）: Phase 2 初始建立
+    """
+    __tablename__ = "agent_sessions"
+
+    id: Mapped[str] = mapped_column(
+        String(36), primary_key=True, default=lambda: str(uuid4()),
+        comment="行主鍵（UUID）"
+    )
+    session_id: Mapped[str] = mapped_column(
+        String(36), nullable=False,
+        comment="辯證 Session ID（一次 Incident 決策的所有 turns 共用同一 session_id）"
+    )
+    incident_id: Mapped[str] = mapped_column(
+        String(50), nullable=False,
+        comment="關聯 Incident ID"
+    )
+    agent_role: Mapped[str] = mapped_column(
+        String(20), nullable=False,
+        comment="Agent 角色：diagnostician / solver / reviewer / critic / coordinator"
+    )
+
+    # 輸入指紋（sha256[:16]）— 用於查重、快取命中追蹤
+    input_hash: Mapped[str] = mapped_column(
+        String(16), nullable=False, default="",
+        comment="sha256(input_json)[:16]，供查重與快取命中追蹤"
+    )
+
+    # Agent 輸出（完整 JSON，供 Phase 3 學習 + 事後複盤）
+    output_json: Mapped[dict] = mapped_column(
+        JSON, nullable=False, default=dict,
+        comment="Agent 原始輸出（DiagnosisReport / ActionPlan / 等序列化 dict）"
+    )
+
+    # 品質指標
+    latency_ms: Mapped[int] = mapped_column(
+        Integer, nullable=False, default=0,
+        comment="此 Agent 的執行耗時（ms）"
+    )
+    vote: Mapped[str] = mapped_column(
+        String(20), nullable=False, default="abstain",
+        comment="Agent 投票：approve / reject / request_revision / abstain / degraded"
+    )
+    degraded: Mapped[bool] = mapped_column(
+        nullable=False, default=False,
+        comment="True = 此 Agent 因熔斷/超時降級，輸出為 rule-based mock"
+    )
+
+    # 時間戳（台北時區）
+    created_at: Mapped[datetime] = mapped_column(
+        DateTime(timezone=True), default=taipei_now, nullable=False
+    )
+
+    __table_args__ = (
+        Index("ix_agent_sessions_session_id", "session_id"),
+        Index("ix_agent_sessions_incident_id", "incident_id"),
+        Index("ix_agent_sessions_created_at", "created_at"),
+        # 查詢某 session 中特定 role 的 turn（Coordinator 聚合時常用）
+        Index("ix_agent_sessions_session_role", "session_id", "agent_role"),
+    )
+
+
+# =============================================================================
+# AiGovernanceEvent — Phase 6 自我治理事件溯源（不可刪除）
+# ADR-087: AI 自我治理閉環：SLO 違反 / 信任漂移 / KB 腐爛 / 自我降級
+# 2026-04-15 ogt + Claude Sonnet 4.6（亞太）: Phase 6 初始建立
+#
+# 核心鐵律：
+# - 不可變 Event Sourcing — 只 INSERT，禁止 UPDATE/DELETE
+# - 所有治理事件必須落地 PG，SLO dashboard 依賴此表
+# - resolved=True 僅由人工或下次計算時補填，不可自動翻轉未解決項目
+# =============================================================================
+
+class AiGovernanceEvent(Base):
+    """
+    AI 自我治理事件記錄（不可變）
+
+    event_type 值：
+      slo_violation     — SLO 計算結果違反閾值
+      trust_drift       — Playbook 信任度分布偏態（全高或全低）
+      kb_stale          — KB 條目引用已廢棄 K8s API / Prometheus query
+      self_demotion     — 信心閾值自動調高（自我降級）
+      conservative_mode — 連續 SLO 違反，全系統切保守模式
+      replay_degraded   — 離線回放一致率連續下降
+
+    immutable — 只 INSERT，禁 UPDATE / DELETE
+    """
+    __tablename__ = "ai_governance_events"
+
+    id: Mapped[str] = mapped_column(
+        String(36), primary_key=True, default=generate_uuid,
+        comment="主鍵（UUID）"
+    )
+    event_type: Mapped[str] = mapped_column(
+        String(40), nullable=False,
+        comment="slo_violation / trust_drift / kb_stale / self_demotion / conservative_mode / replay_degraded"
+    )
+    triggered_at: Mapped[datetime] = mapped_column(
+        DateTime(timezone=True), default=taipei_now, nullable=False,
+        comment="事件觸發時間（台北時區）"
+    )
+    details: Mapped[dict] = mapped_column(
+        JSON, nullable=False, default=dict,
+        comment="事件詳情 JSONB（SLO 數值、漂移分布等）"
+    )
+    resolved: Mapped[bool] = mapped_column(
+        default=False, nullable=False,
+        comment="是否已解決（人工確認或下次計算恢復正常後補填）"
+    )
+    resolved_at: Mapped[datetime | None] = mapped_column(
+        DateTime(timezone=True), nullable=True,
+        comment="解決時間（僅人工/系統補填，不得自動反轉未解決項目）"
+    )
+
+    __table_args__ = (
+        Index("ix_ai_governance_event_type", "event_type"),
+        Index("ix_ai_governance_triggered_at", "triggered_at"),
+        Index("ix_ai_governance_resolved", "resolved"),
+    )
+
+
+# =============================================================================
+# TrustRecordDB - ADR-088 TrustScore 持久化
+# =============================================================================
+
+class TrustRecordDB(Base):
+    """
+    Trust Score 持久化記錄
+
+    ADR-088: TrustScoreManager 從記憶體升級為 PostgreSQL 持久化。
+    Pod 重啟後分數不歸零，AI 能真正累積信任達到 L4 自動放行。
+
+    score >= 5:  MEDIUM → LOW (自動執行)
+    score >= 10: HIGH → MEDIUM (降一級)
+
+    2026-04-17 ogt + Claude Sonnet 4.6（亞太）: Phase 4 信任持久化
+    """
+    __tablename__ = "trust_records"
+
+    action_pattern: Mapped[str] = mapped_column(
+        String(255), primary_key=True,
+        comment="操作模式，例如 delete:nginx-frontend-*"
+    )
+    score: Mapped[int] = mapped_column(
+        Integer, nullable=False, default=0,
+        comment="累積信任分數。+1/approve，reject 歸零"
+    )
+    total_approvals: Mapped[int] = mapped_column(
+        Integer, nullable=False, default=0,
+    )
+    total_rejections: Mapped[int] = mapped_column(
+        Integer, nullable=False, default=0,
+    )
+    last_approval_by: Mapped[str | None] = mapped_column(String(100), nullable=True)
+    last_approval_at: Mapped[datetime | None] = mapped_column(
+        DateTime(timezone=True), nullable=True,
+    )
+    last_rejection_by: Mapped[str | None] = mapped_column(String(100), nullable=True)
+    last_rejection_at: Mapped[datetime | None] = mapped_column(
+        DateTime(timezone=True), nullable=True,
+    )
+    created_at: Mapped[datetime] = mapped_column(
+        DateTime(timezone=True), nullable=False, default=taipei_now,
+    )
+    updated_at: Mapped[datetime] = mapped_column(
+        DateTime(timezone=True), nullable=False, default=taipei_now, onupdate=taipei_now,
+    )
+
+    __table_args__ = (
+        Index("ix_trust_records_score", "score"),
+        Index("ix_trust_records_updated", "updated_at"),
    )
--- a/apps/api/src/jobs/init.py
+++ b/apps/api/src/jobs/init.py
@@ -0,0 +1,15 @@
+"""
+AWOOOI AIOps Jobs
+==================
+定時任務（非 Redis Streams Worker）
+
+目前包含：
+- baseline_snapshot: Phase 0 觀測基線快照
+- knowledge_decay_job: Phase 3 30 天知識遺忘 (待建)
+- detection_feedback_writer: Phase 3 誤判告警回寫 (待建)
+- offline_replay_service: Phase 6 週度離線回放 (待建)
+- kb_rot_cleaner: Phase 6 月度 KB 腐爛清理 (待建)
+
+ADR-080: AI 自主化飛輪總綱
+2026-04-15 ogt: Phase 0 — 初始建立
+"""
--- a/apps/api/src/jobs/ai_slo_watchdog_job.py
+++ b/apps/api/src/jobs/ai_slo_watchdog_job.py
@@ -0,0 +1,166 @@
+"""
+AI SLO Watchdog Job — 系統自健診（每 15 分鐘）
+=============================================
+MASTER §1.1 AI 自主化方向：系統必須能感知自身故障。
+ADR-092 (2026-04-20 ogt + Claude Opus 4.7 Asia/Taipei)
+
+檢查項目：
+  W-1  AI SLO 違反（決策品質，7d 滾動）
+  W-2  Telegram 靜默偵測（PENDING 告警無 tg_sent 確認超過 30 分鐘）
+  W-3  飛輪 execution_success_rate 低落（< 30%）
+
+任一異常 → send_meta_alert（TYPE-8M，flywheel_health）
+去重：Redis watchdog:alert:{dedup_hash} TTL 1h，避免每 15 分鐘重複洗版
+"""
+from __future__ import annotations
+
+import asyncio
+import uuid
+from datetime import UTC, datetime, timedelta
+
+import structlog
+from sqlalchemy import and_, select
+
+from src.core.redis_client import get_redis
+from src.db.base import get_db_context
+from src.db.models import ApprovalRecord
+from src.models.approval import ApprovalStatus
+from src.utils.timezone import now_taipei
+
+logger = structlog.get_logger(__name__)
+
+_INTERVAL_SEC = 900      # 每 15 分鐘
+_DEDUP_TTL_SEC = 3600    # 同一告警 1 小時內不重複發送
+_TG_SILENCE_THRESHOLD = 2  # PENDING 無 tg_sent 確認數量告警門檻
+_FLYWHEEL_SUCCESS_MIN = 0.30  # 執行成功率下限
+
+
+async def run_ai_slo_watchdog_loop() -> None:
+    """
+    永久迴圈：每 15 分鐘自健診，異常時發送 TYPE-8M Meta-System 告警。
+    由 main.py lifespan 透過 asyncio.create_task() 啟動。
+    """
+    logger.info("ai_slo_watchdog_started", interval_sec=_INTERVAL_SEC)
+    while True:
+        try:
+            await _check_once()
+        except Exception as e:
+            logger.warning("ai_slo_watchdog_error", error=str(e))
+        await asyncio.sleep(_INTERVAL_SEC)
+
+
+async def _check_once() -> None:
+    violations: list[str] = []
+
+    # W-1: AI SLO 違反（決策品質 7d 滾動）
+    try:
+        from src.services.ai_slo_calculator import AiSloCalculator
+        report = await AiSloCalculator().calculate()
+        if report.any_violated:
+            violated = [m.name for m in report.metrics if m.violated]
+            violations.append(f"SLO 違反: {', '.join(violated)}")
+    except Exception as e:
+        logger.warning("watchdog_w1_slo_check_failed", error=str(e))
+
+    # W-2: Telegram 靜默偵測（PENDING 無 tg_sent 確認 > 30 分鐘）
+    try:
+        silent_count = await _count_pending_no_tg_sent()
+        if silent_count >= _TG_SILENCE_THRESHOLD:
+            violations.append(f"{silent_count} 個 PENDING 告警超 30 分鐘無 Telegram 確認（疑似靜默故障）")
+    except Exception as e:
+        logger.warning("watchdog_w2_tg_silence_check_failed", error=str(e))
+
+    # W-3: 飛輪執行成功率過低
+    try:
+        from src.services.flywheel_stats_service import FlywheelStatsService
+        metrics = await FlywheelStatsService().compute()
+        if metrics and metrics.execution_success_rate < _FLYWHEEL_SUCCESS_MIN:
+            violations.append(f"飛輪執行成功率 {metrics.execution_success_rate:.1%} < {_FLYWHEEL_SUCCESS_MIN:.0%}")
+    except Exception as e:
+        logger.warning("watchdog_w3_flywheel_check_failed", error=str(e))
+
+    # W-4: 無 APPROVED Playbook（自動修復鏈路斷裂）
+    try:
+        approved_count = await _count_approved_playbooks()
+        if approved_count == 0:
+            violations.append("無 APPROVED Playbook — 自動修復鏈路斷裂（evolver 可能全部封存）")
+    except Exception as e:
+        logger.warning("watchdog_w4_playbook_check_failed", error=str(e))
+
+    if not violations:
+        logger.debug("ai_slo_watchdog_all_ok", checks=4)
+        return
+
+    # 去重：violations 相同內容 1 小時內不重複發
+    dedup_hash = f"{hash(tuple(sorted(violations))) & 0xFFFFFF:06x}"
+    dedup_key = f"watchdog:alert:{dedup_hash}"
+    redis = get_redis()
+    if await redis.exists(dedup_key):
+        logger.debug("ai_slo_watchdog_deduped", key=dedup_key)
+        return
+    await redis.setex(dedup_key, _DEDUP_TTL_SEC, "1")
+
+    # 發送 TYPE-8M Meta-System 告警
+    diagnosis = " | ".join(violations)
+    incident_id = f"META-{now_taipei().strftime('%Y%m%d%H%M%S')}"
+    try:
+        from src.services.telegram_gateway import get_telegram_gateway
+        await get_telegram_gateway().send_meta_alert(
+            incident_id=incident_id,
+            approval_id=str(uuid.uuid4()),
+            alertname="AI 自健診異常",
+            alert_category="flywheel_health",
+            diagnosis=diagnosis,
+            severity_level="critical",
+            system_impact=f"{len(violations)} 項 KPI 異常，飛輪自動化能力可能降級",
+        )
+        logger.warning(
+            "ai_slo_watchdog_alert_sent",
+            incident_id=incident_id,
+            violation_count=len(violations),
+            violations=violations,
+        )
+    except Exception as e:
+        logger.error("ai_slo_watchdog_telegram_failed", error=str(e), violations=violations)
+
+
+async def _count_pending_no_tg_sent() -> int:
+    """
+    查詢 PENDING 超過 30 分鐘且 Redis tg_sent:{fingerprint} 無確認的告警數量。
+    與 ADR-092 B2 修復配合：B2 修復後新告警會標記 tg_sent；
+    此查詢偵測仍存在的靜默告警（B2 修復前殘留 + 未來潛在故障）。
+    """
+    cutoff = datetime.now(UTC) - timedelta(minutes=30)
+    redis = get_redis()
+    silent = 0
+
+    async with get_db_context() as db:
+        result = await db.execute(
+            select(ApprovalRecord.id, ApprovalRecord.fingerprint)
+            .where(
+                and_(
+                    ApprovalRecord.status == ApprovalStatus.PENDING,
+                    ApprovalRecord.created_at <= cutoff,
+                    ApprovalRecord.fingerprint.isnot(None),
+                )
+            )
+            .limit(20)
+        )
+        rows = result.all()
+
+    for row in rows:
+        fp = row.fingerprint
+        if fp and not await redis.exists(f"tg_sent:{fp}"):
+            silent += 1
+
+    return silent
+
+
+async def _count_approved_playbooks() -> int:
+    """查詢 APPROVED 狀態 Playbook 數量，為 0 代表自動修復鏈路斷裂。"""
+    from sqlalchemy import text as sa_text
+    async with get_db_context() as db:
+        result = await db.execute(
+            sa_text("SELECT COUNT(*) FROM playbooks WHERE status = 'approved'")
+        )
+        return result.scalar() or 0
--- a/apps/api/src/jobs/approval_timeout_resolver.py
+++ b/apps/api/src/jobs/approval_timeout_resolver.py
@@ -0,0 +1,150 @@
+"""
+AWOOOI — Approval Timeout Resolver（逾期 Approval 自動結案 Job）
+================================================================
+職責：每小時掃描 approval_records 中已逾期（expires_at < now）但狀態仍為
+PENDING 的記錄，標記為 EXPIRED，並對其關聯的 Incident 呼叫 resolve_incident
+以確保 KM 學習鏈完整閉環。
+
+為什麼需要這個 Job？
+  get_pending_approvals() 有自動過期邏輯，但只在用戶開啟待處理列表時觸發。
+  若無人開 UI，PENDING 記錄永遠停留，關聯 Incident 不會 RESOLVED，
+  km_conversion_service 永不觸發，AI 學習飛輪對「無人處置的告警」完全盲目。
+
+disposition 記錄：
+  timeout_ignored — 與 auto_repair / human_approved 區別，
+  讓 anomaly_counter 統計反映「AI 建議但被人類忽略」的現象，
+  供 Phase 6 SLO human_override_rate 校正。
+
+設計原則：
+  1. 只更新 DB，不刪除記錄（符合 archive_not_delete 鐵律）
+  2. resolve_incident 使用 resolution_type="timeout"，記錄正確 disposition
+  3. 失敗 → 只記錄 error，不影響主路徑
+  4. 每次執行記錄 resolved_count / error_count
+
+2026-04-15 ogt + Claude Sonnet 4.6（亞太）：P2 飛輪斷鏈修復
+"""
+
+from __future__ import annotations
+
+import asyncio
+from datetime import UTC, datetime, timedelta
+
+import structlog
+from sqlalchemy import and_, select, update
+
+from src.db.base import get_db_context
+from src.db.models import ApprovalRecord
+from src.models.approval import ApprovalStatus
+from src.utils.timezone import now_taipei
+
+logger = structlog.get_logger(__name__)
+
+# 每次最多處理幾筆，避免單次執行阻塞過長
+BATCH_LIMIT = 50
+
+
+async def run_approval_timeout_resolver() -> None:
+    """
+    無限迴圈：每小時執行一次逾期 Approval 結案掃描。
+    在 main.py startup 以 asyncio.create_task 掛載。
+    """
+    while True:
+        try:
+            resolved, errors = await _resolve_expired_approvals()
+            if resolved > 0 or errors > 0:
+                logger.info(
+                    "approval_timeout_resolver_done",
+                    resolved=resolved,
+                    errors=errors,
+                )
+        except Exception as e:
+            logger.error("approval_timeout_resolver_loop_error", error=str(e))
+
+        await asyncio.sleep(3600)  # 每小時執行一次
+
+
+async def _resolve_expired_approvals() -> tuple[int, int]:
+    """
+    找出已逾期的 PENDING approval，標記 EXPIRED 並結案對應 Incident。
+
+    Returns:
+        (resolved_count, error_count)
+    """
+    now = datetime.now(UTC)
+    resolved = 0
+    errors = 0
+
+    # Step 1: 找出逾期但仍 PENDING 的記錄（有 expires_at 且逾期）
+    async with get_db_context() as db:
+        result = await db.execute(
+            select(ApprovalRecord)
+            .where(
+                and_(
+                    ApprovalRecord.status == ApprovalStatus.PENDING,
+                    ApprovalRecord.expires_at.is_not(None),
+                    ApprovalRecord.expires_at < now,
+                )
+            )
+            .order_by(ApprovalRecord.expires_at)
+            .limit(BATCH_LIMIT)
+        )
+        expired_records = result.scalars().all()
+
+        if not expired_records:
+            return 0, 0
+
+        # Step 2: 批次標記 EXPIRED
+        expired_ids = [r.id for r in expired_records]
+        await db.execute(
+            update(ApprovalRecord)
+            .where(ApprovalRecord.id.in_(expired_ids))
+            .values(status=ApprovalStatus.EXPIRED, resolved_at=now)
+        )
+        await db.commit()
+
+        logger.info(
+            "approval_timeout_batch_expired",
+            count=len(expired_ids),
+            ids=[str(i)[:8] for i in expired_ids[:10]],
+        )
+
+    # Step 3: 對每筆有 incident_id 的記錄呼叫 resolve_incident
+    from src.services.incident_service import get_incident_service
+
+    inc_svc = get_incident_service()
+
+    for record in expired_records:
+        incident_id = getattr(record, "incident_id", None)
+        if not incident_id:
+            continue
+
+        try:
+            result = await inc_svc.resolve_incident(
+                incident_id=str(incident_id),
+                resolution_type="timeout",
+            )
+            if result:
+                resolved += 1
+                logger.info(
+                    "approval_timeout_incident_resolved",
+                    approval_id=str(record.id)[:8],
+                    incident_id=str(incident_id)[:8],
+                )
+            else:
+                # incident_not_found 或已 RESOLVED，不算 error
+                logger.debug(
+                    "approval_timeout_incident_skip",
+                    approval_id=str(record.id)[:8],
+                    incident_id=str(incident_id)[:8],
+                    reason="not_found_or_already_resolved",
+                )
+        except Exception as e:
+            errors += 1
+            logger.error(
+                "approval_timeout_resolve_error",
+                approval_id=str(record.id)[:8],
+                incident_id=str(incident_id)[:8],
+                error=str(e),
+            )
+
+    return resolved, errors
--- a/apps/api/src/jobs/asset_change_tracker_job.py
+++ b/apps/api/src/jobs/asset_change_tracker_job.py
@@ -0,0 +1,246 @@
+"""
+Asset Change Tracker Job — ADR-090 § asset_change_event
+=========================================================
+每 1h 比對最近兩次 asset_discovery_run,寫 asset_change_event (added/removed/modified).
+
+職責邊界:
+  ✅ 比對 run_N vs run_N-1 的 asset set
+  ✅ 新出現的 asset → 'asset_added' event
+  ✅ 消失的 asset (lifecycle 'deprecated' 或完全不在新 run) → 'asset_removed'
+  ✅ 存在於兩次但 metadata 有差異 → 'asset_modified'
+  ⏳ TODO: coverage_improved/degraded (需要 coverage_evaluator 歷史比對)
+  ⏳ TODO: criticality_changed / owner_changed (需人工設定 criticality 欄位)
+
+設計鐵律:
+  - 用 asset_key (UNIQUE) 作比對基準,跨 run 穩定
+  - before_state/after_state 存 metadata JSONB 便於 AI 分析
+  - diff jsonb 標註變動欄位
+  - 失敗 → log + 跳過,下次重試
+
+排程:
+  - 首次延遲 360s (讓 asset_scanner 至少跑 2 次)
+  - 每 1h
+
+2026-04-19 ogt + Claude Opus 4.7 (1M context) Asia/Taipei
+ADR-090 § Phase 7 Change Tracking
+"""
+from __future__ import annotations
+
+import asyncio
+import json as _json
+import time as _time
+from typing import Any
+
+import structlog
+
+logger = structlog.get_logger(__name__)
+
+_TRACK_INTERVAL_SEC = 3600
+_FIRST_DELAY_SEC = 360
+_LOOP_BACKOFF_SEC = 600
+
+
+async def run_asset_change_tracker_loop() -> None:
+    """每 1h 比對最近兩次 run,寫 asset_change_event."""
+    logger.info("asset_change_tracker_loop_started", interval_sec=_TRACK_INTERVAL_SEC)
+    await asyncio.sleep(_FIRST_DELAY_SEC)
+
+    while True:
+        try:
+            await track_once()
+        except Exception as e:
+            logger.exception("asset_change_tracker_loop_error", error=str(e))
+            await asyncio.sleep(_LOOP_BACKOFF_SEC)
+            continue
+        await asyncio.sleep(_TRACK_INTERVAL_SEC)
+
+
+async def track_once() -> dict[str, int]:
+    """比對兩個最近的 run,產出 change events."""
+    started_ms = _time.time()
+    stats = {"added": 0, "removed": 0, "modified": 0}
+    error_msg: str | None = None
+
+    try:
+        runs = await _get_recent_runs(limit=2)
+        if len(runs) < 2:
+            logger.info("asset_change_tracker_need_two_runs", got=len(runs))
+            return stats
+
+        newer_run, older_run = runs[0], runs[1]
+        logger.info("asset_change_tracker_comparing", newer=newer_run, older=older_run)
+
+        stats = await _diff_runs(newer_run, older_run)
+
+    except Exception as e:
+        error_msg = f"{type(e).__name__}: {e}"[:1000]
+        logger.exception("asset_change_track_once_failed", error=error_msg)
+
+    duration_ms = int((_time.time() - started_ms) * 1000)
+    await _log_aol(stats, duration_ms, error_msg)
+
+    logger.info(
+        "asset_change_track_once_done",
+        added=stats["added"],
+        removed=stats["removed"],
+        modified=stats["modified"],
+        duration_ms=duration_ms,
+    )
+    return stats
+
+
+async def _get_recent_runs(limit: int = 2) -> list[str]:
+    """取最近 N 個 success 的 run_id (降序)."""
+    from sqlalchemy import text as _sql
+    from src.db.base import get_db_context
+
+    async with get_db_context() as db:
+        rows = await db.execute(
+            _sql("SELECT run_id FROM asset_discovery_run WHERE status='success' ORDER BY ended_at DESC LIMIT :lim"),
+            {"lim": limit},
+        )
+        return [str(r[0]) for r in rows.fetchall()]
+
+
+async def _diff_runs(newer_run: str, older_run: str) -> dict[str, int]:
+    """
+    比較兩個 run 所關聯的 asset set (via asset_coverage_snapshot JOIN asset_inventory).
+
+    Strategy:
+      - 用 coverage_snapshot 知道哪些 asset 出現在哪 run
+      - newer - older = added
+      - older - newer = removed (同時 lifecycle_state 改 deprecated by asset_scanner 流程)
+      - newer ∩ older 且 metadata 變 = modified
+    """
+    from sqlalchemy import text as _sql
+    from src.db.base import get_db_context
+
+    stats = {"added": 0, "removed": 0, "modified": 0}
+
+    async with get_db_context() as db:
+        # 1. Added: newer run 有但 older run 沒有
+        result = await db.execute(
+            _sql("""
+                INSERT INTO asset_change_event (
+                    run_id, asset_id, change_type,
+                    before_state, after_state, diff, detected_at
+                )
+                SELECT
+                    CAST(:newer AS uuid),
+                    ai.asset_id,
+                    'asset_added',
+                    NULL,
+                    ai.metadata,
+                    jsonb_build_object('asset_key', ai.asset_key, 'asset_type', ai.asset_type),
+                    NOW()
+                FROM asset_inventory ai
+                WHERE ai.asset_id IN (
+                    SELECT DISTINCT cs_new.asset_id FROM asset_coverage_snapshot cs_new
+                    WHERE cs_new.run_id = CAST(:newer AS uuid)
+                    EXCEPT
+                    SELECT DISTINCT cs_old.asset_id FROM asset_coverage_snapshot cs_old
+                    WHERE cs_old.run_id = CAST(:older AS uuid)
+                )
+                ON CONFLICT DO NOTHING
+            """),
+            {"newer": newer_run, "older": older_run},
+        )
+        stats["added"] = result.rowcount or 0
+
+        # 2. Removed: older 有但 newer 沒有
+        result = await db.execute(
+            _sql("""
+                INSERT INTO asset_change_event (
+                    run_id, asset_id, change_type,
+                    before_state, after_state, diff, detected_at
+                )
+                SELECT
+                    CAST(:newer AS uuid),
+                    ai.asset_id,
+                    'asset_removed',
+                    ai.metadata,
+                    NULL,
+                    jsonb_build_object('asset_key', ai.asset_key, 'asset_type', ai.asset_type),
+                    NOW()
+                FROM asset_inventory ai
+                WHERE ai.asset_id IN (
+                    SELECT DISTINCT cs_old.asset_id FROM asset_coverage_snapshot cs_old
+                    WHERE cs_old.run_id = CAST(:older AS uuid)
+                    EXCEPT
+                    SELECT DISTINCT cs_new.asset_id FROM asset_coverage_snapshot cs_new
+                    WHERE cs_new.run_id = CAST(:newer AS uuid)
+                )
+                ON CONFLICT DO NOTHING
+            """),
+            {"newer": newer_run, "older": older_run},
+        )
+        stats["removed"] = result.rowcount or 0
+
+        # 3. Modified: 兩次都在,lifecycle_state 有變化 (asset_scanner UPSERT 會改 lifecycle)
+        # 實務上 metadata 差異過於 noisy,只追蹤 lifecycle_state 變化
+        # 另外: pod phase 變化 (Running→CrashLoopBackOff 等) 也記
+        # 本 MVP 版偵測: asset.updated_at 比 asset.first_seen_at 新且相差在兩次 run 之間
+        # (簡化: 只記 lifecycle_state='deprecated' 被標的 asset,這些通常是新失去的 pods)
+        result = await db.execute(
+            _sql("""
+                INSERT INTO asset_change_event (
+                    run_id, asset_id, change_type,
+                    before_state, after_state, diff, detected_at
+                )
+                SELECT
+                    CAST(:newer AS uuid),
+                    ai.asset_id,
+                    'lifecycle_changed',
+                    jsonb_build_object('prior_state', 'active'),
+                    jsonb_build_object('new_state', ai.lifecycle_state),
+                    jsonb_build_object('asset_key', ai.asset_key),
+                    NOW()
+                FROM asset_inventory ai
+                WHERE ai.lifecycle_state = 'deprecated'
+                  AND ai.updated_at > NOW() - INTERVAL '2 hours'
+                  AND NOT EXISTS (
+                      SELECT 1 FROM asset_change_event ace
+                      WHERE ace.asset_id = ai.asset_id
+                        AND ace.change_type = 'lifecycle_changed'
+                        AND ace.detected_at > NOW() - INTERVAL '2 hours'
+                  )
+                ON CONFLICT DO NOTHING
+            """),
+            {"newer": newer_run},
+        )
+        stats["modified"] = result.rowcount or 0
+
+    return stats
+
+
+async def _log_aol(stats: dict[str, int], duration_ms: int, error: str | None) -> None:
+    try:
+        from sqlalchemy import text as _sql
+        from src.db.base import get_db_context
+
+        aol_status = "failed" if error else "success"
+        async with get_db_context() as db:
+            await db.execute(
+                _sql("""
+                    INSERT INTO automation_operation_log (
+                        operation_type, actor, status,
+                        input, output, duration_ms, error, tags
+                    ) VALUES (
+                        'asset_discovered',
+                        'asset_change_tracker',
+                        :st,
+                        '{}'::jsonb,
+                        CAST(:output AS jsonb),
+                        :dur, :err, :tags
+                    )
+                """),
+                {
+                    "st": aol_status,
+                    "output": _json.dumps(stats, ensure_ascii=False),
+                    "dur": duration_ms,
+                    "err": (error or "")[:2000] if error else None,
+                    "tags": ["change_tracker", "asset"],
+                },
+            )
+    except Exception as e:
+        logger.warning("asset_change_tracker_aol_failed", error=str(e))
--- a/apps/api/src/jobs/asset_scanner_job.py
+++ b/apps/api/src/jobs/asset_scanner_job.py
@@ -0,0 +1,918 @@
+"""
+Asset Scanner Job — ADR-090 §4.1 資產盤點 cron
+================================================
+每 1 小時掃描 K8s + 寫入 asset_inventory + asset_discovery_run + asset_coverage_snapshot.
+
+職責邊界:
+  ✅ K8s API 列出全部 namespace 的 pods/deployments/services (shallow scan)
+  ✅ UPSERT asset_inventory (asset_key 為 UNIQUE)
+  ✅ 為每個 active asset 寫 7 維 asset_coverage_snapshot (預設 unknown,後續 service 補)
+  ✅ 完成時寫 automation_operation_log(asset_discovered)
+  ❌ 不掃 Prometheus targets / Gitea repos / Docker compose (留下一階段)
+  ❌ 不算 capacity 欄位 (留 capacity_scanner_job)
+
+設計鐵律 (參考 ADR-090 §3.4):
+  - 同一個 asset 跨 run 沿用同 asset_id (asset_key 為自然鍵)
+  - 上次出現但這次沒出現的 asset → lifecycle_state='deprecated' + decommissioned_at
+  - run_id 串連 inventory 與 coverage_snapshot,提供完整稽核
+
+排程:
+  - 預設每 3600s (1 小時) 跑一次,首次延遲 60s 等其他 service init
+  - 由 main.py lifespan asyncio.create_task() 啟動
+
+2026-04-19 ogt + Claude Opus 4.7 (1M context) Asia/Taipei
+ADR-090 監控盲區治理 § Phase 7 Asset Inventory Foundation
+"""
+from __future__ import annotations
+
+import asyncio
+import json as _json
+import time as _time
+from typing import Any
+
+import structlog
+
+logger = structlog.get_logger(__name__)
+
+# ============================================================================
+# 排程參數
+# ============================================================================
+_SCAN_INTERVAL_SEC = 3600       # 每 1 小時
+_FIRST_DELAY_SEC = 60           # 啟動後等 60s 再首掃 (其他 service init)
+_KUBECTL_TIMEOUT_SEC = 30
+_LOOP_BACKOFF_SEC = 300         # 異常時 backoff 5 分鐘
+
+# 7 個自動化覆蓋維度 (ADR-090 §3.5)
+_COVERAGE_DIMENSIONS = (
+    "auto_monitoring", "auto_alerting", "auto_rule_creation",
+    "auto_rule_matching", "auto_playbook", "auto_remediation", "auto_km_creation",
+)
+
+# K8s asset_type 對應
+_K8S_RESOURCE_TO_ASSET_TYPE = {
+    "Pod": "container",
+    "Deployment": "k8s_workload",
+    "StatefulSet": "k8s_workload",
+    "DaemonSet": "k8s_workload",
+    "Service": "k8s_resource",
+    "ConfigMap": "k8s_resource",
+    "Secret": "secret",
+}
+
+
+# ============================================================================
+# Public entry — main.py lifespan 呼叫
+# ============================================================================
+
+async def run_asset_scanner_loop() -> None:
+    """
+    永久迴圈：每 _SCAN_INTERVAL_SEC 秒做一次資產盤點。
+
+    錯誤策略:
+    - 單次 scan 異常 → backoff 5 分鐘再試,不 crash
+    - 連續 5 次失敗 → 寫 ai_governance_event (Phase 6 自我治理) — TODO 後續實作
+    """
+    logger.info("asset_scanner_loop_started", interval_sec=_SCAN_INTERVAL_SEC)
+    await asyncio.sleep(_FIRST_DELAY_SEC)
+
+    while True:
+        try:
+            await scan_once(triggered_by="cron")
+        except Exception as e:
+            logger.exception("asset_scanner_loop_error", error=str(e))
+            await asyncio.sleep(_LOOP_BACKOFF_SEC)
+            continue
+        await asyncio.sleep(_SCAN_INTERVAL_SEC)
+
+
+async def scan_once(
+    triggered_by: str = "cron",
+    scope: tuple[str, ...] = ("k8s",),
+    scan_depth: str = "shallow",
+) -> str | None:
+    """
+    執行一次資產盤點。
+
+    Args:
+        triggered_by: cron / ai / human / incident
+        scope:        本次掃描範圍標籤 (寫入 asset_discovery_run.scope)
+        scan_depth:   shallow (僅 list) / deep (含 describe) / full
+
+    Returns:
+        run_id (UUID 字串) 或 None (寫 header 失敗時)
+    """
+    started_ms = _time.time()
+    run_id = await _start_discovery_run(triggered_by, list(scope), scan_depth)
+    if not run_id:
+        return None
+
+    new_count = 0
+    modified_count = 0
+    total_count = 0
+    error_msg: str | None = None
+
+    try:
+        # 2026-04-19 v3 擴充: 多資源類型掃描 + relationship 提取
+        # 資源類型: pods (container), deployments/statefulsets/daemonsets (k8s_workload),
+        #          services (k8s_resource), nodes (host), configmaps (k8s_resource)
+        # 跳過: secrets (awoooi-executor RBAC 不允許 list)
+        k8s_assets, relationships = await _collect_all_k8s_assets()
+        total_count = len(k8s_assets)
+
+        # UPSERT inventory
+        new_count, modified_count = await _upsert_assets(k8s_assets, run_id)
+
+        # 建立 asset_relationship (OwnerReference + Service selector + Pod volumes)
+        rel_written = await _upsert_relationships(relationships)
+
+        # 為每個 active asset 寫 7 維 coverage (預設 unknown,後續其他 service 升級為 green/yellow/red)
+        await _write_coverage_snapshots(run_id)
+
+        logger.info("asset_scan_relationships_written", run_id=run_id, relationships=rel_written)
+
+    except Exception as e:
+        error_msg = f"{type(e).__name__}: {e}"[:1000]
+        logger.exception("asset_scan_once_failed", run_id=run_id, error=error_msg)
+
+    duration_ms = int((_time.time() - started_ms) * 1000)
+    final_status = "failed" if error_msg else "success"
+
+    await _finish_discovery_run(
+        run_id=run_id,
+        status=final_status,
+        total_assets=total_count,
+        new_assets=new_count,
+        modified_assets=modified_count,
+        duration_ms=duration_ms,
+        error=error_msg,
+    )
+
+    # ADR-090 § aol 留痕 — asset_discovered 是合法 op_type
+    await _log_aol_asset_discovered(
+        run_id=run_id,
+        triggered_by=triggered_by,
+        total=total_count,
+        new=new_count,
+        modified=modified_count,
+        duration_ms=duration_ms,
+        status=final_status,
+        error=error_msg,
+    )
+
+    logger.info(
+        "asset_scan_once_done",
+        run_id=run_id,
+        status=final_status,
+        total=total_count,
+        new=new_count,
+        modified=modified_count,
+        duration_ms=duration_ms,
+    )
+    return run_id
+
+
+# ============================================================================
+# K8s 資產收集
+# ============================================================================
+
+async def _fetch_kubectl_json(resource: str, all_namespaces: bool = True) -> dict[str, Any]:
+    """
+    subprocess 執行 kubectl get <resource> --all-namespaces -o json (或 nodes 不帶 ns).
+    回傳 parse 後的 payload dict ({'items': [...]}).
+    """
+    cmd = ["kubectl", "get", resource, "-o", "json"]
+    if all_namespaces:
+        cmd.insert(3, "--all-namespaces")
+    proc = await asyncio.wait_for(
+        asyncio.create_subprocess_exec(
+            *cmd,
+            stdout=asyncio.subprocess.PIPE,
+            stderr=asyncio.subprocess.PIPE,
+        ),
+        timeout=_KUBECTL_TIMEOUT_SEC,
+    )
+    stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=_KUBECTL_TIMEOUT_SEC)
+    if proc.returncode != 0:
+        raise RuntimeError(f"kubectl {resource} failed rc={proc.returncode}: {stderr.decode('utf-8', errors='replace')[:300]}")
+    try:
+        return _json.loads(stdout.decode("utf-8", errors="replace"))
+    except _json.JSONDecodeError as e:
+        raise RuntimeError(f"kubectl {resource} JSON parse failed: {e}") from e
+
+
+def _build_pod_asset(item: dict[str, Any]) -> dict[str, Any]:
+    """Pod → asset_inventory row (asset_type='container')."""
+    meta = item.get("metadata", {}) or {}
+    spec = item.get("spec", {}) or {}
+    ns = meta.get("namespace") or "default"
+    name = meta.get("name") or "unknown"
+    node = spec.get("nodeName") or ""
+    labels = meta.get("labels", {}) or {}
+
+    tags = []
+    for k in ("app", "environment", "system"):
+        if labels.get(k):
+            tags.append(f"{k if k != 'environment' else 'env'}:{labels[k]}")
+
+    return {
+        "asset_key": f"k8s/pod/{ns}/{name}",
+        "asset_type": "container",
+        "host": node or None,
+        "namespace": ns,
+        "name": name,
+        "metadata": {
+            "owner_references": meta.get("ownerReferences", []),
+            "labels": labels,
+            "phase": (item.get("status", {}) or {}).get("phase", ""),
+            "node": node,
+            "volumes": [
+                {
+                    "name": v.get("name"),
+                    "configMap": v.get("configMap", {}).get("name"),
+                    "secret": v.get("secret", {}).get("secretName"),
+                }
+                for v in (spec.get("volumes") or [])
+                if v.get("configMap") or v.get("secret")
+            ],
+        },
+        "tags": tags,
+    }
+
+
+def _build_workload_asset(item: dict[str, Any], kind: str) -> dict[str, Any]:
+    """Deployment/StatefulSet/DaemonSet → asset_inventory row (asset_type='k8s_workload')."""
+    meta = item.get("metadata", {}) or {}
+    ns = meta.get("namespace") or "default"
+    name = meta.get("name") or "unknown"
+    labels = meta.get("labels", {}) or {}
+    spec = item.get("spec", {}) or {}
+    status = item.get("status", {}) or {}
+
+    return {
+        "asset_key": f"k8s/{kind.lower()}/{ns}/{name}",
+        "asset_type": "k8s_workload",
+        "host": None,
+        "namespace": ns,
+        "name": name,
+        "metadata": {
+            "kind": kind,
+            "labels": labels,
+            "replicas": spec.get("replicas"),
+            "ready_replicas": status.get("readyReplicas"),
+            "selector": (spec.get("selector", {}) or {}).get("matchLabels", {}),
+        },
+        "tags": [f"kind:{kind}"] + [f"app:{labels['app']}"] if labels.get("app") else [f"kind:{kind}"],
+    }
+
+
+def _build_service_asset(item: dict[str, Any]) -> dict[str, Any]:
+    """Service → asset_inventory row (asset_type='k8s_resource')."""
+    meta = item.get("metadata", {}) or {}
+    ns = meta.get("namespace") or "default"
+    name = meta.get("name") or "unknown"
+    spec = item.get("spec", {}) or {}
+
+    return {
+        "asset_key": f"k8s/service/{ns}/{name}",
+        "asset_type": "k8s_resource",
+        "host": None,
+        "namespace": ns,
+        "name": name,
+        "metadata": {
+            "kind": "Service",
+            "type": spec.get("type"),
+            "cluster_ip": spec.get("clusterIP"),
+            "selector": spec.get("selector", {}),
+            "ports": spec.get("ports", []),
+        },
+        "tags": [f"svc_type:{spec.get('type', '')}"],
+    }
+
+
+def _build_node_asset(item: dict[str, Any]) -> dict[str, Any]:
+    """Node → asset_inventory row (asset_type='host')."""
+    meta = item.get("metadata", {}) or {}
+    name = meta.get("name") or "unknown"
+    labels = meta.get("labels", {}) or {}
+    status = item.get("status", {}) or {}
+    addresses = status.get("addresses", []) or []
+    internal_ip = next((a["address"] for a in addresses if a.get("type") == "InternalIP"), "")
+
+    return {
+        "asset_key": f"k8s/node/{name}",
+        "asset_type": "host",
+        "host": name,
+        "namespace": None,
+        "name": name,
+        "metadata": {
+            "kind": "Node",
+            "internal_ip": internal_ip,
+            "labels": labels,
+            "capacity": status.get("capacity", {}),
+            "conditions": [
+                {"type": c.get("type"), "status": c.get("status")}
+                for c in status.get("conditions", [])
+            ],
+        },
+        "tags": [f"role:{labels.get('kubernetes.io/role', 'worker')}"],
+    }
+
+
+def _build_configmap_asset(item: dict[str, Any]) -> dict[str, Any]:
+    """ConfigMap → asset_inventory row (asset_type='k8s_resource')."""
+    meta = item.get("metadata", {}) or {}
+    ns = meta.get("namespace") or "default"
+    name = meta.get("name") or "unknown"
+
+    return {
+        "asset_key": f"k8s/configmap/{ns}/{name}",
+        "asset_type": "k8s_resource",
+        "host": None,
+        "namespace": ns,
+        "name": name,
+        "metadata": {
+            "kind": "ConfigMap",
+            "labels": meta.get("labels", {}),
+            "keys": list((item.get("data") or {}).keys()),
+            "creationTimestamp": meta.get("creationTimestamp"),
+        },
+        "tags": ["kind:ConfigMap"],
+    }
+
+
+async def _collect_all_k8s_assets() -> tuple[list[dict[str, Any]], list[dict[str, str]]]:
+    """
+    掃多種 K8s 資源類型 + 提取 relationship.
+
+    Relationships:
+      - Pod ─ depends_on ─> Deployment (via ReplicaSet 橋接 owner chain)
+      - Pod ─ depends_on ─> StatefulSet/DaemonSet (ownerReferences 直連)
+      - Service ─ routes_to ─> Pod (via spec.selector 匹配 Pod.labels)
+      - Pod ─ depends_on ─> ConfigMap (via spec.volumes[].configMap.name)
+
+    2026-04-19 ogt + Claude Opus 4.7 v3 bug fix:
+      Pod.ownerReferences[0].kind 99% 是 ReplicaSet (Deployment 管 ReplicaSet 管 Pod),
+      原 code 跳過 ReplicaSet → Pod→Deployment 全部漏掉.
+      修: 先掃 ReplicaSet 建 rs_to_deployment map,Pod 用 rs_name 反查 Deployment.
+
+    回傳 (assets, relationships) tuple.
+    """
+    assets: list[dict[str, Any]] = []
+    relationships: list[dict[str, str]] = []
+
+    # 0. ReplicaSets — 僅作為 Pod→Deployment 橋樑,不寫入 asset_inventory
+    rs_to_deployment: dict[str, str] = {}  # "ns/rs_name" -> "deployment_name"
+    try:
+        payload = await _fetch_kubectl_json("replicasets")
+        for item in payload.get("items", []) or []:
+            meta = item.get("metadata", {}) or {}
+            rs_ns = meta.get("namespace") or "default"
+            rs_name = meta.get("name") or ""
+            for ref in meta.get("ownerReferences", []) or []:
+                if ref.get("kind", "").lower() == "deployment":
+                    rs_to_deployment[f"{rs_ns}/{rs_name}"] = ref.get("name", "")
+    except Exception as e:
+        logger.warning("collect_replicasets_failed", error=str(e))
+
+    # 1. Nodes (不帶 ns)
+    try:
+        payload = await _fetch_kubectl_json("nodes", all_namespaces=False)
+        for item in payload.get("items", []) or []:
+            assets.append(_build_node_asset(item))
+    except Exception as e:
+        logger.warning("collect_nodes_failed", error=str(e))
+
+    # 2. Pods — 主體 + 從 ownerReferences 建 relationship
+    pod_by_key: dict[str, dict[str, Any]] = {}
+    try:
+        payload = await _fetch_kubectl_json("pods")
+        for item in payload.get("items", []) or []:
+            a = _build_pod_asset(item)
+            assets.append(a)
+            pod_by_key[a["asset_key"]] = item
+
+            meta = item.get("metadata", {}) or {}
+            ns = meta.get("namespace") or "default"
+            for ref in meta.get("ownerReferences", []) or []:
+                owner_kind = ref.get("kind", "").lower()
+                owner_name = ref.get("name", "")
+                if not owner_name:
+                    continue
+                # StatefulSet/DaemonSet 直接 owner Pod,直接建 relationship
+                if owner_kind in ("statefulset", "daemonset"):
+                    relationships.append({
+                        "from_key": a["asset_key"],
+                        "to_key": f"k8s/{owner_kind}/{ns}/{owner_name}",
+                        "relationship_type": "depends_on",
+                    })
+                # ReplicaSet 中介: 用 rs_to_deployment map 反查 Deployment
+                elif owner_kind == "replicaset":
+                    deploy_name = rs_to_deployment.get(f"{ns}/{owner_name}")
+                    if deploy_name:
+                        relationships.append({
+                            "from_key": a["asset_key"],
+                            "to_key": f"k8s/deployment/{ns}/{deploy_name}",
+                            "relationship_type": "depends_on",
+                        })
+                # 極少數直接是 Deployment owner (舊版 K8s)
+                elif owner_kind == "deployment":
+                    relationships.append({
+                        "from_key": a["asset_key"],
+                        "to_key": f"k8s/deployment/{ns}/{owner_name}",
+                        "relationship_type": "depends_on",
+                    })
+
+            # Pod volumes → ConfigMap relationship
+            for v in (item.get("spec", {}) or {}).get("volumes", []) or []:
+                cm = (v.get("configMap") or {}).get("name")
+                if cm:
+                    relationships.append({
+                        "from_key": a["asset_key"],
+                        "to_key": f"k8s/configmap/{ns}/{cm}",
+                        "relationship_type": "depends_on",
+                    })
+    except Exception as e:
+        logger.warning("collect_pods_failed", error=str(e))
+
+    # 3. Deployments / StatefulSets / DaemonSets
+    for kind, resource in (("Deployment", "deployments"), ("StatefulSet", "statefulsets"), ("DaemonSet", "daemonsets")):
+        try:
+            payload = await _fetch_kubectl_json(resource)
+            for item in payload.get("items", []) or []:
+                assets.append(_build_workload_asset(item, kind))
+        except Exception as e:
+            logger.warning(f"collect_{resource}_failed", error=str(e))
+
+    # 4. Services + routes_to Pod (via selector match)
+    try:
+        payload = await _fetch_kubectl_json("services")
+        for item in payload.get("items", []) or []:
+            svc = _build_service_asset(item)
+            assets.append(svc)
+
+            # 為該 Service 找出匹配的 Pod
+            selector = (item.get("spec", {}) or {}).get("selector") or {}
+            if not selector:
+                continue
+            svc_ns = (item.get("metadata", {}) or {}).get("namespace") or "default"
+            for pod_key, pod_item in pod_by_key.items():
+                if not pod_key.startswith(f"k8s/pod/{svc_ns}/"):
+                    continue
+                pod_labels = (pod_item.get("metadata", {}) or {}).get("labels", {}) or {}
+                # selector 所有 kv 必須都在 pod labels 內
+                if all(pod_labels.get(k) == v for k, v in selector.items()):
+                    relationships.append({
+                        "from_key": svc["asset_key"],
+                        "to_key": pod_key,
+                        "relationship_type": "routes_to",
+                    })
+    except Exception as e:
+        logger.warning("collect_services_failed", error=str(e))
+
+    # 5. ConfigMaps
+    try:
+        payload = await _fetch_kubectl_json("configmaps")
+        for item in payload.get("items", []) or []:
+            assets.append(_build_configmap_asset(item))
+    except Exception as e:
+        logger.warning("collect_configmaps_failed", error=str(e))
+
+    # 6. Prometheus targets — 補齊 host-install services (110/112/188/125 等非 K8s)
+    #    Gap 1 修補 (2026-04-19 audit): 原本 asset_inventory 只涵蓋 K8s,
+    #    110 Harbor/Gitea/監控 + 188 PostgreSQL/Redis/Ollama host-install 全漏
+    #    用 Prometheus /api/v1/targets 自動發現全節點服務
+    try:
+        prom_assets, host_relationships = await _collect_prometheus_targets()
+        assets.extend(prom_assets)
+        relationships.extend(host_relationships)
+    except Exception as e:
+        logger.warning("collect_prometheus_targets_failed", error=str(e))
+
+    return assets, relationships
+
+
+def _is_valid_ipv4(s: str) -> bool:
+    """嚴格 IPv4 判斷: 4 段 + 每段 0-255 整數.
+
+    避免 '125' (短名) / 'cadvisor-110' (hostname) 被誤判為 IP.
+    """
+    if not s or s.count(".") != 3:
+        return False
+    parts = s.split(".")
+    if len(parts) != 4:
+        return False
+    for p in parts:
+        if not p or not p.isdigit():
+            return False
+        try:
+            n = int(p)
+        except ValueError:
+            return False
+        if n < 0 or n > 255:
+            return False
+    return True
+
+
+async def _collect_prometheus_targets() -> tuple[list[dict[str, Any]], list[dict[str, str]]]:
+    """
+    從 Prometheus /api/v1/targets 發現所有被監控的 host-install service + 主機.
+
+    每個 target 建 third_party_service / host_service asset.
+    每個 unique IP 建 host asset (若尚未存在).
+    target → host 建 depends_on relationship.
+    """
+    import httpx
+    from src.core.config import settings
+
+    assets: list[dict[str, Any]] = []
+    relationships: list[dict[str, str]] = []
+    seen_hosts: set[str] = set()
+
+    url = f"{settings.PROMETHEUS_URL.rstrip('/')}/api/v1/targets"
+    try:
+        async with httpx.AsyncClient(timeout=10.0, trust_env=False) as client:
+            resp = await client.get(url, params={"state": "active"})
+            resp.raise_for_status()
+            data = resp.json()
+    except Exception as e:
+        logger.warning("prometheus_targets_fetch_failed", error=str(e))
+        return assets, relationships
+
+    for t in (data.get("data", {}) or {}).get("activeTargets", []) or []:
+        labels = t.get("labels", {}) or {}
+        instance = labels.get("instance", "")
+        job = labels.get("job", "")
+        if not instance or not job:
+            continue
+
+        # 2026-04-19 Audit 1 修: 嚴格 IPv4 判斷
+        # 原 code bug: labels.host="125" (短名) 被 "125".replace(".","").isdigit()=True 誤判 IP
+        # 修: 優先從 instance 抽 IP (IP:port 形式或純 IP 無 port),嚴格 4 段 0-255 驗證
+        # labels.host 可能是短名不可靠,只信 instance
+        instance_host = instance.split(":")[0] if ":" in instance else instance
+        host_ip = instance_host if _is_valid_ipv4(instance_host) else ""
+
+        if not host_ip:
+            # target instance 不是 IP 形式 → 建 third_party_service asset 但 host 留空
+            asset_key = f"prometheus_target/{job}/{instance}"
+            assets.append({
+                "asset_key": asset_key,
+                "asset_type": "third_party_service",
+                "host": None,
+                "namespace": None,
+                "name": f"{job}:{instance}",
+                "metadata": {
+                    "job": job,
+                    "instance": instance,
+                    "scrape_url": t.get("scrapeUrl"),
+                    "health": t.get("health"),
+                    "labels": labels,
+                },
+                "tags": [f"job:{job}", "source:prometheus_target"],
+            })
+            continue
+
+        # IP 形式 target — 用 'monitoring_target' (asset_inventory CHECK 允許列表)
+        # host_service 不在 ADR-090 asset_type CHECK 內,之前 1 筆 125:32334 scan 拋
+        # CheckViolationError (constraint asset_inventory_type_valid)
+        asset_key = f"prometheus_target/{job}/{instance}"
+        assets.append({
+            "asset_key": asset_key,
+            "asset_type": "monitoring_target",
+            "host": host_ip,
+            "namespace": None,
+            "name": f"{job}@{host_ip}",
+            "metadata": {
+                "job": job,
+                "instance": instance,
+                "scrape_url": t.get("scrapeUrl"),
+                "health": t.get("health"),
+                "labels": labels,
+            },
+            "tags": [f"job:{job}", f"host:{host_ip}", "source:prometheus_target"],
+        })
+
+        # 對每個 IP 建 host asset (若尚未)
+        if host_ip not in seen_hosts:
+            seen_hosts.add(host_ip)
+            host_key = f"host/{host_ip}"
+            assets.append({
+                "asset_key": host_key,
+                "asset_type": "host",
+                "host": host_ip,
+                "namespace": None,
+                "name": host_ip,
+                "metadata": {
+                    "discovered_by": "prometheus_targets",
+                    "source": "blackbox_icmp_or_node_exporter",
+                },
+                "tags": [f"ip:{host_ip}", "source:prometheus"],
+            })
+
+        # 建 target → host 的 depends_on relationship
+        relationships.append({
+            "from_key": asset_key,
+            "to_key": f"host/{host_ip}",
+            "relationship_type": "depends_on",
+        })
+
+    logger.info("prometheus_targets_collected", count=len(assets), hosts=len(seen_hosts))
+    return assets, relationships
+
+
+# ============================================================================
+# DB 寫入
+# ============================================================================
+
+async def _start_discovery_run(
+    triggered_by: str,
+    scope: list[str],
+    scan_depth: str,
+) -> str | None:
+    """
+    寫 asset_discovery_run header (status='running'), 回傳 run_id。
+
+    失敗 → log warning + 回 None,主流程靜默跳過本次 scan。
+    """
+    try:
+        from sqlalchemy import text as _sql
+        from src.db.base import get_db_context
+
+        async with get_db_context() as db:
+            row = await db.execute(
+                _sql("""
+                    INSERT INTO asset_discovery_run (
+                        triggered_by, scope, scan_depth, status,
+                        new_assets, modified_assets, disappeared_assets,
+                        tools_used
+                    ) VALUES (
+                        :tb, :scope, :sd, 'running',
+                        0, 0, 0,
+                        CAST(:tools AS jsonb)
+                    )
+                    RETURNING run_id
+                """),
+                {
+                    "tb": triggered_by,
+                    "scope": scope,
+                    "sd": scan_depth,
+                    "tools": _json.dumps({"k8s": "kubectl_get pods --all-namespaces"}),
+                },
+            )
+            run_id = row.scalar()
+            return str(run_id) if run_id else None
+    except Exception as e:
+        logger.warning("asset_discovery_run_start_failed", error=str(e))
+        return None
+
+
+async def _finish_discovery_run(
+    run_id: str,
+    status: str,
+    total_assets: int,
+    new_assets: int,
+    modified_assets: int,
+    duration_ms: int,
+    error: str | None,
+) -> None:
+    try:
+        from sqlalchemy import text as _sql
+        from src.db.base import get_db_context
+
+        async with get_db_context() as db:
+            await db.execute(
+                _sql("""
+                    UPDATE asset_discovery_run
+                    SET status = :st,
+                        ended_at = NOW(),
+                        total_assets = :total,
+                        new_assets = :new,
+                        modified_assets = :mod,
+                        duration_ms = :dur,
+                        error = :err
+                    WHERE run_id = CAST(:rid AS uuid)
+                """),
+                {
+                    "st": status,
+                    "total": total_assets,
+                    "new": new_assets,
+                    "mod": modified_assets,
+                    "dur": duration_ms,
+                    "err": error,
+                    "rid": run_id,
+                },
+            )
+    except Exception as e:
+        logger.warning("asset_discovery_run_finish_failed", run_id=run_id, error=str(e))
+
+
+async def _upsert_assets(
+    assets: list[dict[str, Any]],
+    run_id: str,
+) -> tuple[int, int]:
+    """
+    UPSERT asset_inventory,回傳 (new_count, modified_count)。
+
+    用 ON CONFLICT (asset_key) DO UPDATE 確保 idempotent。
+    """
+    if not assets:
+        return 0, 0
+
+    from sqlalchemy import text as _sql
+    from src.db.base import get_db_context
+
+    new_count = 0
+    modified_count = 0
+
+    try:
+        async with get_db_context() as db:
+            for a in assets:
+                # xmax = 0 表示 INSERT (新),否則表示 UPDATE (修改)
+                row = await db.execute(
+                    _sql("""
+                        INSERT INTO asset_inventory (
+                            asset_key, asset_type, host, namespace, name,
+                            metadata, tags, environment, lifecycle_state,
+                            first_seen_at, last_seen_at
+                        ) VALUES (
+                            :ak, :at, :host, :ns, :name,
+                            CAST(:md AS jsonb), :tags, 'prod', 'active',
+                            NOW(), NOW()
+                        )
+                        ON CONFLICT (asset_key) DO UPDATE
+                          SET last_seen_at = NOW(),
+                              host = EXCLUDED.host,
+                              metadata = EXCLUDED.metadata,
+                              tags = EXCLUDED.tags,
+                              lifecycle_state = 'active',
+                              updated_at = NOW(),
+                              decommissioned_at = NULL
+                        RETURNING asset_id, (xmax = 0) AS inserted
+                    """),
+                    {
+                        "ak": a["asset_key"],
+                        "at": a["asset_type"],
+                        "host": a["host"],
+                        "ns": a["namespace"],
+                        "name": a["name"],
+                        "md": _json.dumps(a["metadata"], ensure_ascii=False),
+                        "tags": a["tags"],
+                    },
+                )
+                _, inserted = row.one()
+                if inserted:
+                    new_count += 1
+                else:
+                    modified_count += 1
+    except Exception as e:
+        logger.exception("asset_upsert_failed", run_id=run_id, error=str(e))
+
+    return new_count, modified_count
+
+
+async def _upsert_relationships(relationships: list[dict[str, str]]) -> int:
+    """
+    UPSERT asset_relationship (from_asset_id/to_asset_id/relationship_type 為 UNIQUE).
+
+    用 asset_key 查 asset_id 後 INSERT,忽略 asset 不存在的 relationship.
+    回傳實際寫入 (新建/更新) 筆數.
+    """
+    if not relationships:
+        return 0
+
+    from sqlalchemy import text as _sql
+    from src.db.base import get_db_context
+
+    written = 0
+    try:
+        async with get_db_context() as db:
+            for rel in relationships:
+                try:
+                    # 用 asset_key → asset_id 解析,同時 UPSERT
+                    await db.execute(
+                        _sql("""
+                            INSERT INTO asset_relationship (
+                                from_asset_id, to_asset_id, relationship_type,
+                                first_detected_at, last_verified_at, is_active
+                            )
+                            SELECT a1.asset_id, a2.asset_id, :rt, NOW(), NOW(), true
+                            FROM asset_inventory a1, asset_inventory a2
+                            WHERE a1.asset_key = :from_key AND a2.asset_key = :to_key
+                              AND a1.asset_id <> a2.asset_id
+                            ON CONFLICT (from_asset_id, to_asset_id, relationship_type) DO UPDATE
+                              SET last_verified_at = NOW(),
+                                  is_active = true
+                        """),
+                        {
+                            "from_key": rel["from_key"],
+                            "to_key": rel["to_key"],
+                            "rt": rel["relationship_type"],
+                        },
+                    )
+                    written += 1
+                except Exception as e:
+                    logger.debug("relationship_upsert_skipped",
+                                 from_key=rel["from_key"], to_key=rel["to_key"], error=str(e))
+    except Exception as e:
+        logger.warning("relationship_upsert_failed", error=str(e))
+    return written
+
+
+async def _write_coverage_snapshots(run_id: str) -> None:
+    """
+    為本次 run 中的所有 active asset 寫 7 維 coverage_snapshot (預設 unknown)。
+
+    後續 service (rule_catalog / playbook_extractor / km_writer) 會 UPDATE 對應維度。
+    """
+    try:
+        from sqlalchemy import text as _sql
+        from src.db.base import get_db_context
+
+        async with get_db_context() as db:
+            # 一次性 INSERT: 取所有 active asset × 7 dimensions
+            await db.execute(
+                _sql("""
+                    INSERT INTO asset_coverage_snapshot (
+                        run_id, asset_id, dimension, coverage_status,
+                        evidence, detected_by
+                    )
+                    SELECT
+                        CAST(:rid AS uuid),
+                        ai.asset_id,
+                        d.dimension,
+                        'unknown' AS coverage_status,
+                        '{}'::jsonb,
+                        'asset_scanner' AS detected_by
+                    FROM asset_inventory ai
+                    CROSS JOIN (
+                        VALUES ('auto_monitoring'),('auto_alerting'),('auto_rule_creation'),
+                               ('auto_rule_matching'),('auto_playbook'),('auto_remediation'),
+                               ('auto_km_creation')
+                    ) AS d(dimension)
+                    WHERE ai.lifecycle_state = 'active'
+                    ON CONFLICT (run_id, asset_id, dimension) DO NOTHING
+                """),
+                {"rid": run_id},
+            )
+    except Exception as e:
+        logger.warning("asset_coverage_write_failed", run_id=run_id, error=str(e))
+
+
+async def _log_aol_asset_discovered(
+    run_id: str,
+    triggered_by: str,
+    total: int,
+    new: int,
+    modified: int,
+    duration_ms: int,
+    status: str,
+    error: str | None,
+) -> None:
+    """寫 automation_operation_log(asset_discovered)。失敗只 log 不阻塞。"""
+    try:
+        from sqlalchemy import text as _sql
+        from src.db.base import get_db_context
+
+        aol_status = "success" if status == "success" else "failed"
+        input_payload = {
+            "run_id": run_id,
+            "triggered_by": triggered_by,
+            "scope": ["k8s"],
+            "scan_depth": "shallow",
+        }
+        output_payload = {
+            "run_id": run_id,
+            "total_assets": total,
+            "new_assets": new,
+            "modified_assets": modified,
+        }
+
+        async with get_db_context() as db:
+            await db.execute(
+                _sql("""
+                    INSERT INTO automation_operation_log (
+                        operation_type, actor, status,
+                        input, output, run_id, duration_ms, error, tags
+                    ) VALUES (
+                        'asset_discovered',
+                        'asset_scanner',
+                        :st,
+                        CAST(:input AS jsonb),
+                        CAST(:output AS jsonb),
+                        CAST(:rid AS uuid),
+                        :dur, :err,
+                        :tags
+                    )
+                """),
+                {
+                    "st": aol_status,
+                    "input": _json.dumps(input_payload, ensure_ascii=False),
+                    "output": _json.dumps(output_payload, ensure_ascii=False),
+                    "rid": run_id,
+                    "dur": duration_ms,
+                    "err": (error or "")[:2000] if error else None,
+                    "tags": ["asset_scanner", "discovery", "k8s"],
+                },
+            )
+    except Exception as e:
+        logger.warning("asset_scanner_aol_write_failed", run_id=run_id, error=str(e))
--- a/apps/api/src/jobs/baseline_snapshot.py
+++ b/apps/api/src/jobs/baseline_snapshot.py
@@ -0,0 +1,338 @@
+"""
+AWOOOI AIOps Phase 0 — 基線快照 Job
+=====================================
+拍攝 AI 自主化飛輪「啟動前現況」，作為 Phase 0→1 進展衡量基準。
+
+快照涵蓋 ADR-080 診斷表中的 6 大指標：
+  1. MCP 呼叫次數/24h（目標：> 0；現況預估：0）
+  2. Playbook trust/confidence 分佈（目標：動態；現況：全靜態）
+  3. 學習閉環觸發率（目標：≥ 99%；現況：0%，fire-and-forget）
+  4. 告警分類 general 比例（目標：< 10%；現況：~ 41%）
+  5. 修復動作 RESTART 比例（目標：< 40%；現況：~ 68%）
+  6. 自動執行成功次數/24h（目標：> 0；現況：0）
+
+儲存策略：
+  - Redis Key `aiops:baseline:{timestamp_iso}` — 最新快照（TTL 永不過期）
+  - Redis Key `aiops:baseline:latest` — 指向最新快照的時間戳（方便 API 讀取）
+
+使用方式：
+  python -m src.jobs.baseline_snapshot          # 直接執行（一次性）
+  await take_baseline_snapshot()                 # 從程式碼呼叫
+
+ADR-080: AI 自主化飛輪總綱
+MASTER: docs/superpowers/specs/2026-04-15-MASTER-ai-autonomous-flywheel-v2.md §5 Phase 0
+
+2026-04-15 ogt + Claude Sonnet 4.6 (亞太): Phase 0 — 初始建立
+"""
+
+from __future__ import annotations
+
+import asyncio
+import json
+from datetime import timedelta
+
+import structlog
+from sqlalchemy import func, select, text
+
+from src.core.redis_client import get_redis
+from src.db.base import get_db_context
+from src.db.models import (
+    AutoRepairExecution,
+    IncidentRecord,
+    KnowledgeEntryRecord,
+)
+from src.utils.timezone import now_taipei
+
+logger = structlog.get_logger(__name__)
+
+# Redis 鍵
+BASELINE_KEY_PREFIX = "aiops:baseline:"
+BASELINE_LATEST_KEY = "aiops:baseline:latest"
+
+# Playbook Redis 前綴（同 playbook_repository.py）
+PLAYBOOK_KEY_PREFIX = "playbook:"
+
+
+async def take_baseline_snapshot() -> dict:
+    """
+    拍攝一次完整基線快照並寫入 Redis。
+
+    Returns:
+        dict: 快照內容（含 snapshot_at 時間戳）
+    """
+    now = now_taipei()
+    since_24h = now - timedelta(hours=24)
+    ts_iso = now.isoformat()
+
+    logger.info("baseline_snapshot_start", snapshot_at=ts_iso)
+
+    snapshot = {
+        "snapshot_at": ts_iso,
+        "phase": "P0",
+        "description": "AI 自主化飛輪 Phase 0 啟動前基線",
+        "metrics": {},
+    }
+
+    # ── 1. MCP 呼叫次數/24h ───────────────────────────────────────────────
+    # Phase 0 時 MCP 尚未接入任何決策流程 → 預期為 0
+    # Phase 1 完成後此數字應 > 0（PreDecisionInvestigator 開始呼叫）
+    mcp_calls_24h = await _count_mcp_calls_24h(since_24h)
+    snapshot["metrics"]["mcp_calls_24h"] = mcp_calls_24h
+
+    # ── 2. Playbook confidence 分佈（Redis 掃描）──────────────────────────
+    playbook_stats = await _playbook_confidence_stats()
+    snapshot["metrics"]["playbook"] = playbook_stats
+
+    # ── 3. 學習閉環觸發率 + 其他 DB 指標 ─────────────────────────────────
+    db_metrics = await _db_metrics(since_24h)
+    snapshot["metrics"].update(db_metrics)
+
+    # ── 4. 計算衍生指標 ───────────────────────────────────────────────────
+    snapshot["metrics"]["learning_loop_rate"] = _compute_learning_rate(
+        db_metrics.get("auto_repair_24h", 0),
+        db_metrics.get("learning_writes_24h", 0),
+    )
+
+    # ── 寫入 Redis ─────────────────────────────────────────────────────────
+    await _persist_to_redis(ts_iso, snapshot)
+
+    logger.info(
+        "baseline_snapshot_done",
+        snapshot_at=ts_iso,
+        mcp_calls_24h=mcp_calls_24h,
+        playbook_total=playbook_stats.get("total", 0),
+        incidents_24h=db_metrics.get("incidents_24h", 0),
+        auto_repair_success_24h=db_metrics.get("auto_repair_success_24h", 0),
+    )
+
+    return snapshot
+
+
+# ─────────────────────────────────────────────────────────────────────────────
+# Internal helpers
+# ─────────────────────────────────────────────────────────────────────────────
+
+
+async def _count_mcp_calls_24h(since_24h) -> int:
+    """
+    MCP 呼叫次數/24h。
+
+    Phase 0：無 MCP Calls Table → 從 audit_logs 嘗試計數。
+    Phase 1 建立 PreDecisionInvestigator 後，此處改為查 mcp_tool_calls 表。
+    """
+    try:
+        async with get_db_context() as db:
+            # audit_logs 中 action='mcp_call' — Phase 0 預期 0 筆
+            result = await db.execute(
+                text(
+                    "SELECT COUNT(*) FROM audit_logs "
+                    "WHERE action = 'mcp_call' AND created_at >= :since"
+                ),
+                {"since": since_24h},
+            )
+            return result.scalar_one_or_none() or 0
+    except Exception:
+        logger.exception("baseline_mcp_count_error")
+        return 0
+
+
+async def _playbook_confidence_stats() -> dict:
+    """
+    掃描 Redis 中全部 Playbook，統計 ai_confidence 分佈。
+
+    指標診斷：
+    - avg_confidence ≈ 0.3 → 佐證「全靜態」現況（Phase 0 基線）
+    - Phase 3 EWMA 上線後，此值應動態分散（std_dev 升高、avg 可能提升）
+    """
+    stats = {
+        "total": 0,
+        "approved": 0,
+        "avg_confidence": 0.0,
+        "min_confidence": None,
+        "max_confidence": None,
+        "never_used": 0,       # success_count + failure_count == 0
+        "action_type_dist": {},
+    }
+
+    try:
+        redis = get_redis()
+        confidences: list[float] = []
+        action_counts: dict[str, int] = {}
+
+        async for key in redis.scan_iter(match=f"{PLAYBOOK_KEY_PREFIX}PB-*", count=200):
+            raw = await redis.get(key)
+            if not raw:
+                continue
+            try:
+                pb = json.loads(raw)
+            except json.JSONDecodeError:
+                continue
+
+            stats["total"] += 1
+
+            if pb.get("status") == "approved":
+                stats["approved"] += 1
+
+            conf = pb.get("ai_confidence", 0.0) or 0.0
+            confidences.append(conf)
+
+            used = (pb.get("success_count", 0) or 0) + (pb.get("failure_count", 0) or 0)
+            if used == 0:
+                stats["never_used"] += 1
+
+            # 統計 repair_steps 中首個 action_type（代表主要修復動作）
+            steps = pb.get("repair_steps", [])
+            if steps:
+                first_action = steps[0].get("action_type", "unknown")
+                action_counts[first_action] = action_counts.get(first_action, 0) + 1
+
+        if confidences:
+            stats["avg_confidence"] = round(sum(confidences) / len(confidences), 4)
+            stats["min_confidence"] = round(min(confidences), 4)
+            stats["max_confidence"] = round(max(confidences), 4)
+
+        # RESTART 比例：佐證 ADR-080 診斷（目標 < 40%）
+        total_actions = sum(action_counts.values())
+        restart_count = action_counts.get("restart_service", 0)
+        stats["restart_ratio"] = round(restart_count / total_actions, 4) if total_actions else 0.0
+        stats["action_type_dist"] = action_counts
+
+    except Exception:
+        logger.exception("baseline_playbook_stats_error")
+
+    return stats
+
+
+async def _db_metrics(since_24h) -> dict:
+    """
+    從 PostgreSQL 取得核心計數指標。
+    """
+    metrics: dict = {
+        "incidents_24h": 0,
+        "incidents_total": 0,
+        "general_alert_ratio": 0.0,
+        "auto_repair_24h": 0,
+        "auto_repair_success_24h": 0,
+        "km_total": 0,
+        "km_vectorized": 0,
+        "learning_writes_24h": 0,
+        "audit_logs_24h": 0,
+    }
+
+    try:
+        async with get_db_context() as db:
+            # Incident 數量（24h + 總計）
+            r = await db.execute(
+                select(func.count(IncidentRecord.incident_id)).where(
+                    IncidentRecord.created_at >= since_24h
+                )
+            )
+            metrics["incidents_24h"] = r.scalar_one_or_none() or 0
+
+            r = await db.execute(select(func.count(IncidentRecord.incident_id)))
+            metrics["incidents_total"] = r.scalar_one_or_none() or 0
+
+            # general 告警比例（alert_category = 'general'）
+            r = await db.execute(
+                select(func.count()).where(
+                    IncidentRecord.alert_category == "general"
+                )
+            )
+            general_count = r.scalar_one_or_none() or 0
+            total = metrics["incidents_total"]
+            metrics["general_alert_ratio"] = round(general_count / total, 4) if total else 0.0
+
+            # 自動修復執行（24h）
+            r = await db.execute(
+                select(func.count(AutoRepairExecution.id)).where(
+                    AutoRepairExecution.created_at >= since_24h
+                )
+            )
+            metrics["auto_repair_24h"] = r.scalar_one_or_none() or 0
+
+            r = await db.execute(
+                select(func.count(AutoRepairExecution.id)).where(
+                    AutoRepairExecution.created_at >= since_24h,
+                    AutoRepairExecution.success.is_(True),
+                )
+            )
+            metrics["auto_repair_success_24h"] = r.scalar_one_or_none() or 0
+
+            # KM 數量 + 向量化率
+            r = await db.execute(select(func.count(KnowledgeEntryRecord.id)))
+            metrics["km_total"] = r.scalar_one_or_none() or 0
+
+            r = await db.execute(
+                select(func.count()).where(
+                    KnowledgeEntryRecord.embedding.is_not(None)
+                )
+            )
+            metrics["km_vectorized"] = r.scalar_one_or_none() or 0
+
+            # 學習寫入數（24h 內新增 KM）
+            r = await db.execute(
+                select(func.count()).where(
+                    KnowledgeEntryRecord.created_at >= since_24h
+                )
+            )
+            metrics["learning_writes_24h"] = r.scalar_one_or_none() or 0
+
+            # audit_logs 24h 計數（Phase 0 預期 = 0）
+            r = await db.execute(
+                text(
+                    "SELECT COUNT(*) FROM audit_logs WHERE created_at >= :since"
+                ),
+                {"since": since_24h},
+            )
+            metrics["audit_logs_24h"] = r.scalar_one_or_none() or 0
+
+    except Exception:
+        logger.exception("baseline_db_metrics_error")
+
+    return metrics
+
+
+def _compute_learning_rate(auto_repair_24h: int, learning_writes_24h: int) -> float:
+    """
+    學習閉環觸發率 = learning_writes_24h / auto_repair_24h。
+
+    Phase 0 診斷：fire-and-forget → 比率為 0%（即使 auto_repair > 0，learning 也可能 = 0）
+    Phase 3 修復後目標：≥ 99%
+    """
+    if auto_repair_24h == 0:
+        return 0.0
+    return round(min(learning_writes_24h / auto_repair_24h, 1.0), 4)
+
+
+async def _persist_to_redis(ts_iso: str, snapshot: dict) -> None:
+    """
+    將快照寫入 Redis：
+    - `aiops:baseline:{ts_iso}` — 歷史記錄（永不過期）
+    - `aiops:baseline:latest` — 最新快照全量（永不過期）
+    """
+    try:
+        redis = get_redis()
+        payload = json.dumps(snapshot, ensure_ascii=False)
+
+        # 歷史記錄（保留全部 snapshot）
+        await redis.set(f"{BASELINE_KEY_PREFIX}{ts_iso}", payload)
+
+        # 最新快照（供 API 快速讀取）
+        await redis.set(BASELINE_LATEST_KEY, payload)
+
+        logger.info("baseline_snapshot_persisted", key=BASELINE_LATEST_KEY)
+    except Exception:
+        logger.exception("baseline_persist_error")
+
+
+# ─────────────────────────────────────────────────────────────────────────────
+# Entry point（直接執行）
+# ─────────────────────────────────────────────────────────────────────────────
+
+async def _main() -> None:
+    snapshot = await take_baseline_snapshot()
+    print(json.dumps(snapshot, indent=2, ensure_ascii=False))
+
+
+if __name__ == "__main__":
+    asyncio.run(_main())
--- a/apps/api/src/jobs/capacity_forecaster_job.py
+++ b/apps/api/src/jobs/capacity_forecaster_job.py
@@ -0,0 +1,404 @@
+"""
+Capacity Forecaster Job — Phase 4 AI 容量預測 MVP
+=================================================
+每日 05:00 Taipei 用 Prometheus predict_linear 預測未來 7 天容量趨勢,
+推 Telegram 建議 + 寫 aol(capacity_recommendation).
+
+職責邊界 (MVP):
+  ✅ 用 predict_linear (Prometheus 內建 linear regression) 預測:
+     - 7d 後 disk avail < 0 (磁碟將滿)
+     - 7d 後 mem available < 10% (記憶體緊繃)
+     - 7d 後 cpu 使用率 > 85% (CPU 飽和)
+  ✅ 對每個高風險 host 寫 aol(capacity_recommendation)
+  ✅ 彙總推 Telegram SRE group
+  ⏳ TODO: 真正 Holt-Winters (季節性) — Prometheus 不支援,需外接 Python statsmodels
+  ⏳ TODO: 根據業務週期 (週一高峰/週末低谷) 調整預測
+
+預測方法論:
+  Prometheus predict_linear(metric[7d], 86400*N) 回傳「基於過去 7d,未來 N 秒後的預測值」
+  簡單但有效 — 線性外推,適合穩定增長/下降趨勢
+
+統帥鐵律對齊:
+  - AI 預測 + 推建議,不自動 scale up (人工決策擴容)
+  - 7d window 保證有足夠樣本 (Prometheus retention 15d 夠)
+  - 閾值 (avail < 0, mem < 10%) 是「觸發討論」非「最終決策」
+
+排程:
+  - 首次延遲 540s (其他 scanner 都跑完後)
+  - 每日 05:00 Taipei (capacity_scanner 02:00 → compliance 03:00 → Hermes 04:00 → 預測 05:00)
+
+2026-04-19 ogt + Claude Opus 4.7 (1M context) Asia/Taipei
+ADR-090 § Phase 4 AI 容量預測
+"""
+from __future__ import annotations
+
+import asyncio
+import json as _json
+import time as _time
+from datetime import datetime, timedelta, timezone
+from typing import Any
+
+import httpx
+import structlog
+
+from src.core.config import settings
+
+logger = structlog.get_logger(__name__)
+
+_FIRST_DELAY_SEC = 540
+_LOOP_BACKOFF_SEC = 1800
+_DAILY_TRIGGER_HOUR_TAIPEI = 5
+_HTTP_TIMEOUT_SEC = 15
+
+# 預測視窗 (7d) 與 horizon (未來 7d)
+_WINDOW = "7d"
+_HORIZON_SEC = 7 * 86400  # 7 天
+
+# predict_linear 查詢定義
+# 回傳高風險 host 的 instance label + 預測值
+_FORECAST_QUERIES = {
+    "disk_saturation_7d": (
+        # 7d 後根目錄 avail 預測為 0 或負 = 磁碟會滿
+        f'predict_linear(node_filesystem_avail_bytes{{fstype!~"tmpfs|overlay", mountpoint="/"}}[{_WINDOW}], {_HORIZON_SEC}) < 0',
+        "disk 預測 7 天內會滿",
+    ),
+    "mem_saturation_7d": (
+        # 7d 後記憶體可用 < 10%
+        f'predict_linear(node_memory_MemAvailable_bytes[{_WINDOW}], {_HORIZON_SEC}) '
+        f'/ node_memory_MemTotal_bytes < 0.1',
+        "mem 預測 7 天內可用量 < 10%",
+    ),
+    "cpu_high_7d_trend": (
+        # 過去 7d 平均 cpu 已 > 70% + 上升趨勢
+        f'avg_over_time((100 - (avg by(instance)(rate(node_cpu_seconds_total{{mode="idle"}}[5m])) * 100))[{_WINDOW}:15m]) > 70',
+        "過去 7d cpu 平均 > 70%",
+    ),
+}
+
+
+async def run_capacity_forecaster_loop() -> None:
+    """每日 05:00 Taipei 容量預測."""
+    logger.info("capacity_forecaster_loop_started")
+    await asyncio.sleep(_FIRST_DELAY_SEC)
+
+    while True:
+        try:
+            await forecast_once()
+        except Exception as e:
+            logger.exception("capacity_forecaster_loop_error", error=str(e))
+            await asyncio.sleep(_LOOP_BACKOFF_SEC)
+            continue
+
+        sleep_sec = _seconds_until_next_trigger()
+        logger.info("capacity_forecaster_next_tick", sleep_sec=sleep_sec)
+        await asyncio.sleep(sleep_sec)
+
+
+async def forecast_once() -> dict[str, Any]:
+    """跑一次預測,對每個高風險 host 留痕 + LLM 分析 + 推 Telegram.
+
+    2026-04-19 P0 修 (統帥截圖反饋): 加 leader_lock 避免多 Pod 重複推.
+    """
+    from src.services.ai_advisory_helpers import try_acquire_daily_lock
+
+    # Leader lock: 只 leader Pod 跑,其他 skip
+    if not await try_acquire_daily_lock("capacity_forecaster"):
+        logger.info("capacity_forecast_skipped_not_leader")
+        return {"skipped": "not_leader"}
+
+    started_ms = _time.time()
+    stats: dict[str, Any] = {
+        "queries_run": 0, "high_risk_hosts": 0, "recommendations": 0, "llm_analyzed": 0,
+    }
+    risks: dict[str, list[dict[str, Any]]] = {}
+    llm_analyses: dict[str, dict[str, Any]] = {}
+    error_msg: str | None = None
+
+    try:
+        for query_name, (promql, reason) in _FORECAST_QUERIES.items():
+            hits = await _run_prom_query(promql)
+            stats["queries_run"] += 1
+            for host, value in hits.items():
+                risks.setdefault(host, []).append({
+                    "query": query_name,
+                    "value": value,
+                    "reason": reason,
+                })
+
+        stats["high_risk_hosts"] = len(risks)
+
+        # v2 Gap 3 LLM 升級: 對每個高風險 host 跑 LLM 分析產具體建議
+        # (原 _derive_actions 是硬編 keyword mapping, LLM 能看完整 context 產客製建議)
+        for host, findings in risks.items():
+            analysis = await _llm_analyze_risk(host, findings)
+            if analysis:
+                llm_analyses[host] = analysis
+                stats["llm_analyzed"] += 1
+
+        for host, findings in risks.items():
+            ok = await _write_recommendation_aol(host, findings, llm_analyses.get(host))
+            if ok:
+                stats["recommendations"] += 1
+
+        if risks:
+            await _send_telegram_forecast(risks, llm_analyses)
+
+    except Exception as e:
+        error_msg = f"{type(e).__name__}: {e}"[:1000]
+        logger.exception("capacity_forecast_once_failed", error=error_msg)
+
+    duration_ms = int((_time.time() - started_ms) * 1000)
+    logger.info(
+        "capacity_forecast_once_done",
+        queries=stats["queries_run"],
+        hosts=stats["high_risk_hosts"],
+        recommendations=stats["recommendations"],
+        llm_analyzed=stats["llm_analyzed"],
+        duration_ms=duration_ms,
+    )
+    return stats
+
+
+# ============================================================================
+# v2 Gap 3 LLM 分析 — 統帥鐵律「朝 AI 自主化方向」
+# ============================================================================
+
+_LLM_FORECAST_PROMPT = """你是 AWOOOI 容量規劃專家。以下 host 過去 7 天趨勢顯示高風險,請分析真因並給具體可執行建議。
+
+## Host
+{host}
+
+## Prometheus 預測命中
+{findings_json}
+
+## 當前主機環境資訊
+  - 主機架構: 110 (Harbor/Gitea/監控), 112 (Security), 120/121 (K3s), 125 (K3s backup), 188 (PG/Redis/Ollama/MinIO)
+  - 判斷請考慮: 該主機上跑什麼服務、常見瓶頸模式
+
+## 輸出規格 (必須是合法 JSON,純 JSON 無前後文字)
+{{
+  "root_causes": ["3 個候選真因,繁中"],
+  "priority_actions": [
+    {{"priority": "high|medium|low", "action": "具體動作 (繁中)", "command_hint": "可執行指令 hint"}}
+  ],
+  "urgency_days": 0-30,
+  "confidence": 0.0-1.0
+}}
+
+## 分析方向 (不要寫死 hardcoded reason)
+  - disk_saturation: 查是哪類檔案增長 (container images / PG WAL / 日誌 / build cache)
+  - mem: 查哪個 process 佔最多 (JVM / Redis / cache thrashing)
+  - cpu: 看是 runtime 壓力還是 cron / batch job
+"""
+
+
+async def _llm_analyze_risk(host: str, findings: list[dict[str, Any]]) -> dict[str, Any] | None:
+    """用 OpenClaw 分析高風險 host. 失敗回 None 不阻塞.
+
+    2026-04-19 P1.2 重構: 改用 llm_json_parser.parse_llm_json_response 共用 helper.
+    """
+    try:
+        import json as _j
+        from src.services.llm_json_parser import parse_llm_json_response
+        from src.services.openclaw import get_openclaw
+
+        prompt = _LLM_FORECAST_PROMPT.format(
+            host=host,
+            findings_json=_j.dumps(findings, ensure_ascii=False, indent=2),
+        )
+        openclaw = get_openclaw()
+        text, provider, success = await openclaw.call(prompt)
+        if not success or not text:
+            return None
+
+        parsed = parse_llm_json_response(
+            text,
+            required_key="priority_actions",
+            logger_context=f"forecaster:{host}",
+        )
+        if parsed:
+            parsed["_llm_provider"] = provider
+        return parsed
+    except Exception as e:
+        logger.warning("forecast_llm_error", host=host, error=str(e))
+        return None
+
+
+async def _run_prom_query(promql: str) -> dict[str, float]:
+    """跑 Prometheus instant query, 回傳 {host: value}."""
+    url = f"{settings.PROMETHEUS_URL.rstrip('/')}/api/v1/query"
+    try:
+        async with httpx.AsyncClient(timeout=_HTTP_TIMEOUT_SEC, trust_env=False) as client:
+            resp = await client.get(url, params={"query": promql})
+            resp.raise_for_status()
+            data = resp.json()
+        if data.get("status") != "success":
+            return {}
+        result: dict[str, float] = {}
+        for r in (data.get("data", {}) or {}).get("result", []) or []:
+            instance = (r.get("metric", {}) or {}).get("instance", "")
+            host = instance.split(":")[0] if instance else "unknown"
+            val = r.get("value", [None, None])
+            if val and len(val) >= 2:
+                try:
+                    result[host] = float(val[1])
+                except (ValueError, TypeError):
+                    pass
+        return result
+    except Exception as e:
+        logger.warning("prom_forecast_query_failed", promql=promql[:80], error=str(e))
+        return {}
+
+
+async def _write_recommendation_aol(
+    host: str,
+    findings: list[dict[str, Any]],
+    llm_analysis: dict[str, Any] | None = None,
+) -> bool:
+    """寫 aol(capacity_recommendation) + LLM 分析結果."""
+    try:
+        from sqlalchemy import text as _sql
+        from src.db.base import get_db_context
+
+        input_payload = {"host": host, "forecast_horizon_days": 7, "findings_count": len(findings)}
+        output_payload: dict[str, Any] = {
+            "host": host,
+            "findings": findings,
+            "proposed_actions": _derive_actions(findings),
+            "requires_human_decision": True,
+        }
+        if llm_analysis:
+            output_payload["llm_analysis"] = llm_analysis
+
+        async with get_db_context() as db:
+            await db.execute(
+                _sql("""
+                    INSERT INTO automation_operation_log (
+                        operation_type, actor, status,
+                        input, output, tags
+                    ) VALUES (
+                        'capacity_recommendation',
+                        'capacity_forecaster',
+                        'success',
+                        CAST(:input AS jsonb),
+                        CAST(:output AS jsonb),
+                        :tags
+                    )
+                """),
+                {
+                    "input": _json.dumps(input_payload, ensure_ascii=False),
+                    "output": _json.dumps(output_payload, ensure_ascii=False),
+                    "tags": ["capacity", "forecast", "phase4", "predict_linear"],
+                },
+            )
+        return True
+    except Exception as e:
+        logger.warning("capacity_forecast_aol_failed", host=host, error=str(e))
+        return False
+
+
+def _derive_actions(findings: list[dict[str, Any]]) -> list[str]:
+    """根據 findings 產生建議動作清單."""
+    actions: list[str] = []
+    queries = {f["query"] for f in findings}
+    if "disk_saturation_7d" in queries:
+        actions.append("清理 /var/log, /var/lib/docker, PG WAL archive;或擴容磁碟")
+    if "mem_saturation_7d" in queries:
+        actions.append("檢查 top mem consumer;考慮加記憶體或調整 JVM/Redis maxmemory")
+    if "cpu_high_7d_trend" in queries:
+        actions.append("分析 top CPU process;考慮擴充 vCPU 或 scale out")
+    if not actions:
+        actions.append("人工審查各指標")
+    return actions
+
+
+async def _send_telegram_forecast(
+    risks: dict[str, list[dict[str, Any]]],
+    llm_analyses: dict[str, dict[str, Any]] | None = None,
+) -> bool:
+    """推 Telegram 預測摘要 (含 LLM 分析 + 互動按鈕).
+
+    2026-04-19 P0 修 (統帥截圖反饋): 加 snooze check + inline_keyboard 4 按鈕
+    (✅ 已處理 / 😴 忽略 24h / 🔍 查看詳情 / 📋 產 kubectl 指令).
+    """
+    try:
+        import html
+        from src.services.ai_advisory_helpers import build_ai_advisory_keyboard, is_snoozed
+        from src.services.telegram_gateway import get_telegram_gateway
+
+        if not settings.OPENCLAW_TG_CHAT_ID:
+            return False
+
+        # Snooze check: 過濾掉被人工 snooze 的 host (按「忽略 24h」後)
+        active_risks = {}
+        skipped_hosts: list[str] = []
+        for host, findings in risks.items():
+            if await is_snoozed("capacity_forecast", host):
+                skipped_hosts.append(host)
+                continue
+            active_risks[host] = findings
+
+        if not active_risks:
+            logger.info("capacity_forecast_all_snoozed", total=len(risks))
+            return False
+
+        llm_analyses = llm_analyses or {}
+        lines = [
+            "📈 <b>容量預測 (Phase 4 AI 升級版)</b>",
+            f"未來 7 天高風險 host: {len(active_risks)} 台"
+            + (f" (含 {len(skipped_hosts)} 台已忽略)" if skipped_hosts else ""),
+            "",
+        ]
+        for host, findings in list(active_risks.items())[:8]:
+            lines.append(f"🟡 <code>{html.escape(host)}</code>")
+            for f in findings[:3]:
+                lines.append(f"   ▸ {html.escape(f['reason'])} (value={f['value']:.2f})")
+
+            ai = llm_analyses.get(host)
+            if ai:
+                urgency = ai.get("urgency_days", "?")
+                conf = ai.get("confidence", 0.0)
+                lines.append(f"   🤖 AI 判定: 緊急 {urgency}d, 信心 {conf:.0%}")
+                for act in (ai.get("priority_actions") or [])[:2]:
+                    pri = act.get("priority", "")
+                    detail = html.escape(str(act.get("action", ""))[:100])
+                    lines.append(f"   ▸ [{pri}] {detail}")
+            else:
+                actions = _derive_actions(findings)
+                if actions:
+                    lines.append(f"   建議: {html.escape(actions[0])[:100]}")
+            lines.append("")
+
+        # advisory_id 用第一個 host (snooze / aol 對應用)
+        primary_host = next(iter(active_risks.keys()))
+        keyboard = build_ai_advisory_keyboard(
+            advisory_type="capacity_forecast",
+            advisory_id=primary_host,
+            include_view=True,
+            include_produce_cmd=True,
+        )
+
+        msg = "\n".join(lines)
+
+        tg = get_telegram_gateway()
+        await tg._send_request("sendMessage", {  # type: ignore[attr-defined]
+            "chat_id": settings.OPENCLAW_TG_CHAT_ID,
+            "text": msg,
+            "parse_mode": "HTML",
+            "disable_web_page_preview": True,
+            "reply_markup": keyboard,
+        })
+        return True
+    except Exception as e:
+        logger.warning("capacity_forecast_telegram_failed", error=str(e))
+        return False
+
+
+def _seconds_until_next_trigger() -> float:
+    tz_taipei = timezone(timedelta(hours=8))
+    now = datetime.now(tz_taipei)
+    today_trigger = now.replace(hour=_DAILY_TRIGGER_HOUR_TAIPEI, minute=0, second=0, microsecond=0)
+    if now >= today_trigger:
+        today_trigger = today_trigger + timedelta(days=1)
+    delta = (today_trigger - now).total_seconds()
+    return max(300.0, min(delta, 25 * 3600))
--- a/apps/api/src/jobs/capacity_scanner_job.py
+++ b/apps/api/src/jobs/capacity_scanner_job.py
@@ -0,0 +1,339 @@
+"""
+Capacity Scanner Job — ADR-090 § Phase 4 NemoTron 容量巡檢 MVP
+===============================================================
+每日 02:00 Taipei 從 Prometheus 撈 node metrics → 寫 host_capacity_snapshot.
+
+職責邊界:
+  ✅ 撈 Prometheus node_exporter metrics (load / cpu / mem / swap)
+  ✅ 為每個 host 寫一筆 host_capacity_snapshot + heuristic ai_verdict
+  ✅ 超過硬閾值寫 capacity_violation_event
+  ✅ 寫 automation_operation_log(capacity_recommendation)
+  ❌ 不做 Holt-Winters 預測 (那是 Hermes 後續階段)
+  ❌ 不自動執行修復 (只 recommend,統帥決策)
+
+設計鐵律:
+  - 每日一次 snapshot (歷史 30d 供 AI 趨勢分析)
+  - ai_verdict heuristic: cpu>80 or mem>85 → critical; >60/70 → warning; else safe
+  - Prometheus 失敗 → log + skip 該 host,不 crash 整 loop
+
+資料來源:
+  - PROMETHEUS_URL/api/v1/query (instant query)
+  - 預期 instance 格式: '192.168.0.XXX:9100' 或 hostname
+
+排程:
+  - 首次延遲 120s
+  - 後續每日 02:00 (Taipei) 對齊跑
+
+2026-04-19 ogt + Claude Opus 4.7 (1M context) Asia/Taipei
+ADR-090 § Phase 4
+"""
+from __future__ import annotations
+
+import asyncio
+import json as _json
+import time as _time
+from datetime import datetime, timedelta, timezone
+from typing import Any
+
+import httpx
+import structlog
+
+from src.core.config import settings
+
+logger = structlog.get_logger(__name__)
+
+# ============================================================================
+# 排程 / 閾值
+# ============================================================================
+_FIRST_DELAY_SEC = 120
+_HTTP_TIMEOUT_SEC = 10
+_LOOP_BACKOFF_SEC = 1800
+
+# Taipei = UTC+8,每日 02:00 Taipei = 18:00 UTC 前一天
+_DAILY_TRIGGER_HOUR_TAIPEI = 2
+
+# Heuristic 閾值 (ai_verdict 計算)
+_CPU_CRITICAL = 80.0
+_CPU_WARNING = 60.0
+_MEM_CRITICAL = 85.0
+_MEM_WARNING = 70.0
+_SWAP_CRITICAL = 50.0
+_LOAD1_CRITICAL_RATIO = 2.0  # load1 > 2x CPU cores = critical
+
+# Prometheus 查詢 (instant query,每筆 host 一個 label)
+_PROM_QUERIES = {
+    "load1": "avg by(instance) (node_load1)",
+    "load5": "avg by(instance) (node_load5)",
+    "load15": "avg by(instance) (node_load15)",
+    "cpu_used_pct": "100 - (avg by(instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)",
+    "cpu_iowait_pct": "avg by(instance) (rate(node_cpu_seconds_total{mode=\"iowait\"}[5m])) * 100",
+    "mem_used_pct": "(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100",
+    "swap_used_pct": "(node_memory_SwapTotal_bytes - node_memory_SwapFree_bytes) / (node_memory_SwapTotal_bytes > 0 or vector(1)) * 100",
+}
+
+
+# ============================================================================
+# Public entry — main.py lifespan 呼叫
+# ============================================================================
+
+async def run_capacity_scanner_loop() -> None:
+    """每日 02:00 Taipei 跑一次容量巡檢."""
+    logger.info("capacity_scanner_loop_started")
+    await asyncio.sleep(_FIRST_DELAY_SEC)
+
+    while True:
+        try:
+            await scan_once()
+        except Exception as e:
+            logger.exception("capacity_scanner_loop_error", error=str(e))
+            await asyncio.sleep(_LOOP_BACKOFF_SEC)
+            continue
+
+        # 算下次 02:00 Taipei 的 sleep 秒數
+        sleep_sec = _seconds_until_next_trigger()
+        logger.info("capacity_scanner_next_tick", sleep_sec=sleep_sec)
+        await asyncio.sleep(sleep_sec)
+
+
+async def scan_once(triggered_by: str = "cron") -> dict[str, int]:
+    """執行一次容量巡檢,每 host 寫一筆 snapshot."""
+    started_ms = _time.time()
+    stats = {"hosts_scanned": 0, "violations": 0}
+    error_msg: str | None = None
+
+    try:
+        metrics_by_host = await _fetch_all_metrics()
+        for host, m in metrics_by_host.items():
+            snapshot_id = await _write_snapshot(host, m)
+            if snapshot_id:
+                stats["hosts_scanned"] += 1
+                viol = await _check_and_write_violations(host, m)
+                stats["violations"] += viol
+    except Exception as e:
+        error_msg = f"{type(e).__name__}: {e}"[:1000]
+        logger.exception("capacity_scan_once_failed", error=error_msg)
+
+    duration_ms = int((_time.time() - started_ms) * 1000)
+    await _log_aol(stats=stats, duration_ms=duration_ms, triggered_by=triggered_by, error=error_msg)
+
+    logger.info(
+        "capacity_scan_once_done",
+        hosts=stats["hosts_scanned"],
+        violations=stats["violations"],
+        duration_ms=duration_ms,
+    )
+    return stats
+
+
+# ============================================================================
+# Prometheus 撈資料
+# ============================================================================
+
+async def _fetch_all_metrics() -> dict[str, dict[str, float]]:
+    """
+    對每個 _PROM_QUERIES 跑 instant query,回傳 {host: {metric: value}}.
+
+    host 來自 query 結果 label 'instance' 的 IP 前綴 (去掉 :9100).
+    """
+    url = f"{settings.PROMETHEUS_URL.rstrip('/')}/api/v1/query"
+    results: dict[str, dict[str, float]] = {}
+
+    async with httpx.AsyncClient(timeout=_HTTP_TIMEOUT_SEC, trust_env=False) as client:
+        for metric_name, promql in _PROM_QUERIES.items():
+            try:
+                resp = await client.get(url, params={"query": promql})
+                resp.raise_for_status()
+                data = resp.json()
+                if data.get("status") != "success":
+                    logger.warning("prom_query_non_success", metric=metric_name)
+                    continue
+                for r in (data.get("data", {}) or {}).get("result", []) or []:
+                    instance = (r.get("metric", {}) or {}).get("instance", "")
+                    host = instance.split(":")[0] if instance else "unknown"
+                    val = r.get("value", [None, None])
+                    if val and len(val) >= 2:
+                        try:
+                            results.setdefault(host, {})[metric_name] = float(val[1])
+                        except (ValueError, TypeError):
+                            pass
+            except Exception as e:
+                logger.warning("prom_query_failed", metric=metric_name, error=str(e))
+                continue
+    return results
+
+
+# ============================================================================
+# DB 寫入
+# ============================================================================
+
+async def _write_snapshot(host: str, m: dict[str, float]) -> int | None:
+    """寫 host_capacity_snapshot,回傳 snapshot_id."""
+    if not host or host == "unknown":
+        return None
+
+    verdict, reasoning = _assess_verdict(m)
+
+    try:
+        from sqlalchemy import text as _sql
+        from src.db.base import get_db_context
+
+        async with get_db_context() as db:
+            row = await db.execute(
+                _sql("""
+                    INSERT INTO host_capacity_snapshot (
+                        host, captured_at,
+                        load1, load5, load15,
+                        cpu_used_pct, cpu_iowait_pct,
+                        mem_used_pct, swap_used_pct,
+                        ai_verdict, ai_reasoning,
+                        written_by_agent
+                    ) VALUES (
+                        :host, NOW(),
+                        :l1, :l5, :l15,
+                        :cpu, :iowait,
+                        :mem, :swap,
+                        :verdict, :reason,
+                        'capacity_scanner'
+                    )
+                    RETURNING snapshot_id
+                """),
+                {
+                    "host": host,
+                    "l1": m.get("load1"),
+                    "l5": m.get("load5"),
+                    "l15": m.get("load15"),
+                    "cpu": m.get("cpu_used_pct"),
+                    "iowait": m.get("cpu_iowait_pct"),
+                    "mem": m.get("mem_used_pct"),
+                    "swap": m.get("swap_used_pct"),
+                    "verdict": verdict,
+                    "reason": reasoning[:500],
+                },
+            )
+            sid = row.scalar()
+            return int(sid) if sid else None
+    except Exception as e:
+        logger.warning("capacity_snapshot_write_failed", host=host, error=str(e))
+        return None
+
+
+async def _check_and_write_violations(host: str, m: dict[str, float]) -> int:
+    """超過硬閾值時寫 capacity_violation_event,回傳新增筆數."""
+    from sqlalchemy import text as _sql
+    from src.db.base import get_db_context
+
+    violations: list[tuple[str, float, float]] = []
+    cpu = m.get("cpu_used_pct")
+    mem = m.get("mem_used_pct")
+    swap = m.get("swap_used_pct")
+    if cpu is not None and cpu > _CPU_CRITICAL:
+        violations.append(("cpu_over_threshold", _CPU_CRITICAL, cpu))
+    if mem is not None and mem > _MEM_CRITICAL:
+        violations.append(("mem_over_threshold", _MEM_CRITICAL, mem))
+    if swap is not None and swap > _SWAP_CRITICAL:
+        violations.append(("swap_over_threshold", _SWAP_CRITICAL, swap))
+
+    if not violations:
+        return 0
+
+    written = 0
+    try:
+        async with get_db_context() as db:
+            for vtype, threshold, actual in violations:
+                await db.execute(
+                    _sql("""
+                        INSERT INTO capacity_violation_event (
+                            host, violation_type, threshold, actual_value,
+                            detected_at
+                        ) VALUES (
+                            :host, :vt, :th, :av,
+                            NOW()
+                        )
+                    """),
+                    {"host": host, "vt": vtype, "th": threshold, "av": actual},
+                )
+                written += 1
+    except Exception as e:
+        logger.warning("capacity_violation_write_failed", host=host, error=str(e))
+    return written
+
+
+async def _log_aol(stats: dict[str, int], duration_ms: int, triggered_by: str, error: str | None) -> None:
+    """寫 aol(capacity_recommendation)."""
+    try:
+        from sqlalchemy import text as _sql
+        from src.db.base import get_db_context
+
+        aol_status = "failed" if error else "success"
+        input_payload = {"triggered_by": triggered_by, "source": "prometheus_node_exporter"}
+        output_payload = dict(stats)
+
+        async with get_db_context() as db:
+            await db.execute(
+                _sql("""
+                    INSERT INTO automation_operation_log (
+                        operation_type, actor, status,
+                        input, output, duration_ms, error, tags
+                    ) VALUES (
+                        'capacity_recommendation',
+                        'capacity_scanner',
+                        :st,
+                        CAST(:input AS jsonb),
+                        CAST(:output AS jsonb),
+                        :dur, :err, :tags
+                    )
+                """),
+                {
+                    "st": aol_status,
+                    "input": _json.dumps(input_payload, ensure_ascii=False),
+                    "output": _json.dumps(output_payload, ensure_ascii=False),
+                    "dur": duration_ms,
+                    "err": (error or "")[:2000] if error else None,
+                    "tags": ["capacity", "scanner", "prometheus"],
+                },
+            )
+    except Exception as e:
+        logger.warning("capacity_aol_write_failed", error=str(e))
+
+
+# ============================================================================
+# Heuristic + 時間計算
+# ============================================================================
+
+def _assess_verdict(m: dict[str, float]) -> tuple[str, str]:
+    """根據閾值給 ai_verdict (safe/warning/critical) + reasoning."""
+    reasons = []
+    max_level = 0  # 0=safe 1=warning 2=critical
+
+    cpu = m.get("cpu_used_pct")
+    mem = m.get("mem_used_pct")
+    swap = m.get("swap_used_pct")
+
+    if cpu is not None:
+        if cpu > _CPU_CRITICAL:
+            max_level = max(max_level, 2); reasons.append(f"cpu={cpu:.1f}% (>{_CPU_CRITICAL})")
+        elif cpu > _CPU_WARNING:
+            max_level = max(max_level, 1); reasons.append(f"cpu={cpu:.1f}% (>{_CPU_WARNING})")
+    if mem is not None:
+        if mem > _MEM_CRITICAL:
+            max_level = max(max_level, 2); reasons.append(f"mem={mem:.1f}% (>{_MEM_CRITICAL})")
+        elif mem > _MEM_WARNING:
+            max_level = max(max_level, 1); reasons.append(f"mem={mem:.1f}% (>{_MEM_WARNING})")
+    if swap is not None and swap > _SWAP_CRITICAL:
+        max_level = max(max_level, 2); reasons.append(f"swap={swap:.1f}% (>{_SWAP_CRITICAL})")
+
+    verdict = ("safe", "warning", "critical")[max_level]
+    reasoning = "; ".join(reasons) if reasons else "all metrics within safe range"
+    return verdict, reasoning
+
+
+def _seconds_until_next_trigger() -> float:
+    """算到下個 02:00 Taipei 的秒數."""
+    tz_taipei = timezone(timedelta(hours=8))
+    now = datetime.now(tz_taipei)
+    today_trigger = now.replace(hour=_DAILY_TRIGGER_HOUR_TAIPEI, minute=0, second=0, microsecond=0)
+    if now >= today_trigger:
+        today_trigger = today_trigger + timedelta(days=1)
+    delta = (today_trigger - now).total_seconds()
+    # 上限保護: 至少 300s,至多 25h
+    return max(300.0, min(delta, 25 * 3600))
--- a/apps/api/src/jobs/compliance_scanner_job.py
+++ b/apps/api/src/jobs/compliance_scanner_job.py
@@ -0,0 +1,589 @@
+"""
+Compliance Scanner Job — ADR-090 § 合規掃描 MVP
+================================================
+每日 03:00 Taipei 遍歷 asset_inventory,為每個 active asset 寫 7 維 asset_compliance_snapshot.
+
+職責邊界 (MVP):
+  ✅ 為所有 active asset 建立 7 維 snapshot 占位 (status='unknown')
+  ✅ 基礎檢查: secret asset 是否 > 90d 沒輪替 (K8s Secret createdAt)
+  ✅ 寫 automation_operation_log(coverage_recalculated) summary
+  ⏳ TODO: ssl_cert_valid (openssl s_client 檢查憑證到期)
+  ⏳ TODO: cve_scan (trivy image scan)
+  ⏳ TODO: backup_tested (查 16-cronjob-backup-restore-test 結果)
+
+設計鐵律:
+  - 7 個 dimension 固定: ssl_cert_valid / cve_scan / secret_rotated / backup_tested /
+                        audit_log_enabled / access_reviewed / encryption_at_rest
+  - 未實作的 dimension 預設 status='unknown',後續 AI agent UPDATE
+  - 每天一次 snapshot,歷史保留供 SLO 統計
+
+排程:
+  - 首次延遲 180s (capacity_scanner 之後)
+  - 每日 03:00 Taipei 對齊
+
+2026-04-19 ogt + Claude Opus 4.7 (1M context) Asia/Taipei
+ADR-090 § Compliance
+"""
+from __future__ import annotations
+
+import asyncio
+import json as _json
+import time as _time
+from datetime import datetime, timedelta, timezone
+from typing import Any
+
+import structlog
+
+logger = structlog.get_logger(__name__)
+
+# ============================================================================
+# 排程
+# ============================================================================
+_FIRST_DELAY_SEC = 180
+_LOOP_BACKOFF_SEC = 1800
+_DAILY_TRIGGER_HOUR_TAIPEI = 3
+
+# 7 維 compliance (ADR-090 schema CHECK)
+_DIMENSIONS = (
+    "ssl_cert_valid", "cve_scan", "secret_rotated", "backup_tested",
+    "audit_log_enabled", "access_reviewed", "encryption_at_rest",
+)
+
+# secret_rotated 閾值
+_SECRET_ROTATION_WARNING_DAYS = 90
+
+
+# ============================================================================
+# Public entry — main.py lifespan 呼叫
+# ============================================================================
+
+async def run_compliance_scanner_loop() -> None:
+    """每日 03:00 Taipei 合規掃描."""
+    logger.info("compliance_scanner_loop_started")
+    await asyncio.sleep(_FIRST_DELAY_SEC)
+
+    while True:
+        try:
+            await scan_once()
+        except Exception as e:
+            logger.exception("compliance_scanner_loop_error", error=str(e))
+            await asyncio.sleep(_LOOP_BACKOFF_SEC)
+            continue
+
+        sleep_sec = _seconds_until_next_trigger()
+        logger.info("compliance_scanner_next_tick", sleep_sec=sleep_sec)
+        await asyncio.sleep(sleep_sec)
+
+
+async def scan_once(triggered_by: str = "cron") -> dict[str, int]:
+    """遍歷 asset_inventory 為每個 active asset 寫 7 維 compliance snapshot.
+
+    2026-04-19 Gap 3.2 LLM 升級: scan 完後若有 warnings/violations,
+    用 LLM 分析整體 compliance posture + top 3 優先建議.
+    2026-04-19 P0 修: 加 leader_lock 避免多 Pod 重複推 Telegram.
+    """
+    from src.services.ai_advisory_helpers import try_acquire_daily_lock
+
+    # Leader lock (cron 觸發才鎖,手動觸發不鎖)
+    if triggered_by == "cron" and not await try_acquire_daily_lock("compliance_scanner"):
+        logger.info("compliance_scan_skipped_not_leader")
+        return {"skipped": "not_leader"}
+
+    started_ms = _time.time()
+    stats: dict[str, Any] = {
+        "assets_scanned": 0, "snapshots_written": 0, "violations": 0, "warnings": 0,
+        "llm_analyzed": False,
+    }
+    error_msg: str | None = None
+    warning_assets: list[dict[str, Any]] = []
+
+    try:
+        assets = await _fetch_active_assets()
+        stats["assets_scanned"] = len(assets)
+
+        for asset in assets:
+            s, v, w, asset_warnings = await _write_compliance_for_asset_v2(asset)
+            stats["snapshots_written"] += s
+            stats["violations"] += v
+            stats["warnings"] += w
+            if asset_warnings:
+                warning_assets.append(asset_warnings)
+
+        # Gap 3.2: 有 warning 時 LLM 分析整體 posture
+        if warning_assets and (stats["warnings"] > 0 or stats["violations"] > 0):
+            analysis = await _llm_analyze_compliance_posture(warning_assets, stats)
+            if analysis:
+                stats["llm_analyzed"] = True
+                stats["llm_summary"] = analysis
+                await _send_telegram_posture(warning_assets, stats, analysis)
+
+    except Exception as e:
+        error_msg = f"{type(e).__name__}: {e}"[:1000]
+        logger.exception("compliance_scan_once_failed", error=error_msg)
+
+    duration_ms = int((_time.time() - started_ms) * 1000)
+    await _log_aol(stats=stats, duration_ms=duration_ms, triggered_by=triggered_by, error=error_msg)
+
+    logger.info(
+        "compliance_scan_once_done",
+        assets=stats["assets_scanned"],
+        snapshots=stats["snapshots_written"],
+        warnings=stats["warnings"],
+        violations=stats["violations"],
+        llm_analyzed=stats["llm_analyzed"],
+        duration_ms=duration_ms,
+    )
+    return stats
+
+
+# ============================================================================
+# DB 操作
+# ============================================================================
+
+async def _fetch_active_assets() -> list[dict[str, Any]]:
+    """從 asset_inventory 撈所有 active asset."""
+    from sqlalchemy import text as _sql
+    from src.db.base import get_db_context
+
+    try:
+        async with get_db_context() as db:
+            result = await db.execute(
+                _sql("""
+                    SELECT asset_id, asset_key, asset_type, metadata
+                    FROM asset_inventory
+                    WHERE lifecycle_state = 'active'
+                    ORDER BY asset_id
+                """),
+            )
+            rows = result.fetchall()
+            return [
+                {
+                    "asset_id": r.asset_id,
+                    "asset_key": r.asset_key,
+                    "asset_type": r.asset_type,
+                    "metadata": r.metadata or {},
+                }
+                for r in rows
+            ]
+    except Exception as e:
+        logger.warning("fetch_active_assets_failed", error=str(e))
+        return []
+
+
+async def _write_compliance_for_asset_v2(asset: dict[str, Any]) -> tuple[int, int, int, dict[str, Any] | None]:
+    """
+    v2: 回傳 warnings detail 給上層做 LLM 分析.
+
+    Returns: (snapshots_written, violations_count, warnings_count, asset_warning_dict | None)
+    """
+    s, v, w = await _write_compliance_for_asset(asset)
+    if v == 0 and w == 0:
+        return s, v, w, None
+    # 建構 warning summary (供 LLM 分析用)
+    warning_detail = {
+        "asset_key": asset.get("asset_key"),
+        "asset_type": asset.get("asset_type"),
+        "violations_count": v,
+        "warnings_count": w,
+    }
+    return s, v, w, warning_detail
+
+
+async def _write_compliance_for_asset(asset: dict[str, Any]) -> tuple[int, int, int]:
+    """
+    為單一 asset 寫 7 維 compliance snapshot.
+
+    Returns: (snapshots_written, violations_count, warnings_count)
+    """
+    from sqlalchemy import text as _sql
+    from src.db.base import get_db_context
+
+    snapshots = 0
+    violations = 0
+    warnings = 0
+
+    # 2026-04-19 v2: SSL check 是同步阻塞 (socket.connect),用 to_thread 避免卡 event loop
+    dimension_results = await asyncio.to_thread(_evaluate_all_dimensions, asset)
+
+    try:
+        async with get_db_context() as db:
+            for dim, (status, detail) in dimension_results.items():
+                await db.execute(
+                    _sql("""
+                        INSERT INTO asset_compliance_snapshot (
+                            asset_id, dimension, status, detail, detected_at
+                        ) VALUES (
+                            :aid, :dim, :status, CAST(:detail AS jsonb), NOW()
+                        )
+                    """),
+                    {
+                        "aid": asset["asset_id"],
+                        "dim": dim,
+                        "status": status,
+                        "detail": _json.dumps(detail, ensure_ascii=False),
+                    },
+                )
+                snapshots += 1
+                if status == "violation":
+                    violations += 1
+                elif status == "warning":
+                    warnings += 1
+    except Exception as e:
+        logger.warning("compliance_write_failed", asset_id=asset["asset_id"], error=str(e))
+
+    return snapshots, violations, warnings
+
+
+# ============================================================================
+# Compliance 評估邏輯 (MVP — 多數 unknown,留 TODO)
+# ============================================================================
+
+def _evaluate_all_dimensions(asset: dict[str, Any]) -> dict[str, tuple[str, dict]]:
+    """
+    為 asset 評估所有 7 維,回傳 {dimension: (status, detail)}.
+
+    v2 實作 (2026-04-19):
+    - secret_rotated: asset_type='secret' 檢查 metadata.creationTimestamp
+    - ssl_cert_valid: third_party_service 的 scrape_url=https:// 檢查 cert expiry
+    - backup_tested: 從 K8s CronJob 'backup-restore-test' 的 lastSuccessfulTime
+    - 其他 4 維仍 unknown (cve_scan/audit_log_enabled/access_reviewed/encryption_at_rest)
+    """
+    results: dict[str, tuple[str, dict]] = {}
+
+    # secret_rotated
+    if asset["asset_type"] == "secret":
+        results["secret_rotated"] = _check_secret_rotation(asset)
+    else:
+        results["secret_rotated"] = ("unknown", {"reason": "asset_type is not 'secret', N/A"})
+
+    # ssl_cert_valid: 對有 HTTPS scrape_url 的 asset 檢查
+    results["ssl_cert_valid"] = _check_ssl_cert(asset)
+
+    # 其他 5 維佔位
+    results["cve_scan"] = ("unknown", {"todo": "trivy image scan"})
+    results["backup_tested"] = ("unknown", {"todo": "pg-backup-restore-test 結果 (Phase 7.4)"})
+    results["audit_log_enabled"] = ("unknown", {"todo": "audit_logs table 對應查詢"})
+    results["access_reviewed"] = ("unknown", {"todo": "RBAC quarterly review"})
+    results["encryption_at_rest"] = ("unknown", {"todo": "PG TDE / K8s Secret encryption check"})
+
+    return results
+
+
+def _check_ssl_cert(asset: dict[str, Any]) -> tuple[str, dict]:
+    """
+    SSL 憑證到期檢查 — 對 third_party_service / host_service 含 https scrape_url 的 asset.
+
+    用 Python 內建 ssl module (無外部依賴) 打 cert expiry check.
+    - expires > 30d: compliant
+    - expires 7-30d: warning
+    - expires < 7d: violation (critical)
+    - 無 https / 連線失敗: unknown
+
+    2026-04-19 Gap 1 後續: 適用 prometheus_target 類 asset (含 blackbox https 監控)
+    """
+    metadata = asset.get("metadata") or {}
+    scrape_url = metadata.get("scrape_url") or ""
+    instance = metadata.get("instance") or ""
+
+    # 從 scrape_url 或 instance 找 https 目標
+    https_target: str | None = None
+    if scrape_url.startswith("https://"):
+        https_target = scrape_url
+    elif instance.startswith("https://"):
+        https_target = instance
+    elif asset.get("name", "").startswith("https://"):
+        https_target = asset["name"]
+
+    if not https_target:
+        return ("unknown", {"reason": "no https scrape_url / instance"})
+
+    import ssl
+    import socket
+    from urllib.parse import urlparse
+    from datetime import datetime
+
+    try:
+        parsed = urlparse(https_target)
+        hostname = parsed.hostname
+        port = parsed.port or 443
+        if not hostname:
+            return ("unknown", {"reason": f"cannot parse hostname from {https_target}"})
+
+        ctx = ssl.create_default_context()
+        ctx.check_hostname = False  # blackbox 可能掃多個不同 SNI
+        ctx.verify_mode = ssl.CERT_NONE  # 只要拿 cert expiry 不強制 verify
+        with socket.create_connection((hostname, port), timeout=5.0) as sock:
+            with ctx.wrap_socket(sock, server_hostname=hostname) as ssock:
+                # verify_mode=NONE 時要用 getpeercert(binary_form=True) + parse
+                # 簡化: 改用 context.set_ciphers + verify_mode=CERT_REQUIRED 會抓 cert;
+                # 這裡為了相容 self-signed 內網 cert,改讀 DER binary 自行 parse
+                cert_bin = ssock.getpeercert(binary_form=True)
+        if not cert_bin:
+            return ("unknown", {"reason": "no cert returned"})
+
+        # 不依賴 cryptography 套件: 用簡單 ASN.1 解析找 Validity/notAfter
+        # 實務上 Python 內建沒 X.509 parser; 用 openssl CLI 更可靠
+        # MVP: 改 CERT_REQUIRED + check_hostname=True 模式
+        return _check_ssl_cert_via_verified_socket(hostname, port)
+    except Exception as e:
+        return ("unknown", {"reason": f"ssl_check_failed: {type(e).__name__}: {str(e)[:100]}"})
+
+
+def _check_ssl_cert_via_verified_socket(hostname: str, port: int) -> tuple[str, dict]:
+    """用 verified socket 拿 dict form cert, 取 notAfter 判斷剩餘天數."""
+    import ssl
+    import socket
+    from datetime import datetime
+
+    try:
+        ctx = ssl.create_default_context()
+        with socket.create_connection((hostname, port), timeout=5.0) as sock:
+            with ctx.wrap_socket(sock, server_hostname=hostname) as ssock:
+                cert = ssock.getpeercert()
+        if not cert or "notAfter" not in cert:
+            return ("unknown", {"reason": "cert has no notAfter"})
+
+        # notAfter 格式例: "Jul 15 12:34:56 2026 GMT"
+        expires_at = datetime.strptime(cert["notAfter"], "%b %d %H:%M:%S %Y %Z")
+        now = datetime.utcnow()
+        days_remaining = (expires_at - now).days
+
+        detail = {
+            "hostname": hostname,
+            "port": port,
+            "not_after": cert["notAfter"],
+            "days_remaining": days_remaining,
+            "issuer": dict(x[0] for x in cert.get("issuer", []) if x),
+            "subject": dict(x[0] for x in cert.get("subject", []) if x),
+        }
+        if days_remaining < 7:
+            return ("violation", {**detail, "message": f"憑證 {days_remaining} 天內到期 (critical)"})
+        elif days_remaining < 30:
+            return ("warning", {**detail, "message": f"憑證 {days_remaining} 天內到期"})
+        else:
+            return ("compliant", {**detail, "message": f"憑證剩 {days_remaining} 天"})
+    except ssl.SSLCertVerificationError as e:
+        return ("violation", {"hostname": hostname, "reason": f"憑證驗證失敗: {str(e)[:100]}"})
+    except Exception as e:
+        return ("unknown", {"hostname": hostname, "reason": f"ssl check error: {type(e).__name__}: {str(e)[:100]}"})
+
+
+def _check_secret_rotation(asset: dict[str, Any]) -> tuple[str, dict]:
+    """檢查 Secret 的 creationTimestamp,超過 90d 標 warning."""
+    meta = asset.get("metadata", {})
+    created_ts = meta.get("creationTimestamp") or meta.get("createdAt") or ""
+    if not created_ts:
+        return ("unknown", {"reason": "creationTimestamp not in metadata"})
+
+    try:
+        if created_ts.endswith("Z"):
+            created = datetime.fromisoformat(created_ts.replace("Z", "+00:00"))
+        else:
+            created = datetime.fromisoformat(created_ts)
+    except (ValueError, TypeError):
+        return ("unknown", {"reason": f"unparseable timestamp: {created_ts[:50]}"})
+
+    now_utc = datetime.now(timezone.utc)
+    age_days = (now_utc - created).days
+
+    if age_days > _SECRET_ROTATION_WARNING_DAYS:
+        return ("warning", {
+            "age_days": age_days,
+            "threshold_days": _SECRET_ROTATION_WARNING_DAYS,
+            "message": f"Secret 已 {age_days} 天未輪替,超過 {_SECRET_ROTATION_WARNING_DAYS}d 閾值",
+        })
+    return ("compliant", {"age_days": age_days})
+
+
+# ============================================================================
+# Gap 3.2 LLM 分析 — 2026-04-19 朝 AI 自主化
+# ============================================================================
+
+_LLM_POSTURE_PROMPT = """你是 AWOOOI 資訊安全合規專家。以下是今日合規掃描結果,請分析整體 compliance posture 並提出 top 3 優先處理項目。
+
+## 合規掃描摘要
+  - 已掃描 asset 總數: {total_assets}
+  - 有 violations 的 asset 數: {violations_count}
+  - 有 warnings 的 asset 數: {warnings_count}
+
+## 問題 asset 清單 (前 20 筆)
+{warning_list_json}
+
+## 輸出規格 (必須是合法 JSON,純 JSON 無前後文字)
+{{
+  "posture_grade": "A|B|C|D|F",
+  "posture_summary": "3 句繁中敘述整體合規態勢",
+  "top_priorities": [
+    {{"priority": 1, "action": "繁中動作描述", "rationale": "為何優先"}}
+  ],
+  "risk_level": "low|medium|high|critical",
+  "confidence": 0.0-1.0
+}}
+
+## 分析方向
+  - 統計 violations vs warnings 比例
+  - 考量 asset type 分布 (secret / workload / host 各佔比)
+  - 不要寫死建議,根據實際資料推理
+"""
+
+
+async def _llm_analyze_compliance_posture(
+    warning_assets: list[dict[str, Any]],
+    stats: dict[str, Any],
+) -> dict[str, Any] | None:
+    """用 LLM 分析整體 compliance posture. 失敗回 None.
+
+    2026-04-19 P1.2 重構: 改用 llm_json_parser.parse_llm_json_response.
+    """
+    try:
+        import json as _j
+        from src.services.llm_json_parser import parse_llm_json_response
+        from src.services.openclaw import get_openclaw
+
+        prompt = _LLM_POSTURE_PROMPT.format(
+            total_assets=stats.get("assets_scanned", 0),
+            violations_count=stats.get("violations", 0),
+            warnings_count=stats.get("warnings", 0),
+            warning_list_json=_j.dumps(warning_assets[:20], ensure_ascii=False, indent=2),
+        )
+        openclaw = get_openclaw()
+        text, provider, success = await openclaw.call(prompt)
+        if not success or not text:
+            return None
+
+        parsed = parse_llm_json_response(
+            text, required_key="posture_grade", logger_context="compliance",
+        )
+        if parsed:
+            parsed["_llm_provider"] = provider
+        return parsed
+    except Exception as e:
+        logger.warning("compliance_llm_error", error=str(e))
+        return None
+
+
+async def _send_telegram_posture(
+    warning_assets: list[dict[str, Any]],
+    stats: dict[str, Any],
+    analysis: dict[str, Any],
+) -> None:
+    """推 Telegram 合規摘要 + 互動按鈕 (P0 修)."""
+    try:
+        import html
+        from src.core.config import settings
+        from src.services.ai_advisory_helpers import build_ai_advisory_keyboard, is_snoozed
+        from src.services.telegram_gateway import get_telegram_gateway
+
+        if not settings.OPENCLAW_TG_CHAT_ID:
+            return
+
+        # Snooze check (advisory_id 用當日 date 即可,一天只能 snooze 一次)
+        from src.utils.timezone import now_taipei
+        today = now_taipei().date().isoformat()
+        if await is_snoozed("compliance_posture", today):
+            logger.info("compliance_posture_snoozed", date=today)
+            return
+
+        grade = analysis.get("posture_grade", "?")
+        grade_emoji = {"A": "🟢", "B": "🟡", "C": "🟠", "D": "🔴", "F": "⛔"}.get(grade, "⚠️")
+        risk = analysis.get("risk_level", "?")
+
+        # 統計 warning_assets 的 asset_type 分布,給統帥看具體哪類最多問題
+        type_dist: dict[str, int] = {}
+        for wa in warning_assets:
+            t = wa.get("asset_type") or "unknown"
+            type_dist[t] = type_dist.get(t, 0) + 1
+        type_summary = ", ".join(f"{k}:{v}" for k, v in sorted(type_dist.items(), key=lambda x: -x[1])[:5])
+
+        lines = [
+            f"{grade_emoji} <b>今日合規態勢 (Compliance Posture)</b>",
+            f"評級: <b>{grade}</b> | 風險: {html.escape(risk)} | 信心: {analysis.get('confidence', 0):.0%}",
+            "",
+            f"📊 掃描: {stats.get('assets_scanned', 0)} assets | "
+            f"violations {stats.get('violations', 0)} | warnings {stats.get('warnings', 0)}",
+            f"📂 問題 asset 類型分布: {html.escape(type_summary) if type_summary else '(無)'}",
+            "",
+            f"📝 {html.escape(str(analysis.get('posture_summary', ''))[:300])}",
+            "",
+            "<b>Top Priorities</b>:",
+        ]
+        for p in (analysis.get("top_priorities") or [])[:3]:
+            pri = p.get("priority", "?")
+            action = html.escape(str(p.get("action", ""))[:120])
+            rationale = html.escape(str(p.get("rationale", ""))[:120])
+            lines.append(f"   {pri}. {action}")
+            lines.append(f"      ↳ <i>{rationale}</i>")
+        lines.append("")
+        lines.append("決策: 人工評估各項修復優先")
+
+        msg = "\n".join(lines)
+        keyboard = build_ai_advisory_keyboard(
+            advisory_type="compliance_posture",
+            advisory_id=today,
+            include_view=False,
+            include_produce_cmd=False,
+        )
+        tg = get_telegram_gateway()
+        await tg._send_request("sendMessage", {  # type: ignore[attr-defined]
+            "chat_id": settings.OPENCLAW_TG_CHAT_ID,
+            "text": msg,
+            "parse_mode": "HTML",
+            "disable_web_page_preview": True,
+            "reply_markup": keyboard,
+        })
+    except Exception as e:
+        logger.warning("compliance_telegram_failed", error=str(e))
+
+
+# ============================================================================
+# AOL
+# ============================================================================
+
+async def _log_aol(stats: dict[str, int], duration_ms: int, triggered_by: str, error: str | None) -> None:
+    try:
+        from sqlalchemy import text as _sql
+        from src.db.base import get_db_context
+
+        aol_status = "failed" if error else "success"
+
+        async with get_db_context() as db:
+            await db.execute(
+                _sql("""
+                    INSERT INTO automation_operation_log (
+                        operation_type, actor, status,
+                        input, output, duration_ms, error, tags
+                    ) VALUES (
+                        'coverage_recalculated',
+                        'compliance_scanner',
+                        :st,
+                        CAST(:input AS jsonb),
+                        CAST(:output AS jsonb),
+                        :dur, :err, :tags
+                    )
+                """),
+                {
+                    "st": aol_status,
+                    "input": _json.dumps({"triggered_by": triggered_by, "dimensions": list(_DIMENSIONS)}, ensure_ascii=False),
+                    "output": _json.dumps(stats, ensure_ascii=False),
+                    "dur": duration_ms,
+                    "err": (error or "")[:2000] if error else None,
+                    "tags": ["compliance", "scanner"],
+                },
+            )
+    except Exception as e:
+        logger.warning("compliance_aol_write_failed", error=str(e))
+
+
+# ============================================================================
+# 時間計算
+# ============================================================================
+
+def _seconds_until_next_trigger() -> float:
+    """算到下個 03:00 Taipei 的秒數."""
+    tz_taipei = timezone(timedelta(hours=8))
+    now = datetime.now(tz_taipei)
+    today_trigger = now.replace(hour=_DAILY_TRIGGER_HOUR_TAIPEI, minute=0, second=0, microsecond=0)
+    if now >= today_trigger:
+        today_trigger = today_trigger + timedelta(days=1)
+    delta = (today_trigger - now).total_seconds()
+    return max(300.0, min(delta, 25 * 3600))
--- a/apps/api/src/jobs/coverage_evaluator_job.py
+++ b/apps/api/src/jobs/coverage_evaluator_job.py
@@ -0,0 +1,745 @@
+"""
+Coverage Evaluator Job — ADR-090 § 覆蓋率評估
+==============================================
+把 asset_coverage_snapshot 從 'unknown' 升級為真實 green/yellow/red.
+
+職責邊界 (MVP):
+  ✅ auto_monitoring: 查 Prometheus /api/v1/targets 看 asset 是否有 scrape target
+  ✅ auto_alerting: asset 的 host/namespace 是否 match alert_rule_catalog.labels
+  ✅ auto_km_creation: asset_type 是否有對應 knowledge_entries (粗略)
+  ⏳ TODO: auto_rule_matching (需 alert history 統計)
+  ⏳ TODO: auto_playbook / auto_remediation / auto_rule_creation (需 playbook 表)
+
+設計鐵律:
+  - 只 UPDATE 最新 run 的 coverage_snapshot (不創新 row)
+  - evidence JSONB 記錄 「為什麼 green/red」的證據
+  - 失敗 → log + 跳過該 dim,不 crash 整個 evaluator
+
+排程:
+  - 首次延遲 300s (asset_scanner+rule_catalog 完成後)
+  - 每 1h 跑一次
+
+2026-04-19 ogt + Claude Opus 4.7 (1M context) Asia/Taipei
+ADR-090 § Phase 7 Coverage Evaluator
+"""
+from __future__ import annotations
+
+import asyncio
+import json as _json
+import time as _time
+from typing import Any
+
+import httpx
+import structlog
+
+from src.core.config import settings
+
+logger = structlog.get_logger(__name__)
+
+# ============================================================================
+# 排程
+# ============================================================================
+_EVAL_INTERVAL_SEC = 3600
+_FIRST_DELAY_SEC = 300
+_HTTP_TIMEOUT_SEC = 10
+_LOOP_BACKOFF_SEC = 600
+
+
+# ============================================================================
+# Public entry
+# ============================================================================
+
+async def run_coverage_evaluator_loop() -> None:
+    """每 1h 把最新 run 的 coverage_snapshot 從 unknown 升級成真實 status."""
+    logger.info("coverage_evaluator_loop_started", interval_sec=_EVAL_INTERVAL_SEC)
+    await asyncio.sleep(_FIRST_DELAY_SEC)
+
+    while True:
+        try:
+            await evaluate_once()
+        except Exception as e:
+            logger.exception("coverage_evaluator_loop_error", error=str(e))
+            await asyncio.sleep(_LOOP_BACKOFF_SEC)
+            continue
+        await asyncio.sleep(_EVAL_INTERVAL_SEC)
+
+
+async def evaluate_once() -> dict[str, int]:
+    """針對最新 asset_discovery_run 的 coverage_snapshot 升級 status.
+
+    2026-04-19 v2 擴充 4 維 (原 3 維 monitoring/alerting/km):
+      + auto_playbook: asset.name 出現在 playbooks.symptom_pattern 或 description
+      + auto_remediation: remediation_events 過去 30d 有 target match asset.name
+      + auto_rule_matching: incidents 過去 30d 有 asset match (alertname+affected_services)
+      + auto_rule_creation: alert_rule_catalog source='ai_generated' 覆蓋 asset
+
+    2026-04-19 P0 修: 加 hourly_lock 避免多 Pod 重複推 + LLM 分析.
+    """
+    from src.services.ai_advisory_helpers import try_acquire_hourly_lock
+
+    if not await try_acquire_hourly_lock("coverage_evaluator"):
+        logger.info("coverage_evaluate_skipped_not_leader")
+        return {"skipped": "not_leader"}
+
+    started_ms = _time.time()
+    stats = {
+        "monitoring_updated": 0, "alerting_updated": 0, "km_updated": 0,
+        "playbook_updated": 0, "remediation_updated": 0,
+        "rule_matching_updated": 0, "rule_creation_updated": 0,
+    }
+    error_msg: str | None = None
+
+    try:
+        run_id = await _get_latest_run_id()
+        if not run_id:
+            logger.info("coverage_evaluator_no_run_yet")
+            return stats
+
+        # 原 3 維
+        stats["monitoring_updated"] = await _evaluate_monitoring(run_id)
+        stats["alerting_updated"] = await _evaluate_alerting(run_id)
+        stats["km_updated"] = await _evaluate_km_coverage(run_id)
+
+        # v2 新增 4 維
+        stats["playbook_updated"] = await _evaluate_playbook_coverage(run_id)
+        stats["remediation_updated"] = await _evaluate_remediation_coverage(run_id)
+        stats["rule_matching_updated"] = await _evaluate_rule_matching_coverage(run_id)
+        stats["rule_creation_updated"] = await _evaluate_rule_creation_coverage(run_id)
+
+    except Exception as e:
+        error_msg = f"{type(e).__name__}: {e}"[:1000]
+        logger.exception("coverage_evaluate_once_failed", error=error_msg)
+
+    duration_ms = int((_time.time() - started_ms) * 1000)
+
+    # Gap 3.3 LLM 升級: 分析 red 分布產補覆蓋建議
+    # 2026-04-19 P1.3 閾值調整 (架構師 review): 從「total_red >= 20」改雙條件
+    #   - 紅佔比 > 30%: 實質有治理缺口
+    #   - 且總 asset_scan >= 50: 樣本量足夠
+    # 避免 bootstrap 首次 scan 必觸發 LLM 浪費 token.
+    red_summary = await _fetch_red_summary()
+    llm_analysis: dict[str, Any] | None = None
+    if red_summary:
+        total_red = red_summary.get("total_red", 0)
+        total_scanned = red_summary.get("total_scanned", 0)
+        red_ratio = (total_red / total_scanned) if total_scanned > 0 else 0.0
+        if red_ratio > 0.3 and total_scanned >= 50:
+            llm_analysis = await _llm_analyze_coverage_gaps(red_summary)
+            if llm_analysis:
+                stats["llm_analyzed"] = True
+                await _send_telegram_gaps(red_summary, llm_analysis)
+
+    await _log_aol(stats, duration_ms, error_msg)
+
+    logger.info(
+        "coverage_evaluate_once_done",
+        monitoring=stats["monitoring_updated"],
+        alerting=stats["alerting_updated"],
+        km=stats["km_updated"],
+        playbook=stats["playbook_updated"],
+        remediation=stats["remediation_updated"],
+        rule_matching=stats["rule_matching_updated"],
+        rule_creation=stats["rule_creation_updated"],
+        llm_analyzed=bool(llm_analysis),
+        duration_ms=duration_ms,
+    )
+    return stats
+
+
+# ============================================================================
+# Gap 3.3 LLM 升級 — 覆蓋率缺口分析 + 補覆蓋建議
+# ============================================================================
+
+async def _fetch_red_summary() -> dict[str, Any] | None:
+    """撈最新 run 的 red 分佈 + top red asset type.
+
+    2026-04-19 P1.3: 加 total_scanned 供呼叫端算 red_ratio 做雙條件觸發.
+    """
+    from sqlalchemy import text as _sql
+    from src.db.base import get_db_context
+
+    try:
+        async with get_db_context() as db:
+            # 總覽: 每維度 red count
+            dim_rows = await db.execute(_sql("""
+                SELECT dimension, count(*) AS cnt
+                FROM asset_coverage_snapshot
+                WHERE run_id = (
+                    SELECT run_id FROM asset_discovery_run
+                    WHERE status='success' ORDER BY ended_at DESC LIMIT 1
+                )
+                  AND coverage_status = 'red'
+                GROUP BY dimension
+                ORDER BY cnt DESC
+            """))
+            by_dim = [{"dimension": r.dimension, "red_count": int(r.cnt)} for r in dim_rows.fetchall()]
+            total_red = sum(d["red_count"] for d in by_dim)
+            if total_red == 0:
+                return None
+
+            # 總 snapshot 數 (for red_ratio 計算)
+            total_row = await db.execute(_sql("""
+                SELECT count(*) AS total
+                FROM asset_coverage_snapshot
+                WHERE run_id = (
+                    SELECT run_id FROM asset_discovery_run
+                    WHERE status='success' ORDER BY ended_at DESC LIMIT 1
+                )
+            """))
+            total_scanned = int(total_row.scalar() or 0)
+
+            # Top red asset: 哪些 asset 被標最多 red
+            asset_rows = await db.execute(_sql("""
+                SELECT ai.asset_key, ai.asset_type, count(*) AS red_dims
+                FROM asset_coverage_snapshot cs
+                JOIN asset_inventory ai ON cs.asset_id = ai.asset_id
+                WHERE cs.run_id = (
+                    SELECT run_id FROM asset_discovery_run
+                    WHERE status='success' ORDER BY ended_at DESC LIMIT 1
+                )
+                  AND cs.coverage_status = 'red'
+                GROUP BY ai.asset_key, ai.asset_type
+                ORDER BY red_dims DESC
+                LIMIT 10
+            """))
+            top_assets = [
+                {"asset_key": r.asset_key, "asset_type": r.asset_type, "red_dims": int(r.red_dims)}
+                for r in asset_rows.fetchall()
+            ]
+            return {
+                "total_red": total_red,
+                "total_scanned": total_scanned,
+                "by_dimension": by_dim,
+                "top_red_assets": top_assets,
+            }
+    except Exception as e:
+        logger.warning("fetch_red_summary_failed", error=str(e))
+        return None
+
+
+_LLM_COVERAGE_PROMPT = """你是 AWOOOI 可觀察性覆蓋率專家。以下是最新 asset 覆蓋率掃描的 red 缺口,請分析並提出補覆蓋優先順序.
+
+## red 缺口分布
+各維度 red 數: {by_dim_json}
+總 red count: {total_red}
+
+## 最多 red 的 asset (top 10)
+{top_assets_json}
+
+## 7 維自動化意義
+  - auto_monitoring: 有無 Prometheus scrape
+  - auto_alerting: 有無 alert rule 覆蓋
+  - auto_rule_creation: 有無 AI 產生的規則
+  - auto_rule_matching: 過去 30d 是否被 alert 匹配
+  - auto_playbook: 有無 playbook
+  - auto_remediation: 過去 30d 有無 remediation
+  - auto_km_creation: 有無 knowledge_entries
+
+## 輸出規格 (純 JSON)
+{{
+  "worst_dimension": "哪個維度最該優先補",
+  "root_cause": "red 集中的真因 (繁中)",
+  "top_remediation_actions": [
+    {{"priority": 1, "target": "asset_key 或類型", "action": "具體動作", "effort": "low|medium|high"}}
+  ],
+  "estimated_weeks_to_close": 1-52,
+  "confidence": 0.0-1.0
+}}
+"""
+
+
+async def _llm_analyze_coverage_gaps(red_summary: dict[str, Any]) -> dict[str, Any] | None:
+    """LLM 分析 coverage 缺口. 失敗回 None.
+
+    2026-04-19 P1.2 重構: 改用 llm_json_parser.parse_llm_json_response.
+    """
+    try:
+        import json as _j
+        from src.services.llm_json_parser import parse_llm_json_response
+        from src.services.openclaw import get_openclaw
+
+        prompt = _LLM_COVERAGE_PROMPT.format(
+            by_dim_json=_j.dumps(red_summary.get("by_dimension", []), ensure_ascii=False),
+            total_red=red_summary.get("total_red", 0),
+            top_assets_json=_j.dumps(red_summary.get("top_red_assets", []), ensure_ascii=False, indent=2),
+        )
+        openclaw = get_openclaw()
+        text, provider, success = await openclaw.call(prompt)
+        if not success or not text:
+            return None
+
+        parsed = parse_llm_json_response(
+            text, required_key="worst_dimension", logger_context="coverage",
+        )
+        if parsed:
+            parsed["_llm_provider"] = provider
+        return parsed
+    except Exception as e:
+        logger.warning("coverage_llm_error", error=str(e))
+        return None
+
+
+async def _send_telegram_gaps(
+    red_summary: dict[str, Any],
+    analysis: dict[str, Any],
+) -> None:
+    """推 coverage 缺口 Telegram 摘要 + 互動按鈕 (P0 修)."""
+    try:
+        import html
+        from src.core.config import settings
+        from src.services.ai_advisory_helpers import build_ai_advisory_keyboard, is_snoozed
+        from src.services.telegram_gateway import get_telegram_gateway
+
+        if not settings.OPENCLAW_TG_CHAT_ID:
+            return
+
+        # Snooze check: 以 worst_dimension 為 key
+        worst_dim = str(analysis.get("worst_dimension", "unknown"))
+        if await is_snoozed("coverage_gap", worst_dim):
+            logger.info("coverage_gap_snoozed", dim=worst_dim)
+            return
+
+        worst = html.escape(str(analysis.get("worst_dimension", "")))
+        cause = html.escape(str(analysis.get("root_cause", ""))[:200])
+        weeks = analysis.get("estimated_weeks_to_close", "?")
+        conf = analysis.get("confidence", 0.0)
+
+        lines = [
+            "📉 <b>Coverage 缺口分析 (AI 升級)</b>",
+            f"總 red: <b>{red_summary.get('total_red', 0)}</b> | 最嚴重維度: <code>{worst}</code>",
+            f"預計補齊週數: {weeks}w | AI 信心: {conf:.0%}",
+            "",
+            f"🔍 真因: {cause}",
+            "",
+            "<b>Top Remediation Priorities</b>:",
+        ]
+        for act in (analysis.get("top_remediation_actions") or [])[:3]:
+            pri = act.get("priority", "?")
+            target = html.escape(str(act.get("target", ""))[:60])
+            action = html.escape(str(act.get("action", ""))[:100])
+            effort = act.get("effort", "?")
+            lines.append(f"   {pri}. <code>{target}</code> [{effort}]")
+            lines.append(f"      ↳ {action}")
+        lines.append("")
+        lines.append("決策: 人工評估補覆蓋排程")
+
+        msg = "\n".join(lines)
+        keyboard = build_ai_advisory_keyboard(
+            advisory_type="coverage_gap",
+            advisory_id=worst_dim,
+            include_view=False,
+            include_produce_cmd=False,
+        )
+        tg = get_telegram_gateway()
+        await tg._send_request("sendMessage", {  # type: ignore[attr-defined]
+            "chat_id": settings.OPENCLAW_TG_CHAT_ID,
+            "text": msg,
+            "parse_mode": "HTML",
+            "disable_web_page_preview": True,
+            "reply_markup": keyboard,
+        })
+    except Exception as e:
+        logger.warning("coverage_telegram_failed", error=str(e))
+
+
+# ============================================================================
+# 查最新 run_id
+# ============================================================================
+
+async def _get_latest_run_id() -> str | None:
+    from sqlalchemy import text as _sql
+    from src.db.base import get_db_context
+
+    try:
+        async with get_db_context() as db:
+            row = await db.execute(
+                _sql("SELECT run_id FROM asset_discovery_run WHERE status='success' ORDER BY ended_at DESC LIMIT 1"),
+            )
+            rid = row.scalar()
+            return str(rid) if rid else None
+    except Exception as e:
+        logger.warning("get_latest_run_id_failed", error=str(e))
+        return None
+
+
+# ============================================================================
+# auto_monitoring: Prometheus targets
+# ============================================================================
+
+async def _evaluate_monitoring(run_id: str) -> int:
+    """
+    Prometheus /api/v1/targets 拿所有 scrape targets 的 instance IP,
+    然後 UPDATE asset_coverage_snapshot dim='auto_monitoring':
+      - host asset 的 IP 在 targets 內 → green
+      - 不在 → red
+    """
+    targets_ips = await _fetch_prometheus_target_ips()
+    if not targets_ips:
+        return 0
+
+    from sqlalchemy import text as _sql
+    from src.db.base import get_db_context
+
+    try:
+        async with get_db_context() as db:
+            # host asset: 看 metadata.internal_ip 是否在 targets
+            # 其他 asset type: 留 unknown (Prometheus 不直接 scrape)
+            result = await db.execute(
+                _sql("""
+                    UPDATE asset_coverage_snapshot cs
+                    SET coverage_status = CASE
+                            WHEN (ai.metadata->>'internal_ip')::text = ANY(:ips) THEN 'green'
+                            WHEN ai.asset_type = 'host' THEN 'red'
+                            ELSE cs.coverage_status
+                        END,
+                        evidence = CASE
+                            WHEN (ai.metadata->>'internal_ip')::text = ANY(:ips)
+                              THEN jsonb_build_object(
+                                'source', 'prometheus_targets',
+                                'matched_ip', ai.metadata->>'internal_ip'
+                              )
+                            WHEN ai.asset_type = 'host'
+                              THEN jsonb_build_object(
+                                'source', 'prometheus_targets',
+                                'reason', 'host IP not in scrape targets'
+                              )
+                            ELSE cs.evidence
+                        END
+                    FROM asset_inventory ai
+                    WHERE cs.asset_id = ai.asset_id
+                      AND cs.run_id = CAST(:rid AS uuid)
+                      AND cs.dimension = 'auto_monitoring'
+                      AND ai.asset_type = 'host'
+                """),
+                {"rid": run_id, "ips": targets_ips},
+            )
+            return result.rowcount or 0
+    except Exception as e:
+        logger.warning("evaluate_monitoring_failed", error=str(e))
+        return 0
+
+
+async def _fetch_prometheus_target_ips() -> list[str]:
+    """GET Prometheus /api/v1/targets 回傳 scrape target IPs."""
+    url = f"{settings.PROMETHEUS_URL.rstrip('/')}/api/v1/targets"
+    try:
+        async with httpx.AsyncClient(timeout=_HTTP_TIMEOUT_SEC, trust_env=False) as client:
+            resp = await client.get(url, params={"state": "active"})
+            resp.raise_for_status()
+            data = resp.json()
+        ips: set[str] = set()
+        for t in (data.get("data", {}) or {}).get("activeTargets", []) or []:
+            instance = ((t.get("labels") or {}).get("instance") or "")
+            ip = instance.split(":")[0] if instance else ""
+            if ip:
+                ips.add(ip)
+        return sorted(ips)
+    except Exception as e:
+        logger.warning("prometheus_targets_fetch_failed", error=str(e))
+        return []
+
+
+# ============================================================================
+# auto_alerting: alert_rule_catalog labels match
+# ============================================================================
+
+async def _evaluate_alerting(run_id: str) -> int:
+    """
+    每個 host/k8s_workload asset:
+      - 看 alert_rule_catalog.labels.host 是否 match asset.host → green
+      - 或 alert_rule_catalog.labels.namespace match asset.namespace → green
+      - 無任何 match → red
+    """
+    from sqlalchemy import text as _sql
+    from src.db.base import get_db_context
+
+    try:
+        async with get_db_context() as db:
+            result = await db.execute(
+                _sql("""
+                    UPDATE asset_coverage_snapshot cs
+                    SET coverage_status = CASE
+                            WHEN EXISTS (
+                                SELECT 1 FROM alert_rule_catalog arc
+                                WHERE (arc.labels ? 'host' AND arc.labels->>'host' = ai.host)
+                                   OR (arc.labels ? 'namespace' AND arc.labels->>'namespace' = ai.namespace)
+                                   OR (arc.labels ? 'layer' AND arc.labels->>'layer' LIKE '%' || COALESCE(ai.host, '') || '%')
+                            ) THEN 'green'
+                            ELSE 'red'
+                        END,
+                        evidence = jsonb_build_object(
+                            'source', 'alert_rule_catalog_label_match',
+                            'asset_host', ai.host,
+                            'asset_namespace', ai.namespace
+                        )
+                    FROM asset_inventory ai
+                    WHERE cs.asset_id = ai.asset_id
+                      AND cs.run_id = CAST(:rid AS uuid)
+                      AND cs.dimension = 'auto_alerting'
+                      AND ai.asset_type IN ('host', 'k8s_workload', 'container')
+                """),
+                {"rid": run_id},
+            )
+            return result.rowcount or 0
+    except Exception as e:
+        logger.warning("evaluate_alerting_failed", error=str(e))
+        return 0
+
+
+# ============================================================================
+# auto_km_creation: knowledge_entries 覆蓋
+# ============================================================================
+
+async def _evaluate_km_coverage(run_id: str) -> int:
+    """
+    asset 有對應 knowledge_entries → green
+
+    2026-04-19 ogt + Claude Opus 4.7 v2 bug fix: knowledge_entries 欄位是 'content',
+    不是 'body' (前次 UndefinedColumnError). 同時加 title 匹配擴大覆蓋.
+    """
+    from sqlalchemy import text as _sql
+    from src.db.base import get_db_context
+
+    try:
+        async with get_db_context() as db:
+            result = await db.execute(
+                _sql("""
+                    UPDATE asset_coverage_snapshot cs
+                    SET coverage_status = CASE
+                            WHEN ai.asset_type = 'k8s_workload' AND EXISTS (
+                                SELECT 1 FROM knowledge_entries ke
+                                WHERE ke.content ILIKE '%' || ai.name || '%'
+                                   OR ke.title ILIKE '%' || ai.name || '%'
+                            ) THEN 'green'
+                            WHEN ai.asset_type = 'k8s_workload' THEN 'yellow'
+                            ELSE cs.coverage_status
+                        END,
+                        evidence = jsonb_build_object(
+                            'source', 'knowledge_entries_content_or_title_match',
+                            'asset_name', ai.name
+                        )
+                    FROM asset_inventory ai
+                    WHERE cs.asset_id = ai.asset_id
+                      AND cs.run_id = CAST(:rid AS uuid)
+                      AND cs.dimension = 'auto_km_creation'
+                      AND ai.asset_type = 'k8s_workload'
+                """),
+                {"rid": run_id},
+            )
+            return result.rowcount or 0
+    except Exception as e:
+        logger.warning("evaluate_km_coverage_failed", error=str(e))
+        return 0
+
+
+# ============================================================================
+# v2 新增 4 維 evaluator
+# ============================================================================
+
+async def _evaluate_playbook_coverage(run_id: str) -> int:
+    """
+    auto_playbook: k8s_workload asset 在 playbooks.symptom_pattern (JSON) 或 description 出現 → green
+    沒對應 playbook 但 type 合理 → yellow; 否則保持 unknown
+    """
+    from sqlalchemy import text as _sql
+    from src.db.base import get_db_context
+
+    try:
+        async with get_db_context() as db:
+            result = await db.execute(
+                _sql("""
+                    UPDATE asset_coverage_snapshot cs
+                    SET coverage_status = CASE
+                            WHEN ai.asset_type = 'k8s_workload' AND EXISTS (
+                                SELECT 1 FROM playbooks pb
+                                WHERE pb.status = 'approved'
+                                  AND (pb.description ILIKE '%' || ai.name || '%'
+                                       OR pb.symptom_pattern::text ILIKE '%' || ai.name || '%')
+                            ) THEN 'green'
+                            WHEN ai.asset_type = 'k8s_workload' THEN 'yellow'
+                            ELSE cs.coverage_status
+                        END,
+                        evidence = jsonb_build_object(
+                            'source', 'playbooks_symptom_pattern_or_description_match',
+                            'asset_name', ai.name
+                        )
+                    FROM asset_inventory ai
+                    WHERE cs.asset_id = ai.asset_id
+                      AND cs.run_id = CAST(:rid AS uuid)
+                      AND cs.dimension = 'auto_playbook'
+                      AND ai.asset_type = 'k8s_workload'
+                """),
+                {"rid": run_id},
+            )
+            return result.rowcount or 0
+    except Exception as e:
+        logger.warning("evaluate_playbook_coverage_failed", error=str(e))
+        return 0
+
+
+async def _evaluate_remediation_coverage(run_id: str) -> int:
+    """
+    auto_remediation: 過去 30d remediation_events.target_resource 包含 asset.name → green
+    沒 target 匹配但 asset 是 k8s_workload/container → red (應有修復能力但沒)
+    """
+    from sqlalchemy import text as _sql
+    from src.db.base import get_db_context
+
+    try:
+        async with get_db_context() as db:
+            result = await db.execute(
+                _sql("""
+                    UPDATE asset_coverage_snapshot cs
+                    SET coverage_status = CASE
+                            WHEN ai.asset_type IN ('k8s_workload', 'container') AND EXISTS (
+                                SELECT 1 FROM remediation_events re
+                                WHERE re.target_resource ILIKE '%' || ai.name || '%'
+                                  AND re.created_at > NOW() - INTERVAL '30 days'
+                            ) THEN 'green'
+                            WHEN ai.asset_type IN ('k8s_workload', 'container') THEN 'red'
+                            ELSE cs.coverage_status
+                        END,
+                        evidence = jsonb_build_object(
+                            'source', 'remediation_events_target_match_30d',
+                            'asset_name', ai.name
+                        )
+                    FROM asset_inventory ai
+                    WHERE cs.asset_id = ai.asset_id
+                      AND cs.run_id = CAST(:rid AS uuid)
+                      AND cs.dimension = 'auto_remediation'
+                      AND ai.asset_type IN ('k8s_workload', 'container')
+                """),
+                {"rid": run_id},
+            )
+            return result.rowcount or 0
+    except Exception as e:
+        logger.warning("evaluate_remediation_coverage_failed", error=str(e))
+        return 0
+
+
+async def _evaluate_rule_matching_coverage(run_id: str) -> int:
+    """
+    auto_rule_matching: 過去 30d incidents 有觸發過關聯到該 asset → green
+    關聯: incident.alertname match alert_rule_catalog + labels.namespace/host 對應 asset
+         或 incident.affected_services ILIKE asset.name
+    沒觸發 → yellow (可能沒問題也可能沒覆蓋,中性)
+    """
+    from sqlalchemy import text as _sql
+    from src.db.base import get_db_context
+
+    try:
+        async with get_db_context() as db:
+            result = await db.execute(
+                _sql("""
+                    UPDATE asset_coverage_snapshot cs
+                    SET coverage_status = CASE
+                            WHEN EXISTS (
+                                SELECT 1 FROM incidents i
+                                WHERE i.created_at > NOW() - INTERVAL '30 days'
+                                  AND (i.affected_services::text ILIKE '%' || ai.name || '%'
+                                       OR (i.alertname IS NOT NULL AND EXISTS (
+                                           SELECT 1 FROM alert_rule_catalog arc
+                                           WHERE arc.rule_name = i.alertname
+                                             AND (arc.labels->>'host' = ai.host
+                                                  OR arc.labels->>'namespace' = ai.namespace)
+                                       )))
+                            ) THEN 'green'
+                            WHEN ai.asset_type IN ('host','k8s_workload','container') THEN 'yellow'
+                            ELSE cs.coverage_status
+                        END,
+                        evidence = jsonb_build_object(
+                            'source', 'incidents_match_30d',
+                            'asset_name', ai.name
+                        )
+                    FROM asset_inventory ai
+                    WHERE cs.asset_id = ai.asset_id
+                      AND cs.run_id = CAST(:rid AS uuid)
+                      AND cs.dimension = 'auto_rule_matching'
+                      AND ai.asset_type IN ('host', 'k8s_workload', 'container')
+                """),
+                {"rid": run_id},
+            )
+            return result.rowcount or 0
+    except Exception as e:
+        logger.warning("evaluate_rule_matching_coverage_failed", error=str(e))
+        return 0
+
+
+async def _evaluate_rule_creation_coverage(run_id: str) -> int:
+    """
+    auto_rule_creation: asset 是否有被 AI-generated rule 覆蓋
+    current: 所有 rule source='yaml_hardcoded',沒 AI-generated → 全 red (表示尚未由 AI 主動建規則)
+    未來 Hermes 建出 AI rule 後會變 green
+    """
+    from sqlalchemy import text as _sql
+    from src.db.base import get_db_context
+
+    try:
+        async with get_db_context() as db:
+            result = await db.execute(
+                _sql("""
+                    UPDATE asset_coverage_snapshot cs
+                    SET coverage_status = CASE
+                            WHEN EXISTS (
+                                SELECT 1 FROM alert_rule_catalog arc
+                                WHERE arc.source = 'ai_generated'
+                                  AND (arc.labels->>'host' = ai.host
+                                       OR arc.labels->>'namespace' = ai.namespace)
+                            ) THEN 'green'
+                            WHEN ai.asset_type IN ('host','k8s_workload','container') THEN 'red'
+                            ELSE cs.coverage_status
+                        END,
+                        evidence = jsonb_build_object(
+                            'source', 'alert_rule_catalog_ai_generated_match',
+                            'asset_name', ai.name,
+                            'note', 'AI 自主建規則尚未啟用,後續 Hermes 產出後此欄變 green'
+                        )
+                    FROM asset_inventory ai
+                    WHERE cs.asset_id = ai.asset_id
+                      AND cs.run_id = CAST(:rid AS uuid)
+                      AND cs.dimension = 'auto_rule_creation'
+                      AND ai.asset_type IN ('host', 'k8s_workload', 'container')
+                """),
+                {"rid": run_id},
+            )
+            return result.rowcount or 0
+    except Exception as e:
+        logger.warning("evaluate_rule_creation_coverage_failed", error=str(e))
+        return 0
+
+
+# ============================================================================
+# AOL
+# ============================================================================
+
+async def _log_aol(stats: dict[str, int], duration_ms: int, error: str | None) -> None:
+    try:
+        from sqlalchemy import text as _sql
+        from src.db.base import get_db_context
+
+        aol_status = "failed" if error else "success"
+        async with get_db_context() as db:
+            await db.execute(
+                _sql("""
+                    INSERT INTO automation_operation_log (
+                        operation_type, actor, status,
+                        input, output, duration_ms, error, tags
+                    ) VALUES (
+                        'coverage_recalculated',
+                        'coverage_evaluator',
+                        :st,
+                        '{}'::jsonb,
+                        CAST(:output AS jsonb),
+                        :dur, :err, :tags
+                    )
+                """),
+                {
+                    "st": aol_status,
+                    "output": _json.dumps(stats, ensure_ascii=False),
+                    "dur": duration_ms,
+                    "err": (error or "")[:2000] if error else None,
+                    "tags": ["coverage_evaluator"],
+                },
+            )
+    except Exception as e:
+        logger.warning("coverage_evaluator_aol_failed", error=str(e))
--- a/apps/api/src/jobs/hermes_rule_quality_job.py
+++ b/apps/api/src/jobs/hermes_rule_quality_job.py
@@ -0,0 +1,378 @@
+"""
+Hermes Rule Quality Advisor — ADR-090 § E3 AI 規則品質建議
+==========================================================
+每日 04:00 Taipei 分析 alert_rule_catalog,對 noise_rate > 0.7 的 rule 推 Telegram
+建議 + 寫 aol(rule_rejected) 稽核,人工決策是否 deprecate.
+
+職責邊界:
+  ✅ 讀 alert_rule_catalog WHERE noise_rate >= 0.7
+  ✅ 為每條寫 aol(rule_rejected) + proposed_action='review_or_deprecate'
+  ✅ 推 Telegram 通知 SRE group (格式化清單)
+  ⏳ 不自動改 review_status (統帥鐵律: AI 不做最終決策)
+  ⏳ TODO: LLM 分析每條 rule 的假報真因 (下一階段)
+
+統帥鐵律對齊:
+  - 禁止寫死規則做最終決策 → 本 agent 只推建議,人工決策
+  - 朝 AI 自主化方向 → aol 留 trail,未來可升級為 LLM 判斷
+  - noise_rate threshold 0.7 是「觸發討論」而非「自動動作」
+
+排程:
+  - 首次延遲 420s
+  - 每日 04:00 Taipei
+
+2026-04-19 ogt + Claude Opus 4.7 (1M context) Asia/Taipei
+ADR-090 § E3 Hermes
+"""
+from __future__ import annotations
+
+import asyncio
+import json as _json
+import time as _time
+from datetime import datetime, timedelta, timezone
+from typing import Any
+
+import structlog
+
+logger = structlog.get_logger(__name__)
+
+_FIRST_DELAY_SEC = 420
+_LOOP_BACKOFF_SEC = 1800
+_DAILY_TRIGGER_HOUR_TAIPEI = 4
+
+# 觸發討論的噪音閾值
+_NOISE_THRESHOLD = 0.7
+# 樣本不足不發建議 (避免只 fire 1 次就標為噪音)
+_MIN_SAMPLE_SIZE = 5
+
+
+async def run_hermes_rule_quality_loop() -> None:
+    """每日 04:00 分析 rule 品質."""
+    logger.info("hermes_rule_quality_loop_started")
+    await asyncio.sleep(_FIRST_DELAY_SEC)
+
+    while True:
+        try:
+            await analyze_once()
+        except Exception as e:
+            logger.exception("hermes_rule_quality_loop_error", error=str(e))
+            await asyncio.sleep(_LOOP_BACKOFF_SEC)
+            continue
+
+        sleep_sec = _seconds_until_next_trigger()
+        logger.info("hermes_rule_quality_next_tick", sleep_sec=sleep_sec)
+        await asyncio.sleep(sleep_sec)
+
+
+async def analyze_once() -> dict[str, int]:
+    """一次分析: 找噪音 rule + LLM 分析真因 + 推建議 + aol 留痕.
+
+    2026-04-19 P0 修: 加 daily leader_lock 避免多 Pod 重複推.
+    """
+    from src.services.ai_advisory_helpers import try_acquire_daily_lock
+
+    if not await try_acquire_daily_lock("hermes_rule_quality"):
+        logger.info("hermes_analyze_skipped_not_leader")
+        return {"skipped": "not_leader"}
+
+    started_ms = _time.time()
+    stats = {"noisy_rules": 0, "llm_analyzed": 0, "advisories_written": 0, "telegram_sent": 0}
+    error_msg: str | None = None
+    llm_analyses: dict[str, dict[str, Any]] = {}
+
+    try:
+        noisy = await _fetch_noisy_rules()
+        stats["noisy_rules"] = len(noisy)
+
+        # v2 升級: 對每條 noisy rule 跑 LLM 分析真因 + 具體建議
+        for r in noisy:
+            analysis = await _llm_analyze_noisy_rule(r)
+            if analysis:
+                llm_analyses[r["rule_name"]] = analysis
+                stats["llm_analyzed"] += 1
+
+        for r in noisy:
+            ok = await _write_advisory_aol(r, llm_analyses.get(r["rule_name"]))
+            if ok:
+                stats["advisories_written"] += 1
+
+        if noisy:
+            sent = await _send_telegram_summary(noisy, llm_analyses)
+            stats["telegram_sent"] = 1 if sent else 0
+
+    except Exception as e:
+        error_msg = f"{type(e).__name__}: {e}"[:1000]
+        logger.exception("hermes_analyze_once_failed", error=error_msg)
+
+    duration_ms = int((_time.time() - started_ms) * 1000)
+    logger.info(
+        "hermes_rule_quality_once_done",
+        noisy=stats["noisy_rules"],
+        llm_analyzed=stats["llm_analyzed"],
+        advisories=stats["advisories_written"],
+        telegram_sent=stats["telegram_sent"],
+        duration_ms=duration_ms,
+    )
+    return stats
+
+
+# ============================================================================
+# v2 LLM 分析 — 統帥鐵律「朝 AI 自主化方向」
+# ============================================================================
+
+_LLM_ANALYZE_PROMPT = """你是 AWOOOI SRE 告警規則品質分析專家。以下是一條 Prometheus alerting rule 過去 30 天的統計,請分析假報真因並提出具體改進建議。
+
+## 告警規則
+- rule_name: {rule_name}
+- severity: {severity}
+- expr: {expr}
+- for: {duration_seconds}s
+- labels: {labels}
+- annotations: {annotations}
+
+## 過去 30 天統計
+- true_positive (確實解決的): {tp}
+- false_positive (有破壞性動作但 EXPIRED 沒人理): {fp}
+- noise_rate: {noise_rate}
+
+## 輸出規格 (必須是合法 JSON,純 JSON 無前後文字)
+{{
+  "probable_root_causes": ["3-4 個候選真因,繁中"],
+  "recommended_actions": [
+    {{"action": "adjust_threshold|add_for_duration|refine_labels|deprecate|split_rule|keep_as_is", "detail": "具體怎麼做,繁中一句話"}}
+  ],
+  "confidence": 0.0-1.0,
+  "should_deprecate": true/false
+}}
+
+## 分析思路
+1. 看 expr 是否過於敏感 (閾值太低 / 沒有 for: window)
+2. 看 annotations 是否暗示「這是真實需要處理的問題」但被 AI 判 NO_ACTION → 可能是 action 流程問題而非規則問題
+3. 考慮 severity warning/critical 是否合理
+"""
+
+
+async def _llm_analyze_noisy_rule(rule: dict[str, Any]) -> dict[str, Any] | None:
+    """用 OpenClaw (多 provider) 分析噪音真因. 失敗回 None 不阻塞.
+
+    2026-04-19 P1.2 重構: 使用 llm_json_parser.parse_llm_json_response 共用 helper
+    (原 30 行重複 3-path parse 邏輯已抽出到 services/llm_json_parser.py).
+    """
+    try:
+        import json as _j
+        from src.services.llm_json_parser import parse_llm_json_response
+        from src.services.openclaw import get_openclaw
+
+        prompt = _LLM_ANALYZE_PROMPT.format(
+            rule_name=rule["rule_name"],
+            severity=rule["severity"] or "-",
+            expr=(rule.get("expr") or "")[:500],
+            duration_seconds=rule.get("duration_seconds") or 0,
+            labels=_j.dumps(rule.get("labels", {}), ensure_ascii=False)[:300],
+            annotations=_j.dumps(rule.get("annotations", {}), ensure_ascii=False)[:300],
+            tp=rule["tp"],
+            fp=rule["fp"],
+            noise_rate=f"{rule['noise_rate']:.1%}",
+        )
+        openclaw = get_openclaw()
+        text, provider, success = await openclaw.call(prompt)
+        if not success or not text:
+            return None
+
+        parsed = parse_llm_json_response(
+            text,
+            required_key="recommended_actions",
+            logger_context=f"hermes:{rule['rule_name']}",
+        )
+        if parsed:
+            parsed["_llm_provider"] = provider
+        return parsed
+    except Exception as e:
+        logger.warning("hermes_llm_analyze_error", rule=rule["rule_name"], error=str(e))
+        return None
+
+
+# ============================================================================
+# 資料查詢
+# ============================================================================
+
+async def _fetch_noisy_rules() -> list[dict[str, Any]]:
+    """撈 noise_rate >= 0.7 且樣本 >= 5 的 rules."""
+    from sqlalchemy import text as _sql
+    from src.db.base import get_db_context
+
+    try:
+        async with get_db_context() as db:
+            result = await db.execute(
+                _sql(f"""
+                    SELECT
+                        rule_id, rule_name, severity,
+                        true_positive_count, false_positive_count, noise_rate,
+                        last_fired_at, review_status
+                    FROM alert_rule_catalog
+                    WHERE noise_rate >= :thr
+                      AND (true_positive_count + false_positive_count) >= :min_sample
+                      AND (review_status IS NULL OR review_status = 'approved')
+                    ORDER BY noise_rate DESC, (true_positive_count + false_positive_count) DESC
+                """),
+                {"thr": _NOISE_THRESHOLD, "min_sample": _MIN_SAMPLE_SIZE},
+            )
+            return [
+                {
+                    "rule_id": r.rule_id,
+                    "rule_name": r.rule_name,
+                    "severity": r.severity,
+                    "tp": int(r.true_positive_count or 0),
+                    "fp": int(r.false_positive_count or 0),
+                    "noise_rate": float(r.noise_rate) if r.noise_rate else 0.0,
+                    "last_fired_at": r.last_fired_at,
+                    "review_status": r.review_status,
+                }
+                for r in result.fetchall()
+            ]
+    except Exception as e:
+        logger.warning("fetch_noisy_rules_failed", error=str(e))
+        return []
+
+
+# ============================================================================
+# 建議寫入 (aol only,不改 rule 本身)
+# ============================================================================
+
+async def _write_advisory_aol(rule: dict[str, Any], llm_analysis: dict[str, Any] | None = None) -> bool:
+    """寫 aol(rule_rejected) — 紀錄 AI 建議人工審查 + LLM 分析結果."""
+    try:
+        from sqlalchemy import text as _sql
+        from src.db.base import get_db_context
+
+        input_payload = {
+            "rule_name": rule["rule_name"],
+            "severity": rule["severity"],
+            "noise_rate": rule["noise_rate"],
+            "true_positive_count": rule["tp"],
+            "false_positive_count": rule["fp"],
+        }
+        output_payload: dict[str, Any] = {
+            "proposed_action": "review_or_deprecate",
+            "reason": (
+                f"過去 30d noise_rate {rule['noise_rate']:.1%} "
+                f"(tp={rule['tp']}, fp={rule['fp']}),"
+                f"假報過多應考慮 deprecate 或改進 expr"
+            ),
+            "requires_human_decision": True,
+        }
+        if llm_analysis:
+            output_payload["llm_analysis"] = llm_analysis
+
+        async with get_db_context() as db:
+            await db.execute(
+                _sql("""
+                    INSERT INTO automation_operation_log (
+                        operation_type, actor, status,
+                        input, output, tags
+                    ) VALUES (
+                        'rule_rejected',
+                        'hermes_rule_quality',
+                        'success',
+                        CAST(:input AS jsonb),
+                        CAST(:output AS jsonb),
+                        :tags
+                    )
+                """),
+                {
+                    "input": _json.dumps(input_payload, ensure_ascii=False),
+                    "output": _json.dumps(output_payload, ensure_ascii=False),
+                    "tags": ["hermes", "rule_quality", "advisory"],
+                },
+            )
+        return True
+    except Exception as e:
+        logger.warning("write_advisory_aol_failed", rule=rule["rule_name"], error=str(e))
+        return False
+
+
+# ============================================================================
+# Telegram 推送
+# ============================================================================
+
+async def _send_telegram_summary(
+    noisy: list[dict[str, Any]],
+    llm_analyses: dict[str, dict[str, Any]] | None = None,
+) -> bool:
+    """推 Telegram 摘要訊息給 SRE group,含 LLM 分析結果 + 互動按鈕 (P0 修)."""
+    try:
+        import html
+        from src.core.config import settings
+        from src.services.ai_advisory_helpers import build_ai_advisory_keyboard, is_snoozed
+        from src.services.telegram_gateway import get_telegram_gateway
+
+        if not settings.OPENCLAW_TG_CHAT_ID:
+            logger.info("hermes_telegram_skip_no_chat_id")
+            return False
+
+        # Snooze check: 以第一條 noisy rule_name 為 key
+        primary_rule = noisy[0]["rule_name"] if noisy else "unknown"
+        if await is_snoozed("rule_quality", primary_rule):
+            logger.info("hermes_rule_snoozed", rule=primary_rule)
+            return False
+
+        llm_analyses = llm_analyses or {}
+        lines = [
+            "🔍 <b>Hermes 規則品質檢測 (AI 分析)</b>",
+            f"檢測到 {len(noisy)} 條規則噪音率 ≥ {_NOISE_THRESHOLD:.0%},請統帥審查:",
+            "",
+        ]
+        for r in noisy[:8]:  # LLM 分析含建議,單條訊息較長,只秀 8 條
+            safe_name = html.escape(r["rule_name"])
+            lines.append(
+                f"🟡 <code>{safe_name}</code> — noise {r['noise_rate']:.1%} (tp={r['tp']} fp={r['fp']})"
+            )
+            ai = llm_analyses.get(r["rule_name"])
+            if ai:
+                deprecate = ai.get("should_deprecate")
+                conf = ai.get("confidence", 0.0)
+                lines.append(f"   AI 判定: should_deprecate={deprecate} confidence={conf:.0%}")
+                actions = ai.get("recommended_actions", []) or []
+                for act in actions[:2]:  # 最多秀前 2 個建議
+                    safe_detail = html.escape(str(act.get("detail", ""))[:120])
+                    lines.append(f"   ▸ <i>{html.escape(str(act.get('action', '')))}</i>: {safe_detail}")
+            else:
+                lines.append("   (LLM 分析不可用,僅依噪音率判斷)")
+            lines.append("")
+        if len(noisy) > 8:
+            lines.append(f"…還有 {len(noisy) - 8} 條,見 automation_operation_log")
+        lines.append("決策: 人工 UPDATE alert_rule_catalog SET review_status='deprecated' WHERE rule_name='...'")
+
+        msg = "\n".join(lines)
+        keyboard = build_ai_advisory_keyboard(
+            advisory_type="rule_quality",
+            advisory_id=primary_rule,
+            include_view=False,
+            include_produce_cmd=False,
+        )
+
+        tg = get_telegram_gateway()
+        await tg._send_request("sendMessage", {  # type: ignore[attr-defined]
+            "chat_id": settings.OPENCLAW_TG_CHAT_ID,
+            "text": msg,
+            "parse_mode": "HTML",
+            "disable_web_page_preview": True,
+            "reply_markup": keyboard,
+        })
+        return True
+    except Exception as e:
+        logger.warning("hermes_telegram_send_failed", error=str(e))
+        return False
+
+
+# ============================================================================
+# 時間
+# ============================================================================
+
+def _seconds_until_next_trigger() -> float:
+    tz_taipei = timezone(timedelta(hours=8))
+    now = datetime.now(tz_taipei)
+    today_trigger = now.replace(hour=_DAILY_TRIGGER_HOUR_TAIPEI, minute=0, second=0, microsecond=0)
+    if now >= today_trigger:
+        today_trigger = today_trigger + timedelta(days=1)
+    delta = (today_trigger - now).total_seconds()
+    return max(300.0, min(delta, 25 * 3600))
--- a/apps/api/src/jobs/incident_analysis_sweeper.py
+++ b/apps/api/src/jobs/incident_analysis_sweeper.py
@@ -0,0 +1,142 @@
+"""
+Incident Analysis Sweeper — 自動觸發 INVESTIGATING 事件 AI 分析
+================================================================
+問題背景:
+  Signal Worker 創建 Incident 後，AI 分析 (decision_manager) 原本只在
+  GET /api/v1/incidents 被呼叫時才觸發 (背景 fire-and-forget)。
+  若前端沒人看或 Telegram Bot 未呼叫該端點，新 Incident 永遠沒有 AI 分析。
+
+解法:
+  每 90 秒掃描 INVESTIGATING 狀態且無 decision token 的 Incident，
+  自動在背景觸發 get_or_create_decision()。
+
+限流:
+  Semaphore(3) — 避免並發壓垮 OPENCLAW_NEMO/Ollama
+  每批最多 5 個 incident，避免啟動雪崩
+
+Key 格式說明:
+  decision token 儲存為 decision:DEC-{HEX}，內部 incident_id 欄位對應 INC-*。
+  使用 sweeper_done:{incident_id} 輕量標記避免重複掃描。
+  get_or_create_decision() 本身已有 COMPLETED/READY 去重，雙重保護。
+
+2026-04-16 Claude Sonnet 4.6 Asia/Taipei — 修正 key 格式 BUG
+"""
+from __future__ import annotations
+
+import asyncio
+
+import structlog
+
+from src.models.incident import Incident, IncidentStatus, Severity
+
+logger = structlog.get_logger(__name__)
+
+_SWEEP_INTERVAL_SEC = 90      # 每 90 秒掃一次
+_MAX_BATCH = 5                # 每批最多 5 個
+_SEMAPHORE_LIMIT = 3          # 最多 3 個並發 AI 分析
+_DONE_MARKER_PREFIX = "sweeper_done:"   # 輕量標記：已觸發過分析
+_DONE_MARKER_TTL = 3600                 # 1 小時 TTL，後續由 get_or_create 去重
+# 2026-04-16 ogt: 只處理 48h 內的 incident，避免首次啟動把所有歷史舊案洗版到 Telegram
+_MAX_INCIDENT_AGE_HOURS = 48
+
+
+async def run_incident_analysis_sweeper() -> None:
+    """
+    永久迴圈：每 90 秒自動為未分析的 INVESTIGATING Incident 觸發 AI 分析。
+    由 main.py lifespan 透過 asyncio.create_task() 啟動。
+    """
+    logger.info("incident_analysis_sweeper_started", interval_sec=_SWEEP_INTERVAL_SEC)
+    sem = asyncio.Semaphore(_SEMAPHORE_LIMIT)
+
+    while True:
+        try:
+            await _sweep_once(sem)
+        except Exception as e:
+            logger.warning("incident_analysis_sweeper_error", error=str(e))
+
+        await asyncio.sleep(_SWEEP_INTERVAL_SEC)
+
+
+async def _sweep_once(sem: asyncio.Semaphore) -> None:
+    """
+    執行一次掃描：找出沒有 decision token 的 INVESTIGATING incidents，
+    在背景觸發 AI 分析。
+
+    Decision token key 格式: decision:DEC-{HEX12} (非 decision:INC-*)
+    使用 sweeper_done:{incident_id} 輕量標記避免重複觸發。
+    """
+    from src.services.decision_manager import get_decision_manager
+    from src.services.incident_service import get_incident_service
+    from src.core.redis_client import get_redis
+
+    redis = get_redis()
+    incident_service = get_incident_service()
+    dm = get_decision_manager()
+
+    # 取得所有 INVESTIGATING incidents
+    try:
+        incidents: list[Incident] = await incident_service.get_active_incidents()
+    except Exception as e:
+        logger.warning("sweeper_get_incidents_failed", error=str(e))
+        return
+
+    if not incidents:
+        return
+
+    # 過濾：只處理 48h 內的 incident（避免首次啟動把全部歷史舊案洗版 Telegram）
+    from datetime import datetime, timezone, timedelta
+    now_utc = datetime.now(timezone.utc)
+    cutoff = now_utc - timedelta(hours=_MAX_INCIDENT_AGE_HOURS)
+
+    recent_incidents = []
+    for incident in incidents:
+        created = getattr(incident, "created_at", None)
+        if created:
+            # 確保 created_at 有時區資訊
+            if created.tzinfo is None:
+                created = created.replace(tzinfo=timezone.utc)
+            if created >= cutoff:
+                recent_incidents.append(incident)
+        else:
+            # 沒有 created_at 的舊資料：跳過
+            pass
+
+    if not recent_incidents:
+        return
+
+    # 找出尚未觸發過分析的 (用輕量標記，不掃描 decision:DEC-* 全集)
+    unanalyzed = []
+    for incident in recent_incidents:
+        done_key = f"{_DONE_MARKER_PREFIX}{incident.incident_id}"
+        if not await redis.exists(done_key):
+            unanalyzed.append(incident)
+
+    if not unanalyzed:
+        return
+
+    # 限制每批
+    batch = unanalyzed[:_MAX_BATCH]
+    logger.info(
+        "sweeper_triggering_analysis",
+        total_unanalyzed=len(unanalyzed),
+        batch_size=len(batch),
+    )
+
+    async def _analyze(incident: Incident) -> None:
+        async with sem:
+            try:
+                timeout = 120.0 if incident.severity in (Severity.P0, Severity.P1) else 180.0
+                await dm.get_or_create_decision(incident=incident, timeout_sec=timeout)
+                # 設 done 標記，避免下次掃描重複觸發
+                done_key = f"{_DONE_MARKER_PREFIX}{incident.incident_id}"
+                await redis.set(done_key, "1", ex=_DONE_MARKER_TTL)
+                logger.info("sweeper_analysis_done", incident_id=incident.incident_id)
+            except Exception as e:
+                logger.warning(
+                    "sweeper_analysis_failed",
+                    incident_id=incident.incident_id,
+                    error=str(e),
+                )
+
+    tasks = [asyncio.create_task(_analyze(inc)) for inc in batch]
+    await asyncio.gather(*tasks, return_exceptions=True)
--- a/apps/api/src/jobs/kb_rot_cleaner.py
+++ b/apps/api/src/jobs/kb_rot_cleaner.py
@@ -0,0 +1,247 @@
+"""
+AWOOOI AIOps Phase 6 — KB 腐爛清理 Job
+=======================================
+職責：月度巡檢知識庫（knowledge_entries）中腐爛的知識條目，
+      標記引用了已廢棄資源的條目為 stale，並寫入 ai_governance_events。
+
+「腐爛」的三種形態：
+  ROT-1  廢棄 K8s API 版本引用（extensions/v1beta1、apps/v1beta1、v1beta2）
+  ROT-2  過時 Prometheus query pattern（已知廢棄 metric 名稱前綴）
+  ROT-3  超過 180 天未被引用且成功率為 0 的 incident_case 條目
+
+設計原則：
+1. 只讀掃描 + 標記（不刪除任何 entry，符合 archive_not_delete 鐵律）
+2. 標記方式：status = 'archived' + tags 追加 'kb_rot_detected'
+3. 掃描失敗 → 記錄 error，不拋出，不影響主路徑
+4. 每次執行結果寫 ai_governance_events（event_type=kb_stale）
+
+ADR-087: AI 自我治理閉環
+2026-04-15 ogt + Claude Sonnet 4.6（亞太）: Phase 6 初始建立
+"""
+
+from __future__ import annotations
+
+import re
+from dataclasses import dataclass, field
+from datetime import timedelta
+
+import structlog
+from sqlalchemy import select, update
+
+from src.db.base import get_session_factory
+from src.db.models import AiGovernanceEvent, KnowledgeEntryRecord
+from src.utils.timezone import now_taipei
+
+logger = structlog.get_logger(__name__)
+
+# ─────────────────────────────────────────────────────────────────────────────
+# 腐爛偵測規則（不可寫死 action，只標記 stale）
+# ─────────────────────────────────────────────────────────────────────────────
+
+# ROT-1: 廢棄 K8s API 版本（Kubernetes 1.16+ 已移除）
+DEPRECATED_K8S_APIS = [
+    "extensions/v1beta1",
+    "apps/v1beta1",
+    "apps/v1beta2",
+    "networking.k8s.io/v1beta1",
+    "policy/v1beta1",
+    "rbac.authorization.k8s.io/v1beta1",
+]
+
+# ROT-2: 廢棄 Prometheus metric 前綴（已知改名的 metric pattern）
+DEPRECATED_PROM_PATTERNS = [
+    r"container_cpu_used_total",          # → container_cpu_usage_seconds_total
+    r"kube_pod_container_status_restarts$",  # → kube_pod_container_status_restarts_total
+    r"http_requests_total\{.*le=",        # 錯誤 histogram 用法
+]
+
+# ROT-3: 未引用 + 零成功率條目的老化天數
+STALE_AGE_DAYS = 180
+
+
+# ─────────────────────────────────────────────────────────────────────────────
+# Data Types
+# ─────────────────────────────────────────────────────────────────────────────
+
+@dataclass
+class RotScanResult:
+    """KB 腐爛掃描結果"""
+    total_scanned: int
+    stale_ids: list[str] = field(default_factory=list)
+    rot_reasons: dict[str, list[str]] = field(default_factory=dict)
+    # rot_reasons: {entry_id: ["ROT-1: extensions/v1beta1", ...]}
+    scanned_at: str = field(default_factory=lambda: now_taipei().isoformat())
+
+    @property
+    def stale_count(self) -> int:
+        return len(self.stale_ids)
+
+    def to_dict(self) -> dict:
+        return {
+            "total_scanned": self.total_scanned,
+            "stale_count": self.stale_count,
+            "stale_ids": self.stale_ids[:50],  # 最多記錄前 50 個
+            "rot_reasons_sample": {k: v for k, v in list(self.rot_reasons.items())[:10]},
+            "scanned_at": self.scanned_at,
+        }
+
+
+# ─────────────────────────────────────────────────────────────────────────────
+# Main Job
+# ─────────────────────────────────────────────────────────────────────────────
+
+class KbRotCleaner:
+    """
+    KB 腐爛清理 Job（月度執行）
+
+    Usage:
+        cleaner = KbRotCleaner()
+        result = await cleaner.run()
+    """
+
+    async def run(self) -> RotScanResult:
+        """
+        完整執行：掃描 → 標記 stale → 寫 governance event。
+
+        Returns:
+            RotScanResult
+        """
+        from src.core.feature_flags import aiops_flags
+        if not aiops_flags.is_sub_flag_enabled("AIOPS_P6_KB_ROT_CLEANER"):
+            logger.info("kb_rot_cleaner_skipped_feature_flag")
+            return RotScanResult(total_scanned=0)
+
+        try:
+            result = await self._scan()
+            if result.stale_count > 0:
+                await self._mark_stale(result)
+                await self._save_event(result)
+            else:
+                logger.info("kb_rot_scan_clean", total_scanned=result.total_scanned)
+            return result
+        except Exception as e:
+            logger.error("kb_rot_cleaner_error", error=str(e))
+            return RotScanResult(total_scanned=0)
+
+    async def _scan(self) -> RotScanResult:
+        """掃描所有 approved / draft 條目，找出腐爛項目。"""
+        stale_ids: list[str] = []
+        rot_reasons: dict[str, list[str]] = {}
+        total = 0
+
+        async with get_session_factory()() as session:
+            # 只掃 active 狀態（非 archived）
+            q = await session.execute(
+                select(KnowledgeEntryRecord).where(
+                    KnowledgeEntryRecord.status.in_(["approved", "draft", "review"])
+                )
+            )
+            entries = q.scalars().all()
+            total = len(entries)
+
+            stale_cutoff = now_taipei() - timedelta(days=STALE_AGE_DAYS)
+
+            for entry in entries:
+                reasons: list[str] = []
+
+                content = (entry.content or "").lower()
+                title = (entry.title or "").lower()
+                combined = content + " " + title
+
+                # ROT-1: 廢棄 K8s API
+                for api in DEPRECATED_K8S_APIS:
+                    if api.lower() in combined:
+                        reasons.append(f"ROT-1: 廢棄 K8s API {api}")
+
+                # ROT-2: 廢棄 Prometheus pattern
+                for pattern in DEPRECATED_PROM_PATTERNS:
+                    if re.search(pattern, combined):
+                        reasons.append(f"ROT-2: 廢棄 Prom metric pattern {pattern[:40]}")
+
+                # ROT-3: 老化未引用（incident_case 且 180 天未更新）
+                if (
+                    entry.entry_type == "incident_case"
+                    and entry.updated_at < stale_cutoff
+                    and entry.view_count == 0
+                ):
+                    reasons.append(
+                        f"ROT-3: 超過 {STALE_AGE_DAYS}d 未引用 "
+                        f"(last_updated={entry.updated_at.strftime('%Y-%m-%d')})"
+                    )
+
+                if reasons:
+                    stale_ids.append(entry.id)
+                    rot_reasons[entry.id] = reasons
+
+        logger.info(
+            "kb_rot_scan_complete",
+            total=total,
+            stale_count=len(stale_ids),
+        )
+        return RotScanResult(
+            total_scanned=total,
+            stale_ids=stale_ids,
+            rot_reasons=rot_reasons,
+        )
+
+    async def _mark_stale(self, result: RotScanResult) -> None:
+        """
+        將腐爛條目標記為 archived，並追加 kb_rot_detected tag。
+
+        符合 archive_not_delete 鐵律：只封存，不刪除。
+        """
+        if not result.stale_ids:
+            return
+
+        async with get_session_factory()() as session:
+            # 逐條更新（避免 bulk update 覆蓋 tags JSONB）
+            q = await session.execute(
+                select(KnowledgeEntryRecord).where(
+                    KnowledgeEntryRecord.id.in_(result.stale_ids)
+                )
+            )
+            entries = q.scalars().all()
+
+            for entry in entries:
+                entry.status = "archived"
+                tags = list(entry.tags or [])
+                if "kb_rot_detected" not in tags:
+                    tags.append("kb_rot_detected")
+                entry.tags = tags
+
+            await session.commit()
+
+        logger.warning(
+            "kb_rot_entries_archived",
+            count=len(result.stale_ids),
+            entry_ids=result.stale_ids[:10],
+        )
+
+    async def _save_event(self, result: RotScanResult) -> None:
+        """寫 kb_stale 事件到 ai_governance_events。"""
+        try:
+            async with get_session_factory()() as session:
+                event = AiGovernanceEvent(
+                    event_type="kb_stale",
+                    details=result.to_dict(),
+                    resolved=False,
+                )
+                session.add(event)
+                await session.commit()
+                logger.info("kb_rot_event_saved", stale_count=result.stale_count)
+        except Exception as e:
+            logger.error("kb_rot_event_save_error", error=str(e))
+
+
+# ─────────────────────────────────────────────────────────────────────────────
+# Singleton
+# ─────────────────────────────────────────────────────────────────────────────
+
+_cleaner: KbRotCleaner | None = None
+
+
+def get_kb_rot_cleaner() -> KbRotCleaner:
+    global _cleaner
+    if _cleaner is None:
+        _cleaner = KbRotCleaner()
+    return _cleaner
--- a/Show More
+++ b/Show More