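docs: Phase 15 observability architecture update for Skills 04/05

Skill 04 v1.5:
- Add Phase 15 three-layer observability architecture section
- Document Deep Linking URL formats
- Add Trace Context propagation diagram

Skill 05 v1.3:
- Add Phase 15 observability testing section
- Health-check script for the three systems
- Trace Context verification tests

References:
- project_phase15_langfuse.md (full Phase 15 record)
- project_phase17_tech_debt.md (tech-debt plan)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
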
@@ -6,6 +6,72 @@

---

## Document Info

| Field | Value |
|------|-----|
| **Version** | v1.5 |
| **Created** | 2026-03-20 (Taipei) |
| **Created by** | Claude Code |
| **Last modified** | 2026-03-26 03:30 (Taipei) |
| **Modified by** | Claude Code |

### Changelog

| Version | Date | Author | Changes |
|------|------|--------|----------|
| v1.0 | 2026-03-20 | Claude Code | Initial version |
| v1.1 | 2026-03-23 | Claude Code | Lessons from the Worker NetworkPolicy incident |
| v1.2 | 2026-03-25 | Claude Code | CI/CD integration + clawbot-redis → openclaw-redis |
| v1.3 | 2026-03-25 | Claude Code | Added the document info block |
| v1.4 | 2026-03-26 | Claude Code | Added the iron rule for deployment-tier decisions |
| v1.5 | 2026-03-26 | Claude Code | **Phase 15 three-layer observability architecture (Deep Linking)** |


---

## 🔴🔴🔴 Iron Rule: Deployment-Tier Decisions (2026-03-26)

> **Commander's directive**: The deployment location of every service (host / container / K3s) must be professionally evaluated and explicitly confirmed.
> **Details**: `~/.claude/projects/-Users-ogt-awoooi/memory/feedback_deployment_layer_decision.md`

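### Hard-Won Lessons

| Problem | Consequence | Root cause |
|------|------|------|
| Container disappeared | Service outage, data loss | No persistence |
| K3s Pod removed | Configuration lost, rebuild required | Not version-controlled in Git |
| Long repair time | Severe business impact | No backup/rollback mechanism |
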
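### Required Before Deploying a New Service

```
1. ❌ Forbidden: running docker run / kubectl apply directly (without evaluation)
2. ✅ Required: submit a "service deployment evaluation report" first
3. ✅ Required: wait for the Commander to confirm the deployment tier
4. ✅ Required: confirm the backup/rollback mechanism
```
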
### Three Deployment Tiers

| Tier | Use case | Backup method | Examples |
|------|---------|---------|------|
| **Host tier** | Core infrastructure, needs persistence | snapshot / rsync | Harbor, Runner, PostgreSQL |
| **Container tier** | Standalone tools, off the critical path | compose.yml + volume backup | Sentry, Langfuse |
| **K3s tier** | Production applications, needs horizontal scaling | Git YAML + PVC backup | AWOOOI API/Web/Worker |

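### Questions the Evaluation Report Must Answer

| # | Question | Why it matters |
|---|------|-----------|
| 1 | What type of service is it? | Stateless vs. stateful determines how it is deployed |
| 2 | Does it need persistence? | Determines whether a Volume/PVC is needed |
| 3 | Can it tolerate downtime? | Determines the HA requirements |
| 4 | Is there a backup mechanism? | Determines the recovery strategy |
| 5 | How is the configuration versioned? | Determines how it is managed in Git |
| 6 | How is it rolled back? | Determines the deployment strategy |
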

---

## Four-Host Architecture (Hard Boundary)

| Host | IP | Role | Deployed services |
@@ -236,13 +302,13 @@ kubectl get networkpolicy allow-required-egress -n awoooi-prod \
```bash
# Check the consumer group status
ssh ollama@192.168.0.188 \
-  "docker exec clawbot-redis redis-cli -n 10 XINFO GROUPS stream:awoooi_signals"
+  "docker exec openclaw-redis redis-cli -n 10 XINFO GROUPS stream:awoooi_signals"

# Warning sign: consumers > 5 together with many dead Pods

# Destroy and rebuild (Tier 3, requires the Commander's authorization)
ssh ollama@192.168.0.188 \
-  "docker exec clawbot-redis redis-cli -n 10 XGROUP DESTROY stream:awoooi_signals awoooi_workers"
+  "docker exec openclaw-redis redis-cli -n 10 XGROUP DESTROY stream:awoooi_signals awoooi_workers"
```

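The consumer-count warning sign above could also be checked in code rather than by eye. The following is only a sketch under stated assumptions: it takes group info as a list of dicts with `name` and `consumers` keys (the shape redis-py's `xinfo_groups` is generally understood to return), and the function name `find_suspect_groups` and its threshold parameter are hypothetical, not part of the project.

```python
from typing import Dict, List

# Threshold taken from the warning sign above: more than 5 consumers is suspicious.
MAX_HEALTHY_CONSUMERS = 5


def find_suspect_groups(
    groups: List[Dict[str, object]],
    max_consumers: int = MAX_HEALTHY_CONSUMERS,
) -> List[str]:
    """Return the names of consumer groups whose consumer count exceeds the threshold.

    Assumption: each dict carries at least "name" and "consumers".
    """
    suspects = []
    for group in groups:
        if int(group.get("consumers", 0)) > max_consumers:
            suspects.append(str(group["name"]))
    return suspects


if __name__ == "__main__":
    sample = [
        {"name": "awoooi_workers", "consumers": 12, "pending": 40},
        {"name": "other_group", "consumers": 2, "pending": 0},
    ]
    print(find_suspect_groups(sample))
```

A flagged group covers only the first half of the warning sign. Whether the extra consumers belong to dead Pods still has to be confirmed, and destroying the group remains a Tier 3 action that needs authorization.
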
### Post-Rebuild Verification
@@ -304,11 +370,11 @@ kubectl rollout restart deployment/awoooi-worker -n awoooi-prod

# ✅ Correct: clean up external leftovers before restarting
# Step 1: Check for zombie consumers
-ssh ollama@192.168.0.188 "docker exec clawbot-redis redis-cli -n 10 \
+ssh ollama@192.168.0.188 "docker exec openclaw-redis redis-cli -n 10 \
  XINFO GROUPS stream:awoooi_signals"

# Step 2: If the consumer count is abnormally high, clean up first
-ssh ollama@192.168.0.188 "docker exec clawbot-redis redis-cli -n 10 \
+ssh ollama@192.168.0.188 "docker exec openclaw-redis redis-cli -n 10 \
  XGROUP DESTROY stream:awoooi_signals awoooi_workers"

# Step 3: After the restart, the Consumer Group is recreated automatically
@@ -450,6 +516,154 @@ ssh ollama@192.168.0.188 "cat /home/ollama/momo-pro/monitoring/alertmanager.yml"

---

## 🔗 CI/CD Integration (Phase 13.1)

> **Goal**: Git events trigger AI review, and CI failures trigger automatic diagnosis

### GitHub Webhook Integration

```
GitHub Events
    │
    ├─ push → AI code review (PR Review Agent)
    │
    ├─ workflow_run.conclusion=failure → AI root-cause diagnosis
    │
    └─ pull_request → AI security scan
                ↓
┌─────────────────────────────────┐
│ AWOOOI API                      │
│ POST /api/v1/webhooks/github    │
└─────────────────────────────────┘
                ↓
┌─────────────────────────────────┐
│ OpenClaw AI analysis            │
│ - Code quality assessment       │
│ - Security scan                 │
│ - Root-cause diagnosis          │
└─────────────────────────────────┘
                ↓
        Telegram notification
```

### Webhook Endpoint

```yaml
# New endpoint
POST /api/v1/webhooks/github
Headers:
  - X-GitHub-Event: push | workflow_run | pull_request
  - X-Hub-Signature-256: sha256=xxx
Body:
  - repository, sender, commits, workflow_run...
```

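The event-to-analysis mapping in the diagram can be read as a small dispatch function keyed on the `X-GitHub-Event` header. This is a minimal sketch, not the project's handler: the function name `route_github_event` and the returned action labels are hypothetical, and it assumes the `workflow_run` body carries a `workflow_run.conclusion` field as in the diagram.

```python
from typing import Any, Dict, Optional


def route_github_event(event: str, body: Dict[str, Any]) -> Optional[str]:
    """Map a GitHub event to the analysis it should trigger, or None to ignore it.

    The action labels returned here are illustrative placeholders.
    """
    if event == "push":
        return "code_review"
    if event == "workflow_run":
        # Only failed workflow runs are diagnosed; other conclusions are ignored.
        conclusion = (body.get("workflow_run") or {}).get("conclusion")
        return "root_cause_diagnosis" if conclusion == "failure" else None
    if event == "pull_request":
        return "security_scan"
    return None


if __name__ == "__main__":
    failed_run = {"workflow_run": {"conclusion": "failure"}}
    print(route_github_event("workflow_run", failed_run))
```

Returning `None` for unknown events keeps the endpoint tolerant of event types that GitHub sends but the pipeline does not act on.
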
### HMAC Verification

```python
import hmac
import hashlib

def verify_github_signature(payload: bytes, signature: str, secret: str) -> bool:
    """Verify a GitHub webhook signature.

    Warning:
        WEBHOOK_HMAC_SECRET must match the secret configured on GitHub.
    """
    expected = "sha256=" + hmac.new(
        secret.encode(),
        payload,
        hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, signature)
```

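To see the verification behave as intended, the check can be exercised against a locally computed signature. The sketch below repeats the function so that it runs on its own; the `sign` helper, the secret, and the payload are illustrative placeholders, not project values.

```python
import hashlib
import hmac


def verify_github_signature(payload: bytes, signature: str, secret: str) -> bool:
    """Same check as in the section above, repeated so this example is self-contained."""
    expected = "sha256=" + hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)


def sign(payload: bytes, secret: str) -> str:
    """Hypothetical helper: build an X-Hub-Signature-256 value for a payload."""
    return "sha256=" + hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()


secret = "example-secret"            # placeholder, not a real secret
payload = b'{"action": "opened"}'    # placeholder request body
header = sign(payload, secret)

print(verify_github_signature(payload, header, secret))          # True
print(verify_github_signature(payload + b" ", header, secret))   # False: body changed
print(verify_github_signature(payload, header, "wrong-secret"))  # False: secret mismatch
```

The second case shows why the signature should be checked against the raw request bytes: even a one-byte difference, such as re-serialized JSON, makes verification fail.
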

---

## 📊 Five-Host Architecture Update (2026-03-25)

| Host | IP | New services |
|------|-----|----------|
| DevOps | 192.168.0.110 | **Sentry Self-Hosted :9000** |
| K3s | 192.168.0.120 | **awoooi-worker** (dedicated Pod) |

### Deployment Topology Update

```
┌─────────────────────────┐
│ 192.168.0.110           │
│ DevOps vault            │
├─────────────────────────┤
│ [Docker]                │
│ ├─ Harbor :5000         │
│ ├─ GH Runner            │
│ └─ Sentry :9000 ← NEW   │
└─────────────────────────┘
```

### K3s awoooi-prod Service List

| Service | Deployment | NodePort |
|------|------------|----------|
| Frontend | awoooi-web | 32335 |
| Backend | awoooi-api | 32334 |
| **Worker** | awoooi-worker | - |


---

## 👁️ Phase 15 Three-Layer Observability Architecture (2026-03-26)

> **Phase 15 fully complete**: 15.1 Langfuse + 15.2 Trace Context + 15.3 Deep Linking

### How the Three Systems Interconnect

```
┌─────────────┐      trace_id      ┌─────────────┐      trace_id      ┌─────────────┐
│   Sentry    │ ◄────────────────► │   SigNoz    │ ◄────────────────► │  Langfuse   │
│   Errors    │                    │   Traces    │                    │   LLMOps    │
│ :9000 (.110)│                    │ :3301 (.188)│                    │ :3100 (.110)│
└─────────────┘                    └─────────────┘                    └─────────────┘
```

### Deep Linking URL Formats

| System | URL format |
|------|----------|
| SigNoz Trace | `http://192.168.0.188:3301/trace/{trace_id}` |
| Langfuse Trace | `http://192.168.0.110:3100/project/awoooi-openclaw/traces/{id}` |
| Sentry Issue | `http://192.168.0.110:9000/organizations/sentry/issues/{id}/` |

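The URL formats in the table translate directly into small builder functions. The sketch below is an illustration of those formats only and is not the contents of `src/core/deep_linking.py`; the module-level function layout, the `build_links` helper, and its dictionary keys are assumptions.

```python
from typing import Dict, Optional

# Base URLs taken from the table above.
SIGNOZ_BASE = "http://192.168.0.188:3301"
LANGFUSE_BASE = "http://192.168.0.110:3100"
SENTRY_BASE = "http://192.168.0.110:9000"


def signoz_trace_url(trace_id: str) -> str:
    return f"{SIGNOZ_BASE}/trace/{trace_id}"


def langfuse_trace_url(trace_id: str) -> str:
    return f"{LANGFUSE_BASE}/project/awoooi-openclaw/traces/{trace_id}"


def sentry_issue_url(issue_id: str) -> str:
    return f"{SENTRY_BASE}/organizations/sentry/issues/{issue_id}/"


def build_links(
    otel_trace_id: Optional[str] = None,
    langfuse_trace_id: Optional[str] = None,
    sentry_issue_id: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: return only the links whose identifier was supplied."""
    links: Dict[str, str] = {}
    if otel_trace_id:
        links["signoz"] = signoz_trace_url(otel_trace_id)
    if langfuse_trace_id:
        links["langfuse"] = langfuse_trace_url(langfuse_trace_id)
    if sentry_issue_id:
        links["sentry"] = sentry_issue_url(sentry_issue_id)
    return links


if __name__ == "__main__":
    print(build_links(otel_trace_id="abc123", sentry_issue_id="123"))
```

Omitting links for missing identifiers avoids emitting URLs with empty path segments, which would lead to a 404 in the target system.
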
### Trace Context Propagation (Phase 15.2)

```
webhooks.py (Producer)
  └── get_trace_context() → {trace_id, span_id}
        └── XADD stream:signals {..., _trace_id, _span_id}
                          │
                          ▼
signal_worker.py (Consumer)
  └── restore_trace_context() → restores the W3C traceparent
        └── creates a child Span that inherits the original Trace
```

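The producer/consumer hand-off in the diagram amounts to carrying two identifiers through the stream entry and rebuilding a W3C `traceparent` value on the other side. The sketch below shows that round trip with the standard library only; `pack_trace_fields` and `to_traceparent` are hypothetical names and do not reflect how `get_trace_context()` or `restore_trace_context()` are actually implemented.

```python
import re
from typing import Dict, Optional

# W3C traceparent layout: version "00", 32-hex trace-id, 16-hex parent-id, 2-hex flags.
_TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def pack_trace_fields(trace_id: str, span_id: str) -> Dict[str, str]:
    """Producer side: fields to merge into the stream entry alongside the payload."""
    return {"_trace_id": trace_id, "_span_id": span_id}


def to_traceparent(fields: Dict[str, str], sampled: bool = True) -> Optional[str]:
    """Consumer side: rebuild a traceparent value, or None if the fields are unusable."""
    trace_id = fields.get("_trace_id", "")
    span_id = fields.get("_span_id", "")
    flags = "01" if sampled else "00"
    candidate = f"00-{trace_id}-{span_id}-{flags}"
    if not _TRACEPARENT_RE.fullmatch(candidate):
        return None
    # All-zero identifiers are invalid under the W3C Trace Context specification.
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return candidate


if __name__ == "__main__":
    fields = pack_trace_fields("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7")
    print(to_traceparent(fields))
    print(to_traceparent({"_trace_id": "", "_span_id": ""}))
```

Returning `None` for missing or malformed identifiers lets the consumer fall back to starting a fresh trace instead of failing on old or hand-written stream entries.
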
### Usage

```python
from src.core.deep_linking import DeepLinking

# Get the SigNoz URL
url = DeepLinking.signoz_trace_url(trace_id)

# Get all links
links = DeepLinking.get_all_links(
    otel_trace_id="xxx",
    langfuse_trace_id="yyy",
    sentry_issue_id="123"
)
```


---

## References

- `k8s/awoooi-prod/`: K8s Manifests
@@ -457,3 +671,7 @@ ssh ollama@192.168.0.188 "cat /home/ollama/momo-pro/monitoring/alertmanager.yml"
- `docs/HARD_RULES.md`: Absolutely forbidden rules
- `reference_four_hosts.md`: Host architecture reference
- `feedback_alertmanager_awoooi_flow.md`: **🔴 Correct Alertmanager flow**
- `memory/project_phase13_enterprise_aiops.md`: **Phase 13 plan**
- `memory/project_phase15_langfuse.md`: **📊 Phase 15 fully complete**
- `memory/project_phase17_tech_debt.md`: **🔧 Phase 17 tech debt**
- `src/core/deep_linking.py`: **👁️ Deep Linking URL generator**

@@ -6,6 +6,27 @@

---

## Document Info

| Field | Value |
|------|-----|
| **Version** | v1.3 |
| **Created** | 2026-03-20 (Taipei) |
| **Created by** | Claude Code |
| **Last modified** | 2026-03-26 03:30 (Taipei) |
| **Modified by** | Claude Code |

### Changelog

| Version | Date | Author | Changes |
|------|------|--------|----------|
| v1.0 | 2026-03-20 | Claude Code | Initial version |
| v1.1 | 2026-03-24 | Claude Code | Iron rule: mock tests forbidden |
| v1.2 | 2026-03-25 | Claude Code | Added the document info block |
| v1.3 | 2026-03-26 | Claude Code | **Phase 15 observability testing** |


---

## 🔴🔴 Iron Rule: Mock Tests Forbidden (2026-03-24)

> **Commander's explicit instruction**: "Completely forbidden!!!"
@@ -401,9 +422,72 @@ kubectl rollout undo deployment/awoooi-api -n awoooi-prod

---

## 👁️ Phase 15 Observability Testing (2026-03-26)

> **Phase 15 fully complete**: Langfuse + Trace Context + Deep Linking

### Observability System Health Checks

```bash
# SigNoz (Traces)
curl -f http://192.168.0.188:3301/api/v1/services | jq '.data[] | .serviceName'

# Langfuse (LLMOps)
curl -f http://192.168.0.110:3100/api/health

# Sentry (Errors)
curl -f http://192.168.0.110:9000/api/0/internal/health/
```

### Trace Context Verification

```python
# Test Deep Linking URL generation
from src.core.deep_linking import DeepLinking

# Each call should return a valid URL
assert DeepLinking.signoz_trace_url("abc123") == "http://192.168.0.188:3301/trace/abc123"
assert DeepLinking.langfuse_trace_url("lf-123") != ""
assert DeepLinking.sentry_issue_url("456") != ""
```

### Trace Context Propagation Test

```python
from src.core.telemetry import get_trace_context, restore_trace_context

# 1. Returns None when there is no active span
assert get_trace_context() is None

# 2. Fault tolerance of restore
with restore_trace_context(None) as span:
    assert span is not None  # should create a new span

with restore_trace_context({"trace_id": "", "span_id": ""}) as span:
    assert span is not None  # tolerates empty values
```

### Observability Status Report for the Three Systems

```markdown
## Observability System Status Report

| System | URL | Status |
|------|-----|------|
| SigNoz | http://192.168.0.188:3301 | ✅ |
| Langfuse | http://192.168.0.110:3100 | ✅ |
| Sentry | http://192.168.0.110:9000 | ✅ |
| Deep Linking | N/A (built in) | ✅ |
```

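A report in the shape above could be generated from health-check results instead of being written by hand. The following is only a sketch: the function name `render_status_report` and the `(name, url, healthy)` tuple layout are assumptions, and the ❌ marker for a failed check is an illustrative choice not taken from the template.

```python
from typing import List, Tuple


def render_status_report(rows: List[Tuple[str, str, bool]]) -> str:
    """Render (system name, URL, healthy?) tuples as the Markdown table shown above."""
    lines = [
        "## Observability System Status Report",
        "",
        "| System | URL | Status |",
        "|------|-----|------|",
    ]
    for name, url, healthy in rows:
        mark = "✅" if healthy else "❌"  # failure marker is an assumption
        lines.append(f"| {name} | {url} | {mark} |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_status_report([
        ("SigNoz", "http://192.168.0.188:3301", True),
        ("Langfuse", "http://192.168.0.110:3100", True),
        ("Sentry", "http://192.168.0.110:9000", False),
    ]))
```

The boolean for each row would come from the real health-check commands in this section, for example the exit status of each `curl -f` call.
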

---

## References

- `apps/web/playwright.config.ts`: Playwright configuration
- `apps/web/tests/`: E2E test scripts
- `scripts/`: Automated verification scripts
- `apps/web/src/stores/approval.store.ts`: Example of the Polling + Race Condition fix
- `src/core/deep_linking.py`: **👁️ Deep Linking URL generator**
- `src/core/telemetry.py`: **Phase 15.2 Trace Context**
- `memory/project_phase15_langfuse.md`: **📊 Full Phase 15 record**