diff --git a/.agents/skills/04-awoooi-devops-commander.md b/.agents/skills/04-awoooi-devops-commander.md
index 790bb1f9..ac1dc3d5 100644
--- a/.agents/skills/04-awoooi-devops-commander.md
+++ b/.agents/skills/04-awoooi-devops-commander.md
@@ -10,10 +10,10 @@

 | Field | Value |
 |------|-----|
-| **Version** | v1.7 |
+| **Version** | v1.8 |
 | **Created** | 2026-03-20 (Taipei) |
 | **Created by** | Claude Code |
-| **Last modified** | 2026-03-28 16:00 (Taipei) |
+| **Last modified** | 2026-03-28 20:30 (Taipei) |
 | **Modified by** | Claude Code

 ### Changelog
@@ -28,6 +28,7 @@
 | v1.5 | 2026-03-26 | Claude Code | **Phase 15 three-layer observability architecture (Deep Linking)** |
 | v1.6 | 2026-03-26 | Claude Code | **Runner zombie-process fix + CI/CD cancel-in-progress: false** |
 | v1.7 | 2026-03-28 | Claude Code | **K3s production-grade optimization (ADR-033 + Phase K0)** |
+| v1.8 | 2026-03-28 | Claude Code | **Observability endpoint configuration standard (SignOz 121→188 fix)** |

 ---

@@ -299,6 +300,43 @@ kubectl --server=https://192.168.0.125:6443 get nodes

 ---

+## 🔴 Observability endpoint configuration (2026-03-28)
+
+> **Hard-learned lesson**: CI/CD workflows pointed at the wrong OTEL endpoint (121:4318), so telemetry data could not be delivered
+> **Details**: `memory/feedback_signoz_otel_config.md`
+
+### Correct endpoint configuration
+
+| Service | Endpoint | Purpose |
+|------|------|------|
+| **SignOz Web UI** | `http://192.168.0.188:3301` | APM Dashboard |
+| **OTEL gRPC** | `192.168.0.188:24317` | K8s/application traces |
+| **OTEL HTTP** | `http://192.168.0.188:24318` | CI/CD workflows |
+| **Sentry** | `http://192.168.0.110:9000` | Error Tracking |
+| **Langfuse** | `http://192.168.0.110:3100` | LLM Tracing |
+
+### CI/CD configuration (ci.yaml / cd.yaml)
+
+```yaml
+# ✅ Correct
+OTEL_EXPORTER_OTLP_ENDPOINT: http://192.168.0.188:24318
+
+# ❌ Wrong (fixed 2026-03-28)
+OTEL_EXPORTER_OTLP_ENDPOINT: http://192.168.0.121:4318
+```
+
+### Verification commands
+
+```bash
+# Check that the SignOz containers are running
+ssh ollama@192.168.0.188 "docker ps | grep signoz"
+
+# Test the OTEL endpoint
+curl -s http://192.168.0.188:24318/v1/traces -X POST | head -c 100
+```
+
+---
+
 ## 🔴🔴🔴 Alert pipeline E2E verification (ADR-025)

 > **2026-03-26**: A wrong URL path caused 2 days without alerts (`webhook` vs `webhooks`)
diff --git a/.github/workflows/cd.yaml b/.github/workflows/cd.yaml
index
7fc1b253..9022c272 100644
--- a/.github/workflows/cd.yaml
+++ b/.github/workflows/cd.yaml
@@ -44,8 +44,8 @@ env:
   IMAGE_PREFIX: library/awoooi
   LOCAL_CACHE_DIR: /home/wooo/build-cache/awoooi
   OPENCLAW_URL: http://192.168.0.188:8088
-  # OTEL CI/CD monitoring (approved 2026-03-24)
-  OTEL_EXPORTER_OTLP_ENDPOINT: http://192.168.0.121:4318
+  # OTEL CI/CD monitoring (approved 2026-03-24, corrected 2026-03-28: SignOz runs on 188)
+  OTEL_EXPORTER_OTLP_ENDPOINT: http://192.168.0.188:24318
   OTEL_SERVICE_NAME: awoooi-cd
   OTEL_RESOURCE_ATTRIBUTES: service.version=${{ github.sha }},deployment.environment=production

diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml
index 52735f8d..9857932a 100644
--- a/.github/workflows/ci.yaml
+++ b/.github/workflows/ci.yaml
@@ -19,8 +19,8 @@ env:
   NODE_VERSION: '20'
   PNPM_VERSION: '9'
   PYTHON_VERSION: '3.11'
-  # OTEL CI/CD monitoring (approved 2026-03-24)
-  OTEL_EXPORTER_OTLP_ENDPOINT: http://192.168.0.121:4318
+  # OTEL CI/CD monitoring (approved 2026-03-24, corrected 2026-03-28: SignOz runs on 188)
+  OTEL_EXPORTER_OTLP_ENDPOINT: http://192.168.0.188:24318
   OTEL_SERVICE_NAME: awoooi-ci
   OTEL_RESOURCE_ATTRIBUTES: service.version=${{ github.sha }},deployment.environment=ci

diff --git a/docs/LOGBOOK.md b/docs/LOGBOOK.md
index 427e716e..134d8f83 100644
--- a/docs/LOGBOOK.md
+++ b/docs/LOGBOOK.md
@@ -5,7 +5,7 @@

 ---

-## 📍 Current status (2026-03-28 19:00 Taipei)
+## 📍 Current status (2026-03-28 20:30 Taipei)

 | Item | Status |
 |------|------|
@@ -20,6 +20,34 @@
 | **ADR** | ✅ ADR-031 + ADR-032 + **ADR-033 (K3s HA)** |
 | **Chief architect review** | ✅ **Overall review 8.8/10 Strong Pass** |
 | **Sentry Replay** | ✅ **Configured (10% Session + 100% Error)** |
+| **SignOz/OTEL** | ✅ **Configuration corrected (188:24318)** |
+
+---
+
+### 🔴 2026-03-28 SignOz OTEL misconfiguration fix (Day 10, evening, 20:30)
+
+**Status**: ✅ **CI/CD OTEL endpoint configuration corrected**
+
+**Problem found**: CI/CD workflows pointed at the wrong OTEL endpoint
+
+| Config | Wrong value | Correct value |
+|------|--------|--------|
+| `ci.yaml` | `192.168.0.121:4318` | `192.168.0.188:24318` |
+| `cd.yaml` | `192.168.0.121:4318` | `192.168.0.188:24318` |
+
+**Root cause**: SignOz is deployed on host 188, but the CI/CD configuration mistakenly specified 121 (K3s Worker)
+
+**SignOz
actual runtime location** (192.168.0.188):
+- Web UI: `:3301`
+- OTEL Collector gRPC: `:24317`
+- OTEL Collector HTTP: `:24318`
+
+**Files corrected**:
+- `.github/workflows/ci.yaml` (line 23)
+- `.github/workflows/cd.yaml` (line 48)
+- `~/.claude/projects/.../memory/reference_four_hosts.md` (added an observability services table)
+
+**Lesson learned**: a configuration integrity check mechanism needs to be established

 ---
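
The lesson recorded above calls for a configuration integrity check, but this change does not include one. Below is a minimal sketch of one possible check, in Python. It scans workflow text for active `OTEL_EXPORTER_OTLP_ENDPOINT` assignments and reports any value that differs from the OTEL HTTP endpoint documented in this diff. The names `EXPECTED_ENDPOINT`, `find_endpoint_mismatches`, and `check_files` are hypothetical and do not exist in the repository, and treating the documented endpoint as the single source of truth is an assumption.

```python
import re
from pathlib import Path

# Assumption: the OTEL HTTP endpoint documented in this change is the expected value.
EXPECTED_ENDPOINT = "http://192.168.0.188:24318"

# Matches an active YAML assignment such as
#   OTEL_EXPORTER_OTLP_ENDPOINT: http://host:port
# Commented-out lines start with '#', so they do not match.
ENDPOINT_RE = re.compile(r"^\s*OTEL_EXPORTER_OTLP_ENDPOINT:\s*(\S+)")


def find_endpoint_mismatches(text, expected=EXPECTED_ENDPOINT):
    """Return (line_number, value) for each endpoint assignment that differs from `expected`."""
    mismatches = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = ENDPOINT_RE.match(line)
        if match:
            value = match.group(1).strip("'\"")  # tolerate quoted YAML values
            if value != expected:
                mismatches.append((number, value))
    return mismatches


def check_files(paths, expected=EXPECTED_ENDPOINT):
    """Return one human-readable message per mismatch found in the given files."""
    messages = []
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        for number, value in find_endpoint_mismatches(text, expected):
            messages.append(f"{path}:{number}: unexpected OTEL endpoint {value}")
    return messages
```

A CI step could call `check_files([".github/workflows/ci.yaml", ".github/workflows/cd.yaml"])` and fail the job when the returned list is non-empty. Whether such a check should live in CI or in a pre-commit hook is left open here.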