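docs: add ADR-025 alert chain E2E validation + update Skills

Added:
- ADR-025: alert chain E2E validation architecture (lessons from the 2026-03-26 incident)

Updated:
- ADR-011: add DNS rule best practices (Appendix B)
- Skill 04: add NetworkPolicy DNS rules + CoreDNS configuration
- Skill 05: add alert chain Smoke Test requirement
- CLAUDE.md: add alert chain validation to the pre-task required reading

Incident root causes:
1. Wrong URL path (webhook vs webhooks)
2. NetworkPolicy DNS rule labels did not match
3. CoreDNS upstream DNS depended on systemd-resolved

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>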
@@ -179,6 +179,78 @@ metadata:
- `📝` Purpose
- `⚠️` Caveats

### 🔴🔴 NetworkPolicy DNS rules (2026-03-26)

> **Hard-won lesson**: wrong labels in the DNS rule caused 2 days without alerts!

```yaml
# ❌ Wrong: uses labels that do not exist
- ports:
    - port: 53
  to:
    - podSelector:
        matchLabels:
          environment: prod    # CoreDNS does not have this label!
          k8s-app: kube-dns
          system: awoooi       # CoreDNS does not have this label!

# ✅ Correct: use a namespace selector
- ports:
    - port: 53
      protocol: UDP
    - port: 53
      protocol: TCP
  to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
```

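The failure follows from how `matchLabels` works: every listed key/value pair must be present on the target Pod, so a single label that CoreDNS lacks makes the selector match nothing. A minimal Python sketch of that AND semantics is shown here; the function name `selector_matches` and the sample label sets are illustrative assumptions, not code from the repository.

```python
def selector_matches(match_labels: dict[str, str], pod_labels: dict[str, str]) -> bool:
    """Return True only if every matchLabels pair is present on the pod."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


# Labels assumed for the CoreDNS pods, per the incident notes (k8s-app=kube-dns only).
coredns_labels = {"k8s-app": "kube-dns"}

broken_selector = {"environment": "prod", "k8s-app": "kube-dns", "system": "awoooi"}
fixed_selector = {"k8s-app": "kube-dns"}

print(selector_matches(broken_selector, coredns_labels))  # False
print(selector_matches(fixed_selector, coredns_labels))   # True
```

The same check could be run against the labels reported by `kubectl get pods -n kube-system --show-labels` before a NetworkPolicy is applied.
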
### 🔴🔴 CoreDNS upstream DNS configuration

> **Hard-won lesson**: 127.0.0.53 (systemd-resolved) cannot be used from inside a container

```yaml
# ❌ Wrong: use /etc/resolv.conf (points to 127.0.0.53)
forward . /etc/resolv.conf

# ✅ Correct: use real DNS servers
forward . 8.8.8.8 1.1.1.1
```

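One way to catch this before CoreDNS is deployed is to inspect the host's resolv.conf for loopback nameservers. The sketch below is a hedged illustration using only the Python standard library; the function name `loopback_nameservers` and the sample file content are assumptions.

```python
import ipaddress


def loopback_nameservers(resolv_conf_text: str) -> list[str]:
    """Return nameserver entries in resolv.conf text that point at loopback."""
    found = []
    for line in resolv_conf_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            try:
                address = ipaddress.ip_address(parts[1])
            except ValueError:
                continue  # skip entries that are not plain IP addresses
            if address.is_loopback:
                found.append(parts[1])
    return found


# Sample content resembling a host managed by systemd-resolved (illustrative).
sample = "nameserver 127.0.0.53\noptions edns0 trust-ad\nsearch lan\n"
print(loopback_nameservers(sample))  # ['127.0.0.53']
```

A non-empty result suggests that `forward . /etc/resolv.conf` would hand CoreDNS an address it cannot reach from inside its own network namespace.
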
---

## 🔴🔴🔴 Alert chain E2E validation (ADR-025)

> **2026-03-26**: a wrong URL path caused 2 days without alerts (`webhook` vs `webhooks`)

### Post-deployment Smoke Test (mandatory)

```bash
# Must be run after every deployment
curl -s -X POST "$API_URL/api/v1/webhooks/alertmanager" \
  -H 'Content-Type: application/json' \
  -d '{"receiver":"smoke-test","status":"firing","alerts":[...]}' \
  | jq -e '.success == true' || exit 1
```

### URL path convention

| Correct | Wrong |
|-----|------|
| `/api/v1/webhooks/alertmanager` | `/api/v1/webhook/alertmanager` |
| Plural form `webhooks` | Singular form `webhook` |

### Alertmanager ConfigMap change procedure

1. Extract the webhook URL
2. Test URL reachability with curl
3. The response must be 200 or 422 (malformed payload, but the endpoint exists)
4. Validation fails → **block the apply**

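The gate in steps 3 and 4 reduces to a small decision on the HTTP status code. The sketch below assumes the status code has already been obtained by the probe (for example with curl's `-w '%{http_code}'` output option); the names `ACCEPTED_STATUS_CODES`, `webhook_url_is_valid` and `gate_apply` are hypothetical.

```python
ACCEPTED_STATUS_CODES = {200, 422}  # 422: payload rejected, but the route exists


def webhook_url_is_valid(status_code: int) -> bool:
    """Decide whether a probed webhook URL may be written to the ConfigMap."""
    return status_code in ACCEPTED_STATUS_CODES


def gate_apply(status_code: int) -> str:
    """Map the probe result to the action described in step 4."""
    return "apply" if webhook_url_is_valid(status_code) else "block"


print(gate_apply(200))  # apply
print(gate_apply(422))  # apply
print(gate_apply(404))  # block
```

Accepting 422 lets the probe send an empty or dummy payload without creating a real alert, while a 404 such as the one in the incident still blocks the change.
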
---

## Turborepo cache hardening protocol

@@ -10,7 +10,7 @@

| Field | Value |
|------|-----|
| **Version** | v1.4 |
| **Version** | v1.5 |
| **Created** | 2026-03-20 (Taipei) |
| **Created by** | Claude Code |
| **Last modified** | 2026-03-26 03:30 (Taipei) |
@@ -25,6 +25,7 @@
| v1.2 | 2026-03-25 | Claude Code | Added the document information block |
| v1.3 | 2026-03-26 | Claude Code | **Phase 15 observability tests** |
| v1.4 | 2026-03-26 | Claude Code | **Runner zombie process diagnostic procedure** |
| v1.5 | 2026-03-26 | Claude Code | **LLM testing strategy** (Chief Architect review P3) |

---

@@ -114,6 +115,68 @@ cd apps/web && node scripts/verify-frontend.js

---

## 🔴🔴🔴 Alert chain E2E validation (2026-03-26 ADR-025)

> **Hard-won lesson**: a wrong URL path (`webhook` vs `webhooks`) plus a wrong DNS rule caused 2 days without Telegram alerts

### Post-deployment Smoke Test (mandatory)

**Must be run after any API or Alertmanager deployment:**

```bash
#!/bin/bash
# scripts/smoke-test-alert-chain.sh

API_URL="${1:-http://192.168.0.120:32334}"

echo "🔔 Testing alert chain..."

RESPONSE=$(curl -s -X POST "$API_URL/api/v1/webhooks/alertmanager" \
  -H 'Content-Type: application/json' \
  -d '{
    "receiver": "smoke-test",
    "status": "firing",
    "alerts": [{
      "status": "firing",
      "labels": {"alertname": "SmokeTest", "severity": "info"},
      "annotations": {"summary": "Smoke test from CI"}
    }]
  }')

# Verify success
if echo "$RESPONSE" | jq -e '.success == true' > /dev/null 2>&1; then
  echo "✅ Alert chain smoke test passed"
  echo "📬 Approval ID: $(echo "$RESPONSE" | jq -r '.approval_id')"
  exit 0
else
  echo "❌ Alert chain smoke test FAILED"
  echo "$RESPONSE"
  exit 1
fi
```

### Validation items

| Item | How to verify | Action on failure |
|------|---------|---------|
| Webhook endpoint reachable | curl returns 200 | Roll back the deployment |
| Telegram notification delivered | Check `message_id` | Check DNS/Token |
| Approval created | `approval_id` is present | Check the DB connection |

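The three checks in the table can be evaluated together from one webhook response. This is a sketch under stated assumptions: `success` and `approval_id` appear in the smoke test script, but the top-level `message_id` key is an assumed location for the Telegram message ID, and `evaluate_smoke_response` is a hypothetical helper.

```python
import json


def evaluate_smoke_response(status_code: int, body: str) -> list[str]:
    """Return the failed checks; an empty list means every check passed."""
    if status_code != 200:
        return ["webhook endpoint unreachable: roll back the deployment"]
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return ["response body is not valid JSON"]
    if not isinstance(payload, dict):
        return ["response body is not a JSON object"]

    failures = []
    if payload.get("success") is not True:
        failures.append("success is not true")
    if not payload.get("message_id"):  # assumed field name
        failures.append("no Telegram message_id: check DNS/Token")
    if not payload.get("approval_id"):
        failures.append("no approval_id: check the DB connection")
    return failures


# Illustrative response body; the values are made up for the example.
sample = json.dumps({"success": True, "message_id": 1234, "approval_id": "apr-1"})
print(evaluate_smoke_response(200, sample))  # []
print(evaluate_smoke_response(404, ""))
```

Returning a list rather than a boolean keeps each failure tied to its follow-up action from the table.
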
### DNS connectivity check

```bash
# Test from inside the Pod
kubectl exec -n awoooi-prod deployment/awoooi-api -- \
  python -c "import socket; print(socket.gethostbyname('api.telegram.org'))"
```

**Causes of failure**:
- CoreDNS upstream DNS misconfigured (`127.0.0.53`)
- NetworkPolicy DNS rule labels do not match

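The one-liner can be extended to cover both the in-cluster and the external name that ADR-025 lists for its health probe. The following is a minimal sketch; `unresolved_hosts` and `fake_resolver` are hypothetical names, and the resolver is injectable so the logic can be exercised without network access.

```python
import socket
from typing import Callable, Iterable

# Hostnames taken from the ADR-025 health probe description.
REQUIRED_HOSTS = ("kubernetes.default.svc.cluster.local", "api.telegram.org")


def unresolved_hosts(
    hosts: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> list[str]:
    """Return the hosts that fail to resolve; an empty list means DNS is healthy."""
    failed = []
    for host in hosts:
        try:
            resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            failed.append(host)
    return failed


def fake_resolver(host: str) -> str:
    """Stand-in resolver that only knows the in-cluster name (illustrative address)."""
    if host == "kubernetes.default.svc.cluster.local":
        return "10.43.0.1"
    raise socket.gaierror(f"cannot resolve {host}")


print(unresolved_hosts(REQUIRED_HOSTS, resolve=fake_resolver))  # ['api.telegram.org']
```

An internal name that resolves while the external one fails points toward the CoreDNS upstream setting; both failing points toward the NetworkPolicy rule.
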
---

## Playwright automation conventions

### Test script structure
@@ -532,6 +595,86 @@ ls -la ~/actions-runner-awoooi*/_work/_temp/

---

## 🧠 LLM testing strategy (2026-03-26 Chief Architect review)

> **Background**: LLM tests are inherently non-deterministic and need special handling to keep CI stable
> **ADR**: ADR-018-llm-testing-strategy.md (Deferred - Option A adopted)

### Deterministic parameters (required)

```python
# ✅ All LLM tests must use deterministic parameters
response = await client.post(
    f"{OLLAMA_URL}/api/chat",
    json={
        "model": model,
        "messages": messages,
        "stream": False,
        "options": {
            "temperature": 0.0,  # 🔴 deterministic output
            "seed": 42,          # 🔴 reproducibility
        },
    },
)
```

### CI tiering strategy

| Tier | Workflow | Run time | Tests included |
|------|----------|----------|----------|
| **Fast CI** | `ci.yaml` | ~3 min | Lint, Unit, Integration |
| **Nightly LLM** | `nightly-llm.yaml` | ~45 min | Prompt Validation, Model Regression |
| **Daily E2E** | `daily-e2e-health.yaml` | ~5 min | Health Check, K8s validation |

### Ollama CPU mode notes

> **192.168.0.188**: CPU-only inference (no GPU), speed ~0.45 tok/s

```python
import httpx

# CPU mode requires a sufficiently long timeout
TIMEOUT = 300  # seconds (CPU inference takes ~222-666 s)

async with httpx.AsyncClient(timeout=TIMEOUT) as client:
    response = await client.post(...)
```

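The timeout figures follow from the quoted throughput: at 0.45 tok/s, the 222-666 s range corresponds to roughly 100-300 generated tokens, while a 300 s timeout covers about 135 tokens. A small sketch of that arithmetic is given here; the helper names and the `safety_factor` parameter are assumptions added for illustration.

```python
import math

TOKENS_PER_SECOND = 0.45  # CPU-only rate quoted above


def required_timeout(expected_tokens: int, safety_factor: float = 1.2) -> int:
    """Seconds to allow for generating expected_tokens, with headroom."""
    return math.ceil(expected_tokens / TOKENS_PER_SECOND * safety_factor)


def tokens_within(timeout_seconds: float) -> int:
    """Largest whole number of tokens that fit in the given timeout."""
    return math.floor(timeout_seconds * TOKENS_PER_SECOND)


print(tokens_within(300))        # 135
print(required_timeout(100, 1))  # 223
print(required_timeout(300, 1))  # 667
```

If test prompts can produce more than about 135 tokens, either the timeout or a cap on generated tokens would need to be adjusted so the two stay consistent.
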
### Running tests by category

```bash
# Fast tests (every CI run)
pytest apps/api/tests/ -k "not llm and not model" -v

# LLM tests (Nightly)
pytest apps/api/tests/test_model_regression.py -v
pytest apps/api/tests/test_prompt_validation.py -v
```

### Traditional Chinese output validation

```python
# The System Prompt must emphasize Traditional Chinese
AWOOOI_SYSTEM_PROMPT = """
...
- 【重要】必須使用台灣繁體中文回應 (Traditional Chinese Taiwan)
- 禁止使用簡體中文字符 (如:与→與、说→說、这→這)
...
"""

# Example validator: rejects responses containing common Simplified characters
def validate_traditional_chinese(response: str) -> bool:
    simplified_chars = ["与", "说", "这", "为", "时"]
    return not any(c in response for c in simplified_chars)
```

### References

- `src/core/prompts.py`: centralized System Prompt (ADR-019)
- `tests/test_model_regression.py`: model regression tests
- `tests/test_prompt_validation.py`: prompt quality tests
- `.github/workflows/nightly-llm.yaml`: Nightly LLM Workflow

---

## Reference documents

- `apps/web/playwright.config.ts`: Playwright configuration
@@ -542,3 +685,7 @@ ls -la ~/actions-runner-awoooi*/_work/_temp/
- `src/core/telemetry.py`: **Phase 15.2 Trace Context**
- `memory/project_phase15_langfuse.md`: **📊 Phase 15 full record**
- `memory/feedback_runner_zombie_process.md`: **🚨 Runner zombie process fix**
- `docs/adr/ADR-018-llm-testing-strategy.md`: **🧠 LLM testing strategy (Deferred)**
- `docs/adr/ADR-019-system-prompt-management.md`: **📝 Centralized System Prompt management**
- `.github/workflows/nightly-llm.yaml`: **🌙 Nightly LLM tests**
- `.github/workflows/daily-e2e-health.yaml`: **🏥 Daily E2E health check**

@@ -95,6 +95,7 @@
- `*telegram*` → Telegram Token section
- `apps/web/**` → i18n section
- Incident/Approval flow → confirm the Telegram + DB chain
- **Alertmanager/NetworkPolicy** → ADR-025 alert chain E2E validation 🔴🔴

---

@@ -120,6 +121,7 @@
| **API paths** | `feedback_api_path_naming.md` 🔴 Changes must be synced with the frontend |
| **Deployment verification** | `feedback_deployment_verification.md` 🔴🔴 The Pod version must be verified |
| **Deployment layer** | `feedback_deployment_layer_decision.md` 🔴🔴🔴 Host/container/K3s must be evaluated |
| **Alert chain** | `feedback_alertchain_e2e_validation.md` 🔴🔴🔴 Alertmanager→API→Telegram |

---

@@ -207,7 +209,9 @@ Pre-commit Hook automatically checks and blocks Router-layer violations
| DevOps | `.agents/skills/04-awoooi-devops-commander.md` |
| Testing | `.agents/skills/05-awoooi-sre-qa.md` |
| Git | `.agents/skills/06-awoooi-monorepo-master.md` |
| **Tool integration** | `.agents/skills/07-tool-integration-expert.md` 🆕 |
| Tool integration | `.agents/skills/07-tool-integration-expert.md` |
| Model routing | `.agents/skills/08-model-router-expert.md` |
| **Strangler refactoring** | `.agents/skills/09-strangler-pattern-expert.md` 🆕 |

## Memory system

@@ -265,7 +265,7 @@ k8s/awoooi-prod/

---

## Appendix: root cause analysis of today's incident
## Appendix A: root cause analysis of the 2026-03-23 incident

```
2026-03-23 Y button execution timed out
@@ -286,3 +286,54 @@ k8s/awoooi-prod/
2. NetworkPolicy must allow the complete routing path
3. Changes should be validated with a dry-run first
```

---

## Appendix B: root cause analysis of the 2026-03-26 DNS rule incident

```
2026-03-26 Two days without Telegram alerts

Root cause 1: wrong Alertmanager URL path
  Configured: /api/v1/webhook/alertmanager (singular)
  Actual:     /api/v1/webhooks/alertmanager (plural)
  Result:     404 Not Found

Root cause 2: wrong labels in the NetworkPolicy DNS rule
  Configured: podSelector requires environment=prod, system=awoooi
  Actual:     CoreDNS only has k8s-app=kube-dns
  Result:     Pods cannot connect to CoreDNS

Root cause 3: wrong CoreDNS upstream DNS configuration
  Configured: forward . /etc/resolv.conf → 127.0.0.53 (systemd-resolved)
  Actual:     127.0.0.53 cannot be used inside a container
  Result:     external DNS resolution fails

Fixes:
  1. Correct the URL path in the Alertmanager ConfigMap
  2. Use the correct namespaceSelector in the NetworkPolicy
  3. Switch CoreDNS to 8.8.8.8 1.1.1.1

Lessons:
  1. URL paths must be verified by E2E tests (ADR-025)
  2. NetworkPolicy DNS rules must use a namespace selector
  3. CoreDNS must not depend on the host's systemd-resolved
```

### DNS rule best practices

```yaml
# ✅ Correct way to write the DNS rule
- ports:
    - port: 53
      protocol: UDP
    - port: 53
      protocol: TCP
  to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
```

196
docs/adr/ADR-025-alert-chain-e2e-validation.md
Normal file
@@ -0,0 +1,196 @@
# ADR-025: Alert chain E2E validation architecture

**Status**: Approved
**Date**: 2026-03-26
**Decision maker**: Commander
**Trigger**: a wrong URL path plus a wrong NetworkPolicy DNS rule caused 2 days without alerts

## Problem statement

```
Incident timeline (2026-03-26):
├── Alertmanager configured with /api/v1/webhook/alertmanager (singular)
├── Actual API path is /api/v1/webhooks/alertmanager (plural)
├── Result: 404 Not Found, all alerts lost
├── Meanwhile: the NetworkPolicy DNS rule used the wrong labels
├── CoreDNS could not resolve external DNS (used 127.0.0.53)
└── Consequence: 2 days with no Telegram alerts at all
```

**Root causes**:
1. No E2E test verified the Alertmanager → API → Telegram chain
2. No post-deployment Smoke Test confirmed that the endpoint was reachable
3. The NetworkPolicy DNS rule labels did not match CoreDNS
4. The CoreDNS upstream DNS setting depended on systemd-resolved (unusable inside containers)

---

## Decision: four-layer validation architecture

```
┌──────────────────────────────────────────────────────────────────────┐
│               Alert chain E2E validation architecture                │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  Layer 1: Post-deployment Smoke Test (mandatory)                     │
│  ═══════════════════════════════════════════════                     │
│  ┌─────────────────────────────────────────────────────────────┐     │
│  │ Runs automatically after every deployment:                  │     │
│  │   1. curl POST /api/v1/webhooks/alertmanager (test alert)   │     │
│  │   2. Verify the response has success=true                   │     │
│  │   3. Verify a Telegram message_id exists                    │     │
│  │   4. On failure → roll back the deployment                  │     │
│  └─────────────────────────────────────────────────────────────┘     │
│                                                                      │
│  Layer 2: DNS connectivity check                                     │
│  ═══════════════════════════════                                     │
│  ┌─────────────────────────────────────────────────────────────┐     │
│  │ The health probe must cover:                                │     │
│  │   - Internal DNS: kubernetes.default.svc.cluster.local      │     │
│  │   - External DNS: api.telegram.org                          │     │
│  │ If either fails → Pod is marked Not Ready                   │     │
│  └─────────────────────────────────────────────────────────────┘     │
│                                                                      │
│  Layer 3: Chain heartbeat monitoring                                 │
│  ═══════════════════════════════════                                 │
│  ┌─────────────────────────────────────────────────────────────┐     │
│  │ Prometheus rules:                                           │     │
│  │   - awoooi_alerts_received_total                            │     │
│  │   - awoooi_telegram_sent_total                              │     │
│  │ Zero for 1 consecutive hour → fire a CRITICAL alert         │     │
│  └─────────────────────────────────────────────────────────────┘     │
│                                                                      │
│  Layer 4: ConfigMap validation hook                                  │
│  ══════════════════════════════════                                  │
│  ┌─────────────────────────────────────────────────────────────┐     │
│  │ Before changing the Alertmanager ConfigMap:                 │     │
│  │   1. Extract the webhook URL                                │     │
│  │   2. Test URL reachability with curl                        │     │
│  │   3. Must receive 200 or 422 (bad payload, endpoint exists) │     │
│  │   4. Validation fails → block the apply                     │     │
│  └─────────────────────────────────────────────────────────────┘     │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘
```

---

## Implementation details

### 1. Post-deployment Smoke Test

```yaml
# Mandatory CI/CD step
- name: Alert Chain Smoke Test
  run: |
    # Send a test alert
    RESPONSE=$(curl -s -X POST "$API_URL/api/v1/webhooks/alertmanager" \
      -H 'Content-Type: application/json' \
      -d '{"receiver":"smoke-test","status":"firing","alerts":[{"status":"firing","labels":{"alertname":"SmokeTest","severity":"info"},"annotations":{"summary":"CI Smoke Test"}}]}')

    # Verify success
    echo "$RESPONSE" | jq -e '.success == true' || exit 1
    echo "Alert chain smoke test passed"
```

### 2. NetworkPolicy DNS rules (correct form)

```yaml
# ❌ Wrong: uses labels that do not exist
- ports:
    - port: 53
  to:
    - podSelector:
        matchLabels:
          environment: prod    # CoreDNS does not have this label!
          k8s-app: kube-dns
          system: awoooi       # CoreDNS does not have this label!

# ✅ Correct: use a namespace selector
- ports:
    - port: 53
      protocol: UDP
    - port: 53
      protocol: TCP
  to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
```

### 3. CoreDNS upstream DNS configuration

```yaml
# ❌ Wrong: use /etc/resolv.conf (points to 127.0.0.53)
forward . /etc/resolv.conf

# ✅ Correct: use real DNS servers
forward . 8.8.8.8 1.1.1.1
```

### 4. Prometheus chain monitoring rules

```yaml
groups:
  - name: alert-chain-health
    rules:
      - alert: AlertChainBroken
        expr: increase(awoooi_alerts_received_total[1h]) == 0
        for: 1h
        labels:
          severity: critical
        annotations:
          summary: "Alert chain broken! No alerts received in the last hour"

      - alert: TelegramNotificationFailed
        expr: increase(awoooi_telegram_sent_total[1h]) == 0 and increase(awoooi_alerts_received_total[1h]) > 0
        for: 30m
        labels:
          severity: critical
        annotations:
          summary: "Telegram notification failed! Alerts were received but none were sent successfully"
```

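The two rules split the chain into an inbound half (Alertmanager → API) and an outbound half (API → Telegram). The sketch below restates the two expressions as plain Python to show which rule fires for which combination of counter increases; it deliberately ignores the `for:` hold durations, and `chain_status` is a hypothetical name.

```python
def chain_status(alerts_received_1h: float, telegram_sent_1h: float) -> str:
    """Classify alert-chain health from the two 1-hour counter increases."""
    if alerts_received_1h == 0:
        # Nothing reached the API: matches the AlertChainBroken expression.
        return "AlertChainBroken"
    if telegram_sent_1h == 0:
        # Alerts arrived but none went out: matches TelegramNotificationFailed.
        return "TelegramNotificationFailed"
    return "healthy"


print(chain_status(0, 0))  # AlertChainBroken
print(chain_status(5, 0))  # TelegramNotificationFailed
print(chain_status(5, 5))  # healthy
```

Note that the first rule treats any quiet hour as a failure, so it is only meaningful when at least one alert is expected to arrive every hour.
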
---

## URL path convention

| Correct | Wrong |
|-----|------|
| `/api/v1/webhooks/alertmanager` | `/api/v1/webhook/alertmanager` |
| Plural form `webhooks` | Singular form `webhook` |
| `/api/v1/approvals` | `/api/v1/approval` |
| `/api/v1/incidents` | `/api/v1/incident` |

**Principle**: API Routers consistently use plural names

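The plural-naming principle can be checked mechanically. The following sketch covers only the three resource names listed in the table; `PLURAL_FORMS`, `normalize_path` and `is_conformant` are hypothetical names, not part of the codebase.

```python
# Singular → plural resource segments taken from the table.
PLURAL_FORMS = {"webhook": "webhooks", "approval": "approvals", "incident": "incidents"}


def normalize_path(path: str) -> str:
    """Rewrite known singular resource segments to their plural form."""
    return "/".join(PLURAL_FORMS.get(segment, segment) for segment in path.split("/"))


def is_conformant(path: str) -> bool:
    """A path conforms when normalization leaves it unchanged."""
    return normalize_path(path) == path


print(normalize_path("/api/v1/webhook/alertmanager"))  # /api/v1/webhooks/alertmanager
print(is_conformant("/api/v1/webhook/alertmanager"))   # False
print(is_conformant("/api/v1/approvals"))              # True
```

A check like this could run against the URLs extracted from the Alertmanager ConfigMap, complementing the reachability probe rather than replacing it.
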
---

## Acceptance criteria

| Item | Status |
|------|------|
| CI/CD includes the Alert Chain Smoke Test | ⬜ |
| NetworkPolicy DNS rule uses the correct labels | ✅ |
| CoreDNS uses real upstream DNS servers | ✅ |
| Prometheus chain monitoring rules deployed | ⬜ |
| ConfigMap pre-change validation hook | ⬜ |

---

## Lessons

> "The path was off by one s, hence the 404": a careless mistake like this must never happen again.
> Rely on automated validation, not on human review.

---

## Related documents

- Memory: `feedback_alertchain_e2e_validation.md`
- ADR-011: NetworkPolicy change governance architecture
- Skill 04: DevOps Commander
- Skill 05: SRE QA