feat(k8s): CoreDNS GitOps architecture (ADR-026)

Problem: the DNS configuration was not version-controlled, and manual changes were easily lost.

Architecture:
- k8s/k3s-system/coredns-custom.yaml: HelmChartConfig
- CD workflow: k3s-system path detection + automatic apply
- ADR-026: CoreDNS GitOps governance architecture

DNS upstreams:
- Use 8.8.8.8 + 1.1.1.1
- Do not use /etc/resolv.conf (systemd-resolved)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
12
.github/workflows/cd.yaml
vendored
@@ -111,6 +111,7 @@ jobs:
     outputs:
       api: ${{ inputs.force_deploy == true && 'true' || steps.filter.outputs.api }}
       web: ${{ inputs.force_deploy == true && 'true' || steps.filter.outputs.web }}
+      k3s-system: ${{ steps.filter.outputs.k3s-system }}
     steps:
       # 2026-03-26: Clean up temp directories (temp + pages)
       - name: "Clean Runner temp"
@@ -135,6 +136,8 @@ jobs:
               - 'packages/**'
               - 'package.json'
               - 'pnpm-lock.yaml'
+            k3s-system:
+              - 'k8s/k3s-system/**'

   # ==================== Parallel API build ====================
   build-api:
@@ -280,6 +283,15 @@ jobs:

           kubectl apply -k .
+
+      # 2026-03-26: CoreDNS GitOps sync (ADR-026)
+      - name: Sync CoreDNS Config
+        if: needs.detect-changes.outputs.k3s-system == 'true'
+        run: |
+          echo "📦 Syncing CoreDNS config to K3s..."
+          # HelmChartConfig is a K8s resource, so apply it directly
+          kubectl apply -f k8s/k3s-system/coredns-custom.yaml
+          echo "✅ CoreDNS config synced"

       - name: Wait for rollout
         run: |
           kubectl rollout status deployment/awoooi-web -n awoooi-prod --timeout=300s || true
@@ -5,18 +5,61 @@

 ---

-## 📍 Current Status (2026-03-26 23:45 Taipei)
+## 📍 Current Status (2026-03-27 00:40 Taipei)

 | Item | Status |
 |------|------|
-| **Current Phase** | **Phase 13.2 #84 RAG Tool** |
-| **Day** | Day 9 |
-| **AI Fallback** | ✅ **Gemini first (protected by Rate Limiter)** |
-| **Phase 13.2** | 🔄 **#84 RAG Tool in progress** |
+| **Current Phase** | **Phase 13.2 #84 RAG Tool ✅** |
+| **Day** | Day 10 |
+| **AI Fallback** | ✅ **Gemini first (7/500 daily used)** |
+| **Phase 13.2** | ✅ **#84 RAGProvider complete, CD deployment in progress** |
 | **Phase 16** | 🔄 R1.3 verification period until 2026-03-27 16:04 |
 | **Architecture Review** | ✅ **ADR-025 CI/CD AI integration** |
 | **Skills** | ✅ **Skill 07 v1.3 update** |

+### ✅ 2026-03-26 Telegram alert chain fix + CoreDNS GitOps (Day 9 evening, 18:45)
+
+**Problem**: no Telegram alerts for two days + content disappeared after approval
+
+**Root cause analysis**:
+1. The NetworkPolicy DNS rule used the wrong label (CoreDNS only carries k8s-app=kube-dns)
+2. CoreDNS forward used /etc/resolv.conf → 127.0.0.53 (unusable from containers)
+3. Alertmanager pointed at the old system (momo-pro-system)
+4. The frontend removed the card immediately after approval
+
+**Fixes**:
+- NetworkPolicy: use namespaceSelector to target kube-system (ADR-011 Appendix B)
+- CoreDNS GitOps: ADR-026 + HelmChartConfig + CD integration
+- Frontend: keep the card for 5 seconds after approval to show the result
+
+**New files**:
+- `k8s/k3s-system/coredns-custom.yaml` - HelmChartConfig
+- `docs/adr/ADR-026-coredns-gitops.md` - CoreDNS GitOps architecture
+
+**Commits**: 34bfa99, 7847e00
+
+---
+
+### ✅ 2026-03-27 Phase 13.2 #84 RAGProvider complete (Day 10 early morning, 00:40)
+
+**Implementation**:
+- `rag_provider.py`: RAG MCP Tool Provider (ADR-015 modularization)
+- `search_runbook`: semantic search over the operations runbooks
+- `index_documents`: index documents
+- `get_index_stats`: index statistics
+- `providers/__init__.py`: registers RAGProvider
+
+**Chief Architect review**:
+- ✅ Conforms to the ADR-015 modular architecture (Interface + Lazy Loading + DI)
+- ✅ Health check implemented
+- ⚠️ Suggested improvement: compute base_path from settings
+
+**Gemini verification**:
+- `/health/ai-usage` confirms: `fallback_order: ["gemini","ollama","claude"]`
+- Usage: Gemini 7/500 daily requests
+
+**Commit**: 539f14b
+
 ### ✅ 2026-03-26 Gemini API switch + Rate Limiter (Day 9 late evening, 23:45)

 **Commander decision**: temporarily switch to the Gemini API to eliminate the Ollama CPU inference problem
142
docs/adr/ADR-026-coredns-gitops.md
Normal file
@@ -0,0 +1,142 @@
# ADR-026: CoreDNS GitOps Governance Architecture

**Status**: Approved
**Date**: 2026-03-26 (Taipei time zone)
**Decision maker**: Commander
**Trigger**: DNS resolution problem + NetworkPolicy cascading incident (ADR-011)

## Problem Statement

```
Incident timeline:
├── 2026-03-26: Telegram alerts failed → Pods found unable to resolve external DNS
├── Root cause: the CoreDNS forward setting used /etc/resolv.conf
│   → resolved to 127.0.0.53 (systemd-resolved)
│   → the host loopback is unreachable from inside a container
└── After the fix: version control is needed to prevent a recurrence
```

**Core problems**:
1. The CoreDNS configuration is not version-controlled
2. Manual changes are easily lost or overwritten
3. Risk of conflict with the K3s built-in Helm Controller
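
The root cause in the timeline can be checked on a node before CoreDNS is ever pointed at a resolver file. The following is a minimal sketch, not part of this commit: `has_loopback_nameserver` is a hypothetical helper name, and it assumes the file uses the usual `nameserver <address>` line format.

```bash
# Hypothetical helper: succeeds (exit 0) if the given resolv.conf-style file
# lists a loopback nameserver such as 127.0.0.53 or ::1.
# A pod has its own loopback interface, so such an address never reaches
# the resolver running on the host.
has_loopback_nameserver() {
  local file="${1:-/etc/resolv.conf}"
  grep -Eq '^[[:space:]]*nameserver[[:space:]]+(127\.[0-9]+\.[0-9]+\.[0-9]+|::1)([[:space:]]|$)' "$file"
}
```

For example, `has_loopback_nameserver /etc/resolv.conf && echo "loopback resolver found"` would flag a host that runs systemd-resolved in its default stub mode.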

---

## Decision: K3s HelmChartConfig + GitOps

### Why not pure ArgoCD?

| Option | Assessment |
|------|------|
| ArgoCD manages kube-system | Conflicts with the K3s Helm Controller |
| Direct kubectl apply of the ConfigMap | K3s overwrites manual changes |
| **HelmChartConfig** | Natively supported by K3s, no conflict |

### Architecture

```
┌─────────────────────────────────────────────────────────┐
│               CoreDNS GitOps Architecture               │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Git Repository (version control)                       │
│  └── k8s/k3s-system/coredns-custom.yaml                 │
│        │                                                │
│        ▼                                                │
│  CD Pipeline (GitHub Actions)                           │
│  └── kubectl apply -f k8s/k3s-system/                   │
│        │                                                │
│        ▼                                                │
│  K3s Helm Controller                                    │
│  └── detects HelmChartConfig changes                    │
│        │                                                │
│        ▼                                                │
│  CoreDNS Deployment updated                             │
│                                                         │
└─────────────────────────────────────────────────────────┘
```

---

## Implementation Details

### 1. Configuration file (`k8s/k3s-system/coredns-custom.yaml`)

```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: coredns
  namespace: kube-system
spec:
  valuesContent: |-
    servers:
      - zones:
          - zone: .
        port: 53
        plugins:
          - name: forward
            parameters: . 8.8.8.8 1.1.1.1  # do not use /etc/resolv.conf
```

### 2. CD integration (`.github/workflows/cd.yaml`)

```yaml
- name: Sync CoreDNS Config
  if: needs.detect-changes.outputs.k3s-system == 'true'
  run: kubectl apply -f k8s/k3s-system/coredns-custom.yaml
```

### 3. Path detection

```yaml
k3s-system:
  - 'k8s/k3s-system/**'
```
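
The effect of this path filter can be approximated locally when debugging why the sync step did or did not run. This is a rough sketch under stated assumptions: `k3s_system_changed` is a hypothetical helper, and a plain prefix match stands in for the `k8s/k3s-system/**` glob evaluated by the workflow's filter step.

```bash
# Hypothetical helper: reads changed file paths on stdin (one per line) and
# succeeds if at least one of them lives under k8s/k3s-system/.
k3s_system_changed() {
  grep -q '^k8s/k3s-system/'
}
```

A possible invocation is `git diff --name-only HEAD~1 HEAD | k3s_system_changed && echo "CoreDNS sync would run"`, which compares the last two commits rather than whatever base the workflow actually uses.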

---

## DNS Upstream Selection

| Priority | DNS server | Notes |
|----------|-----------|------|
| 1 | 8.8.8.8 | Google Public DNS |
| 2 | 1.1.1.1 | Cloudflare DNS |

**Deny list**:
- `127.0.0.53` - systemd-resolved (unusable from containers)
- `/etc/resolv.conf` - may point to the host loopback
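
The deny list could be enforced with a small pre-merge check on the manifest. The sketch below is an assumption-laden illustration, not something this commit ships: `forward_upstreams_ok` is a hypothetical helper, and it relies on the manifest keeping `parameters:` on the line right after `- name: forward`, as in the file above. It scans text only and does not parse YAML.

```bash
# Hypothetical helper: reads a manifest on stdin and succeeds only if a
# forward plugin entry exists and none of its upstreams is a 127.x address
# or a resolv.conf path.
forward_upstreams_ok() {
  awk '
    /^[ \t]*- name:[ \t]*forward[ \t]*$/ { in_forward = 1; next }
    in_forward && /^[ \t]*parameters:/ {
      sub(/#.*/, "")   # drop a trailing YAML comment before checking fields
      for (i = 1; i <= NF; i++)
        if ($i ~ /^127\./ || $i ~ /resolv\.conf$/) bad = 1
      found = 1; in_forward = 0; next
    }
    /^[ \t]*- name:/ { in_forward = 0 }
    END { if (found && !bad) exit 0; exit 1 }
  '
}
```

Run as `forward_upstreams_ok < k8s/k3s-system/coredns-custom.yaml`, it exits non-zero if a denied upstream slips back in or if no forward entry is found at all.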

---

## Acceptance Criteria

| Item | Status |
|------|------|
| HelmChartConfig created | ✅ |
| CD workflow integrated | ✅ |
| Version-controlled in Git | ✅ |
| systemd-resolved prohibited | ✅ |

---

## Related Documents

- ADR-011: NetworkPolicy Change Governance Architecture
- ADR-025: CI/CD AI Integration Architecture

---

## Appendix: Verification Commands

```bash
# Inspect the CoreDNS ConfigMap
kubectl get cm coredns -n kube-system -o yaml

# Test DNS resolution
kubectl run dns-test --rm -it --image=busybox --restart=Never -- nslookup google.com

# Check HelmChartConfig status
kubectl get helmchartconfig -n kube-system
```
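
The first command above prints the whole ConfigMap, and the upstream check still has to be done by eye. A scripted variant might look like the following sketch. `corefile_forwards_to` is a hypothetical helper, and the usage line assumes the rendered Corefile is stored under the `Corefile` key of the `coredns` ConfigMap in the live cluster.

```bash
# Hypothetical helper: reads Corefile text on stdin and succeeds if a
# "forward . <upstreams...>" line lists the requested upstream address.
corefile_forwards_to() {
  local upstream="$1"
  awk -v want="$upstream" '
    $1 == "forward" && $2 == "." {
      for (i = 3; i <= NF; i++) if ($i == want) ok = 1
    }
    END { if (ok) exit 0; exit 1 }
  '
}
```

Against a cluster this could be used as `kubectl get cm coredns -n kube-system -o jsonpath='{.data.Corefile}' | corefile_forwards_to 8.8.8.8`. A failure here would suggest that the HelmChartConfig values did not reach the running configuration.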
85
k8s/k3s-system/coredns-custom.yaml
Normal file
@@ -0,0 +1,85 @@
# =============================================================================
# K3s CoreDNS custom configuration (GitOps version control)
# =============================================================================
# Owner: CIO
# Version: v1.0
# Date: 2026-03-26 (Taipei time zone)
#
# Principles:
# - Use the K3s-native HelmChartConfig (no conflict with the K3s Helm Controller)
# - Do not use the host's systemd-resolved (127.0.0.53)
# - Upstream DNS: Google (8.8.8.8) + Cloudflare (1.1.1.1)
#
# Deployment:
# The CD pipeline SCPs this file to the K3s master:
# /var/lib/rancher/k3s/server/manifests/coredns-custom.yaml
# K3s detects it and applies it automatically
#
# Changelog:
# - v1.0 (2026-03-26): Initial version, fixes the DNS resolution problem (ADR-011 Appendix B)
# =============================================================================

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: coredns
  namespace: kube-system
spec:
  valuesContent: |-
    # CoreDNS server configuration
    servers:
      - zones:
          - zone: .
        port: 53
        plugins:
          # Error logging
          - name: errors

          # Health check endpoint (/health)
          - name: health
            configBlock: |-
              lameduck 5s

          # Readiness check endpoint (/ready)
          - name: ready

          # Kubernetes in-cluster DNS resolution
          - name: kubernetes
            parameters: cluster.local in-addr.arpa ip6.arpa
            configBlock: |-
              pods insecure
              fallthrough in-addr.arpa ip6.arpa
              ttl 30

          # Upstream DNS servers (do not use /etc/resolv.conf)
          # 🔴 Important: do not use the host's systemd-resolved (127.0.0.53)
          - name: forward
            parameters: . 8.8.8.8 1.1.1.1

          # DNS cache (30 seconds)
          - name: cache
            parameters: 30

          # Detect DNS forwarding loops
          - name: loop

          # Hot-reload the configuration
          - name: reload

          # Load balancing (randomizes record order in responses)
          - name: loadbalance

    # Resource limits
    resources:
      limits:
        cpu: 100m
        memory: 128Mi
      requests:
        cpu: 50m
        memory: 64Mi

    # Horizontal autoscaling
    autoscaler:
      enabled: true
      min: 1
      max: 3