awoooi/docs/runbooks/ssh-mcp-setup.md


SSH MCP Provider Setup Runbook

Created: 2026-04-11 (Claude Sonnet 4.6)
Related: ADR-070 MCP Phase 2a


Architecture

The SSH MCP Provider (ssh_provider.py) lets the API Pod run diagnostic and remediation commands on the host machines over SSH.

K8s Pod (awoooi-api)
  ↓ asyncssh
188 (ollama@192.168.0.188): Docker/Nginx/PM2 operations
110 (wooo@192.168.0.110):   Harbor/Gitea/Sentry operations
111 (ollama@192.168.0.111): Ollama GPU operations
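
The connection path in the diagram can be sketched with asyncssh as follows. This is a minimal illustration, not the code in ssh_provider.py: `HOSTS`, `build_connect_options()`, and `run_on_host()` are hypothetical names, and the key and known_hosts paths are the mount points described later in this runbook.

```python
# Host aliases and logins copied from the architecture diagram.
HOSTS = {
    "188": ("ollama", "192.168.0.188"),
    "110": ("wooo", "192.168.0.110"),
    "111": ("ollama", "192.168.0.111"),
}

# Mount points taken from this runbook's Deployment section.
KEY_PATH = "/run/secrets/ssh_mcp_key"
KNOWN_HOSTS_PATH = "/etc/ssh-mcp/known_hosts"


def build_connect_options(alias: str) -> dict:
    """Return keyword arguments for asyncssh.connect() for a known host alias."""
    if alias not in HOSTS:
        raise ValueError(f"unknown host alias: {alias!r}")
    username, address = HOSTS[alias]
    return {
        "host": address,
        "username": username,
        "client_keys": [KEY_PATH],        # key-based authentication only
        "known_hosts": KNOWN_HOSTS_PATH,  # reject hosts with unknown fingerprints
    }


async def run_on_host(alias: str, command: str) -> str:
    """Run one command on the host and return its stdout (hypothetical helper)."""
    import asyncssh  # third-party; imported lazily so this module loads without it

    async with asyncssh.connect(**build_connect_options(alias)) as conn:
        result = await conn.run(command, check=True)
        return result.stdout
```

Inside the Pod, a call such as `asyncio.run(run_on_host("188", "docker ps"))` would exercise the whole path. Passing `known_hosts` explicitly means the connection fails instead of silently trusting a host whose fingerprint is not in the file.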

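Security Design

Four Layers of Guards

  1. SSH key authentication: the ed25519 private key is mounted at /run/secrets/ssh_mcp_key (mode 0400)
  2. known_hosts verification: /etc/ssh-mcp/known_hosts; only known host fingerprints are trusted
  3. Parameter whitelist: every tool parameter is validated by _validate_param()
  4. Command whitelist: only predefined tools are allowed; arbitrary command execution is not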
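
The command whitelist (guard 4) can be pictured as a fixed table of tools, each rendered from a template. The sketch below is an assumption about the general shape, not the provider's real code: the `TOOLS` entries and `build_command()` are hypothetical, and the actual tool set in ssh_provider.py is not listed in this runbook.

```python
import shlex

# Hypothetical tool table: tool name -> argv template.
TOOLS = {
    "docker_logs": ["docker", "logs", "--tail", "{tail}", "{container_name}"],
    "docker_restart": ["docker", "restart", "{container_name}"],
}


def build_command(tool: str, params: dict) -> str:
    """Render a whitelisted tool into a shell-safe command string."""
    if tool not in TOOLS:
        # Anything outside the table is rejected; callers never pass raw commands.
        raise PermissionError(f"tool not allowed: {tool!r}")
    argv = [part.format(**params) for part in TOOLS[tool]]
    # Quote every argument so a parameter can never add extra shell syntax.
    return " ".join(shlex.quote(arg) for arg in argv)
```

For example, `build_command("docker_logs", {"tail": 100, "container_name": "nginx"})` yields `docker logs --tail 100 nginx`. Quoting is a second line of defense here; the parameter whitelist (guard 3) is expected to reject suspicious values before they reach this step.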

Parameter Validation Rules

Parameter type    Rule
container_name    ^[a-zA-Z0-9][a-zA-Z0-9._-]{0,127}$
service           Same as container_name
compose_dir       Must start with /opt/ or /srv/; ".." is forbidden
domain            FQDN format
tail/lines        int, 1-5000
port              int, 1-65535
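
The rules in this table translate fairly directly into code. The following is a sketch under stated assumptions: `validate_param()` is a hypothetical stand-in for the provider's `_validate_param()`, whose real signature is not shown here, and the FQDN pattern is one common interpretation of "FQDN format" rather than the pattern the provider necessarily uses.

```python
import re

# Pattern copied from the table (container_name and service).
_NAME_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9._-]{0,127}$")
# Assumed FQDN pattern: dot-separated labels, alphabetic TLD, 253 chars max.
_FQDN_RE = re.compile(
    r"^(?=.{1,253}$)(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,63}$"
)
# Inclusive integer ranges from the table.
_INT_RANGES = {"tail": (1, 5000), "lines": (1, 5000), "port": (1, 65535)}


def validate_param(kind: str, value):
    """Return value unchanged if it satisfies the rule for kind, else raise ValueError."""
    if kind in ("container_name", "service"):
        if isinstance(value, str) and _NAME_RE.fullmatch(value):
            return value
    elif kind == "compose_dir":
        if (isinstance(value, str)
                and value.startswith(("/opt/", "/srv/"))
                and ".." not in value):
            return value
    elif kind == "domain":
        if isinstance(value, str) and _FQDN_RE.fullmatch(value):
            return value
    elif kind in _INT_RANGES:
        low, high = _INT_RANGES[kind]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, int) and not isinstance(value, bool) and low <= value <= high:
            return value
    raise ValueError(f"invalid {kind}: {value!r}")
```

Using `fullmatch` rather than `match` matters: in Python, `$` also matches just before a trailing newline, so `match` alone would accept a name such as `"nginx\n"`.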

Setup Steps

1. Generate an SSH Key Pair

ssh-keygen -t ed25519 -f /tmp/ssh-mcp-key -N "" -C "awoooi-mcp@k3s"

2. Add the Public Key to the Target Hosts

ssh-copy-id -i /tmp/ssh-mcp-key.pub ollama@192.168.0.188
ssh-copy-id -i /tmp/ssh-mcp-key.pub wooo@192.168.0.110
ssh-copy-id -i /tmp/ssh-mcp-key.pub ollama@192.168.0.111

3. Generate known_hosts

ssh-keyscan -H 192.168.0.188 192.168.0.110 192.168.0.111 > /tmp/ssh-mcp-known_hosts

4. Create the K8s Secret

kubectl create secret generic ssh-mcp-key \
  --from-file=ssh_mcp_key=/tmp/ssh-mcp-key \
  --from-literal=known_hosts="$(cat /tmp/ssh-mcp-known_hosts)" \
  -n awoooi-prod

# Clean up temporary files
rm /tmp/ssh-mcp-key /tmp/ssh-mcp-key.pub /tmp/ssh-mcp-known_hosts

5. ConfigMap Settings (already configured)

SSH_MCP_ENABLED: "true"
SSH_MCP_KNOWN_HOSTS_FILE: "/etc/ssh-mcp/known_hosts"
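
How these two settings might be consumed at startup is sketched below. This is an assumption for illustration only: `SshMcpConfig` and `load_ssh_mcp_config()` are hypothetical names, and the real provider may parse or report these values differently. The sketch mirrors the two failure modes listed under troubleshooting (provider disabled when SSH_MCP_ENABLED is unset, and a warning when the known_hosts file is missing or empty).

```python
import os
from dataclasses import dataclass, field


@dataclass
class SshMcpConfig:
    enabled: bool
    known_hosts_file: str
    warnings: list = field(default_factory=list)


def load_ssh_mcp_config(env=None) -> SshMcpConfig:
    """Read the two ConfigMap values from the environment (hypothetical helper)."""
    env = os.environ if env is None else env
    # An unset or non-"true" value leaves the provider disabled.
    enabled = env.get("SSH_MCP_ENABLED", "").strip().lower() == "true"
    known_hosts_file = env.get("SSH_MCP_KNOWN_HOSTS_FILE", "")
    config = SshMcpConfig(enabled=enabled, known_hosts_file=known_hosts_file)
    if enabled:
        missing = not os.path.isfile(known_hosts_file)
        if missing or os.path.getsize(known_hosts_file) == 0:
            config.warnings.append(
                f"known_hosts file {known_hosts_file!r} is missing or empty"
            )
    return config
```

Calling `load_ssh_mcp_config()` with no argument reads the Pod's real environment; passing a dict makes the function easy to exercise outside the cluster.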

6. Deployment Volume Mount (already configured)

volumeMounts:
  - name: ssh-mcp-key
    mountPath: /run/secrets/ssh_mcp_key
    subPath: ssh_mcp_key
    readOnly: true
  - name: ssh-mcp-key
    mountPath: /etc/ssh-mcp/known_hosts
    subPath: known_hosts
    readOnly: true
volumes:
  - name: ssh-mcp-key
    secret:
      secretName: ssh-mcp-key
      defaultMode: 0400
      optional: true

Verification

# Confirm the private key is mounted
kubectl exec -n awoooi-prod deploy/awoooi-api -- ls -la /run/secrets/ssh_mcp_key
# Expected: -r--r----- ... 411 ...

# Confirm known_hosts is mounted
kubectl exec -n awoooi-prod deploy/awoooi-api -- ls -la /etc/ssh-mcp/known_hosts

# Confirm the provider is enabled
kubectl logs -n awoooi-prod deploy/awoooi-api | grep '"name": "ssh_host"'
# Expected: "enabled": true

# Test the SSH connection from the local machine
# (run this before the step 4 cleanup, which deletes /tmp/ssh-mcp-key)
ssh -i /tmp/ssh-mcp-key ollama@192.168.0.188 "echo OK"
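
The mount checks above can also be expressed as a small function, for instance as a startup self-check inside the Pod. This is a sketch only: `check_secret_file()` is a hypothetical helper, and the default ceiling of 0440 is an assumption chosen to match the expected `-r--r-----` listing shown above.

```python
import os
import stat


def check_secret_file(path: str, max_mode: int = 0o440) -> list:
    """Return a list of problems; an empty list means the file looks usable."""
    try:
        info = os.stat(path)
    except FileNotFoundError:
        return [f"{path}: not mounted"]
    problems = []
    mode = stat.S_IMODE(info.st_mode)
    # Any permission bit outside max_mode means the file is too widely readable.
    if mode & ~max_mode:
        problems.append(f"{path}: mode {mode:04o} is broader than {max_mode:04o}")
    if info.st_size == 0:
        problems.append(f"{path}: file is empty")
    return problems
```

Running it against both `/run/secrets/ssh_mcp_key` and `/etc/ssh-mcp/known_hosts` covers the first two checks in one pass.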

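Key Rotation

# 1. Generate a new key
ssh-keygen -t ed25519 -f /tmp/ssh-mcp-key-new -N "" -C "awoooi-mcp@k3s-$(date +%Y%m%d)"

# 2. Dual-write: install the new key alongside the old one on every host to avoid downtime
ssh-copy-id -i /tmp/ssh-mcp-key-new.pub ollama@192.168.0.188
ssh-copy-id -i /tmp/ssh-mcp-key-new.pub wooo@192.168.0.110
ssh-copy-id -i /tmp/ssh-mcp-key-new.pub ollama@192.168.0.111

# 3. Update the Secret (same namespace and same two keys as setup step 4,
#    so the regenerated Secret still contains known_hosts)
ssh-keyscan -H 192.168.0.188 192.168.0.110 192.168.0.111 > /tmp/ssh-mcp-known_hosts
kubectl create secret generic ssh-mcp-key \
  --from-file=ssh_mcp_key=/tmp/ssh-mcp-key-new \
  --from-literal=known_hosts="$(cat /tmp/ssh-mcp-known_hosts)" \
  -n awoooi-prod \
  --dry-run=client -o yaml | kubectl apply -f -

# 4. Rollout restart
kubectl rollout restart deploy/awoooi-api -n awoooi-prod

# 5. After confirming the new key works, remove the old key from authorized_keys
#    on each host and delete the temporary files under /tmp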

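Troubleshooting

Symptom                            Cause                                                Fix
ssh_host provider enabled=false    SSH_MCP_ENABLED is not set                           Check the ConfigMap
known_hosts WARNING                SSH_MCP_KNOWN_HOSTS_FILE points to an empty file     Confirm the Secret has a known_hosts key
Permission denied (publickey)      Public key not added to authorized_keys              Redo step 2
Connection refused                 sshd is not reachable on the target host             Check that sshd is running and reachable from the Pod
Host key verification failed       known_hosts is stale                                 Redo steps 3 and 4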