awoooi/docs/runbooks/GCP-OLLAMA-WIREGUARD-MESH.md
Your Name ed7c6946cb
docs(awooop): define private Ollama mesh gateway
2026-05-05 22:56:22 +08:00


# GCP Ollama WireGuard Mesh Runbook
> Target state for ADR-125. This replaces the public GCP Ollama proxy as the
> primary path after shadow and canary validation.
---
## Scope
This runbook builds private Ollama connectivity between AWOOOI K3s and the GCP
Ollama hosts.
It does not replace AwoooP Inference Gateway work. The mesh solves transport and
security. The gateway solves routing, queueing, model residency, and fallback.
## Current State
Current production endpoints:
| Variable | Endpoint | Meaning |
|----------|----------|---------|
| `OLLAMA_URL` | `http://192.168.0.110:11435` | GCP-A via nginx on host 110 |
| `OLLAMA_SECONDARY_URL` | `http://192.168.0.110:11436` | GCP-B via nginx on host 110 |
| `OLLAMA_FALLBACK_URL` | `http://192.168.0.111:11434` | Local Ollama on host 111 |
These endpoints are a temporary bridge. Do not treat the public proxy as the final architecture.
## Target State
| Host | WireGuard IP | Notes |
|------|--------------|-------|
| 110 | `10.77.114.10` | DevOps host and rollback bridge |
| 120 | `10.77.114.120` | K3s node |
| 121 | `10.77.114.121` | K3s node |
| 111 | `10.77.114.111` | Local Ollama fallback |
| GCP-A | `10.77.114.21` | Primary Ollama |
| GCP-B | `10.77.114.22` | Secondary Ollama |
Production endpoints after cutover:
```yaml
OLLAMA_URL: "http://10.77.114.21:11434"
OLLAMA_SECONDARY_URL: "http://10.77.114.22:11434"
OLLAMA_FALLBACK_URL: "http://10.77.114.111:11434"
```
## Prerequisites
- SSH access to GCP-A and GCP-B.
- GCP IAM permissions to edit VPC firewall rules, in case the OS firewall alone
is not enough (for example, to admit WireGuard on `51820/udp` or to close public `11434/tcp`).
- SSH access to 110, 111, 120, and 121.
- A secured place to store WireGuard private keys. Never commit private keys.
- Confirm the GCP hosts have enough CPU/RAM for `gemma3:4b`.
## Key Rules
- Private keys are generated on each host and never copied into Git.
- Public keys may be recorded in the operator handoff note.
- Public GCP `11434/tcp` must be closed after cutover.
- `alert-fast` uses `gemma3:4b`; 14B/32B models must not run on GCP-A/B during
alert-lane canary.
## Install WireGuard
Ubuntu/Debian:
```bash
sudo apt-get update
sudo apt-get install -y wireguard
```
Alpine:
```bash
sudo apk add --no-cache wireguard-tools
```
Generate keys on every host:
```bash
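umask 077
# Discard tee's stdout so the private key is never echoed to the terminal.
wg genkey | sudo tee /etc/wireguard/awooop.key >/dev/null
sudo chmod 600 /etc/wireguard/awooop.key
# The public key is safe to print and share.
sudo cat /etc/wireguard/awooop.key | wg pubkey | sudo tee /etc/wireguard/awooop.pub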
```
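A world-readable private key undermines the whole mesh, so it may be worth confirming the file mode before moving on. The sketch below is one way to do that; `key_mode_ok` is a hypothetical helper name, and it assumes a `stat` that supports `-c '%a'` (GNU coreutils or BusyBox).

```bash
# Hypothetical helper: succeed only if the file mode is 600 or 400,
# i.e. no group or other permission bits are set.
key_mode_ok() {
  local mode
  mode="$(stat -c '%a' "$1")" || return 2
  case "$mode" in
    600|400) return 0 ;;
    *) echo "unexpected mode ${mode} on $1" >&2; return 1 ;;
  esac
}

# Usage (from a root shell, since /etc/wireguard is normally root-only):
#   key_mode_ok /etc/wireguard/awooop.key && echo "key permissions ok"
```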
## Configure Peers
Create `/etc/wireguard/wg-awooop.conf` on each host.
Example for GCP-A:
```ini
[Interface]
Address = 10.77.114.21/32
ListenPort = 51820
PrivateKey = <GCP_A_PRIVATE_KEY>
[Peer]
# 120 K3s node
PublicKey = <K3S_120_PUBLIC_KEY>
AllowedIPs = 10.77.114.120/32
Endpoint = <120_REACHABLE_ENDPOINT>:51820
PersistentKeepalive = 25
[Peer]
# 121 K3s node
PublicKey = <K3S_121_PUBLIC_KEY>
AllowedIPs = 10.77.114.121/32
Endpoint = <121_REACHABLE_ENDPOINT>:51820
PersistentKeepalive = 25
[Peer]
# 110 DevOps rollback bridge
PublicKey = <HOST_110_PUBLIC_KEY>
AllowedIPs = 10.77.114.10/32
Endpoint = <110_REACHABLE_ENDPOINT>:51820
PersistentKeepalive = 25
```
Example for a K3s node:
```ini
[Interface]
Address = 10.77.114.120/32
ListenPort = 51820
PrivateKey = <K3S_120_PRIVATE_KEY>
[Peer]
# GCP-A
PublicKey = <GCP_A_PUBLIC_KEY>
AllowedIPs = 10.77.114.21/32
Endpoint = 34.143.170.20:51820
PersistentKeepalive = 25
[Peer]
# GCP-B
PublicKey = <GCP_B_PUBLIC_KEY>
AllowedIPs = 10.77.114.22/32
Endpoint = 34.21.145.224:51820
PersistentKeepalive = 25
[Peer]
# Local 111
PublicKey = <HOST_111_PUBLIC_KEY>
AllowedIPs = 10.77.114.111/32
Endpoint = 192.168.0.111:51820
PersistentKeepalive = 25
```
The exact peer list depends on reachable endpoints. If inbound access to 120/121
is not available, use 110 as a temporary mesh relay, then replace it with direct
K3s-to-GCP peers when routing is confirmed.
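Because every host repeats the same `[Peer]` stanza with different values, generating stanzas from a small helper can reduce copy-paste mistakes. The following is a minimal sketch, not part of WireGuard itself: `render_peer` is a hypothetical name, and it assumes every peer uses a `/32` mesh address and port `51820` as in the examples above. Leaving the endpoint empty omits the `Endpoint` line, which suits peers that cannot accept inbound connections.

```bash
# Hypothetical helper: print one [Peer] stanza for wg-awooop.conf.
# Arguments: comment, public key, mesh IP, endpoint host (may be empty).
render_peer() {
  local comment="$1" pubkey="$2" mesh_ip="$3" endpoint="${4:-}"
  printf '[Peer]\n# %s\n' "$comment"
  printf 'PublicKey = %s\n' "$pubkey"
  printf 'AllowedIPs = %s/32\n' "$mesh_ip"
  # Omit Endpoint when the peer is not directly reachable; WireGuard then
  # learns the peer's address from its first authenticated inbound packet.
  if [ -n "$endpoint" ]; then
    printf 'Endpoint = %s:51820\n' "$endpoint"
  fi
  printf 'PersistentKeepalive = 25\n'
}

# Example: the GCP-A stanza as it would appear on a K3s node.
render_peer "GCP-A" "<GCP_A_PUBLIC_KEY>" 10.77.114.21 34.143.170.20
```

The output can be appended to the config after the `[Interface]` section; the placeholder key still has to be replaced by hand.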
## Start WireGuard
```bash
sudo systemctl enable --now wg-quick@wg-awooop
sudo wg show wg-awooop
```
Verify connectivity:
```bash
ping -c 3 10.77.114.21
ping -c 3 10.77.114.22
curl -fsS http://10.77.114.21:11434/api/tags
curl -fsS http://10.77.114.22:11434/api/tags
```
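When several hosts need checking, a loop that reports each endpoint separately makes it easier to see which peer is down. This is a sketch only; `check_mesh` is a hypothetical helper name, and the 5-second timeout is an arbitrary choice.

```bash
# Hypothetical helper: probe /api/tags on each mesh IP and report per-host status.
# Returns non-zero if any host fails.
check_mesh() {
  local failed=0 ip
  for ip in "$@"; do
    if curl -fsS --max-time 5 -o /dev/null "http://${ip}:11434/api/tags"; then
      echo "ok   ${ip}"
    else
      echo "FAIL ${ip}"
      failed=1
    fi
  done
  return "$failed"
}

# Usage:
#   check_mesh 10.77.114.21 10.77.114.22 10.77.114.111
```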
## Bind or Firewall Ollama
Preferred: bind Ollama to the mesh interface.
```bash
sudo systemctl edit ollama
```
```ini
[Service]
Environment="OLLAMA_HOST=10.77.114.21:11434"
```
Use `10.77.114.22:11434` on GCP-B.
If binding is not possible, firewall the host:
```bash
sudo ufw allow from 10.77.114.0/24 to any port 11434 proto tcp
sudo ufw deny 11434/tcp
```
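After changing the bind address, reload systemd and restart Ollama. The mesh interface must already be up, or the bind to the mesh IP will fail: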
```bash
sudo systemctl daemon-reload
sudo systemctl restart ollama
```
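If the bind approach was used, the listening sockets can confirm that Ollama no longer accepts connections on a wildcard address. The sketch below assumes `ss` from iproute2, whose `-H -ltn` output puts the local `address:port` in the fourth column; `no_wildcard_listener` is a hypothetical helper name. It does not apply to the firewall-only approach, where Ollama may still listen on all interfaces.

```bash
# Hypothetical helper: read `ss -H -ltn` output on stdin and fail if anything
# listens on port 11434 on a wildcard address (0.0.0.0, * or [::]).
no_wildcard_listener() {
  awk '$4 ~ /^(0\.0\.0\.0|\*|\[::\]):11434$/ { bad = 1 } END { exit bad + 0 }'
}

# Usage on GCP-A or GCP-B:
#   ss -H -ltn | no_wildcard_listener && echo "11434 is bound to specific addresses only"
```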
## K8s NetworkPolicy
After mesh cutover, allow only mesh endpoints for Ollama:
```yaml
- to:
- ipBlock:
cidr: 10.77.114.21/32
- ipBlock:
cidr: 10.77.114.22/32
- ipBlock:
cidr: 10.77.114.111/32
ports:
- protocol: TCP
port: 11434
```
Do not remove the `192.168.0.110:11435/11436` rules until rollback is no longer
needed.
## Shadow Validation
From the API pod:
```bash
bash scripts/ops/ollama-topology-check.sh
```
Expected:
- GCP-A `/api/tags` returns 200.
- GCP-B `/api/tags` returns 200.
- `gemma3:4b` generation succeeds on both nodes.
- `/api/ps` contains `gemma3:4b`.
- If `size_vram=0` (the model is running on CPU only), keep GCP-A/B on
`alert-fast` only and route heavy models to 111 or a GPU-capable node.
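Independently of `ollama-topology-check.sh`, the `/api/ps` expectation can be spot-checked by hand. The sketch below is a rough text match, assuming the response lists each loaded model under a `"name"` or `"model"` key; `model_loaded` is a hypothetical helper name, and a JSON-aware tool such as `jq` would be more robust where it is installed.

```bash
# Hypothetical helper: read an Ollama /api/ps JSON body on stdin and succeed
# only if the named model appears as a "name" or "model" value.
# The model name is used inside a regex, so this is a loose match.
model_loaded() {
  grep -Eq "\"(name|model)\"[[:space:]]*:[[:space:]]*\"$1\""
}

# Usage:
#   curl -fsS http://10.77.114.21:11434/api/ps | model_loaded gemma3:4b && echo "resident"
```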
## Cutover
Patch deployment env after shadow passes:
```bash
kubectl -n awoooi-prod set env deploy/awoooi-api \
  OLLAMA_URL=http://10.77.114.21:11434 \
  OLLAMA_SECONDARY_URL=http://10.77.114.22:11434 \
  OLLAMA_FALLBACK_URL=http://10.77.114.111:11434
kubectl -n awoooi-prod set env deploy/awoooi-worker \
  OLLAMA_URL=http://10.77.114.21:11434 \
  OLLAMA_SECONDARY_URL=http://10.77.114.22:11434 \
  OLLAMA_FALLBACK_URL=http://10.77.114.111:11434
```
Verify:
```bash
kubectl -n awoooi-prod rollout status deploy/awoooi-api --timeout=180s
kubectl -n awoooi-prod rollout status deploy/awoooi-worker --timeout=180s
bash scripts/ops/ollama-topology-check.sh
```
## Rollback
```bash
kubectl -n awoooi-prod set env deploy/awoooi-api \
  OLLAMA_URL=http://192.168.0.110:11435 \
  OLLAMA_SECONDARY_URL=http://192.168.0.110:11436 \
  OLLAMA_FALLBACK_URL=http://192.168.0.111:11434
kubectl -n awoooi-prod set env deploy/awoooi-worker \
  OLLAMA_URL=http://192.168.0.110:11435 \
  OLLAMA_SECONDARY_URL=http://192.168.0.110:11436 \
  OLLAMA_FALLBACK_URL=http://192.168.0.111:11434
```
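The cutover and rollback commands differ only in the three URLs, so a single helper can reduce the chance of patching one deployment and forgetting the other. This is a sketch, not an existing script in the repository; `set_ollama_env` is a hypothetical name. Unlike the commands above, it waits for each rollout before touching the next deployment.

```bash
# Hypothetical helper: apply one set of Ollama URLs to both deployments,
# waiting for each rollout to finish before moving on.
# Arguments: primary URL, secondary URL, fallback URL.
set_ollama_env() {
  local primary="$1" secondary="$2" fallback="$3" deploy
  for deploy in awoooi-api awoooi-worker; do
    kubectl -n awoooi-prod set env "deploy/${deploy}" \
      "OLLAMA_URL=${primary}" \
      "OLLAMA_SECONDARY_URL=${secondary}" \
      "OLLAMA_FALLBACK_URL=${fallback}" || return 1
    kubectl -n awoooi-prod rollout status "deploy/${deploy}" --timeout=180s || return 1
  done
}

# Rollback to the bridge endpoints:
#   set_ollama_env http://192.168.0.110:11435 http://192.168.0.110:11436 http://192.168.0.111:11434
# Cutover to the mesh endpoints:
#   set_ollama_env http://10.77.114.21:11434 http://10.77.114.22:11434 http://10.77.114.111:11434
```

After a rollback, rerunning `bash scripts/ops/ollama-topology-check.sh` gives the same confirmation as after cutover.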
## Done Criteria
- Mesh endpoints pass 7 days of canary.
- Alert lane Gemini usage is zero, except during documented all-Ollama outages.
- Public GCP `11434/tcp` is closed.
- The operator handoff note records peer public keys and the rollback owner.