GCP Ollama WireGuard Mesh Runbook
Target state for ADR-125. This replaces the public GCP Ollama proxy as the primary path after shadow and canary validation.
Scope
This runbook builds private Ollama connectivity between AWOOOI K3s and the GCP Ollama hosts.
It does not replace AwoooP Inference Gateway work. The mesh solves transport and security. The gateway solves routing, queueing, model residency, and fallback.
Current State
Current production endpoints:
| Variable | Endpoint | Meaning |
|---|---|---|
| OLLAMA_URL | http://192.168.0.110:11435 | GCP-A through 110 nginx |
| OLLAMA_SECONDARY_URL | http://192.168.0.110:11436 | GCP-B through 110 nginx |
| OLLAMA_FALLBACK_URL | http://192.168.0.111:11434 | Local 111 |
This is a bridge. Do not treat the public proxy as the final architecture.
Target State
| Host | WireGuard IP | Notes |
|---|---|---|
| 110 | 10.77.114.10 | DevOps host and rollback bridge |
| 120 | 10.77.114.120 | K3s node |
| 121 | 10.77.114.121 | K3s node |
| 111 | 10.77.114.111 | Local Ollama fallback |
| GCP-A | 10.77.114.21 | Primary Ollama |
| GCP-B | 10.77.114.22 | Secondary Ollama |
Production endpoints after cutover:
OLLAMA_URL: "http://10.77.114.21:11434"
OLLAMA_SECONDARY_URL: "http://10.77.114.22:11434"
OLLAMA_FALLBACK_URL: "http://10.77.114.111:11434"
Prerequisites
- SSH access to GCP-A and GCP-B.
- GCP IAM permissions for firewall rules if OS firewall alone is not enough.
- SSH access to 110, 111, 120, and 121.
- A secured place to store WireGuard private keys. Never commit private keys.
- Confirm the GCP hosts have enough CPU/RAM for gemma3:4b.
Key Rules
- Private keys are generated on each host and never copied into Git.
- Public keys may be recorded in the operator handoff note.
- Public GCP 11434/tcp must be closed after cutover.
- alert-fast uses gemma3:4b; 14B/32B models must not run on GCP-A/B during alert-lane canary.
Install WireGuard
Ubuntu/Debian:
sudo apt-get update
sudo apt-get install -y wireguard
Alpine:
sudo apk add --no-cache wireguard-tools
Generate keys on every host:
umask 077
# Discard tee's stdout so the private key is not echoed to the terminal.
wg genkey | sudo tee /etc/wireguard/awooop.key >/dev/null
sudo cat /etc/wireguard/awooop.key | wg pubkey | sudo tee /etc/wireguard/awooop.pub
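Because the key rules depend on private keys staying on the host, it can help to confirm the key file's permissions right after generating it. The sketch below assumes bash and a stat that supports -c (GNU coreutils or BusyBox); the function name key_is_private is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed only if the file grants no group/other access.
# Returns 0 for modes such as 600 or 400, 1 for anything wider,
# and 2 if the file cannot be inspected.
key_is_private() {
  local mode
  mode="$(stat -c '%a' "$1")" || return 2
  case "$mode" in
    [0-7]00) return 0 ;;
    *) return 1 ;;
  esac
}
```

From a root shell, key_is_private /etc/wireguard/awooop.key || echo "tighten permissions" would flag a key file that other users can read.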
Configure Peers
Create /etc/wireguard/wg-awooop.conf on each host.
Example for GCP-A:
[Interface]
Address = 10.77.114.21/32
ListenPort = 51820
PrivateKey = <GCP_A_PRIVATE_KEY>
[Peer]
# 120 K3s node
PublicKey = <K3S_120_PUBLIC_KEY>
AllowedIPs = 10.77.114.120/32
Endpoint = <120_REACHABLE_ENDPOINT>:51820
PersistentKeepalive = 25
[Peer]
# 121 K3s node
PublicKey = <K3S_121_PUBLIC_KEY>
AllowedIPs = 10.77.114.121/32
Endpoint = <121_REACHABLE_ENDPOINT>:51820
PersistentKeepalive = 25
[Peer]
# 110 DevOps rollback bridge
PublicKey = <HOST_110_PUBLIC_KEY>
AllowedIPs = 10.77.114.10/32
Endpoint = <110_REACHABLE_ENDPOINT>:51820
PersistentKeepalive = 25
Example for a K3s node:
[Interface]
Address = 10.77.114.120/32
ListenPort = 51820
PrivateKey = <K3S_120_PRIVATE_KEY>
[Peer]
# GCP-A
PublicKey = <GCP_A_PUBLIC_KEY>
AllowedIPs = 10.77.114.21/32
Endpoint = 34.143.170.20:51820
PersistentKeepalive = 25
[Peer]
# GCP-B
PublicKey = <GCP_B_PUBLIC_KEY>
AllowedIPs = 10.77.114.22/32
Endpoint = 34.21.145.224:51820
PersistentKeepalive = 25
[Peer]
# Local 111
PublicKey = <HOST_111_PUBLIC_KEY>
AllowedIPs = 10.77.114.111/32
Endpoint = 192.168.0.111:51820
PersistentKeepalive = 25
The exact peer list depends on reachable endpoints. If inbound access to 120/121 is not available, use 110 as a temporary mesh relay, then replace it with direct K3s-to-GCP peers when routing is confirmed.
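Every host repeats the same peer stanza with different values, so a small generator can reduce copy-paste mistakes. This is a minimal sketch, assuming bash; wg_peer_stanza is a hypothetical helper, not part of the WireGuard tooling, and it hard-codes the port 51820 and keepalive 25 used in the examples above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one [Peer] stanza for wg-awooop.conf.
# Usage: wg_peer_stanza <label> <public_key> <mesh_ip> [endpoint_host]
wg_peer_stanza() {
  local label="$1" pubkey="$2" mesh_ip="$3" endpoint="${4:-}"
  printf '[Peer]\n# %s\n' "$label"
  printf 'PublicKey = %s\n' "$pubkey"
  printf 'AllowedIPs = %s/32\n' "$mesh_ip"
  # Omit Endpoint when the peer has no reachable inbound address;
  # that peer must then initiate the handshake itself.
  if [ -n "$endpoint" ]; then
    printf 'Endpoint = %s:51820\n' "$endpoint"
  fi
  printf 'PersistentKeepalive = 25\n'
}
```

For example, wg_peer_stanza "GCP-A" "<GCP_A_PUBLIC_KEY>" 10.77.114.21 34.143.170.20 prints the GCP-A stanza from the K3s example, and leaving out the last argument covers the case where 120/121 cannot accept inbound connections.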
Start WireGuard
sudo systemctl enable --now wg-quick@wg-awooop
sudo wg show wg-awooop
Verify connectivity:
ping -c 3 10.77.114.21
ping -c 3 10.77.114.22
curl -fsS http://10.77.114.21:11434/api/tags
curl -fsS http://10.77.114.22:11434/api/tags
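Ping and curl confirm reachability at one moment; handshake age shows whether each tunnel is staying up. The sketch below filters the output of wg show wg-awooop latest-handshakes, which prints one public key and one Unix timestamp per line. It assumes bash and awk; the name stale_peers and the 180-second default threshold are illustrative choices.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read "<public_key> <unix_ts>" lines on stdin and print
# the public keys whose last handshake is missing (timestamp 0) or older
# than max_age seconds.
# Usage: stale_peers <now_unix_ts> [max_age_seconds]
stale_peers() {
  local now="$1" max_age="${2:-180}"
  awk -v now="$now" -v max="$max_age" \
    '$2 == 0 || (now - $2) > max { print $1 }'
}

# Live usage (needs root to read the interface state):
#   sudo wg show wg-awooop latest-handshakes | stale_peers "$(date +%s)"
```

An empty result means every configured peer has completed a handshake within the threshold.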
Bind or Firewall Ollama
Preferred: bind Ollama to the mesh interface.
sudo systemctl edit ollama
[Service]
Environment="OLLAMA_HOST=10.77.114.21:11434"
Use 10.77.114.22:11434 on GCP-B.
If binding is not possible, firewall the host:
sudo ufw allow from 10.77.114.0/24 to any port 11434 proto tcp
sudo ufw deny 11434/tcp
Then restart:
sudo systemctl daemon-reload
sudo systemctl restart ollama
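After the restart, it is worth confirming that Ollama no longer listens on a wildcard address when the bind approach was used. The sketch below scans ss -ltn output for wildcard listeners on a given port. It assumes bash, awk, and the iproute2 ss column layout (local address in the fourth column); wildcard_listeners is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical check: read `ss -ltn` output on stdin and print any listener
# on the given port that is bound to a wildcard address.
# Usage: wildcard_listeners [port]
wildcard_listeners() {
  local port="${1:-11434}"
  awk -v port="$port" '
    $1 == "LISTEN" {
      addr = $4
      if (addr == ("0.0.0.0:" port) || addr == ("*:" port) || addr == ("[::]:" port))
        print addr
    }'
}

# Live usage on GCP-A or GCP-B:
#   ss -ltn | wildcard_listeners 11434
```

No output means the port is bound only to specific addresses such as 10.77.114.21. If the firewall approach was used instead, a wildcard listener is expected and the ufw rules are what restrict access.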
K8s NetworkPolicy
After mesh cutover, allow only mesh endpoints for Ollama:
- to:
    - ipBlock:
        cidr: 10.77.114.21/32
    - ipBlock:
        cidr: 10.77.114.22/32
    - ipBlock:
        cidr: 10.77.114.111/32
  ports:
    - protocol: TCP
      port: 11434
Do not remove the 192.168.0.110:11435/11436 rules until rollback is no longer needed.
Shadow Validation
From the API pod:
bash scripts/ops/ollama-topology-check.sh
Expected:
- GCP-A /api/tags returns 200.
- GCP-B /api/tags returns 200.
- gemma3:4b generation succeeds on both nodes.
- /api/ps contains gemma3:4b.
- If size_vram=0, keep GCP-A/B on alert-fast only and route heavy models to 111 or a GPU-capable node.
Cutover
Patch deployment env after shadow passes:
kubectl -n awoooi-prod set env deploy/awoooi-api \
  OLLAMA_URL=http://10.77.114.21:11434 \
  OLLAMA_SECONDARY_URL=http://10.77.114.22:11434 \
  OLLAMA_FALLBACK_URL=http://10.77.114.111:11434
kubectl -n awoooi-prod set env deploy/awoooi-worker \
  OLLAMA_URL=http://10.77.114.21:11434 \
  OLLAMA_SECONDARY_URL=http://10.77.114.22:11434 \
  OLLAMA_FALLBACK_URL=http://10.77.114.111:11434
Verify:
kubectl -n awoooi-prod rollout status deploy/awoooi-api --timeout=180s
kubectl -n awoooi-prod rollout status deploy/awoooi-worker --timeout=180s
bash scripts/ops/ollama-topology-check.sh
Rollback
kubectl -n awoooi-prod set env deploy/awoooi-api \
  OLLAMA_URL=http://192.168.0.110:11435 \
  OLLAMA_SECONDARY_URL=http://192.168.0.110:11436 \
  OLLAMA_FALLBACK_URL=http://192.168.0.111:11434
kubectl -n awoooi-prod set env deploy/awoooi-worker \
  OLLAMA_URL=http://192.168.0.110:11435 \
  OLLAMA_SECONDARY_URL=http://192.168.0.110:11436 \
  OLLAMA_FALLBACK_URL=http://192.168.0.111:11434
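Cutover and rollback run the same kubectl set env command against both deployments and differ only in the three URLs, so one wrapper can keep the two deployments from drifting apart. This is a sketch under stated assumptions: bash, the namespace and deployment names used in this runbook, and a hypothetical function name set_ollama_endpoints. The KUBECTL override is an illustrative dry-run switch, not a kubectl feature.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: apply one set of Ollama endpoints to both deployments.
# Set KUBECTL="echo kubectl" to print the commands instead of running them.
# Usage: set_ollama_endpoints <primary_url> <secondary_url> <fallback_url>
set_ollama_endpoints() {
  local primary="$1" secondary="$2" fallback="$3"
  local deploy
  for deploy in awoooi-api awoooi-worker; do
    # KUBECTL is left unquoted on purpose so "echo kubectl" splits into words.
    ${KUBECTL:-kubectl} -n awoooi-prod set env "deploy/${deploy}" \
      "OLLAMA_URL=${primary}" \
      "OLLAMA_SECONDARY_URL=${secondary}" \
      "OLLAMA_FALLBACK_URL=${fallback}" || return 1
  done
}

# Rollback to the 110 nginx bridge:
#   set_ollama_endpoints http://192.168.0.110:11435 \
#     http://192.168.0.110:11436 http://192.168.0.111:11434
```

Running it first with KUBECTL="echo kubectl" shows the exact commands before anything is changed. After a real rollback, the rollout status commands from the Cutover section can be reused to confirm both deployments settled.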
Done Criteria
- Mesh endpoints pass 7 days of canary.
- Alert-lane Gemini usage is zero, except during documented all-Ollama outages.
- Public GCP 11434/tcp is closed.
- Operator runbook records peer public keys and rollback owner.