# GCP Ollama WireGuard Mesh Runbook

> Target state for ADR-125. After shadow and canary validation, this WireGuard
> mesh replaces the public GCP Ollama proxy as the primary path.

---

## Scope

This runbook builds private Ollama connectivity between the AWOOOI K3s nodes
and the GCP Ollama hosts.

It does not replace the AwoooP Inference Gateway work. The mesh solves
transport and security; the gateway solves routing, queueing, model residency,
and fallback.

## Current State

Current production endpoints:

| Variable | Endpoint | Meaning |
|----------|----------|---------|
| `OLLAMA_URL` | `http://192.168.0.110:11435` | GCP-A via nginx on 110 |
| `OLLAMA_SECONDARY_URL` | `http://192.168.0.110:11436` | GCP-B via nginx on 110 |
| `OLLAMA_FALLBACK_URL` | `http://192.168.0.111:11434` | Local Ollama on 111 |

This setup is a temporary bridge. Do not treat the public proxy as the final
architecture.

## Target State

| Host | WireGuard IP | Notes |
|------|--------------|-------|
| 110 | `10.77.114.10` | DevOps host and rollback bridge |
| 120 | `10.77.114.120` | K3s node |
| 121 | `10.77.114.121` | K3s node |
| 111 | `10.77.114.111` | Local Ollama fallback |
| GCP-A | `10.77.114.21` | Primary Ollama |
| GCP-B | `10.77.114.22` | Secondary Ollama |

Production endpoints after cutover:

```yaml
OLLAMA_URL: "http://10.77.114.21:11434"
OLLAMA_SECONDARY_URL: "http://10.77.114.22:11434"
OLLAMA_FALLBACK_URL: "http://10.77.114.111:11434"
```

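Before cutover, it can help to confirm that every configured endpoint actually points into the mesh range. This is a minimal sketch, assuming the mesh is `10.77.114.0/24` (every address in the table above falls inside it) and that URLs have the plain `http://IPv4:PORT` form used in this runbook. `url_host`, `ip_in_cidr`, and `check_mesh_env` are hypothetical helpers, not existing repo scripts:

```bash
# Hypothetical helpers: verify that Ollama endpoint URLs target the WireGuard mesh.
# Assumes IPv4 literals in URLs of the form http://HOST:PORT[/path].

# Extract the host part of a URL.
url_host() {
  local rest=${1#*://}          # drop the scheme
  rest=${rest%%/*}              # drop any path
  printf '%s\n' "${rest%%:*}"   # drop the port
}

# Convert a dotted IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<<"$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Return 0 if IPv4 address $1 is inside CIDR $2 (e.g. 10.77.114.0/24).
ip_in_cidr() {
  local net=${2%/*} bits=${2#*/} mask
  mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ( $(ip_to_int "$1") & mask ) == ( $(ip_to_int "$net") & mask ) ))
}

# Check the three Ollama variables; report offenders and return 1 if any is off-mesh.
check_mesh_env() {
  local cidr=${MESH_CIDR:-10.77.114.0/24} var host rc=0
  for var in OLLAMA_URL OLLAMA_SECONDARY_URL OLLAMA_FALLBACK_URL; do
    host=$(url_host "${!var}")
    if ! ip_in_cidr "$host" "$cidr"; then
      echo "$var=${!var} is outside $cidr" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example, export the three variables with the post-cutover values and run `check_mesh_env`; it stays silent and returns 0 when all three are on the mesh.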
## Prerequisites

- SSH access to GCP-A and GCP-B.
- GCP IAM permissions to change VPC firewall rules, in case the OS firewall
  alone is not enough.
- SSH access to 110, 111, 120, and 121.
- A secure place to store WireGuard private keys. Never commit private keys.
- Confirmation that the GCP hosts have enough CPU and RAM for `gemma3:4b`.

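A rough way to check RAM headroom on each GCP host is to read `MemAvailable` from `/proc/meminfo` (values are in kB); `nproc` gives the CPU count. This is a minimal sketch: `mem_available_kib` and `has_mem_headroom` are hypothetical helpers, and the threshold you pass is your own sizing decision, not a documented requirement for `gemma3:4b`:

```bash
# Hypothetical pre-flight check: is there enough available RAM for the model?
# Reads /proc/meminfo format (values in kB); the file path is a parameter for testing.

# Print MemAvailable in KiB; exit non-zero if the field is missing.
mem_available_kib() {
  awk '$1 == "MemAvailable:" { print $2; found = 1 } END { exit !found }' "${1:-/proc/meminfo}"
}

# Return 0 if MemAvailable >= $1 GiB. The GiB threshold is an assumption you supply.
has_mem_headroom() {
  local min_gib=$1 meminfo=${2:-/proc/meminfo} avail
  avail=$(mem_available_kib "$meminfo") || return 2
  (( avail >= min_gib * 1024 * 1024 ))
}

# Usage on GCP-A/B (6 is a placeholder threshold, not a measured requirement):
#   has_mem_headroom 6 && echo "RAM ok" || echo "RAM below threshold"; nproc
```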
## Key Rules

- Private keys are generated on each host and never copied into Git.
- Public keys may be recorded in the operator handoff note.
- Public GCP `11434/tcp` must be closed after cutover.
- `alert-fast` uses `gemma3:4b`; 14B/32B models must not run on GCP-A/B during
  the alert-lane canary.

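The first rule can also be enforced mechanically, for example from a pre-commit hook or CI step that refuses files containing a real-looking WireGuard private key. This is a minimal sketch, assuming keys would appear as `PrivateKey = <44-character base64>` (32 bytes encode to 43 base64 characters plus one `=`), so placeholders such as `<GCP_A_PRIVATE_KEY>` do not match. `find_private_keys` is a hypothetical helper:

```bash
# Hypothetical guard: fail if any given file contains a literal WireGuard private key.
# A WireGuard key is 32 bytes: 43 base64 characters followed by one "=".
find_private_keys() {
  (( $# )) || return 0   # no files given: nothing to scan (and don't read stdin)
  local pattern='^[[:space:]]*PrivateKey[[:space:]]*=[[:space:]]*[A-Za-z0-9+/]{43}='
  # -E extended regex, -n line numbers, -H file names, -I skip binary files.
  if grep -EnHI "$pattern" "$@"; then
    echo "refusing: WireGuard private key material found" >&2
    return 1
  fi
  return 0
}

# Usage (assumes tracked file names contain no whitespace):
#   find_private_keys $(git ls-files)
```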
## Install WireGuard

Ubuntu/Debian:

```bash
sudo apt-get update
sudo apt-get install -y wireguard
```

Alpine:

```bash
sudo apk add --no-cache wireguard-tools
```

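If one bootstrap script has to cover both distributions, the package manager can be picked from `/etc/os-release`. This is a minimal sketch that only prints the matching command from above, so it is safe to dry-run; it assumes the `ID`/`ID_LIKE` values `ubuntu`, `debian`, and `alpine`. `wg_install_cmd` is a hypothetical helper:

```bash
# Hypothetical helper: print the WireGuard install command for this distribution.
# Takes an os-release path (default /etc/os-release) so it can be tested.
wg_install_cmd() {
  local file=${1:-/etc/os-release} id id_like
  # os-release is shell-compatible KEY=value; source it in subshells to avoid leaking vars.
  id=$(. "$file" && printf '%s' "${ID:-}")
  id_like=$(. "$file" && printf '%s' "${ID_LIKE:-}")
  case " $id $id_like " in
    *" alpine "*) echo "sudo apk add --no-cache wireguard-tools" ;;
    *" debian "*|*" ubuntu "*) echo "sudo apt-get update && sudo apt-get install -y wireguard" ;;
    *) echo "unsupported distribution: ${id:-unknown}" >&2; return 1 ;;
  esac
}

# Usage: review the printed command, then run it, e.g.
#   eval "$(wg_install_cmd)"
```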
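Generate keys on every host:

```bash
# Run on every host; the private key never leaves it.
sudo install -d -m 700 /etc/wireguard
umask 077
# Discard tee's stdout so the private key is not echoed to the terminal.
wg genkey | sudo tee /etc/wireguard/awooop.key >/dev/null
sudo cat /etc/wireguard/awooop.key | wg pubkey | sudo tee /etc/wireguard/awooop.pub
```
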
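Two quick checks after generation: the private key should have owner-only permissions, and the recorded public key should actually derive from it. This is a minimal sketch, assuming GNU or BusyBox `stat -c`; `key_mode_ok`, `pub_matches`, and the file name `wg-key-check.sh` are hypothetical:

```bash
# Hypothetical post-generation checks for the key files created above.

# Return 0 if the private key file has no group/other permissions (600 or 400).
key_mode_ok() {
  local mode
  mode=$(stat -c '%a' "$1") || return 1
  case "$mode" in
    600|400) return 0 ;;
    *) echo "$1 has mode $mode; expected 600" >&2; return 1 ;;
  esac
}

# Return 0 if the stored public key is derived from the private key (needs wg and root).
pub_matches() {
  [ "$(wg pubkey < "$1")" = "$(cat "$2")" ]
}

# Usage on each host, with the functions saved as wg-key-check.sh:
#   sudo bash -c '. ./wg-key-check.sh &&
#     key_mode_ok /etc/wireguard/awooop.key &&
#     pub_matches /etc/wireguard/awooop.key /etc/wireguard/awooop.pub && echo ok'
```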
## Configure Peers

Create `/etc/wireguard/wg-awooop.conf` on each host. `wg-quick` takes the
interface name (`wg-awooop`) from the file name. Fill `PrivateKey` from that
host's `/etc/wireguard/awooop.key` and each `PublicKey` from the peer's
`awooop.pub`. The file contains a private key, so keep it mode `600` and out of
Git.

Example for GCP-A:

```ini
[Interface]
Address = 10.77.114.21/32
ListenPort = 51820
PrivateKey = <GCP_A_PRIVATE_KEY>

[Peer]
# 120 K3s node
PublicKey = <K3S_120_PUBLIC_KEY>
AllowedIPs = 10.77.114.120/32
Endpoint = <120_REACHABLE_ENDPOINT>:51820
PersistentKeepalive = 25

[Peer]
# 121 K3s node
PublicKey = <K3S_121_PUBLIC_KEY>
AllowedIPs = 10.77.114.121/32
Endpoint = <121_REACHABLE_ENDPOINT>:51820
PersistentKeepalive = 25

[Peer]
# 110 DevOps rollback bridge
PublicKey = <HOST_110_PUBLIC_KEY>
AllowedIPs = 10.77.114.10/32
Endpoint = <110_REACHABLE_ENDPOINT>:51820
PersistentKeepalive = 25
```

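With six hosts, hand-editing `[Peer]` stanzas is error-prone. A small generator can render them from the address plan in the Target State table. This is a minimal sketch; `render_peer` is a hypothetical helper that takes only public keys, and an empty endpoint argument omits the `Endpoint` line. Without an `Endpoint`, that side cannot initiate a handshake and instead answers the other peer's handshake:

```bash
# Hypothetical generator for one [Peer] stanza of /etc/wireguard/wg-awooop.conf.
# Args: comment, peer public key, peer mesh IP, peer endpoint host (may be empty).
render_peer() {
  local comment=$1 pubkey=$2 mesh_ip=$3 endpoint=$4
  printf '[Peer]\n# %s\nPublicKey = %s\nAllowedIPs = %s/32\n' \
    "$comment" "$pubkey" "$mesh_ip"
  # No Endpoint: this side waits for the peer to handshake first.
  if [ -n "$endpoint" ]; then
    printf 'Endpoint = %s:51820\n' "$endpoint"
  fi
  printf 'PersistentKeepalive = 25\n\n'
}

# Usage on a K3s node (public keys come from the operator handoff note):
#   render_peer "GCP-A" "$GCP_A_PUB" 10.77.114.21 34.143.170.20 >> /tmp/peers.conf
```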
Example for K3s node 120 (on 121, use `Address = 10.77.114.121/32` and 121's
own private key):

```ini
[Interface]
Address = 10.77.114.120/32
ListenPort = 51820
PrivateKey = <K3S_120_PRIVATE_KEY>

[Peer]
# GCP-A
PublicKey = <GCP_A_PUBLIC_KEY>
AllowedIPs = 10.77.114.21/32
Endpoint = 34.143.170.20:51820
PersistentKeepalive = 25

[Peer]
# GCP-B
PublicKey = <GCP_B_PUBLIC_KEY>
AllowedIPs = 10.77.114.22/32
Endpoint = 34.21.145.224:51820
PersistentKeepalive = 25

[Peer]
# Local 111
PublicKey = <HOST_111_PUBLIC_KEY>
AllowedIPs = 10.77.114.111/32
Endpoint = 192.168.0.111:51820
PersistentKeepalive = 25
```

The exact peer list depends on which endpoints are reachable. If 120/121 cannot
accept inbound connections, use 110 as a temporary mesh relay, then replace it
with direct K3s-to-GCP peers once routing is confirmed.

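Relay mode changes routing, not just the peer list. WireGuard picks the peer for an outgoing packet by matching its destination against each peer's `AllowedIPs`, so the GCP hosts must route the K3s addresses through their 110 peer, and 110 must forward traffic between peers. This is a sketch of the `AllowedIPs` values that implies, assuming 110 is the only relay. `relay_allowed_ips` is a hypothetical helper, and the forwarding commands in the comments are generic Linux settings that this runbook does not otherwise configure:

```bash
# Hypothetical helper for temporary relay mode through 110 (10.77.114.10).
# Prints the AllowedIPs value for the 110 peer on each side of the mesh:
#   gcp: on GCP-A/B, the 110 peer carries 110 itself plus both K3s nodes.
#   k3s: on 120/121, the 110 peer carries 110 itself plus both GCP hosts.
relay_allowed_ips() {
  case "$1" in
    gcp) echo "10.77.114.10/32, 10.77.114.120/32, 10.77.114.121/32" ;;
    k3s) echo "10.77.114.10/32, 10.77.114.21/32, 10.77.114.22/32" ;;
    *) echo "usage: relay_allowed_ips gcp|k3s" >&2; return 1 ;;
  esac
}

# In relay mode, drop the direct 120/121 peers on GCP-A/B (and the direct GCP
# peers on 120/121): an AllowedIPs address can belong to only one peer.
#
# On 110, enable IPv4 forwarding between WireGuard peers, for example:
#   sudo sysctl -w net.ipv4.ip_forward=1
# If the FORWARD chain policy is DROP, also allow traffic within the mesh:
#   sudo iptables -A FORWARD -i wg-awooop -o wg-awooop -j ACCEPT
```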
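## Start WireGuard

On systemd hosts:

```bash
sudo systemctl enable --now wg-quick@wg-awooop
sudo wg show wg-awooop
```

Alpine uses OpenRC rather than systemd, so `systemctl` is not available there;
bring the interface up with `sudo wg-quick up wg-awooop` instead.
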
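`wg show` also reports each peer's latest handshake. WireGuard re-handshakes roughly every two minutes while traffic flows, and `PersistentKeepalive = 25` keeps traffic flowing, so a handshake older than about three minutes suggests the peer is not connected. This is a minimal sketch that reads `wg show wg-awooop latest-handshakes`, which prints one `<public key><TAB><unix time>` line per peer (`0` means never). `stale_peers` is a hypothetical helper, and the 180-second default is an assumption:

```bash
# Hypothetical check: list peers whose last handshake is older than a threshold.
# Input: output of `wg show <iface> latest-handshakes` on stdin.
# Args: max age in seconds (default 180), current unix time (default: now).
stale_peers() {
  local max_age=${1:-180} now=${2:-$(date +%s)}
  awk -v max="$max_age" -v now="$now" '
    NF >= 2 {
      if ($2 == 0) print $1 " never"
      else if (now - $2 > max) print $1 " " (now - $2) "s"
    }'
}

# Usage (no output means every peer handshaked within the threshold):
#   sudo wg show wg-awooop latest-handshakes | stale_peers 180
```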
Verify connectivity from a peer of the GCP hosts (110, 120, or 121):

```bash
ping -c 3 10.77.114.21
ping -c 3 10.77.114.22
curl -fsS http://10.77.114.21:11434/api/tags
curl -fsS http://10.77.114.22:11434/api/tags
```

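`curl -fsS` only proves that the API answered. To also confirm that `gemma3:4b` is pulled on each node, the `/api/tags` response can be checked for the model name. This is a minimal sketch, assuming the compact JSON that Ollama returns (`jq` is more robust if it is installed). `has_model` and `check_node` are hypothetical helpers:

```bash
# Hypothetical helpers for the connectivity check above.

# Return 0 if /api/tags JSON on stdin lists model $1 by name (name is used as a regex).
has_model() {
  grep -Eq "\"name\":[[:space:]]*\"$1\""
}

# Probe one node: API reachable within 5 s and model present.
check_node() {
  local base=$1 model=${2:-gemma3:4b} body
  body=$(curl -fsS --max-time 5 "$base/api/tags") || { echo "$base: unreachable" >&2; return 1; }
  printf '%s' "$body" | has_model "$model" || { echo "$base: $model missing" >&2; return 1; }
  echo "$base: ok"
}

# Usage:
#   for ip in 10.77.114.21 10.77.114.22; do check_node "http://$ip:11434"; done
```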
## Bind or Firewall Ollama

Preferred: bind Ollama to the host's mesh address so it does not listen on the
public interface at all.

```bash
sudo systemctl edit ollama
```

Add this override on GCP-A:

```ini
[Service]
Environment="OLLAMA_HOST=10.77.114.21:11434"
```

Use `10.77.114.22:11434` on GCP-B.

If binding is not possible, firewall the host instead:

```bash
# Keep the WireGuard port open, then allow Ollama only from mesh peers.
sudo ufw allow 51820/udp
sudo ufw allow from 10.77.114.0/24 to any port 11434 proto tcp
sudo ufw deny 11434/tcp
```

After changing the service override, reload systemd and restart Ollama:

```bash
sudo systemctl daemon-reload
sudo systemctl restart ollama
```

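After the restart, `ss` shows which address Ollama is listening on. This is a minimal sketch that reads `ss -ltnH` output (listening TCP sockets, numeric, no header, local address in column 4) and fails if port 11434 is bound to a wildcard address. `ollama_bind_ok` is a hypothetical helper. With the firewall option instead of binding, a wildcard bind is expected, and the ufw rules are what protect the port:

```bash
# Hypothetical check: Ollama must be listening, and not on a wildcard address.
# Input: `ss -ltnH` output on stdin (column 4 is Local Address:Port).
ollama_bind_ok() {
  local port=${1:-11434}
  awk -v port="$port" '
    {
      n = split($4, parts, ":")
      if (parts[n] != port) next              # not the Ollama port
      seen = 1
      addr = substr($4, 1, length($4) - length(port) - 1)
      if (addr == "0.0.0.0" || addr == "*" || addr == "[::]") bad = 1
      print "listening on " $4
    }
    END { exit (seen && !bad) ? 0 : 1 }'
}

# Usage on GCP-A/B:
#   ss -ltnH | ollama_bind_ok 11434
```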
## K8s NetworkPolicy

After mesh cutover, allow Ollama egress only to the mesh endpoints:

```yaml
# Egress rule entry (under spec.egress of the NetworkPolicy).
- to:
    - ipBlock:
        cidr: 10.77.114.21/32
    - ipBlock:
        cidr: 10.77.114.22/32
    - ipBlock:
        cidr: 10.77.114.111/32
  ports:
    - protocol: TCP
      port: 11434
```

Keep the existing egress rules for the bridge endpoints
(`192.168.0.110:11435/11436`, plus the `192.168.0.111:11434` fallback if it has
its own rule) until rollback is no longer needed; the rollback endpoints depend
on them.

## Shadow Validation

From the API pod:

```bash
bash scripts/ops/ollama-topology-check.sh
```

Expected:

- GCP-A `/api/tags` returns 200.
- GCP-B `/api/tags` returns 200.
- `gemma3:4b` generation succeeds on both nodes.
- `/api/ps` lists `gemma3:4b`.
- If `/api/ps` reports `size_vram` of 0 (the model is running on CPU only),
  keep GCP-A/B on `alert-fast` only and route heavy models to 111 or a
  GPU-capable node.

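The generation and `/api/ps` expectations can also be spot-checked by hand. This is a minimal sketch, assuming Ollama's compact JSON and that each `/api/ps` entry includes `size_vram` (bytes held in GPU memory). The grep-based parsing is deliberately crude and assumes only `gemma3:4b` is loaded during the shadow test; the 120-second timeout is an assumed allowance for a cold model load. `ps_has_model`, `ps_cpu_only`, and `generate_ok` are hypothetical helpers:

```bash
# Hypothetical helpers for the shadow-validation expectations above.

# Return 0 if /api/ps JSON on stdin lists model $1 as loaded.
ps_has_model() {
  grep -Eq "\"name\":[[:space:]]*\"$1\""
}

# Return 0 if any loaded model reports size_vram of 0 (running on CPU only).
ps_cpu_only() {
  grep -Eq '"size_vram":[[:space:]]*0[,}[:space:]]'
}

# Return 0 if a short non-streaming generation completes on node $1.
generate_ok() {
  local base=$1 model=${2:-gemma3:4b}
  curl -fsS --max-time 120 "$base/api/generate" \
    -H 'Content-Type: application/json' \
    -d "{\"model\":\"$model\",\"prompt\":\"Reply with OK.\",\"stream\":false}" \
    | grep -Eq '"done":[[:space:]]*true'
}

# Usage per node, after generate_ok has loaded the model:
#   ps=$(curl -fsS "http://10.77.114.21:11434/api/ps")
#   printf '%s' "$ps" | ps_has_model gemma3:4b || echo "gemma3:4b not loaded"
#   printf '%s' "$ps" | ps_cpu_only && echo "CPU only: keep this node on alert-fast"
```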
## Cutover

After shadow validation passes, patch the deployment env:

```bash
kubectl -n awoooi-prod set env deploy/awoooi-api \
  OLLAMA_URL=http://10.77.114.21:11434 \
  OLLAMA_SECONDARY_URL=http://10.77.114.22:11434 \
  OLLAMA_FALLBACK_URL=http://10.77.114.111:11434

kubectl -n awoooi-prod set env deploy/awoooi-worker \
  OLLAMA_URL=http://10.77.114.21:11434 \
  OLLAMA_SECONDARY_URL=http://10.77.114.22:11434 \
  OLLAMA_FALLBACK_URL=http://10.77.114.111:11434
```

Verify:

```bash
kubectl -n awoooi-prod rollout status deploy/awoooi-api --timeout=180s
kubectl -n awoooi-prod rollout status deploy/awoooi-worker --timeout=180s
bash scripts/ops/ollama-topology-check.sh
```

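`rollout status` confirms that the pods restarted, but not which endpoints they picked up. `kubectl set env --list` prints a deployment's environment as `KEY=value` lines (with `#` comment lines naming the container), so a small filter can confirm the cutover values. This is a minimal sketch; `env_points_to_mesh` is a hypothetical helper:

```bash
# Hypothetical check on `kubectl set env --list` output (stdin): every
# OLLAMA_URL / OLLAMA_SECONDARY_URL / OLLAMA_FALLBACK_URL must use a mesh address.
env_points_to_mesh() {
  awk -F= '
    /^OLLAMA_(SECONDARY_|FALLBACK_)?URL=/ {
      found++
      if ($2 !~ /^http:\/\/10\.77\.114\.[0-9]+:11434$/) { print "not on mesh: " $0; bad = 1 }
    }
    END { exit (found >= 3 && !bad) ? 0 : 1 }'
}

# Usage:
#   for d in awoooi-api awoooi-worker; do
#     kubectl -n awoooi-prod set env "deploy/$d" --list | env_points_to_mesh || echo "$d: check env"
#   done
```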
## Rollback

Point both deployments back at the 110 nginx bridge and local 111:

```bash
kubectl -n awoooi-prod set env deploy/awoooi-api \
  OLLAMA_URL=http://192.168.0.110:11435 \
  OLLAMA_SECONDARY_URL=http://192.168.0.110:11436 \
  OLLAMA_FALLBACK_URL=http://192.168.0.111:11434

kubectl -n awoooi-prod set env deploy/awoooi-worker \
  OLLAMA_URL=http://192.168.0.110:11435 \
  OLLAMA_SECONDARY_URL=http://192.168.0.110:11436 \
  OLLAMA_FALLBACK_URL=http://192.168.0.111:11434
```

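Cutover and rollback differ only in the three URLs, so keeping both sets in one place reduces the chance of a typo during an incident. This is a minimal sketch; `ollama_env` and `apply_ollama_env` are hypothetical helpers whose values are copied from the Current State and Target State sections:

```bash
# Hypothetical single source of truth for the Ollama endpoint sets.
# Prints KEY=value arguments for `kubectl set env`.
ollama_env() {
  case "$1" in
    mesh)
      echo "OLLAMA_URL=http://10.77.114.21:11434"
      echo "OLLAMA_SECONDARY_URL=http://10.77.114.22:11434"
      echo "OLLAMA_FALLBACK_URL=http://10.77.114.111:11434"
      ;;
    bridge)
      echo "OLLAMA_URL=http://192.168.0.110:11435"
      echo "OLLAMA_SECONDARY_URL=http://192.168.0.110:11436"
      echo "OLLAMA_FALLBACK_URL=http://192.168.0.111:11434"
      ;;
    *) echo "usage: ollama_env mesh|bridge" >&2; return 1 ;;
  esac
}

# Apply one set to both deployments (mesh = cutover, bridge = rollback).
apply_ollama_env() {
  local -a args
  local d
  mapfile -t args < <(ollama_env "$1")
  (( ${#args[@]} == 3 )) || return 1
  for d in awoooi-api awoooi-worker; do
    kubectl -n awoooi-prod set env "deploy/$d" "${args[@]}" || return 1
  done
}

# Usage:
#   apply_ollama_env bridge   # rollback; then run the rollout status checks
```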
## Done Criteria

- Mesh endpoints pass a 7-day canary.
- Alert-lane Gemini usage is zero, except during documented outages in which
  all Ollama endpoints were down.
- Public GCP `11434/tcp` is closed.
- The operator runbook records the peer public keys and the rollback owner.

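The port criterion can be checked from a host outside both GCP and the mesh, using bash's built-in `/dev/tcp` redirection under a timeout. This is a minimal sketch; `port_open` is a hypothetical helper, and the public IPs are the GCP endpoints from the K3s peer example. A timeout and a refused connection both count as closed:

```bash
# Hypothetical external check: is a TCP port reachable? Uses bash's /dev/tcp.
# Returns 0 if a connection succeeds within 3 seconds, non-zero otherwise.
port_open() {
  timeout 3 bash -c ': < "/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Usage from outside GCP and the mesh (expect "closed" for both):
#   for ip in 34.143.170.20 34.21.145.224; do
#     if port_open "$ip" 11434; then echo "$ip:11434 OPEN"; else echo "$ip:11434 closed"; fi
#   done
```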