Your Name ed7c6946cb

docs(awooop): define private Ollama mesh gateway

2026-05-05 22:56:22 +08:00

GCP Ollama WireGuard Mesh Runbook

Target state for ADR-125. This replaces the public GCP Ollama proxy as the primary path after shadow and canary validation.

Scope

This runbook builds private Ollama connectivity between AWOOOI K3s and the GCP Ollama hosts.

It does not replace AwoooP Inference Gateway work. The mesh solves transport and security. The gateway solves routing, queueing, model residency, and fallback.

Current State

Current production endpoints:

Variable	Endpoint	Route
`OLLAMA_URL`	`http://192.168.0.110:11435`	GCP-A via nginx on 110
`OLLAMA_SECONDARY_URL`	`http://192.168.0.110:11436`	GCP-B via nginx on 110
`OLLAMA_FALLBACK_URL`	`http://192.168.0.111:11434`	Local Ollama on 111

This is a bridge. Do not treat the public proxy as the final architecture.

Target State

Host	WireGuard IP	Notes
110	`10.77.114.10`	DevOps host and rollback bridge
120	`10.77.114.120`	K3s node
121	`10.77.114.121`	K3s node
111	`10.77.114.111`	Local Ollama fallback
GCP-A	`10.77.114.21`	Primary Ollama
GCP-B	`10.77.114.22`	Secondary Ollama

Production endpoints after cutover:

OLLAMA_URL: "http://10.77.114.21:11434"
OLLAMA_SECONDARY_URL: "http://10.77.114.22:11434"
OLLAMA_FALLBACK_URL: "http://10.77.114.111:11434"

Prerequisites

SSH access to GCP-A and GCP-B.
GCP IAM permissions for VPC firewall rules. GCP-A and GCP-B must accept inbound UDP 51820 for WireGuard, and a host firewall cannot open traffic that the VPC firewall blocks.
SSH access to 110, 111, 120, and 121.
A secure place to store WireGuard private keys. Never commit private keys.
Confirm the GCP hosts have enough CPU/RAM for gemma3:4b.

Key Rules

Private keys are generated on each host and never copied into Git.
Public keys may be recorded in the operator handoff note.
Public GCP 11434/tcp must be closed after cutover.
alert-fast uses gemma3:4b; 14B/32B models must not run on GCP-A/B during alert-lane canary.
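A pre-commit or CI step could scan WireGuard configs and catch a real key that has replaced a `<PLACEHOLDER>`. This backs up the rule that private keys never reach Git. The sketch below is a hypothetical helper; `find_private_keys` is not part of this repo. It assumes keys are the standard 32-byte value in base64 (43 characters plus `=`). It prints only file names, so a matched key is not echoed into logs.

```shell
# Hypothetical guard (not an existing repo script): list files containing a
# WireGuard "PrivateKey = <base64>" line with a real-looking key instead of a
# <PLACEHOLDER>. Only file names are printed so the key is not copied to logs.
# Exit status: 0 = clean, 1 = key material found, 2 = grep error.
find_private_keys() {
  [ "$#" -gt 0 ] || return 0   # nothing to scan; avoids grep reading stdin
  status=0
  grep -lE 'PrivateKey[[:space:]]*=[[:space:]]*[A-Za-z0-9+/]{43}=' "$@" || status=$?
  case $status in
    0) echo "private key material found in the files listed above" >&2; return 1 ;;
    1) return 0 ;;
    *) return 2 ;;
  esac
}
```

In a pre-commit hook it could run as `find_private_keys $(git diff --cached --name-only --diff-filter=ACM)`. That form assumes file names without spaces.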

Install WireGuard

Ubuntu/Debian:

sudo apt-get update
sudo apt-get install -y wireguard

Alpine:

sudo apk add --no-cache wireguard-tools

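Generate keys on every host. The private key goes straight into a root-only file and is never echoed. Only the public key is printed, for the operator handoff note:

sudo install -d -m 700 /etc/wireguard
sudo sh -c 'umask 077; wg genkey > /etc/wireguard/awooop.key'
sudo sh -c 'wg pubkey < /etc/wireguard/awooop.key > /etc/wireguard/awooop.pub'
sudo cat /etc/wireguard/awooop.pub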

Configure Peers

Create /etc/wireguard/wg-awooop.conf on each host. It embeds the private key, so keep it root-only (mode 600).

Example for GCP-A:

[Interface]
Address = 10.77.114.21/32
ListenPort = 51820
PrivateKey = <GCP_A_PRIVATE_KEY>

[Peer]
# 120 K3s node
PublicKey = <K3S_120_PUBLIC_KEY>
AllowedIPs = 10.77.114.120/32
Endpoint = <120_REACHABLE_ENDPOINT>:51820
PersistentKeepalive = 25

[Peer]
# 121 K3s node
PublicKey = <K3S_121_PUBLIC_KEY>
AllowedIPs = 10.77.114.121/32
Endpoint = <121_REACHABLE_ENDPOINT>:51820
PersistentKeepalive = 25

[Peer]
# 110 DevOps rollback bridge
PublicKey = <HOST_110_PUBLIC_KEY>
AllowedIPs = 10.77.114.10/32
Endpoint = <110_REACHABLE_ENDPOINT>:51820
PersistentKeepalive = 25

Example for a K3s node:

[Interface]
Address = 10.77.114.120/32
ListenPort = 51820
PrivateKey = <K3S_120_PRIVATE_KEY>

[Peer]
# GCP-A
PublicKey = <GCP_A_PUBLIC_KEY>
AllowedIPs = 10.77.114.21/32
Endpoint = 34.143.170.20:51820
PersistentKeepalive = 25

[Peer]
# GCP-B
PublicKey = <GCP_B_PUBLIC_KEY>
AllowedIPs = 10.77.114.22/32
Endpoint = 34.21.145.224:51820
PersistentKeepalive = 25

[Peer]
# Local 111
PublicKey = <HOST_111_PUBLIC_KEY>
AllowedIPs = 10.77.114.111/32
Endpoint = 192.168.0.111:51820
PersistentKeepalive = 25
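Every host needs a slightly different peer list, so generating the stanzas may be safer than hand-editing them. A minimal sketch, assuming a hypothetical `render_peer` helper that takes the comment, public key, mesh address, and endpoint as arguments. It never reads private keys. Leaving the endpoint empty omits the `Endpoint` line. WireGuard allows that for a peer that always initiates, because the other side learns the peer's address from its first authenticated packet.

```shell
# Hypothetical helper: print one [Peer] stanza in the layout used above.
# Arguments: comment, public key, mesh IP (no prefix), endpoint host:port.
# An empty endpoint omits the Endpoint line; that peer must then initiate,
# and this host learns its address from the first valid handshake.
render_peer() (
  comment=$1
  pubkey=$2
  mesh_ip=$3
  endpoint=$4
  printf '[Peer]\n'
  printf '# %s\n' "$comment"
  printf 'PublicKey = %s\n' "$pubkey"
  # A /32 per peer keeps each mesh address owned by exactly one peer.
  printf 'AllowedIPs = %s/32\n' "$mesh_ip"
  if [ -n "$endpoint" ]; then
    printf 'Endpoint = %s\n' "$endpoint"
  fi
  printf 'PersistentKeepalive = 25\n'
)
```

For example, on 120 you could run `render_peer "GCP-A" "$(cat gcp-a.pub)" 10.77.114.21 34.143.170.20:51820`. The file name `gcp-a.pub` is hypothetical.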

The exact peer list depends on reachable endpoints. If inbound access to 120/121 is not available, use 110 as a temporary mesh relay, then replace it with direct K3s-to-GCP peers when routing is confirmed.
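In relay mode the `AllowedIPs` move, not just the endpoints. WireGuard routes by `AllowedIPs`, and an address can belong to only one peer on an interface. The sketch below is one assumed layout for the 110 relay, using a hypothetical `relay_peer` helper:

- K3s nodes reach GCP-A/B through a single 110 peer.
- GCP-A/B reach 120/121 through their 110 peer, which replaces the direct 120/121 stanzas.
- 110 forwards traffic between its peers.

```shell
# Hypothetical sketch of the temporary relay through 110.
# 110 itself also needs forwarding between peers on the same interface, e.g.:
#   sudo sysctl -w net.ipv4.ip_forward=1
#   sudo iptables -A FORWARD -i wg-awooop -o wg-awooop -j ACCEPT
# and its own config keeps one /32 [Peer] for each of 120, 121, GCP-A, GCP-B.

# side=k3s: the single 110 peer on 120/121 (replaces direct GCP-A/GCP-B peers)
# side=gcp: the 110 peer on GCP-A/GCP-B (replaces the direct 120/121 peers)
relay_peer() (
  side=$1
  pubkey_110=$2
  endpoint_110=$3
  case $side in
    k3s) comment='110 relay to GCP-A and GCP-B'
         allowed='10.77.114.10/32, 10.77.114.21/32, 10.77.114.22/32' ;;
    gcp) comment='110 relay to K3s 120 and 121'
         allowed='10.77.114.10/32, 10.77.114.120/32, 10.77.114.121/32' ;;
    *)   echo "relay_peer: side must be k3s or gcp" >&2
         exit 2 ;;
  esac
  printf '[Peer]\n# %s\n' "$comment"
  printf 'PublicKey = %s\n' "$pubkey_110"
  printf 'AllowedIPs = %s\n' "$allowed"
  if [ -n "$endpoint_110" ]; then
    printf 'Endpoint = %s\n' "$endpoint_110"
  fi
  printf 'PersistentKeepalive = 25\n'
)
```

To return to direct peers, remove the relay stanza on each side and restore the per-host /32 stanzas. Two peers must never claim the same address.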

Start WireGuard

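sudo systemctl enable --now wg-quick@wg-awooop
sudo wg show wg-awooop

Alpine hosts run OpenRC rather than systemd. There, bring the interface up with `sudo wg-quick up wg-awooop` and register it with the host's init system so it starts at boot.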

Verify connectivity from 120 and 121 (each K3s node peers with GCP-A and GCP-B):

ping -c 3 10.77.114.21
ping -c 3 10.77.114.22
curl -fsS http://10.77.114.21:11434/api/tags
curl -fsS http://10.77.114.22:11434/api/tags
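`ping` and `curl` only prove the path at one moment. For a repeatable check, read the latest handshake per peer from `wg show wg-awooop latest-handshakes`. It prints one public key and Unix timestamp per line, with 0 if the peer has never completed a handshake. With `PersistentKeepalive = 25`, an active tunnel normally re-handshakes about every two minutes. The 180-second threshold in this hypothetical `stale_peers` helper is therefore an assumption to tune, not a documented limit.

```shell
# Hypothetical helper: read `wg show <iface> latest-handshakes` on stdin
# ("<public-key><TAB><unix-seconds>" per peer, 0 = never) and print the keys
# whose last handshake is missing or older than max_age seconds.
# Exit status: 0 = all peers fresh, 1 = at least one stale peer.
stale_peers() (
  max_age=${1:-180}        # assumed threshold, not a WireGuard constant
  now=${2:-$(date +%s)}    # second argument lets tests pin the clock
  awk -v now="$now" -v max="$max_age" '
    NF >= 2 && ($2 == 0 || now - $2 > max) { print $1; stale = 1 }
    END { exit (stale ? 1 : 0) }
  '
)
```

Usage: `sudo wg show wg-awooop latest-handshakes | stale_peers 180` prints the stale keys and exits non-zero if any peer is stale.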

Bind or Firewall Ollama

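Preferred: bind Ollama to the mesh interface. Order it after the WireGuard unit. Otherwise Ollama can start before the mesh address exists and fail to bind:

sudo systemctl edit ollama

[Unit]
After=wg-quick@wg-awooop.service
Wants=wg-quick@wg-awooop.service

[Service]
Environment="OLLAMA_HOST=10.77.114.21:11434"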

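Use 10.77.114.22:11434 on GCP-B. The `ollama` CLI on the host reads the same variable, so local commands need it too, for example `OLLAMA_HOST=10.77.114.21:11434 ollama list`.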

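If binding is not possible, firewall the host instead. UFW applies the first matching rule, so the mesh allow must come before the deny. These rules take effect only once UFW is active. If it is not active yet, keep SSH and WireGuard open before running `sudo ufw enable`:

sudo ufw allow from 10.77.114.0/24 to any port 11434 proto tcp
sudo ufw deny 11434/tcp
sudo ufw allow 22/tcp
sudo ufw allow 51820/udp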

After changing the Ollama unit, reload systemd and restart Ollama:

sudo systemctl daemon-reload
sudo systemctl restart ollama
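With the bind approach, the result is visible in the listening sockets. A minimal sketch, assuming the iproute2 `ss -ltnH` column layout, where the local address is in the fourth column. `check_ollama_bind` is a hypothetical helper that fails if nothing listens on port 11434 or if the port is bound anywhere other than the host's mesh address. It does not fit the UFW-only path, where Ollama intentionally stays on a wildcard address.

```shell
# Hypothetical check: read `ss -ltnH` output on stdin and require that every
# listener on port 11434 is bound to the expected mesh address, not to
# 0.0.0.0, [::], or *. Assumes iproute2 columns:
#   State Recv-Q Send-Q Local-Address:Port Peer-Address:Port ...
check_ollama_bind() (
  want="$1:11434"    # e.g. 10.77.114.21 on GCP-A, 10.77.114.22 on GCP-B
  awk -v want="$want" '
    $4 ~ /:11434$/ {
      seen = 1
      if ($4 != want) { print "unexpected listener: " $4; bad = 1 }
    }
    END {
      if (!seen) { print "nothing is listening on 11434"; bad = 1 }
      exit (bad ? 1 : 0)
    }
  '
)
```

Usage on GCP-A: `ss -ltnH | check_ollama_bind 10.77.114.21`. On GCP-B, pass `10.77.114.22`.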

K8s NetworkPolicy

After mesh cutover, allow egress from the Ollama client pods to the mesh endpoints. This fragment is one rule under `spec.egress` of their NetworkPolicy:

- to:
    - ipBlock:
        cidr: 10.77.114.21/32
    - ipBlock:
        cidr: 10.77.114.22/32
    - ipBlock:
        cidr: 10.77.114.111/32
  ports:
    - protocol: TCP
      port: 11434

Do not remove the 192.168.0.110:11435/11436 rules until rollback is no longer needed.

Shadow Validation

From the API pod:

bash scripts/ops/ollama-topology-check.sh

Expected:

GCP-A /api/tags returns 200.
GCP-B /api/tags returns 200.
gemma3:4b generation succeeds on both nodes.
/api/ps contains gemma3:4b.
If `size_vram` is 0 (the model is loaded on CPU only), keep GCP-A/B on alert-fast only and route heavy models to 111 or a GPU-capable node.
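The last two checks can be read straight from `GET /api/ps` on each node. The sketch below is a hypothetical helper that avoids a `jq` dependency. It assumes compact JSON in which each model's `name` field comes before its `size_vram` field, which matches the order of current Ollama responses. With `jq` installed, a proper JSON query is safer.

```shell
# Hypothetical helpers for the /api/ps checks. ps_vram reads an /api/ps JSON
# body on stdin and prints "<model> <size_vram>" per loaded model. It assumes
# compact JSON where each model's "name" precedes its "size_vram"; use jq
# instead if it is installed.
ps_vram() {
  tr ',' '\n' | awk '
    match($0, /"name":"[^"]*"/)     { name = substr($0, RSTART + 8, RLENGTH - 9) }
    match($0, /"size_vram":[0-9]+/) { print name, substr($0, RSTART + 12, RLENGTH - 12) }
  '
}

# vram_verdict applies the rule above to one model: not loaded fails,
# size_vram 0 (CPU-only) means alert-fast only, anything else is on GPU.
vram_verdict() (
  model=$1
  ps_vram | awk -v m="$model" '
    $1 == m { found = 1; vram = $2 }
    END {
      if (!found)        { print "missing: " m " is not loaded"; exit 1 }
      if (vram + 0 == 0) { print "cpu-only: keep on alert-fast"; exit 0 }
      print "gpu: size_vram=" vram
    }
  '
)
```

Usage: `curl -fsS http://10.77.114.21:11434/api/ps | vram_verdict gemma3:4b`. Run it right after the generation check, because `/api/ps` lists only models that are currently loaded.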

Cutover

Patch deployment env after shadow passes:

kubectl -n awoooi-prod set env deploy/awoooi-api \
  OLLAMA_URL=http://10.77.114.21:11434 \
  OLLAMA_SECONDARY_URL=http://10.77.114.22:11434 \
  OLLAMA_FALLBACK_URL=http://10.77.114.111:11434

kubectl -n awoooi-prod set env deploy/awoooi-worker \
  OLLAMA_URL=http://10.77.114.21:11434 \
  OLLAMA_SECONDARY_URL=http://10.77.114.22:11434 \
  OLLAMA_FALLBACK_URL=http://10.77.114.111:11434

Verify:

kubectl -n awoooi-prod rollout status deploy/awoooi-api --timeout=180s
kubectl -n awoooi-prod rollout status deploy/awoooi-worker --timeout=180s
bash scripts/ops/ollama-topology-check.sh

Rollback

kubectl -n awoooi-prod set env deploy/awoooi-api \
  OLLAMA_URL=http://192.168.0.110:11435 \
  OLLAMA_SECONDARY_URL=http://192.168.0.110:11436 \
  OLLAMA_FALLBACK_URL=http://192.168.0.111:11434

kubectl -n awoooi-prod set env deploy/awoooi-worker \
  OLLAMA_URL=http://192.168.0.110:11435 \
  OLLAMA_SECONDARY_URL=http://192.168.0.110:11436 \
  OLLAMA_FALLBACK_URL=http://192.168.0.111:11434
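Cutover and rollback set the same three variables on both deployments, and typing them twice invites drift. A minimal sketch with two hypothetical helpers:

- `ollama_env_args` prints the assignments for `mesh` or `bridge` mode from the values in this runbook.
- `env_matches` checks `kubectl set env ... --list` output for them. It assumes that output lists literal values as plain `KEY=value` lines.

```shell
# Hypothetical helper: print the three Ollama assignments for one mode, using
# the endpoint values from this runbook.
ollama_env_args() (
  case $1 in
    mesh)
      echo OLLAMA_URL=http://10.77.114.21:11434
      echo OLLAMA_SECONDARY_URL=http://10.77.114.22:11434
      echo OLLAMA_FALLBACK_URL=http://10.77.114.111:11434
      ;;
    bridge)
      echo OLLAMA_URL=http://192.168.0.110:11435
      echo OLLAMA_SECONDARY_URL=http://192.168.0.110:11436
      echo OLLAMA_FALLBACK_URL=http://192.168.0.111:11434
      ;;
    *)
      echo "ollama_env_args: mode must be mesh or bridge" >&2
      exit 2
      ;;
  esac
)

# Read `kubectl set env <deploy> --list` output on stdin; succeed only if all
# three assignments for the mode appear as exact KEY=value lines.
env_matches() (
  expected=$(ollama_env_args "$1") || exit 2
  listed=$(cat)
  for want in $expected; do
    printf '%s\n' "$listed" | grep -qxF "$want" || { echo "missing: $want"; exit 1; }
  done
)

# Usage (not run here):
#   for d in awoooi-api awoooi-worker; do
#     kubectl -n awoooi-prod set env "deploy/$d" $(ollama_env_args mesh)   # or: bridge
#     kubectl -n awoooi-prod set env "deploy/$d" --list | env_matches mesh
#   done
```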

Done Criteria

Mesh endpoints pass 7 days of canary.
Alert lane Gemini usage is zero except documented all-Ollama outages.
Public GCP 11434/tcp is closed.
Operator runbook records peer public keys and rollback owner.
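The "Public GCP 11434/tcp is closed" criterion is easiest to confirm from a host outside the mesh. Probe the GCP public addresses, for example the ones used as WireGuard endpoints above (34.143.170.20 and 34.21.145.224). A minimal sketch, assuming `curl` is available. The hypothetical `port_closed` helper treats any HTTP reply as open and a refused or timed-out connection as closed.

```shell
# Hypothetical probe, run from a host OUTSIDE the mesh: succeed only if
# host:port gives no HTTP reply (connection refused or timed out).
# Exit status: 0 = closed, 1 = open, 2 = curl missing.
port_closed() (
  host=$1
  port=$2
  if ! command -v curl >/dev/null 2>&1; then
    echo "curl not found" >&2
    exit 2
  fi
  # Without -f, curl exits 0 on any HTTP response, including errors like 404.
  if curl -s -o /dev/null --max-time 5 "http://$host:$port/"; then
    echo "OPEN: $host:$port answered"
    exit 1
  fi
  echo "closed: $host:$port"
)
```

Usage: `port_closed 34.143.170.20 11434 && port_closed 34.21.145.224 11434`.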
