
Hetzner Cloud

The edge site lives in Hetzner Cloud, region nbg1. It's a single-node Talos cluster (control-plane-1, a cx33 instance) attached to a Hetzner-managed VPC, protected by label-selected cloud firewalls, and reachable through a floating IP that survives instance replacement.

Driven by tofu/environment/edge via the hetznercloud/hcloud provider.

Why Hetzner

  • Cheap, predictable pricing for a 24/7 edge node. A cx33 is a couple of euros a month and covers what the edge cluster needs.
  • A real API with a first-class OpenTofu provider — VMs, networks, floating IPs, firewalls, and labels are all declarative.
  • EU-resident for any data that hits the edge before it tunnels back home through NetBird.

Alternatives considered

| Option | Why not |
|---|---|
| AWS / GCP / Azure | Overkill and overpriced for a single edge instance |
| DigitalOcean | Comparable, slightly more expensive; no obvious upside |
| Scaleway | Used for the backup S3 bucket, not for compute |
| OVH / Hetzner Robot dedicated | Too much hardware for the edge role |

Layout

```text
           floating IP (edge-1)
                    │
                    ▼
        ┌────────────────────────┐
        │ control-plane-1        │  cx33 · Talos
        │ 172.30.0.11 (private)  │  nbg1
        │ public IP              │
        └─────┬────────────┬─────┘
              │            │
        VPC `edge`    cloud firewalls
        172.30.0.0/16 (label-selected, see below)
        subnet `k8s`
        172.30.0.0/24
```

| Resource | Value | Notes |
|---|---|---|
| Network | edge | 172.30.0.0/16, delete-protected |
| Subnet | k8s | 172.30.0.0/24, in network zone eu-central |
| Instance | control-plane-1 | cx33, Talos image |
| Floating IP | edge-1 | Inbound traffic target; survives node replacement |
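
To make the address plan concrete, here is a minimal sanity check in Python. It is a hypothetical helper, not something tofu/environment/edge runs: the CIDRs and the node address are copied from the diagram and table above, and `address_plan_problems` is an illustrative name. It uses only the standard `ipaddress` module to confirm that the subnet sits inside the VPC, the node address sits inside the subnet, and the edge range doesn't collide with the production public subnet that NetBird routes to.

```python
import ipaddress

# Values copied from this page; the check itself is a hypothetical helper.
VPC = ipaddress.ip_network("172.30.0.0/16")                   # network `edge`
SUBNET = ipaddress.ip_network("172.30.0.0/24")                # subnet `k8s`
NODE_IP = ipaddress.ip_address("172.30.0.11")                 # control-plane-1 (private)
PRODUCTION_PUBLIC = ipaddress.ip_network("192.168.105.0/24")  # reached via NetBird


def address_plan_problems() -> list[str]:
    """Return human-readable problems; an empty list means the plan is consistent."""
    problems = []
    if not SUBNET.subnet_of(VPC):
        problems.append(f"subnet {SUBNET} is not inside VPC {VPC}")
    if NODE_IP not in SUBNET:
        problems.append(f"node address {NODE_IP} is outside subnet {SUBNET}")
    if NODE_IP in (SUBNET.network_address, SUBNET.broadcast_address):
        problems.append(f"node address {NODE_IP} is the network or broadcast address")
    if VPC.overlaps(PRODUCTION_PUBLIC):
        problems.append(f"VPC {VPC} overlaps production subnet {PRODUCTION_PUBLIC}")
    return problems


if __name__ == "__main__":
    print(address_plan_problems() or "address plan OK")
```

Keeping the check this dumb is the point: it catches a typo in a CIDR before an apply tries to build it.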

Cloud firewalls

Hetzner cloud firewalls are label-selected, so adding a node to the right firewall is a label edit, not a topology change.

| Firewall | Targets (label) | Ingress | Sources |
|---|---|---|---|
| talos-control | talos_control=true | 50000/tcp (apid), 6443/tcp (kube-api), 2379-2380/tcp (etcd) | 172.30.0.0/24 |
| talos-internal | talos=true | 50001 (trustd), 51871/udp (Cilium WG), 4240/4244/4245/4250 (Cilium/Hubble), 9962-9964 (metrics), 10250 (kubelet) | 172.30.0.0/24 |
| allow-http | http=true | 80, 443 | 0.0.0.0/0 |
| allow-ssh | ssh=true | 22 | 0.0.0.0/0 |

Talos itself doesn't run SSH, but allow-ssh exists for the rare bootstrap maintenance instance that does.
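
Because membership is decided by labels, the set of rules that reaches a node is a pure function of that node's labels. The sketch below models the table that way. Assumptions: rules are allow-only with everything else dropped, ports the table lists without a protocol are treated as TCP, and the labels given to control-plane-1 at the bottom are hypothetical, since this page doesn't list them. `firewalls_for` and `is_allowed` are illustrative names, not part of the hcloud provider.

```python
import ipaddress

# Firewall name -> (selector label, allow rules). Each rule is
# (protocol, first port, last port, source CIDR). Ports the table lists
# without a protocol are ASSUMED to be TCP here.
FIREWALLS = {
    "talos-control": (("talos_control", "true"), [
        ("tcp", 50000, 50000, "172.30.0.0/24"),  # apid
        ("tcp", 6443, 6443, "172.30.0.0/24"),    # kube-api
        ("tcp", 2379, 2380, "172.30.0.0/24"),    # etcd
    ]),
    "talos-internal": (("talos", "true"), [
        ("tcp", 50001, 50001, "172.30.0.0/24"),  # trustd
        ("udp", 51871, 51871, "172.30.0.0/24"),  # Cilium WireGuard
        ("tcp", 4240, 4240, "172.30.0.0/24"),    # Cilium/Hubble
        ("tcp", 4244, 4245, "172.30.0.0/24"),    # Cilium/Hubble
        ("tcp", 4250, 4250, "172.30.0.0/24"),    # Cilium/Hubble
        ("tcp", 9962, 9964, "172.30.0.0/24"),    # metrics
        ("tcp", 10250, 10250, "172.30.0.0/24"),  # kubelet
    ]),
    "allow-http": (("http", "true"), [
        ("tcp", 80, 80, "0.0.0.0/0"),
        ("tcp", 443, 443, "0.0.0.0/0"),
    ]),
    "allow-ssh": (("ssh", "true"), [
        ("tcp", 22, 22, "0.0.0.0/0"),
    ]),
}


def firewalls_for(labels: dict[str, str]) -> list[str]:
    """Names of the firewalls whose label selector matches the server's labels."""
    return [name for name, ((key, value), _) in FIREWALLS.items()
            if labels.get(key) == value]


def is_allowed(labels: dict[str, str], protocol: str, port: int, source_ip: str) -> bool:
    """True if any rule of any selected firewall matches; otherwise the packet is dropped."""
    src = ipaddress.ip_address(source_ip)
    for name in firewalls_for(labels):
        _, rules = FIREWALLS[name]
        for proto, first, last, cidr in rules:
            if proto == protocol and first <= port <= last and src in ipaddress.ip_network(cidr):
                return True
    return False


# Hypothetical labels for control-plane-1 (not listed on this page).
cp1 = {"talos_control": "true", "talos": "true", "http": "true"}
```

With those labels, control-plane-1 is selected by talos-control, talos-internal and allow-http, port 6443 answers only sources in 172.30.0.0/24, and port 22 stays closed until a node also carries ssh=true.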

NetBird

  • Network: edge, with the resource edge Management Subnet (172.30.0.0/24)
  • Routing peer: control-plane-1 is a member of edge_peers (metric 9999, masquerade enabled)
  • Sidecars: the edge_sidecar_envoy group has a cross-network policy to the production public subnet (192.168.105.0/24). This is the path that lets the edge Envoy gateway reach apps on the production cluster, and the basis of the edge → production traffic chain.
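
The routing and policy bullets amount to two lookups: which routing peer carries traffic for a destination, and whether a source group may reach that destination at all. This is a simplified model, not NetBird's API. `Route`, `routing_peer_for` and `policy_allows` are hypothetical names, and the metric logic assumes NetBird's documented behaviour that a lower metric wins, which puts 9999 at the low-priority end.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    network: str      # destination CIDR served by the routing peer
    peer: str         # routing peer that forwards the traffic
    metric: int       # lower metric = preferred (assumed NetBird semantics)
    masquerade: bool  # forwarded traffic leaves with the routing peer's address


# Documented on this page: edge Management Subnet via control-plane-1.
ROUTES = [Route("172.30.0.0/24", "control-plane-1", 9999, True)]

# Documented cross-network policy: source group -> destination CIDR.
POLICIES = [("edge_sidecar_envoy", "192.168.105.0/24")]


def routing_peer_for(dst: str, routes: list[Route] = ROUTES) -> Optional[Route]:
    """Return the covering route with the lowest metric, or None if nothing covers dst."""
    ip = ipaddress.ip_address(dst)
    covering = [r for r in routes if ip in ipaddress.ip_network(r.network)]
    return min(covering, key=lambda r: r.metric, default=None)


def policy_allows(src_groups: set[str], dst: str) -> bool:
    """True if any policy lets one of the source groups reach dst."""
    ip = ipaddress.ip_address(dst)
    return any(group in src_groups and ip in ipaddress.ip_network(cidr)
               for group, cidr in POLICIES)
```

Under that assumption, adding a second routing peer for the same subnet with any lower metric would make it preferred over control-plane-1; with a single peer, the metric has no effect.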

OpenTofu workflow

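```shell
cd tofu/environment/edge
tofu init                # download providers and configure the backend
tofu plan -out=plan      # save the reviewed plan to a file named "plan"
tofu apply plan          # apply exactly that saved plan
```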

The provider authenticates with an API token kept in a SOPS-encrypted file. State and tokens never live in plaintext on disk, and CI runs apply only after a manual approval step.
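
The apply gate can be sketched as a small wrapper around the same commands. This is an illustration, not the actual CI definition: `deploy` and its injectable `run` callback are hypothetical, and the sketch assumes the SOPS step exports the token as `HCLOUD_TOKEN`, the environment variable the hcloud provider reads. The property that matters is that apply consumes the saved `plan` file, so what runs is exactly what was reviewed.

```python
import os
import subprocess
from typing import Callable, Mapping, Sequence

EDGE_DIR = "tofu/environment/edge"
Runner = Callable[[Sequence[str], str], None]


def run_checked(argv: Sequence[str], cwd: str) -> None:
    """Default runner: execute argv in cwd and raise CalledProcessError on failure."""
    subprocess.run(list(argv), cwd=cwd, check=True)


def deploy(approved: bool, env: Mapping[str, str] = os.environ,
           run: Runner = run_checked) -> list[list[str]]:
    """Init and plan into a saved file; apply that exact file only when approved.

    Returns the commands that were run, in order.
    """
    # Assumption: the SOPS decryption step exports the API token as HCLOUD_TOKEN.
    if not env.get("HCLOUD_TOKEN"):
        raise RuntimeError("HCLOUD_TOKEN is not set; decrypt the SOPS secrets first")
    steps = [["tofu", "init"], ["tofu", "plan", "-out=plan"]]
    if approved:
        # Applying the saved plan file runs exactly what was reviewed.
        steps.append(["tofu", "apply", "plan"])
    for argv in steps:
        run(argv, EDGE_DIR)  # run_checked raises on a failed step, halting the gate
    return steps
```

Passing a fake `run` callback makes the gate testable without touching Hetzner; without approval, the pipeline stops after `tofu plan`.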

Backups on Scaleway

The companion module tofu/modules/scaleway/backup_bucket creates an S3-compatible Object Storage bucket on Scaleway that serves as the Restic target. It's deliberately a different provider, so a Hetzner outage can't take out both the edge node and its backup target at the same time.
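
For reference, pointing Restic at a Scaleway bucket only needs a repository URL and S3 credentials in the environment. The sketch below is hypothetical: the bucket name, the `fr-par` region and the key values at the bottom are placeholders (this page names none of them), and `restic_env` and `shares_failure_domain` are illustrative helpers. `RESTIC_REPOSITORY`, `RESTIC_PASSWORD`, `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are the variables Restic reads for an S3 backend; the provider map encodes the separate-failure-domain rule described above.

```python
from typing import Mapping

# Where each piece lives, per this page.
PROVIDERS = {"edge node": "hetzner", "backup bucket": "scaleway"}


def shares_failure_domain(providers: Mapping[str, str]) -> bool:
    """True if the backup target sits with the same provider as the node it protects."""
    return providers["edge node"] == providers["backup bucket"]


def restic_env(bucket: str, region: str, access_key: str, secret_key: str,
               password: str) -> dict[str, str]:
    """Environment variables for a Restic repository in Scaleway Object Storage."""
    return {
        # Scaleway's S3 endpoint is regional: s3.<region>.scw.cloud
        "RESTIC_REPOSITORY": f"s3:https://s3.{region}.scw.cloud/{bucket}",
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
        "RESTIC_PASSWORD": password,
    }


# Placeholder values; the real bucket name, region and keys are not on this page.
env = restic_env("edge-restic", "fr-par", "SCW_ACCESS_KEY", "SCW_SECRET_KEY", "change-me")
```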

Operational notes

  • Replacing the instance. Re-create it with the same name and labels, re-attach the floating IP, and re-apply the Talos machine config. The cloud firewalls need no changes because they're label-selected, and the VPC is a separate resource that isn't touched.
  • Region. nbg1 is the only region used; switching regions is an apply, but the Talos image must be in that region's snapshot library.
  • Quotas. Hetzner enforces per-project caps on instances / floating IPs / networks. Keep the edge env on its own project for clean blast-radius control.
  • Monitoring. Gatus checks the floating IP from outside the homelab; Prometheus scrapes node metrics from inside the mesh.
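
The replacement steps can be checked by comparing the old and new server records before cutting over. This is a hypothetical model: `Server` and `replacement_problems` are illustrative, and the labels at the bottom are assumed, since this page doesn't list control-plane-1's actual labels. It covers the same name, labels that keep the firewalls selecting the node, and the floating IP re-attachment; re-applying the Talos config is outside what it models.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    labels: dict[str, str] = field(default_factory=dict)
    floating_ips: set[str] = field(default_factory=set)


def replacement_problems(old: Server, new: Server) -> list[str]:
    """Compare a replacement with the server it replaces; an empty list means safe to cut over."""
    problems = []
    if new.name != old.name:
        problems.append(f"name changed: {old.name} -> {new.name}")
    # Every old label must survive with the same value, or a firewall may stop selecting the node.
    lost = sorted(k for k, v in old.labels.items() if new.labels.get(k) != v)
    if lost:
        problems.append(f"labels not carried over: {lost}")
    missing_ips = sorted(old.floating_ips - new.floating_ips)
    if missing_ips:
        problems.append(f"floating IPs not re-attached: {missing_ips}")
    return problems


# Hypothetical labels; this page doesn't list control-plane-1's actual labels.
old = Server("control-plane-1", {"talos_control": "true", "talos": "true", "http": "true"}, {"edge-1"})
```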

Where to look next