Talos Linux
Talos Linux is the operating system on every node of both Kubernetes clusters — the production cluster on Proxmox (3× control-plane + 3× workers) and the edge cluster on Hetzner (1× control-plane). It is immutable, API-driven, and ships with no shell, no SSH, and no package manager.
Why Talos
In short, fewer moving parts:
- Immutable. No drift between nodes — what you don't configure, you can't accidentally change. Upgrades are atomic image swaps with one-command rollback.
- API-only. `talosctl` over mTLS is the only way to interact with a node. There is no SSH to forget to disable, no shell history to leak, no `apt`-day inventory to maintain.
- Pre-baked for Kubernetes. Kubelet, etcd, containerd, the CNI dance: all wired up by the OS. The whole machine config fits in a single YAML file.
- Boot from a UKI. Secure boot, signed kernels, no GRUB to reason about.
- Predictable. Patches are declarative; the node's running state is exactly what `talconfig.yaml` says it is.
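To make "API-only" concrete, here is a minimal sketch of read-only node inspection with `talosctl`, the kind of thing that would otherwise need SSH. It assumes the client config that talhelper writes to `clusterconfig/talosconfig`; the `inspect_node` function, the `TALOSCONFIG_PATH` variable and the example IP are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: inspect a node over the Talos API (mTLS) instead of SSH.
# Assumption: talhelper has written clusterconfig/talosconfig (client certs + endpoints).
# TALOSCONFIG_PATH and inspect_node are hypothetical names for this example.

TALOSCONFIG_PATH="${TALOSCONFIG_PATH:-clusterconfig/talosconfig}"

# inspect_node <node-ip>: read-only checks, all over the same mTLS API
inspect_node() {
  local node="$1"
  # Client and server versions; also proves the mTLS handshake works
  talosctl --talosconfig "$TALOSCONFIG_PATH" --nodes "$node" version
  # State of system services (etcd, kubelet, containerd, ...)
  talosctl --talosconfig "$TALOSCONFIG_PATH" --nodes "$node" service
  # Kernel ring buffer, the closest thing to `dmesg` over SSH
  talosctl --talosconfig "$TALOSCONFIG_PATH" --nodes "$node" dmesg
}

# Usage (hypothetical IP): inspect_node 10.0.0.11
```

Swapping `dmesg` for `logs kubelet` gives the service logs that would otherwise come from `journalctl`.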
Alternatives considered
| Option | Why not |
|---|---|
| Stock Ubuntu / Debian + kubeadm | Mutable. Drift, package upgrades, SSH attack surface. No upside vs Talos. |
| Flatcar | Closer in spirit but still has SSH and a shell; less Kubernetes-specific |
| Bottlerocket | AWS-flavored; not aimed at on-prem; no Hetzner support |
| Fedora CoreOS | Mutable enough to drift; updates via rpm-ostree; not as opinionated for k8s |
| Plain Talos without Talhelper | Fine, but Talhelper turns 6 nearly-identical machine configs into one config file |
Talhelper
Talhelper is a thin wrapper that turns one declarative config into per-node machine configs:
```text
talos/
├── talos/                  ← production cluster
│   ├── talconfig.yaml      ← cluster-wide + per-node settings
│   ├── talsecret.sops.yaml ← cluster secrets (PKI, tokens), SOPS-encrypted
│   ├── clusterconfig/      ← rendered per-node configs (apply target)
│   └── patch-*.yaml        ← shared strategic-merge patches
└── edge/                   ← edge cluster
    ├── talconfig.yaml
    ├── talsecret.sops.yaml
    ├── clusterconfig/
    └── patch-*.yaml
```
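A minimal sketch of what a `talconfig.yaml` can look like, written out from a shell function so it can be dropped into a scratch directory. Everything in it is a placeholder (`example-prod`, the 10.0.0.x addresses, `/dev/sda`, the `write_example_talconfig` helper); the real file also pins Talos and Kubernetes versions and lists all six production nodes.

```bash
#!/usr/bin/env bash
# Sketch: write a minimal talhelper config into a directory.
# All names, IPs and the install disk are hypothetical placeholders.

write_example_talconfig() {
  local dir="$1"
  mkdir -p "$dir" || return 1
  cat > "$dir/talconfig.yaml" <<'EOF'
clusterName: example-prod
endpoint: https://10.0.0.10:6443
nodes:
  - hostname: cp-1
    ipAddress: 10.0.0.11
    controlPlane: true
    installDisk: /dev/sda
  - hostname: worker-1
    ipAddress: 10.0.0.21
    controlPlane: false
    installDisk: /dev/sda
# Shared patches apply to every node; "@./" loads them from files next to this config
patches:
  - "@./patch-kubelet.yaml"
  - "@./patch-nameservers.yaml"
EOF
}

# Usage: write_example_talconfig /tmp/talos-scratch
```

Running `talhelper genconfig` next to such a file (plus a `talsecret.sops.yaml`) renders one machine config per entry under `nodes:`, which is where the "six configs from one file" saving comes from.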
Render and apply:
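```bash
# Decrypt talsecret.sops.yaml and render clusterconfig/ from talconfig.yaml
talhelper genconfig
# Print one `talosctl apply-config` per node and run them
talhelper gencommand apply | sh
```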
`clusterconfig/` is what actually goes onto the nodes; everything else is source.
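Each line that `talhelper gencommand apply` prints is a `talosctl apply-config` call. A hedged sketch of doing the same by hand for a single node, assuming the node is still in maintenance mode (which is why `--insecure` is passed) and that you supply the path of its rendered file; `apply_node` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: apply one rendered machine config to one node by hand.
# Assumption: the node is in maintenance mode, so there is no cluster PKI yet
# and the connection must be made with --insecure.

# apply_node <node-ip> <rendered-config-file>
apply_node() {
  local node_ip="$1" config_file="$2"
  if [ ! -f "$config_file" ]; then
    echo "missing rendered config: $config_file" >&2
    return 1
  fi
  talosctl apply-config --insecure --nodes "$node_ip" --file "$config_file"
}
```

Running `talhelper gencommand apply` without `| sh` only prints the commands, which is a cheap way to review them before anything touches a node.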
Patches
Both clusters share a baseline of strategic-merge patches:
| Patch | Why |
|---|---|
| `patch-kubelet.yaml` | Tweaks kubelet config |
| `patch-etcd.yaml` | etcd defaults / quotas |
| `patch-disable-kube-proxy.yaml` | kube-proxy is replaced by Cilium |
| `patch-cilium-fix.yaml` | Cilium-specific Talos kernel/config tweaks |
| `patch-nameservers.yaml` | Resolver pinned to known-good upstreams |
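As an illustration, a sketch of the kube-proxy patch. It assumes the common shape used when Cilium replaces kube-proxy on Talos: `cluster.proxy.disabled` turns off kube-proxy, and `cluster.network.cni.name: none` stops Talos from installing its default CNI. Whether the CNI line lives in this file or in `patch-cilium-fix.yaml` in this repo is an assumption, as is the `write_kube_proxy_patch` helper.

```bash
#!/usr/bin/env bash
# Sketch: write a strategic-merge patch that hands Service routing to Cilium.
# Assumption: the repo's real patch may split these settings differently.

write_kube_proxy_patch() {
  local out="$1"
  cat > "$out" <<'EOF'
cluster:
  network:
    cni:
      name: none     # Talos installs no CNI; Cilium is deployed separately
  proxy:
    disabled: true   # no kube-proxy DaemonSet; Cilium handles Services
EOF
}

# Usage: write_kube_proxy_patch patch-disable-kube-proxy.yaml
```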
The production cluster carries two extras:
| Patch | Why |
|---|---|
| `patch-longhorn-extramount.yaml` | Extra mount for Longhorn data |
| `patch-spegel.yaml` | Spegel image-mirror config (containerd registry mirror) |
The edge cluster does not run Longhorn or Spegel, so it doesn't need either.
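A sketch of the Longhorn extra mount, following the shape Longhorn documents for Talos: a bind mount of `/var/lib/longhorn` (Longhorn's default data path) into the kubelet with `rshared` propagation, so volumes mounted by Longhorn are visible to pods. The path and the `write_longhorn_patch` helper are assumptions about this repo.

```bash
#!/usr/bin/env bash
# Sketch: write a kubelet extraMounts patch for Longhorn's data directory.
# Assumption: Longhorn uses its default path /var/lib/longhorn.

write_longhorn_patch() {
  local out="$1"
  cat > "$out" <<'EOF'
machine:
  kubelet:
    extraMounts:
      - destination: /var/lib/longhorn
        type: bind
        source: /var/lib/longhorn
        options:
          - bind
          - rshared   # mount propagation so Longhorn's mounts reach pods
          - rw
EOF
}

# Usage: write_longhorn_patch patch-longhorn-extramount.yaml
```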
Bootstrap flow
- Provision VMs. Proxmox for production (one VM per planned node), Hetzner for edge.
- Generate machine configs. `talhelper genconfig` produces `clusterconfig/<node>.yaml` for each node.
- Boot from Talos image. PXE / ISO / Hetzner snapshot; the node comes up in maintenance mode.
- Apply config. `talhelper gencommand apply | sh` pushes the per-node config; nodes reboot into the "configured" state.
- Bootstrap etcd. `talosctl bootstrap` on the first control-plane node, exactly once per cluster.
- Hand the kubeconfig over. `talhelper gencommand kubeconfig | sh` writes a kubeconfig.
- Install Flux. From there everything else is GitOps.
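The middle of the flow (render, apply, bootstrap, kubeconfig) can be sketched as one function. Assumptions: it runs from a cluster directory such as `talos/talos`, the nodes are already in maintenance mode, and the first control-plane IP is passed in. `bootstrap_cluster` is hypothetical, and the Flux step is left out because its install command depends on the Git host.

```bash
#!/usr/bin/env bash
# Sketch of the bootstrap steps after the VMs have booted into maintenance mode.
# Assumption: run from the cluster directory so talhelper finds talconfig.yaml.

# bootstrap_cluster <first-control-plane-ip>
bootstrap_cluster() {
  local first_cp="$1"
  # 1. Render clusterconfig/ from talconfig.yaml + talsecret.sops.yaml
  talhelper genconfig || return 1
  # 2. Push configs; maintenance-mode nodes need --insecure on the first apply
  talhelper gencommand apply --extra-flags="--insecure" | sh || return 1
  # 3. Start etcd exactly once, on one control-plane node only
  talosctl --talosconfig clusterconfig/talosconfig --nodes "$first_cp" bootstrap || return 1
  # 4. Write a kubeconfig for the new cluster
  talhelper gencommand kubeconfig | sh || return 1
  # 5. Flux would be installed here (command depends on the Git host)
}
```

In practice, wait for the nodes to finish rebooting into their configured state between the apply and bootstrap steps.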
Upgrades
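```bash
# Re-render clusterconfig/ from the current talconfig.yaml
talhelper genconfig
# One `talosctl upgrade` per node; --preserve keeps data partitions
talhelper gencommand upgrade --extra-flags "--preserve" | sh
```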
`--preserve` keeps user-data partitions (Longhorn, etc.) across the image swap. The generated commands run one node at a time, and each upgrade is recoverable: if a node fails to come back, `talosctl rollback` reverts it to the previous image.
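A hedged sketch of the same per-node loop done by hand, to show the rollback path: upgrade one node, check that it answers on the Talos API, and roll it back and stop if the upgrade command fails. The installer image is a placeholder (with talhelper, the generated commands fill it in from `talconfig.yaml`), and `upgrade_node` / `upgrade_all` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Sketch: sequential upgrades with a rollback on failure.
# Assumptions: image reference and node IPs are placeholders; the talosconfig
# path matches what talhelper writes.

TALOSCONFIG_PATH="${TALOSCONFIG_PATH:-clusterconfig/talosconfig}"

# upgrade_node <node-ip> <installer-image>
upgrade_node() {
  local node="$1" image="$2"
  # --preserve keeps the node's data partitions across the image swap
  if ! talosctl --talosconfig "$TALOSCONFIG_PATH" --nodes "$node" \
      upgrade --image "$image" --preserve; then
    echo "upgrade of $node failed, rolling back" >&2
    talosctl --talosconfig "$TALOSCONFIG_PATH" --nodes "$node" rollback
    return 1
  fi
  # Sanity check that the node answers on the Talos API after the reboot
  talosctl --talosconfig "$TALOSCONFIG_PATH" --nodes "$node" version
}

# upgrade_all <installer-image> <node-ip>...: one node at a time, stop at first failure
upgrade_all() {
  local image="$1"; shift
  local node
  for node in "$@"; do
    upgrade_node "$node" "$image" || return 1
  done
}
```

Stopping at the first failure keeps at most one node in a degraded state, which matters most for the three control-plane nodes and their etcd quorum.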
Recovery & rotation
| Scenario | Action |
|---|---|
| Single node lost | Re-provision the VM with the same name, re-apply config — etcd readmits it |
| All control-planes lost (worst case) | Restore etcd from a Restic snapshot, reapply configs |
| Cluster CA rotation | Edit talconfig.yaml, regen secrets, roll the cluster one node at a time |
| Talsecret leaked | Wipe + reinstall is faster than rotating; treat the cluster as cattle |
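For the worst-case row, a sketch of both halves: snapshotting etcd through the Talos API and handing the file to Restic, then seeding a fresh control plane from that snapshot with `talosctl bootstrap --recover-from`. The helper names, file paths and Restic setup (`RESTIC_REPOSITORY` / `RESTIC_PASSWORD` exported) are assumptions; how this repo actually schedules the snapshot is not shown here.

```bash
#!/usr/bin/env bash
# Sketch: etcd snapshot to Restic, and recovery from that snapshot.
# Assumptions: Restic is configured via RESTIC_REPOSITORY / RESTIC_PASSWORD;
# helper names and paths are hypothetical.

TALOSCONFIG_PATH="${TALOSCONFIG_PATH:-clusterconfig/talosconfig}"

# backup_etcd <control-plane-ip> <snapshot-file>
backup_etcd() {
  local node="$1" snap="$2"
  # Streams a consistent etcd snapshot from the node to a local file
  talosctl --talosconfig "$TALOSCONFIG_PATH" --nodes "$node" etcd snapshot "$snap" || return 1
  restic backup "$snap"
}

# recover_etcd <control-plane-ip> <snapshot-file>
# Run after configs are re-applied and the control plane is waiting for bootstrap.
recover_etcd() {
  local node="$1" snap="$2"
  talosctl --talosconfig "$TALOSCONFIG_PATH" --nodes "$node" bootstrap --recover-from="$snap"
}
```

Before `recover_etcd`, the snapshot file has to be fetched back, e.g. with `restic restore latest --target <dir>`.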
Where to look next
- Hardware → Talos cluster — the physical NUC nodes themselves
- Platform → Cilium — kube-proxy replacement that needs the matching Talos patch
- Operations → Backups — what to snapshot to make a recovery actually possible