Talos Linux

Talos Linux is the operating system on every node of both Kubernetes clusters — the production cluster on Proxmox (3× control-plane + 3× workers) and the edge cluster on Hetzner (1× control-plane). It is immutable, API-driven, and ships with no shell, no SSH, and no package manager.

Why Talos

In short, fewer moving parts:

Immutable. No drift between nodes — what you don't configure, you can't accidentally change. Upgrades are atomic image swaps with one-command rollback.
API-only. The Talos API, reached with talosctl over mTLS, is the only way to interact with a node. There is no SSH to forget to disable, no shell history to leak, no installed-package inventory to keep patched.
Pre-baked for Kubernetes. Kubelet, etcd, containerd, the CNI dance — all wired up by the OS. The whole machine config fits in a single YAML.
Boot from a UKI. Secure boot, signed kernels, no GRUB to reason about.
Predictable. Patches are declarative; the node's running state is exactly what talconfig.yaml says it is.

Alternatives considered

Option	Why not
Stock Ubuntu / Debian + kubeadm	Mutable. Drift, package upgrades, SSH attack surface. No upside vs Talos.
Flatcar	Closer in spirit but still has SSH and a shell; less Kubernetes-specific
Bottlerocket	AWS-flavored; not aimed at on-prem; no Hetzner support
Fedora CoreOS	Mutable enough to drift; updates via rpm-ostree; not as opinionated for k8s
Plain Talos without Talhelper	Fine, but Talhelper turns 6 nearly-identical machine configs into one config file

Talhelper

Talhelper is a thin wrapper that turns one declarative config into per-node machine configs:

talos/
├── talos/                          ← production cluster
│   ├── talconfig.yaml              ← cluster-wide + per-node settings
│   ├── talsecret.sops.yaml         ← cluster secrets (PKI, tokens), SOPS-encrypted
│   ├── clusterconfig/              ← rendered per-node configs (apply target)
│   └── patch-*.yaml                ← shared strategic-merge patches
└── edge/                           ← edge cluster
    ├── talconfig.yaml
    ├── talsecret.sops.yaml
    ├── clusterconfig/
    └── patch-*.yaml

Render and apply:

# Decrypt secrets, render, then apply
talhelper genconfig
talhelper gencommand apply | sh

clusterconfig/ is what actually goes onto the nodes; everything else is source.
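Since gencommand only prints the talosctl commands, they can be reviewed before anything is piped to sh. The following is a minimal sketch of that idea, not part of the repository: review_then_run and the APPLY variable are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Sketch: capture generated commands, show them, and only run them on request.
# review_then_run is a hypothetical helper, not part of talhelper.
review_then_run() {
    # $1 = a command line that prints shell commands on stdout
    generated=$(sh -c "$1") || return 1
    printf '%s\n' "$generated"
    if [ "${APPLY:-0}" = "1" ]; then
        # Execute exactly what was just shown.
        printf '%s\n' "$generated" | sh
    else
        echo "# dry run: set APPLY=1 to execute the commands above" >&2
    fi
}

# Usage (from talos/talos/ or talos/edge/):
#   talhelper genconfig
#   review_then_run 'talhelper gencommand apply'
#   APPLY=1 review_then_run 'talhelper gencommand apply'
```

The default is deliberately the harmless one: without APPLY=1 nothing reaches the nodes.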

Patches

Both clusters share a baseline of strategic-merge patches:

Patch	Why
`patch-kubelet.yaml`	Tweaks kubelet config
`patch-etcd.yaml`	etcd defaults / quotas
`patch-disable-kube-proxy.yaml`	`kube-proxy` is replaced by Cilium
`patch-cilium-fix.yaml`	Cilium-specific Talos kernel/config tweaks
`patch-nameservers.yaml`	Resolver pinned to known-good upstreams

The production cluster carries two extras:

Patch	Why
`patch-longhorn-extramount.yaml`	Extra mount for Longhorn data
`patch-spegel.yaml`	Spegel image-mirror config (containerd registry mirror)

The edge cluster does not run Longhorn or Spegel, so it doesn't need either.
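To give a feel for how small these patches are, here is a sketch that writes an illustrative patch-disable-kube-proxy.yaml. The contents are an assumption based on the Talos machine config field cluster.proxy.disabled; the repository's actual file may differ. write_kube_proxy_patch is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: write an illustrative kube-proxy patch into a directory.
# The file contents are an assumption; the repository's real patch may differ.
write_kube_proxy_patch() {
    dir=${1:-.}
    cat > "$dir/patch-disable-kube-proxy.yaml" <<'EOF'
# Talos machine config fragment: do not deploy kube-proxy (Cilium replaces it).
cluster:
  proxy:
    disabled: true
EOF
}

# Usage: write_kube_proxy_patch /tmp/patch-demo
# talconfig.yaml would then reference the file in its patches list,
# e.g. "@./patch-disable-kube-proxy.yaml" (assumed layout).
```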

Bootstrap flow

Provision VMs. Proxmox for production (one VM per planned node), Hetzner for edge.
Generate machine configs. talhelper genconfig produces clusterconfig/<node>.yaml for each node.
Boot from Talos image. PXE / ISO / Hetzner snapshot — the node comes up in maintenance mode.
Apply config. talhelper gencommand apply | sh pushes the per-node config; nodes reboot into "configured" state.
Bootstrap etcd. Run talosctl bootstrap once, on the first control-plane node only; the other control-plane nodes join on their own.
Hand the kubeconfig over. talhelper gencommand kubeconfig | sh writes a kubeconfig.
Install Flux. From there everything else is GitOps.
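The command-line part of this flow (generate, apply, bootstrap, kubeconfig) can be sketched as one script. This is an illustration under assumptions, not the repository's tooling: run and bootstrap_cluster are hypothetical helpers, the control-plane address is a placeholder, and clusterconfig/talosconfig is assumed to be where talhelper writes the client config. By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch of the generate / apply / bootstrap / kubeconfig steps.
# DRY_RUN=1 (the default) only prints the commands.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        sh -c "$*"
    fi
}

bootstrap_cluster() {
    cp_ip=$1                       # first control-plane node (placeholder address)
    tc=clusterconfig/talosconfig   # assumed location of the generated talosconfig
    run "talhelper genconfig" &&
    run "talhelper gencommand apply | sh" &&
    run "talosctl --talosconfig $tc --nodes $cp_ip --endpoints $cp_ip bootstrap" &&
    run "talhelper gencommand kubeconfig | sh"
}

# Usage: bootstrap_cluster 10.0.0.11            (prints the plan)
#        DRY_RUN=0 bootstrap_cluster 10.0.0.11  (executes it)
```

A real run needs a pause between apply and bootstrap while the nodes reboot; the sketch does not handle that wait.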

Upgrades

talhelper genconfig
talhelper gencommand upgrade --extra-flags "--preserve" | sh

--preserve keeps on-disk data (Longhorn volumes, etc.) across the image swap. Upgrades are applied one node at a time and are recoverable: if a node fails to come back cleanly, talosctl rollback reverts it to the previous image.
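A per-node variant with a health gate and automatic rollback might look like the sketch below. It rests on several assumptions: that talhelper gencommand accepts --node to limit output to one node, that the node name and addresses are placeholders, and that a 10-minute health timeout suits the cluster. run and upgrade_node are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: upgrade one node, wait for cluster health, roll back on failure.
# DRY_RUN=1 (the default) only prints the commands.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        sh -c "$*"
    fi
}

upgrade_node() {
    node=$1              # node name as listed in talconfig.yaml (placeholder)
    ip=$2                # address of that node (placeholder)
    health_ip=${3:-$ip}  # control-plane node used for the health check
    run "talhelper gencommand upgrade --node $node --extra-flags \"--preserve\" | sh" || return 1
    if ! run "talosctl --nodes $health_ip health --wait-timeout 10m"; then
        echo "health check failed after upgrading $node, rolling back" >&2
        run "talosctl --nodes $ip rollback"
        return 1
    fi
}

# Usage: upgrade_node worker-1 10.0.0.21 10.0.0.11
```

Calling it once per node, and stopping at the first failure, keeps the blast radius to a single machine.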

Recovery & rotation

Scenario	Action
Single node lost	Re-provision the VM with the same name, re-apply config — etcd readmits it
All control-planes lost (worst case)	Restore etcd from a Restic snapshot, reapply configs
Cluster CA rotation	Edit `talconfig.yaml`, regen secrets, roll the cluster one node at a time
Talsecret leaked	Wipe + reinstall is faster than rotating; treat the cluster as cattle
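For the worst-case row, the Talos side of the procedure comes down to two talosctl calls: taking an etcd snapshot while the cluster is healthy, and bootstrapping from one afterwards. The sketch below shows only those two calls. How the snapshot gets into and out of Restic is not covered, the addresses and file names are placeholders, and run, etcd_snapshot and etcd_recover are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: take an etcd snapshot, and later bootstrap a rebuilt control plane from it.
# DRY_RUN=1 (the default) only prints the commands.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        sh -c "$*"
    fi
}

etcd_snapshot() {
    cp_ip=$1                    # a healthy control-plane node (placeholder)
    dest=${2:-etcd.snapshot}    # local file to write (placeholder name)
    run "talosctl --nodes $cp_ip etcd snapshot $dest"
}

etcd_recover() {
    cp_ip=$1                    # freshly configured control-plane node (placeholder)
    snap=$2                     # snapshot file restored from backup
    run "talosctl --nodes $cp_ip bootstrap --recover-from=$snap"
}

# Usage: etcd_snapshot 10.0.0.11
#        etcd_recover  10.0.0.11 ./etcd.snapshot
```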

Where to look next

Hardware → Talos cluster — the physical NUC nodes themselves
Platform → Cilium — kube-proxy replacement that needs the matching Talos patch
Operations → Backups — what to snapshot to make a recovery actually possible
