
Three-tier backups, end-to-end

The homelab follows the 3-2-1 backup rule — three copies of every important byte, on two different media, with at least one off-site. This page traces the full path a byte takes from "written to a Longhorn PVC" to "rotated cold drive on a shelf", and what it costs to recover from each tier.

| Tier | Where | Driver | RPO (target) | RTO (target) |
|------|-------|--------|--------------|--------------|
| Live | Longhorn PVC on the Talos cluster | the workload itself | n/a | n/a |
| Warm | Restic repository on Hetzner Object Storage | k8up → Restic | ≤ 24 h | minutes to an hour |
| Hot | Off-site Synology DS723+ ("Maresa") | Syncthing | seconds | seconds |
| Cold | Two rotating, encrypted WD Elements drives | manual restic copy + rsync | 1–4 weeks | hours |

What gets backed up vs. what doesn't

A workload's volume is backed up when its PersistentVolumeClaim carries the annotation k8up.io/backup: "true". The matching Schedule resource in the backups component picks it up. A volume without the annotation is deliberately not backed up: it holds regenerable state (search indices, transcode caches), bulk media that's mirrored separately, or anything where the snapshot cost outweighs the data's value.

The same opt-in approach covers CNPG clusters: an annotation of k8up.io/backupcommand: pg_dump makes k8up snapshot a logical dump rather than a file-system view, which is what you want for a database.
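
The opt-in rules above can be summarized as a small decision function. This is a minimal sketch of the selection logic as this page describes it, not k8up's actual implementation. The function name backup_plan and its return strings are made up for illustration; only the two annotation keys come from the text.

```python
# Annotation keys as named on this page.
BACKUP_ANNOTATION = "k8up.io/backup"
COMMAND_ANNOTATION = "k8up.io/backupcommand"


def backup_plan(annotations: dict[str, str]) -> str:
    """Classify a workload by its annotations (hypothetical helper).

    Returns "command: <cmd>" for a logical dump, "files" for a
    file-level snapshot, or "skip" when the workload has not opted in.
    """
    command = annotations.get(COMMAND_ANNOTATION)
    if command:
        # e.g. pg_dump for a database: back up the command's output.
        return f"command: {command}"
    if annotations.get(BACKUP_ANNOTATION) == "true":
        return "files"
    # No annotation means the volume is deliberately left out.
    return "skip"


print(backup_plan({"k8up.io/backup": "true"}))
print(backup_plan({"k8up.io/backupcommand": "pg_dump"}))
print(backup_plan({}))
```

The point of the sketch is that "skip" is the default: nothing is backed up unless something was explicitly annotated.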

Synced datasets (the /volume1/backup/* paths on Maresa) use a different mechanism — Syncthing replicates the upstream peer continuously. Neither k8up nor Restic is involved on that path; the cluster is the source of truth and Maresa is a live replica.

Warm tier — k8up → Restic → Hetzner S3

PVC (Longhorn)
        │
        ▼  scheduled
┌────────────────┐
│ k8up           │  reads PVC, optionally runs pre-backup command (e.g. pg_dump)
│ Schedule       │
└───────┬────────┘
        ▼
┌────────────────┐
│ Restic         │  chunks, deduplicates, encrypts, uploads
│ repository     │
└───────┬────────┘
        ▼
┌────────────────┐
│ Hetzner        │  S3-compatible Object Storage (eu-central)
│ Object Storage │
└────────────────┘

This is the primary recovery path. Restic's content-addressed store gives:

  • Deduplication across all PVCs writing to the same repo — 47 apps × N snapshots compress hard.
  • End-to-end encryption before the bytes leave the cluster. The repository password lives in a SOPS-encrypted Secret; Hetzner sees only ciphertext.
  • Pruning policy that retains hourly for 24 h, daily for 7 d, weekly for 4 w, monthly for a year. Each Schedule resource declares its own retention via keep: {hourly, daily, weekly, monthly}.
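
To see how the four keep buckets interact, here is a simplified model of bucket-based retention: walk the snapshots newest-first and keep the newest one in each hour, day, ISO week, and month until that bucket's budget is used up. It is a sketch of the idea, not Restic's forget implementation. The counts (24, 7, 4, 12) are read off the retention sentence above, and select_kept and bucket_key are hypothetical names.

```python
from datetime import datetime, timedelta

# Assumed budgets: hourly for 24 h, daily for 7 d, weekly for 4 w, monthly for a year.
KEEP = {"hourly": 24, "daily": 7, "weekly": 4, "monthly": 12}


def bucket_key(ts: datetime, unit: str) -> str:
    """Map a timestamp to the bucket it falls into for the given unit."""
    if unit == "hourly":
        return ts.strftime("%Y-%m-%d %H")
    if unit == "daily":
        return ts.strftime("%Y-%m-%d")
    if unit == "weekly":
        iso = ts.isocalendar()
        return f"{iso[0]}-W{iso[1]:02d}"
    if unit == "monthly":
        return ts.strftime("%Y-%m")
    raise ValueError(f"unknown unit: {unit}")


def select_kept(snapshots: list[datetime], keep: dict[str, int]) -> list[datetime]:
    """Return the snapshots retained by any bucket, oldest first."""
    kept = set()
    newest_first = sorted(snapshots, reverse=True)
    for unit, limit in keep.items():
        seen = set()
        for ts in newest_first:
            key = bucket_key(ts, unit)
            if key in seen:
                continue  # a newer snapshot already represents this bucket
            if len(seen) >= limit:
                break  # budget for this unit is used up
            seen.add(key)
            kept.add(ts)
    return sorted(kept)


# Three full days of hourly snapshots, ending 2024-03-10 23:00.
end = datetime(2024, 3, 10, 23)
snapshots = [end - timedelta(hours=h) for h in range(72)]
print(len(select_kept(snapshots, KEEP)))  # 26
```

In this example the 24 newest hourly snapshots are kept, plus the last snapshot of each of the two earlier days. The weekly and monthly buckets select a snapshot that is already retained, so the buckets overlap rather than add up.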

Restoring from warm: run restic restore <snapshot> --target /restore for a file-level restore, or, for a CNPG cluster, follow the postgres-restore runbook, which streams a dump through pv. RTO is dominated by transfer speed from Hetzner; a 100 GB volume takes on the order of 15–30 minutes.
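
As a rough sanity check on that estimate, the arithmetic below converts between volume size, sustained throughput, and transfer time. It assumes transfer is the only bottleneck (Restic's decrypt and unpack overhead is ignored) and uses decimal gigabytes. The function names are illustrative.

```python
def restore_minutes(size_gb: float, mbit_per_s: float) -> float:
    """Minutes needed to move size_gb (decimal GB) at a sustained mbit_per_s."""
    bits = size_gb * 1e9 * 8
    seconds = bits / (mbit_per_s * 1e6)
    return seconds / 60


def implied_mbit_per_s(size_gb: float, minutes: float) -> float:
    """Sustained throughput implied by moving size_gb in the given minutes."""
    return size_gb * 1e9 * 8 / (minutes * 60) / 1e6


for rate in (250, 500, 1000):
    print(f"{rate:>5} Mbit/s -> {restore_minutes(100, rate):5.1f} min")

print(f"15 min implies {implied_mbit_per_s(100, 15):.0f} Mbit/s")
print(f"30 min implies {implied_mbit_per_s(100, 30):.0f} Mbit/s")
```

In other words, 15–30 minutes for 100 GB corresponds to roughly 450–900 Mbit/s sustained. On a slower link the same volume takes proportionally longer.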

Hot tier — Syncthing to Maresa

PVC content (or media folder)
      │
      ▼  inotify / continuous
┌────────────┐
│ Syncthing  │  peer-to-peer, bidirectional, with staggered versioning
│ folder     │
└─────┬──────┘
      ▼
┌────────────┐
│ Maresa     │  /volume1/backup/{archive,audio,gaming,images,reading,stash,videos}
│ (Synology) │
└────────────┘

The hot tier exists for the last few minutes of work that the warm-tier snapshot hasn't caught yet, and for bulk datasets (media libraries, etc.) where a continuous replica is cheaper than scheduled snapshots.

  • Bidirectional by default. A correction made on Maresa flows back to the source. Useful for the archive pool.
  • Versioning per folder ("Staggered" policy) gives hour-by-hour, then daily, then weekly file history, even on Maresa. So a delete on the upstream doesn't immediately destroy the replica's previous state.
  • No central server. Each peer holds its own keys; the Synology can be unreachable for days without breaking the upstream.
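
The staggered policy thins out old versions as they age. The sketch below models that thinning as a lookup from a version's age to the minimum spacing between kept versions. The interval table and the one-year maximum age are assumptions based on Syncthing's documented defaults, not values taken from this homelab's configuration, and spacing_for_age is a hypothetical name.

```python
from datetime import timedelta
from typing import Optional

# (age limit, minimum spacing between kept versions) -- assumed defaults.
STAGGERED_INTERVALS = [
    (timedelta(hours=1), timedelta(seconds=30)),
    (timedelta(days=1), timedelta(hours=1)),
    (timedelta(days=30), timedelta(days=1)),
]
WEEKLY = timedelta(weeks=1)
MAX_AGE = timedelta(days=365)  # assumed maximum age; older versions are dropped


def spacing_for_age(age: timedelta, max_age: timedelta = MAX_AGE) -> Optional[timedelta]:
    """Return the spacing applied to versions of this age, or None if expired."""
    if age > max_age:
        return None
    for limit, spacing in STAGGERED_INTERVALS:
        if age <= limit:
            return spacing
    return WEEKLY


for days in (0.2, 3, 90, 400):
    print(days, "days old ->", spacing_for_age(timedelta(days=days)))
```

The practical consequence for recovery: a file deleted upstream an hour ago can be recovered at fine granularity, while one deleted months ago is only available at weekly granularity.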

Maresa is in a different physical location from the cluster — the "off-site" leg of 3-2-1. The Syncthing folder on Maresa is also the source for the cold tier below.

Restoring from hot: copy the relevant directory back from Maresa over Syncthing (or directly over the NetBird mesh). RTO is seconds for individual files and minutes for large datasets at LAN-ish speeds.

Cold tier — encrypted rotating drives

Maresa /volume1/backup/*
      │
      ▼  manual, monthly
┌─────────────────────┐
│ VeraCrypt volume    │  rsync over USB
│ on WD Elements drive│
└─────┬───────────────┘
      ▼
off-line shelf

Two WD Elements drives, each formatted as a VeraCrypt volume, rotated monthly to a different physical location. One drive is always offline; the other is in transit or being written. See Hardware → Cold storage for the drive list and the encryption setup.

  • Air-gapped. A cluster-wide ransomware event can't touch a drive that's sitting on a shelf with no USB plugged in.
  • Encrypted at rest. A lost or stolen drive is useless without the passphrase.
  • Targeted, not exhaustive. Only the "important" subset of /volume1/backup/* makes it onto the cold tier — irreplaceable content (photos, documents, source code) — not bulk media that can be re-acquired.

The cold tier is the slowest tier, with the highest RPO and the highest RTO. It's the answer to "every other tier has been compromised", not the path you reach for to roll back a deploy.

When each tier matters

| Failure mode | Tier that recovers it |
|--------------|-----------------------|
| Bad deploy, app data corrupted within last hour | warm (latest hourly Restic snapshot) |
| App data lost on a single PVC, more than 24 h ago | warm (daily/weekly snapshot) |
| Cluster unreachable, but Maresa OK | hot (Syncthing replica is reachable directly) |
| Hetzner outage or repo-password mistakenly rotated | hot or cold, whichever has the freshest copy |
| Whole site (cluster + Hetzner) lost | cold (offline drive) |
| Drive theft / loss | the other drive in rotation |
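
For the "whichever has the freshest copy" case, the choice reduces to comparing the last known-good copy time of each tier that is still usable. A minimal sketch, assuming you track those timestamps somewhere; the function name, tier labels, and example dates are illustrative only.

```python
from datetime import datetime
from typing import Optional


def freshest_tier(last_copy: dict[str, datetime], unavailable: set[str]) -> Optional[str]:
    """Pick the usable tier with the most recent copy, or None if none is left."""
    candidates = {tier: ts for tier, ts in last_copy.items() if tier not in unavailable}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Hypothetical timestamps for one dataset.
last_copy = {
    "warm": datetime(2024, 3, 10, 2, 0),
    "hot": datetime(2024, 3, 10, 11, 59),
    "cold": datetime(2024, 2, 20, 18, 0),
}

print(freshest_tier(last_copy, unavailable={"warm"}))         # hot
print(freshest_tier(last_copy, unavailable={"warm", "hot"}))  # cold
```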

The whole point of having three tiers is that any single compromise leaves the other two intact.

Operational rules of thumb

  • Don't tier-shift on a deploy. Backups happen on their own k8up Schedules, unrelated to GitOps. Resist the urge to "snapshot before deploy" — see topics/gitops-flow for why.
  • Periodically rotate the cold drives. A monthly cadence is what's documented; further drift turns the cold tier from a safety net into a museum.
  • Test restores. A backup that's never been restored is a hope, not a backup. The postgres-restore runbook is the canonical drill; pick an app's PVC quarterly and run through it on a throwaway namespace.
  • Watch repo password handling. Lose the Restic repository password and the warm tier is cryptographically gone — that's the failure mode the cold tier exists to survive.
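
One way to make a restore drill conclusive is to compare checksums of the restored tree against a known-good copy instead of eyeballing it. The sketch below is one possible check, not part of the postgres-restore runbook. It assumes both trees are mounted as local directories, and manifest and restore_diff are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    result = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        digest = hashlib.sha256()
        with path.open("rb") as handle:
            # Hash in 1 MiB chunks so large files don't have to fit in memory.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        result[path.relative_to(root).as_posix()] = digest.hexdigest()
    return result


def restore_diff(source: Path, restored: Path) -> dict[str, list[str]]:
    """Report files missing from, added to, or changed in the restored tree."""
    src, dst = manifest(source), manifest(restored)
    return {
        "missing": sorted(src.keys() - dst.keys()),
        "extra": sorted(dst.keys() - src.keys()),
        "changed": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }
```

A drill passes when all three lists are empty. For a live workload, expect some "changed" entries from writes that happened after the snapshot was taken; the check is most meaningful against a quiesced copy.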

See also