Three-tier backups, end-to-end

The homelab follows the 3-2-1 backup rule — three copies of every important byte, on two different media, with at least one off-site. This page traces the full path a byte takes from "written to a Longhorn PVC" to "rotated cold drive on a shelf", and what it costs to recover from each tier.

Tier	Where	Driver	RPO (target)	RTO (target)
Live	Longhorn PVC on the Talos cluster	the workload itself	n/a	n/a
Warm	Restic repository on Hetzner Object Storage	k8up → Restic	≤ 24 h	minutes-to-an-hour
Hot	Off-site Synology DS723+ ("Maresa")	Syncthing	seconds	seconds
Cold	Two rotating, encrypted WD Elements drives	manual `restic copy` + `rsync`	1–4 weeks	hours

What gets backed up vs. what doesn't

A workload's volume is backed up when its PersistentVolumeClaim carries the annotation k8up.io/backup: "true". The matching Schedule resource in the backups component picks it up. A volume without the annotation is deliberately not backed up — regenerable state (search indices, transcode caches), bulk media that's mirrored separately, or anything where the snapshot cost outweighs the data's value.
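The opt-in rule above can be pictured as a simple filter over PVC metadata. The following is a minimal sketch, not k8up's own code: the function name and the sample PVC names are hypothetical, and the dictionaries only imitate the shape of a Kubernetes object's metadata.

```python
BACKUP_ANNOTATION = "k8up.io/backup"  # annotation named in the text above


def pvcs_to_back_up(pvcs):
    """Return the names of PVCs that explicitly opt in to backups."""
    selected = []
    for pvc in pvcs:
        metadata = pvc.get("metadata", {})
        annotations = metadata.get("annotations") or {}
        # Only the exact string "true" opts a volume in; a missing
        # annotation means the volume is deliberately skipped.
        if annotations.get(BACKUP_ANNOTATION) == "true":
            selected.append(metadata["name"])
    return selected


# Hypothetical PVCs, for illustration only.
example_pvcs = [
    {"metadata": {"name": "paperless-data",
                  "annotations": {"k8up.io/backup": "true"}}},
    {"metadata": {"name": "jellyfin-transcodes", "annotations": {}}},
    {"metadata": {"name": "search-index"}},
]

print(pvcs_to_back_up(example_pvcs))  # → ['paperless-data']
```

The point of the sketch is that the default is "skip": a new workload is only backed up once someone adds the annotation on purpose.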

A second annotation covers CNPG clusters: k8up.io/backupcommand: pg_dump makes k8up capture a logical dump rather than a file-system view, which is what you want for a database.

Synced datasets (the /volume1/backup/* paths on Maresa) use a different mechanism — Syncthing replicates the upstream peer continuously. Neither k8up nor Restic is involved on that path; the cluster is the source of truth and Maresa is a live replica.

Warm tier — k8up → Restic → Hetzner S3

PVC (Longhorn)
   │
   ▼ scheduled
┌────────────┐
│ k8up       │   reads PVC, optionally runs pre-backup command (e.g. pg_dump)
│ Schedule   │
└─────┬──────┘
      ▼
┌────────────┐
│ Restic     │   chunks, deduplicates, encrypts, uploads
│ repository │
└─────┬──────┘
      ▼
┌────────────────┐
│ Hetzner        │   S3-compatible Object Storage (eu-central)
│ Object Storage │
└────────────────┘

This is the primary recovery path. Restic's content-addressed store gives:

Deduplication across all PVCs writing to the same repo: 47 apps × N snapshots deduplicate heavily, because unchanged chunks are stored only once.
End-to-end encryption before the bytes leave the cluster. The repository password lives in a SOPS-encrypted Secret; Hetzner sees only ciphertext.
Pruning policy that retains hourly for 24 h, daily for 7 d, weekly for 4 w, monthly for a year. Each Schedule resource declares its own retention via keep: {hourly, daily, weekly, monthly}.
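To see what that retention policy keeps in practice, here is a simplified model of bucket-style pruning, in the spirit of restic's `forget --keep-hourly/--keep-daily/--keep-weekly/--keep-monthly` options. It is a sketch under assumptions: the function name and the sample snapshot timeline are hypothetical, and real restic and k8up behaviour may differ in edge cases.

```python
from datetime import datetime, timedelta


def apply_retention(snapshots, hourly=24, daily=7, weekly=4, monthly=12):
    """Return the snapshot times a keep-hourly/daily/weekly/monthly policy retains."""
    rules = [
        (hourly, lambda t: (t.year, t.month, t.day, t.hour)),
        (daily, lambda t: (t.year, t.month, t.day)),
        (weekly, lambda t: tuple(t.isocalendar())[:2]),  # (ISO year, ISO week)
        (monthly, lambda t: (t.year, t.month)),
    ]
    state = [{"left": n, "last": None, "key": key} for n, key in rules]
    kept = []
    # Walk from newest to oldest; each rule keeps the newest snapshot
    # in every period it has not seen yet, until its quota is used up.
    for snap in sorted(snapshots, reverse=True):
        keep = False
        for rule in state:
            bucket = rule["key"](snap)
            if rule["left"] > 0 and bucket != rule["last"]:
                rule["left"] -= 1
                rule["last"] = bucket
                keep = True
        if keep:
            kept.append(snap)
    return kept


# Hypothetical timeline: one snapshot per hour for 400 days.
newest = datetime(2024, 6, 30, 23, 0)
snaps = [newest - timedelta(hours=h) for h in range(400 * 24)]
kept = apply_retention(snaps)
print(len(snaps), "snapshots before pruning,", len(kept), "after")
```

Because a single snapshot can satisfy several rules at once (the newest one is the hourly, daily, weekly, and monthly pick), the number retained is always at most 24 + 7 + 4 + 12.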

Restoring from warm: run restic restore <snapshot> --target /restore for a plain PVC or, for a CNPG cluster, follow the postgres-restore runbook, which streams a dump through pv. RTO is dominated by transfer speed from Hetzner; a 100 GB volume takes on the order of 15–30 minutes.
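The 15–30 minute figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes decimal gigabytes and a sustained download rate; the two throughput values are illustrative assumptions, not measurements, and Restic's own decrypt-and-reassemble overhead is ignored.

```python
def transfer_minutes(size_gb, throughput_mbit_s):
    """Minutes needed to move size_gb (decimal GB) at a sustained rate in Mbit/s."""
    bits = size_gb * 1e9 * 8
    seconds = bits / (throughput_mbit_s * 1e6)
    return seconds / 60


# Assumed sustained rates; real throughput depends on the link and on Hetzner.
for rate in (450, 900):
    print(f"{rate} Mbit/s: {transfer_minutes(100, rate):.1f} min")
# 450 Mbit/s: 29.6 min
# 900 Mbit/s: 14.8 min
```

In other words, the quoted range corresponds to roughly 450–900 Mbit/s sustained; on a slower link the warm-tier RTO grows proportionally.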

Hot tier — Syncthing to Maresa

PVC content (or media folder)
   │
   ▼ inotify / continuous
┌────────────┐
│ Syncthing  │   peer-to-peer, bidirectional, with staggered versioning
│ folder     │
└─────┬──────┘
      ▼
┌────────────┐
│ Maresa     │   /volume1/backup/{archive,audio,gaming,images,reading,stash,videos}
│ (Synology) │
└────────────┘

The hot tier exists for the last few minutes of work that the warm-tier snapshot hasn't caught yet, and for bulk datasets (media libraries, etc.) where a continuous replica is cheaper than scheduled snapshots.

Bidirectional by default. A correction made on Maresa flows back to the source. Useful for the archive pool.
Versioning per folder ("Staggered" policy) gives hour-by-hour, then daily, then weekly file history, even on Maresa. So a delete on the upstream doesn't immediately destroy the replica's previous state.
No central server. Each peer holds its own keys; the Synology can be unreachable for days without breaking the upstream.

Maresa is in a different physical location from the cluster — the "off-site" leg of 3-2-1. The Syncthing folder on Maresa is also the source for the cold tier below.

Restoring from hot: copy the relevant directory back from Maresa over Syncthing (or just over the NetBird mesh directly). RTO is measured in seconds for individual files and in minutes for large datasets at LAN-ish speeds.

Cold tier — encrypted rotating drives

Maresa /volume1/backup/*
   │
   ▼ manual, monthly
┌─────────────────────┐
│ VeraCrypt volume    │   rsync over USB
│ on WD Elements drive│
└─────┬───────────────┘
      ▼
   off-line shelf

Two WD Elements drives, each formatted as a VeraCrypt volume, rotated monthly to a different physical location. One drive is always offline; the other is in transit or being written. See Hardware → Cold storage for the drive list and the encryption setup.

Air-gapped. A cluster-wide ransomware event can't touch a drive that's sitting on a shelf with no USB plugged in.
Encrypted at rest. A lost or stolen drive is useless without the passphrase.
Targeted, not exhaustive. Only the "important" subset of /volume1/backup/* makes it onto the cold tier — irreplaceable content (photos, documents, source code) — not bulk media that can be re-acquired.

The cold tier is the slowest tier, with the highest RPO and the highest RTO. It's the answer to "every other tier has been compromised", not the path you reach for to roll back a deploy.

When each tier matters

Failure mode	Tier that recovers it
Bad deploy, app data corrupted within last hour	warm (latest hourly Restic snapshot)
App data lost on a single PVC, more than 24 h ago	warm (daily/weekly snapshot)
Cluster unreachable, but Maresa OK	hot (Syncthing replica is reachable directly)
Hetzner outage or repo-password mistakenly rotated	hot or cold, whichever has the freshest copy
Whole site (cluster + Hetzner) lost	cold (offline drive)
Drive theft / loss	the other drive in rotation

The whole point of having three tiers is that any single compromise leaves the other two intact.

Operational rules of thumb

Don't tier-shift on a deploy. Backups happen on their own k8up Schedules, unrelated to GitOps. Resist the urge to "snapshot before deploy" — see topics/gitops-flow for why.
Periodically rotate the cold drives. A monthly cadence is what's documented; further drift turns the cold tier from a safety net into a museum.
Test restores. A backup that's never been restored is a hope, not a backup. The postgres-restore runbook is the canonical drill; pick an app's PVC quarterly and run through it on a throwaway namespace.
Watch repo password handling. Lose the Restic repository password and the warm tier is cryptographically gone — that's the failure mode the cold tier exists to survive.
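The rotation rule is easy to let slip because nothing automated enforces it. As a minimal sketch, assuming the date of the last rotation is recorded somewhere by hand, a staleness check could look like this; the function name, the 31-day threshold, and the example dates are all hypothetical.

```python
from datetime import date


def cold_tier_overdue(last_rotation, today, max_age_days=31):
    """True when the cold drives have not been rotated within max_age_days."""
    return (today - last_rotation).days > max_age_days


# Hypothetical dates, for illustration only.
print(cold_tier_overdue(date(2024, 1, 1), date(2024, 1, 20)))  # → False
print(cold_tier_overdue(date(2024, 1, 1), date(2024, 2, 15)))  # → True
```

A check like this only reports drift; the rotation itself stays manual, which is what keeps the cold tier air-gapped.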
