Three-tier backups, end-to-end
The homelab follows the 3-2-1 backup rule — three copies of every important byte, on two different media, with at least one off-site. This page traces the full path a byte takes from "written to a Longhorn PVC" to "rotated cold drive on a shelf", and what it costs to recover from each tier.
| Tier | Where | Driver | RPO (target) | RTO (target) |
|---|---|---|---|---|
| Live | Longhorn PVC on the Talos cluster | the workload itself | n/a | n/a |
| Warm | Restic repository on Hetzner Object Storage | k8up → Restic | ≤ 24 h | minutes-to-an-hour |
| Hot | Off-site Synology DS723+ ("Maresa") | Syncthing | seconds | seconds (single files) to minutes (large datasets) |
| Cold | Two rotating, encrypted WD Elements drives | manual restic copy + rsync | 1–4 weeks | hours |
What gets backed up vs. what doesn't
A workload's volume is backed up when its PersistentVolumeClaim carries the annotation k8up.io/backup: "true". The matching Schedule resource in the backups component picks it up. A volume without the annotation is deliberately not backed up — regenerable state (search indices, transcode caches), bulk media that's mirrored separately, or anything where the snapshot cost outweighs the data's value.
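As a rough illustration of the opt-in rule, the selection can be modelled as a filter over PVC manifests. This is a sketch in Python over plain dictionaries, not k8up's own code; the manifest contents and the helper name `is_backed_up` are made up for the example.

```python
BACKUP_ANNOTATION = "k8up.io/backup"


def is_backed_up(pvc: dict) -> bool:
    """True only when the PVC explicitly opts in with the annotation set to "true"."""
    annotations = pvc.get("metadata", {}).get("annotations") or {}
    return annotations.get(BACKUP_ANNOTATION) == "true"


# Hypothetical manifests, trimmed to the fields that matter here.
pvcs = [
    {"metadata": {"name": "photos-data", "annotations": {BACKUP_ANNOTATION: "true"}}},
    {"metadata": {"name": "search-index", "annotations": {}}},
    {"metadata": {"name": "transcode-cache"}},
]

selected = [p["metadata"]["name"] for p in pvcs if is_backed_up(p)]
print(selected)
```

Only the first manifest is selected; a missing or empty annotation map counts as "not backed up", which matches the deliberate opt-in described above.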
The same opt-in flag drives CNPG clusters: an annotation of k8up.io/backupcommand: pg_dump makes k8up snapshot a logical dump rather than a file-system view, which is what you want for a database.
Synced datasets (the /volume1/backup/* paths on Maresa) use a different mechanism — Syncthing replicates the upstream peer continuously. Neither k8up nor Restic is involved on that path; the cluster is the source of truth and Maresa is a live replica.
Warm tier — k8up → Restic → Hetzner S3
PVC (Longhorn)
      │
      ▼  scheduled
┌────────────┐
│ k8up       │  reads PVC, optionally runs pre-backup command (e.g. pg_dump)
│ Schedule   │
└─────┬──────┘
      ▼
┌────────────┐
│ Restic     │  chunks, deduplicates, encrypts, uploads
│ repository │
└─────┬──────┘
      ▼
┌────────────────┐
│ Hetzner        │  S3-compatible Object Storage (eu-central)
│ Object Storage │
└────────────────┘
This is the primary recovery path. Restic's content-addressed store gives:
- Deduplication across all PVCs writing to the same repo: identical chunks shared by 47 apps × N snapshots are stored only once.
- End-to-end encryption before the bytes leave the cluster. The repository password lives in a SOPS-encrypted Secret; Hetzner sees only ciphertext.
- Pruning policy that retains hourly for 24 h, daily for 7 d, weekly for 4 w, monthly for a year. Each Schedule resource declares its own retention via keep: {hourly, daily, weekly, monthly}.
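To make the retention rule concrete, here is a simplified Python model of bucket-based pruning: for each rule it keeps the newest snapshot in each of the most recent N hours, days, weeks, or months. It is a sketch, not Restic's or k8up's implementation; the counts (24, 7, 4, 12) are read off the policy above, and all function and variable names are invented for this example.

```python
from datetime import datetime, timedelta

# Counts read off the policy above: 24 hourly, 7 daily, 4 weekly, 12 monthly.
POLICY = {"hourly": 24, "daily": 7, "weekly": 4, "monthly": 12}

# How each rule groups timestamps into buckets.
BUCKET_KEYS = {
    "hourly": lambda t: (t.year, t.month, t.day, t.hour),
    "daily": lambda t: (t.year, t.month, t.day),
    "weekly": lambda t: t.isocalendar()[:2],  # (ISO year, ISO week)
    "monthly": lambda t: (t.year, t.month),
}


def snapshots_to_keep(timestamps, policy=POLICY):
    """Return the set of timestamps retained by at least one rule."""
    keep = set()
    newest_first = sorted(timestamps, reverse=True)
    for rule, count in policy.items():
        seen = []
        for ts in newest_first:
            bucket = BUCKET_KEYS[rule](ts)
            if bucket in seen:
                continue  # a newer snapshot already represents this bucket
            if len(seen) == count:
                break  # the rule has filled all of its buckets
            seen.append(bucket)
            keep.add(ts)
    return keep


# Hypothetical history: one snapshot per hour for 60 days.
start = datetime(2024, 1, 1)
snaps = [start + timedelta(hours=h) for h in range(60 * 24)]
kept = snapshots_to_keep(snaps)
print(len(snaps), "snapshots,", len(kept), "kept")
```

A snapshot survives if any rule claims it, so the rules overlap: the newest snapshot typically counts toward the hourly, daily, weekly, and monthly rules at once.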
Restoring from warm: run restic restore <snapshot> --target /restore, or for a CNPG cluster follow the postgres-restore runbook, which streams a dump through pv. RTO is dominated by transfer speed from Hetzner: a 100 GB volume takes on the order of 15–30 minutes.
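The 15–30 minute figure can be sanity-checked with simple arithmetic. The sketch below assumes decimal gigabytes and a sustained download rate between 450 and 900 Mbit/s; those rates are illustrative assumptions, not measurements from this setup.

```python
def transfer_minutes(size_gb: float, mbit_per_s: float) -> float:
    """Minutes needed to move size_gb (decimal gigabytes) at a sustained rate in Mbit/s."""
    bits = size_gb * 1e9 * 8
    seconds = bits / (mbit_per_s * 1e6)
    return seconds / 60


# Assumed sustained rates; real throughput depends on the link and on the provider.
for rate in (450, 900):
    print(f"100 GB at {rate} Mbit/s: {transfer_minutes(100, rate):.1f} min")
```

At those rates the transfer alone takes roughly 30 and 15 minutes respectively; decryption and writing to the target volume come on top.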
Hot tier — Syncthing to Maresa
PVC content (or media folder)
      │
      ▼  inotify / continuous
┌────────────┐
│ Syncthing  │  peer-to-peer, bidirectional, with staggered versioning
│ folder     │
└─────┬──────┘
      ▼
┌────────────┐
│ Maresa     │  /volume1/backup/{archive,audio,gaming,images,reading,stash,videos}
│ (Synology) │
└────────────┘
The hot tier exists for the last few minutes of work that the warm-tier snapshot hasn't caught yet, plus for bulk datasets (media libraries, etc.) where a continuous replica is cheaper than scheduled snapshots.
- Bidirectional by default. A correction made on Maresa flows back to the source. Useful for the archive pool.
- Versioning per folder ("Staggered" policy) gives hour-by-hour, then daily, then weekly file history, even on Maresa. So a delete on the upstream doesn't immediately destroy the replica's previous state.
- No central server. Each peer holds its own keys; the Synology can be unreachable for days without breaking the upstream.
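The thinning-out of old versions under a staggered policy can be sketched as a lookup from a version's age to the minimum spacing between kept versions. The step values below are assumptions based on Syncthing's commonly documented staggered defaults (including a finer 30-second step for the first hour); check them against the folder's actual versioning settings. The names are invented for the example.

```python
from datetime import timedelta

# (maximum age of a version, minimum spacing between kept versions)
# Assumed values; verify against the folder's real configuration.
STAGGERED_STEPS = [
    (timedelta(hours=1), timedelta(seconds=30)),
    (timedelta(days=1), timedelta(hours=1)),
    (timedelta(days=30), timedelta(days=1)),
]
WEEKLY = timedelta(weeks=1)  # spacing for anything older, up to the maximum age


def spacing_for(age: timedelta) -> timedelta:
    """Return how far apart kept versions are for a version of the given age."""
    for max_age, spacing in STAGGERED_STEPS:
        if age <= max_age:
            return spacing
    return WEEKLY


for age in (timedelta(minutes=10), timedelta(hours=5), timedelta(days=3), timedelta(days=90)):
    print(age, "->", spacing_for(age))
```

The older a version gets, the coarser the history around it, which is why a recent delete can be undone precisely while a months-old state is only available at weekly resolution.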
Maresa is in a different physical location from the cluster — the "off-site" leg of 3-2-1. The Syncthing folder on Maresa is also the source for the cold tier below.
Restoring from hot: copy the relevant directory back from Maresa over Syncthing (or just over the NetBird mesh directly). RTO measured in seconds for individual files; minutes for large datasets at LAN-ish speeds.
Cold tier — encrypted rotating drives
Maresa /volume1/backup/*
      │
      ▼  manual, monthly
┌─────────────────────┐
│ VeraCrypt volume    │  rsync over USB
│ on WD Elements drive│
└─────┬───────────────┘
      ▼
off-line shelf
Two WD Elements drives, each formatted as a VeraCrypt volume, rotated monthly to a different physical location. One drive is always offline; the other is in transit or being written. See Hardware → Cold storage for the drive list and the encryption setup.
- Air-gapped. A cluster-wide ransomware event can't touch a drive that's sitting on a shelf with no USB plugged in.
- Encrypted at rest. A lost or stolen drive is useless without the passphrase.
- Targeted, not exhaustive. Only the "important" subset of /volume1/backup/* makes it onto the cold tier — irreplaceable content (photos, documents, source code) — not bulk media that can be re-acquired.
The cold tier is the slowest tier, with the highest RPO and the highest RTO. It's the answer to "every other tier has been compromised", not the path you reach for to roll back a deploy.
When each tier matters
| Failure mode | Tier that recovers it |
|---|---|
| Bad deploy, app data corrupted within last hour | warm (latest hourly Restic snapshot) |
| App data lost on a single PVC, more than 24 h ago | warm (daily/weekly snapshot) |
| Cluster unreachable, but Maresa OK | hot (Syncthing replica is reachable directly) |
| Hetzner outage or repo-password mistakenly rotated | hot or cold, whichever has the freshest copy |
| Whole site (cluster + Hetzner) lost | cold (offline drive) |
| Drive theft / loss | the other drive in rotation |
The whole point of having three tiers is that any single compromise leaves the other two intact.
Operational rules of thumb
- Don't tier-shift on a deploy. Backups happen on their own k8up Schedules, unrelated to GitOps. Resist the urge to "snapshot before deploy" — see topics/gitops-flow for why.
- Periodically rotate the cold drives. A monthly cadence is what's documented; further drift turns the cold tier from a safety net into a museum.
- Test restores. A backup that's never been restored is a hope, not a backup. The postgres-restore runbook is the canonical drill; pick an app's PVC quarterly and run through it in a throwaway namespace.
- Watch repo password handling. Lose the Restic repository password and the warm tier is cryptographically gone — that's the failure mode the cold tier exists to survive.
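The rotation rule lends itself to a simple staleness check. The sketch below flags a cold copy as overdue once it is older than four weeks, the upper end of the cold tier's RPO target; the function name and the dates are hypothetical.

```python
from datetime import date, timedelta

MAX_COLD_AGE = timedelta(weeks=4)  # upper end of the cold tier's RPO target


def cold_drive_overdue(last_written: date, today: date) -> bool:
    """True when the newest cold copy is older than the RPO target allows."""
    return today - last_written > MAX_COLD_AGE


# Hypothetical dates for illustration.
print(cold_drive_overdue(date(2024, 1, 1), date(2024, 1, 20)))  # False
print(cold_drive_overdue(date(2024, 1, 1), date(2024, 2, 15)))  # True
```

A check like this could run from any reminder mechanism; where the last-written date is recorded is left open here.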
See also
- Operations → Database restore from k8up — the warm-tier restore runbook, end to end
- Operations → SOPS — how the Restic repo password is encrypted at rest
- Platform → k8up — the Schedule CRD that drives all of this
- Apps → Syncthing — the hot-tier engine
- Hardware → Cold storage — the WD Elements drives + VeraCrypt setup
- Topics → Disaster recovery drill — running the full recovery from scratch