From a Renovate PR to a running container

This page traces the full GitOps loop in the homelab: everything that happens between Renovate opening a pull request and the new image running in the cluster, with the new database schema applied, the old PVCs intact, and the dashboards showing green.

It crosses every section of the docs, so it lives in topics/ rather than in any single section. The actors involved:

Actor | Where it lives in docs
Renovate (running in gitea-runner) | apps/gitea-runner
Gitea (the source of truth) | apps/gitea
Flux (the reconciler) | foundation/flux
Kustomize + the components/ and per-app overlays | components/
SOPS + age (decrypts secrets in-cluster) | operations/sops
External Secrets (re-materializes external creds as Secrets) | platform/external-secrets
CNPG (hosts the database; not touched by app rollouts) | platform/cloudnative-pg
Longhorn (carries the PVCs across the upgrade) | platform/longhorn
Envoy Gateway (does not lose connections) | platform/envoy-gateway
k8up (snapshots happen on a schedule unrelated to deploys) | platform/k8up · operations/

The loop, top-down

1. Renovate scans manifests on a schedule
     │
     ▼  opens a PR with the new image digest
┌──────────┐
│  Gitea   │
└────┬─────┘
     │
     │  2. CI runs on gitea-runner
     │     - kustomize build sanity check
     │     - kubectl --dry-run=server (against a kind cluster)
     │     - schema check via cnpg-fixtures
     │
     │  3. Human merges (small bumps auto-merge)
     │
     ▼
┌──────────┐
│   main   │  source of truth
└────┬─────┘
     │
     │  4. Flux source-controller pulls main
     │  5. Flux kustomize-controller renders + applies
     │
     ▼
┌──────────┐
│ cluster  │
└────┬─────┘
     │
     │  6. Kubernetes rolls the workload
     │     - new ReplicaSet, gradual cutover
     │     - PVCs reattach, no data lost
     │     - CNPG database untouched; the app runs its own migrations
     │     - Envoy Gateway drains old endpoints
     │
     ▼
┌────────────────┐
│ observability  │
└────────────────┘
   7. Prometheus scrapes the new pod
   8. Gatus pings the public URL
   9. k8up snapshot still runs on its schedule

Nine steps, four moving parts (Renovate, Gitea, Flux, Kubernetes), one human in the middle for non-trivial bumps.

Step 1 — Renovate opens the PR

Renovate runs as a scheduled job on gitea-runner. Configuration is centralized in renovate.json at the repo root.

Two things make the PRs deterministic:

  1. Digest pinning. Image references are stored as image: gitea/gitea@sha256:… rather than image: gitea/gitea:1.25.5. A semver bump and a digest bump are separate events; the diff in a PR shows exactly what changed.
  2. Group rules. Patch updates within a single chart group bundle into one PR; major chart bumps stay solo so they never silently piggy-back on a routine patch.
// renovate.json (excerpt)
{
  "extends": ["config:recommended", "helpers:pinGitHubActionDigests"],
  "kubernetes": { "fileMatch": ["^k8s/.*\\.ya?ml$"] },
  "flux": { "fileMatch": ["^k8s/.*\\.ya?ml$"] },
  "packageRules": [
    {
      "matchDatasources": ["docker"],
      "pinDigests": true
    },
    {
      "matchUpdateTypes": ["patch", "pin", "digest"],
      "automerge": true,
      "automergeType": "branch"
    }
  ]
}

Renovate does not deploy anything — it only proposes a commit.
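
To make the digest-pinning distinction concrete, the sketch below splits an image reference into repository, tag, and digest, and reports whether the reference is pinned. It is a minimal illustration and not Renovate's own parser: parse_image_ref and is_digest_pinned are hypothetical helper names, and the all-zero digest in the example is a placeholder, not a real image digest.

```python
from typing import NamedTuple, Optional


class ImageRef(NamedTuple):
    repository: str
    tag: Optional[str]
    digest: Optional[str]


def parse_image_ref(ref: str) -> ImageRef:
    """Split an image reference into repository, tag, and digest (hypothetical helper)."""
    repository, _, digest = ref.partition("@")
    tag = None
    # Only a ":" in the last path segment separates a tag;
    # an earlier ":" belongs to a registry port such as host:5000.
    last_segment = repository.rsplit("/", 1)[-1]
    if ":" in last_segment:
        repository, _, tag = repository.rpartition(":")
    return ImageRef(repository, tag, digest or None)


def is_digest_pinned(ref: str) -> bool:
    """True when the reference carries a sha256 digest."""
    digest = parse_image_ref(ref).digest
    return digest is not None and digest.startswith("sha256:")


if __name__ == "__main__":
    placeholder_digest = "sha256:" + "0" * 64  # placeholder, not a real digest
    print(is_digest_pinned("gitea/gitea:1.25.5"))
    print(is_digest_pinned("gitea/gitea@" + placeholder_digest))
```

A tag such as 1.25.5 can be re-pushed upstream, while a digest names exactly one image, which is why a digest-only diff in a PR is unambiguous.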

Step 2 — CI runs on the PR

The PR triggers a gitea-runner job that does, roughly:

set -euo pipefail

# Sanity: every kustomization actually builds
# (paths with a component starting with "_" are skipped)
find k8s -name 'kustomization.yaml' -not -path '*/_*' -print0 |
  while IFS= read -r -d '' k; do
    kustomize build "$(dirname "$k")" > /dev/null
  done

# Server dry-run against a fresh kind cluster; delete it when the job exits
kind create cluster --name "pr-${PR_NUMBER}"
trap 'kind delete cluster --name "pr-${PR_NUMBER}"' EXIT
flux install --components-extra=image-reflector-controller,image-automation-controller
kubectl apply -k k8s/clusters/talos --dry-run=server

The kind cluster is throwaway; what matters is that an API server validates every manifest the cluster will eventually see. SOPS-encrypted secrets are skipped because the runner doesn't have the age key; they are first exercised in step 5, when Flux decrypts them in-cluster.

Step 3 — Merge

Patch and digest PRs auto-merge once CI passes; majors and chart bumps stay open until a human reads them. Either way, the merge advances main by a fast-forward.

Once main advances, every cluster's Flux picks up the new commit on its next poll.
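
The merge policy can be sketched as a small classifier. This is a simplified illustration that assumes plain MAJOR.MINOR.PATCH version strings; Renovate's real versioning logic handles many more schemes. The function names are hypothetical, and AUTOMERGE_TYPES simply mirrors the matchUpdateTypes list in the renovate.json excerpt.

```python
# Mirrors "matchUpdateTypes" in the renovate.json excerpt.
AUTOMERGE_TYPES = {"patch", "pin", "digest"}


def semver_update_type(old: str, new: str) -> str:
    """Classify a MAJOR.MINOR.PATCH change as "major", "minor", or "patch"."""
    old_parts = [int(part) for part in old.split(".")]
    new_parts = [int(part) for part in new.split(".")]
    if new_parts[0] != old_parts[0]:
        return "major"
    if new_parts[1] != old_parts[1]:
        return "minor"
    return "patch"


def automerges(update_type: str) -> bool:
    """True when an update of this type merges without a human once CI passes."""
    return update_type in AUTOMERGE_TYPES


if __name__ == "__main__":
    # "1.25.4" is an illustrative previous version, not taken from the repo.
    for old, new in [("1.25.4", "1.25.5"), ("1.25.5", "2.0.0")]:
        kind = semver_update_type(old, new)
        print(old, "->", new, kind, "automerge" if automerges(kind) else "needs review")
```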

Step 4 — Flux pulls the new commit

flux-system/GitRepository polls Gitea over SSH on a 1m interval. When a new commit lands, the source-controller packages the tree as a tarball artifact keyed by the commit revision, and the kustomize-controller picks up the new revision.

Authentication is via deploy key — the same SSH key documented on the Flux page. Each cluster has its own deploy key; rotating one doesn't disrupt the others.
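
The polling behaviour can be modelled in a few lines. The sketch below is a toy model, not source-controller code: SourcePoller and its reconcile method are hypothetical names, and the only behaviour it captures is that a poll produces a new artifact when the branch head has changed and does nothing otherwise.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Artifact:
    revision: str  # for example "main@sha1:<commit>"


class SourcePoller:
    """Toy model of one GitRepository: a new artifact only when the revision changes."""

    def __init__(self, branch: str = "main") -> None:
        self.branch = branch
        self.artifact: Optional[Artifact] = None

    def reconcile(self, head_commit: str) -> bool:
        """Run one poll; return True when a new artifact was produced."""
        revision = f"{self.branch}@sha1:{head_commit}"
        if self.artifact is not None and self.artifact.revision == revision:
            return False  # nothing new; wait for the next interval
        self.artifact = Artifact(revision)
        return True


if __name__ == "__main__":
    poller = SourcePoller()
    # "abc123" and "def456" are made-up commit ids.
    for commit in ["abc123", "abc123", "def456"]:
        print(commit, poller.reconcile(commit))
```

Because detection is poll-based, a merge is noticed at most about one interval (1m here) after it lands, plus the time to fetch and package the commit.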

Step 5 — Kustomize-controller renders and applies

The kustomize-controller takes the Git artifact and runs kustomize build on k8s/clusters/<cluster>/. That root pulls in:

  • k8s/infrastructure/<cluster>/ — controllers and CRDs
  • k8s/apps/<cluster>/ — the workloads
  • k8s/components/<cluster>/ and k8s/resources/<cluster>/ — pulled in via components: references in each app's kustomization.yaml

If any resource references a SOPS-encrypted Secret, Flux decrypts it inline using the in-cluster age key (see SOPS). Decryption happens server-side in the controller — the plaintext never leaves the cluster.

The controller diffs the rendered manifests against the live cluster state and applies the difference using server-side apply. With prune enabled on the Kustomization, it also deletes resources that disappeared from the rendered set, so removing a manifest from Git deletes the resource from the cluster. That's the GitOps part.
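
The apply-and-prune decision reduces to a comparison between two sets of objects. The sketch below is a simplified model under stated assumptions: plan_reconcile is a hypothetical function, objects are identified by a plain "Kind/name" string, and live stands for the objects this Kustomization applied previously rather than everything in the cluster.

```python
from typing import Dict, List, Tuple


def plan_reconcile(
    rendered: Dict[str, dict],
    live: Dict[str, dict],
    prune: bool = True,
) -> Tuple[List[str], List[str]]:
    """Return (ids to apply, ids to prune) for one reconciliation."""
    # Apply anything that is new or whose rendered manifest differs from live state.
    to_apply = sorted(k for k, manifest in rendered.items() if live.get(k) != manifest)
    # Prune anything that was applied before but is no longer rendered.
    to_prune = sorted(set(live) - set(rendered)) if prune else []
    return to_apply, to_prune


if __name__ == "__main__":
    # Illustrative objects only; names and fields are made up.
    rendered = {
        "Deployment/gitea": {"image": "gitea@sha256:new"},
        "Service/gitea": {"port": 3000},
    }
    live = {
        "Deployment/gitea": {"image": "gitea@sha256:old"},
        "Service/gitea": {"port": 3000},
        "ConfigMap/legacy": {},
    }
    print(plan_reconcile(rendered, live))
```

In this example only the Deployment is re-applied, the unchanged Service is left alone, and the ConfigMap that vanished from Git is pruned.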

Step 6 — Kubernetes rolls the workload

Once the manifest hits the API server, normal Kubernetes semantics take over. For a typical app upgrade:

  • A new ReplicaSet is created with the new image digest.
  • The Deployment's RollingUpdate strategy (maxSurge: 25%, maxUnavailable: 25% by default) spins up new pods alongside old ones.
  • Longhorn PVCs reattach to the new pods. PVCs are ReadWriteOnce, so the old pod terminates before the new one mounts; the rollout accepts a brief downtime per pod.
  • CNPG Cluster resources are not touched by app upgrades. Database pods restart only when the Cluster spec itself changes or the CNPG operator is upgraded. Schema migrations are an application concern, run by the app itself on startup.
  • Envoy Gateway gradually drains in-flight requests off old endpoints; the routing graph stays consistent throughout.
  • Cilium network policies move with the labels, so policy-by-identity survives the rollout.

The whole rollout for a typical app is on the order of 30–90 seconds, dominated by image pull (mitigated by Spegel's in-cluster registry mirror) and readiness gating.
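
The effect of the default RollingUpdate percentages depends on the replica count, because Kubernetes rounds maxSurge up and maxUnavailable down. The sketch below works through that arithmetic; rolling_update_bounds is a hypothetical helper, not part of any Kubernetes client.

```python
import math
from typing import Dict


def rolling_update_bounds(
    replicas: int,
    max_surge_pct: int = 25,
    max_unavailable_pct: int = 25,
) -> Dict[str, int]:
    """Pod-count limits during a rollout for percentage-based settings."""
    surge = math.ceil(replicas * max_surge_pct / 100)               # rounds up
    unavailable = math.floor(replicas * max_unavailable_pct / 100)  # rounds down
    return {
        "max_total_pods": replicas + surge,
        "min_available_pods": replicas - unavailable,
    }


if __name__ == "__main__":
    for replicas in (1, 4, 10):
        print(replicas, rolling_update_bounds(replicas))
```

With a single replica the defaults allow one extra pod and zero unavailable pods, so the new pod is started before the old one is removed.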

Step 7 — Prometheus scrapes the new pod

Monitoring (VictoriaMetrics + Grafana via the talos-cluster monitoring overlay) auto-discovers the new pod via the existing ServiceMonitor and PodMonitor resources. No scrape configuration changes are needed; the operator generates the scrape config from CRDs that were already in place.

A bad upgrade shows up here first — error rate, p99 latency, restart loop counter — before the user notices.

Step 8 — External health checks

Gatus pings the public URL of every app on a 30-second interval. After a deploy, two checks must pass before Gatus considers the app healthy:

  1. The app's /-/healthy (or equivalent) endpoint returns 200.
  2. The TLS certificate served by Envoy Gateway is valid and chains correctly.

If either fails for two consecutive intervals, an alert fires via ntfy.
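
The "two consecutive intervals" rule is a consecutive-failure counter. The sketch below models that rule only; it is not Gatus code, and ConsecutiveFailureAlert is a hypothetical class name. A single failed probe followed by a success never alerts, which keeps a brief blip during the rollout from paging anyone.

```python
class ConsecutiveFailureAlert:
    """Fire once when the number of consecutive failed checks reaches the threshold."""

    def __init__(self, threshold: int = 2) -> None:
        self.threshold = threshold
        self.failures = 0

    def observe(self, healthy: bool) -> bool:
        """Record one check result; return True when an alert should fire."""
        if healthy:
            self.failures = 0  # any success resets the streak
            return False
        self.failures += 1
        return self.failures == self.threshold


if __name__ == "__main__":
    alert = ConsecutiveFailureAlert(threshold=2)
    results = [False, True, False, False, False]
    print([alert.observe(ok) for ok in results])
```

At a 30-second interval and a threshold of two, a persistent failure is reported roughly 30 to 60 seconds after it begins, depending on where in the interval it starts.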

Step 9 — Backups don't change

k8up snapshots PVCs to a Restic repository on its own Schedule, independent of the GitOps loop. A bad deploy doesn't compromise yesterday's snapshot; a successful deploy doesn't trigger a new one.

That separation is intentional. It is tempting to "snapshot before deploy", but that ties two unrelated reliability systems together. If the deploy is broken in a way that corrupts data on first write, you want to restore from the previous scheduled snapshot, not from a snapshot taken at the moment of breakage.

What can break, and where to look

Symptom | Most likely cause | Where to look first
Renovate stops opening PRs | gitea-runner is down or rate-limited | apps/gitea-runner logs; Gitea API status
PR fails CI on every project | kustomize-build error in a shared component | last commit to components/
flux get all -A shows False on kustomization | SOPS decrypt failed (key rotated?), or a manifest references a missing CRD | flux logs --kind=Kustomization
flux get all -A shows False on gitrepository | SSH deploy key revoked, or Gitea down | the deploy key on the Flux page
Pods stuck Pending | PVC pending (Longhorn replica scheduling issue) | platform/longhorn
TLS handshake fails after deploy | cert-manager issued a fresh cert that hasn't propagated | platform/cert-manager
5xx after deploy, healthy in cluster | Edge → production routing wedge | topics/envoy-gateway-proxy-protocol-v2, then fabric/netbird
Backups stale | k8up Schedule paused or repository password rotated | operations/

Why this is a topic, not a runbook

A runbook tells you what to do when X breaks. This page tells you what normally happens, end-to-end, so when something breaks the diagnosis is "where in this loop did it stop?". Operations-side runbooks for the individual failure modes live in operations/.