From a Renovate PR to a running container
This page traces the full GitOps loop in the homelab: what happens between the moment Renovate opens a pull request and the moment the new image is running in the cluster, with the new database schema applied, the old PVCs intact, and the dashboards showing green.
It crosses every section of the docs, so it lives in topics/ rather than belonging to any one. The actors involved:
| Actor | Where it lives in docs |
|---|---|
| Renovate (running in gitea-runner) | apps/gitea-runner |
| Gitea (the source of truth) | apps/gitea |
| Flux (the reconciler) | foundation/flux |
| Kustomize + the components/ and per-app overlays | components/ |
| SOPS + age (decrypts secrets in-cluster) | operations/sops |
| External Secrets (re-materializes external creds as Secrets) | platform/external-secrets |
| CNPG (rolls the database in step with the app) | platform/cloudnative-pg |
| Longhorn (carries the PVCs across the upgrade) | platform/longhorn |
| Envoy Gateway (does not lose connections) | platform/envoy-gateway |
| k8up (snapshots happen on a schedule unrelated to deploys) | platform/k8up · operations/ |
The loop, top-down
1. Renovate scans manifests on a schedule
        │
        ▼  opens a PR with the new image digest
   ┌──────────┐
   │  Gitea   │
   └────┬─────┘
        │
2. CI runs on gitea-runner
     - kustomize build sanity check
     - kubectl --dry-run=server (against a kind cluster)
     - schema check via cnpg-fixtures
        │
3. Human merges (small bumps auto-merge)
        │
        ▼
   ┌──────────┐
   │   main   │  source of truth
   └────┬─────┘
        │
4. Flux source-controller pulls main
5. Flux kustomize-controller renders + applies
        │
        ▼
   ┌──────────┐
   │ cluster  │
   └────┬─────┘
        │
6. Kubernetes rolls the workload
     - new ReplicaSet, gradual cutover
     - PVCs reattach, no data lost
     - CNPG handles DB migration / no-op
     - Envoy Gateway drains old endpoints
        │
        ▼
   ┌────────────────┐
   │ observability  │
   └────────────────┘
7. Prometheus scrapes the new pod
8. Gatus pings the public URL
9. k8up snapshot still runs on its schedule
Nine steps, four moving parts (Renovate, Gitea, Flux, Kubernetes), one human in the middle for non-trivial bumps.
Step 1 — Renovate opens the PR
Renovate runs as a scheduled job on gitea-runner. Configuration is centralized in renovate.json at the repo root.
Two things make the PRs deterministic:
- Digest pinning. Image references are stored as image: gitea/gitea@sha256:… rather than image: gitea/gitea:1.25.5. A semver bump and a digest bump are separate events; the diff in a PR shows exactly what changed.
- Group rules. Patch updates within a single chart group bundle into one PR; major chart bumps stay solo so they never silently piggy-back on a routine patch.
// renovate.json (excerpt)
{
  "extends": ["config:recommended", "helpers:pinGitHubActionDigests"],
  "kubernetes": { "fileMatch": ["^k8s/.*\\.ya?ml$"] },
  "flux": { "fileMatch": ["^k8s/.*\\.ya?ml$"] },
  "packageRules": [
    {
      // "docker" is a Renovate datasource, not a manager, so match on the datasource
      "matchDatasources": ["docker"],
      "pinDigests": true
    },
    {
      "matchUpdateTypes": ["patch", "pin", "digest"],
      "automerge": true,
      "automergeType": "branch"
    }
  ]
}
Renovate does not deploy anything — it only proposes a commit.
Step 2 — CI runs on the PR
The PR triggers a gitea-runner job that does, roughly:
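set -euo pipefail

# Sanity: every kustomization actually builds (paths containing /_ are skipped)
find k8s -name 'kustomization.yaml' -not -path '*/_*' -print0 |
  while IFS= read -r -d '' k; do
    kustomize build "$(dirname "$k")" > /dev/null
  done

# Server dry-run against a fresh, throwaway kind cluster
kind create cluster --name "pr-${PR_NUMBER}"
trap 'kind delete cluster --name "pr-${PR_NUMBER}"' EXIT
# Install Flux so its CRDs exist when the API server validates Flux resources
flux install --components-extra=image-reflector-controller,image-automation-controller
kubectl apply -k k8s/clusters/talos --dry-run=server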
The kind cluster is throwaway; what matters is that an API server validates every manifest the cluster will eventually see. SOPS-encrypted secrets are skipped (the runner doesn't have the age key), so they are first exercised when Flux decrypts them in step 5.
Step 3 — Merge
Patch + digest PRs auto-merge once CI passes; majors and chart bumps stay open until a human reads them. Either way, the merge produces a fast-forward commit on main.
Once main advances, every cluster's Flux picks up the new commit on its next poll.
Step 4 — Flux pulls the new commit
flux-system/GitRepository polls Gitea over SSH at a 1m interval. When a new commit lands, the source-controller packages the tree into a tarball artifact identified by the commit revision, and the kustomize-controller reconciles against that new artifact.
Authentication is via deploy key — the same SSH key documented on the Flux page. Each cluster has its own deploy key; rotating one doesn't disrupt the others.
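As a minimal sketch of what that source object might look like: the repository URL and the Secret name below are placeholders chosen for illustration, not values taken from this repo; only the 1m interval, the main branch, and SSH transport come from the text above.

```yaml
# Sketch only: url and secretRef.name are hypothetical.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 1m              # poll Gitea once a minute
  url: ssh://git@gitea.example.internal/homelab/homelab.git
  ref:
    branch: main
  secretRef:
    name: flux-system       # Secret with identity, identity.pub, known_hosts (the deploy key)
```

Because each cluster carries its own copy of this object with its own Secret, rotating one deploy key only means replacing that one Secret.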
Step 5 — Kustomize-controller renders and applies
The kustomize-controller takes the Git artifact and runs kustomize build on k8s/clusters/<cluster>/. That root pulls in:
- k8s/infrastructure/<cluster>/ (controllers and CRDs)
- k8s/apps/<cluster>/ (the workloads)
- k8s/components/<cluster>/ and k8s/resources/<cluster>/ (pulled in via components: references in each app's kustomization.yaml)
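To illustrate the components: mechanism, an app's kustomization.yaml could look roughly like the sketch below. The namespace, file names, and the component path (common-labels) are hypothetical; only the use of a components: list pointing into k8s/components/<cluster>/ reflects the layout described here.

```yaml
# Sketch only: every name and path below is a placeholder.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: example-app
resources:
  - deployment.yaml
  - service.yaml
components:
  # Shared, reusable pieces live outside the app directory
  - ../../../components/talos/common-labels
```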
If any resource references a SOPS-encrypted Secret, Flux decrypts it inline using the in-cluster age key (see SOPS). Decryption happens server-side in the controller — the plaintext never leaves the cluster.
The controller diffs the rendered manifests against the live cluster state and applies the difference using server-side apply. With prune: true set on the Kustomization (flux bootstrap sets it), resources that disappeared from the rendered set are deleted as well, so removing a manifest deletes the resource. That's the GitOps part.
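The render, decrypt, and prune behavior described in this step is all driven by one Flux Kustomization object. A minimal sketch follows, assuming the cluster directory is k8s/clusters/talos (as in the CI step) and that the age key lives in a Secret named sops-age; the Secret name and the 10m interval are assumptions.

```yaml
# Sketch only: secretRef.name and interval are assumptions.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./k8s/clusters/talos
  prune: true               # delete resources that vanish from the rendered set
  decryption:
    provider: sops
    secretRef:
      name: sops-age        # Secret whose key ends in .agekey
```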
Step 6 — Kubernetes rolls the workload
Once the manifest hits the API server, normal Kubernetes semantics take over. For a typical app upgrade:
- A new ReplicaSet is created with the new image digest.
- The Deployment's RollingUpdate strategy (maxSurge: 25%, maxUnavailable: 25% by default) spins up new pods alongside old ones.
- Longhorn PVCs reattach to the new pods. PVCs are ReadWriteOnce, so for an app that mounts a volume the old pod has to terminate before the new one can mount it; the rollout accepts that small downtime per pod.
- CNPG Cluster resources are not touched by app upgrades; only changes to the Cluster itself or image bumps to the operator trigger rolling restarts of the database pods. Schema migrations are an application concern, run by the app itself on startup.
- Envoy Gateway gradually drains in-flight requests off old endpoints; the routing graph stays consistent throughout.
- Cilium network policies move with the labels, so policy-by-identity survives the rollout.
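The ReadWriteOnce point deserves a concrete shape. One common way to guarantee that the old pod releases the volume before the new pod mounts it is to use the Recreate strategy for single-replica apps with a PVC. The manifest below is a sketch under that assumption; the app name, image, claim name, and probe path are hypothetical, and the digest is deliberately left as a placeholder.

```yaml
# Sketch only: names, image, and paths are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 1
  strategy:
    type: Recreate          # old pod stops first, so the RWO volume is free to reattach
  selector:
    matchLabels:
      app.kubernetes.io/name: example-app
  template:
    metadata:
      labels:
        app.kubernetes.io/name: example-app
    spec:
      containers:
        - name: app
          image: registry.example.com/example-app@sha256:<digest>
          readinessProbe:   # traffic is only routed once this passes
            httpGet:
              path: /-/healthy
              port: 8080
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: example-app-data
```

Stateless apps without a volume can keep the default RollingUpdate strategy and avoid the per-pod downtime entirely.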
The whole rollout for a typical app is on the order of 30–90 seconds, dominated by image pull (mitigated by Spegel's in-cluster registry mirror) and readiness gating.
Step 7 — Prometheus scrapes the new pod
Monitoring (VictoriaMetrics + Grafana via the talos-cluster monitoring overlay) auto-discovers the new pod via the existing ServiceMonitor + PodMonitor resources. No scrape configuration changes are needed; the operator generates the scrape config from CRDs that were already in place.
A bad upgrade shows up here first — error rate, p99 latency, restart loop counter — before the user notices.
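Discovery works without any changes because a ServiceMonitor selects by label rather than by pod name. A minimal sketch, assuming a hypothetical app labeled app.kubernetes.io/name: example-app whose Service exposes a port named http:

```yaml
# Sketch only: names, labels, port, and path are placeholders.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: example-app
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: example-app
  endpoints:
    - port: http            # named port on the Service
      path: /metrics
      interval: 30s
```

As long as the new pods carry the same labels as the old ones, they become scrape targets as soon as they appear behind the Service.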
Step 8 — External health checks
Gatus pings the public URL of every app on a 30-second interval. After a deploy, two checks must pass before Gatus reports the app as healthy:
- The app's /-/healthy (or equivalent) endpoint returns 200.
- The TLS certificate served by Envoy Gateway is valid and chains correctly.
If either fails for two consecutive intervals, an alert fires via ntfy.
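In Gatus configuration terms, the checks above could be expressed roughly as follows. The hostnames, ntfy topic, and the 48h expiry margin are assumptions made for illustration. Chain validation is not a separate condition here: Gatus's HTTP client verifies the certificate chain unless client.insecure is enabled, so a broken chain fails the request itself.

```yaml
# Sketch only: URLs, topic, and the 48h margin are placeholders.
alerting:
  ntfy:
    url: "https://ntfy.example.com"
    topic: "homelab-alerts"

endpoints:
  - name: example-app
    url: "https://app.example.com/-/healthy"
    interval: 30s
    conditions:
      - "[STATUS] == 200"
      - "[CERTIFICATE_EXPIRATION] > 48h"
    alerts:
      - type: ntfy
        failure-threshold: 2    # two consecutive failed intervals
        success-threshold: 1
        send-on-resolved: true
```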
Step 9 — Backups don't change
k8up snapshots PVCs to a Restic repository on its own Schedule, independent of the GitOps loop. A bad deploy doesn't compromise yesterday's snapshot; a successful deploy doesn't trigger a new one.
That separation is intentional: it would be very tempting to "snapshot before deploy" but it ties two unrelated reliability systems together. If the deploy is broken in a way that corrupts data on first write, you want to roll back with the previous scheduled snapshot, not a snapshot taken at the moment of breakage.
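For reference, a k8up Schedule is driven purely by cron expressions, with nothing that hooks into Flux or a rollout. The sketch below assumes an S3-compatible Restic backend; the endpoint, bucket, Secret names, cron times, and retention counts are all hypothetical.

```yaml
# Sketch only: backend details, times, and retention are placeholders.
apiVersion: k8up.io/v1
kind: Schedule
metadata:
  name: nightly
  namespace: example-app
spec:
  backend:
    repoPasswordSecretRef:
      name: restic-repo
      key: password
    s3:
      endpoint: https://s3.example.com
      bucket: k8up-example-app
      accessKeyIDSecretRef:
        name: s3-credentials
        key: access-key-id
      secretAccessKeySecretRef:
        name: s3-credentials
        key: secret-access-key
  backup:
    schedule: "30 2 * * *"      # nightly, regardless of deploys
  prune:
    schedule: "0 4 * * 0"
    retention:
      keepDaily: 7
      keepWeekly: 4
```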
What can break, and where to look
| Symptom | Most likely cause | Where to look first |
|---|---|---|
| Renovate stops opening PRs | gitea-runner is down or rate-limited | apps/gitea-runner logs; gitea API status |
| PR fails CI on every project | kustomize-build error in a shared component | last commit to components/ |
| flux get all -A shows False on kustomization | SOPS decrypt failed (key rotated?), or a manifest references a missing CRD | flux logs --kind=Kustomization |
| flux get all -A shows False on gitrepository | SSH deploy key revoked, or Gitea down | the deploy key on the Flux page |
| Pods stuck Pending | PVC pending — Longhorn replica scheduling issue | platform/longhorn |
| TLS handshake fails after deploy | cert-manager issued a fresh cert that hasn't propagated | platform/cert-manager |
| 5xx after deploy, healthy in cluster | Edge → production routing wedge | topics/envoy-gateway-proxy-protocol-v2, then fabric/netbird |
| Backups stale | k8up Schedule paused or repository password rotated | operations/ |
Why this is a topic, not a runbook
A runbook tells you what to do when X breaks. This page tells you what normally happens, end-to-end, so when something breaks the diagnosis is "where in this loop did it stop?". Operations-side runbooks for the individual failure modes live in operations/.