“The best backup strategy is the one you can actually verify and restore from.”
Why Move from Restic to Kopia?
My original Volsync setup used Restic to back up PVCs directly to Backblaze B2. It worked, but had some pain points:
- No visibility: Restic repositories are opaque. You can’t browse them without CLI tools.
- Slow restores: Every restore required downloading from S3, which is slow and costs egress fees.
- No deduplication across apps: Each app had its own Restic repository with no shared deduplication.
Kopia solves all of these:
- Web UI: Kopia has a built-in web interface to browse snapshots, verify integrity, and trigger restores.
- Local NFS repository: Backups go to NFS first (fast restores), then sync to cloud storage.
- Global deduplication: A single Kopia repository deduplicates across all PVCs.
The pattern I’m following comes from the home-operations community—specifically Devin (onedr0p), Jory (joryirving), and Kashall’s homelab repos.
Architecture Overview
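At a high level, the pieces fit together roughly like this (a sketch of the final layout described in the rest of this post):

```text
app PVC (Ceph block)
  └─ Volsync mover pods (Kopia)
       ├─ hourly ─> NFS repository on citadel.internal  <─ Kopia server (Web UI)
       ├─ daily ──> Backblaze B2 (S3-compatible)
       └─ daily ──> Cloudflare R2 (S3-compatible)
```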
The key insight: Volsync mover pods need access to the NFS share where Kopia stores its repository. Instead of configuring NFS mounts in every ReplicationSource, we use MutatingAdmissionPolicy to automatically inject the NFS volume into any pod with specific labels.
Prerequisites
Before starting, I needed:
- NFS share on my NAS (citadel.internal) at `/mnt/storage0/backups/VolsyncKopia`
- 1Password item named `kopia` with a `KOPIA_PASSWORD` field
- Kubernetes 1.33+ for MutatingAdmissionPolicy support
Step 1: Enable MutatingAdmissionPolicy Feature Gate
MutatingAdmissionPolicy is an alpha feature in Kubernetes 1.33. To enable it on Talos, I added a control plane machine config patch:
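A minimal sketch of that patch, assuming the usual Talos control plane config layout (the file name and where you wire it into your config generation are up to you):

```yaml
# talos/patches/controlplane/mutating-admission-policy.yaml (hypothetical path)
cluster:
  apiServer:
    extraArgs:
      feature-gates: MutatingAdmissionPolicy=true
      # v1alpha1 on Kubernetes 1.33 -- see the gotcha below
      runtime-config: admissionregistration.k8s.io/v1alpha1=true
```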
The v1beta1 vs v1alpha1 Gotcha
My first attempt used v1beta1 because that’s what the documentation suggested. Wrong. Kubernetes 1.33 only supports v1alpha1—v1beta1 arrives in Kubernetes 1.34.
After applying the feature gate and seeing the API still wasn’t available, I had to:
- Change `runtime-config` from `v1beta1` to `v1alpha1`
- Update all MutatingAdmissionPolicy manifests from `v1beta1` to `v1alpha1`
- Reapply the Talos config to all three nodes
Lesson learned: Always verify API versions before implementing:
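For example:

```bash
# list the admissionregistration API versions the cluster actually serves
kubectl api-versions | grep admissionregistration.k8s.io

# confirm the MutatingAdmissionPolicy resource is available
kubectl api-resources | grep -i mutatingadmissionpolic
```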
Rolling Out Talos Changes Safely
I applied the changes one node at a time to minimize risk:
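Something along these lines; node addresses and file names are placeholders for my actual values:

```bash
# apply the updated machine config to one control plane node at a time
talosctl apply-config --nodes 10.0.0.10 --file clusterconfig/node-1.yaml

# wait for the kube-apiserver static pod on that node to come back healthy
kubectl -n kube-system get pods | grep kube-apiserver

# then repeat for the remaining nodes
```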
Step 2: Create the MutatingAdmissionPolicy
The policy automatically injects an NFS volume into Volsync mover pods:
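A trimmed sketch of the policy. The ApplyConfiguration expression below only shows the volume injection; the full policy also adds the matching volumeMount to every container (which needs a JSONPatch mutation). The NFS server and path are the ones from my prerequisites:

```yaml
apiVersion: admissionregistration.k8s.io/v1alpha1
kind: MutatingAdmissionPolicy
metadata:
  name: volsync-mover-nfs
spec:
  failurePolicy: Fail
  reinvocationPolicy: IfNeeded
  matchConstraints:
    resourceRules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
  matchConditions:
    - name: is-volsync-mover
      expression: >-
        has(object.metadata.labels) &&
        "volsync.backube/mover" in object.metadata.labels
  mutations:
    - patchType: ApplyConfiguration
      applyConfiguration:
        expression: >-
          Object{
            spec: Object.spec{
              volumes: [
                Object.spec.volumes{
                  name: "repository",
                  nfs: Object.spec.volumes.nfs{
                    server: "citadel.internal",
                    path: "/mnt/storage0/backups/VolsyncKopia"
                  }
                }
              ]
            }
          }
---
apiVersion: admissionregistration.k8s.io/v1alpha1
kind: MutatingAdmissionPolicyBinding
metadata:
  name: volsync-mover-nfs
spec:
  policyName: volsync-mover-nfs
```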
This policy:
- Matches any pod with the label `volsync.backube/mover`
- Injects an NFS volume pointing to the Kopia repository
- Mounts it at `/repository` in all containers
I also added a jitter policy to prevent all backup jobs from running simultaneously.
Step 3: Deploy the Kopia Server
The Kopia server provides a Web UI for browsing and managing backups:
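The deployment itself is a bjw-s app-template HelmRelease; the interesting part is the container's startup script. A sketch of roughly what mine converged on after the fixes below (auth flags omitted):

```bash
#!/bin/sh
# KOPIA_PASSWORD is injected from the 1Password-backed secret and read by Kopia automatically
export HOME=/tmp
export USER=kopia

# connect to the existing repository on the NFS mount, or create it on first run
kopia repository connect filesystem --path=/repository ||
  kopia repository create filesystem --path=/repository

# serve the Web UI; CSRF checks are disabled because it sits behind the internal gateway
exec kopia server start \
  --insecure \
  --address=0.0.0.0:51515 \
  --disable-csrf-token-checks
```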
Mistakes I Made
1. Repository initialization: My first deployment crashed because the NFS path was empty—no Kopia repository existed. The startup script now auto-initializes if needed.
2. KOPIA_PASSWORD handling: I initially tried passing --password as a flag, which expects interactive input. The fix: rely on the KOPIA_PASSWORD environment variable being read automatically.
3. HOME and USER environment variables: The non-root container couldn’t determine the current user. Adding export HOME=/tmp and export USER=kopia fixed the permission errors.
4. ConfigMap naming: app-template creates ConfigMaps using the release name (kopia), not a custom suffix. I had to change from name: kopia-config to identifier: config to reference the chart-defined ConfigMap correctly.
5. CSRF token errors behind reverse proxy: When accessing the Kopia Web UI through Gateway API, I got “invalid CSRF token” errors flooding the logs. The fix: add --disable-csrf-token-checks to the server start command. This is safe for internal services behind a reverse proxy.
6. Gateway naming conventions: My first attempt used envoy-internal as the gateway name. Wrong—the gateways are just named internal and external in the network namespace. Also forgot the sectionName: https.
7. Missing DNS annotation: Routes need internal-dns.alpha.kubernetes.io/target: internal.${SECRET_DOMAIN} for internal DNS to create records. Without this, the hostname doesn’t resolve.
8. Hardcoded values: Used Pacific/Auckland instead of ${TIMEZONE} and kopia.${SECRET_DOMAIN} instead of "{{ .Release.Name }}.${SECRET_DOMAIN}". These should use variables for consistency.
9. Wrong UID/GID: Initially used 1000 for the security context. The standard in my cluster is 568 for the apps user/group. This matters for NFS share permissions.
Step 4: Create the Volsync Components
To avoid repeating the same configuration across every app, I created reusable Kustomize components. But I went further than just local NFS—I wanted a proper 3-2-1 backup strategy:
- 3 copies of data (local PVC + NFS + cloud)
- At least 2 different storage types (Ceph block + NFS + S3)
- 1 offsite copy (cloud)
The Multi-Destination Architecture
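The component layout in the repo (names match the summary table at the end of this post):

```text
components/volsync/
├── kustomization.yaml   # root component: pulls in all three destinations
├── nfs-truenas/         # primary: PVC, ReplicationDestination, hourly ReplicationSource
├── s3-backblaze/        # disaster recovery: daily ReplicationSource to B2
└── s3-cloudflare/       # disaster recovery: daily ReplicationSource to R2
```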
Key insight: The cloud components (B2/R2) don’t need PVC or ReplicationDestination. Restores happen from local NFS first (faster). Cloud backups are for disaster recovery only.
The Root Component
The root kustomization.yaml includes all three destinations:
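Roughly:

```yaml
# components/volsync/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1alpha1
kind: Component
components:
  - ./nfs-truenas
  - ./s3-backblaze
  - ./s3-cloudflare
```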
This means most apps just need:
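In the app's Flux Kustomization (the relative path depends on where the ks.yaml lives in your repo):

```yaml
spec:
  components:
    - ../../../../components/volsync
```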
If you only want specific destinations, reference them directly:
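For example, NFS only:

```yaml
spec:
  components:
    - ../../../../components/volsync/nfs-truenas
```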
The NFS Component (Primary)
The NFS component has the PVC and ReplicationDestination for restores:
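A sketch of the two resources, using the substitution variables from my ks.yaml and my cluster's storage class names. The kopia mover's fields mirror the upstream Restic mover's layout, so treat the exact schema as an assumption and check the perfectra1n fork's docs:

```yaml
apiVersion: volsync.backube/v1alpha1
kind: ReplicationDestination
metadata:
  name: "${APP}-dst"
spec:
  trigger:
    manual: restore-once
  kopia:
    # secret with KOPIA_REPOSITORY=filesystem:///repository and KOPIA_PASSWORD
    repository: "${APP}-volsync-nfs-secret"
    copyMethod: Snapshot
    volumeSnapshotClassName: csi-ceph-block
    storageClassName: ceph-block
    accessModes: ["ReadWriteOnce"]
    capacity: "${VOLSYNC_CAPACITY}"
    moverSecurityContext:
      runAsUser: ${VOLSYNC_UID:-568}
      runAsGroup: ${VOLSYNC_GID:-568}
      fsGroup: ${VOLSYNC_GID:-568}
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: "${APP}"
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: ceph-block
  dataSourceRef:
    apiGroup: volsync.backube
    kind: ReplicationDestination
    name: "${APP}-dst"
  resources:
    requests:
      storage: "${VOLSYNC_CAPACITY}"
```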
The ReplicationSource backs up hourly:
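Again a sketch under the same assumptions; the retention numbers and cache size are illustrative:

```yaml
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: "${APP}-nfs"
spec:
  sourcePVC: "${APP}"
  trigger:
    schedule: "0 * * * *"   # hourly
  kopia:
    repository: "${APP}-volsync-nfs-secret"
    copyMethod: Snapshot
    volumeSnapshotClassName: csi-ceph-block
    cacheCapacity: 2Gi
    retain:
      hourly: 24
      daily: 7
    moverSecurityContext:
      runAsUser: ${VOLSYNC_UID:-568}
      runAsGroup: ${VOLSYNC_GID:-568}
      fsGroup: ${VOLSYNC_GID:-568}
```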
The Cloud Components (Disaster Recovery)
The Backblaze B2 component uses Kopia’s S3-compatible backend:
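The repository credentials come from an ExternalSecret. Bucket name, endpoint, and the 1Password field names below are placeholders for my real values, and I'm assuming the B2 credentials live on the same `kopia` item:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: "${APP}-volsync-b2"
spec:
  secretStoreRef:
    kind: ClusterSecretStore
    name: onepassword-connect
  target:
    name: "${APP}-volsync-b2-secret"
    template:
      data:
        KOPIA_PASSWORD: "{{ .KOPIA_PASSWORD }}"
        KOPIA_REPOSITORY: "s3://example-b2-bucket/volsync/${APP}/"
        KOPIA_S3_BUCKET: "example-b2-bucket"
        KOPIA_S3_ENDPOINT: "s3.us-west-004.backblazeb2.com"
        AWS_ACCESS_KEY_ID: "{{ .B2_APPLICATION_KEY_ID }}"
        AWS_SECRET_ACCESS_KEY: "{{ .B2_APPLICATION_KEY }}"
  dataFrom:
    - extract:
        key: kopia
```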
The ReplicationSource backs up daily and keeps 14 days:
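And the daily source; the schedule time is arbitrary and the kopia fields carry the same caveat as above:

```yaml
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: "${APP}-b2"
spec:
  sourcePVC: "${APP}"
  trigger:
    schedule: "0 3 * * *"   # daily
  kopia:
    repository: "${APP}-volsync-b2-secret"
    copyMethod: Snapshot
    volumeSnapshotClassName: csi-ceph-block
    retain:
      daily: 14
```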
Cloudflare R2 follows the same pattern, with the endpoint constructed from the account ID:
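Only the secret template really changes; the account-ID field and bucket name here are placeholders:

```yaml
KOPIA_S3_ENDPOINT: "{{ .CLOUDFLARE_ACCOUNT_ID }}.r2.cloudflarestorage.com"
KOPIA_S3_BUCKET: "example-r2-bucket"
KOPIA_REPOSITORY: "s3://example-r2-bucket/volsync/${APP}/"
```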
Why Kopia for Cloud Too?
You might wonder why I didn’t stick with Restic for cloud backups. The answer: Restic lock issues. Restic repositories can get stuck with stale locks, requiring manual intervention with restic unlock. Kopia handles concurrent access better and doesn’t have this problem.
The perfectra1n fork of Volsync (ghcr.io/perfectra1n/volsync) supports Kopia’s S3 backend via environment variables:
- `KOPIA_S3_BUCKET` - bucket name
- `KOPIA_S3_ENDPOINT` - S3-compatible endpoint
- `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` - credentials
- `KOPIA_REPOSITORY` - full repository URL (s3://bucket/path/)
Another Gotcha: ClusterSecretStore Name
I assumed the ClusterSecretStore was named onepassword. It’s actually onepassword-connect. Always verify existing resource names:
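For example:

```bash
kubectl get clustersecretstore
```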
Step 5: Migrate an App (The Hard Way)
Migrating an existing app with data turned out to be more complex than expected. The Kopia volsync component expects PVCs named ${APP} (e.g., romm), but my existing app used romm-data. Here’s the approach that worked:
The Problem: PVC Name Mismatch
My romm app used a PVC named romm-data, but the volsync component creates resources expecting PVC name ${APP} (romm). I tried several approaches that failed:
- Using a VOLSYNC_CLAIM variable - The component's PVC template still created a conflicting `romm` PVC
- Patching the dataSourceRef - PVC specs are immutable after creation
- Snapshotting a terminating PVC - Can't add finalizers to a PVC marked for deletion
The Approach That Worked
Step 1: Keep existing backups running
Don’t switch to Kopia immediately. Keep the old Restic-based volsync template running so you have cloud backups.
Step 2: Rename the PVC via snapshot
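A sketch of the two objects involved; the namespace, storage class, and size are my values and will differ for you:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: romm-data-migration
  namespace: games
spec:
  volumeSnapshotClassName: csi-ceph-block
  source:
    persistentVolumeClaimName: romm-data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: romm
  namespace: games
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: ceph-block
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: romm-data-migration
  resources:
    requests:
      storage: 20Gi
```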
Step 3: Update HelmRelease to use new PVC name
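In the app-template values, point the existing persistence entry at the new claim (key names follow my values layout):

```yaml
persistence:
  data:
    existingClaim: romm   # previously: romm-data
```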
Step 4: Delete old PVC and resume
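Once the app is healthy on the new claim (names are placeholders for mine):

```bash
kubectl -n games delete pvc romm-data
kubectl -n games delete volumesnapshot romm-data-migration
flux reconcile kustomization romm
```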
The Gotcha: Conflicting dataSourceRef
After the PVC migration, Flux complained about a dry-run failure. The manually-created PVC had dataSource: VolumeSnapshot, but the volsync template wanted dataSourceRef: ReplicationDestination. These are immutable.
The fix: delete the PVC and let the volsync template recreate it from a cloud restore:
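Roughly:

```bash
kubectl -n games delete pvc romm
# Flux recreates the PVC from the volsync component; its dataSourceRef points at the
# ReplicationDestination, which restores the data from the existing repository
flux reconcile kustomization romm --with-source
```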
This works because we kept the Restic backups running throughout the migration.
Lessons Learned
This migration surfaced several bad assumptions:
| Assumption | Reality | Impact |
|---|---|---|
| MutatingAdmissionPolicy feature gate “just works” | Requires Talos patch for apiServer extraArgs + runtime-config | Had to create patch, regenerate configs, roll out to all nodes |
| K8s 1.33 uses MutatingAdmissionPolicy v1beta1 | Uses v1alpha1 (v1beta1 is K8s 1.34+) | API server crash, had to fix and reapply |
| ClusterSecretStore named `onepassword` | Named `onepassword-connect` | ExternalSecrets failed to sync |
| app-template creates ConfigMap as `kopia-config` | Creates as `kopia` | Pod stuck in ContainerCreating |
| Kopia repository pre-exists | NFS path was empty | Kopia server crashed on startup |
| Reference patterns from docs were tested | They were aspirational | Multiple fixes needed |
| Can rename PVC by creating new one from snapshot | Works, but dataSource is immutable | Had to delete and restore from cloud |
| PVC dataSourceRef can be patched | PVC spec is immutable after creation | Kustomization dry-run failures |
| VolumeSnapshotClass named `csi-ceph-blockpool` | Named `csi-ceph-block` | Snapshots failed to create |
| Gateway named `envoy-internal` | Named `internal` (in network namespace) | HTTPRoute not attached to gateway |
| Routes auto-create DNS records | Need `internal-dns.alpha.kubernetes.io/target` annotation | Hostname didn’t resolve |
| Kopia Web UI works behind reverse proxy | CSRF token validation fails | Had to add `--disable-csrf-token-checks` |
| Default UID 1000 is fine | Should use 568 to match NFS share permissions | Permission issues on NFS |
| Can use hardcoded timezone | Should use `${TIMEZONE}` variable | Inconsistent with cluster conventions |
Verification Commands
Before implementing, always check:
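These are the checks that would have saved me most of the table above:

```bash
# which admissionregistration API versions does this cluster serve?
kubectl api-versions | grep admissionregistration.k8s.io

# what are existing resources actually called?
kubectl get clustersecretstore
kubectl get volumesnapshotclass
kubectl get gateway -n network
kubectl get storageclass
```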
Step 6: The Successful Migration
After fixing all the issues, migrating romm to Kopia was straightforward:
1. Update the Kustomization to use the component:
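In the app's ks.yaml (the path is relative to my repo layout):

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: romm
  namespace: flux-system
spec:
  # ...existing spec...
  components:
    - ../../../../components/volsync
```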
2. Set the required variables in ks.yaml:
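Under postBuild substitution (the capacity value is illustrative):

```yaml
postBuild:
  substitute:
    APP: romm
    VOLSYNC_CAPACITY: 20Gi
```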
That’s it! The VOLSYNC_UID and VOLSYNC_GID default to 568, which matches most apps. You only need to specify them if your app uses a different UID/GID:
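For example, for a hypothetical app that runs as 1000:

```yaml
postBuild:
  substitute:
    APP: example-app
    VOLSYNC_CAPACITY: 5Gi
    VOLSYNC_UID: "1000"
    VOLSYNC_GID: "1000"
```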
3. Commit, push, and reconcile:
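Something like:

```bash
git add -A
git commit -m "feat(romm): migrate volsync backups to kopia"
git push
flux reconcile source git flux-system
flux reconcile kustomization romm
```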
4. Verify the backups:
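A couple of quick checks (namespace and names follow my setup):

```bash
kubectl -n games get replicationsources

# the mover pods carry the volsync.backube/mover label, so their logs are easy to find
kubectl -n games logs -l volsync.backube/mover --tail=50
```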
The local NFS backup completed in 3 seconds because Kopia’s deduplication recognized the existing data. Cloud backups take longer but run daily for disaster recovery.
Understanding Kopia’s Storage
When I first looked at the NFS share after the migration, I was confused:
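Listing the share shows only Kopia's internal layout, roughly:

```bash
ls /mnt/storage0/backups/VolsyncKopia
# sharded blob directories (p*, q*, s*, ...) and Kopia repository metadata files;
# no per-app folders anywhere
```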
Where’s the romm folder? This is content-addressable storage - Kopia doesn’t store data by source name. Instead:
- Data is deduplicated and compressed into pack blobs (the `p*`, `q*`, `s*` folders)
- Blob names are based on content hashes, not source names
- All apps share the same deduplication pool
- To see the logical structure, use `kopia snapshot list --all` or the Web UI
This means if romm and another app have identical files, they’re only stored once. The tradeoff is you can’t browse the repository directly on the NAS - you need Kopia tools.
Important: Never manually delete files from the repository. Kopia uses garbage collection during maintenance to clean up unreferenced blobs safely.
What’s Next
The Kopia infrastructure is deployed and working. Romm is now successfully backing up to all three destinations:
- NFS (hourly) - Fast local restores
- Backblaze B2 (daily) - Off-site disaster recovery
- Cloudflare R2 (daily) - Additional cloud redundancy
The next steps:
- romm (games) - Migrated to Kopia. Done!
- Downloads namespace - qbittorrent, radarr, sonarr, etc.
- Entertainment namespace - plex, jellyfin, tautulli
- Home automation - home-assistant, zigbee2mqtt
The key lesson: keep existing backups running during migration. Don’t switch to the new backup system until you’ve verified the PVC naming is correct and the app is stable. Having cloud backups as a safety net saved me from data loss multiple times during this migration.
Summary
| Component | Purpose |
|---|---|
| MutatingAdmissionPolicy | Auto-inject NFS volume into Volsync mover pods |
| Kopia server | Web UI for browsing/managing NFS backups |
| `components/volsync` | Root component that includes all 3 destinations |
| `components/volsync/nfs-truenas` | Primary backup (hourly) with restore capability |
| `components/volsync/s3-backblaze` | Disaster recovery to B2 (daily) |
| `components/volsync/s3-cloudflare` | Disaster recovery to R2 (daily) |
The migration from Restic to Kopia took longer than expected due to API version mismatches and incorrect assumptions about resource names. But the end result—a 3-2-1 backup strategy with local NFS for fast restores and dual cloud destinations for disaster recovery—is worth the effort. No more Restic lock issues!
This post documents part of the ongoing work on my home-ops repository. The patterns here are adapted from the excellent home-operations community repos.