Enterprise SSDs arrived, so I migrated a live Talos control plane onto them. First I had to fix the backups, then learn that swapping a boot disk on Talos isn't a swap at all — it's a rebuild. Plus the canary node that taught me five things I only half-believed.
The canary migration went perfectly, so I ran the same playbook on the last two nodes. They found five new ways to make me earn it — node-local data that vaporises on reinstall, an OSD that booted faster than its network, a password bug I'd only half-fixed, a restore that raced itself, and a serial number I wrongly swore I couldn't read.
What happens when you put consumer NVMe under an etcd + Ceph mon workload. Part 1 of 2.
Ditching Ollama for LocalAI, battling P2P federation that doesn't work in Kubernetes, and building a self-hosted AI stack with persistent memory.
A journey through TrueNAS, Oracle Cloud, and Hetzner before finally landing on AWS Graviton for running Android containers with acceptable latency from New Zealand.
Deploying Pterodactyl Panel on Kubernetes, with Wings running on TrueNAS, for self-hosted game server management.
How a CephFS sparse-file handling quirk silently corrupted my app configs during VolSync restores, and the multi-day recovery effort across qBittorrent, SABnzbd, Sonarr, Radarr, and File Browser using a mix of Kopia snapshots and old Restic backups.
A real-world walkthrough of upgrading Ceph from v18 (Reef) through v19 (Squid) to v20 (Tentacle) via GitOps, including where my assumptions about Rook version constraints turned out to be wrong.
BGP was supposed to fix my hairpin routing issues. It didn't. Here's how CoreDNS rewriting saved the day when pods couldn't reach LoadBalancer VIPs on the same node.
How I replaced Barman Cloud Plugin with pgBackRest to get true dual-destination full backups to both Backblaze B2 and Cloudflare R2, then migrated my entire PostgreSQL infrastructure to PostgreSQL 18.