“You can’t skip Ceph versions. But you can be wrong about Rook constraints.”
The Situation
Reef (v18) went end-of-life in August 2025. Tentacle (v20) shipped in November 2025. Time to upgrade.
I initially wrote a combined “Reef to Tentacle” upgrade guide, planning to do both hops in sequence. Then I read the Rook compatibility matrix and concluded I needed to wait for Rook v1.19 to get Tentacle support.
I was wrong.
The Rook Constraint (Corrected)
Here’s what I originally thought:
| Rook Version | Supported Ceph Versions |
|---|---|
| v1.17.x | Reef only |
| v1.18.x | Reef + Squid |
| v1.19+ | Squid + Tentacle (drops Reef) |
The reality: Rook v1.18.8 added Tentacle support. You don’t need to wait for v1.19.
From this blog post:
“If you’re running Ceph within Kubernetes using Rook Ceph, and you want to use Tentacle without the unsupported flag, you need to update at least to version v1.18.8.”
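In my cluster the Rook operator is also managed by Flux, so getting onto v1.18.8 was its own one-line bump. Below is a minimal sketch of what that HelmRelease can look like, assuming the operator chart's standard `image.repository`/`image.tag` values and a file path that mirrors the cluster HelmRelease; the path and the HelmRepository name here are assumptions, not my exact repo.

```yaml
# kubernetes/apps/rook-ceph/rook-ceph/operator/helmrelease.yaml (hypothetical path)
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: rook-ceph-operator
  namespace: rook-ceph
spec:
  interval: 1h
  chart:
    spec:
      chart: rook-ceph          # the operator chart
      version: v1.18.8          # first release with Tentacle support
      sourceRef:
        kind: HelmRepository
        name: rook-release      # assumed HelmRepository name
        namespace: flux-system
  values:
    image:
      repository: ghcr.io/rook/ceph
      tag: v1.18.8
```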
So the actual support matrix is:
| Rook Version | Supported Ceph Versions |
|---|---|
| v1.18.0 - v1.18.7 | Reef + Squid |
| v1.18.8+ | Reef + Squid + Tentacle |
The upgrade path still requires going through Squid—you can’t skip versions—but you can do both hops on Rook v1.18.8.
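Before each hop it's worth confirming that every daemon really is on the release you think it is. Here's a small sketch you could wrap around `ceph versions` for that check; it assumes `jq` is available on the machine running kubectl, and the script name is made up:

```bash
#!/usr/bin/env bash
# check-release.sh: verify all Ceph daemons report the expected release
# before starting the next hop. Usage: ./check-release.sh squid
set -euo pipefail

EXPECTED="${1:?usage: $0 <release-name, e.g. squid or tentacle>}"

# `ceph versions` prints JSON with an "overall" map of version-string -> daemon count
OTHERS=$(kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph versions \
  | jq -r '.overall | keys[]' \
  | grep -vc "$EXPECTED" || true)

if [ "$OTHERS" -ne 0 ]; then
  echo "Some daemons are not on $EXPECTED yet; do not start the next hop."
  exit 1
fi
echo "All daemons report $EXPECTED."
```

Running it with `squid` after Phase 1 (and `tentacle` after Phase 2) makes a quick gate before moving on.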
My Starting Point
```bash
$ kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph version
ceph version 18.2.7 (2cf3b0098dc3cbb1b6f2e8d8ed9df8c65b6aee53) reef (stable)

$ kubectl -n rook-ceph get deploy rook-ceph-operator -o jsonpath='{.spec.template.spec.containers[0].image}'
ghcr.io/rook/ceph:v1.18.8
```
Reef v18.2.7 on Rook v1.18.8. Good to go for the full journey.
Phase 1: Reef to Squid
Step 1: Backup Everything
Ceph upgrades are one-way. Once you run require-osd-release squid, there’s no going back to Reef.
```bash
BACKUP_DIR=~/backups/ceph/migration-$(date +%Y%m%d)
mkdir -p $BACKUP_DIR
# Cluster state
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status > $BACKUP_DIR/ceph-status.txt
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd tree > $BACKUP_DIR/osd-tree.txt
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd crush dump > $BACKUP_DIR/crush-map.json
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph config dump > $BACKUP_DIR/config-dump.txt
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph versions > $BACKUP_DIR/versions.txt
# Kubernetes resources
kubectl -n rook-ceph get cephcluster -o yaml > $BACKUP_DIR/cephcluster.yaml
kubectl get pods -n rook-ceph -o wide > $BACKUP_DIR/pods.txt
kubectl get pvc -A | grep -E "ceph-block|ceph-filesystem" > $BACKUP_DIR/pvcs.txt
```
Step 2: Set Safety Flags
These prevent Ceph from marking OSDs as “out” and rebalancing data during the rolling restart:
```bash
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd set noout
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd set norebalance
# Verify
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd dump | grep flags
# flags noout,norebalance
```
Without these flags, Ceph marks any OSD that stays down past the down-out interval (10 minutes by default) as out and starts backfilling its data elsewhere. That slows down the upgrade and adds risk.
Step 3: Update the HelmRelease
The actual change is one line:
```yaml
# kubernetes/apps/rook-ceph/rook-ceph/cluster/helmrelease.yaml
cephClusterSpec:
  cephVersion:
    image: quay.io/ceph/ceph:v19.2.3-20250717 # Was v18.2.7
    allowUnsupported: false
```
Note the build-specific tag (v19.2.3-20250717). Don’t use just v19.2.3—the build suffix ensures you get a specific, tested image rather than whatever “latest v19.2.3” happens to be.
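If you want to go a step further than a build-specific tag, you can resolve the tag to an immutable digest before committing. A sketch using `skopeo` (any registry client works; having skopeo installed is an assumption, and the digest is a placeholder):

```bash
# Resolve the build tag to a content digest
skopeo inspect --format '{{.Digest}}' docker://quay.io/ceph/ceph:v19.2.3-20250717
# sha256:<digest printed here>

# Then pin by digest in the HelmRelease if you want full immutability:
#   image: quay.io/ceph/ceph:v19.2.3-20250717@sha256:<digest>
```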
Step 4: Deploy via GitOps
```bash
git add kubernetes/apps/rook-ceph/rook-ceph/cluster/helmrelease.yaml
git commit -m "feat(rook-ceph): upgrade Ceph from Reef v18.2.7 to Squid v19.2.3"
git push
```
Flux picks up the change and triggers Rook to start the rolling upgrade.
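If you don't feel like waiting for the next sync interval, you can nudge Flux and then follow the operator while it works. The kustomization name below is a guess based on my repo layout; substitute whatever owns the cluster HelmRelease in yours:

```bash
# Force an immediate sync instead of waiting for the interval
flux reconcile source git flux-system
flux reconcile kustomization rook-ceph-cluster --with-source   # name is repo-specific

# Follow the operator while it walks the daemons
kubectl -n rook-ceph logs deploy/rook-ceph-operator -f
```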
Step 5: Watch the Rolling Upgrade
Rook upgrades daemons in a specific order: MON → MGR → MDS → OSD → RGW.
```bash
# Terminal 1: Watch pods
kubectl -n rook-ceph get pods -w
# Terminal 2: Watch Ceph status
watch -n 10 'kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph -s'
# Periodically check versions
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph versions
```
During the upgrade, you’ll see brief HEALTH_WARN states as daemons restart. This is normal. Only worry if you see HEALTH_ERR persisting for more than 10 minutes.
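One way to make the "brief versus persistent" judgement easier is to timestamp health transitions instead of staring at a live watch. A throwaway sketch, nothing cluster-specific:

```bash
# Log cluster health every 30s with a timestamp; a HEALTH_WARN that clears
# within a couple of samples is fine, a HEALTH_ERR that sticks around is not.
while true; do
  STATUS=$(kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph health 2>/dev/null)
  echo "$(date +%H:%M:%S) ${STATUS}"
  sleep 30
done
```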
My 3-node cluster upgraded in about 3 minutes:
```json
{
  "mon": { "ceph version 19.2.3 (...) squid (stable)": 3 },
  "mgr": { "ceph version 19.2.3 (...) squid (stable)": 2 },
  "osd": { "ceph version 19.2.3 (...) squid (stable)": 3 },
  "mds": { "ceph version 19.2.3 (...) squid (stable)": 2 },
  "rgw": { "ceph version 19.2.3 (...) squid (stable)": 2 }
}
```
Step 6: Finalize the Squid Upgrade
This step is critical. It tells Ceph that all OSDs are now Squid-capable, enabling Squid-specific features:
```bash
# Check current state
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd dump | grep require_osd_release
# require_osd_release reef
# Set Squid requirement (NO GOING BACK AFTER THIS)
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd require-osd-release squid
# Verify
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd dump | grep require_osd_release
# require_osd_release squid
```
Step 7: Unset Safety Flags
```bash
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd unset noout
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd unset norebalance
```
Step 8: Verify Health
```bash
$ kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph health detail
HEALTH_OK

$ kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph pg stat
169 pgs: 169 active+clean
```
Phase 1 complete. Cluster is on Squid and healthy.
Phase 2: Squid to Tentacle
After running Squid stably for a while (in my case, a few hours—I was impatient), I proceeded with the Tentacle upgrade.
Tentacle Breaking Changes
Before upgrading, I checked the Tentacle release notes for breaking changes:
| Change | Impact on My Cluster |
|---|---|
| RGW tenant-level IAM deprecated | Not using tenant IAM. No impact. |
| `restful` mgr module removed | Not using the REST API module. No impact. |
| `zabbix` mgr module removed | Not using Zabbix. No impact. |
| Erasure coding default changed to ISA-L | Only affects new pools. Existing pools unchanged. |
| `osd_repair_during_recovery` option removed | Not in my config. No impact. |
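If you'd rather check your own cluster than trust memory, the two removed mgr modules and the removed OSD option can each be checked with one command; the grep patterns just match the items in the table above, and no output is the good answer:

```bash
# See whether the removed mgr modules appear in the module list; if either
# shows up as enabled, deal with it before the upgrade
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph mgr module ls | grep -Ei 'restful|zabbix'

# Removed config option: make sure it is not set anywhere
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph config dump | grep osd_repair_during_recovery
```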
Step 1: Backup Again
Same process, new directory:
```bash
BACKUP_DIR=~/backups/ceph/squid-to-tentacle-$(date +%Y%m%d)
mkdir -p $BACKUP_DIR
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status > $BACKUP_DIR/ceph-status.txt
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd tree > $BACKUP_DIR/osd-tree.txt
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd crush dump > $BACKUP_DIR/crush-map.json
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph config dump > $BACKUP_DIR/config-dump.txt
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph versions > $BACKUP_DIR/versions.txt
kubectl -n rook-ceph get cephcluster -o yaml > $BACKUP_DIR/cephcluster.yaml
kubectl get pods -n rook-ceph -o wide > $BACKUP_DIR/pods.txt
```
Step 2: Set Safety Flags
```bash
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd set noout
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd set norebalance
```
Step 3: Update HelmRelease to Tentacle
```yaml
# kubernetes/apps/rook-ceph/rook-ceph/cluster/helmrelease.yaml
cephClusterSpec:
  cephVersion:
    image: quay.io/ceph/ceph:v20.2.0-20251104 # Was v19.2.3-20250717
    allowUnsupported: false
```
Step 4: Deploy via GitOps
```bash
git add kubernetes/apps/rook-ceph/rook-ceph/cluster/helmrelease.yaml
git commit -m "feat(rook-ceph): upgrade Ceph from Squid v19.2.3 to Tentacle v20.2.0"
git push
```
Step 5: Watch the Rolling Upgrade
Same monitoring as before:
```bash
kubectl -n rook-ceph get pods -w
watch -n 10 'kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph -s'
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph versions
```
The upgrade took about 3-4 minutes. I watched the versions transition:
```
MONs: 2/3 Tentacle... 3/3 Tentacle ✓
MGRs: 0/2 Tentacle... 2/2 Tentacle ✓
MDS:  0/2 Tentacle... 2/2 Tentacle ✓
OSDs: 0/3 Tentacle... 1/3... 2/3... 3/3 Tentacle ✓
RGWs: 0/2 Tentacle... 1/2... 2/2 Tentacle ✓
```
Final state:
```json
{
  "mon": { "ceph version 20.2.0 (...) tentacle (stable)": 3 },
  "mgr": { "ceph version 20.2.0 (...) tentacle (stable)": 2 },
  "osd": { "ceph version 20.2.0 (...) tentacle (stable)": 3 },
  "mds": { "ceph version 20.2.0 (...) tentacle (stable)": 2 },
  "rgw": { "ceph version 20.2.0 (...) tentacle (stable)": 2 }
}
```
Step 6: Finalize the Tentacle Upgrade
Interestingly, this was already set automatically:
```bash
$ kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd dump | grep require_osd_release
require_osd_release tentacle
```
Rook must have set it after detecting all OSDs were on Tentacle. If yours isn’t set:
```bash
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd require-osd-release tentacle
```
Step 7: Unset Safety Flags
```bash
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd unset noout
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd unset norebalance
```
Step 8: Final Health Check
```bash
$ kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph health detail
HEALTH_OK

$ kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph -s
  cluster:
    id:     3b3d504b-96b4-4102-9ce3-c91b6bc2948d
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 6m)
    mgr: b(active, since 5m), standbys: a
    mds: 1/1 daemons up, 1 hot standby
    osd: 3 osds: 3 up (since 2m), 3 in (since 4w)
    rgw: 2 daemons active (2 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   12 pools, 169 pgs
    objects: 55.48k objects, 104 GiB
    usage:   305 GiB used, 4.9 TiB / 5.2 TiB avail
    pgs:     169 active+clean
```
Done. Reef → Squid → Tentacle complete.
One thing I expected to bite me during Reef → Squid: the Ceph toolbox is a separate deployment, so I assumed it wouldn't be upgraded along with the cluster. It turns out Rook does update it automatically now (at least in v1.18.8). If yours is still on an old version:
```bash
kubectl -n rook-ceph set image deploy/rook-ceph-tools rook-ceph-tools=quay.io/ceph/ceph:v20.2.0-20251104
```
Using mismatched toolbox and cluster versions can cause confusing behavior—the CLI tools might not understand newer cluster features.
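A quick way to confirm the two match, using nothing but the resources Rook already creates (the `.items[0]` indexing assumes a single CephCluster in the namespace):

```bash
# Image the cluster is declared to run
kubectl -n rook-ceph get cephcluster -o jsonpath='{.items[0].spec.cephVersion.image}'; echo

# Image the toolbox is actually running
kubectl -n rook-ceph get deploy rook-ceph-tools -o jsonpath='{.spec.template.spec.containers[0].image}'; echo
```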
Lessons Learned
| Assumption | Reality |
|---|---|
| Can upgrade Reef → Tentacle directly | Must go Reef → Squid → Tentacle, one version at a time |
| Need Rook v1.19 for Tentacle | Rook v1.18.8 already supports Tentacle |
| Toolbox needs a manual image bump | Rook updates it automatically now, but verify anyway |
| Generic version tags are fine | Use build-specific tags (e.g., v20.2.0-20251104) for reproducibility |
| `require_osd_release` needs manual setting | Rook set it automatically for Tentacle (but verify) |
Quick Reference
```bash
# Check current versions
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph version
kubectl -n rook-ceph get deploy rook-ceph-operator -o jsonpath='{.spec.template.spec.containers[0].image}'
# Set/unset safety flags
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd set noout
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd set norebalance
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd unset noout
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd unset norebalance
# Finalize upgrade (NO UNDO)
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd require-osd-release tentacle
# Check health
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph health detail
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph pg stat
```
References
The upgrade guides are in my home-ops repository under docs/Guides/Storage/.