How to remove Rook from a Kubernetes cluster

Brett Weir Mar 20, 2023 5 min read

I've been wanting to try Rook for a while. I've been wanting to deploy Ceph to learn how to manage a bare-metal storage infrastructure, and Rook is supposed to make this much easier to do in the context of Kubernetes.

Maybe it's my total lack of Ceph experience, but I've struggled to get Rook to do much of anything:

  • What the heck is a "mon" and why does it fail to start?

  • How many replicas and metadata servers do I need?

  • Why aren't my placement groups ever healthy?

  • Why isn't the "mgr" dashboard ever reachable for more than 5 minutes?

The learning process has many false starts, and it's easier to start fresh than to try to recover.

The Rook docs have a page on cleaning up Ceph, but it didn't really do the job for me.

So here's the process I came up with to factory reset my storage setup. These instructions will vary slightly depending on your specific setup.

Prerequisites

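The commands in this article rely on the following tools:

  • kubectl and helm, for talking to the cluster

  • jq, for editing resource JSON

  • ansible, for running commands on every node

  • The LVM tools and hdparm, installed on the nodes themselves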

Remove the Rook cluster resources

The Rook Ceph cluster must be uninstalled before the Rook operator can be removed, so we'll do that first:

# delete the cluster
helm uninstall rook-ceph-cluster

I'm not sure if the Rook operator is supposed to clean up these resources for me, but since I don't like waiting, I'll delete them myself:

# ensure Helm resources are deleted
kubectl delete cephobjectstore ceph-objectstore &
kubectl delete cephfilesystem ceph-filesystem &
kubectl delete cephblockpool ceph-blockpool &
kubectl delete cephcluster rook-ceph &

You may have noticed that I sent the previous commands to the background using &. This is because I happen to know that those commands will hang: every one of those resources has finalizers attached, and kubectl delete waits until the finalizers are cleared.

These finalizers never get removed, so I patch the finalizers into oblivion:

# remove finalizers from resources
kubectl get cephblockpools.ceph.rook.io ceph-blockpool -o json | jq '.metadata.finalizers = null' | kubectl apply -f -
kubectl get cephfilesystems.ceph.rook.io ceph-filesystem -o json | jq '.metadata.finalizers = null' | kubectl apply -f -
kubectl get cephobjectstores.ceph.rook.io ceph-objectstore -o json | jq '.metadata.finalizers = null' | kubectl apply -f -
kubectl get cephcluster.ceph.rook.io rook-ceph -o json | jq '.metadata.finalizers = null' | kubectl apply -f -
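
If you'd rather skip the get/edit/apply round trip, a merge patch should do the same job in one call. This is only a sketch: the strip_finalizers helper name is made up for this example, and it assumes the resources live in your current namespace, as in the commands above.

```shell
# Hypothetical helper: clear the finalizers on one resource with a merge patch.
# In a JSON merge patch, setting a key to null removes it from the object.
# Usage: strip_finalizers <kind> <name>
strip_finalizers() {
    kubectl patch "$1" "$2" --type merge -p '{"metadata":{"finalizers":null}}'
}

# Example calls (commented out so that sourcing this file changes nothing):
# strip_finalizers cephblockpool ceph-blockpool
# strip_finalizers cephcluster rook-ceph
# wait   # collect the backgrounded `kubectl delete` jobs
```

Once the finalizers are gone, the backgrounded delete commands should return on their own, and a bare wait in the same shell will block until they do.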

Delete the local cluster data

After deleting the cluster, data stored on each node at /var/lib/rook is left behind. It took me longer than I'd like to admit to figure this out. Because that data is not deleted, subsequent attempts to deploy a Rook cluster will lead to unpredictable results.

Thus, I'll delete the data manually on each node:

# remove local data
ansible all -i inventory.yml -b -a 'rm -rvf /var/lib/rook'
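
Since leftover data is what caused my unpredictable redeploys, it seems worth confirming the directory is really gone. Here's a small sketch of that check; the rook_data_status name is invented for this example, and the default path is the /var/lib/rook location mentioned above.

```shell
# Hypothetical helper: report whether a Rook data directory still exists.
# Returns 0 when the path is absent (clean), 1 when something is left behind.
# Usage: rook_data_status [path]   (defaults to /var/lib/rook)
rook_data_status() {
    dir="${1:-/var/lib/rook}"
    if [ -e "$dir" ]; then
        echo "leftover: $dir"
        return 1
    fi
    echo "clean: $dir"
}
```

The same idea works fleet-wide without the helper, for example with ansible all -i inventory.yml -b -a 'test ! -e /var/lib/rook', which should report a failure on any node where the directory survived.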

Remove the volumes

The cluster is gone, but my storage drives will never be re-provisioned as Ceph OSDs until I fully clear them out. I'll do so by doing the following:

# remove logical volumes
ansible all -i inventory.yml -b -m shell -a 'lvdisplay | grep LV\ Path | grep ceph- | awk \{print\ \$3\} | xargs lvremove -y'

# remove volume groups
ansible all -i inventory.yml -b -m shell -a 'vgdisplay | grep VG\ Name | grep ceph- | awk \{print\ \$3\} | xargs vgremove -y'

# remove physical volume labels (tries every PV on the node; pvremove refuses PVs still in a volume group)
ansible all -i inventory.yml -b -m shell -a 'pvdisplay | grep PV\ Name | awk \{print\ \$3\} | xargs pvremove -y'

The devices will now be available again when I re-deploy my Rook cluster.
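
The backslashes in those one-liners are only there to get the command through Ansible intact. Run directly on a node, the filter stage is an ordinary grep and awk pipeline. Here's a sketch of that stage as a function so it can be dry-run first; the ceph_lv_paths name is made up, and it assumes lvdisplay prints its usual "LV Path" lines.

```shell
# Hypothetical helper: read `lvdisplay` output on stdin and print the paths
# of logical volumes whose path contains "ceph-".
ceph_lv_paths() {
    grep 'LV Path' | grep 'ceph-' | awk '{print $3}'
}

# Dry run on a node (prints what would be removed, removes nothing):
# lvdisplay | ceph_lv_paths
```

Piping the result into xargs lvremove -y gives the same behavior as the Ansible command above, but looking at the list first is a cheap way to make sure nothing unrelated matches.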

Shred the volumes (optional)

Because I am extra paranoid, and the cluster has solid state drives, I want to run an ATA secure erase on each drive with hdparm. This finishes very quickly and returns the drives to a blank slate.

The drives show up as /dev/sda on every node, so I target them there:

# set the security pass for all /dev/sda drives
ansible all -i inventory.yml -b -a 'hdparm --user-master u --security-set-pass p /dev/sda'

# secure erase all /dev/sda drives using the new password
ansible all -i inventory.yml -b -a 'hdparm --user-master u --security-erase p /dev/sda'
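
One thing that can get in the way: a drive whose security state is "frozen" will reject the security commands. Before erasing, the state can be read from hdparm -I. The helper below is a sketch with an invented name, and it assumes the Security section prints "frozen" or "not frozen" on its own line, so check it against your own drive's output.

```shell
# Hypothetical helper: read `hdparm -I` output on stdin and succeed (exit 0)
# if the drive reports itself as frozen.
# Assumption: the line reads "frozen" when frozen and "not frozen" otherwise,
# so the first whitespace-separated field tells the two apart.
drive_is_frozen() {
    awk '$1 == "frozen" { found = 1 } END { exit !found }'
}

# Example on a node:
# hdparm -I /dev/sda | drive_is_frozen && echo "frozen: erase will be rejected"
```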

Remove the Rook operator

At least once, the Rook operator itself got into a state I could not comprehend. I don't remember the specifics now, though I'm sure it was user error. In any case, I'll delete it by doing the following:

# delete rook
helm uninstall rook-ceph

This deletes most resources, but not the CRDs. This is because the jury seems to be out in the Kubernetes ecosystem on what constitutes a safe way to manage CRDs, so now Helm just installs them, then promptly abandons them.

This is fine, I guess, but if you attempt to reinstall the Helm chart, Helm will attempt to adopt the CRDs and be unhappy with their now-incorrect labeling. Oof.

So I'll delete all the CRDs, too:

# delete the CRDs
kubectl get crd | grep rook.io | awk '{print $1}' | xargs kubectl delete crd
kubectl get crd | grep openshift.io | awk '{print $1}' | xargs kubectl delete crd
kubectl get crd | grep objectbucket.io | awk '{print $1}' | xargs kubectl delete crd
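
A plain grep for rook.io treats the dot as a wildcard and matches anywhere in the line, so it's worth previewing the list before deleting anything. Here's a sketch of a stricter filter; the rook_related_crds name is made up, and the three groups are just the ones from the commands above.

```shell
# Hypothetical helper: read CRD names on stdin (one per line) and keep only
# those whose API group ends in rook.io, openshift.io, or objectbucket.io.
rook_related_crds() {
    grep -E '\.(rook|openshift|objectbucket)\.io$'
}

# Dry run: list what would be deleted without deleting anything.
# kubectl get crd -o name | rook_related_crds
```

If the preview looks right, appending | xargs kubectl delete to the dry-run command should remove exactly that list, since -o name prints each CRD in a form kubectl delete accepts.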

Move on with life

With the cluster cleaned out, I can safely remove the attached storage and get on with my life.

For my next attempt at storage, I will either spend a lot more time reading the Ceph documentation, or find a simpler stack to get started with.


Tags

#ceph #kubernetes #rook #storage