Advanced bare-metal storage: Rook-Ceph

A project log for Project Dandelion

Ubuntu 21.10 with microk8s/1.21.6 on Raspberry Pi 4

CarbonCycle, 02/03/2021 at 23:48

I have fortunately had a chance to spend copious quality time with the rook-ceph project recently on x86_64. For some odd reason, storage continues to be a significant pain point in bare-metal Kubernetes. Ceph's recent architectural move to the Container Storage Interface (CSI) has made deploying storage services through the Rook operator more consistent with scalable micro-service concepts. I looked around for other efforts to deploy rook-ceph on ARM64 and found a blog post that was encouraging.

I want to deploy a full 3-node fault-tolerant version that can provide dynamic provisioning for persistent volume claims. I have successfully deployed this service on x86_64, and along the way learned some lessons about interpreting OSS documentation that probably shouldn't have been as difficult as they were. Plain explanations of basic requirements are actually rare, and what does exist seems to require extensive background to grok the nuance. I would like to help others find the easy button to at least get a rook-ceph deployment up and running in their home clusters.

Due to circumstances, I have some spare time to pursue this goal. I haven't actually tried this on ARM64 yet, or found all the containers and such. I also expect I'll need to tune the resources to the scale of my cluster. I'll need to share some drawings of the component layout so the bare-metal configuration is explicit.
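To make the "dynamic provisioning for persistent volume claims" goal a bit more concrete, here is a minimal Python sketch that builds a PersistentVolumeClaim manifest and prints it as JSON (which kubectl accepts as well as YAML). The claim name `demo-claim`, the 1Gi size, and the helper `make_pvc` are hypothetical. The storage class name `rook-ceph-block` is an assumption borrowed from Rook's example manifests, not something I have deployed on this cluster yet.

```python
import json


def make_pvc(name, storage_class, size="1Gi", namespace="default"):
    """Build a PersistentVolumeClaim manifest as a plain dict.

    storage_class must match a StorageClass that already exists in the
    cluster; the CSI provisioner behind it creates the volume on demand.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Single-node read/write access, typical for a block volume
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # "rook-ceph-block" is an assumed StorageClass name (see lead-in)
    pvc = make_pvc("demo-claim", "rook-ceph-block")
    print(json.dumps(pvc, indent=2))
```

If this were saved as `make_pvc.py` (a hypothetical file name), the output could be piped straight into the cluster with `python3 make_pvc.py | microk8s kubectl apply -f -`. With a working provisioner the claim should end up bound to a freshly created volume, with no pre-made PersistentVolume needed.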

Once I have a working Helm chart for the Rook operator and a cluster manifest for the Ceph components, I'll post them here. I'll be using external USB disks as the supporting storage for RADOS.

I have been working on the ARM cluster hardware, adding sensors for individual node voltage and current. I've also been evaluating battery backup boards. I'm rebuilding the cluster with the most recent Ubuntu 20.10 images because of the improved kernel support there. I did some memory upgrades in my x86_64 nodes. One of the interesting tools I have come across is Lens: incredibly useful on x86_64 and also usable with the ARM64 cluster, although that becomes problematic if you use the built-in charts, which are not ARM64 aware.