Ceph Performance

Krusty Kloud

New Member
Jan 4, 2022
I have a small Ceph cluster with 4 nodes, each with one 2TB spinning disk as an OSD. When I create a block device and run a benchmark like bench.sh, I only get around 14MB/s. The raw disk by itself gets somewhere around 85MB/s on the same test, so obviously I am doing something wrong here. I've upped the MTU to 9k on all NICs, though they are just 1G. I have not implemented any caching or anything like that; this is just part of a Charmed/Juju OpenStack install on some older nodes. The tests were performed inside deployed Debian 10 VMs, one using standard file-backed storage for its disk and one backed only by a Ceph volume.
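
For context, the I/O portion of bench.sh is (as far as I can tell) just a big sequential write with a sync at the end. A rough Python stand-in for what I'm measuring inside the VM would be something like this; the path and sizes are placeholders, not the exact bench.sh parameters:

Code:
import os
import time

# Rough stand-in for a dd-style sequential write test: 64 KiB writes,
# ~1 GiB total, synced at the end. Path and sizes are placeholders.
PATH = "/mnt/ceph-vol/benchtest.tmp"
BLOCK_SIZE = 64 * 1024
BLOCK_COUNT = 16 * 1024

buf = b"\0" * BLOCK_SIZE
fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)

start = time.monotonic()
for _ in range(BLOCK_COUNT):
    os.write(fd, buf)
os.fdatasync(fd)   # make sure the data actually reached the volume
elapsed = time.monotonic() - start

os.close(fd)
os.unlink(PATH)

total_mb = BLOCK_SIZE * BLOCK_COUNT / 1_000_000
print(f"wrote {total_mb:.0f} MB in {elapsed:.1f} s -> {total_mb / elapsed:.1f} MB/s")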

I previously had an environment with 3 nodes, 4x Intel DC SSDs, and dual 10G SFP+ NICs (two switches, one NIC connected to each) and got a similar result using a system called PetaSAN (petasan.org). The I/O speed metric was somewhere in the neighborhood of 70MB/s, while the RAID10 on its own was above 1GB/s. I ended up dumping PetaSAN in favor of Ubuntu + targetcli to present the RAID10 volumes as iSCSI targets.

Anyone have any insight into troubleshooting this setup and increasing performance without additional hardware purchases? Not that I'm against buying hardware, but that has always been my go-to fix, and I'd like to take a more holistic approach for when I have a setup like this colocated.
 

pancake_riot

New Member
Nov 5, 2021
Jumbo frames will not do much for you on a 1Gb NIC these days. The big appeal of jumbo frames is reducing processing overhead by sending more data in fewer frames, but just about everything is offloaded to the NIC controller now, and modern controllers have no trouble saturating a 1Gb link at the standard 1500-byte MTU.
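
Quick back-of-the-envelope numbers on that, ignoring header overhead:

Code:
# Frames per second needed to fill a 1Gb link at 1500-byte vs 9000-byte MTU
# (rough line-rate math only, ignores Ethernet/IP header overhead).
LINK_BYTES_PER_SEC = 1_000_000_000 / 8   # 1Gb/s expressed in bytes/s

for mtu in (1500, 9000):
    frames = LINK_BYTES_PER_SEC / mtu
    print(f"MTU {mtu}: ~{frames:,.0f} frames/s to saturate the link")

# MTU 1500: ~83,333 frames/s -- trivial for any modern controller
# MTU 9000: ~13,889 frames/s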

What are the specifics of your RBD pool? Replica counts, etc. Ceph clients talk directly to the primary OSD for each object, and that OSD won't acknowledge a write until the other replicas have it, so with a 3-replica pool every client write turns into three writes crossing those same 1Gb links. Check the NIC utilization on your client and on the OSD nodes to see if you're bottlenecking there.
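
If you have admin access on one of the MON nodes, something like this will dump the relevant pool settings. The pool name is a placeholder (yours will be whatever the Charmed deploy created), and plain ceph osd pool get <pool> size on the CLI gives you the same answer:

Code:
import json
import subprocess

# Placeholder pool name -- substitute whatever your Charmed/Juju deploy created.
POOL = "rbd"

def ceph_pool_get(pool, var):
    # Wraps `ceph osd pool get <pool> <var> --format json` and returns the value.
    out = subprocess.run(
        ["ceph", "osd", "pool", "get", pool, var, "--format", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(out.stdout)[var]

for var in ("size", "min_size", "pg_num"):
    print(f"{POOL} {var} = {ceph_pool_get(POOL, var)}")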

Also consider that disk throughput is a function of the device's IOPS capability (fixed) and the operation's block size (variable). You'll be lucky to hit 100 IOPS on a spinning disk, which at a 1MB block size would give you 100MB/s. Adjust accordingly for smaller block sizes.
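
To put rough numbers on it (assuming a flat ~100 IOPS, which is already generous for a single 7200rpm spindle):

Code:
# Expected throughput for a ~100 IOPS spinning disk at various block sizes.
IOPS = 100

for bs_kib in (4, 64, 512, 1024):
    mb_per_s = IOPS * bs_kib * 1024 / 1_000_000
    print(f"{bs_kib:>4} KiB blocks: ~{mb_per_s:.1f} MB/s")

# 4 KiB  -> ~0.4 MB/s
# 64 KiB -> ~6.6 MB/s
# 1 MiB  -> ~104.9 MB/s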
 

Krusty Kloud

New Member
Jan 4, 2022
This is all fantastic information, thank you. I was able to snag some old Infiniband gear on eBay; I'm going to try that out and see if it improves things any.