I have a small Ceph cluster with 4 nodes, each with one 2TB spinning disk as an OSD. When I create a block device and run a benchmark like bench.sh, I only get around 14MB/s; the raw disk by itself gets around 85MB/s on the same test, so obviously I am doing something wrong here. I've bumped the MTU to 9000 on all NICs, though they are only 1GbE. I have not implemented any caching or anything like that; this is just part of a Charmed/Juju OpenStack install on some older nodes. The tests were run inside deployed Debian 10 VMs: one backed by standard file storage and one backed only by a Ceph volume.
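If it matters, as far as I can tell the I/O portion of bench.sh boils down to a single sequential dd write, roughly like this (block size and count are from memory, so treat them as approximate):

    # Sequential write of ~1GiB in 64KiB blocks, flushed to disk at the end;
    # this is approximately what bench.sh reports as its I/O speed.
    dd if=/dev/zero of=./benchtest bs=64k count=16k conv=fdatasync
    rm -f ./benchtest

So I realize a single-threaded sequential write like this may not be the friendliest test for replicated Ceph over a 1G network, but the gap still seems too large.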
I previously had an environment with 3 nodes, 4x Intel DC SSDs, and dual 10G SFP+ NICs (two switches, one NIC connected to each), and got a similar result using a system called PetaSAN (petasan.org). The I/O speed metric was in the neighborhood of 70MB/s, while the RAID10 on its own was above 1GB/s. I ended up dumping PetaSAN in favor of Ubuntu + targetcli to present the RAID10 volumes as iSCSI targets.
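The targetcli side of that replacement was nothing fancy, essentially the standard block-backstore-plus-iSCSI flow (the device path and IQNs below are placeholders, not the actual names I used):

    # Expose a RAID10 md device as an iSCSI LUN via LIO/targetcli.
    targetcli /backstores/block create name=raid10vol dev=/dev/md0
    targetcli /iscsi create iqn.2021-01.local.storage:raid10vol
    targetcli /iscsi/iqn.2021-01.local.storage:raid10vol/tpg1/luns create /backstores/block/raid10vol
    targetcli /iscsi/iqn.2021-01.local.storage:raid10vol/tpg1/acls create iqn.1993-08.org.debian:01:client1
    targetcli saveconfig

Nothing exotic there, just stock LIO.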
Does anyone have any insight into troubleshooting this setup and increasing performance without additional hardware purchases? Not that I am against buying hardware, but throwing hardware at the problem has always been my go-to, and I'd like to take a more holistic approach for when I have a setup like this colocated.
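Happy to gather more data if it helps narrow things down. For example, I could benchmark the cluster directly from one of the nodes, bypassing the VM and RBD layers entirely, with something like this (pool name is a placeholder):

    # Write benchmark straight against a Ceph pool for 30 seconds,
    # keeping the benchmark objects, then removing them explicitly.
    rados bench -p testpool 30 write --no-cleanup
    rados -p testpool cleanup

If that shows the same ~14MB/s, I figure the bottleneck is in the cluster/network rather than in libvirt/QEMU.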