Benchmarking SSDs - what tools do you use?

lunadesign

Member
Aug 7, 2013
188
20
18
As part of my testing/burn-in of recently acquired SSDs, I like to run some benchmarks like CrystalDiskMark and Iometer. If the results are reasonable and a post-test SMART check with CrystalDiskInfo doesn't flag anything obvious, I put the SSD into production.

I'm new to NVMe SSDs and got some lower-than-expected Iometer results on a P4801X. Specifically, the random read and write IOPS figures were quite a bit lower than the specs on ark.intel.com. The CrystalDiskMark numbers look reasonable, though.

Maybe I need a new app?

This leads me to two questions:
1) What software are people using nowadays to benchmark SSDs (particularly with NVMe SSDs)?
2) Does anyone know how Intel measures their IOPS numbers?

Thanks!
 

NateS

Member
Apr 19, 2021
81
42
18
Sacramento, CA, US
1) I believe fio on Linux is the most common for enterprise drives these days. For client drives, CrystalDiskMark and ATTO seem popular.
2) Internally, we check with pretty much all the publicly available tools, plus some internally developed ones. The numbers that get published I believe come from fio (but don't quote me on this; I'm not 100% sure).
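
For reference, a 4K random read job in fio can be assembled roughly like this. This is only a sketch: the device path, queue depth, job count, and runtime are placeholder assumptions rather than the settings behind any published spec, and the helper name `fio_randread_args` is made up for illustration.

```python
import shlex


def fio_randread_args(device, iodepth=32, numjobs=4, runtime_s=60):
    """Build an argv list for a 4K random-read fio job (Linux, libaio engine)."""
    return [
        "fio",
        "--name=randread-4k",
        f"--filename={device}",    # assumed device path, e.g. /dev/nvme0n1
        "--rw=randread",           # random reads only, so nothing is overwritten
        "--bs=4k",
        "--ioengine=libaio",
        "--direct=1",              # bypass the page cache
        f"--iodepth={iodepth}",    # queue depth per job (placeholder value)
        f"--numjobs={numjobs}",    # parallel jobs (placeholder value)
        "--time_based",
        f"--runtime={runtime_s}",
        "--group_reporting",       # one combined result for all jobs
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(fio_randread_args("/dev/nvme0n1")))
```

Printing the command rather than executing it is deliberate: switching `--rw` to a write pattern against a raw device destroys data, so it is worth reading the line before running it.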

Regardless, you should be able to get numbers close to the published spec out of the P4801X with just CDM and IOMeter, though it may take some system configuration tweaking. Since the latency on these drives is so low, an inefficient driver or the wrong kernel settings can really hurt performance.

Intel has a page on how to tune Linux for Optane performance here: Tuning the performance of Intel® Optane™ SSDs on Linux Operating Systems – IT Peer Network

For Windows, I'd recommend using the Intel NVMe driver rather than the standard Windows one: Download Datacenter NVMe* Microsoft Windows* Drivers for Intel® SSDs


Standard disclaimer: I work for Intel on Optane drives, but I am not an official Intel spokesperson, just a lowly engineer, and my posts should not be seen as official statements by Intel. All opinions are my own.
 

lunadesign

Wow -- this is super helpful, NateS. Thanks!

My current benchmark system uses Win 8.1 with the latest Intel Datacenter NVMe drivers.

The only tool that gets IOPS numbers on this system in the neighborhood of the Intel specs is Anvil's Storage Utilities. I have limited experience with that tool so I don't know how trustworthy it is.

Do you know of a similar performance tuning guide for Windows?

If not, I can probably get CentOS on this box pretty quickly. It might take me a while to figure out fio but the sample script on the link you provided looks like a great start.

Thanks again!
 

NateS

I don't know of a similar guide for Windows, but I think that's just because Windows is less tunable overall, and what tuning can be done is in the driver. Linux is going to give the most performance, but using the Intel driver on Windows should get you close.

I'm not super familiar with Anvil, but I can't think of any reason it would falsely report higher numbers than it's seeing, so that's probably real. Most likely it just happens to be more optimized by default for Optane than the other tools.

I'd be curious what numbers you're seeing from IOMeter on Windows with the Intel driver. That's one I've personally done many times, and in my experience the numbers should be fairly close to the published spec.

If it turns out your Optane drive actually is performing a lot below spec, the first thing I'd check is your cooling, especially if the drive is an M.2. The M.2 form factor doesn't dissipate heat nearly as well as the U.2 or AIC, so if you don't have a lot of airflow and/or a big heatsink on it, it's possible you're hitting thermal throttling. Also, if the drive is new and has likely been sitting on a warehouse shelf for a few months, performance may be a bit lower for the first 24-48 hours it's powered on, since the ECC has to work a little harder at first, but that effect will go away soon.
 

lunadesign

I don't know of a similar guide for Windows, but I think that's just because Windows is less tunable overall, and what tuning can be done is in the driver. Linux is going to give the most performance, but using the Intel driver on Windows should get you close.
OK, that's good to know.

I'm not super familiar with Anvil, but I can't think of any reason it would falsely report higher numbers than it's seeing, so that's probably real. Most likely it just happens to be more optimized by default for Optane than the other tools.
That makes sense although I'll note that Anvil hasn't been updated since 2014ish, which if I recall correctly was a year before Optane was first released?

I'd be curious what numbers you're seeing from IOMeter on Windows with the Intel driver. That's one I've personally done many times, and in my experience the numbers should be fairly close to the published spec.
Sure. With Iometer, I typically run four tests (2M seq read, 2M seq write, 4K random read, 4K random write) and I run each one at QDs from 1 to 32. Before starting, I do a secure erase to be at a known "clean" state.

It occurs to me that the order may be important. I run test #1 (2M seq read) at QD1 all the way to QD32, then test #2 at QD1 all the way to QD32, and so on.

Here are the results (with lowest/highest scores regardless of QD):
2M seq read: 1,016 - 1,102 IOPS / 2,132 - 2,312 MB/s
2M seq write: 530 - 551 IOPS / 1,113 - 1,155 MB/s
4K random read: 55,512 - 145,284 IOPS / 227 - 595 MB/s
4K random write: 46,527 - 134,018 IOPS / 190 - 549 MB/s

The sequential performance I'm seeing seems very much in line with the specs (2,200 MB/s read, 1,000 MB/s write). The random performance I'm seeing is pretty low compared to the specs (550,000 IOPS read, 250,000 IOPS write).
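
As a sanity check on the figures above, the IOPS and MB/s columns can be cross-checked against each other and against the spec. This is a small sketch that assumes Iometer reports decimal megabytes (1 MB = 1,000,000 bytes) with binary block sizes (4K = 4,096 bytes), which is what the numbers appear to be consistent with; the function names are just for illustration.

```python
KIB = 1024


def iops_to_mb_per_s(iops, block_bytes):
    """Convert an IOPS figure to decimal MB/s for a given block size."""
    return iops * block_bytes / 1_000_000


def percent_of_spec(measured, spec):
    """Express a measured value as a percentage of the published spec."""
    return 100.0 * measured / spec


if __name__ == "__main__":
    # Peak 4K random results from the Iometer runs above.
    print(round(iops_to_mb_per_s(145_284, 4 * KIB)))    # 595 MB/s, matches the read row
    print(round(iops_to_mb_per_s(134_018, 4 * KIB)))    # 549 MB/s, matches the write row
    # How far the peaks are from the published random IOPS specs.
    print(round(percent_of_spec(145_284, 550_000), 1))  # 26.4 (% of read spec)
    print(round(percent_of_spec(134_018, 250_000), 1))  # 53.6 (% of write spec)
```

So the two columns agree with each other, and the best random results land at roughly a quarter of the read spec and a bit over half of the write spec.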

If it turns out your Optane drive actually is performing a lot below spec, the first thing I'd check is your cooling, especially if the drive is an M.2. The M.2 form factor doesn't dissipate heat nearly as well as the U.2 or AIC, so if you don't have a lot of airflow and/or a big heatsink on it, it's possible you're hitting thermal throttling.
The P4801X I have is the 100 GB U.2 model and it has an 80mm fan pointing directly at it. Intel MAS reports the idle temp at 33 C. I watched MAS during some earlier tests and don't recall it going over 39 C but haven't actually watched it during the Iometer random tests. I could try that if you think it would be helpful.

Also, if the drive is new and has likely been sitting on a warehouse shelf for a few months, performance may be a bit lower for the first 24-48 hours it's powered on, since the ECC has to work a little harder at first, but that effect will go away soon.
I just received the drive a week or two ago and have been using it on and off but probably haven't had it powered up for more than 24 hours at a time. I've never heard of an ECC warm-up period so I'd definitely like to know a bit more about that.

In case it helps, I'm running all of this on a Supermicro X9 server board with an Intel E5-1650 v2 (Ivy Bridge) proc and 16 GB RAM. BIOS is set to Max Performance / Performance and the OS power plan is set to High Performance.

THANKS for your help!
 

i386

Well-Known Member
Mar 18, 2016
2,509
703
113
32
Germany
I like and use diskspd from Microsoft.
It's an open-source successor to SQLIO (which was used to benchmark storage for SQL servers) and it's used by other benchmark tools (e.g. CrystalDiskMark).
Link: microsoft/diskspd
It can be used to generate io simulating different workloads and has many options for controlling the different caches. In the beginning all the options are overwhelming, but once you master them it's the best tool for benchmarking storage systems imo :D
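
To give an idea of what those options look like, here is a rough sketch that assembles a 4K random read diskspd command. The target path, file size, duration, queue depth, and thread count are placeholder assumptions, and `diskspd_randread_args` is a made-up helper name, not part of diskspd itself.

```python
def diskspd_randread_args(target, file_size="1G", duration_s=60,
                          outstanding=32, threads=4):
    """Build an argv list for a 4K random-read diskspd run against a test file."""
    return [
        "diskspd",
        f"-c{file_size}",    # create the test file with this size
        "-b4K",              # 4 KiB block size
        f"-d{duration_s}",   # test duration in seconds
        f"-o{outstanding}",  # outstanding I/Os per thread (placeholder value)
        f"-t{threads}",      # threads per target (placeholder value)
        "-r",                # random access
        "-w0",               # 0% writes, i.e. reads only
        "-Sh",               # disable software caching and hardware write caching
        "-L",                # collect latency statistics
        target,              # assumed test file path, e.g. D:\testfile.dat
    ]


if __name__ == "__main__":
    # Print the command so it can be checked before it is run.
    print(" ".join(diskspd_randread_args(r"D:\testfile.dat")))
```

The caching switches are the ones that matter most here: without them you can end up measuring RAM instead of the drive.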
 

NateS

That makes sense although I'll note that Anvil hasn't been updated since 2014ish, which if I recall correctly was a year before Optane was first released?
I don't mean that it was optimized specifically for Optane, just that by coincidence it may be that its access pattern shows Optane in its best light.

Sure. With Iometer, I typically run four tests (2M seq read, 2M seq write, 4K random read, 4K random write) and I run each one at QDs from 1 to 32. Before starting, I do a secure erase to be at a known "clean" state.

It occurs to me that the order may be important. I run test #1 (2M seq read) at QD1 all the way to QD32, then test #2 at QD1 all the way to QD32, and so on.

Here are the results (with lowest/highest scores regardless of QD):
2M seq read: 1,016 - 1,102 IOPS / 2,132 - 2,312 MB/s
2M seq write: 530 - 551 IOPS / 1,113 - 1,155 MB/s
4K random read: 55,512 - 145,284 IOPS / 227 - 595 MB/s
4K random write: 46,527 - 134,018 IOPS / 190 - 549 MB/s

The sequential performance I'm seeing seems very much in line with the specs (2,200 MB/s read, 1,000 MB/s write). The random performance I'm seeing is pretty low compared to the specs (550,000 IOPS read, 250,000 IOPS write).
Yeah, those IOPS numbers do look a bit low.

The order really shouldn't matter much, actually. It may matter a lot with NAND drives, but one of the big benefits of Optane is that it doesn't need nearly as much performance-killing background management as NAND does, so your current IOs shouldn't be very affected by any previous ones.

The P4801X I have is the 100 GB U.2 model and it has an 80mm fan pointing directly at it. Intel MAS reports the idle temp at 33 C. I watched MAS during some earlier tests and don't recall it going over 39 C but haven't actually watched it during the Iometer random tests. I could try that if you think it would be helpful.


I just received the drive a week or two ago and have been using it on and off but probably haven't had it powered up for more than 24 hours at a time. I've never heard of an ECC warm-up period so I'd definitely like to know a bit more about that.

In case it helps, I'm running all of this on a Supermicro X9 server board with an Intel E5-1650 v2 (Ivy Bridge) proc and 16 GB RAM. BIOS is set to Max Performance / Performance and the OS power plan is set to High Performance.

THANKS for your help!
Yeah, that cooling setup is fine then; it's the exact same thing I've got on my desk at work.

I should be more clear: it's not that the ECC needs time to warm up, it's that it's likely some errors accumulated during the shipping process that it needs time to correct. The time between when a drive leaves the factory and ends up in a customer's hands can be fairly difficult on a drive, since it's often passing through some temperature extremes in unconditioned warehouses and trucks in different parts of the world, and since it's powered off, the ECC can't be correcting errors as they happen. So the first time it's powered on after shipping (or first few if it's not left on long), the ECC has some catch-up work to do, but that's a one-time thing -- it shouldn't happen every time it's powered on. If you've had it on a total of more than 48 hours in the weeks you've had it, even if it's not all at once, this is probably not the problem.

Now that you've given your system specs, I'm thinking it may actually be your CPU that's the IOPS bottleneck here. Each IO measured for an IOPS calculation is a round trip time from when an application like IOMeter requests a read or write to when it's completed, but that all has to go through the OS's block layer and the NVMe driver before the disk even gets the request, so CPU speed can play a big part in the IOPS results. There's a bit about that on this post about optimizing hardware for Optane performance: Optimizing hardware for Intel Optane SSD benchmarking – IT Peer Network
 

lunadesign

I don't mean that it was optimized specifically for Optane, just that by coincidence it may be that its access pattern shows Optane in its best light.
Ah, got it.

Yeah, those IOPS numbers do look a bit low.

The order really shouldn't matter much, actually. It may matter a lot with NAND drives, but one of the big benefits of Optane is that it doesn't need nearly as much performance-killing background management as NAND does, so your current IOs shouldn't be very affected by any previous ones.
I usually run my full set of tests back-to-back and usually I see a bit of a drop off in the 2nd round with NAND drives, but definitely not with Optane. Very neat!

I should be more clear: it's not that the ECC needs time to warm up, it's that it's likely some errors accumulated during the shipping process that it needs time to correct. The time between when a drive leaves the factory and ends up in a customer's hands can be fairly difficult on a drive, since it's often passing through some temperature extremes in unconditioned warehouses and trucks in different parts of the world, and since it's powered off, the ECC can't be correcting errors as they happen. So the first time it's powered on after shipping (or first few if it's not left on long), the ECC has some catch-up work to do, but that's a one-time thing -- it shouldn't happen every time it's powered on. If you've had it on a total of more than 48 hours in the weeks you've had it, even if it's not all at once, this is probably not the problem.
Thanks! That's a very good explanation. So Secure Erase doesn't take care of this? And I assume this problem also applies to NAND drives?

Now that you've given your system specs, I'm thinking it may actually be your CPU that's the IOPS bottleneck here. Each IO measured for an IOPS calculation is a round trip time from when an application like IOMeter requests a read or write to when it's completed, but that all has to go through the OS's block layer and the NVMe driver before the disk even gets the request, so CPU speed can play a big part in the IOPS results. There's a bit about that on this post about optimizing hardware for Optane performance: Optimizing hardware for Intel Optane SSD benchmarking – IT Peer Network
That article is very interesting. I can definitely try changing those BIOS params and see what happens.

It's not completely surprising that a 2013 CPU can't fully drive a 2018 SSD but I'm a little surprised since it's a relatively high frequency workstation/server CPU.

I wonder how much of this is because I'm testing on a client OS vs a server OS. Or Windows vs Linux for that matter.

I was hoping to hold onto my test bench system a little longer but maybe it's time to upgrade it.

Thanks again for your guidance! We're lucky to have you as a forum member!
 

lunadesign

I like and use diskspd from Microsoft.
It's an open-source successor to SQLIO (which was used to benchmark storage for SQL servers) and it's used by other benchmark tools (e.g. CrystalDiskMark).
Link: microsoft/diskspd
It can be used to generate io simulating different workloads and has many options for controlling the different caches. In the beginning all the options are overwhelming, but once you master them it's the best tool for benchmarking storage systems imo :D
Thanks! I'll give this a try as well!
 

NateS

Thanks! That's a very good explanation. So Secure Erase doesn't take care of this? And I assume this problem also applies to NAND drives?
Yeah, it applies in varying degrees to all technologies, probably including hard drives too (though I don't have as much experience with the internals of those). And how quickly the drive tries to finish this process (which trades off how much of a performance impact there will be with how long it will take) is a choice built into the firmware, and different products/manufacturers/fw versions may do different things.

Secure erase does less than you may think. All it does is erase the current encryption key and generate a new one, which does effectively erase everything since the drive can no longer read it, but it doesn't actually do writes to the full drive, and the data (and associated metadata) doesn't actually get refreshed. If you want to speed up this process, just writing zeroes to the full drive should do it.

That article is very interesting. I can definitely try changing those BIOS params and see what happens.

It's not completely surprising that a 2013 CPU can't fully drive a 2018 SSD but I'm a little surprised since it's a relatively high frequency workstation/server CPU.

I wonder how much of this is because I'm testing on a client OS vs a server OS. Or Windows vs Linux for that matter.

I was hoping to hold onto my test bench system a little longer but maybe it's time to upgrade it.

Thanks again for your guidance! We're lucky to have you as a forum member!
Yeah, one of the challenges of Optane is that it often forces upgrades of other parts of the system, both hardware and software, to fully take advantage of it. All the little inefficiencies in the storage stack which only slowed things down a few percent when paired with slow storage are suddenly a big deal when your storage latency is near zero.

I can't completely rule out that your drive is misbehaving, but since you're hitting the performance numbers on throughput (which don't need much CPU power) but not IOPS (which need a powerful CPU and optimized software stack to achieve), I think it's likely that your system can't take full advantage of the drive's theoretical IOPS.
 

lunadesign

Yeah, it applies in varying degrees to all technologies, probably including hard drives too (though I don't have as much experience with the internals of those). And how quickly the drive tries to finish this process (which trades off how much of a performance impact there will be with how long it will take) is a choice built into the firmware, and different products/manufacturers/fw versions may do different things.

Secure erase does less than you may think. All it does is erase the current encryption key and generate a new one, which does effectively erase everything since the drive can no longer read it, but it doesn't actually do writes to the full drive, and the data (and associated metadata) doesn't actually get refreshed. If you want to speed up this process, just writing zeroes to the full drive should do it.
Got it. That's all very good to know. Thanks!

Yeah, one of the challenges of Optane is that it often forces upgrades of other parts of the system, both hardware and software, to fully take advantage of it. All the little inefficiencies in the storage stack which only slowed things down a few percent when paired with slow storage are suddenly a big deal when your storage latency is near zero.

I can't completely rule out that your drive is misbehaving, but since you're hitting the performance numbers on throughput (which don't need much CPU power) but not IOPS (which need a powerful CPU and optimized software stack to achieve), I think it's likely that your system can't take full advantage of the drive's theoretical IOPS.
Totally understand. I'd definitely like to rule out the misbehaving part so I'll be experimenting a bit with what I have.

But if I need to update my test bench system, how recent (and how powerful) of a CPU do I need to fully exercise an Optane drive? I'm assuming frequency is more important than cores, right? Are there any hardware considerations?
 

NateS

I believe the servers listed under "compatible hardware" on ARK are the ones on which the drive is guaranteed to hit the spec performance numbers, so using one of those or something newer would be the safe choice. But anything fairly high end from ~2018 or newer should work pretty well. And yes, single core speed is more important than number of cores, especially for the low queue depth tests. For higher queue depths, having more cores can be helpful, if the software is multithreaded enough to take advantage of it.

To be clear though, your current system is fully exercising the Optane drive -- it's just adding its own delays in addition, so the overall measurement is longer. For example (these are made up numbers, but they should be in the ballpark), if your OS and driver stack takes 20us to process each IO, and you speed up your drive's latency 10x by switching from a 100us latency NAND drive to a 10us latency Optane, your overall latency falls from 120us to 30us, only a 4x improvement instead of the 10x you'd hoped for. If you upgrade other parts of your system to bring the software stack latency down, then you can get more benefit out of the low latency of the drive, which will help you achieve higher IOPS.
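
The arithmetic above can be written out as a short sketch. The 20us, 100us, and 10us figures are the same made-up ballpark numbers from the example; the 5us "tuned stack" value is an extra hypothetical added purely to show the trend, and the QD1 IOPS figure assumes one IO in flight at a time with nothing else overlapping.

```python
def total_latency_us(stack_us, device_us):
    """Round-trip latency seen by the application: software stack plus device."""
    return stack_us + device_us


def qd1_iops(latency_us):
    """IOPS at queue depth 1, where each IO must finish before the next starts."""
    return 1_000_000 / latency_us


if __name__ == "__main__":
    stack_us, nand_us, optane_us = 20, 100, 10  # made-up ballpark numbers

    nand_total = total_latency_us(stack_us, nand_us)      # 120 us
    optane_total = total_latency_us(stack_us, optane_us)  # 30 us
    print(nand_total / optane_total)                      # 4.0x, not the 10x of the device alone

    print(round(qd1_iops(nand_total)))                    # 8333 IOPS at QD1
    print(round(qd1_iops(optane_total)))                  # 33333 IOPS at QD1

    # Hypothetical: a faster CPU/driver stack that costs 5 us per IO instead of 20 us.
    tuned_total = total_latency_us(5, optane_us)          # 15 us
    print(round(qd1_iops(tuned_total)))                   # 66667 IOPS at QD1
```

With a 10us device, cutting the stack from 20us to 5us doubles the QD1 result, while the same change on the 100us device would barely be noticeable.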
 

ajs

Active Member
Mar 27, 2018
100
35
28
Minnesota
When benchmarking SSDs you want to precondition the NAND in order to reach a steady state. The preconditioning workload will have an impact on subsequent performance. This is heavily dependent on how the SSD architecture is set up, but typically for a random workload benchmark you want to precondition the drive with a similar random write pass of 2x the drive capacity to reach a steady state.
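
To get a feel for how long a 2x capacity pass takes, here is a rough estimate. It assumes a constant write rate, which is optimistic for NAND since throughput usually drops as the drive approaches steady state, so treat the result as a lower bound. The 100 GB capacity and 549 MB/s rate are simply the figures reported earlier in this thread, used as example inputs.

```python
def precondition_seconds(capacity_gb, write_mb_per_s, passes=2):
    """Estimate the time to write `passes` x capacity at a constant rate (decimal units)."""
    total_mb = capacity_gb * 1000 * passes
    return total_mb / write_mb_per_s


if __name__ == "__main__":
    # 100 GB drive, 2 passes, at the 549 MB/s peak 4K random write rate seen above.
    seconds = precondition_seconds(100, 549)
    print(round(seconds))          # 364 seconds
    print(round(seconds / 60, 1))  # 6.1 minutes
```

For a small drive like this the pass is short, but it scales linearly with capacity, so multi-terabyte drives can take hours.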

Similarly, if you are doing constant sequential write passes to the drive, garbage collection should be highly optimized, resulting in a write amplification close to 1. My takeaway here is that the preconditioning of the NAND matters and can make benchmarking SSDs (especially synthetic benchmarks) quite complex.

As far as the low random IOPS you are seeing, I'd echo NateS and have you check on a host bottleneck as well as thermals. If the drive starts overheating it will throttle itself.

My experience is in the FTL back end, so there may also be things to consider from an NVMe host perspective.
 

lunadesign

I believe the servers listed under "compatible hardware" on ARK are the ones on which the drive is guaranteed to hit the spec performance numbers, so using one of those or something newer would be the safe choice. But anything fairly high end from ~2018 or newer should work pretty well. And yes, single core speed is more important than number of cores, especially for the low queue depth tests. For higher queue depths, having more cores can be helpful, if the software is multithreaded enough to take advantage of it.

To be clear though, your current system is fully exercising the Optane drive -- it's just adding its own delays in addition, so the overall measurement is longer. For example (these are made up numbers, but they should be in the ballpark), if your OS and driver stack takes 20us to process each IO, and you speed up your drive's latency 10x by switching from a 100us latency NAND drive to a 10us latency Optane, your overall latency falls from 120us to 30us, only a 4x improvement instead of the 10x you'd hoped for. If you upgrade other parts of your system to bring the software stack latency down, then you can get more benefit out of the low latency of the drive, which will help you achieve higher IOPS.
Thanks! All of this is really helpful.

BTW - I couldn't find the "compatible hardware" section you described on the Ark page for this drive. Or am I looking in the wrong place?
 

lunadesign

When benchmarking SSDs you want to precondition the NAND in order to reach a steady state. The preconditioning workload will have an impact on subsequent performance. This is heavily dependent on how the SSD architecture is set up, but typically for a random workload benchmark you want to precondition the drive with a similar random write pass of 2x the drive capacity to reach a steady state.

Similarly, if you are doing constant sequential write passes to the drive, garbage collection should be highly optimized, resulting in a write amplification close to 1. My takeaway here is that the preconditioning of the NAND matters and can make benchmarking SSDs (especially synthetic benchmarks) quite complex.

As far as the low random IOPS you are seeing, I'd echo NateS and have you check on a host bottleneck as well as thermals. If the drive starts overheating it will throttle itself.

My experience is in the FTL back end, so there may also be things to consider from an NVMe host perspective.
Thanks for the helpful feedback!

If I understand you correctly, you're talking about preconditioning in terms of getting the drive to a real world situation where it's got a reasonable amount of data on it, right?

For my purposes, I'm primarily interested in quickly verifying that a newly acquired drive doesn't have anything obviously wrong with it so I can get a new replacement within the first 30 days. I generally buy a handful at a time and compare the results (from this batch and previous ones) and look for any results that don't roughly fall in line with the rest. As such, I'm not sure pre-filling the drives before testing actually buys me anything. But I'm kinda new at this, so please correct me if any of this is wrong!
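
The batch comparison described above could be automated with something like this. It's only a sketch of one possible approach: it flags any drive whose result differs from the batch median by more than a chosen tolerance. The 15% tolerance, the drive labels, and the IOPS values in the example are all made up for illustration.

```python
from statistics import median


def flag_outliers(results, tolerance=0.15):
    """Return the names of drives whose result deviates from the batch median
    by more than `tolerance` (a fraction of the median), sorted by name."""
    mid = median(results.values())
    return sorted(
        name for name, value in results.items()
        if abs(value - mid) > tolerance * mid
    )


if __name__ == "__main__":
    # Hypothetical 4K random read IOPS for a batch of four identical drives.
    batch = {
        "drive-A": 145_000,
        "drive-B": 143_500,
        "drive-C": 98_000,
        "drive-D": 146_200,
    }
    print(flag_outliers(batch))  # ['drive-C']
```

The median is used instead of the mean so that a single bad drive doesn't drag the baseline toward itself and hide its own deviation.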
 

NateS

Thanks! All of this is really helpful.

BTW - I couldn't find the "compatible hardware" section you described on the Ark page for this drive. Or am I looking in the wrong place?
My bad, I guess it wasn't on ARK but somewhere else on the Intel page. This is what I meant:

 

lunadesign

My bad, I guess it wasn't on ARK but somewhere else on the Intel page. This is what I meant:

Thanks! This is helpful but I'm a bit surprised the only products listed are Q2 2021 ones to pair with a Q3 2018 SSD. I'm guessing "compatible" on this page means "compatible from the most recent lineup".
 

NateS

Thanks! This is helpful but I'm a bit surprised the only products listed are Q2 2021 ones to pair with a Q3 2018 SSD. I'm guessing "compatible" on this page means "compatible from the most recent lineup".
The P4801X was actually launched some time in 2019; it's the P4800X drives that were earlier. The equivalent page for the P4800X has older options listed as compatible: