The Million IOPS Club Thread (Or more than 1 Million IOPS)


Patrick

Administrator
Staff member
Dec 21, 2010
The next iteration of this kind of thread should be an STH OLTP benchmark that installs a standard DB, configures it in a standard way, and then runs a production-like load test to exercise both the CPUs and the disks. IOPS is neat and all, but it's still a bit abstract.
I have ideas on how to do this, but only so much time in the day :-/ Happy to support a project, though.

Also, the 24-NVMe-drive 4K random tests generate too much CPU utilization with Iometer.
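For a sense of what such a standard test could look like, here is a minimal sysbench sketch (this is only an illustration, not an agreed STH standard; the MySQL target, table counts, and thread counts are arbitrary placeholders):

# Prepare a standard test database (MySQL assumed; table count/size are arbitrary)
sysbench oltp_read_write --db-driver=mysql --mysql-user=sbtest --mysql-password=sbtest \
  --mysql-db=sbtest --tables=16 --table-size=10000000 --threads=64 prepare

# Run a 10-minute production-like mixed read/write load, reporting every 10 seconds
sysbench oltp_read_write --db-driver=mysql --mysql-user=sbtest --mysql-password=sbtest \
  --mysql-db=sbtest --tables=16 --table-size=10000000 --threads=64 --time=600 --report-interval=10 run

Something along these lines stresses CPU, memory, and storage together rather than raw IOPS in isolation.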
 
  • Like
Reactions: gigatexal

CookiesLikeWhoa

Active Member
Sep 7, 2016
Seems like this might be the place for this question, but what type of CPU would be recommended for a ZFS-based NVMe server?

We were thinking of picking up a Supermicro 2028U-TN24R4T+ and running two E5-2687W v4s in it with 512GB of RAM and (initially) ten 800GB P3700s. The processors are way overkill for a storage server, so we were planning on running the system as an ESXi host. We would pass through the NVMe devices along with two 40GbE ports to the storage VM, give it 6 vCPUs and 128GB of RAM, and use the rest of the system as a performance platform for other VMs.

The storage server would be serving iSCSI targets over 40GbE for our workstations.

The concern is that the processors may not be quick enough to keep up with the storage, so we were also considering a pair of E5-2667 v4s or even a pair of E5-2643 v4s. Any recommendations?

Thank you in advance!
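For reference, the kind of layout being described might look roughly like this on a Linux storage VM (only a sketch; the pool layout, zvol size, and pool/target names are made up, and a FreeNAS-style appliance would use different tooling):

# Mirrored pairs of the passed-through P3700s into one pool (device names are examples)
zpool create -o ashift=12 tank mirror nvme0n1 nvme1n1 mirror nvme2n1 nvme3n1

# Thin-provisioned zvol to export to a workstation
zfs create -s -V 2T -o volblocksize=16k tank/ws01

# Export it as an iSCSI LUN with LIO/targetcli
targetcli /backstores/block create name=ws01 dev=/dev/zvol/tank/ws01
targetcli /iscsi create iqn.2016-09.lab.example:ws01
targetcli /iscsi/iqn.2016-09.lab.example:ws01/tpg1/luns create /backstores/block/ws01

The CPU question then largely comes down to how much of that ZFS/iSCSI work (checksumming, compression, interrupt handling for the 40GbE NICs) lands on those 6 vCPUs.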
 
  • Like
Reactions: gigatexal

Patrick

Administrator
Staff member
Dec 21, 2010
You are going to want v4s capable of DDR4-2400.

Bandwidth-wise, that server can handle multiple 40GbE links.

On the drive side, I would swap to, at most, P3600s. Still plenty of performance, but you can get more capacity.

Let me just ask: interested in trying a system for a few days? We have one in the lab with E5-2698 v4s that has been used for several use cases like this. Shoot me a PM. The demos that have been done on that machine (storage cluster and hyper-converged) have yielded some big changes in planned configurations.
 
  • Like
Reactions: CookiesLikeWhoa

CookiesLikeWhoa

Active Member
Sep 7, 2016
The 2698s aren't exactly brimming with speed either, so that gives me hope.

Noted on the P3600s. They are more reasonably priced than the P3700s and still perform well. My only concern is the loss of endurance, but even then the P3600s seem to be rated for around 2PB written, and we would probably replace them for capacity reasons before they wear out.
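As a rough sanity check on that endurance figure (the daily write rate here is purely an assumed number, not something measured):

2 PBW ≈ 2,000 TB written per drive
2,000 TB ÷ 1 TB/day written to a single drive ≈ 2,000 days ≈ 5.5 years

So even under a fairly write-heavy assumption, capacity-driven replacement likely comes first.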
 

JDM

Member
Jun 25, 2016
Reviving this old thread now that we have some affordable Optane from Intel.

  • System: Custom-built Supermicro
  • RAM: 96GB
  • CPU: 1x Xeon Silver 4114
  • OS: Debian Stretch 9.2 (Proxmox 5.1)
  • OS Drive: 1x Intel 600p 128GB
  • Drives being tested: 2x Intel Optane 280GB AIC SSDs
  • Capacity being tested: 560GB
  • Acquisition date: 11/2017
  • Approximate cost: $3,700
Results
  • Tool Used: FIO
  • 4K Random IOPS: 1.17 million
Config:
[global]
thread
ioengine=libaio
direct=1
buffered=0
group_reporting=1
rw=randread
bs=4k
iodepth=64
numjobs=2
size=50%

[job1]
filename=/dev/nvme1n1

[job2]
filename=/dev/nvme2n1

Output:
job1: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
...
job2: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
...
fio-2.16
Starting 4 threads
Jobs: 4 (f=4): [r(4)] [100.0% done] [4587MB/0KB/0KB /s] [1174K/0/0 iops] [eta 00m:00s]
job1: (groupid=0, jobs=4): err= 0: pid=122712: Tue Nov 7 12:59:32 2017
read : io=534182MB, bw=4582.8MB/s, iops=1173.2K, runt=116564msec
slat (usec): min=0, max=85, avg= 1.90, stdev= 0.74
clat (usec): min=9, max=424, avg=215.80, stdev= 9.02
lat (usec): min=10, max=425, avg=217.74, stdev= 8.98
clat percentiles (usec):
| 1.00th=[ 201], 5.00th=[ 205], 10.00th=[ 207], 20.00th=[ 209],
| 30.00th=[ 211], 40.00th=[ 215], 50.00th=[ 217], 60.00th=[ 219],
| 70.00th=[ 219], 80.00th=[ 221], 90.00th=[ 223], 95.00th=[ 235],
| 99.00th=[ 249], 99.50th=[ 251], 99.90th=[ 258], 99.95th=[ 270],
| 99.99th=[ 278]
lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 250=99.19%
lat (usec) : 500=0.81%
cpu : usr=20.83%, sys=66.71%, ctx=31934106, majf=0, minf=260
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=136750572/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
READ: io=534182MB, aggrb=4582.8MB/s, minb=4582.8MB/s, maxb=4582.8MB/s, mint=116564msec, maxt=116564msec

Disk stats (read/write):
nvme1n1: ios=68335281/0, merge=0/0, ticks=14676432/0, in_queue=15530816, util=100.00%
nvme2n1: ios=68298772/0, merge=0/0, ticks=14674828/0, in_queue=15676884, util=100.00%
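A quick note on reproducing this: with numjobs=2 and two [job] sections, fio launches four threads in total, which matches the "Starting 4 threads" line in the output above. Assuming the config is saved to a file (the filename below is just an example), it would be run as:

# save the [global]/[job1]/[job2] config above as optane-randread.fio, then:
fio optane-randread.fio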
 

gigatexal

I'm here to learn
Nov 25, 2012
Portland, Oregon
alexandarnarayan.com
JDM said:
(quoting the Optane benchmark post above)
Wow. I wonder if the SAN vendors are scared by what could be some competent DIY offerings.
 

Monoman

Active Member
Oct 16, 2013
If I were a SAN vendor I would be most concerned about Red Hat + Ceph + COTS hardware eating my lunch.
Very much this.

Just need to get support contracts on par with the SAN vendors for the software (RH + Ceph) on COTS hardware.
 

Evan

Well-Known Member
Jan 6, 2016
That's crazy performance compared to what some big vendors' top-end SAN (FC-connected) arrays do. Of course, Ceph etc. doesn't always work, so sometimes there's no choice, but it's still good to know that where it does work, there are options.
 

cheezehead

Active Member
Sep 23, 2012
Midwest, US
Hasn't Fibre Channel been dead for ages anyway?
Not yet, too much of it is out there. Even the first iteration of NVMe over Fabrics is via 32Gb FC. Given how much of the market has been taken over by 10GbE/40GbE iSCSI/NFS/CIFS, the writing may be on the wall, but it will be decades before large enterprises replace all their FC gear with something else.
 

Evan

Well-Known Member
Jan 6, 2016
FC also has the advantage of being much safer than Ethernet; it's generally safe to use across security zones.
I would have expected shared storage in general to be replaced with replicated storage, mostly via application-level replication rather than system-level replication.
 

Patriot

Moderator
Apr 18, 2011
Ceph can't even use a single NVMe drive's performance yet... it's bottlenecked... Optane would be a waste.
This is impressive; this is what I hit in 2014 with 16 drives off a single P431 controller.

Intel just made AMD's PCIe lanes worth more with Optane.
Intel just made AMD's pcie lanes worth more with optane.