The Million IOPS Club Thread (Or more than 1 Million IOPS)


JDM

Member
Jun 25, 2016
@JDM, what about fio write I/O numbers? Nice read IOPS run, btw!
Here they are below; I took that benchmark as well but figured people wouldn't want a single post to be so long :)

Config:
[global]
thread
ioengine=libaio
direct=1
buffered=0
group_reporting=1
rw=randwrite
bs=4k
iodepth=64
numjobs=2
size=50%

[job1]
filename=/dev/nvme1n1

[job2]
filename=/dev/nvme2n1
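
(If anyone wants to rerun this, save the job file above as something like randwrite.fio, the filename is just an example, and kick it off with:

fio randwrite.fio

The two job sections with numjobs=2 each are what give the 4 threads shown in the output below.)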

Output:
job1: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
...
job2: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
...
fio-2.16
Starting 4 threads
Jobs: 4 (f=4): [w(4)] [100.0% done] [0KB/4338MB/0KB /s] [0/1111K/0 iops] [eta 00m:00s]
job1: (groupid=0, jobs=4): err= 0: pid=123737: Tue Nov 7 13:15:07 2017
write: io=534182MB, bw=4322.6MB/s, iops=1106.6K, runt=123580msec
slat (usec): min=1, max=30029, avg= 1.94, stdev= 2.67
clat (usec): min=42, max=30240, avg=228.85, stdev=38.90
lat (usec): min=43, max=30242, avg=230.84, stdev=38.97
clat percentiles (usec):
| 1.00th=[ 191], 5.00th=[ 193], 10.00th=[ 195], 20.00th=[ 197],
| 30.00th=[ 199], 40.00th=[ 211], 50.00th=[ 231], 60.00th=[ 235],
| 70.00th=[ 239], 80.00th=[ 258], 90.00th=[ 274], 95.00th=[ 294],
| 99.00th=[ 326], 99.50th=[ 338], 99.90th=[ 366], 99.95th=[ 378],
| 99.99th=[ 402]
lat (usec) : 50=0.01%, 100=0.01%, 250=78.37%, 500=21.63%
lat (msec) : 50=0.01%
cpu : usr=22.30%, sys=60.08%, ctx=12805960, majf=0, minf=4
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=0/w=136750572/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
WRITE: io=534182MB, aggrb=4322.6MB/s, minb=4322.6MB/s, maxb=4322.6MB/s, mint=123580msec, maxt=123580msec

Disk stats (read/write):
nvme1n1: ios=162/68310148, merge=0/0, ticks=0/15535956, in_queue=16306048, util=100.00%
nvme2n1: ios=162/68297331, merge=0/0, ticks=0/15516168, in_queue=16584844, util=100.00%
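
Quick sanity check on those numbers (my own arithmetic, not part of the fio output):

136,750,572 writes / 123.580 s ≈ 1,106.6K IOPS aggregate, or roughly 553K random-write IOPS per drive
1,106.6K IOPS x 4 KiB ≈ 4,322.6 MiB/s, which lines up with the reported bw=4322.6MB/s (fio's "MB" here is binary)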
 

JDM

Member
Jun 25, 2016
What about pools of NVMe on ZFS? Could that work, maybe with some mirrored Optanes as read cache, etc.?
These are now (after the raw-device benchmarking) in a mirrored zpool to host VMs; I'll let you know how that goes as soon as I get time...which may not be until this weekend.
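
For reference, the mirror itself is nothing exotic, roughly along these lines (pool name is just a placeholder, and ashift=12 assumes 4K-sector drives):

# mirrored pool across the two NVMe devices (sketch only)
zpool create -o ashift=12 nvmepool mirror /dev/nvme1n1 /dev/nvme2n1
zpool status nvmepool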
 

i386

Well-Known Member
Mar 18, 2016
Germany
Wow. I wonder if the SAN vendors are scared by what could be some competent DIY offerings.
I think they will stick around for a while.
The real strength of 3D XPoint will be servers with many NVDIMM slots and large (128+ GB) NVDIMMs replacing NVMe/SAS SSDs, bypassing the "PCIe bottleneck".
Tape and mainframes are still alive, so FC has decades ahead of it.
In 2014/15 our company ran a project for a customer to work out how much it would cost to migrate from IBM System z to x86 servers and run the applications on x86...
The customer bought two new IBM z13 machines after that project :D
 

nkw

Active Member
Aug 28, 2017
Ceph can't even use a single NVMe drive's performance yet... it's bottlenecked... Optane would be a waste.
Huh? Are you just referring to the fact that you might need to run multiple OSD instances per physical NVMe drive?
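
(For context, "multiple OSDs per drive" usually means carving one NVMe device into several partitions and backing one OSD with each, roughly like the sketch below. Device names are placeholders and the exact ceph-volume invocation depends on the Ceph release.)

# split one NVMe into two halves and stand up one OSD on each (illustrative only)
parted -s /dev/nvme0n1 mklabel gpt
parted -s /dev/nvme0n1 mkpart osd0 0% 50%
parted -s /dev/nvme0n1 mkpart osd1 50% 100%
ceph-volume lvm create --data /dev/nvme0n1p1
ceph-volume lvm create --data /dev/nvme0n1p2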
 

Patriot

Moderator
Apr 18, 2011
Huh? Are you just referring to the fact that you might need to run multiple OSD instances per physical NVMe drive?
Aaaaaand that negates the redundancy, i.e. the reason you are using Ceph in the first place. It's done for benchmark numbers, not production.
 

gigatexal

I'm here to learn
Nov 25, 2012
Portland, Oregon
alexandarnarayan.com