Floored by Optane performance


Rand__

Well-Known Member
Mar 6, 2014
6,626
1,767
113
Hm, that's kind of surprising.
I ran the same commands against my ZeusRAM-backed pool of 8 spinners and got faster perf throughout?


Not looking to turn this thread off course, but that's weird - especially after @gea's results.


Code:
root@core ~/fio-2.2.10# ./fio --filename=test --direct=1 --rw=randrw --refill_buffers --norandommap --randrepeat=0 --ioengine=libaio --bs=4k --rwmixread=100 --iodepth=16 --numjobs=16 --runtime=60 --group_reporting --name=4kreadtest --size=12000M
4kreadtest: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=16
...
fio-2.2.10
Starting 16 processes
Jobs: 16 (f=16): [r(16)] [100.0% done] [60440KB/0KB/0KB /s] [15.2K/0/0 iops] [eta 00m:00s]
4kreadtest: (groupid=0, jobs=16): err= 0: pid=14144: Tue Jan  2 09:30:22 2018
  read : io=3925.2MB, bw=66981KB/s, iops=16745, runt= 60008msec
    slat (usec): min=1, max=113852, avg=484.11, stdev=2525.15
    clat (usec): min=51, max=639863, avg=14801.33, stdev=31350.04
     lat (usec): min=115, max=645943, avg=15285.61, stdev=31634.15
    clat percentiles (usec):
     |  1.00th=[  502],  5.00th=[ 1336], 10.00th=[ 2512], 20.00th=[ 3728],
     | 30.00th=[ 4640], 40.00th=[ 5472], 50.00th=[ 6432], 60.00th=[ 7456],
     | 70.00th=[ 9280], 80.00th=[12352], 90.00th=[25472], 95.00th=[67072],
     | 99.00th=[168960], 99.50th=[207872], 99.90th=[313344], 99.95th=[378880],
     | 99.99th=[497664]
    bw (KB  /s): min=  152, max=12826, per=6.33%, avg=4239.99, stdev=1927.83
    lat (usec) : 100=0.01%, 250=0.14%, 500=0.84%, 750=1.23%, 1000=1.24%
    lat (msec) : 2=4.11%, 4=15.55%, 10=49.75%, 20=15.28%, 50=5.40%
    lat (msec) : 100=3.66%, 250=2.54%, 500=0.24%, 750=0.01%
  cpu          : usr=0.14%, sys=0.65%, ctx=263530, majf=0, minf=168
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=1004845/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: io=3925.2MB, aggrb=66980KB/s, minb=66980KB/s, maxb=66980KB/s, mint=60008msec, maxt=60008msec

Disk stats (read/write):
    dm-0: ios=1004064/72, merge=0/0, ticks=8743012/155236, in_queue=8919360, util=99.93%, aggrios=1004796/39, aggrmerge=2/33, aggrticks=8711148/132476, aggrin_queue=8844088, aggrutil=99.77%
  sda: ios=1004796/39, merge=2/33, ticks=8711148/132476, in_queue=8844088, util=99.77%
Write

Code:
./fio --filename=test --direct=1 --rw=randrw --refill_buffers --norandommap --randrepeat=0 --ioengine=libaio --bs=4k --rwmixread=0 --iodepth=16 --numjobs=16 --runtime=60 --group_reporting --name=4kwritetest --size=12000M
4kwritetest: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=16
...
fio-2.2.10
Starting 16 processes
Jobs: 16 (f=16): [w(16)] [100.0% done] [0KB/25324KB/0KB /s] [0/6331/0 iops] [eta 00m:00s]
4kwritetest: (groupid=0, jobs=16): err= 0: pid=14109: Tue Jan  2 09:27:57 2018
  write: io=1218.8MB, bw=20703KB/s, iops=5175, runt= 60282msec
    slat (usec): min=2, max=2230.3K, avg=1548.82, stdev=20996.31
    clat (usec): min=310, max=2423.9K, avg=47875.43, stdev=127574.56
     lat (usec): min=316, max=2423.9K, avg=49424.48, stdev=129380.33
    clat percentiles (usec):
     |  1.00th=[ 1592],  5.00th=[ 4960], 10.00th=[ 7520], 20.00th=[10176],
     | 30.00th=[12352], 40.00th=[14656], 50.00th=[17792], 60.00th=[22400],
     | 70.00th=[29056], 80.00th=[45312], 90.00th=[100864], 95.00th=[168960],
     | 99.00th=[577536], 99.50th=[995328], 99.90th=[1777664], 99.95th=[2244608],
     | 99.99th=[2342912]
    bw (KB  /s): min=    6, max= 4055, per=7.13%, avg=1475.84, stdev=765.72
    lat (usec) : 500=0.01%, 750=0.07%, 1000=0.12%
    lat (msec) : 2=1.61%, 4=2.26%, 10=15.13%, 20=35.94%, 50=26.39%
    lat (msec) : 100=8.34%, 250=7.72%, 500=1.25%, 750=0.50%, 1000=0.16%
    lat (msec) : 2000=0.42%, >=2000=0.08%
  cpu          : usr=0.09%, sys=0.24%, ctx=85428, majf=0, minf=183
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.9%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=312001/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
  WRITE: io=1218.8MB, aggrb=20702KB/s, minb=20702KB/s, maxb=20702KB/s, mint=60282msec, maxt=60282msec

Disk stats (read/write):
    dm-0: ios=0/312027, merge=0/0, ticks=0/8702792, in_queue=8760896, util=99.92%, aggrios=0/312008, aggrmerge=0/12, aggrticks=0/8702352, aggrin_queue=8702292, aggrutil=99.86%
  sda: ios=0/312008, merge=0/12, ticks=0/8702352, in_queue=8702292, util=99.86%
 

Rand__

Well-Known Member
Mar 6, 2014
6,626
1,767
113
The drives were doing a ZFS sync to a remote site (only 20 Mbit upstream, but that might impact minimum perf) at the time of the test.
 
  • Like
Reactions: gigatexal

nk215

Active Member
Oct 6, 2015
412
143
43
49
How did you set up an AIO ESXi-based FreeNAS SLOG with a 900p? Is it a U.2/M.2 drive? As far as I understand, the PCIe version of the 900p won't work in passthrough mode to a FreeNAS VM.

Thanks
 

Rand__

Well-Known Member
Mar 6, 2014
6,626
1,767
113
The 900p is partially passed through via an ESXi vdisk. The impact is significantly less negative than with non-3D XPoint flash.
 

nk215

Active Member
Oct 6, 2015
412
143
43
49
The 900p is partially passed through via an ESXi vdisk. The impact is significantly less negative than with non-3D XPoint flash.
Does that mean the following?

+ Use the 900p as an ESXi datastore
+ Create a virtual HDD inside the FreeNAS VM, pointing to the above ESXi datastore for its location - basically create a VMDK file
+ Create a SLOG using the above virtual drive (VMDK file)

or use RDM?

Thanks
 

whitey

Moderator
Jun 30, 2014
2,766
868
113
41
Does that mean the following?

+ Use the 900p as an ESXi datastore
+ Create a virtual HDD inside the FreeNAS VM, pointing to the above ESXi datastore for its location - basically create a VMDK file
+ Create a SLOG using the above virtual drive (VMDK file)

or use RDM?

Thanks
Concur, you got it right and that is how I am configured.
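For reference, once the VMDK-backed disk shows up inside the FreeNAS VM, attaching it as a SLOG is a one-liner. A minimal sketch - the pool and device names here are placeholders, not taken from this thread:

Code:
# pool "tank" and device da1 are hypothetical examples
zpool add tank log /dev/da1
zpool status tank   # confirm the log vdev was added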
 

whitey

Moderator
Jun 30, 2014
2,766
868
113
41
Yep, that's the way to go.
Check this thread for performance comparisons.
@Rand__ not sure why the IOPS perf dropped off on my end; I'll have to look at gea's thread/numbers again. I do have another pool that is exactly the same, with 4 HUSMMs in RAIDZ and another HUSMM as a SLOG - guess I can float the fio test VM there and take another run. Caught me off guard for sure, as I know I have a 6-disk RAIDZ2 spinner pool with a ZeusRAM behind it that I believe did better IOPS-wise as well. Can try that too; hell, maybe I misconfigured something.
 

nk215

Active Member
Oct 6, 2015
412
143
43
49
I also have an AIO ESXi setup. What's the benefit, in an AIO, of having FreeNAS serve storage back as a datastore for VMs? My VMs are on local PCIe-based SSD datastores (P3700, P3605, Virident, etc.) and the performance is great. I don't know what kind of hardware is needed to get similar performance from a served-back FreeNAS datastore.

I tried 4x S3500 in FreeNAS and the performance is nowhere near as good, even with sync=disabled and 32 GB of memory. With sync=always, there's no hope.

I also have a NAS in my AIO setup and use it to store bulk data (movies, music, etc.). Random 4K read/write is not that critical there. With a Samba share to my Flex, I can still get 4.5K IOPS (4K random read/write QD1 T1) and 15K IOPS (4K random read/write QD32 T1) from a 6-drive RAID6.

What am I missing?
 

gea

Well-Known Member
Dec 31, 2010
3,141
1,182
113
DE
The advantages of ZFS storage over local VMFS storage are:
- a much higher level of data security due to Copy-on-Write (crash resistant) and checksums
- unlimited (ZFS) snapshots
- easy and fast access to VMs via NFS/SMB for copy/move/backup
- ZFS replication (see the sketch below)
- superior RAM cache
- superior write security on a crash due to the ZIL/SLOG
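As a minimal illustration of the snapshot and replication points - the pool, dataset, and host names here are placeholders, not from this thread:

Code:
# hypothetical names: dataset "tank/vms", backup host "backuphost"
zfs snapshot tank/vms@2018-01-10
zfs send tank/vms@2018-01-10 | ssh backuphost zfs receive backup/vms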

You do not need a lot of CPU, but you do need RAM, to get the full performance out of ZFS.

See (with different RAM settings and vs. bare metal):
http://napp-it.org/doc/downloads/optane_slog_pool_performane.pdf

My tests are done on Solarish, but the FreeNAS AiO concept was originally built on my/this idea.
 
  • Like
Reactions: K D and gigatexal

nk215

Active Member
Oct 6, 2015
412
143
43
49
That's the advantage of a ZFS NAS over other NAS OSes.

For my use case, the OS and programs (basically the OS drive) are on a local datastore, which gives me great performance. Data is stored on a NAS (btrfs in my case, for offline dedup and snapshots). I back up my OS drive, which doesn't change that much from one week/month to the next. If datastore #1 went down, or the OS drive on datastore #1 got corrupted for some reason, I can always restore the OS drive to datastore #2 in a few minutes. Data is safe either way.

I would love to have all the benefits of ZFS/btrfs on a local datastore, but right now I am forced to weigh those benefits against speed. For the OS drive, speed is more important to me (again, data is safe either way).

In the old days with desktop PCs, my OS/programs were on an SSD for performance and data was on RAID1 disks for high availability. The OS drive got imaged to another drive often, or before any big changes were made to the OS. I currently use the same concept with a local datastore and a Samba-mapped data drive.
 

vanfawx

Active Member
Jan 4, 2015
365
67
28
45
Vancouver, Canada
zpool iostat while running fio test - note SLOG doesn't seem to be hit
Code:
                                           capacity     operations    bandwidth
pool                                    alloc   free   read  write   read  write
--------------------------------------  -----  -----  -----  -----  -----  -----
husmm1640-rz                            1.20T   255G    798  5.31K  99.5M   294M
  raidz1                                1.20T   255G    798  4.16K  99.5M   284M
    gptid/c7ad6be0-e056-11e7-a2f9-0050569a060b      -      -    595  2.01K  25.6M  97.4M
    gptid/c7de27b9-e056-11e7-a2f9-0050569a060b      -      -    581  2.08K  23.4M  96.3M
    gptid/c80db279-e056-11e7-a2f9-0050569a060b      -      -    604  2.03K  26.0M  97.4M
    gptid/c83b669a-e056-11e7-a2f9-0050569a060b      -      -    605  2.07K  24.3M  97.2M
logs                                        -      -      -      -      -      -
  gptid/c85e0fba-e056-11e7-a2f9-0050569a060b  15.2M  15.9G      0  1.15K      0  10.1M
cache                                       -      -      -      -      -      -
  gptid/c8864a6f-e056-11e7-a2f9-0050569a060b   151G  16.0E    393    319  48.8M  39.9M
--------------------------------------  -----  -----  -----  -----  -----  -----
If you want to force the SLOG to be used, set "sync=always" on the dataset you're using for testing.
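For example (the dataset name below is a placeholder for whatever dataset the fio file lives on):

Code:
zfs set sync=always husmm1640-rz/fiotest    # "fiotest" is a hypothetical dataset name
# ... run the fio test ...
zfs set sync=standard husmm1640-rz/fiotest  # revert to the default afterwards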
 

acquacow

Well-Known Member
Feb 15, 2017
784
439
63
42
Coming from my Fusion-io background, where we frequently built boxes that did 1M IOPS per server (7+ years ago), it's nice to see another new flash tech that performs well and has potential.
 

_alex

Active Member
Jan 28, 2016
866
97
28
Bavaria / Germany
I only get decent performance in KVM when exposing multiple volumes on the 900p and striping them with RAID-0 in the guest.
KVM seems to have some sort of per-volume limit at around 160K IOPS.
Maybe this helps with ESXi, too.

4x 10 GB volumes (from LVM on a partition; virtio, iothread, cache=directsync set in Proxmox)
mdadm RAID-0 in the guest (see the sketch below)
KVM guest: Debian 9, 8 CPUs (type=host), 2 GB RAM, machine type q35
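A minimal sketch of the in-guest RAID-0 assembly; the virtio device names are assumptions and depend on how the volumes are attached:

Code:
# /dev/vdb..vde are hypothetical - substitute the actual virtio devices
mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/vdb /dev/vdc /dev/vdd /dev/vde
mkfs.ext4 /dev/md0 && mount /dev/md0 /mnt/test   # then point fio at /mnt/test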

reads:

Code:
Jobs: 4 (f=4): [r(4)] [100.0% done] [2054MB/0KB/0KB /s] [526K/0/0 iops] [eta 00m:00s]
test: (groupid=0, jobs=4): err= 0: pid=1518: Wed Jan 10 15:22:57 2018
  read : io=120572MB, bw=2009.6MB/s, iops=514433, runt= 60001msec
    slat (usec): min=2, max=1077, avg= 5.26, stdev= 3.28
    clat (usec): min=35, max=2665, avg=490.84, stdev=91.09
     lat (usec): min=40, max=2920, avg=496.11, stdev=91.91
    clat percentiles (usec):
     |  1.00th=[  370],  5.00th=[  390], 10.00th=[  402], 20.00th=[  414],
     | 30.00th=[  426], 40.00th=[  446], 50.00th=[  478], 60.00th=[  498],
     | 70.00th=[  516], 80.00th=[  540], 90.00th=[  644], 95.00th=[  668],
     | 99.00th=[  708], 99.50th=[  732], 99.90th=[  980], 99.95th=[ 1192],
     | 99.99th=[ 1576]
writes:
Code:
Jobs: 4 (f=4): [w(4)] [100.0% done] [0KB/1328MB/0KB /s] [0/340K/0 iops] [eta 00m:00s]
test: (groupid=0, jobs=4): err= 0: pid=1534: Wed Jan 10 15:25:54 2018
  write: io=81655MB, bw=1360.1MB/s, iops=348390, runt= 60001msec
    slat (usec): min=3, max=2783, avg= 9.01, stdev= 5.10
    clat (usec): min=47, max=4840, avg=724.29, stdev=104.48
     lat (usec): min=53, max=4850, avg=733.30, stdev=105.08
    clat percentiles (usec):
     |  1.00th=[  540],  5.00th=[  596], 10.00th=[  620], 20.00th=[  644],
     | 30.00th=[  668], 40.00th=[  684], 50.00th=[  708], 60.00th=[  732],
     | 70.00th=[  764], 80.00th=[  804], 90.00th=[  860], 95.00th=[  900],
     | 99.00th=[ 1020], 99.50th=[ 1112], 99.90th=[ 1448], 99.95th=[ 1656],
     | 99.99th=[ 2096]
fio was run with only 4 jobs:
Code:
#!/bin/sh
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=0 --name=test \
        --filename=test --bs=4k \
        --iodepth=64 --readwrite=randrw --rwmixread=0 \
        --size=4G --numjobs=4 --group_reporting --runtime=60 --time_based
What is really cool is that clat stays low, too.
Max 2.6ms read / 4.8ms write ...

Alex
 

wildpig1234

Well-Known Member
Aug 22, 2016
2,197
443
83
49
Here's my poor man's answer to Optane: a RAM disk :) ...lol

Even faster than Optane.

I use the RAM disk as the system temp dir. It saves a lot of wear on the system SSD.

There are obvious advantages, such as auto-erase of sensitive data every time you turn off the PC.

Obviously this is volatile memory, but if you have the right situation, it actually works even better and is cheaper than Optane.

[Screenshot: SoftPerfect Virtual Disk_12GB_1GB-20180714-1914.png]
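(The screenshot above appears to be from SoftPerfect's RAM disk tool on Windows. On Linux, a roughly equivalent sketch would be a tmpfs mount used as a scratch/temp dir - the size and path below are just examples:)

Code:
mkdir -p /mnt/ramdisk
mount -t tmpfs -o size=12G tmpfs /mnt/ramdisk   # contents vanish on reboot, like any RAM disk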
 
Mar 28, 2018
32
3
8
RAM disks are certainly faster, but only cheaper if you're using DDR3. The great thing right now is that I've been grabbing slightly used 280 GB 900Ps from gamers off eBay for an average price of $265. At or under $1/GB, Optane is a potent alternative to NAND-based SSDs.
 

acquacow

Well-Known Member
Feb 15, 2017
784
439
63
42
Funny thing though: RAM disks can actually be slower than SSDs when it comes to things like database workloads, etc. The CPU has to work double time to handle the context switching and converting from memory to block and back to memory again.

Get some good enterprise SSDs that do DMA and you'll fly performance-wise.
 
  • Like
Reactions: BackupProphet

wildpig1234

Well-Known Member
Aug 22, 2016
2,197
443
83
49
Well, from what I read, Optane technology is actually limited by the current PCIe bus, so it could actually be even faster.

The fact that Optane has an order of magnitude higher endurance than NAND, as well as no housekeeping overhead, is good - but obviously it's still not quite RAM, which has virtually unlimited endurance.