Intel Optane (32G/800P/900P) for ZFS pools and Slog on Solaris and OmniOS

Discussion in 'Solaris, Nexenta, OpenIndiana, and napp-it' started by gea, May 14, 2018.

  1. gea

    gea Well-Known Member

    I have got a pair of Intel Optane 800P drives and have updated my PDF with benchmarks
    at http://napp-it.org/doc/downloads/optane_slog_pool_performane.pdf

    Some important results, especially for the critical sync write vs. async write value.
    This is the most important metric for databases, VM storage or even a filer when you want
    to ensure a write behaviour where a crash during writes does not result in a loss
    of committed writes.
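
    The sync=always / sync=disabled values below come from running the Filebench singlestreamwrite workload while toggling the sync property of the test filesystem (presumably what the napp-it Fb4 menu automates). A minimal sketch of such a run; the pool/dataset name and the workload path are only examples:
    Code:
    # force or disable sync writes on the test filesystem
    zfs set sync=always tank/bench       # every write is logged via ZIL/Slog before it returns
    zfs set sync=disabled tank/bench     # writes are only cached in RAM and flushed with the next txg

    # point the workload at the filesystem (edit "set $dir=..." in the .f file), then run it;
    # the install path of the workload files depends on the filebench package
    filebench -f /usr/benchmarks/filebench/workloads/singlestreamwrite.f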

    Benchmark with a Filebench singlestreamwrite

    Single Optane 32G (basic pool) on OmniOS
    Code:
    Fb4 singlestreamwrite.f sync=always sync=disabled
                             213.4 MB/s    403.4 MB/s
    Not so bad, but far below the bigger 800P and 900P models, especially on the async value.

    Single Optane 800P-118 (basic pool) on OmniOS

    Code:
    Fb4 singlestreamwrite.f sync=always sync=disabled
                        202.8 MB/s   689.8 MB/s

    Single 900P-280 (basic pool) on OmniOS

    Code:
    Fb4 singlestreamwrite.f sync=always sync=disabled
                        674.4 MB/s   1944.5 MB/s
    A single 900P is about 3x as fast as the 800P.

    Dual 800P-118 (Raid-0) on OmniOS

    Code:
    Fb4 singlestreamwrite.f sync=always sync=disabled
                        304.6 MB/s   1076.3 MB/s
    A single 900P is much faster than a Raid-0 of two 800Ps.


    Dual 800P-118 (Raid-0) on Solaris 11.4
    Code:
    Fb4 singlestreamwrite.f sync=always sync=disabled
                        459.8 MB/s   1376.1 MB/s
    
    Solaris 11.4 is faster here than OmniOS/OpenZFS.

    Dual 900P-280 (Raid-0) on OmniOS
    Code:
    Fb4 singlestreamwrite.f sync=always sync=disabled
                        824.2 MB/s   1708.4 MB/s
    Dual 900P-280 (Raid-0) on Solaris 11.4
    Code:
    Fb4 singlestreamwrite.f sync=always sync=disabled
                        938.2 MB/s   2882.2 MB/s
    
    Again, Solaris 11.4 is faster than OmniOS/OpenZFS.

    16 x SSD Sandisk Extreme Pro 960 in Raid-0 without Slog
    Code:
    Fb4 singlestreamwrite.f sync=always sync=disabled
                        69.2 MB/s   1759.7 MB/s
    69 MB/s sync write performance is good enough for a 1G network but far below the
    async value of 1759 MB/s.

    16 x SSD Sandisk Extreme Pro 960 in Raid-0 with an Optane 800P Slog
    Code:
    Fb4 singlestreamwrite.f sync=always sync=disabled
                        346.4 MB/s   2123.5 MB/s
    
    There is a hefty improvement with an Optane Slog even on a massive SSD Raid-0 pool,
    as the SSDs (even the Sandisk Extreme) are much slower at sync writes than a single Optane 800P.
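
    For reference, adding such an Optane as Slog to an existing pool is a single command; a minimal sketch, with "tank" and the device name as placeholders:
    Code:
    # add the Optane as a dedicated log device (Slog) to an existing pool
    zpool add tank log c2t1d0

    # the device now shows up under a separate "logs" section
    zpool status tank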

    16 x SSD Sandisk Extreme Pro 960 in Raid-0 with an Optane 900P Slog
    Code:
    Fb4 singlestreamwrite.f sync=always sync=disabled
                        348.8 MB/s   2338.1 MB/s
    
    The difference from 800P to 900P is minimal when used as a Slog for an SSD pool.


    Overall:
    If you need high sync write values
    1.) use an Optane 800P 58/118 GB (mind the limited write endurance of 365 TBW)
    2.) use a better Optane 900P/905P
    or the enterprise 4800X (not faster, but with guaranteed powerloss protection)
    3.) there is no option three

    Intel Optane is a game changer technology
     
    #1
    Last edited: May 14, 2018
  2. Aluminum

    Aluminum Active Member

    Thank you for providing a link I can shove in people's faces when I say that even a ~$50 little M.2 card is like nitro for ZFS.
     
    #2
  3. gea

    gea Well-Known Member

    You can mirror Slogs. This keeps full performance even when one dies and avoids dataloss in the case where a crash during writes coincides with a failed Slog. A Slog failure at any other time is uncritical, as ZFS then reverts to the slower onpool ZIL for logging. Think of a mirrored Slog like a hardware raid controller with cache and two battery units.

    If you simply add more than one Slog to a pool, ZFS load balances between them, so each one only has to handle part of the load, with the result of better performance.
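
    A minimal sketch of both variants, mirrored Slog vs. two independent Slogs for load balancing (pool and device names are placeholders):
    Code:
    # mirrored Slog: keeps protection and performance if one device dies
    zpool add tank log mirror c2t1d0 c2t2d0

    # two independent Slogs: ZFS load balances the log writes across them
    zpool add tank log c2t1d0 c2t2d0

    # a log device can also be removed again later
    zpool remove tank c2t1d0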

    16 x SSD Sandisk Extreme Pro 960 in Raid-0 with two Optane 800P Slog (load balancing)
    Code:
    Fb4 singlestreamwrite.f sync=always sync=disabled
                             476.0 MB/s   2380.3 MB/s
    Compared to one 800P as Slog:
    346 MB/s -> 476 MB/s sequential sync write, around 38% better


    16 x SSD Sandisk Extreme Pro 960 in Raid-0 with two Optane 800P Slog + 900P Slog
    Code:
    Fb4 singlestreamwrite.f sync=always sync=disabled
                             544.0 MB/s   2354.5 MB/s
    The third Optane Slog gives another, but smaller, improvement (although the 900P is the faster device):
    476 MB/s -> 544 MB/s, another ~14%

    Especially with the quite cheap M.2 Optane 800P-58, it seems a good idea to use more than one of them for Slog load balancing to improve performance. This also mitigates their limited write endurance of only 365 TBW, as each one only has to take half of the write load.

    Cheap PCI-e boards for two or four M.2 drives seem a good add-on if the mainboard supports bifurcation,
    e.g. Super Micro Computer, Inc. - Products | Accessories | Add-on Cards | AOC-SLG3-2M2

    Overall
    Best lowcost Slog for lab use: such a card with 2 x 800P-58 as Slog, optionally together with an onboard M.2, as this gives extremely good sync write values and the low write endurance of the cheaper Optane is no longer a problem. This is also an option if you want to build a cheap but ultra high performance Raid-Z1 pool from a few of them with good write endurance; see the sketch below.
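
    A minimal sketch of such a small all-Optane Raid-Z1 pool built from several 800P-58 on a bifurcation card (pool and device names are placeholders):
    Code:
    # four M.2 Optane 800P-58 as a small but very fast Raid-Z1 pool
    zpool create optpool raidz1 c3t1d0 c3t2d0 c3t3d0 c3t4d0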

    Has anyone seen a cheap 4 x M.2 PCI-e 2x or 4x adapter?
     
    #3
    Last edited: May 15, 2018
  4. SlickNetAaron

    SlickNetAaron Member

    This is really interesting!

    Thoughts on why using multiple slog devices is only a marginal improvement over single slog?


     
    #4
  5. gea

    gea Well-Known Member

    Read the nice blog post from nex7:

    Sync write: "it is single queue depth random sync write with cache flush"
    Nex7's Blog: ZFS Intent Log

    Two Slogs are like a small corridor with two doors at the end. The two doors increase the number of persons that can cross the corridor a little, but they do not double it.

    With the 800P you get such an improvement in throughput (around 38% is not too bad), and you double their write endurance, as each 800P only has to write half of the data. The second aspect is the more important one; the performance increase is a nice add-on.
     
    #5
  6. Rand__

    Rand__ Well-Known Member

    Would this also work/have a positive effect with multiple slices of the same 900P, i.e. passing through 2-3 virtual disks of one 280GB Optane?
     
    #6
  7. gea

    gea Well-Known Member

    I have added a benchmark series with multiple Slog partitions from the same Optane 900P.
    Result: there is no performance improvement (unlike the gain from a second 800P), and
    as there is no improvement regarding endurance either, this is not recommended.
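
    For completeness, the partitions were presumably added as separate log vdevs; a minimal sketch, assuming the 900P was split into two partitions that show up as c1t1d0p1 and c1t1d0p2 (the naming depends on how the disk was labeled):
    Code:
    # add two partitions of the same Optane 900P as separate log vdevs
    zpool add tank log c1t1d0p1 c1t1d0p2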

    This is a barebone setup

    11 x HGST Ultrastar 7K4000 2TB in Raid-0 without Slog
    Code:
    Fb4 singlestreamwrite.f sync=always sync=disabled
                             36.0 MB/s   1211.5 MB/s

    11 x HGST Ultrastar 7K4000 2TB in Raid-0 with one Slog partition from a 900P
    Code:
    Fb4 singlestreamwrite.f sync=always sync=disabled
                             539.6 MB/s  1325.9 MB/s
    The improvement from an Optane Slog is dramatic (factor 15).


    11 x HGST Ultrastar 7K4000 2TB in Raid-0 with two Slog partitions from a 900P
    Code:
    Fb4 singlestreamwrite.f sync=always sync=disabled
                             547.4 MB/s   1380.7 MB/s
    Not worth the effort.


    11 x HGST Ultrastar 7K4000 2TB in Raid-0 with three Slog partitions from a 900P
    Code:
    Fb4 singlestreamwrite.f sync=always sync=disabled
                             495.4 MB/s   1319.5 MB/s
    Not worth the effort; even slightly slower.
     
    #7
  8. T_Minus

    T_Minus Moderator

    16 x SSD Sandisk Extreme Pro 960 in Raid-0 with an Optane 900P Slog = 348 MB/s

    Yet

    11 x HGST Ultrastar 7K4000 2TB in Raid-0 with one Slog partition from a 900P = 540 MB/s


    Your test shows that 16 SSDs in Raid-0 are actually significantly SLOWER than 11 spinning 2TB HDDs.

    How is that possible?

    Why are you not testing with 16x S3500, S3610, S3710 or other enterprise SSDs rather than consumer "Extreme" drives? These results seem incorrect.
     
    #8
  9. gea

    gea Well-Known Member

    Simply because I do not have them lying around. I try to use the same equipment for a test sequence, but mostly I can use it only for a short time.

    The question for the last test was whether multiple Slog partitions on one Optane are helpful. For this question, a pool with lower iops seems more relevant than an ultra fast one, as I do not need absolute values but a trend.

    And yes, all my tests have shown that (at least for sequential sync writes) a disk based pool with an Optane as Slog can give very high sync write values, in this case nearly 40% of the async value (where disk based pools are very good). This is because the disk pool only sees large async writes, while all the critical small sync writes go to the Optane. This can even be faster than on SSD based pools, where you have flash related limits on writes.
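
    You can watch this split directly while a sync benchmark runs; a quick check (pool name is a placeholder):
    Code:
    # per-vdev statistics, updated every second: the small sync writes hit the
    # log device, the pool disks only see the large txg flushes
    zpool iostat -v tank 1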

    btw
    The Sandisk Extreme was, beside the Samsung Pro, one of the best desktop SSDs two years ago. If you compare this with the benchmarks in https://forums.servethehome.com/index.php?threads/ssd-performance-issues-again.19625/page-2, where 7 mirrors of DC S3700 reach 477 MB/s sync write with an Optane Slog, the Sandisk value is as expected.

    For a test that checks pure iops without cache effects, the results will be different. But sequential sync write values are not a bad way to quantify a system, given the superior read/write RAM caches in ZFS.
     
    #9
  10. T_Minus

    T_Minus Moderator

    You can take a single (1) high quality consumer SSD, install it in your desktop, and transfer sequentially faster than you can with 16 SSDs in Raid-0 on ZFS. And not only that: the spinning HDD pool with fewer drives is faster too with the same Slog device.

    Something is not right: either the test is severely flawed, or those SSDs in fact can't do what they claim sequentially.
     
    #10
  11. gea

    gea Well-Known Member

    This is what at least the Filebench singlestreamwrite sync workload shows me. Sequential sync write values scale very badly with the number of pool disks. It is the Slog that determines the overall value, less the pool itself, as there are no small random writes going to the pool.

    A test mainly for random reads without the RAM cache may give completely different results, as then the iops are relevant, and there an SSD is far better than a disk. But the question here is the Slog and the performance degradation compared to async writes.
     
    #11
  12. T_Minus

    T_Minus Moderator

    Exactly, and those still seem like incorrect results.

    Both the HDD and the SSD pool had an Optane 900P Slog device for your test, and yet the spinning HDD pool performed better.

    To me that's a huge issue with either ZFS, the OS or the benchmark itself.
     
    #12
  13. gea

    gea Well-Known Member

    Yes, at least with this special Filebench workload that I have commonly used in recent tests (any other workload gives different results).

    It's not about the absolute value, e.g. whether you get 300 MB/s or 400 MB/s sync with SSDs or disks compared to the async 1-2 GB/s; it's the trend that synchronous write performance is mostly related to Slog quality, not pool quality, with very good write results even on disk based pools.

    If you have an Optane around, you can try any other sync workload on a pool, e.g. sync dd vs. async dd with or without an Optane Slog, to see if the trend is different (dd write values should be slightly higher than or different from Filebench singlestreamwrite, but the trend should be the same).
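
    A minimal sketch of such a quick dd cross-check, toggling the sync property instead of relying on dd flags (pool/dataset name and sizes are only examples; compression should be off so /dev/zero is not compressed away):
    Code:
    # sync run: every write goes through the ZIL/Slog
    zfs set sync=always tank/bench
    dd if=/dev/zero of=/tank/bench/dd.tst bs=1024k count=4096

    # async run as reference
    zfs set sync=disabled tank/bench
    dd if=/dev/zero of=/tank/bench/dd.tst bs=1024k count=4096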
     
    #13
  14. Aluminum

    Aluminum Active Member

    Asus and ASRock make retail versions, easy to find online for just under $100.

    Single slot, full-height quad M.2 cards for sizes up to 22110, with a heat "shroud", fan, lights, etc. The motherboard & BIOS absolutely must support x4x4x4x4 bifurcation on the x16 slot though. You could use U.2 drives with adapters if you really wanted, probably cheaper than trying to source an OEM server U.2 cage and a proprietary expansion card.

    I have two for playing with on my Threadrippers; knowing my buying habits, I will have plenty of spare NVMe SSDs to use eventually.
     
    #14
  15. Rand__

    Rand__ Well-Known Member

    #15