ZFS performance vs RAM, AiO vs barebone, HD vs SSD/NVMe, ZeusRAM Slog vs NVMe/Optane

Discussion in 'Solaris, Nexenta, OpenIndiana, and napp-it' started by gea, Dec 6, 2017.

  1. gea

    gea Well-Known Member

    Joined:
    Dec 31, 2010
    Messages:
    2,167
    Likes Received:
    705
    I have extended my benchmarks to answer some basic questions:

    - How good is an AiO system compared to a barebone storage server
    - Effect of RAM on ZFS performance (random/sequential, read/write)
    (2/4/8/16/24 GB RAM)
    - Scaling of ZFS over vdevs
    - Difference between HD vs SSD vs NVMe vs Optane
    - Slog: SSD vs ZeusRAM vs NVMe vs Optane

    current state:
    http://napp-it.org/doc/downloads/optane_slog_pool_performane.pdf
     
    #1
    gigatexal, james23, MikeWebb and 5 others like this.
  2. _alex

    _alex Active Member

    Joined:
    Jan 28, 2016
    Messages:
    873
    Likes Received:
    94
    What exactly does the benchmark you call from the GUI do?
    Any chance this can be reproduced on ZoL to compare?
     
    #2
  3. gea

    gea Well-Known Member

    Joined:
    Dec 31, 2010
    Messages:
    2,167
    Likes Received:
    705
    My menu Pools > Benchmarks (this one is in 17.07dev) is a simple Perl script. In the current benchmark set it uses some Filebench workloads for random, sequential and mixed r/w loads. The other options are dd and a simple write loop of 8k or larger writes via echo. The script executes the benchmarks one by one, switching sync for writes automatically, and allows settings to be modified directly or via shell script (for ZFS tuning). This avoids the tedious manual switching of settings between large benchmark series, as each run consists of 7 benchmarks (write random, write sequential, both sync and async, read random, r/w and read sequential). Because many benchmarks have to be run, I selected ones that give a reasonably accurate result with a short runtime. Therefore the results differ by about 10% from run to run, but this should not affect the general conclusions.

    So it should work on ZoL, and I would expect similar results there; maybe you need some extra RAM for the same values.
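
    To give an idea of the structure, a minimal sketch of such a scripted run (not the actual bench.sh; the pool/filesystem name and workload files are only placeholders, and the workload files must have $dir pointed at the dataset):

    # toggle sync around identical write workloads to get async and sync numbers
    FS=tank/bench
    for SYNC in disabled always; do
        zfs set sync=$SYNC $FS
        filebench -f randomwrite.f           # random write workload
        filebench -f singlestreamwrite.f     # sequential write workload
    done
    zfs inherit sync $FS                     # back to the default (sync=standard)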
     
    #3
    Last edited: Dec 6, 2017
  4. _alex

    _alex Active Member

    Joined:
    Jan 28, 2016
    Messages:
    873
    Likes Received:
    94
    OK, I will set up an AiO on Proxmox and have a look into it.
    I just would like to have a point of comparison for ZoL / some idea what numbers to expect with the Optane, but I usually use fio, which doesn't compare well with your numbers.
    I want to try what happens if I export a partition of the Optane via NVMe-oF and use it as Slog on the initiator for some local HDDs (rough sketch below).

    It makes sense to turn sync on/off by script in a benchmark series :)
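
    For reference, roughly how that export could look on Linux (a sketch only, using the kernel nvmet configfs interface on the target and nvme-cli on the initiator; the address, partition, subsystem and pool names are placeholders, and the rdma transport can be used instead of tcp on older kernels):

    # --- target host: export an Optane partition ---
    modprobe nvmet-tcp
    mkdir /sys/kernel/config/nvmet/subsystems/optane-slog
    echo 1 > /sys/kernel/config/nvmet/subsystems/optane-slog/attr_allow_any_host
    mkdir /sys/kernel/config/nvmet/subsystems/optane-slog/namespaces/1
    echo /dev/nvme0n1p2 > /sys/kernel/config/nvmet/subsystems/optane-slog/namespaces/1/device_path
    echo 1 > /sys/kernel/config/nvmet/subsystems/optane-slog/namespaces/1/enable
    mkdir /sys/kernel/config/nvmet/ports/1
    echo tcp > /sys/kernel/config/nvmet/ports/1/addr_trtype
    echo ipv4 > /sys/kernel/config/nvmet/ports/1/addr_adrfam
    echo 192.168.1.10 > /sys/kernel/config/nvmet/ports/1/addr_traddr
    echo 4420 > /sys/kernel/config/nvmet/ports/1/addr_trsvcid
    ln -s /sys/kernel/config/nvmet/subsystems/optane-slog /sys/kernel/config/nvmet/ports/1/subsystems/optane-slog

    # --- initiator host: attach the remote namespace and use it as Slog ---
    modprobe nvme-tcp
    nvme connect -t tcp -a 192.168.1.10 -s 4420 -n optane-slog
    zpool add hddpool log /dev/nvme1n1    # the device name depends on what appears after connect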
     
    #4
  5. Rand__

    Rand__ Well-Known Member

    Joined:
    Mar 6, 2014
    Messages:
    3,408
    Likes Received:
    482
    So when will auto pool creation/destruction/composition based on a config file be added?
    Looking forward to running it on my SSD or potential NVMe pool ;)

    Edit:
    Typo:
    4.5 A SSD based pool via LSI pass-through (4 x Intel DV 3510 vdev)

    and the same error in other places
     
    #5
  6. gea

    gea Well-Known Member

    Joined:
    Dec 31, 2010
    Messages:
    2,167
    Likes Received:
    705
    My mind is faster than my fingers...
    Will correct them.

    At the moment the whole benchmark series is a voluntary extra task.
    Now one wants to classify the results.

    In German we say: Wer misst, misst Mist (whoever measures, measures rubbish).
     
    #6
    _alex likes this.
  7. Rand__

    Rand__ Well-Known Member

    Joined:
    Mar 6, 2014
    Messages:
    3,408
    Likes Received:
    482
    Yeah - I thought you probably have most of the stuff scripted anyway - and it might make your next run simpler too ;)
     
    #7
  8. _alex

    _alex Active Member

    Joined:
    Jan 28, 2016
    Messages:
    873
    Likes Received:
    94
    I like benchmarks for spotting other bottlenecks/misconfigurations.
    So if there is a clear range of what should be reached, there must be something wrong if your own results are orders of magnitude below ;)
     
    #8
  9. azev

    azev Active Member

    Joined:
    Jan 18, 2013
    Messages:
    603
    Likes Received:
    149
    @gea which NVMe driver did you use in your test? The native one from the ESXi installation, or did you install the Intel NVMe driver from the VMware website?
     
    #9
  10. gea

    gea Well-Known Member

    Joined:
    Dec 31, 2010
    Messages:
    2,167
    Likes Received:
    705
    ESXi native
     
    #10
  11. gea

    gea Well-Known Member

    Joined:
    Dec 31, 2010
    Messages:
    2,167
    Likes Received:
    705
    Another two days later in the lab:
    http://napp-it.org/doc/downloads/optane_slog_pool_performane.pdf

    Can someone with Solaris verify these results vs OmniOS from Windows (SMB and iSCSI)?
    They seem too good!

     
    #11
  12. Rand__

    Rand__ Well-Known Member

    Joined:
    Mar 6, 2014
    Messages:
    3,408
    Likes Received:
    482
    Still the Intel DV 3510 typo ;)
    Also "Optana 900p", sungle, Junboframes, rersulta,AStto, qualitzy wi-the,requirte, wrize-ramcache

    "On Solaris only two of ny Optane 900P were detected, so a compare 4 Optans on OmniOS vs 2 Optane on Solaris"
    should probably read
    "On Solaris only two of my Optane 900P were detected, so a comparison of 4 Optanes on OmniOS vs 2 Optanes on Solaris"


    Otherwise very nice, thanks a lot for the extensive testing.
     
    #12
  13. jp83

    jp83 New Member

    Joined:
    Dec 29, 2017
    Messages:
    7
    Likes Received:
    0
    #13
  14. Rand__

    Rand__ Well-Known Member

    Joined:
    Mar 6, 2014
    Messages:
    3,408
    Likes Received:
    482
    He is running the Slog on an Optane NVMe drive ;)
     
    #14
  15. jp83

    jp83 New Member

    Joined:
    Dec 29, 2017
    Messages:
    7
    Likes Received:
    0
    I see that, but I thought I saw that he couldn't pass it through natively and was using a virtual disk to make it available to the VM. That's where my question is, because I can't seem to get any decent performance like that.
     
    #15
  16. Rand__

    Rand__ Well-Known Member

    Joined:
    Mar 6, 2014
    Messages:
    3,408
    Likes Received:
    482
    Yes, he is using it via vmdk, but I think the magic is in the drive, not the setup. The same setup with a P3700 was not really worthwhile a couple of months ago.
     
    #16
  17. james23

    james23 Active Member

    Joined:
    Nov 18, 2014
    Messages:
    385
    Likes Received:
    64
    Wow gea, what a beautiful PDF document. Thank you for it.

    I have a question (or a few).

    (Please correct me if I'm wrong.) I've been trying to benchmark FreeNAS setups (with many different disks and hardware) for a few months. I often can't get past the ARC messing with my read results.

    I see that in a lot of your tests you set readcache=all or readcache=none. Does this enable/disable the ARC/RAM read cache?
    On FreeNAS, the best I've been able to come up with (and these aren't very good, as the read and write speeds become very poor) is:

    zfs set primarycache=metadata MYPOOL
    or
    first: hw.physmem=8294967296 (and reboot, so that FreeNAS at the OS level only "sees/uses" 8 GB of RAM total)
    and then sysctl vfs.zfs.arc_max=1514128320 (~1.5 GB of RAM for the ARC). If I don't set hw.physmem, then my real 128 GB or 256 GB of memory is active and I can't set vfs.zfs.arc_max any lower than 16 GB.
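
    For reference, a hedged sketch of that ARC-capping approach as loader tunables (the values are the 8 GB / ~1.5 GB examples from above, rounded to exact GiB figures; hw.physmem needs a reboot to take effect):

    # /boot/loader.conf (or FreeNAS loader tunables)
    hw.physmem="8589934592"         # let the OS see only 8 GiB of total RAM
    vfs.zfs.arc_max="1610612736"    # cap the ARC at ~1.5 GiB at boot

    # per pool/dataset, to keep file data out of the ARC for read tests
    zfs set primarycache=metadata MYPOOL    # cache metadata only
    # zfs set primarycache=none MYPOOL      # or bypass the ARC completely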

    I then use these tools to TRY to get some consistent benchmarks:
    dd (often if=/dev/null or of=)
    fio (seems to give wrong/bogus results if I increase the threads or job count; example run below)
    iozone
    bonnie++
    cp xyz /dev/null

    (sync=disabled or sync=always / atime=off, no compression)
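
    As an illustration of a fio read run that tends to stay repeatable (a sketch only; the pool path, file size and runtime are placeholders, with the file sized well above the ARC cap so cached reads matter less):

    # single-job 8k random read against a file much larger than the ARC
    fio --name=randread --filename=/mnt/MYPOOL/fio.testfile --size=32G \
        --rw=randread --bs=8k --ioengine=posixaio --iodepth=16 --numjobs=1 \
        --time_based --runtime=60 --group_reporting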

    Is there anything else I should be trying to get more consistent/repeatable speeds (mainly for reads)?
    (Or is there a way I can modify your napp-it bench1.sh script to run on FreeBSD/FreeNAS? It looks pretty consistent.)

    (I've mostly given up on reads and only benchmark write results.)
    Maybe this isn't the best indicator, but I always watch the updates of gstat -s -p.
    (If I don't see my drives being close to maxed out in % busy terms, I assume something is keeping me from the max speed I could be seeing in my benchmark.)

    My goal is to get more stable info and notes (I have tons already) on my actual FreeNAS performance with different pool layouts, so I can pick the best one for my final build.
    (I've been at this for months and still have many more months to keep playing with FreeNAS / my layout before I commit to something.) I have a lot of equipment to test and play with for this (i.e. ~40x 3 TB HGST SAS disks, ~20 HGST SAS3 SSDs, 6x enterprise NVMe drives and one 280 GB Optane, plus a SM X10 and a few X9 systems, all sitting idle for my testing).

    Thanks for any input.
     
    #17
  18. Rand__

    Rand__ Well-Known Member

    Joined:
    Mar 6, 2014
    Messages:
    3,408
    Likes Received:
    482
    Are you going to write that all up and share it with us? :)
     
    #18
  19. james23

    james23 Active Member

    Joined:
    Nov 18, 2014
    Messages:
    385
    Likes Received:
    64
    My benchmark notes are a bit of a mess (but I of course know what's what), so they might be hard to read, but I'll post this one since it's easy to post.

    I think the best approach is that I'll post/share it, and then if you have questions or need specifics of what I was running, ask and I'll answer / give more info. Here is one I can grab now (it's a huge Excel spreadsheet, so I figure the best way is via a Google Sheets share).

    I'll post some others (that are in Google Docs, not Excel format). NB: my FreeNAS box is 11.1 or 11.2 on a baremetal X9DR3-LN4F+, 128 GB RAM, 9207-8i and 9207-8e with a 4U 846 TQ backplane (I move to an expander backplane in later tests, in future docs I'll post):

    ZFS DISK BENCH SHEET _ JAN 2019 excel XLSx.xlsx
    (Those "new110" comparisons way off to the right are a Windows box I have with an Adaptec 8 series RAID, as a comparison.)

    For some reason the formatting looks correct in the Google Sheets preview, but when you open it with Google Sheets it loses a lot of the formatting.

    NOTE: a lot of the non-colored text is results I copied/pasted from https://calomel.org/zfs_raid_speed_capacity.html
    and my own tests (with the same type of pool) are in colored text.

    Any Windows images/screenshots you see were run via SMB (or some iSCSI, but mostly SMB) against the given pool config (using 2012R2 on a separate ESXi host, via 10G to the FreeNAS box).
    Unfortunately, it was only recently that I found that Windows 10 / Server 2016 gives you MUCH better SMB performance (I think because those OSes support SMB multichannel, which works better with FreeNAS's single-threaded SMB). With Win7 or 2012R2, even on baremetal, I rarely get above 500-600 MB/s with a Windows file copy; with Win10 / 2016 I can get 1000 MB/s on a fast pool (i.e. an HGST SSD striped pool).

    Some of the results I'll post tomorrow have more of the SSD results and are easier to read / follow. (A lot of the spreadsheet above was from when I was only 1 month into learning FreeNAS, vs 3 or 4 months of playing with FreeNAS now.)
    ----

    EDIT: this is a 2nd set of benches and might be easier to follow (maybe :/ ). A lot of the tests towards the top are from a RAID card on 2012R2 (not ZFS) that I did for my own comparisons. It's a PDF of a Google Doc shared via Google Drive:

    (pt 3of3) 2019- Huge disk Benchmarks - Google Docs.pdf

    (Page ~22 is where the FreeNAS stuff mostly is, especially page 27.)
     
    #19
    Last edited: Feb 9, 2019
  20. gea

    gea Well-Known Member

    Joined:
    Dec 31, 2010
    Messages:
    2,167
    Likes Received:
    705
    Any benchmark produces a special workload case, which means every benchmark must give different results. What I have done is create a series of benchmarks (Filebench in my case), as there you can select the workload for a benchmark (e.g. a more sequential or random workload, filer, webserver etc.). The goal was not to get absolute values but to get some idea of how to design a pool, RAM needs or configuration settings in a real setup, where you can modify settings within the triangle of price, performance and capacity to get an optimal setup for a given use case (e.g. should I add more RAM, use more disks, or use Raid-10 instead of Z2 for a new machine with a given use case).

    My tests are based on a series where every write benchmark is done with sync enabled and disabled. Readcache (ARC) all vs none shows the effect of RAM. Only with Intel Optane does readcache=none give results similar to readcache=all (meaning RAM is not so relevant), and this holds across different numbers of disks and raid settings. With the ARC enabled I have additionally done tests with different amounts of RAM. As the benchmark series is scripted, it was easy to run it several times with different settings. The bench.sh would allow adding your own benchmarks to every run.

    As RAM is used for read and write caching, it affects read and write performance (even on writes you must read metadata, which can be cached in the ARC). Flash is faster on reads than on writes. Disks should perform similarly for both, so with disks a pure write test (sync vs async, sequential vs random load) can give enough information.

    With FreeBSD you should get at least similar behaviour, as the ZFS principles are the same. You may need some more RAM on FreeNAS than on OmniOS/Solaris. This is partly related to FreeNAS and partly to the ZFS-internal RAM management that is (still) based on Solaris, but OS-related differences in Open-ZFS are becoming smaller and smaller. Even on the Illumos dev mailing list I have seen efforts to include commits, e.g. from Linux, more or less directly.
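
    As a rough illustration of the readcache all vs none comparison on any Open-ZFS platform (a sketch only, assuming the napp-it readcache setting maps to the ZFS primarycache property; pool/dataset and file names are placeholders):

    zfs set primarycache=all tank/bench       # reads may be served from the ARC (RAM)
    dd if=/tank/bench/testfile of=/dev/null bs=1M
    zfs set primarycache=none tank/bench      # force reads from the pool devices
    dd if=/tank/bench/testfile of=/dev/null bs=1M
    zfs inherit primarycache tank/bench       # restore the default afterwards
    # for a clean comparison, use a freshly written file or export/import the pool
    # between the two runs so no blocks are still sitting in the ARC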
     
    #20
    Last edited: Feb 9, 2019