SSD performance - issues again

Discussion in 'Solaris, Nexenta, OpenIndiana, and napp-it' started by Rand__, Apr 30, 2018.

  1. gea

    gea Well-Known Member

    Joined:
    Dec 31, 2010
    Messages:
    2,096
    Likes Received:
    674
    There is no physical barrier. This is more an example of the 80:20 rule, which means you only need to spend 20% of the effort to reach 80% of the maximum. If you want to reach the maximum, you need to spend the remaining 80% of the effort on the missing 20%.

    I see it like this:
    The reason is the ZFS filesystem with its goal of ultimate data security, which means more data and more data processing due to checksums. Copy-on-Write increases the amount of data that must be written, as all writes are done ZFS-blockwise and there is no in-file data update; this also adds fragmentation. As data in a ZFS raid is spread quite evenly over the whole pool, there is hardly ever a purely sequential data stream, so disk iops becomes a limiting factor. Without these security-related features a filesystem can be faster.

    And even with one of the very best devices like a DC P3700 you are limited by traditional flash, which means you must erase an SSD block before you can write a page with a ZFS datablock, plus the need for trim and garbage collection. At 3 GB/s and more you are also in regions where you must care about internal performance limits regarding RAM, CPU and PCIe bandwidth, so this requires fine tuning in a lot of places. This is probably the reason why a genuine Solaris is faster than Open-ZFS.

    The only way out is a technological jump like Optane, which is not limited by all the flash restrictions as it can read/write any cell directly, similar to RAM. Optane can give up to 500k iops and ultra-low latency down to 10 µs, without degradation over time and without the need for trim or garbage collection to keep performance high. This is 3-5x better than a P3700 and the reason why it can double the result of a P3700 pool.

    In the end you must also accept that a benchmark is a synthetic test, designed to check performance in a way that limits the effects of RAM and caching, and it is exactly RAM and caching that make ZFS fast despite the higher security approach. On real workloads more than 80% of all random reads are delivered from RAM, which makes pool performance far less relevant, and all small random writes go to RAM as well (besides sync writes). Sync write performance is probably the only benchmark value that relates directly to real-world performance; the other benchmarks are more a hint that performance is as expected, or a way to decide whether a tuning or modification is helpful or not.
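    As a rough illustration (pool and filesystem names below are placeholders, not from this thread): you can see what the sync path costs on a given pool by toggling the sync property on a test filesystem and rerunning the benchmark:

        # force every write to be a sync write (worst case, like an ESXi/NFS datastore)
        zfs set sync=always tank/bench

        # back to the default (only writes requested as sync go through the ZIL/Slog)
        zfs set sync=standard tank/bench

        # ignore sync requests entirely - fast but unsafe, for comparison only
        zfs set sync=disabled tank/bench

    The gap between sync=always and sync=disabled shows how much of your write performance is limited by the log path rather than by the pool itself.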
     
    #21
  2. Rand__

    Rand__ Well-Known Member

    Joined:
    Mar 6, 2014
    Messages:
    3,225
    Likes Received:
    441
    I totally agree - sync write is the only thing that I really look at, as this is the one that impacts VM write speeds... that's why it's frustrating to see that both the 3-mirror and the 7-mirror pools only reach 360 MB/s ;)

    So what are the big vendors using for high speed sync writes? Or what did they use before Optane? What write speeds do they get?
     
    #22
  3. i386

    i386 Well-Known Member

    Joined:
    Mar 18, 2016
    Messages:
    1,592
    Likes Received:
    378
    There are special/custom devices like Flashtec NVRAM that are used in high-end storage appliances.
    These devices are DRAM-based (+ NAND for backup in case of power loss) and can do 200k+ 4k random IOPS @ QD1/1 thread ._.
     
    #23
  4. Rand__

    Rand__ Well-Known Member

    Joined:
    Mar 6, 2014
    Messages:
    3,225
    Likes Received:
    441
    Hm, sounds interesting, albeit most likely not in my price range ;)

    Single Pair of P3700's
    upload_2018-5-7_10-32-1.png
     
    #24
  5. gea

    gea Well-Known Member

    Joined:
    Dec 31, 2010
    Messages:
    2,096
    Likes Received:
    674
    Optane can be used as a data disk and as an Slog performance booster for slower pools.
    Besides Optane there are other fast devices for sync logging, partly based on DRAM or fast flash.

    Most vendors show you only sequential values, latency and iops, not the much lower sync write performance.
    If you want to read about sync write on other high-security filers with a CoW filesystem (similar to ZFS),
    you may google for comments on NetApp, a leading storage vendor, e.g.

    "netapp nfs sync write performance"

    The only hint in the specs is latency + 4k random write iops, as there is a relation between them and sync write performance
    (and latency does not scale with the number of disks/vdevs, unlike iops and sequential performance).

    See the NetApp all-flash arrays:
    Flash Array – Reduce SSD Latency with All-Flash EF-Series | NetApp
    Their latency is between 100 µs and 800 µs with up to 1M iops.

    If you compare:
    The latency of a single DC S3700 is around 50 µs with 36k iops,
    a P3500 in comparison is at 35k iops,
    the latency of a single DC P3700 is 20-30 µs with 175k iops.

    Optane 900P/P4800X latency is around 10 µs with 500k iops.
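    As a rough back-of-the-envelope (my own numbers, not from any spec sheet): with a single outstanding commit (QD1) a log device can acknowledge at most 1/latency sync writes per second, ignoring ZFS and network overhead, so

        30 µs latency -> ~33k commits/s -> ~130 MB/s at 4k sync writes
        10 µs latency -> ~100k commits/s -> ~400 MB/s at 4k sync writes

    Larger blocks and more outstanding writes raise these numbers, but latency stays the dominant factor.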
     
    #25
  6. Rand__

    Rand__ Well-Known Member

    Joined:
    Mar 6, 2014
    Messages:
    3,225
    Likes Received:
    441
    Very interesting, thanks.

    So basically there are three options currently for a high-speed VM filer...

    1. Stay with ZFS and go Optane
    2. Stay with ZFS but move to async writes
    3. Leave ZFS
     
    #26
  7. gea

    gea Well-Known Member

    Joined:
    Dec 31, 2010
    Messages:
    2,096
    Likes Received:
    674
    Leaving ZFS while keeping sync write/Slog mostly means using a hardware raid controller with cache and BBU/flash protection.
    I have never seen comparable sync write values with this, so I doubt it will help.

    Using a filesystem with a lower security level, without CoW and checksums, is an option, but who wants that?
     
    #27
  8. Rand__

    Rand__ Well-Known Member

    Joined:
    Mar 6, 2014
    Messages:
    3,225
    Likes Received:
    441
    Well 'want' is relative.
    I *want* good performance with ZFS but that seems to be difficult;)
    The question is how I can get that - throwing a ton of SSDs at it doesn't help, as we see. I could sell all of them off and buy two 960GB 905Ps (if I make enough money), double/triple the write speed and quarter the available capacity, but that's not really sounding too great...

    What about putting a 900P (or a pair) as Slog in front of the SSD array? You said that usually wouldn't be needed on SSDs, but Optane is way ahead on sync write, is it not?
     
    #28
  9. Rand__

    Rand__ Well-Known Member

    Joined:
    Mar 6, 2014
    Messages:
    3,225
    Likes Received:
    441
    So I finally was able to run an Optane bench (on a pair of new 900P 480GB drives).
    Quite nice for a single mirror.
    upload_2018-5-7_21-0-58.png

    Wonder whether that scales up...
     
    #29
  10. gea

    gea Well-Known Member

    Joined:
    Dec 31, 2010
    Messages:
    2,096
    Likes Received:
    674
    I have made benchmarks on OmniOS and Solaris with 4 x 900P, see http://napp-it.org/doc/downloads/optane_slog_pool_performane.pdf

    About DC P3700 + Optane Slog:
    Your sync write performance (300-400 MB/s) is good enough for 10G sync write. If you want more, an Optane Slog can make sense, even with a DC P3700 pool. The boost is not as big as with a slower pool, but I would expect an improvement.
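    For reference, adding (and removing) an Slog on the CLI is a one-liner; napp-it has a menu for this as well. Pool and device names below are placeholders, not from this thread:

        # add a single Optane as Slog
        zpool add tank log c3t0d0

        # or a mirrored Slog pair
        zpool add tank log mirror c3t0d0 c4t0d0

        # remove the log device again if it does not help
        zpool remove tank c3t0d0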
     
    #30
  11. Rand__

    Rand__ Well-Known Member

    Joined:
    Mar 6, 2014
    Messages:
    3,225
    Likes Received:
    441
    Yes, that's really a good reference document, thanks for that :)

    And I was aiming at 40/100 GbE - not really expecting to fill those up, but I have those cards and would like to utilize them if possible ;)
     
    #31
  12. _alex

    _alex Active Member

    Joined:
    Jan 28, 2016
    Messages:
    873
    Likes Received:
    94
    why not break the mirror and stripe them to get a first idea?
     
    #32
  13. Rand__

    Rand__ Well-Known Member

    Joined:
    Mar 6, 2014
    Messages:
    3,225
    Likes Received:
    441
    That's not so easy in ZFS, I believe. Although I never tried ;)

    Here are some results with 7 mirrors and a 900P as Slog...

    7 drives
    nappitI_7.PNG
    7 drives + Optane as slog
    nappitI_7_slog.PNG
     
    #33
  14. gea

    gea Well-Known Member

    Joined:
    Dec 31, 2010
    Messages:
    2,096
    Likes Received:
    674
    At least around 30% faster

    By the way:
    Raid-0 in ZFS is ultra easy.
    In menu Pools: select two or more disks and vdev type = basic, or
    create a pool from one basic disk as a vdev and add more basic disks as vdevs.
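    On the CLI the equivalent would look roughly like this (disk names are placeholders); note that a pool of basic vdevs has no redundancy:

        # create a stripe (raid-0) from two disks
        zpool create testpool c1t0d0 c2t0d0

        # grow the stripe by adding another basic vdev
        zpool add testpool c3t0d0

        # or turn an existing 2-way mirror into a stripe:
        # detach one side, then add it back as its own vdev
        zpool detach tank c2t0d0
        zpool add tank c2t0d0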
     
    #34
  15. Rand__

    Rand__ Well-Known Member

    Joined:
    Mar 6, 2014
    Messages:
    3,225
    Likes Received:
    441
    Ah, never thought to use a basic vdev :p thanks :)

    And yes, with an Optane Slog the SSD pool is usable. Still horrible performance for the amount of hardware involved, but at least it's something ;)
     
    #35
    gea likes this.
  16. gea

    gea Well-Known Member

    Joined:
    Dec 31, 2010
    Messages:
    2,096
    Likes Received:
    674
    Indeed horrible,
    around 1 million read iops and around 500 MB/s (filebench) to 1000 MB/s (dd) sync write performance...
     
    #36
  17. Rand__

    Rand__ Well-Known Member

    Joined:
    Mar 6, 2014
    Messages:
    3,225
    Likes Received:
    441
    So I created a new VM (no Slog yet) and ran the benchmark on it:

    nappitJ_virtual_9.PNG


    Then I created an NFS share, moved a Windows VM to it, and...

    upload_2018-5-14_15-25-12.png


    Not what I was hoping for ;)
     
    #37
  18. gea

    gea Well-Known Member

    Joined:
    Dec 31, 2010
    Messages:
    2,096
    Likes Received:
    674
    Can you add details about

    Server
    - vnic type (slow e1000 or fast vmxnet3)
    - vmxnet3 settings/buffers
    - tcp buffers
    - NFS buffers and servers

    Windows client
    - vnic type
    - tcp settings, especially interrupt throttling (should be set to off)

    Background:
    the defaults are optimised for a 1G network and to limit RAM/CPU use.
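    As an illustration only (the napp-it tuning guide has the recommended values), these are the kind of knobs on the OmniOS side; the numbers are placeholders:

        # check / raise TCP buffer sizes
        ipadm show-prop -p max_buf,send_buf,recv_buf tcp
        ipadm set-prop -p max_buf=4194304 tcp
        ipadm set-prop -p send_buf=1048576 tcp
        ipadm set-prop -p recv_buf=1048576 tcp

        # check / raise the number of NFS server threads
        sharectl get nfs
        sharectl set -p servers=1024 nfs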
     
    #38
  19. Rand__

    Rand__ Well-Known Member

    Joined:
    Mar 6, 2014
    Messages:
    3,225
    Likes Received:
    441
    Ok, I reset the napp-it box to be sure it's ok and updated to the latest and greatest (OmniOS/napp-it).

    Using ESXi 6.5U1, vmxnet3, a 2658 v3 with 4 cores, 48GB RAM, no tuning on napp-it.
    The VM is located on the NFS share and writes to its local (virtual) disk, i.e. the physical NIC is not in play; same CPU, 2 cores, 4GB RAM.


    I will read the tuning guide:)
     
    #39
  20. Rand__

    Rand__ Well-Known Member

    Joined:
    Mar 6, 2014
    Messages:
    3,225
    Likes Received:
    441
    So I added the Slog again...

    upload_2018-5-14_22-32-12.png

    Not helping ...

    upload_2018-5-14_22-26-46.png


    Still haven't checked the tuning guide though ;)
     
    #40