NVMe Storage Solution

Discussion in 'NAS Systems and Networked Home and SMB Software' started by CookiesLikeWhoa, Apr 1, 2018.

  1. CookiesLikeWhoa

    Joined:
    Sep 7, 2016
    Messages:
    109
    Likes Received:
    24
    Hey all,

    So, in typical fashion, I've gotten myself way in over my head and have come here asking for some help. The goal was an all-NVMe storage solution that would serve as the datastores for all of my ESXi nodes.

    Right now I'm sitting on a Supermicro 2028U-TN24R4T+ running two E5-2623 V3s and 32GB of Samsung DDR4-2400 RAM. I have four Intel P3520 1.2TB drives and four 900P 280GB drives.

    Initially I was going to run this as a FreeNAS system and take the hit on performance, since I assumed that with this many NVMe drives, even a 50% performance hit would still let me max out a 40Gb/s connection. Well... not so much. The quote is my post from the FreeNAS forum where I was looking for some tips on NVMe pool tuning.

    The tl;dr version of it is that I'm only getting one drive's worth of performance (if that), no matter how I set up the pools.

    So I decided to try Windows Storage Spaces. Local storage performance was better, but still somewhat of a mystery. The 900Ps in a striped pool yielded a mixed bag of results, ranging from a quarter of a single drive's performance (512B - 16KB) to almost 100% of the performance of all four drives (64KB - 1MB), and then back down to 50% of all four drives' performance (2MB - 64MB). All of this was at a queue depth of 4. The P3520s showed odd write > read performance below 32KB, with about one drive's worth of performance there; at 64KB and up it started to look the way it should.

    Right now I'm leaning towards going with Windows since it seems to yield more performance than FreeNAS, but I'm open to suggestions, or any pro tips for tuning FreeNAS/Windows to help with an all-NVMe pool. Looking for any guidance!
     
    #1
  2. Nizmo

    Nizmo Member

    Joined:
    Jan 24, 2018
    Messages:
    101
    Likes Received:
    17
    You should optimize the connection settings, for example: RDMA (on), send and receive buffers (max), jumbo packets.

    Also, for some reason the "Balanced" power profile is the default on most server installs; set it to "High Performance".
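    For what it's worth, a minimal sketch of applying those settings on a Windows Server 2016 target from Python, assuming PowerShell is available and using "Ethernet 2" as a stand-in for the actual 40GbE adapter name (the buffer/jumbo property names and their valid values vary by NIC driver):

        import subprocess

        ADAPTER = "Ethernet 2"  # illustrative; substitute the real 40GbE adapter name

        def ps(command):
            """Run a single PowerShell command."""
            subprocess.run(["powershell", "-NoProfile", "-Command", command], check=True)

        # Enable RDMA on the adapter (needs an RDMA-capable NIC and driver).
        ps(f'Enable-NetAdapterRdma -Name "{ADAPTER}"')

        # Max out send/receive buffers and enable jumbo frames.
        # Check Get-NetAdapterAdvancedProperty first for the exact names/values your driver exposes.
        ps(f'Set-NetAdapterAdvancedProperty -Name "{ADAPTER}" -DisplayName "Receive Buffers" -DisplayValue 4096')
        ps(f'Set-NetAdapterAdvancedProperty -Name "{ADAPTER}" -DisplayName "Transmit Buffers" -DisplayValue 4096')
        ps(f'Set-NetAdapterAdvancedProperty -Name "{ADAPTER}" -DisplayName "Jumbo Packet" -DisplayValue "9014"')

        # Switch the power plan from Balanced to High Performance.
        subprocess.run(["powercfg", "/setactive", "SCHEME_MIN"], check=True)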
     
    #2
    CookiesLikeWhoa likes this.
  3. T_Minus

    T_Minus Moderator

    Joined:
    Feb 15, 2015
    Messages:
    6,359
    Likes Received:
    1,296
    This + more.

    You're going to have to tweak ZFS to maximize performance based on your workload, and then start tweaking the network configuration, etc.
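    For context, the usual first-pass ZFS tunables for this kind of all-flash VM datastore look something like the sketch below. The dataset name is illustrative, every value is workload-dependent, and sync=disabled in particular trades data safety for speed, so treat this as a starting checklist rather than a recipe:

        import subprocess

        DATASET = "tank/vmstore"  # illustrative dataset name

        # Properties commonly adjusted for an all-flash VM datastore; benchmark each change.
        tunables = {
            "recordsize": "16K",      # match the dominant I/O size of the workload
            "atime": "off",           # skip access-time updates on every read
            "compression": "lz4",     # cheap on CPU, often a net win for throughput
            "logbias": "throughput",  # bias large writes away from the SLOG
            # "sync": "disabled",     # fastest, but risks losing in-flight writes on power loss
        }

        for prop, value in tunables.items():
            subprocess.run(["zfs", "set", f"{prop}={value}", DATASET], check=True)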
     
    #3
  4. Nizmo

    Nizmo Member

    Joined:
    Jan 24, 2018
    Messages:
    101
    Likes Received:
    17
    Are you feeding the CPUs 8 x 4GB modules, or are you only giving dual channels to each CPU? These are quad-channel CPUs. Correct memory installation would be 8 x 8GB or 8 x 4GB, AFAIK.

    I would also note that the CPUs listed only support 1600/1866MHz RAM max, per Intel, which is odd; I haven't seen much DDR4 slower than 2133MHz.
    Intel® Xeon® Processor E5-2623 v3 (10M Cache, 3.00 GHz) Product Specifications
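    As a quick sanity check on what that frequency cap costs, peak theoretical memory bandwidth is just transfer rate x 8 bytes per channel x channels populated; a rough calculation (per socket, all four channels filled):

        def mem_bandwidth_gbs(mt_per_s, channels, bytes_per_transfer=8):
            """Peak theoretical memory bandwidth in GB/s (decimal)."""
            return mt_per_s * 1e6 * bytes_per_transfer * channels / 1e9

        print(mem_bandwidth_gbs(1866, 4))  # ~59.7 GB/s at DDR4-1866 (the E5-2623 v3 ceiling)
        print(mem_bandwidth_gbs(2400, 4))  # ~76.8 GB/s if the modules ran at their rated 2400

    Either way the memory bus sits far above the single-drive speeds being reported, so the downclock by itself is unlikely to be the bottleneck here.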
     
    #4
    CookiesLikeWhoa likes this.
  5. Patrick

    Patrick Administrator
    Staff Member

    Joined:
    Dec 21, 2010
    Messages:
    11,045
    Likes Received:
    3,996
    Sorry to say this, but you need to be off of FreeNAS if you want to get near 40GbE speeds these days.

    FreeNAS is a great solution for 1GbE and maybe 10GbE NAS, but remember it is still primarily focused on being a homelab / SMB office tool.
     
    #5
  6. SlickNetAaron

    SlickNetAaron Member

    Joined:
    Apr 30, 2016
    Messages:
    50
    Likes Received:
    12
    Interesting... huge statement (and I don’t doubt it at the moment)

    So where do we go from FreeNAS? Preferably with data integrity


    Sent from my iPhone using Tapatalk
     
    #6
  7. gea

    gea Well-Known Member

    Joined:
    Dec 31, 2010
    Messages:
    1,843
    Likes Received:
    611
    The three Open-ZFS alternatives are FreeBSD, Illumos (the free Solaris fork, e.g. OmniOS) and ZoL (ZFS on Linux). Only the base OS is relevant for performance, not the management GUI.

    I use OmniOS (Illumos) with 40G links to my filers, not for a single 40G client but for many 10G users (especially video workstations). I have run tests with 4 x Optane 900P in a RAID-0 setup on OmniOS and was nowhere near 5 GB/s of throughput in a local benchmark, only about half of that. As ZFS is a high-security rather than a high-performance filesystem, it is not easy to get near 5 GB/s of throughput. Mainly, ZFS relies on its superior RAM caches for high performance.

    What I have seen is that genuine Oracle Solaris 11.3 with ZFS v37 came close to or beat the OmniOS result with four Optane while using only two of them. I was not able to test Solaris with four Optane, as only two were detected. The new Solaris 11.4 with the newer ZFS v43 detects all of the Optane drives. I can add a benchmark, but it may be limited due to beta code.

    See http://napp-it.org/doc/downloads/optane_slog_pool_performane.pdf, chapter 5.
     
    #7
    Last edited: Apr 1, 2018
    T_Minus and Patrick like this.
  8. acquacow

    acquacow Active Member

    Joined:
    Feb 15, 2017
    Messages:
    211
    Likes Received:
    82
    Your fastest solution is going to be doing RDMA from a Linux host running mdadm for any RAID work you want done...

    Any more layers you add are going to increase latency and eat up I/O cycles.

    Benchmark the raw devices to get your max theoretical numbers, then put a filesystem on them and bench that overhead, then add any RAID levels on top and benchmark again.
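    A sketch of that layering on Linux with mdadm, roughly in the order described (device names, chunk size and mount point are illustrative, and the mdadm step destroys any data on the listed drives):

        import subprocess

        NVME_DEVICES = ["/dev/nvme0n1", "/dev/nvme1n1", "/dev/nvme2n1", "/dev/nvme3n1"]  # illustrative

        def run(cmd):
            print("+", " ".join(cmd))
            subprocess.run(cmd, check=True)

        # Layer 0: benchmark each raw device first to establish the ceiling.

        # Layer 1: a RAID-0 md array across the drives; re-benchmark /dev/md0.
        run(["mdadm", "--create", "/dev/md0", "--level=0", "--chunk=128",
             f"--raid-devices={len(NVME_DEVICES)}"] + NVME_DEVICES)

        # Layer 2: a filesystem on top; benchmark again to see what it costs.
        run(["mkfs.xfs", "/dev/md0"])
        run(["mkdir", "-p", "/mnt/nvme"])
        run(["mount", "/dev/md0", "/mnt/nvme"])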

    -- Dave
     
    #8
    T_Minus and Patrick like this.
  9. Patrick

    Patrick Administrator
    Staff Member

    Joined:
    Dec 21, 2010
    Messages:
    11,045
    Likes Received:
    3,996
    @SlickNetAaron what is a huge statement? That FreeNAS struggles to achieve 40GbE speeds? I think we have discussed that many times.

    @acquacow and @gea have some good suggestions.
     
    #9
    CookiesLikeWhoa likes this.
  10. SlickNetAaron

    SlickNetAaron Member

    Joined:
    Apr 30, 2016
    Messages:
    50
    Likes Received:
    12
    Hmm.. I guess it’s not such a huge statement... it just struck a chord with me somehow. I’m not sure I can quantify why.

    I'll have to look around a bit. I've been lurking for a while and fail to recall a big discussion on FreeNAS + 40Gb. I'm no fan of FreeNAS, just trying to find that "magic" solution, lol
     
    #10
  11. CookiesLikeWhoa

    Joined:
    Sep 7, 2016
    Messages:
    109
    Likes Received:
    24
    I have the buffers maxed and RDMA on. I turned off jumbo packets, as they actually hurt network performance, though that could just be a matter of more tuning on my end to get them to work.

    I found out about the High Performance bit when I was benching and every run was coming out different. I tried to figure out what would cause such inconsistency, then it dawned on me that MS likes to have "Balanced" as the default. Once I enabled "High Performance" the numbers went up and so did the consistency. I also enabled "Performance" in the CPU power management section of the BIOS. It didn't really seem to affect the numbers, but it did make them more consistent between benches.

    I did not know about the max frequency of RAM for that CPU. Miss there. I wonder how much that will hurt performance overall. It is running with eight 4GB modules though, so it should have all the channels in use.

    The other thing I was thinking about is that this backplane runs to two PLX chips, which each have 16 lanes. That gives me 32 lanes total for 24 drives. I assume that will have some negative effect on performance eventually, but how much I can't figure out.
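    A bit of back-of-the-envelope math on the PLX question, assuming roughly 0.985 GB/s of usable bandwidth per PCIe 3.0 lane:

        GBPS_PER_LANE = 0.985          # approx. usable PCIe 3.0 bandwidth per lane

        uplink_lanes = 2 * 16          # two PLX switches, x16 upstream each
        backplane_lanes = 24 * 4       # 24 U.2 bays at x4 each
        populated_lanes = 8 * 4        # the eight drives currently installed

        print("uplink ceiling:    %.1f GB/s" % (uplink_lanes * GBPS_PER_LANE))     # ~31.5 GB/s
        print("full backplane:    %.1f GB/s" % (backplane_lanes * GBPS_PER_LANE))  # ~94.6 GB/s
        print("current 8 drives:  %.1f GB/s" % (populated_lanes * GBPS_PER_LANE))  # ~31.5 GB/s

    With only eight drives installed, the x32 of uplink roughly matches the drives' combined lane budget, so the switches shouldn't be the first bottleneck; the oversubscription only starts to bite as the remaining bays fill up.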

    I will look into Linux options as I've seen that mentioned a few times for performance, though I'm not nearly as comfortable with Linux. Here's to learning!

    Edit: Thank you to Patrick and the others as well! I figured FreeNAS might not work for this setup, but there was hope!
     
    #11
    Nizmo likes this.
  12. _alex

    _alex Active Member

    Joined:
    Jan 28, 2016
    Messages:
    846
    Likes Received:
    88
    To get this up to speed you will need to try and tune a lot.
    As said, mdadm instead of ZFS may be the way to go.

    There you will certainly want to use blk-mq; have a look at the I/O schedulers and related things.
    Also, watching how/whether interrupts are spread across the CPUs can help find bottlenecks
    (at these speeds, one core often gets hammered because 'something' just doesn't use the other cores).
    With your setup, the NUMA nodes will also be of interest.
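    A quick way to check the interrupt-spread point above on Linux is to tally /proc/interrupts for the NVMe and NIC queues and see whether one core dominates; a rough sketch (the "mlx" match assumes a Mellanox NIC and should be adjusted for the actual driver):

        from collections import defaultdict

        per_cpu = defaultdict(int)
        with open("/proc/interrupts") as f:
            cpus = f.readline().split()                    # header row: CPU0 CPU1 ...
            for line in f:
                fields = line.split()
                if not fields or not fields[0].endswith(":"):
                    continue
                label = " ".join(fields[len(cpus) + 1:])   # interrupt type + device name
                if "nvme" in label or "mlx" in label:      # NVMe queues and (assumed) Mellanox NIC queues
                    for cpu, count in zip(cpus, fields[1:len(cpus) + 1]):
                        per_cpu[cpu] += int(count)

        # Print the busiest cores first; one core far ahead of the rest is the red flag.
        for cpu in sorted(per_cpu, key=per_cpu.get, reverse=True):
            print(cpu, per_cpu[cpu])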

    For the transport, if possible skip iSCSI and try NVMe-oF -> SRP -> iSER, in that order
    (with the corresponding tuning, e.g. num_channels for SRP, which requires blk-mq).

    Good Luck,
    Alex
     
    #12
  13. CookiesLikeWhoa

    Joined:
    Sep 7, 2016
    Messages:
    109
    Likes Received:
    24
    Update/Advice Request!

    So I managed to iron out the performance on the Windows Server and got all the drives running at their theoretical speeds locally with the advice provided in this thread! Over the network is another story, with "meh" 10Gb results (slightly better sequential speeds, but 4x the 4K performance of my FreeNAS running HDDs) and 40Gb connections that are worse than 1Gb. (Not sure what is going on there yet.)

    To help eliminate some trouble, I was considering going with vSAN to keep this all on VMware. I have a VMUG subscription, so I have a key for it, and the cluster would be five nodes. The only downside is that none of my nodes except the NVMe one have local storage, since my datastores live on a FreeNAS server via iSCSI, and from what I've gathered each node needs to contribute some storage to the vSAN. Would this likely be a better idea than trying to stand up a separate OS to serve out storage?
     
    #13
    T_Minus likes this.
  14. Rand__

    Rand__ Well-Known Member

    Joined:
    Mar 6, 2014
    Messages:
    2,416
    Likes Received:
    302
    #14
    CookiesLikeWhoa and T_Minus like this.
  15. CookiesLikeWhoa

    Joined:
    Sep 7, 2016
    Messages:
    109
    Likes Received:
    24
    Ah thank you for the link. Seems that vSAN is a hot mess.

    I wish I had saved my screenshots from my work with the W2k16 server I'm on. I have the four 900P drives in a striped array and created a 475GB target for the ESXi host. Connected via 10Gb to the ESXi host, I was able to get 576MB/s reads/writes with a queue depth of 16 and 16 threads. A queue depth of 1 with 16 threads sits at around 500MB/s. 4K performance was around 20MB/s r/w.

    With the current FreeNAS system I get around 500MB/s r/w sequential and 5MB/s r/w in 4K. That system is set up with 256GB of RAM and twelve 4TB HDDs mirrored across six vdevs.

    All of this is with a Windows 10 VM with 4GB of RAM and 4 vCPUs @ 3.1GHz on a 35GB drive.

    Looks like the search will continue.
     
    #15
    Last edited: Apr 12, 2018
  16. Rand__

    Rand__ Well-Known Member

    Joined:
    Mar 6, 2014
    Messages:
    2,416
    Likes Received:
    302
    Well, you should be able to get that with an Optane as SLOG (via a datastore disk, as it is still a lottery to get Optane working directly in a FreeNAS VM) - have you tried that?
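    For anyone trying that route, attaching an Optane as a separate log device is a one-liner against an existing pool; a sketch with an illustrative pool name and device path (mirroring the SLOG is the safer variant):

        import subprocess

        # Add a single Optane as SLOG to an existing pool (pool name and device are illustrative).
        subprocess.run(["zpool", "add", "tank", "log", "/dev/nvd0"], check=True)

        # Mirrored SLOG variant, so one device failing can't cost in-flight sync writes:
        # zpool add tank log mirror /dev/nvd0 /dev/nvd1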
     
    #16
  17. CookiesLikeWhoa

    Joined:
    Sep 7, 2016
    Messages:
    109
    Likes Received:
    24
    The original goal was to set this system up with eight 900Ps in a striped array and sixteen P3520s in four RAIDZ vdevs. The 900Ps would be the datastore and the P3520s would be the rsync target. After setting up the four 900Ps in a striped array, I tested performance over iSCSI to an ESXi host and got results that were worse than my current HDD FreeNAS, around 300MB/s sequential and 3MB/s 4K. I did some dd testing, as seen in the OP, and found that the system just couldn't really make use of the drives, so based on the advice given above I moved away from FreeNAS and ZFS as a whole.
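    For reference, that originally planned layout maps onto zpool commands roughly like this (a sketch only; pool and device names are illustrative FreeBSD NVMe nodes):

        import subprocess

        # Striped (RAID-0 style) pool of eight 900Ps for the VM datastore.
        subprocess.run(["zpool", "create", "fastpool"] +
                       [f"nvd{i}" for i in range(8)], check=True)

        # Sixteen P3520s as four RAIDZ vdevs of four drives each, for the rsync target.
        cmd = ["zpool", "create", "bulkpool"]
        for group in range(4):
            cmd.append("raidz")
            cmd += [f"nvd{8 + group * 4 + i}" for i in range(4)]
        subprocess.run(cmd, check=True)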

    The next step was to try a 2k16 server to see if I could get better performance. The initial numbers were way better than the FreeNAS system, and after working with it for a while I was able to get incredible performance on the local machine. The next step is to carry that over to the network, which is proving to be more difficult.
     
    #17
    Last edited: Apr 12, 2018
  18. acquacow

    acquacow Active Member

    Joined:
    Feb 15, 2017
    Messages:
    211
    Likes Received:
    82
    You shouldn't use dd as your testing tool. You need something multi-threaded with proper control over thread count, queue depth, etc.

    I suggest looking into using "fio"
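    A representative fio run along those lines, wrapped in Python only to keep the knobs in one place (the device path is illustrative; point it at a scratch device, since a write test would destroy data):

        import subprocess

        # 4K random reads against a raw NVMe device: 4 jobs, queue depth 32 each.
        subprocess.run([
            "fio",
            "--name=4k-randread",
            "--filename=/dev/nvme0n1",   # illustrative scratch device
            "--ioengine=libaio",
            "--direct=1",
            "--rw=randread",
            "--bs=4k",
            "--numjobs=4",
            "--iodepth=32",
            "--runtime=60",
            "--time_based",
            "--group_reporting",
        ], check=True)

    Sweeping --bs and --iodepth the same way reproduces the block-size/queue-depth curves from the Storage Spaces testing earlier in the thread.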

    -- Dave

    Sent from my Moto Z (2) using Tapatalk
     
    #18
    CookiesLikeWhoa, Evan and T_Minus like this.
  19. CookiesLikeWhoa

    Joined:
    Sep 7, 2016
    Messages:
    109
    Likes Received:
    24
    Thank you for the info! I hadn't even heard of that, to be honest. I did try the FreeNAS install again to see what the performance was like over the network, and it was worse than my current setup on every count. I think I can officially rule FreeNAS out for now.

    I'm thinking a lot of the over-the-network performance issues are related to latency. When running the benchmarks I notice that the throughput on the network cards is jumping around: it will hit 10Gb/s, then drop to 2, then climb back up, over and over. With the HDD array on the FreeNAS it just stays right at 5Gb/s. This makes me think that something on the network is killing the performance of the drives.
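    One way to split that problem is to take storage out of the loop entirely and check whether a raw network stream between the two hosts also bounces around; iperf3 is a common choice for that (hostname and stream count below are illustrative):

        import subprocess

        # Start the server side on the storage host first:  iperf3 -s
        # Then from the client/test VM:
        subprocess.run([
            "iperf3",
            "-c", "storage-host",   # illustrative hostname
            "-P", "4",              # parallel streams to try to fill a 40GbE link
            "-t", "30",             # run long enough to see whether throughput wobbles
            "-i", "1",              # report every second
        ], check=True)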

    I decided to install Hyper-V on the W2k16 server to see what local VM performance was like. I was not disappointed (see the attached 900P and Hyper-V benchmark screenshots).
    I'm pretty sure the 4MB ATTO results were actually Hyper-V starting to cache the benchmark in RAM, since the theoretical "max" the drives can do is around 10.4GB/s reads and 9.2GB/s writes.

    I might end up going down this path and using this server as a Hyper-V server. Unfortunately, outside of the desktop world, I do not have a whole lot of experience with Windows.
     
    #19
    Last edited: Apr 15, 2018
  20. whitey

    whitey Moderator

    Joined:
    Jun 30, 2014
    Messages:
    2,734
    Likes Received:
    846
    Do tell us a lil' bit more about your network setup: switch vendor/model, jumbo frames in the mix or not, and your desired/preferred protocol for network storage access. Remember, not all switches are up to the task of IP SAN duties; you have to be REAL careful here, otherwise it is an exercise in futility (AKA 'what's the definition of insanity'... you know the rest of that quote, right?) :-D
     
    #20