FreeBSD/FreeNAS vs OmniOS/Napp-it write speeds when used as ESXi NFS VM datastore?

Discussion in 'Solaris, Nexenta, OpenIndiana, and napp-it' started by AveryFreeman, Jul 29, 2018.

  1. AveryFreeman

    AveryFreeman ESXi + ( ILLUMOS / ZFS ) = HAPPY

    Mar 17, 2017

    So I've done a couple of all-in-one ESXi boxes with either FreeNAS or FreeBSD, with ZFS pools passed back to ESXi as NFS datastores. Tried to run VMs on them, but benchmarks exposed gawd-awful write speeds, totally unusable. Just ended up using them for ISO storage and (slow) backups, content libraries, etc., so not a total bust, but still.

    Ran KVM for a while on a box with a ZFS datastore and that was much better, but I want to use ESXi ... is there something else that works better? I was wondering if napp-it avoids this awful sync issue FreeBSD/FreeNAS has with ZFS writes. I've tried an NVMe slog (SM953, 1000+ MB/s writes), sync=disabled, and re-compiling a kernel with the sync behavior modified (I'm sure you've all seen it); it's all crap. ZFS just sucks as a VM datastore in ESXi.

    Does Napp-it overcome that slow write issue to NFS datastores in some other way? Should I just ditch ZFS and run XFS on mdadm?
  2. cheezehead

    cheezehead Active Member

    Sep 23, 2012
    With ESXi, were you using VT-d to pass through the HBA to your FreeNAS/FreeBSD VM? Also, were you using the VMXNET3 adapter? If either is set wrong, I've seen some pretty bad speeds.

    Also, what kind of write speed variations are you seeing between installs?
  3. AveryFreeman

    AveryFreeman ESXi + ( ILLUMOS / ZFS ) = HAPPY

    Mar 17, 2017
    Oh yeah, of course - sorry, I thought all that was a given. Most recent was an LSI 9211-8i controller with an HGST He8 8TB mirror; the connection was vmxnet3 on an internal VM network (no physical NIC, just for use between VMs). MTU was 9000.

    I mean, it's well known that ESXi is slow AF when writing to ZFS NFS datastores due to the O_SYNC flag, which is what I tried to disable by modifying the kernel: commenting out this portion of nfs_nfsdport.c and compiling/installing the kernel (and rebooting, of course):

    vi /usr/src/sys/fs/nfsserver/nfs_nfsdport.c

    /* Patch: always set ioflags to IO_NODELOCKED alone, dropping IO_SYNC
       so the server treats every write as unstable (async). */
    // if (stable == NFSWRITE_UNSTABLE)
    ioflags = IO_NODELOCKED;
    // else
    //         ioflags = (IO_SYNC | IO_NODELOCKED);
    uiop->uio_resid = retlen;
    uiop->uio_rw = UIO_WRITE;

    The original source is: Technical Musings: Speeding up FreeBSD's NFS on ZFS for ESX clients

    I don't know if it's just too old and doesn't work on current versions of FreeBSD; 7 years is quite a long time in the tech world.

    But what I really want to know (before I just try it for myself) is whether napp-it has these same issues with IO_SYNC - is it as stringent when writing to NFS datastores as FreeBSD, or have the developers mitigated this issue by default?

    Really crap: in CDM, like 100 MB/s Sequential Q32T1 and sub-MB/s speeds in 4K Q32T1. I can't find any of the screenshots ATM, but I tested KVM w/ ZFS shortly thereafter and thought this was a huge improvement:

    Attached Files:

  4. gea

    gea Well-Known Member

    Dec 31, 2010
    There are four aspects

    1. ZFS
    ESXi wants to do sync writes over NFS. So when your NFS-shared filesystem has sync=standard (the default) or sync=always, ESXi does secure sync writes that are much slower than async writes. To check this, set sync=disabled and recheck performance. If writes are much faster then, add an Intel Optane (800P or better) as an Slog.
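
    As a minimal sketch of that check, assuming a pool named tank with an NFS-shared filesystem tank/nfs (both names are placeholders for your own layout):

    ```shell
    # Check the current sync setting on the NFS-shared filesystem
    zfs get sync tank/nfs

    # Temporarily disable sync writes, then re-run the benchmark from ESXi
    zfs set sync=disabled tank/nfs

    # If async was much faster, restore sync and add an Slog device
    zfs set sync=standard tank/nfs
    zpool add tank log c2t1d0    # device name is an example only
    ```

    Note that the on-disk property value is sync=standard; "sync=default" in the post above refers to this standard setting, where the client decides.
    
    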

    2. Settings
    Most defaults are optimized for 1G networks. For faster networks, increase the TCP, vmxnet3s, and NFS buffers (in napp-it, see menu System > Tunings).

    3. RAM
    ZFS uses a RAM-based write cache (Open-ZFS default: 10% of RAM, up to 4GB) to transform small random writes into large sequential ones. Performance on ZFS is RAM-related, so add as much RAM as you can and redo the test. With low RAM, Solarish is often faster than BSD or Linux, as the ZFS-internal RAM management is (still) Solaris-based. FreeNAS itself (the management GUI) is also known to be quite memory-hungry, more so than other GUIs like Nas4Free or napp-it.

    For the above, see the benchmarks on the effect of an Optane Slog and of RAM on performance.

    4. OS
    ZFS was developed on Solaris, and quite often Solaris is the fastest ZFS filer, followed by the Solaris forks, followed by the other implementations.

    ESXi-internal transfers happen in software. MTU settings are only relevant on real Ethernet.
    Last edited: Jul 30, 2018
  5. AveryFreeman

    AveryFreeman ESXi + ( ILLUMOS / ZFS ) = HAPPY

    Mar 17, 2017
    Wow, you're a wellspring of knowledge, I really appreciate it.

    I don't think I can afford an Optane 800P - what about an ioDrive? My M.2 SM953 isn't fast enough? Or a 970 Pro...? Done any tests? Samsung consumer drives have pretty amazing MTBF these days...

    Oh, and how big should I make it? I've read in some places it doesn't really matter how large; maybe carve off a 24GB partition and massively over-provision the rest... does that sound right?
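
    The "doesn't really matter how large" intuition can be made concrete: an Slog only has to hold the writes accumulated between transaction-group commits (Open-ZFS flushes roughly every 5 seconds by default), times two for the txg currently being committed. A back-of-the-envelope sketch, assuming a saturated 10GbE link as the worst-case ingest rate (numbers are illustrative):

    ```shell
    LINK_GBIT=10      # worst-case ingest: a saturated 10GbE link
    TXG_SECONDS=5     # Open-ZFS default txg commit interval
    # GB needed = (Gbit/s / 8 bits per byte) * seconds * 2 txgs in flight
    SLOG_GB=$(awk -v g="$LINK_GBIT" -v t="$TXG_SECONDS" \
        'BEGIN { printf "%.1f", g / 8 * t * 2 }')
    echo "suggested SLOG size: ${SLOG_GB} GB"
    # prints: suggested SLOG size: 12.5 GB
    ```

    So a 24GB partition on a heavily over-provisioned device leaves comfortable headroom even for 10GbE.
    
    
    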

    What are some typical NFS buffer settings? Sorry, I've never worked with NFS buffers.

    I agree, FreeNAS is what I started with to try and make things easier but it had so many features I didn't need I switched to FreeBSD and haven't looked back...

    Will do!

    Yeah, I tried Napp-it on Solaris 11 for a while and it seemed to work well, but I got scared about having a pool that wasn't using OpenZFS for future compatibility reasons.

    Oh, good to know! I won't bother setting it going forward then.

    There's one other aspect I wanted to know about that I ran across on a CentOS NFS page doing something totally unrelated last night - sync write on request

    18.6. NFS Server Configuration

    4th paragraph down from fig 18.4:

    Sync write operations on request — Enabled by default, this option does not allow the server to reply to requests before the changes made by the request are written to the disk. This option corresponds to sync. If this is not selected, the async option is used.

    • Force sync of write operations immediately — Do not delay writing to disk. This option corresponds to no_wdelay.

    Can I safely disable this feature when using 10GbE? It sounds like it has a deleterious effect on throughput...

    Thanks for all your help... reading benchmark test PDF right now...
  6. gea

    gea Well-Known Member

    Dec 31, 2010
    Using consumer SSDs for an Slog?

    This is quite a bad idea, and only better than using sync writes with no Slog at all on slow disks, in that it improves sync write performance and data security a little. Consumer SSDs lack powerloss protection, and therefore you cannot use them to fully protect the content of the RAM-based write cache. They are also optimized for high sequential values and quite bad regarding low latency and steady writes: their write iops go down after some time of writing.

    Size-wise, an Slog can be quite small. The best Slog in the past was the DRAM-based ZeusRAM, with a size of 8GB.

    For NFS tuning, see Chapter 3, NFS Tunable Parameters (Solaris Tunable Parameters Reference Manual).
    You can set them manually in /etc/system, or use menu System > Appliance tuning, where you can set new values and see the defaults.
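
    A sketch of what such tuning might look like on an illumos-based filer; the values here are illustrative assumptions for a 10GbE setup, not napp-it's shipped defaults, so check the Appliance tuning menu before applying anything:

    ```shell
    # TCP buffers (illumos, run as root; takes effect immediately)
    ipadm set-prop -p max_buf=4194304 tcp
    ipadm set-prop -p send_buf=1048576 tcp
    ipadm set-prop -p recv_buf=1048576 tcp

    # /etc/system additions (take effect after reboot), e.g. a larger
    # NFSv3 transfer size as documented in the Solaris Tunable
    # Parameters Reference Manual:
    # set nfs:nfs3_max_transfer_size=1048576
    # set nfs:nfs3_bsize=1048576
    ```
    
    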

    About Solaris ZFS
    Sadly, Oracle decided to close its development, so now we have to choose between native Oracle ZFS and Open-ZFS.
    Solaris ZFS v44 is, from a purely technical view, the more advanced solution over Open-ZFS regarding features and performance, but it is neither open nor free, with bugfixes available only via subscription. A move between the two requires a backup, pool destroy, and restore.

    About sync write on ZFS
    On ZFS this is a property of a filesystem.
    With sync=standard (the default), a client like ESXi decides; otherwise you can force it with sync=always or sync=disabled.
  7. rune-san

    rune-san Member

    Feb 7, 2014
    An SM953 is actually an NVMe drive with PLP. It would work as a SLOG device, but it's way larger than it needs to be for SLOG, and I would argue it would be better used as all-flash storage than as a SLOG device for a larger pool. But if it's what you have, then it's hard to beat reusing idle hardware.

    Stay away from the 970 Pro. Even with "Pro" in the name, it is very much a consumer drive with no PLP. Its latency is all over the place, and as a result it makes a poor choice for SLOG.
  8. J-san

    J-san Member

    Nov 27, 2014
    I think you've already moved past this, but I thought I would add that hacking or tweaking settings to defeat Sync writes is just shooting yourself in the foot.

    This is a very bad idea: ESXi requests sync writes to NFS stores to ensure data integrity for the VMDKs. You could end up with corrupted guest filesystems if there's a power loss or a crash of the ESXi server.

    Also, on a different note, it does appear that MTU settings still affect read/write speeds even over VMXNET3 virtual NICs and switches, so do still set those to MTU 9000:
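
    Applying that on both ends might look like the following; the link name vmxnet3s0 and the vSwitch name vSwitch1 are placeholders for whatever your setup uses:

    ```shell
    # Inside the OmniOS/napp-it VM: raise the vmxnet3 link MTU
    dladm set-linkprop -p mtu=9000 vmxnet3s0

    # On the ESXi host (ESXi shell): raise the vSwitch MTU to match
    esxcfg-vswitch -m 9000 vSwitch1
    ```
    
    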