FreeBSD/FreeNAS vs OmniOS/Napp-it write speeds when used as ESXi NFS VM datastore?

AveryFreeman

ESXi + ( ILLUMOS / ZFS ) = HAPPY
Mar 17, 2017
165
19
18
38
Near Seattle
averyfreeman.com
Hey,

So I've built a couple of all-in-one ESXi boxes with either FreeNAS or FreeBSD, with ZFS pools passed back to ESXi as NFS datastores. I tried to run VMs on them, but benchmarks exposed gawd-awful write speeds, totally unusable. I just ended up using them for ISO storage, (slow) backups, content libraries, etc., so not a total bust, but still.

I ran KVM for a while on a box with a ZFS datastore and that was much better, but I want to use ESXi ... is there something else that works better? I was wondering if Napp-it avoids this awful sync issue FreeBSD/FreeNAS has with ZFS writes. I have tried an NVMe slog (SM953 with 1000+ MB/s writes), sync=disabled, and re-compiling a kernel with sync modified (I'm sure you've all seen it). It's all crap. ZFS just sucks as a VM datastore for ESXi.

Does Napp-it overcome that slow write issue to NFS datastores in some other way? Should I just ditch ZFS and run XFS on mdadm?
 

cheezehead

Active Member
Sep 23, 2012
711
174
43
WI
With ESXi were you using VT-d to passthrough the HBA to your FreeNAS/FreeBSD VM? Also, were you using the VMXNet3 adapter? If either is set wrong I've seen some pretty bad speeds.

Also, what kind of write speed variations are you seeing between installs?
 

AveryFreeman

cheezehead said:
With ESXi were you using VT-d to passthrough the HBA to your FreeNAS/FreeBSD VM? Also, were you using the VMXNet3 adapter? If either is set wrong I've seen some pretty bad speeds.

Oh yeah, of course - sorry, I thought all that was just a given. Most recent was LSI 9211-8i controller w/ HGST He8 8TB mirror, the connection was vmxnet3 on an internal VM network (no physical nic, just for use between VMs). MTU was 9000.

I mean, it's well known that ESXi is slow AF when writing to ZFS NFS datastores due to the O_SYNC flag, which is what I tried to disable by modifying the kernel: commenting out this portion of nfs_nfsdport.c, then compiling and installing the kernel (and rebooting, of course):

vi /usr/src/sys/fs/nfsserver/nfs_nfsdport.c

Code:
/* Hack: treat every NFS write as unstable so IO_SYNC is never set. */
// if (stable == NFSWRITE_UNSTABLE)
	ioflags = IO_NODELOCKED;
// else
//	ioflags = (IO_SYNC | IO_NODELOCKED);
uiop->uio_resid = retlen;
uiop->uio_rw = UIO_WRITE;

The original source is: Technical Musings: Speeding up FreeBSD's NFS on ZFS for ESX clients

I don't know if it's just too old and doesn't work on current versions of FreeBSD. I mean, 7 years is quite a long time in the tech world.

But what I really want to know (before I just try it for myself) is whether Napp-it has these same issues with IO_SYNC. Is it as stringent as FreeBSD when writing to NFS datastores, or have the developers mitigated this issue by default?

cheezehead said:
Also, what kind of write speed variations are you seeing between installs?

Really crap: in CDM, around 100MB/s Sequential Q32T1 and sub-MB/s speeds in 4K Q32T1. I can't find any of the screenshots ATM, but I tested KVM w/ ZFS shortly thereafter and thought this was a huge improvement:
 

Attachments

gea

Well-Known Member
Dec 31, 2010
2,481
835
113
DE
There are four aspects

1. ZFS
ESXi wants to do sync writes on NFS. So when your NFS-shared filesystem has sync=standard (the default) or sync=always, ESXi does secure sync writes that are much slower than async writes. To check this, set sync to disabled and recheck performance. If writes are much faster then, add an Intel Optane (800P or better) as an Slog.

2. Settings
Most defaults are optimized for 1G networks. For faster networks, increase the TCP, vmxnet3s and NFS buffers (in napp-it, see menu System > Tunings).

3. RAM
ZFS uses a RAM-based write cache (Open-ZFS default: 10% of RAM, up to 4GB) to transform small random writes into large sequential ones. Performance on ZFS is RAM related. Add as much RAM as you can and redo the test. With low RAM, Solarish is often faster than BSD or Linux, as the ZFS internal RAM management is (still) Solaris based. FreeNAS itself (the management GUI) is also known to be quite memory hungry, more than other GUIs like Nas4Free or napp-it.

For the above, see http://openzfs.hfg-gmuend.de/napp-it_manuals/optane_slog_pool_performane.pdf for effects of an Optane Slog or RAM vs performance

4. OS
ZFS was developed on Solaris, and quite often Solaris is the fastest ZFS filer, followed by the Solaris forks, followed by other implementations.

btw
ESXi internal transfers are in software. MTU settings are only relevant on real ethernet.
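
To put point 3 in numbers, here is a minimal sketch of the default write cache size, assuming nothing more than the rule stated above (10% of RAM, capped at 4GB). The function name is made up for illustration, and the real tunables behind this default are not shown.

```python
GIB = 1024 ** 3


def default_write_cache_bytes(ram_bytes: int) -> int:
    """Rule of thumb from the post: 10% of RAM, capped at 4 GiB."""
    return min(ram_bytes // 10, 4 * GIB)


if __name__ == "__main__":
    # Print the resulting cache size for a few typical storage-VM RAM sizes.
    for ram_gib in (8, 16, 32, 64, 128):
        cache = default_write_cache_bytes(ram_gib * GIB)
        print(f"{ram_gib:>3} GiB RAM -> {cache / GIB:.1f} GiB write cache")
```

Under this rule the cap is only reached at 40 GiB of RAM, so a small storage VM runs with a write cache well below 4GB.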
 

AveryFreeman

Wow, you're a wellspring of knowledge, I really appreciate it.

gea said:
There are four aspects

1. ZFS
ESXi wants to do sync writes on NFS. So when your NFS-shared filesystem has sync=standard (the default) or sync=always, ESXi does secure sync writes that are much slower than async writes. To check this, set sync to disabled and recheck performance. If writes are much faster then, add an Intel Optane (800P or better) as an Slog.

I don't think I can afford an Optane 800P - what about an IODrive? Is my M.2 SM953 not fast enough? Or an SM970 pro... ? Done any tests? Samsung consumer has pretty amazing MTBF these days...

Oh, and how big should I make it? I've read in some places that the size doesn't really matter, so maybe carve out a 24GB partition and massively over-provision the rest... does that sound right?

gea said:
2. Settings
Most defaults are optimized for 1G networks. For faster networks, increase the TCP, vmxnet3s and NFS buffers (in napp-it, see menu System > Tunings).

What are some typical NFS buffer settings? Sorry, I've never worked with NFS buffers.

gea said:
3. RAM
ZFS uses a RAM-based write cache (Open-ZFS default: 10% of RAM, up to 4GB) to transform small random writes into large sequential ones. Performance on ZFS is RAM related. Add as much RAM as you can and redo the test. With low RAM, Solarish is often faster than BSD or Linux, as the ZFS internal RAM management is (still) Solaris based. FreeNAS itself (the management GUI) is also known to be quite memory hungry, more than other GUIs like Nas4Free or napp-it.

I agree. FreeNAS is what I started with to try and make things easier, but it had so many features I didn't need that I switched to FreeBSD and haven't looked back...

gea said:
For the above, see http://openzfs.hfg-gmuend.de/napp-it_manuals/optane_slog_pool_performane.pdf for effects of an Optane Slog or RAM vs performance

Will do!

gea said:
4. OS
ZFS was developed on Solaris and quite often Solaris is the fastest ZFS filer followed by the Solaris forks followed by other implementations.

Yeah, I tried Napp-it on Solaris 11 for a while and it seemed to work well, but I got scared about having a pool that wasn't using OpenZFS for future compatibility reasons.

gea said:
btw
ESXi internal transfers are in software. MTU settings are only relevant on real ethernet.

Oh, good to know! I won't bother setting it going forward then.

There's one other aspect I wanted to know about, which I ran across on a CentOS NFS page while doing something totally unrelated last night: sync write on request.

18.6. NFS Server Configuration

4th paragraph down from fig 18.4:

Sync write operations on request: Enabled by default, this option does not allow the server to reply to requests before the changes made by the request are written to the disk. This option corresponds to sync. If this is not selected, the async option is used.

  • Force sync of write operations immediately — Do not delay writing to disk. This option corresponds to no_wdelay.

Can I safely disable this feature when using 10GbE? It sounds like it has a deleterious effect on throughput...

Thanks for all your help... reading benchmark test PDF right now...
 

gea

Using consumer SSDs for an Slog?

This is quite a bad idea and only better than using sync writes without any Slog on slow disks, as it improves sync write performance and data security a little. They lack power-loss protection, and therefore you cannot use them to fully protect the content of the RAM-based write cache. They are also optimized for high sequential values and are quite bad regarding low latency and steady writes, where their write IOPS go down after some time of writing.

In terms of size, an Slog can be quite small. The best Slog in the past was a DRAM-based ZeusRAM with a size of 8GB.
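
As a rough way to reason about "quite small", here is a back-of-the-envelope sketch. It is only a rule of thumb under loud assumptions that are not from this thread: the pool flushes its write cache every 5 seconds, the Slog should hold two such intervals, and writes arrive at full network line rate. The function and parameter names are hypothetical.

```python
def slog_size_estimate_gb(link_gbit_per_s: float,
                          flush_interval_s: float = 5.0,
                          intervals_held: int = 2) -> float:
    """Upper bound, in GB, on sync-write data waiting to reach the pool.

    Assumptions (illustrative only): the write cache is flushed every
    `flush_interval_s` seconds and the Slog must hold `intervals_held`
    such intervals of writes arriving at full line rate.
    """
    bytes_per_s = link_gbit_per_s * 1e9 / 8
    return bytes_per_s * flush_interval_s * intervals_held / 1e9


if __name__ == "__main__":
    # Compare a few common link speeds.
    for link in (1, 10, 40):
        print(f"{link:>2} Gbit/s -> about {slog_size_estimate_gb(link):.2f} GB")
```

With these assumptions a 10 Gbit/s link comes out at roughly 12.5 GB, which is why a small partition on a large device is usually enough and the rest can stay unused.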


NFS tuning: see Chapter 3, NFS Tunable Parameters (Solaris Tunable Parameters Reference Manual).
You can set them in /etc/system manually or via menu System > Appliance tuning, where you can set new values and see the defaults.


About Solaris ZFS
Sadly, Oracle decided to close its development, so now we have to decide between native Oracle ZFS and Open-ZFS.
You can choose between Solaris ZFS v44 (from a purely technical view the more advanced solution regarding features and performance, but neither open nor free, and with bugfixes available only with a subscription) and Open-ZFS. A move between the two requires a backup, pool destroy and restore.


About sync write on ZFS
On ZFS this is a property of a filesystem.
With sync=standard (the default), a client like ESXi decides; otherwise you can force sync=always or sync=disabled.
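
The decision described above can be written down as a tiny truth table. This is only a model of the behaviour as explained in this post, not ZFS code; the function name is made up for illustration.

```python
def write_is_synchronous(sync_property: str, client_requests_sync: bool) -> bool:
    """Model of how the ZFS sync property decides a write's handling."""
    if sync_property == "always":
        return True                   # every write is treated as sync
    if sync_property == "disabled":
        return False                  # sync requests from the client are ignored
    if sync_property == "standard":
        return client_requests_sync   # the client (e.g. ESXi over NFS) decides
    raise ValueError(f"unknown sync value: {sync_property!r}")


if __name__ == "__main__":
    # Print the outcome for every combination of property and client request.
    for prop in ("standard", "always", "disabled"):
        for wants_sync in (True, False):
            result = write_is_synchronous(prop, wants_sync)
            print(f"sync={prop:<8} client sync={wants_sync!s:<5} -> sync write: {result}")
```

Since ESXi asks for sync on NFS, only sync=disabled turns its writes into async ones, and that is exactly the case where the RAM write cache is unprotected.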
 

rune-san

Member
Feb 7, 2014
78
15
8
An SM953 is actually an NVMe drive with PLP. It would work as a SLOG device, but it's way larger than it needs to be for SLOG, and I would argue it would be best used as all-flash storage rather than as a SLOG device for a larger pool. But if it's what you have, then it's hard to beat reusing unused hardware.

Stay away from the 970 Pro. Even with "Pro" in the name it is very much a consumer drive with no PLP. Its latency is all over the place, and as a result it makes a poor choice for SLOG.
 

J-san

Member
Nov 27, 2014
67
42
18
40
Vancouver, BC
AveryFreeman said:
Oh yeah, of course - sorry, I thought all that was just a given. Most recent was LSI 9211-8i controller w/ HGST He8 8TB mirror, the connection was vmxnet3 on an internal VM network (no physical nic, just for use between VMs). MTU was 9000.

I mean, it's well known that ESXi is slow AF when writing to ZFS NFS datastores due to the O_SYNC flag, which is what I tried to disable by modifying the kernel: commenting out this portion of nfs_nfsdport.c, then compiling and installing the kernel (and rebooting, of course):

<snip>

The original source is: Technical Musings: Speeding up FreeBSD's NFS on ZFS for ESX clients

I don't know if it's just too old and doesn't work on current versions of FreeBSD. I mean, 7 years is quite a long time in the tech world.

I think you've already moved past this, but I thought I would add that hacking or tweaking settings to defeat sync writes is just shooting yourself in the foot.

This is a very bad idea. ESXi requests sync writes to NFS stores to ensure data integrity for the VMDKs. You could end up with corrupted guest filesystems if there's a power loss or a crash on the storage server.
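
To see what a sync write costs and what it buys, here is a small sketch that times buffered writes against writes that are forced to stable storage after every block. It runs against a local temp file, not NFS or ZFS, so the absolute numbers say nothing about ESXi; the function name and block counts are arbitrary choices for illustration.

```python
import os
import tempfile
import time


def timed_writes(path: str, count: int, block: bytes, sync_each: bool) -> float:
    """Write `count` blocks to `path`, optionally forcing each one to disk."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(block)
            if sync_each:
                f.flush()             # push Python's buffer to the OS
                os.fsync(f.fileno())  # ask the OS to commit to stable storage
    return time.perf_counter() - start


if __name__ == "__main__":
    block = b"\0" * 4096
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "demo.bin")
        buffered = timed_writes(target, 200, block, sync_each=False)
        synced = timed_writes(target, 200, block, sync_each=True)
    print(f"buffered: {buffered:.4f}s  fsync per block: {synced:.4f}s")
```

In the buffered case the data may still be sitting in RAM when the function returns, which is the window in which a crash or power loss loses writes the guest believes are safe.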


Also, on a different note, it does appear that MTU settings still affect read/write speeds even over VMXNet3 virtual NICs/switches, so do still set those to MTU 9000:

https://forums.servethehome.com/ind...0-vs-9000mtu-open-vm-tools.23845/#post-222042