how'd you improve zfs performance?

realtomatoes · Oct 20, 2016

i got freenas running on a 2 vcpu 8gb vm on esxi with passthrough to the intel sata ports (2x 6gbps and 4x3gbps). it currently has 4x1tb 7200rpm drives in raid10. it serves the shared datastore to my vmwre lab via iscsi.

since i only got 2 extra sata ports to expand and improve the performance, here's where i'd like your thoughts on:
1) what's the simplest way to improve performance?
2) what's the best way to improve performance
3) what's the most cost-effective way to improve performance?

spazoid · Oct 20, 2016

1) Add RAM for ARC and/or SSD for L2ARC
2) Add as much RAM as you can afford
3) Add an SSD for L2ARC

For all of the above, it depends on your workload. I actually removed an L2ARC SSD from a pool that only had media files and felt improved performance, but for a VM datastore, an L2ARC device will probably help performance quite a bit. Keep in mind that L2ARC uses 180 bytes of RAM for every entry, so L2ARC devices will consume additional RAM that can't be used for ARC data. Depending on your average block size in L2ARC, you'd only be able to have about 130-140 GB of L2ARC with 8 GB of RAM.

chilipepperz · Oct 20, 2016

Obvious question, read or write performance?

maybe use arcstats

realtomatoes · Oct 20, 2016

chilipepperz said:
Obvious question, read or write performance?

maybe use arcstats

write performance. dd rated the read at about ~800MB/s and the write at about ~180MB/s. figured getting that write speed up is what i need to work on.

ttabbal · Oct 20, 2016

Try testing with sync disabled. If it makes a big difference, fast SSD SLOG will help when you turn sync back on.

dswartz · Oct 21, 2016

Not so much. I have a 6x2 raid10 pool with a zeusram SLOG. NFS from vsphere (so sync is forced on.) Reads max out at about 700MB/s, but writes are at most 150MB/s. If I disable sync, writes go up to about 400MB/s. At one point, I even tested with an 8GB RAM disk as SLOG, and it made no difference, so that exonerates the Zeusram, the PCI-E bus, the HBA, etc... I can only assume some kind of ZIL throttling issue. I've posted this in multiple forums and mailing lists, to no avail. It *might* be ZoL issue #1012. Read it here: Large synchronous writes are slow when a slog is present · Issue #1012 · zfsonlinux/zfs · GitHub. Note that the author claims this is probably in all ZFS implementations...

realtomatoes · Oct 21, 2016

dswartz said:
Not so much. I have a 6x2 raid10 pool with a zeusram SLOG. NFS from vsphere (so sync is forced on.) Reads max out at about 700MB/s, but writes are at most 150MB/s. If I disable sync, writes go up to about 400MB/s. At one point, I even tested with an 8GB RAM disk as SLOG, and it made no difference, so that exonerates the Zeusram, the PCI-E bus, the HBA, etc... I can only assume some kind of ZIL throttling issue. I've posted this in multiple forums and mailing lists, to no avail. It *might* be ZoL issue #1012. Read it here: Large synchronous writes are slow when a slog is present · Issue #1012 · zfsonlinux/zfs · GitHub. Note that the author claims this is probably in all ZFS implementations...

thanks. that was the same behavior i noticed when running nfs for vsphere. moving to iscsi improved it so i guess i'm stick with that.

dswartz · Oct 21, 2016

How much better did it get? I get the impression with iSCSI, the only time writes are synchronous is when the virtual scsi controller issues some kind of flush. I am with NFS now because it was simpler, but if I can get a big win, I'll probably switch...

realtomatoes · Oct 21, 2016

nfs to iscsi was a perf leap for me ~45MB/s to 100Mb/s. i got no dedicated zil just 4 7200rpm drives in raid10. i got 2 more sata ports i can use which makes me wonder if the best way to get more performance out of the rig is to add another mirrored vdev or to get a zil.

of course, your rig is beefier and you got that god-like ssd (zeusram ;D) so not sure how much perf you'd get from running on iscsi.

jgreco · Oct 22, 2016

dswartz said:
How much better did it get? I get the impression with iSCSI, the only time writes are synchronous is when the virtual scsi controller issues some kind of flush. I am with NFS now because it was simpler, but if I can get a big win, I'll probably switch...

iSCSI is basically async by default, while (at least for ESXi) everything written via NFS is sync.

An iSCSI datastore used for VM storage needs to be set to sync if you care about VM disk consistency in the event of your filer crashing or other similar problems. On the flip side, you can disable sync writes for an NFS datastore if you do not care about VM disk consistency in the (generally unusual) event of a filer crashing.

Once you wrap your head around this, and you level the playing field by either forcing async on or off, it turns out that NFS and iSCSI perform very similarly, though not identically.

Adding a SLOG will always significantly lower apparent pool write performance, because there's a significant amount of additional activity to synchronously write the data to SLOG.

Having observed people wrestle with this for years, the questions you should probably ask yourself are:

"If the filer I'm storing VM's on crashes, am I okay if the underlying VM's get corrupted? Is my VM filer protected by a UPS? Does it have ECC memory? If a VM gets corrupted, do I have backups I can restore from? Can I resist the urge to be constantly updating the filer's firmware, and settle for long term stability on a known good version?"

If you can answer yes to these questions and others like it, your chances of hitting a bad VM corruption due to a crash/power loss/etc are significantly lowered, and you could consider running in async mode.

If you're setting up an enterprise grade storage system, on the other hand, ... you need the sync writes, regardless of whether you're using iSCSI or NFS.

Highly suggested reading:

Sync writes, or: Why is my ESXi NFS so slow, and why is iSCSI faster?

realtomatoes · Oct 23, 2016

@jgreco

good read. thanks.

yes, i'll get a ups and sort out an auto-shutdown procedure for the infra when the power goes out. yes, am not likely to keep upgrading firmware or software versions unless necessary. and yes, i can see how a slog can slow down zfs.
i definitely agree about configuration and tuning varying on what workload one runs.

thanks for the input. definitely, learning a lot while tinkering with zfs.

T_Minus · Oct 23, 2016

After spending a lot of time and money playing... err... experimenting with ZFS I came to the conclusion that for hosting "tons of VMs" for work they had absolutely 0 reason to be redundant, archived , mirrored or anything that cost $/time to make sure one VM wasn't "down" or had a very fast recovery. Seeing as how the VMs perform a certain task and we keep a handful of variations updated to 'roll out' as-needed along with the data being processed already in a database, archived, backed up, mirrored, etc... trickling down that type of redundancy to the VM level was simply a waste of space and headache to get as much performance as possible.

Now, for business operations that keep the business running (databases, general storage, etc...) this is where I've focused my ZFS testing, configurations, and priority in terms of top quality parts, having mirrored data on other nodes, etc... As long as these crucial pieces stay "online" the rest of the bunch of VMs that use this data could die, their host could die, and just plug a new one in or power up another chassis and away we go again.

Sorry to ramble

just the approach I ended up taking as to save money, my sanity and still get best bang for the buck.

I just got my new chassis yesterday so this week aiming to test new ZeusRAM cable, 12Gb/s SAS SSD, and a P3600 and p3700 NVME on baremetal OmniOS/Napp-IT and see how they do

Rather excited.

fractal · Oct 30, 2016

realtomatoes said:
write performance. dd rated the read at about ~800MB/s and the write at about ~180MB/s. figured getting that write speed up is what i need to work on.

How accurately does "dd" replicate your normal work flow? Does your normal workflow involve a lot of sequential writes?

I looked at putting cache ahead of four 1/0 7200's that were hosting a couple dozen VM's and came to the conclusion that it was a waste of time. I briefly tested faster rust but didn't like the heat / noise. So I replaced my HDDs with SSDs and am happy. I just saw 800gb SSD's in the deals section pretty cheap. 4 of them in a raidz1 or 6 of them in raid10 is more capacity than your 4 x 1tb's in raid10 and more speed and a LOT less latency.

Otherwise, the obligatory -- MORE RAM is probably in order.

Jaesii · Nov 19, 2016

I currently use my FreeNAS box in my home ESXi environment and I get pretty decent performance using iSCSI.

FreeNAS config.
SuperMicro X8DTL-3F
Dual Xeon L5630 4C 8T (8C 16T)
30GB DDR3 ECC
Intel x520-DA2 10GbE CNA Dual Port
Mellanox MNPA19-XTR 10GbE Single Port
8x Hitachi Ultrastar Enterprise 2TB 7.2k SATAIII RAID Z1
2x Silicon Power 120GB SSD TLC L2ARC
2x OCZ Deneva2 Enterprise 30GB SLC ZIL
Autotune enabled in advanced options.

FreeNAS is directly attached to my HP DL380 G7 via two Twinax cables.

Each nic is a seperate vswitch in vmware dedicated for iSCSI traffic.
iSCSI port binding is enabled on the two vmkernels.
Both on FreeNAS side and vmware side, I have MTU set to 9000.
In the datastore, I modified the path settings and changed it to Round-Robbin.
I also enabled Storage I/O control on the FreeNAS datastore within vmware.

I have 19 VMs running on my FreeNAS datastore, and my latency never goes above 40ms.

Here's a quick crystal disk mark test from a VM running on my FreeNAS.

And for comparison, here's a test running on a datastore comprised of 8x HP 300GB 10K SAS disks in a Raid 5.

As good of performance that I get out of my FreeNAS, I probably would never use one in a production enterprise environment. I'd stick with an HP MSA SAN direct attached with SAS cables.

T_Minus · Nov 19, 2016

Sorry no new tests / updates my chassis are getting 're-worked' from being damaged so bad in shipping. Ugh

Search

how'd you improve zfs performance?

realtomatoes

Active Member

spazoid

Member

chilipepperz

Active Member

realtomatoes

Active Member

ttabbal

Active Member

dswartz

Active Member

realtomatoes

Active Member

dswartz

Active Member

realtomatoes

Active Member

jgreco

New Member

realtomatoes

Active Member

T_Minus

Build. Break. Fix. Repeat

fractal

Active Member

Jaesii

New Member

T_Minus

Build. Break. Fix. Repeat