The ultimate ZFS ESXi datastore for the advanced single user (want, not have)


pzwahlen

New Member
Jun 24, 2021
First post on here, so hi everyone!

I have tried quite hard to build such a redundant storage solution for VMware and couldn't make it work with NFS on Linux. Failover would fail because of open handles, and network multipathing is not possible (AFAIK). I also tried NFSv4 with multiple sessions and couldn't get the Linux NFS server to play nicely with ESXi. My next option would have been nfs-ganesha.
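
For reference, the ESXi side of an NFS 4.1 mount over multiple server addresses (session trunking) looks roughly like the sketch below; the IPs, export path and datastore name are placeholders, not the actual setup from this thread.

Code:
# Mount an NFS 4.1 datastore over two server IPs (session trunking).
# Addresses, share path and datastore name are placeholders.
esxcli storage nfs41 add -H 10.0.10.11,10.0.10.12 -s /tank/vmware -v nfs41-zfs-ds01
# Verify the mount
esxcli storage nfs41 list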

I finally went down the iSCSI path with SCST and custom Pacemaker resource agents for ZFS pools and iSCSI targets (happy to share them).
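
For anyone curious, the Pacemaker side of such a setup can be sketched roughly as below with pcs. The resource agent names (ocf:heartbeat:ZFS for pool import/export, a systemd unit for SCST) and all resource, pool and IP names are assumptions for illustration, not the custom agents mentioned above.

Code:
# Floating portal IP for the iSCSI target (placeholder address)
pcs resource create san-ip ocf:heartbeat:IPaddr2 ip=10.0.20.10 cidr_netmask=24 \
    op monitor interval=15s
# ZFS pool import/export (agent name is an assumption)
pcs resource create tank-pool ocf:heartbeat:ZFS pool=tank \
    op monitor interval=30s
# SCST managed as a systemd resource (unit name is an assumption)
pcs resource create scst systemd:scst op monitor interval=30s
# A group implies ordering and colocation: import the pool, start SCST,
# then bring up the portal address, all on the same node.
pcs resource group add zfs-san tank-pool scst san-ip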

I also first tried exporting zvol block devices, but performance was abysmal. I finally went for plain files on ZFS datasets exported through fileio and will probably never look back.
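
A minimal sketch of what such a fileio export can look like in /etc/scst.conf is below; the device name, backing file path and IQN are placeholders, not the actual configuration from this thread.

Code:
# /etc/scst.conf - one file-backed LUN on a ZFS dataset (placeholder names)
HANDLER vdisk_fileio {
        DEVICE ds01 {
                filename /tank/vmware/ds01.img
                nv_cache 0        # honour cache-sync commands from ESXi
        }
}

TARGET_DRIVER iscsi {
        enabled 1
        TARGET iqn.2021-06.lab.san:zfs-ds01 {
                enabled 1
                LUN 0 ds01
        }
}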

Finally, network multipathing is trivial with iSCSI.
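
On the ESXi side that multipathing boils down to binding two vmkernel ports to the software iSCSI adapter and switching the LUN to round robin, roughly as below; the adapter name, vmk interfaces, portal address and naa. ID are placeholders.

Code:
# Bind two vmkernel ports (ideally on separate subnets/uplinks) to the
# software iSCSI adapter
esxcli iscsi networkportal add -A vmhba64 -n vmk1
esxcli iscsi networkportal add -A vmhba64 -n vmk2
# Point discovery at the target portal
esxcli iscsi adapter discovery sendtarget add -A vmhba64 -a 10.0.20.10:3260
# Use round robin across the paths for the LUN
esxcli storage nmp device set -d naa.60014050000000000000000000000001 -P VMW_PSP_RR
# Optionally switch paths every I/O instead of every 1000
esxcli storage nmp psp roundrobin deviceconfig set -d naa.60014050000000000000000000000001 -t iops -I 1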

I have been running this on CentOS 7 / ZFS 0.7.x / SCST 3.3 for years. I am now in the process of building the next iteration on Rocky Linux 8.4 / ZFS 2.0 (or 2.1) and the latest SCST.

I am therefore also looking for optimizations.
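
On that note, a few dataset properties commonly tuned for file-backed VM storage are sketched below; the dataset name and values are starting points to test against your own workload, not settings taken from this thread.

Code:
zfs create tank/vmware
zfs set recordsize=64K  tank/vmware   # closer to typical VM I/O than the 128K default
zfs set compression=lz4 tank/vmware
zfs set atime=off       tank/vmware
zfs set xattr=sa        tank/vmware
zfs set sync=standard   tank/vmware   # keep sync writes honest; consider a SLOG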

Cheers. Patrick
 

dswartz

Active Member
Jul 14, 2011
I have had the ewwhite HA/NFS Pacemaker cluster working fine for a while now, on CentOS 8.
 

Bjorn Smith

Well-Known Member
Sep 3, 2019
Just to chip in.

My experience is:
No matter how fast you make your ZFS datastores, ESXi will never be able to use all of that speed. Even with a datastore capable of pushing 4 GB/s of sync writes, and a network that can support that speed, ESXi will never use all of that bandwidth for a single operation such as a vMotion.

I think the reason is simply that ESXi is meant to keep performing while you vMotion. If it sucked the life out of the storage system just to make the vMotion faster, everything else would suffer at the same time, and ESXi would not be doing its job very well. So I think ESXi has some kind of maximum bandwidth it is willing to use for a single workload, simply to prevent one workload from hogging all of the available performance.