Please post some of your VMware -> SAN latency graphs (ideally with the related IOPS / disk load; Proxmox data will work too). In my case I'm using VMware -> TrueNAS over NFS, so I'm looking for any sync-heavy loads that also measure latency. I'm posting some of mine below. (Honestly, I'd appreciate any kind of TrueNAS pool latency graphs/data.) NB: my TrueNAS is not virtualized, it's physical (specs below in the spoiler).
Why:
I've been trying to troubleshoot TrueNAS (or ZFS) "cross-pool latency issues", mostly around NFS: with VMware -> TrueNAS NFS datastores, if an HDD-backed pool is getting hit hard, my other NVMe and SSD pools spike to unreasonably high latency from the VMware hosts' point of view (all pools are Optane 900P SLOG-backed).
For example, every night two NVR (video) VMs move about 80 GB of video from SSD pools to HDD pools (via NFS-backed VMware disks). The latency spikes have forced me to move some of the VMs' OS boot disks to host direct-attached storage, because services were failing and some guest OSes were rebooting due to I/O timeouts.
(pool info at bottom)
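In case it helps anyone reproduce or quantify this, here is a minimal sync-write latency probe; the NFS mount path, block size, and sample count are placeholders, not my actual layout. Run against a file on the fast pool's NFS share while the HDD pool is busy, it shows whether the fast pool's sync-write latency moves at the same time:

```python
#!/usr/bin/env python3
"""Tiny O_SYNC write-latency probe for an NFS mount.

The target path, block size, and sample count are placeholders: point
TARGET at a file on the fast pool's NFS share and run it while the HDD
pool is under load.
"""
import os
import time

TARGET = "/mnt/nvme-datastore/latency-probe.bin"  # hypothetical NFS mount path
BLOCK = b"\0" * 4096                              # 4 KiB per synchronous write
SAMPLES = 60                                      # one write per second for a minute
PAUSE = 1.0

# O_SYNC forces every write to be stable on the server before it returns,
# so each sample measures end-to-end sync-write latency for this share.
fd = os.open(TARGET, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o644)
try:
    for _ in range(SAMPLES):
        start = time.monotonic()
        os.pwrite(fd, BLOCK, 0)
        ms = (time.monotonic() - start) * 1000
        print(f"{time.strftime('%H:%M:%S')}  sync 4K write: {ms:8.2f} ms", flush=True)
        time.sleep(PAUSE)
finally:
    os.close(fd)
    os.unlink(TARGET)
```

Because the file is opened with O_SYNC, every write has to be acknowledged as stable by the server, which is closer to how ESXi issues its NFS writes than a buffered copy would be.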
The two VM hosts have 25 Gb networking; the TrueNAS box (physical) has 100 Gb networking (I'm aware 10 Gb is more than enough). I do push a fairly high IOPS load from VMware to TrueNAS, so perhaps what I'm seeing is to be expected; I just need some comparisons or points of reference from the community. Below are my own data points and the kind of thing I'm hoping others will post so I can compare (and I'm happy to answer any questions, even unrelated ones).
I have done a crazy amount of troubleshooting and testing, and this has been an issue across two entirely different sets of TrueNAS hardware over the years. So:
There is a decent chance that I'm simply stressing this SAN/ZFS system and that what I'm seeing is normal/expected for ZFS (and is why businesses/enterprises pay so much in recurring costs for SANs, or for something like TrueNAS Enterprise). But that is exactly why I'm hoping to see latency data from others.
I've covered all the low-hanging fruit, like:
- watching gstat to be sure it's not disk / HBA saturation (same with system load average); see the per-pool latency sampling sketch after this list
- networking: separate VLANs for NFS / vMotion, jumbo frames (verified end to end with do-not-fragment pings), and iperf showing 10 Gbit+ in both directions
- ZFS: a good SLOG, SAS3 SSDs in mirrors, and a 4-drive NVMe mirror
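For anyone who wants to post comparable numbers, here is a rough sketch of how per-pool latency can be logged on the TrueNAS side with zpool iostat. It assumes an OpenZFS build whose zpool iostat supports -H, -p and -l; the pool names and column positions below are assumptions, so check the -l column headers on your system first:

```python
#!/usr/bin/env python3
"""Log per-pool latency from `zpool iostat -Hpl` at a fixed interval.

Assumptions: an OpenZFS `zpool iostat` that supports -H (scripted output),
-p (exact values, latencies in nanoseconds) and -l (latency columns), and
the column order noted below. Pool names are placeholders.
"""
import subprocess
import time

POOLS = ["nvme-pool", "ssd-pool", "hdd-pool"]   # hypothetical pool names
INTERVAL = 5                                    # seconds between reports

# Assumed 0-based field positions in `zpool iostat -Hpl` output:
# 0 name, 1 alloc, 2 free, 3/4 ops r/w, 5/6 bandwidth r/w, 7/8 total_wait r/w
TOTAL_WAIT_R, TOTAL_WAIT_W = 7, 8

def ns_to_ms(field: str) -> float:
    """Convert a nanosecond field to milliseconds; '-' means no I/O."""
    return 0.0 if field == "-" else int(field) / 1e6

proc = subprocess.Popen(
    ["zpool", "iostat", "-Hpl"] + POOLS + [str(INTERVAL)],
    stdout=subprocess.PIPE, text=True,
)

seen = 0
for line in proc.stdout:
    fields = line.rstrip("\n").split("\t")
    if len(fields) <= TOTAL_WAIT_W:
        continue                        # skip blank/short lines defensively
    seen += 1
    if seen <= len(POOLS):              # first report is a since-boot average
        continue
    print("%s  %-10s  read %8.2f ms  write %8.2f ms" % (
        time.strftime("%H:%M:%S"),
        fields[0],
        ns_to_ms(fields[TOTAL_WAIT_R]),
        ns_to_ms(fields[TOTAL_WAIT_W]),
    ), flush=True)
```

zpool iostat's first report is a since-boot average, so the sketch skips it; total_wait is the end-to-end latency ZFS sees per pool, i.e. the server-side component of what the ESXi hosts report as datastore latency.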
Thank you for your time.
Pool / TrueNAS system info (screenshots showing load / latency spikes at the bottom):
OS: TrueNAS CORE 13.0-U6 (stable); boot volume is a mirror of 60 GB Intel 520 SSDs
MB: Supermicro H11SSL-NC (AMD)
CPU: AMD EPYC 7262 (8-core)
RAM: 8x 32 GB DDR4 ECC (256 GB)
HBAs: 2x LSI SAS3008 SAS3 HBAs to Supermicro SAS3 expander backplanes
NIC: 100 Gb to a Netgear M4500 switch
SLOG: 280 GB Intel Optane 900P
DISKS / POOLS:
16x 16 TB HGST SAS 7200 RPM (8 disks per vdev, 2x RAIDZ2 vdevs)
10x 1.6 TB HGST SAS3 SSD (5-way mirror)
4x 1.6 TB Intel P3605 NVMe (2-way mirror)
(6x screenshots showing pool load / latency spikes)