ESXi 6.5 with ZFS-backed NFS Datastore - Optane Latency AIO benchmarks


J-san

Doing some benchmarking on reducing latency in an all-in-one (AIO) setup: ESXi 6.5 to an OmniOS ZFS-backed NFS datastore.

The hardware is:
  • Supermicro X10DRi-T
  • 2 x Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
  • 128 GB RAM DDR4 (@ 1866 MHz - CPU-limited from 2133 MHz)
  • 3 x 9211-8i in HBA IT mode with P20.0.0.7 firmware
  • 1 x AOC-SLG3-2E4 (non R) HBA card to connect NVMe Optane
BIOS:
  • Updated to Version 3.1
  • Bifurcation enabled and set up on Slot 5 as 4x4
  • Slot 5 EFI Oprom -> Legacy
  • Power Settings
    • Custom -> C State Control
      • C6 (Retention) State, CPU C6 Report Enable, C1E Enable
    • Custom -> P State Control
      • EIST (P-States) Enable, Turbo Mode Enable, P-state Coord – HW_ALL, Boost perf mode Max performance
    • Custom -> T State Control
      • Enable
  • USB 3.0 Support -> Enabled (to support USB 3.0 key booting)
I've assigned the following to the OmniOS VM (a sketch of the matching .vmx entries follows this list):
  • 4 x vCPUs
  • 59392 MB RAM (all reserved)
  • 3 x 9211-8i in passthrough
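For reference, a minimal sketch of the relevant .vmx entries. The values come from the list above; the PCI passthrough address is a placeholder, and the inline comments are only for readability (ESXi won't preserve them in the file):
Code:
numvcpus = "4"                        # 4 vCPUs
memSize = "59392"                     # 58 GB guest RAM
sched.mem.min = "59392"               # reserve/lock all guest memory,
sched.mem.pin = "TRUE"                #   required for PCI passthrough
pciPassthru0.present = "TRUE"         # one pciPassthruN block per 9211-8i HBA
pciPassthru0.id = "00000:001:00.0"    # placeholder PCI address
In practice the vSphere client writes these for you when you reserve all guest memory and add the PCI devices; hand-editing the .vmx isn't required.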
Disks:
  • 4 x Intel S4600 – 960GB
  • 4 x WD Gold 6TB
  • 1 x Intel S3700 - 200GB (slog)
  • 1 x Intel Optane P4800X - 375GB (slog)
NFS Sharing AIO setup (see the esxcli sketch after this list):
  • OmniOS VM with VMXNET3 adapter attached to Port group "storagenet"
  • ESXi VMkernel NIC vmk2 for NFS storage attached to own Port Group "storagevmk"
  • Both portgroups "storagenet" and "storagevmk" attached to separate vSwitch2
    • Not attached to any Physical NIC
    • vSwitch2 set to 9000MTU
  • OmniOS VM VMXNET3 adapter set to 9000MTU
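A rough esxcli sketch of that internal-only storage network. The vSwitch and port group names are the ones above; the vmk2 IP address and netmask are made-up examples:
Code:
# internal-only vSwitch with jumbo frames (no physical uplink is ever added)
esxcli network vswitch standard add --vswitch-name=vSwitch2
esxcli network vswitch standard set --vswitch-name=vSwitch2 --mtu=9000
# port group for the OmniOS VM's VMXNET3 adapter
esxcli network vswitch standard portgroup add --portgroup-name=storagenet --vswitch-name=vSwitch2
# port group plus VMkernel NIC for the ESXi NFS client side
esxcli network vswitch standard portgroup add --portgroup-name=storagevmk --vswitch-name=vSwitch2
esxcli network ip interface add --interface-name=vmk2 --portgroup-name=storagevmk --mtu=9000
esxcli network ip interface ipv4 set --interface-name=vmk2 --type=static --ipv4=192.168.100.1 --netmask=255.255.255.0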
OmniOS VM setup:
  • omnios-r151028
  • Intel Optane P4800X - 375GB (slog) - added as 30 GB VMDK on local Optane backed datastore.
  • NUMA affinity set to numa.nodeAffinity=1
    • To match Optane in Slot5 (CPU2) of Dual CPU MB
    • Results were about 300 MB/s slower on sequential transfers if ESXi flipped the OmniOS VM onto NUMA node 0 (CPU1)
  • /etc/system modified (a quick mdb sanity check follows the sd.conf block below):
Code:
* Thanks Gea!
* napp-it_tuning_begin:
* enable sata hotplug
set sata:sata_auto_online=1
* set disk timeout 15s (default 60s=0x3c)
set sd:sd_io_time=0xF
* increase NFS number of threads
set nfs:nfs3_max_threads=64
set nfs:nfs4_max_threads=64
* increase NFS read ahead count
set nfs:nfs3_nra=32
set nfs:nfs4_nra=32
* increase NFS maximum transfer size
set nfs3:max_transfer_size=1048576
set nfs4:max_transfer_size=1048576
* increase NFS logical block size
set nfs:nfs3_bsize=1048576
set nfs:nfs4_bsize=1048576
* tuning_end:
  • sd.conf modified:
Code:
# DISK tuning
# Set correct physical-block-size and non-volatile settings for SSDs
# S3500 - 480 GB
# S3700 - 100 + 200 + 400 GB
# S4600 - 480 + 960 GB
# Set fake physical-block-size for WD RE drives to set pools to ashift=12 (4k) so drives are replaceable by larger disks.
# WD RE4
# WD RE gold
sd-config-list=
"ATA     INTEL SSDSC2BB48", "physical-block-size:4096, cache-nonvolatile:true, throttle-max:32, disksort:false",
"ATA     INTEL SSDSC2BA10", "physical-block-size:4096, cache-nonvolatile:true, throttle-max:32, disksort:false",
"ATA     INTEL SSDSC2BA20", "physical-block-size:4096, cache-nonvolatile:true, throttle-max:32, disksort:false",
"ATA     INTEL SSDSC2BA40", "physical-block-size:4096, cache-nonvolatile:true, throttle-max:32, disksort:false",
"ATA     INTEL SSDSC2KG48", "physical-block-size:4096, cache-nonvolatile:true, throttle-max:32, disksort:false",
"ATA     INTEL SSDSC2KG96", "physical-block-size:4096, cache-nonvolatile:true, throttle-max:32, disksort:false",
"ATA     INTEL SSDPE21K37", "physical-block-size:4096, cache-nonvolatile:true, throttle-max:32, disksort:false",
"ATA     WDC WD2000FYYZ-0", "physical-block-size:4096",
"ATA     WDC WD2005FBYZ-0", "physical-block-size:4096";


Now onto the benchmarks:

Testing out latency-reduction tips from the vSphere 6.5 performance best practices guide:
https://www.vmware.com/content/dam/...performance/Perf_Best_Practices_vSphere65.pdf


p. 43
OmniOS VM ESXi advanced parameter ethernet2.coalescingScheme
(for the "storagenet" VMXNET3 adapter, ethernet2)

Default setup (param not present - coalescing enabled by default)

4 x WD Gold in striped mirrors (2 x 2-way mirror vdevs) - with S3700 slog:

CryDskMrk6-2x6tb-s3700-slog-lz4_recordsize_128k_CPUusage_X2APIC-OFF.PNG

ethernet2.coalescingScheme : disable

CryDskMrk6-2x6tb-S3700slog-lz4-recsize_128k-ethcoalesc_disable.PNG


4 x WD Gold in striped mirrors (2 x 2-way mirror vdevs) - with P4800X Optane slog:

ethernet2.coalescingScheme - not present

CryDskMrk6-2x6tb-P4800X-esxiDatastore-lz4_recordsize_128k.PNG

ethernet2.coalescingScheme : disable

CryDskMrk6-2x6tb-P4800X-esxiDatastore-lz4-recsize_128k-ethcoalesc_disable.PNG



OmniOS VM set to Latency Sensitivity - High (from Normal)
- A side effect is that all CPU cores assigned to the VM get reserved exclusively for that VM.
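Under the hood this maps to a per-VM advanced setting; a one-line sketch of the .vmx form (the dropdown under Edit Settings -> VM Options -> Advanced is the usual way to change it):
Code:
sched.cpu.latencySensitivity = "high"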

S3700 200GB Slog

CryDskMrk6-2x6tb-S3700slog-lz4-recsize_128k-ethcoalesc_disable_latencyHigh-4cores_try2.PNG


P4800X 375GB Optane Slog (as 30GB VMware VMDK - Thick Eager Zeroed)
- 4 CPU cores

CryDskMrk6-2x6tb-P4800Xslog(vdsk)-lz4-recsize_128k-ethcoalesc_disable_latencyHigh-4cores.PNG


Testing the number of CPU cores assigned to the OmniOS VM with Latency Sensitivity High:

P4800X 375GB Optane Slog (as 30GB VMware VMDK - Thick Eager Zeroed)
- 2 CPU cores


CryDskMrk6-2x6tb-P4800Xslog(vdsk)-lz4-recsize_128k-ethcoalesc_disable_latencyHigh-2cores.PNG


- 4 CPU cores:

CryDskMrk6-2x6tb-P4800Xslog(vdsk)-lz4-recsize_128k-ethcoalesc_disable_latencyHigh-4cores.PNG



- 5 CPU cores:

CryDskMrk6-2x6tb-P4800Xslog(vdsk)-lz4-recsize_128k-ethcoalesc_disable_latencyHigh-5cores.PNG

- 6 CPU cores

CryDskMrk6-2x6tb-P4800Xslog(vdsk)-lz4-recsize_128k-ethcoalesc_disable_latencyHigh-6cores.PNG


Finally - the Optane VMDK benchmarked directly against the native ESXi 6.5 datastore:

P4800X 375GB Optane slog (30GB VMware VMDK on the NON-NFS, local Optane-backed ESXi datastore)

CryDskMrk6-P4800X-esxiDatastore-direct-vmdk.PNG


Profit?

J-san

If I get time I may do some Optane benchmarks with it passed directly through to OmniOS instead of added to the OmniOS VM as a VMDK.

Tips on how to directly pass it through to OmniOS:
Bug #26508: Intel Optane 900p will not work in ESX passthrough - FreeNAS - iXsystems & FreeNAS Redmine

1. ESXi modification:
- ssh to ESXi
- edit /etc/vmware/passthru.map
- add the following lines at the end of the file:
Code:
# Intel Optane P4800X can not be shared with d3d0
8086 2701 d3d0 false
- restart the hypervisor
2. Toggle Passthrough for Optane in ESXi then reboot
3. Add to OmniOS as PCI device
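Not in the original steps, but once OmniOS is back up you can confirm it actually sees the Optane as a native NVMe device with something like:
Code:
# show devices and their bound drivers, filtering for nvme
prtconf -D | grep -i nvme
# the Optane should also appear in the disk list
format < /dev/null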

gea

For a performance-sensitive VMDK you should not use Thin Provisioning but Thick Provisioning, Eager-Zeroed.
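For anyone recreating this, an eager-zeroed VMDK can be created from the ESXi shell with vmkfstools; the datastore path and size below are only examples:
Code:
# create a 30 GB eager-zeroed thick VMDK on the local Optane-backed datastore
vmkfstools -c 30G -d eagerzeroedthick /vmfs/volumes/optane-local/omnios/slog.vmdk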

J-san

gea said: For a performance-sensitive VMDK you should not use Thin Provisioning but Thick Provisioning, Eager-Zeroed.
Ah, looking at my config I actually did use Thick Provisioned, Eager-Zeroed for the Optane SLOG VMDK... I added a 200GB thin-provisioned VMDK to test later as L2ARC. I'll update the OP to note this.

Cheers!