I just noticed something that may help others.
ZFS write throttling can kick in and slow down your write performance.
I ran into this while benchmarking, after I reduced the memory given to my OmniOS VM from 16GB down to 13GB.
The pool I was benchmarking is a striped-mirror pool of 2TB WD RE4 drives with a 100GB Intel S3700 SSD as the log device:
Code:
tank
mirror-0
c21t50014EE05926E121d0 RE4 2TB
c20t50014EE25F104CBEd0 RE4 2TB
mirror-1
c13t50014EE2B50E9766d0 RE4 2TB
c14t50014EE25FB90889d0 RE4 2TB
logs
c17t55CD2E404B65494Ed0 100GB S3700 SSD
With sd.conf tuning (the sd driver rereads sd.conf after update_drv -vf sd, or after a reboot):
Code:
# DISK tuning
# Set correct non-volatile cache settings for Intel S3500 + S3700 SSDs
# WARNING: Do not set this for any other SSD unless it has built-in power-loss protection.
# WARNING: It is equivalent to running ZFS with sync=disabled if your SSD lacks power-loss protection.
sd-config-list=
"ATA INTEL SSDSC2BB48", "physical-block-size:4096, cache-nonvolatile:true, throttle-max:32, disksort:false",
"ATA INTEL SSDSC2BA10", "physical-block-size:4096, cache-nonvolatile:true, throttle-max:32, disksort:false";
I'm benchmarking at 9000 MTU through an all-in-one virtual 10G ESXi vSwitch connection to an NFS datastore on the pool above.
Compression=off, sync=standard (equivalent to 'always' for an ESXi 5.5 NFS datastore)
Benchmarking with:
OmniOS VM memory at 16GB
CrystalDiskMark 3.0.3 x64
Test size: 1000MB
47 MB/s 4K@QD32 write speed
OmniOS VM memory at 13GB
CrystalDiskMark 3.0.3 x64
Test size: 1000MB
20 MB/s 4K@QD32 write speed
I was testing other settings (NUMA) at the time, so I didn't realize this was a byproduct of the memory change.
After a while I noticed the memory size had changed, so I changed it back, and performance returned to where it was.
But why did it drop so much?
If the amount of dirty data gets close to a percentage of the zfs_dirty_data_max setting (which by default is sized from the amount of RAM), ZFS starts delaying, i.e. throttling, writes.
So I ran a test with the VM memory set to 14GB, and the result lay between the two above.
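To see why the limit moves with VM memory: as I understand it, illumos/OpenZFS sizes zfs_dirty_data_max at roughly 10% of physical memory by default, capped at 4GB. A quick sketch of the arithmetic (the 10% figure is the assumption here):

```shell
# Predict the default dirty-data ceiling from RAM size
# (assumption: zfs_dirty_data_max defaults to ~10% of physical memory).
ram_gb=14
expected_mb=$(( ram_gb * 1024 / 10 ))
echo "expected zfs_dirty_data_max for ${ram_gb}GB RAM: ~${expected_mb}MB"

# On the live system, read the actual value (in bytes) with:
#   echo zfs_dirty_data_max/E | mdb -k
```

For 14GB that predicts ~1433MB, which lines up with the 1432MB ceiling in the dtrace output below.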
I watched a dtrace script (dirty.d, from the blog post linked at the bottom) while running the 1000MB 4K@QD32 write test:
Code:
~# dtrace -s dirty.d tank
...
1 4181 txg_sync_thread:txg-syncing 0MB of 1432MB used
0 4181 txg_sync_thread:txg-syncing 0MB of 1432MB used
0 4181 txg_sync_thread:txg-syncing 64MB of 1432MB used
0 4181 txg_sync_thread:txg-syncing 516MB of 1432MB used
0 4181 txg_sync_thread:txg-syncing 637MB of 1432MB used
1 4181 txg_sync_thread:txg-syncing 933MB of 1432MB used
0 4181 txg_sync_thread:txg-syncing 927MB of 1432MB used
1 4181 txg_sync_thread:txg-syncing 932MB of 1432MB used
1 4181 txg_sync_thread:txg-syncing 940MB of 1432MB used
1 4181 txg_sync_thread:txg-syncing 925MB of 1432MB used
0 4181 txg_sync_thread:txg-syncing 932MB of 1432MB used
0 4181 txg_sync_thread:txg-syncing 935MB of 1432MB used
0 4181 txg_sync_thread:txg-syncing 752MB of 1432MB used
0 4181 txg_sync_thread:txg-syncing 0MB of 1432MB used
It's write throttling as dirty data gets closer to the limit, which means my RE4 vdevs can't
absorb the data being thrown at them asynchronously by ZFS after the S3700 SSD log device has written and acknowledged it.
So I temporarily bumped the dirty-data maximum up to 2495MB to give myself lots of room:
Code:
# echo zfs_dirty_data_max/W0t2617101363 | mdb -kw
# dtrace -s dirty.d tank
0 4181 txg_sync_thread:txg-syncing 0MB of 2495MB used
1 4181 txg_sync_thread:txg-syncing 64MB of 2495MB used
1 4181 txg_sync_thread:txg-syncing 567MB of 2495MB used
1 4181 txg_sync_thread:txg-syncing 667MB of 2495MB used
0 4181 txg_sync_thread:txg-syncing 1001MB of 2495MB used
1 4181 txg_sync_thread:txg-syncing 1001MB of 2495MB used
1 4181 txg_sync_thread:txg-syncing 1001MB of 2495MB used
0 4181 txg_sync_thread:txg-syncing 1001MB of 2495MB used
0 4181 txg_sync_thread:txg-syncing 1001MB of 2495MB used
0 4181 txg_sync_thread:txg-syncing 1002MB of 2495MB used
1 4181 txg_sync_thread:txg-syncing 936MB of 2495MB used
1 4181 txg_sync_thread:txg-syncing 0MB of 2495MB used
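As an aside, the long literal passed to mdb is simply the new ceiling in bytes (the 0t prefix marks it as a decimal literal, and /W writes it into the kernel variable). Checking the conversion:

```shell
# mdb wrote 2617101363 bytes into zfs_dirty_data_max; confirm that
# works out to the 2495MB shown in the dtrace output.
bytes=2617101363
mb=$(( bytes / 1024 / 1024 ))
echo "${bytes} bytes = ${mb} MB"   # prints "2617101363 bytes = 2495 MB"
```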
Result:
OmniOS VM memory at 14GB
CrystalDiskMark 3.0.3 x64
Test size: 1000MB
zfs_dirty_data_max=2495MB
127 MB/s 4K@QD32 write speed
(Yes, 127 MB/s. That's not a typo, and that's with sync=standard.)
I then dropped zfs_dirty_data_max back down and ran the 500MB test, with the same results.
So just make sure you either:
A) Ensure your slower disk vdevs can handle the writes thrown at them, if you have a hybrid (SSD SLOG + slower disk) pool and you want to max out sustained 4K@QD32 throughput of your SLOG device.
B) Give the VM enough memory to handle the amount of incoming data your SLOG can acknowledge (which raises zfs_dirty_data_max as a byproduct).
C) Increase zfs_dirty_data_max manually if you can't spare more RAM; the dirty data effectively steals from your ARC, but lets you accommodate your longest bursts of SLOG write data.
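If you go with option C, note that the mdb change only lasts until reboot. On illumos the usual way to make a tunable stick is an /etc/system entry (a sketch, using the same 2495MB value in bytes as above):

```
* /etc/system: persist a larger ZFS dirty-data ceiling across reboots
set zfs:zfs_dirty_data_max = 2617101363
```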
Hope people find this useful.
zfs throttle info and scripts from:
Adam Leventhal's blog » Tuning the OpenZFS write throttle
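For reference, the dirty.d script is short. This is my reconstruction of the version from that blog post (reproduced from memory; verify the txg-syncing probe and field names against the original before relying on it):

```d
txg-syncing
{
        this->dp = (dsl_pool_t *)arg0;
}

txg-syncing
/this->dp->dp_spa->spa_name == $$1/
{
        printf("%4dMB of %4dMB used", this->dp->dp_dirty_total / 1024 / 1024,
            `zfs_dirty_data_max / 1024 / 1024);
}
```

Run it against a pool with: dtrace -s dirty.d tank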