Sequential read speeds on my ZFS pool (4 vdevs, each a 2 x 6TB mirror) are not where I'd like them to be: 80-130 MB/s when moving large files around. My two main use cases for the pool are hosting VMs and moving large video files around. The VMs are performing fine, but I'd like faster video file copy speeds.
DD Test (local on OmniOS):
- BS=32M
- Count=6250
- Test file: 204.8 GB (>2 x memory)
- Napp-it realtime monitoring off
- Pool primary cache off
- Pool secondary cache off
- Compression off (normally on for this pool, but napp-it's dd test reads from /dev/zero, so disabling it for the test seemed like a good idea)
Write: `dd if=/dev/zero of=/tank/dd.tst bs=32768000 count=6250` (per the parameters above)
Read: `dd if=/tank/dd.tst of=/dev/null bs=32768000`
Results = 563 MB/s write, 208 MB/s read
In reality I have never moved a large file around (intra-pool) at faster than 130 MB/s.
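One thing a single dd run can't separate is per-stream versus aggregate read throughput. A quick check (a sketch against the 204.8 GB test file from above; the block counts and offsets are illustrative) is to read disjoint regions of the file in parallel and see whether the total scales past ~208 MB/s:
Code:
# Four parallel readers on disjoint ~50 GiB regions of the test file.
# iseek (not skip) is used so illumos dd seeks to the offset instead of
# reading and discarding; the last reader simply stops at end of file.
for i in 0 1 2 3; do
  dd if=/tank/dd.tst of=/dev/null bs=1024k iseek=$((i * 50000)) count=50000 &
done
wait
If the aggregate scales well past a single stream, the limit would be per-stream (e.g. prefetch) rather than the disks themselves.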
`zpool iostat` snapshot of what the drives look like during the write test:
Code:
capacity operations bandwidth
pool alloc free read write read write
------------------------- ----- ----- ----- ----- ----- -----
rpool 2.10G 37.7G 0 0 0 0
c2t0d0 2.10G 37.7G 0 0 0 0
------------------------- ----- ----- ----- ----- ----- -----
tank 11.5T 10.3T 120 3.52K 5.54M 450M
mirror 3.62T 1.82T 42 788 664K 98.4M
c0t50014EE20FB334A6d0 - - 26 788 165K 98.4M
c0t50014EE059345DBCd0 - - 15 788 499K 98.4M
mirror 3.70T 1.74T 34 870 1.98M 109M
c0t50014EE265087D35d0 - - 11 911 479K 114M
c0t50014EE2B73FBE85d0 - - 22 870 1.51M 109M
mirror 3.71T 1.73T 40 845 2.55M 106M
c0t50014EE2B7437D10d0 - - 20 884 1.01M 110M
c0t50014EE20FB32A77d0 - - 19 836 1.53M 104M
mirror 430G 5.02T 3 1.07K 369K 137M
c0t50014EE2650872C3d0 - - 1 1.06K 179K 136M
c0t50014EE2B7E365A7d0 - - 1 1.10K 191K 140M
------------------------- ----- ----- ----- ----- ----- -----
Expectation: Pool write = 4 x single HD write.
Reality: All 4 vdevs are writing in the same ballpark as the rated sequential write speed of a single drive: ~100 MB/s for the 3 vdevs that have less free space, ~140 MB/s for the fairly empty one. This looks good to me.
Here's a snapshot during the read:
Code:
capacity operations bandwidth
pool alloc free read write read write
------------------------- ----- ----- ----- ----- ----- -----
rpool 2.10G 37.7G 0 0 0 0
c2t0d0 2.10G 37.7G 0 0 0 0
------------------------- ----- ----- ----- ----- ----- -----
tank 11.6T 10.1T 1.94K 14 192M 121K
mirror 3.66T 1.78T 415 2 38.4M 23.4K
c0t50014EE20FB334A6d0 - - 278 2 21.5M 23.4K
c0t50014EE059345DBCd0 - - 136 2 17.0M 23.4K
mirror 3.74T 1.70T 353 3 44.2M 31.2K
c0t50014EE265087D35d0 - - 172 3 21.6M 31.2K
c0t50014EE2B73FBE85d0 - - 181 3 22.7M 31.2K
mirror 3.75T 1.68T 479 3 49.7M 31.2K
c0t50014EE2B7437D10d0 - - 202 3 25.3M 31.2K
c0t50014EE20FB32A77d0 - - 276 3 24.3M 31.2K
mirror 479G 4.97T 736 3 59.9M 35.1K
c0t50014EE2650872C3d0 - - 250 3 27.1M 35.1K
c0t50014EE2B7E365A7d0 - - 486 3 32.9M 35.1K
------------------------- ----- ----- ----- ----- ----- -----
Expectation: Pool read = 8 x single HD read, reading evenly from all hard drives if the data is striped evenly.
Reality: Data is distributed evenly, but pool read = ~1.5 x single HD read.
Why am I getting such evenly-distributed-but-slow read speeds across the board?
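One way to narrow this down (a diagnostic sketch, not part of the test run above) is to watch per-disk utilization during a slow read and to check the prefetch tunable:
Code:
# Per-disk busy (%b) and average service time (asvc_t), once per second.
# If %b stays well below 100 during the read, the drives are not the
# bottleneck and the limit is upstream (prefetch, ARC, or the consumer).
iostat -xn 1

# Current value of the ZFS prefetch tunable (0 = prefetch enabled):
echo "zfs_prefetch_disable/D" | mdb -k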
My hardware:
- ZFS pool on OmniOS made up of 4 vdevs, each a 2 x 6TB (wd60efrx) mirror. One of these vdevs didn't exist when I wrote all the data to the pool (due to data migration constraints).
- OmniOS is an ESXi guest.
All SATA drives are plugged directly into my motherboard's (X10SDV-7tp4f) onboard controller, which is passed through to OmniOS.
- OmniOS has 96GB of RAM.
- I've experimented with hosting an ESXi virtual disk on my Optane 900p, mounted in OmniOS, as a separate ZIL (SLOG), a read cache (L2ARC), or both (sketched below). It doesn't seem to have any effect on performance for my workloads (measured with simple file copies and benchmark software).
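For reference, a sketch of how those Optane-backed devices would be attached and detached; c2t1d0 and c2t2d0 are the device names that appear in the iostat output further down:
Code:
zpool add tank log c2t1d0       # separate ZIL (SLOG)
zpool add tank cache c2t2d0     # L2ARC read cache
zpool remove tank c2t1d0        # log/cache devices can be removed again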
Code:
NAME PROPERTY VALUE SOURCE
tank size 21.8T -
tank capacity 52% -
tank altroot - default
tank health ONLINE -
tank guid 4460167909185718889 default
tank version - default
tank bootfs - default
tank delegation on default
tank autoreplace off default
tank cachefile - default
tank failmode wait default
tank listsnapshots off default
tank autoexpand off default
tank dedupditto 0 default
tank dedupratio 1.00x -
tank free 10.3T -
tank allocated 11.4T -
tank readonly off -
tank comment - default
tank expandsize - -
tank freeing 0 default
tank fragmentation 1% -
tank leaked 0 default
tank bootsize - default
tank feature@async_destroy enabled local
tank feature@empty_bpobj active local
tank feature@lz4_compress active local
tank feature@multi_vdev_crash_dump enabled local
tank feature@spacemap_histogram active local
tank feature@enabled_txg active local
tank feature@hole_birth active local
tank feature@extensible_dataset enabled local
tank feature@embedded_data active local
tank feature@bookmarks enabled local
tank feature@filesystem_limits enabled local
tank feature@large_blocks enabled local
tank feature@sha512 enabled local
tank feature@skein enabled local
tank feature@edonr enabled local
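Since feature@large_blocks is enabled above, one setting worth checking for the large-video workload (a sketch; tank/videos is a hypothetical dataset name) is the dataset recordsize, which defaults to 128K:
Code:
# Check the current recordsize:
zfs get recordsize tank

# A 1M recordsize can help large sequential files; it requires the
# large_blocks feature and applies only to data written afterwards:
zfs set recordsize=1M tank/videos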
Code:
Pool Version Pool GUID Vdev Ashift Asize Vdev GUID Disk Disk-GUID Cap Product/ Phys_Path/ Dev_Id/ Sn
tank 5000 4460167909185718889 vdevs: 5
vdev 1: mirror 12 6.00 TB 2243640248582335401
vdev 2: mirror 12 6.00 TB 3977744385084940597
vdev 3: mirror 12 6.00 TB 3289320665279914143
vdev 4: hole 0 0 0
vdev 5: mirror 12 6.00 TB 6527926759137914625
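(Napp-it counts 5 vdevs here because vdev 4 shows up as a hole: ZFS leaves a placeholder in the vdev numbering when a device is removed, presumably from one of the log-device experiments mentioned above, so the pool still has only 4 data vdevs.)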
SMB copy on a Windows VM on the same ESXi host:
Here's another snapshot of what `zpool iostat -v 1` looks like during an SMB copy across filesystems going at ~70 MB/s, indicating that the 4 vdevs are being utilized evenly, each transferring at ~17 MB/s:
Code:
capacity operations bandwidth
pool alloc free read write read write
------------------------- ----- ----- ----- ----- ----- -----
rpool 2.10G 37.7G 0 0 0 0
c2t0d0 2.10G 37.7G 0 0 0 0
------------------------- ----- ----- ----- ----- ----- -----
tank 11.4T 10.3T 660 1.35K 69.3M 126M
mirror 3.61T 1.82T 162 260 15.4M 30.5M
c0t50014EE20FB334A6d0 - - 89 255 7.21M 30.5M
c0t50014EE059345DBCd0 - - 72 254 8.18M 30.5M
mirror 3.69T 1.75T 151 361 15.7M 30.6M
c0t50014EE265087D35d0 - - 72 306 6.61M 30.6M
c0t50014EE2B73FBE85d0 - - 79 308 9.08M 30.6M
mirror 3.70T 1.73T 172 375 18.2M 32.5M
c0t50014EE2B7437D10d0 - - 97 321 9.75M 32.5M
c0t50014EE20FB32A77d0 - - 74 319 8.45M 32.5M
mirror 422G 5.03T 174 386 20.0M 32.7M
c0t50014EE2650872C3d0 - - 86 313 9.92M 32.7M
c0t50014EE2B7E365A7d0 - - 87 314 10.1M 32.7M
logs - - - - - -
c2t1d0 772K 15.9G 0 0 0 0
cache - - - - - -
c2t2d0 29.9G 93.8M 0 0 0 0
------------------------- ----- ----- ----- ----- ----- -----
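A copy between filesystems on the same pool makes every drive serve reads and writes at the same time, so some drop from pure-read throughput is expected. Timing the same copy locally on OmniOS would separate that effect from SMB and VM overhead (a sketch; the file and dataset names are illustrative):
Code:
# Time an intra-pool copy locally, bypassing SMB and the Windows VM:
ptime cp /tank/videos/big.mkv /tank/scratch/big.mkv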
How can I find out what is bottlenecking my read speeds?