ZFS Benchmarks - how do they look?


NOTORIOUS VR

Member
Nov 24, 2015
I'm trying to optimize my file server, but I'm not sure whether the performance I'm seeing is even reasonable as it is. Could someone look at the benchmark results below and let me know if I'm in the ballpark of what should/could be expected?

Napp-it 028_v4 is running in a VM (ESXi 6.5). The controllers are passed through (IT mode), and the spinning drives are connected via SFF cables to a JBOD box, then through a Dell card that splits the connections out to the 15 drive bays in Supermicro caddies.

MEDIA01_Z2 is 10x shucked 8TB WD drives, basically for Plex media (shared via NFS v3)
STORAGE1_Z1 is 5x 4TB WD Reds for general network file sharing, backups, NFS shares, etc.
SSD_DS1 is 4x 1TB Crucial SSDs in striped mirrors (RAID-10) for an ESXi VM datastore (NFS v3)

atime = off on all pools
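
For context, the layouts described above would be built with commands roughly like the following. This is only a sketch; the device names are illustrative, not the actual ones:

Code:
# SSD_DS1: two striped mirror vdevs (RAID-10 equivalent)
zpool create SSD_DS1 mirror ssd0 ssd1 mirror ssd2 ssd3

# MEDIA01_Z2: one 10-disk raidz2 vdev (8 data + 2 parity)
zpool create MEDIA01_Z2 raidz2 d0 d1 d2 d3 d4 d5 d6 d7 d8 d9

# STORAGE1_Z1: one 5-disk raidz1 vdev (4 data + 1 parity)
zpool create STORAGE1_Z1 raidz1 e0 e1 e2 e3 e4

# atime off, as noted above (repeat per pool)
zfs set atime=off SSD_DS1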

napp-it VM info:


It seems like the results are extremely poor without any cache or LZ4 compression, but that could also be normal/expected, since I don't know what I should be looking for.

Essentially, I want to know whether I'm where I should be with the hardware I have (subjective, I know), and what the next step would be for cost-effective, solid performance upgrades for my use case (home). Price/performance is important to me (obviously).


Code:
4x SSD mirror, no cache/lz4:

Bennchmark filesystem: /SSD_DS1/_Pool_Benchmark
Read: filebench, Write: filebench_sequential, date: 04.19.2020

begin test 4 ..singlestreamwrite.f ..
begin test 4sync ..singlestreamwrite.f ..
set sync=disabled
begin test 7 randomread.f ..
begin test 8 randomrw.f ..
begin test 9 singlestreamread.f ..
pool: SSD_DS1

    NAME                       STATE     READ WRITE CKSUM
    SSD_DS1                    ONLINE       0     0     0
      mirror-0                 ONLINE       0     0     0
        c0t500A0751E1E095EAd0  ONLINE       0     0     0
        c0t500A0751E1E095FDd0  ONLINE       0     0     0
      mirror-1                 ONLINE       0     0     0
        c0t500A0751E1ECA0E6d0  ONLINE       0     0     0
        c0t500A0751E1ECA1D6d0  ONLINE       0     0     0


hostname                        batcavefs  Memory size: 65536 Megabytes
pool                            SSD_DS1 (recsize=128k, compr=off, readcache=none)
slog                            -
remark                           


Fb3                             sync=always                     sync=disabled                   

Fb4 singlestreamwrite.f         sync=always                     sync=disabled                   
                                1439 ops                        3973 ops
                                287.759 ops/s                   794.250 ops/s
                                3228us cpu/op                   4034us cpu/op
                                3.4ms latency                   1.2ms latency
                                287.6 MB/s                      794.0 MB/s
________________________________________________________________________________________
 
read fb 7-9 + dd (opt)          randomread.f     randomrw.f     singlestreamr
pri/sec cache=none              6.2 MB/s         14.2 MB/s      392.6 MB/s                   
________________________________________________________________________________________

4x SSD mirror, with cache/lz4:

Bennchmark filesystem: /SSD_DS1/_Pool_Benchmark
Read: filebench, Write: filebench_sequential, date: 04.19.2020

begin test 4 ..singlestreamwrite.f ..
begin test 4sync ..singlestreamwrite.f ..
set sync=disabled
begin test 7 randomread.f ..
begin test 8 randomrw.f ..
begin test 9 singlestreamread.f ..
pool: SSD_DS1

    NAME                       STATE     READ WRITE CKSUM
    SSD_DS1                    ONLINE       0     0     0
      mirror-0                 ONLINE       0     0     0
        c0t500A0751E1E095EAd0  ONLINE       0     0     0
        c0t500A0751E1E095FDd0  ONLINE       0     0     0
      mirror-1                 ONLINE       0     0     0
        c0t500A0751E1ECA0E6d0  ONLINE       0     0     0
        c0t500A0751E1ECA1D6d0  ONLINE       0     0     0


hostname                        batcavefs  Memory size: 65536 Megabytes
pool                            SSD_DS1 (recsize=128k, compr=lz4, readcache=all)
slog                            -
remark                           


Fb3                             sync=always                     sync=disabled                   

Fb4 singlestreamwrite.f         sync=always                     sync=disabled                   
                                1374 ops                        6367 ops
                                274.781 ops/s                   1273.180 ops/s
                                3287us cpu/op                   2144us cpu/op
                                3.6ms latency                   0.8ms latency
                                274.6 MB/s                      1273.0 MB/s
________________________________________________________________________________________
 
read fb 7-9 + dd (opt)          randomread.f     randomrw.f     singlestreamr
pri/sec cache=all               129.8 MB/s       158.0 MB/s     1.3 GB/s                     
________________________________________________________________________________________



10x8TB Z2 MEDIA - no cache/lz4:

Bennchmark filesystem: /MEDIA01_Z2/_Pool_Benchmark
Read: filebench, Write: filebench_sequential, date: 04.19.2020

hostname                        batcavefs  Memory size: 65536 Megabytes
pool                            MEDIA01_Z2 (recsize=128k, compr=off, readcache=none)
slog                            -
remark                           


Fb3                             sync=always                     sync=disabled                   

Fb4 singlestreamwrite.f         sync=always                     sync=disabled                   
                                188 ops                         4528 ops
                                37.597 ops/s                    904.846 ops/s
                                10574us cpu/op                  3296us cpu/op
                                26.3ms latency                  1.0ms latency
                                37.4 MB/s                       904.6 MB/s
________________________________________________________________________________________
 
read fb 7-9 + dd (opt)          randomread.f     randomrw.f     singlestreamr
pri/sec cache=none              0.2 MB/s         0.4 MB/s       67.6 MB/s                     
________________________________________________________________________________________


10x8TB Z2 MEDIA - with cache/lz4:


Bennchmark filesystem: /MEDIA01_Z2/_Pool_Benchmark
Read: filebench, Write: filebench_sequential, date: 04.19.2020

begin test 4 ..singlestreamwrite.f ..
begin test 4sync ..singlestreamwrite.f ..
set sync=disabled
begin test 7 randomread.f ..
begin test 8 randomrw.f ..
begin test 9 singlestreamread.f ..
pool: MEDIA01_Z2

    NAME                       STATE     READ WRITE CKSUM
    MEDIA01_Z2                 ONLINE       0     0     0
      raidz2-0                 ONLINE       0     0     0
        c0t5000CCA257E7D8A2d0  ONLINE       0     0     0
        c0t5000CCA257E7DBA6d0  ONLINE       0     0     0
        c0t5000CCA257E9A2BAd0  ONLINE       0     0     0
        c0t5000CCA257E9BCC0d0  ONLINE       0     0     0
        c0t5000CCA257E9C793d0  ONLINE       0     0     0
        c0t5000CCA257E9E33Bd0  ONLINE       0     0     0
        c0t5000CCA257E9E483d0  ONLINE       0     0     0
        c0t5000CCA257EA145Ad0  ONLINE       0     0     0
        c0t5000CCA257EA25D3d0  ONLINE       0     0     0
        c0t5000CCA257EA426Ed0  ONLINE       0     0     0


hostname                        batcavefs  Memory size: 65536 Megabytes
pool                            MEDIA01_Z2 (recsize=128k, compr=lz4, readcache=all)
slog                            -
remark                           


Fb3                             sync=always                     sync=disabled                   

Fb4 singlestreamwrite.f         sync=always                     sync=disabled                   
                                197 ops                         5189 ops
                                39.398 ops/s                    1033.429 ops/s
                                11155us cpu/op                  2888us cpu/op
                                25.2ms latency                  0.9ms latency
                                39.2 MB/s                       1033.2 MB/s
________________________________________________________________________________________
 
read fb 7-9 + dd (opt)          randomread.f     randomrw.f     singlestreamr
pri/sec cache=all               106.8 MB/s       102.6 MB/s     1.1 GB/s                     
________________________________________________________________________________________


5x4TB Z1 STORAGE - no cache/lz4:


Bennchmark filesystem: /STORAGE1_Z1/_Pool_Benchmark
Read: filebench, Write: filebench_sequential, date: 04.19.2020

hostname                        batcavefs  Memory size: 65536 Megabytes
pool                            STORAGE1_Z1 (recsize=128k, compr=off, readcache=none)
slog                            -
remark                           


Fb3                             sync=always                     sync=disabled                   

Fb4 singlestreamwrite.f         sync=always                     sync=disabled                   
                                187 ops                         4382 ops
                                37.398 ops/s                    876.354 ops/s
                                11429us cpu/op                  2064us cpu/op
                                26.6ms latency                  1.1ms latency
                                37.2 MB/s                       876.2 MB/s
________________________________________________________________________________________
 
read fb 7-9 + dd (opt)          randomread.f     randomrw.f     singlestreamr
pri/sec cache=none              0.0 MB/s         0.0 MB/s       34.4 MB/s                     
________________________________________________________________________________________



5x4TB Z1 STORAGE - with cache/lz4:

Bennchmark filesystem: /MEDIA01_Z2/_Pool_Benchmark
Read: filebench, Write: filebench_sequential, date: 04.19.2020

begin test 4 ..singlestreamwrite.f ..
begin test 4sync ..singlestreamwrite.f ..
set sync=disabled
begin test 7 randomread.f ..
begin test 8 randomrw.f ..
begin test 9 singlestreamread.f ..
pool: MEDIA01_Z2

    NAME                       STATE     READ WRITE CKSUM
    MEDIA01_Z2                 ONLINE       0     0     0
      raidz2-0                 ONLINE       0     0     0
        c0t5000CCA257E7D8A2d0  ONLINE       0     0     0
        c0t5000CCA257E7DBA6d0  ONLINE       0     0     0
        c0t5000CCA257E9A2BAd0  ONLINE       0     0     0
        c0t5000CCA257E9BCC0d0  ONLINE       0     0     0
        c0t5000CCA257E9C793d0  ONLINE       0     0     0
        c0t5000CCA257E9E33Bd0  ONLINE       0     0     0
        c0t5000CCA257E9E483d0  ONLINE       0     0     0
        c0t5000CCA257EA145Ad0  ONLINE       0     0     0
        c0t5000CCA257EA25D3d0  ONLINE       0     0     0
        c0t5000CCA257EA426Ed0  ONLINE       0     0     0


hostname                        batcavefs  Memory size: 65536 Megabytes
pool                            MEDIA01_Z2 (recsize=128k, compr=lz4, readcache=all)
slog                            -
remark                           


Fb3                             sync=always                     sync=disabled                   

Fb4 singlestreamwrite.f         sync=always                     sync=disabled                   
                                196 ops                         5219 ops
                                39.198 ops/s                    1042.548 ops/s
                                10192us cpu/op                  2732us cpu/op
                                25.3ms latency                  0.9ms latency
                                39.0 MB/s                       1042.3 MB/s
________________________________________________________________________________________
 
read fb 7-9 + dd (opt)          randomread.f     randomrw.f     singlestreamr
pri/sec cache=all               106.6 MB/s       105.8 MB/s     1.1 GB/s                     
________________________________________________________________________________________
 

gea

Well-Known Member
Dec 31, 2010
3,156
1,195
113
DE
Seems OK.

Your SSD RAID-10:

Around 800 MB/s sequential write.
Expectation: 2x the sequential write of a single SSD (400 MB/s each), so this is OK.

Readcache=none is only helpful for special tests; ZFS gets its performance from its caches. Write performance without the readcache is worse than it could be, because even writes have to read metadata.

This is why your second test is faster with readcache=all.
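
(Napp-it's readcache toggle should map to the standard ZFS cache properties; a minimal sketch of switching them by hand, using the pool from the test above:)

Code:
zfs get primarycache,secondarycache SSD_DS1   # show current values
zfs set primarycache=none SSD_DS1             # no RAM read cache (worst-case test)
zfs set primarycache=metadata SSD_DS1         # cache metadata only
zfs set primarycache=all SSD_DS1              # default: cache data and metadata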
The sync values are good. A further improvement may be possible with an Optane Slog (e.g. a P4801X).
If you use the SSD pool for VMs, you can reduce the ZFS recsize (e.g. to 32k), which can help improve VM performance a little.
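
A sketch of both suggestions; the device and filesystem names are illustrative only:

Code:
# Add an Optane as a dedicated Slog device (device name illustrative):
zpool add SSD_DS1 log c0t55CD2E404C123456d0

# Smaller recordsize for a VM filesystem; only affects newly written data:
zfs set recordsize=32k SSD_DS1/vms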

Z2 pool:
Same thing regarding the readcache. Sync write is as bad as expected. For a disk-based filer, either disable sync or add an Optane Slog.

As a rule of thumb:
Under a mixed sequential/random load, which is the typical case with ZFS and RAID, a disk can deliver between 100 and 150 MB/s. A Z2 pool of 10 disks has 8 data disks, so if you get 1000 MB/s sequential read or write, you land at 125 MB/s per disk. This is as expected.

A mechanical disk delivers around 100 IOPS. This is why sync write on the disk pools is so bad. A good SSD can deliver between 5k and 30k write IOPS at 4k; an NVMe device like an Intel Optane 900P is at around 500k write IOPS.
 

NOTORIOUS VR

Member
Nov 24, 2015
Hi Gea,

Thanks. Can I assume I need a separate Optane for every pool, or can one be used for more than one at a time? Also, regarding recsize: would I have to move the data off and destroy the pool to change it?

I very much appreciate the time you took to look at this and write your response. I will look for an Optane device as you suggested, to hopefully add some performance.

On a side note, is it possible that I'm running into an NFS bottleneck? I don't see issues when running these tests, but I'm seeing extremely poor performance with Docker over NFS (I don't believe it's napp-it related). Not sure if you have any input.
 

vangoose

Active Member
May 21, 2019
Canada
You can partition the Optane, but make sure the partitions are 4K-aligned.
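
A hedged sketch of that approach; the partitioning itself would be done with format/parted (keeping the starts 4K-aligned), and the device/slice names here are illustrative:

Code:
# After slicing the Optane into two 4K-aligned partitions,
# each pool gets its own slice as Slog:
zpool add SSD_DS1 log c0t55CD2E404C123456d0s0
zpool add MEDIA01_Z2 log c0t55CD2E404C123456d0s1
zpool status SSD_DS1 MEDIA01_Z2   # verify the log vdevs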
 

gea

Well-Known Member
Dec 31, 2010
DE
Recsize is a setting that only affects newly written data.
For existing data you need a move between filesystems, or a replication.
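
For example, a minimal sketch of the move-between-filesystems approach (dataset names are illustrative):

Code:
# recordsize only applies to files written after the change,
# so rewrite the data into a filesystem with the new setting:
zfs create -o recordsize=32k SSD_DS1/vms_32k
rsync -a /SSD_DS1/vms/ /SSD_DS1/vms_32k/   # file-level copy rewrites every block
# then switch the shares/datastore over once verified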

About NFS:
Tuning for >1G networks is mostly a matter of increasing the TCP buffers and the NFS buffers/server threads/transfer sizes, as the defaults are mostly optimized for 1G.

see napp-it System > Appliance Tuning
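
For example (illumos/OmniOS-style tunables; the values are illustrative, and the napp-it tuning panel sets similar ones):

Code:
# More NFS server threads:
sharectl set -p servers=512 nfs

# Larger TCP buffers:
ipadm set-prop -p max_buf=4194304 tcp
ipadm set-prop -p send_buf=1048576 tcp
ipadm set-prop -p recv_buf=1048576 tcp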
 

NOTORIOUS VR

Member
Nov 24, 2015
I have one x8 slot left on my mainboard and I was thinking of using a Supermicro AOC-SLG3-2M2 - would a single DC P4801X be sufficient?
 

gea

Well-Known Member
Dec 31, 2010
DE
In pass-through mode (which can cause problems) you can partition the Optane and use the partitions as one or more L2ARC or Slog devices. The other option is to use the Optane as a datastore with some vmdks on it.

The first M.2 slot will always work; the second only works if the mainboard supports PCIe bifurcation.