Best ZFS config?

StevenDTX
Active Member
I moved from 16Gb FC to 40Gb, and I ended up just deleting my NappIT VM and figured I would start from scratch. I started with a new OVA deployment of NappIT, then passed through my HBA, Optane 900P 280GB, and ConnectX-4. I had problems with the ConnectX-4, so I left it attached to vSphere and gave the VM two additional vmxnet3 adapters.

However, now that I have configured the pool, I am getting about half of the performance I used to get. I think I configured it the same as before. I have six 4TB HGST drives and the Optane 900P. I configured three mirrors, then split the Optane into a 32GB write-log (SLOG) and the rest as read-cache (L2ARC). I get roughly the same speed on the pool that I get across the wire.
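
For reference, the pool was built roughly like this (reconstructed from memory, so take the exact commands as a sketch; the Optane partitions are p1 for the log and p2 for the cache):

Code:
zpool create vmware-pool-1 \
  mirror c0t5000CCA22BC4EC12d0 c0t5000CCA22BE17DC5d0 \
  mirror c0t5000CCA23DC3BC69d0 c0t5000CCA23DC3D80Dd0 \
  mirror c0t5000CCA23DC3EF3Cd0 c0t5000CCA249D59A3Dd0
# Optane 900P split into two partitions: ~32GB write-log, rest read-cache
zpool add vmware-pool-1 log c3t1d0p1
zpool add vmware-pool-1 cache c3t1d0p2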

One other change... I upgraded from a single Xeon E5-26xxv2 (I don't remember which one) on an X9SRL-F to an Epyc 7251 on an H11SSL-NC.

So, my question is: is this the optimal config for the hardware I have? I need about 8TB of usable space, and speed and resiliency are what I really care about.

Thanks.
 

gea
Well-Known Member
The first question to answer is whether it is network or pool performance that is not as desired, and what configuration you are comparing against.

For pool performance, run a Pool > Benchmark. This is a combination of benchmarks for sequential and small I/O read and write workloads, with sync enabled and disabled. It gives a first impression of pool performance and can be used to decide whether a tuning modification is helpful.

For network performance, use iperf. Napp-it comes with both the server and the client part. You can check Server > ESXi, ESXi > OmniOS, and OmniOS > Server.
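
Outside the menus, a quick manual test looks like this (iperf3 shown; the IP is only a placeholder for the server side):

Code:
# on the receiving end (e.g. the OmniOS storage VM)
iperf3 -s
# on the sending end, several parallel streams help to saturate 40Gb
iperf3 -c 192.168.1.10 -P 4 -t 10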

For tuning with a faster network, you can use the napp-it menu System > Tuning and increase the vmxnet3, TCP and NFS buffers. For pool performance with VM storage, a multi Raid-10 (several mirrors) is best as it offers the highest iops. For VM storage I would also reduce the ZFS recsize to a smaller value like 64K or 32K.
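
The recsize part can also be done directly on the filesystem that holds the VMs (the dataset name below is just an example); note that it only affects newly written data:

Code:
# only new writes use the new recordsize, existing blocks are unchanged
zfs set recordsize=32K vmware-pool-1/vmstore
zfs get recordsize vmware-pool-1/vmstore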
 

StevenDTX
Active Member
Thanks @gea.

Is this all the output you expect? It seemed that there were more tests on the initial setup page.

Code:
Benchmark: Write: filebench_sequential, Read: filebench, date: 07.07.2020

pool: vmware-pool-1

    NAME                       STATE     READ WRITE CKSUM
    vmware-pool-1              ONLINE       0     0     0
      mirror-0                 ONLINE       0     0     0
        c0t5000CCA22BC4EC12d0  ONLINE       0     0     0
        c0t5000CCA22BE17DC5d0  ONLINE       0     0     0
      mirror-1                 ONLINE       0     0     0
        c0t5000CCA23DC3BC69d0  ONLINE       0     0     0
        c0t5000CCA23DC3D80Dd0  ONLINE       0     0     0
      mirror-2                 ONLINE       0     0     0
        c0t5000CCA23DC3EF3Cd0  ONLINE       0     0     0
        c0t5000CCA249D59A3Dd0  ONLINE       0     0     0

host                            nappit1
pool                            vmware-pool-1 (recsize=32K, ssb=-, compr=off, readcache=all)
slog                            -
encryption                      -
remark                           

Fb3                             sync=always                     sync=disabled                   

Fb4 singlestreamwrite.f         sync=always                     sync=disabled                 
                                211 ops                         3654 ops
                                42.198 ops/s                    730.765 ops/s
                                112927us cpu/op                 38771us cpu/op
                                23.5ms latency                  1.3ms latency
                                42.0 MB/s                       730.6 MB/s
________________________________________________________________________________________
                                randomread.f     randomrw.f     singlestreamr
pri/sec cache=all               200.0 MB/s       301.8 MB/s     1.1 GB/s                     
________________________________________________________________________________________
 

StevenDTX
Active Member
And here it is with the Optane configured as SLOG and L2ARC:

Code:
Benchmark: Write: filebench_sequential, Read: filebench, date: 07.07.2020

pool: vmware-pool-1

    NAME                       STATE     READ WRITE CKSUM
    vmware-pool-1              ONLINE       0     0     0
      mirror-0                 ONLINE       0     0     0
        c0t5000CCA22BC4EC12d0  ONLINE       0     0     0
        c0t5000CCA22BE17DC5d0  ONLINE       0     0     0
      mirror-1                 ONLINE       0     0     0
        c0t5000CCA23DC3BC69d0  ONLINE       0     0     0
        c0t5000CCA23DC3D80Dd0  ONLINE       0     0     0
      mirror-2                 ONLINE       0     0     0
        c0t5000CCA23DC3EF3Cd0  ONLINE       0     0     0
        c0t5000CCA249D59A3Dd0  ONLINE       0     0     0
    logs   
      c3t1d0p1                 ONLINE       0     0     0
    cache
      c3t1d0p2                 ONLINE       0     0     0

host                            nappit1
pool                            vmware-pool-1 (recsize=32K, ssb=-, compr=off, readcache=all)
slog                             
encryption                      -
remark                           

Fb3                             sync=always                     sync=disabled                   

Fb4 singlestreamwrite.f         sync=always                     sync=disabled                 
                                2667 ops                        3620 ops
                                533.369 ops/s                   723.950 ops/s
                                60715us cpu/op                  39980us cpu/op
                                1.9ms latency                   1.4ms latency
                                533.2 MB/s                      723.8 MB/s
________________________________________________________________________________________
                                randomread.f     randomrw.f     singlestreamr
pri/sec cache=all               198.4 MB/s       284.2 MB/s     1.1 GB/s                     
________________________________________________________________________________________
 

StevenDTX
Active Member
Current config:

Code:
  pool: vmware-pool-1
 state: ONLINE
  scan: none requested
config:

    NAME                       STATE     READ WRITE CKSUM      CAP            Product /napp-it   SN/LUN           IOstat mess       SMART
    vmware-pool-1              ONLINE       0     0     0
      mirror-0                 ONLINE       0     0     0
        c0t5000CCA22BC4EC12d0  ONLINE       0     0     0      4 TB           Hitachi HDS72404   PK   S:0 H:0 T:0       -
        c0t5000CCA22BE17DC5d0  ONLINE       0     0     0      4 TB           HGST HDS724040AL   PK   S:0 H:0 T:0       -
      mirror-1                 ONLINE       0     0     0
        c0t5000CCA23DC3BC69d0  ONLINE       0     0     0      4 TB           HGST HDS724040AL   PK   S:0 H:0 T:0       -
        c0t5000CCA23DC3D80Dd0  ONLINE       0     0     0      4 TB           HGST HDS724040AL   PK  S:0 H:0 T:0       -
      mirror-2                 ONLINE       0     0     0
        c0t5000CCA23DC3EF3Cd0  ONLINE       0     0     0      4 TB           HGST HDS724040AL   PK   S:0 H:0 T:0       -
        c0t5000CCA249D59A3Dd0  ONLINE       0     0     0      4 TB           HGST HDN724040AL   PK   S:0 H:0 T:0       -
    logs   
      c3t1d0p1                 ONLINE       0     0     0      53.2 GB        INTEL SSDPED1D28   PHM S:0 H:4 T:0       -
    cache
      c3t1d0p2                 ONLINE       0     0     0      221 GB         INTEL SSDPED1D28   PHM S:0 H:4 T:0       -
 

gea
Well-Known Member
As expected (maybe better than that).

A single mechanical disk can give 100-250 MB/s sequentially, depending on whether you use the inner or outer tracks and whether the disk is quite full (a lot of fragmentation) or empty. You see an unsynced sequential pool write performance of 723 MB/s. On writes, the performance of a mirror is like a single disk, so with 3 mirrors you are near the maximum at around 723/3 = 241 MB/s per disk. This is very good.

With sync write, you see that the sequential write performance goes down to 42 MB/s. This is due to the low iops of a mechanical disk, say around 100 iops. This is as bad as expected; with a single disk you may be at around 10-20 MB/s sync write. If you add an Optane with around 500,000 iops as an Slog, the disks can do pure sequential writes with the help of the RAM-based write cache, while all logging goes to the Optane. That gives a pool sync write performance of 533 MB/s, also a very good value.
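
The two columns of the benchmark simply correspond to the sync property of the dataset, which you can also set by hand for your own tests (dataset name again only as an example):

Code:
zfs set sync=always vmware-pool-1/vmstore     # every write goes through the ZIL/Slog
zfs set sync=disabled vmware-pool-1/vmstore   # no ZIL at all, fast but unsafe for VMs
zfs set sync=standard vmware-pool-1/vmstore   # default, honour what the client requests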

Random read is as bad as expected. The Arc/L2Arc does not help with this random read benchmark, but it helps a lot in real-world workloads where you see repeated reads served from the caches. You can enable read-ahead on the L2Arc, which may improve reads a little.
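
If I remember the name correctly, that read-ahead switch corresponds to the l2arc_noprefetch tunable; set by hand it goes into /etc/system on OmniOS and needs a reboot:

Code:
# allow prefetched (sequential) reads to be stored in the L2Arc as well
set zfs:l2arc_noprefetch=0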

The singlestream read value is perfect, but it mainly shows the quality of the read caches, as these reads are served from cache.
 

StevenDTX
Active Member
Thanks again, @gea. I sure wish I had run this benchmark test on my previous VM. From what I recall, I was getting way better performance. Oh well, it is what it is...for now.

Are the sizes of my SLOG and L2ARC OK? Should I allocate more to SLOG? I seem to recall I had it around 35GB on my previous config.

If these numbers are all OK, then I will work on tuning the network. With iperf3, I am getting less than 6Gbps into the VM on the 40Gb connections. Host to host, I am getting around 14Gbps.

I guess if I want better performance, I need to start saving my pennies and work on moving to NVMe. This motherboard has lots of available PCIe slots/lanes, and all slots can use bifurcation, so I can stack quite a few NVMe drives in this box.
 

gea
Well-Known Member
The Slog is there to protect the RAM-based write cache. Its default size on Open-ZFS is 10% of RAM, max 4GB. With 32 GB RAM, your write cache is 3.2 GB. You should use at least twice that value for the Slog. Even if you double it again to be sure you have enough, you end up with less than 20 GB. More does not help, at least not with an Optane. With traditional flash, where you want overprovisioning to keep write performance high under steady load, this may be different.

Whether the L2Arc helps at all depends on the workload. With 32 GB RAM, do not expect too much. As management of the L2Arc needs RAM, you should not use more than 5-10x RAM. Your size of 221 GB is at the upper limit.
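
If you want to shrink the Slog accordingly, log and cache devices can be removed and re-added at any time without touching the data vdevs (repartition the Optane in between; device names as in your pool):

Code:
zpool remove vmware-pool-1 c3t1d0p1    # remove the log device
zpool remove vmware-pool-1 c3t1d0p2    # remove the cache device
# repartition the Optane (e.g. ~20GB for p1), then add them back
zpool add vmware-pool-1 log c3t1d0p1
zpool add vmware-pool-1 cache c3t1d0p2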
 