OmniOS + napp-it 10Gb performance: iperf fast, zfs send slow

chune

Member
Oct 28, 2013
107
22
18
I finally made the jump to 10GbE on my napp-it AIO boxes and thought I had followed all of the recommendations, but I am still only getting gigabit-class speeds on ZFS sends piped through nc. The weird thing is iperf gives me 8.9 Gbits/sec of throughput, so I'm not sure if any of the tunables will help me here. The pool is a stripe of 8 mirrors of 4TB HGST Ultrastars.

This is going from one physical AIO box to another AIO box over the 10Gb link. Everything along the way is configured with a 9000-byte MTU, and the switch is a Cisco SG500XG-8F-8T.
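For reference, the MTU can be double-checked on the OmniOS side with dladm; the link name ixgbe0 below is only a guess for the X520 and should be replaced with whatever dladm show-link reports:

Code:
# show the configured MTU of the 10GbE link (list link names with "dladm show-link")
dladm show-linkprop -p mtu ixgbe0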

Code:
Iperf:
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   876 MBytes  7.34 Gbits/sec
[  4]   1.00-2.00   sec  1.03 GBytes  8.87 Gbits/sec
[  4]   2.00-3.00   sec  1.03 GBytes  8.85 Gbits/sec
[  4]   3.00-4.00   sec  1.04 GBytes  8.94 Gbits/sec
[  4]   4.00-5.00   sec  1.05 GBytes  8.97 Gbits/sec
[  4]   5.00-6.00   sec  1.04 GBytes  8.95 Gbits/sec
[  4]   6.00-7.00   sec  1.04 GBytes  8.93 Gbits/sec
[  4]   7.00-8.00   sec  1.04 GBytes  8.92 Gbits/sec
[  4]   8.00-9.00   sec  1.04 GBytes  8.93 Gbits/sec
[  4]   9.00-10.00  sec  1.04 GBytes  8.92 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec  10.2 GBytes  8.76 Gbits/sec                  sender
[  4]   0.00-10.00  sec  10.2 GBytes  8.76 Gbits/sec                  receiver

Zfs send:
 zfs send pool01-esx/datastore09a-hdd-backup@weekly-1517848527_2018.09.14.23.00.14 | pv | nc -w 20 10.10.10.10 9090
10.2GiB 0:02:21 [74.0MiB/s]
Additional hardware info below:

Intel X520-DA2 NICs
Intel/Finisar SFP modules
2x E5-2690 v2
768 GB ECC RAM
napp-it VM has 16 cores and 64 GB RAM
LSI 2116 HBA

Let me know if you have any suggestions for what to try. My iozone 1GB benchmarks are looking decent too, so I don't think it's pool speed:
[attached screenshot: iozone 1GB benchmark results]
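One quick check (not shown above) is to read the same send stream straight to /dev/null on the source box; if that also tops out around 74 MiB/s, the bottleneck is on the sending side rather than the network or the target:

Code:
# generate the send stream locally and discard it; pv shows the raw read rate
zfs send pool01-esx/datastore09a-hdd-backup@weekly-1517848527_2018.09.14.23.00.14 | pv > /dev/null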
 

Rand__

Well-Known Member
Mar 6, 2014
4,610
918
113
What does single-core load tell you?

IIRC that is the limiting factor here.
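On OmniOS, per-CPU and per-thread load during the send can be watched with the standard illumos tools, along these lines:

Code:
# per-CPU utilization, updated every second
mpstat 1
# per-thread (lwp) microstate accounting; a single thread pinned near 100% of one core
# while the send runs would point at a single-core bottleneck
prstat -mL 1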
 

gea

Well-Known Member
Dec 31, 2010
2,535
856
113
DE
Is the parent of the target filesystem set to sync=enabled?
What is the output of Pools > Benchmark (some benchmarks with sync enabled and disabled)?
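Checking this from the shell looks roughly like the following; pool01-hdd is taken from the benchmark output later in the thread and stands in for the actual target pool:

Code:
# show the sync property on the target pool and everything below it
zfs get -r sync pool01-hdd
# if needed, sync can be switched off on the receiving filesystem (inherited by children)
# zfs set sync=disabled pool01-hdd/some-target-filesystem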
 

chune

Member
Oct 28, 2013
107
22
18
Code:
begin tests ..
Benchmark filesystem: /pool01-hdd/_Pool_Benchmark
Read: filebench, Write: filebench_sequential, date: 06.18.2019

begin test 4 ..singlestreamwrite.f ..
begin test 4sync ..singlestreamwrite.f ..
set sync=disabled
begin test 7 randomread.f ..
begin test 8 randomrw.f ..
begin test 9 singlestreamread.f ..
pool: pool01-hdd

   NAME                       STATE     READ WRITE CKSUM
   pool01-hdd                 ONLINE       0     0     0
     mirror-0                 ONLINE       0     0     0
       c0t5000CCA253C19709d0  ONLINE       0     0     0
       c0t5000CCA253C197FAd0  ONLINE       0     0     0
       c0t5000CCA253C19856d0  ONLINE       0     0     0
     mirror-1                 ONLINE       0     0     0
       c0t5000CCA253C1D3F9d0  ONLINE       0     0     0
       c0t5000CCA253C1D439d0  ONLINE       0     0     0
       c0t5000CCA253C1D43Ad0  ONLINE       0     0     0
     mirror-2                 ONLINE       0     0     0
       c0t5000CCA253C1D443d0  ONLINE       0     0     0
       c0t5000CCA253C1D54Ad0  ONLINE       0     0     0
       c0t5000CCA253C1DF0Dd0  ONLINE       0     0     0
     mirror-3                 ONLINE       0     0     0
       c0t5000CCA253C1E048d0  ONLINE       0     0     0
       c0t5000CCA253C1E4C0d0  ONLINE       0     0     0
       c0t5000CCA253C1E4C8d0  ONLINE       0     0     0
     mirror-4                 ONLINE       0     0     0
       c0t5000CCA253C1E57Cd0  ONLINE       0     0     0
       c0t5000CCA253C1E57Ed0  ONLINE       0     0     0
       c0t5000CCA253C268DEd0  ONLINE       0     0     0


hostname                        san03  Memory size: 65536 Megabytes
pool                            pool01-hdd (recsize=128k, compr=off, readcache=all)
slog                            -
remark                       


Fb3                             sync=always                     sync=disabled                 

Fb4 singlestreamwrite.f         sync=always                     sync=disabled                 
                               197 ops                         9275 ops
                               26.663 ops/s                    1122.947 ops/s
                               9687us cpu/op                   3513us cpu/op
                               25.7ms latency                  0.8ms latency
                                26.5 MB/s                       1122.8 MB/s
________________________________________________________________________________________
 
read fb 7-9 + dd (opt)          randomread.f     randomrw.f     singlestreamr
pri/sec cache=all               95.0 MB/s        149.1 MB/s     1.2 GB/s
 

gea

Well-Known Member
Dec 31, 2010
2,535
856
113
DE
You should insert the output as code (the [+] button in the menus) to make it readable.

If you write to the target filesystem with sync enabled, this would explain your slow results. Your pool offers a write performance of 26.5 MB/s for sequential sync writes vs 1122.8 MB/s with sync disabled. Even with a RAID-10 setup this is expected without a fast Slog (e.g. Intel Optane NVMe, WD Ultrastar SS SAS).

Another explanation for a slow transfer would be a nearly full pool, or jumbo frames on some setups.
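If a fast Slog along the lines gea mentions were added later, the pool-level command would look roughly like this; the device name is purely a placeholder:

Code:
# add a dedicated log (Slog) device to the pool; c9t0d0 is a placeholder device name
zpool add pool01-hdd log c9t0d0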
 

chune

Member
Oct 28, 2013
107
22
18
I have 64 GB of RAM allocated to the OmniOS VM; I read that more RAM is preferred over a SLOG, but maybe that is no longer the case. I do have sync disabled for my pools, but I still get the slow speed on ZFS send. My target pool is empty and the sending pool is 50% full. I understand that my random I/O performance will not be good, but I thought the slow part of a send/receive would be sync writes, and with sync disabled this should be quite fast.
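One way to narrow this down further would be to drop the received stream on the target instead of piping it into zfs receive; if that runs at near 10Gb speed, the network and nc are fine and the slowdown is on the receive/write side. A sketch, reusing the port from the earlier command (some nc builds want "nc -l -p 9090"):

Code:
# on the target box: accept the stream and discard it
nc -l 9090 | pv > /dev/null
# on the source box: same send command as before
zfs send pool01-esx/datastore09a-hdd-backup@weekly-1517848527_2018.09.14.23.00.14 | pv | nc -w 20 10.10.10.10 9090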
 

thulle

New Member
Apr 11, 2019
19
11
3
I'd try adding some buffering in pv on both ends to see if it does anything, i.e.:
zfs send pool01-esx/datastore09a-hdd-backup@weekly-1517848527_2018.09.14.23.00.14 | pv -B 1G | nc -w 20 10.10.10.10 9090
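The receive end is not shown in the thread; a matching buffered pipeline there might look like the following, with the listening port and target filesystem name being assumptions:

Code:
# on the receiving box: listen, buffer 1 GB in pv, then feed zfs receive
nc -l 9090 | pv -B 1G | zfs receive -F pool01-hdd/datastore09a-hdd-backup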
 

gea

Well-Known Member
Dec 31, 2010
2,535
856
113
DE
RAM is performance relevant as it is used as write cache (default max 4 GB / 10% of RAM) and as read cache. Without sync enabled, the content of the RAM write cache is lost on a crash. When you enable sync, every committed write is additionally logged to a ZIL or Slog device so the write can be redone on the next bootup after a crash.

The ZIL is on-pool, while a Slog is an additional drive that can be much faster than the pool itself. Sync write to your pool is very slow, so this would explain bad write values if sync were enabled. A nearly full pool would be another explanation, as would a single weak disk.
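Both points are quick to check; zpool list shows how full the pool is and zpool status shows whether a log device is present:

Code:
# capacity (CAP column) and general layout of the receiving pool
zpool list pool01-hdd
# a dedicated Slog would show up under a "logs" section here
zpool status pool01-hdd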

Jumbo frames can also be a problem. I would retry with jumbo disabled, and check with iostat whether the wait and busy values are roughly equal across the disks.
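A per-disk view during the transfer can be had with iostat; clearly higher wait/busy values on one disk would point at a single weak drive:

Code:
# extended per-device statistics every second; watch the %w (wait) and %b (busy) columns
iostat -xn 1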

btw
The built-in napp-it replication does buffered nc transfers automatically.
 

chune

Member
Oct 28, 2013
107
22
18
The buffer did not appear to help. The weird thing is that if I vMotion a VM from one AIO box to another AIO box, I get the full 10Gb speed. Any other suggestions?
 

gea

Well-Known Member
Dec 31, 2010
2,535
856
113
DE
Have you disabled jumbo frames?
Is the performance OK on transfers between two local filesystems?
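The local-to-local test could be done with the same snapshot, taking the network completely out of the picture; the target filesystem name here is just a placeholder:

Code:
# send and receive on the same box; pv shows the sustained rate
zfs send pool01-esx/datastore09a-hdd-backup@weekly-1517848527_2018.09.14.23.00.14 | pv | zfs receive -F pool01-esx/localtest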