How to achieve SMOKIN' ZFS send/recv transfers


K D

Well-Known Member
Dec 24, 2016
Source is a bare-metal FreeNAS box. Destination is an AIO FreeNAS VM with 2 vCPUs and 24 GB RAM. The CPU is barely being used on either system.

Using netcat as described in the OP.

There is no SLOG device in the destination pool. It is a 2x 5x8TB RAIDZ2 pool. Sync writes are not enabled.
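For anyone following along, the nc-based transfer method goes roughly like this (a minimal sketch; the port, IP, and dataset names are placeholders, and the OP's exact commands may differ):

Code:
# On the receiving box: listen on a port and pipe the stream into zfs recv
nc -l 3333 | zfs recv -v destpool/dataset

# On the sending box: stream the snapshot to the receiver
zfs send -R srcpool/dataset@snap | nc <receiver-ip> 3333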
 

K D

Well-Known Member
Dec 24, 2016
The pool was originally created in Napp-IT and then imported into FreeNAS. I thought maybe that was contributing to the problem, so I deleted and recreated the pool in FreeNAS 11 U3. The issue continued.

It's been running for 5 Hrs and only 460GB copied. I cancelled the file transfer for now.

I tried the other way round. Copying from t1stor to astor via netcat gave me a good 600MB/s.

Code:
root@astor:~ # nc 172.16.10.81 3333 | zfs recv -v m2tank1/test/esxi | pv
receiving full stream of tank1/esxi@testsnap into m2tank1/test/esxi@testsnap
received 121GB stream in 204 seconds (608MB/sec)                               ]
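(As an aside, with pv placed after zfs recv it only measures recv's verbose text output; to watch the stream rate itself, pv would sit between nc and recv, something like:)

Code:
# pv between nc and zfs recv shows the throughput of the replication stream itself
nc 172.16.10.81 3333 | pv | zfs recv -v m2tank1/test/esxi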
download.png

The blue line is the original transfer that I cancelled. The green spike is when I sent data from this box to the source as a test, where I got ~600 MB/s.

Someone please give me a thread to pull to unravel this mystery.
 

whitey

Moderator
Jun 30, 2014
Let me think on this for a bit and I'll report back; I have a couple of thoughts in my head but may need some clarification.
 

whitey

Moderator
Jun 30, 2014
Ok, collected my thoughts.

First off, on the src/phys system: is it also FreeNAS, and if so what release - the same as the AIO version or different? Is that phys ZFS box also using CX3 cards connected to your Ubiquiti 10G switch? I initially wanted to say maybe check jumbo frames on the phys system, but with it receiving data at a good rate I steer away from that.

What is the dest pool, '2x 5x8TB RAIDZ2'? Is that a striped raidz2 (a la raid60) pool of 10 total 8TB disks, 5 disks in each vdev? Any enterprise-class ZIL device behind those? Again, without a ZIL and with sync set to disabled (if I interpreted that right) I may be chasing my tail there.

Receive buffer size maybe on the AIO ZFS VM... again, feels like a stretch. If you had two AIO FreeNAS ZFS systems previously sending data at a MUCH faster rate with the nc transfer method, I have to keep coming back to 'what is different on the phys system/setup', but the fact that it can receive data quickly/at an acceptable speed maybe points back to your raid60 not being able to cut the mustard.
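If it comes to checking receive buffers, these are the FreeBSD tunables typically looked at (a minimal sketch; the values below are only examples for a test, not recommendations):

Code:
# Current ceilings for socket and TCP buffers on FreeBSD/FreeNAS
sysctl kern.ipc.maxsockbuf
sysctl net.inet.tcp.recvbuf_max net.inet.tcp.sendbuf_max
# Example of raising them temporarily for a test
sysctl kern.ipc.maxsockbuf=16777216
sysctl net.inet.tcp.recvbuf_max=8388608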

I'm flailing here hah.
 

K D

Well-Known Member
Dec 24, 2016
Here are the System Details

Source - astor
CPU/Mobo - X10SRL-F with E5-2630 v4
RAM - 64 GB
HBA - 3x H310 flashed to LSI P20 in IT mode
NIC - Mellanox ConnectX-2 Single Port 10Gb SFP+
Chassis - Supermicro 846 with A backplane
OS - FreeNAS-11.0-U3 (c5dcf4416)
SLOG - ZeusRAM (sync=disabled on all datasets though)
DISKS - 12x HGST CoolSpin 4TB and 4x WD Red 4TB
ZPool Config:
Code:
        NAME                                            STATE     READ WRITE CKSUM                                                
        m2tank1                                         ONLINE       0     0     0                                                
          raidz1-0                                      ONLINE       0     0     0                                                
            gptid/3127b9e0-6a6e-11e7-9c12-000c2974005c  ONLINE       0     0     0                                                
            gptid/37e5910f-6a6e-11e7-9c12-000c2974005c  ONLINE       0     0     0                                                
            gptid/3eb1f2ca-6a6e-11e7-9c12-000c2974005c  ONLINE       0     0     0                                                
            gptid/45828779-6a6e-11e7-9c12-000c2974005c  ONLINE       0     0     0                                                
          raidz1-1                                      ONLINE       0     0     0                                                
            gptid/e4f24c5f-6a6e-11e7-9c12-000c2974005c  ONLINE       0     0     0                                                
            gptid/ebc333fc-6a6e-11e7-9c12-000c2974005c  ONLINE       0     0     0                                                
            gptid/f28004dc-6a6e-11e7-9c12-000c2974005c  ONLINE       0     0     0                                                
            gptid/f9574a77-6a6e-11e7-9c12-000c2974005c  ONLINE       0     0     0                                                
          raidz1-2                                      ONLINE       0     0     0                                                
            gptid/37e631db-6a6f-11e7-9c12-000c2974005c  ONLINE       0     0     0                                                
            gptid/391b4bfa-6a6f-11e7-9c12-000c2974005c  ONLINE       0     0     0                                                
            gptid/3a5029bb-6a6f-11e7-9c12-000c2974005c  ONLINE       0     0     0                                                
            gptid/3b8a0cf8-6a6f-11e7-9c12-000c2974005c  ONLINE       0     0     0                                                
          raidz1-4                                      ONLINE       0     0     0                                                
            gptid/1ddab984-79a3-11e7-b72c-000c2974005c  ONLINE       0     0     0                                                
            gptid/1e767afe-79a3-11e7-b72c-000c2974005c  ONLINE       0     0     0                                                
            gptid/1f15a641-79a3-11e7-b72c-000c2974005c  ONLINE       0     0     0                                                
            gptid/afce452c-a017-11e7-abef-0cc47adba166  ONLINE       0     0     0                                                
        logs                                                                                                                      
          gptid/29a6dc14-6a71-11e7-94a6-000c2974005c    ONLINE       0     0     0
Target - t1stor
CPU/Mobo - X11SSH-CTF with E3-1375 v6
RAM - 64 GB with 24GB allocated to the FreeNAS VM
HBA - Onboard LSI 3008 P14 IT mode
NIC - Mellanox ConnectX-2 Single Port 10Gb SFP+ on the host. Using a vmxnet3 vNIC for FreeNAS
Chassis - Supermicro 846 with SAS2EL1 backplane (both ports from the onboard LSI 3008 connected to the backplane)
OS - FreeNAS-11.0-U3 (c5dcf4416) VM on ESXi 6.5 U1
DISKS - 10x WD Red 4TB
ZPool Config:

Code:
        tank                                            ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/4170a89f-a102-11e7-a034-0050569ef91e  ONLINE       0     0     0
            gptid/41f6895f-a102-11e7-a034-0050569ef91e  ONLINE       0     0     0
            gptid/42794d26-a102-11e7-a034-0050569ef91e  ONLINE       0     0     0
            gptid/43028398-a102-11e7-a034-0050569ef91e  ONLINE       0     0     0
            gptid/4388437c-a102-11e7-a034-0050569ef91e  ONLINE       0     0     0
          raidz1-1                                      ONLINE       0     0     0
            gptid/441cca3c-a102-11e7-a034-0050569ef91e  ONLINE       0     0     0
            gptid/44a4e59f-a102-11e7-a034-0050569ef91e  ONLINE       0     0     0
            gptid/45394570-a102-11e7-a034-0050569ef91e  ONLINE       0     0     0
            gptid/45c2662a-a102-11e7-a034-0050569ef91e  ONLINE       0     0     0
            gptid/464be365-a102-11e7-a034-0050569ef91e  ONLINE       0     0     0
Both systems are connected via Mellanox DACs to a UniFi US-16-XG switch. Jumbo frames are not enabled anywhere.
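(For completeness, a quick way to confirm the MTU matches end to end; the interface names below are examples, not necessarily the ones on these boxes:)

Code:
# Inside the FreeNAS VM (vmxnet3 interface)
ifconfig vmx0 | grep mtu
# On the bare-metal FreeNAS (Mellanox interface)
ifconfig mlxen0 | grep mtu
# On the ESXi host, lists vSwitch MTU among other details
esxcfg-vswitch -l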
 

K D

Well-Known Member
Dec 24, 2016
I plugged drives from another pool into t1stor for testing and was able to do a local transfer using zfs send with the following command. It transferred 33TB in 15 hrs (roughly 600+ MB/s), so it doesn't seem to be a write speed bottleneck on the target pool.

Code:
zfs send -R tank2/ds@initial | pv | zfs recv tank/ds
 

Rand__

Well-Known Member
Mar 6, 2014
CPU utilization? The 2630 will have much lower peak per-core performance, and if it's doing the heavy lifting it might fail to deliver (haven't been following the initial post to see whether it covers that).
 

Rand__

Well-Known Member
Mar 6, 2014
10-core CPU, so 1 core = 10% overall; a (potentially) single-threaded copy operation -> take a closer look ;)
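(For example, per-core stats make a single pegged core easy to spot; on FreeBSD something like:)

Code:
# -P shows per-CPU usage, -H shows individual threads, -S includes system processes
top -SHP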
 

K D

Well-Known Member
Dec 24, 2016
10-core CPU, so 1 core = 10% overall; a (potentially) single-threaded copy operation -> take a closer look ;)
Good point. Looking at 2 scenarios:

1. Copy from astor (bare metal) to t1stor (AIO) - speed of around ~50-60 MB/s using netcat.
2. Copy from t1stor (AIO) to astor (bare metal) - speed of around ~600 MB/s using netcat.

See graph for CPU utilization during those times. Barely felt it.
download.png

Also, recently when I replaced a disk in the pool in astor, it completed the operation in about 3.5 hrs at a peak speed of 3.6GB/sec. I don't think the CPU is a bottleneck here.
 

Rand__

Well-Known Member
Mar 6, 2014
- You ran iperf to see whether that's fine both ways? (Quick sketch below.)
- Increased receive buffer sizes on the ESX box? Maybe vmxnet3 does not auto-scale correctly?
- I don't suppose you can exchange disks between the two boxes for testing? ;)
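(A minimal both-ways iperf sketch, assuming the stock iperf on FreeNAS; the IPs are placeholders:)

Code:
# Run a server on each box:
iperf -s
# Then test each direction separately from the other side:
iperf -c <astor-ip> -t 30      # from the AIO VM towards the bare-metal box
iperf -c <t1stor-ip> -t 30     # from the bare-metal box towards the AIO VM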
 

K D

Well-Known Member
Dec 24, 2016
@Rand__

I looked back at my notes and see that I had run iperf only one way. Ran it both ways now and see a noticeable difference.

The first test is from the AIO to bare metal and the second is the reverse. I also see that the TCP window size is different in the two runs. I have no idea what that signifies though.

iperf.png
 

K D

Well-Known Member
Dec 24, 2016
I also tried checking the connection from a Windows VM running on the AIO, using the same port group as the FreeNAS VM.

iperf 2.png
 

Rand__

Well-Known Member
Mar 6, 2014
Well, there at least is a reason, if not an explanation.
I am facing a 'one-way express lane' issue myself at the moment and have not wrapped my head around how that happens, so definitely interested in those explanations :)

You can of course ramp up iperf with larger window sizes (-w 1M) or multiple threads (-P 4/8), but without knowing the exact details of the helper program that does the latter test it's always a bit optimistic and just proves theoretical capability.
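(Roughly like this; the window size and stream count are just examples:)

Code:
# Server side, allow a larger window
iperf -s -w 1M
# Client side: larger window, 4 parallel streams, 30 second run
iperf -c <server-ip> -w 1M -P 4 -t 30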
 

K D

Well-Known Member
Dec 24, 2016
I see the same behavior when connecting from the ESXi host to the bare-metal FreeNAS server. So the issue seems to be with ESXi networking.

iperf.png
 

K D

Well-Known Member
Dec 24, 2016
You mean in a VM or bare metal on the host? I can do that. Let me know what I need to do.
 

Rand__

Well-Known Member
Mar 6, 2014
Bare metal ideally, to test whether it's ESX or not - just run iperf there to see whether it's similar (hardware issue) or not (software).