Can't get more than 20Gbps out of a 40GbE network - Suggestions?


Perry

Member
Sep 22, 2016
First, some background: we're a small motion picture film scanning and restoration business. The files we work with are very large (a typical feature film will be about 5-6TB in size, at roughly 45MB per frame with somewhere between 120,000 and 150,000 sequentially numbered frames). Our throughput requirement for 4K film scans is about 1200MB/second as a baseline minimum.
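(For anyone checking the math, the baseline follows from the frame size; this sketch assumes 24fps playback, which is standard for film but not stated above:)

Code:
# 45 MB/frame * 24 frames/s = 1080 MB/s sustained for real-time playback
# add working headroom and you land at the ~1200 MB/s baseline
# on the wire: 1200 MB/s * 8 bits/byte = 9.6 Gbit/s per stream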

We've always done this with direct-attach SAS or SATA RAIDs, but moving a file set like that from one machine to the next is too time-consuming. We're moving to a more centralized model where files will be scanned directly to a shared volume on a server (or to iSCSI, if that proves to perform better); then all the other machines that need to manipulate those files will have access without having to make copies.

So this summer I bought a new (old stock) IBM G8316 40GbE 16-port switch and began setting up a FreeNAS server. That's all up and running, and we're starting some performance testing, first on network speed and then on drive speed.

The issue I'm finding is that the fastest iperf speeds I get top out at 21Gbps. While that's nothing to sneeze at, we need the server to work at a full 40 (at minimum) so that it can handle 2-3 simultaneous operations, whether that's different machines hitting it at the same time or a single machine reading from one shared volume and writing to another, such as when rendering out the final product.

So here are the details:

Switch: IBM G8316 w/v7.8.10 firmware

FreeNAS:
Supermicro X10SRL-F-O
Xeon E5-1620 v3 @ 3.50GHz
64GB ECC RAM 2133MHz
Chelsio T580-SO-CR 40GbE NIC
2x IBM M1015 (8i) in IT mode
1x LSI SAS9201-16e
8x Seagate 6TB ST6000DM001
Norco Enclosure w/20 hot-swap bays (8 in use for now)
External JBOD Enclosure with 16 hot-swap bays (currently empty)

Workstation 1 (most of our workstations are nearly identical to this spec):
ASRock X99 Extreme 4 motherboard
i7 5930k/64GB RAM
Mellanox ConnectX-3 VPI single-port 40GbE
Windows 7 Ultimate

On all machines the MTU is set to 9000, which actually gave us a 30% speed boost over the default 1500.
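(For reference, on the FreeNAS/FreeBSD side this is a one-line change; cxl0 is an assumed name for the Chelsio T5 interface, so check ifconfig output for yours. On Windows 7 it's the Jumbo Packet property in the adapter's advanced settings.)

Code:
# set jumbo frames on the Chelsio interface (interface name is an assumption)
ifconfig cxl0 mtu 9000
# confirm it took:
ifconfig cxl0 | grep mtu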

When I run iperf on the Windows machine as a server, then connect to it from the FreeNAS box, I never see speeds over 21Gbps, even though all machines and the switch are reporting 40Gb connections.
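(The test itself is just the iperf2 defaults, roughly this, using the addresses that appear later in the thread:)

Code:
# server side, on the Windows 7 box:
iperf.exe -s
# client side, on the FreeNAS box (10.0.0.4 is the Windows machine):
iperf -c 10.0.0.4 -t 10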

Any suggestions on where to go next? I want to get the network bottlenecks cleared up before I start doing heavy disk testing, so I know I'm not dealing with throughput issues on the 40GbE side.

Thanks!
 

pyro_

Active Member
Oct 4, 2013
Can you try a direct connection between the server and workstation, leaving the switch out, and see what kind of performance you get with iperf?
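(For that test, a point-to-point link just needs static addresses on both ends; a sketch, reusing the thread's subnet and an assumed interface name:)

Code:
# FreeNAS end:
ifconfig cxl0 inet 10.0.0.2/24 mtu 9000
# Windows end: set a static IP such as 10.0.0.4/24 on the Mellanox adapter,
# then re-run the same iperf commands over the DAC cable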
 

Perry

Member
Sep 22, 2016
"bottleneck might be at the CPU clock and RAM speed?"

Is there any tool I can use to test that theory? Unfortunately, all of the PCs that will have 40GbE NICs installed are basically the same spec as the workstation listed above.

If it makes any difference, CPU usage on both the FreeNAS and Windows machine is pretty low. But I suspect you're talking about something different.
 

Patrick

Administrator
Staff member
Dec 21, 2010
@Perry, is this system being actively used for data now? Is there any chance you can try booting a Ubuntu 16.04.1 LTS installation (or LiveCD) on two systems and using iperf on that? (BTW, I personally use iperf3.)

Also, are you doing multiple iperf streams or a single one?

Here is my thinking: you have Windows to FreeBSD and different adapters on both ends. It would be good to start with a simple configuration and work from there. Eliminating the OS difference would at least help you rule that out. So pull a live CD and try the Windows machine against Linux, then try FreeNAS against Linux. That simple progression will let you rule out a lot of possibilities. I had an issue with beta hardware once, and that progression saved my sanity.
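(A minimal sketch of that progression's first step, assuming the live session can reach the package archives:)

Code:
# on each Ubuntu 16.04 live session:
sudo apt-get install iperf3
# one box runs the server:
iperf3 -s
# the other runs the client; repeat with -P 4 to compare parallel streams:
iperf3 -c <server-ip> -t 30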
 

Perry

Member
Sep 22, 2016
It's currently in testing and we're not using it for real data yet, so I can do most of the stuff you suggest in the next few days. We currently have one NIC in Windows, one in a CentOS box, and one in the FreeNAS system; the cards are installed, but we haven't installed drivers or updated the card firmware yet (a process in and of itself...). A second Windows system will need to be pulled off the rack before I can install its card, but we're working on a project on that machine, so that might take a couple of days.

We're using iperf2 because v3 isn't installed on FreeNAS, as far as I can tell. I've been trying to avoid installing anything custom on that machine if I can help it. What's the advantage of iperf3 over iperf2?
 

_alex

Active Member
Jan 28, 2016
Bavaria / Germany
I don't know if iSER would work with Windows and FreeNAS, but it could be worth considering to get rid of the TCP/IP overhead. I can run some iperf tests on my IPoIB over QDR InfiniBand later, but as I remember, the numbers were much lower than 40Gb too.
 

Perry

Member
Sep 22, 2016
Half the nominal speed seems really slow. I'd expect there to be some overhead, but not 50%!
 

Patrick

Administrator
Staff member
Dec 21, 2010
@Perry - TBH, I am not the best person on the iperf/iperf2/iperf3 differences. I have just done T580 to CX314A using iperf3.

Did you try the -P option for making parallel streams?
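(On the client side it looks like this; iperf2 takes the same flag:)

Code:
iperf3 -c <server-ip> -P 4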
 

markpower28

Active Member
Apr 9, 2013
What kind of PCIe slot did you put the 40GbE NIC in? I would recommend trying a Gen 3 PCIe x16 slot first to see if that makes any difference.
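(One quick way to check what the slot actually negotiated; a sketch, and the exact output format varies by OS version:)

Code:
# FreeBSD/FreeNAS: dump PCI capabilities; the PCI-Express line shows link width/speed
pciconf -lc
# Linux: compare LnkCap (what the card supports) vs LnkSta (what it negotiated)
sudo lspci -vv | grep -E 'LnkCap|LnkSta'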
 

_alex

Active Member
Jan 28, 2016
Bavaria / Germany
I guess what you're seeing isn't so unusual. Here is my iperf (-P 1 / 4 / 8) for IPoIB between two nodes, each with a QDR ConnectX-2 card and an IS-5022 switch in between. No tuning or extra settings on the IP stack, since I use these links for SRP + iSER.
The OS is Proxmox 4.2 with the in-tree InfiniBand stack.


Code:
iperf -s -B 10.10.0.20
------------------------------------------------------------
Server listening on TCP port 5001
Binding to local address 10.10.0.20
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 10.10.0.20 port 5001 connected with 10.10.0.10 port 44386
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  27.4 GBytes  23.5 Gbits/sec
[  5] local 10.10.0.20 port 5001 connected with 10.10.0.10 port 44466
[  4] local 10.10.0.20 port 5001 connected with 10.10.0.10 port 44468
[  7] local 10.10.0.20 port 5001 connected with 10.10.0.10 port 44472
[  6] local 10.10.0.20 port 5001 connected with 10.10.0.10 port 44470
[  4]  0.0-10.0 sec  7.30 GBytes  6.27 Gbits/sec
[  5]  0.0-10.0 sec  7.24 GBytes  6.21 Gbits/sec
[  7]  0.0-10.0 sec  7.13 GBytes  6.12 Gbits/sec
[  6]  0.0-10.0 sec  7.27 GBytes  6.24 Gbits/sec
[SUM]  0.0-10.0 sec  28.9 GBytes  24.8 Gbits/sec
[  8] local 10.10.0.20 port 5001 connected with 10.10.0.10 port 44528
[  4] local 10.10.0.20 port 5001 connected with 10.10.0.10 port 44530
[  5] local 10.10.0.20 port 5001 connected with 10.10.0.10 port 44532
[  6] local 10.10.0.20 port 5001 connected with 10.10.0.10 port 44534
[  7] local 10.10.0.20 port 5001 connected with 10.10.0.10 port 44536
[  9] local 10.10.0.20 port 5001 connected with 10.10.0.10 port 44538
[ 10] local 10.10.0.20 port 5001 connected with 10.10.0.10 port 44540
[ 11] local 10.10.0.20 port 5001 connected with 10.10.0.10 port 44542
[  5]  0.0-10.0 sec  3.23 GBytes  2.77 Gbits/sec
[  8]  0.0-10.0 sec  3.94 GBytes  3.38 Gbits/sec
[  4]  0.0-10.0 sec  3.26 GBytes  2.80 Gbits/sec
[  6]  0.0-10.0 sec  3.55 GBytes  3.05 Gbits/sec
[  7]  0.0-10.0 sec  3.27 GBytes  2.80 Gbits/sec
[  9]  0.0-10.0 sec  4.15 GBytes  3.56 Gbits/sec
[ 10]  0.0-10.0 sec  3.71 GBytes  3.19 Gbits/sec
[ 11]  0.0-10.0 sec  3.88 GBytes  3.33 Gbits/sec
[SUM]  0.0-10.0 sec  29.0 GBytes  24.9 Gbits/sec
Alex
 

markpower28

Active Member
Apr 9, 2013
ConnectX-2 has a bus limitation: ~23Gbit/s is the best you can do. It's Gen 2 PCIe x8.
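(The bus math works out like this:)

Code:
# PCIe 2.0: 5 GT/s per lane with 8b/10b encoding -> 4 Gbit/s usable per lane
# x8 link: 8 lanes * 4 Gbit/s = 32 Gbit/s raw
# after TLP/protocol overhead, low-to-mid 20s Gbit/s at the application is typical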

 

markpower28

Active Member
Apr 9, 2013
There is also protocol overhead with ConnectX-2; that's why in Windows you see 32 instead of 40Gb for IB. ConnectX-3 is better.

 

Perry

Member
Sep 22, 2016
Both the Chelsio card in the FreeNAS box and the ConnectX-3 VPI in the Windows machine are x8 PCIe 3.0 cards. Both are in x16 or x8 PCIe 3.0 slots, and the slots should be running at full bandwidth (that is, the CPU in each machine has enough lanes that all of them are available). I don't think this is the issue, because x8 should be good for about 63Gbps; there's plenty of bandwidth on the card/motherboard side.
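(For the curious, the x8 figure comes from the lane math:)

Code:
# PCIe 3.0: 8 GT/s per lane with 128b/130b encoding -> ~7.88 Gbit/s per lane
# x8 link: 8 lanes * 7.88 Gbit/s = ~63 Gbit/s raw, comfortably above a 40GbE port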

So I ran iperf in parallel mode:

SERVER (Windows 7):
Code:
C:\Users\perry\Downloads\iperf-2.0.9-win64\iperf-2.0.9-win64>iperf.exe -s -P 4
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  208 KByte (default)
------------------------------------------------------------
[  4] local 10.0.0.4 port 5001 connected with 10.0.0.2 port 15367
[  5] local 10.0.0.4 port 5001 connected with 10.0.0.2 port 52129
[  6] local 10.0.0.4 port 5001 connected with 10.0.0.2 port 13462
[  7] local 10.0.0.4 port 5001 connected with 10.0.0.2 port 28235
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-13.0 sec  11.3 GBytes  7.47 Gbits/sec
[  5]  0.0-13.0 sec  11.4 GBytes  7.50 Gbits/sec
[  6]  0.0-13.0 sec  11.0 GBytes  7.29 Gbits/sec
[  7]  0.0-13.0 sec  11.1 GBytes  7.30 Gbits/sec
[SUM]  0.0-13.0 sec  44.8 GBytes  29.6 Gbits/sec
CLIENT (FreeNAS):
Code:
[root@freenas ~]# iperf -c 10.0.0.4 -P 4                                       
------------------------------------------------------------                   
Client connecting to 10.0.0.4, TCP port 5001                                   
TCP window size: 4.00 MByte (default)                                         
------------------------------------------------------------                   
[  8] local 10.0.0.2 port 28235 connected with 10.0.0.4 port 5001             
[  7] local 10.0.0.2 port 13462 connected with 10.0.0.4 port 5001             
[  6] local 10.0.0.2 port 52129 connected with 10.0.0.4 port 5001             
[  9] local 10.0.0.2 port 15367 connected with 10.0.0.4 port 5001             
[ ID] Interval       Transfer     Bandwidth                                   
[  8]  0.0-10.0 sec  11.1 GBytes  9.50 Gbits/sec                               
[  7]  0.0-10.0 sec  11.0 GBytes  9.49 Gbits/sec                               
[  6]  0.0-10.0 sec  11.4 GBytes  9.76 Gbits/sec                               
[  9]  0.0-10.0 sec  11.3 GBytes  9.72 Gbits/sec                               
[SUM]  0.0-10.0 sec  44.8 GBytes  38.5 Gbits/sec
So if the client side is to be believed, I'm getting damn near 40Gbps throughput overall, which is good news. But why would a single transfer be capped at about 21? That suggests a configuration issue on one end. Odd that the server is reporting drastically different numbers, too. (And just to check, I made the FreeNAS machine the server and the Windows machine the client, and I see identical numbers, only reversed; no matter which end is the server, it reports a smaller number than the client.) What's that about?

In the end, it may be that this works out fine, at least for now. 4K uncompressed image sequences run right around 10Gbps when played in real time, so if we can solidly get a little more than that, we can theoretically read from one volume and write to another on the server without taking a speed hit. And ultimately, that's the goal.
 

KioskAdmin

Active Member
Jan 20, 2015
29.6Gbps is a lot more than 10!

That's "odd". Can you try -P 8 and see if that is better?

How are you filling 30Gbps? It must be a big RAID 0 or RAID 10-style pool of SSDs, or all NVMe?

38.5 is close enough to 40 that I'd say there isn't some gigantic problem there.
 

Perry

Member
Sep 22, 2016
-P 8 gets it to just under 39Gbps, and it's more consistent between the client and server. Strange that the numbers would differ so much with fewer connections.

But, like I said, if we can have 2-3 streams going at a bit more than 10Gbps each, we should be good to go. Our largest current file format is a 4K DPX image sequence, which we use in our color correction system. That works out to about 1300MB/s, so it's right around 10Gbps. With a simple direct-attached 8-disk RAID 0 we can easily do 1500MB/s. From a design standpoint, I'd prefer the connection to have at least 25% headroom over what the drives can handle, just to avoid dropped frames on playback due to networking issues (especially when a client is in the room).
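(The math on those figures, for reference:)

Code:
# 1300 MB/s * 8 bits/byte = 10.4 Gbit/s per real-time 4K DPX stream
# the disks: 1500 MB/s * 8 = 12 Gbit/s
# 25% headroom over the disks: 1500 MB/s * 1.25 = 1875 MB/s -> 15 Gbit/s per link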

I'm fairly new to ZFS and FreeNAS, but we are currently able to do over 1500MB/s reads off of the server with a simple striped pool of 8x Seagate 6TB SATA 3 drives. What I haven't tried yet is setting up the additional workstations and having them hit the server at the same time to see what it does to performance. Our current plan is to install drives in pools of 8 disks, so that we can read from one volume and write to another, and hopefully avoid disk-related performance problems.