Speed/Throughput Check - 40GbE ConnectX-3 VPIs and ICX6610


firworks

Member
May 7, 2021
I'm just getting my little home lab set up and I've got two servers (an R210 II with an E3-1240 v2, and an R420 with 2x E5-2430L) running ConnectX-3 VPI cards flashed to 40GbE, connected through DACs to the two 40GbE ports on an ICX6610 switch. I've run iperf3 to check whether the network is working properly and I'm seeing ~25Gb/s between them. I'm not sure if that's an expected outcome given some overhead, or if I've got something misconfigured, or if I'm just using iperf wrong / misinterpreting the output.

I've got the network set up for 8000-byte jumbo frames on the switch and on both adapters.
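A quick way to confirm that the 8000-byte MTU actually survives the whole path is a don't-fragment ping sized just under it (7972 bytes of payload = 8000 minus 28 bytes of IP/ICMP headers; syntax assumes Linux iputils ping, and the IP is a placeholder):
Code:
# should succeed at -s 7972 and report "message too long" at -s 7973 if the path MTU is 8000
ping -M do -s 7972 IP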

Here's the command I've been using to test it:
Code:
iperf3 -M 8000 -P 10 -c IP
iperf3 output:
Code:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  3.08 GBytes  2.64 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  3.08 GBytes  2.64 Gbits/sec                  receiver
[  7]   0.00-10.00  sec  3.08 GBytes  2.64 Gbits/sec    0             sender
[  7]   0.00-10.00  sec  3.08 GBytes  2.64 Gbits/sec                  receiver
[  9]   0.00-10.00  sec  3.08 GBytes  2.64 Gbits/sec    0             sender
[  9]   0.00-10.00  sec  3.08 GBytes  2.64 Gbits/sec                  receiver
[ 11]   0.00-10.00  sec  3.08 GBytes  2.64 Gbits/sec    0             sender
[ 11]   0.00-10.00  sec  3.08 GBytes  2.64 Gbits/sec                  receiver
[ 13]   0.00-10.00  sec  3.08 GBytes  2.64 Gbits/sec    0             sender
[ 13]   0.00-10.00  sec  3.08 GBytes  2.64 Gbits/sec                  receiver
[ 15]   0.00-10.00  sec  3.08 GBytes  2.64 Gbits/sec    0             sender
[ 15]   0.00-10.00  sec  3.08 GBytes  2.64 Gbits/sec                  receiver
[ 17]   0.00-10.00  sec  3.08 GBytes  2.64 Gbits/sec    0             sender
[ 17]   0.00-10.00  sec  3.08 GBytes  2.64 Gbits/sec                  receiver
[ 19]   0.00-10.00  sec  3.08 GBytes  2.64 Gbits/sec    0             sender
[ 19]   0.00-10.00  sec  3.08 GBytes  2.64 Gbits/sec                  receiver
[ 21]   0.00-10.00  sec  3.08 GBytes  2.64 Gbits/sec    0             sender
[ 21]   0.00-10.00  sec  3.08 GBytes  2.64 Gbits/sec                  receiver
[ 23]   0.00-10.00  sec  3.08 GBytes  2.64 Gbits/sec    0             sender
[ 23]   0.00-10.00  sec  3.08 GBytes  2.64 Gbits/sec                  receiver
[SUM]   0.00-10.00  sec  30.8 GBytes  26.4 Gbits/sec    0             sender
[SUM]   0.00-10.00  sec  30.8 GBytes  26.4 Gbits/sec                  receiver
I get roughly the same result regardless of which of the two servers is the client and which is the server. Also, watching the servers during the test, neither goes to high CPU utilization; they both stay very low, which I think means a lot of the work is being done on the cards themselves.

So does this seem normal? Abnormal? Does anyone have thoughts on things to tweak or try, or things I might have missed setting this up?
 

fohdeesha

Kaini Industries
Nov 20, 2016
fohdeesha.com
firworks said: (full post quoted above)

I can almost guarantee you're hitting a CPU limit on both sides; the single-threaded performance of that E5-2430L is *really* low (a score of about 1005, roughly the same as a T630 thin client, for reference). For stuff like this, single-thread performance is what matters. You can confirm that's more or less what's going on by running iperf3 without the -P option (so it's only one transfer); it will probably be MUCH lower. A single core should be able to do 7-8Gbps on older systems no issue at MTU 1500, and with 8000-byte frames it should peg 9-10Gbps pretty easily. I have a feeling on these CPUs you'll see closer to ~4Gbps (single transfer).
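A minimal single-stream check along those lines (the IP is a placeholder) would be:
Code:
# one TCP stream means one core does all the work; if this lands well below the
# 10-stream sum above, the bottleneck is per-core CPU rather than the link
iperf3 -c IP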

Depending on the tool you're using to monitor CPU usage you might not see it, as the load is almost entirely the CPU servicing interrupts, which is more noticeable on the receiving side. Another sign you're hitting a CPU wall would be if you get slightly better results when *sending* from the slower processor (the 2430L), as generating transmit interrupts is slightly easier scheduler-wise than servicing *receive* interrupts.
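One way to see that kind of load, assuming Linux hosts with the sysstat package installed (and the mlx4 driver that ConnectX-3 cards normally use), is to watch per-core softirq time and the NIC's interrupt counters while the test runs:
Code:
# per-core breakdown every second; a single core sitting near 100% in %soft/%irq
# while overall utilization looks low is the classic single-thread bottleneck
mpstat -P ALL 1
# interrupt counts per NIC queue, refreshed every second
watch -n1 'grep mlx4 /proc/interrupts'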
 

firworks

Member
May 7, 2021
I think you are right, and I did confirm that the throughput is much higher in one direction than the other. I must have misremembered testing it both ways, or confused it with testing the 10GbE card in my workstation. I just grabbed 2x E5-2450 v2s off eBay for what I hope was a good price ($89) to test out the theory.
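For anyone checking the same thing, iperf3 can test both directions from one side with the -R (reverse) flag, so the client and server roles don't have to be swapped by hand:
Code:
iperf3 -c IP        # client sends, server receives
iperf3 -c IP -R     # server sends, client receives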
 

NablaSquaredG

Layer 1 Magician
Aug 17, 2020
Have you also compared without the ICX6610 in between?

I'm almost certain that the ICX6610 is not slowing down anything at all, but a direct test would rule it out.
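A rough sketch of that comparison, assuming Linux hosts and made-up interface names/addresses: connect a DAC directly between the two cards, put both ends on a throwaway subnet, and rerun the same test.
Code:
# server A (interface name ens1 is an assumption)
ip addr add 192.168.99.1/24 dev ens1
iperf3 -s
# server B
ip addr add 192.168.99.2/24 dev ens1
iperf3 -c 192.168.99.1 -P 10
# the negotiated link speed on each NIC can also be checked (expect Speed: 40000Mb/s)
ethtool ens1 | grep Speed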