Mellanox MCX354-FCBT on M.2 riser connected at 40Gbit only achieving 6.5Gbit in iperf3


unphased

Active Member
Jun 9, 2022
I have one of these cards in each of two systems, one running Windows 10 and one running Ubuntu 20.04.

I'm getting the same speed in iperf3 whether the Windows machine is the server or the client, and regardless of whether parallel streams are used. A DAC connects the two machines.

Has anyone else tried bottlenecking these PCIe 3.0 x8 cards to x4 lanes via an M.2 slot or another electrically-x4 slot? I expected to possibly be held to 20Gbit, but not under 10Gbit!

Oh, and yes, I have set the MTU to 9000 on both ends in the standard network manager settings on both OSes, but that's the only tweaking I've done. Perhaps I need to do more? Enabling jumbo frames did bring the speed up from 5.5Gbit to 6.5Gbit, I'll note.
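For what it's worth, the jumbo-frame setting can be double-checked from the CLI too (a rough sketch assuming NetworkManager, an interface named enp4s0, and a connection called "Wired connection 1" — actual names will differ):

Code:
# confirm the interface is actually running with MTU 9000
ip link show enp4s0 | grep mtu
# or set it persistently through NetworkManager
nmcli connection modify "Wired connection 1" 802-3-ethernet.mtu 9000
nmcli connection up "Wired connection 1"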

Ohhhh, I have not installed any Mellanox (mlnx) drivers on the Ubuntu side, I guess that might be it??
 

i386

Well-Known Member
Mar 18, 2016
In my experience iperf3 has always underperformed on Windows machines (with different 10/40/100GbE NICs from Mellanox and Intel).

If I were you I would test Windows to Windows with another tool (my favorite is GitHub - microsoft/ntttcp) or Linux to Linux with iperf3.
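For the Linux-to-Linux iperf3 run, something like this is the usual starting point (a sketch; 10.10.10.1 is just an example address):

Code:
# on machine A
iperf3 -s
# on machine B: 4 parallel streams, report in Gbit/s; add -R to test the reverse direction
iperf3 -c 10.10.10.1 -P 4 -f g
iperf3 -c 10.10.10.1 -P 4 -f g -R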
 

unphased

Active Member
Jun 9, 2022
Goodness, it's not easy to get this Linux driver installed. I'm on kernel 5.13.0-44-generic and I've tried installing all of these:

Code:
❯ ls mlnx*.tgz -1                          
mlnx-en-4.9-4.0.8.0-ubuntu20.04-x86_64.tgz  
mlnx-en-4.9-4.1.7.0-ubuntu20.04-x86_64.tgz  
mlnx-en-5.0-1.0.0.0-ubuntu20.04-x86_64.tgz  
mlnx-en-5.1-1.0.4.0-ubuntu20.04-x86_64.tgz  
mlnx-en-5.4-3.1.0.0-ubuntu20.04-x86_64.tgz  
mlnx-en-5.6-1.0.3.5-ubuntu20.04-x86_64.tgz
To no avail. Only three of these have any support for the CX3; from those three attempts, the DKMS build logs end with:

Code:
❯ tail -n10 /tmp/mlnx*.logs/mlnx-en-dkms.make.log                                                                       
==> /tmp/mlnx-en.2150037.logs/mlnx-en-dkms.make.log <==                                                                 
checking for /boot/kernel.h... no                                                                                       
checking for /var/adm/running-kernel.h... no                                                                            
checking for /usr/src/linux-headers-5.13.0-44-generic/.config... yes                                                    
checking for /usr/src/linux-headers-5.13.0-44-generic/include/generated/autoconf.h... yes                               
checking for /usr/src/linux-headers-5.13.0-44-generic/include/linux/kconfig.h... yes                                    
checking for build ARCH... ARCH=, SRCARCH=x86                                                                           
checking for cross compilation... no                                                                                    
checking for external module build target... configure: error: kernel module make failed; check config.log for details  
                                                                                                                        
Failed executing ./configure                                                                                            
==> /tmp/mlnx-en.2209823.logs/mlnx-en-dkms.make.log <==                                                                 
checking for /usr/src/linux-headers-5.13.0-44-generic... yes                                                            
checking for Linux objects dir... /usr/src/linux-headers-5.13.0-44-generic                                              
checking for /boot/kernel.h... no                                                                                       
checking for /var/adm/running-kernel.h... no                                                                            
checking for /usr/src/linux-headers-5.13.0-44-generic/.config... yes                                                    
checking for /usr/src/linux-headers-5.13.0-44-generic/include/generated/autoconf.h... no                                
checking for /usr/src/linux-headers-5.13.0-44-generic/include/linux/autoconf.h... no                                    
configure: error: Run make config in /usr/src/linux-headers-5.13.0-44-generic.                                          
                                                                                                                        
Failed executing ./configure                                                                                            
==> /tmp/mlnx-en.2260324.logs/mlnx-en-dkms.make.log <==                                                                 
checking for /usr/src/linux-headers-5.13.0-44-generic... yes                                                            
checking for Linux objects dir... /usr/src/linux-headers-5.13.0-44-generic                                              
checking for /boot/kernel.h... no                                                                                       
checking for /var/adm/running-kernel.h... no                                                                            
checking for /usr/src/linux-headers-5.13.0-44-generic/.config... yes                                                    
checking for /usr/src/linux-headers-5.13.0-44-generic/include/generated/autoconf.h... no                                
checking for /usr/src/linux-headers-5.13.0-44-generic/include/linux/autoconf.h... no                                    
configure: error: Run make config in /usr/src/linux-headers-5.13.0-44-generic.                                          
                                                                                                                        
Failed executing ./configure
It's enough information to potentially move forward with, but this is not very fun. And I was definitely going to move to 22.04 soon; it would be a real bummer if I were stuck on 20.04, or god forbid 18.04, just to get this driver going.

Also, what's the deal with the kernel being able to run this card in Ethernet mode out of the box? That surprised me. I'm pretty sure the card never worked like that back when I was on 18.04.
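One way to confirm it's the in-box mlx4 driver at work (a quick sketch; substitute the actual interface name for enp4s0):

Code:
# shows which driver and firmware the interface is bound to (mlx4_en is the in-kernel ConnectX-3 Ethernet driver)
ethtool -i enp4s0
# confirm the mlx4 modules are loaded
lsmod | grep mlx4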
 

unphased

Active Member
Jun 9, 2022
In my experience iperf3 has always underperformed on Windows machines (with different 10/40/100GbE NICs from Mellanox and Intel).

If I were you I would test Windows to Windows with another tool (my favorite is GitHub - microsoft/ntttcp) or Linux to Linux with iperf3.
Thanks for the tip! I was just about to go check out ntttcp as well, since I found a link to it just today, and it had also come up in my travels the last time I messed with these cards a year or two back.

Now I have an interesting idea: just to evaluate the situation with the Linux driver, is it possible to connect my DAC from one port to the other on that system and test throughput that way? I'd do the same static IP setup for it. Perhaps worth a try before I hit the hay.

Update: It's not clear to me whether I'll be able to force iperf3 to serve on one interface and run the client on the other interface. Hmm.
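One way to make that work is to put each port in its own network namespace, so the traffic has to cross the DAC instead of being short-circuited over loopback. A rough sketch, assuming the interface names enp4s0 / enp4s0d1 and example addresses:

Code:
sudo ip netns add srv
sudo ip netns add cli
sudo ip link set enp4s0 netns srv
sudo ip link set enp4s0d1 netns cli
sudo ip netns exec srv ip addr add 10.10.10.1/24 dev enp4s0
sudo ip netns exec srv ip link set enp4s0 up
sudo ip netns exec cli ip addr add 10.10.10.2/24 dev enp4s0d1
sudo ip netns exec cli ip link set enp4s0d1 up
# server in one namespace, client in the other
sudo ip netns exec srv iperf3 -s &
sudo ip netns exec cli iperf3 -c 10.10.10.1 -P 4
# deleting the namespaces returns the physical interfaces to the root namespace
sudo ip netns del srv
sudo ip netns del cli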
 

unphased

Active Member
Jun 9, 2022
Welp, I tried to follow Sam Lewis's guide but it did not work. The instructions are unclear on one particular point: I could not figure out which instances of "eth0" and "eth1" I needed to replace with "enp4s0" and "enp4s0d1", and now I have to reboot the system, since the interfaces are gone from the root namespace and the namespaces don't have any interfaces in them either.
 

unphased

Active Member
Jun 9, 2022
So

- I tried to use Pop!_OS to bring Linux up on both machines. This took a lot of trial and error because for some reason only 20.10 worked properly, and that ISO was hard to source. Eventually I determined that when both are running Linux and connected via DAC, NetworkManager doesn't show the link as connected!!! So I cannot go in and configure it properly. Maybe there is something I have to do to bring the ports out of IB mode into Ethernet mode.

- Going back to Windows on one machine, they can connect (the Linux side shows the 40000Mbps connection in NetworkManager...). I got ntttcp working Windows <-> Linux as well. Guess what: also 6.5Gbit!

I also realized that in my current test configuration both computers are on PCIe 3.0 x4 for the MCX354s, due to these being consumer platforms (RIP HEDT, I'm still pissed). One is running in the last x4 slot on an X570 Crosshair VIII Dark Hero, and the other is running off the secondary M.2 on an X570 Strix-ITX.

Yeah, there is a test I can do by bringing the Zenith Extreme back out on one end for oodles of x16 slots, and plugging the card into the 2nd slot (bringing the GPU down to x8) in the Dark Hero; that would again give full PCIe bandwidth for the test. OK, for now I will do an x4 to x8 test, as that is much easier to conduct first.

I was hoping dropping to x4 would not impact the 40Gbit, because that is almost precisely PCIe 3.0 x4; I was hoping protocol overheads might limit it to 30Gbit or something like that. As I understand it, PCIe is full duplex.
 

unphased

Active Member
Jun 9, 2022
OK, OK, I pulled the card out and it is labeled QCBT... gonna try the other QCBT to see if maybe I picked the one that I did not flash to FCBT. I have 1x FCBT and 2x QCBT cards.
 

unphased

Active Member
Jun 9, 2022
Oh my goodness, so the system does not detect a card in the second slot. WTF...

*edit*: OK, this might have been a fluke, since afterward I was able to put one card in each of the two extra slots and they both got detected.
 

joeribl

Active Member
Jun 6, 2021
Did you check on Linux what PCIe link speed is actually being used?

There should be a dmesg message like:

Code:
[16.270996] mlx4_core 0000:06:00.0: 16.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x4 link at 0000:02:04.0 (capable of 63.008 Gb/s with 8.0 GT/s PCIe x8 link)
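The negotiated link can also be read out directly with lspci (a sketch; the PCI address is an example and needs to match the card's):

Code:
# find the card's PCI address
lspci | grep -i mellanox
# LnkCap = what the card supports, LnkSta = what was actually negotiated
sudo lspci -s 06:00.0 -vv | grep -E 'LnkCap|LnkSta'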
 

unphased

Active Member
Jun 9, 2022
Yep, I saw that. I was also able to find the relevant link speed info via lspci -vvv.

Still not sure about the root cause of the underwhelming throughput, but everything else seems to be accounted for now. I also just flashed the last QCBT card to FCBT.
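For anyone curious about the cross-flash itself, it's done with mstflint; a rough sketch, where the firmware filename is only a placeholder for the actual FCBT image:

Code:
# query the current firmware and PSID on the card
sudo flint -d /dev/mst/mt4099_pciconf0 query
# burn the FCBT image (placeholder filename); -allow_psid_change is needed because the board ID changes
sudo flint -d /dev/mst/mt4099_pciconf0 -i fw-ConnectX3-FCBT.bin -allow_psid_change burn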
 

prdtabim

Active Member
Jan 29, 2022
I also realized that in my current test configuration both computers are on PCIe 3.0 x4 for the MCX354s, due to these being consumer platforms (RIP HEDT, I'm still pissed).

I was hoping dropping to x4 would not impact the 40Gbit, because that is almost precisely PCIe 3.0 x4; I was hoping protocol overheads might limit it to 30Gbit or something like that. As I understand it, PCIe is full duplex.
PCIe 3.0 x4 has a little less than 32Gb/s of bandwidth.
On X570 systems the last slot is PCIe 4.0 x4, but its bandwidth is shared with the chipset (southbridge), including various USB ports and the 2nd M.2 slot.

I use a ConnectX-3 in an X570 system (ASRock X570 Creator) in slot 1, sharing lanes with the video card (so x8/x8), using Ubuntu 20.04 or 22.04 with the in-kernel mlx4 driver.
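Rough numbers behind that, as a back-of-the-envelope sketch (ignoring TLP/packet overhead):

Code:
# PCIe 3.0 is 8 GT/s per lane with 128b/130b encoding -> ~7.88 Gb/s usable per lane
echo "scale=3; 8 * 128 / 130 * 4" | bc    # ~31.5 Gb/s raw for an x4 link, before protocol overhead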
 

unphased

Active Member
Jun 9, 2022
Using Ubuntu 20.04 or 22.04 with the in-kernel mlx4 driver.
Are you able to get good speeds? Are there any features missing? I had a Pop!_OS live USB running on the second computer, so both were on Linux and directly connected, but in this state the interfaces do not connect (it reports no cable connected, and there are no activity lights!). When the second computer is running Windows, they do connect (once the Mellanox drivers are installed and the device is switched into Ethernet mode).

I think I need to test some more. I need to find out whether there is something I can do to toggle the interface into Ethernet mode, if that's the issue here.
 

unphased

Active Member
Jun 9, 2022
I wanna say that it makes sense. Based on the output of sudo mlxconfig -d /dev/mst/mt4099_pciconf0 query, the card was set to VPI. It makes sense that when you directly connect two of them in VPI mode they would decide to link up as InfiniBand, and also that a port can talk Ethernet when the other end is specifically Ethernet (as is the case when connected to either my Ubiquiti switch or one of these cards on Windows, where I've already set it to Ethernet).
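If it does need to be forced, the port type can be pinned with mlxconfig (a sketch using the same device path; on ConnectX-3, LINK_TYPE 1 = IB, 2 = ETH, 3 = VPI/auto, and a reboot or driver reload is needed afterwards):

Code:
sudo mst start
sudo mlxconfig -d /dev/mst/mt4099_pciconf0 set LINK_TYPE_P1=2 LINK_TYPE_P2=2
sudo mlxconfig -d /dev/mst/mt4099_pciconf0 query    # verify the new setting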
 

unphased

Active Member
Jun 9, 2022
However, I am pretty worried now that I was seeing the same 6.5Gbit limitation even when I had ntttcp running Windows <-> Linux, though I suppose the only performance that truly matters at the end of the day will be Linux. Not really, though: I'd love to bare-metal boot one of these machines into Windows and leverage max network speed for file transfers! And even if Windows ends up in a VM, which might be the final setup, a card like this would preferably be PCIe-passthru'd, so it would hit the same bottleneck.
 

unphased

Active Member
Jun 9, 2022
Ah, OK, the smoking gun here is this; I never looked carefully enough earlier. And nothing under Windows was actually able to tell me that it was connecting like this:

Code:
8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 0000:03:01.0 (capable of 63.008 Gb/s with 8.0 GT/s PCIe x8 link)
That will yield the just-under-7Gbit bottleneck; the PCIe link must be degraded.

That's a PCIe 1.0 x4 link (2.5 GT/s), so clearly this current setup of an M.2 extension daisy-chained to an M.2-to-PCIe slot adapter is doing terrible things to signal integrity.
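The numbers bear that out (another back-of-the-envelope sketch):

Code:
# PCIe 1.x is 2.5 GT/s per lane with 8b/10b encoding -> 2 Gb/s usable per lane
echo "scale=1; 2.5 * 8 / 10 * 4" | bc    # 8 Gb/s for a degraded x4 link; protocol overhead drops that toward the observed ~6.5 Gbit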
 

unphased

Active Member
Jun 9, 2022
Alright! I took out the crappy M.2 extension, and while still on an M.2 riser it now connects at PCIe 3.0! iperf3 from Ubuntu 20.04 -> Pop!_OS 20.10 live USB is 20.3Gbits. Both NICs are on x4 slots now. So I'm glad I can at least get this baseline level of speed without bending over backwards and sacrificing GPU bandwidth on these consumer platforms. I guess it doesn't surprise me that halving the lanes halves the throughput, though it's mildly disappointing that it pegs at 20, whereas I know with the full 8 lanes it will never quite reach 40.

Now to test Windows again...
 

unphased

Active Member
Jun 9, 2022
With iperf3:

Windows reaches 21.1Gbps, but only in one direction; the other direction is 19Gbps. I have the MTU set to 9014. CPU utilization stays under a full core, so this checks out.

With ntttcp:

Code:
PS C:\Users\slu\Downloads> ./NTttcp.exe -s -m 16,*,10.10.10.1 -ns
Copyright Version 5.38
Network activity progressing...


Thread  Time(s) Throughput(KB/s) Avg B / Compl
======  ======= ================ =============
     0    5.076       258219.070     65536.000
     1    3.376       388246.445     65536.000
     2    5.352       244902.840     65536.000
     3    5.333       245775.361     65536.000
     4    6.876       190622.455     65536.000
     5    6.773       193521.335     65536.000
     6    6.708       195396.541     65536.000
     7    1.672       783923.445     65536.000
     8    6.387       205216.847     65536.000
     9    4.827       271539.258     65536.000
    10    6.252       209648.113     65536.000
    11    6.334       206934.007     65536.000
    12    6.203       211304.208     65536.000
    13    6.171       212399.935     65536.000
    14    3.365       389515.602     65536.000
    15    6.601       198563.854     65536.000


#####  Totals:  #####


   Bytes(MEG)    realtime(s) Avg Frame Size Throughput(MB/s)
================ =========== ============== ================
    20480.000000       6.877 1130254551.579         2978.015


Throughput(Buffers/s) Cycles/Byte       Buffers
===================== =========== =============
            47648.246       1.687    327680.000


DPCs(count/s) Pkts(num/DPC)   Intr(count/s) Pkts(num/intr)
============= ============= =============== ==============
   153229.652         0.906      223916.221          0.620


Packets Sent Packets Received Retransmits Errors Avg. CPU %
============ ================ =========== ====== ==========
          19           954897           0      0      8.662
Wow, I'm very impressed with this: that's about 24Gbit/s, and the "real time" readout was over 25Gbit/s on the other command line.

Other direction:

Code:
PS C:\Users\slu\Downloads> ./NTttcp.exe -r -m 16,*,10.10.10.2 -ns
Copyright Version 5.38
Network activity progressing...


Thread  Time(s) Throughput(KB/s) Avg B / Compl
======  ======= ================ =============
     0    6.382       196495.652     62701.762
     1    6.382       196520.252     62709.613
     2    6.382       196495.652     62701.762
     3    6.381       196555.242     62710.950
     4    6.384       196423.128     62698.262
     5    6.382       196518.881     62709.175
     6    6.382       196505.249     62704.825
     7    6.383       196493.655     62710.950
     8    6.382       196520.252     62709.613
     9    6.382       196506.620     62705.262
    10    6.382       196534.041     62714.012
    11    6.382       196514.846     62707.887
    12    6.382       196525.815     62711.387
    13    6.382       196525.815     62711.387
    14    6.381       196564.841     62714.012
    15    6.382       196528.479     62712.238


#####  Totals:  #####


   Bytes(MEG)    realtime(s) Avg Frame Size Throughput(MB/s)
================ =========== ============== ================
    19596.349598       6.384      24191.701         3069.603


Throughput(Buffers/s) Cycles/Byte       Buffers
===================== =========== =============
            49113.640       0.754    313541.594


DPCs(count/s) Pkts(num/DPC)   Intr(count/s) Pkts(num/intr)
============= ============= =============== ==============
   149902.045         0.888      246070.087          0.541


Packets Sent Packets Received Retransmits Errors Avg. CPU %
============ ================ =========== ====== ==========
      718707           849393           0      0      3.993
Booyah! Look at that CPU %! Woof!

This is perfect. 25Gbps sounds well within range given transport overheads, and I'm delighted to see that a single link can nearly max out the PCIe 3.0 x4 interfaces. Love to see it.

Now I'm totally ready for the MPO fiber and transceivers I'm going to receive in the mail over the next few days. And the technician is coming to replace the panel on my LG C1 TV... My living room TV will have a fiber connection to my NAS...
 

unphased

Active Member
Jun 9, 2022
For sure! It's just great as an upper bound on what's possible, though!

I did some tests with SMB, and unsurprisingly it bogs down just as much on small files as any other kind of network. It will be fun to see how the real use cases fare. I just love this so much, because now I have the hardware I can use to develop and test application file access patterns and the like in a forward-looking way that is tuned for modern computers. 1Gbit has held us back for far too long. Even 2.5Gbit is still having a hard time gaining traction, and going beyond 10Gbit over copper is a dead end.

It seems that even for something as simple as browsing or transferring large quantities of data, if you have a large number of files, most network filesystem implementations bog down significantly. They were not written with modern processor and storage architectures in mind.
 

prdtabim

Active Member
Jan 29, 2022
Are you able to get good speeds? Are there any features missing? I had a Pop!_OS live USB running on the second computer, so both were on Linux and directly connected, but in this state the interfaces do not connect (it reports no cable connected, and there are no activity lights!). When the second computer is running Windows, they do connect (once the Mellanox drivers are installed and the device is switched into Ethernet mode).

I think I need to test some more. I need to find out whether there is something I can do to toggle the interface into Ethernet mode, if that's the issue here.
Using a QSFP cable point to point, I get 36-39Gb/s with MTU 9000 and max ring buffers (on Linux, see ethtool -g).
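For reference, checking and raising the rings looks roughly like this (a sketch; the interface name and the maximums your card reports will differ):

Code:
# show current and maximum RX/TX ring sizes
ethtool -g enp4s0
# raise them to the maximums reported above (example values)
sudo ethtool -G enp4s0 rx 8192 tx 8192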