Mellanox MCX354-FCBT on M.2 riser connected at 40Gbit only achieving 6.5Gbit in iperf3


NachoCDN

Active Member
Apr 18, 2016
111
91
28
53
Alright! Took out the crappy M.2 extension, still on an M.2 riser, and now it connects at PCIe 3.0! iperf3 from Ubuntu 20.04 -> Pop!_OS 20.10 LiveUSB is 20.3 Gbit/s. Both NICs are on x4 slots now, so I'm glad I can at least get this baseline level of speed without bending over backwards sacrificing GPU bandwidth on these consumer platforms. It doesn't surprise me that halving the lanes halves the throughput, though it's mildly surprising that it pegs right at 20, whereas with the full 8 lanes I know it will never quite reach 40.

Now to test Windows again...
I also see between 5.5 Gb/s and 6 Gb/s when transferring with iperf3 under Windows, with the cards in x4 slots. Can I assume it's a limitation of the adapter card and not the ConnectX-3 itself? And if so, what should I use instead of this adapter card?
 

NachoCDN

Active Member
Apr 18, 2016
111
91
28
53
OK, so this was a rookie mistake on my part: once I parallelized iperf3 to about 10 streams, in one direction I'm getting what I would expect.

2022-06-10_23h38_06.jpg

But only from one side. If I test without reversing the direction, I only get about 5.5~6 Gb/s.
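For reference, the iperf3 invocations involved look something like this (the server address here is made up; this is a sketch of the flags, not my exact commands):

```shell
# On the "server" machine:
iperf3 -s

# Single stream, client -> server (the direction that shows ~5.5-6 Gb/s):
iperf3 -c 192.168.1.10

# ~10 parallel streams, which is what got the link up to the expected rate:
iperf3 -c 192.168.1.10 -P 10

# Reverse mode (-R): server sends to client, to test the other direction:
iperf3 -c 192.168.1.10 -P 10 -R
```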



1654918818522.png

Now I do believe the cause of this issue is the motherboard in the "server," which is a... ah... JINGSHA X99 D8. So I think I'm not getting what I'd expect because the slots share lanes with the M.2 storage. I tried various slots but could only get the system to boot in one configuration. Not 10 Gb/s, but I'm happy with the purchase overall.
 
Last edited:

jdnz

Member
Apr 29, 2021
81
21
8
So I'm glad I can at least get this baseline level of speed without bending over backwards sacrificing GPU bandwidth on these consumer platforms.
Before you get too hung up about sacrificing GPU bandwidth by putting the NIC in the 2nd x16 slot and dropping the GPU from x16 to x8, check the specs on your GPU - you'd be surprised how many GPUs actually only run x8 anyway!

On the server end things are even easier, as typically all you want is a text-mode console - that's where open-ended x1 slots come into their own. Lots of the servers at work have the console GPU (typically a low-power card like a Quadro 600) in the PCIe 2.0 x1 slot - works fine and maximises the available high-bandwidth slots for cards that actually need them (RAID controllers, high-speed NICs, compute-oriented GPUs, etc). See the attached photo of an old Z420 we use for testing stuff at work - the top GPU (in the PCIe 2.0 x1 slot) is running the console; all the rest are passed through in ESXi to VMs (yes, we had to raid the deepest parts of our 'junk' GPU box to find enough single-width GPUs for the test). It helps that the Z series has a BIOS setting which lets you explicitly select which slot will be used for BIOS POST/console.
 

Attachments

Last edited:

NachoCDN

Active Member
Apr 18, 2016
111
91
28
53
Before you get too hung up about sacrificing GPU bandwidth by putting the NIC in the 2nd x16 slot and dropping the GPU from x16 to x8, check the specs on your GPU - you'd be surprised how many GPUs actually only run x8 anyway!

On the server end things are even easier, as typically all you want is a text-mode console - that's where open-ended x1 slots come into their own. Lots of the servers at work have the console GPU (typically a low-power card like a Quadro 600) in the PCIe 2.0 x1 slot - works fine and maximises the available high-bandwidth slots for cards that actually need them (RAID controllers, high-speed NICs, compute-oriented GPUs, etc). See the attached photo of an old Z420 we use for testing stuff at work - the top GPU (in the PCIe 2.0 x1 slot) is running the console; all the rest are passed through in ESXi to VMs (yes, we had to raid the deepest parts of our 'junk' GPU box to find enough single-width GPUs for the test). It helps that the Z series has a BIOS setting which lets you explicitly select which slot will be used for BIOS POST/console.
ah you don't have enough video cards in that picture!! :p
 

unphased

Active Member
Jun 9, 2022
148
26
28
What a fun picture! thanks for that.

Yeah, this GPU is actually an RTX 3080. It's gross overkill: this is a workstation I use for work, and although we do NVIDIA things for work, I honestly rarely game on it, so it is a bit of a waste. Going down to PCIe 4.0 x8 doesn't quite count as hamstringing it, and it is indeed really useful in a pinch to use that slot, but I'll try to keep the GPU at x16 when I can.

It's being wasted (and I don't mine crypto), but I do appreciate the NVIDIA drivers' ability these days to provide good desktop performance across all of the 120 Hz monitors I have hooked up! Probably a 3060 could offer the same experience. Oh well! Now, the other machine I'm connecting (which will be running Windows primarily) has another 3080 in it. Since it's a mini-ITX X570 platform (I've got a 5800X in there that I'm angling to replace with a 5800X3D...), I have to drop down to x4 for the NIC by running it off a riser from the secondary M.2 slot. I am very stoked to evaluate the performance of a ZFS SMB share as a Steam library folder over what looks like it will be a 25 Gbit connection. And I'm sure I'll come up with more distributed computing experiments to try.
 

unphased

Active Member
Jun 9, 2022
148
26
28
I also see between 5.5 Gb/s and 6 Gb/s when transferring with iperf3 under Windows, with the cards in x4 slots. Can I assume it's a limitation of the adapter card and not the ConnectX-3 itself? And if so, what should I use instead of this adapter card?
If you boot into Linux, see whether you get a report of the PCIe version the card is connected at. My original 6.5 Gbps bottleneck was due to the 8 Gbps link speed of PCIe 1.0 x4: it was running at PCIe 1.0 signaling because of the poor quality of the connection I was using on that machine at the time. I tried a few tools from Windows such as AIDA64, but nothing seems to be built to help troubleshoot this. Of course stuff like GPU-Z will show the link information for GPUs, but this thing isn't a GPU.

Once I actually paid attention to the "8.000 Gb/s available PCIe bandwidth" message in syslog, it spelled out very clearly what was going on.
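For anyone else chasing this under Linux, the checks look something like this (the PCI address 03:00.0 is just an example; find yours with the first command):

```shell
# Find the ConnectX-3's PCI address:
lspci | grep -i mellanox

# LnkCap = what the card supports, LnkSta = what it actually negotiated.
# A downgraded link shows e.g. "Speed 2.5GT/s" under LnkSta:
sudo lspci -vv -s 03:00.0 | grep -E 'LnkCap|LnkSta'

# The kernel also logs the available bandwidth when the driver loads:
dmesg | grep -i 'PCIe bandwidth'
```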

I will also note! The overhead ratio looks identical.

Degraded PCIe 1.0 x4: 6.5Gbps observed / 8 Gbps theoretical = 0.8125
PCIe 3.0 x4: 25.7Gbps observed / 32 Gbps theoretical = 0.803

No idea if this is a coincidence or not.
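Here's a quick sketch of the line-coding math behind those theoretical numbers. Note that after 128b/130b coding the real PCIe 3.0 x4 ceiling is ~31.5 Gb/s, slightly below the 32 Gb/s raw figure:

```python
# Theoretical usable link bandwidth after PCIe line coding only
# (no TLP/protocol overhead considered).

def pcie_raw_gbps(gen: int, lanes: int) -> float:
    """Usable bits per second after line coding, in Gb/s."""
    rate = {1: 2.5, 2: 5.0, 3: 8.0}[gen]            # GT/s per lane per generation
    coding = {1: 8 / 10, 2: 8 / 10, 3: 128 / 130}   # 8b/10b vs 128b/130b efficiency
    return rate * coding[gen] * lanes

print(pcie_raw_gbps(1, 4))   # PCIe 1.x x4 -> 8.0 Gb/s (the 6.5 Gb/s ceiling)
print(pcie_raw_gbps(3, 4))   # PCIe 3.0 x4 -> ~31.5 Gb/s (the 25.7 Gb/s ceiling)
```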
 

prdtabim

Active Member
Jan 29, 2022
171
66
28
Nice, @prdtabim what kind of speed do you get without modifying ring buffer size?
33 Gb/s in most tests. This is point-to-point using a QSFP+ AOC cable.
Using a QSFP+ to SFP+ adapter limits it to 10 Gb/s but lets me use my MikroTik CRS309. iperf3 shows 9.92 Gb/s using MTU 9000.
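For reference, the tuning knobs mentioned here are the MTU and the NIC ring buffers; on Linux that looks something like this, with the interface name (ens4) assumed:

```shell
# Jumbo frames (both ends and any switch in the path must allow MTU 9000):
sudo ip link set dev ens4 mtu 9000

# Show current and hardware-maximum RX/TX ring buffer sizes:
ethtool -g ens4

# Grow the rings toward the maximums reported above (8192 is an example;
# use whatever -g says your hardware supports):
sudo ethtool -G ens4 rx 8192 tx 8192
```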
 

prdtabim

Active Member
Jan 29, 2022
171
66
28
If you boot into Linux, see whether you get a report of the PCIe version the card is connected at. My original 6.5 Gbps bottleneck was due to the 8 Gbps link speed of PCIe 1.0 x4: it was running at PCIe 1.0 signaling because of the poor quality of the connection I was using on that machine at the time. I tried a few tools from Windows such as AIDA64, but nothing seems to be built to help troubleshoot this. Of course stuff like GPU-Z will show the link information for GPUs, but this thing isn't a GPU.

Once I actually paid attention to the "8.000 Gb/s available PCIe bandwidth" message in syslog, it spelled out very clearly what was going on.

I will also note! The overhead ratio looks identical.

Degraded PCIe 1.0 x4: 6.5Gbps observed / 8 Gbps theoretical = 0.8125
PCIe 3.0 x4: 25.7Gbps observed / 32 Gbps theoretical = 0.803

No idea if this is a coincidence or not.
Looks odd, since PCIe 1.1 and 2.0 use 8b/10b line coding and PCIe 3.0 uses 128b/130b.
 
  • Like
Reactions: unphased

jdnz

Member
Apr 29, 2021
81
21
8
Looks odd, since PCIe 1.1 and 2.0 use 8b/10b line coding and PCIe 3.0 uses 128b/130b.
But the lspci output states the link speed was 2.5 GT/s (PCIe 1.x) - so with the 20% 8b/10b overhead that's 2 Gb/s usable bandwidth per lane, times 4 lanes is 8 Gb/s theoretical throughput.

so the OP's observation is correct
 

unphased

Active Member
Jun 9, 2022
148
26
28
But the lspci output states the link speed was 2.5 GT/s (PCIe 1.x) - so with the 20% 8b/10b overhead that's 2 Gb/s usable bandwidth per lane, times 4 lanes is 8 Gb/s theoretical throughput.

so the OP's observation is correct
Well, hold on a sec. Based on this new learning, the 6.5 Gbit speed at 2.5 GT/s x4 makes sense, but since PCIe 3.0's 128b/130b coding has much lower overhead than 8b/10b, something else must be limiting the 32 Gbit theoretical down to ~25.7. Anyways, fine enough for me. More pressing now are some "cable disconnected" issues when Windows wakes up from sleep. Needs more testing.
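As a back-of-the-envelope check, per-packet protocol overhead plausibly accounts for a chunk of that gap. The payload and header sizes below are typical assumed values, not measured from this system:

```python
# Rough PCIe 3.0 x4 efficiency estimate including per-TLP overhead.
# Assumptions: 256-byte Max_Payload_Size, ~24 bytes of per-TLP overhead
# (framing + sequence number + 12-16 byte header + LCRC). DLLP traffic
# (ACKs, flow control) and driver/DMA overhead are ignored here.

raw = 8.0 * 4                      # 8 GT/s x 4 lanes = 32 Gb/s raw
after_coding = raw * 128 / 130     # 128b/130b line coding -> ~31.5 Gb/s

payload = 256                      # bytes per TLP (assumed)
tlp_overhead = 24                  # bytes per TLP (assumed)
link_eff = payload / (payload + tlp_overhead)

# ~28.8 Gb/s; DLLPs and host-side overhead plausibly eat the rest down to ~25.7
print(after_coding * link_eff)
```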
 

unphased

Active Member
Jun 9, 2022
148
26
28
worse decision i made was purchasing monitors without vesa mounts.. i have 6 monitors and can only use 3 with vesa mounts.. 6 monitors hooked up in a grid configuration would rock!!
I have this monster 8-monitor Humanscale monitor arm I got from a guy on Craigslist. So far I have 3 monitors mounted on it, arrayed around my main ultrawide, since not only are modern widescreen monitors too large to squeeze 8 into the space (6 would also be tight), it only came with like 5 of the mount plates anyway.

I really would like to upgrade to a grid of 6 monitors someday; that would be pretty useful. When you have a lot of monitors it becomes impractical to place more than about 3 of them side by side, because the neck movement becomes unreasonable past that point. Really need to start using the vertical space!
 
  • Like
Reactions: NachoCDN

unphased

Active Member
Jun 9, 2022
148
26
28
but then i need to look up :D
Yes, which is why the 3x2 6-monitor configuration is pretty nice. Do it with widescreens and you'll get a humongous ultrawide out of it. Looking up and down is definitely more straining than left and right, so that's why we only make the grid 2 monitors high ;)
 
  • Like
Reactions: NachoCDN

NachoCDN

Active Member
Apr 18, 2016
111
91
28
53
Do it with widescreens and you'll get a humongous ultrawide out of it.
Exactly... you arrange them in a circle and then just swivel to the left and the right. To quote Alan Watts: "you are always in the same place that you always are, it just appears to change"
 
  • Like
Reactions: unphased

unphased

Active Member
Jun 9, 2022
148
26
28
worse decision i made was purchasing monitors without vesa mounts.. i have 6 monitors and can only use 3 with vesa mounts.. 6 monitors hooked up in a grid configuration would rock!!
Yeah, so since I have an ATX mobo again, I finally got around to adding a second little AMD card in addition to the 3080. Well, in Ubuntu 20.04 the multi-monitor behavior is atrocious. I can run 3 monitors off the 3080 and my DVI monitor off the AMD card, but each time they go to sleep and wake back up, the display positioning is completely broken and multiple monitors usually get switched off. Switching them back on results in reset monitor positioning 98% of the time, or a failure to switch on at all. It's frankly not a usable system if I want the idle power-off behavior. Since I don't have solar power set up yet, I might have to stick to using only two monitors on this workstation.

Maybe running two nvidia cards will make it possible to run 6 monitors.