Mellanox MCX354-FCBT on M.2 riser connected at 40Gbit only achieving 6.5Gbit in iperf3


NachoCDN

Active Member
Apr 18, 2016
111
91
28
53
Alright! Took out the crappy M.2 extension, still on an M.2 riser, and now it connects at PCIe 3.0! iperf3 from Ubuntu 20.04 -> Pop!_OS 20.10 LiveUSB is 20.3 Gbit/s. Both NICs are on x4 slots now, so I'm glad I can at least get this baseline level of speed without bending over backwards sacrificing GPU bandwidth on these consumer platforms. It doesn't surprise me that halving the lanes halves the throughput, though it's mildly surprising that it pegs right at 20, whereas with the full 8 lanes I know it will never quite reach 40.

Now to test Windows again...
I also see between 5.5 Gb/s and 6 Gb/s when transferring with iperf3 under Windows, with the cards in x4 slots. Can I assume it's a limitation of the adapter card and not the ConnectX-3 itself? And if so, what should I use instead of this adapter card?
 

NachoCDN

Active Member
Apr 18, 2016
111
91
28
53
OK, so this was a rookie mistake on my part: once I parallelized iperf3 to about 10 streams, in one direction I'm getting what I would expect.

2022-06-10_23h38_06.jpg

But only from one side. If I test without reversing the direction, I only get about 5.5~6 Gb/s.
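For reference, the iperf3 invocations involved look something like this (the server address here is made up; this is a sketch of the flags, not my exact commands):

```shell
# On the "server" machine:
iperf3 -s

# Single stream, client -> server (the direction that shows ~5.5-6 Gb/s):
iperf3 -c 192.168.1.10

# ~10 parallel streams, which is what got the link up to the expected rate:
iperf3 -c 192.168.1.10 -P 10

# Reverse mode (-R): server sends to client, to test the other direction:
iperf3 -c 192.168.1.10 -P 10 -R
```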



1654918818522.png

Now I do believe the cause of this issue is the motherboard in the "server," which is a... ah... JINGSHA X99 D8. So I think I'm not getting what I'd expect because the slots share lanes with the M.2 storage. I tried various slots but could only get the system to boot in one configuration. Not 10 Gb/s, but I'm happy with the purchase overall.
 
Last edited:

jdnz

Member
Apr 29, 2021
81
21
8
So I'm glad I can at least get this baseline level of speed without bending over backwards sacrificing GPU bandwidth on these consumer platforms.
Before you get too hung up about sacrificing GPU bandwidth by putting the NIC in the 2nd x16 slot and dropping the GPU from x16 to x8, check the specs on your GPU - you'd be surprised how many GPUs actually only run x8 anyway!

On the server end things are even easier, as typically all you want is a text-mode console - that's where open-ended x1 slots come into their own. Lots of the servers at work have the console GPU (typically a low-power card like a Quadro 600) in the PCIe 2.0 x1 slot - works fine and maximises the available high-bandwidth slots for cards that actually need them (RAID controllers, high-speed NICs, compute-oriented GPUs, etc). See the attached photo of an old Z420 we use for testing stuff at work - the top GPU (in the PCIe 2.0 x1 slot) is running the console; all the rest are passed through in ESXi to VMs (yes, we had to raid the deepest parts of our 'junk' GPU box to find enough single-width GPUs for the test). It helps that the Z series has a BIOS setting which lets you explicitly select which slot will be used for BIOS POST/console.
 

Attachments

Last edited:

NachoCDN

Active Member
Apr 18, 2016
111
91
28
53
Before you get too hung up about sacrificing GPU bandwidth by putting the NIC in the 2nd x16 slot and dropping the GPU from x16 to x8, check the specs on your GPU - you'd be surprised how many GPUs actually only run x8 anyway!

On the server end things are even easier, as typically all you want is a text-mode console - that's where open-ended x1 slots come into their own. Lots of the servers at work have the console GPU (typically a low-power card like a Quadro 600) in the PCIe 2.0 x1 slot - works fine and maximises the available high-bandwidth slots for cards that actually need them (RAID controllers, high-speed NICs, compute-oriented GPUs, etc). See the attached photo of an old Z420 we use for testing stuff at work - the top GPU (in the PCIe 2.0 x1 slot) is running the console; all the rest are passed through in ESXi to VMs (yes, we had to raid the deepest parts of our 'junk' GPU box to find enough single-width GPUs for the test). It helps that the Z series has a BIOS setting which lets you explicitly select which slot will be used for BIOS POST/console.
ah you don't have enough video cards in that picture!! :p
 

unphased

Active Member
Jun 9, 2022
148
26
28
What a fun picture! thanks for that.

Yeah, this GPU is actually an RTX 3080. It's gross overkill: this is a workstation I use for work, and although we do NVIDIA things for work, I honestly rarely game on it, so it is a bit of a waste. Going down to PCIe 4.0 x8 doesn't quite count as hamstringing it, and it is indeed really useful in a pinch to use that slot, but I'll try to keep the GPU at x16 when I can.

It's being wasted (and I don't mine crypto), but I do appreciate the NVIDIA drivers' ability these days to provide good desktop performance across all of the 120 Hz monitors I have hooked up! Probably a 3060 could offer the same experience. Oh well! Now, the other machine I'm connecting (which will be running Windows primarily) has another 3080 in it. Since it's a mini-ITX X570 platform (I've got a 5800X in there that I'm angling to replace with a 5800X3D...), I have to drop down to x4 for the NIC by running it off a riser from the secondary M.2 slot. I am very stoked to evaluate the performance of a ZFS SMB share as a Steam library folder over what looks like it will be a 25 Gbit connection. And I'm sure I'll come up with more distributed computing experiments to try.
 

unphased

Active Member
Jun 9, 2022
148
26
28
I also see between 5.5 Gb/s and 6 Gb/s when transferring with iperf3 under Windows, with the cards in x4 slots. Can I assume it's a limitation of the adapter card and not the ConnectX-3 itself? And if so, what should I use instead of this adapter card?
If you boot into Linux, see whether you get a report of the PCIe version the card is connected at. My original 6.5 Gbps bottleneck was due to the 8 Gbps link speed of PCIe 1.0 x4: it was running at PCIe 1.0 signaling because of the poor quality of the connection I was using on that machine at the time. I tried a few tools from Windows such as AIDA64, but nothing seems to be built to help troubleshoot this. Of course stuff like GPU-Z will show the link information for GPUs, but this thing isn't a GPU.

Once I actually paid attention to the "8.000 Gb/s available PCIe bandwidth" message in syslog, it spelled out very clearly what was going on.
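For anyone else chasing this under Linux, the checks look something like this (the PCI address 03:00.0 is just an example; find yours with the first command):

```shell
# Find the ConnectX-3's PCI address:
lspci | grep -i mellanox

# LnkCap = what the card supports, LnkSta = what it actually negotiated.
# A downgraded link shows e.g. "Speed 2.5GT/s" under LnkSta:
sudo lspci -vv -s 03:00.0 | grep -E 'LnkCap|LnkSta'

# The kernel also logs the available bandwidth when the driver loads:
dmesg | grep -i 'PCIe bandwidth'
```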

I will also note! The overhead ratio looks identical.

Degraded PCIe 1.0 x4: 6.5Gbps observed / 8 Gbps theoretical = 0.8125
PCIe 3.0 x4: 25.7Gbps observed / 32 Gbps theoretical = 0.803

No idea if this is a coincidence or not.
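Here's a quick sketch of the line-coding math behind those theoretical numbers. Note that after 128b/130b coding the real PCIe 3.0 x4 ceiling is ~31.5 Gb/s, slightly below the 32 Gb/s raw figure:

```python
# Theoretical usable link bandwidth after PCIe line coding only
# (no TLP/protocol overhead considered).

def pcie_raw_gbps(gen: int, lanes: int) -> float:
    """Usable bits per second after line coding, in Gb/s."""
    rate = {1: 2.5, 2: 5.0, 3: 8.0}[gen]            # GT/s per lane per generation
    coding = {1: 8 / 10, 2: 8 / 10, 3: 128 / 130}   # 8b/10b vs 128b/130b efficiency
    return rate * coding[gen] * lanes

print(pcie_raw_gbps(1, 4))   # PCIe 1.x x4 -> 8.0 Gb/s (the 6.5 Gb/s ceiling)
print(pcie_raw_gbps(3, 4))   # PCIe 3.0 x4 -> ~31.5 Gb/s (the 25.7 Gb/s ceiling)
```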
 

prdtabim

Active Member
Jan 29, 2022
171
66
28
Nice, @prdtabim what kind of speed do you get without modifying ring buffer size?
33 Gb/s in most tests. This is point-to-point using a QSFP+ AOC cable.
Using a QSFP+ to SFP+ adapter limits it to 10 Gb/s but lets me use my MikroTik CRS309. iperf3 shows 9.92 Gb/s using MTU 9000.
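For reference, the tuning knobs mentioned here are the MTU and the NIC ring buffers; on Linux that looks something like this, with the interface name (ens4) assumed:

```shell
# Jumbo frames (both ends and any switch in the path must allow MTU 9000):
sudo ip link set dev ens4 mtu 9000

# Show current and hardware-maximum RX/TX ring buffer sizes:
ethtool -g ens4

# Grow the rings toward the maximums reported above (8192 is an example;
# use whatever -g says your hardware supports):
sudo ethtool -G ens4 rx 8192 tx 8192
```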
 

prdtabim

Active Member
Jan 29, 2022
171
66
28
If you boot into Linux, see whether you get a report of the PCIe version the card is connected at. My original 6.5 Gbps bottleneck was due to the 8 Gbps link speed of PCIe 1.0 x4: it was running at PCIe 1.0 signaling because of the poor quality of the connection I was using on that machine at the time. I tried a few tools from Windows such as AIDA64, but nothing seems to be built to help troubleshoot this. Of course stuff like GPU-Z will show the link information for GPUs, but this thing isn't a GPU.

Once I actually paid attention to the "8.000 Gb/s available PCIe bandwidth" message in syslog, it spelled out very clearly what was going on.

I will also note! The overhead ratio looks identical.

Degraded PCIe 1.0 x4: 6.5Gbps observed / 8 Gbps theoretical = 0.8125
PCIe 3.0 x4: 25.7Gbps observed / 32 Gbps theoretical = 0.803

No idea if this is a coincidence or not.
Looks odd, since PCIe 1.1 and 2.0 use 8b/10b line coding and PCIe 3.0 uses 128b/130b.
 
  • Like
Reactions: unphased

jdnz

Member
Apr 29, 2021
81
21
8
Looks odd, since PCIe 1.1 and 2.0 use 8b/10b line coding and PCIe 3.0 uses 128b/130b.
But the lspci output states the link speed was 2.5 GT/s (PCIe 1.x) - so with the 20% 8b/10b overhead that's 2 Gb/s usable bandwidth per lane, times 4 lanes is 8 Gb/s theoretical throughput.

so the OP's observation is correct
 

unphased

Active Member
Jun 9, 2022
148
26
28
But the lspci output states the link speed was 2.5 GT/s (PCIe 1.x) - so with the 20% 8b/10b overhead that's 2 Gb/s usable bandwidth per lane, times 4 lanes is 8 Gb/s theoretical throughput.

so the OP's observation is correct
Well, hold on a sec. Based on this new learning, the 6.5 Gbit speed at 2.5 GT/s x4 makes sense, but since PCIe 3.0's 128b/130b coding has much lower overhead than 8b/10b, something else must be limiting the 32 Gbit theoretical down to ~25.7. Anyways, fine enough for me. More pressing now are some "cable disconnected" issues when Windows wakes up from sleep. Needs more testing.
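As a back-of-the-envelope check, per-packet protocol overhead plausibly accounts for a chunk of that gap. The payload and header sizes below are typical assumed values, not measured from this system:

```python
# Rough PCIe 3.0 x4 efficiency estimate including per-TLP overhead.
# Assumptions: 256-byte Max_Payload_Size, ~24 bytes of per-TLP overhead
# (framing + sequence number + 12-16 byte header + LCRC). DLLP traffic
# (ACKs, flow control) and driver/DMA overhead are ignored here.

raw = 8.0 * 4                      # 8 GT/s x 4 lanes = 32 Gb/s raw
after_coding = raw * 128 / 130     # 128b/130b line coding -> ~31.5 Gb/s

payload = 256                      # bytes per TLP (assumed)
tlp_overhead = 24                  # bytes per TLP (assumed)
link_eff = payload / (payload + tlp_overhead)

# ~28.8 Gb/s; DLLPs and host-side overhead plausibly eat the rest down to ~25.7
print(after_coding * link_eff)
```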
 

unphased

Active Member
Jun 9, 2022
148
26
28
worse decision i made was purchasing monitors without vesa mounts.. i have 6 monitors and can only use 3 with vesa mounts.. 6 monitors hooked up in a grid configuration would rock!!
I have this monster 8-monitor Humanscale monitor arm I got from a guy on Craigslist. So far I have 3 monitors mounted on it, arrayed around my main ultrawide, since not only are modern widescreen monitors too large to squeeze 8 into the space (6 would also be tight), it only came with like 5 of the mount plates anyway.

I really would like to upgrade to a grid of 6 monitors someday; that would be pretty useful. When you have a lot of monitors it becomes impractical to place more than about 3 of them side by side, because the neck movement becomes unreasonable past that point. Really need to start using the vertical space!
 
  • Like
Reactions: NachoCDN

unphased

Active Member
Jun 9, 2022
148
26
28
but then i need to look up :D
Yes, which is why the 3x2 6-monitor configuration is pretty nice. Do it with widescreens and you'll get a humongous ultrawide out of it. Looking up and down is definitely more straining than left and right, so that's why we only make the grid 2 monitors high ;)
 
  • Like
Reactions: NachoCDN

NachoCDN

Active Member
Apr 18, 2016
111
91
28
53
Do it with widescreens and you'll get a humongous ultrawide out of it.
Exactly... you arrange them in a circle and then just swivel to the left and the right. To quote Alan Watts: "you are always in the same place that you always are, it just appears to change"
 
  • Like
Reactions: unphased

unphased

Active Member
Jun 9, 2022
148
26
28
worse decision i made was purchasing monitors without vesa mounts.. i have 6 monitors and can only use 3 with vesa mounts.. 6 monitors hooked up in a grid configuration would rock!!
Yeah, so since I have an ATX mobo again, I finally got around to adding a second little AMD card in addition to the 3080. Well, in Ubuntu 20.04 the multi-monitor behavior is atrocious. I can run 3 monitors off the 3080 and my DVI monitor off the AMD card, but each time they go to sleep and wake back up, the display positioning is completely broken and multiple monitors usually get switched off. Switching them back on results in reset monitor positioning 98% of the time, or a failure to switch on at all. It's frankly not a usable system if I want the idle power-off behavior. Since I don't have solar power set up yet, I might have to stick to using only two monitors on this workstation.

Maybe running two nvidia cards will make it possible to run 6 monitors.