Possible cause found for slower-than-expected 10Gb networking performance.


MountainBofh

Beating my users into submission
Mar 9, 2024
There have been a few posts here recently about people having slower-than-expected network performance when trying to get 10Gb Ethernet up and running. I may have stumbled across a possible cause of, and solution to, some of these issues.

My main Windows 10 LTSC box (and my gaming rig) is an AMD 5900X machine with a run-of-the-mill Gigabyte B550 motherboard. It has three x16-size PCIe slots: the top one has my Nvidia 3070 in it, and the bottom one has my Mellanox ConnectX-4 NIC. In this system, the top PCIe slot gets 16 lanes, the middle slot 4 lanes, and the bottom slot only 2 lanes. However, the middle slot shares PCIe lanes with one of my M.2 slots (which is occupied), and it's also right next to the GPU, so I don't use it.

My switch is a cheap but effective Horaco 8-port SFP+ switch, using a Realtek RTL9303 chipset.

With the Mellanox, I get 10Gb to my Linux file server all day with no problems at all. iperf3 (even though it's not 100% reliable on Windows) shows consistent 9.5Gb/s transfer rates, and an SMB copy to my file server holds 1.00 to 1.05 GB/s, even with large 50GB+ ISO files.
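In case anyone wants to script the same kind of throughput check, here's a minimal sketch in Python. The "fileserver" hostname and the 10-second duration are placeholders, and it assumes iperf3 is installed on both ends with the far side already running iperf3 -s.

```python
# Minimal sketch: run an iperf3 client and report receiver-side throughput.
# Assumes iperf3 is installed locally and "fileserver" (placeholder name)
# is already running "iperf3 -s".
import json
import subprocess

def iperf3_throughput_gbps(server: str, seconds: int = 10) -> float:
    """Run iperf3 in JSON mode and return the received throughput in Gbit/s."""
    out = subprocess.run(
        ["iperf3", "-c", server, "-t", str(seconds), "-J"],
        capture_output=True, text=True, check=True,
    ).stdout
    result = json.loads(out)
    return result["end"]["sum_received"]["bits_per_second"] / 1e9

if __name__ == "__main__":
    print(f"{iperf3_throughput_gbps('fileserver'):.2f} Gbit/s")
```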

I decided to swap the Mellanox out for an Intel X710-DA2 I had sitting around collecting dust, for the simple reason of hoping to cut my power consumption by a few watts (the X710s are known to be pretty low-power cards). As soon as I rebooted with the new card, installed the latest drivers, and got it working, I re-ran iperf3 and my 50GB ISO file copy tests. Performance dropped considerably, down to about 6.5Gb/s in both tests. Knowing that my X710 is a known-good card (I've had no problems getting 10Gb/s out of it in other machines), I decided something else had to be going on.

I booted my gaming box into Parted Magic (a USB-based Linux distro) and ran iperf3 again. Same results as in Windows. So that rules out a driver problem, iperf3 wonkiness on Windows, or an OS issue.

Next up, I grabbed an Intel X520, plugged it into my gaming box, and booted into Parted Magic. The X520 gave the exact same performance as the X710, so that DEFINITELY rules out the X710 being defective.



Now I'm beginning to think it's an issue with my gaming box. So, on to the test box.

This is an i7-9700K machine with a Gigabyte Z370-based motherboard and two PCIe slots (one providing 16 lanes, the other 4). There is no GPU in this machine, as I just use its Intel integrated graphics. I plugged the X710 into the top PCIe slot, booted into Win 10 LTSC, and repeated the same tests. This time I got the same results I had on my gaming box with the Mellanox: consistent 10Gb transfer speeds. For reasons unknown, the X710 will NOT work in the bottom slot of my test box. However, the X520 does work fine there. I tested the X520 in the top slot as a baseline and got 10Gb no problem. Moving it to the bottom slot and "restricting" it to 4 lanes still gave me 10Gb.

My guess is that the Mellanox ConnectX-4 cards don't mind only getting 2 PCIe lanes when running at 10Gb, while the X710 gets "grumpy" if it's not getting 4 PCIe lanes, even though 2 lanes at PCIe 3.0 should be more than fast enough for 10Gb speeds. My test box is able to give the Intel X710 or X520 all the PCIe lanes they want, and I suspect that's why the X710 and X520 give the expected performance in that machine.
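Side note: if you want to confirm how many lanes a card actually negotiated (under Linux, e.g. from Parted Magic), the kernel exposes it in sysfs. A minimal sketch; the PCI address below is a placeholder you'd swap for your NIC's address from lspci:

```python
# Minimal sketch: read the negotiated PCIe link width and speed for a device
# from sysfs. The address below is a placeholder - find yours with "lspci".
from pathlib import Path

def pcie_link(pci_addr: str) -> str:
    dev = Path("/sys/bus/pci/devices") / pci_addr
    width = (dev / "current_link_width").read_text().strip()  # e.g. "2"
    speed = (dev / "current_link_speed").read_text().strip()  # e.g. "8.0 GT/s PCIe"
    return f"{pci_addr}: x{width} @ {speed}"

print(pcie_link("0000:07:00.0"))  # placeholder PCI address for the NIC
```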

Moral of the story: the Intel NICs appear to have a design flaw/bug where they don't perform well if they're not getting at least 4 lanes from the PCIe bus.
 

nabsltd

Well-Known Member
Jan 26, 2022
My guess is that the Mellanox ConnectX-4 cards don't mind only getting 2 PCIe lanes when running at 10Gb, while the X710 gets "grumpy" if it's not getting 4 PCIe lanes, even though 2 lanes at PCIe 3.0 should be more than fast enough for 10Gb speeds.
Note that the X520 is a PCIe 2.0 card, so 2 lanes would be about 8Gbps. So, right there, you lose 20%.

The X710 is a PCIe 3.0 card, but the design seems to be x4 lanes of PCIe 3.0 per port. Two lanes should be more than enough for 10Gbps, but if Intel thought so, the dual-port cards would have been x4 instead of x8.
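To put rough numbers on that (just a back-of-the-envelope sketch; the per-lane figures account for 8b/10b encoding on PCIe 2.0 and 128b/130b on PCIe 3.0):

```python
# Approximate usable bandwidth per lane after encoding overhead, in Gbit/s:
# PCIe 2.0: 5 GT/s with 8b/10b   -> ~4.0 Gbit/s per lane
# PCIe 3.0: 8 GT/s with 128b/130b -> ~7.88 Gbit/s per lane
PER_LANE_GBPS = {"PCIe 2.0": 4.0, "PCIe 3.0": 7.88}

for gen, per_lane in PER_LANE_GBPS.items():
    for lanes in (2, 4, 8):
        print(f"{gen} x{lanes}: ~{per_lane * lanes:.1f} Gbit/s")

# PCIe 2.0 x2: ~8.0 Gbit/s  -> below 10GbE line rate (the ~20% shortfall above)
# PCIe 3.0 x2: ~15.8 Gbit/s -> plenty for a single 10GbE port on paper
```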
 

MountainBofh

Beating my users into submission
Mar 9, 2024
Note that the X520 is a PCIe 2.0 card, so 2 lanes would be about 8Gbps. So, right there, you lose 20%.

The X710 is a PCIe 3.0 card, but the design seems to be x4 lanes of PCIe 3.0 per port. Two lanes should be more than enough for 10Gbps, but if Intel thought so, the dual-port cards would have been x4 instead of x8.
I wonder if Intel carried over some design limitation from the X520 to the X710, something that would cause the same performance issue as the X520. A 20% hit is at least in the ballpark of the loss I was seeing, from 9.5Gb/s down to 6.5Gb/s.

Makes me VERY glad I settled on the Mellanox ConnectX-4 cards as my NIC of choice.
 

mattlach

Active Member
Aug 1, 2014
This may be historical at this point, but in the past I have found that Intel NICs expect all of their PCIe lanes to be present, or they often behave badly.

If it is an x8 card, it will want to be in an x8 (electrical) slot, or it will be unhappy.

It doesn't matter that it is a dual-port NIC, that you only want to use one of the ports, and that x4 lanes should be enough bandwidth for that one port. Give those NICs all 8 lanes, or they will be unhappy.

At least that has been my experience.

Of course, this means you won't find any consumer motherboards they will work right in, because in 2024 there are no consumer motherboards with x8 secondary PCIe slots, at least not ones that don't sabotage the main PCIe slot by pulling it down to x8 mode.

I have long suspected this is by design, for market segmentation: to try to force anyone who wants to do anything even remotely similar to enterprise/workstation work to buy an expensive workstation.

I could be wrong though.
 

blunden

Active Member
Nov 29, 2019
This may be historical at this point, but in the past I have found that Intel NICs expect all of their PCIe lanes to be present, or they often behave badly.

If it is an x8 card, it will want to be in an x8 (electrical) slot, or it will be unhappy.

It doesn't matter that it is a dual-port NIC, that you only want to use one of the ports, and that x4 lanes should be enough bandwidth for that one port. Give those NICs all 8 lanes, or they will be unhappy.

At least that has been my experience.

Of course, this means you won't find any consumer motherboards they will work right in, because in 2024 there are no consumer motherboards with x8 secondary PCIe slots, at least not ones that don't sabotage the main PCIe slot by pulling it down to x8 mode.

I have long suspected this is by design, for market segmentation: to try to force anyone who wants to do anything even remotely similar to enterprise/workstation work to buy an expensive workstation.

I could be wrong though.
Behave badly in what way? :)
 

mattlach

Active Member
Aug 1, 2014
Behave badly in what way? :)
This varied a little bit from card to card.

My really old Intel 10 Gigabit AT2 (E10G41AT2) adapters had full 10-gig downstream but almost unusable upstream. This was in a Gen2 x4 slot, which should have been good for about 16Gbit/s in each direction, for reference.

More modern X520 and X710 adapters I've used are just really inconsistent unless you feed them all 8 lanes: random lag spikes, poor performance, etc.

Your mileage may vary, but this has been my experience.
 

blunden

Active Member
Nov 29, 2019
More modern X520 and X710 adapters I've used are just really inconsistent unless you feed them all 8 lanes: random lag spikes, poor performance, etc.

Your mileage may vary, but this has been my experience.
I see. I'll have to check for that with my X710-DA2, which sits in a slot that is effectively limited to 4 lanes, since that's the width of the link between the X570 chipset and the CPU.

I picked the card for its low power consumption (and heat output). Hopefully this doesn't become an issue.