Mellanox Infiniband Problems

Castlers

New Member
Feb 22, 2021
23
1
3
I'm trying to troubleshoot my InfiniBand home network, where my workstation and my home server have 56 Gb/s IB connections through an SX6036. I can't get anything beyond 20 Gb/s from any machine connected to this network.

The relevant workstation specs:
AMD Ryzen 9 5950X
Nvidia RTX 3090
Samsung 970 EVO Plus
Windows 10 Pro for Workstations (20H2)
Mellanox CX4 MCX456A-ECAT, reported as running at 56 Gb/s when viewed inside Windows

My server specs:
Intel Xeon E5-2660 v3
Mellanox CX4 MCX456A-ECAT, also reported as running at 56 Gb/s when viewed inside Windows
Windows Server 2019
4x Intel NVMe SSDs running off an LSI card, configured as a Windows storage pool in RAID 0

Whether I test with iperf3 or copy files in Windows Explorer, I get about 2 GB/s reported, or a little less.

Both of these computers have RDMA enabled via PowerShell.
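
For reference, here is a quick unit check on those numbers. It is only a sketch, and it assumes the 56 Gb/s link is FDR InfiniBand (4 lanes at 14.0625 Gb/s with 64b/66b encoding); the function names are made up for illustration.

```python
# Rough unit check: observed throughput vs. what an FDR link can carry.
# Assumption: the "56 Gb/s" link is FDR InfiniBand (4 lanes, 64b/66b encoding).

FDR_LANES = 4
FDR_LANE_RATE_GBPS = 14.0625   # signalling rate per lane, in Gb/s
FDR_ENCODING = 64 / 66         # 64b/66b line encoding efficiency


def gbytes_to_gbits(gbytes_per_s: float) -> float:
    """Convert GB/s (as reported by Explorer or iperf3) to Gb/s."""
    return gbytes_per_s * 8


def fdr_data_rate_gbps() -> float:
    """Usable FDR data rate in Gb/s after line encoding."""
    return FDR_LANES * FDR_LANE_RATE_GBPS * FDR_ENCODING


if __name__ == "__main__":
    observed = gbytes_to_gbits(2.0)   # the ~2 GB/s reported in this post
    ceiling = fdr_data_rate_gbps()
    print(f"observed: {observed:.1f} Gb/s")
    print(f"FDR data rate: {ceiling:.1f} Gb/s")
    print(f"fraction of link used: {observed / ceiling:.0%}")
```

So the roughly 2 GB/s seen here is about 16 Gb/s, well under what the link itself should be able to carry.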
 

necr

Active Member
Dec 27, 2017
151
47
28
124
Yes, the PCIe link speed and width on the Windows 10 workstation could be a factor. ib_send_bw (installed with WinOF-2) or nttcp can be used for the test instead of iperf3.
 

NablaSquaredG

Layer 1 Magician
Aug 17, 2020
1,320
800
113
Keep in mind that InfiniBand and Ethernet are not a good combination, because you lose all offloading features when you use IPoIB.

Have you tested with ib_send_bw yet to get the pure InfiniBand RDMA bandwidth?
Have you tried linking the server and client with Ethernet instead of using InfiniBand and tunneling the IP traffic via IPoIB?
 

Falloutboy

Member
Oct 23, 2011
221
23
18
I don't believe the 5950X, even on a board with the latest chipset, will run the video card and the CX4 together unless you force the slots to x8 mode for both cards.
 

NablaSquaredG

Layer 1 Magician
Aug 17, 2020
1,320
800
113
Au contraire! I don't know any motherboard that doesn't automatically switch to x8 / x8 mode when both slots are being used.
 

Castlers

New Member
Feb 22, 2021
23
1
3
Hey guys. I checked in HWiNFO64 to see which PCIe lanes go where in the system; here are the results for both the GPU and the NIC:
3090.png
Mellanox.PNG
The second picture is the Mellanox CX4, and it shows only an x4 link at PCIe 3.0, which is about 4 GB/s (roughly 32 Gb/s) of bandwidth. That would be a problem for reaching the full 56 Gb/s, but I can't even get past 20 Gb/s at the moment, so I'll have to revisit this later. The server has a full x16 link negotiated to the CX4.
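
The "about 4 GB/s" figure for a PCIe 3.0 x4 link can be sanity-checked with a few lines of arithmetic. This is a rough sketch using the standard per-lane transfer rates and line encodings; it ignores packet (TLP) overhead, so real throughput will be somewhat lower, and the function name is just for illustration.

```python
# Raw PCIe link bandwidth from generation and lane count.
# Ignores TLP/protocol overhead, so treat the result as an upper bound.

PCIE_GT_PER_S = {1: 2.5, 2: 5.0, 3: 8.0, 4: 16.0}                  # per-lane transfer rate
PCIE_ENCODING = {1: 8 / 10, 2: 8 / 10, 3: 128 / 130, 4: 128 / 130}  # line encoding efficiency


def pcie_bandwidth_gbps(gen: int, lanes: int) -> float:
    """Raw one-direction link bandwidth in Gb/s after line encoding."""
    return PCIE_GT_PER_S[gen] * PCIE_ENCODING[gen] * lanes


if __name__ == "__main__":
    for lanes in (4, 8, 16):
        gbps = pcie_bandwidth_gbps(3, lanes)
        print(f"PCIe 3.0 x{lanes}: {gbps:.1f} Gb/s ({gbps / 8:.2f} GB/s)")
```

By this estimate an x4 Gen3 link cannot carry a full 56 Gb/s, while x8 or x16 can, which fits the point that the slot matters later but does not explain being stuck below 20 Gb/s.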
 

NablaSquaredG

Layer 1 Magician
Aug 17, 2020
1,320
800
113
Could it be that you plugged the CX4 into the bottommost PCIe "x16" slot? (Quoted on purpose, because this slot is always electrically x4!)
 

Castlers

New Member
Feb 22, 2021
23
1
3
NablaSquaredG said: Could it be that you have plugged in the CX4 in the bottommost PCIe "x16" (quoted by purpose) slot?
The lane allocation could very likely be a factor in not hitting 56 Gb/s, but I'm only getting about 20 Gb/s at the moment, and the x4 link has enough bandwidth for roughly 32 Gb/s.
 

Castlers

New Member
Feb 22, 2021
23
1
3
I'm trying to get that to run at the moment. I have the latest WinOF-2 driver installed, but the only documentation on how to run the test is for Linux. How would I run this in PowerShell or something similar? There must be a specific Mellanox module I'm supposed to call, as ib_send_bw isn't recognized as a cmdlet.
 

NablaSquaredG

Layer 1 Magician
Aug 17, 2020
1,320
800
113

Castlers

New Member
Feb 22, 2021
23
1
3
While that would be easier for me, the SX6036 is significantly cheaper than pretty much anything else, and I already have a subnet manager/Ethernet bridge for all of the machines, so buying an equivalent Ethernet switch isn't an option.
 

NablaSquaredG

Layer 1 Magician
Aug 17, 2020
1,320
800
113
The SX6036 is a VPI switch and supports Ethernet and Infiniband at the same time with the right license.

If you ask the right forum members nicely, you might even get a complete L3 switching license ;-)
 

Castlers

New Member
Feb 22, 2021
23
1
3
I've been trying to get one from Mellanox for a little while, but it's getting hard to get them to respond about it. I know that the SX6036G has it built in, but I was going to try my luck, since you can pick these switches up for around $200 on eBay.
 

NablaSquaredG

Layer 1 Magician
Aug 17, 2020
1,320
800
113
You probably won't get one from Mellanox (or you'll pay 10 grand).

Just read around a bit and perhaps drop the right forum member a PM; you might soon have an SX6036 with full L3 capabilities...
 

necr

Active Member
Dec 27, 2017
151
47
28
124
Castlers said: I'm trying to get that to run atm, I have the latest WINOF2 driver installed, but the only documentation for how to run the test is on linux. How would I run this in powershell or something as there must be a specific mellanox module im supposed to call for it as ib_send_bw isn't recognized as a cmdlet.
cd "C:\Program Files\Mellanox\MLNX_VPI\IB\Tools"
In Windows the utilities start with nd_, so you might want to try nd_send_bw and go from there. The utilities are client-server, meaning you first have to start one side as a server in listening mode and then start the other side as a client. You might want to run PowerShell as admin.
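
If it helps to see what is actually installed, here is a small sketch that lists the nd_ executables in that folder. It assumes the tools live at the path given above and are .exe files named with the nd_ prefix; the helper name is hypothetical and nothing here is a Mellanox API.

```python
# List the nd_* benchmark executables in the Mellanox tools folder.
# Assumptions: the path below (taken from this post) and the ".exe" suffix.
from pathlib import Path

TOOLS_DIR = Path(r"C:\Program Files\Mellanox\MLNX_VPI\IB\Tools")


def list_nd_tools(tools_dir: Path = TOOLS_DIR) -> list:
    """Return the sorted names of nd_*.exe files found in tools_dir."""
    if not tools_dir.is_dir():
        return []  # folder missing: wrong path or tools not installed
    return sorted(
        p.name
        for p in tools_dir.iterdir()
        if p.is_file()
        and p.name.lower().startswith("nd_")
        and p.suffix.lower() == ".exe"
    )


if __name__ == "__main__":
    for name in list_nd_tools():
        print(name)
```

An empty result most likely means the tools were installed somewhere else, so adjust TOOLS_DIR to match your driver install.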
 

Castlers

New Member
Feb 22, 2021
23
1
3
Thanks for helping me out with that. I have those Mellanox tools working now. Here is the result of my client-to-server connection:
powershell_jlN59hbNTJ.png