ConnectX-3: Unable to get Ethernet mode working


HindrikS

New Member
Jan 22, 2023
8
0
3
Hello everyone,

I recently purchased a pair of Oracle-branded ConnectX-3 354A FCBT dual-port NICs.
The original firmware indicated they were FCBT models, so I cross-flashed them to stock Mellanox firmware. Yesterday I received a QSFP+ cable, ordered from eBay:
- HP Mellanox Technologies MC2207130-0A1 1.5m FDR InfiniBand QSFP Passive Copper
I expected to get 56GbE through this, but I can't get Ethernet working at all.
In InfiniBand mode the cards seem to work fine: I get a link, and once OpenSM is running I can push data over the IPoIB adapters. However, once I switch the ports to Eth mode, I no longer get a link on either card. I also tried reflashing the original Oracle firmware as well as the QCBT firmware; nothing gives me a link once the ports are configured in Ethernet mode.

Does anyone have an idea on how to get it set up in Ethernet mode?
Could it somehow be the cable?
 

i386

Well-Known Member
Mar 18, 2016
4,241
1,546
113
34
Germany
What OS?
Are the IP addresses in the same network when using Ethernet?

Do you plan to use InfiniBand, or Ethernet only?
I flashed the Ethernet firmware on all my CX-3 adapters. Changing the port type is a PITA on the different Linux distros, and InfiniBand support was removed from the stuff I was interested in...

Edit: I assume no switch is involved (1x cable, host to host)?
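
For reference, on Linux the in-kernel mlx4_core driver exposes a per-port sysfs attribute for changing the port type at runtime. A minimal sketch, assuming that driver is loaded; the PCI address, the `set_port_type` helper and the `DRY_RUN` variable are hypothetical and only there for illustration. The change made this way is not persistent across reboots.

```shell
#!/bin/sh
# Hypothetical PCI address; find the real one with `lspci | grep -i mellanox`.
BDF="${BDF:-0000:04:00.0}"

# Hypothetical helper: set the type of one port via the mlx4_core sysfs attribute.
#   $1 = port number (1 or 2)
#   $2 = ib | eth | auto
# With DRY_RUN=1 the command is only printed, nothing is written.
set_port_type() {
    attr="/sys/bus/pci/devices/$BDF/mlx4_port$1"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "echo $2 > $attr"
    else
        echo "$2" > "$attr"   # needs root
    fi
}
```

Usage would be something like `DRY_RUN=1 set_port_type 1 eth` to preview, then the same call as root without `DRY_RUN` for each port.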
 

HindrikS

New Member
Jan 22, 2023
8
0
3
I'm using Windows, but I also tried it on Linux (the eventual deployment will be Windows to Unraid). When I set the cards to InfiniBand mode, I get a link on both cards and it works once OpenSM is started (I just put both cards in the same system for testing). Once I set them to Ethernet, no links come up. And yes, it's host to host, no switch. I just have a QSFP cable in between.
The cable I received also seems to be different from what I expected; see the attached images.
Also, it seems to link only at FDR10 instead of full FDR.
 


Stephan

Well-Known Member
Apr 21, 2017
923
700
93
Germany
Not sure what is up with that cable; it's from a PCI extension something, hence the label.

Check Table 6 (page 12 and following) of https://network.nvidia.com/related-docs/firmware/ConnectX3-FW-2_42_5032-release_notes.pdf for a list of validated cables. The EMC versions work but might show wrong or no length info because of different EEPROM contents.

Also try to switch both cards into Ethernet-only mode:

mlxconfig -d /dev/mst/mt4099_pci_cr0 set LINK_TYPE_P1=2 LINK_TYPE_P2=2
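
It can help to query the configuration before and after the `set` call to confirm what the firmware actually stores. A minimal sketch, assuming the Mellanox firmware tools (MFT) are installed and the device path matches the one above (`mst status` lists the real one); the `run` helper and the `DRY_RUN` variable are hypothetical, added only so the commands can be previewed. As far as I know, a new LINK_TYPE only takes effect after a reboot or driver restart.

```shell
#!/bin/sh
# Device path taken from the command above; yours may differ.
DEV="${DEV:-/dev/mst/mt4099_pci_cr0}"

# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Show the stored configuration, including LINK_TYPE_P1 / LINK_TYPE_P2.
# On ConnectX-3: 1 = InfiniBand, 2 = Ethernet, 3 = VPI (auto-sense).
show_link_type() {
    run mlxconfig -d "$DEV" query
}

# Set both ports to Ethernet; -y answers the confirmation prompt.
set_ethernet() {
    run mlxconfig -y -d "$DEV" set LINK_TYPE_P1=2 LINK_TYPE_P2=2
}
```

For example, `DRY_RUN=1 set_ethernet` only prints the command, while `show_link_type` followed by `set_ethernet` and a reboot applies it on each card.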
 

HindrikS

New Member
Jan 22, 2023
8
0
3
Switching both cards to Ethernet mode does not work; I get no link in Eth mode. The same is true when linking one port of a card to the other port on the same card: no link. Regarding the cable, I am starting to suspect it. While it may be a Mellanox cable, it certainly isn't the model I ordered, and this PCI EX label is indeed strange. Could it be the reason Ethernet isn't working? Or is a card issue more likely?
 

Stephan

Well-Known Member
Apr 21, 2017
923
700
93
Germany
Note: if you have a Coffee Lake Refresh or later CPU, which has a hardware fix for Meltdown and so doesn't incur the context-switch time hell, you can check out the mlx4 driver history, e.g. "Add driver bpf hook for early packet drop and forwarding" [LWN.net] and "Re: [RFC PATCH 4/5] mlx4: add support for fast rx drop bpf program" (Alexei Starovoitov).

Development for this card was pretty much finished in 2016/2017, with performance on one CPU reaching 20 Mpps for a simple "drop" and 10 Mpps for "rewrite and forward". Meanwhile, Intel was still trying to figure out why their 1 Gbps (!!) chips had bad performance, driver hangs, and link drops (see "e1000e: fix buffer overrun while the I219 is processing DMA transactions", torvalds/linux@b10effb).

This is why Mellanox is superior silicon, and mostly everything else just a toy.
 

Stephan

Well-Known Member
Apr 21, 2017
923
700
93
Germany
If both cards show no PCIe abnormalities and have recent firmware, I suspect the cable. The only way to get closer to the truth is to try one or two cables from the validated list in the PDF I posted.

One last thing to try on Linux before you return the cable is to nail the speed to a fixed 40 Gbps, or whatever ethtool is offering. Try every speed on both systems.
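
A rough sketch of what pinning the speed could look like, assuming the port shows up as a regular network interface; the interface name `eth0`, the `run` helper and the `DRY_RUN` variable are hypothetical. Which speeds are accepted depends on what `ethtool` lists as supported for that port.

```shell
#!/bin/sh
# Hypothetical interface name; check `ip link` for the real one.
IFACE="${IFACE:-eth0}"

# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# List supported/advertised link modes and the current link state.
show_modes() {
    run ethtool "$IFACE"
}

# Force a fixed speed (in Mb/s, e.g. 10000 or 40000) with autonegotiation off.
force_speed() {
    run ethtool -s "$IFACE" speed "$1" autoneg off
}
```

For example, run `show_modes` first, then `force_speed 40000` on both hosts, and check `show_modes` again for "Link detected: yes".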
 

HindrikS

New Member
Jan 22, 2023
8
0
3
Just tried some stuff; I suspect the cable just doesn't do Ethernet. Which cable would I need, then? I'm seeing conflicting information about which cables do InfiniBand and Ethernet, Ethernet only, or IB only. Would the MC2207130-002 work for both IB and Ethernet?
 

HindrikS

New Member
Jan 22, 2023
8
0
3
Thanks! I'll get that one then. I will contact the seller of my previous cable; I'm not sure what it is that they sent me, except that it does 40 Gbit FDR10. I also can't find the product numbers printed on the cable anywhere on the internet.
 

HindrikS

New Member
Jan 22, 2023
8
0
3
So I got my hands on a 100GbE QSFP28 cable. It was indeed the weird Mellanox cable causing the problem! I can now happily do 40 Gbit between my cards.
Thank you all for your help!