Mellanox ConnectX-4s Not Working


Larco

New Member
Jun 7, 2020
11
1
3
United States
So I've spent many hours on this completely unnecessary home network upgrade from 40 gigabit to 100 gigabit, and it has been a very frustrating journey. I recently acquired a Celestica DX010 100 gigabit switch loaded with SONiC, two Mellanox ConnectX-4s from eBay (exact model for both appears to be MCX456A-ECAT), a generic FS.com 1m 100G DAC cable, and generic FS.com 100G transceivers. The first ConnectX will be going into my X299 workstation and the second into an old Xeon v2 server loaded up with RAIDed NVMe drives.

I placed the first card into my machine, checked lspci, and it does not show up at all. No problem, I tried it in a secondary rig running Windows, and it didn't show up in Device Manager either. Finally I tried that card in my server and even there it did not show up. With a DAC cable plugged in, it did not show any blinking lights at all, which the second card did. I want to think it's just very corrupted firmware, but I'm 90% sure the card is DOA and that I should just return it.

As for the second card, after spending quite a few hours flashing the latest firmware and changing it to Ethernet mode, I did get it to show the two Ethernet devices. However, I could not get a link up at all with the Celestica. For simplicity, I ran dhcpcd on both ports and could never get any blinking lights on the switch. I did some searching and found this thread about changing FEC to RS and tried that, and that did not work either. Interestingly, while troubleshooting by swapping ports on the card, I found that the card blinks when the cable is plugged into the left port, but not the right one. I then grabbed a 40G DAC cable and plugged it into a QSFP+ port on my old Arista 7050T, which I know for certain has working ports, and not even that would link up.

Out of desperation (and laziness) I installed a desktop environment on the server and started NetworkManager. The gigabit port I was using to keep the server online while I troubleshot linked up no problem, but when I checked the interfaces, neither port on the Mellanox card gave me the option to connect in the GUI, yet the lights were blinking on the card. Is there something I'm missing, or are both of the cards DOA?

TLDR:
  1. Have two Mellanox ConnectX-4s (MCX456A-ECAT), generic FS.com 100G DAC cables, and a Celestica DX010
  2. First card does not show up in any PC, straight out of the box
  3. Flashed second card with latest firmware and switched it to Ethernet mode
  4. Second card blinks when the DAC cable is connected, but only in the left port
  5. Second card cannot get a link up with any of my switches
  6. Installed a desktop environment and discovered that NetworkManager gives me no option to connect on either Mellanox port
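
The first check in the list above (does the card enumerate on the PCI bus at all?) can be scripted. This is a minimal sketch, assuming the output of `lspci -nn` has been captured as text. `15b3` is Mellanox's PCI vendor ID; the function name and the sample lines are made up for illustration and are not output from the machines in this thread.

```python
from typing import List

MELLANOX_VENDOR_ID = "15b3"  # PCI vendor ID assigned to Mellanox


def mellanox_devices(lspci_nn_output: str) -> List[str]:
    """Return the lines of `lspci -nn` output that carry the Mellanox vendor ID."""
    tag = "[" + MELLANOX_VENDOR_ID + ":"
    return [
        line.strip()
        for line in lspci_nn_output.splitlines()
        if tag in line.lower()
    ]


# Illustrative sample only, not captured from the poster's hardware.
sample = (
    "00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection [8086:15b8]\n"
    "03:00.0 Ethernet controller [0200]: Mellanox Technologies MT27700 Family [ConnectX-4] [15b3:1013]\n"
)

found = mellanox_devices(sample)
print(found if found else "no Mellanox device enumerated")
```

An empty result on several different hosts, as described for the first card, points at the card rather than at the cable or the switch.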
 

i386

Well-Known Member
Mar 18, 2016
4,220
1,540
113
34
Germany
I would troubleshoot with the arista and known working cables and transceivers (I have seen too many posts in the last year with fs.com stuff that was causing problems).

And please try to format your post; it's hard to read and hard to find the relevant information again ._.
 

Larco

I added a TLDR to get the gist of it down. But yeah, I'm not sure what else to do other than going back to my Chelsio 40 gigabit cards.
 

Larco

You're lucky: It seems like somebody (aka @Rand__ ) has already had the same issue ;)

(same Switch and same NIC type!)

Yeah, I saw that thread in the one I linked in my original post. I tried that but it still didn't link up. Also, as I said before, I tried an old Arista 7050T with QSFP+ ports and even that did not work. So at this point it's either a card issue or a cable issue. I ordered a Mellanox DAC cable to see if the issue persists.
 

klui

Well-Known Member
Feb 3, 2019
824
453
63
Try the following.

On your bad CX4, enable Livefish mode and see if MFT tools see it. Refer to Mellanox's MFT tools manual on how to do that. It's in the appendix. There should be jumpers or holes for jumpers on the card. It's a last resort but if it is not enumerated on the PCI bus chances are the card is toast. I have a CX3 with this behavior. Livefish didn't work, and even using Mellanox's I2C programmer didn't work.

On your CX4 that's partially working, see what speeds are supported for each port using ethtool. Manually set the speed and use your DAC and connect port 1 to port 2 and see if it links up. Use a known good cable--your QSFP+ DAC for example.

You should connect your Arista to Celestica and see if it links up using a known good cable.

I have no experience using FS.com cables: only Mellanox- and Molex-branded.
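
The ethtool steps above (force a speed, then check for link) could be wrapped along these lines. This is only a sketch: the interface name `enp3s0f0` and the sample output are hypothetical, the helper names are invented here, and the command is built but not executed. Running it for real needs root and the actual interface name.

```python
from typing import List, Optional


def set_speed_cmd(iface: str, speed_mbps: int) -> List[str]:
    """Build (but do not run) an ethtool command forcing a speed with autoneg off."""
    return ["ethtool", "-s", iface, "speed", str(speed_mbps), "autoneg", "off"]


def link_detected(ethtool_output: str) -> Optional[bool]:
    """Read the 'Link detected:' line of `ethtool <iface>` output.

    Returns True or False, or None if the line is missing.
    """
    for line in ethtool_output.splitlines():
        key, _, value = line.strip().partition(":")
        if key == "Link detected":
            return value.strip() == "yes"
    return None


IFACE = "enp3s0f0"  # hypothetical interface name

# 40000 Mb/s matches a QSFP+ DAC looped from port 1 to port 2.
print(" ".join(set_speed_cmd(IFACE, 40000)))

# Illustrative output fragment, not taken from the poster's screenshot.
sample = "Settings for enp3s0f0:\n\tSpeed: Unknown!\n\tLink detected: no\n"
print(link_detected(sample))
```

The "Supported link modes" section of the same `ethtool <iface>` output lists which speeds each port advertises.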
 

Larco

Screenshot of one of the ports (the other port prints the same thing):
[image: 1625380670924.png]
After setting the speed to 40 gigabit and connecting the ports to each other using my known good QSFP+ cable, the light does blink once again, but there is still no option to connect in the GUI and ethtool reports no link detected.

As for connecting my Arista to my Celestica, I actually have LACP going over two QSFP+ cables for 80 gigabit (I pulled one of them for the loopback test described above), and they both link up perfectly fine.

Lastly, as for Livefish mode, I don't see anywhere on the card to place jumpers and didn't see anything in the documentation about where they are, but maybe I'll have to take a deeper look. Honestly though, it's probably dead and it's easier to just return it.
 

klui

Light shouldn't blink. Should be solid when linked up.

Play around with autoneg and fec using ethtool.
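
One way to "play around" systematically is to enumerate every autoneg/FEC combination and try them in turn. A rough sketch, assuming a hypothetical interface name `enp3s0f0`. It only generates the ethtool command strings and does not run them, and the driver may reject some encodings at a given speed.

```python
from itertools import product
from typing import List, Tuple

AUTONEG = ("on", "off")
FEC_MODES = ("auto", "off", "rs", "baser")  # encodings accepted by `ethtool --set-fec`


def trial_commands(iface: str) -> List[Tuple[str, str]]:
    """Return (autoneg command, FEC command) pairs covering every combination."""
    return [
        (
            f"ethtool -s {iface} autoneg {an}",
            f"ethtool --set-fec {iface} encoding {fec}",
        )
        for an, fec in product(AUTONEG, FEC_MODES)
    ]


for autoneg_cmd, fec_cmd in trial_commands("enp3s0f0"):  # hypothetical interface
    print(autoneg_cmd, "&&", fec_cmd)
```

After each pair, `ethtool --show-fec <iface>` and the "Link detected" line of `ethtool <iface>` show whether the setting took effect and whether the link came up.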
 

Larco

Okay, so an update: I unplugged the cable from the right port and learned that the card still blinks at ~0.5 second intervals, and that it only stops when the left port is unplugged. The port on the right does not do anything at all with the DAC cable plugged in. I'll update if I can get anything interesting with autoneg and FEC.

EDIT: I found this known issues document on Mellanox's website, and it says that if a faulty cable is inserted, an amber LED will blink, which lines up exactly with my findings. The cable works fine with my Chelsio, Arista and Celestica gear, so I guess I'll see if I can get any change with autoneg/FEC, but otherwise I'll have to report back in a few days when I get my Mellanox cable.
 

Larco

Alright, so I ended up ordering a single-port card on eBay from a different seller, and after flashing new firmware, switching it to Ethernet mode, and configuring the Celestica for RS FEC, I got a link up without a hitch. I'm going to take it that the two cards I got were DOA, and I will be returning them.
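
For anyone repeating the "switch to Ethernet mode" step, it is usually done with `mlxconfig` from the MFT tools by setting `LINK_TYPE_P<n>` to 2 (Ethernet; 1 is InfiniBand). A minimal sketch that only builds the command; the device path below is a hypothetical example, the real one comes from `mst status`, and the helper name is invented here.

```python
from typing import List

LINK_TYPE_ETH = 2  # mlxconfig value for Ethernet (1 is InfiniBand)


def eth_mode_cmd(mst_device: str, ports: int) -> List[str]:
    """Build (but do not run) an mlxconfig command setting every port to Ethernet."""
    settings = [f"LINK_TYPE_P{p}={LINK_TYPE_ETH}" for p in range(1, ports + 1)]
    return ["mlxconfig", "-d", mst_device, "set"] + settings


# Hypothetical MST device path for a ConnectX-4; check `mst status` for the real one.
DEVICE = "/dev/mst/mt4115_pciconf0"

print(" ".join(eth_mode_cmd(DEVICE, ports=1)))  # single-port card
print(" ".join(eth_mode_cmd(DEVICE, ports=2)))  # dual-port card such as the MCX456A-ECAT
```

The new link type only takes effect after a reboot or a firmware reset.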
 