ConnectX-5 OCP 3.0 engineering sample - safe to update firmware to production version?


Legionnaire

New Member
Dec 18, 2022
4
4
3
Hi guys, I'm looking for opinions and input on updating, and possibly cross-flashing, the firmware on a ConnectX-5.

I was in need of an OCP3 NIC and was able to obtain a single-port ConnectX-5 relatively inexpensively. It turned out to be some sort of engineering sample.

It introduces itself as follows in lspci -vv:
Code:
Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
Subsystem: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
...
LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM not supported
...
LnkSta: Speed 16GT/s, Width x16
...
Product Name: ConnectX-5 Ex EN network interface card for OCP 3.0, with Multi-Host and host management, 100GbE Single-Port QSFP28, Internal Lock bracket                                         
    Read-only fields:
        [PN] Part number: MCX565M-CDUI_C11       
        [EC] Engineering changes: A1
        [V2] Vendor specific: MCX565M-CDUI_C11       
        [SN] Serial number: MT1945X12088 
        [V3] Vendor specific: 80d739320d02ea1180001c34da40fd06
        [VA] Vendor specific: MLX:MN=MLNX:CSKU=V2:UUID=V3:PCI=V0:MODL=CX565M       
        [V0] Vendor specific: PCIeGen4 x16
        [RV] Reserved: checksum good, 1 byte(s) reserved
    End
...
Kernel driver in use: mlx5_core
Kernel modules: mlx5_core
And here is the output of ethtool -i:
Code:
driver: mlx5_core
firmware-version: 16.27.1016 (MSF0000000037)
expansion-rom-version:
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
The part in parentheses is the PSID, I assume.
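For anyone who wants to script this check, here is a minimal Python sketch that pulls the version and the parenthesised ID out of the ethtool -i text above. It assumes, as the post does, that the parenthesised part is the PSID; parse_fw_line and the sample string are illustrative names, not part of any tool.

```python
import re

# Matches e.g. "firmware-version: 16.27.1016 (MSF0000000037)"
FW_RE = re.compile(r"^firmware-version:\s*(\S+)\s*\(([^)]+)\)\s*$", re.MULTILINE)

def parse_fw_line(ethtool_output):
    """Return (version, psid) from `ethtool -i` text, or None if not found."""
    m = FW_RE.search(ethtool_output)
    if m is None:
        return None
    return m.group(1), m.group(2)

# Sample copied from the ethtool -i output in this post.
sample = """driver: mlx5_core
firmware-version: 16.27.1016 (MSF0000000037)
expansion-rom-version:
"""

version, psid = parse_fw_line(sample)
print(version, psid)
# Compare against the production PSID quoted in question 1.c.
print(psid == "MT_0000000347")
```

The comparison on the last line comes out False for this card, which is exactly the mismatch question 1.c asks about.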
As we can see, it runs and supports PCIe 4.0 x16.
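The same conclusion can be checked mechanically by comparing the LnkCap and LnkSta lines from lspci -vv (16GT/s is the PCIe 4.0 rate). A rough sketch, assuming output shaped like the excerpt above; link_summary and runs_at_full_link are made-up helper names.

```python
import re

# Captures the name, speed in GT/s and lane width from LnkCap / LnkSta lines.
# "LnkCap2:" and "LnkSta2:" lines are skipped because the colon must follow directly.
LINK_RE = re.compile(
    r"^\s*(LnkCap|LnkSta):.*?Speed\s+([\d.]+)GT/s.*?Width\s+x(\d+)",
    re.MULTILINE,
)

def link_summary(lspci_text):
    """Map 'LnkCap' / 'LnkSta' to (speed_gt_s, width) tuples."""
    return {
        name: (float(speed), int(width))
        for name, speed, width in LINK_RE.findall(lspci_text)
    }

def runs_at_full_link(lspci_text):
    """True if the negotiated link (LnkSta) equals the advertised capability (LnkCap)."""
    links = link_summary(lspci_text)
    return "LnkCap" in links and links.get("LnkSta") == links["LnkCap"]

# Lines copied from the lspci -vv output in this post.
sample = (
    "LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM not supported\n"
    "LnkSta: Speed 16GT/s, Width x16\n"
)
print(link_summary(sample))
print(runs_at_full_link(sample))
```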

Now I have two sets of questions.
  • 1. Updating firmware to a production version:
    • 1.a Is it generally recommended to update the FW from this old ES version to a production FW?
    • 1.b Is it risky to do such an update?
    • 1.c Is it possible at all? The PSID does not match any of the versions listed on the Mellanox/Nvidia website; the closest one is for an MCX565M-CDA, which has PSID MT_0000000347.
    • 1.d If possible, should I do it in one big leap, or should I do progressive updates, like ES FW -> earliest version-matched prod FW -> intermediate version FW -> most recent FW?
Then, as we can see, it advertises the same MT28800 chip as their VPI adapters. Mellanox didn't offer an InfiniBand-capable MCX5 in OCP3 form, but I was curious whether it is possible to turn this one into a VPI adapter by flashing a suitable FW image, e.g. the one for the MCX556A-EDA, which also uses the MT28800 chip. So the second group of questions is as follows:
  • 2. Cross-flashing firmware to VPI:
    • 2.a Has anyone done it? Is it possible to flash an Ethernet NIC with VPI firmware and expect it to support IB, provided that the base device is the same as in their VPI NICs?
    • 2.b What is the best FW image to use for the purpose? The one for the MCX556A-EDA, I assume? There are no single-port VPI NICs that were PCIe 4.0-capable, only dual-port ones.
    • 2.c What are the risks and side effects involved? Do I risk bricking this NIC into a non-recoverable state?
    • 2.d Is it possible to roll back the firmware or unbrick the NIC if the cross-flash turns out unsuccessful or unstable?
    • 2.e Do I necessarily need transceivers that support InfiniBand to check whether the cross-flash succeeded? Presently I have some Finisar FTLC9558REPMs and Mellanox MMA1B00-C100s, but those are advertised for 100GbE, and I have none that are explicitly IB-capable.
If any additional info is needed, I'm happy to provide it, please ask.
Any input is appreciated, particularly from knowledgeable/experienced people.
 

i386

Well-Known Member
Mar 18, 2016
4,849
1,895
113
36
Germany
I would try to dump/backup the current firmware/configuration, then try to crossflash it to "MCX565M-CDAI" firmware and see what happens.
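As a rough illustration of the backup step, here is a small Python sketch that only builds and prints the command lines rather than running them. It assumes the open-source mstflint package (query, ri and verify for mstflint, query for mstconfig); the PCI address and file prefix are placeholders, and the exact options are worth checking against mstflint -h on your version before relying on them.

```python
import shlex

def backup_commands(device, prefix="cx5_es_backup"):
    """Return command lines for a pre-flash backup; nothing is executed here."""
    image = f"{prefix}.bin"
    return [
        ["mstflint", "-d", device, "query", "full"],  # record current FW version and PSID
        ["mstflint", "-d", device, "ri", image],      # read the flash image into a file
        ["mstflint", "-i", image, "verify"],          # sanity-check the dumped image
        ["mstconfig", "-d", device, "query"],         # record non-volatile settings
    ]

# "0000:3b:00.0" is a hypothetical PCI address; take the real one from lspci.
for cmd in backup_commands("0000:3b:00.0"):
    print(shlex.join(cmd))
```

Keeping the dump and the saved query output somewhere off the machine gives you something to roll back to if the cross-flash misbehaves.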

Personally I started to avoid infiniband (I flashed my cx-3 vpi cards to ethernet only firmware, the port type switching in different linux distros sucked). Is there any reason you want infiniband?

On the PCIe add-on cards there are usually pins (or pads) that can be connected via jumpers to put the card into a "flash not present" mode to recover the firmware; I'm not sure if these also exist on the OCP versions. (You could post detailed pictures of the card :D)
 

AICPLIGHT

New Member
Dec 3, 2025
2
0
1
www.aicplight.com
This is an engineering sample CX-5 OCP3 card, and with those the safest rule is basically “if it works, don’t touch it.” Flashing it to a production firmware or trying to turn an EN card into VPI is very risky and often ends in a brick, especially since the PSID isn’t public and OCP3 boards are more customized than normal PCIe cards. Even though the ASIC is the same, InfiniBand support isn’t just firmware — it also depends on the board design — so cross-flashing usually doesn’t work and rollback isn’t guaranteed if something goes wrong. If you’re happy with 100GbE, leave it alone; if you really need IB/VPI, selling this and buying a proper retail VPI card will save you a lot of pain.
 

Legionnaire

New Member
Dec 18, 2022
4
4
3
Thank you for the replies!
I would try to dump/backup the current firmware/configuration, then try to crossflash it to "MCX565M-CDAI" firmware and see what happens.
Yes, thanks for the confirmation, that's what I'm going to do this weekend. If things go south, hopefully I'll have a rollback plan.

On the PCIe add-on cards there are usually pins (or pads) that can be connected via jumpers to put the card into a "flash not present" mode to recover the firmware; I'm not sure if these also exist on the OCP versions. (You could post detailed pictures of the card :D)
Yes, good point as well. The NIC is presently installed and working in a lab server that is scheduled for maintenance on Saturday or Sunday, so I'll try to take pictures then.

Is there any reason you want infiniband?
Not really, no particular reason other than curiosity. Basically I'd like to see what all that "nanosecond-level latency" fuss is about. I do realise that I'd be locking myself into a brand-specific technology, and that brand is not (and never has been) known for its generosity or low prices. I also realise that I'd need either a switch (e.g. SX6036) with a subnet manager installed, or a node with OpenSM installed. If I buy a 6036 with the GW license and dislike InfiniBand, I could always use it as a fast 40/56GbE-capable switch.

What I'm very unsure about is whether I need transceivers that are specifically IB-capable, whether I can reuse my existing ones as they are, or whether I should re-flash the existing ones to add IB support (and whether such a re-flash is possible or feasible). The units I have are not listed as supporting IB. Any input on that?


This is an engineering sample CX-5 OCP3 card, and with those the safest rule is basically “if it works, don’t touch it.” Flashing it to a production firmware or trying to turn an EN card into VPI is very risky and often ends in a brick, especially since the PSID isn’t public and OCP3 boards are more customized than normal PCIe cards. Even though the ASIC is the same, InfiniBand support isn’t just firmware — it also depends on the board design — so cross-flashing usually doesn’t work and rollback isn’t guaranteed if something goes wrong. If you’re happy with 100GbE, leave it alone; if you really need IB/VPI, selling this and buying a proper retail VPI card will save you a lot of pain.
It does work, but the logs show that it has a habit of dropping and re-establishing the link cyclically, and sometimes it drops the link for good and requires a transceiver unplug and replug to restore it. Of course there are the usual suspects:
  • overheating chip (checked, not the case),
  • overheating transceiver on either side (checked, not the case),
  • dying transceiver on either side (being checked presently),
  • cracked or damaged fiber or fiber ends (being checked presently),
  • dust or dirt in the transceiver-fiber interface (checked, not the case, cleaned for good measure anyway).
Basically I'm running out of explanations for the link-downs, so I decided to see whether a FW update improves link stability. The release notes for FW releases contain a list of supported optical modules, and the list expands with new releases, so I thought my module might simply be unsupported by the ES firmware. That's the main reason I'd like to try a reflash in the first place.
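To compare link stability before and after a reflash, it helps to put numbers on the flapping. Below is a minimal Python sketch that extracts mlx5_core "Link up" / "Link down" messages from a kernel log and separates short flaps from drops that never recover. The sample lines, the interface name, the PCI address and the timestamp-first layout are all invented for illustration; adapt the pattern to whatever your logs actually look like.

```python
import re
from datetime import datetime

# Assumed layout: "<ISO timestamp> <host> kernel: mlx5_core <pci> <iface>: Link up|down"
EVENT_RE = re.compile(r"^(\S+)\s.*mlx5_core \S+ (\S+): Link (up|down)\s*$")

def link_events(log_text):
    """Return (timestamp, interface, state) tuples for every link message."""
    events = []
    for line in log_text.splitlines():
        m = EVENT_RE.match(line)
        if m:
            events.append((datetime.fromisoformat(m.group(1)), m.group(2), m.group(3)))
    return events

def outages(events):
    """Return (down_time, seconds_down) pairs; seconds_down is None if the link never came back."""
    result = []
    down_at = None
    for ts, _iface, state in events:
        if state == "down" and down_at is None:
            down_at = ts
        elif state == "up" and down_at is not None:
            result.append((down_at, (ts - down_at).total_seconds()))
            down_at = None
    if down_at is not None:
        result.append((down_at, None))
    return result

# Invented sample: two short flaps, then a drop that stays down.
sample = """\
2025-12-06T10:00:00 lab kernel: mlx5_core 0000:3b:00.0 ens1np0: Link down
2025-12-06T10:00:04 lab kernel: mlx5_core 0000:3b:00.0 ens1np0: Link up
2025-12-06T10:07:30 lab kernel: mlx5_core 0000:3b:00.0 ens1np0: Link down
2025-12-06T10:07:33 lab kernel: mlx5_core 0000:3b:00.0 ens1np0: Link up
2025-12-06T11:15:00 lab kernel: mlx5_core 0000:3b:00.0 ens1np0: Link down
"""

for down_time, seconds in outages(link_events(sample)):
    print(down_time.isoformat(), "never recovered" if seconds is None else f"{seconds:.0f}s")
```

Running the same script over logs from before and after the update shows whether the flap count or the outage lengths actually changed.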

I'd happily buy a production specimen of an IB-capable, OCP3-shaped NIC if those were available at a remotely affordable price. As it happens, the earliest NICs that match the criteria are ConnectX-6 models, like the MCX653435A or MCX653436A, and neither is cheap (or I don't know where to look).

Why do you say it's risky? I was under the impression that MCX cards are fairly forgiving when it comes to flashing, in the sense that if a flash goes wrong, there are usually ways to re-flash a previously dumped FW image and bring the NIC back to life. Do you have any experience of bricking an ES card that you'd like to share?

Anyway, thanks for the input; risks noted. I'll weigh them against the potential benefits.
 

klui

༺༻
Feb 3, 2019
1,036
616
113
You can't treat ES parts as feature-stable or feature-complete. I have personal experience with ES NICs that did not support certain features, and they behaved the same even after flashing the latest production firmware. They were built with the stepping available at that time, and unless you are working with that vendor, the affected feature(s) might be related to some corner case that you may or may not care about. The non-production cards you want are QS, not ES.