Mellanox flashing ConnectX-5 to ConnectX-5 Ex ? (PCIe 3.0 to PCIe 4.0 ?)


jpmomo

Well-Known Member
Aug 12, 2018
594
258
63
At a high level, QSFP28 equates to 100G, which can use 25G lanes with NRZ or 50G lanes with PAM4. The NIC needs to support the associated signaling. QSFP56 equates to 200G and is flexible in how it can be connected. QSFP112 is 400G. Some of the CX7 NICs are dual-port QSFP112 but only support 200G per port. I am able to use a QSFP56 DAC to connect the two ports back to back, and they link up at 200G.
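
The lane arithmetic behind those figures can be sketched as follows. The per-lane rates are nominal values (25G NRZ, 50G and 100G PAM4), and the helper name `port_gbps` is purely illustrative.

```shell
#!/bin/sh
# Nominal port speed = number of electrical lanes x per-lane rate (Gb/s).
# port_gbps is a hypothetical helper, not part of any Mellanox tool.
port_gbps() {
    lanes=$1
    lane_rate=$2
    echo $((lanes * lane_rate))
}

echo "QSFP28  4 x 25G  NRZ  = $(port_gbps 4 25)G"
echo "100G    2 x 50G  PAM4 = $(port_gbps 2 50)G"
echo "QSFP56  4 x 50G  PAM4 = $(port_gbps 4 50)G"
echo "QSFP112 4 x 100G PAM4 = $(port_gbps 4 100)G"
```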
 

vincococka

Member
Sep 29, 2019
80
44
18
Slovakia
Did anyone try cross-flashing a MCX512A-ACAT or MCX512A-ACUT to MCX512A-ADAT? Would be nice with a somewhat reasonably priced 10/25 Gbps NIC with PCI-E 4.0. :)

One PCB image I saw had a 1 instead of a 0 in the PCB number though so perhaps they are too different.
1. I can confirm that the crossflash MCX512A-ACU (PCIe 3.0) -> MCX512A-ADA (PCIe 4.0) works.
I crossflashed 2 cards and they seem to work OK at 25 Gbps on both ports.

Commands used (Windows and Linux; on Linux, do not forget "mst start" first, and the reset tool is named mlxfwreset, without .exe):
Code:
$ mst status
$ flint -d mt4119_pciconf0 -i fw-ConnectX5-rel-16_35_4506-MCX512A-ADA_Ax_Bx-UEFI-14.29.15-FlexBoot-3.6.902.bin   -allow_psid_change   burn
$ mlxfwreset.exe -d mt4119_pciconf0 reset

2. I also crossflashed 3 cards, MCX515A-CCA (PCIe 3.0) -> MCX516A-CDA (PCIe 4.0), and all three seem to work OK at 100 Gbps.

Commands used (Windows and Linux; on Linux, do not forget "mst start" first):
Code:
$ mst status
$ flint -d mt4119_pciconf0 -i fw-ConnectX5-rel-16_35_4506-MCX516A-CDA_Ax_Bx-UEFI-14.29.15-FlexBoot-3.6.902.bin   -allow_psid_change   burn
$ mlxfwreset.exe -d mt4119_pciconf0 reset
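
A minimal dry-run sketch of the sequence described in this post, assuming a Linux host. It only prints the commands so they can be reviewed first. The function name `crossflash_plan` and the backup file name are hypothetical, and the `ri` (read image) backup step is an addition that is not in the post and may not succeed on every card.

```shell
#!/bin/sh
# Dry run: prints the commands instead of executing them.
# crossflash_plan and backup-original.bin are hypothetical names.
# The burn and reset lines mirror the commands quoted in the post;
# the query and "ri" backup lines are extra precautions added here.
crossflash_plan() {
    dev=$1
    image=$2
    backup=${3:-backup-original.bin}
    echo "mst start"
    echo "flint -d $dev query full"
    echo "flint -d $dev ri $backup"
    echo "flint -d $dev -i $image -allow_psid_change burn"
    echo "mlxfwreset -d $dev reset"
}

crossflash_plan mt4119_pciconf0 \
    fw-ConnectX5-rel-16_35_4506-MCX512A-ADA_Ax_Bx-UEFI-14.29.15-FlexBoot-3.6.902.bin
```

Check the device name against the output of `mst status` before running any of the printed commands by hand.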
 

blunden

Well-Known Member
Nov 29, 2019
1,183
418
83
Thank you! :) Sounds like it's likely to work then, even though we can't know for sure yet.
 

vincococka

You flashed a single port card with a dual port firmware? Or am I mixing up the product codes?
According to the ConnectX-5 EN product brief (page 4, table 1), there is only one PCIe 4.0 x16 card available, which is why I decided to go that way.
Port 2 is seen by the OS (Windows Server / Linux / FreeBSD), but I do not have to use it :).
cx5-pcie4.png
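
One way to check whether a cross-flashed card really negotiates PCIe 4.0 is to read the link speed that lspci reports in its LnkSta line. The sketch below assumes the usual lspci speed strings (2.5GT/s, 5GT/s, 8GT/s, 16GT/s); the helper names `pcie_gen` and `lnksta_speed` are made up for illustration.

```shell
#!/bin/sh
# Map a PCIe link speed string (as printed by lspci) to a generation number.
pcie_gen() {
    case $1 in
        2.5GT/s) echo 1 ;;
        5GT/s)   echo 2 ;;
        8GT/s)   echo 3 ;;
        16GT/s)  echo 4 ;;
        *)       echo unknown ;;
    esac
}

# Read `lspci -vv` output on stdin and print the speed from the first LnkSta line.
lnksta_speed() {
    sed -n 's|.*LnkSta:[[:space:]]*Speed \([0-9.]*GT/s\).*|\1|p' | head -n 1
}

# Demo on a sample line; on real hardware, pipe in `lspci -vv -s <bus:dev.fn>`.
speed=$(printf '\t\tLnkSta: Speed 16GT/s, Width x16\n' | lnksta_speed)
echo "link speed $speed = PCIe gen $(pcie_gen "$speed")"
```

The slot and CPU must also support PCIe 4.0, otherwise the link trains at a lower speed regardless of the firmware.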
 

shpitz461

Active Member
Sep 29, 2017
188
25
28
52
Hi everyone, having trouble cross-flashing my ConnectX-5 HPE card.

HPE 874251-001
HPE Eth 100Gb 1p 842QSFP28 Adptr
PCIe GEN3 x16 100Gb 22W
Single port
PSID HPE0000000014

Code:
 sudo flint_oem -d /dev/mst/mt4119_pciconf0 -override_cache_replacement hw set Flash0.WriteProtected=Disabled
Results in:
-W- Firmware flash cache access is enabled. Running in this mode may cause the firmware to hang.
-E- Failed to open Device: MFE_NO_FLASH_DETECTED
For reference, the mst status output and the card's query output:
MST modules:
------------
MST PCI module is not loaded
MST PCI configuration module loaded

MST devices:
------------
/dev/mst/mt4119_pciconf0 - PCI configuration cycles access.
domain:bus:dev.fn=0000:82:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
Chip revision is: 00

Image type: FS4
FW Version: 16.35.4506
FW Release Date: 22.12.2024
Part Number: 874253-B21_Ax
Description: HPE Ethernet 100Gb 1-port 842QSFP28 Adapter
Product Version: 16.35.4506
Rom Info: type=UEFI version=14.29.15 cpu=AMD64
type=PXE version=3.6.902 cpu=AMD64
Description: UID GuidsNumber
Base GUID: 88e9axxxxxxxxxx 4
Base MAC: 88e9xxxxxxxxxxx 4
Image VSD: N/A
Device VSD: N/A
PSID: HPE0000000014
Security Attributes: secure-fw
Default Update Method: fw_ctrl
-E- Failed to open Device: MFE_NO_FLASH_DETECTED
Trying to unlock the firmware with the regular flint:

Code:
sudo flint -d /dev/mst/mt4119_pciconf0 -override_cache_replacement hw set Flash0.WriteProtected=Disabled
Results in:
-W- Firmware flash cache access is enabled. Running in this mode may cause the firmware to hang.
-E- Failed to open Device: MFE_NO_FLASH_DETECTED
Any idea how I can unlock the card for flashing?
 

i386

Well-Known Member
Mar 18, 2016
4,901
1,931
113
36
Germany
Under Linux I had a lot of trouble with the Mellanox tools; disabling Secure Boot helped. My 25GbE HPE CX-5 wouldn't crossflash until I shorted the FNP (flash not present) connectors ._.
 

shpitz461

Can you elaborate on that? Shorted what, and how?
 

shpitz461

Figured out the pins to short on the card.
Ran the crossflash script, which installed the MCX515A-CCA firmware on it.
The device name changed from '/dev/mst/mt4119_pciconf0' to '/dev/mst/mt525_pciconf0'
I used flint to query the card, this is what I get:
Code:
flint -d /dev/mst/mt525_pciconf0 query full
Image type:            FS4
FW Version:            16.35.8002
FW Release Date:       13.8.2025
MIC Version:           2.0.0
PRS Name:              cx5_MCX515A_1p_x16.prs
Part Number:           MCX515A-CCA_Ax_Bx
Description:           ConnectX-5 EN network interface card; 100GbE single-port QSFP28; PCIe3.0 x16; tall bracket; ROHS R6
Product Version:       rel-16_35_8002
Rom Info:              type=UEFI version=14.29.15 cpu=AMD64
                       type=PXE version=3.6.902 cpu=AMD64
Description:           UID                GuidsNumber
Base GUID:             88e9xxxxxxxxxxxx        4
Base MAC:              88e9xxxxxxxx            4
Image VSD:             N/A
Device VSD:            N/A
PSID:                  MT_0000000011
Orig PSID:             HPE0000000014
Security Attributes:   N/A
Default Update Method: fw_ctrl
HP firmware update no longer recognizes the card.

As soon as I remove the jumper from the card and reboot, the card is no longer detected. This is running on Proxmox 9.

It's only recognized while the FNP jumper is connected.

Also, the lspci output is very short and shows an odd link speed:

Code:
lspci -vvvnn -s 81:00.0
81:00.0 Memory controller [0580]: Mellanox Technologies MT28800 Family [ConnectX-5 Flash Recovery] [15b3:020d]
        Subsystem: Mellanox Technologies MT28800 Family [ConnectX-5 Flash Recovery] [15b3:020d]
        Physical Slot: 7
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        NUMA node: 1
        IOMMU group: 1
        Capabilities: [40] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [60] Express (v1) Endpoint, IntMsgNum 0
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 <32us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset- SlotPowerLimit 0W TEE-IO-
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x16, ASPM L0s, Exit Latency L0s unlimited
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x16
                        TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-

Any idea how to bring it back to life?
 

dsrhdev

Member
May 28, 2024
36
9
8
81:00.0 Memory controller [0580]: Mellanox Technologies MT28800 Family [ConnectX-5 Flash Recovery] [15b3:020d]
Hello,
it looks like your card is still in flash-recovery mode. Have you tried pulling it from the motherboard for a few seconds and reinserting it?
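
The check being made here can be sketched as a small filter over `lspci -nn` output. It assumes the recovery device shows up as in the quoted line, with the "Flash Recovery" label and the ID 15b3:020d; the function name `in_flash_recovery` is hypothetical.

```shell
#!/bin/sh
# Reads `lspci -nn` output on stdin and succeeds if a device identifies
# itself as being in flash recovery. The 15b3:020d ID and the
# "Flash Recovery" label are taken from the lspci line quoted above.
in_flash_recovery() {
    grep -q -e '\[15b3:020d\]' -e 'Flash Recovery'
}

sample='81:00.0 Memory controller [0580]: Mellanox Technologies MT28800 Family [ConnectX-5 Flash Recovery] [15b3:020d]'
if printf '%s\n' "$sample" | in_flash_recovery; then
    echo "card is in flash recovery mode"
fi
```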
 

shpitz461

Hi, it is in flash recovery because I still have the FNP jumper connected. When I turn off the server, remove the FNP jumper, and boot it back up, the card is nowhere to be found: it is not listed in lspci and not found by the mst utility.
 

dsrhdev

Just try removing it from the server for a few seconds (with the jumper also removed). The problem is the standby voltage in the PCIe slot.
 

dsrhdev

Can you please reflash it back to HPE0000000014 and run this again:
Code:
flint -d /dev/mst/mt525_pciconf0 query full
It probably had "Security Attributes: secure-fw", like:
Code:
⬢[root@t-n5-w ~]# mstflint -d mlx5_0 q
Image type:            FS4
FW Version:            16.35.4030
FW Release Date:       27.6.2024
Product Version:       16.35.4030
Rom Info:              type=UEFI version=14.29.15 cpu=AMD64
                       type=PXE version=3.6.902 cpu=AMD64
Description:           UID                GuidsNumber
Base GUID:             b8cef60300f26962        8
Base MAC:              b8cef6f26962            8
Image VSD:             N/A
Device VSD:            N/A
PSID:                  HPE0000000009
Security Attributes:   secure-fw
In this case the card will not be initialized by itself during BIOS/EFI boot.
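
A minimal sketch for pulling the "Security Attributes" field out of saved flint or mstflint query output, assuming the "Name: value" layout shown above. The helper names `query_field` and `is_secure_fw` are hypothetical.

```shell
#!/bin/sh
# Print the value of one "Name: value" field from query output on stdin.
# $1 is the field name without the trailing colon, e.g. "PSID".
query_field() {
    sed -n "s/^$1:[[:space:]]*//p" | head -n 1
}

# Succeeds if the Security Attributes field mentions secure-fw.
is_secure_fw() {
    case "$(query_field 'Security Attributes')" in
        *secure-fw*) return 0 ;;
        *)           return 1 ;;
    esac
}

sample='PSID:                  HPE0000000009
Security Attributes:   secure-fw'
if printf '%s\n' "$sample" | is_secure_fw; then
    echo "secure-fw is set"
fi
```

On real hardware the input would come from something like `mstflint -d <device> q`, as in the output above.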
 

shpitz461

It had the secure-fw attribute, but I removed it by using the FNP jumper and running:
Code:
sudo flint_oem -d /dev/mst/mt4119_pciconf0 -override_cache_replacement hw set Flash0.WriteProtected=Disabled
I have the FNP jumper connected now, and this is what I get:

Code:
root@proxmox:~# mst status
MST modules:
------------
    MST PCI module is not loaded
    MST PCI configuration module loaded

MST devices:
------------
/dev/mst/mt525_pciconf0          - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:81:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
                                   Chip revision is: 00


root@proxmox:~# flint -d /dev/mst/mt525_pciconf0 query full
Image type:            FS4
FW Version:            16.35.8002
FW Release Date:       13.8.2025
MIC Version:           2.0.0
PRS Name:              cx5_MCX515A_1p_x16.prs
Part Number:           MCX515A-CCA_Ax_Bx
Description:           ConnectX-5 EN network interface card; 100GbE single-port QSFP28; PCIe3.0 x16; tall bracket; ROHS R6
Product Version:       rel-16_35_8002
Rom Info:              type=UEFI version=14.29.15 cpu=AMD64
                       type=PXE version=3.6.902 cpu=AMD64
Description:           UID                GuidsNumber
Base GUID:             88e9axxxxxxxxxxx        4
Orig Base GUID:        N/A                     4
Base MAC:              88e9axxxxxxx            4
Orig Base MAC:         N/A                     4
Image VSD:             N/A
Device VSD:            N/A
PSID:                  MT_0000000011
Security Attributes:   N/A
Default Update Method: fw_ctrl
 

shpitz461

And when I try to flash HPE firmware, this is what I get:

Code:
root@proxmox:~/usr/lib/x86_64-linux-gnu/firmware-nic-mellanox-ethernet-only-1.0.23-1.1# ./setup

##################################################################################################
HPE Mellanox InfiniBand Online Firmware Upgrade Utility for Linux
Copyright (c) 2011 Hewlett-Packard Enterprise Development Company, L.P.
##################################################################################################

EFI variables are not supported on this system
SecureBoot is disabled.
List of Network Adapters detected on the Server.................
[0] 0000:07:00.0 8086
[1] 0000:08:00.0 8086

If PSID or FW_Version is not found for some interfaces, please check /tmp/data5ZkMSw

Interface 0000:07:00.0 is not Mellanox one.
Interface 0000:08:00.0 is not Mellanox one.

NIC firmware update did not complete.  Check log for errors.
The log file, /tmp/data5ZkMSw, is empty.
 

dsrhdev

Code:
root@proxmox:~/usr/lib/x86_64-linux-gnu/firmware-nic-mellanox-ethernet-only-1.0.23-1.1#
Hello,
just find the HPE (Mellanox OEM) firmware in this folder and burn it in livefish mode (FNP jumper shorted). It should be named like:
Code:
fw-ConnectX5-rel-*signed.bin
Code:
root@master:/tmp# mstflint -i fw-ConnectX5-rel-16_35_4030-872726-B21_Ax_Bx-UEFI-14.29.15-FlexBoot-3.6.902.signed.bin q
Image type:            FS4
FW Version:            16.35.4030
FW Release Date:       27.6.2024
Product Version:       rel-16_35_4030
Rom Info:              type=UEFI version=14.29.15 cpu=AMD64
                       type=PXE version=3.6.902 cpu=AMD64
Description:           UID                GuidsNumber
Base GUID:             N/A                     8
Base MAC:              N/A                     8
Image VSD:             N/A
Device VSD:            N/A
PSID:                  HPE0000000009
Security Attributes:   secure-fw
Security Ver:          0
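
Before burning, the PSID of the image can be compared with the PSID currently on the card. This is a sketch that works on saved query output, assuming the "PSID:" line format shown above; `psid_of` and `psid_differs` are hypothetical names.

```shell
#!/bin/sh
# Extract the PSID value from flint/mstflint query output on stdin.
# Anchored at line start, so "Orig PSID:" lines are ignored.
psid_of() {
    sed -n 's/^PSID:[[:space:]]*\([^[:space:]]*\).*/\1/p' | head -n 1
}

# $1 = query output of the card, $2 = query output of the image file.
# Succeeds only if both PSIDs were found and they are different.
psid_differs() {
    dev_psid=$(printf '%s\n' "$1" | psid_of)
    img_psid=$(printf '%s\n' "$2" | psid_of)
    [ -n "$dev_psid" ] && [ -n "$img_psid" ] && [ "$dev_psid" != "$img_psid" ]
}

dev_query='PSID:                  MT_0000000011'
img_query='PSID:                  HPE0000000014'
if psid_differs "$dev_query" "$img_query"; then
    echo "PSIDs differ: the burn will need -allow_psid_change"
fi
```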
 

shpitz461

Yep, thanks, was able to bring it back to life:

Code:
flint -allow_psid_change -d /dev/mst/mt525_pciconf0 -i fw-ConnectX5-rel-16_35_4506-874253-B21_Ax-UEFI-14.29.15-FlexBoot-3.6.902.signed.bin burn
Done.
Current FW version on flash: 16.35.8002
New FW version: 16.35.4506

Note: The new FW version is older than the current FW version on flash.

Do you want to continue ? (y/n) [n] : y


You are about to replace current PSID on flash - "MT_0000000011" with a different PSID - "HPE0000000014".
Note: It is highly recommended not to change the PSID.

Do you want to continue ? (y/n) [n] : y
Burning FW image without signatures - OK
Burning FW image without signatures - OK

-W- Failed to update FW boot address. Power cycle the device in order to load the new FW.

Restoring signature - OK
-I- To load new FW, issue system-level reset.
Card is now alive and well.

I think I made a dumb mistake: I flashed the wrong Mellanox firmware.
I assumed the card was an MCX515A.

I looked at the back of my HPE card and it shows MSIP-REM-MLN-CX556A printed on the board.

I'm going to attempt flashing the 556A firmware and see if that also bricks the card.
 