Crossflash Oracle CX556A to Mellanox OEM?

im.thatoneguy

Member
Oct 28, 2020
31
8
8
Has anyone cross flashed a CX556A to Mellanox OEM firmware? I tried:

flint -d mt4121_pciconf0 -i fw-ConnectX5-rel-16_31_1014-MCX556A-EDA_Ax_Bx-UEFI-14.24.13-FlexBoot-3.6.403.bin -allow_psid_change burn

but got:

-E- Burning FS4 image failed: Changing PSID is unsupported under controlled FW. You can try to run again with the flag "--no_fw_ctrl".

Is it safe to just go ahead and burn? Or do I need to prep ini files or something first? I tried to back up the firmware but it's encrypted so it won't let me download it.
 

chicken-of-the-cave

New Member
Mar 13, 2020
18
8
3
I am in the same boat as you. I also have Oracle CX556 cards from eBay, and I found this post while searching.
One observation: the PSIDs of Mellanox and Oracle-branded Mellanox cards differ (Mellanox OEM PSIDs start with MT_, whereas the Oracle cards' PSIDs start with ORC000...).
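
To make the prefix observation concrete, here is a minimal shell sketch that classifies a PSID by how it starts. The function name psid_vendor is made up for illustration, and it only knows the two prefixes mentioned in this thread (MT_ and ORC); any other vendor's PSID is reported as unknown.

```shell
# Classify a PSID by its prefix, using only the two prefixes seen in this thread.
# psid_vendor is a hypothetical helper name, not part of the Mellanox tools.
psid_vendor() {
    case "$1" in
        MT_*) echo "mellanox" ;;
        ORC*) echo "oracle" ;;
        *)    echo "unknown" ;;
    esac
}

psid_vendor "ORC0000000003"   # prints: oracle
psid_vendor "MT_0000000009"   # prints: mellanox
```

The two example PSIDs are the ones that show up in the flint output further down the thread.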

Have you made any progress since? If anything comes up on my end, I'll let you guys know.
 

im.thatoneguy

No progress here.

And my Supermicro server won't boot with it installed so I'm very eager to flash it and see if that fixes it.
 

chicken-of-the-cave

I have an Asus server and am running into the same issue. I was able to boot it in another server, but the 'dmesg' output in Linux shows weird errors here and there.
I guess more research is needed on my end before I attempt a YOLO.
 

chicken-of-the-cave

Figured it out.

Poked around, and the tidbit below from the NVIDIA/Mellanox documentation helped explain what was going on. (I thought I could sign the Mellanox firmware and work with the 'secure-fw' flag instead, but realized that the keys are not recoverable. That said, keys should never be recoverable, otherwise it defeats the security.)

Screen Shot 2021-11-20 at 3.38.47 PM.png

So the CX-5s do have a recovery jumper on them ("JP2" / "FNP", a.k.a. "flash-not-present"). I shorted it, was able to boot Linux, and saw that the card was in flash recovery mode. Below are the JP2/FNP pins on the Oracle-branded CX556A (I installed the green jumper to force the card into image recovery mode):

IMG_3746.jpg

lspci
*snip*
02:00.0 Memory controller: Mellanox Technologies MT28800 Family [ConnectX-5 Flash Recovery]
*snip*
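
For anyone scripting this, a rough sketch of checking lspci text for the recovery-mode string shown above. The function name in_flash_recovery is hypothetical, and the pattern is an assumption based only on the single lspci line quoted in this post; other cards may report a different description.

```shell
# Return success if the given lspci text contains a Mellanox device
# whose description mentions "Flash Recovery" (as in the line quoted above).
# in_flash_recovery is a hypothetical helper name.
in_flash_recovery() {
    printf '%s\n' "$1" | grep -q 'Mellanox.*Flash Recovery'
}

sample='02:00.0 Memory controller: Mellanox Technologies MT28800 Family [ConnectX-5 Flash Recovery]'
if in_flash_recovery "$sample"; then
    echo "card is in flash recovery mode"
fi
```

On a real system the call would be in_flash_recovery "$(lspci)" with the jumper installed.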
Then I flashed it with the latest firmware from the NVIDIA/Mellanox website:

flint -d /dev/mst/mt525_pciconf0 -i fw-ConnectX5-rel-16_31_1014-MCX556A-EDA_Ax_Bx-UEFI-14.24.13-FlexBoot-3.6.403.bin -allow_psid_change burn
Done.
Current FW version on flash: 16.23.1020
New FW version: 16.31.1014


You are about to replace current PSID on flash - "ORC0000000003" with a different PSID - "MT_0000000009".
Note: It is highly recommended not to change the PSID.

Do you want to continue ? (y/n) [n] : y
Burning FW image without signatures - OK
Burning FW image without signatures - OK

-W- Failed to update FW boot address. Power cycle the device in order to load the new FW.

Restoring signature - OK
-I- To load new FW run reboot machine.
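
As a sanity aid before burning, here is a small dry-run sketch that only prints the flint command and adds -allow_psid_change when the PSID on the card differs from the PSID of the image. build_burn_cmd and the file name fw.bin are placeholders made up for this example; the flint arguments themselves are the ones used in the command above.

```shell
# Print (do not run) a flint burn command.
# -allow_psid_change is added only when the current and new PSIDs differ.
# Arguments: device, image file, current PSID, PSID of the new image.
# build_burn_cmd is a hypothetical helper; fw.bin is a placeholder file name.
build_burn_cmd() {
    dev=$1
    img=$2
    cur_psid=$3
    new_psid=$4
    if [ "$cur_psid" != "$new_psid" ]; then
        echo "flint -d $dev -i $img -allow_psid_change burn"
    else
        echo "flint -d $dev -i $img burn"
    fi
}

build_burn_cmd /dev/mst/mt525_pciconf0 fw.bin ORC0000000003 MT_0000000009
```

Printing the command first makes it easy to double-check the device path, which differs between recovery mode (mt525_pciconf0 here) and normal mode (mt4121_pciconf0).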
Shut down the computer, removed the jumper, re-installed the card, and turned it back on.
This time, the Mellanox NIC was detected as an InfiniBand card, so I forced both ports to be Ethernet-only:

mlxconfig -d /dev/mst/mt4121_pciconf0 set LINK_TYPE_P1=2 LINK_TYPE_P2=2

Device #1:
----------

Device type: ConnectX5
Name: MCX556A-EDA_Ax_Bx
Description: ConnectX-5 Ex VPI adapter card; EDR IB (100Gb/s) and 100GbE; dual-port QSFP28; PCIe4.0 x16; tall bracket; ROHS R6
Device: /dev/mst/mt4121_pciconf0

Configurations:      Next Boot    New
LINK_TYPE_P1         IB(1)        ETH(2)
LINK_TYPE_P2         IB(1)        ETH(2)

Apply new Configuration? (y/n) [n] : y
Applying... Done!
-I- Please reboot machine to load new configurations.
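
The IB(1) / ETH(2) values in that output can be wrapped in a small helper so the numbers do not have to be remembered. This is only a sketch: link_type_value and build_link_type_cmd are made-up names, the mapping is taken from the mlxconfig output above, and the command is printed rather than executed.

```shell
# Map a port type name to the LINK_TYPE value shown in the mlxconfig output above.
# Both function names here are hypothetical helpers.
link_type_value() {
    case "$1" in
        IB|ib)   echo 1 ;;
        ETH|eth) echo 2 ;;
        *)       echo "unsupported link type: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) the mlxconfig command for a two-port card.
# Arguments: device, type for port 1, type for port 2.
build_link_type_cmd() {
    dev=$1
    p1=$(link_type_value "$2") || return 1
    p2=$(link_type_value "$3") || return 1
    echo "mlxconfig -d $dev set LINK_TYPE_P1=$p1 LINK_TYPE_P2=$p2"
}

build_link_type_cmd /dev/mst/mt4121_pciconf0 ETH ETH
```

With ETH for both ports this prints the same command that was run above.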
Rebooted again, and now I am good to go!

Image type:            FS4
FW Version:            16.31.1014
FW Release Date:       30.6.2021
Product Version:       16.31.1014
Rom Info:              type=UEFI version=14.24.13 cpu=AMD64
                       type=PXE version=3.6.403 cpu=AMD64
Description:           UID                GuidsNumber
Base GUID:             1c34da030071b83a   8
Base MAC:              1c34da71b83a       8
Image VSD:             N/A
Device VSD:            N/A
PSID:                  MT_0000000009
Security Attributes:   N/A
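
To confirm the result from a script, a minimal sketch that pulls a single field out of query output like the listing above. query_field is a hypothetical helper name, and the sample text is a shortened copy of that listing; the function assumes each field sits on one line in "Name: value" form.

```shell
# Print the value of one "Name: value" field from query output read on stdin.
# Only the first matching line is used. query_field is a hypothetical helper.
query_field() {
    awk -v key="$1" 'index($0, key ":") == 1 { sub(/^[^:]*:[ \t]*/, ""); print; exit }'
}

# Shortened copy of the query output shown above.
sample='Image type: FS4
FW Version: 16.31.1014
PSID: MT_0000000009
Security Attributes: N/A'

printf '%s\n' "$sample" | query_field "PSID"         # prints: MT_0000000009
printf '%s\n' "$sample" | query_field "FW Version"   # prints: 16.31.1014
```

Checking PSID and Security Attributes after the reflash shows whether the card now carries the Mellanox PSID and whether secure-fw is still reported.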

Lots of forum posts and DuckDuckGo'ing helped; however, the most helpful was this one:

 

chicken-of-the-cave

Ah, that's good to know.
Because HPE cards do not seem to have the jumper:
View attachment 20540
You are right, there are no pins, but it does have the "FNP / JP2" solder points.
IMHO, that's basically your way forward. You would need to short the solder points marked "FNP / JP2" somehow to force your HPE-branded card into image recovery mode. Once you have reflashed, remove the short to use the card normally.
Fortunately for me, my card (Oracle-branded) had those pins available.
 

jpmomo

Active Member
Aug 12, 2018
531
192
43
There don't seem to be any solder points, only 2 holes for the FNP. See the attachment for detail. Is there any way to effectively jump the FNP on my card?
thanks,
jp
 

oneplane

Well-Known Member
Jul 23, 2021
844
484
63
You can put a paper clip (a bare metal one), some bent wire, the tips of a pair of tweezers, or even solder through the holes. It's just a strap to Vss or Vdd, and I imagine it's read when the ASIC comes out of reset (but keep it shorted the entire time to be sure).
 

jpmomo

The one issue I ran into is that when I booted with the pins jumped, it showed flash recovery mode like you mentioned. However, when I tried to flash or even check the status, no device was found.

I am using the Mellanox tools in VMware via esxcli.

I removed the jumper and then rebooted. Luckily everything went back to normal. The "Security Attributes: secure-fw" line was still there.

After you booted the server with the jumper on the NIC, were you able to run the mst status command and see any devices? If so, do you think this might just be an issue with using the Mellanox tools in VMware?

Thanks for any suggestions.
 

chicken-of-the-cave

I would retry the steps with a Linux live image (e.g., booting Ubuntu Server from a USB stick). VMware ESX has a very limited CLI that is likely meant for troubleshooting ESX rather than for use as a general Linux distro.
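
Once booted into the live image, a quick check like the sketch below shows whether the tools mentioned in this thread (mst, flint, mlxconfig) are actually on PATH before going any further. missing_tools is a made-up helper name, and the check assumes the Mellanox firmware tools have already been installed in the live environment.

```shell
# Print the names of any listed commands that are not found on PATH.
# missing_tools is a hypothetical helper name.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools mst flint mlxconfig)
if [ -n "$missing" ]; then
    echo "not found on PATH:" $missing
else
    echo "mst, flint and mlxconfig are all available"
fi
```

If all three are present, running mst start followed by mst status should list the device nodes under /dev/mst.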
 

jpmomo

Thanks. I am booting CentOS 8 now and will see if the card is recognized after booting with the jumper.
 

jpmomo

Getting closer with Linux, but still not quite there. Maybe I am missing something with the Mellanox tools under Linux.

1653000791732.png
 

jpmomo

Thanks for the suggestion. I did do that; in fact, it was necessary to issue that command on this Linux system. The VMware system would already have it started unless no NIC was recognized. I was wondering if there is something else I need to start besides mst.
 

jpmomo

Thanks for the link. Can you give me a high-level idea of what that build of MFT might help with? E.g., would I use it instead of the Mellanox tools to see if it permits allow_psid_change? Or is it meant to help the flint command see the card right after booting into recovery mode?

If that is the case and you think it might be worth a try, can you give me some brief steps to properly build the new MFT?
E.g., cut and paste the script from your PKGBUILD link,
then when do I issue makepkg -Ccfi?

Thanks again for trying to figure this out. The purpose of this effort is to be able to burn new/different firmware onto physically similar cards: converting a PCIe Gen3 card to Gen4, a 50G NIC to a 100G NIC, or a 100G NIC to a 200G NIC. We are able to do this in some instances, but secure_fw on the newer cards is making that difficult. It is a little bit like AMD EPYC and vendor locking!