Mellanox SX6018/SX6036 Patched Firmware for High Power Modules (LR4)


llowrey

Active Member
Feb 26, 2018
Unlike the SX6012, the SX6018 and SX6036 only support high-power modules in certain ports. I'm sure Mellanox had good reasons for the limits, possibly power delivery or heat dissipation. Homelab users like us, however, may not be pushing things to the limits Mellanox was worried about. With this in mind, I have modified the firmware to enable high power on all ports.

Many users here, myself included, have been running this patched firmware for years with no reported ill effects. Nobody has reported bricking their switch or burning their house down, but you should still consider this risky.

You'll need to be able to get to a shell prompt via the _shell command in order to run flint.

Here's a zip with the firmware for both the SX6018 and the SX6036:


The flint command for the SX6036 would be something like:

Code:
flint --override_cache_replacement --allow_psid_change -d /dev/mst/mt51000_pciconf0 -i ./MT1010310020.bin b
The -d /dev/mst/mt51000_pciconf0 argument might be different for the SX6036 but it's the same for the SX6012 and SX6018.
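If you want to script the flash, here is a minimal Python sketch that assembles the same flint invocation. The helper name build_flint_burn_cmd is hypothetical, the flags are copied verbatim from the command above, and the sketch only prints the command; it never runs flint. Because --allow_psid_change lets flint burn an image whose PSID differs from the one on the device, it is worth checking the PSID yourself before running the real thing.

```python
import shlex

# Device path from the post; it may differ on your model.
DEFAULT_DEVICE = "/dev/mst/mt51000_pciconf0"


def build_flint_burn_cmd(image_path: str, device: str = DEFAULT_DEVICE) -> list[str]:
    """Hypothetical helper: return the flint argv used above to burn an image.

    It only builds the argument list; nothing is executed here.
    """
    return [
        "flint",
        "--override_cache_replacement",  # flag copied from the post
        "--allow_psid_change",           # accept an image whose PSID differs from the device's
        "-d", device,                    # MST device to burn
        "-i", image_path,                # patched firmware image
        "b",                             # flint's "burn" command
    ]


if __name__ == "__main__":
    print(shlex.join(build_flint_burn_cmd("./MT1010310020.bin")))
```

Pass a different device= if your /dev/mst entry is not mt51000_pciconf0.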

If you haven't already followed (or at least read) the SX60XX conversion doc maintained by @dodgy route, I strongly suggest starting there to get familiar with the firmware flashing process.

This only addresses the per-port power limits; the fae cable-stamping-unlock 40g_lr4 command is still needed to enable LR4 transceivers.

Good luck!
 

NablaSquaredG

Bringing 100G switches to homelabs
Aug 17, 2020
One note:
Code:
MT_1010310020  MSX6036F-xxxS_Ax                 SwitchX-2 based FDR InfiniBand Switch; 36 QSFP; Managed
MT_1240212020  MSX6018F-xxxS_Ax                 SwitchX-2 based FDR InfiniBand Switch; 18 QSFP; Managed
Those are all SwitchX-2.

SwitchX and SwitchX-2 versions exist with nearly identical part numbers (MSX6036F-1SFR -> SwitchX, MSX6036F-1SFS -> SwitchX-2). The last character indicates the revision: R = SwitchX, S = SwitchX-2.

I have not yet tested what happens when you flash SwitchX Firmware to a SwitchX-2 ASIC.
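The suffix rule above is simple enough to put in a few lines of Python. This is a sketch under the stated assumption that it only applies to orderable part numbers like MSX6036F-1SFR; firmware-table strings such as MSX6036F_B1 or MSX6036F-xxxS_Ax don't end in R or S and come back as "unknown". The names REV_TO_ASIC and asic_from_part_number are hypothetical. MSX6036F-2SRS is the part number from a chassis inventory later in this thread.

```python
# Revision letter -> ASIC family, per the rule in the post above.
REV_TO_ASIC = {"R": "SwitchX", "S": "SwitchX-2"}


def asic_from_part_number(part_number: str) -> str:
    """Guess the ASIC from the last character of a part number like MSX6036F-1SFS.

    Returns "unknown" for anything that doesn't end in R or S.
    """
    rev = part_number.strip().upper()[-1:]
    return REV_TO_ASIC.get(rev, "unknown")


if __name__ == "__main__":
    for pn in ("MSX6036F-1SFR", "MSX6036F-1SFS", "MSX6036F-2SRS"):
        print(pn, "->", asic_from_part_number(pn))
```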

llowrey said: "I'm sure Mellanox had good reasons for the limits, possibly power delivery or heat dissipation."
I suppose it's market segmentation. They had the MetroX line for long haul (which also featured full SwitchX ASICs, lol)

Code:
MT_1250110001  MTX6100_Ax                       MetroX 10km long haul FDR10 InfiniBand switch; 6 long haul QSFP+ ports; Managed
MT_1550110001  MTX6000_Ax                       MetroX 1km long haul FDR10 InfiniBand switch; 16 long-haul and 16 downlink QSFP ports; Managed
MT_1710110017  MTX6280_Ax                       MetroX 80km WDM lossless long haul 40GbE system; 1 long haul QSFP+ port; Managed
MT_1720110017  MTX6240_Ax                       MetroX 40km WDM lossless long haul 40GbE system; 2 long haul QSFP+ port; Managed
According to the schematics I have:

Each group of 4 ports should be limited to 4.25A @ 3.3V -> 14W of transceiver power.
Total transceiver power consumption is limited to 40A @ 3.3V -> 132W of transceivers.
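Those two limits can be turned into a quick sanity check for a planned set of optics. This is a minimal Python sketch, assuming ports are grouped consecutively (1-4, 5-8, ...); the actual grouping on the board isn't stated here and may differ. Per-module wattages are whatever you read off your transceivers' datasheets, and the function name check_power_budget is hypothetical.

```python
# Limits quoted from the schematics in the post above.
SUPPLY_V = 3.3          # transceiver supply rail, volts
GROUP_LIMIT_A = 4.25    # per group of 4 ports, amps
TOTAL_LIMIT_A = 40.0    # all transceivers combined, amps
PORTS_PER_GROUP = 4


def check_power_budget(module_watts: list[float]) -> list[str]:
    """Return human-readable budget violations for per-port module draw (watts).

    ASSUMPTION: ports are grouped consecutively (1-4, 5-8, ...); the real
    grouping on the board may differ.
    """
    group_limit_w = GROUP_LIMIT_A * SUPPLY_V  # about 14.025 W
    total_limit_w = TOTAL_LIMIT_A * SUPPLY_V  # 132 W
    problems = []
    for start in range(0, len(module_watts), PORTS_PER_GROUP):
        group = module_watts[start:start + PORTS_PER_GROUP]
        draw = sum(group)
        if draw > group_limit_w:
            problems.append(
                f"ports {start + 1}-{start + len(group)}: {draw:.2f} W > {group_limit_w:.3f} W"
            )
    total = sum(module_watts)
    if total > total_limit_w:
        problems.append(f"total: {total:.2f} W > {total_limit_w:.1f} W")
    return problems


if __name__ == "__main__":
    # Hypothetical load: four 3.5 W modules in the first group, other ports empty.
    ports = [3.5] * 4 + [0.0] * 32
    print(check_power_budget(ports) or "within budget")
```

With these numbers, four hypothetical 3.5 W modules (14 W) only just fit in one group, and 36 of them (126 W) stay under the 132 W total; 3.7 W modules would exceed every group limit and the total.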
 

BoGs

Active Member
Feb 18, 2019
@llowrey I'd love some input here. I got another SX6036 to play around with MLAG, and when I tried to flash it using the same steps, I got a kernel panic.

Code:
-W- Firmware flash cache access is enabled. Running in this mode may cause the firmware to hang.

    Current FW version on flash:  9.4.5110
    New FW version:               9.4.5110

    Note: The new FW version is the same as the current FW version on flash.

 Do you want to continue ? (y/n) [n] : y

Burning FS2 FW imagOops: Machine check, sig: 7 [#1]
PREEMPT mlnx460ex
Modules linked in: af_packet mst_pciconf(O) mst_pci(O) oops_dump_reg(O) sx_pra(O) int_hndl_main(O) act_police cls_u32 sch_ingress ip6table_filter ip6_tables iptable_filter ip_tables x_tables ib_ipoib(O) ib_cm(O) ib_sa(O) inet_lro ib_uverbs(O) ib_umad(O) sx_ib(O) ib_mad(O) ib_core(O) ib_addr(O) ipv6 compat(O) sx_bfd(O) sx_netdev(O) sx_core(O) 8021q switchx(O) mellaggra_mod(O) cpld_handler(O) watchdog(O) sx_glue_if(O) i2c_mux_pca954x i2c_mux ehci_hcd usbcore lm87 hwmon_vid hwmon usb_common unix
CPU: 0 PID: 5940 Comm: flint_oem Tainted: G           O 3.10.94-MELLANOXuni-m460ex PPC_M460EX
task: ef9475a0 ti: efff6000 task.ti: ec2f0000
NIP: c0003328 LR: c00032c4 CTR: 00000000
REGS: efff7f10 TRAP: 0214   Tainted: G           O  (3.10.94-MELLANOXuni-m460ex)
MSR: 00021000 <CE,ME>  CR: 84242448  XER: 00000000

GPR00: c00032c4 ec2f1d50 ef9475a0 0000001e ef804900 efff2000 00000004 01400001
GPR08: 00000000 c03ad008 00010000 ec2f1d40 ef9477a0 10754e48 00000000 00000000
GPR16: 100e7dc0 100e83c0 100e8558 1074d24d bf9249b8 bf9248b8 bf924778 fffffff0
GPR24: 0003041d 104fb178 ec2f0040 00000000 00000000 0000001e ec2f0000 efff2000
NIP [c0003328] do_IRQ+0xb0/0x118
LR [c00032c4] do_IRQ+0x4c/0x118
Call Trace(ef9475a0): name=flint_oem, state=0
[ec2f1d50] [c00032c4] do_IRQ+0x4c/0x118 (unreliable)
[ec2f1d80] [c000b538] ret_from_except+0x0/0x18
--- Exception: 501 at pci_bus_write_config_dword+0x68/0x94
    LR = pci_bus_write_config_dword+0x60/0x94
[ec2f1e60] [fc632680] ioctl.isra.4+0x208/0x48c [mst_pciconf]
[ec2f1ea0] [c00bd818] vfs_ioctl+0x38/0x58
[ec2f1eb0] [c00be3ec] do_vfs_ioctl+0x59c/0x6d8
[ec2f1f10] [c00be568] SyS_ioctl+0x40/0x68
[ec2f1f40] [c000ae74] ret_from_syscall+0x0/0x3c
--- Exception: c01 at 0xfe00638
    LR = 0xfe97320
Instruction dump:
815e0000 3b800000 83420204 7fa3eb78 915f0000 7fe5fb78 939f003c 815e000c
5548042e 815f000c 554a061e 7d0a5378 <915f000c> 395f0040 91420204 80c4002c
---[ end trace 031f1d683bb93694 ]---

Kernel panic - not syncing: Fatal exception in interrupt
The flash stops at around 70% and the switch automatically restarts. I'm flashing from the console, so it isn't a network issue. After the reboot everything works as far as I can tell, and I don't get the warning on my LR optics. Could this mean it was already flashed?
 

BoGs

Active Member
Feb 18, 2019
Everything seems like it should work:

Code:
CHASSIS     MSX6036F-2SRS     MTxxxxx     N/A     A1     xxxx     -
MGMT     MSX6036F-2SRS     MTxxxxx     2     A1     -     9.4.5110
FAN     MSX60-FR     MTxxxxx     N/A     A1     -     -
PS1     MSX60-PR     MTxxxxx     N/A     A1     -     -
PS2     MSX60-PR     MTxxxxx     N/A     A1     -     -
Code:
flint -i MT_1010310020_HP.bin q
Image type:          FS2
FW Version:          9.4.5110
FW Release Date:     12.2.2019
Device ID:           51000
Description:         Node             Sys image
GUIDs:               0000000000000000 0000000000000000
Description:         Base             Switch
MACs:                    000000000000     000000000000
VSD:                 n/a
PSID:                MT_1010310020
Code:
flint --override_cache_replacement -d /dev/mst/mt51000_pciconf0 q

-W- Firmware flash cache access is enabled. Running in this mode may cause the firmware to hang.
Image type:          FS2
FW Version:          9.4.5110
FW Release Date:     12.2.2019
Device ID:           51000
Description:         Node             Sys image
GUIDs:               xxxx xxxx
Description:         Base             Switch
MACs:                    xxxx     xxxx
VSD:                 n/a
PSID:                MT_1010310020
(GUIDs, MACs, and serial numbers removed)
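Comparing the two query outputs by eye works, but it's easy to script. A minimal Python sketch, assuming flint's query output keeps the "Key:   value" layout shown above; the helper names parse_flint_query and same_psid are hypothetical. A matching PSID is only a basic sanity check, not proof the image is safe for that board: in this thread both switches reported MT_1010310020 and the flash still failed on one of them.

```python
import re

# "Key:   value" lines as printed by `flint ... q`; warnings like "-W- ..." don't match.
_FIELD = re.compile(r"^([A-Za-z][A-Za-z ]*?):\s+(\S.*?)\s*$")


def parse_flint_query(text: str) -> dict[str, str]:
    """Collect the fields of a flint query into a dict (repeated keys keep the last value)."""
    fields = {}
    for line in text.splitlines():
        m = _FIELD.match(line)
        if m:
            fields[m.group(1)] = m.group(2)
    return fields


def same_psid(image_query: str, device_query: str) -> bool:
    """True if the image and the device report the same, non-empty PSID."""
    image_psid = parse_flint_query(image_query).get("PSID")
    return bool(image_psid) and image_psid == parse_flint_query(device_query).get("PSID")


if __name__ == "__main__":
    image = "Image type:          FS2\nFW Version:          9.4.5110\nPSID:                MT_1010310020\n"
    device = ("-W- Firmware flash cache access is enabled.\n"
              "FW Version:          9.4.5110\nPSID:                MT_1010310020\n")
    print(same_psid(image, device))  # True
```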
 

BoGs

Active Member
Feb 18, 2019
Found out that the new switch I got has a SwitchX-2 ASIC rather than the original SwitchX, and the firmware does not work for those. As mentioned above, in MSX6036F-xxxS_Ax the S designates SwitchX-2.
 

llowrey

Active Member
Feb 26, 2018
This is what exists in the MLX-OS firmware:
Code:
IBM1530310031  00AE064_00AE068_Ax  Mellanox SX6036G FDR14 IB/40GbE Gateway
IBM1010110020  00W0004_00W0008_A1  Mellanox SX6036 FDR14 InfiniBand Switch
IBM1010210020  00W0004_00W0008_Bx  Mellanox SX6036 FDR14 InfiniBand Switch
IBM1010310020  00WT030_00WT031_Ax  Mellanox SX6036 FDR14 InfiniBand Switch
IBM1010110029  90Y3767_90Y3777_A1  Mellanox SX6036 QDR/FDR10 InfiniBand Switch
IBM1010210029  90Y3767_90Y3777_Bx  Mellanox SX6036 QDR/FDR10 InfiniBand Switch
MT_1010210020  MSX6036F_B1         SwitchX based FDR InfiniBand Switch; 36 QSFP; Managed
MT_1010310020  MSX6036F-xxxS_Ax    SwitchX-2 based FDR InfiniBand Switch; 36 QSFP; Managed
MT_1530310031  MSX6036G-xxxS_Ax    SwitchX-2 based 36-port QSFP 56GbE Managed InfiniBand to Ethernet gateway system
MT_1010110029  MSX6036T_A1         SwitchX based FDR-10 InfiniBand Switch; 36 QSFP; Managed
MT_1010210029  MSX6036T_B1         SwitchX based FDR-10 InfiniBand Switch; 36 QSFP; Managed
MT_1010310029  MSX6036T-xxxS_Ax    SwitchX-2 based FDR-10 InfiniBand Switch; 36 QSFP; Managed
My high-power mod is for this one:
Code:
MT_1010310020  MSX6036F-xxxS_Ax    SwitchX-2 based FDR InfiniBand Switch; 36 QSFP; Managed
What is the PSID of the firmware you currently have loaded?
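For a quick lookup, here is a Python sketch that transcribes only the Mellanox-branded (MT_) SX6036 rows from the MLX-OS listing above; the IBM rows are left out. The names SX6036_PSIDS, HP_MOD_PSID and describe_psid are hypothetical. It just reports whether a given PSID is the one the posted high-power image was built for; for example, MT_1010310029 (the FDR-10 MSX6036T) is a different PSID.

```python
# Mellanox-branded SX6036 PSIDs from the MLX-OS listing: PSID -> (part number, ASIC).
SX6036_PSIDS = {
    "MT_1010210020": ("MSX6036F_B1", "SwitchX"),
    "MT_1010310020": ("MSX6036F-xxxS_Ax", "SwitchX-2"),
    "MT_1530310031": ("MSX6036G-xxxS_Ax", "SwitchX-2"),
    "MT_1010110029": ("MSX6036T_A1", "SwitchX"),
    "MT_1010210029": ("MSX6036T_B1", "SwitchX"),
    "MT_1010310029": ("MSX6036T-xxxS_Ax", "SwitchX-2"),
}

# The SX6036 PSID the high-power image in this thread targets.
HP_MOD_PSID = "MT_1010310020"


def describe_psid(psid: str) -> str:
    """Summarize a PSID: part number, ASIC, and whether it matches the high-power image."""
    part, asic = SX6036_PSIDS.get(psid, ("unknown part", "unknown ASIC"))
    note = "matches the high-power image" if psid == HP_MOD_PSID else "NOT the high-power image's PSID"
    return f"{psid}: {part} ({asic}), {note}"


if __name__ == "__main__":
    for psid in ("MT_1010310020", "MT_1010310029"):
        print(describe_psid(psid))
```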
 

naptastic

New Member
Jan 27, 2023
Funny story. It's relevant, I promise.

Years ago, I was an amateur DJ. I had five speakers that I knew could handle 100 Watts each. I bought a home theater amplifier with five 100W channels. Played 2/3 of a show with it before the amplifier turned itself off with a puff of magic smoke. One subwoofer was completely destroyed (the glue holding the voice coil together melted) and the amplifier itself wouldn't power on anymore.

All five output transistors could handle 100W each, yes, but the input transformer was only rated for 250 Volt-Amps.

The moral of the story is this: DO NOT OVERBURDEN YOUR INPUT STAGES. Things will break.
 

BoGs

Active Member
Feb 18, 2019
llowrey said: "What is the PSID of the firmware you currently have loaded?"

On my original switch, which I flashed successfully:

Code:
flint --override_cache_replacement -d /dev/mst/mt51000_pciconf0 q

-W- Firmware flash cache access is enabled. Running in this mode may cause the firmware to hang.
Image type:          FS2
FW Version:          9.4.5110
FW Release Date:     12.2.2019
Device ID:           51000
Description:         Node             Sys image
GUIDs:               xxx xxx
Description:         Base             Switch
MACs:                    xxx     xxx
VSD:                 n/a
PSID:                MT_1010310020
And on the switch I just got, where the flash fails with the error above:

Code:
flint --override_cache_replacement -d /dev/mst/mt51000_pciconf0 q

-W- Firmware flash cache access is enabled. Running in this mode may cause the firmware to hang.
Image type:          FS2
FW Version:          9.4.5110
FW Release Date:     12.2.2019
Device ID:           51000
Description:         Node             Sys image
GUIDs:               xxx xxx
Description:         Base             Switch
MACs:                    xxx     xxx
VSD:                 n/a
PSID:                MT_1010310020
To my naked eye it's exactly the same. I also plugged LR4 optics into port 36 on the new switch, which technically shouldn't be able to handle LR optics. I've been reading about the SC6720, and it's SwitchX-2; maybe that's why, since it's one of the newer switches? To be honest, I don't know.
 

PeralChen

New Member
Jan 14, 2025
I have an SX6036 switch with PSID MT_1010310029. I tried dumping MT_pciconf0.ini and patching it with module_power_level_supported=5, but it doesn't work after I reflash the bin file.
Here's a zip file with the ini and bin. Could you help me figure out what I'm missing?
 


llowrey

Active Member
Feb 26, 2018
Your firmware only modifies the embedded ini content, which is not actually used; it's there for informational purposes only. The second configuration section contains the data that actually configures the ports.

I finally found my old Java code for patching the firmware, rewrote it in Rust (why not), and pushed it to GitHub here:


I'll get around to publishing release binaries on GitHub at some point, but it's easy enough to build yourself.
 

PeralChen

New Member
Jan 14, 2025
Thanks, it works!
 