Beware of EMC switches sold as Mellanox SX6XXX on eBay


NablaSquaredG

Bringing 100G switches to homelabs
Aug 17, 2020
1,743
1,161
113
I read about patching the hardware in this thread but all of it was so secret,
It's very questionable / experimental, and I don't even remember exactly how I did it.

wondered if the downside is burning down your house lol.
Probably not, seemed more like product segmentation.
Each group of 4 ports should be limited to 4.25A @ 3.3V -> 14W of transceiver power.
Total transceiver power consumption is limited to 40A @ 3.3V -> 132W of transceivers.
 

BoGs

Active Member
Feb 18, 2019
152
37
28
It's very questionable / experimental, and I don't even remember exactly how I did it.


Probably not, seemed more like product segmentation.
Each group of 4 ports should be limited to 4.25A @ 3.3V -> 14W of transceiver power.
Total transceiver power consumption is limited to 40A @ 3.3V -> 132W of transceivers.
That would be more than enough. I have a bunch of Arista PLRL4 optics that I got for a good price. They seem to use 3.5 W each, and I have them split on ports 4 and 5, so ports 1 and 6 are disabled. I could easily stay within 14 W for each bank of 4 (1-4, 5-8, etc.).

40G QSFP+ AOC, BIDI, UNIV, LRL4, LR4, PLR4, PLRL4 and ER4: 3.5 W
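The arithmetic above can be sketched as a quick budget check. This is only an illustration: the limits are the figures quoted in this thread (4.25 A per group of 4 ports and 40 A total, both at 3.3 V), not datasheet values, and the function name and port layout are made up for the example.

```python
# Limits as quoted in the thread; treat them as reported values, not a datasheet.
RAIL_VOLTAGE = 3.3          # volts
GROUP_CURRENT_LIMIT = 4.25  # amps per group of 4 ports
TOTAL_CURRENT_LIMIT = 40.0  # amps for all transceivers combined
GROUP_SIZE = 4


def check_power_budget(port_watts):
    """Return (group_overloads, total_watts, total_ok) for a {port: watts} map."""
    group_limit = GROUP_CURRENT_LIMIT * RAIL_VOLTAGE  # about 14 W
    total_limit = TOTAL_CURRENT_LIMIT * RAIL_VOLTAGE  # 132 W
    groups = {}
    for port, watts in port_watts.items():
        # Ports 1-4 form group 0, ports 5-8 form group 1, and so on.
        group = (port - 1) // GROUP_SIZE
        groups[group] = groups.get(group, 0.0) + watts
    overloads = {g: w for g, w in groups.items() if w > group_limit}
    total = sum(port_watts.values())
    return overloads, total, total <= total_limit


# Hypothetical layout: 3.5 W optics in ports 4 and 5, ports 1 and 6 disabled.
overloads, total, total_ok = check_power_budget({4: 3.5, 5: 3.5})
print(overloads, total, total_ok)
```

Even a fully populated bank of four 3.5 W optics comes to 14.0 W, which is just inside the quoted per-group figure.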
 

nbritton

Member
Nov 19, 2016
30
20
8
46
WARNING to all SX6720 owners:

When you upgrade to 3.10, it will do a BIOS update. This BIOS update messes up the I²C bus. When the I²C bus is messed up, it will get stuck at "Initializing modules, this may take a few minutes" when you try to login.
This isn't true. MLNX-OS 3.10.4006 boots fine on my SX6720, and it does have the correct Intel Ivy Bridge BIOS firmware version and SwitchX firmware packages. The issue you ran into is simply that the 4.19 kernel is missing the Board Support Package (BSP) kernel modules for the SwitchX ASIC platform, as shown below; these missing modules cause the mgmtd service to fail. I speculate you just need the SwitchX SDK to rebuild the kernel modules for version 4.19.

Code:
[admin@switch-85643e ~]# lsblk /dev/sda{5,6}
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda5   8:5    0   2G  0 part /
sda6   8:6    0   2G  0 part /mnt
[admin@switch-85643e ~]# ll /mnt/opt/tms/bin/{bios_ivb*.rom,fw-SX*.mfa}
-rwxr-xr-x 1  2310  999 3145728 Sep 16  2018 /mnt/opt/tms/bin/bios_ivb.rom
-rwxrwxrwx 1 admin root 1255736 Nov  9  2022 /mnt/opt/tms/bin/fw-SX-rel-9_4_5090-FIT.mfa
[admin@switch-85643e ~]# ll /opt/tms/bin/{bios_ivb*.rom,fw-SX*.mfa}
-rw------- 1  2310  999 3145728 Jun  3  2018 /opt/tms/bin/bios_ivb.rom
-rwxrwxrwx 1 admin root 1254436 Feb 22  2019 /opt/tms/bin/fw-SX-rel-9_4_5110-FIT.mfa
[admin@switch-85643e ~]# ll /mnt/lib/modules/4.19.72-300.el7MELLANOXsmp-x86_64/extra/bsp_tools_kernel_level/
total 4
drwxr-xr-x 2 admin root 4096 Nov  9  2022 amifldrv_mod
[admin@switch-85643e ~]# ll /lib/modules/3.10.0-327.36.3.el7MELLANOXsmp-x86_64/extra/bsp_tools_kernel_level/
total 40
drwxr-xr-x 2 admin root 4096 Feb 13  2019 amifldrv_mod
drwxr-xr-x 2 admin root 4096 Feb 13  2019 cpld_handler
drwxr-xr-x 2 admin root 4096 Feb 13  2019 cpld_jtag
drwxr-xr-x 2 admin root 4096 Feb 13  2019 cpld_mux
drwxr-xr-x 2 admin root 4096 Feb 13  2019 gpio_pch
drwxr-xr-x 2 admin root 4096 Feb 13  2019 i2c_switchx
drwxr-xr-x 2 admin root 4096 Feb 13  2019 lpc_i2c
drwxr-xr-x 2 admin root 4096 Feb 13  2019 mellaggra
drwxr-xr-x 2 admin root 4096 Feb 13  2019 sx_glue_if
drwxr-xr-x 2 admin root 4096 Feb 13  2019 watchdog
[admin@switch-85643e ~]# ll /lib/modules/3.10.0-327.36.3.el7MELLANOXsmp-x86_64/extra/bsp_tools_kernel_level/i2c_switchx/
total 740
-rw-r--r-- 1 admin root 749684 Feb 13  2019 switchx.ko
[admin@switch-85643e ~]# strings /lib/modules/3.10.0-327.36.3.el7MELLANOXsmp-x86_64/extra/bsp_tools_kernel_level/i2c_switchx/switchx.ko | egrep "GPL|@mellanox.com"
license=GPL v2
author=Vadim Pasternak <vadimp@mellanox.com>
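The comparison in the listing above can also be done mechanically. A minimal sketch, assuming Python 3 is available on the box (not confirmed in this thread) and that the two bsp_tools_kernel_level paths are the ones shown above; the function name is made up for illustration.

```python
import os


def missing_bsp_modules(old_bsp_dir, new_bsp_dir):
    """Return subdirectory names present in old_bsp_dir but absent from new_bsp_dir."""
    def subdirs(path):
        return {entry.name for entry in os.scandir(path) if entry.is_dir()}
    return sorted(subdirs(old_bsp_dir) - subdirs(new_bsp_dir))


if __name__ == "__main__":
    # Paths copied from the listing above: 3.10 kernel tree vs. 4.19 kernel tree.
    old = "/lib/modules/3.10.0-327.36.3.el7MELLANOXsmp-x86_64/extra/bsp_tools_kernel_level"
    new = "/mnt/lib/modules/4.19.72-300.el7MELLANOXsmp-x86_64/extra/bsp_tools_kernel_level"
    # Only run the comparison when both trees actually exist on this machine.
    if os.path.isdir(old) and os.path.isdir(new):
        for name in missing_bsp_modules(old, new):
            print(name)
```

Going by the listing above, this should print every module directory except amifldrv_mod, the only one present in both trees.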
 


NablaSquaredG

This isn't true. MLNX-OS 3.10.4006 boots fine on my SX6720, and it does have the correct Intel Ivy Bridge BIOS firmware version and SwitchX firmware packages.
So why wasn’t the switch able to find the ASIC AFTER a downgrade of MLNX-OS?
Only after a manual downgrade of the BIOS was an older MLNX-OS able to boot properly again.

Anyway, it does not change the situation: 3.10 does not work.
 

nbritton

So why wasn’t the switch able to find the ASIC AFTER a downgrade of MLNX-OS?
Heck if I know, but I can switch between 3.6.8012 and 3.10.4006 on mine without any issues, other than mgmtd failing badly on 3.10.4006 because of the missing kernel modules. You can get into the BIOS on the SX6720 with Ctrl+B; the password is admin. You can also boot from a USB flash drive by hitting F1 at bootup.

I installed Ubuntu 20.04 on my SX6720 today. I just plugged in a USB stick with Ubuntu Server 20.04.6 and appended "console=tty0 console=ttyS0,115200n8" at the GRUB boot prompt, then did an ordinary EFI-based install followed by a Mellanox OFED v4.9-7.1.0.0 install. I also swapped out the stock 16GB mSATA drive for a Micron 128GB mSATA drive and upgraded the RAM to 8GB with a Micron MT18KSF1G72HZ-1G6E2ZE module.

I'm getting a few errors with mlxsw_switchx2: "EMAD reg access failed". Is there a different PSID I can use? Mine is IBM branded, and I wonder if the IBM1840111020 PSID is messing things up.

Code:
root@sx6720:~# uname -a
Linux sx6720 5.4.0-205-generic #225-Ubuntu SMP Fri Jan 10 22:23:35 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
root@sx6720:~# lsb_release -a
No LSB modules are available.
Distributor ID:    Ubuntu
Description:    Ubuntu 20.04.6 LTS
Release:    20.04
Codename:    focal
root@sx6720:~# lsmod | grep mlx
mlx5_fpga_tools        20480  0
mlx5_ib               401408  0
ib_uverbs             135168  3 rdma_ucm,mlx5_ib,ib_ucm
mlx5_core            1216512  2 mlx5_fpga_tools,mlx5_ib
tls                    73728  1 mlx5_core
mlxfw                  24576  1 mlx5_core
mlx4_en               143360  0
mlx4_ib               229376  0
ib_core               335872  10 rdma_cm,ib_ipoib,mlx4_ib,iw_cm,ib_umad,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm,ib_ucm
mlx4_core             352256  2 mlx4_ib,mlx4_en
mlx_compat             65536  15 rdma_cm,ib_ipoib,mlx4_core,mlx4_ib,iw_cm,mlx5_fpga_tools,ib_umad,mlx4_en,ib_core,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm,mlx5_core,ib_ucm
mlxsw_switchx2         61440  0
mlxsw_pci              65536  1 mlxsw_switchx2
mlxsw_core            143360  2 mlxsw_pci,mlxsw_switchx2
root@sx6720:~# dmesg | grep mlx
[    5.033060] mlxsw_switchx2 0000:03:00.0: enabling device (0000 -> 0002)
[   49.486579] mlxsw_switchx2 0000:03:00.0: EMAD reg access failed (tid=2ee96a70000001a3,reg_id=9100(mgpir),type=query,status=4(register not supported))
[   50.343156] mlxsw_switchx2 0000:03:00.0: cannot register bus device
[   50.349655] mlxsw_switchx2: probe of 0000:03:00.0 failed with error -5
[  472.741952] mlxsw_switchx2 0000:03:00.0 enp3s0np20: renamed from eth0
[  472.796994] mlxsw_switchx2 0000:03:00.0 enp3s0np19: renamed from eth1
[  472.956255] mlxsw_switchx2 0000:03:00.0 enp3s0np22: renamed from eth3
[  472.986736] mlxsw_switchx2 0000:03:00.0 enp3s0np24: renamed from eth2
[  473.192333] mlxsw_switchx2 0000:03:00.0 enp3s0np31: renamed from eth9
[  473.256355] mlxsw_switchx2 0000:03:00.0 enp3s0np25: renamed from eth5
[  473.279683] mlxsw_switchx2 0000:03:00.0 enp3s0np34: renamed from eth10
[  473.297957] mlxsw_switchx2 0000:03:00.0 enp3s0np32: renamed from eth8
[  473.352920] mlxsw_switchx2 0000:03:00.0 enp3s0np26: renamed from eth4
[  473.365110] mlxsw_switchx2 0000:03:00.0 enp3s0np30: renamed from eth11
[  473.407831] mlxsw_switchx2 0000:03:00.0 enp3s0np33: renamed from eth7
[  473.422628] mlxsw_switchx2 0000:03:00.0 enp3s0np27: renamed from eth6
[  473.452637] mlxsw_switchx2 0000:03:00.0 enp3s0np36: renamed from eth2
[  473.850010] mlxsw_switchx2 0000:03:00.0 enp3s0np21: renamed from eth0
[  474.003532] mlxsw_switchx2 0000:03:00.0: EMAD reg access failed (tid=6bceaf86000001e5,reg_id=9100(mgpir),type=query,status=4(register not supported))
[  474.050586] mlxsw_switchx2 0000:03:00.0 enp3s0np23: renamed from eth1
[  474.143222] mlxsw_switchx2 0000:03:00.0 enp3s0np29: renamed from eth9
[  474.193224] mlxsw_switchx2 0000:03:00.0 enp3s0np16: renamed from eth10
[  474.211328] mlxsw_switchx2 0000:03:00.0 enp3s0np12: renamed from eth11
[  474.232444] mlxsw_switchx2 0000:03:00.0 enp3s0np18: renamed from eth4
[  474.255480] mlxsw_switchx2 0000:03:00.0 enp3s0np28: renamed from eth5
[  474.374372] mlxsw_switchx2 0000:03:00.0 enp3s0np13: renamed from eth8
[  474.396250] mlxsw_switchx2 0000:03:00.0 enp3s0np14: renamed from eth6
[  474.423398] mlxsw_switchx2 0000:03:00.0 enp3s0np35: renamed from eth3
[  474.447781] mlxsw_switchx2 0000:03:00.0 enp3s0np17: renamed from eth7
[  474.491625] mlxsw_switchx2 0000:03:00.0 enp3s0np11: renamed from eth12
[  475.032133] mlxsw_switchx2 0000:03:00.0: cannot register bus device
[  475.044499] mlxsw_switchx2: probe of 0000:03:00.0 failed with error -5
root@sx6720:~# mstflint -override_cache_replacement -d 0000:03:00.0 q

-W- Firmware flash cache access is enabled. Running in this mode may cause the firmware to hang.
Image type:            FS2
FW Version:            9.4.5110
FW Release Date:       12.2.2019
Device ID:             51000
Description:           Node             Sys image
GUIDs:                 98039b0300fb7cc0 98039b0300fb7cc0
Description:           Base             Switch
MACs:                      98039bfb7cc0     98039bfb7d20
VSD:                   n/a
PSID:                  IBM1840111020
root@sx6720:~# lspci -nn | grep -i mellanox
03:00.0 InfiniBand [0c06]: Mellanox Technologies MT51136 [15b3:c738] (rev 02)
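To pull the failing register out of a long dmesg dump, something like the following can help. It is only a sketch: the regular expression is written against the exact message format shown in the log above, and the function name is made up for illustration.

```python
import re

# Pattern matched against the message format seen in the dmesg output above.
EMAD_RE = re.compile(
    r"EMAD reg access failed \(tid=(?P<tid>[0-9a-f]+),"
    r"reg_id=(?P<reg_id>[0-9a-f]+)\((?P<reg_name>\w+)\),"
    r"type=(?P<type>\w+),"
    r"status=(?P<status>\d+)\((?P<status_text>[^)]*)\)\)"
)


def summarize_emad_failures(dmesg_text):
    """Count EMAD failures per (register name, access type, status text)."""
    counts = {}
    for match in EMAD_RE.finditer(dmesg_text):
        key = (match["reg_name"], match["type"], match["status_text"])
        counts[key] = counts.get(key, 0) + 1
    return counts


# The two failure lines copied from the log above.
sample = (
    "[   49.486579] mlxsw_switchx2 0000:03:00.0: EMAD reg access failed "
    "(tid=2ee96a70000001a3,reg_id=9100(mgpir),type=query,status=4(register not supported))\n"
    "[  474.003532] mlxsw_switchx2 0000:03:00.0: EMAD reg access failed "
    "(tid=6bceaf86000001e5,reg_id=9100(mgpir),type=query,status=4(register not supported))\n"
)
summary = summarize_emad_failures(sample)
print(summary)
```

On the log above, both probe attempts fail on the same register query (mgpir, "register not supported"), so the driver gives up at the same point each time.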

sx6720_bios_screenshot.png
 

NablaSquaredG

Is there a different PSID I can use? Mine is IBM branded, and I wonder if the IBM1840111020 PSID is messing things up.
Sure, just use the open source tools to extract any backplane firmware you want from the firmware mfa file and flash it over the existing one.

Sadly, open source tooling for SwitchX is not great. Spectrum is much better.

Oh, BTW, there are leaked copies of old versions of the Mellanox SDK for SwitchX and Spectrum somewhere, in case you want to dig really deep.

But SwitchX has too many limitations imo, like no ACLs on SVIs or only fixed buffers per port (not good for Ethernet / RoCE)
 

BoGs

What temperature are you all running at? I'm running the custom fan script with one PSU at 25% fan speed, which sits at around 6k RPM per fan, and the CPU is around 30-60 °C. Not sure if that is fine, high, or whatever. Ambient is 19 °C.

Code:
---------------------------------------------------------
Module      Component              Reg  CurTemp    Status
                                        (Celsius)
---------------------------------------------------------
MGMT        SX                     T1   32.00      OK
MGMT        QSFP_TEMP1             T1   23.00      OK
MGMT        QSFP_TEMP2             T1   21.50      OK
MGMT        QSFP_TEMP3             T1   22.00      OK
MGMT        BOARD_MONITOR          T1   27.00      OK
MGMT        CPU_BOARD_MONITOR      T1   28.00      OK
MGMT        CPU_BOARD_MONITOR      T2   55.00      OK

-----------------------------------------------------
Module          Device          Fan  Speed     Status
                                     (RPM)
-----------------------------------------------------
FAN             FAN             F1   5580.00   OK
FAN             FAN             F2   5910.00   OK
FAN             FAN             F3   5730.00   OK
FAN             FAN             F4   5910.00   OK
PS1             FAN             F1   5190.00   OK
PS2             FAN             -    -         NOT PRESENT
 

BoGs

The CPU has a very... questionable heatsink. You can try to replace it
Any recommendations, if you happen to know of any? I assume replacing the heatsink is the same as with a regular CPU: take it off, clean with a 99.9% alcohol wipe, apply paste, and put it back on.
 

BoGs

I am a little stumped and I would love some ideas.

I have a DCS-7050SX2-72Q that I want to connect to an SX6036, but when I do so over a QSFP DAC on certain ports, the link on the Mellanox comes up while the Arista side does not, even after I forced 40gfull.

I changed the spanning tree mode on the Mellanox to MST to match the Arista.

Code:
show interface ethernet 51/1 status
Port       Name   Status       Vlan     Duplex Speed  Type         Flags Encapsulation
Et51/1            notconnect   trunk    full   40G    40GBASE-CR4

show interface ethernet 51/1
Ethernet51/1 is down, line protocol is down (notconnect)
  Hardware is Ethernet, address is <snip>
  Ethernet MTU 9214 bytes, BW 40000000 kbit
  Full-duplex, 40Gb/s, auto negotiation: off, uni-link: n/a
  Down 22 days, 23 hours, 9 minutes, 29 seconds
  Loopback Mode : None
  0 link status changes since last clear
  Last clearing of "show interface" counters 22 days, 19:29:20 ago
  5 minutes input rate 0 bps (0.0% with framing overhead), 0 packets/sec
  5 minutes output rate 0 bps (0.0% with framing overhead), 0 packets/sec
     0 packets input, 0 bytes
     Received 0 broadcasts, 0 multicast
     0 runts, 0 giants
     0 input errors, 0 CRC, 0 alignment, 0 symbol, 0 input discards
     0 PAUSE input
     0 packets output, 0 bytes
     Sent 0 broadcasts, 0 multicast
     0 output errors, 0 collisions
     0 late collision, 0 deferred, 0 output discards
     0 PAUSE output
I tried Mellanox ports 29-36 and they cannot uplink to the Arista, but if I move the cable to port 27 the link comes up. The settings are the same, as each port is set to trunk the VLAN. The SUPER wtf thing, though, is that if I take a DAC of the same length as the one between the Mellanox and the Arista and use it to connect the Mellanox to a Dell server with a Mellanox NIC, the link comes up. Whhhhaaaaa? I do not know what to look at now :D

Nothing is yelling at me in the Arista or Mellanox logs.

Any help is appreciated.
 

BoGs

A small update to the above: I used Mellanox and Arista LR4 optics in their respective ports with LC SMF, and all the ports that do not come up with the DAC do come up and pass traffic that way. I wonder if it's something related to those ports and the DAC cable o_O