Beware of EMC switches sold as Mellanox SX6XXX on eBay

NablaSquaredG

Bringing 100G switches to homelabs
Aug 17, 2020
1,591
1,051
113
Well... it seems like the SB7700 engineering sample just died.

After a reboot, there is no serial output and the LED on the rear stays red.


Press F to pay respect :)
 

Freebsd1976

Active Member
Feb 23, 2018
404
73
28
Maybe the wrong firmware? If unmanaged firmware was flashed, there is no serial console output. Maybe you need to reflash the managed firmware using a programmer?
 

NablaSquaredG

The switch doesn't even show the BIOS output as it should and has always done before.

This is not even related to installed firmware / software.
 

Labs

Member
Mar 21, 2019
88
16
8
Do you see a CMOS battery that you can pull out, or a CMOS reset jumper? It is worth a try. Is the PSU OK, with all voltages in range?
 

RedX1

Active Member
Aug 11, 2017
134
147
43
Hello



I do not have any EMC Mellanox SX6XXX switches, but I have taken several IBM and HP SX1036 and SX6036 Switches through the upgrade process.

The upgrade process using the Web-GUI for the HP images takes around 25 mins to upload the image and then another 25 mins to fully install it.

The upgrade from the HP 3.6.8010 to the Lenovo 3.6.8012 image is not as straightforward. The process is very slow and will take almost 120 mins. (There is a counter).

If you are installing multiple images you will need to remove the old web-images, otherwise the process will fail.

Please see posts 20 and 24 from this thread.
https://forums.servethehome.com/index.php?threads/us-mellanox-sx6036-200.31513/


I hope this helps.

Good luck.

RedX1

Hi

Earlier in this thread I reported very long firmware upgrade times on some non-EMC SX6036 switches when upgrading to the 3.6.8012 firmware.

I just came across this KB article from the Nvidia Mellanox site which may explain the reason for this behaviour.

Once the upgrade process is completed, the switches perform as expected.



Link.

https://support.mellanox.com/s/article/MLNX2-117-3464kn



MLNX-OS PPC based systems log activity

Related Product Release/Version:

all

Problem Question:

SwitchX / SwitchX2 based systems with PPC CPU may encounter high CPU utilization due to consistent high logging rate.

The high CPU utilization can cause issues during software upgrades processes (during the install phase) and it may (rarely) cause other processes to work slower than usual.

Solution Answer:

The MLNX-OS file system garbage collector process, jffs_mtd9, may use a lot of CPU cycles during its activity.

The more logging is done to the switch's various log files, the more active the jffs_mtd9 process will be.

In case the switch CLI/ WebUI is slow and there are High CPU utilization events generated by the switch, the customer should contact Mellanox Technologies Technical Support (via the regular ticketing system Networking-support@nvidia.com) in order to identify the process that causes the high CPU utilization, and in order to get guidance on how to avoid the high CPU utilization.

Related case: 250956

I hope this helps in case someone else encounters this very slow firmware upgrade issue.

Have fun.

RedX1
 

veegee

New Member
Dec 9, 2019
6
1
3
Hi guys, I messaged the OP for the guide a few days ago but didn't get a response. Would anyone be kind enough to share how to do the conversion and get the Ethernet gateway working?
 

jb221

New Member
Apr 18, 2022
2
0
1
Once the switch is fully converted, can I update to a newer version using the Mellanox management console, or would I need to upgrade it the way the guide describes? I don't want to have to redo the whole conversion process if the built-in update feature doesn't work right.
 

Stephan

Well-Known Member
Apr 21, 2017
1,013
781
113
Germany
@jb221 For the SX6012: if you used the HP manufacturing boot environment via TFTP, ran manufacture.sh with image-PPC_M460EX-SX_3.4.0012.img, patched fru_backplate and wrote it to the switch's correct EEPROM, and flashed the MT_1270110020.bin SwitchX firmware using flint, then it is just a matter of

enable
configure terminal
image fetch http://192.168.128.100:8100/image-PPC_M460EX-3.6.8012.img
image install image-PPC_M460EX-3.6.8012.img
image boot next
configuration write
reload

from the console (not even the shell). The Web UI should then also work, because all partitioning, firmware and EEPROM contents are then standard Mellanox.

Edit: You might need efm_sx_ib_enabled true in your license if you intend to use InfiniBand.
 

Rand__

Well-Known Member
Mar 6, 2014
6,643
1,778
113
And if done via the old method?
Always wanted to update my Pinocchio to a real Boy (6012) but never found a clear (enough) guide to go from mpogr's conversion to "no EMC left on system..."
 

Stephan

Probably recommended to start from zero.

You need a DHCP server to hand out IPs (tftpd64 on Windows works and also covers TFTP), a TFTP root with the files vmlinuz, fdt and rootfs in the subdirectory mlnx460ex/, and a simple web server such as sheret that holds MT_1270110020.bin, image-PPC_M460EX-SX_3.4.0012.img and image-PPC_M460EX-3.6.8012.img.
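
A small pre-flight check on the machine that hosts the files can catch a missing or empty file before the switch is touched. This is only a sketch: `check_staging` and its two directory arguments are hypothetical names, and the only thing taken from the post is the list of file names.

```shell
#!/bin/sh
# Sketch: confirm the files named in the post are staged and non-empty.
# check_staging, tftp_root and web_root are illustrative names (assumptions).

check_staging() {
    tftp_root=$1   # directory served over TFTP
    web_root=$2    # directory served over HTTP
    missing=0
    for f in \
        "$tftp_root/mlnx460ex/vmlinuz" \
        "$tftp_root/mlnx460ex/fdt" \
        "$tftp_root/mlnx460ex/rootfs" \
        "$web_root/MT_1270110020.bin" \
        "$web_root/image-PPC_M460EX-SX_3.4.0012.img" \
        "$web_root/image-PPC_M460EX-3.6.8012.img"
    do
        # -s is true only if the file exists and is larger than zero bytes
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f"
            missing=$((missing + 1))
        fi
    done
    # Exit status 0 only when nothing was missing
    [ "$missing" -eq 0 ]
}
```

Called as `check_staging /path/to/tftp /path/to/web`, it prints one line per missing file and returns non-zero if anything is absent.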

Boot into u-boot, save all variables (copy paste printenv output into a text file, just to be sure), erase all variables, and make it permanent:

printenv
env default -a -f
saveenv


Boot the HP factory mini-Linux from u-boot and start the manufacture.sh script (it does not rely on anything from the environment variables):

setenv autostart no
setenv autoload no
dhcp
tftp 400000 192.168.128.100:mlnx460ex/vmlinuz
tftp 800000 192.168.128.100:mlnx460ex/fdt
tftp C00000 192.168.128.100:mlnx460ex/rootfs
setenv bootargs root=/dev/ram rw ramdisk_size=262144 ramdisk=262144
bootm 400000 C00000 800000

(Login as root)
/sbin/manufacture.sh -a -v -v -B -m ppc -u http://192.168.128.100:8100/image-PPC_M460EX-SX_3.4.0012.img
reboot
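
The three tftp load addresses above (400000, 800000, C00000) are 0x400000 bytes, i.e. 4 MiB, apart, so a vmlinuz or fdt larger than that would be overwritten by the next load. The sketch below checks a file's size against that gap on the TFTP server. `fits_slot` is a hypothetical helper, and whether other memory constraints apply on this board is not covered here.

```shell
#!/bin/sh
# Sketch: check that a file fits between two u-boot load addresses.
# fits_slot is an illustrative name; the addresses come from the tftp lines.

fits_slot() {
    file=$1
    start=$2   # load address of this file, e.g. 0x400000
    next=$3    # load address of the next file, e.g. 0x800000
    [ -f "$file" ] || { echo "no such file: $file"; return 2; }
    # Arithmetic expansion also strips any padding wc may print
    size=$(($(wc -c < "$file")))
    slot=$(($next - $start))
    if [ "$size" -gt "$slot" ]; then
        echo "$file: $size bytes exceeds the $slot byte gap"
        return 1
    fi
    echo "$file: $size bytes fits in the $slot byte gap"
}
```

For example, `fits_slot mlnx460ex/vmlinuz 0x400000 0x800000` and `fits_slot mlnx460ex/fdt 0x800000 0xC00000` cover the two files that have a load following them.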


Then log in as admin/admin and fill out the wizard. Enable shell access:

enable
conf t
license install LK2-RESTRICTED_CMDS_GEN2-88A1-NEWD-BPNB-1
conf write
_shell


Delete any U-Boot password:

/opt/tms/bin/mddbreq /config/db/initial set modify - /system/bootmgr/password string ''
eetool -a bf -s UBPASSWD=""


Check bootstrap EEPROM (16 bytes):

/opt/tms/bin/mellaggra _read 0 0x52 0 1 16
Check that these are the correct 166 MHz settings. If they are not, the fix would be
mlxi2c update_bootstrap166 (86 82 96 1a d9 80 0 e0 c0 8 23 50 d 5 0 0) or
mlxi2c update_bootstrap200 (86 82 96 19 b9 80 0 e0 c0 8 23 50 d 5 0 0)
This should not be necessary! I can decode the bytes if yours differ from the ones in the update_bootstrap166 line. Ask before rewriting.
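
The two byte strings quoted above differ only in the fourth and fifth bytes (1a d9 versus 19 b9), so comparing by eye is easy to get wrong. The sketch below only compares text and never touches the EEPROM. It assumes the 16 bytes have been copied out by hand as space-separated hex; the actual output format of mellaggra is not shown in the thread, and `classify_bootstrap` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: compare 16 bootstrap bytes with the two patterns quoted in the post.
# Input format (space-separated hex bytes) is an assumption, not mellaggra's
# documented output. This reads nothing from and writes nothing to hardware.

BOOTSTRAP_166="86 82 96 1a d9 80 0 e0 c0 8 23 50 d 5 0 0"
BOOTSTRAP_200="86 82 96 19 b9 80 0 e0 c0 8 23 50 d 5 0 0"

# Lower-case each byte, drop a "0x" prefix and leading zeros ("0x0D" -> "d")
normalise_bytes() {
    out=""
    for b in $(printf '%s' "$1" | tr 'ABCDEFX' 'abcdefx'); do
        b=${b#0x}
        while [ "${#b}" -gt 1 ] && [ "${b#0}" != "$b" ]; do
            b=${b#0}
        done
        out="$out${out:+ }$b"
    done
    printf '%s\n' "$out"
}

# Print 166, 200 or unknown; non-zero exit status for unknown
classify_bootstrap() {
    got=$(normalise_bytes "$1")
    case "$got" in
        "$BOOTSTRAP_166") echo "166" ;;
        "$BOOTSTRAP_200") echo "200" ;;
        *) echo "unknown"; return 1 ;;
    esac
}
```

Anything reported as unknown is a reason to stop and ask in the thread, not a reason to run either update command.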

Read backplate_fru and cpu_fru EEPROMs:

/opt/tms/bin/mellaggra _read_fru 1 0x51 1000 fru_backplate.bin or
/opt/tms/bin/mellaggra _read_fru 8 0x51 1000 fru_backplate.bin
/opt/tms/bin/mellaggra _read_fru 0 0x50 1000 fru_cpu.bin


Copy them off to safety (TFTP server):

tftp 192.168.128.100 -m binary -c put fru_backplate.bin
tftp 192.168.128.100 -m binary -c put fru_patched.bin
tftp 192.168.128.100 -m binary -c put fru_cpu.bin


If your backplate EEPROM is unmodified so far, you have to patch it:

touch emc_to_6012
vi emc_to_6012


Insert the relevant snippet from the emc_to_6012 function. I am assuming you are converting an EMC to an SX6012 and have so far used modified binaries instead of patching the backplate FRU EEPROM:

dd if=/dev/zero bs=16 count=256 of="$2" 2> /dev/null
dd if="$1" bs=16 count=12 of="$2" conv=notrunc 2> /dev/null
dd if="$1" bs=16 count=5 of="$2" skip=12 seek=14 conv=notrunc 2> /dev/null
printf "\x20" | dd of="$2" bs=1 seek=1 count=1 conv=notrunc 2> /dev/null
printf "\x00" | dd of="$2" bs=1 seek=5 count=1 conv=notrunc 2> /dev/null
printf "\x05\x0E\x02\x14\x06\x16\x07" | dd of="$2" bs=1 seek=15 count=7 conv=notrunc 2> /dev/null
printf "\x00\x1A\x00\x03\x05\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00" | dd of="$2" bs=1 seek=192 count=32 conv=notrunc 2> /dev/null
printf "\x00\x12\x00\x01\x06\x00\x00\x00\x00\x01\x00\x00\x02\x88\x04\x04\x02\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0A\x00\x01\x07\x00\x00\x00\x00\x02\x10\x00\x00\x00\x00\x00" | dd of="$2" bs=1 seek=320 count=48 conv=notrunc 2> /dev/null
printf "\x4D\x53\x58\x36\x30\x31\x32\x46\x2D\x32\x42\x46\x53\x00" | dd of="$2" bs=1 seek=64 count=14 conv=notrunc 2> /dev/null


Patch EEPROM file on switch and write it to EEPROM:

sh emc_to_6012 fru_backplate.bin fru_patched.bin
/opt/tms/bin/mellaggra _write_fru 1 0x51 1000 fru_patched.bin
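
Before running the _write_fru line, it may be worth confirming that fru_patched.bin looks the way the dd/printf snippet implies. The first dd creates 16 x 256 = 4096 bytes, and the last printf places the bytes 4D 53 58 36 30 31 32 46 2D 32 42 46 53, which are ASCII for MSX6012F-2BFS, at offset 64. A minimal sketch of that check follows; `check_patched_fru` is a hypothetical name, and it assumes `wc` and `dd` are available in the switch shell.

```shell
#!/bin/sh
# Sketch: sanity-check a patched FRU image before it is written anywhere.
# Verifies only what the dd/printf lines in the post imply; it says nothing
# about whether the rest of the FRU contents are right for a given unit.

check_patched_fru() {
    f=$1
    [ -f "$f" ] || { echo "no such file: $f"; return 2; }

    # dd if=/dev/zero bs=16 count=256 produces a 4096 byte image
    size=$(($(wc -c < "$f")))
    [ "$size" -eq 4096 ] || { echo "unexpected size: $size"; return 1; }

    # 13 printable bytes at offset 64 should spell the model string
    model=$(dd if="$f" bs=1 skip=64 count=13 2> /dev/null)
    [ "$model" = "MSX6012F-2BFS" ] || { echo "unexpected model string: $model"; return 1; }

    echo "ok: $f"
}
```

Usage would be `check_patched_fru fru_patched.bin`, proceeding to the write only if it prints ok.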


Download Mellanox SwitchX firmware and flash it:

curl -O http://192.168.128.100:8100/MT_1270110020.bin
flint --override_cache_replacement --allow_psid_change -d /dev/mst/mt51000_pciconf0 -i ./MT_1270110020.bin b


Reboot into u-boot, then powercycle for good measure.

Update to 3.6.8012:

enable
configure terminal
image fetch http://192.168.128.100:8100/image-PPC_M460EX-3.6.8012.img
image install image-PPC_M460EX-3.6.8012.img
image boot next
configuration write
reload


Install licenses... (not saying more here)

Modify rc.local to slow down fans:

mount -nwo remount,rw /
vi /etc/rc.d/rc.local


Remove any exit line and add this after the touch... line

FAN_MIN="27"
FAN_MAX="50"
WAIT_MAX="10" # 5 minutes

MDREQ1="/opt/tms/bin/mdreq action /system/chassis/actions/set-fan-speed fan_module string"
MDREQ2="fan_number int8 1 fan_speed int8"
MDREQ3="set_max uint8"

i=1
while :; do
    PID=$(pidof clusterd)
    if [ -n "$PID" ]; then
        sleep 60
        echo "Adjusting fan speed"
        $MDREQ1 "/MGMT/FAN1" $MDREQ2 $FAN_MIN $MDREQ3 $FAN_MAX
        break
    else
        sleep 30
        i=$((i+1))
        if [ $i -gt $WAIT_MAX ]; then
            echo "Timeout waiting for clusterd"
            break
        fi
    fi
done

exit 0


FAN_MIN="27" is debatable. Fans should run at ~4500-5000rpm. If you get sudden fan spinups or annoying harmonics, increase by 1.

Good luck... also, no warranties. As long as U-Boot is there, you can fix everything yourself. If U-Boot becomes damaged, a handful of people on here, me included, can now fix that as well. It would require a BDI 2000 hardware JTAG debugger, some cables and some magic files, though.
 

klui

༺༻
Feb 3, 2019
919
526
93
What's the procedure here?
What output from the mellaggra command determines whether update_bootstrap166 or update_bootstrap200 is run?

EDIT: @andvalb provided the answer at https://forums.servethehome.com/ind...-as-mellanox-sx6xxx-on-ebay.10786/post-323205 through imd in the bootloader.

I'm not a HW guy but could someone explain why mellaggra uses 0x52 but imd uses 0x50 for the address?
 

Stephan

DO NOT USE U-BOOT TO READ/WRITE EEPROM. (Sorry for yelling) A software bug can overwrite the first byte instead of performing the wanted read operation and you will enter an entirely new fresh hell. You will need an I2C programmer or a RPi and deep knowledge to fix this.

@klui I only ever checked that 86 82 96 1a d9 80 0 e0 c0 8 23 50 d 5 0 0 is in the EEPROM, so that there is no CPU slowdown due to wrong values. Do not execute any of these commands other than the mellaggra _read in Linux. If the values are good, never ever touch them again. The first few bytes determine e.g. the CPU clock, and if they are corrupted the CPU will not boot anymore. This EEPROM is read by the factory CPU ROM as the very first bootstrap step to set things up.
 

Rand__

How'd you do that? I noticed a slowdown on the login in cli around when I got to 3.6.8010, it took way longer to get from login to where I could begin to type commands, same on my 6012 on 3.6.1002.
Have you been able to fix the slowness with the EEPROM?

I noticed the same but my EEPROM looks fine, so am looking for other reasons ...
 

Labs

Hello Stephan,

Do you have a script to patch the EEPROM for the 6036? I have an IBM branded one and I would like to make it OEM.

Thanks!