Beware of EMC switches sold as Mellanox SX6XXX on eBay

mmx01 · Jan 21, 2021

tRens said:
So, from my recent experience of getting the 6018 flashed and working.

Thanks tRens & everyone who put together this amazing guide! I have just completed my first cross-flash of that switch, and after a few breathtaking moments (ECC errors, no space left on the device) it now runs VERSION 3.6.8012! So, per the above, one can go straight to 8012.

What is not in the manual: you may want to set the date to avoid a flood of "date in the past" errors. The switch works regardless, but as with the ECC errors, the logging slows things down and is annoying. To get the date command, run ln -s /bin/busybox /bin/date while mtd6 is mounted read-write.

New error also appeared which is safe to ignore: /lib/libblkid.so.1: error reading information on service arp_responder: No such file or directory

For the firmware upgrade I went with the chroot-to-mtd7 option, with mtd7 mounted at /mnt/root2. No need to reboot; you just exit the chroot at the end of the firmware upgrade.

I kept the EMC U-Boot and just ran setenv bootcmd 'run mlxlinux' followed by saveenv, so it autoboots into the Mellanox kernel rather than the EMC one.

The fan speed mod requires an adjustment versus the guide, as the hex string is different in the tc binary. The string to find is already in this thread. The bytes to change are at 001E2650, and you need to change only two of them (the two 28s): 00 05 28 00 00 0A 28 00 00 00 00 63 00 00 00 00, only in this place.
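Since this layout clearly differs between tc builds, a safer way to apply a fixed-offset patch is to check that the bytes at the offset are exactly what you expect before writing anything. A minimal Python sketch, assuming the 16-byte row and offset posted above for 8012 on an SX6012; the helper name and file names are hypothetical, and 0x1B is simply the value tried by others in this thread. (Later posts suggest more than one place may need changing, so treat this as a single-location helper.)

```python
# Row posted above for the 8012 tc binary at file offset 0x1E2650 (SX6012).
OFFSET = 0x1E2650
EXPECTED = bytes.fromhex("00 05 28 00 00 0A 28 00 00 00 00 63 00 00 00 00")
# Positions of the two 0x28 bytes inside that 16-byte row.
FAN_BYTES = (2, 6)


def patch_row(data: bytes, offset: int, expected: bytes,
              positions: tuple[int, ...], new_value: int) -> bytes:
    """Return a patched copy of data; refuse to touch unexpected bytes."""
    actual = data[offset:offset + len(expected)]
    if actual != expected:
        raise ValueError(f"unexpected bytes at {offset:#x}: {actual.hex(' ')}")
    buf = bytearray(data)
    for pos in positions:
        buf[offset + pos] = new_value
    return bytes(buf)

# Usage (hypothetical file names; always work on a copy):
#   from pathlib import Path
#   data = Path("tc.orig").read_bytes()
#   Path("tc").write_bytes(patch_row(data, OFFSET, EXPECTED, FAN_BYTES, 0x1B))
```

If the check fails, the file is from a different build and the offset should not be trusted.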

I was stupid enough to change the switch profile from vpi to eth-single-switch via the GUI and bam! It stopped allowing me to log in via CLI or GUI. It was responding to ping and asking for user/password, but then waited for something forever. I needed to redo a lot of steps! Essentially you have to wipe out /config, the db, /var etc. and start from that part again. With just the db portion wiped, the switch came back to life, but only in IB mode without any licenses. Perhaps this issue is linked to hardcoding system_profile="3". After I made this change it would no longer let me access the CLI; it just hung at the initialization prompt for hours, at jet speed.

Consider that before changing to eth-single-switch, unless it worked for others and I did something wrong here.

As some have mentioned, version 8012 complains about the PSU: [hwd.WARNING]: adjust_pses_number_according_to_machine: P/N:[(null)] for Dingo is unknown, and [web.WARNING]: util_ui_api_is_ib_eth_vpi_enabled: HW supports: IB ETH VPI profiles, but they are not enabled by HW VPD or license. The latter shows only when logged in via the GUI.

Even though I have ConnectX-3 cards, Mellanox cables and an SX6012, ESXi 7.0 refuses to work at 56Gb over Ethernet; it does not display that speed at all. When forced with:
esxcli network nic set -n vmnic1000402 -S 56000 -D full
it stops working with 56G forced on the switch. When the switch is set to 40Gb it starts working, and ESXi shows configured speed 56, actual 40.

esxcli network nic get -n vmnic1000402
Advertised Auto Negotiation: true
Advertised Link Modes: 1000None/Half, 1000None/Full, 10000None/Half, 10000None/Full, 40000None/Half, 40000None/Full, Auto
I'm guessing this is an ESXi driver limitation.

Total power used : 25.16 Watts

tRens · Jan 22, 2021

[mgmtd.WARNING]: Exit code 1 from /sbin/get_layout_disks.sh: /sbin/get_layout_disks.sh: line 14: /etc/layout_settings.sh: No such file or directory
[mgmtd.ERR]: md_system_iterate_disk(), md_system.c:2973, build 1: Unexpected NULL

I keep seeing this from the SX6018, all 205 pages of it, after running for about 2 hours.

Has anyone gotten their sx6018 to work with 3.6.8012?

mmx01 · Jan 23, 2021

So 8012 has different hwd checks for PSUs compared to the earlier version... I assume system_type == S_SX_DINGO is a way of differentiating Mellanox from OEM hardware.

The older hwd had a simple if:
if ( ret_ps_num )
{
    if ( g_hwd_static_params.system_type == S_SX_DINGO )
    {
    }
    else if ( strncmp(pn, "MSX6012F-1BFS", 0xDu) && strncmp(pn, "MSX6012T-1BFS", 0xDu) ...
(... and other genuine part numbers.)

The new one is different:

if ( ret_ps_num )
{
    if ( num_ps_54264 )
    {
        if ( num_ps_54264 == DPN_ERROR )
        {
        }
        else
        {
            *ret_ps_numa = num_ps_54264;
        }
    }
    else if ( g_hwd_static_params.system_type == S_SX_DINGO )

So we are failing on this new condition where DPN_ERROR occurs. Interestingly enough, non-standard part numbers have been added too!
&& strncmp(pn, "00YY0K", 6u)
&& strncmp(pn, "0MJ17V", 6u)
&& strncmp(pn, "100-565-095-07", 0xEu)
&& strncmp(pn, "100-565-117-05", 0xEu)
&& strncmp(pn, "100-586-412-01", 0xEu)
&& strncmp(pn, "100-586-512-01", 0xEu)
&& strncmp(pn, "100-886-236-04", 0xEu)
&& strncmp(pn, "100-886-277-03", 0xEu)
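How to read that chain: strncmp(pn, X, n) returns 0 when the first n characters match, so strncmp(...) && strncmp(...) && ... is true only when pn matches none of the listed prefixes. A minimal Python model of that logic, using only the OEM part numbers quoted above (the genuine Mellanox part numbers in the real list are not reproduced); the function names are hypothetical:

```python
# OEM part numbers quoted from the 8012 hwd, with the length passed to strncmp.
OEM_PNS = {
    "00YY0K": 6,
    "0MJ17V": 6,
    "100-565-095-07": 0xE,
    "100-565-117-05": 0xE,
    "100-586-412-01": 0xE,
    "100-586-512-01": 0xE,
    "100-886-236-04": 0xE,
    "100-886-277-03": 0xE,
}


def strncmp_eq(a: str, b: str, n: int) -> bool:
    """True when strncmp(a, b, n) == 0, i.e. the first n characters match."""
    return a[:n] == b[:n]


def pn_is_unlisted(pn: str, table: dict[str, int]) -> bool:
    """Mirror of `strncmp(pn, A, n) && strncmp(pn, B, m) && ...`:
    true only if pn matches none of the listed prefixes."""
    return all(not strncmp_eq(pn, ref, n) for ref, n in table.items())
```

Note that a prefix match is enough: any pn starting with 00YY0K counts as listed.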

It needs more time to understand how the system gets the pn (part number). I think overwriting it with MSX6012F-2BFS would solve the issue. I am trying to understand whether it is pulled via i2c and programmed somewhere in flash, or set as a variable in the db. The function reading it seems to be hwd_db_get_vpd_pn, which suggests mddbreq could possibly overwrite system_type?

Another vector would be to tamper with dingo_ps_num and set DPN_ERROR & DPN_UNKNON both to 0x2 (2 PSUs).

What is bizarre is that pulling the power plug from either PSU provides the expected resiliency. My bet is on more advanced load-sharing functions... but why the trouble??

That's beyond my understanding at the moment.

mmx01 · Jan 23, 2021

Have you created the script and given it +x rights as in the guide?

rm -f /mnt/root2/etc/layout_settings.sh

touch /mnt/root2/etc/layout_settings.sh

tRens said:
[mgmtd.WARNING]: Exit code 1 from /sbin/get_layout_disks.sh: /sbin/get_layout_disks.sh: line 14: /etc/layout_settings.sh: No such file or directory

It complains that there's no such file, so either it is not there or it is not executable:
chmod +x /mnt/root2/etc/layout_settings.sh

That is, of course, provided you have mounted it: mount -t jffs2 /dev/mtdblock7 /mnt/root2

Psycho_Robotico · Jan 24, 2021

mmx01 said:
Fan speed mod requires adjustment versus the guide as hex string is different in tc binary. String to find is already in this thread. Bytes to change are at: 001E2650, you need to change only two: 00 05 28 00 00 0A 28 00 00 00 00 63 00 00 00 00 only in this place.

Although I edited this part to 1B, fans are still shown at ~8000rpm. Any ideas what I might have missed?

up3up4 · Jan 24, 2021

Psycho_Robotico said:
Although I edited this part to 1B, fans are still shown at ~8000rpm. Any ideas what I might have missed?

If you only edited one instance, it won't work. Four instances need to be edited in the tc file to make the fans slow down.

mmx01 · Jan 24, 2021

First, let us know which version you are working with: 8012 or earlier? There are two hosting sites; if you only work with files-needed-for-conversion, then it is the older release and you need to modify more than one instance.

I modded tc recently for 8012 and needed to change two instances, as stated above.

Now, the interesting issue with PSUs and version 8012 continues. I have managed to make this error go away (it was treating the switch as a 1-PSU Dingo). I am not clear at this point what benefit this delivers, since from a hardware perspective both PSUs work even with the error, and one can unplug them one at a time without anything bad happening.

The second PSU now appears in the GUI, yet the GUI refuses to show any parameters for it; no other errors. The Voltages screen looks normal.

A not-so-elegant way of overcoming the 1-PSU Dingo error was a simple hack to map DPN_ERROR to the value 2 (which means 2 PSUs).

In hwd at address 0055D90, change 03 to 02 for 2 PSUs, or to 01 if you want one:
39 40 00 03 91 49 BD 6C 3D 20 10 90 81 29 BD 6C
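Why one byte is enough: assuming hwd is a big-endian PowerPC binary (the image is PPC_M460EX), 39 40 00 03 decodes as li r10, 3 (that is, addi r10, r0, 3), and the following 91 49 BD 6C is stw r10, -0x4294(r9), which stores that value. Changing the last byte only changes the immediate being stored. A minimal Python sketch that decodes and re-encodes this one instruction form; it is not a general disassembler:

```python
import struct

ADDI_OPCODE = 14  # primary opcode of addi; "li rD, imm" is addi rD, r0, imm


def decode_li(word: bytes) -> tuple[int, int]:
    """Decode a big-endian PowerPC `li rD, imm` and return (rD, imm)."""
    (insn,) = struct.unpack(">I", word)
    opcode = insn >> 26
    rd = (insn >> 21) & 0x1F
    ra = (insn >> 16) & 0x1F
    imm = insn & 0xFFFF
    if opcode != ADDI_OPCODE or ra != 0:
        raise ValueError(f"not an li instruction: {word.hex(' ')}")
    if imm & 0x8000:  # sign-extend the 16-bit immediate
        imm -= 0x10000
    return rd, imm


def encode_li(rd: int, imm: int) -> bytes:
    """Encode `li rD, imm` as 4 big-endian bytes."""
    if not -0x8000 <= imm <= 0x7FFF:
        raise ValueError("immediate out of range")
    insn = (ADDI_OPCODE << 26) | (rd << 21) | (imm & 0xFFFF)
    return struct.pack(">I", insn)


print(decode_li(bytes.fromhex("39 40 00 03")))  # (10, 3)
print(encode_li(10, 2).hex(" "))                # 39 40 00 02
```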

So far I am trying to understand the motive behind what appears to be intentional blacklisting via the part number check, as hwd clearly has OEM part numbers added which weren't in the older image. Its definition:
DPN_UNKNON: .set 0
DPN_1: .set 1
DPN_2: .set 2
DPN_ERROR: .set 3
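A minimal Python model of the new branch, using the values above, shows why storing 2 instead of 3 helps: with DPN_ERROR the PSU count is never assigned, while DPN_2 takes the normal assignment path. This simplifies the decompiled snippet quoted earlier in the thread and is not the real hwd logic; the function name and the string labels for the non-assigning paths are assumptions:

```python
from enum import IntEnum
from typing import Union


class DingoPsNum(IntEnum):
    DPN_UNKNON = 0  # spelling as it appears in the binary
    DPN_1 = 1
    DPN_2 = 2
    DPN_ERROR = 3


def resolve_ps_num(num_ps: int) -> Union[int, str]:
    """Simplified model of:
    if (num_ps) { if (num_ps == DPN_ERROR) {} else *ret = num_ps; }
    else if (system_type == S_SX_DINGO) ...
    """
    if num_ps:
        if num_ps == DingoPsNum.DPN_ERROR:
            return "error"  # empty branch in the decompiled code (assumed error path)
        return int(num_ps)  # *ret_ps_numa = num_ps_54264
    return "fallthrough"  # continues to the system_type check


print(resolve_ps_num(DingoPsNum.DPN_ERROR))  # error
print(resolve_ps_num(DingoPsNum.DPN_2))      # 2
```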

I am not encouraging anyone to do it, whatever you do is at your own risk.

Psycho_Robotico · Jan 24, 2021

mmx01 said:
First let us know which version you are working with 8012 or earlier? There are two hosting sites if you only work with files-needed-for-conversion then it is older release and you need to modify more than one instance.

I have modded tc recently for 8012 and I needed to change two instances as stated above.

Thanks for the help!

Going straight from EMC to 8012. The mlnxbase and image_layout files were taken from the files-needed-for-conversion archive (as I didn't know if there were any newer versions), while the rest was pulled straight from the 8012 firmware as available from Mellanox.

With "two instances" you're referring to the two "28" at address 001E2650? Just to be sure. I changed those to "1B".

Here are the MD5 hashes of the files I used:

Code:

chad B7E34889A3E839F0515F218E96FC867E
hwd 5D73C7B494B660EB224C13946C7575FF
ibd 87FE3958039E92B897E98D4BF54D6C06
image_layout F5FAEB61A5A11B7B67F14735FE81406D
image-PPC_M460EX-ppc-m460ex-20190222-075342.tgz 63B38A5A74BDC8BAB2FC16CD6CA0C628
mlnxbase D6B3D56498EC92E3F44D1E002540F88A
MT_1270110020.bin D570A52114F97ED6BC6D228188EFE7F7
fdt-uni 37C53191F8A1749EBDB42463D9454D88
vmlinuz-uni 445D93FDB81EF30A9AB627461C68FC86
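To compare your own set against these, a minimal Python sketch using hashlib. It assumes all files sit in one folder under the names listed above; the hashes are simply the ones posted here for this particular set, not an official reference, and the helper names are hypothetical:

```python
import hashlib
from pathlib import Path

# MD5 hashes as posted above.
EXPECTED_MD5 = {
    "chad": "B7E34889A3E839F0515F218E96FC867E",
    "hwd": "5D73C7B494B660EB224C13946C7575FF",
    "ibd": "87FE3958039E92B897E98D4BF54D6C06",
    "image_layout": "F5FAEB61A5A11B7B67F14735FE81406D",
    "image-PPC_M460EX-ppc-m460ex-20190222-075342.tgz": "63B38A5A74BDC8BAB2FC16CD6CA0C628",
    "mlnxbase": "D6B3D56498EC92E3F44D1E002540F88A",
    "MT_1270110020.bin": "D570A52114F97ED6BC6D228188EFE7F7",
    "fdt-uni": "37C53191F8A1749EBDB42463D9454D88",
    "vmlinuz-uni": "445D93FDB81EF30A9AB627461C68FC86",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file as uppercase hex, read in chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest().upper()


def check_dir(folder: Path) -> dict[str, str]:
    """Return {file name: 'ok' | 'MISMATCH' | 'missing'}."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = folder / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of(path) == expected.upper():
            results[name] = "ok"
        else:
            results[name] = "MISMATCH"
    return results
```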

mmx01 · Jan 24, 2021

Psycho_Robotico said:
Thanks for the help!
Going straight from EMC to 8012. The mlnxbase and image_layout files were taken from the files-needed-for-conversion archive (as I didn't know if there were any newer versions), while the rest was pulled straight from the 8012 firmware as available from Mellanox.
...
With "two instances" you're referring to the two "28" at address 001E2650? Just to be sure. I changed those to "1B".

Yes for mlnxbase and image_layout; these come from files-needed-for-conversion. With 8012 you don't need to change ibd, and both chad & hwd must come patched from 8012 (these are in the 3.6.8012 folder next to MT_1270110020.bin). tc you just extract from image-PPC_M460EX-ppc-m460ex-20190222-075342.tgz. I wrote all of that because there is no ibd in the 3.6.8012 folder, so be careful not to mix them.

The original string in the tc binary at 001E2650 starts with 10 18 77 D8 (different from the guide's 10 17 F2 7C), and I changed only the two 28s in this string.
10 18 77 D8 00 01 00 00 00 00 00 04 00 00 00 04 00 00 00 04 00 04 00 01 00 05 28 00 00 0A 28 00 00

These, however, are device specific, so I now see why my outcome may be different. I flashed an SX6012; are you flashing an SX6012 or an SX6018?
Works for me:
SX6012 [standalone: master] # sh temp
---------------------------------------------------------
Module  Component          Reg  CurTemp    Status
                                (Celsius)
---------------------------------------------------------
MGMT    SX                 T1   37.00      OK
MGMT    QSFP_TEMP1         T1   29.00      OK
MGMT    QSFP_TEMP2         T1   31.50      OK
MGMT    QSFP_TEMP3         T1   32.00      OK
MGMT    BOARD_MONITOR      T1   35.00      OK
MGMT    CPU_BOARD_MONITOR  T1   40.00      OK
MGMT    CPU_BOARD_MONITOR  T2   62.00      OK
SX6012 [standalone: master] # sh fan
-----------------------------------------------------
Module  Device  Fan  Speed    Status
                     (RPM)
-----------------------------------------------------
MGMT    FAN1    F1   4590.00  OK
MGMT    FAN2    F1   4710.00  OK
MGMT    FAN3    F1   4200.00  OK
MGMT    FAN4    F1   4590.00  OK
SX6012 [standalone: master] #

Psycho_Robotico · Jan 24, 2021

That's exactly what I did on my SX6012, right after uploading and chmod-ing the chad and hwd files. The only difference I can see so far is that you set the values to 1D while I used 1B.

mmx01 · Jan 24, 2021

That shouldn't matter. I played a bit with the min setting and did not observe much change between 1B and 1D. I can't tell why it works for me and not for you with the same device and binary.

Psycho_Robotico · Jan 24, 2021

Most likely I bungled something up during the process and will have to retry. Thanks for your help!

mmx01 · Jan 25, 2021

If you trust it, here is the one I am using for 8012; you can use it for binary comparison.

easyupload.io

Some first tests between ESXi hosts: knowing iperf3 is single-threaded, I fired up 2 server processes on ports 5000 & 5001 and started two client connections:
[ 4] 9.00-10.00 sec 1.22 GBytes 10.5 Gbits/sec 4286332568 0.00 Bytes
[ 4] 9.00-10.00 sec 2.99 GBytes 25.7 Gbits/sec 4286332568 0.00 Bytes
So a total of 36.2Gbit/s

A single process gets me in the range of 26-27 Gbit/s:
[ 4] 0.00-10.00 sec 30.4 GBytes 26.1 Gbits/sec receiver
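With two parallel iperf3 streams, the aggregate for an interval is just the sum of the per-stream bitrates (10.5 + 25.7 = 36.2 Gbit/s above). A small Python sketch that pulls the Gbits/sec figure out of interval lines like the ones posted and adds them up; the regex assumes iperf3's human-readable output with Gbits/sec units, and the function name is hypothetical:

```python
import re

RATE_RE = re.compile(r"([\d.]+)\s+Gbits/sec")


def total_gbits(lines: list[str]) -> float:
    """Sum the Gbits/sec figure of each iperf3 interval line."""
    total = 0.0
    for line in lines:
        match = RATE_RE.search(line)
        if match:
            total += float(match.group(1))
    return total


streams = [
    "[ 4] 9.00-10.00 sec 1.22 GBytes 10.5 Gbits/sec 4286332568 0.00 Bytes",
    "[ 4] 9.00-10.00 sec 2.99 GBytes 25.7 Gbits/sec 4286332568 0.00 Bytes",
]
print(f"{total_gbits(streams):.1f} Gbit/s")  # 36.2 Gbit/s
```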

Looking at it as an upgrade from 10Gbit, this is a win for the price of 40Gbit gear. Excluding shipping:
2x40EUR for ConnectX-3 flashed to FCBT
105EUR for the EMC flashed to SX6012 and
2x QSFP+ DAC cables from China for 35EUR

Psycho_Robotico · Jan 25, 2021

mmx01 said:
If you trust it here is the one I am using for 8012, you can use it for binary comparison.

Thanks!
The file you sent replaced the "28" with "1B" at 1E3B26, 1E3B2A, 1E5B02 and 1E5B06, but not at 1E2652 and 1E2656. In my file it's the other way round. Apparently it needs to be changed in multiple places. Yours was already edited in the other locations, while mine wasn't and thus failed.
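A byte-by-byte comparison is the quickest way to spot differences like these. A minimal Python sketch that lists every offset where two equal-length binaries differ; the function name and file names are hypothetical:

```python
def diff_offsets(a: bytes, b: bytes) -> list[tuple[int, int, int]]:
    """Return (offset, byte_in_a, byte_in_b) for every differing position."""
    if len(a) != len(b):
        raise ValueError("files differ in length; a plain byte diff is not meaningful")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]

# Usage (hypothetical file names):
#   from pathlib import Path
#   for off, x, y in diff_offsets(Path("tc.mine").read_bytes(), Path("tc.yours").read_bytes()):
#       print(f"{off:08X}: {x:02X} -> {y:02X}")
```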

mmx01 · Jan 25, 2021

Then mine must have been pre-edited... interesting, since I took it out of the image file attached with the other files. Is it working for you now?

Psycho_Robotico · Jan 25, 2021

I'll give it another try on the weekend. As this seems to be the root cause, I'm confident it will work as intended on the second attempt.

mmx01 · Jan 26, 2021

Has anyone tested the SX6012 or SX6018 for inter-VLAN routing performance? Between two ESXi hosts in the same VLAN over the SX6012 I get 27Gbit/s with a single thread using just switching, which is okay. When I moved the VMs to separate VLANs with routing over the SX6012 VLAN interfaces, I get 14Gbit/s with no ACLs... not impressed with a $5k (when new) L3 switch.

I also managed to get 56Gb/s link speeds with ESXi; apparently I was mis-sold the IBM-branded version of the cable. I was able to force both ends (switch & ESXi) to 56 but got no link. With the right cable, the network adapter negotiated the speed to 56Gb.

Almost all Chinese DACs sold as 56Gb/s with a Mellanox PN listed are either IBM or HP, despite pictures showing blue tabs and Mellanox part numbers. I ordered 4 and all 4 of them were IBM ones. After lecturing 2 different sellers I asked for a refund and purchased from a different source, which this time worked!

If you have one of these, 56Gb/s will not work and the IBM part number will show instead.

mmx01 · Jan 26, 2021

And the last error is resolved: [web.WARNING]: util_ui_api_is_ib_eth_vpi_enabled: HW supports: IB ETH VPI profiles, but they are not enabled by HW VPD or license.

Responsible binary: /opt/tms/lib/web/handlers/rh

It occurs when: if ( *is_ib_enableda == false && *is_eth_enableda == false && *is_vpi_enableda == false )
so we need to convince it that at least one of the conditions is not met: *is_vpi_enableda = false; -> true. This is a bool, so:
00 = false
01 = true

102D74C0 81 3F 00 44 39 40 00 00 -> 102D74C0 81 3F 00 44 39 40 00 01
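Since, as noted further down, addresses shown in IDA don't line up 1:1 with hex-editor offsets (102D74C0 looks like an IDA address rather than a file offset), it can be safer to patch rh by searching for the byte pattern and refusing to patch unless it occurs exactly once. Assuming big-endian PowerPC, the old bytes decode as lwz r9, 0x44(r31) followed by li r10, 0, and the patch turns the latter into li r10, 1. A minimal Python sketch; the function name is hypothetical, and whether the 8-byte pattern is unique in your rh is exactly what the check tests:

```python
OLD = bytes.fromhex("81 3F 00 44 39 40 00 00")  # lwz r9,0x44(r31); li r10,0
NEW = bytes.fromhex("81 3F 00 44 39 40 00 01")  # lwz r9,0x44(r31); li r10,1


def patch_unique(data: bytes, old: bytes, new: bytes) -> tuple[bytes, int]:
    """Replace old with new where old occurs exactly once; return (data, offset)."""
    if len(old) != len(new):
        raise ValueError("patch must not change the file size")
    first = data.find(old)
    if first == -1:
        raise ValueError("pattern not found (already patched, or a different build?)")
    if data.find(old, first + 1) != -1:
        raise ValueError("pattern occurs more than once; patch by hand")
    return data[:first] + new + data[first + len(old):], first

# Usage (hypothetical file names):
#   from pathlib import Path
#   patched, off = patch_unique(Path("rh.orig").read_bytes(), OLD, NEW)
#   print(f"patched at file offset {off:#x}")
#   Path("rh").write_bytes(patched)
```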

Happy I could give something back on the two errors reported; no more worrying that the NAND will get worn out through excessive logging.

I will come back to tc and the fan curves... when the temperature went down, the fans ramped up??? Bizarre logic. I looked at a fresh tc and it seems that every instance of 10 18 77 D8 needs to be modified, the reason being that 10 18 77 D8 is mapped to our SX_Dingo device (linked to the PSU errors).

10 18 77 D8 00 01 00 00 00 00 00 04 00 00 00 04 00 00 00 04 00 04 00 01 00 05 28 00 00 0A 28 00 00
The first occurrence (28) in each Dingo string is min_chassis_fan_speed
0A is also interesting: chassis_fan_inc_percent; I set it to 05 for testing
The later 28 is min_spine_fan_speed
The 4s along the way encode the number of fans
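Putting that layout together: a minimal Python sketch that finds every record starting with the Dingo tag 10 18 77 D8 in tc and rewrites the two fan-speed bytes, leaving everything else alone. The field offsets (+26 for min_chassis_fan_speed, +30 for min_spine_fan_speed) are counted from the 33-byte string above and, like the field names, are this thread's interpretation rather than documented values. Records whose bytes don't currently read 0x28 are reported instead of patched, since a tag match alone doesn't prove the layout. The function name is hypothetical.

```python
DINGO_TAG = bytes.fromhex("10 18 77 D8")
MIN_CHASSIS_FAN = 26  # first 0x28 in the record (interpretation from this thread)
MIN_SPINE_FAN = 30    # second 0x28 in the record
DEFAULT_MIN = 0x28


def patch_dingo_fans(data: bytes, new_min: int) -> tuple[bytes, list[int], list[int]]:
    """Return (patched data, patched record offsets, skipped record offsets)."""
    buf = bytearray(data)
    patched, skipped = [], []
    start = buf.find(DINGO_TAG)
    while start != -1:
        a, b = start + MIN_CHASSIS_FAN, start + MIN_SPINE_FAN
        if b < len(buf) and buf[a] == DEFAULT_MIN and buf[b] == DEFAULT_MIN:
            buf[a] = buf[b] = new_min
            patched.append(start)
        else:
            skipped.append(start)  # unexpected layout: inspect by hand
        start = buf.find(DINGO_TAG, start + 1)
    return bytes(buf), patched, skipped
```

Comparing the list of patched offsets with the ones found by binary comparison earlier in the thread is a quick sanity check before flashing.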

It seems the SX6012, SX6018 and SX6036 are members of the same Dingo family... there are scorpions, panthers, spiders and other animals around.

Regards,
Mariusz

klui · Jan 26, 2021

mmx01 said:
Not elegant way of overcoming 1PSU Dingo error was a simple hack to map DPN_ERROR to value 2 (which means 2 PSUs).

In hwd at address: 0055D90 you need to change 03 to 02 for 2 PSUs or 01 if you want one.
39 40 00 03 91 49 BD 6C 3D 20 10 90 81 29 BD 6C

With what program are you disassembling the binary?

What do you think about the set of values just before that at 00055d80?

39 40 00 02 91 49 BD 6C 48 00 00 10 3D 20 10 90

39 40 00 03 91 49 BD 6C 3D 20 10 90 81 29 BD 6C

Maybe the pointer to the data structure is wrong and it should be using this structure instead of the one you patched?
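To compare the two rows directly, a minimal Python sketch that decodes only the handful of big-endian PowerPC instruction forms appearing in them (li, lis, stw, lwz and a relative b); it is not a general disassembler, and the addresses are the hex-editor offsets quoted above. Decoded this way, the row at 0x55D80 reads li r10, 2; stw r10, -0x4294(r9); b 0x55d98; lis r9, 0x1090, and the row at 0x55D90 reads li r10, 3; stw r10, -0x4294(r9); lis r9, 0x1090; lwz r9, -0x4294(r9). That would fit two branches of one if/else storing DPN_2 or DPN_ERROR to the same variable, though that is an inference from the bytes alone.

```python
import struct


def simm16(x: int) -> int:
    """Sign-extend a 16-bit immediate."""
    return x - 0x10000 if x & 0x8000 else x


def decode(word: int, addr: int) -> str:
    """Decode the few instruction forms seen in these rows."""
    op = word >> 26
    rt, ra, imm = (word >> 21) & 0x1F, (word >> 16) & 0x1F, word & 0xFFFF
    if op == 14 and ra == 0:
        return f"li r{rt}, {simm16(imm)}"
    if op == 15 and ra == 0:
        return f"lis r{rt}, {imm:#x}"
    if op == 36:
        return f"stw r{rt}, {simm16(imm):#x}(r{ra})"
    if op == 32:
        return f"lwz r{rt}, {simm16(imm):#x}(r{ra})"
    if op == 18 and word & 0x3 == 0:  # b: relative, no link
        offset = word & 0x03FFFFFC
        if offset & 0x02000000:  # sign-extend the 26-bit displacement
            offset -= 0x04000000
        return f"b {addr + offset:#x}"
    return f".long {word:#010x}"


def disasm(base: int, hexbytes: str) -> list[str]:
    raw = bytes.fromhex(hexbytes)
    words = struct.unpack(f">{len(raw) // 4}I", raw)
    return [f"{base + 4 * i:#07x}: {decode(w, base + 4 * i)}" for i, w in enumerate(words)]


for base, row in ((0x55D80, "39 40 00 02 91 49 BD 6C 48 00 00 10 3D 20 10 90"),
                  (0x55D90, "39 40 00 03 91 49 BD 6C 3D 20 10 90 81 29 BD 6C")):
    print("\n".join(disasm(base, row)))
```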

mmx01 · Jan 26, 2021

I am using IDA. There are some shortcuts, and as with any decompiler the output is not a 1:1 source code transcript; plus, from what I see, there seems to be an offset compared to the HxD editor. I will upload the modded hwd & rh tomorrow; binary comparison will be much more effective than chasing differences between hex editors.
