Mellanox Switches - Tips & Tricks


NablaSquaredG

Layer 1 Magician
Aug 17, 2020
1,085
621
113
Similar to the glorious Brocade ICX Series (cheap & powerful 10gbE/40gbE switching) Thread, I'd like to use this thread to compile some knowledge about Mellanox switches that is spread over many places in the forum, which can be hard to find for newbies.

I will expand this thread / post over time, so if you have some information that you'd like to have added, just write it in a comment.

(MAXIMUM) SUPPORTED FIRMWARE VERSIONS:

Upgrading beyond these versions or using other versions may brick your switch and require a recovery procedure!


Name / Types | Version
SwitchX / SwitchX-2 PowerPC (SX6036, SX6012, SX1016, etc.) | 3.6.8012 (no newer build for PowerPC available)
SwitchX / SwitchX-2 x86 (SX6710, SX1410, etc.) | 3.6.8012 (upgrading beyond this WILL brick your switch's BIOS)
Spectrum x86 (SN2100, SN2700, SN2010, etc.) | 3.10.4XXX (DO NOT upgrade to 3.10.5000, 3.10.6004, etc.)
Switch-IB (SB7700) | 3.9.3124
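
If you want to sanity-check an upgrade target against the list above before flashing, something like the following could work. This is only a rough sketch: the platform keys are labels made up for this example (not official names), version strings are assumed to be purely numeric, and the Spectrum limit is modelled as "anything below 3.10.5000" because the list only gives 3.10.4XXX.

```python
# Limits taken from the list above. Each entry is (bound, inclusive).
# The dictionary keys are hypothetical labels chosen for this sketch.
LIMITS = {
    "switchx-ppc": ((3, 6, 8012), True),
    "switchx-x86": ((3, 6, 8012), True),
    "switch-ib": ((3, 9, 3124), True),
    # The list only says "3.10.4XXX" and warns against 3.10.5000 and newer,
    # so this is treated as an exclusive upper bound (an assumption).
    "spectrum": ((3, 10, 5000), False),
}


def parse_version(text):
    """Turn a version string such as '3.6.8012' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_within_limit(platform, target):
    """Return True if `target` stays within the listed limit for `platform`."""
    bound, inclusive = LIMITS[platform]
    version = parse_version(target)
    return version <= bound if inclusive else version < bound
```

For example, `is_within_limit("spectrum", "3.10.4100")` passes, while `is_within_limit("switchx-x86", "3.6.8100")` does not. Tuples compare element by element, so 3.10.x correctly sorts after 3.9.x, which a plain string comparison would get wrong.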



As I don't have a lot of time right now (but want to get this done before I forget it), I'll just start with some basics



Overview of current Mellanox switch series that may be interesting for homelabs:

- Mellanox IS5000 Series: Old, Infiniband-only switches. They should generally be avoided.

- Mellanox SX Series: 40G / 56G switches with SwitchX / SwitchX-2 chip. They come in different flavours (unmanaged / managed, full width, half width, etc. TODO: Full Model Listing). They are very energy efficient (TODO: Exact numbers). Most of them are PowerPC, but some are x86.

Managed SX Series switches can do VPI - That means you can have some ports do Infiniband, some Ethernet at the same time and even use integrated IPoIB gateway functionality.

Highest supported firmware version (for managed switches): 3.6.8012
DO NOT try to update to a version beyond that on x86 switches - it may brick your switch by automatically performing a BIOS update that prevents the switch ASIC from being detected!


- Mellanox SB77XX/78XX series: 100G Infiniband switches with Switch-IB / Switch-IB 2 chip (no VPI, unlike the SX Series). x86 control plane, highest supported version: 3.9.3124
Very power efficient! The SB7700 needs only 53W at idle (one PSU)

- Mellanox SN2XXX series: 100G Ethernet switches with Spectrum chip (no VPI, unlike the SX Series). x86 control plane, highest supported version: apparently 3.10.4100 (LTS)
Very power efficient. The SN2700 needs only 51W at idle (one PSU), the SN2100 about 35W iirc (after the fans have spun down)
They come in different flavours, but generally you have:
- SN2700: 19" 32x100G, Celeron 1047U, mSATA SSD (TODO: Expand)
- SN2100: 9.5" (half-width), Atom C2XXX (possibly affected by AVR54 bug?), M.2 SATA SSD
... and more (TODO: Full model listing)


Tips & Tricks for working with those switches:

- SN2XXX / SB7XXX series: Replace your SSDs and make backups! The original Innodisk 3ME SSDs are prone to failure (one died while I was taking an image).
You can use whatever mSATA (apparently all models except the SN2100) or M.2 SATA (apparently only the SN2100) SSD you want. My go-to model is the Transcend 452T2 (e.g. TS128GMSA452T2)

- If you get an SN2XXX with ONIE, Cumulus or no OS, you can easily flash it to ONYX / MLNX-OS by taking a good MLNX-OS / ONYX image from another switch and patching the embedded database to the correct model number, number of ports, MAC Addresses, etc... (TODO: Guide)
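
For the "make backups" advice above, here is a minimal sketch of taking a raw image and a checksum at the same time. It assumes the SSD has been removed from the switch and attached to a Linux machine (e.g. via a USB adapter), that you run it with enough privileges to read the device, and that the device path is whatever your system assigns. The function names and paths are made up for this example.

```python
import hashlib


def image_disk(source_path, image_path, chunk_size=4 * 1024 * 1024):
    """Copy source_path to image_path in chunks; return the SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(source_path, "rb") as src, open(image_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break  # end of device / file reached
            dst.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()


def file_sha256(path, chunk_size=4 * 1024 * 1024):
    """Re-read a file and return its SHA-256 hex digest, to verify an image."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Usage would look like `image_disk("/dev/sdX", "switch-backup.img")` followed by comparing the result with `file_sha256("switch-backup.img")`. Keeping the checksum next to the image makes it easy to notice later if the backup itself has gone bad. Note that this simple loop stops at the first read error, so for an SSD that is already failing a dedicated recovery tool is the better choice.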

TODO: Expand
 

i386

Well-Known Member
Mar 18, 2016
3,939
1,408
113
34
Germany
Personally I would separate some topics into their own threads: the (m)sx 6000 series that are based on the switchx(2) asics in one thread, 100gbe (and 40+gbe that came after the sx6000 series) in another and 100Gbit/s+ infiniband in another thread

The sx6000 series is the last series to feature vpi*, requires licenses to enable features, is (was?) more readily available, and uses ppc for management
The sn2000 and newer series are ethernet only, don't require licenses for certain features**, and use x86 based cpus for management
the sb series switches are infiniband only***

* nvidia sells a gateway appliance, but that is a 2u server with 4 dual port vpi adapters
** I don't have one and rely on the information that i have seen here in the forums and some pdfs from mellanox
*** infiniband is interesting but has many downsides and support was removed for some widely used applications
 

Freebsd1976

Active Member
Feb 23, 2018
376
71
28
NablaSquaredG said:
> SN2XXX / SB7XXX series: Replace your SSDs and make backups! The original Innodisk 3ME SSDs are prone to failure (one died while I was taking an image).
> You can use whatever mSATA (apparently all except SN2100) or M.2 SATA (apparently only SN2100) you want. My go-to model is the Transcend 452T2 (e.g. TS128GMSA452T2)

If you recently upgraded Onyx or MLNX-OS, the SSD firmware will have been upgraded as well, so replacing the SSD may not be necessary.
In 2019 or 2020, Mellanox changed the SSD in the SN2100 and SN2010 to StorFly.

In addition, there is another serious issue concerning the console cable. I'm pasting the contents here in case the webpage disappears.
The switch's I2C bus hangs with Mellanox Serial Console cable (site.com)
Code:
The switch's I2C bus hangs with Mellanox Serial Console cable
Mar 8, 2020•Knowledge Article
The switch's I2C bus hangs with Mellanox Serial Console cable MLNX2-117-6533kn
Hardware issue in the current console cable.
A hardware issue with the provided console cable has been recently discovered and may affect the below switch systems.
The issue can cause the switch I2C bus to hang, resulting in a switch failure.
Systems may experience slowness or complete inability to function.
Affected switch systems list: SX1710 SX1410 SX6710 SB7700 SB7800 SN2700 SN2410 CS7500 CS7510 CS7520

The provided console cable may cause the internal unit's I2C bus to hang, causing the entire switch system to hang as well.
The unnecessary usage of pins in the current RJ45 to DB9 cable harness may lead to an I2C bus hang.
It is a hardware issue in the current console cable that is provided with the above list of switch systems.
Please use the provided console cable only for the initial and first switch configuration.
Once completed, disconnect the cable and perform the rest of the configurations by using the network. 
A fix will be implemented in the future manufacturing of the console cable.
 

Aluminat

Member
Jul 5, 2019
51
22
8
NablaSquaredG said:
> - SN2XXX / SB7XXX series: Replace your SSDs and make backups! The original Innodisk 3ME SSDs are prone to failure (one died while I was taking an image).

Do you happen to have a backup image file for the SN2410? I have a pair of these, but the SSDs failed during an upgrade.
 

NablaSquaredG

Aluminat said:
> Do you happen to have a backup image file for the SN2410? I have a pair of these, but the SSDs failed during an upgrade.

You can mod an SN2700 / SN2100 / SB7700 image (disk backup) to work on the SN2410. I will post the guide soon; in the meantime, you can write me a PM if you haven't been able to fix your SN2410 yet.
 

uncensoredtr

New Member
Sep 13, 2022
5
0
1
I think separating the topics into their own threads, as i386 suggested above, would be much better for clarity. The thread will get too messy to manage if it stays this general.
 

nasbdh9

Active Member
Aug 4, 2019
158
86
28

NablaSquaredG

> onyx and MLNX-OS are exactly the same thing, although there may be differences between old and new firmware distributed in minor releases
> for example, although the version number of MLNX-OS 3.10.5000 is higher, the actual firmware for onyx devices (ETH) is the "onyx" version 3.10.4006

That won't help him. They either need an installer version or someone who provides an image which can be modded to fit their switch ;)

Right now I'm a bit too busy to mod an SN2700 image for them, so maybe someone else can help.
 

nasbdh9

After you have the above files, you will be able to replace the SSD in the device at will, but the SSD health report will not work with a non-original SSD.

Please also be careful not to use UEFI boot to install these on SN2XXX devices: the system will think that the DMI information is incorrect, then try to update the BIOS and restart continuously, even though ONIE and the system itself support UEFI.
UEFI booting may work on newer devices, but I haven't tried it.

To enter the BIOS settings, press Ctrl+B during boot. If you are prompted for a password, it may be admin.
 

Freebsd1976

nasbdh9 said:
> After you have the above files, you will be able to replace the SSD in the device at will, but the SSD health report will not work with a non-original SSD.

Use an Innodisk or StorFly (mSATA / M.2 SATA) SSD as the replacement, then the SSD health report will work as usual.

> for example, although the version number of MLNX-OS 3.10.5000 is higher, the actual firmware for onyx devices (ETH) is the "onyx" version 3.10.4006

3.10.4206 LTS
 

awedio

Active Member
Feb 24, 2012
765
220
43
I have instructions on how to install Onyx (aka MLNX-OS) on an SN2700.
Mellanox support provided me with these instructions (long story).
 

nasbdh9

> Can you share them with me?

Uploaded a guide on MEGA

I recommend using DHCP and HFS (HTTP File Server); with those you can complete the reinstallation within 10 minutes. After running Embed ONIE, the device reboots into the ONIE install environment, where you then execute
onie-nos-install http://[IP]/X86_64-x.x.xxxx-installer.bin
Wait for the installation to complete and the device will restart automatically.
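
If you don't have HFS at hand, any plain HTTP server should do for handing the installer to ONIE. Below is a minimal sketch using Python's built-in http.server as a stand-in. The function names, the port and the example address are assumptions made for illustration, and the installer file name is left as the same placeholder as above.

```python
import functools
import http.server


def installer_url(host, filename, port=80):
    """Build the URL to pass to onie-nos-install on the switch."""
    suffix = "" if port == 80 else f":{port}"
    return f"http://{host}{suffix}/{filename}"


def make_server(directory, host="0.0.0.0", port=8080):
    """Create (but do not start) an HTTP server serving files from `directory`."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer((host, port), handler)


def serve_installer(directory, host="0.0.0.0", port=8080):
    """Serve `directory` until interrupted with Ctrl+C."""
    server = make_server(directory, host, port)
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        pass
    finally:
        server.server_close()
```

Put the installer .bin into a directory, call `serve_installer("/path/to/that/directory")` on a machine the switch can reach, and use `installer_url(...)` with that machine's IP to get the argument for onie-nos-install. Port 8080 is used here only because binding port 80 usually needs elevated privileges.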
 

dbTH

Member
Apr 9, 2017
146
57
28
What USB to serial adapter do you guys use to connect your laptop to the Mellanox switch serial console (especially on the SN2xxx series)? There's an article that talks about adapter selection (5 Steps for Selecting the Right USB to Serial adapter), but I'm not sure whether the recommended ones are really worth buying. I have an older adapter (pre-2012) with the older Prolific PL2303HXA chip lying around; however, its driver is no longer compatible with newer Windows versions. I'm looking to get an adapter with a newer chip and would like to hear your suggestions.