Mellanox Switches - Tips & Tricks


NablaSquaredG

Bringing 100G switches to homelabs
Aug 17, 2020
Similar to the glorious Brocade ICX Series (cheap & powerful 10GbE/40GbE switching) thread, I'd like to use this thread to compile knowledge about Mellanox switches that is currently spread over many places in the forum and can be hard for newbies to find.

I will expand this thread / post over time, so if you have some information that you'd like to have added, just write it in a comment.

(MAXIMUM) SUPPORTED MLNX-OS / ONYX VERSIONS:

Upgrading beyond these versions or using other versions may brick your switch and require a recovery procedure!


Name / Types | Version
SwitchX / SwitchX-2 PowerPC (SX6036, SX6012, SX1016, etc.) | 3.6.8012 (no newer build for PowerPC available)
SwitchX / SwitchX-2 x86 (SX6710, SX1410, etc.) | 3.6.8012 (upgrading beyond WILL brick your switch's BIOS)
SwitchIB (SB7700) | 3.9.3124
Spectrum x86 (SN2100, SN2700, SN2010, etc.) | 3.10.4XXX (3.10.4404 as of 24th January 2024); DO NOT upgrade to 3.10.5000, 3.10.6004, 3.11.XXXX etc.
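As a rough illustration of how to read the table, here is a minimal Python sketch that checks a target version against the upper limits listed above. The family keys (`switchx-ppc` etc.) and function names are made up for this example, and the Spectrum limit is encoded as "anything in 3.10.4XXX". It only checks the upper bound; it does not tell you whether a particular lower build is a good choice.

```python
# Upper limits copied from the version table above.
# The family keys are hypothetical labels chosen for this sketch.
MAX_SAFE = {
    "switchx-ppc": (3, 6, 8012),
    "switchx-x86": (3, 6, 8012),
    "switch-ib": (3, 9, 3124),
    "spectrum-x86": (3, 10, 4999),  # any 3.10.4XXX build; 3.10.5000 and later are excluded
}


def parse_version(text):
    """Turn '3.10.4404' into (3, 10, 4404) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


def is_safe_target(family, version):
    """Return True if `version` does not exceed the table's limit for `family`."""
    return parse_version(version) <= MAX_SAFE[family]
```

For example, `is_safe_target("spectrum-x86", "3.10.4404")` is accepted, while `"3.10.5000"` and `"3.11.1002"` are rejected. Comparing tuples of integers matters here, because a plain string comparison would sort 3.9 after 3.10.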



As I don't have a lot of time right now (but want to get this done before I forget it), I'll just start with some basics.



Overview of current Mellanox switch series that may be interesting for homelabs:

- Mellanox IS5000 Series: Old, InfiniBand-only switches. They should generally be avoided.


- Mellanox SX Series: 40G / 56G switches with SwitchX / SwitchX-2 chip.

They come in different flavours (unmanaged / managed, full width, half width, etc... TODO: Full Model Listing).
They are very energy efficient (TODO: Exact numbers). Most of them are PowerPC, but some are x86.

Managed SX Series switches can do VPI. That means you can have some ports do InfiniBand and some do Ethernet at the same time, and even use the integrated IPoIB gateway functionality.

Highest supported firmware version (for managed switches): 3.6.8012
DO NOT try to update to a version beyond that on x86 switches. It may brick your switch by automatically doing a BIOS update that prevents the switch ASIC from being detected!



- Mellanox SB77XX/78XX series: 100G InfiniBand switches with Switch-IB / Switch-IB 2 chip (no VPI, unlike the SX Series). x86 control plane, highest supported version: 3.9.3124
Very power efficient! The SB7700 needs only 53W at idle (one PSU).


- Mellanox SN2XXX series: 100G Ethernet switches with Spectrum chip (no VPI, unlike the SX series). x86 control plane, highest supported version: apparently the 3.10.4XXX (LTS) branch, e.g. 3.10.4100; see the version table above for the latest known build.
Very power efficient. The SN2700 needs only 51W at idle (one PSU), the SN2100 about 35W IIRC (after the fans have spun down).
They come in different flavours, but generally you have:

Full Width:
- SN2700: 19" 32x100G, Celeron 1047U, mSATA SSD (TODO: Expand)
- SN2410: 19", 48x25G + 8x100G, Celeron 1047U, mSATA SSD (TODO: Expand)

Half Width:
- SN2100: 9.5" (half-width) 16x100G, Atom C2XXX (affected by AVR54 bug, depending on production date), M.2 SATA SSD (M.2 2242), no normal USB port (needs Mini-USB to USB OTG adapter)
- SN2010: 9.5" (half-width) 18x25G + 4x100G, Atom C2XXX (affected by AVR54 bug, depending on production date), M.2 SATA SSD (M.2 2242)

..and more?



Tips & Tricks for working with those switches:

- SN2XXX / SB7XXX series: Replace your SSDs and make backups! The original Innodisk 3ME3 / 3IE1 / StorFly VSF302XC016G-MLX SSDs are prone to failure. There are firmware updates, but they do not help.
Apparently, there are fixed versions: StorFly VSF302XC016G-MLX1, Innodisk DHM24-16GD09BC1DC-92

Recommended replacement models:

Full Width (SN2700, SN2410): mSATA -> Transcend TS128GMSA452T-I (industrial variant, higher temperature range) or TS128GMSA452T. Other capacities such as TS64GMSA452T, TS64GMSA452T-I, TS256GMSA452T, TS256GMSA452T-I, etc. will work too, but the 128GB model is usually the best choice (enough capacity at a good price per GB). I recommend the -I industrial version for its higher temperature range; it costs only about 2 bucks more than the standard version.

Half Width (SN2100, SN2010): M.2 2242 SATA -> TS128GMTS552T-I (Industrial Variant, higher temperature range) or TS128GMTS552T
Same as above: Other capacities will work too

- If you get an SN2XXX with ONIE, Cumulus or no OS, you can easily flash it to ONYX / MLNX-OS by using the setups linked later in the thread (TODO: Guide)

- When installing / recovering the half-width series (SN2100, SN2010), DO NOT USE UEFI BOOT.
The control plane with the Atom C2000 needs legacy boot. Otherwise it will get stuck in an infinite loop trying to update the BIOS, because it cannot read the DMI and the auto BIOS update script is terribly written.

- When upgrading SN2000 switches from very old firmware, do not take too large steps.
There is at least one known issue when upgrading from v3.7 to v3.10, where the v3.10 OS will not be able to find the ASIC and will be stuck in a loop.
The ASIC has its own EEPROM with firmware, which is upgraded when upgrading the OS. However, the v3.10 OS seems to be incompatible with the ASIC firmware from v3.7.

- The serial terminal sometimes behaves a bit weirdly when editing long commands. To fix this, use: cli session terminal resize

- To download firmware from a USB stick, use: image fetch scp://admin:admin@127.0.0.1/var/mnt/usb1/image-X86_64-3.10.4404.img
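The USB fetch command above has a fixed shape (scp to the switch's own loopback address, pointing at the USB mount), so it can be sketched as a small Python helper that just builds the CLI line. The function name and parameters are hypothetical. The defaults (admin / admin, /var/mnt/usb1) are simply the values from the command above and may differ on your switch, and the sketch assumes the credentials contain no URL-special characters.

```python
def usb_image_fetch_command(image_name, user="admin", password="admin",
                            usb_mount="/var/mnt/usb1"):
    """Build the 'image fetch' CLI line that copies an image from the USB stick
    by scp-ing from the switch to itself (127.0.0.1)."""
    # Assumption: user and password need no URL escaping.
    return f"image fetch scp://{user}:{password}@127.0.0.1{usb_mount}/{image_name}"
```

Calling `usb_image_fetch_command("image-X86_64-3.10.4404.img")` reproduces the exact line from the tip; paste the result into the switch CLI.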

TODO: Expand
 

i386

Well-Known Member
Mar 18, 2016
Personally I would separate some topics into their own threads: the (M)SX6000 series that are based on the SwitchX(-2) ASICs in one thread, 100GbE (and the 40+GbE that came after the SX6000 series) in another, and 100Gbit/s+ InfiniBand in another thread.

The SX6000 series is the last series to feature VPI*, requires licenses to enable features, is (was?) more readily available, and uses PPC for management.
The SN2000 and newer series are Ethernet only, don't require licenses for certain features**, and use x86-based CPUs for management.
The SB series switches are InfiniBand only***.

* nvidia sells a gateway appliance, but that is a 2u server with 4 dual port vpi adapters
** I don't have one and rely on the information that i have seen here in the forums and some pdfs from mellanox
*** infiniband is interesting but has many downsides and support was removed for some widely used applications
 

Freebsd1976

Active Member
Feb 23, 2018
NablaSquaredG said: "SN2XXX / SB7XXX series: Replace your SSDs and make backups! The original Innodisk 3ME SSDs are prone to failure (one died while I was taking an image).
You can use whatever mSATA (apparently all except SN2100) or M.2 SATA (apparently only SN2100) you want. My go-to model are Transcend 452T2 (e.g. TS128GMSA452T2)"
If you recently upgraded Onyx or MLNX-OS, the disk firmware will have been upgraded as well, so replacing the SSD may not be necessary.
In 2019 or 2020, Mellanox changed the SN2100 and SN2010 SSDs to StorFly.

In addition, there is another serious issue concerning the console cable. I'm pasting the contents here, just in case the webpage disappears.
The switch's I2C bus hangs with Mellanox Serial Console cable (site.com)
Code:
The switch's I2C bus hangs with Mellanox Serial Console cable
Mar 8, 2020•Knowledge Article
The switch's I2C bus hangs with Mellanox Serial Console cable MLNX2-117-6533kn
Hardware issue in the current console cable.
A hardware issue with the provided console cable has been recently discovered and may affect the below switch systems.
The issue can cause the switch I2C bus to hang, resulting in a switch failure.
Systems may experience slowness or complete inability to function.
Affected switch systems list: SX1710 SX1410 SX6710 SB7700 SB7800 SN2700 SN2410 CS7500 CS7510 CS7520

The provided console cable may cause the internal unit's I2C bus to hang, causing the entire switch system to hang as well.
The unnecessary usage of pins in the current RJ45 to DB9 cable harness may lead to an I2C bus hang.
It is a hardware issue in the current console cable that is provided with the above list of switch systems.
Please use the provided console cable only for the initial and first switch configuration.
Once completed, disconnect the cable and perform the rest of the configurations by using the network. 
A fix will be implemented in the future manufacturing of the console cable.
 

Aluminat

Member
Jul 5, 2019
NablaSquaredG said: "- SN2XXX / SB7XXX series: Replace your SSDs and make backups! The original Innodisk 3ME SSDs are prone to failure (one died while I was taking an image)."
Do you happen to have a backup image file for the SN2410? I have a pair of these, but the SSDs failed during an upgrade.
 

NablaSquaredG

Aluminat said: "Do you happen to have a backup image file for the SN2410? I have a pair of these, but the SSDs failed during an upgrade."
You can mod an SN2700 / SN2100 / SB7700 image (disk backup) to work on the SN2410. I will post the guide soon. In the meantime, you can write me a PM if you haven't been able to fix your SN2410 yet.
 

uncensoredtr

New Member
Sep 13, 2022
i386 said: "Personally I would separate some topics into their own threads [...]"
I think this would be much better for clarity. The thread will get too messy to manage if it is too generalized.
 

nasbdh9

Active Member
Aug 4, 2019

NablaSquaredG

nasbdh9 said: "Onyx and MLNX-OS are exactly the same thing, although there may be differences between old and new firmware distributed in minor releases.
For example, although the version number of MLNX-OS 3.10.5000 is higher, the actual firmware for Onyx devices (ETH) is 'Onyx' version 3.10.4006."
That won't help them. They need either an installer version or someone who can provide an image that can be modded to fit their switch ;)

Right now I'm a bit too busy to mod an SN2700 image for them, so maybe someone else can help.
 

nasbdh9

After you have the above files, you will be able to replace the SSD on the device at will, but the SSD health report will not work with a non-original SSD.

Please also be careful not to use UEFI boot when installing these on an SN2XXX device: the system will think that the DMI information is incorrect, then try to update the BIOS and restart continuously, even though ONIE and the system itself support UEFI.
UEFI booting may be available on newer devices, but I haven't tried it.

To enter the BIOS settings, press Ctrl+B during boot. If you are prompted for a password, it may be admin.
 

Freebsd1976

nasbdh9 said: "After you have the above files, you will be able to replace the SSD on the device at will, but the non-original SSD will not make the SSD health report take effect."
Use an Innodisk or StorFly (mSATA / M.2 SATA) SSD as the replacement, then SSD health will work as usual.

nasbdh9 said: "example, although the version number of MLNX-OS 3.10.5000 is higher, the actual firmware for onyx devices (ETH) is the version on 'onyx' 3.10.4006"
3.10.4206 LTS
 

awedio

Active Member
Feb 24, 2012
I have instructions on how to install Onyx (aka MLNX-OS) on an SN2700.
Mellanox support provided me with these instructions (long story).
 

nasbdh9

Can you share them with me?
Uploaded a guide on MEGA

I recommend using DHCP and HFS (HTTP File Server); with that you can complete the reinstallation within 10 minutes. After "Embed ONIE", the device reboots into the ONIE install environment, where you then execute:
onie-nos-install http://[IP]/X86_64-x.x.xxxx-installer.bin
Wait for the installation to complete and the device will restart automatically.
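If HFS is not at hand, any plain HTTP server should do the same job. The following is a minimal sketch using Python's standard `http.server` module as a stand-in for HFS (this is a substitution, not what the post above used). The function names, the port, and the example IP address are assumptions made for illustration; the installer file name keeps the placeholder from the command above.

```python
import functools
import http.server


def onie_install_command(host_ip, installer_name, port=80):
    """Build the onie-nos-install line for an installer served over HTTP."""
    # Leave the port out of the URL when it is the HTTP default.
    netloc = host_ip if port == 80 else f"{host_ip}:{port}"
    return f"onie-nos-install http://{netloc}/{installer_name}"


def serve_directory(directory, port=8000):
    """Serve the files in `directory` over HTTP until interrupted with Ctrl+C."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    with http.server.ThreadingHTTPServer(("", port), handler) as httpd:
        httpd.serve_forever()
```

A possible usage: put the installer into a folder, call `serve_directory("/path/to/folder", port=8000)` on your laptop, and run the line returned by `onie_install_command("192.0.2.10", "X86_64-x.x.xxxx-installer.bin", port=8000)` in the ONIE shell, with 192.0.2.10 replaced by your laptop's address.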
 

dbTH

Member
Apr 9, 2017
What USB to serial adapter do you guys use to connect your laptop to the Mellanox switch serial console (especially the SN2xxx series)? There's an article that talks about adapter selection (5 Steps for Selecting the Right USB to Serial adapter), but I'm not sure if it is really worth buying the recommended ones. I have an older adapter (pre-2012) with an older Prolific PL2303HXA chip lying around; however, the adapter driver is no longer compatible with newer Windows versions. I'm looking to get an adapter with a newer chip and would like to hear your suggestions.