Mellanox Switches - Tips & Tricks


NablaSquaredG

Layer 1 Magician
Aug 17, 2020
Similar to the glorious Brocade ICX Series (cheap & powerful 10gbE/40gbE switching) thread, I'd like to use this thread to compile knowledge about Mellanox switches that is spread over many places in the forum and can be hard for newbies to find.

I will expand this thread / post over time, so if you have some information that you'd like to have added, just write it in a comment.


As I don't have a lot of time right now (but want to get this done before I forget it), I'll just start with some basics



Overview of current Mellanox switch series that may be interesting for homelabs:

- Mellanox IS5000 Series: Old, Infiniband-only switches. They should generally be avoided.

- Mellanox SX Series: 40G / 56G switches with SwitchX / SwitchX-2 chip. They come in different flavours (unmanaged / managed, full width, half width, etc. TODO: Full Model Listing). They are very energy efficient (TODO: Exact numbers). Most of them are PowerPC, but some are x86.

Managed SX Series switches can do VPI. That means some ports can run Infiniband while others run Ethernet at the same time, and you can even use the integrated IPoIB gateway functionality.

Highest supported firmware version (for managed switches): 3.6.8012
DO NOT try to update to a version beyond that on x86 switches. It may brick your switch by automatically doing a BIOS update that prevents the switch ASIC from being detected!


- Mellanox SB77XX/78XX series: 100G Infiniband switches with Switch-IB / Switch-IB 2 chip (no VPI like SX Series). x86 control plane, highest supported version: 3.9.3124
Very power efficient! SB7700 needs only 53W at idle (one PSU)

- Mellanox SN2XXX series: 100G Ethernet switches with Spectrum chip (no VPI like SX series). x86 control plane, highest supported version: apparently 3.10.4100 (LTS)
Very power efficient. SN2700 needs only 51W at idle (one PSU), SN2100 about 35W iirc (after the fans have spun down)
They come in different flavours, but generally you have:
- SN2700: 19" 32x100G, Celeron 1047U, mSATA SSD (TODO: Expand)
- SN2100: 9.5" (half-width), Atom C2XXX (possibly affected by AVR54 bug?), M.2 SATA SSD
... and more (TODO: Full model listing)
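
If you want a quick sanity check before flashing, the version limits above can be put into a small lookup. This is only a sketch: the table just repeats the numbers from this post (it is not an official compatibility matrix), the series keys and function names are made up for illustration, and it assumes plain dotted numeric version strings. Note that versions must be compared numerically, since "3.10" is newer than "3.9" even though it sorts earlier as text.

```python
# Highest versions as listed in this post; not an official compatibility matrix.
MAX_VERSION_FROM_POST = {
    "SX": "3.6.8012",    # managed SX series
    "SB7": "3.9.3124",   # SB77XX/78XX series
    "SN2": "3.10.4100",  # SN2XXX series ("apparently", per the post)
}


def parse_version(text):
    """Turn '3.6.8012' into (3, 6, 8012) so comparison is numeric, not lexical."""
    return tuple(int(part) for part in text.strip().split("."))


def exceeds_listed_max(series, target):
    """Return True if `target` is newer than the highest version listed for `series`."""
    return parse_version(target) > parse_version(MAX_VERSION_FROM_POST[series])


if __name__ == "__main__":
    # "3.6.8100" is a hypothetical version number used only to show the check firing.
    for series, target in [("SX", "3.6.8012"), ("SX", "3.6.8100"), ("SN2", "3.9.3124")]:
        if exceeds_listed_max(series, target):
            print(series, target, "is beyond the listed maximum, do not flash")
        else:
            print(series, target, "is within the listed maximum")
```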


Tips & Tricks for working with those switches:

- SN2XXX / SB7XXX series: Replace your SSDs and make backups! The original Innodisk 3ME SSDs are prone to failure (one died while I was taking an image).
You can use whatever mSATA (apparently all except SN2100) or M.2 SATA (apparently only SN2100) SSD you want. My go-to model is the Transcend 452T2 (e.g. TS128GMSA452T2)

- If you get an SN2XXX with ONIE, Cumulus or no OS, you can easily flash it to ONYX / MLNX-OS by taking a good MLNX-OS / ONYX image from another switch and patching the embedded database to the correct model number, number of ports, MAC Addresses, etc... (TODO: Guide)
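
For the "make backups" part, here is a minimal sketch of taking a raw image and verifying it, assuming the SSD has been pulled from the switch and attached to a Linux machine (e.g. via a USB adapter). The device path and file name in the usage note are placeholders, and the function names are just for illustration. It hashes the data while reading it, then re-reads the written image to confirm that what landed on disk matches.

```python
import hashlib

CHUNK = 4 * 1024 * 1024  # read 4 MiB at a time


def sha256_of(path):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        while True:
            block = handle.read(CHUNK)
            if not block:
                break
            digest.update(block)
    return digest.hexdigest()


def image_disk(source, image):
    """Copy `source` to `image` and return the SHA-256 of the bytes read from `source`."""
    digest = hashlib.sha256()
    with open(source, "rb") as src, open(image, "wb") as dst:
        while True:
            block = src.read(CHUNK)
            if not block:
                break
            dst.write(block)
            digest.update(block)
    return digest.hexdigest()


def backup_and_verify(source, image):
    """Image the disk, then re-read the image and compare hashes."""
    source_hash = image_disk(source, image)
    if sha256_of(image) != source_hash:
        raise RuntimeError("image does not match what was read from the source")
    return source_hash
```

Usage would look something like `backup_and_verify("/dev/sdX", "sn2700-ssd.img")`, run as root, with `/dev/sdX` replaced by the actual device. This plain copy stops at the first read error, so for an SSD that is already failing a recovery tool such as ddrescue is probably the better choice.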

TODO: Expand
 

i386

Well-Known Member
Mar 18, 2016
Germany
Personally I would separate some topics into their own threads: the (m)sx 6000 series that are based on the switchx(2) asics in one thread, 100gbe (and 40+gbe that came after the sx6000 series) in another, and 100Gbit/s+ infiniband in a third thread

The sx6000 series is the last series to feature vpi*, requires licenses to enable features, is (was?) more readily available, and uses ppc for management
The sn2000 and newer series are ethernet only, don't require licenses for certain features**, and use x86 based cpus for management
the sb series switches are infiniband only***

* nvidia sells a gateway appliance, but that is a 2u server with 4 dual port vpi adapters
** I don't have one and rely on the information that i have seen here in the forums and some pdfs from mellanox
*** infiniband is interesting but has many downsides and support was removed for some widely used applications
 

Freebsd1976

Active Member
Feb 23, 2018
NablaSquaredG said:
SN2XXX / SB7XXX series: Replace your SSDs and make backups! The original Innodisk 3ME SSDs are prone to failure (one died while I was taking an image).
You can use whatever mSATA (apparently all except SN2100) or M.2 SATA (apparently only SN2100) SSD you want. My go-to model is the Transcend 452T2 (e.g. TS128GMSA452T2)

If you have recently upgraded Onyx or MLNX-OS, the disk firmware will have been upgraded as well, so replacing the SSD may not be necessary.
In 2019 or 2020, Mellanox changed the SSD in the SN2100 and SN2010 to StorFly.

In addition, there is another serious issue, this one with the console cable. I'm pasting the contents here in case the webpage disappears.
The switch's I2C bus hangs with Mellanox Serial Console cable (site.com)
Code:
The switch's I2C bus hangs with Mellanox Serial Console cable
Mar 8, 2020•Knowledge Article
The switch's I2C bus hangs with Mellanox Serial Console cable MLNX2-117-6533kn
Hardware issue in the current console cable.
A hardware issue with the provided console cable has been recently discovered and may affect the below switch systems.
The issue can cause the switch I2C bus to hang, resulting in a switch failure.
Systems may experience slowness or complete inability to function.
Affected switch systems list: SX1710 SX1410 SX6710 SB7700 SB7800 SN2700 SN2410 CS7500 CS7510 CS7520

The provided console cable may cause the internal unit's I2C bus to hang, causing the entire switch system to hang as well.
The unnecessary usage of pins in the current RJ45 to DB9 cable harness may lead to an I2C bus hang.
It is a hardware issue in the current console cable that is provided with the above list of switch systems.
Please use the provided console cable only for the initial and first switch configuration.
Once completed, disconnect the cable and perform the rest of the configurations by using the network. 
A fix will be implemented in the future manufacturing of the console cable.
 

Aluminat

Member
Jul 5, 2019
NablaSquaredG said:
- SN2XXX / SB7XXX series: Replace your SSDs and make backups! The original Innodisk 3ME SSDs are prone to failure (one died while I was taking an image).

Do you happen to have a backup image file for the SN2410? I have a pair of these, but the SSDs failed during an upgrade.
 

NablaSquaredG

Layer 1 Magician
Aug 17, 2020
Aluminat said:
Did you happen to have backup image file of SN2410? I have a pair this but during upgrade the SSDs failed.

You can mod an SN2700 / SN2100 / SB7700 image (disk backup) to work on an SN2410. I will post the guide soon; in the meantime you can write me a PM if you haven't already been able to fix your SN2410.
 

uncensoredtr

New Member
Sep 13, 2022
i386 said:
Personally I would separate some topics into their own threads: the (m)sx 6000 series that are based on the switchx(2) asics in one thread, 100gbe (and 40+gbe that came after the sx6000 series) in another, and 100Gbit/s+ infiniband in a third thread

The sx6000 series is the last series to feature vpi*, requires licenses to enable features, is (was?) more readily available, and uses ppc for management
The sn2000 and newer series are ethernet only, don't require licenses for certain features**, and use x86 based cpus for management
the sb series switches are infiniband only***

* nvidia sells a gateway appliance, but that is a 2u server with 4 dual port vpi adapters
** I don't have one and rely on the information that i have seen here in the forums and some pdfs from mellanox
*** infiniband is interesting but has many downsides and support was removed for some widely used applications

I think this would make the topic much clearer. A thread that is this general will get too messy to keep under control.