Mellanox Switches - Tips & Tricks


simplex6

New Member
Feb 1, 2025
16
2
3
Has anyone tried running the switch not only with a custom fan, but also with a custom PSU?

I'm curious whether I could just have a separate PSU grid, connect the PMBus lines together somehow, and feed both my servers and switches from it directly.
 

Blue)(Fusion

Active Member
Mar 1, 2017
162
62
28
Chicago
Does anyone have IPv6 working correctly on a management or VLAN interface? I have the following and it does not appear to work.

Code:
# ip addr
.....
2: mgmt0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether e4:1d:2d:cd:be:8a brd ff:ff:ff:ff:ff:ff
    inet 10.99.99.6/30 brd 10.99.99.7 scope global mgmt0
       valid_lft forever preferred_lft forever
    inet6 fd33:58bc:59a0:9991::6/126 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::e61d:2dff:fecd:be8a/64 scope link
       valid_lft forever preferred_lft forever
.....
Code:
# ip -6 route
fd33:58bc:59a0:9991::4/126 dev mgmt0  proto kernel  metric 256  
fe80::/64 dev mgmt0  proto kernel  metric 256
default via fe80::1 dev mgmt0  proto ra  metric 1024  expires 1749sec hoplimit 64
I can ping the router's link-local address:
Code:
# ping6 -I mgmt0 fe80::1
PING fe80::1(fe80::1) from fe80::e61d:2dff:fecd:be8a mgmt0: 56 data bytes
64 bytes from fe80::1: icmp_seq=1 ttl=64 time=0.343 ms
64 bytes from fe80::1: icmp_seq=2 ttl=64 time=0.309 ms
^C
--- fe80::1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.309/0.326/0.343/0.017 ms
I can't ping the router's ULA:
Code:
# ping6 fd33:58bc:59a0:9991::5
PING fd33:58bc:59a0:9991::5(fd33:58bc:59a0:9991::5) 56 data bytes
^C
--- fd33:58bc:59a0:9991::5 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1007ms
mgmt0 info:
Code:
# show int mgmt0

Interface mgmt0 status:
  Comment         : 
  Admin up        : yes
  Link up         : yes
  DHCP running    : no
  IP address      : 10.99.99.6
  Netmask         : 255.255.255.252
  IPv6 enabled    : yes
  Autoconf enabled: yes
  Autoconf route  : yes
  Autoconf privacy: no
  DHCPv6 running  : yes (but no valid lease)
  IPv6 addresses  : 2

  IPv6 address:
    fd33:58bc:59a0:9991::6/126
    fe80::e61d:2dff:fecd:be8a/64

  Speed           : 1000Mb/s (auto)
  Duplex          : full (auto)
  Interface type  : ethernet
  Interface source: physical
  MTU             : 1500
  HW address      : E4:1D:2D:CD:BE:8A
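As a side check, the link-local address in the output above looks like a standard modified EUI-64 address derived from the HW address. A minimal Python sketch of that derivation, assuming the usual rule (flip the universal/local bit of the first octet and insert ff:fe in the middle); the function name is made up for illustration:

```python
import ipaddress


def eui64_link_local(mac: str) -> ipaddress.IPv6Address:
    """Build the fe80::/64 address a modified EUI-64 host would derive from a MAC."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe between the two halves of the MAC to form the interface ID.
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    # fe80 followed by 48 zero bits, then the 64-bit interface ID.
    return ipaddress.IPv6Address(b"\xfe\x80" + bytes(6) + interface_id)


# HW address taken from the "show int mgmt0" output above.
print(eui64_link_local("E4:1D:2D:CD:BE:8A"))
```

The result matches the fe80:: address shown on mgmt0, so the link-local side appears to be plain autoconfiguration.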
mgmt0 interface config info:
Code:
##
## Network interface configuration
##
   interface ib0 ip address 0.0.0.0 /0
   interface ib0 shutdown
   no interface mgmt0 dhcp
   interface mgmt0 ip address 10.99.99.6 /30
   
##
## Network interface IPv6 configuration
##
   interface mgmt0 ipv6 address autoconfig
   interface mgmt0 ipv6 address fd33:58bc:59a0:9991::6/126
ICX port config:
Code:
interface ethernet 1/1/48
 port-name swmlnx-mgmt0
 route-only
 ip address 10.99.99.5 255.255.255.252
 ip ospf area 0
 ipv6 address fe80::1 link-local
 ipv6 address fd33:58bc:59a0:9991::5/126
 ipv6 enable
 ipv6 ospf area 0
 ipv6 dhcp-relay destination fd33:58bc:59a0:2301::10
 ipv6 dhcp-relay destination fd33:58bc:59a0:2301::11
 ipv6 dhcp-relay include-options interface-id remote-id
 ipv6 nd other-config-flag
 ipv6 nd prefix-advertisement fd33:58bc:59a0:9991::4/126 43200 21600 onlink
 no spanning-tree
 no flow-control
 stp-bpdu-guard
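For anyone wanting to sanity-check the addressing before digging deeper, here is a small Python sketch using only the standard `ipaddress` module. It only confirms that the switch and router addresses from the configs above sit inside the same /126 and /30; it does not diagnose why the ULA ping fails. The helper name `same_link` is made up for illustration.

```python
import ipaddress

# Values copied from the switch and ICX configs above.
prefix6 = ipaddress.ip_network("fd33:58bc:59a0:9991::4/126")
switch6 = ipaddress.ip_address("fd33:58bc:59a0:9991::6")
router6 = ipaddress.ip_address("fd33:58bc:59a0:9991::5")

prefix4 = ipaddress.ip_interface("10.99.99.6/30").network
switch4 = ipaddress.ip_address("10.99.99.6")
router4 = ipaddress.ip_address("10.99.99.5")


def same_link(network, *addresses):
    """Return True if every address falls inside the given network."""
    return all(addr in network for addr in addresses)


print("IPv6 on-link:", same_link(prefix6, switch6, router6))
print("IPv4 on-link:", same_link(prefix4, switch4, router4))
print("IPv6 /126 range:", prefix6[0], "-", prefix6[-1])
```

Both checks pass, so the static addressing itself looks consistent, and the cause is presumably somewhere other than a prefix mismatch.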
 

Cheburashka

New Member
Aug 10, 2020
17
0
1
Does anybody know what the packet buffer size is on an SX6036?

ChatGPT is saying 128MB per port group, but I cannot find any documentation that lists this.

I'm comparing to Arista 40GbE capable switches that list 32MB, so I doubt the SX6036 really has 128MB.

Does anybody know what it is? It seems that Mellanox is secretive about their buffer sizes; not sure why.
 

NablaSquaredG

Bringing 100G switches to homelabs
Aug 17, 2020
1,854
1,234
113
I'm comparing to Arista 40GbE capable switches that list 32MB
The Aristas have 12MB (or 16MB for Trident2+) shared buffer per ASIC. Most Arista 7050X have only one ASIC.

SX series:
Per Port fixed size 128 KB for 40/56 Gb/s ports and 64 KB for SFP+ ports
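
To put the per-port figure next to the shared-buffer numbers above, here is a rough back-of-the-envelope calculation in Python. It assumes an SX6036 with all 36 ports counted at the 40/56 Gb/s figure and treats 1 KB as 1024 bytes; the function name is made up for illustration.

```python
KIB_PER_MIB = 1024

PORTS_SX6036 = 36    # assumption: all 36 QSFP ports counted at the 40/56 Gb/s figure
PER_PORT_KIB = 128   # per-port buffer quoted above for 40/56 Gb/s ports


def total_buffer_mib(ports: int, per_port_kib: int) -> float:
    """Sum of the fixed per-port buffers, expressed in MiB."""
    return ports * per_port_kib / KIB_PER_MIB


print(f"SX6036 aggregate: {total_buffer_mib(PORTS_SX6036, PER_PORT_KIB)} MiB")
```

That works out to 4.5 MiB in aggregate. It is not an apples-to-apples comparison with a 12 MB shared buffer, because a fixed per-port buffer cannot be borrowed by a congested neighbour port.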
 

Cheburashka

New Member
Aug 10, 2020
17
0
1
The Aristas have 12MB (or 16MB for Trident2+) shared buffer per ASIC. Most Arista 7050X have only one ASIC.

SX series:
Per Port fixed size 128 KB for 40/56 Gb/s ports and 64 KB for SFP+ ports
Thank you,

I was looking more at the Arista 7280 series. I need at most 6 40GbE ports for a 5-node Proxmox/Ceph build, and from some discussions on reddit, I should be aiming for a 32-64MB or larger buffer.

Back to the drawing board.
 

NablaSquaredG

Bringing 100G switches to homelabs
Aug 17, 2020
1,854
1,234
113
I was looking more at the Arista 7280 series. I need at most 6 40GbE ports for a 5-node Proxmox/Ceph build, and from some discussions on reddit, I should be aiming for a 32-64MB or larger buffer.
That’s nonsense.

7280SE are a dead end and a space heater, avoid them.
7280R are too expensive.

Go for a 7050QX-32S (the S is important) and be happy :)
 

calmserene

New Member
Jun 27, 2025
1
0
1
Another question regarding Mellanox SX6012:
- I'm planning to connect this switch to a Mikrotik CCR2116-12G-4S+ with a QSFP+ to 4xSFP+ breakout cable (the Mikrotik has 4 x SFP+ ports) and bond the interfaces to get 40Gbit between the two devices. In theory it should work. Does anyone have experience with a similar setup?
Thanks
I am wondering how loud the CCR2116-12G-4S+ is.
 

BoGs

Active Member
Feb 18, 2019
167
39
28
I am wondering how loud the CCR2116-12G-4S+ is.
This is the setup I'm running: a CCR2116 with MLAG to 2x SX6036. It is about as quiet as the SX6036: you hear the airflow but no fan whine. Remember that any single connection will max out at 10G; 4x10G != 1x40G.
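
A toy Python sketch of why a bond behaves that way: the sender hashes each flow onto one member link, so a single flow never spreads across all four. The hash here (CRC32 over a made-up 4-tuple) is only a stand-in; real devices use their own hash, and which header fields feed it depends on the configuration.

```python
import zlib

LINKS = 4        # 4 x SFP+ members in the bond
LINK_GBPS = 10   # speed of each member link


def pick_link(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
              links: int = LINKS) -> int:
    """Map a flow to one member link. CRC32 is an illustrative stand-in hash."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % links


# Hypothetical flow: every packet of it hashes to the same member link.
flow = ("10.0.0.1", "10.0.0.2", 50000, 5201)
links_used = {pick_link(*flow) for _ in range(1000)}

print("member links used by one flow:", len(links_used))
print("ceiling for that flow:", LINK_GBPS, "Gbit/s")
```

Many parallel flows can still add up to more than 10G in aggregate, as long as they hash onto different members.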
 

Rakkzi

New Member
Sep 14, 2024
5
0
1
- To download firmware from a USB stick, use image fetch scp://admin:admin@127.0.0.1/var/mnt/usb1/image-X86_64-3.10.4404.img
I was running into problems updating my SX1024 over the webUI (I kept getting a connection reset error after a few minutes). SFTP failed the same way, so I put the firmware files on a FAT32 USB drive and plugged it into the back. I can see the USB drive appear in the logs, but both the command above and the webUI equivalent (the update field set to scp://admin@localhost/var/mnt/usb1/image.img, with my admin password in the password field) keep telling me there's no such file or directory. The file(s) are in the root of the only partition (4GB) on the USB drive. I tried both the webUI and image fetch in the terminal, but no dice so far. What am I missing?

Edit, so anyone else who finds this knows: I originally tried to upgrade from 3.3 straight to 3.6, which didn't work, so I switched to incremental upgrades (i.e. 3.3 -> 3.4 -> 3.5 -> 3.6). When 3.4 still wasn't working, I switched from the firmware images in the Lenovo archive to the images from Nvidia directly, and they worked perfectly. I took the https://www.mellanox.com/downloads/Software/image-PPC_M460EX-3.6.8012.img link and changed the version to the one I needed, e.g. for 3.4 it became https://www.mellanox.com/downloads/Software/image-PPC_M460EX-3.4.2008.img.

I was able to do it through the webUI, so I never figured out what was up with the USB, but there you go.
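
The URL rewriting described in the edit can be sketched in a few lines of Python. The base URL and file-name pattern are taken from the two links above; the helper name is made up, and only the two build numbers mentioned in the post are used, since the 3.5 build number isn't given here.

```python
BASE_URL = "https://www.mellanox.com/downloads/Software/"


def image_url(version: str, arch: str = "PPC_M460EX") -> str:
    """Build a download URL following the pattern of the links in the post."""
    return f"{BASE_URL}image-{arch}-{version}.img"


# Only the versions named in the post; fill in the intermediate ones yourself.
for version in ["3.4.2008", "3.6.8012"]:
    print(image_url(version))
```

Whether a given version actually exists at that location is something to check by hand; the sketch only reproduces the naming pattern.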
 

BoGs

Active Member
Feb 18, 2019
167
39
28
I have an SN2700 that randomly starts ramping up its fans. The highest temperature reading is 35C, which I am guessing is a threshold that triggers a recalculation and raises the fan speed from 30ish percent (6k rpm) to 8kish rpm. Has anyone else noticed this, even after running the fae command? My room temperature oscillates between 19-22C, which makes the switch range from 32-36C depending on the day.
 

cy384

Member
Aug 19, 2022
28
28
13
cy384.com
I finally acquired an SN2010 and started messing around with the various OS options. Just for fun, I stuck OpenWRT on it. There are instructions on their wiki to install the firmware and kernel modules needed for the switch, generally pretty easy. There were a few extra tweaks, like I had to write a script to get usable names for the ports, mess with the default bridges, disable dhcp, edit some minor config files to get things to show up in the web UI. Overall, works fine, since it's just Linux. Mellanox did an astonishingly good job of upstreaming all their code AND documenting how to configure stuff (on github), so I think everything you can do in Onyx is doable without it (though probably less convenient). Even fans, LEDs, and temperatures are all working perfectly.

Anyway, it's a weirdly viable option, especially if you're familiar with OpenWRT already.
 

Rakkzi

New Member
Sep 14, 2024
5
0
1
I'm trying to figure out how to enable VPI on my SX1024; it seems the IB license key doesn't enable it, though.
 

Rakkzi

New Member
Sep 14, 2024
5
0
1
Only Ethernet can be loaded; the SX1024 has no corresponding VPI model (SX60**) that it can be converted to.
Really? This goes against what the Nvidia whitepaper says:
"Through a software upgrade, the SX1024 switch system is ready for adding InfiniBand functionality and enabling a Virtual Protocol Interconnect (VPI) gateway"

The interface also mentions needing a VPI profile. Would it really show these even if VPI is impossible to enable?
[Attached screenshot: 1751765327327.png]
 

up3up4

Member
Jun 10, 2018
92
33
18
I finally acquired an SN2010 and started messing around with the various OS options. Just for fun, I stuck OpenWRT on it. There are instructions on their wiki to install the firmware and kernel modules needed for the switch, generally pretty easy. There were a few extra tweaks, like I had to write a script to get usable names for the ports, mess with the default bridges, disable dhcp, edit some minor config files to get things to show up in the web UI. Overall, works fine, since it's just Linux. Mellanox did an astonishingly good job of upstreaming all their code AND documenting how to configure stuff (on github), so I think everything you can do in Onyx is doable without it (though probably less convenient). Even fans, LEDs, and temperatures are all working perfectly.

Anyway, it's a weirdly viable option, especially if you're familiar with OpenWRT already.
Is the dual-core 1.4GHz Celeron powerful enough to handle a 3Gbit WAN?
 

blunden

Well-Known Member
Nov 29, 2019
981
314
63
Is the dual-core 1.4GHz Celeron powerful enough to handle a 3Gbit WAN?
I kind of doubt it. I don't think he flashed OpenWrt on it to use it as a router. It was presumably only a way to get switch functionality with an up-to-date software stack. :)