Mellanox Switches - Tips & Tricks


Koop

Well-Known Member
Jan 24, 2024
What do you mean by Firmware? BIOS? ASIC?

Those are included in the Network Operating System like Cumulus. How you acquire newer versions of Cumulus is a different thing.
Hmm, I guess everything? I am referring to the part of your OP at the top where you advise:
"3.10.4XXX (3.10.4404 as of 24th January 2024)
(DO NOT upgrade to 3.10.5000, 3.10.6004, 3.11.XXXX etc.)"


When you referred to it as firmware, I thought that meant it's something that lives outside the OS and has to be updated separately somehow. I see now that it's the version of ONYX you're referring to, right?

What I did was punch in the switch model on NVIDIA's site, and it listed which versions of Cumulus or ONYX I could install on it, as well as upgrade paths if it was already on an existing version. I see it allowed me to select up to 3.10.4504, which is in line with your recommendation. That makes sense now.
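If you are checking more than one switch, the version rule quoted above is easy to script. A minimal sketch, assuming "3.10.4XXX" simply means major 3, minor 10 and a build number from 4000 to 4999; the function names are hypothetical and "3.11.0000" is a placeholder for "any 3.11 release", not a real build.

```python
# Minimal sketch: check an Onyx version string against the advice quoted
# above (stay on 3.10.4XXX; avoid 3.10.5000, 3.10.6004, 3.11.XXXX).
# Assumption: "3.10.4XXX" means major 3, minor 10, build 4000-4999.


def parse_version(version: str) -> tuple[int, ...]:
    """Split a dotted version such as '3.10.4504' into (3, 10, 4504)."""
    return tuple(int(part) for part in version.strip().split("."))


def is_recommended_onyx(version: str) -> bool:
    """Return True if the version falls in the 3.10.4XXX series."""
    parts = parse_version(version)
    if len(parts) != 3:
        return False
    major, minor, build = parts
    return (major, minor) == (3, 10) and 4000 <= build < 5000


# "3.11.0000" is a placeholder for any 3.11 release, not a real build number.
for candidate in ["3.10.4404", "3.10.4504", "3.10.5000", "3.10.6004", "3.11.0000"]:
    print(candidate, "OK" if is_recommended_onyx(candidate) else "avoid")
```

Comparing integer tuples rather than raw strings avoids surprises like "3.10.10000" sorting before "3.10.4504" as text.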

What had originally added to my confusion is that there is a "firmware" selection over at NVIDIA that asks for a PSID. That made me think I was missing something as part of the update process. I guess if I gave it a PSID, it would just point me to the appropriate version of ONYX.

As for access to Cumulus and such... That isn't a problem for me. I would not have bought any hardware otherwise. I know how it goes.

Appreciate you all sharing knowledge with a noob.

If there are any recommendations or guides for switch configurations, I'd be interested to see them. I was able to set up a basic network and talk between hosts. If there's material out there on what I should/could do next, I would appreciate any links or guidance.

When I got to the latest Cumulus (well, the version that was recommended, which was 5.10.1), I noticed my fans went from always being ramped up to very quiet. So that was cool. Not so loud in my garage now.

For the record, my switch was manufactured in 2018 and had Cumulus 3.6.2 on it, with a full config for some IBM solution. The second switch (the one I can't get serial working on and made a topic about) supposedly came from the same environment. I got a replacement memory DIMM today to swap in that one to test it further. Will hopefully get time to work on that this evening.
 

NablaSquaredG

Bringing 100G switches to homelabs
Aug 17, 2020
that it's the version of ONYX you're referring to, right?
Yes, I have updated the initial post.

I got a replacement memory DIMM today to swap in that one to test it further. Will hopefully get time to work on that this evening.
You can try to swap in a working SSD from the other switch, configured so that it fetches an IP on the management port via DHCP. This way you can test whether it's just the serial port that's defective or the entire switch.
 

Koop

Well-Known Member
Jan 24, 2024
swap in a working SSD from the other switch, configured so that it fetches an IP on the management port via DHCP. This way you can test whether it's just the serial port that's defective or the entire switch.
That... Is a great idea. I will report back.
 

Koop

Well-Known Member
Jan 24, 2024
So far nobody here shared 3.10.4504, but it seems it was released this summer and had a fix for CVE-2024-0113...
Oh, I see... I assume distribution of software is a no-no? Hopefully nobody reaches out to me directly with such a request, as I would, of course, rightfully refuse to assist with anything questionable.
 

j0t4

New Member
Oct 18, 2024
Hello, greetings to all forum members and visitors. I'm pretty new to managing Mellanox switches, but I've learned a little so far. I'm facing this issue: I have two Mellanox MSX6720 switches which I'm trying to interconnect using these resources:
  1. Cable: Fiber patch cable, LC UPC to LC UPC, duplex, 2 fibers, multimode (OM4), riser (OFNR), 2.0 mm, tight-buffered, aqua, 100 m length.
  2. Transceivers: Cisco transceivers QSFP-40G-SR-BD.
The current port configuration and transceiver diagnostics of each switch are in the attached file.
However, if instead of connecting the two Mellanox switches to each other I connect one Mellanox to a Cisco Nexus 3132Q-X, the link works, as shown here:

test-ciscoN3000(config)# show interface ethernet 1/15
Ethernet1/15 is up
admin state is up, Dedicated Interface
Hardware: 40000 Ethernet, address: 0078.88dc.8920 (bia 0078.88dc.8920)
MTU 1500 bytes, BW 40000000 Kbit, DLY 10 usec
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, medium is broadcast
Port mode is access
full-duplex, 40 Gb/s, media type is 40G
Beacon is turned off
Auto-Negotiation is turned on FEC mode is Auto
Input flow-control is off, output flow-control is off
Auto-mdix is turned off
Rate mode is dedicated
Switchport monitor is off
EtherType is 0x8100
EEE (efficient-ethernet) : n/a
admin fec state is auto, oper fec state is off
Last link flapped 00:29:23
Last clearing of "show interface" counters never
1 interface resets
Load-Interval #1: 30 seconds
30 seconds input rate 9616 bits/sec, 8 packets/sec
30 seconds output rate 864 bits/sec, 0 packets/sec
input rate 9.62 Kbps, 8 pps; output rate 864 bps, 0 pps
Load-Interval #2: 5 minute (300 seconds)
300 seconds input rate 15824 bits/sec, 9 packets/sec
300 seconds output rate 504 bits/sec, 0 packets/sec
input rate 15.82 Kbps, 9 pps; output rate 504 bps, 0 pps
RX
1724 unicast packets 15009 multicast packets 18713 broadcast packets
35446 input packets 7584112 bytes
1085 jumbo packets 0 storm suppression packets
0 runts 0 giants 0 CRC 0 no buffer
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 watchdog 0 bad etype drop 0 bad proto drop 0 if down drop
0 input with dribble 0 input discard
0 Rx pause
TX
0 unicast packets 106 multicast packets 2748 broadcast packets
2854 output packets 211619 bytes
0 jumbo packets
0 output error 0 collision 0 deferred 0 late collision
0 lost carrier 0 no carrier 0 babble 0 output discard
0 Tx pause

If anyone has experienced the same or a similar issue, or has an idea why the link between the two Mellanox switches isn't working, I would really appreciate your help.
Thanks
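When comparing what each end of a link reports, it helps to pull the link state and error counters out of output like the above. A minimal Python sketch, assuming the NX-OS wording shown in this post (for example "0 CRC", "0 input error"); the Mellanox side's output is only in the attachment and likely uses different wording, so the patterns would need adapting there. The function names are hypothetical.

```python
import re

# Counters to extract. The names are taken from the NX-OS "show interface"
# output above; any counter missing from the text is simply skipped.
COUNTERS = ["input packets", "output packets", "runts", "giants", "CRC",
            "input error", "output error", "input discard", "output discard"]


def link_is_up(show_output: str) -> bool:
    """True if the first line reads like 'Ethernet1/15 is up'."""
    first_line = show_output.strip().splitlines()[0]
    return first_line.endswith(" is up")


def parse_counters(show_output: str) -> dict[str, int]:
    """Collect '<number> <counter name>' pairs, e.g. '0 CRC' -> {'CRC': 0}."""
    counters: dict[str, int] = {}
    for name in COUNTERS:
        match = re.search(r"(\d+) " + re.escape(name) + r"\b", show_output)
        if match:
            counters[name] = int(match.group(1))
    return counters


# Excerpt of the Nexus output posted above.
SAMPLE = """Ethernet1/15 is up
35446 input packets 7584112 bytes
0 runts 0 giants 0 CRC 0 no buffer
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 input with dribble 0 input discard
2854 output packets 211619 bytes
0 output error 0 collision 0 deferred 0 late collision
0 lost carrier 0 no carrier 0 babble 0 output discard
"""

counters = parse_counters(SAMPLE)
errors = {k: v for k, v in counters.items() if "packets" not in k and v > 0}
print("link up:", link_is_up(SAMPLE))
print("packets in/out:", counters["input packets"], counters["output packets"])
print("non-zero error counters:", errors or "none")
```

In this excerpt every error counter is zero, so the counters themselves give no hint; non-zero CRC or input errors on one side would usually point towards optics or cabling rather than configuration.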
 


BoGs

Member
Feb 18, 2019
I acquired an open-box, sealed HP SN2700 :) which came with ONIE already installed. I will check the SSD, replace it if it's one of the bad versions mentioned in this thread, and go through the recovery and update process.

If I don't need to do that because the SSD is a different model, I assume I could just install Onyx since ONIE is already set up. If so, would I need to step through the versions, or can I install the latest directly? I'm not sure whether the ASIC will like that. I did not see this mentioned anywhere in the thread.

Thanks in advance.
 

Civiloid

Member
Jan 15, 2024
Switzerland
would I need to go through the versions or can I install latest?
You should follow the suggested upgrade path, which comes from Mellanox and exists for a reason. There is a chance it would be fine to install the latest directly, but it is not designed for that, and you could miss some updates (e.g. to the BIOS or SSD firmware) that were included in the older versions.
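The idea behind an upgrade path can be sketched in a few lines: every checkpoint release newer than what is installed, and older than the target, gets installed in order before the target itself. A minimal Python sketch, assuming the checkpoints are known; the version numbers below are made-up placeholders, and the real list comes from NVIDIA's upgrade path for your model. The helper names are hypothetical.

```python
# Minimal sketch of a stepwise upgrade: every checkpoint release newer than
# the installed version (and older than the target) is installed in order.
# The checkpoint versions used below are made-up placeholders.


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.2.0' into (1, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def upgrade_steps(current: str, target: str, checkpoints: list[str]) -> list[str]:
    """Return the releases to install, oldest first, ending with the target."""
    cur, tgt = parse_version(current), parse_version(target)
    if tgt < cur:
        raise ValueError("target is older than the installed version")
    if tgt == cur:
        return []
    # Keep only checkpoints strictly between current and target, sorted.
    between = [c for c in sorted(checkpoints, key=parse_version)
               if cur < parse_version(c) < tgt]
    return between + [target]


# Hypothetical checkpoints, deliberately listed out of order.
CHECKPOINTS = ["2.0.0", "1.2.0", "1.5.0"]
print(upgrade_steps("1.0.3", "2.1.0", CHECKPOINTS))
```

Here the result is ['1.2.0', '1.5.0', '2.0.0', '2.1.0']: jumping straight to 2.1.0 would skip three releases, which is exactly where BIOS or SSD firmware updates bundled with an intermediate release could be missed.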
 

i386

Well-Known Member
Mar 18, 2016
Germany
C2P = (power) connectors to (network) port (aka rear to front airflow for tor setups)
P2C = (network) ports to (power) connector (front to rear airflow)
@NablaSquaredG could you add this to the first post? :D
 

klui

༺༻
Feb 3, 2019
C2P = (power) connectors to (network) port (aka rear to front airflow for tor setups)
P2C = (network) ports to (power) connector (front to rear airflow)
This isn't correct.

The terminology is described in the Note under Cable Installation:
  • C2P = connector to power (ports to PSU flow); AFO (air flow out of chassis [from PSU's perspective]) in Juniper nomenclature; no suffix for most other vendors
  • P2C = power to connector (PSU to port flow); AFI (air flow into chassis [from PSU's perspective]) in Juniper nomenclature; -R for most other vendors
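For anyone keeping an inventory of switches, PSUs and fan modules, a small lookup table keeps the two codes straight. This sketch only encodes the mapping in the post above (airflow described from the PSU's perspective); the dictionary layout and the describe() helper are illustrative, not taken from any NVIDIA tool.

```python
# Airflow codes as described in the post above (NVIDIA/Mellanox naming,
# with the Juniper and "most other vendors" equivalents it lists).
AIRFLOW = {
    "C2P": {
        "meaning": "connector to power",
        "flow": "ports to PSU",
        "juniper": "AFO",
        "other_vendors": "no suffix",
    },
    "P2C": {
        "meaning": "power to connector",
        "flow": "PSU to ports",
        "juniper": "AFI",
        "other_vendors": "-R",
    },
}


def describe(code: str) -> str:
    """Return a one-line description for an airflow code such as 'P2C'."""
    key = code.strip().upper()
    info = AIRFLOW.get(key)
    if info is None:
        return "unknown airflow code"
    return (f"{key}: {info['meaning']} ({info['flow']}), "
            f"Juniper {info['juniper']}, other vendors: {info['other_vendors']}")


print(describe("C2P"))
print(describe("p2c"))
```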
 

Conclude1189

New Member
Jun 6, 2022
My apologies if this is the wrong place to ask. Someone is trying to sell me a Mellanox SN2010 for 750 euros, and I was wondering if that’s a good price for the switch nowadays. I checked the manufacturer date, and it’s 2018-05-17. Do you think this is a reasonable price for it?

I’ve also heard that ONYX is going out of support, but I don’t fully understand what that means or how it might affect the price of these switches. Please keep in mind that I’m a complete newbie, and this is literally my first switch ever.
 

NablaSquaredG

Bringing 100G switches to homelabs
Aug 17, 2020
Someone is trying to sell me a Mellanox SN2010 for 750 euros, and I was wondering if that’s a good price
It’s a fair price. They usually go for around 1000€.


I checked the manufacturer date, and it’s 2018-05-17. Do you think this is a reasonable price for it?
The switch may be affected by the Intel Atom C2000 / AVR54 bug. Does it have a rev printed on the pull-out tag?
What OS does it have installed?


I’ve also heard that ONYX is going out of support, but I don’t fully understand what that means or how it might affect the price of these switches. Please keep in mind that I’m a complete newbie, and this is literally my first switch ever.
Onyx will not get any new features. I recommend Cumulus Linux instead. If you ask nicely, someone might send you a setup.
 

Civiloid

Member
Jan 15, 2024
Switzerland
The switch may be affected by the Intel Atom C2000 / AVR54 bug. Does it have a rev printed on the pull-out tag?
If the switch was manufactured after roughly the beginning of 2017, it should have at least a wire fix or PCB-level fix applied by the manufacturer, and if it was produced after 2018, there is a good chance that it has a newer stepping that is not affected.
 

Conclude1189

New Member
Jun 6, 2022
The switch may be affected by the Intel Atom C2000 / AVR54 bug. Does it have a rev printed on the pull-out tag?
I see `Rev: A4` on the pull-out tag. Is that what you mean?

What OS does it have installed?
Onyx 3.6.600

Onyx will not get any new features. I recommend Cumulus Linux instead. If you ask nicely, someone might send you a setup.
Perfect. So this means the switch is still usable and it'll get some updates/patches, right?
 

NablaSquaredG

Bringing 100G switches to homelabs
Aug 17, 2020
~ the beginning of 2017, it should have at least a wire-fix or PCB level fix applied by the manufacturer,
Does such a fix exist for the SN2000 series? I've never seen any evidence of it. I only know that it exists for the DX010.
 

Civiloid

Member
Jan 15, 2024
Switzerland
Does such a fix exist for the SN2000 series? Never seen any evidence of it. I only know that it exists for DX010
I have 3 SN2100s with C2000 C0 stepping, and those are the only ones of the SN2000 series I've seen. So to be honest, I don't know, but the idea behind the fix is the same for all of them, so it should be possible to apply even with minimal soldering skills.