Mellanox (NVIDIA) NIC reliability & support contracts

lunadesign

Active Member
Aug 7, 2013
220
27
28
I generally buy Intel NICs but am looking to buy some ConnectX-5 NICs (new, not eBay) for the first time for use in a new TrueNAS box.

While the Intel NICs have "limited lifetime" warranties and unlimited access to firmware/software updates, it appears the ConnectX cards only have a 1-year warranty. It also appears you need an active support contract to get firmware/software updates. The support contracts also appear to extend the warranty's hardware coverage. Luckily, the "Silver" support contracts aren't super pricey.

Questions:
1) Are the ConnectX cards generally reliable?
2) Do people buying new ConnectX cards usually buy support contracts with them?
 

i386

Well-Known Member
Mar 18, 2016
2,993
959
113
33
Germany
1) From personal experience I would say yes. My oldest nic is a connect-x2 in a windows 10 workstation and still runs fine.
2) I don't think people buy "retail" mellanox nics, but get them in systems which are covered by support contracts/sla. (All my mellanox nics are from hp(e) servers)
 

klui

Active Member
Feb 3, 2019
545
241
43
You can obtain Mellanox NIC firmware from their website at Firmware Downloads without a service contract. I'm not sure why you thought that way. Maybe because of switch firmware? That's another business model.

While I like Mellanox cards and their general ability to cross-flash, their controllers do sometimes fail. I've run into one failure myself, not caused by anything I did, and I have read online that several others have experienced the same. I'm not sure of the percentage, but it's probably low. Maybe that's because many of the people who cross-flash them do it incorrectly.
 

lunadesign

1) From personal experience I would say yes. My oldest nic is a connect-x2 in a windows 10 workstation and still runs fine.
Good to know. Thanks!

2) I don't think people buy "retail" mellanox nics, but get them in systems which are covered by support contracts/sla. (All my mellanox nics are from hp(e) servers)
While that's very likely the prevalent way people get these NICs, I definitely see quite a few for sale through the normal retail channels. It makes sense that if you get it as part of a system, the system's support contract will cover it. But if you don't, 1 year seems awfully short for enterprise gear. On the flip side, making extra support years an option might mean the "base" price can be cheaper for those who don't care or who keep spares to self-warranty, I guess.
 

lunadesign

You can obtain Mellanox NIC firmware from their website at Firmware Downloads without a service contract. I'm not sure why you thought that way. Maybe because of switch firmware? That's another business model.
Well now I'm completely embarrassed as I see it's super easy to download firmware from that page.

The reason why I was thinking a support contract was needed was from the "bubble help" next to "Recommended Support" on store.mellanox.com pages like this one.

While I like Mellanox cards and their general ability to cross-flash, their controllers do sometimes fail. I've run into one failure myself, not caused by anything I did, and I have read online that several others have experienced the same. I'm not sure of the percentage, but it's probably low. Maybe that's because many of the people who cross-flash them do it incorrectly.
OK, that's good to know. Thanks!
 

tinfoil3d

QSFP28
May 11, 2020
547
191
43
Japan
While it may be tricky right now to get firmware for ConnectX-3 for the newest OSes, that doesn't apply since you're looking at current, new cards. CX2 and CX3 cards are still going strong after many years. What's more, Mellanox isn't as picky as Intel and takes just about any module, without any reverse engineering or patching of the flash.
And they also have some nice RDMA-related features. You can't go wrong with it.
 

jpmomo

Active Member
Aug 12, 2018
437
140
43
Well now I'm completely embarrassed as I see it's super easy to download firmware from that page.

The reason why I was thinking a support contract was needed was from the "bubble help" next to "Recommended Support" on store.mellanox.com pages like this one.


OK, that's good to know. Thanks!
Up until recently, Mellanox was the main vendor for 100G NICs. When you buy a Dell, HPE or Lenovo server and want a 100G NIC, it is usually a Mellanox (sometimes Broadcom). Intel's E810 series is relatively new, and most of the dual-port 100G E810 NICs only support 100G max in total. This is unlike the Mellanox dual-port 100G cards, which support up to 200G (when using both ports at the same time in a PCIe Gen4 server slot). They also make 200G and soon 400G NICs. If you don't care about the higher speeds, the Intel NICs have been pretty popular/reliable.
 

lunadesign

While it may be tricky right now to get firmware for ConnectX-3 for the newest OSes, that doesn't apply since you're looking at current, new cards. CX2 and CX3 cards are still going strong after many years. What's more, Mellanox isn't as picky as Intel and takes just about any module, without any reverse engineering or patching of the flash.
And they also have some nice RDMA-related features. You can't go wrong with it.
This is great to hear. Thank you very much!
 

lunadesign

Up until recently, Mellanox was the main vendor for 100G NICs. When you buy a Dell, HPE or Lenovo server and want a 100G NIC, it is usually a Mellanox (sometimes Broadcom). Intel's E810 series is relatively new, and most of the dual-port 100G E810 NICs only support 100G max in total. This is unlike the Mellanox dual-port 100G cards, which support up to 200G (when using both ports at the same time in a PCIe Gen4 server slot). They also make 200G and soon 400G NICs. If you don't care about the higher speeds, the Intel NICs have been pretty popular/reliable.
I was very interested in the E810 NICs but couldn't find any cases where people got them working with FreeNAS/TrueNAS CORE beyond 10G. I'm guessing they are too new to have out-of-the-box drivers in FreeBSD. That said, Intel's site does have FreeBSD drivers so I'm a bit surprised to not see anyone getting them to work (maybe they work but the users are shy? :))

Meanwhile, I saw a handful of people reporting success with the ConnectX-4 and ConnectX-5 NICs (but curiously not the ConnectX-6 NICs) so I figured I'd play it safe and give Mellanox a whirl with two MCX515A-CCAT's. I'm looking forward to my first foray into 100G this weekend!
 

jpmomo

It may not matter to you, but if you want full bandwidth on a dual-port 100GbE NIC in a PCIe Gen4 x16 slot, you need to choose your NIC model carefully: the NIC itself has to be PCIe Gen4, and only a couple of the ConnectX-5 models meet this requirement.
 

lunadesign

Squeezing out a full 100G does require some tweaking there: "one does not simply wget into 100gbit/s". For those who don't yet use fast networks, here's a good example, assuming we're talking about Linux rather than BSD: http://doc.tm.uka.de/2019-LCN-100g-tuning-authors-copy.pdf In a way it applies to both, though.
Thank you! I'll definitely read this. I figured there would be some interesting tweaks, especially with respect to CPU usage, etc.

I haven't looked yet but if you know of any similar resources for VMware, please feel free to share.
 

lunadesign

It may not matter to you, but if you want full bandwidth on a dual-port 100GbE NIC in a PCIe Gen4 x16 slot, you need to choose your NIC model carefully: the NIC itself has to be PCIe Gen4, and only a couple of the ConnectX-5 models meet this requirement.
Thanks... I had already noticed this. You're referring to the "ConnectX-5 EX" subset of the ConnectX-5 line, which does PCIe 4.0. This is not to be confused with "ConnectX-5 EN", which is PCIe 3.0.

PCIe 3.0 x16 is limited to 128 Gb/s raw in each direction (about 126 Gb/s after 128b/130b encoding), while PCIe 4.0 x16 can do 256 Gb/s raw (about 252 Gb/s). So if you want 2x100GbE and expect both ports to run flat out, you definitely need PCIe 4.0. But if you expect bursty traffic, PCIe 3.0 x16 might be OK.
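
For anyone who wants to check the slot arithmetic for other lane counts or port speeds, here is a rough Python sketch. It only accounts for the per-lane signalling rate and 128b/130b line encoding; real throughput will be somewhat lower once PCIe packet overhead is included, so treat it as an upper bound. The function names `pcie_gbps` and `fits` are made up for this example.

```python
# Rough upper bound on PCIe bandwidth per direction vs. NIC line rate.
# Assumption: only 128b/130b line encoding is modelled; TLP/DLLP packet
# overhead is ignored, so real-world numbers will be a bit lower.

# Per-lane signalling rate in GT/s for each PCIe generation.
PCIE_GTS_PER_LANE = {"3.0": 8.0, "4.0": 16.0}

# Both PCIe 3.0 and 4.0 use 128b/130b encoding.
ENCODING_EFFICIENCY = 128 / 130


def pcie_gbps(gen, lanes=16):
    """Approximate usable Gb/s in one direction for a PCIe link."""
    return PCIE_GTS_PER_LANE[gen] * lanes * ENCODING_EFFICIENCY


def fits(gen, ports, port_gbps=100.0, lanes=16):
    """True if the slot can, in principle, feed all ports at line rate."""
    return pcie_gbps(gen, lanes) >= ports * port_gbps


if __name__ == "__main__":
    for gen in ("3.0", "4.0"):
        for ports in (1, 2):
            verdict = "OK" if fits(gen, ports) else "slot-limited"
            print(f"PCIe {gen} x16, {ports} x 100GbE: "
                  f"{pcie_gbps(gen):.1f} Gb/s available -> {verdict}")
```

With the default x16 link this flags only the PCIe 3.0 plus dual 100GbE combination as slot-limited, which matches the reasoning above.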

For dual port 100G, the only PCIe 4.0 model is MCX516A-CDAT. I was tempted to get this card but it was out of stock earlier this week and I haven't seen anyone specifically mentioning it in a FreeNAS/TrueNAS CORE build. I suspect this card is covered by the same mlx5en driver that covers the "EN" cards but haven't been able to confirm this.
 

jpmomo

Yes, the CDAT, with an emphasis on the "D", as it can be confusing. I use that specific card a lot but also use the ConnectX-6 and ConnectX-6 Dx, as I need both ports to be running at 100Gbps each. I have used these with TrueNAS for some storage benchmarking and they seemed to be OK. That said, any time you are dealing with PCIe Gen4 and a generic motherboard (including Supermicro), you need to be careful about the PCIe bus. Ping me if you decide to go down this road and I can give you some pointers as to what to look out for.
jp
 

tinfoil3d

Any time you are dealing with PCIe Gen4 and a generic motherboard (including Supermicro), you need to be careful about the PCIe bus. Ping me if you decide to go down this road and I can give you some pointers as to what to look out for.
I'm interested in what to look out for too.