I got a bunch of these HP-branded Mellanox ConnectX MT26448 cards on eBay for use in a Proxmox cluster, and mostly they're fine. However, once every 3 to 5 reboots they don't come up, and we see this in the logs:
I've found a few posts about this issue in various places, but none with a solution. I know these are popular cards here, so I thought I'd ask if anyone here has experienced this, and if they found a solution.
More info:
When they do come up normally, dmesg output looks like this:
Any suggestions, or are these just too old to bother with at this point? If so, what's a reliable, cost-effective alternative? It needs 2x 10G Ethernet ports.
Code (card and motherboard details):
root@sb1:~# lspci | grep Mellanox
01:00.0 Ethernet controller: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] (rev a0)
root@sb1:~# dmidecode -t 2
# dmidecode 3.0
Getting SMBIOS data from sysfs.
SMBIOS 3.0 present.
Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
	Manufacturer: MSI
	Product Name: B250M BAZOOKA (MS-7A70)
	Version: 1.0
	Serial Number: H716080452
	Asset Tag: Default string
	Features:
		Board is a hosting board
		Board is replaceable
	Location In Chassis: Default string
	Chassis Handle: 0x0003
	Type: Motherboard
	Contained Object Handles: 0
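The lspci line above lists the card as a PCIe 2.0 5GT/s device, so one thing worth comparing is the link speed the slot actually negotiated against what the card advertises. The following is only a rough sketch: `link_speeds` is a made-up helper name, and it assumes the usual `LnkCap:` / `LnkSta:` lines that `lspci -vv` prints for a PCIe device.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): reads `lspci -vv` output on
# stdin and prints the advertised (LnkCap) and negotiated (LnkSta) speeds.
# Assumes lines shaped like "LnkCap: Port #8, Speed 5GT/s, Width x8, ...".
link_speeds() {
    sed -n -E 's/.*(LnkCap|LnkSta):.*Speed ([0-9.]+GT\/s).*/\1 \2/p'
}

# Usage on the affected host (run as root so lspci can read the
# capability list; 01:00.0 is the slot address from the lspci output):
#   lspci -vv -s 01:00.0 | link_speeds
```

If the two speeds differ, that would only confirm what the driver reports; whether it has anything to do with the intermittent failure is an open question.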
Code (dmesg on a normal boot):
root@sb1:~# dmesg | grep mlx
[ 0.892052] mlx4_core: Mellanox ConnectX core driver v4.0-0
[ 0.892057] mlx4_core: Initializing 0000:01:00.0
[ 4.597322] mlx4_core 0000:01:00.0: PCIe BW is different than device's capability
[ 4.597323] mlx4_core 0000:01:00.0: PCIe link speed is 2.5GT/s, device supports 5.0GT/s
[ 4.597324] mlx4_core 0000:01:00.0: PCIe link width is x8, device supports x8
[ 4.621739] mlx4_en: Mellanox ConnectX HCA Ethernet driver v4.0-0
[ 4.621806] mlx4_en 0000:01:00.0: UDP RSS is not supported on this device
[ 4.621824] mlx4_en 0000:01:00.0: Activating port:1
[ 4.621917] mlx4_en: 0000:01:00.0: Port 1: enabling only PFC DCB ops
[ 4.623159] mlx4_en: 0000:01:00.0: Port 1: Using 4 TX rings
[ 4.623160] mlx4_en: 0000:01:00.0: Port 1: Using 4 RX rings
[ 4.623287] mlx4_en: 0000:01:00.0: Port 1: Initializing port
[ 5.436650] mlx4_en 0000:01:00.0: Activating port:2
[ 5.436686] mlx4_en: 0000:01:00.0: Port 2: enabling only PFC DCB ops
[ 5.437496] mlx4_en: 0000:01:00.0: Port 2: Using 4 TX rings
[ 5.437497] mlx4_en: 0000:01:00.0: Port 2: Using 4 RX rings
[ 6.136744] mlx4_en: eth0: Link Up
[ 6.137102] mlx4_en: 0000:01:00.0: Port 2: Initializing port
[ 6.322781] mlx4_core 0000:01:00.0 enp1s0: renamed from eth0
[ 6.839398] mlx4_en: enp1s0: Link Down
[ 6.948461] mlx4_core 0000:01:00.0 enp1s0d1: renamed from eth0
[ 7.540782] mlx4_en: enp1s0d1: Link Up
[ 7.541078] mlx4_en: enp1s0: Link Up
[ 7.541079] mlx4_en: enp1s0d1: Link Down
[ 8.295956] mlx4_en: enp1s0: Steering Mode 0
[ 8.852097] mlx4_en: enp1s0d1: Link Up
[ 7325.449436] mlx4_en: enp1s0d1: Steering Mode 0
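To pin down how often the failure happens, the normal-boot output above could serve as a reference: on a good boot, mlx4_en logs "Activating port" once per port. A minimal sketch of a post-boot check built on that follows. It assumes the absence of those lines is a usable marker for a bad boot, which the output above does not confirm, and the function names are invented for illustration.

```shell
#!/bin/sh
# Hypothetical helpers: both read dmesg text on stdin.

# Count the mlx4_en "Activating port" lines (one per port on a good boot).
count_mlx4_ports() {
    # grep -c prints 0 but exits non-zero when nothing matches; ignore that.
    grep -c 'mlx4_en .*Activating port:' || true
}

# Succeed if at least the expected number of ports (default 2) were activated.
mlx4_boot_ok() {
    expected="${1:-2}"
    found=$(count_mlx4_ports)
    [ "$found" -ge "$expected" ]
}

# Usage after each reboot, e.g. from a boot-time script:
#   if dmesg | mlx4_boot_ok 2; then echo "mlx4 ports up"; else echo "mlx4 ports missing"; fi
```

Logging the result with a timestamp on every boot would give an actual failure rate rather than the rough "once every 3 to 5 reboots" estimate.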