Mellanox OFED drivers not working correctly in ESXi 6.0U3 Patch 22?

WANg

Well-Known Member
Jun 10, 2018
984
581
93
Folks -

Does anyone here use Mellanox ConnectX-3 cards with ESXi 6.0 U3 Patch 22 or above? I installed OFED v2.4 (the latest version available for this card, and one that supports SR-IOV) and removed the native drivers (nmlx4_core, nmlx4_en and nmlx4_rdma), but for some reason the cards do not show up in the admin web console after a reboot. The drivers are installed, and the modules show up as enabled when I do an

esxcli system module list

When I do a software inventory with:

esxcli software vib list | grep MEL

It shows the driver as installed...

And lspci shows the cards as present.

Update: Hmmm...Crap.
Looks like mlx4_core is puking up on driver init.

2020-05-01T19:24:43.199Z cpu0:33146)PCI: driver mlx4_core is looking for devices
<6>mlx4_core: Mellanox ConnectX core driver v1.1 (Jan 31 2016)
2020-05-01T19:24:43.199Z cpu0:33146)<6>mlx4_core: Initializing 0000:01:00.0
2020-05-01T19:24:43.199Z cpu0:33146)DMA: 646: DMA Engine 'vmklnxpci-0:1:0.0' created using mapper 'DMANull'.
2020-05-01T19:24:43.199Z cpu0:33146)DMA: 646: DMA Engine 'vmklnxpci-0:1:0.0' created using mapper 'DMANull'.
2020-05-01T19:24:43.199Z cpu0:33146)DMA: 646: DMA Engine 'vmklnxpci-0:1:0.0' created using mapper 'DMANull'.
2020-05-01T19:24:43.199Z cpu0:33146)DMA: 691: DMA Engine 'vmklnxpci-0:1:0.0' destroyed.
2020-05-01T19:24:48.256Z cpu6:33146)<4>si_meminfo called, stub!
2020-05-01T19:24:48.867Z cpu6:33146)vmklinux: alloc_pages:1010: This message has repeated 1 times: gfp_mask=0x202d2, order=0x0, vmk_PktSlabAllocPage returned 'Out of memory'
2020-05-01T19:24:48.867Z cpu6:33146)vmklinux: alloc_pages:1010: This message has repeated 2 times: gfp_mask=0x202d2, order=0x0, vmk_PktSlabAllocPage returned 'Out of memory'
2020-05-01T19:24:48.955Z cpu6:33146)<3>mlx4_core 0000:01:00.0: Failed to map MCG context memory, aborting.
2020-05-01T19:24:49.972Z cpu6:33146)WARNING: vmklinux: pci_announce_device:1486: PCI: driver mlx4_core probe failed for device 0000:01:00.0
2020-05-01T19:24:49.972Z cpu6:33146)LinPCI: LinuxPCI_DeviceUnclaimed:257: Device 0000:01:00.0 unclaimed.
2020-05-01T19:24:49.972Z cpu6:33146)PCI: driver mlx4_core claimed 0 device
2020-05-01T19:24:49.972Z cpu6:33146)Mod: 4947: Initialization of mlx4_core succeeded with module ID 4126.
2020-05-01T19:24:49.972Z cpu6:33146)mlx4_core loaded successfully.
Not sure why it is doing that - nmlx4 (the "native" 6.0 drivers) works just fine on the card, but this one just seems to faceplant pretty badly.
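Worth noting that the log ends with "loaded successfully" even though the probe failed and the driver claimed 0 devices, so the module list alone will not tell you the card was dropped. Here is a rough Python sketch for pulling the telling lines out of a vmkernel log. The regexes are assumptions based only on the excerpt above (other driver versions may word these messages differently), and summarize_driver_init is a made-up helper name, not part of any VMware or Mellanox tooling.

```python
import re

# Patterns derived from the log excerpt in this thread (assumption: other
# driver or ESXi versions may phrase these messages differently).
PROBE_FAILED = re.compile(r"driver (\S+) probe failed for device (\S+)")
CLAIMED = re.compile(r"driver (\S+) claimed (\d+) device")
ERROR_LINE = re.compile(r"<3>(\S+) (\S+): (.+)")  # <3> marks an error-level message


def summarize_driver_init(log_text, driver="mlx4_core"):
    """Collect probe failures, the claimed-device count, and error messages for one driver."""
    summary = {"probe_failed": [], "claimed": None, "errors": []}
    for line in log_text.splitlines():
        m = PROBE_FAILED.search(line)
        if m and m.group(1) == driver:
            summary["probe_failed"].append(m.group(2))
            continue
        m = CLAIMED.search(line)
        if m and m.group(1) == driver:
            summary["claimed"] = int(m.group(2))
            continue
        m = ERROR_LINE.search(line)
        if m and m.group(1) == driver:
            summary["errors"].append(m.group(3).strip())
    return summary


# A few lines copied from the log above, used as sample input.
SAMPLE = """\
2020-05-01T19:24:48.955Z cpu6:33146)<3>mlx4_core 0000:01:00.0: Failed to map MCG context memory, aborting.
2020-05-01T19:24:49.972Z cpu6:33146)WARNING: vmklinux: pci_announce_device:1486: PCI: driver mlx4_core probe failed for device 0000:01:00.0
2020-05-01T19:24:49.972Z cpu6:33146)PCI: driver mlx4_core claimed 0 device
2020-05-01T19:24:49.972Z cpu6:33146)mlx4_core loaded successfully.
"""

result = summarize_driver_init(SAMPLE)
print(result)
```

On a real host you would feed it the contents of the vmkernel log instead of SAMPLE. A non-empty probe_failed list together with claimed equal to 0 is the pattern seen here: the module loads, but it owns no device.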
 

WANg

Okay, figured it out.

It turns out that the latest Mellanox OFED drivers (2.4.0) do not play well with the hardware in question. I removed them, replaced them with 1.9, and the drivers loaded fine this time around. That said, I just upgraded to 6.5, so MLNX OFED 2.4 might need to go back in again.
 

klui

Active Member
Feb 3, 2019
334
138
43
Are you installing it because you're using InfiniBand and SR-IOV? ESXi 7 recognizes the CX3 natively. I have no need for SR-IOV, and I don't use IB either.
 

WANg

I was doing SR-IOV testing. The native ConnectX-3 drivers (nmlx4) do not and will not support SR-IOV.