6.7U1 vSAN doesn't seem to work with ConnectX-3

Discussion in 'VMware, VirtualBox, Citrix' started by justinm001, Nov 7, 2018.

  1. justinm001

    justinm001 New Member

    Joined:
    Apr 11, 2018
    Messages:
    19
    Likes Received:
    1
    I've been trying to upgrade 6.7 to U1 and am unable to get the vSAN service added to a VMkernel adapter; I just get a general vSAN error when trying. Even worse, I can get it all set up over 1G copper, but when I add the drivers for the InfiniBand NIC and bring it online, it removes the vSAN service from the 1G link.

    I've been pulling my hair out on this and have tried everything, including a fresh ESXi install and a new vSphere with a new cluster. If I run nested ESXi 6.7U1 it'll work fine, but I'm guessing that's because it sees a vmxnet3 NIC and not a ConnectX-3.

    Now to try drivers other than the MLNX-OFED-ESX-1.8.2.5-10EM-600.0.0.2494585 ones that have been rock solid.
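    In case it helps anyone retracing this, swapping driver packages on ESXi is done with esxcli. A rough sketch of the remove/reinstall cycle I'm going through (the VIB name and bundle path below are examples, not the exact ones from my hosts):

```shell
# List the Mellanox driver packages (VIBs) currently installed
esxcli software vib list | grep -i -e mlx -e mellanox

# Remove the old OFED driver before trying another one
# (VIB name is an example -- use the exact name from the list above)
esxcli software vib remove -n net-mlx4-core

# Install the replacement driver from an offline bundle, then reboot
esxcli software vib install -d /tmp/MLNX-OFED-ESX-1.8.2.5.zip
reboot
```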
     
    #1
  2. markpower28

    markpower28 Active Member

    Joined:
    Apr 9, 2013
    Messages:
    380
    Likes Received:
    96
    You need to use Ethernet; IB is not going to work.
     
    #2
  3. Rand__

    Rand__ Well-Known Member

    Joined:
    Mar 6, 2014
    Messages:
    2,599
    Likes Received:
    343
    I have 4 vSAN boxes (all CX4) and 1 of the 4 has not been working since the upgrade, showing some RDMA errors / unresolved symbols.

    In my case I can't get out of maintenance mode. It looked like an MLX error, but as far as I can tell it isn't: I had removed all the MLX modules and it didn't work either, still got the rdt error.

    Code:
    2018-11-06T12:34:59.326Z cpu15:2098312)Elf: 2101: module rdt has license VMware
    2018-11-06T12:34:59.329Z cpu15:2098312)WARNING: Elf: 3144: Kernel based module load of rdt failed: Unresolved symbol <ElfRelocateFile failed>
    2018-11-06T12:35:00.047Z cpu4:2098312)Loading module rdt ...
    
    
    2018-11-06T12:34:25.396Z cpu4:2097620)<NMLX_ERR> nmlx5_core: nmlx5_core_DeviceGetParamMaxVfs - (vmkdrivers/native/BSD/Network/mlnx/nmlx5/nmlx5_core/nmlx5_core_main.c:317) vmk_DeviceGetParamMaxVfs failed: Not initialized, continuing with no VFs
    2018-11-06T12:34:26.176Z cpu12:2097620)<NMLX_ERR> nmlx5_core: nmlx5_core_DeviceGetParamMaxVfs - (vmkdrivers/native/BSD/Network/mlnx/nmlx5/nmlx5_core/nmlx5_core_main.c:317) vmk_DeviceGetParamMaxVfs failed: Not initialized, continuing with no VFs
    2018-11-06T12:34:26.663Z cpu10:2097770)WARNING: Elf: 1741: Relocation of symbol <vmk_RDMACapRegister> failed: Unresolved symbol
    2018-11-06T12:34:26.663Z cpu10:2097770)WARNING: Elf: 1741: Relocation of symbol <vmk_RDMACapRegister> failed: Unresolved symbol
    2018-11-06T12:34:26.663Z cpu10:2097770)WARNING: Elf: 1741: Relocation of symbol <vmk_RDMACapRegister> failed: Unresolved symbol
    2018-11-06T12:34:26.663Z cpu10:2097770)WARNING: Elf: 1741: Relocation of symbol <vmk_RDMACapRegister> failed: Unresolved symbol
    2018-11-06T12:34:26.663Z cpu10:2097770)WARNING: Elf: 1741: Relocation of symbol <vmk_RDMAModifyQPArgsValid> failed: Unresolved symbol
    
    I've already spent a couple of hours on this (trying to downgrade, reinstall, clean up old modules, etc.) but no luck. Not sure at this point what the underlying reason is, but it's not the MLX card; I have similar cards in my other boxes.
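    For reference, this is roughly how I've been checking whether rdt and its dependencies actually loaded (run on the ESXi host itself; the grep patterns are just what I've been looking for):

```shell
# Show whether the rdt / vSAN-related modules are currently loaded
vmkload_mod -l | grep -E 'rdt|vsan|cmmds'

# Check the registered state of the Mellanox and RDMA modules
esxcli system module list | grep -E 'nmlx|rdma|rdt'

# Try loading rdt by hand to reproduce the unresolved-symbol error
vmkload_mod rdt
```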
     
    #3
  4. justinm001

    justinm001 New Member

    Does your VMkernel adapter show vSAN enabled? My bet is it doesn't, which is why you can't exit maintenance mode. I had the same problem. I think a couple of times it showed enabled, but once I removed it I couldn't add it back.
     
    #4
  5. Rand__

    Rand__ Well-Known Member

    Actually I moved the box out of the vSAN cluster and still can't move it out of maintenance mode; I can't even leave the vSAN cluster ;)

    esxcli vsan cluster leave
    Failed to leave the host from vSAN cluster. The command should be retried: Unable to load module /usr/lib/vmware/vmkmod/cmmds: Invalid or missing namespace

    ...
    Code:
    Load of <vsanutil> failed : missing required namespace <com.vmware.rdt#0.0.0.1>
    2018-11-07T21:45:36.677Z cpu2:2165150)WARNING: Elf: 3144: Kernel based module load of vsanutil failed: Invalid or missing namespace <ElfSetNamespaceInfo failed>
    2018-11-07T21:45:37.418Z cpu2:2165150)WARNING: Elf: 2277: Load of <cmmds> failed : missing required namespace <com.vmware.vsanutil#0.0.0.1>
    2018-11-07T21:45:37.418Z cpu2:2165150)WARNING: Elf: 3144: Kernel based module load of cmmds failed: Invalid or missing namespace <ElfSetNamespaceInfo failed>
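    For anyone checking their own host against this, the cluster membership state and the vSAN-tagged interfaces can be inspected like so (standard esxcli vsan commands):

```shell
# Show this host's vSAN cluster membership and state
esxcli vsan cluster get

# List the VMkernel interfaces tagged for vSAN traffic
esxcli vsan network list
```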
     
    #5
    Last edited: Nov 7, 2018
  6. justinm001

    justinm001 New Member

    That's exactly what I was getting. Now, with a fresh install, I'm unable to add the host to the vSAN cluster. Everything else is good; just vSAN doesn't work.

    Also, I'm not even able to add the vSAN service to a 1G Ethernet NIC unless I uninstall the Mellanox InfiniBand drivers.
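    For anyone trying the same thing from the CLI instead of the UI, tagging a VMkernel NIC for vSAN looks like this (vmk1 is an example interface, not necessarily yours):

```shell
# Tag an existing VMkernel interface for vSAN traffic
esxcli vsan network ip add -i vmk1

# Confirm the tag took
esxcli vsan network list
```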
     
    #6
  7. justinm001

    justinm001 New Member

    I give up and am rolling back to 6.7. I'm unable to use MST in 6.7U1; I get an "-E- nmlx core must be loaded before starting mst." error even though it is loaded.
     
    #7
  8. Rand__

    Rand__ Well-Known Member

    Still not sure what might cause this.
    As I said, only one of my four boxes has it: 2 server and 2 workstation boards (albeit from different vendors); the affected one is an X10SRA.
    What is your box based on?
     
    #8
  9. justinm001

    justinm001 New Member

    All are HP DL580 G7s with CX354A cards in them. It looks like when ESXi recognizes the network driver it shuts down the vSAN services on the server, which prevents vSAN from doing anything and thus blocks the server from exiting maintenance mode. I'm sure it's just some setting, option, or oversight, since these cards aren't officially on the HCL.

    Even with a fresh install it works fine until I add the Mellanox drivers; then it dies, even in a fresh vSAN cluster. Also, just a note: I can run 6.7U1 on the server and 6.7U1 nested, and the nested one will work fine in vSAN since it has no ConnectX drivers.
     
    #9
  10. Rand__

    Rand__ Well-Known Member

    As I said, I don't have CX3s in my boxes, but that particular box had some old MLX drivers installed. Maybe that broke something on the upgrade, but removing them has not helped.
     
    #10
  11. Rand__

    Rand__ Well-Known Member

    So I tried rolling back to 6.7, but since I had reinstalled U1 several times there was no non-U1 environment left...

    I then restored to defaults and the box is up and running fine. Of course I have to rebuild it, and of course Host Profiles isn't working for me (it never does), but at least that shows it's a config glitch and not a hardware issue...
     
    #11
  12. markpower28

    markpower28 Active Member

    With 6.5/6.7, IB/OFED support is gone. Ethernet is the only way forward with Mellanox and VMware. iSER is the only option for RDMA with vSphere, and no more SRP...
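    If anyone wants to go that route, enabling the iSER initiator on a host with a supported RDMA-capable NIC is a one-liner in 6.7 (sketch; check that your NIC is in Ethernet mode first):

```shell
# Enable the software iSER initiator
esxcli rdma iser add

# A new vmhba should show up in the adapter list as an iSER adapter
esxcli storage core adapter list
```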
     
    #12