Mellanox unhappy w/ vtD passthru in vSphere

Discussion in 'VMware, VirtualBox, Citrix' started by whitey, Nov 24, 2016.

  1. whitey

    whitey Moderator

    Joined:
    Jun 30, 2014
    Messages:
    2,216
    Likes Received:
    676
    Anyone have any tips/tricks on how to make this play nice? Trying to simply passthru a Mellanox ConnectX-2 to a CentOS7 VM to perform some IB testing. The other HCA works like a champ in a phys host w/ CentOS7. Hoping this is not my LSI HBA passthru nightmare all over again. Worst case I will 'borrow' an ESXi host and load it physical if I have to, but I was hoping to avoid that.

    mellanox-connectX-2-vtD-not-happy.png
     
    #1
  2. OkiieDoe

    OkiieDoe New Member

    Joined:
    Feb 5, 2015
    Messages:
    11
    Likes Received:
    1
    #2
  3. whitey

    whitey Moderator

    My bad, I should have listed more info.

    My env is vSphere 6.0 U2. I think I heard someone report in one of my older threads that the newer vSphere releases do not require the GRUB_CMDLINE_LINUX_DEFAULT hack, so maybe I will update one of my hosts and test again. I have no deep experience w/ IB HCAs, so is installing OFED the trick to getting the flint tools? Currently I am just using whatever built-in drivers the vendor ships (tried CentOS7 and Ubuntu 16.04) and came upon the same failed results, w/ ib0 not even being detected and a slew of errors in dmesg output. I don't think I have an OEM card, but how do you tell with these?
     
    #3
  4. whitey

    whitey Moderator

    Loaded the MLNX_OFED_LINUX-3.4-1.0.0.0-rhel7.2-x86_64.tgz bits, noticed it loaded mstflint, assuming that is the path forward.
     
    #4
  5. whitey

    whitey Moderator

    Does this look like the most current firmware is ALREADY installed? Maybe I just need to bite the bullet and update my hypervisor. The OFED drivers didn't help, but they got me the suite of tools very easily after yanking all the RH/vendor installed IB goodies.

    mellanox-connectx2-vpi-mstflint-is-this-current.PNG
     
    #5
  6. OkiieDoe

    OkiieDoe New Member

    You can update your card firmware with an intermediate update found here (release notes here). In the .tgz file, look for your card's .ini file, MHQH19B-XTR_A1-A3. This will update your card to firmware Rev 2.9.1200.


    #6
  7. OkiieDoe

    OkiieDoe New Member

    Oh, and this is in the test server, which is an X9SRL-F board running ESXi 6 (build 4510822) with OFED 2.4.0.0.
     
    #7
  8. whitey

    whitey Moderator

    I also have an X9SRL-F mobo that I am attempting this in. I must be having a SLOW night: I cannot flash despite seemingly following the right process. Am I supposed to be flashing a .bin file, a .mlx, or a .ini? Still not happy trying the .mlx/.ini (the one you mentioned that aligns w/ my card), and I don't see a .bin file.

    mellanox-connectX2-FW-flash-wtf2.png
    mellanox-connectX2-FW-flash-wtf.png


    Go ahead and laugh if it is simple :-D I'll also take a 'How to flash a Mellanox card from Linux' for dummies tutorial/work instruction.
     
    #8
  9. whitey

    whitey Moderator

    I did find this .bin w/in the fw-ConnectX2-rel-2_9_1000-MHQA19_A1-A2.bin.zip file I was able to hunt down on the Mellanox website. It just warns me that it is the same version. Really starting to wonder if it's my ESXi release that isn't playing nice and that is what I should be chasing down, but this can't hurt, right?

    mellanox-connectX2-FW-flash-wtf3.png
     
    #9
  10. whitey

    whitey Moderator

    So yeah I MUST be blind, found this:

    http://www.mellanox.com/page/custom_firmware_table

    See the official (which included 2.9.1000 FW .bin file) and Intermediate below w/ updated 2.9.1200 (does NOT include .bin FW file). Maybe there is a trick I am missing.
     
    #10
  11. whitey

    whitey Moderator

    Just updated an older 'test' node I have (X8DTL-3F) to ESXi 6.0 U2 w/ patch:

    ESXi 6.0 Patch 4* ESXi600-201611001 2016-11-22 4600944

    Couple of patches past yours and completely current, just enabled vt-D for the HCA, gonna reboot and install a CentOS 7 1511 box now after attaching the HCA to it. Let's see what happens.

    EDIT: Same ole' BS...SMH :-(

    mellanox-connectX2-vt-D-ESXi6.0U2-4600944.png
     
    #11
    Last edited: Nov 27, 2016
  12. zhoulander

    zhoulander Active Member

    Joined:
    Feb 1, 2016
    Messages:
    117
    Likes Received:
    29
    #12
  13. whitey

    whitey Moderator

    Had to resort to two phys Linux systems but getting somewhere...I could not see link lights on the cards/IS5022 switch, so I direct connected them, restarted opensm, and BAM, lights up. Cannot figure out why the IB switch won't play nice so far. Do I need an OpenSM running on each Linux instance for them to work while connected to the IS5022 Mellanox IB switch?

    Crossover/direct connect iperf w/ 65520 MTU set/connected mode/default CentOS7 load. May try w/ OFED drivers loaded next.

    IB-crossover-mtu-65520-13Gbps.PNG

    EDIT: HA just noticed that w/ my L5630 system driving the server it's maxing out on CPU at 100% while my E5 2670 system in client mode pushes 25% CPU. Let's flip-flop and see what it does....yep a bit better w/ E5 2670 system driving iperf server (40% CPU) and L5630 system as client (still 100%) but better throughput.

    [ 4] local 10.10.10.101 port 5001 connected with 10.10.10.100 port 45410
    [ ID] Interval Transfer Bandwidth
    [ 4] 0.0-10.0 sec 17.8 GBytes 15.3 Gbits/sec
    [ 5] local 10.10.10.101 port 5001 connected with 10.10.10.100 port 45412
    [ 5] 0.0-10.0 sec 17.9 GBytes 15.3 Gbits/sec
    [ 4] local 10.10.10.101 port 5001 connected with 10.10.10.100 port 45460
    [ 4] 0.0-10.0 sec 17.8 GBytes 15.3 Gbits/sec
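
As a sanity check on the figures above, here is a minimal sketch of the unit arithmetic. It assumes iperf 2 reports GBytes in units of 2^30 bytes and Gbits in units of 10^9 bits; the function name `gbytes_to_gbits` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: convert an iperf "Transfer" figure (GBytes moved in N
# seconds) into a "Bandwidth" figure (Gbits/sec).
# Assumption: GBytes are binary (2^30 bytes), Gbits are decimal (10^9 bits).
gbytes_to_gbits() {
    awk -v gb="$1" -v secs="$2" 'BEGIN {
        bits = gb * 1073741824 * 8              # total bits transferred
        printf "%.1f\n", bits / secs / 1000000000
    }'
}

# 17.8 GBytes in 10 seconds, as in the runs above
gbytes_to_gbits 17.8 10
```

Under those assumptions the result lines up with the 15.3 Gbits/sec that iperf printed.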

    So now I am too CPU constrained to really see these QDR HCAs fly, right, since it looks like iperf is single threaded?
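
On the single-thread question: the output format and port 5001 above point to iperf 2, whose client has a `-P` option for running several streams in parallel. That is one way to try spreading the load over more cores. A minimal sketch, assuming iperf 2 on both ends with `iperf -s` already running on the server, and reusing the 10.10.10.101 address from the output above. The stream count of 4 is an arbitrary starting point, and `iperf_client_cmd` is a hypothetical helper that only prints the command so it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: print an iperf 2 client command line with parallel streams.
#   -c <host>  connect to the iperf server at <host>
#   -P <n>     number of parallel client streams
#   -t <secs>  test duration in seconds
iperf_client_cmd() {
    server="$1"
    streams="${2:-4}"    # assumption: 4 streams as a starting point
    seconds="${3:-10}"
    echo "iperf -c $server -P $streams -t $seconds"
}

# Only prints the command; run the printed line by hand once it looks right.
iperf_client_cmd 10.10.10.101 4 10
```

Whether this helps depends on the slower box: a CPU that is already pegged at 100% on every core will stay the bottleneck regardless of the stream count.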
     
    #13
    Last edited: Nov 29, 2016
  14. whitey

    whitey Moderator

    I think I 'may' have a jank/broke Mellanox IS5022 IB switch. It powers up and shows green lights on the top two LEDs (one with a triangle w/ ! and one w/ a fan symbol), but there is no response from the console (I2C) port, if that is what it is used for. I tried the cable they sent me and another serial to RJ-45 cable that I have used on my Juniper/ProCurve/Cisco switches in the past, and that one links fine to my Juniper EX3300 @ 9600, 8, N, 1. Docs say the IB switch 'should' be at the same serial config/setup.

    How else can I validate that the IB switch is working as it should?

    SMH, nothing easy!

    EDIT: Installed OFED drivers 3.4-1 (Now up to 21Gbps limited by client CPU still direct connected/crossover style until I sort out this POS IB switch)
    IB-crossover-mtu-65520-OFED-3.4-1-21Gbps.PNG
     
    #14
    Last edited: Nov 29, 2016
    T_Minus likes this.
  15. whitey

    whitey Moderator

    Got my card firmware updated to 2.9.1200 up from 2.9.1000 using this method if anyone runs into this nonsense.

    mlxburn -fw fw-ConnectX2-rel.mlx -conf MHQH19B-XTR_A1-A3.ini -wrimage fw-ConnectX2-rel-2_9_1200-MHQH19B-XTR_A1-A3.bin

    (Pay attention to use the proper .ini for your device for this intermediate update.)

    then burn via:

    mstflint -d 03:00.0 -i fw-ConnectX2-rel-2_9_1200-MHQH19B-XTR_A1-A3.bin b
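
For anyone who wants to rehearse the sequence before touching a card, here is a minimal sketch that wraps the two commands above with a firmware query (`mstflint -d <device> q`) before and after. The `run` and `flash_hca` helpers and the `DRY_RUN` switch are hypothetical names added for illustration, not part of the Mellanox tools. The PCI address and file names are the ones from this post and need to be adapted to your own card.

```shell
#!/bin/sh
# Sketch of the sequence: query -> build image -> burn -> query again.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Arguments (placeholders to adapt to your card):
#   $1  PCI address of the HCA, e.g. 03:00.0 (as listed by lspci)
#   $2  firmware source (.mlx) from the intermediate release
#   $3  board config (.ini) matching the card
#   $4  image file (.bin) to generate and then burn
flash_hca() {
    dev="$1"; mlx="$2"; ini="$3"; bin="$4"
    run mstflint -d "$dev" q                            || return 1  # current firmware info
    run mlxburn -fw "$mlx" -conf "$ini" -wrimage "$bin" || return 1  # build the image
    run mstflint -d "$dev" -i "$bin" b                  || return 1  # burn the image
    run mstflint -d "$dev" q                                         # check what is on flash now
}

flash_hca 03:00.0 fw-ConnectX2-rel.mlx MHQH19B-XTR_A1-A3.ini \
    fw-ConnectX2-rel-2_9_1200-MHQH19B-XTR_A1-A3.bin
```

Stopping at the first failing step is deliberate: there is no point burning an image that mlxburn could not build. The new firmware typically only becomes active after a reboot or driver reload.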
     
    #15
    Last edited: Nov 29, 2016
    T_Minus likes this.
  16. _alex

    _alex Active Member

    Joined:
    Jan 28, 2016
    Messages:
    405
    Likes Received:
    34
    @whitey: does SR-IOV work on the IB port with that firmware?

    I have the dual-port ConnectX-2 with 10G + 40G ports ...
    Some months ago I managed to customize/hack the FW definitions file to enable SR-IOV and flash it to a card. The result was that SR-IOV worked for the 10G SFP+ port but not the 40G QSFP port. In the end I reverted it back to the latest 'official' firmware.
     
    #16
  17. whitey

    whitey Moderator

    RESOLVED! Updated firmware from 2.9.1000 to intermediate FW release 2.9.1200 and vt-D passthru of the Mellanox ConnectX-2 device is now working like a dream.
     
    #17
  18. whitey

    whitey Moderator

    Not sure, never tried out SR-IOV, may be worth a looksie. Is that this ConnectX-2 card?

    MHZH29-XTR

    If so, does that card work in dual mode (IB 40G port and 10GbE port) simultaneously? I know the VPI cards can be set up for IB/EN mode, but can that card operate both at the same time and hook to an IB fabric and an Ethernet fabric?
     
    #18
  19. _alex

    _alex Active Member

    Yes, guess my cards are XTR-B. Got a total of 14 of them :)

    You can configure each port separately for IB or Ethernet mode and use them at the same time. In my 3-node prod cluster I have two of them in each node. The 40G ports go to two IS5022 switches for SAN, and the 10G ports are connected via DAC without a switch (STP to the rescue...) as the inter-VM/cluster network.

    I had SR-IOV working on the 10G port with that FW hack, but not on the 40G ports. Guess ConnectX-3 would do SR-IOV on them ...
     
    #19