Mellanox unhappy w/ vtD passthru in vSphere

whitey

Moderator
Jun 30, 2014
2,770
865
113
37
Anyone have any tips/tricks on how to make this play nice? I'm trying to simply pass through a Mellanox ConnectX-2 to a CentOS 7 VM to do some IB testing. The other HCA works like a champ in a physical host w/ CentOS 7. Hoping this is not my LSI HBA passthru nightmare all over again. Worst case I will 'borrow' an ESXi host and load it physical if I have to, but I was hoping to avoid that.

mellanox-connectX-2-vtD-not-happy.png
 

whitey

My bad, I should have listed more info.

My env is vSphere 6.0 U2. I think I heard someone report back in one of my older threads that the newer vSphere releases do not require the GRUB_CMDLINE_LINUX_DEFAULT hack, so maybe I will update one of my hosts and test again. I have no deep experience w/ IB HCAs, so is installing OFED the trick to getting the flint tools? Currently I am just using whatever built-in drivers the distro ships (tried CentOS 7 and Ubuntu 16.04) and got the same failed results on both, w/ ib0 not even being detected and a slew of errors in dmesg output. I don't think I have an OEM card, but how do you tell with these?
 

whitey

Loaded the MLNX_OFED_LINUX-3.4-1.0.0.0-rhel7.2-x86_64.tgz bits, noticed it loaded mstflint, assuming that is the path forward.
 

whitey

Does this look like the most current firmware is ALREADY installed? Maybe I just need to bite the bullet and update my hypervisor. The OFED drivers didn't help, but they did get me the suite of tools very easily after yanking all the RH/vendor-installed IB goodies.

mellanox-connectx2-vpi-mstflint-is-this-current.PNG
 

OkiieDoe

New Member
Feb 5, 2015
12
1
3
28
oh and in the test server which is a x9srl-f board running esxi 6 4510822 with ofed 2.4.0.0
 

whitey

I also have an X9SRL-F mobo I am attempting this in. I must be having a SLOW night: I cannot flash despite seemingly following the right process. Am I supposed to be flashing a .bin file, a .mlx, or a .ini? Still not happy trying the .mlx/.ini (the one you mentioned that aligns w/ my card), and I don't see a .bin file.

mellanox-connectX2-FW-flash-wtf2.png
mellanox-connectX2-FW-flash-wtf.png


Go ahead and laugh if it is simple :-D I'll also take a 'How to flash a Mellanox card from Linux for dummies' tutorial/work instruction.
 

whitey

I did find this .bin within the fw-ConnectX2-rel-2_9_1000-MHQA19_A1-A2.bin.zip file I was able to hunt down on the Mellanox website. It just warns me that it is the same version. Really starting to wonder if it's my ESXi release that isn't playing nice and that's what I really should be chasing down, but this can't hurt, right?

mellanox-connectX2-FW-flash-wtf3.png
 

whitey

Just updated an older 'test' node I have (X8DTL-3F) to ESXi 6.0 U2 w/ patch:

ESXi 6.0 Patch 4* ESXi600-201611001 2016-11-22 4600944

Couple of patches past yours and completely current, just enabled vt-D for the HCA, gonna reboot and install a CentOS 7 1511 box now after attaching the HCA to it. Let's see what happens.

EDIT: Same ole' BS...SMH :-(

mellanox-connectX2-vt-D-ESXi6.0U2-4600944.png
 

whitey

Had to resort to two phys Linux systems but getting somewhere... I could not see link lights on the cards/IS5022 switch, so I direct connected them, restarted opensm, and BAM, the link lights up. Cannot figure out why the IB switch won't play nice so far. Do I need an OpenSM instance running on each Linux box for them to work while connected to the IS5022 Mellanox IB switch?

Crossover/direct connect iperf w/ 65520 MTU set/connected mode/default CentOS7 load. May try w/ OFED drivers loaded next.
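
For anyone repeating this test, a minimal sketch of how the connected mode / 65520 MTU setup could be checked from sysfs. The `mode` and `mtu` files under /sys/class/net/<iface> are what the in-kernel IPoIB driver exposes; the function name `ipoib_report` and the `SYSFS_NET` override are made up for illustration, and ib0 is just the interface name used in this thread.

```shell
#!/bin/sh
# Sketch only: report IPoIB mode and MTU for an interface.
# SYSFS_NET is a hypothetical override so the function can be pointed at a
# scratch directory; on a real host it defaults to /sys/class/net.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

ipoib_report() {
    iface=${1:-ib0}
    dir="$SYSFS_NET/$iface"
    if [ ! -r "$dir/mode" ] || [ ! -r "$dir/mtu" ]; then
        echo "$iface: not found or not an IPoIB interface"
        return 1
    fi
    mode=$(cat "$dir/mode")
    mtu=$(cat "$dir/mtu")
    echo "$iface: mode=$mode mtu=$mtu"
    # Succeed only for the setup used in this test:
    # connected mode with the 65520 MTU (datagram mode caps much lower).
    [ "$mode" = "connected" ] && [ "$mtu" -eq 65520 ]
}

# To switch an interface over (as root), something like:
#   echo connected > /sys/class/net/ib0/mode
#   ip link set dev ib0 mtu 65520
```

Running `ipoib_report ib0` on both ends before starting iperf should confirm that the two hosts agree on mode and MTU.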

IB-crossover-mtu-65520-13Gbps.PNG

EDIT: HA just noticed that w/ my L5630 system driving the server it's maxing out on CPU at 100% while my E5 2670 system in client mode pushes 25% CPU. Let's flip-flop and see what it does....yep a bit better w/ E5 2670 system driving iperf server (40% CPU) and L5630 system as client (still 100%) but better throughput.

[ 4] local 10.10.10.101 port 5001 connected with 10.10.10.100 port 45410
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 17.8 GBytes 15.3 Gbits/sec
[ 5] local 10.10.10.101 port 5001 connected with 10.10.10.100 port 45412
[ 5] 0.0-10.0 sec 17.9 GBytes 15.3 Gbits/sec
[ 4] local 10.10.10.101 port 5001 connected with 10.10.10.100 port 45460
[ 4] 0.0-10.0 sec 17.8 GBytes 15.3 Gbits/sec

So now I'm too CPU constrained to really see these QDR HCAs fly, right? It looks like iperf is single threaded.
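
Some back-of-the-envelope math for context. QDR 4x signals at 40 Gbit/s, but 8b/10b encoding leaves 32 Gbit/s for data, so that is the ceiling to hold the iperf numbers against (PCIe and IPoIB overhead will pull real results lower still). A small sketch of that arithmetic; the function name `qdr_share` is made up for illustration.

```shell
#!/bin/sh
# Sketch: compare a measured iperf rate against the QDR 4x data rate.
# QDR = 10 Gbit/s per lane x 4 lanes = 40 Gbit/s signalling;
# 8b/10b encoding carries 8 data bits per 10 line bits -> 32 Gbit/s.
qdr_share() {
    measured_gbps=$1
    awk -v m="$measured_gbps" 'BEGIN {
        data_rate = 10 * 4 * 8 / 10   # 32 Gbit/s usable
        printf "%.1f of %d Gbit/s (%.0f%%)\n", m, data_rate, 100 * m / data_rate
    }'
}

qdr_share 15.3   # the iperf result above; prints "15.3 of 32 Gbit/s (48%)"
```

So the 15.3 Gbit/s runs are sitting at roughly half of what the link can carry, which fits with a CPU bottleneck rather than a link problem.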
 

whitey

I think I 'may' have a jank/broke Mellanox IS5022 IB switch. It powers up w/ green lights on the top two LEDs (one with a triangle w/ ! and one w/ a fan symbol), but there is no response from the console (I2C) port, if that is what it is used for. I tried the cable they sent me and another serial to RJ-45 cable that I have used on my Juniper/ProCurve/Cisco switches in the past, and that one links fine to my Juniper EX3300 @ 9600, 8, N, 1. Docs say the IB switch 'should' be at the same serial config/setup.

How else can I validate that the IB switch is working as it should?

SMH, nothing easy!

EDIT: Installed OFED drivers 3.4-1 (Now up to 21Gbps limited by client CPU still direct connected/crossover style until I sort out this POS IB switch)
IB-crossover-mtu-65520-OFED-3.4-1-21Gbps.PNG
 

whitey

Got my card firmware updated to 2.9.1200 up from 2.9.1000 using this method if anyone runs into this nonsense.

mlxburn -fw fw-ConnectX2-rel.mlx -conf MHQH19B-XTR_A1-A3.ini -wrimage fw-ConnectX2-rel-2_9_1200-MHQH19B-XTR_A1-A3.bin

(pay attn to use proper .ini for your device for this intermediate update)

then burn via:

mstflint -d 03:00.0 -i fw-ConnectX2-rel-2_9_1200-MHQH19B-XTR_A1-A3.bin b
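
If it helps anyone, a rough sketch of a 'should I even bother burning' check to run before the burn step. It only compares version strings. Pulling the installed version out of `mstflint -d 03:00.0 query` is left to the reader, since the exact output layout is not shown in this thread. `fw_is_newer` is a made-up helper name, and `sort -V` assumes GNU coreutils.

```shell
#!/bin/sh
# Sketch: succeed only if the candidate firmware version is strictly newer
# than the installed one. Versions are dotted strings such as 2.9.1000.
fw_is_newer() {
    cur=$1   # version currently on the card
    new=$2   # version of the image about to be burned
    [ "$cur" != "$new" ] || return 1
    # sort -V orders dotted version strings numerically, field by field.
    highest=$(printf '%s\n%s\n' "$cur" "$new" | sort -V | tail -n 1)
    [ "$highest" = "$new" ]
}

# Versions from this thread, hard-coded for illustration.
installed=2.9.1000
candidate=2.9.1200
if fw_is_newer "$installed" "$candidate"; then
    echo "burn: $installed -> $candidate"
else
    echo "skip: $candidate is not newer than $installed"
fi
```

With the two versions above this takes the burn branch; feeding it 2.9.1000 twice takes the skip branch, which matches the 'same version' warning seen earlier.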
 

_alex

Active Member
Jan 28, 2016
874
94
28
Bavaria / Germany
@whitey: does SR-IOV work on the IB port with that firmware?

I have the dual-port ConnectX-2 with 10 + 40G ports ...
I managed to customize/hack the FW definitions file to enable SR-IOV and flash it to a card some months ago. The result was that SR-IOV worked for the 10G SFP+ port but not the 40G QSFP port. In the end I reverted back to the latest 'official' firmware.
 

whitey

RESOLVED! Updated firmware from 2.9.1000 to intermediate FW release 2.9.1200 and vt-D passthru of the Mellanox ConnectX-2 device is now working like a dream.
 

whitey

@_alex: Not sure, never tried out SR-IOV, may be worth a looksie. Is that this ConnectX-2 card?

MHZH29-XTR

If so, does that card work in dual mode (40G IB port and 10GbE port) simultaneously? I know you can set up the VPI cards for IB/EN mode, but can that card operate both at the same time and hook to an IB fabric and an Ethernet fabric?
 

_alex

yes, guess my cards are xtr-b - got a total of 14 of them :)

You can configure each port separately for IB or Ethernet mode and use them at the same time. In my 3-node prod cluster I have two of them in each node, and use the 40G ports on two IS5022s for SAN and the 10G ports via DAC without a switch (STP to the rescue ..) as inter-VM / cluster network.

Had SR-IOV working on the 10G port with that FW hack, but not on the 40G ports. Guess ConnectX-3 would do SR-IOV on them ...