Help with InfiniBand on VMware ESXi 5.1


AlexMercer

New Member
Jun 16, 2014
Hello,


I have a really weird problem with the InfiniBand connection between my ESXi hosts.

Here is my setup:

HP C7000 with BL685c G1 blades and an HP 4x DDR IB Switch Module. The blades are running VMware ESXi 5.1.0 U2 (custom HP image). I have also installed the Mellanox drivers (MLNX-OFED-ESX-1.8.1.0) and ib-opensm on each of the hosts (Infiniband@home : votre homelab à 20Gbps - Hypervisor.fr). Here are the vmnics:

# esxcli network nic list | grep 10G
vmnic_ib0 0000:047:00.0 ib_ipoib Up 20000 Full 00:23:7d:94:d8:7d 4092 Mellanox Technologies MT25418 [ConnectX VPI - 10GigE / IB DDR, PCIe 2.0 2.5GT/s]
vmnic_ib1 0000:047:00.0 ib_ipoib Up 20000 Full 00:23:7d:94:d8:7e 1500 Mellanox Technologies MT25418 [ConnectX VPI - 10GigE / IB DDR, PCIe 2.0 2.5GT/s]


InfiniBand adapter stats:

/opt/opensm/bin # ./ibstat
CA 'mlx4_0'
    CA type: MT25418
    Number of ports: 2
    Firmware version: 2.7.0
    Hardware version: a0
    Node GUID: 0x00237dffff94d87c
    System image GUID: 0x00237dffff94d87f
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 20
        Base lid: 2
        LMC: 0
        SM lid: 1
        Capability mask: 0x0251086a
        Port GUID: 0x00237dffff94d87d
        Link layer: InfiniBand
    Port 2:
        State: Active
        Physical state: LinkUp
        Rate: 20
        Base lid: 2
        LMC: 0
        SM lid: 1
        Capability mask: 0x0251086a
        Port GUID: 0x00237dffff94d87e
        Link layer: InfiniBand


I have created a VMkernel port and a vSwitch; both the port group and the vSwitch are set up for MTU=4k. I have also configured mlx4_core to support a 4k MTU.

OpenSM is also configured to allow a 4k MTU:
partitions.conf: Default=0x7fff,ipoib,mtu=5:ALL=full;

# esxcli system module parameters list -m=mlx4_core | grep mtu_4k
mtu_4k int 1 configure 4k mtu (mtu_4k > 0)
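
For anyone reading along: the mtu=5 in partitions.conf is an InfiniBand MTU code, not a byte count. Here is a minimal Python sketch of the mapping, assuming IPoIB in datagram mode with its 4-byte encapsulation header. The names IB_MTU_BYTES and ipoib_mtu are only illustrative, they are not part of any Mellanox or OpenSM tool.

```python
# InfiniBand MTU codes and the payload size (in bytes) each one stands for.
IB_MTU_BYTES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096}

# IPoIB prepends a 4-byte encapsulation header to every packet.
IPOIB_HEADER_BYTES = 4


def ipoib_mtu(ib_mtu_code: int) -> int:
    """Return the largest IP MTU that fits in one IB packet for this code."""
    return IB_MTU_BYTES[ib_mtu_code] - IPOIB_HEADER_BYTES


if __name__ == "__main__":
    for code in sorted(IB_MTU_BYTES):
        print(f"mtu={code}: IB MTU {IB_MTU_BYTES[code]} bytes, "
              f"IPoIB MTU {ipoib_mtu(code)} bytes")
```

Under that assumption, mtu=5 gives an IPoIB MTU of 4092, which matches what vmnic_ib0 reports in the esxcli listing.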


And here is the problem.

With MTU=1500:

/opt/iperf/bin # ./iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 64.0 KByte (default)
------------------------------------------------------------
[ 4] local 192.168.13.39 port 5001 connected with 192.168.13.36 port 61140
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 3.98 GBytes 3.42 Gbits/sec
[ 5] local 192.168.13.39 port 5001 connected with 192.168.13.36 port 58854
[ 5] 0.0-10.0 sec 4.53 GBytes 3.89 Gbits/sec
[ 4] local 192.168.13.39 port 5001 connected with 192.168.13.36 port 51600
[ 4] 0.0-10.0 sec 3.66 GBytes 3.15 Gbits/sec
[ 5] local 192.168.13.39 port 5001 connected with 192.168.13.36 port 60066
[ 5] 0.0-10.0 sec 4.52 GBytes 3.88 Gbits/sec
[ 4] local 192.168.13.39 port 5001 connected with 192.168.13.36 port 50728
[ 4] 0.0-10.0 sec 4.71 GBytes 4.04 Gbits/sec
[ 5] local 192.168.13.39 port 5001 connected with 192.168.13.36 port 58792
[ 5] 0.0-10.0 sec 4.54 GBytes 3.90 Gbits/sec

MTU=2000


/opt/iperf/bin # ./iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 64.0 KByte (default)
------------------------------------------------------------
[ 4] local 192.168.13.39 port 5001 connected with 192.168.13.36 port 62523
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 5.35 GBytes 4.59 Gbits/sec
[ 5] local 192.168.13.39 port 5001 connected with 192.168.13.36 port 56491
[ 5] 0.0-10.0 sec 5.43 GBytes 4.66 Gbits/sec
[ 4] local 192.168.13.39 port 5001 connected with 192.168.13.36 port 63144
[ 4] 0.0-10.0 sec 4.41 GBytes 3.79 Gbits/sec
[ 5] local 192.168.13.39 port 5001 connected with 192.168.13.36 port 53978
[ 5] 0.0-10.0 sec 4.43 GBytes 3.81 Gbits/sec
[ 4] local 192.168.13.39 port 5001 connected with 192.168.13.36 port 61886
[ 4] 0.0-10.0 sec 5.38 GBytes 4.62 Gbits/sec


MTU=4092

/opt/iperf/bin # ./iperf -c 192.168.13.39
------------------------------------------------------------
Client connecting to 192.168.13.39, TCP port 5001
TCP window size: 75.5 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.13.36 port 50673 connected with 192.168.13.39 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-79.5 sec 8.00 GBytes 864 Mbits/sec
/opt/iperf/bin # ./iperf -c 192.168.13.39
------------------------------------------------------------
Client connecting to 192.168.13.39, TCP port 5001
TCP window size: 75.5 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.13.36 port 49604 connected with 192.168.13.39 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-79.5 sec 8.00 GBytes 864 Mbits/sec
/opt/iperf/bin # ./iperf -c 192.168.13.39
------------------------------------------------------------
Client connecting to 192.168.13.39, TCP port 5001
TCP window size: 35.5 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.13.36 port 58764 connected with 192.168.13.39 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-79.5 sec 8.00 GBytes 864 Mbits/sec
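
As a sanity check on the numbers above, here is a small Python sketch that recomputes the Bandwidth column from the Transfer and Interval columns. It assumes iperf counts a GByte as 2**30 bytes and a Gbit as 10**9 bits, which is consistent with the figures in this post. The function name is just for illustration.

```python
def iperf_gbits_per_sec(gbytes: float, seconds: float) -> float:
    """Convert an iperf transfer (GBytes, 2**30 bytes) over an interval to Gbits/sec."""
    total_bits = gbytes * 2**30 * 8
    return total_bits / seconds / 1e9


if __name__ == "__main__":
    # (label, transfer in GBytes, interval in seconds) taken from the runs above
    runs = [
        ("MTU=1500", 3.98, 10.0),
        ("MTU=2000", 5.35, 10.0),
        ("MTU=4092", 8.00, 79.5),
    ]
    for label, gbytes, seconds in runs:
        print(f"{label}: {iperf_gbits_per_sec(gbytes, seconds):.3f} Gbits/sec")
```

Note that the MTU=4092 runs report an interval of 79.5 seconds, while the MTU=1500 and MTU=2000 runs finished in 10.0 seconds.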


All the testing has been done with iperf. Any suggestions as to why I get slower speeds with MTU=4092 than with MTU=2000? AFAIK throughput should increase with a higher MTU (I can see this trend in the difference between MTU=1500 and MTU=2000). And in general I am not seeing the performance I was expecting: it should be around 8 Gbit/s and it is barely reaching 4.
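
To put the results side by side, here is a rough Python sketch that averages the iperf runs per MTU and compares them with the link's data rate. It assumes the usual 8b/10b encoding for 4x DDR, so 20 Gbit/s of signalling leaves 16 Gbit/s for data. That is only a link-level ceiling, not a prediction of what IPoIB over TCP should reach.

```python
from statistics import mean

# Bandwidth figures (Gbits/sec) copied from the iperf runs in this post.
RESULTS_GBITS = {
    1500: [3.42, 3.89, 3.15, 3.88, 4.04, 3.90],
    2000: [4.59, 4.66, 3.79, 3.81, 4.62],
    4092: [0.864, 0.864, 0.864],
}

SIGNAL_RATE_GBITS = 20.0  # 4x DDR signalling rate, as reported by ibstat


def data_rate_gbits(signal_rate: float) -> float:
    """Usable data rate assuming 8b/10b encoding (8 data bits per 10 line bits)."""
    return signal_rate * 8 / 10


def average_by_mtu(results: dict) -> dict:
    """Return the mean measured bandwidth for each MTU."""
    return {mtu: mean(values) for mtu, values in results.items()}


if __name__ == "__main__":
    ceiling = data_rate_gbits(SIGNAL_RATE_GBITS)
    for mtu, avg in sorted(average_by_mtu(RESULTS_GBITS).items()):
        print(f"MTU={mtu}: avg {avg:.2f} Gbits/sec "
              f"({100 * avg / ceiling:.0f}% of {ceiling:.0f} Gbits/sec)")
```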

Any input is welcome :)