InfiniBand: So It Begins...


voodooFX

Active Member
Jan 26, 2014
I have this "simple" setup which is driving me crazy because it no longer works.

Node 1 "STORAGE"
OS: Ubuntu 14.04 (server)
IB: Mellanox ConnectX 2
Software installed: Mellanox OFED 2.3-2.0.0 (ubuntu14.04-x86_64)

Node 2 "HYPERV1"
OS: ESXi 5.5
IB: Mellanox ConnectX 2
Software installed: MLNX-OFED-ESX-1.8.2.0, mlx4_en-mlnx-1.6.1.2-471530, ib-opensm-3.3.16-64.x86_64

No IB Switch, the nodes are directly connected.

THE PROBLEM

On the first day everything worked well: I was able to export an NFS share from the storage node to the ESXi host and install a VM on it with great performance.
On the second day I had to reboot the storage node, and after that I could no longer get the connection to work. No ibping, no ping.

Here are some details


STORAGE NODE

Code:
root@storage:~#ibv_devinfo
hca_id:    mlx4_0
    transport:            InfiniBand (0)
    fw_ver:                2.9.1000
    node_guid:            0002:c903:000d:1c08
    sys_image_guid:            0002:c903:000d:1c0b
    vendor_id:            0x02c9
    vendor_part_id:            26428
    hw_ver:                0xB0
    board_id:            MT_0D81120009
    phys_port_cnt:            2
        port:    1
            state:            PORT_DOWN (1)
            max_mtu:        4096 (5)
            active_mtu:        4096 (5)
            sm_lid:            0
            port_lid:        0
            port_lmc:        0x00
            link_layer:        InfiniBand

        port:    2
            state:            PORT_ACTIVE (4)
            max_mtu:        4096 (5)
            active_mtu:        4096 (5)
            sm_lid:            1
            port_lid:        2
            port_lmc:        0x00
            link_layer:        InfiniBand
Code:
root@storage:~# hca_self_test.ofed

---- Performing Adapter Device Self Test ----
Number of CAs Detected ................. 1
PCI Device Check ....................... PASS
Kernel Arch ............................ x86_64
Host Driver Version .................... MLNX_OFED_LINUX-2.3-2.0.0: 3.13.0-32-generic
Host Driver RPM Check .................. PASS
Firmware on CA #0 HCA .................. v2.9.1000
Firmware Check on CA #0 (HCA) .......... NA
    REASON: NO required fw version
Host Driver Initialization ............. PASS
Number of CA Ports Active .............. 1
Port State of Port #1 on CA #0 (HCA)..... DOWN (InfiniBand)
Port State of Port #2 on CA #0 (HCA)..... UP 4X QDR (InfiniBand)
Error Counter Check on CA #0 (HCA)...... PASS
Kernel Syslog Check .................... PASS
Node GUID on CA #0 (HCA) ............... 00:02:c9:03:00:0d:1c:08
------------------ DONE ---------------------
Code:
root@storage:~# ibstat
CA 'mlx4_0'
    CA type: MT26428
    Number of ports: 2
    Firmware version: 2.9.1000
    Hardware version: b0
    Node GUID: 0x0002c903000d1c08
    System image GUID: 0x0002c903000d1c0b
    Port 1:
        State: Down
        Physical state: Polling
        Rate: 10
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x02510868
        Port GUID: 0x0002c903000d1c09
        Link layer: InfiniBand
    Port 2:
        State: Active
        Physical state: LinkUp
        Rate: 40
        Base lid: 2
        LMC: 0
        SM lid: 1
        Capability mask: 0x02510868
        Port GUID: 0x0002c903000d1c0a
        Link layer: InfiniBand
Code:
root@storage:~# ibstatus
Infiniband device 'mlx4_0' port 1 status:
    default gid:    fe80:0000:0000:0000:0002:c903:000d:1c09
    base lid:    0x0
    sm lid:        0x0
    state:        1: DOWN
    phys state:    2: Polling
    rate:        10 Gb/sec (4X)
    link_layer:    InfiniBand

Infiniband device 'mlx4_0' port 2 status:
    default gid:    fe80:0000:0000:0000:0002:c903:000d:1c0a
    base lid:    0x2
    sm lid:        0x1
    state:        4: ACTIVE
    phys state:    5: LinkUp
    rate:        40 Gb/sec (4X QDR)
    link_layer:    InfiniBand

root@storage:~#
Code:
root@storage:~# ibhosts
Ca    : 0x0002c903000d246c ports 2 "hyperv1.home.lan HCA-1"
Ca    : 0x0002c903000d1c08 ports 2 "storage HCA-1"
Code:
root@storage:~# ifconfig ib1
ib1       Link encap:UNSPEC  HWaddr A0-00-02-20-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:10.0.1.11  Bcast:10.0.1.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c903:d:1c0a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:48 errors:0 dropped:15 overruns:0 frame:0
          TX packets:41 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1024
          RX bytes:2954 (2.9 KB)  TX bytes:3288 (3.2 KB)
Code:
root@storage:~# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         ibox.home.lan   0.0.0.0         UG    0      0        0 eth0
10.0.1.0        *               255.255.255.0   U     0      0        0 ib0
10.0.1.0        *               255.255.255.0   U     0      0        0 ib1
192.168.1.0     *               255.255.255.0   U     0      0        0 eth0
Code:
root@storage:~# ping 10.0.1.21
PING 10.0.1.21 (10.0.1.21) 56(84) bytes of data.
From 10.0.1.10 icmp_seq=1 Destination Host Unreachable
From 10.0.1.10 icmp_seq=2 Destination Host Unreachable
From 10.0.1.10 icmp_seq=3 Destination Host Unreachable
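
Side note: the routing table above has two identical 10.0.1.0/24 routes, one on ib0 and one on ib1, so a plain ping may well go out the port that is DOWN. A minimal sketch, assuming the standard iputils ping, of pinning the test to each interface:

Code:
# Two identical 10.0.1.0/24 routes exist (ib0 and ib1); force the outgoing interface
# so each port is tested separately instead of whichever route the kernel picks first.
ping -I ib1 10.0.1.21   # via ib1 (port 2, the ACTIVE port in the output above)
ping -I ib0 10.0.1.21   # via ib0 (port 1, currently DOWN)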


ESXi NODE

Code:
~ # /opt/opensm/bin/ibstat
CA 'mlx4_0'
    CA type: MT26428
    Number of ports: 2
    Firmware version: 2.9.1000
    Hardware version: b0
    Node GUID: 0x0002c903000d246c
    System image GUID: 0x0002c903000d246f
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 40
        Base lid: 1
        LMC: 0
        SM lid: 1
        Capability mask: 0x0251086a
        Port GUID: 0x0002c903000d246d
        Link layer: InfiniBand
    Port 2:
        State: Down
        Physical state: Polling
        Rate: 68
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x0251086a
        Port GUID: 0x0002c903000d246e
        Link layer: InfiniBand
~ #
Code:
~ # esxcli network ip interface ipv4 get
Name  IPv4 Address  IPv4 Netmask   IPv4 Broadcast  Address Type  DHCP DNS
----  ------------  -------------  --------------  ------------  --------
vmk0  192.168.1.10  255.255.255.0  192.168.1.255   STATIC           false
vmk1  10.0.1.21     255.255.255.0  10.0.1.255      STATIC           false
~ #
Code:
~ # esxcli network nic list
Name       PCI Device     Driver  Link  Speed  Duplex  MAC Address         MTU  Description                                                                
---------  -------------  ------  ----  -----  ------  -----------------  ----  ------------------------------------------------------------------------------
vmnic0     0000:003:00.0  e1000e  Up     1000  Full    00:25:90:06:ca:16  1500  Intel Corporation 82574L Gigabit Network Connection                        
vmnic1     0000:004:00.0  e1000e  Down      0  Half    00:25:90:06:ca:17  1500  Intel Corporation 82574L Gigabit Network Connection                        
vmnic2     0000:007:00.0  e1000e  Up     1000  Full    00:15:17:d6:c6:26  1500  Intel Corporation 82571EB Gigabit Ethernet Controller                      
vmnic3     0000:007:00.1  e1000e  Up     1000  Full    00:15:17:d6:c6:27  1500  Intel Corporation 82571EB Gigabit Ethernet Controller                      
vmnic_ib0  0000:005:00.0          Up    40000  Full    00:02:c9:0d:24:6d  2044  Mellanox Technologies MT26428 [ConnectX VPI - 10GigE / IB QDR, PCIe 2.0 5GT/s]
vmnic_ib1  0000:005:00.0          Down      0  Half    00:02:c9:0d:24:6e  1500  Mellanox Technologies MT26428 [ConnectX VPI - 10GigE / IB QDR, PCIe 2.0 5GT/s]
~ #
Code:
~ # esxcli network vswitch standard list
vSwitch0
   Name: vSwitch0
   Class: etherswitch
   Num Ports: 1792
   Used Ports: 6
   Configured Ports: 128
   MTU: 1500
   CDP Status: listen
   Beacon Enabled: false
   Beacon Interval: 1
   Beacon Threshold: 3
   Beacon Required By:
   Uplinks: vmnic3, vmnic2
   Portgroups: VM Network, Management Network

vSwitch1
   Name: vSwitch1
   Class: etherswitch
   Num Ports: 1792
   Used Ports: 4
   Configured Ports: 128
   MTU: 2044
   CDP Status: listen
   Beacon Enabled: false
   Beacon Interval: 1
   Beacon Threshold: 3
   Beacon Required By:
   Uplinks: vmnic_ib0
   Portgroups: ib
~ #
Code:
~ # ping 10.0.1.11
PING 10.0.1.11 (10.0.1.11): 56 data bytes

--- 10.0.1.11 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
~ #


Additional info:
- opensm is not started on the storage node because it is already running on the ESXi node.
- both cards have the latest firmware
- I was not able to get ibping working with either the -G or the -L parameter (see the sketch below)
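
For reference, a minimal sketch (assuming the standard OFED infiniband-diags and opensm tools on the Ubuntu side) of how to check whether a subnet manager is actually serving the active port, and how ibping is normally used with a responder on the far end:

Code:
# Ask the subnet manager on the fabric attached to port 2 of mlx4_0;
# an "SM lid" of 0 in ibstat means no SM is managing that port's fabric.
sminfo -C mlx4_0 -P 2
ibstat mlx4_0 2 | grep "SM lid"

# ibping needs a responder: start the server side on one node first,
# then ping it by LID (-L) or port GUID (-G) from the other node.
ibping -S                       # on the target node
ibping -L 2                     # from the other node, using the target's base LID
# ibping -G 0x0002c903000d1c0a  # or by the target port's GUID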
 

sag

Member
Apr 26, 2013
I don't understand much of that output, but maybe this will help: is opensm running on both InfiniBand ports, or only on the first one, which is down?

Could the MTU size also be causing issues? I had a problem when I increased the MTU in my setup.

edit: added mtu question
 

voodooFX

Active Member
Jan 26, 2014
Hi sag

As you can see from the output, both interfaces have the same MTU:

ESXi:
vmnic_ib0 0000:005:00.0 Up 40000 Full 00:02:c9:0d:24:6d 2044 Mellanox Technologies MT26428 [ConnectX VPI - 10GigE / IB QDR, PCIe 2.0 5GT/s]

Storage:
UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1


As for opensm, it is (or at least should be :D) configured to run on both ports, but in any case on the ESXi node (where opensm is running) the port in use is the first one, so..

P.S. I'm completely new to IB too; I started three days ago from the very beginning.
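
For completeness, a rough sketch of how the MTU could be checked and aligned on both sides (the interface and vSwitch names are taken from the output above; treat the exact commands as an assumption, not a verified fix):

Code:
# Storage node (Ubuntu): show the IPoIB MTU; in datagram mode it tops out at 2044.
ip link show ib1

# ESXi node: both the vSwitch and the vmkernel interface carry an MTU setting.
esxcli network vswitch standard set -v vSwitch1 -m 2044
esxcli network ip interface set -i vmk1 -m 2044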
 

voodooFX

Active Member
Jan 26, 2014
OK, this is crazy :eek:
Moving the link to ib0, it works!! :eek::eek:

Now the fun part will be when I connect the second ESXi node, on ib1 o_O

Stay tuned..

Ah, sag, you are my hero of the day. Thank you, man!
 

CreoleLakerFan

Active Member
Oct 29, 2013
Additional info:
- opensm is not started on the storage node because it is already running on the ESXi node.
- both cards have the latest firmware
- I was not able to get ibping working with either the -G or the -L parameter
Unsure if you are still searching for a resolution, but to the best of my knowledge, OpenSM needs to be running on each host with an IB adapter.
 

dba

Moderator
Feb 20, 2012
San Francisco Bay Area, California, USA
You should only need one subnet manager in the entire fabric; the others will detect it and go into passive mode to act as backups.
One quirky thing about IB without a switch: if you connect two servers with two point-to-point links, you have two different fabrics (one per port) and need to run two subnet managers. The other quirky thing is that if you run a subnet manager on only one of the servers, it will work... unless you boot up in the wrong order. My solution in the point-to-point case is to run a subnet manager on each port of each server.
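
As a concrete illustration of that last point, a minimal sketch (assuming the stock opensm package on the Ubuntu node; the GUIDs are the storage node's port GUIDs from the ibstat output above) of running one subnet manager instance per local port:

Code:
# One opensm instance per port, each bound to a port GUID with -g and daemonized with -B.
# With two point-to-point links this covers both fabrics regardless of boot order.
/usr/sbin/opensm -B -g 0x0002c903000d1c09   # fabric hanging off port 1 (ib0)
/usr/sbin/opensm -B -g 0x0002c903000d1c0a   # fabric hanging off port 2 (ib1)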
 

epicurean

Active Member
Sep 29, 2014
Sorry to bring up an old subject about subnet managers.
If you have a choice of installing one on ESXi or in one of the VMs (e.g. Ubuntu, Windows Server 2012), which is the better option? This is for ESXi 6.0 U3.