Help getting ConnectX-3 working in SR-IOV under Proxmox VE.

groove

Member
Sep 21, 2011
Hi Folks



Need help getting SR-IOV (InfiniBand) to work with Proxmox VE. I've installed MLNX_OFED_LINUX 4.3-1.0.1.0 on both the host (Proxmox VE 5.4-3) and the guest (Ubuntu 18.04). I've enabled SR-IOV in the card firmware and in the MLNX_OFED driver, and got to the point where the VFs are listed in lspci:
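For anyone retracing this, SR-IOV on ConnectX-3 is enabled through mlx4_core module options; a minimal sketch of what "SR-IOV on the MLNX_OFED driver" means here (the num_vfs value and the file name are just examples, adjust for your setup):

```shell
# /etc/modprobe.d/mlx4_core.conf (file name is an example; any modprobe.d file works)
# num_vfs=8           : number of Virtual Functions to create
# port_type_array=1,1 : both ports in InfiniBand mode (1 = IB, 2 = Ethernet)
# probe_vf=0          : don't bind the VFs on the host, leaving them free for passthrough
options mlx4_core num_vfs=8 port_type_array=1,1 probe_vf=0
```

After reloading mlx4_core (or rebooting), the VFs should show up in lspci as below.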

lspci -vvv | grep Mel

0a:00.0 Infiniband controller: Mellanox Technologies MT27500 Family [ConnectX-3]
Subsystem: Mellanox Technologies MT27500 Family [ConnectX-3]

0a:00.1 Infiniband controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
Subsystem: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

0a:00.2 Infiniband controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
Subsystem: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

0a:00.3 Infiniband controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
Subsystem: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

0a:00.4 Infiniband controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
Subsystem: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

0a:00.5 Infiniband controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
Subsystem: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

0a:00.6 Infiniband controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
Subsystem: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

0a:00.7 Infiniband controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
Subsystem: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
0a:01.0 Infiniband controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
Subsystem: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]


Running ibstat shows that the card is active on the host and there’s link on the port that has a cable attached to it:

CA 'mlx4_1'
    CA type: MT4099
    Number of ports: 2
    Firmware version: 2.42.5000
    Hardware version: 1
    Node GUID: 0x0010e000015aeb70
    System image GUID: 0x0010e000015aeb73
    Port 1:
        State: Down
        Physical state: Polling
        Rate: 10
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x02514868
        Port GUID: 0x0010e000015aeb71
        Link layer: InfiniBand
    Port 2:
        State: Active
        Physical state: LinkUp
        Rate: 40
        Base lid: 7
        LMC: 0
        SM lid: 1
        Capability mask: 0x02514868
        Port GUID: 0x0010e000015aeb72
        Link layer: InfiniBand

Next, I added the following to the corresponding VM config:

root@pve:~# cat /etc/pve/qemu-server/103.conf
agent: 1
bootdisk: scsi0
cores: 2
hostpci0: 0a:00.1
ide2: ISO:iso/ubuntu-18.04.1-server-amd64.iso,media=cdrom
memory: 2028
name: ansible-host
net0: virtio=22:61:FF:07:36:9C,bridge=vmbr0
numa: 0
ostype: l26
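For completeness, PCI passthrough like the hostpci0 line above requires the IOMMU enabled on the Proxmox host; the usual setup is something like this (Intel flag shown, assuming an Intel CPU; AMD hosts use amd_iommu=on instead):

```shell
# /etc/default/grub on the Proxmox host: turn on the IOMMU (Intel flag shown)
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"

# /etc/modules: load the vfio modules used for PCI passthrough
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

# apply and reboot:
# update-grub && reboot
```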

The guest boots up fine and I can list the VF with lspci etc., but ibstat / ip addr keep showing the ports as 'down':

root@ansible-host:~# ibstat
CA 'mlx4_0'
    CA type: MT4100
    Number of ports: 2
    Firmware version: 2.42.5000
    Hardware version: 1
    Node GUID: 0x0014050007731557
    System image GUID: 0x0010e000015aeb73
    Port 1:
        State: Down
        Physical state: Polling
        Rate: 10
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x02514868
        Port GUID: 0x867fac23b427e626
        Link layer: InfiniBand
    Port 2:
        State: Down
        Physical state: LinkUp
        Rate: 10
        Base lid: 7
        LMC: 0
        SM lid: 1
        Capability mask: 0x02514868
        Port GUID: 0xba9de921b39e35b3
        Link layer: InfiniBand


root@ansible-host:~# ip a
3: ib0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 4092 qdisc mq state DOWN group default qlen 256
    link/infiniband a0:00:0a:18:fe:80:00:00:00:00:00:00:86:7f:ac:23:b4:27:e6:26 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    inet 10.0.1.92/8 brd 10.255.255.255 scope global ib0
       valid_lft forever preferred_lft forever
4: ib1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 4092 qdisc mq state DOWN group default qlen 256
    link/infiniband a0:00:0a:58:fe:80:00:00:00:00:00:00:ba:9d:e9:21:b3:9e:35:b3 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    inet 10.0.10.92/8 brd 10.255.255.255 scope global ib1
       valid_lft forever preferred_lft forever

root@ansible-host:~# ibdev2netdev
mlx4_0 port 1 ==> ib0 (Down)
mlx4_0 port 2 ==> ib1 (Down)


I'm trying to get IPoIB and NFS over RDMA working in the guest over SR-IOV. I'd really appreciate some help figuring out what I'm missing in my setup.

Thanks in advance.

G
 

groove

OK, it looks like I've managed to resolve this; leaving it here in case someone else runs into it.

It turns out the Subnet Manager I was running on my switch (IS3035) was too old a version and did not recognize the Virtual Function (VF) instances. Once I stopped that SM and ran the OpenSM included with the MLNX_OFED install on the Proxmox VE host, the port state went to Active and I now have connectivity to the rest of the InfiniBand fabric.
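For anyone following along, switching to the host-side OpenSM was roughly this (opensmd is the init script MLNX_OFED ships; the port GUID is the one from ibstat above):

```shell
# on the Proxmox VE host: run the subnet manager locally instead of on the switch
/etc/init.d/opensmd start

# or run opensm in the foreground, bound to a specific port GUID:
# opensm -g 0x0010e000015aeb72

# sminfo (from infiniband-diags) should now report the local SM
sminfo
```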

Now on to testing the performance of NFS over RDMA from the guest OS....
 

groove

OK, no dice with this setup. It seems RDMA over VFs doesn't work (at least with InfiniBand) on Proxmox VE.

I tried both NFS over RDMA and iSER; in both cases, connecting to the back-end storage just hangs.

With NFS over RDMA it hangs during the mount; the steps I'm performing are:

/etc/init.d/openibd restart
modprobe xprtrdma
mount -o rdma,port=20049 10.0.1.1:/data /mnt
The mount never returns (I waited 15 minutes before rebooting the guest).
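Before blaming the NFS layer, it may be worth isolating RDMA itself with rping from rdma-core (librdmacm-utils). The IP below is the storage address from the mount command, and this assumes the target can run rping; otherwise, test between two Linux guests on the fabric:

```shell
# on the RDMA target (10.0.1.1): start an rping server
rping -s -v

# in the guest: exchange a few RDMA transfers with the target
# -c = client mode, -a = server address, -C 4 = stop after 4 transfers, -v = print data
rping -c -v -a 10.0.1.1 -C 4
```

If rping also hangs, the problem is in the RDMA path over the VF rather than in NFS or iSER.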

I also tried iSER:

modprobe ib_iser
iscsiadm -m discovery -t st -p 10.0.0.1:3260
iscsiadm -m node -T <iqn of solaris host> -o update -n iface.transport_name -v iser
iscsiadm -m node -l
Again, the last iscsiadm command does not return; I waited 15 minutes before aborting it. Subsequent attempts to log in or out with iscsiadm give 'internal error' and fail.

Calling on other gurus who have done this to shed some light on where to look. My setup is not extensive: an Oracle Solaris 3 based 'NAS/SAN' with a ConnectX-3 InfiniBand card. I'm trying this on a Proxmox VE host (after running all apt updates) and an Ubuntu 18.04 guest (also fully updated). I've moved my subnet manager to the Proxmox VE host (see the note above) to get the VF to connect to the fabric.

Thanks in advance.