ConnectX-4 SR-IOV issues on Xeon E3-1270v2

rvdm

New Member
Jan 9, 2022
7
0
1
I'm having major issues getting VFs to work on an MCX4121A-ACA_Ax connected to an E3-1270v2 on a Supermicro X9-SCM motherboard.

BIOS & card config is no issue, but the moment I try to assign a VF to a VM in Proxmox, the VM fails to start, saying it "Cannot bind 0000:02:00.2 to vfio".

Additionally, I noticed I can only add VFs on the first port; trying to add VFs to the second port results in the following kernel errors:

[ 123.769976] mlx5_core 0000:02:00.1: E-Switch: Enable: mode(LEGACY), nvfs(2), active vports(3)
[ 123.878692] pci 0000:02:02.1: [15b3:1016] type 7f class 0xffffff
[ 123.878732] pci 0000:02:02.1: unknown header type 7f, ignoring device
[ 124.902663] mlx5_core 0000:02:00.1: mlx5_sriov_enable:157:(pid 2517): pci_enable_sriov failed : -5
[ 124.903107] mlx5_core 0000:02:00.1: E-Switch: Disable: mode(LEGACY), nvfs(2), active vports(3)

I did try different MLX firmware versions, which didn't make a difference. I did notice all VFs and all NIC ports are in the same IOMMU group, but I'm not entirely sure whether that's an issue or not.
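
For anyone wanting to check the grouping on their own host, here is a minimal sketch that walks sysfs and prints which PCI addresses share an IOMMU group. The function name and the optional directory argument are assumptions added for illustration (the argument only exists so the function can be pointed at a test directory). As far as I know, VFIO treats the group as the unit of isolation, so a VF that shares a group with the PF usually cannot be handed to a VM on its own.

```shell
# list_iommu_groups [GROUPS_DIR]
# Prints one "group N: ADDRESS" line per device.
# GROUPS_DIR defaults to the real sysfs location; overriding it is
# only meant for testing (hypothetical parameter, not a kernel feature).
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for group in "$root"/*; do
        # Skip the unexpanded glob when the directory is empty or missing.
        [ -d "$group/devices" ] || continue
        for dev in "$group"/devices/*; do
            [ -e "$dev" ] || continue
            printf 'group %s: %s\n' "${group##*/}" "${dev##*/}"
        done
    done
}

# Usage on a real host:
#   list_iommu_groups | grep 0000:02:00
```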
 

llowrey

Active Member
Feb 26, 2018
177
153
43
You need to enable ARI in the BIOS, otherwise the full set of VFs can't be properly enumerated. I ran into this with an EPYC system, so it might be different with a Xeon.
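
To illustrate why ARI matters for enumeration, here is a rough sketch of the VF address arithmetic from the SR-IOV capability: the N-th VF sits at the PF's routing ID plus First VF Offset plus (N-1) times VF Stride. Without ARI only three bits are available for the function number, so anything past function 7 spills into a non-zero device number. The function name is made up, and the offset of 16 in the usage line is a hypothetical value chosen only because it reproduces the 02:02.1 address from the kernel log; the real offset and stride are reported in the SR-IOV capability section of lspci -vvv.

```shell
# vf_address BUS PF_DEVFN FIRST_VF_OFFSET VF_STRIDE N
# Prints bus:device.function of the N-th VF (N starts at 1), decoded the
# way a non-ARI hierarchy sees it (5-bit device, 3-bit function).
vf_address() {
    # Routing ID = bus number in the high byte, devfn in the low byte.
    rid=$(( (($1 << 8) | $2) + $3 + ($5 - 1) * $4 ))
    printf '%02x:%02x.%x\n' \
        $(( (rid >> 8) & 0xff )) \
        $(( (rid >> 3) & 0x1f )) \
        $(( rid & 7 ))
}

# Usage (hypothetical offset/stride values):
#   vf_address 0x02 1 16 1 1    # prints 02:02.1
```

A device number other than 0 below a root port is generally not reachable without ARI, which would fit the "unknown header type 7f" message.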


Next, you may have an issue where the host OS is binding to the VFs. I had to blacklist the VF PCI IDs so the OS wouldn't touch them and thus they would be free for QEMU to bind to them.

Your /etc/modprobe.d/vfio.conf would look like this:

Code:
options vfio-pci ids=15b3:1014
Here are the IDs via lspci:

Code:
2d:00.0 Ethernet controller [0200]: Mellanox Technologies MT27700 Family [ConnectX-4] [15b3:1013]
2d:00.1 Ethernet controller [0200]: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function] [15b3:1014]
You also need to add rd.driver.pre=vfio-pci to your kernel command-line in order to get the VFs blacklisted early.
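
A quick way to see whether the host has grabbed a VF is to look at the driver symlink in sysfs. This is only a sketch; the function name and the optional second argument (an alternate sysfs directory, useful for testing) are assumptions, not part of any tool.

```shell
# pci_driver ADDRESS [DEVICES_DIR]
# Prints the name of the driver bound to a PCI device, or "none".
# DEVICES_DIR defaults to /sys/bus/pci/devices (override is hypothetical,
# for testing only).
pci_driver() {
    dev="${2:-/sys/bus/pci/devices}/$1"
    if [ -L "$dev/driver" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/vfio-pci
        basename "$(readlink "$dev/driver")"
    else
        echo "none"
    fi
}

# Usage on a real host:
#   pci_driver 0000:02:00.2
```

If this prints mlx5_core for a VF you intend to pass through, the host driver got there first.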
 

rvdm

I meant that the BIOS and card configuration didn't throw any errors. That indeed doesn't prove they aren't the root cause. I have no other SR-IOV devices in the system.

IOMMU seems to be enabled:

[ 0.000000] Command line: initrd=\EFI\proxmox\5.13.19-2-pve\initrd.img-5.13.19-2-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs intel_iommu=on iommu=pt
[ 0.011548] ACPI: DMAR 0x00000000CE65DD20 000078 (v01 INTEL SNB 00000001 INTL 00000001)
[ 0.011575] ACPI: Reserving DMAR table memory at [mem 0xce65dd20-0xce65dd97]
[ 0.063197] Kernel command line: initrd=\EFI\proxmox\5.13.19-2-pve\initrd.img-5.13.19-2-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs intel_iommu=on iommu=pt
[ 0.063243] DMAR: IOMMU enabled
[ 0.156178] DMAR: Host address width 36
[ 0.156180] DMAR: DRHD base: 0x000000fed90000 flags: 0x1
[ 0.156185] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap c9008020660262 ecap f010da
[ 0.156188] DMAR: RMRR base: 0x000000ce4cc000 end: 0x000000ce4d8fff
[ 0.156191] DMAR-IR: IOAPIC id 2 under DRHD base 0xfed90000 IOMMU 0
[ 0.156193] DMAR-IR: HPET id 0 under DRHD base 0xfed90000
[ 0.156195] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[ 0.156407] DMAR-IR: Enabled IRQ remapping in x2apic mode
[ 0.242977] iommu: Default domain type: Passthrough (set via kernel command line)
[ 0.266684] DMAR: No ATSR found
[ 0.266686] DMAR: No SATC found
[ 0.266688] DMAR: dmar0: Using Queued invalidation
The card has SR-IOV enabled; I used the command below and rebooted:
mstconfig -d 02:00.0 set SRIOV_EN=1 NUM_OF_VFS=15

I enable VFs with:
echo 2 > /sys/bus/pci/devices/0000\:02\:00.0/sriov_numvfs
The second-port attempt would be:
echo 2 > /sys/bus/pci/devices/0000\:02\:00.1/sriov_numvfs
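
A slightly more defensive version of the same step, as a sketch: it compares the request against sriov_totalvfs and resets the count to 0 first, since the kernel refuses to go from one non-zero VF count straight to another. The function name and the optional third argument (an alternate sysfs directory for testing) are assumptions.

```shell
# set_numvfs PF_ADDRESS COUNT [DEVICES_DIR]
# Writes COUNT to the PF's sriov_numvfs after a sanity check.
# DEVICES_DIR defaults to /sys/bus/pci/devices (override is hypothetical).
set_numvfs() {
    pf="${3:-/sys/bus/pci/devices}/$1"
    total=$(cat "$pf/sriov_totalvfs") || return 1
    if [ "$2" -gt "$total" ]; then
        echo "requested $2 VFs but $1 supports at most $total" >&2
        return 1
    fi
    # Reset to 0 first; a direct non-zero to non-zero change is rejected.
    echo 0 > "$pf/sriov_numvfs" && echo "$2" > "$pf/sriov_numvfs"
}

# Usage on a real host (as root):
#   set_numvfs 0000:02:00.0 2
```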
 

rvdm

Thanks llowrey.

I disabled autoprobe for VFs, so I think that also stops the host from binding the VFs:
echo 0 > /sys/bus/pci/devices/0000\:01\:00.0/sriov_drivers_autoprobe

But you're probably onto something wrt ARI: lspci -vvv shows ARIFwd- on the root port, which I guess means it isn't enabled or supported.
I scanned through the BIOS options and didn't find anything that mentioned ARI or "smelled" like it.
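
To tell "not supported" apart from "not enabled", it may help to look at which lspci section the flag appears in: as far as I understand, ARIFwd under DevCap2 is the capability and ARIFwd under DevCtl2 is the current setting. Here is a small sketch that separates the two; the function name is made up, and the exact lspci layout differs between pciutils versions, so treat the parsing as an assumption.

```shell
# ari_status
# Reads `lspci -vvv` output for ONE device on stdin and reports whether
# ARI forwarding is supported (DevCap2) and enabled (DevCtl2).
ari_status() {
    awk '
        /DevCap2:/ { section = "cap" }
        /DevCtl2:/ { section = "ctl" }
        /DevSta2:|LnkCap2:|LnkCtl2:|LnkSta2:/ { section = "" }
        match($0, /ARIFwd[+-]/) {
            # The character right after "ARIFwd" is the + or - flag.
            flag = substr($0, RSTART + 6, 1)
            if (section == "cap") supported = flag
            if (section == "ctl") enabled = flag
        }
        END {
            printf "supported=%s enabled=%s\n",
                (supported == "+" ? "yes" : "no"),
                (enabled == "+" ? "yes" : "no")
        }
    '
}

# Usage (as root; 00:01.0 is a placeholder for your root port address):
#   lspci -vvv -s 00:01.0 | ari_status
```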
 

llowrey

Ah right. I still have some left-over config from my CX-3.

Here's my /etc/udev/rules.d/99-sriov.rules file:

Code:
ACTION=="add", SUBSYSTEM=="net",  ATTRS{vendor}=="0x15b3", ATTRS{device}=="0x1013", ATTR{device/sriov_drivers_autoprobe}="0", ATTR{device/sriov_numvfs}="16"
This method configures all devices with the given vendor:device ID, so it would work if you wanted to configure both ports identically.

I can't remember what the EPYC BIOS setting for ARI was called, but it was a bit of an adventure to find it.
 

llowrey

Also, here are the results from 'ip link' with 6 VMs running:

Code:
4: enp194s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether XX:XX:XX:XX:XX:XX brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 52:54:00:XX:XX:29 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 1     link/ether 52:54:00:XX:XX:17 brd ff:ff:ff:ff:ff:ff, vlan 666, spoof checking off, link-state auto, trust off, query_rss off
    vf 2     link/ether 52:54:00:XX:XX:10 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 3     link/ether 52:54:00:XX:XX:3e brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 4     link/ether 52:54:00:XX:XX:e7 brd ff:ff:ff:ff:ff:ff, vlan 666, spoof checking off, link-state auto, trust off, query_rss off
    vf 5     link/ether 52:54:00:aa:03:2c brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 6     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 7     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 8     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 9     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 10     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 11     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 12     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 13     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 14     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 15     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    altname enp194s0np0
I don't know how Proxmox works, but I don't directly use PCI passthrough for these VFs. I have a libvirt network defined that looks like this:
Code:
<network connections='6'>
  <name>Mellanox</name>
  <uuid>a4bcc942-67c9-4942-8e3a-f21863ba7c41</uuid>
  <forward mode='hostdev' managed='yes'>
    <pf dev='enp194s0'/>
    <address type='pci' domain='0x0000' bus='0xc2' slot='0x00' function='0x1'/>
    <address type='pci' domain='0x0000' bus='0xc2' slot='0x00' function='0x2'/>
    <address type='pci' domain='0x0000' bus='0xc2' slot='0x00' function='0x3'/>
    <address type='pci' domain='0x0000' bus='0xc2' slot='0x00' function='0x4'/>
    <address type='pci' domain='0x0000' bus='0xc2' slot='0x00' function='0x5'/>
    <address type='pci' domain='0x0000' bus='0xc2' slot='0x00' function='0x6'/>
    <address type='pci' domain='0x0000' bus='0xc2' slot='0x00' function='0x7'/>
    <address type='pci' domain='0x0000' bus='0xc2' slot='0x01' function='0x0'/>
    <address type='pci' domain='0x0000' bus='0xc2' slot='0x01' function='0x1'/>
    <address type='pci' domain='0x0000' bus='0xc2' slot='0x01' function='0x2'/>
    <address type='pci' domain='0x0000' bus='0xc2' slot='0x01' function='0x3'/>
    <address type='pci' domain='0x0000' bus='0xc2' slot='0x01' function='0x4'/>
    <address type='pci' domain='0x0000' bus='0xc2' slot='0x01' function='0x5'/>
    <address type='pci' domain='0x0000' bus='0xc2' slot='0x01' function='0x6'/>
    <address type='pci' domain='0x0000' bus='0xc2' slot='0x01' function='0x7'/>
    <address type='pci' domain='0x0000' bus='0xc2' slot='0x02' function='0x0'/>
  </forward>
</network>
IIRC, when I originally set this up I only specified the "pf" element and libvirt added the "address" tags. It's been a while so my memory is fuzzy.
 

richardm

Member
Sep 27, 2013
63
28
18
Sorry for necroposting, but I think I've stumbled on a solution for the dreaded "unknown header type 7f, ignoring device" error, at least on my system. ARI is enabled in BIOS (UEFI), but setpci suggests the upstream chipset port does not support ARI (I am using my chipset x4 port).

Regardless, setting the poorly documented PF_NUM_OF_VF_VALID to 1 magically made the elusive SR-IOV VFs spawn successfully.

sudo mstconfig -d 06:00.0 set PF_NUM_OF_VF_VALID=1

After a reboot:

echo 2 | sudo tee /sys/bus/pci/devices/0000:06:00.0/sriov_numvfs
echo 2 | sudo tee /sys/bus/pci/devices/0000:06:00.1/sriov_numvfs


...yields:

06:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
06:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
06:00.2 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
06:00.3 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
06:00.4 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
06:00.5 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
06:00.6 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
06:00.7 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]


0 and 1 are the "native" interfaces, port 1 and 2.
2 and 3 are NPARs (i.e. NUM_OF_PF=4 -- VERY poorly documented)
4 and 5 are SRIOV VFs for port 1
6 and 7 are SRIOV VFs for port 2
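
To confirm which VF belongs to which port rather than inferring it from the function numbers, the virtfnN symlinks under each PF in sysfs can be listed. A minimal sketch follows; the function name and the optional directory argument (for testing) are assumptions.

```shell
# list_vfs PF_ADDRESS [DEVICES_DIR]
# Prints one "PF virtfnN -> VF_ADDRESS" line per VF of the given PF.
# DEVICES_DIR defaults to /sys/bus/pci/devices (override is hypothetical).
list_vfs() {
    pf="${2:-/sys/bus/pci/devices}/$1"
    for link in "$pf"/virtfn*; do
        # Skip the unexpanded glob when the PF has no VFs.
        [ -L "$link" ] || continue
        printf '%s %s -> %s\n' "$1" "${link##*/}" \
            "$(basename "$(readlink "$link")")"
    done
}

# Usage on a real host:
#   list_vfs 0000:06:00.0
#   list_vfs 0000:06:00.1
```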

Hope this helps someone. I've been picking at this off-and-on for months.
 