How to enable SR-IOV on ConnectX-3


elag

Member
Dec 1, 2018
79
14
8
I have a few ConnectX-3 cards (MCX311, Ethernet) and am looking to activate SR-IOV on one of them on my ASRock Z370 Extreme4 motherboard.
I have a few questions for which I cannot find answers (or my Google-fu is lacking):

1- When I try to check the configuration I get:
[root@lair ~]# mlxconfig -d /dev/mst/mt4099_pciconf0 q

Device #1:
----------

Device type: ConnectX3
Device: /dev/mst/mt4099_pciconf0

Configurations: Next Boot
-E- Failed to query device current configuration
The BIOS configuration tool cannot set SR-IOV either.
When I dump the ini file, it does not contain the sriov_en setting.
Is it possible to get the card to support SR-IOV?

2- What is the probability of my motherboard supporting SR-IOV at all?
CentOS 7.6 reports the following for the IOMMU settings:
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-957.10.1.el7.x86_64 root=/dev/mapper/lair1-root ro crashkernel=auto rd.lvm.lv=lair1/root rd.lvm.lv=lair1/swap LANG=en_US.UTF-8 intel_iommu=on iommu=pt
[ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-3.10.0-957.10.1.el7.x86_64 root=/dev/mapper/lair1-root ro crashkernel=auto rd.lvm.lv=lair1/root rd.lvm.lv=lair1/swap LANG=en_US.UTF-8 intel_iommu=on iommu=pt
[ 0.689043] iommu: Adding device 0000:00:00.0 to group 0
[ 0.689058] iommu: Adding device 0000:00:01.0 to group 1
[ 0.689067] iommu: Adding device 0000:00:02.0 to group 2
[ 0.689081] iommu: Adding device 0000:00:14.0 to group 3
[ 0.689089] iommu: Adding device 0000:00:14.2 to group 3
[ 0.689100] iommu: Adding device 0000:00:16.0 to group 4
[ 0.689109] iommu: Adding device 0000:00:17.0 to group 5
[ 0.689122] iommu: Adding device 0000:00:1b.0 to group 6
[ 0.689134] iommu: Adding device 0000:00:1b.4 to group 7
[ 0.689145] iommu: Adding device 0000:00:1c.0 to group 8
[ 0.689157] iommu: Adding device 0000:00:1c.1 to group 9
[ 0.689170] iommu: Adding device 0000:00:1c.4 to group 10
[ 0.689182] iommu: Adding device 0000:00:1c.7 to group 11
[ 0.689193] iommu: Adding device 0000:00:1d.0 to group 12
[ 0.689211] iommu: Adding device 0000:00:1f.0 to group 13
[ 0.689220] iommu: Adding device 0000:00:1f.2 to group 13
[ 0.689229] iommu: Adding device 0000:00:1f.3 to group 13
[ 0.689238] iommu: Adding device 0000:00:1f.4 to group 13
[ 0.689247] iommu: Adding device 0000:00:1f.6 to group 14
[ 0.689253] iommu: Adding device 0000:01:00.0 to group 1
[ 0.689266] iommu: Adding device 0000:02:00.0 to group 15
[ 0.689277] iommu: Adding device 0000:6f:00.0 to group 16
[ 0.689289] iommu: Adding device 0000:70:00.0 to group 17

and lspci reports:
00:00.0 Host bridge: Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers (rev 07)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) (rev 07)
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 630 (Desktop)
00:14.0 USB controller: Intel Corporation 200 Series/Z370 Chipset Family USB 3.0 xHCI Controller
00:14.2 Signal processing controller: Intel Corporation 200 Series PCH Thermal Subsystem
00:16.0 Communication controller: Intel Corporation 200 Series PCH CSME HECI #1
00:17.0 SATA controller: Intel Corporation 200 Series PCH SATA controller [AHCI mode]
00:1b.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #17 (rev f0)
00:1b.4 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #21 (rev f0)
00:1c.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #1 (rev f0)
00:1c.1 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #2 (rev f0)
00:1c.4 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #5 (rev f0)
00:1c.7 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #8 (rev f0)
00:1d.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #9 (rev f0)
00:1f.0 ISA bridge: Intel Corporation Z370 Chipset LPC/eSPI Controller
00:1f.2 Memory controller: Intel Corporation 200 Series/Z370 Chipset Family Power Management Controller
00:1f.3 Audio device: Intel Corporation 200 Series PCH HD Audio
00:1f.4 SMBus: Intel Corporation 200 Series/Z370 Chipset Family SMBus Controller
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-V
01:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981
6f:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 02)
70:00.0 USB controller: ASMedia Technology Inc. ASM2142 USB 3.1 Host Controller

I am pretty new to this stuff, so please be gentle....
/Louis
 

elag

All tools report it as an mt4099:
[root@lair ~]# mst status
MST modules:
------------
MST PCI module loaded
MST PCI configuration module loaded

MST devices:
------------
/dev/mst/mt4099_pciconf0 - PCI configuration cycles access.
domain:bus:dev.fn=0000:01:00.0 addr.reg=88 data.reg=92
Chip revision is: 01
/dev/mst/mt4099_pci_cr0 - PCI direct access.
domain:bus:dev.fn=0000:01:00.0 bar=0xdc100000 size=0x100000
Chip revision is: 01
[root@lair ~]# flint -d /dev/mst/mt4099_pci_cr0 q
Image type: FS2
FW Version: 2.42.5000
FW Release Date: 5.9.2017
Product Version: 02.42.50.00
Rom Info: type=PXE version=3.4.752
Device ID: 4099
Description: Node Port1 Port2 Sys image
GUIDs: 0012312312312345 0012312312312346 0012312312312347 0012312312312348
MACs: 0002c9a166c0 0002c9a166c1
VSD:
PSID: MT_1170110023

[root@lair ~]# mstflint -d 04:00.0 q
-E- Cannot open Device: 04:00.0. No such file or directory. MFE_CR_ERROR
[root@lair ~]# mstflint -d 01:00.0 q
Image type: FS2
FW Version: 2.42.5000
FW Release Date: 5.9.2017
Product Version: 02.42.50.00
Rom Info: type=PXE version=3.4.752
Device ID: 4099
Description: Node Port1 Port2 Sys image
GUIDs: 0012312312312345 0012312312312346 0012312312312347 0012312312312348
MACs: 0002c9a166c0 0002c9a166c1
VSD:
PSID: MT_1170110023
[root@lair ~]#
 

arglebargle

H̸̖̅ȩ̸̐l̷̦͋l̴̰̈ỏ̶̱ ̸̢͋W̵͖̌ò̴͚r̴͇̀l̵̼͗d̷͕̈
Jul 15, 2018
657
244
43
Hey, I know the answer to this one! I spent a week or so trying to make SR-IOV work on unworthy hardware last year. I got close, but without ARI Forwarding most things SR-IOV tended to break.

Before we get into that, let's get you going with the Mellanox tools. Your card is definitely an mt4099; the problem is which tools you're running and how you're addressing it:

Code:
[root@meta src]# mst status
MST modules:
------------
    MST PCI module loaded
    MST PCI configuration module loaded
MST devices:
------------
/dev/mst/mt4099_pciconf0         - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:01:00.0 addr.reg=88 data.reg=92
                                   Chip revision is: 01
/dev/mst/mt4099_pci_cr0          - PCI direct access.
                                   domain:bus:dev.fn=0000:01:00.0 bar=0xefc00000 size=0x100000
                                   Chip revision is: 01
[root@meta src]#
You want to use the PCI direct access device, like so:

Code:
[root@meta src]# mlxconfig -d /dev/mst/mt4099_pci_cr0 q
Device #1:
----------
Device type:    ConnectX3      
Device:         /dev/mst/mt4099_pci_cr0
Configurations:                              Next Boot
         SRIOV_EN                            False(0)      
         NUM_OF_VFS                          24            
         LINK_TYPE_P1                        ETH(2)        
         LINK_TYPE_P2                        ETH(2)        
         LOG_BAR_SIZE                        3              
         BOOT_PKEY_P1                        0              
         BOOT_PKEY_P2                        0              
         BOOT_OPTION_ROM_EN_P1               False(0)      
         BOOT_VLAN_EN_P1                     False(0)      
         BOOT_RETRY_CNT_P1                   0              
         LEGACY_BOOT_PROTOCOL_P1             None(0)        
         BOOT_VLAN_P1                        1              
         BOOT_OPTION_ROM_EN_P2               False(0)      
         BOOT_VLAN_EN_P2                     False(0)      
         BOOT_RETRY_CNT_P2                   0              
         LEGACY_BOOT_PROTOCOL_P2             None(0)        
         BOOT_VLAN_P2                        1              
         IP_VER_P1                           IPv4(0)        
         IP_VER_P2                           IPv4(0)        
         CQ_TIMESTAMP                        True(1)        
[root@meta src]#
Use `flint` instead of `mstflint` (why they include both I will never understand):

Code:
[root@meta src]# flint -d /dev/mst/mt4099_pci_cr0 q
Image type:            FS2
FW Version:            2.42.5000
FW Release Date:       5.9.2017
Product Version:       02.42.50.00
Rom Info:              type=PXE version=3.4.752
Device ID:             4099
Description:           Node             Port1            Port2            Sys image
GUIDs:                 0002c9030039af50 0002c9030039af51 0002c9030039af52 0002c9030039af53
MACs:                                       0002c939af50     0002c939af51
VSD:                  
PSID:                  MT_1090120019
[root@meta src]#
That'll get you going to the point where you can try to make SR-IOV work.

Here's the rest of the problem: You almost certainly don't have ARI Forwarding capability on your motherboard chipset. Run `lspci -vvv | grep ARIFwd` and you'll see something like the following:

Code:
[root@meta src]# lspci -vvv | grep ARIFwd
                DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
                DevCap2: Completion Timeout: Range ABC, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd+
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
                DevCap2: Completion Timeout: Range ABC, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd+
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
...
You can page through the output of `lspci -vvv` and look at what's going on with each device, but your chipset's PCIe bridge will almost certainly list 'ARIFwd-' on its DevCap2 line. That means nothing downstream of that bridge can use ARI Forwarding even if the device itself supports it, which basically screws you for SR-IOV on most devices.
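
Since the plain `grep` drops the device names, here is a rough sketch that keeps them, so you can see which bridge reports what. `ari_status` is a made-up helper name, and the parsing assumes the usual `lspci -vvv` layout where device header lines are unindented and the ARIFwd flag follows a DevCap2 or DevCtl2 label; I have only checked it against output shaped like the sample above.

```bash
#!/usr/bin/env bash
# Sketch: print "<device> <register> ARIFwd+/-" from `lspci -vvv` text on stdin.
# ari_status is a hypothetical helper name, not part of pciutils.
ari_status() {
  awk '
    /^[0-9a-f]/ { dev = $1; reg = "" }   # unindented header line starts a new device
    /DevCap2:/  { reg = "DevCap2" }      # capability register: what the port can do
    /DevCtl2:/  { reg = "DevCtl2" }      # control register: what is enabled right now
    match($0, /ARIFwd[+-]/) { print dev, reg, substr($0, RSTART, RLENGTH) }
  '
}

# Typical use (root is needed to read the full capability list):
#   lspci -vvv | ari_status
```

The line to look for is the DevCap2 entry of the root port your NIC hangs off (00:01.0 in this thread): 'ARIFwd-' there means the port cannot forward ARI at all, while 'ARIFwd-' on DevCtl2 only means it is not switched on at the moment.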

You can actually turn on the SR-IOV features of your Mellanox card and have up to 8 PF and VF devices in total. With the pcie_acs_override kernel hack you can even sort of isolate them well enough to use them for passthrough. When I did this, though, I ran into problems getting the Mellanox drivers to behave when I shut down a guest. I never got a guest to properly release a VF so it could be reused, which meant I had to reboot the host every time I rebooted a guest and flat out killed the point of using SR-IOV on a hypervisor in the first place.
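
For reference, a sketch of what turning it on looks like once mlxconfig can talk to the card. It only prints the two pieces of configuration rather than applying them. `mlx4_sriov_plan` is a made-up helper name, the cap of 7 VFs assumes a single PF within the 8-function limit mentioned above, and `probe_vf=0` is just one reasonable choice (keep the host from binding the VFs itself).

```bash
#!/usr/bin/env bash
# Sketch: print the firmware command and the mlx4_core module option for N VFs.
# Nothing is applied; mlx4_sriov_plan is a hypothetical helper name.
mlx4_sriov_plan() {
  local dev=$1 vfs=$2
  # reject empty or non-numeric VF counts
  case $vfs in
    ''|*[!0-9]*) echo "num_vfs must be a number" >&2; return 1 ;;
  esac
  # 1 PF + 7 VFs = 8 functions, the most a device can expose without ARI
  if [ "$vfs" -lt 1 ] || [ "$vfs" -gt 7 ]; then
    echo "num_vfs must be between 1 and 7 without ARI" >&2
    return 1
  fi
  # firmware side: takes effect after a reboot
  echo "mlxconfig -d $dev set SRIOV_EN=1 NUM_OF_VFS=$vfs"
  # driver side: a line for a file such as /etc/modprobe.d/mlx4_core.conf
  echo "options mlx4_core num_vfs=$vfs probe_vf=0"
}

# Example:
#   mlx4_sriov_plan /dev/mst/mt4099_pci_cr0 7
```

The firmware setting and the driver option both have to agree before any VFs show up in `lspci`.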

https://pcisig.com/sites/default/files/specification_documents/ECN-alt-rid-interpretation-070604.pdf
PCI passthrough via OVMF - ArchWiki
 

elag

Yes, I did. I cannot exclude pilot error, but it was not that one:
[root@lair ~]# mlxconfig -d /dev/mst/mt4099_pci_cr0 q

[root@lair ~]# mst status
MST modules:
------------
MST PCI module loaded
MST PCI configuration module loaded

MST devices:
------------
/dev/mst/mt4099_pciconf0 - PCI configuration cycles access.
domain:bus:dev.fn=0000:01:00.0 addr.reg=88 data.reg=92
Chip revision is: 01
/dev/mst/mt4099_pci_cr0 - PCI direct access.
domain:bus:dev.fn=0000:01:00.0 bar=0xdc100000 size=0x100000
Chip revision is: 01
[root@lair ~]# mlxconfig -d /dev/mst/mt4099_pci_cr0 q

Device #1:
----------

Device type: ConnectX3
Device: /dev/mst/mt4099_pci_cr0

Configurations: Next Boot
-E- Failed to query device current configuration
 

arglebargle

Does flint work? Your firmware could be too old for mlxconfig. Try `flint -d device q` and see what it outputs.
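
If you want just the two fields worth comparing against the release notes, something like this pulls them out of the query output. It is a sketch only: `fw_ident` is a made-up name and it assumes the "FW Version:" and "PSID:" labels that flint prints for these cards.

```bash
#!/usr/bin/env bash
# Sketch: extract firmware version and PSID from `flint ... q` text on stdin.
# fw_ident is a hypothetical helper name.
fw_ident() {
  awk -F': *' '
    { sub(/[ \t]+$/, "", $2) }                 # drop trailing padding from the value
    $1 == "FW Version" { print "fw=" $2 }
    $1 == "PSID"       { print "psid=" $2 }
  '
}

# Typical use:
#   flint -d /dev/mst/mt4099_pci_cr0 q | fw_ident
```

The PSID is what identifies the board variant, so it is the value to match when looking for a firmware image.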
 

elag

The firmware is the latest version, I think:
[root@lair ~]# flint -d /dev/mst/mt4099_pci_cr0 q
Image type: FS2
FW Version: 2.42.5000
FW Release Date: 5.9.2017
Product Version: 02.42.50.00
Rom Info: type=PXE version=3.4.752
Device ID: 4099
Description: Node Port1 Port2 Sys image
GUIDs: 0012312312312345 0012312312312346 0012312312312347 0012312312312348
MACs: 0002c9a166c0 0002c9a166c1
VSD:
PSID: MT_1170110023

Is there another firmware I should try? Thanks for the help!
 

arglebargle

Huh, I'm at a loss.

Here's the first Google result for your error message:
https://community.mellanox.com/s/qu...-failed-to-query-device-current-configuration

This guy used the direct PCI address when running mstconfig:
Configuring SR-IOV for a Mellanox ConnectX-3 NIC | that.guru

You might try that too. Beyond that there seem to be a lot of people with the same problem:
https://www.google.com/search?q=mellanox+mst+"Failed+to+query+device+current+configuration"

I'd try flushing the configuration on the card (the first link). If that doesn't work, you could close the 'flash not present' jumper (it's on the board up near the white I2C header and should be labeled 'FNP' or some such) and flash the card again in failsafe mode. Beyond that someone else will need to chime in; I'm all out of ideas.
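
A sketch of the first step, on the assumption that "flushing the configuration" means mlxconfig's `reset` command (I can't vouch that this is exactly what the linked article does). `reset_nic_config` is a made-up helper name; set DRY_RUN=1 to print the commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch: reset the card's stored configuration to defaults, then query again.
# reset_nic_config is a hypothetical helper; DRY_RUN=1 only prints the commands.
reset_nic_config() {
  local dev=$1
  local run=${DRY_RUN:+echo}          # expands to "echo" in dry-run mode, else to nothing
  $run mlxconfig -d "$dev" -y reset   # -y answers the confirmation prompt
  $run mlxconfig -d "$dev" q          # see whether the query works afterwards
}

# Example:
#   DRY_RUN=1 reset_nic_config /dev/mst/mt4099_pci_cr0
```

If the query still fails after the reset, that points back at the reflash with the jumper closed.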