How to use the mlx5_core driver with Mellanox ConnectX-4 Lx in Debian?


crackelf

Member
Apr 11, 2021
74
6
8
"They work perfectly and don't need a driver download from Mellanox. Debian has, and probably always will have, the mlx kernel driver included. Removing that would be like removing the Intel ixgb driver."
I just got these ConnectX-4 Lx in the mail, but not getting very far.

I've been treating this like the ixgbe driver: I see Kernel driver in use: mlx5_core and am trying to create virtual functions in /etc/modprobe.d/mlx5_core.conf with some options like options mlx5_core num_vfs=12 port_type_array=2 probe_vf=12, but predictably dmesg says it's ignoring those options as they are "unknown".

The docs over at NVIDIA show these as valid options, but only for their MLNX_OFED driver. Any tips and/or resources here? I'm not finding much for the kernel driver other than the kernel.org docs, and those don't say much about VFs. Is the core driver not enough, and if so, how do I enable the EN driver in the kernel? Thanks for any help!
 

prdtabim

Active Member
Jan 29, 2022
184
72
28
Hi,
Use modinfo mlx5_core to see the module options.
There are the mlx5_ib and mlx5_vdpa modules too.

Look here:

 

crackelf

Member
Apr 11, 2021
74
6
8
Huge thank you for the resource! I was able to get mstconfig to spit out some info, but I'm still not seeing the virtual functions as addressable network interfaces with ip a. modinfo didn't give me many options, but maybe I'm reading it wrong. Am I supposed to be able to interact with the driver via modprobe at all?

Here's the card's current config:

Code:
Device type:    ConnectX4LX    
Name:           MCX4131A-GCA_Ax
Description:    ConnectX-4 Lx EN network interface card; 50GbE single-port QSFP28; PCIe3.0 x8; ROHS R6
Device:         03:00.0        

Configurations:                              Next Boot
         MEMIC_BAR_SIZE                      0              
         MEMIC_SIZE_LIMIT                    _256KB(1)      
         FLEX_PARSER_PROFILE_ENABLE          0              
         FLEX_IPV4_OVER_VXLAN_PORT           0              
         ROCE_NEXT_PROTOCOL                  254            
         NON_PREFETCHABLE_PF_BAR             False(0)       
         VF_VPD_ENABLE                       False(0)       
         STRICT_VF_MSIX_NUM                  False(0)       
         VF_NODNIC_ENABLE                    False(0)       
         NUM_OF_VFS                          8              
         SRIOV_EN                            True(1)        
         PF_LOG_BAR_SIZE                     5              
         VF_LOG_BAR_SIZE                     0              
         NUM_PF_MSIX                         63             
         NUM_VF_MSIX                         11             
         INT_LOG_MAX_PAYLOAD_SIZE            AUTOMATIC(0)   
         PCIE_CREDIT_TOKEN_TIMEOUT           0              
         ACCURATE_TX_SCHEDULER               False(0)       
         PARTIAL_RESET_EN                    False(0)       
         SW_RECOVERY_ON_ERRORS               False(0)       
         RESET_WITH_HOST_ON_ERRORS           False(0)       
         PCI_BUS0_RESTRICT_SPEED             PCI_GEN_1(0)   
         PCI_BUS0_RESTRICT_ASPM              False(0)       
         PCI_BUS0_RESTRICT_WIDTH             PCI_X1(0)      
         PCI_BUS0_RESTRICT                   False(0)       
         CQE_COMPRESSION                     BALANCED(0)    
         IP_OVER_VXLAN_EN                    False(0)       
         MKEY_BY_NAME                        False(0)       
         UCTX_EN                             True(1)        
         PCI_ATOMIC_MODE                     PCI_ATOMIC_DISABLED_EXT_ATOMIC_ENABLED(0)
         TUNNEL_ECN_COPY_DISABLE             False(0)       
         LRO_LOG_TIMEOUT0                    6              
         LRO_LOG_TIMEOUT1                    7              
         LRO_LOG_TIMEOUT2                    8              
         LRO_LOG_TIMEOUT3                    13             
         TX_SCHEDULER_BURST                  0              
         LOG_DCR_HASH_TABLE_SIZE             14             
         DCR_LIFO_SIZE                       16384          
         ROCE_CC_PRIO_MASK_P1                255            
         CLAMP_TGT_RATE_AFTER_TIME_INC_P1    True(1)        
         CLAMP_TGT_RATE_P1                   False(0)       
         RPG_TIME_RESET_P1                   300            
         RPG_BYTE_RESET_P1                   32767          
         RPG_THRESHOLD_P1                    1              
         RPG_MAX_RATE_P1                     0              
         RPG_AI_RATE_P1                      5              
         RPG_HAI_RATE_P1                     50             
         RPG_GD_P1                           11             
         RPG_MIN_DEC_FAC_P1                  50             
         RPG_MIN_RATE_P1                     1              
         RATE_TO_SET_ON_FIRST_CNP_P1         0              
         DCE_TCP_G_P1                        1019           
         DCE_TCP_RTT_P1                      1              
         RATE_REDUCE_MONITOR_PERIOD_P1       4              
         INITIAL_ALPHA_VALUE_P1              1023           
         MIN_TIME_BETWEEN_CNPS_P1            4              
         CNP_802P_PRIO_P1                    6              
         CNP_DSCP_P1                         48             
         LLDP_NB_DCBX_P1                     False(0)       
         LLDP_NB_RX_MODE_P1                  OFF(0)         
         LLDP_NB_TX_MODE_P1                  OFF(0)         
         DCBX_IEEE_P1                        True(1)        
         DCBX_CEE_P1                         True(1)        
         DCBX_WILLING_P1                     True(1)        
         KEEP_ETH_LINK_UP_P1                 True(1)        
         KEEP_IB_LINK_UP_P1                  False(0)       
         KEEP_LINK_UP_ON_BOOT_P1             False(0)       
         KEEP_LINK_UP_ON_STANDBY_P1          False(0)       
         DO_NOT_CLEAR_PORT_STATS_P1          False(0)       
         AUTO_POWER_SAVE_LINK_DOWN_P1        False(0)       
         NUM_OF_VL_P1                        _4_VLs(3)      
         NUM_OF_TC_P1                        _8_TCs(0)      
         NUM_OF_PFC_P1                       8              
         VL15_BUFFER_SIZE_P1                 0              
         DUP_MAC_ACTION_P1                   LAST_CFG(0)    
         SRIOV_IB_ROUTING_MODE_P1            LID(1)         
         IB_ROUTING_MODE_P1                  LID(1)         
         PCI_WR_ORDERING                     per_mkey(0)    
         MULTI_PORT_VHCA_EN                  False(0)       
         PORT_OWNER                          True(1)        
         ALLOW_RD_COUNTERS                   True(1)        
         RENEG_ON_CHANGE                     True(1)        
         TRACER_ENABLE                       True(1)        
         IP_VER                              IPv4(0)        
         UEFI_HII_EN                         True(1)        
         BOOT_DBG_LOG                        False(0)       
         UEFI_LOGS                           DISABLED(0)    
         BOOT_INTERRUPT_DIS                  False(0)       
         BOOT_LACP_DIS                       True(1)        
         DYNAMIC_VF_MSIX_TABLE               False(0)       
         EXP_ROM_UEFI_ARM_ENABLE             False(0)       
         EXP_ROM_UEFI_x86_ENABLE             False(0)       
         EXP_ROM_PXE_ENABLE                  True(1)        
         ADVANCED_PCI_SETTINGS               False(0)       
         SAFE_MODE_THRESHOLD                 10             
         SAFE_MODE_ENABLE                    True(1)
Any ideas how to get those 8 VF's to show up for kvm to use? Thanks again!
 

prdtabim

Active Member
Jan 29, 2022
184
72
28
I recently tried this on my card, a ConnectX-3 Pro:
modprobe mlx4_core num_vfs=8,0,0 probe_vf=8,0,0
I noticed that without probe_vf the interfaces are not detected.

But the mlx5_core module doesn't have those options ... :(

Try looking in sysfs. On Intel-based cards the VF interfaces are controlled from sysfs.
Use lspci to find the bus address of the device.

On my system the card is at 32:00.0.
Look here: cd /sys/bus/pci/drivers/mlx5_core

If the driver detected the card correctly, there must be one or more links like
0000:32:00.0 -> ../../../../devices/pci0000:00/0000:00:03.1/0000:32:00.0

Follow the link and you will see many files about the device, including various sriov_ entries.
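The sysfs walk described above can be sketched as a small shell function. This is only an illustration: the function name `list_sriov_attrs` is made up here, and the default path is the one quoted in this post.

```shell
# List every PCI device bound to a driver together with its sriov_* attributes.
# $1: driver directory (hypothetical default: the mlx5_core path from this post).
list_sriov_attrs() {
    drv_dir=${1:-/sys/bus/pci/drivers/mlx5_core}
    for link in "$drv_dir"/*; do
        # Bound devices show up as symlinks named like 0000:32:00.0;
        # other entries (bind, unbind, module, ...) are skipped.
        [ -L "$link" ] || continue
        case $(basename "$link") in
            *:*:*.*) ;;
            *) continue ;;
        esac
        echo "device: $(basename "$link")"
        for attr in "$link"/sriov_*; do
            [ -f "$attr" ] || continue
            printf '  %s = %s\n' "$(basename "$attr")" "$(cat "$attr")"
        done
    done
}

# Usage: list_sriov_attrs
```

If a device prints no sriov_ lines at all, SR-IOV is most likely not enabled in the card's firmware configuration yet.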

Look here too: https://support.mellanox.com/s/arti...ctX-4-ConnectX-5-ConnectX-6-with-KVM-Ethernet

Edit1. Look here: https://community.mellanox.com/s/article/howto-configure-and-probe-vfs-on-mlx5-drivers
 

crackelf

Member
Apr 11, 2021
74
6
8
Yes, mlx5_core not having those module options is exactly where things went wrong for me haha

Now we're getting somewhere with sysfs! Thank you again for your help with this even though you don't have this specific card.
Code:
ls -lh /sys/devices/pci0000\:00/0000\:00\:1d.0/0000\:03\:00.0/ | grep -a "sriov"
-rw-r--r-- 1 root root 4.0K Feb 25 23:09 sriov_drivers_autoprobe
-rw-r--r-- 1 root root 4.0K Feb 25 23:09 sriov_numvfs
-r--r--r-- 1 root root 4.0K Feb 25 23:09 sriov_offset
-r--r--r-- 1 root root 4.0K Feb 25 23:09 sriov_stride
-r--r--r-- 1 root root 4.0K Feb 25 23:09 sriov_totalvfs
-r--r--r-- 1 root root 4.0K Feb 25 23:09 sriov_vf_device
and
Code:
cat /sys/devices/pci0000\:00/0000\:00\:1d.0/0000\:03\:00.0/sriov*
1
0
1
1
8
1016
The weirdest thing happens here... I try to echo 1 > /sys/module/mlx5_core/parameters/probe_vf like the second link mentions and get back -bash: /sys/module/mlx5_core/parameters/probe_vf: Permission denied

This is my first time messing with /sys/devices directly and not through a driver, so I'm not sure if I'm doing something terribly wrong here. Maybe the driver isn't letting me write to the space for whatever reason..? Will keep digging around thank you again for your help. I couldn't find these articles myself, so you're a massive help just pointing me to the right places.
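One possible explanation, offered as a guess rather than a diagnosis: dmesg already reported probe_vf as an unknown option, so the in-kernel mlx5_core probably has no such parameter file, and sysfs does not allow new files to be created by redirecting into them. A minimal sketch that checks for the file before writing (the helper name `write_param` is hypothetical):

```shell
# Write a value to a sysfs-style parameter file and say why it fails.
# $1: parameter file, $2: value.
write_param() {
    file=$1
    value=$2
    if [ ! -e "$file" ]; then
        echo "no such parameter: $file" >&2
        return 2
    fi
    printf '%s\n' "$value" > "$file" || {
        echo "write to $file failed (read-only parameter or not root?)" >&2
        return 3
    }
}

# Usage (as root; with sudo the redirection must run inside the root shell):
#   write_param /sys/module/mlx5_core/parameters/probe_vf 1
```

Listing /sys/module/mlx5_core/parameters/ shows which parameters the loaded module actually exposes.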
 

crackelf

Member
Apr 11, 2021
74
6
8
Got it working!! Thank you thank you thank you. Some combination of this worked.

That article is right in nearly every aspect except for the actual /sys/ path, which was confusing.

ANSWER:
echo 8 > /sys/devices/pci0000\:00/0000\:00\:1d.0/0000\:03\:00.0/sriov_numvfs

Had to make sure a few things were lined up first:

mstconfig -d 03:00.0 set SRIOV_EN=1
and
mstconfig -d 03:00.0 set NUM_OF_VFS=8
were what set these values:
Code:
ls -lh /sys/devices/pci0000\:00/0000\:00\:1d.0/0000\:03\:00.0/ | grep -a "sriov"
-rw-r--r-- 1 root root 4.0K Feb 25 23:09 sriov_drivers_autoprobe
-rw-r--r-- 1 root root 4.0K Feb 25 23:09 sriov_numvfs
-r--r--r-- 1 root root 4.0K Feb 25 23:09 sriov_offset
-r--r--r-- 1 root root 4.0K Feb 25 23:09 sriov_stride
-r--r--r-- 1 root root 4.0K Feb 25 23:09 sriov_totalvfs
-r--r--r-- 1 root root 4.0K Feb 25 23:09 sriov_vf_device
and
Code:
cat /sys/devices/pci0000\:00/0000\:00\:1d.0/0000\:03\:00.0/sriov*
1
0
1
1
8
1016
The firmware NUM_OF_VFS setting ends up as sriov_totalvfs (the maximum) in /sys/, not as sriov_numvfs (the number of VFs currently enabled), which still has to be written by hand.

Thank you again for putting me down the right path!
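The sysfs step from the answer above can be wrapped in a small helper. This is a sketch, not a tested tool: `set_numvfs` is a made-up name, and it assumes the firmware side (SRIOV_EN and NUM_OF_VFS via mstconfig) is already in effect. /sys/bus/pci/devices/0000:03:00.0 is a shorter symlink to the same directory as the long /sys/devices/... path.

```shell
# Enable a number of VFs on a PF through sysfs.
# $1: PF sysfs directory, $2: wanted VF count.
set_numvfs() {
    pf_dir=$1
    want=$2
    total=$(cat "$pf_dir/sriov_totalvfs") || return 1
    current=$(cat "$pf_dir/sriov_numvfs") || return 1
    if [ "$want" -gt "$total" ]; then
        echo "requested $want VFs, but sriov_totalvfs is $total" >&2
        return 1
    fi
    if [ "$current" -eq "$want" ]; then
        return 0
    fi
    # The kernel refuses to change one non-zero count into another,
    # so drop to 0 first when VFs are already enabled.
    if [ "$current" -ne 0 ]; then
        echo 0 > "$pf_dir/sriov_numvfs" || return 1
    fi
    echo "$want" > "$pf_dir/sriov_numvfs"
}

# Usage (as root): set_numvfs /sys/bus/pci/devices/0000:03:00.0 8
```

The bounds check against sriov_totalvfs is there because that file reflects the NUM_OF_VFS value set in firmware.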
 

prdtabim

Active Member
Jan 29, 2022
184
72
28

Good to know that the problem is solved.
Will you use the VFs to attach to VMs?
 

llowrey

Active Member
Feb 26, 2018
170
142
43
My suggestion for configuring VFs at boot is to create /etc/udev/rules.d/99-sriov.rules with this content:

Code:
ACTION=="add", SUBSYSTEM=="net", ATTRS{vendor}=="0x15b3", ATTRS{device}=="0x1013", ATTR{device/sriov_drivers_autoprobe}="0", ATTR{device/sriov_numvfs}="16"
Adjust the number of VFs to suit your needs.
 

crackelf

Member
Apr 11, 2021
74
6
8
That is the plan, the VFs are meant for VMs! I can't get the link to show as connected now, but that may be a different issue. At least the VFs are visible.

I'm using these NICs
"Yes, they're regular 40GbE ports. Instead of expensive, annoying MTP you can grab these BiDi optics and run 40GbE over cheap regular single-mode duplex LC fiber: XQX2502 KAIAM QSFP+40G-LR4 Lite OPTICAL MODULE NEW PULLS | eBay"
with a Brocade ICX 7450-48P and these LC UPC to LC UPC Duplex OS2 Single Mode PVC cables from FS.com, but I get "cable unplugged" and no connectivity:
enp3s0np0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default
same for all the vfs
enp3s0v1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default

I know with my old Intel cards I had to enable unsupported sfp with options ixgbe allow_unsupported_sfp=1

From the Brocade side we get
Code:
1/3/1      Down    None    None None  None  No  1    0   cc4e.2488.3380         
1/4/1      Down    None    None None  None  No  1    0   cc4e.2488.3380
 

crackelf

Member
Apr 11, 2021
74
6
8
I can't get this udev rule working for some reason. Ran udevadm control --reload-rules to make sure it was set but still nothing.

Where did you get the values for the vendor & device attrs?
 

llowrey

Active Member
Feb 26, 2018
170
142
43
Ah... I didn't see that yours is an Lx. Try something like this:

Code:
# lspci -nn | grep Mellanox
c2:00.0 Ethernet controller [0200]: Mellanox Technologies MT27700 Family [ConnectX-4] [15b3:1013]
c2:00.1 Ethernet controller [0200]: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function] [15b3:1014]
...
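The same vendor and device IDs can also be read from sysfs, which avoids copying them from lspci by hand. The sketch below generates the rule line from an interface's sysfs directory. `make_sriov_rule` is a hypothetical helper, and enp3s0np0 is the interface name from earlier in this thread.

```shell
# Print a udev rule that enables VFs for the NIC behind a network interface.
# $1: interface sysfs directory, $2: VF count.
make_sriov_rule() {
    net_dir=$1
    numvfs=$2
    # These files hold the PCI IDs in the 0x.... form that udev's ATTRS match.
    vendor=$(cat "$net_dir/device/vendor") || return 1
    device=$(cat "$net_dir/device/device") || return 1
    printf 'ACTION=="add", SUBSYSTEM=="net", ATTRS{vendor}=="%s", ATTRS{device}=="%s", ATTR{device/sriov_drivers_autoprobe}="0", ATTR{device/sriov_numvfs}="%s"\n' \
        "$vendor" "$device" "$numvfs"
}

# Usage (as root):
#   make_sriov_rule /sys/class/net/enp3s0np0 8 > /etc/udev/rules.d/99-sriov.rules
#   udevadm control --reload-rules
```

The rule only fires on an "add" event, so it takes effect at the next boot or driver reload, not right after the rules are reloaded.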
 

crackelf

Member
Apr 11, 2021
74
6
8
Ah ha, I didn't realize it was the lspci id. The LX ends in 5 instead of 3. Thanks I swapped it out and that now loads on boot! I was just going to bash script out "echo x > /sys/devices/....." but that is much cleaner. Where did you find this?

I'm still stuck on getting the NICs to be recognized to link these things to a switch. If you see anything I'm doing that is obviously wrong please let me know!
 

llowrey

Active Member
Feb 26, 2018
170
142
43
It took A LOT of googling to find the udev solution. I had taken the bash approach but it was an irritant that kept gnawing at me and I just kept searching and working at it.

Are you having trouble getting the nic to link or just VMs being able to use the VFs?

Are you running KVM/libvirt/QEMU VMs? If so, you should have a network that looks like this:

Code:
# virsh net-dumpxml Mellanox
<network connections='7'>
  <name>Mellanox</name>
  <uuid>a4bcc942-67c9-4942-8e3a-f21863ba7c41</uuid>
  <forward mode='hostdev' managed='yes'>
    <pf dev='enp194s0'/>
    <address type='pci' domain='0x0000' bus='0xc2' slot='0x00' function='0x1'/>
    <address type='pci' domain='0x0000' bus='0xc2' slot='0x00' function='0x2'/>
    <address type='pci' domain='0x0000' bus='0xc2' slot='0x00' function='0x3'/>
    <address type='pci' domain='0x0000' bus='0xc2' slot='0x00' function='0x4'/>
    <address type='pci' domain='0x0000' bus='0xc2' slot='0x00' function='0x5'/>
    <address type='pci' domain='0x0000' bus='0xc2' slot='0x00' function='0x6'/>
    <address type='pci' domain='0x0000' bus='0xc2' slot='0x00' function='0x7'/>
    <address type='pci' domain='0x0000' bus='0xc2' slot='0x01' function='0x0'/>
    <address type='pci' domain='0x0000' bus='0xc2' slot='0x01' function='0x1'/>
    <address type='pci' domain='0x0000' bus='0xc2' slot='0x01' function='0x2'/>
    <address type='pci' domain='0x0000' bus='0xc2' slot='0x01' function='0x3'/>
    <address type='pci' domain='0x0000' bus='0xc2' slot='0x01' function='0x4'/>
    <address type='pci' domain='0x0000' bus='0xc2' slot='0x01' function='0x5'/>
    <address type='pci' domain='0x0000' bus='0xc2' slot='0x01' function='0x6'/>
    <address type='pci' domain='0x0000' bus='0xc2' slot='0x01' function='0x7'/>
    <address type='pci' domain='0x0000' bus='0xc2' slot='0x02' function='0x0'/>
  </forward>
</network>
If not, create mellanox.xml with this content:

*the following is from memory, so ymmv

Code:
<network>
  <name>Mellanox</name>
  <uuid>a4bcc942-67c9-4942-8e3a-f21863ba7c41</uuid>
  <forward mode='hostdev' managed='yes'>
    <pf dev='enp194s0'/>
  </forward>
</network>
Make sure you set the pf dev to your NIC's main interface as shown by ip link. You can generate your own uuid if you want. You might be able to delete the UUID line and libvirt will probably generate one for you but I don't know that for sure.

Run this to create the network: virsh net-create mellanox.xml.

You may need to manually start the network via virsh net-start Mellanox.

You should see your network fully populated as the above when you run virsh net-dumpxml Mellanox.

With this in place, libvirt will now manage the VFs for you. All you need to do is select the Mellanox network when attaching a virtual NIC to a VM. The VM will see it as a PCIe device and the OS should load the Mellanox driver and you should be off to the races.
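A minimal sketch of the same steps in shell, under a few assumptions: the helper name `make_hostdev_network_xml` is made up, the uuid line is left out on the assumption that libvirt generates one, and net-define is used instead of net-create so the network survives a libvirtd restart.

```shell
# Print a libvirt hostdev network definition for an SR-IOV PF.
# $1: network name, $2: PF interface name as shown by `ip link`.
make_hostdev_network_xml() {
    cat <<EOF
<network>
  <name>$1</name>
  <forward mode='hostdev' managed='yes'>
    <pf dev='$2'/>
  </forward>
</network>
EOF
}

# Usage (commented out because it changes libvirt state):
#   make_hostdev_network_xml Mellanox enp3s0np0 > mellanox.xml
#   virsh net-define mellanox.xml
#   virsh net-start Mellanox
#   virsh net-autostart Mellanox
#   virsh net-dumpxml Mellanox
```

In a guest definition, the matching NIC is an interface of type 'network' whose source network is Mellanox.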
 

crackelf

Member
Apr 11, 2021
74
6
8
It took A LOT of googling to find the udev solution. I had taken the bash approach but it was an irritant that kept gnawing at me and I just kept searching and working at it.
I greatly appreciate your sacrifice to the google gods... I wasn't too satisfied with the bash approach either.
Are you having trouble getting the nic to link or just VMs being able to use the VFs?
I just tried connecting two cards together and they're perfectly happy to direct connect, so it seems like my Brocade switch is the problem here! You may wonder why the "!" - I'm overjoyed these things even work and I get a status light lol
Are you running KVM/libvirt/QEMU VMs? If so, you should have a network that looks like this:
I actually am; that is my exact virtualization stack! You read my mind. Seriously, thank you for all the work you've put into this and the documentation. I'll come back to your XMLs here when I get the Brocade playing nicely.
With this in place, libvirt will now manage the VFs for you. All you need to do is select the Mellanox network when attaching a virtual NIC to a VM. The VM will see it as a PCIe device and the OS should load the Mellanox driver and you should be off to the races.
I'm showing my inexperience here: I've been using macvtap for all my network connections and just letting the virtio driver carry the weight for my VMs' networking. I've never created a network before, so this is all new to me. Thank you for your notes. I assume the PCIe device is much more transparent to the guest OS and you have proper driver support on the inside, and like all PCIe passthrough you won't have the CPU cycle overhead of processing things through virtio on the hypervisor. I'll reach back out when I've wrapped up the switching side of things! Thank you again for your insight on all of this; you've saved me a tremendous amount of time.
 

llowrey

Active Member
Feb 26, 2018
170
142
43
Getting all this set up was a PITA at first but it has been smooth sailing and it has been much easier to put VMs on VLANs. But, the biggest benefit is that with macvtap you can't reach the host from a VM o_O. The NIC firmware has a vSwitch that handles routing of all traffic between PF & all VFs. So, no problem connecting from a VM's VF to the host's PF.
 

crackelf

Member
Apr 11, 2021
74
6
8
Do I need to assign MAC addresses to these VFs? Mellanox ConnectX(R) mlx5 core VPI Network Driver — The Linux Kernel documentation

devlink port show isn't matching what the kernel doc says. my hw_addr is 00:00:00:00:00:00 for both the physical and virtual functions. I'm in ethernet mode as proven by devlink port show. I'm set up in switchdev mode already, but can't seem to call the devlink port add argument.

edit: set up MAC addresses and still nothing.
 

llowrey

Active Member
Feb 26, 2018
170
142
43
You don't need to do anything with MACs. Your PF interface will get whatever MAC is assigned to the card and the VFs will get whatever MAC is assigned to the VM adapter that is using the VF.

Here's what mine look like:

Code:
# ip link
...
4: enp194s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 24:8a:07:1e:06:44 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 52:54:00:a0:f6:10 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 1     link/ether 52:54:00:e2:f0:17 brd ff:ff:ff:ff:ff:ff, vlan 666, spoof checking off, link-state auto, trust off, query_rss off
    vf 2     link/ether 52:54:00:0e:21:e7 brd ff:ff:ff:ff:ff:ff, vlan 666, spoof checking off, link-state auto, trust off, query_rss off
    vf 3     link/ether 52:54:00:aa:03:2c brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 4     link/ether 52:54:00:a0:14:29 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 5     link/ether 52:54:00:07:78:3e brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 6     link/ether 52:54:00:76:9a:89 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 7     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 8     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 9     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 10     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 11     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 12     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 13     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 14     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 15     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    altname enp194s0np0
You can see that I have VMs using VFs 0-6 by the fact that they have MACs assigned. The VF assignments are dynamic and managed by libvirt so you don't want to try to manually configure them.

Give ip link a try.
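To pick out the VFs that are in use without reading the whole listing, the output can be filtered. This is a small sketch that assumes the `ip link` line format shown in this post (other iproute2 versions may print VF lines differently). The function name `vfs_in_use` is made up.

```shell
# Read `ip link show` output on stdin and print the VFs that currently
# have a non-zero MAC assigned, as "vf <index> <mac>".
vfs_in_use() {
    awk '$1 == "vf" && $3 == "link/ether" && $4 != "00:00:00:00:00:00" {
        print "vf", $2, $4
    }'
}

# Usage: ip link show dev enp194s0 | vfs_in_use
```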
 

crackelf

Member
Apr 11, 2021
74
6
8
Mine look identical minus the UP state >.> this is the weirdest thing. I think the card is fine, this switch is just not playing ball here. Thank you for all your help I can't say it enough you've really gone above and beyond helping me :)
 

crackelf

Member
Apr 11, 2021
74
6
8
Update: I can't get these cards to connect to each other directly at anything over 1Gbps. I'm giving up on this lol, I've lost too much time fiddling with these cards / switch.
 

crackelf

Member
Apr 11, 2021
74
6
8
Trying the libvirt hostdev network out with my old Intel setup:
I've gotten the network xml created through virsh, and I'm stuck at
Code:
error: internal error: qemu unexpectedly closed the monitor: 2022-02-28T06:38:56.918858Z qemu-system-x86_64: -device vfio-pci,host=0000:03:10.1,id=hostdev0,bus=pci.1,addr=0x0: vfio 0000:03:10.1: failed to open /dev/vfio/21: Permission denied
This could be that ixgbe behaves differently than the Mellanox driver, but it *is* all PCI at the end of the day. Do I need to set permissions somewhere for this to be happy? Thanks in advance!
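Not a fix, only a diagnostic sketch: the number in /dev/vfio/21 is the IOMMU group of the VF, and the group can be looked up from the PCI address through sysfs, which makes it easier to check the ownership and mode of the node qemu tried to open. The helper name `vfio_node_for` is made up.

```shell
# Print the /dev/vfio node that belongs to a PCI device's IOMMU group.
# $1: PCI address (e.g. 0000:03:10.1), $2: sysfs PCI devices dir (optional).
vfio_node_for() {
    dev_dir=${2:-/sys/bus/pci/devices}/$1
    # iommu_group is a symlink whose last path component is the group number.
    group_link=$(readlink "$dev_dir/iommu_group") || return 1
    echo "/dev/vfio/$(basename "$group_link")"
}

# Usage: ls -l "$(vfio_node_for 0000:03:10.1)"
```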