Bridging Network Interfaces with SR-IOV on a Proxmox Host


SeeGee

New Member
Jun 20, 2020
It's been a learning curve, but I have managed to enable SR-IOV on my Proxmox Host using a dual port Mellanox Connectx-3 Pro.
When you enable SR-IOV on these cards you end up with a PF for each port (enp1s0 and enp1s0d1 respectively), and the mlx4_core driver creates the VFs as Port1-only, Port2-only, and dual-port (Port1+2), as expected. I understand that probing VFs on driver initialization (probe_vf= ) lets the host use those interfaces, and each probed VF shows up on the host as an additional NIC.
In this case I have probed one dual-port VF on the Proxmox host, which makes the enp1s0v4 and enp1s0d1v4 interfaces appear, as seen in the output of $ ip link below:

PF Devices: enp1s0, enp1s0d1
(Probed) VF Devices: enp1s0v4, enp1s0d1v4

Code:
3: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 24:8a:07:bc:d5:a3 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 1     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 2     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 3     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 4     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 5     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 6     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 7     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
4: enp1s0d1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 24:8a:07:bc:d5:a4 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 1     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 2     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 3     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 4     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 5     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 6     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 7     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
5: enp1s0v4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 6a:4f:10:17:f8:18 brd ff:ff:ff:ff:ff:ff
6: enp1s0d1v4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 02:58:41:3a:23:53 brd ff:ff:ff:ff:ff:ff
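For reference, this PF/VF layout comes from the mlx4_core module options. A rough sketch of the modprobe config is below; the triplet values are illustrative rather than an exact copy of my setup (the three numbers are Port1-only, Port2-only, and dual-port VFs, and probe_vf uses the same triplet format to say how many of those the host itself should probe):

Code:
# /etc/modprobe.d/mlx4_core.conf  (illustrative values)
# num_vfs  = single-port VFs on port1, single-port VFs on port2, dual-port VFs
# probe_vf = how many of each of those the host driver should probe and use itself
options mlx4_core num_vfs=0,0,8 probe_vf=0,0,1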
I want to create a bridge on the Proxmox host for each port. What I would like to know is: is it better to create the bridge on the PF, or to bridge the VF?
Some SR-IOV documentation says the PFs will disappear once you initialize a VF, but I do not seem to experience that: even when I pass the unprobed VFs through to guest VMs, all the interfaces listed above remain and seem to function perfectly...

Should I just bridge the PF devices (enp1s0 and enp1s0d1) and not probe any VFs on the host, leaving all VFs for passthrough to VMs?
Or should I bridge the VF devices (enp1s0v4, enp1s0d1v4) instead?
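For context, the bridge I have in mind is just the standard Proxmox style in /etc/network/interfaces, along these lines (bridge name and address are only examples):

Code:
auto vmbr1
iface vmbr1 inet static
        address 192.168.10.2/24
        bridge-ports enp1s0
        bridge-stp off
        bridge-fd 0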
 

mattventura

Active Member
Nov 9, 2022
FWIW my experience with a ConnectX-3 dual port 40GbE card:
1. PF works completely fine on the host even once a VF is initialized.
2. I couldn't get bridging to work whatsoever - even just a simple bridge with the host. As soon as I turned on SR-IOV and rebooted, my plain old bridge setup just wouldn't work. This works fine on Intel NICs (both via PF and VF - for VFs you have to enable promisc and allow spoofing; a rough sketch of the commands is below), but I couldn't get it to work on Mellanox.
3. I found it easier to use the PF for the host and the VFs for VMs, simply because then I can give whatever virtualization solution the entire SR-IOV pool to play with instead of needing to exclude one VF for the host (the second-best option would be to use the last VF for the host, so that it gets chosen last out of the pool).
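For what I meant by enabling promisc and allowing spoofing on the Intel VFs, the commands were roughly the following (interface names are placeholders, not from my actual config):

Code:
# allow MACs other than the VF's own through (a bridge forwards foreign MACs)
ip link set eth0 vf 0 spoofchk off
# let the VF request promiscuous/trusted mode from the PF
ip link set eth0 vf 0 trust on
# and put the VF netdev itself into promiscuous mode before enslaving it to the bridge
ip link set eth0v0 promisc on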
 

heromode

Active Member
May 25, 2020
My systemd script for partitioning Solarflare cards has the correct Requires=, Before= and After= directives to get the partitioning done before Proxmox loads the networking and firewall services.

The script unbinds the VFs I want to use for passthrough to VMs from the host, and leaves the PF plus the VFs I want to keep for the Proxmox host. In my example, I leave both physical ports for Proxmox, which I can bridge for use in VMs and CTs via virtio. I also leave one VF per physical port for things like an iSCSI target on the host.

The remaining VFs I can then pass through to VMs via PCIe.

I don't know how different Mellanox SR-IOV partitioning is, but it might help you:

nano /etc/systemd/system/sriov-vfs.service

Code:
[Unit]
Description=Enable SR-IOV and detach guest VFs from host
Requires=network.target
After=network.target
Before=pve-firewall.service
[Service]
Type=oneshot
RemainAfterExit=yes
# Create NIC VFs
ExecStart=/usr/bin/bash -c 'echo 8 > /sys/class/net/ens2f0np0/device/sriov_numvfs'
ExecStart=/usr/bin/bash -c 'echo 8 > /sys/class/net/ens2f1np1/device/sriov_numvfs'
# Set static MACs for VFs
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set ens2f0np0 vf 0 mac 76:9e:17:83:39:e5'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set ens2f0np0 vf 1 mac 46:2c:6d:24:6b:1b'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set ens2f0np0 vf 2 mac 3e:47:48:12:ed:94'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set ens2f0np0 vf 3 mac be:e3:6a:f3:8f:ac'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set ens2f0np0 vf 4 mac 62:8f:3d:bb:02:08'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set ens2f0np0 vf 5 mac ae:91:57:b9:14:7f'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set ens2f0np0 vf 6 mac 5a:c2:08:a9:68:a7'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set ens2f0np0 vf 7 mac b2:f0:18:af:cb:c5'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set ens2f1np1 vf 0 mac 16:47:7c:a8:95:98'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set ens2f1np1 vf 1 mac a6:c7:c5:7f:9c:22'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set ens2f1np1 vf 2 mac b6:0f:45:34:5e:19'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set ens2f1np1 vf 3 mac 2a:f7:37:84:31:30'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set ens2f1np1 vf 4 mac 8a:fa:f8:c5:0b:93'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set ens2f1np1 vf 5 mac b2:f5:d5:2f:79:06'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set ens2f1np1 vf 6 mac c2:92:f5:fa:32:20'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set ens2f1np1 vf 7 mac 2e:fb:29:1e:48:31'
# Detach VFs from host
ExecStart=/usr/bin/bash -c 'echo 0000:01:00.3 > /sys/bus/pci/devices/0000\\:01\\:00.3/driver/unbind'
ExecStart=/usr/bin/bash -c 'echo 0000:01:00.4 > /sys/bus/pci/devices/0000\\:01\\:00.4/driver/unbind'
ExecStart=/usr/bin/bash -c 'echo 0000:01:00.5 > /sys/bus/pci/devices/0000\\:01\\:00.5/driver/unbind'
ExecStart=/usr/bin/bash -c 'echo 0000:01:00.6 > /sys/bus/pci/devices/0000\\:01\\:00.6/driver/unbind'
ExecStart=/usr/bin/bash -c 'echo 0000:01:00.7 > /sys/bus/pci/devices/0000\\:01\\:00.7/driver/unbind'
ExecStart=/usr/bin/bash -c 'echo 0000:01:01.0 > /sys/bus/pci/devices/0000\\:01\\:01.0/driver/unbind'
ExecStart=/usr/bin/bash -c 'echo 0000:01:01.1 > /sys/bus/pci/devices/0000\\:01\\:01.1/driver/unbind'
ExecStart=/usr/bin/bash -c 'echo 0000:01:01.3 > /sys/bus/pci/devices/0000\\:01\\:01.3/driver/unbind'
ExecStart=/usr/bin/bash -c 'echo 0000:01:01.4 > /sys/bus/pci/devices/0000\\:01\\:01.4/driver/unbind'
ExecStart=/usr/bin/bash -c 'echo 0000:01:01.5 > /sys/bus/pci/devices/0000\\:01\\:01.5/driver/unbind'
ExecStart=/usr/bin/bash -c 'echo 0000:01:01.6 > /sys/bus/pci/devices/0000\\:01\\:01.6/driver/unbind'
ExecStart=/usr/bin/bash -c 'echo 0000:01:01.7 > /sys/bus/pci/devices/0000\\:01\\:01.7/driver/unbind'
ExecStart=/usr/bin/bash -c 'echo 0000:01:02.0 > /sys/bus/pci/devices/0000\\:01\\:02.0/driver/unbind'
ExecStart=/usr/bin/bash -c 'echo 0000:01:02.1 > /sys/bus/pci/devices/0000\\:01\\:02.1/driver/unbind'
# List new VFs
ExecStart=/usr/bin/lspci -D -d1924:
# Destroy VFs
ExecStop=/usr/bin/bash -c 'echo 0 > /sys/class/net/ens2f0np0/device/sriov_numvfs'
ExecStop=/usr/bin/bash -c 'echo 0 > /sys/class/net/ens2f1np1/device/sriov_numvfs'
# Reload NIC VFs
ExecReload=/usr/bin/bash -c 'echo 0 > /sys/class/net/ens2f0np0/device/sriov_numvfs'
ExecReload=/usr/bin/bash -c 'echo 0 > /sys/class/net/ens2f1np1/device/sriov_numvfs'
ExecReload=/usr/bin/bash -c 'echo 8 > /sys/class/net/ens2f0np0/device/sriov_numvfs'
ExecReload=/usr/bin/bash -c 'echo 8 > /sys/class/net/ens2f1np1/device/sriov_numvfs'
ExecReload=/usr/bin/bash -c '/usr/bin/ip link set ens2f0np0 vf 0 mac 76:9e:17:83:39:e5'
ExecReload=/usr/bin/bash -c '/usr/bin/ip link set ens2f0np0 vf 1 mac 46:2c:6d:24:6b:1b'
ExecReload=/usr/bin/bash -c '/usr/bin/ip link set ens2f0np0 vf 2 mac 3e:47:48:12:ed:94'
ExecReload=/usr/bin/bash -c '/usr/bin/ip link set ens2f0np0 vf 3 mac be:e3:6a:f3:8f:ac'
ExecReload=/usr/bin/bash -c '/usr/bin/ip link set ens2f0np0 vf 4 mac 62:8f:3d:bb:02:08'
ExecReload=/usr/bin/bash -c '/usr/bin/ip link set ens2f0np0 vf 5 mac ae:91:57:b9:14:7f'
ExecReload=/usr/bin/bash -c '/usr/bin/ip link set ens2f0np0 vf 6 mac 5a:c2:08:a9:68:a7'
ExecReload=/usr/bin/bash -c '/usr/bin/ip link set ens2f0np0 vf 7 mac b2:f0:18:af:cb:c5'
ExecReload=/usr/bin/bash -c '/usr/bin/ip link set ens2f1np1 vf 0 mac 16:47:7c:a8:95:98'
ExecReload=/usr/bin/bash -c '/usr/bin/ip link set ens2f1np1 vf 1 mac a6:c7:c5:7f:9c:22'
ExecReload=/usr/bin/bash -c '/usr/bin/ip link set ens2f1np1 vf 2 mac b6:0f:45:34:5e:19'
ExecReload=/usr/bin/bash -c '/usr/bin/ip link set ens2f1np1 vf 3 mac 2a:f7:37:84:31:30'
ExecReload=/usr/bin/bash -c '/usr/bin/ip link set ens2f1np1 vf 4 mac 8a:fa:f8:c5:0b:93'
ExecReload=/usr/bin/bash -c '/usr/bin/ip link set ens2f1np1 vf 5 mac b2:f5:d5:2f:79:06'
ExecReload=/usr/bin/bash -c '/usr/bin/ip link set ens2f1np1 vf 6 mac c2:92:f5:fa:32:20'
ExecReload=/usr/bin/bash -c '/usr/bin/ip link set ens2f1np1 vf 7 mac 2e:fb:29:1e:48:31'
ExecReload=/usr/bin/bash -c 'echo 0000:01:00.3 > /sys/bus/pci/devices/0000\\:01\\:00.3/driver/unbind'
ExecReload=/usr/bin/bash -c 'echo 0000:01:00.4 > /sys/bus/pci/devices/0000\\:01\\:00.4/driver/unbind'
ExecReload=/usr/bin/bash -c 'echo 0000:01:00.5 > /sys/bus/pci/devices/0000\\:01\\:00.5/driver/unbind'
ExecReload=/usr/bin/bash -c 'echo 0000:01:00.6 > /sys/bus/pci/devices/0000\\:01\\:00.6/driver/unbind'
ExecReload=/usr/bin/bash -c 'echo 0000:01:00.7 > /sys/bus/pci/devices/0000\\:01\\:00.7/driver/unbind'
ExecReload=/usr/bin/bash -c 'echo 0000:01:01.0 > /sys/bus/pci/devices/0000\\:01\\:01.0/driver/unbind'
ExecReload=/usr/bin/bash -c 'echo 0000:01:01.1 > /sys/bus/pci/devices/0000\\:01\\:01.1/driver/unbind'
ExecReload=/usr/bin/bash -c 'echo 0000:01:01.3 > /sys/bus/pci/devices/0000\\:01\\:01.3/driver/unbind'
ExecReload=/usr/bin/bash -c 'echo 0000:01:01.4 > /sys/bus/pci/devices/0000\\:01\\:01.4/driver/unbind'
ExecReload=/usr/bin/bash -c 'echo 0000:01:01.5 > /sys/bus/pci/devices/0000\\:01\\:01.5/driver/unbind'
ExecReload=/usr/bin/bash -c 'echo 0000:01:01.6 > /sys/bus/pci/devices/0000\\:01\\:01.6/driver/unbind'
ExecReload=/usr/bin/bash -c 'echo 0000:01:01.7 > /sys/bus/pci/devices/0000\\:01\\:01.7/driver/unbind'
ExecReload=/usr/bin/bash -c 'echo 0000:01:02.0 > /sys/bus/pci/devices/0000\\:01\\:02.0/driver/unbind'
ExecReload=/usr/bin/bash -c 'echo 0000:01:02.1 > /sys/bus/pci/devices/0000\\:01\\:02.1/driver/unbind'
ExecReload=/usr/bin/lspci -D -d1924:
[Install]
WantedBy=multi-user.target
Test and enable:

Code:
systemctl daemon-reload
systemctl start sriov-vfs.service
systemctl reload sriov-vfs.service
systemctl status sriov-vfs.service
systemctl enable sriov-vfs.service
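If you adapt this for the Mellanox card, the PCI addresses for the unbind lines can be pulled out along these lines (15b3 is Mellanox's vendor ID, the 1924: above is Solarflare; the interface name here is from the ConnectX-3 output earlier in the thread):

Code:
# list Mellanox functions, including the new VFs
lspci -D -d 15b3:
# or read the VF addresses straight from the PF's sysfs entry
ls -l /sys/class/net/enp1s0/device/virtfn*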
 

SeeGee

New Member
Jun 20, 2020
FWIW my experience with a ConnectX-3 dual port 40GbE card:
1. PF works completely fine on the host even once a VF is initialized.
2. I couldn't get bridging to work whatsoever - even just a simple bridge with the host. As soon as I turned on SR-IOV and rebooted, my plain old bridge setup just wouldn't work. This works fine on Intel NICs (both via PF and VF - for VFs you have to enable promisc and allow spoofing), but I couldn't get it to work on Mellanox.
3. I found it easier to use the PF for the host and VF for VMs only because then I can just give whatever virtualization solution the entire SR-IOV pool to play with instead of needing to exclude one for the host (the second best play would be to use the last VF for the host so that it will be chosen last out of the pool).
Point 2: I can create the bridge on either the PF or the VF without issue, although I have not actually tried to add the bridged interface to a VM. I'll try that tonight.
Point 3: This is exactly what I was hoping to hear!

Thank you for providing me with some clarity. I've got an SFF PC I'm setting up as a second Proxmox box, so I thought I would throw this NIC into it and try my hand at SR-IOV passthrough. Never done it before, and now I can, so why not?

I'll keep you posted on the bridging once I try it.
 

SnJ9MX

Active Member
Jul 18, 2019
Been doing proxmox for quite some time with bridges and bonds. I never got into trying sr-iov stuff. What benefits does it offer compared to just assigning say eth0 and eth1 to vmbr0 and then hanging VMs/LXCs on that vmbr0?
 

mattventura

Active Member
Nov 9, 2022
Been doing proxmox for quite some time with bridges and bonds. I never got into trying sr-iov stuff. What benefits does it offer compared to just assigning say eth0 and eth1 to vmbr0 and then hanging VMs/LXCs on that vmbr0?
Minimizes networking overhead for the host. Less of an issue for 10GbE and below, but it becomes a lot more useful at 25GbE and up; I noticed quite a difference on my 40GbE setup. It also makes VLAN management a little easier, since I don't have to worry about whether the host bridge includes a particular VLAN or not; I just set it on the VM and that's it.
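One way to do the per-VM VLAN assignment (not necessarily the only way) is to have the host pin a VLAN on the VF, so the guest only ever sees untagged traffic. Using the interface names from the earlier output, with a made-up VF index and VLAN ID:

Code:
# tag/strip everything on VF 2 as VLAN 100 at the NIC, no bridge config needed
ip link set enp1s0 vf 2 vlan 100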
 

heromode

Active Member
May 25, 2020
Been doing proxmox for quite some time with bridges and bonds. I never got into trying sr-iov stuff. What benefits does it offer compared to just assigning say eth0 and eth1 to vmbr0 and then hanging VMs/LXCs on that vmbr0?
1: Your VM's kernel modules talk directly to your NIC, bypassing virtio and the host kernel completely. You can optimize your VM networking performance by loading the NIC modules with whatever kernel module options you like; it's just a physical device as seen by your VM. (In the case of Solarflare, the Proxmox host loads the 'sfc' module for the card, and the VM also loads 'sfc' instead of 'virtio' and talks directly to the physical NIC. A passthrough example is below.)

2: Your NIC effectively contains a physical L2 switch. Any traffic sent from one VM with an SR-IOV VF to another VM with a VF on the same NIC / physical port travels through the NIC chipset's Ethernet layer 2 switching, never touching the hypervisor kernel. Performance is the same as if it were a physical L2 switch.
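To make point 1 concrete, handing a VF to a VM in Proxmox is just PCIe passthrough of that VF's address, e.g. reusing one of the addresses from my script above (the VM ID is made up):

Code:
qm set 100 -hostpci0 0000:01:00.3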
 

SnJ9MX

Active Member
Jul 18, 2019
Thanks for both comments. Squarely in the realm of "one day...". Not at all necessary for me for home lab or even my R630 in a datacenter (yet).
 

SeeGee

New Member
Jun 20, 2020
Thanks for both comments. Squarely in the realm of "one day...". Not at all necessary for me for home lab or even my R630 in a datacenter (yet).
Well, it's not something that I NEED in my homelab either, but hey, I'm a techy guy who enjoys messing with hardware. It's a "because I can" situation. After messing with things a little more, I noticed I get another 500-700 Mbit/s of bandwidth using SR-IOV over a bridge when I iperf between the two ports: 8.9 Gbit/s with bridging vs 9.4 Gbit/s using SR-IOV. I'm also able to use hardware checksum offloading, and, as mentioned above, VLANs are easier to manage on a per-VM basis.
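For anyone curious, the comparison was just plain iperf3 between the two ports, something like this (the address is an example):

Code:
# on the receiving side
iperf3 -s
# on the sending side, pointed at the other port's address
iperf3 -c 192.168.10.3 -t 30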