(Proxmox + OPNSense) High host CPU with PCI NIC passthrough


daern

New Member
May 23, 2023
Hi all,

Like many others (and with much thanks to the STH Youtube channel!), I'm now running what seems to be this year's high fashion of home firewall config:

  • Aliexpress N5105 (i226-V version), using decent RAM and SSD (i.e. Crucial + Samsung)
  • Proxmox (7.4-3 - clean install last week and fully updated post-install)
  • OPNsense (23.1.7_3), configured with two cores and 4GB
All went together fine. I've configured PCI passthrough (iommu / VT-D enabled), and exposed two physical ports to the OPNsense VM for WAN and LAN. PPPoE on the WAN connection, which is only a 45Mbps VDSL connection (sadly). No real issues getting it all working, and it's been stable since installing on Saturday.
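For anyone setting up the same thing, a quick sanity check that IOMMU/VT-d is actually active on the Proxmox host looks roughly like this (a sketch for an Intel system; paths and flag names may differ on other setups):

```shell
# Check that the kernel actually enabled the IOMMU (Intel systems log DMAR messages)
dmesg | grep -e DMAR -e IOMMU

# The kernel command line should include intel_iommu=on
# (newer kernels enable it by default on Intel; older ones need the flag)
cat /proc/cmdline

# List IOMMU groups -- the passed-through NICs should sit in their own groups
find /sys/kernel/iommu_groups/ -type l
```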

During maxed-out downloads from the internet, I'm seeing Proxmox report the guest CPU rising from 5% to a stable 25% (much higher than I'd expect for a trifling 45Mbps), but the OPNsense VM itself reports almost zero change and idle CPU usage during this download. The OPNsense UI also feels quite laggy when accessed during a download. If I switch the two NICs back to virtio adapters exposed from Proxmox, the problem is much reduced, with host CPU rising to somewhere around 10% instead. When looking at top on the Proxmox host, the CPU usage is virtually all in the kvm process.

Any thoughts? Is there anything I specifically need to check? I've already confirmed that hardware checksum offload is disabled (this appears to be the default in OPNsense for my install), but have tried with it enabled (no change). My experience (to be fair, on VMware and much bigger/better hardware!) is that PCI passthrough should be extremely low overhead for both host and guest, with the "cost" of this configuration mostly in a lack of flexibility (e.g. migrating VMs in a cluster) and the inability to share resources between multiple VMs, something I'm happy to forego here.

Tried plenty of stuff, but not getting anywhere so have switched back to virtio for now, but would be nice to get to the bottom of this, or at least get experience from others as to what they are seeing.
 

john389

Member
May 21, 2022
Hi, my guess would be that something isn't working quite right when you use VT-D. It happens, problems with the Kernel, BIOS option not enabled/disabled, NIC driver, VM configuration, etc.

Perhaps run these commands to get a better understanding of the problem. It is best if only the VM that is causing problems is running, otherwise you'll have to modify them to ensure you only see the data that is related to your VM.

Bash:
perf kvm --host top -p YOUR-VM-PID-HERE
perf stat -e 'kvm:*' -a -- sleep 1
perf kvm --host stat live
You are looking for the source of the CPU usage, i.e. which part of kernel- or userspace and which function is called so often that your CPU goes up to 25% when the system is practically "idle".

Then you can compare it to the non-VT-D version.

If you search for these commands, you'll find explanations on what exactly they do, what others have done to fix their problems, etc.

Perhaps someone else has the same hardware and can supply a solution.
 

daern

New Member
May 23, 2023
@john389 Many thanks, great advice and I shall do so later tonight when I can mess around without upsetting the household.
 

efschu3

Active Member
Mar 11, 2019
I was never able to get good (expected) network performance with virtualized BSD derivatives, no matter whether I used virtio, PCIe passthrough or SR-IOV.
I tried for years in just about every combination possible.

Switched to Linux as the virtualized firewall - works as expected.
 

daern

New Member
May 23, 2023
Tested with PCI passthrough NICs for both LAN and WAN connections:

Code:
perf kvm --host stat live

Idle:

             VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time

           MSR_WRITE        271    83.38%    85.91%      0.80us 110364.06us   4884.78us ( +-  23.81% )
                 HLT         35    10.77%    12.91%      0.88us  39560.51us   5682.42us ( +-  31.60% )
  EXTERNAL_INTERRUPT         18     5.54%     1.18%      1.27us  17867.85us   1007.63us ( +-  98.43% )
    PREEMPTION_TIMER          1     0.31%     0.00%      1.63us      1.63us      1.63us ( +-   0.00% )

Total Samples:325, Total events handled time:1540798.91us.

Downloading at 45Mbps:

             VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time

                 HLT        337    58.61%    30.66%      0.74us  52683.70us   1089.14us ( +-  21.94% )
           MSR_WRITE        224    38.96%    66.68%      0.90us  64742.81us   3563.61us ( +-  21.43% )
       EPT_MISCONFIG         13     2.26%     1.30%      1.80us  15524.01us   1195.92us ( +-  99.84% )
  EXTERNAL_INTERRUPT          1     0.17%     1.36%  16282.09us  16282.09us  16282.09us ( +-   0.00% )

Total Samples:575, Total events handled time:1197117.26us.
(load averages)        Host    Guest
Idle                   0.15    0.07
Downloading (45Mbps)   0.85    0.10

Interestingly, neither the first nor the second command returns any data for PID 100 (my firewall VM), which was quite surprising. The commands appear to execute OK, but there is no data. Likewise, if I add a -p 100 switch to the stat live command, it also stops returning data. BSD guest issues, perhaps?
 

john389

Member
May 21, 2022
Most of the time should be spent in HLT = the system is really idle.

MSR_WRITE means write instructions to Model-Specific Registers, which are used for per-CPU timers, interprocessor interrupts (communication between vCPUs), tracing, etc. Unless we know where exactly it spends its time, we can only guess what the problem is.

What you are seeing is that the kernel spends most of its time exiting the VM task to perform MSR_WRITE. It's doing something there. Normally, when idle, it should go into the VM task and exit it again, because there is nothing to do. That is what the HLT exit means.
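If the Proxmox kernel exposes the kvm tracepoints, one way to drill into which MSR the guest is actually writing is roughly this (a sketch; run as root on the host, and note the example MSR index is just an illustration):

```shell
# Record kvm MSR-write events system-wide for 10 seconds
perf record -e kvm:kvm_msr -a -- sleep 10

# The report shows which MSR indices dominate; frequent writes to the
# TSC deadline timer MSR (0x6e0), for example, would point at timer overhead
perf report --stdio | head -n 20
```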

As I don't use Proxmox, I don't know why the other commands didn't return results. I'm assuming that Proxmox has tracing features compiled into the kernel (please verify that), that you've given it the qemu process PID, and that you are running these commands as root. Please search for the commands online and verify that the options to perf haven't changed; the last time I used them was probably a year ago.
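On the PID point, one likely gotcha (an assumption on my part): the Proxmox VM ID is not the qemu process PID, and perf -p needs the latter. Proxmox keeps a pidfile per VM, so assuming VM ID 100:

```shell
# Proxmox writes the qemu PID to /run/qemu-server/<vmid>.pid
cat /run/qemu-server/100.pid

# Alternatively, search the process list -- the kvm command line includes "-id <vmid>"
pgrep -f 'kvm -id 100'

# Then point perf at the real PID
perf kvm --host top -p "$(cat /run/qemu-server/100.pid)"
```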

Also start looking into the BIOS for anything APIC related or HPET (High Precision Event Timer).

Most likely whatever method qemu has chosen to keep looking for new incoming ethernet packets - based on its version and the VM configuration in Proxmox - causes the VM/Host to spend a lot of time switching between tasks, a lot more than should be necessary. In other words: Because something isn't running smoothly, you have lots of overhead, which you see as high CPU usage.

Please provide the output of lscpu so that we know what CPU features you have. Also the VM configuration and the command-line options passed to qemu, i.e. what you find in the output of ps aux | grep qemu.

Sadly I can't run the perf commands on my own firewall, pfSense running on a Linux VM host, because the kernel is hardened and all tracing features are disabled.

What I can tell you is that the following produces a minimum of 7% aggregated CPU usage (each vCPU has 0-2% usage):

  • 8-Core Intel Xeon D-2146NT
  • 4 Cores including their threads pinned to the VM, i.e. 8 vCPUs, Emulator and I/O-Thread pinned to another Core and its Thread
  • Intel X710-DA2 with 2x SFP+ 10Gbps ports passed through (IOMMU/VT-d)
  • Nameserver, pfBlockerNG, lots of VLANs with inter-VLAN traffic, VPN tunnels, etc. running on/through pfSense
Of course, if it is really handling traffic, this goes up quite a lot.

Perhaps someone else has more knowledge and uses the perf command more regularly than me and can help.
 

john389

Member
May 21, 2022
After looking at this thread here (VM freezes irregularly), I'd say you have to ensure that you are running a new enough kernel (perhaps not the newest; I personally prefer stability, but that depends on what support exists for this hardware on which kernel version) and the newest available Intel microcode.

Furthermore, I have had bad experiences with Q35 machines in combination with BSD guests, and everything I want is running smoothly, and fast, on i440fx + UEFI. So perhaps look into that. While quite a lot of bug reports exist in FreeBSD when it comes to support for functionality in Q35, it is possible that some have been closed in the upcoming FreeBSD 14.0.

Also make sure you are using host passthrough for the CPU type (type "host" in Proxmox), so that OPNsense knows what (advanced) features it can use.
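In Proxmox terms, those two suggestions translate to roughly the following (a sketch, assuming the firewall is VM ID 100; adjust to your own):

```shell
# CPU type "host" passes the physical CPU's feature flags through to the guest
qm set 100 --cpu host

# Machine type: "pc" is the i440fx model, "q35" the newer one
qm set 100 --machine pc

# Verify the resulting VM configuration
qm config 100 | grep -E 'cpu|machine|bios'
```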
 

daern

New Member
May 23, 2023
After looking at this thread here (VM freezes irregularly), I'd say you have to ensure that you are running a new enough kernel (perhaps not the newest; I personally prefer stability, but that depends on what support exists for this hardware on which kernel version) and the newest available Intel microcode.
I've already updated to the latest kernel (Linux proxmox1 5.15.107-2-pve #1 SMP PVE 5.15.107-2 (2023-05-10T09:10Z) x86_64 GNU/Linux), but microcode was an excellent idea. I'd just kinda assumed that Debian would sort this out on first install, but obviously it's not quite that simple! Once the non-free repos were added, I've installed the version of microcode mentioned in that thread:

Code:
Before:
root@proxmox1:~# dmesg|grep microcode
[    0.154103] MMIO Stale Data: Vulnerable: Clear CPU buffers attempted, no microcode
[    0.154104] SRBDS: Vulnerable: No microcode
[    1.174446] microcode: sig=0x906c0, pf=0x1, revision=0x1d
[    1.174459] microcode: Microcode Update Driver: v2.2.

After:
root@proxmox1:~# dmesg|grep microcode
[    0.000000] microcode: microcode updated early to revision 0x24000024, date = 2022-09-02
[    0.154057] SRBDS: Vulnerable: No microcode
[    1.181158] microcode: sig=0x906c0, pf=0x1, revision=0x24000024
[    1.181194] microcode: Microcode Update Driver: v2.2.
Will do some more testing later on...
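For anyone repeating this, the steps were roughly as follows (a sketch for Proxmox 7 on Debian Bullseye; adjust the suite name for other releases, and note the exact microcode version from that thread isn't pinned here):

```shell
# Enable the non-free component, which carries the intel-microcode package
echo "deb http://deb.debian.org/debian bullseye non-free" \
    > /etc/apt/sources.list.d/non-free.list
apt update
apt install intel-microcode

# Rebuild the initramfs so the update is applied early at boot, then reboot
update-initramfs -u -k all
reboot
```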
 

john389

Member
May 21, 2022
It is possible to use a newer kernel on Proxmox, as many hinted at in that thread: you can activate different repositories in Proxmox (done manually?), which allow you to install newer kernel versions like 5.19, 6.0 or 6.1. The older LTS kernel version is 5.15, which is what I'm also using, but you have newer hardware, and it seemed like the people in that thread needed to update to at least 5.19 to get it stable. Please read it yourself, I don't have the time ...
 

daern

New Member
May 23, 2023
Muuuch better. With PCI passthrough, I'm now shifting 1Gbps (to iPerf on OPNsense) with 0.88 load (< 50%) on OPNsense and 1.00 load (25%) on the host - a significant improvement! Easily saturating the 1Gbps link my PC has.

Repeating the 45Mbps test above:

(load averages)        Host    Guest
Idle                   0.2     0.10
Downloading (45Mbps)   0.4     0.10

Definitely down into negligible numbers, and I'm much happier with this now. For comparison, here are the figures when shifting 1Gbps through the LAN interface:

(load averages)        Host          Guest
Transferring (1Gbps)   0.84 (22%)    0.85 (35%)

These are much closer to the numbers I'd expect to see with PCI passthrough. Indeed, according to top, most of the OPNsense CPU usage is attributed to iperf, so I think that with typical through-box transfers, CPU load would be significantly lower again. I shall test this again when I get a chance to do so.
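For reference, the to-box throughput test is just iperf3 in both directions (a sketch; the address is hypothetical, substitute your firewall's LAN IP, and on OPNsense iperf3 is available as a plugin):

```shell
# On OPNsense (server side)
iperf3 -s

# On a LAN client: 30-second test toward the firewall
iperf3 -c 192.168.1.1 -t 30

# Reverse direction (firewall -> client) with -R
iperf3 -c 192.168.1.1 -t 30 -R
```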
 

daern

New Member
May 23, 2023
It is possible to use a newer kernel on Proxmox, as many hinted at in that thread: you can activate different repositories in Proxmox (done manually?), which allow you to install newer kernel versions like 5.19, 6.0 or 6.1. The older LTS kernel version is 5.15, which is what I'm also using, but you have newer hardware, and it seemed like the people in that thread needed to update to at least 5.19 to get it stable. Please read it yourself, I don't have the time ...
Thanks, I only skimmed it myself, but I'm going to see how things settle down now that the microcode is in place and will take it from there. I'll read it more fully later tonight. The one thing I've not experienced yet is instability (it's only been a week or so, though, with quite a few intentional reboots while I sort things out), so I'll probably leave things as they are for a little while now and go for the kernel upgrade if I have issues. One thing is that it's quite unclear from that thread whether it's the kernel updates or the microcode that's actually making the difference. In fact, it rather seems to be more related to hardware revision, BIOS version and microcode installation.
 

zer0sum

Well-Known Member
Mar 8, 2013
Super easy to get newer kernels on Proxmox btw :cool:
  • apt update
  • apt install pve-kernel-6.2
  • reboot
 

daern

New Member
May 23, 2023
Super easy to get newer kernels on Proxmox btw :cool:
  • apt update
  • apt install pve-kernel-6.2
  • reboot
Thanks!

Code:
root@proxmox1:~# uname -a
Linux proxmox1 6.2.11-2-pve #1 SMP PREEMPT_DYNAMIC PVE 6.2.11-2 (2023-05-10T09:13Z) x86_64 GNU/Linux
Will see how things go, but they were stable overnight. I probably need to stop fiddling with the whole environment now so I can be sure it's properly stable!
 

zer0sum

Well-Known Member
Mar 8, 2013
Just as an FYI, I found that virtio interfaces were just as fast as fully passed-through SR-IOV ones when I tested up to 10G on my Proxmox + OPNsense server. I stuck with them for now due to the pure simplicity of it :cool:
 

daern

New Member
May 23, 2023
Just as an FYI, I found that virtio interfaces were just as fast as fully passed-through SR-IOV ones when I tested up to 10G on my Proxmox + OPNsense server. I stuck with them for now due to the pure simplicity of it :cool:
Thanks, that's interesting. It's quite possible that I'll see the same now that I've (hopefully!) got the base OS configured correctly. I will probably revisit this testing once I've given things a few weeks to stabilise and try again.

I have to admit that I feel more secure having the WAN interface directly mapped through to the firewall, rather than passing through a Proxmox bridge. In theory there should be no difference from a security point of view, but it feels more "right" doing it as a passthrough interface. The LAN one is much less critical, however, and I'd have no qualms about dumping it on a virtual interface.
 

mrpops2ko

New Member
Feb 12, 2017
Just as an FYI, I found that virtio interfaces were just as fast as fully passed-through SR-IOV ones when I tested up to 10G on my Proxmox + OPNsense server. I stuck with them for now due to the pure simplicity of it :cool:
Do you have any comparison data? I found SR-IOV to be about double the speed vs VMXNET3 (ESXi).
 

arfurtado

New Member
Jun 29, 2023
I just spent a good two hours testing a few configuration options, and got nowhere.
Using an N5105 with the i225-V version, intel-microcode 3.20230512.1 (otherwise kernel panics in the VMs), Proxmox 8, OPNsense with 2 cores and 3GB RAM.

Testing: fast.com, 550Mbps internet connection, 120 seconds.

The OPNsense guest reports ~10% to 15% CPU usage; the Proxmox host reports about 70% to 80% usage (of two cores).
On all configurations, what I got was about ~30% CPU usage across the host's 4 CPU cores while downloading at 550Mbps.

Config 1: Q35 machine, PCIe NIC passthrough - same.
Config 2: Q35 machine, virtio NICs using a bridge. On this configuration there was less usage by the kvm process, but two other processes appeared using CPU, and the overall CPU usage was almost the same. There was also more CPU time on the host being used for IRQs, instead of 'user' time used by the kvm process.
Config 3: i440fx machine, PCI NIC passthrough - same.

Also tested with hardware offloads enabled and disabled - this changed nothing in any configuration.

There are a few screenshots attached.

Edit: Also tried some OPNsense tunables, to no avail.

Tunable                  Description                                                               Type       Value
dev.igc.0.fc             Flow Control                                                              runtime    0
dev.igc.1.fc             Flow Control                                                              runtime    0
hw.ibrs_disable          Disable Indirect Branch Restricted Speculation (Spectre V2 mitigation)    runtime    1
hw.igc.rx_process_limit  Maximum number of received packets to process at a time (-1 = unlimited)  boot-time  -1
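For anyone wanting to replicate those tunables from an OPNsense shell rather than the UI, it looks roughly like this (a sketch; the boot-time one needs a reboot to take effect):

```shell
# Runtime tunables take effect immediately
sysctl dev.igc.0.fc=0
sysctl dev.igc.1.fc=0
sysctl hw.ibrs_disable=1

# Boot-time tunables belong in /boot/loader.conf.local and need a reboot
echo 'hw.igc.rx_process_limit="-1"' >> /boot/loader.conf.local
```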


OP @daern, did you really see adequate host CPU usage, equivalent to that reported by OPNsense? Did you change anything else?
 


Haxtistic

New Member
Jun 30, 2023
I just spent a good two hours testing a few configuration options, and got nowhere. [...] OP @daern, did you really see adequate host CPU usage, equivalent to that reported by OPNsense? Did you change anything else?


I have the same problem as you on my ProDesk 600 G3 SFF with an I340-T4 NIC. I've tried it on Proxmox 8, Proxmox 7.4, and even Alpine Linux with libvirtd, and the results are the same: the CPU overhead is extremely high. This can be observed in the CPU metrics as well as in power consumption.

I've attempted to install Intel microcode updates and disable mitigations (Spectre/Meltdown) on both the host and guest, but I'm still experiencing the same issue. I truly have no idea what's going on.