Problem Description:
Many people using Proxmox (and other KVM/QEMU based hypervisors) on Jasper Lake platforms (N5105, N6005) are experiencing kernel panics and/or hangs of their guest VMs. Both Linux (OpenWRT, Ubuntu) and FreeBSD (pfSense, OPNsense) guest VMs are affected. The host itself remains up and does not experience issues. LXC containers running on the host are not affected either. This happens on many mini PCs, including official Intel NUCs and AliExpress units.
The issue seems to be related to CPU power management, as it tends to occur during idle. Disabling C-States in the host BIOS and/or via kernel flags, either completely or partially, seems to reduce how often it occurs. Switching the CPU idle mode from ACPI/MWAIT to Halt in the guest VMs seems to help too. Upgrading the host kernel from 5.15 to 5.19, 6.0, 6.1, or 6.2 also seems to reduce incidence, though ultimately the guests will still freeze or panic, possibly after a few weeks instead of a few days.
Working Fix (Updated 05/02/2023):
Option 1 (Load updated microcode at each boot):
Update the CPU microcode to the latest version available in the Debian non-free repo on the Proxmox host:
Microcode - Debian Wiki (wiki.debian.org)
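As a rough sketch of Option 1, the steps on a Proxmox VE 7.x host (Debian 11 "bullseye") might look like the following. The list file name, the mirror URLs, and the helper function names are assumptions chosen for illustration, not something the Debian wiki or Proxmox prescribes. On a Debian 12 based host the component is called non-free-firmware rather than non-free.

```shell
# Sketch only: assumes a Proxmox VE 7.x host based on Debian 11 "bullseye".
# On Debian 12 "bookworm" the component is "non-free-firmware" instead.

# Print APT source lines that enable the non-free component for a suite,
# including its -updates and -security suites.
nonfree_sources() {
    suite="$1"
    printf 'deb http://deb.debian.org/debian %s non-free\n' "$suite"
    printf 'deb http://deb.debian.org/debian %s-updates non-free\n' "$suite"
    printf 'deb http://security.debian.org/debian-security %s-security non-free\n' "$suite"
}

# Enable non-free and install the Intel microcode package (run as root).
# The list file name is an arbitrary choice for this example.
install_intel_microcode() {
    nonfree_sources bullseye > /etc/apt/sources.list.d/non-free.list
    apt update
    apt install -y intel-microcode
}

# Usage (as root on the host), followed by a reboot so the new microcode
# is loaded early at boot:
#   install_intel_microcode && reboot
```

Nothing runs until install_intel_microcode is called, so the source lines can be reviewed first with nonfree_sources bullseye.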
Option 2 (Update BIOS if motherboard is Changwang N5105 v3, v4, v5):
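Step 1: Confirm that you have a compatible Changwang motherboard, then download the BIOS ISO from Changwang's website:
Changwang N5105 V3/V4/V5 microcode update, released 2023-04-18 (performance-boosted version)
BIOS update contents: 1. Updates to the latest CPU microcode. 2. Fixes the crash and reboot problem when running virtual machines. 3. Remains the full-performance version with power limits unlocked (please make sure cooling is adequate).
www.changwang.com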
Step 2: Use Rufus to convert the ISO into a bootable USB stick:
Rufus - The Official Website (rufus.ie)
Step 3: Boot from the USB stick (hit F11 at the AMI splash screen) and let it update the BIOS automatically.
Step 4: The BIOS settings are reset by the update, so reconfigure the BIOS as required by hitting Delete at the AMI splash screen.
Step 5: Verify that the BIOS update has updated the microcode:
Code:
grep 'stepping\|model\|microcode' /proc/cpuinfo
model : 156
model name : Intel(R) Celeron(R) N5105 @ 2.00GHz
stepping : 0
microcode : 0x24000024
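The check above can also be scripted. The sketch below reads the microcode revision from cpuinfo-style text and compares it against a reference value. It uses 0x24000024 from the output above as that reference. Treating it as the minimum fixed revision is an assumption, and the function names are made up for this example.

```shell
# Sketch: compare the running microcode revision against a reference value.

# Print the first "microcode" revision found in cpuinfo-style text on stdin.
microcode_rev() {
    awk -F': *' '/^microcode/ { print $2; exit }'
}

# Succeed if revision $1 is greater than or equal to revision $2.
# Both arguments are hex strings such as 0x24000024.
microcode_at_least() {
    [ "$(( $1 ))" -ge "$(( $2 ))" ]
}

# Usage on the host (0x24000024 is the value from the output above):
#   rev="$(microcode_rev < /proc/cpuinfo)"
#   if microcode_at_least "$rev" 0x24000024; then
#       echo "microcode $rev looks current"
#   else
#       echo "microcode $rev is older than the reference"
#   fi
```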
Old Potential Fixes:
Updating CPU microcode to the latest available on the Proxmox host:
Microcode - Debian Wiki (wiki.debian.org)
Installing Opt-In Kernels on Proxmox:
Opt-in Linux 5.19 Kernel for Proxmox VE 7.x available (forum.proxmox.com). The 5.15 kernel stays the default on the Proxmox VE 7.x series; 5.19 is an option.
Opt-in Linux 6.1 Kernel for Proxmox VE 7.x available (forum.proxmox.com). The 6.1 kernel is an option that replaces the previous 5.19 based opt-in kernel.
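For reference, installing an opt-in kernel on a Proxmox VE 7.x host could be sketched as follows. It assumes the opt-in kernels are packaged as pve-kernel-<series> (for example pve-kernel-5.19 or pve-kernel-6.1), and the helper function names are invented for this example.

```shell
# Sketch: assumes Proxmox VE 7.x, where opt-in kernels ship as pve-kernel-<series>.

# Extract the "major.minor" series from a kernel release string,
# for example the output of `uname -r`.
kernel_series() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Install an opt-in kernel series such as 5.19 or 6.1 (run as root).
install_optin_kernel() {
    apt update
    apt install -y "pve-kernel-$1"
}

# Usage on the host:
#   kernel_series "$(uname -r)"          # series currently running
#   install_optin_kernel 6.1 && reboot   # boot into the new kernel
```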
Disabling ACPI/MWAIT idle in pfSense guest VM (FreeBSD):
sysctl machdep.idle_mwait=0
sysctl machdep.idle=hlt
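The two sysctl commands above only last until the guest reboots. A minimal sketch for keeping them across reboots on a plain FreeBSD guest is shown below. Appending to /etc/sysctl.conf is an assumption about the guest setup, and the function name is made up. On pfSense, the System Tunables page in the web GUI is the more usual place for such settings.

```shell
# Sketch for a FreeBSD-based guest; run as root inside the guest.

# Print the idle settings from the commands above in sysctl.conf format.
idle_tunables() {
    printf 'machdep.idle_mwait=0\n'
    printf 'machdep.idle=hlt\n'
}

# Usage inside the guest:
#   sysctl machdep.idle_available machdep.idle   # supported and current idle methods
#   idle_tunables >> /etc/sysctl.conf            # persist on plain FreeBSD (assumption)
```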
The above can also be done on Linux based VM guests:
https://docs.kernel.org/admin-guide...el-command-line-options-and-module-parameters
According to this, setting idle=halt or intel_idle.max_cstate=0 as a kernel parameter in the guest will cause intel_idle initialization to fail, so the intel_idle driver is not used.
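One way to set such a kernel parameter in a Linux guest is sketched below. It assumes a Debian or Ubuntu style guest that boots via GRUB and keeps its kernel arguments in GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub. OpenWRT and other distributions configure this differently. The function name is invented for this example.

```shell
# Sketch: assumes a Debian/Ubuntu-style guest using GRUB and GNU sed.

# Append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT in the given file,
# unless the parameter is already present on that line.
add_kernel_param() {
    param="$1"
    file="$2"
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*[\" ]$param[\" ]" "$file"; then
        return 0
    fi
    sed -i "s/^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"/GRUB_CMDLINE_LINUX_DEFAULT=\"\1 $param\"/" "$file"
}

# Usage inside the guest (as root), then reboot the guest:
#   add_kernel_param idle=halt /etc/default/grub && update-grub
#   cat /proc/cmdline    # after reboot, confirm the parameter is active
```

The function is idempotent, so running it twice does not add the parameter a second time.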
Disabling C-States or Enhanced C-States in BIOS.
Using kvm64 as guest CPU instead of host and limiting CPU flags:
2 (1 sockets, 2 cores) [kvm64,flags=-pcid;-spec-ctrl;-ssbd;-ibpb;-virt-ssbd;-amd-ssbd;-amd-no-ssb;+aes]
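The processor setting shown above can also be applied from the host shell with qm set. The sketch below builds the value for the --cpu option from a CPU type and a list of flags. The VM ID 100 and the helper function name are hypothetical.

```shell
# Sketch: build a Proxmox --cpu value such as "kvm64,flags=-pcid;+aes".

# Usage: cpu_option CPUTYPE FLAG...
# Joins the flags with ';' and prefixes them with the CPU type.
cpu_option() {
    cputype="$1"
    shift
    flags=""
    for f in "$@"; do
        flags="${flags:+$flags;}$f"
    done
    printf '%s,flags=%s\n' "$cputype" "$flags"
}

# Usage on the host (VM ID 100 is a placeholder):
#   qm set 100 --sockets 1 --cores 2 \
#       --cpu "$(cpu_option kvm64 -pcid -spec-ctrl -ssbd -ibpb -virt-ssbd -amd-ssbd -amd-no-ssb +aes)"
```

The new CPU setting takes effect the next time the VM is fully stopped and started.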
Related Threads:
VM freezes irregularly (forum.proxmox.com)
Recurring Kernel Panics - Fatal trap 12: page fault while in kernel mode (forum.opnsense.org)
pfSense kernel panic (forum.netgate.com)
Example pfSense Kernel Panic:
Code:
kernel trap 12 with interrupts disabled
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x1008e
fault code = supervisor write data, page not present
instruction pointer = 0x20:0xffffffff80da2d71
stack pointer = 0x28:0xfffffe0025782b00
frame pointer = 0x28:0xfffffe0025782b60
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 11 (idle: cpu0)
trap number = 12
panic: page fault
cpuid = 0
time = 1672654637
KDB: enter: panic
db:0:kdb.enter.default> bt
Tracing pid 11 tid 100003 td 0xfffff8000520d000
kdb_enter() at kdb_enter+0x37/frame 0xfffffe00257828c0
vpanic() at vpanic+0x194/frame 0xfffffe0025782910
panic() at panic+0x43/frame 0xfffffe0025782970
trap_fatal() at trap_fatal+0x38f/frame 0xfffffe00257829d0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0025782a30
calltrap() at calltrap+0x8/frame 0xfffffe0025782a30
--- trap 0xc, rip = 0xffffffff80da2d71, rsp = 0xfffffe0025782b00, rbp = 0xfffffe0025782b60 ---
callout_process() at callout_process+0x1b1/frame 0xfffffe0025782b60
handleevents() at handleevents+0x188/frame 0xfffffe0025782ba0
cpu_activeclock() at cpu_activeclock+0x70/frame 0xfffffe0025782bd0
cpu_idle() at cpu_idle+0xa8/frame 0xfffffe0025782bf0
sched_idletd() at sched_idletd+0x326/frame 0xfffffe0025782cb0
fork_exit() at fork_exit+0x7e/frame 0xfffffe0025782cf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0025782cf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
db:0:kdb.enter.default> alltrace
Tracing command sleep pid 35878 tid 100632 td 0xfffff80057237740
sched_switch() at sched_switch+0x606/frame 0xfffffe003671b9c0
mi_switch() at mi_switch+0xdb/frame 0xfffffe003671b9f0
sleepq_catch_signals() at sleepq_catch_signals+0x3f3/frame 0xfffffe003671ba40
sleepq_timedwait_sig() at sleepq_timedwait_sig+0x14/frame 0xfffffe003671ba80
_sleep() at _sleep+0x1c6/frame 0xfffffe003671bb00
kern_clock_nanosleep() at kern_clock_nanosleep+0x1c1/frame 0xfffffe003671bb80
sys_nanosleep() at sys_nanosleep+0x3b/frame 0xfffffe003671bbc0
amd64_syscall() at amd64_syscall+0x387/frame 0xfffffe003671bcf0
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe003671bcf0
--- syscall (240, FreeBSD ELF64, sys_nanosleep), rip = 0x80038c9fa, rsp = 0x7fffffffec18, rbp = 0x7fffffffec60 ---
Tracing command sh pid 15762 tid 100600 td 0xfffff80016b8e000
sched_switch() at sched_switch+0x606/frame 0xfffffe00366cb970
mi_switch() at mi_switch+0xdb/frame 0xfffffe00366cb9a0
sleepq_catch_signals() at sleepq_catch_signals+0x3f3/frame 0xfffffe00366cb9f0
sleepq_wait_sig() at sleepq_wait_sig+0xf/frame 0xfffffe00366cba20
_sleep() at _sleep+0x1f1/frame 0xfffffe00366cbaa0
pipe_read() at pipe_read+0x3fe/frame 0xfffffe00366cbb10
dofileread() at dofileread+0x95/frame 0xfffffe00366cbb50
sys_read() at sys_read+0xc0/frame 0xfffffe00366cbbc0
amd64_syscall() at amd64_syscall+0x387/frame 0xfffffe00366cbcf0
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00366cbcf0
--- syscall (3, FreeBSD ELF64, sys_read), rip = 0x80044f03a, rsp = 0x7fffffffe3d8, rbp = 0x7fffffffe900 ---
Tracing command sh pid 15703 tid 100633 td 0xfffff80057237000
sched_switch() at sched_switch+0x606/frame 0xfffffe0036720800
mi_switch() at mi_switch+0xdb/frame 0xfffffe0036720830
sleepq_catch_signals() at sleepq_catch_signals+0x3f3/frame 0xfffffe0036720880
sleepq_wait_sig() at sleepq_wait_sig+0xf/frame 0xfffffe00367208b0
_sleep() at _sleep+0x1f1/frame 0xfffffe0036720930
kern_wait6() at kern_wait6+0x59e/frame 0xfffffe00367209c0
sys_wait4() at sys_wait4+0x7d/frame 0xfffffe0036720bc0
amd64_sy