Intel server PCIe passthrough stuck on "Enabled / Needs Reboot"

Xoid

New Member
Nov 14, 2018
27
4
3
I'm having some issues with PCIe passthrough on my server and am hoping someone has some pointers for me. I have an Intel R2312WTTYS server with an S2600WTTS1R board, 2x E5-2699 v3 CPUs, and 128 GB of RAM. This server is running ESXi 7.0 U3 (ESXi-7.0U3g-20328353).

In riser slot 1 I have a 2 slot riser (A2UL16RISER2, x16 & x8 slots) and in riser slot 2 I have a 3 slot riser (A2UL8RISER2, 3x x8 slots).

On the riser in slot 1 I just have a Quadro P2000 to pass through to a Plex VM, and on the riser in slot 2 I have two Fujitsu D2607-A21 HBAs (LSI 9211-8i) for TrueNAS.

My problem is that I cannot get ESXi to pass through both HBAs on riser slot 2. I can pass through the device on riser slot 1 and one of the HBAs, but the other HBA always shows as "Enabled / Needs Reboot" in the passthrough section. No amount of rebooting helps: soft reboot, hard reboot, shutdown & power on. I also set "VMkernel.Boot.disableACSCheck" to false as I saw that as a solution online, but it did not help.

esxi-needsreboot.png

If I swap the 2-slot and 3-slot riser cards between the riser slots, then put one HBA on the riser in slot 1, and the other HBA plus the P2000 on the riser in slot 2, I can pass through both HBAs. However, I then cannot pass through the P2000 because of the same "Enabled / Needs Reboot" issue. So the problem is not with any individual PCIe device.

According to Intel, when using the 3 slot riser in riser slot 2, all the slots should be routed to CPU #2. So I'm not sure why one slot can be passed through, but the other cannot. I also did not see any option for bifurcation in the BIOS for slot 2, only slot 1 or 3.

intel_tps.png

VT-x and VT-d are both enabled in the BIOS. Is there anything else I may be missing to help get all 3 devices to pass through properly?
 

DavidWJohnston

Active Member
Sep 30, 2020
242
191
43
You need to set VMkernel.Boot.disableACSCheck to true. The goal is to get the VMkernel to NOT check the ACS status at boot, so you are enabling a policy that disables the ACS check. (False is the default.)
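For reference, the same setting can also be checked and changed from the ESXi shell rather than the Host Client. This is only a minimal sketch, assuming SSH access to the host. `acs_cmd` is a hypothetical helper made up for this example (not part of ESXi) that just prints each command for review unless `APPLY=1` is set. The new value only takes effect after a reboot.

```shell
#!/bin/sh
# acs_cmd is a hypothetical wrapper for this sketch, not an ESXi tool:
# it echoes the command unless APPLY=1, in which case it runs it.
acs_cmd() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Show the configured and runtime values of the kernel setting.
acs_cmd esxcli system settings kernel list -o disableACSCheck

# Set it to TRUE so the VMkernel skips the ACS check at the next boot.
acs_cmd esxcli system settings kernel set -s disableACSCheck -v TRUE
```

Run as-is, this only prints the two `esxcli` lines; run with `APPLY=1` on the host to actually apply them, then reboot.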

Also sometimes I've found after rebooting, it still shows as "needs reboot", and just toggling the passthru setting back and forth once seems to enable it, without another reboot being necessary.

If that doesn't work, check the ESXi log files for any useful messages that may give a clue on startup: https://docs.vmware.com/en/VMware-v...UID-832A2618-6B11-4A28-9672-93296DA931D0.html
 

Xoid

@DavidWJohnston Thanks for catching that, I actually did set VMkernel.Boot.disableACSCheck to True but wrote it incorrectly in my OP. I set it to True again, but still no dice. I'm not sure it would be a factor in my case anyway as my server supports ACS.

Toggling passthrough off & on also didn't seem to help.

I checked the logs but did not find anything useful. No errors when scanning the PCIe devices:

2023-02-27T15:55:23.058Z cpu0:2097152)PCI: 295: probe-scanning host/root bridge - Pastebin.com

The only difference I see is with "lspci -p"; the working HBA has "P" in the "M" column and the failing one has "V" instead. But it's not clear what the "M" column means and I can't find any info about the columns online. It seems the -p option is not a standard lspci parameter. My guess is that the "P" means it's configured for passthrough? Just an uneducated guess.

Code:
[root@esxi:~] esxcli hardware pci pcipassthru list
Device ID     Enabled
------------  -------
[...]
0000:81:00.0     true
0000:83:00.0     true

[root@esxi:~] lspci -p | grep 0000:8
Segm:Bu:De.F Vend:Dvid Subv:Subd ISA/irq/Vect P M Module       Name
0000:80:02.0 8086:2f04 0000:0000  10/   /     A V              PCIe RP[0000:80:02.0]
0000:80:02.2 8086:2f06 0000:0000  10/   /     A V              PCIe RP[0000:80:02.2]
[...]
0000:81:00.0 1000:0072 1734:1177  10/   /     A P pciPassthru
0000:83:00.0 1000:0072 1734:1177  10/   /     A V

[root@esxi:~] lspci -v
0000:80:02.0 Bridge PCI bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 2 [PCIe RP[0000:80:02.0]]
         Class 0604: 8086:2f04

0000:80:02.2 Bridge PCI bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 2 [PCIe RP[0000:80:02.2]]
         Class 0604: 8086:2f06

[...]

0000:81:00.0 Mass storage controller Serial Attached SCSI controller: Broadcom / LSI HBA Ctrl SAS 6G 0/1 [D2607]
         Class 0107: 1000:0072

0000:83:00.0 Mass storage controller Serial Attached SCSI controller: Broadcom / LSI HBA Ctrl SAS 6G 0/1 [D2607]
         Class 0107: 1000:0072
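The manual comparison above can be sketched as a small script that lists devices enabled in `esxcli hardware pci pcipassthru list` but not flagged "P" in the "M" column of `lspci -p`. It rests on two assumptions: that "P" in the "M" column really does mark a device claimed for passthrough (which is only a guess), and that the column layout matches the header shown above. The function names are made up for this example, and the sample data is copied from the output above so the sketch runs anywhere.

```shell
#!/bin/sh
# Sample data from the thread; on a real host, replace the here-documents
# with the live commands (esxcli hardware pci pcipassthru list / lspci -p).
passthru_list() {
cat <<'EOF'
Device ID     Enabled
------------  -------
0000:81:00.0     true
0000:83:00.0     true
EOF
}

lspci_p() {
cat <<'EOF'
Segm:Bu:De.F Vend:Dvid Subv:Subd ISA/irq/Vect P M Module       Name
0000:80:02.0 8086:2f04 0000:0000  10/   /     A V              PCIe RP[0000:80:02.0]
0000:80:02.2 8086:2f06 0000:0000  10/   /     A V              PCIe RP[0000:80:02.2]
0000:81:00.0 1000:0072 1734:1177  10/   /     A P pciPassthru
0000:83:00.0 1000:0072 1734:1177  10/   /     A V
EOF
}

# Print addresses that are enabled for passthrough but whose "M" flag is not "P".
# The position of the "M" column is taken from the lspci -p header line.
pending_devices() {
    { passthru_list; echo "--"; lspci_p; } | awk '
        $0 == "--"              { second = 1; next }
        !second && $2 == "true" { enabled[$1] = 1; next }
        second && /^Segm/       { mcol = index($0, " P M ") + 3; next }
        second && mcol && ($1 in enabled) && substr($0, mcol, 1) != "P" { print $1 }
    '
}

pending_devices
```

With the sample data this prints `0000:83:00.0`, the HBA stuck on "Enabled / Needs Reboot".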
 

DavidWJohnston

Hmm I don't know... It's a good one... What I might try is temporarily replace your boot drive with a spare SSD, install Windows on it, and see if you can actually use all devices simultaneously in Windows.

It could be the devices are reported as present by I2C/SMBus but enough lanes aren't active due to lack of bifurcation support or some other limitation - So the card(s) can't all start. Running an OS like Windows will allow you to use/benchmark all of the cards at once, thus eliminating the variable of passthru itself being the issue.

I'm not sure where else to go from here... If all the cards work in Windows, maybe try posting in r/vmware - If they don't work, maybe try another subreddit geared more towards hardware.

Good Luck!
 

Xoid

I should have mentioned, I did try in Windows earlier and was able to pass through all devices to a single VM, so no issues there. Seems to be VMware specific. Unfortunately I didn't get any traction on the subreddit, but I guess I'll try the VMware forums next.

Might just chalk it up to ESXi 7.0 generally being a buggy mess :confused:
 

DavidWJohnston

Hmm yeah maybe try 6.7 & 8 if you can and see what happens. Note that a Windows Server update was just pushed that breaks SecureBoot in Windows Server 2022 VMs in 6.7 - I just had to disable SecureBoot on a bunch of my VMs.
 

Xoid

Well upgrading to ESXi 8.0 seems to have done the trick. My hardware is too old to be on the HCL but it seems to be working, and all devices can be set as active for passthrough without any issue. Haven't copied my VMs over yet for testing but it's looking good.

Wish I could figure out what the real issue was with 7.0, but it's not really worth my time digging deeper if 8.0 is working.
 

Xoid

I'm generally familiar with and used to using ESXi and don't really feel like going down the road of converting all my existing VMs, although there are some features I'm missing with the free license I'd like to use and I don't feel like paying for VMUG.

I am building a couple of new nodes though and will try out Proxmox on them to get my feet wet with it, since I've heard so much about it.