Troubleshooting GPU passthrough ESXi 6.5

Rand__

Well-Known Member
Mar 6, 2014
4,626
919
113
It's totally easy if you've got the right card.
Just a tiny bit difficult if you're trying to do something that was not intended by VMware and NVIDIA/AMD ;)
 

epicurean

Active Member
Sep 29, 2014
676
42
28
Has anyone got passthrough working without setting the external GPU to primary?
I would really like to be able to keep the internal (Intel) graphics as the primary display.

I got it working with a 1080 on ESXi 6.7/6.7U1 with the following additions:

You have to disable the internal graphics and set the 1080 to primary; it won't work with the internal graphics enabled as secondary.
hypervisor.cpuid.v0 = "FALSE"
pciPassthru0.msiEnabled = "FALSE"
SMBIOS.reflectHost = "TRUE"
Does this mean that only one VM with one GPU attached will work on each ESXi server? I have a 6.0U3 ESXi server with three Windows 10 VMs and three NVIDIA GPUs, one passed through to each of them. I've been meaning to figure out how to enable GPU passthrough on ESXi 6.7, but it hasn't worked for me so far.
 

Docop

New Member
Jul 19, 2016
16
0
1
41
I recently reinstalled my ESXi: while trying to copy the USB key as a backup, I ended up destroying it. So now: ESXi 6.0u2, an AMD 7950 for gaming and an NVIDIA K2200. All Quadro cards pass through perfectly fine, and the AMD does too. On a Win10 VM the older AMD card still works fine, no extra settings needed. You plug in the card, enable passthrough in the config, reboot, then add the card to the VM. AND: click "reserve all memory" and bingo. Pair it with a Highpoint 4-port USB 3.0 card that you also add to your VM.
ESXi 6.7 is quite painful, buggy, and very bad; it just hangs from time to time. ESXi 6.0 is just perfect. I will wait for ESXi 7.0 before any upgrade.
 

besterino

New Member
Apr 22, 2017
27
7
3
43
Thought I'd share this after only silently reading here so far: I tinkered around with my totally "unsupported" rig (2080Ti, X399/Threadripper 1920X) over the last couple of days and finally figured out how to pass through both an "onboard chipset" USB3 controller and the 2080Ti, in particular on 6.7U3. Getting this to work once is not such a big problem, but it gets tricky if both devices are to survive and still work after a reboot of the VM.

As some of you are aware, until now (and without the settings below) the GPU would throw the famous "error code 43" after a reboot of the VM and only work again once the ESXi host was rebooted as a whole.

So far the most common (only?) workaround seemed to be to disable the GPU in Device Manager before the VM reboot and enable it again once the VM is back up. Some automated this procedure with scripts.

The following worked with Windows 10 (1903) VMs started in both BIOS and EFI mode, and did not require any manual or scripted intervention.

The secret is:

1. Edit passthru.map and delete or comment out the default NVIDIA setting (leaving "# 10de ffff bridge false"). This general wildcard setting for all NVIDIA devices does not work, at least not for my 2080Ti FE. Instead I needed a more granular approach: I had to set d3d0 for ALL NVIDIA "sub-devices" of my graphics card EXCEPT the GPU itself. The GPU now has no override anymore and ESXi uses its defaults (which works for the GPU, but unfortunately not for the USB controller...). In my case the passthru.map now looks like this:

...
#NVIDIA
#Audio
10de 10f7 d3d0 false
#Serial Bus
10de 1ad7 d3d0 false
#USB
10de 1ad6 d3d0 false
...

(Reboot the host after you make changes to passthru.map.)
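If you're unsure which device IDs your own card's functions have, you can read them off the host before editing passthru.map. A minimal sketch; the `10de` pattern is NVIDIA's PCI vendor ID, and on the ESXi shell itself `esxcli hardware pci list` shows the same Vendor Id / Device Id fields:

```shell
#!/bin/sh
# List PCI functions with numeric [vendor:device] IDs and keep the NVIDIA ones.
# Uses Linux pciutils; on the ESXi shell use "esxcli hardware pci list" instead.
if command -v lspci >/dev/null 2>&1; then
    lspci -nn | grep -i '10de' || true
fi
```

Each matching line ends in a [vendor:device] pair (e.g. [10de:1ad6] for the USB controller above); those are the values that go into passthru.map.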

2. Add ALL devices of your NVIDIA graphic card as PCI Passthrough devices to your VM, i.e. for the 2080Ti: GPU, Audio, USB and Serial Bus.

NOTE: I did not test in detail whether this is really necessary, or whether it is enough to add only one or certain devices in addition to the GPU. Adding all of them seemed reasonable to get all devices properly reset at reboot (which was also in line with some snippets I read somewhere), and it worked. ;)

NOTE 2: it would be interesting to see what happens if you just delete the NVIDIA settings without adding anything else... ah... I need more time...

3. In addition to the usual hypervisor.cpuid.v0 = "FALSE", set the following for ALL NVIDIA passthrough devices EXCEPT the NVIDIA USB controller of the 2080Ti:

pciPassthru0.msiEnabled = FALSE

NOTE: replace the 0 (zero) after pciPassthru with the correct number for each of your devices.
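Putting steps 2 and 3 together, the extra .vmx lines might look like the sketch below. The 0-3 numbering (GPU / Audio / USB / Serial Bus) is hypothetical; match it to the order in which the passthrough devices were actually added to your VM.

```
hypervisor.cpuid.v0 = "FALSE"
# Example numbering: 0 = GPU, 1 = Audio, 3 = Serial Bus.
# pciPassthru2 (the NVIDIA USB controller) deliberately gets no msiEnabled override.
pciPassthru0.msiEnabled = "FALSE"
pciPassthru1.msiEnabled = "FALSE"
pciPassthru3.msiEnabled = "FALSE"
```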

NOTE for AMD:
Last but not least: my board (X399D8A-2T) / X399 chipset / Threadripper is a bitch when it comes to ESXi, in particular USB passthrough; it didn't work properly even with 6.5U2. For the onboard controller to survive a VM reboot I also had to modify the passthru.map like this:

# AMD
1022 ffff d3d0 false

Probably the specific device ID would work instead of the ffff wildcard as well, but I was lazy and haven't tested further (yet)...

This now also works in 6.5U2 and 6.7U3, with either the EFI or BIOS VM startup setting.
 
Last edited:
  • Like
Reactions: Rand__

boledpoutine

New Member
Sep 3, 2019
3
1
3
Hey,

Looking for some help; I can't seem to understand what I'm doing wrong!

Specs:
Mobo: SuperMicro X9DRH
GPU: Zotac GTX 1660
ESXI: 6.7U3

I am able to pass through the GPU. Before 6.7U3 it would only show the generic name "Nvidia video controller"; after upgrading to U3 it shows the proper name. I don't know if this makes any difference or helps with my case.

My issue is that the moment I attach the GPU to a VM, for example my Plex VM, it says "powering on 55%" and then my host locks up.
I can't ping my ESXi anymore and my only option is to reboot.

I don't have a monitor plugged into the video card, I'll need to get one.
I can access the IPMI and just see the VMware home screen; no purple/pink screen, nothing. To be honest, I haven't waited long enough to see if it will eventually show the purple screen as others have seen.

What am I doing wrong?
Do I need to change a setting before attaching the GPU and powering on?
Do I need to attach the USB controller, audio, and video devices as well?
How can I troubleshoot to see what is happening and what is locking up my ESXi box?

Any help would be appreciated, or I'll just return the card and give up on Plex decoding.

Thanks
 

Docop

New Member
Jul 19, 2016
16
0
1
41
To confirm your setup is working, start with a known-good solution. Install ESXi 6.0 (any update) on a USB key and it will detect the card without problems. For a GTX or any gaming card it's painful and not supposed to work; any Quadro series card will run fine without any problem. I tried an upgrade to 6.7U2 and burned a day on a lot of problems. If you don't do SAN, there's not much to gain on 6.7, and it was also problematic with my USB passthrough card; 6.0 is all good. So: boot, configure, pass the card through, reboot, add it to the VM, click "reserve all memory", and boot the VM. Yes, put a screen on the card. You can pass through the video portion and skip the audio section. 6.7 required a lot of config modifications, and many times it hangs or doesn't survive a reboot of the system.
 

randman

Member
May 3, 2020
67
12
8
I added an NVIDIA Quadro P2200 to my HPE ML30 Gen9 server, which is running ESXi 6.5 U3. I am running Plex Media Server on an Ubuntu 18.04 VM. I successfully got hardware transcoding (encode and decode) working!

The only issue that I have now is how to back up my Plex VM. I do two kinds of backups:

1. I back up my Plex database using a script that runs in my Plex VM. The script runs nightly and uses rsync to back up the contents of /var/lib/plexmediaserver/Library/Application Support/Plex Media Server/. The backup is saved to an NFS share residing on an external NAS.

2. I also have a Nakivo VMware backup job that backs up the VM every night. Nakivo, like other third-party VMware backup software, requires the use of snapshots.

Now that I am using PCI passthrough, snapshots can no longer be taken; VMware doesn't support snapshots when passthrough is in use.

So, my question is: how do folks using passthrough back up their VMs?

I suppose in the worst case I can recreate the VM, reinstall the Plex software, and restore my Plex database from backup. But I want an alternate method as an insurance policy, and would like a way to back up the entire VM.

One thought I have is to clone the Plex VM to a template. If I set up the clone job to clone to the same ESXi host, I don't get any warnings, but I'd rather clone to another ESXi host just in case my first host becomes unavailable. When setting up the clone job, I get compatibility warnings saying:

- Device 'PCI device 0' uses backing ", which is not accessible

- Device 'PCI device 1' uses backing , which is not accessible

(I added two PCI devices for my NVIDIA GPU: one for video and the second for the audio controller.)

One option is to ignore these warnings, then later in the clone setup choose "Customize this virtual machine's hardware" and remove the two PCI devices from the clone. Another option is to ignore the compatibility warnings but not delete the PCI devices; if I ever have to use the cloned VM on another ESXi host (only if the first host with the GPU card is dead), I can edit the clone to remove the two PCI devices before powering it up (although the NVIDIA driver may complain if I power the VM up on an ESXi host that doesn't actually have the NVIDIA GPU).

Anyone else have any suggestions on how to do backups once snapshots are no longer available?
 

das1996

Member
Sep 4, 2018
37
2
8
^^I use Veeam. To back up, the VM needs to be shut down first. Two VMs are configured with passthrough: the firewall with a NIC, and a Windows HTPC with the video card. Both back up (and restore) just fine.
 

randman

Member
May 3, 2020
67
12
8
^^I use Veeam. To back up, the VM needs to be shut down first. Two VMs are configured with passthrough: the firewall with a NIC, and a Windows HTPC with the video card. Both back up (and restore) just fine.
Thanks! Nakivo doesn't work without snapshots, so I'll have to look into Veeam. Is it possible to automate the shutdown, backup, and startup with Veeam, or is it a manual procedure?
 

muhfugen

Active Member
Dec 5, 2016
133
39
28
If you wanted to script it, you could. If you just need to back up files from within the guest, like the Plex database, you can just use Veeam Agent, which can run while the VM is running.
 

randman

Member
May 3, 2020
67
12
8
I currently use Nakivo for backup. As discussed, VM backup fails to create a snapshot while the VM is powered on. Anyway, as das1996 suggested, I stopped my Plex VM and ran my Nakivo VM backup. It was able to create the snapshot, back up my VM, and remove the snapshot. While I already have a nightly backup script that runs inside the VM and backs up my Plex database to an external NAS, I really wanted to keep having a full VM backup. The power off / backup / power on approach looks fine for backing up the VM. I still have to try restoring my Plex VM from the backup, just to double-check...
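The power off / backup / power on cycle can in principle be automated from the ESXi shell with `vim-cmd`. A sketch only: the VM ID 42 is hypothetical (list real IDs with `vim-cmd vmsvc/getallvms`), and the backup job itself would still be triggered from the Nakivo/Veeam side:

```shell
#!/bin/sh
# Cold-backup cycle, run on the ESXi host (e.g. over SSH). Does nothing off-host.
VMID="${VMID:-42}"
if command -v vim-cmd >/dev/null 2>&1; then
    vim-cmd vmsvc/power.shutdown "$VMID"   # graceful shutdown; needs VMware Tools
    # Poll until ESXi reports the VM as powered off.
    while ! vim-cmd vmsvc/power.getstate "$VMID" | grep -q "Powered off"; do
        sleep 10
    done
    # ...trigger the backup job here, then bring the VM back up...
    vim-cmd vmsvc/power.on "$VMID"
fi
```

Whether this is cleaner than driving it from the backup server's pre/post-job scripts is a matter of taste; both amount to the same shutdown/backup/power-on sequence.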
 

Hossy

New Member
Jul 15, 2020
1
0
1
Hi everyone,

I'm having this same Code 43 driver issue. Somehow I got it to work yesterday for about 30 minutes, using everything up to the latest (modified) driver, but now it's back to not working and I'm pulling my hair out. It seems people here might've gotten this to work. I'm willing to try anything and don't necessarily need to be running ESXi 7.0 (I can go back to 6.7), but I can't seem to get the installer for 6.5 to boot.

Hardware
Intel NUC9VXQNX
NVIDIA Quadro P2200
Two monitors (one connected to on-board graphics via HDMI; one connected to P2200 via DP-to-miniDP)

ESXi
ESXi 7.0.0, 16324942 (although I tried various versions of 6.7 as well)
passthru.map (excerpt):
Code:
# NVIDIA
# VMware default
# 10de  ffff  bridge   false

# NVIDIA Quadro P2200
# GPU
# 10de  1c31  d3d0  false
# HD Audio
10de  10f1  d3d0  false
VM
Windows Server 2019 Datacenter running in Test Mode (bcdedit /set testsigning on)
E1000E vNIC
No VMware Tools
vmx settings:
Code:
#This is the GPU
pciPassthru0.id = "00000:001:00.0"
pciPassthru0.deviceId = "0x1c31"
pciPassthru0.vendorId = "0x10de"
pciPassthru0.systemId = "5f0cc5f6-0de4-ab68-ab76-a4ae111e3c60"
pciPassthru0.present = "TRUE"
#This is the HD Audio controller
pciPassthru1.id = "00000:001:00.1"
pciPassthru1.deviceId = "0x10f1"
pciPassthru1.vendorId = "0x10de"
pciPassthru1.systemId = "5f0cc5f6-0de4-ab68-ab76-a4ae111e3c60"
pciPassthru1.present = "TRUE"
pciPassthru0.pciSlotNumber = "256"
pciPassthru1.pciSlotNumber = "1184"
hypervisor.cpuid.v0 = "FALSE"
SMBIOS.reflectHost = "TRUE"
pciPassthru0.msiEnabled = "FALSE"
pciPassthru1.msiEnabled = "FALSE"
Testing
Used these two sites for modifying the NVIDIA drivers
I tried nearly every NVIDIA driver available for Win2019 (after modifying them). Currently working with 26.21.14.3064 (NVIDIA 430.64) / 2016
  • 430.64-quadro-desktop-notebook-win10-64bit-international-whql.exe
  • 430.64-quadro-winserv-2016-2019-64bit-international-whql.exe
  • 431.98-quadro-winserv-2016-2019-64bit-international-whql.exe
  • 432.28-quadro-winserv-2016-2019-64bit-international-whql.exe
  • 436.30-quadro-desktop-notebook-win10-64bit-international-whql.exe
  • 440.97-quadro-winserv-2016-2019-64bit-international-whql.exe
  • 442.50-quadro-winserv-2016-2019-64bit-international-whql.exe
  • 442.74-quadro-winserv-2016-2019-64bit-international-whql.exe
  • 442.92-quadro-winserv-2016-2019-64bit-international-whql.exe
  • 443.18-quadro-winserv-2016-2019-64bit-international-whql.exe
  • 451.48-quadro-winserv-2016-2019-64bit-international-whql.exe
The disable/reboot/enable thing in Device Manager does not work for me.

All this testing has been done with the Primary Display BIOS setting set to IGFX (the other options are Auto and PEG). I would like to use the integrated graphics to be able to access the DCUI, but also because without some display available to the hardware, Intel AMT Remote Desktop stops working.

However, what does "work" is setting the Primary Display BIOS setting to Auto or PEG. This causes the hardware to boot and display on the monitor connected to the P2200. The ESXi boot process displays the yellow/gray screen and then halts screen updates shortly after appearing; I presume this is because the P2200 gets locked by the PCI passthrough configuration, leaving the hardware and ESXi without a GPU. In this configuration, even the latest UNMODIFIED NVIDIA drivers work just fine in the VM, even without advanced VMX settings like hypervisor.cpuid.v0. But this is not the desired run state, as it breaks other things that are otherwise valuable.


Serj_82

New Member
Aug 11, 2020
5
0
1
Hello!
Please tell me what this could be: I was able to pass through the video card (a GeForce GTX 690) in ESXi 6.5, and it appeared in Windows 10 (the drivers were installed but not accepted by the card, but that is another question...).
And now I DO NOT SEE it in the list of PCI devices!
Rebooting the host several times did not help.
Once again: before, it WAS in the list of devices and could be correctly passed through, but now it has just disappeared!
The video card itself definitely works, and power is supplied.
 

marcoi

Well-Known Member
Apr 6, 2013
1,404
225
63
Gotha Florida
Hello!
Please tell me what this could be: I was able to pass through the video card (a GeForce GTX 690) in ESXi 6.5, and it appeared in Windows 10 (the drivers were installed but not accepted by the card, but that is another question...).
And now I DO NOT SEE it in the list of PCI devices!
Rebooting the host several times did not help.
Once again: before, it WAS in the list of devices and could be correctly passed through, but now it has just disappeared!
The video card itself definitely works, and power is supplied.
Double-check on the ESXi host that it is set up as passthrough. Maybe you moved a PCIe device around in the system and it changed the device order?
Also, did you try powering off the host to see if that cleared it up? I'm not talking about just the VM, but the whole system.
 
  • Like
Reactions: Serj_82

Serj_82

New Member
Aug 11, 2020
5
0
1
Double-check on the ESXi host that it is set up as passthrough. Maybe you moved a PCIe device around in the system and it changed the device order?
Also, did you try powering off the host to see if that cleared it up? I'm not talking about just the VM, but the whole system.
This helped (a full host shutdown), thanks a lot!!! But what was it? Why did a shutdown work but not a reboot?
 

Serj_82

New Member
Aug 11, 2020
5
0
1
Okay, but the main problems remain:

I have an Intel SR2600URBRP server (S5520UR motherboard) with ESXi 6.5 installed on it.

The problem has two parts:
1) When the video card and a Wi-Fi PCIe card are installed in the riser board at the same time, the video card (an NVIDIA GTX 690) is not visible in the hypervisor's list of PCI devices. If you remove the Wi-Fi board, the card appears.
As far as I know, both devices fall into the same IOMMU group, and it seems this problem can be solved by some kind of patch...
In short, I would very much like to get both devices working at the same time.

2) The video card is correctly passed through to the guest OS and is visible in the Win10 Device Manager as two devices (the card has two GPUs). But the drivers do not want to be picked up, and SketchUp, which requires hardware acceleration, does not see the external video card.
I added the necessary lines to the config (following guides from the Internet) so that the card would be correctly detected in the guest OS, but it did not help.
 

marcoi

Well-Known Member
Apr 6, 2013
1,404
225
63
Gotha Florida
Okay, but the main problems remain:

I have an Intel SR2600URBRP server (S5520UR motherboard) with ESXi 6.5 installed on it.

The problem has two parts:
1) When the video card and a Wi-Fi PCIe card are installed in the riser board at the same time, the video card (an NVIDIA GTX 690) is not visible in the hypervisor's list of PCI devices. If you remove the Wi-Fi board, the card appears.
As far as I know, both devices fall into the same IOMMU group, and it seems this problem can be solved by some kind of patch...
In short, I would very much like to get both devices working at the same time.

2) The video card is correctly passed through to the guest OS and is visible in the Win10 Device Manager as two devices (the card has two GPUs). But the drivers do not want to be picked up, and SketchUp, which requires hardware acceleration, does not see the external video card.
I added the necessary lines to the config (following guides from the Internet) so that the card would be correctly detected in the guest OS, but it did not help.
For 1: see if you can move the cards to different PCIe slots; you might be drawing too much power or hitting a PCIe lane limit. You will need to experiment, since it isn't an exact science.
For 2: NVIDIA drivers are super picky, and it might take a lot of trial and error to get them working. You may also need a monitor or dummy plug on the video card. The other thing I recall is that I had to remove the virtual video adapter, as the NVIDIA drivers looked for it as a clue that they were running in a virtual machine. There is a lot of info in the threads here, so you may need to spend some time reading and looking for what worked for various people, and try things out.

Good luck.
 
  • Like
Reactions: Serj_82