Troubleshooting GPU passthrough ESXi 6.5

das1996

Member
Sep 4, 2018
37
2
8
No bueno.

I tried this variation before. Tried again.

The 1660 Ti is a newer card, so possibly Nvidia changed something. Or it's a combination of changes in ESXi and the driver/card. I've been at this for the last few days and have tried all sorts of combinations in passthru.map and the vmx file. It all works great with this card in 6.5 (any update), but fails miserably in 6.7 and above.

I have a 1050 Ti in my main box. Tempted to try it just to have a clear conscience that it's the card/driver/ESXi rather than me doing something wrong.
 

das1996

"Set nvidia gpu as primary"

Elaborate please.

I have svga.present = false in the vmx file to get rid of the VMware SVGA adapter entirely.
 

hmw

Active Member
Apr 29, 2019
259
83
28
Display 1 is the VMware SVGA display. I simply say 'Show on Display 2' which is the NVidia display

 

das1996

I kind of figured as much after thinking about your comment. The svga.present = false gets rid of the vmware adapter/display entirely. The vm only sees a single gpu.

I think much of this is coming back to some kind of reset issue. The gpu is seen after a host reboot each and every time. It's only when the guest is rebooted that the error 43 crops up.

It's too late to pull the 1050 out but I'm tempted to give it a try tomorrow. Fortunately the test box is not even in a box... just a board on a stand of sorts. Very accessible. That would really be something if it works with it!
 

das1996

See the second post here - HomeLab with Vmware ESXI and GPU passtrough |VMware Communities

He points out the unfavorable gpu state on guest shutdown.

I found a program by NirSoft, DevManView (see "How to disable, enable, and uninstall a device from command-line on Windows"), that allows device enable/disable via the command line.

Using simple batch files for startup/shutdown (respectively):

devmanview.exe /enable "NVIDIA GeForce GTX 1080 Ti"
devmanview.exe /disable "NVIDIA GeForce GTX 1080 Ti"

This automatically puts the GPU in a state that allows it to work after a guest reboot.

I added the appropriate command above to a startup.bat and a shutdown.bat, which are referenced in the Group Policy startup/shutdown scripts.
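For anyone wanting to copy this, the two batch files could look roughly like the sketch below. The C:\Tools path is hypothetical, and the device name is an assumption for a 1660 Ti; it has to match the name shown in Device Manager exactly, so adjust it for your card.

```batch
@echo off
rem startup.bat: re-enable the GPU when the guest boots.
rem Assumption: devmanview.exe was copied to C:\Tools (hypothetical path).
rem Assumption: device name as it appears in Device Manager for a 1660 Ti.
set "GPU_NAME=NVIDIA GeForce GTX 1660 Ti"
"C:\Tools\devmanview.exe" /enable "%GPU_NAME%"
```

```batch
@echo off
rem shutdown.bat: disable the GPU before the guest powers off or reboots.
rem Same assumptions as startup.bat (hypothetical path, assumed device name).
set "GPU_NAME=NVIDIA GeForce GTX 1660 Ti"
"C:\Tools\devmanview.exe" /disable "%GPU_NAME%"
```

Group Policy startup/shutdown scripts run under the local system account, so the path should be one that account can read.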


So far this seems to work. It's not the most elegant way. Also of concern is what effect this may have on OS/GPU stability in the long term. I usually go months between rebooting the hosts. On the production box, the Win10 guest (HTPC) is usually put to sleep after watching TV and woken by WOL. I find that after about 20-30 days I need to do an actual Windows restart as it starts to get laggy.
 

das1996

Wanted to provide an update after testing with a 1050 ti.

Swapped in the 1050 ti and disabled the startup/shutdown scripts (no disabling of gpu in device mgr at shutdown). Updated passthru.map with pid/vid of new card's gpu (d3d0 false).
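As an illustration of the passthru.map change, an entry of this kind might look like the fragment below. It is only a sketch: the ffff wildcard (any device ID) stands in for the card's actual device ID, which is not given in this thread, so substitute the real ID if you want to target a single GPU.

```text
# /etc/vmware/passthru.map on the ESXi host
# file format: vendor-id  device-id  resetMethod  fptShareable
# 10de is NVIDIA's PCI vendor ID; ffff is the wildcard for "any device ID".
# Assumption: replace ffff with the GPU's own device ID (not listed here).
10de  ffff  d3d0  false
```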

Restarted the guest, and also shut it down and started it again, a number of times. The GPU came back each time without any error 43s. IOW everything worked the way it should have.


So.. Take away... 16xx cards (and newer?) require the gpu disable/enable at shutdown/restart, otherwise error 43.

Same driver, same windows, same vmx settings, only change was the video card. Clearly something changed in the cards and/or the way the drivers interact with the newer cards.


I hope this helps the next person save some time and lots of hair pulling in getting this to work.
 

hmw

"So.. Take away... 16xx cards (and newer?) require the gpu disable/enable at shutdown/restart, otherwise error 43."

Which would mean the default reset method that works for the GPU (FLR) isn't working anymore.

Reset Method
Possible values for the reset method include flr, d3d0, link, bridge, or default. The default setting is described as follows. If a device supports function level reset (FLR), ESX always uses FLR. If the device does not support FLR, ESX next defaults to link reset and bus reset in that order. Link reset and bus reset might prevent some devices from being assigned to different virtual machines, or from being assigned between the VMkernel and virtual machines. In the absence of FLR, it is possible to use PCI Power Management capabili
Perhaps you can try putting the GPU itself in the passthru map and experiment with D3D0 and other methods
 

das1996

^^I've tried every possible variation with the 1660 ti. It didn't matter, d3d0, flr, link, bridge, etc on the GPU PID, or any other pid associated with the card.

I've spent MANY hours over a few days messing with this. I think we're on the right track about it being a reset issue. Something changed after 6.5.0.

Unrelated, but discovered my browsers (FF, opera, chrome) weren't doing hardware decoding of videos (inc 4k/8k). Turned out I was missing some filters.

Post #2, specifically VP9 and AV1. Maybe the decrapifier script removed them when I set this box up a year ago... I never paid much attention as this CPU is quite powerful (3900X), but I noticed no decode activity in Task Manager when playing any videos.

For now, the 1050 Ti will go into the HTPC. I'm taking the 1660 Ti for myself. In a few years maybe some of the hardware decoding abilities of the 1660 will be needed, so I'll swap. In any event, GT 710 -> 1050 Ti is a vast improvement.
 

Railgun

New Member
Jul 28, 2018
9
1
3
Hi @das1996. Firstly, thanks for your efforts on this (and everyone for that matter). I too have started to experience issues in this regard with my 1660Ti, though in my case, it seems a host reboot no longer gets the card in a working state. I'd not done any of the above previously and am working on that as I type, though that being the case, I'm trying to determine the cause as to why a host reboot does not work. I'm running 6.7 build 15160138. It's probable that this was the build that caused the issue as prior to the update (still on 6.7, don't recall specifically which version) a host reboot seemed to work.

I'll try to get everything in place.

Incidentally, adding the script to the policy causes a hang on reboot/shutdown of the guest. I suppose it's easy enough to simply do manually should I need to and it's not as though it reboots often.

As I'm not an ESXi guru by any stretch, I'll be digging deeper than I probably should be just to make sure I know where everything is and should be.