Troubleshooting GPU passthrough ESXi 6.5

Ch33rios

Member
Nov 29, 2016
102
6
18
39
I changed the primary display setting in the BIOS to 'offboard', which ensures the PCIe GPU is used as the primary display. As you noted, I can see the host boot screens and the ESXi loading screens, but then they freeze, presumably when the kernel applies the passthrough configuration.

Unfortunately, I still can't get the VM to fully load the GPU drivers. I'm posting my VMX in case it helps people take a gander. I'm also going to try my newer GTX 970 again to see if that makes any difference.

marcoi

Well-Known Member
Apr 6, 2013
1,403
224
63
Gotha Florida
I compared it to my VMX file and saw a few differences.

I don't see the PCI entries below in your config (I have one set for each passthrough device):
pciPassthru0.deviceId = "0x8ca0"
pciPassthru0.id = "00000:000:27.0"
pciPassthru0.pciSlotNumber = "192"


You also had the following extra items; not sure whether they matter:
softPowerOff = "FALSE"
svga.guestBackedPrimaryAware = "TRUE"
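Comparing two VMX files by eye is error-prone, so here is a minimal Python sketch that lists the keys present in one file but not the other. It assumes the usual .vmx layout of one `key = "value"` setting per line; `vmx_settings` and `compare_vmx` are hypothetical helper names, and the sample strings are just the excerpts quoted in this post.

```python
import re

# A .vmx file is plain text with one setting per line: key = "value"
VMX_LINE = re.compile(r'^\s*([\w.:]+)\s*=\s*"([^"]*)"\s*$')


def vmx_settings(vmx_text):
    """Parse .vmx text into a {key: value} dict, ignoring lines that don't match."""
    settings = {}
    for line in vmx_text.splitlines():
        m = VMX_LINE.match(line)
        if m:
            settings[m.group(1)] = m.group(2)
    return settings


def compare_vmx(mine, theirs):
    """Return (keys only in mine, keys only in theirs), each sorted."""
    a, b = vmx_settings(mine), vmx_settings(theirs)
    return sorted(a.keys() - b.keys()), sorted(b.keys() - a.keys())


# Usage with the excerpts from this post (real files share many more keys).
mine = '''pciPassthru0.deviceId = "0x8ca0"
pciPassthru0.id = "00000:000:27.0"
pciPassthru0.pciSlotNumber = "192"
'''
theirs = '''softPowerOff = "FALSE"
svga.guestBackedPrimaryAware = "TRUE"
'''
only_mine, only_theirs = compare_vmx(mine, theirs)
```

Running it against two full .vmx files should surface both the missing `pciPassthru0.*` entries and the extra settings in one pass.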

When you edited the ESXi config file, did you make sure all three device lines were added for the video card, and that they match what you see on the passthrough screen in the GUI?

vi /etc/vmware/esx.conf

/device/00000:000:20.0/owner = "passthru"
/device/00000:000:20.0/device = "8cb1"
/device/00000:000:20.0/vendor = "8086"
 
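The three-line check described above can be scripted. This is a minimal Python sketch, assuming the `/device/<address>/<key> = "<value>"` format shown in the esx.conf excerpt; `check_passthru` is a hypothetical helper, and the second address in the usage is a made-up example for a GPU on bus 1. It reports, for each address you expect to see (for example, from the passthrough screen in the GUI), which of the owner/device/vendor lines are missing or whether the owner is something other than "passthru".

```python
import re

# esx.conf device entries look like: /device/<address>/<key> = "<value>"
DEVICE_LINE = re.compile(r'^/device/([^/]+)/(\w+)\s*=\s*"([^"]*)"$')
REQUIRED = ("owner", "device", "vendor")


def device_entries(conf_text):
    """Group /device/... lines into {address: {key: value}}."""
    devices = {}
    for line in conf_text.splitlines():
        m = DEVICE_LINE.match(line.strip())
        if m:
            addr, key, value = m.groups()
            devices.setdefault(addr, {})[key] = value
    return devices


def check_passthru(conf_text, addresses):
    """For each expected address, list problems with its passthrough entries."""
    devices = device_entries(conf_text)
    report = {}
    for addr in addresses:
        entry = devices.get(addr, {})
        problems = ["missing " + key for key in REQUIRED if key not in entry]
        if entry.get("owner", "passthru") != "passthru":
            problems.append("owner is " + repr(entry["owner"]))
        report[addr] = problems
    return report


# Usage: the first address is from this post; the second is hypothetical.
conf = '''/device/00000:000:20.0/owner = "passthru"
/device/00000:000:20.0/device = "8cb1"
/device/00000:000:20.0/vendor = "8086"
'''
report = check_passthru(conf, ["00000:000:20.0", "00000:001:00.0"])
```

An empty list means all three lines are present and the device is owned by passthru. It is safer to run this against a copy of esx.conf than to edit the live file while experimenting.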

crhendo

New Member
Jan 5, 2017
2
0
1
63
"Nvidia is a PITA, but I got my 750 Ti working after a lot of research and playing around."
Yes, I have stuck with AMD successfully over the years, and even then it takes a while to get a "good" one.

Probably a little off topic, but interesting all the same. I just installed ESXi 6.5 on one of my Intel NUC Skull Canyons. I had a small problem with not being able to see my Bluetooth adapter, which was resolved by disabling the new (and buggy?) VMware USB driver. I then noticed that I was able to pass through the on-board Intel Iris Pro 580 GPU to one of my Windows 10 64-bit VMs, and that is when things became a little interesting. The latest Intel Windows 10 graphics driver installed without any problems, but unfortunately I could not get an image on any monitor connected to any port. I knew the graphics driver was doing its job because I used Intel's GPU-assisted Quick Sync to do a few H.264 encodes in HandBrake, which all performed extremely well. It would be great to finally have on-board passthrough working on a NUC. Does anybody have any ideas on how I might get a monitor working?

marcoi

While gathering info for my setup, I believe I read somewhere that integrated GPUs behave differently than PCIe GPUs, so they can't be passed through. But who knows, maybe the newer-generation IGPs are made differently?

crhendo

Yes, it's always been an issue for on-board GPUs, but this is the first time I have been able to use the Intel graphics driver after passthrough and gain access to Quick Sync. This was definitely not available in 6.0. Whatever the case, we are getting closer.

Ch33rios

Yes, I have stuck successfully to AMD over the years and even then it takes a while to get a "good" one.
Yeah, it might come down to getting a slightly cheaper one JUST to prove out my own use case. But for a proper gaming setup I'd have to invest more. With so many folks seemingly able to figure it out (many just by setting the magic hypervisor.cpuid.v0 key), I was hopeful I'd be able to use my existing GTX 970 once I'd proved the setup with my older GTX 460... obviously that hasn't gone as smoothly as I'd hoped :)
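For reference, the "magic" key is usually given as `hypervisor.cpuid.v0 = "FALSE"` in the VM's .vmx file; it stops the VM from reporting a hypervisor via CPUID, which is reportedly what lets the Nvidia driver load in many of those success stories. Below is a minimal Python sketch for setting such a key without creating duplicate lines. `set_vmx_option` is a hypothetical helper, not a VMware tool, and the .vmx should be edited only while the VM is powered off.

```python
import re


def set_vmx_option(vmx_text, key, value):
    """Return vmx_text with `key = "value"` set.

    Replaces the first existing line for `key` if there is one,
    otherwise appends a new line. (Hypothetical helper.)
    """
    pattern = re.compile(r'^[ \t]*' + re.escape(key) + r'[ \t]*=.*$', re.MULTILINE)
    new_line = '{} = "{}"'.format(key, value)
    if pattern.search(vmx_text):
        # Use a function so backslashes in the value are not treated as escapes.
        return pattern.sub(lambda _m: new_line, vmx_text, count=1)
    sep = "" if not vmx_text or vmx_text.endswith("\n") else "\n"
    return vmx_text + sep + new_line + "\n"


# Usage: read the .vmx, set the key, then write the result back.
vmx = 'displayName = "win10"\n'
vmx = set_vmx_option(vmx, "hypervisor.cpuid.v0", "FALSE")
```

Replacing an existing line rather than blindly appending avoids ending up with two conflicting `hypervisor.cpuid.v0` entries after repeated attempts.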

Yeah, I went through each of your previous instructions again, very slowly, and did a step-by-step with screenshots. Here's a pic from my esx.conf after I enabled passthrough on the High Definition Audio controller via the GUI but BEFORE my first reboot:

BEFORE REBOOT:
gui_hardware_before.png

ESX.CONF
esx.conf.png

AFTER REBOOT:
gui_hardware_after.png

Everything seems to be going according to plan. It's too late now, but tomorrow I'm going to take a video of something weird that I think could be a primary symptom of why this isn't fully working. It's what I've mentioned a bit before:

1) VM is installed and booted, VMware Tools installed, TightVNC installed... all ready for passthrough at this point
2) Power down the VM
3) Add the PCI devices per the instructions (both the GPU and the audio controller)
4) Power on the VM
5) There is now a pause of sorts during which I can hear a very faint but very high-pitched tone coming from the tower. Note that the VM hasn't booted yet. The tone occurs several times (maybe 3-5, but mind you, I wouldn't call it a BEEP like a POST error or something), and then the VM continues to boot.

It's difficult to explain, so I'll provide more evidence tomorrow...

Along those lines, I've taken some screenshots of the setup menu to check whether there is something in the BIOS I need to change or enable, since I've pretty much left everything at the defaults. Gigabyte's documentation so far is basically pure junk and is frustrating me beyond belief, but thank god for the internet... and more importantly the nice folks here!

If anything strikes you as weird, I'm all ears...
Bios_Advanced.png PCIe_Setup.png Chipset_PEG_Info.png

Netwerkz101

Active Member
Dec 27, 2015
294
75
28
Yes, its always been an issue for on-board GPU's but this is the first time I have been able to make use of the Intel Graphics driver after the pass-through and gain access to Quick sync. This was definitely not available in 6.0. Whatever the case, we are getting closer.
It's meant for VDI... try it in a VM via RDP; it's amazing.
Unigine's Heaven benchmark looks great and runs well at the lowest settings at 1366x768 (15-18 FPS).
That was over a 4G connection to a XenServer 7 VM running Win 10, with Intel HD Graphics P530 via an E3-1275 v5.

More on topic... I plan to try the same with ESXi 6.5.
I just need to find a working Nvidia card to pass through.

Ch33rios

What's really frustrating is that it's so hit and miss. One person can have a flawless experience with passthrough while others, like me apparently, struggle :-/

If only I hadn't already invested in Nvidia cards! No... no... I'm determined!

Ch33rios

Quick update... I decided to be brilliant (sarcasm!) and look through the ESXi logs for anything interesting, and came across the following lines in vmkwarning.log:

2017-01-05T00:14:10.197Z cpu1:65928)WARNING: PCI: 179: 0000:01:00.0: Bypassing non-ACS capable device in hierarchy
2017-01-05T00:14:10.197Z cpu1:65928)WARNING: PCI: 179: 0000:01:00.1: Bypassing non-ACS capable device in hierarchy
2017-01-05T00:14:11.298Z cpu2:65930)WARNING: PCI: 179: 0000:01:00.0: Bypassing non-ACS capable device in hierarchy
2017-01-05T00:14:11.298Z cpu2:65930)WARNING: PCI: 179: 0000:01:00.1: Bypassing non-ACS capable device in hierarchy

So those are definitely the addresses of the passed-through devices. I'll dig a bit deeper, but this could be the right path to an answer :)
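To pull the affected addresses out of a longer log, a small parser is handy. This is a minimal Python sketch, assuming the exact message format of the four warning lines above; `non_acs_devices` and `to_esxconf_address` are hypothetical names. Note that the log prints addresses in hex (`SSSS:BB:DD.F`), while esx.conf entries earlier in the thread look like `00000:000:20.0`. I'm *assuming* esx.conf uses zero-padded decimal, based on the thread's own numbers: the 8086:8cb1 device appears to be an Intel chipset USB 3 controller, which normally sits at 00:14.0, and 0x14 is 20. Treat the conversion as unverified.

```python
import re

# Matches vmkernel warnings such as:
# ...)WARNING: PCI: 179: 0000:01:00.0: Bypassing non-ACS capable device in hierarchy
ACS_WARNING = re.compile(
    r"WARNING: PCI: \d+: "
    r"([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7]): "
    r"Bypassing non-ACS capable device"
)


def non_acs_devices(log_text):
    """Return the unique PCI addresses (hex, lower-case) named in ACS warnings."""
    found = set()
    for m in ACS_WARNING.finditer(log_text):
        seg, bus, slot, fn = m.groups()
        found.add("{}:{}:{}.{}".format(seg, bus, slot, fn).lower())
    return sorted(found)


def to_esxconf_address(hex_addr):
    """ASSUMPTION: esx.conf writes segment:bus:slot.function in decimal,
    zero-padded to 5:3:2 digits (e.g. hex 0000:00:14.0 -> 00000:000:20.0)."""
    seg_bus_slot, fn = hex_addr.rsplit(".", 1)
    seg, bus, slot = (int(part, 16) for part in seg_bus_slot.split(":"))
    return "{:05d}:{:03d}:{:02d}.{}".format(seg, bus, slot, fn)
```

For the lines above, `non_acs_devices` returns `0000:01:00.0` and `0000:01:00.1`, which (under the decimal assumption) would appear in esx.conf as `00000:001:00.0` and `00000:001:00.1`. That makes it easy to confirm the ACS warnings refer to the same devices marked for passthrough. The code avoids f-strings so it should also run on older Python 3 releases.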
 

marcoi

So it seems like there might be a setting in your BIOS, or possibly a limitation of the motherboard.
https://kb.vmware.com/selfservice/m...nguage=en_US&cmd=displayKC&externalId=1036811

What is your motherboard model? Check the manual and look for Access Control Services (ACS).

It might be the 'enable root port' option; set that to Enabled instead of Auto. Also, if you're using a dual-CPU motherboard, make sure your video card is in a PCIe slot attached to the primary CPU. (Not sure if that's an issue, but it's good to try the card in different slots as well.)

Ch33rios

I'm pretty annoyed with this motherboard as it is... a Gigabyte MX31-BS0. Their support portal has been terrible: simple questions that this forum answered within 24 hours still haven't been answered there 3 weeks later!

I'm just about set on returning it and finding something else that fits my needs... any recommendations?

MSI C236M Workstation LGA 1151 Intel C236 HDMI SATA 6Gb/s USB 3.1 Micro ATX Intel Motherboard-Newegg.com

It looks pretty nice, and I have had great experiences with their GPUs.

marcoi

I didn't see anything in the BIOS that mentions ACS. You could try restoring factory settings, then going through all the CPU settings to enable VT-d, etc. On a side note, has any PCIe card been passed through successfully on that motherboard?

My motherboard is an MSI, and I like its BIOS better than what I saw in the Gigabyte. Not saying it would make any difference, though.

Ch33rios

I'll swap the current GTX 460 for the 970 from my desktop and see what happens, reset the BIOS parameters to defaults, and make sure VT-d is enabled. I'll go from there, I suppose :)

Yeah, I was looking at the UEFI BIOS on the MSI board I linked... infinitely nicer and more modern. I also like that their support docs are much more detailed. Gigabyte B2B support, which is what my existing MX31-BS0 falls under, has been absolute sh*t. As you said, it might not make a huge difference technically, but it might make me a bit happier, since from experience I at least KNOW MSI has better support.

Ch33rios


VICTORY!!! Well, sort of. I replaced the problematic GTX 460 with the GTX 970 and, huzzah, the passthrough was successful! However, it's not perfect yet. The display randomly turns off and on, and eventually, while I'm interacting with it, I get a BSoD reporting a VIDEO_TDR_FAILURE.

What I did was DISABLE the SVGA display adapter in Device Manager and then install the Nvidia drivers (although Windows 10's automatic driver install sometimes got in the way). After a reboot it came up fine via HDMI on my external monitor, but again, it's a bit flaky.

Any thoughts? The VIDEO_TDR_FAILURE problem seems to hit many folks who aren't even virtualizing, so could it be a driver issue? I downloaded the latest driver, 376.33 IIRC. I'm just letting it sit for a while to see what happens.

Specifically, the BSoD names nvlddmkm.sys.

Well, I'll have to wait until my PCIe USB controller arrives tomorrow to properly play with it and see whether I'm still having issues. For whatever reason, I can still access the desktop via the console/VMware Workstation, which Windows treats as a secondary display. I'm wondering whether that is causing the driver to time out, since a driver timeout appears to be what triggers the TDR failure.

Getting closer ;)

marcoi

Nice, you're close. I had the same issues.
First make sure VNC is working, then turn off the VM's own monitor in Display Properties by selecting the option to show only on monitor 2 (whichever one is attached to your Nvidia card). Then go into Device Manager and disable the VMware SVGA adapter. It might take a few tries, since the BSoD can happen randomly. Once that's done, you should not see any video in the VM console.

Ch33rios

Perfect! I haven't had a chance to get back to it this weekend, but perhaps I'll have some time tomorrow. I'll report back when I try things out.

Ch33rios

Welp, I'm currently writing this from my fully virtualized desktop on my primary screen :) Yay!

However, while the BSoDs are gone, I randomly get interruptions to both USB input and the display. Right now everything seems fine, but then all of a sudden, while I'm typing, the cursor pauses, the screen blanks for about a second or less (sometimes as long as 2-5 seconds), and then it pops back as if nothing happened. Not sure how to troubleshoot the cause of that, though. I'm currently using an HDMI cable but will try a DVI cable instead...

Ch33rios

Yeah, the DVI cable didn't help at all, unfortunately. Back to the drawing board? :( Man, I'm tempted to try a cheapo Radeon card at this point, just to see if I can at least reach my goal of an always-on Steam streamer... either my setup is wonky or this Nvidia card is killing me!!!! :mad::mad::mad:

Ch33rios

Anything show up in the windows logs that might give an idea what's happening?
I'll check tonight, and I'll also look through the ESXi logs for anything interesting. Maybe I'll also try a fresh install this time, or at least revert to a snapshot from before I started adding and removing devices.