VM with passthrough "freezes" entire ESXi box when shutting down/rebooting guest

thedotlair

New Member
Jul 3, 2011
7
0
1
Hi all,

VERY strange issue that I've come across and can't seem to get my head around. So far I've spent nearly 12 hours straight trying to diagnose it without getting to the bottom of it.

I'm running ESXi 6.0u2 build 3620759 (free license) on a custom-built Asus P9X79 PRO with an E5-2660 Xeon and 64 GB of DDR3 RAM. The system has the following cards in it: an IBM/LSI M1015 passed through to a FileServer VM, an ATI X1300 boot graphics card, 2x Intel PT1000 Dual NIC cards and an AMD HD6450 graphics card for passthrough. I have a Windows 10 VM configured with the AMD HD6450 passed through (both audio and video). We use it as a test bench for a new type of system that we're currently building but don't yet have the actual hardware for, so this is a shortcut to keep the software development moving.

The VM has the latest VMware Tools installed (via the console) and runs absolutely perfectly ... except that when you reboot or shut down the VM, the actual ESXi host becomes unresponsive. It stops responding to network traffic, and the ESXi thick client and HTML consoles don't respond (naturally), but there are absolutely no errors on the screen, such as a PSOD. It still displays the normal yellow console, which is also unresponsive, so a very hard lock.

I've looked through the knowledge base and found KB1030265, which I've followed, so interrupt remapping is now disabled, but this hasn't made the slightest bit of difference.

Can anybody point me in a direction to get logs from this thing, or offer any suggestions for debugging it? I appreciate that it's a tough call, especially since not everybody will be running this hardware, but any similar experiences and things I can change/tune would be appreciated.

I'm tempted to drop back to ESXi 5.5 and see if that exhibits the same problem, which would indicate hardware faults, but I would have thought loading up the VM with 1080p graphics/sound would have caused a bigger issue than shutting down the VM.

Thanks

Dean
 

RyC

Active Member
Oct 17, 2013
357
89
28
Try leaving out the audio device when passing through to the VM if you don't use HDMI audio
 

thedotlair

Thanks RyC, I'll give that a shot as it could be the audio causing an issue but I kinda need that as well :( I'm also going to try reverting back to 6.0GA to see if it's a Update2 issue.
 

thedotlair

Just spoken to a colleague about this who's doing the same thing but on a SuperMicro board and is getting exactly the same behavior when passing through a GPU to a Windows VM. He's also running Update2 as well. Difference being, he's passing through a 290X.
 

pricklypunter

Well-Known Member
Nov 10, 2015
1,605
469
83
Canada
Have you tried passing the card through to, say, a Win 8.1 VM? Does it do the same thing? Could it be a Win 10 driver issue?
 

xienze

New Member
Apr 11, 2016
3
0
1
38
I know that ATI cards have trouble starting back up when a proper PCIe bus reset hasn't occurred (like you would normally do running bare metal). Indeed, this is the issue I would see with my VM: the first time you start the VM after rebooting the host, no problems. If you restart the VM, the VM itself would lock up. I didn't see the exact issue you were seeing though (host locking up).

Here's something quick you can try, and if this works for you I can give you a much more in-depth post about how to fix the issue in an automated manner. Reboot your host and start your VM with the card passed through. Then open the Windows device manager and disable the card. Restart the VM -- there should be no lockups. After the VM has started, re-enable the card via device manager. If all that works for you, I can help you out with an automated solution.
 

thedotlair

Hey Xienze, thanks for the suggestion. I went through and tried everything you said and ended up with a host lockup after the device was disabled in the VM and then the VM rebooted :(

Up until that point, it was working perfectly even though it was the only VM turned on at that point.

Actually tried it with two different VMs: a Windows 10 Pro and a Windows 8.1 Pro. Both gave the same behaviour of the host locking up :(
 

thedotlair

Well I got it solved! But not in the way that I wanted to :( Had to go back to 5.5 Update 3 but absolutely no freezing on startup/shutdown and works like a dream.

Guess I'll be staying on this for a while especially as the HTML fling is available :)
 

starkindler89

New Member
Apr 25, 2016
1
0
1
I've been having the same problem on my build: HP Z820 workstation, Xeon E5-2770, 64GB DDR3, Radeon R9-380 passthrough. I could get the card to pass through all right and display video, but after shutting down or rebooting the VM, the host would freeze and crash. After rebooting the host and booting the VM, I could get it to work again, but as soon as the VM rebooted, the host would hang again.
I tried different versions of ESXi (5.5, 5.5u2, 5.5u3, 6.0, 6.0u2), various versions of VMware Tools, installing only part of the VMware Tools suite, different graphics card drivers, and Windows 7, 8.1, and 10, but nothing worked. I finally stumbled on a thread which mentioned the PCIe bus not getting fully reset with Radeon cards. I found that if I disable the passthrough video card in Windows Device Manager before shutting down the VM, I can reboot the VM and enable the card again without crashing the host. With a little scripting added to the startup/shutdown section of the local GPO, I'm up and running now without any problems! Hope this helps!
 

MKO

New Member
Jun 23, 2016
6
0
1
40
I've also experienced the ESXi 6.0 u2 host freezing when shutting down or rebooting a Windows 8 VM with passthrough devices attached.

After reading starkindler89's post I started disabling passthrough devices and found that passthrough of the USB 3 controller is the cause of the issues on my system. I have an Asus M5A99X EVO R2.0 board and passed through a Radeon 6870 and the ASMedia USB 3 ports to my primary VM.
When I don't pass this controller to the VM, or when I eject it in the guest prior to shutting down or rebooting, everything is fine.
I need to look into which device to disable in Device Manager, because ejecting the root hub also removes the passthrough.

The ESXi build running on my machine is 3620759 from March 2016. Based on the VMware KB entries related to the latest build (3825889), updating probably won't solve the underlying issue, but I might give it a try soon because some IOFilter issue has been solved.
I will try updating the BIOS first; maybe something controller-related will change. It might also be driver-related, of course.
 

xienze

MKO, if you are able to disable the device in the guest prior to VM shutdown and that fixes your problem, what you can do is write a simple script that disables upon shutdown and enables upon startup. It's what I do for my graphics card. If you can verify that works I'll write up the process for you.
 

MKO

xienze, thanks but no need now :) I know how to disable and enable devices through PowerShell, but I was unable to disable the suspect USB hub; I could only eject it.
But I have since found that the issue is caused by a USB composite device which was present in Device Manager.
The VM thinks this device is connected to the passed-through USB controller, but this was no longer the case.
After disabling this device everything appears to be working correctly; even the connected mouse and keyboard still work fine.
 

F1ydave

Member
Mar 9, 2014
132
21
18
There are some known IRQ problems with some cards with VMware. A lot of people are able to solve it by not using the main express slot.

At least 5.5u3 worked out for you!
 

Jacob Staub

New Member
Jul 15, 2016
3
1
1
56
To all that have contributed to this thread before me: awesome work. I've been whacking my head against ESXi and have been dying to find anything of use in the giant information cesspool.

My build: HP Z620 workstation, Xeon E5-2670, 64GB DDR3, ESXi 6.0.0 build 3620759, NVIDIA Quadro 2000 passthrough, Windows 7 Professional VM. I saw exactly the behavior starkindler89 described: rebooting the VM after successfully passing through the Quadro 2000 froze the host, and disabling the Quadro 2000 in the guest before shutdown and re-enabling it after startup avoided the freeze. The scripts I used to automate the shutdown/reboot workaround were the following:

1. To disable the Quadro 2000 driver (a bit cumbersome to implement as a "Schedule task"):
"C:\Program Files\devmanview-x64\DevManView.exe" /disable "NVIDIA Quadro 2000"

2. To enable the Quadro 2000 driver (simple to implement as a "Schedule task"):
"C:\Program Files\devmanview-x64\DevManView.exe" /enable "NVIDIA Quadro 2000"

Click here for instructions on DevManView which explains the scripts and how DevManView handles them.
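
For anyone who would rather drive this from a script than from hand-written .cmd files, here is a minimal Python sketch of the same two calls. It assumes Python is installed in the Windows guest. The DevManView path, the /disable and /enable switches and the device name are copied from the two commands above; the function names are made up for illustration.

```python
import subprocess
from typing import List

# Path and device name as given in the post; adjust both to your own guest.
DEVMANVIEW = r"C:\Program Files\devmanview-x64\DevManView.exe"
DEVICE_NAME = "NVIDIA Quadro 2000"


def devmanview_args(action: str, device: str = DEVICE_NAME) -> List[str]:
    """Build the argument list for DevManView.exe /enable or /disable."""
    if action not in ("enable", "disable"):
        raise ValueError(f"unsupported action: {action!r}")
    # Passing a list lets subprocess handle the quoting of the spaces
    # in the program path and in the device name.
    return [DEVMANVIEW, f"/{action}", device]


def set_device_state(action: str, device: str = DEVICE_NAME) -> int:
    """Run DevManView inside the guest and return its exit code."""
    return subprocess.run(devmanview_args(action, device), check=False).returncode
```

Calling set_device_state("disable") before shutdown and set_device_state("enable") after startup mirrors the two scripts. The device name has to match what DevManView lists for your card.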

To implement the scripts within a Windows 7 VM two "Schedule tasks" were created using the following method:
1. Go to: Control Panel >> System and Security >> Administrative Tools >> Schedule tasks

2. Set up the enable driver "Schedule task"
A. Under "Actions" menu choose "Create Basic Task"
B. Name = arbitrary
C. Trigger = When the computer starts
D. Action = Start a program
E. Program/script = File with enable script saved with suffix ".cmd"

3. Set up the disable driver "Schedule task" (play with the plethora of accessory settings within the "Schedule task" interface as required to get the desired task execution behavior)
A. Under "Actions" menu choose "Create Task"
B. Name = arbitrary
C. Trigger >> New >> Settings = Basic >> Begin the task: On an event >> Log: System >> Source: USER32 >> Event ID: 1074
D. Actions >> New >> Action: Start a program >> Program/script = File with disable script saved with suffix ".cmd"
E. Conditions >> None
F. Settings >> As required
Note: Click here for remarks on "Trigger" set up.
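
The two tasks above could probably also be registered from an elevated command prompt with schtasks instead of the GUI. The sketch below only builds the schtasks argument lists and does not run anything. Several things in it are assumptions and not from the original post: the script paths C:\Scripts\gpu-enable.cmd and C:\Scripts\gpu-disable.cmd, the task names, running the tasks as SYSTEM, and the spelling User32 for the event provider in the XPath query (the GUI shows the source as USER32). Treat the event query as a starting point, not a verified recipe.

```python
from typing import List

# Hypothetical locations of the two .cmd files described above.
ENABLE_SCRIPT = r"C:\Scripts\gpu-enable.cmd"
DISABLE_SCRIPT = r"C:\Scripts\gpu-disable.cmd"

# Event filter for the System log: Event ID 1074 is written when a
# shutdown or restart is requested. The provider spelling is an assumption.
SHUTDOWN_QUERY = "*[System[Provider[@Name='User32'] and EventID=1074]]"


def startup_task_args(task_name: str, script: str = ENABLE_SCRIPT) -> List[str]:
    """schtasks arguments for a task triggered when the computer starts."""
    return ["schtasks", "/Create", "/TN", task_name, "/TR", script,
            "/SC", "ONSTART", "/RU", "SYSTEM"]


def shutdown_task_args(task_name: str, script: str = DISABLE_SCRIPT) -> List[str]:
    """schtasks arguments for a task triggered by System log event 1074."""
    return ["schtasks", "/Create", "/TN", task_name, "/TR", script,
            "/SC", "ONEVENT", "/EC", "System", "/MO", SHUTDOWN_QUERY,
            "/RU", "SYSTEM"]
```

Each list can be handed to subprocess.run(args, check=True) from an elevated session in the guest. Whether the disable script finishes before Windows actually goes down is not guaranteed by the trigger alone, which is presumably why the task settings above needed some experimenting.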
 

Jacob Staub

A quick follow up on my GPU passthrough experience:

Quadro 2000 audio was successfully passed through to a Windows 7 VM. The process amounted to passing through Q2000 audio first. Once Q2000 audio was working Q2000 video was passed through. VM start time with Audio/Video passed through takes so long it seems like the system is frozen on startup. Audio quality was below average with scratchy delay that seemed to come and go based on VM activity levels. And despite scripting the enabling and disabling of the Q2000 audio and video devices, shutting down/restarting the VM reliably crashed the ESXi host.

Out of curiosity an attempt was made to pass through an available, "Intel Corporation C600/X79 series chipset High Definition Audio Controller." The Intel Hi-Def device passed through successfully to the VM. Audio quality was still below average with a hint of scratchy delay based on VM activity level. However, for Lo-Fi purposes the audio quality suffices. No enable/disable script of the Intel Hi-Def device was required to produce stable shut down/restart behavior.

So, it appears audio (at least built-in audio) can be passed through to a VM reliably so long as the audio is not part of the GPU. I have not tested an add-on PCIe audio card, but it ought to work since the flow of information is over the same bus. That being said, I have a feeling the audio quality wouldn't be all that great, but the only way to find out for sure is to try it.

Happy hunting,
Jake
 

nk215

Active Member
Oct 6, 2015
314
92
28
46
ESXi 5.5 has none of these issues, right? I am on ESXi 5.5U1 and have no problems with PCI passthrough. In fact, I use PCI passthrough to log onto my guest at the ESXi console without any issue.
 

epicurean

Active Member
Sep 29, 2014
635
32
28
Hi Jacob,
How exactly did you do 1, 2 and 3? I have AMD 6450 cards in my Windows VMs, and I am getting the same ESXi freeze every time I try to shut down or restart the VMs.
 

Jacob Staub

Hello epicurean,

Before implementing my version of a solution I recommend following the instructions given by "mvrk" in post number four of the following link.

mvrk's solution is more elegant and easier to implement. If for some reason mvrk's solution doesn't work for you I'll provide further assistance on my version.

Please let the forum know how it goes.

Click here if you'd like to read a little more about the development experience I had on my way to passing through a GPU. The capability does work even if the process is anything but smooth. My server is now passing through 2 x NVIDIA 2000 GPUs to two Windows 7 VMs without much perceptible trouble.

Regards,
Jake
 

nk215

Active Member
Oct 6, 2015
314
92
28
46
I just tested a Quadro 2000 with a test ESXi 6U2 setup. GPU pass through works great with Win7 guest. NO issue.