Gaming instability ESXi 6.7u3b, TR 1950x


PsyberFyre

New Member
Jan 12, 2020
In lieu of purchasing a gaming PC for the TV in my den I decided to just spin up a Win 10 VM on my 1950X ESXi host and run extended active HDMI & USB 3.0 cables to it (about 5 meters). I passed through a GTX 1080ti and a dedicated USB3 PCIe card with no issues. I easily hold 60fps playing Doom at 4K on ultra settings, but after 15-30 minutes of gameplay the game eventually freezes, requiring me to restart the guest. GPU/CPU temps are fine as both have their own AIO coolers. The VM has 6 vCPUs with low latency enabled. The case is chockablock with fans, so the VRM and all other mobo components stay plenty cool. Heck, the RAID controller even has its own pair of 90mm fans blowing on it (it liked to get toasty without them lol).

The only other VMs running are a domain controller, a file server and the vCenter virtual appliance, none of which are doing much of anything. The gaming VM never hits 100% CPU usage.

My assumption is that I may be coming up short in the power department. Since I want to add another GPU for a GPU-accelerated terminal server I have ordered a 1200 watt 80+ platinum PSU, but I'm wondering if there might be something else I'm missing or should check in the meantime?

Hardware:
  • AMD Threadripper 1950x
  • 64GB Corsair Vengeance LPX 3200 DDR4
  • Drained, cleaned and refilled Enermax Liqtech II TR4 360
  • EVGA GTX 1080ti FTW 3 hybrid
  • Adaptec 72405 24 channel RAID controller
  • Intel X520 10Gb NIC SFP+
  • 3x NVMe SSDs (standalone), 4x SATA SSDs (RAID), 5x 4TB SATA 7.2k HDDs (RAID), 2x 2TB SATA 7.2k HDDs (RAID)
  • 850 watt 80+ gold PSU
Software:
  • ESXi 6.7 Update 3b
  • vCenter 6.7 Update 3b OVA
  • Windows 10 1909 VM


(Note that the 2nd GPU is not actually in the case at the moment; that was a test fit to make sure everything would actually work. It's not going into production until I get the new PSU.)
 

mbosma

Member
Dec 4, 2018
First off, I want to say that you have an awesome setup!

Have you tried installing Windows on a separate hard disk and running games on bare metal?
That would rule out the virtualization stack and let you focus on the hardware alone.

Maybe VMware is not playing nice.
I've been using Proxmox to power gaming VMs and have had no problems with VMs freezing so far (though plenty of Code 43 driver issues on Nvidia).
Virtualization always adds more variables to the equation when troubleshooting.

Your 850w PSU should be plenty for one GPU; heck, even two could work depending on their power draw and the rest of the system.
In your case, though, I think 850w might be a bit on the edge when you stress every component at once.
 

PsyberFyre

New Member
Jan 12, 2020
Thanks! It's literally a patchwork quilt of parts that I've collected through the years.

I gamed on this hardware back when it was a Hyper-V host with no problems. Back then it didn't have the RAID card or all the storage, though; it was just the 1080ti, a couple of NVMe drives and that was really it. Since then I've added all the storage, 10Gb networking, etc.

Using a PSU calculator I'm getting around 900 watts with JUST the single card... though this assumes the CPU is running at 75%, which it never gets close to. (AMD Ryzen Threadripper 1950X / NVIDIA GeForce GTX 1080 Ti - PSU Calculator - Build hRk7qY)

The storage for the gaming VM is all VMFS6 VMDKs. The OS VMDK is on a single TLC NVMe SSD; the Steam library is on a RAID 10 of 4x 7200RPM 4TB HDDs. I will keep an eye on disk latency in vCenter while I'm gaming to see if anything shows up there.

I've also been considering moving to Proxmox. I actually want to use it for my 9900k desktop workstation as a dual-head machine with OSX and Windows 10, but I can't for the life of me get my 5700XT successfully passed through to the OSX VM. I'll get there eventually.
 

besterino

New Member
Apr 22, 2017
I had a TR build running for a couple of months without issues: a Seasonic 850W Titanium PSU and a 2080Ti on an X399D8A with a 1920X, though apart from a USB3 host card there were no further add-in devices.
What's your mainboard?

I recently tinkered with two TitanXp cards in that rig; it only ran for a couple of days, but again no crashes.

Grasping at straws: have you checked the integrity of your system files in the VM with sfc /scannow? I'd recommend doing that after crashes, in particular repeated ones.
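
For reference, both of these run from an elevated Command Prompt in the guest; the DISM line is the standard follow-up if sfc reports files it can't repair:

  sfc /scannow
  DISM /Online /Cleanup-Image /RestoreHealth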

Does this occur with all games or just the one? And which one(s)?
 

mbosma

Member
Dec 4, 2018
I've read on some forums that the Outervision calculator is not very accurate. Still, I wouldn't risk it if I were you, especially since you plan to add another GPU anyway.
In the cases I've seen regarding power issues, the host simply reboots due to a lack of power; I've never seen instability confined to a single VM before.
Do you have any other VMs running on that host as well?

Does the game freeze at a specific time or on a loading screen?
I tested a scenario where a few VMs ran from a ZFS pool on a single SSD. Games would freeze for a while during loading and I'd see I/O wait times of nearly 100%.
A few hiccups occurred during some terrain transitions in games like The Witcher 3, but no crashes due to the slow storage.

I love Proxmox for its versatility.
I can't say I have experience with running AMD cards or running OSX in a VM; that's something I'd definitely like to try some time.

Have you tried passing through the other card (I'm assuming it's a 5700XT) instead of the 1080ti to see whether the problem follows the hardware or the drivers?
 

PsyberFyre

New Member
Jan 12, 2020
I, too, would expect a host-wide event during a low-power situation. The only thing I was thinking is that the GPU may have dipped a little low on available power, causing a hiccup that crashed the API and the game along with it. The new PSU will be here tomorrow, so we'll soon know!

The 5700xt is currently in my 9900k hackintosh workstation, so it's stuck where it is lol.

There's no logic to when it freezes... though it's usually right around 15-20 minutes into the game. It never makes it to a half hour, but it's not early, either. It also doesn't seem linked to how busy the screen is: it's happened on a stage-clear screen, a loading screen, just walking around, etc. Oddly enough it hasn't happened during a big fight with a lot going on.

I also suspect the passed-through USB card. I bought a dual-controller Sonnet Allegro USB 3.2 PCIe card (2x ASM2142 controllers) and passed 1 of the 2 controllers through to the VM; Windows insta-crashed on bootup. I put in a single-controller card (also an ASM2142), passed it through, and it seems OK. I've read of other people having issues with the multi-controller cards on Windows 10 as well. Not sure what the actual root cause is, though.

There is a domain controller, a file server and the vCenter OVA also running at the same time. They are low-CPU and none use the same storage as the gaming PC. I am going to set up a disk latency trace and play a few rounds of Doom to see if that's the issue, assuming the new PSU doesn't fix it.
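
For the latency check, esxtop on the ESXi host is the quickest way to watch it live (this is the standard procedure, nothing specific to this box):

  esxtop
  # press 'u' for the per-device disk view ('d' for adapters, 'v' for per-VM)
  # watch DAVG/cmd (device latency, ms) and KAVG/cmd (kernel latency, ms)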
 

IamSpartacus

Well-Known Member
Mar 14, 2016
I recently ran Unraid on my 1950x with a gaming VM using a passed-through 1080Ti and an NVMe controller. What I found to work best to ensure little to no performance hit was to isolate one entire NUMA die (the one your GPU is connected to, which you can confirm with lstopo). This way the rest of the system is not using that NUMA node at all.
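
A rough sketch of the same idea, with the node number as a placeholder (check your own topology first). On a Linux-based host like Unraid, lstopo from the hwloc package shows which NUMA node each PCIe device hangs off:

  lstopo-no-graphics    # text tree; PCIe devices appear under the NUMA node they attach to

On ESXi the nearest equivalent is the numa.nodeAffinity advanced VM setting; e.g. to keep the VM on node 0 (use whichever node the GPU is attached to):

  numa.nodeAffinity = "0"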
 

besterino

New Member
Apr 22, 2017
USB indeed seems to be a bit tricky. I only have experience with NEC/Renesas chipsets (I avoid ASMedia where I can); I have three of those in various boxes so far.

I've just now ordered a 4-in-one StarTech PEXUSB3S44V, again NEC/Renesas. It would be great to have that power up to 4 VMs while sacrificing only one slot.
 

PsyberFyre

New Member
Jan 12, 2020
IamSpartacus said:
I recently ran Unraid on my 1950x with a gaming VM using a passed-through 1080Ti and an NVMe controller. What I found to work best to ensure little to no performance hit was to isolate one entire NUMA die (the one your GPU is connected to, which you can confirm with lstopo). This way the rest of the system is not using that NUMA node at all.
I hadn't thought about that. I'm sure game designers never imagined their games being played on this topology. Gonna give this a try!
 

PsyberFyre

New Member
Jan 12, 2020
Well, it was worth a shot. I limited the VM to the same NUMA node/die that carries the GPU. Same result: it crashed after about 15 minutes of playtime. I ran a perfmon trace during the game and never saw any real storage latency, even from the spinning HDD array.

Hoping for better luck with the new PSU tomorrow.
 

IamSpartacus

Well-Known Member
Mar 14, 2016
PsyberFyre said:
The fact that I've wasted the last half-hour playing Doom makes me feel like I figured it out. Check out this article from Nvidia. I added this to the VM .vmx and so far it's been rock solid.

VMware vDGA / GPU Passthrough Requires That MSI is Disabled on VMs | NVIDIA
Nice find, I forgot about having to do that as well.

Question related to your setup. Did you have to do anything special to enable hardware passthrough for your GPU? I've never done so with a consumer GPU in ESXi so just curious.
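
For reference, the fix that article describes is a per-device line in the VM's .vmx. The pciPassthru0 index assumes the GPU is the first passthrough device on the VM, so adjust it to match:

  pciPassthru0.msiEnabled = "FALSE"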
 

PsyberFyre

New Member
Jan 12, 2020
IamSpartacus said:
Nice find, I forgot about having to do that as well.

Question related to your setup. Did you have to do anything special to enable hardware passthrough for your GPU? I've never done so with a consumer GPU in ESXi so just curious.
The main thing for Windows clients is to add 'hypervisor.cpuid.v0 = "FALSE"' to the .vmx. Otherwise the GeForce drivers detect the hypervisor and refuse to work. Nvidia wants you to use Quadros for passthrough, of course.
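
Putting that together with the MSI fix above, a minimal .vmx sketch for a consumer GeForce passthrough looks like this (the pciPassthru0 index is an assumption; match it to your device):

  hypervisor.cpuid.v0 = "FALSE"
  pciPassthru0.msiEnabled = "FALSE"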
 

besterino

New Member
Apr 22, 2017
Note: disabling MSI is no longer required for 2080Ti cards (possibly also other Turing-based GPUs).
Sometimes you also have to tinker with passthru.map.

All those settings depend on the platform (CPU/chipset) as well as the GPU, and the specific mainboard/BIOS may have an impact too.
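
For anyone who hasn't edited it before: passthru.map lives at /etc/vmware/passthru.map on the ESXi host, one line per vendor/device ID pair. A commonly suggested entry for NVIDIA consumer cards looks like the following (a starting point, not a guaranteed fix; reboot the host after changing it):

  # vendor-id  device-id  resetMethod  fptShareable
  10de         ffff       bridge       false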
 

mbosma

Member
Dec 4, 2018
Have you ever had a Code 43 error using VMware with 'hypervisor.cpuid.v0 = "FALSE"' in the .vmx file?
I've had varying results using Proxmox and consumer-grade GPUs; VMware might be a good alternative if this works.
 

besterino

New Member
Apr 22, 2017
@mbosma: yes, Error 43 (or a total crash of the VM at boot) is the usual behaviour if one of the settings (an advanced setting like MSI, or an entry in passthru.map) is not correct.

From my experience, passthru.map and hypervisor.cpuid.v0=false primarily determine whether the VM boots (and reboots) without Error 43, while the advanced settings determine stability.
 

PsyberFyre

New Member
Jan 12, 2020
I had serious boot issues when trying to pass through 1 of the 2 controllers on an Allegro USB 3.2 4-port card, and I've read of others having the same experience. In theory the 2 controllers show up as 2 separate PCIe devices and you can pass one through to one VM and the other to a second VM. In practice that doesn't seem to work so well. Apparently the now-discontinued "Pro" model with 4 USB-A ports and 4 controllers worked, but it's no longer available.

Last night I put a light OC on the 1080ti and my roommate gamed on the VM into the wee hours, installed almost a TB of Steam games, and left me a text saying it was rock solid.

Today will be adventure #2 when the new PSU arrives, as I'll be spinning up a new terminal server VM with a GTX 1080 for GIS work.
 

PsyberFyre

New Member
Jan 12, 2020
Got the new PSU and 2nd GPU in, and everything works like a charm! I pinned the terminal server VM to the 2nd NUMA node since it's carrying the 2nd GPU, which should prevent resource starvation between the terminal server and the gaming PC.

I have to give credit to Linus Tech Tips for giving me the idea to try this. This HEDT stuff is pricey, but when you take into account that I've effectively replaced a gaming PC, a workstation and a server with a single homelab box, it actually makes financial sense. And it's a great way for me to reuse random parts that I've collected over the years lol.
 

besterino

New Member
Apr 22, 2017
Tell me about it. I've been tinkering with this for years. I currently game in a VM on a 3960X with a 2080Ti passed through, while a friend streams in parallel from another VM with a TitanXp passed through on the same rig. It's just amazing what you can do even with totally unsupported consumer hardware... ;)

Will run a few experiments with the 4x USB card. We'll see... :)