VMware ESXi - Ubuntu Server 20.04.3 with RTX 3070 - GPU Passthrough broken

ksnell

New Member
Feb 8, 2021
I have an EPYC/Supermicro server build (specs below) running ESXi 7.0 U2, and I have been using GPU passthrough to provide an RTX 3070 to my Ubuntu Server VM. This worked flawlessly for about 6 months until a few days ago, when the GPU suddenly stopped being recognized by the Nvidia driver in the VM. When I run `nvidia-smi` I get `No devices were found`. The drivers are certainly installed and Ubuntu can "see" the hardware (confirmed via `lspci | grep -i nv`), so I am not sure why it is not recognized. For some reason this only affects Linux guests; I can pass the device through to a Windows 10 VM just fine.
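For anyone trying to reproduce this, here is a minimal sketch of a guest-side check that goes one step further than `lspci | grep -i nv`: it reads `lspci -nnk` output and reports whether an NVIDIA display device is visible and whether the `nvidia` kernel driver is bound to it. The function name `classify_gpu_state` and the three result labels are made up for illustration, and the matching assumes the usual `lspci -k` line format ("Kernel driver in use: ...").

```shell
#!/bin/sh
# classify_gpu_state (hypothetical helper): reads `lspci -nnk` text on stdin
# and prints one of: not-visible, visible-unbound, visible-bound.
classify_gpu_state() {
    awk '
        # Device header line for an NVIDIA display controller.
        /NVIDIA/ && /VGA|3D controller/ { found = 1; in_dev = 1; next }
        # Any other device header (starts with a hex bus address) ends the block.
        /^[0-9a-f]/ { in_dev = 0 }
        # Indented detail line belonging to the NVIDIA display device.
        in_dev && /Kernel driver in use: nvidia/ { bound = 1 }
        END {
            if (!found)      print "not-visible"
            else if (!bound) print "visible-unbound"
            else             print "visible-bound"
        }'
}

# Usage inside the guest:
#   lspci -nnk | classify_gpu_state
```

If this prints `visible-bound` while `nvidia-smi` still reports `No devices were found`, the driver most likely loaded but failed to initialize the card, and the kernel log (`dmesg | grep -i nvrm`) is probably the next place to look.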

I'm wondering if anyone has run into similar issues recently or has any tips, as I am running out of ideas. See my notes below on what I have tried. This appears to be an ESXi issue, or possibly an interaction between the Supermicro BIOS and ESXi, but I cannot explain why it started because I don't recall applying any updates.

What I have tried:
  • Windows 10 VM with GPU passthrough WORKS.
  • Ubuntu Server 18.04 with GPU passthrough fails with the same issue.
  • Ubuntu Desktop 20.10 with GPU passthrough fails with the same issue.
  • I have tried Nvidia drivers 470, 495, and 510 with the same results.
  • Bare metal Ubuntu Server 20.04 install WORKS.
  • Bare metal Proxmox install with Ubuntu Server 20.04 VM WORKS. I can successfully use GPU passthrough with Proxmox, so this appears to be ESXi related (or motherboard/ESXi related?).
  • I do not have issues (as I have seen some people mention) enabling the card for passthrough in the ESXi web UI. It remains active and persists through reboots without issue.
  • Fresh installs of ESXi 7.0 U2 and 7.0 U3c did not help.

Hardware:
  • CPU: AMD EPYC 7252
  • Motherboard: Supermicro H12SSL-CT
  • RAM: 128GB ECC 3200MHz
  • SSD: 2TB Samsung M.2 NVMe
  • GPU: Nvidia RTX 3070 FE
Software:
  • Host: ESXi 7.0 U2
  • Guest: Ubuntu Server 20.04.3 LTS

I have also created another post with some CLI output from the VM:

https://forums.developer.nvidia.com/t/nvidia-smi-no-devices-were-found-vmware-esxi-ubuntu-server-20-04-03-with-rtx3070/202904
 

superempie

Member
Sep 25, 2015
The Netherlands
Have you tried this, in this specific order:
1) Install Ubuntu VM with all updates (and openssh-server just in case)
2) Shutdown VM
3) Set advanced configuration parameters:
3a) hypervisor.cpuid.v0 to FALSE
3b) svga.present to FALSE
4) Attach GPU and Audio Controller to VM
5) Boot up VM
6) Check lspci
7) Install driver; for me only 470 works. 495 also gives me no devices found. Didn't try 510 yet.
8) Reboot
9) Check nvidia-smi
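
As a rough sketch of how step 3 could be double-checked, the function below looks for the two advanced parameters in a VM's .vmx file and reports whether each is set to FALSE. The name `check_vmx` is hypothetical, and the sketch assumes the parameters are stored in the usual `key = "VALUE"` form; the path in the usage comment is a placeholder for your own datastore and VM name.

```shell
#!/bin/sh
# check_vmx FILE (hypothetical helper): verify that the advanced parameters
# from step 3 are present in a .vmx file and set to FALSE.
check_vmx() {
    vmx=$1
    status=0
    for key in hypervisor.cpuid.v0 svga.present; do
        # Escape the dots so they match literally in the regex.
        pat=$(printf '%s' "$key" | sed 's/\./\\./g')
        # .vmx entries are assumed to look like:  key = "VALUE"
        if grep -i -q -E "^[[:space:]]*${pat}[[:space:]]*=[[:space:]]*\"FALSE\"" "$vmx"; then
            echo "$key: ok"
        else
            echo "$key: missing or not FALSE"
            status=1
        fi
    done
    return $status
}

# Usage (placeholder path, adjust datastore and VM name):
#   check_vmx /vmfs/volumes/<datastore>/<vm-name>/<vm-name>.vmx
```

The function returns non-zero if either parameter is missing or has another value, so it can be used to confirm the settings before booting the VM in step 5.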
 

ksnell

I tried that, but I set the advanced configuration parameters from the start. Not sure if that has an impact. I was able to get a Debian VM going yesterday, though. I used their `nvidia-driver` package, which uses driver 460.91.03.

I may test changing the order as you mentioned but I am happy to have things running again.

EDIT: I tested an Ubuntu Server VM with the configuration order you mentioned and unfortunately did not have success. At least I am running on Debian. Appreciate the feedback!
 