I have an EPYC/Supermicro server build (specs below) running ESXi 7.0 U2 and have been using GPU passthrough to provide an RTX 3070 to my Ubuntu Server VM. This worked flawlessly for about 6 months until a few days ago, when the Nvidia driver in the VM suddenly stopped recognizing the GPU. When I run `nvidia-smi` I get `No devices were found`. The drivers are definitely installed and Ubuntu can "see" the hardware (confirmed via `lspci | grep -i nv`), so I am not sure why the card is not recognized. For some reason this only affects Linux guests; I can pass the device through to a Windows 10 VM just fine.
I'm wondering if anyone has run into similar issues recently or has any tips, as I am running out of ideas. My notes on what I have tried are below. This appears to be an ESXi issue (or possibly a Supermicro BIOS/ESXi interaction), but I cannot explain why it started, because I don't recall pushing any updates.
What I have tried:
- Windows 10 VM with GPU passthrough WORKS.
- Ubuntu Server 18.04 with GPU passthrough fails with the same issue.
- Ubuntu Desktop 20.10 with GPU passthrough fails with the same issue.
- I have tried Nvidia drivers 470, 495, and 510 with the same results.
- Bare metal Ubuntu Server 20.04 install WORKS.
- Bare metal Proxmox install with an Ubuntu Server 20.04 VM WORKS. GPU passthrough succeeds under Proxmox, so this appears to be ESXi related (or motherboard/ESXi related?).
- I do not have issues (like I have seen some mention) enabling the card for passthrough in the ESXi web UI. It remains active and persists through reboots.
- Fresh installs of ESXi 7.0 U2 and 7.0 U3c did not help.
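In case it helps anyone compare against their own setup, here is a rough sketch of the guest-side checks behind the notes above. It is only a sketch: `bound_driver` is a helper name I made up for illustration, and it assumes `lspci` (pciutils) and `awk` are available in the VM.

```shell
# Read `lspci -k` output on stdin and print the kernel driver bound to each device.
# (bound_driver is a made-up helper name, not part of any tool.)
bound_driver() {
    awk -F': ' '/Kernel driver in use/ { print $2 }'
}

# Usage inside the guest (10de is NVIDIA's PCI vendor ID):
#   lspci -k -d 10de: | bound_driver
#   sudo dmesg | grep -i -E 'nvrm|nvidia'
```

If the first command prints `nouveau` or nothing at all instead of `nvidia`, the Nvidia kernel module probably never bound to the card. The `dmesg` filter should surface whatever the driver logged while trying to initialize.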
Hardware:
- CPU: AMD EPYC 7252
- Motherboard: Supermicro H12SSL-CT
- RAM: 128GB ECC 3200MHz
- SSD: 2TB Samsung M.2 NVMe
- GPU: Nvidia RTX 3070 FE
- Host: ESXi 7.0 U2
- Guest: Ubuntu Server 20.04.3 LTS
I have also created another post with some CLI output from the VM:
https://forums.developer.nvidia.com/t/nvidia-smi-no-devices-were-found-vmware-esxi-ubuntu-server-20-04-03-with-rtx3070/202904