Troubleshooting GPU passthrough ESXi 6.5


Tasik008

New Member
Oct 3, 2018

Attachments

das1996

Member
Sep 4, 2018
Instead of link, use d3d0.

I added these lines to the end of the passthru.map file:
-------------
10de 128b d3d0 false
10de 0e0f d3d0 false
--------------------

Adjust the device-ID portion as per your specific adapter.
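
If you're not sure which IDs your card uses, the ESXi host shell will show them (a minimal sketch - the 01:00 address is just an example, adjust for wherever your card sits):
Code:
lspci -n | grep 01:00
# example output - the vendor:device pair follows the class code:
# 0000:01:00.0 Class 0300: 10de:128b   (the GPU)
# 0000:01:00.1 Class 0403: 10de:0e0f   (its HD audio function)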


Tasik008

New Member
Oct 3, 2018
das1996 said:
Instead of link, use d3d0.

I added these lines to the end of the passthru.map file:
-------------
10de 128b d3d0 false
10de 0e0f d3d0 false
--------------------

Adjust the device-ID portion as per your specific adapter.
I added the lines with the device IDs that you pointed out. The line 10de ffff link false in my screenshot was added after an unsuccessful installation of the NVIDIA driver in the virtual machine, for verification.
Could the error be due to not being able to enable nested virtualization in the virtual machine settings?

das1996

Member
Sep 4, 2018
Link is a different reset method than d3d0. I couldn't get the video card to work with link, but it did work with d3d0. See the commented portion of the passthru.map file for an explanation of the different reset methods.

Also, it goes without saying that an ESXi reboot is required after any changes to passthru.map.
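
For reference, entries in that file follow the format its comment header describes - vendor ID, device ID, reset method, fptShareable flag (paraphrased from memory, so double-check your own copy):
Code:
# /etc/vmware/passthru.map
# format: <vendor-id> <device-id> <reset-method> <fptShareable>
# reset methods include: flr, d3d0, link, bridge, default
10de 128b d3d0 false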

Tasik008

New Member
Oct 3, 2018
das1996 said:
Link is a different reset method than d3d0. I couldn't get the video card to work with link, but it did work with d3d0. See the commented portion of the passthru.map file for an explanation of the different reset methods.

Also, it goes without saying that an ESXi reboot is required after any changes to passthru.map.

Now the passthru.map file looks like the screenshot. After applying this configuration I rebooted the server, per your advice, but the driver still won't install in the virtual machine. I will try to google the various reset methods.

Attachments

das1996

Member
Sep 4, 2018
Maybe I missed it, but are you also passing the NVIDIA HD audio as a separate PCI device? If not, that may be the issue. I didn't see it in your screenshots above, just the video portion in the VM config.
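
The audio function normally sits at function 1 of the same PCI address as the GPU, so both have to be toggled for passthrough and added to the VM - roughly like this (a sketch; the address and names are examples):
Code:
lspci | grep -i nvidia
# 0000:01:00.0 VGA compatible controller: NVIDIA ...   <- the GPU
# 0000:01:00.1 Audio device: NVIDIA ...                <- pass this one too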

gilsas

New Member
Oct 4, 2018
I am having a similar issue, though I am not interested in getting HDMI out; I "just" want to pass through an NVIDIA K420 to an Ubuntu VM so that I can do a bit of GPU-accelerated TensorFlow from within the VM (and later use it for model serving). My setup is a Dell T20 (Xeon) running ESXi 6.7.
I have been trying hard to make it work for about two months now with no luck, so I thought I'd come ask for suggestions.

I believe I tried everything I found here and elsewhere: the device is passed to the VM and seen from within (lspci), but the driver cannot communicate with the hardware after install (NVRM: RmInitAdapter failed in syslog).
I tried different versions of Ubuntu and did everything obvious (reserved all memory, set hypervisor.cpuid.v0 to false, etc.) as well as tweaking the passthru.map, but no luck. I read that NVIDIA is not enabling the low-end Quadro GPUs for virtualization, but hiding the hypervisor from the guest OS should work in pretty much the same way others make it work with GeForce cards, no?

Would greatly appreciate any thoughts / suggestions as to what I should try next :)
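
For anyone collecting them, these are the .vmx tweaks that come up in this thread (a sketch only - none of them is a guaranteed fix, and the pciHole pair is only said to matter with more than 2 GB of guest RAM):
Code:
hypervisor.cpuid.v0 = "FALSE"        # hide the hypervisor from the guest
pciPassthru0.msiEnabled = "FALSE"    # disable MSI for the passed-through device
pciHole.start = "1200"
pciHole.end = "2200"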

Tasik008

New Member
Oct 3, 2018
Today I managed to successfully pass an AMD Radeon RX 560 graphics card through to a virtual machine; the driver installed from the disc. But the driver from the official AMD site leads to a blue screen. Strange.
Now I am trying to figure out how to connect a client computer's monitor to this video card. If I simply disable the integrated SVGA graphics, nothing happens (except that the screen resolution becomes 1024x768).
Any thoughts?
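
If disabling the adapter inside the guest isn't enough, there is a vmx-level switch that removes the virtual SVGA device entirely (a sketch, assuming your build honors it - the VM console in the ESXi client will then show nothing, so make sure passthrough output works first):
Code:
svga.present = "FALSE"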

Dravor

New Member
Aug 17, 2015
All,

Am running an R720 with an NVIDIA P2000. It's running perfectly, until sometimes I try to reboot the VM and it hangs. Any attempt to reset the VM results in ESXi crashing. I ended up having to do a full reboot. It looks to me like the VM/ESXi cannot reset the video card?

I saw that adding this to /etc/vmware/passthru.map helps for ATI cards:

1002 ffff link false

Has anyone seen this with 6.5 and NVIDIA cards?

Thanks!
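
Going by das1996's posts earlier in the thread, the NVIDIA catch-all equivalent would presumably use the d3d0 reset method rather than link (a sketch, untested on a P2000):
Code:
10de ffff d3d0 false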

marcoi

Well-Known Member
Apr 6, 2013
Dravor said:
All,

Am running an R720 with an NVIDIA P2000. It's running perfectly, until sometimes I try to reboot the VM and it hangs. Any attempt to reset the VM results in ESXi crashing. I ended up having to do a full reboot. It looks to me like the VM/ESXi cannot reset the video card?

I saw that adding this to /etc/vmware/passthru.map helps for ATI cards:

1002 ffff link false

Has anyone seen this with 6.5 and NVIDIA cards?

Thanks!
I recall a similar situation when I was running a video card under ESXi 6.5. I think if you restart the VM you might be OK (doesn't seem to be the case for you), but if you shut it down and start it again, that is when you're probably going to need to restart the whole server to re-init the video card. It may be the reverse as well, i.e. shutdown is OK and reboot is bad. The best thing would be to experiment until you find out what each scenario does, e.g.:
1. Reboot VM - result: ?
2. Shut down VM - result: ?
etc.

I think it's just part of the experience, unfortunately.

oakshade

New Member
Oct 15, 2018
I came to this thread looking for help passing through an AMD R7 360 to a Windows 10 64-bit VM on vSphere 6.0u3.
The box is a Dell R5500, which was close to the last of Dell's rackmount GPU-installable workstations. It has five PCIe x16 slots (physical and wired) and one x8 (I think). For the most part, it is a GPU server.

Some of what you (marcoi) posted, such as reserving all memory and setting the CPU to 'Expose hardware assisted virtualization', helped get past some of the issues. I had upgraded ESXi from 5.5u1 to 6.0u3. My inventory included a series of older (version 8) hardware version VMs, including the one I wanted to use for passthrough with the new GPU card (R7 360). This machine had worked without any problem before (on vSphere 5.5u1, hardware version 8) passing through an AMD FirePro, so I figured it would be a snap.

Initially, after the 6.0u3 upgrade, I updated the hardware version (from version 8 to version 11) on the passthrough VM, thinking that should overcome most problems I might encounter (more compatible); however, nothing I did would allow it to run right. It would mostly crash the VM when it tried to load the drivers (same problem others had). I worked on it for a solid week attempting to overcome whatever might be misconfigured.

After adding the hypervisor.cpuid.v0 = "FALSE" parameter, the VM would finally sometimes allow the AMD drivers to be installed and not crash. It would run like normal; however, the next time I started it, the drivers would not load. Instead the card would be disabled by error code 43 (can't start) in Windows. It would take an ESXi reboot to get past that again.

Finally I solved it.

I figured out that the problem is the VM hardware version. vSphere changed between versions 5.5 and 6.0, and it absolutely does not support passing through this card in hardware version 11. I rebuilt the VM at version 10, and all is well even running under 6.0u3. The problems are gone; it loads and runs fine. I have no added configuration parameters in the .vmx; it just works like normal.
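
A quick way to see what hardware version a VM is actually at is the virtualHW.version key in its .vmx (a sketch - the datastore path is just an example; there's no supported way to edit the value downward in place, hence the rebuild):
Code:
grep virtualHW.version /vmfs/volumes/datastore1/myvm/myvm.vmx
# virtualHW.version = "11"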

marcoi said:
@vinay - First thing is to make sure the setup of ESXi and the VM is good. The ATI video card should be put into passthrough mode in ESXi; you shouldn't need to edit any ESXi files since AMD doesn't lock their cards. You need to make sure to get all the devices used by your video card, usually two of them. Here is a sample of my NVIDIA card:
View attachment 5100

You will need to reboot the server at this point to make sure the card is in passthrough.
Next, follow these steps (you can refer to the original post with pics on page one).

You may also need to add the following to the VM's .vmx file if your VM has more than 2 GB of RAM:
pciHole.start = "1200"
pciHole.end = "2200"

Also, you need to make the video card the primary display in the BIOS, else it doesn't seem to init right when you pass it to the VM. This means that as ESXi boots, the screen will stop updating and look like it hung, but it should still be booting up in the background. Once you power on your VM, your monitor should show the VM display.

If that isn't happening, then something is configured incorrectly.

Hope this helps.
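
One way to confirm the card really is in passthrough after the reboot is from the host shell (a sketch - the field names are from memory and may differ slightly between builds):
Code:
esxcli hardware pci list | less
# find your GPU's entry and check fields such as "Current Owner"
# (should show passthru) and "Reset Method" against passthru.map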

Dravor

New Member
Aug 17, 2015
marcoi said:
I recall a similar situation when I was running a video card under ESXi 6.5. I think if you restart the VM you might be OK (doesn't seem to be the case for you), but if you shut it down and start it again, that is when you're probably going to need to restart the whole server to re-init the video card. It may be the reverse as well, i.e. shutdown is OK and reboot is bad. The best thing would be to experiment until you find out what each scenario does, e.g.:
1. Reboot VM - result: ?
2. Shut down VM - result: ?
etc.

I think it's just part of the experience, unfortunately.

That's a good point, and actually, surprisingly, the other day I think it did shut down without issue. I'll do some more testing in that regard.

Thanks!

oakshade

New Member
Oct 15, 2018
gilsas said:
I am having a similar issue, though I am not interested in getting HDMI out; I "just" want to pass through an NVIDIA K420 to an Ubuntu VM so that I can do a bit of GPU-accelerated TensorFlow from within the VM (and later use it for model serving). My setup is a Dell T20 (Xeon) running ESXi 6.7.
I have been trying hard to make it work for about two months now with no luck, so I thought I'd come ask for suggestions.

I believe I tried everything I found here and elsewhere: the device is passed to the VM and seen from within (lspci), but the driver cannot communicate with the hardware after install (NVRM: RmInitAdapter failed in syslog).
I tried different versions of Ubuntu and did everything obvious (reserved all memory, set hypervisor.cpuid.v0 to false, etc.) as well as tweaking the passthru.map, but no luck. I read that NVIDIA is not enabling the low-end Quadro GPUs for virtualization, but hiding the hypervisor from the guest OS should work in pretty much the same way others make it work with GeForce cards, no?

Would greatly appreciate any thoughts / suggestions as to what I should try next :)

I solved my arduous AMD R7 360 passthrough adventure by setting the vSphere 6.0u3 hardware version to 10 from the native version 11. All problems disappeared.

smefa

New Member
Oct 23, 2018
Has anyone gotten passthrough working without setting the external GPU as primary?
I would really like to be able to keep the internal (Intel) graphics enabled as the primary display.

I got it working with a 1080 on ESXi 6.7/6.7u1 with the following additions. I had to disable the internal graphics and set the 1080 to primary; it won't work with the internal enabled as secondary.
Code:
hypervisor.cpuid.v0 = "FALSE"
pciPassthru0.msiEnabled = "FALSE"
SMBIOS.reflectHost = "TRUE"
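
If you prefer the host shell over the UI, the same lines can be appended with the VM powered off (a sketch - the datastore path is an example, and hostd may need a reload to notice direct edits):
Code:
# VM powered off first
cat >> /vmfs/volumes/datastore1/myvm/myvm.vmx << 'EOF'
hypervisor.cpuid.v0 = "FALSE"
pciPassthru0.msiEnabled = "FALSE"
SMBIOS.reflectHost = "TRUE"
EOF
vim-cmd vmsvc/getallvms        # find the VM id
vim-cmd vmsvc/reload <vmid>    # make hostd re-read the .vmx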

Docop

Member
Jul 19, 2016
Do you mean you set your 1080 as primary in the BIOS and passthrough works, but the onboard doesn't, right?
I've never been able to pass through the onboard; perhaps there's a way with ESXi 6.7u1.

smefa

New Member
Oct 23, 2018
Docop said:
Do you mean you set your 1080 as primary in the BIOS and passthrough works, but the onboard doesn't, right?
I've never been able to pass through the onboard; perhaps there's a way with ESXi 6.7u1.

Exactly - when I enable the internal, it breaks the external passthrough.

ksquared

New Member
Nov 16, 2018
I've been following the steps in here with interest, and I'm in the same state you were in here:
vbios failed with error -2.

How did you resolve this part?
I've double-checked that the file is in /lib/firmware/radeon/vbios.bin.
I even placed another copy in /radeon/, but I didn't think that was necessary.

I got my vbios.bin file by running this command on a bare-metal Linux install:
dd if=/dev/mem of=/boot/vbios.bin bs=65536 skip=12 count=1
(bs=65536 with skip=12 starts reading at offset 0xC0000, the legacy video BIOS shadow region; count=1 copies 64 KB of it.)
I also tried using a ROM file from GPU-Z on bare-metal Windows, renamed to vbios.bin.

Hopefully I won't have to recompile the kernel again; I was kind of surprised how long that took.
I know this post is a bit old, but I've been the beneficiary of random forum posts in the past and figured it's my turn to give back.

I was able to get this working after a bit of painful trial and error - here are the details of what I did, in case it's helpful for anyone.

My setup: ESXi 6.0, AMD Radeon 5450, Ubuntu 18.04, 2 monitors (1 HDMI and 1 DVI). I imagine similar steps will apply to ESXi 6.5, other AMD cards, and other versions of Linux (FWIW, I confirmed this worked on Ubuntu 16.04 too).

If you've never compiled a linux kernel before then consider this a good "learning opportunity".

1. Download kernel source and install relevant build utilities
In Ubuntu you can do something like this
Code:
$ apt-get source linux-image-$(uname -r)
$ sudo apt-get build-dep linux-image-$(uname -r)
$ sudo apt-get install libssl-dev
(obviously other distros may have their own analogous steps here - Google is your friend)

My setup extracted the source code into a "linux-4.15.0" folder (this name is used in subsequent steps)

2. You'll need a kernel .config file that informs the build system how to compile the kernel. In Ubuntu I just did
Code:
$ cp /boot/config-$(uname -r) linux-4.15.0/.config
to start with a working Ubuntu config. In the past on other distros I've also done 'make oldconfig' or 'make menuconfig' (from inside the linux kernel source directory), but in this case I found it simplest to just start with the one that Ubuntu uses.

3. Modify the .config file. You can type 'make menuconfig' and hunt down the relevant options, or just open .config in a text editor, and apply these updates:
Code:
# CONFIG_DEBUG_INFO is not set
CONFIG_EXTRA_FIRMWARE="radeon/vbios.bin"
CONFIG_EXTRA_FIRMWARE_DIR="/lib/firmware"
Unsetting DEBUG_INFO is optional - it will make the compile go significantly faster (and consume significantly less disk space). It seems that Ubuntu defaults this to on, which was annoying because it filled up my limited disk space on the first attempt.

The two EXTRA_FIRMWARE-related lines instruct the kernel build system to compile the video BIOS binary directly into the kernel. None of the other replies on this topic floating around the internet have ever mentioned doing this, so there may be a way to avoid it. Without these settings, I was stuck with the same "vbios error -2" during boot - for some reason the kernel was unable to access the filesystem during boot (so even though the vbios.bin file was in /lib/firmware/radeon, the kernel didn't see it). This issue may be an artifact of Ubuntu's full-disk encryption - I'm not an expert on the Linux boot process and don't intend to be :).

4. Apply the "patch" to linux-4.15.0/drivers/gpu/drm/radeon/radeon_bios.c
I would strongly suggest just typing these changes by hand since this file may have changed since the 'diff' floating around the internet, and hence the 'patch' utility may end up making a mess of things. If you don't know C, then consider it yet another learning opportunity. The patch amounts to 3 small changes:

A: add a new #include at the top with the others

Code:
#include <linux/firmware.h>
B: Add a radeon_read_bios_from_firmware function somewhere above the existing radeon_get_bios function (which will call it in step C). Here is a copy/paste of mine:

Code:
static bool radeon_read_bios_from_firmware(struct radeon_device *rdev)
{
  const uint8_t *bios;
  resource_size_t size;
  const struct firmware *fw = NULL;

  /* "radeon/vbios.bin" must match the CONFIG_EXTRA_FIRMWARE entry from step 3 */
  request_firmware(&fw, "radeon/vbios.bin", rdev->dev);
  if (!fw) {
    DRM_ERROR("No bios file\n");
    return false;
  }

  size = fw->size;
  bios = fw->data;

  if (!bios) {
    DRM_ERROR("Bios missing from bios file\n");
    release_firmware(fw);
    return false;
  }

  if (size == 0) {
    DRM_ERROR("Wrong sig - size is 0!\n");
    release_firmware(fw);
    return false;
  }

  /* a valid video BIOS image starts with the 0x55 0xAA option-ROM signature */
  if (bios[0] != 0x55 || bios[1] != 0xaa) {
    DRM_ERROR("Wrong sig - magic numbers are wrong!\n");
    release_firmware(fw);
    return false;
  }

  rdev->bios = kmalloc(size, GFP_KERNEL);
  if (rdev->bios == NULL) {
    DRM_ERROR("kmalloc failed\n");
    release_firmware(fw);
    return false;
  }

  /* keep a private copy so the firmware blob can be released */
  memcpy(rdev->bios, bios, size);
  release_firmware(fw);
  return true;
}
C. Call the function from B inside the radeon_get_bios function if everything else fails (for me the original last fallback was "radeon_read_platform_bios", so I added my call after that):
Code:
        if (r == false)
                r = radeon_read_bios_from_firmware(rdev);
5. Copy your vbios.bin file to /lib/firmware/radeon/vbios.bin.

6. Compile the kernel. There are a few different procedures for this. The Debian-specific scripts (recommended by Ubuntu and other places on the internet) seemed to be a huge rabbit-hole mess when I tried them, but this "old-school" method for Debian-based distros should work:
Code:
make -j8 bindeb-pkg LOCALVERSION=-radeon
The "-j8" will compile in parallel using 8 processors (adjust for your machine).
the LOCALVERSION variable will append this tag onto the final kernel so we can identify it as our patched version.

While it builds, you might as well confirm that the settings in step 3 worked - the build system should create a vbios-related object file from the one in /lib/firmware and put it in linux-4.15.0/firmware/radeon.

Hopefully it just completes successfully, but any errors are likely (hopefully?) just a typo from section 4.

7. Install the new kernel, and reboot
If everything went well, you should end up with a bunch of *.deb files one folder up from the kernel source folder. On Ubuntu I installed them like this
Code:
sudo dpkg -i linux-*4.15.*radeon*.deb
and rebooted

8. Force X to use the Radeon graphics card
On my setup I'm not able to delete the SVGA adapter from ESXi, and Ubuntu/X seems to default to that one as the primary. I forced it to point at my Radeon by creating a new file at /etc/X11/xorg.conf.d/20-radeon.conf with these contents:
Code:
Section "Device"
        Identifier "Radeon 5450"
        Driver "radeon"
        BusID "PCI:19:0:0"
EndSection
Note: You will need to adjust the BusID for your graphics card. Assuming passthrough is working, you can find the address with "lspci | grep VGA". lspci reports the address in hex, while the X config expects decimal, so you have to convert. In my case lspci reported "13:00.0" for the passed-through AMD GPU, so I put 19:0:0 (0x13 = 19) in the X config. After creating this file and rebooting, everything started working for me, and it has been incredibly stable for a week-plus.
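
A quick sanity check for the conversion (a sketch, assuming the card shows up at 13:00.0 as mine did):
Code:
lspci | grep VGA
# 13:00.0 VGA compatible controller: ... [Radeon HD 5450]
printf 'PCI:%d:%d:%d\n' 0x13 0x00 0x0
# PCI:19:0:0   <- the BusID value for the X config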

Again, I hope these steps are useful for someone.

If anyone more knowledgeable about the video BIOS loading procedure comes across this, perhaps they can comment on why this song and dance is even necessary in the first place - i.e., is there any technical reason why this functionality cannot be added upstream in the kernel driver and "just work" for users in the future? (Sadly, it took about 15 seconds to get GPU passthrough working in Windows.) This is obviously not a 'user-friendly' solution, but I promise it does work!

Ross_c1234

New Member
Jul 20, 2019
Holy cow! After a marathon trying to get an R9 280X to work on an old motherboard with an old i7-2600, I finally solved the problem.

I was getting a BSOD on boot after passing the graphics card through to the Win10 VM. After trying every conceivable combination to get it to work, I had some progress starting Windows in safe mode and then restarting again. I'd get about 30 seconds out of it before the BSOD.

The solution ended up being in the BIOS of the motherboard. I changed the iGPU setting to PCIE (disabling it completely - it was on auto before this) and also disabled onboard audio, and it works! I have put everything back to how it was before, with the exception of pciHole.start = "1200" and pciHole.end = "2400". I was able to install the Radeon drivers and it runs fine now.
The solution ended up being in the bios of the motherboard. I changed the iGPU to PCIE (Disabling it completely - it was on auto before this.) and also disabling onboard audio, it works! I have put everything back to how it was before with the exception of pciHole.start=1200 and pciHole.end=2400. I was able to install the radeon drivers and it runs fine now.