Troubleshooting GPU passthrough ESXi 6.5

Discussion in 'VMware, VirtualBox, Citrix' started by Ch33rios, Jan 2, 2017.

  1. Tasik008

    Tasik008 New Member

    Joined:
    Oct 3, 2018
    Messages:
    6
    Likes Received:
    0
    Yes I did
     

    #181
  2. das1996

    das1996 New Member

    Joined:
    Sep 4, 2018
    Messages:
    19
    Likes Received:
    2
    #182
  3. Tasik008

    Tasik008 New Member

    Joined:
    Oct 3, 2018
    Messages:
    6
    Likes Received:
    0
    Adding the device ID to the configuration file did not help: the driver still will not install.
    I tried both adding a line with the link reset method and deleting it; the result did not change.
     

    #183
  4. das1996

    das1996 New Member

    Joined:
    Sep 4, 2018
    Messages:
    19
    Likes Received:
    2
    Instead of link, use d3d0.

    I added these lines to the end of the passthru.map file:
    -------------
    10de 128b d3d0 false
    10de 0e0f d3d0 false
    --------------------

    Adjust the device ID (the second column, highlighted in blue in the original post) for your specific adapter.
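For readers adapting the lines above, here is a minimal sketch that formats and sanity-checks a passthru.map entry. The function name `passthru_entry` is hypothetical, and the list of reset methods is an assumption taken from the comment header that ships with the file; check the header of your own passthru.map before relying on it. The last column is kept at `false`, as in the post.

```python
# Reset methods as listed in the passthru.map comment header (assumed list;
# verify against the header of the file on your own host).
RESET_METHODS = {"flr", "d3d0", "link", "bridge", "default"}
HEX_DIGITS = set("0123456789abcdefABCDEF")


def passthru_entry(vendor_id, device_id, reset_method="d3d0"):
    """Format one line as: vendor-id device-id resetMethod fptShareable."""
    for name, value in (("vendor_id", vendor_id), ("device_id", device_id)):
        # IDs are four hex digits, e.g. 10de for NVIDIA.
        if len(value) != 4 or not set(value) <= HEX_DIGITS:
            raise ValueError(f"{name} must be four hex digits, got {value!r}")
    if reset_method not in RESET_METHODS:
        raise ValueError(f"unknown reset method {reset_method!r}")
    # fptShareable is left at "false", matching the lines in the post.
    return f"{vendor_id.lower()} {device_id.lower()} {reset_method} false"


# The two entries from the post.
print(passthru_entry("10de", "128b"))
print(passthru_entry("10de", "0e0f"))
```

The script only builds the text; the lines still have to be appended to passthru.map by hand.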

     
    #184
  5. Tasik008

    Tasik008 New Member

    Joined:
    Oct 3, 2018
    Messages:
    6
    Likes Received:
    0
    I added the lines with the device IDs that you marked in blue. The line 10de ffff link false in my screenshot was added as a test, after the NVIDIA driver failed to install in the virtual machine.
    Could the error be caused by not being able to enable nested virtualization in the virtual machine settings?
     
    #185
  6. das1996

    das1996 New Member

    Joined:
    Sep 4, 2018
    Messages:
    19
    Likes Received:
    2
    link is a different reset method than d3d0. I couldn't get the video card to work with link, but it did work with d3d0. See the commented portion of the passthru.map file for an explanation of the different reset methods.

    Also, it goes without saying that an ESXi reboot is required after any change to passthru.map.
     
    #186
  7. Tasik008

    Tasik008 New Member

    Joined:
    Oct 3, 2018
    Messages:
    6
    Likes Received:
    0
    The passthru.map file now looks like the one in the screenshot. After applying this configuration as you advised, I rebooted the server, but the driver still will not install in the virtual machine. I will try searching for more on the various reset methods.
     

    #187
  8. das1996

    das1996 New Member

    Joined:
    Sep 4, 2018
    Messages:
    19
    Likes Received:
    2
    Maybe I missed it, but are you also passing through the NVIDIA HD audio as a separate PCI device? If not, that may be the issue. I didn't see it in your screenshots above, just the video portion in the VM config.
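A small sketch of the check suggested above: given the PCI addresses present on the host and the ones already passed through, list any sibling functions on the same slot (such as an HD audio function at .1 next to the video function at .0) that were left out. The function name and the sample addresses are hypothetical.

```python
def missing_siblings(host_addresses, passed_through):
    """Return host PCI functions that share a slot with a passed-through
    function but are not passed through themselves."""
    passed = set(passed_through)
    # The slot is everything before the final '.', e.g. '0000:03:00'.
    passed_slots = {addr.rpartition(".")[0] for addr in passed}
    return sorted(
        addr for addr in host_addresses
        if addr.rpartition(".")[0] in passed_slots and addr not in passed
    )


# Hypothetical host: video at 03:00.0, its audio function at 03:00.1.
host = ["0000:03:00.0", "0000:03:00.1", "0000:00:1f.2"]
print(missing_siblings(host, ["0000:03:00.0"]))  # ['0000:03:00.1']
```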
     
    #188
  9. gilsas

    gilsas New Member

    Joined:
    Oct 4, 2018
    Messages:
    1
    Likes Received:
    0
    I am having a similar issue, though I am not interested in getting HDMI out; I "just" want to pass through an Nvidia K420 to an Ubuntu VM so that I can do a bit of GPU-accelerated TensorFlow from within the VM (and later use it for model serving). My setup is a Dell T20 (Xeon) running ESXi 6.7.
    I have been trying really hard to make it work for about 2 months now with no luck, so I thought I'd come ask for suggestions.

    I believe I have tried everything I found here and elsewhere: the device is passed to the VM and seen from within it (lspci), but the driver cannot communicate with the hardware after install (NVRM: RmInitAdapter failed in syslog).
    I tried different versions of Ubuntu and did everything obvious (reserving all memory, setting hypervisor.cpuid.v0 to false, etc.) as well as tweaking passthru.map, but no luck. I read that Nvidia does not enable the low-end Quadro GPUs for virtualization, but hiding the hypervisor from the guest OS should work in pretty much the same way others make it work with GeForce cards, no?

    Would greatly appreciate any thoughts / suggestions as to what I should try next :)
     
    #189
  10. Tasik008

    Tasik008 New Member

    Joined:
    Oct 3, 2018
    Messages:
    6
    Likes Received:
    0
    Today I managed to pass an AMD Radeon RX 560 graphics card through to a virtual machine; the driver was installed from the disk. The driver from the official AMD site, however, causes a blue screen. Strange.
    Now I am working out how to connect a client computer's monitor to this video card. If I simply disable the integrated SVGA graphics, nothing happens (except that the screen resolution becomes 1024x768).
    Any thoughts?
     
    #190
  11. Dravor

    Dravor New Member

    Joined:
    Aug 17, 2015
    Messages:
    17
    Likes Received:
    1
    All,

    I am running an R720 with an Nvidia P2000. It runs perfectly, until sometimes I try to reboot the VM and it hangs. Any attempt to reset the VM then crashes ESXi, and I end up having to do a full reboot. It looks to me like the VM/ESXi cannot reset the video card?

    I saw that adding this line to /etc/vmware/passthru.map helps for ATI cards:

    1002 ffff link false

    Has anyone seen this with 6.5 and Nvidia cards?

    Thanks!
     
    #191
  12. marcoi

    marcoi Well-Known Member

    Joined:
    Apr 6, 2013
    Messages:
    1,334
    Likes Received:
    205
    I recall a similar situation when I was running a video card under ESXi 6.5. I think if you restart the VM you might be OK (which doesn't seem to be the case for you), but if you shut it down and start it again, that is when you will probably need to restart the whole server to re-initialize the video card. It may be the reverse as well, i.e. shutdown is OK and rebooting the VM is bad. The best thing would be to experiment until you find out what each scenario does, e.g.:
    1. Reboot VM - Result: ?
    2. Shutdown VM - Result: ?
    etc.

    I think it's just part of the experience, unfortunately.
     
    #192
  13. oakshade

    oakshade New Member

    Joined:
    Oct 15, 2018
    Messages:
    2
    Likes Received:
    0
    I came to this thread looking for help passing through an AMD R7 360 to a Windows 10 64-bit VM on vSphere 6.0u3.
    The box is a Dell R5500, which was close to the last of Dell's rackmount workstations that could take GPUs. It has 5 PCIe x16 (physical and wired) slots and one x8 (I think). For the most part, it is a GPU server.

    Some of what you (marcoi) posted, such as reserving all memory and setting the CPU to 'Expose hardware assisted virtualization', helped me get past some of the issues. I had upgraded ESXi from 5.5u1 to 6.0u3. My inventory included a series of older (hardware version 8) VMs, including the one I wanted to pass the new GPU (R7 360) through to. This machine had worked without any problem before (on vSphere 5.5u1, hardware version 8) passing through an AMD FirePro, so I figured it would be a snap.

    Initially, after the 6.0u3 upgrade, I updated the hardware version (from version 8 to version 11) on the passthrough VM, thinking that should overcome most problems I might encounter (more compatible). However, nothing I did would make it run right. It would mostly crash the VM when it tried to load the drivers (the same problem others had). I worked on it for a solid week, attempting to find whatever might be misconfigured.

    After adding the hypervisor.cpuid.v0 = "FALSE" parameter, the VM would finally sometimes allow the AMD drivers to be installed without crashing. It would run normally, but the next time I started it the drivers would not load. Instead, the card would be disabled with error code 43 (can't start) in Windows. It would take an ESXi reboot to get past that again.

    Finally I solved it.

    I figured out that the problem is the VM hardware version. vSphere changed between version 5.5 and 6.0. It absolutely does not support passing through this card in hardware version 11. I rebuilt the VM at hardware version 10, and all is well, even running under 6.0u3. The problems are gone; it loads and runs fine. I have no added configuration parameters in the .vmx; it just works like normal.
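To see which hardware version a VM is currently at before rebuilding it, a minimal sketch like the following can read the value out of the .vmx text. It assumes the version is stored under the `virtualHW.version` key; the function name and the sample excerpt are hypothetical.

```python
import re


def hardware_version(vmx_text):
    """Return the VM hardware version found in .vmx text, or None if absent."""
    match = re.search(
        r'^\s*virtualHW\.version\s*=\s*"(\d+)"',
        vmx_text,
        re.MULTILINE | re.IGNORECASE,
    )
    return int(match.group(1)) if match else None


# Hypothetical excerpt of a .vmx file.
sample = 'config.version = "8"\nvirtualHW.version = "11"\n'
version = hardware_version(sample)
if version is not None and version > 10:
    print(f"hardware version {version}; the post only got passthrough working at 10")
```

This only reports the version. The post describes rebuilding the VM at version 10 rather than editing the value in place.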

     
    #193
    Last edited: Oct 22, 2018
  14. Dravor

    Dravor New Member

    Joined:
    Aug 17, 2015
    Messages:
    17
    Likes Received:
    1
    That's a good point, and actually surprisingly the other day I think it did shut down without issue. I'll do some more testing in that regard.

    Thanks!
     
    #194
  15. oakshade

    oakshade New Member

    Joined:
    Oct 15, 2018
    Messages:
    2
    Likes Received:
    0
    I solved my arduous AMD R7 360 passthrough adventure by setting the VM hardware version to 10 instead of the native version 11 on vSphere 6.0u3. All problems disappeared.

     
    #195
  16. smefa

    smefa New Member

    Joined:
    Oct 23, 2018
    Messages:
    2
    Likes Received:
    1
    Has anyone got passthrough working without setting the external GPU as primary?
    I would really like to be able to enable the internal (Intel) graphics as the primary display.

    I got it working with a 1080 in ESXi 6.7 / 6.7u1 with the following additions:

    I have to disable the internal graphics and set the 1080 as primary; it won't work with the internal graphics enabled as secondary.
    hypervisor.cpuid.v0 = "FALSE"
    pciPassthru0.msiEnabled = "FALSE"
    SMBIOS.reflectHost = "TRUE"
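A minimal sketch for checking whether a .vmx file already contains the three additions listed above. The helper names are hypothetical, and comparing keys and values case-insensitively is an assumption made here for convenience, not something the post states.

```python
def parse_vmx(text):
    """Parse 'key = "value"' lines from .vmx text into a dict (keys lowercased)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip().lower()] = value.strip().strip('"')
    return settings


# The three additions from the post.
WANTED = {
    "hypervisor.cpuid.v0": "FALSE",
    "pciPassthru0.msiEnabled": "FALSE",
    "SMBIOS.reflectHost": "TRUE",
}


def missing_settings(vmx_text):
    """Return the wanted keys that are absent or set to a different value."""
    have = parse_vmx(vmx_text)
    return [
        key for key, value in WANTED.items()
        if have.get(key.lower(), "").upper() != value
    ]


# Hypothetical .vmx excerpt that lacks the MSI setting.
sample = 'hypervisor.cpuid.v0 = "FALSE"\nSMBIOS.reflectHost = "TRUE"\n'
print(missing_settings(sample))  # ['pciPassthru0.msiEnabled']
```

The `pciPassthru0` prefix refers to the first passthrough device of the VM; a second device would use a different index.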
     
    #196
    mathiastro likes this.
  17. Docop

    Docop New Member

    Joined:
    Jul 19, 2016
    Messages:
    15
    Likes Received:
    0
    Do you mean you set your 1080 as primary in the BIOS and passthrough works, but the onboard graphics doesn't work?
    I have never been able to pass through the onboard graphics; perhaps there's a way with ESXi 6.7u1.
     
    #197
  18. smefa

    smefa New Member

    Joined:
    Oct 23, 2018
    Messages:
    2
    Likes Received:
    1
    Exactly, when I enable the internal graphics it breaks the external passthrough.
     
    #198
  19. ksquared

    ksquared New Member

    Joined:
    Nov 16, 2018
    Messages:
    1
    Likes Received:
    1
    I know this post is a bit old, but I've been the beneficiary of random forum posts in the past and figured it's my turn to give back.

    I was able to get this working after a bit of painful trial and error - here are the details of what I did in case it's helpful for anyone.

    My setup: ESXi 6.0, AMD Radeon 5450, Ubuntu 18.04, 2 monitors (1 HDMI and 1 DVI). I imagine similar steps will apply to ESXi 6.5, other AMD cards, and other versions of Linux (FWIW I confirmed this worked on Ubuntu 16.04 too)

    If you've never compiled a linux kernel before then consider this a good "learning opportunity".

    1. Download kernel source and install relevant build utilities
    In Ubuntu you can do something like this
    Code:
    $ apt-get source linux-image-$(uname -r)
    $ sudo apt-get build-dep linux-image-$(uname -r)
    $ sudo apt-get install libssl-dev
    (obviously other distros may have their own analogous steps here - Google is your friend)

    My setup extracted the source code into a "linux-4.15.0" folder (this name is used in subsequent steps)

    2. You'll need a kernel .config file that informs the build system how to compile the kernel. In Ubuntu I just did
    Code:
    $ cp /boot/config-$(uname -r) linux-4.15.0/.config
    to start with a working Ubuntu config. In the past on other distros I've also done 'make oldconfig' or 'make menuconfig' (from inside the linux kernel source directory), but in this case I found it simplest to just start with the one that Ubuntu uses.

    3. Modify the .config file. You can type 'make menuconfig' and hunt down the relevant options, or just open .config in a text editor, and apply these updates:
    Code:
    # CONFIG_DEBUG_INFO is not set
    CONFIG_EXTRA_FIRMWARE="radeon/vbios.bin"
    CONFIG_EXTRA_FIRMWARE_DIR="/lib/firmware"
    Unsetting DEBUG_INFO is optional - it will make the compile go significantly faster (and consume significantly less disk space). It seems that Ubuntu defaults this to on which was annoying because it filled up my limited disk space on the first attempt.

    The 2 EXTRA_FIRMWARE-related lines instruct the kernel build system to compile the video bios binary directly into the kernel. None of the other replies on this topic floating around the internet have ever mentioned doing this, so there may be a way to avoid it. Without these settings, I was stuck with the same "vbios error -2" during boot- for some reason the kernel was unable to access the filesystem during boot (so even though the vbios.bin file was in /lib/firmware/radeon the kernel didn't see it). This issue may be an artifact of Ubuntu's full disk encryption - not an expert on the Linux boot process and don't intend to be :).
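If you would rather script the .config changes from step 3 than edit by hand, here is a minimal sketch. The function names and the sample input are hypothetical. It rewrites options that are already present and appends any that are missing, and it does not resolve dependencies between options the way the kernel's own config tools do.

```python
import re


def render(name, value):
    """Render one option; None means the '# CONFIG_X is not set' form."""
    return f"# {name} is not set" if value is None else f"{name}={value}"


def apply_config(config_text, updates):
    """Return .config text with every option in `updates` applied."""
    remaining = dict(updates)
    out = []
    for line in config_text.splitlines():
        # Matches both 'CONFIG_X=...' and '# CONFIG_X is not set'.
        match = re.match(r"^(?:# )?(CONFIG_\w+)(?:=| is not set)", line)
        name = match.group(1) if match else None
        if name in remaining:
            line = render(name, remaining.pop(name))
        out.append(line)
    # Options that were not in the file at all are appended at the end.
    out.extend(render(name, value) for name, value in remaining.items())
    return "\n".join(out) + "\n"


updates = {
    "CONFIG_DEBUG_INFO": None,
    "CONFIG_EXTRA_FIRMWARE": '"radeon/vbios.bin"',
    "CONFIG_EXTRA_FIRMWARE_DIR": '"/lib/firmware"',
}
# Hypothetical three-line excerpt of a distro .config.
sample = 'CONFIG_DEBUG_INFO=y\nCONFIG_EXTRA_FIRMWARE=""\nCONFIG_DRM_RADEON=m\n'
print(apply_config(sample, updates), end="")
```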

    4. Apply the "patch" to linux-4.15.0/drivers/gpu/drm/radeon/radeon_bios.c
    I would strongly suggest just typing these changes by hand since this file may have changed since the 'diff' floating around the internet, and hence the 'patch' utility may end up making a mess of things. If you don't know C, then consider it yet another learning opportunity. The patch amounts to 3 small changes:

    A: add a new #include at the top with the others

    Code:
    #include <linux/firmware.h>
    B: Add a radeon_read_bios_from_firmware function somewhere above the existing radeon_get_bios function (which will call it in step C). Here is a copy/paste of mine:

    Code:
    static bool radeon_read_bios_from_firmware(struct radeon_device *rdev)
    {
      const uint8_t __iomem *bios;
      resource_size_t size;
      const struct firmware *fw = NULL;
    
      request_firmware(&fw, "radeon/vbios.bin", rdev->dev);
      if (!fw) {
        DRM_ERROR("No bios file\n");
        return false;
      }
    
      size = fw->size;
      bios = fw->data;
    
      if (!bios) {
        DRM_ERROR("Bios missing from bios file\n");
        release_firmware(fw);  /* do not leak the firmware reference */
        return false;
      }
    
      if (size == 0) {
        DRM_ERROR("Wrong sig - size is 0!\n");
        release_firmware(fw);
        return false;
      }
      if (bios[0] != 0x55 || bios[1] != 0xaa) {
        DRM_ERROR("Wrong sig - magic numbers are wrong!\n");
        release_firmware(fw);
        return false;
      }
    
      rdev->bios = kmalloc(size, GFP_KERNEL);
      if (rdev->bios == NULL) {
        DRM_ERROR("kmalloc failed\n");
        release_firmware(fw);
        return false;
      }
    
      memcpy(rdev->bios, bios, size);
      release_firmware(fw);
      return true;
    }
    
    C. Call the function in B inside the radeon_get_bios function if everything else fails (for me the original last one was "radeon_read_platform_bios", so I added my call after that):
    Code:
            if (r == false)
                    r = radeon_read_bios_from_firmware(rdev);
    5. Copy your vbios.bin file to /lib/firmware/radeon/vbios.bin.
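Before spending time on a kernel build, it may be worth confirming that the copied file passes the same signature test the patched driver applies (first two bytes 0x55 0xAA). A minimal sketch, with hypothetical function names and the path from step 5 as the default:

```python
def looks_like_option_rom(data):
    """True if the bytes start with the 0x55 0xAA signature checked in step 4."""
    return len(data) >= 2 and data[0] == 0x55 and data[1] == 0xAA


def check_vbios(path="/lib/firmware/radeon/vbios.bin"):
    """Read the first two bytes of the file at `path` and test the signature."""
    with open(path, "rb") as f:
        return looks_like_option_rom(f.read(2))


print(looks_like_option_rom(bytes([0x55, 0xAA, 0x00])))  # True
print(looks_like_option_rom(b""))                        # False
```

Calling `check_vbios()` on the target machine should return True for a usable dump. A False result would lead to the "Wrong sig" error from the patched function.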

    6. Compile the kernel. There are a few different procedures to do this. The Debian-specific scripts (recommended by Ubuntu and other places on the internet) seemed to be a huge rabbit hole mess when I tried them, but this "old-school" method for Debian-based distros should work:
    Code:
    make -j8 bindeb-pkg LOCALVERSION=-radeon
    The "-j8" option runs 8 compile jobs in parallel (adjust for the number of processors in your machine).
    The LOCALVERSION variable appends this tag to the final kernel version so we can identify it as our patched build.

    While it builds you might as well confirm that the settings in step 3 worked - the build system should create a vbios-related object file from the one in /lib/firmware, and put it here: linux-4.15.0/firmware/radeon

    Hopefully it just completes successfully, but any errors are likely (hopefully?) just a typo from step 4.

    7. Install the new kernel, and reboot
    If everything went well then you should end up with a bunch of *.deb files 1 folder up from the kernel source folder. On Ubuntu I installed them like this
    Code:
    sudo dpkg -i linux-*4.15.*radeon*.deb
    and rebooted

    8. Force X to use the radeon graphics card
    On my setup I'm not able to delete the SVGA adapter from ESXi, and Ubuntu/X seems to default to that one as the primary. I forced it to point at my radeon by creating a new file at "/etc/X11/xorg.conf.d/20-radeon.conf" with these contents:
    Code:
    Section "Device"
            Identifier "Radeon 5450"
            Driver "radeon"
            BusID "PCI:19:0:0"
    EndSection
    Note: You will need to adjust the BusID for your graphics card. Assuming pass-through is working, you can find the address by doing "lspci | grep VGA". lspci reports the address in base-16, so I think you have to convert to base-10 for the X config. In my case lspci reported "13:00.0" for the passed-through AMD GPU, so I put 19:0:0 in the X config. After creating this file and rebooting everything started working for me, and has been incredibly stable for 1 week+.
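The hex-to-decimal conversion described in the note can be sketched as follows. The function name is hypothetical, and accepting an optional leading PCI domain (as in "0000:13:00.0") is an added convenience not mentioned in the post.

```python
def xorg_bus_id(lspci_address):
    """Convert an lspci address such as '13:00.0' (hex) to an X config
    BusID string such as 'PCI:19:0:0' (decimal)."""
    parts = lspci_address.split(":")
    bus = parts[-2]  # ignores an optional leading PCI domain field
    device, function = parts[-1].split(".")
    return "PCI:{}:{}:{}".format(int(bus, 16), int(device, 16), int(function, 16))


print(xorg_bus_id("13:00.0"))  # PCI:19:0:0
```

This reproduces the value used in the config above: lspci's 13 (hex) becomes 19 (decimal).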

    Again, I hope these steps are useful for someone.

    If anyone more knowledgeable on the video bios loading procedure comes across this, perhaps they can comment on why this song and dance is even necessary in the first place? i.e., is there any technical reason why this functionality cannot be added upstream in the kernel driver and "just work" for users in the future? (sadly, it took about 15 seconds to get GPU pass through working in Windows) This is obviously not a 'user-friendly' solution but I promise it does work!
     
    #199
    Last edited: Nov 23, 2018
    roswellian likes this.
  20. Ross_c1234

    Ross_c1234 New Member

    Joined:
    Jul 20, 2019
    Messages:
    1
    Likes Received:
    0
    Holy cow! After a marathon of trying to get an R9 280X to work on an old motherboard with an old i7 2600, I finally solved the problem.

    I was getting a BSOD on boot after passing the graphics card through to the Win10 VM. After trying every conceivable combination to get it to work, I made some progress by starting Windows in safe mode and then restarting again. I'd get about 30 seconds out of it before a BSOD.

    The solution ended up being in the BIOS of the motherboard. I changed the iGPU setting to PCIE (disabling the iGPU completely; it was on auto before this) and also disabled the onboard audio, and it works! I have put everything else back to how it was before, with the exception of pciHole.start=1200 and pciHole.end=2400. I was able to install the Radeon drivers and it runs fine now.
     
    #200