VMware ESXi 6.5 machine learning multi-GPU GTX 1080 Ti setup


rabz

New Member
Nov 1, 2017
Hi,

I've been reading posts here and I hope some of the gurus can help.

I'm building a multi-GPU machine learning box on ESXi 6.5, but the host crashes when I add the second passthrough GPU.

Single guest VM (Ubuntu 16.04) with single GPU passthrough - works
Two Ubuntu guest VMs, each with single GPU passthrough, running concurrently - works

Single guest with TWO GPUs passed through - fails, host crashes, iDRAC complains:
"A bus fatal error was detected on a component at slot 4."

Do I need to add additional parameters? I already have hypervisor.cpuid.v0 = "false".


My setup:
Dell T620, dual Xeon E5 v2, 192 GB RAM, all slots filled with the same type of RAM.
Two GTX 1080 Ti cards in PCIe slot 4 (CPU1) and slot 5 (CPU2)
ESXi-6.5.0-20171004001-standard (Build 6765664)
vSphere version 6.5.0.10000

Here's my .vmx config:

.encoding = "UTF-8"
config.version = "8"
virtualHW.version = "11"
vmci0.present = "TRUE"
numvcpus = "16"
memSize = "32768"
sched.cpu.units = "mhz"
sched.cpu.latencySensitivity = "normal"
tools.upgrade.policy = "manual"
scsi0.virtualDev = "lsilogic"
scsi0.present = "TRUE"
sata0.present = "TRUE"
sata0:0.startConnected = "FALSE"
sata0:0.deviceType = "cdrom-image"
sata0:0.fileName = "/vmfs/volumes/59d4b973-8966855f-1131-d4ae52889379/iso/ubuntu-16.04.3-desktop-amd64.iso"
sata0:0.present = "TRUE"
scsi0:0.deviceType = "scsi-hardDisk"
scsi0:0.fileName = "2gpu.vmdk"
sched.scsi0:0.shares = "normal"
sched.scsi0:0.vFlash.module = "vfc"
sched.scsi0:0.vFlash.blockSize = "8192"
sched.scsi0:0.vFlash.min = "21474836480"
sched.scsi0:0.vFlash.max = "21474836480"
sched.scsi0:0.vFlash.enabled = "TRUE"
scsi0:0.present = "TRUE"
ethernet0.virtualDev = "vmxnet3"
ethernet0.networkName = "VM Network"
ethernet0.addressType = "vpx"
ethernet0.generatedAddress = "00:50:56:8c:ca:a6"
ethernet0.uptCompatibility = "TRUE"
ethernet0.present = "TRUE"
displayName = "asrefi t00"
guestOS = "ubuntu-64"
toolScripts.afterPowerOn = "TRUE"
toolScripts.afterResume = "TRUE"
toolScripts.beforeSuspend = "TRUE"
toolScripts.beforePowerOff = "TRUE"
tools.syncTime = "FALSE"
tools.guest.desktop.autolock = "FALSE"
messageBus.tunnelEnabled = "FALSE"
uuid.bios = "42 0c 32 ee 43 ee db cf-40 a7 87 16 e9 f1 7a ca"
vc.uuid = "50 0c 2a a0 ed 88 57 d0-7b 53 2a 03 94 e7 3f 38"
nvram = "asrefi t00.nvram"
pciBridge0.present = "TRUE"
svga.present = "TRUE"
pciBridge4.present = "TRUE"
pciBridge4.virtualDev = "pcieRootPort"
pciBridge4.functions = "8"
pciBridge5.present = "TRUE"
pciBridge5.virtualDev = "pcieRootPort"
pciBridge5.functions = "8"
pciBridge6.present = "TRUE"
pciBridge6.virtualDev = "pcieRootPort"
pciBridge6.functions = "8"
pciBridge7.present = "TRUE"
pciBridge7.virtualDev = "pcieRootPort"
pciBridge7.functions = "8"
hpet0.present = "true"
firmware = "efi"
sched.scsi0:0.throughputCap = "off"
pciPassthru.use64bitMMIO = "true"
hypervisor.cpuid.v0 = "false"
virtualHW.productCompatibility = "hosted"
replay.supported = "false"
pciBridge0.pciSlotNumber = "17"
pciBridge4.pciSlotNumber = "21"
pciBridge5.pciSlotNumber = "22"
pciBridge6.pciSlotNumber = "23"
pciBridge7.pciSlotNumber = "24"
scsi0.pciSlotNumber = "16"
ethernet0.pciSlotNumber = "160"
vmci0.pciSlotNumber = "32"
sata0.pciSlotNumber = "33"
monitor.phys_bits_used = "42"
vmotion.checkpointFBSize = "4194304"
vmotion.checkpointSVGAPrimarySize = "16777216"
softPowerOff = "TRUE"
svga.guestBackedPrimaryAware = "TRUE"
tools.remindInstall = "FALSE"
sched.mem.pin = "TRUE"
numa.autosize.vcpu.maxPerVirtualNode = "8"
numa.autosize.cookie = "160001"
toolsInstallManager.lastInstallError = "0"
toolsInstallManager.updateCounter = "1"
pciHole.dynStart = "2560"
migrate.hostlog = "asrefi t00-2e4f6aef.hlog"
sched.cpu.min = "0"
sched.cpu.shares = "normal"
sched.mem.min = "32768"
sched.mem.minSize = "32768"
sched.mem.shares = "normal"
sched.swap.derivedName = "/vmfs/volumes/59e03834-3a401102-7319-d4ae52889379/2gpu/2gpu-ebbbae13.vswp"
uuid.location = "56 4d f0 1d 75 6e d8 eb-35 1f 35 fd 1b e7 e8 8d"
replay.filename = ""
scsi0:0.redo = ""
vmci0.id = "-370050358"
cleanShutdown = "TRUE"
cpuid.80000001.edx = "-----------H--------------------"
cpuid.80000001.edx.amd = "-----------H--------------------"
mks.use3dRenderer = "software"
floppy0.present = "FALSE"
svga.autodetect = "TRUE"
pciPassthru0.id = "00000:002:00.0"
pciPassthru0.deviceId = "0x1b06"
pciPassthru0.vendorId = "0x10de"
pciPassthru0.systemId = "59ba495c-36f5-d967-30d7-d4ae52889379"
pciPassthru0.present = "TRUE"
pciPassthru0.pciSlotNumber = "192"
pciPassthru1.id = "00000:067:00.0"
pciPassthru1.deviceId = "0x1b06"
pciPassthru1.vendorId = "0x10de"
pciPassthru1.systemId = "59ba495c-36f5-d967-30d7-d4ae52889379"
pciPassthru1.present = "TRUE"
 

rabz

New Member
Nov 1, 2017
Could it be that I need to add another pciSlotNumber parameter?
pciPassthru0.pciSlotNumber = "192" - some other number perhaps?
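
For anyone who wants to check this on their own config, here is a minimal sketch in Python that groups the pciPassthruN.* keys of a .vmx file and lists the devices that have no pciSlotNumber entry. The function names and the SAMPLE string are made up for illustration; the sample lines are copied from the config posted above. It only reports which keys are present. It makes no claim that a missing pciSlotNumber is what crashes the host, and the pasted config may simply be cut off.

```python
import re

# Matches per-device lines such as: pciPassthru0.id = "00000:002:00.0"
# Global keys like pciPassthru.use64bitMMIO have no index and are skipped.
_LINE = re.compile(r'^\s*pciPassthru(\d+)\.(\w+)\s*=\s*"(.*)"\s*$')


def passthrough_devices(vmx_text):
    """Group pciPassthruN.* keys from .vmx text by device index N."""
    devices = {}
    for line in vmx_text.splitlines():
        m = _LINE.match(line)
        if m:
            index, key, value = int(m.group(1)), m.group(2), m.group(3)
            devices.setdefault(index, {})[key] = value
    return devices


def missing_slot_numbers(devices):
    """Return indices of present devices that have no pciSlotNumber key."""
    return sorted(
        i for i, keys in devices.items()
        if keys.get("present", "").upper() == "TRUE"
        and "pciSlotNumber" not in keys
    )


# Hypothetical sample: the passthrough lines from the posted config.
SAMPLE = '''\
pciPassthru.use64bitMMIO = "true"
pciPassthru0.id = "00000:002:00.0"
pciPassthru0.deviceId = "0x1b06"
pciPassthru0.vendorId = "0x10de"
pciPassthru0.present = "TRUE"
pciPassthru0.pciSlotNumber = "192"
pciPassthru1.id = "00000:067:00.0"
pciPassthru1.deviceId = "0x1b06"
pciPassthru1.vendorId = "0x10de"
pciPassthru1.present = "TRUE"
'''

devices = passthrough_devices(SAMPLE)
for i in sorted(devices):
    slot = devices[i].get("pciSlotNumber", "<none>")
    print(i, devices[i].get("id"), slot)
print("missing pciSlotNumber:", missing_slot_numbers(devices))
```

To run it against a real file, read the .vmx into a string first and pass that to passthrough_devices() instead of SAMPLE.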
 

Dean

Member
Jun 18, 2015
My understanding is that you can't pass through 2 GPUs to one VM unless you're using NVIDIA GRID.