Supermicro X12SPL-F won't boot with a GPU in slot 6 - a weird issue...


newdamage1

New Member
Oct 30, 2023
I am setting up an ESXi host where I've been passing a Tesla P4 through to a VM. This worked great on my old hardware (X10) but has issues with the new X12 board with a Xeon Silver 4310 and 128 GB of Supermicro RAM.

With any GPU installed (I've tried a few), it boots to VMware on the first boot; after a restart, it hangs on DXE--BIOS PCI Bus Enumeration... 92 and then resets over and over. I've been able to get the server to boot again by removing the card and changing things in the BIOS, but exactly which combination does the trick is totally unpredictable.

Has anyone run across anything similar with this generation of board?

(I should note that I've updated the BIOS and have done a BIOS reset.)

oneplane

Well-Known Member
Jul 23, 2021
Sounds a bit like a BIOS fault: the DXE phase should enumerate the bus and then load any GPU EFI ROM if needed. A reset at that stage looks like a crash to me.

Does it also happen if you boot into something other than VMware? Does it happen only on warm reboots, or on cold boots as well? And does it still happen after an EFI variables reset (in case the OS is writing some EFI variable that causes the DXE phase to fail)?

RolloZ170

Well-Known Member
Apr 24, 2016
Sounds like a cold/warm boot issue; a BIOS PCIe setting can probably fix it.
Please ask Supermicro for help.

newdamage1

New Member
Oct 30, 2023
OK... I ran rounds (and rounds) of tests to see what could be causing it. I moved my HBA from Slot 4 to Slot 2 and the GPU to Slot 4 to see whether the problem was specific to Slot 6. Same deal...

So I removed the GPU and left the HBA in Slot 2, and it did the same thing with this new configuration! Grr! In my frustration, I powered it off while it was hanging on the DXE stage, and a few seconds later it powered itself back on and booted into ESXi. Hmmm!

I then moved the HBA back to Slot 4, let it fail on its initial boot, powered it off the same way, and it booted into ESXi.

So the pattern seems to be: cold boot the server, let it fail, then power it off. Wait ten seconds, power it back on, and it boots fine.
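If you need to repeat that cold-boot workaround remotely, it can be scripted against the board's BMC. Below is a minimal sketch, assuming the BMC is reachable over the network and `ipmitool` is installed. The host, credentials, and the 120-second wait for the first boot to hang are hypothetical placeholders, not values from this thread. By default it does a dry run and only prints the steps.

```python
import subprocess
import time

# Hypothetical placeholder BMC address and credentials; use your own.
BMC_HOST = "192.0.2.10"
BMC_USER = "ADMIN"
BMC_PASS = "changeme"


def ipmi_power_cmd(action: str, host: str = BMC_HOST,
                   user: str = BMC_USER, password: str = BMC_PASS) -> list[str]:
    """Build an `ipmitool chassis power <action>` command for a BMC reached over LAN."""
    if action not in ("on", "off", "status"):
        raise ValueError(f"unsupported power action: {action}")
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-P", password,
            "chassis", "power", action]


def workaround_plan(hang_wait_s: int = 120, off_wait_s: int = 10) -> list[tuple[str, int]]:
    """Steps as (action, seconds to wait afterwards): cold boot, let it hang,
    hard power off (ipmitool's `off`, not `soft`), wait, power on.
    Assumes the server starts powered off; the 120 s hang wait is a guess."""
    return [("on", hang_wait_s), ("off", off_wait_s), ("on", 0)]


def run(plan: list[tuple[str, int]], dry_run: bool = True) -> None:
    for action, wait_s in plan:
        # Print the step without echoing the password.
        print(f"chassis power {action}, then wait {wait_s} s")
        if not dry_run:
            subprocess.run(ipmi_power_cmd(action), check=True)
            time.sleep(wait_s)


if __name__ == "__main__":
    run(workaround_plan())  # dry run: only prints the steps
```

Pass `dry_run=False` to actually send the commands. Note that this only replays the workaround blindly; it does not detect whether the first boot really hung at the DXE stage.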

oneplane: I didn't have time to try booting a different OS; I may tinker with that tomorrow just to see. But judging from the assertion log, it does seem like it doesn't detect the changed slots until after it has booted into the OS.

Rollo: It really does seem like a cold/warm boot issue of some sort. I'll ping Supermicro support on Monday.

newdamage1

New Member
Oct 30, 2023
I had a little time to test today and found that it boots fine if I pull the ConnectX-3 card out. It doesn't matter where the ConnectX-3 is installed; it won't boot with both installed. In the short term this is OK, and 80% of things won't notice only having a pair of 1 GbE NICs. I submitted a ticket with Supermicro and will update when I hear back.

oneplane

Well-Known Member
Jul 23, 2021
Thanks for the updates! Did you also run the CX311 in another system to see if that gets messed up too? Or did you feed it to the trashcan :p