Supermicro H12SSL-i - stuck on PCI Enumeration only on warm reboot / restart (cold power-up is FINE)

james23

Active Member
Nov 18, 2014
462
127
43
53
I recently upgraded my daily desktop machine from an H11 + 7262 (8c EPYC gen2) to an H12SSL-i + 7302 (16c EPYC gen2), with 4x ECC DDR4 Hynix (on the Supermicro QVL for BOTH boards), a Quadro RTX 4000, and a Quadro P2000. The only disk of any kind attached is the boot drive, a P4800X 375GB Optane U.2 NVMe, connected via the SlimSAS x8 port on the motherboard.

My issue is that during POST it keeps hanging at "DXE - PCI Bus Enumeration... 92", at which point it's locked up (keyboard Caps Lock/Num Lock frozen, etc.). I then have to power cycle it, AND IT WILL BOOT fine after that! It took some time to figure this out, but even from Windows 11 or an Ubuntu live CD, if you choose Restart, the BIOS will then hang on PCI Bus Enumeration. But if you choose Shut Down and then press the power button once the system is off, it boots up fine.

I have tried:
updating the SM BIOS from 2.1 to the current 2.4 (same)
removing all PCIe cards except the RTX 4000, or trying just the P2000 (same)
CMOS resets, shorting the jumper, loading defaults (same)
BIOS setting VGA Priority -> Onboard or Offboard (same)
onboard VGA disable jumper (1-2, 2-3) (same)
(when it hangs, I have also checked the vKVM, and it shows the exact same stuck "DXE - PCI Bus Enumeration... 92 -" message)
Keep in mind, please, that the GPUs, the RAM, and the boot drive were moved directly over from my H11SSL, so I'm reasonably confident they are good and not the issue.

I have seen this mentioned with this same board in other threads (but with no clear resolution, and those threads got a bit derailed, so I'm making this new thread):
> "so, while it works, then all is fine, except these errors on boot. most of time bios stuck is after a reboot.
> when you cold boot a system, then it always boots just fine."
Does anyone have a solution or know of any resolution to this?

Thanks
 

i386

Well-Known Member
Mar 18, 2016
4,849
1,895
113
36
Germany
The Quadros, if I remember correctly, have only DisplayPort outputs.
Is a monitor attached to at least one DisplayPort?
 

RolloZ170

Well-Known Member
Apr 24, 2016
9,412
3,017
113
germany
> anyone have any solution or know of any resolution to this?
Just some ideas. What's the difference between a cold and a warm boot?
Cold boot: microcode is loaded/refreshed.
DDR4 training runs.
Warm boot: microcode is still there and does not need to be loaded.
DDR4 training is skipped.
Some PCIe training/detection steps are also not performed on warm boots.
Update:
A wrong driver could put the GPU into an undefined mode that a warm reboot cannot clear.
(Remember the NVMe SSD not being present on reboot after a wrong init by the driver.)
 

jdnz

Member
Apr 29, 2021
89
25
18
Have you tried removing BOTH video cards and running with JUST the onboard VGA? The most likely culprit is a system BIOS/video card BIOS interaction.
 

james23

Thanks for all the replies! (Edit: I have also opened a support case at Supermicro; will update here.)

Some answers:


> The quadros if, I remember correctly, have only display ports.
> Is a monitor attached to at least one display port?
You are correct that Quadro GPUs have ALL DP ports; the only exception is my Quadro RTX 4000, which has 3x DP and 1x USB-C (the card comes with a USB-C to DP adapter, which I am using with success).
And YES, all DP ports are connected to real monitors. (I actually have a massive display setup consisting of 8x monitors, all DP: 5x are 4k, 3x are 2560p.) Nothing is plugged into the onboard VGA port (of course).

> just some ideas. whats the difference on cold/warm boot.
> cold boot: microcode is loaded/refreshed
> DDR4 training
> warm boot: microcode is still there, must not loaded.
> DDR4 training is skipped
> some PCIe training/detections are not made too on warm boots.
Great info, I was not aware of this. Thank you!
In terms of the DDR4 ECC in use: this is the exact memory that was in my H11 system with an EPYC 7262 CPU (that setup ran perfectly for ~1.5 years).
I also ran about 6 hours of MemTest86 (v10.2) on this H12SSL + EPYC 7302P setup, with 0 errors.
The exact memory is on BOTH Supermicro QVLs, for my prior H11SSL-NC board and for this H12SSL-i board.

> have you tried removing BOTH video cards and running with JUST the onboard VGA? Most likely culprit is a system BIOS/video card BIOS interaction
I agree with this, although I have not tested with 0 GPUs (I just assumed there would be no problem with no GPU / no PCIe cards attached; however, I will test this and reply back).
Or is anyone else running an H12 board + Nvidia GPUs who can comment on whether this affects them?

I do have the BIOS set to UEFI (so all the PCIe slots are set to EFI in the BIOS as well). Would changing them to Legacy or Disabled change anything?

I.e., this setting is what I'm referring to:
[screenshot: 1675790359423.png]
 

MasterControl

New Member
Jan 15, 2023
22
7
3
james23,

Are you running DDR4 1.2v 3200 ECC?

H12SSL-i doesn't officially support anything lower than that.
 

james23

> james23,
>
> Are you running DDR4 1.2v 3200 ECC?
>
> H12SSL-i doesn't officially support anything lower than that.
Thanks MasterControl. Yes, the exact RAM I'm using is on the SM QVL for this board (and actually for my prior H11SSL-NC too!). I also double-checked it just now: it is DDR4 3200 1.2V ECC. Exact model: 4x 32GB sticks of HYNIX HMAA4GR7AJR8N-32GB 2RX8 PC4-3200AA-R DDR4 RDIMM-3200MHZ.
As an update: I've been working through some steps with Supermicro support over the past 24h (still ongoing). When that is done I will post the full transcript (and result) here.

In the meantime, please keep replying with any suggestions, or with confirmations if you are experiencing this same issue.
Thanks
 

turbo944s2

New Member
Dec 2, 2022
4
0
1
James, I am having the same issue on my H12DSi-N6. Are you plotting Chia by any chance? LOL
 

turbo944s2

James, I got it to boot and reboot.

1. Installed the GPU in slot 6.
2. Disabled the onboard video with a jumper.
3. Disabled the following settings:
SR-IOV
BME DMA MITIGATION
PCIE ARI SUPPORT
PCIE TEN BIT TAG SUPPORT
PCIE SPREAD SPECTRUM
RELAXED ORDERING
NO SNOOP
4. Set VGA PRIORITY to OFFBOARD.
5. Disabled the ON-BOARD VIDEO OPTION ROM.
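
Since the changes above were all made at once, one way to find out which of them actually matters is to revert them one at a time and test a warm reboot after each. The Python below is only a rough sketch of that test plan, not part of the original steps; `revert_one_at_a_time` is a made-up helper name, and the labels are simply copied from the list above.

```python
# Settings changed in the list above (labels copied from the post).
CHANGES = [
    "SR-IOV",
    "BME DMA MITIGATION",
    "PCIE ARI SUPPORT",
    "PCIE TEN BIT TAG SUPPORT",
    "PCIE SPREAD SPECTRUM",
    "RELAXED ORDERING",
    "NO SNOOP",
    "VGA PRIORITY",
    "ON-BOARD VIDEO OPTION ROM",
]


def revert_one_at_a_time(changes):
    """Yield (reverted, still_changed) pairs, one per setting.

    Each pair describes a single test run: put `reverted` back to its
    default, keep everything in `still_changed` as modified, then try
    a warm reboot and note whether the hang comes back.
    """
    for i, item in enumerate(changes):
        yield item, changes[:i] + changes[i + 1:]


if __name__ == "__main__":
    for step, (reverted, kept) in enumerate(revert_one_at_a_time(CHANGES), 1):
        print(f"Run {step}: restore '{reverted}', keep {len(kept)} other changes")
```

If the hang returns in exactly one run, that setting is the likely trigger; if it returns in none, more than one change may be needed together.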

I'm a high-pressure boiler operator, not a computer expert, so keep that in mind. I have been working with computers for 35 years though, lol. Good luck.
 

klove007

New Member
Feb 8, 2023
5
1
3
I'm having similar issues on reboot, but just a blank screen with no POST or BIOS errors. I am using onboard video at this time. I'll try these steps and try to post back.

Any update from the OP on SM support?
 

markonen

New Member
Mar 3, 2021
13
9
3
Had this exact issue with the same hardware (H12SSL-i and a 7302P). Reseating all the RAM and moving a Mellanox ConnectX-4 Lx from one slot to another fixed it for me. Not sure which action did the trick, but signs point to a random bad connection.
 

89giop

Member
Dec 4, 2020
49
30
18
Hi all, having the same issue with an H12SSW-NTR. The board seems to boot fine (around 100 seconds) if I unplug power so the BMC resets and the system then powers on. Weirdly, though, it hangs on cold boots after that. It hangs at code 92, but if I just press Ctrl+Alt+Del it then boots fine on the second POST. So it seems that cold boots are the problem, not warm boots. Trying to figure it out as we speak...

100 seconds is the boot time with 1 RAM stick and 2 PCIe cards plugged in. It is about 140 seconds if I connect the 8x SlimSAS connectors on the mobo to the backplane.

I have tried unplugging all PCIe devices, including the riser, and running off just 1 RAM stick, and the issue persists. I have also emailed SM support for some advice.

I did torque down the CPU to spec, but I am wondering if I should remove it and inspect it and the pins?
 

89giop

Update: removed and reseated the CPU, and made a 3rd try with another CPU. No visible debris in the socket or on the CPU, and no visibly bent pins. Same symptoms as before.

However, I also worked out the following: whilst a Ctrl+Alt+Del reset will always result in a successful boot, every 2nd cold boot will also work. Then the 3rd will fail, the 4th will work, and so on.

I have also taken the board out of the chassis to make sure it was not shorting out. I have reloaded the BMC and BIOS multiple times, and tried a few downgrades and upgrades. Still the same issue.

I am at a loss here.
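
For anyone tracking a pattern like this, a small tally makes it easier to see. The Python below is only a sketch: the log entries are invented for illustration and are not measurements from this board, and `summarize` and `alternates` are made-up helper names.

```python
from collections import defaultdict


def summarize(boot_log):
    """Return {boot_kind: (successes, attempts)} for (kind, ok) records."""
    totals = defaultdict(lambda: [0, 0])
    for kind, ok in boot_log:
        totals[kind][1] += 1
        if ok:
            totals[kind][0] += 1
    return {kind: tuple(counts) for kind, counts in totals.items()}


def alternates(results):
    """True if every result differs from the one before it."""
    return all(a != b for a, b in zip(results, results[1:]))


# Hypothetical log: (boot kind, did it get past POST?). Made-up data.
log = [
    ("cold", False),
    ("warm", True),
    ("cold", True),
    ("warm", True),
    ("cold", False),
    ("cold", True),
]

cold_results = [ok for kind, ok in log if kind == "cold"]

if __name__ == "__main__":
    print(summarize(log))
    print("cold boots alternate:", alternates(cold_results))
```

A strict fail/pass alternation on cold boots, with warm boots always passing, would suggest some state that survives a power cycle (for example in a device's firmware) rather than a loose connection.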
 

89giop

Member
Dec 4, 2020
49
30
18
Haven't given up on the board just yet. I was thinking about what else lives on the PCIe bus, so I disabled the onboard LAN by moving the physical jumper. If I do that, the board boots fine every time, and it actually boots much faster. Does that point to anything? The LAN does seem to work once the system is booted, though, so could it maybe have something to do with the NIC firmware?
 

89giop

Member
Dec 4, 2020
49
30
18
OK, one more update. It seems that if I do the following 3 things, the system behaves normally:

1. Disable Secure Boot
2. Enable CSM
3. Disable the LAN OPROM

There is obviously something happening with the FW/OPROM that is hanging up the system. I will play around with the OPROM settings to see if any of them make a difference (because I would like to have Secure Boot enabled). @NablaSquaredG I saw in another thread that you had some updated FW for the BCM57416. This board was manufactured in 2019, and maybe the LAN FW is outdated and has issues with EFI boots.

Firmware version:
 

89giop

OK, so after checking with SM support: the FW loaded is not right. The SM OEM FW version starts with a triple-digit number, like 226.0.145.0.
The only BCM57416 FW with a double-digit version like mine is the Dell version. At this point it's quite possible that whoever had this board before updated the FW with the wrong file. I am waiting for SM to send me their latest FW (226.0.145.0) so I can flash it and test whether the board works properly after that.
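
The rule of thumb from SM support (OEM versions start with a three-digit field) is easy to turn into a quick sanity check. The Python below is only a sketch of that heuristic as described above, not an official way to identify firmware; the function names are made up, and the two-digit version string in the example is a hypothetical placeholder, not the version actually found on this board.

```python
def leading_field_digits(version):
    """Return how many digits are in the first dotted field of a version string."""
    first = version.strip().split(".")[0]
    if not first.isdigit():
        raise ValueError(f"unexpected version format: {version!r}")
    return len(first)


def matches_sm_oem_pattern(version):
    """Heuristic only: True if the version starts with a three-digit field,
    which is how SM support described their OEM BCM57416 firmware."""
    return leading_field_digits(version) == 3


if __name__ == "__main__":
    # "226.0.145.0" is the SM version quoted above; "21.0.0.0" is a
    # hypothetical two-digit example used purely for illustration.
    for candidate in ("226.0.145.0", "21.0.0.0"):
        print(candidate, "->", matches_sm_oem_pattern(candidate))
```

A match does not prove the firmware is the right one for the board; it only flags versions that clearly do not follow the pattern SM support described.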