AMD EPYC Rome on H12SSL-C random crashes

KaneTW

Member
Aug 1, 2023
34
10
8
I'm dealing with some weird resets on my new system too (brand-new H12SSL-NT, used EPYC 7642 from China). It ran fine all day yesterday; I turned it off for the night, and now it resets before ESXi finishes booting. (In fact, sometimes it just resets during PCI enumeration, POST code 94.)

E: I noticed that a ping to the BMC (on the 10G NIC) times out when this happens, but not every time.
E2: I suspect a bad NVMe cable (SlimSAS to 2x OCuLink), as the crash occurs much more frequently with one of the two OCuLinks.
E3: Still occurring. This is driving me nuts. I've contacted SM support, but it's a weekend, so
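To pin down whether the BMC really drops off the network at the moment of each reset, one option is to log reachability from another machine and look only at the state changes. This is a minimal sketch, not a tested tool: it assumes a Linux host with the iputils `ping` (`-c` and `-W` flags), and the function names and the BMC address in the usage note are placeholders.

```python
import subprocess
import time
from datetime import datetime


def ping_once(host, timeout_s=1):
    """Return True if one ICMP echo to host succeeds.

    Assumes Linux iputils ping: -c is the packet count, -W the reply timeout in seconds.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def monitor(host, interval_s=1.0, count=60, probe=ping_once):
    """Probe host `count` times and return [(timestamp, reachable), ...]."""
    samples = []
    for _ in range(count):
        stamp = datetime.now().isoformat(timespec="seconds")
        samples.append((stamp, probe(host)))
        time.sleep(interval_s)
    return samples


def find_transitions(samples):
    """Return only the samples where reachability differs from the previous sample."""
    transitions = []
    previous = None
    for stamp, reachable in samples:
        if previous is not None and reachable != previous:
            transitions.append((stamp, reachable))
        previous = reachable
    return transitions
```

Something like `find_transitions(monitor("192.0.2.10", count=3600))` (placeholder address) would give timestamps to compare against the time of each reset. If the BMC drops out together with the host, that would hint at a board-level or power problem rather than something the OS is doing, though that reading is a guess and not a diagnosis.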
 

julianmh

New Member
Jul 10, 2023
23
2
3
It was the NVMe drives. One time I sat by the server watching it die and had the opportunity to open Event Viewer.

Both Samsung 990s had the same issue, so I guess it's a driver or firmware problem (yes, the newest firmware was installed).
After swapping the disks, there were no more issues.
 

KaneTW

Member
Aug 1, 2023
34
10
8
I'm running Samsung PM9A3 (2x 2.5" 7.68TB and M.2 960GB) so that's possibly the problem for me as well.
 

KaneTW

Member
Aug 1, 2023
34
10
8
Can confirm it is *not* the SSD in my case. Removed all SSDs (M.2 and 2.5") and it still crashes (in different ways, from PSOD to instant reset) when booting ESXi via HTTP boot.
 

Cosmin Stanciu

New Member
Mar 12, 2024
2
0
1
Hello.

Configuration - EPYC 7552; Supermicro H12SSL-NT; 256 GB RAM; Samsung 980 Pro; Corsair HX1500i; 1 x RTX 4090

It started rebooting when I merely moved the mouse. After several different OS installs and BIOS changes, and even trying new RAM, I sort of got it working.

Then it got to the point where it reboots right after the BIOS, when trying to load the OS.

The only thing that made it work was setting Core Performance Boost to Disabled in the BIOS instead of leaving it on Auto. Also: C-States disabled, watchdog disabled, and the watchdog jumper removed from the board itself.

The only problem I have right now: the CPU sits at 2.2 GHz all the time and doesn't boost higher. Whatever I do, idle or heavy load, it stays at 2.2 GHz, not moving even 100 MHz up or down.
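For anyone wanting to check the same thing on their own box: as far as I know, 2.2 GHz is the EPYC 7552's base clock, so a flat 2.2 GHz is roughly what you'd expect with Core Performance Boost disabled. A rough way to confirm the clocks really are pinned is to sample the per-core `cpu MHz` values a few times under idle and under load. This is only a sketch, assuming a Linux system where `/proc/cpuinfo` reports `cpu MHz` lines; the function names and the 50 MHz tolerance are arbitrary choices for illustration.

```python
import re


def parse_core_mhz(cpuinfo_text):
    """Extract the 'cpu MHz' value of every logical CPU from /proc/cpuinfo text."""
    matches = re.findall(r"^cpu MHz\s*:\s*([\d.]+)", cpuinfo_text, flags=re.MULTILINE)
    return [float(value) for value in matches]


def looks_pinned(samples, tolerance_mhz=50.0):
    """True if all readings across all samples lie within tolerance_mhz of each other.

    `samples` is a list of per-sample lists as returned by parse_core_mhz.
    Returns False when there are no readings at all.
    """
    readings = [mhz for sample in samples for mhz in sample]
    if not readings:
        return False
    return max(readings) - min(readings) <= tolerance_mhz
```

To use it, read `/proc/cpuinfo` every second or so while a load is running, pass each read through `parse_core_mhz`, and hand the collected lists to `looks_pinned`. A `True` result under load matches the "stuck at 2.2" behaviour described above.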

Setting Core Performance Boost back to Auto brings the problem back: it gets through the BIOS and crashes right after. It crashes even when booting an OS from a USB stick with CPB on Auto.

Hopefully this info would help someone.
Slept 3 hours in 4 days o_O

Maybe someone with more experience could shed some light on the Core Performance Boost thing.

All the best,
Cosmin
 

KaneTW

Member
Aug 1, 2023
34
10
8
In my case it was a defective CPU. I installed a new 7763 (sourced from an AMD distributor, not China) and it works like a charm.
 

Cosmin Stanciu

New Member
Mar 12, 2024
2
0
1
@KaneTW Just wondering: since you have the same motherboard as mine (H12SSL-NT), presumably on the latest BIOS (2.7??), and also an EPYC CPU, would you mind sharing whether BIOS - CPU Configuration - Core Performance Boost is set to Auto rather than Disabled?

If that is the case, I guess something suddenly broke on my CPU and it can't boost, hence the constant 2.2 GHz.

If it's not, would you mind trying to set it to Auto to see if the same problem occurs?
With Core Performance Boost set to Auto, my system doesn't boot into the OS.

Would appreciate it.

Many thanks.
Cosmin
 

KaneTW

Member
Aug 1, 2023
34
10
8
Sorry, can't check right now. My system is in production and I don't have downtime scheduled at the moment.