AMD EPYC Rome on H12SSL-C: random crashes


KaneTW

Member
Aug 1, 2023
I'm dealing with some weird resets on my new system too (brand-new H12SSL-NT, used EPYC 7642 from China). It ran fine all day yesterday; I turned it off for the night, and now it resets before ESXi finishes booting. In fact, sometimes it resets during PCI enumeration (POST code 94).

Edit: I noticed that pings to the BMC (on the 10G NIC) time out when this happens, but not every time.
Edit 2: I suspect a bad NVMe cable (SlimSAS to 2x OCuLink), since the crash occurs much more frequently with one of the two OCuLink connectors.
Edit 3: Still occurring. This is driving me nuts. I've contacted Supermicro support, but it's the weekend, so...
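To line the BMC timeouts up against the resets, a second machine can timestamp when the BMC stops answering. The following is a minimal sketch under stated assumptions, not a Supermicro tool: it uses a TCP connect to the BMC web port instead of an ICMP ping (raw ICMP usually needs elevated privileges), and BMC_HOST and BMC_PORT are hypothetical placeholders.

```python
import socket
import time
from datetime import datetime

BMC_HOST = "192.0.2.10"  # hypothetical BMC address; replace with your own
BMC_PORT = 443           # assumption: the BMC web UI listens on HTTPS


def is_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def outage_windows(samples):
    """Turn time-ordered (timestamp, reachable) samples into (start, end) outage pairs.

    end is the first reachable sample after the outage, or None if still down.
    """
    windows = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            windows.append((start, ts))
            start = None
    if start is not None:
        windows.append((start, None))
    return windows


def monitor(host=BMC_HOST, port=BMC_PORT, interval=1.0, duration=3600.0):
    """Poll the BMC for `duration` seconds, print state changes, return outages."""
    samples = []
    deadline = time.monotonic() + duration
    was_up = True
    while time.monotonic() < deadline:
        up = is_reachable(host, port)
        samples.append((datetime.now(), up))
        if up != was_up:
            state = "UP" if up else "DOWN"
            print(f"{datetime.now().isoformat(timespec='seconds')} BMC {state}", flush=True)
            was_up = up
        time.sleep(interval)
    return outage_windows(samples)
```

For example, running `monitor(duration=8 * 3600)` overnight from another machine prints each DOWN/UP transition with a timestamp and returns the outage windows, which can then be compared against the host's reset times.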
 

julianmh

New Member
Jul 10, 2023
It was the NVMe drives. One time I sat by the server, watched it die, and managed to open Event Viewer.

Both Samsung 990s had the same issue. I guess it's a driver or firmware issue (yes, the newest firmware was installed).
After swapping the disks, there were no more issues.
 

KaneTW

Member
Aug 1, 2023
julianmh said:
It was the NVMe drives. One time I sat by the server, watched it die, and managed to open Event Viewer.

Both Samsung 990s had the same issue. I guess it's a driver or firmware issue (yes, the newest firmware was installed).
After swapping the disks, there were no more issues.
I'm running Samsung PM9A3s (2x 2.5" 7.68 TB and one M.2 960 GB), so that's possibly the problem for me as well.
 

KaneTW

Member
Aug 1, 2023
I can confirm it is *not* the SSDs in my case. I removed all SSDs (M.2 and 2.5") and it still crashes (in different ways, from PSOD to instant reset) when booting ESXi via HTTP boot.