EPYCD8 "resets" multiple times after freezing


RE0

New Member
Apr 23, 2023
I have an ASRock Rack EPYCD8 motherboard with an EPYC 7401P processor and a single Micron registered ECC (RDIMM) memory module.

Sometimes the system freezes (no matter which software is running on it). A few seconds later the monitor goes black and the debug LED on the board shows a blinking "00" before it successfully "resets" something and starts to boot.

Cards installed in the PCIe slots:
XFX RX590
PCIe to USB 3.0 adapter (NEC chip)

The only memory module on the board:
MTA36ASF2G72PZ-2G1A2IK DDR4 RECC
Memtest shows no errors.

I would appreciate it if anybody could tell me what is happening with this system.

Sorry for my weak English.

Best regards.
 

RolloZ170

Well-Known Member
Apr 24, 2016
Code:
384  | 05/25/2023, 07:12:48 | DRAM ECC ErrorE1 | Memory                             | Correctable ECC - Asserted
The correctable ECC error count limit can be reached; at that point the server is likely to freeze or reboot.
Replace the Micron RDIMM. MemTest86 (EFI) can poll ECC errors; without ECC polling you will see no errors in Memtest.
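A minimal sketch (Python, not something posted in the thread) of tallying correctable ECC assertions per sensor, assuming the SEL has been copied out as text lines in the same pipe-separated layout as the entry above (ID | date, time | sensor | type | event). The field order and the `count_correctable_ecc` helper are assumptions for illustration, not an ASRock or BMC API; the only sample line used is the one quoted above.

```python
from collections import Counter

# Assumed layout (hypothetical, copied from the entry quoted above):
# ID | date, time | sensor | type | event
SEL_TEXT = """\
384  | 05/25/2023, 07:12:48 | DRAM ECC ErrorE1 | Memory                             | Correctable ECC - Asserted
"""


def count_correctable_ecc(sel_text: str) -> Counter:
    """Count 'Correctable ECC ... Asserted' events per sensor name."""
    counts = Counter()
    for line in sel_text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        sensor, event = fields[2], fields[4].lower()
        # startswith() excludes "Uncorrectable ECC"; "- asserted" excludes "Deasserted"
        if event.startswith("correctable ecc") and event.endswith("- asserted"):
            counts[sensor] += 1
    return counts


if __name__ == "__main__":
    for sensor, n in count_correctable_ecc(SEL_TEXT).items():
        print(f"{sensor}: {n} correctable ECC event(s)")
```

A sensor whose count keeps climbing between reboots would point at the DIMM on that channel; the actual limit at which the firmware reacts is not stated in the thread.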
 

ocfguy

Active Member
Oct 25, 2022
Just to clarify, MemTest86 as in the proprietary version. The open-source Memtest86+ doesn't have EDAC support yet.


RE0

Thanks for your reply.
That day I took the module out and put it into another slot without cleaning the gold fingers. It booted up and failed immediately.
Then I noticed that the system was complaining about a memory error, which hadn't happened before.
I took it out, cleaned the gold fingers and put it back in. I erased the SEL and ran Memtest under EFI. No errors were reported in either Memtest or the SEL.
This morning (May 28) the system froze again. I can't see anything useful in the log.
Now I am really confused, but I will replace that RDIMM anyway.
 

RolloZ170

I had similar issues with an ASRock Rack board and an Intel LGA3647 CPU with only one RDIMM. With two, the reboot (CATERR) never happened again.
And AFAIK EPYC 7001 loses a lot of performance running with only one memory stick.

RE0

Yep. Only one of the eight memory channels is in use.
I'll go with four Samsung 8 GB RDIMMs.
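To put the single-channel penalty into numbers, a rough back-of-the-envelope sketch of theoretical peak DRAM bandwidth (channels × transfer rate × 8 bytes per 64-bit channel). It assumes DDR4-2133 for every module, which is what the "-2G1" suffix of the Micron part appears to indicate; the speed of the Samsung modules isn't stated, so treat `mts` as a placeholder. Real-world throughput will be lower than these peaks.

```python
# Assumption: DDR4-2133 (inferred from the Micron "-2G1" suffix), one DIMM
# per channel, 64-bit (8-byte) data bus per channel. Theoretical peak only.
def peak_bandwidth_gbs(channels: int, mts: int = 2133, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (10^9 bytes/s)."""
    return channels * mts * bus_bytes / 1000  # MT/s * bytes = MB/s -> GB/s


for ch in (1, 4, 8):
    print(f"{ch} channel(s): {peak_bandwidth_gbs(ch):.1f} GB/s")
# 1 channel(s): 17.1 GB/s
# 4 channel(s): 68.3 GB/s
# 8 channel(s): 136.5 GB/s
```

Going from one to four populated channels roughly quadruples the theoretical peak; filling all eight would double it again.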