4223HE Throwing ECC Errors in Linux-Bench


herby

Active Member
Aug 18, 2013
187
53
28
EDIT: I just noticed I mistyped the title of this thread; it should read:
"4332HE Throwing ECC Errors in Linux-Bench"


Specs:
Supermicro H8SCM-F
AMD 4332 HE
KVR16R11D4/16 (16GB x 2 RDIMMs)


Prices on eBay for 4332 Opterons have gotten super low, so I thought it was time to upgrade from my old 4171 HE. To put the CPU through its paces I've been doing a little stress testing/benchmarking.

Unfortunately, my server keeps hanging in Linux-bench. When the Unixbench Dhrystone tests run, the system event log in IPMIView starts throwing up:
Code:
Assertion: Memory| Event = Correctable ECC@DIMM1A(CPU1)
filling the log, and by the time the Whetstone tests run Ubuntu hangs.
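
For anyone wanting to see how fast the log is filling and whether every event really names the same slot, here is a rough Python sketch that tallies ECC events from event-log text. The regular expression is modelled only on the single line quoted above; the "Uncorrectable" variant and the function name `count_ecc_events` are assumptions, and other BMC firmware may word events differently.

```python
import re
from collections import Counter

# Pattern modelled on the one log line quoted in this post.
# Matching "Uncorrectable" as well is an assumption about the wording.
ECC_EVENT = re.compile(
    r"Event\s*=\s*(Correctable|Uncorrectable) ECC@(\w+)\((\w+)\)"
)


def count_ecc_events(lines):
    """Count ECC events per (severity, DIMM, CPU) across log text lines."""
    counts = Counter()
    for line in lines:
        match = ECC_EVENT.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Assertion: Memory| Event = Correctable ECC@DIMM1A(CPU1)",
        "Assertion: Memory| Event = Correctable ECC@DIMM1A(CPU1)",
    ]
    for (severity, dimm, cpu), n in count_ecc_events(sample).most_common():
        print(f"{severity} ECC on {dimm} ({cpu}): {n}")
```

Feeding it a saved copy of the event log before and after a benchmark run gives a quick count per slot instead of scrolling through IPMIView.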

I've tried swapping slots and the error is always DIMM1A. My RAM is on the list of modules tested by AMD with 4200 & 4300 Opterons, albeit at lower than its stock 1600MHz. I don't recall really stressing the old 4171 HE much except in virtual machines, but it seemed stable.

What could be the cause of instability?
  • Heat? (IPMI never showed anything too high)
  • Memory clock too high? (I'd be pissed if I have to go down to the tested 1066)
  • Bad Memory? (Errors always on the same slot, but I'm running Memtest86+ v4.20 to be safe)
  • Bad CPU or even worse MOBO?
I've another Supermicro H8SCM-F, more RAM (UDIMMs and non-ECC) and of course the old 4171 HE I could swap out; but is there a less drastic next step I'm overlooking?
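
One low-effort step before swapping hardware is to watch the kernel's own error counters while the benchmark runs. The sketch below reads the Linux EDAC sysfs counters (`ce_count` and `ue_count` under `/sys/devices/system/edac/mc/`). It assumes an EDAC driver is loaded for this platform on the Ubuntu install, which may not be the case; the function name `read_edac_counts` is made up for the example.

```python
from pathlib import Path

# Standard Linux EDAC sysfs location; only populated when an EDAC driver
# for the memory controller is loaded (an assumption for this system).
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root=EDAC_ROOT):
    """Return {controller: {"ce": int, "ue": int}} for each mc* directory."""
    counts = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            # Skip controllers whose counters are missing or unreadable.
            continue
        counts[mc.name] = {"ce": ce, "ue": ue}
    return counts


if __name__ == "__main__":
    counts = read_edac_counts()
    if not counts:
        print("No EDAC memory controllers found (is an EDAC driver loaded?)")
    for name, c in counts.items():
        print(f"{name}: correctable={c['ce']} uncorrectable={c['ue']}")
```

Running it before and after a Unixbench pass shows whether the correctable count climbs under load, independent of what the BMC logs.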
 

EffrafaxOfWug

Radioactive Member
Feb 12, 2015
1,394
511
113
From the sounds of it you've tried swapping the same DIMM into different slots and it's always the same slot...? To narrow down the fault I'd try, one DIMM at a time, doing a memtest86+ pass to confirm it's always the same slot which generates errors with both DIMMs. If that's the case then it's typically either a) something almost invisibly obstructing the slot (dust bunnies are always a favourite so take a can of compressed air to it if you can) or b) bad CPU socket (check for bent or missing pins) or worse c) likely a duff motherboard...
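
The one-DIMM-at-a-time procedure boils down to asking whether errors follow the DIMM or stay with the slot. A small Python sketch of that reasoning, assuming you record each single-DIMM run as a `(dimm, slot, had_errors)` tuple; the function name, the DIMM labels "A"/"B", and the second slot name are all hypothetical.

```python
def locate_fault(results):
    """Classify single-DIMM test runs.

    results: iterable of (dimm_id, slot, had_errors) tuples, one per run.
    Returns "slot side", "DIMM", "no errors reproduced" or "inconclusive".
    """
    results = list(results)
    bad_slots = {slot for _, slot, err in results if err}
    bad_dimms = {dimm for dimm, _, err in results if err}

    if not bad_slots:
        return "no errors reproduced"
    if len(bad_slots) == 1 and len(bad_dimms) > 1:
        # Different DIMMs fail in the same slot: suspect the slot,
        # the CPU socket/pads, or the board rather than the memory.
        return "slot side"
    if len(bad_dimms) == 1 and len(bad_slots) > 1:
        # The same DIMM fails wherever it goes: suspect the module.
        return "DIMM"
    return "inconclusive"


if __name__ == "__main__":
    # Hypothetical run log: both DIMMs fail only in DIMM1A.
    runs = [
        ("A", "DIMM1A", True),
        ("B", "DIMM1A", True),
        ("A", "DIMM2A", False),
        ("B", "DIMM2A", False),
    ]
    print(locate_fault(runs))
```

A "slot side" result is what points at the obstruction, socket, CPU contact or motherboard possibilities listed above rather than at the RAM.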
 

Patriot

Moderator
Apr 18, 2011
1,450
790
113
Could be dirty pads. Memory errors are often so. Opterons are SOI, pretty damn hard to kill... try cleaning and reseating.
 

herby

EffrafaxOfWug said: "From the sounds of it you've tried swapping the same DIMM into different slots and it's always the same slot...? To narrow down the fault I'd try, one DIMM at a time, doing a memtest86+ pass to confirm it's always the same slot which generates errors with both DIMMs. If that's the case then it's typically either a) something almost invisibly obstructing the slot (dust bunnies are always a favourite so take a can of compressed air to it if you can) or b) bad CPU socket (check for bent or missing pins) or worse c) likely a duff motherboard..."
I don't think I have any bent pins. Memtest86+ is running on both, but the first pass completed clear.

Patriot said: "Could be dirty pads. Memory errors are often so. Opterons are SOI, pretty damn hard to kill... try cleaning and reseating."
That could be it; the 4332 had a little something on the pads that I wiped off. I'll try more thoroughly with some 90% isopropyl alcohol.
 

Patriot

herby said: "I don't think I have any bent pins. Memtest86+ is running on both, but the first pass completed clear. That could be it; the 4332 had a little something on the pads that I wiped off. I'll try more thoroughly with some 90% isopropyl alcohol."
If that doesn't fix it, swap procs and see if the error follows the CPU.
 

OBasel

Active Member
Dec 28, 2010
494
62
28
You know what, I've seen this more times with the DDR4 modules than I did with DDR3. I don't know if that's a thing but cleaning goes a long way.
 

Patriot

OBasel said: "You know what, I've seen this more times with the DDR4 modules than I did with DDR3. I don't know if that's a thing but cleaning goes a long way."
DDR4 modules are harder to seat correctly. Reseating probably does more good than cleaning. :shrugs:
 

herby

Cleaned the pads on the 4332HE and Unixbench completed without Ubuntu hanging, with no more correctable ECC errors in the log.

Edit: If anyone is curious, the reference number for the run is 03131428954003
 