4223HE Throwing ECC Errors in Linux-Bench

Discussion in 'Processors and Motherboards' started by herby, Apr 13, 2015.

  1. herby

    herby Active Member

    Joined:
    Aug 18, 2013
    Messages:
    169
    Likes Received:
    44
    EDIT: I just noticed I mistyped the title of this thread; it should read:
    "4332HE Throwing ECC Errors in Linux-Bench"


    Specs:
    Supermicro H8SCM-F
    AMD 4332 HE
    KVR16R11D4/16 (16GB x 2 RDIMMs)


    Prices on eBay for 4332 Opterons have gotten super low, so I thought it was time to upgrade from my old 4171 HE. To put the CPU through its paces, I've been doing a little stress testing/benchmarking.

    Unfortunately, my server keeps hanging in Linux-bench. When the Unixbench Dhrystone tests run the system event log in IPMIView starts throwing up:
    Code:
    Assertion: Memory| Event = Correctable ECC@DIMM1A(CPU1)
    filling the log and by the Whetstone tests Ubuntu hangs.

    I've tried swapping slots and the error is always DIMM1A. My RAM is on AMD's list of memory tested with 4200 & 4300 Opterons, albeit at lower than the stock 1600 MHz. I don't recall really stressing the old 4171 HE much except in virtual machines, but it seemed stable.
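
    To confirm which slot the events point at, a minimal sketch that tallies correctable ECC events per DIMM. It assumes the event log has been saved as plain text with one event per line in the same format as the line quoted above; the function name and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Matches lines like: Assertion: Memory| Event = Correctable ECC@DIMM1A(CPU1)
ECC_RE = re.compile(r"Correctable ECC@(DIMM\w+)\((CPU\d+)\)")


def count_ecc_events(lines):
    """Return a Counter keyed by (dimm, cpu) for correctable ECC events."""
    counts = Counter()
    for line in lines:
        match = ECC_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample; in practice, read the lines from the saved log file.
    sample = [
        "Assertion: Memory| Event = Correctable ECC@DIMM1A(CPU1)",
        "Assertion: Memory| Event = Correctable ECC@DIMM1A(CPU1)",
    ]
    for (dimm, cpu), n in count_ecc_events(sample).most_common():
        print(f"{dimm} ({cpu}): {n}")
```

    If every event lands on the same slot regardless of which module is installed there, that points away from the DIMMs themselves.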

    What could be the cause of instability?
    • Heat? (IPMI never showed anything too high)
    • Memory clock too high? (I'd be pissed if I have to go down to the tested 1066)
    • Bad Memory? (Errors always on the same slot, but I'm running Memtest86+ v4.20 to be safe)
    • Bad CPU or even worse MOBO?
    I've another Supermicro H8SCM-F, more RAM (UDIMMs and non-ECC) and of course the old 4171 HE I could swap out; but is there a less drastic next step I'm overlooking?
     
    #1
    Last edited: Apr 13, 2015
  2. EffrafaxOfWug

    EffrafaxOfWug Radioactive Member

    Joined:
    Feb 12, 2015
    Messages:
    1,110
    Likes Received:
    371
    From the sounds of it you've tried swapping the same DIMM into different slots and it's always the same slot...? To narrow down the fault I'd try, one DIMM at a time, doing a memtest86+ pass to confirm it's always the same slot that generates errors with both DIMMs. If that's the case then it's typically either a) something almost invisibly obstructing the slot (dust bunnies are always a favourite, so take a can of compressed air to it if you can) or b) a bad CPU socket (check for bent or missing pins) or worse c) likely a duff motherboard...
     
    #2
  3. Patriot

    Patriot Moderator

    Joined:
    Apr 18, 2011
    Messages:
    1,291
    Likes Received:
    673
    Could be dirty pads; memory errors often come down to that. Opterons are SOI, pretty damn hard to kill... try cleaning and reseating.
     
    #3
    herby and OBasel like this.
  4. herby

    herby Active Member

    Joined:
    Aug 18, 2013
    Messages:
    169
    Likes Received:
    44
    I don't think I have any bent pins. Memtest86+ is running on both, but the first pass completed clear.

    That could be it; the 4332 had a little something on the pads that I wiped off. I'll try more thoroughly with some 90% isopropyl alcohol.
     
    #4
  5. Patriot

    Patriot Moderator

    Joined:
    Apr 18, 2011
    Messages:
    1,291
    Likes Received:
    673
    If it doesn't, swap procs and see if the error follows.
     
    #5
  6. OBasel

    OBasel Active Member

    Joined:
    Dec 28, 2010
    Messages:
    494
    Likes Received:
    62
    You know what, I've seen this more times with the DDR4 modules than I did with DDR3. I don't know if that's a thing but cleaning goes a long way.
     
    #6
  7. Patriot

    Patriot Moderator

    Joined:
    Apr 18, 2011
    Messages:
    1,291
    Likes Received:
    673
    DDR4 modules are harder to seat correctly. Reseating probably does more good than cleaning. :shrugs:
     
    #7
  8. herby

    herby Active Member

    Joined:
    Aug 18, 2013
    Messages:
    169
    Likes Received:
    44
    Cleaned the pads on the 4332HE and Unixbench completed without Ubuntu hanging and no more correctable ECC errors in the log.
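
    For anyone wanting to double-check from the OS side as well as the IPMI log, a rough sketch that reads the kernel's EDAC error counters. It assumes the EDAC driver for the platform is loaded and exposes `ce_count` and `ue_count` under `/sys/devices/system/edac/mc`; I haven't verified that on this board, and the function name is made up for illustration.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: (ce_count, ue_count)} for each mc* directory found."""
    results = {}
    base = Path(root)
    if not base.is_dir():
        return results  # EDAC driver not loaded, or the path differs
    for mc in sorted(base.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # skip controllers whose counters cannot be read
        results[mc.name] = (ce, ue)
    return results


if __name__ == "__main__":
    for name, (ce, ue) in read_edac_counts().items():
        print(f"{name}: correctable={ce} uncorrectable={ue}")
```

    Running it before and after a benchmark pass and comparing the correctable count shows whether new errors were logged during the run.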

    Edit: If anyone is curious the reference number for the run is 03131428954003
     
    #8
    Last edited: Apr 13, 2015
    Patriot likes this.
