Supermicro X9 CPLD CATERR error

Svarto

New Member
Dec 5, 2017
4
0
1
43
Hi All,

I have a recently assembled server that crashes or freezes roughly every 10 minutes, forcing me to reboot it. This is what the error log shows:

Event,Type,Timestamp,Sensor Type,Sensor,Event Type
1,System Event,2018/01/29 05:36:05 Mon,OEM,#0xFF,Assertion: OEM| Event = AC Power On
2,System Event,2018/01/29 06:57:33 Mon,CPLD,#0xFF,Assertion: CPLD Event| Event = CATERR
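For anyone wanting to check how long the box stays up before each CATERR, here is a minimal Python sketch that parses an export in the same CSV layout as the log above and reports the time between each "AC Power On" and the next CATERR. The names `SEL_EXPORT`, `parse_events` and `uptime_before_caterr` are made up for illustration, and the sketch assumes every export uses the same six columns and timestamp format as the two rows shown here.

```python
import csv
import io
from datetime import datetime

# The two rows from the log above, pasted verbatim.
SEL_EXPORT = """\
Event,Type,Timestamp,Sensor Type,Sensor,Event Type
1,System Event,2018/01/29 05:36:05 Mon,OEM,#0xFF,Assertion: OEM| Event = AC Power On
2,System Event,2018/01/29 06:57:33 Mon,CPLD,#0xFF,Assertion: CPLD Event| Event = CATERR
"""


def parse_events(text):
    """Return the log rows as dicts, with a parsed datetime under 'when'."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Drop the trailing weekday ("Mon") so parsing is locale-independent.
        stamp = row["Timestamp"].rsplit(" ", 1)[0]
        row["when"] = datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S")
        rows.append(row)
    return rows


def uptime_before_caterr(events):
    """Return the time from the latest power-on to each CATERR event."""
    last_power_on = None
    gaps = []
    for ev in events:
        if "AC Power On" in ev["Event Type"]:
            last_power_on = ev["when"]
        elif "CATERR" in ev["Event Type"] and last_power_on is not None:
            gaps.append(ev["when"] - last_power_on)
    return gaps


events = parse_events(SEL_EXPORT)
gaps = uptime_before_caterr(events)
for gap in gaps:
    print("Time from power-on to CATERR:", gap)
```

With a longer export, a consistent gap would point towards something heat or load related, while random gaps would fit a marginal contact or a flaky DIMM better. That reading is only a rule of thumb, not a diagnosis.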

The IPMI connection still works, so that is how I connect through KVM and reboot the machine.

Does anyone have any idea what is causing this and how I can fix it?

This is my setup:

CPU: dual SR1A6 Intel Xeon E5-2680V2 2.8GHz
RAM: 4x 16GB IBM PC3-12800R 2Rx4 ECC Reg
MB: Supermicro X9DRD-EF
SSD: Samsung PM863a, 480GB
 

pricklypunter

Well-Known Member
Nov 10, 2015
1,708
515
113
Canada
One of the CPUs is detecting a catastrophic error, by the look of it. Pull them out, clean them and the socket pins, then put them back in one at a time with fresh thermal paste, and make sure the heatsinks are sitting well on them. Beyond that, it is possibly a RAM issue of some sort, or something getting a little too hot :)
 

Svarto

Thanks a lot! I will try that. I have been running memtest86 on it today and noticed something else: one of the DIMMs is throwing lots of assertions that ECC corrections were made. Maybe it's a bad RAM module, as you said.

One question: how do you clean the CPU and the socket pins? Just with some compressed air or a cloth?

I'm always a bit worried when it comes to the CPU pins, as they seem very sensitive.


pricklypunter

I use those little printer head cleaning swabs (the stick with a pad on the end) with some isopropyl alcohol. Whatever you use, make certain it is totally lint free and will not catch on anything that's sitting proud. So that means no cotton buds ;)

Just give the pins a gentle rub/swipe with it, give it a minute to soak in, and repeat as necessary to get them sparkling clean. You can put a bit more pressure on the CPUs, but the same rule applies: rub over it, let it soak, repeat as necessary. Finish off with a quick blow dry with a hair dryer on cool. Don't be tempted to touch either the pins in the sockets or the CPU pads with your fingers, and be careful when you re-seat the chips :)

After doing the CPUs and re-testing, pull the RAM, clean the modules up and re-seat them too, just in case oxidation on the contacts is causing your issue. Failing that, pull the RAM that's suspect and test again. It's really just a matter of divide and conquer until you can identify the culprit for certain :)
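
The divide and conquer idea can be sketched in a few lines of Python. This is only an illustration under some loud assumptions: exactly one part is faulty, the board will still boot with half of the parts installed, and `crashes_with` stands in for you physically fitting those parts and running a stress test. The slot names and the "bad" DIMM below are hypothetical.

```python
def find_faulty(parts, crashes_with):
    """Halve the candidate list until a single suspect part remains.

    Assumes exactly one faulty part, and that crashes_with(subset)
    returns True when the machine crashes with only that subset fitted.
    """
    candidates = list(parts)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if crashes_with(half):
            # The fault is somewhere in the half that was just tested.
            candidates = half
        else:
            # The tested half is clean, so the fault is in the rest.
            candidates = candidates[len(half):]
    return candidates[0]


# Simulated run: hypothetical slot names, with DIMM 3 pretending to be bad.
dimms = ["DIMM 1", "DIMM 2", "DIMM 3", "DIMM 4"]
bad = "DIMM 3"
suspect = find_faulty(dimms, lambda fitted: bad in fitted)
print("Suspect part:", suspect)
```

With four DIMMs that is two test runs instead of up to four. In practice, check the board manual for which slots have to be populated for each CPU before pulling half of the memory.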