Supermicro X9DRi-LN4F+ Memory Errors

ycp

Member
Jun 22, 2014
175
8
18
I recently bought a Supermicro 36-bay chassis with a Supermicro X9DRi-LN4F+ motherboard.
I have installed Unraid on this server, but I keep getting memory errors in the motherboard's SEL and in the Unraid logs.
The errors are similar to those in the bug report linked below:

Bug 875194 – sbridge: HANDLING MCE MEMORY ERROR

Does anyone have any experience with this type of error?
Is it something I can safely ignore, or should I replace my motherboard?
 

Terry Kennedy

Well-Known Member
Jun 25, 2015
1,068
508
113
New York City
www.glaver.org
Install the mcelog package from whatever repo your Linux distribution uses. You can run older kernel logs through it with "mcelog --ascii < file". It doesn't do a very good job of localizing memory faults, except on certain combinations of CPU and motherboard, but it should tell you the failing address, and you should be able to find which DIMM(s) contain that address using dmidecode.

If the errors always seem to be in one DIMM, try replacing that DIMM. If it is a multi-CPU system and the errors occur on various DIMMs that all belong to one CPU, try exchanging DIMMs between the CPU banks. If the errors stay with the same CPU, swap the CPUs between sockets. If the problem moves with the CPU, it may be a bad CPU or dirty contacts on the bottom of the CPU. If it stays with the same CPU socket, it may be a bent socket pin, lint in the socket, or a bad motherboard.

If the errors are everywhere, try updating the system BIOS and eliminate any memory overclocking (if used). If it still fails, you either have subtly incompatible memory, memory which doesn't really support the speed reported by the module, or a motherboard problem. Also, is the chassis power supply sufficient for all the hardware you have installed?
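As a rough complement to mcelog, the "one DIMM, one CPU, or everywhere" triage above can be sketched in Python using the per-DIMM correctable-error counters that the Linux EDAC subsystem exposes in sysfs. This is only a sketch under assumptions: it assumes an EDAC driver is loaded and that the kernel uses the per-DIMM layout (`mc*/dimm*/dimm_ce_count`; older kernels expose `csrow*` entries instead), and it treats each memory controller (`mc0`, `mc1`, ...) as a stand-in for one CPU, which may not hold on every board. The function names `read_ce_counts` and `classify` are made up for illustration.

```python
from pathlib import Path


def read_ce_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {(controller, dimm): correctable_error_count} from EDAC sysfs."""
    counts = {}
    for ce_file in Path(edac_root).glob("mc*/dimm*/dimm_ce_count"):
        dimm_dir = ce_file.parent
        try:
            value = int(ce_file.read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable or malformed entry: skip it
        counts[(dimm_dir.parent.name, dimm_dir.name)] = value
    return counts


def classify(counts):
    """Map the error distribution onto the triage steps from the post."""
    failing = {key: n for key, n in counts.items() if n > 0}
    if not failing:
        return "no correctable errors recorded"
    if len(failing) == 1:
        (mc, dimm), = failing  # unpack the single (controller, dimm) key
        return f"one DIMM ({mc}/{dimm}): try replacing that DIMM"
    controllers = {mc for mc, _ in failing}
    if len(controllers) == 1:
        # Assumption: one memory controller corresponds to one CPU.
        mc = next(iter(controllers))
        return f"several DIMMs on {mc}: swap DIMMs between CPU banks, then swap CPUs"
    return "errors on several controllers: check BIOS, memory speed, and power supply"


if __name__ == "__main__":
    counts = read_ce_counts()
    for (mc, dimm), n in sorted(counts.items()):
        print(f"{mc}/{dimm}: {n} correctable errors")
    print(classify(counts))
```

The counters reset at reboot or when the EDAC driver is reloaded, so the result only reflects errors seen since then. The `dimm_label` file in the same directory may help match an entry to a physical slot if the board populates it.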
 

ycp

Hey Terry,

My Unraid server doesn't freeze or restart on its own, and there don't seem to be any other problems other than these log messages.

These are CE errors, meaning they are correctable rather than uncorrectable.
So are these errors serious, or can I ignore them?
 

Terry Kennedy

A correctable memory error is only 1 bit away from being an uncorrectable memory error. :( Plus, if these are filling up your logs it makes it harder to see more important messages that might be mixed in.