I've recently acquired a used server with a Supermicro X9DRi-LN4F motherboard, 32 GB of RAM (8 x 4 GB), and 2x Xeon E5-2630L v2 CPUs.
Unfortunately it's been very unstable, and usually crashes within ~10 minutes with something like this in the event log:
16 2019/01/08 16:14:55 OEM Memory Correctable Memory ECC @ DIMMC1(CPU1) - Asserted
I've tried running Memtest86. Some runs report no errors, but others report thousands of errors like the one above, spread across different locations (C1, D1, etc.). I've tried moving the RAM sticks to different slots but cannot get a consistent fault on a specific stick.
Most of the crashes during regular use seem to point to slot C1 (even after moving sticks around).
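To check whether the errors really cluster on one slot, the event log can be tallied per DIMM. This is a minimal sketch, assuming every ECC entry follows the same format as the single line quoted above; the regex and the function name `count_ecc_events` are my own and not part of any Supermicro tool.

```python
import re
from collections import Counter

# Pattern assumed from the one event-log line quoted above, e.g.
# "... Correctable Memory ECC @ DIMMC1(CPU1) - Asserted"
DIMM_RE = re.compile(r"ECC @ DIMM([A-Z]\d+)\((CPU\d+)\)")


def count_ecc_events(lines):
    """Count ECC events per (CPU, DIMM slot) in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = DIMM_RE.search(line)
        if match:
            slot, cpu = match.group(1), match.group(2)
            counts[(cpu, slot)] += 1
    return counts
```

Passing it the lines of an exported event log and printing `counts.most_common()` shows which slot is reported most often. If the top slot stays the same after the sticks are swapped, that points at the slot or memory channel rather than at a particular stick.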
I have removed one CPU, so currently all 8 sticks are in the A-D RAM slots.
I tried swapping the CPU for the other one, and the crashes still occur.
Any ideas on what the problem could be, or any other troubleshooting steps? At this point I think the motherboard must be faulty.
Is it a problem that I'm running with only one CPU?
Edit: the most reliable way I've found to make it crash is a large file transfer through WinSCP. I've tried CPU stress-test tools and various memory testers, but they don't reliably cause crashes.