SM X9DRi-LNF4+ Memory Errors when running Dual Channel

ww2planes1

New Member
Mar 26, 2018
9
3
3
40
Hi all. I'm hoping someone here can tell me if I've got a hardware error or if I just have a configuration issue. I'm building a FreeNAS system to replace an old 4-bay Synology NAS, so I just bought a used Supermicro server off eBay. The specs are:
- SM 846 case
- SM X9DRI-LNF4 motherboard
- 2x E5-2640L v1
- 4x4GB Samsung DDR3 RDIMM

The system came fully assembled minus hard drives, and has a 30 day warranty. I started by firing the system up and running memtest86, only to have the system completely lock up after a few minutes. The IPMI logs reported multiple single bit memory errors on DIMMC1, so I started troubleshooting. After a day or so of swapping DIMMs and testing, here's what I've found:

- As delivered, the system had RAM in slots A1 and C1 (CPU1) and E1 and G1 (CPU2). In this config it is very unstable and crashes with IPMI errors on DIMMC1.
- First, I removed the RAM from C1 and E1, switching to just a single DIMM per CPU. In this config the system seems stable and runs memtest for 1hr+. I ran this test with all four DIMMs, so I think my RAM is good.
- Next, I swapped the CPUs and put the RAM back in the A1/C1, E1/G1 config. In this config the system crashed with C1 single bit errors, so I don't think one of the CPUs is the issue.
- Next, I removed one CPU and ran a single CPU with all four DIMMs in A1, B1, C1, D1. The system seemed stable; it was able to run memtest86 for a couple of hours without crashing.

After all of this, it seems likely to me that the motherboard and/or CPU didn't like running the RAM in slots A1 and C1. I have rearranged the DIMMs to slots A1, B1, E1 and F1, and that appears stable. It ran memtest86 for 2 hours and is currently 12 hours into running Prime95 for a 24 hour burn-in. (I'll run a long duration memtest86 after I finish with Prime95.)

What I am looking for is someone to confirm my testing. I want to make sure that I don't have two issues here. Is it common to see single bit RAM errors if you aren't running the dual channel RAM in the correct slots? Or is it possible that I have an issue with the motherboard as well? I am planning on upgrading the RAM at some point down the line, so I'd like to know now if I need a new mobo (before the 30 day warranty expires).
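For anyone following along, the test results above can be cross-checked with a simple elimination. This is only a rough sketch: the `RUNS` table is transcribed from the bullets above, and the `suspect_slots` helper is a made-up name for illustration. It looks for slots that are populated in every crashing run and never populated in a stable run.

```python
# Each entry: (populated slots, was the run stable?), transcribed from the post.
RUNS = [
    ({"A1", "C1", "E1", "G1"}, False),  # as delivered
    ({"A1", "G1"}, True),               # C1 and E1 removed, one DIMM per CPU
    ({"A1", "C1", "E1", "G1"}, False),  # same layout after swapping the CPUs
    ({"A1", "B1", "C1", "D1"}, True),   # single CPU, four DIMMs
    ({"A1", "B1", "E1", "F1"}, True),   # rearranged layout
]

def suspect_slots(runs):
    """Slots present in every failing run and absent from every stable run."""
    failing = [slots for slots, stable in runs if not stable]
    passing = [slots for slots, stable in runs if stable]
    in_all_failures = set.intersection(*failing) if failing else set()
    in_any_pass = set().union(*passing)
    return in_all_failures - in_any_pass

print(sorted(suspect_slots(RUNS)))
```

With these five runs no slot is left over, since C1 is also populated in the stable single-CPU run. That fits a layout problem better than a single dead slot, though five runs is not much data and it does not rule out a marginal board.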
 

nthu9280

Well-Known Member
Feb 3, 2016
1,628
498
83
San Antonio, TX
Check and make sure the bottoms of the CPUs are clean and not oxidized. Clean with an alcohol wipe.

The other thing to check is whether any of the CPU socket pins are bent or any debris is preventing proper contact between the socket pins and CPU pads.



 

ww2planes1

New Member
Mar 26, 2018
9
3
3
40
I did wipe both chips with IPA before reinstalling to make sure I didn't have any thermal compound on the undersides. They looked clean. I also did a quick inspection of the sockets and I didn't notice any damage, but it's possible I missed something. I will check again tonight.
 

ww2planes1

New Member
Mar 26, 2018
9
3
3
40
nthu9280 said:
Maybe the malt in the IPA you used is causing the bad contact? :)

HAH! I knew someone was going to make a beer joke. Nah, I've got 99% isopropyl alcohol.

It ran memtest86 for 24 hours with the RAM in slots A1 and B1, so I think it's good. I went ahead and ordered a bunch more RAM off eBay. Once it arrives I'll populate the remaining slots and see if I can show that it was just a configuration issue.
 

cactus

Moderator
Jan 25, 2011
830
75
28
CA
The heat sink holes are blind, so make sure that when you install the heat sink it is not lifting the retention mechanism off the board.
I did this on two boards thinking they or the CPUs were bad.
 

vrod

Active Member
Jan 18, 2015
241
43
28
31
I have had issues with the C1 slot on a board I bought as well, same model as yours. It would throw tons of memtest86 errors from that slot. I got the board exchanged and it didn’t happen anymore. Before that I had tried 3 different DIMMs as well as 3 different CPUs.
 

ww2planes1

New Member
Mar 26, 2018
9
3
3
40
Looks like it's just that the Supermicro board gets really unhappy if you don't go A->B->C->D when installing the RAM. I just got more RAM installed and it's running perfectly fine with 56GB of RAM (7 sticks per CPU). Memtest is perfectly solid, so I think I'm going to declare my new NAS ready for use!

Thanks.
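For reference, the A->B->C->D rule can be written down as a quick checklist. This is a minimal sketch, not anything from Supermicro: the `CHANNEL_ORDER` table and the `population_problems` helper are hypothetical, the E->F->G->H order for CPU2 is assumed from the slot names used in this thread, and only the first slot of each channel (A1, B1, ...) is considered.

```python
# Assumed fill order per CPU. CPU2's E-H order is an assumption based on
# the slot names mentioned in this thread, not taken from the board manual.
CHANNEL_ORDER = {
    "CPU1": ["A", "B", "C", "D"],
    "CPU2": ["E", "F", "G", "H"],
}

def population_problems(populated):
    """Return one message per CPU whose first-row slots skip a channel."""
    problems = []
    for cpu, order in CHANNEL_ORDER.items():
        # Channels on this CPU that have a DIMM in their first slot.
        used = [ch for ch in order if ch + "1" in populated]
        # In-order filling means the used channels are a prefix of the order.
        expected = order[:len(used)]
        if used != expected:
            problems.append(f"{cpu}: populated {used}, expected {expected}")
    return problems

print(population_problems({"A1", "C1", "E1", "G1"}))  # layout as delivered
print(population_problems({"A1", "B1", "E1", "F1"}))  # rearranged layout
```

The as-delivered layout gets flagged on both CPUs and the rearranged one comes back clean. Note that it would also flag the A1/G1 single-DIMM run, which tested stable, so treat it as a checklist for the ordering rule and not as a diagnosis.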
 

james23

Active Member
Nov 18, 2014
441
122
43
52
Boy, is this an interesting thread. I have the exact same MB and am having just about the exact same issues (as the OP and vrod's reply)!! (I'm thinking I need this MB replaced, as my eBay seller is willing to do.)

3 questions:
1- What speed is your RAM really running at (not the DIMMs' spec'd speed)?
(As the BIOS is usually set to AUTO DIMM speed, the best way I've found to know for sure is to see what memtest shows as its xxx GB/s speed: ~11.08 GB/s = 1066, ~11.35 GB/s = 1333.)

2- Now that you have your 56GB, what is the longest memtest run you have done with no errors? (Also, what version of memtest? Is it UEFI mode memtest (black background during testing) or BIOS memtest (blue background)?)

3- Are the DIMM sticks you're using on the Supermicro QVL (i.e. the list of memory "tested" by Supermicro)?
Thanks.

here is my thread: https://forums.servethehome.com/ind...-with-just-2-slots-unless-lock-to-1066.20413/


MEM ISSUES WITH SAME EXACT MB:

https://forums.servethehome.com/ind...emory-errors-when-running-dual-channel.19197/
 

ww2planes1

New Member
Mar 26, 2018
9
3
3
40
Sorry for the delayed reply. It looks like you figured it out, but here is what you asked about anyway...

My problem was definitely due to only having two sticks of RAM per CPU and having those in the C1 and D1 memory slots. (Not my own setup, I bought my server off eBay and wrongly assumed the seller had configured the server correctly)

Running BIOS Memtest86+ 5.01, the RAM speed reports as 666MHz (DDR3-1333), but the transfer speed is only reporting ~8900MB/sec (close to, but slightly faster than, DDR3-1066). This is true both with AUTO and when I force the speed to 1333. Not sure exactly why that is. I've checked a couple of settings, but nothing has changed it.
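As a rough sanity check on those numbers, the theoretical peak of one DDR3 channel is the transfer rate times 8 bytes (64-bit bus). The sketch below assumes that formula and uses the nominal 1066 and 1333 MT/s figures; `peak_mb_per_s` is just an illustrative helper, and the number memtest reports is a measured copy speed, not this theoretical peak, so the comparison is only approximate.

```python
BYTES_PER_TRANSFER = 8  # a DDR3 channel is 64 bits wide

def peak_mb_per_s(mega_transfers_per_s):
    """Theoretical single-channel peak in MB/s for a given MT/s rate."""
    return mega_transfers_per_s * BYTES_PER_TRANSFER

measured = 8900  # MB/s, the figure reported in the post
for rate in (1066, 1333):
    peak = peak_mb_per_s(rate)
    print(f"DDR3-{rate}: {peak} MB/s peak, measured is {measured / peak:.0%} of that")
```

By this arithmetic ~8900MB/sec sits a little above the single-channel DDR3-1066 peak and below the DDR3-1333 one, which lines up with the "slightly faster than 1066" observation.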

I don't recall exactly how long I tested for, but I think it was at least 24 hours. My memtest errors were occurring very frequently, so once I got past the first hour or two, I was pretty confident I had solved the issue.

The modules I'm using are not on the QVL, but they are Samsung, and they are all the same p/n.

Finally, and perhaps most crucially, it seems I won the eBay lottery, as my board is a Rev 1.20A.