RAM errors when upgrading to 48 GB of RAM

alsenior

Member
Apr 19, 2016
Hi Everyone,

I upgraded my server from 24GB of RAM to 48GB of RAM, and ever since the upgrade I have had millions (literally) of single-bit errors on all of the new DIMMs. Thinking it was a faulty set, I swapped them out for a new set and had exactly the same issue. When I swap back to the original 24GB DIMMs, there are no errors.

This is with the 48 GB set after 1 hour:
alsenior@fileserver 23:11 ~ grep "[0-9]" /sys/devices/system/edac/mc/mc*/csrow*/ch*_ce_count
/sys/devices/system/edac/mc/mc0/csrow0/ch0_ce_count:32781
/sys/devices/system/edac/mc/mc0/csrow0/ch1_ce_count:110
/sys/devices/system/edac/mc/mc0/csrow0/ch2_ce_count:98391
/sys/devices/system/edac/mc/mc1/csrow0/ch0_ce_count:32822
/sys/devices/system/edac/mc/mc1/csrow0/ch1_ce_count:32858
/sys/devices/system/edac/mc/mc1/csrow0/ch2_ce_count:0

This is with the 24 GB set after 2 days:

alsenior@fileserver 20:39 ~ grep "[0-9]" /sys/devices/system/edac/mc/mc*/csrow*/ch*_ce_count
/sys/devices/system/edac/mc/mc0/csrow0/ch0_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow0/ch1_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow0/ch2_ce_count:0
/sys/devices/system/edac/mc/mc1/csrow0/ch0_ce_count:0
/sys/devices/system/edac/mc/mc1/csrow0/ch1_ce_count:0
/sys/devices/system/edac/mc/mc1/csrow0/ch2_ce_count:0
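The grep above just dumps the raw counters. A minimal Python sketch that reads the same mc*/csrow*/ch*_ce_count files and ranks channels by corrected-error count can make before/after snapshots like these easier to compare. It assumes the EDAC sysfs layout shown in these outputs (the exact files vary with kernel version and EDAC driver), and the helper names read_ce_counts and noisy_channels are hypothetical. Which physical slot a given mc/csrow/channel maps to depends on the board and driver, so treat the ranking as a hint rather than a slot label.

```python
import glob
import os
import re

# sysfs root used in the post; overridable so the function can be pointed
# at a copy of the tree (assumption: layout matches the grep output above).
EDAC_ROOT = "/sys/devices/system/edac/mc"


def read_ce_counts(root=EDAC_ROOT):
    """Return {(mc, csrow, channel): corrected_error_count} for every
    mc*/csrow*/ch*_ce_count file found under root."""
    counts = {}
    pattern = os.path.join(root, "mc*", "csrow*", "ch*_ce_count")
    for path in sorted(glob.glob(pattern)):
        m = re.search(r"mc(\d+)/csrow(\d+)/ch(\d+)_ce_count$",
                      path.replace(os.sep, "/"))
        if m is None:
            continue
        key = tuple(int(g) for g in m.groups())
        with open(path) as f:
            counts[key] = int(f.read().strip())
    return counts


def noisy_channels(counts):
    """Channels with at least one corrected error, worst first."""
    return sorted((k for k, v in counts.items() if v > 0),
                  key=lambda k: counts[k], reverse=True)


if __name__ == "__main__":
    counts = read_ce_counts()
    for (mc, row, ch), n in sorted(counts.items()):
        print(f"mc{mc} csrow{row} ch{ch}: {n}")
    print("channels with errors, worst first:", noisy_channels(counts))
```

Running it once right after boot and again an hour later shows which channels are accumulating errors.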

The motherboard is a Supermicro X8DTL-6F running the latest BIOS. The processors are X5650s.
 

Terry Kennedy

Well-Known Member
Jun 25, 2015
New York City
www.glaver.org
Are all of the DIMMs the same brand/model with similar date codes? I've found the X8DTH-iF to be somewhat picky in configurations with lots of memory. I upgraded from 48GB to 96GB by adding another (supposedly) identical set of six HMT31GR7AFR4C-H9 8GB registered modules to the six that were already in there. I got all sorts of weird behavior, with the system randomly seeing anywhere between 24GB and 104GB o_O of RAM, and the BIOS splash screen would report "system is operating at 800MHz". memtest86+ generally saw 96GB of RAM, but would hang or reboot the system when asked to display the SPD information. When the system rebooted, it would see 96GB at 1066MHz (what was expected), but I'd get lots of correctable errors. It turned out that a mislabeled module had gotten into the tray: it was a 16GB 800 part, even though it was marked HMT31GR7AFR4C-H9. I wound up isolating it by running the system with one module, then two, then three and so on, until things went flaky.

The moral of that story is that the motherboard won't always detect an invalid memory configuration.
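The one-module-at-a-time isolation described above can be written out as a short Python sketch. The is_stable callback is hypothetical: in practice each call stands for physically installing the modules, booting, and running memtest or watching the EDAC counters. The example at the bottom fakes that check with a known-bad module named DIMM8.

```python
from typing import Callable, Optional, Sequence


def find_bad_module(modules: Sequence[str],
                    is_stable: Callable[[tuple], bool]) -> Optional[str]:
    """Install modules one at a time, re-testing after each addition.

    Returns the module whose addition first makes the test fail, or None if
    the full set passes. If the very first module fails, suspect the slot or
    board as well as the module itself.
    """
    installed = []
    for module in modules:
        installed.append(module)
        # is_stable stands in for "boot, run memtest / watch EDAC counters".
        if not is_stable(tuple(installed)):
            return module
    return None


if __name__ == "__main__":
    modules = [f"DIMM{i}" for i in range(1, 13)]
    known_bad = "DIMM8"  # simulated fault for the example
    print(find_bad_module(modules, lambda inst: known_bad not in inst))  # DIMM8
```

Adding one module per step mirrors the approach in the post. Splitting the set in halves would take fewer test cycles, but faults like this can depend on which slots are populated, so small incremental changes make it easier to tell a bad module from a bad slot combination.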

Your X8DTL-6F is documented as being limited to 24GB if the memory is unbuffered. To go beyond 24GB, all memory must be registered ECC memory. 5600-series Xeon processors require PCB revision R2.01 or later. Again, what brand and model is each of your DIMMs?
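The population rules quoted above can be turned into a quick pre-flight check. This Python sketch encodes only what is stated in the post (unbuffered memory tops out at 24GB; beyond that, every DIMM must be registered ECC), plus the general rule that registered and unbuffered DIMMs should not be mixed. The size/speed uniformity warning is a heuristic inspired by the mislabeled-module story, not a board requirement. The Dimm fields and the config_problems name are hypothetical; the values would come from the BIOS or a tool such as dmidecode.

```python
from dataclasses import dataclass
from typing import List

# Limit for unbuffered memory on the X8DTL-6F, as quoted in the thread.
UNBUFFERED_LIMIT_GB = 24


@dataclass
class Dimm:
    slot: str          # board slot label (hypothetical naming)
    size_gb: int
    registered: bool   # True for RDIMM, False for UDIMM
    ecc: bool
    speed_mhz: int     # speed reported by SPD, e.g. 1066


def config_problems(dimms: List[Dimm]) -> List[str]:
    """Return human-readable problems with a planned DIMM population."""
    problems = []
    total = sum(d.size_gb for d in dimms)

    # Rule from the post: above 24GB, every module must be registered ECC.
    if total > UNBUFFERED_LIMIT_GB and not all(d.registered and d.ecc for d in dimms):
        problems.append(
            f"{total}GB installed: above {UNBUFFERED_LIMIT_GB}GB every DIMM must be registered ECC")

    # Registered and unbuffered modules should not be mixed.
    if len({d.registered for d in dimms}) > 1:
        problems.append("registered and unbuffered DIMMs are mixed")

    # Heuristic only: a module that differs in size or speed may be mislabeled.
    if len({(d.size_gb, d.speed_mhz) for d in dimms}) > 1:
        problems.append("DIMMs differ in size or speed; check for a mislabeled module")

    return problems


if __name__ == "__main__":
    planned = [Dimm(f"slot{i}", 8, True, True, 1066) for i in range(6)]
    planned[5] = Dimm("slot5", 16, True, True, 800)  # like the mislabeled part
    for problem in config_problems(planned):
        print(problem)
```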
 

unwind-protect

Active Member
Mar 7, 2016
Boston
To clarify, you went 6x 4 GB to 6x 8 GB modules, right?

I still think you need to memtest all the modules first. For all you know, both sets you tried have one or more broken modules in them.
 

alsenior

Member
Apr 19, 2016
Yeah, I had this issue: sometimes the system would not boot when the modules were in certain slots. All of the modules are detected fine.

Just to confirm, all of the DIMMs are detected as PC8500 DIMMs and are all ECC.

I tried running memtest. No issues found.
 

Patriot

Moderator
Apr 18, 2011
Did you disable ECC before memtest?
 

alsenior

Member
Apr 19, 2016
Aye, that's the problem. Occasionally there will be a double-bit error and, boom, the system goes down.
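Why single-bit errors are merely logged while a double-bit error takes the machine down follows from how ECC memory is typically coded: SECDED (single-error-correct, double-error-detect). Server DIMMs apply it to 64 data bits with 8 check bits; the toy Python sketch below uses an extended Hamming code on just 4 data bits, which behaves the same way. It illustrates the technique, not the exact code used by the memory controller on this board.

```python
def encode(data):
    """4 data bits -> 8-bit codeword. Index 0 holds overall parity;
    indices 1..7 are Hamming(7,4) positions (parity at 1, 2, 4)."""
    d1, d2, d3, d4 = data
    w = [0] * 8
    w[3], w[5], w[6], w[7] = d1, d2, d3, d4
    w[1] = w[3] ^ w[5] ^ w[7]   # covers positions with bit 0 set
    w[2] = w[3] ^ w[6] ^ w[7]   # covers positions with bit 1 set
    w[4] = w[5] ^ w[6] ^ w[7]   # covers positions with bit 2 set
    w[0] = sum(w[1:]) % 2       # overall parity over positions 1..7
    return w


def decode(word):
    """Return (status, data): 'ok', 'corrected', or ('uncorrectable', None)."""
    w = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if w[pos]:
            syndrome ^= pos     # XOR of set positions; 0 for a clean word
    overall = sum(w) % 2        # 1 means an odd number of bits flipped
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        w[syndrome] ^= 1        # single flip at `syndrome` (0 = parity bit)
        status = "corrected"
    else:
        return "uncorrectable", None   # two flips: detected, not fixable
    return status, [w[3], w[5], w[6], w[7]]


if __name__ == "__main__":
    cw = encode([1, 0, 1, 1])
    one = cw.copy(); one[6] ^= 1
    two = cw.copy(); two[3] ^= 1; two[5] ^= 1
    print(decode(cw))    # ('ok', [1, 0, 1, 1])
    print(decode(one))   # ('corrected', [1, 0, 1, 1])
    print(decode(two))   # ('uncorrectable', None)
```

The more often bits flip, the more likely it is that two land in the same word, and then the decoder can only report an uncorrectable error, which matches the crash described here.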
 

alsenior

Member
Apr 19, 2016
A little update on this: I loaded all 96GB I have in my possession into another server with an X8DTi and had none of the issues, so it looks like the board is the problem. Now I just need to find out why.
 

unwind-protect

Active Member
Mar 7, 2016
Boston
Modern versions of memtest should deal with ECC automatically; that is why the option to turn it off is no longer there. You may also be able to turn it off in the BIOS.

Either way, memtest is only a partial RAM test. SuperPi and similar programs often find problems that aren't individual-cell based, timing problems in particular. That is why overclockers use SuperPi, not memtest.
 