Intel S2600 bit fade memory errors


slatfats

Member
Nov 3, 2014
42
15
8
Brisbane, Australia
Hi all

After collecting parts over the last few months to upgrade my ESXi / storage server, I'm getting close to swapping out the guts for a dual E5-2670v1 / 128GB PC3 setup.

After getting the RAM last week (second hand via eBay), I set it up with a single CPU on my workbench, installed the RMM4Lite and a SATA key, updated to the latest BIOS etc., and set about running the RAM through memtest86.

For those of you unfamiliar with memtest86: v7.3 works with a UEFI BIOS, and v4.3 works with an old-school (legacy) BIOS. v7.3 also adds ECC support.

Booting to v4.3, all tests ran fine.
Under v7.3, all tests ran fine except one - test 10, the bit-fade test. This test loads up the RAM with data, leaves it for 5 minutes, then checks to see whether the data has changed.
I tested a few different 8GB DIMMs from the same RAM purchase, an 8GB DIMM from a different purchase, and even a 4GB DIMM, and got the same error at the same RAM address every time. That points to the motherboard or CPU as the culprit rather than the RAM.
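
For anyone who wants to see the idea behind the bit fade test in code, here is a minimal sketch. It is not MemTest86's actual implementation: it works on an ordinary Python buffer instead of physical RAM, and the names `bit_fade_pass`, `bit_fade_test`, `during_wait` and `stray_write` are all made up for illustration. The all-zeros and all-ones patterns are what the test expects to read back; the real test waits about 5 minutes, the sketch defaults to no wait.

```python
import time
from typing import Callable, Dict, List, Optional


def bit_fade_pass(
    buf: bytearray,
    pattern: int,
    wait_seconds: float,
    during_wait: Optional[Callable[[bytearray], None]] = None,
) -> List[int]:
    """Fill buf with one byte pattern, wait, and return offsets that changed."""
    buf[:] = bytes([pattern]) * len(buf)
    if during_wait is not None:
        # Hypothetical hook: stands in for anything else (for example
        # firmware) writing into the region while the test is idle.
        during_wait(buf)
    time.sleep(wait_seconds)
    return [i for i, b in enumerate(buf) if b != pattern]


def bit_fade_test(
    size: int,
    wait_seconds: float = 0.0,
    during_wait: Optional[Callable[[bytearray], None]] = None,
) -> Dict[int, List[int]]:
    """Run one pass with all zeros and one with all ones."""
    buf = bytearray(size)
    return {
        pattern: bit_fade_pass(buf, pattern, wait_seconds, during_wait)
        for pattern in (0x00, 0xFF)
    }


def stray_write(buf: bytearray) -> None:
    """Simulated interference: overwrite 4 bytes at a fixed offset."""
    buf[1024:1028] = b"\xde\xad\xbe\xef"


if __name__ == "__main__":
    print(bit_fade_test(4096))                           # clean run
    print(bit_fade_test(4096, during_wait=stray_write))  # same offsets fail in both passes
```

The point of the `stray_write` case is that a writer which always hits the same location produces errors at the same address no matter what memory sits underneath, which matches the behaviour described above.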

I've posted on the PassMark MemTest86 forums and their admin is guessing at an issue in Intel's UEFI BIOS (see thread here: v4.3 vs v7.3 bit fade test - PassMark Support Forums). TL;DR: he thinks the UEFI BIOS is overwriting a section of RAM that it incorrectly thinks is free.

Just wondering two things:
1) Have any of you seen the same errors? If so, how did you fix the issue?
2) What BIOS versions are people running?

Thanks!
slatfats

Edit: Bit fade test, not bit flip test
 

lbjm

New Member
May 29, 2017
17
0
1
It would help to note the type of RAM you are using (unbuffered or registered, ECC or non-ECC). The brand, model, and size of the memory would also be helpful.
 

slatfats

Registered ECC, Samsung DIMMs. Tried multiple different sticks from different purchases, 1.5V and 1.35V, 8GB and 4GB DIMMs, the bit fade errors are consistent across all sticks - same number of bits, same memory address (at about the 1GB mark).
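
The reasoning here (same failing address regardless of which stick is installed) can be written down as a small check. This is only a sketch: `fault_follows` is a made-up helper, and the address `0x40000000` in the example is a placeholder for "about the 1GB mark", not the actual address from my test logs.

```python
from typing import Dict, Set


def fault_follows(errors_by_dimm: Dict[str, Set[int]]) -> str:
    """Guess whether errors follow the DIMM or the platform.

    errors_by_dimm maps a DIMM label to the set of failing addresses seen
    when that DIMM was tested on its own in the same slot.
    """
    if len(errors_by_dimm) < 2:
        return "inconclusive"  # one module alone cannot separate the cases
    failing = {dimm: addrs for dimm, addrs in errors_by_dimm.items() if addrs}
    if not failing:
        return "no errors"
    if len(failing) < len(errors_by_dimm):
        return "dimm"  # only some modules fail: suspect those modules
    address_sets = list(failing.values())
    common = set.intersection(*address_sets)
    if all(addrs == common for addrs in address_sets):
        return "platform"  # every module fails at exactly the same addresses
    return "inconclusive"


if __name__ == "__main__":
    # Placeholder data: three different sticks, one identical failing address.
    results = {
        "8GB stick, purchase 1": {0x40000000},
        "8GB stick, purchase 2": {0x40000000},
        "4GB stick": {0x40000000},
    }
    print(fault_follows(results))
```

A "platform" result only says the fault does not travel with the module. It does not distinguish between motherboard, CPU and firmware.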
 

slatfats

After reading about the issues people have had with downgrading the BIOS firmware on these boards, I'm starting to conclude that I'll just live with it. I'll have to boot in traditional (legacy) BIOS mode rather than EFI mode.

This in itself shouldn't be an issue... it's going to be used as an ESXi host and ESXi can obviously boot using the traditional BIOS.

I'll do some more testing before trusting it with my data though!
 

lbjm

I'm using Samsung M393A 16GB registered ECC DIMMs (1.5V) on Supermicro and ASRock boards without issue. I memtested a set of 4 just the other day without problems. Are you running the mixed-voltage memory together in the same system? Do you have another motherboard you can test the memory on? I am personally not fond of Intel-branded boards.


Edit: I used legacy BIOS mode myself rather than EFI.

Let us know how it goes.
 

slatfats

My initial test was with multiple DIMMs in place at once (but all identical models). I noticed the errors, so I dropped back to one DIMM at a time to try to narrow it down to a particular DIMM, and have been testing with one ever since.

(A slight lie: I did one more test with two DIMMs to see if the address where I was getting the errors would change. It didn't.)

Either way, I haven't mixed voltages or models of DIMMs.

I do have another server that I can test the memory on - I'll pull it out of storage to be 100% certain that it's not the RAM - but I do feel like I've pretty much concluded that already.
 

lbjm

But to be 100% sure you have to test in a board that is not Intel S2600. What are the odds that every module that was purchased from different people all have the same problem?
 

slatfats

Dragged out my other server. I chose two sticks at random that had given bit fade errors in memtest v7.3 on the S2600, and ran them in the other server using the same memtest USB stick. No errors in either memtest version.
So it's not the RAM.
 

kiteboarder

Active Member
May 10, 2016
101
48
28
45
For posterity:

I ran into this problem this week and found an Intel post about it:



How to Resolve the MemTest86 Error on Test 10

What am I seeing?

When you run MemTest86 testing software, Intel® Server Boards and Intel® Server Systems may pass all tests but report failure on Test 10 [Bit fade test].

Instead of the expected 00000000 or FFFFFFFF, the address may contain a different hexadecimal value.

Some pages allocated with the AllocatePages() service may be above 4 GB if memory is present above 4 GB. The UEFI driver may dynamically overwrite the addresses reserved by MemTest86.

This behavior is normal and doesn't indicate any hardware failure.

How to fix it.

Entering BIOS setup and setting Boot Options > EFI Optimized Boot > Enabled allows the Bit fade test to pass.


(I still see the error even with EFI optimized boot enabled...)
 