Memtest86+ errors under less than ideal conditions


adamjb

New Member
Jun 1, 2016
14
0
1
32
I have a new Supermicro Xeon D motherboard, with two new Samsung 32GB DDR4 sticks of RAM. I've been running Memtest86+ for the last 13 hours and have found 80 errors.

memtest.jpg

I'm wondering if the errors could be caused by something other than the RAM. The RAM and motherboard both came new from eBay, and appear to be new.

My testing setup is less than ideal, with the motherboard outside of a case resting on top of the box it came in, powered by a very old power supply, with a fan blowing across it.

IMG_0416.JPG

Temperatures as reported by IPMI are good.

Screen Shot 2016-06-23 at 8.52.45 PM.png

If I put the motherboard in a case with a new power supply and run the test again without errors, should I feel safe knowing that my RAM is probably good? Or at this point should I just replace the RAM?
 

PigLover

Moderator
Jan 26, 2011
3,186
1,545
113
The best way to tell is to move things around and see if the problem moves.

Retest with only one stick in slot A1. Then test with the other stick by itself. Did you get errors both times? If you have other DDR4 RAM, test the board with that, and if you have another board, test the memory there. The component that is actually bad (board or RAM) should readily become apparent.
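As a rough sketch of how to read the two single-stick runs, here is the logic in Python. The function name and the wording of the conclusions are illustrative assumptions, not a definitive diagnosis; it only covers the case where each stick is tested alone in the same slot.

```python
def suspect(stick_a_errors: bool, stick_b_errors: bool) -> str:
    """Guess where the fault lies after testing each stick alone in slot A1.

    Hypothetical helper: the conclusions are rules of thumb, not proof.
    """
    if stick_a_errors and stick_b_errors:
        # The errors stayed put when the RAM changed, so they follow the board side.
        return "board or test setup"
    if stick_a_errors:
        # The errors moved with one stick.
        return "stick A"
    if stick_b_errors:
        return "stick B"
    # Neither stick fails alone: the fault may need both sticks, or was transient.
    return "other slot or a transient problem"


print(suspect(stick_a_errors=True, stick_b_errors=False))
```

Swapping in known-good RAM or a second board, if you can borrow one, narrows it down further than this two-run check can.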

 

sullivan

New Member
Mar 27, 2016
25
16
3
If you are seeing Memtest errors and the addresses are all under 1MB then there's a good chance that this is actually just an issue with BIOS reserved memory or some other BIOS-related bug.

It looks like this is probably happening in your case, based on the screenshot. You are not seeing single-bit flips; something is trashing those areas of memory. It also looks like this was a burst of errors during a single pass, i.e. not something that keeps recurring.
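To make the distinction concrete, here is a minimal Python sketch that classifies one reported error by address and by how many bits differ between the expected and actual values. The function names and the sample values are made up for illustration; they are not taken from the screenshot.

```python
LOW_MEMORY_LIMIT = 0x100000  # 1 MB


def flipped_bits(expected: int, actual: int) -> int:
    """Count the bit positions where the expected and actual values differ."""
    return bin(expected ^ actual).count("1")


def classify(address: int, expected: int, actual: int) -> str:
    """Describe an error as a single-bit flip or wider corruption, and where it is."""
    bits = flipped_bits(expected, actual)
    if bits == 0:
        return "no mismatch"
    kind = "single-bit flip" if bits == 1 else "multi-bit corruption"
    region = "below 1 MB" if address < LOW_MEMORY_LIMIT else "above 1 MB"
    return f"{kind}, {region}"


# Hypothetical sample: a whole word overwritten in low memory.
print(classify(0x0009F000, 0xFFFFFFFF, 0x00000000))
```

A failing DRAM cell usually shows up as the first kind, often at the same address pass after pass. A burst of multi-bit mismatches in low memory points more toward something overwriting that region.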

I've seen this issue multiple times on different systems and it never turned out to be bad memory. Either the problem disappeared after some config change, or, after simply rebooting and restarting memtest, the test went on to pass for days at a time.

You can try using a newer or different version of Memtest (either from memtest86.com or memtest.org). Make sure you have the latest BIOS. You can try running with some or all of the additional CPU cores disabled. You might also want to try some alternate tests, like running Prime95.
 

adamjb

@PigLover, unfortunately I don't have any other DDR4 motherboards or RAM. I will try testing one stick at a time tomorrow.

@sullivan, my board is a Supermicro X10SDV-TLN4F and I am using Samsung ECC REG M393A4K40BB0-CPB RAM from the Supermicro compatibility chart. I will check my BIOS revision. What config change might fix this? Shouldn't registered ECC RAM not report errors? Does memtest disable ECC?
 

sullivan

First, I would make sure you have the latest BIOS. Just for good measure, clear the CMOS and reset the BIOS to its defaults.

Next, try running the Free Edition of PassMark MemTest86 from memtest86.com. You will need to follow the instructions on their web page for booting via UEFI:

MemTest86 - Download now!
PassMark MemTest86 - Memory Diagnostic Tool - Technical Information

I am a long-time fan of the free version of memtest86+ (from memtest.org), but the reality is that it hasn't been updated in about 4 years. This is why it shows "Chipset: Unknown, Memory Type: Unknown" on newer motherboards like yours. PassMark is trying hard to sell their product, but they are not too obnoxious about it, and the Free Edition seems to work fine and supports new hardware well.

I had a similar issue with memtest86+ low memory errors on a Supermicro X10SRL / E5-1650v3 system a few months ago. I retested with the PassMark version and it ran fine for days. I reran memtest86+ with a single core and it ran fine for days. One more try with memtest86+ and multiple cores, and it spat out low memory errors in the first couple of passes, then no more errors for days.

I recall having similar issues with some other X9 series motherboards in the past. I suspect some sort of bug in memtest86+ related to large core counts and/or buggy BIOS code. If you go digging through the memtest86+ source code, there is a description of how they test low memory that involves relocating the executable code on the fly:

memtest86-plus/README.background at master · wkatsak/memtest86-plus · GitHub

The address ranges they use are in low memory, between 0x2000 and 0x20000. I suspect that sometimes one core is still running and trashing memory while the main program is doing its relocation tricks, and this then shows up as an error in that memory range. I suspect the BIOS because there may also be some interaction with interrupt handling, starting and stopping threads, and getting threads in and out of protected mode, but I've never investigated further.
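If you want to check whether your failing addresses fall in that window, here is a small Python sketch. The helper names are hypothetical, the sample addresses are made up, and treating the range as including 0x2000 but excluding 0x20000 is my assumption.

```python
RELOC_LOW = 0x2000    # start of the low-memory window mentioned above
RELOC_HIGH = 0x20000  # end of the window (treated as exclusive, an assumption)


def in_relocation_window(address: int) -> bool:
    """Return True if the address lies inside the relocation window."""
    return RELOC_LOW <= address < RELOC_HIGH


def summarize(addresses: list[int]) -> tuple[int, int]:
    """Return (count inside the window, count outside the window)."""
    inside = sum(1 for a in addresses if in_relocation_window(a))
    return inside, len(addresses) - inside


# Hypothetical failing addresses copied by hand from a memtest screen.
sample = [0x2F00, 0x10000, 0x1FFFC, 0x9F000]
print(summarize(sample))
```

If nearly all of the errors land inside the window, that fits the relocation theory better than a bad DIMM, which would not be expected to fail only in that range.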