do you guys really run memtest86+ to 100% when bulk testing DIMMs?

Discussion in 'General Chat' started by BLinux, Nov 20, 2018.

  1. BLinux

    BLinux cat lover server enthusiast

    Joined:
    Jul 7, 2016
    Messages:
    2,346
    Likes Received:
    812
    When I buy a large batch of RAM from eBay, I immediately test it with memtest86+ in a test system with 16x DIMM slots. It takes 12+ hours to complete the test to 100% for 256GB of RAM. I'm beginning to wonder if it is worth the effort and energy. Would I not catch 99.9% of any problems within, say, the first hour or 2 of running memtest86+? The last few test patterns in memtest86+ also seem to be really slow... just wondering what your thoughts are on this? Has anyone looked at the cost / benefit analysis of running memtest to 100% vs. 50% or something like that?
     
    #1
  2. dandanio

    dandanio Member

    Joined:
    Oct 10, 2017
    Messages:
    60
    Likes Received:
    21
    I always run it to 100% at least once. I need to be sure of my DIMMs, and running it hot makes sure they are heat soaked. And in the grand scheme of things, what is 12 hours of MEMTEST?
     
    #2
  3. Blinky 42

    Blinky 42 Active Member

    Joined:
    Aug 6, 2015
    Messages:
    531
    Likes Received:
    189
    I try to do at least a full pass in heat soak before considering DIMMs, or full systems for that matter, good to go. Kick it off in the afternoon and it's done by the next morning unless it is a big system. Yes, 90% of the time it finds issues in < 30 min, but I do see fails late in the testing often enough to keep doing the full pass.
     
    #3
  4. BLinux

    BLinux cat lover server enthusiast

    Joined:
    Jul 7, 2016
    Messages:
    2,346
    Likes Received:
    812
    Well, at 200 W for 12 hrs, that's 2.4 kWh of energy. And that's just for 16 DIMMs. If I have a batch of 64 or more DIMMs, that's a couple of days. Is it still worth it?
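
The arithmetic in this post can be sketched as follows. This is a rough illustration that assumes one 16-slot test system run sequentially, a constant 200 W draw, and 12 hours per full pass regardless of DIMM size; the function name `memtest_cost` is made up for this example.

```python
import math

def memtest_cost(num_dimms, slots=16, hours_per_run=12.0, watts=200.0):
    """Return (runs, total_hours, total_kwh) for testing a batch on one system.

    Assumes runs happen back to back on a single test box (hypothetical setup)
    and that power draw is constant for the whole run.
    """
    runs = math.ceil(num_dimms / slots)        # partial batches still need a full run
    total_hours = runs * hours_per_run
    total_kwh = watts * total_hours / 1000.0   # W * h -> kWh
    return runs, total_hours, total_kwh

for n in (16, 64):
    runs, hours, kwh = memtest_cost(n)
    print(f"{n} DIMMs: {runs} run(s), {hours:.0f} h, {kwh:.1f} kWh")
```

For 64 DIMMs this gives 4 runs, 48 hours, and 9.6 kWh, which matches the "couple of days" estimate.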
     
    #4
  5. Magebarf

    Magebarf New Member

    Joined:
    Aug 15, 2018
    Messages:
    15
    Likes Received:
    3
    Have you ever experienced bad memory in a computer, any computer?

    If you're lucky, you get the apparent issues where it's clear something's wrong, and memory is the likely culprit.
    The nightmare is if you're struck with memory that only shows transient errors, and you never really can pinpoint that error exactly.

    I'd leave mine running for a full week if that's what it would take to complete that pass...
    Otherwise you really never know whenever that single bit f**ks you over and stores your encryption key wrong in memory, thus rendering all data lost.

    If we're talking ECC RAM and RDIMMs, we're in another league, but I still wouldn't bet on it without doing a full test.
     
    #5
  6. Blinky 42

    Blinky 42 Active Member

    Joined:
    Aug 6, 2015
    Messages:
    531
    Likes Received:
    189
    I guess the question is: are you testing and then storing the memory for future use, or putting it into final / production use basically right away?
    Unless the server is something for testing purposes and I don't mind if it barfs on me randomly in a week or 2, it makes more sense from my side to just test it when it comes in so it can be returned with less hassle if there is a problem. If it is then stored after being tested good when it came in, a basic pass to make sure the memory plays along in the final system config should be enough to roll with, but I am still doing a full pass somewhere along the line.
    If you are talking about huge volumes of memory you are turning over, then the workflow might be different: get a commercial tester (they must still make them?) or have a few victim test setups to burn through multiple batches at once.
     
    #6
  7. BLinux

    BLinux cat lover server enthusiast

    Joined:
    Jul 7, 2016
    Messages:
    2,346
    Likes Received:
    812
    @Magebarf I have experienced failed memory in servers several times over the years. of course, all ECC RAM. but honestly, very few of them are "mystery" errors, and most are pretty obvious when it happens. I don't think any amount of testing is going to "prevent" RAM modules from failing; they will fail at their usual failure rates, which is pretty infrequent anyway.

    i'm just saying, if I buy like 64 sticks of used RAM for a project, I want to test them enough to know they work decently, expecting that at some point in the future, some of them might fail during use. whether I test them for 1 hour, 3 hours, or the full 100% (around 12 hours), is it really going to significantly reduce my chances of having RAM related problems later on? Is the difference in 2 yrs something like 1 DIMM failure vs 5 DIMM failures? Intuitively, I just don't feel like it would make that much of a difference. But intuition can be wrong, so I'm asking: does anyone have data to show that spending 12 hrs @ 200 W for each set of 16x DIMMs (or whatever configuration and setup you have) really makes a meaningful difference to justify the time and power use? i mean, even thoroughly tested RAM modules can eventually fail, right?

    put another way, at a high level, I'm just starting to question if my running of memtest86+ for 12 hours to 100% completion is really a productive activity or is it mostly a waste of time and electricity?
     
    #7
  8. Magebarf

    Magebarf New Member

    Joined:
    Aug 15, 2018
    Messages:
    15
    Likes Received:
    3
    From my experience, purely anecdotal, the errors that show up can sometimes be quite systematic, and I've had some DIMMs showing constant failures on one of the last tests in Memtest, with nothing showing on any of the tests before that.

    I guess what you really want to achieve is to have as varied a set of scenarios tested as possible, but then again some of the tests do require quite a bit of time, as letting the memory rest is also a possible vector of issues (tested by the bit fade test).

    The only test I'd question the need for is the Rowhammer test, and that is, as far as I can recall, the most time-consuming one.

    I guess the use case of the RAM is going to be a better guide to how heavily you want to depend on it, and in turn how much time you'd spend on testing it.

    From my experience though, the only thing memtest is going to help you with is to ensure the statistical chance of a memory storing/reading a wrong value is low enough. I don't think it will really provide you with any data indicating expected lifetimes, unless of course it's DIMMs that are DOA or die during the test. :)

    And as I think I made clear in my first post, my own preference is this: after having been bitten by bad RAM in the early 2000s, the type of error where nothing really makes sense because there are just random events a few times a week where software fails due to executables being corrupted once loaded into memory, or files I was writing to disk ended up corrupted after I had saved them (or seemingly got corrupted while loading them from disk), I'd never ever rely on a computer where I did not give the RAM an initial Memtest run. Whether I accept a single 100% pass or do 4 passes, however, depends on the situation and on whether I have time to spare before I need to use it.
     
    #8
  9. Evan

    Evan Well-Known Member

    Joined:
    Jan 6, 2016
    Messages:
    2,855
    Likes Received:
    427
    If you're a seller and selling really large quantities, isn't it easier just to throw in a couple of spares?
    As for running the test on the receiving end, I guess it makes sense at burn-in time, but testing on receipt is a big effort for lots of DIMMs. Just a single pass maybe?
     
    #9
  10. Rain

    Rain Active Member

    Joined:
    May 13, 2013
    Messages:
    228
    Likes Received:
    73
    I generally run one full Memtest86+ pass on all new/used RAM I purchase. Usually I don't need to get a new system up and running imminently so taking a few days to make sure everything is tested to avoid annoying issues down the road is well worth it (especially if the system will be remote!).

    That said, I haven't bulk-ordered more than a few servers' worth of RAM at any one time. If I was spinning up a ton of servers and I knew there would be enough redundancy to deal with a failure or two, I'd skip testing and just make sure to purchase a few more spares than I normally would.
     
    #10
  11. BLinux

    BLinux cat lover server enthusiast

    Joined:
    Jul 7, 2016
    Messages:
    2,346
    Likes Received:
    812
    thanks guys for your thoughts. this is primarily for my own use and my own projects; although occasionally, i'll use my own stash for a client's project (i'm a consultant) if needed and bill them for it. of course, I'm buying used RAM off eBay (mostly) and one of the main goals is to validate that the RAM is good, so I can return it within the return window if it is not good. secondary goal to that, is to reduce the chances of having RAM related problems when my servers are deployed for a project. None of my projects involve life threatening situations or weapons systems these days; so nothing terrible will happen if a system fails, it would just be an inconvenience, perhaps delay in a project, and some annoyance from a client.

    i've been thinking about all your comments and my original question, and I think I probably didn't really make it clear and I've been thinking about it some more. so, here's my re-phrasing of the question:

    "if you regularly test RAM with memtest86, how often have you encountered RAM module defects that showed 0 errors during the first half (50%) of the tests, but started showing errors in the 2nd half of the tests? how often have you encountered defective RAM modules that were detected in the first half of the test?"

    To normalize your response, can you estimate, to an order of magnitude, how many DIMMs you tested during the period in which you encountered the above error condition(s)? For example, I've probably tested 100+ DIMMs in the last 12 months, and in that period I've encountered I think 4 defective DIMMs (definitely fewer than 10), but all 4 were detected during POST (before running memtest86); none were detected by memtest86 during the first or second half of the test.
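
One way to read a "0 found out of N tested" count like the one above is to ask how high the underlying rate could still plausibly be. The sketch below uses the standard exact upper bound for zero observed events; it assumes the DIMMs are independent draws from the same population, which a mixed pile of eBay lots may not be, and the function name is made up for this example.

```python
def zero_failure_upper_bound(n_tested, confidence=0.95):
    """Upper confidence bound on a per-DIMM rate when 0 of n_tested showed the event.

    Solves (1 - p) ** n_tested = 1 - confidence for p. Assumes independent,
    identically distributed DIMMs (an assumption, not something the thread states).
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n_tested)

# ~100 DIMMs tested, none flagged by memtest86 itself
bound = zero_failure_upper_bound(100)
print(f"95% upper bound on memtest-only defect rate: {bound:.2%}")
```

With roughly 100 DIMMs and no memtest-only detections, the bound comes out a little under 3%, so a sample of that size cannot rule out a memtest-only defect in every few dozen DIMMs.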
     
    #11
  12. Evan

    Evan Well-Known Member

    Joined:
    Jan 6, 2016
    Messages:
    2,855
    Likes Received:
    427
    Yes, nearly all major memory errors I have seen are detected during POST; otherwise they show up during operation as single-bit errors, but that's usually a long time after installation.
     
    #12
  13. Magebarf

    Magebarf New Member

    Joined:
    Aug 15, 2018
    Messages:
    15
    Likes Received:
    3
    Slightly more varied on my end;

    Some experiences (I'd say 2-3 modules) where the computer did not even reach BIOS, leaving LED or beep codes to decode in order to identify memory as the source.

    I do not think I've caught/noticed any errors during POST itself.

    With Memtest86, I've had 5-6 modules giving errors. Of these, 2 gave errors due to being run at too high a frequency on a machine where this isn't configurable (Mac Mini), and those errors were spread all over the tests, just as likely in the beginning as in the end. The rest, 3-4 modules, only showed errors in the second half of Memtest86.

    This is over the last 5 years, so I'm not that high-volume a user. :)
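
As a rough illustration of what these counts would mean for stopping at 50%, the sketch below computes the share of Memtest86-detected modules that only failed in the second half. It takes the counts in this post at face value (2 modules with errors spread across all tests, 3 or 4 failing late only); the helper name is made up, and the sample is far too small to generalize from.

```python
from fractions import Fraction

def late_only_share(spread_out, late_only):
    """Share of detected-bad modules that a run stopped at 50% would have passed."""
    return Fraction(late_only, spread_out + late_only)

# 2 modules failed throughout; 3 or 4 failed only in the second half
for late in (3, 4):
    share = late_only_share(2, late)
    print(f"{late} of {2 + late} detected modules failed late only: {float(share):.0%}")
```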
     
    #13