ServeTheHome's RAID MTTDL Calculator Bug and Suggestion Box

Discussion in 'STH Suggestions and Updates' started by Patrick, Aug 26, 2012.

  1. Patrick

    Patrick Administrator
    Staff Member

    Joined:
    Dec 21, 2010
    Messages:
    11,151
    Likes Received:
    4,103
    Starting a thread to discuss bugs in the RAID mean time to data loss (MTTDL) calculator on ServeTheHome. We do need help testing, so any assistance is greatly appreciated.
     
    #1
  2. mobilenvidia

    mobilenvidia Moderator

    Joined:
    Sep 25, 2011
    Messages:
    1,745
    Likes Received:
    40
    Bring it on.

    I can test before I put the drives into long-term storage mode.
     
    #2
  3. Patrick

    Patrick Administrator
    Staff Member

    Joined:
    Dec 21, 2010
    Messages:
    11,151
    Likes Received:
    4,103
    Found an "undocumented feature" this morning when migrating to production. For those wondering, this is something you can use to figure out ballpark data loss chances if you have, say, an 8-drive array and are choosing between RAID 5 and RAID 6.
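    For readers wondering what such a ballpark comparison looks like, here is a minimal Python sketch of the classic Markov-chain MTTDL approximation for a single array that tolerates p concurrent drive failures. It assumes independent, exponentially distributed drive failures and ignores unrecoverable read errors; it is not necessarily the exact model the STH calculator uses, and the function name `mttdl_hours` plus the 500,000-hour MTTF and 24-hour rebuild figures are illustrative assumptions.

```python
from math import prod

HOURS_PER_YEAR = 8766  # 365.25 days * 24 hours


def mttdl_hours(n_drives: int, parity: int, mttf_h: float, mttr_h: float) -> float:
    """Classic approximation for one array that survives `parity` concurrent
    drive failures (1 = RAID 5, 2 = RAID 6):

        MTTDL ~= MTTF^(p+1) / (N * (N-1) * ... * (N-p) * MTTR^p)

    Assumes independent exponential failures; ignores unrecoverable read errors.
    """
    if not 0 <= parity < n_drives:
        raise ValueError("need 0 <= parity < n_drives")
    # Failure-rate multipliers: any of N drives can fail first,
    # then any of the N-1 survivors during the rebuild window, and so on.
    falling = prod(n_drives - i for i in range(parity + 1))
    return mttf_h ** (parity + 1) / (falling * mttr_h ** parity)


# Hypothetical inputs: 8 drives, 500,000 h MTTF, 24 h rebuild window
raid5 = mttdl_hours(8, parity=1, mttf_h=500_000, mttr_h=24)
raid6 = mttdl_hours(8, parity=2, mttf_h=500_000, mttr_h=24)
print(f"RAID 5: {raid5 / HOURS_PER_YEAR:,.0f} years")
print(f"RAID 6: {raid6 / HOURS_PER_YEAR:,.0f} years")
```

    In this model the second parity drive multiplies MTTDL by exactly MTTF / ((N - 2) * MTTR), which is why RAID 6 comes out orders of magnitude ahead of RAID 5 for the same 8 drives.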
     
    #3
  4. Patrick

    Patrick Administrator
    Staff Member

    Joined:
    Dec 21, 2010
    Messages:
    11,151
    Likes Received:
    4,103
    #4
  5. Jeggs101

    Jeggs101 Well-Known Member

    Joined:
    Dec 29, 2010
    Messages:
    1,412
    Likes Received:
    200
    That is awesome! Holy cr@p! I haven't ever seen anything like that online. I know it isn't a full-blown simulation model, but it makes enough sense to be useful. Really, thanks for having this done.
     
    #5
  6. mobilenvidia

    mobilenvidia Moderator

    Joined:
    Sep 25, 2011
    Messages:
    1,745
    Likes Received:
    40
    Hmmm, it's working too well; I may have to rethink my RAID5 array and go RAID6 now.
    This calculator is costing me: I will need to get 2 more drives now.

    Was cheaper not to see this :)

    It's working well, and it makes one think and possibly rework a setup.
     
    #6
  7. gigatexal

    gigatexal I'm here to learn

    Joined:
    Nov 25, 2012
    Messages:
    2,470
    Likes Received:
    433
    Some description of what the table is telling you, or of how to use it, would be helpful for the uneducated like myself.
     
    #7
  8. Patrick

    Patrick Administrator
    Staff Member

    Joined:
    Dec 21, 2010
    Messages:
    11,151
    Likes Received:
    4,103
    Thanks for the feedback. Will work that into next rev.
     
    #8
  9. dwm

    dwm New Member

    Joined:
    Dec 9, 2012
    Messages:
    4
    Likes Received:
    1
    Just explaining the units would be helpful. I assume the second table is probability on a scale from 0 to 100 (i.e. percent chance)? Is that for loss or failure?

    Would be nice to see raidz1 and raidz2 in the tables even if the model is the same/similar as others, just to avoid newbie confusion ("Why raidz3 and no raidz2 or raidz1?").

    Thanks for the work!
     
    #9
  10. Patrick

    Patrick Administrator
    Staff Member

    Joined:
    Dec 21, 2010
    Messages:
    11,151
    Likes Received:
    4,103
    Will do on this front. It's on the to-do list.
     
    #10
  11. cactus

    cactus Moderator

    Joined:
    Jan 25, 2011
    Messages:
    797
    Likes Received:
    67
    3-way mirrors are not calculated correctly.
     
    #11
  12. Patrick

    Patrick Administrator
    Staff Member

    Joined:
    Dec 21, 2010
    Messages:
    11,151
    Likes Received:
    4,103
    Thanks for the tip. Will work on that also.
     
    #12
  13. Thatguy

    Thatguy New Member

    Joined:
    Dec 30, 2012
    Messages:
    45
    Likes Received:
    0
    The 'Volumes' option does not work the way I think it should, at least for raidz3.

    If, for example, I have 36 drives arranged as four 8-drive raidz3 vdevs plus 4 hot spares, I should be able to say that my 36-drive raidz3 setup is 4 volumes, or something like that.
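    What is being asked for here can be sketched under a simple model: if a pool stripes data across several independent vdevs, losing any one vdev loses pool data, so with exponential failures the loss rates add and the pool MTTDL is the per-vdev MTTDL divided by the number of vdevs. The sketch below assumes that model, treats hot spares only as a shorter rebuild window (MTTR) rather than modelling them, and uses hypothetical names (`vdev_mttdl_hours`, `pool_mttdl_hours`) and inputs; it is not how the calculator's 'Volumes' field is actually implemented.

```python
from math import prod


def vdev_mttdl_hours(n_drives: int, parity: int, mttf_h: float, mttr_h: float) -> float:
    """Approximate MTTDL of one vdev that survives `parity` concurrent failures
    (raidz3 -> parity=3). Ignores unrecoverable read errors."""
    falling = prod(n_drives - i for i in range(parity + 1))
    return mttf_h ** (parity + 1) / (falling * mttr_h ** parity)


def pool_mttdl_hours(vdevs: int, drives_per_vdev: int, parity: int,
                     mttf_h: float, mttr_h: float) -> float:
    """Pool striped over independent vdevs: any vdev loss loses pool data,
    so failure rates add and MTTDL_pool = MTTDL_vdev / vdevs."""
    return vdev_mttdl_hours(drives_per_vdev, parity, mttf_h, mttr_h) / vdevs


# 36 bays = 4 x 8-drive raidz3 vdevs + 4 hot spares (spares not counted here)
pool = pool_mttdl_hours(4, 8, parity=3, mttf_h=500_000, mttr_h=24)
print(f"{pool / 8766:.3e} years")
```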

    Maybe I'm just lazy :)
     
    #13
  14. Maltz

    Maltz New Member

    Joined:
    Jan 7, 2013
    Messages:
    1
    Likes Received:
    0
    I LOVE this calculator!!

    But... I think it may be VASTLY underrating the importance of the uncorrectable bit error rate (UBER). Take this extreme example:

    RAID5
    MTBF: 36.5k hrs
    UBER: 10^6
    2TB drives
    4 drives
    1 volume
    15MB/s rebuild rate

    Now, unless I'm misreading the table, it's estimating 59.64 years before data loss for a RAID5 array. That doesn't make sense at all. When the first HD dies (~4 years) with an UBER of 10^6, you're pretty much guaranteed to run into thousands of unrecoverable errors while reading the 6TB required to rebuild the array. Then you've lost data. The UBER isn't that important while the RAID is functioning, but in a degraded state it's VERY important on large arrays like that.

    Am I misunderstanding the table? Or is the uncorrectable error rate during a rebuild not being taken into account properly?

    Great work though! This is a tremendously helpful tool, and quite a bit of googling kept taking me back here. lol All the more reason it needs to be right, though. :)
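    To put numbers on this scenario, the sketch below assumes that the "UBER: 10^6" input means one unrecoverable read error (URE) per 10^6 bits read, that errors occur independently, that "2TB" means 2 x 10^12 bytes and "15MB/s" means 15 x 10^6 bytes per second, and that rebuilding a 4-drive RAID5 means reading the 3 surviving drives in full. The helper names are hypothetical, and the calculator may model UBER differently.

```python
import math


def rebuild_read_risk(surviving_drives: int, drive_bytes: float, bits_per_error: float):
    """Return (expected UREs, probability of at least one URE) for a full rebuild read.

    Assumes independent errors at a rate of 1 per `bits_per_error` bits, so
    P(no URE) = (1 - 1/bits_per_error) ** bits_read.
    """
    bits_read = surviving_drives * drive_bytes * 8
    expected_errors = bits_read / bits_per_error
    # log1p/expm1 keep precision when 1/bits_per_error is tiny (e.g. 1e-14)
    p_at_least_one = -math.expm1(bits_read * math.log1p(-1.0 / bits_per_error))
    return expected_errors, p_at_least_one


def rebuild_hours(drive_bytes: float, rebuild_bytes_per_s: float) -> float:
    """Time to rewrite one replacement drive at a constant rebuild rate."""
    return drive_bytes / rebuild_bytes_per_s / 3600


TB = 1e12
print(rebuild_hours(2 * TB, 15e6))  # ~37 h degraded window
print(rebuild_read_risk(3, 2 * TB, bits_per_error=1e6))
print(rebuild_read_risk(3, 2 * TB, bits_per_error=1e14))
```

    Reading 6TB is 4.8 x 10^13 bits, so at one error per 10^6 bits you would expect about 4.8 x 10^7 errors and a URE is essentially certain. Even at the common consumer-drive spec of one error per 10^14 bits, the chance of at least one URE during that rebuild works out to about 38% under these assumptions.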
     
    #14
  15. nitrobass24

    nitrobass24 Moderator

    Joined:
    Dec 26, 2010
    Messages:
    1,072
    Likes Received:
    125
    That seems about right to me. Granted, MTTDL is not the most accurate representation, but it should be useful for relative comparisons between RAID levels, MTBFs, etc.

    With only 3 drives in a degraded state, you are not likely to encounter a URE. Take those same numbers and bump the number of drives up to 10, and you will see that it drastically drops to less than 3 years.

    Also, something to keep in mind: no one, not even the HDD manufacturers, has a clue about MTBF or UREs. If it were really scientific, do you think we would have such perfectly round numbers? No way! :)
    A bunch of engineers, marketing, and accounting people got together, took the rough crap data they had, and came up with something that marketing can use and accounting/finance can use to set price points and determine warranties without putting huge liabilities on their books.

    My brother-in-law works for a large semiconductor manufacturer here in Dallas, TX. Even he will tell you they test and test, but at the end of the day it's just an extrapolation of a few algorithms to come up with a pretty good guess. Other people take this guess and adjust it to work in the market and on the books. The chips they make for TVs are the same ones they sell to the military, but the military ones have a higher MTBF and a better SLA/warranty... all for a higher price. Just how the world works.
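    One way to see why adding drives hurts so much is a first-order sketch that combines both failure paths: after the first of N drives fails (rate N / MTTF), the rebuild fails if either a second drive dies within the rebuild window (probability about (N - 1) * MTTR / MTTF) or a URE turns up while reading the N - 1 survivors. This is a common back-of-the-envelope model assumed here for illustration, not necessarily the formula the calculator uses, so its outputs will not match the table; the 10^14-bit UBER, 37-hour MTTR, and helper name are hypothetical inputs.

```python
import math

HOURS_PER_YEAR = 8766


def raid5_mttdl_hours(n: int, mttf_h: float, mttr_h: float,
                      drive_bytes: float, bits_per_error: float) -> float:
    """First-order RAID5 MTTDL including unrecoverable read errors (UREs)."""
    bits_read = (n - 1) * drive_bytes * 8          # survivors are read in full
    p_ure = -math.expm1(bits_read * math.log1p(-1.0 / bits_per_error))
    p_second_failure = (n - 1) * mttr_h / mttf_h   # small-probability approximation
    p_rebuild_fails = min(1.0, p_second_failure + p_ure)
    return mttf_h / (n * p_rebuild_fails)


for n in (4, 10):
    years = raid5_mttdl_hours(n, mttf_h=36_500, mttr_h=37,
                              drive_bytes=2e12, bits_per_error=1e14) / HOURS_PER_YEAR
    print(f"{n} drives: {years:.2f} years")
```

    With the URE term at zero this reduces to the classic MTTF^2 / (N * (N - 1) * MTTR); more drives raise both the rate of first failures and the amount of data that must be read back, so MTTDL falls quickly with N.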
     
    #15
  16. Patrick

    Patrick Administrator
    Staff Member

    Joined:
    Dec 21, 2010
    Messages:
    11,151
    Likes Received:
    4,103
    Planning to revamp this weekend. Off to CES. Thanks for the feedback and keep it coming!
     
    #16
  17. matt_garman

    matt_garman Active Member

    Joined:
    Feb 7, 2011
    Messages:
    200
    Likes Received:
    35
    Is MTTDL assumed to follow a normal distribution (i.e. a bell curve)? Looking at the numbers, "mean" to me means I have a 50% chance of data loss after X years, depending on the parameters I input. What I'd be interested in is the standard deviation, and also higher "confidence" tiers.

    In other words, 50/50 odds doesn't mean much to me. I'd like to know for how many years (or months or days) I have, e.g., a 75%, 90% and 99% chance of no data loss.
     
    #17
  18. Patrick

    Patrick Administrator
    Staff Member

    Joined:
    Dec 21, 2010
    Messages:
    11,151
    Likes Received:
    4,103
    Poisson distribution. An updated MTTDL model that should make this clearer is coming next week.
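    If data-loss events are modeled as a Poisson process with rate 1 / MTTDL, the time to first loss is exponentially distributed rather than bell-shaped. That answers the confidence-tier question directly: P(no loss by t) = exp(-t / MTTDL), so the horizon with confidence c of no loss is t = -MTTDL * ln(c). Note that the mean is not the 50% point: the median is MTTDL * ln 2 (about 0.69 * MTTDL), and the chance of loss by t = MTTDL is about 63%. A minimal sketch, with a hypothetical helper name and the 59.64-year figure from earlier in the thread used only as sample input:

```python
import math


def horizon_with_confidence(mttdl_years: float, confidence: float) -> float:
    """Longest time t such that P(no data loss by t) >= confidence,
    assuming exponential time-to-loss: P(no loss by t) = exp(-t / MTTDL)."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return -mttdl_years * math.log(confidence)


for c in (0.75, 0.90, 0.99):
    print(f"{c:.0%} chance of no loss for {horizon_with_confidence(59.64, c):.2f} years")
```

    For an exponential distribution the standard deviation equals the mean, which is another reason the MTTDL figure alone says little about near-term risk.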
     
    #18
  19. matt_garman

    matt_garman Active Member

    Joined:
    Feb 7, 2011
    Messages:
    200
    Likes Received:
    35
    Another question. With Quantity of Disks = 6, the numbers for RAID1 and RAID10 look the same. Is that right?

    I guess the first question is: what does RAID1 mean with 6 disks? Does that mean three independent duplicate-copy sets? Or does that mean one set with 6x redundancy? If I reduce the number of disks to 2, the RAID1 MTTDL goes up by about a factor of three...

    For that matter, "six disk" RAID10 is ill-defined, as you could stripe across two triple-redundant RAID1 sets. :) But I'll assume the calculator uses the traditional RAID10 of striping across two-way mirrors.
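    Under the interpretation that 6-disk RAID1 means three independent two-way mirror pairs, the numbers in the question fall out of a simple model: data is lost as soon as any one pair loses both copies, so with exponential failures the per-pair loss rates add and the 6-disk MTTDL is one third of the 2-disk figure. That would also make "6-disk RAID1" and traditional RAID10 identical, since striping adds no redundancy. The sketch below assumes that model and also shows the alternative of two three-way mirror sets for contrast; function names and inputs are hypothetical.

```python
from math import factorial


def mirror_set_mttdl_hours(copies: int, mttf_h: float, mttr_h: float) -> float:
    """One mirror set loses data only if all `copies` drives fail before a
    rebuild finishes: MTTDL ~= MTTF^c / (c! * MTTR^(c-1))."""
    return mttf_h ** copies / (factorial(copies) * mttr_h ** (copies - 1))


def striped_mirrors_mttdl_hours(n_disks: int, copies: int,
                                mttf_h: float, mttr_h: float) -> float:
    """RAID10-style layout: n_disks / copies independent mirror sets; any one
    set failing loses data, so the MTTDL divides by the number of sets."""
    if n_disks % copies:
        raise ValueError("n_disks must be a multiple of copies")
    return mirror_set_mttdl_hours(copies, mttf_h, mttr_h) / (n_disks // copies)


two_disk = mirror_set_mttdl_hours(2, mttf_h=500_000, mttr_h=24)
six_as_pairs = striped_mirrors_mttdl_hours(6, 2, mttf_h=500_000, mttr_h=24)
six_as_triples = striped_mirrors_mttdl_hours(6, 3, mttf_h=500_000, mttr_h=24)
print(two_disk / six_as_pairs)        # ~3: one third of the 2-disk MTTDL
print(six_as_triples / six_as_pairs)
```

    In this model the two-triple-mirror layout beats three two-way pairs by a factor of MTTF / (2 * MTTR).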
     
    #19
  20. Ron Dennison

    Ron Dennison New Member

    Joined:
    Aug 9, 2013
    Messages:
    1
    Likes Received:
    0
    When "other" is selected for MTBF, the calculated data seems to fall back to a strange default that ignores whatever is entered in the "Enter # for MTBF (base 10): " box.
     
    #20