ServeTheHome's RAID MTTDL Calculator Bug and Suggestion Box

souporman

New Member
Jul 1, 2015
When "Other" is selected for MTBF, the calculated data seems to fall back to a strange default that is insensitive to whatever is entered in the "Enter # for MTBF (base 10):" box.
Sorry for the necro, but the link on the calculator to the proper forum page doesn't go anywhere. The issue of "Other" displaying no data still exists.
 

Patrick

Administrator
Staff member
Dec 21, 2010
Sorry for the necro, but the link on the calculator to the proper forum page doesn't go anywhere. The issue of "Other" displaying no data still exists.
I think we just fixed the <3 disk issue and the Other should be working now. Let me know if it does not.

Forum page I should be able to fix tomorrow morning.
 

souporman

New Member
Jul 1, 2015
I think we just fixed the <3 disk issue and the Other should be working now. Let me know if it does not.

Forum page I should be able to fix tomorrow morning.
Hmm, perhaps I'm just not using it or understanding it right. If I switch it to "Other" and put any number in for my MTBF, the results in the chart below are always the same.
 

i386

Well-Known Member
Mar 18, 2016
Raid-z = raid 5
Raid-z2 = raid 6
Or did I miss something about the zfs implementations?
 

Evan

Well-Known Member
Jan 6, 2016
Raid-z = raid 5
Raid-z2 = raid 6
Or did I miss something about the zfs implementations?
As far as I know there is no difference, so that's correct. At least that's what I have always used; maybe I have been wrong.

These days for work it's always mirrors or a distributed RAID-1.
For home I've mostly been using just straight high-quality SSDs with backup to a disk pool for fast recovery. Maybe I am living very dangerously, but so far no issues.
I did think about doing, say, a 4-disk RAID-Z all-SSD pool though, to make reasonable use of the SSDs.
 

PCBONEZ

New Member
Nov 1, 2017
Raid-z = raid 5
Raid-z2 = raid 6
Or did I miss something about the zfs implementations?
Clearly you did.
Usable capacity numbers are the same, but the reliability numbers are different.

Using the same drive set for assumptions (**)...
(Assumptions based on 6x 4TB enterprise drives, IIRC.)
RAID-6 MTTDL: 8.7 x 10^11 hr ( or ~ 870,000,000,000 hrs )
RAID-Z2 MTTDL: 1.3 x 10^12 hr (~ 1,300,000,000,000 hrs )
Z2 has an advantage of ~ 430,000,000,000 more hours MTTDL.
IOW, Z2 is -supposedly- about a 50% improvement over RAID-6 as far as MTTDL.

(**) In different calculators as no one seems to bother to do both conventional RAID and ZFS in the same calculator.
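For reference, numbers in this range typically come from the classic Markov-model MTTDL approximations for RAID 5 and RAID 6. A sketch in Python with illustrative inputs only (6 drives, 1.2M h MTBF, and an assumed 24 h rebuild time; the calculators quoted above almost certainly use different rebuild times and extra failure terms, so these figures won't match theirs):

```python
# Classic Markov-chain MTTDL approximations for RAID 5 and RAID 6.
# These ignore unrecoverable read errors, controller/PSU failures, etc.,
# which is part of why real calculators disagree with each other.

def mttdl_raid5(n_disks, mtbf_h, mttr_h):
    """Data is lost when a 2nd disk dies during a rebuild window."""
    return mtbf_h**2 / (n_disks * (n_disks - 1) * mttr_h)

def mttdl_raid6(n_disks, mtbf_h, mttr_h):
    """Data is lost when a 3rd disk dies while two rebuilds are pending."""
    return mtbf_h**3 / (n_disks * (n_disks - 1) * (n_disks - 2) * mttr_h**2)

# Illustrative inputs: 6 drives, 1.2M h MTBF, 24 h rebuild (assumptions).
n, mtbf, mttr = 6, 1.2e6, 24.0
print(f"RAID-5 MTTDL: {mttdl_raid5(n, mtbf, mttr):.2e} h")  # 2.00e+09
print(f"RAID-6 MTTDL: {mttdl_raid6(n, mtbf, mttr):.2e} h")  # 2.50e+13
```

Because rebuild time appears squared in the RAID-6 denominator, small differences in assumed rebuild time swing the result by orders of magnitude, which is why calculators with unstated assumptions disagree so widely.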

I'm just trying to learn this stuff.
Been using RAID-1 + BU (on enterprise-class controllers and drives) for ages, but a recent close call (didn't lose anything) has me wanting more advanced RAID. Something that can handle two drive failures, to cover me during the longer rebuild times.

Incomplete calculators everywhere (all with different, and not necessarily stated, assumptions) aren't helping me reach any decisions.
I thought this one (reliability calculator) was finally going to give a fair (same-assumption) reliability comparison, but Z2 (and Z1) go *poof* when the results come up.
.
 

Patrick

Administrator
Staff member
Dec 21, 2010
If you are thinking 430B hours on a RAID-Z2 6x 4TB array is a reasonable figure, I would suggest you look at your assumptions.

On triple parity (e.g. RAID-Z3) the issue is not disk failure. Items like capacitors failing, HBAs failing, software failing, etc. become a major part of the reliability calculations. When we did the calculator years ago, most of those models took 15 minutes or so on a dual-socket system to come up with reliability figures.

Here is an old one (7+ years) but still worth a read: The RAID Reliability Anthology - Part 1 - The Primer
 

PCBONEZ

New Member
Nov 1, 2017
I'm a retired QA inspector. Electronic Control Systems for Nuc plants.
Ultimately QA over whole plants and their repair facilities. (Not just the electronics gear.)
What you did with a dual socket system I used to do with a TI-30. But not in 15 minutes.
(Just sayin' I know what the numbers are for.)
Unfortunately I don't know how to program or I'd just build the complete calculator I wanna see.

If you are thinking 430B hours on a RAID-Z2 6x 4TB array is a reasonable figure, I would suggest you look at your assumptions.
You appear to be confusing 'reasonable' with 'practical'.

Is the 1.2M hours MTBF (137 years 24/7) you hard coded into your calculator a practical (or reasonable) number?
Does anyone actually expect a platter drive to last (on average) 137 years? - I don't think so.
Such numbers aren't meant to be reasonable or practical. They are meant to do comparisons.
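The point about 137 years can be made concrete: dividing MTBF by hours per year gives that naive "lifetime", but manufacturers intend MTBF as a fleet failure-rate statement; the equivalent annualized failure rate (AFR) is a modest fraction of a percent. A quick conversion:

```python
# MTBF is a population statistic, not a per-drive lifetime: under the
# exponential failure model, AFR ~= hours_per_year / MTBF for small rates.
HOURS_PER_YEAR = 8766  # 365.25 days * 24 h

mtbf_h = 1.2e6  # the 1.2M h spec discussed in this thread
print(f"Naive 'lifetime': {mtbf_h / HOURS_PER_YEAR:.0f} years")        # ~137
print(f"Implied AFR:      {HOURS_PER_YEAR / mtbf_h:.2%} per drive-year")  # ~0.73%
```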

Also, that's not what I said.
Z2's MTTDL is 430B hours - more than RAID-6's MTTDL.
Your own calculator comes up with 870B hours MTTDL for RAID-6.
If 430B hours MTTDL is 'unreasonable' (as you just suggested) then clearly a figure twice that (spit out by YOUR calculator) is as well.

But that isn't the point.
The point (and response to the question I was asked directly) is that Z1 & Z2 are NOT the same as 5 & 6 respectively in reliability calculations.

And thus I agree with cinedigital. - They should be included.
Particularly in light of how popular they have become.


For those on here: Simple MTTDL RAID Reliability Calculator - Beta. Would love feedback.
Z1 and Z2 should be included. - That's my feedback.
.
 

PCBONEZ

New Member
Nov 1, 2017
I would think that STH would benefit from having the only fully complete calculator.
 

MiniKnight

Well-Known Member
Mar 30, 2012
@PCBONEZ why are Z1 and Z2 different in how they fail? Do you have the math behind what you're saying? I've always been told they are essentially the same since they use a similar parity setup and fail in the same way.

I've always assumed the "hard coded" 1.2M MTBF is because that was a common drive manufacturer spec along with the other options on that drop down. Well except for other because then you can put whatever you want in there. Maybe I'm wrong.
 

PCBONEZ

New Member
Nov 1, 2017
As I said earlier I'm just learning this RAID-Z(x) stuff myself and I'm not a programmer.

As I understand it, the RAID-Z levels use system RAM/CPU and processing routines (which RAID-5/6 have no equivalents to) to detect and correct data errors that RAID-5/6 miss completely. (Even using enterprise-class hardware RAID-5/6 controllers.)

Basically the Z(s) check for and fix more kinds of errors. Many "on the fly". Fewer errors = less chance of data loss.
That's my (non-programmer) interpretation of what I've been reading.
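A toy sketch (not ZFS code; all names here are made up for illustration) of the end-to-end check being described: a checksum is stored at write time, every read is verified against it, and on a mismatch the data is repaired from a redundant copy, which is the "self-healing" behavior plain RAID-5/6 lacks:

```python
import hashlib

def write_block(data: bytes):
    # Store a checksum alongside the data (in ZFS it actually lives in
    # the parent block pointer, not next to the block itself).
    return data, hashlib.sha256(data).digest()

def read_block(data: bytes, stored_sum: bytes, redundant_copy: bytes):
    # Verify on every read; on mismatch, repair from redundancy.
    if hashlib.sha256(data).digest() == stored_sum:
        return data
    # Plain RAID-5/6 has no per-block checksum, so it would return the
    # silently corrupted data here without noticing.
    return redundant_copy

data, csum = write_block(b"payload")
rotten = b"paylOad"  # simulated bit-rot: one flipped character
assert read_block(rotten, csum, b"payload") == b"payload"
```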

.....
While I'm not a programmer, I am a tech.
Post retirement from my previously mentioned job I spent 12+ years repairing motherboards at component level.
(As in O'scope and soldering equipment. Removing/replacing chips & caps & such. LOTS of caps.)
I look at computers from the viewpoint of an Electronics Tech.

One of my own questions is:
With RAID-Z(x) being so RAM intensive how big is the difference in power use between RAID 5/6 and Z1/Z2?
Seems obvious that Z1/Z2 would use more power, but how much more?
.
 

Patrick

Administrator
Staff member
Dec 21, 2010
One of my own questions is:
With RAID-Z(x) being so RAM intensive how big is the difference in power use between RAID 5/6 and Z1/Z2?
Seems obvious that Z1/Z2 would use more power, but how much more?
.
More insofar as you may add additional RAM modules which consume power. By the time you have a decent sized array, a few GB of RAM is a small factor. Perhaps a good discussion for another thread.

On the calculator topic:

The ZFS scrubbing/ECC is not (really) going to help you too much in this type of reliability model. A faster rebuild time will, as that limits the amount of time you are exposed to a drive failure. If you are exposed during disk failure(s), ZFS does not necessarily help you if you have an undetected disk error before you get redundancy back.

The main issue is what happens when a second drive fails during the RAID-Z rebuild. No amount of data scrubbing helps when you have two drives die during an array rebuild.

You are right that it is not an exact match for RAID-Z1 and RAID-Z2, but from what I was told by the reliability folks at a large storage vendor, for this type of model it is a close enough approximation to use RAID 5 and RAID 6.
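The point about rebuild time can be made concrete: in the textbook RAID-6 approximation, MTTDL scales with 1/MTTR², so halving the rebuild window roughly quadruples the modeled MTTDL. A sketch with illustrative inputs (not the STH calculator's actual model):

```python
# Textbook RAID-6 MTTDL approximation: rebuild time (MTTR) appears
# squared in the denominator, so it dominates the result.
def mttdl_raid6(n, mtbf_h, mttr_h):
    return mtbf_h**3 / (n * (n - 1) * (n - 2) * mttr_h**2)

n, mtbf = 6, 1.2e6  # assumed inputs: 6 drives, 1.2M h MTBF
for mttr in (12.0, 24.0, 48.0, 96.0):
    print(f"rebuild {mttr:5.0f} h -> MTTDL {mttdl_raid6(n, mtbf, mttr):.2e} h")
```

Each doubling of rebuild time cuts the modeled MTTDL by 4x, which is why faster rebuilds matter more in these models than any amount of scrubbing.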
 

PCBONEZ

New Member
Nov 1, 2017
More insofar as you may add additional RAM modules which consume power. By the time you have a decent sized array, a few GB of RAM is a small factor. Perhaps a good discussion for another thread.
Depends on the kind of RAM.
Some FBDIMMs use in excess of 20 W per module at full load.
This would be the only reason I moved on from my beloved dual LGA771 boards.
Well, except the one I'm using right now... :D

You are right that it is not an exact match for RAID-Z1 and RAID-Z2, but from what I was told by the reliability folks at a large storage vendor, for this type of model it is a close enough approximation to use RAID 5 and RAID 6.
As I showed earlier comparing Z(x) calculator numbers to your calculator's numbers, the difference is about 50%.
I would not call that close.

I will give you that the numbers for RAID-6 are so outstanding that 50% better and the Law of Diminishing Returns may make the 50% a non-issue for many people.....
... But that still doesn't make them close or equivalent.

I know one of the reasons the Z's fare better is that they have protections against bit-rot and 5/6 have none.
I have no idea how -much- that affects the numbers. Just an example of why they aren't the same.

Vendor = Salesman.
Never trust a Salesman. They most always have undisclosed motives.
(And often knowledge deficits about things they don't sell.)
.
 

PCBONEZ

New Member
Nov 1, 2017
I've always assumed the "hard coded" 1.2M MTBF is because that was a common drive manufacturer spec along with the other options on that drop down. Well except for other because then you can put whatever you want in there. Maybe I'm wrong.
I've been looking at Seagate/WD NAS type drives which all seem to be 1.0M MTBF.
WD RE4 (which I have a lot of) are 1.2M MTBF.
The RE4 apparent replacements (WD Gold) are listed as between 2.0M and 2.5M MTBF

I missed 'other' and the drop-down that allows entering your own number there - so thanks for the tip.
.
 

Patrick

Administrator
Staff member
Dec 21, 2010
@Patrick can you post the formula that's used in the calculator?
It is essentially what is in the article on it.

Some of the folks debating the calculations there were also the folks doing IEEE papers for big vendors at the time. I actually had lunch with a few of them back then to get and incorporate feedback.

The better models take into account more than just disk failures but are way more complex albeit more accurate.
 

i386

Well-Known Member
Mar 18, 2016
The better models take into account more than just disk failures but are way more complex albeit more accurate.
I know what you mean. After making the post I used Google and found a paper with disk, PSU/enclosure, and various other failure rates in the MTTDL formula :D
 

Patrick

Administrator
Staff member
Dec 21, 2010
I know what you mean. After making the post I used Google and found a paper with disk, PSU/enclosure, and various other failure rates in the MTTDL formula :D
Decent chance I spoke to that person.

One of the conversations I had was along the lines of "well I can license you my model, on a dual 4 core Xeon system it runs fast, only 15-30 minutes if you cut down the inputs."

Better accuracy, but I figured nobody wanted to wait that long for an answer. Even if it took 10 minutes on a Xeon D-1541, I would still need 2-3 nodes set up just to run the MTTDL calculator for STH.