ServeTheHome's RAID MTTDL Calculator Bug and Suggestion Box

souporman

New Member
Jul 1, 2015
When "Other" is selected for MTBF, the calculated data seems to fall back to a strange default that is insensitive to whatever is entered in the "Enter # for MTBF (base 10):" box.
Sorry for the necro, but the link on the calculator to the proper forum page doesn't go anywhere. The issue of "Other" displaying no data still exists.
 

Patrick

Administrator
Staff member
Dec 21, 2010
Sorry for the necro, but the link on the calculator to the proper forum page doesn't go anywhere. The issue of "Other" displaying no data still exists.
I think we just fixed the <3 disk issue and the Other should be working now. Let me know if it does not.

Forum page I should be able to fix tomorrow morning.
 

souporman

New Member
Jul 1, 2015
I think we just fixed the <3 disk issue and the Other should be working now. Let me know if it does not.

Forum page I should be able to fix tomorrow morning.
Hmm, perhaps I'm just not using it or understanding it right. If I switch it to "Other" and put any number in for my MTBF, the results in the chart below are always the same.
 

i386

Well-Known Member
Mar 18, 2016
Raid-z = raid 5
Raid-z2 = raid 6
Or did I miss something about the zfs implementations?
 

Evan

Well-Known Member
Jan 6, 2016
Raid-z = raid 5
Raid-z2 = raid 6
Or did I miss something about the zfs implementations?
As far as I know there is no difference, so that's correct. At least that's what I have always used; maybe I have been wrong.

These days for work it's always mirrors or a distributed RAID-1.
For home I've mostly been using just straight high-quality SSDs with backup to a disk pool for fast recovery. Maybe I am living very dangerously, but so far no issues.
I did think about doing, say, a 4-disk RAID-Z all-SSD pool though, to make reasonable use of the SSDs.
 

PCBONEZ

New Member
Nov 1, 2017
Raid-z = raid 5
Raid-z2 = raid 6
Or did I miss something about the zfs implementations?
Clearly you did.
Usable capacity numbers are the same, but the reliability numbers are different.

Using the same drive set for assumptions (**)...
(Assumptions based on 6x 4TB enterprise drives, IIRC.)
RAID-6 MTTDL: 8.7 x 10^11 hr ( or ~ 870,000,000,000 hrs )
RAID-Z2 MTTDL: 1.3 x 10^12 hr (~ 1,300,000,000,000 hrs )
Z2 has an advantage of ~ 430,000,000,000 more hours MTTDL.
IOW, Z2 is -supposedly- about a 50% improvement over RAID-6 as far as MTTDL.

(**) In different calculators as no one seems to bother to do both conventional RAID and ZFS in the same calculator.
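For reference, numbers in this range typically come from the classic Markov-model MTTDL approximations for RAID 5 and RAID 6. A sketch in Python with illustrative inputs only (6 drives, 1.2M h MTBF, and an assumed 24 h rebuild time; the calculators quoted above almost certainly use different rebuild times and extra failure terms, so these figures won't match theirs):

```python
# Classic Markov-chain MTTDL approximations for RAID 5 and RAID 6.
# These ignore unrecoverable read errors, controller/PSU failures, etc.,
# which is part of why real calculators disagree with each other.

def mttdl_raid5(n_disks, mtbf_h, mttr_h):
    """Data is lost when a 2nd disk dies during a rebuild window."""
    return mtbf_h**2 / (n_disks * (n_disks - 1) * mttr_h)

def mttdl_raid6(n_disks, mtbf_h, mttr_h):
    """Data is lost when a 3rd disk dies while two rebuilds are pending."""
    return mtbf_h**3 / (n_disks * (n_disks - 1) * (n_disks - 2) * mttr_h**2)

# Illustrative inputs: 6 drives, 1.2M h MTBF, 24 h rebuild (assumptions).
n, mtbf, mttr = 6, 1.2e6, 24.0
print(f"RAID-5 MTTDL: {mttdl_raid5(n, mtbf, mttr):.2e} h")  # 2.00e+09
print(f"RAID-6 MTTDL: {mttdl_raid6(n, mtbf, mttr):.2e} h")  # 2.50e+13
```

Because rebuild time appears squared in the RAID-6 denominator, small differences in assumed rebuild time swing the result by orders of magnitude, which is why calculators with unstated assumptions disagree so widely.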

I'm just trying to learn this stuff.
Been using RAID-1 + BU (on enterprise-class controllers and drives) for ages, but a recent close call (didn't lose anything) has me wanting more advanced RAID. Something that can handle two drive failures, to cover me during the longer rebuild times.

Incomplete calculators everywhere (all with different, and not necessarily stated, assumptions) aren't helping me reach any decisions.
I thought this one (reliability calculator) was finally going to give a fair (same-assumption) reliability comparison, but Z2 (and Z1) go *poof* when the results come up.
.
 

Patrick

Administrator
Staff member
Dec 21, 2010
If you are thinking 430B hours on a RAID-Z2 6x 4TB array is a reasonable figure, I would suggest you look at your assumptions.

On triple parity (e.g. RAID-Z3) the issue is not disk failure. Items like capacitors failing, HBAs failing, software failing, etc. become a major part of the reliability calculations. When we did the calculator years ago, most of those models took 15 minutes or so on a dual-socket system to come up with reliability figures.

Here is an old one (7+ years) but still worth a read: The RAID Reliability Anthology - Part 1 - The Primer
 

PCBONEZ

New Member
Nov 1, 2017
I'm a retired QA inspector. Electronic Control Systems for Nuc plants.
Ultimately QA over whole plants and their repair facilities. (Not just the electronics gear.)
What you did with a dual socket system I used to do with a TI-30. But not in 15 minutes.
(Just sayin' I know what the numbers are for.)
Unfortunately I don't know how to program or I'd just build the complete calculator I wanna see.

If you are thinking 430B hours on a RAID-Z2 6x 4TB array is a reasonable figure, I would suggest you look at your assumptions.
You appear to be confusing 'reasonable' with 'practical'.

Is the 1.2M hours MTBF (137 years 24/7) you hard coded into your calculator a practical (or reasonable) number?
Does anyone actually expect a platter drive to last (on average) 137 years? - I don't think so.
Such numbers aren't meant to be reasonable or practical. They are meant to do comparisons.
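The point about 137 years can be made concrete: dividing MTBF by hours per year gives that naive "lifetime", but manufacturers intend MTBF as a fleet failure-rate statement; the equivalent annualized failure rate (AFR) is a modest fraction of a percent. A quick conversion:

```python
# MTBF is a population statistic, not a per-drive lifetime: under the
# exponential failure model, AFR ~= hours_per_year / MTBF for small rates.
HOURS_PER_YEAR = 8766  # 365.25 days * 24 h

mtbf_h = 1.2e6  # the 1.2M h spec discussed in this thread
print(f"Naive 'lifetime': {mtbf_h / HOURS_PER_YEAR:.0f} years")        # ~137
print(f"Implied AFR:      {HOURS_PER_YEAR / mtbf_h:.2%} per drive-year")  # ~0.73%
```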

Also, that's not what I said.
Z2's MTTDL is 430B hours - more than RAID-6's MTTDL.
Your own calculator comes up with 870B hours MTTDL for RAID-6.
If 430B hours MTTDL is 'unreasonable' (as you just suggested) then clearly a figure twice that (spit out by YOUR calculator) is as well.

But that isn't the point.
The point (and response to the question I was asked directly) is that Z1 & Z2 are NOT the same as 5 & 6 respectively in reliability calculations.

And thus I agree with cinedigital. - They should be included.
Particularly in light of how popular they have become.


For those on here: Simple MTTDL RAID Reliability Calculator - Beta. Would love feedback.
Z1 and Z2 should be included. - That's my feedback.
.
 

PCBONEZ

New Member
Nov 1, 2017
I would think that STH would benefit from having the only fully complete calculator.
 

MiniKnight

Well-Known Member
Mar 30, 2012
@PCBONEZ why are Z1 and Z2 different in how they fail? Do you have the math behind what you're saying? I've always been told they are essentially the same since they use a similar parity setup and fail in the same way.

I've always assumed the "hard coded" 1.2M MTBF is because that was a common drive manufacturer spec along with the other options on that drop down. Well except for other because then you can put whatever you want in there. Maybe I'm wrong.
 

PCBONEZ

New Member
Nov 1, 2017
As I said earlier I'm just learning this RAID-Z(x) stuff myself and I'm not a programmer.

As I understand it, the RAID-Z levels use system RAM/CPU and processing routines (which RAID-5/6 have no equivalents to) to detect and correct data errors that RAID-5/6 miss completely. (Even using enterprise-class hardware RAID-5/6 controllers.)

Basically the Z(s) check for and fix more kinds of errors. Many "on the fly". Fewer errors = less chance of data loss.
That's my (non-programmer) interpretation of what I've been reading.
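A toy sketch (not ZFS code; all names here are made up for illustration) of the end-to-end check being described: a checksum is stored at write time, every read is verified against it, and on a mismatch the data is repaired from a redundant copy, which is the "self-healing" behavior plain RAID-5/6 lacks:

```python
import hashlib

def write_block(data: bytes):
    # Store a checksum alongside the data (in ZFS it actually lives in
    # the parent block pointer, not next to the block itself).
    return data, hashlib.sha256(data).digest()

def read_block(data: bytes, stored_sum: bytes, redundant_copy: bytes):
    # Verify on every read; on mismatch, repair from redundancy.
    if hashlib.sha256(data).digest() == stored_sum:
        return data
    # Plain RAID-5/6 has no per-block checksum, so it would return the
    # silently corrupted data here without noticing.
    return redundant_copy

data, csum = write_block(b"payload")
rotten = b"paylOad"  # simulated bit-rot: one flipped character
assert read_block(rotten, csum, b"payload") == b"payload"
```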

.....
While I'm not a programmer, I am a tech.
Post retirement from my previously mentioned job I spent 12+ years repairing motherboards at component level.
(As in O'scope and soldering equipment. Removing/replacing chips & caps & such. LOTS of caps.)
I look at computers from the viewpoint of an Electronics Tech.

One of my own questions is:
With RAID-Z(x) being so RAM intensive how big is the difference in power use between RAID 5/6 and Z1/Z2?
Seems obvious that Z1/Z2 would use more power, but how much more?
.
 

Patrick

Administrator
Staff member
Dec 21, 2010
One of my own questions is:
With RAID-Z(x) being so RAM intensive how big is the difference in power use between RAID 5/6 and Z1/Z2?
Seems obvious that Z1/Z2 would use more power, but how much more?
.
More insofar as you may add additional RAM modules which consume power. By the time you have a decent sized array, a few GB of RAM is a small factor. Perhaps a good discussion for another thread.

On the calculator topic:

The ZFS scrubbing/ECC is not (really) going to help you too much in this type of reliability model. A faster rebuild time will, as that limits the amount of time you are exposed to a drive failure. If you are exposed during disk failure(s), ZFS does not necessarily help you if you have an undetected disk error before you get redundancy back.

The main issue is what happens when a second drive fails during the RAID-Z rebuild. No amount of data scrubbing helps when you have two drives die during an array rebuild.

You are right that it is not an exact match for RAID-Z1 and RAID-Z2, but from what I was told by the reliability folks at a large storage vendor, for this type of model it is a close enough approximation to use RAID 5 and RAID 6.
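The point about rebuild time can be made concrete: in the textbook RAID-6 approximation, MTTDL scales with 1/MTTR², so halving the rebuild window roughly quadruples the modeled MTTDL. A sketch with illustrative inputs (not the STH calculator's actual model):

```python
# Textbook RAID-6 MTTDL approximation: rebuild time (MTTR) appears
# squared in the denominator, so it dominates the result.
def mttdl_raid6(n, mtbf_h, mttr_h):
    return mtbf_h**3 / (n * (n - 1) * (n - 2) * mttr_h**2)

n, mtbf = 6, 1.2e6  # assumed inputs: 6 drives, 1.2M h MTBF
for mttr in (12.0, 24.0, 48.0, 96.0):
    print(f"rebuild {mttr:5.0f} h -> MTTDL {mttdl_raid6(n, mtbf, mttr):.2e} h")
```

Each doubling of rebuild time cuts the modeled MTTDL by 4x, which is why faster rebuilds matter more in these models than any amount of scrubbing.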
 

PCBONEZ

New Member
Nov 1, 2017
More insofar as you may add additional RAM modules which consume power. By the time you have a decent sized array, a few GB of RAM is a small factor. Perhaps a good discussion for another thread.
Depends on the kind of RAM.
Some FBDIMMs use in excess of 20 W per module at full load.
This would be the only reason I moved on from my beloved dual LGA771 boards.
Well, except the one I'm using right now... :D

You are right that it is not an exact match for RAID-Z1 and RAID-Z2, but from what I was told by the reliability folks at a large storage vendor, for this type of model it is a close enough approximation to use RAID 5 and RAID 6.
As I showed earlier comparing Z(x) calculator numbers to your calculator's numbers, the difference is about 50%.
I would not call that close.

I will give you that the numbers for RAID-6 are so outstanding that 50% better and the Law of Diminishing Returns may make the 50% a non-issue for many people.....
... But that still doesn't make them close or equivalent.

I know one of the reasons the Z's fare better is that they have protections against bit-rot and 5/6 have none.
I have no idea how -much- that affects the numbers. Just an example of why they aren't the same.

Vendor = Salesman.
Never trust a Salesman. They most always have undisclosed motives.
(And often knowledge deficits about things they don't sell.)
.
 

PCBONEZ

New Member
Nov 1, 2017
I've always assumed the "hard coded" 1.2M MTBF is because that was a common drive manufacturer spec along with the other options on that drop down. Well except for other because then you can put whatever you want in there. Maybe I'm wrong.
I've been looking at Seagate/WD NAS type drives which all seem to be 1.0M MTBF.
WD RE4 (which I have a lot of) are 1.2M MTBF.
The RE4 apparent replacements (WD Gold) are listed as between 2.0M and 2.5M MTBF

I missed 'other' and the drop-down that allows entering your own number there - so thanks for the tip.
.
 

Patrick

Administrator
Staff member
Dec 21, 2010
@Patrick can you post the formula that's used in the calculator?
It is essentially what is in the article on it.

Some of the folks debating the calculations there were also the folks doing IEEE papers for big vendors at the time. I actually had lunch with a few of them back then to get and incorporate feedback.

The better models take into account more than just disk failures but are way more complex albeit more accurate.
 

i386

Well-Known Member
Mar 18, 2016
The better models take into account more than just disk failures but are way more complex albeit more accurate.
I know what you mean. After making the post I used Google and found a paper with disk, PSU/enclosure, and various other failure rates in the MTTDL formula :D
 

Patrick

Administrator
Staff member
Dec 21, 2010
I know what you mean. After making the post I used Google and found a paper with disk, PSU/enclosure, and various other failure rates in the MTTDL formula :D
Decent chance I spoke to that person.

One of the conversations I had was along the lines of "well I can license you my model, on a dual 4 core Xeon system it runs fast, only 15-30 minutes if you cut down the inputs."

Better accuracy, but I figured nobody wanted to wait that long for an answer. Even if it took 10 minutes on a Xeon D-1541, I would still need 2-3 nodes set up just to run the MTTDL calculator for STH.