"restoring from LTO tape fails 40-70% of the time??"

Crossposting an OLD comment; I'd like to find the actual original article for more context and specific details...


The general findings show that more than 50% of the time, a full recovery attempt from tape will result in failure.

The following appears on DataMountain's website; it is a helpful summary of industry/analyst surveys:

"Storage Magazine (storagemagazine.techtarget.com) surveyed its readers . . . to determine how often unreliable tapes were at the heart of a backup snafu. When asked to describe the tape failure situation in their shops, nearly a third of the respondents (31.2%) said it was either a significant problem that often disrupts backups or a problem that sometimes disrupts backups.”

According to Microsoft, 42% of attempted recoveries from tape backups in the past year have failed. In addition, Ben Matheson, group product manager for Microsoft Data Protection Manager, told me, “More than 50 percent of customers we’ve surveyed said their current backup solutions do not fill their needs.”
“Restoring from tape fails 50% of the time in distributed organizations and mid-sized companies.” – Baroudi Bloor (www.hurwitz.com) (now Hurwitz and Associates)

“Over 34% of companies do not test their backups and of those that tested, 77% found their tape backups failed to restore.” – Storage Magazine (storagemagazine.techtarget.com)

A survey by the Yankee Group and Sunbelt Software found that 40% of IT managers had been unable to recover data from a tape when they needed it.
According to The Gartner Group (www.gartner.com), 71% of all tape restores fail. Strategic Research (www.sresearch.com) claims 54%."

------

Related to this is a forum topic from 2 years ago. Some clips of the responses, out of order:



There is no such thing as safe data storage. Storage must be on multiple devices in multiple locations.

I have asked a person who deals with reading thousands of 'old' tapes. My question was what failure rate would you expect on 1,000 10 year old LTOs. The answer was about 10, but typically more than zero and less than 20. I would expect a much higher failure rate from disk drives that had been on the shelf for 10 years.

The answer for DAT tapes of the same age though is much worse.



I could believe this number; a sysadmin friend told me "about 1 in 100 tapes would have a problem", though they didn't specify whether that meant losing a few files, losing the whole tape entirely, or what.


I believe that statements made were more like:
1) the reliability of tapes is not constant and can greatly vary (by make/type of drive, make/type of media, exact storage conditions and *what not*)
2) the expected/perceived reliability of tapes is higher than the real-life one
3) there is no valid data (setting aside personal anecdata, armchair experts' opinions and results of (dubious) surveys) about the actual reliability in real-world operation.

...there is a sort of sampling error even in the reported/documented cases.

I mean, you have 120 (say LTO) tapes (let us assume of the best technology/make and used and stored properly) covering (still say) your last ten years of backups (1 tape per month).

In most (that is 90% or 95%) cases you (fortunately) will never need to access them.

The one time you need to access one, let's say tape #37, you can recover all the data just fine.

The reasoning might then be "I took a random tape and it worked, hence I can assume that all tapes are fine, tapes are reliable".

On the other hand, another time you need to access tape #57 and no matter what you try and do, you cannot retrieve the data.

The reasoning then might then be "I took a random tape and it didn't work, hence I can assume that all tapes are crap, tapes are not reliable".

But in reality you twice picked one tape out of 120; once you won, once you lost, that's life. Neither single result tells you much about the other 119 tapes.

If you had attempted recovering data from ALL the 120 tapes, and (say) 119 had no problems or (the other way round) 119 were not recoverable, then you could draw the line between reliable and unreliable.

I believe (and I may well be wrong) that it is more probable that out of 120 tapes, 110 will read fine (tapes are "reliable[1]") but - due to Murphy's Law - the data you need are in the 10 that fail.
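To put rough numbers on that sampling problem, here is a minimal sketch using the 120-tape example with 10 bad tapes from the post above. It assumes tapes are picked at random without replacement and that a tape is simply "good" or "bad"; the function name `prob_sample_all_good` is made up for illustration.

```python
from math import comb


def prob_sample_all_good(total: int, bad: int, sample: int) -> float:
    """Chance that a random sample of `sample` tapes contains no bad tape.

    Assumption: tapes are drawn at random without replacement, so the
    count of bad tapes in the sample is hypergeometric.
    """
    good = total - bad
    if sample > good:
        # More tapes sampled than good tapes exist: a bad one must show up.
        return 0.0
    return comb(good, sample) / comb(total, sample)


if __name__ == "__main__":
    # 120 tapes, 10 of them unreadable (the hypothetical from the post).
    for k in (1, 5, 20, 60):
        p = prob_sample_all_good(120, 10, k)
        print(f"test {k:3d} tapes: {p:.1%} chance of seeing no failure")
```

With a single test the "all fine" outcome shows up 110 times out of 120, which is why one good restore says very little about the whole shelf.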


-----------
After reading articles like this I feel my level of paranoia in trying to design my redundant array of inexpensive tape (aka an LTO version of erasure coding) actually makes more sense, so I will continue to toil in obscurity on my project. :) The idea is a combination of mirrors and parity tapes to allow a much higher chance of recovery after a partial disaster. Being a belt-and-suspenders person, I'll probably never be happy with less than a 3-way mirror plus multiple parity tapes, to make up for more than one tape in a single mirror set somehow dying.
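A rough sketch of how much mirrors plus parity could help, under loud assumptions: every tape fails independently with the same probability (which the same-batch worry below argues against), a tape "slot" is lost only if all of its mirror copies fail, and the parity scheme can rebuild any set of lost slots up to the number of parity tapes. The function name and the example numbers are hypothetical, not the actual project design.

```python
from math import comb


def prob_set_recoverable(data_tapes: int, parity_tapes: int,
                         mirror_copies: int, per_tape_failure: float) -> float:
    """Chance that a tape set can still be fully restored.

    Assumptions (illustrative only):
      * tape failures are independent with probability `per_tape_failure`
      * a slot is lost only when every one of its mirror copies fails
      * any `parity_tapes` lost slots can be rebuilt from the survivors
    """
    slot_lost = per_tape_failure ** mirror_copies
    slots = data_tapes + parity_tapes
    # Sum the binomial probabilities of losing 0..parity_tapes slots.
    return sum(
        comb(slots, k) * slot_lost ** k * (1 - slot_lost) ** (slots - k)
        for k in range(parity_tapes + 1)
    )


if __name__ == "__main__":
    p = 0.01  # "about 1 in 100 tapes", taken from the anecdotes above
    for parity, mirrors in ((0, 1), (2, 1), (0, 3), (2, 3)):
        ok = prob_set_recoverable(100, parity, mirrors, p)
        print(f"parity={parity} mirrors={mirrors}: {ok:.6f} recoverable")
```

Correlated failures (one bad batch, one flood) would make the real numbers worse than this model suggests, which is the argument for mixing brands, years and suppliers.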

I could totally understand how you could have tapes from say all one batch that ten years in the future, are just retroactively realized to have been not a great batch. The odds work in your favor if you do things like buy different branded tapes, made in different years, through different suppliers.

The question I'm left with is how many tapes were in a full backup. If the average business had 100 tapes that all had to read perfectly to fully restore a server, and they were ten or more years old, I could fully believe half of those businesses would have one or two tapes that wouldn't restore. I could totally believe the odds creating an increased risk of a single tape failure for 'mid-sized companies and distributed businesses', especially when a third don't test backups and nobody said whether they made only ONE set of backup tapes versus two or three, or had anything offsite.
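The arithmetic behind that hunch can be sketched quickly. This assumes each tape fails independently with the same probability, takes the "about 1 in 100" figure from the anecdotes above as that probability, and counts a restore as failed if any single tape is unreadable; the function name is made up for illustration.

```python
def prob_restore_fails(tapes_needed: int, per_tape_failure: float) -> float:
    """Chance that at least one of the required tapes is unreadable.

    Assumption: independent failures, and a full restore needs every tape.
    """
    return 1.0 - (1.0 - per_tape_failure) ** tapes_needed


if __name__ == "__main__":
    p = 0.01  # per-tape failure rate, borrowed from the forum anecdotes
    for n in (1, 10, 50, 100):
        print(f"{n:3d} tapes needed: {prob_restore_fails(n, p):.1%} "
              f"chance the full restore fails")
```

Under those assumptions a 100-tape restore fails roughly 63% of the time even though each individual tape is 99% reliable, so a headline like "more than half of full restores fail" and "about 1 in 100 tapes has a problem" are not necessarily in conflict.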

This is literally the exact situation which mathematically I want to prevent ever occurring for my own projects. : P