anyone interested in Redundant Array of Independent Tape?

So I originally posted 4-5 years back, I think, with a project that nobody seemed to quite understand, what I was doing or why... which is fine, because I realized I was braindumping too much not-fully-baked information about something that was still forming in my head. I've posted on other boards at different times and nobody had any answers, or they quoted me $50,000 and up just to start building a solution, plus needing a sysadmin to run things, which wasn't going to work for me.

Fast forward a bit: between COVID shutdowns and unexpected life-threatening medical issues I'm still recovering from, things have been stalled, including progress toward my film production degree. But I hope in the next year I can explain and show what I was trying to do then: build a D2D2T (Disk to Disk to RAIT Tape) offline media archiving system designed for long-term, catastrophe-level recoverability (designed from the ground up for maximum protection against silent bit rot, corruption of data AFTER it leaves the server, which ZFS can't protect you from, loss of data, loss of volumes, etc.) at the absolute minimum possible cost per terabyte and absolute minimum overhead, including power use.

A RAIT means that if I have a stripe of 8 data tapes and 2 parity tapes, I can lose any two tapes in the set and still fully restore the data and remake replacement data or parity tapes. This means I don't have to store full 2-way/3-way mirrors in offsite locations (i.e. 16/24 tapes) but can use 2 or 3 parity tapes to insure against loss of a volume in a set. The goal is to combine something like a local 2-way mirror (because it's faster to recover from a mirrored tape than to regenerate one from parity) with offsite storage in one or more other locations where there's only a single copy, but with 2-3 parity tapes.
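To make the stripe/parity idea concrete, here's a minimal sketch of single-parity XOR over toy byte strings standing in for tape volumes. This is illustrative only, not the author's actual tooling; note that surviving two simultaneous losses, as proposed above, needs a real erasure code such as Reed-Solomon (what PAR2 and RAID6 use), since plain XOR parity only covers one loss.

```python
# Toy model of a RAIT stripe: byte strings stand in for tape volumes.
# One XOR parity "tape" lets you rebuild any ONE lost volume.
from functools import reduce

def make_parity(volumes):
    """XOR all equal-length volumes together into one parity volume."""
    return bytes(reduce(lambda a, b: a ^ b, chunk) for chunk in zip(*volumes))

def rebuild(survivors, parity):
    """Recover the one missing volume from the survivors plus parity."""
    return make_parity(survivors + [parity])

tapes = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]    # four data "tapes"
parity = make_parity(tapes)

lost = tapes[2]                                  # tape 3 breaks in the drive
restored = rebuild(tapes[:2] + tapes[3:], parity)
assert restored == lost                          # volume fully recovered
```

The same XOR relationship is why losing the parity tape itself costs nothing while all data tapes survive: it can simply be regenerated from them.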

I'm also designing it as part of a bigger D2D2T system on purpose meaning I have a primary NAS for active files, a backup NAS, and a migration path to and from tapes. The goal being all data from the primary NAS is backed up to at least one other disk as soon as possible (maybe not realtime but could schedule several times per day), and nothing is erased from the primary NAS until it is verified written to tape (and not erased from the backup NAS until it's on TWO tapes) so there's never a single point of data loss.

I'm pretty sure places like r/Datahoarder will be interested, but I'm curious how interested people here are in the rest of this project as I build it. I could do anything from saying nothing until I'm done and just showing the final project, to keeping people constantly updated. I mean... I just felt I might be on my own to do it all anyway, since my original post a few years back seemed to leave people a bit confused about how to respond. :)
 

TRACKER

Active Member
Jan 14, 2019
Hi Twice_Shy,
my personal opinion on that topic:
not sure what your experience with tapes is, but... you know, tapes are not 'random r/w media' but streamer-type. That means sequential IO.
When you try to read or write data from/to tape it takes... time, a lot of time (while the tape is rewound/fast-forwarded) :)
It is hard for me to imagine something like a 'parity tape' or 'stripe of tapes'.
I've never heard or read anything about this type of data storage using conventional 'raid' logic with tape media, but I am pretty sure there is a solid reason for that.
 

i386

Well-Known Member
Mar 18, 2016
Germany
I don't think that raid logic is the problem. I'm trying to think who has 10 tape drives, and how you would store/organize the tapes (tapes 1-4 for data, tape 5 for parity).
 

aero

Active Member
Apr 27, 2016
This idea seems needlessly complex in the extreme. Why not just take the source data and rar/par it if you want to be able to recover from corruption with parity, then commit it to tape in duplicate? Done.
 

msg7086

Active Member
May 2, 2017
Here's my personal opinion.

1. The main point of RAID is high availability. You make a RAID array because you want to keep your data ONLINE no matter what happens to your disks. You want to feel almost nothing when a disk fails; the ops will simply replace the drive and it self-heals. No impact to production unless failures exceed a certain amount. Tapes, on the other hand, are not an online data source. They will solely be used to store data offline for backup purposes. The primary advantage of a RAID system no longer exists.

2. It seems like you are trying to reduce tape purchases by implementing a solution using more tape drives. However, tapes are much cheaper than tape drives. Take LTO-7 as an example: used tape drives are around $2k, while new tapes are only $50 each. If you are doing a 10-tape RAIT you are looking at a $20k upfront cost, most of which could instead have bought tapes ($18k / $50 = 360 tapes).

3. It seems like you want to protect your data from losing tapes. How do you lose a tape? You don't just lose it unless you burn it or throw it away. Even if it's flooded you can still read the data from it after some restoration. Otherwise they should be very stable throughout the years. If you do want to add redundancy, do not do a RAID. There are plenty of other solutions like PAR, or the RAR recovery record and recovery volumes. Tape is a lot like floppy disks. You don't just install 10x 3.5-inch floppy drives in a computer. You pack files first into a couple of 1.44MB RAR parts with a few recovery volumes, then sequentially copy each of them onto floppy disks and carry them away.

4. Tapes are best kept at a remote location. Imagine if your place caught fire: your tapes could burn along with your NAS drives. That's why it's much more common to fill up tapes and then store them away. You don't keep them online.
 
Responding to all since I posted, though it sounds like there's not a lot of interest in non-realtime storage, even for TBs on a budget, so I probably won't extensively document it and will just post something when I'm done (which could be up to a year at the current rate; I'm moving out of a house and going to college, so... full plate).

Though I started the spaghetti, so I'll try to untangle it.

TRACKER - I know they are not random-access media. The idea is that you're archiving a lot of data for cheap, data you rarely access, like huge 4K RAW video files that you don't NEED for any immediate upcoming project. Instead of 6TB of video you just shot on a Blackmagic sitting on a server for years like Linus Tech Tips, it's just sitting on a few LTO tapes should you need it again: not consuming power, and protected against bit rot the way ZFS protects you.

RAIT actually is a thing, just not well known, expensive, and a bit boutique. It works like RAID does for disks: a parity tape can replace any missing or failed tape. I'm not surprised there are no other solutions; I'm just surprised more people haven't asked for one, because to me it seems so useful. :-/

i386 - you don't have 10 tape drives, you have 10 tape volumes cached on the second disk server. If, for instance, one of the tapes gets mangled, or you have a tape breakage (just like a VHS player, the tape goes through a loop, so this DOES happen statistically nontrivially more often than total hard drive or SSD failures; it's the one real downside of tape), you can still reclaim all the data from the volume that tape represented, even without a mirrored set of tapes. You use the parity tape to restore any missing volume in the set, just like with a disk.

I'd just store and label accordingly: Dataset A labeled Data 1-10 and Parity 1-2. Have a CD-R with the file tree or something if I want to see what is where.


aero - saving thousands of dollars requires some complexity. I'm planning to store north of 100TB, so the difference between 2-way and 3-way mirrors is a lot. The easiest answer is to throw money at the problem: put everything on 16-disk off-the-shelf NASes, buying three so you have two backups. I want that level of data recoverability at far less cost, because a struggling videographer is trying to buy stuff that shows up on screen (nicer cameras and lenses) more than back-end gear.


msg7086 - Tapes are just a way of backing up data; people use LTO all the time for this purpose. Some people organize offline media archives. They aren't realtime, but they aren't unusable: taking 1-2 minutes to find the start of a video file on a tape with LTFS, for a project you thought you wouldn't need to touch for a while, isn't bad. This isn't for your daily workflow; it's usually for ingesting a lot of data once that you'll then be editing for days or weeks. For massive video files this is not a big impediment. I only send data out to tape when I don't think I'm going to need it at all, or for a while, on project completion. You don't throw the data away, you just try to store it cheaply: tape is cheaper than spinning rust and way cheaper than SSD, with no power overhead or "power surge killed the disks on the server" risk. This is already the workflow for lots of video companies. Linus Tech Tips has 3.6 petabytes on 270 drives, and one lightning strike to the power grid or a flood would be a nightmare. I might have a box of tapes at the office in a fire safe plus one at home, making it offsite.

The only thing I'm adding to normal practice is a parity tape. If I output 48TB of data to four LTO-8 tapes and one of the tapes gets damaged, I've lost 12TB of data. So I mirror; but now that's twice as expensive in tapes, and if I lose two tapes I might lose both copies of the same volume of the four: data still lost. A 3-way mirror is even more expensive. Adding offsite mirror(s) on top: REALLY expensive. How paranoid am I about losing data? I've had two backups and still lost files corrupted on both external USB hard drives due to silent bit rot, and separately had an on-site catastrophe that lost everything stored in one place. I'm REAL tired of losing my data. :-/

A parity tape or two would be insurance against losing the data, without the expense of 3-way, 4-way, or more-way mirroring. More complexity and hassle, yes, but when it saves thousands I'll put up with some, as long as I don't have to do it too often.


I won't be using more tape drives; I'll be using more tapes, one tape drive, and hard drives holding data ready to write to tape. You 'lose tapes' because tape has the failure mode of that long spiral of magnetic media getting mangled up. Did you never own a VCR, or a cassette player in the car, that would occasionally eat a tape? This is a non-zero risk. I've had sysadmins tell me they loaded a tape and it broke, but that can happen, so they loaded its backup; then THAT broke too, because a small part had failed inside the LTO drive. Yes, that's rare, but all catastrophes are.

PAR and RAR had impossible overhead in my experiments. I tried making RAR parity files for just 100 gigs of files and it gave me a 30 DAY compression time on my spinning rust, hitting a RAM bottleneck no matter what switches I used. It seems to break down with huge datasets. I asked on RAR/PAR boards how to do this and nobody had suggestions for terabytes as opposed to usenet posts. As for 'tape is like floppies': I never said I'd install 10 floppy drives or tape drives. I'm packing files into a couple of 2.5-12TB tape volumes on Disk Server #2 and then having that server create a parity volume I can access if any of them fail to load back onto the server. Heck, that parity volume could even be RAR and PAR files if the tools would make them; that technically makes it a parity tape. I just wasn't planning on those tools, and my specific protection is against the "the tape jammed/broke, we just lost every file on that volume" risk, which is uncommon but is a specific failure mode of tape. All the parity on one tape simultaneously verifies and protects against the other per-fileset degradations too. So there's a reason to move all parity to a designated tape, IMHO.
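One way the "Disk Server #2 creates a parity volume" step could sidestep the RAM bottleneck described above is to stream the staged volume images in fixed-size chunks, so memory use stays bounded no matter how many terabytes the volumes hold. A rough single-parity sketch, not the software actually planned (file names and the chunk size are arbitrary; a real tool would use vectorized XOR rather than this byte loop):

```python
# Stream N staged volume images and write their XOR as a parity volume.
# Memory use is bounded by CHUNK regardless of total volume size.
CHUNK = 64 * 1024 * 1024  # 64 MiB per read; tune to the server's RAM

def write_parity(volume_paths, parity_path, chunk=CHUNK):
    files = [open(p, "rb") for p in volume_paths]
    try:
        with open(parity_path, "wb") as out:
            while True:
                chunks = [f.read(chunk) for f in files]
                if not any(chunks):          # every volume exhausted
                    break
                size = max(len(c) for c in chunks)
                acc = bytearray(size)        # short reads are zero-padded
                for c in chunks:
                    for i, b in enumerate(c):
                        acc[i] ^= b
                out.write(acc)
    finally:
        for f in files:
            f.close()
```

Restoring a lost volume is the same operation run over the surviving volumes plus the parity file, which is exactly the RAID5-style property being described.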


I still seem to have some of the same barriers to communicating what I'm trying to do as the last time I posted this project. :-/ I'll check this thread for one more round of followups if anyone wants to chime in, then stop talking and just go build it... if I eventually make a video explaining it, maybe it will make more sense then. I'd need to show the data workflow and how things can go wrong with other storage versus how what I'm proposing fixes that, probably with some graphics. It's all about storing data at minimum cost with maximum protection against statistically known risks and threats. I'm using a second disk server (D2D2T: the first disk server is for video ingest and working in Premiere; the second is like a cache for tape volumes, holding tape/TAR-type images until written, so I can use one tape drive instead of several) which both creates and can restore a parity volume for a set of tapes. If I lose, say, volume 7 of a set with no other mirror of that tape available, I just load the second disk server with all the other volumes (data 1-6 plus 8) and the parity tape, and run software to restore volume 7 using the parity data. Then volume 7 is rewritten to tape, and the integrity of the dataset is restored and verified with nothing lost.

This second 'tape image cache' or parity creation/restoration server (which I was calling a tape preparation server a few years back), if you want to call it that, is the only thing different versus the dozens of video houses that just directly make mirrored LTO tapes of all their old source data and in-progress Premiere sessions, in case they need to roll back to an older edit or something. And this is what lets me save thousands of dollars once we climb into the 100TB-plus category of data. It adds up and compounds.


I'll field other public questions for one more round, and if everyone is still scratching their heads but wants to understand, just PM me in the future if you find this and want to hear how the project turned out. I still have at least two disk servers to build first, even if nobody understands what I'm doing with them. :)
 

msg7086

Active Member
May 2, 2017
OK that looks much better with your clarification.

I tried making RAR parity files for just 100gigs of files and it gave me a 30 DAY compression time on my spinning rust hitting a RAM bottleneck no matter what I put for switches.
Try smaller volumes. Split 100GB of data into 8GB volumes and it should go quicker. (I backed up a couple of terabytes of data in 6 or 8GB volumes; they were not as slow as you describe.) Start with 5% recovery records and a few recovery volumes and see how it works for you.
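The splitting step being suggested can be sketched as a small helper: break one big staging file into fixed-size parts, then point PAR/RAR at each part. This is a hypothetical illustration, not a named tool; pass split_size=8 * 2**30 for the 8GB volumes mentioned above.

```python
# Split a staging file into fixed-size parts: path.000, path.001, ...
# so each recovery-data job (PAR/RAR) only ever sees one small volume.
def split_volumes(path, split_size):
    parts = []
    with open(path, "rb") as src:
        n = 0
        while True:
            chunk = src.read(split_size)   # demo reads whole parts at once;
            if not chunk:                  # huge parts would stream instead
                break
            part = "%s.%03d" % (path, n)
            with open(part, "wb") as dst:
                dst.write(chunk)
            parts.append(part)
            n += 1
    return parts
```

Each resulting part can then be given its own recovery record, so a single slow or failed job never blocks the whole dataset.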
 

TRACKER

Active Member
Jan 14, 2019
I have experience with LTFS on LTO-5... it is freakin slow when working with small files spread across the tape; this "shoeshining" is awful. I don't know what your personal experience with LTFS is, but if I were you, I would look into something like a NAS with lots of disks for random 24/7 data access + periodic LTO backups.
 
I have experience with LTFS on LTO-5... it is freakin slow when working with small files spread across the tape; this "shoeshining" is awful. I don't know what your personal experience with LTFS is, but if I were you, I would look into something like a NAS with lots of disks for random 24/7 data access + periodic LTO backups.
None so far; I've never used LTO before, just researched it. Do you mean it's slow or shoeshines if you are accessing small files (they somehow get in the way of the one big file you're looking for), or if there are any small files in the directory at all?

One reason for wanting LTFS is the ability to pull "just a few files" straight from tape when I don't need to ingest 2.5-12TB back from a single volume through backup software. Most likely large video files; I wouldn't plan to make a habit of grabbing WAV clips or MP3s off tape. Another reason for LTFS is that I don't like proprietary formats, because of where they might be in 10 or 15 years. I should still be able to pull files from LTFS on Mac, Windows, or Linux in 20+ years, because I don't see it going away. Specific backup-company software? Who knows if it will still run or have a version in the future. I might ALSO use that for SOME things (like a collection of WAVs or small files), but not everything; and honestly, TAR then LTFS might be better.

Another reason for wanting LTFS (which might be misinformed) is that if an individual file had an error on the tape, I assumed a monolithic archive might just... die (the way a RAID rebuild can stop on a single bit error), but hoped LTFS would just move on to the next working file on the tape so it's not a total loss. (If anyone knows whether this is true or false, please inform me.)

I'm planning D2D2T: there will be a primary NAS holding files I'm using for immediate projects, and a secondary NAS mostly meant to assemble volumes of somewhat related data/images to write out to tape (each volume sized the same as the tape, i.e. 9TB for LTO-7 M8 format or 12TB for LTO-8), which can also serve as a holding area if I just need overflow storage for a project. (My rule is I want two copies of the files at all times: if I have to wipe my CFast cards, the data is mirrored on the primary and secondary NAS; if I can leave the cards intact, then the whole combined space is available if needed until the data is migrated to LTO.)

For small files like audio files for ProTools, they should be able to stay on the primary NAS or workstation indefinitely. It's those big, high-bitrate video files that suck data like no tomorrow that I need to migrate to the offline tape library once a project is done, if I no longer need it.
 

TRACKER

Active Member
Jan 14, 2019
The speed of tape is not slow in general, only when you copy small files randomly, because of the nature of how tape works. If files are large and the streamer just reads/writes sequentially, then you can reach almost the max speed of the LTO drive. If you have multiple small files (e.g. a few KB to tens of MB in size), then how the files are read depends on the OS. For example, the OS could try to read them randomly, which means the drive will fast-forward and rewind the tape multiple times until it finds and reads each particular file.
I had the same considerations as you regarding LTFS, and that's why I decided to go with it rather than any proprietary/enterprise backup software :)
Unfortunately (as far as I am aware), LTFS does not have a special algorithm that orders the files to be read from tape so the drive can read them sequentially. Another consideration (if you use Windows as the OS): if you have anti-virus software, it will scan each file read from tape (well, it will try to scan everything the OS is processing), which additionally slows down the read speed and generates extra "shoeshining".

I don't know if my explanation is good enough :) Perhaps someone on the forum can give a better one.
 
The speed of tape is not slow in general, but only when you copy small files randomly
No, that actually makes perfect sense. So I either have to try to 'guarantee' I'm copying files sequentially, or only grab the one file I really need at a time with AV off, or just RAR or zip or tarball anything small together if I expect to need multiple files out of a group of related things. For example, TAR up an in-progress Premiere session instead of writing it as-is to tape, because even if I drag and drop intending it to write sequentially, it may or may not come back off in the same order...

AKA for one file at a time it should remain pretty good, but it will rapidly degrade with multiple files. (I wonder if restoring a full tape from LTFS is just a drag-and-drop operation, or if there is a way to force a sequential pull of everything.)
 

aero

Active Member
Apr 27, 2016
A file written to tape will be written sequentially.
If you're using linux it's very simple to control, read, and write with tape...

#rewind tape to beginning
mt -f /dev/nst0 rewind

#seek to end of data
mt -f /dev/nst0 eod

#move forward to the beginning of next file
mt -f /dev/nst0 fsf 1

#write a file to tape
tar -cvf /dev/nst0 <filename>

It is the seeking to a particular file that is time intensive.

If you want to know what is on the tape, you have to keep track separately, or rewind to the beginning and move forward through each record, running 'tar -tvf /dev/nst0' to get the file info of the record you're at.

Scripting is useful.

edit: you need to know where you are on the tape, so you can traverse forwards or backwards x number of records, rewind all the way to the beginning, or go all the way to the end of data (eod).

edit2: what I have described is the method for directly manipulating tape; not using LTFS. I would rather not use an additional layer of abstraction if at all possible; that's just me though
 
Are those LTFS-specific commands, or for tape control under Linux in general? I'll be starting on Windows because I don't have enough experience with Linux, dragging and dropping directories or the whole drive at a time when writing out, I assume. Even if I'm willing to learn Linux for personal use, I'm setting this up as a system for others who will not, so yes, this knowledge of exactly how tape works is useful to me.

How does LTFS copy a whole tape back to a (Windows) drive? Is it a drag-and-drop operation, or is there a way to keep files in the order written to tape to avoid a hassle? (Otherwise that seems like a big oversight, but one I'd better be aware of.)

I'm still wondering if anyone else sees the potential usefulness of parity tapes as I explain them, or if for their needs they'll just mirror 3-ways-plus instead of saving thousands of dollars. :) The ease with which two tapes can break in a row (likely the tape you want plus its mirror, because until it happens twice in a row you don't realize anything is wrong with your drive; occasional broken tapes are a thing) seems to me a danger statistically out of line with every other benefit of tape (far lower bit error rates, cost per TB, storage longevity, no sticking drive heads on a drive kept in cold storage).
 

aero

Active Member
Apr 27, 2016
Sorry, caught me mid-edit. see my edit2

edit2: what I have described is the method for directly manipulating tape; not using LTFS. I would rather not use an additional layer of abstraction if at all possible; that's just me though

LTFS has to do all of these same things under the hood, so it's useful to understand.

I'm just not as cost-conscious, I suppose, to want to mess with parity. You suggest that you think a double tape failure would be a high likelihood... how would that be any different from your data tape + parity tape failing?
 
I'm just not as cost conscious I suppose to want to mess with parity. You suggest that you think a double tape failure would be a high likelihood...how would that be any different than your data tape + parity tape failing?
High, no, but far higher than the other risks to your data once you're doing 3-way mirrors with tape's super low data corruption risks. I just know I've heard from other LTO tape owners that once in a while a tape breaks, whether loaded by hand or in a library machine. Me, I would unthinkingly just pop in the mirror of that tape to restore the same data. If a failing tape drive (like some piece had broken internally) had been the culprit, the next tape might break too, and you could lose your data in a 2-way mirror.

The difference is that beyond two failures, increasing parity provides more protection than simple mirroring for the total number of tapes you're paying for. For instance, I plan to use at least 2, eventually 3, parity tapes with each set of data (probably 10 data tapes in my case). I will also have at least a 2-way mirror at first, eventually 3-way for offsite. So if Volume 6 fails (or breaks), if I'm smart I try a different tape first to be sure the tape drive didn't die; but if I'm running on autopilot I just grab my other local copy of Volume 6 from the fire safe. If THAT breaks, I have two options: drive across town for the offsite mirror (the third in the 3-way set), or load all the other volumes onto my server, including one parity tape, restore Volume 6 to the world of the living, and rewrite it to tape to replace my damaged copies. Now I can survive further losses; it's like running a scrub in ZFS.


The logic is similar to mirrored RAID arrays: say it's 10 drives, and you might lose 8 drives on one side, but all of the mirror is intact, so you can rebuild all 8 drives by copying from the other. But if you lose the WRONG 2 disks in a mirrored set, you've lost all of the data on that volume. With the equivalent of RAID5+1 you can lose any two disks, including the worst two to lose (the same volume on both sides), and still recover all data; probably even the worst three, by my logic (if you lost one of the two parity disks plus both copies of the same volume, you should be able to copy the remaining parity disk first, rebuild your lost volume, then copy that).


Rebuilding a volume from parity is more time-consuming than just reading a tape back in, so reading is always my first choice. I could go over other catastrophe scenarios... say I have a 2-way mirror at work and a 1-way copy with parity in a storage garage. A fire burns everything up at work. Without parity tapes I have ONLY ONE REMAINING CHANCE to restore all my data; haven't you chewed your fingernails during a RAID rebuild before? If I have parity tapes, a breakage or two, or some lesser tape error, is not the end of the world. My chances of a total recovery go dramatically up. With two offsite storage places it goes up even more. If I'm able to save any of the tapes from either catastrophe site, that may further increase the chances of recovery, since some tapes from each site may be usable; this can easily turn into paranoid-level "recover from multiple catastrophes" data integrity, IF that's what you want it to become.

Parity tapes are like RAID5 vs RAID6, or presumably what a RAID7 would be: how many drives do you want to be able to lose while still recovering everything? And it shouldn't replace, but rather augment, a mirrored set, belt and suspenders for anything important or representing a lot of work. Video footage might represent tens of thousands of dollars of work or be irreplaceable (deceased actor), but at some point just mirroring backups even more (4-way, 5-way, 6-way) is diminishing returns while costing a lot more. In my opinion parity is a better way to solve this problem, and it allows a 1-way stripe with parity to still have a high chance of recovering from catastrophic losses.
 

oneplane

Well-Known Member
Jul 23, 2021
For tapes it makes much more sense to just have on-tape parity information and simply use two tapes if the data is that important to you. Anything beyond that means you also have to have multiple geographical locations for more embellishments to make sense.
 
I'll use it as an opportunity to rethink what I'm trying to share... I know I've lost data with two USB drives backing something up, and I've also lost "everything at one site" to catastrophe. If one volume of a tape broke or was confirmed bad, and there was only one remaining copy of my data in existence, unchecked in years, I'd be biting my nails for the next 4-12 hours waiting to see if the restoration worked, just like during a RAID5 rebuild.

I know further mirroring (4-way, 5-way, 6-way) only makes sense with geographic distribution, but without parity tapes I'd want to stick 2-way mirrors in each offsite location in case it was the last location left in an emergency. Yet I'd rather have one set of, say, 10 data tapes with 3 parity tapes than a 2-way mirror: statistically safer, and it costs a bit less (13 tapes vs 20). The exact economics vary with the ratio of data to parity tapes, obviously. I'll probably make a spreadsheet sometime to compare different strategies, including the overhead of the server itself, to find the break-even point, but my back-of-envelope calculations show it seems to work for me somewhere north of 100TB of total data.

For me it's a comparison of throwing money at a problem versus accepting some additional hassle but being smarter, for superior protection with fewer total tapes for important data. A modern movie like Star Wars Rogue One used 1 petabyte of total data in its production; that'd be 56 LTO-9 tapes, costing about $9k. A 2-way mirror, $18k. Striped into 10 data / 3 parity, about $12k, saving six grand per place you wanted a 2-way mirror, in a third of the physical space, with greater protection.
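The back-of-envelope numbers above can be checked with a tiny calculator. Assumptions: 18TB per LTO-9 tape and roughly $160 per tape (street prices vary), with parity sets of 10 data + 3 parity tapes as proposed in this thread.

```python
# Compare tape counts and cost: N-way mirroring vs 10-data/3-parity sets.
import math

TAPE_TB, TAPE_COST = 18, 160  # assumed LTO-9 capacity and street price

def tapes_for(data_tb):
    return math.ceil(data_tb / TAPE_TB)

def mirror_cost(data_tb, ways=2):
    return tapes_for(data_tb) * ways * TAPE_COST

def parity_set_cost(data_tb, data_per_set=10, parity_per_set=3):
    data = tapes_for(data_tb)
    sets = math.ceil(data / data_per_set)
    return (data + sets * parity_per_set) * TAPE_COST

pb = 1000  # the Rogue One example: ~1 PB of production data
print(tapes_for(pb))        # 56 tapes for a single copy (~$9k)
print(mirror_cost(pb))      # 17920: ~$18k for a 2-way mirror
print(parity_set_cost(pb))  # 11840: ~$12k for 10-data/3-parity sets
```

Under these assumptions the parity-set layout saves roughly $6k per mirrored location, matching the figures quoted above.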

I expect the future dataset I'm planning for to be somewhere in the 100-900TB range before I feel forced to reanalyze. I could stretch it into the petabyte range, but I'd want to look at the time-overhead numbers by then, in case someone had to be hired. Right now this is a SOHO video storage solution for lots of high-res, high-bitrate video production, hopefully simple enough that I can do it all myself. I want to get back to shooting video, not playing sysadmin.
 

oneplane

Well-Known Member
Jul 23, 2021
844
484
63
I don't think the quantity of tapes is as relevant as you think it is. For disks the cost is pretty high, so using computation to save on the required amount of storage is a nice trade-off. But the whole point of tapes is (relatively) cheap bulk archival.

If you want more copies (for redundancy), just make more tape copies. If you want to have many redundant stores, just keep 10 copies on 10 different tapes in 10 different locations. There is no point in having parity tapes if each one holds 30TB (or more) each for less than $100.

RAID is a cost-reduction measure first, and the redundancy that comes with it is mostly a side-requirement of the inexpensiveness of the disks used. If you just want to not lose data, make copies. Redundancy in an array is for uptime (so availability, not durability), and you need redundancy because your array is fragile when built from inexpensive disks.

Now, if we take your Star Wars example: do you really think the cost of storing some tapes is something people will care about as long as it's below $100K? Probably not. Hell, they'd happily pay $700K if it gave a fraction of a percent better recovery chance. At some point, the importance to an owner and the scale of an operation make anything that might 'save some pocket change' really irrelevant.

In most cases of large archives, you don't specify what kind of parity level or error correcting code you want, you specify the durability and availability in a percentage with 5 decimals or more. And those numbers are generally also directly influenced by the insurer of the company or project related to that data.

Now, if you are a homelabber or someone trying to save a DVD rip collection, you're likely going to have around 90TB of data. That means 3 tapes. If you want redundancy, you simply buy 3 more tapes. And if single redundancy isn't enough, you buy 3 more. Then you have 9 tapes, which fit in a pretty small box. Or three even smaller boxes. Doing some parity or manual ECC really isn't going to help much. Even if you did squeeze out 1 or 2 tapes, you'd save very little money and space while halving the speed at which you can read the files, since you'd need to read the entire tape set before you could reconstruct the archive (if you use parity).

Perhaps if you're a large business but for some reason have very little money, you might do it so you can use 1000 fewer tapes, but then your recovery time will be multiple weeks longer, making the entire backup non-viable because you'll be bankrupt by then.

And lastly: there is no "array" in tape land. They have robots and libraries, which are sometimes called arrays, but they are not arrays like 'disk arrays'. This is because an array requires a read and write pattern that doesn't work for tapes. So if you were to hack something together, you'd have to read an entire tape, read another one, then read a parity one, store them on disk (so you'd need 90TB of disk), and then start reading the individual contents. You could try to do it in RAM, but you'd need 90TB of RAM. You could imagine storing individual files instead of backup archives, but then you'd be stuck for multiple years due to access-time latency, which adds up significantly at that many-TB level of storage.
 

aero

Active Member
Apr 27, 2016
Yes, please stop calling these arrays; they are not anything like traditional disk arrays that stripe and parity (RAID5, RAID6, RAIDZ, etc.). I think that's partly why people didn't understand what you were trying to do originally.
 
Unsure whether to defend and explain or drop it, but I'm mostly closing the topic because there doesn't seem to be public interest in continuing, and I'll keep my other solutions and ideas on the topic to myself as a solo project. I called it a redundant array of tape (not tape drives) because I thought I had figured out ways to write the tape sets with a single tape drive, but if there isn't public discussion interest it's irrelevant. I also said the money savings only matter on the small end, in exchange for some work and hassle; I didn't propose that Disney cares about saving a few thousand dollars, but I'm still a film student trying to bootstrap production, so it matters to me and might have mattered to others. I'll keep my future topics here to spinning rust and SSDs.

Other solutions exist so i'm not the only one https://manualzz.com/doc/o/st5pd/th...ait--redundant-array-of-inexpensive-tape--sup...