Backup and archiving: HDDs or tape?

aag

Member
Jun 4, 2016
52
3
8
I need to back up several TBytes/week, ideally through a differential or incremental strategy. In addition, I need to archive hundreds of very large medical imaging files (500 GByte each). You may have seen the details of my project in the network subforum.

My question is: should I use a tape backup system? Or should I buy a bunch of HDDs and perform the backups using a hot-swappable drive bay?

The options and prices are as follows:

Tape: A Tandberg LTO-8 autoloader can hold 24 tapes. Price: approx. $6,000.
Each tape can hold 12 TBytes (raw); a pack of 20 tapes costs $3,600 ($180/tape). Hence we are at $15/TByte, plus the tape device, plus the inconvenience that it will be hard to send tapes to collaborators, since hardly anybody is likely to have a reader.

HDDs: Currently a 14 TByte drive is approx. $600, hence ca. $43/TByte. This is almost 3x the cost of tape, but HDDs are highly portable and do not need any obscure software or additional hardware. I suspect (but I may be wrong) that they may also be faster than tape, at least for random access.

The breakeven point is at ca. 14 HDDs (14 TByte each, hence 196 TBytes; $8,400) vs. 16 tapes holding 192 TBytes ($8,880 including the autoloader). This is roughly the amount of data that we produce in 1 year. Beyond that amount of data, tape becomes much cheaper than HDDs.
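As a rough cross-check of this arithmetic, here is a minimal Python sketch that uses only the prices quoted above ($6,000 autoloader, $180 per 12 TB tape, $600 per 14 TB drive). It assumes media is bought in whole units and ignores everything else (server, drive bay, software, spares), so treat the output as a ballpark rather than a quote.

```python
import math

AUTOLOADER_USD = 6000          # one-off cost of the tape autoloader
TAPE_USD, TAPE_TB = 180, 12    # per LTO-8 cartridge (raw capacity)
HDD_USD, HDD_TB = 600, 14      # per hard drive


def tape_cost(tb):
    """Autoloader plus as many whole tapes as needed for `tb` terabytes."""
    return AUTOLOADER_USD + math.ceil(tb / TAPE_TB) * TAPE_USD


def hdd_cost(tb):
    """As many whole drives as needed for `tb` terabytes."""
    return math.ceil(tb / HDD_TB) * HDD_USD


def break_even_tb():
    """Capacity at which both options cost the same, ignoring whole-unit rounding."""
    return AUTOLOADER_USD / (HDD_USD / HDD_TB - TAPE_USD / TAPE_TB)


for tb in (100, 200, 300, 400):
    print(f"{tb:>4} TB  tape ${tape_cost(tb):>6,}  HDD ${hdd_cost(tb):>6,}")
print(f"continuous break-even: about {break_even_tb():.0f} TB")
```

With unrounded per-TByte prices the crossover lands a little above 200 TBytes, i.e. still roughly one year of data.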

My question to the community is: what should I go for? Tape or HDDs? Are there any issues that I am not considering? Thanks for any and all advice!
 

EffrafaxOfWug

Radioactive Member
Feb 12, 2015
1,285
439
83
You should point out distribution in the thread title as well if that's to be one of the considerations too. Personally I think you probably need to consider both approaches, using HDDs for distribution and tape for backup (but that's just me as I like tape).

HDDs probably make the most sense, assuming the people you're distributing files to don't have access to an LTO drive. Hard drives will almost certainly be faster than tapes: although you'll be limited to 100-200MB/s maximum, the data won't need to be copied off into a staging area first, and random seeks (although from the sounds of your workflow those will be rare) will be an order of magnitude faster. You'll need to take some care in transporting the HDDs, as they're less physically resilient than tapes.

Just out of curiosity, how compressible are the files? I know they're an image format but my knowledge of most medical image formats is that they're usually uncompressed, or else using JPEG2k lossless.

Do the files actually change much in the course of being used? This'll have a large bearing on how much backup space you'll need, as well as backup software managing it. You don't want to be copying seventy-three different versions of a 500GB file where only 1MB of metadata or doctor's markup has changed... most of the backup agents I've used before aren't clever enough to figure this out unless they have a dedicated function for particular file/DB formats (which is why things like ZFS/btrfs or other block-based snaps are very useful). Have you actually tried your backup methodology yet to see how the files behave?
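To illustrate why block-level tracking helps here, a minimal Python sketch follows. It is not how ZFS or btrfs implement snapshots (those are copy-on-write filesystems); it only shows the general idea of comparing fixed-size blocks by hash so that a small edit marks a small part of the file as changed. The 4 KiB block size, the function names and the 1 MiB stand-in file are all assumptions made for the example.

```python
import hashlib


def block_hashes(data: bytes, block_size: int) -> list[str]:
    """SHA-256 digest of each fixed-size block of `data`."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old: bytes, new: bytes, block_size: int = 4096) -> list[int]:
    """Indices of blocks in `new` that differ from (or do not exist in) `old`."""
    old_h = block_hashes(old, block_size)
    new_h = block_hashes(new, block_size)
    return [i for i, h in enumerate(new_h)
            if i >= len(old_h) or old_h[i] != h]


original = bytes(1024 * 1024)                  # 1 MiB stand-in for a huge image file
edited = bytearray(original)
edited[10_000:10_016] = b"doctor's markup!"    # overwrite 16 bytes in place

dirty = changed_blocks(original, bytes(edited))
print(f"{len(dirty)} of {len(original) // 4096} blocks changed: {dirty}")
```

A file-level backup agent would copy the whole file again, while a block-level scheme only needs the changed block. Note that fixed-size blocks only work this well for in-place edits; inserting bytes shifts every later block.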

For backups and archiving though, LTO can't be beat, but a lot depends on timescales here. How long would these need to be archived for, how many versions do you need to keep and how much shit are you in if the backups don't work? Some hard drives can have "stiction" (mostly stuck bearings) problems after a long time dormant, and if you're moving things about a lot you'll likely run into higher failure rates than you would otherwise, so I'd be very wary of having a file on only a single hard drive.
 

gregsachs

Active Member
Aug 14, 2018
320
88
28
Would it be possible to back up to the cloud, i.e. Backblaze/Amazon/etc.? Amazon just announced their Glacier Deep Archive service at $0.001/GB/month, or $200/month for 200TB.
 

cesmith9999

Well-Known Member
Mar 26, 2013
1,211
364
83
You need to think about your strategy first. Figuring this out before looking at the cost will save you $ in the long run.

A typical strategy is the 3-2-1 system:

3 copies
2 media types
1 offsite and offline
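The rule is easy to check mechanically against a planned set of copies. A small Python sketch, where the `Copy` class, its field names and the example plan are all made up for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    media: str      # e.g. "hdd", "tape", "cloud"
    offsite: bool   # stored away from the primary site
    offline: bool   # not reachable from the network


def satisfies_3_2_1(copies):
    """True if there are >= 3 copies, on >= 2 media types, and >= 1 is offsite and offline."""
    return (len(copies) >= 3
            and len({c.media for c in copies}) >= 2
            and any(c.offsite and c.offline for c in copies))


plan = [
    Copy("hdd", offsite=False, offline=False),   # working copy on the file server
    Copy("tape", offsite=False, offline=False),  # tape sitting in the autoloader
    Copy("tape", offsite=True, offline=True),    # second tape set on a shelf elsewhere
]
print(satisfies_3_2_1(plan))
```

Dropping the shelf copy from the plan makes the check fail on both the copy count and the offsite/offline requirement.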

Recovering deleted data is MUCH more expensive than any cost in backup software.

Also make sure that you look into cloud backups. There are many solutions, but you need to look at $ there, as there are many costs that show up unexpectedly.

Chris
 

dandanio

Active Member
Oct 10, 2017
147
50
28
Plus some technologies are HIPAA/SOX/PCI compliant and some are not. Make sure that whatever you come up with is compliant with whatever you are audited against.
 

aag

Thanks to everybody. Some clarifications and answers to questions:
- These are research data. There is no audit.
- Compressibility is 20% at most.
- Cloud storage would be nice, but I fear that transfers will be too slow. We moved a few terabytes from Switzerland to Stanford, and it was a royal pain. We ended up fedexing an HDD!
- Files will hardly ever be modified, and retrievals from archives will be rare (but will certainly happen). Hence random access may not be crucial.

All in all, I am leaning towards tape. What are the arguments against that (if any)?
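Whether cloud transfers are "too slow" can be estimated up front. Here is a back-of-the-envelope Python sketch; the link speeds and the 80% efficiency factor are assumptions for illustration, not measurements of any real connection.

```python
def transfer_days(terabytes, megabits_per_second, efficiency=0.8):
    """Days needed to move `terabytes` (decimal TB) over a link of the given speed.

    `efficiency` is an assumed fraction of the nominal link speed that is
    actually achieved after protocol overhead and contention.
    """
    bits = terabytes * 1e12 * 8
    seconds = bits / (megabits_per_second * 1e6 * efficiency)
    return seconds / 86_400


for mbps in (100, 1000, 10_000):
    print(f"4 TB at {mbps:>6} Mbit/s: {transfer_days(4, mbps):5.2f} days")
```

Under these assumptions, a week's worth of data (4 TB) ties up a 100 Mbit/s uplink for more than four and a half days, while a 1 Gbit/s link needs about half a day.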
 

cesmith9999

A few more things to think about: what are your long-term retention needs? Is this only for a short-term project of 1-2 years, or longer?

Transferring files to the cloud is not the same as transferring them across the globe.

Do you also have a need to send files across the globe? Then a cloud-based solution may be a better answer.

Planning for an Azure File Sync deployment

there are other solutions like this one.

Chris
 

EffrafaxOfWug

If you can stomach the up-front cost for both the tape backups, and tape drives for restores and for clients, and backup software if you don't have it already, then tape is likely the best option for all your use cases. It also makes expanding your backup much cheaper in the (surprisingly common!) event that you need more backup capacity or copies than you imagined.
 

aag

Do modern LTO-8 tapes need to be magnetically "refreshed"? If so, how often? My policy for the past 25 years has always been to keep all raw research data for at least 10 years. However, in 10 years I will probably be retired (or dead), hence I would not need much longer than that.
 

cesmith9999

Magnetic tapes and HDDs do not need to get refreshed. However, they have a different issue: obsolescence.

SSDs do need to get powered on once in a while.

There have been many times that I was handed a tape to restore, then had to find a drive (equal to or newer than the one it was written on) to restore it from... and figure out the software that was used to back it up.

Right now drives are going in two directions:

1) fast and furious - SSDs in the NVMe realm
2) big and slow - large HDDs.

Tape drives are adding more capacity to existing tape formats, just like HDDs and SSDs, and adding new formats. In 10 years you will be lucky to find a tape drive that is able to read your tapes.

Cloud has its own problems... What happens when the provider goes out of business, or you need to transfer to a different cloud provider? How fast or slow can you transfer files/backups?

Chris
 

EffrafaxOfWug

LTO theoretically gives you wiggle room on obsolescence though - they maintain backwards compatibility for reads for two generations, i.e. an LTO9 or LTO10 tape drive should be able to read an LTO8 tape (there are some caveats to this, esp. re: the media change from LTO7 to LTO8, but those shouldn't affect you here). The LTO5 (~2010) drives at work are able to read some old LTO3 (~2005) tapes we dug out. If the data's remotely important, you'll read in your old tapes and write them out to new ones whenever you replace/upgrade your tape drives.

The tapes themselves should last at least 15yrs assuming they're kept in relatively good (i.e. cool and dry) conditions.

SSDs I'd stay well clear of for the time being as they're just not designed for archival purposes. One of my first ye olde SSDs survived entirely intact after 5yrs powered off, but TLC and especially up'n'coming QLC are much less forgiving integrity-wise. Some of you might recall the problem Samsung had with static data on their TLC drives (840 series was it?) necessitating a firmware fix that continually moved "old" blocks around internally.
 

kapone

Well-Known Member
May 23, 2015
814
399
63
Lemme do some rough calculations...

- You need to back up "several" TB worth of data every week.
- The breakeven point between HDD and Tape is ~192TB.

At a conservative estimate of:
- 3TB/week x 52 weeks = 156TB/yr = Breakeven point reached in a year and 3 months (approx)
- 4TB/week x 52 weeks = 208TB/yr = Breakeven point reached in less than a year.

This calculation assumes no compression, so based on your estimate of 20% compression we can adjust accordingly.
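The adjustment for compression can be sketched in a few lines of Python. The function name is made up for this example, the ~192TB breakeven figure is taken from above, and the 20% figure is treated as a flat reduction in stored volume, which is a simplification.

```python
def years_to_break_even(break_even_tb, tb_per_week, compression_saving=0.0):
    """Years until the stored volume reaches the breakeven capacity.

    `compression_saving` is the assumed fraction by which compression
    shrinks the data before it is stored (0.2 means 20% smaller).
    """
    stored_per_year = tb_per_week * 52 * (1 - compression_saving)
    return break_even_tb / stored_per_year


for rate in (3, 4):
    raw = years_to_break_even(192, rate)
    packed = years_to_break_even(192, rate, compression_saving=0.2)
    print(f"{rate} TB/week: {raw:.2f} yr uncompressed, {packed:.2f} yr at 20% compression")
```

At 20% compression the breakeven moves out by a quarter: roughly 1.5 years at 3TB/week and a bit over 1.1 years at 4TB/week.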

My point is that in approx one year, you'll have reached your breakeven point in terms of cost. If this project/endeavour is a long term thing...I think the answer is obvious. An HDD based mechanism will be far easier to operate, maintain, be more portable and will have much higher longevity than tape drives.

p.s. You have not accounted for an actual server to house these HDDs for backup purposes. There's some cost to that as well.

This all being said, if you're generating ~200TB/yr that you need to store for a relatively long time, you should take all estimates you have/get and multiply them by 3x to 4x. That quantity of data WILL require thinking/planning/infrastructure/maintenance beyond what you're envisioning. And we're still at only a single backup of the data... :) If this data is critical/very hard to regenerate, I'd want at least two copies of it, with one offsite. Now we're talking multiple sites, networks, etc.
 

Evan

Well-Known Member
Jan 6, 2016
3,142
530
113
In this instance I would be using tapes for backup; the volumes make sense.
Then some disks for transporting data when needed.
Cloud backups are cheap, but getting data back is not cheap or that fast... still, you hope never to need that.
 

kapone

Evan said: "In this instance I would be using tapes for backup; the volumes make sense. Then some disks for transporting data when needed. Cloud backups are cheap, but getting data back is not cheap or that fast... still, you hope never to need that."
I really wouldn't. The math just doesn't work out, even leaving aside portability and longevity issues.

You don't see Facebook or Google backing up to tape, do you?
 

Evan

kapone said: "I really wouldn't. The math just doesn't work out, even leaving aside portability and longevity issues. You don't see Facebook or Google backing up to tape, do you?"
Actually yes. Facebook, at least, used to have a final tape backup until recently (I can't speak for right this moment). I'm not saying it's done as often as the disk backups with snapshots etc., but there is certainly some data on tape as a final backup.

One thing I prefer about tapes compared to all those disks is storage. It's just easier, from labeling to weight and how they would sit on a shelf.

I know they mentioned compression was only 20%, but I wonder whether you couldn't use the new disk compression (VDO) for better than 20%; at the same time, most tapes manage at least 1.5:1 if not 2:1 compression.
 

Terry Kennedy

Well-Known Member
Jun 25, 2015
1,068
508
113
New York City
www.glaver.org
kapone said: "I really wouldn't. The math just doesn't work out, even leaving aside portability and longevity issues."
I would (and do). I got a 48-slot library and LTO-6 drive/sled, both brand new, for well under $1000 total several years ago. I had to wait for that deal, but when it came along I stocked up on the $295 libraries (Dell TL4000, AKA IBM TS3200). It is a simple matter to change out the LTO sled for a newer model when you want to upgrade, and as was pointed out, LTO generally reads 2 generations back (and writes 1 generation back). If you wait until a new generation comes out before purchasing the previous generation, you can take advantage of the greatly reduced demand for drives and media, since many customers jump on the "latest and greatest" bandwagon right away and reduced demand = reduced prices.

Tapes are designed to go in and out of drives, often with robotic assistance (like the above library). You probably don't want to manually swap hard drives multiple times during a multi-volume backup, while the tape library does that for you.

LTO drives support transparent encryption (if you add that option) and compression, both in hardware. WORM media is available if you need to have auditable integrity. Barcode labels are readily available (or can be printed by the user) which makes organizing and inventorying of tapes very easy. And the tapes have cartridge memory chips which lets you automate those processes - just wave the tape over a reader connected to your host system / notebook / whatever and you get the complete usage history of the media.

Regardless of the system chosen (tape, disk, 80-column punch cards, etc.) the main issue with restoring the data in the far future will be the ability to understand the backup format used. I have systems which regularly read data off IDE (pre-PATA) drives, and I could probably read ST-506 MFM, RLL and ESDI drives with less than an hour's work. Likewise for most QIC formats. But the software that wrote those backups is long gone, and even if I had it, it wouldn't run on any of my systems. Even NASA had to send their old media to an obsolete-format specialist to recover the data. My LTO backups are written with GNU tar, and I expect that format will continue to be common well into the future. And if it isn't, I have the source code for it and can port it to newer systems. [I have a favorite editor that I've been dragging around since 1990 - it started on a VAX 750 running 4BSD and has moved through many different iterations of hardware and operating system, and I'm now using it on FreeBSD 12 on amd64. So supporting GNU tar is definitely do-able.]
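For anyone curious what "written with GNU tar" means in practice, here is a small round-trip sketch using Python's standard `tarfile` module, which can write the GNU tar format. The in-memory buffer is only a stand-in for a tape device or archive file, and the file names and contents are invented for the example; this is an illustration of the format's portability, not the setup described above.

```python
import io
import tarfile


def write_gnu_tar(members: dict[str, bytes]) -> bytes:
    """Pack name -> content pairs into an uncompressed GNU-format tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for name, payload in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_tar(archive: bytes) -> dict[str, bytes]:
    """Read every regular file back out of a tar archive held in memory."""
    out = {}
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        for member in tar.getmembers():
            if member.isfile():
                out[member.name] = tar.extractfile(member).read()
    return out


files = {"scan_0001.raw": b"\x00" * 2048, "notes/readme.txt": b"raw research data\n"}
archive = write_gnu_tar(files)
print(read_tar(archive) == files)
```

Because the format is simple and openly documented, an archive like this can be read back by any tar implementation without the original backup software.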
 

sovking

Member
Jun 2, 2011
57
5
8
Terry Kennedy said: "I would (and do). I got a 48-slot library and LTO-6 drive/sled, both brand new, for well under $1000 total several years ago. I had to wait for that deal, but when it came along I stocked up on the $295 libraries (Dell TL4000, AKA IBM TS3200)."
I have no experience with this, so it's not clear to me.
If I wanted to get a tape library, are you saying that I need a library like this: IBM System Storage TS3200 Tape Library Chassis / Bandbibliothek - 3573 L4U | eBay

and an LTO-6 drive/sled like this: IBM TS3100 / TS3200 Sled For LTO6 FH FC Module 45E2389 ( No Tape Drive Include ) | eBay

Nothing else? (That's less than $500 + shipping.)
Will they be compatible? How do I check the compatibility of a library with sleds? Same brand?
Or am I missing something?

Thanks
 

Terry Kennedy

sovking said: "I have no experience with this, so it's not clear to me. If I wanted to get a tape library, are you saying that I need a library like this: IBM System Storage TS3200 Tape Library Chassis / Bandbibliothek - 3573 L4U | eBay"
Yes. There are a couple of things to be aware of:

1) Used libraries will be subject to expensive (more than you pay for the library) damage if not shipped with the picker lock in place - see here for an easy way to do it with a paper clip.
2) Other damage can happen if the library is shipped with tapes in it, or if it is not properly packaged.
3) You can't run the library on a table or other non-rackmount unless you put rubber feet on it - if you don't, the bottom will bend up in the middle and jam the picker. So if you are planning on rack-mounting it, make sure it comes with the rails.
4) When you buy a chassis, you probably won't get the rear filler plates. Those will cost you US $50 or so each, when you can find them. You can just tape cardboard over the opening(s) - the idea is to maintain the cooling airflow, not to look nice.
5) Some of these libraries are being sold because they are very well-used. Some of the things the web interface can show you are the firmware version (see #6 below) and the "number of moves" which is the number of times the robot has moved a tape. I'd probably get worried if it got much above 20,000 or so. Less important is "power on hours", but that can also give you an idea how long the library has been in service.
6) IMPORTANT: The "brand" (Dell, HP, IBM, etc.) on the library (and drive, though it doesn't have to match the library) controls how often (and even if) you can get firmware updates. You can't easily cross-flash one version of the library to another, so if you get a HP or IBM one, you can't get firmware updates from them without a service contract. There's a workaround for IBM units - Lenovo re-sells them and doesn't restrict their firmware downloads (at this time; that may change in the future).

sovking said: "and an LTO-6 drive/sled like this: IBM TS3100 / TS3200 Sled For LTO6 FH FC Module 45E2389 ( No Tape Drive Include ) | eBay. Nothing else? (That's less than $500 + shipping.)"
That's a bare sled that someone took the drive out of and sold as a standalone drive. You need a sled with a drive, substantially more expensive. Also note that that is a full-height fiber channel sled, not a SAS sled. Unless you have a FC network, a FC sled / drive is not useful to you.
sovking said: "Will they be compatible? How do I check the compatibility of a library with sleds? Same brand? Or am I missing something?"
All of these IBM-built libraries / drives are compatible. I am running a Dell chassis (one of the $295 new ones from a few years back) and an IBM drive. Other than needing to get firmware updates from 2 different places for the library and drive, it works fine. You can't use a non-IBM-built drive mechanism in the library, but as IBM is one of the biggest makers of LTO drives this doesn't limit your choices too much.
 

Blinky 42

Active Member
Aug 6, 2015
568
203
43
44
PA, USA
sovking said: "I have no experience with this, so it's not clear to me. If I wanted to get a tape library, are you saying that I need a library like this: IBM System Storage TS3200 Tape Library Chassis / Bandbibliothek - 3573 L4U | eBay, and an LTO-6 drive/sled like this: IBM TS3100 / TS3200 Sled For LTO6 FH FC Module 45E2389 ( No Tape Drive Include ) | eBay. Nothing else? (That's less than $500 + shipping.) Will they be compatible? How do I check the compatibility of a library with sleds? Same brand? Or am I missing something?"
That is just the library itself and a sled for a full-height tape drive. You would need to add an actual LTO drive to yield a working setup.
The specs for the library:
www.redbooks.ibm.com/technotes/tips1342.pdf

Edit: What a sled looks like with a drive in it (not recommending the listing, it's just the first one I found to compare): NEW DELL IBM LTO-7 SAS HH 38L7456 M3HCC Tape Drive for TL2000 and TL4000 | eBay. The half-height drives have 1 of the blue screws on each side; the full-height ones are 2x as high and have a sled with 2 screws on each side.
 

Angus

Member
Mar 3, 2015
36
4
8
39
What cheap program can be used to create an image before backing up to tape? Basically I have an LTO-6 drive and would like to use LTFS, but my issue is slow speed because I am backing up many small files/photos.

What I would like to do is build an image and then read from that so it's fast, but still write in LTFS... not sure if that is possible, but I hope so.