FreeNAS for nearline backup of SAN?

Perry

Member
Sep 22, 2016
A little background first: We are a small film restoration studio, typically dealing with files that are large. Very large. Either 250GB+ QuickTime files, or huge sets of image sequences (think 30MB/file x 130,000 for a single feature-length film). Our current setup is a homegrown 250TB SAN consisting of a 40Gb ethernet network and a CentOS server running four 10-drive hardware RAID-6 pools that are served up as various-sized iSCSI volumes to a TigerStore server running on Windows. It works well for us, has the performance we need, and has been great so far. Except for the occasional hardware failure. Usually these are minor - slap in a replacement drive, let it rebuild the pool over a weekend, and you're good. But we did have a RAID card fail at one point, and while the data would have been recoverable, it took too long to figure out that the problem was the RAID controller, and some data was lost to human error before it could be properly backed up.

I am not interested in running FreeNAS on the main SAN or changing that system in any way. It works well, and we're extremely happy with it.

Anyway, I've got a friend who runs a university datacenter that just moved and upgraded/consolidated all their cluster computing and storage systems. He called me and asked if I wanted "some servers" that they were about to recycle. We had been talking about building a system to be a kind of live backup for the SAN, to just mirror it in case things go pear-shaped in the future. So I said sure, I'll take them (not really knowing what I'd get), and a couple days later he shows up with a minivan full of Dell PowerEdge R515 servers with all the rackmount hardware. I'm thinking "Christmas!" -- and then we lugged it all up to the office and stacked it in the server room, and after my back stopped hurting, I'm thinking, "What the hell am I going to do with these?"

Each one has 12x 4TB SAS drives currently in RAID 6, so somewhere around 40TB usable per unit. There are 11 servers. Most have 10GbE PCIe cards in them, so we can easily integrate them into our existing network. While the 4TB drives in them provide more than enough aggregate storage to cover our SAN, that's a lot of machines to run and probably a lot of electricity. I'm thinking I might pare it down to 3-4 of these and start putting bigger drives in some of them. We use shucked WD EasyStores (8 and 10TB) in the SAN and they've been great. A quick look at the server's specs says it works with SAS or SATA (but is that SAS *and* SATA, or is it *or* -- as in, can I only put SAS drives in these machines?). I haven't even powered one of these up yet, let alone taken one apart to see what's inside.

But next week we're doing year-end backups of things and I'll have a little free time, so I was thinking of messing with one or two to do some testing.

My friend's suggestion is to use Gluster for this system, but I have zero experience with it and not a ton of time to experiment with something new. I've played with FreeNAS in the past, but for our primary SAN it wasn't suitable for the kinds of file sets we deal with. I'm thinking, though, it might be good for this backup system, which just needs to run quietly in the background, and where performance isn't as much of an issue.

So the question is - where should I start? Is FreeNAS the way to go or should I be looking at something else? What about software to back up the files on the SAN in some automated way?
 

kapone

Well-Known Member
May 23, 2015
What about software to back up the files on the SAN in some automated way?
And that is the million-dollar question.

How will you backup the SAN? Synchronous/Asynchronous? Automated/Manual? Onsite/Offsite?

Forget the hardware you got (for free!) for a second and try to figure out the logistics first.
 

Perry

Member
Sep 22, 2016
I mean, that to me seems like the least important part of this! We're not a multi-user environment that's really pounding on the SAN daily. We tend to work on 2-3 projects at once, and there are three people in the office - two of them might be working on the SAN at once, but usually it's just one.

But I guess I don't really care how the backup happens. Honestly, even if it's a simple scheduled incremental backup that only runs at night when nobody is here, we're good. We just want the server/cluster to be a bucket for the files on the SAN, as an insurance policy in case of disaster. Backups themselves could be run from anywhere - Windows, Mac, Linux. I don't care if it's commercial software if it's not too expensive, but I'm sure there's something open source out there that would meet our needs. It would all be on-site; there's no way we're moving that much data over our crappy internet connection. And it would need to be automated.
 

Perry

Member
Sep 22, 2016
When initially talking about this with my friend who provided the hardware, I was envisioning a single massive bucket with a directory for each folder on the SAN, and using rsync or something similar via cron to just sweep all the volumes and update them every day or two.
 

kapone

Well-Known Member
May 23, 2015
Does your TigerStore software have backup method(s)/Replication functionality?
 

gea

Well-Known Member
Dec 31, 2010
Just because something doesn't cost money doesn't mean it's free once time and energy are factored in. Nearly a dozen servers means a lot of machines that can fail and that need maintenance and power.

From a pure storage point of view, you could build a cluster over them. That might give huge performance and redundancy, e.g. via a cluster filesystem or by using them as an Amazon S3-compatible cluster. But the complexity remains: there are IT departments full of experts whose only job is to manage such a beast. I would not suggest it. If this should just work, keep it as simple as possible. I would use one of them as a simple backup system per video project, where 40TB should be enough, and power up only the backup server that is needed.

Using a ZFS system for backup is not a bad idea, as it gives superior data security with data versioning via snapshots. Using some sort of sync program to keep your data and backup in sync is also a good idea (together with snapshots to preserve earlier states of the data).
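Concretely, that snapshot-plus-sync combination might look like this on the backup box (the pool name "tank" and the SAN mount point are invented for illustration):

```shell
# One-off: create a plain ZFS dataset on the backup server to receive into.
# ("tank" is a placeholder pool name.)
zfs create tank/san-backup

# Nightly job: pull the current state over, then freeze it in a snapshot.
rsync -a --delete /mnt/san/ /tank/san-backup/
zfs snapshot tank/san-backup@$(date +%Y-%m-%d)

# Earlier states can later be browsed read-only under the hidden directory:
#   /tank/san-backup/.zfs/snapshot/<date>/
```

Since each snapshot only stores the blocks that changed, keeping weeks of nightly snapshots costs far less space than full copies.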

What will not work is using a RAID-6 adapter under ZFS; at the very least it is a bad idea, as you lose auto-repair on detected checksum errors, one of ZFS's biggest features. You should at least replace the RAID-6 adapter with a dumb HBA, e.g. an LSI-based one with a 2008, 2308 or 3008 chipset. You can get them cheap (used).

The remaining problem is how to keep large files, or many files, in sync at decent performance. Pure rsync is slow, and any other sync method over SMB, e.g. via Samba, is also slow. A ZFS/kernel-based SMB server on Solaris may be a little faster, but I would go a different path for a pure backup system of large or many files.

If you look around at commercial offerings, Amazon S3 is the leading one for high-performance, high-capacity storage. MinIO is an open-source in-house option based on the same S3 protocol, with superior performance. You can install it on any storage server. To keep your files in sync with an S3 bucket, you can use one of the S3 backup tools for Linux, OSX or Solaris, or rclone as an rsync replacement with superior performance.
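A rough sketch of the MinIO + rclone route (the endpoint, bucket name, and credentials below are placeholders):

```shell
# ~/.config/rclone/rclone.conf -- an S3 remote pointing at an in-house MinIO box:
#   [minio]
#   type = s3
#   provider = Minio
#   access_key_id = CHANGEME
#   secret_access_key = CHANGEME
#   endpoint = http://backup01:9000

# Mirror one SAN volume into a bucket. rclone runs many transfers in
# parallel, which is where the speedup over single-stream rsync comes from.
rclone sync /mnt/san/project1 minio:san-backup/project1 --transfers 16 --checksum
```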

To access data or to restore, you can also use rclone, or simply SMB. To access snapshots you can use Windows and "Previous Versions". At least on Solaris with its kernel-based SMB server, this is always trouble-free with zero config.

See my how-to (for a Solaris-based ZFS system); the basics are always the same: http://www.napp-it.org/doc/downloads/cloudsync.pdf

There is also https://forums.servethehome.com/index.php?threads/amazon-s3-compatible-zfs-cloud-with-minio.27524/
 

Perry

Member
Sep 22, 2016
Does your TigerStore software have backup method(s)/Replication functionality?
No. TigerStore serves volumes over the 40Gb network (or fiber, if we had a Fibre Channel network, which we don't) to Mac, Windows and Linux workstations and servers, each seeing the mounted shares as if they were native disks. Permissions and locking are at the file level, so it allows us to share volumes simultaneously. But most importantly, it's tuned for the kind of performance we need and it simply works, and works reliably. We've been using it daily for about 2 years now.

But no, it doesn't have any kind of backup or replication built in.
 

Samir

Well-Known Member
Jul 21, 2017
So, in the effort to use the KISS principle, along with my relative ignorance of better ways to do it, here's what I would do in your shoes.

First, I would analyze the size of the daily data that is 'new' on each of the 10-drive pools. Then I would figure out if I can transfer this overnight to an identical pool.

Next, I would prep 8 servers, each running CentOS with a hardware RAID-6 pool. The goal here is that each server should match the capacity of a single 10-drive pool on your SAN.

Then, the challenge is to find the fastest way to replicate each of the SAN pools nightly to its respective server. The goal here is to replicate everything new overnight so you have a backup of yesterday. But since you have 2x the number of pools you need, you alternate between one set of 4 servers and the other set of 4. Could be daily, could be weekly--whatever alternation you would like. And since these servers are only needed for one task at one time, you could set them to power on and then power off after the job. This gives you 2x backups that shouldn't use much power except when they're working.
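The wake, replicate, power-off cycle could be scripted roughly like this (the hostname and MAC address are made up; it assumes Wake-on-LAN is enabled on the Dells and passwordless SSH is set up):

```shell
#!/bin/sh
# Nightly rotation: wake tonight's backup target, replicate, shut it down.
TARGET=backup-a                 # placeholder hostname for tonight's server
MAC=00:11:22:33:44:55           # placeholder MAC address for Wake-on-LAN

wakeonlan "$MAC"                # or power it on via ipmitool and the iDRAC
sleep 300                       # give the server time to boot

# Mirror one SAN pool to its matching server.
rsync -a --delete /mnt/san/pool1/ "$TARGET":/backup/pool1/

ssh "$TARGET" sudo poweroff     # dark again until its next turn
```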

You still have 3 servers left.

With 2 of the remaining servers, you could again create pools, except use larger drives so as to back up 2 pools on each server. Put them in rotation with the other servers and now you have 4 days of backups, and potentially less power usage, if powering on only 2 servers makes that much of a difference.

Keep 1 server spare for parts.

Depending on how much storage you want to pack into each server, once 20TB drives are available, you could literally have a single server back up all 4 pools nightly. And you could have each server only power up every 10 days, and you'd literally have a rolling 10-day backup of everything. Now, the drive cost alone of 120 drives at $500/ea is $60k, but I take it the work you are doing costs quite a bit per hour, so downtime costs a hefty amount too.

The good thing about this solution is you can literally start with the hardware you have, and then if it works, invest in the drives to scale back on power usage at whatever rate makes sense. And then, as you feel comfortable, scale up with more servers in the backup pool to give you more days of backups. The great side effect of a bunch of days of 'snapshots' is that human errors can also be solved (like when files were deleted from the pool) by just going back a few days and retrieving the files off the appropriate server.

This is actually what I'm doing for backups, except my data set is a lot smaller and I'm just using basic consumer NAS units. They power on when they are needed and are off otherwise. They should be good when I need them, and even if one isn't, I've got others, as I will have almost 14 days of backups. The other good thing about shutting off a server is that it also can't be hit with a ransomware attack.