Reyhn's SOHO: Single or Dual RAID setup?


Reyhn

New Member
Hello!

This is my first post here, but I've been lurking on this site for years, fantasizing about having my own network/servers. :) I'm considering building a SOHO setup, which includes a network as well as servers. I have so many questions and ponderings, but I'll start with the storage needs.

I have two different storage needs:

1) Critical data
  • Data loss is unacceptable
  • Low data quantity
    For the foreseeable future, 2-3TB should be enough.
  • Many small files
    Measured in kB, not GB.
  • High availability
    I would like to replace my local hard drive with a network share.

2) Bulk data
  • Data is replaceable, but data loss should be avoided (but not at any cost)
  • Large data quantities
  • Large files
    Temporary storage of video editing materials, which amounts to many TBs.
  • Low availability
    Few concurrent users; low transfer speeds are OK.

The critical data is personal files and the like, which are irreplaceable, whereas the bulk data is more about plain volume and probably not very long-lived. I hope I made the distinction between my two needs clear.

(Note: I will have an off-site backup for the critical data, so this question is focused on the server design - not how to achieve absolute data reliability.)

Having no experience in setting up RAID and maintaining a file server, I did my research and came to a proposed solution:

  • Use two separate RAID arrays
  • For the critical data, ensure redundancy
  • For the bulk data, use storage space-biased RAID

I haven't decided on exact details yet, which is why I need help evaluating my options:

For the critical data:
  • RAID-1 with two identical disks.
    Seems cheap. No possibility to extend storage space, though.
  • RAID-10
    A reliable solution with performance, but at a cost due to many disks.
  • RAID-5
    Maybe insufficient reliability?
  • RAID-6
    Expensive, and maybe too much?
  • RAID-Z2
    FreeNAS supports it, and it appears to be very well thought of.

For the bulk data
  • RAID-5
  • RAID-6
    Will likely be very expensive, depending on the storage space required.
  • RAID-Z1
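
To make the trade-offs in the two lists above concrete, here is a minimal sketch that compares usable capacity and guaranteed fault tolerance per RAID level. It assumes identical disks, two-way mirrors for RAID-10, conventional minimum disk counts, and no filesystem overhead; the function names and the sample disk counts are illustrative, not taken from any tool.

```python
# Rough comparison of RAID levels: usable space and guaranteed fault tolerance.
# Assumptions: identical disks, two-way mirrors in RAID-10, no filesystem overhead.

PARITY_DISKS = {"raid5": 1, "raidz1": 1, "raid6": 2, "raidz2": 2}


def usable_tb(level: str, disks: int, disk_tb: float) -> float:
    """Approximate usable capacity in TB for a single array."""
    if level == "raid1":
        if disks < 2:
            raise ValueError("RAID-1 needs at least 2 disks")
        return disk_tb  # every disk holds the same data
    if level == "raid10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID-10 needs an even number of disks, at least 4")
        return disks / 2 * disk_tb  # half the disks are mirror copies
    parity = PARITY_DISKS[level]
    if disks < parity + 2:
        raise ValueError(f"{level} needs at least {parity + 2} disks")
    return (disks - parity) * disk_tb


def guaranteed_failures(level: str, disks: int) -> int:
    """Number of disk failures the array survives in the worst case."""
    if level == "raid1":
        return disks - 1
    if level == "raid10":
        return 1  # a second failure in the same mirror pair is fatal
    return PARITY_DISKS[level]


# Hypothetical example: 4 TB disks, 2 for the mirror and 6 for everything else.
for level, disks in [("raid1", 2), ("raid10", 6), ("raid5", 6), ("raid6", 6)]:
    print(level, disks, usable_tb(level, disks, 4.0), guaranteed_failures(level, disks))
```

The sketch only covers space and worst-case disk loss; it says nothing about rebuild times or performance, which also differ between the levels.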

I would very much like your opinions on this setup.
Is this a stupid idea (two separate arrays) or am I on the right track?
Would it be better to just use one giant RAID-array for everything? What RAID-level would I use then?
 

i386

Well-Known Member
Just to be clear:
Raid 5 = raid z1, single parity
Raid 6 = raid z2, double parity
Raid is not a backup!

Is this a stupid idea (two separate arrays) or am I on the right track?
There is no definitive answer for that.
You wrote that data loss is not acceptable and about high availability. Whenever I read that combo I think of dual-port SAS drives, 2 HBAs/RAID controllers, UPS, redundant servers etc. This is overkill for a SOHO setup and would be very expensive.

Would it be better to just use one giant RAID-array for everything? What RAID-level would I use then?
I had the same problem a while ago and solved it for my case with a huge raid 6 (or raid z2/z3 with zfs) with 2 hot spares and maxcache (ssds as read & write cache to improve the performance).
 

Patrick

Administrator
Staff member
On the "Bulk data" part, my only advice is to stop thinking of a large quantity of data as easy to replace. I used to think similarly. The reality is that this data is valuable, and takes a long time to replace from backup sources.

With that said, investing in an online backup solution may be worthwhile.
 

Evan

Well-Known Member
For your critical data you may consider investing in mirrored SSDs for performance (especially with those small files), and also because SSDs are generally more reliable than spinning disks.

Make sure you have a media-break backup, maybe just some portable disks with a full copy rotated offsite every month or so, but don't rely completely on RAID: there is too much chance of a virus or something else leaving you with no data at all. Online backups could be a good option as the offsite choice.
 

Reyhn

New Member
Just to be clear:
Raid 5 = raid z1, single parity
Raid 6 = raid z2, double parity
Raid is not a backup!
We're clear :)

You wrote that data loss is not acceptable and about high availability. Whenever I read that combo I think of dual-port SAS drives, 2 HBAs/RAID controllers, UPS, redundant servers etc. This is overkill for a SOHO setup and would be very expensive.
I definitely agree that what you described is overkill! :) I may have used the term "high availability" in the wrong sense. What I meant was more along the lines of low latency, maximum network speed (client PCs have 1GbE-network cards) and that the data is "always" available (i.e. no waiting for hard drives to spin up, or servers to boot etc).

I had the same problem a while ago and solved it for my case with a huge raid 6 (or raid z2/z3 with zfs) with 2 hot spares and maxcache (ssds as read & write cache to improve the performance).
Thank you! That is very interesting to hear!
Also, the cache is a great idea!

On the "Bulk data" part, my only advice is to stop thinking of a large quantity of data as easy to replace. I used to think similarly. The reality is that this data is valuable, and takes a long time to replace from backup sources.

With that said, investing in an online backup solution may be worthwhile.
Wise words that I'll listen to.
I was trying to highlight that my "bulk data" (for lack of a better word), while valuable, is not as valuable as the critical data. Losing one of my private files may or may not be a personal catastrophe, but losing some of the bulk data is not the end of the world.

For your critical data you may consider investing in mirrored SSDs for performance (especially with those small files), and also because SSDs are generally more reliable than spinning disks.
Really?! SSDs are more reliable? I wouldn't have guessed. But then they must be more short-lived? I'm thinking that the physical media will be tormented to death by RAID, the same way a defrag program would do to an SSD?

Make sure you have a media-break backup, maybe just some portable disks with a full copy rotated offsite every month or so, but don't rely completely on RAID: there is too much chance of a virus or something else leaving you with no data at all. Online backups could be a good option as the offsite choice.
+1 for mentioning viruses, which I hadn't considered!

You all mentioned backups.
From the beginning, I was planning to make regular, automated backups of the critical data to an online service. I think that would be feasible even though the volume might be 1-3TB of data.

However, backing up the bulk data as well could become troublesome if the volumes are large. That could literally take weeks to upload a single backup (in the worst case scenario). Maybe I need to rethink what I am trying to achieve...
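
As a back-of-the-envelope check on the "weeks to upload" worry, a minimal sketch follows. The uplink speeds, the 80% efficiency factor and the function name are assumptions chosen for illustration; the thread does not state the actual connection speed.

```python
# Estimate how long a full backup upload takes.
# Assumptions: decimal terabytes, a steady uplink, and a rough efficiency
# factor to account for protocol overhead and other traffic.

def upload_days(data_tb: float, uplink_mbit_s: float, efficiency: float = 0.8) -> float:
    """Days needed to upload data_tb terabytes over an uplink of uplink_mbit_s."""
    if data_tb < 0 or uplink_mbit_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid input")
    bits = data_tb * 1e12 * 8                           # TB -> bits
    seconds = bits / (uplink_mbit_s * 1e6 * efficiency)
    return seconds / 86_400                             # seconds -> days


# Hypothetical scenarios: critical data (3 TB) and bulk data (20 TB).
for tb, mbit in [(3, 10), (3, 100), (20, 100)]:
    print(f"{tb} TB at {mbit} Mbit/s: about {upload_days(tb, mbit):.1f} days")
```

Under these assumptions the first full upload dominates; how long later runs take depends on how much data changes between backups.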
 

Reyhn

New Member
The more I think on this, the more uncertain I become of the best solution... :)

I now see four potential options:
  1. Dual RAID-arrays:
    Critical array: High parity, with focus on preventing data loss.
    Bulk array: Lower parity, with focus on cheap storage.
  2. Dual RAID-arrays:
    Critical array: A simpler array, like mirroring, just to have something, at least.
    Bulk array: High parity, with focus on both storage volume and data loss prevention.
  3. Single RAID-array:
    Critical data: Not RAIDed, but frequently backed up.
    Bulk array: As in option #2.
  4. Single mega-RAID-array:
    One array to rule them all.
    And selective backups.
In option #1, the backup should rarely be needed.
In options #2 and #3, I'm thinking that the backup will be the main data loss prevention mechanism, so it will have to run often to very often. It would matter less if the entire array is lost, because I would rely on a backup that is almost up to date.
Inspired by Patrick, I'm also emphasizing the value of the bulk data, just to make it easier when disaster strikes.
In option #4, the RAID level would become a balance between cost and risk.

How do you reason when designing for two different storage scenarios?
 

Evan

Well-Known Member
AFR (annualized failure rate) on SSDs is, say, 10 times better: 0.35% vs 3.5%.
NAND wear-out does happen, but only in rare cases, like cache drives using the wrong type of drive. For plain data storage, a read-optimized drive will suit your needs.
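
To put the 0.35% and 3.5% figures in context, here is a minimal sketch that reads them as annualized failure rates (an assumption; the post does not define the term) and treats drive failures as independent, which real drives from the same batch may not be. The function name is illustrative.

```python
# Chance of seeing at least one drive failure in a year, given a per-drive
# annualized failure rate. Assumption: failures are independent.

def p_any_failure(afr: float, drives: int) -> float:
    """Probability that at least one of `drives` drives fails within a year."""
    if not 0 <= afr <= 1 or drives < 1:
        raise ValueError("afr must be in [0, 1] and drives >= 1")
    return 1 - (1 - afr) ** drives


SSD_AFR = 0.0035  # figure quoted in the post
HDD_AFR = 0.035   # figure quoted in the post

# Hypothetical array sizes: a 2-disk mirror and a 6-disk parity array.
for drives in (2, 6):
    print(drives, f"SSD {p_any_failure(SSD_AFR, drives):.2%}",
          f"HDD {p_any_failure(HDD_AFR, drives):.2%}")
```

This is the chance of having to replace a drive, not the chance of losing data; redundancy and backups are what turn a drive failure into a non-event.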

Online backups use what's called "forever incremental": an initial full backup, and only changes after that, so backup time is likely not an issue after the first upload.
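
The idea behind forever-incremental backups can be sketched as follows. This is a simplified illustration, not how any particular backup service works: it assumes change detection by SHA-256 content hash and an in-memory manifest, and the names `file_digest` and `files_to_upload` are hypothetical.

```python
# Forever-incremental selection: the first run picks every file, later runs
# pick only files whose content changed since the previous run.
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file's content, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def files_to_upload(root: Path, manifest: dict[str, str]) -> list[Path]:
    """Return new or changed files under root and record them in manifest."""
    changed = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        key = path.relative_to(root).as_posix()
        digest = file_digest(path)
        if manifest.get(key) != digest:
            manifest[key] = digest  # remember what was last backed up
            changed.append(path)
    return changed
```

With an empty manifest the first call returns every file (the initial full backup); a second call on an unchanged tree returns an empty list. A real tool would also persist the manifest and handle deleted files, which this sketch leaves out.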
 

msg7086

Active Member
How about multiple servers and a distributed file system, like GlusterFS or Ceph? It gives you a bit more flexibility and covers the case where a server itself goes down (or gets peed on by a cat, which has actually happened before).
 

TType85

Active Member
In option #1, the backup should rarely be needed.
For valuable data, NEVER think this way; that is how you lose it. RAID is not backup. RAID is availability. My non-replaceable data is backed up locally for a quick restore if needed, and to the "cloud" as a just-in-case.
 

Connorise

Member
You wrote that data loss is not acceptable and about high availability. Whenever I read that combo I think of dual-port SAS drives, 2 HBAs/RAID controllers, UPS, redundant servers etc. This is overkill for a SOHO setup and would be very expensive.


I agree with that.

If you are looking for a fully redundant solution, it could cost a fortune, and I am not only talking about 2x RAID cards, motherboards, etc.

High availability means you should be able to tolerate losing one server. With FT (fault tolerance), the failover happens without downtime; with HA, it comes with some downtime.

And in my opinion, the setup described above is not for a typical SOHO environment and would not be efficient in terms of pricing/budget.