ZFS vs. BTRFS: Safest Choice for Dummies?

casperghst42 · Oct 2, 2023

A possibly unpopular suggestion: mdadm + lvm2 - Synology actually uses mdadm for raid and puts btrfs on top of it.

Sean Ho · Oct 2, 2023

I've used btrfs raid6 for an archival array for years, including replacing and adding disks, without data loss. The raid5/6 write hole issue is overblown if UPS/NUT and graceful shutdown are in use. However, I agree with all the concerns about the btrfs dev community playing a bit fast-and-loose with things; even FB doesn't use its parity raid much, so there's not a lot of impetus to improve it. I don't think I'd recommend it for "dummies".

Syno, QNAP, et al. basically use btrfs + LVM + mdadm; it's fine and works but isn't too flexible, and the branded hardware is crazy expensive. Xpenology et al aren't much better than just rolling your own on Ubuntu/Debian.

For a "set-and-forget" storage appliance, I'd probably recommend TrueNAS Core. ZFS is complex but battle-tested. It does not require lots of RAM; rather, ARC uses as much RAM as you give it, by design. But it can function perfectly fine on lower RAM systems.

reasonsandreasons · Oct 2, 2023

Yeah, in my experience TrueNAS Core with ZFS is basically bulletproof. It's a good front-end for managing a capable and stable ZFS-on-FreeBSD server. TrueNAS Scale is flashier and probably where things are going, but IMO it's not particularly attractive for pure storage workloads. TrueNAS Mini systems are a great choice if you need your company to purchase something for you, too.

As to the complexity of ZFS itself, my advice is to learn ZFS as ZFS, not by analogy to traditional RAID or other solutions. From that perspective it's not that hard. It is opinionated, but in most cases in the direction of keeping data safe. Here are some of the high-level points I found useful to understand:

The difference between vdevs and your pool is very important. vdevs are virtual devices with internal redundancy; your pool is where your data is stored. There is no redundancy on the pool level, only the vdev level--if any single vdev fails you will lose your pool.
You can always expand your pool by adding more vdevs, but you can't really expand vdevs. There are four different types of vdevs:
- Mirrors: Two (or three or four) disks that have the same content. 50% storage efficiency for a two-way mirror (less for wider ones) but best for speed.
- RAIDz1: A set of disks with one-disk redundancy. Don't make these wider than about five disks, and consider avoiding them entirely.
- RAIDz2: A set of disks with two-disk redundancy. For a four-disk setup just use mirrors. Rebuild times start to get long, if that's a concern.
- RAIDz3: A set of disks with three-disk redundancy. Don't make these wider than about nine disks. Rebuild times are very long here.
Unless you really, really know what you're doing, lean towards a pool of mirrors rather than RAIDz.
ZFS does a great job finding and repairing issues, but only if you have scrub tasks set up to run regularly.
ZFS uses RAM as a read cache in a transparent, don't-worry-about-it way. Don't drive yourself crazy trying to optimize this perfectly, as it's probably better at it than you are.
ZFS doesn't really offer write caching, but does offer many things that look a lot like it. If you don't have a very specific task to accelerate, do not bother.
ZFS doesn't require a ton of RAM or ECC, but they both help.
Spend some time reading these posts. They're really, really useful.
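
A minimal sketch of the vdev/pool rule from the list above, as a toy model rather than anything ZFS itself exposes. The `Vdev` class and `pool_survives` function are hypothetical names made up for illustration; the only facts encoded are the ones stated above (redundancy lives in the vdev, and one dead vdev kills the pool).

```python
from dataclasses import dataclass

# Disk failures each RAIDz level tolerates; a mirror tolerates (disks - 1).
RAIDZ_PARITY = {"raidz1": 1, "raidz2": 2, "raidz3": 3}


@dataclass
class Vdev:
    kind: str        # "mirror", "raidz1", "raidz2" or "raidz3"
    disks: int       # number of disks in the vdev
    failed: int = 0  # how many of those disks have failed

    def survives(self) -> bool:
        if self.kind == "mirror":
            tolerated = self.disks - 1
        else:
            tolerated = RAIDZ_PARITY[self.kind]
        return self.failed <= tolerated


def pool_survives(vdevs: list[Vdev]) -> bool:
    # No redundancy at the pool level: every vdev has to survive.
    return all(v.survives() for v in vdevs)


# Two mirrors, one dead disk in each: the pool is degraded but alive.
print(pool_survives([Vdev("mirror", 2, failed=1), Vdev("mirror", 2, failed=1)]))  # True
# Two mirrors, both dead disks in the same mirror: the pool is gone.
print(pool_survives([Vdev("mirror", 2, failed=2), Vdev("mirror", 2)]))  # False
```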

ericloewe · Oct 2, 2023

reasonsandreasons said:
Unless you really, really know what you're doing, lean towards a pool of mirrors rather than RAIDz.

That one is something I really disagree with. I don't mean to say that everyone should use RAIDZ, but that article is way too dismissive of RAIDZ (going from memory, it's not loading at the moment). In particular, for storing media - let's be realistic, in this context, any meaningful volume of data to be stored will be large files of some sort - the performance concerns are misplaced and wasted space due to small blocks is a non-issue.
The only major disadvantages left are that expansion is less-than-OCD-friendly, in that you probably want to expand with a similar vdev; and the lack of device removal the instant a RAIDZ vdev touches the pool.

Of course, any pool that's going to see greater-than-casual usage of small blocks (serious databases, VM images, storing millions of tiny files, ...) should absolutely not use RAIDZ because it'll likely end up being just as "inefficient" in terms of space as a mirror, with all the downsides of RAIDZ.

reasonsandreasons · Oct 2, 2023

From my recollection it doesn't actually touch on the inefficiency point--it's mostly an argument that a pool of mirrors is faster, easier to expand with similar vdevs, has more predictable rebuilds, and the performance of a degraded pool is better in the meantime.

While many of these points are less important in the media storage context, in my view the expansion argument becomes more important as media storage needs tend to grow over time. In a production context you can just plan for a full-array upgrade down the road, especially if you've sized the pool right to begin with, but for SOHO users the flexibility of a pool of mirrors is hard to beat. This might change if/when RAIDz expansion gets off the ground, though.

RAIDz does have its uses (my backup server is RAIDz1), but it's just harder to live with for most non-professionals. This is especially true if you have a limited budget.

ericloewe · Oct 3, 2023

I don't agree with the cost aspect, since a modicum of planning and forethought allows for something like a 6-wide RAIDZ2 vdev with 33% redundancy rather than 50%, plus the ability to lose any two disks. We can certainly invent a scenario where this doesn't work out in terms of cost, but it just does not seem realistic and widespread to me. How many users are both on so tight a budget they absolutely need to reduce the initial storage space and will be regularly adding pairs of disks for more storage?
Maybe I'm biased by having an understanding of RAID by the time I learned of ZFS, but I do believe that the added cognitive load of considering the use of RAIDZ, with the pros and cons it has, does not meaningfully degrade the user experience, and so it's best to just let people decide based on their needs and their scenario.
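
The 33% vs. 50% figure and the "any two disks" point can be checked with some quick arithmetic. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and it ignores real-world overhead such as padding, metadata, and small-block inefficiency.

```python
from math import comb


def raidz_usable_fraction(width: int, parity: int) -> float:
    """Fraction of raw capacity left for data in one RAIDZ vdev (idealized)."""
    if not 0 < parity < width:
        raise ValueError("parity must be between 1 and width - 1")
    return (width - parity) / width


def mirror_usable_fraction(ways: int = 2) -> float:
    """Fraction of raw capacity left for data in an n-way mirror."""
    return 1 / ways


def fatal_two_disk_losses(mirror_pairs: int) -> tuple[int, int]:
    """For a pool of 2-way mirrors: (fatal, total) ways to lose two disks.

    Losing two disks is only fatal when both sit in the same mirror.
    """
    total = comb(2 * mirror_pairs, 2)
    return mirror_pairs, total


print(round(raidz_usable_fraction(6, 2), 3))  # 0.667 usable, i.e. 33% redundancy
print(mirror_usable_fraction())               # 0.5
print(fatal_two_disk_losses(3))               # (3, 15)
```

With six disks, the 6-wide RAIDZ2 survives all 15 possible two-disk losses, while three 2-way mirrors lose the pool in 3 of those 15 cases.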

louie1961 · Oct 3, 2023

My reading of the OP's question was a concern over bit rot, and if either ZFS or BTRFS had an advantage in that area. I have seen no data that shows any difference in bit rot protection, scrubs, or snapshots between the two file systems. Is there any difference (leaving aside the RAID aspects/questions) say in a single disk or JBOD array not set up as a RAID?

Secondly, what are the must-have use cases for a home lab enthusiast to have RAID (of whatever flavor)? Leaving aside the YouTube content creators and people running a business, what use case absolutely demands this level of resilience? I mean, who has a contractual service level penalty associated with their Plex server?

I assume we can all agree that RAID is not a replacement for a good backup strategy? I can't think of many (any?) use cases that really require it at home other than if you are just trying to learn the technology. I would love to hear what other folks think.

reasonsandreasons · Oct 3, 2023

I do wonder if the RAID discourse for home users has really kept up with how cheap large HDDs are these days. You can get a 22TB drive for like $400, and if you have a backup that might be good enough.

For me, the value in RAID is more the peace of mind it provides. My homelab, such as it is, is something I maintain both for fun and because it provides a useful service for people in my life, including my partner. I don't want to be the person who installs a bunch of smart lightbulbs and then tries to get everyone else in the house to never use the light switch again because a flaky app is better than a century-old technology. I really pride myself in making things like my Plex server Just Work, and having some level of redundancy and backup makes me confident that it'll continue to do so.

(Also, to your point @ericloewe, "both on so tight a budget they absolutely need to reduce the initial storage space and will be regularly adding pairs of disks for more storage" described me extremely well circa 2019 as a new college grad. It's much less true now, though, and I think I have gotten a bit didactic about the general principle in the meantime.)

ericloewe · Oct 3, 2023

louie1961 said:
My reading of the OP's question was a concern over bit rot, and if either ZFS or BTRFS had an advantage in that area.

Well, if the whole volume is a goner, a single bit out of place seems like a minor concern. That's the main gripe with btrfs.

louie1961 said:
Secondly, what are the must-have use cases for a home lab enthusiast to have RAID (of whatever flavor)? Leaving aside the YouTube content creators and people running a business, what use case absolutely demands this level of resilience? (...)
I assume we can all agree that RAID is not a replacement for a good backup strategy?

Two points:

- Proper RAID adds slices to the metaphorical stack of Swiss cheese.
- It seriously cuts down on the effort needed to recover from failed disks (which are far from rare).

gea · Oct 5, 2023

louie1961 said:
My reading of the OP's question was a concern over bit rot, and if either ZFS or BTRFS had an advantage in that area. I have seen no data that shows any difference in bit rot protection, scrubs, or snapshots between the two file systems. Is there any difference (leaving aside the RAID aspects/questions) say in a single disk or JBOD array not set up as a RAID?

To detect bitrot, you need checksums on data and metadata. To be protected against a corrupt filesystem due to a crash during a write, you need Copy on Write. Btrfs and ZFS have both, while ZFS has some performance-relevant extras like special vdev tiering, a persistent SSD read cache, and protection of the RAM-based write cache via an SLOG.

louie1961 said:
Secondly, what are the must-have use cases for a home lab enthusiast to have RAID (of whatever flavor)? Leaving aside the YouTube content creators and people running a business, what use case absolutely demands this level of resilience? I mean, who has a contractual service level penalty associated with their Plex server?

The enemy of data is statistics. With a low probability you get bitrot (data on disk is bad) or other read errors that the disks cannot repair themselves. The number of such errors is proportional to pool size, usage time, and pool load. With a multi-terabyte pool that you use or back up regularly, you will have errors within its usage time for sure. Checksums detect such errors on read or on a pool scrub, but then what? With btrfs or ZFS raid redundancy the error is repaired automatically. Without btrfs or ZFS raid redundancy you are informed about a bad file and must restore it manually from backup.
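
A rough sketch of why this scales with pool size and load. It assumes independent errors at a fixed per-bit rate, which is a simplification, and the default rate below is only a placeholder in the range drive datasheets typically quote (on the order of one unrecoverable error per 10^14 to 10^15 bits read); neither the model nor the number comes from this thread.

```python
import math


def p_at_least_one_read_error(bytes_read: float, errors_per_bit: float = 1e-15) -> float:
    """Probability of at least one unrecoverable read error.

    Assumes independent bit errors at a constant rate (illustrative only).
    Computes 1 - (1 - p)^n in a numerically stable way.
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-errors_per_bit))


TB = 1e12
for size_tb in (1, 12, 100):
    p = p_at_least_one_read_error(size_tb * TB, errors_per_bit=1e-14)
    print(f"{size_tb:>4} TB read: {p:.2f}")
```

Under those assumptions the probability climbs quickly with the amount of data read, which is the case for having redundancy that can repair what the checksums detect.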

louie1961 said:
I assume we can all agree that RAID is not a replacement for a good backup strategy? I can't think of many (any?) use cases that really require it at home other than if you are just trying to learn the technology. I would love to hear what other folks think.

It is not that easy.
In a disaster like fire, theft, or hardware running amok, you need a more or less current disaster backup. This is a must, and you can accept a restore time of several days, or that the backup is like old bread: always from yesterday or last week. If you want to be sure that backup data is valid, you need not only checksums on the backup data but also a backup method that checksum-protects data during the copy to backup, like ZFS replication. ZFS replication is also a way to keep multi-terabyte storage and backup in sync, even on a high-load server, down to a one-minute delay or over a low-performance link.

But this is not the most probable cause of data loss. More often you delete or modify something accidentally and want an undo, or you were hit by a virus or ransomware. In such a case you want read-only snaps with a history, like four in the current hour, 24 in the current day, 31 in the current month, etc. Even though snaps only cost the space of modified data blocks, you need space, which often means a higher raid level for lower cost, which btrfs cannot offer. As you should avoid restoring from backup as much as possible, your first concern should be reliable storage with a probability of data loss near zero.
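
One way to read that snapshot history is as a tiered retention policy. The sketch below is an assumption about how such a policy could be expressed, not how any particular ZFS tool implements it: it keeps the newest snapshot in each of the last four 15-minute windows, the last 24 hourly windows, and the last 31 daily windows. `TIERS` and `snapshots_to_keep` are hypothetical names.

```python
from datetime import datetime, timedelta

# (window length, number of windows to keep), newest window first.
TIERS = [
    (timedelta(minutes=15), 4),   # four in the current hour
    (timedelta(hours=1), 24),     # 24 in the current day
    (timedelta(days=1), 31),      # 31 in the current month
]


def snapshots_to_keep(snapshots: list[datetime], now: datetime) -> set[datetime]:
    """Return the snapshot timestamps the tiered policy would retain."""
    keep = set()
    for step, count in TIERS:
        for i in range(count):
            window_end = now - i * step
            window_start = window_end - step
            in_window = [s for s in snapshots if window_start < s <= window_end]
            if in_window:
                keep.add(max(in_window))  # newest snapshot in this window
    return keep


now = datetime(2023, 10, 5, 12, 0)
# Example input: one snapshot every 5 minutes for roughly the last two days.
snaps = [now - timedelta(minutes=5 * i) for i in range(576)]
kept = snapshots_to_keep(snaps, now)
print(len(snaps), "snapshots taken,", len(kept), "retained")
```

Everything not in the returned set would be a candidate for destruction; the retained set stays small even though snapshots are taken often.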

This means good hardware, ECC RAM, and a pool that tolerates disk failures, ideally any two disks, like a Z2 or a 3-way mirror. The minimum is a simple btrfs or ZFS mirror. A hardware raid or mdadm software raid combined with btrfs or ZFS offers checksum protection but no repair option (no self-healing).

In all the time I have used ZFS with multiple terabytes and hundreds of disks (around 15 years), I have never needed a backup, although I had two of them with daily replication syncs and a third offline at a different location for monthly backups.

casperghst42 · Oct 5, 2023

louie1961 said:
Secondly, what are the must-have use cases for a home lab enthusiast to have RAID (of whatever flavor)? Leaving aside the YouTube content creators and people running a business, what use case absolutely demands this level of resilience? I mean, who has a contractual service level penalty associated with their Plex server? I assume we can all agree that RAID is not a replacement for a good backup strategy? I can't think of many (any?) use cases that really require it at home other than if you are just trying to learn the technology. I would love to hear what other folks think.

Some people like to collect data. In my case I like to keep my photos, which go back 20+ years, and my music (ripped CDs). That is a very good reason for having raid(6) and offsite backup.

Raid is not backup, but it is a good thing if you lose a disk (or two).

gea · Oct 5, 2023

You know that raid-6 protects against up to two bad disks but does not protect against a corrupt filesystem or raid structure when the system crashes during a write, "Write hole" phenomenon in RAID5, RAID6, RAID1, and other arrays.

Copy on Write btrfs or ZFS filesystems combined with btrfs or ZFS software raid are not affected by the write hole problem.
And: How sure can you be that data or backup is good without checksums?

casperghst42 · Oct 5, 2023

gea said:
You know that raid-6 protects against up to two bad disks but does not protect against a corrupt filesystem or raid structure when the system crashes during a write, "Write hole" phenomenon in RAID5, RAID6, RAID1, and other arrays.

Copy on Write filesystems combined with btrfs or ZFS software raid are not affected by the write hole problem.
And: How sure can you be that data or backup is good without checksums?

I've been running a raid6 setup for 10+ years and until now the only problem I've had is 2 dead drives (3 years apart). I have exclusively used it with ext4.

To my recollection I have never said that there was anything wrong with ZFS, I just don't see that there is a reason for me to use it.

How about this: The 'Hidden' Cost of Using ZFS for Your Home NAS

As I said, there is nothing wrong with ZFS nor RAID5/6, choose what you think is the best for you.

ericloewe · Oct 5, 2023

gea said:
Copy on Write btrfs (...) not affected by the write hole problem.

Except that it is, here's the quote straight from the horse's mouth:

The write hole problem. An unclean shutdown could leave a partially written stripe in a state where the some stripe ranges and the parity are from the old writes and some are new. The information which is which is not tracked. Write journal is not implemented. Alternatively a full read-modify-write would make sure that a full stripe is always written, avoiding the write hole completely, but performance in that case turned out to be too bad for use.

The RAID56 status intro is also a good one:

The RAID56 feature provides striping and parity over several devices, same as the traditional RAID5/6. There are some implementation and design deficiencies that make it unreliable for some corner cases and the feature should not be used in production, only for evaluation or testing. The power failure safety for metadata with RAID56 is not 100%.

Translated into practical reality, "RAID56 works fine except when it doesn't, and it's so terrible we really should just remove it outright."
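
For anyone who hasn't seen the write hole spelled out, here is a toy illustration with single XOR parity (RAID5-style). It is a simplified model of the failure mode described in the quote, not btrfs code; the block contents and names are made up.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


# A consistent stripe: two data blocks plus their XOR parity.
d1_old, d2 = b"AAAA", b"BBBB"
parity_old = xor(d1_old, d2)

# D1 is rewritten, but power is lost before the matching parity write lands.
d1_new = b"CCCC"
on_disk = {"d1": d1_new, "d2": d2, "parity": parity_old}

# Later the disk holding D2 dies and D2 is rebuilt from D1 and the parity.
rebuilt_d2 = xor(on_disk["d1"], on_disk["parity"])

print(rebuilt_d2 == d2)  # False: stale parity produced garbage for D2
print(xor(d1_new, xor(d1_new, d2)) == d2)  # True: with updated parity the rebuild works
```

Note that D2 was never touched by the interrupted write, yet it is the block that comes back wrong, and nothing in the stripe records which parts are old and which are new.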

louie1961 · Oct 5, 2023

gea said:
With btrfs or ZFS raid redundancy the error is repaired automatically. Without btrfs or ZFS raid redundancy you get informed about a bad file and must restore manually from backup.

I think this is mostly true, but I believe and correct me if I am wrong, if you are doing regular snapshots, either file system can correct the error from a previous snapshot if the snapshot contains good data. Plus both file systems store redundant copies of the metadata so if the error is in the metadata, that can be corrected automatically. Am I understanding this correctly?

gea · Oct 5, 2023

louie1961 said:
I think this is mostly true, but I believe and correct me if I am wrong, if you are doing regular snapshots, either file system can correct the error from a previous snapshot if the snapshot contains good data. Plus both file systems store redundant copies of the metadata so if the error is in the metadata, that can be corrected automatically. Am I understanding this correctly?

You are wrong.
A ZFS snapshot does not contain duplicated, redundant files like a backup. Due to Copy on Write, no ZFS data block (4k-1M) holding former data is modified or overwritten; new data is always written to a new block. The former data block is then either held by a snap or marked free. This means that all unmodified data exists only once. The snap allows a return to a former file state for modified data. It does not let you recover the whole file independently of the current data state. For every snap version, all data is there only once.
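
A toy model of that point, assuming nothing about real ZFS internals: a snapshot here is just a list of references to blocks, so unmodified blocks are shared with the live file and exist only once. The `ToyCowFile` class is hypothetical and exists only for illustration.

```python
class ToyCowFile:
    """A file as a list of block references into a shared block store."""

    def __init__(self, blocks: list[bytes]):
        self.store: dict[int, bytes] = {}   # block id -> the only physical copy
        self._next_id = 0
        self.live = [self._alloc(b) for b in blocks]
        self.snapshots: dict[str, list[int]] = {}

    def _alloc(self, data: bytes) -> int:
        block_id = self._next_id
        self._next_id += 1
        self.store[block_id] = data
        return block_id

    def snapshot(self, name: str) -> None:
        # Copies references only; no data is duplicated.
        self.snapshots[name] = list(self.live)

    def write(self, index: int, data: bytes) -> None:
        # Copy on Write: new data goes to a new block, the old one is untouched.
        self.live[index] = self._alloc(data)

    def read(self, refs: list[int]) -> list[bytes]:
        return [self.store[r] for r in refs]


f = ToyCowFile([b"one", b"two"])
f.snapshot("s1")
f.write(1, b"TWO")
print(f.read(f.snapshots["s1"]))  # [b'one', b'two']  old state is still reachable
print(f.read(f.live))             # [b'one', b'TWO']

# Simulate bitrot in the unmodified block: it is shared, so both views are hit.
f.store[0] = b"???"
print(f.read(f.snapshots["s1"])[0], f.read(f.live)[0])  # b'???' b'???'
```

The snapshot gives an undo for the modified block, but it is no second copy of the unmodified one, so it cannot repair damage there.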

SDLeary · Oct 8, 2023

gea said:
This is not related to ZFS. On BSD and Linux you always use SAMBA, which knows nothing about ZFS. The only differences are the SAMBA settings and defaults chosen by a distribution. Only ZFS properties like aclmode or aclinherit are ZFS-related, and they work on the filesystem level, not the share level.

Only when you use a Solaris-based ZFS system with the OS/kernel/ZFS-based SMB server instead of SAMBA is the situation different, as this comes with Windows NTFS-like NFSv4 ACLs instead of simpler POSIX ACLs, treats shares as a strict ZFS property that allows ZFS snaps as Windows Previous Versions without special settings or considerations, can use local SMB groups that can contain groups, and uses Windows SIDs (e.g. S-1-5-21-722635049-2797886035-3977363046-2147484648) as the security reference instead of simpler Unix uids like 101, with the complex mappings those then need on a backup system or after a filesystem replication.

So are you saying Illumos for the win here? ;-)

SDLeary

BlueFox · Oct 8, 2023

SDLeary said:
So are you saying Illumos for the win here? ;-)

They have a vested commercial interest in said software, so of course they'll tout it. They just never disclose that in their posts.

gea · Oct 9, 2023

The Solaris fork Illumos/OmniOS is OpenSource.
You can optionally use my napp-it on top as a web-based management GUI, either the free version or the Pro version with support.

Quite similar to ESXi or TrueNAS

Samir · Oct 13, 2023

Not the conventional way of protecting data, but I would simply have it everywhere if it's only 6TB. Get some 8TB enterprise drives and some older 2-bay Synology, QNAP, or whatever NAS units, put the drives in as JBOD, and copy the data to all of them. Put some of the NAS units off site. Get more drives and put them in safety deposit boxes and other places. Compare the copies to each other and the original every year or so to check for any bit rot.

The benefit to this method is that each 'backup node' is essentially disposable. If a unit stops working, it's something you can worry about and work on later if need be, since you still have your data. Even NAS units have multiple points of failure besides just the drives: hardware, raid controller, power supply, etc.

If you want data to be safe, the 3-2-1 rule is a good one. And if you do each step with multiple levels of redundancy (3 copies x2 = 6 copies, 2 locations x2 = 4 locations, etc), then you are bound to survive even catastrophic events with no data loss.
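
The yearly "compare the copies" step can be scripted. Below is a minimal sketch using only the Python standard library; the function names are made up, and it assumes each copy is mounted as an ordinary directory tree.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large media files do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root: str) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    base = Path(root)
    return {
        p.relative_to(base).as_posix(): file_sha256(p)
        for p in sorted(base.rglob("*"))
        if p.is_file()
    }


def compare(a: dict[str, str], b: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (files present in only one copy, files whose contents differ)."""
    only_one_side = sorted(set(a) ^ set(b))
    differing = sorted(k for k in set(a) & set(b) if a[k] != b[k])
    return only_one_side, differing
```

Usage would look something like `compare(manifest("/mnt/original"), manifest("/mnt/copy1"))`, with both paths being placeholders. A two-way comparison only says that the copies disagree, not which one is bad; with three or more copies the odd one out is the likely casualty.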
