I have fallen in love with MooseFS


tjk

Active Member
Mar 3, 2013
424
154
43
www.servercentral.com
I've tested the Quobyte distributed FS too, which is based on XtreemFS, distributed metadata servers, tiering, etc.

A bit buggy on setup, but an awesome SDS for sure; not cheap, though.
 

i386

Well-Known Member
Mar 18, 2016
3,779
1,327
113
34
Germany
Anything that does the traditional RAID 0, 1, 5, 6 (or any of the less common) ones, in a way that is more or less the same.
This would mean that ZFS is also "traditional RAID", since the math behind RAID 5/raidz1 (https://blogs.oracle.com/solaris/post/understanding-raid-5-recovery-with-elementary-school-math) and RAID 6/raidz2 (https://blogs.oracle.com/solaris/post/understanding-raid-6-with-junior-high-math) is the same. (Both blog posts are from a ZFS developer at Oracle.)
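For anyone who wants to see that "elementary school math" concretely: RAID 5 parity is just XOR across the data strips, and a lost strip comes back by XORing everything that's left. A minimal sketch (illustrative only, not how any particular implementation lays data out on disk):

Code:
# Toy RAID 5 parity math: parity = XOR of all data strips,
# and any single missing strip is recovered by XORing the rest.
def xor_strips(strips):
    out = bytearray(len(strips[0]))
    for strip in strips:
        for i, byte in enumerate(strip):
            out[i] ^= byte
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC"]   # three data strips
parity = xor_strips(data)            # what the parity strip holds

# Pretend the second strip is lost; rebuild it from the survivors plus parity.
rebuilt = xor_strips([data[0], data[2], parity])
assert rebuilt == data[1]
print("recovered:", rebuilt)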
 

Mithril

Active Member
Sep 13, 2019
327
99
28
This would mean that ZFS is also "traditional RAID", since the math behind RAID 5/raidz1 (https://blogs.oracle.com/solaris/post/understanding-raid-5-recovery-with-elementary-school-math) and RAID 6/raidz2 (https://blogs.oracle.com/solaris/post/understanding-raid-6-with-junior-high-math) is the same. (Both blog posts are from a ZFS developer at Oracle.)
"In a way that's more or less the same": using the same parity math isn't where an issue would be (unless that parity math is flawed). And I will note those blog posts don't go over the *actual* math, but I think that's a tangent to this anyway.

RaidZ doesn't have the same write-hole that most hardware/software raids do.

RaidZ isn't logically separated from the file system; while you can create a virtual block device (a zvol) to use any file system you want, that's an abstraction and ZFS is still "under" it.

RaidZ rebuilds (resilvers) only care about actual data, not free space (since it's tied to the file system, that awareness exists).

RaidZ doesn't lay out blocks on the disk in the same way (I'd need to refresh my memory on the exact difference).

But, IMHO, the biggest difference is that RaidZ/ZFS has a lower level of "trust" in the hardware; this is at the ZFS layer, so even single-disk setups have some protection. RAID with parity is *supposed* to have that, but many solutions silently *don't*.

Granted, these differences come with large tradeoffs in performance, hardware considerations, limitation to a single "native" file system, etc.
There *are* RAID solutions out there that still work the way people believe they do, but you need the right disks and the right hardware/software solution, or you could very easily have little or no additional protection vs. a single disk. And in all cases, RAID is not a backup (nor is RaidZ). It's at least a *little* easier to test RAID than it is to test ECC memory (write directly to a member disk, from another computer if you need to, and test what happens when you read it back).

Relating to this thread, MooseFS is actually acting a bit like ZFS here (broadly speaking), maintaining checksums on files, and it claims to do re-reads from other copies if needed. Assuming MooseFS does what it says well, it would (IMHO) make RAID 5/6 more useful, since it would solve at least some data integrity issues. What I don't know is whether MooseFS can "rebuild" a file (say, from two partially correct sources) or if it's just a checksum/hash check.
 

UhClem

Active Member
Jun 26, 2012
337
175
43
NH, USA
... Since the error correction needs to be fast, it can't be too sophisticated so it's entirely possible to reconstruct an incorrect value on a read retry and most drives will pass along the first "correct" value for a sector. ...
Do you really believe this?
[You read it on the Web ? ...(so it must be true) :)]
 

Mithril

Active Member
Sep 13, 2019
327
99
28
Do you really believe this?
[You read it on the Web ? ...(so it must be true) :)]
It's possible spinning rust no longer uses Hamming code, sure. But it's likely they do, as it's proven, fast, and reliable within its limits: 1-bit correction, 2-bit detection, and it *can* fail (pass bad data) at 3 or more bit flips.
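To make those limits concrete, here is a toy SECDED (extended Hamming(8,4)) round trip; purely illustrative, not what any drive firmware actually runs. One flip gets corrected, two get detected, but three flips can come back as a clean "correction" of the wrong data:

Code:
# Toy SECDED (extended Hamming(8,4)): corrects 1 flip, detects 2,
# but 3 flips can decode as a clean "correction" of the wrong data.
def encode(d):
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4                        # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4                        # covers positions 3, 6, 7
    p3 = d2 ^ d3 ^ d4                        # covers positions 5, 6, 7
    word = [p1, p2, d1, p3, d2, d3, d4]      # Hamming positions 1..7
    return [sum(word) % 2] + word            # prepend overall parity bit p0

def decode(cw):
    syndrome = 0
    for pos in range(1, 8):
        if cw[pos]:
            syndrome ^= pos
    overall = sum(cw) % 2                    # even parity over all 8 bits
    fixed = list(cw)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:                       # looks like a single-bit error
        fixed[syndrome] ^= 1                 # syndrome 0 means p0 itself
        status = "corrected"
    else:
        return "detected", None              # 2-bit error: uncorrectable
    return status, [fixed[3], fixed[5], fixed[6], fixed[7]]

data = [1, 0, 1, 1]
bad = encode(data)
for pos in (1, 2, 3):                        # flip three bits
    bad[pos] ^= 1
print(decode(bad), "original:", data)        # "corrected", but the data is wrong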

Seems safe, sure. Except in modern HDDs, corrected ECC errors are just a way of life, a necessity of the density and cost demands. I've got an HDD under test right now passing (so far) with flying colors: no reallocated sectors, but a handful of delayed ECC and *millions* of silent ECC corrections, which is fairly typical for a drive with ~30k hours. Most modern SMART utilities don't even show corrected/silent ECC, only failures and retries. AFAIK many SSDs are doing internal ECC, and sometimes parity, that is never even reported to the host. So it's rather obvious to me that single-bit errors are a constant (so to speak) and double-bit errors happen sometimes. I'm not willing to bet that 3+ bit errors are impossible, especially when they would potentially be the biggest problem, such as when another drive is failing in an array.

Is this a "sky is falling" type thing? Obviously not. Can it happen? Absolutely.

I very, very, very much doubt any consumer (and likely most enterprise) drives do anything more than "check ECC, attempt re-read, report error on uncorrected double failure". The gap there is a 3+ failure that results in a passing ECC, which can happen.

Of course, it all comes down to how important any given bit of data is. I've run software RAID 0 before because it wasn't the primary storage, and affordable SSDs were not even something people imagined at the time, so the performance was a win.

Food for thought: is the DRAM cache on your HDD or SSD ECC-protected? I doubt it...
 

nabsltd

Active Member
Jan 26, 2022
204
121
43
Long before I'd ever heard of ZFS, I started keeping checksums of files. Whenever I copy that data to another disk, I verify the checksum matches. If it does not, I test the checksum against the original file.
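A minimal sketch of that workflow (the paths and the choice of SHA-256 here are just placeholders):

Code:
# Keep a checksum per file, verify the copy, and on a mismatch re-test
# the source to tell a bad copy apart from a bad source.
import hashlib, shutil

def sha256(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()

src, dst = "photos/img_0001.jpg", "/mnt/backup/img_0001.jpg"   # placeholder paths
recorded = sha256(src)               # stored when the file was first written

shutil.copy2(src, dst)
if sha256(dst) != recorded:
    # The copy disagrees with the stored checksum: re-check the source
    # to see whether the source or the copy path is at fault.
    print("source still good" if sha256(src) == recorded else "source corrupted")
else:
    print("copy verified")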

In 15 years, I have never had a file where the "source" failed the checksum and the disk did not report an uncorrectable error. I've caught silent copy errors (where there were no problems with either disk, but somewhere in between some bits flipped), and had disks die on me, but not once has a disk that has said "yep, everything's OK" lied to me.

Note, too, that unless you use ZFS for everything, and only use ZFS replication for copying, ZFS does not protect from the sort of errors that having an external checksum does. Sure, odds are that no bits get flipped in RAM, or on the network, or over a bus, but you can never be 100% sure without testing.
 

UhClem

Active Member
Jun 26, 2012
337
175
43
NH, USA
It's possible spinning rust no longer uses Hamming code, sure. But it's likely they do, as it's proven, fast, and reliable within its limits: 1-bit correction, 2-bit detection, and it *can* fail (pass bad data) at 3 or more bit flips. ...
"possible" ??? ... "no longer" ??? (Did it ever?)

Why would you make these assumptions?

Have you, maybe, conjoined the ECC implementation used for (current/modern) RAM [an 8-bit datum] with (any of) the ECC implementations used for (current/modern) HDDs [a 512/4096-byte datum]?
 
Last edited:
  • Like
Reactions: Mithril

Mithril

Active Member
Sep 13, 2019
327
99
28
"possible" ??? ... "no longer" ??? (Did it ever?)

Why would you make these assumptions?

Have you, maybe, conjoined the ECC implementation used for (current/modern) RAM [an 8-bit datum] with (any of) the ECC implementations used for (current/modern) HDDs [a 512/4096-byte datum]?
When you don't remember the name, google it and fail to notice they are actually talking about SSDs, lol. Yeah, totally the wrong name/method. Looks like it's a mix of Reed-Solomon, which is much better (in general), and low-density parity-check (LDPC). Both are single-pass, so unless the drive is comparing multiple (raw) reads, it's possible to have an uncorrected error; how often depends on the drive parameters. I'm actually less worried about that now with the refresher on the ECC used, so thanks!

I tried to refresh my memory and actually ended up the worse for it, doh!

I will say, though, that it's still good to have *some* extra checksum, either in the filesystem (such as ZFS) or external as nabsltd mentioned, as that protects reasonably well against both silent bad reads and silent failed/corrupted writes (such as non-sync writes, or consumer drives saying "yes, I wrote that" before they actually finish).

Honestly when you look at the raw ECC rates of modern spinning rust, it's quite impressive. The whole "we're going to constantly be getting errors so we just have to live with it and correct them" philosophy. Sort of like how with NAND storage, quantum tunneling is just.... a consideration that has to be made as a fact of life.

Edit: No sarcasm here, I genuinely confused myself trying to refresh my memory so I wouldn't be confused. I still think a hash/checksum besides what the drive does is a good idea.
 

zunder1990

Active Member
Nov 15, 2012
154
41
28
Relating to this thread, MooseFS is actually acting a bit like ZFS here (broadly speaking), maintaining checksums on files, and it claims to do re-reads from other copies if needed. Assuming MooseFS does what it says well, it would (IMHO) make RAID 5/6 more useful, since it would solve at least some data integrity issues. What I don't know is whether MooseFS can "rebuild" a file (say, from two partially correct sources) or if it's just a checksum/hash check.
Here is how it would work in my setup using the open source version. For this example, the file is 60MB, so it will fit in a single chunk (64MB), and I have the min goal set to 2, so there are two full copies of the chunk sitting on different hard drives in different servers. If during a read of the file, or a routine checksum scan, there is an error, MooseFS will flag the error, read the file from the remaining good chunk, and serve that data to the client. It will also start copying that good chunk to another server to bring the system back up to two good working copies of the chunk. For my really important folder (less than 1TB) of stuff like desktop backups and photos, I have the min goal set to 3, so there are 3 good copies of those files in the MooseFS system.
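The shape of that recovery, stripped down to a sketch (only an illustration of the idea, not MooseFS's actual chunkserver logic; the names and data are made up):

Code:
# Sketch: read a replicated chunk, fall back to another copy on checksum
# mismatch, then re-replicate the good copy to get back to the goal.
import hashlib

def checksum(data):
    return hashlib.sha256(data).hexdigest()

def read_chunk(replicas, expected):
    """replicas: list of (server, bytes); expected: the stored checksum."""
    good = None
    for server, data in replicas:
        if checksum(data) == expected:
            good = data
            break
        print(f"checksum mismatch on {server}, trying next copy")
    if good is None:
        raise IOError("no valid copy left")
    # Re-replicate the good copy over any bad one so we're back at goal.
    repaired = [(s, good if checksum(d) != expected else d) for s, d in replicas]
    return good, repaired

chunk = b"pretend this is 60MB of file data"
replicas = [("chunkserver-a", b"bit-rotted copy"), ("chunkserver-b", chunk)]
data, repaired = read_chunk(replicas, checksum(chunk))
print("served", len(data), "bytes; all copies healthy again:",
      all(checksum(d) == checksum(chunk) for _, d in repaired))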
 

amalurk

Active Member
Dec 16, 2016
300
111
43
101
I ran MooseFS before for a file store and it worked well enough. It is basically a CephFS equivalent that pre-dated CephFS and that hasn't really had any architecture improvements in a long, long time, just slow maintenance for many years. They promised multiple masters, or at least automatic master failover, in the open source version but have never delivered it, nor erasure coding; those are maybe available in the paid version, but they don't seem to really try to sell that either, and the project hasn't seen much change in many years. So I have my doubts about the paid version really being that much of an improvement over the open source. I have since moved to Ceph because I can do CephFS for the file store and then RBD too for VMs. Just makes more sense to me than two software systems.
 
Last edited:
  • Like
Reactions: zunder1990

i386

Well-Known Member
Mar 18, 2016
3,779
1,327
113
34
Germany
Are you sure?
I get a "new" tag in the top right for new/unread posts and then usually skim over them: [screenshot]
 

gb00s

Well-Known Member
Jul 25, 2018
919
388
63
Poland
.... It is basically a CephFS equivalent that pre-dated CephFS and that hasn't really had any architecture improvements in a long, long time, just slow maintenance for many years. They promised multiple masters, or at least automatic master failover, in the open source version but have never delivered it, nor erasure coding; those are maybe available in the paid version, but they don't seem to really try to sell that either, and the project hasn't seen much change in many years. So I have my doubts about the paid version really being that much of an improvement over the open source ....
Link: MooseFS
 

zunder1990

Active Member
Nov 15, 2012
154
41
28
I ran MooseFS before for a file store and it worked well enough. It is basically a CephFS equivalent that pre-dated CephFS and that hasn't really had any architecture improvements in a long, long time, just slow maintenance for many years. They promised multiple masters, or at least automatic master failover, in the open source version but have never delivered it, nor erasure coding; those are maybe available in the paid version, but they don't seem to really try to sell that either, and the project hasn't seen much change in many years. So I have my doubts about the paid version really being that much of an improvement over the open source. I have since moved to Ceph because I can do CephFS for the file store and then RBD too for VMs. Just makes more sense to me than two software systems.
I am still running MooseFS for my file system at home and still enjoy using it, but I will also agree with you on most of the above. It does seem that new feature development is very slow or not happening, but the code for my use case has been stable. In the past few weeks I have moved my Proxmox-based VMs from local storage to Ceph-backed storage, and overall it has been a nice experience. One thing I have noticed is Ceph does not do well, or can't run at all, on SBCs with low RAM. I have a bunch of ODROID HC2s and I could not make Ceph stable on those as it kept running out of RAM. MooseFS has no problem on the very same hardware; with MooseFS running, those boards are sitting at 220MB used out of 2GB RAM.