Fed up with BTRFS, Red Hat to create new ZFS-like FS called Stratis


Patrick

Administrator
Staff member
I know I am going to be shamed for this belief, but I do also think there is merit to looking at a ZFS alternative. I am also a happy ZFS user.

Here is my reasoning: a lot has changed in the technology we have and in how infrastructure is built today versus 2005. ZFS was essentially created for disk-based storage and not for clustered file system environments. 10GbE was just starting.

Do I think BTRFS or Stratis is the answer? Perhaps.

My expectation is that at Flash Memory Summit we are going to hear a lot about NVMe over fabrics. 100Gbps networking is everywhere now. On some of the Skylake CPUs, $155 per CPU gets you 100Gbps OPA, which brings that to an affordable level. We have actually had an Intel OPA + Optane over fabrics demo in DemoEval since ISC17 using Broadwell.

I think there is merit to thinking about storage in terms of how one can scale up and out using today's technology. Right now those worlds are completely different.

Likewise, if you look at tiering, automation as a whole is going up, but as we get memory-class persistent storage, I do think our models will need to change. Who needs to add a SLOG when your RAM is persistent? (You do not.)

The good news is that this is how technology gets built. Different groups have some good ideas and over time we get better software and models.
 

cactus

Moderator
Patrick said: "The good news is that this is how technology gets built. Different groups have some good ideas and over time we get better software and models."
It's also good when a company with enough resources to follow through with something as complex as a filesystem has good ideas.
 

Evan

Well-Known Member
I think it's good that we look to the future. It's probably going to take a long while to develop, but if you don't start you never finish.
 

Patriot

Moderator
No, that’s a really good point. Perhaps if the license was open we could have forked it and some users could bake in some SSD-first features.
That is one of the reasons ZFS is hard to deploy as an enterprise service...
When it comes to licensing ... Oracle is not friendly.
 

wildchild

Active Member
Might sound like an idiot, but why not just add to the OpenZFS movement? No Oracle stuff there.
Why is it that every time someone has to make the next new big thing, instead of improving what is there and proven?

Every time some company starts something like this, in the end it fails, dies, or gets over-licensed and dies.
 

gigatexal

I'm here to learn
wildchild said: "Might sound like an idiot, but why not just add to the OpenZFS movement? No Oracle stuff there. Why is it that every time someone has to make the next new big thing, instead of improving what is there and proven? Every time some company starts something like this, in the end it fails, dies, or gets over-licensed and dies."
Well, Red Hat's customers are buying Red Hat's version of Linux, and RH has the cash to make this work. I agree though, they could just fund the crap out of ZoL/OpenZFS and call it a day, but oh well.
 

cactus

Moderator
wildchild said: "Might sound like an idiot, but why not just add to the OpenZFS movement? No Oracle stuff there. Why is it that every time someone has to make the next new big thing, instead of improving what is there and proven? Every time some company starts something like this, in the end it fails, dies, or gets over-licensed and dies."
OpenZFS is a branch of the original Sun code. CDDL licensing and Sun copyright.
Code:
/*
* CDDL HEADER START
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License, Version 1.0 only
* (the "License"). You may not use this file except in compliance
* with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or http://www.opensolaris.org/os/licensing.
* See the License for the specific language governing permissions
* and limitations under the License.
*
* When distributing Covered Code, include this CDDL HEADER in each
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
* If applicable, add the following below this CDDL HEADER, with the
* fields enclosed by brackets "[]" replaced with your own identifying
* information: Portions Copyright [yyyy] [name of copyright owner]
*
* CDDL HEADER END
*/
/*
* Copyright 2005 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
/* Copyright (c) 1984, 1986, 1987, 1988, 1989 AT&T */
/* All Rights Reserved */
This is from a randomly selected file in the openzfs git repo: openzfs/acctcms.c at master (openzfs/openzfs on GitHub).
 

i386

Well-Known Member
gigatexal said: "Well, Red Hat's customers are buying Red Hat's version of Linux, and RH has the cash to make this work. I agree though, they could just fund the crap out of ZoL/OpenZFS and call it a day, but oh well."
(Open)ZFS uses the CDDL license, as @cactus posted, which is not compatible with the GNU GPL. They can't ship Red Hat with ZFS, so putting effort (money) into ZFS won't be profitable for them.
 

TuxDude

Well-Known Member
I'd rather have seen Red Hat put this money/effort into improving BTRFS. They make only a minor mention of it in the initial proposal, really only saying "Btrfs has no licensing issues, but after many years of work it still has significant technical issues that may never be resolved." No reason is given why Red Hat couldn't help solve those technical issues, and they really should have had a section 2.2.x with a proper analysis of what it would have taken to build BTRFS into what they are looking for. Instead, BTRFS was not mentioned anywhere under section 2.

I've also got issues with everything under section 8.1, the known shortcomings.

As an overall opinion of what they're doing: it sounds to me like they want to Apple-ify Linux storage. They're focusing too much on ease of use and hiding implementation details from users, trying to rush this "new" technology into use ("new" in quotes because it's only a new layer of management abstraction on existing tech) instead of doing the hard work of actually fixing the technical issues in the proper solution to this issue, that being BTRFS.
 

cactus

Moderator
I assume they see BTRFS's current headaches as fundamental design problems from early on that will require a lot of rewriting. Using XFS layered on DM and having an external management tool lets them leverage an existing FS and focus on the pool and data integrity part. It also gives RH control of the project. There are a lot of hands in the BTRFS cookie jar.
 

JDM

Member
Very interesting find, that document was a good read. I agree with @Patrick that it would be good to have another alternative to ZFS (again, a very happy and active ZFS user). However, I think it'll be an uphill battle in many respects.

Proving a file system is capable and reliable is no easy feat, and once developed it will definitely need some soak time before it is trusted for production. With the document *estimating* 1H 2018 for 1.0 and ZFS parity not arriving until 3.0, this would be a long way off. At that time we run into another issue (this being the most important in my opinion), and that is drive size. It appears Stratis would stick to the RAID 0,1,5,6,10 concept, and while it appears it will be elastic in growth/shrinking, it'll be using our historic parity methods. New VMFs need to be looking to erasure coding, as that'll be important for reducing failure vulnerability. We'll be seeing this soon with draid for ZFS (I'm looking forward to this greatly, and am working on getting this test code deployed in my devel space).
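
For anyone less familiar with the "historic parity methods" mentioned above, here is a minimal Python sketch of single-parity (RAID 5 style) reconstruction. It is an illustration only, assuming small in-memory byte strings as stand-ins for disk stripes; the names `xor_blocks`, `xor_parity` and `rebuild_missing` are made up for this example and are not taken from Stratis, ZFS, or draid.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks byte by byte."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def xor_parity(blocks: list[bytes]) -> bytes:
    """Compute the single parity block for one stripe (RAID 5 style)."""
    return reduce(xor_blocks, blocks)


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the one lost data block from the survivors plus parity."""
    return reduce(xor_blocks, surviving, parity)


# Hypothetical stripe of three data blocks (stand-ins for three drives).
stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(stripe)

# Pretend the second drive failed; rebuild its block from the rest.
recovered = rebuild_missing([stripe[0], stripe[2]], parity)
```

A single XOR parity block only carries enough information to solve for one missing block per stripe. That is why RAID 6 adds a second, independent parity, and why wider erasure codes add more.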

Lastly, they seem to state there will be little focus on performance in the "data tier" and that the solution to performance is to throw flash at the problem. That is fine for small to medium deployments, but could be less optimal for large-scale deployments, depending on how exactly they implement things and how it performs. I work in HPC storage, so the large scale is important to me (we plan and build triple-digit petabyte file systems now, and even bigger in the future). Until flash gets to cost parity with spinning rust, classic HDDs will play a big role in parallel file systems. While burst buffers are a huge thing, how well this would fill that need isn't certain, and how well the fast tier can drain to the data tier will be a big factor in its viability.
 

Nugget

Member
Quoting an earlier post: "Perhaps if the license was open we could have forked it and some users could bake in some SSD first features."
The CDDL license is plenty open. It's GNU's less open intent with the GPL that makes it incompatible. Notably, there are plenty of projects (like the BSDs) that have no difficulty working with CDDL'd code, all with more open licenses than the GPL.

The GPL was designed specifically to be difficult to work with other licenses, to serve as an incentive for developers to choose the GPL. Sometimes that license judo is a hindrance, though, when the momentum is on the other side of the licensing.
 

whitey

Moderator
Evan said: "I think it's good that we look to the future. It's probably going to take a long while to develop, but if you don't start you never finish."
Yeah, I believe a 2-year goal for filesystem dev is a bit aggressive, but my best wishes to them. RH pays the bills in this household.
 

PigLover

Moderator
If RH does this, I'd really like to see them use the people and ideas they picked up buying Inktank (Ceph) and Gluster and do something really novel. Build for erasure codes rather than RAID5/6, and leave hooks for clustering and replication, tiering, uses of NVMe over fabric and persistent DIMMs, etc.

You don't get any of that just trying to finish BTRFS.

 

cheezehead

Active Member
Production-grade functionality from scratch in <24 months is a bit of a stretch, but buying an existing solution that needs some re-branding and some bug fixes (i.e. Permabit) should be pretty easy to do.