Has anyone tried bcachefs yet?

Discussion in 'Linux Admins, Storage and Virtualization' started by voxadam, Apr 22, 2018.

  1. voxadam

    voxadam Member

    Joined:
    Apr 21, 2016
    Messages:
    105
    Likes Received:
    11
    Kent Overstreet, the original developer of the bcache block caching system for Linux, has been developing bcachefs, which he refers to as "The COW filesystem for Linux that won't eat your data," for a while now, and I was curious whether anyone has had a chance to give it a spin.

    I've been running Btrfs on my primary workstation for a number of years without incident, though the fact that I've avoided RAID5/6 like the plague probably has a lot to do with my success. I've been meaning to upgrade my primary SSD and do a clean install for a while now, and it's tempting to give bcachefs a shot. As bcachefs is based on the widely deployed and tested bcache code, I'm reasonably confident in its stability, but in the event it does eat my data I have backups. The main thing holding me back is the lack of support for snapshots, which Kent freely admits are "by far the most complex of the remaining features to implement"; there's nary a mention of them on the TODO list, which worries me a bit.

    Anyway, I was just wondering if anyone had experimented with bcachefs yet.

    Feature status:

    • Full data checksumming

      Fully supported and enabled by default. We do need to implement scrubbing, once we've got replication and can take advantage of it.

    • Compression

      Not quite finished - it's safe to enable, but there's some work left related to copy GC before we can enable free space accounting based on compressed size: right now, enabling compression won't actually let you store any more data in your filesystem than if the data were uncompressed.

    • Tiering/writeback caching:

      Bcachefs allows you to specify disks (or groups thereof) to be used for three categories of I/O: foreground, background, and promote. Foreground devices accept writes, whose data is copied to background devices asynchronously, and the hot subset of which is copied to the promote devices for performance.

      Basic caching functionality works, but it's not (yet) as configurable as bcache's caching (e.g. you can't specify writethrough caching). A combined usage sketch follows this feature list.

    • Replication

      All the core functionality is complete, and it's getting close to usable: you can create a multi device filesystem with replication, and then while the filesystem is in use take one device offline without any loss of availability.

    • Encryption

      Whole filesystem AEAD style encryption (with ChaCha20 and Poly1305) is done and merged. I would suggest not relying on it for anything critical until the code has seen more outside review, though.

    • Snapshots

      Snapshot implementation has been started, but snapshots are by far the most complex of the remaining features to implement - it's going to be quite awhile before I can dedicate enough time to finishing them, but I'm very much looking forward to showing off what it'll be able to do.
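
    Putting a few of these bullets together (tiering, replication, encryption), here's a minimal sketch of what creating such a filesystem might look like. The device names and group labels below are made up, and the exact flag spellings may differ between bcachefs-tools versions, so treat this as an illustration rather than a recipe:

    Code:
    # hypothetical devices: one NVMe drive as the fast tier, two spinning disks behind it
    bcachefs format \
        --group=ssd /dev/nvme0n1 \
        --group=hdd /dev/sda /dev/sdb \
        --foreground_target=ssd \
        --promote_target=ssd \
        --background_target=hdd \
        --replicas=2 \
        --encrypted
    bcachefs unlock /dev/nvme0n1    # prompts for the encryption passphrase
    mount -t bcachefs /dev/nvme0n1:/dev/sda:/dev/sdb /mnt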

    Known issues/caveats

    • Mount time

      We currently walk all metadata at mount time (multiple times, in fact) - on flash this shouldn't even be noticeable unless your filesystem is very large, but on rotating disk expect mount times to be slow. This will be addressed in the future - mount times will likely be the next big push after the next big batch of on-disk format changes.

    homepage: bcachefs
    git: evilpiepirate.org/git/bcachefs.git
    Patreon: Kent Overstreet is creating bcachefs - a next generation Linux filesystem | Patreon
    mailing list: Majordomo Lists at VGER.KERNEL.ORG
    irc: irc://irc.oftc.net/#bcache
    The bcachefs filesystem [LWN.net] - 25 August 2015
    Bcachefs - encryption, fsck, and more - 15 March 2017
    A new bcachefs release [LWN.net] - 16 March 2017
     
    #1
    MiniKnight likes this.
  2. MiniKnight

    MiniKnight Well-Known Member

    Joined:
    Mar 30, 2012
    Messages:
    2,721
    Likes Received:
    760
    I'm not ready to use it yet. I strongly prefer upstream kernel support, and bcachefs still being outside the mainline kernel isn't giving me confidence.

    As a project, the concept has promise.
     
    #2
    voxadam likes this.
  3. dandanio

    dandanio New Member

    Joined:
    Oct 10, 2017
    Messages:
    27
    Likes Received:
    6
    I dabbled with Btrfs when I was looking for alternatives to ZFS. But since Red Hat abandoned it in their distro, I scrapped all thoughts of playing with it again anytime in the future. ZFS is it; Btrfs has nothing to offer that would make it the superior choice.
     
    #3
    JustinClift likes this.
  4. Joel

    Joel Active Member

    Joined:
    Jan 30, 2015
    Messages:
    692
    Likes Received:
    130
    From your description it sounds like it's basically trying to copy ZFS, so why not just use the original, which is at least 15 years old and mostly mature?

    Of course for ultimate stability you'd want to stick with BSD OSes.
     
    #4
  5. dswartz

    dswartz Member

    Joined:
    Jul 14, 2011
    Messages:
    327
    Likes Received:
    24
    It's not really like ZFS at all. That said, it seems to share the weakness of a number of one-man projects I've seen in the past. The developer disappears for weeks at a time, incommunicado...
     
    #5
  6. SlickNetAaron

    SlickNetAaron Member

    Joined:
    Apr 30, 2016
    Messages:
    50
    Likes Received:
    12
    I’m hoping this takes off! The design, feature roadmap, and performance look impressive on a cursory review.
     
    #6
  7. dswartz

    dswartz Member

    Joined:
    Jul 14, 2011
    Messages:
    327
    Likes Received:
    24
    Yeah, that would be nice. I'm not holding my breath though. I've seen this WAY too many times before. One-man project. Too much stuff to do. Doesn't want to take others on board to help. Gets burned out and either disappears for weeks at a time, or just abandons the project (not saying he's done all of this, but enough warning signs to make me leery...)
     
    #7
  8. Joel

    Joel Active Member

    Joined:
    Jan 30, 2015
    Messages:
    692
    Likes Received:
    130
    Definitely not something I’d want in a file system, especially compared to a mature product like ZFS...
     
    #8
  9. _alex

    _alex Active Member

    Joined:
    Jan 28, 2016
    Messages:
    846
    Likes Received:
    88
    #9
  10. BackupProphet

    BackupProphet Active Member

    Joined:
    Jul 2, 2014
    Messages:
    687
    Likes Received:
    235
    Interesting. It's not clear to me whether compression works yet, and which compression algorithms are available. LZ4 or ZSTD?
     
    #10
  11. EffrafaxOfWug

    EffrafaxOfWug Radioactive Member

    Joined:
    Feb 12, 2015
    Messages:
    669
    Likes Received:
    233
    According to bits I've read, compression is based on zstd/Zstandard, written by Facebook (and also used in btrfs, among other things), but I don't think it's actually functional yet.
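
    For comparison, btrfs has shipped zstd support since kernel 4.14, where it's just a mount option. A quick sketch (device and mountpoint are examples):

    Code:
    # btrfs transparent compression with zstd (kernel 4.14+)
    mount -o compress=zstd /dev/sdc1 /mnt/data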
     
    #11
  12. _alex

    _alex Active Member

    Joined:
    Jan 28, 2016
    Messages:
    846
    Likes Received:
    88
    As I understood it, compression works but is currently of limited use because saved space isn't reported back as usable. I'd guess you can choose the algorithm.

    Anyway, the key point is that some Linux kernel devs involved in I/O, filesystems and such are obviously considering bcachefs for upstream. I'd guess there is still a huge amount of work to be done before that happens, but it looks promising for the future.
    In terms of stability and architectural flaws, I'd guess putting effort into it may be more efficient than with Btrfs.

    Btw, dm-writecache also seems to be on its way into 4.18 :)
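
    If you want to experiment with compression, something like this should select the algorithm; the exact option spelling varies between bcachefs-tools versions, so treat this as a sketch:

    Code:
    # pick the compression algorithm at format time
    # (flag name may differ across tool versions)
    bcachefs format --compression=zstd /dev/sdX

    # or flip it at runtime via sysfs, per filesystem
    echo zstd > /sys/fs/bcachefs/<fs uuid>/options/compression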
     
    #12
  13. BackupProphet

    BackupProphet Active Member

    Joined:
    Jul 2, 2014
    Messages:
    687
    Likes Received:
    235
    I just tested it now; there's a nice PPA for Ubuntu that makes installation a breeze: the bcachefs testing archive from Chris Halse Rogers.
    First impressions: sync write performance is only 30-50% of ext4 or XFS, about the same as ZFS on Linux, and still over twice as fast as Btrfs, which has REALLY slow sync write performance.

    Transparent compression with zstd is awesome; if it worked I could use this at once. It looks like impressive work so far, and if I read the manual correctly you can assemble pools like you can with ZFS. That's really cool!

    bcachefs:
    Code:
    olav@sola:~$ sudo /usr/lib/postgresql/10/bin/pg_test_fsync -f /mnt/testfile
    5 seconds per test
    O_DIRECT supported on this platform for open_datasync and open_sync.
    
    Compare file sync methods using one 8kB write:
    (in wal_sync_method preference order, except fdatasync is Linux's default)
            open_datasync                      2571,695 ops/sec     389 usecs/op
            fdatasync                          2589,633 ops/sec     386 usecs/op
            fsync                              2666,934 ops/sec     375 usecs/op
            fsync_writethrough                              n/a
            open_sync                          2568,984 ops/sec     389 usecs/op
    
    Compare file sync methods using two 8kB writes:
    (in wal_sync_method preference order, except fdatasync is Linux's default)
            open_datasync                      1270,695 ops/sec     787 usecs/op
            fdatasync                          2064,706 ops/sec     484 usecs/op
            fsync                              2055,642 ops/sec     486 usecs/op
            fsync_writethrough                              n/a
            open_sync                          1214,583 ops/sec     823 usecs/op
    
    Compare open_sync with different write sizes:
    (This is designed to compare the cost of writing 16kB in different write
    open_sync sizes.)
             1 * 16kB open_sync write          2343,705 ops/sec     427 usecs/op
             2 *  8kB open_sync writes         1360,672 ops/sec     735 usecs/op
             4 *  4kB open_sync writes          799,095 ops/sec    1251 usecs/op
             8 *  2kB open_sync writes          541,918 ops/sec    1845 usecs/op
            16 *  1kB open_sync writes          272,126 ops/sec    3675 usecs/op
    
    Test if fsync on non-write file descriptor is honored:
    (If the times are similar, fsync() can sync data written on a different
    descriptor.)
            write, fsync, close                2823,150 ops/sec     354 usecs/op
            write, close, fsync                2958,514 ops/sec     338 usecs/op
    
    Non-sync'ed 8kB writes:
            write                            394881,852 ops/sec       3 usecs/op
    
    
    ext4
    Code:
    olav@sola:~$ sudo /usr/lib/postgresql/10/bin/pg_test_fsync -f /mnt/testfile
    5 seconds per test
    O_DIRECT supported on this platform for open_datasync and open_sync.
    
    Compare file sync methods using one 8kB write:
    (in wal_sync_method preference order, except fdatasync is Linux's default)
            open_datasync                      9489,370 ops/sec     105 usecs/op
            fdatasync                          9619,069 ops/sec     104 usecs/op
            fsync                              9581,304 ops/sec     104 usecs/op
            fsync_writethrough                              n/a
            open_sync                         10228,169 ops/sec      98 usecs/op
    
    Compare file sync methods using two 8kB writes:
    (in wal_sync_method preference order, except fdatasync is Linux's default)
            open_datasync                      5601,623 ops/sec     179 usecs/op
            fdatasync                          6110,553 ops/sec     164 usecs/op
            fsync                              5459,554 ops/sec     183 usecs/op
            fsync_writethrough                              n/a
            open_sync                          5023,706 ops/sec     199 usecs/op
    
    Compare open_sync with different write sizes:
    (This is designed to compare the cost of writing 16kB in different write
    open_sync sizes.)
             1 * 16kB open_sync write          5402,382 ops/sec     185 usecs/op
             2 *  8kB open_sync writes         4896,125 ops/sec     204 usecs/op
             4 *  4kB open_sync writes         2988,940 ops/sec     335 usecs/op
             8 *  2kB open_sync writes         1755,473 ops/sec     570 usecs/op
            16 *  1kB open_sync writes          970,653 ops/sec    1030 usecs/op
    
    Test if fsync on non-write file descriptor is honored:
    (If the times are similar, fsync() can sync data written on a different
    descriptor.)
            write, fsync, close                8060,057 ops/sec     124 usecs/op
            write, close, fsync                8937,642 ops/sec     112 usecs/op
    
    Non-sync'ed 8kB writes:
            write                            315500,895 ops/sec       3 usecs/op
    
    Done on an 80GB Intel 320.
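
    For reference, a minimal sketch of this kind of setup, assuming a single-device filesystem with default options (device and mountpoint are examples):

    Code:
    # assumed reproduction steps
    bcachefs format /dev/sdb
    mount -t bcachefs /dev/sdb /mnt
    sudo /usr/lib/postgresql/10/bin/pg_test_fsync -f /mnt/testfile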
     
    #13
  14. _alex

    _alex Active Member

    Joined:
    Jan 28, 2016
    Messages:
    846
    Likes Received:
    88
    So, just a single SSD with no spinner behind it?
    Here are the 'tunables'; you should be able to choose between lz4, gzip and zstd per disk group:

    IoTunables

    Until compression works, or as an alternative, you could layer it with dm-vdo and also enable dedup...
    I had only minor problems building vdo on Debian, mainly fixing some paths for Python.
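
    The vdo manager flow looks roughly like this; names and sizes are examples, and the flags are per the VDO 6.x python tool, so consider it a sketch:

    Code:
    # create a dedup/compression layer on an example device
    vdo create --name=vdo0 --device=/dev/sdb --vdoLogicalSize=1T
    # put any filesystem on top of the mapped device
    mkfs.ext4 /dev/mapper/vdo0
    mount /dev/mapper/vdo0 /mnt
    # check physical vs logical usage / space savings
    vdostats --human-readable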
     
    #14