bcache reliability for a NAS in 2024... any horror stories?

Can I trust bcache with my data?



Gio

Member
Apr 8, 2017
I'm really interested in implementing bcache (much better performance than lvm2 in my tests), but I keep finding horror stories and random comments on reddit calling this kernel module from 2012 unstable. Then you find gems like this: Bcache - Awesome and terrible at the same time, and I don't know what to think.

My plan is to use writeback caching on an mdadm RAID1 of 1TB NVMe disks as the cache (rough sketch below).
- Backing drives will be a pair of 10TB and a pair of 8TB disks (4 disks total)
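
Roughly what I have in mind on the cache side, with placeholder device names (this is a sketch of the intended layout, not something I've tested):

```bash
# Mirror the two NVMe drives to serve as the bcache cache device
mdadm --create /dev/md/cache --level=1 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1

# Format the spinning-disk array as the backing device and the NVMe mirror as the cache set
make-bcache -B /dev/md/slow        # /dev/md/slow = placeholder for the HDD array
make-bcache -C /dev/md/cache

# Attach the cache set to the backing device (/dev/bcache0 appears once the backing device is registered)
CSET_UUID=$(bcache-super-show /dev/md/cache | awk '/cset.uuid/ {print $2}')
echo "$CSET_UUID" > /sys/block/bcache0/bcache/attach
```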

Also, I am not interested in bcachefs until it proves itself more. I read this recently: Bcachefs, an introduction/exploration - blog.asleson.org

Please share your love (or horror) stories of using bcache as your logical volume manager and caching system.
 

nexox

Well-Known Member
May 3, 2023
It's been a long time since I tried bcache (it turns out the single 200GB S3700 I could come up with in 2017 was a huge bottleneck in front of a handful of 7200 RPM SAS disks for any kind of sequential IO), but as far as I can tell that first link is mostly about a udev misconfiguration mounting bcache incorrectly. I imagine that in the intervening 4.5 years things have gotten better for Debian's udev rules and bcache's handling of those cases.

I'm following bcachefs development, and it looks nearly usable to me; if I had more time to configure my new fileserver hardware I would probably have switched already, but only with my current lvm/md array as a backup.

If you want to test out another alternative there's always dm-cache. I also tried it in 2017 and it had the same issues as bcache, but the documentation suggests it has improved since, at least for some of the workloads I was testing back then.

Either way, make sure those NVMe drives have PLP, and you still need LVM and a filesystem on top; bcache doesn't replace either of those.
 

Gio

Member
Apr 8, 2017
If you want to test out another alternative there's always dm-cache. I also tried it in 2017 and it had the same issues as bcache, but the documentation suggests it has improved since, at least for some of the workloads I was testing back then.

Either way, make sure those NVMe drives have PLP, and you still need LVM and a filesystem on top; bcache doesn't replace either of those.
Thanks for sharing. Does it matter where LVM goes in the stack?

I'm currently experimenting with this setup and have some early benchmarks I can share. Below is my actual setup and the logical order of the software layers:

- Physical disks -> mdadm RAID5 -> LVM2 volume group -> /dev/bcache0 -> ZFS and Btrfs

I assume the above setup makes sense, but I guess I could set up bcache on top of mdadm first and then put /dev/bcache0 into LVM2 - one of the things I was playing with was replicating a Synology SHR-style pool of different-sized disks with data redundancy.
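
A rough sketch of how that stack goes together (device, VG, and LV names are placeholders, and a single NVMe is shown as the cache for brevity):

```bash
# mdadm RAID5 across the spinning disks
mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sda /dev/sdb /dev/sdc /dev/sdd

# LVM on top of the array; one LV becomes the bcache backing device
pvcreate /dev/md0
vgcreate slowvg /dev/md0
lvcreate -l 100%FREE -n slowlv slowvg

# bcache: LV as backing device, NVMe as cache -> creates /dev/bcache0
make-bcache -B /dev/slowvg/slowlv -C /dev/nvme0n1

# Filesystem (or zpool) goes on the bcache device
zpool create tank /dev/bcache0        # or: mkfs.btrfs /dev/bcache0
```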

Setup with ZFS screams (L2ARC limited to 786MB): ugreen-nas/experiments-bench/mdadm-lvm2-bcache-zfs.md at main · TheLinuxGuy/ugreen-nas

Setup with Btrfs: ugreen-nas/experiments-bench/mdadm-lvm2-bcache-btrfs.md at main · TheLinuxGuy/ugreen-nas
 

nexox

Well-Known Member
May 3, 2023
Does it matter where LVM goes in the stack?
I think that depends on whether you want the entire VG cached (put bcache under LVM) or just a subset of LVs cached (put bcache on top of LVM, and potentially use LVM to split your cache device up if you want more than one LV cached). dm-cache, on the other hand, is more tied into LVM: you add the SSD to the VG, create a cache volume on it, and configure a slow LV to use that.
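
A minimal sketch of that dm-cache-via-LVM flow, assuming an existing VG "slowvg" with a slow LV "slowlv" and an SSD at /dev/nvme0n1 (all placeholder names):

```bash
# Add the SSD to the existing volume group
vgextend slowvg /dev/nvme0n1

# Create a cache pool on the SSD (size is just an example)
lvcreate --type cache-pool -L 900G -n cachepool slowvg /dev/nvme0n1

# Attach the cache pool to the slow LV; IO to slowvg/slowlv now goes through dm-cache
lvconvert --type cache --cachepool slowvg/cachepool slowvg/slowlv
```

(lvconvert --splitcache slowvg/slowlv detaches the cache again without touching the origin data.)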
 

nexox

Well-Known Member
May 3, 2023
I think the issue might be that the ZFS page says BTRFS at the top.
 

Gio

Member
Apr 8, 2017
If using zfs on top, what would be the advantage of this vs raidz1 + special vdev with small-block data? What's your expected data use pattern?
I have not tested special vdev devices. Not sure I will be doing that test.

Here is a comparison benchmark between running RAID1 natively as a zpool mirror vs. mdadm RAID1 with a zpool on top (negligible performance difference): ugreen-nas/experiments-bench/baseline-stats.md at main · TheLinuxGuy/ugreen-nas

My expected data usage pattern is a Plex Media server.
- I want my slow spinning hard drives to be powered down the vast majority of the time (hd-idle takes care of that).
- I want any read requests that hit my spinning hard drives to be duplicated to a read cache (SSD/NVMe).
- I want all writes to hit my NVMe disks and flush to the slower disks later, ideally with a throttling algorithm (bcache can do this; lvmcache doesn't seem to, unless you use dm-writecache, but then you get no read cache); see the sketch right after this list.
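
These are the standard bcache sysfs knobs I'd expect to be in play for that write path (values are just examples, and /dev/bcache0 is assumed):

```bash
# Absorb writes on the NVMe first, flush to the backing array later
echo writeback > /sys/block/bcache0/bcache/cache_mode

# Allow up to 40% of the cache to hold dirty data before bcache flushes harder
echo 40 > /sys/block/bcache0/bcache/writeback_percent

# Cache sequential IO too; 0 disables the default 4MB cutoff
echo 0 > /sys/block/bcache0/bcache/sequential_cutoff

# Wait this long (seconds) after a write before starting background writeback
echo 300 > /sys/block/bcache0/bcache/writeback_delay

# Inspect the throttled flush rate bcache is currently using
cat /sys/block/bcache0/bcache/writeback_rate_debug
```

Even with those settings, background writeback will eventually spin the array up, so how long the disks actually stay idle depends on how much dirty data the cache is allowed to accumulate.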
 

gb00s

Well-Known Member
Jul 25, 2018
If it's of any interest, I would prioritize Open CAS over bcache and dm-cache as a caching solution.
 

Gio

Member
Apr 8, 2017
If it's of any interest, I would prioritize Open CAS over bcache and dm-cache as a caching solution.
I tested and benched Open-CAS... it's a memory hog! It used 10GB of memory just to cache my 23TB array.

Open-CAS doesn't officially support btrfs/zfs. Benchmarks on anything other than ext4 were slower... the whole enchilada is documented at ugreen-nas/experiments-bench/opencas-cache.md at main · TheLinuxGuy/ugreen-nas
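
For anyone who hasn't tried it, Open CAS is driven through casadm, roughly like this (cache ID, cache mode, and device names below are placeholders, not my exact invocations):

```bash
# Start a cache instance on the NVMe in write-back mode (cache id 1)
casadm -S -i 1 -d /dev/nvme0n1 -c wb

# Add the slow array as a core device behind that cache
casadm -A -i 1 -d /dev/md0

# The cached block device shows up as /dev/cas1-1; the filesystem goes there
casadm -L
```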

Go through the other pages of my repository tree to compare benchmark results and draw your own conclusions. IMO bcache seems to be the most efficient and performant option.
 

gea

Well-Known Member
Dec 31, 2010
I have not tested special vdev devices. Not sure I will be doing that test.

Here is a comparison benchmark between running RAID1 natively as a zpool mirror vs. mdadm RAID1 with a zpool on top (negligible performance difference): ugreen-nas/experiments-bench/baseline-stats.md at main · TheLinuxGuy/ugreen-nas

My expected data usage pattern is a Plex Media server.
- I want my slow spinning hard drives to be powered down the vast majority of the time (hd-idle takes care of that).
- I want any read requests that hit my spinning hard drives to be duplicated to a read cache (SSD/NVMe).
- I want all writes to hit my NVMe disks and flush to the slower disks later, ideally with a throttling algorithm (bcache can do this; lvmcache doesn't seem to, unless you use dm-writecache, but then you get no read cache).
For my napp-it cs web-gui on Windows I integrated Windows Storage Spaces into the web-gui. Your usage pattern may be a perfect use case for Windows Storage Spaces tiering, where you can pin hot/current data, or data of a particular type, to SSD and leave cold/old data on HDD. The move is done automatically on a daily basis.

I have no long-term experience with Storage Spaces tiering as I am a 100% ZFS user. A special vdev, where you can force data to a dedicated vdev per filesystem based on recordsize, is a great solution; it is not a temporary cache but the final destination of the data based on its physical structure, so it may not be the right solution for you.
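
For reference, the special vdev setup looks roughly like this; pool and dataset names and the 64K threshold are only placeholders:

```bash
# Add a mirrored special vdev (metadata + small blocks) to an existing pool
zpool add tank special mirror /dev/nvme0n1 /dev/nvme1n1

# Per filesystem: blocks at or below this size go to the special vdev;
# set it equal to the recordsize to force all data of that filesystem there
zfs set special_small_blocks=64K tank/media
```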