storage caching thoughts

zunder1990

Member
Nov 15, 2012
82
12
8
I am planning on upgrading/replacing my current storage setup with the goal of mainly reducing power usage. Currently most of my data is video.
Current setup

Ubuntu 18.04
2x E5620 8c/16t
96gb ram
4x8tb in raidz
4x6tb in raidz
normal power usage 200-400 watts

Proposed setup
2-4 nodes
ODROID-H2+ board
2x 14tb in mirror
1x nvme ssd
expect 20-25 watts per node
plan to power them off each night or when plex is not in use

Now for the question: what filesystem?
ZFS
I am very comfortable with ZFS. It has limited config options for caching, but my real concern with ZFS for this project is the lack of ECC RAM. Is the lack of ECC RAM a problem when doing mirror vdevs?

bcache

Many caching options. My concern is that it looks like the dev team has moved on to bcachefs, with no new work on bcache.

bcachefs

Many caching options, but is it ready for prime time?

btrfs

Does it support caching? Best that I can tell, they want to add the feature, but it currently does not support it.
 

MBastian

Active Member
Jul 17, 2016
135
32
28
Düsseldorf, Germany
I use lvmcache. Pretty nice performance gain as long as your workload fits in the cache drive. Perfect for high IOPS virtual machine volumes.
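For reference, a minimal lvmcache setup looks roughly like this. Device names, VG name, and sizes are placeholders for your own hardware, so treat it as a sketch rather than copy-paste:

```shell
# Put the slow HDD and the fast NVMe in the same volume group
pvcreate /dev/sda /dev/nvme0n1
vgcreate vg0 /dev/sda /dev/nvme0n1

# Big data LV on the HDD, smaller cache LV on the NVMe
lvcreate -n data  -L 10T  vg0 /dev/sda
lvcreate -n cache -L 400G vg0 /dev/nvme0n1

# Attach the cache volume to the data volume (writethrough by default;
# --cachemode writeback caches writes too, at some risk if the SSD dies)
lvconvert --type cache --cachevol vg0/cache vg0/data
```

After that you mkfs and mount `/dev/vg0/data` as usual; `lvconvert --splitcache vg0/data` detaches the cache again later if you change your mind.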

A lack of ECC RAM is always a concern if you value your data integrity. IMHO not a reason to rule out ZFS.
 

ullbeking

Active Member
Jul 28, 2017
499
59
28
42
London
I am planning on upgrading/replacing my current storage setup with the goal of mainly reducing power usage. Currently most of my data is video.

Proposed setup
2-4 nodes
ODROID-H2+ board
2x 14tb in mirror
1x nvme ssd
expect 20-25 watts per node
plan to power them off each night or when plex is not in use
Those ODROID-H2+ boards look awesome, huh!!?!

Now for the question: what filesystem?
ZFS
I am very comfortable with ZFS. It has limited config options for caching, but my real concern with ZFS for this project is the lack of ECC RAM. Is the lack of ECC RAM a problem when doing mirror vdevs?
I just don't get the hype.

bcache

Many caching options. My concern is that it looks like the dev team has moved on to bcachefs, with no new work on bcache.
bcache is a very mature project. If it does what you need, jump right in!
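If it helps, the basic bcache pairing is short. This is a sketch with placeholder device names, and it assumes bcache-tools is installed; the cache-set UUID comes from `bcache-super-show` on the cache device:

```shell
# Format the HDD as a backing device and the NVMe as a cache device
make-bcache -B /dev/sda
make-bcache -C /dev/nvme0n1

# Attach the cache set to the backing device
# (replace <cset-uuid> with the UUID from: bcache-super-show /dev/nvme0n1)
echo <cset-uuid> > /sys/block/bcache0/bcache/attach

# Optional: writeback instead of the default writethrough
echo writeback > /sys/block/bcache0/bcache/cache_mode

# Then mkfs/mount /dev/bcache0 like any other block device
```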

bcachefs

Many caching options, but is it ready for prime time?
I would use it for prime time. I'm surprised it's not more popular. Probably because over-hyping things can kill them.

btrfs

Does it support caching? Best that I can tell, they want to add the feature, but it currently does not support it.
There are experimental patches for caching, and I plan to experiment with developing one myself once I've learned the codebase. I want Btrfs to succeed.

LVM plus lvmcache is probably your best bet. Or bcachefs, which not only tiers the storage in a good way but is, I think, ready for "personal production use". I'm going to set up a server with bcachefs after I've set up a few others with different filesystems first.
 

gea

Well-Known Member
Dec 31, 2010
2,649
908
113
DE
ZFS protects against most conceivable causes of data corruption: power outage during writes (copy-on-write avoids a corrupted filesystem), silent data corruption (checksums with data self-healing), and ransomware (read-only snaps). Without ECC, the one exclusion is corrupted data due to a RAM error. This is the same as with any other filesystem; there is no special disadvantage of ZFS versus other systems with similar RAM usage. When too many errors happen, e.g. due to RAM problems, ZFS reacts by taking the pool or disk offline ("too many errors"). As ZFS has more ways to check data, a bad outcome may even have a lower probability. In the worst case, a damaged pool or filesystem is possible with any filesystem solution due to RAM errors.

In general, if you care about your data, use all available protection technologies, and that means ZFS, a RAID level that tolerates one or more disk failures, and ECC RAM.

If your main use case is video, the ZFS RAM-based read cache is not really helpful. It does not cache sequential data or whole files, only small random data and metadata. What may be helpful is an L2ARC NVMe with read-ahead enabled, but do not expect too much of it.
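Concretely, on OpenZFS on Linux that would look something like the following. Pool and device names are placeholders, and note the module parameter resets at reboot unless you persist it in the zfs module options:

```shell
# Add the NVMe to the pool as an L2ARC cache device
zpool add tank cache /dev/nvme0n1

# By default L2ARC skips prefetched (sequential) reads; setting
# l2arc_noprefetch=0 lets them into the cache, which is what would
# matter for a sequential video workload
echo 0 > /sys/module/zfs/parameters/l2arc_noprefetch
```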

In general a ZFS pool has no problem delivering several hundred MB/s sequential, so even in a multi-user environment with concurrent reads a RAID-Zn pool may be fast enough. If not, use multiple mirrors.
 

EffrafaxOfWug

Radioactive Member
Feb 12, 2015
1,372
488
83
If you're already of a mind to power the server off when it's not in use, I still think you'd be better off sticking with more conventional x86 kit - especially when you can do things with the RTC wake timer (although that might exist in the odroid's BIOS as well I suppose).

Are you planning on making a storage cluster of the odroid nodes as opposed to having one big filesystem? Or were you planning to have different chunks of your dataset sitting on different standalone servers? Personally either approach seems rather complicated to me, especially since the majority of power saving would come from a) more modern/more conservative kit and b) being able to turn it off when not in use.

The first-gen E5 platform was always a power hog, so I suspect this is where a lot of your idle power consumption is coming from; modern platforms should consume far less if you don't need the raw CPU power. A modern Xeon E3 or Zen platform should be able to come in under 20W without the disks, still let you use ECC RAM, and avoid the complexity of managing multiple nodes. It's a bit long in the tooth now (as Intel haven't released anything better CPU-wise) but the A2SDi-8C+-HLN4F platform is excellent for low-power storage arrays.

FWIW I don't use ZFS at home myself but was happy with using LVM caching. Personally I'll always use ECC wherever it's a viable option (which is pretty much everywhere outside of laptops these days).
 

zunder1990

Member
Nov 15, 2012
82
12
8
If you're already of a mind to power the server off when it's not in use, I still think you'd be better off sticking with more conventional x86 kit - especially when you can do things with the RTC wake timer (although that might exist in the odroid's BIOS as well I suppose).

I have had good luck with "power on after power loss" plus a smart plug to toggle power on and off. The board I am looking at is Intel based, has a normal BIOS, and may already have an RTC wake timer.
Are you planning on making a storage cluster of the odroid nodes as opposed to having one big filesystem? Or were you planning to have different chunks of your dataset sitting on different standalone servers? Personally either approach seems rather complicated to me, especially since the majority of power saving would come from a) more modern/more conservative kit and b) being able to turn it off when not in use.
Right now I have one big filesystem doing so many things, and with so many things accessing it, the disks have never had a chance to spin down. My thinking is to break the big filesystem into smaller nodes. That would let me power off the nodes that host Plex video from, say, 11pm to 9am when no one is watching Plex. I would keep all the chunks of a given dataset on the same node so that I can power nodes on and off individually.

The first gen E5 platform was always a power hog so I suspect this is where a lot of your idle power consumption is coming from; modern platforms should consume far less if you don't need the raw CPU power. A modern xeon E3 or zen platform should be able to fit <20W without the discs and still allow you to use ECC RAM and without the complexity of managing multiple nodes. It's a bit long in the tooth now (as Intel haven't released anything better CPU-wise) but the A2SDi-8C+-HLN4F platform is excellent for low-power storage arrays.
I will have to give that a look.
 

EffrafaxOfWug

Radioactive Member
Feb 12, 2015
1,372
488
83
I have had good luck with "power on after power loss" plus a smart plug to toggle power on and off. The board I am looking at is Intel based, has a normal BIOS, and may already have an RTC wake timer.

Right now I have one big filesystem doing so many things, and with so many things accessing it, the disks have never had a chance to spin down. My thinking is to break the big filesystem into smaller nodes. That would let me power off the nodes that host Plex video from, say, 11pm to 9am when no one is watching Plex. I would keep all the chunks of a given dataset on the same node so that I can power nodes on and off individually.

I will have to give that a look.
Probably more of an aesthetic point than a technical one, but I always considered using rtcwake more elegant than forcibly flipping the power button. WOL from a central "control" server could also achieve the same thing if you wanted to bring things up before the rtc wake triggered. But personally I wouldn't want to deal with the complexity of managing multiple nodes if I could avoid it. There's a million ways to skin this particular cat though so if you don't mind the overhead (and potential changes to the WAF) it could make for an interesting experiment.
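For example, the nightly cycle could be as simple as the following (times and the MAC address are placeholders; WOL needs to be enabled in the BIOS/NIC first):

```shell
# On the storage node: power off now and program the RTC to wake
# tomorrow at 09:00 (rtcwake is part of util-linux)
rtcwake -m off -t "$(date -d 'tomorrow 09:00' +%s)"

# Or, from the central control server: wake a node early on demand,
# before the RTC alarm fires
wakeonlan aa:bb:cc:dd:ee:ff
```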

FWIW I'm using a Zen 2 chip in an X470D4U "server-lite" board myself; its idle power is higher than the Atoms' and I need to add an HBA, but when I looked it was the best option available to me in the power vs. cost vs. efficiency equation (it does a fair amount of heavy lifting that my old Haswell E3 was struggling with).