Real filebased data tiering on ZFS

gea

Well-Known Member
Dec 31, 2010
3,183
1,199
113
DE
I'm currently working on the topic of data tiering.

With the special vdev, ZFS offers a very intelligent approach for hybrid pools made up of large but slow disks and expensive but fast SSD/NVMe. The basic idea in ZFS is that data which is particularly performance-critical because of its physical data structure (small I/O, metadata, dedup tables) is stored on the fast special vdev, while all other data is stored on the slow pool vdevs.

The main advantage is that you don't have to configure anything or copy data between the fast and slow vdevs. Just set and forget. This is a perfect approach for use cases with a lot of small, volatile data from many users (e.g. a university mail server) and provides a significant performance gain.

But for a normal office or VM server this is of little practical use. The classic tiering approach of deliberately storing data in either the fast or the slower part of the pool would be more suitable. There is no support for this in OpenZFS, but it could certainly be achieved, see https://illumos.topicbox.com/groups...on-a-special-vdev-and-rule-based-data-tiering

I'm planning a Pool > Tiering menu in napp-it to make this more convenient. Until then, everyone can try it out manually.

[Attachment: sbb.PNG]

gea

About Data Tiering

Data tiering is a method to split data between an expensive, fast tier-1 (special vdev, SSD/NVMe) and a cheaper, slower tier-2 (regular vdevs, disks) of a hybrid data pool. Usually you want hot or performance-critical data on the fast tier and older or uncritical data on the second tier.

With ZFS you can choose between three tiering methods:

1. Blockbased tiering based on physical data structures
This is the default ZFS method when you use a hybrid ZFS pool with special vdevs. All data lands on the regular, slower vdevs, except for performance-critical small datablocks, metadata and dedup tables. This improves read/write access to datablocks that are otherwise slow. The main advantage is that this method is set and forget. In use cases with many small, volatile files and many users (like a university mail server) it helps to improve overall performance due to fast access to metadata.

2. Tiering based on the ZFS small blocksize and recsize settings
If you set the small blocksize property (special_small_blocks) of a ZFS filesystem, all files with a blocksize smaller than or equal to this value are stored on the special vdev, while files with a larger blocksize are stored on regular vdevs. For files smaller than recsize (the recordsize property) the blocksize is reduced dynamically; recsize is the maximum blocksize.

File=100k, small blocksize=128k and recsize=128k: file is always stored on the special vdev (blocksize is always < 128k when filesize is < recsize)
File=200k, small blocksize=128k and recsize=128k: file is stored on the special vdev (blocksize 128k)
File=300k, small blocksize=128k and recsize=256k: file is stored on the regular vdevs (blocksize 256k)
For an office filer where most files are small, set small blocksize to 128k. Most office documents are then on tier-1, while VM or media files are on tier-2.
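
The three examples above can be reproduced with a small model of the placement rule. This is only a sketch under simplifying assumptions: the function names are made up for illustration, sizes are in bytes, and compression and sector rounding (which also influence the real blocksize) are ignored.

```python
K = 1024


def effective_block_size(file_size: int, recsize: int) -> int:
    """Simplified model: a file smaller than recsize is stored in one
    block of roughly its own size, larger files use recsize blocks."""
    return min(file_size, recsize)


def tier_for_file(file_size: int, small_blocksize: int, recsize: int) -> str:
    """Return 'special' (tier-1) or 'regular' (tier-2) for a newly written file."""
    block = effective_block_size(file_size, recsize)
    # small_blocksize = 0 means no data blocks are placed on the special vdev
    if small_blocksize > 0 and block <= small_blocksize:
        return "special"
    return "regular"


# The three cases from the post: (file size, small blocksize, recsize)
for size, sbs, rs in [(100 * K, 128 * K, 128 * K),
                      (200 * K, 128 * K, 128 * K),
                      (300 * K, 128 * K, 256 * K)]:
    print(f"file={size // K}k sbs={sbs // K}k recsize={rs // K}k ->",
          tier_for_file(size, sbs, rs))
```

Note that in this model a small file (for example 100k) still goes to the special vdev even with recsize=256k, because its blocksize stays below the small blocksize value.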

This behaviour is very helpful, as you can create one filesystem for hot or performance-critical data and another for other, cold or archival data, the latter either without small blocksize set or with a larger recsize. Via recsize you decide whether only small files or all files are stored on the fast special vdev. If you change recsize, you switch the behaviour for subsequent writes. This is the suggested tiering option on ZFS.
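
As a rough illustration of the two-filesystem layout, the sketch below only builds and prints the `zfs create` command lines, it does not execute them. The dataset names `tank/hot` and `tank/cold` and the chosen values are assumptions for this example; `recordsize` and `special_small_blocks` are the ZFS property names behind recsize and small blocksize.

```python
def zfs_create_cmd(dataset: str, recordsize: str, special_small_blocks: str) -> list:
    """Build the argument list for creating a filesystem with the given
    recordsize and special_small_blocks values (nothing is executed)."""
    return [
        "zfs", "create",
        "-o", f"recordsize={recordsize}",
        "-o", f"special_small_blocks={special_small_blocks}",
        dataset,
    ]


# Hot filesystem: recsize <= small blocksize, so all data blocks qualify for tier-1.
hot = zfs_create_cmd("tank/hot", "128K", "128K")

# Cold filesystem: small blocksize left at 0, so data blocks stay on tier-2.
cold = zfs_create_cmd("tank/cold", "1M", "0")

for cmd in (hot, cold):
    print(" ".join(cmd))
```

Review the printed commands and adapt pool and dataset names before running anything on a real system.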

3. Tiering based on single files
This is the classic tiering approach, where you decide where files are stored or move files between fast tier-1 and slow tier-2. This is not supported by ZFS directly. You can still tier a file: switch recsize to a value at or below the small blocksize value (tier-1) or above it (tier-2), then rename the file and copy it back again. This results in a move between tier-1 (special vdev) and tier-2 (regular vdev) under the identical file path.
For manual tiering, set small blocksize and recsize in the menu ZFS Filesystems. Every file that is saved (not moved) is then stored on the selected tier.
Smaller files with a dynamically reduced blocksize are always stored on the special vdev (which makes sense, as small files are usually slow to access).

Main problem: You can move a file between tiers, but because of Copy on Write any later edit stores the file according to the recsize setting that is current at that time.
Manual tiering is therefore only an option if the default tier is the special vdev (recsize <= small blocksize) and you move cold or older files to tier-2 some time after their last edit. Hot, active data and small files are then always on tier-1, larger and older files on tier-2. This can be automated via jobs.
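
A job of this kind could look roughly like the following sketch. It is not the napp-it implementation: the function names, the size and age thresholds and the copy-then-rename strategy are assumptions for illustration. The idea is to raise recsize above the small blocksize value first, then select large files that have not been edited for a while and rewrite each one under its identical path so that its blocks are written again with the current settings.

```python
import os
import shutil
import tempfile
import time


def select_cold_files(root, min_size, min_age_days, now=None):
    """Return regular files under root that are at least min_size bytes
    and were last modified at least min_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400
    cold = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            st = os.stat(path)
            if st.st_size >= min_size and st.st_mtime <= cutoff:
                cold.append(path)
    return sorted(cold)


def rewrite_in_place(path, chunk=1024 * 1024):
    """Copy the file to a temporary name in the same directory and rename
    it over the original, so that all of its data is written anew."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            while True:
                buf = src.read(chunk)
                if not buf:
                    break
                dst.write(buf)
            dst.flush()
            os.fsync(dst.fileno())
        shutil.copystat(path, tmp)  # keep mtime and permission bits
        os.replace(tmp, path)       # same path, newly written blocks
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


# Example use (hypothetical path and thresholds):
# for f in select_cold_files("/tank/data", 1024 * 1024, 90):
#     rewrite_in_place(f)
```

The sketch does not handle open or locked files, ownership or ACLs, and blocks that are still referenced by snapshots stay where they are until those snapshots are destroyed.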

Main difference to ARC/L2ARC caching
Caching works on the most recently and most frequently read ZFS datablocks (e.g. 128k), not on whole files.
A special vdev as tier-1 is not a cache but the place where a whole file is stored, which improves both reads and writes.

Main advantage of file tiering
Full control over whether a larger file is on the fast or the slow tier, based on size and last edit time.

Main disadvantages of file/rulebased tiering
Moving data between tiers produces load and costs performance
Files are locked during tiering
More complicated than blockbased tiering
You must keep an eye on the capacity of tier-1. When it is full, data lands on tier-2.
 

mrpasc

Well-Known Member
Jan 8, 2022
504
264
63
Munich, Germany
What happens if „tier 1“, aka the special vdev, runs out of space? Do all new writes (including metadata and so on) just go to „tier 2“, aka the normal vdevs, or will the pool halt (like a „normal“ pool does when it fills up)?
 

gea

This is similar to a full pool to which you add another vdev: all new writes go to the new vdev.
If a special vdev is full, further writes go to the regular vdevs.

I have collected some information about ZFS hybrid pools and data tiering in a PDF:
www.napp-it.org/doc/downloads/hybrid_pools.pdf
 