Thoughts On My ZFS Setup


WhosTheBosch

Member
Dec 20, 2016
I'm going to be creating 4 types of ZFS data stores, and I'm wondering if someone more experienced could let me know whether these commands look good. I've also included the Proxmox storage commands to add them properly to the Proxmox system. Any tips on, or issues with, the commands below would be appreciated!

Code:
#############################################
### ISO directory for ISO storage

# Create ZFS pool for ISOs on a partition of the OS disk (the OS disk is 1TB, so I partitioned it)
zpool create -o ashift=12 -o autotrim=on -O atime=off -O acltype=posixacl -O compression=zstd-5 -O dnodesize=auto -O normalization=formD -O recordsize=1m -O relatime=on -O utf8only=on -O xattr=sa -m /data/sata0 sata0 DEVICE_ID-PARTITION

# Make ZFS dataset and the defaults should be taken from the pool
zfs create sata0/pve

# add directory to Proxmox - list as mountpoint to ensure it's mounted at startup
pvesm add dir isos --content iso --mkdir yes --is_mountpoint yes --path /data/sata0/pve

#############################################
### SSD storage for containers and VMs

# create ZFS pool on NVME SSD
zpool create -o ashift=12 -o autotrim=on -O atime=off -O acltype=posixacl -O compression=zstd-5 -O dnodesize=auto -O normalization=formD -O recordsize=1m -O relatime=on -O utf8only=on -O xattr=sa -m /data/nvme0 nvme0 DEVICE_ID

# Create ZFS datasets for containers
zfs create nvme0/cts0

# Create ZFS dataset for VMs - set recordsize to 64k; the zvols themselves use volblocksize, set via --blocksize below
zfs create -o recordsize=64k nvme0/vms0

# add both container and VM storage to Proxmox - blocksize sets the volblocksize for new zvols (containers use plain datasets, so it only applies to vms0)
pvesm add zfspool cts0 --blocksize 1M --content rootdir --pool nvme0/cts0
pvesm add zfspool vms0 --blocksize 64k --content images --pool nvme0/vms0

#############################################
### HDD storage for archived media, e.g. pics, movies, etc.

# create ZFS raidz pool on 4 HDDs
zpool create -o ashift=12 -O atime=off -O acltype=posixacl -O compression=zstd-5 -O dnodesize=auto -O normalization=formD -O recordsize=1m -O relatime=on -O utf8only=on -O xattr=sa -m /data/archive archive raidz DEVICE_ID1 DEVICE_ID2 DEVICE_ID3 DEVICE_ID4

# create ZFS dataset for documents with smaller recordsize
zfs create -o recordsize=256k archive/documents

# create ZFS datasets for movies
zfs create archive/movies

# create ZFS dataset for pictures
zfs create archive/pictures

# no Proxmox storage type needed as I will export these over NFS where needed for Linux / Windows / macOS
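
# (sketch only) possible NFS exports via the sharenfs property - assumes nfs-kernel-server is
# installed on the host and 192.168.1.0/24 is a placeholder for the trusted subnet
zfs set sharenfs="rw=@192.168.1.0/24,no_root_squash" archive/documents
zfs set sharenfs="rw=@192.168.1.0/24,no_root_squash" archive/movies
zfs set sharenfs="rw=@192.168.1.0/24,no_root_squash" archive/pictures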
 

Stephan

Well-Known Member
Apr 21, 2017
Germany
ashift=12 is ok for spinning rust, but check the page size of your NVMe drives; sometimes they are 8k (ashift=13)

Do you really need ZVOLs or would QCOW2 files work ok? Consider ZVOL vs QCOW2 with KVM – JRS Systems: the blog and try to make hardware page size = zfs record size = qcow2 clustersize for amazing speedups.
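
For example, roughly (file name and size are just placeholders; on Proxmox this implies a directory storage rather than a zfspool):
Code:
# create a qcow2 image whose cluster size matches a 64k-recordsize dataset
qemu-img create -f qcow2 -o cluster_size=64k vm-100-disk-0.qcow2 32G
# confirm the cluster size
qemu-img info vm-100-disk-0.qcow2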

Consider if aclinherit=passthrough makes sense for you.
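
On the archive pool that could be as simple as (dataset name taken from the first post):
Code:
zfs set aclinherit=passthrough archive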

Run the zed daemon if Proxmox has it off by default, and test the email function, e.g. by following https://www.reddit.com/r/zfs/comments/fb8utq/_/fj5b9ks
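
The relevant knobs live in zed.rc, roughly like this (address and values are only examples):
Code:
# /etc/zfs/zed.d/zed.rc
ZED_EMAIL_ADDR="root"            # where notifications go; needs working local mail delivery
ZED_NOTIFY_INTERVAL_SECS=3600    # rate-limit repeated notifications
ZED_NOTIFY_VERBOSE=1             # also mail on clean scrubs, handy for testing the setup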
A ZFS scrub should be run once monthly if proxmox has no default. SMART short test every day makes sense for the HDDs. Make sure some sort of smartd is running to alert you of any problems.
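
For smartd, something along these lines in /etc/smartd.conf would cover it (schedule, address and runner path are examples; the runner path is Debian-specific):
Code:
# monitor all drives, short self-test daily at 02:00, long test on the 1st at 03:00, mail on trouble
DEVICESCAN -a -o on -S on -s (S/../.././02|L/../01/./03) -m root -M exec /usr/share/smartmontools/smartd-runner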

For auto-snapshots, again if Proxmox does not have it, I personally like GitHub - yboetz/pyznap: ZFS snapshot tool written in python. Even a simple "keep three previous dailies" will be nice in case an rm -rf goes berserk.
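
A pyznap config for that looks roughly like this, going by its README (retention numbers are arbitrary; "pyznap snap" then runs from cron):
Code:
# /etc/pyznap/pyznap.conf
[archive]
daily = 3
weekly = 1
snap = yes
clean = yes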

If you can, don't be like LTT (again) and lose data; employ backups and verify them.
 

WhosTheBosch

Member
Dec 20, 2016
ashift=12 is ok for spinning rust, but check the page size of your NVMe drives; sometimes they are 8k (ashift=13)
Good point. I checked and they're all 4k, and I set the logical sectors to 4k as well:
Code:
root@pve:/data# fdisk -l /dev/nvme0n1
Disk /dev/nvme0n1: 1.75 TiB, 1920383410176 bytes, 468843606 sectors
Disk model: Seagate IronWolf510 ZP1920NM30001-2S9303
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
root@pve:/data#
root@pve:/data# fdisk -l /dev/nvme1n1
Disk /dev/nvme1n1: 1.82 TiB, 2000398934016 bytes, 488378646 sectors
Disk model: INTEL SSDPE2KX020T8
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
root@pve:/data#
root@pve:/data# fdisk -l /dev/nvme2n1
Disk /dev/nvme2n1: 3.64 TiB, 4000787030016 bytes, 976754646 sectors
Disk model: INTEL SSDPE2KX040T8
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
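
Roughly, the sector-size check and switch can be done with nvme-cli like this (the --lbaf index varies per drive, and a format wipes the namespace):
Code:
# list the LBA formats the drive supports and which one is in use
nvme id-ns /dev/nvme0n1 -H | grep "LBA Format"
# reformat the namespace to the 4k format (index 1 is only an example; this erases all data!)
nvme format /dev/nvme0n1 --lbaf=1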

Do you really need ZVOLs or would QCOW2 files work ok? Consider ZVOL vs QCOW2 with KVM – JRS Systems: the blog and try to make hardware page size = zfs record size = qcow2 clustersize for amazing speedups.
This is one area I'm still confused about - mainly how zvols work. From what I remember, Proxmox uses zvols by default and that is what it expects to be used. I'm not sure whether switching to qcow2 would cause any problems. There was also limited discussion about that article on the Proxmox forums. [0] I plan to do a follow-up post with speed testing.

[0] How to adjust the qcow2 cluster size of existing images to drastically improve I/O performance?

Consider if aclinherit=passthrough makes sense for you.
Interesting, thanks. I'll read up on this. I don't think it will be useful for the VM and CT datasets, but on the HDD datasets it might prove useful. I still have to determine what identity system I will run for permissions etc. I'm also still not sure about the specifics of getting NFS working properly [1][2].

[1] NFS/POSIX ACL support · Issue #170 · openzfs/zfs
[2] Implement NFS4 ACL support · Issue #4966 · openzfs/zfs

Run the zed daemon if Proxmox has it off by default, and test the email function, e.g. by following https://www.reddit.com/r/zfs/comments/fb8utq/_/fj5b9ks
Cool, thanks - I'll add that to my to-do list.

A ZFS scrub should be run once monthly if proxmox has no default. SMART short test every day makes sense for the HDDs. Make sure some sort of smartd is running to alert you of any problems.
Proxmox by default runs a TRIM on the first Sunday of the month and a scrub on the second Sunday, i.e. once a month each. I might increase those frequencies after testing.
Code:
cat /etc/cron.d/zfsutils-linux
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

# TRIM the first Sunday of every month.
24 0 1-7 * * root if [ $(date +\%w) -eq 0 ] && [ -x /usr/lib/zfs-linux/trim ]; then /usr/lib/zfs-linux/trim; fi

# Scrub the second Sunday of every month.
24 0 8-14 * * root if [ $(date +\%w) -eq 0 ] && [ -x /usr/lib/zfs-linux/scrub ]; then /usr/lib/zfs-linux/scrub; fi

For auto-snapshots, again if Proxmox does not have it, I personally like GitHub - yboetz/pyznap: ZFS snapshot tool written in python. Even a simple "keep three previous dailies" will be nice in case an rm -rf goes berserk.
Thanks, I haven't experimented with ZFS snapshots yet so I will look into this.
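
In the meantime the bare-bones manual version is simple enough (dataset and snapshot names are just placeholders):
Code:
# take a snapshot, list snapshots, and roll back if something goes wrong
zfs snapshot archive/documents@manual-test
zfs list -t snapshot -r archive
zfs rollback archive/documents@manual-test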

If you can, don't be like LTT (again) and lose data; employ backups and verify them.
Haha, yes, my goal is to learn from other people's mistakes!