Intel NUC single SSD - best filesystem for Proxmox

leonroy

Member
Oct 6, 2015
64
9
8
39
Just picked up an Intel Coffee Lake NUC. I've ordered a single M.2 NVMe SSD (1TB Samsung 970 Evo Plus).

Wanted to run a few test VMs at home on it, nothing critical.

Would ZFS be the best filesystem to use or would XFS/ext4 be better?
 

fossxplorer

Active Member
Mar 17, 2016
483
76
28
Oslo, Norway
Congrats on the purchases :)
I won't go into details, but in short ZFS is a far superior volume manager and filesystem compared to the others you list. The ability to quickly take snapshots, do compression (which can be quite handy for some types of data, e.g. VMs) and checksum data (not just metadata) are just some of the benefits of ZFS.
You do need to be aware of some issues with having ZFS with Proxmox on the boot disk using UEFI, though. There is a good write-up here in the forums you'll find to fix that :)

EDIT1: I'd say if you can overcome the initial challenge of ZFS/UEFI/PVE, you'll enjoy ZFS and learn a lot along the road, unless ZFS is familiar to you already :)
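To give a flavour of those benefits, the day-to-day operations are one-liners. (Pool and dataset names below are made-up examples, not from any particular setup.)

```shell
# Instant snapshot of a VM disk dataset before risky changes
zfs snapshot rpool/data/vm-100-disk-0@before-upgrade

# Enable lz4 compression pool-wide (cheap, often a net win for VM images)
zfs set compression=lz4 rpool

# Verify checksums of all data (not just metadata) in the background
zpool scrub rpool

# See what snapshots exist
zfs list -t snapshot
```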
 

leonroy

Member
Oct 6, 2015
64
9
8
39
Thanks @fossxplorer, good advice - looking forward to playing with the NUC. I'm retiring a Nehalem Xeon-based ESXi server. I can hardly believe how much processing power a NUC contains.

I'm messing around with Proxmox inside a VM at the moment and noticed it has RAID1 and RAID0 available for a single disk.

Don't suppose you know whether it makes a difference which one I pick?

[Attached screenshots: Proxmox installer disk setup options]
 

fossxplorer

Active Member
Mar 17, 2016
483
76
28
Oslo, Norway
I suppose we are talking about single-disk RAID. I haven't even tried RAID1 with a single disk, so I'm not sure whether that works at all on a single disk device.
With RAID0 you get the option to choose a missing device as the second disk :)
 

vl1969

Active Member
Feb 5, 2014
611
68
28
I do not believe you can do a single-disk RAID1 unless you partition the disk in two and use the partitions in RAID.
However, it makes no sense to do so: you get no benefit from a mirror on a single physical device. If you are just playing around testing setups and such, OK, but for a working setup this is not a good option to use. IMHO.
 

leonroy

Member
Oct 6, 2015
64
9
8
39
Thanks all, there will be nightly backups to a Synology. It’s a home server so nothing critical.

The disk will be a single 1TB SSD. It will only have about 100-200GB of containers on it.

Do you know if ZFS on Linux’s lack of support for TRIM will be a problem?
 

arglebargle

H̸̖̅ȩ̸̐l̷̦͋l̴̰̈ỏ̶̱ ̸̢͋W̵͖̌ò̴͚r̴͇̀l̵̼͗d̷͕̈
Jul 15, 2018
656
233
43
Lack of TRIM shouldn't be a huge issue in the medium term. It's pretty likely that you'll be able to flip the TRIM support bit on that pool within the next year and a half (ZoL 0.8.0 is in the pre-release stage now and includes TRIM), and I don't see you writing enough data to it in that time to trash the drive. If you wanted to reduce wear you could size the filesystem so that there's about 20% free space at the end of the drive; the firmware will then use that space for garbage collection. It looks like you're already doing that by setting hdsize=80. IMO that's a fine mitigation until TRIM is supported.
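As a back-of-envelope sketch of that sizing idea (the figures below assume a drive advertising 1000 decimal GB and a 20% reserve; they're illustrative, not from this thread):

```shell
# Leave ~20% of the drive unpartitioned so the SSD firmware can use
# that space for garbage collection in the absence of TRIM.
DISK_GB=1000
TARGET_FREE_PCT=20
HDSIZE=$(( DISK_GB * (100 - TARGET_FREE_PCT) / 100 ))
echo "hdsize=${HDSIZE}"   # pass this as the installer's hdsize (in GB)
```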

You might, and I say might because I don't know if this is accurate, want to use ashift=13 on your Samsung drive. I think their 3D V-NAND uses 8KB pages, but I need someone more knowledgeable to confirm. I have a stack of 850 Pros over here and I've been researching this for the last week or so.
 

leonroy

Member
Oct 6, 2015
64
9
8
39
Heheh thanks @arglebargle, going from Intel S3700 drives for SLOG on a dedicated enterprise grade Supermicro server with a RAID 10 pool to this little NUC thing is definitely a learning experience.

Seems a single-drive ZFS 'pool' should ideally have the property `copies=2` set?

Some interesting analysis on the subject here suggests that whilst it's nowhere near as good as two separate devices, a single device with `copies=2` improves recovery from corruption enough to be worth doing for important data:
testing the resiliency of zfs set copies=n
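One practical cost worth keeping in mind: `copies=2` writes every block twice, so usable space is roughly halved. Back-of-envelope (my numbers, not from the linked article):

```shell
# copies=2 stores each block twice on the same vdev, so a 1TB pool
# holds roughly 500GB of unique data (ignoring metadata overhead).
POOL_GB=1000
COPIES=2
USABLE_GB=$(( POOL_GB / COPIES ))
echo "usable ~= ${USABLE_GB} GB"
```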

Here's an interesting post by an Oracle guy with knowledge of their ZFS appliances:
ZFS caching with "consumer" ssds? : zfs

The 970 Evo is MLC V-NAND according to this though...
https://s3.ap-northeast-2.amazonaws...msung_NVMe_SSD_970_EVO_Data_Sheet_Rev.1.0.pdf

And TLC according to this...?
The Samsung 970 EVO Plus (250GB, 1TB) NVMe SSD Review: 96-Layer 3D NAND

Is there any downside to an ashift of 13 instead of 12 if the drive in fact has a 4096-byte page size?
 

arglebargle

H̸̖̅ȩ̸̐l̷̦͋l̴̰̈ỏ̶̱ ̸̢͋W̵͖̌ò̴͚r̴͇̀l̵̼͗d̷͕̈
Jul 15, 2018
656
233
43
The only downside is increasing slack space:
Used ashift=13 instead of ashift=12 for a pool. How much am I sacrificing with this? : zfs
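A quick way to see the slack cost: every allocation rounds up to a multiple of 2^ashift bytes, so small blocks waste more at ashift=13. Illustrative arithmetic only (a 1000-byte block is just an example):

```shell
# Worst-case slack for a small 1000-byte block at each ashift:
# allocations round up to the next multiple of 2^ashift bytes.
BLOCK=1000
for ASHIFT in 12 13; do
  SECTOR=$(( 1 << ASHIFT ))
  ALLOC=$(( (BLOCK + SECTOR - 1) / SECTOR * SECTOR ))
  echo "ashift=${ASHIFT}: allocated=${ALLOC}B slack=$(( ALLOC - BLOCK ))B"
done
```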

You could use copies=2 if you wanted some level of data redundancy on a single drive. Personally I just back up stuff from single drive systems by sending snapshots to my NAS, which is then backed up to another zpool once every so often.
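Roughly what that looks like in practice (pool, dataset and host names below are placeholders, not my actual setup):

```shell
# copies=2 on just the datasets you care about; it only affects
# data written after the property is set.
zfs set copies=2 tank/important

# Nightly backup: snapshot, then send the delta to the NAS.
zfs snapshot tank/important@nightly-2019-03-18
zfs send -i tank/important@nightly-2019-03-17 tank/important@nightly-2019-03-18 \
  | ssh backup-nas zfs receive backup/important
```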

I'm honestly not sure about the page sizing on your drives (or mine), but the ZoL developers have hardcoded the sector size for a number of drives that lie about it, and that includes most Samsung consumer SSDs: The zfs-discuss June 2014 Archive by thread

That list hasn't been updated for Samsung drives in ages though; I don't see the 850, *60 or *70 series mentioned at all. I'm building my pools with ashift=13 to be conservative: at worst it's a small fraction of wasted slack space, and at best (if the pages are 8K) performance should be quite a bit better.

edit: Searching reddit for info on ashift and Samsung actually turned up a decent discussion thread that I posted in a month ago and had forgotten about: anyone know what the physical sector size / ashift value for a nvme disk (samsung 970 pro) should be : zfs

tl;dr - results are inconclusive on the 970 pro nvme, advice is: try them both and run benchmarks or just use ashift=13.
 

efschu2

Member
Feb 14, 2019
68
12
8
I took your suggestion and did some research this weekend. Conclusion: ZoL has much better performance on CentOS, but is still slow.

For comparison, same hardware, default settings, ZoL 0.7.9, benchmarked with pg_test_fsync:

Ubuntu 2200 iops
Debian 2000 iops
CentOS 8000 iops

FreeBSD 16000 iops

Ubuntu + XFS 34000 iops
Ubuntu + EXT4 32000 iops
Ubuntu + BcacheFS 14000 iops
For a single disk I would recommend EXT4 or XFS, or at least use a patched (non-buggy) version of ZFS (https://github.com/zfsonlinux/zfs/issues/7834). This bug should be gone in 0.7.11+ and Proxmox is using zfs-0.7.13, but I would recommend running a few I/O tests to compare and decide what is best for you.
 

leonroy

Member
Oct 6, 2015
64
9
8
39
heheh @efschu2 all this faffing around with ZFS does make a simple ext4 volume seem the obvious and simplest choice, especially if using it for a homelab.

I'm a tinkerer so I'll test all the scenarios. I'll set up Proxmox to boot off an SSD in the SATA bay, which will let me mess around with different options on the NVMe disk without having to rebuild the system each time.

Bits and pieces arrive today so hopefully have an update shortly.

When I first messed around with ZFS (I believe it was FreeNAS 8 back in 2013!) I recall having to align the start of the partition for my SLOG SSD using `-b 2048`, like so:

gpart add -t freebsd-zfs -b 2048 -a 4k -l log0 -s 8G ada0

@arglebargle do you know if SSDs still need to be aligned in gpart before usage?
 

arglebargle

H̸̖̅ȩ̸̐l̷̦͋l̴̰̈ỏ̶̱ ̸̢͋W̵͖̌ò̴͚r̴͇̀l̵̼͗d̷͕̈
Jul 15, 2018
656
233
43
Most OS installers just take care of alignment for you. The only time I've thought about it in the last couple of years was when manually laying out partitions on a drive and then I just aligned to the nearest 1MB boundary. I just let PVE take care of it for me when I installed my little Proxmox nodes.
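For what it's worth, the old `-b 2048` start sector and the 1MB boundary are the same thing at 512-byte LBAs, so either way you end up aligned for any plausible flash page size:

```shell
# Start sector 2048 at 512-byte LBAs = 1 MiB, which divides evenly
# by 4KiB and 8KiB pages alike.
START_LBA=2048
OFFSET=$(( START_LBA * 512 ))
echo "offset=${OFFSET} bytes"
echo "mod 4096: $(( OFFSET % 4096 )), mod 8192: $(( OFFSET % 8192 ))"
```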
 

leonroy

Member
Oct 6, 2015
64
9
8
39
So...

I did some testing using a Samsung EVO 250GB SSD.

First up:

ext4 - created using default partition tools on install:
  • read, MiB/s: 7.11
  • written, MiB/s: 4.74
zfs - ashift 9 - created using `zpool create evo /dev/sda`:
  • read, MiB/s: 4.08
  • written, MiB/s: 2.72


zfs - ashift 13 - created using `zpool create -o ashift=13 evo /dev/sda`:
  • read, MiB/s: 3.84
  • written, MiB/s: 2.56
Ran the tests several times and the results are consistent.

Any ideas why performance is lower using ashift 13 vs ashift 9?
 

arglebargle

H̸̖̅ȩ̸̐l̷̦͋l̴̰̈ỏ̶̱ ̸̢͋W̵͖̌ò̴͚r̴͇̀l̵̼͗d̷͕̈
Jul 15, 2018
656
233
43
I'd ask over on Reddit at /r/zfs.

You're probably hitting the cache in RAM though, instead of the disk. We don't have O_DIRECT support in ZFS until 0.8.0, so it's difficult to bench accurately without running extremely large tests.
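If you do want to re-run the test before 0.8.0, one workaround is a working set much larger than RAM so the ARC can't absorb it. A rough fio sketch (mount point and sizes are examples; tune --size to well over your RAM):

```shell
# Random-write test with a file much larger than RAM, fsync after
# every write so results are comparable to pg_test_fsync.
fio --name=arc-buster --filename=/evo/fio.test \
    --rw=randwrite --bs=4k --size=64G \
    --ioengine=psync --fsync=1 --runtime=120 --time_based
```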