TL;DR: Given a system where CPU may be a constraining factor, should I expect better performance from mirrored vdevs or from a 3-disk raidz1?
---
After running a smaller glusterfs cluster for some time, I am now upgrading the hardware and, after a lot of reading, finally feel ready to do my first ZFS setup.
The system will consist of 3 glusterfs servers (replica 2 arbiter 1), with the following specs on each:
- Ryzen 3600
- 128 GB RAM
- 8 SATA ports
- 1 M.2 PCIe slot (to be populated by a 2TB NVMe drive, used as both system drive and L2ARC/SLOG/special)
- 1 PCIe Gen4 x16 slot (unused for now)
- 2 10GbE NICs
- Debian 11
My plan is to have 2 glusterfs volumes and this is what I'm thinking for the 2 storage servers (in glusterfs the arbiter does not hold actual data):
media (larger files, mostly torrents and media for streaming; reads mostly sequential):
- 2-3 3.5" mechanical 8-12TB Toshiba MN drives (should be non-SMR)
- L2ARC/SLOG/special: ~700GB partitions on the NVMe drive
data (everything else for hosted services: docker images, logs, config, databases, git repos; varied loads with lots of random access):
- 4-6 2.5" 2TB SSDs (currently deciding between WD Blue, Seagate Barracuda 120 and Crucial MX500, assuming I still want drives with their own DRAM cache)
- L2ARC/SLOG/special: ~700GB partitions on the NVMe drive
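To make the layouts above concrete, here is a minimal sketch that only builds (does not run) the `zpool create` command line for either candidate layout. The vdev keywords `raidz1`, `mirror`, `special`, `log` and `cache` are standard `zpool create` syntax; everything else is an assumption: the device paths are hypothetical, `ashift=12` assumes 4K-sector drives, and `compression=lz4` is just a default for an option the post calls optional. Note that one partition can back only one vdev, so "L2ARC/SLOG/special" on the NVMe drive means three separate partitions per pool. Because the special partition has no redundancy, `zpool` will likely warn about a mismatched replication level and may need `-f`.

```python
# Sketch: build the argv for `zpool create`; nothing is executed here.
# Device paths passed in are hypothetical placeholders.
def zpool_create_argv(pool, layout, disks, log=None, cache=None, special=None,
                      ashift=12, compression="lz4"):
    if layout not in ("raidz1", "mirror"):
        raise ValueError("layout must be 'raidz1' or 'mirror'")
    argv = ["zpool", "create",
            "-o", f"ashift={ashift}",          # pool property (4K sectors assumed)
            "-O", f"compression={compression}",  # dataset property on the root
            pool]
    if layout == "raidz1":
        argv += ["raidz1", *disks]            # one raidz1 vdev over all disks
    else:
        if len(disks) % 2:
            raise ValueError("mirror layout needs an even number of disks")
        # one two-way mirror vdev per pair; ZFS stripes writes across vdevs
        for i in range(0, len(disks), 2):
            argv += ["mirror", disks[i], disks[i + 1]]
    if special:
        argv += ["special", special]          # metadata/small blocks, unmirrored here
    if log:
        argv += ["log", log]                  # SLOG
    if cache:
        argv += ["cache", cache]              # L2ARC
    return argv


if __name__ == "__main__":
    hdds = [f"/dev/disk/by-id/HDD-{i}" for i in (1, 2, 3)]  # hypothetical IDs
    print(" ".join(zpool_create_argv(
        "media", "raidz1", hdds,
        special="/dev/nvme0n1p3", log="/dev/nvme0n1p4", cache="/dev/nvme0n1p5")))
```

Swapping `"raidz1"` for `"mirror"` with four SSDs gives the "two striped mirrors" variant of the data pool.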
I am aware that the NVMe drive may become a bottleneck and that it's advisable to put the "special" vdev on redundant storage. If that turns out to be an issue, I could still plug a 2xNVMe card into each storage server's PCIe slot and add more NVMe drives at a later stage.
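A back-of-envelope budget for the 2TB NVMe drive, as a rough sketch. The SLOG estimate uses a common rule of thumb rather than any official formula: the SLOG only has to hold synchronous writes for transaction groups that are not yet committed, and OpenZFS commits a txg at least every `zfs_txg_timeout` seconds (5 by default). The assumptions that both 10GbE links are saturated with sync writes and that two txgs are in flight are mine, chosen to be pessimistic.

```python
# Rough NVMe budget; all sizes in decimal GB.
def slog_size_gb(link_gbit_per_s, txg_timeout_s=5, txgs_in_flight=2):
    """Sync writes that could arrive while txgs are outstanding (rule of thumb)."""
    bytes_per_s_gb = link_gbit_per_s / 8        # Gbit/s -> GB/s
    return bytes_per_s_gb * txg_timeout_s * txgs_in_flight


def nvme_left_for_system_gb(total_gb, per_pool_gb, pools):
    """Space remaining on the drive after carving out per-pool partitions."""
    return total_gb - per_pool_gb * pools


if __name__ == "__main__":
    print(f"SLOG upper bound at 2x10GbE: ~{slog_size_gb(20):.1f} GB")
    print(f"left for the OS: {nvme_left_for_system_gb(2000, 700, 2)} GB")
```

Under these assumptions the SLOG share of each ~700GB partition would be small, leaving most of it for L2ARC and the special vdev.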
Now, I am having trouble settling on the number of disks and vdev layouts before purchasing.
- For the media volume, should I go with a 3x8TB raidz1 or a 2x12TB mirror, assuming I will have a single-vdev zpool?
- For the data volume, two striped mirror vdevs (2x2TB each), or a single raidz vdev?
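The capacity side of these two questions is simple arithmetic, sketched below. It is approximate: it ignores raidz padding and allocation overhead, metadata, ZFS's reserved slop space and the TB/TiB difference. The vdev count is included because a widely quoted rule of thumb says random IOPS scale roughly with the number of top-level vdevs (a raidz vdev behaves roughly like one member disk for random I/O), while sequential throughput depends more on the number of data disks. Treat that as a heuristic, not a measurement.

```python
# Approximate usable capacity and top-level vdev count per candidate layout.
def usable_tb(layout, disk_tb, n_disks, mirror_width=2):
    if layout == "raidz1":
        if n_disks < 2:
            raise ValueError("raidz1 needs at least 2 disks")
        return (n_disks - 1) * disk_tb, 1        # one disk's worth of parity
    if layout == "mirror":
        vdevs = n_disks // mirror_width          # each mirror stores one copy's worth
        return vdevs * disk_tb, vdevs
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    candidates = [
        ("media", "raidz1", 8, 3),
        ("media", "mirror", 12, 2),
        ("data", "mirror", 2, 4),
        ("data", "raidz1", 2, 4),
        ("data", "raidz1", 2, 6),
    ]
    for vol, layout, size, n in candidates:
        tb, vdevs = usable_tb(layout, size, n)
        print(f"{vol}: {n}x{size}TB {layout} -> ~{tb}TB usable, {vdevs} vdev(s)")
```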
Since I will have offsite backups and glusterfs provides additional redundancy, I am satisfied with any safety above zero, so the failure tradeoffs between the alternatives are not really a concern.
I have seen conflicting information on the performance characteristics of raidz1 versus mirroring. The obvious answer is that mirroring is less CPU-intensive because it does not have to calculate parity, but how much weight should that really carry? Raidz1 feels like the sweet spot for storage utilization on the data vdev, and mirroring (2x12TB) for media, but will I pay a significant performance tax for either? This benchmark in particular confuses me: the author gets significantly better performance from raidz1 than from mirroring.
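For a sense of what the parity calculation actually is: raidz1's single parity column is a byte-wise XOR across the data columns (raidz2/raidz3 add further Galois-field syndromes, which cost more). The toy sketch below shows the XOR parity and how one lost column is rebuilt from the survivors. It ignores everything else real ZFS does (variable stripe width, checksums, SIMD-optimized implementations), so it illustrates the technique only, not ZFS's code. Mirroring skips this step but writes every byte to each disk in the mirror, which costs disk and bus bandwidth rather than CPU.

```python
from functools import reduce


def xor_parity(columns):
    """RAID-Z1-style parity: byte-wise XOR of equal-length data columns."""
    if len({len(c) for c in columns}) != 1:
        raise ValueError("columns must have equal length")
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*columns))


def reconstruct(surviving, parity):
    """Recover the single missing column: XOR of parity and all survivors."""
    return xor_parity([*surviving, parity])


if __name__ == "__main__":
    data = [b"disk", b"one!", b"two?"]          # three equal-length data columns
    p = xor_parity(data)
    assert reconstruct([data[0], data[2]], p) == data[1]
```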
The Ryzen 3600 is a 6-core, 12-thread CPU and these servers will be dedicated to storage, but considering the overhead of glusterfs, encryption (recently in stable ZFS on Linux), L2ARC/SLOG, scrubbing and (optionally, if it can be afforded) compression, I am unsure how to reason about it.
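One crude way to reason about the parity part of the CPU budget is to compare single-core XOR throughput with the 10GbE line rate (10 Gbit/s = 1.25 GB/s before protocol overhead). The sketch below times Python big-int XOR, which runs in C but is not vectorized like ZFS's raidz code, so it is at best a pessimistic ballpark. As far as I know, OpenZFS on Linux also benchmarks its own raidz implementations at module load and exposes the results under `/proc/spl/kstat/zfs/vdev_raidz_bench`, which would be the better number to check on the actual hardware.

```python
import os
import time

LINE_RATE_GB_S = 10 / 8  # one 10GbE link in GB/s, ignoring protocol overhead


def xor_gb_per_s(size_mb=64, rounds=5):
    """Crude single-core XOR throughput using Python big-int XOR."""
    n = size_mb * 1024 * 1024
    a = int.from_bytes(os.urandom(n), "little")
    b = int.from_bytes(os.urandom(n), "little")
    start = time.perf_counter()
    for _ in range(rounds):
        _ = a ^ b                                 # one full-buffer XOR pass
    elapsed = max(time.perf_counter() - start, 1e-9)
    return (n * rounds) / elapsed / 1e9          # bytes/s -> decimal GB/s


if __name__ == "__main__":
    rate = xor_gb_per_s()
    print(f"~{rate:.2f} GB/s per core vs {LINE_RATE_GB_S:.2f} GB/s per 10GbE link")
```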