Seeking RAID Opinions for a ZFS Build


nemeas (New Member, Belgium)

Hi,

I'm building an all-in-one (ESXi + OI), as many before me have. The system will have a Xeon E5-2620, 64GB RAM, and three IBM M1015s connecting sixteen 3TB WD Red disks for data, plus two Plextor M3P 256GB SSDs for L2ARC. I'm still considering an SSD for the ZIL (Samsung 830 64GB).

It is designed to be a storage server for Blurays (TV series & movies) and lots of VMs. VM performance in particular needs to be top-notch, as I'm developing BI solutions that require high I/O and low latency. Hence the choice of lots of RAM (I'm thinking of giving OI at least 32GB) and L2ARC, in the hope of getting the working set into the ARC/L2ARC and letting the VMs running SQL feed off that.

The part I'm still pondering is the RAID setup. The table below summarizes the options I've worked through:

Raid Level     | Usable Storage | Data Disks | Parity / Mirror Disks | Hot Spares
16 disk 1+0    | 21.8TB         | 8          | 8                     | 0
2x 8 disk Z3   | 27.2TB         | 10         | 6                     | 0
3x 5 disk Z1   | 32.7TB         | 12         | 3                     | 1
2x 8 disk Z2   | 32.7TB         | 12         | 4                     | 0
1x 16 disk Z3  | 35.5TB         | 13         | 3                     | 0
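
For reference, the usable-storage column is just data disks x 3TB per drive, converted to TiB. A quick Python sketch of how I derived it (my rounding may differ from the table by a tenth here or there):

```python
# Reproduce the usable-storage column: 3 TB (decimal) drives,
# reported in TiB. Hot spares don't count as data disks.
DRIVE_BYTES = 3e12  # 3 TB per drive

layouts = {
    "16 disk 1+0":   8,   # 8 mirrored pairs -> 8 data disks
    "2x 8 disk Z3":  10,  # 2 vdevs x (8 - 3 parity)
    "3x 5 disk Z1":  12,  # 3 vdevs x (5 - 1 parity), plus 1 hot spare
    "2x 8 disk Z2":  12,  # 2 vdevs x (8 - 2 parity)
    "1x 16 disk Z3": 13,  # 16 - 3 parity
}

for name, data_disks in layouts.items():
    usable_tib = data_disks * DRIVE_BYTES / 2**40
    print(f"{name:14s} {usable_tib:5.1f} TiB ({data_disks} data disks)")
```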

As you can tell, this is where the difficult trade-off between storage and performance comes in. It's a home/lab machine, which is why the RAID-Z options are on the table. Personally I'm hesitating between 3x RAID-Z1 and 2x RAID-Z2, which balance storage vs. performance vs. data integrity nicely.

But what are your thoughts? All input is appreciated!
 

sotech (Member, Australia)

As a point: your disk counts are sub-optimal for ZFS. There's some discussion of that over at [H], and sub.mesa wrote (either here or on [H]; I made a note of it but don't remember where from):


As I understand it, the performance issue with 4K disks isn't just partition alignment, but also RAID-Z's variable stripe size.
RAID-Z basically spreads the 128KiB recordsize across its data disks. That leads to a formula like:
128KiB / (nr_of_drives - parity_drives) = maximum (default) variable stripe size
Let’s do some examples:
3-disk RAID-Z = 128KiB / 2 = 64KiB = good
4-disk RAID-Z = 128KiB / 3 = ~43KiB = BAD!
5-disk RAID-Z = 128KiB / 4 = 32KiB = good
9-disk RAID-Z = 128KiB / 8 = 16KiB = good
4-disk RAID-Z2 = 128KiB / 2 = 64KiB = good
5-disk RAID-Z2 = 128KiB / 3 = ~43KiB = BAD!
6-disk RAID-Z2 = 128KiB / 4 = 32KiB = good
10-disk RAID-Z2 = 128KiB / 8 = 16KiB = good
Add one to the raidz2 numbers to get the ideal raidz3 figures. I don't have hard numbers on how much of a performance hit the sub-optimal counts cause, as I've always stuck to the good ones, but if you're chasing maximum performance I would keep it in mind.
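
For anyone who wants to test other layouts against that rule, here's a small Python sketch of the check as I read it (the assumption being 4KiB physical sectors and the default 128KiB recordsize):

```python
# sub.mesa's rule of thumb: the default 128 KiB record is split across
# the data disks of a raidz vdev, and you want each disk's share to be
# a whole multiple of the 4 KiB sector size.
RECORD_KIB = 128
SECTOR_KIB = 4

def stripe_ok(disks: int, parity: int) -> bool:
    data = disks - parity
    share = RECORD_KIB / data
    return share == int(share) and share % SECTOR_KIB == 0

for disks, parity in [(3, 1), (4, 1), (5, 1), (9, 1),
                      (4, 2), (5, 2), (6, 2), (10, 2)]:
    data = disks - parity
    verdict = "good" if stripe_ok(disks, parity) else "BAD"
    print(f"{disks}-disk raidz{parity}: 128KiB/{data} = "
          f"{RECORD_KIB / data:.1f} KiB -> {verdict}")
```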

Last week I had two HDDs die within 2.5 hours of each other. We use raidz2 as standard, so our arrays can handle that, whereas raidz1 would have meant going to the offsite backups. Given that drives in the same array will probably have similar usage patterns, and will probably be the same make/model and perhaps even batch, I feel those factors add up to make multiple drive failures in a short period more likely. I find raidz2 a sweet spot personally, and we have yet to go to offsite backups despite having ~8 HDD failures in the past 18 months (out of 24 drives).


How big are the VMs going to be? Or, perhaps, how small CAN they be? 2- or 3-way mirrors of SSDs are utterly blistering compared to spinning disks; our VM arrays are almost all made up of 2- or 3-way SSD mirrors, and there's a huge performance difference between the VMs on those and the VMs on the raidz2 HDD arrays.
 

apnar (Member)

I have a similar all-in-one. I ended up using four SSDs in RAID 10 off an M1015 for my ESXi install, and I keep most of my high-IO VMs there. I also have an NFS mount back to my storage VM for VMs that need more storage space but aren't as IO-intensive. I lose the ZFS features for that VM storage, but with the RAID 1 and ESXi snapshots I'm reasonably covered.

As for the sub-optimal layouts sotech references: I remember that thread well, but I never could find the one I remember with the graphs. Anyway, the way I look at it, there's a bump in performance if you hit the divisor correctly, but there's also a bump from adding a spindle, and I recall the two were reasonably close in size. In your situation I think I'd go with 2x 8-disk Z2. If you want to align, grab four more disks and do 2x 10-disk Z2.

It sounds like you have two completely different use cases, Bluray rips (huge, slow, one-time reads that get no advantage from cache) and VMs (lots of small IO), and you're trying to knock them both out with one solution. I suggest splitting the problem up: use your large pool for Bluray with no L2ARC or ZIL, and have a smaller, fast pool of SSDs for the VMs, either on your storage VM or direct to ESXi (again no need for L2ARC or ZIL, since it's all SSD already).
 

dba (Moderator, San Francisco Bay Area, California, USA)

I've slowly moved to a model where my VMs are on SSDs. I have a few VMs that require significant storage, and for those I just add additional volumes on non-SSD. How about one fast ZFS pool made up of SSDs, plus one (or more) slower pools of spinning drives with a bit of SSD L2ARC/ZIL?
 

dswartz (Active Member)

If this is serving up data to VMs, I would go with a RAID 10 layout: with an 8x2 mirror pool, your random write IOPS will be about 8 spindles' worth, and your random read IOPS somewhere between 8 and 16 spindles. Any raidz* gives you roughly 1 spindle's worth of random read/write per vdev, and that will kill VM performance.
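
Back-of-the-envelope, the spindle math looks like this (a toy model, not a benchmark; the ~100 random IOPS per spindle is an assumption, and WD Reds will be slower):

```python
# Toy model of random IOPS per pool layout. Assumption: ~100 random
# IOPS per 7200rpm-class spindle; scale to taste for slower drives.
SPINDLE_IOPS = 100

def pool_random_iops(vdevs: int, kind: str) -> tuple[int, int]:
    """Rough (read, write) random IOPS for a pool of identical vdevs."""
    if kind == "mirror":   # 2-way mirrors: both sides can serve reads
        return (2 * vdevs * SPINDLE_IOPS, vdevs * SPINDLE_IOPS)
    elif kind == "raidz":  # any raidz level: ~1 spindle's worth per vdev
        return (vdevs * SPINDLE_IOPS, vdevs * SPINDLE_IOPS)
    raise ValueError(f"unknown vdev kind: {kind}")

print("8x2 mirrors  :", pool_random_iops(8, "mirror"))  # (1600, 800) at best
print("2x 8d raidz2 :", pool_random_iops(2, "raidz"))   # roughly (200, 200)
print("1x 16d raidz3:", pool_random_iops(1, "raidz"))   # roughly (100, 100)
```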
 

nemeas (New Member, Belgium)

After yesterday's great input I was starting to think in that direction as well, dswartz. Right now I'm thinking of a combination: an 11-disk RAIDZ3, which yields roughly 22TB of optimal "slow" storage for the movies/TV series, plus the other 4 disks in RAID 10, which yield 5.5TB of faster storage to which I can add SSD L2ARC for the VMs. That leaves one global hot spare to recover from a disk failure immediately.
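
The quick math behind those numbers, using the same 3TB-per-drive and TB-to-TiB assumptions as my table above:

```python
# Sanity-check the proposed split (3 TB drives, reported in TiB):
TIB = 2**40
DRIVE_BYTES = 3e12

z3_data = 11 - 3       # 11-disk RAIDZ3 -> 8 data disks (a "good" power of two)
mirror_data = 4 // 2   # 4 disks as striped mirrors -> 2 data disks

print(f"RAIDZ3 pool: {z3_data * DRIVE_BYTES / TIB:.1f} TiB")      # ~21.8, the "22TB"
print(f"RAID10 pool: {mirror_data * DRIVE_BYTES / TIB:.1f} TiB")  # ~5.5
print(f"Disks used : {11 + 4 + 1} of 16 (including the hot spare)")
```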

Still, the disks haven't been delivered yet, so I can still change my mind a trillion times :).
 

apnar (Member)

nemeas said:
After yesterday's great input I was starting to think in that direction as well, dswartz. Right now I'm thinking of a combination: an 11-disk RAIDZ3, which yields roughly 22TB of optimal "slow" storage for the movies/TV series, plus the other 4 disks in RAID 10, which yield 5.5TB of faster storage to which I can add SSD L2ARC for the VMs. That leaves one global hot spare to recover from a disk failure immediately.
I'm not sure that buys you much of anything. The issue with RAIDZx and IO is that you only really get about one spindle's worth of random IO out of the entire vdev. With your suggestion you'd have one spindle's worth of IO for the Z3 and two (cached) spindles' worth for the RAID 10. If you're only going to get two spindles' worth of IO out of it for the VMs, you might as well go with 2x 8-disk RAIDZ2 vdevs in one pool with caching. That would also give you two cached spindles of IO for your VMs (and everything else) while giving you better redundancy (you can lose any two disks and keep your VMs, whereas with the RAID 10 two failures could be very bad).
 

dswartz (Active Member)

That isn't right, though.

apnar said:
I'm not sure that buys you much of anything. The issue with RAIDZx and IO is that you only really get about one spindle's worth of random IO out of the entire vdev. With your suggestion you'd have one spindle's worth of IO for the Z3 and two (cached) spindles' worth for the RAID 10. If you're only going to get two spindles' worth of IO out of it for the VMs, you might as well go with 2x 8-disk RAIDZ2 vdevs in one pool with caching. That would also give you two cached spindles of IO for your VMs (and everything else) while giving you better redundancy (you can lose any two disks and keep your VMs, whereas with the RAID 10 two failures could be very bad).
Four drives in RAID 10 will give you closer to 4 spindles' worth for random reads, not 2.
 

apnar (Member)

Ah, true, I forgot about the read benefit on the mirrors. So it would be 4 spindles for reads and 2 for writes.