Hey all
I am in the process of upgrading the storage on my home file server, which runs Solaris 11.3 on an Ivy Bridge i5-3550 (H77 chipset) with 32GB of RAM. It uses a mixture of storage controllers: 8 ports from an LSI 9211-8i, 6 ports from the onboard H77 controller, and up to another 8 ports from two 4-port Marvell 88SE9215 controllers in PCIe x1 slots. (I would like to get a second LSI in the future, but I'd need to change my mobo first to make use of it - the H77 chipset only provides 4 lanes to the second PCIe x16 slot.)
I have 17 x 2TB + 1 x 3TB drives available, but my final array will be 16 x 2TB drives configured as 2 x 8-drive RAIDZ2, providing a total of 12 data drives and 4 parity.
My question relates to ashift: specifically, which ashift value I'm best off using for my two VDEVs, given that I unfortunately have a mix of sector sizes among my drives. I haven't been a professional sysadmin for several years and I'm rather out of touch with hardware, so I only really learnt about ashift, and the impact of Advanced Format drives on ZFS, in the last week - after I had already bought a bunch of refurbished drives to expand my original 9 x 2TB RAIDZ2 into the planned 16-drive config. (Ironically, it turned out the drives I already had - purchased new in 2011 - were 4k, but most of the extra drives I bought this week were 512b!)
Of the 18 drives I have available, eight are native 512b-sector drives, and the other ten are 4k drives that present 512b logical sectors ('512e').
However - and this part confuses me - of the ten 4k drives I have, eight result in an ashift of 9 when used in any VDEV, and two result in an ashift of 12. I tested this by creating a single-drive pool for each drive model and checking the ashift with zdb -C.
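The check was along these lines for each model (the device name below is just a placeholder, and the grep simply pulls the relevant line out of the config dump):
Code:
# Scratch pool on a single drive, just to see which ashift Solaris picks for it.
zpool create -f testpool c0t50014EE2B1234567d0
zdb -C testpool | grep ashift
zpool destroy testpool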
The eight giving ashift=9 make sense, as they're 512e drives reporting 512b logical sectors, but I don't quite follow why the other two give ashift=12 given they're also 512e. I'd have expected either all of them to be 12 (if Solaris can work out they're really 4k) or all of them to be 9 (if it can't). But maybe these two drives provide some extra information that the others don't? Or maybe Solaris contains a configuration list of specific drive models to pair with particular ashift values - I have since read that Illumos does something like this with its configurable sd driver?
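For what it's worth, the Illumos mechanism I read about is an sd-config-list entry in /etc/driver/drv/sd.conf that tells the sd driver to report a 4k physical block size for matching drives. The snippet below is only a sketch of that Illumos-style syntax - the vendor/product string is a guess for my Samsungs, and I have no idea whether Solaris 11.3 honours the same tuning:
Code:
# Illumos-style sd.conf override (unverified on Solaris 11.3).
# The vendor field is padded to 8 characters; "ATA     " is typical for SATA disks.
sd-config-list =
    "ATA     SAMSUNG HD204UI", "physical-block-size:4096";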
Of the 16 drives I actually plan to use, only one results in ashift=12. Therefore, if I just went ahead and created my zpool without further thought, I would end up with one VDEV at ashift=9 and the other at ashift=12. I've noticed that this config results in slightly unbalanced data allocation across the two VDEVs (as monitored with zpool iostat -v).
However, I have found that I can manipulate the ashift. Solaris 11 doesn't provide a direct way to set it (unlike most, if not all, of the newer ZFS implementations, from what I've read), but I can force an ashift of 9 by building the pool with a file standing in for the ashift=12 drive, then swapping the real drive in with zpool replace afterwards. Similarly, I can force both VDEVs to ashift=12 by building the pool with my spare drive that reports ashift=12 in place, then zpool replace-ing in the drive I actually want. I've tested both methods and they seem to work OK.
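To illustrate the file trick (device names below are placeholders, and I've made the stub file slightly smaller than a real 2TB drive so that zpool replace will accept the swap):
Code:
# Sparse ~1.8TB file to stand in for the ashift=12 drive during creation.
mkfile -n 1800g /var/tmp/stub0
zpool create -f tank \
    raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 /var/tmp/stub0 \
    raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0
# Swap the real drive in once the pool exists; the VDEV keeps its ashift of 9.
zpool replace tank /var/tmp/stub0 c1t7d0
zdb -C tank | grep ashift
rm /var/tmp/stub0   # once the resilver completes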
Here's a summary of all the drives I have, including their sector size and the ashift value they result in when used in a pool:
Code:
Qty | Drive               | Sect (int - ext) | Ashift
  7 | Samsung HD204UI     | 4096 - 512e      | 9
  6 | Hitachi HUA72202    | 512  - 512       | 9
  1 | WDC WD2003FYYS      | 512  - 512       | 9
  1 | Toshiba DT01ACA200  | 4096 - 512e      | 12
  1 | Hitachi HDS72202    | 512  - 512       | 9
  1 | WDC WD20EARS-00MV   | 4096 - 512e      | 9
  1 | Seagate ST3000DM003 | 4096 - 512e      | 12
This totals 18 drives; the 16 I want in my final pool are everything except the last two rows. Of those last two, the Seagate ST3000 is a 3TB drive that I will remove when I'm done testing, but it may be useful in the meantime as the second ashift=12 drive I mentioned. The WD20EARS also won't be in the final pool, but it will likely stay attached, configured as a hot spare.
So, after all that blurb, here's my real question: given this less-than-homogeneous mix of drives, what ashift value should I use? Should I go with the defaults Solaris gives me - one VDEV at ashift=9 and one at ashift=12? Or should I manipulate them to be both ashift=12, or both ashift=9?
My strong preference, assuming there's no big problem I'm not aware of, would be ashift=9 for both, because it gives me 1.1TB of extra capacity (21.4TB of usable space versus 20.3TB) and also wastes less space on each allocation. I have found that this allocation difference can occasionally be quite noticeable - I have a few datasets containing hundreds of thousands of files, one of which consumes 40GB on an ashift=9 VDEV but 65GB on an ashift=12 VDEV.
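For anyone curious where that difference comes from, here's my back-of-the-envelope understanding of the RAIDZ allocation maths, based on how the open-source ZFS code sizes RAIDZ allocations (I'm assuming Solaris 11.3 behaves similarly). The raidz_asize helper is just my own illustration, not anything ZFS ships:
Code:
#!/bin/sh
# Estimate how many bytes a RAIDZ VDEV allocates for one logical block.
#   usage: raidz_asize <block_bytes> <ashift> <ndisks> <nparity>
raidz_asize() {
    bytes=$1; ashift=$2; ndisks=$3; nparity=$4
    sect=$((1 << ashift))
    data=$(( (bytes + sect - 1) / sect ))                  # data sectors
    par=$(( nparity * ((data + ndisks - nparity - 1) / (ndisks - nparity)) ))
    tot=$(( data + par ))
    r=$(( nparity + 1 ))
    tot=$(( ((tot + r - 1) / r) * r ))                     # round up to a multiple of nparity+1
    echo $(( tot * sect ))
}
raidz_asize 4096 9  8 2    # 6144  bytes (6 KiB)  for a 4k block at ashift=9
raidz_asize 4096 12 8 2    # 12288 bytes (12 KiB) for a 4k block at ashift=12
If that logic holds, a small 4k block costs roughly twice the space at ashift=12 on my 8-drive RAIDZ2 VDEVs, which lines up with the 40GB versus 65GB figures above.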
That said, my file server is primarily used for large media files, so datasets with huge numbers of small files will be the exception. And while extra space is always nice, 20.3TB should be more than enough for the next few years. In other words, if there's some big gain to be had from ashift=12, the loss of space isn't likely to hurt me much, or any time soon.
I did do some bonnie++ benchmarking, creating a 2 x 6-drive RAIDZ2 pool (12 drives in total) at both ashift=9 and ashift=12. The ashift=9 pool came out 4% faster on sequential writes than the ashift=12 pool, but 1% slower on sequential reads. My feeling is that these small differences are within the margin of error and probably not significant (I have only benchmarked each pool once so far). There was certainly no major difference.
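The runs were along these lines (illustrative parameters rather than an exact record of my invocation; -s at twice RAM keeps the ARC from hiding the disks, and -n 0 skips the small-file tests):
Code:
bonnie++ -d /testpool/bench -s 65536 -n 0 -u root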
Anyway, sorry for the length of all that! Any advice and further info would be much appreciated. My hope is that I can just use ashift=9, but I'd definitely like to hear if there's anything else I should consider (besides spending yet more money to standardise on all-4k drives).
TB