Traditional RAID vs. ZFS/ReFS "summary"...

jcl333

Active Member
May 28, 2011
153
32
28
Hello all,

* By the way, I mention ReFS only because the technology is "similar" to ZFS, but I think it has a long way to go. Read this thread:
http://social.technet.microsoft.com.../thread/79ca6d6d-cab7-4ff3-8c17-ec6ce249e641/
It is certainly interesting, even though the people testing it are using crappy hardware, but I still think ReFS is a few service packs short of a solution. It could be something, someday.

I am a Windows admin by trade with a lot of RAID experience, but I have very little Unix/Linux experience. I realize that with the ZFS solutions available we are mostly talking about a GUI strapped on top of the OS anyway; I guess I have discomfort with any solution I don't have long-running experience with. RAID has been around for more than 20 years.

That being said, ZFS is pretty interesting. I think for me, part of my issue is just the amount of time I have to spare to research and play with it.

I have read a lot on this (I am not done yet), but let me see if I have it right so far:

Best options for ZFS today:
* Openindiana + napp-it
* Nexenta community edition
* FreeNAS
* Either run natively on hardware or virtualize with pass-thru under VMware

Advantages of ZFS over RAID
* Save cost of hardware RAID controller (although possibly still need HBA)
* Ability to use consumer drives without TLER problem
* Mix and match different drives, if you want to
* Protection against elusive long-term hard to detect data corruption
* Advanced features not normally available from RAID - such as deduplication (with sufficient resources)
* Some nice SSD acceleration capabilities only available on the most expensive hardware RAID controllers

Questions
* Has anyone actually experienced some of the data corruption that can go undetected in RAID, or is it just faith in the principle and design?

I am thinking, for the server I am about to build, of going with one of the new Adaptec cards and an LSI HBA in the same machine. I would pass them each through to a VM on top of VMware and give them each their own sets of disks. I will run Server 2012 on the RAID controller.

For the ZFS solution, I think I am leaning toward either Nexenta CE or FreeNAS because the interface seems to review better, but please clue me in if there is an overwhelming reason I should start with Openindiana instead.

Then I can experience it for myself, and one day if I really find that ZFS is for me, I can switch the RAID card over to HBA mode and off I go. And in the meantime, I could actually backup one array with the other as well.

Please let me know if I am missing anything, or if anyone has any suggestions or comments.

Thanks

-JCL
 

gea

Well-Known Member
Dec 31, 2010
2,485
837
113
DE
Has anyone actually experienced some of the data corruption that can go un-detected in RAID, or is it just faith in the principle and design? [...]
If you use ZFS you see checksum errors from time to time that are fixed automatically.
None of these errors would be detected or fixed by conventional RAID.

Forget any Adaptec controller.
With ZFS, always (I mean always) use an LSI HBA or a compatible (e.g. LSI 9211, or an IBM M1015 flashed to 9211) with IT firmware.

Each flavour of ZFS has advantages.
If you only look for a NAS/SAN start with Illumos based systems
dev status: OI
stable status: OmniOS, NexentaStor CE (do not start with the old v3; try the NexentaStor CE v4 beta, noncommercial use only, max 18 TB raw)

My personal favourite for new configs: OmniOS

Hardware RAID and ZFS is a no-go. You lose the nice self-healing feature of ZFS.
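To make the self-healing gea describes concrete: ZFS stores a checksum with every block, so on a mirrored read it can detect a copy that fails verification and rewrite it from a copy that passes. Below is a minimal Python sketch of the idea only; it is not actual ZFS code (real ZFS uses fletcher4 or SHA-256 checksums stored in parent blocks, not alongside the data).

```python
import hashlib

def write_block(mirrors, addr, data):
    """Store data plus its checksum on every mirror (like a ZFS mirror vdev)."""
    record = (hashlib.sha256(data).hexdigest(), data)
    for m in mirrors:
        m[addr] = record

def read_block(mirrors, addr):
    """Return the data, repairing any mirror whose copy fails its checksum."""
    good = None
    bad = []
    for m in mirrors:
        checksum, data = m[addr]
        if hashlib.sha256(data).hexdigest() == checksum:
            good = (checksum, data)
        else:
            bad.append(m)          # silent corruption detected
    if good is None:
        raise IOError("all copies corrupt")
    for m in bad:                  # self-heal: rewrite the bad copy
        m[addr] = good
    return good[1]

# demo: corrupt one mirror, then a normal read repairs it
m1, m2 = {}, {}
write_block([m1, m2], 0, b"family photos")
m2[0] = (m2[0][0], b"family photoX")   # simulate bit rot on mirror 2
assert read_block([m1, m2], 0) == b"family photos"
assert m2[0][1] == b"family photos"    # mirror 2 was healed
```

A hardware RAID mirror has no per-block checksum, so when the two copies disagree it cannot tell which one is right; that is exactly the gap the checksum closes here.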
 

zicoz

Member
Jan 7, 2011
140
0
16
Why shouldn't one use Adaptec for ZFS? And does that warning also count for their new models?
 

gea

Well-Known Member
Dec 31, 2010
2,485
837
113
DE
Why shouldn't one use Adaptec for ZFS? And does that warning also count for their new models?
Two rules
If you want to use ZFS, avoid any hardware RAID controller and use a pure HBA adapter without RAID functionality (IT mode).
Solaris is like Apple: not mainstream and not supported on just any hardware. Only a few devices are well tested and supported (in older, current, and future OS releases).

You should use what's used in a typical Oracle or Nexenta box.
If you look at the HBAs widely used in current commercial ZFS boxes, you will discover that LSI HBAs are used in nearly 100% of configurations.
 

jcl333

Active Member
May 28, 2011
153
32
28
If you use ZFS you see checksum errors from time to time that are fixed automatically.
Each of these errors are not detected and not fixed with conventional Raid
How often would you say? According to the Wiki it would be approximately 1 error for every 67TB of data read.
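For reference, figures like that come straight from a drive's quoted unrecoverable-read-error (URE) rate, usually 1 error per 10^14 bits read for consumer drives and 1 per 10^15 for enterprise drives; a 67 TB figure would sit between the two. A quick back-of-the-envelope check:

```python
def tb_read_per_expected_ure(ber_bits):
    """TB you can expect to read per unrecoverable read error,
    given a quoted URE rate of one error per `ber_bits` bits read."""
    bytes_per_error = ber_bits / 8
    return bytes_per_error / 1e12   # decimal TB, as drive vendors use

assert round(tb_read_per_expected_ure(1e14), 1) == 12.5   # consumer-class
assert round(tb_read_per_expected_ure(1e15), 1) == 125.0  # enterprise-class
```

These are expectations over enormous volumes of reads, not a schedule; as the posts below point out, real drives can do much worse than the spec sheet.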

Forget any Adaptec controller.
With ZFS use always (i mean always) LSI HBA and compatibles (ex LSI 9211, IBM 1015 flashed to 9211) with IT firmware
Seeing the post below, is this mainly because of driver availability and compatibility? I am sure the Adaptec HBA mode is as good as any other controller's, but it might not work well with the OS in question, yes?

Each flavour of ZFS has advantages.
If you only look for a NAS/SAN start with Illumos based systems
dev status: OI
stable status: OmniOS, Nexentastor CE (do not start with old V3, try NexentaStor CE V4 beta, only noncommercial use, max 18 TB RAW)
Yup, this is a major reason I have historically shied away from Linux in general... it can be hard to determine if you are chasing a niche or a lame duck. It's beta, it's ALPHA, it's pre-ALPHA, it's a pet project, it's abandoned, there are no drivers, the bug fix or feature you need is in the other distribution, the whole OS has been forked, Richard Stallman won't let you run it, etc.

On the other hand, if your only purpose is a dedicated ZFS storage server, this lessens the problem, and virtualization can smooth over some of the hardware issues.

I have been reading through the Wiki and other documents for zfs, nexentastor, openindiana, illumos, freenas, napp-it, and so on, and trying to determine what the safest bet would be here.

I am tempted to look at FreeNAS just because I might have to deal with the underlying OS that much less and could treat it like an appliance; I am trying to see which is more "fresh" - FreeNAS or NAS4Free. My hesitation is that it uses the FreeBSD port of ZFS, so new features might take longer to reach you, but they are at zpool version 28 now, which has most of the features you would want.

My personal favourite for new configs: OmniOS
Starting to read about this one. I agree with the child post, it would be great to know why you like it.

Hardware-raid and ZFS is a nogo. You loose the nice self-healing feature of ZFS
Right, I was not suggesting using a hardware RAID controller with ZFS unless it was in IT/HBA mode.

-JCL
 

gea

Well-Known Member
Dec 31, 2010
2,485
837
113
DE
How often would you say? According to the Wiki it would be approximately 1 error for every 67TB of data read.
Please tell this to my disks....
Disk errors detected via checksum errors, for various reasons, are a not-too-rare problem, especially on older disks.
I have about one or two disk failures every month (I use about 120 disks in my systems).
Some are completely dead; most have various problems. A low-level format with a manufacturer's tool very often fixes the problems.
More important: they had read errors resulting in checksum errors, resulting in a degraded pool. Without ZFS I would not have been informed at such an early stage. SMART values were mostly OK.

Seeing the child post, is this mainly because of driver availability and compatibility? Because I am sure that the Adaptec HBA mode is as good as any other controller, but that it might not work with the OS in question, yes?
It is not a question of whether Adaptec is good or not.
As you say, it is a question of whether the Illumos developers care about it in their tests. (Short answer: no, not like they do with LSI.)


.. it would be great to know why you like it. (OmniOS)
short answer:
OpenIndiana is the No. 1 distribution, following all the ideas of OpenSolaris including desktop use.
I like it and will not abandon it. My heart is there.

But I have to accept that there are not enough developers to keep this mega-project up to date.
This is where Omni can fill the gap. They are focused on a minimal server OS with optional commercial support, a stable release, and frequent bleeding-edge updates to the newest Illumos improvements.
 

MagnusDredd

New Member
Jan 24, 2013
1
0
0
How often would you say? According to the Wiki it would be approximately 1 error for every 67TB of data read.
-JCL
"In theory, theory and practice are the same. In practice, they are not."
-- Lawrence Peter Berra

The problem with the number you're quoting is that it does not include disk surface errors. In theory, when a sector on a hard drive platter goes bad, the drive detects this and remaps the sector to a spare in a reserved area of the drive[1]. If my experience is any indication, in practice hard drives fail to catch bad sectors at FAR higher rates.

I work in IT for a large school district. Of the 900+ Dell computers that I support at work, I've found 40 to 50 drives which had at least 1 bad sector with uncorrectable errors[2] which hadn't been remapped. If the bad sector is located near enough to the beginning of the drive, re-imaging[3] fails when the re-imaging software tries writing to the bad sector.

I use a free tool called MHDD[4] to force the drive to remap these bad sectors. It's old, it's not pretty, and it only supports communication using the ATA protocol, but it works better than anything else I've found. There's a copy on the Ultimate Boot CD[5]. To use it on the newer Dells, it requires the hard drive controller to be changed from using AHCI or IRRT mode to either ATA, SATA, or Legacy mode (in that order, since what the correct mode is named varies by model of computer).
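The sector problems MHDD surfaces also show up in SMART attributes 5 (Reallocated_Sector_Ct) and 197 (Current_Pending_Sector), which `smartctl -A` from smartmontools prints. As a rough illustration, here is a small Python sketch that pulls those two counts out of `smartctl -A` text; the sample output below is abbreviated and hypothetical.

```python
import re

def sector_health(smartctl_output):
    """Extract reallocated (attr 5) and pending (attr 197) sector
    counts from the text of `smartctl -A`."""
    counts = {}
    for line in smartctl_output.splitlines():
        # attribute id, attribute name, ..., raw value at end of line
        m = re.match(r"\s*(5|197)\s+(\S+)\s+.*\s(\d+)\s*$", line)
        if m:
            counts[m.group(2)] = int(m.group(3))
    return counts

sample = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       8
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
"""
assert sector_health(sample) == {"Reallocated_Sector_Ct": 8,
                                 "Current_Pending_Sector": 3}
```

A nonzero pending count means the drive knows about sectors it could not read but has not yet remapped, which is exactly the state MHDD's remap scan forces it to resolve.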

To be clear, a majority of the drives in question have been Western Digital Raptors. However, a few Samsung drives, as well as Seagate drives have had this issue. I've modified the Ultimate Boot CD to network boot[6] at work and I use it all the time, so I've scanned hundreds of drives with it over the years.

I deal with drives with uncorrectable errors in the following manner:
If the number of errors is low and the number of slow sectors is small:
1) I label the drive with the date and number of errors using masking tape and a sharpie.
2) I email myself with the name of the machine and the number of errors. (I only recently started doing this)
3) If the drive is in a staff machine, I move it to a machine which users can't save to the drive.
3a) I care about corrupted user files, not corrupted installs; I can easily replace the install[3].
If there's between 10 and 100 uncorrectable errors, but the number of slow sectors isn't too large:
1-3) as above
4) I set the drive to scan using "remap" and "loop" and let it run for 16 to 24 hours.
4a) This will either result in no additional bad sectors being found, in which case the drive can be reused, or it will stress the drive enough to kill it.

The above process allows me to generally keep track of what's going on with these drives. Of the 50 or so drives that I've reconditioned using MHDD, many of them have been in service for a year or two with no additional issues detected (but of course Windows can't detect these issues).

Given the 3 corruption fixes that my small 3TB home ZFS-based file server has caught and repaired, combined with what I've seen at work, I no longer trust "Theory". In practice, my experience with corruption has been FAR worse, and I'm not willing to risk family photos, movies from the hospital when my daughter was born, 50GB+ of carefully tagged MP3s, etc...

When I've had issues with my array, the "rebuild" time has been minutes. It's not days or weeks like a traditional RAID rebuild, which puts a great deal of stress on the drives in the array, making catastrophic failure all the more likely. This is because ZFS only fixes what's corrupt, not the entire drive, free space included.
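The rebuild-time gap is simple arithmetic: a traditional RAID rebuild copies every sector of the replacement disk, while a ZFS resilver only touches allocated blocks. A rough comparison, with illustrative numbers (3 TB disk, 200 GB actually used, 120 MB/s sustained; these are assumptions, not benchmarks):

```python
def rebuild_hours(data_tb, mb_per_s):
    """Hours to copy `data_tb` decimal TB at a sustained `mb_per_s` MB/s."""
    return data_tb * 1e6 / mb_per_s / 3600

disk_tb, used_tb, speed = 3.0, 0.2, 120
raid = rebuild_hours(disk_tb, speed)   # whole disk, free space included
zfs  = rebuild_hours(used_tb, speed)   # allocated blocks only
assert raid > 6 and zfs < 0.5          # roughly 7 hours vs. under 30 minutes
```

The advantage shrinks as the pool fills up, of course: a resilver of a nearly full pool approaches the full-disk rebuild time.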

Between the data integrity features, snapshots, compression, double-parity arrays (2 drives can fail), ZFS's version of RAID expansion[7], and OpenSolaris's (what I use) support for Windows "Previous Versions" when connected to a CIFS share, I don't want to use anything else anymore.

[1]
http://en.wikipedia.org/wiki/Host_protected_area
[2]
http://www.datarecovery.net/articles/hard-drive-sector-damage.html
[3]
http://www.techterms.com/definition/reimage
http://www.symantec.com/deployment-solution
[4]
http://hddguru.com/software/2005.10.02-MHDD/
http://forum.hddguru.com/hdd-faq-t5.html
http://hddguru.com/software/2005.10.02-MHDD/mhdd_manual.en.html
https://www.youtube.com/watch?v=-1lsPdSQ_0U
[5]
http://www.ultimatebootcd.com/
[6]
http://en.wikipedia.org/wiki/Preboot_Execution_Environment
[7]
http://www.itsacon.net/computers/unix/growing-a-zfs-pool/
 

mixer

Member
Nov 26, 2011
92
0
6
Data corruption happens: I had music files stored on a 1.5 TB disk that sat in a drawer for 9 months or so. When I went to copy them to my ZFS NAS, there were a few read errors; the files which could not be read were thus lost. There was no way to detect the error until I tried to read the file. I'm not sure ZFS metadata could have helped the read (the drive had no mirror), but if the drive had been ZFS, maybe I could have plugged it in and scrubbed it every couple of months and perhaps the bits could have been corrected.

See also: http://jinx.de/zfs/hfsfailure.html (he's talking about Macs and HFS, but applies to most file systems I think)
 

hagak

Member
Oct 22, 2012
80
0
6
No one has mentioned one other major benefit of ZFS: COW (copy-on-write) is the method ZFS uses for writing data, and it enables some very nice features.

Snapshots are the most obvious, and COW also reduces (to basically no chance) the risk of corrupted files from interrupted writes. Note that this is where comparing ZFS to HW RAID gets confusing: HW RAID is not a filesystem, and COW is a filesystem feature, but since ZFS is a filesystem you get it built in :). Other filesystems have this feature and can run on top of HW RAID, so...
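A minimal sketch of why COW makes snapshots nearly free: a snapshot just keeps a reference to the current block tree, and later writes allocate new blocks instead of overwriting old ones. This toy Python model is hypothetical and nothing like ZFS internals, but it shows the principle:

```python
class CowStore:
    """Toy copy-on-write store: writes never modify blocks in place."""
    def __init__(self):
        self.blocks = {}    # block_id -> bytes (immutable once written)
        self.live = {}      # name -> block_id (the "current" tree)
        self.next_id = 0

    def write(self, name, data):
        self.blocks[self.next_id] = data   # new block; old one untouched
        self.live[name] = self.next_id
        self.next_id += 1

    def snapshot(self):
        return dict(self.live)  # copies references only, no data

fs = CowStore()
fs.write("report.txt", b"v1")
snap = fs.snapshot()                    # instant: just references
fs.write("report.txt", b"v2")           # COW: the v1 block stays intact
assert fs.blocks[fs.live["report.txt"]] == b"v2"
assert fs.blocks[snap["report.txt"]] == b"v1"   # snapshot still sees v1
```

Because the old block is never overwritten, a crash mid-write leaves the previous version intact, which is the "basically no chance of corrupted files" property mentioned above.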
 

MiniKnight

Well-Known Member
Mar 30, 2012
2,987
891
113
NYC
I think you just won for most well cited tech forum post ever.