SSD and RAID - Worth it?

ColPanic · Dec 8, 2016

After reading the great article here about the reliability of used enterprise SSDs. With annual failure rates below .1% and lower than fans I'm questioning the wisdom of continuing RAID at all. Mirroring or RAID 10 just seems unecessary and as long as you have good backups JBOD should be totally fine. I guess the real risk is with striped arrays having the ability to survive a loose cable or accidentally pulling the wrong drive. So maybe raid 5 or raid z1 should be back. I'm doing some performance testing on how different raid levels perform and the results are not what I expected so far.

gea · Dec 9, 2016

Do you also buy a car without airbag or security belt as the propability of an accident is low?

The same with disks. Propability of a failure in the first years of a new disk or SSD can be quite low but there are disks like a special series of Seagate Constellations where failure rates are up to 40% per year. For the others there is only one for sure: they will die sooner or later.

If your data is in any way valuable you must care about data security. This means in any case at least disaster backup. But with unverified backups without checksums you cannot be sure about validity. Backups are also like a newspaper or a bread - always from yesterday.

So Raid that allows one or more disks to fail is the most important method for data security, followed by a crash resistent CopyOnWrite filesystem with checksums like btrfs, ReFS or ZFS for validity, followed by readonly versioning of data on your prime storage with Snaps, followed by a disaster backup for the worst case on a different physical location. You need all of them if you really care about data security

vl1969 · Dec 9, 2016

I was under impression that you use riad not for data security but mostly for uptime.
it is not about having your data safe and secure but having an uninterruptable access to said data as needed.
just think about what it worth to you, having server failed and spending some time on rebuild and recover OR
having a disk fail and just pop the replacement in and be done.
what the better question should be is , is it worth to have a hardware raid to day or software raid.
and that is totally depends on the infrastructure you run.

if you run a Linux shop, I think moving off the hardware raid onto proper software raid setup is a good idea.
I can not yet comment on Windows based configuration,like sharedspaces etc. , as I never used one
but I have been playing with Linux raid a bit for the last 3 years and I like it alot.
you get all the benefits of hardware raid, an uptime, a single volume on JBOD set, a data security if all is setup properly,
but step around the downsides of hardware raid like no control on how things is done under the hood, hardware dependency, hardware limitations of the controller.
and some software raid system even give you ability to use mixed drives in the set as well. for home use it is a God send. how many of us can afford to get a bunch of same drives at the same time?

the file systems like BTRFS and ZFS have a built in raid capability, so you don't even need an extra software
and this FS also offer a snapshot and even backup options, built in as well.

i386 · Dec 9, 2016

vl1969 said:
I was under impression that you use riad not for data security but mostly for uptime.

That's correct. Raid is not a backup!

vl1969 · Dec 9, 2016

i386 said:
That's correct. Raid is not a backup!

well I have used it as both for a bit,
but after I had 3 hard drives crash at the same time, had to rethink my setup

ColPanic · Dec 9, 2016

vl1969 said:
I was under impression that you use riad not for data security but mostly for uptime.
it is not about having your data safe and secure but having an uninterruptable access to said data as needed.
just think about what it worth to you, having server failed and spending some time on rebuild and recover OR
having a disk fail and just pop the replacement in and be done.
what the better question should be is , is it worth to have a hardware raid to day or software raid.
and that is totally depends on the infrastructure you run.
.

Thats what I was getting at. RAID came about because spinning rust hard drives were (are) the least reliable part of a server. You could count on one being the first part to fail so you had to have a redundancy system in place to maintain uptime. With SSDs that no longer seems to be the case - their reliability is up there with the other components. There may also be a case to be made for doing away with striped arrays - we did that to get more IOPs out of HDDs, but SSDs are so much faster, JBOD pooling makes a lot of sense. Especially if no data is lost if a disk is temporarily taken offline.

Patriot · Dec 9, 2016

Just a casual reminder, raid is not a backup, it is simply meant to keep a server up despite disk failure.
If you have other methods of redundancy in a cluster it is not required.

Reasons for SSD raid would be for enhanced performance if a single drive iops is not enough. However, for VMs it is often nice to have JBOD SSDs to remove noisy neighbor problems.

vl1969 · Dec 9, 2016

well even IF SSDs are more reliable, they still can fail.
I had an HTPC on 64GB SSD just turn off and never get back up again.
the drive simply went dead on a running system. so the system was up runign in RAM
and give me some problems, when I rebooted it was all over.
since it was just a simple HTPC running Kodi, I lost nothing. it was a pure streaming machine, no data of any kind on the drive, now if it was data server I would have lost all of it just like that.

ColPanic · Dec 9, 2016

Sure. You can find anecdotal evidence of any device failing - I've seen motherboards crack, CPUs fail, etc... But SSDs do not fail like traditional hard drives so they dont necessarily need to be treated as if they are just a faster version of the same old hard drive. This is the article I'm referring to. With 4.5 million hours on hundreds of used enterprise drives - two failed.
Used enterprise SSDs: Dissecting our production SSD population

Fritz · Dec 9, 2016

Here in the ghetto we think of RAID as a safety net. A wake up call that screams BACK UP NOW!!!

vl1969 · Dec 9, 2016

weather SSDs fail like traditional hard drives, or not is not the point.
point is they can and do fail, just like any other peace of electronics out there.
and when they fail the outcome is still the same, you are left with the mess on your hands.
either it is a dead server that you need to rebuild from scratch, or a dead data pool that needs to be restored from backup, while you do all that your data is unavailable to you in any way.
a properly chosen and configured raid setup, can help you stay on top and can help you keep your data/system accessible. what's wrong with that?
I am not advocating the same use of raid as it was done for years in the past.
but I still think raid have a place even in today's world. you just have to pick a proper type and way for your setup. for example I build my home server with a raid-1 System drive. I feel it is worth it for me
the extra steps and expanse to keep the system going if drive fails. all my drives are in hot-swap cages and are monitored with smart tools. if I start to get warning I have a choice of changing the offending drive at my leisure and on working system, while my system and data is still there accessible, compared to having the whole thing hard crush and spending hours re building and reconfiguring thing anew.
data is not an issue as I have a backup, but I never successfully recovered a system drive from backup that I did not have to tweak and fix afterwards to make it work properly again.
I understand that raid is not 100% protection, nothing is,but even if it is a 85%-90% it good enough for me.
YMMV

EffrafaxOfWug · Dec 10, 2016

As others have said a millions times already, RAID is not backup. Size and performance of the array notwithstanding:
RAID is multiple copies of the data in the same place to enhance availability
Backups are multiple copies of the data in different places to enhance survivability

As ever, DNA put it best.

Douglas Adams said:
The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at or repair.

All of the SSDs I've put into production have always been in a RAID set because SSD failures can and do happen - not always in the same way as hard drives of course. Mechanical failure is no longer applicable to SSDs by and large, but sudden death of the controller will nuke any kind of drive from orbit with aplomb. So first rule is spend money on backups before you spend it on RAID. No amount of RAID will save you from a fire or a misplaced rm -rf.

Ultimately it all boils down to cost vs. benefit as with everything else. If one of your trading platforms is down for an hour and your lose £$€eleventy jillion whilst you replace a £500 drive, then it would have made economical sense to spend the money. If you tell the missus she won't be able to access t'internet until the replacement SSD for the router arrives in three days time, there's a cost to that too (not to mention the hospital bills).

If the time spent in replacing the part and restoring from backups isn't an issue, or it isn't economically feasible to run RAID, then as long as you've managed expectations there shouldn't be any issue.

SINN78 · Dec 10, 2016

Fritz said:
Here in the ghetto we think of RAID as a safety net. A wake up call that screams BACK UP NOW!!!

LOL Not in these parts RAID means the cops kicked someones door down

SycoPath · Dec 10, 2016

The biggest issue for me with SSD's is the way they fail. More often than not, a hard drive will give you a warning, low performance, read errors, abnormal noises, etc. Also, when a hard drive does fail, usually a lot can be recovered with drive utilities. SSD's are different. Everything is all great and happy until it decides to hop on a midnight train to Georgia and take your data with it. Bang, gone, no warning, just gone, and it ain't coming back.

ColPanic · Dec 11, 2016

What settled the question for me is needing raid not to protect from drive failure but to protect me from myself. The chances of me accidentally pulling the wrong drive is far higher than the chance of a SSD failure.

vl1969 · Dec 12, 2016

SycoPath said:
The biggest issue for me with SSD's is the way they fail. More often than not, a hard drive will give you a warning, low performance, read errors, abnormal noises, etc. Also, when a hard drive does fail, usually a lot can be recovered with drive utilities. SSD's are different. Everything is all great and happy until it decides to hop on a midnight train to Georgia and take your data with it. Bang, gone, no warning, just gone, and it ain't coming back.

yep ,that was my point earlier. like I said in my last post, that is what happened with my HTPC.
it was up and running happily for 5 month, all of a sudden it start kicking out cryptic errors on regular basis.
never come back after reboot. the drive was dead as a paperweight. no big deal with MY htpc setup as the box was just a streamer. any data on it was also copied to main server, but some people setup pools double or triple duty, and when this fails it is a disaster of great proportion. and also a PITA to recover, even with backup.

vl1969 · Dec 12, 2016

ColPanic said:
What settled the question for me is needing raid not to protect from drive failure but to protect me from myself. The chances of me accidentally pulling the wrong drive is far higher than the chance of a SSD failure.

how often do you pool the drives out of the system for it to be so critical?
with me, if I am pooling the drive out it means S#$t has hit the fan already and I really can not do much more damage

NetWise · Dec 12, 2016

For me the question comes down to WHO's data is it. If it's yours in your own house, well you're in charge. But if it is your employer or customer's - that's not your baby to risk dropping. The main reason we all did RAID 1 or 10 used to be for performance to avoid the IO penalty. RAID 5/6 on SSD's have tons of IO to spare, so doesn't really matter as much. So you can get away with 'bad' RAID levels as they no longer hurt as much.

But even for an SMB customer, to have 25x $20-$100/hr employees twiddling their thumbs for a few hours is a minimum of $500/hr while you do a restore or reseat a drive or or or.

So 'worth it'? To buy an extra $100 SM863 drive for an array or something? Every day I breathe. The good will and saving face alone are worth it, IMHO.

Sent from my iPhone using Tapatalk Pro

ColPanic · Dec 12, 2016

Since this is a home lab/media server/fileserver/backup storage I'm always tinkering with it. In fact, tinkering with it is the whole point. There is nothing critical that's not backed up regularly and in multiple places. For data I truly care about I have a backup strategy that would make even the most paranoid hoarder shake his head.

I'm not very likely to pull the wrong drive but the chances of that, or a loose cable, are much higher than the .1-.3% annual failure rate that STH has seen with used data center SSDs. However I manage pooling the drives it needs to protect me from myself. Since it's all flash, I don't need striping for performance and I'm convinced that parity is a waste of space in this application. JBOD pooling + iSCSI and smb + checksumming + ability to temporarily lose a drive would be a winner.

vl1969 · Dec 12, 2016

ColPanic said:
Since this is a home lab/media server/fileserver/backup storage I'm always tinkering with it. In fact, tinkering with it is the whole point. There is nothing critical that's not backed up regularly and in multiple places. For data I truly care about I have a backup strategy that would make even the most paranoid hoarder shake his head.

I'm not very likely to pull the wrong drive but the chances of that, or a loose cable, are much higher than the .1-.3% annual failure rate that STH has seen with used data center SSDs. However I manage pooling the drives it needs to protect me from myself. Since it's all flash, I don't need striping for performance and I'm convinced that parity is a waste of space in this application. JBOD pooling + iSCSI and smb + checksumming + ability to temporarily lose a drive would be a winner.

sorry I am in totally different state of mind when it comes to tinkering

for me it is just a necessity not the point of it. once I have a setup I like I usually go with hands-off approach.
my first home server was an UnRaid setup on a DIY white box. once it was up I run it for 3 years only login onto it for small updates and to make new folder.

as soon as I am satisfied with my new server setup, it will be the same way.

right now, I am running OMV as main OS. raid-1 on 2 SSDs for System drive

I have 8 or 10 mixed drives, some 3TB, some 2TB couple of 1GB all individually formatted with BTRFS (this gives me CoW and bit-rot protection for data)
run SnapRaid in raid-6 config on all data drives (2x3TB drives are reserved for Parity, rest is data)
and MergerFS to pool the drives into single share.
I use NFS and Samba for sharing folders on MergerFS volume, it is not advisable to share the root on the pool.
I guess you can add iSCSI to the mix as well

the beauty of this setup is that all drives are totally independant of each other.
I can pull drive out and use it in other system if I need to, and as long as I put it back before the scheduled SnapRaid scan I will not have an issue, and even if I miss the scan, SnapRaid will usually correct the error when the drive is back.
MergerFS will show you error about missing disk but will still work. all data on other drives are still accessible, only data stored on missing drive is not.

SSD and RAID - Worth it?

Member

Well-Known Member

Active Member

Well-Known Member

Active Member

Member

Moderator

Active Member

Member

Well-Known Member

Active Member

Radioactive Member

Active Member

Active Member

Member

Active Member

Active Member

Active Member

Member

Active Member