Please help me RCA this

Apr 9, 2020
57
10
8
All right, I'm going to describe a series of events, and I'm hoping someone can help me figure out what the heck happened.

First, my workstation:

Windows 10
Asus Z10PE-D16 WS
2x Xeon E5-2643 v3
64GB RAM (2x 32GB)
2x NVIDIA GTX 1070 Ti
LSI Logic SAS9260-8i SGL RAID controller (8-port internal, 6Gb/s SAS/SATA, PCIe 2.0, 512MB) running 2x SAS drives in RAID0

Saturday morning at about 5 AM I am awoken by continuous beeps. I trace the source to the workstation listed above. I go to my desk and turn on my screens. Two monitors aren't responding, but Windows is up and the clock is accurate (generally not possible if the screen is frozen). I can't get any response from the keyboard or mouse, so I figure my KVM is just on the fritz again and, it being 5 o'clock in the bloody morning, I simply flip the PSU off and resolve to fix the problem later.

Next morning, get up, turn the computer back on, it starts beeping again.

So I start going through standard troubleshooting steps. I won't bore you with the actual details, but 5 days and close to $800 worth of replacement parts later, I finally discover that the actual problem is that one of my RAID0 HDs has failed. Now, I knew this sort of thing was a possibility when I decided I wanted RAID0, but the question I have is: how was Windows still running that night at 5 AM? If one member in a RAID0 was gone, should not the entire system have been down?

I can live with all the time and money wasted on this effort (and actually the money is not "wasted", since all the parts will live on in other projects), but I don't think I can rest until I figure out the mistake that sent me down this rabbit hole.
 

EffrafaxOfWug

Radioactive Member
Feb 12, 2015
1,394
511
113
how was Windows still running that night at 5 AM? If one member in a RAID0 was gone, should not the entire system have been down?
Depends on the nature of the failure; I've had systems before running entirely from RAM when their hard drives have gone out from under them but how the OS sees that will depend on the RAID controller and the discs themselves. Conceivably, one of the discs might have gone read-only (or gone read-only some time after boot, or partially broken only on certain zones/sectors) which might get you to a bootable state but not much further.

I know that once Explorer is running, the clock applet will keep running as normal even if the hard drive is totalled.
 

EffrafaxOfWug

Radioactive Member
Feb 12, 2015
1,394
511
113
Were you able to get the system to stay up long enough to at least back it up...?

RAID0 though... just avoid it, regardless of how reliable you think your drives might be, unless you're 100% certain it will only ever contain ephemeral data that can be replaced at a moment's notice. The (increasingly irrelevant) speed benefits it can provide are minuscule compared to the added complexity and lack of redundancy. You were luckier than most when you were able to still boot off the array; usually when a RAID0 goes, it's gone for good.
 
Apr 9, 2020
57
10
8
Nah, not bootable with only 1 drive. I did lose a couple of things: mostly some Word documents I had left open lost their unsaved work. There are a few other files I stupidly left only on the C:\ drive that will be a little trouble to rebuild.

I have definitely learned a lesson about RAID0 I won't soon forget. The really annoying part? The plan had been to configure an automated backup system to another drive that would save everything periodically. I apparently lost interest in this project and never actually got it automated...
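
For anyone in the same spot, the kind of automated copy described above could be sketched roughly like this in Python. This is only an illustration, not the setup that was planned here: the function name and folder layout are hypothetical, and it does a plain full copy each run rather than anything incremental.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_once(source: Path, dest_root: Path) -> Path:
    """Copy `source` into a new timestamped folder under `dest_root`.

    Returns the folder that was created. Each run makes a full copy,
    so old copies need pruning by hand (or by another script).
    """
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = dest_root / f"{source.name}-{stamp}"
    # copytree creates missing parent folders and raises if target exists.
    shutil.copytree(source, target)
    return target
```

Calling `backup_once(Path(r"C:\Users\me\Documents"), Path(r"D:\Backups"))` (both paths are placeholders) from a script run on a timer, for example via Windows Task Scheduler, would cover the "save everything periodically" part, provided the destination is a physically separate drive and not on the same array.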
 
Apr 9, 2020
57
10
8
Well, I'm about to lose my mind over this.

OK, 1 drive fails in a RAID0, kills PC. It happens. Cry, lose data, move on. So I got new drives and switched to RAID5.

Got Windows installed, rebooted 5 times doing automatic updates, started the process of driver installs, then had to go to work. I forgot to disable the power features, the system "went to sleep", and that destroyed the Windows install. No idea why or how.

So I reinstall Windows. Everything's going well: got 60% of my software installed, got my drivers loaded, then realized "Oh, hey, I forgot to change the computer name and join it to the workgroup!" So I do that, restart again... and have somehow destroyed Windows AGAIN.

I have no clue what is going on here. This machine was rock solid before the drive went out; I had it running for over 100 days continuously.

Driving me mad.
 

i386

Well-Known Member
Mar 18, 2016
4,245
1,546
113
34
Germany
should not the entire system have been down
It depends on what you mean by the "system": the storage system consisting of RAID controller + storage devices + logical array, or your workstation as a system consisting of an OS (Windows) and hardware...
In theory the SAS2+ RAID controller should allow you to use the "degraded" or "failed" array (sometimes you will need the CLI tools with a -force parameter*). I can't confirm this, as I don't use any RAID 0 setups and never had a problem with the prototypes where I did use RAID 0.

*Can a failed array be recovered?
Lose data, cry, learn from it, move on.
Rearranged/fixed that for you :p
 

Dreece

Active Member
Jan 22, 2019
503
161
43
Sorry to hear of your bad luck.

I'm sure right now you're realising the pitfalls of RAID... heart-drop moments galore indeed.

Now there is a golden rule with RAID: never, ever RAID-0 your system/boot drive. RAID-1 is a common choice for system drives, pure and simple a mirror. On top of the RAID-1 redundancy, have a solid backup strategy for the system drive. One of the cheapest out there is EaseUS Todo Backup; it installs its own boot-level driver and gives you a bootable image you can burn onto CD/USB, and recovery is plain sailing. There are many others out there too, so take your pick. But always 'test' the backup/recovery strategy on a new install to confirm the process, and write some notes down to help speed things up if and when the inevitable happens and you have to go down that road. Believe me when I tell you that plenty of people thought they had a backup strategy, only to find out later that the convoluted recovery process just doesn't want to play ball.
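
One way to make the "test the recovery" step concrete is to restore to a scratch location and compare it against the original, file by file. The sketch below does that in Python with SHA-256 checksums. It is a generic check on two folders, not a feature of EaseUS or any other backup tool, and the function names are made up for illustration.

```python
import hashlib
from pathlib import Path
from typing import List


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(original: Path, restored: Path) -> List[str]:
    """Return relative paths that are missing from `restored` or differ."""
    bad = []
    for src in sorted(original.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(original)
        copy = restored / rel
        if not copy.is_file() or file_digest(src) != file_digest(copy):
            bad.append(rel.as_posix())
    return bad
```

An empty list from `find_mismatches` means every file in the original tree came back byte-for-byte identical. It only checks in one direction, so extra files in the restored folder are not reported.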

I personally use RAID-0 for data storage pools, as I have no need for redundancy there; I have a backup/recovery system in place which is more than good enough to recover back to the last backup. I might miss a file here and there, but I calculated it an acceptable risk. Where data is critical, I nearly always go RAID-10. Sure, it costs more, but that's just me; I don't like to play devil's advocate with parity drive endurance.

On another note, welcome to the world of tinkering, we all have to lose data at some point to make us realise how important and special 'that guy' behind the scenes who manages a company's servers really is, though these days it's all mostly outsourced to big players.
 
Apr 9, 2020
57
10
8
Now there is a golden rule with RAID: never, ever RAID-0 your system/boot drive.
So that brought back some fun memories. Yes, it's 100% true and I should have known better. But it reminded me of way back in the day (I think 17-18 years ago) when RAID0 was all the rage amongst my circle of friends. I knew a few guys who had 4-drive RAID0 PATA arrays as their C:\ drive, usually as their only actual drive, AND had only 1 computer (we were in high school, and a poorly timed drive failure could mean flunking history).

I've had quite a long history with RAID. Long ago I did have my gaming rig on a 2-drive RAID0 for speed, and that actually lasted a good long while. But I also had an independent system with a more reliable boot drive. Then I had an 8-drive RAID10 (4x span mirrored) with 15k SCSI2 drives for a while, and man, that thing FLEW! It was woefully obsolete at the time (I know SATA was out, and possibly SAS), but the parts were dirt cheap and the speeds were really good. Had a few RAID5 servers... what happened over and over again was that I would build the server, run it, then 1 drive would fail and I would no longer be able to buy that drive... ah, the memories.

My next major project is to find either an external SAS box or move my workstation into a better tower so I can fit more SAS drives. I am liking the speeds quite a lot.

Anyway, am on RAID5 now for my boot drive and exploring my options. Probably going to pick up EaseUS Todo Backup. Reinstalling everything has proven a big enough hassle. Especially after having to reinstall it all TWICE thanks to sleep mode...