Coming back online


Patrick

Administrator
Staff member
Dec 21, 2010
12,516
5,811
113
Thanks! I actually am working on getting an offsite solution.
 

Clownius

Member
Aug 5, 2013
85
0
6
Sorry to hear of your trouble. Have you determined exactly what was faulty?

I'm again reminded of the importance of good backups, and of the time I had been working on a site for a few months when the HDD died. When I enquired about the backup system, I was told it ran daily to the same disk and monthly offsite. The last backup was a couple of weeks old, and the site data was time sensitive, with even hour-old data being a real problem. Thank god that disk came back up long enough to pull the latest data off it. Step one was overhauling the whole backup system.
 

Patrick

I noticed there are sporadic articles coming online. What gives?
Basically, I am prioritizing the ones that I can get online quickly (~10 minutes/article), and I am also prioritizing the most popular ones.

Totally agree on the offsite backup. The big issue was that I was temporarily running the backup server off of the same chassis (different machine) as the A + B production nodes. At this point I am fairly certain it was an issue with the Dell C6100 hard drive backplane, since it impacted 3 of 4 nodes in that chassis but the servers kept running.
 

ehorn

Active Member
Jun 21, 2012
342
52
28
At this point I am fairly certain it was an issue with the Dell C6100 hard drive backplane...
Murphy has a way of finding the only non-redundancy and exploiting it...

He is a good teacher though...

Glad to see the site back up Patrick.
 

Alfa147x

Active Member
Feb 7, 2014
192
39
28
Thanks, Patrick, for putting all of this work into the site. I doubt the ad revenue makes this site profitable, so your diligence and persistence are very much appreciated.
 

iq100

Member
Jun 5, 2012
68
3
8
I thought you had written that "only the Intel SSD data had survived". Why that one? You wrote: "At this point I am fairly certain it was an issue with the Dell C6100 hard drive backplane since it impacted 3 of 4 nodes in that chassis but the servers kept running."
Looking forward to a detailed analysis of what happened and your suggestions for preventing a recurrence! Doesn't your UPS allow all SSD data buffers to be saved?
 

Patrick

Thanks Patrick for putting all of this work into the site. I doubt the ad revenue makes this site profitable so your diligence and persistence is very much appreciated.
There was certainly a minute or two where I was thinking about the wedding and honeymoon and wondering whether it made sense. That lasted about 48 hours, then I realized I felt like something was missing.

@iq100 It seems like the controllers died rather than the entire backplane failing. The Kingston E100s have capacitors (they are enterprise drives), so it was not that.
 

chune

Member
Oct 28, 2013
119
23
18
Care to share what controllers they were? Were you running ZFS or hardware RAID?
 

wildchild

Active Member
Feb 4, 2014
389
57
28
Hi Patrick, glad to see you've returned from the dark black :)

Tapatalk isn't working yet. I'm sure it's no biggie, but thought you'd like to know.
I also had to recreate my account.
 

Patrick

I was at the gym this morning and had that thought. Will work on it later today.
 

rnavarro

Active Member
Feb 14, 2013
197
40
28
Glad to see things are coming back online.

Very curious to see what the root cause was, and whether or not DriveSavers will be able to recover anything.

On the backup note, I have multiple terabytes of storage in colo that I use for backups and such. More than happy to give you some backup space for free so you have multiple copies!
 

parawizard

New Member
Jan 28, 2014
21
0
1
Ouch. That is a pretty brutal failure! Props on getting it back up and working to this degree quickly.

Weird stuff happening with that backplane. I wonder what additional protection is built into the Intel drives that made them survive.
 

Patrick

I think I have something like 38 main site articles left to restore, with about 8 of those started.

Just got a note from DriveSavers... the semi-good Kingston E100 is not coming back to life.