Hardware failures in 2017 - Post yours!

SycoPath

Active Member
Oct 8, 2014
139
41
28
I'd tell you how many failed Seagate drives I've had to replace over the years but I can't count that high :p
In 1999 a client company had the single system drive on its NT4 server die, and it cost them more than $2m in downtime. They spent loooots of money on new servers after that. It was a Seagate!
Any infrastructure with a single point of failure that could cost you $2m was negligently, horribly designed. That's such a poor design choice that I'd probably fire the guy who designed it, and the guy who hired him for making such a horrible hiring choice.
 

Tom5051

Active Member
Jan 18, 2017
359
79
28
46
Any infrastructure with a single point of failure that could cost you $2m was negligently, horribly designed. That's such a poor design choice that I'd probably fire the guy who designed it, and the guy who hired him for making such a horrible hiring choice.
Totally agree. It was an inherited system, and for some reason they had this server hidden in a cupboard away from the other servers. It was so hot in there when we finally located it. The company that owned the servers took the view that IT did not make them money, so spending money on 'fancy' gear was not required.
 

SycoPath

Active Member
Oct 8, 2014
139
41
28
Totally agree. It was an inherited system, and for some reason they had this server hidden in a cupboard away from the other servers. It was so hot in there when we finally located it. The company that owned the servers took the view that IT did not make them money, so spending money on 'fancy' gear was not required.
Stories like this just keep reinforcing my paranoia. The very first thing I do when looking at any system is Nmap everything into a big spreadsheet and go find every single thing on the network. Every switch, PDU, desktop, server, everything. I even look for WiFi IPs that never leave and go find out what the heck each one is. So many times some middle manager adds some random piece of gear for their department that gets used for so long everyone thinks it's official.
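
A minimal sketch of that "scan everything into a spreadsheet" step, assuming the scan was saved in Nmap's grepable format with something like `nmap -sn -oG scan.txt 192.168.1.0/24` (the subnet and file name are placeholders). The function names and the `identified` column are made up for illustration; the idea is just to get one CSV row per live host so each one can be ticked off once it has been physically found.

```python
import csv
import io
import re

# Matches ping-scan lines from Nmap's grepable output (-oG), e.g.
# "Host: 192.168.1.10 (nas.lan)\tStatus: Up"
HOST_LINE = re.compile(r"^Host:\s+(\S+)\s+\((.*?)\)\s+Status:\s+(\w+)")


def parse_grepable(text):
    """Return one dict per host that Nmap reported as Up."""
    rows = []
    for line in text.splitlines():
        match = HOST_LINE.match(line)
        if match and match.group(3) == "Up":
            rows.append({
                "ip": match.group(1),
                "hostname": match.group(2),  # empty if no reverse DNS
                "identified": "",            # fill in by hand after finding the device
            })
    return rows


def to_csv(rows):
    """Render the host list as CSV text that any spreadsheet can open."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["ip", "hostname", "identified"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Running it on a schedule and diffing the CSVs is one way to spot the addresses that never leave, though that part is left out here.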
 

Tom5051

Active Member
Jan 18, 2017
359
79
28
46
Stories like this just keep reinforcing my paranoia. The very first thing I do when looking at any system is Nmap everything into a big spreadsheet and go find every single thing on the network. Every switch, PDU, desktop, server, everything. I even look for WiFi IPs that never leave and go find out what the heck each one is. So many times some middle manager adds some random piece of gear for their department that gets used for so long everyone thinks it's official.
Don't think we had any network mapping software options in those days.
 

SycoPath

Active Member
Oct 8, 2014
139
41
28
Don't think we had any network mapping software options in those days.
Well, yeah. o_O I thought it was pretty clear I was referring to my current process, since I was also talking about WiFi. :rolleyes: My paranoia was born in "ye olden days" from crazy stories like these.

Man, could you imagine what you could have charged for something like Nmap in the 1980s? It would have been like black magic.
 

Xicaque

New Member
Mar 28, 2017
23
2
3
114
Olympia
How about my Athena Power PSU? The second one just gave up the ghost. Still have warranty but upon further web searching, I've found some not so great comments on Athena. POS! I will trash it rather than having a fire.

Ordered a Supermicro CSE-743T-665B and decided to upgrade the case for one of my FreeNAS servers.
 

vl1969

Active Member
Feb 5, 2014
634
76
28
How about my IBM AS/400 eSeries 170 box at the office?
The RAID controller cache battery gave out on a whim. Still trying to get it back up :)
 

Xicaque

New Member
Mar 28, 2017
23
2
3
114
Olympia
Are those IBM AS/400s still in production? I remember those from my first job when I got out of the army. The owner was practically jumping for joy when they upgraded from another unit (can't recall the model).
 

vl1969

Active Member
Feb 5, 2014
634
76
28
Are those IBM AS/400s still in production? I remember those from my first job when I got out of the army. The owner was practically jumping for joy when they upgraded from another unit (can't recall the model).
Yes, some of them are still chugging along just fine.
We actually have 2 units. One is a PROD system that the company has had for almost 20(!) years,
and the apps we are still running have been in development for over 30(!).
In 1999 they upgraded from a true System/36 machine to the AS/400 170 mini (we still have the original unit in storage),
moved the in-house system to it, and updated to RPGII.
Most recently, in 2014, we tried to move to an iSeries 720 machine, but we are having problems converting the in-house system to the new standard. It's all RPGII running on SSP, and the new 720 will not run that.
Also, we have about 20 Twinax specialty printers that we cannot replace, and the new 720 does not support Twinax :)
We hired and fired one IBM consultant, and now have a second one helping us convert.
It's a real mess.

We got a second 170 machine as a backup in case PROD fails. I have been having a hell of a time setting up a cloning solution to mirror PROD to BACKUP in an optimal fashion.
I built a CLP + FTP setup that SAVEs the M36 partition and several other LIBs into SAVE files overnight,
FTPs them to the clone system, and restores them there.
 

Fritz

Well-Known Member
Apr 6, 2015
3,386
1,386
113
70
I have a Seagate 1TB HD that has had 154 bad sectors for at least a decade. In spite of this it has performed flawlessly. Just this week the count went to 155. :(
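
For anyone who would rather be told when a count like that ticks from 154 to 155, here is a rough sketch. It assumes the drive is read with smartmontools and that the text comes from `smartctl -A /dev/sdX` (device path is a placeholder); the function names are made up for illustration. It only pulls the raw value of the `Reallocated_Sector_Ct` attribute and compares it with a previously saved number.

```python
def reallocated_sectors(smartctl_output):
    """Return the raw Reallocated_Sector_Ct value, or None if not present.

    Expects the attribute table printed by `smartctl -A`, where each row has
    ten whitespace-separated columns and RAW_VALUE is the last one.
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] == "Reallocated_Sector_Ct":
            return int(fields[9])
    return None


def grew(previous, current):
    """True when both readings exist and the count has gone up."""
    return previous is not None and current is not None and current > previous
```

How the previous reading is stored (a text file, a monitoring system) is left open; smartd can also send alerts on its own.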
 

brinox

Member
May 7, 2013
48
10
8
Bought a brand new X10SDV-TLN4F from Newegg last week, got it on Friday. I temporarily started with some desktop DDR4 memory because reasons, and had a seemingly random freeze on the system. I first suspected the RAM, and exchanged it at my local Micro Center for some proper Crucial ECC memory. Got home and the board wouldn't even POST anymore. Very weirdly, on the side of the board where power comes in and DIMMA1 is, a few components were ridiculously hot, and the single stick of memory I had installed there was hot to the touch. Mind you, I had only connected power here; I hadn't hit the power button yet.

After about 20 more minutes of troubleshooting and letting the board sit out by itself with zero power connected, CMOS battery included, I tried yet again to get it going. This time, there was a faint but audible super-high-pitched buzzing noise after connecting the power supply. Also, the aforementioned far side of the board and memory weren't getting hot anymore. I'm pretty sure I watched a $900 board commit suicide. Yes, it was very disconcerting. However, the replacement hopefully comes today!
 

William

Well-Known Member
May 7, 2015
789
252
63
66
OUCH !!!

Think it might be your power supply ?
Do any caps look damaged on the motherboard ?
 

brinox

Member
May 7, 2013
48
10
8
OUCH !!!

Think it might be your power supply ?
Do any caps look damaged on the motherboard ?
Nope, tried 3 different PSUs, all appear to be in working order.

Oh I forgot to mention! This failure also includes the oh-so-lovely smell of fried silicon and PCB. I took a magnifying glass to that board before shipping back and found nothing.

I just got home and have the replacement mainboard powered up, but only using the 24-pin ATX PSU connector, since I read that this board doesn't need that. It's loading up ESXi as we speak, and my near-complete passthrough.map change appears to have worked. For the deeply discerning, I've successfully passed through the Xeon-D Lynx Point AHCI controller. Now I'm just hoping to get some NVMe storage provisioned using the onboard M.2 slot to verify the passthrough stuff...
 

Terry Kennedy

Well-Known Member
Jun 25, 2015
1,142
594
113
New York City
www.glaver.org
As you have failures, do share them in this thread.
I ran into something odd today which reminded me of the heatsink issue in your original post...

My company was one of the first to provide a "we take it back for free and responsibly recycle it" service as a standard part of every system sold (we started doing this more than 15 years ago). So we get a lot of old stuff back, which we either refurbish and donate to people who can use older systems or part out, making sure that nothing ends up as landfill or gets sent to one of those toxic offshore "recycling towns".

We got a couple of Dell PowerEdge 2950s back that we'd sold when they were new. As part of the open / inspect / wipe drives / upgrade all firmware process, I discovered that one of the support chip heatsinks was no longer attached to anything and was just floating loose in the case. This is the type of heatsink with a springy clip that goes into 2 loops on the motherboard, one at each diagonal corner of the motherboard - the black heatsink at the upper center of this picture (not my pic). The top right loop (just above the black SATA connector in that picture) was completely missing. A thorough search of the chassis (including removing the motherboard tray and then removing the motherboard from the tray) didn't turn up the part - it must have been missing since factory assembly.

I fabricated a replacement by cutting the fat end of a paperclip off and soldering it into the two motherboard holes for the loop. I then cleaned the top of the chip and the bottom of the heatsink, put some heatsink compound on, and re-attached the heatsink.

The system is now undergoing a 24-hour burn-in test and seems to be working perfectly.
 

funkywizard

mmm.... bandwidth.
Jan 15, 2017
848
402
63
USA
ioflood.com
1 drive out of 12 of the same model, and I have 44 ultrastars spinning away with no issues so far! Probably just jinx'd myself though. Even Backblaze data seems to show they are better than average. Seagate however, I wouldn't take for free.
Now that there's burstcoin, those Seagates are worth just a smidge over $0. Prior to that "use case", Seagate has been "shoot on sight" over here for a while now.
 

funkywizard

mmm.... bandwidth.
Jan 15, 2017
848
402
63
USA
ioflood.com
Can't seem to find the pictures, but I had a Dell C6220 chassis fail spectacularly. On the power distribution board, one or more of the power regulation transistors literally caught fire. The 4 motherboards have since been sitting aside as spares, as I've got no chassis to put them into.
 

niekbergboer

Active Member
Jun 21, 2016
154
59
28
46
Switzerland
Now that there's burstcoin, those Seagates are worth just a smidge over $0. Prior to that "use case", Seagate has been "shoot on sight" over here for a while now.
Absolutely: I have spent a grand total of $0 (CHF 0, really) on Burstcoin so far, since I had four 4TB WD Reds lying around, each with a small collection of bad sectors. Sure, I get some read errors on a small subset of the nonces, but that's fine: 99% of the nonces will be read just fine.
 

alex_stief

Well-Known Member
May 31, 2016
884
312
63
38
Nothing spectacular: two 8GB DDR3 RDIMMs from Samsung and two hard drives, one Seagate and one Toshiba. Sooner or later all of our HDDs will be WD.
 

Fritz

Well-Known Member
Apr 6, 2015
3,386
1,386
113
70
Just had a Supermicro BPN-SAS2-846EL1 screw the pooch. :( All drives dropped, never to be seen again. I had a spare SAS1 BP that I replaced it with, and since the box only houses 2TB drives I'll be leaving it in.