Hardware failures in 2017 - Post yours!


csc2ya

New Member
Feb 1, 2018
1
0
1
39
Hi all....first post...good to be here. I'm male, 33 years of age, from the UK. I've always loved servers and computers.

Had 2 hardware failures recently, both on my servers. My Supermicro box at OVH running Windows had its motherboard fail at the end of last year. The second mirrored drive went first, and OVH replaced it. I rebuilt its mirror...but still had issues. OVH checked it again, and ended up replacing the motherboard. I updated the MAC address in Windows according to their instructions, and it was back online. That same machine has been nearly entirely replaced over the 3 years I've had it (2 motherboards, several hard drives, memory, the power supply more than once...all with no data loss though...thank god for mirrored drives).

My second Supermicro box at online.net (also running Windows) had some sort of failure at the beginning of this year. I'm unsure what was wrong (possibly the mobo), but they replaced that machine (a blade in a Supermicro MicroCloud) entirely.

Fortunately no important data was lost in either case. The online.net box was a complete loss, but I had local backups of that one.
 

TeeJayHoward

Active Member
Feb 12, 2013
376
112
43
Well, it's technically 2018... But I've had my first hardware failure of the year, which somehow prompted an upgrade. It started out with me trying to migrate from vSphere to RHEV. I fired up a node I hadn't used in a while, and experienced a lot of issues with the ConnectX-2 card. Well, here's the reason:


Popped it back on, and good to go. On another project, I was ripping some Blu-rays, and suddenly started seeing errors. Turns out I had a dead disk. Well, what can you do? Spend the $35 to replace it?


Of course you could... OR, you could use it as an opportunity to upgrade. 6x12TB disks to add to the array, a card to turn one of the 4Us into a JBOD, a cable to attach them, and a bunch of leftover hardware from two lab revisions ago... Presto!



Hrm, while I'm in here, how about tidying up the cables a bit for better airflow?


...Okay, I still need to finish that project. But it's time to stop the work that I pay for, and start the work that pays me. And if I get a chance tonight, maybe I'll try and find out who makes lacing cable and replace all those twist ties.
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
7,625
2,043
113
Did you just pop it back on or secure it this time?

I'd say about 20-30% of the NICs I get in the mail have popped heatsinks, and honestly I just pop them back on here... but after hearing yours popped in a server I think I'll be securing them differently now :D
 

pricklypunter

Well-Known Member
Nov 10, 2015
1,708
515
113
Canada
That's usually a symptom of the little brass rivets having lost their tension due to heating/cooling cycles. I usually just remove the heatsink, clean it and the chip, and replace it with a dab of Arctic MX something or other. I also splay the rivets a little with a thin spudger to re-tension them. After having done that I have never had one detach on its own again :)
 

Patrick

Administrator
Staff member
Dec 21, 2010
12,511
5,792
113
I have to admit, I wish they just used some sort of thermal tape/cement on NIC heatsinks. This happens too often.
 

pricklypunter

Well-Known Member
Nov 10, 2015
1,708
515
113
Canada
In days gone by, manufacturers used to use chip bond, which was a sort of metal-loaded epoxy, to directly bond heatsinks to the plastic or ceramic chip carrier. Unfortunately, miniaturisation, new soldering and machining processes, modern chip packaging, cost, etc. all played a part in the demise of such practices. Well, that and folks also wanted to be able to pop a device, rework a board and replace it easily without resorting to a crowbar to get the device off the board. It is also conductive, so saying you need to be careful with it on a modern board is an understatement. Still, I keep some around for just those times when nothing else will do :)
 

TeeJayHoward

Active Member
Feb 12, 2013
376
112
43
I just popped it back on and used a pocket knife to spread the tabs out an absurd amount. Not sayin' it can't come back off, just that it'd be impressive if it did.
 

DRAGONKZ

Member
Apr 9, 2018
87
10
8
41
I had the CIMC fail on my Cisco C220 M3 and it now refuses to boot.

It’s impossible to get a board 2nd hand in AUS so I had to order one from the USA.

Playing the homelab game is so much easier and cheaper over there!!
 

Jaket

Active Member
Jan 4, 2017
232
119
43
Seattle, New York
purevoltage.com
Not a 2017 but 2018 fail.
We had a new custom-built deep learning system built out and shipped, but couldn't get everything working. After over a week of troubleshooting, it turned out we had to RMA the whole system. It was a 30k system, so it wasn't a fun process.
 

alex_stief

Well-Known Member
May 31, 2016
884
312
63
38
Curiosity got the best of me, so I opened up one of those crappy AIO liquid coolers that failed so frequently in HP's Z820 workstations. I'll spare you a description of the smell; here is just the picture.


Why not just rename this thread. Remove the date, then we won't need another one.
 

Markess

Well-Known Member
May 19, 2018
1,146
761
113
Northern California
Yuck! But I've seen that elsewhere before. I've got no experience with those HPs, but is the radiator aluminum (or some other non-copper metal)? I've seen some older AIO units with mixed metals fail after whatever stabilizing agent they'd had in them got old enough that it didn't work anymore.
 

alex_stief

Well-Known Member
May 31, 2016
884
312
63
38
Yeah, there are lots of things wrong here. Pairing copper with aluminium, using an unproven, low-volume design, probably the tubes leaching whatever they are made of... and most importantly, using a tiny AIO water cooler in the first place. The way it is designed, it offers zero benefit over an air cooler, and adds more points of failure.
 

Markess

Well-Known Member
May 19, 2018
1,146
761
113
Northern California
Yeah, with that size radiator, I can't imagine any of the ones with a pair of 150 W "W" series CPUs, or even 130 W ones, were quiet or cool. Even with the air channels, those things must have been like a furnace inside.