Intel DC S3500 SSD - What could cause the PLP capacitor to fail?


netswitch

Member
Sep 24, 2018
44
17
8
Hello

We have multiple arrays with Intel DC S3500 800 GB drives, and they have been quite solid until now.

We had a strange incident this week: two drives reported SMART pre-fail values because of a capacitor failure at the same time.
We have replaced the drives; they are old and out of warranty, but we are concerned by the following facts:
- The capacitors in the two drives seem to have died at the same time; both report a discharge time of 1 microsecond.
- The drives differ in age and wear-out (one has a CVWL serial number, the other a PHWL serial number).
- Both had a warranty period until 10/2019.
- We have 50+ DC SSDs now and must have used more than 100 over the years, and we never had this kind of issue until now.
- This seems to be a very rare event, as there is only one mailing list post mentioning such a failure.
- The drives are still perfectly fine performance-wise; they just no longer have power loss protection.

So we are wondering whether this was exceptionally bad luck or whether there is something to dig into further on our side, in the backplane or the power supply of the server.

Does anyone have an idea what external cause could make a capacitor fail on an Intel DC S3500 SSD?
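
For anyone who wants to sweep the rest of a fleet for the same symptom, here is a rough Python sketch that reads the attribute table printed by `smartctl -A` and flags a failed capacitor test. It rests on assumptions not stated in this thread: that smartmontools reports the capacitor self-test on these drives as attribute 175 (`Power_Loss_Cap_Test`), and that a failure shows up as the normalized value dropping to or below the threshold. The sample table in the code is made up for illustration and is not output from the drives discussed here.

```python
# Assumption: on Intel DC-series drives, smartmontools lists the capacitor
# self-test as attribute 175 ("Power_Loss_Cap_Test"). Check your own drives.
PLP_ATTRIBUTE_ID = 175


def parse_smart_attributes(text):
    """Parse the attribute table printed by `smartctl -A` into a dict keyed by ID."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split(None, 9)
        # Table rows have 10 columns and start with a numeric attribute ID;
        # this also skips the "ID# ATTRIBUTE_NAME ..." header row.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attrs[int(fields[0])] = {
            "name": fields[1],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "type": fields[6],
            "when_failed": fields[8],
            "raw": fields[9].strip(),
        }
    return attrs


def plp_failed(attrs, attr_id=PLP_ATTRIBUTE_ID):
    """Return True if the attribute is failing, False if healthy, None if absent."""
    attr = attrs.get(attr_id)
    if attr is None:
        return None
    return attr["value"] <= attr["thresh"] or attr["when_failed"] != "-"


# Hypothetical sample table, invented for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       41250
175 Power_Loss_Cap_Test     0x0033   001   001   010    Pre-fail  Always   FAILING_NOW 1 (20 3000)
"""

if __name__ == "__main__":
    attrs = parse_smart_attributes(SAMPLE)
    print("PLP capacitor test failed:", plp_failed(attrs))
```

In practice you would feed it the text of `smartctl -A /dev/sdX` for each drive. Drives behind a RAID controller may need an extra `-d` option, depending on the controller.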
 

funkywizard

mmm.... bandwidth.
Jan 15, 2017
848
402
63
USA
ioflood.com
In general, capacitors fail due to poor design / poor quality, heat (especially over an extended period of time), or electrical abuse (such as a surge).

Due to both failing at the same time, I'd lean more towards a surge, but that's just a guess.
 

b-rex

Member
Aug 14, 2020
59
13
8
I'm going with a surge or poor power conditioning. Way back in the day, when Catalyst 3750s ruled the network closets, I remember getting calls about failed switches. For the most part, those switches are excellent. I'd still buy them today; I have 4 at the moment, very solid devices. Yet what I found interesting is that, for a good number of the failed switches, one of the common events before they failed was a power outage or generator test. I remember seeing them fail to POST and then randomly they would. It turned out to be a known issue with the solder on the RAM modules, yet it didn't manifest itself until a power failure (a quick on/off like a generator test or a short blackout) and once the device hit a certain age.

Even modest surges like the ones you get daily from utility power can destroy the best of electronic devices. It could be heat, but we lost an Intel to a heat failure only once, and that one device was one of 24 in service in a largely undercooled environment. Age is a possibility; I mean, S3500s are getting up there age-wise. Yet I'd still put my bet on power quality. Age plus poor power quality is typically what drives devices to die. I have power conditioners at home for this reason, since most of my stuff is pushing 5-10 years old. That being said, sometimes the problem is the power supply in the server itself.
 

netswitch

Hello
Thank you both for your feedback. These servers are in an Interxion datacenter with UPS, and we monitor the power through the PDU; it seems OK.
But the server PSU or backplane could be the source of the issue. At the moment we suspect something with the backplane.

I will also try to replace the capacitors on the dead drives, as I have some spare dead S3500s lying around, to check whether they can be resurrected and used in testing / draft environments.