Three complete failures in a month! Intel SSD Issues...


b-rex

Member
Aug 14, 2020
59
13
8
I've been losing an alarming number of S3610. I've lost three in the last month. Is there something I don't know? I've been seeing reports of these drives dying after a number of idle hours...but would assume the most recent firmware would have taken care of that bug. It's been severely impacting my view of Intel for storage. Sadly, almost all of my SSD stock is Intel...mostly S3610 including the 24 $200 1.6TB versions. Are these things all ticking time bombs? I'm just wondering if it makes sense to unload them now while they're still alive in exchange for Samsung replacements. I've used Intel and Samsung almost exclusively for the last 10 years...never had an issue with Samsung but have had a ton of problems with Intel as of late. I'm very concerned considering I have 16 P3600, 8 P3608, 16 S3710 400, 16 S3610 400, 16 S3510 480, and 24 1.6TB S3610. Anyone else seeing these patterns?
 

b-rex

Member
Aug 14, 2020
59
13
8
Just to add to this...just pulled the disk that was showing as "dead" in VMware....booted up my Windows diagnostic machine with Intel's tools and shows up healthy. No smart errors...shows 100% life left......and one of the others did too earlier this month. Is this another VMware WTF? These were RDM that literally got ripped from the VMs they were in because VMware showed them as dead. The first one this happened to, I replaced even though it was showing as healthy on other machines because...well I thought it was best to be safe. Having this one go in the same manner and showing up healthy has me suspicious though. Just for shits and giggles, I'm going to see if I can find the disk I pulled last week and check that one too.

Thoughts? Which is more likely VMware bug or Intel failure?
 
  • Wow
Reactions: ColdCanuck

b-rex

Member
Aug 14, 2020
59
13
8
Okay...another update...I decided to boot my ESX hypervisor into PartedMagic and started looking at SMART a little bit closer....and saw this for most of my disks...very, very disconcerting. Time to sell. Here's a word of warning to anyone buying stuff on eBay: don't buy used SSD. I've got thousands invested in these disks, all showed healthy in Intel MAS...but looking closer at smart data tells a different story. Some of these were purchased in the last few months. Too late to return...but based on the deal I got, pretty obvious why they were cheap.
 

Attachments

Stephan

Well-Known Member
Apr 21, 2017
946
715
93
Germany
Before jumping to conclusions, try the original Windows Intel Toolbox first for a diagnostic and health scan. Then try a 24h stress test exercising the full disk. I usually use Linux with "shred -vzn1 /dev/sdX" as the command in a loop. That writes random numbers and then zeros sequentially. If the SSD survives and still shows good health in Intel Toolbox, something else is imho fishy.
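
For anyone wanting to script the loop described above, here is a rough sketch in Python. It only wraps the same "shred -vzn1" command and repeats it until a time budget runs out. The function name stress_loop and its parameters are made up for illustration, /dev/sdX is a placeholder, and shred destroys everything on the target, so treat this as a starting point under those assumptions and not a tested procedure.

```python
import subprocess
import time


def stress_loop(device, hours=24.0, run=subprocess.run, clock=time.monotonic):
    """Repeat `shred -vzn1 <device>` until `hours` have elapsed.

    WARNING: shred overwrites the whole device. All data on it is lost.
    `run` and `clock` are injectable (hypothetical parameters) so the loop
    can be dry-run without touching a real disk.
    Returns the number of completed shred passes.
    """
    deadline = clock() + hours * 3600
    passes = 0
    while clock() < deadline:
        # -v: show progress, -n1: one pass of random data, -z: final pass of zeros
        run(["shred", "-vzn1", device], check=True)
        passes += 1
    return passes


# Example call (needs root, destructive, device name is a placeholder):
# stress_loop("/dev/sdX", hours=24)
```

The deadline is only checked between passes, so the last pass can run past the time budget. With check=True, a shred that exits non-zero (for example because the drive dropped off the bus) raises an exception and stops the loop, which is the signal you are looking for in this kind of test.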
 
  • Like
Reactions: T_Minus

b-rex

Member
Aug 14, 2020
59
13
8
Before jumping to conclusions, try the original Windows Intel Toolbox first for diagnostic and health scan. Then try a 24h stress test exercising the full disk. I usually use Linux with "shred -vzn1 /dev/sdX" as the command in a loop. That writes random numbers and then zeros sequentially. If SSD survives and shows good health in Intel Toolbox still, something else is imho fishy.
Yeah, I think I'm just panicking. The soft error rates aren't that alarming...but the fact that I've dropped three SSD has me concerned something is up...whether it's with the SSD or something else...I have a host completely down at the moment. As for the stress test, yeah...I'll be doing that on the one that failed today.
 

b-rex

Member
Aug 14, 2020
59
13
8
I tested them and confirmed that the three that died were actual SSD failures. Two shut down completely and I can't get them to be recognized. The third actually locked up the system it was being tested on. I had another throw SMART warnings over the weekend in another server. Pretty disappointing. I talked to a guy at work and he said he's seen similar issues with the S3610 and S3510. I'm guessing it would apply to S3710 as well.

For what it's worth, the last however many years I've dealt with enterprise storage, I've always sworn by Intel. The last year or two though, I've seen a few WTF occurrences after firmware updates that at first I shrugged off as flukes. I'm not sure if it's their updated firmware or what, but I've got thousands of dollars worth of Intel SSD that I feel like are ticking time bombs now. After some research elsewhere, I'm not the only one having these types of mass failures either. Some blame power, some blame firmware. I use power conditioning equipment on both of my stacks and 2 Eaton 5PX3000...I've had them running for a long time with no issues...so I'm not thinking it's power. Which leads me to believe that the most recent firmware updates might be the culprit. Given the fact that I do not expect them to release any more firmware updates, I'm going to start selling my SSD stock after I replace them. Sadly...I don't think the wife is going to let me buy several thousands of dollars worth of replacement SSD right now though.

Any tips on solid SSD bargains? Seems like prices have actually gone up on eBay for a lot of the stuff I'd consider "good."
 

b-rex

Member
Aug 14, 2020
59
13
8
There were some deals for the 400GByte model where people from STH bought hundreds of them on eBay, and so far there are no posts about them dying in large quantities...
Given the failures I've experienced with the S3610, I'm not willing to take a chance on the S3710 (or any Intel, for that matter) despite the fact that they're higher quality. I'm going to go with the SM883.

I just hope my more expensive P3608, P3600, and larger size S3610 last. The S3610 1.6 are starting to show errors though...and they were not cheap.
 

Stephan

Well-Known Member
Apr 21, 2017
946
715
93
Germany
Your eBay profile will suffer if buyers realize those Intel SSDs were near end of life and others write the same in their comments.

Intel really has lost its way, in many ways. Same with Texas Instruments: where 20 years ago you really could buy any chip for projects, nowadays the company seems to be run by MBAs who know jack shit and jack has left town.

I recommend Samsung for SSDs, their stuff seems to outlive its specs. Have a couple 840 Pro that just won't die.
 

Fritz

Well-Known Member
Apr 6, 2015
3,392
1,394
113
70
Just checked. I have a S3610 in a spare server sitting idle at the moment. Note to self, replace it before putting it to work.
 

redeamon

Active Member
Jun 10, 2018
291
207
43
I know we expect things to last forever, and for some companies this is the case (Samsung in particular makes great enterprise SSDs- I've never had a Samsung die on me), but for some companies End of Life is literally 5 years. If I recall S3610's are at least 7 years old by now and well past their expected "lifetime".

I'm not saying I agree with this, but this is partially the reason why data centers refresh every 5 years, or even 3 years- with most of that hardware ending up on eBay.
 
  • Like
Reactions: Samir

redeamon

Active Member
Jun 10, 2018
291
207
43
Given the failures I've experienced with the S3610, I'm not willing to take a chance on the S3710 (or any Intel, for that matter) despite the fact that they're higher quality. I'm going to go with the SM883.

I just hope my more expensive P3608, P3600, and larger size S3610 last. The S3610 1.6 are starting to show errors though...and they were not cheap.
The PM/SM883(a)'s are VERY good drives. They're super popular right now in the DC space.
 
  • Like
Reactions: Samir

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
7,652
2,066
113
I know we expect things to last forever, and for some companies this is the case (Samsung in particular makes great enterprise SSDs- I've never had a Samsung die on me), but for some companies End of Life is literally 5 years. If I recall S3610's are at least 7 years old by now and well past their expected "lifetime".

I'm not saying I agree with this, but this is partially the reason why data centers refresh every 5 years, or even 3 years- with most of that hardware ending up on eBay.

There are literally 100s of datacenters that will run hardware until well past EOL, until they run out of spares to fix/replace and keep things going. Then there are 100s more who buy the used stuff up on eBay and put it into service for another 5-10 years.

5- and 3-year refresh cycles are for enterprise businesses, and for more reasons than just "EOL, hardware is going to die, replace replace replace..."
Depreciation schedules and warranty/replacement are the big ones, not that the hardware is just going to randomly die suddenly.


This guy had 3 failures and a lot of you are acting like END OF THE WORLD for Intel drives, it's rather alarming.
If you google S3610 failure you come up with an STH article and maybe a couple of others. There are not loads of issues with these drives, let alone other Intel models. I don't think Intel has lost their way at all, I think some of you are freaked out over nothing.
 
  • Like
Reactions: i386 and itronin

b-rex

Member
Aug 14, 2020
59
13
8
There are literally 100s of datacenters that will run hardware until well past EOL, until they run out of spares to fix/replace and keep things going. Then there are 100s more who buy the used stuff up on eBay and put it into service for another 5-10 years.

5- and 3-year refresh cycles are for enterprise businesses, and for more reasons than just "EOL, hardware is going to die, replace replace replace..."
Depreciation schedules and warranty/replacement are the big ones, not that the hardware is just going to randomly die suddenly.


This guy had 3 failures and a lot of you are acting like END OF THE WORLD for Intel drives, it's rather alarming.
If you google S3610 failure you come up with an STH article and maybe a couple of others. There are not loads of issues with these drives, let alone other Intel models. I don't think Intel has lost their way at all, I think some of you are freaked out over nothing.
While I generally agree that a few dead drives, out of however many are out there, shouldn't by themselves mean Intel has lost its way, there's plenty in other areas to suggest that, in the aggregate, they have. I think everyone on this board who's involved with enterprise operations gets the need for refreshes, but SSD failure at 5 years (released in 2015...mine were produced in '16) is not the norm, especially when the drives are nowhere close to their endurance ratings.

Aside from MTBF, with the thousands of SSD I've worked with in my years in this field...failures even long past 5 years are extremely rare. I've heard more than enough now from colleagues that for hardware they have with life extensions (pushing 6-7 years), an Intel SSD failure is not at all uncommon. This contrasts with Micron, Samsung, and even HGST devices where the same can't be said: like the Intel of lore, a failure is unheard of. This is not limited to SSD either. There are major changes in the semiconductor industry where Intel is falling behind fast. People downplay AMD or even the increased feasibility of using highly performant RISC architectures for more complex workloads using specialized chips...Intel needs to pick up the slack or they're going to be in trouble. That is their bread and butter...and they've fallen behind, and fast. SSD issues won't hurt them...processor issues will and they've certainly lost their way there too.

One other thing:

We might not see it posted everywhere because a lot of these drives have been destroyed or recycled by now, but the experience I've had along with shared experience from my colleagues suggests that Intel failures are far more common than almost any other SSD brand at this point. I used to chalk them up to flukes...even blamed power and whatever for the failures I've seen in the past...with these same devices...and nah. Between the fact that they just don't make them like they used to and that their customer service is outright atrocious, I'm glad almost all of the Intels in my purview at work are OEM supported.

Besides, while we're familiar with these drives on STH...they're not common. People aren't going to eBay looking for used enterprise-grade SSD, because well, why would they? Businesses don't want them and consumers don't know about them. They're looking for the stuff they're familiar with...the stuff they see on the shelves at Best Buy. I'm not surprised we're not seeing vast numbers of people claiming these drives have failed...because, especially used, they're not that popular outside of a datacenter. We're the only geeks buying them.
 

b-rex

Member
Aug 14, 2020
59
13
8
The PM/SM883(a)'s are VERY good drives. They're super popular right now in the DC space.
Yeah, I love them and a few of my colleagues do too...they're great drives, pretty much all around. We haven't had a single failure with the several hundred we've been involved in putting in service. That's generally been the experience we have with Samsung devices though...they're very good. Aside from a few issues I've experienced with 1725s -- they're extremely reliable. Even their consumer line is fantastic.
 

b-rex

Member
Aug 14, 2020
59
13
8
Your eBay profile will suffer if buyers realize those Intel SSDs were near end of life and others write the same in their comments.

Intel really has lost its way, in many ways. Same with Texas Instruments: where 20 years ago you really could buy any chip for projects, nowadays the company seems to be run by MBAs who know jack shit and jack has left town.

I recommend Samsung for SSDs, their stuff seems to outlive its specs. Have a couple 840 Pro that just won't die.
Yeah, I decided to take the risk on the ones I have left with more frequent back-ups and more detailed SMART monitoring. I'm just going to hope for the best.
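
For the "more detailed SMART monitoring" part, a minimal sketch of what that could look like: it parses the attribute table that `smartctl -A` prints and flags anything that has already failed or whose normalized value is close to its threshold. The function names, the `margin` default, and the sample rows in the usage note are all made up for illustration; they are not real readings from any of the drives in this thread.

```python
import re

# One attribute row of `smartctl -A` output, columns:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+(\S+)\s+(.*)$"
)


def parse_attributes(text):
    """Turn the attribute table into a list of dicts, skipping non-row lines."""
    attrs = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs.append({
                "id": int(m.group(1)),
                "name": m.group(2),
                "value": int(m.group(3)),
                "worst": int(m.group(4)),
                "thresh": int(m.group(5)),
                "when_failed": m.group(6),
                "raw": m.group(7).strip(),
            })
    return attrs


def flag_attributes(attrs, margin=10):
    """Names of attributes that failed or sit within `margin` of their threshold.

    `margin` is an arbitrary illustrative cutoff, not a vendor recommendation.
    Attributes with a threshold of 0 are only flagged if WHEN_FAILED is set.
    """
    flagged = []
    for a in attrs:
        failed = a["when_failed"] != "-"
        near = a["thresh"] > 0 and a["value"] - a["thresh"] <= margin
        if failed or near:
            flagged.append(a["name"])
    return flagged
```

Usage would be something like feeding it the text captured from `smartctl -A /dev/sdX` on a schedule and alerting when flag_attributes returns anything. Normalized values only tell part of the story, so it is probably worth logging the raw values over time as well and watching for changes.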

As for the comment about Samsungs - I know someone who uses 840/850 in servers. I'm not kidding. He was saying he's even got EVOs out there that take the abuse in his dev/tst environments where he's got no budget. He works with his hardware reps to only get Samsung drives for everything he puts in service.
 
  • Like
Reactions: Samir

funkywizard

mmm.... bandwidth.
Jan 15, 2017
848
402
63
USA
ioflood.com
Just to add to this...just pulled the disk that was showing as "dead" in VMware....booted up my Windows diagnostic machine with Intel's tools and shows up healthy. No smart errors...shows 100% life left......and one of the others did too earlier this month. Is this another VMware WTF? These were RDM that literally got ripped from the VMs they were in because VMware showed them as dead. The first one this happened to, I replaced even though it was showing as healthy on other machines because...well I thought it was best to be safe. Having this one go in the same manner and showing up healthy has me suspicious though. Just for shits and giggles, I'm going to see if I can find the disk I pulled last week and check that one too.

Thoughts? Which is more likely VMware bug or Intel failure?
may need to just reseat the drives
 

funkywizard

mmm.... bandwidth.
Jan 15, 2017
848
402
63
USA
ioflood.com
I tested them and confirmed that the three that died were actually SSD failures. Two completely shutdown, can't get them to be recognized. The one actually locked up the system it was being tested on. I had another throw smart warnings over the weekend in another server. Pretty disappointing. I talked to a guy at work and he said he's seen similar issues with the S3610 and S3510. I'm guessing it would apply to S3710 as well.

For what it's worth, the last however many years I've dealt with enterprise storage, I've always sworn by Intel. The last year or two though, I've seen a few WTF occurrences after firmware updates that at first I shrugged off as flukes. I'm not sure if it's their updated firmware or what, but I've got thousands of dollars worth of Intel SSD that I feel like are ticking time bombs now. After some research elsewhere, I'm not the only one having these types of mass failures either. Some blame power, some blame firmware. I use power conditioning equipment on both of my stacks and 2 Eaton 5PX3000...I've had them running for a long time with no issues...so I'm not thinking it's power. Which leads me to believe that the most recent firmware updates might be the culprit. Given the fact that I do not expect them to release any more firmware updates, I'm going to start selling my SSD stock after I replace them. Sadly...I don't think the wife is going to let me buy several thousands of dollars worth of replacement SSD right now though.

Any tips on solid SSD bargains? Seems like prices have actually gone up on eBay for a lot of the stuff I'd consider "good."
We've had basically no issues with 3600/3610/3700/3710 (hundreds of them in use), but we also don't update the firmware, so I think you're right that the firmware update you did may be causing your problems.
 
  • Like
Reactions: Samir and T_Minus

funkywizard

mmm.... bandwidth.
Jan 15, 2017
848
402
63
USA
ioflood.com
Yeah, I decided to take the risk on the ones I have left with more frequent back-ups and more detailed SMART monitoring. I'm just going to hope for the best.

As for the comment about Samsungs - I know someone who uses 840/850 in servers. I'm not kidding. He was saying he's even got EVOs out there that take the abuse in his dev/tst environments where he's got no budget. He works with his hardware reps to only get Samsung drives for everything he puts in service.
Yeah, we used to do that. A few hundred RMAs later and we learned our lesson. Don't put Samsung "Pro" or "Evo" drives in servers. The endurance is atrociously bad.
 
  • Like
Reactions: Samir and T_Minus