Toshiba SAS SSD double failure after 6 months powered off


abstractalgebra

Active Member
Dec 3, 2013
182
26
28
MA, USA
Toshiba 3.84TB SAS SSD double failure in a PERC RAID-1 after 6 months powered off


Running in a Dell R730 with a PERC H730 Mini. Amber lights on both drives. The other SSD in the box, a Crucial, continues to run fine.

iDRAC reported Physical Disk State: Failed for both drives


I can't get any more detailed errors from iDRAC or the PERC Configuration Utility


Has anyone else seen this? Any suggestions? Is there any chance the SSDs might un-fail after being powered up for a while? :)

P.S. Yes, RAID-1 is not a backup.
 

Mithril

Active Member
Sep 13, 2019
356
106
43
The drives may be fine, but you may need to check them with something that is not a RAID controller to see the real SMART data (and run a long self-test).
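A minimal sketch of that check, assuming smartmontools is installed and the drive shows up as a plain block device (for example on an HBA or in another machine). The helper name `smartctl_cmd` and the device path are placeholders, not anything from the original post. The `-d megaraid,N` option is how smartctl addresses a physical disk behind a MegaRAID-based controller such as the PERC, though whether a drive the controller has already marked Failed will answer that way is not something I can promise.

```python
import shutil
import subprocess


def smartctl_cmd(device, action="report", megaraid_id=None):
    """Build a smartctl argument list (hypothetical helper for illustration)."""
    cmd = ["smartctl"]
    if megaraid_id is not None:
        # Address physical disk N behind a MegaRAID-based RAID controller.
        cmd += ["-d", f"megaraid,{megaraid_id}"]
    if action == "report":
        # -x prints all available SMART / device information.
        cmd.append("-x")
    elif action == "long_test":
        # -t long starts the drive's extended (long) self-test in the background.
        cmd += ["-t", "long"]
    else:
        raise ValueError(f"unknown action: {action!r}")
    cmd.append(device)
    return cmd


def run(cmd):
    """Run the command and return its text output; usually needs root."""
    if shutil.which(cmd[0]) is None:
        raise RuntimeError("smartctl not found; install smartmontools first")
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout
```

Typical use would be `print(run(smartctl_cmd("/dev/sdX")))` for the report, then `run(smartctl_cmd("/dev/sdX", "long_test"))`, and another report once the test has had time to finish. Replace `/dev/sdX` with the actual device.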

The data may be toast (or damaged). SSDs do NOT do well powered off for long periods of time. They are (to oversimplify) effectively a bunch of small capacitors (charge traps) and there is some level of self-leakage and quantum shenanigans.
 

abufrejoval

Member
Sep 1, 2022
39
10
8
The subject of SSD data retention was explored in a bit more detail some years ago. But generally speaking, once the trapped charge has escaped, it's truly gone. SSDs are not designed for long-term archival storage, especially after they have been used intensively, or when they use more voltage levels to encode more bits per cell (QLC).

And that's why hard disk habits can bite. I for one have continued to use older hard drives as backup devices (although never as the single copy), because they tend to suffer little natural decay while powered off.

The worst combination for flash storage seems to have been:
  • using them intensively at high temperatures until they are starting to show signs of wear
  • writing them at high temperatures before storing
  • storing them at medium or high temperatures
At that point data seems to have gone within days.

And the safest way was the opposite of all of the above.

Things aren't quite as bad, even after an active life, when:
  • the last write before storage was at relatively cool temperatures
  • the drive is then stored in a cold place
It's important to note that enterprise SSDs are generally qualified for shorter data retention without power (six months), while consumer devices are supposed to last two years without power under nominal conditions. I'm not sure that's actually a consequence of the physical design differences, though. It might be more an issue of safety margins (and average use). One thing that's for sure is that the greater percentage of spare area in enterprise drives is of no benefit when the power is off.
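To put a rough number on the storage temperature effect described above, here is a small sketch using the Arrhenius relation, which is a common way to model thermally driven charge loss. The activation energy of 1.1 eV is an assumption on my part (a value often quoted for flash retention), not something from the sources I mentioned, and real drives will deviate from it.

```python
import math

# Boltzmann constant in eV per kelvin.
BOLTZMANN_EV_PER_K = 8.617333262e-5


def acceleration_factor(t_hot_c, t_cool_c, ea_ev=1.1):
    """How many times faster retention loss runs at t_hot_c than at t_cool_c.

    Arrhenius model: AF = exp(Ea / k * (1 / T_cool - 1 / T_hot)),
    with temperatures in kelvin. ea_ev is an assumed activation energy.
    """
    t_hot = t_hot_c + 273.15
    t_cool = t_cool_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_cool - 1.0 / t_hot))


def scaled_retention(retention_at_cool, t_hot_c, t_cool_c, ea_ev=1.1):
    """Scale a retention figure given at t_cool_c to the warmer t_hot_c."""
    return retention_at_cool / acceleration_factor(t_hot_c, t_cool_c, ea_ev)
```

For example, `acceleration_factor(40, 25)` gives the factor by which storing at 40 °C shortens retention relative to 25 °C under this model. Treat the result as an order-of-magnitude guide only.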

If the drives have seen little use, even at high operating temperatures and without cold storage they should last quite a while, but when they are powered up again I guess you should let them run long enough to finish internal maintenance and rewriting of marginal data blocks. A single complete (yet relaxed) read pass every other month and a bit of settling time would probably do a lot of good and allow you to monitor SMART data. Again, most SSDs simply aren't designed for archive use and will use idle time to do necessary housekeeping.
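A relaxed read pass like that could look roughly as follows. This is only a sketch: `read_pass` is a name I made up, the chunk size and pause are arbitrary, and whether a given drive's firmware actually refreshes marginal blocks in response to reads is an assumption.

```python
import time


def read_pass(path, chunk_size=1024 * 1024, pause_s=0.0):
    """Read a device or file from start to end and return the bytes read.

    pause_s inserts a short sleep after each chunk to keep the pass gentle.
    An unreadable sector surfaces as an OSError from read().
    """
    total = 0
    # buffering=0 avoids Python-level buffering; the OS may still cache reads.
    with open(path, "rb", buffering=0) as dev:
        while True:
            chunk = dev.read(chunk_size)
            if not chunk:
                break  # end of device or file
            total += len(chunk)
            if pause_s:
                time.sleep(pause_s)
    return total
```

Run as root against the whole device, for example `read_pass("/dev/sdX", pause_s=0.01)`, then leave the drive powered for a while and compare SMART data before and after.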

A trimming pass before and after a backup job helps reduce write amplification, and if you have written significant amounts of data, letting the drives settle so they can empty the cache area should help as well.

P.S. Not sure you'd want to activate an SSD straight from deep freeze, so if the use case is archive and you do cool them, let them warm to room temperature before use.
 