Intel Datacenter SSD "Assert Mode"


ServerSemi

Active Member
Jan 12, 2017
Intel Optane SSD DC P4800X 375GB Review - Enterprise 3D XPoint | Write Pressure, Conclusion, Pricing, and Final Thoughts

"Warning (to non-IT pros)
more important in the case of Intel Datacenter parts is the matter of 'assertion'. IT specialists don't like wasting time on intermittent faults and silent data corruption. If something is wrong in the slightest, an IT pro just wants the thing to fail hard so they can replace it and get that portion of their network back up ASAP. As such, Intel programs their DC SSD firmware to enter an 'assert mode' at the slightest sign of trouble. An asserted Intel SSD is effectively a bricked SSD that won't do anything further, as it is meant to be replaced. Even if most of the data was good, it will no longer be readable. That's not to say Intel's Datacenter SSDs are bricking left and right, but an SSD 750 (the consumer version of the P3xxx) will push through many faults and attempt to continue operating, while those same issues would instantly assert a P3520"


I was looking at these cheap NVMe Intel datacenter drives on eBay to get one for a workstation, but now I don't know if this is a good idea. Can anyone please explain this in more detail?
 

MiniKnight

Well-Known Member
Mar 30, 2012
NYC
They've got an under-0.3% chance of failure. You can also wear them out with too many writes, which is unlikely.

So you still either mirror or use erasure coding for fault tolerance. Intel is just quick to flag a failing drive so it can be replaced.
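That fail-fast behavior is exactly what a redundancy layer wants. Here's a rough Python sketch (illustrative names only, not real storage code) of a mirrored read path: a hard-failed, "asserted" replica is skipped instantly and the healthy copy serves the data, with no slow retries in between:

```python
# Toy model of a mirrored read path. A Replica stands in for one drive;
# an asserted drive behaves as hard-failed and errors immediately.

class Replica:
    def __init__(self, data, failed=False):
        self.data = data
        self.failed = failed  # asserted drive = hard failure, no retries

    def read(self, key):
        if self.failed:
            raise IOError("device asserted / offline")
        return self.data[key]

def mirrored_read(replicas, key):
    """Return the value from the first replica that answers."""
    last_err = None
    for r in replicas:
        try:
            return r.read(key)
        except IOError as e:
            last_err = e  # hard failure: fail over instantly
    raise last_err

good = Replica({"block42": b"payload"})
bad = Replica({"block42": b"payload"}, failed=True)
print(mirrored_read([bad, good], "block42"))  # b'payload'
```

The point is that the dead replica costs one immediate exception, not a long timeout, so the mirror runs at full speed.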

If you're buying on eBay, you may want to put a premium on drives with valid warranties, or just buy an extra drive as a spare.
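If you do pick one up on eBay, checking the SMART log on arrival is cheap insurance. A rough sketch, assuming nvme-cli's `nvme smart-log` text output (exact field formatting can vary between nvme-cli versions, so the parser here is illustrative):

```python
# Sketch: read the NVMe SMART critical_warning field, which is non-zero
# when the drive reports a health problem.

import re
import subprocess

def parse_critical_warning(smart_text):
    """Pull the critical_warning field out of `nvme smart-log` text."""
    m = re.search(r"critical_warning\s*:\s*(\S+)", smart_text)
    if m is None:
        raise ValueError("critical_warning field not found")
    return int(m.group(1), 0)  # accepts both decimal and 0x-prefixed

def drive_healthy(dev="/dev/nvme0"):
    """Run nvme-cli against a device (requires root and nvme-cli)."""
    out = subprocess.run(["nvme", "smart-log", dev],
                         capture_output=True, text=True, check=True).stdout
    return parse_critical_warning(out) == 0

# Parser demo on a canned sample (no hardware needed):
sample = "critical_warning : 0\ntemperature : 38 C\n"
print(parse_critical_warning(sample))  # 0
```

Zero means no critical warning; anything else is a reason to return the drive while you still can.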

We have thousands... this rarely ever happens on Intel drives.
 

Evan

Well-Known Member
Jan 6, 2016
The "fail hard" part, and why it's supposed to be good:
Intermittent failures, or lots of retries, hurt performance in a high-demand environment to the point where the environment 'just' works but is effectively useless. A hard failure means the failed item is simply taken out of service and the redundant items continue at full speed.

It’s very easy to detect a hard failure and then fail over, but it’s very hard to know what to do when something is just slow and retrying on errors.
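A toy back-of-the-envelope of that point, counting simulated time units instead of sleeping (all the numbers here are invented for illustration):

```python
# Why fail-fast beats slow retries, in made-up time units.

RETRY_COST = 50   # time units burned per timed-out retry on a flaky device
READ_COST = 1     # time units for a healthy read

def read_with_retries(flaky_attempts):
    """Flaky device: each failed attempt eats a full timeout before the
    request finally succeeds on the same device."""
    return flaky_attempts * RETRY_COST + READ_COST

def read_fail_fast():
    """Asserted device: the request errors instantly, and the caller
    fails over to a redundant copy at full speed."""
    failover_cost = 1
    return failover_cost + READ_COST

print(read_with_retries(3))  # 151 time units limping along
print(read_fail_fast())      # 2 time units after a clean failover
```

The flaky drive keeps "working", so nothing trips the failover logic, and every request pays the retry tax; the asserted drive is out of the picture after one cheap error.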