Intel Datacenter SSD "Assert Mode"


ServerSemi

Active Member
Jan 12, 2017
Intel Optane SSD DC P4800X 375GB Review - Enterprise 3D XPoint | Write Pressure, Conclusion, Pricing, and Final Thoughts

"Warning (to non-IT pros)
more important in the case of Intel Datacenter parts is the matter of 'assertion'. IT specialists don't like wasting time on intermittent faults and silent data corruption. If something is wrong in the slightest, an IT pro just wants the thing to fail hard so they can replace it and get that portion of their network back up ASAP. As such, Intel programs their DC SSD firmware to enter an 'assert mode' at the slightest sign of trouble. An asserted Intel SSD is effectively a bricked SSD that won't do anything further, as it is meant to be replaced. Even if most of the data was good, it will no longer be readable. That's not to say Intel's Datacenter SSDs are bricking left and right, but an SSD 750 (the consumer version of the P3xxx) will push through many faults and attempt to continue operating, while those same issues would instantly assert a P3520"


I was looking at these cheap NVMe Intel datacenter drives on eBay to get one for a workstation, but now I don't know if this is a good idea. Can anyone please explain this in more detail?
 

MiniKnight

Well-Known Member
Mar 30, 2012
NYC
They've got an under-0.3% chance of failure. You can also wear them out with too many writes, which is unlikely.

So you still either mirror or use erasure coding for fault tolerance. Intel is just quick to flag a failing drive so it can be replaced.
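That fail-fast behavior is exactly what a redundancy layer wants. Here's a rough Python sketch (illustrative names only, not real storage code) of a mirrored read path: a hard-failed, "asserted" replica is skipped instantly and the healthy copy serves the data, with no slow retries in between:

```python
# Toy model of a mirrored read path. A Replica stands in for one drive;
# an asserted drive behaves as hard-failed and errors immediately.

class Replica:
    def __init__(self, data, failed=False):
        self.data = data
        self.failed = failed  # asserted drive = hard failure, no retries

    def read(self, key):
        if self.failed:
            raise IOError("device asserted / offline")
        return self.data[key]

def mirrored_read(replicas, key):
    """Return the value from the first replica that answers."""
    last_err = None
    for r in replicas:
        try:
            return r.read(key)
        except IOError as e:
            last_err = e  # hard failure: fail over instantly
    raise last_err

good = Replica({"block42": b"payload"})
bad = Replica({"block42": b"payload"}, failed=True)
print(mirrored_read([bad, good], "block42"))  # b'payload'
```

The point is that the dead replica costs one immediate exception, not a long timeout, so the mirror runs at full speed.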

If you're buying on eBay, you may want to put a premium on drives with valid warranties, or just buy an extra drive as a spare.
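If you do pick one up on eBay, checking the SMART log on arrival is cheap insurance. A rough sketch, assuming nvme-cli's `nvme smart-log` text output (exact field formatting can vary between nvme-cli versions, so the parser here is illustrative):

```python
# Sketch: read the NVMe SMART critical_warning field, which is non-zero
# when the drive reports a health problem.

import re
import subprocess

def parse_critical_warning(smart_text):
    """Pull the critical_warning field out of `nvme smart-log` text."""
    m = re.search(r"critical_warning\s*:\s*(\S+)", smart_text)
    if m is None:
        raise ValueError("critical_warning field not found")
    return int(m.group(1), 0)  # accepts both decimal and 0x-prefixed

def drive_healthy(dev="/dev/nvme0"):
    """Run nvme-cli against a device (requires root and nvme-cli)."""
    out = subprocess.run(["nvme", "smart-log", dev],
                         capture_output=True, text=True, check=True).stdout
    return parse_critical_warning(out) == 0

# Parser demo on a canned sample (no hardware needed):
sample = "critical_warning : 0\ntemperature : 38 C\n"
print(parse_critical_warning(sample))  # 0
```

Zero means no critical warning; anything else is a reason to return the drive while you still can.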

We have thousands... this rarely ever happens on Intel drives.
 

Evan

Well-Known Member
Jan 6, 2016
The "fail hard" part, and why it's supposed to be good:
Intermittent failures, or lots of retries, hurt performance in a high-demand environment to the point where the environment 'just' works but is effectively useless. A hard failure means the failed item is simply taken out of service and the redundant items continue at full speed.

It’s very easy to detect a hard failure and then fail over, but it’s very hard to know what to do when something is just slow and retrying on errors.
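A toy back-of-the-envelope of that point, counting simulated time units instead of sleeping (all the numbers here are invented for illustration):

```python
# Why fail-fast beats slow retries, in made-up time units.

RETRY_COST = 50   # time units burned per timed-out retry on a flaky device
READ_COST = 1     # time units for a healthy read

def read_with_retries(flaky_attempts):
    """Flaky device: each failed attempt eats a full timeout before the
    request finally succeeds on the same device."""
    return flaky_attempts * RETRY_COST + READ_COST

def read_fail_fast():
    """Asserted device: the request errors instantly, and the caller
    fails over to a redundant copy at full speed."""
    failover_cost = 1
    return failover_cost + READ_COST

print(read_with_retries(3))  # 151 time units limping along
print(read_fail_fast())      # 2 time units after a clean failover
```

The flaky drive keeps "working", so nothing trips the failover logic, and every request pays the retry tax; the asserted drive is out of the picture after one cheap error.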