From a longer-term perspective, a reminder came up that we needed to refresh our earlier piece, Used enterprise SSDs: Dissecting our production SSD population.
As I was looking at the data a few weeks ago, we still had not experienced an Intel SSD failure. In fact, our failures in the three quarters after that article were absolutely minimal.
Just when I thought all was lost, I received this notification last evening about an Intel DC S3700 400GB failing:
Code:
This message was generated by the smartd daemon running on:
host name: fmt-pve-07
DNS domain: servethebiz.com
The following warning/error was logged by the smartd daemon:
Device: /dev/sdf [SAT], FAILED SMART self-check. BACK UP DATA NOW!
Device info:
INTEL SSDSC2BA400G3E, S/N:_____________, WWN:_________, FW:5DV10250, 400 GB
For details see host's SYSLOG.
You can also use the smartctl utility for further investigation.
Another message will be sent in 24 hours if the problem persists.
For those who are not familiar, this is one of the drive failure alert features in newer Proxmox VE versions. I do not remember exactly when SMART alerting was added, but I do know it is in PVE 4.4 and was not in PVE 4.1.
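The alert suggests following up with smartctl. If you want the same pass/fail check in your own scripts, here is a minimal Python sketch of one way to do it. It assumes smartmontools is installed and the script runs as root. The function names parse_health and check_device are hypothetical. It also only recognizes the ATA-style overall-health line, which is what a SATA drive like this S3700 reports; SAS drives print a different health line and will come back as unknown.

```python
import re
import subprocess
from typing import Optional

# Matches the overall-health line smartctl prints for ATA/SATA drives,
# e.g. "SMART overall-health self-assessment test result: PASSED".
# (Hypothetical helper; SAS/SCSI drives use a different wording.)
_HEALTH_RE = re.compile(
    r"SMART overall-health self-assessment test result:\s*(\w+)"
)


def parse_health(smartctl_output: str) -> Optional[bool]:
    """Return True for PASSED, False for any other reported result,
    or None if no ATA overall-health line is present."""
    match = _HEALTH_RE.search(smartctl_output)
    if match is None:
        return None
    return match.group(1) == "PASSED"


def check_device(device: str) -> Optional[bool]:
    """Run `smartctl -H` against a device such as /dev/sdf.

    smartctl reports problems through a non-zero, bitmask-style exit
    status, so a non-zero return code is not treated as an error here;
    only the printed health line is parsed.
    """
    result = subprocess.run(
        ["smartctl", "-H", device],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_health(result.stdout)
```

Calling check_device("/dev/sdf") on a host like this one would return False for a drive in the state shown in the alert. A cron job could loop over the drives and mail on anything that is not True. That said, smartd already does this continuously, so a sketch like this is mainly useful for ad hoc sweeps across many hosts.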