PM863a health status?

richard.dzavoronok · Jul 14, 2023

I won an auction for eight 1.92 TB PM863a drives. They were advertised as "at 70% health". I assumed that figure would be an indicator of TBW.
All of them show non-zero values for #5 Reallocated Sector Count. I assume that's all right for these drives: the values are in the ballpark of 20-50, and #180 Unused Reserved Block Count is ~6400, so there is plenty to spare for a few years (as long as the reallocation count doesn't start climbing rapidly).
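To put rough numbers on "plenty to spare", here is a minimal Python sketch. It rests on an assumption the thread does not confirm: that each reallocated sector consumed one block from the reserve pool counted by #180. The function names are made up for illustration, and the extrapolation is purely linear.

```python
# Rough headroom estimate for the spare-block pool.
# ASSUMPTION: one reallocated sector == one reserve block consumed.
# This is a simplification, not something Samsung documents in this thread.

def reserve_used_fraction(reallocated: int, unused_reserved: int) -> float:
    """Fraction of the original reserve pool already consumed."""
    total = reallocated + unused_reserved
    if total <= 0:
        raise ValueError("reserve pool size must be positive")
    return reallocated / total


def months_until_exhausted(unused_reserved: int, new_per_month: float) -> float:
    """Linear extrapolation; returns infinity if the count is not growing."""
    if new_per_month <= 0:
        return float("inf")
    return unused_reserved / new_per_month


# Worst drive in the batch: ~50 reallocated, ~6400 still unused.
print(f"{reserve_used_fraction(50, 6400):.2%} of the reserve used")
```

Under that assumption the worst drive has used well under 1% of its reserve, so the absolute count matters less than how fast it grows.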

But three of them show #187 Uncorrectable Error Count at around 20, and #195 ECC Error Rate has the same value on each drive (i.e. if #187 is 20, #195 is also 20).
GSmartControl reports these drives as being in a pre-fail state: it shows entries in the error log, red numbers, etc.

So I used Samsung's official DC Toolkit, since it is made specifically for the PM863(a) (and other) drives.
DC Toolkit reports that these drives are in good health.
I ran the extended offline test on every one of them, multiple times; the result is always "Completed without error".
I did some research, and badblocks will not help me determine their state.

Screenshots attached.

Each drive cost 46 euros including shipping, which is why I'm hesitant to return them.

I don't know what to do. Maybe you guys will have more insight on this and help me to decide?

rtech · Jul 14, 2023

Does #187 Uncorrectable Error Count actually matter on SSDs? AFAIK the controller will just mark the sector bad and write elsewhere.

Have you considered that the GUI could have been written with HDDs in mind? What does smartctl -a say?
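For anyone who wants to read the attribute table without a GUI's colour coding, here is a minimal Python sketch that parses the table printed by smartctl for ATA drives. The sample text is made up for illustration (it is not the poster's output; only the raw values echo the numbers mentioned above), and `parse_attributes` and `at_or_below_threshold` are hypothetical helper names.

```python
import re

# Illustrative sample only, NOT real output from the drives in this thread.
# The column layout follows the ATA attribute table that smartctl prints.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   099   099   010    Pre-fail  Always       -       35
180 Unused_Rsvd_Blk_Cnt_Tot 0x0013   099   099   010    Pre-fail  Always       -       6400
187 Uncorrectable_Error_Cnt 0x0032   099   099   000    Old_age   Always       -       20
195 ECC_Error_Rate          0x001a   199   199   000    Old_age   Always       -       20
"""

# id, name, flag, value, worst, thresh, type, updated, when_failed, raw
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_attributes(text):
    """Map attribute ID -> dict with name, normalized value, worst, thresh, raw."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attr_id, name, value, worst, thresh, raw = m.groups()
            attrs[int(attr_id)] = {
                "name": name,
                "value": int(value),
                "worst": int(worst),
                "thresh": int(thresh),
                "raw": int(raw),
            }
    return attrs


def at_or_below_threshold(attrs):
    """IDs whose normalized value has reached the vendor threshold."""
    return [i for i, a in attrs.items()
            if a["thresh"] > 0 and a["value"] <= a["thresh"]]


attrs = parse_attributes(SAMPLE)
print(at_or_below_threshold(attrs))
```

The distinction it illustrates: a "Pre-fail" entry in the TYPE column only classifies the attribute, while an actual failure is a normalized VALUE at or below THRESH.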

richard.dzavoronok · Jul 15, 2023

"Have you considered that the GUI could have been written with HDDs in mind?" - That's what I was thinking.
smartctl -a output attached.

rtech · Jul 15, 2023

In the smartctl -a output, errors were logged in the SMART error log, but they occurred a long time ago and there is no indication of what actually caused them.

If I were to keep them, I would definitely put them in a BTRFS/ZFS RAID 1 array and set up a scrub via cron, plus smartd monitoring with email alerts.

SSDs work until they don't, so personally, knowing how NAND flash works, I would not put much stock in these errors. But I would definitely not trust the drives blindly either: scrub and smartd for sure.
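The point of monitoring is the trend rather than the absolute numbers. A minimal Python sketch of that idea follows: compare two readings of the raw values and flag any error counter that went up. smartd is the proper tool for this; the snippet only illustrates the logic, and the readings in it are hypothetical.

```python
# Counters from this thread worth watching for growth.
WATCHED = {
    5: "Reallocated Sector Count",
    187: "Uncorrectable Error Count",
    195: "ECC Error Rate",
}


def rising_counters(previous, current, watched=WATCHED):
    """Return {attr_id: (old, new)} for watched counters whose raw value increased.

    `previous` and `current` map attribute ID -> raw value (int).
    Attributes missing from either reading are skipped.
    """
    changes = {}
    for attr_id in watched:
        old = previous.get(attr_id)
        new = current.get(attr_id)
        if old is not None and new is not None and new > old:
            changes[attr_id] = (old, new)
    return changes


# Hypothetical readings taken a week apart (made-up numbers).
last_week = {5: 35, 180: 6400, 187: 20, 195: 20}
today = {5: 35, 180: 6400, 187: 23, 195: 23}

for attr_id, (old, new) in rising_counters(last_week, today).items():
    print(f"#{attr_id} {WATCHED[attr_id]}: {old} -> {new}")
```

A static count of 20 old errors is a different situation from a count that grows every week, and only the second one would be a reason to pull the drive.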

richard.dzavoronok · Jul 16, 2023

Thanks a lot for tips/insight.
In the end I'll be returning the drives, as I wanted to use them as standalone drives in my daily drivers, plus as boot drives, and I would not trust them enough for that.

rtech · Jul 17, 2023

Your options here are: get new stuff with a warranty, or get gear old enough that it has been validated by time and usage.
For example, I use an SSD that was made in 2011: SATA II, so yes, 275 MB/s max.
My boot disks are SLC SSDs originally intended to be used as cache, also from around 2010.

Just because you got enterprise gear doesn't mean it was used or handled properly. These SMART errors could be indicators of such mishandling.
I once got second-hand servers with black dust in them. Black dust typically means graphite, which is conductive. You can guess how high the failure rate was.

mr44er · Jul 17, 2023

These errors also happen if the controller or HBA had a brief period of overheating/tripping. That's not a defect of the disk: the controller resets and requests the data again, and the disk has to log that because it 'doesn't know better'.

I would not care about that, but I don't know much about this Samsung model specifically, and I use ZFS exclusively, everywhere, with enough redundancy. If a disk dies, it dies... I always have spares at hand.

richard.dzavoronok · Jul 21, 2023

Thanks to you all for your insight.
I returned the drives.

Found a local seller who is selling unused 2 TB MX500 drives quite cheap, along with 2 TB Crucial P2 NVMe drives for the same price.
Now I need to decide whether to use hardware RAID or VROC.
