Regularly Failing ST4000 2.5" 4TB HDDs


Bradford

Active Member
May 27, 2016
Hello,

I have a home server with a ZFS storage array - mirrored vdevs with 6x 4TB drives. I have been experiencing fairly regular failures of the drives, which are all ST4000s shucked from Seagate Backup Plus 4TB enclosures. I have been lucky to avoid any major data loss, but they shouldn't be failing this fast. They have decent airflow and their temperatures stay fairly low (one drive is currently at 35C, the hottest at 43C).

When a drive fails, it disappears from my SAS card, and rebooting won't bring it back. It appears to be totally dead. The third such drive failed a couple of days ago, and I'm not sure what is causing it. The hardware I'm using is, from the SAS card up:

* SAS: IBM M1015 flashed to LSI2008
* SAS Cables: Monoprice 0.75m 30AWG Internal Mini SAS 36-Pin Male with Latch to 7-Pin Female Forward Breakout Cable (new)
* Hot Swap Bay: AMS DS-528SSBA Anti-Vibration 8x2.5" SAS/SATA HDD Backplane Module

Since I recently replaced the SAS cables, I suspect it might be the hot swap bay or the SAS card. Do these symptoms point to which component is failing? Could it just be the drives? I haven't heard of anyone else's drives failing like this, so I doubt I've simply been that unlucky.

If this isn't enough information, where else could I look? I can't read the SMART data on the failed drive, but should I be looking at particular SMART values on the other drives?
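
For reference, this is roughly how I'm pulling SMART data from the surviving drives right now - just a minimal sketch; smartmontools, the /dev/sda-/dev/sdf device names, and the list of attributes to watch are my own assumptions, so I'd appreciate knowing if these are the right counters to look at:

```python
#!/usr/bin/env python3
# Minimal sketch: dump failure-related SMART attributes from each pool member.
# Assumptions (mine): smartmontools is installed and the six drives show up
# as /dev/sda .. /dev/sdf. Needs root to read SMART data.
import subprocess

DRIVES = [f"/dev/sd{c}" for c in "abcdef"]  # adjust to your device names

# Attributes I assume are worth watching for early signs of trouble.
WATCH = [
    "Reallocated_Sector_Ct",     # sectors already remapped
    "Current_Pending_Sector",    # sectors waiting to be remapped
    "Offline_Uncorrectable",     # unreadable sectors
    "UDMA_CRC_Error_Count",      # link errors (cable / backplane suspect)
]

for dev in DRIVES:
    out = subprocess.run(["smartctl", "-A", dev],
                         capture_output=True, text=True).stdout
    for line in out.splitlines():
        if any(attr in line for attr in WATCH):
            raw = line.split()[-1]   # RAW_VALUE is the last column
            if raw.isdigit() and int(raw) > 0:
                print(f"{dev}: {line.strip()}")
```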

Thanks for any advice, I appreciate it.
 

sth

Active Member
Oct 29, 2015
I had this problem when I was trying to build a 24-drive array under FreeNAS 9.10 with those exact drives. Even when I got it stable for long enough to copy data, performance was terrible (the threads are on here somewhere). I suspect the drives were struggling to communicate with the controller and kept hanging / dropping out of the array, which left them in a faulty-MBR type state.
I tried the same hardware under Linux and OmniOS; performance was greatly improved and I stopped having drives fail.

EDIT: I suspect you will be able to revive the faulty drives by recreating the MBR.
 

Bradford

Active Member
May 27, 2016
Thanks for your report. I am running these on Linux already (Proxmox) and I'm happy with the performance - they're maxing out my gigabit network, though that could just be my cache SSD doing the work.

Would having a bad MBR make the drive disappear from the controller?
 

sth

Active Member
Oct 29, 2015
Yes, without a correctly configured MBR I think the controller will struggle to initialise them.
I can't for the life of me remember the tool I used to reinitialise the drives, though. :(
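
If it helps, something along these lines is what I'd try today - a rough, destructive sketch of what I mean by recreating the MBR, not the tool I originally used, and it assumes the drive is attached to a plain SATA/USB port and visible to the OS as /dev/sdX:

```python
#!/usr/bin/env python3
# DESTRUCTIVE sketch: wipe and recreate the MBR on a drive the controller
# refuses to initialise. Assumption: the drive is visible again on a plain
# SATA/USB port as /dev/sdX -- triple-check the device name before running.
import subprocess
import sys

def recreate_mbr(device: str) -> None:
    # Zero the first sector, where the legacy MBR lives (LBA 0).
    subprocess.run(
        ["dd", "if=/dev/zero", f"of={device}", "bs=512", "count=1", "conv=fsync"],
        check=True,
    )
    # Write a fresh, empty msdos (MBR) partition table.
    subprocess.run(["parted", "--script", device, "mklabel", "msdos"],
                   check=True)

if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: recreate_mbr.py /dev/sdX")
    recreate_mbr(sys.argv[1])
```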