I have been so frustrated by HGST's unwillingness to provide firmware for these drives that I wanted to share both the firmware and the experience that led me to update it, so that all of this ends up somewhere Google can find it.
I have two Synology RC18015xs+ servers (high availability) with 12x HUH721010AL5200 drives in an RXD1215sas. This configuration worked for years until recently, when the drives started showing delayed ECC reads:
Code:
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0   170357         0    170357   13436815     590624.886           0
write:         0        0         0         0     640515      47190.265           0
verify:        0        0         0         0    1545917          0.000           0

With firmware A21D, those delayed reads caused the RXD1215sas to hang. HA would then kick in and the servers would flap back and forth, thinking they had gone down. Powering off the backup server stopped the flapping and left just a momentary hang:
Code:
[27690.768212] sd 6:0:10:0: attempting task abort! scmd(ffff880133577cc0)
[27690.774740] sd 6:0:10:0: [sas11] CDB:
[27690.778481] cdb[0]=0x4d: 4d 00 4d 00 00 00 00 00 10 00
[27690.783761] scsi target6:0:10: handle(0x0014), sas_address(0x5000cca25110a1f5), phy(18)
[27690.791747] scsi target6:0:10: enclosure_logical_id(0x5001132f000d2bff), slot(0)
[27690.799152] sd 6:0:10:0: task abort: SUCCESS scmd(ffff880133577cc0)

All of this cost me many hours of frustration and the replacement of 5 of my 12 drives. Thankfully I kept the 5 "bad" ones.
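With twelve drives behind one expander, it helps to know which one is stalling. As a rough sketch (assuming the kernel log format shown above; the script and function names are hypothetical and not part of any Synology or sg3_utils tool), the "attempting task abort!" lines can be counted per SCSI address:

```python
import re
import sys
from collections import Counter

# Matches kernel messages like:
#   [27690.768212] sd 6:0:10:0: attempting task abort! scmd(ffff880133577cc0)
# Group 1 is the host:channel:target:lun address of the disk.
ABORT_RE = re.compile(r"\bsd (\d+:\d+:\d+:\d+): attempting task abort!")


def count_task_aborts(log_text: str) -> Counter:
    """Return a Counter mapping SCSI address -> number of abort attempts."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ABORT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: dmesg > dmesg.txt && python3 count_aborts.py dmesg.txt
    if len(sys.argv) == 2:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            for address, n in count_task_aborts(f.read()).most_common():
                print(f"{address}: {n} task abort(s)")
```

The address (6:0:10:0 in the log above) can then be matched to a device node, for example with `lsscsi -g` if it is installed.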
From the start I was fairly convinced that this was a firmware issue, because the drives didn't show any grown defects. I can't explain why the older drives show delayed ECC reads while the newer ones show none, but I can say that the latest firmware, A92D, seems to have fixed the timeouts that occurred while ECC correction was taking place.
Here is the firmware and the associated flashing utilities they provided: HGST.rar
I found it easier (on Synology's Linux, at least) to use sg3_utils instead: The sg3_utils package
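Then flash each drive with the command below, replacing /dev/sgX with that drive's SCSI generic device (-m 5 is the "download microcode and save" mode, -I names the firmware image, and each -v adds verbosity):
Code:
./sg_write_buffer -m 5 -vvvvv -I LHGNA92C.bin /dev/sgX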
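With 17 drives to do, a small wrapper helps avoid typos in device names. The sketch below is hypothetical: the script name, constants and skip logic are illustrative assumptions, not part of HGST's package or sg3_utils. It assumes sg_inq's usual "Product identification:" and "Product revision level:" lines, that the binaries sit in the current directory, and that an updated drive reports A92D (adjust TARGET_REVISION if yours reports something else). It only prints the commands unless --apply is given:

```python
import os
import re
import subprocess
import sys

# Assumptions: sg3_utils binaries are in the current directory (as in the
# "./sg_write_buffer" command above) and updated drives report "A92D".
SG3_DIR = "."
FIRMWARE_FILE = "LHGNA92C.bin"
EXPECTED_PRODUCT = "HUH721010AL5200"
TARGET_REVISION = "A92D"


def parse_field(sg_inq_output: str, field: str):
    """Pull one 'Name: value' field out of sg_inq's standard INQUIRY output."""
    match = re.search(rf"{re.escape(field)}:\s*(\S+)", sg_inq_output)
    return match.group(1) if match else None


def flash_command(device: str, firmware: str = FIRMWARE_FILE) -> list:
    """Same invocation as in the post: mode 5 = download microcode and save."""
    return [os.path.join(SG3_DIR, "sg_write_buffer"), "-m", "5", "-vvvvv",
            "-I", firmware, device]


def flash_all(devices, dry_run=True):
    for dev in devices:
        inq = subprocess.run([os.path.join(SG3_DIR, "sg_inq"), dev],
                             capture_output=True, text=True, check=True).stdout
        product = parse_field(inq, "Product identification")
        revision = parse_field(inq, "Product revision level")
        if product != EXPECTED_PRODUCT:
            # Never push this image to anything that isn't the expected model.
            print(f"{dev}: product {product!r}, not {EXPECTED_PRODUCT}; skipping")
            continue
        if revision == TARGET_REVISION:
            print(f"{dev}: already on {revision}; skipping")
            continue
        cmd = flash_command(dev)
        print(f"{dev}: {revision} -> {' '.join(cmd)}")
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical usage: python3 flash_hgst.py /dev/sg1 /dev/sg2 ...   (dry run)
    #                     python3 flash_hgst.py --apply /dev/sg1 ...     (really flash)
    args = sys.argv[1:]
    flash_all([a for a in args if a != "--apply"], dry_run="--apply" not in args)
```

Running it once without --apply shows the current revision of every drive before anything is written.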
I flashed all 17 of my drives (the 12 in the array plus the 5 I had pulled) and put the worst "bad" one back in the array to test it. Its delayed ECC read count is still slowly increasing, but I haven't had any issues with the rebuild or with performance.
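To see how quickly that counter is climbing, two saved copies of the error counter log can be compared. A hedged sketch, assuming the logs look like the one at the top of the post (that layout matches smartctl's output for SAS drives, e.g. from `smartctl -a /dev/sdX`); the script name and column names are hypothetical:

```python
import re
import sys

# Column order of the "read:" row in the error counter log shown above.
COLUMNS = ("ecc_fast", "ecc_delayed", "rereads_rewrites", "total_corrected",
           "correction_invocations", "gigabytes_processed", "total_uncorrected")

READ_ROW = re.compile(r"^read:\s+(.+)$", re.MULTILINE)


def parse_read_counters(log_text: str) -> dict:
    """Parse the 'read:' row of an error counter log into named counters."""
    match = READ_ROW.search(log_text)
    if match is None:
        raise ValueError("no 'read:' row found")
    values = match.group(1).split()
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(values)}")
    # Gigabytes processed is fractional; every other column is an integer count.
    return {name: float(raw) if name == "gigabytes_processed" else int(raw)
            for name, raw in zip(COLUMNS, values)}


def delayed_ecc_growth(before: str, after: str) -> int:
    """Number of delayed ECC corrections added between two snapshots."""
    return (parse_read_counters(after)["ecc_delayed"]
            - parse_read_counters(before)["ecc_delayed"])


if __name__ == "__main__":
    # Hypothetical usage: python3 ecc_growth.py old_smartctl.txt new_smartctl.txt
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as old, open(sys.argv[2]) as new:
            print(f"delayed ECC reads grew by {delayed_ecc_growth(old.read(), new.read())}")
```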
Hope this helps someone!