How many corrected errors is too many?


nabsltd

Well-Known Member
Jan 26, 2022
410
274
63
I bought 6x HUS726060AL5215 (6TB SAS3) and ran badblocks on them. Five of them had runs that looked very similar to this:
Code:
# time badblocks -svw -b 4096 -c 131072 /dev/da1
Checking for bad blocks in read-write mode
From block 0 to 1465130645
Pass completed, 0 bad blocks found. (0/0/0 errors)

real    4545m36.927s
user    60m19.007s
sys     20m51.058s
The sixth had the following:
Code:
# time badblocks -svw -b 4096 -c 131072 /dev/da0
Checking for bad blocks in read-write mode
From block 0 to 1465130645
Pass completed, 0 bad blocks found. (0/0/0 errors)

real    7307m25.928s
user    54m20.283s
sys     20m34.253s
You'll notice the running time is 160% as long, dropping the overall average read/write from about 175MB/sec down to about 110MB/sec. The culprit seems to be corrected errors. Fast disks:
Code:
Accumulated power on time, hours:minutes 32734:58
Accumulated start-stop cycles:  30
Accumulated load-unload cycles:  1220
Elements in grown defect list: 0

Vendor (Seagate Cache) information
  Blocks sent to initiator = 53830322436964352

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0      853         0       853   22857415    2228793.886           0
write:         0       90         0        90    3996424     580876.925           0
verify:        0        0         0         0     365101          2.214           0
Slow disk:
Code:
Accumulated power on time, hours:minutes 32905:16
Accumulated start-stop cycles:  34
Accumulated load-unload cycles:  1423
Elements in grown defect list: 83

Vendor (Seagate Cache) information
  Blocks sent to initiator = 8059375755198464

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0   866516         0    866516   17891861    1342161.269           0
write:         0       69         0        69    5686799      84367.620           0
verify:        0        0         0         0     717638          1.713           0
Despite having about the same PoH, the slow disk has read and written far less data, and has 1000x as many corrected errors.
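As a rough cross-check of the figures above, here is a minimal Python sketch. It assumes the default `badblocks -w` behaviour of four test patterns, each written and then read back, so eight full traversals of the disk. The function and variable names are made up for illustration; the inputs are copied from the output above.

```python
# Sanity check of the numbers quoted in this post (a sketch, not a benchmark).

BLOCKS = 1465130646   # "From block 0 to 1465130645" is inclusive
BLOCK_SIZE = 4096     # bytes, from -b 4096
TRAVERSALS = 8        # ASSUMPTION: 4 default -w patterns x (write + read-back)


def avg_mb_per_s(minutes, seconds):
    """Average throughput in MB/s (10^6 bytes) over the whole badblocks run."""
    total_bytes = BLOCKS * BLOCK_SIZE * TRAVERSALS
    return total_bytes / (minutes * 60 + seconds) / 1e6


def corrected_per_tb(total_corrected, gigabytes_processed):
    """Corrected errors per TB (10^12 bytes) processed, from the error counter log."""
    return total_corrected / (gigabytes_processed / 1000)


fast_speed = avg_mb_per_s(4545, 36.927)   # "real" time of a fast disk
slow_speed = avg_mb_per_s(7307, 25.928)   # "real" time of the slow disk

fast_rate = corrected_per_tb(853, 2228793.886)      # read row, fast disk
slow_rate = corrected_per_tb(866516, 1342161.269)   # read row, slow disk

print(f"fast disk: {fast_speed:.0f} MB/s, {fast_rate:.2f} corrected/TB read")
print(f"slow disk: {slow_speed:.0f} MB/s, {slow_rate:.2f} corrected/TB read")
print(f"rate ratio: {slow_rate / fast_rate:.0f}x")
```

Under that assumption the speeds land close to the 175 and 110 MB/sec quoted above. Normalising by data read makes the gap in corrected errors larger than the raw 1000x, because the slow disk has read less data.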

So, dead disk walking?
 

Mithril

Active Member
Sep 13, 2019
356
106
43
Does seem suspect to me. It doesn't claim to be doing full re-reads for those ECC errors at least. Maybe put it through badblocks again and see if it behaves the same? Might be worth contacting the seller with the same finding and seeing what they say; start the convo as open ended "hey by the way, one of the disks behaved like this".
 

nabsltd

"make sure it is not the cable first."
All the drives are connected to an SAS3-EL1 backplane, and the problem moves with the drive. So, it's the drive.

Now, if you have any ideas about a hardware issue with the drive's connector that I might be able to see, I'll check.

Mithril said: Might be worth contacting the seller with the same finding and seeing what they say; start the convo as open ended "hey by the way, one of the disks behaved like this".
I'm running a long SMART test now to see if anything else appears. If there is anything wonky from that, I'll definitely just start a return. But, I think you are right, and I'll still see what they can do.
 

RolloZ170

Well-Known Member
Apr 24, 2016
5,320
1,604
113
The errors are only in reads.
Maybe this is raw data that is reported differently between FW revisions?
Check whether the FW is the same on all drives.
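One way to compare firmware across the drives is to pull the `Revision:` line that `smartctl -i` prints for SCSI/SAS disks. The Python below is only a sketch under that assumption; the helper names are invented here, and the device list in the usage comment is a guess based on the `/dev/da0` and `/dev/da1` names seen earlier.

```python
import re
import subprocess
from collections import defaultdict


def parse_revision(info_text):
    """Return the value of the 'Revision:' line in `smartctl -i` output, or None."""
    match = re.search(r"^Revision:\s*(\S+)", info_text, re.MULTILINE)
    return match.group(1) if match else None


def read_info(device):
    """Run `smartctl -i` on one device and return its text output (needs root)."""
    result = subprocess.run(["smartctl", "-i", device], capture_output=True, text=True)
    return result.stdout


def group_by_firmware(info_by_device):
    """Map firmware revision -> list of devices reporting that revision."""
    groups = defaultdict(list)
    for device, info_text in info_by_device.items():
        groups[parse_revision(info_text)].append(device)
    return dict(groups)


# Example use (ASSUMPTION: the six drives are /dev/da0 .. /dev/da5):
#   devices = [f"/dev/da{i}" for i in range(6)]
#   print(group_by_firmware({d: read_info(d) for d in devices}))
```

If every drive ends up under a single key, a firmware difference is unlikely to explain the counters.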
 

Mithril

The slower speed of the drive suggests that *something* is going on.
 

Stephan

Well-Known Member
Apr 21, 2017
923
700
93
Germany
Dead disk walking indeed, do not put into production. Once grown defects grow beyond a handful, it's downhill.
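For anyone wanting to screen a batch of drives the same way, a minimal Python sketch that pulls the grown defect count and the corrected-error total out of the smartctl text. The sample is copied from the slow disk's output earlier in the thread; the helper names and the `max_defects=5` cut-off for "a handful" are assumptions, not an established limit.

```python
import re

# Excerpt of the slow disk's smartctl output, copied from earlier in the thread.
SLOW_DISK = """\
Elements in grown defect list: 83

read:          0   866516         0    866516   17891861    1342161.269           0
write:         0       69         0        69    5686799      84367.620           0
"""


def grown_defects(text):
    """Return the grown defect count, or None if the line is missing."""
    match = re.search(r"^Elements in grown defect list:\s*(\d+)", text, re.MULTILINE)
    return int(match.group(1)) if match else None


def total_corrected(text, operation="read"):
    """Return the 'Total errors corrected' column for one error counter row."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == operation + ":":
            return int(fields[4])  # label, fast, delayed, rereads, total corrected
    return None


def looks_unhealthy(text, max_defects=5):
    """ASSUMPTION: max_defects=5 stands in for 'a handful'; pick your own limit."""
    defects = grown_defects(text)
    return defects is not None and defects > max_defects


print(grown_defects(SLOW_DISK), total_corrected(SLOW_DISK), looks_unhealthy(SLOW_DISK))
# prints: 83 866516 True
```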
 

Mithril

Oh I totally missed the defect count, yeeeeeeah. That's a drive you give to someone you want revenge on.