Hello!
I have the following new setup:
- server with LSI 9207 HBA
- Supermicro 837E26-RJBOD1 28bay JBOD
- 28x Seagate Enterprise Capacity 3.5 HDD v4 4TB SAS drives
I mounted them all in the JBOD and started running a badblocks test on them. After one and a half passes, I checked the SMART stats and saw LOTS of ECC-corrected errors.
Code:
smartctl -a /dev/sdh
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10.0-229.el7.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: SEAGATE
Product: ST4000NM0034
Revision: E001
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Logical block size: 512 bytes
Physical block size: 4096 bytes
Lowest aligned LBA: 0
Logical block provisioning type unreported, LBPME=0, LBPRZ=0
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Logical Unit id: 0x5000c5008375b0db
Serial number: Z4F03BS20000R524FN4B
Device type: disk
Transport protocol: SAS
Local Time is: Sun Sep 27 12:37:16 2015 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Current Drive Temperature: 39 C
Drive Trip Temperature: 60 C
Manufactured in week 15 of year 2015
Specified cycle count over device lifetime: 10000
Accumulated start-stop cycles: 68
Specified load-unload count over device lifetime: 300000
Accumulated load-unload cycles: 74
Elements in grown defect list: 0
Vendor (Seagate) cache information
Blocks sent to initiator = 3519259482
Blocks received from initiator = 1152791088
Blocks read from cache and sent to initiator = 3489960
Number of read and write commands whose size <= segment size = 8488
Number of read and write commands whose size > segment size = 1
Vendor (Seagate/Hitachi) factory information
number of hours powered up = 47.02
number of minutes until next internal SMART test = 50
Error counter log:
            Errors Corrected by            Total  Correction    Gigabytes       Total
                 ECC         rereads/     errors   algorithm    processed uncorrected
             fast | delayed  rewrites  corrected invocations [10^9 bytes]      errors
read:  1953503667         0         0 1953503667           0    32007.073           0
write:          0         0         0          0           0     4988.356           0
Non-medium error count: 0
[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
SMART Self-test log
Num  Test              Status      segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                   number   (hours)
# 1  Background short  Completed         -         5              - [-   -   -]
Long (extended) Self Test duration: 24300 seconds [405.0 minutes]
If I look at some of the other drives:
Code:
Error counter log:
            Errors Corrected by            Total  Correction    Gigabytes       Total
                 ECC         rereads/     errors   algorithm    processed uncorrected
             fast | delayed  rewrites  corrected invocations [10^9 bytes]      errors
read:     3886545         0         0    3886545           0    32007.052           0
write:          0         0         0          0           0     4985.349           0
Non-medium error count: 0

Error counter log:
            Errors Corrected by            Total  Correction    Gigabytes       Total
                 ECC         rereads/     errors   algorithm    processed uncorrected
             fast | delayed  rewrites  corrected invocations [10^9 bytes]      errors
read:  1953499760         0         0 1953499760           0    32007.046           0
write:          0         0         0          0           0     4992.851           0
Non-medium error count: 0

Error counter log:
            Errors Corrected by            Total  Correction    Gigabytes       Total
                 ECC         rereads/     errors   algorithm    processed uncorrected
             fast | delayed  rewrites  corrected invocations [10^9 bytes]      errors
read:     3716742         0         0    3716742           0    32006.910           0
write:          0         0         0          0           0     4981.957           0
Non-medium error count: 0

Error counter log:
            Errors Corrected by            Total  Correction    Gigabytes       Total
                 ECC         rereads/     errors   algorithm    processed uncorrected
             fast | delayed  rewrites  corrected invocations [10^9 bytes]      errors
read:  1953490099         0         0 1953490099           0    32006.947           0
write:          0         0         0          0           0     4976.416           0
Non-medium error count: 0
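With this many drives, reading the counters by eye gets tedious. Here is a minimal sketch of pulling the read/write rows out of saved smartctl output with Python. The function name and dictionary keys are made up for this example, and the regular expression assumes the seven-column row layout shown in the logs above.

```python
import re

# Matches a "read:" or "write:" row of the error counter log.
# Assumed column order (taken from the header in the logs above):
# ECC fast, ECC delayed, rereads/rewrites, total errors corrected,
# correction algorithm invocations, gigabytes processed, total uncorrected.
ROW = re.compile(
    r"^[ \t]*(read|write):\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+([\d.]+)\s+(\d+)[ \t]*$",
    re.MULTILINE,
)


def parse_error_counters(text):
    """Return a dict keyed by 'read'/'write' for every row found in text."""
    counters = {}
    for m in ROW.finditer(text):
        counters[m.group(1)] = {
            "ecc_fast": int(m.group(2)),
            "ecc_delayed": int(m.group(3)),
            "rereads_rewrites": int(m.group(4)),
            "total_corrected": int(m.group(5)),
            "invocations": int(m.group(6)),
            "gigabytes": float(m.group(7)),
            "uncorrected": int(m.group(8)),
        }
    return counters


# Two rows copied from one of the drives above.
sample = """\
read: 3886545 0 0 3886545 0 32007.052 0
write: 0 0 0 0 0 4985.349 0
"""
counters = parse_error_counters(sample)
print(counters["read"]["total_corrected"], counters["read"]["gigabytes"])
```

The same function could be fed the saved output of each drive in turn to build one table for the whole enclosure.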
I also tried a brand new hard drive, out of the box. It reports the following data:
Code:
Vendor (Seagate/Hitachi) factory information
number of hours powered up = 0.25
Error counter log:
            Errors Corrected by            Total  Correction    Gigabytes       Total
                 ECC         rereads/     errors   algorithm    processed uncorrected
             fast | delayed  rewrites  corrected invocations [10^9 bytes]      errors
read:        2266         0         0       2266           0        0.037           0
write:          0         0         0          0           0        0.005           0
Non-medium error count: 0
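To compare the drives on equal footing, the raw counts can be divided by the gigabytes read. The sketch below does only that arithmetic on the numbers from the logs above; the drive labels are placeholders for this example. With these figures, sdh, two of the other drives, and the brand new drive all come out at roughly 61,000 corrected read errors per GB, while the remaining two drives come out at roughly 116 to 121 per GB. The new drive's gigabyte figure has only two significant digits, so its rate is approximate.

```python
# (label, total corrected read errors, gigabytes read), copied from the logs above.
# The labels are placeholders for this example, not real device names
# (except sdh, which is the drive shown in full).
drives = [
    ("sdh", 1953503667, 32007.073),
    ("other drive 1", 3886545, 32007.052),
    ("other drive 2", 1953499760, 32007.046),
    ("other drive 3", 3716742, 32006.910),
    ("other drive 4", 1953490099, 32006.947),
    ("new drive", 2266, 0.037),
]


def corrected_per_gb(corrected, gigabytes):
    """Corrected read errors per gigabyte (10^9 bytes) processed."""
    return corrected / gigabytes


rates = {label: corrected_per_gb(c, gb) for label, c, gb in drives}
for label, rate in rates.items():
    print(f"{label:>14}: {rate:12.1f} corrected read errors per GB")
```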
Are those numbers normal?
I also looked at the SMART stats on the server's 10k SAS drives, but they report 0 ECC-corrected errors.
Is it possible that 112 drives are bad?
Matej