Scary SMART Values on New Seagate Enterprise Drives?

mattlach

Member
Aug 1, 2014
Hey all,

So, I'm 4 disks into my "swap all 12 drives in my NAS with larger drives and resilver to grow my ZFS pool" project. I decided to go with 10TB Seagate Helium Enterprise drives (ST10000NM0016).

The four drives I have thus far come from two orders of two drives each. Two from Newegg and two from Amazon.

All drives passed the following tests before being resilvered into the pool:
- SMART Short test
- SMART Conveyance Test
- Badblocks write test (all four test patterns, taking ~5 days)
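As a rough sanity check on the ~5 day figure: badblocks -w writes each of its four default patterns (0xaa, 0x55, 0xff, 0x00) across the whole disk and then reads every block back, so four patterns cost eight full-disk passes. The sketch below assumes a constant 200 MB/s average transfer rate; that number is a guess, not a Seagate spec, and real drives are faster on the outer tracks than the inner ones.

```python
# Back-of-envelope duration for a destructive `badblocks -w` run.
# Assumption: each pattern costs one full write pass plus one full
# read/verify pass, at a constant (assumed) average transfer rate.

def badblocks_hours(capacity_bytes: float, patterns: int, avg_mb_per_s: float) -> float:
    """Estimated wall-clock hours for `patterns` write+verify passes."""
    passes = 2 * patterns                      # write + read-back per pattern
    total_bytes = capacity_bytes * passes
    seconds = total_bytes / (avg_mb_per_s * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    # 10 TB drive, four default patterns, assumed 200 MB/s average.
    h = badblocks_hours(10e12, 4, 200)
    print(f"{h:.1f} hours, about {h / 24:.1f} days")
```

Under those assumptions it comes out to roughly 111 hours, or about 4.6 days, which is in line with the ~5 days observed.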

I've read in the past that one should pretty much ignore the RAW_VALUE fields in SMART readouts from Seagate drives, since they probably don't mean what you think they mean, and should instead use SeaTools for any diagnostics.

The problems I have with Seatools:

1.) The Linux version is old and unmaintained, and it didn't appear to give me any useful information.

2.) The Windows version might be more fully featured, but it doesn't seem to recognize Seagate hard drives as genuine Seagate drives when they're sitting in a USB dock, and right now I don't have a box with easily accessible SATA ports that I can stick one into.

3.) There is a DOS version, but that requires taking my server offline and booting from a FreeDOS USB stick. My server may not be production, but it is more "home production" than lab, so this would be really inconvenient.


Why I am concerned:

I know "ignore the RAW_VALUE field" is the advice I've found when googling in the past, but what about the three-digit normalized values (VALUE, WORST, THRESH)? Are those to be ignored as well?

Just look at some of these; the Hardware_ECC_Recovered values in particular look pretty scary on all four drives:

Disk 1:
Code:
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   075   064   044    Pre-fail  Always       -       32160040
  3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       4
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   081   060   045    Pre-fail  Always       -       129401214
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       494 (1 186 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       3
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   088   088   000    Old_age   Always       -       12
190 Airflow_Temperature_Cel 0x0022   071   055   040    Old_age   Always       -       29 (Min/Max 26/35)
191 G-Sense_Error_Rate      0x0032   098   098   000    Old_age   Always       -       4187
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       5
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       30
194 Temperature_Celsius     0x0022   029   045   000    Old_age   Always       -       29 (0 19 0 0 0)
195 Hardware_ECC_Recovered  0x001a   027   003   000    Old_age   Always       -       32160040
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       491 (144 32 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       83916735051
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       83452883922
Disk 2:
Code:
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   080   066   044    Pre-fail  Always       -       91957936
  3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       4
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   079   061   045    Pre-fail  Always       -       81905155
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       283 (213 63 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       4
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   090   090   000    Old_age   Always       -       10
190 Airflow_Temperature_Cel 0x0022   072   067   040    Old_age   Always       -       28 (Min/Max 20/33)
191 G-Sense_Error_Rate      0x0032   098   098   000    Old_age   Always       -       4777
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       2
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       16
194 Temperature_Celsius     0x0022   028   040   000    Old_age   Always       -       28 (0 20 0 0 0)
195 Hardware_ECC_Recovered  0x001a   006   002   000    Old_age   Always       -       91957936
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       273 (12 162 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       83844942895
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       82966201892
Disk 3:
Code:
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   079   064   044    Pre-fail  Always       -       70937936
  3 Spin_Up_Time            0x0003   098   098   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       2
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   077   060   045    Pre-fail  Always       -       53737869
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       158 (50 181 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       2
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   098   098   000    Old_age   Always       -       2
190 Airflow_Temperature_Cel 0x0022   071   067   040    Old_age   Always       -       29 (Min/Max 24/33)
191 G-Sense_Error_Rate      0x0032   098   098   000    Old_age   Always       -       4720
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       14
194 Temperature_Celsius     0x0022   029   040   000    Old_age   Always       -       29 (0 21 0 0 0)
195 Hardware_ECC_Recovered  0x001a   007   006   000    Old_age   Always       -       70937936
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       157 (185 136 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       83754676019
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       78155955092
Disk 4:
Code:
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   083   065   044    Pre-fail  Always       -       193684680
  3 Spin_Up_Time            0x0003   098   098   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       2
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   077   061   045    Pre-fail  Always       -       54155047
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       158 (142 118 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       2
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   071   067   040    Old_age   Always       -       29 (Min/Max 25/33)
191 G-Sense_Error_Rate      0x0032   098   098   000    Old_age   Always       -       4822
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       10
194 Temperature_Celsius     0x0022   029   040   000    Old_age   Always       -       29 (0 21 0 0 0)
195 Hardware_ECC_Recovered  0x001a   011   009   000    Old_age   Always       -       193684680
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       157 (133 76 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       83781193947
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       78158256547
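On the question of the three-digit values: as I understand it, smartctl only marks an attribute as failed (the WHEN_FAILED column) when its normalized VALUE drops to or below a nonzero THRESH, and "In_the_past" when WORST did. A THRESH of 000, as Hardware_ECC_Recovered has on all four drives, means the attribute cannot trip no matter how low the normalized value goes, and every row in the four tables above shows "-" under WHEN_FAILED. The sketch below mirrors that rule on a few rows from Disk 1. The regex, the helper names, and the 512-byte LBA size used to convert Total_LBAs_Written into TB are my own assumptions, not part of smartctl.

```python
import re
from typing import Dict, NamedTuple


class Attr(NamedTuple):
    id: int
    name: str
    value: int   # normalized current value
    worst: int   # lowest normalized value recorded
    thresh: int  # vendor failure threshold
    raw: str     # RAW_VALUE column, kept as text (format is vendor-specific)


# Matches one attribute row of `smartctl -A` output:
# ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE...
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(.+)$"
)


def parse_attrs(text: str) -> Dict[int, Attr]:
    """Parse attribute rows; header and other lines are skipped."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs[int(m[1])] = Attr(int(m[1]), m[2], int(m[3]), int(m[4]),
                                    int(m[5]), m[6].strip())
    return attrs


def status(a: Attr) -> str:
    """Threshold check on the normalized values only (raw is ignored)."""
    if a.thresh == 0:
        return "OK (threshold 000, cannot trip)"
    if a.value <= a.thresh:
        return "FAILING_NOW"
    if a.worst <= a.thresh:
        return "In_the_past"
    return "OK"


def lbas_to_tb(raw: str, sector_bytes: int = 512) -> float:
    """Convert a Total_LBAs_* raw count to decimal TB (assumes 512-byte LBAs)."""
    return int(raw.split()[0]) * sector_bytes / 1e12


SAMPLE = """\
  1 Raw_Read_Error_Rate     0x000f   075   064   044    Pre-fail  Always       -       32160040
  7 Seek_Error_Rate         0x000f   081   060   045    Pre-fail  Always       -       129401214
195 Hardware_ECC_Recovered  0x001a   027   003   000    Old_age   Always       -       32160040
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       83916735051
"""

if __name__ == "__main__":
    attrs = parse_attrs(SAMPLE)
    for a in attrs.values():
        print(f"{a.id:3d} {a.name:<24} "
              f"{a.value:03d}/{a.worst:03d}/{a.thresh:03d}  {status(a)}")
    print(f"Written: {lbas_to_tb(attrs[241].raw):.1f} TB")
```

With Disk 1's numbers every row comes out OK, and if the 512-byte assumption holds, Total_LBAs_Written works out to roughly 43 TB, a bit more than four full passes over a 10 TB disk, which fits the four-pattern badblocks run.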

So, am I concerned over nothing? Does one just ignore SMART readouts on Seagate Enterprise drives, or did I get 4 drives from 2 different retailers that are all going bad within only a couple of weeks?

Much appreciated.

Crossposted here and here for more eyeballs, as server/enterprise/*nix stuff tends to get less traffic.
 

Blinky 42

Active Member
Aug 6, 2015
PA, USA
Not sure, but I took a look at my (non-enterprise) 10TB Seagates, and the values are a lot lower, considering they have been up for months:

Code:
[root@cube01 ~]# for D in /dev/sd? ; do smartctl -a $D | egrep -i 'ecc|model|hour' ; done
Device Model:     ST10000DM0004
  9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       6427 (54 153 0)
195 Hardware_ECC_Recovered  0x001a   100   001   000    Old_age   Always       -       647272
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       277351808112916
Device Model:     ST10000DM0004
  9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       6444 (34 249 0)
195 Hardware_ECC_Recovered  0x001a   100   001   000    Old_age   Always       -       584365
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       79340930865444
Device Model:     ST10000DM0004
  9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       6444 (5 25 0)
195 Hardware_ECC_Recovered  0x001a   003   001   000    Old_age   Always       -       242698676
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       274950921394468
Device Model:     ST10000DM0004
  9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       6444 (146 252 0)
195 Hardware_ECC_Recovered  0x001a   001   001   000    Old_age   Always       -       4906990
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       78353088387364

And smartctl -l error still shows no errors on any of them (even sdc, which has a much higher ECC Recovered count).
 

mattlach

Blinky 42 said:
Not sure, but I took a look at my (non-enterprise) 10TB Seagates, and the values are a lot lower, considering they have been up for months

<snip>

and -l error for all shows no errors still (even sdc which has much higher ECC Recovered).

Interesting.


Some of yours are down to 001, and all four have hit a WORST of 001.

I'm starting to suspect these numbers mean next to nothing on Seagate drives...
 

Terry Kennedy

Well-Known Member
Jun 25, 2015
New York City
www.glaver.org
mattlach said:
So, I'm 4 disks into my "swap all 12 drives in my NAS with larger drives and resilver to grow my ZFS pool" project, and decided to go with 10TB Seagate Helium Enterprise drives (ST10000NM0016)

The four drives I have thus far come from two orders of two drives each. Two from Newegg and two from Amazon.

Before I get into the numbers, go to the Seagate warranty checker (link) and see if these are actually retail drives with warranties or not, and when the warranty ends. Unfortunately, both Newegg and Amazon have "marketplace" sellers that aren't always the most ethical.
 

mattlach

Terry Kennedy said:
Before I get into the numbers, go to the Seagate warranty checker (link) and see if these are actually retail drives with warranties or not, and when the warranty ends. Unfortunately, both Newegg and Amazon have "marketplace" sellers that aren't always the most ethical.

This is good information, and yes, I have run into it before.

The first two drives I bought from Newegg (sold by Newegg, not by a marketplace seller) had this issue. Once I received them and ran the serials, they showed up as OEM drives without warranty. After I contacted Newegg, they opened a ticket and communicated with Seagate, and now the drives magically show up as covered in Seagate's warranty validator tool.

The second two I bought on Amazon from the marketplace seller goHardDrive. These showed up as having warranty when I received them, so all good there.

I'm alternating sellers to get as much variety in date codes and shipping conditions as possible (though judging by the warranty expiration dates of my four drives so far being within a day of each other, that doesn't seem to have worked out the way I had hoped). I'll order the next two from B&H, despite their higher pricing, to diversify my sourcing as much as possible. Hopefully those will arrive with full warranty coverage, and I won't need to repeat my Newegg experience.
 

msg7086

Active Member
May 2, 2017
It's an Old_age attribute; it's just statistics, or history.

ECC recovered sounds right. They are recovered, right?
 

mattlach

msg7086 said:
ECC recovered sounds right. They are recovered, right?

This means that there was an error in the drive, but that internal drive ECC recovered it.

It is an important pre-failure statistic. As the drive gets older one would expect the number of recovered errors to increase, until the point where it encounters one it cannot recover, and then you have a problem.

The purpose of SMART is to warn you before this happens, so you can take appropriate action.

It doesn't always work that way though.
 

msg7086

mattlach said:
This means that there was an error in the drive, but that internal drive ECC recovered it.

It is an important pre-failure statistic.

I'm not sure that's true.

First, it's an Old_age attribute, not a Pre-fail one.

Second, the RAW number doesn't even make sense. Some say that certain vendors record the interval between two ECC corrections; others say the count rolls over to 0 after reaching a certain number.

What I'm trying to say is that it's not a number with a well-defined meaning.

Also, ECC recovery is so common in modern hard drives (due to their density) that it's not even a problem, at least for the first couple of years.
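To illustrate the vendor-specific encoding point: one widely repeated community interpretation of Seagate's "rate" attributes (such as Raw_Read_Error_Rate and Seek_Error_Rate) is that the 48-bit raw value packs an error count in the upper 16 bits and an operation count in the lower 32 bits. Seagate has not documented this, so treat the decoding below as a hypothesis; the function name is my own. For what it's worth, on each of mattlach's four drives the Hardware_ECC_Recovered raw value is identical to the Raw_Read_Error_Rate raw value, which suggests both report the same underlying counter.

```python
from typing import Tuple


def split_raw48(raw: int) -> Tuple[int, int]:
    """Split a 48-bit SMART raw value into (upper 16 bits, lower 32 bits).

    Hypothetical decoding (community interpretation, not a Seagate spec):
    upper 16 bits = error count, lower 32 bits = number of operations.
    """
    if not 0 <= raw < 1 << 48:
        raise ValueError("SMART raw values are 48 bits wide")
    return raw >> 32, raw & 0xFFFFFFFF


if __name__ == "__main__":
    # Raw values copied from the SMART output posted in this thread.
    samples = {
        "Disk 1 Raw_Read_Error_Rate": 32160040,
        "Disk 1 Seek_Error_Rate": 129401214,
        "Disk 4 Hardware_ECC_Recovered": 193684680,
        "Blinky sdc Hardware_ECC_Recovered": 242698676,
    }
    for label, raw in samples.items():
        errors, ops = split_raw48(raw)
        print(f"{label}: errors={errors}, operations={ops}")
```

Every raw value posted in this thread is below 2^32 (4,294,967,296), so under this reading the upper 16 bits are zero, i.e. no errors counted, and the large numbers would just be operation counts.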