WD RED Drive About to Fail?


sapper6fd

Member
May 21, 2013
I'm starting to use SMART tools more and more. Today I ran a report on the status of my four main data drives. They are all 2TB WD Red drives, currently in my custom-built NAS server. One of them caught my eye as the odd one out of the bunch. Is this drive (ada2) in the pre-fail stage? It's less than half a year old, and I'm not the most knowledgeable about SMART tools.

Here's the data from all four drives. The second drive down (ada2) is the one causing concern.

Code:
[root@freenas] ~# smartctl -A /dev/ada1
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.3-RELEASE-p7 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   187   178   021    Pre-fail  Always       -       5625
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       56
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       3190
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       56
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       47
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       8
194 Temperature_Celsius     0x0022   119   113   000    Old_age   Always       -       31
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0





[root@freenas] ~# smartctl -A /dev/ada2
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.3-RELEASE-p7 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       1872
  3 Spin_Up_Time            0x0027   187   177   021    Pre-fail  Always       -       5650
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       224
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       4993
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       201
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       170
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       53
194 Temperature_Celsius     0x0022   118   104   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       10
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   199   000    Old_age   Offline      -       16




[root@freenas] ~# smartctl -A /dev/ada3
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.3-RELEASE-p7 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   179   175   021    Pre-fail  Always       -       4025
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       173
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       4943
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       152
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       132
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       40
194 Temperature_Celsius     0x0022   114   103   000    Old_age   Always       -       33
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0





[root@freenas] ~# smartctl -A /dev/ada4
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.3-RELEASE-p7 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   181   175   021    Pre-fail  Always       -       3950
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       171
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       4943
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       150
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       130
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       40
194 Temperature_Celsius     0x0022   116   105   000    Old_age   Always       -       31
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
 

bwillcox

Member
Jan 20, 2013
Tejas
The ada2 disk has 10 pending sectors on it (attribute 197, Current_Pending_Sector). These are sectors the drive had trouble reading; it will remap them if a later write to them fails.

As long as you are using redundancy you should be OK; the RAID will map those out on the next write. Keep an eye on that, though: if you start seeing that count grow daily, or start seeing offline uncorrectable sectors, it will be time to swap that drive out.

It is normal for drives, especially the large ones, to get a few medium errors over their life; that is why you want to use redundant RAID levels.

:cool:
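
For anyone who wants to watch these counters without reading the whole table, here is a minimal sketch of that check in shell. It assumes the `smartctl -A` column layout shown in the first post (raw value in the tenth column). The function name `flag_smart_attrs` is made up for illustration, and the choice of IDs 5, 196, 197 and 198 is simply the reallocation and pending-sector attributes, not an official list.

```shell
#!/bin/sh
# Sketch: read `smartctl -A` output on stdin and print the reallocation and
# pending-sector attributes whose raw value is not zero.
# flag_smart_attrs is a hypothetical helper name, not part of smartmontools.
flag_smart_attrs() {
    awk '
        # Attribute rows start with a numeric ID; RAW_VALUE is the 10th column.
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            id = $1 + 0
            raw = $10 + 0
            if ((id == 5 || id == 196 || id == 197 || id == 198) && raw > 0)
                printf "%s (ID %d): raw value %d\n", $2, id, raw
        }
    '
}

# Demo on three rows copied from the ada2 report in the first post.
flag_smart_attrs <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       10
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
EOF
# prints: Current_Pending_Sector (ID 197): raw value 10
```

On a live system the same function would be fed with `smartctl -A /dev/ada2 | flag_smart_attrs`.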
 

sapper6fd

Member
May 21, 2013
I'm running RAIDZ across the 4 drives right now. I've seen the error messages from my weekly SMART tests for the past few months; the count hasn't changed, though. I guess I'll continue to monitor it.
 

Mike

Member
May 29, 2012
EU
Good chance that on overwriting those sectors the pending sector count will drop.
 

bwillcox

Member
Jan 20, 2013
Tejas
Yup. It'll likely move the pending sectors to the reallocated event count, which means that the drive successfully reallocated the sector with the medium error and replaced it with a spare sector from its spare pool.

Sapper, you should be in good shape. Just make sure you're doing a weekly scrub of that zpool and ZFS will take care of your data.

-b-
 

sapper6fd

Member
May 21, 2013
Good to know.

A weekly scrub, you say, bwillcox? I've been running monthly scrubs, but I'll switch to weekly.
 

ColdCanuck

Member
Jul 23, 2013
Halifax NS
WD firmware has been known to lie, especially on the Green drives. I've not had any bad Red drives yet, so I can't comment on those, but I have had three different Greens give me pending sectors which **do not** turn into reallocated sectors when written; they just silently disappear. If you hadn't been watching you would never know.

In all cases the drive died within 3 to 6 months of starting to do this. So be warned and watchful.

In addition, you should do nightly short SMART self-tests and weekly long tests. This might at least catch the bad sectors in the test logs. But the best defense is a ZFS scrub, which will catch the error, hopefully while you have enough disk left to replace it.

Time to bone up on the methods of ZFS disk replacement if you are not already acquainted with them. Hint: you can make very small file-backed test pools to learn the syntax on your particular flavour of ZFS. (Yes, that's flavour with a U and ZFS with a zed :0)
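
A minimal sketch of such a practice run, assuming a system with the standard `zpool` and `truncate` commands. The pool name `practicepool`, the directory `/tmp/zfs-practice`, the file names and the 128M file size are arbitrary choices for illustration. By default the script only prints the commands; setting `DRY_RUN=0` and running it as root would execute them for real.

```shell
#!/bin/sh
# Sketch: practise replacing a "disk" in a throwaway, file-backed raidz pool.
# Pool name, directory and sizes are hypothetical. Commands are only printed
# unless DRY_RUN=0 is set, because the real ones need root and ZFS.
DRY_RUN="${DRY_RUN:-1}"
POOL="practicepool"
DIR="/tmp/zfs-practice"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

practice_replace() {
    run mkdir -p "$DIR"
    # Four pool members plus one file that stands in for the new drive.
    for n in 0 1 2 3 spare; do
        run truncate -s 128M "$DIR/disk-$n.img"
    done
    # File vdevs must be given as absolute paths.
    run zpool create "$POOL" raidz \
        "$DIR/disk-0.img" "$DIR/disk-1.img" "$DIR/disk-2.img" "$DIR/disk-3.img"
    run zpool status "$POOL"
    # Swap the second member for the stand-in replacement.
    run zpool replace "$POOL" "$DIR/disk-1.img" "$DIR/disk-spare.img"
    run zpool status "$POOL"
    # Clean up the practice pool and its backing files.
    run zpool destroy "$POOL"
    run rm -r "$DIR"
}

practice_replace
```

Reading the dry-run output first is a cheap way to check the device arguments before pointing the same `zpool replace` syntax at a real pool.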
 

bwillcox

Member
Jan 20, 2013
Tejas
All of the ZFS info I've seen recommends weekly scrubbing of arrays composed of consumer-grade disks. It recommends biweekly for enterprise SATA and monthly for SAS. The Reds do have TLER but are otherwise a Green mechanism, so they count as consumer drives.

I have also been following this schedule for my MD arrays.

I also run daily short self-tests and weekly long self-tests on my drives with smartd, in addition to regular on-site and semi-regular off-site backups.
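
For reference, a schedule like that can be expressed as `smartd.conf` lines using the `-s` (self-test schedule) and `-m` (mail) directives. The sketch below only prints candidate lines so they can be reviewed first. The function name `smartd_lines`, the e-mail address, and the 02:00 daily / 03:00 Saturday slots are placeholders, not my actual settings, and appliance systems such as FreeNAS may generate this file themselves, so treat it as an illustration.

```shell
#!/bin/sh
# Sketch: print one smartd.conf line per device.
#   -a  monitor all SMART attributes
#   -s  self-test schedule regex, format T/MM/DD/d/HH:
#         S/../.././02  short test every day in the 02:00 hour
#         L/../../6/03  long test every Saturday in the 03:00 hour
#   -m  address that smartd mails warnings to
# smartd_lines is a hypothetical helper name; the address is a placeholder.
smartd_lines() {
    mail_to="$1"
    shift
    for dev in "$@"; do
        printf '%s -a -s (S/../.././02|L/../../6/03) -m %s\n' "$dev" "$mail_to"
    done
}

# Device names taken from the first post.
smartd_lines admin@example.com /dev/ada1 /dev/ada2 /dev/ada3 /dev/ada4
```

The results of the scheduled tests end up in the drive's self-test log, which `smartctl -l selftest /dev/ada2` displays.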

You might call me paranoid, but there are two kinds of people. Those that have yet to lose data, and those that have lost data the hard way.

-b
 

omniscence

New Member
Nov 30, 2012
ColdCanuck said:
"WD firmware has been known to lie, especially on the green drives. I've not had any bad Red drives yet so I can't comment on those but I have had three different greens give me pending sectors which **do not** turn into reallocated sectors when written, they just silently disappear. If you hadn't been watching you would never know."

This is actually expected behaviour. A pending sector is just a sector whose checksum does not match its data, so its contents are invalid. A write to such a sector replaces the invalid data with valid data and decreases the pending sector count. A reallocated sector means that the drive failed to write to the original location. You can artificially create a pending sector by pulling the power plug during write operations.
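
To make the two outcomes concrete, here is a rough sketch that compares two readings of the pending count (attribute 197) and the reallocated count (attribute 5) taken some time apart. The function name and the wording of the messages are illustrative assumptions; the logic only encodes the distinction described above and cannot tell whether a rewritten sector is actually healthy.

```shell
#!/bin/sh
# Sketch: classify what happened between two SMART readings.
# Usage: classify_pending_change PREV_PENDING PREV_REALLOC CUR_PENDING CUR_REALLOC
# classify_pending_change is a hypothetical helper name.
classify_pending_change() {
    prev_pending="$1"
    prev_realloc="$2"
    cur_pending="$3"
    cur_realloc="$4"
    if [ "$cur_pending" -gt "$prev_pending" ]; then
        echo "pending count grew"
    elif [ "$cur_pending" -lt "$prev_pending" ] && [ "$cur_realloc" -gt "$prev_realloc" ]; then
        echo "pending sectors moved to spares (reallocated)"
    elif [ "$cur_pending" -lt "$prev_pending" ]; then
        echo "pending sectors cleared by rewrite in place, nothing reallocated"
    else
        echo "pending count unchanged"
    fi
}

# Example: 10 pending sectors drop to 0 while the reallocated count stays at 0.
classify_pending_change 10 0 0 0
# prints: pending sectors cleared by rewrite in place, nothing reallocated
```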
 

ColdCanuck

Member
Jul 23, 2013
Halifax NS
Oh it's not hard to create pending sectors, just use WD Greens :eek:)

It's not the behaviour I expect or want. In my experience these bad sectors do not just go away if you ignore them. They come back. Again and again until, if you are lucky, they fail SMART tests and you can get a warranty replacement.

Seagate, despite their many other faults, have firmware which spares out failing sectors much more aggressively. Their drives almost always map a pending sector to a reallocated position as soon as the drive knows the correct contents, i.e. on the next write to that sector. I much prefer this to WD's behaviour of simply writing the new data to the old, questionable sector and hoping for the best. After all, this sector did throw a read error; at the very least it is soft.

I have had WD Green drives which would never remap the sector, even after repeated errors on that sector. In many cases I had to pin a file over the bad area to keep from losing any more data until I could replace the drive. In the three cases I cited, the drive ultimately failed at the location of the pending sectors WITHOUT EVER REALLOCATING FROM the pool of spare sectors. Had it done so, the drive would not have failed, although I would not use it.

Not the behaviour I expect, YMMV :eek:)
 

wookienz

Member
Apr 2, 2012
I have 12TB on SATA drives in a RAIDZ2 array connected to three M1015s. The last scrub I did took 56 hours. Is this normal? I do them every 3-6 months... because I'm ignorant about these things... not any more.

How do I schedule smartctl to run daily as suggested and get the results emailed to me? As long as the short tests say passed, should I be happy?

Thanks.