WD100EMAZ strange noise


msg7086

Active Member
May 2, 2017
Just opened my Black Friday WD Easystore. (One month late because I've been lazy.)

After shucking it and putting it to use, I noticed random noise from the drive when reading/writing. It sounds to me like the head loses the track and tries to relocate itself. I installed Debian onto this drive and ran a few random apt-get installs; I heard the "head reset" noise a few times during reads and writes, and apt lags for a fraction of a second every time I hear it.

So I thought, screw it, let me scan this drive.

I set the timeout threshold to 300ms per cylinder, since reads are normally 10-40ms. I let it run overnight (20 hours so far), and today I see a whopping 800+ timed-out cylinders scattered all over the disk, and the count is still increasing.

0-60ms: 1,000,000+
60-200ms: 4,500+
200-300ms: 0
300ms+ (timeout): 850+

For those who have the same drive: do you observe similar issues? Do you hear random, frequent "head reset" noises when reading or writing? I want to make sure it's a problematic drive before asking for an exchange.

Note that I unmounted the partitions before doing the disk scans, so nothing else on the system should be interfering with the results.
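For anyone who wants to reproduce this kind of scan on Linux without any special tool, here is a rough shell sketch of the same idea: read the disk sequentially in chunks and flag any read slower than the 300ms threshold used above. The /dev/sdX name is a placeholder; run it as root and only against an unmounted drive.

#!/bin/bash
# Rough read-latency scan (a sketch, not the tool used above).
# Assumption: the target drive is /dev/sdX and is not mounted.
DEV=/dev/sdX
SECTORS=$(blockdev --getsz "$DEV")   # drive size in 512-byte sectors
STEP=2048                            # read 1 MiB at a time
for ((s = 0; s < SECTORS; s += STEP)); do
    t0=$(date +%s%N)
    dd if="$DEV" of=/dev/null bs=512 skip="$s" count="$STEP" iflag=direct status=none
    t1=$(date +%s%N)
    ms=$(( (t1 - t0) / 1000000 ))
    if (( ms > 300 )); then
        echo "slow read at sector $s: ${ms}ms"
    fi
done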
 

BLinux

cat lover server enthusiast
Jul 7, 2016
artofserver.com
That's a problem. I have 12 of those fully burned in, and they never make such noises. Really quiet, actually.
 

madbrain

Active Member
Jan 5, 2019
Which software are you using to scan the drives? I have 5 of these in my NAS in a ZFS RAIDZ2. I have seen some disconnects happen randomly, which cause a very short resilver.

I used h2testw under Windows to test the surface of each of them for nearly 24 hours and found no errors, though.
 

msg7086

Active Member
May 2, 2017
It's DiskGenius, made by a Chinese company. They also have an English version called PartitionGuru, which does basically the same thing. The free version is enough for scanning drives and can even fix logical bad sectors.

However, random disconnects don't sound like a disk surface / bad sector issue. Do you observe the disconnections on a single drive or on random drives?
 

madbrain

Active Member
Jan 5, 2019
Seems to be random. It's mostly on bootup. Sometimes the ZFS volume won't import automatically, but it will import manually. I think some disks are taking longer than others to spin up; dmesg shows the drives being detected at different times across a roughly 10-second interval.
I don't think the surface is bad on any of them.
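For reference, the per-drive detection times can be pulled straight out of the kernel log; the timestamps in brackets are seconds since the kernel started:

# Show when each WD100EMAZ was detected during boot
dmesg | grep 'WDC WD100EMAZ'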
 

madbrain

Active Member
Jan 5, 2019
Pretty much every bootup there is an issue. SMART data looks OK on all drives, though the number of power-ups is wildly different between them, strangely.
I don't get any issues with suspend/resume. I just rebooted and now I have issues with 2 drives. Definitely different serial numbers than before, so it's not consistent.

madbrain@SERVER10G:~$ sudo tcsh
[sudo] password for madbrain:
SERVER10G:~# zpool status
no pools available
SERVER10G:~# zpool import array
SERVER10G:~# zpool status
pool: array
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: ZFS Message ID: ZFS-8000-9P
scan: resilvered 355M in 0h0m with 0 errors on Fri Jan 4 21:15:49 2019
config:

NAME          STATE     READ WRITE CKSUM
array         ONLINE       0     0     0
  raidz2-0    ONLINE       0     0     0
    sdc       ONLINE       0     0     0
    sde       ONLINE       0     0     2
    sda       ONLINE       0     0     0
    sdd       ONLINE       0     0     0
    sdf       ONLINE       0     0     0

errors: No known data errors
SERVER10G:~# zpool status
pool: array
state: ONLINE
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Tue Jan 8 18:40:51 2019
22.4G scanned out of 20.0T at 234M/s, 24h57m to go
8.75G resilvered, 0.11% done
config:

NAME          STATE     READ WRITE CKSUM
array         ONLINE       0     0     0
  raidz2-0    ONLINE       0     0     0
    sdc       ONLINE       0     0     0
    sde       ONLINE       0     0     2  (resilvering)
    sda       ONLINE       0     0     0
    sdd       ONLINE       0     0     0
    sdf       ONLINE       0     0     0  (resilvering)

errors: No known data errors
SERVER10G:~# zpool status

I don't think it will really take the 25 hours it says to resilver; it's already down to 18 hours after just a few minutes.
I have never had 2 drives resilver at a time before, though...
 

msg7086

Active Member
May 2, 2017
TBH, this sounds to me like a power supply issue. Either the PSU or a cable.
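A quick cross-check from the OS side, assuming smartmontools is installed and the drives are visible as /dev/sda ... /dev/sdg (placeholder names): UDMA_CRC_Error_Count tends to climb when a SATA data cable is marginal, while power problems usually show up as unexpected power-cycle or power-off-retract counts.

# Pull the SMART attributes most relevant to cable/power trouble from each drive
for d in /dev/sd[a-g]; do
    echo "== $d =="
    smartctl -A "$d" | grep -Ei 'Power_Cycle|Power-Off_Retract|UDMA_CRC|Reallocated|Pending'
done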
 

madbrain

Active Member
Jan 5, 2019
Thanks. The resilver of the two HDDs took 1h16m this time, the longest I have seen it take after these boot issues.

I noticed that one of the two LSI cards the disks are connected to had back-level firmware and BIOS, so I just updated them.
On the next (warm) boot after that, the ZFS pool still didn't automatically import, but there were no errors on manual import.
I then did a shutdown and a cold boot. Now two disks are missing and not even resilvering!

madbrain@SERVER10G:~$ zpool status
pool: array
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: ZFS Message ID: ZFS-8000-4J
scan: resilvered 673G in 1h16m with 0 errors on Tue Jan 8 19:57:30 2019
config:

NAME                     STATE     READ WRITE CKSUM
array                    DEGRADED     0     0     0
  raidz2-0               DEGRADED     0     0     0
    sdc                  ONLINE       0     0     0
    sdf                  ONLINE       0     0     0
    sda                  ONLINE       0     0     0
    8203626860017461470  FAULTED      0     0     0  was /dev/sde1
    167936804069715386   UNAVAIL      0     0     0  was /dev/sdg1

You may be right about the power issue. Time to open the beast and try running some of the drives off another PSU cable.

OTOH, fdisk -l shows that all five 10TB disks are powered up. Very weird.

dmesg isn't reporting any issues either.

[ 2.898946] mpt2sas_cm0: host_add: handle(0x0001), sas_addr(0x500605b006de0ba0), phys(8)
[ 3.029323] mpt2sas_cm1: host_add: handle(0x0001), sas_addr(0x500605b008f2c6b0), phys(8)
[ 3.051632] scsi 8:0:0:0: Direct-Access Generic- USB3.0 CRW -0 1.00 PQ: 0 ANSI: 0 CCS
[ 3.066013] scsi 8:0:0:1: Direct-Access Generic- USB3.0 CRW -1 1.00 PQ: 0 ANSI: 0 CCS
[ 3.080115] scsi 8:0:0:2: Direct-Access Generic- USB3.0 CRW -2 1.00 PQ: 0 ANSI: 0 CCS
[ 3.528055] scsi 7:0:0:0: Direct-Access ATA WDC WD100EMAZ-00 0A83 PQ: 0 ANSI: 6
[ 3.528057] scsi 7:0:0:0: SATA: handle(0x0009), sas_addr(0x4433221104000000), phy(4), device_name(0x0000000000000000)
[ 3.528058] scsi 7:0:0:0: enclosure logical id (0x500605b008f2c6b0), slot(7)
[ 3.528132] scsi 7:0:0:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[ 3.777924] scsi 7:0:1:0: Direct-Access ATA WDC WD100EMAZ-00 0A83 PQ: 0 ANSI: 6
[ 3.777926] scsi 7:0:1:0: SATA: handle(0x000a), sas_addr(0x4433221105000000), phy(5), device_name(0x0000000000000000)
[ 3.777927] scsi 7:0:1:0: enclosure logical id (0x500605b008f2c6b0), slot(6)
[ 3.778114] scsi 7:0:1:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[ 4.027902] scsi 7:0:2:0: Direct-Access ATA WDC WD100EMAZ-00 0A83 PQ: 0 ANSI: 6
[ 4.027904] scsi 7:0:2:0: SATA: handle(0x000b), sas_addr(0x4433221107000000), phy(7), device_name(0x0000000000000000)
[ 4.027905] scsi 7:0:2:0: enclosure logical id (0x500605b008f2c6b0), slot(4)
[ 4.027978] scsi 7:0:2:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[ 4.278127] scsi 7:0:3:0: Direct-Access ATA WDC WD100EMAZ-00 0A83 PQ: 0 ANSI: 6
[ 4.278129] scsi 7:0:3:0: SATA: handle(0x000c), sas_addr(0x4433221106000000), phy(6), device_name(0x0000000000000000)
[ 4.278130] scsi 7:0:3:0: enclosure logical id (0x500605b008f2c6b0), slot(5)
[ 4.278300] scsi 7:0:3:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[ 9.024095] mpt2sas_cm0: port enable: SUCCESS
[ 9.146894] scsi 0:0:0:0: Direct-Access ATA WDC WD100EMAZ-00 0A83 PQ: 0 ANSI: 6
[ 9.146897] scsi 0:0:0:0: SATA: handle(0x0009), sas_addr(0x4433221100000000), phy(0), device_name(0x0000000000000000)


 

madbrain

Active Member
Jan 5, 2019
I did another soft reboot and the disks showed up and resilvered automatically and instantly...

madbrain@SERVER10G:~$ zpool status
pool: array
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: ZFS Message ID: ZFS-8000-9P
scan: resilvered 172K in 0h0m with 0 errors on Tue Jan 8 23:28:30 2019
config:

NAME          STATE     READ WRITE CKSUM
array         ONLINE       0     0     0
  raidz2-0    ONLINE       0     0     0
    sdc       ONLINE       0     0     0
    sdf       ONLINE       0     0     0
    sda       ONLINE       0     0     0
    sde       ONLINE       0     0     3
    sdg       ONLINE       0     0     1

errors: No known data errors
madbrain@SERVER10G:~$

It seems there is a definite problem with the initial cold boot.

I had not seen the disks go completely missing and never show up until a reboot before, though.
 

madbrain

Active Member
Jan 5, 2019
I actually checked, and the disks spin up as soon as I press the power button from a power-off, so this isn't a problem with delayed spin-up. My Kill A Watt showed a peak of 189W during the drive spin-up; usage goes back down to 102W afterwards.
This time around, ZFS didn't automatically import on boot, but there were no problems on manual import. I can't seem to get the same results twice on boot...

I also noticed the device names for the disks aren't always the same, even when all 5 drives are present. Really weird.
 

msg7086

Active Member
May 2, 2017
With ZFS you are supposed to specify devices by a unique ID path, like /dev/disk/by-id/*. Could that be triggering the problem somehow?
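For reference, the stable names (and which sdX each one currently maps to) can be listed with a plain ls, nothing ZFS-specific:

# List stable device IDs and the kernel names they point at, skipping partition entries
ls -l /dev/disk/by-id/ | grep -v -- -part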
 

BLinux

cat lover server enthusiast
Jul 7, 2016
artofserver.com
Oh, I didn't know that. I followed the Ubuntu tutorial on ZFS, which instructs you to use the /dev names.

Setup a ZFS storage pool | Ubuntu tutorials

Is there any way to fix the pool to use different device names?
Export the pool:

# zpool export array

Now, import the pool while specifying the path:

# zpool import -d /dev/disk/by-id array

If that succeeds, the cache info should update, and the next time you can import without -d and it will use the cached block device paths.

I like to use /dev/disk/by-id for SATA drives, but for SAS, I prefer /dev/disk/by-path instead.
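If the pool still comes back under the old names after a reboot, one more knob that may help on ZFS-on-Linux systems (assuming your distro ships the usual /etc/default/zfs file; check your version) is telling the boot-time import scripts where to search for devices:

# /etc/default/zfs  (read by the ZoL import init/systemd scripts)
# Search by-id first when importing pools at boot:
ZPOOL_IMPORT_PATH="/dev/disk/by-id"

The next boot should then look under /dev/disk/by-id when importing the pool.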
 

madbrain

Active Member
Jan 5, 2019
Thanks. It succeeded.

root@SERVER10G:~# zpool status
pool: array
state: ONLINE
scan: resilvered 172K in 0h0m with 0 errors on Tue Jan 8 23:28:30 2019
config:

NAME                                      STATE     READ WRITE CKSUM
array                                     ONLINE       0     0     0
  raidz2-0                                ONLINE       0     0     0
    ata-WDC_WD100EMAZ-00WJTA0_JEGLJXAN    ONLINE       0     0     0
    wwn-0x5000cca267c7c89e                ONLINE       0     0     0
    wwn-0x5000cca267c8561f                ONLINE       0     0     0
    wwn-0x5000cca267c89be6                ONLINE       0     0     0
    wwn-0x5000cca267c78fd4                ONLINE       0     0     0

errors: No known data errors
root@SERVER10G:~#

It's a little odd that one device has a different name than the other 4. There are two LSI controllers and one Intel (on the motherboard). I thought I had attached all the HDDs to the LSIs, but they may not all be on the same LSI card. There is one 9207-8i and one 9207-4i4e.
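One way to check which controller each drive actually sits on is the by-path symlinks, which encode the PCI address of the HBA and the port behind it (again just ls, nothing ZFS-specific):

# Show which HBA and port each disk is attached to
ls -l /dev/disk/by-path/ | grep -v -- -part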
 

madbrain

Active Member
Jan 5, 2019
BLinux & msg7086, thank you. I think you solved my problems. No more inconsistencies on boot, warm or cold. The ZFS volume now always imports. And no more disconnects / bad checksums / resilvers so far.
 

BLinux

cat lover server enthusiast
Jul 7, 2016
artofserver.com
Yeah... in /dev/disk/by-id there are many symlinks that let you find the same block device in different ways. For SATA drives I much prefer the format:

ata-[BRAND]_[MODEL]-[SERIALNUMBER]

as that makes it easiest to confirm I'm pulling the correct drive when it's time to swap one. I don't know of a way to tell zpool to look only at ata-*, but one way I've used to force it onto the ata-* names is to delete all the wwn-0x* symlinks in /dev/disk/by-id/ and then do the "zpool import -d ...". That usually makes zpool find only the ata-* symlinks, and then everything is consistent. Once you reboot, the wwn-0x* symlinks get re-created anyway. (I'm sure there must be a command to re-create all the symlinks, but I don't know it.)

For SAS drives there are no symlinks in /dev/disk/by-id in the ata-* style, so I usually use /dev/disk/by-path instead; that tells me which HBA and port the faulty drive is on, which is enough to narrow it down to the physical HDD slot. I wish I knew how to have it make symlinks like:

sas-[BRAND]_[MODEL]-[SERIAL]

That would be nice... and I'm sure there is a way; I just don't know it.
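A sketch of that workflow with the pool name from this thread filled in; the udevadm call at the end is one way to get the deleted wwn-* symlinks back without waiting for a reboot (exact behaviour depends on your udev rules, so treat it as an assumption):

# Force the pool onto the ata-* names (run while nothing is using the pool)
zpool export array
rm /dev/disk/by-id/wwn-0x*            # symlinks only; udev will recreate them
zpool import -d /dev/disk/by-id array

# Recreate the deleted symlinks without rebooting
udevadm trigger --subsystem-match=block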
 

madbrain

Active Member
Jan 5, 2019
Thanks again. I followed your instructions, and now all of the drives in the pool show up with the ata-[BRAND]_[MODEL]-[SERIALNUMBER] format.
 

EffrafaxOfWug

Radioactive Member
Feb 12, 2015
As an aside... does ZFS not write its metadata to the drives, then...? Any number of things can transpire to change /dev/sdX to /dev/sdY, and I didn't think anything relied on persistent device names any more.

If that's the case, doesn't that also mean that if you use by-path IDs, the pool will fail to import if you move a drive to a different HBA or PCI slot?