My WD Red WD80EFAX HDD suddenly died last week: I shut down my Proxmox server, booted it up again, and the drive started "clicking". It kept clicking for a while, then stopped, and it has not done so since. I did not receive any SMART warnings ahead of time, and looking back at the attrlog file under /var/lib/smartmontools/, I don’t think there was anything to worry about there:
date | SMART attribute ID | current | raw |
2023-10-24 09:34:51 | 1 | 100 | 0 |
2023-10-24 09:34:51 | 2 | 128 | 116 |
2023-10-24 09:34:51 | 3 | 253 | 2031728 |
2023-10-24 09:34:51 | 4 | 99 | 6689 |
2023-10-24 09:34:51 | 5 | 100 | 0 |
2023-10-24 09:34:51 | 7 | 100 | 0 |
2023-10-24 09:34:51 | 8 | 128 | 18 |
2023-10-24 09:34:51 | 9 | 95 | 41823 |
2023-10-24 09:34:51 | 10 | 100 | 0 |
2023-10-24 09:34:51 | 12 | 100 | 2276 |
2023-10-24 09:34:51 | 22 | 100 | 100 |
2023-10-24 09:34:51 | 192 | 93 | 9251 |
2023-10-24 09:34:51 | 193 | 93 | 9251 |
2023-10-24 09:34:51 | 194 | 127 | 279174185011 |
2023-10-24 09:34:51 | 196 | 100 | 0 |
2023-10-24 09:34:51 | 197 | 100 | 0 |
2023-10-24 09:34:51 | 198 | 100 | 0 |
2023-10-24 09:34:51 | 199 | 200 | 0 |
Compare that with the first values recorded in that log file:
date | SMART attribute ID | current | raw |
2022-04-15 15:52:32 | 1 | 100 | 0 |
2022-04-15 15:52:32 | 2 | 128 | 116 |
2022-04-15 15:52:32 | 3 | 151 | 8617263560 |
2022-04-15 15:52:32 | 4 | 100 | 584 |
2022-04-15 15:52:32 | 5 | 100 | 0 |
2022-04-15 15:52:32 | 7 | 100 | 0 |
2022-04-15 15:52:32 | 8 | 128 | 18 |
2022-04-15 15:52:32 | 9 | 96 | 28636 |
2022-04-15 15:52:32 | 10 | 100 | 0 |
2022-04-15 15:52:32 | 12 | 100 | 557 |
2022-04-15 15:52:32 | 22 | 100 | 100 |
2022-04-15 15:52:32 | 192 | 99 | 1794 |
2022-04-15 15:52:32 | 193 | 99 | 1794 |
2022-04-15 15:52:32 | 194 | 144 | 279174185005 |
2022-04-15 15:52:32 | 196 | 100 | 0 |
2022-04-15 15:52:32 | 197 | 100 | 0 |
2022-04-15 15:52:32 | 198 | 100 | 0 |
2022-04-15 15:52:32 | 199 | 200 | 0 |
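For anyone who wants to repeat the comparison, here is a minimal Python sketch that diffs the two snapshots. It assumes the rows are available in the pipe-separated form shown above (the real attrlog file uses a different on-disk layout), and the names `parse_snapshot` and `diff_snapshots` are made up for illustration.

```python
FIRST = """
2022-04-15 15:52:32 | 1 | 100 | 0 |
2022-04-15 15:52:32 | 2 | 128 | 116 |
2022-04-15 15:52:32 | 3 | 151 | 8617263560 |
2022-04-15 15:52:32 | 4 | 100 | 584 |
2022-04-15 15:52:32 | 5 | 100 | 0 |
2022-04-15 15:52:32 | 7 | 100 | 0 |
2022-04-15 15:52:32 | 8 | 128 | 18 |
2022-04-15 15:52:32 | 9 | 96 | 28636 |
2022-04-15 15:52:32 | 10 | 100 | 0 |
2022-04-15 15:52:32 | 12 | 100 | 557 |
2022-04-15 15:52:32 | 22 | 100 | 100 |
2022-04-15 15:52:32 | 192 | 99 | 1794 |
2022-04-15 15:52:32 | 193 | 99 | 1794 |
2022-04-15 15:52:32 | 194 | 144 | 279174185005 |
2022-04-15 15:52:32 | 196 | 100 | 0 |
2022-04-15 15:52:32 | 197 | 100 | 0 |
2022-04-15 15:52:32 | 198 | 100 | 0 |
2022-04-15 15:52:32 | 199 | 200 | 0 |
"""

LAST = """
2023-10-24 09:34:51 | 1 | 100 | 0 |
2023-10-24 09:34:51 | 2 | 128 | 116 |
2023-10-24 09:34:51 | 3 | 253 | 2031728 |
2023-10-24 09:34:51 | 4 | 99 | 6689 |
2023-10-24 09:34:51 | 5 | 100 | 0 |
2023-10-24 09:34:51 | 7 | 100 | 0 |
2023-10-24 09:34:51 | 8 | 128 | 18 |
2023-10-24 09:34:51 | 9 | 95 | 41823 |
2023-10-24 09:34:51 | 10 | 100 | 0 |
2023-10-24 09:34:51 | 12 | 100 | 2276 |
2023-10-24 09:34:51 | 22 | 100 | 100 |
2023-10-24 09:34:51 | 192 | 93 | 9251 |
2023-10-24 09:34:51 | 193 | 93 | 9251 |
2023-10-24 09:34:51 | 194 | 127 | 279174185011 |
2023-10-24 09:34:51 | 196 | 100 | 0 |
2023-10-24 09:34:51 | 197 | 100 | 0 |
2023-10-24 09:34:51 | 198 | 100 | 0 |
2023-10-24 09:34:51 | 199 | 200 | 0 |
"""


def parse_snapshot(text):
    """Parse 'date | id | current | raw |' rows into {id: (current, raw)}."""
    rows = {}
    for line in text.strip().splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) < 4 or not cells[1].isdigit():
            continue  # skip a header row or anything malformed
        rows[int(cells[1])] = (int(cells[2]), int(cells[3]))
    return rows


def diff_snapshots(old, new):
    """Return {id: (old_pair, new_pair)} for attributes whose values differ."""
    shared = sorted(old.keys() & new.keys())
    return {attr: (old[attr], new[attr]) for attr in shared if old[attr] != new[attr]}


changes = diff_snapshots(parse_snapshot(FIRST), parse_snapshot(LAST))
for attr, (old, new) in changes.items():
    print(f"{attr:>3}: current {old[0]} -> {new[0]}, raw {old[1]} -> {new[1]}")
```

Only attributes 3, 4, 9, 12, 192, 193 and 194 differ between the two dates. The reallocation and pending-sector counters (5, 196, 197, 198) are zero in both snapshots.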
The HDD was connected through an external USB enclosure, so I first checked whether the problem persists with another USB enclosure, and unfortunately it does. This is what I see in dmesg:
Code:
[25343.421737] usb 2-3: new SuperSpeed USB device number 8 using xhci_hcd
[25343.442848] usb 2-3: New USB device found, idVendor=152d, idProduct=1561, bcdDevice= 1.04
[25343.442854] usb 2-3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[25343.442857] usb 2-3: Product: SABRENT
[25343.442858] usb 2-3: Manufacturer: SABRENT
[25343.442860] usb 2-3: SerialNumber: DB98765432143
[25343.446053] scsi host1: uas
[25343.446591] scsi 1:0:0:0: Direct-Access SABRENT 0104 PQ: 0 ANSI: 6
[25343.448532] sd 1:0:0:0: Attached scsi generic sg0 type 0
[25353.377987] sd 1:0:0:0: [sda] 1953506646 4096-byte logical blocks: (8.00 TB/7.28 TiB)
[25353.378144] sd 1:0:0:0: [sda] Write Protect is off
[25353.378147] sd 1:0:0:0: [sda] Mode Sense: 53 00 00 08
[25353.378427] sd 1:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[25353.378658] sd 1:0:0:0: [sda] Preferred minimum I/O size 32768 bytes
[25353.378662] sd 1:0:0:0: [sda] Optimal transfer size 268431360 bytes not a multiple of preferred minimum block size (32768 bytes)
[25384.996385] sd 1:0:0:0: [sda] tag#22 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD IN
[25384.996393] sd 1:0:0:0: [sda] tag#22 CDB: Read(10) 28 00 00 00 00 00 00 00 01 00
[25385.016413] scsi host1: uas_eh_device_reset_handler start
[25385.148590] usb 2-3: reset SuperSpeed USB device number 8 using xhci_hcd
[25385.174465] scsi host1: uas_eh_device_reset_handler success
[25417.783354] scsi host1: uas_eh_device_reset_handler start
[25417.783528] sd 1:0:0:0: [sda] tag#24 uas_zap_pending 0 uas-tag 1 inflight: CMD
[25417.783535] sd 1:0:0:0: [sda] tag#24 CDB: Read(10) 28 00 00 00 00 00 00 00 01 00
[25417.915763] usb 2-3: reset SuperSpeed USB device number 8 using xhci_hcd
[25417.937381] scsi host1: uas_eh_device_reset_handler success
[25450.530389] scsi host1: uas_eh_device_reset_handler start
[25450.530552] sd 1:0:0:0: [sda] tag#26 uas_zap_pending 0 uas-tag 1 inflight: CMD
[25450.530556] sd 1:0:0:0: [sda] tag#26 CDB: Read(10) 28 00 00 00 00 00 00 00 01 00
[25450.658774] usb 2-3: reset SuperSpeed USB device number 8 using xhci_hcd
[25450.680523] scsi host1: uas_eh_device_reset_handler success
[25453.039632] sd 1:0:0:0: [sda] tag#9 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=99s
[25453.039639] sd 1:0:0:0: [sda] tag#9 Sense Key : Aborted Command [current]
[25453.039641] sd 1:0:0:0: [sda] tag#9 Add. Sense: No additional sense information
[25453.039644] sd 1:0:0:0: [sda] tag#9 CDB: Read(10) 28 00 00 00 00 00 00 00 01 00
[25453.039646] I/O error, dev sda, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[25453.039650] Buffer I/O error on dev sda, logical block 0, async page read
[25483.301277] sd 1:0:0:0: [sda] tag#10 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD IN
[25483.301299] sd 1:0:0:0: [sda] tag#10 CDB: Read(10) 28 00 00 00 00 00 00 00 01 00
[25483.345279] scsi host1: uas_eh_device_reset_handler start
[25483.477571] usb 2-3: reset SuperSpeed USB device number 8 using xhci_hcd
[25483.499402] scsi host1: uas_eh_device_reset_handler success
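As a sanity check on the numbers in that log, the short Python sketch below decodes the Read(10) CDB that keeps timing out and recomputes the reported capacity. It assumes the standard READ(10) layout (opcode 0x28, 4-byte big-endian LBA in bytes 2-5, 2-byte transfer length in bytes 7-8); the function name `decode_read10` is just for illustration.

```python
def decode_read10(cdb_hex):
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes.

    Returns (lba, block_count).
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # transfer length in blocks
    return lba, blocks


# The CDB that repeats in every abort/reset cycle in the log.
lba, blocks = decode_read10("28 00 00 00 00 00 00 00 01 00")
print(f"READ(10): lba={lba}, blocks={blocks}")

# Capacity as reported by the sd driver: block count times block size.
size_bytes = 1953506646 * 4096
print(f"{size_bytes / 1e12:.2f} TB, {size_bytes / 2**40:.2f} TiB")
```

Every failing command is therefore the same request: read a single 4096-byte block at LBA 0, which matches the "I/O error, dev sda, sector 0" line. The block count and block size also multiply out to the 8.00 TB / 7.28 TiB shown in the log.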
While the disk appears to report its capacity (7.28 TiB), I cannot get smartctl to show anything at all: it gets stuck at -c, -i and, obviously, -a. The disk does, however, "tick" rhythmically and rather quietly while smartctl remains stuck, but it is not the "clicking" sound.
I also tried connecting it via SATA through a borrowed PCI extension card, since my server is a Lenovo Tiny and does not come with a regular SATA connector. There, I kept getting 'sata link down' errors, although I cannot be 100% sure it wasn’t due to the PCI extension card itself, since I didn’t have another disk to test with it and rule out a false negative. I'll see if I can test it again in some other system just to be sure.
Lastly, I removed the PCB and did not see any immediate damage to it. I also cleaned it up a bit, but that didn't do anything.
At this point I am wondering whether smartctl failing to report anything and the SATA link errors could be indicative of a PCB failure. It would be an odd one, since the disk *does* spin up and partially report itself via the USB enclosure, so it's not *completely* broken.
I am slightly confused because I am not sure whether I should go through the effort of swapping the PCB. The replacement PCBs for this model are readily available on AliExpress at a reasonable price, but it would take quite some work to re-solder the BIOS SMD chip.
P.S. That HDD contained backups only, so nothing critical, but I would still prefer to retain the data. I was also going to set up RAID for it (the main SSD with the OS is already in RAID1); it just wasn't a priority.