Hi all
I recently started using smartmontools, installed via the CSWsmartmontools package on my Solaris 11.3 server. I am mainly interested in it for monitoring the temperature of my 24 HDDs, in particular the 12 in my external JBOD chassis, which doesn't have much airflow between the drives and the fans (and where I am trying to use quieter fans if possible).
smartmontools provides a daemon (smartd) and a CLI (smartctl). Both work great, but I have noticed something odd. Whenever I use smartctl to get info on a disk, and whenever smartd runs its periodic check (once per hour, I think), I see errors recorded against the disks.
I cannot yet see any problem caused by these errors. If I run smartctl during heavy disk activity (e.g. a zpool scrub), the activity does not appear to be affected or interrupted, and nothing gets logged in /var/adm/messages. So currently the only symptom is the error count shown in iostat.
Here is a simple way to demonstrate the issue. In this example a disk has 0 errors, then I run smartctl, and then it has 1 error, shown as a "Recoverable" soft error:
Code:
root@magrathea:~# iostat -xen c0t5000C5007A13ED4Dd0 && \
smartctl -H /dev/rdsk/c0t5000C5007A13ED4Dd0 && \
iostat -xen c0t5000C5007A13ED4Dd0
                            extended device statistics       ---- errors ---
    r/s    w/s    kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
 2576.7    2.0 42241.0   10.3  0.0  0.7    0.0    0.3   0  25   0   0   0   0 c0t5000C5007A13ED4Dd0
smartctl 6.5 2016-05-07 r4318 [i386-pc-solaris2.10] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
                            extended device statistics       ---- errors ---
    r/s    w/s    kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
 2576.6    2.0 42239.3   10.3  0.0  0.7    0.0    0.3   0  25   1   0   0   1 c0t5000C5007A13ED4Dd0
root@magrathea:~# iostat -E c0t5000C5007A13ED4Dd0
sd44 Soft Errors: 1 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST2000DM001-1ER1 Revision: CC25 Serial No: Z4Z1SE37
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 1
Illegal Request: 0 Predictive Failure Analysis: 0 Non-Aligned Writes: 0
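For anyone who wants to reproduce this, the before/after comparison could be scripted. This is only a rough sketch that assumes the `iostat -E` output format shown above; `soft_errors` is a helper name made up for illustration, not part of smartmontools or Solaris.

```shell
# Hypothetical helper: print the "Soft Errors" count found in `iostat -E`
# output read from stdin. Assumes a header line of the form
#   sd44 Soft Errors: 1 Hard Errors: 0 Transport Errors: 0
soft_errors() {
    awk '/Soft Errors:/ {
        for (i = 1; i < NF; i++)
            if ($i == "Soft" && $(i + 1) == "Errors:") { print $(i + 2); exit }
    }'
}

# Usage on the Solaris box (disk name taken from the example above):
#   before=$(iostat -E c0t5000C5007A13ED4Dd0 | soft_errors)
#   smartctl -H /dev/rdsk/c0t5000C5007A13ED4Dd0 >/dev/null
#   after=$(iostat -E c0t5000C5007A13ED4Dd0 | soft_errors)
#   echo "soft errors: $before -> $after"
```

If the counter goes up by one each time, that would match what I am seeing by hand.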
But I am worried that if I let these errors build up over time, the fault manager (the one administered with fmadm) might take the disk out of service. I could probably test that on a scratch disk by running smartctl over and over until there are hundreds of errors.
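A test like that could look roughly like the sketch below. The `repeat_cmd` helper and the `<scratch-disk>` placeholder are made up for illustration, and I am only assuming that a few hundred runs would be enough to trigger anything.

```shell
# Hypothetical helper: run a command N times, discarding its output and
# ignoring its exit status (smartctl can exit non-zero for reasons unrelated
# to this test).
repeat_cmd() {
    n=$1
    shift
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" >/dev/null 2>&1 || true
        i=$((i + 1))
    done
}

# Usage on a scratch disk only (<scratch-disk> is a placeholder):
#   repeat_cmd 500 smartctl -H /dev/rdsk/<scratch-disk>
#   iostat -E <scratch-disk>     # check how far the soft error count has risen
#   fmadm faulty                 # check whether anything has been flagged as faulty
```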
Maybe that won't be an issue. But it's still not ideal getting fake errors showing in iostat - it means I will always have errors on all disks, which might mean I miss real errors.
Has anyone seen this before, or does anyone know why it might be happening? I will likely contact the smartmontools developers and/or the CSWsmartmontools package maintainers, but I thought I'd ask here first in case anyone has experienced it.
Thanks