smartmontools / smartctl causing software errors on disks whenever used?

TheBloke

Active Member
Feb 23, 2017
Brighton, UK
Hi all

I recently started using smartmontools, installed via the CSWsmartmontools package on my Solaris 11.3 server. I mainly want it for monitoring the temperature of my 24 HDDs, especially the 12 in my external JBOD chassis, which doesn't have much airflow between the drives and the fans (and where I am trying to use quieter fans if possible).
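As a rough sketch of the temperature check described above: `smartctl -A` prints the SMART attribute table, and on many ATA drives attribute 194 (Temperature_Celsius) carries the temperature at the start of its raw-value column. The helper name `temp_from_attrs` is made up for this example, and the attribute ID and column position are assumptions that vary between drive vendors, so check your own drive's output first.

```shell
# Hypothetical helper: read "smartctl -A" output on stdin and print the
# raw value of attribute 194 (assumed here to be the drive temperature).
temp_from_attrs() {
    # Column 1 is the attribute ID; column 10 is the start of RAW_VALUE.
    awk '$1 == 194 { print $10; exit }'
}
```

It would be used as `smartctl -A /dev/rdsk/c0t5000C5007A13ED4Dd0 | temp_from_attrs`. Keeping the parsing separate from the smartctl call makes it easy to try against saved output.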

smartmontools provides a daemon (smartd) and a CLI (smartctl). Both work great. But I have noticed something odd. Whenever I use smartctl to get info on a disk, and whenever smartd runs its periodic check (once per hour I think), I see errors recorded against the disks.

Here is a simple way to demonstrate the issue. In this example a disk has 0 errors, then I run smartctl, and then it has 1 error, shown as a "Recoverable" software error:
Code:
root@magrathea:~# iostat -xen c0t5000C5007A13ED4Dd0 && \
smartctl -H /dev/rdsk/c0t5000C5007A13ED4Dd0 && \
iostat -xen c0t5000C5007A13ED4Dd0
                            extended device statistics       ---- errors ---
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
 2576.7    2.0 42241.0   10.3  0.0  0.7    0.0    0.3   0  25   0   0   0   0 c0t5000C5007A13ED4Dd0
smartctl 6.5 2016-05-07 r4318 [i386-pc-solaris2.10] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

                            extended device statistics       ---- errors ---
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
 2576.6    2.0 42239.3   10.3  0.0  0.7    0.0    0.3   0  25   1   0   0   1 c0t5000C5007A13ED4Dd0
root@magrathea:~# iostat -E c0t5000C5007A13ED4Dd0
sd44      Soft Errors: 1 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST2000DM001-1ER1 Revision: CC25 Serial No: Z4Z1SE37
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 1
Illegal Request: 0 Predictive Failure Analysis: 0 Non-Aligned Writes: 0
I cannot yet see any problem caused by these errors. If I run smartctl during heavy disk activity (e.g. a zpool scrub), there is no sign of the activity being affected or interrupted. Nothing gets logged in /var/adm/messages. So currently, the only symptom is the error shown in iostat.
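To compare the counter before and after a check from a script, the "Soft Errors" figure can be pulled out of the `iostat -E` output shown above. This is only a sketch: the function name `soft_errors` is hypothetical, and it assumes the first line of output has the same "Soft Errors: N" layout as in the example.

```shell
# Hypothetical helper: read "iostat -E <device>" output on stdin and
# print the number that follows "Soft Errors:" on the first matching line.
soft_errors() {
    awk '{
        for (i = 1; i <= NF - 2; i++) {
            if ($i == "Soft" && $(i + 1) == "Errors:") {
                print $(i + 2)
                exit
            }
        }
    }'
}
```

For example, `iostat -E c0t5000C5007A13ED4Dd0 | soft_errors` run before and after a smartctl call should show whether the counter moved.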

But I am worried that if I let these errors build up over time, maybe fmadm will take the disk out of service? I could probably test that using a scratch disk by running smartctl over and over until there are hundreds of errors.
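The scratch-disk test could be sketched roughly like this. The function name `repeat_health_check` and the `SMARTCTL` override variable are invented for this example; the only smartctl option used is `-H`, as in the demonstration above.

```shell
# Hypothetical helper: run "smartctl -H" against one device N times and
# print how many runs were made. SMARTCTL can be set to another command
# for a dry run; it defaults to the smartctl found in PATH.
repeat_health_check() {
    dev=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        "${SMARTCTL:-smartctl}" -H "$dev" >/dev/null
        i=$((i + 1))
    done
    echo "$i"
}
```

Something like `repeat_health_check /dev/rdsk/<scratch-disk> 500`, followed by `iostat -E` and `fmadm faulty`, should show whether the accumulated soft errors trigger any fault handling.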

Maybe that won't be an issue. But it's still not ideal getting fake errors showing in iostat - it means I will always have errors on all disks, which might mean I miss real errors.

Has anyone seen this before, or does anyone know why it might be happening? I will likely contact the smartmontools developers and/or the CSWsmartmontools package maintainers, but I thought I'd ask here first in case anyone has experienced it.

Thanks
 

gea

Well-Known Member
Dec 31, 2010
DE
This iostat behaviour (the soft error counter going up by one on every check) has been a "known problem" for smartmontools on Solarish for years. Nothing to worry about.
 