[Solved] Soft Errors with Napp-It 18.06+ and Solaris 11.4

Discussion in 'Solaris, Nexenta, OpenIndiana, and napp-it' started by optimans, Mar 4, 2019.

  1. optimans

    optimans New Member

    Joined:
    Feb 20, 2015
    Messages:
    14
    Likes Received:
    2
    Hey All/Gea,

    I'm having a slight issue with the latest Napp-It packages, 18.06 and above. Each time Napp-It gets information from the SAS cards (2x 9211-8i), soft errors appear on the connected drives, and the count increases as I use the web interface. No errors are reported in dmesg or the logs. (The soft errors are present even without using smartmontools.)

    If I downgrade to 18.01 or below, no errors are reported.

    Does anyone know what may have changed in the newer builds?

    Thanks
     
    #1
  2. gea

    gea Well-Known Member

    Joined:
    Dec 31, 2010
    Messages:
    2,096
    Likes Received:
    674
    There is no known general problem, but napp-it uses several tools to gather disk information, and any of them could be the cause:

    iostat (sort of inventory of all seen disks and errors from bootup)
    format (currently connected disks)
    parted -lm (known partitions)

    I do not expect these tools themselves to cause iostat messages/errors;
    errors are more likely due to other problems related to the HBA, disks, cabling, power etc.
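
    To watch the counters that iostat keeps since bootup, the header lines of `iostat -En` can be parsed. The following is a minimal sketch, assuming the usual "Soft Errors: N Hard Errors: N Transport Errors: N" layout of that output; the function name `parse_error_counters` and the sample text are made up for illustration and are not part of napp-it.

```python
import re

# Matches the per-device header line printed by `iostat -En`, e.g.
# "c0t5000C500A1B2C3D4d0 Soft Errors: 12 Hard Errors: 0 Transport Errors: 0"
HEADER = re.compile(
    r"^(?P<dev>\S+)\s+Soft Errors:\s*(?P<soft>\d+)\s+"
    r"Hard Errors:\s*(?P<hard>\d+)\s+Transport Errors:\s*(?P<trn>\d+)"
)

def parse_error_counters(text):
    """Return {device: {"soft": n, "hard": n, "transport": n}}."""
    counters = {}
    for line in text.splitlines():
        m = HEADER.match(line.strip())
        if m:
            counters[m.group("dev")] = {
                "soft": int(m.group("soft")),
                "hard": int(m.group("hard")),
                "transport": int(m.group("trn")),
            }
    return counters

# Made-up sample in the usual `iostat -En` layout (not taken from this thread).
SAMPLE = """\
c0t5000C500A1B2C3D4d0 Soft Errors: 12 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: EXAMPLE DISK     Revision: 0001 Serial No: XXXXXXXX
Size: 4000.79GB <4000787030016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 12 Predictive Failure Analysis: 0
c0t5000C500A1B2C3D5d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
"""

if __name__ == "__main__":
    for dev, c in parse_error_counters(SAMPLE).items():
        print(dev, c["soft"], c["hard"], c["transport"])
```

    On a live system the text could come from `subprocess.run(["iostat", "-En"], capture_output=True, text=True).stdout` instead of the sample.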


    In menu Disks > Location napp-it can use sas2ircu and sas3ircu
    to request the location of a disk with a WWN in a backplane

    As these tools are built for Solaris, they can produce warnings on Illumos.
    Did you see the soft errors when calling Disks > Location?

    Or try sas2ircu/sas3ircu or lsiutil.i386 at the console.
    If they are installed, they are in /var/web-gui/_my/tools/

    You can disable these tools in menu Disks > Location > select sas2/3ircu.
    Disk location is then done by the driver alone (no option to enable the red alert for a disk).


    Smartmontools is also a possible reason for soft errors.
    Check Disks > Smartinfo


    The last possibility would be firmware or disk compatibility problems.
    What firmware is on your LSI 9211?
     
    #2
  3. optimans

    optimans New Member

    Joined:
    Feb 20, 2015
    Messages:
    14
    Likes Received:
    2
    Hi Gea,

    Thanks for the timely response.

    I think I may have found the culprit: smartmontools

    I had a look at agent-get-diskvalues.pl and it says: # script is called every 45s from agents during sessions (session-age max 60s)

    Looking at the /tmp/nappit folder, I can see that smart_data.smart and smart_last.smart are being updated every minute.

    It looks like it is polling SMART info every minute while I'm logged into the web interface, which is why every disk has the same soft error count.

    When I activate 18.01, there are no smart files in /tmp/nappit and no soft errors while in the web interface.
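
    The update interval of those files can be checked from their modification times. This is only a rough sketch: the two paths are the ones named above, while the helper names `file_age` and `updated_within` are hypothetical and not part of napp-it.

```python
import os
import time

# Files named in this thread; they live in napp-it's /tmp/nappit folder.
SMART_FILES = [
    "/tmp/nappit/smart_data.smart",
    "/tmp/nappit/smart_last.smart",
]

def file_age(path, now=None):
    """Seconds elapsed since `path` was last modified."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path)

def updated_within(path, seconds, now=None):
    """True if `path` was modified during the last `seconds` seconds."""
    return file_age(path, now) <= seconds

if __name__ == "__main__":
    for path in SMART_FILES:
        if os.path.exists(path):
            print(path, "age: %.0fs" % file_age(path))
        else:
            print(path, "not present")
```

    If the age stays under about 60 seconds while a web session is open, and the files are absent under 18.01, that would be consistent with the background polling described here.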

    Is there a way to stop the script from fetching SMART info each time it repeats (every 60 seconds)?

    Thanks


    Info:
    9211-8i using version 20.00.07.00-IT firmware on both HBAs.
    sas2ircu version 20.00.00.00
    PCI passthrough via ESXi 6.7U1 VM14
     
    #3
  4. TRACKER

    TRACKER New Member

    Joined:
    Jan 14, 2019
    Messages:
    18
    Likes Received:
    3
    Hello,
    this is a well-known issue caused by smartmontools.

    When I compiled smartmontools from source, it didn't give all SMART parameters, showing only temperature and actual status (OK/NOT-OK), but no soft errors were generated :)

    P.S. I use Solaris 11.3 and 11.4
     
    #4
  5. optimans

    optimans New Member

    Joined:
    Feb 20, 2015
    Messages:
    14
    Likes Received:
    2
    Hi TRACKER,

    What version are you using? I've got 6.6 installed as per nappit installer.
     
    #5
  6. TRACKER

    TRACKER New Member

    Joined:
    Jan 14, 2019
    Messages:
    18
    Likes Received:
    3
    I have both:

    /opt/csw/sbin/smartctl -v
    smartctl 6.5 2016-05-07 r4318 [i386-pc-solaris2.10] (local build)

    /usr/sbin/smartctl -v
    smartctl 6.6 2017-11-05 r4594 [i386-pc-solaris2.11] (local build)

    I don't currently have the version number of the self-compiled smartmontools that doesn't generate soft errors.

    Anyway, I don't think you should consider this an issue. I remember finding a thread on some internet forum where this was discussed, and basically it is OK to have those soft errors.
    I also found this one :)

    Script to reset the iostat errors counters (hard/soft/trn) without reboot – The Geek Diary
     
    #6
  7. optimans

    optimans New Member

    Joined:
    Feb 20, 2015
    Messages:
    14
    Likes Received:
    2
    Thanks for that.

    I knew the soft errors were generated by smartmontools; what had me worried was that the errors kept increasing when I hadn't requested any SMART info.

    Will look into the reset script.

    Cheers
     
    #7
  8. gea

    gea Well-Known Member

    Joined:
    Dec 31, 2010
    Messages:
    2,096
    Likes Received:
    674
    If you find that soft errors are triggered by smartmontools (every call of smartmontools increases the soft error count) and that the rapidly increasing number is due to the acceleration function that calls smartmontools in the background, then you can

    use menu Services > ACC (you may need the default "en" menu set) and disable smart for acceleration. The soft errors then increase only when you check smart manually.
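
    Whether a change like this takes effect can be confirmed by comparing two snapshots of the per-disk soft error counters, for example taken a few minutes apart with the web interface open. A minimal sketch, assuming the counters have already been collected into plain dictionaries; the helper `soft_error_deltas` and the device names are hypothetical.

```python
def soft_error_deltas(before, after):
    """Return {device: increase} for devices whose soft error count grew.

    `before` and `after` map device names to soft error counts.
    A device missing from `before` is treated as having started at 0.
    """
    deltas = {}
    for dev, count in after.items():
        grew = count - before.get(dev, 0)
        if grew > 0:
            deltas[dev] = grew
    return deltas

if __name__ == "__main__":
    # Made-up numbers: every disk gains one soft error between snapshots,
    # the pattern expected if one SMART query per disk ran in between.
    before = {"c0t0d0": 4, "c0t1d0": 4}
    after = {"c0t0d0": 5, "c0t1d0": 5}
    print(soft_error_deltas(before, after))
```

    An empty result between two snapshots would suggest that nothing queried the disks in a way that bumps the counters during that period.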
     
    #8
  9. optimans

    optimans New Member

    Joined:
    Feb 20, 2015
    Messages:
    14
    Likes Received:
    2
    Hi Gea,

    Disabling the smart in ACC has solved the issue.

    Thanks for your help!
     
    #9
  10. gea

    gea Well-Known Member

    Joined:
    Dec 31, 2010
    Messages:
    2,096
    Likes Received:
    674
    I have compiled smartmontools 7.0 on Solaris 11.4.
    Still the same behaviour (the soft error counter increases on each run).

    Illumos (OmniOS, OI) does not show this behaviour.
     
    #10
  11. TRACKER

    TRACKER New Member

    Joined:
    Jan 14, 2019
    Messages:
    18
    Likes Received:
    3
    I will try to find the version that doesn't generate soft errors (on Saturday; I don't have much time these days) and will update you here.
     
    #11
  12. TRACKER

    TRACKER New Member

    Joined:
    Jan 14, 2019
    Messages:
    18
    Likes Received:
    3
    Hi again,

    the version I used in the past, which did not generate soft errors, is:

    /usr/local/bin/smartctl -v
    smartctl 5.42 2011-10-20 r3458 [i386-pc-solaris2.11] (local build)
    Copyright (C) 2002-11 by Bruce Allen, smartmontools

    As I mentioned, it shows only the SMART status (OK/not OK) and the temperature of the drive.
     
    #12
  13. gea

    gea Well-Known Member

    Joined:
    Dec 31, 2010
    Messages:
    2,096
    Likes Received:
    674
    #13
    Last edited: Mar 8, 2019