[Solved] Soft Errors with Napp-It 18.06+ and Solaris 11.4

optimans

Member
Feb 20, 2015
56
55
18
Hey All/Gea,

I'm having a slight issue with the latest Napp-It packages (18.06 and above). Each time Napp-It gets information from the SAS cards (2x 9211-8i), soft errors appear on the connected drives, and the count increases as I use the web interface. No errors are reported in dmesg or the logs. (The soft errors appear even when I am not using smartmontools.)

If I downgrade to 18.01 and below, there are no errors reported.

Does anyone know what may have changed in the newer builds?

Thanks
 

gea

Well-Known Member
Dec 31, 2010
2,502
842
113
DE
There is no known general problem, but napp-it uses several tools to gather disk information, and any of them could be the cause:

iostat (sort of inventory of all seen disks and errors from bootup)
format (currently connected disks)
parted -lm (known partitions)

I do not expect iostat messages/errors to be caused by these tools themselves,
only by other problems related to the HBA, disks, cabling, power, etc.
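
To watch the counters from the console rather than the web interface, the per-disk summary printed by `iostat -En` can be parsed. The following is a minimal Python sketch, not part of napp-it: the function name `parse_soft_errors` and the sample text (including the device names) are made up for illustration, and the sample only mimics the usual `iostat -En` layout.

```python
import re

# Matches the per-device header line printed by `iostat -En`, e.g.
# "c0t5000C500A1B2C3D4d0 Soft Errors: 12 Hard Errors: 0 Transport Errors: 0"
HEADER = re.compile(
    r"^(?P<dev>\S+)\s+Soft Errors:\s*(?P<soft>\d+)\s+"
    r"Hard Errors:\s*(?P<hard>\d+)\s+Transport Errors:\s*(?P<trn>\d+)"
)

def parse_soft_errors(iostat_text):
    """Return {device: (soft, hard, transport)} from `iostat -En` text."""
    counts = {}
    for line in iostat_text.splitlines():
        m = HEADER.match(line.strip())
        if m:
            counts[m["dev"]] = (int(m["soft"]), int(m["hard"]), int(m["trn"]))
    return counts

# Made-up sample in the iostat -En layout (device names are hypothetical).
SAMPLE = """\
c0t5000C500A1B2C3D4d0 Soft Errors: 12 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: EXAMPLE-DISK     Revision: 0001 Serial No: XXXX
Size: 4000.79GB <4000787030016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c0t5000C500A1B2C3D5d0 Soft Errors: 12 Hard Errors: 0 Transport Errors: 0
"""

if __name__ == "__main__":
    for dev, (soft, hard, trn) in parse_soft_errors(SAMPLE).items():
        print(dev, "soft:", soft, "hard:", hard, "transport:", trn)
```

On a real system the text would come from running `iostat -En` and capturing its output instead of using `SAMPLE`.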


In menu Disks > Location napp-it can use sas2ircu and sas3ircu
to request the location of a disk with a WWN in a backplane

As these tools are built for Solaris, they can produce warnings on Illumos.
Have you seen the soft errors when calling Disks > Location?

Or try sas2ircu/sas3ircu or lsiutil.i386 at console
When they are installed, they are in /var/web-gui/_my/tools/

You can disable these tools in menu Disks > Location > select sas2/3ircu.
Disk location is then done by the driver alone (no option to enable a red alert disk)


Smartmontools is also a possible reason for soft errors.
Check Disks > Smartinfo


The last possibility would be firmware or disk compatibility problems.
What firmware is on your LSI 9211?
 

optimans

Hi Gea,

Thanks for the timely response.

I think I may have found the culprit: smartmontools

I had a look at agent-get-diskvalues.pl and it says: # script is called every 45s from agents during sessions (session-age max 60s)

Looking at the /tmp/nappit folder, I can see that smart_data.smart and smart_last.smart are being updated every minute.

It looks like it is polling SMART info every minute while I am logged into the web interface, which is why every disk has the same soft error count.

When I activate 18.01, there are no smart files in /tmp/nappit and no soft errors while in the web interface.
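
For anyone who wants to repeat this check, here is a minimal Python sketch that lists how long ago each `.smart` file was last modified. The folder `/tmp/nappit` and the `.smart` suffix come from the post above; the function name `smart_file_ages` is just an illustrative choice and not part of napp-it.

```python
import time
from pathlib import Path

def smart_file_ages(directory, pattern="*.smart", now=None):
    """Return {filename: seconds since last modification} for matching files."""
    now = time.time() if now is None else now
    return {
        p.name: now - p.stat().st_mtime
        for p in sorted(Path(directory).glob(pattern))
        if p.is_file()
    }

if __name__ == "__main__":
    # /tmp/nappit is the folder named in the post; adjust if yours differs.
    for name, age in smart_file_ages("/tmp/nappit").items():
        print(f"{name}: modified {age:.0f}s ago")
```

If the ages stay under about a minute while a web session is open, the files are being rewritten by the background polling.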

Is there a way to disable getting smart info when script repeats itself every 60 seconds?

Thanks


Info:
9211-8i using version 20.00.07.00-IT firmware on both HBAs.
sas2ircu version 20.00.00.00
PCI passthrough via ESXi 6.7U1 VM14
 

TRACKER

Member
Jan 14, 2019
58
14
8
Hello,
this is a well-known issue caused by smartmontools.

When I compiled smartmontools from source, it did not report all SMART parameters, only the temperature and the current status (OK/NOT-OK), but no soft errors were generated :)

P.S. I use Solaris 11.3 and 11.4
 

optimans

Hi TRACKER,

What version are you using? I've got 6.6 installed as per nappit installer.
 

TRACKER

I have both:

/opt/csw/sbin/smartctl -v
smartctl 6.5 2016-05-07 r4318 [i386-pc-solaris2.10] (local build)

/usr/sbin/smartctl -v
smartctl 6.6 2017-11-05 r4594 [i386-pc-solaris2.11] (local build)

I don't have the self-compiled smartmontools version that does not generate soft errors at hand right now.

Anyway, I don't think you should consider this an issue. I remember finding a forum thread somewhere in which this was discussed, and basically it is OK to have those soft errors.
I also found this one :)

Script to reset the iostat errors counters (hard/soft/trn) without reboot – The Geek Diary
 

optimans

Thanks for that.

I knew the soft errors were generated by smartmontools. What had me worried was that the errors kept increasing even when I had not requested any smartinfo.

Will look into the reset script.

Cheers
 

gea

If you find that soft errors are triggered by smartmontools (every call of smartmontools increases the soft error count) and that the rapidly increasing number is due to the acceleration function, which calls smartmontools in the background, then you can use menu Services > ACC (may need the default en menu set) and disable smart for acceleration. SMART data is then read only when you check it manually.
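
One way to check the first condition is to compare per-disk soft error counts taken before and after a single manual SMART query. The sketch below is only an illustration in Python: the function name `soft_error_delta`, the device names, and the counts are all hypothetical, and how the counts are collected (for example from `iostat -En`) is left open.

```python
def soft_error_delta(before, after):
    """Return {device: increase} for devices whose soft error count grew."""
    return {
        dev: after[dev] - before.get(dev, 0)
        for dev in after
        if after[dev] > before.get(dev, 0)
    }

if __name__ == "__main__":
    # Hypothetical counts, e.g. noted before and after one manual SMART
    # query while background SMART polling is disabled.
    before = {"disk0": 40, "disk1": 40, "disk2": 7}
    after = {"disk0": 41, "disk1": 41, "disk2": 7}
    print(soft_error_delta(before, after))
```

If the counts rise only on the disks that were queried, and only when a query runs, that points to smartmontools rather than to cabling or hardware.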
 

optimans

Hi Gea,

Disabling the smart in ACC has solved the issue.

Thanks for your help!
 

gea

I have compiled smartmontools 7.0 on Solaris 11.4.
Still the same behaviour (the soft error counter increases on each run).

Illumos (OmniOS, OI) does not show this behaviour
 

TRACKER

I will try to find the version that does not generate soft errors (but not before Saturday, as I don't have much time these days) and will update you here.
 

TRACKER

Hi again,

the version I used in the past, which did not generate soft errors, is:

/usr/local/bin/smartctl -v
smartctl 5.42 2011-10-20 r3458 [i386-pc-solaris2.11] (local build)
Copyright (C) 2002-11 by Bruce Allen, smartmontools

As I mentioned, it shows only the SMART status (OK/not OK) and the temperature of the drive.
 

gea

Well-Known Member
Dec 31, 2010
2,502
842
113
DE