OmniOS + napp-it rebooting during ZFS scrub

deepblue

New Member
Mar 10, 2015
Hi all,

I run an OmniOS v11 r151012 + napp-it pro v. 0.9f4 system that spontaneously reboots during scrubs. The setup is:
Supermicro Chassis 847E16-R1K28LPB (36x3.5" HDD)
Supermicro Board X9SRL-F, 64GB RAM
LSI 9207-8i
The chassis has 2 expanders: SAS2X36 + SAS2X28 (controller connects only to SAS2X36)

Currently there are 8 SAS and 11 SATA drives in the chassis. I have 2 ZFS pools (besides the root pool, which is on 2 USB sticks). The first pool is made up of the SATA drives, the other of the SAS drives.

The server rebooted twice during the last scrub of the SATA pool. Sometimes I see messages like:

Code:
kern notice scsi: [ID 107833 kern.notice] /pci@0,0/pci8086,e08@3/pci1000,3020@0 (mpt_sas0):
kern notice of 60 seconds expired with 3 commands on target 13 lun 0.
kern warning scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,e08@3/pci1000,3020@0 (mpt_sas0):
kern warning command timeout for target 13 w50014ee0acb02a27.

info scsi: [ID 365881 kern.info] /pci@0,0/pci8086,e08@3/pci1000,3020@0 (mpt_sas0):
info info 0x31130000 received for target 13 w50014ee0acb02a27.
info ioc_status=0x8048, scsi_state=0xc
I see these messages not only for target 13 but for more or less all disks in the system (both SATA and SAS).
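
To check whether the timeouts really are spread across all disks or cluster on a few targets, the warnings could be tallied per target. The following is only a minimal sketch: the function name `count_timeouts` is made up for illustration, and the pattern assumes the warning text has exactly the form quoted above.

```python
import re
from collections import Counter

# Assumption: the warning lines look like the one quoted above, e.g.
# "command timeout for target 13 w50014ee0acb02a27."
TIMEOUT_RE = re.compile(r"command timeout for target (\d+) (w[0-9a-f]+)")


def count_timeouts(log_text):
    """Return a Counter mapping (target, wwn) to the number of timeout warnings."""
    counts = Counter()
    for match in TIMEOUT_RE.finditer(log_text):
        counts[(int(match.group(1)), match.group(2))] += 1
    return counts


# Sample input taken from the log excerpt in this post.
sample = (
    "kern warning scsi: [ID 107833 kern.warning] WARNING: "
    "/pci@0,0/pci8086,e08@3/pci1000,3020@0 (mpt_sas0):\n"
    "kern warning command timeout for target 13 w50014ee0acb02a27.\n"
)

# Print the targets with the most timeouts first.
for (target, wwn), n in count_timeouts(sample).most_common():
    print(f"target {target} {wwn}: {n} timeout(s)")
```

Feeding it the full system log instead of the sample (on OmniOS that is usually `/var/adm/messages`, though the path is an assumption here) would show whether a handful of disks account for most of the warnings.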

Is this related to the fact that there are SATA and SAS disks behind the expander? Or is this more likely a HW issue with the controller or expander(s)? Up until now I have seen these warnings/reboots only during scrubs.

I have to admit that I am not a Solaris/OmniOS expert and my knowledge of ZFS is limited. If there is anything I can debug, please tell me how :)

Thanx and Regards
deepblue