ZFS Pool Degraded -> Unavail

Bronko

Member
May 13, 2016
102
7
18
101
Coming from the thread "Napp-it SuperStorage Server 6048R-E1CR36L Performance": after 3 years of working fine, this is what my data pool currently looks like:

Code:
# cat /etc/release
  OmniOS v11 r151028o
  Copyright 2017 OmniTI Computer Consulting, Inc. All rights reserved.
  Copyright 2017-2019 OmniOS Community Edition (OmniOSce) Association.
  All rights reserved. Use is subject to licence terms.

# zpool status tank1
  pool: tank1
 state: UNAVAIL
status: One or more devices are faulted in response to persistent errors.  There are insufficient replicas for the pool to
        continue functioning.
action: Destroy and re-create the pool from a backup source.  Manually marking the device
        repaired using 'zpool clear' may allow some data to be recovered.
  scan: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        tank1                      UNAVAIL      0     0     0  insufficient replicas
          mirror-0                 ONLINE       0     0     0
            c2t5000CCA23B0CEF7Dd0  ONLINE       0     0     0
            c2t5000CCA23B0D18F9d0  ONLINE       0     0     0
          mirror-1                 DEGRADED     0     0     0
            c2t5000CCA23B0CDAE9d0  ONLINE       0     0     0
            c2t5000CCA23B0D0E11d0  FAULTED      0     0     0  external device fault
          mirror-2                 UNAVAIL      0     0     0  insufficient replicas
            c2t5000CCA23B0C20C9d0  UNAVAIL      0     0     0  cannot open
            c2t5000CCA23B0CA94Dd0  FAULTED      0     0     0  external device fault
          mirror-3                 ONLINE       0     0     0
            c2t5000CCA23B07B701d0  ONLINE       0     0     0
            c2t5000CCA23B0C9CD5d0  ONLINE       0     0     0
          mirror-4                 UNAVAIL      0     0     0  insufficient replicas
            c2t5000CCA23B0BE229d0  FAULTED      0     0     0  external device fault
            c2t5000CCA23B0C0935d0  UNAVAIL      0     0     0  cannot open
          mirror-5                 DEGRADED     0     0     0
            c2t5000CCA23B0BFDA9d0  ONLINE       0     0     0
            c2t5000CCA23B0D25C9d0  UNAVAIL      0     0     0  cannot open
          mirror-6                 ONLINE       0     0     0
            c2t5000CCA23B0B9121d0  ONLINE       0     0     0
            c2t5000CCA23B0BFCA1d0  ONLINE       0     0     0
          mirror-7                 DEGRADED     0     0     0
            c2t5000CCA23B0BDA41d0  ONLINE       0     0     0
            c2t5000CCA23B0BFBF1d0  FAULTED      0     0     0  external device fault
          mirror-8                 ONLINE       0     0     0
            c2t5000CCA23B0CE5B9d0  ONLINE       0     0     0
            c2t5000CCA23B0CE7A9d0  ONLINE       0     0     0
          mirror-9                 UNAVAIL      0     0     0  insufficient replicas
            c2t5000CCA23B0C0901d0  UNAVAIL      0     0     0  cannot open
            c2t5000CCA23B0D1BB5d0  FAULTED      0     0     0  external device fault
          mirror-10                DEGRADED     0     0     0
            c2t5000CCA23B0C00B1d0  FAULTED      0     0     0  external device fault
            c2t5000CCA23B0C9BD5d0  ONLINE       0     0     0
          mirror-11                DEGRADED     0     0     0
            c2t5000CCA23B0A3AE9d0  FAULTED      0     0     0  external device fault
            c2t5000CCA23B0CF6D9d0  ONLINE       0     0     0
        logs
          mirror-12                ONLINE       0     0     0
            c1t5002538C401C745Fd0  ONLINE       0     0     0
            c1t5002538C401C7462d0  ONLINE       0     0     0
11 drives faulted at nearly the same time.

The degradation started with 8 drives, and the first attempt at zpool clear tank1 added 3 more.
The status above was taken after a reboot (the cache device is gone).

A test with /var/web-gui/_my/tools/sas3ircu/lsiutil.i386 (napp-it) on one of the affected disks finished with no errors!?
Code:
Select a device:  [1-26 or RETURN to quit] 4

1.  Alternating, 8-Bit, 00 and FF
2.  Alternating, 8-Bit, 55 and AA
3.  Incrementing, 8-Bit
4.  Walking 1s and 0s, 8-Bit
5.  Alternating, 16-Bit, 0000 and FFFF
6.  Alternating, 16-Bit, 5555 and AAAA
7.  Incrementing, 16-Bit
8.  Walking 1s and 0s, 16-Bit
9:  Random
10:  All B5
11:  All 4A
12:  Incrementing across iterations (00 through FF)

Select a data pattern:  [1-12 or RETURN to quit] 9
Number of blocks per I/O:  [1-64 or RETURN to quit] 64
Number of iterations:  [1-1000000 or 0 for infinite or RETURN to quit] 10000
Type of I/O:  [0=Sequential, 1=Random, default is 0] 1
Stop test on Write, Read, or Compare error?  [Yes or No, default is Yes]
Testing started...
10%  20%  30%  40%  50%  60%  70%  80%  90%  100%
Testing ended...
But are there some abnormalities in the phy counters?
Code:
1.  Inquiry Test
2.  WriteBuffer/ReadBuffer/Compare Test
3.  Read Test
4.  Write/Read/Compare Test
8.  Read Capacity / Read Block Limits Test
12.  Display phy counters
13.  Clear phy counters
14.  SATA SMART Read Test
15.  SEP (SCSI Enclosure Processor) Test
18.  Report LUNs Test
19.  Drive firmware download
20.  Expander firmware download
21.  Read Logical Blocks
99.  Reset port
e   Enable expert mode in menus
p   Enable paged mode
w   Enable logging

Diagnostics menu, select an option:  [1-99 or e/p/w or 0 to quit] 12

Adapter Phy 0:  Link Up, No Errors

Adapter Phy 1:  Link Up, No Errors

Adapter Phy 2:  Link Up, No Errors

Adapter Phy 3:  Link Up, No Errors

Adapter Phy 4:  Link Down, No Errors

Adapter Phy 5:  Link Down, No Errors

Adapter Phy 6:  Link Up, No Errors

Adapter Phy 7:  Link Up, No Errors

Expander (Handle 0009) Phy 0:  Link Up, No Errors

Expander (Handle 0009) Phy 1:  Link Up, No Errors

Expander (Handle 0009) Phy 2:  Link Up, No Errors

Expander (Handle 0009) Phy 3:  Link Up, No Errors

Expander (Handle 0009) Phy 4:  Link Up, No Errors

Expander (Handle 0009) Phy 5:  Link Up, No Errors

Expander (Handle 0009) Phy 6:  Link Up, No Errors

Expander (Handle 0009) Phy 7:  Link Up, No Errors

Expander (Handle 0009) Phy 8:  Link Up, No Errors

Expander (Handle 0009) Phy 9:  Link Up, No Errors

Expander (Handle 0009) Phy 10:  Link Up, No Errors

Expander (Handle 0009) Phy 11:  Link Up, No Errors

Expander (Handle 0009) Phy 12:  Link Up
  Invalid DWord Count                                          10
  Running Disparity Error Count                                 0
  Loss of DWord Synch Count                                     2
  Phy Reset Problem Count                                       0

Expander (Handle 0009) Phy 13:  Link Up
  Invalid DWord Count                                          11
  Running Disparity Error Count                                 0
  Loss of DWord Synch Count                                     2
  Phy Reset Problem Count                                       0

Expander (Handle 0009) Phy 14:  Link Up
  Invalid DWord Count                                          12
  Running Disparity Error Count                                 0
  Loss of DWord Synch Count                                     2
  Phy Reset Problem Count                                       0

Expander (Handle 0009) Phy 15:  Link Up
  Invalid DWord Count                                          11
  Running Disparity Error Count                                 3
  Loss of DWord Synch Count                                     2
  Phy Reset Problem Count                                       0

Expander (Handle 0009) Phy 16:  Link Down, No Errors

Expander (Handle 0009) Phy 17:  Link Down, No Errors

Expander (Handle 0009) Phy 18:  Link Down, No Errors

Expander (Handle 0009) Phy 19:  Link Down, No Errors

Expander (Handle 0009) Phy 20:  Link Up, No Errors

Expander (Handle 0009) Phy 21:  Link Up, No Errors

Expander (Handle 0009) Phy 22:  Link Up, No Errors

Expander (Handle 0009) Phy 23:  Link Up, No Errors

Expander (Handle 0009) Phy 24:  Link Down, No Errors

Expander (Handle 0009) Phy 25:  Link Down, No Errors

Expander (Handle 0009) Phy 26:  Link Down, No Errors

Expander (Handle 0009) Phy 27:  Link Down, No Errors

Expander (Handle 0009) Phy 28:  Link Up, No Errors

Expander (Handle 0009) Phy 29:  Link Up, No Errors

Expander (Handle 0009) Phy 30:  Link Up, No Errors

Expander (Handle 0009) Phy 31:  Link Up, No Errors

Expander (Handle 0009) Phy 32:  Link Up, No Errors

Expander (Handle 0009) Phy 33:  Link Up, No Errors

Expander (Handle 0009) Phy 34:  Link Up, No Errors

Expander (Handle 0009) Phy 35:  Link Up, No Errors

Expander (Handle 0009) Phy 36:  Link Up, No Errors

Expander (Handle 0009) Phy 37:  Link Up, No Errors

Expander (Handle 0009) Phy 38:  Link Up, No Errors

Expander (Handle 0009) Phy 39:  Link Up, No Errors

Expander (Handle 0009) Phy 40:  Link Up, No Errors

Expander (Handle 0009) Phy 41:  Link Down, No Errors

Expander (Handle 0009) Phy 42:  Link Down, No Errors
Report Phy Error Log failed with result 16
Report Phy Error Log failed with result 16
Report Phy Error Log failed with result 16
Report Phy Error Log failed with result 16
Report Phy Error Log failed with result 16
Report Phy Error Log failed with result 16
Report Phy Error Log failed with result 16
Report Phy Error Log failed with result 16

Expander (Handle 0017) Phy 0:  Link Down, No Errors

Expander (Handle 0017) Phy 1:  Link Down, No Errors

Expander (Handle 0017) Phy 2:  Link Down, No Errors

Expander (Handle 0017) Phy 3:  Link Down, No Errors

Expander (Handle 0017) Phy 4:  Link Down, No Errors

Expander (Handle 0017) Phy 5:  Link Down, No Errors

Expander (Handle 0017) Phy 6:  Link Down, No Errors

Expander (Handle 0017) Phy 7:  Link Down, No Errors

Expander (Handle 0017) Phy 8:  Link Down, No Errors

Expander (Handle 0017) Phy 9:  Link Down, No Errors

Expander (Handle 0017) Phy 10:  Link Down, No Errors

Expander (Handle 0017) Phy 11:  Link Down, No Errors

Expander (Handle 0017) Phy 12:  Link Down, No Errors

Expander (Handle 0017) Phy 13:  Link Down, No Errors

Expander (Handle 0017) Phy 14:  Link Down, No Errors

Expander (Handle 0017) Phy 15:  Link Down, No Errors

Expander (Handle 0017) Phy 16:  Link Up, No Errors

Expander (Handle 0017) Phy 17:  Link Up, No Errors

Expander (Handle 0017) Phy 18:  Link Up, No Errors

Expander (Handle 0017) Phy 19:  Link Up, No Errors

Expander (Handle 0017) Phy 20:  Link Down, No Errors

Expander (Handle 0017) Phy 21:  Link Down, No Errors

Expander (Handle 0017) Phy 22:  Link Down, No Errors

Expander (Handle 0017) Phy 23:  Link Down, No Errors

Expander (Handle 0017) Phy 24:  Link Down, No Errors

Expander (Handle 0017) Phy 25:  Link Down, No Errors

Expander (Handle 0017) Phy 26:  Link Down, No Errors

Expander (Handle 0017) Phy 27:  Link Down, No Errors

Expander (Handle 0017) Phy 28:  Link Up, No Errors

Expander (Handle 0017) Phy 29:  Link Down, No Errors

Expander (Handle 0017) Phy 30:  Link Down, No Errors
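The long phy listing above is easier to read when only the phys with non-zero counters are shown. A minimal Python sketch, assuming the lsiutil text layout seen in this output (a header line ending in "Phy N:" followed by indented counter lines); the function name phys_with_errors and the short sample are made up for illustration:

```python
import re

def phys_with_errors(text):
    """Map each phy header to its non-zero error counters."""
    result = {}
    current = None
    for line in text.splitlines():
        # Header lines start in column 0, e.g. "Expander (Handle 0009) Phy 12:  Link Up"
        header = re.match(r"(\S.*Phy \d+):", line)
        if header:
            current = header.group(1)
            continue
        # Counter lines are indented: "  <counter name>   <value>"
        counter = re.match(r"\s+(.+?)\s+(\d+)\s*$", line)
        if counter and current is not None and int(counter.group(2)) > 0:
            result.setdefault(current, {})[counter.group(1)] = int(counter.group(2))
    return result

# Short excerpt in the same layout as the output above (assumed representative).
sample = """\
Expander (Handle 0009) Phy 11:  Link Up, No Errors

Expander (Handle 0009) Phy 12:  Link Up
  Invalid DWord Count                                          10
  Running Disparity Error Count                                 0
  Loss of DWord Synch Count                                     2
  Phy Reset Problem Count                                       0

Expander (Handle 0009) Phy 15:  Link Up
  Invalid DWord Count                                          11
  Running Disparity Error Count                                 3
  Loss of DWord Synch Count                                     2
  Phy Reset Problem Count                                       0
"""

for phy, counters in phys_with_errors(sample).items():
    print(phy, counters)
```

This only filters the text; it does not say whether the counts are high enough to matter, so comparing them again after clearing the counters (menu option 13) is probably more telling.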
My first step would be to replace the HBA and the cable to the backplane, since several disks failed at the same time.

Does anyone have experience with a case like this?
 

gea

Well-Known Member
Dec 31, 2010
2,500
842
113
DE
Without a lightning strike or overvoltage (PSU), it is unlikely that several disks fail at the same time. Most probably this is an HBA, cabling or power problem.

The good thing with ZFS is that even if the pool, a vdev or a disk is in a degraded or offline state, the pool becomes available again when enough disks come back. ZFS is much more forgiving here than traditional hardware RAID, as ZFS can tell whether data is valid or not (thanks to checksums).

I would power off the system and check all of the cabling. If you made any changes prior to the failure, undo them. If all the troubled disks are on the same HBA, cabling or power supply, check or replace it.

In the end, you need one disk in each of mirror-2, mirror-4 and mirror-9 to "come back". Then you can access the pool, at least in a degraded state.

If disks come back but are not detected properly as pool members
- export + import the pool as this will re-read all disks
- menu pools > clear error to delete faulted state of disks.
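
Which mirrors currently block the pool can be read off the status output: every mirror without a single ONLINE member. A minimal Python sketch, assuming the zpool status layout posted above; the function name dead_mirrors and the shortened sample are made up for illustration:

```python
import re

def dead_mirrors(status_text):
    """Return the names of mirror vdevs that have no ONLINE member disk."""
    mirrors = {}      # mirror name -> list of member disk states
    current = None
    for line in status_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        name, state = fields[0], fields[1]
        if re.fullmatch(r"mirror-\d+", name):
            current = name
            mirrors[current] = []
        elif current is not None and re.match(r"c\d+t", name):
            # Disk line such as "c2t5000CCA23B0C20C9d0  UNAVAIL ..."
            mirrors[current].append(state)
    return [m for m, states in mirrors.items() if "ONLINE" not in states]

# Two mirrors copied from the status output above.
sample = """\
          mirror-1                 DEGRADED     0     0     0
            c2t5000CCA23B0CDAE9d0  ONLINE       0     0     0
            c2t5000CCA23B0D0E11d0  FAULTED      0     0     0  external device fault
          mirror-2                 UNAVAIL      0     0     0  insufficient replicas
            c2t5000CCA23B0C20C9d0  UNAVAIL      0     0     0  cannot open
            c2t5000CCA23B0CA94Dd0  FAULTED      0     0     0  external device fault
"""

print(dead_mirrors(sample))
```

Fed the full output of zpool status tank1, the same function should list exactly the mirrors that need a disk back before the pool can be opened.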
 

vl1969

Active Member
Feb 5, 2014
611
68
28
Just a suggestion, and I'm not sure what is cost-effective for you, but if you can, start with the cabling and the PSU.
I had a similar issue in my home lab, where disks would drop out for no apparent reason.
I rebuilt the whole server: new motherboard, new HBA, new cables. It turned out to be the PSU. It was too old and did not supply enough power for 16 HDDs, especially once I added 4 new Dell 7200 rpm 2 TB drives.
 

Bronko

@gea Yes, this is what I expect from ZFS once some disks come back. Thanks for your napp-it menu path...

@vl1969 The PSU is redundant in this system, and there are no PSU-related events in the IPMI log.

But isn't it confusing that lsiutil finished the R/W test on an affected disk without an error?

A new HBA and mini-SAS cable are on their way from the vendor (some HDDs too).
Will be back.
 

gea

Only ZFS uses real data checksums. It detects any error; trust ZFS, and when in doubt, only ZFS.
No other tool (smartmontools, hardware RAID or any other test tool) comes close to ZFS.
 

FMA1394

Active Member
Jan 11, 2013
624
186
43
I'm not familiar with ZFS on OpenIndiana, but at least on ZoL you can do the following (at the risk of losing a few transactions):

Get all of the drives showing up in lsscsi (or the equivalent on OpenIndiana) and run:

zpool import -F <poolname>

-F Recovery mode for a non-importable pool. Attempt to
return the pool to an importable state by discarding the
last few transactions. Not all damaged pools can be
recovered by using this option. If successful, the data
from the discarded transactions is irretrievably lost.
This option is ignored if the pool is importable or
already imported.


It is a good idea to scrub after doing this, should you choose this path.
 

vl1969

Well, I didn't have any errors or messages about the PSU in my case either.
My box is heavily modded.
I swapped the original redundant PSU for a regular ATX 80 Plus Silver unit to make it quieter.
But regardless, the system showed no errors, except that every few days one of the disks would simply drop out of the pool.
I switched the PSU to a new 1500 W model, and the box has been running for a month now with all the drives and no problems. The same drives I had pulled as failed I put back and added to my pools as new.
 

zxv

The more I C, the less I see.
Sep 10, 2017
153
51
28
Just for troubleshooting purposes, it's possible to gather more info about the pool using ZDB:

Examining ZFS Pools with zdb - Lustre Wiki

zdb -C displays the cached information about the pool.

zdb -l displays the labels for members of the pools. This includes GUIDs, and in certain cases like HP hardware, the enclosure and bay numbers. In certain cases it can be helpful to have a copy taken before rearranging or swapping drives, shelves, or for other hardware changes, as a way to compare the previous hardware layout.
 

Bronko

Thanks for your replies!

@gea About checksums: this is my very first status, in the DEGRADED state before the reboot. No checksum errors, only I/O errors:
Code:
 pool: tank1
state: DEGRADED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://illumos.org/msg/ZFS-8000-HC
  scan: none requested
config:

NAME                       STATE     READ WRITE CKSUM
tank1                      DEGRADED     2     1     0
  mirror-0                 ONLINE       0     0     0
    c2t5000CCA23B0CEF7Dd0  ONLINE       0     0     0
    c2t5000CCA23B0D18F9d0  ONLINE       0     0     0
  mirror-1                 DEGRADED     0     0     0
    c2t5000CCA23B0CDAE9d0  ONLINE       0     0     0
    c2t5000CCA23B0D0E11d0  FAULTED      0     0     0  external device fault
  mirror-2                 DEGRADED    30    40     0
    c2t5000CCA23B0C20C9d0  DEGRADED    32    44     0  external device fault
    c2t5000CCA23B0CA94Dd0  FAULTED      0     0     0  external device fault
  mirror-3                 ONLINE       0     0     0
    c2t5000CCA23B07B701d0  ONLINE       0     0     0
    c2t5000CCA23B0C9CD5d0  ONLINE       0     0     0
  mirror-4                 DEGRADED     3     3     0
    c2t5000CCA23B0BE229d0  FAULTED      0     0     0  external device fault
    c2t5000CCA23B0C0935d0  ONLINE       3     3     0
  mirror-5                 ONLINE       0     0     0
    c2t5000CCA23B0BFDA9d0  ONLINE       0     0     0
    c2t5000CCA23B0D25C9d0  ONLINE       0     0     0
  mirror-6                 ONLINE       0     0     0
    c2t5000CCA23B0B9121d0  ONLINE       0     0     0
    c2t5000CCA23B0BFCA1d0  ONLINE       0     0     0
  mirror-7                 DEGRADED     0     0     0
    c2t5000CCA23B0BDA41d0  ONLINE       0     0     0
    c2t5000CCA23B0BFBF1d0  FAULTED      0     0     0  external device fault
  mirror-8                 ONLINE       0     0     0
    c2t5000CCA23B0CE5B9d0  ONLINE       0     0     0
    c2t5000CCA23B0CE7A9d0  ONLINE       0     0     0
  mirror-9                 DEGRADED     0     0     0
    c2t5000CCA23B0C0901d0  ONLINE       0     0     0
    c2t5000CCA23B0D1BB5d0  FAULTED      0     0     0  external device fault
  mirror-10                DEGRADED     0     0     0
    c2t5000CCA23B0C00B1d0  FAULTED      0     0     0  external device fault
    c2t5000CCA23B0C9BD5d0  ONLINE       0     0     0
  mirror-11                DEGRADED     0     0     0
    c2t5000CCA23B0A3AE9d0  FAULTED      0     0     0  external device fault
    c2t5000CCA23B0CF6D9d0  ONLINE       0     0     0
logs
  mirror-12                ONLINE       0     0     0
    c1t5002538C401C745Fd0  ONLINE       0     0     0
    c1t5002538C401C7462d0  ONLINE       0     0     0
cache
  c3t1d0                   ONLINE       0     0     0
For now I get the message "cannot open 'tank1': I/O error" under Home > Pools, and the pool status is displayed.


@zxv For every faulted disk I get this:
Code:
# zdb -l /dev/dsk/c2t5000CCA23B0A3AE9d0s0
cannot open '/dev/rdsk/c2t5000CCA23B0A3AE9d0s0': No such file or directory

Console messages:
Code:
# tail -f /var/adm/messages
Feb 22 10:10:49 tanker last message repeated 1 time
Feb 22 10:10:49 tanker scsi: [ID 243001 kern.info]      w5000cca23b0d25c9 FastPath Capable and Enabled
Feb 22 10:10:49 tanker last message repeated 1 time
Feb 22 10:10:49 tanker scsi: [ID 243001 kern.info]      w5000cca23b0c0901 FastPath Capable and Enabled
Feb 22 10:10:49 tanker last message repeated 1 time
Feb 22 10:10:49 tanker scsi: [ID 243001 kern.info]      w5000cca23b0c20c9 FastPath Capable and Enabled
Feb 22 10:10:49 tanker last message repeated 1 time
Feb 22 10:10:49 tanker scsi: [ID 243001 kern.info]      w5000cca23b0c0901 FastPath Capable and Enabled
Feb 22 10:10:49 tanker last message repeated 3 times
Feb 22 10:10:49 tanker scsi: [ID 243001 kern.info]      w5000cca23b0c0935 FastPath Capable and Enabled
Feb 22 10:10:49 tanker last message repeated 1 time
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:58 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:58 tanker  FW Upload tce invalid!
Feb 22 10:10:58 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:58 tanker  FW Upload tce invalid!
Feb 22 10:10:58 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:58 tanker  FW Upload tce invalid!
Feb 22 10:10:58 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:58 tanker  FW Upload tce invalid!
Feb 22 10:10:58 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:58 tanker  FW Upload tce invalid!
Feb 22 10:10:58 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:58 tanker  FW Upload tce invalid!
Feb 22 10:10:58 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:58 tanker  FW Upload tce invalid!
Feb 22 10:10:58 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:58 tanker  FW Upload tce invalid!
Feb 22 10:10:58 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:58 tanker  FW Upload tce invalid!
Feb 22 10:10:58 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:58 tanker  FW Upload tce invalid!
Feb 22 10:10:58 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:58 tanker  FW Upload tce invalid!
Feb 22 10:10:58 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:58 tanker  FW Upload tce invalid!
Feb 22 10:10:58 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:58 tanker  FW Upload tce invalid!
Feb 22 10:10:58 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:58 tanker  FW Upload tce invalid!
Feb 22 10:10:58 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:58 tanker  FW Upload tce invalid!
Feb 22 10:10:58 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:58 tanker  FW Upload tce invalid!
Feb 22 10:10:58 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:58 tanker  FW Upload tce invalid!
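
A flood like the one above is easier to judge once identical messages are counted instead of listed. A minimal Python sketch, assuming the syslog line layout shown in this excerpt ("Mon DD HH:MM:SS host message"); the function name summarize and the short sample are made up for illustration:

```python
import re
from collections import Counter

# Timestamp and host name at the start of each line, e.g. "Feb 22 10:10:57 tanker ".
PREFIX = re.compile(r"^\w{3}\s+\d+ \d\d:\d\d:\d\d \S+\s+")

def summarize(log_text):
    """Count identical messages after stripping the timestamp/host prefix."""
    counts = Counter()
    for line in log_text.splitlines():
        msg = PREFIX.sub("", line).strip()
        if msg:
            counts[msg] += 1
    return counts.most_common()

# A few lines copied from the excerpt above.
sample = """\
Feb 22 10:10:49 tanker scsi: [ID 243001 kern.info]      w5000cca23b0c0935 FastPath Capable and Enabled
Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:57 tanker  FW Upload tce invalid!
Feb 22 10:10:58 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
Feb 22 10:10:58 tanker  FW Upload tce invalid!
"""

for message, count in summarize(sample):
    print(count, message)
```

Run against a copy of /var/adm/messages, this would show at a glance how many times the mpt_sas0 warning fired compared with everything else.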
 

gea

Any idea in the meantime what happened?
Have you checked for a pattern among the faulted disks versus the good ones, e.g. all on the same HBA, power supply or cabling?

Have you checked the temperature?
Overheating (say > 60 °C) can kill disks and data. (I had a customer whose air conditioner failed unnoticed, resulting in a similarly damaged and lost pool.)

When the state is "degraded", you can access the pool and back up important data. There is a concern with mirror-1, which is degraded + faulted. Access may fail.

From the logs:
"FW Upload tce invalid!" is harmless (a message triggered by sasircu on Illumos, as this tool is intended for Oracle Solaris only).
"I/O error" just means a disk or the pool is not accessible (= dead or without power).


Another point:
You have good disks and bad disks.
What happens if you swap a good disk, e.g. c2t5000CCA23B0CEF7Dd0,
with the degraded c2t5000CCA23B0C20C9d0?

If cabling, power or the backplane is the problem, the formerly good disk may get problems, while a formerly degraded or faulted one becomes good.

If there is no change, export + import the pool read-only and try to back up the changed data (assuming you already have a backup), or as much as possible.
 

Bronko

As mentioned, I will first replace the HBA (there is only one) and the cable to the backplane (all 24 drives are on the same backplane). Unfortunately the vendor package will not arrive before Monday.

Temperature was one of my first checks; all disks are at about 30 °C. We have a well-monitored in-rack cooling system... ;-)

"FW Upload tce invalid!"
It's well known, but not in this quantity...!

Your disk swap test is a good idea, in case the HBA/cable change has no effect.

I have backups of all critical data in the pool, because I'm using ZFS replication to a backup system as a napp-it Pro feature.... ;-)

A little annoyance here: after the update OmniOSce-r151026 -> OmniOSce-r151028, I once more have to redo these steps to reactivate TLS e-mail:

napp-it // webbased ZFS NAS/SAN appliance for OmniOS, OpenIndiana and Solaris : Downloads
As I had to do before, here:
https://forums.servethehome.com/ind...022-long-term-stable.14367/page-3#post-148057

Therefore I got no e-mail alert, and because of illness and vacation the faults are already some days old... ;-)
 

gea

OmniOS 151028 removes the old Sun SSH completely and replaces it with OpenSSH.
As a result, TLS from 151026 no longer works after the update. Even a reinstall of TLS didn't work for me.

If you need TLS e-mail, I suggest a clean install of 151028 + napp-it via wget + TLS.
If you save/restore /var/web-gui/_log/*, all napp-it settings remain intact; optionally recreate the users with the same uid/gid.
 

Bronko

"OmniOS 151028 removes the old Sun SSH completely and replace it with Open-SSH.
As a result TLS from 151026 is unworkable after the update. Even a reinstall of TLS didn't work for me."
The reinstall of TLS was successful on two machines with the same upgrade history over the last 3 years, since the initial installation of OmniOS 5.11 omnios-r151018-ae3141d in April 2016.

E-mail notification has been working as before for 3 days now.
 

zxv

The more I C, the less I see.
Sep 10, 2017
153
51
28
Hey @Bronko, zdb -C takes a pool name, while zdb -l reads the labels from a device, so for example:
zdb -C tank1
zdb -l /dev/rdsk/c2t5000CCA23B0C20C9d0s0
 

Bronko

OK, HBA replaced (not the cable), and zpool clear tank1 (which hung before) got me this after a few minutes:
Code:
# zpool status tank1

  pool: tank1
 state: ONLINE
  scan: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        tank1                      ONLINE       0     0     0
          mirror-0                 ONLINE       0     0     0
            c9t5000CCA23B0CEF7Dd0  ONLINE       0     0     0
            c9t5000CCA23B0D18F9d0  ONLINE       0     0     0
          mirror-1                 ONLINE       0     0     0
            c9t5000CCA23B0CDAE9d0  ONLINE       0     0     0
            c9t5000CCA23B0D0E11d0  ONLINE       0     0     0
          mirror-2                 ONLINE       0     0     0
            c9t5000CCA23B0C20C9d0  ONLINE       0     0     0
            c9t5000CCA23B0CA94Dd0  ONLINE       0     0     0
          mirror-3                 ONLINE       0     0     0
            c9t5000CCA23B07B701d0  ONLINE       0     0     0
            c9t5000CCA23B0C9CD5d0  ONLINE       0     0     0
          mirror-4                 ONLINE       0     0     0
            c9t5000CCA23B0BE229d0  ONLINE       0     0     0
            c9t5000CCA23B0C0935d0  ONLINE       0     0     0
          mirror-5                 ONLINE       0     0     0
            c9t5000CCA23B0BFDA9d0  ONLINE       0     0     0
            c9t5000CCA23B0D25C9d0  ONLINE       0     0     0
          mirror-6                 ONLINE       0     0     0
            c9t5000CCA23B0B9121d0  ONLINE       0     0     0
            c9t5000CCA23B0BFCA1d0  ONLINE       0     0     0
          mirror-7                 ONLINE       0     0     0
            c9t5000CCA23B0BDA41d0  ONLINE       0     0     0
            c9t5000CCA23B0BFBF1d0  ONLINE       0     0     0
          mirror-8                 ONLINE       0     0     0
            c9t5000CCA23B0CE5B9d0  ONLINE       0     0     0
            c9t5000CCA23B0CE7A9d0  ONLINE       0     0     0
          mirror-9                 ONLINE       0     0     0
            c9t5000CCA23B0C0901d0  ONLINE       0     0     0
            c9t5000CCA23B0D1BB5d0  ONLINE       0     0     0
          mirror-10                ONLINE       0     0     0
            c9t5000CCA23B0C00B1d0  ONLINE       0     0     0
            c9t5000CCA23B0C9BD5d0  ONLINE       0     0     0
          mirror-11                ONLINE       0     0     0
            c9t5000CCA23B0A3AE9d0  ONLINE       0     0     0
            c9t5000CCA23B0CF6D9d0  ONLINE       0     0     0
        logs
          mirror-12                ONLINE       0     0     0
            c1t5002538C401C745Fd0  ONLINE       0     0     0
            c1t5002538C401C7462d0  ONLINE       0     0     0
        cache
          c3t1d0                   ONLINE       0     0     0

errors: No known data errors
The cache device (L2ARC, Intel NVMe SSD) is back, too.

And I had access to the ZFS filesystems in the pool (browsed with mc ;-).

After the next reboot the pool is UNAVAIL again:
Code:
# zpool status tank1

  pool: tank1
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-3C
  scan: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        tank1                      UNAVAIL      0     0     0  insufficient replicas
          mirror-0                 ONLINE       0     0     0
            c9t5000CCA23B0CEF7Dd0  ONLINE       0     0     0
            c9t5000CCA23B0D18F9d0  ONLINE       0     0     0
          mirror-1                 DEGRADED     0     0     0
            c9t5000CCA23B0CDAE9d0  ONLINE       0     0     0
            c9t5000CCA23B0D0E11d0  UNAVAIL      0     0     0  cannot open
          mirror-2                 UNAVAIL      0     0     0  insufficient replicas
            c9t5000CCA23B0C20C9d0  UNAVAIL      0     0     0  cannot open
            c9t5000CCA23B0CA94Dd0  UNAVAIL      0     0     0  cannot open
          mirror-3                 ONLINE       0     0     0
            c9t5000CCA23B07B701d0  ONLINE       0     0     0
            c9t5000CCA23B0C9CD5d0  ONLINE       0     0     0
          mirror-4                 UNAVAIL      0     0     0  insufficient replicas
            c9t5000CCA23B0BE229d0  UNAVAIL      0     0     0  cannot open
            c9t5000CCA23B0C0935d0  UNAVAIL      0     0     0  cannot open
          mirror-5                 DEGRADED     0     0     0
            c9t5000CCA23B0BFDA9d0  ONLINE       0     0     0
            c9t5000CCA23B0D25C9d0  UNAVAIL      0     0     0  cannot open
          mirror-6                 ONLINE       0     0     0
            c9t5000CCA23B0B9121d0  ONLINE       0     0     0
            c9t5000CCA23B0BFCA1d0  ONLINE       0     0     0
          mirror-7                 DEGRADED     0     0     0
            c9t5000CCA23B0BDA41d0  ONLINE       0     0     0
            c9t5000CCA23B0BFBF1d0  UNAVAIL      0     0     0  cannot open
          mirror-8                 ONLINE       0     0     0
            c9t5000CCA23B0CE5B9d0  ONLINE       0     0     0
            c9t5000CCA23B0CE7A9d0  ONLINE       0     0     0
          mirror-9                 UNAVAIL      0     0     0  insufficient replicas
            c9t5000CCA23B0C0901d0  UNAVAIL      0     0     0  cannot open
            c9t5000CCA23B0D1BB5d0  UNAVAIL      0     0     0  cannot open
          mirror-10                DEGRADED     0     0     0
            c9t5000CCA23B0C00B1d0  UNAVAIL      0     0     0  cannot open
            c9t5000CCA23B0C9BD5d0  ONLINE       0     0     0
          mirror-11                DEGRADED     0     0     0
            c9t5000CCA23B0A3AE9d0  UNAVAIL      0     0     0  cannot open
            c9t5000CCA23B0CF6D9d0  ONLINE       0     0     0
        logs
          mirror-12                ONLINE       0     0     0
            c1t5002538C401C745Fd0  ONLINE       0     0     0
            c1t5002538C401C7462d0  ONLINE       0     0     0
L2ARC is gone again...
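
With 24 disks, a listing like the one above is easier to compare between reboots when it is reduced to the devices that are not ONLINE. A small sketch using awk; the function name is illustrative, and it assumes leaf devices are named cNt...dN as in this pool:

```shell
#!/bin/sh
# Sketch: list every disk that zpool status does not report as ONLINE.
# Reads the status text on stdin; list_bad_disks is an illustrative name.

list_bad_disks() {
    # Match leaf devices (cNt...dN) and print "device state" for each
    # one whose STATE column is anything other than ONLINE.
    awk '$1 ~ /^c[0-9]+t/ && $2 != "ONLINE" { print $1, $2 }'
}
```

Usage would be `zpool status tank1 | list_bad_disks`; saving the result after each reboot and diffing the files shows whether the same disks drop out every time.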

The console is flooded again:
Code:
# tail -f /var/adm/messages
Feb 25 11:24:12 tanker scsi: [ID 243001 kern.info]      w5000cca23b0c0935 FastPath Capable and Enabled
Feb 25 11:24:12 tanker last message repeated 1 time
Feb 25 11:24:12 tanker scsi: [ID 243001 kern.info]      w5000cca23b0bfbf1 FastPath Capable and Enabled
Feb 25 11:24:12 tanker last message repeated 1 time
Feb 25 11:24:12 tanker scsi: [ID 243001 kern.info]      w5000cca23b0d25c9 FastPath Capable and Enabled
Feb 25 11:24:12 tanker last message repeated 1 time
Feb 25 11:24:12 tanker scsi: [ID 243001 kern.info]      w5000cca23b0a3ae9 FastPath Capable and Enabled
Feb 25 11:24:12 tanker last message repeated 1 time
Feb 25 11:24:12 tanker scsi: [ID 243001 kern.info]      w5000cca23b0c0935 FastPath Capable and Enabled
Feb 25 11:24:12 tanker last message repeated 1 time
Feb 25 11:24:12 tanker scsi: [ID 243001 kern.info]      w5000cca23b0c0901 FastPath Capable and Enabled
Feb 25 11:24:12 tanker last message repeated 1 time
Feb 25 11:24:12 tanker scsi: [ID 243001 kern.info]      w5000cca23b0d25c9 FastPath Capable and Enabled
Feb 25 11:24:12 tanker last message repeated 1 time
Feb 25 11:24:12 tanker scsi: [ID 243001 kern.info]      w5000cca23b0a3ae9 FastPath Capable and Enabled
Feb 25 11:24:12 tanker last message repeated 1 time
Feb 25 11:24:12 tanker scsi: [ID 243001 kern.info]      w5000cca23b0be229 FastPath Capable and Enabled
Feb 25 11:24:12 tanker last message repeated 1 time
Feb 25 11:24:12 tanker scsi: [ID 243001 kern.info]      w5000cca23b0c20c9 FastPath Capable and Enabled
Feb 25 11:24:12 tanker last message repeated 1 time
Feb 25 11:24:12 tanker scsi: [ID 243001 kern.info]      w5000cca23b0c0901 FastPath Capable and Enabled
Feb 25 11:24:12 tanker last message repeated 1 time
Feb 25 11:24:12 tanker scsi: [ID 243001 kern.info]      w5000cca23b0a3ae9 FastPath Capable and Enabled
Feb 25 11:24:12 tanker last message repeated 1 time
Feb 25 11:24:12 tanker scsi: [ID 243001 kern.info]      w5000cca23b0c0935 FastPath Capable and Enabled
Feb 25 11:24:12 tanker last message repeated 1 time
Feb 25 11:24:12 tanker scsi: [ID 243001 kern.info]      w5000cca23b0c20c9 FastPath Capable and Enabled
Feb 25 11:24:12 tanker last message repeated 1 time
Feb 25 11:24:12 tanker scsi: [ID 243001 kern.info]      w5000cca23b0a3ae9 FastPath Capable and Enabled
Feb 25 11:24:12 tanker last message repeated 1 time
Feb 25 11:24:12 tanker scsi: [ID 243001 kern.info]      w5000cca23b0be229 FastPath Capable and Enabled
Feb 25 11:24:12 tanker last message repeated 1 time
Feb 25 11:24:12 tanker scsi: [ID 243001 kern.info]      w5000cca23b0a3ae9 FastPath Capable and Enabled
Feb 25 11:24:12 tanker last message repeated 1 time
Feb 25 11:24:13 tanker scsi: [ID 243001 kern.info]      w5000cca23b0ca94d FastPath Capable and Enabled
Feb 25 11:24:13 tanker last message repeated 1 time
Feb 25 11:24:13 tanker cmlb: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci1000,30e0@0/iport@f/disk@w5000cca23b0d18f9,0 (sd29):
Feb 25 11:24:13 tanker  primary label corrupt; using backup
Feb 25 11:24:13 tanker scsi: [ID 243001 kern.info]      w5000cca23b0c0935 FastPath Capable and Enabled
Feb 25 11:24:13 tanker last message repeated 1 time
Feb 25 11:24:13 tanker scsi: [ID 243001 kern.info]      w5000cca23b0c00b1 FastPath Capable and Enabled
Feb 25 11:24:13 tanker last message repeated 1 time
Feb 25 11:24:13 tanker scsi: [ID 243001 kern.info]      w5000cca23b0d0e11 FastPath Capable and Enabled
Feb 25 11:24:13 tanker last message repeated 1 time
Feb 25 11:24:13 tanker scsi: [ID 243001 kern.info]      w5000cca23b0c20c9 FastPath Capable and Enabled
Feb 25 11:24:13 tanker last message repeated 1 time
Feb 25 11:24:13 tanker scsi: [ID 243001 kern.info]      w5000cca23b0c0901 FastPath Capable and Enabled
Feb 25 11:24:13 tanker last message repeated 1 time
Feb 25 11:24:13 tanker scsi: [ID 243001 kern.info]      w5000cca23b0d25c9 FastPath Capable and Enabled
Feb 25 11:24:13 tanker last message repeated 1 time
Feb 25 11:24:13 tanker scsi: [ID 243001 kern.info]      w5000cca23b0be229 FastPath Capable and Enabled
Feb 25 11:24:13 tanker last message repeated 1 time
Feb 25 11:24:13 tanker scsi: [ID 243001 kern.info]      w5000cca23b0ca94d FastPath Capable and Enabled
Feb 25 11:24:13 tanker last message repeated 1 time
.
.
.
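
The few interesting lines (such as the cmlb label warning) are easy to miss in a flood like this. A sketch of two filters for it, assuming the message format shown above; both function names are made up for illustration:

```shell
#!/bin/sh
# Sketch: reduce /var/adm/messages noise. Both functions read log text
# on stdin; the names are illustrative.

# Drop the FastPath notices and the "last message repeated" lines so
# that warnings and their continuation lines remain.
filter_noise() {
    grep -v -e 'FastPath Capable and Enabled' -e 'last message repeated'
}

# Count FastPath notices per WWN. The WWN is the fifth field from the
# end of such a line. Output is "count wwn", highest count first.
count_fastpath() {
    awk '/FastPath Capable and Enabled/ { n[$(NF-4)]++ }
         END { for (w in n) print n[w], w }' | sort -rn
}
```

For example, `filter_noise < /var/adm/messages` for the remaining warnings, and `count_fastpath < /var/adm/messages` to see which WWNs are re-announced most often.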
 

Bronko

gea said:
another point:
You have good disks and bad disks.
What happens if you switch a good disk, e.g. c2t5000CCA23B0CEF7Dd0,
with the degraded c2t5000CCA23B0C20C9d0?

If cabling, power or backplane is the problem, the former good one may get problems while a former degraded or faulted one becomes good.

If there is no change, export the pool, import it read-only, and try to back up the changed data (assuming you already have a backup), or as much as possible.

I tried exactly this and got mixed results. Sometimes the bad disk comes alive (smartctl has access, too) but does not survive a reboot; sometimes it keeps its bad status. The good disk doesn't change its status.
 

zxv

Given that ZFS cannot open the disk at all, are there any other messages about those devices in the kernel log?

It should be possible to use smartctl on OmniOS to get the drive health information.
pkg install smartmontools
/opt/ooce/sbin/smartctl -a /dev/rdsk/c9t5000CCA23B0D0E11d0
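
To run the same check across several disks in one go, something like the following might help. It is a sketch only: check_disks is a made-up name, the smartctl path is the one given above, and it relies on smartctl returning a non-zero exit status when it cannot open a device or the health check fails.

```shell
#!/bin/sh
# Sketch: query SMART health (-H) for a list of disks and report which
# ones smartctl can reach. check_disks is an illustrative name.

# Path as given in this post; override SMARTCTL if it lives elsewhere.
SMARTCTL="${SMARTCTL:-/opt/ooce/sbin/smartctl}"

# Print "OK   <disk>" when smartctl exits with status 0 and
# "FAIL <disk>" for any other exit status (device cannot be opened,
# command failed, or the disk reports a failing health state).
check_disks() {
    for disk in "$@"; do
        if "$SMARTCTL" -H "/dev/rdsk/$disk" >/dev/null 2>&1; then
            echo "OK   $disk"
        else
            echo "FAIL $disk"
        fi
    done
}
```

For example, `check_disks c9t5000CCA23B0D0E11d0 c9t5000CCA23B0CDAE9d0` for the two halves of mirror-1.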
 

Bronko

zxv said:
Given that ZFS cannot open the disk at all, are there any other messages about those devices in the kernel log?

It should be possible to use smartctl on OmniOS to get the drive health information.
pkg install smartmontools
/opt/ooce/sbin/smartctl -a /dev/rdsk/c9t5000CCA23B0D0E11d0

smartctl doesn't work on the bad disks:
(we are back on the old HBA: /dev/rdsk/c9... -> /dev/rdsk/c2...)
Code:
# smartctl -a -d scsi -T permissive /dev/rdsk/c2t5000CCA23B0C20C9d0s0
smartctl 6.5 2016-05-07 r4318 [i386-pc-solaris2.11] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

Smartctl open device: /dev/rdsk/c2t5000CCA23B0C20C9d0s0 failed: No such device or address
Check against a good disk:
Code:
# smartctl -a -d scsi -T permissive /dev/rdsk/c2t5000CCA23B0CEF7Dd0s0
smartctl 6.5 2016-05-07 r4318 [i386-pc-solaris2.11] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HGST
Product:              HUH728080AL4200
Revision:             A515
Compliance:           SPC-4
User Capacity:        8.001.563.222.016 bytes [8,00 TB]
Logical block size:   4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca23b0cef7c
Serial number:        12345678  (masqueraded by Bronko)
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Mon Feb 25 17:37:48 2019 CET
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     32 C
Drive Trip Temperature:        85 C

Manufactured in week 15 of year 2015
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  36
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  6305
Elements in grown defect list: 0

Vendor (Seagate) cache information
  Blocks sent to initiator = 5644481034452992

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0    2846846      11445,872           0
write:         0        0         0         0     363236      22256,724           0
verify:        0        0         0         0      18588          0,000           0

Non-medium error count:        0

No self-tests have been logged
 

zxv

Yea, that doesn't point toward drive health issues.
Given the results are strange, have you considered the health of the root pool?