ZFS Pool Degraded -> Unavail

Discussion in 'Solaris, Nexenta, OpenIndiana, and napp-it' started by Bronko, Feb 21, 2019.

  1. Bronko

    Coming from Napp-it SuperStorage Server 6048R-E1CR36L Performance, and after 3 years of working fine, this is what my data pool currently looks like:

    Code:
    # cat /etc/release
      OmniOS v11 r151028o
      Copyright 2017 OmniTI Computer Consulting, Inc. All rights reserved.
      Copyright 2017-2019 OmniOS Community Edition (OmniOSce) Association.
      All rights reserved. Use is subject to licence terms.
    
    # zpool status tank1
      pool: tank1
     state: UNAVAIL
    status: One or more devices are faulted in response to persistent errors.  There are insufficient replicas for the pool to
            continue functioning.
    action: Destroy and re-create the pool from a backup source.  Manually marking the device
            repaired using 'zpool clear' may allow some data to be recovered.
      scan: none requested
    config:
    
            NAME                       STATE     READ WRITE CKSUM
            tank1                      UNAVAIL      0     0     0  insufficient replicas
              mirror-0                 ONLINE       0     0     0
                c2t5000CCA23B0CEF7Dd0  ONLINE       0     0     0
                c2t5000CCA23B0D18F9d0  ONLINE       0     0     0
              mirror-1                 DEGRADED     0     0     0
                c2t5000CCA23B0CDAE9d0  ONLINE       0     0     0
                c2t5000CCA23B0D0E11d0  FAULTED      0     0     0  external device fault
              mirror-2                 UNAVAIL      0     0     0  insufficient replicas
                c2t5000CCA23B0C20C9d0  UNAVAIL      0     0     0  cannot open
                c2t5000CCA23B0CA94Dd0  FAULTED      0     0     0  external device fault
              mirror-3                 ONLINE       0     0     0
                c2t5000CCA23B07B701d0  ONLINE       0     0     0
                c2t5000CCA23B0C9CD5d0  ONLINE       0     0     0
              mirror-4                 UNAVAIL      0     0     0  insufficient replicas
                c2t5000CCA23B0BE229d0  FAULTED      0     0     0  external device fault
                c2t5000CCA23B0C0935d0  UNAVAIL      0     0     0  cannot open
              mirror-5                 DEGRADED     0     0     0
                c2t5000CCA23B0BFDA9d0  ONLINE       0     0     0
                c2t5000CCA23B0D25C9d0  UNAVAIL      0     0     0  cannot open
              mirror-6                 ONLINE       0     0     0
                c2t5000CCA23B0B9121d0  ONLINE       0     0     0
                c2t5000CCA23B0BFCA1d0  ONLINE       0     0     0
              mirror-7                 DEGRADED     0     0     0
                c2t5000CCA23B0BDA41d0  ONLINE       0     0     0
                c2t5000CCA23B0BFBF1d0  FAULTED      0     0     0  external device fault
              mirror-8                 ONLINE       0     0     0
                c2t5000CCA23B0CE5B9d0  ONLINE       0     0     0
                c2t5000CCA23B0CE7A9d0  ONLINE       0     0     0
              mirror-9                 UNAVAIL      0     0     0  insufficient replicas
                c2t5000CCA23B0C0901d0  UNAVAIL      0     0     0  cannot open
                c2t5000CCA23B0D1BB5d0  FAULTED      0     0     0  external device fault
              mirror-10                DEGRADED     0     0     0
                c2t5000CCA23B0C00B1d0  FAULTED      0     0     0  external device fault
                c2t5000CCA23B0C9BD5d0  ONLINE       0     0     0
              mirror-11                DEGRADED     0     0     0
                c2t5000CCA23B0A3AE9d0  FAULTED      0     0     0  external device fault
                c2t5000CCA23B0CF6D9d0  ONLINE       0     0     0
            logs
              mirror-12                ONLINE       0     0     0
                c1t5002538C401C745Fd0  ONLINE       0     0     0
                c1t5002538C401C7462d0  ONLINE       0     0     0
    
    11 drives faulted at nearly the same time.

    The pool degradation started with 8 drives, and the first try with zpool clear tank1 added 3 more drives.
    The status above is after a reboot (the cache device is gone).

    A test with /var/web-gui/_my/tools/sas3ircu/lsiutil.i386 (napp-it) on one of the affected disks finished with no errors!?
    Code:
    Select a device:  [1-26 or RETURN to quit] 4
    
    1.  Alternating, 8-Bit, 00 and FF
    2.  Alternating, 8-Bit, 55 and AA
    3.  Incrementing, 8-Bit
    4.  Walking 1s and 0s, 8-Bit
    5.  Alternating, 16-Bit, 0000 and FFFF
    6.  Alternating, 16-Bit, 5555 and AAAA
    7.  Incrementing, 16-Bit
    8.  Walking 1s and 0s, 16-Bit
    9:  Random
    10:  All B5
    11:  All 4A
    12:  Incrementing across iterations (00 through FF)
    
    Select a data pattern:  [1-12 or RETURN to quit] 9
    Number of blocks per I/O:  [1-64 or RETURN to quit] 64
    Number of iterations:  [1-1000000 or 0 for infinite or RETURN to quit] 10000
    Type of I/O:  [0=Sequential, 1=Random, default is 0] 1
    Stop test on Write, Read, or Compare error?  [Yes or No, default is Yes]
    Testing started...
    10%  20%  30%  40%  50%  60%  70%  80%  90%  100%
    Testing ended...
    
    But there is some abnormality in the phy counters?
    Code:
    1.  Inquiry Test
    2.  WriteBuffer/ReadBuffer/Compare Test
    3.  Read Test
    4.  Write/Read/Compare Test
    8.  Read Capacity / Read Block Limits Test
    12.  Display phy counters
    13.  Clear phy counters
    14.  SATA SMART Read Test
    15.  SEP (SCSI Enclosure Processor) Test
    18.  Report LUNs Test
    19.  Drive firmware download
    20.  Expander firmware download
    21.  Read Logical Blocks
    99.  Reset port
    e   Enable expert mode in menus
    p   Enable paged mode
    w   Enable logging
    
    Diagnostics menu, select an option:  [1-99 or e/p/w or 0 to quit] 12
    
    Adapter Phy 0:  Link Up, No Errors
    
    Adapter Phy 1:  Link Up, No Errors
    
    Adapter Phy 2:  Link Up, No Errors
    
    Adapter Phy 3:  Link Up, No Errors
    
    Adapter Phy 4:  Link Down, No Errors
    
    Adapter Phy 5:  Link Down, No Errors
    
    Adapter Phy 6:  Link Up, No Errors
    
    Adapter Phy 7:  Link Up, No Errors
    
    Expander (Handle 0009) Phy 0:  Link Up, No Errors
    
    Expander (Handle 0009) Phy 1:  Link Up, No Errors
    
    Expander (Handle 0009) Phy 2:  Link Up, No Errors
    
    Expander (Handle 0009) Phy 3:  Link Up, No Errors
    
    Expander (Handle 0009) Phy 4:  Link Up, No Errors
    
    Expander (Handle 0009) Phy 5:  Link Up, No Errors
    
    Expander (Handle 0009) Phy 6:  Link Up, No Errors
    
    Expander (Handle 0009) Phy 7:  Link Up, No Errors
    
    Expander (Handle 0009) Phy 8:  Link Up, No Errors
    
    Expander (Handle 0009) Phy 9:  Link Up, No Errors
    
    Expander (Handle 0009) Phy 10:  Link Up, No Errors
    
    Expander (Handle 0009) Phy 11:  Link Up, No Errors
    
    Expander (Handle 0009) Phy 12:  Link Up
      Invalid DWord Count                                          10
      Running Disparity Error Count                                 0
      Loss of DWord Synch Count                                     2
      Phy Reset Problem Count                                       0
    
    Expander (Handle 0009) Phy 13:  Link Up
      Invalid DWord Count                                          11
      Running Disparity Error Count                                 0
      Loss of DWord Synch Count                                     2
      Phy Reset Problem Count                                       0
    
    Expander (Handle 0009) Phy 14:  Link Up
      Invalid DWord Count                                          12
      Running Disparity Error Count                                 0
      Loss of DWord Synch Count                                     2
      Phy Reset Problem Count                                       0
    
    Expander (Handle 0009) Phy 15:  Link Up
      Invalid DWord Count                                          11
      Running Disparity Error Count                                 3
      Loss of DWord Synch Count                                     2
      Phy Reset Problem Count                                       0
    
    Expander (Handle 0009) Phy 16:  Link Down, No Errors
    
    Expander (Handle 0009) Phy 17:  Link Down, No Errors
    
    Expander (Handle 0009) Phy 18:  Link Down, No Errors
    
    Expander (Handle 0009) Phy 19:  Link Down, No Errors
    
    Expander (Handle 0009) Phy 20:  Link Up, No Errors
    
    Expander (Handle 0009) Phy 21:  Link Up, No Errors
    
    Expander (Handle 0009) Phy 22:  Link Up, No Errors
    
    Expander (Handle 0009) Phy 23:  Link Up, No Errors
    
    Expander (Handle 0009) Phy 24:  Link Down, No Errors
    
    Expander (Handle 0009) Phy 25:  Link Down, No Errors
    
    Expander (Handle 0009) Phy 26:  Link Down, No Errors
    
    Expander (Handle 0009) Phy 27:  Link Down, No Errors
    
    Expander (Handle 0009) Phy 28:  Link Up, No Errors
    
    Expander (Handle 0009) Phy 29:  Link Up, No Errors
    
    Expander (Handle 0009) Phy 30:  Link Up, No Errors
    
    Expander (Handle 0009) Phy 31:  Link Up, No Errors
    
    Expander (Handle 0009) Phy 32:  Link Up, No Errors
    
    Expander (Handle 0009) Phy 33:  Link Up, No Errors
    
    Expander (Handle 0009) Phy 34:  Link Up, No Errors
    
    Expander (Handle 0009) Phy 35:  Link Up, No Errors
    
    Expander (Handle 0009) Phy 36:  Link Up, No Errors
    
    Expander (Handle 0009) Phy 37:  Link Up, No Errors
    
    Expander (Handle 0009) Phy 38:  Link Up, No Errors
    
    Expander (Handle 0009) Phy 39:  Link Up, No Errors
    
    Expander (Handle 0009) Phy 40:  Link Up, No Errors
    
    Expander (Handle 0009) Phy 41:  Link Down, No Errors
    
    Expander (Handle 0009) Phy 42:  Link Down, No Errors
    Report Phy Error Log failed with result 16
    Report Phy Error Log failed with result 16
    Report Phy Error Log failed with result 16
    Report Phy Error Log failed with result 16
    Report Phy Error Log failed with result 16
    Report Phy Error Log failed with result 16
    Report Phy Error Log failed with result 16
    Report Phy Error Log failed with result 16
    
    Expander (Handle 0017) Phy 0:  Link Down, No Errors
    
    Expander (Handle 0017) Phy 1:  Link Down, No Errors
    
    Expander (Handle 0017) Phy 2:  Link Down, No Errors
    
    Expander (Handle 0017) Phy 3:  Link Down, No Errors
    
    Expander (Handle 0017) Phy 4:  Link Down, No Errors
    
    Expander (Handle 0017) Phy 5:  Link Down, No Errors
    
    Expander (Handle 0017) Phy 6:  Link Down, No Errors
    
    Expander (Handle 0017) Phy 7:  Link Down, No Errors
    
    Expander (Handle 0017) Phy 8:  Link Down, No Errors
    
    Expander (Handle 0017) Phy 9:  Link Down, No Errors
    
    Expander (Handle 0017) Phy 10:  Link Down, No Errors
    
    Expander (Handle 0017) Phy 11:  Link Down, No Errors
    
    Expander (Handle 0017) Phy 12:  Link Down, No Errors
    
    Expander (Handle 0017) Phy 13:  Link Down, No Errors
    
    Expander (Handle 0017) Phy 14:  Link Down, No Errors
    
    Expander (Handle 0017) Phy 15:  Link Down, No Errors
    
    Expander (Handle 0017) Phy 16:  Link Up, No Errors
    
    Expander (Handle 0017) Phy 17:  Link Up, No Errors
    
    Expander (Handle 0017) Phy 18:  Link Up, No Errors
    
    Expander (Handle 0017) Phy 19:  Link Up, No Errors
    
    Expander (Handle 0017) Phy 20:  Link Down, No Errors
    
    Expander (Handle 0017) Phy 21:  Link Down, No Errors
    
    Expander (Handle 0017) Phy 22:  Link Down, No Errors
    
    Expander (Handle 0017) Phy 23:  Link Down, No Errors
    
    Expander (Handle 0017) Phy 24:  Link Down, No Errors
    
    Expander (Handle 0017) Phy 25:  Link Down, No Errors
    
    Expander (Handle 0017) Phy 26:  Link Down, No Errors
    
    Expander (Handle 0017) Phy 27:  Link Down, No Errors
    
    Expander (Handle 0017) Phy 28:  Link Up, No Errors
    
    Expander (Handle 0017) Phy 29:  Link Down, No Errors
    
    Expander (Handle 0017) Phy 30:  Link Down, No Errors
    My first step would be to exchange the HBA and the cable to the backplane, since several disks failed at the same time.

    Does anyone have experience with a case like this?
     
  2. gea

    Without a surge (lightning) or overvoltage (PSU) it is unlikely that several disks fail at the same time. Most probably this is an HBA, cabling or power problem.

    The good thing with ZFS is that even if the pool, a vdev or a disk is in a degraded or offline state, the pool becomes available again once enough disks come back. ZFS is far less critical here than traditional hardware RAID, as ZFS can always tell whether data is valid or not (thanks to checksums).

    I would power off the system and check the whole cabling. If you made any changes prior to the failure, undo them. If all troubled disks share the same HBA, cabling or power rail, check or replace that.

    In the end, you need one disk each of mirror-2, mirror-4 and mirror-9 to "come back". Then you can access the pool, at least in a degraded state.

    If disks come back but are not detected properly as pool members:
    - export + import the pool, as this will re-read all disks
    - menu Pools > Clear error to delete the faulted state of the disks
    (a CLI sketch of these steps follows below)
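    A minimal CLI sketch of those two steps, assuming the standard illumos zpool commands; the pool name tank1 is taken from the status above, and "Clear error" is assumed to correspond to zpool clear:
    Code:
    # re-read all disks by exporting and re-importing the pool
    zpool export tank1
    zpool import tank1
    
    # clear the faulted/error state of the devices (assumed CLI equivalent of menu Pools > Clear error)
    zpool clear tank1
    zpool status tank1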
     
  3. vl1969

    Just a suggestion, but if you can (not sure what is cost-effective), start with the cabling and the PSU.
    I had a similar issue in my home lab where new disks would drop out for no reason.
    I rebuilt the whole server: new MB, new HBA, new cables. Turned out it was the PSU. Too old and not enough power for 16 HDDs, especially after I added 4 new Dell 7200 rpm 2 TB drives.
     
  4. Bronko

    @gea Yes, that is what I'm expecting from ZFS in case some disks come back. Thanks for the napp-it menu path...

    @vl1969 The PSU is redundant in this system, and there are no PSU-related events in the IPMI log.

    But isn't it confusing that lsiutil finished the R/W test on an affected disk without an error?

    A new HBA and mini-SAS cable are currently on their way from the vendor (some HDDs too).
    Will be back.
     
  5. gea

    Only ZFS uses real data checksums. It detects any error; trust ZFS, and when in doubt, only ZFS.
    No other tool (smartmontools, hardware RAID or any other test tool) comes close to ZFS here.
     
  6. FMA1394

    Not familiar with ZFS on OpenIndiana, but at least on ZoL you can do the following (at the risk of losing a few transactions):

    Get all of the drives showing up in lsscsi (or the equivalent on OpenIndiana) and run:

    zpool import <poolname> -F

    -F Recovery mode for a non-importable pool. Attempt to
    return the pool to an importable state by discarding the
    last few transactions. Not all damaged pools can be
    recovered by using this option. If successful, the data
    from the discarded transactions is irretrievably lost.
    This option is ignored if the pool is importable or
    already imported.


    It's a good idea to scrub after doing this, should you choose this path.
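    A hedged sketch of that recovery path (assuming the pool is not currently imported; on ZoL and illumos, -n combined with -F does a dry run of the rollback):
    Code:
    # dry run: check whether discarding the last few transactions would make the pool importable
    zpool import -F -n tank1
    
    # if so, actually roll back and import (the discarded transactions are lost)
    zpool import -F tank1
    
    # afterwards, verify everything against the checksums
    zpool scrub tank1
    zpool status -v tank1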
     
  7. vl1969

    Well, I didn't have any errors or messages about the PSU in my case either.
    My box is heavily modded.
    I swapped the original redundant PSU for a regular ATX 80 Plus Silver unit to make it quieter.
    But regardless, the system showed no errors, except that every few days one of the disks would simply drop off the pool.
    I switched the PSU to a new 1500 W model and the box has been running for a month now with all the drives, no problem. The same drives I had pulled as failed I put back and added to my pools as new.
     
  8. zxv

    Just for troubleshooting purposes, it's possible to gather more info about the pool using ZDB:

    Examining ZFS Pools with zdb - Lustre Wiki

    zdb -C displays the cached information about the pool.

    zdb -l displays the labels of the pool members. This includes GUIDs and, in certain cases like HP hardware, the enclosure and bay numbers. It can be helpful to take a copy before rearranging or swapping drives or shelves, or before other hardware changes, as a way to compare against the previous hardware layout.
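    For example, something along these lines (a sketch: zdb -C takes the pool name, while zdb -l is usually pointed at a member device; the disk path here is one of the pool members from the status above):
    Code:
    # cached pool configuration (vdev tree, GUIDs, device paths)
    zdb -C tank1
    
    # the ZFS labels of a single member disk
    zdb -l /dev/dsk/c2t5000CCA23B0CEF7Dd0s0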
     
  9. Bronko

    Thanks for your replies!

    @gea About checksums: this is my very first status in DEGRADED state, before the reboot. No checksum errors, only I/O errors:
    Code:
     pool: tank1
    state: DEGRADED
    status: One or more devices are faulted in response to IO failures.
    action: Make sure the affected devices are connected, then run 'zpool clear'.
       see: http://illumos.org/msg/ZFS-8000-HC
      scan: none requested
    config:
    
    NAME                       STATE     READ WRITE CKSUM
    tank1                      DEGRADED     2     1     0
      mirror-0                 ONLINE       0     0     0
        c2t5000CCA23B0CEF7Dd0  ONLINE       0     0     0
        c2t5000CCA23B0D18F9d0  ONLINE       0     0     0
      mirror-1                 DEGRADED     0     0     0
        c2t5000CCA23B0CDAE9d0  ONLINE       0     0     0
        c2t5000CCA23B0D0E11d0  FAULTED      0     0     0  external device fault
      mirror-2                 DEGRADED    30    40     0
        c2t5000CCA23B0C20C9d0  DEGRADED    32    44     0  external device fault
        c2t5000CCA23B0CA94Dd0  FAULTED      0     0     0  external device fault
      mirror-3                 ONLINE       0     0     0
        c2t5000CCA23B07B701d0  ONLINE       0     0     0
        c2t5000CCA23B0C9CD5d0  ONLINE       0     0     0
      mirror-4                 DEGRADED     3     3     0
        c2t5000CCA23B0BE229d0  FAULTED      0     0     0  external device fault
        c2t5000CCA23B0C0935d0  ONLINE       3     3     0
      mirror-5                 ONLINE       0     0     0
        c2t5000CCA23B0BFDA9d0  ONLINE       0     0     0
        c2t5000CCA23B0D25C9d0  ONLINE       0     0     0
      mirror-6                 ONLINE       0     0     0
        c2t5000CCA23B0B9121d0  ONLINE       0     0     0
        c2t5000CCA23B0BFCA1d0  ONLINE       0     0     0
      mirror-7                 DEGRADED     0     0     0
        c2t5000CCA23B0BDA41d0  ONLINE       0     0     0
        c2t5000CCA23B0BFBF1d0  FAULTED      0     0     0  external device fault
      mirror-8                 ONLINE       0     0     0
        c2t5000CCA23B0CE5B9d0  ONLINE       0     0     0
        c2t5000CCA23B0CE7A9d0  ONLINE       0     0     0
      mirror-9                 DEGRADED     0     0     0
        c2t5000CCA23B0C0901d0  ONLINE       0     0     0
        c2t5000CCA23B0D1BB5d0  FAULTED      0     0     0  external device fault
      mirror-10                DEGRADED     0     0     0
        c2t5000CCA23B0C00B1d0  FAULTED      0     0     0  external device fault
        c2t5000CCA23B0C9BD5d0  ONLINE       0     0     0
      mirror-11                DEGRADED     0     0     0
        c2t5000CCA23B0A3AE9d0  FAULTED      0     0     0  external device fault
        c2t5000CCA23B0CF6D9d0  ONLINE       0     0     0
    logs
      mirror-12                ONLINE       0     0     0
        c1t5002538C401C745Fd0  ONLINE       0     0     0
        c1t5002538C401C7462d0  ONLINE       0     0     0
    cache
      c3t1d0                   ONLINE       0     0     0
    For now I get the message "cannot open 'tank1': I/O error" in >Home >Pools, and the pool status is displayed.


    @zxv For any of the faulted disks I get this:
    Code:
    # zdb -l /dev/dsk/c2t5000CCA23B0A3AE9d0s0
    cannot open '/dev/rdsk/c2t5000CCA23B0A3AE9d0s0': No such file or directory

    Console messages:
    Code:
    # tail -f /var/adm/messages
    Feb 22 10:10:49 tanker last message repeated 1 time
    Feb 22 10:10:49 tanker scsi: [ID 243001 kern.info]      w5000cca23b0d25c9 FastPath Capable and Enabled
    Feb 22 10:10:49 tanker last message repeated 1 time
    Feb 22 10:10:49 tanker scsi: [ID 243001 kern.info]      w5000cca23b0c0901 FastPath Capable and Enabled
    Feb 22 10:10:49 tanker last message repeated 1 time
    Feb 22 10:10:49 tanker scsi: [ID 243001 kern.info]      w5000cca23b0c20c9 FastPath Capable and Enabled
    Feb 22 10:10:49 tanker last message repeated 1 time
    Feb 22 10:10:49 tanker scsi: [ID 243001 kern.info]      w5000cca23b0c0901 FastPath Capable and Enabled
    Feb 22 10:10:49 tanker last message repeated 3 times
    Feb 22 10:10:49 tanker scsi: [ID 243001 kern.info]      w5000cca23b0c0935 FastPath Capable and Enabled
    Feb 22 10:10:49 tanker last message repeated 1 time
    Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
    Feb 22 10:10:57 tanker  FW Upload tce invalid!
    Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
    Feb 22 10:10:57 tanker  FW Upload tce invalid!
    Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
    Feb 22 10:10:57 tanker  FW Upload tce invalid!
    Feb 22 10:10:57 tanker scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci15d9,808@0 (mpt_sas0):
    Feb 22 10:10:57 tanker  FW Upload tce invalid!
    [... the same "FW Upload tce invalid!" warning repeated many more times, continuing through Feb 22 10:10:58 ...]
    
     
  10. gea

    Any idea in the meantime of what has happened?
    Have you checked for a systematic difference between the faulted disks and the good ones, e.g. all on the same HBA, power rail or cabling?

    Have you checked the temperature?
    Overtemperature (say > 60 °C) can kill disks and data. (I had a customer where the air conditioner failed without notice, with a similarly damaged and lost pool.)

    When the state is "degraded", you can access the pool and back up important data. There is a concern with mirror-1, which is degraded + faulted; access may fail.

    From the logs:
    "FW Upload tce invalid!" is uncritical (a message from sas3ircu on illumos, as this is a tool intended for Oracle Solaris only).
    An I/O error just means the disk or pool is not accessible (= dead or without power).


    Another point:
    You have good disks and bad disks.
    What happens if you swap a good disk, e.g. c2t5000CCA23B0CEF7Dd0,
    with the degraded c2t5000CCA23B0C20C9d0?
    
    If cabling, power or the backplane is the problem, the formerly good one may develop problems while a formerly degraded or faulted one becomes good.

    If there is no change, export + import the pool read-only and try to back up changed data (assuming you already have a backup), or as much as possible, as sketched below.
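    A minimal sketch of the read-only route (assuming the pool can be exported and re-imported; readonly=on prevents any further writes while data is copied off):
    Code:
    zpool export tank1
    
    # import without allowing writes, so nothing can make the situation worse
    zpool import -o readonly=on tank1
    
    # see what is reachable, then back it up (zfs send/receive, rsync, ...)
    zfs list -r tank1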
     
  11. Bronko

    As mentioned, at first I will replace the HBA (there is only one in there) and the cable to the backplane (all 24 drives are on the same backplane). Unfortunately the vendor package will not arrive before Monday.
    
    Temperature was one of my first checks, all around 30 °C. We have a well-monitored in-rack cooling system... ;-)
    
    That's a well-known failure mode, but not on this scale...!
    
    Your disk-swap suggestion is a good idea, in case the HBA/cable change has no effect.
    
    I have backups of all critical data in the pool because I'm using ZFS replication to a backup system, a napp-it Pro feature... ;-)

    A little annoyance here: after the update OmniOSce r151026 -> r151028 I once more have to redo these steps to reactivate TLS e-mail:

    napp-it // webbased ZFS NAS/SAN appliance for OmniOS, OpenIndiana and Solaris : Downloads
    As I had to do here before:
    https://forums.servethehome.com/ind...022-long-term-stable.14367/page-3#post-148057

    Therefore I got no e-mail alert, and because of sickness and vacation the faults are already some days old... ;-)
     
  12. gea

    OmniOS 151028 removes the old Sun SSH completely and replaces it with OpenSSH.
    As a result, TLS e-mail from 151026 is unworkable after the update. Even a reinstall of TLS didn't work for me.
    
    If you need TLS e-mail, I suggest a clean install of 151028 + napp-it via wget + TLS.
    If you save/restore /var/web-gui/_log/*, all napp-it settings remain intact; optionally recreate users with the same uid/gid.
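    A hedged sketch of the save/restore step (the /var/web-gui/_log path is from the advice above; the backup destination /some/backup is just a placeholder):
    Code:
    # before the clean install: save the napp-it settings (relative path, so the restore lands back under /)
    cd / && tar cf /some/backup/napp-it_settings.tar var/web-gui/_log
    
    # after the clean install of OmniOS 151028 + napp-it: restore them
    cd / && tar xf /some/backup/napp-it_settings.tar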
     
  13. Bronko

    Reinstalling TLS was successful on two machines with the same upgrade history over the last 3 years, since the initial installation of OmniOS 5.11 omnios-r151018-ae3141d in April 2016.
    
    E-mail notification has been working as before for 3 days now.
     
  14. zxv

    Hey @Bronko, the zdb -C and -l commands take a pool name, so:
    zdb -C tank1
    zdb -l tank1
     
  15. Bronko

    OK, HBA replaced (not the cable), and zpool clear tank1 (which was stuck before) got me this after a few minutes:
    Code:
    # zpool status tank1
    
      pool: tank1
     state: ONLINE
      scan: none requested
    config:
    
            NAME                       STATE     READ WRITE CKSUM
            tank1                      ONLINE       0     0     0
              mirror-0                 ONLINE       0     0     0
                c9t5000CCA23B0CEF7Dd0  ONLINE       0     0     0
                c9t5000CCA23B0D18F9d0  ONLINE       0     0     0
              mirror-1                 ONLINE       0     0     0
                c9t5000CCA23B0CDAE9d0  ONLINE       0     0     0
                c9t5000CCA23B0D0E11d0  ONLINE       0     0     0
              mirror-2                 ONLINE       0     0     0
                c9t5000CCA23B0C20C9d0  ONLINE       0     0     0
                c9t5000CCA23B0CA94Dd0  ONLINE       0     0     0
              mirror-3                 ONLINE       0     0     0
                c9t5000CCA23B07B701d0  ONLINE       0     0     0
                c9t5000CCA23B0C9CD5d0  ONLINE       0     0     0
              mirror-4                 ONLINE       0     0     0
                c9t5000CCA23B0BE229d0  ONLINE       0     0     0
                c9t5000CCA23B0C0935d0  ONLINE       0     0     0
              mirror-5                 ONLINE       0     0     0
                c9t5000CCA23B0BFDA9d0  ONLINE       0     0     0
                c9t5000CCA23B0D25C9d0  ONLINE       0     0     0
              mirror-6                 ONLINE       0     0     0
                c9t5000CCA23B0B9121d0  ONLINE       0     0     0
                c9t5000CCA23B0BFCA1d0  ONLINE       0     0     0
              mirror-7                 ONLINE       0     0     0
                c9t5000CCA23B0BDA41d0  ONLINE       0     0     0
                c9t5000CCA23B0BFBF1d0  ONLINE       0     0     0
              mirror-8                 ONLINE       0     0     0
                c9t5000CCA23B0CE5B9d0  ONLINE       0     0     0
                c9t5000CCA23B0CE7A9d0  ONLINE       0     0     0
              mirror-9                 ONLINE       0     0     0
                c9t5000CCA23B0C0901d0  ONLINE       0     0     0
                c9t5000CCA23B0D1BB5d0  ONLINE       0     0     0
              mirror-10                ONLINE       0     0     0
                c9t5000CCA23B0C00B1d0  ONLINE       0     0     0
                c9t5000CCA23B0C9BD5d0  ONLINE       0     0     0
              mirror-11                ONLINE       0     0     0
                c9t5000CCA23B0A3AE9d0  ONLINE       0     0     0
                c9t5000CCA23B0CF6D9d0  ONLINE       0     0     0
            logs
              mirror-12                ONLINE       0     0     0
                c1t5002538C401C745Fd0  ONLINE       0     0     0
                c1t5002538C401C7462d0  ONLINE       0     0     0
            cache
              c3t1d0                   ONLINE       0     0     0
    
    errors: No known data errors
    
    The cache device (L2ARC, Intel NVMe SSD) is back, too.
    
    And I had access to the ZFS filesystems in the pool (browsed them with mc ;-).
    
    After the next reboot the pool is UNAVAIL again:
    Code:
    # zpool status tank1
    
      pool: tank1
     state: UNAVAIL
    status: One or more devices could not be opened.  There are insufficient
            replicas for the pool to continue functioning.
    action: Attach the missing device and online it using 'zpool online'.
       see: http://illumos.org/msg/ZFS-8000-3C
      scan: none requested
    config:
    
            NAME                       STATE     READ WRITE CKSUM
            tank1                      UNAVAIL      0     0     0  insufficient replicas
              mirror-0                 ONLINE       0     0     0
                c9t5000CCA23B0CEF7Dd0  ONLINE       0     0     0
                c9t5000CCA23B0D18F9d0  ONLINE       0     0     0
              mirror-1                 DEGRADED     0     0     0
                c9t5000CCA23B0CDAE9d0  ONLINE       0     0     0
                c9t5000CCA23B0D0E11d0  UNAVAIL      0     0     0  cannot open
              mirror-2                 UNAVAIL      0     0     0  insufficient replicas
                c9t5000CCA23B0C20C9d0  UNAVAIL      0     0     0  cannot open
                c9t5000CCA23B0CA94Dd0  UNAVAIL      0     0     0  cannot open
              mirror-3                 ONLINE       0     0     0
                c9t5000CCA23B07B701d0  ONLINE       0     0     0
                c9t5000CCA23B0C9CD5d0  ONLINE       0     0     0
              mirror-4                 UNAVAIL      0     0     0  insufficient replicas
                c9t5000CCA23B0BE229d0  UNAVAIL      0     0     0  cannot open
                c9t5000CCA23B0C0935d0  UNAVAIL      0     0     0  cannot open
              mirror-5                 DEGRADED     0     0     0
                c9t5000CCA23B0BFDA9d0  ONLINE       0     0     0
                c9t5000CCA23B0D25C9d0  UNAVAIL      0     0     0  cannot open
              mirror-6                 ONLINE       0     0     0
                c9t5000CCA23B0B9121d0  ONLINE       0     0     0
                c9t5000CCA23B0BFCA1d0  ONLINE       0     0     0
              mirror-7                 DEGRADED     0     0     0
                c9t5000CCA23B0BDA41d0  ONLINE       0     0     0
                c9t5000CCA23B0BFBF1d0  UNAVAIL      0     0     0  cannot open
              mirror-8                 ONLINE       0     0     0
                c9t5000CCA23B0CE5B9d0  ONLINE       0     0     0
                c9t5000CCA23B0CE7A9d0  ONLINE       0     0     0
              mirror-9                 UNAVAIL      0     0     0  insufficient replicas
                c9t5000CCA23B0C0901d0  UNAVAIL      0     0     0  cannot open
                c9t5000CCA23B0D1BB5d0  UNAVAIL      0     0     0  cannot open
              mirror-10                DEGRADED     0     0     0
                c9t5000CCA23B0C00B1d0  UNAVAIL      0     0     0  cannot open
                c9t5000CCA23B0C9BD5d0  ONLINE       0     0     0
              mirror-11                DEGRADED     0     0     0
                c9t5000CCA23B0A3AE9d0  UNAVAIL      0     0     0  cannot open
                c9t5000CCA23B0CF6D9d0  ONLINE       0     0     0
            logs
              mirror-12                ONLINE       0     0     0
                c1t5002538C401C745Fd0  ONLINE       0     0     0
                c1t5002538C401C7462d0  ONLINE       0     0     0
    
    The L2ARC is gone again...
    
    The console is flooded again:
    Code:
    # tail -f /var/adm/messages
    Feb 25 11:24:12 tanker scsi: [ID 243001 kern.info]      w5000cca23b0c0935 FastPath Capable and Enabled
    Feb 25 11:24:12 tanker last message repeated 1 time
    Feb 25 11:24:12 tanker scsi: [ID 243001 kern.info]      w5000cca23b0bfbf1 FastPath Capable and Enabled
    Feb 25 11:24:12 tanker last message repeated 1 time
    Feb 25 11:24:12 tanker scsi: [ID 243001 kern.info]      w5000cca23b0d25c9 FastPath Capable and Enabled
    Feb 25 11:24:12 tanker last message repeated 1 time
    Feb 25 11:24:12 tanker scsi: [ID 243001 kern.info]      w5000cca23b0a3ae9 FastPath Capable and Enabled
    Feb 25 11:24:12 tanker last message repeated 1 time
    Feb 25 11:24:12 tanker scsi: [ID 243001 kern.info]      w5000cca23b0c0935 FastPath Capable and Enabled
    Feb 25 11:24:12 tanker last message repeated 1 time
    Feb 25 11:24:12 tanker scsi: [ID 243001 kern.info]      w5000cca23b0c0901 FastPath Capable and Enabled
    Feb 25 11:24:12 tanker last message repeated 1 time
    Feb 25 11:24:12 tanker scsi: [ID 243001 kern.info]      w5000cca23b0d25c9 FastPath Capable and Enabled
    Feb 25 11:24:12 tanker last message repeated 1 time
    Feb 25 11:24:12 tanker scsi: [ID 243001 kern.info]      w5000cca23b0a3ae9 FastPath Capable and Enabled
    Feb 25 11:24:12 tanker last message repeated 1 time
    Feb 25 11:24:12 tanker scsi: [ID 243001 kern.info]      w5000cca23b0be229 FastPath Capable and Enabled
    Feb 25 11:24:12 tanker last message repeated 1 time
    Feb 25 11:24:12 tanker scsi: [ID 243001 kern.info]      w5000cca23b0c20c9 FastPath Capable and Enabled
    Feb 25 11:24:12 tanker last message repeated 1 time
    Feb 25 11:24:12 tanker scsi: [ID 243001 kern.info]      w5000cca23b0c0901 FastPath Capable and Enabled
    Feb 25 11:24:12 tanker last message repeated 1 time
    Feb 25 11:24:12 tanker scsi: [ID 243001 kern.info]      w5000cca23b0a3ae9 FastPath Capable and Enabled
    Feb 25 11:24:12 tanker last message repeated 1 time
    Feb 25 11:24:12 tanker scsi: [ID 243001 kern.info]      w5000cca23b0c0935 FastPath Capable and Enabled
    Feb 25 11:24:12 tanker last message repeated 1 time
    Feb 25 11:24:12 tanker scsi: [ID 243001 kern.info]      w5000cca23b0c20c9 FastPath Capable and Enabled
    Feb 25 11:24:12 tanker last message repeated 1 time
    Feb 25 11:24:12 tanker scsi: [ID 243001 kern.info]      w5000cca23b0a3ae9 FastPath Capable and Enabled
    Feb 25 11:24:12 tanker last message repeated 1 time
    Feb 25 11:24:12 tanker scsi: [ID 243001 kern.info]      w5000cca23b0be229 FastPath Capable and Enabled
    Feb 25 11:24:12 tanker last message repeated 1 time
    Feb 25 11:24:12 tanker scsi: [ID 243001 kern.info]      w5000cca23b0a3ae9 FastPath Capable and Enabled
    Feb 25 11:24:12 tanker last message repeated 1 time
    Feb 25 11:24:13 tanker scsi: [ID 243001 kern.info]      w5000cca23b0ca94d FastPath Capable and Enabled
    Feb 25 11:24:13 tanker last message repeated 1 time
    Feb 25 11:24:13 tanker cmlb: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2f04@2/pci1000,30e0@0/iport@f/disk@w5000cca23b0d18f9,0 (sd29):
    Feb 25 11:24:13 tanker  primary label corrupt; using backup
    Feb 25 11:24:13 tanker scsi: [ID 243001 kern.info]      w5000cca23b0c0935 FastPath Capable and Enabled
    Feb 25 11:24:13 tanker last message repeated 1 time
    Feb 25 11:24:13 tanker scsi: [ID 243001 kern.info]      w5000cca23b0c00b1 FastPath Capable and Enabled
    Feb 25 11:24:13 tanker last message repeated 1 time
    Feb 25 11:24:13 tanker scsi: [ID 243001 kern.info]      w5000cca23b0d0e11 FastPath Capable and Enabled
    Feb 25 11:24:13 tanker last message repeated 1 time
    Feb 25 11:24:13 tanker scsi: [ID 243001 kern.info]      w5000cca23b0c20c9 FastPath Capable and Enabled
    Feb 25 11:24:13 tanker last message repeated 1 time
    Feb 25 11:24:13 tanker scsi: [ID 243001 kern.info]      w5000cca23b0c0901 FastPath Capable and Enabled
    Feb 25 11:24:13 tanker last message repeated 1 time
    Feb 25 11:24:13 tanker scsi: [ID 243001 kern.info]      w5000cca23b0d25c9 FastPath Capable and Enabled
    Feb 25 11:24:13 tanker last message repeated 1 time
    Feb 25 11:24:13 tanker scsi: [ID 243001 kern.info]      w5000cca23b0be229 FastPath Capable and Enabled
    Feb 25 11:24:13 tanker last message repeated 1 time
    Feb 25 11:24:13 tanker scsi: [ID 243001 kern.info]      w5000cca23b0ca94d FastPath Capable and Enabled
    Feb 25 11:24:13 tanker last message repeated 1 time
    .
    .
    .
     
  16. Bronko

    Tried exactly this and got alternating results. Sometimes the bad disk comes alive (smartctl has access too) but doesn't survive a reboot; sometimes it keeps its bad status. The good disk doesn't change its status either way.
     
  17. zxv

    Given ZFS cannot open the disks at all, are there any other messages about those devices in the kernel log?

    It should be possible to use smartctl on OmniOS to get the drive health information:
    pkg install smartmontools
    /opt/ooce/sbin/smartctl -a /dev/rdsk/c9t5000CCA23B0D0E11d0
     
  18. Bronko

    smartctl doesn't work on the bad disks
    (we are back on the old HBA: /dev/rdsk/c9... -> /dev/rdsk/c2...):
    Code:
    # smartctl -a -d scsi -T permissive /dev/rdsk/c2t5000CCA23B0C20C9d0s0
    smartctl 6.5 2016-05-07 r4318 [i386-pc-solaris2.11] (local build)
    Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
    
    Smartctl open device: /dev/rdsk/c2t5000CCA23B0C20C9d0s0 failed: No such device or address
    Checking against a good disk:
    Code:
    # smartctl -a -d scsi -T permissive /dev/rdsk/c2t5000CCA23B0CEF7Dd0s0
    smartctl 6.5 2016-05-07 r4318 [i386-pc-solaris2.11] (local build)
    Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF INFORMATION SECTION ===
    Vendor:               HGST
    Product:              HUH728080AL4200
    Revision:             A515
    Compliance:           SPC-4
    User Capacity:        8.001.563.222.016 bytes [8,00 TB]
    Logical block size:   4096 bytes
    LU is fully provisioned
    Rotation Rate:        7200 rpm
    Form Factor:          3.5 inches
    Logical Unit id:      0x5000cca23b0cef7c
    Serial number:        12345678  (masqueraded by Bronko)
    Device type:          disk
    Transport protocol:   SAS (SPL-3)
    Local Time is:        Mon Feb 25 17:37:48 2019 CET
    SMART support is:     Available - device has SMART capability.
    SMART support is:     Enabled
    Temperature Warning:  Enabled
    
    === START OF READ SMART DATA SECTION ===
    SMART Health Status: OK
    
    Current Drive Temperature:     32 C
    Drive Trip Temperature:        85 C
    
    Manufactured in week 15 of year 2015
    Specified cycle count over device lifetime:  50000
    Accumulated start-stop cycles:  36
    Specified load-unload count over device lifetime:  600000
    Accumulated load-unload cycles:  6305
    Elements in grown defect list: 0
    
    Vendor (Seagate) cache information
      Blocks sent to initiator = 5644481034452992
    
    Error counter log:
               Errors Corrected by           Total   Correction     Gigabytes    Total
                   ECC          rereads/    errors   algorithm      processed    uncorrected
               fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
    read:          0        0         0         0    2846846      11445,872           0
    write:         0        0         0         0     363236      22256,724           0
    verify:        0        0         0         0      18588          0,000           0
    
    Non-medium error count:        0
    
    No self-tests have been logged
    
     
  19. zxv

    Yea, that doesn't point toward drive health issues.
    Given the results are strange, have you considered the health of the root pool?
     
  20. Bronko

    The mirrored rpool devices are on onboard SATA ports, not on the HBA... ;-)
     