ZFS Pool Degraded -> Unavail

zxv · Feb 25, 2019

OK, so it wouldn't necessarily have corrupted the OS.
You might do a scrub of the root pool just to make sure.
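For reference, here is a minimal sketch of that check on OmniOS/illumos. It assumes the root pool is called `rpool`, as elsewhere in this thread. The `run` helper and the `DRY_RUN` switch are only for illustration: the commands are printed rather than executed until you set `DRY_RUN=0`.

```shell
#!/bin/sh
# Minimal sketch: scrub the root pool, then inspect the result.
# POOL defaults to rpool (assumed root pool name).
# DRY_RUN=1 (the default here) only prints the commands;
# set DRY_RUN=0 to actually run them.
POOL="${POOL:-rpool}"
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print or execute a command depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Start the scrub; it keeps running in the background after the command returns.
run zpool scrub "$POOL"
# Show progress while it runs, and the repaired/error counts once it is done.
run zpool status -v "$POOL"
```

Running `zpool status -v rpool` again later shows whether the scrub found and repaired anything.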

Bronko · Feb 25, 2019

@zxv
rpool is scrubbed every week.

Bronko · Feb 26, 2019

gea said:
...
If there is no change, export and import the pool read-only and try to back up the changed data (assuming you already have a backup), or as much as possible.

It is not possible to import the pool:

Code:

# zpool import
   pool: tank1
     id: 14720958912406048058
  state: UNAVAIL
 status: One or more devices are missing from the system.
 action: The pool cannot be imported. Attach the missing
        devices and try again.
   see: http://illumos.org/msg/ZFS-8000-3C
 config:

        tank1                      UNAVAIL  insufficient replicas
          mirror-0                 ONLINE
            c2t5000CCA23B0CEF7Dd0  ONLINE
            c2t5000CCA23B0D18F9d0  ONLINE
          mirror-1                 DEGRADED
            c2t5000CCA23B0CDAE9d0  ONLINE
            c2t5000CCA23B0D0E11d0  UNAVAIL  cannot open
          mirror-2                 DEGRADED
            c2t5000CCA23B0C20C9d0  ONLINE
            c2t5000CCA23B0CA94Dd0  UNAVAIL  cannot open
          mirror-3                 ONLINE
            c2t5000CCA23B07B701d0  ONLINE
            c2t5000CCA23B0C9CD5d0  ONLINE
          mirror-4                 UNAVAIL  insufficient replicas
            c2t5000CCA23B0BE229d0  UNAVAIL  cannot open
            c2t5000CCA23B0C0935d0  UNAVAIL  cannot open
          mirror-5                 DEGRADED
            c2t5000CCA23B0BFDA9d0  ONLINE
            c9t5000CCA23B0D25C9d0  UNAVAIL  cannot open
          mirror-6                 ONLINE
            c2t5000CCA23B0B9121d0  ONLINE
            c2t5000CCA23B0BFCA1d0  ONLINE
          mirror-7                 DEGRADED
            c2t5000CCA23B0BDA41d0  ONLINE
            c9t5000CCA23B0BFBF1d0  UNAVAIL  cannot open
          mirror-8                 ONLINE
            c2t5000CCA23B0CE5B9d0  ONLINE
            c2t5000CCA23B0CE7A9d0  ONLINE
          mirror-9                 UNAVAIL  insufficient replicas
            c2t5000CCA23B0C0901d0  UNAVAIL  cannot open
            c2t5000CCA23B0D1BB5d0  UNAVAIL  cannot open
          mirror-10                DEGRADED
            c9t5000CCA23B0C00B1d0  UNAVAIL  cannot open
            c2t5000CCA23B0C9BD5d0  ONLINE
          mirror-11                DEGRADED
            c9t5000CCA23B0A3AE9d0  UNAVAIL  cannot open
            c2t5000CCA23B0CF6D9d0  ONLINE
        logs
          mirror-12                ONLINE
            c1t5002538C401C745Fd0  ONLINE
            c1t5002538C401C7462d0  ONLINE

# zpool import -f tank1
cannot import 'tank1': no such device in pool
        Destroy and re-create the pool from
        a backup source.

Destroying the pool isn't possible either:

Code:

# zpool destroy -f tank1
cannot open 'tank1': no such pool

gea · Feb 27, 2019

Your pool state seems to differ from test to test. Only the result, "UNAVAIL", stays the same: in every case a different set of disks is unavailable, and in every case more than the redundancy level allows.
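To make this concrete: a pool built from mirrors only needs one ONLINE disk per mirror, so it becomes UNAVAIL as soon as any single mirror has lost both sides. Below is a rough sketch (the function name `dead_mirrors` is made up for this post) that reads the config section of `zpool import` or `zpool status` and prints every top-level data mirror that has no ONLINE member. It assumes one pool in the input, mirror vdevs only (raidz would need different rules) and the indentation layout shown in the outputs in this thread; log, cache and spare devices are ignored.

```shell
#!/bin/sh
# Rough sketch: list top-level data mirrors with no ONLINE member disk.
# Usage (assumed): zpool import | dead_mirrors
#                  zpool status tank1 | dead_mirrors
dead_mirrors() {
    awk '
    # Skip blank lines.
    NF == 0 { next }
    {
        # Expand tabs (zpool usually indents its config with a leading tab),
        # then measure the indentation of this line.
        line = $0
        gsub(/\t/, "        ", line)
        indent = match(line, /[^ ]/) - 1
    }
    # The first line that looks like "<name> <STATE>" is the pool itself;
    # everything before it (headers, "state:" lines and so on) is ignored.
    base == "" {
        if ($1 !~ /:$/ && $2 ~ /^(ONLINE|DEGRADED|UNAVAIL|FAULTED|OFFLINE|REMOVED)$/)
            base = indent
        next
    }
    # "logs", "cache" and "spares" end the list of data vdevs.
    indent == base && ($1 == "logs" || $1 == "cache" || $1 == "spares") { done = 1 }
    done { next }
    # A top-level data vdev, e.g. mirror-4.
    indent == base + 2 { vdev = $1; order[++n] = vdev; online[vdev] = 0; next }
    # A disk inside the current top-level vdev: count it if it is ONLINE.
    indent == base + 4 && vdev != "" && $2 == "ONLINE" { online[vdev]++ }
    END {
        for (i = 1; i <= n; i++)
            if (online[order[i]] == 0)
                print order[i]
    }'
}
```

Fed the `zpool import` output from Bronko's post, it should print mirror-4 and mirror-9, the two mirrors where both disks report `cannot open`. Either one on its own is enough to keep the pool from importing.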

Single points of failure that can cause this are the HBA and the expander. If possible, replace the HBA. As there is a slight chance that a single bad disk confuses the expander, one option may be to remove all disks, then insert them one by one, wait a short time after each and check the pool state to see whether the disk came up. With some luck, the pool or other disks will suddenly change state when a particular disk is added, which would point to that disk as the bad one.

One idea would be to use a new boot disk, install OmniOS and check the disk and pool state (via format and zpool status). It's not very probable, but the OS may be corrupted.

If you have a different case/JBOD with enough empty bays, I would move all disks to that second system. Your disk behaviour is so unsystematic that I would suspect either an HBA/expander problem or not just one but two overlapping problems in your server. If the pool is OK there, then one or probably several parts of the server are damaged.

Bronko · Feb 27, 2019

Hi @gea, thanks for sharing your systematic approach to solving problems. It's my way too, and unfortunately each test eats a lot of time. (I'm "only" in a half-time position and love it... ;-)

gea said:
Your pool state seems to differ from test to test. Only the result, "UNAVAIL", stays the same: in every case a different set of disks is unavailable, and in every case more than the redundancy level allows.

Yes, and I have identified a maximum of 11 disks, unfortunately including both disks of two mirrored vdevs.

gea said:
Single points of failure that can cause this are the HBA and the expander. If possible, replace the HBA.

Already done, please check above.

gea said:
As there is a slight chance that a single bad disk confuses the expander, one option may be to remove all disks, then insert them one by one, wait a short time after each and check the pool state to see whether the disk came up. With some luck, the pool or other disks will suddenly change state when a particular disk is added, which would point to that disk as the bad one.

I read about that, but only in the context of SATA drives on a SAS expander. I only have SAS disks here.
Nevertheless, this check itself is still outstanding; I will do it ASAP...

gea said:
One idea would be to use a new boot disk, install OmniOS and check the disk and pool state (via format and zpool status). It's not very probable, but the OS may be corrupted.

If you have a different case/JBOD with enough empty bays, I would move all disks to that second system. Your disk behaviour is so unsystematic that I would suspect either an HBA/expander problem or not just one but two overlapping problems in your server. If the pool is OK there, then one or probably several parts of the server are damaged.

Completely agree, and I have had this issue: an fmd fault after my update from OmniOSce-r151026 to OmniOSce-r151028. I'm currently not sure whether it has further side effects regarding the SAS HBA. What do you think?

One of my tests yesterday was to go back to the last OmniOSce-r151026 boot environment, but that didn't change the state of the data pool (tank1).

gea · Feb 27, 2019

Regarding the fmd problem in Topicbox

Have you tried the patch:
pkg apply-hot-fix https://hf.omnios.org/r151028/10187_fmd.p5p

(not sure if this affects you)

Bronko · Feb 27, 2019

Yes, for sure, since it's "my" thread on Topicbox, and fmd is up again on both machines.

Bronko · Mar 7, 2019

Finally, the problem was solved by replacing all 11 faulted drives myself with new ones under warranty. I can't believe it...

Has anyone else seen a similar failure rate, 11 of 24 drives after three years in use?
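To compare with other reports, those numbers can be turned into a rough annualized failure rate (failures per drive-year). This is a back-of-the-envelope sketch that assumes all 24 drives ran for the full three years; drives that died earlier would push the real rate somewhat higher. The `afr` helper is hypothetical.

```shell
#!/bin/sh
# Back-of-the-envelope failure rates.
# Assumption: every drive was in service for the whole period.
afr() {
    # $1 = failed drives, $2 = total drives, $3 = years in service
    awk -v f="$1" -v n="$2" -v y="$3" 'BEGIN {
        # Share of drives that failed over the whole period.
        printf "cumulative: %.1f%%\n", 100 * f / n
        # Failures divided by drive-years of operation.
        printf "annualized: %.1f%% per drive-year\n", 100 * f / (n * y)
    }'
}

afr 11 24 3
```

For 11 of 24 drives over three years this prints a cumulative rate of 45.8% and roughly 15.3% per drive-year.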

Pool state (new and old disks mixed per mirror vdev):

Code:

  pool: tank1
 state: ONLINE
  scan: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        tank1                      ONLINE       0     0     0
          mirror-0                 ONLINE       0     0     0
            c2t5000CCA23B0D18F9d0  ONLINE       0     0     0
            c2t5000CCA2676CFAF9d0  ONLINE       0     0     0
          mirror-1                 ONLINE       0     0     0
            c2t5000CCA23B0CDAE9d0  ONLINE       0     0     0
            c2t5000CCA2674A826Dd0  ONLINE       0     0     0
          mirror-2                 ONLINE       0     0     0
            c2t5000CCA23B0CEF7Dd0  ONLINE       0     0     0
            c2t5000CCA26767B28Dd0  ONLINE       0     0     0
          mirror-3                 ONLINE       0     0     0
            c2t5000CCA23B0C9CD5d0  ONLINE       0     0     0
            c2t5000CCA2674D9035d0  ONLINE       0     0     0
          mirror-4                 ONLINE       0     0     0
            c2t5000CCA23B07B701d0  ONLINE       0     0     0
            c2t5000CCA2674A8D3Dd0  ONLINE       0     0     0
          mirror-5                 ONLINE       0     0     0
            c2t5000CCA23B0BFDA9d0  ONLINE       0     0     0
            c2t5000CCA2674938CDd0  ONLINE       0     0     0
          mirror-6                 ONLINE       0     0     0
            c2t5000CCA23B0BFCA1d0  ONLINE       0     0     0
            c2t5000CCA2674908F9d0  ONLINE       0     0     0
          mirror-7                 ONLINE       0     0     0
            c2t5000CCA23B0BDA41d0  ONLINE       0     0     0
            c2t5000CCA2674B31D9d0  ONLINE       0     0     0
          mirror-8                 ONLINE       0     0     0
            c2t5000CCA23B0CE7A9d0  ONLINE       0     0     0
            c2t5000CCA257925BE1d0  ONLINE       0     0     0
          mirror-9                 ONLINE       0     0     0
            c2t5000CCA23B0B9121d0  ONLINE       0     0     0
            c2t5000CCA257936939d0  ONLINE       0     0     0
          mirror-10                ONLINE       0     0     0
            c2t5000CCA23B0C9BD5d0  ONLINE       0     0     0
            c2t5000CCA267407661d0  ONLINE       0     0     0
          mirror-11                ONLINE       0     0     0
            c2t5000CCA23B0CE5B9d0  ONLINE       0     0     0
            c2t5000CCA2674D78F1d0  ONLINE       0     0     0
          mirror-12                ONLINE       0     0     0
            c2t5000CCA26742E4CDd0  ONLINE       0     0     0
            c2t5000CCA267478921d0  ONLINE       0     0     0
          mirror-13                ONLINE       0     0     0
            c2t5000CCA26746049Dd0  ONLINE       0     0     0
            c2t5000CCA2674CDF71d0  ONLINE       0     0     0
          mirror-14                ONLINE       0     0     0
            c2t5000CCA2673E3DD9d0  ONLINE       0     0     0
            c2t5000CCA267460489d0  ONLINE       0     0     0
          mirror-15                ONLINE       0     0     0
            c2t5000CCA2674488F1d0  ONLINE       0     0     0
            c2t5000CCA26744F179d0  ONLINE       0     0     0
        logs
          mirror-16                ONLINE       0     0     0
            c1t5002538C401C745Fd0  ONLINE       0     0     0
            c1t5002538C401C7462d0  ONLINE       0     0     0
        cache
          c3t1d0                   ONLINE       0     0     0
        spares
          c2t5000CCA2676BC9F5d0    AVAIL  
          c2t5000CCA2676CF405d0    AVAIL  
          c2t5000CCA23B0CF6D9d0    AVAIL  

errors: No known data errors

Backup replication is in progress... thanks to napp-it!

Thanks for all your hints and thoughts.

gea · Mar 8, 2019

Bronko said:
Finally, the problem was solved by replacing all 11 faulted drives myself with new ones under warranty. I can't believe it...

Has anyone else seen a similar failure rate, 11 of 24 drives after three years in use?

I was affected by a series of 3TB Seagate disks a few years ago, where I nearly lost a Z3 backup pool. After around two to three years they died like flies. There was a rumour that this was due to the air filter letting dust inside after that time.

Secondly, I have a customer whose server room cooling failed while alerting was disabled, and who found a pool state similar to yours after the weekend.
