Pool Degraded - Help!

Discussion in 'Solaris, Nexenta, OpenIndiana, and napp-it' started by ZzBloopzZ, May 25, 2019.

  1. ZzBloopzZ

    ZzBloopzZ Member

    Joined:
    Jan 7, 2013
    Messages:
    89
    Likes Received:
    13
    Hello,

I noticed my file server acting really slow the last few weeks but was too busy to investigate until now. Napp-it would not even be able to retrieve anything from the Disks or Pools sections; it would keep loading and loading, even an hour later. Finally, I remembered to SSH into OmniOS directly and type "zpool status". I have attached my results. I'm running OmniOS v11 r151022, and the 10x 3TB drives are connected to two LSI 9211-8i HBA cards.

I have the pool set up with 2 spares. Now I am worried: from the screenshot, does it seem 4 drives are bad, or just the 2 faulted ones? What exactly should I do next? Replace all 4 drives, or just the 2 faulted or degraded ones? Right now I'm thinking of backing up the important data to a few spare USB external drives I have and then focusing on troubleshooting/fixing the pool. I'd appreciate any help!

    Thank you!
     

    Attached Files:

    #1
    Last edited: May 25, 2019
  2. redeamon

    redeamon Member

    Joined:
    Jun 10, 2018
    Messages:
    37
    Likes Received:
    2
Hey there, we need more info to help. What type and size of drives, etc.? A full "zpool status -v" output would also help. Have you run any scrubs?
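That is, from the OmniOS shell (with -v, zpool also lists any files affected by permanent errors):
Code:
zpool status -v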
     
    #2
  3. gea

    gea Well-Known Member

    Joined:
    Dec 31, 2010
    Messages:
    2,096
    Likes Received:
    674
It is quite unlikely that three disks fail all together. I have only seen two cases where this happened. One was with Seagate 3TB disks that died like flies after three years. The other was a server room where the air conditioning failed over a weekend and some disks died from overtemperature.

At the moment you have three damaged files (unrepairable, with checksum errors) and two faulted disks. If those disks are really dead, the next disk failure means the pool is lost. If that happens and a disk comes back, the pool reverts to degraded or online.

    What I would do:
Check disk temperature and SMART status (menu Disks) and back up the most important data. Then power off and check the cables; maybe a bad disk or power connector is the reason. Then power on and do a Menu Pools > Clear to clear the errors.

To completely clear the error, you must delete the damaged files and start a scrub. If the disks come back and the reason remains unclear, do an intensive disk test with SeaTools, WD Data Lifeguard, or a similar tool.
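For reference, the CLI equivalents of those menu actions (a sketch; substitute your own pool name):
Code:
# reset the error counters after checking cables/power
zpool clear <poolname>

# re-read and verify every block to confirm the pool is clean
zpool scrub <poolname>

# check scrub progress and the remaining error list
zpool status -v <poolname>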
     
    #3
    Last edited: May 25, 2019
  4. ZzBloopzZ

    ZzBloopzZ Member

    Joined:
    Jan 7, 2013
    Messages:
    89
    Likes Received:
    13
Firstly, thank you everyone for the support. I ended up deleting the three corrupt files as they were not important. Then I updated napp-it, and finally the menus were working again. I identified two drives that seemed to have many errors under the Disks menu, so I replaced those two with the last 2 spares I had. I also blew the dust out of the server and unplugged and firmly re-seated all power and SATA/HBA cable connections.

I then cleared the errors with "zpool clear tank" and replaced the two drives in napp-it, so the pool is now resilvering. After replacing the two drives, I am finally able to run SmartInfo in napp-it, which I was unable to do before. It shows "!Failed" for two drives. Does that mean they are bad as well? Strangely, one drive has many errors but its SMART passed, while the two that failed do not show any errors under S, H, and T. SMART results are attached.

[Attached: FS Smart.PNG]
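For anyone following along on the CLI, the napp-it disk replace corresponds to something like this (old/new device names taken from the status output below):
Code:
zpool replace pool30tb c3t5000CCA37EC3F74Ed0 c3t5000039FF4E7B7E5d0
zpool replace pool30tb c3t5000CCA37EC21035d0 c3t5000039FF4E7B791d0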

Also, one of the drives I replaced had hundreds of errors under H and T, while the other had 50-60 H and a few T. The pool is currently resilvering:

    Code:
    pool: pool30tb
     state: DEGRADED
    status: One or more devices is currently being resilvered.  The pool will
            continue to function, possibly in a degraded state.
    action: Wait for the resilver to complete.
      scan: resilver in progress since Sat May 25 21:01:36 2019
        131G scanned out of 22.6T at 100M/s, 65h12m to go
        24.9G resilvered, 0.57% done
    config:
    
            NAME                         STATE     READ WRITE CKSUM
            pool30tb                     DEGRADED     0     0     0
              raidz2-0                   DEGRADED     0     0     0
                c3t5000CCA37EC13F7Cd0    ONLINE       0     0     0
                c3t5000CCA37EC1C4B1d0    ONLINE       0     0     0
                c3t5000CCA37EC1C4E4d0    ONLINE       0     0     0
                c3t5000CCA37EC1CD01d0    ONLINE       0     0     2
                c3t5000CCA37EC1EB05d0    ONLINE       0     0     0
                c3t5000CCA37EC1ED1Cd0    ONLINE       0     0     0
                replacing-6              UNAVAIL      0     0     0
                  c3t5000CCA37EC3F74Ed0  UNAVAIL      0     0     0  cannot open
                  c3t5000039FF4E7B7E5d0  ONLINE       0     0     0  (resilvering)
                c3t5000039FF4E7BF5Fd0    ONLINE       0     0     0
                replacing-8              UNAVAIL      0     0     0
                  c3t5000CCA37EC21035d0  UNAVAIL      0     0     0  cannot open
                  c3t5000039FF4E7B791d0  ONLINE       0     0     0  (resilvering)
                c3t5000CCA37EC2292Bd0    ONLINE       0     0     0
    
    errors: No known data errors
    
      pool: rpool
     state: ONLINE
      scan: none requested
    config:
    
            NAME        STATE     READ WRITE CKSUM
            rpool       ONLINE       0     0     0
              c2t0d0s0  ONLINE       0     0     0
    
    errors: No known data errors
Although the system is still resilvering, it is already responding much quicker. I know for a fact that one of the drives I pulled was bad. I am going to connect them to my computer and run advanced HD diagnostics on them to verify whether they are defective or not.
     
    #4
    Last edited: May 25, 2019
  5. gea

    gea Well-Known Member

    Joined:
    Dec 31, 2010
    Messages:
    2,096
    Likes Received:
    674
If you click on the serial number in menu Disks > Smart, you can see the detailed SMART log. If a disk reports a SMART failure, I would replace it and do at least an intensive disk test (WD Data Lifeguard or similar).
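If you prefer the shell and smartmontools is installed (napp-it typically uses it for SmartInfo), the same log can be read directly; the device path here is just an example taken from your status output:
Code:
# full SMART report for a single disk
smartctl -a /dev/rdsk/c3t5000CCA37EC1CD01d0

# kick off a long (extended) self-test, then re-check the -a output later
smartctl -t long /dev/rdsk/c3t5000CCA37EC1CD01d0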
     
    #5
  6. ZzBloopzZ

    ZzBloopzZ Member

    Joined:
    Jan 7, 2013
    Messages:
    89
    Likes Received:
    13
Update: the drives are still resilvering, and I have been monitoring them periodically throughout the week. As of this morning, it is showing most of the drives as 'DEGRADED'. I do know for a fact that the two drives I originally pulled and replaced with brand-new ones are indeed bad, as I ran SMART on them in another machine with WD Data Lifeguard Diagnostics; they fail quickly on both the short and extended tests.

    Here is the current zpool status:

Also, I have attached napp-it Disks and SmartInfo screenshots. There are tons of errors on two drives, but per SMART only one of those error drives is flagged as bad, while the other drive SMART flags as bad shows no errors at all.

I'm worried. At this point I guess I have to wait for the two newest drives I installed to finish resilvering, then plan to replace two more drives with new ones. What a nightmare... it looks like a perfect storm of all these drives failing at once, granted they are old. The server was still running perfectly quickly six weeks ago. It's not even used much, and it sits in a cool basement with plenty of airflow.

    Should I just wait patiently or is there anything else I should do?
     

    Attached Files:

    #6
  7. gea

    gea Well-Known Member

    Joined:
    Dec 31, 2010
    Messages:
    2,096
    Likes Received:
    674
If disks are known to be bad, fixing the problem is easy: just replace them. If the disk state is unclear, the cause can be a bad disk, a repairable disk with bad sectors, the backplane, power, or an expander with SATA disks, where a single bad disk can block or confuse the expander.

The best method to be sure is an external test, e.g. via WD Data Lifeguard (full test). SMART is a good indicator when it reports a failure. If you use non-destructive tests, you may shut down the server and test the disks prior to resilvering. Such a test may also repair or remap bad sectors, which may help a RAID resilver.
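If you want to test a suspect disk while keeping the pool intact, a rough sketch of the offline/online dance (device name reused from the earlier status output):
Code:
# take the suspect disk offline before pulling it for external testing
zpool offline pool30tb c3t5000CCA37EC1CD01d0

# after a non-destructive test, put it back; ZFS only resilvers the changed blocks
zpool online pool30tb c3t5000CCA37EC1CD01d0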
     
    #7
  8. ZzBloopzZ

    ZzBloopzZ Member

    Joined:
    Jan 7, 2013
    Messages:
    89
    Likes Received:
    13
    Is there a command to save all directory and file names in the pool to a text file? I googled around and could not find a solution. :c/
     
    #8
  9. pricklypunter

    pricklypunter Well-Known Member

    Joined:
    Nov 10, 2015
    Messages:
    1,453
    Likes Received:
    413
I don't know Solaris/OmniOS all that well, but maybe "ls -R > your.txt", or install tree if that's available for that OS, and use that to format the output how you like?
     
    #9
  10. EffrafaxOfWug

    EffrafaxOfWug Radioactive Member

    Joined:
    Feb 12, 2015
    Messages:
    939
    Likes Received:
    320
One very simple way of doing that without installing tree would be to run something like:
Code:
find / > /somedir/all_my_files_and_dirs.txt
I'm not 100% sure what Solaris' find options are like compared to the GNU find I'm used to, but I'm fairly certain the above should work anywhere.
     
    #10
  11. gea

    gea Well-Known Member

    Joined:
    Dec 31, 2010
    Messages:
    2,096
    Likes Received:
    674
You can use different find implementations on OmniOS:
Code:
find / -name find

/usr/bin/find
/usr/xpg4/bin/find
/usr/gnu/bin/find
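To dump just the pool's file and directory names, you could point find at the pool's mount point (assuming the default mount point /pool30tb here; adjust to yours):
Code:
/usr/gnu/bin/find /pool30tb > /tmp/pool30tb_files.txt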
     
    #11