Help! ZFS resilver ended with "insufficient replicas"


el_pedr0

Member
Sep 6, 2016
Hi everyone,

I've just replaced a drive in my ZFS pool to add storage capacity. But at the end of the resilver process the status is:
Code:
root@toast:~# zpool status bodpool
  pool: bodpool
 state: UNAVAIL
status: One or more devices could not be used because the label is missing
        or invalid.  There are insufficient replicas for the pool to continue
        functioning.
action: Destroy and re-create the pool from
        a backup source.
   see: http://zfsonlinux.org/msg/ZFS-8000-5E
  scan: resilvered 351G in 1h41m with 3248169 errors on Fri Mar 17 00:43:30 2017
config:

        NAME             STATE     READ WRITE CKSUM
        bodpool          UNAVAIL     65     0     0  insufficient replicas
          mirror-0       ONLINE       0     0     0
            sdc          ONLINE       0     0     0
            sdd          ONLINE       0     0     0
          mirror-1       UNAVAIL    128     0     6  insufficient replicas
            replacing-0  OFFLINE      0     0     0
              old        OFFLINE      0     0     0
              sde        ONLINE       0     0     0
            sdf          UNAVAIL      0     0     0

errors: 3248169 data errors, use '-v' for a list

I was replacing a 2TB drive with a 4TB drive in a mirror. First I ran:
Code:
zpool offline bodpool sde
Then I shut down the machine, swapped the old drive for the new one, booted back up, and ran:
Code:
zpool replace -f bodpool sde
Then the resilvering process began. But when I checked again at the end of the process I got the errors above. I've still got the old 2TB disk - it wasn't damaged or anything, I was only replacing it with a bigger drive (the other drive in the mirror was already 4TB).

Do you have any advice on how I can fix this situation?
 

gea

Well-Known Member
Dec 31, 2010
Your mirror-1 consists of the old, now unavailable disk (former sde) and the unavailable sdf.
This means that both disks in this mirror are missing, so the pool is lost.

Whenever either the old sde or the sdf comes back, you can access the pool again,
so check the cabling of sdf and reinsert the old disk.

btw:
It is a bad idea to
- use controller/port-based disk detection; a unique disk assignment like WWN is better
- remove a disk before replacing it, as this reduces redundancy; always add the new disk first and then start a disk > replace (sketch below)
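For illustration, a replacement done that way might look roughly like this; the wwn-... device names are placeholders, not the actual disks in this thread:
Code:
# identify disks by a persistent name (WWN / by-id) rather than sdX
ls -l /dev/disk/by-id/

# with the new disk connected alongside the old one, replace old -> new;
# the mirror keeps its redundancy during the whole resilver
zpool replace bodpool wwn-0xOLDDISK wwn-0xNEWDISK

# watch the resilver progress
zpool status bodpool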
 

el_pedr0

Member
Sep 6, 2016
Thanks @gea. You've given me hope. It's only my music collection, so not mission critical. But my soul is crying at the prospect of ripping all those CDs again :eek:.

Regarding disk detection: my server is a Proxmox hypervisor, so I use the ZFS baked into Proxmox. I tried to research unique assignment when I set it up in the first place, but it seemed as though it wasn't possible under Proxmox.
(I think that was my interpretation of posts like this and this, but even now I don't really understand it.)
I might have misunderstood, though, so any pointers as to how I can use unique assignment under ZFS and Proxmox would be gratefully received.
 

vl1969

Active Member
Feb 5, 2014
The only way I have found to do "unique assignment" in Proxmox ZFS, and on other ZFS on Linux setups, is to build the zpool first using the /dev/sd(a-z) names,
then export it with "zpool export <zpool name>"
and then import it with "zpool import -d /dev/disk/by-id <zpool name>" (sketch below).

As per some posts on the Ubuntu forums, sometimes you might need to use the pool's numeric ID instead of the name, but I'm not sure why that would be needed. Something to do with ghost pools?!

FYI, this only works on data pools that you can unmount; it will not work on the OS pool (in Proxmox that would be "rpool"), as that one cannot be unmounted/exported.
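A minimal sketch of that sequence, using a hypothetical data pool named "tank":
Code:
# pool was originally created with /dev/sdX names
zpool export tank

# re-import it using the persistent by-id device names
zpool import -d /dev/disk/by-id tank

# verify that the vdevs now show by-id names
zpool status tank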
 

ttabbal

Active Member
Mar 10, 2016
If you still have the old drive, you can probably put it back in and import the pool.

I don't even do a live replacement. I add the new drive, making a 3-way mirror. When it's done and a scrub comes back clean, I then remove the old drive.
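That approach would look something like this; the by-id names are placeholders, not drives from this thread:
Code:
# attach the new disk alongside an existing member -> temporary 3-way mirror
zpool attach bodpool /dev/disk/by-id/wwn-0xOLDDISK /dev/disk/by-id/wwn-0xNEWDISK

# wait for the resilver to finish, then scrub and confirm it comes back clean
zpool scrub bodpool
zpool status bodpool

# only then remove the old disk from the mirror
zpool detach bodpool /dev/disk/by-id/wwn-0xOLDDISK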

I also insist on beating the crap out of new drives. They get badblocks, SMART long tests, and a fill-up with data under ZFS, then a few scrubs. Errors in any of that mean the drive fails and gets returned. It takes a couple of days to test them, but it's worth it not to have one die early on me. There's no guarantee of course, but I've caught a few with errors that only showed up after full testing.
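A rough burn-in along those lines might be the following (destructive: only run it on a drive with no data you care about; /dev/sdX is a placeholder):
Code:
# destructive write/read test of every block on the drive
badblocks -wsv /dev/sdX

# SMART extended self-test, then review the results once it finishes
smartctl -t long /dev/sdX
smartctl -a /dev/sdX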
 

el_pedr0

Member
Sep 6, 2016
Phew! It turns out one of my SATA cables got pinched in the disk-swap process. I replaced the cable, resilvering resumed, and now it's completed with my pools all nice and tidy (though I've got to look up what the 3 in the CKSUM column means). Thanks all!

Code:
root@toast:~# zpool status bodpool
  pool: bodpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: resilvered 747G in 2h2m with 0 errors on Fri Mar 17 22:18:51 2017
config:

        NAME        STATE     READ WRITE CKSUM
        bodpool     ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     3

errors: No known data errors
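For what it's worth, a non-zero CKSUM count just means ZFS detected that many checksum errors on the device and, in a mirror, repaired them from the other copy. Once the cause is fixed, the counters can be reset and the pool re-verified, as the status output suggests:
Code:
zpool clear bodpool
zpool scrub bodpool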
P.S. @vl1969 Did you go for ZFS over BTRFS in the end then?
 

Kybber

Active Member
May 27, 2016
Glad to see it worked out for you, @el_pedr0 :)

I might have misunderstood, though, so any pointers as to how I can use unique assignment under ZFS and Proxmox would be gratefully received.
The only way I have found to do "unique assignment" in Proxmox ZFS, and on other ZFS on Linux setups, is to build the zpool first using the /dev/sd(a-z) names,
then export it with "zpool export <zpool name>"
and then import it with "zpool import -d /dev/disk/by-id <zpool name>"
I did this in Proxmox:
Code:
zpool create tank mirror scsi-35000c50056ff7a9b scsi-35000c500575edfd7 mirror scsi-35000c500581c2e6f scsi-35000c5005719dabb
Works perfectly fine and persists through reboots.
 

vl1969

Active Member
Feb 5, 2014
Phew! It turns out one of my SATA cables got pinched in the disk-swap process. I replaced the cable, resilvering resumed, and now it's completed with my pools all nice and tidy (though I've got to look up what the 3 in the CKSUM column means). Thanks all!

Code:
root@toast:~# zpool status bodpool
  pool: bodpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: resilvered 747G in 2h2m with 0 errors on Fri Mar 17 22:18:51 2017
config:

        NAME        STATE     READ WRITE CKSUM
        bodpool     ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     3

errors: No known data errors
P.S. @vl1969 Did you go for ZFS over BTRFS in the end then?
Haven't rebuilt my real server yet.
Still researching and playing in a VM setup at the moment.
In my case, though, it really makes no difference: I will have to manage all the data space manually anyway, and either pass the data pool through to an OMV or FreeNAS VM, or run an NFS server on Proxmox and share the pool via NFS to everything. All CLI and manual anyhow.
The best option I'm thinking of to make it easier is to install Webmin alongside Proxmox and do some of the management via Webmin.

 

cperalt1

Active Member
Feb 23, 2015
For ZFS on Linux, the way I create a pool so the mapping persists through device-name changes is to create a GPT partition on the new drive, then find the ID to use when creating the pool by doing an ls -alh in the /dev/disk/by-id directory:
Code:
ata-INTEL_SSDSC2BA100G3_AAAA999999A0100ZZZ -> ../../sdg
zpool create -f -o ashift=12 -O casesensitivity=insensitive -O normalization=formD -O atime=off -O compression=lz4 tank ata-INTEL_SSDSC2BA100G3_AAAA999999A0100ZZZ
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
For ZFS on Linux, the way I create a pool so the mapping persists through device-name changes is to create a GPT partition on the new drive, then find the ID to use when creating the pool by doing an ls -alh in the /dev/disk/by-id directory:
Code:
ata-INTEL_SSDSC2BA100G3_AAAA999999A0100ZZZ -> ../../sdg
zpool create -f -o ashift=12 -O casesensitivity=insensitive -O normalization=formD -O atime=off -O compression=lz4 tank ata-INTEL_SSDSC2BA100G3_AAAA999999A0100ZZZ
Cool tip.

I know I'm not so 'good' about using IDs myself; I need to step up that game.

I wouldn't mind seeing a 'guide' for how to do this in ZFS on OmniOS and/or FreeNAS as well as Linux, and the different ways (like here) to get this data.

This seems like one of those areas that isn't hard, but if you've done it for 5+ years or had to replace a dozen+ drives, you probably know a thing or two that those of us with a handful of drives or only a couple of years of experience don't.

:)