ZOL - refresh pool status when a disk is physically missing


lenainjaune

New Member
Sep 21, 2022
Hi :)

We have a ZFS pool of 2 disks in a mirror, mounted in slots protected by doors, so that we can pull one disk out if a disaster strikes. We externalise one disk a month, and to simplify the replacement process we are trying to write an automated script. We are stuck on one point: refreshing the pool status. Suppose the pool is operational and the script offlines the right disk in software and then asks us to remove it physically, but we pull the wrong one. At that point ZFS knows the first disk is gone BUT still believes the second one is present, so the pool is completely out of order. We know that a reboot repairs the problem, since ZFS auto-imports the pool at boot, but how can we repair it without rebooting? Also, while ZFS believes the disk is still there, we cannot import. Hence this post ...
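To make the intended procedure concrete, the monthly rotation we are trying to script is roughly the following sketch (pool name and by-path device names are the ones from our setup; this is only the idea, not the final script):
Code:
# take the disk that leaves the building this month out of the pool
zpool offline pool_bkp pci-0000:00:1f.2-ata-2

# the operator then pulls that disk from its slot
# (this is the step where the wrong disk can be pulled by mistake)

# the previously externalised disk goes back into the slot, then:
zpool online pool_bkp pci-0000:00:1f.2-ata-2    # same disk coming back, ZFS resilvers the delta
# or, if a brand new disk is inserted instead:
# zpool replace pool_bkp pci-0000:00:1f.2-ata-2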

Thanks in advance for the time you will spend on this ;)
Fraternally,
lnj
 

ttabbal

Active Member
Mar 10, 2016
Since they say mirror, and only 2 disks, it can't be raidz. It has to be a mirror.

So it sounds like they are removing a drive using the ZFS CLI. Something like "zpool offline" perhaps? Then physically removed the other drive. I'm surprised it didn't throw an error and take the pool offline. Surprised enough that I would suspect a bug in the ZOL code. I would suggest asking them about it. I use it, but I've never run into something like this. The only times I remove drives is to fix problems.

One thing that might work is "zpool export" followed by "zpool import -a". I think that's what it does at boot. This will take the pool offline for a short time, so it isn't a lot better than a reboot, but it should be faster at least. And if both drives are offline, the pool is down anyway, so I guess it's not losing much to try.
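Something like this, substituting your actual pool name (untested against a pool stuck in this state):
Code:
# export the pool, then rescan and import all pools found
zpool export mypool
zpool import -a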
 

lenainjaune

New Member
Sep 21, 2022
Thank you for your quick reply, I really love this forum ;)!

Can you clarify the pool layout? A RAID-Z can't be done with 2 disks.
Since they say mirror, and only 2 disks, it can't be raidz. It has to be a mirror.
Yes, RAID-Z was a mistake! This is RAID1, 2 disks in a mirror, not RAID-Z (I had memorized, without verifying it, that a RAID built with ZFS was a "ZFS RAID", shortened to RAID-Z, but I have now re-discovered that RAID-Z is actually a variation of RAID5).

So it sounds like they are removing a drive using the ZFS CLI. Something like "zpool offline" perhaps? Then physically removed the other drive.
This is exactly what we did.

I'm surprised it didn't throw an error and take the pool offline. Surprised enough that I would suspect a bug in the ZOL code.
Me too !

One thing that might work is "zpool export" followed by "zpool import -a". I think that's what it does at boot. This will take the pool offline for a short time, so it isn't a lot better than a reboot, but it should be faster at least. And if both drives are offline, the pool is down anyway, so I guess it's not losing much to try.
All right! I will try exporting and importing this way. It seems I did not try that before ...

I would suggest asking them about it.
OK
 

lenainjaune

New Member
Sep 21, 2022
After offlining a disk in software (zpool offline), I physically ejected the wrong disk, then I tried:
Code:
zpool export pool_bkp
The command never returned and showed no sign of activity (12 minutes later it was still running). From a second terminal, when I tried to check the actual pool status (zpool status), that command hung as well. I had to reboot twice to bring the pool back to the ONLINE state (the first time to re-import the disk and get the pool back in a DEGRADED state, the second because when I tried to bring the other disk ONLINE the operation ended in a faulted state with the disk shown as UNAVAIL).

I wonder whether something could be done with the various ZFS services (as indicated here). What do you think?
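For reference, this is how I list them on my Debian box (I am not sure which of these units, if any, could help here):
Code:
# list the ZFS-related systemd units (typically zfs-import-cache.service,
# zfs-mount.service, zfs-zed.service, zfs.target, ...)
systemctl list-units --all 'zfs*'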

Also, is there a debug command to see what is blocking the export process?
 

thulle

Member
Apr 11, 2019
So both active members of the pool are gone? That might be a state that recovery isn't tested for. You might have to file a bug for that.

Also, is there a debug command to see what is blocking the export process?
If you check the kernel log there should be a hung-task call trace, something like:

kernel: INFO: task kworker/u66:2:2121 blocked for more than 122 seconds.
kernel: Tainted: P W OE T 6.7.6-gentoo-x86_64 #2
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: task:kworker/u66:2 state:D stack:0 pid:2121 tgid:2121 ppid:2 flags:0x00004000
kernel: Workqueue: writeback wb_workfn (flush-zfs-5)
kernel: Call Trace:
kernel: <TASK>
kernel: __schedule+0x277/0x680
kernel: schedule+0x31/0xe0
kernel: __cv_broadcast+0x154/0x190 [spl]
...and so on
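On a systemd/journald system you should be able to pull those traces out with something along these lines (the string is what the kernel prints; the grep context values are just a guess at how long the trace is):
Code:
# show kernel messages around any hung-task report
journalctl -k | grep -B 2 -A 25 'blocked for more than'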


One partial solution might be to use vdev_id.conf to create aliases for the members of the pool so that zpool status shows

Code:
    NAME                                            STATE     READ WRITE CKSUM
    pool                                            ONLINE       0     0     0
      mirror-0                                      ONLINE       0     0     0
        left_drive                                  ONLINE       0     0     0
        right_drive                                 ONLINE       0     0     0
Or upper/lower, red/blue - marking the caddies with different colors - or something like that. It would lower the risk.
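The config for that would be roughly like this in /etc/zfs/vdev_id.conf (the by-path names here are placeholders, use whatever your two slots really are):
Code:
# /etc/zfs/vdev_id.conf - map physical slots to human-friendly names
alias left_drive   /dev/disk/by-path/pci-0000:00:1f.2-ata-1
alias right_drive  /dev/disk/by-path/pci-0000:00:1f.2-ata-2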

One solution might be to make it into a 3-way mirror, that way you can always plug the incorrectly removed drive back in.
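Attaching a third disk to the existing mirror is a one-liner, something like this (the ata-3 path is made up, use whatever the third slot actually is):
Code:
# turn the 2-way mirror into a 3-way mirror; ZFS resilvers onto the new disk
zpool attach pool_bkp pci-0000:00:1f.2-ata-1 /dev/disk/by-path/pci-0000:00:1f.2-ata-3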
 

lenainjaune

New Member
Sep 21, 2022
Hi, and sorry for the delay, I did not find a moment to work on this project this week.

Then ...

If you check the kernel log there should be a hung-task call trace, something like:

kernel: INFO: task kworker/u66:2:2121 blocked for more than 122 seconds.
kernel: Tainted: P W OE T 6.7.6-gentoo-x86_64 #2
...
I tried the export again with the mistakenly removed disk still physically absent, to trace what happens.

After a loooong while, the log started to populate (in my case I had mistakenly removed the ATA1 disk):
Code:
root@HOST:~# journalctl -kf
...
mars 08 14:44:34 HOST kernel: ata1.01: failed to resume link (SControl 0)
mars 08 14:44:34 HOST kernel: ata1.00: SATA link down (SStatus 0 SControl 330)
mars 08 14:44:34 HOST kernel: ata1.01: SATA link down (SStatus 0 SControl 0)
mars 08 14:44:40 HOST kernel: ata1.01: failed to resume link (SControl 0)
mars 08 14:44:40 HOST kernel: ata1.00: SATA link down (SStatus 0 SControl 330)
mars 08 14:44:40 HOST kernel: ata1.01: SATA link down (SStatus 0 SControl 0)
mars 08 14:44:47 HOST kernel: ata1.01: failed to resume link (SControl 0)
mars 08 14:44:47 HOST kernel: ata1.00: SATA link down (SStatus 0 SControl 330)
mars 08 14:44:47 HOST kernel: ata1.01: SATA link down (SStatus 0 SControl 0)
mars 08 14:44:47 HOST kernel: ata1.00: disabled
mars 08 14:44:47 HOST kernel: sd 0:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=46s
mars 08 14:44:47 HOST kernel: sd 0:0:0:0: [sda] tag#0 Sense Key : Illegal Request [current]
mars 08 14:44:47 HOST kernel: sd 0:0:0:0: [sda] tag#0 Add. Sense: Unaligned write command
mars 08 14:44:47 HOST kernel: sd 0:0:0:0: [sda] tag#0 CDB: Write(16) 8a 00 00 00 00 00 33 71 9b 88 00 00 00 08 00 00
mars 08 14:44:47 HOST kernel: blk_update_request: I/O error, dev sda, sector 863083400 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
mars 08 14:44:47 HOST kernel: zio pool=pool_bkp vdev=/dev/disk/by-path/pci-0000:00:1f.2-ata-1-part1 error=5 type=2 offset=441897652224 size=4096 flags=180880
mars 08 14:44:47 HOST kernel: sd 0:0:0:0: rejecting I/O to offline device
mars 08 14:44:47 HOST kernel: blk_update_request: I/O error, dev sda, sector 883045136 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
mars 08 14:44:47 HOST kernel: zio pool=pool_bkp vdev=/dev/disk/by-path/pci-0000:00:1f.2-ata-1-part1 error=5 type=2 offset=452118061056 size=4096 flags=180880
mars 08 14:44:47 HOST kernel: blk_update_request: I/O error, dev sda, sector 917213312 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
mars 08 14:44:47 HOST kernel: ata1.00: detaching (SCSI 0:0:0:0)
mars 08 14:44:47 HOST kernel: zio pool=pool_bkp vdev=/dev/disk/by-path/pci-0000:00:1f.2-ata-1-part1 error=5 type=2 offset=469612167168 size=4096 flags=180880
mars 08 14:44:47 HOST kernel: blk_update_request: I/O error, dev sda, sector 2576 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
mars 08 14:44:47 HOST kernel: zio pool=pool_bkp vdev=/dev/disk/by-path/pci-0000:00:1f.2-ata-1-part1 error=5 type=1 offset=270336 size=8192 flags=b08c1
mars 08 14:44:47 HOST kernel: blk_update_request: I/O error, dev sda, sector 178301760 op 0x1:(WRITE) flags 0x700 phys_seg 2 prio class 0
mars 08 14:44:47 HOST kernel: zio pool=pool_bkp vdev=/dev/disk/by-path/pci-0000:00:1f.2-ata-1-part1 error=5 type=2 offset=91289452544 size=8192 flags=40080c80
mars 08 14:44:47 HOST kernel: blk_update_request: I/O error, dev sda, sector 7814016016 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
mars 08 14:44:47 HOST kernel: zio pool=pool_bkp vdev=/dev/disk/by-path/pci-0000:00:1f.2-ata-1-part1 error=5 type=1 offset=4000775151616 size=8192 flags=b08c1
mars 08 14:44:47 HOST kernel: blk_update_request: I/O error, dev sda, sector 7814016528 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
mars 08 14:44:47 HOST kernel: zio pool=pool_bkp vdev=/dev/disk/by-path/pci-0000:00:1f.2-ata-1-part1 error=5 type=1 offset=4000775413760 size=8192 flags=b08c1
mars 08 14:44:47 HOST kernel: blk_update_request: I/O error, dev sda, sector 255977720 op 0x1:(WRITE) flags 0x700 phys_seg 4 prio class 0
mars 08 14:44:47 HOST kernel: zio pool=pool_bkp vdev=/dev/disk/by-path/pci-0000:00:1f.2-ata-1-part1 error=5 type=2 offset=131059544064 size=16384 flags=40080c80
mars 08 14:44:47 HOST kernel: blk_update_request: I/O error, dev sda, sector 422839560 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
mars 08 14:44:47 HOST kernel: zio pool=pool_bkp vdev=/dev/disk/by-path/pci-0000:00:1f.2-ata-1-part1 error=5 type=2 offset=216492806144 size=4096 flags=180880
mars 08 14:44:47 HOST kernel: blk_update_request: I/O error, dev sda, sector 863083408 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
mars 08 14:44:47 HOST kernel: zio pool=pool_bkp vdev=/dev/disk/by-path/pci-0000:00:1f.2-ata-1-part1 error=5 type=2 offset=441897656320 size=4096 flags=180880
mars 08 14:44:47 HOST kernel: zio pool=pool_bkp vdev=/dev/disk/by-path/pci-0000:00:1f.2-ata-1-part1 error=5 type=2 offset=534857596928 size=4096 flags=180880
mars 08 14:44:47 HOST kernel: WARNING: Pool 'pool_bkp' has encountered an uncorrectable I/O failure and has been suspended.
mars 08 14:44:47 HOST kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
mars 08 14:44:47 HOST kernel: sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
mars 08 14:44:47 HOST kernel: sd 0:0:0:0: [sda] Stopping disk
mars 08 14:44:47 HOST kernel: sd 0:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
mars 08 14:46:14 HOST kernel: INFO: task txg_sync:666 blocked for more than 120 seconds.
mars 08 14:46:14 HOST kernel:       Tainted: P           OE     5.10.0-18-amd64 #1 Debian 5.10.140-1
mars 08 14:46:14 HOST kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mars 08 14:46:14 HOST kernel: task:txg_sync        state:D stack:    0 pid:  666 ppid:     2 flags:0x00004000
mars 08 14:46:14 HOST kernel: Call Trace:
mars 08 14:46:14 HOST kernel:  __schedule+0x282/0x880
mars 08 14:46:14 HOST kernel:  schedule+0x46/0xb0
mars 08 14:46:14 HOST kernel:  schedule_timeout+0x8b/0x150
mars 08 14:46:14 HOST kernel:  ? __next_timer_interrupt+0x110/0x110
mars 08 14:46:14 HOST kernel:  io_schedule_timeout+0x4c/0x80
mars 08 14:46:14 HOST kernel:  __cv_timedwait_common+0x12f/0x170 [spl]
mars 08 14:46:14 HOST kernel:  ? add_wait_queue_exclusive+0x70/0x70
mars 08 14:46:14 HOST kernel:  __cv_timedwait_io+0x15/0x20 [spl]
mars 08 14:46:14 HOST kernel:  zio_wait+0x129/0x2b0 [zfs]
mars 08 14:46:14 HOST kernel:  dsl_pool_sync+0x465/0x4f0 [zfs]
mars 08 14:46:14 HOST kernel:  spa_sync+0x575/0xfa0 [zfs]
mars 08 14:46:14 HOST kernel:  ? spa_txg_history_init_io+0x105/0x110 [zfs]
mars 08 14:46:14 HOST kernel:  txg_sync_thread+0x2e0/0x4a0 [zfs]
mars 08 14:46:14 HOST kernel:  ? txg_fini+0x240/0x240 [zfs]
mars 08 14:46:14 HOST kernel:  thread_generic_wrapper+0x6f/0x80 [spl]
mars 08 14:46:14 HOST kernel:  ? __thread_exit+0x20/0x20 [spl]
mars 08 14:46:15 HOST kernel:  kthread+0x11b/0x140
mars 08 14:46:15 HOST kernel:  ? __kthread_bind_mask+0x60/0x60
mars 08 14:46:15 HOST kernel:  ret_from_fork+0x22/0x30
mars 08 14:46:15 HOST kernel: INFO: task zpool:389016 blocked for more than 120 seconds.
mars 08 14:46:15 HOST kernel:       Tainted: P           OE     5.10.0-18-amd64 #1 Debian 5.10.140-1
mars 08 14:46:15 HOST kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mars 08 14:46:15 HOST kernel: task:zpool           state:D stack:    0 pid:389016 ppid:388326 flags:0x00004000
mars 08 14:46:15 HOST kernel: Call Trace:
mars 08 14:46:15 HOST kernel:  __schedule+0x282/0x880
mars 08 14:46:15 HOST kernel:  schedule+0x46/0xb0
mars 08 14:46:15 HOST kernel:  io_schedule+0x42/0x70
mars 08 14:46:15 HOST kernel:  cv_wait_common+0xac/0x130 [spl]
mars 08 14:46:15 HOST kernel:  ? add_wait_queue_exclusive+0x70/0x70
mars 08 14:46:15 HOST kernel:  txg_wait_synced_impl+0xcd/0x120 [zfs]
mars 08 14:46:15 HOST kernel:  txg_wait_synced+0xc/0x40 [zfs]
mars 08 14:46:15 HOST kernel:  spa_export_common+0x4d5/0x5a0 [zfs]
mars 08 14:46:15 HOST kernel:  ? zfs_log_history+0x9c/0xf0 [zfs]
mars 08 14:46:15 HOST kernel:  zfsdev_ioctl_common+0x69b/0x880 [zfs]
mars 08 14:46:15 HOST kernel:  ? _copy_from_user+0x28/0x60
mars 08 14:46:15 HOST kernel:  zfsdev_ioctl+0x53/0xe0 [zfs]
mars 08 14:46:15 HOST kernel:  __x64_sys_ioctl+0x8b/0xc0
mars 08 14:46:15 HOST kernel:  do_syscall_64+0x33/0x80
mars 08 14:46:15 HOST kernel:  entry_SYSCALL_64_after_hwframe+0x61/0xc6
mars 08 14:46:15 HOST kernel: RIP: 0033:0x7f56640396b7
mars 08 14:46:15 HOST kernel: RSP: 002b:00007ffc2bd9fdc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
mars 08 14:46:15 HOST kernel: RAX: ffffffffffffffda RBX: 00007ffc2bd9fde0 RCX: 00007f56640396b7
mars 08 14:46:15 HOST kernel: RDX: 00007ffc2bd9fde0 RSI: 0000000000005a03 RDI: 0000000000000003
mars 08 14:46:15 HOST kernel: RBP: 00007ffc2bda37d0 R08: 00000000ffffffff R09: 00007ffc2bd9fc60
mars 08 14:46:15 HOST kernel: R10: 00005579fa4098c0 R11: 0000000000000246 R12: 00005579fa4098b0
mars 08 14:46:15 HOST kernel: R13: 00005579fa4098c0 R14: 00007ffc2bda3390 R15: 00005579f8fb5c00
mars 08 14:52:17 HOST kernel: INFO: task txg_sync:666 blocked for more than 120 seconds.
mars 08 14:52:17 HOST kernel:       Tainted: P           OE     5.10.0-18-amd64 #1 Debian 5.10.140-1
mars 08 14:52:17 HOST kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mars 08 14:52:17 HOST kernel: task:txg_sync        state:D stack:    0 pid:  666 ppid:     2 flags:0x00004000
mars 08 14:52:17 HOST kernel: Call Trace:
mars 08 14:52:17 HOST kernel:  __schedule+0x282/0x880
mars 08 14:52:17 HOST kernel:  schedule+0x46/0xb0
mars 08 14:52:17 HOST kernel:  schedule_timeout+0x8b/0x150
mars 08 14:52:17 HOST kernel:  ? __wake_up_common_lock+0x8a/0xc0
mars 08 14:52:17 HOST kernel:  ? __next_timer_interrupt+0x110/0x110
mars 08 14:52:17 HOST kernel:  io_schedule_timeout+0x4c/0x80
mars 08 14:52:17 HOST kernel:  __cv_timedwait_common+0x12f/0x170 [spl]
mars 08 14:52:17 HOST kernel:  ? add_wait_queue_exclusive+0x70/0x70
mars 08 14:52:17 HOST kernel:  __cv_timedwait_io+0x15/0x20 [spl]
mars 08 14:52:17 HOST kernel:  zio_wait+0x129/0x2b0 [zfs]
mars 08 14:52:17 HOST kernel:  dsl_pool_sync+0x465/0x4f0 [zfs]
mars 08 14:52:17 HOST kernel:  spa_sync+0x575/0xfa0 [zfs]
mars 08 14:52:17 HOST kernel:  ? spa_txg_history_init_io+0x105/0x110 [zfs]
mars 08 14:52:17 HOST kernel:  txg_sync_thread+0x2e0/0x4a0 [zfs]
mars 08 14:52:17 HOST kernel:  ? txg_fini+0x240/0x240 [zfs]
mars 08 14:52:17 HOST kernel:  thread_generic_wrapper+0x6f/0x80 [spl]
mars 08 14:52:17 HOST kernel:  ? __thread_exit+0x20/0x20 [spl]
mars 08 14:52:17 HOST kernel:  kthread+0x11b/0x140
mars 08 14:52:17 HOST kernel:  ? __kthread_bind_mask+0x60/0x60
mars 08 14:52:17 HOST kernel:  ret_from_fork+0x22/0x30

... the export command never returns, but the log stops populating.

What should I do with this log? Can the problem be identified from it?

Furthermore, the disks are already identified in the pool by their paths:
Code:
root@HOST:~# lsblk -f | grep zfs_member
├─sda1 zfs_member 5000  pool_bkp 11234760596266434154                              
├─sdb1 zfs_member 5000  pool_bkp 11234760596266434154
root@HOST:~# ls -l /dev/disk/by-path/* | grep -E "sd(a|b)$"
lrwxrwxrwx 1 root root  9  8 mars  16:05 /dev/disk/by-path/pci-0000:00:1f.2-ata-1 -> ../../sda
lrwxrwxrwx 1 root root  9  8 mars  16:05 /dev/disk/by-path/pci-0000:00:1f.2-ata-1.0 -> ../../sda
lrwxrwxrwx 1 root root  9  8 mars  16:05 /dev/disk/by-path/pci-0000:00:1f.2-ata-2 -> ../../sdb
lrwxrwxrwx 1 root root  9  8 mars  16:05 /dev/disk/by-path/pci-0000:00:1f.2-ata-2.0 -> ../../sdb
root@HOST:~# zpool status
  pool: pool_bkp
state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
    still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(5) for details.
  scan: resilvered 628K in 00:00:01 with 0 errors on Fri Mar  8 16:06:46 2024
config:

    NAME                          STATE     READ WRITE CKSUM
    pool_bkp                      ONLINE       0     0     0
      mirror-0                    ONLINE       0     0     0
        pci-0000:00:1f.2-ata-1    ONLINE       0     0     0
        pci-0000:00:1f.2-ata-2.0  ONLINE       0     0     0

errors: No known data errors
Note: I do not know why the paths are named differently (one ends in .0 and the other does not).

Then I tried to configure vdev_id.conf to get shorter aliases, but without success. Here is my configuration file:
Code:
root@HOST:~# cat /etc/zfs/vdev_id.conf
#     by-vdev
#     name     fully qualified or base name of device link
alias ata2      /dev/disk/by-path/pci-0000:00:1f.2-ata-2.0
alias ata1      /dev/disk/by-path/pci-0000:00:1f.2-ata-1
In the past I tried to detach, labelclear, and re-attach the device under the desired path name, but without success. It seems tricky to rename a vdev.
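From what I read in vdev_id(8), the aliases should appear as links under /dev/disk/by-vdev once the udev rules have run, so I suppose the sequence to apply them would be something like this (not tested yet on my side, maybe I am missing a step):
Code:
# re-run the udev rules so the by-vdev links get (re)generated
udevadm trigger
ls -l /dev/disk/by-vdev/
# then re-import the pool using those names
zpool export pool_bkp
zpool import -d /dev/disk/by-vdev pool_bkp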
 

thulle

Member
Apr 11, 2019
What should I do with this log? Can the problem be identified from it?
The line
mars 08 14:46:14 HOST kernel: txg_sync_thread+0x2e0/0x4a0 [zfs]
suggests to my amateur reading that we're stuck waiting for a write to disk to complete, and since there's no active drive in the pool it can't complete.
I managed to find some slightly older roadmap PowerPoints from one of those OpenZFS conferences; recovery from all-vdevs-gone seemed to be planned for OpenZFS 3.0 - so it seems like this issue is something they're aware of, but it will be a while until they prioritize it.

In the past I tried to detach, labelclear, and re-attach the device under the desired path name, but without success. It seems tricky to rename a vdev.
IIRC you need to export the pool and import it with the -d flag specifying either a folder to look in for the devices, or the devices themselves if they're in a folder with other aliases for the devices.

i.e.
zpool import -d /dev/vdev_aliases/ pool

or, if the aliases are directly in /dev you might need
zpool import -d /dev/ata1 -d /dev/ata2 pool
 

lenainjaune

New Member
Sep 21, 2022
As someone suggested to me, I opened a thread on the ZOL mailing list in parallel to this thread.

The line
mars 08 14:46:14 HOST kernel: txg_sync_thread+0x2e0/0x4a0 [zfs]
suggests to my amateur reading that we're stuck waiting for a write to disk to complete, and since there's no active drive in the pool it can't complete.
I managed to find some slightly older roadmap PowerPoints from one of those OpenZFS conferences; recovery from all-vdevs-gone seemed to be planned for OpenZFS 3.0 - so it seems like this issue is something they're aware of, but it will be a while until they prioritize it.
My ZOL version is 2.0.3-9, and the ZOL list gave the same answer as you: this functionality is not implemented for now and, as I understand it, seems difficult to integrate.

IIRC you need to export the pool and import it with the -d flag specifying either a folder to look in for the devices, or the devices themselves if they're in a folder with other aliases for the devices.

i.e.
zpool import -d /dev/vdev_aliases/ pool

or, if the aliases are directly in /dev you might need
zpool import -d /dev/ata1 -d /dev/ata2 pool
None of these commands worked in my case:
Code:
root@host:~# zpool status
...
    NAME                        STATE     READ WRITE CKSUM
    pool                          ONLINE       0     0     0
      mirror-0                  ONLINE       0     0     0
        pci-0000:00:1f.2-ata-1  ONLINE       0     0     0
        pci-0000:00:1f.2-ata-2  ONLINE       0     0     0

root@host:~# zpool export pool
root@host:~# zpool status
no pools available

root@host:~# mkdir /dev/vdevs_aliases
root@host:~# ln -s /dev/disk/by-path/pci-0000:00:1f.2-ata-1 /dev/vdevs_aliases/ata-1
root@host:~# ln -s /dev/disk/by-path/pci-0000:00:1f.2-ata-2 /dev/vdevs_aliases/ata-2
root@host:~# zpool import -d /dev/vdevs_aliases/ pool
cannot import 'pool_bkp': no such pool available
root@host:~# zpool import -d /dev/vdevs_aliases/ata-1 -d /dev/vdevs_aliases/ata-2 pool_bkp
cannot import 'pool_bkp': no such pool available
root@host:~# zpool import pool
root@host:~# zpool status
...
    NAME                        STATE     READ WRITE CKSUM
    pool                          ONLINE       0     0     0
      mirror-0                  ONLINE       0     0     0
        pci-0000:00:1f.2-ata-1  ONLINE       0     0     0
        pci-0000:00:1f.2-ata-2  ONLINE       0     0     0  (resilvering)
# => ok ... back to normal
Maybe it is due to the version of ZOL or the distro itself ..


But you pointed me in the right direction for renaming a vdev:
Code:
root@host:~# zpool status
...
    NAME                          STATE     READ WRITE CKSUM
    pool                          ONLINE       0     0     0
      mirror-0                    ONLINE       0     0     0
        pci-0000:00:1f.2-ata-1    ONLINE       0     0     0
        pci-0000:00:1f.2-ata-2.0  ONLINE       0     0     0

# => mirror with ata-1 vs ata-2.0

root@host:~# zpool detach pool pci-0000:00:1f.2-ata-2.0
root@host:~# zpool status
...
    NAME                        STATE     READ WRITE CKSUM
    pool                        ONLINE       0     0     0
      pci-0000:00:1f.2-ata-1    ONLINE       0     0     0

# => NO more mirror

root@host:~# ls -l /dev/disk/by-path/* | grep -E "pci-0000:00:1f.2-ata-2.+[a-z]$"
lrwxrwxrwx 1 root root 9 11 mars 14:06 /dev/disk/by-path/pci-0000:00:1f.2-ata-2 -> ../../sdb
lrwxrwxrwx 1 root root 9 11 mars 14:06 /dev/disk/by-path/pci-0000:00:1f.2-ata-2.0 -> ../../sdb

# => we can see the 2 accessible aliases of sdb by ata-2 and ata-2.0

root@host:~# zpool attach pool pci-0000:00:1f.2-ata-1 /dev/disk/by-path/pci-0000\:00\:1f.2-ata-2
root@host:~# zpool status
...
    NAME                        STATE     READ WRITE CKSUM
    pool                        ONLINE       0     0     0
      mirror-0                  ONLINE       0     0     0
        pci-0000:00:1f.2-ata-1  ONLINE       0     0     0
        pci-0000:00:1f.2-ata-2  ONLINE       0     0     0
# => it works :D !
=> now both vdevs permanently have the same name format (even after a reboot)
 