Evolution from Storage Spaces

gea

Well-Known Member
Dec 31, 2010
2,655
909
113
DE
The GUID 7237068410135738485 and a disk WWN identify a disk, while a name like c2t3d0 identifies a port (connector) on your mainboard or controller. c2t3d0 (controller 2, disk 3) is still part of your pool, so you cannot use it as the destination when you replace a disk by its ID.
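To make the port-style naming concrete, here is a small sketch that splits such a name into its parts. In this naming scheme c is the controller, t the target, d the disk (LUN) and an optional s the slice. The function name parse_ctd is made up for this illustration and is not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: split a port-style device name (c2t3d0 or c2t3d0s0)
# into controller, target, disk and optional slice.
parse_ctd() {
    name=$1
    case $name in
        c[0-9]*t[0-9]*d[0-9]*) ;;
        *) echo "not a cXtYdZ name: $name" >&2; return 1 ;;
    esac
    rest=${name#c}          # e.g. 2t3d0s0
    ctrl=${rest%%t*}        # text before the first t
    rest=${rest#*t}         # e.g. 3d0s0
    target=${rest%%d*}      # text before the first d
    rest=${rest#*d}         # e.g. 0s0
    disk=${rest%%s*}        # text before the slice marker, if any
    case $rest in
        *s*) slice=${rest#*s} ;;
        *)   slice=none ;;
    esac
    echo "controller=$ctrl target=$target disk=$disk slice=$slice"
}

parse_ctd c2t3d0s0
parse_ctd c2t9d0
```

A GUID such as 7237068410135738485 does not match this pattern, so the helper rejects it. That mirrors the point above: one kind of name describes a location, the other a specific disk.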

You can simply replace the disk on the same port (if the new disk is in the same physical location as the faulted one):
zpool replace -f prima c2t3d0

Otherwise you must insert the new disk in a new location, e.g.:
zpool replace -f prima 7237068410135738485 c2t9d0
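The two cases can be put into a small dry-run helper that only prints the command and leaves running it to you. This is a sketch: replace_cmd is a made-up name, and it relies on zpool replace using the old device as the new one when no new device is given.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (do not run) the zpool replace command.
# Usage: replace_cmd POOL OLD [NEW]
#   OLD  device name or GUID of the faulted disk as shown by zpool status
#   NEW  new device name, only needed when the new disk is in another bay
replace_cmd() {
    pool=${1:-}
    old=${2:-}
    new=${3:-}
    if [ -z "$pool" ] || [ -z "$old" ]; then
        echo "usage: replace_cmd POOL OLD [NEW]" >&2
        return 2
    fi
    if [ -z "$new" ] || [ "$new" = "$old" ]; then
        # New disk sits in the same bay as the faulted one.
        echo "zpool replace -f $pool $old"
    else
        # New disk sits in a different bay.
        echo "zpool replace -f $pool $old $new"
    fi
}

replace_cmd prima c2t3d0
replace_cmd prima 7237068410135738485 c2t9d0
```

Printing first and running by hand keeps a typo in a device name from reaching the pool.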

Whenever possible, use WWN disk detection.
An LSI HBA with IT firmware, for example, does this.
 

gea

A WWN is a standardised number that the manufacturer assigns to the disk, just like the MAC address of a network card. It is usually printed on the disk, and tools such as smartmontools can read it out.

A WWN can be used to identify a disk, and the disk keeps it even if you change the port, the HBA or even the server. This is why WWN is preferred on professional setups. The way a system identifies a disk depends on hardware and driver. With some controllers you work with port numbers, as in your case (AHCI SATA, LSI controllers with IR firmware, some RAID adapters).

If you use an LSI HBA with IT firmware, disk detection is always and only done via WWN.
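As a sketch of reading the WWN with smartmontools: for SATA disks, smartctl -i prints a line starting with "LU WWN Device Id". The helper below pulls that value out of the output. The name wwn_from_smartctl and the sample values are made up for illustration, and SAS disks report their ID on a differently named line that this sketch does not handle.

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -i` output on stdin and print the WWN
# as one hex string. Only the SATA-style "LU WWN Device Id" line is handled.
wwn_from_smartctl() {
    awk -F': *' '/^LU WWN Device Id/ { gsub(/ /, "", $2); print $2 }'
}

# Demo with made-up sample output (not from a real disk):
wwn_from_smartctl <<'EOF'
Device Model:     EXAMPLE DISK 4TB
Serial Number:    EXAMPLE123
LU WWN Device Id: 5 000c50 0a1b2c3d4
EOF

# On a real system (device path and any -d option depend on your setup):
#   smartctl -i /dev/rdsk/c2t3d0 | wwn_from_smartctl
```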
 

nonyhaha

Member
Nov 18, 2018
32
6
8
Hello @gea.

Unfortunately, there is no way to do that. Each and every time I get the error:
invalid vdev specification
the following errors must be manually repaired:
/dev/dsk/c4t8d0s0 is part of active ZFS pool prima. Please see zpool(1M).
root@nappit:~#

I tried removing the disk and erasing it on another machine. Is there any command in napp-it to zero out the drive or remove all info about the pool that resided on it before?

The disk is OK: no SMART errors and no errors during testing on another machine.

I also tried to initialize the disk and tick "force only clear ZFS label (Illumos)", but every time I do this and afterwards try to replace the disk in the GUI, I get:

Could not proceed due to an error. Please try again later or ask your sysadmin.
Maybe a reboot after power-off may help.

cannot replace 7237068410135738485 with c4t3d0: c4t3d0 is busy, or device removal is in progress

After this error, if I try to replace the disk again, I get the previous error, that the disk is part of the active ZFS pool prima.
It is really driving me crazy. This is a very simple thing to do and I have not been able to sort it out for 2 days now. What are the normal steps to sort this out?

Is it going to work if I destroy everything and start from scratch, or will I get the same errors when I try to recreate the pool?

I really tried everything: the configure command for the new disk, bringing it online in the prima pool, but when I try to replace it I still get the same errors. As I have a backup, I will try to destroy everything and see if I can recreate the pool, but it looks like a very unstable system if there is this much to do just to replace a disk.
 

gea

Your current state is unclear to me.

Starting from a state like this:
Code:
       NAME                     STATE     READ WRITE CKSUM
       prima                    DEGRADED     0     0     0
         raidz2-0               DEGRADED     0     0     0
           c2t0d0               ONLINE       0     0     0
           c2t1d0               ONLINE       0     0     0
           7237068410135738485  FAULTED      0     0     0  was /dev/dsk/c2t3d0s0
           c2t5d0               ONLINE       0     0     0
           c2t2d0               ONLINE       0     0     0
           c2t4d0               ONLINE       0     0     0
           c2t6d0               ONLINE       0     0     0
           c2t8d0               ONLINE       0     0     0
Your normal action is as follows (it is a little unclear why it displays the slice info s0; usually you use the whole disk).
Remove the bad disk in bay c2t3d0, insert a good one and start a replace in the same slot:
zpool replace -f prima c2t3d0
or
zpool replace -f prima c2t3d0s0


If you add the new disk in another empty bay, e.g. c2t9d0:
zpool replace -f prima 7237068410135738485 c2t9d0

or
zpool replace -f prima c2t3d0s0 c2t9d0
zpool replace -f prima c2t3d0 c2t9d0

If your system is not hotplug capable, power off before you insert or remove a disk.
If you just unplug the faulted disk, its state should change to removed for c2t3d0.
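To pick out the entry that needs replacing, a zpool status listing like the one above can be filtered for everything that is not ONLINE. This is only a sketch: not_online is a made-up name, and it assumes the usual NAME STATE READ WRITE CKSUM column layout.

```shell
#!/bin/sh
# Hypothetical helper: read `zpool status` output on stdin and print
# "NAME STATE" for every pool/vdev/disk row whose state is not ONLINE.
# Rows need at least the five columns NAME STATE READ WRITE CKSUM, which
# skips the "state: DEGRADED" summary line.
not_online() {
    awk 'NF >= 5 && $2 ~ /^(DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ { print $1, $2 }'
}

# Demo with part of the listing from this thread:
not_online <<'EOF'
state: DEGRADED
       NAME                     STATE     READ WRITE CKSUM
       prima                    DEGRADED     0     0     0
         raidz2-0               DEGRADED     0     0     0
           c2t0d0               ONLINE       0     0     0
           7237068410135738485  FAULTED      0     0     0  was /dev/dsk/c2t3d0s0
           c2t5d0               ONLINE       0     0     0
EOF

# On a real system:
#   zpool status prima | not_online
```

The first column of the FAULTED row is the name or GUID to pass as the old device to zpool replace.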
 

nonyhaha

Sorry to bring this up again @gea, but I am having some issues again. I have upgraded my servers' CPUs, and after I start all the VMs, including nappit, I can access the pool through the SMB share for a few seconds, after which the nappit machine becomes more or less unresponsive with regard to the pool. When I run:
root@nappit:~# zpool status prima
pool: prima
state: ONLINE
scan: scrub repaired 0 in 25h41m with 0 errors on Tue Dec 8 01:26:04 2020

the output stops there and the cursor does not go back to #. It simply remains on a new blank line, and no more commands can be sent in the console or through PuTTY.

Did you encounter or see any issues like this? I know I do not have to rebuild it, and I really can't do this now, and I need access to my data :)
Where should I start?

Do you maybe have a Discord account?

Later edit: I got lucky. I detached all 8 drives, started the VM without any, shut it down, reattached all disks, and it is now working.

After 2 minutes it went haywire again. Can't access it :(
Any command to display any kind of info about the disks or the data on them results in a broken terminal.
I am desperate.

I tried something. After the restart, I did not start some apps that use the SMB share from the Windows server VM, like Emby and 2 torrent clients. It looks like when I start Emby or any of the torrent clients, the nappit VM can't handle it and freezes up.

Overnight I was thinking that a reset of the Windows machine was worth a try. And after the restart there were no more problems. But I really have no clue why this was happening.
 

gea

If the pool listing freezes (e.g. on a zpool status), it is mostly due to a disk freeze. You can then open a console and try a format to list the disks (Ctrl-C to cancel after listing). If a disk freezes, napp-it freezes as well, because it wants to read disks and pools.

Check the logs for details. Mostly it's a single bad disk (or cabling). The controller is less often the cause. Bad RAM or a bad PSU can be another reason. If the system is unresponsive, remove all data disks, check the logs and then add the disks back one by one, each followed by a format to detect it. All disks should be listed at the same speed. A delay may be a hint.
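The "a delay may be a hint" check can be made a little more objective by timing the listing after each disk you add. The sketch below times an arbitrary command. The name elapsed is made up, and it assumes a date that supports +%s and a format that exits after listing when its input is empty.

```shell
#!/bin/sh
# Hypothetical helper: run a command, discard its output, and print how many
# whole seconds it took. Assumes `date +%s` (seconds since epoch) is supported.
elapsed() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Demo with a harmless command:
echo "listing took $(elapsed sleep 1) s"

# On the storage VM, after adding each disk (assumption: format exits
# after printing the disk list when stdin is empty):
#   elapsed sh -c 'format </dev/null'
```

If the time jumps after a particular disk is added, that disk or its cabling is the first suspect.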
 

nonyhaha

Hi Gea. Thanks for the reply.
It is a very weird issue. I think both VMs started at approximately the same time, and this made the Windows VM crash the nappit VM. After I restarted only the Windows VM, everything was working fine. I do not understand why. On Windows startup, the Emby server (which runs on Windows) starts and looks for the files hosted on nappit.
 

gea

On a ZFS pool that develops problems after some time, a VM with a high disk load may trigger a problem that a VM with a low disk load does not.

To find the problem, look at ZFS (logs, disks, SMART etc.) or at the RAM (which is shared by ZFS and the VMs).