Evolution from Storage Spaces

gea

Well-Known Member
Dec 31, 2010
3,141
1,182
113
DE
A number like 7237068410135738485 or the disk WWN identifies the disk itself, while a number like c2t3d0 identifies a port (connector) on your mainboard or controller: c2t3d0 means controller 2, target 3, disk 0. c2t3d0 is part of your pool, so you cannot use it as the destination when replacing from a disk id.

You can simply replace the disk on the same port (if the new disk is in the same physical location as the faulted one):
zpool replace -f prima c2t3d0

or, if you insert the new disk at a new location, e.g. c2t9d0:
zpool replace -f prima 7237068410135738485 c2t9d0
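
Either way, the resilver progress can then be watched with:
Code:
zpool status prima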

Whenever possible, use WWN disk detection.
An LSI HBA with IT firmware, for example, does this.
 

gea

Well-Known Member
Dec 31, 2010
3,141
1,182
113
DE
A WWN is a standardised number given to the disk by the manufacturer, just like the MAC address of a network card. It is usually printed on the disk, and tools such as smartmontools can read it out.
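
For example, smartmontools can print it (the device path is only an illustration for illumos; some controllers need an extra -d option):
Code:
# print the disk identity; the output includes a "LU WWN Device Id" line
smartctl -i /dev/rdsk/c2t3d0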

A WWN can be used to identify a disk and stays the same even if you change the port, the HBA or even the server. This is why WWN is preferred in professional setups. The way a system identifies a disk depends on hardware and driver. On some controller cards you work with port numbers, as in your case (AHCI SATA, LSI controllers with IR firmware, some RAID adapters).

If you use an LSI HBA with IT firmware, disk detection is always and only done via WWN.
 

nonyhaha

Member
Nov 18, 2018
50
12
8
Hello @gea.

Unfortunately, there is no way to do that. Each and every time I get the error:

Code:
invalid vdev specification
the following errors must be manually repaired:
/dev/dsk/c4t8d0s0 is part of active ZFS pool prima. Please see zpool(1M).
root@nappit:~#

I tried removing the disk and erasing it on another machine. Is there any command in napp-it to zero out the drive or remove all info about the pool that resided on it before?

The disk is ok, no smart errors and no errors during testing on another machine.

I also tried to initialize the disk and tick "force only clear ZFS label (Illumos)", but every time I do this and afterwards try to replace the disk in the GUI, I get:

Could not proceed due to an error. Please try again later or ask your sysadmin.
Maybe a reboot after power-off may help.

Code:
cannot replace 7237068410135738485 with c4t3d0: c4t3d0 is busy, or device removal is in progress

After this error, if I try to re-replace the disk, I get the previous error, that the disk is part of the active ZFS pool prima.
It is really driving me crazy. This is a very simple thing to do and I haven't been able to sort it out for 2 days now. What should be the normal steps to sort this out?

Is it going to work if I destroy everything and start from scratch, or will I get the same errors when I try to remake the pool?

I really tried everything: the configure command for the new disk, bringing it online in the prima pool, but when I try to replace it I still get the same errors. As I have a backup, I will try to destroy everything and see if I can restart the pool, but it looks like a very unstable system if there is this much to do just to replace a disk.
 

gea

Well-Known Member
Dec 31, 2010
3,141
1,182
113
DE
Your current state is unclear to me.

Starting from a state like this:
Code:
NAME                     STATE     READ WRITE CKSUM
prima                    DEGRADED     0     0     0
  raidz2-0               DEGRADED     0     0     0
    c2t0d0               ONLINE       0     0     0
    c2t1d0               ONLINE       0     0     0
    7237068410135738485  FAULTED      0     0     0  was /dev/dsk/c2t3d0s0
    c2t5d0               ONLINE       0     0     0
    c2t2d0               ONLINE       0     0     0
    c2t4d0               ONLINE       0     0     0
    c2t6d0               ONLINE       0     0     0
    c2t8d0               ONLINE       0     0     0
Your normal action is (it is a little unclear why it displays slice info s0; usually you always use the whole disk):
remove the bad disk in bay c2t3d0, replace it with a good one and start a replace in the same slot:
zpool replace -f prima c2t3d0 or
zpool replace -f prima c2t3d0s0


If you add the new disk in another empty bay, e.g. in c2t9d0:
zpool replace -f prima 7237068410135738485 c2t9d0

or
zpool replace -f prima c2t3d0s0 c2t9d0
zpool replace -f prima c2t3d0 c2t9d0

If your system is not hotplug capable, power off prior to disk insert or removal.
If you just unplug the faulted disk, the state should change to c2t3d0 removed.
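
As a side note on the "is part of active ZFS pool" error above: if the replacement disk still carries an old ZFS label, clearing it first may help. A sketch, assuming the disk is no longer a member of any imported pool (this destroys the ZFS metadata on that disk):
Code:
zpool labelclear -f /dev/dsk/c4t8d0s0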
 

nonyhaha

Member
Nov 18, 2018
50
12
8
Sorry to bring this up again @gea, but I am having some issues again. I have upgraded my server's CPUs, and after I start all the VMs, including napp-it, I can access the pool through the SMB share for a few seconds, after which the napp-it machine becomes kind of unresponsive with regard to the pool. While running:

Code:
root@nappit:~# zpool status prima
  pool: prima
 state: ONLINE
  scan: scrub repaired 0 in 25h41m with 0 errors on Tue Dec 8 01:26:04 2020

the cursor does not go back to #. It simply remains on a new blank line and no more commands can be sent in the console or through PuTTY.

Did you encounter or see any issues like this? I hope I do not have to rebuild it, as I really can't do this now, and I need access to my data :)
Where should I start?

Do you maybe have a discord account?

Later edit: I am lucky. I detached all 8 drives, started the VM without any, shut it down, reattached all disks, and it was working again.

After 2 minutes it went haywire again. Can't access it :(
Any command to display any kind of info about the disks or the data on them results in a broken terminal.
I am desperate.

I tried something. After the restart, I did not start some of the apps that use the SMB share from the Windows Server VM, like Emby and 2 torrent clients. It looks like when starting Emby or any of the torrent clients, the napp-it VM can't handle it and freezes up.

Overnight I was thinking that a reset of the Windows machine would be worth a try. And after the restart there were no more problems. But I really have no clue why this was happening.
 

gea

Well-Known Member
Dec 31, 2010
3,141
1,182
113
DE
If the pool list freezes (e.g. on a zpool list), it is mostly due to a disk freeze. You can then open a console and try format to list the disks (Ctrl-C to cancel after the listing). If a disk freezes, napp-it also freezes, as it wants to read disks and pools.

Check the logs for details. Mostly it's a single bad disk (or cabling). The controller is not as critical. Bad RAM or a bad PSU can be another reason. If the system is unresponsive, remove all data disks, check the logs, and then add disk after disk, each followed by a format run to detect it. All disks should be listed at once; a delay may be a hint.
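
A non-interactive variant of the format listing (a common illumos idiom):
Code:
# format scans all disks and would then prompt for a selection;
# redirecting stdin from /dev/null makes it exit right after the listing
format < /dev/null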
 

nonyhaha

Member
Nov 18, 2018
50
12
8
Hi Gea. Thanks for the reply.
It is a very weird issue. I think both VMs started at approximately the same time and this made the Windows VM crash the napp-it VM. After I restarted only the Windows VM, everything was working OK. I do not understand why. On Windows startup, the Emby server that runs on Windows starts and looks for the files hosted on napp-it.
 

gea

Well-Known Member
Dec 31, 2010
3,141
1,182
113
DE
On a ZFS pool with problems, after some time a VM with a high disk load may show a problem that a VM with a low disk load may not show.

To find the problem, look at ZFS (logs, disks, SMART etc.) or at RAM (which is common to ZFS and the VMs).
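
A few illumos commands commonly used for such a check (pool name taken from this thread):
Code:
zpool status -v prima   # pool state and per-disk error counters
iostat -En              # soft/hard/transport error counters per disk
fmdump -e               # fault management error log with timestamps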
 

nonyhaha

Member
Nov 18, 2018
50
12
8
Hello @gea.
For all this time I just set the napp-it VM to start with a 4-minute head start, and I have had no more issues.

Do you know if napp-it has the hpsa driver?
I am searching for an HBA to pass through directly to the napp-it machine, in order to have more relevant data about the drives connected to it, and I have found an HP H240 card, which uses a custom chipset and is supported only by the hpsa driver.
 

gea

Well-Known Member
Dec 31, 2010
3,141
1,182
113
DE
Napp-it has no drivers. You must look at the underlying OS, e.g. the illumos HCL or https://www.oracle.com/webfolder/technetwork/hcl/index.html

But if you want an "it just works" experience, use what others use, especially on Unix. Close to 100% of all production ZFS systems use HBAs based on LSI chipsets, like the older 6G LSI 2008 or 2308, or the newer 12G chipset 3008.

Even on mainstream Linux or Windows this is "best use". I would look for a new or used LSI 3008 based HBA (Broadcom or OEM), see https://forums.servethehome.com/ind...omplete-listing-plus-oem-models.599/post-4321

These HBAs are available with different firmwares. Each firmware needs a different driver. Avoid Raid-5 firmware, or use an HBA with proven firmware crossflash capability. Best is IT firmware (pure HBA). IR firmware (HBA + Raid 1/10) is also ok.
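
To see which firmware an LSI HBA currently runs, the LSI/Broadcom flash utilities can list it; a sketch (sas2flash covers the 6G chips, sas3flash the 3008):
Code:
# list all detected LSI controllers with firmware version and type (IT/IR)
sas2flash -listall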
 

nonyhaha

Member
Nov 18, 2018
50
12
8
Thank you very much @gea.
I ended up ordering a Dell H310 and flashed it to IT mode.

I am now in the process of moving data from the old 8-disk pool to a new 4-disk pool (larger drives, less power consumed).
I have a small problem and I am trying to get to the bottom of it. While copying a large amount of large files, the transfer rate from the old pool to the new one seems to be capped at 1 gigabit, and it is not even consistent there. I searched a little and checked whether jumbo frames are enabled.
I am not sure if this is the issue, but first of all, the MTU is 1500 on napp-it. The simple tutorial from How to Enable Support for Jumbo Frames (System Administration Guide: Network Interfaces and Network Virtualization) fails on the command 'dladm set-linkprop -p mtu=9000 data-link'.
Do you know of another way to set jumbo frames and test the copying speed?

I mention that the copying is done from an SMB client (Win2019).
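
For reference, on illumos the MTU of a link usually cannot be changed while an IP interface is still plumbed on it, which is a common reason for that dladm command failing with "link busy". A sketch, with a hypothetical link name vmxnet3s0 (check dladm show-link for the real one):
Code:
dladm show-link                           # find the link name
ipadm delete-if vmxnet3s0                 # unplumb IP first
dladm set-linkprop -p mtu=9000 vmxnet3s0  # now the MTU can be changed
ipadm create-if vmxnet3s0                 # re-plumb, then re-create the address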
 

nonyhaha

Member
Nov 18, 2018
50
12
8
In the end, after swapping everything (case, all hardware including all hard drives), from time to time I still got random errors on one drive, and I narrowed it down to the drive losing connection with the adapter. I can blame this on small vibrations over time creating disconnects between the cables and drives. Once every 4-6 months I have to stop the server and reseat the cables on the drives. After a quick resilver, everything is up and running again.

Hi @gea, how are you doing?
I am now trying to resolve an older issue that hadn't been bothering me until now.
I have NFS shares on the server (SunOS san 5.11 omnios-r151038). These are mounted on an Ubuntu VM, with the same uid and gid on both client and server.
On the server side, ls -al shows the correct user and group on files created on those shares.
However, on the client side, the same files show up as owned by "nobody".
I need to run some apps on those shares for my Nextcloud server, but the Ubuntu VM is throwing errors because of the owning user.

How can I make the client display the correct user and group, in order to be able to run said apps?

As always, best regards!
Noni.
 

gea

Well-Known Member
Dec 31, 2010
3,141
1,182
113
DE
As there is neither authentication nor authorisation with NFS3, the usual way is to set all file permissions recursively to modify or full for everyone. You can also use the root sharing option when you share via NFS, to allow root full access. You can anyway only restrict access based on the client IP, as a client uid without authentication is not secure.
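
The root sharing option can be set together with the share itself via the sharenfs property; a sketch with a hypothetical filesystem and client subnet:
Code:
# read/write plus root access, only for clients from one subnet
zfs set sharenfs="rw=@192.168.1.0/24,root=@192.168.1.0/24" prima/share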

The file owner uid depends on the combination of server and client OS.
 

nonyhaha

Member
Nov 18, 2018
50
12
8
Hello, thanks for the quick reply.
On the server side, I created a new folder on one of the pools for testing (a new ZFS filesystem from the web GUI).
These are the permissions for that new folder:
drwxrwxrwx+ 4 root root 4 Nov 3 08:01 merge
I mounted this share on my client. I can access and write to it with both root and another user. The result is this:
On the client side, all data created still appears as owned by nobody.
On the server side, data created by root appears as owned by nobody.
On the server side, data created by another user appears as owned by that user.

Does "set all file permissions recursively to modify or full for everyone" equal chmod -R 775/777 on the shared folder on the server side? Or am I missing something?
In the GUI, the new test share looks like this:
[screenshot]
 

gea

Well-Known Member
Dec 31, 2010
3,141
1,182
113
DE
When an NFS client creates a file, the owner is either the client user id or nobody. You cannot modify this behaviour; it differs between NFS server and client pairs. Anyway, you can use owner and user settings to restrict access only via SMB or NFS from v4 upward, not with the NFS3 protocol. Unlike usual Unix behaviour, root has no special rights over NFS shares unless you explicitly set the no_root_squash option on Linux or enable it via the root= share option on Solaris. What you can do is allow all users access, either locally via chmod, or remotely via napp-it or via Windows and SMB (as root/admin user).

Unlike Linux, Solaris uses ZFS with NFSv4 ACLs (a superset of Windows ntfs-like ACLs with folder inheritance, Posix ACLs and classic permissions like 755). If you set permissions via chmod, e.g. 775, your NFSv4 ACL settings lose their inheritance and the overall ACL permission is reduced to a level that fits the limited classic permission options. For NFS the effect is mostly minimal, unless you need fine-granular NFSv4 ACLs over NFS4. If you use Solaris SMB, the effect is critical, as Solaris SMB is fully based on the Microsoft Windows ntfs way of ACL handling. It always and only uses ACLs, and the loss of ACL inheritance may result in unwanted changes of access permissions. If you use SMB, you should only use ACLs like everyone@ and avoid classic permissions like 755. You can set NFSv4 ACLs locally, via napp-it, or in the easiest way via Windows after you connect over SMB as root/admin.

If you set permissions, take care of both folder settings and file settings. You need execute rights on a folder to open it, and additionally file permissions to read or modify a file.
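
To give everyone modify access the ACL way, keeping inheritance intact instead of reducing it with a plain chmod 775, illumos chmod accepts NFSv4 ACL syntax; a sketch with a hypothetical path:
Code:
# recursive, inheriting everyone@ ACL (f = file_inherit, d = dir_inherit)
/usr/bin/chmod -R A=everyone@:modify_set:fd:allow /prima/share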


Share options (OmniOS behaves like Oracle Solaris 11.2, Solaris 11.4 has new share options)
 

nonyhaha

Member
Nov 18, 2018
50
12
8
At the moment I am not having any issues, because on the server side everything looks good.
The problem is that on the client side any file owner appears as "nobody", making it impossible to run apps hosted on that NFS share as a Linux user.
I would like the clients to display the correct owners of the files on the NFS shares.
 

gea

Well-Known Member
Dec 31, 2010
3,141
1,182
113
DE
If you need authentication (who is connected) and authorisation (what is allowed), you need another protocol like NFS4 or SMB, with SMB being the preferred one. Sun developed NFS3 on Solaris as an easy and very fast method to provide a network filesystem in a trusted environment. With NFS3 you can simply count up uids and try to connect; there is no protection against such a trial-and-error method. The only way to restrict an NFS3 share is based on IP, or with a firewall based on IP or NIC. With NFS3 everything works on a goodwill basis. This is why the owner is often nobody, to reflect this.

NFS3 is quite like an SMB share with anonymous guest access allowed.