napp-it ZFS server on OmniOS/Solaris: news, tips and tricks


gea

How to duplicate a ZFS pool with ongoing replications?
The new pool should be identical to the old pool but, for example, larger or with a different vdev structure.


You must take care of the following:

1. Transfer ZFS filesystems with a "recursive" job setting
This includes all datasets in the transfer (sub filesystems, snaps, zvols)

2. A ZFS replication creates the new filesystem(s) below the destination ZFS filesystem
pool1 -> pool2 results in pool2/pool1

If you want an identical structure you must create a (recursive) job for each 1st-level filesystem, e.g.
pool1/fs1 -> pool2 gives you pool2/fs1

If the new pool should keep the old pool's name (e.g. pool1):
- destroy the old pool pool1 after the transfers are done, then export pool2 and import it as pool1 (see the sketch below)
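
A minimal sketch of the underlying commands (pool, filesystem and snap names are examples; a napp-it replication job handles the send/receive part for you):
Code:
# recursive transfer of a 1st-level filesystem into the new pool
# (includes all child filesystems, zvols and snaps)
zfs snapshot -r pool1/fs1@move1
zfs send -R pool1/fs1@move1 | zfs receive pool2/fs1

# after all transfers are done: swap the pool names
zpool destroy pool1
zpool export pool2
zpool import pool2 pool1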

3. A replication does not transfer all ZFS properties, e.g. compress or sync
Filesystem attributes like ACLs are preserved.

If you want the same ZFS properties you must apply them after the replication.
A better way is to set them on the parent target filesystem, e.g. pool2, prior to the replication.
They are then inherited by the new filesystems, see the example below.
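
For example (the property values are only illustrations, use whatever the old pool had):
Code:
# set the wanted properties on the target before the replication runs
zfs set compression=lz4 pool2
zfs set atime=off pool2
zfs set sync=always pool2

# filesystems created below pool2 by the replication inherit these values
zfs get -r compression,atime,sync pool2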

4. Some ZFS properties can only be set at creation time,
e.g. upper/lowercase behaviour or character sets

If you use napp-it to create pools, settings are identical.

5. Ongoing Replication/backup jobs
You can continue old replication jobs if
- pool structure remains identical
- you have snap pairs from former replications on both sides, e.g. jobid_repli_source/target_nr_1037

If you want to recreate a replication job that continues incremental transfers:
- recreate the job with the same source/destination settings and the old jobid
The jobid is part of the old snap names

Or rerun an initial transfer:
Rename the old destination filesystems, e.g. to filesystem.bak, to preserve them in case of problems. Then rerun a replication (full transfer). After success, destroy the .bak filesystem. Subsequent replications are then incremental again (see the sketch below).
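
A sketch with example filesystem names:
Code:
# preserve the old destination in case of problems
zfs rename pool2/fs1 pool2/fs1.bak

# rerun the replication job (initial full transfer recreates pool2/fs1),
# then remove the preserved copy after the transfer succeeded
zfs destroy -r pool2/fs1.bak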
 

gea

New feature in napp-it 23.dev (Apr 05):
ZFS autosnaps and ZFS replications of ESXi/NFS filesystems with embedded ESXi hot memory snaps.


If you want to back up running VMs on ESXi, you mostly use commercial tools like Veeam that support quiesce (freeze a filesystem during backup) or can include the ESXi hot memory state.

If you use ZFS to store VMs you can use ZFS snaps for versioning or to save and restore them, either via a simple SMB/NFS copy, Windows previous versions or ZFS replication. This works well, but only for VMs that are powered down during the backup, as a ZFS snap is like a sudden power off: there is no guarantee that a running VM is not corrupted in a ZFS snap. While ESXi can provide safe snaps with quiesce or hot memory state, you cannot use them alone for a restore as they rely on the VM itself; a corrupt VM cannot be restored from ESXi snaps, while you can restore a VM from ZFS snaps. As ESXi snaps are delta files they grow over time, so you should under no circumstances keep more than a few ESXi snaps for longer than a few days.

So why not combine both: unlimited ZFS snaps with the recovery options of ESXi snaps. This can be achieved by creating an ESXi snap prior to the ZFS snap, so that the ZFS snap includes the ESXi snap. After the ZFS snap is done, the ESXi snap can be destroyed.

Napp-it 23.dev automates this

Howto setup:
- update napp-it to current 23.dev
- add the needed Perl modules to OmniOS,
see https://forums.servethehome.com/ind...laris-news-tips-and-tricks.38240/#post-367124
- Enter ESXi settings (ip, root, pw and NFS datastores) in napp-it menu System > ESXi > NFS datastore

- list autosnap or replication jobs in napp-it menu Jobs
Click on the jobid to enter its settings and add the IP of the ESXi server
- run the autosnap or replication job
Each ZFS snap will then include an ESXi snap. As a VM is stopped for a few seconds, run this at low-usage times.
- click on replicate or snap in the line of the job to check log entries

Restore a VM in a running state:
- shutdown all VMs
- restore a single VM folder from a ZFS snap, either via SMB/NFS copy, Windows previous versions,
filesystem rollback or replication

- ESXi will see the ESXi snaps after a reboot, so reboot now
- power on a VM and restore the last ESXi snap. The VM is then at the state of backup time, in a powered-on state.

more, https://www.napp-it.org/doc/downloads/napp-in-one.pdf
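
On the ESXi side the job does roughly the following (a sketch via SSH on the ESXi host; vmid 12 and the snap name are examples, get the real vmid from getallvms):
Code:
# list VMs and their vmids
vim-cmd vmsvc/getallvms

# create an ESXi snap of vmid 12 with memory state (1) and without quiesce (0)
vim-cmd vmsvc/snapshot.create 12 presnap "ESXi snap before ZFS snap" 1 0

# ... the ZFS autosnap/replication snap is taken here ...

# remove the ESXi snaps again once the ZFS snap is done
vim-cmd vmsvc/snapshot.removeall 12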
 

gea

Addition to the post above: to go back to such a ZFS snap with an embedded ESXi snap:
- roll back the NFS filesystem or restore a single VM from a ZFS snap via SMB and Windows previous versions
- reload the VM settings, e.g. via Putty and vim-cmd vmsvc/reload vmid (or reboot)
- go back to the safe ESXi snap

 

gea

A data risk analysis
Main risks for data loss, listed by relevance.

1. OMG, something happened to my data
Human error, ransomware or sabotage by a former employee


- Last Friday I deleted a file. I need it now
- 6 weeks ago I was infected by ransomware that has already encrypted some data
- 6 months ago data was modified by a former employee who was sacked after a dispute

- Occurrence: very often
- Countermeasures:
read-only data versioning with at least a daily version for the current week,
a weekly version for the last month and a monthly version for the last year

How to protect against this: the best solution is ZFS snaps. You can hold thousands of read-only snaps without problems. They are created without delay and space consumption is only the amount of datablocks modified since the former snap. Only Unix root, not a Windows admin user, can destroy ZFS snaps, and not remotely but only locally on the server.

- Restore: simple. Connect to a ZFS filesystem via SMB and use Windows "previous versions" to restore single files or folders.
On the Solaris SMB server "ZFS snaps = previous versions" is zero config; with SAMBA you must take care of the snap folder settings.

Or use ZFS rollback to set a whole filesystem back in time, see the example below.
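
For example (filesystem, snap and file names are placeholders):
Code:
# list the available snaps of a filesystem
zfs list -t snapshot -r pool1/data

# restore a single file from a snap via the hidden .zfs directory
cp /pool1/data/.zfs/snapshot/daily_1037/report.doc /pool1/data/

# or set the whole filesystem back to a snap
# (-r also destroys snaps that are newer than the target)
zfs rollback -r pool1/data@daily_1037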

Alternative: use tape backups (hundreds of them). A restore is very time consuming and
can be quite complicated with differential backups.

2. OMG, a disaster has happened
not the daily problems, a real disaster


- A fire destroyed your server(s)
- A thief has stolen your server(s)

Occurrence: maybe never, but be prepared or everything can be lost
Countermeasures: create ongoing external daily disaster backups.
Use disks or tapes, or ZFS replicate to an external site or a removable pool that you unplug/move after the backup.

- Restore: simple but very time consuming. A restore of a large pool from backup can take days,
and without a current disaster backup the data state is not recent.
As ZFS is a Unix filesystem, Windows AD SMB ACL permissions are not restored automatically, e.g. on SAMBA,
but require a correct mapping of Unix uid -> Windows SID. The Solaris kernel-based SMB server is not affected
by mapping problems as it stores Windows SIDs directly as extended ZFS attributes.

If a disaster restore is not easy and straightforward with a simple copy method, test it before an emergency occurs.

3. OMG, I missed a hardware failure for too long
Servers are not "set and forget"


- a disk failed, then the next, then the pool
- a fan or the air conditioning failed and due to overtemperature disks are damaged and data is corrupted

Occurrence: maybe once every few years
Countermeasures: monitor hardware and use mail alerts, or you end up with a disaster case (see 2.)

4. OMG, I cannot trust my data or backups
Suddenly you discover a corrupted file or an image with black areas, text errors, or problems opening applications or files.


Can you then trust any of your data?

Occurrence: prior to ZFS sometimes, practically never with ZFS, as ZFS protects data and metadata with checksums and repairs on the fly during reads from Raid redundancy, or on a regular basis, e.g. once every few months, with a pool scrub (see the example below).
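
For example (the pool name is a placeholder):
Code:
# read and verify all data against its checksums, repair from redundancy
zpool scrub tank

# check progress and the error counters afterwards
zpool status -v tank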

5. Ok, I have taken care of the OMG problems with ZFS, many snaps and an external daily or weekly disaster backup
Anything left to consider?


There are indeed some smaller remaining problems, even with ZFS

- Server crashes during a write:
In this case the affected file is lost; ZFS cannot protect you. With small files there is a minimal chance that the data
is already completely in the RAM-based write cache. With sync enabled the file is then written on the next reboot.

Only the writing application can protect whole files against such incidents, e.g. Word with temp files.
ZFS can protect the filesystem and committed writes.

- Incomplete atomic writes. Atomic writes are minimal dependent write operations that must be done completely or not at all.

An example is when a system crashes after data is written to storage but prior to the needed update of the metadata, or when a database writes dependent transactions, e.g. moving money from one account to another, with the result of data loss or a corrupted filesystem.

In a conventional Raid, e.g. a mirror, all data is written sequentially disk by disk. A sudden crash results in a corrupted Raid/mirror.

ZFS itself is not affected due to Copy on Write, where atomic writes are either done completely or discarded entirely, so a corrupted filesystem or Raid cannot occur by filesystem design. If you additionally want any committed write to be on safe storage, you must enable sync.

If you have VMs with non-Copy-on-Write filesystems, ZFS can only guarantee consistency for itself, not for the guest operating systems. A crash during a write can corrupt a VM. Activating sync write on ZFS can guarantee atomic writes and consistent filesystems in VMs, see the example below.
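
A minimal example (the filesystem name is a placeholder):
Code:
# force safe sync writes for a VM filesystem shared via NFS/iSCSI
zfs set sync=always pool1/vms

# verify; "standard" follows the client request, "disabled" never syncs
zfs get sync pool1/vms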

- RAM errors
Google "RAM error occurrence" for the risk

All data is processed in RAM prior to a write. A RAM problem can corrupt this data, which is then written to disk.
Even ZFS checksums cannot help, as you can end up with bad data that has proper checksums.

The risk of RAM errors is a small statistical risk that increases with the amount of RAM. A modern 32GB server has 64x the risk of a 512MB server with the same quality of RAM. Only single errors are a problem: bad RAM with many errors mostly results in an OS crash/kernel panic or "too many errors" on reads, with a disk or pool going offline on ZFS. The "scrub to death", where ZFS supposedly repairs good data wrongly, is a myth.

Anyway, if you care about data on a filer, always use ECC.
Even without ECC ZFS offers more security than older filesystems without ECC.

- Silent data errors / bit rot
Google "bit rot"

This mostly affects long-term storage, with a statistical amount of data corruption occurring by chance over time. Some but not all of it can be repaired by the disk itself. ZFS can detect and repair all bit rot problems during reads or a scrub of all data.
On long-term storage, run regular scrubs, e.g. once every few months, to validate a pool before problems become serious.

- Insufficient redundancy.
ZFS holds metadata twice and can hold data twice with a copies=2 setting, even on single disks.
A ZFS raid offers redundancy that is counted in allowed disk failures until the pool is lost.

As this is also a statistical problem, a rule of thumb is:
Best is when you can allow any two disks to fail. This is the case with a raid-Z2 or a 3way mirror. With more than, say, 10 disks per vdev, consider Z3. Slog or L2Arc do not need redundancy due to a fallback to the pool without data loss; only in case of an Slog failure combined with a system crash do you see a loss of the data in the RAM cache, besides a performance degradation.
If you use special vdevs, always use mirrors, as a lost vdev means a lost pool (see the pool layout sketch below).
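
A pool layout sketch that follows this rule of thumb (pool and disk names are placeholders):
Code:
# raid-Z2 vdev: any two of the six disks may fail
zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0

# optionally hold selected user data twice on top of the raid redundancy
zfs set copies=2 tank/important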

- SSDs without powerloss protection
A crash during a write on such SSDs can result in data loss or a corrupted filesystem. If the SSD is part of a ZFS raid, problems can be repaired based on checksum errors. If the SSD is not in a Raid, e.g. an Slog, SSD powerloss protection is mandatory.
 

gea

Update: method to include ESXi hot memory or quiesce snaps in ZFS snaps
As of the newest napp-it 23.dev, SSH and SOAP are supported

Why:
ESXi snaps are safe. They can include the memory state (restore to an online state) or
quiesce, where the guest filesystem is frozen during the snap (requires VMware tools).
ESXi snaps are limited in number (only a few) and age (only for a few days).
You cannot use ESXi snaps for backups as you cannot roll back when the main VM file is corrupted.

ZFS snaps are not limited in number or age. As a ZFS snap includes all files at snap time,
you can back up and fully restore a VM from a ZFS snap. But as a ZFS snap is like a sudden
power loss, a VM in a ZFS snap is not safe and can be corrupted.

The solution is to include safe ESXi snaps within your ZFS snaps.
A VM restore is then (see the sketch below):
- power down the VM
- restore the VM folder via Windows SMB and previous versions, or ZFS rollback
- reload the VM settings via Putty and vim-cmd vmsvc/reload vmid, or
napp-it menu System > ESXi > SSH: list snaps
- restore the ESXi snap via the ESXi web management
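
On the ESXi side this corresponds roughly to (a sketch via SSH; vmid 12 and snapshot id 1 are examples, taken from getallvms and get.snapshotinfo):
Code:
# find the vmid of the restored VM
vim-cmd vmsvc/getallvms

# make ESXi re-read the restored VM files from the NFS datastore
vim-cmd vmsvc/reload 12

# list the ESXi snaps that came back with the ZFS restore, then revert
vim-cmd vmsvc/get.snapshotinfo 12
vim-cmd vmsvc/snapshot.revert 12 1 0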

New in the current napp-it 23.dev from today: ESXi remote management via SOAP or SSH

[Image: zfs_esxi_snaps.png]
 

gea

Update
napp-it 23.dev (Apr 30) can include ESXi snaps (quiesce or hot memory) in ZFS snaps (replications or autosnap) and restore a VM from a ZFS snap, with the option to roll back to the last ESXi snap

Setup
Use a ZFS filesystem via NFS to store VMs
Update to napp-it 23.dev (napp-it free, use an evalkey from napp-it.org to update to .dev).
Configure SSH (see menu System > ESXi)
Add ip of your ESXi server in autosnap or replication job settings (esxi_ip)
Create ZFS snaps

Restore a VM from a ZFS snap
Use menu System > ESXi > VM restore, select a VM and a snap to restore

[Image: restore.PNG]
 

gea

Join OmniOS, OpenIndiana or Solaris to a Windows Active Directory, a base method in 7 steps
You can join a Domain in napp-it menu Service > SMB > Active Directory


This follows these steps:

1. sync date with AD server ex
ntpdate 192.168.2.124

2. edit /etc/resolv.conf with nameserver=AD server ip and domain local.de
search local.de
domain local.de
nameserver 192.168.2.124

3. check /etc/pam.conf
Code:
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or http://www.opensolaris.org/os/licensing.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
#
# Copyright 2010 Sun Microsystems, Inc.  All rights reserved.
# Use is subject to license terms.
#
# PAM configuration
#
# Unless explicitly defined, all services use the modules
# defined in the "other" section.
#
# Modules are defined with relative pathnames, i.e., they are
# relative to /usr/lib/security/$ISA. Absolute path names, as
# present in this file in previous releases are still acceptable.
#
# Authentication management
#
# login service (explicit because of pam_dial_auth)
#
login        auth requisite                pam_authtok_get.so.1
login        auth required                pam_dhkeys.so.1
login        auth required                pam_unix_cred.so.1
login        auth required                pam_unix_auth.so.1
login        auth required                pam_dial_auth.so.1
#
# rlogin service (explicit because of pam_rhost_auth)
#
rlogin        auth sufficient                pam_rhosts_auth.so.1
rlogin        auth requisite                pam_authtok_get.so.1
rlogin        auth required                pam_dhkeys.so.1
rlogin        auth required                pam_unix_cred.so.1
rlogin        auth required                pam_unix_auth.so.1
#
# Kerberized rlogin service
#
krlogin        auth required                pam_unix_cred.so.1
krlogin        auth required                pam_krb5.so.1
#
# rsh service (explicit because of pam_rhost_auth,
# and pam_unix_auth for meaningful pam_setcred)
#
rsh        auth sufficient                pam_rhosts_auth.so.1
rsh        auth required                pam_unix_cred.so.1
#
# Kerberized rsh service
#
krsh        auth required                pam_unix_cred.so.1
krsh        auth required                pam_krb5.so.1
#
# Kerberized telnet service
#
ktelnet        auth required                pam_unix_cred.so.1
ktelnet        auth required                pam_krb5.so.1
#
# PPP service (explicit because of pam_dial_auth)
#
ppp        auth requisite                pam_authtok_get.so.1
ppp        auth required                pam_dhkeys.so.1
ppp        auth required                pam_unix_cred.so.1
ppp        auth required                pam_unix_auth.so.1
ppp        auth required                pam_dial_auth.so.1
#
# GDM Autologin (explicit because of pam_allow).  These need to be
# here as there is no mechanism for packages to amend pam.conf as
# they are installed.
#
gdm-autologin auth  required    pam_unix_cred.so.1
gdm-autologin auth  sufficient  pam_allow.so.1
#
# Default definitions for Authentication management
# Used when service name is not explicitly mentioned for authentication
#
other        auth requisite                pam_authtok_get.so.1
other        auth required                pam_dhkeys.so.1
other        auth required                pam_unix_cred.so.1
other        auth required                pam_unix_auth.so.1
#
# passwd command (explicit because of a different authentication module)
#
passwd        auth required                pam_passwd_auth.so.1
#
# cron service (explicit because of non-usage of pam_roles.so.1)
#
cron        account required        pam_unix_account.so.1
#
# cups service (explicit because of non-usage of pam_roles.so.1)
#
cups        account        required        pam_unix_account.so.1
#
# GDM Autologin (explicit because of pam_allow) This needs to be here
# as there is no mechanism for packages to amend pam.conf as they are
# installed.
#
gdm-autologin account  sufficient  pam_allow.so.1
#
# Default definition for Account management
# Used when service name is not explicitly mentioned for account management
#
other        account requisite        pam_roles.so.1
other        account required        pam_unix_account.so.1
#
# Default definition for Session management
# Used when service name is not explicitly mentioned for session management
#
other        session required        pam_unix_session.so.1
#
# Default definition for Password management
# Used when service name is not explicitly mentioned for password management
#
other        password required        pam_dhkeys.so.1
other        password requisite        pam_authtok_get.so.1
other        password requisite        pam_authtok_check.so.1
other        password required        pam_authtok_store.so.1
#
# Support for Kerberos V5 authentication and example configurations can
# be found in the pam_krb5(7) man page under the "EXAMPLES" section.
#
# smb settings set by napp-it installer
other   password required   pam_smb_passwd.so.1 nowarn
4. edit /etc/krb5/krb5.conf (care about lower/uppercase) example
# Domain is local.de, AD server is 192.168.2.124
Code:
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or http://www.opensolaris.org/os/licensing.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
#
# Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
# Use is subject to license terms.
#
# ident        "%Z%%M%        %I%        %E% SMI"
#

# krb5.conf template
# In order to complete this configuration file
# you will need to replace the __<name>__ placeholders
# with appropriate values for your network and uncomment the
# appropriate entries.
#
[libdefaults]
        default_realm = LOCAL.DE


[realms]
        LOCAL.DE = {
                kdc = 192.168.2.124
                admin_server = 192.168.2.124
                kpasswd_server = 192.168.2.124
                kpasswd_protocol = SET_CHANGE
        }

[domain_realm]
        .local.de = LOCAL.DE

[logging]
        default = FILE:/var/krb5/kdc.log
        kdc = FILE:/var/krb5/kdc.log
        kdc_rotate = {

# How often to rotate kdc.log. Logs will get rotated no more
# often than the period, and less often if the KDC is not used
# frequently.

                period = 1d

# how many versions of kdc.log to keep around (kdc.log.0, kdc.log.1, ...)

                versions = 10
        }

[appdefaults]
        kinit = {
                renewable = true
                forwardable= true
        }

5. set lmauth level ex to 4
sharectl set -p lmauth_level=4 smb

6. join ad with an AD adminuser ex to domain local.de
smbadm join -u administrator local.de

7. reload SMB server
svcadm reload smb/server
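
To verify the join (standard illumos tools):
Code:
# show the domain the SMB server has joined
smbadm list

# check that the SMB service is online
svcs smb/server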



If you want to set Domain ACLs from Windows:
- the Windows machine must be a member of the AD domain
- connect via SMB as root or an AD admin and set the ACLs
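
Locally on OmniOS/Solaris you can inspect and set the same NFSv4 ACLs on the shared filesystem, for example (user name and path are placeholders):
Code:
# show the ACL of a shared folder
/usr/bin/ls -Vd /pool1/share

# add an ACE that allows a user to read, write and traverse the folder
/usr/bin/chmod A+user:alice:read_data/write_data/execute:allow /pool1/share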

You can control connected users or open files from Windows:
- connect via SMB as an AD admin
- connect Computer Management to your OmniOS/Solaris SMB server
- edit shares, users and open files

Backup/Restore
If you backup/restore files (kernel-based SMB server only, not with SAMBA):
All ACLs remain intact without any needed settings or mappings, as the OS/ZFS kernel-based SMB server offers native support for Windows SIDs


Another method with uid/gid from AD server, see
 

gea

Which Slog besides Intel Optane to protect the ZFS RAM-based write cache?

In the last years the situation was quite easy. When you wanted to protect data in the ZFS RAM-based write cache for databases or VM storage, you just enabled sync write. To limit the performance degradation with disk-based pools you simply added an affordable lower-capacity Intel Optane from the 800, 90x, 1600 or 4801 series as an Slog. Models differ mainly in guaranteed powerloss protection (every Optane is quite ok regarding plp) and max write endurance.

Nowadays it becomes harder and harder to find them, so what to do?

Disk-based pools:
If you still use disk-based pools for VMs or databases you really need an Slog. Without one, a ZFS disk pool offers no more than maybe 10-50 MB/s of sync write performance. In such a situation I would try to get one of the Optanes, either new or used. As an Slog needs a minimal size of only around 8GB, you may also look for a used DRAM-based RMS-200 or RMS-300 from Radian Memory Systems, more or less the only real alternative to Intel Optane (see the example below).
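
Adding an Slog to an existing pool is a one-liner (pool and device names are placeholders):
Code:
# add an Optane/RMS-200 class device as Slog to a disk-based pool
zpool add tank log c2t0d0

# or mirror the Slog
# zpool add tank log mirror c2t0d0 c2t1d0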

What I would consider:
Use a large disk-based pool for filer usage or backup only, where you do not need sync write. The ZFS pool is then fast enough for most use cases. Add a second smaller/faster pool with NVMe/SSDs for your VM storage or databases and simply enable sync without an extra dedicated Slog. The ZIL of the pool (a fast pool area without fragmentation) then protects sync writes. Only take care of NVMe/SSD powerloss protection (a must for sync write), low latency and high 4k write iops. As a rule of thumb, look for an NVMe/SSD with plp and more than, say, 80k write iops at 4k. Prefer NVMe over SSD, and 2x 12G/24G SAS like WD SS 530/540 or Seagate Nytro (nearly as fast as NVMe) over 6G Sata SSD.

Special vdev mirror
As an option you can also use data tiering on a hybrid pool with disks + an NVMe/SSD mirror. In such a case performance-critical data like small I/O, metadata, dedup tables or complete filesystems for VMs or databases with a recordsize <= a settable threshold lands on the faster part of the pool, based on the physical data structures. This is often more efficient than classic tiering methods for hot/recent data that must be moved between SSD and disks (see the sketch below).
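
A sketch of such a hybrid pool (device names and the threshold are examples):
Code:
# add a mirrored special vdev to a disk-based pool
zpool add tank special mirror c3t0d0 c3t1d0

# per filesystem: all blocks <= 64K land on the special vdev
zfs set special_small_blocks=64K tank/vms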
 