Napp-IT Replication Integrity After NIC Failure


sonoracomm

New Member
Feb 10, 2017
7
0
1
67
Hi all,

We have been using the Napp-IT Replication Feature for a long time as a (multi-level) backup. We've been lucky and haven't needed to restore from it, so I'm not well versed in this feature/technology.

We had an incident where a NIC failed after moving a storage server (third-level backup) to another location with much slower connectivity (VPN) to the two source servers.

Obviously there were a few Replication jobs that failed or were incomplete.

Do I need to do anything to verify or repair the replicated datasets after the same/existing jobs have resumed successful scheduled operation with no ongoing errors?

Thanks in advance,

G
 

gea

Well-Known Member
Dec 31, 2010
3,431
1,335
113
DE
The key point is that you need identical snapshot pairs on source and destination (same repli_nn snap number) to continue incremental replications. With a correct snapshot pair you can simply restart or reverse a replication (set the old destination filesystem to rw and create a new replication job there with the same job id).

If an incremental replication fails, e.g. due to a network error, you can just restart/retry. In rare cases the last destination snap is damaged. As a napp-it replication preserves at least the last three snap pairs, you can destroy the newest destination snap; the next job run will then be based on the previous pair.

If you do not have a snap pair with the same number, rename the destination filesystem, e.g. to filesystem.old, and start the job for a new full initial transfer. After success, destroy filesystem.old, which you had kept simply as a backup.

If you (re)run a replication without errors, it was checksum protected and is ok. There is no need to verify or repair anything because of former errors.
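As a rough sketch, those three recovery steps map to standard ZFS commands. Pool/filesystem names (tank/data, backup/data) and the repli_nn snapshot numbers below are placeholders, not your actual job names:

```shell
# Compare replication snapshots on source and destination to find
# the newest common snap pair (names here are examples).
zfs list -t snapshot -o name -s creation tank/data | grep repli
zfs list -t snapshot -o name -s creation backup/data | grep repli

# If the newest destination snap is suspected damaged, destroy it so
# the next job run falls back to the previous snap pair:
zfs destroy backup/data@repli_101

# If no common snap pair exists at all: keep the old destination as a
# backup and let the job do a new full initial transfer.
zfs rename backup/data backup/data.old
# ...run the replication job; after it succeeds:
zfs destroy -r backup/data.old
```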
 

evawillms

New Member
Oct 6, 2023
2
0
1
How do I know if DFS replication is working?
 

gea

Well-Known Member
Dec 31, 2010
3,431
1,335
113
DE
ZFS takes care of data integrity; a distributed filesystem organizes shares.
If the first is intact, the second should be as well.
 

gea

Well-Known Member
Dec 31, 2010
3,431
1,335
113
DE
With ZFS it is quite simple.
Either a replication finishes without an error and a new destination snap is created, or it does not (replication failed).
In the latter case (network or any other problem), just restart to retry.
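That check can be sketched from the shell; the dataset name is an assumption, substitute your own:

```shell
# Show the newest snapshot on the destination filesystem; after a
# successful run it carries the new replication snap number.
zfs list -t snapshot -o name,creation -s creation backup/data | tail -1
```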
 

Lother00001

New Member
Dec 5, 2024
1
0
1
Great to see Napp-IT replication in action! If the jobs resumed with no errors, the datasets should be fine, since ZFS ensures integrity. For extra peace of mind, run zfs diff to check for discrepancies and consider a periodic scrub to keep backups solid.
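A sketch of those two checks, with placeholder pool/dataset/snapshot names:

```shell
# Show what changed between the last two replication snaps on the destination
zfs diff backup/data@repli_100 backup/data@repli_101

# A scrub re-reads every block and verifies it against its checksum
zpool scrub backup
zpool status backup   # shows scrub progress and any errors found
```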
 

gea

Well-Known Member
Dec 31, 2010
3,431
1,335
113
DE
An incremental replication based e.g. on common snap nr 100 creates a new snap 101 on the source to transfer the difference. If snap 101 appears on the destination, the replication was successful.
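Under the hood this corresponds to an incremental zfs send/receive. A minimal sketch with assumed names (tank/data, backup/data, backuphost):

```shell
# Create the new source snap and send only the delta since the common snap
zfs snapshot tank/data@repli_101
zfs send -i tank/data@repli_100 tank/data@repli_101 | \
    ssh backuphost zfs receive backup/data

# Success check: snap 101 now exists on the destination
ssh backuphost zfs list -t snapshot backup/data | grep repli_101
```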
 

gea

Well-Known Member
Dec 31, 2010
3,431
1,335
113
DE
ZFS replications are checksum protected. If a replication runs through and a new base snap appears on the destination, it finished successfully; otherwise, on any error (LAN or other), the replication stops with an error and no new base snap is generated on the destination. This is not napp-it specific; this is how ZFS replication works.

When you start an incremental replication, a rollback of the destination filesystem to the last common base snap (the result of a former successful replication) is always done, so a former unsuccessful replication run, or any other modification or write access to the destination filesystem, does not matter: it is discarded by the rollback.

Resumable transfers are a method to reduce the amount of data that must be retransferred after errors, mostly relevant for large initial transfers; the rollback is the method that protects incremental replications.

System checkpoints are a method to go back to a former pool/vdev state, e.g. prior to adding a vdev; they are not related to replications.

Redundant hardware (psu, network, hba, sas, raid etc.) improves availability. ZFS data protection works without it.
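For illustration, both mechanisms exist as stock OpenZFS flags; dataset names are placeholders:

```shell
# receive -F rolls the destination back to the last common snap before
# applying the incremental stream, discarding any interim writes there
zfs send -i tank/data@repli_100 tank/data@repli_101 | \
    zfs receive -F backup/data

# Resumable transfer (useful for large initial sends): receiving with -s
# stores a resume token on the destination if the stream is interrupted
zfs send tank/data@repli_100 | zfs receive -s backup/data
# after an interruption, read the token and resume from where it stopped:
zfs get -H -o value receive_resume_token backup/data
zfs send -t <token> | zfs receive -s backup/data
```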