Napp-IT Replication Integrity After NIC Failure


sonoracomm

New Member
Feb 10, 2017
Hi all,

We have been using the Napp-IT Replication Feature for a long time as a (multi-level) backup. We've been lucky and haven't needed to restore from it, so I'm not well versed in this feature/technology.

We had an incident where a NIC failed after moving a storage server (third-level backup) to another location with much slower connectivity (VPN) to the two source servers.

Obviously there were a few Replication jobs that failed or were incomplete.

Do I need to do anything to verify or repair the replicated datasets after the same/existing jobs have resumed successful scheduled operation with no ongoing errors?

Thanks in advance,

G
 

gea

Well-Known Member
Dec 31, 2010
The key point is that you need identical snapshot pairs on source and destination (same repli_nn snap number) to continue incremental replications. With correct snapshot pairs you can simply restart or reverse a replication (set the old destination filesystem to rw and create a new replication job there with the same job id).
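
For illustration, with hypothetical names (tank/data as source, backup/data as destination; these are not napp-it defaults), the check and the rw step could look roughly like:

  # on source and on destination: list the replication snaps, newest last,
  # and compare the repli_nn numbers of the latest pair
  zfs list -t snapshot -o name,creation -s creation -r tank/data
  zfs list -t snapshot -o name,creation -s creation -r backup/data

  # before reversing: make the old destination filesystem writable again
  zfs set readonly=off backup/data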

If an incremental replication fails, e.g. due to a network error, you can just restart/retry. In rare cases the last destination snap is damaged. As napp-it replication preserves at least the last three snap pairs, you can destroy the newest destination snap; the next job run will then be based on the former pair.
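
A minimal sketch of that recovery, assuming a made-up snap name backup/data@job1_repli_nr_42 (check your actual snapshot list first):

  # destroy only the newest (possibly damaged) destination snap
  zfs destroy backup/data@job1_repli_nr_42
  # then rerun the replication job; it will continue from the previous snap pair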

If you do not have a snap pair with the same number, rename the destination filesystem, e.g. to filesystem.old, and start the job for a new full initial transfer. After success, destroy the filesystem.old that you had kept simply as a backup.
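
A hedged sketch of that fallback with placeholder names (napp-it offers this via its menus; plain zfs commands shown here only to make the steps explicit):

  # keep the old copy as a safety net
  zfs rename backup/data backup/data.old

  # ...run the job again for a new full initial transfer...

  # once the new full replication has succeeded:
  zfs destroy -r backup/data.old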

If you (re)run a replication without errors, it was checksum protected and ok. There is no need to verify or repair anything due to former errors.
 

evawillms

New Member
Oct 6, 2023
gea said: The key point is that you need identical snapshot pairs on source and destination (same repli_nn snap number) to continue incremental replications. [...]
How do I know if DFS replication is working?
 

gea

Well-Known Member
Dec 31, 2010
ZFS takes care of data integrity; a distributed filesystem (DFS) only organizes shares on top of it.
If the first is intact, the second should be as well.
 

aswet456

New Member
Sep 30, 2024
gea said: The key point is that you need identical snapshot pairs on source and destination (same repli_nn snap number) to continue incremental replications. [...]
Napp-IT replication integrity is crucial for ensuring data consistency, especially after a network interface card (NIC) failure. In such a scenario the system has to fail over cleanly so that data integrity is maintained and operations can continue without disruption. That means validating the replication status and verifying that all data is correctly synchronized across the storage nodes, with robust error detection and correction to mitigate any data loss caused by the NIC failure.
 

gea

Well-Known Member
Dec 31, 2010
With ZFS it is quite simple:
either a replication finishes without an error and a new destination snap is created, or it fails and no new snap appears.
In the latter case (network or any other problem), just restart the job to retry.
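
If you want to check that by hand (dataset name below is just a placeholder), look at the newest snaps on the destination and compare them with the source, e.g.:

  # the newest destination snap should match the newest source repli snap
  zfs list -t snapshot -o name,creation -s creation -r backup/data | tail -3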