How to protect a selection of data against bit rot?

Istria

New Member
Feb 4, 2022
Hello all,

I have a small OpenMediaVault NAS running on a thin client (HP T630) using 3 external 2TB USB HDDs.

Most of this data is movies and series, which are not that important to me.
But I'd like to implement a system to protect some 500GB of sentimentally valuable data like family pictures and old camcorder video against corruption, rot or data loss.

Currently, that data is stored in triplicate, one copy on each of the 3 drives (no RAID, just 3 separate drives), plus a 4th copy on a disconnected backup drive at a different location.

So the way I see it, I have the data fairly well protected against:
- Single drive failure (3 more copies)
- Catastrophic NAS failure like house fire, lightning strike, etc. (the 4th offsite backup)

However, I'd like to add protection against data rot, because I'd like these pictures and videos to be available to me uncorrupted for decades to come.

I was thinking it should be as "easy" as keeping 3 instances of the data and periodically comparing them bit by bit to check that all 3 are identical. If one instance differs, the corrupted file would be replaced with the version that the other 2 instances agree on.
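
As a rough illustration of that idea, here is a minimal Python sketch, not a tested tool. It hashes the same relative path in every copy with SHA-256 instead of comparing bit by bit, and treats the hash shared by a strict majority as the good version. The function names and the `dry_run` flag are made up for this example.

```python
import hashlib
import shutil
from collections import Counter
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def repair_by_majority(roots, dry_run=True):
    """Compare each relative path across all roots and fix the odd one out.

    Returns a list of (relative_path, bad_copy, good_copy) tuples.
    If no strict majority exists, bad_copy and good_copy are both None.
    """
    roots = [Path(r) for r in roots]
    # Every relative path that exists in at least one copy.
    rel_paths = sorted(
        {p.relative_to(r) for r in roots for p in r.rglob("*") if p.is_file()}
    )
    actions = []
    for rel in rel_paths:
        # Hash per root; None marks a copy where the file is missing.
        hashes = {r: sha256_of(r / rel) if (r / rel).is_file() else None
                  for r in roots}
        counts = Counter(h for h in hashes.values() if h is not None)
        best_hash, votes = counts.most_common(1)[0]
        if votes == len(roots):
            continue  # all copies agree
        if votes * 2 <= len(roots):
            actions.append((rel, None, None))  # no majority: needs a human
            continue
        good = next(r for r in roots if hashes[r] == best_hash)
        for r in roots:
            if hashes[r] != best_hash:
                actions.append((rel, r / rel, good / rel))
                if not dry_run:
                    (r / rel).parent.mkdir(parents=True, exist_ok=True)
                    shutil.copy2(good / rel, r / rel)
    return actions
```

Calling `repair_by_majority(["/mnt/disk1/pics", "/mnt/disk2/pics", "/mnt/disk3/pics"])` (hypothetical mount points) only reports differences; nothing is overwritten unless `dry_run=False` is passed. Note that this cannot tell corruption from a deliberate edit or deletion made on one drive only, so it only makes sense for data that is never supposed to change.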

My question is: Is this a good way to go? And if so, what software/scripts could I use for this?
And would it also be wise to periodically overwrite all the data (resilvering is the term I believe?). How would I go about that?

Thanks in advance. And if you have alternative better ideas, let me know! I'm open to suggestions.
 

DavidWJohnston

Active Member
Sep 30, 2020
There's probably a more "enterprise way" to do this, but one thing that comes to mind is to use PAR2 files like on Usenet.

PAR2 creates verification and repair files that take up some extra disk space (the amount is adjustable) and can be used to repair a certain amount of corruption in the underlying files. If you store these .par2 files alongside the data, you will have a way to un-corrupt it.
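
To show why extra repair data makes a fix possible at all, here is a toy Python sketch. It is not how PAR2 works internally: PAR2 uses Reed-Solomon codes and can recover several damaged blocks, while this example swaps in plain XOR parity, which can rebuild exactly one lost block. All names here are invented for the illustration.

```python
from functools import reduce


def split_blocks(data, block_size):
    """Split bytes into equal-sized blocks, zero-padding the last one."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if blocks and len(blocks[-1]) < block_size:
        blocks[-1] = blocks[-1].ljust(block_size, b"\0")
    return blocks


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(blocks, parity):
    """Rebuild the single block marked None from the survivors plus parity."""
    missing = [i for i, block in enumerate(blocks) if block is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one missing block")
    survivors = [block for block in blocks if block is not None]
    repaired = list(blocks)
    # XOR of all survivors and the parity block equals the lost block.
    repaired[missing[0]] = xor_blocks(survivors + [parity])
    return repaired


if __name__ == "__main__":
    original = split_blocks(b"family photo bytes", block_size=4)
    parity = xor_blocks(original)      # the stored "repair data"
    damaged = list(original)
    damaged[2] = None                  # pretend this block rotted away
    print(rebuild_missing(damaged, parity) == original)
```

The sketch assumes you already know which block is bad. PAR2 also stores per-block checksums so it can locate the damage first, and its adjustable redundancy level decides how many blocks it can repair.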

Using an SSD would also be a good way to extend the life. Resilvering would probably help HDDs but not SSDs, though I'm not an expert on this topic. There might be a data recovery person on the forum who could answer better.

Here is some info about PAR2:



 

Stephan

Well-Known Member
Apr 21, 2017
Germany
500 GB isn't that much. Get a 5.25" Blu-ray writer and some Verbatim BD-R DL 50 GB media. Slimline writers are usually not the best writers. Verbatim media is still the most reliable imho, and not many players are left. Use ImgBurn to write the media at 1x speed, or whatever the lowest supported speed is. Store the discs in a dry, cool, dark place, preferably each in its own jewel case.
 

Pete.S.

Member
Feb 6, 2019
If you want to make sure your data is intact and untampered with as you move it between drives and different media, you need to generate a checksum at the file level. That is what you see when you download files from reputable sources, usually in the form of a SHA-256 file hash. ZFS or RAID can't help you with that.

You can generate checksums for a bunch of files by installing the md5deep package. It includes the sha256deep utility, which can both generate and check SHA-256 file hashes for you. You can also use the md5deep utility if you want higher speed.

Use it like this:

First let's create an sha256 file for all files and subdirectories that you can use to compare with:

sha256deep -r /familypics/ > sha256sums

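Then let's verify that the files have not been tampered with. In this negative matching mode, sha256deep prints the files whose hashes are not in the list, so no output means everything still matches: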

sha256deep -rX sha256sums /familypics/
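
If md5deep is not available, the same generate-then-verify workflow can be approximated with Python's standard library. This is only a sketch under a few assumptions: the function names are made up, the manifest is kept outside the directory being hashed, and files are compared per path, which is slightly different from sha256deep's matching against a set of known hashes.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root, manifest):
    """Write one '<hash>  <relative path>' line per file under root."""
    root = Path(root)
    lines = [
        f"{file_sha256(p)}  {p.relative_to(root).as_posix()}"
        for p in sorted(root.rglob("*"))
        if p.is_file()
    ]
    Path(manifest).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


def verify_manifest(root, manifest):
    """Return (changed, missing) lists of relative paths from the manifest."""
    root = Path(root)
    changed, missing = [], []
    for line in Path(manifest).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel = line.split("  ", 1)
        target = root / rel
        if not target.is_file():
            missing.append(rel)
        elif file_sha256(target) != expected:
            changed.append(rel)
    return changed, missing
```

For example, `write_manifest("/familypics", "sha256sums")` once, then `verify_manifest("/familypics", "sha256sums")` on a schedule. Files added after the manifest was written are not reported, so the manifest needs regenerating whenever new pictures are added.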


To avoid making copies manually, it's better to store the files on a RAID array, or on ZFS if you prefer that, and then keep a backup off-site. Since it's only 500GB, it's probably easiest to store that backup in the cloud somewhere. That way the backup runs automatically instead of manually. For example, you can use rclone to sync the files on your NAS to the cloud.
 