ZFS deduplication on BMP files


phoenixhua

New Member
Feb 24, 2022
3
0
1
I know video files are encrypted, and encryption is supposed to make the contents of the file look random, so deduplication does not take effect on video files.
BMP files are lossless and not encrypted, so why does ZFS deduplication not take effect on BMP files?
 

ericloewe

Active Member
Apr 24, 2017
293
128
43
30
No, you are mistaken. Most media is compressed, and most codecs have many options. But, unless you go around converting formats left and right and playing around, there's no impact on deduplication (which always sucks, by the way), because the same file, compressed the same way, with the same compressor, will be identical.

Compression, on the other hand, is a very different matter. ZFS implements lossless compression schemes, for obvious reasons, which are inherently limited in how much they can gain. Media is typically compressed in a lossy, but acceptable manner, for much better compression ratios, and lossless compression is normally already a part of that process. So, compressed media is unlikely to benefit from additional generic compression. However, LZ4 is safe to have enabled in ZFS even then, performance-wise, which is nice if you have mixed media and other data in the same dataset.
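
To illustrate why a second, generic compression pass gains little, here is a minimal sketch. It uses `zlib` from the Python standard library as a stand-in (ZFS would use LZ4, which is not in the standard library), and it uses random bytes to approximate codec output. Both are assumptions made for illustration only.

```python
import os
import zlib

# Highly repetitive bytes stand in for uncompressed, compressible content.
compressible = b"ZFS stores data in records. " * 4096

# Random bytes stand in for media a codec has already compressed.
# This is an assumption for illustration, not a real video or image file.
already_compressed = os.urandom(len(compressible))


def ratio(data: bytes) -> float:
    """Return original size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data, 6))


print(f"repetitive data: {ratio(compressible):.1f}x")
print(f"random data:     {ratio(already_compressed):.2f}x")
```

The repetitive buffer shrinks dramatically, while the random buffer stays at roughly 1x, which is the behaviour to expect from data that has already been through a codec.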
 

Rand__

Well-Known Member
Mar 6, 2014
6,626
1,767
113
I have never used deduplication but I think it works differently than you assume it does...

It should work at block level, and if there are blocks that are identical they can be deduplicated.

But of course it might be different on Linux?

So if you copy the same BMP to a pool with deduplication 10000 times, it should not use 10000 times the size of a single copy.
Have you tested it this way? Or how did you test, and why do you think it's not working?
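
The block-level idea can be sketched in a few lines. This is a toy model, not how ZFS is implemented: it cuts each file into fixed-size blocks, hashes every block with SHA-256, and divides the number of blocks written by the number of distinct blocks. The 128 KiB block size matches the default ZFS `recordsize`; the function name `dedup_ratio` and the random 1 MiB "image" are made up for the example.

```python
import hashlib
import os

BLOCK_SIZE = 128 * 1024  # assumed block size; 128 KiB is the default recordsize


def dedup_ratio(files: list[bytes], block_size: int = BLOCK_SIZE) -> float:
    """Logical blocks written divided by unique blocks kept."""
    unique = set()
    logical = 0
    for data in files:
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            unique.add(hashlib.sha256(block).digest())
            logical += 1
    return logical / len(unique)


# One made-up 1 MiB "image": random bytes, so no block repeats inside it.
image = os.urandom(1024 * 1024)

# 100 byte-identical copies: 800 logical blocks, only 8 unique ones.
print(dedup_ratio([image] * 100))  # 100.0
```

Byte-identical copies deduplicate perfectly in this model. Files that merely look similar only benefit where whole blocks are identical.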
 

phoenixhua

I sampled 200 frames from a camera stream and saved them as BMP files to the zpool. The zpool list command shows DEDUP as 1.00x.
 

Rand__

So they are 200 different files with similar content... you might want to split and diff them to see if there are identical blocks
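
One way to do that check without a pool is to hash fixed-size blocks of the files and count how many are distinct. The sketch below is only an estimate under assumptions: the helper name `count_blocks` is hypothetical, the 128 KiB default mirrors the default `recordsize`, and the real on-disk result also depends on compression and other dataset settings. The demo at the bottom uses two temporary files that share their first 4 KiB.

```python
import hashlib
import os
import tempfile
from collections import Counter
from pathlib import Path


def count_blocks(paths, block_size=128 * 1024):
    """Hash every fixed-size block of every file; return (total, unique)."""
    seen = Counter()
    for path in paths:
        with open(path, "rb") as f:
            while True:
                block = f.read(block_size)
                if not block:
                    break
                seen[hashlib.sha256(block).digest()] += 1
    return sum(seen.values()), len(seen)


# Demo: two temporary files whose first 4 KiB block is identical.
with tempfile.TemporaryDirectory() as tmp:
    shared = os.urandom(4096)
    a = Path(tmp, "a.bin")
    b = Path(tmp, "b.bin")
    a.write_bytes(shared + os.urandom(4096))
    b.write_bytes(shared + os.urandom(4096))
    total, unique = count_blocks([a, b], block_size=4096)
    print(total, unique)  # 4 blocks in total, 3 of them unique
```

For the camera frames, something like `count_blocks(sorted(Path("frames").glob("*.bmp")))` would do, where `frames` is a placeholder directory name. If `total` equals `unique`, no block repeats and a DEDUP of 1.00x is what you should expect.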
 

dswartz

Active Member
Jul 14, 2011
610
79
28
Or create a text file with a few KB of text, then copy the file several times with different names?
 

Rttg

Member
May 21, 2020
71
47
18
Also, unless you really know what you’re doing (or you’re willing to totally rebuild your storage pools), don’t use deduplication. You may not see the results you expect in terms of space savings, and performance can really fall off a cliff without careful planning.

For 99.9% of scenarios, use compression, not dedup.
 

Rand__

It's quite possible, that's why I said to test it ;)

Presumably the deduplication block size is the dataset's block size, so you could increase your chances by running a smaller block size there...
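
A small sketch of why block size matters, under made-up assumptions: two synthetic 256 KiB buffers share one 64 KiB region at the same offset and are random everywhere else. The helper `shared_blocks` is hypothetical and only counts matching block hashes.

```python
import hashlib
import os


def shared_blocks(a: bytes, b: bytes, block_size: int) -> int:
    """Count blocks of b whose hash also occurs among the blocks of a."""
    def hashes(data):
        return [hashlib.sha256(data[i:i + block_size]).digest()
                for i in range(0, len(data), block_size)]
    in_a = set(hashes(a))
    return sum(1 for h in hashes(b) if h in in_a)


common = os.urandom(64 * 1024)  # region both synthetic "files" share
a = os.urandom(64 * 1024) + common + os.urandom(128 * 1024)
b = os.urandom(64 * 1024) + common + os.urandom(128 * 1024)

for bs in (128 * 1024, 64 * 1024, 4 * 1024):
    print(f"{bs // 1024:>3} KiB blocks: {shared_blocks(a, b, bs)} shared")
```

With 128 KiB blocks the shared region is always mixed with differing bytes, so nothing matches. At 64 KiB one block matches, and at 4 KiB sixteen do. The trade-off is that smaller blocks mean many more entries in the dedup table, and this only helps when the shared data really is byte-identical and aligned.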
 

ericloewe

"Everywhere the same on OpenZFS (FreeBSD, illumos/OmniOS/OI, Linux, OSX, Windows), but different on native Oracle Solaris ZFS."
This specific thing might even be the same over at Oracle ZFS, because it's an old feature. Could be wrong, though.

Rand__ said:
"Presumably the deduplication block size is the dataset's block size, so you could increase your chances by running a smaller block size there..."
For live video? I doubt it very much; there's a good reason motion vectors have been a thing since at least H.261 in the 80s.
 

Rand__

I have no clue, never played with deduplication. That's why I said give it a try ;)