ZFS Dedup question


mixer

Member
Nov 26, 2011
Should I turn on Dedup at the Pool level, or at the ZFS Folder level?

If I have one Pool with dedup turned off in the NAPP-IT Gui, but I have two separate ZFS Folders with dedup turned on, will there be deduping between the two ZFS folders, or only inside of each one?

Put another way - if I have the exact same file in ZFS Folder #1 and also in ZFS Folder #2, will it be correctly caught if dedup is on for both of those ZFS folders, or must it be on at the higher Pool level?

Thanks!
 

gea

Well-Known Member
Dec 31, 2010
DE
Dedup should work between folders when you activate it on both: the dedup table is per pool, so identical blocks in different dedup-enabled ZFS folders are deduplicated against each other. But unless you really understand the RAM demands of realtime dedup (as ZFS does it), you should NOT use it.

As a common rule, you should have enough RAM for the OS + ARC caching (to keep the server fast) + about 3-5 GB of RAM per TB of dedup data. If you cannot achieve dedup ratios of a factor of 5 or more, disks are much cheaper and usually faster.

If you do not have the RAM and your dedup table has to spill from RAM to disk, even a simple snapshot destroy can take hours or even days.
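A rough command-line sketch (pool and filesystem names are only examples):

Code:
# dedup is set per filesystem, but all dedup-enabled filesystems
# in a pool share one dedup table
zfs set dedup=on tank/folder1
zfs set dedup=on tank/folder2

# pool-wide dedup ratio is shown in the DEDUP column
zpool list tank

# simulate dedup on existing data and report the expected table
# size and ratio without changing anything
zdb -S tank

The 3-5 GB per TB rule follows from the usually quoted figure of roughly 320 bytes of in-core dedup-table entry per block: at a 64K average block size, 1 TB of data is about 16 million blocks, so around 5 GB of table.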
 

mixer

Member
Nov 26, 2011
Hmm. Something to think about there. Here is my current ARC statistics output, with three 750 GB drives in RAID-Z1:

Code:
ARC Readcache: arcsummary 
http://cuddletech.com/arc_summary 

System Memory:
	 Physical RAM: 	12279 MB
	 Free Memory : 	1528 MB
	 LotsFree: 	191 MB

ZFS Tunables (/etc/system):

ARC Size:
	 Current Size:             7954 MB (arcsize)
	 Target Size (Adaptive):   7954 MB (c)
	 Min Size (Hard Limit):    1406 MB (zfs_arc_min)
	 Max Size (Hard Limit):    11255 MB (zfs_arc_max)

ARC Size Breakdown:
	 Most Recently Used Cache Size: 	 81% 	6451 MB (p)
	 Most Frequently Used Cache Size: 	 18% 	1502 MB (c-p)

ARC Efficency:
	 Cache Access Total:        	 3275344802
	 Cache Hit Ratio:      99%	 3269711001   	[Defined State for buffer]
	 Cache Miss Ratio:      0%	 5633801   	[Undefined State for Buffer]
	 REAL Hit Ratio:       95%	 3126276786   	[MRU/MFU Hits Only]

	 Data Demand   Efficiency:    99%
	 Data Prefetch Efficiency:    46%

	CACHE HITS BY CACHE LIST:
	  Anon:                        4% 	 139759292            	[ New Customer, First Cache Hit ]
	  Most Recently Used:          0% 	 26684665 (mru)      	[ Return Customer ]
	  Most Frequently Used:       94% 	 3099592121 (mfu)      	[ Frequent Customer ]
	  Most Recently Used Ghost:    0% 	 211365 (mru_ghost)	[ Return Customer Evicted, Now Back ]
	  Most Frequently Used Ghost:  0% 	 3463558 (mfu_ghost)	[ Frequent Customer Evicted, Now Back ]
	CACHE HITS BY DATA TYPE:
	  Demand Data:                 2% 	 73702267 
	  Prefetch Data:               0% 	 744969 
	  Demand Metadata:            92% 	 3029055461 
	  Prefetch Metadata:           5% 	 166208304 
	CACHE MISSES BY DATA TYPE:
	  Demand Data:                 5% 	 292912 
	  Prefetch Data:              15% 	 866653 
	  Demand Metadata:            32% 	 1853526 
	  Prefetch Metadata:          46% 	 2620710 
---------------------------------------------
I guess having 0% MRU Ghost and MFU Ghost hits means I have plenty of RAM for my current usage. I have dedup on for a couple of ZFS folders, but I was hoping to try more.
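I believe the raw counters behind those percentages can also be pulled straight from kstat on OpenIndiana:

Code:
# non-zero ghost hits would mean the ARC wanted to be bigger than it is
kstat -p zfs:0:arcstats | grep ghost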

My usage of the system will be similar with the new drives, which will be three 3 TB drives in RAID-Z1. Also, I could put in an SSD as an L2ARC read cache, which I understand would hold the dedup table(s) -- I hope there is a mechanism so that if the SSD dies, the dedup table is still regularly backed up somewhere on the disk set?
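If I do add one, I gather it just gets attached as a cache device, something like this (pool and device names are made up):

Code:
# add an SSD as an L2ARC cache device on the existing pool
zpool add tank cache c4t1d0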

Thank you for your advice Gea, please continue to help me learn! What is your advice?
 

mixer

Member
Nov 26, 2011
OK... I admit I'm just a little excited about ZFS and want to turn on all the features I can. Still, I think there could be some benefit to dedup for me. I only recently added a bunch of RAM to the system and turned on dedup on a few existing ZFS folders (which I understand only dedups new data as it is written). The dedup ratio is at 1.07 right now.

The use cases would be:

a) ESXi virtual machines, which are stored on an NFS-shared ZFS folder (probably not more than a few gigs to save there, though)
b) some file archives: misc. software install images and files I want to hang on to but don't need on any local machine (probably not much to dedup there either)
c) a family 'media' folder that could contain multiple copies of the same home movies, music libraries, etc.
d) work files which could be both in a project-backup ZFS folder and in a separate client-accessed download folder (big zip files; this could come to maybe 100 gigs in a worst-case scenario)

I know Gea was commenting on RAM prices versus disk prices, but if it's true that an inexpensive 64 or 128 GB MLC SSD could hold the dedup tables in lieu of having more RAM, maybe I could try that. I have 12 GB of RAM dedicated to the OI VM right now. Does it work that way?
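From what I've read, the size of the dedup table I already have can apparently be checked directly (assuming the pool is called tank):

Code:
# show entry counts and the on-disk / in-core size of the existing dedup table
zdb -DD tank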

What are your thoughts, sotech?
 

sotech

Member
Jul 13, 2011
Australia
We tried dedup in a number of different systems and each time it was not worth the consequences (performance hits, and what Gea mentions above about snapshot destruction taking an extraordinary amount of time). I think it fits a specific usage scenario which few home users will ever match. Compression, on the other hand, is well worth the CPU usage for most people, as it gives a notable performance and space benefit depending on the data type. Some reading:

http://constantin.glez.de/blog/2011/07/zfs-dedupe-or-not-dedupe

http://icesquare.com/wordpress/how-to-improve-zfs-performance/#section13

The second link describes a similar performance hit to what we experienced - he dropped from 80 MB/s to 5 MB/s; we went from about 150 MB/s to 8 MB/s.

Another poor experience:

http://christopher-technicalmusings.blogspot.com.au/2011/07/zfs-dedup-performance-real-world.html

20x performance decrease there.

Lastly:

http://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSDedupMemoryProblem
 

Mike

Member
May 29, 2012
EU
I was thinking the other day about having deduplication activated during the night and not during work hours. I do not know how quickly dedup catches up after being disabled for a day, as I stopped using ZFS years ago, but it could be great for a home user if it is fast enough. Too bad btrfs currently doesn't support it; dedup on subvolumes would be great, I think. Let's wait another kernel release before migrating ;)
 

sotech

Member
Jul 13, 2011
Australia
With regard to how fast it recovers: dedup only applies to data as it is written, and any already-deduped data stays deduped even after you turn dedup off. Would there be much data written during the night to dedupe?
 

Mike

Member
May 29, 2012
EU
I was under the impression that it would scan for duplicate data when enabled, but you're saying it only checks data as it is written. That's too bad.
That makes it a bit tougher to benefit in an offline-dedup case. I would likely have to copy the data (duplicate it :)) and then remove the original files, with dedup on, to gain anything at all - and I would need the space for three copies of the duplicated blocks while doing it.
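Something per-file like this is what I mean (the filename is only an example, and any snapshots would keep the old blocks allocated regardless):

Code:
# rewrite an existing file so it passes through the dedup write path;
# dedup only applies to blocks as they are written
zfs set dedup=on tank/media
cp -p /tank/media/movie.avi /tank/media/movie.avi.new
mv /tank/media/movie.avi.new /tank/media/movie.avi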

I guess it isn't worth it in its current form.
 

mixer

Member
Nov 26, 2011
Sotech and Gea, thanks for taking the time to talk sense into me. Those links were great. No dedup for me... it seems that even with a good SSD L2ARC I would still experience reduced write speed as my pool begins to fill up.
 

sotech

Member
Jul 13, 2011
Australia
Yeah, it's kind of a shame that it really isn't worth it in its current form for most home users - it's a pretty nifty idea, and if some smart boffin figures out a different way of doing it, it could work out quite well. As it stands, though, I have yet to meet anyone who benefited from it.

Try the various compression algorithms and see how you go there :)
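For example, something along these lines (filesystem names are just placeholders; plain compression=on means lzjb, and the gzip levels trade CPU for a better ratio):

Code:
# enable compression per filesystem and see how much it actually saves
zfs set compression=on tank/data          # lzjb: cheap on CPU
zfs set compression=gzip-6 tank/archive   # better ratio, more CPU
zfs get compressratio tank/data tank/archive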