What are alternatives to ZFS for deduplication?


ATS

Member
Mar 9, 2015
The HP Superdome was originally a Unix server with up to 64 sockets. HP switched from Itanium CPUs to x86 CPUs, put Linux on top, and apparently reduced it to 16 sockets. Why did HP not keep 64 sockets? Is that because Linux scales badly? Or "to keep the cost down"?
Because there is actually very little market for large SMP machines. The few that are sold are almost never used as large SMP machines; instead they are partitioned, either hard or soft, into generally much smaller machines. And that market is shrinking further as core counts per socket expand.

It does not make sense to "keep the cost down", as big SMP servers are the highest-premium segment, costing many millions. For instance, the old IBM P595 cost $35 million with only 32 sockets. Money is not an issue in this segment; it is the most expensive segment and commands the highest prices. It is like selling the most expensive sports car with a plastic interior to "keep the cost down", or the most expensive car with only a 200-horsepower engine to "keep the cost down". These things will never happen: if you can afford a million-dollar sports car, you can afford a decent interior or a better engine. Money is no object. My take is that, because there has never been a large 16-socket Linux server before, Linux must necessarily scale badly, so there would be no point in selling Linux on their 64-socket Integrity servers. HP tried that before; google "Big Tux Linux", where Linux had utterly bad scaling on a 64-socket server, with CPU utilization of around 40% under full load. HP learned its lesson, and now repackages its Unix servers with Linux, in decreasing sizes.
Money is certainly an issue in this segment. For the most part the large SMP machines are considered by everyone to be fully legacy machines. No one designing a new application system is targeting large SMP machines because they are all cost/performance losers. And as I pointed out, your pricing on the P595 doesn't pass the laugh test, especially since we have historical documented list pricing for a fully configured P595 from the 2008 TPC-C submission that completely disagrees with your statement. And Linux scales fairly well, despite your opinion on the matter.

This Linux kernel developer did not understand that Bonwick (the father of ZFS) was talking about SMP servers, not clusters. There are no 1024-CPU SMP servers out there, and there never have been. The largest SMP servers are old Unix servers (Solaris, AIX, HP-UX) with 64 sockets (Solaris had a 144-socket server years ago). Linux's largest server was 8 sockets until last year, when HP repackaged their Unix servers for Linux. There have never been 16-socket Linux servers before, let alone 1024-CPU SMP servers. A lot of ignorance (FUD?) from the Linux camp.
No, the largest SMP servers ever released were the SGI Origin 2K/3K (MIPS/IRIX-based) and SGI Altix 3K/4K (IA64/Linux-based) machines, supporting upwards of 1024 sockets in fully cache-coherent SMP systems. The x86-based successors to these systems, the UV 2K and UV 300, support up to 256 and 32 sockets respectively in a single cache-coherent SMP system and run Linux. As for 16-socket x86 servers, IBM had them a decade ago. And yes, they ran Linux.

So, while you may be ignorant of the actual history, the reality is that these machines exist, and you can still buy them today if you want.
 
  • Like
Reactions: T_Minus and Patrick

mstone

Active Member
Mar 11, 2015
Yup, Linux scaled on SGI hardware to 1024 processors almost a decade ago and commonly ran in production partitions of 512. Their current generation has 256 sockets/4096 cores. The poster above who pointed out that Sun's offerings were dogs is correct. Most of their systems were chopped up to support a number of smaller images in an HA configuration rather than being used for HPC workloads. (The Achilles heel was always memory bandwidth, which is why SGI moved to NUMA almost 20 years ago, as Sun eventually did too.) It is true that Sun delivered the high point for UMA SMP machines, but there's a reason everyone has moved away from that architecture. It probably wouldn't have gotten as far as it did if it weren't for the dot-com dollars flowing in. (From a cost perspective it almost never made sense to buy an E10K to run a bunch of systems instead of buying a number of smaller systems, unless you truly had money to burn and really wanted to be able to point to a giant purple rack.) Mainstream Linux/x86 systems don't scale like that not because it's impossible, but because very few people want to do it. Application development for scaling to large systems is hard, and cache coherency has a cost. The reality is that SPARC is a strictly legacy technology that's being revved for customers already tied to it (including Oracle's own in-house products).
 
  • Like
Reactions: T_Minus and Patrick

Baddreams

New Member
Mar 15, 2011
I'm reading in the FreeNAS manual that I should allow 5 GB of RAM for every 1 TB of deduplicated data. Is there a way to get the same deduplication result, maybe without ZFS, but without needing large amounts of additional RAM?

I'd like to consolidate some old (full) backups, so a lot of the files will definitely be duplicated from one backup to the next.
Yes, deduplication in Windows Server 2012 R2. It works amazingly well.
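For context on the quoted 5 GB per 1 TB rule of thumb, here is a minimal back-of-the-envelope sketch in Python. The ~320 bytes per dedup-table entry and the 64 KiB average block size are rough assumptions commonly used for this kind of estimate, not figures from this thread; actual memory use depends on your recordsize and how well the data dedups.

```python
# Rough ZFS dedup-table (DDT) RAM estimate.
# Assumptions (approximate, for illustration): ~320 bytes of core memory per
# DDT entry, one entry per unique block. With a 64 KiB average block size this
# reproduces the familiar "5 GB RAM per 1 TB of deduped data" rule of thumb.

DDT_ENTRY_BYTES = 320          # assumed in-core size of one DDT entry
AVG_BLOCK_SIZE = 64 * 1024     # assumed average block size in bytes

def ddt_ram_estimate(pool_bytes: int, avg_block: int = AVG_BLOCK_SIZE) -> float:
    """Return an estimated DDT RAM footprint in GiB for a deduped pool."""
    blocks = pool_bytes / avg_block
    return blocks * DDT_ENTRY_BYTES / 2**30

if __name__ == "__main__":
    one_tib = 2**40
    print(f"1 TiB @ 64 KiB blocks  -> ~{ddt_ram_estimate(one_tib):.1f} GiB RAM")
    print(f"1 TiB @ 128 KiB blocks -> ~{ddt_ram_estimate(one_tib, 128 * 1024):.1f} GiB RAM")
```

Doubling the average block size roughly halves the estimate, which is one reason real-world DDT memory use varies so much from the manual's conservative figure.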
 

uOpt

Member
Dec 12, 2014
Boston, MA, USA
I found that dedup doesn't need a lot of RAM during normal operations, though massive amounts of fast deletions might. Re-mounting a filesystem after a crash, however, did require large amounts of RAM. Nothing bad happened to it on low RAM; the mount just wouldn't complete until I temporarily connected the entire array to a mainboard with a lot of RAM.
 

general.zod2

New Member
Oct 18, 2016
This is old, but the HAMMER filesystem on DragonflyBSD does deduplication. It works quite well. But then you'd need to use DragonflyBSD because the filesystem isn't available elsewhere.
 

BackupProphet

Well-Known Member
Jul 2, 2014
Stavanger, Norway
olavgg.com
You can do dedup at the file level. There is a program called rdfind which will replace duplicate files with hardlinks. This means that two equal files will refer to the same inode, and deleting one of them will not affect the other hardlink. Only when you remove both of them is the inode the hardlinks point to actually deleted.
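To make the mechanism concrete, here is a minimal Python sketch of the same idea (this is not rdfind itself, just an illustration): hash every file under a directory, then replace later duplicates with hardlinks to the first copy. The target path is a placeholder.

```python
# Minimal illustration of file-level dedup with hardlinks (what rdfind automates).
# Sketch only -- try it on a copy of your data first, and only within one filesystem,
# since hardlinks cannot cross filesystem boundaries.
import hashlib
import os
from pathlib import Path

def file_digest(path: Path) -> str:
    """SHA-256 of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def hardlink_duplicates(root: Path) -> None:
    seen: dict[tuple[int, str], Path] = {}   # (size, digest) -> first file seen
    for path in root.rglob("*"):
        if not path.is_file() or path.is_symlink():
            continue
        key = (path.stat().st_size, file_digest(path))
        original = seen.get(key)
        if original is None:
            seen[key] = path
        elif original.stat().st_ino != path.stat().st_ino:
            # Replace the duplicate with a hardlink to the first copy.
            tmp = path.with_name(path.name + ".dedup-tmp")
            os.link(original, tmp)           # create the new link first
            os.replace(tmp, path)            # then atomically swap it in
            print(f"linked {path} -> {original}")

if __name__ == "__main__":
    hardlink_duplicates(Path("/path/to/backups"))  # hypothetical directory
```

rdfind is faster in practice because it compares sizes and first/last bytes before checksumming. The hardlink caveat applies either way: all links share one inode, so if a later backup modifies one "copy" in place, every linked copy changes with it.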
 
  • Like
Reactions: Kybber