Will copying DEDUPED DATA via [ZFS Replication] result in UN-DEDUPED data..?


TrumanHW

Active Member
Sep 16, 2018
253
34
28
MY SITUATION:
It took me over 2Y before someone explained what DEDUPE does to performance..! o_O :mad:
Obviously disabling dedupe does nothing for previously written data, it just stops deduping new data.
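To illustrate that point, here is a toy sketch in Python. It is not how ZFS is implemented: `ToyPool` and its fields are made-up names, and a "block" is just a byte string. It only models the idea that dedup is applied at write time, so switching it off leaves older blocks shared, while copying the logical data into a pool that never had dedup writes every block out in full.

```python
import hashlib


class ToyPool:
    """Very rough model of a pool (hypothetical, not real ZFS internals)."""

    def __init__(self, dedup: bool):
        self.dedup = dedup
        self.logical = []    # one entry per block written: key of its physical block
        self.physical = {}   # physical key -> data actually stored
        self.ddt = {}        # checksum -> physical key (stand-in for the dedup table)

    def write(self, data: bytes) -> None:
        digest = hashlib.sha256(data).hexdigest()
        if self.dedup and digest in self.ddt:
            # Dedup hit: reference the existing physical block, store nothing new.
            self.logical.append(self.ddt[digest])
            return
        key = len(self.physical)
        self.physical[key] = data
        if self.dedup:
            self.ddt[digest] = key
        self.logical.append(key)

    def read_all(self):
        """Yield every logical block in full, the way a copy sees the data."""
        for key in self.logical:
            yield self.physical[key]


source = ToyPool(dedup=True)
for block in [b"A", b"A", b"B", b"A"]:
    source.write(block)

source.dedup = False   # turning dedup off later...
source.write(b"A")     # ...only changes how NEW writes are stored

target = ToyPool(dedup=False)   # the destination never had dedup
for block in source.read_all():
    target.write(block)

print("source:", len(source.logical), "logical,", len(source.physical), "physical,", len(source.ddt), "DDT entries")
print("target:", len(target.logical), "logical,", len(target.physical), "physical,", len(target.ddt), "DDT entries")
```

In this toy model the source still carries its old table entries after dedup is switched off, while the target ends up with one physical block per logical block and an empty table.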

Obviously, I wanna grant my data a governor's pardon (via the ZFS Replication Task) -- to an array which is not & will NEVER be deduped.


THE QUESTION:
Please tell me this SOLVES the issue and doesn't just kick the DeDupe can down to the server I'm transferring it to..?



MY REASONING:
  • I ask bc it's a BLOCK-LEVEL vs OBJECT-LEVEL copy task.
  • Source LAN Adapters: 1 Type: SFP+ Link Speed: 10 Gb
  • Target: LAN Adapters: 1 Type: SFP+ Link Speed: 10 Gb
  • Communicating through an L2 SFP+ (10GbE) switch.
  • The first 13 min -- the transfer rate was ~3 GB/min or ~42 MB/s
  • After ~6 hours it's at 960 GB ... which is ≤ 46 MB/s ...
  • It should be faster, but, it IS copying a dataset that's probably 50% deduped.
  • Which is part of why I want to confirm ... that this isn't just replicating the problem, also.
(I forgot I'd copied some data to the zPool prior and mistakenly calculated with that aggregate instead of only what's copied today).
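
As a quick sanity check on those rates, here is a small Python sketch of the arithmetic. It assumes decimal units (1 GB = 1000 MB) and ignores protocol overhead on the link; the helper name `mb_per_s` is made up for illustration. In these units 3 GB/min works out to 50 MB/s, so the ~42 MB/s figure would correspond to closer to 2.5 GB/min.

```python
def mb_per_s(gigabytes: float, seconds: float) -> float:
    """Average transfer rate in MB/s for `gigabytes` moved in `seconds` (decimal units)."""
    return gigabytes * 1000 / seconds


# 3 GB moved in one minute
rate_first_minutes = mb_per_s(3, 60)

# 960 GB moved in roughly 6 hours
rate_six_hours = mb_per_s(960, 6 * 3600)

# Raw ceiling of a 10 Gb/s link: 10,000 megabits per second / 8 bits per byte
link_ceiling = 10_000 / 8

print(f"3 GB/min      = {rate_first_minutes:.1f} MB/s")
print(f"960 GB in 6 h = {rate_six_hours:.1f} MB/s")
print(f"10 Gb/s link  = {link_ceiling:.0f} MB/s (before overhead)")
```

Either way, the observed rate is only a few percent of what the link itself could carry, so the network is unlikely to be the bottleneck.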
 

RageBone

Active Member
Jul 11, 2017
617
159
43
From what you write, I assume you are running FreeNAS or TrueNAS, because to my knowledge there is no replication task built into ZFS.

So I'm a bit confused by your reasoning and the implied question.

It has been a long time since I did replication stuff on my FreeNAS, but I think I could choose which underlying tool to use.
The default probably was rsync.
The zfs alternative would be zfs send.

I think it should be using rsync right now. Have a look into the task lists close to the power button.
It might list the rep task as rsync.
 

TrumanHW

I assume you're running a Freenas or Truenas...
Thank you for replying, yes, I'm using both. I'm replicating:
  • from FreeNAS
  • to TrueNAS.


I'm a bit confused by your question ... but:
FreeNAS allows you to choose the underlying tool used, i.e., RSync or alternatively, ZFS send.

Have you ever used a low-performance system with dedupe enabled? lol.

I'm already successfully transferring (as mentioned) -- about 1 TB so far, or 8% (which is great).
It's using RSync I believe -- but, bc I used ZFS replication, it's also using ZFS send features (checksumming data to ensure intact transfers).

My concern -- is whether the data will still be DEDUPED!
I had enabled DEDUPE on the SOURCE machine, then, disabled it, but that doesn't 'undedupe' it.
My entire reason for transferring is to UN-DEDUPE the data. Dedupe is just brutal on performance.

I just want reassurance (given that under 50 MB/s is NOTHING like the full BLOCK-LEVEL PERFORMANCE of 8x 7200 RPM drives to 8x 7200 RPM drives over 10GbE) ...

That slowness is (perhaps)... a residual effect of having INITIALLY deduped my SOURCE machine
...which, even though I disabled dedupe, doesn't revert the data to a 'duplicated' state.
...thus, I'm doing this whole exercise [TO] un-deduplicate it ... and get the performance the array should've had.
...and, that I'm not just sending the same, exact, deduplicated state of the data to another machine ... (that's going to remain deduped).

As my goal is being RID of DeDuped compression and to once again, GLORIOUSLY have ALL THE DUPES I WANT. :)

Thanks for putting up with this pedantic yet, still confusingly worded topic.
While you'll have had a better life if you're unable (blissfully ignorant) to help me...
Perhaps because misery loves comfort..? I'm hoping there's been enough aggravation to go around in order for me to be helped.

Thanks
 

ericloewe

Active Member
Apr 24, 2017
295
129
43
30
There's a semi-deprecated send dedup feature which operates on the stream and is independent of on-disk dedup, but I don't think there ever was support for sending deduped data straight from disk (like you can do with compression/large blocks).
So yeah, replicating to a different pool without dedup is guaranteed to work.

Also note that replicating to a new dataset on the same pool, with dedup disabled, and later deleting the source dataset should have the same effect, albeit more slowly as the DDT gets async destroyed.
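
For anyone doing this by hand rather than through the FreeNAS/TrueNAS replication task, here is a rough Python sketch that only builds the command lines involved; it does not run anything. The dataset, snapshot and pool names are placeholders, and it assumes the receiving side's OpenZFS version accepts `-o property=value` on `zfs receive` and reports `dedupratio` in the usual `1.00x` form. Between two machines, the send and receive halves would normally be joined through something like ssh.

```python
import shlex


def build_commands(src: str, snap: str, dst: str, dst_pool: str) -> dict:
    """Return argv lists for a one-off send/receive plus two follow-up checks."""
    snapshot = f"{src}@{snap}"
    return {
        "snapshot": ["zfs", "snapshot", snapshot],
        "send": ["zfs", "send", snapshot],
        # -o dedup=off sets the property on the received dataset
        # (assumes the target's zfs supports -o on receive).
        "receive": ["zfs", "receive", "-o", "dedup=off", dst],
        # Afterwards: confirm the property and the pool-wide dedup ratio.
        "check_property": ["zfs", "get", "-H", "-o", "value", "dedup", dst],
        "check_ratio": ["zpool", "get", "-H", "-o", "value", "dedupratio", dst_pool],
    }


def parse_dedup_ratio(text: str) -> float:
    """Turn zpool's '1.00x' style output into a float."""
    return float(text.strip().rstrip("x"))


# Placeholder names, not taken from the thread.
cmds = build_commands("tank/data", "undedup", "newpool/data", "newpool")
print(shlex.join(cmds["snapshot"]))
print(shlex.join(cmds["send"]), "|", shlex.join(cmds["receive"]))
print(shlex.join(cmds["check_property"]))
print(shlex.join(cmds["check_ratio"]))
```

On a pool that has never had dedup enabled, the ratio check should come back as 1.00x, which `parse_dedup_ratio` would turn into 1.0.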
 

TrumanHW

PHEW! Thank you, guys. Honestly, given that the DeDuped transfer rate would be sub-1MB/s by any means other than ZFS Replicate...

Do you think it'll speed up perhaps once it's past the DeDuped data ..? Thus far for the last 3.6TB (over 23 hours) it's been at 46 MB/s.

THANK YOU
 

thulle

Member
Apr 11, 2019
48
18
8
Do you think it'll speed up perhaps once it's past the DeDuped data ..?
It might be the other way around: if you've already read something deduped, it might be cached and not require reading from disk again. Non-deduped data all comes from disk while still having to check the dedupe table. I'm not sure that send streams are cached, though, so there might be no difference.
 