Bottleneck on my ZFS Proxmox Backup Server???


js00

New Member
Dec 6, 2022
14
0
1
What do you guys think is the bottleneck on the following system, which I will be using as a Proxmox Backup Server (PBS)?

Supermicro 826
MZ32-AR0-rev-30
AMD EPYC 7313 16-Core Processor
256GB DDR4 SAMSUNG 64GB 2RX4 PC4-3200AA-R (3200MHz) (4x64GB)
12x14 TB SAS Ultrastar DC HC530
1x 480GB SSD (Boot)
2x 1.92TB PM983 (ZFS Special Mirror)
Mellanox ConnectX-3 Pro MCX314A-BCCT CX314A Dual Port 40Gb Ethernet Network Card
Adaptec ASR-71605
BPN-SAS-826TQ Backplane

How beneficial are 8 DIMMs vs. 4 with EPYC? Would replacing these with 8x32GB 2666 be better?
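
For a rough sense of the 4 vs. 8 DIMM question: theoretical peak DDR4 bandwidth is the transfer rate times 8 bytes per transfer, times the number of populated channels. The sketch below assumes one DIMM per channel and ignores real-world efficiency, so the numbers are upper bounds on paper, not measurements. The function name is made up for illustration.

```python
def peak_bandwidth_gb_s(channels: int, mt_per_s: int) -> float:
    """Theoretical peak DDR bandwidth in GB/s.

    Each channel is 64 bits wide, so it moves 8 bytes per transfer.
    """
    bytes_per_transfer = 8
    # MT/s * bytes per transfer = MB/s per channel; divide by 1000 for GB/s.
    return channels * mt_per_s * bytes_per_transfer / 1000


current = peak_bandwidth_gb_s(4, 3200)   # 4x64GB DDR4-3200, 4 channels populated
proposed = peak_bandwidth_gb_s(8, 2666)  # 8x32GB DDR4-2666, 8 channels populated

print(f"4 channels @ 3200 MT/s: {current:.1f} GB/s")
print(f"8 channels @ 2666 MT/s: {proposed:.1f} GB/s")
```

On paper the eight slower DIMMs come out ahead, but whether the PBS workload is actually limited by memory bandwidth is a separate question.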
 

SnJ9MX

Active Member
Jul 18, 2019
130
83
28
Why are you concerned about a bottleneck on a backup server? Without knowing any specifics, I'd even say it'll be overkill for backup purposes. The only thing I'd change is to mirror the boot drives.
 

BackupProphet

Well-Known Member
Jul 2, 2014
1,095
658
113
Stavanger, Norway
olavgg.com
Yeah, that setup is way overkill for a backup server. I back up like 100GB every day (most of the data comes from VMs), and this is running on an old Athlon X2 with 8GB of DDR2 ECC memory. Just a gigabit NIC and no special vdev. I even have dedup enabled for the VMs.

The reason I am still running this on an ancient system is that it works fine and I have no reason to fix something that works fine.
 

js00

It will run Proxmox Backup Server for several hundred virtual machines, and the verify jobs use a decent amount of CPU power. Right now I am doing this with a Ryzen 5 5600X and it's too slow.

I think I may change the Mellanox ConnectX-3 to something more reliable like an Intel X520 with DAC (as 10G is plenty). I really don't know about the Adaptec ASR-71605: should I get something more modern? Are newer ones more power efficient/reliable?

Other than running PBS, it will be used for ZFS snapshots, so that part is just bandwidth intensive.
 

BackupProphet

Dedup uses memory bandwidth and I/O. A lot of RAM and special metadata vdevs will help. Desktop-class hardware has only about 1/4 of the memory bandwidth of server-class hardware (two memory channels on a Ryzen vs. eight on an EPYC). If you don't use dedup and a Ryzen 5600X is too slow, then something else is bottlenecking hard.
 

js00

Dedup is disabled

Code:
zfs get dedup zfs
NAME  PROPERTY  VALUE          SOURCE
zfs   dedup     off            default
Are you familiar with Proxmox Backup Server? According to the forums, PBS is very resource intensive and should ideally be stored on SSDs.

Anything I should run to try to find the cause?
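
One possible starting point, as a sketch only: compare how much CPU time goes to actual work (user + system) versus waiting on disks (iowait) while a verify job runs. The helper names below are hypothetical, and the two sample lines are made-up numbers in the format of the first line of /proc/stat on Linux. On the real box you would read that line twice, a few seconds apart, with `open("/proc/stat").readline()`.

```python
def parse_cpu_line(line: str) -> dict:
    """Parse the aggregate 'cpu' line of /proc/stat into named tick counters."""
    names = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]
    fields = line.split()
    # fields[0] is the literal "cpu"; trailing guest counters are ignored.
    return {name: int(value) for name, value in zip(names, fields[1:])}


def usage_percent(before: dict, after: dict) -> dict:
    """Share of each CPU state between two samples, in percent."""
    deltas = {key: after[key] - before[key] for key in before}
    total = sum(deltas.values()) or 1  # avoid dividing by zero on identical samples
    return {key: 100.0 * value / total for key, value in deltas.items()}


# Made-up samples for illustration, not taken from the system in this thread.
first = parse_cpu_line("cpu  1000 0 500 8000 500 0 0 0 0 0")
second = parse_cpu_line("cpu  1200 0 600 8300 900 0 0 0 0 0")
shares = usage_percent(first, second)

busy = shares["user"] + shares["system"]
print(f"user+system: {busy:.1f}%  iowait: {shares['iowait']:.1f}%  idle: {shares['idle']:.1f}%")
```

High iowait with low user+system would point toward the disks rather than the CPU, though iowait is only a rough hint and not proof either way.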
 

Sean Ho

seanho.com
Nov 19, 2019
774
357
63
Vancouver, BC
seanho.com
Is it the backing up that's slow (on your current 5600X system), or the verify? Verify naturally involves a full read of the source and the backup, and a bunch of hashing, then a bit of updating the metadata on-disk. It's very I/O intensive, and the hashing parallelises well and can eat up a whole CPU if you let it. How frequently are you verifying? Is there a time-window constraint on the verify that's not being met (e.g., must finish within 2 hrs)? Or is it that other workloads are impacted during verify?
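
To get a feel for the hashing side, here is a rough sketch that measures single-thread SHA-256 throughput on whatever machine it runs on. PBS chunks are addressed by SHA-256 digest. The 4 MiB buffer size is an assumption, chosen to resemble a fixed-size VM chunk. The sketch ignores decompression, decryption, and disk reads, so it only bounds the pure hashing cost per core.

```python
import hashlib
import time


def sha256_throughput_mb_s(chunk_mib: int = 4, rounds: int = 50) -> float:
    """Hash an in-memory buffer repeatedly; return single-thread throughput in MB/s."""
    buf = bytes(chunk_mib * 1024 * 1024)  # zero-filled; SHA-256 speed does not depend on content
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.sha256(buf).digest()
    elapsed = time.perf_counter() - start
    return len(buf) * rounds / elapsed / 1e6


rate = sha256_throughput_mb_s()
print(f"single-thread SHA-256: {rate:.0f} MB/s")
```

If this per-core figure is far above what the pool can read sequentially, the hashing itself is unlikely to be what limits verify.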
 

js00

It's only the verify jobs that take a long time, sometimes more than 24 hours, so I currently can't run them daily. That is why I wanted to upgrade to an EPYC 7003 CPU rather than a 7002.
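
As a back-of-the-envelope check on the 24-hour window: a verify has to read the data it checks, so the sustained read rate puts a floor on the runtime regardless of CPU. The sketch below uses hypothetical figures (50 TB and 300 MB/s), since the datastore size and the pool's read speed are not stated here.

```python
def required_read_mb_s(data_tb: float, window_hours: float) -> float:
    """Sustained read rate (MB/s) needed to read data_tb terabytes within window_hours."""
    return data_tb * 1e6 / (window_hours * 3600)


def verify_hours(data_tb: float, read_mb_s: float) -> float:
    """Hours needed to read data_tb terabytes at a sustained read_mb_s."""
    return data_tb * 1e6 / read_mb_s / 3600


# Hypothetical figures for illustration only.
needed = required_read_mb_s(50, 24)
hours = verify_hours(50, 300)
print(f"{needed:.0f} MB/s sustained to read 50 TB in 24 h")
print(f"{hours:.1f} h to read 50 TB at 300 MB/s")
```

Plugging in the real datastore size and a measured read rate would show whether a daily verify is even reachable on the disks before looking at the CPU.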