Throughput benchmark check (36 disks, RAIDZ1, 3 vdevs, 9400-16i)


68kdmd

New Member
Oct 25, 2024
Hello, everyone.


Currently running

Epyc 7302P
128GB (8x 16GB) 2667MHz
36x 10TB HGST Ultrastar in RAIDZ1 (width 12, 3 vdevs), just for performance testing, inside an SC847 with BPN-SAS3-846EL1 and BPN-SAS3-826EL1 backplanes.

The HBA is a Lenovo 430-16i (9400-16i firmware, tri-mode) in a PCIe 3.0 x8 slot, with 4 miniSAS HD cables connected to the backplanes (2 each).

All the drives are recognized as 12Gbps in both the BIOS and storcli64.

No optional vdevs or features were configured (dedup, log devices, etc.).

I’m seeing about 2.6GB/s writes and 4+GB/s reads when I run:

fio --ramp_time=5 --gtod_reduce=1 --numjobs=1 --bs=1M --size=100G --runtime=60s --readwrite=write --name=testfile
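
For the read number, the equivalent invocation would be roughly the same command with the direction flipped (a sketch; parameters otherwise identical to the write run above):

fio --ramp_time=5 --gtod_reduce=1 --numjobs=1 --bs=1M --size=100G --runtime=60s --readwrite=read --name=testfile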

Is this in line with what’s expected? If not, what’s the bottleneck in my setup? All 32 threads hover around 30-40% when running fio.

I was thinking that the drives should be able to saturate the SAS3 ports.

Front backplane: (2 connectors, 4 lanes each) = 8 lanes * 12Gbps = 96 Gbps
Each drive = ~2 Gbps (≈250MB/s), 24 drives = 48 Gbps

Rear backplane: (2 connectors, 4 lanes each) = 8 lanes * 12Gbps = 96 Gbps
Each drive = ~2 Gbps, 12 drives = 24 Gbps

Total estimated SAS3 load from the HDDs = ~72 Gbps (≈9 GB/s), well under the 192 Gbps of combined links

PCIe 3.0 x8 = ~8 GB/s theoretical (roughly 7 GB/s usable)

But it seems like I’m getting less than half of that, and I was just curious what I’m not taking into consideration.

Each drive reports around 250MB/s when tested individually.

8 of the 36 drives report 4K sectors, and the rest report 512B.
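
A quick way to check which is which, assuming the drives show up as /dev/sdX under Linux (device names here are placeholders):

lsblk -o NAME,MODEL,PHY-SEC,LOG-SEC     # logical vs physical sector size per drive
smartctl -i /dev/sda | grep -i sector   # per-drive detail, e.g. 512 bytes logical / 4096 bytes physical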

Thanks, everyone, and hope to interact with you more down the road.
 


ca3y6

Active Member
Apr 3, 2021
For writes, I suspect RAID 5 parity becomes the binding performance constraint for any implementation (some more than others).
 

VMman

Active Member
Jun 26, 2013
Something you can check is the “Performance” settings in the BIOS, including disabling C-states, to ensure the PCIe card can sustain burst speeds.
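
A rough OS-side sketch of the same idea (assuming the cpupower utility from your distro's linux-tools/kernel-tools package; exact state numbers and latencies vary by platform):

cpupower idle-info        # show the cpuidle driver and which C-states are available/enabled
cpupower idle-set -D 10   # disable all idle states with an exit latency above 10 microseconds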
 

gea

Well-Known Member
Dec 31, 2010
You have 33 data disks. If you divide the more relevant write value of 2600 MB/s by 33, you get close to 80 MB/s per disk. Given that ZFS does not scale linearly with many disks or vdevs and is more IOPS-limited, since it spreads data blocks evenly over the pool, this is not a very good value but also not a very bad one; it is more or less as expected. PCIe bandwidth and the twin miniSAS3 links are not the limit. What may be relevant are fill rate and recordsize: fastest should be a fill rate below 70% and a recordsize of 1M on disk-based Z1.
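
As a sketch (pool and dataset names are placeholders):

zpool list tank                      # CAP column = fill rate; keep it under ~70%
zfs set recordsize=1M tank/bench     # large recordsize for sequential loads on disk-based Z1
zfs get recordsize tank/bench        # verify the setting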

What you should monitor is iostat of the disks. Under load, all should perform similarly; the weakest disk limits pool performance.
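
For example (pool name is a placeholder):

zpool iostat -v tank 5     # per-vdev and per-disk throughput, refreshed every 5 seconds
iostat -xm 5               # per-device stats from sysstat; a disk stuck near 100% %util is the limiter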

250MB/s per disk is a maximum for a purely sequential (track-by-track in the outer region of the disk) load. This is not how ZFS works in a RAID, where it is not optimized to deliver a movie to a single user as fast as possible but to guarantee constant performance with many files and users. As a rule of thumb, with fewer disks I count on 100-120 MB/s per average disk in a RAID-Z under mixed load, and less with many disks like in your setup.

512B disks should be avoided; force ashift=12.
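
e.g. when creating the pool (a sketch; device names are placeholders, better to use /dev/disk/by-id paths in practice):

zpool create -o ashift=12 tank \
  raidz1 sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl \
  raidz1 sdm sdn sdo sdp sdq sdr sds sdt sdu sdv sdw sdx \
  raidz1 sdy sdz sdaa sdab sdac sdad sdae sdaf sdag sdah sdai sdaj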
 

BoredSysadmin

Not affiliated with Maxell
Mar 2, 2019
4-5 years ago we bought a somewhat similar system from iXsystems. At the time, our requirement was to 100% saturate a 10-gig NIC on long sequential writes (think dozens of TB). Our design also ended up being 36 HDDs, but as RAIDZ2 with 6 drives in each of 6 vdevs. You're getting about 2x our speed in your tests, so I'd say it's pretty good if you ask me. Gea already gave you a more detailed technical explanation as to why.
 

68kdmd

New Member
Oct 25, 2024
Thank you for the replies, everyone. They helped me understand RAIDZ better.

I ran more tests without parity as recommended (and with/without parity at different vdev counts).

Parity definitely made the biggest difference, and I could see the number of vdevs affecting throughput as well.
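
Roughly what one no-parity round looked like, as a sketch (pool name, mountpoint, and disk names are placeholders, abbreviated to a few disks here; zpool destroy wipes the test pool):

zpool destroy bench
zpool create -o ashift=12 -m /bench bench sda sdb sdc sdd    # plain stripe, no parity (test pool only)
cd /bench && fio --ramp_time=5 --gtod_reduce=1 --numjobs=1 --bs=1M --size=100G --runtime=60s --readwrite=write --name=testfile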

It was a learning experience for me, and I'll do another round of testing with C-states off, as @VMman suggested.


Thanks again, everyone!
 


gea

Well-Known Member
Dec 31, 2010
512e = 4K physical.
For ZFS this is the same as 4K-native disks, so ashift=12 is selected automatically for both.
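
If in doubt, the ashift actually in use per vdev can be checked with zdb (pool name is a placeholder):

zdb -C tank | grep ashift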