Any benchmarks that show resilver times comparing Linux MDADM vs a hardware card?


Road Hazard

Member
Feb 5, 2018
Been using Debian on my Plex server for a while. Whenever I add drives to expand the array, disk I/O takes a nose dive as MDADM eats up everything and Plex struggles to stream a single show. I don't know enough about Linux (or MDADM) to issue commands that instruct it to yield disk I/O to programs like Plex or my FTP server when it's doing a resilver.

I -LOVE- how flexible MDADM is because in the couple of years I've been using it, I had to pull all the drives out and move them to another box 2 times. Each time, MDADM saw the array and happily imported it and away it went. I know that with hardware RAID cards, there's the potential problem of buying a new replacement card and it not being able to import your existing array due to a BIOS mismatch or something else.

Since I'm not a fan of ZFS, and I worry a TINY TINY bit about bit-rot, I like how hardware cards will look at the data on the drives (and the parity bits) and, if data set 1 and parity match but data set 2 doesn't, fix data set 2. With MDADM, it assumes the data bits are always correct and just recalculates the parity if something doesn't line up.
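(As I understand it, the closest thing on the MDADM side is the 'check'/'repair' scrub you can kick off through sysfs - it spots mismatches, but a 'repair' just rewrites the parity from the data blocks. A rough sketch, assuming the array is md0:)

echo check > /sys/block/md0/md/sync_action     # read-only scrub, counts mismatched stripes
cat /sys/block/md0/md/mismatch_cnt             # non-zero means something didn't line up
echo repair > /sys/block/md0/md/sync_action    # recalculates parity from the data blocks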

TLDR: Currently, when I add two 4TB drives to my existing array, MDADM takes about 18 hours to reshape. I've always wondered if using a dedicated hardware RAID card would cut that time in half, so I'm just looking for benchmarks of MDADM vs a hardware card while reshaping an array.
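(For context, the expansion itself is just the usual add + grow; the device names and new device count below are only examples for my layout:)

mdadm --add /dev/md0 /dev/sdx /dev/sdy          # the two new disks go in as spares first
mdadm --grow /dev/md0 --raid-devices=22 --backup-file=/root/md0-grow.bak
cat /proc/mdstat                                # reshape progress and ETA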

EDIT: My CPU is an i7 7700K and all hard drives are WD Red 4TBs sitting in a Super Micro 846 chassis.

Thanks
 

BackupProphet

Well-Known Member
Jul 2, 2014
Stavanger, Norway
olavgg.com
Is the only reason you're not liking ZFS that you cannot expand the pool with a single vdev? A hardware RAID will be significantly faster, as the parity calculation I/O can be done in memory. But you don't get compression, snapshots, self-healing and so on.
 

CyklonDX

Well-Known Member
Nov 8, 2022
Everything has trade-offs in implementation. It highly depends on your disks' random seek read/write performance, as the whole data set has to be rehashed and rebuilt to span over more disks.

At home I prefer ZFS, and I don't add vdevs - I just create another pool, create my media folders and simply have Jellyfin (in my case) scan those new folders as well.
(I also have the same chassis; I do a pool with 4 disks, then add 4 more disks and create a new pool when I need to expand. Since I have so much room for disks, I can easily upgrade to larger ones by loading them in and creating bigger pools with bigger disks.)


[attached screenshots of the pool layout]
(The first pool is actually local SSDs for KVMs - not in the storage array - as I made my chassis into a JBOD.)
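The whole thing is basically just this (pool name, layout and devices are only examples):

zpool create tank2 raidz1 /dev/sdw /dev/sdx /dev/sdy /dev/sdz   # new 4-disk pool next to the existing one
zfs create tank2/media                                          # dataset for the new media folders
# then add /tank2/media as an extra library folder in jellyfin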
 

sko

Active Member
Jun 11, 2021
Of course you can add more vdevs to a pool - actually that's the default way to extend a pool...
For small pools just use mirrors - they offer by far the most flexibility, ease of configuration and predictable space. raidz loses *a lot* of space to padding, especially with very low disk counts (e.g. 3 or 4 drives per vdev), is comparatively (very) slow and takes _a lot_ of time to resilver.
Especially in single-vdev pools you should not expect any reasonable IOPS for anything but a low-load data grave. With raidz vdevs you absolutely *need* to spread load across multiple vdevs to get decent performance, so only use raidz vdevs in large pools or if you are extremely constrained in physical space.
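e.g. extending a pool with another mirror vdev is a one-liner (pool and device names are only examples):

zpool add tank mirror /dev/disk/by-id/ata-DISK1 /dev/disk/by-id/ata-DISK2
zpool status tank    # the new mirror shows up as an additional top-level vdev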

As for mdraid or lvm(2): yes, those are horribly slow with today's disk sizes and should be considered a thing of the past. lvm(2) is practically unusable after a few disk changes/upgrades (and its 'snapshot' implementation is worthless in production...), whereas a ZFS pool works perfectly fine even if all disks of the pool have been replaced several times. I still run pools that were created 6 or 7 years ago and are on their third or even fourth generation of disks, and/or have more or fewer vdevs than they were created with. Those pools perform exactly the same as newly created ones on similar hardware...
 

i386

Well-Known Member
Mar 18, 2016
Germany
I've always wondered if using a dedicated hardware RAID card would cut that time in half, so I'm just looking for benchmarks of MDADM vs a hardware card while reshaping an array.
I can only speak for Microchip/Adaptec controllers, I don't have other brands at home.
Hardware RAID "reshaping" is slow*. These controllers do the reshaping in small transactions and small "chunks" and will prioritize I/O from the host over the "reshaping" task, which extends the time drastically.

I played around with mdadm a few months ago and you can set limits on how "fast" mdadm should do certain tasks -> mdadm was faster (but I think normal I/O was extremely slow, didn't test it)

*I could write at ~300 MByte/s (sequential) to a RAID 6 with 14 HDDs being expanded to 16 HDDs; with fewer disks this will be slower.
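The limits I mean are the md sync speed tunables; roughly like this (the numbers are only examples to tune for your own setup):

cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max   # current limits, KB/s per device
echo 5000   > /proc/sys/dev/raid/speed_limit_min    # low floor so normal I/O wins while you stream
echo 200000 > /proc/sys/dev/raid/speed_limit_max    # ceiling for how fast the reshape may go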
 

Road Hazard

Member
Feb 5, 2018
Is the only reason you're not liking ZFS that you cannot expand the pool with a single vdev? A hardware RAID will be significantly faster, as the parity calculation I/O can be done in memory. But you don't get compression, snapshots, self-healing and so on.
Yes, that's a big problem I have with ZFS..... not being able to add 1 or 2 drives at a time to expand my array. Last I heard, this feature was being worked on.

As for mdraid or lvm(2): yes, those are horribly slow with today's disk sizes and should be considered a thing of the past. lvm(2) is practically unusable after a few disk changes/upgrades (and its 'snapshot' implementation is worthless in production...), whereas a ZFS pool works perfectly fine even if all disks of the pool have been replaced several times. I still run pools that were created 6 or 7 years ago and are on their third or even fourth generation of disks, and/or have more or fewer vdevs than they were created with. Those pools perform exactly the same as newly created ones on similar hardware...
Yes it is.... it's slow but stable and is pretty much bulletproof. I really need to figure out how to make it yield I/O when something else needs to access the array.

I can only speak for Microchip/Adaptec controllers, I don't have other brands at home.
Hardware RAID "reshaping" is slow*. These controllers do the reshaping in small transactions and small "chunks" and will prioritize I/O from the host over the "reshaping" task, which extends the time drastically.

I played around with mdadm a few months ago and you can set limits on how "fast" mdadm should do certain tasks -> mdadm was faster (but I think normal I/O was extremely slow, didn't test it)

*I could write at ~300 MByte/s (sequential) to a RAID 6 with 14 HDDs being expanded to 16 HDDs; with fewer disks this will be slower.
With my existing MDADM RAID 6 setup (20 4TB drives with 1 hot spare), I can write to it at 500+ MB/s. When it's reshaping after I add 2 drives.... write performance gets cut in half, but reading....... the hit on that is a negative number :).

Did you try out Bcache with MDADM?
I've read up on BcacheFS but it needs some more 'maturity' under its belt before I'd run it on a production system.

How do you connect your disks? Motherboard ports?
They're all 7200RPM SATA drives plugged into a backplane that uses an SFF cable into an LSI 9207 (I think) HBA.
 

oneplane

Well-Known Member
Jul 23, 2021
Hardware RAID cards are mostly pointless, unless you are stuck with an OS or Hypervisor that doesn't know how to deal with hardware properly (like ESX or Windows). They are also only sold on the extreme ends: a single-node server with either 1 OS or a lame hypervisor, or when you buy some multi-rack 3PAR or EMC thing. Not much in between.
 

Road Hazard

Member
Feb 5, 2018
BCache is not BcacheFS, you also have lvm-cache
Snap. Sure, I've been using Linux on my server for a few years but I still consider myself a newbie with it. :) I'll check that out, thanks!
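From the docs I've skimmed so far, pairing bcache with an md array would look roughly like this (device names are examples, and as far as I can tell it formats the backing device, so it's for a fresh array rather than one that's already full of data):

make-bcache -C /dev/nvme0n1 -B /dev/md0   # SSD becomes the cache, the md array the backing device
mkfs.ext4 /dev/bcache0                    # format and mount the cached device instead of /dev/md0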

Hardware RAID cards are mostly pointless, unless you are stuck with an OS or Hypervisor that doesn't know how to deal with hardware properly (like ESX or Windows). They are also only sold on the extreme ends: a single-node server with either 1 OS or a lame hypervisor, or when you buy some multi-rack 3PAR or EMC thing. Not much in between.
No hypervisor here. Running Debian 11 on bare metal.

(maybe,) ionice
In the years I've been using mdadm (and searching everywhere for ways to try and tame disk I/O), I have never, ever come across any mention of that program. Thank you!
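For my own notes, the idea seems to be something along these lines (the process names are guesses for my box, it only has an effect with an I/O scheduler like bfq that honours priorities, and I'd have to test whether the md kernel threads respect it at all):

ionice -c2 -n0 -p "$(pidof 'Plex Media Server')"   # best-effort class, highest priority for Plex
ionice -c3 -p "$(pgrep 'md0_(resync|reshape)')"    # idle class for the md sync thread so it backs off under load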

... otherwise check out OpenCAS as a cache solution >> Open Cache Acceleration Software | Open CAS
I'll read up on that, thanks!