Is this a reason why a dedicated RAID controller would be better vs. MDADM?


Road Hazard

Member
Feb 5, 2018
Using Debian 10.4 and I have a software MDADM RAID 6 array formatted as XFS. Zero complaints BUT, I think I read that during a weekly scrub if MDADM detects an error in a RAID 6 setup, it doesn't hold an election process and just assumes the parity drive is always correct. If that is correct, would this be ONE good reason why a dedicated RAID card would be better when compared to an MDADM RAID?

I have a Supermicro 846 24 bay server with a SAS2 backplane. The SM has a single SFF-8087 port on the backplane that I feed into an LSI HBA. If I were to switch to using a dedicated RAID card, would I be able to plug the backplane straight into that and remove the HBA? What sort of "gotcha!" issues do I need to be on the lookout for if I switch to a RAID card in my setup? I know the Dell H700 is a popular card but I saw a post on Reddit (that went unanswered) about using an H700 with an SM and they talked about how the H700 only supported a total of 16 drives....and I have 20 in my server.

PS Before anyone suggests it, I have no desire to use BTRFS or ZFS.
 

gea

Well-Known Member
Dec 31, 2010
In a RAID-6 array, the OS sees the array as a single disk. A write to the disks is processed by the RAID subsystem, which creates RAID stripes that are written sequentially to the disks. A crash during such a write sequence can lead to a corrupt RAID array (not all stripes are written) or, depending on the RAID level and filesystem, possibly a corrupted filesystem (data written but metadata not updated).

As you do not want to use a new-generation filesystem, which protects against this with Copy on Write and uses data checksums to detect problems, my best bet would be a hardware RAID card with BBU/flash protection. After a crash, writes that were committed but not yet on disk are then completed on the next reboot, similar to the slog approach of ZFS.

The problem is called the write hole problem; see "Write hole" phenomenon in RAID5, RAID6, RAID1, and other arrays.
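
A small illustration of the write hole (a toy in Python, not how md actually handles writes; the helper name is made up):

def xor_all(values):
    # RAID5-style parity is just the XOR of the data blocks
    v = 0
    for x in values:
        v ^= x
    return v

stripe = [0x11, 0x22, 0x33]        # data blocks on three disks
parity = xor_all(stripe)           # parity block on a fourth disk

# Update one data block, then "crash" before the new parity reaches its disk
stripe[1] = 0x99
# parity = xor_all(stripe)   <- this write never happens

assert xor_all(stripe) != parity   # the stripe no longer adds up
# If a disk fails now, rebuilding from the stale parity produces wrong data.
# A BBU/flash-backed cache (or md's optional write journal) replays or completes
# the interrupted stripe write after a reboot, which closes this hole.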
 

EffrafaxOfWug

Radioactive Member
Feb 12, 2015
Using Debian 10.4 and I have a software MDADM RAID 6 array formatted as XFS. Zero complaints BUT, I think I read that during a weekly scrub if MDADM detects an error in a RAID 6 setup, it doesn't hold an election process and just assumes the parity drive is always correct.
I'm not sure what you mean by an election process; RAID6 has a double distributed parity, so if a disc were to vanish, the parity block X could still be found on two other drives. Additionally, RAID6 uses positional error-correcting codes in the parity (Reed-Solomon error checking) to verify the "correctness" of the data. RAID5 uses a simple XOR and only has a single parity, so in the event of data loss it has no option but to assume the parity is always correct.
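
To make that concrete, here's a toy one-byte-per-disc sketch in Python (purely illustrative, nothing to do with mdadm's internals) of why a single XOR parity can detect a mismatch but can't point at the offending block:

import functools, operator

data = [0x11, 0x22, 0x33, 0x44]             # one byte per data disc
parity = functools.reduce(operator.xor, data)

corrupted = data[:]
corrupted[1] ^= 0xFF                        # silent corruption on disc 1

recomputed = functools.reduce(operator.xor, corrupted)
print(recomputed != parity)                 # True: the scrub sees a mismatch...

# ...but the XOR gives no clue as to *which* of the five blocks (four data plus
# parity) changed, so a RAID5 repair has to pick one side to trust.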

(My understanding of the maths behind this is very limited so I could well be wrong, do you have a link to what you read?)

ZFS and btrfs are inherently more resilient since every single block gets a checksum (a simple CRC32C by default in btrfs, although I believe it now supports stronger hashes too), so random bit-flips can be detected and repaired from redundancy.
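
As a toy example of why per-block checksums sidestep the guessing game (illustrative only - this isn't how either filesystem actually lays out its checksums, and I'm using zlib's plain crc32 just for the demo):

import zlib

block = b"important data"
stored_checksum = zlib.crc32(block)          # kept in metadata, away from the block

def looks_good(candidate, checksum):
    return zlib.crc32(candidate) == checksum

mirror_a = bytearray(block)
mirror_b = bytearray(block)
mirror_a[3] ^= 0x01                          # bit flip on one mirror

good = mirror_a if looks_good(bytes(mirror_a), stored_checksum) else mirror_b
assert bytes(good) == block                  # the bad copy is identified, not guessed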

Personally I've been on mdadm RAID at home since forever - never moved to ZFS because the lack of expandability, growing, reshaping and rebalancing is a deal breaker for me - but I do do regular checks of all of the files I care about via hashdeep for peace of mind.
 

Road Hazard

Member
Feb 5, 2018
I'm not sure what you mean by an election process
It was always my understanding that if MDADM detects a mismatch while doing its monthly scrub (while using RAID6), it will assume the data stripe is ALWAYS correct and recalculate the parity stripes to match. But what if the data stripe is wrong? If MDADM held an election process during scrubs and found a problem, it could say, "OK, I have a data mismatch. The data stripe is supposed to be... 5, according to BOTH parity stripes, so I'm going to assume those 2 data points are correct and recalculate the data stripe based on this."

BUT, from all the reading I've ever done, it looks like if MDADM ever encounters a mismatch, it defaults to assuming the data stripe is correct and re-calculates parity based on whatever the data stripe says.

Another way to think of it..... if 3 people are standing in front of a door and 1 person says, "Go on in, it's totally safe" and the other two are saying, "Don't go in there. If you do you'll die". 2 vs 1...... listen to the majority and walk away. MDADM listens to the first person and goes on in.
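
Something like this little vote is what I have in my head (just a Python toy to illustrate the idea, not claiming mdadm or any RAID card actually works this way):

from collections import Counter

def xor_all(values):
    v = 0
    for x in values:
        v ^= x
    return v

others = [0x11, 0x22, 0x44]          # the other data blocks in the stripe
true_value = 0x33                    # what the block in question should hold
p_parity = xor_all(others + [true_value])

stored = true_value ^ 0xFF           # the data disk returned a corrupted value
from_p = xor_all(others) ^ p_parity  # what the P parity says the block should be
from_q = true_value                  # pretend Q's reconstruction agrees with P

winner, votes = Counter([stored, from_p, from_q]).most_common(1)[0]
print(f"majority value: 0x{winner:02X} ({votes} of 3 votes)")   # 0x33 wins 2 vs 1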

I totally understand what RAID5/6 is.... the CoW stuff, the write hole.... all of that. I was just wondering, with a hardware RAID card, if it encounters a mismatch during a scrub, does it hold an election process to see which stripe is correct or, like mdadm, just blindly go with one or the other?
 

Road Hazard

Member
Feb 5, 2018
In a RAID-6 array, the OS sees the array as a single disk. A write to the disks is processed by the RAID subsystem, which creates RAID stripes that are written sequentially to the disks. A crash during such a write sequence can lead to a corrupt RAID array (not all stripes are written) or, depending on the RAID level and filesystem, possibly a corrupted filesystem (data written but metadata not updated).

As you do not want to use a new-generation filesystem, which protects against this with Copy on Write and uses data checksums to detect problems, my best bet would be a hardware RAID card with BBU/flash protection. After a crash, writes that were committed but not yet on disk are then completed on the next reboot, similar to the slog approach of ZFS.

The problem is called the write hole problem; see "Write hole" phenomenon in RAID5, RAID6, RAID1, and other arrays.
See my other reply to Effrafax for a better breakdown. I understand all of what you're saying but I'm just wondering how smart mdadm is (when compared to a hardware RAID card) when it comes to determining whether it's the data stripe or the parity stripe that's corrupt when it encounters a mismatch during a scrub.
 

msg7086

Active Member
May 2, 2017
Not an expert, but in RAID-5 it's impossible to know which drive holds incorrect data, while in RAID-6 it's possible. So I personally don't think MDADM will assume the data stripe holds correct data and wipe both parity stripes. Again, this is just my personal understanding. If I were the developer, I wouldn't make that assumption for RAID-6.
 

EffrafaxOfWug

Radioactive Member
Feb 12, 2015
As has been mentioned, with RAID5 there's only one possible parity, so that's the only piece of redundancy the RAID algo has to work with. When reconstructing your data stripe, if the sole available parity disagrees then you've got nothing else to fall back on.

With RAID6, you've got two parities to choose from; if the two parities turn out to be different, you've got a combination of the XOR and Reed-Solomon codes for figuring out which of those is "correct". There's nothing like an election process happening.
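
Here's a little sketch of that locate-the-bad-block property, one byte per disc, using the standard GF(2^8) generator g = 2 from the kernel's RAID6 maths paper (purely illustrative Python, not md's actual code):

def gf_mul(a, b):
    # Multiply two bytes in GF(2^8) with the RAID6 polynomial 0x11D
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        b >>= 1
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1D
    return p

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def parities(data):
    # P is a plain XOR; Q weights data block i by g^i (g = 2)
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow(2, i), d)
    return p, q

data = [0x11, 0x22, 0x33, 0x44]     # four data "discs"
p, q = parities(data)

bad = data[:]
bad[2] ^= 0x5A                      # silent corruption, no disc reported failed

# A scrub recomputes P and Q from the data and compares them with what's stored.
p2, q2 = parities(bad)
dp, dq = p ^ p2, q ^ q2             # both non-zero => a data block is wrong
                                    # (dp only => P is bad, dq only => Q is bad)

# Because dq = g^z * dp, the index z of the bad block falls straight out:
z = next(i for i in range(len(data)) if gf_mul(gf_pow(2, i), dp) == dq)
bad[z] ^= dp                        # dp is exactly the corruption pattern
assert bad == data and z == 2

Whether md's check/repair code actually exploits that property, rather than just recomputing both parities from whatever data it reads, is exactly the bit I haven't been able to pin down.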

I'm not really sure what else mdadm or a hardware RAID card should be doing in these scenarios; I assume they would both deal with recovery in the same way.

from all the reading I've ever done, it looks like if MDADM ever encounters a mismatch, it defaults to assuming the data stripe is correct and re-calculates parity based on whatever the data stripe says.
In the event of a RAID reconstruction, there isn't a data stripe - the whole point of the rebuild process is to recreate the missing data stripe from the available data + parity stripes. To use wikipedia's example diagram, if you lost disc0, you would lose data block A1, amongst others. A1 will be reconstructed from the data in A2 and A3 in combination with one or both of the Ap/Aq parity blocks. If you lost disc4, the parity Aq would simply be re-calculated from the available data blocks. In the event of two missing stripes and the XOR and Reed-Solomon parities disagreeing (which I think is the scenario you're trying to emulate?) I don't think there's any way for the data to be successfully recovered since you've effectively lost three stripes there.

For a readable explanation of the maths behind this, see here, although I haven't been able to find any explanation of what mdadm might do in a failure scenario; it'd probably be easier to lab it on a test array.
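
If you do decide to lab it, the scrub machinery is exposed via sysfs; something along these lines (Python just for brevity, needs root, and md0 is an assumed device name) kicks off the same check the Debian cron job runs and reads back the mismatch counter:

from pathlib import Path
import time

md = Path("/sys/block/md0/md")

md.joinpath("sync_action").write_text("check\n")      # start a read-only scrub

while md.joinpath("sync_action").read_text().strip() != "idle":
    time.sleep(5)                                     # wait for the check to finish

print("mismatch_cnt:", md.joinpath("mismatch_cnt").read_text().strip())
# Writing "repair" instead of "check" tells md to rewrite inconsistent stripes,
# which is where you'd observe which side it chooses to trust.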