Strange MDADM RAID 6 behaviour


Stephan

Well-Known Member
Joined Apr 21, 2017
Germany
@Mashie Still in beta, and as far as I know the expansion reflows one disk at a time. Will happen in 2022 I think, an x-mas present... ;-) What you have to do is enable the feature on the pool, add a disk to the pool, wait, add the next disk, wait, and so on. Finally, you should rewrite all the files in your datasets so they use the new (better, more efficient) data-to-parity ratio. I'm an avid ZFS user and have subscribed to hxxs://github.com/openzfs/zfs/pull/12225 for updates straight from the source.
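To put rough numbers on why rewriting the files matters: blocks written before an expansion keep the old data-to-parity ratio until they are rewritten. The sketch below is a simplified model only, assuming a RAIDZ2 vdev of equal disks growing from 6 to 7 disks and ignoring allocation padding, metadata and compression; the function names and the 20 TB figure are hypothetical.

```python
from fractions import Fraction


def data_fraction(disks: int, parity: int) -> Fraction:
    """Share of raw space that holds data in a full-width RAIDZ stripe.

    Simplified model: ignores allocation padding, metadata and compression.
    """
    if disks <= parity:
        raise ValueError("need more disks than parity devices")
    return Fraction(disks - parity, disks)


def raw_space_used(data_tb: Fraction, old_disks: int, new_disks: int,
                   parity: int, rewritten: Fraction) -> Fraction:
    """Raw TB consumed by `data_tb` of data after expanding the vdev.

    `rewritten` (0..1) is the share of existing data rewritten since the
    expansion; the rest still uses the old, pre-expansion ratio.
    """
    old_part = data_tb * (1 - rewritten) / data_fraction(old_disks, parity)
    new_part = data_tb * rewritten / data_fraction(new_disks, parity)
    return old_part + new_part


if __name__ == "__main__":
    data = Fraction(20)  # hypothetical: 20 TB stored on a 6-disk RAIDZ2
    for share in (Fraction(0), Fraction(1, 2), Fraction(1)):
        raw = raw_space_used(data, old_disks=6, new_disks=7, parity=2,
                             rewritten=share)
        print(f"rewritten {float(share):.0%}: {float(raw):.1f} TB raw")
```

In this model the same 20 TB drops from 30 TB of raw space (nothing rewritten, 4 data disks out of 6) to 28 TB (everything rewritten, 5 out of 7). Fractions are used so the arithmetic stays exact.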

Sorry to the OP for steering off-topic from his mdadm issue. I'll stop here.
 

Mashie

Member
Joined Jun 26, 2020
I am the OP, and the mdadm issues are resolved for me, so no problem discussing alternatives from now on.
 

PerryCS

New Member
Joined Aug 27, 2022
Seems my problem has been solved. I've done about 40 copies from assorted machines (Windows to server, server to server) and not one slowdown, except when too many writes/copies ran at once: throughput briefly dipped to 0 but came back almost right away, which makes sense for mechanical drives even when you have a lot of them. Previously the whole copy would freeze for many minutes. So I'm mostly convinced my issue is solved, and I'll get more confident as the days and weeks roll by. :)

Also, I don't mind reading about Arch Linux... I wanted to try ZFS, but I don't have ECC memory in my system, and I read a while ago that without ECC, ZFS could absolutely trash your data without you knowing it if there is a RAM problem.

That alone has scared me away from using it. Also, it will mature and eventually become part of the OS. By then it will be tested even more, and even more bugs will be worked out.

I look forward to deduplication! I have a lot of duplicated data on my server that, unfortunately, I haven't had time to address.
 

Goose

New Member
Joined Jan 16, 2019
Actually, ZFS would be better than most filesystems and software RAID implementations, as bad files would be flagged on read, whereas most other filesystems would silently hand back the corrupted data and never let you know. MD would let you know if you ran a resync or patrol read, but it doesn't know how to fix the issue, as it only has parity to go on, not checksums.

Also, rather than simply flagging bad blocks at the MD level and leaving you to figure out which files are affected at the filesystem level, ZFS gives you a list of what's broken.
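A toy model of the point above, not how md or ZFS actually lay out data: one stripe with a single XOR parity block (md's RAID 6 adds a second Galois-field syndrome, which this ignores) and a per-block SHA-256 standing in for ZFS block checksums (ZFS defaults to fletcher4). The function names are hypothetical. Parity alone can only say the stripe is inconsistent; a checksum per block says which block is wrong, so that block can be rebuilt from the others.

```python
import hashlib
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (a single-parity stripe)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def parity_check(data: list[bytes], parity: bytes) -> bool:
    """All a parity-only scrub can report: consistent or not."""
    return xor_blocks(data) == parity


def checksum_repair(data: list[bytes], parity: bytes,
                    sums: list[bytes]) -> list[bytes]:
    """Locate the block whose checksum fails and rebuild it from the rest."""
    bad = [i for i, block in enumerate(data)
           if hashlib.sha256(block).digest() != sums[i]]
    if len(bad) > 1:
        raise ValueError("single parity can rebuild at most one block")
    repaired = list(data)
    for i in bad:
        others = [b for j, b in enumerate(data) if j != i] + [parity]
        repaired[i] = xor_blocks(others)  # d1 = d0 ^ d2 ^ parity, etc.
    return repaired


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]
    parity = xor_blocks(data)
    sums = [hashlib.sha256(b).digest() for b in data]

    corrupted = [data[0], b"BxBB", data[2]]  # one silently damaged block
    print("parity consistent:", parity_check(corrupted, parity))
    print("repaired:", checksum_repair(corrupted, parity, sums) == data)
```

Running it prints `parity consistent: False` (something is wrong, but parity can't say where) and `repaired: True`.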

It's a myth that ZFS requires ECC.
 

Glock24

Active Member
Joined May 13, 2019
This is off-topic, but how does BTRFS compare to ZFS?
 

PerryCS

Final update. I've been using the updated system for 26 days since my last post, and about a month since I did the upgrade. Everything is running super smoothly. Speeds are way up, no more pausing. Now that I've hammered the daylights out of this system for a month without a single slowdown... super happy! :)

It's so responsive now that when I'm hammering the array, I can actually hear, for the first time, seek noises coming from a bunch of drives as they scream in the data. Never heard that before; it was so slow before that there was no need. But 40 drives all seeking around with super fast access times... it sounds like the single 10,000 RPM VelociRaptor I had years ago under heavy seeking. LOL! But the speeds are great!

Thought I would update and let you know that, for me, this issue is closed as working. Something as simple as an update solved all sorts of problems.

Of course, the updates were not without a couple of glitches. My Apache setup had to be fixed because PHP was also upgraded, but that was a quick fix. Something else got fixed too...

QEMU: if I hammered a VM with super heavy disk load plus Prime95 and OCCT testing, the VM would sometimes crash. Now I have multiple VMs running 24/7, and I've had them under Prime95 and OCCT... so happy. I upgraded the RAM to 32GB of G.Skill 3200MHz for my Ryzen 2700X, and it's rock solid, serving hundreds of terabytes flawlessly. :)

I was even able to migrate one of my desktop machines into the server, so now there's one less machine running 24/7; it's all running on the server.

So it solved all sorts of problems. Thank you for this post pushing me to "just do it" :)

I'm always scared of updates. Before upgrading, I cloned the SSD with dd, booting off a USB stick. Always have backups :)
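For anyone wanting to double-check a clone like that, here is a minimal sketch of the same idea in Python, not what was actually run (that was plain dd): copy in fixed-size chunks the way dd does with a block size, then re-read both sides and compare SHA-256 digests. It assumes ordinary image files; the function names and file names are hypothetical. Whole devices need root, and if the target device is larger than the source, hashing all of it won't match. Cloning from a USB boot, as above, also means the source isn't changing underneath the copy.

```python
import hashlib
import os
import shutil
import tempfile

CHUNK = 4 * 1024 * 1024  # 4 MiB per read, comparable to dd's bs=4M


def clone(src: str, dst: str, chunk: int = CHUNK) -> None:
    """Copy src to dst in fixed-size chunks, roughly what dd does."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk)


def sha256_of(path: str, chunk: int = CHUNK) -> str:
    """Hash a file incrementally so large images don't need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for buf in iter(lambda: f.read(chunk), b""):
            h.update(buf)
    return h.hexdigest()


def clone_and_verify(src: str, dst: str) -> bool:
    """Clone, then re-read both sides and compare digests."""
    clone(src, dst)
    return sha256_of(src) == sha256_of(dst)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "ssd.img")     # hypothetical stand-in for the SSD
        dst = os.path.join(tmp, "backup.img")  # hypothetical backup target
        with open(src, "wb") as f:
            f.write(os.urandom(10 * 1024 * 1024))
        print("verified:", clone_and_verify(src, dst))
```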

David Perry
Perry Computer Services
(Canada)
 

PerryCS

Goose said: "It's a myth that ZFS requires ECC."
It's true that it's a myth that it's required... BUT, the reason I remember people saying it should be mandatory is this: IF there is memory corruption due to a bad DIMM or some other system problem... I thought I remembered many prominent YouTube channels mentioning that corruption in memory isn't checked, and ZFS will just write it back to the disk "as is", trashing your file system without any warning.

That may be fixed now, as this was a few years ago, or I could be remembering it incorrectly... but that's what I remember, and it's the main reason I've stayed away. I could free up a few dozen TB if my file system had deduplication.

For now, I'm happy with mdadm. In the future, I would love to try ZFS or BTRFS with dedup...
 

Goose

The issue you described isn't specific to ZFS. In fact, as I mentioned above, ZFS will notify you on a scrub or read, which is better than most other filesystems.

The takeaway isn't that ZFS needs ECC, but rather that if you like your data, use ZFS with ECC and make backups!
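A toy illustration of why the bad-RAM scenario isn't ZFS-specific, assuming a simplified write path where the checksum is computed over whatever sits in memory at write time (SHA-256 as a stand-in, hypothetical function names; real ZFS has caching layers this ignores). A bit that flips in RAM before the checksum is taken gets a perfectly valid checksum, so no checksumming filesystem can catch it; a bit that flips on disk after the write is caught on the next read or scrub.

```python
import hashlib


def flip_bit(data: bytes, index: int) -> bytes:
    """Return a copy of `data` with one bit flipped (simulated bad RAM/disk)."""
    buf = bytearray(data)
    buf[index // 8] ^= 1 << (index % 8)
    return bytes(buf)


def write_block(in_memory: bytes) -> tuple[bytes, bytes]:
    """Simplified write path: store the data plus a checksum of it."""
    return in_memory, hashlib.sha256(in_memory).digest()


def read_ok(stored: bytes, checksum: bytes) -> bool:
    """Verify-on-read: recompute the checksum and compare."""
    return hashlib.sha256(stored).digest() == checksum


if __name__ == "__main__":
    original = b"important file contents"

    # Case 1: RAM flips a bit *before* the checksum is computed.
    stored, csum = write_block(flip_bit(original, 3))
    print("RAM error detected:", not read_ok(stored, csum))  # checksum matches the bad data

    # Case 2: data is written correctly, then a bit rots on disk.
    stored, csum = write_block(original)
    print("disk error detected:", not read_ok(flip_bit(stored, 3), csum))
```

It prints `RAM error detected: False` and `disk error detected: True`: ECC protects the first case, checksums protect the second, which is why the advice is both plus backups.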

Glock24's question is too vague, but if I understand what you're hinting at, the TL;DR is that ZFS is better due to its stronger checksums and the fact that it verifies the checksum on every read, vs. BTRFS's policy of only doing it on a scrub. "On a scrub" is still better than most other filesystems, though, so don't read this as "BTRFS is shit so you shouldn't use it".
 