mdadm + whitelabel drives losing superblock

kndonlee

New Member
Mar 25, 2021
I have quite the odd problem that I've never encountered before.

Setup:
OS: Ubuntu 18.04
Drives: 7x shucked WD WD100EMAZ
RAID: mdadm software RAID 6
Mobo: Supermicro MBD-X11SAE-MO
3rd pin (3.3 V) on the SATA power connector covered up so the drives power up

Whenever I lose power, the same 2 of the 7 drives lose their superblock and need to be rebuilt from scratch. If I lost any drive other than those 2 at that point, I would lose all the data. Has anyone ever encountered such odd behavior?

I purchased the drives in several different batches, but they all share the same model (WDC WD100EMAZ-00WJTA0) and firmware revision (83.H0A83), which makes the difference in behavior all the more odd.

Some random thoughts I have had:

Could it be because I added raw block devices, e.g. /dev/sd[abcdefg], instead of creating partitions and adding those, e.g. /dev/sd[abcdefg]1?

Or, since these are 10T drives, perhaps I need to use a GUID partition table (GPT) instead of a DOS one?
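
On the GPT-versus-DOS question, the arithmetic is short: a DOS (MBR) partition table stores sector counts in 32 bits, so with 512-byte logical sectors one partition tops out at 2 TiB. This only matters once partitions are created, since a raw device handed to mdadm carries no partition table at all. A minimal sketch, assuming 512-byte logical sectors and a nominal 10 TB (10^13 bytes) drive; the function name is made up for illustration:

```shell
#!/bin/sh
# Largest partition a DOS/MBR label can describe: 2^32 sectors.
# Assumption: 512-byte logical sectors (check with: blockdev --getss /dev/sdX).
mbr_limit_bytes() {
    sector_size=${1:-512}
    echo $(( 4294967296 * sector_size ))
}

# Assumption: nominal 10 TB drive, counted in decimal bytes.
drive_bytes=10000000000000

if [ "$drive_bytes" -gt "$(mbr_limit_bytes 512)" ]; then
    echo "needs GPT"
else
    echo "MBR would fit"
fi
```

So if these drives were ever partitioned, GPT would be the label to use.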

If I shut down the host gracefully, everything is fine. It's only after power outages that these 2 drives are finicky.
 

MBastian

Active Member
Jul 17, 2016
Düsseldorf, Germany
Hmm, tricky.

Did you have a look at the S.M.A.R.T. data yet? Did you start a self-test?

- partition and format one of the affected drives and see what happens when you cut the power.
- swap the drives around, maybe it's the slot or SAS/SATA interface.
- try the drives on a different host
- maybe check the taped pins again
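
A rough sketch of the S.M.A.R.T. part of that checklist. By default it only prints the smartctl commands; setting RUN=1 would execute them (root needed). The device names are placeholders and the helper name is made up, so substitute whichever two drives drop out:

```shell
#!/bin/sh
# Print (or, with RUN=1, execute) SMART checks for the given drives.
# Assumption: device names are placeholders; letters can change between boots.
RUN=${RUN:-0}

smart_check() {
    for dev in "$@"; do
        # -a dumps all SMART info, -t short starts a short self-test.
        for cmd in "smartctl -a $dev" "smartctl -t short $dev"; do
            if [ "$RUN" = "1" ]; then
                $cmd
            else
                echo "$cmd"
            fi
        done
    done
}

smart_check /dev/sdX /dev/sdY
```

Once a self-test has finished, `smartctl -l selftest /dev/sdX` shows the result log.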


IMHO you should never skip a proper partition setup, if only because someone might otherwise think the disk is currently unused. Also, not all 10TB drives are exactly the same size. A rule of thumb is to leave the last 100MB free.
The only exception to that rule might be SAN or virtualized volumes. LVM resizing is much more convenient if you don't have to bother with manipulating the partition table.
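
One way the "leave the last 100MB free" rule could look with sgdisk, as a sketch only. The helper below just builds the command string and runs nothing, because actually running it rewrites the partition table. The helper name and /dev/sdX are placeholders, and fd00 is sgdisk's type code for Linux RAID:

```shell
#!/bin/sh
# Build (but do not run) an sgdisk command that creates one GPT partition
# of type fd00 (Linux RAID) and leaves the tail of the disk unused.
# WARNING: running the printed command destroys the existing layout.
partition_cmd() {
    dev=$1
    tail_free=${2:-100M}   # space to leave free at the end of the disk
    echo "sgdisk --new=1:0:-${tail_free} --typecode=1:fd00 ${dev}"
}

partition_cmd /dev/sdX
```

The resulting /dev/sdX1 would then be given to mdadm instead of the whole disk.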
 

acquacow

Well-Known Member
Feb 15, 2017
mdadm doesn't care about partitioning. Been using mdadm in production arrays w/o any partitioning for over 10 years.

You've created an mdadm config file as well, I hope, à la:
mdadm --detail --scan > /etc/mdadm.conf

I'd definitely be looking at the drives. I usually just throw raw block devices at mdadm, no partitioning at all, so GPT/etc doesn't matter.

What command did you use to initially create the array?
 

kndonlee

I've been using raw devices for over a decade as well. This array has gone through iterations using sets of 750G/1.5T/3T/10T drives, and it's only now, on the 10T set, that I have 2 problematic drives.

The array was recreated a while back with the 1.2 superblock, presumably using something like: mdadm --create /dev/md0 --metadata=1.2 --level=6 --raid-devices=7 /dev/sd[abcdefg]

To shed some more light: the stack is mdadm -> LVM -> ext4.

I'm thinking of perhaps offlining the array and shrinking everything so I can convert the members to a partitioned setup. But another part of me wants to just grab a single 18T drive, swap it in for one of the two problem drives, and see whether this is simply weird, one-off hardware before embarking on the highly risky shrink.
 

kndonlee

acquacow said:
> mdadm doesn't care about partitioning. Been using mdadm in production arrays w/o any partitioning for over 10 years.
>
> You've created an mdadm config file as well, I hope, à la:
> mdadm --detail --scan > /etc/mdadm.conf
>
> I'd definitely be looking at the drives. I usually just throw raw block devices at mdadm, no partitioning at all, so GPT/etc doesn't matter.
>
> What command did you use to initially create the array?

The existence of /etc/mdadm/mdadm.conf doesn't really change the behavior on boot.

root@packrat:~/bin# cat /etc/mdadm/mdadm.conf
ARRAY /dev/md/0 metadata=1.2 UUID=eb794803:46d23264:04f146ca:c9f6b1e6 name=packrat:0
MAILADDR root
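
One thing that may be worth ruling out here: on Debian/Ubuntu the initramfs carries its own copy of mdadm.conf, so an edit only reaches early boot after `update-initramfs -u`. A small sketch for checking that the UUID in the config matches what the running array reports. The helper name is made up, and the demo line is the one pasted above:

```shell
#!/bin/sh
# Pull the UUID= value out of ARRAY lines, as found in mdadm.conf
# or in the output of `mdadm --detail --scan`.
array_uuids() {
    sed -n 's/^ARRAY .*UUID=\([0-9a-fA-F:]*\).*/\1/p'
}

# Demo with the ARRAY line from this thread:
echo 'ARRAY /dev/md/0 metadata=1.2 UUID=eb794803:46d23264:04f146ca:c9f6b1e6 name=packrat:0' | array_uuids

# On the real host (needs root), something like:
#   mdadm --detail --scan | array_uuids
#   array_uuids < /etc/mdadm/mdadm.conf
#   update-initramfs -u   # refresh the copy of mdadm.conf inside the initramfs
```

If the two UUIDs agree and the initramfs is current, the config file is probably not the culprit.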


On each boot, I end up having to force start the array by running:

mdadm --stop /dev/md0
mdadm --assemble --force /dev/md0 /dev/sd[cdefg]
mdadm --manage /dev/md0 --add /dev/sda
mdadm --manage /dev/md0 --add /dev/sdb
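
Before forcing the assembly, it can help to see what each member actually reports. `mdadm --examine` prints an Events counter for every device that still has a superblock, so a device with no line at all has lost it, and a lower number means the member is stale. A minimal sketch with a made-up helper name; the demo input is an invented fragment, not real output from this array:

```shell
#!/bin/sh
# Extract the event counter from `mdadm --examine` output read on stdin.
# Prints nothing when there is no Events line (e.g. no superblock found).
events_of() {
    awk '/Events :/ { print $NF }'
}

# Real use (root required); the sd[a-g] names follow this thread:
#   for d in /dev/sd[a-g]; do
#       printf '%s %s\n' "$d" "$(mdadm --examine "$d" 2>/dev/null | events_of)"
#   done

# Self-contained demo with an invented fragment of --examine output:
printf '    Update Time : (elided)\n         Events : 4242\n' | events_of
```

Saving that per-device listing right after a power loss, before any --add, would show whether sda and sdb really have no superblock or are only behind on events.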