mdadm raid5 recovery

frawst

New Member
Mar 2, 2021
21
3
3
TL;DR: I might've messed up badly, and I need help from someone who knows mdadm better than I do. I can provide whatever data is needed; the long story below will bring you up to speed if you care to read it. I'm at a loss and worried.



Okay, I'm not even sure where to start with this post. The short of it is that I'm not sure whether I'm screwed, or what to do to even check. I fear that I may have made one too many mistakes while trying to recover my RAID. I've never posted about something like this, so I'm just going to start from the top. After days of research, I've learned that naïve me did not create the best situation. I'm aware of it now and have a better game plan going forward. The only concern now is recovering my data to get out of this.

I have an mdadm RAID 5 that consists of nine 8 TB disks, with LVM and an XFS filesystem on top. The other day I was moving data from one directory to another on this volume when, out of nowhere, it threw an I/O error and the XFS filesystem shut down. This is the first time I've ever seen this. I couldn't figure out how to re-enable it, so I rebooted; everything mounted, and I immediately started copying the data off onto other drives. It happened again, but this time, on reboot, three of the disks went missing. No RAID.

After physically verifying that everything was connected and powered, I rebooted the machine. This time the three disks didn't seem to be spinning at all and didn't show up in the BIOS. The next step is where I learned something the hard way. I took those three disks to my desktop and hooked them up using the WD EasyStore controllers they were paired with on purchase. Apparently these controllers change the drive layout in some way, as the disks were then labeled as "easystore" drives, even after I connected them directly via SATA. The EasyStore controllers did spin them all up, and SMART was clean (as it is on all the drives, for that matter).

Now we're in trouble. The mdadm superblocks are missing from these disks, and I have no way of knowing what order they should be in to re-assemble the RAID (too many are missing for it to assemble automatically). After a bunch of research, verification, and noting down what I know, I went to a clean new VM for testing.

In this VM I made a brand-new RAID 5, formatted it, and put some data on it. I then stopped it and zeroed the superblock on two of the members. I tried to assemble it, and it didn't work (as expected). I then ran the mdadm --create command again, listing the disks in the wrong order (with one marked missing). It assembled and the LVM showed up, but it wouldn't mount. I then went back and did the same thing in the right order, and it worked, files and all.

At this point I knew the order of 7 of the 9 disks in the array, so I took this knowledge and tried it on the real thing.
For the two unknowns, I first tried them one way round: "/dev/sda, /dev/sdb".

It rewrote all the superblock data (I have the previous contents noted down as best I could in that bad situation).
It then created the RAID from all 9 disks and immediately started syncing, but there was no LVM to be found. I immediately stopped the RAID for fear of an improper sync and tried again with the two unknown disks in reverse order, "/dev/sdb, /dev/sda". Same result: the RAID resyncing, no LVM.

I'm now worried that I've lost everything. I have no idea where to go from here, aside from lying in darkness for the next year (again!).

If anyone has solid knowledge of mdadm, I desperately need your help.
 

EffrafaxOfWug

Radioactive Member
Feb 12, 2015
1,390
494
83
I'm not 100% sure on the details here, but yes, it sounds like you're hosed. If whatever this EasyStore controller did stomped all over the data on these discs, then it might have already overwritten too much important stuff (but I've not got any experience with them). If you have enough scratch space available, I'd suggest imaging each of the drives right now so you can attempt recovery actions on the copies rather than on the actual drives.
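A minimal sketch of that imaging step, assuming GNU dd and enough scratch space. The device and image paths are placeholders rather than anything from this thread, and the helper only prints the command so it can be reviewed before anything is run:

```shell
# Dry-run helper: prints a dd imaging command for review; it does not run dd.
# Both arguments are placeholders (assumed names, not taken from the thread).
image_cmd() {
    src=$1   # source disk, e.g. /dev/sdX
    dst=$2   # image file or clone disk on separate storage
    printf 'dd if=%s of=%s bs=1M status=progress\n' "$src" "$dst"
}

image_cmd /dev/sdX /mnt/scratch/sdX.img
```

Printing first and running second leaves a chance to double-check that `if=` and `of=` are not swapped, which is the easiest way to destroy the source disk.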

If the partition table has already been hosed you can try recovering that first; I've used a utility called testdisk to do this in the past when I accidentally deleted a partition table. Did you try a superblock recovery on the drives at any point? e.g. `mdadm --examine /dev/sdX1` for each of the afflicted drives/partitions. If you can retrieve the UUID and the underlying data is undamaged enough you can usually rebuild the array from the UUID rather than having to guess at the array geometry.

Please tell me you have backups - a 9*8TB RAID5 is already an inherently risky setup.
 

frawst

First off, thank you for replying!


EffrafaxOfWug said: "If whatever this EasyStore controller did stomped all over the data on these discs then it might have already overwritten too much important stuff (but I've not got any experience with them)."
From what I've read, it basically rewrites the first few KB. I noticed this immediately because it erased the superblock data on two of the drives (but not on all three that were connected to it, strangely). As a result, mdadm --examine reports this for both of them:

/dev/sdf:
MBR Magic : aa55
Partition[0] : 4294967295 sectors at 1 (type ee)


EffrafaxOfWug said: "If you have enough scratch space available I'd suggest imaging each of the drives in to a copy right now so you can attempt recovery actions on those rather than the actual drives."
This is something I'm trying to get to as we speak. It's also part of why I'm in this situation: the cost of drives to back this stuff up is a bit much when I only need them temporarily. It's possible my employer might be able to lend a hand. I should also mention that up to the time of failure, all disks passed SMART and the RAID reported clean and synced.


EffrafaxOfWug said: "If the partition table has already been hosed you can try recovering that first; I've used a utility called testdisk to do this in the past when I accidentally deleted a partition table. Did you try a superblock recovery on the drives at any point? e.g. `mdadm --examine /dev/sdX1` for each of the afflicted drives/partitions. If you can retrieve the UUID and the underlying data is undamaged enough you can usually rebuild the array from the UUID rather than having to guess at the array geometry.

Please tell me you have backups - a 9*8TB RAID5 is already an inherently risky setup."
My extensive googling brought me to testdisk. I ran it on a couple of drives using the deep search analysis, and it produced a bunch of different options. I started it again on all drives last night and will post the outcome here. To be honest, I'm not 100% sure how to use it to recover any kind of data.

As for rebuilding the array, this is something I have tried. I was able to get the superblock data for 7 of the 9 disks, and I have all of that original information saved, including the exact UUIDs of those disks. The issue is the 7-out-of-9 part. After going over the command in a test environment and researching how superblocks are made, I concluded that my only choice was to run mdadm --create. I tested this command in a VM: breaking the array, overwriting the partition with the wrong contents, even assembling it in the wrong order and then trying again, and it worked at every turn. So after enough verification on my side, I tried this on the real thing. The RAID assembled, but there was no filesystem. I did this with --assume-clean and ensured no writing would occur, but after trying a couple of combinations (the two unknowns in both orders), I stepped away.

As for backups, you can probably guess that I've made mistakes there. Historically this has never been an issue for me, so I never bothered to research the risk of my setup. At this point I don't actually know where the failure occurred. All I know is that it sucks.
 

frawst

[Attachment: 1614786447595.png]
Just a snippet of what I'm seeing so far on these drives. They all show similar data. They were all used as XFS. If memory serves, all but one or two were added to the RAID as whole disks without partitions; I believe the most recent one used a partition, as I had learned this was better practice.

[Attachment: 1614786562970.png]

Going over pictures from the past: when I introduced the most recent disk, it did this weird thing where it created the "md0p1" instance. I'm not really sure why this happened, but in the pursuit of transparency I figured I would share it. (You can ignore the notes; I was documenting the migration of data off a dying RAID. Ironic, eh?) :/


The last known mdadm --examine information before attempting a create (which of course overwrote it):
/dev/sda:
MBR Magic : aa55
Partition[0] : 4294967295 sectors at 1 (type ee)


/dev/sdb:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : c66a7380:f3159267:bb7c595d:55ec6215
Name : ubuntu:0
Creation Time : Sun Dec 31 16:22:06 2017
Raid Level : raid5
Raid Devices : 9

Avail Dev Size : 15627809456 (7451.92 GiB 8001.44 GB)
Array Size : 62511235072 (59615.36 GiB 64011.50 GB)
Used Dev Size : 15627808768 (7451.92 GiB 8001.44 GB)
Data Offset : 243712 sectors
Super Offset : 8 sectors
Unused Space : before=243432 sectors, after=688 sectors
State : clean
Device UUID : c9e85a63:2fa85db7:3710de20:03fe77aa

Update Time : Mon Mar 1 16:47:06 2021
Bad Block Log : 512 entries available at offset 264 sectors
Checksum : a625b20c - correct
Events : 1126263

Layout : left-symmetric
Chunk Size : 512K

Device Role : Active device 8
Array State : A.AAA.AAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdc:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x8
Array UUID : c66a7380:f3159267:bb7c595d:55ec6215
Name : ubuntu:0
Creation Time : Sun Dec 31 16:22:06 2017
Raid Level : raid5
Raid Devices : 9

Avail Dev Size : 15627809456 (7451.92 GiB 8001.44 GB)
Array Size : 62511235072 (59615.36 GiB 64011.50 GB)
Used Dev Size : 15627808768 (7451.92 GiB 8001.44 GB)
Data Offset : 243712 sectors
Super Offset : 8 sectors
Unused Space : before=243432 sectors, after=688 sectors
State : clean
Device UUID : 50f573f9:04276d2c:4770fc01:cb5bfc0d

Update Time : Mon Mar 1 16:47:06 2021
Bad Block Log : 512 entries available at offset 264 sectors - bad blocks present.
Checksum : f5f0fb48 - correct
Events : 1126263

Layout : left-symmetric
Chunk Size : 512K

Device Role : Active device 6
Array State : A.AAA.AAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdd:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x8
Array UUID : c66a7380:f3159267:bb7c595d:55ec6215
Name : ubuntu:0
Creation Time : Sun Dec 31 16:22:06 2017
Raid Level : raid5
Raid Devices : 9

Avail Dev Size : 15627809456 (7451.92 GiB 8001.44 GB)
Array Size : 62511235072 (59615.36 GiB 64011.50 GB)
Used Dev Size : 15627808768 (7451.92 GiB 8001.44 GB)
Data Offset : 243712 sectors
Super Offset : 8 sectors
Unused Space : before=243432 sectors, after=688 sectors
State : clean
Device UUID : 3fcd825c:44b0390b:97bc37c6:2dba8dc9

Update Time : Mon Mar 1 16:47:06 2021
Bad Block Log : 512 entries available at offset 264 sectors - bad blocks present.
Checksum : b799072b - correct
Events : 1126263

Layout : left-symmetric
Chunk Size : 512K

Device Role : Active device 0
Array State : A.AAA.AAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sde:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : c66a7380:f3159267:bb7c595d:55ec6215
Name : ubuntu:0
Creation Time : Sun Dec 31 16:22:06 2017
Raid Level : raid5
Raid Devices : 9

Avail Dev Size : 15627809456 (7451.92 GiB 8001.44 GB)
Array Size : 62511235072 (59615.36 GiB 64011.50 GB)
Used Dev Size : 15627808768 (7451.92 GiB 8001.44 GB)
Data Offset : 243712 sectors
Super Offset : 8 sectors
Unused Space : before=243624 sectors, after=688 sectors
State : clean
Device UUID : 6ffaf68a:97574857:6f9e77fa:6cccedcc

Update Time : Mon Mar 1 11:10:53 2021
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 69bd7d6d - correct
Events : 1125537

Layout : left-symmetric
Chunk Size : 512K

Device Role : Active device 5
Array State : AAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdf:
MBR Magic : aa55
Partition[0] : 4294967295 sectors at 1 (type ee)

/dev/sdq:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : c66a7380:f3159267:bb7c595d:55ec6215
Name : ubuntu:0
Creation Time : Sun Dec 31 16:22:06 2017
Raid Level : raid5
Raid Devices : 9

Avail Dev Size : 15627808768 (7451.92 GiB 8001.44 GB)
Array Size : 62511235072 (59615.36 GiB 64011.50 GB)
Data Offset : 243712 sectors
Super Offset : 8 sectors
Unused Space : before=243624 sectors, after=688 sectors
State : clean
Device UUID : c53de05d:00a7175a:7d6d9007:b7dee24b

Update Time : Mon Mar 1 16:47:06 2021
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : cb82405d - correct
Events : 1126263

Layout : left-symmetric
Chunk Size : 512K

Device Role : Active device 4
Array State : A.AAA.AAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdr:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : c66a7380:f3159267:bb7c595d:55ec6215
Name : ubuntu:0
Creation Time : Sun Dec 31 16:22:06 2017
Raid Level : raid5
Raid Devices : 9

Avail Dev Size : 15627808768 (7451.92 GiB 8001.44 GB)
Array Size : 62511235072 (59615.36 GiB 64011.50 GB)
Data Offset : 243712 sectors
Super Offset : 8 sectors
Unused Space : before=243624 sectors, after=688 sectors
State : clean
Device UUID : b47528c5:4f941c5a:c71bc07b:0b7e25f4

Update Time : Mon Mar 1 16:47:06 2021
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 4f41b339 - correct
Events : 1126263

Layout : left-symmetric
Chunk Size : 512K

Device Role : Active device 3
Array State : A.AAA.AAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sds:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : c66a7380:f3159267:bb7c595d:55ec6215
Name : ubuntu:0
Creation Time : Sun Dec 31 16:22:06 2017
Raid Level : raid5
Raid Devices : 9

Avail Dev Size : 15627808768 (7451.92 GiB 8001.44 GB)
Array Size : 62511235072 (59615.36 GiB 64011.50 GB)
Data Offset : 243712 sectors
Super Offset : 8 sectors
Unused Space : before=243624 sectors, after=688 sectors
State : clean
Device UUID : dc8566dc:c242e3f2:314c9908:ab54b5fc

Update Time : Mon Mar 1 16:47:06 2021
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 94af78e4 - correct
Events : 1126263

Layout : left-symmetric
Chunk Size : 512K

Device Role : Active device 7
Array State : A.AAA.AAA ('A' == active, '.' == missing, 'R' == replacing)
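A dump like this is easier to compare once it is boiled down to one line per disk. The sketch below assumes the `mdadm --examine` output has been saved as text; the function name is made up for illustration. It prints device, role, event count, and data offset, with `?` for disks that have no md superblock:

```shell
# Summarize saved `mdadm --examine` text read from stdin:
# one line per device with role, event count, and data offset (in sectors).
# The function name is hypothetical; only standard awk is used.
summarize_examine() {
    awk -F' : ' '
        /^\/dev\//       { if (dev != "") print dev, role, events, offset
                           dev = $0; sub(/:$/, "", dev)
                           role = "?"; events = "?"; offset = "?" }
        /^ *Device Role/ { role = $2; sub(/^Active device /, "", role) }
        /^ *Events/      { events = $2 }
        /^ *Data Offset/ { offset = $2; sub(/ sectors$/, "", offset) }
        END              { if (dev != "") print dev, role, events, offset }
    '
}

# Two entries copied from the dump above as a demonstration.
summarize_examine <<'EOF'
/dev/sdb:
Data Offset : 243712 sectors
Events : 1126263
Device Role : Active device 8
/dev/sde:
Data Offset : 243712 sectors
Events : 1125537
Device Role : Active device 5
EOF
```

Run over the full dump, this makes two things visible at a glance: the seven readable superblocks cover roles 0 and 3 through 8, leaving roles 1 and 2 for the two disks that only report a protective MBR, and /dev/sde carries a lower event count (1125537) than the rest (1126263).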
 

frawst

I apologize for the post spam; this is what such things do to me. My employer lent me nine 8 TB disks, and I started a dd clone today. It should take about 2-3 days to complete.
 

EffrafaxOfWug

Layer-wise, I assume these were all mdraid, then the RAID as an LVM pvol, and then a VG and LVs within that, formatted as XFS...?

You need to start worrying about the lowest layers first, namely just the discs/partitions themselves and the mdraid that sat on top of those. Having some partitioned vs. not partitioned will certainly complicate the geometry somewhat, and from the looks of the testdisk output it hasn't found much in the way of structure to restore (but I could be wrong, been years since I used it). Did testdisk give any info in that regard? IIRC if it doesn't find a partition table it'll give you the option to enter "none". For the ones you did partition, do you know if the partition type was set to fd00/linux RAID? If all it seems to be finding is a bunch of HFS partitions then I'm not really sure what's happened to the drives :/

As you've surmised, whatever the easystore controller (or anything else for that matter) changed on the drives, it probably wasn't a large amount of data as that would take forever; testdisk will scan all the sectors it can looking for common patterns like "that looks like an ext3 block" or "that's a GPT partition boundary" and then make educated guesses about what the partition table should look like and will offer to recreate it for you; without knowing more about the easystore I can't really tell if it's actually created HFS partitions or not. But if you got as far as the deep search options then it sounds like you were on the right track.

I was hoping you'd have enough scratch space available on another array to perhaps store the entire load of discs as images so you could then attempt recovery actions on those rather than risk further damaging the drives but yeah, it was probably unlikely you'd have a spare 80TB array knocking around...

frawst said: "From what I've read, it basically rewrites the first few kb, this was noticed immediately because it erased the superblock data on two of the drives (but not all three that were tested on it? strange.) Due to this, mdadm --examine rendered this on both of them

/dev/sdf:
MBR Magic : aa55
Partition[0] : 4294967295 sectors at 1 (type ee)"
ee is the partition type ID of the GPT protective MBR, so it is to be expected on a GPT drive. If you had a /dev/sdf1 and ran `mdadm --examine` on it, it should print something like this (example from one of my RAID6s):
Code:
root@wug:~# mdadm --examine /dev/sdg
/dev/sdg:
   MBR Magic : aa55
Partition[0] :   xxxxxxxxxxx sectors at            1 (type ee)

root@wug:~# mdadm --examine /dev/sdg1
/dev/sdg1:
          Magic : xxxxxxxxx
        Version : 1.2
    Feature Map : 0x1
     Array UUID : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
           Name : wug:storage  (local to host wug)
  Creation Time : Fri Jan  3 23:32:34 2020
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 
     Array Size : 
  Used Dev Size : 
    Data Offset : xxxxxxx sectors
   Super Offset : x sectors
   Unused Space : before=xxxxx sectors, after=xxxx sectors
          State : clean
    Device UUID : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Internal Bitmap : 8 sectors from superblock
    Update Time : Wed Mar  3 22:07:29 2021
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : xxxxxxx - correct
         Events : xxxxx

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device x
   Array State : xxxxxxxx ('A' == active, '.' == missing, 'R' == replacing)
I suspect you didn't have a partition there yet for mdadm to examine?

No backups == very ouch. I was "lucky" enough to have a hard drive failure six weeks after building my first server, and an array failure three months after that once I'd learned about RAID - lost data both times. But it taught me a very valuable lesson about backups.

At this point you probably need to consider engaging a professional data recovery specialist, but it's likely to be a tall order and very expensive... especially if the recovery attempts so far have further damaged an already shonky dataset.
 

frawst

I still had the testdisk scans going this morning. A couple of them completed and mostly showed hundreds of EXT3/4 partitions, some HFS+, and random results everywhere else. As I mentioned in a recent comment, I got hold of a full set of spare drives from work and have them all copying right now. For that reason I stopped the testdisk scans that were still running, to get the backup clone started sooner. As for the layout, your guess is exactly on point: mdadm > a VG > VG pool > XFS.

At this point I'm just hoping to get help with the partition tables themselves; the lower layers are where my knowledge quickly drops off.
I think my last two posts will answer some of the questions you've had.

Thanks for the follow-up!
 

EffrafaxOfWug

Well, unless you actually re-write the partition table testdisk shouldn't actually write anything to the discs themselves but always best to test on a backup image if possible.

If testdisk hasn't found anything useful though I'd take a punt and say the data's not going to be recoverable by mere mortals like me. There might well be other free recovery tools I'm not aware of to tackle this sort of thing but a quick search doesn't reveal anything promising.

I had a look around for more info on the easystores but didn't find anything conclusive over what it might/does do if you plug in a drive. Automatically formatting it sounds like a bad design decision but might make sense if these were shucked originally. Now just... plucked. Sorry I couldn't be of more help :(
 

Goose

New Member
Jan 16, 2019
14
2
3
Hey

I half read the posts above, but I think a sane way to proceed, once you have a cloned set, would be to copy the MBR from a working disk to the couple that don't work. Hopefully one just works from there. If not try it from another one of the known good disks.

You may need to manually assemble and try it in different orders. RAID Recovery - Linux Raid Wiki has info on this.

RAID superblock formats - Linux Raid Wiki has info on where the superblock is located so if that's hosed, you may be able to copy it from a good disk. You'll need to establish from a good disk where the superblock is based on its version.

This is all predicated on an understanding that the external drive caddies overwrote the first parts of the disk.

Good luck. I've had fun with recoveries in the past and I know you must be sweating right now.
I've moved to ZFS but MD is supremely recoverable so if you are methodical and have the clones then you can probably get stuff back. I presume LVM and XFS will be fine if the RAID itself works but if you have to copy MBRs and stuff like that you may have some corruption.

Again, best of luck.
 

frawst

I missed this response, but thank you so much for it. As of this morning I have all of the disks dd-cloned to new ones for testing. I've been following this post, recover-raid-5-data-after-created-new-array-instead-of-re-using, for guidance on what to do. I'm currently scanning the drives for binary signatures to determine their order. All superblocks have been re-made with mdadm --create --assume-clean. I keep hearing that mdadm is decent at recovery; it's my only sliver of hope at this point. If you have any knowledge on the matter, I would greatly appreciate any thoughts or input! At the moment I'm struggling to work out the order of the disks and whether they had partitions or not (this is the part I regret most). I know for a fact that the majority did not, but I think I recall making a partition on one or two of them down the line. Knowing me, they were full-size standard Linux partitions. I spent some time with testdisk but walked away to start the clone instead. This binary scan is my current focus, using "bgrep" with data that was stored on the array.

Yes, sweating, loss of sleep, you name it. I messed up, and I'm so aware. everything repeats mentally. I'm just hoping I can learn from this, and walk away on a high note. Picture attached for kicks, the original disks have been disconnected entirely to remove the possibility of getting them involved in recovery procedures.
 


Goose

We've all been there with disks everywhere. You may want a large desk fan blowing some air on them and turn them up the right way as the little hole on top of the disks needs to breathe.

I would try assembling in each order permutation first. The second link I posted has a script that may help with that.
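As a rough illustration of what "each order permutation" means when only two slots are unknown, the sketch below lists every way to fill them, including degraded variants that leave one slot as `missing`. All disk names and the five-disk layout are placeholders, the function name is invented for the example, and the `mdadm --create` lines are only printed, never executed:

```shell
# Print every way to fill two unknown slots from two unknown disks,
# including degraded variants that leave one slot as "missing".
# Nothing is executed; all names are placeholders.
candidate_pairs() {
    a=$1; b=$2
    for pair in "$a $b" "$b $a" "$a missing" "missing $a" "$b missing" "missing $b"; do
        echo "$pair"
    done
}

# Known members keep their recorded role order; the unknowns fill the gap.
# HEAD and TAIL are hypothetical known disks before and after the gap.
HEAD="/dev/known0"
TAIL="/dev/known3 /dev/known4"
candidate_pairs /dev/unkA /dev/unkB | while read -r x y; do
    echo "mdadm --create /dev/md0 --assume-clean --level=5 --raid-devices=5 $HEAD $x $y $TAIL"
done
```

Each printed line is one candidate to try against cloned disks only. A real run would also need the chunk size, layout, metadata version, and data offset to match the original superblocks.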

For future you: I like(d) to keep a copy of the config myself, just the output of mdadm --detail so you know which disk is where. You may want to match it up with disk IDs/serial numbers in case the disk order changes inexplicably.
I also take a copy of the MBR with dd and use that to provide the partition info for new disks and replacements.

Aside from that, just set up the bitmap. It helps a ton if your disks are flaky.
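One way to keep that kind of record is sketched below. It assumes `mdadm` and `lsblk` are available; the backup directory, array name, and disk names are placeholders, and the function name is invented. It prints the commands rather than running them, so the list can be reviewed and then piped to a root shell:

```shell
# Print (not run) commands that record array and disk identity to a backup directory.
# Arguments: output directory, md device, then member disks. All are placeholders.
snapshot_cmds() {
    out=$1; md=$2; shift 2
    echo "mdadm --detail $md > $out/detail.txt"
    echo "lsblk -o NAME,SIZE,SERIAL,MODEL > $out/lsblk.txt"
    for dev in "$@"; do
        echo "mdadm --examine $dev > $out/$(basename "$dev").examine.txt"
    done
}

snapshot_cmds /mnt/offsite/raid-notes /dev/md0 /dev/sdX /dev/sdY
```

The `lsblk` line is what ties kernel names such as /dev/sdX to serial numbers, which matters because the kernel names can change between boots.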
 

frawst

Yeah, it's so much fun! I have a small fan blowing across them, and I'll see about flipping them over soon. As for the permutation method, I found the script via your previous link and had it running for about 20 hours before I realized that it may never work. The issue is that it tries to mount /dev/md0 directly, but my array holds an LVM volume with XFS on it, which shows up under a different path, provided it shows up and is active right away. This is where my ability really drops off; I don't know Perl well enough to tweak the script.

And future me will have that kind of thing on a cron job to an offsite location! The serial number thing is a real concern that I've already run into. So anything like that, whether it's the output of blkid or lsblk or anything else that would prove useful for mapping, will be saved.



Goose said: "I also, take a copy of the MBR through DD and use that to provide the partition info for new disks and replacements.

Aside from that, just setup the bitmap. It helps a ton if your disks are flaky."
This part has lost me a bit, and it also has my interest. By now the partition tables are at a point where I don't know what they should be. This is what has me worried that I'm out of luck; I just don't know enough to know what to look for. In my previous messages I mentioned that at least a couple of disks used partitions, while the majority didn't. Every time I think about it, I get that sinking feeling. I just wish I knew more.
 

EffrafaxOfWug

frawst said: "Issue is that it tries to mount /dev/md0 directly, But the system contains an XFS LVM filesystem that generally shows up under a different location, provided it shows up and is active off the bat. This is where I really start to drop on what I can do"
If you're able to recover the RAID array enough that it comes to life and is identifiable as an LVM physical volume, its volumes should appear either immediately under /dev/mapper or perhaps with the aid of a pvscan. Once the PV is available, hopefully the child VGs and LVs will also become visible, allowing you to mount the XFS filesystem(s) stored within them.
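That sequence can be sketched as a small check to run after each candidate array is assembled. The LV path and mount point are hypothetical, and the helper names are invented. The `run` wrapper defaults to a dry run that only echoes each command, so nothing touches the disks unless DRY_RUN=0 is set deliberately (as root, against clones):

```shell
# Dry-run by default: commands are echoed, not executed.
# Set DRY_RUN=0 to really run them (as root, and only against cloned disks).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Look for LVM on the assembled array, activate it, and try a read-only mount.
# Arguments: logical volume path and mount point (both placeholders).
check_lvm() {
    lv=$1; mnt=$2
    run pvscan &&
    run vgchange -ay &&
    run lvs &&
    run mount -o ro "$lv" "$mnt"
}

check_lvm /dev/myvg/mylv /mnt/recovery
```

Mounting read-only keeps a wrong guess at the disk order from writing anything new to the filesystem while the candidate is being checked.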

frawst said: "This portion has lost me a bit, and also has my interest, By now the partition tables are at a point where I don't know what they should be. This is the part that has me worried that I'm out of luck. I just don't know enough to know what to look for. In my previous messages I had mentioned that at least a couple of disks used partitons, where the majority didn't. Every time I think about it, I get that sinking feeling, I just wish I knew more."
I also learned that recovering mdraid discs without partition tables was a right pain in the posterior, so I've been using partitions on them since forever (and, as I later found out, it helped a lot when resizing onto larger discs and making sure alignment was correct).

If you have them, MBRs are easy to back up: you just dd the first 512 bytes of your hard drive and you're done.
Code:
dd if=/dev/sda of=/path/to/my/sda_mbr.backup bs=512 count=1
If you're using GPT instead of MBR (needed on drives >2.2TB) then a better method is to use the sgdisk util which has a dedicated backup option.
Code:
sgdisk --backup=/path/to/my/sdb_gpt.backup /dev/sdb
Naturally, don't store these backups on the drives/arrays in question...!

Without partition markers as a guide, restores can be a lot harder especially if you've got crucial metadata right at the start of the disc.
 

frawst

TL;DR: I wish I knew more. But there are files! The RAID was created in a degraded state. How should I re-add the final drive without causing any further loss?
I'm still getting "failed: Structure needs cleaning (117)" in various places as I attempt to pull the data off. If you know how to safely fix this, I'm all ears (keeping in mind that this is all on cloned disks).

Okay. So I don't even know what to say at this point. I'm super excited, annoyed, and hesitant all at once. The problem would've been easily solved had I only known what to look for. About an hour ago I was on the verge of admitting defeat. I went over my captured mdadm --examine information one more time and compared it against what was being created by that permutation tool (as well as by my own attempts), and the first thing to jump out at me was the unused space, "Unused Space : before=243624 sectors, after=688 sectors". The "before" number was very different.

I decided to add --data-offset=121856 to my create command and used the original drive order as best I could (I knew 7 of the 9). The remaining two only gave 4 possible combinations. On the first attempt I saw /dev/md127 and /dev/md127p1. My heart started racing. I don't know why it has this "p1" (please elaborate if you have any idea), but to me it was a great sign. No LVM, though. So I tried another combination, and LVM showed up; I could have cried at that point. I went to mount it: "XFS: Bad Superblock blah blah blah". xfs_repair was no help; it wanted me to mount the filesystem first, but it wouldn't mount. I tried another combination: that error was gone, but the filesystem still wouldn't mount, saying it was dirty. With the last possible option, LVM came up and it mounted. I see files. I am now watching grass grow as I pull my data off the volume. Since I built the array in a degraded state, I'll need to add the remaining disk back in, and I'm so nervous about ruining it that I want input on how to do so. mdadm --detail shows the following:

/dev/md127:
Version : 1.2
Creation Time : Wed Mar 10 01:04:50 2021
Raid Level : raid5
Array Size : 62511235072 (59615.36 GiB 64011.50 GB)
Used Dev Size : 7813904384 (7451.92 GiB 8001.44 GB)
Raid Devices : 9
Total Devices : 8
Persistence : Superblock is persistent

Intent Bitmap : Internal

Update Time : Wed Mar 10 01:05:09 2021
State : clean, degraded
Active Devices : 8
Working Devices : 8
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 512K

Consistency Policy : bitmap

Name : NasFu.frostmournemc.com:127 (local to host NasFu.frostmournemc.com)
UUID : e43624f1:0623057f:115be9a3:213f0e42
Events : 7

Number Major Minor RaidDevice State
0 8 48 0 active sync /dev/sdd
1 8 0 1 active sync /dev/sda
- 0 0 2 removed
3 8 112 3 active sync /dev/sdh
4 8 80 4 active sync /dev/sdf
5 8 96 5 active sync /dev/sdg
6 8 128 6 active sync /dev/sdi
7 8 16 7 active sync /dev/sdb
8 8 64 8 active sync /dev/sde
root@NasFu:~# mdadm --detail /dev/md127p1
/dev/md127p1:
Version : 1.2
Creation Time : Wed Mar 10 01:04:50 2021
Raid Level : raid5
Array Size : 54697300975 (52163.41 GiB 56010.04 GB)
Used Dev Size : 7813904384 (7451.92 GiB 8001.44 GB)
Raid Devices : 9
Total Devices : 8
Persistence : Superblock is persistent

Intent Bitmap : Internal

Update Time : Wed Mar 10 01:05:09 2021
State : clean, degraded
Active Devices : 8
Working Devices : 8
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 512K

Consistency Policy : bitmap

Name : NasFu.frostmournemc.com:127 (local to host NasFu.frostmournemc.com)
UUID : e43624f1:0623057f:115be9a3:213f0e42
Events : 7

Number Major Minor RaidDevice State
0 8 48 0 active sync /dev/sdd
1 8 0 1 active sync /dev/sda
- 0 0 2 removed
3 8 112 3 active sync /dev/sdh
4 8 80 4 active sync /dev/sdf
5 8 96 5 active sync /dev/sdg
6 8 128 6 active sync /dev/sdi
7 8 16 7 active sync /dev/sdb
8 8 64 8 active sync /dev/sde
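
A quick sanity check on the numbers in this post, as a sketch only. mdadm's --data-offset is given in KiB by default, while --examine reports sectors, so the 121856 passed to the create command can be converted back and compared with the "before" value. The helper names below are made up for illustration, and the 88-sector gap is simply what falls out of the two numbers posted here; that it is room for the superblock and bitmap is an assumption. The second check confirms that the reported Array Size is 8 data disks' worth of Used Dev Size, as expected for a 9-device RAID5.

```shell
# --data-offset takes KiB by default; mdadm --examine reports 512-byte sectors.
kib_to_sectors() { echo $(( $1 * 2 )); }

# RAID5 usable size in KiB: one device's worth of space holds parity.
# $1 = number of raid devices, $2 = used dev size in KiB
raid5_array_kib() { echo $(( ($1 - 1) * $2 )); }

offset_sectors=$(kib_to_sectors 121856)
echo "data offset: $offset_sectors sectors"                       # 243712
echo "gap past 'before': $(( offset_sectors - 243624 )) sectors"  # 88
echo "expected array size: $(raid5_array_kib 9 7813904384) KiB"   # 62511235072
```

The last figure matches the Array Size line in the /dev/md127 output above.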
 

Goose

New Member
Jan 16, 2019
14
2
3
If you're getting your stuff copied to your work disks, then don't "fix" your array, start fresh with partitions as I and another poster suggested. Also, maybe rethink if the extra layer LVM provides is worth it. It's another failure point and if you don't need to grow "disks" then why bother.

XFS on top of LVM auto-tunes itself to the initial array geometry (sunit, swidth and so on), and you can adjust those with mount options if you grow the array later.

The other possibility is to go ZFS at this stage... but perhaps that's too controversial.
 

frawst

New Member
Mar 2, 2021
21
3
3
At the moment all of this is being done on the work disks. The originals are shelved and put away for now. I'm working to get the data off of the work-disk array onto yet another batch of disks. I told my boss about my dilemma of not wanting to wipe my source disks for fear of acting too soon, and they've offered to lend more disks for an intermediate transfer. At this point I was already looking into ZFS! I have a TrueNAS VM up and was testing with it, and I was looking at going with raidz2. I'm all ears for opinions on what to do here.

As for my work disks, I just want to do whatever I can to get as much data off as possible. It's throwing "structure needs cleaning" a good bit. I've gotten a couple of TB off so far.
 

Evan

Well-Known Member
Jan 6, 2016
3,252
560
113
As long as it's working, I would just get your data off as first priority, and then, if you want, see if you can add the final disk. You could use md5deep to generate checksums for what you have copied off, and again for the array once the last drive has been added back, to check that they are the same. It takes a while to run, though.
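
A rough sketch of what adding the final disk might look like once the data is safely copied off. The `run` and `readd_disk` helpers are hypothetical, /dev/sdX is a placeholder for whichever disk belongs in the missing slot, and everything is wrapped in a dry run that only prints the commands unless DRY_RUN=0 is set. `mdadm --manage --add` and /proc/mdstat are standard, but whether this is the right move for an array re-created by hand is a judgment call, so treat it as an illustration rather than a recipe.

```shell
# Dry-run wrapper: prints commands by default, runs them only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper; the array and disk names passed in are placeholders.
readd_disk() {
    array=$1
    disk=$2
    run mdadm --detail "$array"                # confirm the state before touching anything
    run mdadm --manage "$array" --add "$disk"  # add the disk; md starts rebuilding onto it
    run cat /proc/mdstat                       # rebuild progress shows up here
}

readd_disk /dev/md127 /dev/sdX
```

With the default DRY_RUN=1 this only echoes the three commands, which makes it easy to double-check the device names first.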
 

frawst

New Member
Mar 2, 2021
21
3
3
I'll have to look into md5deep; that's a new one to me. I'm not entirely certain how to employ it, but from what I've done over the past week or so, I can definitely attest to it taking a while! Even on the disks that were lent to me (brand new IronWolf Pros), it took 10 hours at max speed to scan for binary data across all disks.
 

EffrafaxOfWug

Radioactive Member
Feb 12, 2015
1,390
494
83
It's probably enough of a diversion to warrant its own thread, but this sort of thing is very useful in a recovery scenario as a way of gauging how scunnered your files might be.

I've been using md5deep/hashdeep for years for the same sort of thing (I'm not using a filesystem with full checksumming, so it's a poor man's solution for spotting bitrot); you essentially just run it on your directory tree and it'll make an MD5/SHA/whatever hash of all the files within. You can save that out to a file, and then at a later date compare the hashes stored in the file against the hashes of the files as they are right now.

Create a list of MD5 hashes for the files under /stuff using 4 CPU threads and save to an audit file:
Code:
pushd /stuff && nice -n 19 ionice -c 3 hashdeep -c md5 -l -r -j 4 * > /var/lib/hashdeep/stuff_2021-03-10.hashdeep
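Compare the current files with the previously generated hash list (this uses negative matching, -x, which prints files whose hashes are not in the list; hashdeep also has a dedicated audit mode, -a):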
Code:
nice -n 19 ionice -c 3 hashdeep -r -j4 -c md5 -x -v -k /var/lib/hashdeep/stuff_2021-03-10.hashdeep /stuff
This'll output a list of all the files that have either changed since the audit file was made or didn't exist at the time, so whilst it's very useful for static datasets (movies, photos, etc.) it's not ideal for rapidly changing datasets.
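
For the copy-verification case discussed earlier in the thread (checking that what landed on the new disks matches what is on the array), the same idea can be sketched with plain coreutils when hashdeep isn't installed. This swaps in md5sum, find and diff instead of hashdeep; the function names and the example paths are hypothetical, and it assumes GNU find/sort/xargs.

```shell
# Hash every file under a directory, with paths relative to that directory
# so that two different trees produce comparable lines.
hash_tree() {
    ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r md5sum )
}

# Print the lines that differ between two trees; exit status 0 means identical.
compare_trees() {
    a=$(mktemp)
    b=$(mktemp)
    hash_tree "$1" > "$a"
    hash_tree "$2" > "$b"
    diff "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return $rc
}

# Example (placeholder paths):
#   compare_trees /mnt/recovered_array /mnt/new_disks && echo "copy matches"
```

Lines prefixed with `<` come from the first tree and `>` from the second, so a changed file shows up as a pair and a missing file as a single line.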
 