
Trying to Recover MDADM Array

Discussion in 'Linux Admins, Storage and Virtualization' started by Cole, Jan 20, 2017.

  1. Cole

    Cole Member

    Joined:
    Jul 29, 2015
    Messages:
    30
    Likes Received:
    1
    I have a RAID6 mdadm array running under OpenMediaVault that has some issues. I've tried some things but am in over my head now. All I need is to get this mounted so I can pull the data off to a different server. I'm at a loss as to what to do next.

    Thanks in advance.


    Output from "cat /proc/mdstat"
    ----------------------------------------------
    root@medianas:~# cat /proc/mdstat
    Personalities : [raid6] [raid5] [raid4]
    md0 : active (auto-read-only) raid6 sdb[0] sdf[13] sdj[14] sdp[12] sdo[11] sdn[10] sdm[9] sdl[8] sdk[7] sdi[6] sdh[5] sdg[4] sde[3] sdd[2] sdc[1]
    0 blocks super 1.2 level 6, 512k chunk, algorithm 2 [15/15] [UUUUUUUUUUUUUUU]
    resync=PENDING
    bitmap: 0/0 pages [0KB], 65536KB chunk

    unused devices: <none>
    ------------------------------------------------

    Output from "mdadm --detail /dev/md0"
    ------------------------------------------------
    root@medianas:~# mdadm --detail /dev/md0
    /dev/md0:
    Version : 1.2
    Creation Time : Sun Oct 9 19:46:47 2016
    Raid Level : raid6
    Used Dev Size : unknown
    Raid Devices : 15
    Total Devices : 15
    Persistence : Superblock is persistent

    Intent Bitmap : Internal

    Update Time : Fri Jan 20 23:31:28 2017
    State : clean, resyncing, Not Started (PENDING)
    Active Devices : 15
    Working Devices : 15
    Failed Devices : 0
    Spare Devices : 0

    Layout : left-symmetric
    Chunk Size : 512K

    Name : MediaNAS:media
    UUID : 48341ef3:3a295feb:52f54050:bdaf2b07
    Events : 25008

    Number Major Minor RaidDevice State
    0 8 16 0 active sync /dev/sdb
    1 8 32 1 active sync /dev/sdc
    2 8 48 2 active sync /dev/sdd
    3 8 64 3 active sync /dev/sde
    4 8 96 4 active sync /dev/sdg
    5 8 112 5 active sync /dev/sdh
    6 8 128 6 active sync /dev/sdi
    7 8 160 7 active sync /dev/sdk
    8 8 176 8 active sync /dev/sdl
    9 8 192 9 active sync /dev/sdm
    10 8 208 10 active sync /dev/sdn
    11 8 224 11 active sync /dev/sdo
    12 8 240 12 active sync /dev/sdp
    14 8 144 13 active sync /dev/sdj
    13 8 80 14 active sync /dev/sdf
    ---------------------------------------

    This is what I get when I try to mount the array
    ---------------------------------------
    root@medianas:~# mount /dev/md0 /media/media
    mount: wrong fs type, bad option, bad superblock on /dev/md0,
    missing codepage or helper program, or other error

    In some cases useful info is found in syslog - try
    dmesg | tail or so.
    root@medianas:~# mount -t xfs /dev/md0 /media/media
    Killed
    -----------------------------------------
     
    #1
  2. sullivan

    sullivan New Member

    Joined:
    Mar 27, 2016
    Messages:
    23
    Likes Received:
    14
    Based on the output from /proc/mdstat your array has gone into "auto-read-only" mode.

    The classic advice to get out of this is to run: mdadm --readwrite /dev/mdN

    You may want to try and figure out what caused this before you run the above command.

    I have used Linux MD raid (with many self-inflicted crashes, HW failures, and rebuilds) for many years, but I've never gotten stuck in this state.

    So I don't know what is most likely to cause this, but my guess is that something has happened that is preventing the array from being fully assembled during boot.

    This might be due to a failed kernel update where the initrd ramdisk was not fully populated, or some config files changed that affect the startup init scripts / systemd behavior. Or you may have done something that changed the name or the UUID of the array.

    BEFORE YOU TRY ANYTHING BELOW -- I recommend you do some googling on "mdadm --readwrite" first to find some other descriptions of how people handled this.

    First save away a copy of your /etc/mdadm.conf file. Then try regenerating it.

    mdadm --examine --scan

    And compare against your existing mdadm.conf. If they look significantly different then you can try replacing mdadm.conf. (But make sure you save the old one first...)

    mdadm --examine --scan > /etc/mdadm.conf
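
    For example, something along these lines (untested sketch; note that on a Debian-based system like OpenMediaVault the file is usually /etc/mdadm/mdadm.conf rather than /etc/mdadm.conf, so adjust the path to whatever actually exists on your box):
    Code:
    # keep a copy of the current config somewhere safe first
    cp /etc/mdadm/mdadm.conf /root/mdadm.conf.bak
    # see what mdadm thinks the arrays look like right now
    mdadm --examine --scan
    # compare the scan output against the existing config before replacing anything
    # (expect the ARRAY lines to match; other config lines like DEVICE/MAILADDR will show as differences)
    mdadm --examine --scan | diff /etc/mdadm/mdadm.conf -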

    At this point, try manually stopping and restarting the array.

    mdadm --stop /dev/mdN
    mdadm --assemble --scan

    If all of this works without obvious errors (check the kernel dmesg for I/O errors, etc.) then I'd be daring enough to try the --readwrite command.
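
    Pulled together, and assuming your array is still /dev/md0, the whole sequence would look roughly like this (untested sketch, so sanity-check each step against your own setup):
    Code:
    mdadm --stop /dev/md0
    mdadm --assemble --scan
    dmesg | tail -n 50           # look for I/O errors before going any further
    mdadm --readwrite /dev/md0   # only once the assembly looks clean
    cat /proc/mdstat             # keep an eye on the resync progress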

    If that gets you back up, but you end up stuck again after a reboot you may need to rebuild your initrd ramdisk. This varies widely between Linux distributions so you'll need to find instructions for your specific one.
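
    For what it's worth, the usual commands are something like the following. OpenMediaVault is Debian-based so the first line is probably the relevant one, but double-check against your distribution's docs:
    Code:
    update-initramfs -u -k all   # Debian / Ubuntu / OpenMediaVault
    dracut --force               # RHEL / CentOS / Fedora equivalent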

    Good Luck!
     
    #2
    Cole likes this.
  3. Cole

    Cole Member

    I went ahead and ran "mdadm --detail /dev/md0" again, and now the state is showing "State : active, Not Started". Does that change anything? I still can't mount the array. I get "mount: /dev/md0 is already mounted or /media busy".


    Honestly, I'm at the end of my knowledge on this. I'd be willing to do a TeamViewer session with someone and let them work directly with it. I have iKVM and SSH available to the server.
     
    #3
    Last edited: Jan 20, 2017
  4. sullivan

    sullivan New Member

    I would try running the "mount" command (no arguments) and look to see if it really is mounted.

    The error message you are seeing is from the kernel based on its mount-table state in RAM, not based on anything stored on the disks. So if the kernel says it is mounted or busy, this doesn't imply a problem with the array itself.

    If you continue to get "already mounted / busy" errors but the "mount" command doesn't show it as mounted then I would expect your kernel is partly crashed. I would reboot and start fresh trying to assemble the MD array from there.

    You might want to comment the array out of your /etc/fstab so the system does not try and mount it at boot time.
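
    A few quick ways to check, assuming the array is still showing up as /dev/md0 (your fstab entry may reference a UUID instead of the device name, so search for that too):
    Code:
    mount | grep md          # does the kernel think any md device is mounted?
    findmnt /dev/md0         # another view of the kernel's mount table
    grep -i md0 /etc/fstab   # find the line to comment out before rebooting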
     
    #4
    Cole likes this.
  5. Cole

    Cole Member


    Thank you for taking the time to assist me. I really appreciate it.

    Mount did not show the array as actually being mounted, so I'm rebooting and seeing what happens.
     
    #5
  6. Cole

    Cole Member

    Now I get this when I try to mount the array:
    ----------------------------------
    root@medianas:~# mount /dev/md127 /media/media
    mount: wrong fs type, bad option, bad superblock on /dev/md127,
    missing codepage or helper program, or other error

    In some cases useful info is found in syslog - try
    dmesg | tail or so.
    ----------------------------------
     
    #6
  7. sullivan

    sullivan New Member

    What do you get now when you run "cat /proc/mdstat"?

    If it shows that the array is active and the sync is no longer pending, then you may have an actual problem with the filesystem on the array being corrupted. This might have happened if you typed the wrong /dev/sdX device name when trying to do something else.

    What type of filesystem was this? Ext4 or something else? (In your first post it looks like you tried to mount an XFS filesystem.)

    There are some tools for repairing either one of these filesystems but they are different for each type.

    I probably can't help with filesystem recovery (never done it before), but someone else might be able to.
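
    If you want to poke at it without risking more damage, something read-only like this should be safe (I'm assuming the array is /dev/md127 after your reboot):
    Code:
    blkid /dev/md127           # reports the filesystem type, if a superblock is found
    xfs_repair -n /dev/md127   # if it turns out to be XFS: check in no-modify mode, changes nothing
    fsck.ext4 -n /dev/md127    # if it turns out to be ext4: check that answers "no" to all repairs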
     
    #7
    Cole likes this.
  8. Cole

    Cole Member

    This is the result of "cat /proc/mdstat"
    ----------------------------------
    root@medianas:~# cat /proc/mdstat
    Personalities : [raid6] [raid5] [raid4]
    md127 : active raid6 sdf[0] sdk[13] sdn[14] sdp[12] sdj[11] sdo[10] sde[9] sdd[8] sdc[7] sdm[6] sdb[5] sdl[4] sdi[3] sdh[2] sdg[1]
    0 blocks super 1.2 level 6, 512k chunk, algorithm 2 [15/15] [UUUUUUUUUUUUUUU]
    bitmap: 0/0 pages [0KB], 65536KB chunk

    unused devices: <none>
    ----------------------------------

    I believe it is an XFS filesystem. It's been a while since I created this.
     
    #8
  9. Cole

    Cole Member

    I ended up not being able to mount the array. I installed Windows on another drive and started looking for recovery software. It is only media on this array, but it represents a very large amount of time to re-rip/reacquire. I'm currently using UFS Explorer RAID Recovery; it was able to pick up the array and mount the XFS filesystem, and I am currently transferring the data to another server. I found a "special" copy of the software on the interwebs, but if this completes I am definitely buying a copy, even at $106 for a license.

    Thanks @sullivan for your help.
     
    #9
  10. Blinky 42

    Blinky 42 Active Member

    Joined:
    Aug 6, 2015
    Messages:
    365
    Likes Received:
    107
    Once you are done backing up your data, I would reboot into Linux and try to determine why the array got into that state, if possible, before rebuilding it.
    Of note in your posts above, the array changed names from /dev/md0 to /dev/md127. I have not seen that happen without a kernel change, or without the internal array IDs changing so that the kernel assigned a different device at boot, thinking it was something new.
    Look through the whole dmesg output (dmesg | less) or (dmesg > /tmp/ugh_dmesg; less /tmp/ugh_dmesg) and see if you notice anything about the drives taking a long time to spin up, the port speeds being odd (some coming up at 1.5Gb/s, some at 3Gb/s or 6Gb/s, for example), drives not responding to commands, or other timeouts.
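
    To narrow that down, greps along these lines can help (the exact kernel message wording varies a bit between kernel versions, so adjust as needed):
    Code:
    dmesg | grep -i 'sata link up'                        # negotiated speed per port (1.5 / 3.0 / 6.0 Gbps)
    dmesg | grep -iE 'ata[0-9]+.*(error|timeout|reset)'   # command errors, timeouts, link resets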

    Also run smartctl on all your drives, you could do a
    Code:
    for D in /dev/sd?; do echo "Checking $D "; smartctl -l error $D ; done > /tmp/all_smart_errs
    less /tmp/all_smart_errs
     
    and look at that to see if there are any errors that pop out.
    Doing "smartctl -a" instead of "smartctl -l error" has all the info and you can look at the counters and find the error section under a heading of "SMART Error Log"

    You can also check the previous system log files and see if you had any errors or warnings in there from around the time it started giving you problems. Try /var/log/messages or /var/log/syslog depending on the base distribution they use. The old log files may be rotated into /var/log/syslog.$DATE.gz or /var/log/syslog.$N or similar; just run less on them and it will decompress them in memory while you read through the log.
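
    zgrep saves you decompressing the rotated files by hand; something like this, with the log name adjusted to whatever your distribution actually uses:
    Code:
    zgrep -iE 'md[0-9]+|ata[0-9]+.*(error|timeout)' /var/log/syslog*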

    And before putting the same set of drives back into play again with the same configuration, I would run a long SMART test on each drive, and try to at least read all sectors on each of the drives with a
    Code:
    time dd if=/dev/sdX of=/dev/null bs=1M
    and check dmesg after that completes to see that no drive errors show up, and that the rough time to read the whole drive is in the same ballpark for each. If you are willing to scrub and rebuild the array, I would write data to the whole drive as well and check that those writes didn't cause errors, and compare the output of smartctl -a before and after each read/write pass to make sure the counters indicating Pre-Fail don't increase.
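
    One way to do the before/after comparison is just to dump the output to files and diff them, e.g. (replace sdX with each drive in turn):
    Code:
    smartctl -a /dev/sdX > /tmp/sdX_before.txt
    time dd if=/dev/sdX of=/dev/null bs=1M
    smartctl -a /dev/sdX > /tmp/sdX_after.txt
    diff /tmp/sdX_before.txt /tmp/sdX_after.txt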

    I don't see /dev/sda in your output, and assume it is the OS drive. Run your SMART tests on that as well to make sure the OS drive didn't go bad on you, which can cause all sorts of bizarre errors across the system.
     
    #10
  11. EffrafaxOfWug

    EffrafaxOfWug Radioactive Member

    Joined:
    Feb 12, 2015
    Messages:
    406
    Likes Received:
    142
    Hopefully I'm not stating the obvious, but I assume you've tried an XFS fsck/repair (a quick google tells me the correct commands should be xfs_check and xfs_repair) against the re-assembled /dev/md127, and that the contents of the member drives are all showing the correct results...? I notice you're not using partitions but raw drives; personally I always partition drives before adding them to mdadm, since it means a) you can guarantee there'll be no misalignment issues and b) it's usually immediately obvious if one of the partitions is damaged, instead of having to inspect each drive's superblock.

    It's been a while since I monkeyed with XFS, but if there are problems with the filesystem on the array they should show up in dmesg when you attempt the mount. If the filesystem is half-hosed, then xfs_check and xfs_repair are likely to be able to help. If you still get OS errors after that (like the device being busy), reboot into a bootable distro like SystemRescueCD and see if that handles things any differently (since it won't try to do things like mount filesystems automatically). If that also doesn't recognise your XFS filesystem, then it's likely that something happened in the rebuild process that hosed the array.
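
    For reference, on newer xfsprogs xfs_check has been deprecated in favour of xfs_repair's dry-run mode, so the rough flow would be something like this (only run the real repair once your data is backed up, and read up on -L before using it since it discards the XFS log):
    Code:
    xfs_repair -n /dev/md127   # dry run: report problems, change nothing
    dmesg | tail               # check for I/O errors from the member drives
    xfs_repair /dev/md127      # actual repair; may tell you to mount/unmount first, or to use -L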
     
    #11
  12. Cole

    Cole Member

    I'm going to try a few things once it's backed up to another server. It's currently copying, albeit a bit slowly. The software I'm using is not the fastest, it seems, but it beats reacquiring all that media.
     
    #12