
Trying to Recover MDADM Array

Discussion in 'Linux Admins, Storage and Virtualization' started by Cole, Jan 20, 2017.

  1. Cole

    Cole Member

    Joined:
    Jul 29, 2015
    Messages:
    30
    Likes Received:
    1
    I have a RAID6 mdadm array running under OpenMediaVault that has some issues. I've tried some things but am in over my head now. All I need is to get this mounted so I can pull the data off to a different server. I'm at a loss as to what to do next.

    Thanks in advance.


    Output from "cat /proc/mdstat"
    ----------------------------------------------
    root@medianas:~# cat /proc/mdstat
    Personalities : [raid6] [raid5] [raid4]
    md0 : active (auto-read-only) raid6 sdb[0] sdf[13] sdj[14] sdp[12] sdo[11] sdn[10] sdm[9] sdl[8] sdk[7] sdi[6] sdh[5] sdg[4] sde[3] sdd[2] sdc[1]
    0 blocks super 1.2 level 6, 512k chunk, algorithm 2 [15/15] [UUUUUUUUUUUUUUU]
    resync=PENDING
    bitmap: 0/0 pages [0KB], 65536KB chunk

    unused devices: <none>
    ------------------------------------------------

    Output from "mdadm --detail /dev/md0"
    ------------------------------------------------
    root@medianas:~# mdadm --detail /dev/md0
    /dev/md0:
    Version : 1.2
    Creation Time : Sun Oct 9 19:46:47 2016
    Raid Level : raid6
    Used Dev Size : unknown
    Raid Devices : 15
    Total Devices : 15
    Persistence : Superblock is persistent

    Intent Bitmap : Internal

    Update Time : Fri Jan 20 23:31:28 2017
    State : clean, resyncing, Not Started (PENDING)
    Active Devices : 15
    Working Devices : 15
    Failed Devices : 0
    Spare Devices : 0

    Layout : left-symmetric
    Chunk Size : 512K

    Name : MediaNAS:media
    UUID : 48341ef3:3a295feb:52f54050:bdaf2b07
    Events : 25008

    Number Major Minor RaidDevice State
    0 8 16 0 active sync /dev/sdb
    1 8 32 1 active sync /dev/sdc
    2 8 48 2 active sync /dev/sdd
    3 8 64 3 active sync /dev/sde
    4 8 96 4 active sync /dev/sdg
    5 8 112 5 active sync /dev/sdh
    6 8 128 6 active sync /dev/sdi
    7 8 160 7 active sync /dev/sdk
    8 8 176 8 active sync /dev/sdl
    9 8 192 9 active sync /dev/sdm
    10 8 208 10 active sync /dev/sdn
    11 8 224 11 active sync /dev/sdo
    12 8 240 12 active sync /dev/sdp
    14 8 144 13 active sync /dev/sdj
    13 8 80 14 active sync /dev/sdf
    ---------------------------------------

    This is what I get when I try to mount the array
    ---------------------------------------
    root@medianas:~# mount /dev/md0 /media/media
    mount: wrong fs type, bad option, bad superblock on /dev/md0,
    missing codepage or helper program, or other error

    In some cases useful info is found in syslog - try
    dmesg | tail or so.
    root@medianas:~# mount -t xfs /dev/md0 /media/media
    Killed
    -----------------------------------------
     
    #1
  2. sullivan

    sullivan New Member

    Joined:
    Mar 27, 2016
    Messages:
    23
    Likes Received:
    14
    Based on the output from /proc/mdstat your array has gone into "auto-read-only" mode.

    The classic advice to get out of this is to run: mdadm --readwrite /dev/mdN

    You may want to try and figure out what caused this before you run the above command.

    I have used Linux MD raid (with many self-inflicted crashes, HW failures, and rebuilds) for many years, but I've never gotten stuck in this state.

    So I don't know what is most likely to cause this, but my guess is that something has happened that is preventing the array from being fully assembled during boot.

    This might be due to a failed kernel update where the initrd ramdisk was not fully populated, or some config files changed that affect the startup init scripts / systemd behavior. Or you may have done something that changed the name or the UUID of the array.

    BEFORE YOU TRY ANYTHING BELOW -- I recommend you do some googling on "mdadm --readwrite" first to find some other descriptions of how people handled this.

    First save away a copy of your /etc/mdadm.conf file. Then try regenerating it.

    mdadm --examine --scan

    And compare against your existing mdadm.conf. If they look significantly different then you can try replacing mdadm.conf. (But make sure you save the old one first...)

    mdadm --examine --scan > /etc/mdadm.conf
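
    For example, something along these lines (untested sketch; note that on a Debian-based system like OpenMediaVault the file is usually /etc/mdadm/mdadm.conf rather than /etc/mdadm.conf, so adjust the path to whatever actually exists on your box):
    Code:
    # keep a copy of the current config somewhere safe first
    cp /etc/mdadm/mdadm.conf /root/mdadm.conf.bak
    # see what mdadm thinks the arrays look like right now
    mdadm --examine --scan
    # compare the scan output against the existing config before replacing anything
    # (expect the ARRAY lines to match; other config lines like DEVICE/MAILADDR will show as differences)
    mdadm --examine --scan | diff /etc/mdadm/mdadm.conf -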

    At this point, try manually stopping and restarting the array.

    mdadm --stop /dev/mdN
    mdadm --assemble --scan

    If all of this works without obvious errors (check the kernel dmesg for I/O errors, etc.) then I'd be daring enough to try the --readwrite command.
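
    Pulled together, and assuming your array is still /dev/md0, the whole sequence would look roughly like this (untested sketch, so sanity-check each step against your own setup):
    Code:
    mdadm --stop /dev/md0
    mdadm --assemble --scan
    dmesg | tail -n 50           # look for I/O errors before going any further
    mdadm --readwrite /dev/md0   # only once the assembly looks clean
    cat /proc/mdstat             # keep an eye on the resync progress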

    If that gets you back up, but you end up stuck again after a reboot you may need to rebuild your initrd ramdisk. This varies widely between Linux distributions so you'll need to find instructions for your specific one.
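
    For what it's worth, the usual commands are something like the following. OpenMediaVault is Debian-based so the first line is probably the relevant one, but double-check against your distribution's docs:
    Code:
    update-initramfs -u -k all   # Debian / Ubuntu / OpenMediaVault
    dracut --force               # RHEL / CentOS / Fedora equivalent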

    Good Luck!
     
    #2
    Cole likes this.
  3. Cole

    Cole Member

    I went ahead and ran "mdadm --detail /dev/md0" again, and now the state is showing "State : active, Not Started". Does that change anything? I still can't mount the array. I get "mount: /dev/md0 is already mounted or /media busy".


    Honestly, I'm at the end of my knowledge on this. I'd be willing to do a TeamViewer session with someone and let them work directly with it. I have iKVM and SSH available to the server.
     
    #3
    Last edited: Jan 20, 2017
  4. sullivan

    sullivan New Member

    I would try running the "mount" command (no arguments) and look to see if it really is mounted.

    The error message you are seeing is from the kernel based on its mount-table state in RAM, not based on anything stored on the disks. So if the kernel says it is mounted or busy, this doesn't imply a problem with the array itself.

    If you continue to get "already mounted / busy" errors but the "mount" command doesn't show it as mounted then I would expect your kernel is partly crashed. I would reboot and start fresh trying to assemble the MD array from there.

    You might want to comment the array out of your /etc/fstab so the system does not try and mount it at boot time.
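
    A few quick ways to check, assuming the array is still showing up as /dev/md0 (your fstab entry may reference a UUID instead of the device name, so search for that too):
    Code:
    mount | grep md          # does the kernel think any md device is mounted?
    findmnt /dev/md0         # another view of the kernel's mount table
    grep -i md0 /etc/fstab   # find the line to comment out before rebooting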
     
    #4
    Cole likes this.
  5. Cole

    Cole Member


    Thank you for taking the time to assist me. I really appreciate it.

    Mount did not show the array as actually being mounted, so I'm rebooting and seeing what happens.
     
    #5
  6. Cole

    Cole Member

    Now I get this when I try to mount the array:
    ----------------------------------
    root@medianas:~# mount /dev/md127 /media/media
    mount: wrong fs type, bad option, bad superblock on /dev/md127,
    missing codepage or helper program, or other error

    In some cases useful info is found in syslog - try
    dmesg | tail or so.
    ----------------------------------
     
    #6
  7. sullivan

    sullivan New Member

    What do you get now when you run "cat /proc/mdstat"?

    If it shows that the array is active and the sync is no longer pending, then you may have an actual problem with the filesystem on the array being corrupted. This might have happened if you typed the wrong /dev/sdX device name when trying to do something else.

    What type of filesystem was this? Ext4 or something else? (In your first post it looks like you tried to mount an XFS filesystem.)

    There are some tools for repairing either one of these filesystems but they are different for each type.

    I probably can't help with filesystem recovery (never done it before), but someone else might be able to.
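
    If you want to poke at it without risking more damage, something read-only like this should be safe (I'm assuming the array is /dev/md127 after your reboot):
    Code:
    blkid /dev/md127           # reports the filesystem type, if a superblock is found
    xfs_repair -n /dev/md127   # if it turns out to be XFS: check in no-modify mode, changes nothing
    fsck.ext4 -n /dev/md127    # if it turns out to be ext4: check that answers "no" to all repairs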
     
    #7
    Cole likes this.
  8. Cole

    Cole Member

    This is the result of "cat /proc/mdstat"
    ----------------------------------
    root@medianas:~# cat /proc/mdstat
    Personalities : [raid6] [raid5] [raid4]
    md127 : active raid6 sdf[0] sdk[13] sdn[14] sdp[12] sdj[11] sdo[10] sde[9] sdd[8] sdc[7] sdm[6] sdb[5] sdl[4] sdi[3] sdh[2] sdg[1]
    0 blocks super 1.2 level 6, 512k chunk, algorithm 2 [15/15] [UUUUUUUUUUUUUUU]
    bitmap: 0/0 pages [0KB], 65536KB chunk

    unused devices: <none>
    ----------------------------------

    I believe it is an XFS filesystem. It's been a while since I created this.
     
    #8
  9. Cole

    Cole Member

    I ended up not being able to mount the array. I installed Windows on another drive and started looking for recovery software. It is only media on this array, but it represents a very large amount of time to re-rip/reacquire. I'm currently using UFS Explorer RAID Recovery; it was able to pick up the array and mount the XFS filesystem, and I am currently transferring the data to another server. I found a "special" copy of the software on the interwebs, but if this completes I am definitely buying a copy, even at $106 for a license.

    Thanks @sullivan for your help.
     
    #9
  10. Blinky 42

    Blinky 42 Active Member

    Joined:
    Aug 6, 2015
    Messages:
    365
    Likes Received:
    107
    Once you are done backing up your data, I would reboot into Linux and try to determine why the array got into that state, if possible, before rebuilding it.
    Of note in your posts above, the array changed names from /dev/md0 to /dev/md127. I have not seen that happen without a kernel change, or without the internal array IDs changing so that the kernel assigned a different device at boot, thinking it was something new.
    Look through the whole dmesg output (dmesg | less) or (dmesg > /tmp/ugh_dmesg; less /tmp/ugh_dmesg) and see if you notice anything about the drives taking a long time to spin up, the port speeds being odd (some coming up at 1.5Gb/s, some at 3Gb/s or 6Gb/s, for example), drives not responding to commands, or other timeouts.
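
    To narrow that down, greps along these lines can help (the exact kernel message wording varies a bit between kernel versions, so adjust as needed):
    Code:
    dmesg | grep -i 'sata link up'                        # negotiated speed per port (1.5 / 3.0 / 6.0 Gbps)
    dmesg | grep -iE 'ata[0-9]+.*(error|timeout|reset)'   # command errors, timeouts, link resets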

    Also run smartctl on all your drives, you could do a
    Code:
    for D in /dev/sd?; do echo "Checking $D "; smartctl -l error $D ; done > /tmp/all_smart_errs
    less /tmp/all_smart_errs
     
    and look at that to see if there are any errors that pop out.
    Doing "smartctl -a" instead of "smartctl -l error" has all the info and you can look at the counters and find the error section under a heading of "SMART Error Log"

    You can also check the previous system log files and see if you had any errors or warnings in there from around the time it started giving you problems. Try /var/log/messages or /var/log/syslog depending on the base distribution they use. The old log files may be rotated into /var/log/syslog.$DATE.gz or /var/log/syslog.$N or similar; just run less on them and it will decompress them in memory while you read through the log.
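
    zgrep saves you decompressing the rotated files by hand; something like this, with the log name adjusted to whatever your distribution actually uses:
    Code:
    zgrep -iE 'md[0-9]+|ata[0-9]+.*(error|timeout)' /var/log/syslog*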

    And before putting the same set of drives back into play again with the same configuration, I would run a long SMART test on each drive, and try to at least read all sectors on each of the drives with a
    Code:
    time dd if=/dev/sdX of=/dev/null bs=1M
    and check dmesg after that completes to see that no drive errors show up, and that the rough time to read the whole drive is in the same ballpark for each. If you are willing to scrub and rebuild the array, I would write data to the whole drive as well and check that those writes didn't cause errors, and compare the output of smartctl -a before and after each read/write pass to make sure the counters indicating Pre-Fail don't increase.
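
    One way to do the before/after comparison is just to dump the output to files and diff them, e.g. (replace sdX with each drive in turn):
    Code:
    smartctl -a /dev/sdX > /tmp/sdX_before.txt
    time dd if=/dev/sdX of=/dev/null bs=1M
    smartctl -a /dev/sdX > /tmp/sdX_after.txt
    diff /tmp/sdX_before.txt /tmp/sdX_after.txt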

    I don't see /dev/sda in your output, and assume it is the OS drive. Run your SMART tests on that as well to make sure the OS drive didn't go bad on you, which can cause all sorts of bizarre errors across the system.
     
    #10
  11. EffrafaxOfWug

    EffrafaxOfWug Radioactive Member

    Joined:
    Feb 12, 2015
    Messages:
    406
    Likes Received:
    142
    Hopefully I'm not stating the obvious, but I assume you've tried an XFS fsck/repair (a quick google tells me the correct commands should be xfs_check and xfs_repair) against the re-assembled /dev/md127, and that the contents of the member drives are all showing the correct results...? I notice you're not using partitions but raw drives; personally I always partition drives before adding them to mdadm, since it means a) you can guarantee there'll be no misalignment issues and b) it's usually immediately obvious if one of the partitions is damaged, instead of having to inspect each drive's superblock.

    It's been a while since I monkeyed with XFS, but if there are problems with the filesystem on the array they should show up in dmesg when you attempt the mount. If the filesystem is half-hosed, then xfs_check and xfs_repair are likely to be able to help. If you still get OS errors after that (like the device being busy), reboot into a bootable distro like SystemRescueCD and see if that handles things any differently (since it won't try to do things like mount filesystems automatically). If that also doesn't recognise your XFS filesystem, then it's likely that something happened in the rebuild process that hosed the array.
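
    For reference, on newer xfsprogs xfs_check has been deprecated in favour of xfs_repair's dry-run mode, so the rough flow would be something like this (only run the real repair once your data is backed up, and read up on -L before using it since it discards the XFS log):
    Code:
    xfs_repair -n /dev/md127   # dry run: report problems, change nothing
    dmesg | tail               # check for I/O errors from the member drives
    xfs_repair /dev/md127      # actual repair; may tell you to mount/unmount first, or to use -L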
     
    #11
  12. Cole

    Cole Member

    I'm going to try a few things once it's backed up to another server. It's currently copying, albeit a bit slowly. The software I'm using is not the fastest, it seems, but it beats reacquiring all that media.
     
    #12