Data Recovery Project

TuxDude · Jan 9, 2015

I'm not certain that this is the right forum for this, but a mod can move it if they like. In any case, on to the content...

I'm currently working on a small project here for a friend. He has had an Intel Matrix-Raid of 3x 500GB SATA drives running Raid-0 for quite a few years now, and just recently he decided it was time for an upgrade which I helped him with. During that process I learned that his array has actually been having issues for over a year now and as part of the upgrade it was decided to replace it with a single new 3TB disk. From his point of view - the Matrix Raid software telling him there was a problem was not really a problem as he could force the drive back online and everything appeared to keep working - but now attempts to copy all of his data to the new drive would always result in the system locking up (I'm guessing the files in bad areas were ones he didn't access often if at all). So it has fallen to me to recover as much of the data as possible from his old drives, and they are not in good shape. Here is a little snip of the smart data from one of them:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   001   001   036    Pre-fail  Always   FAILING_NOW 30822
198 Offline_Uncorrectable   0x0010   052   052   000    Old_age   Offline  -           1002

My question is whether this is a subject people here would find interesting to read about. If it is I'm happy to keep this thread up to date with the steps I take as well as the results.

OBasel · Jan 9, 2015

I don't know how to help but I'm interested.

Chuckleb · Jan 9, 2015

I would love to follow this thread.

sboesch · Jan 10, 2015

Ghost knows how to continue on error as well as read Intel software RAID. You could try booting from Ghost and then cloning to the new 3TB disk.

rubylaser · Jan 10, 2015

I would use gddrescue to clone those bad disks to new drives if you have three spares lying around (so you can work on the recovery on good, working disks). Then, I would use the cloned disks to re-assemble the array. If you can get that working, then you can copy the files to the new disk.

But, Ghost as suggested above should work too.

Mike · Jan 10, 2015

rubylaser said:
I would use gddrescue to clone those bad disks to new drives if you have three spares lying around (so you can work on the recovery on good, working disks). Then, I would use the cloned disks to re-assemble the array. If you can get that working, then you can copy the files to the new disk.

But, Ghost as suggested above should work too.

Or mount the images from the new 3TB disk.

TuxDude · Jan 10, 2015

Thanks for the suggestions - I'm actually already well into the recovery and have my plan (and yes, it does involve GNU ddrescue, which I will refer to as just 'ddrescue'. 'gddrescue' is the Debian/Ubuntu package for ddrescue, but as I don't use either of those distros I had to do a quick google to find out what that was). This thread was more about whether the readers here would enjoy reading the story/documentation of how to perform the recovery, and the results. And it appears there is at least some interest so I will keep updating this thread as I go.

I'm rather short on time right now so I will write a full story of the first part of the recovery later. But for now I will say that of the three 500GB source drives, I've got a complete error-free image of the only drive reporting good health, a complete error-free image of a drive that has serious issues spinning up and is making really bad noises, and an image containing all but 200MB of the drive with all the bad sectors whose SMART info I originally posted. I'm still working on reducing the amount of missing data from that image but it's a slow process.

mackle · Jan 10, 2015

Another example for me of how bad Matrix raid is. Look forward to the full write up.

rubylaser · Jan 11, 2015

I'm sure a write up will be helpful for others. I'm glad your ddrescuing was so successful with two bad drives.

TuxDude · Jan 12, 2015

I think the best way to write this all up is to do so as a series of steps. But writing is not my speciality and the step I am about to write here had a lot more thought go into it than how I would go about writing this thread. So at the very end I might end up going back and doing a ton of editing and combining all of these steps into a single post/document.

Step 0 - Theory and Planning

Before connecting a single drive it's important to have a plan of how the procedure is going to work. In this case we are starting with three 500GB SATA drives, at least one of which will be having some issues, but all three of them are still functional enough that the user was able to force them online and continue using his array. If any of the drives had failed to the point of not communicating over a SATA bus anymore then this recovery would be beyond my abilities - though if I could get an image of such a drive from a 3rd party data recovery company I would still be able to image the remaining drives myself and proceed through the other steps. Once we have images of all of the drives we can move up to the next layer in the stack, Intel Matrix RAID in this case. Luckily for us this is a well documented format and there are plenty of tools available that know how to read it. And then the final layer will be the NTFS filesystem - so long as none of the corruption has occurred in important NTFS areas this part should be quite easy.

You always have to be careful when dealing with failing drives, because everything you do could potentially cause more damage, or even kill the drive completely. I've come to trust 'ddrescue' as my go-to tool for these kind of situations because it was designed using that knowledge with an algorithm to recover as much data as possible while causing as little damage as possible. And because after going through a few situations like this it has given me better results than any other tool I've tried. So the plan for the first part of the rescue is to use my desktop, including a 2-bay USB-3 SATA dock and spare capacity on a 4TB internal SATA drive to recover as much raw data as possible from the provided drives. Once we have images (and make backups of them just in case), we no longer need to worry about anything we do causing further damage. And when we move on to the next layer we are also now dealing with a source that will not give us any errors, which can be important when there is some RAID involved. A RAID controller (hardware or software) will often toss a drive out of an array if it encounters an error, or fail a rebuild, or generally just not ignore that error and continue on its way. But in a situation like this we have to just accept that some data is lost and move on - our images won't return read errors for the sectors we were unable to recover - they will just return 0's. A non-ZFS RAID engine on top won't verify the data and so we will just end up with 0's in our NTFS image at the end of this process.

While there are a few tools available for dealing with the RAID layer in use here, I'm going to go with the tools that are already built into my desktop OS. The Linux md-RAID subsystem has had support for Intel metadata for quite a while now. So long as the metadata blocks survived corruption and made it to our images, this step should be as simple as setting up all 3 images as loopback devices, and then using mdadm to scan them for an array. If we do encounter any RAID-metadata corruption, as long as we can get the data from at least one of the drives we can still use mdadm to force-assemble the images into an array, specifying the exact chunk sizes and other information that we can find from the good drive. It might take a few guesses to get the drives in the right order, but it won't be the first time I've had to force-assemble an array. Worst case, as long as we know or can figure out the chunk size and the starting offset to the data, the last option is to use 'dmsetup' to manually create a new block device that is striped across the three loopback images the same way the Intel RAID would do it.
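For anyone who hasn't looked at how striping lays data out, here is a minimal sketch of the address arithmetic a RAID-0 style mapping performs. The disk count matches this array, but the chunk size of 256 sectors (128 KiB) is only an assumed value for illustration, and 'locate_sector' is a hypothetical helper, not part of mdadm or dmsetup.

```shell
#!/bin/sh
# Sketch: map a logical sector of a striped array to a member disk and the
# sector on that disk. Values are assumptions for illustration only.
disks=3      # number of member disks
chunk=256    # chunk size in 512-byte sectors (assumed: 128 KiB)

locate_sector() {
    logical=$1
    chunk_no=$(( logical / chunk ))      # which chunk of the array this is
    disk=$(( chunk_no % disks ))         # chunks rotate across the disks
    stripe=$(( chunk_no / disks ))       # how many full stripes came before
    offset=$(( stripe * chunk + logical % chunk ))
    echo "$disk $offset"
}

locate_sector 0
locate_sector 256
locate_sector 1000
```

With those assumed values, logical sector 1000 sits in the fourth chunk, which wraps back around to the first disk. If the disks are put in the wrong order the same arithmetic hands back the wrong disk for two out of every three chunks, which is why a wrong guess shows up so quickly.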

Once we have the array assembled from images I expect mounting the NTFS filesystem on it to just work. As above, hopefully the corruption hasn't gotten into the NTFS metadata. If it has then we can use filesystem check tools to try and recover things from the backup metadata blocks. If we had any issues in the previous layer when assembling the RAID then this is where we figure out if we got the drives in the correct order - if we were right then it just works. If they are in the wrong order, then the NTFS either won't mount, or will be 99% corrupt - unmount it and reassemble the RAID in a different order and try again. Once we have the NTFS mounted, the last step of the recovery is as simple as putting the new 3TB drive from the user into my SATA dock and copying the data across. I'll be using good old rsync for that task.

One final point to make - I'm going to experiment with a new method I came across to try and identify exactly which files have corruption in them. Before assembling the array and after making backups of the images, I'm going to use ddrescue's fill mode to replace the 0's in the corrupt areas with a different and hopefully unique data pattern. Then once the NTFS is mounted by doing a full search of every file for that pattern I should get a list of corrupt files as the result. I will then disassemble everything and use fill mode again to put the 0's back before doing the final data recovery step. That process is used as an example in the ddrescue documentation for fill mode for a single-drive recovery, so I will be adapting it slightly to work in a multi-device RAID setup.
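The search half of that idea can be sketched without touching any real images. In the sketch below the marker string and the demo files are made up, and a temporary directory stands in for the mounted NTFS filesystem; writing the marker into the bad areas is ddrescue's job, and the exact fill mode invocation is best taken from the ddrescue manual rather than from here.

```shell
#!/bin/sh
# Sketch: list every file that contains the fill pattern.
# The marker and the demo files are illustrative assumptions.
marker="UNRECOVERED-SECTOR-MARKER"
demo=$(mktemp -d)
printf 'intact data\n' > "$demo/good.txt"
printf 'some data %s more data\n' "$marker" > "$demo/damaged.bin"

# -r recurse, -l print only file names, -a treat binary files as text,
# -F match the marker as a fixed string rather than a regex
corrupt_files=$(grep -rlaF -- "$marker" "$demo")
echo "$corrupt_files"

rm -rf "$demo"
```

The marker needs to be something that is very unlikely to occur in real file content, otherwise intact files get reported too.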

TuxDude · Jan 12, 2015

Step 1 - Imaging the drives

The first thing I did was attach the drives to the SATA dock and use 'smartctl -a' to dump all of the SMART data from all of them. I also created a folder on my 4TB drive for everything related to this recovery, and in there created a .txt file for each drive named using the disk's serial number from the SMART info. The SMART info is copied into the proper file and saved, and then everything else I do to each drive gets appended to the end of that file. For the three drives here is the SMART data, serial numbers have been removed.

Note - SMART data removed as post was too long to allow posting. Sorry.

Then the imaging process begins. For a first pass each drive has the following command run against it (the command and resulting output are all appended to each disk's txt file for my own reference later)

ddrescue /dev/sdc ./<SERIAL-NUMBER>.img ./<SERIAL-NUMBER>.log

Referring to drives in the order that their SMART info was posted in, here's how things have gone, or are still ongoing.

The third drive completed its image with zero errors. Its SMART info was clean so I figured it would be easy to get out of the way first and stick it back on a shelf.

The second drive has issues spinning up. The SATA dock proved quite useful here as by quickly pushing the power button twice I could power-cycle just that drive in less time than it takes the platter to stop spinning. With each quick reset a little bit more speed was left in the platter and after about 5 attempts and some horrible noises the drive completed spinup and started talking to my desktop. Luckily once it was going I was able to get a complete image in the first attempt.

The first drive listed is the one that is still fighting with me 5 days later. Besides being full of bad sectors, after encountering an error the drive also locks up and won't respond to any commands until it is power-cycled. I also tried connecting it directly to a SATA port to see if the dock maybe had issues dealing with drives with errors, but the problem continued there, and required a full system power off/on to re-recognize the drive after an error. Just an OS reboot with that drive internal resulted in the system coming back up with that drive still offline. So back in the dock it went, and again it is very convenient to be able to power-cycle that drive without affecting the rest of the system. I've gotten very good at hitting 'power, power' on the dock, then 'up, enter' on my keyboard to restart the ddrescue process whenever the drive locks up. After a few days of doing that I've also noticed that the drive behaves much better when it is cold (room-temperature, sitting in the dock powered off while I am away for a while), so as I'm getting the remaining data to recover smaller and smaller, and encountering errors more often, I'm also starting to put couple-hour pauses into my recovery efforts to let the drive cool down. When I get to the point of re-trying error blocks I might even try freezing the drive a bit to see if it helps. At that point I'm less concerned with damage to the drive as I've already recovered as much of the data as I can anyways - either I'm lucky and get a few more KB recovered, or I kill it in the attempt having already recovered as much as possible.

As to the detailed process - the first run recovered the first 400GB of data then the drive locked up. Additional runs after power-cycling the drive were just locking it up again with very little extra data recovered.

I tried adding the '-R' parameter to run the recovery in reverse - hoping to start at the end of the disk and work back through good data towards the 400GB location until more errors were encountered, but that also recovered very little extra data before locking up the drive.

Adding '-d' to run in direct-access mode didn't help either - in areas with good sectors speed was reduced significantly without the OS's caching and read-ahead, and in areas with errors the drive still locked up.

Then I tried running with just '-M' as the only parameter, which really started making ddrescue jump around in the location it was trying to read from - but with '-d' removed when it did find an area with good data the read-ahead was working and it would recover it quite quickly until it found another error. Using repeated runs with only '-M' as the parameter and rebooting the SATA dock whenever the drive locked up I was able to get the amount of missing data from 100GB down to around 800MB over a couple days. Of course I wasn't sitting here rebooting the drive 24x7, but that was still a few hours of time spent babysitting the process.

At that point it seemed that additional attempts were causing the drive to lock up too fast, with too little recovered. Attempts to read large blocks combined with read-ahead meant I almost always hit an error and locked up the drive within seconds. Back to direct mode - using '-d -M' as the parameters let me work more slowly through areas full of scattered bad blocks and also still caused it to jump around in terms of the location it was trying to recover from. A few more hours of babysitting with these parameters got the missing data down to the 100MB mark.

I'm now running passes (with cooldown periods between) with just the '-d' parameter. No more jumping around, just slowly working through what's left of the drive in direct-access mode. Because of all the drive-rebooting and pauses it is turning into a very slow process, though I'd hate to think of how I'd be trying to do it without the SATA dock. As of this post, I have recovered all but 62690 kB from that drive. I think if I keep myself to just a few attempts per day (before work, after work, and before bed maybe, with cooldown time between) I should probably have recovered most of the non-error sectors by the end of the week and hopefully have the missing data down to under 5-10MB. Roughly every 2000 bad sectors (at 512 bytes each) represent 1MB of data, so I am hoping to eventually get my missing data down to a value that is close to what SMART tells me for current pending sectors. However that value is increasing due to my work on the drive (or maybe just because I'm trying to read sectors that already were bad, but without trying them the drive didn't know it yet), it is now at 1355, up from the 1002 reported at the start of the procedure.
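To keep an eye on how much is still missing between runs, the ddrescue log (called a mapfile in current ddrescue documentation) can be tallied with a few lines of shell. This is only a sketch: the sample file below is made up and much shorter than a real one, and it assumes the usual layout of hex 'pos size status' lines with '-' marking bad areas.

```shell
#!/bin/sh
# Sketch: total up the areas a ddrescue log/mapfile marks as bad ('-').
# The sample content is invented for illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# Mapfile. Sample data, not from the real drive.
# current_pos  current_status
0x00120000     ?
#      pos        size  status
0x00000000  0x00100000  +
0x00100000  0x00000400  -
0x00100400  0x0001FC00  +
0x00120000  0x00000200  -
0x00120200  0x00010000  ?
EOF

bad_bytes=0
bad_areas=0
while read -r pos size status; do
    case $pos in '#'*) continue ;; esac     # skip comment lines
    [ "$status" = "-" ] || continue         # keep only bad-sector areas
    bad_bytes=$(( bad_bytes + $size ))      # shell arithmetic accepts 0x hex
    bad_areas=$(( bad_areas + 1 ))
done < "$sample"

echo "bad areas: $bad_areas"
echo "bad bytes: $bad_bytes ($(( bad_bytes / 512 )) sectors of 512 bytes)"
rm -f "$sample"
```

Note that one bad area can cover several adjacent sectors, so the number of areas and the number of sectors are not the same thing.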

If I can get it to the point where the missing data and pending sectors numbers are pretty close, then I will start using ddrescue with the '-d -r 1' parameter to re-try the bad sectors. With the way this drive locks up I have a feeling more than 1 retry will just result in a lot of wasted time rebooting it. This is the point where I might also try freezing the drive or any other tricks I can think up - at this point any extra data recovered is a bonus and if I kill the drive in the attempt it will be with the knowledge that I got as much data off it as possible.

And with that, you might all have to wait a while before I move on to Step 2.

bx23.2005 · Jan 13, 2015

Go to grc.com and check out SpinRite. It's too late for backup software.

swerff · Jan 13, 2015

Once I've imaged a drive and it has fully puked, or I'm satisfied with what I've extracted, I make a copy of the image. That way when I mount the image and do any file system modifications, if a change is for the worse I'll have an undo button to start fresh, which won't involve the now dead HDD.

Although space gets piggish working with 3tb images.

I have 18x3tb drives raided, with one huge file system , so I have the room for projects like this.

TuxDude · Jan 13, 2015

swerff said:
Once I've imaged a drive and it has fully puked, or I'm satisfied with what I've extracted, I make a copy of the image. That way when I mount the image and do any file system modifications, if a change is for the worse I'll have an undo button to start fresh, which won't involve the now dead HDD.

Although space gets piggish working with 3tb images.

I have 18x3tb drives raided, with one huge file system , so I have the room for projects like this.

I will be making backups of the images onto my file-server before working with them - only around 40TB of raw capacity but still more than enough for this project. The other option, slightly more complicated but significantly faster to backup/roll-back and much lower capacity requirements, is to use linux device-mapper to make thin-provisioned snapshots of the images before mounting them. Then you only need as much extra capacity as the size of the changes you make, and creation and deletion of the snapshots is far faster than typing the commands required to make/remove them, compared to copying a multi-GB (or TB) image over a LAN.

swerff · Jan 14, 2015

I've done the LAN trick. I just hot swap the drives directly into the controller, unless they're PATA/EIDE. I have an extra $35 4-port card whose only task is hot hook-ups. Array is spanned across 4 other cards and on board serial.

I have a few sata cables and power plugs hanging out the side I made with a hole saw. Found a rubber grommet and presto . I just let the drives hang.

My data extraction extension with direct I/O access. No enclosure variable.

Not sure about you but once data is saved, or at best what I can make of it, if Windows still loads, I'll Sams the password, remove garbage files, then tar/pigz the image to squash zero space, and keep for later personal use. Working Windows keys come in handy.

I had fun yesterday dding a memory card in a phone mounted via USB with phone. I was able to pull lol photos off. Phone kept dropping SD card when reading too many errors. Had to keep booting into recovery and remounting.

Took a while, but I inched my way around with a log file.

TuxDude · Jan 18, 2015

Just a short update today. After many restarts of the flakey drive and re-runs of ddrescue I've finally finished a first attempt at reading every 512-byte sector on the drive. I've now recovered all but 1790 kB of the drive image, with ddrescue reporting 2598 bad sectors. SMART information for the drive is telling me that the current pending sectors is up to 1525. So I'm assuming that the extra 1000 errors from ddrescue are sectors that I should be able to recover but failed to in the first pass due to the drive's instability. Next up is to run 'ddrescue -d -r 1 /dev/sdc ./<SERIAL-NUMBER>.img ./<SERIAL-NUMBER>.log' and give each of those 2598 sectors a second attempt. An initial run of that command has further reduced the missing data to 1767 kB and the number of errors to 2566 before the drive locked up again.

It is cooling off again now, I will probably give it one more run a bit later tonight just before I go to bed. Also keep in mind that the fact that it is retrying sectors and which ones have already been retried is all saved in the log - further runs will be just the command 'ddrescue -d /dev/sdc ./<SERIAL-NUMBER>.img ./<SERIAL-NUMBER>.log' again as I don't want to do any additional retries of sectors that have already gotten their second attempt. After a bunch more reboots of the drive and runs of that command it will eventually report 'Finished' at which point I will know that each sector has had its single retry. Then my plan is to try freezing the drive (it's been behaving much better to this point when cool at least) and use '-r 1' again to give remaining unrecovered sectors a third attempt, with additional freezes whenever the drive locks up.

rubylaser · Jan 19, 2015

I wish you luck. Hopefully your friend is at least buying you beer for this Herculean effort.

TuxDude · Jan 28, 2016

So with a lot of other things going on in life, I kinda forgot about this thread. It didn't help at all that the drive spent far more time powered off and cooling down than it did trying to recover data. But eventually I got to the point where I decided it wasn't worth spending any more time on. Final status, I recovered all but 226KB spread across 408 errors of the problem drive.

So now we have 3 disk images, corresponding to the 3 disks that originally made up the array. Two of the images are complete, and the last one is almost complete except for 226KB of zeros where data used to be - but at least we can read those zeros without getting IO errors. Time to move on to the next step...

Step 2 - Prepare to assemble the array

Before we actually do anything, it's time to make some backups. As some of the following steps could possibly make (bad) changes to the image files, and it might not even be possible to recover all (or as much) of the data again, make copies of all three image files to a safe location - I made a copy of each on my NAS just in case. On top of that backup copy I also wanted to be able to test things without having to wait to restore 1.5TB over a 1Gbps network connection over and over. For that job I turned to snapshots.

First off, we need to turn the image files into block devices. A block device in Linux is just a device that you can read or write in blocks instead of files; a filesystem sits on top of a block device. A hard-drive (eg. /dev/sda) is a block device, a partition (/dev/sda1) is also a block device, a RAID array is a block device, etc. We will be using loopback devices to make our image files work like block devices. As we run these commands pay attention to the output, as we will need the loopback device names later on.

$ sudo losetup -f --show <disk1>.img
/dev/loop0
$ sudo losetup -f --show <disk2>.img
/dev/loop1
$ sudo losetup -f --show <disk3>.img
/dev/loop2

Ok, so now we can treat /dev/loop0 as if it was /dev/sdc or whatever the original drives would have been assigned. If it wasn't a RAID array we could run fdisk, fsck, mount, etc. directly against /dev/loop0. Then to save time if we have to do a restore let's snapshot the three new block devices. To set that up we need to know the exact sizes of each disk, and we need another image file for each disk where writes to the snapshot will end up going. These extra image files don't need to be the full size though, only big enough to hold all of the changes that we will be making.

$ sudo blockdev --getsize /dev/loop0
976773168
$ sudo blockdev --getsize /dev/loop1
976773168
$ sudo blockdev --getsize /dev/loop2
976773168

So all of our original disks are 976773168 blocks in size. 976773168 blocks * 512 bytes per block / 1024 bytes per KiB / 1024 KiB per MiB / 1024 MiB per GiB = 465.76174 GiB. Looks good considering we started with 500GB drives. Let's create some more image files and turn them into block devices as well. The truncate commands will create 10GB sparse files - a sparse file only uses as much disk space as the content written into it, like a thin-provisioned virtual machine, so the commands below actually only use up a few KB of disk space. The '.cow' in the filename stands for Copy-on-Write which is how the snapshots we will be making soon will work. Again pay attention to the output as we will need the loopback device names.

$ truncate -s 10G <disk1>.cow.img
$ truncate -s 10G <disk2>.cow.img
$ truncate -s 10G <disk3>.cow.img
$ sudo losetup -f --show <disk1>.cow.img
/dev/loop3
$ sudo losetup -f --show <disk2>.cow.img
/dev/loop4
$ sudo losetup -f --show <disk3>.cow.img
/dev/loop5
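The size conversion and the sparse file behaviour are both easy to check from a shell. A small sketch follows; the sector count is the blockdev output from above, while the 10 MiB demo file is only a scaled-down stand-in for the 10GB .cow.img files and lives in a temporary location.

```shell
#!/bin/sh
# Sketch: redo the size arithmetic and show what a sparse file looks like.
sectors=976773168                  # from 'blockdev --getsize' above
bytes=$(( sectors * 512 ))
gib=$(LC_ALL=C awk -v b="$bytes" 'BEGIN { printf "%.2f", b / 1024 / 1024 / 1024 }')
echo "$bytes bytes = $gib GiB"

# A small sparse file (10 MiB instead of 10 GiB to keep the demo light).
sparse=$(mktemp)
truncate -s 10M "$sparse"
apparent=$(stat -c %s "$sparse")   # apparent size in bytes (GNU stat)
echo "apparent size: $apparent bytes"
du -k "$sparse"                    # space actually allocated, normally close to zero
rm -f "$sparse"
```

The gap between the apparent size and what du reports is the space that has not been written yet, which is all of it for a freshly truncated file.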

Now we can finally create the snapshots. The following commands work directly against Linux's device-mapper subsystem. Normally users don't deal with it directly - LVM, dmraid, and other such technologies usually sit between users and device-mapper to make it more user friendly - but powerful things can be done with it. I'm not going to spend a lot of words describing it as it would take many pages, feel free to read the documentation on the 'dmsetup' command here: dmsetup(8): low level logical volume management - Linux man page if you want to decode these commands or see what other kinds of things it can do.

$ echo 0 976773168 snapshot-origin /dev/loop0 | sudo dmsetup create tmp-<disk1>-orig
$ echo 0 976773168 snapshot-origin /dev/loop1 | sudo dmsetup create tmp-<disk2>-orig
$ echo 0 976773168 snapshot-origin /dev/loop2 | sudo dmsetup create tmp-<disk3>-orig
$ echo 0 976773168 snapshot /dev/loop0 /dev/loop3 p 128 | sudo dmsetup create tmp-<disk1>-snap
$ echo 0 976773168 snapshot /dev/loop1 /dev/loop4 p 128 | sudo dmsetup create tmp-<disk2>-snap
$ echo 0 976773168 snapshot /dev/loop2 /dev/loop5 p 128 | sudo dmsetup create tmp-<disk3>-snap

Great - we've got lots more block devices! What we've really accomplished is that we now have three block devices named '/dev/mapper/tmp-<diskx>-snap' that look and work exactly like '/dev/loop0-2' that we created first, except that if we make any changes those writes will end up in the appropriate .cow.img file and our large image files remain untouched. If we break something now we can just tear down the device-mapper devices, delete and re-create the .cow.img files as empty files, and re-run the device-mapper commands and try again - two seconds of copy-pasting commands instead of waiting for 1.5TB to come over the network. For reference, this is the little script I kept in a text editor - pasting it into the terminal window would set me back to working on untouched images again.

# Tear down the snapshots and detach their copy-on-write files
sudo dmsetup remove tmp-<disk1>-snap
sudo dmsetup remove tmp-<disk2>-snap
sudo dmsetup remove tmp-<disk3>-snap
sudo losetup -d /dev/loop3
sudo losetup -d /dev/loop4
sudo losetup -d /dev/loop5
rm -f *.cow.img
# Re-create empty copy-on-write files (expected to come back as loop3-5)
truncate -s 10G <disk1>.cow.img
truncate -s 10G <disk2>.cow.img
truncate -s 10G <disk3>.cow.img
sudo losetup -f --show <disk1>.cow.img
sudo losetup -f --show <disk2>.cow.img
sudo losetup -f --show <disk3>.cow.img
# Re-create the snapshots on top of the untouched images
echo 0 976773168 snapshot /dev/loop0 /dev/loop3 p 128 | sudo dmsetup create tmp-<disk1>-snap
echo 0 976773168 snapshot /dev/loop1 /dev/loop4 p 128 | sudo dmsetup create tmp-<disk2>-snap
echo 0 976773168 snapshot /dev/loop2 /dev/loop5 p 128 | sudo dmsetup create tmp-<disk3>-snap

TuxDude · Jan 28, 2016

Step 3 - Assemble the array

I thought this was going to be part of the previous step, but ended up writing quite a lot more than I expected about getting the block devices and snapshots set up. But it's finally time to turn all of these images into an array...

This is also where I ran into a few unforeseen problems. While Linux software RAID does support the Intel Matrix Raid metadata format, it doesn't like to work with it unless the disks are connected through an Intel controller. Setting the environment variable IMSM_NO_PLATFORM bypasses some of those checks but I still couldn't get mdadm to assemble the images into an array. After a bunch of failed attempts I gave up, mdadm didn't want to be my friend and I already had all of the information I needed anyways. Back in step 1 while imaging the drives I had noticed in my logs that mdadm was trying to assemble the physical disks into an array every time my USB enclosure reset. Using a physical drive I was able to get 'mdadm --examine /dev/sdc' to tell me everything I need about the array, that being the raid level and chunk size.

/dev/sdc:
Magic : Intel Raid ISM Cfg Sig.
Version : 1.2.01
Orig Family : 8c8e8445
Family : 8c8e8445
Generation : 000103bb
Attributes : All supported
UUID : 0237ba90:3f428806:ed2c6128:12a47fe1
Checksum : f6b4e64a correct
MPB Sectors : 2
Disks : 3
RAID Devices : 1

[Volume0]:
UUID : b3b51825:314fd8b4:9b6126ad:273a493d
RAID Level : 0
Members : 3
Slots : [_U_]
Failed disk : 2
This Slot : ?
Array Size : 2930302976 (1397.28 GiB 1500.32 GB)
Per Dev Size : 976768003 (465.76 GiB 500.11 GB)
Sector Offset : 0
Num Stripes : 3815499
Chunk Size : 128 KiB
Reserved : 0
Migrate State : idle
Map State : failed
Dirty State : clean

Disk00 Serial : <disk1>:0
State : active failed
Id : ffffffff
Usable Size : 18446744073709545310

Disk01 Serial : <disk2>
State : active
Id : 00000000
Usable Size : 976766862 (465.76 GiB 500.10 GB)

Disk02 Serial : <disk3>
State : active failed
Id : 00000002
Usable Size : 976766862 (465.76 GiB 500.10 GB)
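If the --examine output has been saved to a text file, as I did for everything else in this recovery, the handful of fields that matter can be pulled out with a short script. This is a sketch only: the sample text is trimmed from the output above, 'field' is a hypothetical helper, and the conversion at the end is there because dmsetup tables are expressed in 512-byte sectors rather than KiB.

```shell
#!/bin/sh
# Sketch: extract RAID level, member count and chunk size from saved
# 'mdadm --examine' text. Sample trimmed from the output above.
examine_output='RAID Level : 0
Members : 3
Chunk Size : 128 KiB'

field() {
    # Print the value after "<name> :" for the first matching line,
    # ignoring any leading indentation on the line.
    printf '%s\n' "$examine_output" |
        awk -F' : ' -v name="$1" '{ sub(/^[ \t]+/, "", $1) } $1 == name { print $2; exit }'
}

level=$(field "RAID Level")
members=$(field "Members")
chunk_kib=$(field "Chunk Size")
chunk_kib=${chunk_kib% KiB}                    # drop the unit suffix
chunk_sectors=$(( chunk_kib * 1024 / 512 ))    # KiB -> 512-byte sectors

echo "level=$level members=$members chunk=${chunk_kib}KiB (${chunk_sectors} sectors)"
```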

With that information available we don't need mdadm anymore. As I mentioned in the previous post, we can control device-mapper directly, and it has a striped target that lays data out the same way RAID-0 does. The only thing remaining that we don't know is the order that the drives were originally in the array, but with only 3 drives there aren't a lot of options so we can simply try them all until one works. Here is the new command to assemble the array the hard way:

$ echo 0 2930319360 striped 3 128 /dev/mapper/tmp-<disk1>-snap 0 /dev/mapper/tmp-<disk2>-snap 0 /dev/mapper/tmp-<disk3>-snap 0 | sudo dmsetup create tmp-array

And once again we have a new block device - now spanning across the three images, writing one chunk to each before moving on to the next exactly the way a RAID-0 layout would do it. But did we guess right about the order of the disks?

$ sudo fdisk -l /dev/mapper/tmp-array
Disk /dev/mapper/tmp-array: 1.4 TiB, 1500323512320 bytes, 2930319360 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 65536 bytes / 196608 bytes
Disklabel type: dos
Disk identifier: 0x4b144b13

Device Boot Start End Sectors Size Id Type
/dev/mapper/tmp-array1 * 2048 206847 204800 100M 7 HPFS/NTFS/exFAT
/dev/mapper/tmp-array2 206848 2930300927 2930094080 1.4T 7 HPFS/NTFS/exFAT

We see a partition table, and the sizes make sense.
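The numbers can be cross-checked with a little arithmetic. The sketch below only uses figures copied from the commands and output above and reads nothing from a device. One thing it makes visible is that the chunk argument in a dmsetup striped table is counted in 512-byte sectors, so the 128 used here works out to 64 KiB, which is what fdisk reports as the minimum I/O size.

```shell
#!/bin/sh
# Sketch: sanity-check the striped table against the fdisk listing.
# All figures are copied from the output above.
per_disk=976773168        # sectors per image, from blockdev
disks=3
chunk=128                 # chunk argument in the dmsetup table (sectors)
table_len=2930319360      # length used in the dmsetup table (sectors)
last_part_end=2930300927  # end sector of the second partition

# Round each image down to a whole number of chunks, then add up the disks.
usable=$(( per_disk / chunk * chunk ))
expected_len=$(( usable * disks ))
echo "expected table length: $expected_len sectors"

min_io=$(( chunk * 512 ))       # one chunk, in bytes
opt_io=$(( min_io * disks ))    # one full stripe, in bytes
echo "I/O size: $min_io / $opt_io bytes"

if [ "$last_part_end" -lt "$table_len" ]; then
    echo "last partition ends inside the array"
fi
```

The rounding matters because the striped target expects a length that divides evenly into chunks across all of the stripes, so the last partial chunk of each image goes unused.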

$ sudo partprobe /dev/mapper/tmp-array
$ sudo mount /dev/mapper/tmp-array2 ./mnt

mount: wrong fs type, bad option, bad superblock on /dev/mapper/tmp-array2,
missing codepage or helper program, or other error

In some cases useful info is found in syslog - try
dmesg | tail or so.

Nope, that's not good. And because 'mount' often writes things to disks it's probably best to start over with fresh snapshots. After adding 3 more lines to the top of the script from the previous step to also clean up what we just did, starting over is only a couple seconds away.

sudo dmsetup remove tmp-array2
sudo dmsetup remove tmp-array1
sudo dmsetup remove tmp-array
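Since the only unknown is the disk order, the candidate tables can be generated rather than typed out by hand. The sketch below prints (and does not run) the striped table for each of the six possible orders, using disk1..disk3 as placeholder names in the tmp-<diskN>-snap pattern and the length and chunk values from the command above.

```shell
#!/bin/sh
# Sketch: print the striped table for every possible order of three disks.
# Nothing is created; the lines are only echoed.
length=2930319360
chunk=128
count=0

for a in 1 2 3; do
  for b in 1 2 3; do
    for c in 1 2 3; do
      # Keep only orderings that use each disk exactly once.
      if [ "$a" != "$b" ] && [ "$a" != "$c" ] && [ "$b" != "$c" ]; then
        echo "0 $length striped 3 $chunk /dev/mapper/tmp-disk$a-snap 0 /dev/mapper/tmp-disk$b-snap 0 /dev/mapper/tmp-disk$c-snap 0"
        count=$(( count + 1 ))
      fi
    done
  done
done

echo "$count candidate orders" >&2
```

Each printed line can be piped into 'sudo dmsetup create tmp-array' just like the original command, with the snapshots reset between attempts so that a failed mount never leaves writes behind.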

It took a few tries putting the disks into different orders, but I eventually ended up with this on my attempt to mount it:

$ sudo mount /dev/mapper/tmp-array2 ./mnt
The disk contains an unclean file system (0, 0).
Metadata kept in Windows cache, refused to mount.
Failed to mount '/dev/mapper/tmp-array2': Operation not permitted
The NTFS partition is in an unsafe state. Please resume and shutdown
Windows fully (no hibernation or fast restarting), or mount the volume
read-only with the 'ro' mount option.

Well, that's looking better - it's not surprising that a failed array wasn't unmounted cleanly.

$ sudo mount -o ro /dev/mapper/tmp-array2 ./mnt
$ ls mnt/
<List of files as you would expect to see on a Windows C: drive>

Oh look, there's the data we're after. Looking very promising. I opened a large video file and it played correctly with no corruption - a non-corrupt file that is large enough to span hundreds of chunks across all 3 drives tells me we definitely have our disks in the right order. We have now got a read-only copy of the entire 1.5TB NTFS filesystem, just with 226 KB of zeros somewhere in it where file data should be. At this point we could just start copying the data to a new location and let the user know that 99% of his data has been recovered but that there are some corrupt files. Or, we can figure out which files are corrupt - that step will follow in yet another post, but I've run out of time for writing today.

TuxDude · Jan 29, 2016

This should be the last post detailing this data recovery. It covers an optional step, but in my opinion it is still important to do. To summarize the current status before starting in on this: we started with three 500GB disks that used to be in a RAID-0 array controlled by an older Intel chipset, and have now gotten to the point where we can see the final 1.5TB array as a single device, mount the NTFS filesystem on it, and read the data. Being able to finally see the filesystem is also the first point where we can see how much of the overall capacity is still free space vs. how much is used to hold data - in this case there are only 17GB free. So much for hoping that the unrecoverable sectors might end up being in the free space; at 99% utilization there's also roughly a 99% chance that any given missing sector was used in a file somewhere. If this was my data I would sure want to know which files those are, so it's time to figure it out. Start by unmounting the filesystem, as we're going to be modifying the disk images again for this.

$ sudo umount ./mnt

Now as far as I know there are two ways to figure out which files have been impacted. The first method involves taking the log file from ddrescue from back in step 1, and dumping it in a human-readable format so that we have a list of exactly which sectors are still missing. Then we can do a bunch of math to calculate where in the raid array those sectors will land - at a 64K chunk size we would move to the next disk in the array every 128 sectors, and we figured out the order of the disks in the previous step. Then we can use low-level filesystem tools to tell us which file uses each of those blocks. I've gone through that procedure on an ext4 filesystem using debugfs before, and it is a ton of manual work where it is very easy to make a mistake. It might not be a bad way to go about things if you only have a couple of sectors, but we have 226KB worth scattered around, so we won't be doing that. If you wanted to start down that road, the command 'ddrescuelog -l- <diskx>.log' would give you the list of missing sectors.
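To give an idea of the "bunch of math" involved, here is a minimal sketch of the mapping from a sector on one member disk to a sector in the assembled array. It assumes what the dmsetup table earlier implies: a plain RAID-0 stripe, every member starting at offset 0, and a chunk of 128 sectors. The function names are hypothetical, and the final step (asking the filesystem which file owns a block) is not shown.

```shell
# member_to_array DISK_INDEX MEMBER_SECTOR CHUNK_SECTORS NDISKS
# DISK_INDEX is the 0-based position of the disk in the stripe order.
member_to_array() (
    idx=$1; sector=$2; chunk=$3; n=$4
    stripe=$((sector / chunk))     # which stripe row the sector sits in
    offset=$((sector % chunk))     # position inside that disk's chunk
    echo $(( (stripe * n + idx) * chunk + offset ))
)

# array_to_partition ARRAY_SECTOR PART_START
# Converts an array sector to a sector relative to the start of a partition.
array_to_partition() (
    echo $(( $1 - $2 ))
)

# Example: sector 1000 on the second disk (index 1) of a 3-disk array.
array_sector=$(member_to_array 1 1000 128 3)
echo "array sector: $array_sector"
```

For a sector that lands inside the big NTFS partition, 'array_to_partition' with the partition start from fdisk (206848) gives the sector relative to the filesystem, and that would have to be repeated for each of the 408 bad areas.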

The other method makes the computer do the work. Knowing where the missing sectors are in the image, we can replace the zeros that are currently there with unique data, then mount the filesystem and just do a search for that data. It's going to require scanning through every byte of the entire array, which might take a long time on a large array, but at least we can walk away and come back later when it's done. The way the ddrescue utility works, we need to start by creating a new file containing data that hopefully will be unique.

$ printf "STH0STH0" > tmpfile

Gives us an 8-byte file with the contents STH0STH0. Doing it that way instead of using 'echo data > file' or using a GUI text editor ensures that we don't have a line-feed at the end or anything else in our file - just those 8 bytes. When ddrescue fills our missing sectors with it we want the pattern to stay 'STH0STH0STH0STH0', not 'STH0STH0\nSTH0STH0\n' or some other thing. Then we can have ddrescue write that data over top of our snapshot block device, using the log file from step 1 so that we only write over sectors that are marked as missing. Make sure you did unmount the filesystem before making changes to the underlying disks.
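If you want to see the printf vs. echo difference for yourself before running the fill, a quick check along these lines should do it. It works on a scratch file from mktemp rather than the real tmpfile, and the variable names are just for illustration.

```shell
# Scratch file so the real pattern file is left alone.
scratch=$(mktemp)

# printf writes exactly the bytes given.
printf 'STH0STH0' > "$scratch"
printf_bytes=$(($(wc -c < "$scratch")))

# echo appends a line-feed, making the file one byte longer.
echo 'STH0STH0' > "$scratch"
echo_bytes=$(($(wc -c < "$scratch")))

rm -f "$scratch"
echo "printf wrote $printf_bytes bytes, echo wrote $echo_bytes bytes"
```

Running 'wc -c tmpfile' on the real pattern file is the same check in one line.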

$ sudo ddrescue --force --fill-mode=- tmpfile /dev/mapper/tmp-<disk1>-snap <disk1>.log

GNU ddrescue 1.17
Press Ctrl-C to interrupt
Initial status (read from logfile)
filled size: 0 B, filled areas: 0
remaining size: 226816 B, remaining areas: 408
Current status
filled size: 226816 B, filled areas: 408, current rate: 79872 B/s
remain size: 0 B, remain areas: 0, average rate: 71850 B/s
current pos: 472732 MB
Finished

And finally, remount the filesystem and search for any files containing our unique data, printing the file names to the terminal.

$ sudo mount -o ro /dev/mapper/tmp-array2 ./mnt
$ find ./mnt -type f -print0 | xargs -0 -n 1 grep -l "STH0STH0"

That last command will probably take quite a while to run, especially if you are dealing with a large array and/or slow devices. In this case there were 81 files damaged, and those files represent 36GB of disk capacity. One 512-byte missing sector in the middle of a 7GB .ISO file is a corrupt 7GB file - that is the reason I spent so much time on step 1 trying to recover every possible sector.
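To get from a list of file names to figures like "81 files" and "36GB", something like the following sketch could be used. It assumes the output of the search was saved to a file with one path per line (for example by adding '> damaged.txt' to the end of the command above; that file name is hypothetical) and that no file name contains a newline.

```shell
# summarize_list LISTFILE
# Prints the number of listed files that exist and their combined size in bytes.
summarize_list() (
    count=0; total=0
    while IFS= read -r path; do
        [ -f "$path" ] || continue          # skip anything that is not a regular file
        size=$(($(wc -c < "$path")))        # byte count of this file
        count=$((count + 1))
        total=$((total + size))
    done < "$1"
    echo "$count files, $total bytes"
)

# Usage, with the hypothetical list file:
#   summarize_list damaged.txt
```

Depending on the 'wc' implementation this may read each file in full, so on slow devices it could take a while as well.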

At this point there's nothing left to do but give the user their recovered data and the list of damaged files. I included the damaged files in his recovered data as well, just in case he wants to keep any of them - a single missing sector in a HD video stream might mean only a single bad frame somewhere in a video that otherwise plays perfectly. I was able to recover around 98% of the user's data (>99% if you look at missing sectors, but I call it 36GB of lost data), and hopefully the readers here learned something and/or were entertained as well.
