ZOL - recover missing MBR after physical problem


lenainjaune

New Member
Sep 21, 2022
We had a physical problem with our backup server (we suspect an electrical surge caused by a thunderstorm). The system is Debian Linux on one disk, and the storage is a ZFS (ZFS On Linux) pool mirroring two 4 TB disks (RAID-1). The first symptom we noticed was a frozen system. After several erratic boots we could not get past the BIOS, so we moved the system disk to another computer, which booted without problem and seemed stable. But when we moved the ZFS storage into it, only one disk was detected as part of the ZFS pool; it could still be loaded/mounted with zpool and the data were there (lsblk -f simply showed the other disk as unpartitioned). After several attempts to load the second disk, the first one in turn became unloadable and was also detected as unpartitioned.

Note: the commands and their results are given below.

So we checked the health of the two disks with the SMART tool smartctl, but nothing wrong was reported; the disks seemed operational. We then read the entire disks with dd, successfully and without any read error. badblocks also indicated that everything was OK. Finally we ran gpart, which has so far found only one possible partition, an empty Windows NT/W2K one, but the scan is not finished yet because the disks are big.

For now the only problem we can see is the missing MBR, but we have not found a tool to recover a ZFS MBR. How can we do this?

Additionally, since we have an outdated external clone disk (every month we replace one pool disk with another, which resilvers itself, so we can store the replaced disk off-site), we wondered whether we could copy its MBR onto the faulty disks. We are not sure whether the MBR is exactly the same on a ZFS mirror member as on its twin, or whether differences appear only after the MBR is executed. If it is possible to clone it, how can we do this with dd?
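To make the question concrete, here is roughly what we have in mind, with /dev/sdX as the healthy clone and /dev/sdY as a faulty disk (placeholder names), and assuming both members really have an identical partition layout:
Code:
# save the first 1 MiB of the healthy clone (boot sector + partition table area) to a file first
dd if=/dev/sdX of=/root/sdX-first1M.bin bs=1M count=1
# inspect it before writing anything anywhere
hexdump -C /root/sdX-first1M.bin | head
# only once we are sure the layouts match, copy just the first sector (the MBR) to the faulty disk
# dd if=/root/sdX-first1M.bin of=/dev/sdY bs=512 count=1
Note that when ZFS on Linux is given a whole disk it normally labels it with GPT (a large -part1 plus a small -part9), and GPT also keeps a backup table at the end of the disk, so copying only the first sector may not be enough.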

The tests and the results

Code:
root@CZ-LIVE:~# lsblk -o NAME,SIZE,FSTYPE
NAME     SIZE FSTYPE
...
sda      3,6T
...
=> no ZFS Filesystem detected

Code:
root@CZ-LIVE:~# smartctl -t long /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-2-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 54 minutes for test to complete.
Test will complete after Fri Sep 9 17:24:14 2022 UTC
Use smartctl -X to abort test.

root@CZ-LIVE:~# smartctl -l selftest /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-2-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 4660 -

root@CZ-LIVE:~# smartctl -A /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-2-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x0003 100 100 006 Pre-fail Always - 0
3 Spin_Up_Time 0x0003 100 100 000 Pre-fail Always - 16
4 Start_Stop_Count 0x0002 100 100 020 Old_age Always - 100
5 Reallocated_Sector_Ct 0x0003 100 100 036 Pre-fail Always - 0
9 Power_On_Hours 0x0003 100 100 000 Pre-fail Always - 1
12 Power_Cycle_Count 0x0003 100 100 000 Pre-fail Always - 0
190 Airflow_Temperature_Cel 0x0003 069 069 050 Pre-fail Always - 31 (Min/Max 31/31)

root@CZ-LIVE:~# smartctl -A /dev/sda | \
grep -iE "Power_On_Hours|G-Sense_Error_Rate|Reallocated|Pending|Uncorrectable"
5 Reallocated_Sector_Ct 0x0003 100 100 036 Pre-fail Always - 0
9 Power_On_Hours 0x0003 100 100 000 Pre-fail Always - 1
=> nothing abnormal reported by the disk's internal SMART counters

dd shows whether there are read errors (source):

Code:
root@CZ-LIVE:~# dd if=/dev/sda of=/dev/null bs=64k conv=noerror status=progress
4000784842752 bytes (4.0 TB, 3.6 TiB) copied, 104555 s, 38.3 MB/s
61047148+1 records in
61047148+1 records out
4000785948160 bytes (4.0 TB, 3.6 TiB) copied, 104556 s, 38.3 MB/s
=> no read errors

Code:
root@CZ-LIVE:~# date ; badblocks -svn /dev/sda ; date
Fri Sep 16 17:00:06 UTC 2022
Checking for bad blocks in non-destructive read-write mode
From block 0 to 3907017526
Checking for bad blocks (non-destructive read-write test)
Testing with random pattern: done
Pass completed, 0 bad blocks found. (0/0/0 errors)
Sun Sep 18 01:54:49 UTC 2022
=> no block error

Code:
root@CZ-LIVE:~# gpart /dev/sda

Begin scan...
Possible partition(Windows NT/W2K FS), size(0mb), offset(345079mb)
=> an empty partition ... not detected as ZFS
 

andrewbedia

Well-Known Member
Jan 11, 2013
have you tried using testdisk to find the partition?
you might also be able to dump the partition table from one drive and restore it to the other with gdisk/fdisk (assuming they both had the same size partition for zfs and in the same spot on both)
you might also be able to import the one disk readonly=on
you might be able to rewind the one disk to an importable state (warning: possible data loss, but it's better than nothing).
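for the read-only and rewind imports, the rough shape would be something like this (pool name is a placeholder; -n only dry-runs the rewind):
Code:
# read-only import, no writes to the pool
zpool import -d /dev/disk/by-id -o readonly=on poolname
# dry-run a rewind to the last importable transaction group (nothing is changed with -n)
zpool import -F -n poolname
# real rewind if the dry run looks ok (the most recent writes may be lost)
zpool import -F poolname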
 

pricklypunter

Well-Known Member
Nov 10, 2015
Canada
Before you do anything, bit copy/ image both disks onto known good media. You can use dd or similar to achieve this. If all else fails, you can return the disks to their pre-existing states and try again, as many times as you like, before hopefully/ successfully recovering your data. You can also work with the image files, instead of the disks, and leave the disks untouched, which would be my preferred option. Either way, make sure you have a copy of it all before poking around otherwise you definitely risk data loss :)
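A rough sketch of that, assuming /dev/sda and /dev/sdb are the two pool members and /mnt/safe is known-good storage with enough room (all placeholder names):
Code:
# bit-for-bit image of each member; conv=noerror,sync keeps going past unreadable sectors
dd if=/dev/sda of=/mnt/safe/sda.img bs=1M conv=noerror,sync status=progress
dd if=/dev/sdb of=/mnt/safe/sdb.img bs=1M conv=noerror,sync status=progress
# zpool import can scan plain image files in a directory, so the pool can be probed from the copies
zpool import -d /mnt/safe -o readonly=on poolname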
 

lenainjaune

New Member
Sep 21, 2022
Hello, and thanks a lot for your help ;)

Sorry for the delay, but in order not to make the problem worse I had to get the backup system operational again urgently, and that is not easy with our equipment and my level of knowledge.
have you tried using testdisk to find the partition?
Not directly on the physical disks... but I tried it in a virtual environment (QEMU/KVM) with 2 vHDs in QCOW2 format. I made a pool with these 2 vHDs and added data to it. After deleting the MBR on one vHD with parted -a cylinder /dev/sdb -s "mklabel msdos", I tried to recover it with testdisk... without success, because it did not find the partitions. I wonder whether simulating this in a virtual environment is even relevant.

you might also be able to dump the partition table from one drive and restore it to the other with gdisk/fdisk (assuming they both had the same size partition for zfs and in the same spot on both)
Yes! As I indicated in my OP, this is a way I wanted to explore, but without knowing exactly which part of the disk to copy.
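If we do go that way, I imagine the partition-table copy with sgdisk would look roughly like this (assuming /dev/sdX is the surviving member and /dev/sdY the disk that lost its table, both placeholders):
Code:
# back up the GPT of the healthy member to a file first
sgdisk --backup=/root/sdX-gpt.bin /dev/sdX
# replicate its partition table onto the damaged member
sgdisk --replicate=/dev/sdY /dev/sdX
# give the copy fresh partition GUIDs so the two disks do not share identifiers
sgdisk --randomize-guids /dev/sdY
As far as I understand, ZFS identifies mirror members by its own labels inside the partition, not by the GPT GUIDs, so only the layout has to match for the labels to become visible again.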

you might also be able to import the one disk readonly=on
Already tried, but without success! First of all, lsblk does not even see the ZFS partition.

you might be able to rewind the one disk to an importable state (warning: possible data loss, but it's better than nothing).
Yes, we took disks off-site over the last 2 months, and for now we are trying to recover from these latest backups (I will present this below).

Before you do anything, bit copy/ image both disks onto known good media. You can use dd or similar to achieve this. If all else fails, you can return the disks to their pre-existing states and try again, as many times as you like, before hopefully/ successfully recovering your data. You can also work with the image files, instead of the disks, and leave the disks untouched, which would be my preferred option. Either way, make sure you have a copy of it all before poking around otherwise you definitely risk data loss
We do not have spare disks of sufficient size for that, but I agree with you: the first thing I always do with a sensitive drive, before testing anything on it, is to clone its contents bit for bit onto another drive or into an image file, and then test the different methods on that copy.

---

We are a French "1901 law" non-profit association and we have a NAS where all our documents are stored. Twice a day we make differential backups of the NAS onto a backup server, which uses the rsnapshot tool to retain one year of data. The NAS and the backup server each use a two-disk mirrored pool on a ZFS filesystem. We opted for a mirror because, in case of disaster, we only have to eject one disk from its external removable rack (we just open the rack door and pull the disk) to quickly recover the data as they were before the problem (since it holds one year of activity, it is better to eject a backup-server disk). This constrained us to install disks big enough to hold all the data (4 TB each on the backup server).

When I tested a disaster scenario, I discovered that we can eject a disk without exporting the ZFS pool first (from what I read recently here, exporting first is supposed to be mandatory, so I do not understand why it works without it) and then recover all the data from the ZFS pool on a freshly installed virtual system.
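For what it's worth, here is what I now understand the "clean" way to pull a member would be (the pool name comes from our setup; the device name is just an example):
Code:
# before pulling a mirror member from its rack, either export the whole pool...
zpool export pool_bkp
# ...or at least take that member offline first
zpool offline pool_bkp ata-WDC_WD40EFZX-68AWUN0_WD-XXXXXXXX
# on the recovery machine, the single ejected disk can then be imported (degraded, read-only)
zpool import -d /dev/disk/by-id -f -o readonly=on pool_bkp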

The latest disaster was neither a fire nor a flood; we suspect an electrical surge caused by a thunderstorm.

As I do not know this area well, I wondered whether the disks that were in place when the disaster struck could themselves have become a vector for electrical problems, since we observed some erratic behaviour.

I had initially written up all these erratic behaviours and all the tests I made in detail, but as it took a lot of space I preferred to trim it and keep it locally; I can paste its contents here if necessary.

I made the mistake of trying to resilver onto the second "disastered" disk (since it had not worked on the first), which did not work as I expected, and I did not realize at the time that I was destroying the last chance of recovering the data as they were just before the disaster. I suppose that resilvering onto a disk really does destroy the old data and there is no DIY way to recover them (unless you are the FBI), and since the resilvering process was interrupted and failed, I suppose the disks ended up in an inconsistent state. So I suppose the disks are definitively lost... but I am not really sure :) !

Lastly, I tried to recover and import (read-only) the ZFS pool from each of our 2 external backup disks. For the moment, though, I cannot import the oldest pool on a new system, as it triggers an error:
Code:
# zpool import -f MYPOOL -d /dev -o readonly=on -R /
cannot import 'MYPOOL': one or more devices is currently unavailable
# zpool import
   pool: MYPOOL
     id: 11234760596266434154
  state: UNAVAIL
status: The pool was last accessed by another system.
 action: The pool cannot be imported due to damaged devices or data.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-EY
 config:
    MYPOOL                                      UNAVAIL  missing device
      mirror-0                                    DEGRADED
        replacing-0                               UNAVAIL  insufficient replicas
          15902164808289257209                    OFFLINE
          15578373759769090835                    OFFLINE
          pci-0000:00:0f.0-ata-2                  UNAVAIL
        ata-WDC_WD40EFZX-68AWUN0_WD-WXA2DA14XTCH  ONLINE
Note: the 2 OFFLINE disks correspond to the different attempts to resilver with the damaged disks.

Also, I saw this when I tried to import from a system other than the original one:
Code:
# zpool import MYPOOL
cannot import 'MYPOOL': pool was previously in use from another system.
Last accessed by backup (hostid=70b6d139) at Sat Sep 24 21:19:46 2022
The pool can be imported, use 'zpool import -f' to import the pool.
But I cannot import it from the backup system (the backup server) either.

I am also studying this fantastic ZOL documentation, but for now nothing has repaired the problem.
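For the record, the import variants I am still considering from that documentation look like the following (MYPOOL and the numeric id come from the output above; -n keeps the rewind as a dry run). I would be grateful if someone could confirm whether they are safe to try here:
Code:
# force the import despite the foreign hostid, read-only
zpool import -d /dev/disk/by-id -f -o readonly=on MYPOOL
# the same, but using the pool id instead of the name
zpool import -d /dev/disk/by-id -f -o readonly=on 11234760596266434154
# dry-run a rewind to an older transaction group; -n makes no change
zpool import -f -F -n MYPOOL
# actual rewind, still read-only (the most recent transactions may be discarded)
zpool import -f -F -o readonly=on MYPOOL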

As I am frightened of trying to import the newest ZFS pool disk again only to find that it is unusable as well, I am focusing on the oldest one... and on your advice.

By the way, I do not know whether it would be advisable to open another thread to ask for help with the import that does not work.

Thanks again ;)
 

lenainjaune

New Member
Sep 21, 2022
As nobody has answered me, I will ask step by step.

Is there any possibility that a hard drive could become a vector for electrical problems, producing erratic behaviour or contaminating or physically destroying other devices?
 

dswartz

Active Member
Jul 14, 2011
I'm confused. So RAID1 and one disk works and the other doesn't? Why not just blow away the first N MB of the bad disk, and re-add it to the pool?
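something like this, assuming the pool can first be imported from the good disk and the by-id names below are placeholders:
Code:
# wipe any stale ZFS labels and partition signatures on the bad disk
zpool labelclear -f /dev/disk/by-id/BAD-DISK
wipefs -a /dev/disk/by-id/BAD-DISK
# then attach it back as a mirror of the surviving member and let it resilver
zpool attach poolname /dev/disk/by-id/GOOD-DISK /dev/disk/by-id/BAD-DISK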
 

lenainjaune

New Member
Sep 21, 2022
Why not just blow away the first N MB of the bad disk, and re-add it to the pool?
How do we determine N? Blow it away with dd? Why do this?

At this moment, the state of our 4 disks:
  • 2 seem destroyed, but without definitive proof
  • the oldest external backup (called OEB below): we can no longer manage to import and mount it because, after having worked once, the import now fails: either with "one or more devices is currently unavailable" from the source system, or with "pool was previously in use from another system" from another system, but if I force the import (-f) the message becomes the first one (see the posts above)
  • the newest external backup (called NEB below): probably the last operational disk, but we are afraid to try importing it because it is our last chance and we can no longer import OEB; for now we plan to make a physical clone of NEB with a hardware cloner and try to import the copy, but as we have no spare disks we must order some and wait for the delivery.
I have little knowledge of ZOL, so I am hoping for some explanations on how to recover OEB; I am studying ZOL myself, but for now I cannot find a solution for our scenario (forced import, scrub, etc.). Nothing works, so I need help.
 

lenainjaune

New Member
Sep 21, 2022
I do not know whether the following will help in understanding the problem (the disk is OEB; see above for the meaning):
Code:
root@vm-bullseye:~# zdb -l /dev/disk/by-id/ata-WDC_WD40EFZX-68AWUN0_WD-WXA2DA14XTCH-part1
------------------------------------
LABEL 0
------------------------------------
    version: 5000
    name: 'pool_bkp'
    state: 0
    txg: 14548634
    pool_guid: 11234760596266434154
    errata: 0
    hostid: 1891029305
    hostname: 'backup'
    top_guid: 17868619074112252488
    guid: 5458620634135658793
    vdev_children: 1
    vdev_tree:
        type: 'mirror'
        id: 0
        guid: 17868619074112252488
        whole_disk: 0
        metaslab_array: 68
        metaslab_shift: 34
        ashift: 12
        asize: 4000771997696
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'replacing'
            id: 0
            guid: 15188534990629522152
            whole_disk: 0
            create_txg: 4
            children[0]:
                type: 'disk'
                id: 0
                guid: 15902164808289257209
                path: '/dev/disk/by-path/pci-0000:00:08.0-ata-1-part1'
                devid: 'ata-WDC_WD40EFZX-68AWUN0_WD-WXB2DA18JTPS-part1'
                phys_path: 'pci-0000:00:08.0-ata-1'
                whole_disk: 1
                not_present: 1
                DTL: 329
                create_txg: 4
                offline: 1
            children[1]:
                type: 'disk'
                id: 1
                guid: 15578373759769090835
                path: '/dev/disk/by-path/pci-0000:00:0f.0-ata-2-part1'
                devid: 'ata-WDC_WD40EFRX-68N32N0_WD-WCC7K6FPUV0V-part1'
                phys_path: 'pci-0000:00:0f.0-ata-2'
                whole_disk: 1
                not_present: 1
                DTL: 237
                create_txg: 4
                offline: 1
                resilver_txg: 14548422
            children[2]:
                type: 'disk'
                id: 2
                guid: 3087451285824867019
                path: '/dev/disk/by-path/pci-0000:00:0f.0-ata-2-part1'
                devid: 'ata-WDC_WD40EFRX-68N32N0_WD-WCC7K2XRNX7E-part1'
                phys_path: 'pci-0000:00:0f.0-ata-2'
                whole_disk: 1
                DTL: 240
                create_txg: 4
                resilver_txg: 14548634
        children[1]:
            type: 'disk'
            id: 1
            guid: 5458620634135658793
            path: '/dev/disk/by-id/ata-WDC_WD40EFZX-68AWUN0_WD-WXA2DA14XTCH-part1'
            devid: 'ata-WDC_WD40EFZX-68AWUN0_WD-WXA2DA14XTCH-part1'
            phys_path: 'pci-0000:00:08.0-ata-2'
            whole_disk: 1
            DTL: 399
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    labels = 0 1 2 3
Another thing: I do not understand why the following says the pool cannot be imported due to damaged devices or data, when one mirror member (ata-WDC...) has the status ONLINE:
Code:
root@vm-bullseye:~# zpool import -f
   pool: pool_bkp
     id: 11234760596266434154
  state: UNAVAIL
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-EY
config:

    pool_bkp                                      UNAVAIL  missing device
      mirror-0                                    DEGRADED
        replacing-0                               UNAVAIL  insufficient replicas
          15902164808289257209                    OFFLINE
          15578373759769090835                    OFFLINE
          pci-0000:00:0f.0-ata-2                  UNAVAIL
        ata-WDC_WD40EFZX-68AWUN0_WD-WXA2DA14XTCH  ONLINE
 