Converting an existing Debian system to LUKS root

EffrafaxOfWug · Aug 10, 2018

I had a look around on the net and didn't find anyone who'd done this precise sort of thing before (a fair few people have done it in slightly less complex setups though), so seeing as I spent a couple of days putting this together I thought it might help someone else out there. Bear in mind I've only attempted this on MBR-based systems so far and I'm still using good ol' sysV init - I'm not sure if systemd makes a difference to any of the below, but I doubt it since nearly all of the changes happen before init kicks in anyway. Nor has this been tested on anything other than Debian stretch, although I can't see most other Debian-derived distros being a million miles apart.

Some background: some of our kit in satellite offices is less physically secure than we'd like, so we've been standardising on making sure things like file server storage sit on LUKS crypto devices to mitigate the risk of data being physically stolen (I've had HDDs physically stolen from servers in a previous job, and I suspect the risk applies to many of you at home too). But that's meant either servers coming up without the data partitions mounted, manual intervention when booting (to input the password when the volume is mounted) or storing LUKS keys on unencrypted filesystems. Anyway, seeing as all our servers are IPMI'd up the wazoo we've agreed to start encrypting all the boot drives as well as the data drives so that the LUKS keys can be stored on the root filesystem, with an operator there at boot to input the "master" password to open the root drive, whereupon the LUKS keys can be accessed and used to open the other drives. Servers are never intentionally rebooted without an operator on the IPMI console anyway so it's not a major inconvenience. The question was how to do this with as little disruption as possible - ideally by converting in place, avoiding a reinstall and partial restore (a full restore would be out of the window since too many niggly bits in /boot and /etc will have changed).

So we started looking for a way to convert a running system. The existing disc layout is two SSDs holding the OS root and other system partitions.

Our general "old" device/filesystem layout is as follows:

Code:

	/dev/sda1	512MB fd00
	/dev/sdb1	512MB fd00
		Combined into mdadm RAID1 /dev/md0
			Formatted ext4 and mounted as /boot
			
	/dev/sda2	96GB fd00
	/dev/sdb2	96GB fd00
		Combined into mdadm RAID1 /dev/md1
			Formatted as an LVM physical volume (PV)
				Volume group (VG) "root_vg"
					Logical volumes (LV) for OS, swap, etc

If the SSDs are 240GB or bigger we'd generally also add:

Code:

	/dev/sda3	96GB+ fd00
	/dev/sdb3	96GB+ fd00
		Combined into mdadm RAID1 /dev/md2
			Formatted as a LVM physical volume (PV)
				Volume group (VG) "storage_vg"
					Logical volumes (LV) for dm-cache to speed up the platters used elsewhere

Essentially the plan is to insert a layer of LUKS-encrypted storage between mdadm and LVM. When the device boots, the operator will be prompted to enter the password for the root partition; all other LUKS partitions will be configured to open using a key stored on the encrypted root partition. Note that we're still keeping /boot unencrypted as I've not done enough fiddling with grub to know whether encrypting it is viable yet, and RAID1 for boot typically only works with MBR/legacy BIOS since in this regard UEFI is fecking stupid.

I tested this on physical hardware - I could have done it in VMware Workstation or ESX easily enough but I had spare tin around and cloning the discs was easy. Anyway, I booted into a live USB distro (specifically SystemRescueCD) on a spare chassis, plugged in the SSDs and got cracking. In the following examples the SSDs are sdb and sdc since the bootable USB was sda. sdb was partitioned as follows:

Code:

fdisk -l /dev/sdb
Disk /dev/sdb: 223.6 GiB, 240057409536 bytes, 468862128 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x7s2o3220

Device     Boot     Start       End   Sectors  Size Id Type
/dev/sdb1  *         2048   1048575   1046528  511M fd Linux raid autodetect
/dev/sdb2         1048576 202375167 201326592   96G fd Linux raid autodetect
/dev/sdb3       202375168 403701759 201326592   96G fd Linux raid autodetect

The partition layout was cloned to the other SSD with sfdisk

Code:

sfdisk -d /dev/sdb | sfdisk /dev/sdc

All three of the RAID1 arrays were then created

Code:

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdb2 /dev/sdc2
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdb3 /dev/sdc3

md1 was then turned into a LUKS partition using cryptsetup. This'll prompt you to create the password that'll ultimately be needed to open the root drive.

Code:

cryptsetup luksFormat /dev/md1

Once it's created, open it up to the OS and label it as md1_crypt

Code:

cryptsetup luksOpen /dev/md1 md1_crypt

We're now ready to recreate the LVM stuff, and to make things easier for us down the line, instead of recreating from scratch we're going to copy the LVM config from the original system. I scp'd the current LVM config over from the running system (to cut a long story short, this'll almost always be the file /etc/lvm/backup/$vgname) to a working area on my bootable distro. Look inside this file for the UUID of the PV; generally that'll be in a block that looks like this:

Code:

    physical_volumes {

        pv0 {
            id = "f77rlK-qxme-D6kG-SLV3-lqaF-Zs1m-iEuu1k"
            device = "/dev/md1" # Hint only

            status = ["ALLOCATABLE"]
            flags = []
            dev_size = 251656192    # 119.999 Gigabytes
            pe_start = 2048
            pe_count = 30719    # 119.996 Gigabytes
        }
    }

Now let's use a) that UUID and b) the restore file itself to recreate the PV with the same details

Code:

pvcreate --uuid "f77rlK-qxme-D6kG-SLV3-lqaF-Zs1m-iEuu1k" --restorefile /mnt/usb/original_system/etc/lvm/backup/root_vg /dev/mapper/md1_crypt
  Couldn't find device with uuid f77rlK-qxme-D6kG-SLV3-lqaF-Zs1m-iEuu1k.
  Physical extents end beyond end of device /dev/mapper/md1_crypt.
  Format-specific initialisation of physical volume /dev/mapper/md1_crypt failed.
  Failed to setup physical volume "/dev/mapper/md1_crypt".

The "couldn't find device with uuid..." line is normal when you're creating a PV from scratch with a UUID specified, but the error below it is because of the change in disc geometry. The original system was first set up on a 120GB SSD, and then later cloned to a 240GB SSD - when the partitions were expanded and the PV resized, the metadata recorded a ~120GB PV. As such, when we try to recreate it on a 96GB partition, it rightly refuses since it won't fit.

Easy way out here - I know for a fact that the volumes within the PV take up way less space than 96GB, so we can happily change the dev_size and pe_count attributes in the /mnt/usb/original_system/etc/lvm/backup/root_vg file. Cue back-of-a-fag-packet calculating the dev_size for a ~90GB PV; dev_size is counted in 512-byte sectors (the old 251656192 being ~120GB), so that's 90*1024*1024*2 = 188743680. And for 4MB extents that's (90*1024)/4 = 23040, less one to leave room for the pe_start offset at the front, so 23039 (the same way the old 120GB PV had 30719 extents rather than 30720). So amend the file with these new values for the PV and that section now looks like so:

Code:

    physical_volumes {

            pv0 {
                    id = "f77rlK-qxme-D6kG-SLV3-lqaF-Zs1m-iEuu1k"
                    device = "/dev/md1"     # Hint only

                    status = ["ALLOCATABLE"]
                    flags = []
                    #dev_size = 251656192   # 119.999 Gigabytes
                    dev_size = 188743680
                    pe_start = 2048
                    #pe_count = 30719       # 119.996 Gigabytes
                    pe_count = 23039
            }
    }
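
For anyone wanting to sanity-check numbers like these before editing the backup file, here's a rough sketch of the arithmetic. It assumes dev_size and pe_start are in 512-byte sectors (which the original figure of 251656192 for a ~120GB PV suggests) and that extents are 4MB; the function names are made up purely for illustration.

Code:

```shell
# Assumption: LVM backup files record dev_size and pe_start in 512-byte sectors.

# Convert a size in GiB to 512-byte sectors.
gib_to_sectors() {
    echo $(( $1 * 1024 * 1024 * 2 ))
}

# Whole extents that fit between pe_start and the end of the device.
# Args: dev_size (sectors), pe_start (sectors), extent size (MiB).
extents_for() {
    extent_sectors=$(( $3 * 2048 ))
    echo $(( ($1 - $2) / extent_sectors ))
}

# Usage:
#   gib_to_sectors 90              # prints 188743680
#   extents_for 188743680 2048 4   # prints 23039
#   extents_for 251656192 2048 4   # prints 30719, matching the original pe_count
```

The last call is the useful one: if the function reproduces the pe_count already in the backup file, the units are probably right.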

PV now creates fine.

Code:

pvcreate --uuid "f77rlK-qxme-D6kG-SLV3-lqaF-Zs1m-iEuu1k" --restorefile /mnt/usb/original_system/etc/lvm/backup/root_vg /dev/mapper/md1_crypt
  Couldn't find device with uuid f77rlK-qxme-D6kG-SLV3-lqaF-Zs1m-iEuu1k.
  Physical volume "/dev/mapper/md1_crypt" successfully created.

Yes, 90GB is smaller than 96GB (didn't want to risk overshooting), but we can resize the PV any time we want in the future with a simple pvresize.

Now we can recreate the volume group itself - along with all the child logical volumes - from the same config file.

Code:

vgcfgrestore -f /mnt/usb/original_system/etc/lvm/backup/root_vg root_vg
  Restored volume group root_vg

After a quick glance at pvdisplay, vgdisplay and lvdisplay everything looked happy so we activate the VG and LVs to allow them to be used.

Code:

vgchange -a y root_vg
lvchange -a y /dev/root_vg/*_lv

Now is the time to recreate the filesystems. Another thing I did here was to make a note of the UUIDs of the ext4 filesystems on the original system and re-use those as well. Just showing the boot and root partitions here for brevity; we're also making sure that the 64bit feature (needed for >16TB and for metadata checksums) and metadata checksums themselves are turned on:

Code:

mkfs.ext4 -m 5 -L poghril_boot -O 64bit,metadata_csum -U rq908oqq-588p-41op-8qsr-snq887518p5o /dev/md0
mkfs.ext4 -m 5 -L poghril_root -O 64bit,metadata_csum -U 288289q0-48s0-4sq6-9p7r-9pq408sorn63 /dev/mapper/root_vg-root_lv
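
If you're collecting the original UUIDs by hand, something like the following can pull the value out of tune2fs -l output on the source system. This is only a sketch: the function name is hypothetical, and it assumes the "Filesystem UUID:" line that tune2fs normally prints.

Code:

```shell
# Read "tune2fs -l" output on stdin and print just the filesystem UUID.
fs_uuid_from_tune2fs() {
    awk -F': *' '/^Filesystem UUID/ { print $2 }'
}

# Usage on the original system (as root):
#   tune2fs -l /dev/mapper/root_vg-root_lv | fs_uuid_from_tune2fs
# "blkid -s UUID -o value <device>" should give the same value directly.
```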

Now's the time to mount the new partitions and copy the data over. The new disc structure will be mounted at /mnt/poghril (that's the system name BTW) and then have the root mounted on it;

Code:

mkdir /mnt/poghril
mount /dev/mapper/root_vg-root_lv /mnt/poghril

On poghril itself, I use rsync to push the data from the live system to the one running the live ISO;

Code:

root@poghril:~# rsync -axHAWXS --numeric-ids --info=progress2 / root@10.17.10.6:/mnt/poghril
Password:
  1,640,635,234  99%   52.16MB/s    0:00:29 (xfr#32254, to-chk=0/41424)

The x option is important here as it stops rsync from crossing filesystem boundaries, so any additional mounts won't be copied and will need to be done separately. As such, it also avoids copying /dev, /proc and the rest of it. Anyway, now root is done we'll mount boot and do the same thing there:

Code:

mount /dev/md0 /mnt/poghril/boot

...and copy the filesystem over from poghril itself again

Code:

root@poghril:~# rsync -axHAWXS --numeric-ids --info=progress2 /boot/ root@10.17.10.6:/mnt/poghril/boot
Password:
  194,414,333 100%   86.58MB/s    0:00:02 (xfr#352, to-chk=0/358)
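
Because -x stops at filesystem boundaries, it's worth listing what else is mounted on the source box so nothing gets missed. A rough sketch that picks the on-disk filesystems out of /proc/mounts-style input; the function name and the list of filesystem types are just illustrative assumptions, so adjust to taste.

Code:

```shell
# Print the mountpoints of on-disk filesystems other than /, i.e. the ones
# that "rsync -x /" will not descend into and that need their own rsync run.
# Input: /proc/mounts-style lines (device mountpoint fstype options ...).
extra_mounts() {
    awk '$3 ~ /^(ext[234]|xfs|btrfs)$/ && $2 != "/" { print $2 }'
}

# Usage on the source system:
#   extra_mounts < /proc/mounts
```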

Repeat for whatever other partitions make up your root filesystem. Once those are done we're ready to start chrooting but first we need to set up the mounts to be used within the chroot itself.

Code:

mount -t proc none /mnt/poghril/proc
mount -t sysfs none /mnt/poghril/sys
mount --bind /dev /mnt/poghril/dev

We're now hopefully ready to do the chroot.

Code:

chroot /mnt/poghril /bin/bash

So, hopefully tadaa! Now we can look at reconfiguring grub and reinstalling to the MBR to make the new drives bootable but there's a bunch of stuff we need to check first. Firstly make sure /etc/mtab contains the right information - it should just be a symlink to /proc/mounts but you want to make sure the disc geometry in there looks kosher. Since we've retained the same UUIDs for both LVM and the filesystems themselves, most stuff configured in fstab should Just Work, but if you use labels or other methods of mounting those will need to be changed now.

One thing that needs to be added is an entry in /etc/crypttab for the new LUKS disc so that it prompts to be unlocked at boot. We're sticking with md1_crypt for the mapped name, and the key field is set to none since we want a password prompt rather than a key file. The UUID is that of the LUKS container on /dev/md1.

Code:

md1_crypt UUID=dc28b993-8a09-475c-b4ed-9e7aa460a791 none luks
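
To avoid typos when copying the UUID around, the line can be generated rather than typed. A minimal sketch, assuming a password-prompted container as above; crypttab_line is a hypothetical helper, while cryptsetup luksUUID is the stock subcommand for reading the UUID from a LUKS header.

Code:

```shell
# Print a crypttab entry for a LUKS container unlocked by password prompt.
# Args: mapped name, LUKS header UUID.
crypttab_line() {
    printf '%s UUID=%s none luks\n' "$1" "$2"
}

# Usage inside the chroot (as root):
#   crypttab_line md1_crypt "$(cryptsetup luksUUID /dev/md1)" >> /etc/crypttab
```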

One gotcha that I missed when I was first looking at grub reconfiguration was this, complaining about a missing something or other:

Code:

update-grub2
Generating grub configuration file ...
/usr/sbin/grub-probe: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
/usr/sbin/grub-probe: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
/usr/sbin/grub-probe: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
  WARNING: Failed to connect to lvmetad. Falling back to device scanning.
  WARNING: Failed to connect to lvmetad. Falling back to device scanning.
/usr/sbin/grub-probe: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
Found linux image: /boot/vmlinuz-4.9.0-7-amd64
Found initrd image: /boot/initrd.img-4.9.0-7-amd64
/usr/sbin/grub-probe: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
/usr/sbin/grub-probe: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
/usr/sbin/grub-probe: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..

The terminology of "physical volume" was confusing, but it turns out this actually means a physical device of some sort. As near as I could tell, this was because in the chroot the grub configulator was looking up information about /boot (sitting on md0) from mdadm.conf - but mdadm.conf was wrong since it didn't contain the UUIDs of the new mdadm arrays. I didn't recreate it from mdadm --detail --scan since that would have dropped all the arrays that weren't currently attached, so I just remmed out the old entries and added the new;

Code:

# definitions of existing MD arrays
#ARRAY /dev/md/0  metadata=1.2 UUID=68rr4q67:0r14os37:3061r653:sq7q53o0 name=poghril:0
ARRAY /dev/md/0  metadata=1.2 UUID=n72so981:136n7006:4p75404s:s6o6os0r name=poghril:0
#ARRAY /dev/md/1  metadata=1.2 UUID=q00sr243:qs8r7p05:o6pn7oo7:79sn3qrp name=poghril:1
ARRAY /dev/md/1  metadata=1.2 UUID=2478705s:22n6o8oo:9p7119r7:90p5sr12 name=poghril:1
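
If you'd rather not hand-edit, the remming-out can be scripted. A sketch only: disable_array is a made-up helper, and it assumes the array is named in the file exactly as you pass it (the live environment may show /dev/md0 where the file says /dev/md/0).

Code:

```shell
# Comment out the ARRAY line for one named array in mdadm.conf-style text
# read from stdin; every other line passes through untouched.
# Arg: array device name as written in the file, e.g. /dev/md/0
disable_array() {
    awk -v dev="$1" '$1 == "ARRAY" && $2 == dev { print "#" $0; next } { print }'
}

# Usage inside the chroot:
#   disable_array /dev/md/0 < /etc/mdadm/mdadm.conf > /tmp/mdadm.conf.new
# then append the matching new line from "mdadm --detail --scan" and, once
# you're happy with it, copy the result over /etc/mdadm/mdadm.conf.
```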

Now the grub wizard should be able to find the /boot md0 OK - just run `dpkg-reconfigure grub-pc`, install it to the MBR of each of the new SSDs (/dev/sdb and /dev/sdc as they're seen here) and (fingers crossed) we should be good to reboot...

...and at this point I rebooted and the system clanged to an initramfs emergency prompt. D'oh! I had forgotten to reconfigure the initramfs to make sure it's now able to boot from crypto. So I had to boot back into the USB distro, ensure all the RAID devices and LVMs were correctly detected and remounted and then repeat the chroot process. Once back in the chroot environment, I added explicit support for dm-crypt and LVM to the initramfs - they should be detected and added automagically, but at this point I didn't want to take any more chances.

Code:

echo dm-mod >> /etc/initramfs-tools/modules
echo dm-crypt >> /etc/initramfs-tools/modules

Finally we can run the following to recreate the initramfs for all installed kernels:

Code:

update-initramfs -k all -u
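
Before rebooting, it's cheap to check that the new initramfs actually contains the crypto and LVM bits. A rough sketch, assuming the listing comes from lsinitramfs (part of initramfs-tools); the function name and the three patterns it looks for are my guess at a sensible minimum rather than a definitive list.

Code:

```shell
# Read an initramfs file listing on stdin and report whether the pieces
# needed to unlock a LUKS + LVM root appear in it.
# Returns non-zero if any of them is missing.
check_initramfs_listing() {
    listing=$(cat)
    missing=0
    for item in cryptsetup dm-crypt lvm; do
        if printf '%s\n' "$listing" | grep -q "$item"; then
            echo "found: $item"
        else
            echo "MISSING: $item"
            missing=1
        fi
    done
    return "$missing"
}

# Usage inside the chroot:
#   lsinitramfs /boot/initrd.img-4.9.0-7-amd64 | check_initramfs_listing
```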

Laptop/desktop users will also want to double-check the file /etc/initramfs-tools/conf.d/resume - this is the device that initramfs looks for resume images for hibernate/S2D. However since this is a server it's never used, and in any case the UUIDs of the LVMs and filesystems have been retained so hopefully there's nothing to change here.

This time when I rebooted into the SSDs, I immediately got the prompt asking me to provide the password to unlock the root partition. Unplugged the NIC as a just-in-case, entered the password and successfully booted into a LUKS clone of the existing system. After that's done it's a simple matter to re-enable the pre-existing RAID arrays and crypt volumes, and to add any new ones like the /dev/md2 that we'll be encrypting for use with dm-cache.

Once we'd done this as a proof-of-concept and got the procedure down pat, we did a test on poghril itself. We powered it down and repeated the above process with the live USB, but when the time came to copy from the original system, we rsynced from poghril's original discs (also mounted in the live environment) to ensure data consistency, removed the old discs and then rebooted poghril into the shiny new world of encryption where it's been running fine ever since. I'm also going to be doing the same thing to my main home server when I semi-rebuild it (which was part of why I wanted to get this properly written up) as that also involves a motherboard swap-out and a change in disc geometry too.

As I'm sure most have figured out, most of the above commands and general methodology will work fine for non-RAIDed, non-LVM'd setups as well assuming you know which device IDs you're working from, and there's no real need to do this with a whole new SSD/chassis if you're confident enough in your backups - I did so to avoid potential disruption/destruction and because I had the means available. Anyway, hope someone out there finds this useful or at least interesting.
