Cascading Failures Due to Failing RAID1 on Linux

I recently had a partial server outage caused by a failing RAID array, but most of the damage came from a cascade of failing software dependencies.

I had a damaged RAID1 array, and it resulted in a bootable system that would not run my software stack, rendering the server dead to most of the world.

I run some websites in Docker containers on an Ubuntu server. The server runs LVM on top of software RAID (mdadm).

One day, it just wouldn’t start the containers. It complained that the overlay driver wasn’t available.

It turned out that the server was running an old kernel, a kind of failsafe kernel. It wouldn’t load the current “overlay” kernel module, because kernel modules are built for a specific kernel version.
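A quick way to see that kind of mismatch (a sketch; the overlayfs path below is the standard Ubuntu module layout):

    # The kernel actually running...
    uname -r
    # ...versus the module trees installed on disk
    ls /lib/modules/
    # The overlay module only exists under the kernels it was built for
    ls /lib/modules/$(uname -r)/kernel/fs/overlayfs/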

GRUB had chosen that failsafe kernel.

Why? Because the past two kernel packages didn’t install correctly. They installed, but couldn’t produce a good initramfs (the initial ramdisk used for booting).

This was because the failing RAID was causing a problem, manifesting as unavailable files during initramfs image creation. (The RAID contained a boot partition that was mounted at /boot. This way, the boot partition is mirrored, and bootable from either disk.)

I also had a problem with a swap partition on the disk: it was getting bad reads, so the system took it offline. I think that also caused problems during initramfs creation.

According to what I read, if the RAID could be re-synced, the file problems would go away, the kernel packages could be reinstalled, and all would be good. So I re-added the disk to the array. (Really, re-added the partition to the mirror.)
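For reference, that step looks roughly like this (device names are examples; check /proc/mdstat for the actual array and its members):

    # Re-add the partition to the mirror
    mdadm --manage /dev/md0 --re-add /dev/sdb1
    # Watch the resync progress
    cat /proc/mdstat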

The problem was that the disks were too badly damaged. After syncing, they developed bad blocks again.

(The good news is, it didn’t seem to affect the file data. Hurrah for RAID1.)

My fix was to install new disks.

Linux Rocks

I drove to the computer, removed the failing disk, and put a new one in. It was pre-partitioned similar to the original disk, but with larger partitions.
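(If you want the new disk to match the old one exactly, sfdisk can clone a partition table; a sketch with example device names, and be very sure which disk is which:

    # Dump the layout of the good disk, then write it to the new disk
    sfdisk -d /dev/sda > parts.dump
    sfdisk /dev/sdb < parts.dump

In my case I wanted larger partitions, so the new disk was partitioned by hand.)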

I rebooted the computer, and this time it wouldn’t get past GRUB; I guess the earlier kernel reinstall had gotten farther than before and left the bootloader half-updated. So I booted from a live USB installer.

The system saw the new partitions. I added them to the mirror, and md immediately started to resync from the functional disk.

It took around 90 minutes to sync 500GB.

I re-installed the kernel, but got an error message about a nonexistent swap partition.

I re-created the swap partition on the new disk. (mkswap has a useful option to set the UUID. With the old UUID restored, I didn’t need to edit /etc/fstab or mess with the kernel package installer’s options.)
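A sketch of that, with a placeholder device and UUID (take the real UUID from /etc/fstab):

    # Format the new swap partition with the UUID fstab already expects
    mkswap --uuid 0f4c8e42-aaaa-bbbb-cccc-1234567890ab /dev/sdb2
    swapon /dev/sdb2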

Re-installing the kernel, and the kernel modules, went smoothly after that.
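On Ubuntu, that step amounts to something like this (the package name here assumes the stock kernel naming):

    # Reinstall the kernel package, which rebuilds its initramfs
    apt-get install --reinstall linux-image-$(uname -r)
    # Or regenerate the initramfs and GRUB config directly
    update-initramfs -u -k all
    update-grub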

I rebooted, and it worked.

I repeated the removal and replacement process with the second disk. Again, it went smoothly, syncing in under 40 minutes.

At the end of the repair, I had a server with 2TB disks in a RAID1 array.

I considered the repair complete!

So, while the system was offline for a long time, no data was lost. It could have been worse.

Why Isn’t the Space Available?

It’s complicated, because there are a few layers to the system, and each needs to be resized.

The bottom layer is the partition. Though it’s called “RAID”, where the “D” stands for disks, Linux software RAID here mirrors a partition on each disk, creating an array.

Then, the array needs to be enlarged, so all the new blocks in the new partition get mirrored.

Then, the partition inside the array (my md device is itself partitioned) needs to be resized, to use the new blocks.

Then, the LVM physical volume needs to be enlarged, to use up the entire partition in the array. (LVM doesn’t automatically use all the space.)

Then, the LVM logical volume needs to be enlarged to use the available space on the physical volume (or more volumes created, to be mounted in the filesystem).

Then, the file system in the logical volume must be enlarged, to make the space available to users.
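Putting the chain together, assuming an array /dev/md0 that is itself partitioned, with an LVM physical volume on /dev/md0p2 and a logical volume vg0/data (all names hypothetical):

    # 1. Grow the array onto the larger member partitions
    mdadm --grow /dev/md0 --size=max
    # 2. Grow the data partition inside the array
    #    (growpart is in Ubuntu's cloud-guest-utils package)
    growpart /dev/md0 2
    # 3. Grow the LVM physical volume to fill the partition
    pvresize /dev/md0p2
    # 4. Extend the logical volume and resize its filesystem in one step
    lvextend --resizefs -l +100%FREE /dev/vg0/data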

I’ll practice doing this first, and then do it for real. I’ll rehearse on a regular external disk on my laptop.

Resizing /boot

I find myself in a weird conundrum: I cannot resize the 500MB boot partition to something more appropriate for 2024, like 4GB.

That’s because the boot partition is mirrored, and I cannot move partitions inside the array. They can be grown into the new space, but I can’t take some of that new space and give it to the boot partition.

The partition must be a regular partition within the array, because the bootloader requires that.

I think I could use another disk and LVM to mirror the data, alter the partitions in the array, then restore the data.

This all sounds pretty complicated, and it’s probably safer to build a completely separate array on two new disks, partition it in the desired sizes, copy the data over, and then set the system to boot from the new partitions.

The most realistic solution is to build a completely new server, get it booting correctly with current best practices (UEFI boot), then improve on the existing file system arrangement* and copy the data over. While this sounds drastic, it’s not: the disks in my server were last formatted in 2016, and it’s now 2024, so they lasted eight years. The market value of the server, which I paid 350 for, was now around 150 without good disks.

The server is fast and functional, but obsolete, and increasingly at risk of failure.

Leaving BIOS/MBR

The boot partition on my server is, in a way, more like “hardware” than the code in the rest of the system, because it uses the old MS-DOS-style BIOS and MBR boot sequence.

The current boot sequence is roughly: BIOS -> MBR -> GRUB -> /boot partition in array loads kernel.

The disk uses the old MS-DOS partition table. The MBR is the first sector, and GRUB lives in the empty sectors after it. The /boot file system must be reachable by GRUB, so it can load the kernel and its initrd ramdisk. So a partition is created (inside /dev/md) for /boot, keeping those sectors in the lower part of the disk, in a legacy ext2 filesystem.

The new server’s boot sequence would be: EFI Boot Menu -> UEFI GRUB Linux bootloader -> /boot directory in main file partition.

The EFI boot is done from an EFI System Partition, a small partition with a FAT file system that contains bootloaders for different operating systems. UEFI works with GPT partition tables, which can address disks larger than 2TB and hold many partitions, so you don’t strictly need a boot partition. I might still make one, however, so it can use an ext2 filesystem.

* Improving on the File System

LVM opens up possibilities for better backups, by using LVM tools to clone volumes.

For these kinds of backups to be practical, they need to be fast and small. Your file system can be sharded into application-specific logical volumes.
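A sketch of what such a backup could look like, with hypothetical volume and path names:

    # Create a small copy-on-write snapshot of an application volume
    lvcreate --snapshot --size 1G --name websites-snap /dev/vg0/websites
    # Mount it read-only, archive it, then drop the snapshot
    mount -o ro /dev/vg0/websites-snap /mnt/snap
    tar -czf /backup/websites.tar.gz -C /mnt/snap .
    umount /mnt/snap
    lvremove -y /dev/vg0/websites-snap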

Why Didn’t I Catch This?

I was using smartctl to look at disk health. I’d log in and look at the disks.
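Roughly like this (assumes the smartmontools package; /dev/sda is an example):

    # Quick health verdict
    smartctl -H /dev/sda
    # The attribute I was eyeballing: reallocated (remapped) sectors
    smartctl -A /dev/sda | grep -i reallocated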

There were two remapped sectors on the failing disk.

I didn’t think that was too serious.

Had I been using the mdadm monitoring tools, I would have gotten an earlier warning about the disk problems.

I should have learned those tools, and set them up to notify me of problems.
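For what it’s worth, the setup is small. On Ubuntu, mdadm ships a monitor daemon that reads /etc/mdadm/mdadm.conf; the mail address below is an example:

    # In /etc/mdadm/mdadm.conf: where array event mail should go
    MAILADDR admin@example.com

    # Test that alerting works: send a test event for each array, once
    mdadm --monitor --scan --test --oneshot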