Thursday, 10 November 2016

Reverting UEFI boot to MBR boot for CentOS/Redhat 6 on the HP z230

Why would you want to do this? In my case it is because the HP z230 workstation I maintain has two disks in software RAID1. With MBR boot, /boot can be on a RAID1 array and the boot loader can be installed in the MBR of both disks, so booting can continue even if one disk fails. UEFI doesn't (as far as I know) provide this redundancy. If you know different, please write that up.

I had already configured UEFI boot on the workstation. This is what I had to do to change to MBR boot when I discovered that UEFI doesn't provide redundancy.

First do all the required kernel updates and check that UEFI boot works.
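
On CentOS 6 that usually just means applying the pending updates and rebooting (run as root):

    yum update      # apply pending updates, including any new kernels
    reboot          # confirm the machine still boots via UEFI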

Then copy the contents of /usr/share/grub/x86_64-redhat to /boot/grub. The OS installer doesn't install the grub helper files when it detects that UEFI boot is active.
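
Something like this should do it (this is the path on my CentOS 6 install; adjust if yours differs):

    cp -a /usr/share/grub/x86_64-redhat/* /boot/grub/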

Next copy /boot/efi/EFI/redhat/grub.conf to /boot/grub. This is why the kernel updates come first: it ensures grub.conf is up to date.
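
For example:

    cp /boot/efi/EFI/redhat/grub.conf /boot/grub/grub.conf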

Shut down, go into the BIOS, and disable UEFI so the boot method is Legacy MBR.

Boot with the CentOS install DVD and select rescue mode. You'll need to let the rescue OS mount the root filesystem on /mnt/sysimage so that you can run grub-install. Get a shell and do chroot /mnt/sysimage, then run grub-install /dev/sda and grub-install /dev/sdb to install the boot loader on both disks.
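
From the rescue shell the sequence looks roughly like this (the disk names are an assumption; check yours with fdisk -l first):

    chroot /mnt/sysimage
    grub-install /dev/sda
    grub-install /dev/sdb
    exit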

Reboot, removing the DVD before startup. GRUB should now boot from the hard disk. The system also wanted to do an SELinux relabel on the first boot, so plan for some extra downtime in case that happens.

Addendum 2016-11-10: Also ensure that /etc/grub.conf is a symlink to ../../boot/grub/grub.conf and not to the copy in the EFI boot partition; otherwise kernel package updates will update the wrong grub.conf.
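
A quick way to check the symlink, and to fix it if needed:

    ls -l /etc/grub.conf
    ln -sfn ../../boot/grub/grub.conf /etc/grub.conf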

These instructions are for the HP z230 but may apply to other workstations too. They assume that your BIOS supports the legacy MBR boot method in addition to UEFI.

Thursday, 9 April 2015

Converting single disk to RAID1 in-situ

You have this Linux system that doesn't use RAID. You start to worry about losing files (the ones changed since the last backup; you do backups, right?) and about downtime should the disk fail. Maybe it is a good idea to have RAID. But how do you retrofit RAID1 without a lot of downtime spent backing up, reformatting the disks and restoring the data?

I suspected there might be a way to start off with a degraded RAID1 array on the second, new disk, copy the partitions on the old disk onto it, change the type of the old disk to RAID element, add it to the array and let it resync. Sure enough it can be done, and François Marier has blogged it. In fact he goes further and shows how to reinstall the boot loader. I didn't have to do this because my partition is /home. The critical tip is the use of the keyword missing to create the degraded array without issues.
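
A minimal sketch of that first step (the disk and array names are examples, and I'm assuming the new disk is /dev/sdb; see François Marier's post for the full procedure):

    sfdisk -d /dev/sda | sfdisk /dev/sdb    # copy the partition layout to the new disk
    # (set the new partition's type to fd, Linux raid autodetect, with fdisk if it isn't already)
    mdadm --create /dev/md0 --level=1 --raid-devices=2 missing /dev/sdb1
    # 'missing' leaves a slot free for the old disk to be added later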

In my case the decision to go RAID1 was made after a failed disk caused the loss of files. Not using RAID1 in the first place was an unwise decision by the system builder.

I've varied the procedure a little. Instead of putting ext4 directly on the RAID partition, I put a logical volume on it and created an ext4 filesystem inside that. This allows me to migrate the content to a larger disk, should expansion be needed in future, using logical volume operations with little downtime.
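
In outline it looks like this (the volume group and logical volume names are made up for illustration):

    pvcreate /dev/md0                         # make the array an LVM physical volume
    vgcreate vg_home /dev/md0                 # volume group on top of the array
    lvcreate -l 100%FREE -n lv_home vg_home   # one logical volume using all the space
    mkfs.ext4 /dev/vg_home/lv_home            # ext4 goes inside the logical volume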

There's one thing you should do if you decide to use logical volumes on the RAID. After you have assembled the RAID array, run vgscan. This reinitialises the cache in /etc/lvm/cache/.cache. Otherwise the cache will still contain entries for the components of the array and cause a failure to assemble later on, with a mysterious (to me, at first) duplicate PV error, because LVM thinks the array components are candidates for physical volumes. LVM is normally configured to ignore the components of RAID arrays, but only if the cache is up to date. See here for more details.
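
In other words, right after assembling the array (device names are examples):

    mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1
    vgscan    # rebuilds /etc/lvm/cache/.cache so the array components are no longer listed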

A couple of caveats: on other Linux systems mdadm.conf may live in /etc rather than /etc/mdadm. Also, the output of mdadm --detail --scan (used to generate the mdadm.conf line) will contain a spares=1 directive if it is run while the array is still resyncing. Remove that directive, or you will have problems on the next boot.
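
For example (I'm using the /etc/mdadm path here; use /etc/mdadm.conf if that's where your distribution keeps it):

    mdadm --detail --scan >> /etc/mdadm/mdadm.conf
    # then edit the file and delete any spares=1 from the ARRAY line
    # if the array was still resyncing when you ran the scan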