Replacing disks in a Linux software RAID

I’ve got a Red Hat 9 box running a software RAID1 array across two 160GB IDE drives. It works well, and I can’t complain about the performance. Like the idiot that I am, I failed to set up notification to tell me when a disk in the array fails (UPDATE: use smartd to check the health of hard drives, and mdmonitor to watch for a failed disk in array sets).
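For anyone wanting to avoid my mistake, here’s a rough sketch of that notification setup. The exact smartd flags and mail address are assumptions on my part; check smartd.conf(5) for your smartmontools version before copying anything.

```shell
# /etc/smartd.conf -- scan all drives, run SMART health checks, and
# mail root when a drive starts to go (flags are illustrative; see
# smartd.conf(5) for the version you actually have installed).
DEVICESCAN -H -l error -l selftest -m root

# For the array itself, the mdmonitor service amounts to running
# mdadm in monitor mode, something like:
#   mdadm --monitor --scan --mail root --delay 300
```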

This happened recently (within the month), and the array degraded gracefully to using the remaining disk. But I still had to replace the “broken” one (which I don’t believe to be broken at all).

To do this, physically install the new disk. When you boot, you’ll be shoved into a root shell.

  • fdisk the new disk.
  • create a new partition with partition type 0xFD (Linux raid autodetect).
  • write the new partition table out.
  • edit the /etc/raidtab file. Promote the remaining disk to being the first in the array.
  • start the RAID with mdadm: mdadm --assemble --run /dev/md0 /dev/hd
  • add the new disk into the existing array: mdadm --add /dev/md0 /dev/hd
  • The new disk will sync up to the old. Verify with cat /proc/mdstat. This process took about five hours on my system.
  • Reboot your new, happy system.
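To make the “verify with cat /proc/mdstat” step more concrete, here’s a tiny helper of my own (the function name is made up; this is not part of raidtools or mdadm). A healthy two-disk RAID1 shows [UU] in /proc/mdstat; a failed or missing half shows up as an underscore, like [U_] or [_U].

```shell
# raid1_status: read mdstat-format text on stdin and report mirror
# health. A sketch of my own -- "[UU]" means both halves are up;
# an underscore means a half is missing, failed, or still syncing.
raid1_status() {
    if grep -q '\[UU\]'; then
        echo "healthy"
    else
        echo "degraded or resyncing"
    fi
}

# Usage: raid1_status < /proc/mdstat
```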

Just to make this more concrete, my /etc/raidtab looked like this:

raiddev             /dev/md0
raid-level                  1
nr-raid-disks               2
chunk-size                  4
persistent-superblock       1
nr-spare-disks              0
    device          /dev/hdb1
    raid-disk     0
    device          /dev/hdc1
    raid-disk     1

Recently, /dev/hdb1 failed (although the disk itself seems OK: fsck revealed no problems, but it was out of sync with /dev/hdc1). I then replaced the drive (the slave on the primary IDE channel). I booted the box and changed /etc/raidtab:

raiddev             /dev/md0
raid-level                  1
nr-raid-disks               2
chunk-size                  4
persistent-superblock       1
nr-spare-disks              0
    device          /dev/hdc1
    raid-disk     0
    device          /dev/hdb1
    raid-disk     1

I then started the RAID: mdadm --assemble --run /dev/md0 /dev/hdc1. Then, I added the new disk: mdadm --add /dev/md0 /dev/hdb1.
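While the five-hour sync ran, I kept an eye on /proc/mdstat. If you want to script that, here’s a throwaway helper (my own sketch, with a made-up name; not an mdadm tool) that pulls the recovery percentage out of mdstat-format text:

```shell
# resync_progress: extract the rebuild progress figure from
# mdstat-format text on stdin. During a rebuild, /proc/mdstat shows a
# line like:  [>....]  recovery =  9.4% (...) finish=110.2min
resync_progress() {
    grep -o 'recovery = *[0-9.]*%'
}

# Usage: resync_progress < /proc/mdstat
```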

Hope this limited and cursory treatment of a complex topic helps.

UPDATE: Also see this primer on LVM and RAID tools in modern Red Hat. This system was running Red Hat 9 and so used the older raidtools stuff. You really only need mdadm, which can create, restore, and repair RAID disk sets. Configure /etc/mdadm.conf accordingly.
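For the curious, a sketch of what /etc/mdadm.conf might contain on such a system. The device names are examples only; generate the ARRAY line with mdadm --detail --scan rather than typing it by hand, and see mdadm.conf(5) for the full syntax.

```shell
# /etc/mdadm.conf -- example only; device names are assumptions.
# Let mdadm look at all partitions when assembling:
DEVICE partitions

# Where the monitor daemon sends failure mail:
MAILADDR root

# Describe the array (this line is best produced by
# "mdadm --detail --scan >> /etc/mdadm.conf"):
ARRAY /dev/md0 level=raid1 num-devices=2 devices=/dev/sda1,/dev/sdb1
```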