Replace HDD for Software RAID in CentOS

Introduction

This guide explains how to replace a failing hard drive in a software RAID array on CentOS using the mdadm tool. It is intended for system administrators who manage RAID arrays and need to replace failed or failing disks in a RAID 1 or RAID 5 configuration. The article will walk you through the process of identifying the failed disk, removing it from the array, replacing it with a new one, and rebuilding the RAID array to restore redundancy.

 

Prerequisites

Before proceeding, ensure you have the following:

  • Root or sudo privileges on your CentOS server.
  • A new hard drive ready to replace the failing one.
  • The mdadm tool installed on your server (commonly pre-installed with RAID setups).

 

Step-by-step guide

Step 1: Identify the failed drive

  • First, check the status of your RAID array to identify which drive has failed. Run the following command to display the status:
    cat /proc/mdstat
  • This will show the current status of your RAID devices. For example:
    md0 : active raid1 sda1[0] sdb1[1](F)
  • In this example, sdb1 has failed (denoted by (F)).
  • The rest of this guide assumes that /dev/sdb1 is the failed member, confirmed either with smartmontools (for example, smartctl -H /dev/sdb) or from the /proc/mdstat output above.
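On a server with several arrays, picking the (F) flags out of /proc/mdstat by eye is error-prone. The following is a minimal sketch, not part of mdadm itself: failed_members is a hypothetical helper name, and it assumes the one-line-per-array layout shown in the example above.

```shell
#!/bin/sh
# failed_members [FILE]
# Print "<array> <member>" for every member flagged (F).
# FILE defaults to /proc/mdstat; pass a saved copy to test offline.
failed_members() {
    awk '/^md/ {
        name = $1
        sub(/:$/, "", name)            # accept both "md0 :" and "md0:"
        for (i = 2; i <= NF; i++)
            if ($i ~ /\(F\)$/) {       # member marked as faulty
                dev = $i
                sub(/\[.*/, "", dev)   # strip "[1](F)" and keep "sdb1"
                print name, dev
            }
    }' "${1:-/proc/mdstat}"
}

# Usage on the live system:
#   failed_members
```

An empty result means no member is currently flagged as failed.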

 

 

Step 2: Mark the failed drive as faulty and remove it from the array

  • Once you have identified the failed drive, mark it as faulty and remove it from the RAID array using mdadm:
    sudo mdadm --manage /dev/md0 --fail /dev/sdb1
    sudo mdadm --manage /dev/md0 --remove /dev/sdb1
    
  • This tells mdadm to stop using the failed drive (sdb1 in this case) and removes it from the RAID array.
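If the failing disk holds members of more than one array, each member has to be failed and removed before the disk is pulled. The sketch below wraps the two commands above in a dry-run helper so they can be reviewed first. The names run and retire_member, the DRY_RUN variable, and the second array /dev/md1 with member /dev/sdb2 are assumptions for illustration only.

```shell
#!/bin/sh
# run CMD...: print the command unless DRY_RUN=0 is set, in which case run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# retire_member ARRAY MEMBER: mark MEMBER faulty, then remove it from ARRAY.
retire_member() {
    run mdadm --manage "$1" --fail "$2" &&
    run mdadm --manage "$1" --remove "$2"
}

# Usage (dry run first, then for real as root):
#   retire_member /dev/md0 /dev/sdb1
#   retire_member /dev/md1 /dev/sdb2     # hypothetical second array
#   DRY_RUN=0 retire_member /dev/md0 /dev/sdb1
```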

 

Step 3: Replace the failed drive

  • Power down your server to physically replace the failed drive. Once the server is powered off:
    • Replace the failed hard drive with the new one.
    • Boot up the server after replacing the drive.
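Before powering down, it helps to note the serial number of the failed drive so that the right physical disk is pulled. A minimal sketch, assuming smartmontools is installed; disk_serial is a hypothetical helper that reads the output of smartctl -i.

```shell
#!/bin/sh
# disk_serial: read `smartctl -i DISK` output on stdin and print the serial.
# Matches both "Serial Number:" and "Serial number:" spellings.
disk_serial() {
    awk -F': *' '/^Serial [Nn]umber:/ { print $2 }'
}

# Usage (needs root):
#   sudo smartctl -i /dev/sdb | disk_serial
```

The printed serial can then be compared with the label on the drive itself.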

 

Step 4: Prepare the new drive

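  • Once the server is running again, partition the new drive to match the existing drives in the array. Device names can change after a reboot, so confirm with lsblk that the new, empty drive is the one you expect. On MBR disks, you can copy the partition table from a working drive (e.g., sda) to the new drive (e.g., sdb) using the sfdisk command. Check the order of the two devices carefully, because the command overwrites the partition table on the second one: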
    sudo sfdisk -d /dev/sda | sudo sfdisk /dev/sdb
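  • If you’re using GPT partitioning instead of MBR, use the sgdisk command. The first command copies the partition table from sda to sdb (the argument to -R is the destination), and the second gives the new drive its own random GUIDs:
    sudo sgdisk -R /dev/sdb /dev/sda
    sudo sgdisk -G /dev/sdb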
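Choosing between sfdisk and sgdisk depends on the partition table type of the healthy drive. The sketch below reads that type from parted output and prints the matching commands without running them. It assumes parted is installed; table_type and clone_cmds are hypothetical helper names.

```shell
#!/bin/sh
# table_type: read `parted -s DISK print` output on stdin, print msdos or gpt.
table_type() {
    awk -F': *' '/^Partition Table:/ { print $2 }'
}

# clone_cmds TYPE SRC DST: print (do not run) the commands that copy the
# partition table from SRC to DST for the given table TYPE.
clone_cmds() {
    case "$1" in
        msdos) printf 'sfdisk -d %s | sfdisk %s\n' "$2" "$3" ;;
        gpt)   printf 'sgdisk -R %s %s\nsgdisk -G %s\n' "$3" "$2" "$3" ;;
        *)     echo "unsupported partition table type: $1" >&2; return 1 ;;
    esac
}

# Usage:
#   type=$(sudo parted -s /dev/sda print | table_type)
#   clone_cmds "$type" /dev/sda /dev/sdb
```

Printing the commands first gives one more chance to confirm that the healthy drive is the source and the new drive is the destination.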

 

Step 5: Add the new drive to the RAID array

  • After partitioning the new drive, add it back to the RAID array using mdadm:
    sudo mdadm --manage /dev/md0 --add /dev/sdb1
  • This will start the process of rebuilding the RAID array. You can monitor the rebuild progress by running:
    cat /proc/mdstat
  • The rebuild may take some time depending on the size of the array and the speed of your disks.
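While the array rebuilds, /proc/mdstat includes a progress line such as "recovery = 12.6%". The sketch below extracts just that percentage; rebuild_progress is a hypothetical helper name, and the pattern assumes the usual "recovery = N%" or "resync = N%" wording.

```shell
#!/bin/sh
# rebuild_progress [FILE]
# Print the recovery/resync percentage, or nothing if no rebuild is running.
# FILE defaults to /proc/mdstat.
rebuild_progress() {
    sed -nE 's/.*(recovery|resync) *= *([0-9.]+%).*/\2/p' "${1:-/proc/mdstat}"
}

# Usage:
#   rebuild_progress
#   watch -n 5 cat /proc/mdstat     # refresh the full status every 5 seconds
#   sudo mdadm --wait /dev/md0      # block until the rebuild has finished
```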

 

Step 6: Update the GRUB bootloader (if required)

  • If your system boots from the RAID array, install the GRUB bootloader on the new drive so that the system can still boot from it if the other drive fails. On CentOS 7 and later (GRUB 2), run:
    sudo grub2-install /dev/sdb
  • On CentOS 6 (GRUB Legacy), the command is grub-install instead:
    sudo grub-install /dev/sdb
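Installing GRUB onto the disk itself applies to servers that boot in legacy BIOS mode; on UEFI systems the bootloader lives on the EFI System Partition and this step differs. A minimal sketch for checking which mode the server booted in, where boot_mode is a hypothetical helper name:

```shell
#!/bin/sh
# boot_mode [DIR]
# Print "uefi" if the EFI firmware directory exists, otherwise "bios".
# DIR defaults to /sys/firmware/efi, which the kernel creates on UEFI boots.
boot_mode() {
    if [ -d "${1:-/sys/firmware/efi}" ]; then
        echo uefi
    else
        echo bios
    fi
}

# Usage:
#   boot_mode
```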

     

Step 7: Verify the RAID status

  • After the rebuild is complete, verify that the RAID array is functioning properly and that the new drive has been fully integrated. Use the following command to check the RAID status:
    sudo mdadm --detail /dev/md0
  • The output should report the array state as clean, with no failed devices and every member listed as active sync.
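The same check can be scripted, for example for a monitoring job. The sketch below reads mdadm --detail output and succeeds only when the device counts and state look healthy. array_healthy is a hypothetical helper name, and the field labels it matches (Raid Devices, Active Devices, Failed Devices, State) are assumed to appear as mdadm normally prints them.

```shell
#!/bin/sh
# array_healthy: read `mdadm --detail ARRAY` output on stdin.
# Exit 0 if all RAID devices are active, none are failed, and the state
# is not degraded or still rebuilding; exit 1 otherwise.
array_healthy() {
    awk -F' *: *' '
        { key = $1; gsub(/^[ \t]+/, "", key) }   # drop the leading padding
        key == "Raid Devices"   { raid = $2 }
        key == "Active Devices" { active = $2 }
        key == "Failed Devices" { failed = $2 }
        key == "State"          { state = $2 }
        END {
            ok = (raid != "" && failed != "" &&
                  raid + 0 == active + 0 && failed + 0 == 0 &&
                  state !~ /degraded|recovering|resyncing/)
            exit(ok ? 0 : 1)
        }'
}

# Usage (needs root):
#   sudo mdadm --detail /dev/md0 | array_healthy && echo "md0 is healthy"
```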

 

Conclusion

By following this guide, you can replace a failing hard drive in a software RAID array on CentOS and restore the redundancy of your RAID system. Regularly monitoring the health of your RAID array and replacing failing drives promptly are crucial for the longevity and reliability of your storage setup.

For additional assistance or if you encounter any issues, please contact our support team at support@ipserverone.com.

 

 

Article posted on 20 April 2020 by Louis