Replace HDD for Software RAID in CentOS

Last updated on Sep 8, 2024 | under EOL, Storage & File Management, Storage, Linux | by z.yi

Introduction

This guide explains how to replace a failing hard drive in a software RAID array on CentOS using the mdadm tool. It is intended for system administrators who manage RAID arrays and need to replace failed or failing disks in a RAID 1 or RAID 5 configuration. The article will walk you through the process of identifying the failed disk, removing it from the array, replacing it with a new one, and rebuilding the RAID array to restore redundancy.

Prerequisites

Before proceeding, ensure you have the following:

Root or sudo privileges on your CentOS server.
A new hard drive ready to replace the failing one.
The mdadm tool installed on your server (commonly pre-installed with RAID setups).

Step-by-step guide

Step 1: Identify the failed drive

First, check the status of your RAID array to identify which drive has failed. Run the following command to display the status:
```
cat /proc/mdstat
```
This will show the current status of your RAID devices. For example:

md0: active raid1 sda1[0] sdb1[1](F)

In this example, sdb1 has failed, as indicated by the (F) flag. The rest of this guide assumes that /dev/sdb1 is the failed partition, confirmed either with smartmontools or with the command above.
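On a server with several arrays, picking the (F) entries out by eye is error-prone. The following is a minimal sketch, not part of the original procedure, that prints only the members flagged as failed. The function name `failed_members` is made up for illustration, and the sketch assumes the usual mdstat layout in which a failed member appears as `name[n](F)`.

```shell
# List RAID members flagged (F) in mdstat-formatted text.
# $1 = file to read; defaults to /proc/mdstat.
failed_members() {
    # Put each word on its own line, keep only words ending in "[n](F)",
    # and strip that suffix so the bare partition name remains.
    tr ' ' '\n' < "${1:-/proc/mdstat}" | sed -n 's/\[[0-9]*](F)$//p'
}

# Usage on a live system:
#   failed_members
```

For the example line above, this would print sdb1.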

Step 2: Mark the failed drive as faulty and remove it from the array

Once you have identified the failed drive, mark it as faulty and remove it from the RAID array using mdadm:
```
sudo mdadm --manage /dev/md0 --fail /dev/sdb1
sudo mdadm --manage /dev/md0 --remove /dev/sdb1
```
This tells mdadm to stop using the failed drive (sdb1 in this case) and removes it from the RAID array.

Step 3: Replace the failed drive

Power down your server to physically replace the failed drive. Once the server is powered off:
- Replace the failed hard drive with the new one.
- Boot up the server after replacing the drive.
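Device names such as sdb do not appear on the physical drive, so it can help to note the failing drive's serial number before powering down and match it against the label on the disk. The sketch below assumes smartmontools is installed and that the drive still answers SMART queries; `serial_from_smartctl` is a hypothetical helper name. If the drive no longer responds, recording the serial numbers of the healthy drives and pulling the one that is not on the list is an alternative.

```shell
# Print the serial number from `smartctl -i` output supplied on stdin.
# ATA drives report "Serial Number:", SCSI/SAS drives "Serial number:".
serial_from_smartctl() {
    awk -F': *' '/^Serial [Nn]umber:/ { print $2 }'
}

# Usage (run before shutting the server down):
#   sudo smartctl -i /dev/sdb | serial_from_smartctl
```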

Step 4: Prepare the new drive

Once the server is running again, you’ll need to partition the new drive to match the existing drives in the array. You can copy the partition table from a working drive (e.g., sda) to the new drive (e.g., sdb) using the sfdisk command:
```
sudo sfdisk -d /dev/sda | sudo sfdisk /dev/sdb
```
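If you’re using GPT partitioning instead of MBR, use the sgdisk command instead. The first command replicates the partition table of sda onto sdb, and the second gives sdb new random GUIDs so that the two disks do not share identifiers:
```
sudo sgdisk -R /dev/sdb /dev/sda
sudo sgdisk -G /dev/sdb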
```
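Before adding the new partition to the array, it may be worth confirming that the copy produced the same layout. The following sketch compares partition sizes as reported by sysfs. The function names are illustrative, and it assumes sdX-style device naming (partitions named sda1, sda2, and so on); it checks sizes only, not partition types or start offsets.

```shell
# Print "<partition-number> <size-in-sectors>" for each partition of a disk.
# $1 = disk name (e.g. sda), $2 = sysfs root (defaults to /sys/block).
partition_sizes() {
    for part in "${2:-/sys/block}/$1/$1"[0-9]*; do
        [ -e "$part/size" ] || continue
        printf '%s %s\n' "${part##*/$1}" "$(cat "$part/size")"
    done
}

# Succeed only if both disks have partitions and their sizes match.
# $1 = source disk, $2 = new disk, $3 = optional sysfs root.
same_layout() {
    src=$(partition_sizes "$1" "$3")
    dst=$(partition_sizes "$2" "$3")
    [ -n "$src" ] && [ "$src" = "$dst" ]
}

# Usage:
#   same_layout sda sdb && echo "layouts match"
```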

Step 5: Add the new drive to the RAID array

After partitioning the new drive, add it back to the RAID array using mdadm:
```
sudo mdadm --manage /dev/md0 --add /dev/sdb1
```
This will start the process of rebuilding the RAID array. You can monitor the rebuild progress by running:
```
cat /proc/mdstat
```
The rebuild may take some time depending on the size of the array and the speed of your disks.
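For scripted monitoring, the percentage can be pulled out of the mdstat progress line. This is a rough sketch with a made-up function name, `rebuild_progress`, and it assumes the kernel's usual progress format of `recovery = 12.6% (...)` or `resync = 12.6% (...)`; it prints nothing when no rebuild is running.

```shell
# Print the completion percentage of any running recovery or resync.
# $1 = file to read; defaults to /proc/mdstat.
rebuild_progress() {
    awk '/recovery *=|resync *=/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /%$/) { sub(/^=/, "", $i); print $i }
    }' "${1:-/proc/mdstat}"
}

# Usage on a live system:
#   rebuild_progress
# To block until the rebuild has finished instead of polling:
#   sudo mdadm --wait /dev/md0
```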

Step 6: Update the GRUB bootloader (if required)

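If your system boots from the RAID array, you may need to install the GRUB bootloader on the new drive so that the system can still boot if the other drive fails. On CentOS 7 and later, which use GRUB 2, the command is grub2-install; on CentOS 6, which uses legacy GRUB, it is grub-install. Run the following command to install GRUB on the new drive:
```
# CentOS 7 and later; on CentOS 6 use grub-install instead
sudo grub2-install /dev/sdb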
```

Step 7: Verify the RAID status

After the rebuild is complete, verify that the RAID array is functioning properly and that the new drive has been fully integrated. Use the following command to check the RAID status:
```
sudo mdadm --detail /dev/md0
```
The output should report zero failed devices, a State line that does not include "degraded", and every member listed as "active sync".
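If you want to script this check, for example from a cron job, the relevant fields can be read from the `mdadm --detail` text. The sketch below is one possible approach under the assumption that the output contains the usual `State :` and `Failed Devices :` lines; `array_healthy` is a hypothetical name.

```shell
# Succeed only if `mdadm --detail` text on stdin reports zero failed
# devices and a State line that does not mention "degraded".
array_healthy() {
    awk -F' : ' '
        /Failed Devices/ { failed = $2 + 0; seen = 1 }
        /^ *State :/     { if ($2 ~ /degraded/) degraded = 1 }
        END { exit (seen && failed == 0 && !degraded) ? 0 : 1 }
    '
}

# Usage:
#   sudo mdadm --detail /dev/md0 | array_healthy && echo "md0 is healthy"
```

The function also fails when no Failed Devices line is present, so an empty or unexpected input is not mistaken for a healthy array.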

Conclusion

By following this guide, you can replace a failing hard drive in a software RAID array on CentOS and restore the redundancy of your RAID system. Regularly monitoring the health of your RAID array and replacing failing drives promptly is crucial for ensuring the longevity and reliability of your storage setup.

For additional assistance or if you encounter any issues, please contact our support team at support@ipserverone.com.

Article posted on 20 April 2020 by Louis