
How to Recover DRBD from Split Brain

Last updated on Oct 29, 2024 | under EOL, Storage & File Management, Troubleshooting, Linux | by IPSERVERONE

Warning: CentOS 7 reached its end-of-life (EOL) on June 30, 2024. This means it no longer receives security updates or support from the developers. It is strongly recommended to upgrade to a supported operating system version, such as CentOS Stream 9 or an alternative Linux distribution, to maintain security and stability.

Introduction

Distributed Replicated Block Device (DRBD®) is an essential component for configuring high-availability (HA) clusters. DRBD mirrors a complete block device over a network, effectively creating a network-based RAID-1. This replication enables synchronous data mirroring, ensuring that critical data remains accessible even if one server fails. DRBD is widely used for disaster recovery and HA environments, making it popular in systems that require minimal downtime and high fault tolerance.

This article addresses a common DRBD connectivity issue in which the nodes fail to synchronize and log a “Split-Brain detected but unresolved” error. Split brain occurs when the nodes lose contact with each other and both accept writes independently, so their data sets diverge and DRBD cannot decide automatically which copy to keep. Until the condition is resolved, data is not mirrored between the servers. This guide outlines a step-by-step procedure to reconnect the DRBD nodes and restore synchronization.

Problem

When attempting to connect DRBD on CentOS-based servers, the following status and log output may appear on Node 1 and Node 2. Both nodes report cs:StandAlone, which means they have dropped the connection and are no longer replicating:

- Output of cat /proc/drbd on Node 1:

version: 8.4.0 (api:1/proto:86-100)
GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by gardner@, 2011-12-12 23:52:00
0: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:76

- Log message on Node 1:

Mar  7 15:38:05 node1 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!

- Output of cat /proc/drbd on Node 2:

version: 8.4.0 (api:1/proto:86-100)
GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by gardner@, 2011-12-12 23:52:00
0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
    ns:0 nr:0 dw:144 dr:4205 al:5 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:100
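The status lines above can also be read from a script. The following is a minimal sketch, not part of DRBD itself: `drbd_field` and `drbd_is_standalone` are hypothetical helper names, and the parsing assumes the DRBD 8.x `/proc/drbd` layout shown above. The optional file argument exists only so the helpers can be tried against a saved copy of the output.

```shell
#!/bin/sh
# Hypothetical helpers; they assume the DRBD 8.x /proc/drbd layout shown above.

# drbd_field FIELD [MINOR] [FILE]
# Print the value of FIELD (for example cs, ro or ds) for device MINOR (default 0).
drbd_field() {
    field=$1
    minor=${2:-0}
    file=${3:-/proc/drbd}
    awk -v f="$field" -v m="$minor" '
        # The resource line starts with the minor number followed by a colon.
        $1 == (m ":") {
            for (i = 2; i <= NF; i++) {
                n = index($i, ":")
                if (n > 0 && substr($i, 1, n - 1) == f) {
                    print substr($i, n + 1)
                    exit
                }
            }
        }' "$file"
}

# drbd_is_standalone [MINOR] [FILE]
# Succeed when the connection state of the device is StandAlone.
drbd_is_standalone() {
    [ "$(drbd_field cs "${1:-0}" "${2:-/proc/drbd}")" = "StandAlone" ]
}
```

For example, `drbd_is_standalone && echo "drbd0 is not replicating"` could be run on each node; a StandAlone state together with the kernel log message above points to an unresolved split brain.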

Solution

Step 1: Start DRBD Manually on Both Nodes

To re-establish DRBD connectivity, begin by starting the DRBD service on both the primary and secondary nodes.

- For releases earlier than CentOS 7 (SysV init):

/etc/init.d/drbd start

- For CentOS 7 and later (systemd):

systemctl start drbd
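When the same procedure has to run on hosts of different ages, the choice between the two start commands can be scripted. This is a small sketch under one assumption: that having `systemctl` on the PATH is a good enough sign of a systemd host. The function name `drbd_start_cmd` is hypothetical, and it only prints the command rather than running it.

```shell
#!/bin/sh
# Print the command that starts DRBD on this host.
# Assumption: a host with systemctl on PATH uses systemd; anything else
# is treated as a SysV init host.
drbd_start_cmd() {
    if command -v systemctl >/dev/null 2>&1; then
        echo "systemctl start drbd"
    else
        echo "/etc/init.d/drbd start"
    fi
}
```

Running `sh -c "$(drbd_start_cmd)"` as root on each node would then start the service with whichever command applies.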

Step 2: Set the First Node as Secondary and Discard Data

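Choose the node whose changes made since the split will be thrown away (the split-brain victim). On that node, run the following commands to demote it to secondary and reconnect while discarding its local changes. Anything written only to this node after the split is lost, so copy off any data you still need first. Note that the DRBD 8.4 documentation writes the last command as drbdadm connect --discard-my-data all, while the form shown here follows the older 8.3 syntax: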

drbdadm secondary all
drbdadm disconnect all
drbdadm -- --discard-my-data connect all
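Because these commands are destructive, it can help to wrap them so that nothing runs by accident. The sketch below uses a hypothetical function, `discard_local_changes`, that only prints the commands unless `--run` is passed. The command strings are copied verbatim from this step; adjust them if your DRBD version expects a different option order.

```shell
#!/bin/sh
# Hypothetical wrapper around the Step 2 commands. Dry run by default.
# Usage: discard_local_changes [RESOURCE] [--run]
discard_local_changes() {
    res=${1:-all}
    mode=${2:-}
    # One entry per command from Step 2, in the same order.
    for cmd in \
        "drbdadm secondary $res" \
        "drbdadm disconnect $res" \
        "drbdadm -- --discard-my-data connect $res"
    do
        if [ "$mode" = "--run" ]; then
            echo "+ $cmd"
            # Intentionally unquoted so the string splits into words.
            $cmd || return 1
        else
            echo "$cmd"
        fi
    done
}
```

Calling `discard_local_changes all` prints the three commands for review, and `discard_local_changes all --run` executes them, stopping at the first failure.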

Step 3: Set the Second Node as Primary and Reconnect

On the other node (the survivor, whose data is kept), make sure it is the primary node and reconnect it so that it resynchronizes its data to the first node:

drbdadm primary all
drbdadm disconnect all
drbdadm connect all
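After reconnecting, it is worth confirming that the nodes have actually left the StandAlone state. The following sketch classifies the resource line in `/proc/drbd`; the function name `drbd_sync_status` and its three result labels are hypothetical, and the matching assumes the DRBD 8.x status format. Any state other than a fully synchronized connection or an active resync is reported simply as `other`.

```shell
#!/bin/sh
# Hypothetical check; assumes the DRBD 8.x /proc/drbd layout.
# Usage: drbd_sync_status [MINOR] [FILE]
# Prints in-sync, resyncing or other, and returns 0, 1 or 2 respectively.
drbd_sync_status() {
    minor=${1:-0}
    file=${2:-/proc/drbd}
    # Pick the resource line for the requested minor number.
    line=$(awk -v m="$minor" '$1 == (m ":") { print; exit }' "$file")
    case $line in
        *cs:Connected*ds:UpToDate/UpToDate*) echo "in-sync";   return 0 ;;
        *cs:SyncSource*|*cs:SyncTarget*)     echo "resyncing"; return 1 ;;
        *)                                   echo "other";     return 2 ;;
    esac
}
```

A result of `resyncing` shortly after Step 3, followed later by `in-sync`, suggests the recovery worked; `other` means the connection state should be inspected by hand.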

Conclusion

By following this guide, DRBD connectivity issues resulting from split-brain conditions can be resolved so that data mirrors correctly again between HA cluster nodes. Because recovery discards the changes made on one node, regular DRBD monitoring and a quick response to split-brain events help limit data loss in production environments.

Should you have any inquiries about the guidelines, please feel free to open a ticket through your portal account or contact us at support@ipserverone.com. We’ll be happy to assist you further.

Article posted on 12 April 2020