How to fix DRBD recovery from split brain
Introduction
Distributed Replicated Block Device (DRBD®) is an essential component for configuring high-availability (HA) clusters. DRBD mirrors a complete block device over a network, effectively creating a network-based RAID-1. This replication enables synchronous data mirroring, ensuring that critical data remains accessible even if one server fails. DRBD is widely used for disaster recovery and HA environments, making it popular in systems that require minimal downtime and high fault tolerance.
This article addresses a common DRBD connectivity issue in which the nodes refuse to synchronize and log an "unresolved split-brain" error. Split brain occurs when the nodes lose contact with each other and both accept writes independently, so DRBD can no longer decide automatically which copy of the data is correct. Until the condition is resolved, data is not mirrored between the servers. This guide outlines a step-by-step procedure to reconnect the DRBD nodes and restore synchronization.
Problem
When attempting to connect DRBD on CentOS-based servers, the following error may appear on Node 1 and Node 2, preventing synchronization:
- Output from /proc/drbd on Node 1:
version: 8.4.0 (api:1/proto:86-100)
GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by gardner@, 2011-12-12 23:52:00
 0: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:76
- Log message on Node 1:
Mar 7 15:38:05 node1 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
- Output from /proc/drbd on Node 2:
version: 8.4.0 (api:1/proto:86-100)
GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by gardner@, 2011-12-12 23:52:00
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r-----
    ns:0 nr:0 dw:144 dr:4205 al:5 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:100
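The status output above can also be checked from a script. The following is a minimal sketch, assuming a single DRBD device and the /proc/drbd layout shown above; the function names drbd_cs and drbd_looks_split are hypothetical helpers, not part of DRBD. A StandAlone node with an Unknown peer only hints at split brain, so the kernel log message should still be confirmed.

```shell
# Print the connection state (the cs: field) of the first DRBD device
# found in /proc/drbd-style text read from standard input.
drbd_cs() {
    grep -o 'cs:[A-Za-z]*' | head -n 1 | cut -d: -f2
}

# Hypothetical helper: succeed when the text matches the pattern shown
# above, i.e. a StandAlone node whose peer role is Unknown.
drbd_looks_split() {
    status=$(cat)
    [ "$(printf '%s\n' "$status" | drbd_cs)" = "StandAlone" ] &&
        printf '%s\n' "$status" | grep -q 'ro:[A-Za-z]*/Unknown'
}

# Example use on a live node (the log path is an assumption for CentOS):
#   drbd_looks_split < /proc/drbd && grep -i 'split-brain' /var/log/messages
```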
Solution
Step 1: Start DRBD Manually on Both Nodes
To re-establish DRBD connectivity, begin by starting the DRBD service on both the primary and secondary nodes.
- For versions earlier than CentOS 7:
/etc/init.d/drbd start
- For CentOS 7 and later versions:
systemctl start drbd
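When the same procedure is scripted across servers of mixed age, the choice between the two start commands can be made from the CentOS major version. This is only a sketch: drbd_start_cmd is a hypothetical helper, and the version number is passed in by the caller rather than detected.

```shell
# Print the DRBD start command for a given CentOS major version.
# The version is supplied by the caller; nothing is executed here.
drbd_start_cmd() {
    if [ "$1" -ge 7 ]; then
        echo "systemctl start drbd"
    else
        echo "/etc/init.d/drbd start"
    fi
}

# Example: show the command for a CentOS 6 host.
drbd_start_cmd 6
```

Printing the command instead of running it keeps the helper safe to test on any machine; the output can be reviewed first and then run by hand.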
Step 2: Set the First Node as Secondary and Discard Data
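Choose the node whose changes can be thrown away. Everything written on this node since the split occurred will be lost and overwritten by the other node's data. On that node, run the following commands to demote it to secondary and reconnect it while discarding its local changes:
drbdadm secondary all
drbdadm disconnect all
drbdadm -- --discard-my-data connect all
The last command uses the DRBD 8.3 option syntax. The DRBD 8.4 documentation writes the same step as drbdadm connect --discard-my-data all.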
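Because discarding data is destructive, it can help to confirm the local role before the final connect command is issued. The sketch below assumes a single DRBD device and the /proc/drbd format shown in the Problem section; drbd_local_role is a hypothetical helper name.

```shell
# Print the local role: the part before the slash in the ro: field of
# /proc/drbd-style text read from standard input.
drbd_local_role() {
    grep -o 'ro:[A-Za-z]*' | head -n 1 | cut -d: -f2
}

# Example guard on the node being discarded (left commented out on
# purpose, since running it overwrites this node's changes):
#   if [ "$(drbd_local_role < /proc/drbd)" = "Secondary" ]; then
#       drbdadm -- --discard-my-data connect all
#   fi
```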
Step 3: Set the Second Node as Primary and Reconnect
On the other node, whose data will be kept, set the role to primary and reconnect so that the nodes can resynchronize:
drbdadm primary all
drbdadm disconnect all
drbdadm connect all
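After reconnecting, /proc/drbd should leave the StandAlone state, pass through a resync state such as SyncSource or SyncTarget, and settle at Connected with both disks UpToDate. A minimal sketch of that final check follows, assuming a single DRBD device; drbd_is_synced is a hypothetical helper name.

```shell
# Succeed when /proc/drbd-style text on standard input shows a connected
# device whose local and peer disks are both UpToDate.
drbd_is_synced() {
    status=$(cat)
    printf '%s\n' "$status" | grep -q 'cs:Connected' &&
        printf '%s\n' "$status" | grep -q 'ds:UpToDate/UpToDate'
}

# Example use on a live node:
#   drbd_is_synced < /proc/drbd && echo "DRBD is in sync"
```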
Conclusion
By following this guide, DRBD connectivity issues resulting from split-brain conditions can be resolved, ensuring that data continues to mirror correctly between HA cluster nodes. Regular DRBD monitoring and quick response to split-brain events can prevent potential data loss in production environments.
Should you have any inquiries about the guidelines, please feel free to open a ticket through your portal account or contact us at support@ipserverone.com. We’ll be happy to assist you further.
Article posted on 12 April 2020