Replication is one of those things that looks simple until it breaks at 3 AM and your primary just promoted itself to a state your replica has never seen. Classic split-brain. You’ve got two nodes that both think they’re right, and your data is somewhere in between.
This article is for the engineers who’ve already skimmed the official docs, found them dry, and want the real story: how GTIDs actually save you from binary log position hell, what causes replica lag and how to catch it before users do, and how to recover from split-brain without just nuking the replica and starting over.
The official MySQL docs live at https://github.com/mysql/mysql-server. MariaDB’s replication docs are at https://mariadb.com/kb/en/replication/.
Why File-Position Replication Is a Maintenance Nightmare
Classic MySQL replication works like this: the primary writes events to a binary log file, the replica connects, asks "give me everything starting at file mysql-bin.000042, position 891234", and streams from there.
That works fine until you do a failover. After promoting your replica to primary, you need to tell the new replica where to start reading — but now the binlog file name and position on the new primary are completely different from anything the old primary had. You’re hand-stitching offsets together under pressure. People get it wrong. Data diverges silently.
GTIDs (Global Transaction Identifiers) fix this by tagging every committed transaction with a universally unique ID in the format source_uuid:transaction_id. Instead of "file + position," a replica tells the primary "I’ve already seen these GTIDs, give me everything else." After a failover, the same conversation happens with the new primary — no manual offset hunting required.
GTID Concepts You Actually Need
A GTID looks like this:
3E11FA47-71CA-11E1-9E33-C80AA9429562:1-57
The UUID identifies the server that originally committed the transaction. The number (or range) is the sequence. A server tracks two sets:
gtid_executed— every GTID this server has applied or generatedgtid_purged— GTIDs that have been executed but whose binary logs have been deleted
The replica tracks gtid_executed and uses it to compute what it still needs from the primary. This is the magic that makes CHANGE MASTER TO MASTER_AUTO_POSITION=1 work.
MySQL vs MariaDB GTIDs: they’re not wire-compatible. MySQL uses a UUID-based scheme; MariaDB uses domain_id-server_id-sequence_number (e.g., 0-1-42). You cannot replicate between MySQL and MariaDB using GTIDs. If you’re running a mixed environment, stick to file-position replication or use a middleware layer like MaxScale.
Setting Up GTID Replication from Scratch
Primary configuration
# /etc/mysql/mysql.conf.d/mysqld.cnf (MySQL)
# or /etc/mysql/mariadb.conf.d/50-server.cnf (MariaDB)
[mysqld]
server_id = 1 # Must be unique across the topology
log_bin = /var/log/mysql/mysql-bin
binlog_format = ROW # Always ROW for GTID replication
gtid_mode = ON # MySQL only
enforce_gtid_consistency = ON # MySQL only; prevents statements incompatible with GTIDs
# MariaDB enables GTIDs automatically when log_bin is set; no gtid_mode needed
# but you should set:
# gtid_strict_mode = 1 # MariaDB: rejects out-of-order GTIDs
binlog_expire_logs_seconds = 604800 # Keep 7 days of binary logs
max_binlog_size = 256M
sync_binlog = 1 # Flush binlog to disk on every commit — expensive but safe
innodb_flush_log_at_trx_commit = 1 # Durability guarantees
# Replica safety: skip writing own events to binlog by default
# log_replica_updates = ON # Enable this if this node may become a relay primary
Create a dedicated replication user. Never reuse root or an application account:
-- On the primary
CREATE USER 'replicator'@'%' IDENTIFIED WITH mysql_native_password BY 'StrongPassHere!';
GRANT REPLICATION SLAVE ON *.* TO 'replicator'@'%';
FLUSH PRIVILEGES;
Taking a consistent backup for the initial replica seed
Don’t try to point a blank replica at a live primary without seeding it first. You’ll trigger a full table dump from the primary, which will crush it under load.
Use mysqldump with the GTID option:
mysqldump \
--single-transaction \
--master-data=2 \
--set-gtid-purged=ON \
--all-databases \
--routines \
--triggers \
-u root -p \
> /tmp/primary_dump.sql
--set-gtid-purged=ON embeds a SET @@GLOBAL.gtid_purged statement in the dump. When you restore this on the replica, it tells MySQL "these GTIDs have already been applied, don’t ask the primary for them."
For large databases (50 GB+), use xtrabackup (Percona XtraBackup) instead — it takes a hot physical backup without locking tables:
# On the primary
xtrabackup --backup \
--target-dir=/var/backup/xtrabackup \
--user=root \
--password='yourpassword'
xtrabackup --prepare --target-dir=/var/backup/xtrabackup
# Transfer to replica, then restore
xtrabackup --copy-back --target-dir=/var/backup/xtrabackup
chown -R mysql:mysql /var/lib/mysql
Replica configuration
# /etc/mysql/mysql.conf.d/mysqld.cnf
[mysqld]
server_id = 2 # Different from primary
relay_log = /var/log/mysql/mysql-relay-bin
log_bin = /var/log/mysql/mysql-bin # Required for log_replica_updates
log_replica_updates = ON # Write replicated events to own binlog
gtid_mode = ON
enforce_gtid_consistency = ON
read_only = ON # Prevent accidental writes to replica
super_read_only = ON # Block even SUPER users from writing
# Parallel replication — dramatically reduces lag on multi-core hardware
replica_parallel_workers = 4
replica_parallel_type = LOGICAL_CLOCK # Better than DATABASE for most workloads
replica_preserve_commit_order = ON # Keeps commits in original order
Restore the dump, then connect:
-- MySQL 8.0+ syntax
CHANGE REPLICATION SOURCE TO
SOURCE_HOST = '10.0.0.1',
SOURCE_PORT = 3306,
SOURCE_USER = 'replicator',
SOURCE_PASSWORD = 'StrongPassHere!',
SOURCE_AUTO_POSITION = 1; -- This is the GTID magic line
START REPLICA;
SHOW REPLICA STATUS\G
-- MariaDB syntax
CHANGE MASTER TO
MASTER_HOST = '10.0.0.1',
MASTER_PORT = 3306,
MASTER_USER = 'replicator',
MASTER_PASSWORD = 'StrongPassHere!',
MASTER_USE_GTID = slave_pos; -- MariaDB GTID mode
START SLAVE;
SHOW SLAVE STATUS\G
Look at Replica_IO_Running: Yes and Replica_SQL_Running: Yes. If either is No, the Last_Error field tells you why.
Gotcha: gtid_mode Can’t Be Toggled Live on a Busy Primary
Switching from file-position replication to GTID on a running production primary is a multi-step process (ON_PERMISSIVE → ON), and you need all replicas to go through the same transition. Doing it mid-flight on a system with active writes is a recipe for broken replication. Either plan a maintenance window or build the topology with GTIDs from day one.
Monitoring Replication Lag
Seconds_Behind_Source in SHOW REPLICA STATUS is the number most people watch, but it’s lying to you. It compares the timestamp of the event the replica is currently executing against wall clock time. If the replica’s SQL thread is idle (all caught up), it reports 0 — even if the IO thread is several gigabytes behind, still downloading events.
For real lag measurement, use Percona’s pt-heartbeat:
# On the primary — insert a heartbeat every second
pt-heartbeat \
--database percona \
--update \
--interval=1 \
--daemonize \
--user=root \
--password='yourpassword'
# On the replica — measure the actual lag
pt-heartbeat \
--database percona \
--monitor \
--master-server-id=1 \
--user=root \
--password='yourpassword'
This inserts a timestamped row on the primary and reads it on the replica. True end-to-end latency, network and SQL thread included.
For Prometheus-based monitoring, the mysql_exporter exposes mysql_slave_status_seconds_behind_master and mysql_slave_status_slave_sql_running. Combine with a Grafana alert at >30 seconds lag and you’ll know before your users do.
Why replicas fall behind
Three common causes:
1. Single-threaded SQL thread bottleneck. Before MySQL 5.7 / MariaDB 10.0, replicas applied events serially. One slow query on the primary = replica falls behind by that query’s duration, forever. Parallel replication (replica_parallel_workers) solves this but requires your workload to be suitable (transactions touching different rows/tables).
2. Large transactions. A DELETE FROM logs WHERE created_at < '2024-01-01' that takes 40 seconds on the primary takes 40 seconds on the replica too, and nothing else can apply during that time. Batch your deletes: delete 10,000 rows at a time in a loop with small sleeps.
3. Network saturation. Binary logs are not compressed by default. If your primary is doing heavy writes and the replica is on a congested link, the IO thread will lag purely on download. Enable SOURCE_COMPRESSION_ALGORITHMS = 'zstd' in MySQL 8.0+.
Gotcha: read_only Doesn’t Stop Replication Threads
read_only = ON blocks regular users from writing. The replica’s SQL thread runs as a replication thread and is exempt. super_read_only = ON also blocks SUPER users, but still exempts replication. This is correct behavior — don’t fight it. Just make sure your application users on the replica don’t have SUPER.
Split-Brain: What It Is and Why It Happens
Split-brain occurs when two nodes both accept writes believing they’re the authoritative primary. The usual trigger: a primary goes unresponsive (network partition, OOM kill), an orchestrator (MHA, Orchestrator, ProxySQL + Keepalived) promotes the replica, and then the original primary comes back online and starts accepting writes again because nothing fenced it off.
You now have two primaries, diverged transaction histories, and GTID sets that are mutually incompatible.
Prevention first
Use fencing. Before promoting a replica, shoot the old primary in the head:
- STONITH (Shoot The Other Node In The Head) in a proper HA stack like Pacemaker
- VIP/floating IP that gets removed from the old primary before the new one takes it
- A
FLUSH TABLES WITH READ LOCKvia SSH to the old primary as part of the failover script - Cloud-native: revoke the old primary’s security group rule for port 3306 atomically
Orchestrator with pseudo-GTID or real GTID mode handles this well and has built-in anti-split-brain logic.
Recovering from Split-Brain
So it happened anyway. Here’s the clinical approach.
Step 1: Figure out who has what
On both nodes, run:
SELECT @@server_uuid;
SHOW MASTER STATUS; -- or SHOW BINARY LOG STATUS on MySQL 8.4+
SELECT @@global.gtid_executed;
Write down the GTID sets from both nodes. You need to determine:
- Which GTIDs exist on the old primary but not the new?
- Which GTIDs exist on the new primary (wrote during the split window) but not the old?
-- Find GTIDs present on old-primary but not new-primary
-- Run on new-primary:
SELECT GTID_SUBTRACT(
'paste-old-primary-gtid_executed-here',
@@global.gtid_executed
) AS missing_on_new_primary;
-- Find diverged GTIDs (generated on new-primary after promotion)
SELECT GTID_SUBTRACT(
@@global.gtid_executed,
'paste-old-primary-gtid_executed-here'
) AS new_primary_only;
Step 2: Assess the damage
If missing_on_new_primary is empty, the replica was fully caught up when it was promoted. You have clean divergence: new writes went only to the new primary, old primary just accepted stale reads (or got written to by a misconfigured app).
If missing_on_new_primary is non-empty, there were in-flight transactions on the old primary that never made it to the replica. Those transactions are now orphaned. You need to decide: extract and replay them, or accept the data loss.
Step 3a: Clean divergence — re-attach old primary as replica
The old primary needs to "forget" the writes it accepted after the split and catch up to the new primary. If those writes are acceptable to lose:
-- On old primary (now being demoted back to replica)
STOP REPLICA; -- stop any replication if it somehow reconnected
-- Reset its GTID state to match what it should have seen
-- Only do this if you are sure the diverged GTIDs represent lost work you're OK discarding
RESET MASTER;
SET @@global.gtid_purged = 'paste-new-primary-gtid_executed-here';
-- Point it at the new primary
CHANGE REPLICATION SOURCE TO
SOURCE_HOST = '10.0.0.2', -- new primary
SOURCE_AUTO_POSITION = 1;
START REPLICA;
SHOW REPLICA STATUS\G
Watch Replica_SQL_Running_State — it should say "Reading event from the relay log" and then go idle as it catches up.
Step 3b: There are orphaned transactions you want to recover
Extract the orphaned transactions from the old primary’s binary logs using mysqlbinlog:
mysqlbinlog \
--start-datetime="2026-05-23 02:30:00" \
--stop-datetime="2026-05-23 02:45:00" \
--include-gtids="3E11FA47-...:58-62" \
/var/log/mysql/mysql-bin.000045 \
> /tmp/orphaned_transactions.sql
Review that SQL file manually. Check for conflicts with what the new primary already has. If it’s clean:
# Apply against the new primary while replication is paused
mysql -u root -p < /tmp/orphaned_transactions.sql
This is surgical work. Take a backup of the new primary before touching it.
Gotcha: RESET MASTER Wipes Your Entire Binlog History
RESET MASTER deletes all binary logs and resets gtid_executed to empty. On a live primary with replicas still attached, this is catastrophic — replicas will request GTIDs that no longer exist and error out. Only run it on a node you’re demoting and have already disconnected from everything.
Production Checklist
Before you call your replication topology "production-ready":
- Binary logs are on a separate disk or volume from data files. A disk full event that kills binlogs takes down replication.
binlog_expire_logs_secondsis set long enough that a replica can fall behind for a weekend (support incident, network outage) and still catch up. 7-14 days is reasonable.sync_binlog = 1andinnodb_flush_log_at_trx_commit = 1are set on the primary. Without these, a crash can lose the last transaction from the binlog while InnoDB already committed it — instant replication break.- Automated lag alerting via
pt-heartbeatormysql_exporter.Seconds_Behind_Sourcealone is not enough. - Fencing is in place before your orchestrator promotes a replica. Untested failover is no failover.
- You’ve done a failover drill in staging. Orchestrator, MHA, and similar tools have edge cases that only surface when you actually run them.
- Replicas have
super_read_only = ON. One accidental direct write to a replica creates a GTID that the primary has never seen, and the next time you try to re-attach it as a replica it will diverge.
One Final Thing About MariaDB-Specific Behavior
MariaDB’s GTID implementation has a concept called gtid_domain_id (default 0). This allows multi-source replication and ring topologies where each node has its own domain, preventing circular replication of the same events. If you’re building active-active or ring replication with MariaDB, set a unique gtid_domain_id on each node. Getting this wrong is the fastest way to create a split-brain you can’t easily diagnose.
# Node 1
gtid_domain_id = 1
# Node 2
gtid_domain_id = 2
MySQL doesn’t have this concept — its UUID-based GTIDs handle multi-source naturally.
Replication is not a backup, it’s not magic HA, and it’s definitely not something you set up once and ignore. Treat it like a production dependency: monitor it actively, test your failover regularly, and understand your GTID sets before a 3 AM incident forces you to learn them under pressure.