
How Database Replication Actually Works

Sync vs async, leader-follower vs multi-leader vs leaderless, replication lag, read-your-writes consistency, the surprising failure modes when you scale reads, and why eventual consistency is rarely what you actually want.


Replication keeps copies of the same data on multiple nodes, so the data survives a node failure and reads can be spread across nodes. It sounds simple, but sync vs async, leader-follower vs multi-leader, lag handling, and read-your-writes consistency each come with their own trade-offs. This guide covers the mechanics.

Why Replicate

Three reasons:
1. High availability — if 1 node fails, another serves
2. Read scaling — N× read capacity (read replicas)
3. Geo-distribution — data near users → lower latency

Costs:
- Consistency vs availability trade-off (CAP)
- Replication lag (replicas trail the primary, so they can serve stale data)
- Operational complexity — failover, split-brain, monitoring

Sync vs Async — Same Word, Different Guarantees

Synchronous Replication

Client → Primary write → Primary "waits"
                              ↓
                         Replicate to replicas
                              ↓
                        All (or N) replicas ack
                              ↓
                          OK returned to client

Pros:
- Zero data loss (committed everywhere by the time OK returns)
Cons:
- Slowest replica's latency = client's latency
- A hung replica stalls all writes
- Rarely used in full; most systems wait only for a quorum (floor(N/2) + 1 nodes)
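
The quorum variant of the flow above can be sketched in a few lines. This is a minimal simulation, not any database's API: `Replica`, `quorum_size`, and `replicate_sync` are hypothetical names, and each replica's `delay` stands in for network and disk latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


class Replica:
    """Hypothetical in-memory replica; `delay` simulates network + disk latency."""

    def __init__(self, name, delay):
        self.name = name
        self.delay = delay
        self.log = []

    def apply(self, record):
        time.sleep(self.delay)
        self.log.append(record)
        return self.name


def quorum_size(n):
    """Majority of n nodes: floor(n / 2) + 1."""
    return n // 2 + 1


def replicate_sync(record, replicas, required_acks):
    """Send `record` to every replica; return once `required_acks` have acked."""
    pool = ThreadPoolExecutor(max_workers=len(replicas))
    futures = [pool.submit(r.apply, record) for r in replicas]
    acked = []
    for fut in as_completed(futures):
        acked.append(fut.result())
        if len(acked) >= required_acks:
            break
    pool.shutdown(wait=False)  # slow replicas keep applying in the background
    return acked


replicas = [Replica("r1", 0.01), Replica("r2", 0.02), Replica("r3", 0.5)]
acked = replicate_sync("x=1", replicas, required_acks=quorum_size(len(replicas)))
print(sorted(acked))  # the two fast replicas; the write did not wait for r3
```

Setting `required_acks=len(replicas)` turns this into full synchronous replication, and the write then takes as long as the slowest replica.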

Asynchronous Replication (most DBs' default)

Client → Primary write → Primary writes to its disk and returns OK
                              ↓ (in the background)
                         Replicate to replicas (may lag)

Pros:
- Low write latency (no replica dependency)
Cons:
- Replica lag — 1-100ms (or worse) — if primary dies in that window, data loss
- Stale reads on read replicas
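
The data-loss window can be made concrete with a small simulation. This is a sketch under simplifying assumptions: `AsyncPrimary` is a hypothetical class, there is a single replica, and replication is driven by an explicit `ship` call rather than a background thread.

```python
from collections import deque


class AsyncPrimary:
    """Hypothetical primary that acks after its local write and ships changes later."""

    def __init__(self):
        self.log = []           # local durable log
        self.outbox = deque()   # records not yet shipped to the replica

    def write(self, record):
        self.log.append(record)
        self.outbox.append(record)
        return "OK"             # returned before any replica has the record

    def ship(self, replica_log, max_records):
        """Background replication step: send up to `max_records` pending records."""
        for _ in range(min(max_records, len(self.outbox))):
            replica_log.append(self.outbox.popleft())


primary = AsyncPrimary()
replica_log = []

for i in range(5):
    primary.write(f"txn-{i}")
primary.ship(replica_log, max_records=3)   # replica is now 2 records behind

# If the primary dies here and the replica is promoted, these acknowledged
# writes are gone:
lost = primary.log[len(replica_log):]
print(lost)  # ['txn-3', 'txn-4']
```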

Trade-off knobs:
- MySQL: semi-sync (wait for 1 replica ack)
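- PostgreSQL: synchronous_commit = off / local / remote_write / on / remote_apply (the remote levels only take effect when synchronous_standby_names is set)
- Cassandra: per-query consistency level, e.g. ANY (writes only), ONE, QUORUM, ALL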

Topology — Who Replicates to Whom

Leader-Follower (most common)

     Leader (write)
       ↓     ↓     ↓
   Follower Follower Follower (read)

Pros: simple, clear consistency model
Cons: leader failure needs failover (downtime ~ failover time)
Used by: MySQL, PostgreSQL, MongoDB primary, Redis master

Multi-Leader (multi-master)

Leader-A (US)  ←→  Leader-B (EU)  ←→  Leader-C (APAC)
   ↓                ↓                  ↓
 Follower         Follower           Follower

Pros: geo-distributed; each region writes to the nearest leader
Cons: write conflicts (same key updated at different leaders) —
      resolution is complex
Used by: CouchDB, Bucardo, some multi-region setups
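
Before a conflict can be resolved it has to be detected. One common technique is a version vector per key; the sketch below assumes each leader keeps its own counter in a plain dict, and `compare` is a hypothetical helper, not the API of any of the systems listed.

```python
def compare(a, b):
    """Compare two version vectors (dict of leader name -> counter).

    Returns "before", "after", "equal", or "concurrent".
    """
    keys = set(a) | set(b)
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


base = {"us": 1}
us_write = {"us": 2}             # leader US updates the key
eu_write = {"us": 1, "eu": 1}    # leader EU updates the same key, unaware of US

print(compare(base, us_write))      # before
print(compare(us_write, eu_write))  # concurrent
```

A "concurrent" result is the conflict case: neither write saw the other, so the system must merge them or pick a winner.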

Leaderless (Dynamo-style)

All nodes are peers.
Client → writes directly to N nodes (or via a coordinator)
         W = write quorum, R = read quorum, N = replica count

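Quorum condition: W + R > N → every read quorum overlaps every write quorum,
  so a read reaches at least one node holding the latest acknowledged write
  e.g. N=3, W=2, R=2 → even if one node is stale, comparing versions with
       another yields the latest value on read
  (this alone is not linearizability; concurrent writes can still conflict)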

Pros: no leader → no SPOF, very available ("always writable", Dynamo)
Cons: needs conflict resolution (LWW / version vectors / CRDTs)
Used by: Cassandra, Riak, Amazon's original Dynamo (the DynamoDB service itself uses leader-based replication per partition)
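
The quorum overlap can be checked with a toy model. This is a sketch only: `Node`, `quorum_write`, and `quorum_read` are hypothetical names, versions are plain integers supplied by the caller, and the read consults the first R nodes that reply.

```python
class Node:
    """Hypothetical storage node holding a (version, value) pair per key."""

    def __init__(self):
        self.data = {}
        self.up = True

    def put(self, key, version, value):
        if not self.up:
            return False
        if version > self.data.get(key, (0, None))[0]:
            self.data[key] = (version, value)
        return True

    def get(self, key):
        if not self.up:
            return None
        return self.data.get(key, (0, None))


def quorum_write(nodes, key, version, value, w):
    """Write to all nodes; succeed if at least `w` acknowledged."""
    acks = sum(node.put(key, version, value) for node in nodes)
    return acks >= w


def quorum_read(nodes, key, r):
    """Consult the first `r` replies and return the one with the highest version."""
    replies = [rep for rep in (node.get(key) for node in nodes) if rep is not None]
    if len(replies) < r:
        raise RuntimeError("read quorum not reached")
    return max(replies[:r], key=lambda rep: rep[0])


nodes = [Node(), Node(), Node()]           # N = 3
quorum_write(nodes, "k", 1, "old", w=2)
nodes[0].up = False                        # node 0 misses the next write
ok = quorum_write(nodes, "k", 2, "new", w=2)
nodes[0].up = True                         # back online, still stale

print(quorum_read(nodes, "k", r=2))  # (2, 'new'): W + R = 4 > N
print(quorum_read(nodes, "k", r=1))  # (1, 'old'): W + R = 3, overlap not guaranteed
```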

Replication Lag — The Invisible Trap

Scenario:
  user → Primary "upload profile photo" → OK
  user → refresh → load-balanced read goes to a Replica
  Replica hasn't received the photo yet (lag 50ms)
  → "no photo" displayed ← user confused

This violates read-your-writes consistency.

Mitigations:
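1. Pin the user's reads to the primary for a short window after their own write (often combined with sticky sessions)
2. Tag each write with a logical clock value (e.g. a log position) and route that user's reads to the primary until a replica has reached it
3. Have each replica track how far it has applied; if it is behind the client's last-write timestamp, fall back to the primary
4. Force-invalidate the caching layer for that user ID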
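
A minimal sketch of position-based routing, assuming a single replica and an integer log position. `Router` and its methods are hypothetical names for illustration; real systems would use something like an LSN or GTID for the position.

```python
class Router:
    """Hypothetical read router that tracks each user's last write position."""

    def __init__(self):
        self.primary_pos = 0    # log position of the latest write
        self.replica_pos = 0    # log position the replica has applied
        self.last_write = {}    # user id -> position of that user's last write

    def write(self, user):
        self.primary_pos += 1
        self.last_write[user] = self.primary_pos

    def replicate(self):
        """Simulate the replica catching up with the primary."""
        self.replica_pos = self.primary_pos

    def route_read(self, user):
        """Use the replica only if it has applied this user's last write."""
        needed = self.last_write.get(user, 0)
        return "replica" if self.replica_pos >= needed else "primary"


router = Router()
router.write("alice")                 # alice uploads her photo
print(router.route_read("alice"))     # primary (replica has not caught up)
print(router.route_read("bob"))       # replica (bob has no pending write)
router.replicate()
print(router.route_read("alice"))     # replica
```

Only the user who just wrote pays the cost of a primary read; everyone else keeps using replicas.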

Consistency Models — What Is Guaranteed

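Model                 | Guarantee                                                            | Representative system
----------------------+----------------------------------------------------------------------+----------------------------------------
Strong / Linearizable | Every read sees the latest completed write, as if there were one copy | Spanner, etcd, ZooKeeper
Sequential            | All processes see operations in the same order                       | Some NewSQL
Read-your-writes      | My writes show up in my reads                                        | Possible at the app layer
Monotonic reads       | A user only ever sees newer values (no going backwards)              | Sticky session
Eventual              | Replicas converge eventually (no time bound)                         | DNS, Cassandra at consistency level ONE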
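
Monotonic reads can also be enforced on the client side instead of through sticky sessions. The sketch below assumes every replica reply carries a version number; `Session` is a hypothetical class, and returning `None` stands for "retry on another replica".

```python
class Session:
    """Hypothetical client session enforcing monotonic reads across replicas."""

    def __init__(self):
        self.seen = 0   # highest version this client has observed so far

    def read(self, reply):
        version, value = reply
        if version < self.seen:
            # This replica is behind what the user has already seen; reject it.
            return None
        self.seen = version
        return value


session = Session()
fresh_replica = (5, "v5")
stale_replica = (3, "v3")

print(session.read(fresh_replica))  # v5
print(session.read(stale_replica))  # None (would have gone backwards)
```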

Failover — When the Leader Dies

Automatic failover steps:
1. Failure detection — heartbeat lost (usually 10-30s timeout)
2. Elect new leader — consensus (e.g. Raft) or manual
3. Verify data consistency — new leader's log is sufficient
   (otherwise data loss)
4. Update routing — clients to new leader

Pitfalls:
- False positives — alive leader temporarily hangs → failover → split-brain
- Asymmetric failure — leader can see followers but not clients
- Data loss (writes the old leader acknowledged under async replication but never shipped to the new leader)
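
The detection, election, and verification steps can be sketched as three small functions. This is a simplification, not a consensus protocol: the function names are hypothetical, log positions are plain integers, and "election" here just picks the most up-to-date follower.

```python
def suspected_dead(last_heartbeat, now, timeout):
    """The leader is suspected dead if no heartbeat arrived within `timeout` seconds."""
    return now - last_heartbeat > timeout


def pick_new_leader(followers):
    """Choose the follower with the highest applied log position.

    `followers` maps follower name -> last applied log position.
    """
    return max(followers, key=followers.get)


def lost_writes(old_leader_pos, new_leader_pos):
    """Writes acknowledged by the old leader that the new leader never received."""
    return max(0, old_leader_pos - new_leader_pos)


followers = {"f1": 980, "f2": 1000, "f3": 995}
old_leader_pos = 1004

if suspected_dead(last_heartbeat=100.0, now=125.0, timeout=20.0):
    new_leader = pick_new_leader(followers)
    print(new_leader)                                           # f2
    print(lost_writes(old_leader_pos, followers[new_leader]))   # 4
```

A short timeout fails over faster but raises the false-positive risk described above; a long one extends downtime.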

Common Pitfalls

  • "Async replication is enough" — in production with zero data loss requirements, semi-sync (at least 1 replica ack) is mandatory.
  • Transactional reads from a replica — lag breaks consistency. Replicas are for analytics / cache-style reads.
  • Ignoring multi-leader write conflicts — last-write-wins drops data. CRDTs or app-level merges are the right tools.
  • Too many replicas — leader's outbound network explodes. Consider cascading replication (replica → replica).
  • Cross-region replication cost — some clouds charge per GB. Heavy writes can blow up monthly bills.
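
The multi-leader conflict pitfall is easy to reproduce. The sketch below compares last-write-wins with a grow-only counter (G-Counter), one of the simplest CRDTs. All names are hypothetical, and the "base" slot is just an illustrative way to carry the pre-existing count.

```python
def lww_merge(a, b):
    """Last-write-wins: keep the (timestamp, value) pair with the later timestamp."""
    return a if a[0] >= b[0] else b


def gcounter_merge(a, b):
    """G-Counter merge: take the per-slot maximum of each counter."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in set(a) | set(b)}


def gcounter_value(counter):
    return sum(counter.values())


# Two leaders each record likes for the same post, starting from 10.
# LWW register: each leader stores the total it computed locally.
us_lww = (1001, 10 + 3)   # US saw 3 new likes
eu_lww = (1002, 10 + 2)   # EU saw 2 new likes, slightly later
print(lww_merge(us_lww, eu_lww)[1])   # 12: the 3 US likes are dropped

# G-Counter: each leader only increments its own slot.
us_gc = {"base": 10, "us": 3}
eu_gc = {"base": 10, "eu": 2}
print(gcounter_value(gcounter_merge(us_gc, eu_gc)))   # 15
```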

Wrap-up

Replication is not just copying; it is a balancing act between consistency, availability, and latency. Neither sync nor async is the right answer in isolation. Choose based on what can't be lost and what can lag.

Eventual consistency sounds attractive on paper, but most apps actually need read-your-writes. Leaving everything eventual often means a user's just-saved data appears to vanish, so the app layer has to enforce read-your-writes explicitly.
