"Just spin up another region" — multi-region is never that simple. The distance between regions imposes the physical limit of the speed of light, and data replication trades latency against consistency in the choice between synchronous and asynchronous. This guide covers active-passive vs active-active, replication models, failover, split brain, and data residency. The replication mechanics within a region are a separate guide (how-replication-actually-works); the consistency choice under partitions is how-the-cap-theorem-actually-works.
Why Multi-Region
Limits of running a single region:
1. Availability — a whole-region outage (e.g. a major AWS us-east-1
outage) takes everything down
2. Latency — hundreds of ms RTT on every request for users on the
other side of the planet
3. Regulation — GDPR restricts moving EU personal data outside the EU (data residency)
4. Disaster recovery — a backup in the same region can be wiped by the
same disaster
→ Your motivation is one or more of these four.
"Just for speed" isn't enough — which motivation it is changes the
whole design.
Key point: what you want multi-region for decides the topology. If you only need disaster recovery, active-passive is enough; if low latency worldwide is the goal you need active-active, but cost and complexity jump sharply.
Geographic Latency — A Physical Limit
Light travels ~200,000 km/s in fiber (2/3 of vacuum speed).
Round-trip time (RTT) ≈ 2 × distance ÷ speed, plus routing overhead.
Approximate RTTs (measured, larger than the ideal straight line):
Seoul ↔ Tokyo ~30-40 ms
Seoul ↔ Singapore ~70-90 ms
Seoul ↔ US West ~130-160 ms
Seoul ↔ US East ~180-200 ms
Seoul ↔ Europe ~230-280 ms
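To see how far the measured figures sit above the physical floor, the lower bound can be computed directly. This is a minimal sketch: the city coordinates are rough values picked for illustration, Ashburn stands in for "US East", and the function names are hypothetical.

```python
import math

FIBER_SPEED_KM_S = 200_000   # light in fiber, ~2/3 of vacuum speed (figure from the text)
EARTH_RADIUS_KM = 6371.0     # mean Earth radius

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def min_rtt_ms(distance_km):
    """Lower bound on RTT: there and back at fiber speed, zero routing overhead."""
    return 2 * distance_km / FIBER_SPEED_KM_S * 1000

# Approximate coordinates, chosen for illustration only.
SEOUL = (37.57, 126.98)
TOKYO = (35.68, 139.69)
ASHBURN_VA = (39.04, -77.49)   # stand-in for "US East"

for name, place in [("Tokyo", TOKYO), ("US East", ASHBURN_VA)]:
    km = great_circle_km(*SEOUL, *place)
    print(f"Seoul <-> {name}: {km:,.0f} km, ideal RTT >= {min_rtt_ms(km):.0f} ms")
```

The ideal numbers come out well below the measured ones because real fiber does not follow the great circle and every hop adds routing and queuing delay. The gap can shrink with better routing, but the floor itself cannot.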
Implications:
- Synchronous replication between distant regions pays that RTT on
every write
- Seoul-Virginia sync replication = at least ~180 ms added per write
- You cannot tune this away — the speed of light is non-negotiable
Active-Passive (Failover)
One region takes all traffic (active); the other only receives replicated data and waits (passive/standby). On failure, promote the passive to active.
[users]
│ (all)
▼
┌─────────┐ async replication ┌──────────┐
│ Primary │ ────────────────────▶ │ Standby │
│ (us-east) │ │ (us-west) │
└─────────┘ (idle, no traffic) └──────────┘
Pros:
- No write conflicts (only one place takes writes)
- Simple to build and reason about — normally it's basically one region
- Standby can double as a read replica
Cons:
- Standby resources sit idle most of the time (cost)
- If failover isn't automatic, RTO (recovery time) is minutes to tens
of minutes
- With async replication, failover may lose the last few seconds of
writes (RPO > 0)
RTO / RPO — The Two Failover Metrics
RPO (Recovery Point Objective): how much data you can afford to lose
- Sync replication: RPO = 0 (no loss, at a latency cost)
- Async replication (5 s lag): RPO ≈ 5 s worth of writes
RTO (Recovery Time Objective): how long recovery may take
- Manual failover: a human wakes up and promotes → minutes to tens of minutes
- Automatic failover: health check + auto promotion → tens of seconds
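Both metrics can be estimated with back-of-the-envelope arithmetic. The sketch below assumes a simple additive model (detection, then promotion, then DNS propagation bounded by the TTL). The class, field names, and every number in the example are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FailoverPlan:
    replication_lag_s: float   # 0 for synchronous replication
    writes_per_s: float        # average write rate on the primary
    detection_s: float         # health-check interval x failures required
    promotion_s: float         # time to promote the standby
    dns_ttl_s: float           # 0 if routing does not depend on DNS

    def rpo_writes(self) -> float:
        """Writes that may be lost: whatever had not replicated yet."""
        return self.replication_lag_s * self.writes_per_s

    def rto_s(self) -> float:
        """Rough recovery time: detect, promote, then wait out cached DNS answers."""
        return self.detection_s + self.promotion_s + self.dns_ttl_s

# Hypothetical numbers: 5 s async lag, 200 writes/s, automatic failover.
plan = FailoverPlan(replication_lag_s=5, writes_per_s=200,
                    detection_s=30, promotion_s=20, dns_ttl_s=60)
print(f"writes at risk: {plan.rpo_writes():.0f}, estimated RTO: {plan.rto_s():.0f} s")
```

Treat the RTO figure as optimistic: resolvers that ignore the TTL, or a promotion that needs human approval, push it higher.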
These two decide your SLA and cost. "RPO 0, RTO 0" takes synchronous
replication (for RPO 0) plus active-active or automatic failover (for RTO
near 0), all of it expensive. Most setups compromise on both.
Active-Active (Multi-Master)
Every region takes reads and writes simultaneously. Users route to the nearest region → low latency. The price is write conflicts.
[Asia users] [US users]
│ │
▼ ▼
┌─────────┐ bidirectional ┌─────────┐
│ ap-east │ ◀────────────▶ │ us-east │
└─────────┘ replication └─────────┘
both take writes → same row, different values at once?
Pros:
- Every user hits a nearby region (low read & write latency)
- One region failing is absorbed instantly by the rest (RTO ≈ 0)
Cons:
- Need write conflict resolution (see below) — the hardest part
- During replication lag, regions can see different values (eventual)
- Strong consistency requires cross-region consensus = latency explosion
Replication: Sync vs Async
Synchronous:
write → wait until the other region acks → then return success to client
- Strong consistency (RPO 0)
- Every write pays the cross-region RTT (tens to hundreds of ms)
- If the remote region is slow or cut off, writes stall (availability ↓)
Asynchronous:
write → commit locally, return success immediately → replicate in the
background
- Low latency (local speed)
- Replication lag exists → failover may lose the last writes (RPO > 0)
- Local writes continue even if the remote region is slow
Semi-sync:
Wait for at least 1 ack (not all) — a middle ground
Synchronous replication between distant regions is rarely used — the latency wrecks the user experience. Most multi-region setups layer it: synchronous (or quorum) within a region, asynchronous between regions.
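The latency difference between the three modes comes down to which acknowledgements the client waits for. Here is a minimal model of that, assuming acks are requested in parallel and remote apply time is negligible. The function name and the sample RTTs are illustrative, not measurements.

```python
def write_latency_ms(mode, local_commit_ms, remote_rtts_ms):
    """Client-visible write latency under a simplified replication model."""
    if mode == "async":
        # Return as soon as the local commit lands; replicas catch up later.
        return local_commit_ms
    if mode == "semi-sync":
        # Wait for the first remote ack only, i.e. the nearest replica.
        return local_commit_ms + min(remote_rtts_ms)
    if mode == "sync":
        # Wait for every remote ack, so the slowest replica sets the pace.
        return local_commit_ms + max(remote_rtts_ms)
    raise ValueError(f"unknown mode: {mode!r}")

# Primary in Seoul with replicas in Tokyo (~35 ms) and US East (~190 ms).
rtts = [35, 190]
for mode in ("async", "semi-sync", "sync"):
    print(f"{mode:>9}: {write_latency_ms(mode, 2, rtts)} ms")
```

In this model, adding a distant replica costs nothing for async writes, nothing for semi-sync as long as a nearer replica exists, and the full RTT for sync writes.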
Write Conflicts — The Core Active-Active Problem
Two regions modify the same data nearly simultaneously:
ap-east: user.name = "Cheolsu" (t=0 ms)
us-east: user.name = "John" (t=5 ms)
replication lag 100 ms → both believe "I was first" → conflict
Resolution strategies:
1. LWW (Last-Write-Wins): the larger timestamp wins
- Simple but fragile to clock skew + the losing write vanishes silently
2. Conflict avoidance (partition by region): each datum is written by
only one region
- e.g. assign each user a home region → that user's writes go there only
- Eliminates conflicts at the source (the most practical option)
3. CRDT (Conflict-free Replicated Data Type): data structures that merge
mathematically
- Counters, sets, etc. merge automatically (converge regardless of order)
- Hard to apply to arbitrary data
4. Application merge: surface the conflict to the app to resolve (like a
Git merge conflict)
The most common in practice is #2 (partition by region). Routing "this user's / this tenant's writes always to this region" keeps the active-active benefit (a nearby region) while removing conflicts at the source.
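Two of the strategies are small enough to sketch side by side: LWW, including how clock skew makes it pick the wrong winner, and a grow-only counter as the simplest CRDT. This is an illustrative sketch, not a production design. The 20 ms skew, the timestamps, and all names are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Write:
    value: str
    wall_clock_ms: int   # timestamp from the writing region's own clock
    region: str

def lww_merge(a: Write, b: Write) -> Write:
    """Last-write-wins: larger timestamp wins, region name breaks ties."""
    return max(a, b, key=lambda w: (w.wall_clock_ms, w.region))

# True order: ap-east writes at t=1000, us-east writes 5 ms later.
# Assume the us-east clock runs 20 ms slow, so it stamps 1005 - 20 = 985.
cheolsu = Write("Cheolsu", 1000, "ap-east")
john = Write("John", 985, "us-east")
print("LWW keeps:", lww_merge(cheolsu, john).value)  # the later write is dropped silently

def gcounter_merge(a: dict, b: dict) -> dict:
    """G-Counter CRDT: one slot per region, merge by taking the per-slot max."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}

def gcounter_value(counter: dict) -> int:
    return sum(counter.values())

# Each region increments only its own slot, so merges never conflict.
ap = {"ap-east": 3}
us = {"us-east": 2}
merged = gcounter_merge(ap, us)
print("counter value after merge:", gcounter_value(merged))
```

The counter converges because taking a per-slot max is commutative, associative, and idempotent. A free-form field such as `user.name` has no merge with those properties, which is why CRDTs are hard to apply to arbitrary data.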
Failover — DNS · Health Checks
The layer that decides which region traffic goes to:
1. DNS-based (e.g. Route 53, Cloudflare):
- A health check watches the primary
- On death, swap the DNS answer to the standby IP
- Problem: DNS TTL + resolver caches → propagation takes tens of
seconds to minutes
- So you keep TTL short (e.g. 60 s), but too short raises DNS load
2. Anycast (e.g. advertise the same IP from multiple regions):
- BGP routing sends traffic to the nearest live region
- Fast propagation (bypasses DNS cache) — common for CDNs / edge
- Session affinity is hard to guarantee
3. Global Load Balancer (e.g. GCP GLB, AWS Global Accelerator):
- A single anycast entry point + health-based backend routing
- Fast failover, simple ops (but vendor lock-in)
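Whichever layer does the routing, it acts on a health signal, and that signal needs some damping so a single dropped probe does not trigger a failover. One way to sketch this is a tracker that flips state only after several consecutive probes disagree with it. The class name and thresholds are hypothetical, and the probe itself (ideally an application-level check) is left to the caller.

```python
class HealthTracker:
    """Debounced health state: flips only after N consecutive contrary probes."""

    def __init__(self, fail_threshold=3, recover_threshold=2):
        self.fail_threshold = fail_threshold        # failures needed to mark down
        self.recover_threshold = recover_threshold  # successes needed to mark up
        self.healthy = True
        self._streak = 0   # consecutive probes that contradict the current state

    def observe(self, probe_ok: bool) -> bool:
        """Record one probe result and return the (possibly updated) state."""
        if probe_ok == self.healthy:
            self._streak = 0            # probe agrees with state: reset
        else:
            self._streak += 1
            needed = self.fail_threshold if self.healthy else self.recover_threshold
            if self._streak >= needed:
                self.healthy = not self.healthy
                self._streak = 0
        return self.healthy

tracker = HealthTracker()
probes = [True, False, True, False, False, False, True, True]
states = [tracker.observe(p) for p in probes]
print(states)
```

The lone failure at the second probe is ignored, three failures in a row mark the region down, and two successes bring it back. Raising the thresholds reduces flapping but delays detection by roughly one probe interval per extra probe.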
Health check pitfalls:
- Too sensitive → false failover on transient jitter (flapping)
- Too dull → real outages detected late
- "App is alive but only the DB is dead" → need L7 (application) checks
Split-Brain — The Most Dangerous Failure
A network partition makes two regions each judge the other "dead":
[us-east] ──✕── [us-west] (inter-region link cut, both alive)
In active-passive:
- us-west thinks "primary is dead" → promotes itself to primary
- but us-east is happily still taking writes
- → two primaries each take writes → data diverges (split-brain)
- on partition heal, which one is the truth? → conflict / loss
Prevention:
- Quorum-based promotion: a witness/arbiter in a 3rd location votes
→ the side that can't get a majority is forbidden to promote (same
principle as consensus)
- Fencing: forcibly isolate the old primary (STONITH — "Shoot The Other
Node In The Head") — physically block the old primary's writes
- "Two regions alone can't do safe automatic failover" — an even split is
possible. You need at least 3 locations (or an external arbiter) to
decide a majority
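The majority rule behind quorum-based promotion fits in one line, and it makes the two-region problem concrete. A minimal sketch, assuming each location (region or witness) holds exactly one vote. The function name is hypothetical.

```python
def can_promote(votes_reachable: int, total_voters: int) -> bool:
    """A side may hold or take the primary role only with a strict majority.

    votes_reachable counts the caller itself plus every voter it can still reach.
    """
    return votes_reachable > total_voters // 2

# Two regions, link cut: each side sees only itself (1 of 2).
print("2 regions, partitioned:", can_promote(1, 2), can_promote(1, 2))

# Two regions plus a witness, old primary cut off from both others.
print("standby + witness:", can_promote(2, 3))   # may promote
print("isolated primary:", can_promote(1, 3))    # must stop taking writes
```

With two voters neither side ever reaches a majority, so automatic promotion is either unsafe (if allowed) or impossible (if the rule is enforced). With three, exactly one side of any partition can win, and the losing side applying the same rule to itself is what keeps the old primary from continuing to accept writes.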
See how-consensus-actually-works for consensus/quorum details.
Data Residency — Regulatory Constraints
Some data legally must not leave a specific geography:
- GDPR (EU): personal data of people in the EU is restricted from leaving the
EU without adequate protection
- China, Russia, India, etc.: obligations to store citizens' data
domestically (data localization)
- Finance / healthcare: industry-specific rules (e.g. Korean financial
data kept domestically)
Design impact:
- "One global active-active replicating all data everywhere" → may violate
- Fix: pin data to a region — EU user data lives only in EU regions
→ effectively sharding by geography
- Replicate only metadata (non-PII) globally; isolate sensitive data
regionally
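Pinning can be expressed as a small routing policy: each residency maps to a home region, personal data replicates only there, and non-PII metadata may go everywhere. The sketch below is an assumption-laden illustration. The region names, the residency codes, and the rule that unknown residencies are rejected rather than defaulted are all hypothetical choices.

```python
# Hypothetical residency -> home region mapping.
HOME_REGION = {"EU": "eu-west", "KR": "ap-northeast", "US": "us-east"}
ALL_REGIONS = sorted(set(HOME_REGION.values()))

def home_region(residency: str) -> str:
    """Region that owns writes and sensitive data for this residency."""
    try:
        return HOME_REGION[residency]
    except KeyError:
        # Fail closed: never guess where regulated data may live.
        raise ValueError(f"no region policy for residency {residency!r}") from None

def replication_targets(residency: str, is_pii: bool) -> list:
    """Regions a record may be stored in."""
    home = home_region(residency)
    if is_pii:
        return [home]            # sensitive data stays in its home region
    return list(ALL_REGIONS)     # non-PII metadata may replicate globally

print(replication_targets("EU", is_pii=True))
print(replication_targets("EU", is_pii=False))
```

The same mapping doubles as the conflict-avoidance scheme from the write-conflicts section: if every write for a user goes to `home_region(...)`, no two regions ever write the same record.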
→ With data residency, active-active's "all data everywhere" breaks.
Check the regulations first, then pick the topology.
Topology Comparison
| Aspect | Active-Passive | Active-Active |
|---|---|---|
| Write location | One region | Every region |
| Write conflicts | None | Core problem |
| Read latency | High for distant users | All nearby |
| RTO (failover) | Minutes to tens of minutes | ~0 |
| Resource efficiency | Standby idle | All used |
| Complexity | Low | High |
| Fits | Disaster recovery | Global low-latency need |
Common Pitfalls
- Synchronous replication across distant regions — every write pays the cross-region RTT, exploding perceived latency. Layer it: sync within a region, async between.
- Automatic failover with only 2 regions — a partition gives an even split → split-brain. Safe auto promotion is impossible without an external arbiter/witness (a 3rd location).
- Assuming DNS failover is instant — TTL + resolver caches mean propagation takes tens of seconds to minutes. Include DNS propagation time in your RTO math.
- Trusting LWW as conflict resolution — with clock skew the wrong write wins and the loser vanishes silently. Partition by region is safer.
- Never actually testing failover — an untested "it'll work" rarely holds up during a real outage. Regular game days (deliberate region isolation drills) are mandatory.
- Ignoring data residency — global replication may violate regulations. Check first whether data must be pinned to a region.
- Not verifying the standby actually works — replication runs but promotion is blocked or capacity is short, so failover fails. Use the standby as a read replica at least, to keep it verified.
Wrap-up
The first question of multi-region is "why" — disaster recovery, latency, or regulation decides the topology. For disaster recovery, active-passive is enough; only when global low latency is the goal does active-active's complexity (write conflicts, split-brain) become worth it.
Physical limits (the speed of light) are non-negotiable, so practical designs split the work: sync within a region, async between regions, and, when active-active is needed, data partitioned by region to remove conflicts at the source. Failover also has to be tested for real: an untested failover is not a failover.