Database Replication Explained
Picture your whole app running on one database server. It holds every user, every order, every message. So far so good, right?
- That one machine is doing all the work, reads and writes both.
- Now imagine it crashes, or its disk dies, or it just gets too busy.
- Your app goes down with it. There's no backup machine ready to take over.
That's a scary place to be. In this lesson we'll fix it by keeping copies of your data on more than one server. Let's see how.
The Problem
Let's name the pain first, because that's what makes replication click.
- With one database server, that server is a single point of failure. A single point of failure is any one piece that, if it breaks, takes the whole system down with it.
- So if that machine dies, your app can't read or write anything. You're stuck until someone fixes the box.
- And even when it's healthy, one server can only handle so many requests. Most apps read data way more than they write it, so reads pile up and the server slows to a crawl.
So we've got two problems sitting on one machine: it can crash, and it can get overloaded. Replication helps with both.
What is Replication
Okay so here's the idea in plain words.
- Replication means keeping copies of your data on multiple database servers. Instead of one server holding everything, you have several, and they all carry the same data.
- Each copy lives on its own server. That copy is called a replica. A replica is just another database server that holds the same data as the main one.
- The servers stay in sync. When data changes on one, that change gets sent over to the others so they match.
Think of it like a teacher writing notes on the main whiteboard, and an assistant copying those same notes onto a second board in the next room. Two boards, same notes. If one room is full, students can read from the other.
How It Works
In the most common setup, the servers aren't all equal. One is in charge, and the rest follow along.
- One server is the primary. The primary is the server that takes all the writes, meaning anything that changes data: inserts, updates, deletes. (You'll also hear it called the leader or master.)
- The others are replicas. A replica copies whatever the primary does, so it ends up holding the same data. (You'll also hear these called followers or read replicas.)
- Every time the primary changes something, it sends that change down to the replicas, and they apply the same change to their own copy.
So writes go to one place, but the data spreads out to many. Here's the write side, where the primary copies its changes to every replica.
And here's the clever part for reads. Since every replica holds the same data, your app can read from any of them. That spreads the read load across many machines instead of overloading one.
So the primary handles the writes, and a whole pool of replicas shares the reads. That's the heart of replication.
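If it helps to see that flow as code, here's a tiny in-memory Python sketch. It is a toy model, not how any real database does it: the `Primary`, `Replica`, and `Router` classes and the round-robin read policy are all assumptions made up for this illustration.

```python
import itertools


class Replica:
    """A read-only copy that applies changes sent by the primary."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)


class Primary:
    """Takes every write, then forwards the change to each replica."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas

    def write(self, key, value):
        self.data[key] = value
        for replica in self.replicas:
            replica.apply(key, value)


class Router:
    """Sends writes to the primary and rotates reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._next_replica = itertools.cycle(replicas)

    def write(self, key, value):
        self.primary.write(key, value)

    def read(self, key):
        replica = next(self._next_replica)
        return replica.name, replica.read(key)


replicas = [Replica("replica-1"), Replica("replica-2")]
router = Router(Primary(replicas), replicas)

router.write("user:1", "Alex")
first = router.read("user:1")   # ('replica-1', 'Alex')
second = router.read("user:1")  # ('replica-2', 'Alex')
```

One write went to the primary, and two reads were answered by two different replicas. Real systems usually do this routing in a driver, proxy, or load balancer rather than in application code.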
Sync vs Async Replication
Now there's an important choice hiding in there. When the primary gets a write, does it wait for the replicas to confirm they got the copy, or does it just move on? That choice has a name.
- Synchronous (sync) replication means the primary waits. It doesn't tell your app "done" until the replicas confirm they've saved the change too. Safer, because every copy is up to date, but slower, because you're waiting on those extra confirmations.
- Asynchronous (async) replication means the primary doesn't wait. It saves the write, tells your app "done" right away, and sends the copy to the replicas in the background. Faster, but the replicas can fall a little behind.
Here's the trade-off side by side.
| Aspect | Synchronous | Asynchronous |
|---|---|---|
| Does the primary wait? | Yes, for replica confirmation | No, replies right away |
| Write speed | Slower | Faster |
| Are replicas always current? | Yes | Can lag behind a bit |
| Risk of losing a write if primary dies | Very low | Possible |
| Good for | Money, orders, anything you can't lose | Feeds, analytics, most read-heavy apps |
Which one do most apps use?
Async is the common default, because speed matters and a tiny delay on the replicas is usually fine. Sync gets pulled in for the data you absolutely cannot afford to lose, like a bank balance. Many real systems even mix both.
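To make the sync vs async difference concrete, here's a minimal Python sketch with one primary and one replica. Everything in it is an assumption for illustration: the class names, the `synchronous` flag, and `ship_pending`, which stands in for whatever background process a real database uses to send changes.

```python
from collections import deque


class Replica:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Primary:
    def __init__(self, replica, synchronous):
        self.data = {}
        self.replica = replica
        self.synchronous = synchronous
        self.pending = deque()  # changes not yet sent (async mode only)

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Sync: the replica saves the change before we report "done".
            self.replica.apply(key, value)
        else:
            # Async: queue the change and report "done" right away.
            self.pending.append((key, value))
        return "done"

    def ship_pending(self):
        """Stand-in for the background process that sends queued changes."""
        while self.pending:
            self.replica.apply(*self.pending.popleft())


# Sync: the replica already has the value when write() returns.
sync_replica = Replica()
sync_primary = Primary(sync_replica, synchronous=True)
sync_primary.write("balance", 100)
sync_seen = sync_replica.data.get("balance")           # 100

# Async: the replica is behind until the queued change is shipped.
async_replica = Replica()
async_primary = Primary(async_replica, synchronous=False)
async_primary.write("balance", 100)
async_seen_before = async_replica.data.get("balance")  # None
async_primary.ship_pending()
async_seen_after = async_replica.data.get("balance")   # 100
```

Notice the gap in the async case. If the primary died before `ship_pending` ran, the queued write would be gone, which is the "possible" write loss from the table.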
⥠Why Replicate
So why go through all this trouble? Replication buys you a few big things at once.
- High availability. If the primary dies, a replica can be promoted to become the new primary, so your app keeps running. This switch-over is called failover.
- Read scaling. Spread reads across many replicas, so no single server gets crushed by traffic.
- Durability. Your data lives on several machines, so one dead disk doesn't wipe it out. (Durability just means your data stays around even when hardware fails.)
- Backups without slowing things down. You can take backups off a replica instead of the busy primary, so users don't feel a hit.
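Failover is the easiest of these to picture in code. Here's a rough Python sketch of the promotion step. The `Server` class, the `applied_changes` counter, and the rule "promote the most caught-up healthy replica" are assumptions for this example; real databases and their failover tools each have their own election logic.

```python
class Server:
    def __init__(self, name, role):
        self.name = name
        self.role = role            # "primary", "replica", or "failed"
        self.healthy = True
        self.applied_changes = 0    # how many changes this server has applied


def fail_over(servers):
    """Return the current primary, promoting a replica if the primary is down."""
    primary = next(s for s in servers if s.role == "primary")
    if primary.healthy:
        return primary

    candidates = [s for s in servers if s.role == "replica" and s.healthy]
    if not candidates:
        raise RuntimeError("no healthy replica to promote")

    # Assumed policy: pick the replica that has applied the most changes,
    # so as little data as possible is lost in the switch-over.
    new_primary = max(candidates, key=lambda s: s.applied_changes)
    primary.role = "failed"
    new_primary.role = "primary"
    return new_primary


cluster = [
    Server("db-1", "primary"),
    Server("db-2", "replica"),
    Server("db-3", "replica"),
]
cluster[1].applied_changes = 40
cluster[2].applied_changes = 42

cluster[0].healthy = False      # the primary dies
promoted = fail_over(cluster)
print(promoted.name)            # db-3
```

After the switch, the app's writes need to point at the promoted server, and the old primary has to rejoin as a replica once it's fixed.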
The Catch: Replication Lag
Replication isn't free, though. There's one gotcha you really need to know.
- Replication lag is the small delay between a write hitting the primary and that change showing up on the replicas. The replicas can be a tiny bit behind.
- So if you write something to the primary and then immediately read it from a replica, you might get the old value. That out-of-date value is called stale data, meaning data that's slightly behind the latest version.
- This is the classic "I posted a comment but it's not showing up yet" feeling. The write landed on the primary, but the replica you read from hasn't caught up.
This ties into a bigger idea called eventual consistency. Eventual consistency means the replicas will all match the primary eventually, just not the very instant a write happens. For most apps a short lag is totally fine. For things like a bank balance, it isn't, and that's where you lean on sync replication or read straight from the primary.
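One common way to "read straight from the primary" only when it matters is to send a user's reads to the primary for a short window after that user writes, and to a replica otherwise. Here's a small Python sketch of that routing rule. The `LagAwareRouter` name, the five-second window, and passing `now` in by hand are all assumptions chosen to keep the example simple and predictable.

```python
class LagAwareRouter:
    """Routes a user's reads to the primary shortly after that user writes."""

    def __init__(self, pin_seconds):
        self.pin_seconds = pin_seconds
        self.last_write_at = {}  # user_id -> time of that user's last write

    def record_write(self, user_id, now):
        self.last_write_at[user_id] = now

    def choose_server(self, user_id, now):
        last = self.last_write_at.get(user_id)
        if last is not None and now - last < self.pin_seconds:
            # The replicas may not have this user's write yet.
            return "primary"
        return "replica"


router = LagAwareRouter(pin_seconds=5.0)
router.record_write("alex", now=100.0)

right_after = router.choose_server("alex", now=101.0)  # "primary"
later = router.choose_server("alex", now=110.0)        # "replica"
other_user = router.choose_server("sam", now=101.0)    # "replica"
```

Alex sees the new comment right away because the read goes to the primary, while everyone else keeps reading from replicas. The window only helps if it's longer than the lag you actually see, so treat the number as something to measure, not guess.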
Common Mistakes and Misconceptions
A few things trip people up here. Let's clear them out.
- "Replication is the same as a backup." No. Replication copies changes live, so if you delete a row by mistake, that delete copies to every replica instantly. A backup is a snapshot from a point in time that you can restore. You need both.
- "Replicas are always perfectly up to date." Not with async replication. There's almost always a little lag, so a replica read can return stale data.
- "I can write to any replica." Usually no. In the standard primary-replica setup, writes only go to the primary. Replicas are read-only copies. Writing to a replica leads to conflicts and data that doesn't match.
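The backup point is worth seeing once in code. In this toy Python sketch, plain dictionaries stand in for the primary, a replica, and a backup snapshot. The keys and the `replicate_delete` helper are made up for the example.

```python
import copy

primary = {"order:1": "paid", "order:2": "shipped"}
replica = dict(primary)           # the replica mirrors the primary

backup = copy.deepcopy(primary)   # point-in-time snapshot, taken before the mistake


def replicate_delete(key):
    """A delete on the primary is faithfully copied to the replica."""
    primary.pop(key, None)
    replica.pop(key, None)


replicate_delete("order:2")       # oops, deleted by mistake

in_replica = "order:2" in replica   # False: the replica copied the mistake
in_backup = "order:2" in backup     # True: the snapshot still has the row

primary["order:2"] = backup["order:2"]   # restore the row from the backup
```

The replica did its job perfectly, and that's exactly the problem: it copies mistakes as faithfully as it copies good changes.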
Design Challenge
Try this one yourself to test your thinking.
Imagine Alex runs a social media app. The feed gets read millions of times a day, but people post far less often. Sketch out how you'd use replication here.
- Where would writes (new posts) go?
- How would you handle the huge number of feed reads?
- Would a small replication lag on the feed be okay? What about on a user's account balance for paid features?
Think through where stale data is fine and where it isn't. That's exactly the kind of reasoning interviewers want to see.
What You've Learned
You can now explain how databases stay alive and fast under load. Here's what you've picked up.
- Replication keeps copies of your data on multiple servers, so one machine isn't a single point of failure.
- A primary takes the writes and copies changes to read-only replicas.
- Reads spread across replicas, which scales read-heavy apps.
- Sync replication waits for replicas (safe, slower); async doesn't wait (fast, can lag).
- Replication lag means a replica read can return stale data, which ties into eventual consistency.
- Replication is not a backup, and you normally don't write to replicas.
Check Your Knowledge
Test what you learned. Answer each question yourself, then compare with the explanation.
1. What is database replication?
   Why: Replication keeps the same data on several servers so one machine is not a single point of failure.
2. What is the difference between synchronous and asynchronous replication?
   Why: Sync waits for confirmation so copies stay current, while async is faster but lets replicas fall behind.
3. What can replication lag cause?
   Why: During the small delay a replica is slightly behind, so a read can return data that is not the latest.
4. Is replication a substitute for backups?
   Why: An accidental delete copies to every replica instantly, so you still need point-in-time backups.
What's Next?
Replication is the foundation for scaling databases. Next, we'll zoom into how teams actually put it to work.
- Read Replicas digs into routing reads to replicas and handling lag in real apps.
- Database Sharding shows how to scale writes by splitting data across servers, the next step beyond replication.
Get these two down and you'll have a solid story for any "how would you scale the database" interview question.