Master-Slave Replication


Think about an app like Twitter or a news site. Here’s the thing about apps like that:

For every one person posting something, thousands of people are just reading it.
So the app is doing way more reads than writes. Like, a lot more.
One database trying to serve all those reads on its own will start to choke.

So how do the big apps handle this? How do they let millions of people read at once without the database falling over? That’s exactly what master-slave replication is for, and by the end of this lesson you’ll be able to explain it like you’ve built it.

🎯 The Problem

Let’s start with the pain, because that’s what makes the solution click. Picture a single database doing everything:

Every write (someone posting a tweet) goes to it.
Every read (someone scrolling their feed) goes to it too.
As your app grows, the reads pile up fast, since most users just look around without posting.

Now here’s where it hurts:

That one database has only so much CPU, memory, and disk to go around.
When reads flood in, the database gets swamped, and everything slows down, even the writes.
If that one machine dies, your whole app goes down with it. There’s no backup ready to take over.

So we’ve got two problems really: one machine can’t keep up with all the reads, and one machine is a single point of failure. We need a way to spread the reads out, and to have a spare ready. That’s the idea we’re building toward.

🧩 What is Master-Slave Replication

Okay so here’s the plan in plain words. Instead of one database doing everything, we use a small team of databases:

One database is the master. It’s the boss, and it handles all the writes. Every insert, update, or delete goes here.
One or more databases are the slaves. Each slave keeps a copy of the master’s data, and they handle the reads.
The master keeps the slaves up to date by sending them every change it makes. Copying changes from one database to another like this is called replication.

So in one line: the master is the single source of truth that takes writes, and the slaves are read-only copies that take reads. Same data, just spread across more machines so no single one gets overwhelmed.

Why the split makes sense

Most apps read far more than they write. So it pays to throw extra machines at the reads. Writes still go to one place, which keeps the data consistent, while reads get spread across many copies. That’s the whole trick.

⚙️ How It Works

Let’s walk through what actually happens when your app talks to this setup. There are really two paths, one for writing and one for reading.

When your app needs to write something:

The write always goes to the master. Only the master.
The master saves the change to its own data first.
Then the master sends that change out to every slave, so they update their copies too.
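Here’s that write path as a tiny Python sketch. It’s a toy in-memory model, not how any real database does replication. `Master`, `Slave`, and the `apply` method are made-up names, and the data lives in plain dicts. The master saves each change first, then pushes it to every slave, and slaves refuse writes sent to them directly.

```python
class Slave:
    """A read-only copy that only changes when the master sends it an update."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def apply(self, key, value):
        # Called by the master to replay a change it has already made.
        self.data[key] = value

    def write(self, key, value):
        # Slaves are read-only: clients must not write to them directly.
        raise PermissionError(f"{self.name} is read-only; send writes to the master")

    def read(self, key):
        return self.data.get(key)


class Master:
    """The single writer and source of truth."""

    def __init__(self, slaves):
        self.data = {}
        self.slaves = slaves

    def write(self, key, value):
        # Step 1: save the change to the master's own data first.
        self.data[key] = value
        # Step 2: send the same change out to every slave.
        for slave in self.slaves:
            slave.apply(key, value)


slaves = [Slave("slave-1"), Slave("slave-2")]
master = Master(slaves)
master.write("post:1", "Hello, world")
print([s.read("post:1") for s in slaves])  # ['Hello, world', 'Hello, world']
```

This toy makes one big simplification: it pushes each change to the slaves instantly, inside the same `write()` call. Real databases often send changes asynchronously, and that delay is where replication lag comes from.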

When your app needs to read something:

The read goes to one of the slaves, not the master.
The slave answers from its copy of the data.
With several slaves around, you can spread reads across all of them, so no single slave gets buried either.
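The two paths together boil down to a small routing rule, sketched below. `Node` is a hypothetical stand-in for a database connection that just records which queries it handled. Round-robin is one simple way to spread reads and is an assumption here. Real setups might use a load balancer or route to the least busy slave instead.

```python
import itertools


class Node:
    """Hypothetical stand-in for a database connection; records what it handled."""

    def __init__(self, name):
        self.name = name
        self.handled = []

    def execute(self, query):
        self.handled.append(query)
        return self.name


class Router:
    """Sends writes to the master and spreads reads across slaves in turn."""

    def __init__(self, master, slaves):
        self.master = master
        self._next_slave = itertools.cycle(slaves)

    def write(self, query):
        # Writes always go to the master. Only the master.
        return self.master.execute(query)

    def read(self, query):
        # Each read goes to the next slave in line, so no single slave gets buried.
        return next(self._next_slave).execute(query)


master = Node("master")
slaves = [Node("slave-1"), Node("slave-2"), Node("slave-3")]
router = Router(master, slaves)

router.write("INSERT INTO posts VALUES (1, 'hi')")
print([router.read("SELECT * FROM posts") for _ in range(4)])
# ['slave-1', 'slave-2', 'slave-3', 'slave-1']
```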

Here’s the whole flow in one picture. Writes come in on top and flow down to the slaves; reads come in at the bottom and get spread across them.

So the master is the one writer, and the slaves are a fleet of readers all kept in sync from above. Here’s the same split laid out side by side.

Role	Handles	How many	In plain words
Master	All writes (insert, update, delete)	Just one	The boss and source of truth
Slave	Reads only	One or many	Read-only copies of the master

⚡ Why It Helps

So why go through all this trouble? Because it solves the exact problems we started with. Here’s what you get:

Read scaling. Need to handle more reads? Just add another slave. Each new slave is more read power, so your app can serve a lot more users. Adding machines to share the load like this is called scaling out.
The master stays free. Since reads go to the slaves, the master isn’t getting swamped by people scrolling around. It can focus on doing writes well.
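Live copies of your data. Each slave is a full, up-to-date copy, so if the master’s machine fails, your data still exists elsewhere and is ready to use. One caution: a slave is not a substitute for real backups, because a mistake like an accidental delete on the master gets copied to every slave too.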
A spare ready to take over. Because a slave holds the same data, it can step up and become the new master if the master ever dies. We’ll look at that next.

So you get speed and safety at the same time. More machines reading, and copies always on standby.
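Read scaling is mostly simple arithmetic. Suppose every slave can handle roughly the same number of reads per second. Then the number of slaves you need is the read load divided by one slave’s capacity, rounded up. The numbers below are made up for illustration.

```python
import math


def slaves_needed(reads_per_sec, reads_per_slave):
    """Smallest number of slaves whose combined read capacity covers the load."""
    return math.ceil(reads_per_sec / reads_per_slave)


# Hypothetical capacity: each slave comfortably serves 5,000 reads/sec.
print(slaves_needed(12_000, 5_000))  # 3
print(slaves_needed(24_000, 5_000))  # 5 (traffic doubled)
```

In practice you’d leave some headroom, so that losing one slave doesn’t push the others past their limit.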

🔄 Failover

Now what happens if the master suddenly dies? A crash, a bad disk, whatever. This is where the slaves really earn their keep.

When the master goes down, writes have nowhere to go, since only the master takes writes.
So the system picks one of the slaves and turns it into the new master. Turning a slave into the new master is called promotion.
The promoted slave now starts accepting writes, and the app points its writes there instead.
This whole process of recovering from a dead master by promoting a slave is called failover.
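Which slave gets promoted? One reasonable rule is to choose the live slave that has replayed the most of the master’s changes, because it’s missing the fewest writes. That rule is assumed here, not taken from any particular database. `applied_position` is a hypothetical counter for how far through the master’s stream of changes each slave has gotten.

```python
from dataclasses import dataclass


@dataclass
class Slave:
    name: str
    applied_position: int  # hypothetical: how many master changes this slave has replayed
    alive: bool = True
    role: str = "slave"


def promote_new_master(slaves):
    """Pick the most caught-up live slave and promote it to master."""
    candidates = [s for s in slaves if s.alive]
    if not candidates:
        raise RuntimeError("no live slave available to promote")
    # The slave that has replayed the most changes loses the fewest writes.
    new_master = max(candidates, key=lambda s: s.applied_position)
    new_master.role = "master"
    return new_master


slaves = [
    Slave("slave-1", applied_position=1040),
    Slave("slave-2", applied_position=1042),
    Slave("slave-3", applied_position=1043, alive=False),  # down too, so skipped
]
leader = promote_new_master(slaves)
print(leader.name, leader.role)  # slave-2 master
```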

So failover is your safety net. Instead of the app being down until someone fixes the old master, a slave steps up and keeps things running.

Failover can be automatic

In serious setups, a watcher process keeps an eye on the master. If the master stops responding for longer than a set timeout, the watcher promotes a slave on its own, no human needed. That’s how big apps can recover in seconds instead of hours.
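Here’s a minimal sketch of such a watcher. It assumes the master sends a regular “I’m alive” heartbeat. If none arrives for longer than `timeout` seconds, the watcher calls a `promote` callback once. The class and parameter names are hypothetical. The current time is passed in as `now` so the example behaves the same on every run.

```python
class FailoverWatcher:
    """Promotes a slave if the master has been silent for longer than `timeout` seconds."""

    def __init__(self, timeout, promote):
        self.timeout = timeout
        self.promote = promote  # callback that performs the actual promotion
        self.last_heartbeat = None
        self.failed_over = False

    def heartbeat(self, now):
        # The master (or a health check against it) reports "still alive" at time `now`.
        self.last_heartbeat = now

    def check(self, now):
        # Called periodically; returns True only on the check that triggers failover.
        if self.failed_over or self.last_heartbeat is None:
            return False
        if now - self.last_heartbeat > self.timeout:
            self.failed_over = True
            self.promote()
            return True
        return False


events = []
watcher = FailoverWatcher(timeout=5, promote=lambda: events.append("promoted slave-2"))
watcher.heartbeat(now=0)
print(watcher.check(now=3))  # False: master heard from 3s ago
print(watcher.check(now=7))  # True: 7s of silence is past the 5s timeout
print(watcher.check(now=9))  # False: already failed over once
print(events)                # ['promoted slave-2']
```

Real watchers usually add safeguards. A common one is requiring several watchers to agree the master is down before promoting, so a single flaky network link doesn’t cause a needless failover.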

🏷️ A Note on Naming

Quick heads-up on the words here, because this matters in interviews. The terms “master” and “slave” are the old names. Here’s the deal:

The industry has largely moved away from “master-slave” because of the history of slavery those words call up.
The same exact idea is now usually called leader-follower, or sometimes primary-replica.
The leader (or primary) is the one that takes writes. The followers (or replicas) get copies and take reads. Same setup, friendlier names.

So if you hear “leader-follower” or “primary-replica” in a real codebase or an interview, don’t get confused. It’s the same thing you just learned. We dig into the modern framing in the Leader-Follower Architecture lesson.

⚠️ The Catches

This is a great pattern, but it’s not free magic. There are two catches you really need to know, because interviewers love to poke at them.

First, replication lag:

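Copying changes from the master to the slaves takes a bit of time. It’s usually a fraction of a second, but it can stretch to seconds or more when a slave is overloaded or the network is slow. That delay is called replication lag.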
During that gap, a slave might still have the old data while the master has the new data.
So a read from a slave can be slightly out of date. Reading old data like this is called a stale read.
Real example: Alex posts a comment, then refreshes the page. If the read hits a slave that hasn’t caught up yet, Alex might not see the comment for a second. Annoying, right?
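Alex’s stale read can be reproduced with a toy model. Here the slave buffers incoming changes in a queue and only applies them when `catch_up()` runs. The gap between `receive()` and `catch_up()` stands in for replication lag. `LaggyReplica` and its methods are made-up names for illustration.

```python
from collections import deque


class LaggyReplica:
    """A slave that receives changes asynchronously and applies them later."""

    def __init__(self):
        self.data = {}
        self.pending = deque()  # changes sent by the master but not applied yet

    def receive(self, key, value):
        self.pending.append((key, value))

    def catch_up(self):
        # Apply everything that has arrived, oldest first.
        while self.pending:
            key, value = self.pending.popleft()
            self.data[key] = value

    def read(self, key):
        return self.data.get(key)


master_data = {}
replica = LaggyReplica()

# Alex posts a comment: the master saves it and ships the change off.
master_data["comment:42"] = "Great post!"
replica.receive("comment:42", "Great post!")

# Alex refreshes before the replica has caught up: a stale read.
print(replica.read("comment:42"))  # None
replica.catch_up()
print(replica.read("comment:42"))  # Great post!
```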

Second, writes don’t scale this way:

Every write still goes through the one master. Always.
So no matter how many slaves you add, your write capacity doesn’t go up at all.
This setup scales reads beautifully, but it does nothing for writes. If writes are your bottleneck, you need a different approach (like sharding, which is its own topic).
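Why doesn’t adding slaves help writes? Because a write isn’t split up, it’s copied. The master has to accept every write, and every slave has to replay every write to stay in sync. The rough model below assumes reads are spread evenly and uses hypothetical traffic numbers at the same hundred-to-one ratio as the blog example. It shows one number shrinking while the other two stay put.

```python
def per_node_load(reads_per_sec, writes_per_sec, num_slaves):
    """Rough per-machine load when reads are spread evenly over the slaves."""
    return {
        # Reads are divided among the slaves, so each share shrinks as you add more.
        "reads_per_slave": reads_per_sec / num_slaves,
        # Every write is accepted by the single master...
        "writes_on_master": writes_per_sec,
        # ...and replayed on every slave, so no copy gets a smaller share of writes.
        "writes_replayed_per_slave": writes_per_sec,
    }


# Hypothetical traffic: 40,000 reads/sec and 400 writes/sec.
for n in (2, 4, 8):
    print(n, per_node_load(reads_per_sec=40_000, writes_per_sec=400, num_slaves=n))
# reads_per_slave halves each time n doubles; both write numbers never move.
```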

So remember the trade: you scale reads and you get a safety net, but the master is still a single writer with a little lag downstream.

⚠️ Common Mistakes and Misconceptions

A few things trip people up here. Let’s clear them out so you don’t say them in an interview.

“Slaves can take writes too.” No. Slaves are read-only. All writes go to the master, full stop. A slave only ever starts taking writes if it gets promoted to master during failover.
“Reads from slaves are always fresh.” Not guaranteed. Because of replication lag, a slave might serve slightly old data. If you need the very latest, you read from the master.
“This scales my writes.” It doesn’t. Adding slaves adds read capacity only. The single master is still the one and only writer, so write throughput stays the same.
“More slaves means I’m fully safe.” Slaves help a lot, but if the master dies you still need failover to pick a new one. And the lag means the promoted slave might be missing the very last few writes.
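A common fix for Alex’s problem follows the rule above directly: if you need the very latest, read from the master. This is often called read-your-own-writes. The sketch below applies that rule only to the user who just wrote, and only for a short window, so the master doesn’t get swamped by reads again. The class name and the 5-second window are assumptions. In practice the window should be longer than the lag you actually see.

```python
class ReadYourWritesRouter:
    """Sends a user's reads to the master for a short window after that user writes."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.last_write_at = {}  # user id -> time of that user's most recent write

    def route_write(self, user, now):
        self.last_write_at[user] = now
        return "master"

    def route_read(self, user, now):
        wrote_at = self.last_write_at.get(user)
        # Right after your own write, a slave might not have caught up yet,
        # so read from the master, which always has the latest data.
        if wrote_at is not None and now - wrote_at < self.window:
            return "master"
        return "slave"


router = ReadYourWritesRouter(window_seconds=5)
router.route_write("alex", now=100)
print(router.route_read("alex", now=101))  # master: Alex just posted
print(router.route_read("sam", now=101))   # slave: Sam didn't write anything
print(router.route_read("alex", now=110))  # slave: the window has passed
```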

🛠️ Design Challenge

Try this one on your own to test yourself.

You’re designing the backend for a blog platform. Reads (people viewing posts) outnumber writes (authors publishing) by about a hundred to one. Sketch out a master-slave setup and answer these.

Where do writes go, and where do reads go?

If traffic doubles because of a viral post, what do you add, and does it help reads or writes?

An author publishes a post and immediately refreshes but doesn’t see it. What’s happening, and how could you fix it?

🧩 What You’ve Learned

You can now explain how big apps handle a flood of reads. Here’s what you’ve picked up.

✅ Master-slave replication uses one master for all writes and one or more slaves for reads.
✅ The master copies every change out to the slaves to keep them in sync.
✅ It scales reads by adding more slaves, and the master stays free of read load.
✅ Failover promotes a slave to be the new master when the old master dies.
✅ Replication lag means slave reads can be stale for a short moment.
✅ Writes don’t scale this way, since they all still go through the single master.
✅ The modern names for the same idea are leader-follower and primary-replica.

🚀 What’s Next?

You’ve got the foundation of read scaling now. Next, go deeper into the modern framing and the practical side.

Leader-Follower Architecture revisits this same idea with the naming the industry actually uses today.
Read Replicas zooms in on the slaves themselves: how you route reads to them and how you handle the lag in practice.

Once you’ve got those, you’ll be ready to talk about scaling databases with real confidence.
