Vertical Scaling vs Horizontal Scaling
Let’s say Alex built a small app and put it on one server. Here’s how things went:
- A few hundred people used it, and the server handled them easily.
- Then word got around, and suddenly thousands of people showed up.
- Now the server is sweating. Pages load slowly, and sometimes it just freezes.
So Alex has a problem. The app is too popular for one machine. There are really only two ways out of this, and that’s what this whole lesson is about:
- Make that one server bigger and stronger.
- Or add more servers and split the work between them.
That’s it. Those two ideas have names, vertical scaling and horizontal scaling, and almost every “how do we handle more users” question comes down to picking between them. Let’s go through both.
🎯 The Problem: One Server Hits Its Limit
Before we talk solutions, let’s be clear about the actual pain. A single server is just a computer, and every computer has limits:
- It has a fixed amount of CPU, which is the part that does the thinking and calculating.
- It has a fixed amount of RAM, which is the short-term memory it uses to juggle many tasks at once. (RAM is short for Random Access Memory.)
- It can only handle so many requests at the same time before things slow down.
Here’s the thing. As more users show up, the server fills up:
- Requests start waiting in line because the CPU is busy.
- The RAM gets full, so the machine slows to a crawl.
- Eventually it can’t keep up, and users see errors or spinning loaders.
When a server is doing as much as it possibly can, we say it’s maxed out. So the real question becomes: how do we give the app more power to handle the crowd? That’s where scaling comes in. Scaling just means making your system able to handle more load.
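To see what “maxed out” looks like in numbers, here’s a tiny sketch. It assumes a made-up server that can finish 100 requests per second, and the function name `simulate_backlog` is just for illustration. The point is only this: once requests arrive faster than the server can clear them, the waiting line never stops growing.

```python
def simulate_backlog(arrivals_per_sec, capacity_per_sec, seconds):
    """Return how many requests are still waiting in line after each second."""
    waiting = 0
    history = []
    for _ in range(seconds):
        waiting += arrivals_per_sec                # new requests join the line
        waiting -= min(waiting, capacity_per_sec)  # the server clears what it can
        history.append(waiting)
    return history


# Hypothetical numbers: the server can finish 100 requests per second.
light_load = simulate_backlog(arrivals_per_sec=80, capacity_per_sec=100, seconds=5)
heavy_load = simulate_backlog(arrivals_per_sec=130, capacity_per_sec=100, seconds=5)

print(light_load)  # [0, 0, 0, 0, 0]
print(heavy_load)  # [30, 60, 90, 120, 150]
```

Under the light load the line stays empty. Under the heavy load it grows by 30 every second, which is what users experience as slow pages and spinning loaders.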
⬆️ Vertical Scaling (Scale Up)
The first option is the simple one. If the server is too weak, give it more muscle.
- Vertical scaling means adding more power to the same machine.
- You add more CPU, more RAM, faster disks, to the one server you already have.
- People also call this scale up, because you’re making one box taller and stronger.
Think of it like a single delivery van that’s too small. Vertical scaling is swapping it for a bigger truck. Same one vehicle, just more capacity.
So why do people love this option? Because it’s the least painful path:
- It’s simple. You’re not changing how the app works, you just give it a stronger machine.
- No code changes. The app doesn’t even know it’s running on better hardware. It just runs faster.
- Nothing new to manage. It’s still one server, so there’s nothing tricky to coordinate.
But there’s a catch, actually a few of them. Vertical scaling runs into walls fast:
- There’s a hard limit. A machine can only get so big. At some point you can’t add any more CPU or RAM, and that’s the ceiling.
- It gets expensive at the top. The most powerful machines cost way more than their size suggests. Doubling the power often more than doubles the price.
- It’s a single point of failure. A single point of failure is one part that, if it breaks, takes the whole system down. With one server, if that machine dies, your app is just gone.
- It usually needs downtime. To swap in bigger hardware, you often have to turn the server off for a bit. Downtime means the app is unavailable during that window, and users can’t reach it.
One server means one point of failure
No matter how powerful you make a single machine, it’s still just one machine. If it crashes, loses power, or needs a reboot, your whole app goes down with it. That risk doesn’t shrink as the server gets bigger. It stays exactly the same.
↔️ Horizontal Scaling (Scale Out)
The second option takes a totally different angle. Instead of one strong machine, use many ordinary ones working together.
- Horizontal scaling means adding more machines and sharing the load between them.
- Instead of one server handling everything, you run several, and each one handles a slice of the traffic.
- People call this scale out, because you’re spreading out wide instead of growing tall.
Now there’s a question right away: if there are many servers, how does a user’s request know which one to go to? That’s the job of a load balancer:
- A load balancer is a piece that sits in front of your servers and hands each incoming request to one of them.
- It spreads the traffic around so no single server gets overwhelmed.
- To the user, it still looks like one website. They never know there are many machines behind it.
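Here’s a minimal sketch of the simplest way a load balancer could hand out requests, called round robin: it goes down the list of servers and starts over at the end. The class name and server names are made up for illustration, and real load balancers offer more strategies than this one.

```python
import itertools


class RoundRobinBalancer:
    """Hands each incoming request to the next server in the list, in a loop."""

    def __init__(self, servers):
        # itertools.cycle repeats the list forever: 1, 2, 3, 1, 2, 3, ...
        self._cycle = itertools.cycle(servers)

    def pick(self):
        """Return the server that should handle the next request."""
        return next(self._cycle)


balancer = RoundRobinBalancer(["server-1", "server-2", "server-3"])
assigned = [balancer.pick() for _ in range(6)]
print(assigned)
# ['server-1', 'server-2', 'server-3', 'server-1', 'server-2', 'server-3']
```

Six requests come in, and each server ends up with two of them. No single machine takes the whole crowd.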
Going back to the van example, horizontal scaling isn’t a bigger truck. It’s getting five vans and a dispatcher who decides which van takes each delivery. The dispatcher is the load balancer.
So what makes this approach powerful?
- It scales almost without limit. Need more capacity? Add another server. And another. You’re not stuck waiting for a bigger machine to exist.
- It’s fault tolerant. Fault tolerant means the system keeps working even when a part fails. If one server dies, the load balancer just sends traffic to the others, and users barely notice.
- You can grow in cheap steps. Instead of buying one giant expensive machine, you add normal-sized ones as you need them.
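The fault tolerance point can be sketched in a few lines. This is an illustration under simple assumptions, not how any particular product works: the class `HealthAwareBalancer` and its `mark_down` method are hypothetical, and a real load balancer would usually find dead servers itself through health checks.

```python
class HealthAwareBalancer:
    """Round-robin balancer that skips servers marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0

    def mark_down(self, server):
        """Stop sending traffic to a server that has failed."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Start sending traffic to a known server again."""
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        """Return the next healthy server, or fail if none are left."""
        for _ in range(len(self.servers)):
            candidate = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers available")


balancer = HealthAwareBalancer(["server-1", "server-2", "server-3"])
balancer.mark_down("server-2")  # pretend server-2 just died
routed = [balancer.pick() for _ in range(4)]
print(routed)  # ['server-1', 'server-3', 'server-1', 'server-3']
```

Server 2 is gone, but requests keep flowing to the other two. Only when every server is down does the app actually go offline.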
But spreading out comes with its own headaches:
- It’s more complex. Now you’ve got many servers, a load balancer, and more moving parts to set up and watch.
- Your app should be stateless. Stateless means a server doesn’t hold on to a user’s session data on its own. We’ll unpack why that matters next.
Why stateless matters for scale out
Say a user logs in and that login info is stored only on Server 1. The next request might land on Server 2, which has never heard of this user, so the user suddenly looks logged out. To avoid this, servers should be stateless and keep shared data like sessions in one common place, such as a database or a cache that all servers read from.
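The login problem is easy to reproduce in a sketch. Everything here is made up for illustration: the two classes, the session ID `abc123`, and the plain dictionary standing in for the shared database or cache.

```python
class LocalSessionServer:
    """Keeps sessions in its own memory, so other servers cannot see them."""

    def __init__(self, name):
        self.name = name
        self._sessions = {}

    def login(self, session_id, user):
        self._sessions[session_id] = user

    def whoami(self, session_id):
        return self._sessions.get(session_id)


class StatelessServer:
    """Keeps nothing locally; every session lookup goes to the shared store."""

    def __init__(self, name, shared_store):
        self.name = name
        self._store = shared_store

    def login(self, session_id, user):
        self._store[session_id] = user

    def whoami(self, session_id):
        return self._store.get(session_id)


# Sessions stored per server: login lands on server 1, next request on server 2.
local_1 = LocalSessionServer("server-1")
local_2 = LocalSessionServer("server-2")
local_1.login("abc123", "alex")
print(local_2.whoami("abc123"))  # None, server 2 has never heard of this user

# Stateless servers: both read and write the same shared store.
shared_store = {}  # stand-in for a database or cache all servers can reach
stateless_1 = StatelessServer("server-1", shared_store)
stateless_2 = StatelessServer("server-2", shared_store)
stateless_1.login("abc123", "alex")
print(stateless_2.whoami("abc123"))  # alex
```

With the shared store, it no longer matters which server the load balancer picks. Any server can handle any request.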
⚖️ Side by Side
Let’s put the two approaches next to each other so the trade-offs are crystal clear.
| What we compare | Vertical (Scale Up) | Horizontal (Scale Out) |
|---|---|---|
| Cost | Cheap at first, very expensive at the top | Grows in steady, affordable steps |
| Limit | Hard ceiling, a machine can only get so big | Near-unlimited, just add more machines |
| Complexity | Simple, no code changes | More complex, needs a load balancer and stateless apps |
| Failure | Single point of failure, one server is all you have | Fault tolerant, others cover for a dead server |
| Downtime | Usually needs downtime to upgrade hardware | Servers can usually be added or removed without downtime |
🧩 Which Should You Use?
Now the practical question: which one should Alex pick? The honest answer is, it depends on where you are. Here’s a simple way to think about it:
- Start vertical. When your app is young and traffic is small, just get a bigger machine. It’s quick, it’s simple, and you’ve got better things to build than a load balancer setup.
- Go horizontal for big scale. Once you outgrow what one machine can do, or once you can’t afford any downtime, you spread out across many servers.
- Go horizontal for availability. If your app must stay up even when a server dies, you need more than one machine. A single server can’t give you that, no matter how big it is.
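A bit of rough arithmetic shows why more machines help with availability. This sketch leans on several assumptions: each server is up 99% of the time (a made-up figure), servers fail independently of each other, any one server can carry the traffic alone, and the load balancer itself never fails. Real systems break these assumptions in various ways, so treat the numbers as a feel for the idea, not a promise.

```python
def availability(per_server_uptime, server_count):
    """Chance that at least one server is up, assuming independent failures."""
    all_down = (1 - per_server_uptime) ** server_count
    return 1 - all_down


# Hypothetical figure: each server is up 99% of the time.
for server_count in (1, 2, 3):
    print(server_count, f"{availability(0.99, server_count):.6f}")
# 1 0.990000
# 2 0.999900
# 3 0.999999
```

One server gives you 99% no matter how big it is. A second ordinary server cuts the chance of a total outage from 1 in 100 to 1 in 10,000.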
And here’s the part people miss. Most real systems do both:
- They run several servers (horizontal) for safety and scale.
- And each of those servers is a decently powerful machine (vertical) so they’re not wasting effort on tiny boxes.
So it’s not really a war between the two. It’s about knowing which one fits your stage.
🌍 Real Examples
Let’s make this concrete with two everyday situations.
A small business app, maybe an internal tool a company uses:
- It has a few hundred users, all in one office.
- When it slows down, the team just bumps the server up to more RAM and CPU.
- That’s vertical scaling, and it’s the right call. There’s no need for the complexity of many servers here.
A huge consumer site, think something like a popular shopping platform on sale day:
- Millions of people hit it at once, far more than any single machine could handle.
- It runs thousands of servers behind load balancers, spread across many locations.
- If a few servers fail, nobody even notices, because the rest pick up the slack.
- That’s horizontal scaling, and at that size there’s really no other choice.
So the same idea, “we need more power,” leads to two very different answers depending on how big you are.
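A back-of-the-envelope sizing sketch makes the gap between the two examples visible. All numbers below are invented for illustration, including the assumption that one server handles 500 requests per second, and `servers_needed` is a hypothetical helper, not a standard formula from any tool.

```python
import math


def servers_needed(peak_rps, per_server_rps, spare_servers=0):
    """Servers required for peak traffic, plus spares to cover failures."""
    if per_server_rps <= 0:
        raise ValueError("per_server_rps must be positive")
    return math.ceil(peak_rps / per_server_rps) + spare_servers


# Hypothetical internal tool: peak of 300 requests per second.
print(servers_needed(peak_rps=300, per_server_rps=500))  # 1

# Hypothetical sale-day peak: 2,000,000 requests per second.
print(servers_needed(peak_rps=2_000_000, per_server_rps=500))  # 4000
print(servers_needed(peak_rps=2_000_000, per_server_rps=500, spare_servers=200))  # 4200
```

The small app fits on one machine, so making that machine bigger is enough. The big site needs thousands of servers before spares are even counted.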
⚠️ Common Mistakes and Misconceptions
A few ideas trip people up when they’re new to scaling. Let’s clear them out:
- “Just keep buying a bigger server forever.” You can’t. There’s a hard ceiling on how big one machine gets, and the top-end machines cost a fortune. At some point you have to scale out.
- “Horizontal is always better.” Not for a small app. Spreading across many servers adds real complexity: a load balancer to manage, stateless code to write, and more things to monitor. If one machine does the job, that complexity is just wasted effort.
- “I can scale out anytime, my app’s ready.” Often it isn’t. If your app stores user sessions on each server, adding more machines will quietly log people out. You have to handle shared state first, otherwise scaling out breaks things.
- “Scaling fixes everything.” Sometimes the real problem is a slow database query or messy code. Throwing more hardware at a bad bottleneck just burns money. Find the actual slow part first.
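For that last point, one simple way to find the slow part is to time each step of a request before buying anything. This is a minimal sketch: `slow_query` and `render_page` are stand-ins that just sleep, and the helper `timed` is made up for illustration. In a real app you would wrap your actual database call and rendering code, or use a proper profiler.

```python
import time


def timed(label, func, timings):
    """Run func, record how long it took under label, and return its result."""
    start = time.perf_counter()
    result = func()
    timings[label] = time.perf_counter() - start
    return result


def slow_query():
    time.sleep(0.05)  # stand-in for a slow database query
    return ["row"]


def render_page():
    time.sleep(0.005)  # stand-in for building the page
    return "<html>"


timings = {}
timed("database query", slow_query, timings)
timed("render page", render_page, timings)

slowest = max(timings, key=timings.get)
print(f"slowest step: {slowest}")
```

If most of the time goes to one query, fixing that query will likely help more than a bigger or an extra server.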
🛠️ Design Challenge
Try this one on your own to test yourself.
Imagine Alex’s app is now getting popular, and it must stay online during a big marketing push, no downtime allowed. Sketch out how you’d scale it. As you do, answer these:
- Would you scale up, scale out, or both? Why?
- Where does the load balancer sit, and what does it do?
- The app currently stores logged-in sessions on each server. What problem does that cause when you add more servers, and how would you fix it?
Write down your reasoning. This is exactly the kind of thinking a system design interview is looking for.
🧩 What You’ve Learned
You can now explain the two core ways to handle more load. Here’s what you’ve picked up:
- ✅ Vertical scaling means adding more CPU and RAM to a single machine. Simple, but it has a hard ceiling, gets pricey, and is a single point of failure.
- ✅ Horizontal scaling means adding more machines and sharing the load with a load balancer. It scales far and is fault tolerant, but it’s more complex.
- ✅ Horizontal scaling needs a stateless app, with shared data kept in a common store so any server can handle any request.
- ✅ A good rule of thumb is start vertical for simplicity, then go horizontal for big scale and high availability.
- ✅ Most large real-world systems use both together.
Check Your Knowledge
Test what you learned. Try to answer each question yourself before reading the answer under it.
- 1. What is vertical scaling?
  Answer: Vertical scaling (scale up) means making one machine bigger and stronger.
- 2. What is a downside of vertical scaling?
  Answer: A single machine can only get so big, and if it dies the whole app goes down.
- 3. Why does horizontal scaling need a load balancer?
  Answer: With many servers, the load balancer spreads requests across them and routes around failures.
- 4. Why does horizontal scaling need a stateless app?
  Answer: If a session lives on only one server, the next request on another server would lose it, so shared state is kept in a common store.
🚀 What’s Next?
You’ve got the two scaling strategies down. The next pieces build right on top of this.
- What is Load Balancing? goes deeper into the piece that makes horizontal scaling work, and how it decides where each request goes.
- What is High Availability? shows how to keep a system online even when parts of it fail, which is a big reason teams scale out in the first place.
Get these two, and you’ll have the foundation that almost every system design discussion stands on.