What is Load Balancing?

Picture this. You build a website, you put it on one server, and it works great.

Then your site gets popular, like really popular.
Suddenly thousands of users are hitting that one server at the same time.
The server slows down, then it chokes, and then it crashes. Everyone sees an error page.

So how do the big sites stay up when millions of people show up at once? They don’t run on one giant server. They spread the crowd across many servers, and the thing that does that spreading is called a load balancer. Let’s see how it works, one piece at a time.

🎯 The Problem

Here’s the pain you hit with a single server:

One machine can only handle so many requests. There’s a limit to its CPU, memory, and network. Push past that, and it gets slow for everyone.
If too many people come at once, the server gets overwhelmed, requests start piling up, and some just time out.
And there’s a scarier problem. If that one server goes down, your whole site goes down with it. There’s no backup.

That last one has a name in system design. When your whole system depends on one part, and that part failing takes everything down, we call it a single point of failure. One server is the classic example. You really don’t want that.

So the obvious fix is: run more than one server. But the moment you have many servers, a new question pops up. When a user shows up, which server should answer them? That’s exactly the question a load balancer is built to answer.

🏦 A Real-World Analogy

Think about walking into a busy bank.

There are many counters open, each with a teller ready to help.
You don’t just wander up to a random counter and hope it’s free.
Instead, there’s a queue manager near the door. They look at the counters, see which one is free, and send you there.

A load balancer is exactly that queue manager, but for web traffic.

Each bank counter is a server.
Each customer is an incoming request from a user.
The queue manager standing at the door is the load balancer, deciding who goes where.

Notice why this is smart. No single counter gets a huge line while others sit empty. If one teller goes on break, the manager just stops sending people there and uses the rest. Keep this bank picture in your head; every part below maps back to it.

⚖️ What is a Load Balancer

So let’s define it plainly.

A load balancer is a machine or piece of software that sits in front of your other servers and spreads incoming requests across all of them.
Users never talk to your servers directly. They talk to the load balancer, and it passes their request on to one of the servers behind it.
The servers behind it are often called a server pool or a backend pool. “Backend” just means the machines doing the actual work, hidden behind the load balancer.

Here’s the key mental model. To the outside world, it looks like there’s one address for your whole site. But behind that single address, there could be two servers, ten servers, or a thousand. The load balancer hides all of that. Users see one door, and behind the door there’s a whole room full of servers sharing the work.

One door, many rooms

The whole trick is that users only ever see the load balancer. They have no idea how many servers are behind it. You can add or remove servers any time, and nobody on the outside notices. That’s what makes load balancing so powerful for growing sites.

⚙️ How It Works

Let’s walk through what happens when one request comes in.

A user sends a request, like opening your homepage. That request lands on the load balancer first, not on any real server.
The load balancer looks at its list of healthy servers and picks one, using a rule we’ll cover in the next section.
It forwards your request to that chosen server. The server does the work and sends the answer back through the load balancer to you.

So from your side, you just asked for a page and got it. You never knew which server actually handled it. That’s the whole point: it stays invisible.
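To make those three steps concrete, here’s a minimal, in-memory sketch in Python. It’s a toy, not a real proxy: the `Server` and `LoadBalancer` classes are hypothetical, “forwarding” is a plain method call instead of a network hop, and the pick rule is a placeholder (always the first healthy server) until we get to real algorithms.

```python
class Server:
    """A hypothetical backend that 'handles' a request by returning a string."""

    def __init__(self, name):
        self.name = name
        self.healthy = True

    def handle(self, path):
        return f"{self.name} served {path}"


class LoadBalancer:
    """The single entry point: users only ever call handle_request on this object."""

    def __init__(self, servers):
        self.servers = servers

    def pick(self, healthy):
        # Placeholder rule for now: always the first healthy server.
        return healthy[0]

    def handle_request(self, path):
        # Step 1: the request lands here, not on any backend server.
        healthy = [s for s in self.servers if s.healthy]
        if not healthy:
            raise RuntimeError("no healthy servers available")
        # Step 2: choose one healthy server.
        server = self.pick(healthy)
        # Step 3: forward the request and pass the server's answer back.
        return server.handle(path)


lb = LoadBalancer([Server("app-1"), Server("app-2")])
print(lb.handle_request("/"))  # app-1 served /
```

Notice that the caller only ever talks to `lb`. If you mark `app-1` as unhealthy, the same call is answered by `app-2`, and the caller’s code doesn’t change at all.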

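Now there’s one more job the load balancer quietly does, and it’s important. It runs health checks. Every few seconds it sends each server a small test, such as opening a connection or requesting a lightweight page, to ask “are you alive and responding?” If a server keeps failing those checks, the load balancer marks it as down and stops sending traffic there, so users don’t keep getting routed to a dead machine. When the server starts passing again, traffic goes back to it.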
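Here’s a sketch of one round of health checks, assuming a simple policy: a server is marked down after three failed checks in a row and marked up again on its next success. The threshold, the `Backend` class, and the `probe` function are assumptions for illustration; real load balancers let you configure the interval, the thresholds, and what the check actually does. Here the probe is a plain function so the example runs without a network.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    healthy: bool = True
    failures: int = 0  # consecutive failed checks so far


def run_health_checks(backends, probe, fail_threshold=3):
    """Run one round of health checks over every backend.

    probe(backend) returns True if the backend answered. A backend is marked
    down only after fail_threshold failures in a row (an assumed policy), and
    marked up again as soon as a check succeeds.
    """
    for b in backends:
        if probe(b):
            b.failures = 0
            b.healthy = True
        else:
            b.failures += 1
            if b.failures >= fail_threshold:
                b.healthy = False


crashed = {"app-2"}  # pretend app-2 has stopped responding


def probe(backend):
    # Stand-in for a real TCP or HTTP check.
    return backend.name not in crashed


backends = [Backend("app-1"), Backend("app-2")]
for _ in range(3):  # in a real system, one round every few seconds
    run_health_checks(backends, probe)

print([b.name for b in backends if b.healthy])  # ['app-1']
```

Waiting for several failures in a row, instead of reacting to a single one, keeps a server from being pulled out of rotation because of one slow or dropped reply.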

L4 vs L7, lightly

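You’ll sometimes hear load balancers described as L4 or L7. The numbers refer to layers of the OSI networking model: layer 4 is the transport layer (TCP and UDP), and layer 7 is the application layer (protocols like HTTP). An L4 load balancer just looks at network info like IP address and port and forwards traffic fast, without caring what’s inside. An L7 one looks deeper, at the actual HTTP request, so it can route based on things like the URL path. You don’t need the deep details yet; just know both exist.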
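A rough way to feel the difference: an L4 balancer only has addresses and ports to work with, while an L7 balancer can read the request path. Both functions below are hypothetical sketches (the pool names and path prefixes are made up), and a real L4 balancer forwards raw TCP or UDP traffic rather than calling a Python function.

```python
import zlib


def route_l4(src_ip, src_port, servers):
    """L4 style: sees only addresses and ports, never the HTTP request itself."""
    key = f"{src_ip}:{src_port}".encode()
    return servers[zlib.crc32(key) % len(servers)]


def route_l7(path, pools):
    """L7 style: reads the HTTP path and picks the pool with the longest matching prefix.

    Assumes pools always contains "/" as a catch-all.
    """
    best = max((prefix for prefix in pools if path.startswith(prefix)), key=len)
    return pools[best]


pools = {
    "/api/": ["api-1", "api-2"],
    "/images/": ["img-1"],
    "/": ["web-1", "web-2"],
}
print(route_l7("/api/orders", pools))  # ['api-1', 'api-2']
print(route_l7("/about", pools))       # ['web-1', 'web-2']
print(route_l4("198.51.100.4", 51000, ["web-1", "web-2"]))
```

In a fuller version, the L7 balancer would then pick one server inside the chosen pool using one of the algorithms from the next section.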

🔁 Load Balancing Algorithms

Okay, so the load balancer has to pick a server for each request. How does it decide? It follows a rule, and that rule is called a load balancing algorithm. There are a few common ones, and each picks servers in a different way.

Algorithm	How it works in one line
Round Robin	Hand out requests in a simple rotation: server 1, then 2, then 3, then back to 1.
Least Connections	Send the next request to whichever server is handling the fewest active connections right now.
IP Hash	Use the user’s IP address to always send the same user to the same server.

Let’s quickly feel the difference between them.

Round robin is the simplest. It just takes turns, so every server gets the same number of requests. It works great when all your servers are about the same and all requests are roughly equal.
Least connections is smarter when some requests take longer than others. Instead of blindly taking turns, it looks at who’s least busy and sends work there, so no server gets buried.
IP hash is useful when you want the same user to keep landing on the same server. It maps their IP to one server, so they stick there. That matters if the server is keeping some info about that user in its own memory.
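Here’s a minimal Python sketch of all three rules, assuming servers are just names in a list. The class names are hypothetical, and the hash for IP hash (`zlib.crc32`) is an arbitrary stable choice for illustration; real load balancers use their own hash functions and track connections for you.

```python
import itertools
import zlib


class RoundRobin:
    """Hand out servers in a fixed rotation: 1, 2, 3, 1, 2, 3, ..."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Pick the server with the fewest requests currently in progress."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)  # ties go to the earliest server
        self.active[server] += 1  # a request starts on this server
        return server

    def release(self, server):
        self.active[server] -= 1  # call this when that request finishes


class IPHash:
    """Map a client IP to a server, so the same IP keeps landing on the same one."""

    def __init__(self, servers):
        self.servers = servers

    def pick(self, client_ip):
        # crc32 is stable across runs, unlike Python's built-in hash() for strings.
        return self.servers[zlib.crc32(client_ip.encode()) % len(self.servers)]


servers = ["s1", "s2", "s3"]

rr = RoundRobin(servers)
print([rr.pick() for _ in range(4)])  # ['s1', 's2', 's3', 's1']

lc = LeastConnections(servers)
first = lc.pick()   # s1 (all idle, tie goes to the first)
second = lc.pick()  # s2
lc.release(first)   # s1's request finishes
print(lc.pick())    # s1

ih = IPHash(servers)
print(ih.pick("203.0.113.7") == ih.pick("203.0.113.7"))  # True
```

One design note: the simple “hash modulo number of servers” trick in `IPHash` means that adding or removing a server changes the mapping for many users at once. That’s fine for a sketch; production systems often use consistent hashing to soften that.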

There’s no single “best” one. You pick based on what your app needs. For most beginner answers, round robin and least connections are the two to know cold.

⚡ Benefits

So why go through all this trouble? Because load balancing gives you a few big wins.

It handles way more traffic. Instead of one server doing everything, the work is split across many. Need to handle more users? Just add more servers behind the load balancer.
It gives you high availability. High availability means your site stays up and reachable almost all the time, even when something breaks. If one server dies, the load balancer just routes around it to the healthy ones, and users barely notice.
It removes the single point of failure on the server side. No single server crashing can take your whole site down anymore, because there are others ready to pick up the slack.

There’s a nice bonus here too. This is the foundation of horizontal scaling, which means growing by adding more machines rather than buying one bigger machine. Load balancing is what makes adding more machines actually useful, since it spreads the traffic onto them.
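Horizontal scaling comes with a bit of back-of-the-envelope arithmetic. The sketch below assumes identical servers and a load balancer that spreads traffic evenly, and the numbers in the example are made up; in a real system you’d measure what one server can handle. The hypothetical `spare` parameter keeps one extra server so losing a machine still leaves enough capacity, which ties back to high availability.

```python
import math


def servers_needed(peak_rps, per_server_rps, spare=1):
    """Rough count of identical servers needed behind a load balancer.

    peak_rps: expected peak requests per second (your estimate)
    per_server_rps: what one server handles comfortably (measure this)
    spare: extra servers so that losing one still leaves enough capacity
    Assumes the load balancer spreads traffic evenly across servers.
    """
    return math.ceil(peak_rps / per_server_rps) + spare


# Made-up numbers: 2,000 requests/second at peak, 500 per server.
print(servers_needed(peak_rps=2000, per_server_rps=500))  # 5
```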

⚠️ Things to Watch

Now here’s a catch that trips up a lot of beginners. We added a load balancer to kill the single point of failure. But think about it: the load balancer itself is now the one thing every request passes through.

If your load balancer goes down, then nobody can reach any server, even though all your servers are perfectly healthy.
So the load balancer becomes a new single point of failure. We just moved the problem.
The fix is to run more than one load balancer, so if one fails, another takes over. In real systems, load balancers are almost always set up in pairs or groups for exactly this reason.
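Here’s a sketch of the simplest version of that idea, an active/passive pair: a standby load balancer listens for a regular heartbeat from the active one and takes over after a few missed beats. The `FailoverPair` class, the node names, and the three-miss threshold are hypothetical. In real deployments the takeover often works by moving a shared virtual IP address to the standby (protocols like VRRP do this), so clients keep using the same address.

```python
class FailoverPair:
    """Hypothetical active/passive pair of load balancers."""

    def __init__(self, primary, backup, max_missed=3):
        self.active = primary
        self.standby = backup
        self.max_missed = max_missed
        self.missed = 0  # heartbeats missed in a row

    def heartbeat(self, received):
        """Called once per interval; received is True if the active node answered."""
        if received:
            self.missed = 0
            return
        self.missed += 1
        if self.missed >= self.max_missed:
            # Promote the standby. In practice this is where a shared
            # virtual IP would move, so clients never change addresses.
            self.active, self.standby = self.standby, self.active
            self.missed = 0


pair = FailoverPair("lb-a", "lb-b")
for ok in [True, False, False, False]:
    pair.heartbeat(ok)
print(pair.active)  # lb-b
```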

One more thing to know in passing. Sometimes a user needs to keep hitting the same server, maybe because that server is holding their login session in its own memory. Forcing a user to stick to one server like that is called a sticky session. It’s handy, but it can make load spreading less even, so it’s a trade-off you choose on purpose.
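One common way to build sticky sessions is with a cookie: the load balancer picks a server for a new visitor, remembers the choice in a cookie, and routes any request carrying that cookie back to the same server. This sketch assumes a hypothetical cookie named `lb_server` and plain round robin for first visits; real load balancers have their own cookie names and settings.

```python
import itertools


class StickyBalancer:
    """Sticky sessions with a cookie: pick a server once, then keep reusing it."""

    COOKIE = "lb_server"  # hypothetical cookie name

    def __init__(self, servers):
        self.servers = servers
        self._rotation = itertools.cycle(servers)

    def route(self, cookies):
        """Return (server, cookies_to_set) for a request carrying `cookies` (a dict)."""
        pinned = cookies.get(self.COOKIE)
        if pinned in self.servers:
            return pinned, {}  # returning visitor: same server as before
        server = next(self._rotation)  # new visitor (or pinned server gone): round robin
        return server, {self.COOKIE: server}


balancer = StickyBalancer(["s1", "s2"])
server, set_cookies = balancer.route({})  # first visit, no cookie yet
again, _ = balancer.route(set_cookies)    # the browser sends the cookie back
print(server, again)  # s1 s1
```

The fallback matters: if the pinned server is removed, the user gets a fresh server instead of an error, though anything held only in the old server’s memory is gone. That’s part of why sticky sessions are a trade-off.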

⚠️ Common Mistakes and Misconceptions

A few ideas confuse people early on. Let’s clear them up.

“A load balancer is just one server, so it’s a weak point.” Only if you run a single one. In practice you run more than one, so it isn’t a single point of failure.
“Load balancing makes a single request faster.” Not really. It doesn’t speed up one request. It lets you serve many requests at once without anyone waiting in a huge line.
“More servers always means more capacity automatically.” Only if traffic actually gets spread onto them. Without a load balancer, extra servers can just sit idle while one machine drowns.
“Round robin is always the best choice.” No single algorithm wins everywhere. If your requests vary a lot in size, least connections often does a better job.
“The load balancer does the actual work of the app.” It doesn’t. It only directs traffic. The real work still happens on the backend servers.
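To see why least connections can beat round robin when request sizes vary, here’s a tiny simulation. It assumes two servers, one request arriving per time step, and a made-up pattern where long (10-step) and short (1-step) requests alternate. It reports the most requests any single server was juggling at once.

```python
def peak_load(durations, num_servers, pick):
    """Replay requests arriving one per time step; return the busiest server's peak."""
    active = [[] for _ in range(num_servers)]  # end times of in-flight requests
    peak = 0
    for t, duration in enumerate(durations):
        # Drop requests that have finished by time t.
        for i in range(num_servers):
            active[i] = [end for end in active[i] if end > t]
        counts = [len(a) for a in active]
        idx = pick(t, counts)
        active[idx].append(t + duration)
        peak = max(peak, len(active[idx]))
    return peak


def round_robin(t, counts):
    return t % len(counts)


def least_connections(t, counts):
    return counts.index(min(counts))  # ties go to the lowest index


durations = [10, 1, 10, 1, 10, 1]  # made-up: long and short requests alternate
print(peak_load(durations, 2, round_robin))        # 3
print(peak_load(durations, 2, least_connections))  # 2
```

Round robin happens to hand every long request to the same server here, so that server piles up three at once, while least connections notices it’s busy and sends work elsewhere.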

🛠️ Design Challenge

Try this one on your own to test yourself. Imagine you run an online store, and right now it’s all on a single server. A big sale is coming and you expect ten times the usual traffic.

Where would you put a load balancer, and what sits behind it?

Which algorithm would you start with, and why?

How would you make sure the load balancer itself does not become the thing that takes the whole site down?

🧩 What You’ve Learned

You can now explain how big sites stay up under heavy traffic. Here’s what you picked up.

✅ A single server gets overwhelmed and is a single point of failure.
✅ A load balancer sits in front of many servers and spreads requests across them.
✅ It runs health checks and skips any server that stops responding.
✅ Common algorithms are round robin, least connections, and IP hash.
✅ Load balancing gives you high availability and powers horizontal scaling.
✅ The load balancer must be made redundant, so it doesn’t become a single point of failure itself.

🚀 What’s Next?

You’ve got the core idea of spreading traffic across servers. Next, let’s connect it to the bigger picture.

What Happens When You Type a URL walks through the full journey of a request, and shows where the load balancer fits in. Read it at /tutorials/system-design/what-happens-when-you-type-a-url.
Horizontal vs Vertical Scaling breaks down the two ways to grow a system, and why load balancing is what makes adding more machines actually pay off.

Once you’ve got those, you’ll be ready to reason about scaling and availability like a real system designer.
