Load Balancer Architecture

So your app has a bunch of servers now, not just one. Good. But that opens up a new question right away:

When a user hits your site, who decides which server handles them?
And what if one of those servers quietly dies in the middle of the night?

That decision-maker sitting in front of your servers is the load balancer. A load balancer is a piece of software or hardware that takes incoming traffic and spreads it across many servers. In this lesson we’ll figure out exactly where it sits, how it decides where to send each request, and how to make sure the load balancer itself doesn’t become the weak link.

🎯 Where the Load Balancer Sits

Let’s get the picture straight first, because once you see where it sits, everything else clicks.

The load balancer sits between your clients and your servers. Clients are the users’ browsers or apps sending requests, and servers are the machines running your app.
That group of servers behind it has a name: the server pool (also called the backend pool). It’s just the set of identical servers that can each handle a request.
Clients never talk to your servers directly. They send everything to the load balancer, and it passes the request along to one server in the pool.
Because it stands in front and forwards traffic on behalf of the servers, a load balancer is a type of reverse proxy. A reverse proxy is just a middleman that receives requests and hands them to backend servers.

Picture the whole setup: every client funnels through the one load balancer, and the load balancer fans out to the pool. That’s the core shape of almost every scalable system you’ll ever design.

🩺 Health Checks

Now here’s a problem. What if Server 2 crashes? You don’t want the load balancer to keep cheerfully sending users to a dead machine. So the load balancer needs a way to know which servers are actually alive. That’s what health checks do.

A health check is the load balancer regularly pinging each server to ask “are you okay?” If the server answers properly, it’s healthy. If it doesn’t, it’s marked unhealthy.
The check usually hits a small endpoint on the server, something like /health, and expects a quick 200 OK back. (Remember, 200 just means “all good”.)
When a server fails its check, the load balancer stops sending traffic to it. Users get routed to the healthy ones instead, and most of them never even notice.
Once that server recovers and starts passing checks again, the load balancer quietly puts it back in rotation.

So the health check is what makes the whole thing self-healing. A server can die and your site stays up, because the load balancer just steers around the broken one.
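
To make the check loop concrete, here is a minimal Python sketch. The /health path and the 200 OK expectation come from the description above; the names probe and refresh_pool, the 2-second timeout, and the idea of passing the check function in as a parameter are assumptions made for illustration, not any particular load balancer's API.

```python
import urllib.error
import urllib.request


def probe(base_url, timeout=2.0):
    """Return True if GET <base_url>/health answers 200 within the timeout."""
    try:
        with urllib.request.urlopen(base_url + "/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an error status code.
        return False


def refresh_pool(servers, check=probe):
    """Run one round of checks and return only the servers that passed."""
    return [server for server in servers if check(server)]
```

A load balancer would call something like refresh_pool on a timer, every few seconds. Real ones typically wait for several failed checks in a row before marking a server unhealthy, so one slow response doesn't knock a good server out of rotation.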

Why health checks matter so much

Without health checks, adding more servers can actually make things worse. One dead server means a chunk of your users hit errors, and the load balancer keeps feeding it traffic anyway. Health checks are the difference between “more servers, more reliable” and “more servers, more ways to fail”.
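
A quick simulation puts a number on that. This is only a sketch under made-up assumptions: ten servers, one crashed, 1000 requests, and plain round robin as the picking rule. The function name failed_requests is hypothetical.

```python
from itertools import cycle


def failed_requests(servers, dead, total, use_health_checks):
    """Count requests that land on a dead server under round robin."""
    if use_health_checks:
        candidates = [s for s in servers if s not in dead]
    else:
        candidates = list(servers)
    rotation = cycle(candidates)
    return sum(1 for _ in range(total) if next(rotation) in dead)


servers = [f"server-{i}" for i in range(1, 11)]  # ten servers
dead = {"server-2"}                              # one has crashed

without_checks = failed_requests(servers, dead, 1000, use_health_checks=False)
with_checks = failed_requests(servers, dead, 1000, use_health_checks=True)
print(without_checks, with_checks)  # 100 0
```

Without health checks, one request in ten keeps hitting the dead server. With them, none do.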

🔢 Layer 4 vs Layer 7

Okay, so the load balancer forwards requests. But how closely does it actually look at each request before deciding where to send it? That’s where the difference between Layer 4 and Layer 7 comes in. These names come from the network layers, but you don’t need all that theory. Here’s the plain version.

A Layer 4 (L4) load balancer works at the connection level. It looks only at the IP address and port, basically just “where is this coming from and where is it going”. It does not open up the request to see what’s inside.
Because it doesn’t read the contents, L4 is very fast and lightweight. It just shuffles connections to servers without thinking hard about them.
A Layer 7 (L7) load balancer works at the application level. It actually reads the request, the URL, the headers, the cookies, and can make smart decisions based on what it sees.
So an L7 load balancer can do things like “send all /images requests to these servers and all /api requests to those servers”. L4 simply can’t, because it never looks at the URL.

Here’s the two side by side so it sticks.

Aspect	Layer 4 (L4)	Layer 7 (L7)
What it looks at	IP address and port only	URL, headers, cookies, full request
Smart routing	No, just forwards connections	Yes, can route by path or content
Speed	Faster, very low overhead	Slower, does more work per request
SSL termination	Usually no	Yes, can decrypt and inspect
Good for	Raw speed, simple traffic spreading	Web apps needing content-based routing

A simple way to remember it: L4 is a fast traffic cop who only checks number plates, while L7 is a receptionist who actually reads your request and sends you to the right desk.
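
The same contrast as a small Python sketch. Everything named here (the Request class, the server names, pick_l4, pick_pool_l7) is hypothetical, and hashing the address is just one plausible way an L4 balancer might choose. The point is only what each function is allowed to look at.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Request:
    # Connection-level facts: all an L4 balancer gets to see.
    src_ip: str
    src_port: int
    # Application-level facts: visible only to an L7 balancer.
    path: str = "/"
    headers: dict = field(default_factory=dict)


IMAGE_SERVERS = ["img-1", "img-2"]
API_SERVERS = ["api-1", "api-2"]
WEB_SERVERS = ["web-1", "web-2"]
ALL_SERVERS = IMAGE_SERVERS + API_SERVERS + WEB_SERVERS


def pick_l4(req, servers=ALL_SERVERS):
    """Choose a server from address and port alone; the path is never read."""
    key = f"{req.src_ip}:{req.src_port}".encode()
    return servers[zlib.crc32(key) % len(servers)]


def pick_pool_l7(req):
    """Choose a server group by reading the URL path."""
    if req.path.startswith("/images"):
        return IMAGE_SERVERS
    if req.path.startswith("/api"):
        return API_SERVERS
    return WEB_SERVERS
```

Notice that pick_l4 gives the same answer for /images/cat.png and /api/login if they arrive on the same connection, because it never sees the path. pick_pool_l7 tells them apart.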

🧩 How a Request Flows

Let’s trace one single request all the way through, so you can see all the pieces working together.

A client sends a request. It lands at the load balancer, not at any server directly.
The load balancer checks its list of healthy servers (the ones passing health checks) and skips any that are down.
From the healthy ones, it picks a server using its algorithm. The algorithm is just the rule for choosing, like “go around in a circle” (round robin) or “pick the one with the fewest active connections”.
It forwards the request to that chosen server. The server does its work and builds a response.
The response travels back through the load balancer to the client. The user just sees their page load, with no idea which server handled them.

That’s it. The same simple loop happens millions of times a second on big sites, and the user never sees the machinery behind it.
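
The steps above can be sketched in Python as one small class. This is an in-memory model, not a real proxy: the LoadBalancer class and its method names are made up for illustration, and backend is a hypothetical callable standing in for "forward the request over the network and wait for the response".

```python
from itertools import count


class LoadBalancer:
    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(servers)           # kept up to date by health checks
        self.active = {s: 0 for s in servers}  # in-flight requests per server
        self._turn = count()

    def _candidates(self):
        # Step 2: skip any server that is currently marked down.
        alive = [s for s in self.servers if s in self.healthy]
        if not alive:
            raise RuntimeError("no healthy servers in the pool")
        return alive

    def pick_round_robin(self):
        # Step 3, option A: go around the healthy servers in a circle.
        alive = self._candidates()
        return alive[next(self._turn) % len(alive)]

    def pick_least_connections(self):
        # Step 3, option B: the healthy server with the fewest active requests.
        return min(self._candidates(), key=lambda s: self.active[s])

    def handle(self, request, backend, pick=None):
        server = (pick or self.pick_round_robin)()
        self.active[server] += 1
        try:
            # Steps 4 and 5: forward the request, then pass the response back.
            return backend(server, request)
        finally:
            self.active[server] -= 1
```

For example, with servers s1, s2 and s3 and s2 marked unhealthy, repeated calls to pick_round_robin alternate between s1 and s3.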

🛡️ Don’t Make the LB a Single Point of Failure

Now here’s the trap that catches a lot of beginners. You set up a load balancer in front of ten servers, you feel safe, but think about it for a second:

If all traffic flows through one load balancer, and that one load balancer crashes, then your whole site goes down. Every single server behind it is now unreachable.
That’s called a single point of failure, or SPOF. A SPOF is any one part of the system that, if it breaks, takes everything down with it.
The fix is to run two or more load balancers, not just one. If the active one dies, a standby one takes over. This automatic handover is called failover.
Together this gives you high availability, which just means the system stays up even when individual pieces fail. You added a load balancer partly to make the site more reliable, so it would be silly to let it become a brand new way for everything to break.

In a redundant setup, a backup load balancer sits alongside the active one, ready to step in.

So the rule of thumb: never let your load balancer be the only one of its kind. Always have a buddy ready to take over.
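
Here is a rough Python sketch of the failover decision, assuming the active load balancer sends a regular "I'm alive" heartbeat. The BalancerNode class, the choose_active function, and the 3-second timeout are all hypothetical names and values picked for the example.

```python
import time


class BalancerNode:
    def __init__(self, name):
        self.name = name
        self.last_heartbeat = time.monotonic()

    def beat(self, now=None):
        """Record that this node just reported itself alive."""
        self.last_heartbeat = time.monotonic() if now is None else now


def choose_active(primary, standby, now=None, timeout=3.0):
    """Return the node that should be receiving traffic right now."""
    now = time.monotonic() if now is None else now
    if now - primary.last_heartbeat <= timeout:
        return primary
    # The primary has been silent for too long: fail over to the standby.
    return standby
```

In real deployments the handover is often done by moving a shared virtual IP address from the failed node to the standby, so clients keep using the same address and don't have to change anything.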

⚡ What a Load Balancer Also Does

Spreading traffic is the main job, but a modern load balancer (especially an L7 one) usually does a few extra things while it’s there. Here’s a quick taste of each, and we’ll go deeper on them in later lessons.

SSL termination: the load balancer handles the HTTPS encryption and decryption, so your backend servers don’t have to. That takes load off them.
Sticky sessions: it can keep sending the same user to the same server, which matters when that server is remembering something about them.
Basic routing: an L7 load balancer can send different URLs to different server groups, like /api to one set and /static to another.

Don’t worry about mastering these right now. Just know the load balancer is more than a simple splitter, and each of these is a topic on its own.
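
As a small taste of sticky sessions, here is one way the idea could be sketched in Python: hash a session ID so the same user always maps to the same server. The function name sticky_server and the session IDs are made up, and hashing is only one possible approach.

```python
import hashlib


def sticky_server(session_id, servers):
    """Map a session ID to the same server on every request."""
    digest = hashlib.sha256(session_id.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


pool = ["web-1", "web-2", "web-3"]
first = sticky_server("session-abc123", pool)
again = sticky_server("session-abc123", pool)
# first and again are always the same server
```

The catch with this simple version is that adding or removing a server changes the result for many sessions. That is one reason load balancers often set a cookie that names the chosen server instead.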

⚠️ Common Mistakes and Misconceptions

A few ideas trip people up early on. Let’s clear them out before they stick.

“One load balancer is enough.” Nope. One load balancer means one single point of failure. You need at least two with failover, or you’ve just moved the risk, not removed it.
“L4 and L7 are basically the same.” They’re not. L4 only sees IP and port and forwards blindly, while L7 reads the actual request and can route by URL or headers. Different powers, different costs.
“Skipping health checks is fine, my servers rarely crash.” Without health checks, the load balancer can’t tell a dead server from a live one, so it keeps sending users into errors. Health checks aren’t optional for a real system.
“The load balancer holds my app’s data.” It doesn’t. It just forwards traffic. Your servers and databases hold the data. The load balancer is a traffic director, not a storage box.

🛠️ Design Challenge

Try this one on your own to test yourself. Imagine Alex runs a photo-sharing site. There’s a set of servers for the website pages and a separate set for serving images, because images are heavy.

Would you reach for an L4 or an L7 load balancer here, and why?

How would the load balancer know to send /images requests to the image servers and everything else to the web servers?

Where would you add a second load balancer so the whole thing doesn’t go down if one fails?

🧩 What You’ve Learned

You can now explain how a load balancer is built and where it fits. Here’s what you’ve picked up.

✅ The load balancer sits between clients and the server pool and acts as a reverse proxy.
✅ Health checks let it skip dead servers and route only to healthy ones, making the system self-healing.
✅ L4 balances by IP and port for raw speed, while L7 reads the request and routes by URL or headers.
✅ A request flows in to the load balancer, gets sent to a healthy server by an algorithm, and the response flows back.
✅ Running two or more load balancers with failover avoids a single point of failure and gives high availability.
✅ A load balancer also handles SSL termination, sticky sessions, and basic routing.

🚀 What’s Next?

You’ve got the architecture down. Next, step back for the big picture and then zoom into how it actually picks a server.

What is Load Balancing? covers the why and the core idea from the ground up.
Round Robin Algorithm digs into the simplest way a load balancer chooses which server gets the next request.

Once you’ve got those, you’ll be ready to compare the different load balancing algorithms and pick the right one for the job.
