Load Balancer Architecture

So your app has a bunch of servers now, not just one. Good. But that opens up a new question right away:

  • When a user hits your site, who decides which server handles them?
  • And what if one of those servers quietly dies in the middle of the night?

That decision-maker sitting in front of your servers is the load balancer. A load balancer is a piece of software or hardware that takes incoming traffic and spreads it across many servers. In this lesson we’ll figure out exactly where it sits, how it decides where to send each request, and how to make sure the load balancer itself doesn’t become the weak link.

🎯 Where the Load Balancer Sits

Let’s get the picture straight first, because once you see where it sits, everything else clicks.

  • The load balancer sits between your clients and your servers. Clients are the users’ browsers or apps sending requests, and servers are the machines running your app.
  • That group of servers behind it has a name: the server pool (also called the backend pool). It’s just the set of identical servers that can each handle a request.
  • Clients never talk to your servers directly. They send everything to the load balancer, and it passes the request along to one server in the pool.
  • Because it stands in front and forwards traffic on behalf of the servers, a load balancer is a type of reverse proxy. A reverse proxy is just a middleman that receives requests and hands them to backend servers.

Here’s the whole setup in one picture.

[Diagram: three clients each send their requests to a single load balancer, which forwards them to Server 1, Server 2, or Server 3.]

See how every client funnels through the one load balancer, and the load balancer fans out to the pool? That’s the core shape of almost every scalable system you’ll ever design.

🩺 Health Checks

Now here’s a problem. What if Server 2 crashes? You don’t want the load balancer to keep cheerfully sending users to a dead machine. So the load balancer needs a way to know which servers are actually alive. That’s what health checks do.

  • A health check is the load balancer regularly probing each server to ask “are you okay?” If the server answers properly and in time, it’s healthy. If it doesn’t, it’s marked unhealthy.
  • The check usually hits a small endpoint on the server, something like /health, and expects a quick 200 OK back. (Remember, 200 just means “all good”.)
  • When a server fails its check, the load balancer stops sending traffic to it. Users get routed to the healthy ones instead, and most of them never even notice.
  • Once that server recovers and starts passing checks again, the load balancer quietly puts it back in rotation.

So the health check is what makes the whole thing self-healing. A server can die and your site stays up, because the load balancer just steers around the broken one.
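To make the health-check loop concrete, here is a minimal sketch in Python. The names `HealthChecker`, `fake_probe`, and the `down` set are made up for illustration. The probe is a stand-in for a real HTTP call to /health, and a real load balancer would run the checks on a timer rather than on demand.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class HealthChecker:
    """Tracks which servers in the pool are passing health checks."""

    servers: List[str]
    # probe(server) should return the HTTP status code of the server's
    # /health endpoint, or raise if the server cannot be reached at all.
    probe: Callable[[str], int]
    healthy: Dict[str, bool] = field(default_factory=dict)

    def run_checks(self) -> None:
        """One round of checks; a real balancer repeats this on a timer."""
        for server in self.servers:
            try:
                self.healthy[server] = self.probe(server) == 200
            except Exception:
                # No answer at all also counts as unhealthy.
                self.healthy[server] = False

    def healthy_servers(self) -> List[str]:
        """Servers that passed their most recent check, in pool order."""
        return [s for s in self.servers if self.healthy.get(s, False)]


# Hypothetical stand-in for a real HTTP request to /health.
down = set()


def fake_probe(server: str) -> int:
    if server in down:
        raise ConnectionError(f"{server} is not responding")
    return 200


checker = HealthChecker(["server-1", "server-2", "server-3"], fake_probe)

down.add("server-2")              # Server 2 crashes
checker.run_checks()
print(checker.healthy_servers())  # Server 2 is left out

down.clear()                      # Server 2 recovers
checker.run_checks()
print(checker.healthy_servers())  # Server 2 is back in rotation
```

Notice that nothing special happens on recovery: the server simply passes its next check and shows up in the healthy list again.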

Why health checks matter so much

Without health checks, adding more servers can actually make things worse. One dead server means a chunk of your users hit errors, and the load balancer keeps feeding it traffic anyway. Health checks are the difference between “more servers, more reliable” and “more servers, more ways to fail”.
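You can put a rough number on that. Assuming simple round robin and a pool of four servers with one dead, a quick simulation shows what share of requests fail with and without health checks. The function name `failed_fraction` and the server names are just for illustration.

```python
from itertools import cycle
from typing import List, Set


def failed_fraction(pool: List[str], dead: Set[str], requests: int,
                    use_health_checks: bool) -> float:
    """Share of requests that land on a dead server under round robin."""
    if use_health_checks:
        # Dead servers are removed from rotation before picking.
        targets = [s for s in pool if s not in dead]
    else:
        # The balancer has no idea who is dead, so everyone stays in rotation.
        targets = list(pool)
    if not targets:
        return 1.0  # nobody left to serve anything
    rotation = cycle(targets)
    failures = sum(1 for _ in range(requests) if next(rotation) in dead)
    return failures / requests


pool = ["s1", "s2", "s3", "s4"]
print(failed_fraction(pool, {"s2"}, 1000, use_health_checks=False))  # 0.25
print(failed_fraction(pool, {"s2"}, 1000, use_health_checks=True))   # 0.0
```

One dead server out of four means one request in four fails, for as long as nobody notices. With health checks on, that drops to zero after the next check.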

🔢 Layer 4 vs Layer 7

Okay, so the load balancer forwards requests. But how closely does it actually look at each request before deciding where to send it? That’s where the difference between Layer 4 and Layer 7 comes in. These names come from the network layers, but you don’t need all that theory. Here’s the plain version.

  • A Layer 4 (L4) load balancer works at the connection level. It looks only at the source and destination IP addresses and ports, basically just “where is this coming from and where is it going”. It does not open up the request to see what’s inside.
  • Because it doesn’t read the contents, L4 is very fast and lightweight. It just shuffles connections to servers without thinking hard about them.
  • A Layer 7 (L7) load balancer works at the application level. It actually reads the request, the URL, the headers, the cookies, and can make smart decisions based on what it sees.
  • So an L7 load balancer can do things like “send all /images requests to these servers and all /api requests to those servers”. L4 simply can’t, because it never looks at the URL.

Here’s the two side by side so it sticks.

Aspect           | Layer 4 (L4)                        | Layer 7 (L7)
-----------------+-------------------------------------+----------------------------------------
What it looks at | IP address and port only            | URL, headers, cookies, full request
Smart routing    | No, just forwards connections       | Yes, can route by path or content
Speed            | Faster, very low overhead           | Slower, does more work per request
SSL termination  | Usually no                          | Yes, can decrypt and inspect
Good for         | Raw speed, simple traffic spreading | Web apps needing content-based routing

A simple way to remember it: L4 is a fast traffic cop who only checks number plates, while L7 is a receptionist who actually reads your request and sends you to the right desk.
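Here is a small sketch of that difference in Python. It is an illustration, not how any particular product works. The pool names, the `ROUTES` table, and the choice to hash the connection details are all assumptions made for the example. The thing to notice is what each function is allowed to look at.

```python
import hashlib
from typing import Dict, List

WEB_POOL = ["web-1", "web-2"]
IMAGE_POOL = ["img-1", "img-2"]
API_POOL = ["api-1", "api-2"]


def _stable_index(key: str, size: int) -> int:
    """Turn a string into a repeatable index in the range 0..size-1."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % size


def l4_pick(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
            pool: List[str]) -> str:
    """L4: decide from addresses and ports only; the request is never read."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}"
    return pool[_stable_index(key, len(pool))]


# L7 routing table: path prefix -> pool that should serve it.
ROUTES: Dict[str, List[str]] = {"/images": IMAGE_POOL, "/api": API_POOL}


def l7_pick(path: str, client_id: str) -> str:
    """L7: read the request path, choose a pool, then a server inside it."""
    pool = WEB_POOL  # default for everything that matches no prefix
    for prefix, candidate in ROUTES.items():
        if path == prefix or path.startswith(prefix + "/"):
            pool = candidate
            break
    return pool[_stable_index(client_id, len(pool))]


print(l4_pick("203.0.113.7", 51514, "198.51.100.1", 443, WEB_POOL))
print(l7_pick("/images/cat.png", "user-42"))
print(l7_pick("/api/users", "user-42"))
```

`l4_pick` never receives the path, so it has no way to tell an image request from an API call. `l7_pick` can, because it reads the request first.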

🧩 How a Request Flows

Let’s trace one single request all the way through, so you can see all the pieces working together.

  • A client sends a request. It lands at the load balancer, not at any server directly.
  • The load balancer checks its list of healthy servers (the ones passing health checks) and skips any that are down.
  • From the healthy ones, it picks a server using its algorithm. The algorithm is just the rule for choosing, like “go around in a circle” (round robin) or “pick the one with the fewest active connections”.
  • It forwards the request to that chosen server. The server does its work and builds a response.
  • The response travels back through the load balancer to the client. The user just sees their page load, with no idea which server handled them.

Here’s that journey as a flow.

Client sends request → Load Balancer receives it → Pick a healthy server by algorithm → Forward to that server → Server builds response → Response returns to client

That’s it. The same simple loop happens millions of times a second on big sites, and the user never sees the machinery behind it.
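The same journey can be sketched in a few lines of Python. This is a toy model under simple assumptions: the backends are plain functions instead of real machines, the `healthy` set is filled in by hand instead of by real health checks, and the class and method names are invented for the example. It includes both picking rules mentioned above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

Handler = Callable[[str], str]  # takes a request, returns a response


@dataclass
class LoadBalancer:
    backends: Dict[str, Handler]                          # server name -> handler
    healthy: Set[str] = field(default_factory=set)        # servers passing checks
    active: Dict[str, int] = field(default_factory=dict)  # open connections per server
    _next: int = 0                                        # round-robin position

    def _candidates(self) -> List[str]:
        """Healthy servers only, in a stable order."""
        candidates = [name for name in self.backends if name in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy servers available")
        return candidates

    def pick_round_robin(self) -> str:
        """Go around the healthy servers in a circle."""
        candidates = self._candidates()
        choice = candidates[self._next % len(candidates)]
        self._next += 1
        return choice

    def pick_least_connections(self) -> str:
        """Choose the healthy server with the fewest active connections."""
        return min(self._candidates(), key=lambda name: self.active.get(name, 0))

    def handle(self, request: str,
               pick: Optional[Callable[[], str]] = None) -> str:
        server = (pick or self.pick_round_robin)()        # choose a healthy server
        self.active[server] = self.active.get(server, 0) + 1
        try:
            return self.backends[server](request)         # forward, get the response
        finally:
            self.active[server] -= 1                      # connection finished


def make_server(name: str) -> Handler:
    return lambda request: f"{name} handled {request}"


lb = LoadBalancer(
    backends={n: make_server(n) for n in ("server-1", "server-2", "server-3")},
    healthy={"server-1", "server-3"},                     # server-2 is down
)
for _ in range(4):
    print(lb.handle("GET /"))
```

With server-2 marked unhealthy, the four requests alternate between server-1 and server-3, and the caller only ever talks to `lb.handle`.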

🛡️ Don’t Make the LB a Single Point of Failure

Now here’s the trap that catches a lot of beginners. You set up a load balancer in front of ten servers, you feel safe, but think about it for a second:

  • If all traffic flows through one load balancer, and that one load balancer crashes, then your whole site goes down. Every single server behind it is now unreachable.
  • That’s called a single point of failure, or SPOF. A SPOF is any one part of the system that, if it breaks, takes everything down with it.
  • The fix is to run two or more load balancers, not just one. If the active one dies, a standby one takes over. This automatic handover is called failover.
  • Together this gives you high availability, which just means the system stays up even when individual pieces fail. Reliability is a big part of why you add a load balancer in the first place, so it would be silly to let it become a brand new way for everything to break.

Here’s what a redundant setup looks like, with a backup load balancer ready to step in.

[Diagram: clients send traffic to the active load balancer, with a standby load balancer ready to take over on failover. The load balancers forward to Server 1 and Server 2.]

So the rule of thumb: never let your load balancer be the only one of its kind. Always have a buddy ready to take over.
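Here is a minimal sketch of active/standby failover in Python. It assumes the standby watches the active node with a periodic heartbeat and takes over after a few missed beats in a row. The class names and the `max_missed` threshold are invented for the example. Real setups also have to redirect client traffic to the new active node, for example by moving a shared IP address, which this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass
class BalancerNode:
    name: str
    alive: bool = True


class FailoverPair:
    """Active/standby pair; the standby takes over after missed heartbeats."""

    def __init__(self, active: BalancerNode, standby: BalancerNode,
                 max_missed: int = 3) -> None:
        self.active = active
        self.standby = standby
        self.max_missed = max_missed
        self.missed = 0

    def heartbeat_tick(self) -> None:
        """Called on a timer: did the active node answer its heartbeat?"""
        if self.active.alive:
            self.missed = 0
            return
        self.missed += 1
        if self.missed >= self.max_missed and self.standby.alive:
            # Failover: swap roles so traffic goes to the former standby.
            self.active, self.standby = self.standby, self.active
            self.missed = 0

    def entry_point(self) -> str:
        """The node that clients' traffic should reach right now."""
        return self.active.name


pair = FailoverPair(BalancerNode("lb-active"), BalancerNode("lb-standby"))
pair.active.alive = False          # the active load balancer crashes
for _ in range(3):
    pair.heartbeat_tick()          # three missed heartbeats in a row
print(pair.entry_point())          # lb-standby
```

Waiting for a few missed heartbeats instead of one is a deliberate trade-off: it avoids flipping over on a single dropped packet, at the cost of a slightly longer outage when the failure is real.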

⚡ What a Load Balancer Also Does

Spreading traffic is the main job, but a modern load balancer (especially an L7 one) usually does a few extra things while it’s there. Here’s a quick taste of each, and we’ll go deeper on them in later lessons.

  • SSL termination: the load balancer handles the HTTPS encryption and decryption (these days the protocol is really TLS, though the “SSL” name has stuck), so your backend servers don’t have to. That takes load off them.
  • Sticky sessions: it can keep sending the same user to the same server, which matters when that server is remembering something about them.
  • Basic routing: an L7 load balancer can send different URLs to different server groups, like /api to one set and /static to another.

Don’t worry about mastering these right now. Just know the load balancer is more than a simple splitter, and each of these is a topic on its own.
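As a small taste of sticky sessions, here is a rough Python sketch. It assumes each user carries some session ID (in practice this is often stored in a cookie) and that the load balancer keeps a table of which server each session was given. The `StickyRouter` name and its methods are made up for the example.

```python
from typing import Dict, List, Optional, Set


class StickyRouter:
    """Keeps sending the same session to the same server while it is healthy."""

    def __init__(self, pool: List[str]) -> None:
        self.pool = pool
        self.assignments: Dict[str, str] = {}  # session ID -> server
        self._next = 0                         # round-robin position for new sessions

    def route(self, session_id: str, healthy: Optional[Set[str]] = None) -> str:
        # If no health information is given, treat every server as healthy.
        healthy_pool = [s for s in self.pool if healthy is None or s in healthy]
        if not healthy_pool:
            raise RuntimeError("no healthy servers available")
        server = self.assignments.get(session_id)
        if server not in healthy_pool:
            # New session, or its old server is down: assign a fresh one.
            server = healthy_pool[self._next % len(healthy_pool)]
            self._next += 1
            self.assignments[session_id] = server
        return server


router = StickyRouter(["server-1", "server-2", "server-3"])
print(router.route("alice"))  # first visit: gets a server
print(router.route("bob"))    # a different user may land elsewhere
print(router.route("alice"))  # same server as alice's first visit
```

The catch shows up in the reassignment branch: if a user’s server dies, they get moved, and whatever that server was remembering about them is gone unless it was stored somewhere shared.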

⚠️ Common Mistakes and Misconceptions

A few ideas trip people up early on. Let’s clear them out before they stick.

  • “One load balancer is enough.” Nope. One load balancer means one single point of failure. You need at least two with failover, or you’ve just moved the risk, not removed it.
  • “L4 and L7 are basically the same.” They’re not. L4 only sees IP and port and forwards blindly, while L7 reads the actual request and can route by URL or headers. Different powers, different costs.
  • “Skipping health checks is fine, my servers rarely crash.” Without health checks, the load balancer can’t tell a dead server from a live one, so it keeps sending users into errors. Health checks aren’t optional for a real system.
  • “The load balancer holds my app’s data.” It doesn’t. It just forwards traffic. Your servers and databases hold the data. The load balancer is a traffic director, not a storage box.

🛠️ Design Challenge

Try this one on your own to test yourself.

Imagine Alex runs a photo-sharing site. There’s a set of servers for the website pages and a separate set for serving images, because images are heavy. Sketch out a load balancer setup for this and answer these:

  • Would you reach for an L4 or an L7 load balancer here, and why?
  • How would the load balancer know to send /images requests to the image servers and everything else to the web servers?
  • Where would you add a second load balancer so the whole thing doesn’t go down if one fails?

If you can talk through that, you understand load balancer architecture well enough for most interviews.

🧩 What You’ve Learned

You can now explain how a load balancer is built and where it fits. Here’s what you’ve picked up.

  • ✅ The load balancer sits between clients and the server pool and acts as a reverse proxy.
  • ✅ Health checks let it skip dead servers and route only to healthy ones, making the system self-healing.
  • ✅ L4 balances by IP and port for raw speed, while L7 reads the request and routes by URL or headers.
  • ✅ A request flows in to the load balancer, gets sent to a healthy server by an algorithm, and the response flows back.
  • ✅ Running two or more load balancers with failover avoids a single point of failure and gives high availability.
  • ✅ A load balancer also handles SSL termination, sticky sessions, and basic routing.

Check Your Knowledge

Test what you learned. Each question is followed by a short explanation of the answer.

  1. Where does a load balancer sit in a system?

     Why: The load balancer sits between clients and the server pool, receiving each request and forwarding it to a healthy server.

  2. What is the key difference between an L4 and an L7 load balancer?

     Why: L4 works at the connection level using IP and port, while L7 reads the actual request so it can route by content.

  3. Why are health checks important in a load balancer setup?

     Why: Health checks make the system self-healing by routing around servers that stop responding.

  4. How do you keep the load balancer from being a single point of failure?

     Why: Running more than one load balancer with failover means a standby can take over if the active one dies.

🚀 What’s Next?

You’ve got the architecture down. Next, step back for the big picture and then zoom into how it actually picks a server.

Once you’ve got those, you’ll be ready to compare the different load balancing algorithms and pick the right one for the job.
