Least Connections Load Balancing
Picture this. You’ve got three servers behind a load balancer, and you’re sending requests to each one in turn, one after the other. That’s round robin, and it works great when every request is small and quick.
But what happens when some requests take a few milliseconds and others take thirty seconds? For example:
- One user asks for a tiny webpage. Done in a blink.
- Another user kicks off a big report that runs for half a minute.
Round robin doesn’t know the difference. It just keeps handing out requests in order, like dealing cards.
So one server can quietly pile up a stack of slow, heavy requests while another sits there almost idle. That’s the problem least connections fixes, and we’ll see exactly how.
🎯 The Problem With Round Robin
Round robin is the simplest way to spread requests around. It just goes server 1, server 2, server 3, then back to server 1, over and over. Fair and easy, right?
But here’s the thing it completely ignores:
- It doesn’t look at how busy each server actually is right now. It only counts whose turn it is next.
- It assumes every request costs about the same. In real life, that’s often not true.
- So if server 1 happens to catch a bunch of slow requests, round robin keeps sending it more anyway, because it’s just following the rotation.
Let’s make that concrete with Alex’s app. Say Alex runs a service where most requests are quick, but some users export huge files:
- Server 1 gets handed three export jobs in a row by sheer bad luck.
- Those jobs hang around for a long time, eating up the server’s resources.
- Meanwhile round robin has moved on and comes back to server 1 again, piling a fourth job on top.
- Server 2 and server 3 finished their quick requests ages ago and are basically twiddling their thumbs.
The work isn’t really balanced at all. It’s balanced by turn, not by actual load. We need something that looks at what’s really going on.
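To see that blind spot in code, here’s a minimal Python sketch of a round robin rotation. The server names and request costs are made up for illustration (every third request is a slow export, so they all happen to land on server 1); nothing here comes from a real load balancer.

```python
from itertools import cycle

servers = ["server1", "server2", "server3"]
rotation = cycle(servers)  # server1, server2, server3, server1, ...

# Hypothetical request costs in milliseconds: every third one is a slow export.
costs_ms = [30000, 10, 10, 30000, 10, 10, 30000, 10, 10]

work_ms = {name: 0 for name in servers}
for cost in costs_ms:
    target = next(rotation)   # round robin only asks "whose turn is it?"
    work_ms[target] += cost   # it never looks at how much the server already has

print(work_ms)
```

Each server received exactly three requests, so the rotation looks fair, yet server 1 ends up with nearly all of the actual work.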
🔢 What is Least Connections
Least connections is a load balancing rule that does exactly that. Here’s the core idea in one line:
- The load balancer sends each new request to the server that has the fewest active connections right now.
Before we go further, let’s nail down one term. What’s an active connection?
- An active connection is an open connection between the load balancer and a server. In this lesson, think of it as a request that a server is currently handling and hasn’t finished yet.
- The moment a request comes in and the server starts working on it, that’s one active connection.
- The moment the server sends its answer back and closes things up, that connection is gone and the count drops by one.
So the connection count is a live measure of how busy a server is at this exact moment. A server stuck with three slow export jobs has a high count. A server that just finished its quick requests has a low count.
Least connections uses that count to decide. New request comes in? Send it to whoever has the smallest pile. Simple as that.
Why connection count is a good signal
A connection that’s still open usually means the server is still doing work for it. So counting open connections is a cheap, quick way to guess which server has the least on its plate right now, without measuring CPU or memory directly.
⚙️ How It Works
The load balancer keeps a little tally. For every server behind it, it tracks how many connections are currently open. Here’s the loop it runs:
- A new request arrives at the load balancer.
- It looks at its tally and finds the server with the lowest active connection count.
- It sends the request there, and bumps that server’s count up by one.
- When that request finishes, the server’s count drops back down by one.
So the tally is always shifting as requests come and go, and the load balancer always picks from the freshest numbers.
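The loop above can be sketched as a tiny in-memory tally. This is a simplified illustration, not how any particular load balancer implements it: the class name, the `acquire` and `release` method names, and the tie-breaking rule (first server in the list wins) are all assumptions of this sketch.

```python
class LeastConnectionsBalancer:
    """Toy tally of active connections; names are illustrative only."""

    def __init__(self, servers):
        # One active-connection counter per server, all starting at zero.
        self.active = {name: 0 for name in servers}

    def acquire(self):
        # Find the server with the lowest count. On a tie, min() returns the
        # first one in insertion order (an assumption of this sketch).
        target = min(self.active, key=self.active.get)
        self.active[target] += 1  # bump the count for the new request
        return target

    def release(self, server):
        # Call this when a request finishes so the count drops back by one.
        if self.active[server] == 0:
            raise ValueError(f"{server} has no active connections")
        self.active[server] -= 1


lb = LeastConnectionsBalancer(["server1", "server2", "server3"])
first = lb.acquire()    # all counts are 0, so the tie goes to server1
second = lb.acquire()   # server1 now holds 1, so server2 is picked
lb.release(first)       # server1's request finishes
third = lb.acquire()    # server1 is back at 0 and is picked again
print(first, second, third, lb.active)
```

This toy version isn’t thread-safe. A balancer that handles requests concurrently would need to guard the tally so two requests can’t read the same stale count.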
Picture the moment a request lands while three servers carry different loads, with server 2 holding the fewest active connections.
In this snapshot, server 2 has the lightest load, so the new request goes there. If server 3 had just freed up a bunch of connections, it might win the next round instead. The decision changes every time because the load changes every time.
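Here’s one way that snapshot could look in Python. The counts are made up for illustration; the only thing taken from the text is that server 2 starts with the fewest connections and server 3 frees some up afterwards.

```python
# Hypothetical active connection counts at the moment a request arrives.
snapshot = {"server1": 5, "server2": 2, "server3": 4}

target = min(snapshot, key=snapshot.get)  # server2 has the smallest count
snapshot[target] += 1                     # server2 now holds 3

# A moment later, three of server3's requests finish.
snapshot["server3"] -= 3                  # server3 drops to 1

next_target = min(snapshot, key=snapshot.get)  # now server3 is the lightest
print(target, next_target)
```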
⚖️ Weighted Least Connections
Plain least connections assumes all your servers are equally powerful. But that’s not always true, right? Maybe one server is a beefy machine and another is older and weaker. That’s where weighted least connections comes in.
- You give each server a weight, a number that says how much capacity it has compared to the others.
- A powerful server gets a higher weight, so it’s allowed to hold more connections before it’s considered “busy”.
- Instead of comparing raw connection counts, the load balancer compares connections divided by weight.
Let’s see it with two servers Alex is running:
- Server A is a big machine with weight 4. It currently holds 8 connections. So its score is 8 divided by 4, which is 2.
- Server B is a smaller machine with weight 1. It currently holds 3 connections. So its score is 3 divided by 1, which is 3.
- Server A has more raw connections, but its score is lower, so the next request goes to server A.
So even though server A is handling more requests, it’s a stronger machine, and the math knows it still has room to spare. Weighted least connections balances by real capacity, not just by raw counts.
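The same arithmetic, written out as a short Python sketch. The weights and connection counts are the ones from Alex’s two servers above; the dictionary layout and the `score` helper are just one way to arrange it.

```python
servers = {
    "A": {"weight": 4, "connections": 8},
    "B": {"weight": 1, "connections": 3},
}

def score(stats):
    # Connections divided by weight: lower means more room to spare.
    return stats["connections"] / stats["weight"]

scores = {name: score(stats) for name, stats in servers.items()}
target = min(scores, key=scores.get)

print(scores)  # {'A': 2.0, 'B': 3.0}
print(target)  # A
```

If you’d rather avoid division, you can compare the same two servers by cross-multiplying: 8 × 1 is less than 3 × 4, so server A still wins.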
⚖️ Round Robin vs Least Connections
These two get compared a lot, so let’s put them side by side.
| Aspect | Round Robin | Least Connections |
|---|---|---|
| How it picks | Next server in rotation | Server with fewest active connections |
| Looks at current load? | No | Yes |
| Best when requests are | Short and similar in duration | Uneven, some short some long |
| Work the load balancer does | Almost none, just counts turns | Tracks connection counts per server |
| Risk | Can overload a server with slow requests | Slightly more bookkeeping |
✅ When to Use It
Least connections shines in a few clear situations. Reach for it when:
- Your requests vary a lot in duration. Some finish instantly, others run for ages. This is the classic case where round robin falls apart.
- You have long-lived connections, like WebSockets, streaming, or database sessions that stay open for a while. These hang around and really need to be spread by actual load.
- You want better load awareness in general. Sending work to the least busy server keeps any single machine from getting buried.
So if Alex’s traffic is a messy mix of quick page loads and heavy export jobs, least connections is a much better fit than round robin. It naturally steers new work away from the servers that are already swamped.
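To get a feel for the difference, here’s a rough simulation of that kind of mixed traffic. Everything in it is an assumption made for illustration: one request arrives per tick, every third request is a heavy job lasting 30 ticks, the rest last 1 tick, and ties go to the first server in the list. It tracks the peak number of active connections each server ever holds.

```python
from itertools import cycle

SERVERS = ["server1", "server2", "server3"]

def simulate(durations, pick):
    """Feed one request per tick; return each server's peak active count."""
    finish_ticks = {s: [] for s in SERVERS}  # end tick of each in-flight request
    peak = {s: 0 for s in SERVERS}
    for tick, duration in enumerate(durations):
        # Drop requests that have finished by this tick.
        for s in SERVERS:
            finish_ticks[s] = [t for t in finish_ticks[s] if t > tick]
        active = {s: len(finish_ticks[s]) for s in SERVERS}
        target = pick(active)
        finish_ticks[target].append(tick + duration)
        peak[target] = max(peak[target], len(finish_ticks[target]))
    return peak

# Hypothetical traffic: a 30-tick export followed by two 1-tick page loads, repeated.
durations = [30, 1, 1] * 4

rotation = cycle(SERVERS)
rr_peak = simulate(durations, lambda active: next(rotation))
lc_peak = simulate(durations, lambda active: min(active, key=active.get))

print("round robin:      ", rr_peak)
print("least connections:", lc_peak)
```

With this particular traffic pattern, round robin stacks all four exports on server 1, while least connections never lets any server hold more than two requests at once. A different pattern would give different numbers, so treat this as a demonstration rather than a benchmark.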
⚠️ The Cost
Nothing’s free, right? Least connections does ask a little more of your load balancer than round robin does:
- The load balancer has to track an active connection count for every server and keep it updated as requests start and finish.
- That’s more state to manage and a tiny bit more work per decision than just rotating through a list.
- In most setups this cost is small and well worth it. But it’s why round robin still wins when all your requests are short and look the same, because then the extra tracking buys you nothing.
⚠️ Common Mistakes and Misconceptions
A couple of ideas trip people up here. Let’s clear them up:
- “Fewest connections always means least loaded.” Usually true, but not always. A server could have one connection that’s pinning its CPU at 100 percent, while another has five tiny idle connections. Connection count is a good guess at load, not an exact measurement of it.
- “Least connections is always better than round robin.” Nope. If all your requests are short and roughly equal, round robin does just as well with less overhead. Least connections only pulls ahead when request durations are uneven.
🛠️ Design Challenge
Try this one on your own to test yourself.
Imagine Alex runs a video processing service. Most requests are tiny thumbnail fetches that finish instantly, but some are full video encodes that run for a minute or more. There are three servers behind the load balancer.
- Would you pick round robin or least connections here, and why?
- Now say one server is twice as powerful as the others. How would weighted least connections change your setup?
- Sketch out what happens to each server’s connection count over a burst of mixed requests.
Walk through it out loud. If you can explain why round robin would bury one server while least connections spreads the load, you’ve really got it.
🧩 What You’ve Learned
You can now explain how least connections balances real load, not just turns. Here’s what you’ve picked up.
- ✅ Least connections sends each new request to the server with the fewest active connections right now.
- ✅ An active connection is a request a server is still handling, so the count is a live measure of how busy it is.
- ✅ It beats round robin when request durations are uneven or connections are long-lived.
- ✅ Weighted least connections factors in server power by comparing connections divided by weight.
- ✅ The cost is a little extra bookkeeping, since the load balancer must track connection counts per server.
Check Your Knowledge
Test what you learned. Answer each question in your own words, then compare with the explanation.
- Question 1: How does the least connections algorithm choose a server?
  Why: Least connections looks at the live connection count and sends new work to the least busy server.
- Question 2: What is an active connection?
  Why: An active connection is an in-progress request, so counting them is a live measure of how busy a server is.
- Question 3: When does least connections beat round robin?
  Why: Uneven or long-lived requests can pile up on one server with round robin, while least connections steers work away from busy machines.
- Question 4: How does weighted least connections handle servers of different power?
  Why: By comparing connections divided by weight, a stronger server can hold more connections before it is treated as busy.
🚀 What’s Next?
You’ve now got two load balancing strategies in your toolkit, and you know when each one fits. Next, go deeper.
- Round Robin Load Balancing is the simple rotation method we kept comparing against, worth knowing cold.
- Sticky Sessions shows how to keep a user pinned to the same server when that matters, like for login sessions.
Once you’ve got these, you’ll be able to reason about how real systems spread traffic at scale.