Latency vs Throughput
Here's a question that trips up a lot of people:
- A site that feels lightning fast for one user, and a site that can serve millions of users at once, are those the same thing?
- Sounds like they should be, right? Fast is fast.
- But they're actually two different ideas, and they have names: latency and throughput.
So let's untangle them:
- Latency is about how quickly one request gets answered.
- Throughput is about how many requests the system can handle in a second.
- A system can be great at one and weak at the other. That's the whole point of this lesson.
By the end, you'll know exactly what each word means, how they relate, and how to talk about them in an interview without mixing them up.
Why People Mix These Up
Here's the pain:
- You read "this server is fast" and you don't really know what fast means.
- Does it mean each request comes back quickly? Or that it can handle a huge crowd?
- Those are different things, and using the wrong one in an interview makes you sound shaky.
The thing is, both words are about performance, so people lump them together:
- They both feel like âspeedâ, so the brain treats them as one.
- But one measures time, and the other measures volume. Very different.
- A system can answer a single request slowly and still handle a massive number of them, or the other way around.
We'll keep this beginner-correct. Simple words, but technically right, the way you'd want it in a real interview.
Real-World Analogy
Picture a highway. This one analogy will carry the whole lesson, so hold onto it:
- A car driving across the highway is like one request going through your system.
- How long it takes a single car to cross from one end to the other, that's latency.
- How many cars cross a given point each minute, that's throughput.
Now here's the cool part:
- Imagine a wide highway with ten lanes. Lots of cars can cross every minute, so throughput is high.
- But each individual car might still drive slowly because of speed limits, so latency for one car is not great.
- Widening the road (adding lanes) lets more cars through, but it doesn't make any single car faster.
- Raising the speed limit makes each car finish quicker, but it doesn't add lanes.
See the split? Lanes are throughput. Speed is latency. You can change one without touching the other.
What is Latency
Latency is the time it takes for one request to get a response. That's the whole definition. Let's unpack it:
- You send a request, the system does its work, and the answer comes back. The clock running during all of that is latency.
- It's measured in milliseconds, written ms. One millisecond is a thousandth of a second.
- Lower is better. A request that comes back in 20 ms feels instant. One that takes 2000 ms (two whole seconds) feels sluggish.
A quick everyday example:
- You tap a button in an app and wait for the screen to update.
- That little wait, from tap to result, is the latency you're feeling.
- People also call this the response time, and for most beginner purposes you can treat the two as the same.
A simple way to remember latency
Latency is about waiting. If you're tapping your foot waiting for one thing to load, you're feeling latency. Lower latency means less waiting.
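To make "the clock running during all of that" concrete, here is a minimal Python sketch of timing one request. The `handle_request` function is a made-up stand-in (it just sleeps for about 20 ms), so treat this as an illustration of the idea rather than a real benchmark.

```python
import time


def handle_request() -> str:
    """Hypothetical stand-in for real work; sleeps for roughly 20 ms."""
    time.sleep(0.02)
    return "ok"


def measure_latency_ms(func) -> float:
    """Return how long a single call to func takes, in milliseconds."""
    start = time.perf_counter()
    func()
    elapsed_seconds = time.perf_counter() - start
    return elapsed_seconds * 1000.0


latency_ms = measure_latency_ms(handle_request)
print(f"one request took {latency_ms:.1f} ms")
```

The number you see will sit a little above 20 ms, because the timer also captures the small overhead of calling the function.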
What is Throughput
Throughput is how many requests the system handles per unit of time. Here's what that really means:
- It's a count, not a clock. You're asking "how many requests got served in one second?"
- It's measured in requests per second, often written as RPS (you'll also hear QPS, queries per second, which is basically the same idea).
- Higher is better. A system doing 50,000 requests per second can serve a much bigger crowd than one doing 500.
Think of it like a delivery service:
- Latency is how long one package takes to reach the customer.
- Throughput is how many packages the whole service delivers in an hour.
- A big company with many trucks delivers tons of packages per hour, even if any single package still takes a day to arrive.
So throughput is about volume, about the size of the crowd you can serve at once.
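Counting instead of timing looks like this in a small Python sketch. The `handle_request` function and the request count are assumptions chosen for illustration; the only real idea is "requests served divided by seconds elapsed".

```python
import time


def handle_request() -> None:
    """Hypothetical stand-in for real work; sleeps for roughly 2 ms."""
    time.sleep(0.002)


def measure_throughput_rps(func, total_requests: int) -> float:
    """Run func total_requests times, one after another, and return requests per second."""
    start = time.perf_counter()
    for _ in range(total_requests):
        func()
    elapsed_seconds = time.perf_counter() - start
    return total_requests / elapsed_seconds


rps = measure_throughput_rps(handle_request, total_requests=100)
print(f"handled about {rps:.0f} requests per second")
```

This loop handles one request at a time, so its throughput is roughly one divided by the latency. Real systems work on many requests at once, which is exactly why the two numbers can drift apart.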
Latency vs Throughput
Let's put them side by side so the difference is crystal clear.
| | Latency | Throughput |
|---|---|---|
| What it measures | Time for one request | Number of requests handled per unit of time |
| Unit | Milliseconds (ms) | Requests per second (RPS) |
| Good means | Lower is better | Higher is better |
| Example | One car crosses in 30 seconds | 600 cars cross per minute |
If you remember just one thing from this table: latency is time, throughput is volume.
How They Relate
Now, are these two always fighting each other? Not exactly. They're separate dials, but sometimes turning one nudges the other. Let's walk through it:
- You can have low latency and low throughput. Picture a tiny shop where the one cashier serves each customer in seconds, but only one at a time, so the line backs up fast.
- You can have high throughput and higher latency. Picture a giant warehouse that processes thousands of orders, but each order sits in a queue for a while before it's handled.
- And you can have both. That's the goal, but it usually costs more money and effort.
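The shop and the warehouse can be put into rough numbers. The sketch below assumes a simplified model where every worker handles exactly one request at a time and there is no waiting in line, so the best possible throughput is workers divided by time per request. All the figures are made up for illustration.

```python
def max_throughput_per_second(workers: int, seconds_per_request: float) -> float:
    """Upper bound on requests per second when each worker handles one request at a time."""
    return workers / seconds_per_request


# Tiny shop: one cashier, 5 seconds per customer (made-up numbers).
shop = max_throughput_per_second(workers=1, seconds_per_request=5.0)

# Giant warehouse: 500 workers, 60 seconds per order (made-up numbers).
warehouse = max_throughput_per_second(workers=500, seconds_per_request=60.0)

print(f"shop:      {shop:.2f} customers/s, each served in 5 s")
print(f"warehouse: {warehouse:.2f} orders/s, each handled in 60 s")
```

Under these assumptions the shop wins on latency (5 seconds versus 60) while the warehouse wins on throughput, which is the "separate dials" point in numbers.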
So here's the key idea, the one interviewers love:
- Latency and throughput are different dials, not the same dial.
- They're not always a trade-off. Often you can improve both.
- But sometimes they do trade off. If you batch many requests together to push throughput up, each request might wait a bit longer, so latency goes up too.
The takeaway: don't assume fixing one fixes the other. Always ask which one actually matters for the problem in front of you.
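The batching trade-off is easier to see with a tiny model. This Python sketch assumes each batch pays a fixed cost once (say, one network round trip) plus a small cost per request, and that every request waits for its whole batch to finish. The two cost constants are invented for illustration, and the model ignores the extra time spent waiting for a batch to fill up.

```python
FIXED_OVERHEAD_MS = 10.0  # assumed cost paid once per batch
PER_ITEM_MS = 1.0         # assumed cost paid for every request in the batch


def batch_stats(batch_size: int) -> tuple[float, float]:
    """Return (latency_ms, throughput_rps) for one worker processing batches of this size."""
    batch_time_ms = FIXED_OVERHEAD_MS + PER_ITEM_MS * batch_size
    latency_ms = batch_time_ms  # every request waits for the whole batch to finish
    throughput_rps = batch_size / (batch_time_ms / 1000.0)
    return latency_ms, throughput_rps


for size in (1, 10, 100):
    batch_latency_ms, batch_rps = batch_stats(size)
    print(f"batch of {size:>3}: {batch_latency_ms:6.1f} ms per request, {batch_rps:6.1f} requests/s")
```

With these made-up costs, going from a batch of 1 to a batch of 100 raises throughput from about 91 to about 909 requests per second, while each request's latency grows from 11 ms to 110 ms.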
How to Improve Each
Because they're separate dials, you tune them in different ways. Let's split it up.
To bring latency down (make each request faster):
- Use caching. Keep ready-made answers nearby so the system doesn't redo the same work every time. (Caching just means saving a result so you can reuse it.)
- Use a CDN. That's a Content Delivery Network, a set of servers spread around the world, so the data comes from somewhere close to the user instead of far away.
- Put servers closer to your users. The shorter the distance, the less time data spends traveling.
- Do less work per request. Trim slow database calls and skip steps you don't really need.
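Caching is the easiest of these to try at home. Here is a minimal sketch using Python's built-in `functools.lru_cache`. The `slow_lookup` function is a hypothetical stand-in for something like a database query, and the 50 ms delay is an assumption.

```python
import time
from functools import lru_cache


def slow_lookup(user_id: int) -> str:
    """Hypothetical slow call (think: a database query); sleeps for roughly 50 ms."""
    time.sleep(0.05)
    return f"profile-{user_id}"


@lru_cache(maxsize=1024)
def cached_lookup(user_id: int) -> str:
    """Same result as slow_lookup, but repeat calls are answered from memory."""
    return slow_lookup(user_id)


def timed_ms(func, *args) -> float:
    """Return how long one call to func(*args) takes, in milliseconds."""
    start = time.perf_counter()
    func(*args)
    return (time.perf_counter() - start) * 1000.0


first_ms = timed_ms(cached_lookup, 42)   # cache miss: does the slow work
second_ms = timed_ms(cached_lookup, 42)  # cache hit: skips the slow work
print(f"first call:  {first_ms:.1f} ms")
print(f"second call: {second_ms:.3f} ms")
```

The first call pays the full cost, and the second comes back almost instantly because the saved answer is reused. A real cache also needs a plan for stale data, which this sketch leaves out.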
To push throughput up (handle more requests at once):
- Add more servers. More machines means more requests handled in parallel. (Parallel just means many things happening at the same time.)
- Use parallelism inside each server too, so it works on several requests together instead of one by one.
- Use queues. A queue lines requests up and feeds them in steadily, so a sudden rush doesn't overwhelm the system. (A queue is just a waiting line for requests.)
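To see parallelism lift throughput, here is a small Python sketch with `concurrent.futures.ThreadPoolExecutor`. It assumes each request spends about 50 ms waiting (like waiting on a database), and the worker counts are picked just for the demo.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id: int) -> int:
    """Hypothetical request that spends roughly 50 ms waiting (e.g. on a database)."""
    time.sleep(0.05)
    return request_id


def measure_rps(worker_count: int, total_requests: int) -> float:
    """Handle total_requests using worker_count threads; return requests per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        results = list(pool.map(handle_request, range(total_requests)))
    elapsed_seconds = time.perf_counter() - start
    assert len(results) == total_requests
    return total_requests / elapsed_seconds


one_worker_rps = measure_rps(worker_count=1, total_requests=16)
eight_workers_rps = measure_rps(worker_count=8, total_requests=16)
print(f"1 worker:  {one_worker_rps:.0f} requests/s")
print(f"8 workers: {eight_workers_rps:.0f} requests/s")
```

Notice that each request still takes about 50 ms either way. More workers added lanes to the highway; they didn't raise the speed limit.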
Notice the overlap
Some tricks help both. Caching makes one request faster (lower latency) and also frees the server to handle more requests (higher throughput). That's why caching shows up everywhere in system design.
Common Mistakes and Misconceptions
A few things trip people up here. Let's clear them out:
- "Fast equals high throughput." No. Fast usually means low latency, which is about one request. A site can answer one request quickly but still crash when a crowd shows up.
- "Latency and throughput are the same thing." They're not. One is time per request, the other is requests per second. Different units, different meaning.
- "If I fix latency, throughput is handled too." Not always. Making one request faster doesn't automatically let you serve more of them at once. Sometimes it helps, sometimes it doesn't.
- "Optimize one and ignore the other." Risky. A fast site that crashes under load is bad. A site that handles millions but feels slow to each person is also bad. You usually have to watch both.
Design Challenge
Try this on your own to test yourself.
Alex is building a photo-sharing app. The home feed feels quick for one person, but the app keeps crashing when lots of users open it at the same time. Think it through:
- Which dial is the problem here, latency or throughput?
- Latency is fine, since one user gets a fast feed. The trouble shows up under a crowd, so it's a throughput problem.
- What would you try first? Maybe add more servers, or put a queue in front so a rush of users doesn't overwhelm things.
Now flip it. Imagine the feed loads slowly even for a single user on an empty app. That's a latency problem instead. See how naming the right dial points you straight to the fix?
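If you want to picture the "queue in front" idea from the challenge, here is a rough Python sketch using the standard `queue` and `threading` modules. The buffer size, the worker count, and the 50-request rush are all assumptions, and the "work" is just recording the request id.

```python
import queue
import threading

pending = queue.Queue(maxsize=100)  # hypothetical buffer size
handled: list[int] = []
handled_lock = threading.Lock()


def worker() -> None:
    """Pull requests off the queue one at a time until a None sentinel arrives."""
    while True:
        request_id = pending.get()
        if request_id is None:
            break
        with handled_lock:
            handled.append(request_id)


WORKER_COUNT = 4
threads = [threading.Thread(target=worker) for _ in range(WORKER_COUNT)]
for t in threads:
    t.start()

# A sudden rush of 50 requests lands in the queue instead of hitting the workers all at once.
for request_id in range(50):
    pending.put(request_id)

# One sentinel per worker tells each thread to stop.
for _ in threads:
    pending.put(None)
for t in threads:
    t.join()

print(f"handled {len(handled)} requests with {WORKER_COUNT} workers")
```

The workers drain the line at their own pace, so the rush gets absorbed instead of knocking the app over. Requests near the back of the line wait longer, though, so a queue protects the system under load rather than making each request faster.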
What You've Learned
You can now tell these two apart with confidence. Here's what you've picked up.
- Latency is the time for one request to get a response, measured in milliseconds, where lower is better.
- Throughput is how many requests the system handles per second, where higher is better.
- They're different dials: one measures time, the other measures volume.
- They're not always a trade-off, but pushing one hard can sometimes affect the other.
- You lower latency with caching, CDNs, closer servers, and less work per request.
- You raise throughput with more servers, parallelism, and queues.
Check Your Knowledge
Test what you learned.
1. What does latency measure?
   Why: Latency is the time for a single request, measured in milliseconds, where lower is better.
2. What does throughput measure?
   Why: Throughput is requests per second, where higher is better.
3. Are latency and throughput always a trade-off?
   Why: They are separate dials; often both improve, though pushing one hard can sometimes affect the other.
4. Which approach mainly lowers latency?
   Why: Caching, CDNs, and closer servers all cut the time a single request takes.
What's Next?
You've got the two core performance words down. Next, go deeper into the tools that move these dials.
- Introduction to Caching shows how saved answers cut latency and free up servers for more throughput.
- What is Scalability? covers how systems grow to handle bigger and bigger crowds without falling over.
Once you've got those, you'll be able to reason about performance the way real system design interviews expect.