Token Bucket Algorithm


Say you’re building an API. You want to be friendly, so if a user fires off a few quick requests in a row, that’s fine, let them through. But you also can’t let someone hammer your server forever. So here’s the tricky part:

You want to allow a short burst now and then, like ten requests in one second.
But over the long run, you still want to cap things, like maybe five requests per second on average.
“Allow bursts, but cap the overall rate” sounds like two opposite rules, right?

Good news. There’s one neat little algorithm that does exactly this, and it’s everywhere in real systems. It’s called the token bucket, and by the end of this lesson you’ll be able to explain it cold, even to an interviewer.

🎯 Recap: Rate Limiting

Quick reminder before we dive in. Rate limiting just means putting a cap on how many requests someone can make in a given time, so one user can’t overload your server. If that idea feels fuzzy, go read Rate Limiting Explained first, then come back here. Token bucket is one specific way to actually do rate limiting.

🪣 The Idea

Picture a real bucket sitting next to your server, and it holds little coins called tokens. Here’s the whole idea in a few lines:

There’s a bucket, and it holds tokens. A token is just a permission slip, one token lets one request through.
The bucket gets refilled at a steady rate, like one token every second. This steady drip is called the refill rate.
Every time a request comes in, it has to take one token out of the bucket. We say the request “spends” a token.
If a token is there, the request grabs it and goes through. If the bucket is empty, there’s no token to spend, so the request gets rejected.
The bucket has a maximum size. It can’t hold more than that, no matter how long it’s been filling. This max size is the bucket capacity, and it’s the same thing as your biggest allowed burst.

So that’s it. Tokens trickle in at a fixed pace, requests carry them away, and an empty bucket means “sorry, try again in a moment.”
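If you'd like to see those rules as code, here's a minimal sketch in Python. The class name, the method name, and the choice to start with a full bucket are all assumptions made for illustration, and time is passed in by hand so the example behaves the same every run.

```python
class TokenBucket:
    """A minimal token bucket. All names here are illustrative."""

    def __init__(self, capacity, refill_rate, now=0.0):
        self.capacity = capacity        # most tokens the bucket can hold (burst size)
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)   # assumption: the bucket starts full
        self.last_refill = now          # time of the last refill, in seconds

    def allow(self, now):
        """Return True if a request arriving at time `now` may go through."""
        # Add the tokens that dripped in since the last call, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        # Spend one token if there is one, otherwise reject.
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(capacity=3, refill_rate=1)
print([bucket.allow(0.0) for _ in range(4)])  # [True, True, True, False]
print(bucket.allow(1.0))                      # True, one token dripped in
```

Three requests spend the three saved tokens, the fourth finds the bucket empty, and a second later one new token is waiting.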

Why tokens, not a counter?

You could just count requests, sure. But tokens give you a clean way to handle two things at once: the slow steady drip controls your long-term rate, and the bucket size controls how big a sudden burst you’ll tolerate. One mechanism, two knobs.

⚙️ How It Works

There are really only three moving parts, and once you’ve got them, the whole thing clicks. Let’s go through them:

Refill rate. This is how fast tokens get added back, say one token per second. It sets your long-term average rate. Refill more tokens per second and you allow more traffic over time.
Bucket capacity. This is the most tokens the bucket can ever hold at once. It sets your burst size. A bigger bucket means a bigger sudden spike is allowed.
Spend on request. Every incoming request tries to take one token. Got a token? You’re allowed in. No token left? You’re rejected, usually with a “too many requests” response (HTTP status 429 in web APIs).

Here’s the part that trips people up. Tokens keep refilling even when nobody’s making requests. So if your service sits quiet for a while, the bucket slowly fills back up to capacity, ready for the next burst. It never fills past capacity, though. Once the bucket is full, any extra tokens just spill over and are lost.
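One common way to get that quiet refill without running a real background timer is to work it out lazily: when a request shows up, compute how many tokens would have dripped in since the last one. Here's a small sketch of just that calculation. The function name and parameters are made up for this example.

```python
def refill(tokens, capacity, refill_rate, elapsed_seconds):
    """Return the token count after `elapsed_seconds` of refilling."""
    # min() is the "spill over" rule: nothing is kept beyond capacity.
    return min(capacity, tokens + refill_rate * elapsed_seconds)


print(refill(0, 10, 1, 4))   # 4, four seconds of drip into an empty bucket
print(refill(0, 10, 1, 60))  # 10, capped at capacity, the other 50 are lost
print(refill(7, 10, 1, 60))  # 10
```

A minute of silence doesn't bank 60 tokens. The bucket tops out at 10, so the next burst can never be bigger than the capacity.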

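Let’s walk the decision a single request goes through:

A request arrives and checks the bucket. If there’s a token, the request spends it and goes through. If the bucket is empty, the request is rejected. Meanwhile, the refill keeps adding tokens at the steady rate, up to capacity.

Notice that refill step feeding back into the bucket. That’s the part working quietly in the background the whole time, topping the bucket up while requests are busy draining it.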
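The same decision can be sketched as one function. This is only an illustration: the Bucket dataclass, the decide function, and the idea of returning a wait time alongside the status are assumptions for this example, not part of any particular library. The 200 and 429 values are the usual HTTP codes for "OK" and "Too Many Requests".

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    capacity: float
    refill_rate: float  # tokens per second
    tokens: float
    last_refill: float  # seconds


def decide(bucket, now):
    """Walk one request through the flow: refill, check, then spend or reject.

    Returns (status, wait_seconds).
    """
    # Step 1: top up for the time that has passed, never past capacity.
    elapsed = now - bucket.last_refill
    bucket.tokens = min(bucket.capacity, bucket.tokens + elapsed * bucket.refill_rate)
    bucket.last_refill = now
    # Step 2: is there a token?
    if bucket.tokens >= 1:
        bucket.tokens -= 1  # Step 3a: spend it and let the request in
        return 200, 0.0
    # Step 3b: reject, and report how long until one full token has dripped in.
    wait = (1 - bucket.tokens) / bucket.refill_rate
    return 429, wait


b = Bucket(capacity=2, refill_rate=2, tokens=2, last_refill=0.0)
print(decide(b, 0.0))  # (200, 0.0)
print(decide(b, 0.0))  # (200, 0.0)
print(decide(b, 0.0))  # (429, 0.5)
print(decide(b, 0.5))  # (200, 0.0)
```

Telling the caller how long to wait is optional, but it gives a rejected client something better to do than retry immediately.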

💥 Why It Allows Bursts

This is the trait that makes token bucket special, so let’s slow down here. The reason it allows bursts is simple once you see it:

When the bucket is full, all those saved-up tokens are just sitting there ready to be spent.
So a sudden rush of requests can grab them all quickly and go through, one after another, fast.
That’s your burst, a short spike of traffic that gets through faster than your normal rate.
But once those saved tokens are gone, the bucket is empty. Now requests can only go through as fast as new tokens drip in, which is your refill rate.

So the behavior has two modes, in a sense. While there are saved tokens, you get a quick burst. After that, you settle down to the steady refill rate. Think of it like a small water tank. If the tank is full you can pour a big glass right away, but after that you can only pour as fast as the tap refills it.

This is exactly what we wanted at the start: allow short bursts, but cap the long-term rate. Token bucket gives you both with one bucket.

A quick example

Say your bucket holds 10 tokens and refills 2 per second. If Alex has been quiet, the bucket is full at 10. Alex can fire 10 requests instantly, that’s the burst. After that, the bucket is empty, so for as long as Alex keeps pushing, only 2 requests per second get through. That’s the refill rate taking over. If Alex then goes quiet for 5 seconds, the bucket is back to 10 and a new burst is available.
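You can replay Alex's example with a small simulation. This is a sketch under a few assumptions: the bucket starts full at time zero, requests arrive at the exact timestamps listed, and the simulate function is a name invented for this example.

```python
def simulate(capacity, refill_rate, arrivals):
    """Return a list of (time, allowed) for request arrival times in seconds."""
    tokens, last = float(capacity), 0.0  # assumption: bucket starts full at t=0
    results = []
    for t in arrivals:
        # Refill for the time since the previous request, capped at capacity.
        tokens = min(capacity, tokens + (t - last) * refill_rate)
        last = t
        if tokens >= 1:
            tokens -= 1
            results.append((t, True))
        else:
            results.append((t, False))
    return results


# Alex sends 12 requests at t=0, then 5 requests at each of t=1, 2 and 3.
arrivals = [0.0] * 12 + [1.0] * 5 + [2.0] * 5 + [3.0] * 5
results = simulate(capacity=10, refill_rate=2, arrivals=arrivals)

for second in (0.0, 1.0, 2.0, 3.0):
    allowed = sum(1 for t, ok in results if t == second and ok)
    print(f"t={second:.0f}s allowed={allowed}")
# t=0s allowed=10
# t=1s allowed=2
# t=2s allowed=2
# t=3s allowed=2
```

The first line of output is the burst, and every line after it is the refill rate in action.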

🆚 Token Bucket vs Leaky Bucket

People mix these two up all the time, so let’s put them side by side. They sound similar and both use a bucket, but they behave differently. The short version is: token bucket allows bursts, while leaky bucket smooths everything into one steady stream.

Aspect	Token Bucket	Leaky Bucket
What’s in the bucket	Tokens that requests spend	The requests themselves, waiting
Bursts	Allowed, up to the bucket size	Not really, traffic comes out steady
Output pattern	Bursty, then steady once empty	Smooth, fixed rate at all times
When the limit is hit	Request is rejected	Request waits in line, or is dropped if the line is full
Good for	APIs that want to allow short spikes	Systems that need one even, predictable flow

So the easy way to remember it: token bucket is forgiving about bursts, leaky bucket is strict about smoothness. We cover the other side in detail in Leaky Bucket Algorithm.
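To make the contrast concrete, here's a rough sketch of the queue-style leaky bucket described in the table. The class and method names are invented for this example, and the leak method only releases whole requests for the time you give it, so treat it as a simplification.

```python
from collections import deque


class LeakyBucket:
    """A simplified queue-style leaky bucket. Names are illustrative."""

    def __init__(self, queue_size, leak_rate):
        self.queue = deque()
        self.queue_size = queue_size  # how many requests may wait in line
        self.leak_rate = leak_rate    # requests released per second

    def offer(self, request):
        """Queue a request, or drop it if the line is full."""
        if len(self.queue) >= self.queue_size:
            return False
        self.queue.append(request)
        return True

    def leak(self, seconds):
        """Release the requests that drain out over `seconds`, oldest first."""
        n = min(len(self.queue), int(self.leak_rate * seconds))
        return [self.queue.popleft() for _ in range(n)]


lb = LeakyBucket(queue_size=5, leak_rate=2)
print([lb.offer(i) for i in range(7)])  # [True, True, True, True, True, False, False]
print(lb.leak(1))                       # [0, 1]
print(lb.leak(1))                       # [2, 3]
```

Five requests are accepted, but they come out two per second no matter how fast they arrived. A token bucket with 5 saved tokens would have let all five through immediately.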

🌍 Where It’s Used

You don’t have to look far to find token bucket in the wild. It’s genuinely one of the most common rate limiters out there. Here’s where you’ll bump into it:

API rate limits. Tons of public APIs use it. They let you make a quick batch of calls, then throttle you to a steady rate after. That mix is the token bucket behavior.
Cloud throttling. AWS and other cloud providers lean on token bucket style limits for many services, so a short spike of activity is fine but sustained overloading gets capped.
Network traffic shaping. Routers and gateways use it to control how much traffic flows through, allowing bursts while keeping the average in check.

The reason it shows up so much is that real traffic is bursty. Users click a few things fast, then pause. Token bucket fits that natural rhythm instead of fighting it.
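For API rate limits in particular, a limiter typically keeps one bucket per client, so one user draining their tokens doesn't affect anyone else. Here's a minimal sketch of that idea. The class name, the client IDs, and the in-memory dictionary are assumptions for illustration, and a real service would need to think about shared storage and concurrency, which this leaves out.

```python
class PerClientLimiter:
    """One token bucket per client, kept in a plain dictionary."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.buckets = {}  # client_id -> (tokens, last_seen_seconds)

    def allow(self, client_id, now):
        # Assumption: a client we haven't seen before starts with a full bucket.
        tokens, last = self.buckets.get(client_id, (float(self.capacity), now))
        tokens = min(self.capacity, tokens + (now - last) * self.refill_rate)
        allowed = tokens >= 1
        if allowed:
            tokens -= 1
        self.buckets[client_id] = (tokens, now)
        return allowed


limiter = PerClientLimiter(capacity=2, refill_rate=1)
print([limiter.allow("alex", 0.0) for _ in range(3)])  # [True, True, False]
print(limiter.allow("sam", 0.0))                       # True
```

Alex has used up a burst, but Sam's bucket is untouched.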

⚠️ Common Mistakes and Misconceptions

A few things trip people up with this one. Let’s clear them out so you don’t get caught:

“Token bucket smooths traffic into a steady stream.” No, that’s the leaky bucket. Token bucket actually allows bursts. If you want smooth output, you want leaky bucket instead.
“Token bucket and leaky bucket are the same thing.” They both use a bucket, but they’re opposites in spirit. Token bucket spends tokens and allows spikes. Leaky bucket queues requests and forces a steady drip.
“Bigger bucket is always better.” Not really. A huge bucket allows a huge burst, which might overwhelm your server all at once. The capacity should match the biggest spike you can actually handle.
“Refill rate and capacity are the same setting.” They’re separate knobs. Refill rate sets your long-term average. Capacity sets your burst size. Get them mixed up and your limiter behaves nothing like you expect.
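A bit of arithmetic shows why the two knobs aren't interchangeable. Over any window of time, the most requests a token bucket can let through is the saved tokens plus whatever drips in during the window. The helper below is a hypothetical name for that bound, and the two configurations are just the same pair of numbers swapped.

```python
def max_requests(capacity, refill_rate, window_seconds):
    """Upper bound on requests allowed in any window of the given length."""
    return capacity + refill_rate * window_seconds


# Same two numbers, swapped between the knobs.
print(max_requests(capacity=20, refill_rate=5, window_seconds=0))   # 20
print(max_requests(capacity=5, refill_rate=20, window_seconds=0))   # 5
print(max_requests(capacity=20, refill_rate=5, window_seconds=60))  # 320
print(max_requests(capacity=5, refill_rate=20, window_seconds=60))  # 1205
```

With the knobs swapped, the instant burst shrinks from 20 to 5 while the traffic allowed over a minute nearly quadruples.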

🛠️ Design Challenge

Try this one on your own to lock it in.

Imagine you’re rate limiting an API at 5 requests per second on average, but you want to allow a quick burst of up to 20 requests when a user first opens your app.

What refill rate would you set? (Hint: it controls the long-term average.)

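Answer: 5 tokens per second. The refill rate is the long-term average, so it matches the 5 requests per second you want to sustain.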

What bucket capacity would you set? (Hint: it controls the burst size.)

Answer: 20 tokens. Capacity is the biggest burst the bucket allows, so a 20-request burst needs a 20-token bucket.

If a user fires 20 requests instantly and then keeps firing, what happens after the first 20?

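Answer: The first 20 requests drain the bucket. After that, requests only get through as new tokens arrive, about 5 per second, and anything beyond that is rejected.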
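If you want to check your reasoning, here's one way to write the challenge down as configuration, along with a follow-up question the numbers let you answer: how long does a drained bucket take to be ready for another full burst? The constant and function names are made up for this sketch.

```python
REFILL_RATE = 5  # tokens per second, so a long-term average of 5 requests/second
CAPACITY = 20    # most tokens held at once, so a burst of up to 20 requests


def seconds_until_full(tokens, capacity=CAPACITY, refill_rate=REFILL_RATE):
    """How long the user must stay quiet before the bucket is full again."""
    return (capacity - tokens) / refill_rate


print(seconds_until_full(0))   # 4.0
print(seconds_until_full(10))  # 2.0
```

So a user who burns the whole burst needs about four quiet seconds before they can do it again.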

🧩 What You’ve Learned

You can now explain how a token bucket controls traffic. Here’s what you’ve picked up.

✅ A token bucket holds tokens that refill at a steady rate, and each request spends one.
✅ An empty bucket means the request is rejected, since there’s no token to spend.
✅ The refill rate sets your long-term average, and the bucket capacity sets your maximum burst.
✅ It allows short bursts when the bucket is full, then throttles down to the refill rate.
✅ Token bucket allows bursts, while leaky bucket smooths traffic into one steady stream.
✅ It’s used widely in API rate limits, cloud throttling, and network traffic shaping.

🚀 What’s Next?

You’ve got the burst-friendly side of rate limiting down. Now go see the other side and the bigger picture.

Leaky Bucket Algorithm shows the strict, steady-flow cousin of token bucket and when you’d reach for it instead.
Rate Limiting Explained zooms out to the full topic, why we limit rates and the other strategies you can use.
