Introduction to Caching

Open any app and tap on a profile page. It loads in a blink, right? Now imagine what’s happening behind the scenes:

Every time someone opens that profile, the app could go and ask the database for the name, the photo, the follower count, all of it.
The database digs through its tables, does the work, and sends it back.
That’s fine for one person. But what about a million people opening the same popular profile at the same time?

Now the database is doing the exact same heavy lookup again and again, for data that barely changes. That’s slow, and it’s wasteful. Caching is how we fix that.

🎯 The Problem

Let’s name the pain first. Here’s what goes wrong without caching:

Your app talks to a database for every single request. And a database lookup is slow compared to just reading from memory.
A lot of that work is repeated. The same profile, the same product page, and the same YouTube thumbnail get fetched over and over.
When traffic spikes, the database gets overloaded. It slows down for everyone, and sometimes it just crashes.
Some pages aren’t even stored directly. The server has to compute them, like adding up totals or sorting a feed, and redoing that math every time is expensive.

So we’re paying a slow, repeated cost for data that mostly stays the same. There’s a better way, and that’s caching.

🍱 Real-World Analogy

Think about your study desk for a second:

You keep the things you use all the time right on the desk. Your pen, your notebook, your water bottle.
You don’t walk to the store every time you need a pen, right? That would be ridiculous.
The store still has everything. But for the stuff you reach for constantly, you keep a copy close by.

A cache works the exact same way:

The database is the store. It has everything, but it’s far and slow to reach.
The cache is your desk. It holds the few things you use most, right where you can grab them instantly.
When you need something that’s not on the desk, you walk to the store, get it, and keep a copy on the desk for next time.

Keep this desk in your head. Everything below maps back to it.

⚡ What is Caching

Okay, so let’s define it properly:

A cache is a small, fast storage layer that keeps copies of data you use often, close to where it’s needed.
Caching is just the act of using that cache. You save a copy of frequently used data somewhere fast, so you can skip the slow trip to the original source next time.
“Fast” usually means memory (RAM), which is way quicker to read than a database on disk or a server far away.

Here’s the key idea in one line. A cache doesn’t replace your database. It sits in front of it and answers the easy, repeated questions, so the database only handles the rest.

The whole point of a cache

A cache trades a little bit of memory and freshness for a lot of speed. You keep copies of hot data nearby so most requests never have to touch the slow source at all.

🎯 Cache Hit vs Cache Miss

Every time a request comes in, the cache gets checked first. And there are only two outcomes:

A cache hit is when the data is already in the cache. Great, you grab it and return it instantly. No database needed.
A cache miss is when the data isn’t in the cache. So now you go to the database, get the data, and on the way back you store a copy in the cache. Next time it’ll be a hit.

That second part is important. On a miss, we don’t just fetch the data, we also fill the cache so the same request is fast from then on.

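Here’s the flow for one request: check the cache first. On a hit, return the cached copy. On a miss, read from the database, store a copy in the cache, and then return it.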

So the more requests that turn into hits, the faster and lighter your whole system gets. The fraction of requests that are hits even has a name, the hit rate, and a high hit rate is what you’re aiming for.

Term	What it means	What happens
Cache hit	Data was found in the cache	Return it right away, skip the database
Cache miss	Data was not in the cache	Fetch from the database, then store a copy
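
To make the hit and miss paths concrete, here’s a minimal sketch in Python. It’s an illustration under assumptions, not a production setup: `DATABASE`, `cache`, `stats`, `get_profile`, and `hit_rate` are hypothetical names, and plain dictionaries stand in for a real database and a real cache.

```python
# Hypothetical stand-in for a slow database (an assumption for this sketch).
DATABASE = {"user:1": {"name": "Ada", "followers": 1200}}

cache = {}                           # the fast in-memory store
stats = {"hits": 0, "misses": 0}     # counters so we can compute the hit rate


def get_profile(key):
    """Return a profile, checking the cache before the database."""
    if key in cache:                 # cache hit: answer straight from memory
        stats["hits"] += 1
        return cache[key]
    stats["misses"] += 1             # cache miss: go to the source
    value = DATABASE[key]            # this is the slow lookup in a real system
    cache[key] = value               # fill the cache so the next request is a hit
    return value


def hit_rate():
    """Fraction of requests served from the cache so far."""
    total = stats["hits"] + stats["misses"]
    return stats["hits"] / total if total else 0.0


for _ in range(4):
    get_profile("user:1")

print(stats, hit_rate())  # {'hits': 3, 'misses': 1} 0.75
```

The first request misses and fills the cache, and the next three are hits, so the hit rate is 3 out of 4.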

📍 Where Caches Live

Now here’s a thing that surprises people. A cache isn’t just one box. There are caches at almost every layer of the web. Here they are, starting closest to the user and moving toward the database:

Browser cache. Your own browser saves images, styles, and scripts on your device, so a repeat visit doesn’t re-download them.
CDN cache. A CDN is a set of servers spread around the world that keep copies of files near users, like YouTube thumbnails sitting close to your city instead of on one faraway server.
Application cache. This is an in-memory store like Redis that your servers check before hitting the database. It’s the classic backend cache, super fast because it lives in RAM.
Database cache. Even the database keeps recently read data in memory (and some databases cache query results too), so repeating the same query can skip some of the disk work.

So a single request might pass through several caches on its way. And the closer a copy sits to the user, the faster the response feels.
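
Here’s a rough sketch of a request falling through several layers. It’s heavily simplified: each layer is modeled as a plain dictionary in one process, while real browser, CDN, and application caches are separate systems. The names `LAYERS`, `ORIGIN`, and `fetch` are hypothetical.

```python
# Each layer is a plain dict here; in reality these are separate systems.
browser_cache, cdn_cache, app_cache = {}, {}, {}

# Ordered from closest to the user to farthest.
LAYERS = [("browser", browser_cache), ("cdn", cdn_cache), ("app", app_cache)]

# Hypothetical origin (the database or file store behind everything).
ORIGIN = {"/thumb/42.jpg": b"image-bytes"}


def fetch(path):
    """Return (value, where_it_was_found), filling closer layers on the way back."""
    for i, (name, layer) in enumerate(LAYERS):
        if path in layer:
            # Found it: copy it into every layer closer to the user.
            for _, closer in LAYERS[:i]:
                closer[path] = layer[path]
            return layer[path], name
    # Missed everywhere: go to the origin and fill all layers.
    value = ORIGIN[path]
    for _, layer in LAYERS:
        layer[path] = value
    return value, "origin"


print(fetch("/thumb/42.jpg")[1])  # origin (every layer missed)
print(fetch("/thumb/42.jpg")[1])  # browser (closest layer now has a copy)
```

The second call never leaves the closest layer, which is why a repeat visit feels so much faster.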

⏳ Keeping the Cache Fresh

Here’s the catch with caches. A cache holds a copy, and copies can go out of date. For example, if a user changes their profile photo but the cache still has the old one, people keep seeing the old photo. So we have a few tools to keep things fresh:

TTL, which stands for time to live, is an expiry timer on each cached item. After the TTL runs out, the item is considered too old to trust, so the next request treats it as a miss and refetches fresh data.
Stale data is the name for a cached copy that no longer matches the source. TTL keeps stale data from sticking around forever.
Eviction is what happens when the cache fills up. Memory is limited, so the cache throws out some items to make room for new ones, usually the ones that haven’t been used in a while.
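
A TTL can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical `TTLCache` class that stores an expiry time next to each value. The `clock` parameter is also an assumption, added so the example can fake the passage of time instead of sleeping.

```python
import time


class TTLCache:
    """Tiny cache where every item expires ttl_seconds after it was stored."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock           # injectable so tests can control time
        self._items = {}             # key -> (value, expires_at)

    def set(self, key, value):
        self._items[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        """Return the value, or None if it is missing or expired."""
        entry = self._items.get(key)
        if entry is None:
            return None              # never cached: a plain miss
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._items[key]     # too old to trust: treat it as a miss
            return None
        return value


# Fake clock (in seconds) so we can jump forward in time.
now = [0.0]
photos = TTLCache(ttl_seconds=60, clock=lambda: now[0])

photos.set("user:1:photo", "old.jpg")
print(photos.get("user:1:photo"))  # old.jpg (still within the TTL)

now[0] = 61.0                      # 61 seconds later
print(photos.get("user:1:photo"))  # None (expired, so the caller refetches)
```

One simplification to note: this sketch returns `None` for a miss, so it can’t tell a miss apart from a cached `None` value.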

So the cache stays useful and fresh by expiring old items and evicting the ones it doesn’t need. How exactly it decides what to evict is a topic on its own, the eviction policies, and we’ll cover those next.
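
As a small preview, here’s a sketch of one common policy, least recently used (LRU), which matches the idea of throwing out items that haven’t been used in a while. The `LRUCache` class is hypothetical and built on Python’s `collections.OrderedDict`. Real caches offer several policies, so treat this as one example rather than the way eviction always works.

```python
from collections import OrderedDict


class LRUCache:
    """Cache with a fixed capacity that evicts the least recently used item."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest use first, newest use last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)     # reading counts as a use
        return self._items[key]

    def set(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)     # writing counts as a use
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used


lru = LRUCache(capacity=2)
lru.set("a", 1)
lru.set("b", 2)
lru.get("a")        # "a" is now the most recently used
lru.set("c", 3)     # cache is full, so "b" gets evicted

print(list(lru._items))  # ['a', 'c']
```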

⚡ Benefits

So why do we bother with all this? Because the payoff is huge:

Speed. Reading from memory is far quicker than a database lookup, so hits feel instant. That’s the whole reason a profile page snaps open.
Less load on the database. Most requests get answered by the cache, so the database only handles the few that miss. It can breathe.
Lower latency. Latency is the round-trip wait for a request and its answer. A nearby cache cuts that wait, and the closer the cache, the lower the latency.
Handles traffic spikes. When a million people open the same popular post, the cache absorbs the rush instead of the database crashing.
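
To put rough numbers on these benefits, here’s a small back-of-the-envelope calculation. The timings are assumptions picked for illustration (1 ms for a cache read, 50 ms for a database lookup), not measurements, and `average_latency_ms` is a hypothetical helper. It assumes a miss pays for the cache check and then the database lookup.

```python
CACHE_MS = 1    # assumed time to read from the cache
DB_MS = 50      # assumed time for a database lookup


def average_latency_ms(hit_rate, cache_ms, db_ms):
    """Average wait per request for a given hit rate.

    A hit costs one cache read. A miss costs the cache check plus the
    database lookup.
    """
    miss_rate = 1 - hit_rate
    return hit_rate * cache_ms + miss_rate * (cache_ms + db_ms)


for hit_rate in (0.0, 0.5, 0.9, 0.99):
    avg = average_latency_ms(hit_rate, CACHE_MS, DB_MS)
    db_share = 1 - hit_rate  # fraction of requests that still reach the database
    print(f"hit rate {hit_rate:.0%}: about {avg:.1f} ms on average, "
          f"{db_share:.0%} of requests reach the database")
```

Under these assumed numbers, a 90% hit rate brings the average wait from 51 ms down to about 6 ms, and only one request in ten still reaches the database.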

⚠️ Common Mistakes and Misconceptions

A few things trip people up early. Let’s clear them out:

“Cache everything.” No. A cache has limited memory, and caching rarely-used data just wastes space and adds complexity. Cache the hot, frequently read stuff. (And don’t cache things that must always be exact, like a bank balance, without being careful.)
“Cached data is always correct.” It’s a copy, so it can go stale. If the source changes and the cache doesn’t, users see old data. That’s why TTL and invalidation matter.
“Cache invalidation is easy.” It’s famously described as one of the hardest problems in computing. Knowing exactly when to refresh or remove a cached item, without serving stale data or refreshing too often, is genuinely tricky.
“A cache replaces the database.” It doesn’t. The cache can be wiped or expire at any moment. The database is still the real, permanent source of truth.

The hard part of caching

Putting data into a cache is easy. Knowing when to update or remove it, so nobody sees stale data, is the hard part. This is called cache invalidation, and it’s where most caching bugs come from.
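
One simple way to invalidate is to delete the cached copy whenever the source changes. The sketch below shows that idea with hypothetical names (`DATABASE`, `cache`, `get_profile`, `update_photo`) and plain dictionaries. It’s one possible approach, not the only one, and it ignores real-world complications like concurrent requests.

```python
# Hypothetical source of truth and cache, both plain dicts for this sketch.
DATABASE = {"user:1": {"photo": "old.jpg"}}
cache = {}


def get_profile(key):
    """Read through the cache, filling it on a miss."""
    if key not in cache:
        # Store a copy, so a later database change does not silently
        # update the cached object too.
        cache[key] = dict(DATABASE[key])
    return cache[key]


def update_photo(key, new_photo):
    """Write to the database first, then drop the stale cached copy."""
    DATABASE[key]["photo"] = new_photo
    cache.pop(key, None)             # invalidate: the next read refetches


get_profile("user:1")                    # caches the old photo
update_photo("user:1", "new.jpg")        # updates the source and invalidates
print(get_profile("user:1")["photo"])    # new.jpg
```

If `update_photo` skipped the `cache.pop` line, readers would keep getting `old.jpg` until the entry expired or was evicted.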

🛠️ Design Challenge

Try this one on your own to test yourself. Imagine you’re designing the profile page for a social app, like the one we opened at the start. Millions of people view popular profiles every minute.

What would you cache here?

Where would you put that cache: browser, CDN, or an in-memory store like Redis?

What TTL feels right for a profile, a few seconds or a few hours, and why?

When a user updates their photo, how do you make sure people stop seeing the old one?

🧩 What You’ve Learned

You can now explain what a cache is and why systems lean on it so heavily. Here’s what you’ve picked up.

✅ A cache is a fast storage layer that holds copies of frequently used data, close to where it’s needed.
✅ A cache hit returns data straight from the cache. A cache miss fetches from the source and then stores a copy.
✅ Caches live at many layers: the browser, the CDN, the application (like Redis), and the database.
✅ TTL expires old items so stale copies don’t linger, and eviction clears space when the cache fills up.
✅ Caching brings speed, lower latency, and far less load on the database.
✅ The hard part is cache invalidation, knowing when to refresh or remove stale copies.

🚀 What’s Next?

You’ve got the big picture now. Next, we’ll zoom into the pieces.

Redis Introduction at /tutorials/system-design/redis-introduction shows you one of the most popular in-memory caches and how it actually works.
CDN Explained at /tutorials/system-design/cdn-explained covers how caches spread across the world to keep sites fast for everyone.

Once you’ve got those, you’ll be ready to talk about caching the way real systems use it, and the way interviewers love to dig into.
