Design a Video Streaming Service (System Design)

You do this all the time, right?

You open YouTube on your phone, tap a video, and it starts playing almost instantly.
You’re on a shaky train connection, but the video keeps going. It just looks a little blurry for a few seconds, then sharpens up again.
Later you watch the same video on your laptop at home, and now it’s crisp and full quality.

That smooth experience is doing a ton of work in the background, and it’s one of the most loved system design interview questions out there. It touches storage, queues, transcoding, CDNs, and a clever trick for picking the right quality on the fly. So let’s design a video streaming service together, step by step.

🎯 What We’re Building

So what are we actually building here? Let’s name it plainly first.

A video streaming service lets people upload videos and lets other people watch them, smoothly, on any device and any network.
Think YouTube or Netflix. Someone uploads or adds a video once, and then millions of people watch it later.
Our job is to make uploads work reliably, and make playback feel instant and smooth no matter where the viewer is or how good their internet is.

The tricky part isn’t storing one video. It’s serving the same video to huge numbers of people, fast, all over the world, at whatever quality their connection can handle.

📋 Requirements

Before drawing any boxes, a good engineer asks: what must this thing actually do? We split that into two buckets.

A functional requirement is a thing the system must do, a feature you can point at.
A non-functional requirement is about how well it does those things, like how fast or how reliable it is.

Here’s what our service must do. These are the functional ones:

Let a creator upload a video.
Let viewers watch that video, smoothly, without long pauses.
Serve the video in multiple qualities, like 360p, 720p, and 1080p, so it works on a tiny phone or a big TV.

And here’s how well it should do them. These are the non-functional ones:

It needs huge storage, because video files are massive and there are millions of them.
Playback should have low buffering. Buffering is that annoying spinning circle when the video pauses to load more. We want as little of that as possible.
It needs low latency for viewers everywhere on the planet, not just near our servers.
It must scale, so it keeps working as videos and viewers pile up.

Always ask before you design

In a real interview, don’t jump straight to drawing boxes. First ask what features matter and roughly how big it needs to be. Nailing the requirements first is half the score.

🧩 Two Halves: Upload and Playback

Here’s the single most useful way to think about this whole system. It really splits into two separate jobs that barely overlap.

Upload. This is the write side. A creator sends us a video, and we get it ready to be watched. This happens once per video.
Playback. This is the read side. Viewers watch that video, again and again. This happens millions of times.

Why does this split matter so much?

Uploads are rare and can take their time. Nobody minds if a video takes a few minutes to “finish processing” after upload.
Playback is constant and must feel instant. This is where almost all the traffic is.
So like most big read-heavy systems, we make playback blazing fast, and we let the upload side do its slow, heavy work quietly in the background.

Keep this two-halves picture in your head. Everything below fits into one half or the other.

⬆️ The Upload + Transcoding Pipeline

Let’s start with upload. A creator hands us a single big video file. But we can’t just store that one file and call it done. Here’s the problem.

The creator might upload a giant 4K file, but a viewer on an old phone with slow internet can’t play that. It would just buffer forever.
So we need the same video in several smaller versions: 360p, 480p, 720p, 1080p, and so on.
Turning that one uploaded file into multiple resolutions and formats is called transcoding. That’s the key word here.
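
As a rough sketch of what "several smaller versions" means in practice, the snippet below builds one transcoding command per target quality. It assumes ffmpeg, a widely used open-source transcoder that this lesson doesn't name, and the heights and bitrates in the ladder are illustrative values, not recommendations. The function and file names are made up for the example, and nothing is actually executed.

```python
# Hypothetical quality ladder: (height in pixels, target video bitrate).
# These numbers are illustrative assumptions, not tuned recommendations.
QUALITY_LADDER = [(360, "800k"), (480, "1400k"), (720, "2800k"), (1080, "5000k")]


def build_transcode_commands(source_path, output_prefix):
    """Return one ffmpeg argument list per target quality (nothing is run here)."""
    commands = []
    for height, bitrate in QUALITY_LADDER:
        commands.append([
            "ffmpeg",
            "-i", source_path,             # the raw upload
            "-vf", f"scale=-2:{height}",   # resize to this height, keep aspect ratio
            "-c:v", "libx264",             # encode video as H.264
            "-b:v", bitrate,               # target video bitrate
            "-c:a", "aac",                 # encode audio as AAC
            f"{output_prefix}_{height}p.mp4",
        ])
    return commands


if __name__ == "__main__":
    for command in build_transcode_commands("raw_upload.mp4", "video123"):
        print(" ".join(command))
```

One upload fans out into one output file per quality, which is why transcoding is the slow, heavy step.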

So the moment a video lands, we have work to do. But notice, transcoding a long video is slow and heavy. It can take minutes. We can’t make the creator sit and wait, and we can’t tie up our upload server doing it. So we do it asynchronously.

“Asynchronously” just means we don’t do it right away while the creator waits. We accept the upload, say “thanks, we’ll process it”, and do the heavy work in the background.
To pass that work to the background, we drop a little “please process this video” message into a message queue. A message queue is a waiting line for tasks, so work can be handed off and picked up later. (More on that in the linked lesson below.)
Separate machines called transcoding workers pull jobs off that queue, one by one, and do the actual converting. A worker is just a server whose only job is to chew through these tasks.

Here’s the flow when a video comes in. Let’s trace it:

The creator uploads the file. The upload service grabs it and quickly stores the raw original.
The upload service then drops a job in the queue and immediately tells the creator “got it, processing now”. The creator can close the tab.
Workers pick up the job and transcode the video into all the qualities we want, then save each version.
Once that’s done, the video is marked ready and viewers can watch it.
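
The four steps above can be sketched in a few lines. This is a minimal, single-process stand-in: Python's `queue.Queue` plays the message queue, a thread plays the transcoding worker, and plain dictionaries play the database and object storage. All names here (`handle_upload`, `transcoding_worker`, `video_status`) are hypothetical, and the "transcoding" is faked.

```python
import queue
import threading

job_queue = queue.Queue()   # stand-in for a real message queue
video_status = {}           # stand-in for the metadata database
stored_versions = {}        # stand-in for object storage
QUALITIES = ["360p", "720p", "1080p"]


def handle_upload(video_id):
    """Upload service: store the raw file (omitted), enqueue a job, reply at once."""
    video_status[video_id] = "processing"
    job_queue.put(video_id)
    return "got it, processing now"


def transcoding_worker():
    """Worker: pull jobs off the queue one by one and 'transcode' each video."""
    while True:
        video_id = job_queue.get()
        if video_id is None:        # sentinel value: time to shut down
            job_queue.task_done()
            break
        # Pretend transcoding: just record one output per quality.
        stored_versions[video_id] = [f"{video_id}/{q}" for q in QUALITIES]
        video_status[video_id] = "ready"
        job_queue.task_done()


def run_demo():
    worker = threading.Thread(target=transcoding_worker)
    worker.start()
    reply = handle_upload("video123")   # returns immediately
    job_queue.join()                    # wait for the background work to finish
    job_queue.put(None)                 # tell the worker to stop
    worker.join()
    return reply, video_status["video123"]


if __name__ == "__main__":
    print(run_demo())
```

Notice that `handle_upload` returns before any transcoding happens. That gap is the whole point of the queue.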

Why a queue instead of doing it on the spot

If we transcoded during the upload request itself, every upload would hang for minutes, and a traffic spike could crash the upload server. A queue lets us accept uploads instantly and process them at our own pace. If a million videos arrive at once, they just wait politely in line, and we add more workers to chew through them faster.

Learn more about the queue piece in Message Queues Explained.

🗄️ Where Videos Are Stored

Now, where do all these video files actually live? This trips a lot of people up, so let’s be clear.

Video files are huge and there are tons of them. We need a store built for massive blobs of data.
That store is called object storage. Object storage keeps large files (called objects) cheaply and at basically unlimited scale. Think of services like Amazon S3.
So all the actual video bytes, the raw upload and every transcoded version, live in object storage.

But a video isn’t just its bytes. There’s other stuff we need to track too.

The title, the description, who uploaded it, the view count, which qualities are ready, where each file lives, and so on. This is called metadata, which just means data about the video.
Metadata is small, structured, and we query it a lot (“show me this video’s title and available qualities”). So it goes in a regular database, not object storage.

So we split the two cleanly.

What	Where it lives	Why
The big video files	Object storage	Cheap, huge, built for large blobs
Title, owner, view count, file locations	Database	Small, structured, queried often
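
To make the split concrete, here is one possible shape for a metadata record. The field names and the object storage keys are assumptions made up for this sketch. The point is only that the database row holds small facts plus pointers to the files, never the video bytes themselves.

```python
from dataclasses import dataclass, field


@dataclass
class VideoMetadata:
    """Small, structured record that would live in the database."""
    video_id: str
    title: str
    owner: str
    view_count: int = 0
    status: str = "processing"
    # Maps a quality like "720p" to the object storage key of its files.
    files: dict = field(default_factory=dict)

    def mark_quality_ready(self, quality, object_key):
        self.files[quality] = object_key

    def available_qualities(self):
        # Sort numerically so "360p" comes before "1080p".
        return sorted(self.files, key=lambda q: int(q.rstrip("p")))


if __name__ == "__main__":
    video = VideoMetadata("video123", "My trip", "alice")
    video.mark_quality_ready("1080p", "videos/video123/1080p/playlist.m3u8")
    video.mark_quality_ready("360p", "videos/video123/360p/playlist.m3u8")
    print(video.available_qualities())
```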

Learn more in Object Storage Explained.

🌍 Delivery via CDN

Okay, the videos are stored and ready. Now the big question: how do we get them to viewers fast, all over the world?

Say all our storage sits in one place, like a data center in the US. A viewer in India would have their video travel halfway around the planet for every chunk. That’s slow, and it means buffering.
The fix is a CDN, which stands for Content Delivery Network. A CDN is a set of servers spread all over the world, called edge servers, that keep copies of your content close to users.
So a viewer in India gets the video from a nearby edge server in India, not from across the ocean. Less distance means less latency, which means less buffering.

Here’s why this is the single most important piece for playback.

Video is the heaviest thing on the internet to deliver. The same popular video gets watched millions of times.
Instead of our origin storage serving all those millions of streams, the CDN does it. Each edge server keeps a cached copy of the popular videos and serves its local crowd.
Our origin storage barely gets touched. The CDN soaks up the vast majority of the traffic.
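
Here is a toy model of why the origin "barely gets touched". It assumes the simplest possible edge server: check a local cache, and only go to the origin on a miss. Real CDNs also expire and evict cached content, which this sketch leaves out, and the class names are invented for the example.

```python
class Origin:
    """Stand-in for our central object storage."""

    def __init__(self, objects):
        self.objects = objects
        self.requests_served = 0

    def fetch(self, key):
        self.requests_served += 1
        return self.objects[key]


class EdgeServer:
    """Stand-in for one CDN edge server near a group of viewers."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}

    def get(self, key):
        if key not in self.cache:              # cache miss: go to the origin once
            self.cache[key] = self.origin.fetch(key)
        return self.cache[key]                 # cache hit: served locally


if __name__ == "__main__":
    origin = Origin({"video123/720p/chunk_000.ts": b"...video bytes..."})
    edge = EdgeServer(origin)
    for _ in range(1000):                      # 1000 nearby viewers, same chunk
        edge.get("video123/720p/chunk_000.ts")
    print(origin.requests_served)              # the origin was asked only once
```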

The CDN does the heavy lifting

Most of the work in a streaming service is reads, the same videos watched over and over. A CDN serves those reads from servers near each viewer, so the experience feels fast everywhere and your central storage stays calm. This is the heart of streaming at scale.

Learn more in CDN Explained.

📶 Adaptive Bitrate Streaming

Now here’s the clever bit that makes video feel smooth even on bad internet. Remember our train example, where the video stayed playing but went blurry for a bit? That’s this feature at work.

The idea is called adaptive bitrate streaming. “Bitrate” just means how much data per second a video uses. Higher quality means a higher bitrate.
Instead of sending one big file, we chop each video into lots of tiny pieces, called chunks, usually a few seconds of video each.
And we keep every chunk in several qualities, like a 360p version and a 1080p version of that same two-second piece.
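
A quick bit of arithmetic shows why the quality choice matters so much. A chunk's size is roughly its bitrate times its length, divided by 8 to turn bits into bytes. The bitrates used below are illustrative assumptions, not fixed values for those resolutions.

```python
def chunk_size_bytes(bitrate_bits_per_second, chunk_seconds):
    """Approximate size of one chunk: bits per second * seconds / 8 bits per byte."""
    return bitrate_bits_per_second * chunk_seconds / 8


if __name__ == "__main__":
    # Assumed bitrates: about 5 Mbps for 1080p and 0.8 Mbps for 360p.
    print(chunk_size_bytes(5_000_000, 2))   # two seconds of 1080p
    print(chunk_size_bytes(800_000, 2))     # the same two seconds at 360p
```

Under those assumptions, the same two seconds of video is about 1.25 MB at 1080p but only about 0.2 MB at 360p, so a weak connection can fetch the small one in time.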

So how does the smooth switching actually happen?

As you watch, the player downloads the next few chunks ahead of time.
The player watches your network speed. If your connection is strong, it grabs the high-quality chunks. If your connection drops, it quietly switches to lower-quality chunks for a bit.
That’s why the video keeps playing on the train instead of freezing. It trades a little sharpness for no buffering. When your signal comes back, it jumps to high quality again.
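
The player's decision can be sketched as a tiny function: measure the recent download speed, then pick the best quality that fits inside it with some headroom. This is a deliberately simple, throughput-only rule. Real players usually also look at how much video is already buffered, and the ladder and safety factor here are assumed values.

```python
# Hypothetical ladder: quality name -> bits per second needed to play it.
BITRATE_LADDER = {"360p": 800_000, "720p": 2_800_000, "1080p": 5_000_000}
SAFETY_FACTOR = 0.8  # only plan to use 80% of the measured speed


def pick_quality(measured_bits_per_second):
    """Return the highest quality whose bitrate fits the measured network speed."""
    budget = measured_bits_per_second * SAFETY_FACTOR
    affordable = [q for q, bps in BITRATE_LADDER.items() if bps <= budget]
    if not affordable:
        # Nothing fits: fall back to the lowest quality rather than stalling.
        return min(BITRATE_LADDER, key=BITRATE_LADDER.get)
    return max(affordable, key=BITRATE_LADDER.get)


if __name__ == "__main__":
    for speed in (10_000_000, 4_000_000, 1_500_000, 500_000):
        print(speed, "->", pick_quality(speed))
```

The player would run a check like this before each chunk, which is how the quality can drop on the train and recover a few seconds later.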

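A common way to do this is HLS, which stands for HTTP Live Streaming. HLS is a popular streaming protocol that defines how the chunks and the list of qualities are described in playlist files, and it works great over plain CDNs because every chunk is fetched with an ordinary HTTP request. MPEG-DASH is another widely used option built on the same idea.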

So the transcoding step from earlier isn’t just making different quality files. It’s also slicing each quality into these little chunks and writing a small playlist that lists them. The player reads that playlist and picks chunks as it goes.
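
To show what that "small playlist" looks like, here is a sketch that writes two HLS playlists as text: a media playlist that lists the chunks for one quality, and a master playlist that lists the available qualities. The tags are standard HLS tags, but the file names, bandwidths, and chunk lengths are assumptions for illustration.

```python
import math


def media_playlist(chunk_durations):
    """Playlist for ONE quality: lists each chunk file and how long it lasts."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{math.ceil(max(chunk_durations))}",
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]
    for index, seconds in enumerate(chunk_durations):
        lines.append(f"#EXTINF:{seconds:.1f},")
        lines.append(f"chunk_{index:03d}.ts")   # hypothetical chunk file name
    lines.append("#EXT-X-ENDLIST")              # the video is complete, not live
    return "\n".join(lines) + "\n"


def master_playlist(renditions):
    """Top-level playlist: one entry per quality, pointing at its media playlist."""
    lines = ["#EXTM3U"]
    for name, bandwidth, resolution in renditions:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={resolution}")
        lines.append(f"{name}/playlist.m3u8")   # hypothetical path layout
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(master_playlist([("360p", 800_000, "640x360"),
                           ("1080p", 5_000_000, "1920x1080")]))
    print(media_playlist([6.0, 6.0, 3.5]))
```

The player fetches the master playlist first, picks a quality, then walks that quality's media playlist chunk by chunk.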

🏗️ High-Level Design

Let’s put both halves together now. When you zoom out, the whole system is just a chain of boxes, split into the upload side and the playback side.

Let’s trace both jobs.

Uploading a video (the write side):

The creator sends the file to the upload service.
The upload service stores the raw file, writes the video’s metadata to the database, and drops a transcoding job in the queue.
Workers pick up the job, transcode the video into multiple qualities and chunks, and save it all to object storage.
The metadata is updated to say the video is ready.

Watching a video (the read side):

The viewer’s player asks for the video. Our backend returns the metadata (title, available qualities) from the database, along with the playlist that lists the chunks.
The player then pulls video chunks from the nearest CDN edge server.
As the network changes, the player switches between quality chunks. Smooth playback.

That’s the full picture. The slow, heavy work sits on the upload side, in the background. The fast, viewer-facing work flows through the CDN on the playback side.

📈 Scaling It

Now imagine this thing gets huge, billions of views a day. Here’s how each part grows.

Let the CDN do the heavy lifting for reads. Almost all traffic is people watching videos, and the CDN serves that from edge servers near each viewer. Our origin storage stays mostly idle. This is the biggest scaling win by far.
Scale transcoding with more workers. Since transcoding runs off a queue, when uploads pile up we just add more worker machines pulling from the queue. The queue absorbs the spikes so nothing gets dropped.
Cache the metadata. Popular videos get their metadata looked up constantly. We keep that hot metadata in a fast in-memory cache (like Redis) so the database doesn’t get overloaded for the same titles over and over.
Object storage scales on its own. Services built for it handle near-unlimited files, so we don’t sweat running out of room.

Put together, this design handles enormous load. Reads fly through the CDN, transcoding scales by adding workers, and caching keeps the metadata database calm.
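
The metadata caching idea can be sketched like this. It's an in-process stand-in for a cache such as Redis: look in the cache first, fall back to the database on a miss, and keep the result for a short time. The class name, the `load_from_db` callback, and the 60-second lifetime are all assumptions for the example.

```python
import time


class MetadataCache:
    """Check the cache first, and only hit the database on a miss."""

    def __init__(self, load_from_db, ttl_seconds=60):
        self._load_from_db = load_from_db   # function that queries the database
        self._ttl = ttl_seconds
        self._entries = {}                  # video_id -> (metadata, expiry time)

    def get(self, video_id):
        now = time.monotonic()
        entry = self._entries.get(video_id)
        if entry is not None and entry[1] > now:
            return entry[0]                          # hit: no database work
        metadata = self._load_from_db(video_id)      # miss or expired: ask the database
        self._entries[video_id] = (metadata, now + self._ttl)
        return metadata


if __name__ == "__main__":
    db_calls = []

    def fake_db_lookup(video_id):
        db_calls.append(video_id)
        return {"title": "My trip", "qualities": ["360p", "720p", "1080p"]}

    cache = MetadataCache(fake_db_lookup)
    for _ in range(100):
        cache.get("video123")
    print(len(db_calls))   # how many times the database was actually asked
```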

🧰 Tech Choices

Part of system design is not just naming pieces, it’s saying why you picked each one. Here are the main technology decisions for this system and the reason behind each.

Decision	Choice	Why
Convert uploads	Queue + worker pipeline	Transcode each video into several quality levels in the background.
Store videos	Object storage	Built to hold enormous video files.
Deliver playback	CDN	Serves video from a server near the viewer, cutting buffering.
Adjust to the network	Adaptive bitrate (HLS/DASH)	Switches quality to match the viewer’s bandwidth.
Video info	Database	Stores titles, owners, and the list of quality levels.

⚠️ Common Mistakes and Misconceptions

A few things trip people up on this one. Let’s clear them out.

“Just stream straight from one origin server.” That would melt under load and feel slow for far-away viewers. You need a CDN serving copies from edge servers near each viewer. That’s non-negotiable for video.
“Transcode the video right there during upload.” No. Transcoding is slow and heavy, so doing it during the upload request would make every upload hang and could crash the server on a spike. Do it asynchronously with a queue and workers.
“Store the videos in a database.” Databases are built for small, structured records, not giant blobs of video. Big files go in object storage. Only the small metadata goes in the database.
“One quality is enough, just send the full file.” Then viewers on weak connections buffer endlessly. You need multiple qualities and adaptive bitrate so the player can switch as the network changes.
“Adaptive bitrate is some fancy magic in the network.” Nope. It’s just the video pre-chopped into small chunks at several qualities, and the player picking which chunk to grab next based on your speed.

🛠️ Design Challenge

Try extending the design yourself. Think each one through before you look at an answer.

Recommendations. Suggest videos a viewer might like next. What extra data do you track, and where, without slowing playback?

View counts. Show how many times each video was watched. Millions of views would overload the database if written instantly. How do you handle it?

Live streaming. Now the video is created and watched at the same time. How does transcoding change, and how small are the chunks?

🧩 What You’ve Learned

You can now design a video streaming service from scratch and talk through it clearly. Here’s what you picked up.

✅ The system splits into two halves: a slow upload side and a fast playback side.
✅ Uploads kick off transcoding, which converts the video into multiple qualities.
✅ Transcoding runs asynchronously through a message queue and worker machines, so uploads never hang.
✅ The big video files live in object storage, while small metadata lives in a database.
✅ A CDN serves videos from edge servers near each viewer, cutting latency and buffering.
✅ Adaptive bitrate streaming chops video into chunks at several qualities, and the player switches based on the network. HLS is a common format for this.
✅ Scaling leans on the CDN for reads, more workers for transcoding, and caching for metadata.

🚀 What’s Next?

This case study leans hard on two ideas that show up in almost every system design. Go deeper on them next.

CDN Explained shows how edge servers keep content close to users, the exact trick our playback depends on.
Object Storage Explained breaks down how we store huge files cheaply at scale, which is where every video in our design lives.

Once you’re comfortable with those, come back and try the design challenge again. You’ll see the whole system click into place.
