Design Google Drive / Dropbox
You save a file on your laptop, and a moment later it shows up on your phone. You share a folder with a friend, and they can open it instantly. That’s what apps like Google Drive and Dropbox do: store your files in the cloud and keep them the same across all your devices. Let’s design one, step by step.
🎯 What the System Does
At its heart, cloud storage does a few things:
- Lets you upload files and keeps them safe.
- Syncs them, so a change on one device shows up on your others.
- Lets you share files and folders with other people.
Sounds simple, but doing this for huge files, across many devices, without wasting space, is where the real design lives.
📋 Requirements
Let’s pin down what we need first.
Functional (what it must do):
- Upload and download files.
- Sync changes across all of a user’s devices.
- Share files and folders with other users.
Non-functional (how well it must do it):
- Reliability: never lose a file. This is the most important one.
- Scale: store huge numbers of large files.
- Speed: uploads and syncs should feel quick.
For storage, reliability comes first
People trust this app with their photos and documents. Losing a file is the worst thing that can happen. So durability (never losing data) is the top priority, even above speed.
🧩 Splitting Files into Chunks
Here’s the key idea that makes the whole thing work well. Instead of storing each file as one big blob, we split it into smaller chunks, say 4 MB each.
Why bother? Chunks give us three nice wins:
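- Faster, safer uploads. If a 1 GB upload fails at 90%, you re-send only the chunks that didn’t make it, not the whole file again.
- Sync only what changed. If you edit one part of a big file in place, only the chunks that changed get uploaded, not the entire file. (Inserting bytes shifts every later fixed-size chunk, so some systems pick chunk boundaries based on the content instead.)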
- Save space. If two chunks are identical (even across users), you can store just one copy. This is called deduplication.
So chunking is the trick behind fast syncs and efficient storage.
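To make the idea concrete, here is a minimal sketch of fixed-size chunking with deduplication. It assumes each chunk is identified by the SHA-256 hash of its content, which is a common approach but not something this design prescribes. `InMemoryChunkStore` and `upload_file` are hypothetical names, and the dictionary stands in for real object storage.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, the example size used above


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut the file bytes into fixed-size pieces; the last one may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunk_id(chunk: bytes) -> str:
    """Assumption: a chunk is identified by the SHA-256 hash of its content."""
    return hashlib.sha256(chunk).hexdigest()


class InMemoryChunkStore:
    """Hypothetical stand-in for the real chunk store (object storage)."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}

    def put(self, chunk: bytes) -> str:
        cid = chunk_id(chunk)
        # Identical content hashes to the same ID, so it is stored only once.
        self.chunks.setdefault(cid, chunk)
        return cid


def upload_file(store: InMemoryChunkStore, data: bytes,
                chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Store every chunk and return the ordered list of chunk IDs for the file."""
    return [store.put(chunk) for chunk in split_into_chunks(data, chunk_size)]


# Tiny chunk size so the example is easy to follow.
store = InMemoryChunkStore()
ids = upload_file(store, b"aaaabbbbaaaacc", chunk_size=4)
print(len(ids))           # 4 chunks make up the file
print(len(store.chunks))  # 3 are stored, because "aaaa" appears twice
```

The ordered list of chunk IDs is all that is needed to rebuild the file later, so that list is what gets recorded as information about the file.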
🏗️ High-Level Design
The design splits into two main parts: where the file bytes live, and where the information about files lives.
The main parts:
- The chunk store holds the actual file chunks. This is usually object storage, which is built to hold huge numbers of files cheaply and safely.
- The metadata service stores information about files: names, folders, who owns them, and which chunks make up each file.
- The notification service tells your other devices “something changed, come sync.”
So the bytes go in the chunk store, and the facts about the bytes go in the metadata service. Keeping these separate is a core idea.
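A rough sketch of that separation, under a few assumptions: `FileRecord` and its fields are hypothetical names for what the metadata service might keep, and a plain dictionary plays the role of the chunk store.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """What the metadata service keeps: facts about a file, never its bytes."""
    file_id: str
    name: str
    folder: str
    owner: str
    chunk_ids: list[str] = field(default_factory=list)  # ordered list of chunks


def read_file(record: FileRecord, chunk_store: dict[str, bytes]) -> bytes:
    """Rebuild the file by fetching its chunks, in order, from the chunk store."""
    return b"".join(chunk_store[cid] for cid in record.chunk_ids)


# The chunk store only knows chunk ID -> bytes.
chunk_store = {"c1": b"hello ", "c2": b"world"}

# The metadata record only knows names, ownership, and which chunks to fetch.
record = FileRecord("f1", "notes.txt", "/docs", "alice", ["c1", "c2"])

print(read_file(record, chunk_store))  # b'hello world'
```

Renaming or moving the file touches only the `FileRecord`; the chunk store is not involved at all.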
🔄 How Syncing Works
Let’s walk through a sync. Say you edit a document on your laptop.
- Your laptop figures out which chunks changed and uploads just those to the chunk store.
- It updates the metadata service: “this file now uses these chunks.”
- The notification service pings your other devices: “this file changed.”
- Your phone hears the ping, asks the metadata service what’s new, and downloads only the changed chunks.
So your phone updates quickly because it only grabs what actually changed, not the whole file. That’s the payoff from chunking.
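The "only what changed" step in this walkthrough boils down to comparing chunk lists. Here is a small sketch, assuming each file version is described by an ordered list of chunk IDs; `missing_chunks` is a hypothetical helper, and the short IDs are made up for readability.

```python
def missing_chunks(wanted: list[str], have: set[str]) -> list[str]:
    """Return the chunk IDs in `wanted` that are not in `have`, in order, without repeats."""
    missing: list[str] = []
    for cid in wanted:
        if cid not in have and cid not in missing:
            missing.append(cid)
    return missing


# Chunk lists for one file before and after an edit.
old_version = ["c1", "c2", "c3"]
new_version = ["c1", "c2-edited", "c3"]

# Laptop: upload only what the chunk store does not already hold.
to_upload = missing_chunks(new_version, have=set(old_version))

# Phone: after the ping, it reads the new chunk list from the metadata service
# and downloads only what its local copy lacks.
to_download = missing_chunks(new_version, have=set(old_version))

print(to_upload)    # ['c2-edited']
print(to_download)  # ['c2-edited']
```

Both devices move one chunk instead of three, and the saving grows with the size of the file.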
🤝 Sharing Files
Sharing is mostly a metadata job. When you share a folder with a friend:
- We add a record in the metadata service: “this user also has access to this folder.”
- Now their app can see and download those chunks too.
The actual chunks don’t move or get copied. We just update who’s allowed to reach them. That’s why sharing feels instant, even for big folders.
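As a sketch of why no bytes need to move, sharing can be modeled as adding one entry to a folder's access record. The `Folder` class and the `share_folder` and `can_access` helpers are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Folder:
    """Hypothetical metadata record for a folder."""
    folder_id: str
    owner: str
    shared_with: set[str] = field(default_factory=set)


def share_folder(folder: Folder, user: str) -> None:
    """Sharing is one small metadata write; no chunk is copied or moved."""
    folder.shared_with.add(user)


def can_access(folder: Folder, user: str) -> bool:
    """The owner and anyone the folder was shared with may read its chunks."""
    return user == folder.owner or user in folder.shared_with


photos = Folder("folder-1", owner="alice")
print(can_access(photos, "bob"))  # False
share_folder(photos, "bob")
print(can_access(photos, "bob"))  # True
```

The cost of sharing is the same whether the folder holds one file or ten thousand, because only the access record changes.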
📈 Scaling and Reliability
To keep files safe and the system fast:
- Replication: each chunk is stored as several copies on different machines. If one machine dies, the chunk is still safe on the others, so a single failure doesn’t lose data.
- Object storage scales easily: it’s designed to spread a practically unlimited number of files across many machines, so we just keep adding capacity.
- The metadata database can be split (sharded) by user, so each user’s file info lives in a manageable piece.
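Here is one simple way to sketch both sharding and replica placement, assuming hash-based assignment. The function names, the node names, and the default of three copies are illustrative assumptions, not part of the design above.

```python
import hashlib


def _stable_hash(key: str) -> int:
    """Hash that gives the same result on every machine and every run."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def shard_for_user(user_id: str, num_shards: int) -> int:
    """All of a user's file metadata lands on one shard."""
    return _stable_hash(user_id) % num_shards


def replica_nodes(chunk_id: str, nodes: list[str], copies: int = 3) -> list[str]:
    """Pick `copies` distinct machines, starting at a hash-derived position."""
    if copies > len(nodes):
        raise ValueError("not enough machines for the requested number of copies")
    start = _stable_hash(chunk_id) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(copies)]


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
print(shard_for_user("alice", num_shards=8))
print(replica_nodes("chunk-123", nodes))
```

Plain modulo hashing keeps the sketch short, but changing the number of shards remaps most users. Consistent hashing is one common way to reduce that reshuffling.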
🧰 Tech Choices
System design is not just naming the pieces; it’s also saying why you picked each one. Here are the main technology decisions for this system and the reason behind each.
| Decision | Choice | Why |
|---|---|---|
| Store the file bytes | Object storage | Built to hold huge numbers of large files cheaply and safely. |
| Store file info | Database (replicated) | Names, folders, and chunk lists need fast, structured lookups. |
| Tell other devices to sync | Notification / push service | Push changes so devices don’t waste effort polling. |
| Never lose a file | Replicate each chunk | Several copies on different machines survive a failure. |
⚠️ Common Mistakes and Misconceptions
A few things to keep straight:
- “Store each file as one big object.” That makes uploads fragile and syncs wasteful. Chunking lets you re-send and sync only the parts that changed.
- “Sharing copies the file to the other person.” No. Sharing just updates metadata about who can access the existing chunks. No copying needed.
- “One copy of each file is enough.” Not for reliability. Files are replicated across machines so a single failure never loses your data.
🧩 What You’ve Learned
Nice work. Here’s the recap:
- ✅ Cloud storage uploads files, syncs them across devices, and lets users share them.
- ✅ Files are split into chunks, which makes uploads safer, syncs faster, and storage smaller (via deduplication).
- ✅ The design separates the chunk store (the bytes) from the metadata service (facts about files).
- ✅ Syncing transfers only the changed chunks, and a notification service tells other devices to update.
- ✅ Sharing is a metadata change, and chunks are replicated across machines so files are never lost.
Check Your Knowledge
Test what you learned. Answer each question yourself, then read the explanation.
- 1. Why are files split into chunks?
Why: Chunking lets you re-send only a failed piece, sync only what changed, and store identical chunks once (deduplication).
- 2. What does the metadata service store?
Why: The metadata service holds facts about files. The actual bytes live separately in the chunk store (object storage).
- 3. What happens when you share a folder with a friend?
Why: Sharing just updates who can access the chunks. The data isn't copied, which is why sharing feels instant.
- 4. How does the system make sure files are never lost?
Why: Replication keeps multiple copies of each chunk on different machines, so a single machine failure never loses your data.
🛠️ Design Challenge
Try extending the design yourself. Think each one through before you look up an answer.
Edit conflicts. Two devices edit the same file while offline, then both come back online with different changes. How do you handle that?
Sharing with permission levels. Some people should only view a file, others can edit, and you may want to revoke access later. How do you build that?
Search across files. A user wants to search the text inside their documents. How would you add that?
🚀 What’s Next?
You’ve designed cloud storage. Let’s explore related building blocks.
- Object Storage Explained: the storage type that holds the chunks.
- Database Replication: how we keep copies safe across machines.
Get these down and you’ll see how storage systems stay both huge and safe.