Design a Chat Application (System Design)

Picture this. Alex types “lunch?” and hits send.

A moment later, the message pops up on Riya’s phone across town.
No refreshing, no waiting, no “tap to load new messages”. It just appears.
It feels instant, right? Like the two phones are wired together.

But they’re not wired together at all. There’s a whole system sitting in the middle, quietly catching Alex’s message and rushing it over to Riya. In this lesson we’ll design that system from scratch, the same way you’d be asked to in an interview. We’ll keep it beginner-friendly but correct: not dumbed down so far that it becomes wrong, and not drowning you in jargon either.

🎯 What We’re Building

Let’s be clear about the goal before we draw a single box:

We’re building a 1:1 real-time chat app, like WhatsApp or Messenger, where two people send messages back and forth.
When Alex sends a message, it should land on Riya’s phone almost instantly.
It should work even when Riya’s phone is off or out of network. The message shouldn’t just vanish.
And it should handle huge numbers of users, not just two friends.

We’ll stick to one-on-one chat to keep things focused. Group chat is a great extension, and we’ll touch on it at the end as a challenge.

📋 Requirements

In any system design question, you start by pinning down what the system must do. We split this into two kinds of requirements.

First, the functional requirements, the things the app must actually do for the user:

Send and receive 1:1 messages in real time.
Show delivery and read receipts (those little ticks that say “delivered” and “seen”).
Keep message history, so you can scroll up and read old chats.
Show online status, like “online now” or “last seen at 9 PM”.

Then the non-functional requirements, the qualities the system must have. These are about how well it does the job, not what it does:

Low latency. Messages should feel instant, ideally well under a second.
Reliable delivery. A message must never quietly disappear, even if the other person is offline.
Scalable. It should keep working smoothly whether there are a thousand users or a hundred million.

Always split requirements like this

In an interview, listing functional and non-functional requirements first shows you can think before you build. Functional is “what it does”, non-functional is “how well it does it”. Get into the habit of saying both out loud.

⚡ The Core Challenge: Real-Time Delivery

Here’s the pain that makes chat tricky. The normal way websites work doesn’t fit here at all. Let me show you why:

The usual web pattern is request and response. Your app asks the server for something, the server answers, and the exchange is over. Done.
That’s fine for loading a page, because you decided to ask.
But in chat, the new message comes from someone else. When Riya replies, the server needs to push her message to Alex without Alex asking first. Plain request-response can’t do that.

So how do people try to fix this? The first idea is usually polling. Let’s see why it’s wasteful:

Polling means Alex’s app keeps asking the server “any new messages? any new messages?” every couple of seconds.
Most of the time the answer is “nope”, so it’s a ton of wasted requests for nothing.
And it’s still slow. If you ask every 3 seconds, a message can sit there for up to 3 seconds before you even ask for it.
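
To put rough numbers on that waste, here is a small back-of-the-envelope sketch. The 3-second interval comes from the example above; the one hour of online time and the `polling_cost` helper are assumptions made up for illustration.

```python
def polling_cost(interval_seconds: float, seconds_online: float) -> dict:
    """Rough cost of polling for one user who stays online for `seconds_online`."""
    return {
        # One request per interval, whether or not anything new arrived.
        "requests": seconds_online / interval_seconds,
        # A message lands at a random point between two polls, so on
        # average it waits half the interval, and at worst the whole interval.
        "avg_wait_seconds": interval_seconds / 2,
        "worst_wait_seconds": interval_seconds,
    }


# One user, polling every 3 seconds, online for one hour (assumed figure).
cost = polling_cost(interval_seconds=3, seconds_online=3600)
print(cost)
```

That is 1,200 requests per user per hour, most of them answered with “nope”, and the message still waits 1.5 seconds on average.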

The real fix is to keep a line open. This is where WebSockets come in:

A WebSocket is a persistent, two-way connection between the client and the server. (Client just means the user’s app or browser.)
Persistent means it stays open. The phone connects once, and that line stays alive instead of closing after each message.
Two-way means either side can send at any time. The server can push a message down to you the instant it arrives, without you asking.
So the chat server keeps a live WebSocket connection open for every online user. When a message for Riya shows up, the server already has an open pipe straight to her phone.

That single idea, one open connection per online user, is the heart of every real-time chat system. Hold onto it.
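
Here is a minimal sketch of that idea, assuming the server keeps a simple in-memory table of open connections. There is no real WebSocket library here: the `ConnectionTable` class is hypothetical, and a plain Python callable stands in for “write to this user’s open line”.

```python
from typing import Callable, Dict


class ConnectionTable:
    """Toy stand-in for one chat server's table of open connections."""

    def __init__(self) -> None:
        # user id -> function that writes to that user's open connection
        self.connections: Dict[str, Callable[[str], None]] = {}

    def connect(self, user: str, send: Callable[[str], None]) -> None:
        self.connections[user] = send

    def disconnect(self, user: str) -> None:
        self.connections.pop(user, None)

    def push(self, user: str, text: str) -> bool:
        """Push text down the user's open line. Returns False if there is none."""
        send = self.connections.get(user)
        if send is None:
            return False
        send(text)
        return True


# A list plays the role of Riya's screen; appending to it is "sending".
riya_screen: list = []
table = ConnectionTable()
table.connect("riya", riya_screen.append)
table.push("riya", "lunch?")
print(riya_screen)
```

Notice that `push` never waits for Riya to ask. The server already holds her line, so it just writes to it.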

🏗️ High-Level Design

Now let’s draw the big picture before zooming in. At a high level, the system has a few key pieces:

The chat servers are the machines that hold those open WebSocket connections. Each online user is connected to one of them.
The message store is the database where every message is saved, so history survives and offline messages aren’t lost.
And there’s a way to route a message to the right chat server, the one holding Riya’s connection. We’ll see why that matters once there’s more than one server.

Here’s how Alex’s message gets to Riya at a glance.
(Diagram: Alex’s app → chat server → message store, and chat server → Riya’s app.)

Read it like this. Alex’s app is connected to a chat server. That server saves the message to the store, then sends it down the open line to Riya’s app. Simple enough at this size. The interesting part is what happens when Riya is offline, and what happens when there are millions of users. Let’s take those one at a time.

📨 How a Message Flows

Let’s follow one message, step by step, from Alex’s thumb to Riya’s screen. Here’s the journey:

Alex types “lunch?” and taps send. The message travels up Alex’s open WebSocket to his chat server.
The server saves the message to the store first. This is important. Save before delivering, so even if something fails next, the message isn’t lost.
Then the server checks: is Riya online? Does she have an open connection right now?
If Riya is online, the server pushes the message straight down her WebSocket. It lands on her screen almost instantly.
If Riya is offline, there’s nobody to push to. So the message just waits safely in the store. The moment Riya reconnects, her app pulls the messages it missed.
And to nudge her, the system sends a push notification to her phone, that little banner that says “Alex: lunch?”. (A push notification is a message the phone’s operating system shows even when the app is closed.)

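(Diagram: after saving the message, the server pushes it if Riya is online; otherwise it stays in the store and a push notification goes out.)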

Save first, deliver second

A common beginner mistake is to deliver the message and then save it. If the save fails after delivery, your history is now wrong. Always write the message to the store before you try to deliver it. Storage is your safety net.
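
The whole flow, save-first rule included, can be sketched in a few lines. This is a toy single-server version under simplifying assumptions: the `ChatService` class and its method names are hypothetical, and plain lists and dicts stand in for the store, the open connections, and the push notification service.

```python
from typing import Callable, Dict, List


class ChatService:
    """Toy single-server chat flow: save first, then deliver or notify."""

    def __init__(self) -> None:
        self.store: List[dict] = []                          # stands in for the message store
        self.online: Dict[str, Callable[[dict], None]] = {}  # user -> open connection
        self.notifications: List[str] = []                   # stands in for push notifications

    def handle_send(self, sender: str, receiver: str, text: str) -> str:
        message = {"sender": sender, "receiver": receiver,
                   "text": text, "status": "sent"}
        self.store.append(message)        # 1. save before anything else
        push = self.online.get(receiver)  # 2. is the receiver connected?
        if push is not None:
            push(message)                 # 3a. online: push down the open line
            return "pushed"
        # 3b. offline: the message waits in the store, and we nudge the phone
        self.notifications.append(f"{sender}: {text}")
        return "stored"

    def fetch_missed(self, receiver: str) -> List[dict]:
        """On reconnect, return messages not yet marked delivered."""
        return [m for m in self.store
                if m["receiver"] == receiver and m["status"] == "sent"]
```

Because the `store.append` line runs before any delivery attempt, a crash in step 3 still leaves the message safely saved for `fetch_missed` to pick up later.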

🗄️ Storing Messages

Every message needs to be saved, and chat apps write a staggering number of messages. So the kind of database matters. Here’s the thinking:

A chat app does mostly simple writes (save this one message) and simple reads (give me the last 50 messages in this chat).
It doesn’t need complicated cross-table queries the way a banking app might.
It does need to handle an enormous volume of writes and grow easily across many machines.
That’s exactly what a NoSQL store is good at. NoSQL just means a database that doesn’t use the rigid table-and-rows model of traditional SQL databases, and many NoSQL stores are built to spread across lots of servers and absorb huge write traffic.

So what does one stored message actually look like? Here are the core fields.

Field	Meaning
`id`	Unique ID for this message
`sender`	Who sent it (Alex)
`receiver`	Who it’s for (Riya)
`text`	The message content
`timestamp`	When it was sent
`status`	sent, delivered, or read
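
As a rough sketch, the fields above could map to a record like this. The field names follow the table; the `Status` enum, the `conversation_key` helper, and the in-memory `last_messages` query are illustrative assumptions, not a real database schema.

```python
import time
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class Status(Enum):
    SENT = "sent"
    DELIVERED = "delivered"
    READ = "read"


@dataclass
class Message:
    sender: str
    receiver: str
    text: str
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    timestamp: float = field(default_factory=time.time)
    status: Status = Status.SENT


def conversation_key(a: str, b: str) -> Tuple[str, ...]:
    """Same key whichever of the two people sent the message."""
    return tuple(sorted((a, b)))


def last_messages(messages: List[Message], a: str, b: str,
                  limit: int = 50) -> List[Message]:
    """The simple read from earlier: the last `limit` messages in one chat."""
    key = conversation_key(a, b)
    chat = [m for m in messages if conversation_key(m.sender, m.receiver) == key]
    chat.sort(key=lambda m: m.timestamp)
    return chat[-limit:] if limit > 0 else []
```

A real store would do the filtering and sorting for you by keeping each conversation’s messages together in time order. The list comprehension here only shows the shape of the query.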

That status field is doing quiet but important work. It’s how the app knows whether to show one tick, two ticks, or blue ticks. Let’s look at that next.

🟢 Online Presence and Receipts

Two small features make chat feel alive: knowing who’s online, and those little ticks on each message. Both work the same basic way, by tracking and broadcasting status.

First, online presence, the “online” and “last seen” labels:

When Riya’s app connects its WebSocket, the server marks her as online and remembers the time.
While she’s connected, her app quietly sends a tiny “still here” signal every so often. (This regular signal is often called a heartbeat.)
When she disconnects or the heartbeats stop, the server marks her offline and saves that moment as her “last seen”.
If Alex has Riya’s chat open, the server can tell his app about the change, so “online” flips to “last seen at 9 PM” on its own.
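
The presence steps above can be sketched like this. The `Presence` class is hypothetical, the 30-second timeout is an assumed value, and times are passed in as plain numbers so the logic is easy to follow.

```python
from typing import Dict, Set


class Presence:
    """Toy presence tracker driven by heartbeats."""

    def __init__(self, timeout_seconds: float = 30.0) -> None:
        self.timeout = timeout_seconds        # assumed value, tune to taste
        self.last_seen: Dict[str, float] = {}  # user -> time of latest signal
        self.connected: Set[str] = set()

    def heartbeat(self, user: str, now: float) -> None:
        """Called on connect and on every 'still here' signal."""
        self.connected.add(user)
        self.last_seen[user] = now

    def disconnect(self, user: str, now: float) -> None:
        """A clean disconnect: remember this moment as 'last seen'."""
        self.connected.discard(user)
        self.last_seen[user] = now

    def is_online(self, user: str, now: float) -> bool:
        """Online means connected and heard from within the timeout."""
        if user not in self.connected:
            return False
        return now - self.last_seen[user] <= self.timeout


presence = Presence(timeout_seconds=30)
presence.heartbeat("riya", now=100)
print(presence.is_online("riya", now=120))  # within the timeout
print(presence.is_online("riya", now=140))  # heartbeats stopped
```

When the heartbeats stop, `last_seen` still holds the time of the final one, which is exactly the value to show as “last seen”.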

Now the receipts, those ticks that track a single message:

When the server saves Alex’s message, it’s marked sent. That’s one tick.
When the message actually reaches Riya’s device, her app sends back a small “got it” signal. The server flips the status to delivered. That’s two ticks.
When Riya opens the chat and sees it, her app sends a “read it” signal. The status becomes read, the blue ticks.
Each time the status changes, the server tells Alex’s app, so his ticks update live.
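
One way to sketch the receipt logic is as a status that only ever moves forward. The forward-only rule is an assumption added here (signals can arrive late or twice on a flaky network), and the `advance_status` helper and tick labels are made up for illustration.

```python
# Statuses in the order a message moves through them.
ORDER = ["sent", "delivered", "read"]

# What the sender's app would draw for each status.
TICKS = {"sent": "one tick", "delivered": "two ticks", "read": "blue ticks"}


def advance_status(current: str, new: str) -> str:
    """Move a status forward only; stale or duplicate signals are ignored."""
    if ORDER.index(new) > ORDER.index(current):
        return new
    return current


status = "sent"
status = advance_status(status, "delivered")  # Riya's device says "got it"
status = advance_status(status, "read")       # Riya opens the chat
status = advance_status(status, "delivered")  # a late duplicate: ignored
print(status, "->", TICKS[status])
```

Without the forward-only check, that late “got it” signal would turn Alex’s blue ticks back into plain double ticks.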

Under the hood, presence and receipts are just little status updates flowing back and forth over those same open connections. Nothing magical, just constant small signals.

📈 Scaling It

Everything so far assumed one chat server. That falls apart fast. A single machine can hold only a limited number of open connections, far fewer than a hundred-million-user app needs. So we run many chat servers, and that creates a fresh puzzle. Let’s walk through it:

With many servers, Alex might be connected to server 1 while Riya is connected to server 7.
When Alex’s message arrives at server 1, server 1 has no open line to Riya. Her line lives on server 7.
So we need a way to answer one question fast: which server is holding Riya’s connection right now?

We solve that with a registry and some routing. Here’s the setup:

A registry keeps track of which user is connected to which server. Think of it as a live phone book: “Riya → server 7”. When Riya connects, her server writes itself into the registry.
So when Alex’s message comes in, server 1 looks up Riya in the registry, sees “server 7”, and forwards the message there. Server 7 then pushes it down Riya’s open line.
A load balancer sits in front of all the chat servers. When a new user connects, it spreads them evenly so no single server gets overloaded. (A load balancer is just a traffic director that hands each incoming connection to one of the servers.)
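
The registry lookup and forward can be sketched as follows. Everything here is in-memory and hypothetical: a plain dict plays the registry, another dict plays the network between servers, and the `ChatServer` class is made up for this example.

```python
from typing import Dict, List, Optional


class ChatServer:
    """Toy chat server that finds other users through a shared registry."""

    def __init__(self, name: str, registry: Dict[str, str],
                 cluster: Dict[str, "ChatServer"]) -> None:
        self.name = name
        self.registry = registry  # shared phone book: user -> server name
        self.cluster = cluster    # stands in for the network between servers
        self.lines: Dict[str, List[str]] = {}  # user -> messages pushed to them
        cluster[name] = self

    def connect(self, user: str) -> None:
        self.lines[user] = []             # pretend this is the open connection
        self.registry[user] = self.name   # write ourselves into the registry

    def deliver_local(self, user: str, text: str) -> None:
        self.lines[user].append(text)

    def send(self, receiver: str, text: str) -> Optional[str]:
        """Look up the receiver's server and forward. None means offline."""
        target = self.registry.get(receiver)
        if target is None:
            return None
        self.cluster[target].deliver_local(receiver, text)
        return target


registry: Dict[str, str] = {}
cluster: Dict[str, ChatServer] = {}
server1 = ChatServer("server-1", registry, cluster)
server7 = ChatServer("server-7", registry, cluster)
server1.connect("alex")
server7.connect("riya")
print(server1.send("riya", "lunch?"))  # name of the server holding Riya's line
```

Server 1 never needs to know about Riya in advance. One registry lookup tells it where to forward the message.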

There’s one more piece for reliability. We add a message queue:

A message queue is a holding line for messages, where they wait safely until the right server is ready to pick them up and deliver them.
Why bother? Because servers can get busy or briefly go down. Instead of dropping a message when the destination server isn’t reachable, you put it in the queue, and it gets delivered as soon as things recover.
This is what lets you promise reliable delivery even when parts of the system hiccup.
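
Here is a minimal sketch of that holding line, using Python’s built-in `deque` as the queue. The `FlakyServer` class and `drain` function are hypothetical; a real system would use a dedicated message queue service rather than an in-process structure.

```python
from collections import deque
from typing import Callable, Deque, List


class FlakyServer:
    """Toy destination server that can be down for a while."""

    def __init__(self) -> None:
        self.up = False
        self.received: List[str] = []

    def deliver(self, message: str) -> bool:
        if not self.up:
            return False  # unreachable: the caller keeps the message
        self.received.append(message)
        return True


def drain(queue: Deque[str], deliver: Callable[[str], bool]) -> int:
    """Try each queued message once; put back any whose delivery failed."""
    delivered = 0
    for _ in range(len(queue)):
        message = queue.popleft()
        if deliver(message):
            delivered += 1
        else:
            queue.append(message)  # still waiting, still safe
    return delivered


queue: Deque[str] = deque(["lunch?", "1pm?"])
server7 = FlakyServer()
print(drain(queue, server7.deliver))  # server is down: nothing delivered
server7.up = True
print(drain(queue, server7.deliver))  # server recovered: queue empties
```

While the server is down, both messages simply stay in the queue in their original order. Nothing is dropped, and nothing is delivered twice.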

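(Diagram: clients connect through a load balancer to many chat servers, which share a registry, a message queue, and the message store.)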

Why chat servers are 'stateful'

Most web servers are stateless, meaning any server can handle any request. Chat servers are different. Each one holds live connections, so it remembers who’s attached to it right now. That memory is called state, which is why we need a registry to find users. This stateful nature is the trickiest part of scaling chat.

🧰 Tech Choices

System design isn’t just naming pieces; it’s saying why you picked each one. Here are the main technology decisions for this system and the reason behind each.

Decision	Choice	Why
Deliver messages instantly	WebSockets	A persistent two-way connection pushes messages the moment they’re sent.
Store messages	Wide-column NoSQL (e.g. Cassandra)	Handles a huge write volume and reads by conversation in time order.
Online status / presence	Redis	Fast, frequently-changing data with automatic expiry.
Deliver to many devices	Message queue / pub-sub	Fans a message out to all of a user’s devices and group members.
Photos and files	Object storage + CDN	Large media stored cheaply and served fast from nearby.

⚠️ Common Mistakes and Misconceptions

A few ideas trip people up the first time they design a chat app. Let’s clear them out:

“Just use HTTP polling.” Polling wastes huge amounts of requests asking “anything new?” and still feels laggy. Real-time chat needs persistent connections like WebSockets, where the server pushes to you the instant a message arrives.
“If the user is offline, drop the message.” Never. Offline is normal; phones lose signal all the time. You save the message and deliver it when they reconnect, plus send a push notification. Reliable delivery is a hard requirement.
“One server can handle everyone.” A single machine can hold only a limited number of open connections, far fewer than a large chat app needs. You need many chat servers, a load balancer to spread users, and a registry to find which server holds a given user.
“Deliver the message, then save it.” Backwards. Save first, so a delivery failure never loses the message.

🛠️ Design Challenge

Try extending the design yourself. Think each one through before you look up a full breakdown.

Group chat. A message now goes to many people, not one. How would you deliver Alex’s message to ten group members, some online and some offline?

Media messages. People send photos and videos, not just text. Would you stuff a 20 MB video into the message itself?

🧩 What You’ve Learned

You can now reason about a real-time chat system end to end. Here’s what you picked up.

✅ Real-time delivery needs WebSockets, a persistent two-way connection, not HTTP polling.
✅ Every message is saved first, then delivered, so nothing is ever lost.
✅ Offline users get the message stored plus a push notification, and it’s delivered when they reconnect.
✅ A NoSQL store fits chat’s huge volume of simple writes and reads.
✅ Presence and receipts are just small status updates flowing over open connections.
✅ You scale with many stateful chat servers, a load balancer, a registry to find users, and a message queue for reliable delivery.

🚀 What’s Next?

You’ve got the chat system in your head. The next two topics build directly on the scaling ideas we just used.

What is Load Balancing? goes deeper into how that traffic director spreads users across servers.
Design a Notification System zooms into those push notifications and how they’re delivered at scale.

Get comfortable with these, and you’ll have a strong grip on the building blocks behind almost every real-time system.
