Design a Notification System (System Design)

You order something online and tap “Place Order”. A second later your phone buzzes with a push notification, and an email lands in your inbox saying “Order confirmed”. Maybe a text message too, with the delivery date.

Feels instant, right? Like the app just fired off all three the moment you tapped.
But behind that little buzz, there’s a whole system working hard.
It has to reach you on the right channels, never lose a message, and survive a Black Friday where millions of people are ordering at once.

In this lesson we’ll design that system from scratch. By the end you’ll be able to walk an interviewer through it with confidence.

🎯 What We’re Building

Let’s be clear about the goal before we draw a single box.

We’re building a notification system, which is the service that sends messages to users when something happens.
It needs to reach people over several channels: push notifications, email, SMS, and in-app alerts. A channel is just one way of reaching the user.
It has to fire off the right notification for the right event, like “order shipped” or “password changed”.
And it has to do all this without slowing down the app that asked for it, even when traffic suddenly spikes.

We’ll keep it beginner-correct. Not so simple that it’s wrong, but not buried in jargon either.

📋 Requirements

Before designing anything, we pin down what it must do. We split this into two buckets.

First, the functional requirements, which is just a fancy way of saying “the things it must actually do”:

Send notifications over many channels: push, email, SMS, and in-app.
Support different events, so “welcome” and “payment failed” can each send their own message.
Respect user preferences, so a user can opt out of, say, marketing emails. Opt out means the user chose not to receive that kind of message.

Then the non-functional requirements, which describe how well it should behave:

Reliable delivery, meaning a message shouldn’t just vanish if something goes wrong for a moment.
Handle huge spikes, like a flash sale where requests jump from a trickle to a flood.
Don’t block the app that’s sending it. The order service should hand off the notification and move on right away.

Always start with requirements

In an interview, the worst thing you can do is start drawing boxes immediately. Nail down what it must do first. It shows you think before you build, and it keeps your design focused.

⚡ The Core Idea: Don’t Send It Inline

Here’s the most important decision in the whole design, so let’s slow down on it.

Imagine the order service tries to send the email itself, right there while you’re waiting for the page to load. We call this sending it inline, meaning inside the same request you’re waiting on.
The problem is, the email provider might be slow that second, or briefly down. Now your “Place Order” button is just spinning, because it’s stuck waiting on email.
Worse, if a million orders come in at once, a million requests all pile onto the email provider at the same instant. Something’s going to crash.

So we don’t do that. Instead we use a message queue, and this is the heart of the design.

A message queue is a buffer that holds tasks so they can be processed later by separate workers. A buffer is just a waiting line that holds things until someone’s ready for them.
The app drops a “send this notification” task into the queue and immediately moves on. It doesn’t wait for the email to actually go out.
Separate programs called workers pick tasks off the queue and do the slow work of calling the providers.

This is called decoupling, which means the part that asks for a notification and the part that actually sends it are no longer glued together. One can be busy or slow without freezing the other.
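
Here's a minimal sketch of that hand-off in Python. It uses the standard library's `queue.Queue` as an in-process stand-in for a real message queue. That is an assumption made for illustration; a production system would use a separate queue service. `place_order` and `slow_send` are hypothetical names, and the 0.2-second sleep only pretends the provider is slow. The point is that the caller returns almost instantly while a worker thread does the slow part later.

```python
import queue
import threading
import time
from typing import Optional

# In-process stand-in for a message queue (assumption: a real system uses a queue service).
tasks: "queue.Queue[Optional[dict]]" = queue.Queue()
sent: list = []

def slow_send(task: dict) -> None:
    """Hypothetical provider call; the sleep pretends the provider is slow."""
    time.sleep(0.2)
    sent.append(task["id"])

def place_order(order_id: str) -> str:
    """Caller side: drop a task on the queue and return right away."""
    tasks.put({"id": f"notif-{order_id}", "event": "order_confirmed"})
    return "order placed"

def worker() -> None:
    """Worker side: pull tasks off the queue and do the slow sending."""
    while True:
        task = tasks.get()
        if task is None:          # sentinel value: shut the worker down
            tasks.task_done()
            return
        slow_send(task)
        tasks.task_done()

t = threading.Thread(target=worker)
t.start()

start = time.perf_counter()
result = place_order("A100")
elapsed = time.perf_counter() - start      # how long the caller actually waited
print(result, f"(caller waited {elapsed:.4f}s)")

tasks.put(None)   # ask the worker to stop once the queue is drained
t.join()
print(sent)       # ['notif-A100']
```

The caller's wait covers only the `put`. The 0.2 seconds of "sending" happen on the worker's time, which is the decoupling described above.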

🏗️ High-Level Design

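Let’s lay out the whole system at a glance, then explain each piece in plain words. A notification passes through five parts, in order: services → notification service → message queue → channel workers → providers.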

Here’s what each piece does:

Services are the apps that have something to say, like the order service or the auth service. When an event happens, they tell the notification service “hey, notify this user”.
Notification service is the front door. It takes the event, figures out what message to build, checks the user’s preferences, and then drops a task on the queue.
Message queue is that waiting line we just talked about. It holds the tasks safely until a worker is free.
Workers are the busy hands. There’s one set per channel, so a push worker, an email worker, and an SMS worker. Each one knows how to talk to its kind of provider.
Providers are the outside companies that actually deliver the message, like Twilio for SMS. The workers call them; they do the last-mile delivery.

Why one worker per channel

Sending an email is nothing like sending an SMS. They use different providers and have different speed limits. Giving each channel its own workers keeps things clean, and lets you scale them on their own. We’ll come back to this when we talk about scaling.

📨 How a Notification Flows

Let’s follow one single notification from start to finish, the way you’d trace it in an interview.

An event happens. Say the order service finishes shipping your order, so it sends an “order shipped” event to the notification service.
The notification service builds the message and checks preferences. It picks a template, fills in your name and order details, then checks whether you’ve opted out of this kind of notification. If you’ve opted out, it just stops here.
It puts the task on the queue. Now the order service is long done; it never had to wait.
A channel worker picks it up. Whichever worker matches the channel, push or email or SMS, grabs the task when it’s free.
The worker calls the provider. It hands the message to the right provider and waits for a yes or no.
It marks the result. If the provider says delivered, mark it sent. If it fails, mark it failed so we can deal with it.
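
The steps above can be sketched in a few lines of Python. Everything here is a hypothetical in-memory stand-in: `TEMPLATES` plays the template store, `OPTED_OUT` the preference store, `status` the results table, and `provider_send` a provider that always succeeds. There is one small difference from the walkthrough: this sketch checks preferences before filling the template. The outcome is the same, and it skips template work for users who opted out.

```python
import queue
import uuid
from typing import Optional

# Hypothetical in-memory stand-ins for the template store, preference store, and status table.
TEMPLATES = {"order_shipped": "Hi {name}, your order {order_id} has shipped!"}
OPTED_OUT = {("u2", "order_shipped")}   # (user_id, event) pairs a user has switched off
status: dict = {}
email_queue: "queue.Queue[dict]" = queue.Queue()

def handle_event(event: str, user: dict, data: dict) -> Optional[str]:
    """Notification service: check preferences, fill the template, enqueue."""
    if (user["id"], event) in OPTED_OUT:
        return None                         # opted out: stop here, nothing is queued
    body = TEMPLATES[event].format(name=user["name"], **data)
    notif_id = str(uuid.uuid4())
    status[notif_id] = "queued"
    email_queue.put({"id": notif_id, "to": user["email"], "body": body})
    return notif_id

def provider_send(msg: dict) -> bool:
    """Hypothetical provider call; this stub always reports success."""
    return True

def email_worker_step() -> None:
    """Channel worker: take one task, call the provider, record the outcome."""
    msg = email_queue.get()
    status[msg["id"]] = "sent" if provider_send(msg) else "failed"
    email_queue.task_done()

alex = {"id": "u1", "name": "Alex", "email": "alex@example.com"}
sam = {"id": "u2", "name": "Sam", "email": "sam@example.com"}

nid = handle_event("order_shipped", alex, {"order_id": "A100"})
email_worker_step()
print(status[nid])                                               # sent
print(handle_event("order_shipped", sam, {"order_id": "B200"}))  # None
```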

So the slow stuff, the actual sending, all happens after the original request is over. That’s the whole point.

🔌 Channels and Providers

Each channel has its own real-world providers and its own best use. A provider is just the outside service that does the delivery for you, so you don’t build phone networks or mail servers yourself.

Channel	Example provider	Used for
Push	FCM (Android), APNs (iPhone)	Quick app alerts, like “Your order shipped”
Email	Amazon SES, SendGrid	Receipts, longer messages, marketing
SMS	Twilio	Codes and urgent alerts, like one-time passcodes (OTPs)
In-app	Your own server + database	The little bell icon inside the app

A couple of names to know: FCM is Firebase Cloud Messaging, Google’s way of pushing to Android phones, and APNs is the Apple Push Notification service for iPhones. The push worker talks to whichever one matches the user’s device.
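
One way to let the push worker "talk to whichever one matches" is a small adapter per provider behind one shared interface, chosen by the device's platform. This is a hypothetical sketch: `PushProvider`, `FakeFCM`, `FakeAPNs`, and `push_worker_send` are illustrative names, and the fakes only record the call. Real adapters would call the actual FCM and APNs services instead.

```python
from typing import Dict, List, Protocol, Tuple

class PushProvider(Protocol):
    """The one method every push adapter must offer (hypothetical interface)."""
    def send(self, device_token: str, title: str, body: str) -> bool: ...

outbox: List[Tuple[str, str, str]] = []   # records which adapter handled each send

class FakeFCM:
    """Stand-in for an FCM client; a real adapter would call Google's service here."""
    def send(self, device_token: str, title: str, body: str) -> bool:
        outbox.append(("fcm", device_token, title))
        return True

class FakeAPNs:
    """Stand-in for an APNs client; a real adapter would call Apple's service here."""
    def send(self, device_token: str, title: str, body: str) -> bool:
        outbox.append(("apns", device_token, title))
        return True

ADAPTERS: Dict[str, PushProvider] = {"android": FakeFCM(), "ios": FakeAPNs()}

def push_worker_send(device: dict, title: str, body: str) -> bool:
    """Pick the adapter that matches the user's device, then hand off the message."""
    return ADAPTERS[device["platform"]].send(device["token"], title, body)

push_worker_send({"platform": "ios", "token": "tok-123"}, "Order shipped", "On its way!")
push_worker_send({"platform": "android", "token": "tok-456"}, "Order shipped", "On its way!")
print(outbox)  # [('apns', 'tok-123', 'Order shipped'), ('fcm', 'tok-456', 'Order shipped')]
```

Because the worker only ever calls `send`, you can add or swap a provider without touching the worker's logic.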

🔁 Reliability

Now, providers fail sometimes. Networks hiccup. So we need the system to keep its promise of reliable delivery even when things go wrong for a moment. A few tricks make that happen.

Retries with backoff. If a send fails, the worker just tries again. But it doesn’t hammer the provider with instant retries. It waits a bit longer each time, like one second, then two, then four. That growing wait is called exponential backoff, and it gives a struggling provider room to recover.
Dead-letter queue. If a message fails over and over even after retries, we don’t loop forever. We move it to a special holding area called the dead-letter queue. A dead-letter queue is just a side queue for messages that couldn’t be delivered, so a human or a job can look at them later without blocking everything else.
Idempotency. Here’s the tricky part: a retry might accidentally send twice. Idempotency means doing the same thing twice has the same effect as doing it once. We give each notification a unique ID and remember which IDs we’ve already sent, so a repeat just gets ignored. No double texts.
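
Here's a minimal sketch that combines the three tricks. It rests on several assumptions. `deliver`, `MAX_ATTEMPTS`, and `BASE_DELAY` are hypothetical names. The in-memory `sent_ids` set and `dead_letters` list stand in for shared storage and a real dead-letter queue. The `sleep` parameter can be swapped out, so the example records the waits instead of actually pausing. The ID check catches a task that reaches a worker a second time, for example when the queue redelivers it.

```python
import time
from typing import Callable, List, Set

MAX_ATTEMPTS = 4      # assumption: 1 try + 3 retries before giving up
BASE_DELAY = 1.0      # seconds; waits grow 1s, 2s, 4s

sent_ids: Set[str] = set()     # idempotency record (a real system would persist this)
dead_letters: List[dict] = []  # stand-in for a dead-letter queue

def deliver(msg: dict, send: Callable[[dict], bool],
            sleep: Callable[[float], None] = time.sleep) -> str:
    """Send one notification with retries, exponential backoff, and a DLQ."""
    if msg["id"] in sent_ids:
        return "duplicate-skipped"              # already delivered: ignore the repeat
    for attempt in range(MAX_ATTEMPTS):
        if send(msg):
            sent_ids.add(msg["id"])
            return "sent"
        if attempt < MAX_ATTEMPTS - 1:
            sleep(BASE_DELAY * 2 ** attempt)    # exponential backoff: 1, 2, 4 ...
    dead_letters.append(msg)                    # out of attempts: park it for later
    return "dead-lettered"

calls = {"n": 0}
def flaky_send(msg: dict) -> bool:
    calls["n"] += 1
    return calls["n"] >= 3                      # fails twice, then succeeds

waits: List[float] = []
r1 = deliver({"id": "n-1"}, flaky_send, sleep=waits.append)
print(r1, waits)    # sent [1.0, 2.0]
r2 = deliver({"id": "n-1"}, flaky_send, sleep=waits.append)
print(r2)           # duplicate-skipped
r3 = deliver({"id": "n-2"}, lambda m: False, sleep=waits.append)
print(r3)           # dead-lettered
```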

Retries without idempotency cause duplicates

If you add retries but forget idempotency, you’ll eventually send the same email twice, or send someone’s “payment received” SMS twice. Always pair retries with a way to recognize a message you’ve already handled.

🎚️ User Preferences and Rate Limits

Reaching users is good. Annoying them is not. So we put guardrails in place.

Opt-outs. Users get to choose what they receive. Maybe Alex wants order updates but no marketing. The notification service checks these preferences before queuing anything, so an opted-out message never even gets sent.
Don’t spam the user. Some events can fire in bursts. You don’t want ten “someone liked your post” pushes in one minute. So we group them into one (“10 people liked your post”), or hold some back, so the user isn’t flooded.
Rate limiting per user. A rate limit is a cap on how many messages a single user gets in a window of time, say no more than five SMS an hour. It protects the user from spam, and it protects you from a bug that would otherwise blast someone a thousand times.
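
A per-user cap like “five SMS an hour” can be sketched with a fixed-window counter. This is an assumption for illustration, not the only option. `FixedWindowLimiter` is a hypothetical class that keeps its counters in a plain dict and takes an injectable clock so the example is deterministic. Two caveats apply. First, a fixed window can let up to twice the limit through around a window boundary. Second, this in-memory version never cleans up old windows. A Redis-backed version would typically `INCR` a per-user, per-window key and `EXPIRE` it, so old counters disappear on their own.

```python
import time
from collections import defaultdict
from typing import Callable, DefaultDict, Tuple

class FixedWindowLimiter:
    """At most `limit` messages per user in each `window`-second window."""

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        # (user_id, window number) -> messages sent in that window
        self.counts: DefaultDict[Tuple[str, int], int] = defaultdict(int)

    def allow(self, user_id: str) -> bool:
        window_no = int(self.clock() // self.window)   # which window "now" falls in
        key = (user_id, window_no)
        if self.counts[key] >= self.limit:
            return False                               # over the cap: hold this one back
        self.counts[key] += 1
        return True

now = [0.0]                                   # fake clock so the example is deterministic
sms_limit = FixedWindowLimiter(limit=5, window=3600, clock=lambda: now[0])
first_hour = [sms_limit.allow("alex") for _ in range(7)]
print(first_hour)    # [True, True, True, True, True, False, False]
sam_ok = sms_limit.allow("sam")
print(sam_ok)        # True: each user has their own counter
now[0] = 3600.0      # the next hour starts
next_hour = sms_limit.allow("alex")
print(next_hour)     # True
```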

📈 Scaling It

This design scales nicely because the queue does the heavy lifting. Let’s see how it handles a sudden flood.

Queues absorb spikes. When a flash sale sends a million events in a minute, they all just line up safely in the queue. The workers drain them as fast as they can. Nothing crashes; the queue acts like a shock absorber.
Add more workers per channel. If email is backing up, you don’t redesign anything. You just add more email workers, and they pull from the same queue in parallel. Twice the workers gives roughly twice the throughput, up to the provider’s own rate limits.
Separate queues per channel. Give push, email, and SMS their own queues. That way a slow SMS provider can’t hold up your fast push notifications. Each channel flows at its own pace.
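
Both ideas can be sketched together: one queue per channel, each drained by its own pool of worker threads. The pool sizes in `WORKERS` and the fake provider speeds in `LATENCY` are made-up values, and the `time.sleep` call stands in for a provider call. Because push has its own queue and workers, it can drain its backlog while the slow SMS queue is still working through its own.

```python
import queue
import threading
import time
from collections import Counter

WORKERS = {"push": 4, "email": 2, "sms": 1}            # hypothetical pool size per channel
LATENCY = {"push": 0.001, "email": 0.01, "sms": 0.05}  # pretend provider speeds (seconds)

queues = {ch: queue.Queue() for ch in WORKERS}          # one queue per channel
delivered: Counter = Counter()
lock = threading.Lock()

def worker(channel: str) -> None:
    q = queues[channel]
    while True:
        msg = q.get()
        if msg is None:                # sentinel: stop this worker
            q.task_done()
            return
        time.sleep(LATENCY[channel])   # stand-in for the provider call
        with lock:
            delivered[channel] += 1
        q.task_done()

threads = [threading.Thread(target=worker, args=(ch,))
           for ch, n in WORKERS.items() for _ in range(n)]
for t in threads:
    t.start()

for i in range(10):
    queues["sms"].put(f"sms-{i}")      # slow backlog on SMS
for i in range(20):
    queues["push"].put(f"push-{i}")    # fast traffic on push

queues["push"].join()                  # waits only for the push queue to drain
print("push drained; SMS delivered so far:", delivered["sms"], "of 10")

for ch, n in WORKERS.items():          # shut every pool down
    for _ in range(n):
        queues[ch].put(None)
for t in threads:
    t.join()
print(delivered["push"], delivered["sms"])  # 20 10
```

Scaling a channel here means changing one number in `WORKERS`. Nothing else in the design moves.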

So the answer to “how do we handle scale” is short and powerful: queue absorbs the spike, then add workers to drain it faster.

🧰 Tech Choices

Part of system design is not just naming pieces, it’s saying why you picked each one. Here are the main technology decisions for this system and the reason behind each.

Decision	Choice	Why
Accept requests fast	Message queue	Take the request and send it later, so callers aren’t blocked.
Send on each channel	Provider adapters (email/SMS/push)	One clean way to talk to each outside provider.
Preferences & dedup	Database + cache	Respect user settings and avoid sending the same thing twice.
Handle failures	Retries + dead-letter queue	Retry sends, and set aside ones that keep failing.
Rate limits	Redis	Fast counters to avoid spamming a user.
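
One common way to read the “Database + cache” row is cache-aside: look in the cache first, and only go to the database on a miss or when the cached copy is too old. This is a minimal sketch under stated assumptions. `PreferenceCache` is a hypothetical class, a dict plays the cache, `load` plays the database query, and the 60-second time-to-live (TTL) is an arbitrary choice. The trade-off is staleness: an opt-out could take up to one TTL to apply, which is why the sketch also offers `invalidate` for when a user changes their settings.

```python
import time
from typing import Callable, Dict, List, Tuple

class PreferenceCache:
    """Cache-aside lookup: serve from cache while fresh, else read the database."""

    def __init__(self, load_from_db: Callable[[str], dict], ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.load_from_db = load_from_db
        self.ttl = ttl
        self.clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}   # user_id -> (stored_at, prefs)

    def get(self, user_id: str) -> dict:
        entry = self._entries.get(user_id)
        if entry is not None and self.clock() - entry[0] < self.ttl:
            return entry[1]                        # cache hit, still fresh
        prefs = self.load_from_db(user_id)         # miss or expired: go to the database
        self._entries[user_id] = (self.clock(), prefs)
        return prefs

    def invalidate(self, user_id: str) -> None:
        """Drop a user's entry, e.g. right after they change their settings."""
        self._entries.pop(user_id, None)

# Hypothetical "database" and a log of how often we hit it.
DB = {"alex": {"order_updates": True, "marketing_email": False}}
db_reads: List[str] = []

def load(user_id: str) -> dict:
    db_reads.append(user_id)
    return DB[user_id]

now = [0.0]
cache = PreferenceCache(load, ttl=60.0, clock=lambda: now[0])
prefs = cache.get("alex")
cache.get("alex")
print(len(db_reads))   # 1: the second lookup came from the cache
now[0] = 61.0          # the cached copy is now older than the TTL
cache.get("alex")
print(len(db_reads))   # 2
```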

⚠️ Common Mistakes and Misconceptions

A few things trip people up on this design. Let’s clear them out.

“Just send it directly in the request.” This is the big one. If the order service calls the email provider inline, a slow provider freezes your app and a spike crushes the provider. Always hand off to a queue.
“We don’t really need retries.” You do. Providers and networks fail briefly all the time. Without retries, every little hiccup means a lost notification.
“Ignore user preferences, just send everything.” That gets your emails marked as spam and your users angry. Check preferences before queuing, every time.
“Retries are safe on their own.” Not without idempotency. A retry can resend a message the provider actually did deliver (say, the provider’s “delivered” reply got lost on the way back), and now your user gets the same thing twice.

🛠️ Design Challenge

Try extending the design yourself. Think each one through on your own before moving on.

Scheduled notifications. A “your trial ends tomorrow” reminder should go at 9am, not the moment the event fires. How would you hold a message until a future time?

Priority. An OTP code must go out before a marketing email. How do you make urgent messages jump the line?

🧩 What You’ve Learned

You can now design a notification system end to end. Here’s what you’ve picked up.

✅ Never send notifications inline; hand them to a message queue so the app isn’t blocked.
✅ A notification service builds the message and checks preferences, then queues it for channel workers.
✅ Each channel (push, email, SMS, in-app) has its own workers and providers like FCM, SES, and Twilio.
✅ Reliability comes from retries with backoff, a dead-letter queue, and idempotency to stop duplicates.
✅ User preferences and per-user rate limits keep you from spamming people.
✅ The queue absorbs spikes, and you scale by adding more workers per channel.

🚀 What’s Next?

You’ve got the core pattern down: queue, workers, retries, preferences. Next, go deeper on the pieces that make it solid at scale.

Rate Limiting Explained breaks down how to cap requests per user so nobody gets spammed and nothing gets overwhelmed.
Design a Chat Application uses many of the same ideas, like queues and real-time delivery, in a system that has to feel instant.

Get comfortable with these and you’ll handle most “design a system that sends messages” interviews with ease.
