Design an E-commerce Platform (System Design)

You’ve shopped online a hundred times, right?

  • You open a store like Amazon, browse some products, and search for the thing you want.
  • You drop a couple of items into your cart, then head to checkout and pay.
  • A moment later you get an order confirmation, and the store quietly sets those items aside for you.

That little flow feels simple from the outside, but behind it sits one of the richest system design interview questions out there. It touches catalogs, carts, orders, payments, and the tricky business of counting stock correctly. So let’s design an e-commerce platform together, step by step.

🎯 What We’re Building

So what exactly is an e-commerce platform? Let’s name it plainly first.

  • An e-commerce platform is an online store where people browse products, add them to a cart, and place orders that get paid for and shipped.
  • A product is one thing for sale, like a pair of shoes, with a name, price, photos, and how many are in stock.
  • A cart is the temporary list of things a shopper picked but hasn’t paid for yet.
  • An order is what that cart becomes once the shopper checks out and pays. It’s now a real commitment to buy.

Now why is this hard? Because money and stock are on the line.

  • If we show the wrong price or lose someone’s cart, shoppers leave.
  • If we take money but don’t record the order, that’s a serious bug.
  • And if two people both buy the last item, somebody’s going to be unhappy.

So our job is to build something that lets people browse and buy reliably, even when a million shoppers show up at once.

📋 Requirements

Before drawing any boxes, a good engineer asks: what must this thing actually do? We split that into two buckets.

  • A functional requirement is a thing the system must do, a feature you can point at.
  • A non-functional requirement is about how well it does those things, like how fast or how reliable it is.

Here’s what our platform must do. These are the functional ones:

  • Let shoppers browse and search products, and see details like price and stock.
  • Let shoppers add and remove items in a cart.
  • Let shoppers place an order and pay for it.
  • Keep track of inventory, so stock goes down when something sells.

And here’s how well it should do them. These are the non-functional ones:

  • It should be highly available, meaning it stays up almost all the time. A store that’s down sells nothing.
  • It must be consistent for orders and inventory. “Consistent” here means everyone sees the same true count of stock, so we never sell the same last item twice.
  • It should scale, so it keeps working as products, shoppers, and orders pile up.
  • It must handle traffic spikes, like a sudden flood of shoppers when a big deal drops.

Always ask before you design

In a real interview, don’t jump straight to drawing boxes. First ask what features matter and roughly how big it needs to be. Pinning down the requirements first is half the score.

🧩 Break It Into Services

Here’s a tempting mistake: build the whole store as one giant program with one big database. It works at first, but as the store grows it becomes a tangled mess where one bug can take everything down.

So instead, we split the store into separate pieces, each doing one job. Each piece is called a service, a small program that owns one slice of the work and keeps its own data. This style of building from many small services is called microservices.

Here’s how we’d carve up our store.

Service   | What it owns
----------|-------------
Catalog   | The list of products: names, prices, descriptions, photos
Cart      | Each shopper’s current cart, before they pay
Order     | Placed orders and their status
Inventory | How many of each product are in stock
Payment   | Charging the shopper and tracking if it worked

The big idea: each service keeps its own database, so they don’t trip over each other.

  • The Catalog service can be tuned for fast reads, while the Order service can be tuned for safe writes. Different jobs, different needs.
  • If the Cart service has a hiccup, people can still browse products. The store doesn’t crash all at once.
  • Teams can work on each service on their own, which keeps everyone from stepping on the same code.

If this split is new to you, the deep dive on Monolith vs Microservices walks through exactly when one big program turns into many small services.

🔎 Product Catalog

Let’s start with the catalog, since that’s what shoppers see first. The catalog is read-heavy, which means most of the traffic is just looking at products, not changing them. Think about it:

  • A product like a popular phone might be viewed millions of times a day, but its price and description change rarely.
  • Reads (people browsing) hugely outnumber writes (a seller editing a product).
  • So we should make reads blazing fast, even if editing a product is a little slower. That’s a fair trade.

Here’s how we make browsing quick:

  • Cache aggressively. A cache is a small, super-fast store, usually in memory, that holds the data people ask for most. We keep the popular products in a cache like Redis, so most views never even touch the main database. If caching is new to you, the Introduction to Caching lesson covers exactly this trick.
  • Use a search index. When someone types “running shoes”, we don’t want to scan every product one by one. A search index is a special structure, built by a tool like Elasticsearch, that finds matching products almost instantly. Think of it like the index at the back of a book: it jumps you straight to the right page.
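To make the “index at the back of a book” idea concrete, here’s a minimal sketch of an inverted index in plain Python. It is a toy, not how Elasticsearch works internally: the `SearchIndex` class, its methods, and the sample products are all made up for illustration, and it only matches whole words.

```python
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words."""
    return text.lower().split()


class SearchIndex:
    """Toy inverted index: maps each word to the IDs of products containing it."""

    def __init__(self) -> None:
        self._postings: dict[str, set[int]] = defaultdict(set)

    def add(self, product_id: int, name: str) -> None:
        for word in tokenize(name):
            self._postings[word].add(product_id)

    def search(self, query: str) -> set[int]:
        """Return IDs of products whose name contains every word in the query."""
        words = tokenize(query)
        if not words:
            return set()
        result = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            result &= self._postings.get(word, set())
        return result


index = SearchIndex()
index.add(1, "Trail Running Shoes")
index.add(2, "Running Shorts")
index.add(3, "Leather Dress Shoes")
print(index.search("running shoes"))  # {1}
```

The search never looks at product 2 or 3 as whole records. It only intersects two small sets of IDs, which is why lookups stay fast as the catalog grows.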

Why caching fits the catalog so well

Product views are wildly uneven. A small set of popular products gets most of the views. So even a modest cache holding the hottest items can serve a huge chunk of all browsing without ever touching the database.
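Here’s a small sketch of that read path, often called cache-aside. It assumes a plain dictionary standing in for both the cache (Redis in the text) and the catalog database; `ProductCache` and its method names are hypothetical.

```python
from typing import Optional


class ProductCache:
    """Cache-aside reads: check the cache first, fall back to the database on a miss."""

    def __init__(self, database: dict[int, dict]) -> None:
        self._database = database          # stand-in for the catalog database
        self._cache: dict[int, dict] = {}  # stand-in for a cache such as Redis
        self.database_reads = 0

    def get_product(self, product_id: int) -> Optional[dict]:
        if product_id in self._cache:
            return self._cache[product_id]     # cache hit
        self.database_reads += 1               # cache miss: go to the database
        product = self._database.get(product_id)
        if product is not None:
            self._cache[product_id] = product  # remember it for next time
        return product

    def update_product(self, product_id: int, product: dict) -> None:
        """Write to the database, then drop the cached copy so readers refetch it."""
        self._database[product_id] = product
        self._cache.pop(product_id, None)


catalog = ProductCache({42: {"name": "Phone", "price_cents": 69900}})
for _ in range(1000):
    catalog.get_product(42)
print(catalog.database_reads)  # 1
```

A thousand views cost one database read. The price we pay is on the write side: an edit has to invalidate the cached copy, which is the “editing is a little slower” trade from earlier.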

🛒 Cart

Next comes the cart. A cart is per-user, meaning each shopper has their own, and it changes a lot as they add and remove things. Here’s what’s special about it:

  • A cart isn’t a permanent record like an order. It’s a quick scratchpad that lives until checkout.
  • Shoppers expect it to be instant. Adding an item should feel fast, with no waiting.
  • It changes far more often than the catalog, since people fiddle with their carts constantly.

So where do we keep carts? In a fast store, often Redis.

  • Redis is an in-memory store, which means it keeps data in fast RAM instead of slower disk, so a read or write typically takes well under a millisecond inside Redis itself, plus the network hop to reach it.
  • That speed is exactly what a cart needs, since shoppers are clicking and waiting.
  • We can also let each cart expire after a while, so abandoned carts clean themselves up automatically.
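The cart behavior above can be sketched in plain Python. Redis supports per-key expiry natively, so treat this as an imitation of the idea rather than Redis code: `CartStore`, its methods, and the 30-minute lifetime are assumptions for illustration, and the clock is injectable so the example doesn’t have to wait in real time.

```python
import time
from typing import Callable


class CartStore:
    """In-memory carts that expire after a period of inactivity."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # user_id -> (expiry time, {product_id: quantity})
        self._carts: dict[str, tuple[float, dict[str, int]]] = {}

    def _live_items(self, user_id: str) -> dict[str, int]:
        entry = self._carts.get(user_id)
        if entry is None or entry[0] <= self._clock():
            self._carts.pop(user_id, None)  # expired or missing: start fresh
            return {}
        return entry[1]

    def add_item(self, user_id: str, product_id: str, quantity: int = 1) -> None:
        items = self._live_items(user_id)
        items[product_id] = items.get(product_id, 0) + quantity
        # Every write pushes the expiry forward, so active carts stay alive.
        self._carts[user_id] = (self._clock() + self._ttl, items)

    def remove_item(self, user_id: str, product_id: str) -> None:
        items = self._live_items(user_id)
        items.pop(product_id, None)
        if items:
            self._carts[user_id] = (self._clock() + self._ttl, items)
        else:
            self._carts.pop(user_id, None)

    def get_cart(self, user_id: str) -> dict[str, int]:
        return dict(self._live_items(user_id))


now = [0.0]  # fake clock, in seconds
carts = CartStore(ttl_seconds=1800, clock=lambda: now[0])
carts.add_item("alice", "shoes")
carts.add_item("alice", "shoes")
carts.add_item("alice", "socks", 3)
print(carts.get_cart("alice"))  # {'shoes': 2, 'socks': 3}
now[0] = 1801.0                 # 30 minutes pass with no activity
print(carts.get_cart("alice"))  # {}
```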

Carts can be temporary, orders cannot

Losing a cart is annoying but not the end of the world, because the shopper can re-add items. Losing an order means losing money and trust. So we treat carts as fast and disposable, but orders as precious and permanent. That difference shapes where we store each.

🧾 Orders and Payments

This is where things get serious, because now real money moves. When a shopper checks out, the cart turns into an order, and we charge them. Both steps have to be rock solid.

First, orders need consistency. An order has to be all-or-nothing:

  • We don’t want to charge someone but forget to record their order.
  • We don’t want to record an order but forget to set aside the stock.
  • A change that’s all-or-nothing like this, where either everything happens or nothing does, is called a transaction.
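To see all-or-nothing in action inside a single database, here’s a minimal sketch using Python’s built-in `sqlite3` module. The `orders` and `stock` tables and the `place_order` function are made up for this example, and it assumes the product row already exists.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, product TEXT NOT NULL)")
db.execute(
    "CREATE TABLE stock (product TEXT PRIMARY KEY, "
    "quantity INTEGER NOT NULL CHECK (quantity >= 0))"
)
with db:
    db.execute("INSERT INTO stock VALUES ('shoes', 1)")


def place_order(product: str) -> bool:
    """Record the order and take one unit of stock, or do neither."""
    try:
        with db:  # commits if the block succeeds, rolls back if it raises
            db.execute("INSERT INTO orders (product) VALUES (?)", (product,))
            db.execute(
                "UPDATE stock SET quantity = quantity - 1 WHERE product = ?",
                (product,),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint refused to go below zero


print(place_order("shoes"))  # True
print(place_order("shoes"))  # False
print(db.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 1
```

The second call does insert an order row for a moment, but the failed stock update rolls the whole transaction back, so only one order survives.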

Second, payments must be idempotent. Idempotency means doing the same operation twice has the same effect as doing it once.

  • Imagine the shopper’s phone loses signal right after they tap “Pay”, so the app retries the charge. Without protection, we’d charge them twice.
  • With idempotency, each payment carries a unique key. If the same key comes in again, the payment service says “I already did this one” and doesn’t charge again.
  • This is a must for anything touching money. The full idea is in the Idempotency lesson.
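Here’s a rough sketch of the unique-key idea. The `PaymentService` class is hypothetical and keeps its keys in a dictionary; a real payment service would store them durably and make the check-and-record step atomic, for example with a unique constraint on the key.

```python
import uuid


class PaymentService:
    """Remembers the result for each idempotency key so retries never charge twice."""

    def __init__(self) -> None:
        self._results: dict[str, dict] = {}
        self.charges_made = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        if idempotency_key in self._results:
            # Repeat request: replay the stored result instead of charging again.
            return self._results[idempotency_key]
        self.charges_made += 1  # stand-in for the real call to a card processor
        result = {
            "payment_id": str(uuid.uuid4()),
            "amount_cents": amount_cents,
            "status": "charged",
        }
        self._results[idempotency_key] = result
        return result


payments = PaymentService()
key = str(uuid.uuid4())             # the client creates one key per checkout attempt
first = payments.charge(key, 4999)
retry = payments.charge(key, 4999)  # e.g. the app retried after losing signal
print(first == retry, payments.charges_made)  # True 1
```

The key has to come from the client, not the server. Only the client knows that the second request is a retry of the first rather than a brand-new purchase.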

Now here’s the tricky part. The order flow spans several services: Order, Inventory, and Payment. We can’t wrap all of them in one simple transaction, because they each have their own database. So we use a saga, which is a way to run a multi-step process across services where each step can be undone if a later step fails.

  • If the payment fails, the saga undoes the stock reservation, so we don’t keep an item locked for an order that never happened.
  • Each step has a matching “undo” step, called a compensating action.
  • The deep dive lives in the Saga Pattern lesson.

Here’s the order flow as a saga.

Checkout: create order
  ↓
Reserve stock in Inventory
  ↓
Charge shopper in Payment
  ↓
Payment OK?
  • Yes: Confirm order
  • No: Release stock, cancel order

Read it top to bottom: we create the order, reserve the stock, then try to charge. If the charge works, we confirm. If it fails, we release the stock and cancel, leaving things just as they were before.
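The same flow can be sketched as a tiny saga runner. This is an in-process illustration only: in a real system each action would be a call to another service, and `run_saga`, the `Step` tuple, and the step names are assumptions made for the example.

```python
from typing import Callable

# (name, action, compensating action)
Step = tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step]) -> bool:
    """Run steps in order; if one fails, undo the completed ones in reverse order."""
    completed: list[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception as error:
            print(f"{name} failed: {error}")
            for _, _, compensate in reversed(completed):
                compensate()
            return False
        completed.append(step)
    return True


log: list[str] = []


def charge_declined() -> None:
    raise RuntimeError("card declined")


ok = run_saga([
    ("create order", lambda: log.append("order created"), lambda: log.append("order cancelled")),
    ("reserve stock", lambda: log.append("stock reserved"), lambda: log.append("stock released")),
    ("charge shopper", charge_declined, lambda: log.append("charge refunded")),
])
print(ok, log)
# False ['order created', 'stock reserved', 'stock released', 'order cancelled']
```

Notice that the failed step’s own compensation never runs. Only steps that actually finished get undone, and they’re undone newest-first.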

📦 Inventory and the Oversell Problem

Now for the classic interview trap. Picture one item left in stock, and two shoppers both hit “Buy” at the exact same moment. If we’re not careful, both orders go through and we’ve sold something we don’t have. Selling more than you actually have is called the oversell problem.

Why does it happen? Because of timing:

  • Shopper A reads “stock = 1”. A moment later, Shopper B also reads “stock = 1”.
  • Both think there’s one available, so both proceed.
  • Both write “stock = 0”, and now two orders exist for one item. Oops.

So how do we stop it? We make the stock check and the decrease happen as one safe, uninterruptible step.

  • Atomic decrements. “Atomic” means an operation finishes in one indivisible step that nobody can sneak in the middle of. We tell the database “decrease stock by 1, but only if it’s still above zero”. The database does the check and the decrease together, so only one shopper can grab that last item.
  • Reservations. Instead of decreasing stock at the very end, we set the item aside the moment checkout starts. That reserved item is held for a short time while payment happens, then either confirmed or released.
  • Distributed locks. A lock is a way to say “only one process can touch this item right now”. A distributed lock works across many servers, so even with shoppers spread over different machines, only one can update that stock count at a time. The Distributed Locks lesson goes deeper.
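Here’s what the atomic decrement can look like in SQL, sketched with Python’s built-in `sqlite3` module. The `inventory` table and `take_one` function are hypothetical; the important part is that the `stock > 0` check lives inside the same `UPDATE` statement as the decrease.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE inventory (product_id TEXT PRIMARY KEY, stock INTEGER NOT NULL)")
with db:
    db.execute("INSERT INTO inventory VALUES ('last-phone', 1)")


def take_one(product_id: str) -> bool:
    """Decrease stock by 1 only if some is left; report whether we got a unit."""
    with db:
        cursor = db.execute(
            "UPDATE inventory SET stock = stock - 1 "
            "WHERE product_id = ? AND stock > 0",
            (product_id,),
        )
    return cursor.rowcount == 1  # 0 rows changed means the item was already gone


print(take_one("last-phone"))  # True  (Shopper A)
print(take_one("last-phone"))  # False (Shopper B)
```

Compare this with the buggy version from the timeline above, which reads the stock in one step and writes it back in another. Here there is no gap between the two for a second shopper to slip into.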

Overselling is a money and trust problem

If you sell an item you don’t have, you either disappoint a customer by cancelling their order or scramble to find more stock. Both hurt. This is why inventory needs strong consistency, not the relaxed kind we can get away with for, say, product view counts.
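The reservation approach from the list above can be sketched like this. It’s a single-threaded toy with made-up names (`ReservingInventory`, `reserve`, `confirm`, `release`) and an assumed 10-minute hold; a production version would also need the atomicity discussed earlier.

```python
import time
from typing import Callable


class ReservingInventory:
    """Stock for one product, with short-lived holds taken when checkout starts."""

    def __init__(self, stock: int, hold_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._stock = stock
        self._hold_seconds = hold_seconds
        self._clock = clock
        self._holds: dict[str, float] = {}  # order_id -> time the hold expires

    def _release_expired(self) -> None:
        now = self._clock()
        for order_id in [o for o, expires in self._holds.items() if expires <= now]:
            self.release(order_id)

    def reserve(self, order_id: str) -> bool:
        self._release_expired()
        if order_id in self._holds:
            return True  # this order already holds a unit
        if self._stock == 0:
            return False
        self._stock -= 1
        self._holds[order_id] = self._clock() + self._hold_seconds
        return True

    def confirm(self, order_id: str) -> bool:
        """Payment succeeded: the held unit is sold for good."""
        self._release_expired()
        return self._holds.pop(order_id, None) is not None

    def release(self, order_id: str) -> None:
        """Payment failed or timed out: put the unit back on the shelf."""
        if self._holds.pop(order_id, None) is not None:
            self._stock += 1


now = [0.0]  # fake clock, in seconds
inventory = ReservingInventory(stock=1, hold_seconds=600, clock=lambda: now[0])
print(inventory.reserve("order-A"))  # True
print(inventory.reserve("order-B"))  # False: the last unit is held for order-A
now[0] = 601.0                       # order-A never paid
print(inventory.reserve("order-B"))  # True: the expired hold was released
```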

🏗️ High-Level Design

Okay, let’s put the pieces together. When you zoom out, the whole platform is a handful of services sitting behind one front door.

That front door is the API gateway, a single entry point that takes every request from shoppers and routes it to the right service. It’s like the front desk at a big office that sends each visitor to the correct department.

[Diagram: the client (browser or app) talks to the API gateway, which sits in front of the Catalog service + DB + cache, Cart service + Redis, Order service + DB, Inventory service + DB, and Payment service + DB. A message queue sits alongside the services.]

Let’s trace what flows where:

  • The client (a browser or phone app) sends every request to the API gateway.
  • The gateway forwards browsing requests to the Catalog service, which leans on its cache to answer fast.
  • Cart actions go to the Cart service, backed by Redis for speed.
  • Checkout goes to the Order service, which kicks off the saga across Inventory and Payment.
  • The Order service drops follow-up work, like sending a confirmation email, onto a message queue. A message queue is a waiting line for tasks, so slow jobs happen in the background instead of making the shopper wait.

Notice how each service owns its own database. That’s the microservices idea in action.
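The gateway’s routing job can be sketched as a simple lookup. The URL paths and service names below are invented for illustration, and the sketch assumes Inventory and Payment are reached only through the Order service’s saga, so they get no public route.

```python
from typing import Optional

# Hypothetical routing table: public path prefix -> internal service name.
ROUTES = {
    "/products": "catalog-service",
    "/cart": "cart-service",
    "/orders": "order-service",
}


def route(path: str) -> Optional[str]:
    """Return the service that owns the path, or None if no route matches."""
    for prefix, service in ROUTES.items():
        # Match the prefix itself or anything below it, but not "/cartoon".
        if path == prefix or path.startswith(prefix + "/"):
            return service
    return None


print(route("/products/42"))  # catalog-service
print(route("/cart/items"))   # cart-service
print(route("/admin"))        # None
```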

⚡ Handling Flash Sales

Now let’s talk about the scary moment every shopping site dreads. A flash sale is a huge spike in traffic when a big deal drops, like a limited-stock phone going on sale at noon. Suddenly a million shoppers all hit “Buy” in the same few seconds.

This is dangerous for a few reasons:

  • The catalog gets overloaded as everyone loads the same product page at once.
  • The inventory gets pounded as everyone races for the limited stock, which is exactly where overselling loves to happen.
  • The whole system can buckle under the sudden load if we don’t prepare.

Here’s how we survive it:

  • Cache the deal page hard. Since everyone’s looking at the same product, serve it almost entirely from the cache or a CDN, so the database barely feels the crowd.
  • Put a queue in front of checkout. Instead of letting a million checkouts slam the Order service at once, we line them up in a message queue and process them at a steady pace the system can handle.
  • Rate limit. A rate limit caps how many requests one shopper can send in a short window, which blocks bots and stops any one person from flooding the system.
  • Guard inventory carefully. Use the atomic decrements and reservations from earlier, so even with everyone racing, we never sell more than we have.
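Here’s a minimal sketch of the rate limit from the list above, using a fixed-window counter. The `RateLimiter` class and the limit of 3 requests per second are assumptions for the example, and the clock is injectable so we can fast-forward time.

```python
import time
from typing import Callable


class RateLimiter:
    """Fixed-window limiter: at most `limit` requests per shopper per window."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._limit = limit
        self._window = window_seconds
        self._clock = clock
        # shopper_id -> (window number, requests seen in that window)
        self._counts: dict[str, tuple[int, int]] = {}

    def allow(self, shopper_id: str) -> bool:
        window_number = int(self._clock() // self._window)
        seen_window, count = self._counts.get(shopper_id, (window_number, 0))
        if seen_window != window_number:
            count = 0  # a new window started, so the counter resets
        if count >= self._limit:
            return False
        self._counts[shopper_id] = (window_number, count + 1)
        return True


now = [0.0]  # fake clock, in seconds
limiter = RateLimiter(limit=3, window_seconds=1.0, clock=lambda: now[0])
print([limiter.allow("bot") for _ in range(5)])  # [True, True, True, False, False]
print(limiter.allow("alice"))                    # True: limits are per shopper
now[0] = 1.0
print(limiter.allow("bot"))                      # True: a new window has started
```

A fixed window is the simplest design, but it lets a shopper burst right at a window boundary. Sliding windows and token buckets smooth that out at the cost of a little more bookkeeping.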

A flash sale is an inventory stress test

Flash sales are where overselling bugs show up in the worst possible spotlight, with thousands of angry customers watching. So test your atomic stock logic under heavy load before the sale, not after.
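A load test for stock logic can start very small. The sketch below races 200 threads for 10 units and checks that exactly 10 purchases succeed. It uses an in-process `threading.Lock` as a stand-in for the database’s atomic update or a distributed lock, and the `Inventory` class and `stress_test` function are hypothetical.

```python
import threading


class Inventory:
    """Stock counter whose check-and-decrease runs under a lock."""

    def __init__(self, stock: int) -> None:
        self._stock = stock
        self._lock = threading.Lock()

    def take_one(self) -> bool:
        with self._lock:  # check and decrease as one step
            if self._stock > 0:
                self._stock -= 1
                return True
            return False


def stress_test(stock: int, shoppers: int) -> int:
    """Let many threads race for limited stock and count successful purchases."""
    inventory = Inventory(stock)
    results: list[bool] = []
    start = threading.Barrier(shoppers)  # release every thread at the same moment

    def shopper() -> None:
        start.wait()
        results.append(inventory.take_one())  # list.append is thread-safe in CPython

    threads = [threading.Thread(target=shopper) for _ in range(shoppers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sum(results)


sold = stress_test(stock=10, shoppers=200)
print(sold)  # 10
assert sold == 10, "oversold!"
```

A real pre-sale test would hit the actual Inventory service over the network, but the assertion is the same: successful purchases must never exceed the stock you started with.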

📈 Scaling It

Now imagine this store gets huge, with millions of products and orders. One server and one database per service won’t cut it. Here’s how we grow.

  • Cache the catalog. As we keep saying, browsing is read-heavy, so a cache up front soaks up most of those reads before they reach the database.
  • Shard the data. Sharding means splitting one giant database into smaller pieces, called shards, so no single machine holds everything. We can shard products by ID, so different products live on different machines, each handling a slice of the load.
  • Process orders asynchronously with queues. “Asynchronous” means the shopper doesn’t wait for every last step. We confirm the order quickly, then let the message queue handle slower jobs, like emailing a receipt or telling the warehouse to ship, in the background.
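  • Use read replicas. A read replica is an extra copy of a database that only handles reads. Since browsing and order lookups are read-heavy, we point those at replicas and keep the main database free for writes. Replicas can lag a little behind the main database, so anything that must be exact, like the stock check at checkout, still reads from the main one.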
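To make “shard products by ID” concrete, here’s a minimal sketch of hash-based shard selection. The `shard_for` function, the `sku-…` IDs, and the shard count of 4 are all assumptions for illustration.

```python
import hashlib


def shard_for(product_id: str, shard_count: int) -> int:
    """Pick a shard from a stable hash of the ID, so every server agrees on the answer."""
    digest = hashlib.sha256(product_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


for product_id in ["sku-1001", "sku-1002", "sku-1003"]:
    print(product_id, "-> shard", shard_for(product_id, shard_count=4))
```

The sketch uses `hashlib` rather than Python’s built-in `hash()`, because `hash()` on strings is randomized per process and two servers would disagree about where a product lives. One cost of this simple modulo scheme is that changing `shard_count` moves most IDs to a different shard.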

[Diagram: clients reach the API gateway, which fronts the Catalog (cache + read replicas) and the Order service. The diagram also shows a message queue, background workers (email, shipping), and a sharded order DB.]

Put together, this handles enormous load. Reads fly through caches and replicas, heavy jobs move to the background, and we add machines as the store grows.
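The “heavy jobs move to the background” part can be sketched with Python’s standard `queue` and `threading` modules. This is an in-process stand-in for a real message queue, and `place_order`, `worker`, and the receipt text are made up for the example.

```python
import queue
import threading

tasks = queue.Queue()  # stand-in for a real message queue
sent_emails: list[str] = []


def worker() -> None:
    """Background worker: pull tasks off the queue until told to stop."""
    while True:
        task = tasks.get()
        if task is None:  # sentinel: no more work
            break
        # Stand-in for the slow job, such as actually sending an email.
        sent_emails.append(f"Receipt for order {task['order_id']}")


def place_order(order_id: int) -> str:
    """Confirm right away and leave the slow follow-up work to the worker."""
    tasks.put({"order_id": order_id})
    return f"Order {order_id} confirmed"


background = threading.Thread(target=worker)
background.start()
print(place_order(1))  # Order 1 confirmed
print(place_order(2))  # Order 2 confirmed
tasks.put(None)        # shut the worker down once the queue drains
background.join()
print(sent_emails)     # ['Receipt for order 1', 'Receipt for order 2']
```

The shopper’s request returns as soon as the task is on the queue. How long the email takes no longer affects how fast checkout feels.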

🧰 Tech Choices

System design is not just naming pieces; it’s also saying why you picked each one. Here are the main technology decisions for this system and the reason behind each.

Decision            | Choice                             | Why
--------------------|------------------------------------|----
Organize the system | Microservices                      | Catalog, cart, and orders can scale and change independently.
Search products     | Search index (e.g. Elasticsearch)  | Fast search and filtering that a plain database is poor at.
Hold the cart       | Redis                              | Fast, per-user, temporary data.
Orders and payments | Relational database (ACID)         | Money needs transactions that fully succeed or fully fail.
Avoid overselling   | Atomic stock decrement             | One safe step stops two buyers taking the last item.

⚠️ Common Mistakes and Misconceptions

A few things trip people up on this one. Let’s clear them out.

  • “Just use one big database for everything.” Tempting, but it ties all your services together so one slow query or one crash can sink the whole store. Give each service its own data so they can scale and fail on their own.
  • “Overselling is a rare edge case, ignore it.” It’s not rare at all during a sale, when everyone races for limited stock. You must guard inventory with atomic decrements, reservations, or locks, or you’ll sell things you don’t have.
  • “Just charge the card, retries are no big deal.” Without idempotency, a retried payment charges the shopper twice. Every payment needs a unique key so a repeat request is recognized and skipped.
  • “Carts and orders are basically the same.” No. A cart is a fast, throwaway scratchpad. An order is a precious, permanent record involving money. They get stored and protected very differently.
  • “A saga is just one big transaction.” It isn’t. A real transaction is all-or-nothing inside one database. A saga stitches steps across services and undoes earlier steps with compensating actions when a later one fails.

🛠️ Design Challenge

Try extending the design yourself. Think each one through before comparing notes with anyone else.

Recommendations. Show shoppers products they might like, based on what they viewed or bought. Where would this service get its data, and why keep it separate from the catalog?

Reviews and ratings. Let shoppers rate products and leave reviews. Is this read-heavy or write-heavy, and how does that shape where you store them?

Returns. Let a shopper return an item after buying it. How does this affect inventory and payment, and which steps do you reverse?

🧩 What You’ve Learned

You can now design an e-commerce platform from scratch and talk through it clearly. Here’s what you picked up.

  • ✅ The core job: let people browse, cart, and buy reliably, even under heavy load.
  • ✅ Functional vs non-functional requirements, and why you gather them first.
  • ✅ Splitting the store into Catalog, Cart, Order, Inventory, and Payment services, each with its own data.
  • ✅ The catalog is read-heavy, so we cache aggressively and use a search index.
  • ✅ Carts live in a fast store like Redis, since they’re per-user and change a lot.
  • ✅ Orders need consistency, payments must be idempotent, and the checkout flow runs as a saga across services.
  • ✅ The oversell problem, and how atomic decrements, reservations, and distributed locks prevent it.
  • ✅ Surviving flash sales with caching, queues, rate limits, and careful inventory.

Check Your Knowledge

Test what you learned. Answer each question yourself, then compare with the explanation.

  1. Why does each service (Catalog, Cart, Order, Inventory, Payment) keep its own database?

     Why: Separate databases let each service be tuned, scaled, and fail on its own, so a problem in one part doesn’t take everything down.

  2. How do you stop two shoppers from both buying the last item in stock?

     Why: An atomic decrement does the check and the decrease as one indivisible step, so only one shopper can take the last item.

  3. Why must payments be idempotent?

     Why: A unique key per payment lets the service recognize a repeated request and charge the shopper only once.

  4. Why is a saga used for the checkout flow instead of one transaction?

     Why: One transaction can’t cover multiple services’ databases, so a saga sequences the steps and uses compensating actions to undo them on failure.

🚀 What’s Next?

This case study leans hard on two ideas that show up across almost every large system. Go deeper on them next.

  • Monolith vs Microservices explains when one big program should become many small services, the exact split we made for our store.
  • Saga Pattern breaks down how to run a multi-step order flow across services and safely undo it when a step fails.

Once you’re comfortable with those, come back and try the design challenge again. You’ll see the whole platform click into place.
