Saga Pattern Explained

Think about something you do all the time: placing an order online. You click “Buy”, and in the background a few different services spring into action:

  • One service reserves the item in the warehouse, so nobody else grabs your last unit.
  • Another service charges your card.
  • A third service ships the package to your door.

Here’s the catch. In a microservices setup, each of those is a separate service with its own database. So how do you make sure all three either happen together or none of them happen? You don’t want your card charged for something that never ships, right? That awkward in-between is exactly what the saga pattern is built to handle.

🎯 The Problem

Let’s name the real pain first. Then the solution makes sense.

  • In a single app with one database, you’d just wrap all three steps in one database transaction. A transaction is a group of steps that either all succeed together or all get undone together. If you’re fuzzy on this, the ACID and transactions lesson covers it.
  • That all-or-nothing guarantee is great. If the shipping step fails, the database quietly rolls back the charge and the stock reservation, like nothing ever happened.
  • But now move to microservices. Each service has its own separate database. The stock database, the payment database, and the shipping database don’t know about each other.
  • A normal database transaction can only cover one database. It cannot reach across three different services and three different databases. There’s no single “undo” button that spans all of them.

So you’re stuck. You could charge the card, then find out shipping is broken, and now there’s no automatic way to give the money back. We need a pattern that handles this on purpose.
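For contrast, here’s a minimal sketch of the easy case, where all three steps share one database and one transaction covers everything. It uses Python’s built-in sqlite3 module. The table names, the place_order function, and the NOT NULL rule on the address are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical single-database setup: all three tables live in ONE database,
# so a single transaction can cover every step.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stock_reservations (order_id TEXT, item TEXT);
    CREATE TABLE charges (order_id TEXT, amount_cents INTEGER);
    CREATE TABLE shipments (order_id TEXT, address TEXT NOT NULL);
""")


def place_order(order_id, item, amount_cents, address):
    """Run all three steps inside one transaction; return True on success."""
    try:
        # `with conn` commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute("INSERT INTO stock_reservations VALUES (?, ?)", (order_id, item))
            conn.execute("INSERT INTO charges VALUES (?, ?)", (order_id, amount_cents))
            conn.execute("INSERT INTO shipments VALUES (?, ?)", (order_id, address))
        return True
    except sqlite3.IntegrityError:
        return False


def row_counts():
    tables = ("stock_reservations", "charges", "shipments")
    return {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] for t in tables}


ok = place_order("order-1", "widget", 1999, "1 Main St")
failed = place_order("order-2", "widget", 1999, None)  # missing address breaks the shipping step
print(ok, failed, row_counts())
# prints: True False {'stock_reservations': 1, 'charges': 1, 'shipments': 1}
```

The second order leaves no reservation and no charge behind, even though those two inserts ran before the shipping insert failed. That automatic cleanup is exactly what disappears once each table moves into a different service’s database.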

📜 What is the Saga Pattern

So here’s the idea. Instead of one giant transaction across everything, we break the work into a chain of smaller ones.

  • A saga is a way to manage a transaction that spans multiple services by splitting it into a sequence of local transactions, one per service.
  • A local transaction is just a normal transaction that lives inside a single service and its own database. Each service does its own little step and commits it on its own.
  • The steps run in order. Reserve stock, then charge the card, then ship. Each one finishes and saves before the next begins.
  • And here’s the important part. If a step fails partway through, the saga doesn’t have a magic rollback. Instead it runs compensating actions to undo the steps that already succeeded, walking backward through the chain.

So a saga is really two things: a forward path where each service does its bit, and a backward path that cleans up if something goes wrong.
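The forward path and backward path can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production framework. SagaStep, run_saga, and the step names are hypothetical, and each “service call” here is just a function that appends to a log.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]        # forward path: the local transaction
    compensation: Callable[[], None]  # backward path: the hand-written undo


def run_saga(steps: List[SagaStep]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[SagaStep] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            # The failed step committed nothing, so only earlier steps are undone.
            for done in reversed(completed):
                done.compensation()
            return False
        completed.append(step)
    return True


log: List[str] = []


def record(message: str) -> Callable[[], None]:
    return lambda: log.append(message)


def fail_shipping() -> None:
    raise RuntimeError("shipping service unavailable")


steps = [
    SagaStep("reserve stock", record("stock reserved"), record("stock released")),
    SagaStep("charge card", record("card charged"), record("card refunded")),
    SagaStep("ship package", fail_shipping, record("shipment cancelled")),
]
succeeded = run_saga(steps)
print(succeeded, log)
# prints: False ['stock reserved', 'card charged', 'card refunded', 'stock released']
```

Notice the order of the undos: the card is refunded before the stock is released, the reverse of how the steps ran.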

🔄 Compensating Transactions

This backward path is the heart of the pattern, so let’s slow down on it.

  • A compensating transaction is an action that undoes the effect of an earlier step. If you charged the card, the compensation is to refund it. If you reserved stock, the compensation is to release it back.
  • The thing to understand is that this is not a true rollback. A real rollback erases history, like the step never happened. A compensation is a brand-new step that does the opposite of the old one.
  • So a refund doesn’t delete the original charge. It records a new “give the money back” action. The books still show both, the charge and the refund.
  • This means every forward step needs a matching undo step that you design yourself. Reserve stock pairs with release stock. Charge card pairs with refund card. There’s no automatic safety net here; you build the undo.

Undo, not erase

A compensating action makes things right going forward; it doesn’t pretend the past never happened. Think of it like an apology and a refund, not a time machine. This is why some steps are harder to compensate than others (refunding a charge is straightforward, recalling a package that has already left the warehouse is not), and why you design each undo on purpose.
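Here’s one way “undo, not erase” might look in code: a tiny append-only ledger where a refund is a new entry rather than a deletion. LedgerEntry, charge, refund, and balance are hypothetical names for this sketch, and amounts are assumed to be in cents.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LedgerEntry:
    order_id: str
    kind: str          # "charge" or "refund"
    amount_cents: int  # positive for a charge, negative for a refund


ledger: List[LedgerEntry] = []  # entries are only ever appended, never removed


def balance(order_id: str) -> int:
    return sum(e.amount_cents for e in ledger if e.order_id == order_id)


def charge(order_id: str, amount_cents: int) -> None:
    ledger.append(LedgerEntry(order_id, "charge", amount_cents))


def refund(order_id: str) -> None:
    """Compensation: append an opposite entry instead of deleting the charge."""
    outstanding = balance(order_id)
    if outstanding > 0:
        ledger.append(LedgerEntry(order_id, "refund", -outstanding))


charge("order-1", 1999)
refund("order-1")
print([e.kind for e in ledger], balance("order-1"))
# prints: ['charge', 'refund'] 0
```

The balance ends at zero, but both entries stay on the books. Because refund only acts on an outstanding balance, calling it a second time adds nothing.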

⚙️ How It Works

Let’s walk the order example all the way through, both when it works and when it breaks.

When everything goes smoothly, the chain runs forward:

  • The stock service reserves your item and commits that to its own database.
  • The payment service charges your card and commits that.
  • The shipping service ships the package and commits that.

Each step finishes on its own before the next one starts. No single transaction wraps them, but together they get the job done.

Now suppose the charge fails, maybe your card is declined. Stock was already reserved, so we can’t just stop and walk away. We compensate backward:

  • The payment step failed before it committed anything, so there’s nothing to undo on the payment side.
  • We go back to the stock service and run its compensation, releasing the reserved item so someone else can buy it.
  • The order ends in a clean “failed” state, with no stock stuck and no money taken.

Here’s that flow as a picture.

Start order → Reserve stock → Charge card

  • Success: Charge card → Ship package → Order complete
  • Charge fails: Charge card → Compensate: release stock → Order failed cleanly

See how the failure doesn’t just halt things? It triggers an undo for every step that already committed. That’s the saga doing its job.
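To make the declined-card walkthrough concrete, here’s a small sketch with in-memory stand-ins for the stock and payment services. StockService, PaymentService, CardDeclined, place_order, and the fixed price of 1999 cents are all assumptions for illustration; real services would sit behind network calls.

```python
class CardDeclined(Exception):
    """Raised by the hypothetical payment service when a charge is refused."""


class StockService:
    def __init__(self, available: int) -> None:
        self.available = available
        self.reserved: dict = {}  # order_id -> quantity held

    def reserve(self, order_id: str) -> None:
        if self.available < 1:
            raise RuntimeError("out of stock")
        self.available -= 1
        self.reserved[order_id] = 1

    def release(self, order_id: str) -> None:
        # Compensation for reserve(); does nothing if the order holds no stock.
        self.available += self.reserved.pop(order_id, 0)


class PaymentService:
    def __init__(self, declined_cards: set) -> None:
        self.declined_cards = declined_cards
        self.charges: dict = {}  # order_id -> amount in cents

    def charge(self, order_id: str, card: str, amount_cents: int) -> None:
        if card in self.declined_cards:
            raise CardDeclined(card)
        self.charges[order_id] = amount_cents


def place_order(order_id, card, stock, payments, shipped) -> str:
    stock.reserve(order_id)  # step 1; if this raises, nothing has committed yet
    try:
        payments.charge(order_id, card, 1999)  # step 2
    except CardDeclined:
        stock.release(order_id)  # compensate step 1
        return "failed"
    shipped.append(order_id)  # step 3
    return "complete"


stock = StockService(available=1)
payments = PaymentService(declined_cards={"card-bad"})
shipped: list = []
first = place_order("order-1", "card-bad", stock, payments, shipped)
second = place_order("order-2", "card-good", stock, payments, shipped)
print(first, second, stock.available, payments.charges, shipped)
# prints: failed complete 0 {'order-2': 1999} ['order-2']
```

The second order can only succeed because the first order’s compensation put the last unit back. Without the release, that unit would stay stuck in a reservation nobody is going to pay for.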

🎭 Choreography vs Orchestration

Okay so the steps need to run in order and trigger undos on failure. But who’s in charge of coordinating all that? There are two styles, and this is a classic interview point.

  • Choreography means there’s no central boss. Each service reacts to events from the others. The stock service finishes and shouts “stock reserved!”, the payment service hears that and charges the card, and so on. Everyone listens and reacts on their own.
  • Orchestration means there’s a central coordinator, often called an orchestrator. This one component tells each service what to do, in order, and decides when to trigger compensations. The services just follow instructions.

Here’s how the two compare.

Aspect | Choreography | Orchestration
Who’s in charge | No central boss, services react to events | A central coordinator runs the show
How steps trigger | Each service listens for events and acts | The coordinator sends commands one by one
Coupling | Loose, services don’t know each other directly | Services depend on the coordinator
Following the flow | Harder, logic is spread across services | Easier, the flow lives in one place
Best when | Few simple steps | Many steps or complex undo logic

A quick way to remember it

Choreography is like dancers who all know the routine and move off each other’s cues, with no director on stage. Orchestration is like an orchestra with a conductor pointing at each player. Same goal, very different control.
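The two styles can be put side by side in a rough sketch. Everything here is hypothetical: the EventBus class, the event names such as StockReserved, and the command names such as ReserveStock. The bus is a synchronous in-memory toy, where a real system would likely use a message broker.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class EventBus:
    """Toy synchronous event bus: publish() calls every subscriber right away."""

    def __init__(self) -> None:
        self.handlers: Dict[str, List[Callable[[str], None]]] = defaultdict(list)
        self.history: List[str] = []

    def subscribe(self, event: str, handler: Callable[[str], None]) -> None:
        self.handlers[event].append(handler)

    def publish(self, event: str, order_id: str) -> None:
        self.history.append(event)
        for handler in self.handlers[event]:
            handler(order_id)


def wire_choreography(bus: EventBus, card_ok: bool) -> None:
    """Choreography: each service only knows which events it reacts to."""

    def stock_on_order_placed(order_id: str) -> None:
        bus.publish("StockReserved", order_id)

    def payment_on_stock_reserved(order_id: str) -> None:
        bus.publish("CardCharged" if card_ok else "ChargeFailed", order_id)

    def shipping_on_card_charged(order_id: str) -> None:
        bus.publish("PackageShipped", order_id)

    def stock_on_charge_failed(order_id: str) -> None:
        bus.publish("StockReleased", order_id)  # compensation, triggered by an event

    bus.subscribe("OrderPlaced", stock_on_order_placed)
    bus.subscribe("StockReserved", payment_on_stock_reserved)
    bus.subscribe("CardCharged", shipping_on_card_charged)
    bus.subscribe("ChargeFailed", stock_on_charge_failed)


def orchestrate(card_ok: bool) -> List[str]:
    """Orchestration: one coordinator issues every command, in order."""
    commands = ["ReserveStock", "ChargeCard"]
    if not card_ok:
        commands.append("ReleaseStock")  # compensation, decided centrally
        return commands
    commands.append("ShipPackage")
    return commands


bus = EventBus()
wire_choreography(bus, card_ok=False)
bus.publish("OrderPlaced", "order-1")
print(bus.history)
# prints: ['OrderPlaced', 'StockReserved', 'ChargeFailed', 'StockReleased']
print(orchestrate(card_ok=False))
# prints: ['ReserveStock', 'ChargeCard', 'ReleaseStock']
```

Both versions end with the stock released. The difference is where you’d look to understand the flow: in the choreography version you have to read four separate handlers, while in the orchestration version the whole sequence sits in one function.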

⚖️ The Trade-off

Nothing’s free, so let’s be honest about what you give up to get this.

  • What you gain is consistency across services. Even with separate databases, your order, payment, and shipping eventually agree on what happened. No money taken for an unshipped order.
  • What you give up is the instant all-or-nothing feeling. A saga gives you eventual consistency, meaning the system becomes consistent after a short while, once all the steps and any compensations finish. It’s not atomic, which would mean everything flips together in one instant.
  • There’s a window in the middle where things look half-done. Stock is reserved but the card isn’t charged yet. That’s normal for a saga, and your design has to expect it.
  • And it’s just more work. You write every forward step and every matching undo step by hand. More moving parts means more to test and more ways to get it wrong.

So the saga trades simplicity and instant atomicity for the ability to span services at all. When you truly need cross-service consistency, that trade is worth it.
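One common way to live with the half-done window is to give the order an explicit status, so the rest of the system can tell “in flight” from “settled”. The sketch below assumes a hypothetical OrderStatus enum and made-up customer messages; it’s one possible design, not the only one.

```python
from enum import Enum


class OrderStatus(Enum):
    PENDING = "pending"
    STOCK_RESERVED = "stock_reserved"
    PAID = "paid"
    SHIPPED = "shipped"
    COMPENSATING = "compensating"
    FAILED = "failed"


# Only these two states are final; everything else is the in-between window.
SETTLED = {OrderStatus.SHIPPED, OrderStatus.FAILED}


def is_settled(status: OrderStatus) -> bool:
    return status in SETTLED


def customer_message(status: OrderStatus) -> str:
    """What a status page might show; intermediate states are not errors."""
    if status is OrderStatus.SHIPPED:
        return "Your order is on its way."
    if status is OrderStatus.FAILED:
        return "Your order could not be completed."
    return "We're processing your order."


# The states a declined-card order passes through, in order.
declined_path = [
    OrderStatus.PENDING,
    OrderStatus.STOCK_RESERVED,
    OrderStatus.COMPENSATING,
    OrderStatus.FAILED,
]
in_flight = [s.value for s in declined_path if not is_settled(s)]
print(in_flight)
# prints: ['pending', 'stock_reserved', 'compensating']
```

Anything that reads the order during those three states sees unfinished work, so it should show “processing” rather than treat the order as broken or done.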

🌍 Where It’s Used

This isn’t some rare academic thing. You bump into sagas behind a lot of everyday flows.

  • E-commerce orders, exactly like our example. Reserve stock, charge payment, arrange shipping, each its own service.
  • Travel and hotel booking. Book the flight, book the hotel, book the rental car. If the hotel fails, cancel the flight you already booked, which is a compensation.
  • Banking and money transfers, where one account is debited and another credited across different systems.
  • Really any multi-service business process where several services each own a piece of one logical operation, and you need them all to end up agreeing.

⚠️ Common Mistakes and Misconceptions

A few ideas trip people up here. Let’s clear them out.

  • “Just use one transaction across all the services.” You can’t. A normal database transaction lives inside one database. It cannot span three separate services and their separate databases, and that’s the whole reason the saga exists.
  • “Compensation is the same as an automatic rollback.” No. A rollback is the database erasing changes for you. A compensation is a new step you wrote by hand that does the opposite of an earlier step, like a refund undoing a charge.
  • “Once the steps are defined, partial failure takes care of itself.” It doesn’t. Handling the half-done middle is the hard part. You have to plan for a step failing, a compensation failing, or the same event arriving twice. Ignoring partial failure is how sagas leave money stuck.
  • “A saga is atomic like a local transaction.” It isn’t. It’s eventually consistent. There’s a real window where some steps are done and others aren’t, and your system has to be okay with that.
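Two of those partial-failure cases, the same event arriving twice and a compensation failing, can be sketched briefly. RefundHandler, the event_id field, retry, and the flaky_release_stock function are hypothetical. The assumption is that every event carries a unique ID and that a timeout is worth retrying.

```python
class RefundHandler:
    """Idempotent handler: a repeated event_id is ignored, not applied twice."""

    def __init__(self) -> None:
        self.processed_event_ids: set = set()
        self.refunded_cents = 0

    def handle(self, event: dict) -> bool:
        """Apply the refund once; return False for a duplicate delivery."""
        if event["event_id"] in self.processed_event_ids:
            return False
        self.refunded_cents += event["amount_cents"]
        self.processed_event_ids.add(event["event_id"])
        return True


def retry(action, attempts: int):
    """Call action until it succeeds or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of attempts; a real system would flag this for a human


calls = {"count": 0}


def flaky_release_stock() -> str:
    # Simulated compensation that times out twice before succeeding.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("stock service timed out")
    return "released"


handler = RefundHandler()
event = {"event_id": "evt-42", "amount_cents": 1999}
results = [handler.handle(event), handler.handle(event)]  # delivered twice
release_result = retry(flaky_release_stock, attempts=5)
print(results, handler.refunded_cents, release_result, calls["count"])
# prints: [True, False] 1999 released 3
```

The two ideas work together. Retrying a compensation is only safe if running it twice has the same effect as running it once, which is what the event ID check gives you.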

🛠️ Design Challenge

Try this one on your own to test yourself.

Imagine Alex is building a travel booking flow with three services: book the flight, book the hotel, and reserve a rental car. They run in that order. Now answer these:

  • Write down the compensating action for each of the three steps.
  • The car reservation fails. Which compensations run, and in what order?
  • Would you pick choreography or orchestration for this, and why?

If you can answer all three, you’ve got the saga pattern down cold. This is exactly how you’d reason about it in a real design interview.

🧩 What You’ve Learned

You can now explain how to keep a business process consistent across many services. Here’s what you’ve picked up.

  • ✅ A normal database transaction can’t span multiple microservices, because each one has its own database.
  • ✅ A saga breaks a distributed transaction into a sequence of local transactions, one per service.
  • ✅ Compensating actions undo earlier steps on failure, and they’re hand-built undos, not automatic rollbacks.
  • ✅ On failure the saga compensates backward through the steps that already committed.
  • ✅ Choreography uses events with no central boss, while orchestration uses a central coordinator.
  • ✅ Sagas give eventual consistency, not instant atomicity, in exchange for spanning services at all.

Check Your Knowledge

Test what you learned. Answer each question yourself, then compare with the explanation.

  1. Why can't you use one normal database transaction across multiple microservices?

    Why: A normal transaction lives inside a single database, so it cannot span the separate databases that different services own.

  2. What is a compensating transaction in a saga?

    Why: A compensation is a hand-built opposite step, such as refunding a charge, not an automatic rollback that erases history.

  3. What is the main difference between choreography and orchestration?

    Why: In choreography services react to each other's events, while in orchestration one coordinator tells each service what to do.

  4. What kind of consistency does a saga give you?

    Why: A saga is eventually consistent, with a window where some steps are done and others are not before everything settles.

🚀 What’s Next?

Now that you can manage consistency across services, here’s where to go deeper.

  • Challenges of Microservices shows the broader set of problems that come with splitting an app into services, and saga is one answer to them.
  • Event-Driven Architecture explains the events that power choreography-style sagas, and how services talk without a central boss.

Get those two under your belt, and the way modern distributed systems stay consistent will start to feel natural.
