Challenges of Microservices
Microservices sound great on a slide:
- Small services, each one doing its own job, teams shipping fast and independently.
- It all looks clean and tidy on the architecture diagram.
- But then reality shows up. You’re not running one app anymore, you’re running thirty of them at once.
- And suddenly there are problems you never had to think about before.
So here’s the honest take. Microservices solve real problems, but they hand you a fresh set of problems in return. The thing is, you should know those new problems before you sign up for them. Let’s walk through them one by one, in plain language.
🎯 The Big Theme
Before we list the specific pains, let’s get the one idea that explains all of them. Once this clicks, every challenge below makes sense.
- In a single app (a monolith), when one part of your code calls another part, it’s just a normal function call. It happens inside the same program, on the same machine, instantly.
- A monolith just means your whole app is one single program deployed as one unit.
- When you split that app into microservices, those parts now live in separate programs, often on separate machines.
- So a call that used to be a quick in-process call becomes a network call, a message that has to travel over the network to reach another service.
And that one switch, from in-process calls to network calls, is where almost all the trouble comes from. We have a name for it: distributed-system complexity. A distributed system is just one where the pieces run on different machines and talk over a network, and “complexity” here means all the new ways things can go slow, fail, or get out of sync once that network is in the middle.
So keep this in your head as you read. Every challenge below is really the same root cause showing up in a different costume.
🌩️ The Network Is Now in the Way
Let’s start with the most obvious one. The moment your services talk over the network, that network becomes a thing that can hurt you.
- An in-process function call basically never fails and takes almost no time. A network call is the opposite: it can be slow, and it can fail outright.
- Network calls add latency, which is just the delay for a message to travel there and back. When one service waits on another, which waits on yet another, those little delays stack up.
- The network can drop the message, the other service might be down, or it might be too busy to answer in time.
- And here’s the kicker: more services means more connections between them, so more places where something can break. Each new connection is one more failure point.
So in a monolith, if your code runs, the call works. In microservices, you have to assume calls will sometimes fail and plan for it: retries, timeouts, fallbacks. That’s extra work you simply didn’t have before.
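Here is a minimal sketch of what "retries, timeouts, fallbacks" can look like in code. Everything in it is made up for illustration: `ServiceUnavailable`, `call_with_retries`, and the simulated inventory lookup are hypothetical names, and the exception stands in for whatever error a real client raises when a call fails or times out.

```python
import time


class ServiceUnavailable(Exception):
    """Hypothetical error: the remote call failed or timed out."""


def call_with_retries(call, retries=3, base_delay=0.1, fallback=None, sleep=time.sleep):
    """Try `call` up to `retries` times, backing off between attempts.

    Returns `fallback` if every attempt fails.
    """
    for attempt in range(retries):
        try:
            return call()
        except ServiceUnavailable:
            if attempt < retries - 1:
                # Exponential backoff: 0.1s, then 0.2s, then 0.4s, ...
                sleep(base_delay * 2 ** attempt)
    return fallback


# Simulated inventory service that fails twice and then answers.
attempts = {"count": 0}


def flaky_stock_lookup():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ServiceUnavailable("inventory service did not answer in time")
    return 42


def always_down():
    raise ServiceUnavailable("inventory service is down")


# The sleep is replaced with a no-op so the demo runs instantly.
stock = call_with_retries(flaky_stock_lookup, sleep=lambda s: None)
status = call_with_retries(always_down, retries=2, fallback="unknown", sleep=lambda s: None)
print(stock, status)
```

The first lookup succeeds on its third attempt. The second never succeeds, so the caller gets the fallback value instead of an error. In a real service the timeout itself would be set on the network client, and you would only retry calls that are safe to repeat.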
Failure is normal now, not rare
In a distributed system, something is almost always a little bit broken. A service is restarting, a network link is slow, a request times out. You can’t treat failure as a rare accident anymore. You have to design every service expecting the ones it calls to fail sometimes.
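A bit of arithmetic shows why. The sketch below assumes made-up numbers: every call succeeds 99.9% of the time, every hop costs 20 ms, failures are independent, and the calls happen one after another. Under those assumptions, the odds and delays compound as the chain gets longer.

```python
def chain_success_rate(per_call_success: float, hops: int) -> float:
    """Chance that every call in a chain of `hops` sequential calls succeeds,
    assuming each call fails independently of the others."""
    return per_call_success ** hops


def chain_latency_ms(per_hop_ms: float, hops: int) -> float:
    """Total delay when each hop waits for the next one to finish."""
    return per_hop_ms * hops


for hops in (1, 5, 30):
    rate = chain_success_rate(0.999, hops)
    delay = chain_latency_ms(20, hops)
    print(f"{hops} hops: {rate:.2%} success, {delay} ms")
```

With these assumed numbers, a request that passes through 30 calls succeeds only about 97% of the time, even though each individual call looks very reliable.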
🧩 Data Consistency Is Hard
This is the one that surprises people the most, so let’s slow down here.
- In microservices, each service owns its own database. The orders service has its own data, the payments service has its own, the inventory service has its own. They don’t share one big database.
- That separation is on purpose: it keeps services independent. But it creates a real headache when one action needs to touch several services at once.
Let’s make it concrete. Say a customer named Alex places an order. That single click has to do a few things across services:
- The orders service creates the order.
- The payments service charges Alex’s card.
- The inventory service reduces the stock count.
Now here’s the problem:
- In a monolith with one database, you’d wrap all three in a single transaction. A transaction is an all-or-nothing unit of work: either every step succeeds, or none of them do, and the database guarantees that for you.
- But these three services have three separate databases, so no ordinary database transaction can span all of them. (Distributed transactions such as two-phase commit do exist, but they tie services tightly together and are usually avoided in microservices.) So payment might succeed while inventory fails, and now your data disagrees with itself. Alex got charged but the stock never went down.
So how do people deal with this? Two ideas you’ll hear a lot:
- Eventual consistency means the data won’t match across services in the same instant, but it will catch up and become correct shortly after. You give up “everything is in sync right now” in exchange for the system actually working across services.
- A saga is a pattern that strings the steps together one by one, and if a later step fails, it runs “undo” steps for the earlier ones. So if inventory fails, the saga issues a refund to undo the payment. It’s like a transaction stitched together by hand across services.
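The saga idea above can be sketched in a few lines. This is a toy version under stated assumptions: the three services are plain functions, the names (`run_saga`, `StepFailed`, `create_order`, and so on) are hypothetical, and the inventory step is hard-coded to fail so you can watch the undo steps run.

```python
class StepFailed(Exception):
    """Hypothetical error raised when one step of the saga cannot complete."""


def run_saga(steps):
    """Run (action, undo) pairs in order.

    If an action fails, run the undo for every step that already
    completed, in reverse order, and report failure.
    """
    completed_undos = []
    for action, undo in steps:
        try:
            action()
        except StepFailed:
            for undo_step in reversed(completed_undos):
                undo_step()
            return False
        completed_undos.append(undo)
    return True


log = []


def create_order():
    log.append("order created")


def cancel_order():
    log.append("order cancelled")


def charge_card():
    log.append("card charged")


def refund_card():
    log.append("card refunded")


def reserve_stock():
    # Simulated failure in the inventory service.
    raise StepFailed("inventory service rejected the reservation")


def release_stock():
    log.append("stock released")


ok = run_saga([
    (create_order, cancel_order),
    (charge_card, refund_card),
    (reserve_stock, release_stock),
])
print(ok, log)
```

The log ends with the refund and then the cancellation, so Alex is not left charged for an order that never reserved stock. A production saga also has to survive crashes halfway through and cope with an undo step that itself fails, which this sketch does not attempt.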
The takeaway: keeping data correct across services is genuinely hard, and it’s a problem the monolith never made you solve. We’ll go deeper on sagas and eventual consistency in the communication lessons.
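To picture eventual consistency, here is a small sketch. It assumes two in-memory dictionaries standing in for the orders and inventory databases, and a `deque` standing in for a message queue between the services. All names are hypothetical.

```python
from collections import deque

orders_db = {}                     # owned by the orders service
inventory_db = {"widget": 10}      # owned by the inventory service
event_queue = deque()              # stands in for a message broker


def place_order(order_id, item, qty):
    """Orders service: save the order locally, then announce it."""
    orders_db[order_id] = {"item": item, "qty": qty}
    event_queue.append({"type": "order_placed", "item": item, "qty": qty})


def inventory_worker():
    """Inventory service: catch up on any events it has not processed yet."""
    while event_queue:
        event = event_queue.popleft()
        if event["type"] == "order_placed":
            inventory_db[event["item"]] -= event["qty"]


place_order("o-1", "widget", 3)
stock_before_catch_up = inventory_db["widget"]   # order exists, stock not updated yet
inventory_worker()
stock_after_catch_up = inventory_db["widget"]    # inventory has caught up
print(stock_before_catch_up, stock_after_catch_up)
```

Between `place_order` and `inventory_worker`, the two stores disagree: the order is saved but the stock count is stale. Once the worker runs, they agree again. That gap is the "eventual" part.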
🔍 Debugging and Tracing
Now picture something went wrong in production. In a monolith, finding the bug is annoying but doable. In microservices, it gets much harder.
- One request from a user doesn’t stay in one place. It hops from service to service, maybe through five or six of them, before the user gets an answer.
- So when something breaks or gets slow, the question becomes: which service caused it? The logs are scattered across all of them, each on its own machine.
- You can’t just open one log file and read top to bottom anymore. The story of a single request is spread across many places.
To survive this, teams add distributed tracing. Tracing here means tagging each request with a unique ID and following that ID as it travels through every service, so you can see the full path it took and where it slowed down or failed.
- Without tracing, debugging is like trying to follow a conversation when you can only hear every fifth word.
- With it, you can see the whole journey of one request across all the services at once.
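A minimal sketch of the idea, assuming three services modeled as plain functions and a shared list standing in for centralized logs. The header name `X-Trace-Id` and all function names are hypothetical choices for this example, not a standard.

```python
import uuid

TRACE_HEADER = "X-Trace-Id"   # hypothetical header name for this sketch

trace_log = []                # stands in for a centralized log store


def record(service, headers, message):
    """Write a log entry tagged with the request's trace ID."""
    trace_log.append((headers[TRACE_HEADER], service, message))


def inventory_service(headers):
    record("inventory", headers, "stock reserved")


def payments_service(headers):
    record("payments", headers, "card charged")


def orders_service(headers):
    # Reuse the incoming trace ID, or start a new trace at the edge.
    headers = dict(headers)
    headers.setdefault(TRACE_HEADER, uuid.uuid4().hex)
    record("orders", headers, "order received")
    # Pass the same headers along so every service logs the same ID.
    payments_service(headers)
    inventory_service(headers)
    return headers[TRACE_HEADER]


trace_id = orders_service({})
path = [service for tid, service, _ in trace_log if tid == trace_id]
print(trace_id, path)
```

Filtering the logs by one ID rebuilds the request’s path in order. Real tracing tools also record timing for each hop, which is how you find the slow one.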
So the cost here is real: you need extra tooling just to answer questions that were trivial in a monolith. We’ll cover tracing more when we talk about observability.
🚀 Deployment and Ops Overhead
Even when everything works, just running the system day to day is more work. This is the part people underestimate the most.
- One app means one thing to deploy, monitor, and secure. Thirty services means thirty things to deploy, monitor, and secure.
- Each service needs its own pipeline to build and ship it, its own health checks, its own alerts when it misbehaves.
- Services have to find each other on the network, which needs its own setup. And the connections between them need to be secured too.
- Versions get tricky. When you change one service, you have to make sure it still works with all the others that call it, even though they update on their own schedules.
So you end up needing a whole layer of infrastructure tooling just to keep the lights on: containers, orchestration, monitoring dashboards, centralized logging. None of that was necessary when it was all one app.
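The versioning point above is worth a small example. One common way to let services update on their own schedules is for a consumer to read only the fields it needs, ignore fields it doesn’t recognize, and use defaults for fields that older senders don’t include. The sketch below assumes a hypothetical order event where `currency` was added in a later version. The field names and the `"USD"` default are made up.

```python
def parse_order_event(payload: dict) -> dict:
    """Read an order event in a way that tolerates older and newer senders."""
    return {
        "order_id": payload["order_id"],              # required in every version
        "amount_cents": payload["amount_cents"],      # required in every version
        # Hypothetical field added later: default it when an old sender omits it.
        "currency": payload.get("currency", "USD"),
        # Any other fields in the payload are ignored on purpose.
    }


old_sender = {"order_id": "o-1", "amount_cents": 1250}
new_sender = {"order_id": "o-2", "amount_cents": 900, "currency": "EUR", "gift_wrap": True}

print(parse_order_event(old_sender))
print(parse_order_event(new_sender))
```

Both payloads parse without errors, so the consumer keeps working whether the sender has been upgraded or not. This only covers additive changes. Renaming or removing a required field still needs coordination between teams.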
This is why small teams struggle with microservices
A big company can afford a platform team whose whole job is running this infrastructure. A small team of three people usually can’t. For them, all this operational overhead eats the very time they were hoping to save. That’s the trade nobody puts on the slide.
📋 The Challenges at a Glance
Here’s everything in one place, with why each one is actually hard.
| Challenge | Why it’s hard |
|---|---|
| Network calls | In-process calls become network calls that can be slow or fail, and more services means more failure points |
| Data consistency | Each service has its own database, so one action across services can’t use a single transaction |
| Debugging | One request hops through many services, so logs are scattered and you need distributed tracing |
| Deployment | Many services to build, ship, and version, each on its own schedule but still expected to work together |
| Operations | Every service needs its own monitoring, alerts, and security, plus a heavy infrastructure layer to run it all |
🧩 One Click, Many Services
To really feel the complexity, look at what a single user action turns into. Alex clicks “Place Order” once. Here’s where that one click actually goes.
- One click fans out to several services, each making its own network call.
- Each service writes to its own separate database.
- If any one of those steps is slow or fails, the whole action is in trouble, and there’s no single transaction tying them together.
That picture is the challenges of microservices in one diagram. What used to be one function call is now a web of network calls and separate databases.
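The same fan-out can be sketched in code. This version assumes the three calls run concurrently and simulates each service with a short sleep. The service names come from the example above, while `call_service`, the delays, and the forced inventory failure are invented for the demo.

```python
import asyncio


async def call_service(name, delay_s, fail=False):
    """Simulate one network call to another service."""
    await asyncio.sleep(delay_s)   # simulated network latency
    if fail:
        raise ConnectionError(f"{name} service unavailable")
    return f"{name}: ok"


async def place_order():
    # One click fans out into three network calls.
    results = await asyncio.gather(
        call_service("orders", 0.01),
        call_service("payments", 0.02),
        call_service("inventory", 0.01, fail=True),
        return_exceptions=True,    # collect failures instead of raising on the first one
    )
    failed = [str(r) for r in results if isinstance(r, Exception)]
    return results, failed


results, failed = asyncio.run(place_order())
print(results[:2], failed)
```

Two calls succeed and one fails, and nothing rolls the successful ones back automatically. That leftover partial state is exactly what the saga pattern exists to clean up.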
🧠 So Should You Avoid Them?
Fair question after all that. And the answer is no, not at all, but go in with your eyes open.
- Microservices exist because at large scale, the monolith starts to hurt too. Big teams stepping on each other, slow deploys, one bug taking down everything.
- So it’s a trade, not a free win. You’re swapping the monolith’s problems for the distributed system’s problems.
- The usual advice is to start with a monolith. Build your app as one unit first, because it’s simpler and faster to move when you’re small.
- Then split off microservices only when you have a real reason, like a team that’s grown too big to coordinate, or one part of the app that needs to scale on its own.
So the rule of thumb: don’t reach for microservices because they sound modern. Reach for them when the pain of the monolith is real and the cost of all these challenges is worth paying.
⚠️ Common Mistakes and Misconceptions
A few beliefs trip people up. Let’s clear them out.
- “Microservices remove complexity.” They don’t remove it, they move it. The complexity leaves your code and shows up in the network, the data, and the operations between services. Often there’s more of it overall, just spread around.
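- “Just use one big transaction across services.” In practice, you can’t. Each service has its own database, and an ordinary database transaction can’t span separate databases. Distributed transactions such as two-phase commit exist, but they tie services together and are rarely a good fit here. That’s exactly why sagas and eventual consistency exist.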
- “No extra tooling needed, it’s the same as before.” Far from it. You need distributed tracing to debug, centralized logging to read what happened, and orchestration to run everything. The tooling is part of the cost, not an optional extra.
- “More services always means better.” Splitting too early, or too finely, gives you all the overhead with none of the benefit. The number of services should follow real need, not ambition.
🛠️ Design Challenge
Try this on your own to test yourself.
Imagine Alex transfers money from one account to another, and your system has a separate accounts service and a separate ledger service, each with its own database. Walk through what could go wrong.
- What happens if the money leaves Alex’s account but the ledger entry fails to save?
- Since you can’t use one transaction across both databases, how would a saga undo the first step?
- How would you trace this single transfer if it touched both services and something timed out in the middle?
Write down your answers. This is exactly the kind of reasoning a system design interview is looking for.
🧩 What You’ve Learned
You can now explain the real costs of microservices, not just the shiny parts. Here’s what you’ve picked up.
- ✅ Splitting an app turns fast in-process calls into network calls, which brings distributed-system complexity.
- ✅ The network adds latency and failure, and more services means more places to break.
- ✅ Each service has its own database, so cross-service actions can’t use one transaction, which is why we use sagas and eventual consistency.
- ✅ One request hops through many services, so debugging needs distributed tracing.
- ✅ Running many services adds real deployment and operational overhead.
- ✅ The smart move is usually to start with a monolith and split only when there’s a real reason.
Check Your Knowledge
Test what you learned. Answer each question yourself, then read the explanation under it.
1. Why do most challenges of microservices appear once you split a monolith?
   Why: Turning in-process calls into network calls brings distributed-system complexity, which is the root of most of these problems.
2. Why is data consistency hard across microservices?
   Why: Because each service has a separate database, a single all-or-nothing transaction cannot cover an action that touches several services.
3. What does a saga do when a later step in a multi-service action fails?
   Why: A saga strings the steps together and, if one fails, runs undo steps (like a refund) for the earlier ones.
4. What helps you debug a single request that hops through many services?
   Why: Distributed tracing tags each request and follows that ID through every service, so you can see the full path and where it failed.
🚀 What’s Next?
You now know the challenges, so next we look at how services actually deal with them.
- Service Communication shows how services talk to each other over the network and how they handle failures and consistency.
- Benefits of Microservices weighs the other side, so you can decide when the trade is worth making.
Once you’ve got both sides, you’ll be able to argue for or against microservices like someone who’s actually run them.