Service Discovery Explained
Picture a system made of many small services talking to each other. Now here’s the tricky part:
- Services start up, they crash, they restart, and they get moved around all the time.
- Every time that happens, their network address can change. (A network address is just where a service lives, like an IP address and a port.)
- So if the Orders service wants to call the Payments service, how does it even know where Payments is right now?
That’s the question this whole lesson answers. By the end, you’ll know how services find each other automatically, even when nothing stays in one place.
🎯 The Problem
Let’s start with the pain, because that’s what makes the rest click.
- In the old days, you had one big server with a fixed address. You wrote that address down once and it just worked.
- But modern systems run in the cloud and split work into many small services. We call these microservices, small independent services that each do one job.
- In the cloud, instances come and go all the time. An instance is just one running copy of a service. (You often run several copies so you can handle more traffic.)
- When a copy crashes, the system starts a fresh one somewhere else, and that new copy gets a brand new address.
So here’s the trap people fall into:
- They hardcode the address. Hardcoding means you type the exact IP right into the code, like `10.0.1.7`.
- It works on Monday. Then on Tuesday that instance dies, a new one comes up at `10.0.4.2`, and suddenly nothing can reach it.
- Now multiply that by hundreds of services, all moving around. Updating addresses by hand becomes impossible.
We need a way for services to find each other without anyone typing addresses by hand. That’s exactly what service discovery is for.
🧭 What is Service Discovery
So let’s define it plainly.
- Service discovery is the way services automatically find the current network address of other services.
- Instead of asking “what’s the IP of Payments?” and hoping it hasn’t changed, a service just asks “where is Payments right now?” and gets a fresh, correct answer.
- The key word is automatically. Nobody updates a config file by hand. The system keeps track of who’s running and where, all on its own.
Think of it like calling your friend Alex. You don’t memorize Alex’s phone number. You save Alex under their name, and even if the number changes, you just tap the name and it dials the right one. Service discovery is that contacts app for services.
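To make the contacts-app idea concrete, here’s a minimal Python sketch. It isn’t a real discovery system: `current_addresses`, `find`, and port `8080` are made-up names and values for illustration, and a plain dictionary stands in for whatever actually tracks the addresses.

```python
# Hypothetical lookup table standing in for a real discovery system.
current_addresses = {"payments": ("10.0.1.7", 8080)}

# The hardcoded approach: an address frozen into the code when it was written.
HARDCODED_PAYMENTS = ("10.0.1.7", 8080)


def find(service_name):
    """Return the address currently recorded for a service name."""
    return current_addresses[service_name]


# Tuesday: the old instance dies and a new one comes up somewhere else.
current_addresses["payments"] = ("10.0.4.2", 8080)

print("hardcoded:", HARDCODED_PAYMENTS)   # still the old, dead address
print("by name:  ", find("payments"))     # follows the move
```

The caller that looks up by name never changes its code. Only the table behind the name changes.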
📒 The Service Registry
For discovery to work, something has to keep the list of who’s running. That something is the registry.
- A service registry is a live directory where every service writes down its address so others can look it up.
- When a service starts, it tells the registry “hi, I’m Payments, and I’m at this address.” That step is called registering.
- When another service needs Payments, it asks the registry “where’s Payments?” and gets back the current address.
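Registering and looking up can be sketched in a few lines of Python. This is a toy, in-memory version under simple assumptions: the class name `ServiceRegistry`, its methods, and port `8080` are invented for the example, and a real registry would run as its own networked service.

```python
from collections import defaultdict


class ServiceRegistry:
    """Toy in-memory registry: service name -> set of (host, port) addresses."""

    def __init__(self):
        self._instances = defaultdict(set)

    def register(self, name, host, port):
        """Called by an instance when it starts: 'I'm <name>, I'm at <host>:<port>'."""
        self._instances[name].add((host, port))

    def lookup(self, name):
        """Called by a caller: 'where is <name>?'. Returns a sorted list of addresses."""
        return sorted(self._instances.get(name, set()))


registry = ServiceRegistry()
registry.register("payments", "10.0.4.2", 8080)
registry.register("payments", "10.0.4.3", 8080)

print(registry.lookup("payments"))
print(registry.lookup("shipping"))  # nothing registered under this name
```

Notice that the caller only ever passes a name. The addresses live in one place, so there is exactly one thing to keep up to date.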
But there’s a catch. What if a service crashes without telling anyone? The registry would hand out a dead address. So registries use health checks:
- A health check is a small regular ping that asks each service “are you still alive and okay?”
- If a service stops answering, the registry marks it as unhealthy and stops handing out its address.
- This way the directory stays fresh. Callers get addresses that actually work, apart from the short window between a crash and the next failed check.
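Here’s a rough Python sketch of that health-check loop. The names `HealthCheckedRegistry` and `probe` are assumptions made for this example. The `probe` function is passed in so the sketch runs without a network; in a real system it might be an HTTP request to a health endpoint.

```python
class HealthCheckedRegistry:
    """Toy registry that only hands out addresses that passed the last check."""

    def __init__(self, probe):
        # probe(address) -> bool is supplied by whoever creates the registry.
        self._probe = probe
        self._healthy = {}  # address -> result of the most recent check

    def register(self, address):
        self._healthy[address] = True

    def run_health_checks(self):
        """Ping every known instance once and record whether it answered."""
        for address in self._healthy:
            try:
                self._healthy[address] = bool(self._probe(address))
            except Exception:
                # A probe that blows up counts as "not answering".
                self._healthy[address] = False

    def healthy_addresses(self):
        return [addr for addr, ok in self._healthy.items() if ok]


# Simulated world: the set of instances that are really alive right now.
alive = {"10.0.4.2:8080", "10.0.4.3:8080"}

registry = HealthCheckedRegistry(probe=lambda address: address in alive)
registry.register("10.0.4.2:8080")
registry.register("10.0.4.3:8080")

alive.discard("10.0.4.3:8080")   # this instance crashes without saying goodbye
registry.run_health_checks()     # the next check notices

print(registry.healthy_addresses())
```

Between the crash and the next call to `run_health_checks`, the dead address is still listed as healthy. That gap is why real systems tune how often checks run.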
Here’s the whole idea in one picture. Services register themselves, and a caller asks the registry for an address before calling.
Why the registry is the heart of it
Everything in service discovery revolves around the registry. It’s the single source of truth for who’s running and where. Get the registry right, with solid health checks, and the rest more or less takes care of itself.
🔀 Client-Side vs Server-Side Discovery
Okay, so the registry holds the addresses. But who actually does the lookup? There are two common styles, and this is a favorite interview question.
- In client-side discovery, the caller does the work itself. The Orders service asks the registry directly, gets back a list of healthy Payments addresses, and picks one to call.
- In server-side discovery, a middleman does the work. Orders just sends its request to a load balancer or router, and that middleman asks the registry and forwards the call. (A load balancer is a component that spreads requests across several copies of a service.)
The difference is simply where the lookup happens. Let’s lay them side by side.
| Aspect | Client-Side Discovery | Server-Side Discovery |
|---|---|---|
| Who asks the registry | The calling service itself | A load balancer or router in the middle |
| Who picks the instance | The caller picks from the list | The middleman picks for you |
| Caller complexity | Higher, needs discovery logic built in | Lower, caller just sends the request |
| Extra moving parts | Fewer, no middle hop | More, the load balancer is one more thing to run |
| Typical example | Netflix Eureka with a client library | Kubernetes services, AWS load balancers |
Neither one is the winner
Both styles solve the same problem, just in different places. Client-side keeps things simple to run but pushes logic into every service. Server-side keeps services dumb and simple but adds a middleman to manage. Pick based on what your team can maintain.
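The two styles can be put side by side in a small Python sketch. Everything here is illustrative: `REGISTRY`, `send`, `call_client_side`, and `Router` are hypothetical names, `send` only pretends to make a network call, and picking at random is just one possible way to choose an instance.

```python
import random

# Stand-in for the registry: service name -> healthy addresses.
REGISTRY = {"payments": ["10.0.4.2:8080", "10.0.4.3:8080"]}


def send(address, request):
    """Stand-in for a real network call; just reports where the request went."""
    return f"{request} -> {address}"


# Client-side: the caller asks the registry and picks an instance itself.
def call_client_side(service, request):
    instances = REGISTRY[service]
    return send(random.choice(instances), request)


# Server-side: the caller only talks to the router, which does the lookup.
class Router:
    def __init__(self, registry):
        self._registry = registry

    def forward(self, service, request):
        instances = self._registry[service]
        return send(random.choice(instances), request)


router = Router(REGISTRY)

print(call_client_side("payments", "charge order 42"))
print(router.forward("payments", "charge order 42"))
```

The lookup code is nearly identical in both cases. What changes is who owns it: every calling service, or one shared component in the middle.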
⚙️ How It Works
Let’s walk through the full life of a service, from the moment it starts to the moment it disappears.
- A service instance starts up. Say a new copy of Payments boots at address `10.0.4.2`.
- It registers itself. The instance tells the registry “I’m Payments, I’m at `10.0.4.2`, and I’m healthy.”
- The registry keeps checking on it. Health checks ping the instance regularly to confirm it’s still alive.
- Other services discover it. When Orders needs Payments, it asks the registry and gets `10.0.4.2` back, then makes the call.
- When the instance goes away, it leaves the directory. If it shuts down cleanly, it deregisters, meaning it tells the registry “I’m going offline, remove me.”
- And if it dies suddenly without a goodbye? The health check fails, the registry notices it’s not answering, and quietly drops it from the list.
So the directory is always catching up to reality on its own. New copies appear, dead copies vanish, and callers never have to know the messy details.
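The life cycle above can be played out as a small timeline in Python. One caveat: this sketch uses a different but common variant of health checking, where each instance sends heartbeats and the registry ignores entries that have been silent for longer than a time-to-live, instead of the registry pinging instances. The class `TtlRegistry`, the 10-second TTL, and the second address `10.0.4.3:8080` are all assumptions for the example, and time is passed in as a plain number so the run is repeatable.

```python
class TtlRegistry:
    """Toy registry where entries stop counting unless heartbeats keep arriving."""

    def __init__(self, ttl_seconds):
        self._ttl = ttl_seconds
        self._last_seen = {}  # (service, address) -> time of last heartbeat

    def register(self, service, address, now):
        self._last_seen[(service, address)] = now

    def heartbeat(self, service, address, now):
        # A heartbeat just refreshes the timestamp of an existing entry.
        if (service, address) in self._last_seen:
            self._last_seen[(service, address)] = now

    def deregister(self, service, address):
        """Clean shutdown: 'I'm going offline, remove me.'"""
        self._last_seen.pop((service, address), None)

    def lookup(self, service, now):
        """Return addresses whose last heartbeat is within the TTL."""
        return sorted(
            addr
            for (svc, addr), seen in self._last_seen.items()
            if svc == service and now - seen <= self._ttl
        )


registry = TtlRegistry(ttl_seconds=10)

# t=0: two Payments copies start and register.
registry.register("payments", "10.0.4.2:8080", now=0)
registry.register("payments", "10.0.4.3:8080", now=0)

# t=8: only the first copy sends a heartbeat; the second has died silently.
registry.heartbeat("payments", "10.0.4.2:8080", now=8)

at_9 = registry.lookup("payments", now=9)    # crash not noticed yet
at_15 = registry.lookup("payments", now=15)  # silent copy has timed out

# t=16: the first copy shuts down cleanly and deregisters.
registry.deregister("payments", "10.0.4.2:8080")
at_16 = registry.lookup("payments", now=16)

print(at_9, at_15, at_16, sep="\n")
```

At `t=9` the dead copy is still handed out, because its last heartbeat is only 9 seconds old. A shorter TTL shrinks that window but costs more heartbeat traffic.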
🌍 Real Examples
You don’t have to build any of this from scratch. These tools already do it, and you’ll hear their names a lot.
- Consul is a popular registry from HashiCorp. Services register with it, and it runs health checks and even offers DNS-based lookups.
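- etcd is a reliable key-value store often used to hold service info. (A key-value store is a simple database that maps a name to a value.) It isn’t a full discovery system on its own, but it’s a solid base to build one on. Kubernetes uses etcd internally to store its cluster state.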
- Eureka is Netflix’s registry, built for client-side discovery. Services register, and client libraries fetch the list and pick an instance.
- Kubernetes has discovery built right in. When you create a service, Kubernetes gives it a stable internal name, and a built-in DNS-based discovery turns that name into a live address automatically. So your code just calls `payments` and Kubernetes routes it to a healthy copy.
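From the caller’s side, DNS-based discovery is just a name lookup. The sketch below uses Python’s standard `socket.getaddrinfo` to turn a name into addresses. The helper name `resolve_service` is made up for this example, and `payments` with port `8080` is a hypothetical service. Inside a Kubernetes cluster, the name of a regular Service typically resolves to one stable virtual IP, and the cluster forwards traffic from there to a healthy copy.

```python
import socket


def resolve_service(name, port):
    """Resolve a service name to a list of unique (ip, port) pairs using system DNS."""
    infos = socket.getaddrinfo(name, port, type=socket.SOCK_STREAM)
    addresses = []
    for _family, _socktype, _proto, _canonname, sockaddr in infos:
        pair = (sockaddr[0], sockaddr[1])
        if pair not in addresses:
            addresses.append(pair)
    return addresses


# Hypothetical usage from a pod in the same namespace as the service:
#   resolve_service("payments", 8080)
# Outside a cluster that name will not resolve, and getaddrinfo raises
# socket.gaierror, which a real caller would want to handle.
```

The calling code never mentions an IP. If the copies behind the name change, the same call keeps working.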
Kubernetes makes it feel invisible
If you work with Kubernetes, you’re already using service discovery, maybe without realizing it. You call a service by its name and it just works. Behind that simple name sit a registry, health checks, and routing, all doing the hard work.
⚡ Why It Matters
This isn’t just a neat trick. Service discovery is what makes modern, elastic systems even possible.
- It enables autoscaling. Autoscaling means the system adds more copies when traffic is high and removes them when it’s quiet. Each new copy registers itself, so callers can find it as soon as it’s registered and healthy, no config changes needed.
- It enables self-healing. When a copy crashes, a fresh one comes up and registers, while the dead one gets dropped. The system patches itself without anyone waking up at 3 a.m.
- It removes manual address updates. Nobody edits a list of IPs by hand, so a huge source of human error just disappears.
In short, services can come and go freely, and the rest of the system keeps working. That freedom is the whole point of running microservices in the first place.
⚠️ Common Mistakes and Misconceptions
A few ideas trip people up early. Let’s clear them out.
- “Just hardcode the IPs, it’s simpler.” It feels simpler on day one. But the moment an instance moves or scales, those hardcoded addresses break, and you’re back to editing configs by hand across the whole system.
- “The registry never needs health checks.” It does. Without health checks, the registry happily hands out addresses of services that already crashed, so callers fail anyway. Health checks are what keep the directory honest.
- “Stale entries don’t matter.” A stale entry is an address still listed even though the service is gone. Ignore those and your traffic gets routed into a black hole. Good discovery aggressively removes anything that fails its checks.
- “Service discovery is the same as load balancing.” They’re related but not the same. Discovery finds where services are. Load balancing decides which copy to send each request to. They often work together, but they’re different jobs.
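The split between discovery and load balancing is easy to see in code. In this Python sketch, `discover` answers “where are the copies?” and `RoundRobin` answers “which copy gets this request?”. Both names are invented for the example, the registry is a plain dictionary, and round robin is only one of many possible balancing policies.

```python
import itertools


def discover(registry, service):
    """Discovery: return the known addresses for a service (possibly empty)."""
    return list(registry.get(service, []))


class RoundRobin:
    """Load balancing: hand out the given instances one after another, in a loop."""

    def __init__(self, instances):
        if not instances:
            raise ValueError("no instances to balance across")
        self._cycle = itertools.cycle(instances)

    def pick(self):
        return next(self._cycle)


registry = {"payments": ["10.0.4.2:8080", "10.0.4.3:8080", "10.0.4.4:8080"]}

instances = discover(registry, "payments")   # job 1: find the copies
balancer = RoundRobin(instances)             # job 2: choose among them

picks = [balancer.pick() for _ in range(4)]
print(picks)
```

You could swap `RoundRobin` for a random or least-busy picker without touching `discover`, which is a good sign they really are separate jobs.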
🛠️ Design Challenge
Try this on your own to test what you’ve learned.
Imagine you run a shopping site with an Orders service that calls a Payments service, and you autoscale Payments from two copies up to ten during a big sale. Sketch out how discovery handles it:
- How does each new Payments copy join the system so Orders can find it?
- What happens to a copy that crashes mid-sale?
- Would you go client-side or server-side here, and why?
Walk through the registration, the health checks, and the lookups. If you can explain how Orders always reaches a healthy Payments copy without anyone editing a single address, you’ve got it.
🧩 What You’ve Learned
You can now explain how services find each other in a system where nothing stays still. Here’s what you’ve picked up.
- ✅ Service discovery lets services find each other’s current address automatically, so you never hardcode IPs.
- ✅ A service registry is the live directory where services register and others look them up.
- ✅ Health checks remove dead instances, so callers only get addresses that actually work.
- ✅ Client-side discovery puts the lookup in the caller, while server-side puts it in a load balancer or router.
- ✅ Tools like Consul, Eureka, and Kubernetes give you discovery out of the box, and etcd gives you a reliable store to build it on.
- ✅ Discovery is what makes autoscaling and self-healing possible without manual address updates.
Check Your Knowledge
Test what you learned. Try to answer each question yourself before reading the explanation under it.
- 1. What problem does service discovery solve?
  Why: In dynamic systems, addresses keep changing, so discovery finds the current address automatically instead of using hardcoded IPs.
- 2. What is a service registry?
  Why: The registry is the live directory of who is running and where, and it is the heart of service discovery.
- 3. In client-side discovery, who asks the registry and picks an instance?
  Why: Client-side discovery puts the lookup logic in the caller, which asks the registry directly and chooses an instance.
- 4. Why are health checks important in a service registry?
  Why: Without health checks the registry would hand out addresses of crashed instances, so calls would fail.
🚀 What’s Next?
You now understand how services locate each other in a dynamic system. Next, zoom out to see where these services come from and how they’re exposed to the world.
- Monolith vs Microservices explains why we split systems into many small services in the first place.
- API Gateway shows how a single front door routes outside traffic to the right service in the background.
Get these two and you’ll have a clear picture of how a microservices system fits together end to end.