Logging Basics
It’s 2am. Your phone buzzes:
- Users are complaining that checkout is broken.
- You roll out of bed, open your laptop, and stare at the screen.
- The app is running. Nothing is on fire. But something is clearly wrong.
- So where do you even look?
Here’s the thing. Your code already ran, did its job (or failed), and moved on. That moment is gone. The only way to know what happened is if your system wrote down what it was doing as it went. That’s logging. And tonight, those notes are the difference between fixing this in five minutes and guessing for two hours.
🎯 Why Logging
Let’s start with the big word, because you’ll hear it everywhere.
- Observability means understanding what’s happening inside a system just by looking at what it puts out. You can’t open up a running server and watch the code think, right? So the system has to tell you what it’s doing.
- People usually talk about three pillars of observability: logs, metrics, and traces.
- Logs are the first pillar, and honestly the one you’ll reach for most. They’re the running diary of your system.
Why does this matter so much?
- When something breaks, logs are usually the first place you look. They tell you the story of what happened, step by step.
- They work even when nothing is broken. Want to know how many users signed up today, or which API got called the most? Logs can tell you.
- Without them, you’re debugging blind. You’d be poking at a live system hoping to recreate the exact moment it failed.
📝 What is a Log
Let’s nail down the basic unit first.
- A log (or a log line, or a log entry) is a timestamped record of one event that happened in your system.
- “Timestamped” just means it has the date and time stamped on it, so you know when it happened.
- “Event” is anything worth noting: a user logged in, a payment went through, a database call failed, a request came in.
So a single log line might say something like “at 2:01am, user 42 tried to pay and it failed.” That’s it. One event, one moment in time, written down.
Now imagine thousands of these lines, one after another, as your system runs. Read them top to bottom and you get the full story of what your app did. That stream of lines is your log.
A log is just a record, not an alarm
Writing a log doesn’t do anything on its own. It doesn’t page you or fix the problem. It just records the event so a human (or a tool) can read it later. Acting on logs comes afterward, when you search them or set up alerts.
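To make that concrete, here is a minimal sketch of writing one timestamped log line with Python's standard `logging` module. The logger name `checkout_demo` and the in-memory buffer are assumptions made for the example; a real app would usually write to stdout or a file.

```python
import io
import logging

# Collect log output in memory so the example is easy to inspect.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)

# %(asctime)s is the timestamp, %(message)s is the event description.
handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))

logger = logging.getLogger("checkout_demo")  # hypothetical logger name
logger.addHandler(handler)
logger.propagate = False  # keep the demo line out of the root logger

# Record one event. Nothing else happens: no page, no alert, no fix.
logger.error("user 42 tried to pay and it failed")

line = buffer.getvalue().strip()
print(line)
```

The printed line starts with the date and time the event was recorded, followed by the message. That is all a log entry is: one event, one moment, written down.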
🔢 Log Levels
Not every event is equally important, right? A user clicking a button is no big deal. A payment system crashing is a five-alarm fire. So every log line gets tagged with a level that says how serious it is.
This is super useful, because later you can say “just show me the errors” and skip the noise.
Here are the four levels you’ll use ninety percent of the time, from least to most serious.
| Level | What it means | When you’d use it |
|---|---|---|
| DEBUG | Tiny details for developers | “Here’s the exact value of this variable” while you’re hunting a bug |
| INFO | Normal things happening | “User logged in”, “Order placed”. The everyday story. |
| WARN | Something looks off, but it still worked | “Payment was slow”, “Retrying the connection”. A heads-up, not a failure. |
| ERROR | Something actually failed | “Payment failed”, “Database is down”. This needs attention. |
A simple way to remember it:
- DEBUG and INFO are the calm, “everything’s normal” lines.
- WARN is the system tapping you on the shoulder.
- ERROR is the system shouting for help.
Pick a level you can filter by
In production you usually turn off DEBUG because it’s just too much noise, and keep INFO and above. That way your logs stay readable, and when there’s trouble you can zoom straight to the ERROR lines.
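Here is a small sketch of that filtering idea using Python's standard `logging` module. The logger name `levels_demo` and the messages are made up for the example. One detail to note: Python spells the WARN level as `WARNING`.

```python
import io
import logging

# Collect log output in memory so we can look at what was kept.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger = logging.getLogger("levels_demo")  # hypothetical logger name
logger.addHandler(handler)
logger.propagate = False

# Production-style setting: drop DEBUG, keep INFO and above.
logger.setLevel(logging.INFO)

logger.debug("cart_total=49.99")     # filtered out, below INFO
logger.info("Order placed")
logger.warning("Payment was slow")   # Python names this level WARNING
logger.error("Payment failed")

lines = buffer.getvalue().splitlines()
print(lines)
# ['INFO Order placed', 'WARNING Payment was slow', 'ERROR Payment failed']

# "Just show me the errors."
errors_only = [line for line in lines if line.startswith("ERROR")]
print(errors_only)
# ['ERROR Payment failed']
```

The DEBUG line never reaches the output because the logger's level is set to INFO, and the last step shows how the level tag lets you skip straight to the failures.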
🧱 Structured Logging
Okay, so far you might picture logs as plain sentences. And for a long time, that’s exactly what they were. Lines like:
2am - user 42 payment failed because card declined
That reads fine to a human. But here’s the problem:
- A computer can’t easily pull “user 42” out of that sentence. It’s just a blob of text.
- When you have millions of these lines, you can’t ask clean questions like “show me every failed payment for user 42.”
- Searching plain text is slow and messy.
The fix is structured logging. That means writing each log as machine-readable key-value data instead of a plain sentence, usually as JSON. Each piece of information gets its own labeled field.
Here’s that same event as a structured log line.
{ "timestamp": "2026-06-09T02:01:33Z", "level": "ERROR", "service": "payment-api", "event": "payment_failed", "user_id": 42, "amount": 49.99, "reason": "card_declined" }
Let’s read it field by field:
- timestamp is exactly when it happened.
- level is ERROR, so we know it’s serious.
- service says which part of the system this came from.
- event names what happened in one short tag.
- user_id, amount, and reason are the details, each in its own labeled box.
Now a tool can do real work with this. You can ask it “give me every line where level is ERROR and user_id is 42” and get an instant answer. That’s the whole point: structure turns your logs from a wall of text into data you can search and filter.
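A rough sketch of both halves, writing structured lines and then querying them, might look like this in Python. The helper names `make_log` and `query` are invented for the example; a real setup would usually rely on a logging library and a log search tool rather than hand-rolled functions.

```python
import json
from datetime import datetime, timezone

def make_log(level, service, event, **fields):
    """Build one structured log line as a JSON string (hypothetical helper)."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "service": service,
        "event": event,
        **fields,  # extra details, each in its own labeled field
    }
    return json.dumps(entry)

lines = [
    make_log("INFO", "payment-api", "payment_succeeded", user_id=7, amount=12.50),
    make_log("ERROR", "payment-api", "payment_failed",
             user_id=42, amount=49.99, reason="card_declined"),
    make_log("ERROR", "payment-api", "payment_failed",
             user_id=9, amount=5.00, reason="card_declined"),
]

def query(log_lines, **conditions):
    """Return parsed entries whose fields match every condition."""
    entries = (json.loads(line) for line in log_lines)
    return [
        entry for entry in entries
        if all(entry.get(key) == value for key, value in conditions.items())
    ]

# "Give me every line where level is ERROR and user_id is 42."
matches = query(lines, level="ERROR", user_id=42)
print(len(matches), matches[0]["reason"])
```

Because every detail lives in a named field, the query is a plain equality check per field. Doing the same against free-form sentences would mean writing fragile text-matching rules.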
🌐 Centralized Logging
Now picture real life. You don’t run one server, you run many.
- Modern apps are split across lots of machines and services. Maybe one for payments, one for users, one for search, and several copies of each.
- Each one is writing its own logs, to its own little file, on its own machine.
- So when checkout breaks, which box do you check? You don’t even know which machine handled that user’s request.
- SSH-ing into each server one by one to read files? That doesn’t scale past a handful of machines, and it definitely doesn’t work at 2am.
The answer is centralized logging. The idea is simple:
- Every server and service ships its logs into one shared, searchable place.
- Instead of hunting across machines, you open one dashboard and search everything at once.
Here’s the shape of it: all your services push their logs to a central store, and you search and build dashboards on top of it. Now finding that failed payment is one query, not a treasure hunt.
A couple of tools you’ll hear about for this:
- ELK is a popular stack of three open-source tools: Elasticsearch (stores and searches the logs), Logstash (collects and shapes them), and Kibana (the dashboard you look at). People say “ELK” to mean all three together.
- CloudWatch is Amazon’s built-in monitoring service, and its logging side is called CloudWatch Logs. If you run on AWS, your apps can ship logs straight into it.
You don’t need to master these today. Just know the pattern: many services, one searchable home for all their logs.
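To show the pattern without any real infrastructure, here is a toy sketch in Python. `CentralLogStore`, its `ingest` and `search` methods, and the host names are all hypothetical. It is an in-memory stand-in for a real store such as Elasticsearch or CloudWatch, where logs would travel over the network rather than through a method call.

```python
from dataclasses import dataclass, field

@dataclass
class CentralLogStore:
    """In-memory stand-in for a central log store (hypothetical)."""
    entries: list = field(default_factory=list)

    def ingest(self, entry: dict) -> None:
        # In a real system this would be a network call from each machine.
        self.entries.append(entry)

    def search(self, **conditions) -> list:
        # Return every entry whose fields match all the given conditions.
        return [
            entry for entry in self.entries
            if all(entry.get(key) == value for key, value in conditions.items())
        ]

store = CentralLogStore()

# Three services on three machines, all shipping to the same place.
store.ingest({"host": "pay-1", "service": "payment", "level": "ERROR",
              "event": "payment_failed", "user_id": 42})
store.ingest({"host": "user-3", "service": "user", "level": "INFO",
              "event": "profile_loaded", "user_id": 42})
store.ingest({"host": "search-2", "service": "search", "level": "INFO",
              "event": "query_served", "user_id": 7})

# One query across every machine, no SSH required.
failed = store.search(level="ERROR", user_id=42)
print(failed[0]["host"], failed[0]["event"])
# pay-1 payment_failed
```

The query never mentions a machine. The store tells you which host handled the request, which is exactly the question you could not answer when each service kept its own file.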
🆔 Correlation IDs
Here’s a tricky one that comes up the moment you have multiple services.
- One user action, say clicking “Pay”, might bounce through several services. The request hits the payment service, which calls the user service, which calls the fraud-check service.
- Each of those writes its own logs.
- So now you’ve got pieces of one story scattered across three different services. How do you stitch them back together?
The trick is a correlation ID. That’s a single shared id attached to every log line from one request, all the way through.
- When the request first arrives, the system generates a unique id, like abc-123.
- That id gets passed along to every service the request touches.
- Every service stamps abc-123 onto its log lines.
So later, you search for abc-123 and get every log from that one user’s click, across every service, in order. The whole journey of that single request, in one view.
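Here is a minimal sketch of that flow in Python. The three services are plain functions and `log_store` is a simple list standing in for a central log store, so every name here is an assumption made for illustration. In a real system the id would usually travel between services in a request header instead of a function argument.

```python
import uuid

log_store = []  # stand-in for a central log store

def log(service, correlation_id, event):
    """Record one structured log entry stamped with the correlation id."""
    log_store.append(
        {"service": service, "correlation_id": correlation_id, "event": event}
    )

def fraud_check_service(correlation_id):
    log("fraud-check", correlation_id, "fraud_check_passed")

def user_service(correlation_id):
    log("user", correlation_id, "user_loaded")
    fraud_check_service(correlation_id)  # pass the same id along

def payment_service(correlation_id):
    log("payment", correlation_id, "payment_started")
    user_service(correlation_id)  # pass the same id along
    log("payment", correlation_id, "payment_completed")

def handle_request():
    # Generate the id once, when the request first arrives.
    correlation_id = str(uuid.uuid4())
    payment_service(correlation_id)
    return correlation_id

first = handle_request()   # one user's click on "Pay"
second = handle_request()  # a different request, with a different id

# Search by id to get the whole journey of the first request.
journey = [entry for entry in log_store if entry["correlation_id"] == first]
print([entry["service"] for entry in journey])
# ['payment', 'user', 'fraud-check', 'payment']
```

The store holds entries from both requests mixed together, yet filtering on one id pulls out a single request's path through all three services.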
This leads straight into tracing
Following one request across many services is exactly what distributed tracing is built for. Correlation IDs are the simple, do-it-yourself version of that idea. Once you’re comfortable here, the Distributed Tracing lesson takes it further.
⚡ Good Logging Habits
Logging well is a skill. A few habits will save you a lot of pain.
- Log meaningful events. Log the things you’d actually want to know about later: a user signed up, a payment failed, a job finished. Not every tiny step.
- Include context. A line that says “error” tells you nothing. A line that says “payment failed for user 42, card declined” tells you everything. Always add the who, what, and why.
- Never log secrets. No passwords, no credit card numbers, no API keys. We’ll come back to this one, because it’s that important.
- Pick the right level. A failed payment is an ERROR, not an INFO. A retry is a WARN. Getting levels right is what makes filtering useful.
- Use structured logs. Write JSON, not sentences, so your future self can actually search them.
⚠️ Common Mistakes and Misconceptions
A few traps catch almost everyone when they start out. Let’s clear them up.
- “Just log everything to be safe.” Tempting, but no. Logging every tiny thing buries the important lines in noise, and storing all that data costs real money. More logs is not better logs. Log what’s useful.
- “It’s fine to log the password, it’s just for debugging.” Never do this. Logs get stored, copied, and read by lots of people and tools. A password or card number sitting in a log file is a serious security leak. Keep secrets out of logs, always.
- “I’ll just SSH into each box and read the files.” That works for one or two servers. The moment you have ten, or a hundred, or auto-scaling copies that come and go, it falls apart. This is exactly why centralized logging exists.
- “Plain text logs are good enough.” They’re readable, sure, but you can’t search or filter them cleanly at scale. When you’ve got millions of lines, unstructured text is almost useless. Structure it.
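One way to back up the “never log secrets” rule is to mask sensitive fields before a line is written. The sketch below assumes structured entries with field names like `password` and `card_number`; the `SENSITIVE_KEYS` set and the `redact` helper are hypothetical. Treat it as a safety net only: the better habit is to never pass secrets to the logger in the first place.

```python
import json

# Hypothetical field names that must never reach a log file.
SENSITIVE_KEYS = {"password", "card_number", "api_key"}

def redact(entry: dict) -> dict:
    """Return a copy of the entry with sensitive fields masked."""
    return {
        key: ("[REDACTED]" if key in SENSITIVE_KEYS else value)
        for key, value in entry.items()
    }

raw = {
    "level": "ERROR",
    "event": "payment_failed",
    "user_id": 42,
    "card_number": "4111111111111111",  # made-up test value
    "reason": "card_declined",
}

safe_line = json.dumps(redact(raw))
print(safe_line)
```

The context that helps you debug (who, what, why) survives, while the card number is replaced before the line is stored or copied anywhere.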
🛠️ Design Challenge
Try this on your own to test yourself.
Imagine you run a food-delivery app. A user complains that their order was charged twice. You have three services: the app, the payment service, and the order service. Sketch out how logging would help you debug this.
- What would you log at each service, and at what level?
- Which fields would you put in each structured log line?
- How would a correlation ID help you find every log for that one order?
Write down your answers. Walking through a real failure like this is exactly how you’d reason about logging in an interview.
🧩 What You’ve Learned
You can now explain how systems keep a record of what they do. Here’s what you picked up.
- ✅ A log is a timestamped record of one event in your system, and logs are the first pillar of observability.
- ✅ Log levels (DEBUG, INFO, WARN, ERROR) tag how serious an event is so you can filter.
- ✅ Structured logging writes entries as machine-readable JSON, so you can search and filter them.
- ✅ Centralized logging ships logs from every service into one searchable place, using tools like ELK or CloudWatch.
- ✅ Correlation IDs tie together all the logs from a single request across services.
- ✅ Good habits: log meaningful events, include context, never log secrets, and pick the right level.
Check Your Knowledge
Test what you learned. Try to answer each question before reading the explanation.
1. What is a log?
   Answer: A log is a timestamped record of a single event, like a user logging in or a payment failing.
2. Which log level means something actually failed?
   Answer: ERROR marks a real failure that needs attention, like a payment failing or a database being down.
3. Why is structured logging (such as JSON) better than plain text?
   Answer: Structured logs put each detail in a labeled field, so you can run clean queries like every ERROR for a given user.
4. What is a correlation ID used for?
   Answer: A correlation ID is one shared id stamped on every log from a single request, so you can see its full journey in one search.
🚀 What’s Next?
Logs are just the first pillar of observability. Next, you’ll meet the others and go deeper.
- Monitoring Basics shows how metrics and alerts tell you something is wrong before users do.
- Distributed Tracing takes the correlation-ID idea further and follows a single request across your whole system.
Get these three together, logs, metrics, and traces, and you’ll be able to see right inside any running system.