String Compression
Table of Contents + β
This one looks easy. You read a string. You shrink it. But the interviewer is not really watching the shrinking. They want to see if you can walk a string just once and count things as you go. So the real test here is whether you can scan in one pass and keep a count without getting confused.
Let me show you what we are building.
π€ The Problem
You get a string. You have to compress it using run-length style. Run-length means you take each group of the same character that sits together. Then you write the character followed by how many times it repeats. So aaabbc becomes a3b2c1.
Here we keep the count even when it is 1. So a single c becomes c1. I will say this out loud in the interview too. Some versions drop the 1. So always state your rule first.
Input: "aaabbc"Output: "a3b2c1"
Input: "aabcccccaaa"Output: "a2b1c5a3"See the trick? The same character can come back later, like the a at the end. That is a fresh group. We do not jump around the string looking for every a. We only ever compare a character with the one right before it.
π’ The Brute Idea First
A beginner often thinks like this. For each unique character, scan the whole string and count how many times it appears. Then write the character and the total.
This breaks the run-length rule. It would turn aabcccccaaa into one big a5, because it counts all the as together. But run-length wants a2 at the start and a3 at the end. Those are two separate groups. So counting globally is wrong here.
Even if it were allowed, it is slow. For each character you walk the whole string again. That is O(nΒ²) work for nothing.
π The Optimal Idea
We walk the string left to right one time. We keep a running count of the current group. A group is just a run of the same character sitting next to each other.
Here is how we think about it while moving:
- Look at the current character and compare it with the previous one.
- If it is the same, the group continues, so add one to the count.
- If it is different, the old group just ended. Write down the old character and its count. Then start a new group with count 1.
- When the string finishes, the last group is still open, so write it down too.
That last step matters. Many people forget the final group and lose the end of the answer.
β Steps to Compress
These are the steps we will turn into code next.
- If the string is empty, return an empty string right away.
- Start with the first character as the current group. Set count to 1.
- Move from the second character to the end.
- If this character equals the previous one, add 1 to count.
- If it is different, append the previous character and its count to the result. Then reset the current character to this one and count to 1.
- After the loop ends, append the last character and its count.
- Return the built result.
Now let me show the code in all five languages.
# Compress a string using run-length style.def compress(s): if not s: return ""
result = [] cur = s[0] # current group character count = 1 # how many times it repeats
for ch in s[1:]: if ch == cur: count += 1 # same group else: result.append(cur + str(count)) # close old group cur = ch count = 1 result.append(cur + str(count)) # last group return "".join(result)
print(compress("aabcccccaaa"))The output of the above code will be:
a2b1c5a3β±οΈ Time and Space Complexity
We touch every character once, so the time is linear. We build a new string for the answer, so the space depends on how big that answer is.
| Approach | Time | Space |
|---|---|---|
| Count each character globally (wrong + slow) | O(nΒ²) | O(n) |
| Single pass with running count (optimal) | O(n) | O(n) |
Tip
A real interviewer often adds a twist. They ask you to return the original string if the compressed one is not shorter. So for abc the compressed form a1b1c1 is longer, and you would return abc instead. Mention this out loud. It shows you think about edge cases before they ask.
π§© Key Takeaways
- β Run-length compression means writing each character followed by how many times it repeats in a row.
- β One left-to-right pass with a running count solves it in O(n) time.
- β You only compare a character with the one right before it. The same character later is a brand new group.
- β Never forget to write the final group after the loop ends.
- β
State your rule early: do singles become
c1or justc?
Check Your Knowledge
Test what you learned. Pick an answer for each question, then click Check.
- 1
Using the rule where singles keep their count, what does "aaabbc" compress to?
Why: Three a's, two b's, and one c, so a3b2c1 because we keep the count even when it is 1.
- 2
Why does the single-pass approach beat counting each character across the whole string?
Why: One pass is linear time, and it correctly treats a returning character as a new group instead of merging all of them.
- 3
What common mistake makes you lose the end of the answer?
Why: The final group is still open when the loop ends, so you must append it after the loop.
- 4
What does "aabcccccaaa" compress to?
Why: Two a's, one b, five c's, then three a's as a separate group gives a2b1c5a3.