String Searching (Naive)
Table of Contents + β
You type βrainβ in a search box. The app has to tell you if that word sits somewhere inside a big block of text. How does it actually find it? The simplest way is naive search. Naive search just tries to match your pattern starting at every single spot in the text. It keeps trying until it finds one that fits.
In this tutorial we will build a small function. You give it a text and a pattern. It returns the first index where the pattern shows up. If the pattern is not there, it returns -1.
π What Is Naive String Searching?
Say you are reading a book and looking for the word βrainβ. You put your finger on the first letter. You check if the next few letters spell βrainβ. No? Then you move your finger one letter to the right and check again. You keep sliding until you find it or reach the end.
That sliding-and-checking is exactly what naive string searching does. It slides the pattern across the text one position at a time. At each stop it compares letter by letter.
Let us fix some words first so nothing feels confusing later.
- Text is the big string you search inside. Think of it as the full page.
- Pattern is the small string you are looking for. Think of it as the word you want to find.
- Match means every character of the pattern lines up with the text at that position.
So say the text is ABCABD and the pattern is ABD. The answer is index 3. That is where ABD starts inside the text.
Note
We count positions starting from 0. So the first character sits at index 0. The second sits at index 1. And so on. This is how arrays and strings work in almost every language.
π§ How The Sliding Works
Picture the pattern as a small window. You place that window at the start of the text. Then you compare character by character.
Here is the key thing. At every start position we do a full check of the pattern. If even one character is wrong, we stop checking that position. Then we slide one step to the right and start a fresh check.
Let us trace text = "AABAACAADAABAABA" and pattern = "AABA".
- Start at index 0. Compare
AABAwithAABA. All four match. We found it. Return 0.
That one was easy. Now imagine the pattern was AADA instead. We would fail at index 0 on the third letter. So we slide right and keep going. We reach index 6, where AADA lines up.
πͺ Steps To Search Naively
Before we write code, let us lay out the plan in plain steps. Follow these and the code will read like a story.
- Let
nbe the length of the text andmbe the length of the pattern. - Loop a start index
ifrom 0 up ton - m. We stop atn - mbecause past that point the pattern cannot fit. - For each
i, loop an inner indexjfrom 0 tom - 1. - Compare
text[i + j]withpattern[j]. If they differ, break out of the inner loop. This start position failed. - If the inner loop finished without breaking, every character matched. Return
i. - If the outer loop ends and nothing matched, return -1.
Tip
The index i goes only up to n - m. Say the pattern has 4 characters and the text has 16. Then the last place the pattern can start is index 12. Starting any later would run off the end of the text.
π» The Code In Five Languages
Now we turn those steps into a real function. The function is called naiveSearch. It takes the text and the pattern. Then it returns the first matching index, or -1.
# Returns the first index where pattern is found in text, or -1.def naive_search(text, pattern): n = len(text) m = len(pattern)
# Try every start position that still has room for the pattern. for i in range(n - m + 1): j = 0 # Compare the pattern against the text at this start. while j < m and text[i + j] == pattern[j]: j += 1 if j == m: return i # All characters matched. return -1 # Pattern not found anywhere.
text = "AABAACAADAABAABA"pattern = "AABA"
index = naive_search(text, pattern)print("Pattern first found at index:", index)The output of the above code will be:
Pattern first found at index: 0Note
Notice the same shape in every language. An outer loop picks the start. An inner loop checks the pattern. When j reaches m, every character matched. So we return the start index.
β³ How Fast Is It?
Let us think about the worst case. Imagine the text is AAAAAAAA and the pattern is AAAB. At almost every start position the first few characters match. Then the last one fails. So the inner loop runs nearly the full pattern length again and again.
Caution
In the worst case naive search does about n * m comparisons. Here n is the text length and m is the pattern length. So the time is O(n*m). For short patterns this feels fast. For long text with long patterns it gets slow.
Here is the cost in one table.
| Case | What Happens | Time |
|---|---|---|
| Best case | Pattern fails on the first character at each start | O(n) |
| Worst case | Pattern almost matches everywhere then fails at the end | O(n*m) |
| Space | Only a couple of index counters, no extra arrays | O(1) |
The reason naive search can be slow is that it forgets what it already learned. When a match fails, it slides just one step and starts over from scratch. A smarter algorithm called KMP remembers the matched part and skips ahead instead of restarting. We cover that next.
β οΈ Common Mistakes
A few small slips trip up most people the first time. Watch out for these.
- Letting the outer loop run all the way to
ninstead ofn - m. Thentext[i + j]reads past the end of the string. You get a crash or wrong data. - Forgetting the
j == mcheck after the inner loop. Without it you never confirm a full match. - Returning the wrong index. You want the start index
i, not the inner counterj. - Mixing up text and pattern. The pattern is the small one you search for. The text is the big one you search inside.
π§© What Youβve Learned
β Naive string searching slides the pattern across the text and checks character by character at each start position.
β
You stop the outer loop at n - m so the pattern never runs off the end of the text.
β The function returns the first matching index, or -1 when the pattern is not present.
β The worst case time is O(n*m). That is fine for small inputs but slow for long ones.
β KMP improves on this by remembering matched characters instead of restarting after every failure.
Check Your Knowledge
Test what you learned. Pick an answer for each question, then click Check.
- 1
What does the naive search function return when the pattern is not in the text?
Why: By design the function returns -1 to signal that the pattern was not found anywhere.
- 2
Why does the outer loop stop at index n - m instead of n?
Why: After index n - m there are fewer than m characters left, so the pattern has no room to match.
- 3
What is the worst case time complexity of naive string searching?
Why: In the worst case the pattern nearly matches at each start then fails, giving about n*m comparisons.
- 4
When does the inner loop confirm a full match?
Why: If j reaches m, every character of the pattern matched the text at that start position.