String Pattern Matching with KMP

Table of Contents +

Say you are looking for a small word inside a big piece of text. You want to know where the word “abcab” shows up inside “abcabcabxabcab”. The simple way is to line the word up at the start. You check it letter by letter. If something does not match, you move one step to the right and start checking from the beginning again. That restart is the real problem. You throw away everything you already learned. KMP fixes exactly that. It never re-checks letters it has already matched. Let us go slow and build it up together.

🎯 The Problem with Naive Search

First let us name the pain clearly. The naive way searches by brute force.

You place the pattern at position 0 of the text.
You compare letter by letter.
On a mismatch, you slide the pattern just one step to the right.
Then you start comparing from the very first letter of the pattern again.

That last step is the waste. Imagine the text is “AAAAAB” and the pattern is “AAAB”. You match “AAA”, and then the fourth letter fails. The naive method shifts by one and compares letters it already knew were a match. For a long text with lots of repeats, this gets slow. In the worst case it does about n times m comparisons. Here n is the text length and m is the pattern length.

KMP is short for Knuth-Morris-Pratt, the three people who came up with it. Its whole trick is simple. When a mismatch happens, KMP already knows that some letters at the start of the pattern matched. So instead of starting again from zero, it jumps the pattern forward to the next sensible spot. It never re-checks those letters.

🧩 The Key Idea: Prefix that is also a Suffix

To slide smartly, KMP needs one piece of information about the pattern. For every position in the pattern, it asks one simple question.

Take the part of the pattern matched so far. What is the longest chunk at its start that is exactly the same as the chunk at its end?

That question has a name. A prefix is any chunk taken from the start of a string. A suffix is any chunk taken from the end. We want the longest prefix that is also a suffix. But it cannot be the whole string itself.

Let us look at the word “ababa”.

String	Prefixes (not whole)	Suffixes (not whole)	Longest match
”a”	none	none	0
”ab"	"a"	"b”	0
”aba"	"a”, “ab"	"a”, “ba”	1 (“a”)
“abab"	"a”, “ab”, “aba"	"b”, “ab”, “bab”	2 (“ab”)
“ababa"	"a”, “ab”, “aba”, “abab"	"a”, “ba”, “aba”, “baba”	3 (“aba”)

So why does this matter? When the pattern fails at some point, the part before the failure already matched the text. Say that matched part has a prefix that is also a suffix. Then that prefix is already sitting in the text in the right place. We can reuse it. We do not check those letters again. We just continue from there.

📚 The LPS Table

We store that answer for every position in an array. We call it the LPS array. LPS stands for Longest Prefix that is also Suffix.

So lps[i] is the length of the longest proper prefix of the pattern, up to and including index i, that is also a suffix of that same chunk. “Proper” just means we do not count the whole chunk itself.

For the pattern “ababa”, the LPS array looks like this.

Note

Pattern: a b a b a

Index: 0 1 2 3 4

LPS: 0 0 1 2 3

Read it slowly. At index 2 the chunk is “aba”, and its longest prefix-suffix is “a”, so the value is 1. At index 4 the chunk is “ababa”, and the answer is “aba”, which has length 3. This single array is the brain of KMP.

⚙️ How KMP Searches, Step by Step

Now for the search. We walk through the text with one pointer i. We walk through the pattern with another pointer j. Here is the flow in points.

If text[i] equals pattern[j], the two letters match. Move both i and j forward by one.
If j reaches the pattern length, we found a full match. Record the position. Then set j = lps[j-1] to keep searching for more matches.
If the letters do not match and j is greater than 0, do not move i. Instead set j = lps[j-1]. This jumps the pattern to reuse the prefix we already matched.
If the letters do not match and j is 0, there is nothing to reuse. Move i forward by one.

Notice the important thing here. On a mismatch, i almost never goes backward. The text pointer only moves forward. That is where the speed comes from.

Let us trace a tiny example. The text is “ababcabab” and the pattern is “abab” with LPS 0 0 1 2.

text:    a b a b c a b a b
index:   0 1 2 3 4 5 6 7 8

i=0 j=0  a==a  -> i=1 j=1
i=1 j=1  b==b  -> i=2 j=2
i=2 j=2  a==a  -> i=3 j=3
i=3 j=3  b==b  -> j hits 4 = pattern length -> MATCH at index 0
         set j = lps[3] = 2
i=4 j=2  text 'c' vs pattern 'a'  mismatch, j>0 -> j = lps[1] = 0
i=4 j=0  text 'c' vs pattern 'a'  mismatch, j==0 -> i=5
i=5 j=0  a==a -> i=6 j=1
... continues forward, never re-reading old text

See how i kept moving right and never went back? That is the whole win.

💻 KMP in Code

Here is the full thing in five languages. We build the LPS array first, and then we run the search. The search prints every starting index where the pattern is found.

[data-tabs] [role="tablist"], [data-tabs] [data-scroll] { display: none !important; } [data-tabs] [role="tabpanel"] { display: block !important; } [data-tabs] [role="tabpanel"] + [role="tabpanel"] { margin-top: 1rem; }

# Build the LPS array for the pattern
def build_lps(pat):
    m = len(pat)
    lps = [0] * m
    length = 0          # length of the current prefix-suffix
    i = 1
    while i < m:
        if pat[i] == pat[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length != 0:
            length = lps[length - 1]   # fall back, do not move i
        else:
            lps[i] = 0
            i += 1
    return lps


# Search and print every match index
def kmp_search(text, pat):
    n, m = len(text), len(pat)
    lps = build_lps(pat)
    i = j = 0
    while i < n:
        if text[i] == pat[j]:
            i += 1
            j += 1
            if j == m:
                print("Found at index", i - j)
                j = lps[j - 1]
        elif j != 0:
            j = lps[j - 1]
        else:
            i += 1


kmp_search("abxabcabcaby", "abcaby")

The output of the above code will be:

Found at index 6

Tip

The fall-back line j = lps[j - 1] is the heart of KMP. When a letter does not match, it does not throw away the matched part. It looks up the LPS value and jumps the pattern to the next useful spot. The text pointer i stays put and never moves backward.

⏱️ Why KMP is Fast

Let us put a number on it. Say n is the text length and m is the pattern length.

Building the LPS array walks the pattern once, so that costs O(m).
The search walks the text. The pointer i only moves forward and never goes back. So the search costs O(n).
Add the two together, and KMP runs in O(n + m) time overall.

That is much better than the naive O(n times m). KMP uses O(m) extra space for the LPS array.

Note

KMP time complexity is O(n + m). The naive method can hit O(n times m) on bad inputs, like long runs of repeated letters. KMP avoids that because it never re-reads text it has already passed.

Here is the comparison in one table.

Approach	Time	Extra Space	Re-checks text?
Naive search	O(n times m)	O(1)	Yes
KMP	O(n + m)	O(m)	No

⚠️ Common Mistakes

A few things trip people up the first time. Watch out for these.

Moving the text pointer i backward on a mismatch. In KMP, i only goes forward. Only j falls back using the LPS array.
Using lps[j] instead of lps[j - 1] on a mismatch. You always look up the value for the letter just before the one that failed.
Forgetting to reset j = lps[j - 1] after a full match. Without it, you miss patterns that overlap.
Building the LPS array from the text. The LPS array is built only from the pattern. The text is never part of it.
Setting lps[0] to anything but 0. A single letter has no proper prefix that is also a suffix, so it is always 0.

🧩 What You’ve Learned

✅ Naive search wastes work because it re-checks letters it already matched after every mismatch.

✅ KMP avoids that by precomputing the LPS array, the longest prefix of the pattern that is also a suffix at each index.

✅ On a mismatch, KMP sets j = lps[j - 1] to slide the pattern smartly, and the text pointer i never moves backward.

✅ Building LPS is O(m) and the search is O(n), so KMP runs in O(n + m) time with O(m) extra space.

✅ Common slip-ups are moving i back, using the wrong LPS index, and building LPS from the text instead of the pattern.

Check Your Knowledge

Test what you learned. Pick an answer for each question, then click Check.

🚀 What’s Next?

Big O Notation explains what those O(…) symbols mean, so the O(n + m) you saw here makes full sense.
How to Analyze Algorithms walks you through reading code and working out its running time, just like we did for KMP.

Previous String Searching (Naive) Next Linear Search

Share & Connect

Share on LinkedIn