Collision Handling

A hash table is fast because it turns a key into an array index. You give it a key. The hash function gives you a number. That number is the slot where your value lives. So far so good. But here is the problem. Two different keys can land on the same slot. Now what? They both want that one spot. This is called a collision. If you do not handle it, the second key overwrites the first one and you lose data.

In this tutorial you will learn what a collision is. You will also learn the two main ways to fix it. Do not worry. Both are simple once you see the picture.

πŸ€” What Is a Collision?

A collision happens when two different keys hash to the same index. That is it. The hash function is supposed to spread keys across slots. But it cannot give every key its own slot. There are only so many slots and far more possible keys.

Think of a small school with 10 lockers, numbered 0 to 9. The rule for picking a locker is simple. Take the first letter of the student’s name and use its position. Now Alex and Arjun both start with β€œA”. So they both get sent to the same locker. That clash is a collision.

Here is a quick example with numbers. Say our table has 10 slots. The hash function is key % 10.

  • The key 15 goes to slot 15 % 10 = 5.
  • The key 25 also goes to slot 25 % 10 = 5.

Two keys, same slot. We need a plan for what to do when this happens.

Note

Collisions are normal. They are not a bug or a rare edge case. Any real hash table will have collisions. The whole job is to handle them well, not to avoid them completely.

πŸ”— Fix 1: Chaining

The first fix is called chaining. The idea is simple. Instead of each slot holding one value, each slot holds a small list. When two keys land on the same slot, you just add both to that slot’s list.

So the slot is no longer a single box. It is the head of a linked list. Every entry that hashes to that slot gets added to the same list.

Let us put 15, 25, and 7 into a table of 10 slots using key % 10.

  • 15 goes to slot 5.
  • 25 also goes to slot 5, so it joins the list there.
  • 7 goes to slot 7, alone.

Here is the picture:

Table

Slot 5

Slot 7

15

25

7

To find a key, you hash it to get the slot. Then you walk down that slot’s list and compare each entry. To insert, you hash the key and add it to that slot’s list. To delete, you hash, find it in the list, and unlink it.

So what is good and bad about chaining? It is easy to write and easy to understand. The table never fills up. You can keep adding to a list even when slots run low. But there is a cost. Each entry needs extra memory for the link pointer. And if many keys land on one slot, that list gets long. A long list is slow to search.

A tiny chaining example in code

Let us build a very small hash table that uses chaining. We use 10 slots and the hash key % 10. We insert a few keys, then search for one. Watch how 15 and 25 share slot 5.

chaining.py
SIZE = 10
# Each slot holds a list of keys.
table = [[] for _ in range(SIZE)]
def hash_key(key):
return key % SIZE # pick the slot
def insert(key):
table[hash_key(key)].append(key)
def search(key):
return key in table[hash_key(key)]
insert(15)
insert(25) # collides with 15 at slot 5
insert(7)
print("Slot for 15 and 25:", hash_key(15), "and", hash_key(25))
print("Is 25 present?", "yes" if search(25) else "no")
print("Is 99 present?", "yes" if search(99) else "no")

The output of the above code will be:

Slot for 15 and 25: 5 and 5
Is 25 present? yes
Is 99 present? no

See how 15 and 25 both report slot 5, yet both can still be found? That is chaining doing its job.

🎯 Fix 2: Open Addressing

The second fix is called open addressing. Here there are no lists. Every entry lives directly in the array. So when a key’s slot is already taken, you do not make a list. Instead you look for the next free slot.

This search for a free slot is called probing. The simplest kind is linear probing. With linear probing you just check the next slot, then the next, then the next, until you find an empty one.

Let us insert 15, 25, and 35 into a table of 10 slots with key % 10.

  • 15 wants slot 5. It is empty, so it goes in slot 5.
  • 25 wants slot 5. It is taken, so we check slot 6. Slot 6 is empty, so 25 goes there.
  • 35 wants slot 5. Taken. Slot 6? Taken. Slot 7? Empty, so 35 goes in slot 7.

Here is the picture of the array after all three inserts:

Slot 4: empty

Slot 5: 15

Slot 6: 25

Slot 7: 35

Slot 8: empty

To search for a key, you do the same walk. Hash to the starting slot. Then step forward until you either find the key or hit an empty slot. An empty slot means the key is not there.

So what is good and bad about open addressing? Everything sits in one array. There are no extra pointers, so memory stays compact. It is also friendly to the CPU cache because nearby slots are checked together, and that makes it fast in practice. But there is a cost. The table can fill up. Once it is full, you cannot insert until you grow it. And when the table gets crowded, probes get long because you walk past many taken slots.

Caution

With linear probing, deleting is tricky. If you just empty a slot, a later search might hit that empty slot and stop too early. It would then miss a key that was pushed further down. The usual fix is to mark the slot as β€œdeleted” instead of truly empty.

βš–οΈ Chaining vs Open Addressing

Both fixes solve the same problem. They just make different trade-offs. Here is a side-by-side look.

Point Chaining Open Addressing
Where entries live In linked lists outside the array Directly inside the array
Extra memory Yes, one pointer per entry No pointers needed
Can the table fill up? No, lists just grow Yes, must resize when full
Cache friendliness Lower, lists are scattered Higher, slots sit together
Deletion Simple, just unlink the node Tricky, need a deleted marker

⏱️ A Note on Speed

People love hash tables because lookups are O(1) on average. That word β€œaverage” matters here. The speed depends on how well your keys spread out.

With a good hash function, keys spread evenly. Lists stay short. Probes stay short. So search, insert, and delete are O(1) on average. With a bad hash function, many keys pile onto one slot. Now one list holds almost everything, or one probe run is very long. Search drops to O(n), the same as scanning a plain list.

So collision handling and a good hash function work together. The handling decides what happens on a clash. The hash function decides how often clashes happen. One thing that controls this is the load factor, which is the number of entries divided by the number of slots. When the load factor gets high, collisions get common. So most tables resize and rehash to keep it low.

Tip

A common rule of thumb is to grow the table once the load factor passes around 0.75. Keeping the table less than three-quarters full keeps collisions rare and lookups fast.

🧩 What You’ve Learned

  • βœ… A collision is when two different keys hash to the same index. It is normal, not a bug.
  • βœ… Chaining stores each slot as a linked list, so colliding keys simply join the same list.
  • βœ… Open addressing keeps everything in one array and uses probing. Linear probing checks the next slot until it finds a free one.
  • βœ… Chaining is simple and never fills up but uses extra pointers. Open addressing is compact and cache friendly but must resize and handle deletion carefully.
  • βœ… Good hashing keeps operations O(1) on average. Bad hashing drops them to O(n).
  • βœ… The load factor controls how often collisions happen, so tables resize to keep it low.

Check Your Knowledge

Test what you learned. Pick an answer for each question, then click Check.

  1. 1

    What is a hash collision?

    Why: A collision happens when the hash function maps two different keys to the same slot.

  2. 2

    In chaining, what does each slot of the table hold?

    Why: With chaining, each slot is the head of a linked list, so all colliding keys join that list.

  3. 3

    In open addressing with linear probing, what happens when a key's target slot is already taken?

    Why: Linear probing steps forward slot by slot until it finds a free slot for the key.

  4. 4

    When does hash table search degrade to O(n)?

    Why: A bad hash function creates one long list or probe run, so search becomes a linear scan.

πŸš€ What’s Next?

Share & Connect