Hash Functions
Table of Contents + β
Say you have a name like βRiyaβ and you want to store it in an array. But arrays only understand numbers. You canβt tell an array βput this at slot Riyaβ. So how do you turn a word into a slot number? That is the exact job of a hash function. A hash function takes a key and gives back a number. You use that number as an array index.
π What Is a Hash Function?
A hash function is just a small piece of code. You give it a key. A key is the thing you want to store, like a name or a word. The function gives you back a number. That number tells you which slot in the array to use.
So the flow is simple:
- You start with a key. This is the thing you want to store, like βRiyaβ.
- The hash function reads that key.
- It returns an index. This is a plain number, like 3.
- You store the value at that index in the array.
The array we store things in is called a hash table. It is just a normal array. The only difference is that we decide the slot using a hash function. We do not pick the slot ourselves.
π A Real-World Way to Think About It
Think about a big locker room at a gym. There are 100 lockers. When you walk in, the staff does not let you pick any locker you like. They look at the first letter of your name. Then they send you to a fixed locker.
So βArjunβ always goes to the same locker. Every single time. You never search around. You go straight there.
That is hashing. The staffβs rule is the hash function. The locker number is the index. The rule always sends βArjunβ to the same place. So you can find your stuff fast later.
Note
The big win of hashing is speed. Once the hash function tells you the index, you go straight to that slot. You do not check every item one by one. That is why hash tables are so fast for lookups.
π― Why We Even Need a Hash Function
Here is the pain. Imagine you store names in a normal array with no rule. Later you want to find βRiyaβ. You have to walk through the array. You compare every name until you find it. If there are a million names, that is slow.
The hash function fixes this. You do not search at all. You just compute the index from the key and jump straight there.
But the hash function has to follow some rules to be useful. Let us look at those next.
β What Makes a Good Hash Function
A hash function is not just any random math. A good one has a few clear qualities.
- It is fast. You call it on every store and every lookup. So it must run quickly. No heavy work inside.
- It is consistent. The same key must always give the same index. Say βRiyaβ gives 3 today and 7 tomorrow. Then you can never find your data again.
- It spreads keys evenly. Different keys should land in different slots as much as possible. You do not want everyone crowding into slot 0.
That last point is the one people get wrong. Sometimes two different keys land on the same index. We call that a collision. A collision is when two keys want the same slot. Some collisions are normal. But a good hash function keeps them rare. It does this by spreading keys all over the table.
Tip
Here is a simple test for even spreading. Imagine throwing 100 keys at a table of 100 slots. A good hash function fills the slots roughly one each. A bad one dumps 90 keys into 10 slots and leaves the rest empty.
π’ A Simple Hash Function, Step by Step
Let us build the most common beginner hash function. It works on strings. The idea has two small steps.
First, turn the string into a number. Every character has a number code behind it. For example, βAβ is 65 and βBβ is 66, and so on. This is called the character code. So we walk through the string and add up all the character codes. Now we have one big number for the whole string.
Second, that big number could be huge. But our table only has, say, 10 slots. So we use modulo to fold it into range. Modulo is the remainder after division. We do sum % tableSize. The remainder is always between 0 and tableSize minus 1. That is perfect for an index.
So the full recipe is:
- Start a running total at 0.
- For each character in the key, add its character code to the total.
- Return
total % tableSize.
Let us trace it by hand for the word βcatβ with a table size of 10. The codes are c = 99, a = 97, t = 116. The sum is 99 + 97 + 116 = 312. Then 312 % 10 = 2. So βcatβ goes to slot 2.
π» The Hash Function in Code
Here is that same hash function in all five languages. Each one takes a string and a table size. Then it returns an index. We test it with a few words so you can see the numbers it produces.
# Add up character codes, then fold into table range with modulodef hash_key(key, table_size): total = 0 for ch in key: total += ord(ch) # ord gives the character code return total % table_size # squeeze into 0..table_size-1
table_size = 10print("cat ->", hash_key("cat", table_size))print("dog ->", hash_key("dog", table_size))print("Riya ->", hash_key("Riya", table_size))The output of the above code will be:
cat -> 2dog -> 4Riya -> 5Note
See how each word gets a steady index? Run it again and βcatβ is still 2. That is the consistent part. Same key, same index, every time.
β οΈ When a Hash Function Is Bad
Now let us see the other side. A bad hash function still returns a number. But it returns bad numbers. They pile up in a few slots.
Take our sum-of-codes idea but with a tiny twist. What if you only used the first letter of the string? Then βcatβ, βcarβ, and βcupβ all start with βcβ. They all give the same index. Three keys, one slot. That is a lot of collisions right away.
Here is what bad hash functions usually do wrong:
- They ignore most of the key. Using only the first letter throws away information. Less information means more keys clashing.
- They pick a weak table size. Say your table size shares factors with your sums. Then keys bunch up. Many people use a prime number as the table size. A prime spreads things better.
- They return the same index too often. When most keys map to a handful of slots, your fast lookup turns slow again. You are back to searching long chains.
Caution
A collision is not a crash. The hash table keeps working. But every collision adds extra work. Now two keys share a slot, so the table has to tell them apart. Too many collisions, and your hash table is no faster than a plain list.
Here is a small picture of the difference. On the left, a good hash function spreads keys out. On the right, a bad one crowds them.
π Good vs Bad Hash Function
Here is a quick side-by-side so the qualities stick.
| Quality | Good Hash Function | Bad Hash Function |
|---|---|---|
| Speed | Runs fast on every call | Does heavy work each time |
| Consistency | Same key gives same index | Index can change for same key |
| Spreading | Keys land all over the table | Keys pile into a few slots |
| Collisions | Rare | Very common |
π§© What Youβve Learned
- β A hash function takes a key and returns a number you use as an array index.
- β The array we store into is called a hash table.
- β A good hash function is fast, gives the same index for the same key, and spreads keys evenly.
- β
A common beginner hash sums the character codes of a string, then takes
sum % tableSize. - β When two keys land on the same index, that is a collision.
- β A bad hash function causes many collisions, which makes lookups slow again.
Check Your Knowledge
Test what you learned. Pick an answer for each question, then click Check.
- 1
What does a hash function return?
Why: A hash function turns a key into a number that tells you which slot of the array to use.
- 2
Why do we use modulo (sum % tableSize) in a string hash function?
Why: Modulo gives the remainder, which always falls inside the table's index range.
- 3
Which is a quality of a GOOD hash function?
Why: Spreading keys evenly keeps collisions rare and lookups fast.
- 4
What is a collision?
Why: A collision happens when two different keys are sent to the same slot by the hash function.