Range, List & Hash Partitioning
Table of Contents + −
You already know the idea of partitioning: one giant table is slow and hard to manage, so you split it into smaller pieces called partitions. But that leaves one real question hanging:
- When a new row comes in, how do you decide which partition it goes to?
That decision is the whole game. Pick well and your data spreads out evenly and queries stay fast. Pick badly and one partition gets overloaded while the others sit empty. There are three common ways to make that choice: range, list, and hash. Let’s go through each with a simple example.
🎯 The Partition Key
Before the three methods, one shared idea. Every method picks a partition based on one column, called the partition key. It might be the date, the user’s country, or the user ID. The method is just the rule for turning that key into “this row goes to partition 3”.
So as we go, keep asking: what’s the key, and what’s the rule?
📏 Range Partitioning
Range partitioning splits rows by ranges of the key, usually something that increases like a date or a number. Each partition holds one slice of the range.
Say you have an orders table. You could split it by order date:
- Partition 1: orders from January
- Partition 2: orders from February
- Partition 3: orders from March
So a new April order would go to a new April partition. Here’s what’s good and bad about this:
- Good: range queries are fast. “Give me all of March’s orders” hits exactly one partition.
- Good: deleting old data is easy. Drop the whole January partition when it’s too old.
- Bad: data can pile up unevenly. If this month is much busier than last, one partition gets huge while older ones sit quiet. That busy partition is called a hot partition.
Range shines for time-based data
Range partitioning is the natural fit when you mostly query by time, like logs, orders, or events. You usually want a recent slice, and that’s exactly one partition.
📋 List Partitioning
List partitioning splits rows by a fixed list of values. You decide which values go to which partition by hand.
The classic example is splitting by region or status. Say you split users by country:
- Partition 1: India
- Partition 2: USA
- Partition 3: everyone else
So a new user from the USA goes straight to partition 2. The trade-offs:
- Good: it’s clear and predictable. You know exactly where each group lives, which is handy for rules like “keep European data in Europe.”
- Good: queries for one group are fast, since that group is one partition.
- Bad: the groups are rarely the same size. If most of your users are from one country, that partition gets overloaded while others stay small.
So list partitioning is great when your data falls into clear, meaningful buckets, but watch out for one bucket growing far bigger than the rest.
🔢 Hash Partitioning
Hash partitioning takes a different goal: spread rows as evenly as possible, without caring about meaning. It runs the key through a hash function (a small math formula that turns any value into a number) and uses that number to pick the partition.
Say you partition users by user ID across 4 partitions. The system computes hash(user_id) % 4 and that gives a partition number from 0 to 3. Here’s the lead-in to the simple idea in code:
partition = hash(user_id) % 4# user 1001 -> hash -> 2 -> partition 2# user 1002 -> hash -> 0 -> partition 0# user 1003 -> hash -> 3 -> partition 3Reading that: each user ID gets turned into a number, and the remainder after dividing by 4 picks the partition. Since hashes scatter values, users spread out roughly evenly. The trade-offs:
- Good: very even spread. No single partition gets overloaded just because of busy values.
- Bad: range queries are slow. “Give me all users created last week” can’t hit one partition, because related rows are scattered everywhere.
- Bad: adding a partition is painful. Change 4 to 5 and the math changes for almost every row, so lots of data has to move. (This is why systems often use a smarter version called consistent hashing.)
Hash spreads data, but hides order
Hash partitioning is the best at even spread, but it throws away any natural order. If your app does a lot of range queries, hashing will make them slow. Match the method to how you actually read the data.
📊 Quick Comparison
Here’s the three side by side, so you can pick fast.
| Method | Splits by | Best for | Main weakness |
|---|---|---|---|
| Range | Slices of a value (dates, numbers) | Time-based data, range queries | Uneven load, hot partitions |
| List | A fixed list of values (region, status) | Clear, meaningful groups | Groups are often unequal sizes |
| Hash | A hash of the key | Even spread of load | Slow range queries, hard to resize |
⚠️ Common Mistakes and Misconceptions
A few things to keep straight:
- “Hash partitioning is always best because it’s even.” Only if you don’t do range queries. If you often read by date or by order, hashing makes those reads slow. Even spread isn’t the only goal.
- “Range partitioning spreads load evenly.” Usually it doesn’t. Recent data is often the busiest, so the newest partition runs hot while old ones idle.
- “Picking the partition key doesn’t matter much.” It matters more than almost anything. A bad key gives you hot partitions no matter which method you use.
🧩 What You’ve Learned
Nice work. Here’s the recap:
- ✅ Every method uses a partition key plus a rule to decide where a row goes.
- ✅ Range splits by slices of a value, great for time-based data but can run uneven.
- ✅ List splits by a fixed set of values, clear but often unequal in size.
- ✅ Hash scatters data evenly, but slows range queries and is hard to resize.
- ✅ The partition key choice matters most; a bad one causes hot partitions either way.
Check Your Knowledge
Test what you learned. Pick an answer for each question, then click Check.
- 1
Which partitioning method is best for time-based data and range queries?
Why: Range splits by slices like months, so a query for one month hits one partition. Great for logs and orders by date.
- 2
What is the main weakness of hash partitioning?
Why: Hashing scatters data evenly but throws away order, so range queries are slow, and changing the partition count moves lots of data.
- 3
A partition that gets far more load than the others is called a what?
Why: A hot partition becomes a bottleneck while others sit idle. It is often caused by a poor partition key.
- 4
List partitioning splits rows by what?
Why: List partitioning assigns chosen values (like country) to partitions. Clear, but groups are often unequal in size.
🚀 What’s Next?
You now know how data gets split. Let’s see how systems keep those pieces healthy as they grow.
- Database Sharding applies these ideas across many separate servers.
- Handling Database Bottlenecks shows how to spot and fix the partition that’s running hot.
Get these down and you’ll understand how big databases stay fast under real load.