Union-Find (Disjoint Set)
Table of Contents + −
Say you have a group of friends. You keep asking one question. Are these two people in the same friend circle? If you check by walking through every connection each time, it gets slow fast. Union-Find is the trick that answers this almost instantly. So let us learn it step by step.
📚 What Is Union-Find?
Union-Find is a data structure that keeps items split into separate groups. It answers one simple question fast. Are these two items in the same group? It is also called Disjoint Set Union, or DSU for short. Disjoint just means the groups do not overlap. Each item belongs to exactly one group.
It gives you two operations:
- find — tell me which group this item is in.
- union — merge the two groups that these items belong to.
Here is the key idea. Every group picks one item to be its leader. We call this leader the representative, or the root. So when you ask “which group is this item in?”, the real answer is “who is the root of this item?”. If two items have the same root, they are in the same group. That is all there is to it.
🌳 A Real-World Way To Picture It
Think of a school with many clubs. Each club has one club president. You want to know whether Riya and Arjun are in the same club. You do not compare the whole member list. You just ask each of them “who is your president?”. Same president means same club.
When two clubs merge into one, you keep one president. You tell the other club “your president now reports to ours”. That is exactly what union does.
Inside, we store this with a parent array. Each item points to its parent. Follow the parents upward and you reach the root. The root points to itself.
Here items 0, 1 and 2 are one group with root 0. Items 3 and 4 are another group with root 3.
🎯 How find and union Work
Let us look at what each operation really does.
find(x) walks up the parent chain until it reaches the root. The root is the item that is its own parent. Then it returns that root.
union(a, b) finds the root of a and the root of b. If the roots are different, it makes one root point to the other. Now both groups share a single root. So they become one group. The path compression trick below makes find even faster.
Note
If find(a) and find(b) give the same root, the two items are already in the same group. Calling union on them changes nothing.
⚡ Making It Fast
A plain Union-Find can get slow if the parent chains grow long and thin. We add a couple of small tricks to keep it fast.
Path compression happens during find. As we walk up to the root, we point each item we pass straight at the root. So next time the chain is short. The tree gets flatter every time you search.
Union by rank happens during union. Rank is a rough measure of a tree’s height. When we merge, we attach the shorter tree under the taller one. This stops the tree from getting tall. So lookups stay quick.
Put both tricks together. Now each operation runs in almost constant time on average.
📝 Steps To Build Union-Find
Here is the plan before we write any code.
- Create a
parentarray where every item starts as its own parent. So each item is its own group at the start. - Create a
rankarray filled with zeros. This tracks tree heights. - find(x): if
xis not its own parent, setparent[x]tofind(parent[x]). This is path compression. Then returnparent[x]. - union(a, b): find both roots. If they are equal, stop. Otherwise attach the lower-rank root under the higher-rank root. If the ranks are equal, pick one as the new root and add one to its rank.
- To check if two items are connected, compare
find(a)andfind(b).
Now let us see it in all five languages.
class UnionFind: def __init__(self, n): # Each item starts as its own group self.parent = list(range(n)) self.rank = [0] * n
# find with path compression def find(self, x): if self.parent[x] != x: self.parent[x] = self.find(self.parent[x]) # point straight to the root return self.parent[x]
# union by rank def union(self, a, b): ra = self.find(a) rb = self.find(b) if ra == rb: return # already in the same group if self.rank[ra] < self.rank[rb]: self.parent[ra] = rb elif self.rank[ra] > self.rank[rb]: self.parent[rb] = ra else: self.parent[rb] = ra self.rank[ra] += 1
uf = UnionFind(5)uf.union(0, 1)uf.union(1, 2)uf.union(3, 4)
print("0 and 2 connected:", "yes" if uf.find(0) == uf.find(2) else "no")print("0 and 4 connected:", "yes" if uf.find(0) == uf.find(4) else "no")
uf.union(2, 4) # merge the two groupsprint("0 and 4 connected:", "yes" if uf.find(0) == uf.find(4) else "no")The output of the above code will be:
0 and 2 connected: yes0 and 4 connected: no0 and 4 connected: yesTip
With both path compression and union by rank, find and union run in almost constant time on average. The exact bound is O(α(n)), where α is the inverse Ackermann function. That value grows so slowly that it stays under 5 for any input you will ever use. So treat it as near O(1).
🛠️ Where Union-Find Is Used
Once you know it, you start seeing it everywhere.
- Connected components — group items that are linked together, like friends in a social network.
- Cycle detection in a graph — while adding an edge, if both ends already share a root, adding it would make a cycle.
- Kruskal’s algorithm for a Minimum Spanning Tree — it adds edges one by one and uses union to skip cycles.
- Grouping problems — like joining accounts that share an email, or merging regions in an image.
⏱️ Complexity At A Glance
The table below shows how much the two tricks help. The plain version is slow on long chains. With both speed-ups, every operation is near constant time.
| Operation | Plain version | With both speed-ups |
|---|---|---|
| find | O(n) | O(α(n)) ≈ O(1) |
| union | O(n) | O(α(n)) ≈ O(1) |
| connected check | O(n) | O(α(n)) ≈ O(1) |
| Space | O(n) | O(n) |
⚠️ Common Mistakes To Avoid
Here are the things that trip people up.
- Comparing items directly instead of comparing their roots. Always run
findon both before you decide they are in the same group. - Forgetting to check if the roots are already equal inside union. Without that check you can mess up the rank.
- Skipping path compression and union by rank on big inputs. The plain version can crawl when chains get long.
- Mixing up the parent and rank arrays, or forgetting to start each item as its own parent.
🧩 What You’ve Learned
- ✅ Union-Find (DSU) keeps items in non-overlapping groups and tells you fast whether two items share a group.
- ✅ find returns the root of an item, and union merges two groups by linking their roots.
- ✅ Two items are in the same group when
findgives them the same root. - ✅ Path compression flattens the tree during find, and union by rank keeps trees short during union.
- ✅ Together these make each operation run in near O(1) amortized time.
- ✅ It powers connected components, cycle detection, and Kruskal’s MST.
Check Your Knowledge
Test what you learned. Pick an answer for each question, then click Check.
- 1
What does the find operation return for an item?
Why: find walks up the parent chain and returns the root, which represents the whole group.
- 2
How do you check if two items are in the same group?
Why: Two items share a group only when find returns the same root for both.
- 3
What does path compression do?
Why: During find, path compression reattaches visited items directly to the root, flattening the tree.
- 4
With path compression and union by rank, what is the amortized cost per operation?
Why: The inverse Ackermann function α(n) stays tiny, so each operation is effectively constant time.