Disjoint Set Union (DSU) Optimizations

Table of Contents +

You already know the basic Union-Find. You have a parent array. You walk up to the root. That tells you which group a node is in. It works. But here is the problem. The plain version can get slow. If you keep attaching trees the wrong way, the tree grows tall and thin. It turns into a long chain. Then every find has to walk that whole chain. For a big input that is painfully slow.

This tutorial fixes that. We add two small tricks. They keep the trees short and flat. After this, each operation runs in close to constant time. If you have not seen the basics yet, read the Union-Find intro first. Then come back here.

🐢 Why The Plain Version Gets Slow

The cost of Union-Find lives in one place. It is the height of the trees. find walks from a node up to its root. So a tall tree means a long walk.

In the basic version, when you union two sets you just point one root at the other. You never check which tree is bigger. So you might keep stacking small trees on top. The height keeps growing. In the worst case the tree becomes a straight line of n nodes. Now one find does n steps. Do that many times and you get O(n) per operation. That is bad.

We want the opposite. We want trees that are short and bushy, not tall and thin. The two optimizations below do exactly that.

⚖️ Trick 1: Union by Rank or Size

The idea is simple. When you join two trees, always attach the smaller one under the bigger one. The big tree is the new owner. This way the height grows slowly instead of fast.

There are two common ways to measure “bigger”:

Union by size — track how many nodes each tree has. Attach the tree with fewer nodes under the one with more nodes.
Union by rank — track an estimate of the tree’s height, called the rank. Attach the tree with smaller rank under the one with larger rank.

Both keep the tree flat. Size is a little easier to picture. We will use rank in the code and mention size too. The key rule is the same. The loser hangs under the winner, never the other way around.

Here is what goes wrong without it, and what goes right with it.

Tip

Union by rank and union by size give the same big win. Pick one and stick with it. Do not mix both on the same structure.

🪢 Trick 2: Path Compression

Union by rank keeps trees from getting tall. Path compression makes them even flatter while you search.

Here is the idea. When you call find on a node, you walk up to the root anyway. So why not point every node you touched straight at the root on the way back? Next time you call find on any of them, it is a single hop.

Think of it like learning a shortcut. The first time you drive to a new place you take the long road. After that you remember the direct route and go straight there.

So find is not just a lookup. It also repairs the tree as it goes. The more you use it, the flatter it gets.

⚡ Why The Two Together Are Almost O(1)

On their own each trick helps. Together they are even better.

When you use union by rank AND path compression, each operation costs O(α(n)) amortized. That α(n) is the inverse Ackermann function. You do not need the math. You just need this one fact. For any input you will ever see in real life, α(n) is less than 5. That holds even if n were the number of atoms in the universe. So it is basically a small constant.

Let me put “amortized” in plain words. It does not mean every single find is instant. The first few might do some work and flatten the tree. But average the cost over many operations and each one is tiny. So a sequence of m operations on n elements runs in about O(m · α(n)). That is nearly O(m).

Note

“Near O(1)” is not exactly O(1). It is O(α(n)). For real inputs that difference never matters. So people just say constant time.

📝 Steps For The Optimized Operations

Here is the full plan before we write code.

For setup:

Make a parent array where each node is its own parent.
Make a rank array filled with 0.

For find(x) with path compression:

If parent[x] is not x, then x is not a root.
Recursively find the real root, and set parent[x] to that root.
Return parent[x].

For union(a, b) with union by rank:

Find the root of a and the root of b.
If they are the same root, they are already joined, so stop.
Compare ranks. Attach the smaller-rank root under the larger-rank root.
If the ranks are equal, pick one as the parent and bump its rank by one.

Now the code. Each version builds the same demo. We make 7 nodes, do a few unions, then check who is connected.

[data-tabs] [role="tablist"], [data-tabs] [data-scroll] { display: none !important; } [data-tabs] [role="tabpanel"] { display: block !important; } [data-tabs] [role="tabpanel"] + [role="tabpanel"] { margin-top: 1rem; }

# set up n single-node sets
def make_set(n):
    parent = list(range(n))  # each node is its own root
    rank = [0] * n
    return parent, rank

# find root of x with path compression
def find(parent, x):
    if parent[x] != x:
        parent[x] = find(parent, parent[x])  # point x straight at the root
    return parent[x]

# join the sets of a and b using union by rank
def union(parent, rank, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra == rb:
        return                  # already in same set
    if rank[ra] < rank[rb]:     # attach smaller under bigger
        parent[ra] = rb
    elif rank[ra] > rank[rb]:
        parent[rb] = ra
    else:                       # equal rank: pick one, bump rank
        parent[rb] = ra
        rank[ra] += 1

parent, rank = make_set(7)
union(parent, rank, 0, 1)
union(parent, rank, 1, 2)
union(parent, rank, 3, 4)
union(parent, rank, 5, 6)
union(parent, rank, 4, 6)

print("0 and 2 connected?", "yes" if find(parent, 0) == find(parent, 2) else "no")
print("3 and 5 connected?", "yes" if find(parent, 3) == find(parent, 5) else "no")
print("0 and 5 connected?", "yes" if find(parent, 0) == find(parent, 5) else "no")

The output of the above code will be:

0 and 2 connected? yes
3 and 5 connected? yes
0 and 5 connected? no

Tip

Notice find is recursive here. It walks to the root and then rewrites parent[x] on the way back. That single line is the whole path compression. You can also write it as a loop if you want to avoid deep recursion on huge inputs.

🗺️ Where You Actually Use This: Kruskal MST

The biggest real use of optimized DSU is Kruskal’s algorithm for finding a Minimum Spanning Tree (MST). An MST is the cheapest set of edges that connects every node in a weighted graph with no cycles.

Here is how Kruskal uses DSU, in plain steps:

Sort all the edges from cheapest to most expensive.
Go through them one by one. For each edge, use find to check if its two endpoints are already in the same set.
If they are in different sets, add the edge to the MST and union the two sets.
If they are already in the same set, skip it. Adding it would make a cycle.

So DSU is the part that answers one question instantly. “Would this edge create a cycle?” Without the optimizations, that check would slow the whole algorithm down. With them, it is nearly free.

Note

This same “are these two already connected, and if not, join them” pattern shows up everywhere. You see it in network connectivity, in grouping friends in a social graph, and in image segmentation.

⏱️ Complexity After Optimization

Here is the cost before and after, so the win is clear.

Version	find	union	Space
Plain (no tricks)	O(n)	O(n)	O(n)
Union by rank only	O(log n)	O(log n)	O(n)
Rank + path compression	O(α(n)) amortized	O(α(n)) amortized	O(n)

That bottom row is the goal. The α(n) is the inverse Ackermann value. For any real input it is less than 5. So treat it as constant time.

🧩 What You’ve Learned

✅ The plain Union-Find gets slow because trees grow tall, and find has to walk the whole height.
✅ Union by rank (or size) always attaches the smaller tree under the bigger one, so trees stay short.
✅ Path compression points every visited node straight at the root during find, flattening the tree as you search.
✅ Together they give O(α(n)) amortized per operation, where α(n) is the inverse Ackermann function and is less than 5 for any real input.
✅ Optimized DSU is the engine behind Kruskal’s algorithm for finding a Minimum Spanning Tree.

Check Your Knowledge

Test what you learned. Pick an answer for each question, then click Check.

🚀 What’s Next?

Union-Find (Disjoint Set)
Depth-First Search (DFS)

Previous Fenwick Tree (Binary Indexed Tree) Next KMP Algorithm (Reference)

Share & Connect

Share on LinkedIn