Disjoint Sets
Basic Concept and Interface
Two sets are named disjoint sets if they have no elements in common.
A Disjoint-Sets (or Union-Find) data structure keeps track of a fixed number of elements partitioned into a number of disjoint sets. Operations:
connect(x, y)
: connectx
andy
. Also known asunion
isConnected(x, y)
: returns true ifx
andy
are connected (i.e. part of the same set).
connect(A, B)
- Before calling:
{A}, {B}, {C}, {D}
isConnected(A, B) -> false
- After calling:
{A, B}, {C}, {D}
isConnected(A, B) -> true
Sets of non-negative integers
public interface DisjointSets {
/** connects two items P and Q */
void connect(int p, int q);
/** checks to see if two items are connected */
boolean isConnected(int p, int q);
}
Quick Find
Using a single array of integers:
- The indices of the array represent the elements of our set.
- The value at an index is the set number it belongs to.
_____________
int[] id |4|4|4|5|4|5|6|
‾‾‾‾‾‾‾‾‾‾‾‾‾
0 1 2 3 4 5 6
Set 4: {0, 1, 2, 4}
Set 5: {3, 5}
Set 6: {6}
The specific set number doesn't matter as long as all elements in the same set share the same id.
connect(x, y)
After connect(2, 3):
_____________
int[] id |5|5|5|5|5|5|6|
^‾^‾^‾‾‾^‾‾‾‾
0 1 2 3 4 5 6
Set 5: {0, 1, 2, 3, 4, 5}
Set 6: {6}
isConnected(x, y)
id[x] == id[y]
Code & Runtimes
public class QuickFindDS implements DisjointSets {
private int[] id;
/* Θ(N) */
public QuickFindDS(int N){
id = new int[N];
for (int i = 0; i < N; i++){
id[i] = i;
}
}
/* need to iterate through the array => Θ(N) */
public void connect(int p, int q){
int pid = id[p];
int qid = id[q];
for (int i = 0; i < id.length; i++){
if (id[i] == pid){
id[i] = qid;
}
}
}
/* Θ(1) */
public boolean isConnected(int p, int q){
return (id[p] == id[q]);
}
}
Implementation | Constructor | connect | isConnected |
---|---|---|---|
ListOfSets | Θ(N) | O(N) | O(N) |
QuickFind | Θ(N) | Θ(N) | Θ(1) |
Quick Union
We treat the array as trees, or a forest.
- Each item: index of its parent
- Root: negative value
┌──┬──┬──┬──┬──┬──┬──┐
parent │-1│0 │1 │-1│0 │3 │-1│
└──┴──┴──┴──┴──┴──┴──┘
0 1 2 3 4 5 6
┌ 0 ┐ 3 ┐ 6
4 1 ┐ 5
2
"root 0" "root 3" "root 6"
find(int item)
For QuickUnion we define a helper function find(int item)
which returns the root of the tree item
is in.
connect(x, y)
- Find the root of their respective trees
- Make one the child of other
parent[find(y)] = find(x)
Calling connect(5, 2):
┌──┬──┬──┬──┬──┬──┬──┐
parent │-1│0 │1 │0 │0 │3 │-1│
└──┴──┴──┴^^┴──┴──┴──┘
0 1 2 3 4 5 6
- Best case:
x
,y
are both root, Θ(1) - Worst case: Linear tree, Θ(N)
- Upper bound: O(N)
isConnected
find(x) == find(y)
Code & Runtimes
public class QuickUnionDS implements DisjointSets {
private int[] parent;
public QuickUnionDS(int num) {
parent = new int[num];
for (int i = 0; i < num; i++) {
parent[i] = -1;
}
}
private int find(int p) {
while (parent[p] >= 0) {
p = parent[p];
}
return p;
}
@Override
public void connect(int p, int q) {
int i = find(p);
int j= find(q);
parent[i] = j;
}
@Override
public boolean isConnected(int p, int q) {
return find(p) == find(q);
}
}
Implementation | Constructor | connect | isConnected |
---|---|---|---|
QuickUnion | Θ(N) | O(N) | O(N) |
QuickFind | Θ(N) | Θ(N) | Θ(1) |
QuickUnion | Θ(N) | O(N) | O(N) |
- N = number of elements in our DisjointSets data structure
Weighted Quick Union (WQU)
- The shorter the tree the faster it takes
- New rule: whenever we call
connect
, we always link the root of the smaller tree to the larger tree. - Maximum height logN
- Determine tree weight: use root value
-(size of tree)
Maximum height: Log N
Imagine any element
The tree with
So we can double up to
Code
In connect(x, y)
:
if (xRoot < yRoot) {
parent[yRoot] = xRoot;
} else {
parent[xRoot] = yRoot;
}
Runtimes
Implementation | Constructor | connect | isConnected |
---|---|---|---|
QuickUnion | Θ(N) | O(N) | O(N) |
QuickFind | Θ(N) | Θ(N) | Θ(1) |
QuickUnion | Θ(N) | O(N) | O(N) |
Weighted Quick Union | Θ(N) | O(log N) | O(log N) |
Weighted Quick Union with Path Compression
Whenever we call find(x)
we have to traverse the path from x
to root. So, along the way we can connect all the items we visit to their root at no extra asymptotic cost.
The average runtime of connect
and isConnected
becomes almost constant in the long term! This is called the amortized runtime.
Code
protected int find(int x) {
while (id[x] >= 0) {
id[x] = id[id[x]];
x = id[x];
}
return x;
}