- Find determine which set a particular element is in. It is useful for determining if two elements are in the same set.
- Union combine or merge two sets into a single set.
Because it supports these two operations, a disjoint-set data structure is sometimes called a union-find data structure or merge-find set. The other important operation, MakeSet, which makes a set containing only a given element, is generally trivial. With these three operations, many practical partitioning problem can be solved.
In order to define these operations more precisely, some way of representing the sets is needed. One common approach is to select a fixed element of each set, called its representative, to represent the set as a whole. Then, Find(x) returns the representative of the set that x belongs to, and Union takes two set representatives as its arguments.
A simple approach to creating a disjoint-set data structure is to create a linked list for each set. The element at the head of each list is chosen as its representative.
MakeSet creates a list of one element. Union appends the two lists, a constant-time operation. The drawback of this implementation is that Find requires (n) or linear time.
This can be avoided by including in each linked list node a pointer to the head of the list; then Find takes constant time. However, Union now has to update each element of the list being appended to make it point to the head of the new combined list, requiring (n) time.
When the length of each list is tracked, the required time can be improved by always appending the smaller list to the longer. Using this weighted-union heuristic, a sequence of m MakeSet, Union, and Find operations on n elements requires O(m + nlog n) time. For asymptotically faster operations, a different data structure is needed.
Source: An article from Wikipedia, the free Encyclopedia
Definition
The above leaves much freedom in how to perform the unions and how to choose the names. A possibility is to apply a rule like "add first to second", meaning that an operation union(x, y) is performed by adding all elements of the subset in which x lies to the subset in which y lies. The new name for this set then becomes the name of the subset in which y lies.
The subset ADT may sound unimportant, but it shows up everywhere. Subsets are the canonical example of the mathematical concept of equivalence relations. An equivalence relation is a binary operator "~" on elements of a set with the property that
- a ~ a, for all a: ~ is reflexive
- a ~ b implies that b ~ a, for all a and b: ~ is symmetric
- a ~ b and b ~ c implies that a ~ c, for all a, b and c: ~ is transitive
The subset ADT allows to maintain equivalence relations dynamically: relations may be added by applying the union operation. The only limitation is that there is no de-union: once unified, the sets cannot be split anymore. This latter feature would be much more expensive to implement, in particular it would require that all previous operations are recorded.
There are several implementations, ranging from extremely simple giving modest performance (one operation O(1), the other O(n)), to slightly less simple giving excellent performance (almost constant time for both operations).
Author:Jop Sibeyn
And here some link to know how this algorithm works: http://www.users.csbsju.edu/~lziegler/CS239/UnionFind.html