Wednesday, January 12, 2011

The Union-Find Algorithm

A Union-Find Algorithm is an algorithm that maintains and combines different sets. It performs two useful operations on a data structure such as:
  • Find determine which set a particular element is in. It is useful for determining if two elements are in the same set.
  • Union combine or merge two sets into a single set.

Because it supports these two operations, a disjoint-set data structure is sometimes called a union-find data structure or merge-find set. The other important operation, MakeSet, which makes a set containing only a given element, is generally trivial. With these three operations, many practical partitioning problem can be solved.

In order to define these operations more precisely, some way of representing the sets is needed. One common approach is to select a fixed element of each set, called its representative, to represent the set as a whole. Then, Find(x) returns the representative of the set that x belongs to, and Union takes two set representatives as its arguments.

A simple approach to creating a disjoint-set data structure is to create a linked list for each set. The element at the head of each list is chosen as its representative.

MakeSet creates a list of one element. Union appends the two lists, a constant-time operation. The drawback of this implementation is that Find requires (n) or linear time.

This can be avoided by including in each linked list node a pointer to the head of the list; then Find takes constant time. However, Union now has to update each element of the list being appended to make it point to the head of the new combined list, requiring (n) time.

When the length of each list is tracked, the required time can be improved by always appending the smaller list to the longer. Using this weighted-union heuristic, a sequence of m MakeSet, Union, and Find operations on n elements requires O(m + nlog n) time. For asymptotically faster operations, a different data structure is needed.

Source: An article from Wikipedia, the free Encyclopedia

Definition

The defining properties of the subset ADT are union and find. It is used to maintain sets of subsets. In itially there are n subsets each consisting of 1 element. These elements may be assumed to be the numbers 0 to n - 1. Gradually subsets get fused (union) together and become larger. At the same time there are queries (find operations) asking for the unique identifier, the name of the subset to which an element belongs. Initially the name might be taken equal to the single number in the subset. Later it might be one of the numbers in the subset, possibly, but not necessarily, the smallest number. The only important thing is that find(x) and find(y) return the same values when x a nd y belong to the same subset, and different values when they belong to different subsets.

The above leaves much freedom in how to perform the unions and how to choose the names. A possibility is to apply a rule like "add first to second", meaning that an operation union(x, y) is performed by adding all elements of the subset in which x lies to the subset in which y lies. The new name for this set then becomes the name of the subset in which y lies.


The subset ADT may sound unimportant, but it shows up everywhere. Subsets are the canonical example of the mathematical concept of equivalence relations. An equivalence relation is a binary operator "~" on elements of a set with the property that

  • a ~ a, for all a: ~ is reflexive
  • a ~ b implies that b ~ a, for all a and b: ~ is symmetric
  • a ~ b and b ~ c implies that a ~ c, for all a, b and c: ~ is transitive
If we read "~" like "in the same subset as", then all three properties are satisfied. The most important example of an equivalence relation is "is reachable": in a road system with two-way roads, all three properties are satisfied. Thus, the subset ADT can be used to compute something called the "connected components" of a graph.

The subset ADT allows to maintain equivalence relations dynamically: relations may be added by applying the union operation. The only limitation is that there is no de-union: once unified, the sets cannot be split anymore. This latter feature would be much more expensive to implement, in particular it would require that all previous operations are recorded.

There are several implementations, ranging from extremely simple giving modest performance (one operation O(1), the other O(n)), to slightly less simple giving excellent performance (almost constant time for both operations).

Author:Jop Sibeyn



And here some link to know how this algorithm works: http://www.users.csbsju.edu/~lziegler/CS239/UnionFind.html