Disjoint-Set forests in Python alternate implementation

Question

I'm implementing a disjoint-set system in Python, but I've hit a wall. I'm using a tree (forest) representation and writing Find(), Merge() and Create() functions for it.

For efficiency, I'm using union by rank and path compression.

The catch is that these functions must take the collection of disjoint sets as a parameter, which makes traversal hard.

class Node(object):
    def __init__(self, value):
        self.parent = self
        self.value = value
        self.rank = 0

def Create(values):
    l = [Node(value) for value in values]
    return l

The Create function takes a list of values and returns a list of singleton Nodes, one per value.

I'm thinking the Merge function would look similar to this,

def Merge(set, value1, value2):
    value1Root = Find(set, value1)
    value2Root = Find(set, value2)
    if value1Root == value2Root:
        return
    if value1Root.rank < value2Root.rank:
        value1Root.parent = value2Root
    elif value1Root.rank > value2Root.rank:
        value2Root.parent = value1Root
    else:
        value2Root.parent = value1Root
        value1Root.rank += 1

but I'm not sure how to implement the Find() function since it is required to take the list of Nodes and a value (not just a node) as the parameters. Find(set, value) would be the prototype.

I understand how to implement path compression when a Node is taken as a parameter for Find(x), but this method is throwing me off.

Any help would be greatly appreciated. Thank you.

Edited for clarification.

Answer 1:

Clearly, the merge function should be applied to a pair of nodes.

So the find function should take a single node as its parameter and look like this:

def find(node):
    if node.parent != node:
        node.parent = find(node.parent)
    return node.parent

Wikipedia also has pseudocode that translates easily to Python.
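If the Find(set, value) prototype from the question really is fixed, one way to reconcile it with the node-based find above is a thin wrapper that first locates the node holding the value. The following is only a sketch under some assumptions: values are unique, a linear scan is acceptable, and the parameter is named `nodes` rather than `set` to avoid shadowing the built-in.

```python
class Node(object):
    def __init__(self, value):
        self.parent = self   # a new node is the root of its own tree
        self.value = value
        self.rank = 0


def Create(values):
    return [Node(value) for value in values]


def find(node):
    # Node-based find with path compression.
    if node.parent is not node:
        node.parent = find(node.parent)
    return node.parent


def Find(nodes, value):
    # Hypothetical wrapper (not from the original post): locate the node
    # that holds `value` by linear scan, then delegate to find().
    # Assumes each value appears at most once in `nodes`.
    for node in nodes:
        if node.value == value:
            return find(node)
    raise KeyError(value)
```

The scan makes each lookup O(n) in the number of nodes; a dictionary from value to node would avoid that, at the cost of changing what gets passed around.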

Answer 2:

The implementation of this data structure becomes simpler when you realize that the operations union and find can also be implemented as methods of a disjoint set forest class, rather than on the individual disjoint sets.

If you can read C++, then have a look at my take on the data structure; it hides the actual sets from the outside world, representing them only as numeric indices in the API. In Python, it would be something like

class DisjSets(object):
    def __init__(self, n):
        # A real list is needed: in Python 3, range() does not support item assignment.
        self._parent = list(range(n))
        self._rank = [0] * n

    def find(self, i):
        if self._parent[i] == i:
            return i
        else:
            self._parent[i] = self.find(self._parent[i])
            return self._parent[i]

    def union(self, i, j):
        root_i = self.find(i)
        root_j = self.find(j)
        if root_i != root_j:
            if self._rank[root_i] < self._rank[root_j]:
                self._parent[root_i] = root_j
            elif self._rank[root_i] > self._rank[root_j]:
                self._parent[root_j] = root_i
            else:
                self._parent[root_i] = root_j
                self._rank[root_j] += 1

(Not tested.)

If you choose not to follow this path, the client of your code will indeed have to have knowledge of Nodes and Find must take a Node argument.
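For clients that work with arbitrary values rather than indices, one possible bridge is a dictionary from value to index layered on top of the index-based class. This is a rough sketch, not part of the original answer: `ValueDisjSets` and `same_set` are hypothetical names, values are assumed to be hashable, and the `union` here is a slightly condensed rewrite of the one above.

```python
class DisjSets(object):
    def __init__(self, n):
        self._parent = list(range(n))
        self._rank = [0] * n

    def find(self, i):
        if self._parent[i] != i:
            self._parent[i] = self.find(self._parent[i])  # path compression
        return self._parent[i]

    def union(self, i, j):
        root_i, root_j = self.find(i), self.find(j)
        if root_i == root_j:
            return
        # Make root_i the root with the larger (or equal) rank.
        if self._rank[root_i] < self._rank[root_j]:
            root_i, root_j = root_j, root_i
        self._parent[root_j] = root_i
        if self._rank[root_i] == self._rank[root_j]:
            self._rank[root_i] += 1


class ValueDisjSets(object):
    """Hypothetical adapter: exposes values, stores indices."""

    def __init__(self, values):
        values = list(values)
        self._index = {value: i for i, value in enumerate(values)}
        self._sets = DisjSets(len(values))

    def find(self, value):
        return self._sets.find(self._index[value])

    def union(self, value1, value2):
        self._sets.union(self._index[value1], self._index[value2])

    def same_set(self, value1, value2):
        return self.find(value1) == self.find(value2)


sets = ValueDisjSets(["a", "b", "c", "d"])
sets.union("a", "b")
sets.union("c", "d")
print(sets.same_set("a", "b"), sets.same_set("a", "c"))
```

The client never sees a node or an index, only the values it passed in.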

Answer 3:

Find is always performed on an item: Find(item) returns the set to which the item belongs. Merge likewise should not take nodes; it always takes two items (or sets). Merge, or union(item1, item2), must first call find(item1) and find(item2), which return the sets to which each item belongs. After that, the shorter up-tree is attached under the root of the taller one. Whenever a find is issued, retrace the path and compress it.
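The steps described here can be sketched with plain dictionaries keyed by item, so that neither function ever takes a node. This is a minimal illustration rather than the linked implementation: `make_sets` is a hypothetical helper, items are assumed to be hashable, and the find is written iteratively with an explicit second pass that retraces the path.

```python
def make_sets(items):
    # Hypothetical helper: every item starts as the root of its own up-tree.
    items = list(items)
    parent = {item: item for item in items}
    rank = {item: 0 for item in items}
    return parent, rank


def find(parent, item):
    # First pass: walk up to the root of the up-tree.
    root = item
    while parent[root] != root:
        root = parent[root]
    # Second pass: retrace the path and point every visited item at the root.
    while item != root:
        next_item = parent[item]
        parent[item] = root
        item = next_item
    return root


def merge(parent, rank, item1, item2):
    root1 = find(parent, item1)
    root2 = find(parent, item2)
    if root1 == root2:
        return root1
    # Attach the root of lower rank under the root of higher rank.
    if rank[root1] < rank[root2]:
        root1, root2 = root2, root1
    parent[root2] = root1
    if rank[root1] == rank[root2]:
        rank[root1] += 1
    return root1
```

The iterative form also sidesteps Python's recursion limit on very long parent chains, which a recursive find can hit before compression has flattened the tree.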

A tested implementation with path compression is here.

Source: https://stackoverflow.com/questions/9488284/disjoint-set-forests-in-python-alternate-implementation

Tags

python

set

disjoint-sets