Removing duplicate subtrees from binary tree

清歌不尽 2021-02-15 07:55

I have to design an algorithm as an additional homework assignment. The algorithm has to compress a binary tree by transforming it into a DAG, removing repeated subtrees and redirect

3 answers
  • 2021-02-15 08:22

    I would go with a hashing approach.

    The hash of a leaf is its value mod P_1. The hash of an inner node is (value + hash(left_son)*P_2 + hash(right_son)*P_2^2) mod P_1, where P_1 and P_2 are primes. If you compute these hashes for at least 5 different pairs of big primes (by big I mean something near 10^8 to 10^9, so you can do the math without overflowing), you can assume with very high probability that nodes with the same hashes are the same.

    Then you can walk the tree, checking the sons first, and do your transform. This works in O(n) time.

    NOTE that you can use other hash functions, like (value + hash(left_son)*P_2 + hash(right_son)*P_3) mod P_1, etc.
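
    A minimal sketch of this hashing scheme in C, under several assumptions that are not in the answer: the `Node` layout, the name `compute_hashes`, the particular primes, and the use of only 2 prime pairs (the answer suggests at least 5) are all illustrative. A missing son is treated as hash 0, so node values are assumed to be positive and smaller than the moduli.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    #define NUM_PAIRS 2  /* assumption: 2 pairs keep the sketch short */

    typedef struct Node {
        int value;                  /* assumed 0 < value < P_1 */
        struct Node *left, *right;
        uint64_t hash[NUM_PAIRS];   /* one stored hash per prime pair */
    } Node;

    /* Illustrative prime pairs: P1[i] is the modulus, P2[i] the multiplier. */
    static const uint64_t P1[NUM_PAIRS] = { 1000000007ULL, 1000000009ULL };
    static const uint64_t P2[NUM_PAIRS] = { 998244353ULL, 999999937ULL };

    /* Post-order walk: sons are hashed first, and every node is visited once. */
    static void compute_hashes(Node *n)
    {
        if (n == NULL) return;
        compute_hashes(n->left);
        compute_hashes(n->right);
        for (int i = 0; i < NUM_PAIRS; i++) {
            uint64_t p1 = P1[i], p2 = P2[i];
            uint64_t p2sq = (p2 * p2) % p1;   /* reduce P_2^2 before using it */
            uint64_t hl = n->left  ? n->left->hash[i]  : 0;
            uint64_t hr = n->right ? n->right->hash[i] : 0;
            uint64_t h = (uint64_t)n->value % p1;
            h = (h + hl * p2) % p1;           /* operands below 2^30, no overflow */
            h = (h + hr * p2sq) % p1;
            n->hash[i] = h;
        }
    }

    /* Nodes are treated as equal subtrees when all stored hashes agree. */
    static int same_hashes(const Node *a, const Node *b)
    {
        for (int i = 0; i < NUM_PAIRS; i++)
            if (a->hash[i] != b->hash[i]) return 0;
        return 1;
    }
    ```

    After `compute_hashes(root)`, two nodes for which `same_hashes` returns nonzero are candidates for merging. Equal hashes only make equality very likely, so an exact comparison can still be added if certainty is required.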

  • 2021-02-15 08:32

    This happens when constructing OBDDs. The idea is: put the tree into a canonical form, and construct a hash table with an entry for every node. The hash function is a function of the node plus the hash values of its left and right child nodes. Complexity is O(N), but only if one can rely on the hash values being unique. The final compare (e.g. for resolving collisions) will still cost O(N*N) for the recursive subtree <--> subtree compare. See more on BDDs, or the original Bryant paper.

    The hash function I currently use:

    #define SHUFFLE(x,n) (((x) << (n))|((x) >>(32-(n))))
    /* a node's hashvalue is based on its value
     * and (recursively) on its children's hashvalues.
     */
    #define NODE_HASH2(l,r) ((SHUFFLE((l),5)^SHUFFLE((r),9)))
    #define NODE_HASH3(v,l,r) ((0x54321u*(v) ^ NODE_HASH2((l),(r))))
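
    To see how these macros behave, here is a small sketch that wraps them in a function. The wrapper name `node_hash` and the use of `uint32_t` are assumptions; the rotation in `SHUFFLE` only works as intended on 32-bit unsigned operands. Because the two child hashes are rotated by different amounts, swapping the children generally changes the result.

    ```c
    #include <stdint.h>

    /* Macros repeated from the answer so that the sketch compiles on its own. */
    #define SHUFFLE(x,n) (((x) << (n))|((x) >>(32-(n))))
    #define NODE_HASH2(l,r) ((SHUFFLE((l),5)^SHUFFLE((r),9)))
    #define NODE_HASH3(v,l,r) ((0x54321u*(v) ^ NODE_HASH2((l),(r))))

    /* Hypothetical wrapper: forces every operand to a 32-bit unsigned type,
     * so the shifts inside SHUFFLE form a well-defined rotation. */
    static uint32_t node_hash(uint32_t var, uint32_t hash_negative,
                              uint32_t hash_positive)
    {
        return NODE_HASH3(var, hash_negative, hash_positive);
    }
    ```

    For example, `node_hash(1, 1, 2)` and `node_hash(1, 2, 1)` differ, so a node is not confused with its mirror image.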
    

    Typical usage:

    void node_sethash(NodeNum num)
    {
        if (NODE_IS_NULL(num)) return;

        if (NODE_IS_TERMINAL(num)) {
            /* terminal nodes get fixed hash constants */
            switch (nodes[num].var) {
            case 0: nodes[num].hash.hash = HASH_FALSE; break;
            case 1: nodes[num].hash.hash = HASH_TRUE; break;
            case 2: nodes[num].hash.hash = HASH_FALSE ^ HASH_TRUE; break;
            }
        } else if (NODE_IS_NAMED(num)) {
            /* inner node: combine its variable with the children's hashes,
             * which are read here and so should already be set */
            NodeNum f, t;
            f = nodes[num].negative;
            t = nodes[num].positive;
            nodes[num].hash.hash = NODE_HASH3(nodes[num].var, nodes[f].hash.hash, nodes[t].hash.hash);
        }
    }
    

    Searching the hash table:

    /* Walks the hash chain for num's slot and returns a pointer to the link
     * holding num itself or, unless want_exact is set, an equivalent node.
     * If nothing matches, the returned pointer refers to the chain's
     * terminating null link. */
    NodeNum *hash_hnd(NodeNum num, int want_exact)
    {
        unsigned slot;
        NodeNum *ptr, this;

        if (NODE_IS_NULL(num)) return NULL;

        slot = nodes[num].hash.hash % COUNTOF(hash_nodes);

        for (ptr = &hash_nodes[slot]; !NODE_IS_NULL(this = *ptr); ptr = &nodes[this].hash.link) {
            if (this == num) break;
            if (want_exact) continue;
            if (nodes[this].hash.hash != nodes[num].hash.hash) continue;
            if (nodes[this].var != nodes[num].var) continue;
            if (node_compare(nodes[this].negative, nodes[num].negative)) continue;
            if (node_compare(nodes[this].positive, nodes[num].positive)) continue;
            /* duplicate node := same var + same children */
            break;
        }
        return ptr;
    }
    

    The recursive compare function:

    int node_compare(NodeNum one, NodeNum two)
    {
    int rc;
    
    if (one == two) return 0;
    
    if (NODE_IS_NULL(one) && NODE_IS_NULL(two)) return 0;
    if (NODE_IS_NULL(one) && !NODE_IS_NULL(two)) return -1;
    if (!NODE_IS_NULL(one) && NODE_IS_NULL(two)) return 1;
    
    if (NODE_IS_TERMINAL(one) && !NODE_IS_TERMINAL(two)) return -1;
    if (!NODE_IS_TERMINAL(one) && NODE_IS_TERMINAL(two)) return 1;
    
    if (VAR_RANK(nodes[one].var)  < VAR_RANK(nodes[two].var) ) return -1;
    if (VAR_RANK(nodes[one].var)  > VAR_RANK(nodes[two].var) ) return 1;
    
    
    rc = node_compare(nodes[one].negative,nodes[two].negative);
    if (rc) return rc;
    rc = node_compare(nodes[one].positive,nodes[two].positive);
    if (rc) return rc;
    
    return 0;
    }
    
  • 2021-02-15 08:35

    This problem is commonly solved when doing common sub-expression elimination in programming languages.

    The approach is as follows (and is easily generalized to more than 2 children in a node):

    Algorithm (assumes a mutable tree structure; you can just as easily build a new tree along the way):

    MakeDAG(tree):
    
        HASH = a new hash-table-based dictionary
    
        foreach subtree NODE in the tree // traverse this however you like
    
            if NODE is in HASH
                replace NODE with HASH[NODE]
            else
                HASH[NODE] = NODE // insert the current node in the dictionary
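
    The pseudocode above could be realized in C roughly as follows. This is a sketch under assumptions, not the only way to do it: the `Node` and `Table` types, the fixed bucket count, and the bucket function are illustrative. It also picks one specific traversal, post-order, so that children are already replaced by their representatives when a node is looked up. The dictionary key is then simply (value, left pointer, right pointer), and no recursive comparison is needed.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    typedef struct Node {
        int value;
        struct Node *left, *right;
        struct Node *next;        /* chain link inside a hash bucket */
    } Node;

    #define NBUCKETS 1024u        /* assumption: fixed size keeps the sketch short */

    typedef struct { Node *bucket[NBUCKETS]; } Table;   /* must start zeroed */

    /* Bucket index from the value and the (already canonical) child pointers. */
    static size_t slot_of(const Node *n)
    {
        uintptr_t h = (uintptr_t)(unsigned)n->value * 0x9E3779B1u;
        h ^= (uintptr_t)n->left * 31u;
        h ^= (uintptr_t)n->right * 131u;
        return (size_t)((h >> 4) % NBUCKETS);
    }

    /* Returns the representative of the subtree rooted at n.
     * Children are redirected first, so two subtrees are identical exactly
     * when value, left and right all match. */
    static Node *make_dag(Table *t, Node *n)
    {
        if (n == NULL) return NULL;
        n->left  = make_dag(t, n->left);
        n->right = make_dag(t, n->right);

        size_t s = slot_of(n);
        for (Node *p = t->bucket[s]; p != NULL; p = p->next) {
            if (p->value == n->value && p->left == n->left && p->right == n->right)
                return p;         /* duplicate: the caller redirects to p */
        }
        n->next = t->bucket[s];   /* first occurrence becomes the representative */
        t->bucket[s] = n;
        return n;
    }
    ```

    Calling `root = make_dag(&table, root)` with a zero-initialized `table` rewires the tree in place. Duplicate nodes are only unlinked, not freed, and the recursion depth equals the tree height, so a very deep tree would need an explicit stack.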
    

    To compute the hash code for a node, you need to recursively compute the hash codes of its children until you reach the leaves of the tree.

    Simply calculating these hash codes naively will bump up your runtime to O(n^2).

    It is crucial that you store each node's hash code once it has been computed; this avoids repeated recursive calls and improves the runtime to O(n).
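
    One way to store the results is to cache the hash code inside each node. The sketch below assumes a `Node` layout with `hash` and `has_hash` fields and an arbitrary mixing function; the counter `hash_computations` exists only to make the effect visible and is not part of the technique.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    typedef struct Node {
        int value;
        struct Node *left, *right;
        uint32_t hash;       /* cached hash code */
        int has_hash;        /* nonzero once hash is valid */
    } Node;

    static unsigned long hash_computations = 0;   /* instrumentation only */

    /* Returns the hash code of the subtree at n, computing it at most once
     * per node; later calls return the stored value without recursing. */
    static uint32_t subtree_hash(Node *n)
    {
        if (n == NULL) return 0x9E3779B9u;    /* arbitrary constant for "no child" */
        if (n->has_hash) return n->hash;

        hash_computations++;
        uint32_t h = (uint32_t)n->value * 0x01000193u;
        h = (h ^ subtree_hash(n->left))  * 0x01000193u;
        h = (h ^ subtree_hash(n->right)) * 0x01000193u;
        n->hash = h;
        n->has_hash = 1;
        return h;
    }
    ```

    With the cache, asking for the hash of every node in a tree of n nodes performs n computations in total. Without it, each request re-hashes the whole subtree, which is quadratic for a degenerate (list-like) tree. Redirecting a child to an identical subtree leaves the cached value valid, since the hash depends only on the subtree's contents.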
