There are two binary trees T1 and T2 which store character data, duplicates allowed.
How can I find whether T2 is a subtree of T1?
T1 has millions of nodes and
I am not sure whether my idea is correct. Nevertheless, here it is for your perusal.
Let us say T1 is the parent tree and T2 is a tree that might be a subtree of T1. The assumption is that T1 and T2 are plain binary trees with no balancing. Do the following.
1) Search for the root of T2 in T1. If it is not found, T2 is not a subtree. Searching for an element in a binary tree takes O(n) time.
2) If the element is found, do a pre-order traversal of T1 starting from the node where the root element of T2 was found. This takes O(n) time. Do a pre-order traversal of T2 as well, which also takes O(n) time. The result of each pre-order traversal can be stored in a stack. Insertion into a stack takes only O(1).
3) If the sizes of the two stacks are not equal, T2 is not a subtree.
4) Pop one element from each stack and check for equality. If a mismatch occurs, T2 is not a subtree.
5) If all elements match, T2 is a subtree.
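A minimal sketch of the steps above, under some assumptions that are not in the post. The `Node` type and the function names are hypothetical. A `std::vector` stands in for the stack, so comparing two vectors covers both the size check and the element-by-element check. Two adjustments are made because the plain steps can give a wrong answer: a marker is recorded for every missing child (otherwise two differently shaped trees can produce the same pre-order sequence), and every node of T1 that matches T2's root value is tried, since duplicates are allowed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node type; the post does not define one.
struct Node {
    char data;
    Node* left;
    Node* right;
    Node(char d, Node* l = NULL, Node* r = NULL) : data(d), left(l), right(r) {}
};

// Pre-order traversal that also records a marker for every missing child.
// Assumption: '\0' never occurs as real node data.
void preorder(const Node* n, std::vector<char>& out) {
    if (n == NULL) {
        out.push_back('\0');
        return;
    }
    out.push_back(n->data);
    preorder(n->left, out);
    preorder(n->right, out);
}

// Visit every node of T1 whose value equals T2's root value and compare
// the traversal of that node's subtree with the traversal of T2.
bool is_subtree(const Node* t1, const Node* t2, const std::vector<char>& t2_seq) {
    if (t1 == NULL) return false;
    if (t1->data == t2->data) {
        std::vector<char> t1_seq;
        preorder(t1, t1_seq);
        if (t1_seq == t2_seq) return true;  // same size and same elements
    }
    return is_subtree(t1->left, t2, t2_seq) || is_subtree(t1->right, t2, t2_seq);
}

// Entry point: traverse T2 once, then search T1.
bool is_subtree(const Node* t1, const Node* t2) {
    assert(t2 != NULL);
    std::vector<char> t2_seq;
    preorder(t2, t2_seq);
    return is_subtree(t1, t2, t2_seq);
}
```

With many duplicate values the sketch may re-traverse large parts of T1 for each candidate, so the worst case is no longer linear.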
If we are given the roots of both trees, and the nodes are of the same type, why is it not sufficient to just ascertain that the root of T2 is in T1?
I am assuming that "given a tree T" means given a pointer to the root of T and the data type of the node.
Regards.
One plain way is to write an is_equal() method for the tree and do the following:
bool contains_subtree(TNode* other) {
    // optimization
    if (nchildren < other->nchildren) return false;
    if (height < other->height) return false;
    // go for real check
    return is_equal(other)
        || (left != NULL && left->contains_subtree(other))
        || (right != NULL && right->contains_subtree(other));
}
Note that is_equal() can be optimized by using a hashcode for the tree. A simple way to do this is to take the height of the tree, the number of children, or the range of the values as the hashcode.
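bool is_equal(TNode* other) {
    // cheap checks first
    if (x != other->x) return false;
    if (height != other->height) return false;
    if (nchildren != other->nchildren) return false;
    if (hashcode() != other->hashcode()) return false;
    // do other checking, for example check if the children are equal
    if ((left == NULL) != (other->left == NULL)) return false;
    if ((right == NULL) != (other->right == NULL)) return false;
    return (left == NULL || left->is_equal(other->left))
        && (right == NULL || right->is_equal(other->right));
}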
When the tree is similar to a linked list, it will take O(n) time. We can also use some heuristic while choosing the children to compare.
bool contains_subtree(TNode* other) {
    // optimization
    if (nchildren < other->nchildren) return false;
    if (height < other->height) return false;
    // go for real check
    if (is_equal(other)) return true;
    if (left == NULL || right == NULL) {
        return (left != NULL && left->contains_subtree(other))
            || (right != NULL && right->contains_subtree(other));
    }
    if (left->nchildren < right->nchildren) { // find in smaller child tree first
        return left->contains_subtree(other) || right->contains_subtree(other);
    } else {
        return right->contains_subtree(other) || left->contains_subtree(other);
    }
}
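The fragments above rely on `height` and `nchildren` being kept up to date on every node, which the post does not show. The following is a rough sketch of one way to do that, assuming the tree is built bottom-up and never modified afterwards. The meaning given to the two fields (height counted in nodes, `nchildren` counting all descendants) is an assumption, and the hashcode check is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

struct TNode {
    char x;
    TNode* left;
    TNode* right;
    int height;     // assumed: number of nodes on the longest path down to a leaf
    int nchildren;  // assumed: number of descendants, not counting this node

    // Children must already be fully built, so their fields can be reused.
    TNode(char value, TNode* l = NULL, TNode* r = NULL)
        : x(value), left(l), right(r) {
        int lh = left ? left->height : 0;
        int rh = right ? right->height : 0;
        height = 1 + std::max(lh, rh);
        nchildren = (left ? 1 + left->nchildren : 0)
                  + (right ? 1 + right->nchildren : 0);
    }

    // Cheap field checks first, then a recursive comparison of the children.
    bool is_equal(const TNode* other) const {
        if (x != other->x) return false;
        if (height != other->height) return false;
        if (nchildren != other->nchildren) return false;
        if ((left == NULL) != (other->left == NULL)) return false;
        if ((right == NULL) != (other->right == NULL)) return false;
        return (left == NULL || left->is_equal(other->left))
            && (right == NULL || right->is_equal(other->right));
    }

    bool contains_subtree(const TNode* other) const {
        assert(other != NULL);
        // A smaller or shallower tree cannot contain `other`.
        if (nchildren < other->nchildren) return false;
        if (height < other->height) return false;
        if (is_equal(other)) return true;
        return (left != NULL && left->contains_subtree(other))
            || (right != NULL && right->contains_subtree(other));
    }
};
```

Because the two pruning checks cut off every branch that is too small to hold T2, most of a large T1 is usually skipped without a full comparison.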
Another way is to serialize both trees as strings and check whether the second string (serialized from T2) is a substring of the first string (serialized from T1).
The following code serializes in pre-order.
void serialize(ostream& strm) {
    strm << x << '(';
    if (left)
        left->serialize(strm);
    strm << ',';
    if (right)
        right->serialize(strm);
    strm << ')';
}
We can then use an efficient string-matching algorithm, for example the Knuth–Morris–Pratt algorithm, to check whether the substring exists (in O(n + m) time for strings of length n and m), and hence whether one tree is a sub-tree of the other.
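As an illustration of the matching step, here is a small Knuth–Morris–Pratt sketch that works on the serialized strings. The function names are hypothetical. It assumes the node data never contains the characters '(', ',' or ')', since those are used as delimiters by the serialization.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Failure table: fail[i] is the length of the longest proper prefix of
// pattern[0..i] that is also a suffix of pattern[0..i].
std::vector<std::size_t> build_failure(const std::string& pattern) {
    std::vector<std::size_t> fail(pattern.size(), 0);
    std::size_t k = 0;
    for (std::size_t i = 1; i < pattern.size(); ++i) {
        while (k > 0 && pattern[i] != pattern[k]) k = fail[k - 1];
        if (pattern[i] == pattern[k]) ++k;
        fail[i] = k;
    }
    return fail;
}

// Returns true if `pattern` occurs somewhere inside `text`.
bool kmp_contains(const std::string& text, const std::string& pattern) {
    if (pattern.empty()) return true;
    std::vector<std::size_t> fail = build_failure(pattern);
    std::size_t k = 0;  // number of pattern characters matched so far
    for (std::size_t i = 0; i < text.size(); ++i) {
        while (k > 0 && text[i] != pattern[k]) k = fail[k - 1];
        if (text[i] == pattern[k]) ++k;
        if (k == pattern.size()) return true;
    }
    return false;
}
```

For example, a tree with root a, left child b (with leaf children d and e) and right leaf c serializes to `a(b(d(,),e(,)),c(,))`, and the sub-tree rooted at b serializes to `b(d(,),e(,))`, so `kmp_contains` on those two strings reports a match.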
The string can also be compressed efficiently with the Burrows–Wheeler transform, and it is possible to use bzgrep to search for the sub-string in the compressed data.
Another way is to sort the sub-trees in the tree by height and number of children.
bool compare(TNode* other) {
    if (height != other->height)
        return height < other->height;
    return nchildren < other->nchildren;
}
Note that there will be O(n^2) sub-trees. To reduce the number, we can restrict attention to a range of heights. For example, we may only be interested in the sub-trees of height 1000 to 1500.
Once the sorted data is generated, it is possible to binary search it and find whether the sub-tree is present in O(lg n) time (assuming there are no duplicates in the sorted data).
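A rough sketch of the sorted-index idea, with hypothetical names throughout. It collects one entry per node of T1, sorts the entries with the (height, nchildren) ordering from the post, and uses `std::equal_range` for the binary search. The binary search only narrows the candidates down to sub-trees with the same height and the same number of children, so the sketch still runs a full comparison on each candidate. The `TNode` definition repeats the assumed field meanings (height counted in nodes, `nchildren` counting all descendants).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct TNode {
    char x;
    TNode* left;
    TNode* right;
    int height;     // assumed: nodes on the longest path down to a leaf
    int nchildren;  // assumed: number of descendants
    TNode(char value, TNode* l = NULL, TNode* r = NULL)
        : x(value), left(l), right(r) {
        height = 1 + std::max(left ? left->height : 0, right ? right->height : 0);
        nchildren = (left ? 1 + left->nchildren : 0)
                  + (right ? 1 + right->nchildren : 0);
    }
};

// Full structural comparison, used on the candidates found by the search.
bool same_tree(const TNode* a, const TNode* b) {
    if (a == NULL || b == NULL) return a == b;
    return a->x == b->x
        && same_tree(a->left, b->left)
        && same_tree(a->right, b->right);
}

// Ordering from the post: by height, then by number of children.
bool by_shape(const TNode* a, const TNode* b) {
    if (a->height != b->height) return a->height < b->height;
    return a->nchildren < b->nchildren;
}

void collect(TNode* n, std::vector<TNode*>& out) {
    if (n == NULL) return;
    out.push_back(n);
    collect(n->left, out);
    collect(n->right, out);
}

// Build the sorted index once for T1.
std::vector<TNode*> build_index(TNode* root) {
    std::vector<TNode*> index;
    collect(root, index);
    std::sort(index.begin(), index.end(), by_shape);
    return index;
}

// Binary search for entries shaped like `other`, then compare each one.
bool index_contains(const std::vector<TNode*>& index, TNode* other) {
    assert(other != NULL);
    auto range = std::equal_range(index.begin(), index.end(), other, by_shape);
    for (auto it = range.first; it != range.second; ++it) {
        if (same_tree(*it, other)) return true;
    }
    return false;
}
```

The index pays off mainly when many different T2 candidates are checked against the same T1, since the sorting cost is paid only once.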