Given two unsorted arrays of size N each, we are to determine whether the Binary Search Trees constructed from them are equal or not. So, the elements of each array are inserted one by one, in the order in which they appear, into an initially empty BST, and we have to check whether the two resulting trees are identical.
"Construct two trees and compare" does not have to be O(N^2). You can use an auxilliary data structure that lets you find the position of the new node in O(log N) instead of O(N), so that the construction of the BST is O(N log N) even if the BST being constructed is not balanced.
With each empty position pos in a BST (i.e. a free child slot in a node), there is an associated interval (a_pos, b_pos) (one of the endpoints may be +/- infinity), such that a new node for value v will be created at pos if and only if v lies in that interval.
You can store the intervals in a balanced interval tree, so that the position for each new arriving value can be found in O(log N). The update of the interval tree is also O(log N), as you only replace one interval with two.
(Actually, the intervals never overlap, so the auxiliary structure can be a plain old (balanced) BST instead of an interval tree.)
Example:
Take the following non-balanced BST, constructed for an array prefix [1,10,2,9,3, ...]
    1
   / \
  a   10
     /  \
    2    f
   / \
  b   9
     / \
    3   e
   / \
  c   d
The letters a-f denote the possible places where a new node can be placed (the nil leaves). With each letter, there's an associated interval, as follows:
(-inf, 1) -> a
(1, 2) -> b
(2, 3) -> c
(3, 9) -> d
(9, 10) -> e
(10, +inf) -> f
A new node for a value v will be added to the BST at the place determined by the interval that v belongs to: zero will end up at a, 5 at d, and so on. The key idea is to store this information outside of the tree.
If you can efficiently represent the above table (with links to the actual tree nodes), adding a new node to the tree will take O(table access) + O(1). The O(1) part is attaching the node to the non-balanced BST, given that you already know where it goes. Adding 5 will not require comparing with 1, 10, 2, 9 and 3; instead, the slot will be looked up in the table and the node placed directly at d.
Once you place the new node, you obviously also have to update the table. The data structure to represent the table could be an interval tree (http://en.wikipedia.org/wiki/Interval_tree).
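As a concrete sketch of this idea (my own code, not from the original answer; Node, buildBST and slots are made-up names), a std::map can play the role of the balanced BST of intervals: each key is the lower endpoint of an empty slot's interval, and the mapped value is a pointer to that empty child pointer in the tree being built. It assumes distinct values and a sentinel key assumed to be below any input value.

#include <iterator>
#include <map>
#include <vector>

struct Node {
    int data;
    Node *left = nullptr, *right = nullptr;
    Node(int d) : data(d) {}
};

// Builds the (possibly unbalanced) BST in O(N log N) by keeping, for every
// empty child slot, the lower endpoint of its interval in a std::map.
Node* buildBST(const std::vector<int>& a) {
    Node* root = nullptr;
    std::map<long long, Node**> slots;              // lower endpoint -> empty slot
    const long long NEG_INF = -(1LL << 60);         // assumed below any input value
    slots[NEG_INF] = &root;                         // interval (-inf, +inf) -> root slot
    for (int v : a) {
        auto it = std::prev(slots.upper_bound(v));  // the interval containing v
        Node** slot = it->second;
        *slot = new Node(v);                        // O(1) placement, no walk from the root
        it->second = &(*slot)->left;                // (lower, v) now leads to the left slot
        slots[v] = &(*slot)->right;                 // (v, upper) now leads to the right slot
    }
    return root;
}

Running buildBST on the prefix [1, 10, 2, 9, 3] reproduces the tree drawn above, and adding 5 afterwards looks up the entry for (3, 9) and attaches the node at d without comparing against 1, 10, 2, 9 or 3.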
Try this:
#include <stdbool.h>
#include <stddef.h>

struct node {
    int data;
    struct node *left;
    struct node *right;
};

/* Returns true when the two trees have the same shape and the same data
   in every corresponding node. */
int identical(struct node* a, struct node* b)
{
    if (a == NULL && b == NULL)          /* both subtrees empty */
        return true;
    if (a != NULL && b != NULL)          /* compare roots, then recurse */
        return a->data == b->data
            && identical(a->left, b->left)
            && identical(a->right, b->right);
    return false;                        /* exactly one subtree is empty */
}
I think you can improve the naive approach from O(N^2) to O(N log N) by using a range minimum query to construct the binary tree.
Suppose we want to construct the binary tree for an array A.
The idea is to first construct an array B where B[i] is the position of the ith smallest element in A. This can be done by sorting in O(N log N).
We can then use a range minimum query on array B to find the minimum value of B[i] over a given range a<=i<=b. In other words, this lets us find the first position in A holding a value between the ath and bth smallest elements.
RMQ takes time O(N) to preprocess, and then queries can be answered in time O(1).
We can then recursively find for each element its left and right children (if any) and check that they match.
Suppose the two arrays are A and A2, and assume for simplicity that A and A2 have been preprocessed so that the ith smallest element is equal to i.
The trees are identical if find_children(1,N) is True:
find_children(low, high)
    if low >= high
        return True
    node = A[RMQ(low, high)]
    return node == A2[RMQ2(low, high)]
        and find_children(low, node - 1)
        and find_children(node + 1, high)
This function is called once for each node (and empty child pointer) in the tree so takes time O(N).
Overall, this is O(N log N), as the preprocessing sort takes O(N log N).
Suppose we have entered elements 20 and 51 into a binary tree. We will then have 20 being the root, and 51 being the right child. To find the left child of 51 we need to find the first element in the array which has a value greater than 20, and less than 51. This value is given by our range minimum query applied to the range 20+1->51-1.
We can therefore find the left and right children of all nodes faster than by inserting them into the binary tree in the natural way (only faster in a theoretical worst case - the other methods may well be faster for typical examples).
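Here is a sketch of this approach in C++ (my own code; SparseMin, ranks and sameBST are made-up names, and the sorted-copy comparison is an extra check that the two arrays contain the same values at all). The arrays are reduced to ranks so that the ith smallest element equals i, B[i] holds the position of rank i+1, and a sparse table answers the minimum queries; the table costs O(N log N) to build rather than the O(N) mentioned above, which does not change the overall O(N log N) bound.

#include <algorithm>
#include <functional>
#include <vector>

// Sparse-table range-minimum structure: O(N log N) build, O(1) query.
struct SparseMin {
    std::vector<std::vector<int>> t;
    std::vector<int> lg;
    SparseMin(const std::vector<int>& b) {
        int n = b.size(), k = 1;
        while ((1 << k) <= n) ++k;
        t.assign(k, b);
        lg.assign(n + 1, 0);
        for (int i = 2; i <= n; ++i) lg[i] = lg[i / 2] + 1;
        for (int j = 1; j < k; ++j)
            for (int i = 0; i + (1 << j) <= n; ++i)
                t[j][i] = std::min(t[j - 1][i], t[j - 1][i + (1 << (j - 1))]);
    }
    int query(int l, int r) const {                 // inclusive, requires l <= r
        int j = lg[r - l + 1];
        return std::min(t[j][l], t[j][r - (1 << j) + 1]);
    }
};

// Replace each (distinct) value by its rank 1..N, so value and rank coincide.
std::vector<int> ranks(const std::vector<int>& a) {
    std::vector<int> idx(a.size()), r(a.size());
    for (int i = 0; i < (int)a.size(); ++i) idx[i] = i;
    std::sort(idx.begin(), idx.end(), [&](int x, int y) { return a[x] < a[y]; });
    for (int i = 0; i < (int)a.size(); ++i) r[idx[i]] = i + 1;
    return r;
}

bool sameBST(const std::vector<int>& a1, const std::vector<int>& a2) {
    if (a1.size() != a2.size()) return false;
    if (a1.empty()) return true;
    std::vector<int> s1 = a1, s2 = a2;
    std::sort(s1.begin(), s1.end());
    std::sort(s2.begin(), s2.end());
    if (s1 != s2) return false;                     // value sets must match at all
    std::vector<int> A = ranks(a1), A2 = ranks(a2);
    int n = A.size();
    std::vector<int> B(n), B2(n);                   // B[i] = position in A of rank i+1
    for (int i = 0; i < n; ++i) { B[A[i] - 1] = i; B2[A2[i] - 1] = i; }
    SparseMin rmq(B), rmq2(B2);
    // find_children from the pseudocode above, on rank ranges [low, high]
    std::function<bool(int, int)> rec = [&](int low, int high) -> bool {
        if (low >= high) return true;
        int node = A[rmq.query(low - 1, high - 1)]; // rank of this subtree's root
        return node == A2[rmq2.query(low - 1, high - 1)]
            && rec(low, node - 1) && rec(node + 1, high);
    };
    return rec(1, n);
}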
I came up with the following code. It works fine, though the partitioning is inefficient.
#include <cstddef>
#include <vector>

using std::vector;

// Returns true if the two arrays produce the same BST when their elements
// are inserted in order. (The partitioning copies the vectors, which is
// inefficient, but the logic is straightforward.)
bool isBST(vector<int> vec1, vector<int> vec2) {
    if (vec1.size() == 0 && vec2.size() == 0)
        return true;                      // both subtrees are empty
    if (vec1.size() != vec2.size())
        return false;
    if (vec1[0] != vec2[0])
        return false;                     // the roots differ
    // Split the remaining elements into the left (< root) and right (>= root)
    // subtree sequences, preserving their relative order.
    vector<int> temp1, temp2, temp3, temp4;
    for (std::size_t k = 1; k < vec1.size(); k++) {
        if (vec1[k] < vec1[0])
            temp1.push_back(vec1[k]);
        else
            temp2.push_back(vec1[k]);
        if (vec2[k] < vec2[0])
            temp3.push_back(vec2[k]);
        else
            temp4.push_back(vec2[k]);
    }
    return isBST(temp1, temp3) && isBST(temp2, temp4);
}
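A quick check of the function above, with example arrays of my own: [2, 1, 3] and [2, 3, 1] build the same tree (2 at the root, 1 to the left, 3 to the right), while [1, 2, 3] builds a right-leaning chain.

#include <iostream>

int main() {
    std::vector<int> a = {2, 1, 3};
    std::vector<int> b = {2, 3, 1};    // same BST: 2 at the root, 1 left, 3 right
    std::vector<int> c = {1, 2, 3};    // different BST: a right-leaning chain
    std::cout << std::boolalpha
              << isBST(a, b) << "\n"   // prints true
              << isBST(a, c) << "\n";  // prints false
}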