How do you compute the average height of a binary search tree when adding 1000 random ints? What is the average height?
There doesn't appear to be a simple answer to this question, though there are a number of numeric approximations, e.g.:
Devroye, Luc. "A note on the height of binary search trees." Journal of the ACM (JACM) 33.3 (1986): 489-498.
Reed, Bruce. "The height of a random binary search tree." Journal of the ACM (JACM) 50.3 (2003): 306-332.
http://staff.ustc.edu.cn/~csli/graduate/algorithms/book6/chap13.htm
These approximations generally take the form: A ln n - B ln ln n + C
Where A~4.311
and B~1.953
So probably the most useful thing to say is that the average height for random insertions is O(log n)
, but if you actually need a numeric approximation I think (4.311 ln n - 1.953 ln ln n)
would be close enough for large n.
For n=1000
, that gives about 26
- which fits the experimental results reported elsewhere quite nicely.
Regardless of what tree you are using the average height will be log2(1000), as someone mentioned before. It's true that depending on the order of the numbers inserted the actual height may vary, but assuming randomly distributed numbers, which you mention, then the actual value will, more often than not, approximate the expected value (Which, once again, is log2(1000))
This question got me asking whether you can definitively work this out without actually generating the trees.
I managed to write an application which could calculate the answer to what the average height would be if you added every possible permutation of N unique numbers to a naively implemented binary tree.
The answers I got are in this graph. (The X-axis is the number of items in the tree, the blue line is the average height, and the red line is the optimum possible height)
N Average Height 2 2 16 7.039 32 9.280 64 11.679 256 16.783 343 17.896
Granitebolshevik is right: it's possible but statistically unlikely that a naively implemented tree will be the optimum height, without extra balancing functionality.
The algorithm has a complexity of O(N^2), and it isn't fast enough to calculate really large numbers.
It depends on whether you are using any sort of balanced tree structure (such as a red-black tree). Since you are inserting random numbers into a binary tree, it would be reasonable to expect that the average depth is approximately log2(1000) - so the values 10 and 11 would be 'normal'. I'm not sure how far it could deviate from that; no shallower than 10 levels, possibly somewhat deeper. An extreme case with no balancing would be 1000 deep; that is unlikely to happen with random numbers.
You can compute the height of a binary tree using this recursive definition:
height(empty) = 0
height(tree) = 1 + max(height(tree.left), height(tree.right))
One way to empirically measure the average height of such a tree is to repeatedly create an empty tree and add 1000 random items to it. Measure the height of each trial using the above function, and average them.
I suspect your task is probably to find a formula for the average height of a binary tree.
it depends on the order the are added. If you start with the smallest value then the tree will be deeper because all new values will be added to the right child BST. If you add the largest value first then the left child will be deep while the right is empty.