Can someone help explain how building a heap can be O(n) complexity?
Inserting an item into a heap is O(log n), and the insert is repeated n/2 times (the remaining elements are leaves), so shouldn't building the heap by repeated insertion be O(n log n)?
Let's suppose you have N elements in a heap. Then its height is log(N).
Now you want to insert another element. The complexity is O(log N): in the worst case we have to compare all the way UP to the root.
Now you have N+1 elements and height = log(N+1).
Using induction, it can be shown that the total complexity of the insertions is ∑ log i.
Now using
log a + log b = log ab
this simplifies to: ∑ log i = log(n!)
which is actually O(N log N).
But
we are doing something wrong here: in most cases the element does not reach the top. While executing, most of the time we find that it does not go even halfway up the tree. Hence this bound can be tightened to another, tighter bound by using the mathematics given in the answers above.
This realization came to me after detailed thought and experimentation on heaps.
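To make that analysis concrete, here is a minimal sketch in Python of the insert-one-at-a-time build described above (the function name and sample values are mine, for illustration); each new element sifts up at most log i levels, which is exactly the ∑ log i being summed:

```python
def build_heap_by_insertion(items):
    """Naive O(n log n) build: insert elements one at a time, sifting up.
    Illustrative sketch, not the optimal algorithm discussed below."""
    heap = []
    for item in items:
        heap.append(item)
        i = len(heap) - 1
        # Sift the new element up toward the root: at most log(i) swaps.
        while i > 0 and heap[(i - 1) // 2] > heap[i]:
            heap[i], heap[(i - 1) // 2] = heap[(i - 1) // 2], heap[i]
            i = (i - 1) // 2
    return heap

print(build_heap_by_insertion([5, 3, 8, 1, 9, 2]))  # a valid min-heap layout
```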
Your analysis is correct; however, it is not tight.
It is not really easy to explain why building a heap is a linear operation, so it is worth reading a full analysis. A great analysis of the algorithm can be seen here.
The main idea is that in the build_heap algorithm the actual heapify cost is not O(log n) for all elements.
When heapify is called, the running time depends on how far an element might move down in the tree before the process terminates. In other words, it depends on the height of the element in the heap. In the worst case, the element might go down all the way to the leaf level.
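As a sketch of what heapify does (one common array-based form; the code and names are mine, not from the answer), note that the loop below runs at most once per level beneath the starting node, so its cost is proportional to that node's height:

```python
def heapify(a, i, n):
    """Sift a[i] down in a max-heap of size n; cost ~ height of node i.
    Illustrative sketch of the standard sift-down step."""
    while True:
        largest = i
        left, right = 2 * i + 1, 2 * i + 2
        if left < n and a[left] > a[largest]:
            largest = left
        if right < n and a[right] > a[largest]:
            largest = right
        if largest == i:       # heap property restored; stop early
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest            # continue one level further down
```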
Let us count the work done level by level.
At the bottommost level, there are 2^h nodes, but we do not call heapify on any of these, so the work is 0. At the next level up there are 2^(h − 1) nodes, and each might move down by 1 level. At the third level from the bottom, there are 2^(h − 2) nodes, and each might move down by 2 levels.
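Continuing this pattern and summing over all levels (a step the answer leaves implicit), the total work is
∑_{j=0..h} j * 2^(h − j) = 2^h * ∑_{j=0..h} j/2^j ≤ 2^h * 2 = O(2^h) = O(n)
since a tree of height h has n ≈ 2^(h+1) nodes.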
As you can see, not all heapify operations are O(log n); this is why you are getting O(n).
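For a quick empirical check (my own sketch, not part of the answer), Python's standard library implements this bottom-up build as heapq.heapify, which you can time against n successive pushes:

```python
import heapq
import random
import time

data = [random.random() for _ in range(10**6)]

a = data[:]
t0 = time.perf_counter()
heapq.heapify(a)              # bottom-up build: O(n)
t1 = time.perf_counter()

b = []
for x in data:
    heapq.heappush(b, x)      # n successive inserts: O(n log n)
t2 = time.perf_counter()

print(f"heapify:  {t1 - t0:.3f}s")
print(f"n pushes: {t2 - t1:.3f}s")
```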
"The linear time bound of build Heap, can be shown by computing the sum of the heights of all the nodes in the heap, which is the maximum number of dashed lines. For the perfect binary tree of height h containing N = 2^(h+1) – 1 nodes, the sum of the heights of the nodes is N – H – 1. Thus it is O(N)."
We get the runtime for the heap build by figuring out the maximum move each node can make. So we need to know how many nodes are in each row and how far down each node can go.
Starting from the root, each row has double the nodes of the previous row, so by asking how often we can double the number of nodes until none are left, we get the height of the tree. In mathematical terms, the height of the tree is log2(n), where n is the length of the array.
To calculate the nodes in one row we work from the back: we know n/2 nodes are at the bottom, so dividing by 2 gives the previous row, and so on.
Based on this we get the following formula for the Siftdown approach: (0 * n/2) + (1 * n/4) + (2 * n/8) + ... + (log2(n) * 1)
The term in the last parenthesis is the height of the tree multiplied by the one node at the root; the term in the first parenthesis is all the nodes in the bottom row multiplied by the distance they can travel, 0. The same formula in sigma notation:
∑_{i=0..log2(n)} i * n/2^(i+1)
Pulling the n out and bounding n/2^(i+1) by n/2^i, we are left with the series ∑_{i≥0} i/2^i, which converges to 2. Bringing the n back in we have 2 * n; the 2 can be discarded because it is a constant, and tada: we have the worst-case runtime of the Siftdown approach, n.
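To see this bound in practice, here is a small instrumented sketch (mine, not the answer's) that counts how many levels elements actually move during a bottom-up build; the count stays well under 2n for random input:

```python
import random

def build_max_heap_counting(a):
    """Bottom-up max-heap build; returns how many levels elements moved down.
    Illustrative sketch for measuring total sift-down work."""
    n, moves = len(a), 0
    for i in range(n // 2 - 1, -1, -1):    # only internal nodes need sifting
        j = i
        while True:
            largest, l, r = j, 2 * j + 1, 2 * j + 2
            if l < n and a[l] > a[largest]:
                largest = l
            if r < n and a[r] > a[largest]:
                largest = r
            if largest == j:
                break
            a[j], a[largest] = a[largest], a[j]
            j = largest
            moves += 1                      # one level of downward movement
    return moves

for n in (10**3, 10**4, 10**5):
    a = [random.random() for _ in range(n)]
    print(n, build_max_heap_counting(a))    # grows linearly in n, not n log n
```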
I really like the explanation by Jeremy West. Another approach which is really easy to understand is given here: http://courses.washington.edu/css343/zander/NotesProbs/heapcomplexity
Since buildheap depends on heapify, and the siftdown approach is used, the cost depends on the sum of the heights of all nodes. This sum is
S = ∑_{i=0..h} 2^i * (h − i), where h = log n is the height of the tree.
Solving for S, we get S = 2^(h+1) − 1 − (h + 1). Since n = 2^(h+1) − 1, this gives
S = n − h − 1 = n − log n − 1
so S = O(n), and the complexity of buildheap is O(n).
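The "solving" step the answer skips can be filled in (my own derivation) by substituting j = h − i and using the standard partial sum ∑_{j=0..m} j/2^j = 2 − (m + 2)/2^m:
S = ∑_{j=0..h} j * 2^(h − j) = 2^h * (2 − (h + 2)/2^h) = 2^(h+1) − (h + 2) = 2^(h+1) − 1 − (h + 1)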
Successive insertions can be described by:
T = O(log 1 + log 2 + ... + log n) = O(log(n!))
By Stirling's approximation, n! ≈ O(n^(n + O(1))), therefore T ≈ O(n log n).
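Spelled out with the standard form of Stirling's approximation (my own added step): n! ~ √(2πn) * (n/e)^n, so log(n!) = n log n − n + O(log n) = Θ(n log n), which matches the O(n log n) bound for n successive insertions.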
Hope this helps. The optimal O(n) way is to use the build-heap algorithm on a given set (ordering doesn't matter).