Can someone help explain how building a heap can be O(n) complexity?
Inserting an item into a heap is O(log n), and the insert is repeated n/2 times (t
As we know, the height of a heap is log(n), where n is the total number of elements. Let's represent it as h.
When we perform the heapify operation, the elements at the last level (h) won't move even a single step.
The number of elements at the second-last level (h-1) is 2^(h-1), and they can move at most 1 level (during heapify).
Similarly, at the ith level we have 2^i elements, which can move at most h-i positions.
Therefore the total number of moves is S = 2^h*0 + 2^(h-1)*1 + 2^(h-2)*2 + ... + 2^0*h
S = 2^h {1/2 + 2/2^2 + 3/2^3 + ... + h/2^h}   (1)
This is an AGP (arithmetico-geometric) series; to solve it, divide both sides by 2:
S/2 = 2^h {1/2^2 + 2/2^3 + ... + h/2^(h+1)}   (2)
Subtracting equation (2) from (1) gives
S/2 = 2^h {1/2 + 1/2^2 + 1/2^3 + ... + 1/2^h - h/2^(h+1)}
S = 2^(h+1) {1/2 + 1/2^2 + 1/2^3 + ... + 1/2^h - h/2^(h+1)}
Now 1/2 + 1/2^2 + 1/2^3 + ... + 1/2^h is a decreasing GP whose sum is less than 1 (when h tends to infinity, the sum tends to 1). In further analysis, let's take an upper bound on the sum, which is 1.
This gives S < 2^(h+1) {1 - h/2^(h+1)}
= 2^(h+1) - h
Since h = log(n), we have 2^h <= n and so 2^(h+1) <= 2n.
Therefore S < 2n - log(n)
T(n) = O(n)
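As a sanity check on the series above, here is a minimal Python sketch that sums the per-level moves directly for a perfect binary tree. The function name `total_moves` is made up for illustration. It assumes level i (root at level 0) holds exactly 2^i nodes; under that assumption the series evaluates exactly to 2^(h+1) - h - 2, which is below the node count n = 2^(h+1) - 1.

```python
def total_moves(h: int) -> int:
    """Worst-case sift-down moves in a perfect binary tree of height h.

    Level i (root is level 0) holds 2**i nodes, and each of them can
    sink at most h - i levels.
    """
    return sum((2 ** i) * (h - i) for i in range(h + 1))


for h in range(1, 11):
    n = 2 ** (h + 1) - 1               # nodes in a perfect tree of height h
    s = total_moves(h)
    assert s == 2 ** (h + 1) - h - 2   # exact value of the series
    assert s < n                       # so the total is linear in n
    print(h, n, s)
```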
Proof of O(n)
The proof isn't fancy and is quite straightforward. I only proved the case of a full binary tree; the result can be generalized to a complete binary tree.
There are already some great answers, but I would like to add a little visual explanation.
Now, take a look at the image. There are about
n/2^1 green nodes with height 0 (here 23/2, rounded up, = 12),
n/2^2 red nodes with height 1 (here 23/4, rounded up, = 6),
n/2^3 blue nodes with height 2 (here 23/8, rounded up, = 3),
n/2^4 purple nodes with height 3 (here 23/16, rounded up, = 2),
so there are about n/2^(h+1) nodes of height h.
To find the time complexity, let's count the amount of work done, i.e. the maximum number of iterations performed for each node.
Notice that each node can perform at most as many iterations as its height:
green = n/2^1 * 0 (no iterations, since there are no children)
red = n/2^2 * 1 (heapify will perform at most one swap for each red node)
blue = n/2^3 * 2 (heapify will perform at most two swaps for each blue node)
purple = n/2^4 * 3 (heapify will perform at most three swaps for each purple node)
So for the nodes of height h, the maximum work done is n/2^(h+1) * h.
Now the total work done is
-> (n/2^1 * 0) + (n/2^2 * 1) + (n/2^3 * 2) + (n/2^4 * 3) + ... + (n/2^(h+1) * h)
-> n * ( 0 + 1/4 + 2/8 + 3/16 + ... + h/2^(h+1) )
Now, for any value of h, the sum
-> ( 0 + 1/4 + 2/8 + 3/16 + ... + h/2^(h+1) )
will never exceed 1.
Thus the time complexity of building a heap will never exceed O(n).
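The counting argument above can be tried out with a short Python sketch of bottom-up heap construction that counts swaps. This is only an illustration under assumptions not stated in the answer: a max-heap stored in a 0-indexed list, with the hypothetical helper names `build_heap` and `is_max_heap`. It is not meant as a reference implementation.

```python
def build_heap(a: list) -> int:
    """Turn `a` into a max-heap in place; return the number of swaps."""
    n = len(a)
    swaps = 0
    # Leaves (indices n//2 .. n-1) need no work, so start at the last parent.
    for start in range(n // 2 - 1, -1, -1):
        i = start
        while True:
            largest = i
            left, right = 2 * i + 1, 2 * i + 2
            if left < n and a[left] > a[largest]:
                largest = left
            if right < n and a[right] > a[largest]:
                largest = right
            if largest == i:
                break                      # heap property holds at i
            a[i], a[largest] = a[largest], a[i]
            swaps += 1
            i = largest                    # keep sinking the moved value
    return swaps


def is_max_heap(a: list) -> bool:
    """Check that every node is <= its parent."""
    return all(a[(i - 1) // 2] >= a[i] for i in range(1, len(a)))


for n in (23, 1000, 10_000):
    data = list(range(n))   # ascending input causes many swaps for a max-heap
    swaps = build_heap(data)
    assert is_max_heap(data)
    assert swaps <= n       # never more than n swaps in total
    print(n, swaps)
```

Each node sinks at most as many levels as its height, so the swap count is bounded by the sum of the node heights, which stays below n.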
While building a heap, let's say you're taking a bottom-up approach.
In that case we start from level logn - 1 (where logn is the height of a tree of n elements). An element at level d can go down at most (logn - d) levels. Writing h = logn - d for the height of the element above the leaves, there are about 2^(logn - h) elements at height h, and each moves down at most h levels.
So the total number of traversals would be:
T(n) = sigma((2^(logn-h))*h) where h varies from 1 to logn
T(n) = n((1/2)+(2/4)+(3/8)+.....+(logn/(2^logn)))
T(n) = n*(sigma(x/(2^x))) where x varies from 1 to logn
and according to the [sources][1],
the sum in the brackets approaches 2 as the upper limit tends to infinity.
Hence T(n) = O(n)
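The limit used above can be checked directly. The following sketch uses exact rational arithmetic to show that the partial sums of x/2^x stay below 2 and match the closed form 2 - (k+2)/2^k. The closed form is a standard result for this series and is not taken from the linked source; the name `partial_sum` is made up for illustration.

```python
from fractions import Fraction


def partial_sum(k: int) -> Fraction:
    """Exact value of 1/2 + 2/4 + 3/8 + ... + k/2**k."""
    return sum(Fraction(x, 2 ** x) for x in range(1, k + 1))


for k in (1, 2, 5, 10, 20, 40):
    s = partial_sum(k)
    assert s == 2 - Fraction(k + 2, 2 ** k)   # closed form of the partial sum
    assert s < 2                              # always strictly below the limit
    print(k, float(s))
```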
Basically, work is done only on non-leaf nodes while building a heap, and the work done is the amount of swapping down needed to satisfy the heap condition. In other words, in the worst case the amount is proportional to the height of the node. All in all, the complexity of the problem is proportional to the sum of the heights of all the non-leaf nodes, which is (2^(h+1) - 1) - h - 1 = n - h - 1 = O(n).
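The sum-of-heights formula can be verified with a small sketch. It assumes the usual 0-indexed array layout (children of i at 2i+1 and 2i+2), and the function name `sum_of_heights` is made up for illustration. For perfect trees it reproduces n - h - 1, and for other sizes it still stays below n.

```python
def sum_of_heights(n: int) -> int:
    """Sum of node heights in a complete binary tree stored in an array of size n."""
    total = 0
    for i in range(n):
        # In a complete tree, the leftmost path below i is a longest one.
        height, child = 0, 2 * i + 1
        while child < n:
            height += 1
            child = 2 * child + 1
        total += height
    return total


# Perfect trees: n = 2^(h+1) - 1 nodes, sum of heights = n - h - 1.
for h in range(12):
    n = 2 ** (h + 1) - 1
    assert sum_of_heights(n) == n - h - 1

# Complete trees of any size: the sum is still less than n.
assert all(sum_of_heights(n) < n for n in range(1, 300))
```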