In a max heap (assuming it's represented by an array), the top of the heap (ie. the largest value in the heap) swaps with the last element in the array (ie. one of the smallest values in the heap), the last element is removed, and then the new top-of-the-heap element swaps with other values to settle back into its proper place.
Instead, why isn't the top element just removed and then other elements can "fill in" for the the heap?
One of the key properties of a heap is that the underlying binary tree is a complete binary tree (i.e. every level except the last one has to be completely "filled"). This is so that the heap has O(lg N)
operations because we only have to modify one element at each of the O(lg N)
levels. Let's take a look at an example
10
/ \
8 7
/ \ / \
5 6 4 3
If we follow your method and "fill in" the heap we get
8
/ \
6 7
/ \ / \
5 ? 4 3
The tree is no longer a complete binary tree as there is a "hole" at the ?
. Since we don't know that the tree is complete, we don't know anything about the height of the tree and so we can't guarantee O(lg N)
operations.
This is why we take the last element in the heap, put it on top and then shuffle it down - to maintain the complete binary tree property.
why isn't the top element just removed and then other elements can "fill in" for the the heap?
The reason for this is that the index of an element plays an important role in maintaining the structure of the heap. The two children of an element at index i
are located at indexes 2*i+1
and 2*i+2
. If you "just remove" the top element, you wouldn't end up with another heap: the indexes 1
and 2
would no longer contain children of the max
element, because the max
element would no longer be there. In a sense, you will end up with two "broken" heaps instead of a properly working one. You must replace the value at index zero, otherwise the indexing scheme among the remaining elements is going to break down.
While removing an element from the top cannot go unnoticed, removing the one at the bottom is OK: all you need to do is to make a note that the smallest element is at last-1
instead of last
. So the sequence of operations becomes as follows:
- Remove the element that can be removed safely
- Put it in place of the element that cannot be removed safely
- Percolate the element down the heap until it settles, picking the higher of its two parents at each step
Conceptually, what you propose would work fine. The abstract definition of a heap allows for the topmost element to be removed the other to "sift-up".
In practice, a common heap implementation simulates a tree by using an array of consecutive pointers (when the parent of element n is located at position n/2). In this implementation, it is inconvenient to leave "holes" in the array of pointers.
The "trick" for solving that problem is swapping-in the last element and repositioning it with a "sift-down" step. That assures that all the consecutive array elements are part of the tree and that there are no holes in the sequence. This makes the algorithm easier to implement and saves space which would be needed by link fields.
Executive summary: it is merely an implementation detail (quite convenient and very common).
The whole idea of heap algorithm is that at all times you maintain a complete tree of elements (represented by an array). If you removed something from the root of the tree, you have to put something else in there instead. In an array the most efficient way to achieve that is to move the last element there.
Your concern seems to be based on the assumption that the last element in the array (leaf element in the tree) is the smallest element. That is not correct. Heap array is not fully sorted. Heap has a "vertical" ordering in each subtree, but it has no "horizontal" ordering between the subtrees. The last element in the array will certainly be the smallest in the unique path from the root to that leaf, but in general case it will not be the smallest in the entire heap.
When you look at any leaf element of a heap of size N
, you can certainly say that it is not one of the log N
greatest elements in the entire heap. But that's all you can say. For example, if your tree has 256 elements in it, then the last element in the array (or any other leaf element) will rank somewhere between 9th to 256th. See? It could be the 9th out of 256! Referring to such element as "smallest" is simply ridiculous. On average not only it is not the smallest, it is not even close to being the smallest.
Again, the last element is chosen specifically because it is the cheapest way to maintain a continuous array. If you implemented heap in some other way, say, through a linked tree instead of an array, then the optimal way of restoring the heap after the root removal might be different.
来源:https://stackoverflow.com/questions/11488227/why-does-heapify-swap-the-top-of-the-heap-with-the-element-at-the-bottom-of-the