Why does heapify swap the top of the heap with the element at the bottom of the heap?

旧巷老猫 提交于 2019-12-06 03:54:19

One of the key properties of a heap is that the underlying binary tree is a complete binary tree (i.e. every level except the last one has to be completely "filled"). This is so that the heap has O(lg N) operations because we only have to modify one element at each of the O(lg N) levels. Let's take a look at an example

    10
   /  \
  8    7
 / \  / \
5  6  4  3

If we follow your method and "fill in" the heap we get

     8
   /   \
  6     7
 / \   / \
5  ?   4  3

The tree is no longer a complete binary tree as there is a "hole" at the ?. Since we don't know that the tree is complete, we don't know anything about the height of the tree and so we can't guarantee O(lg N) operations.

This is why we take the last element in the heap, put it on top and then shuffle it down - to maintain the complete binary tree property.

why isn't the top element just removed and then other elements can "fill in" for the the heap?

The reason for this is that the index of an element plays an important role in maintaining the structure of the heap. The two children of an element at index i are located at indexes 2*i+1 and 2*i+2. If you "just remove" the top element, you wouldn't end up with another heap: the indexes 1 and 2 would no longer contain children of the max element, because the max element would no longer be there. In a sense, you will end up with two "broken" heaps instead of a properly working one. You must replace the value at index zero, otherwise the indexing scheme among the remaining elements is going to break down.

While removing an element from the top cannot go unnoticed, removing the one at the bottom is OK: all you need to do is to make a note that the smallest element is at last-1 instead of last. So the sequence of operations becomes as follows:

  • Remove the element that can be removed safely
  • Put it in place of the element that cannot be removed safely
  • Percolate the element down the heap until it settles, picking the higher of its two parents at each step

Conceptually, what you propose would work fine. The abstract definition of a heap allows for the topmost element to be removed the other to "sift-up".

In practice, a common heap implementation simulates a tree by using an array of consecutive pointers (when the parent of element n is located at position n/2). In this implementation, it is inconvenient to leave "holes" in the array of pointers.

The "trick" for solving that problem is swapping-in the last element and repositioning it with a "sift-down" step. That assures that all the consecutive array elements are part of the tree and that there are no holes in the sequence. This makes the algorithm easier to implement and saves space which would be needed by link fields.

Executive summary: it is merely an implementation detail (quite convenient and very common).

The whole idea of heap algorithm is that at all times you maintain a complete tree of elements (represented by an array). If you removed something from the root of the tree, you have to put something else in there instead. In an array the most efficient way to achieve that is to move the last element there.

Your concern seems to be based on the assumption that the last element in the array (leaf element in the tree) is the smallest element. That is not correct. Heap array is not fully sorted. Heap has a "vertical" ordering in each subtree, but it has no "horizontal" ordering between the subtrees. The last element in the array will certainly be the smallest in the unique path from the root to that leaf, but in general case it will not be the smallest in the entire heap.

When you look at any leaf element of a heap of size N, you can certainly say that it is not one of the log N greatest elements in the entire heap. But that's all you can say. For example, if your tree has 256 elements in it, then the last element in the array (or any other leaf element) will rank somewhere between 9th to 256th. See? It could be the 9th out of 256! Referring to such element as "smallest" is simply ridiculous. On average not only it is not the smallest, it is not even close to being the smallest.

Again, the last element is chosen specifically because it is the cheapest way to maintain a continuous array. If you implemented heap in some other way, say, through a linked tree instead of an array, then the optimal way of restoring the heap after the root removal might be different.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!