At school we are currently learning sorting algorithms in Java, and for my homework I was assigned heap sort. I did my reading and tried to find out as much as I could, but it seems I
One way to think of heap sort is as a cleverly optimized version of selection sort. Selection sort works by repeatedly finding the largest element not yet placed correctly, then putting it into the next correct spot in the array. However, selection sort runs in time O(n²) because it has to do n rounds of finding the largest remaining element (with up to n elements to look at in each round) and putting it into place.
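To make the comparison concrete, here is a minimal sketch of the "largest element first" form of selection sort described above. The class and method names are made up for illustration, and it sorts an int array in ascending order.

```java
class SelectionSortSketch {
    // Sorts a in ascending order by repeatedly moving the largest
    // not-yet-placed element to the end of the unplaced region.
    static void sort(int[] a) {
        for (int end = a.length - 1; end > 0; end--) {
            // Linear scan for the largest element in a[0..end]: this is the
            // O(n) step that heap sort replaces with an O(log n) heap operation.
            int maxIndex = 0;
            for (int i = 1; i <= end; i++) {
                if (a[i] > a[maxIndex]) {
                    maxIndex = i;
                }
            }
            // Put the largest element into its final position.
            int tmp = a[maxIndex];
            a[maxIndex] = a[end];
            a[end] = tmp;
        }
    }
}
```

The inner scan runs up to n times for each of the n rounds, which is where the quadratic running time comes from.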
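Intuitively, heap sort works by building up a special data structure called a binary heap that speeds up finding the largest element out of the unplaced array elements. Binary heaps support the following operations: Insert, which adds an element to the heap, and Delete-Max, which removes and returns the largest element in the heap.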
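At a very high level, the algorithm works as follows: first, Insert each element of the array into a new binary heap; then, for each array position from the last down to the first, call Delete-Max on the heap and write the returned element to that position.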
This sorts the array because the elements returned by Delete-Max are in descending order. Once all the elements have been removed, the array is then sorted.
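The high-level idea can be sketched with the heap that ships with Java. This is only an illustration, not how heap sort is normally written: it assumes java.util.PriorityQueue with a reversed comparator as the max-heap, with add playing the role of Insert and poll the role of Delete-Max, and it uses O(n) extra space. The class name is made up.

```java
class HeapSortWithQueue {
    // Sorts a in ascending order using a separate max-heap.
    static void sort(int[] a) {
        // PriorityQueue is a min-heap by default; reversing the order makes
        // poll() return the largest element instead.
        java.util.PriorityQueue<Integer> heap =
                new java.util.PriorityQueue<>(java.util.Collections.reverseOrder());

        // Phase 1: Insert every element into the heap.
        for (int x : a) {
            heap.add(x);
        }

        // Phase 2: Delete-Max returns elements in descending order, so fill
        // the array from the last position down to the first.
        for (int i = a.length - 1; i >= 0; i--) {
            a[i] = heap.poll();
        }
    }
}
```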
Heap sort is efficient because the Insert and Delete-Max operations on a heap both run in O(log n) time, meaning that n inserts and deletions can be done on the heap in O(n log n) time. A more precise analysis can be used to show that, in fact, it takes Θ(n log n) time regardless of the input array.
Typically, heap sort employs two major optimizations. First, the heap is usually built up in-place inside the array by treating the array itself as a compressed representation of the heap. If you look at a heapsort implementation, you will usually see unusual uses of array indices based on multiplying and dividing by two; these accesses work because they are treating the array as a condensed data structure. As a result, the algorithm requires only O(1) auxiliary storage space.
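As a rough illustration of that index arithmetic, here is a sketch that assumes the common 0-based layout, where the root sits at index 0 and the children of index i sit at 2i + 1 and 2i + 2. Some implementations use a 1-based layout instead, with children at 2i and 2i + 1. The helper names are made up.

```java
class ImplicitHeapIndices {
    // Index of the parent of node i (valid for i >= 1).
    static int parent(int i) {
        return (i - 1) / 2;
    }

    // Index of the left child of node i.
    static int leftChild(int i) {
        return 2 * i + 1;
    }

    // Index of the right child of node i.
    static int rightChild(int i) {
        return 2 * i + 2;
    }

    // Returns true if the first size entries of a satisfy the max-heap
    // property: no element is larger than its parent.
    static boolean isMaxHeap(int[] a, int size) {
        for (int i = 1; i < size; i++) {
            if (a[parent(i)] < a[i]) {
                return false;
            }
        }
        return true;
    }
}
```

No pointers or node objects are needed: the tree shape is implied entirely by the positions in the array, which is why the extra space stays at O(1).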
Second, rather than building up the heap one element at a time, the heap is usually built using a specialized algorithm that runs in time Θ(n) to build the heap in-place. Interestingly, in some cases this ends up making the code easier to read because code can be reused, but the algorithm itself becomes a bit trickier to understand and analyze.
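Putting both optimizations together, an in-place heap sort might look like the sketch below. It assumes a 0-based array layout and uses the standard bottom-up construction (sift down every non-leaf node, from the last one to the root). The class and method names are made up. Note how the same siftDown routine is reused for both building the heap and restoring it after each removal.

```java
class InPlaceHeapSort {
    // Sorts a in ascending order using O(1) auxiliary space.
    static void sort(int[] a) {
        int n = a.length;

        // Build a max-heap bottom-up in Θ(n) time. Indices n/2 .. n-1 are
        // leaves, so only indices n/2 - 1 down to 0 need to be sifted.
        for (int i = n / 2 - 1; i >= 0; i--) {
            siftDown(a, i, n);
        }

        // Delete-Max, in place: swap the root (the maximum) with the last
        // element of the heap region, shrink the region by one, and repair.
        for (int end = n - 1; end > 0; end--) {
            swap(a, 0, end);
            siftDown(a, 0, end);
        }
    }

    // Moves a[i] down until neither child within a[0..size-1] is larger.
    static void siftDown(int[] a, int i, int size) {
        while (true) {
            int left = 2 * i + 1;
            if (left >= size) {
                return; // node i is a leaf
            }
            int largest = left;
            int right = left + 1;
            if (right < size && a[right] > a[left]) {
                largest = right;
            }
            if (a[i] >= a[largest]) {
                return; // heap property already holds here
            }
            swap(a, i, largest);
            i = largest;
        }
    }

    static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    public static void main(String[] args) {
        int[] data = {5, 1, 4, 2, 3};
        sort(data);
        System.out.println(java.util.Arrays.toString(data)); // prints [1, 2, 3, 4, 5]
    }
}
```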
You will sometimes see heapsort done with a ternary heap. This has the advantage of being slightly faster on average, but if you find a heapsort implementation using this without knowing what you're looking at it can be fairly tricky to read. Other algorithms also use the same general structure but a more complex heap structure. Smoothsort uses a much more complicated heap to get O(n) best-case behavior while maintaining O(1) space usage and O(n log n) worst-case behavior. Poplar sort is similar to smoothsort, but with O(log n) space usage and slightly better performance. One can even think of classic sorting algorithms like insertion sort and selection sort as heap sort variants.
Once you have a better grasp of heapsort, you may want to look into the introsort algorithm, which combines the strengths of quicksort (fast sorting on average), heapsort (excellent worst-case behavior), and insertion sort (fast sorting for small arrays) to produce an extremely fast sorting algorithm. Introsort is what's used in many implementations of C++'s std::sort function, and is not very hard to implement yourself once you have a working heapsort.
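For a feel of how the three pieces fit together, here is a simplified introsort sketch. Several details are assumptions chosen for illustration rather than anything prescribed above: the small-range cutoff of 16, the recursion depth limit of 2 * floor(log2 n), the middle-element pivot, and all of the names. Real library implementations differ in these choices.

```java
class IntrosortSketch {
    static final int SMALL = 16; // assumed cutoff for switching to insertion sort

    static void sort(int[] a) {
        if (a.length < 2) return;
        // Assumed depth limit: 2 * floor(log2(n)).
        int depthLimit = 2 * (31 - Integer.numberOfLeadingZeros(a.length));
        sortRange(a, 0, a.length - 1, depthLimit);
    }

    // Sorts a[lo..hi] (inclusive).
    static void sortRange(int[] a, int lo, int hi, int depth) {
        while (hi - lo + 1 > SMALL) {
            if (depth == 0) {
                heapSort(a, lo, hi); // quicksort is degenerating: fall back
                return;
            }
            depth--;
            int p = partition(a, lo, hi);
            sortRange(a, lo, p - 1, depth); // recurse on the left part
            lo = p + 1;                     // loop on the right part
        }
        insertionSort(a, lo, hi);
    }

    // Lomuto partition around the middle element; returns the pivot's final index.
    static int partition(int[] a, int lo, int hi) {
        swap(a, lo + (hi - lo) / 2, hi);
        int pivot = a[hi];
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] < pivot) swap(a, i, store++);
        }
        swap(a, store, hi);
        return store;
    }

    static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++) {
            int x = a[i];
            int j = i - 1;
            while (j >= lo && a[j] > x) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = x;
        }
    }

    // In-place heap sort of a[lo..hi], with heap index 0 mapped to a[lo].
    static void heapSort(int[] a, int lo, int hi) {
        int n = hi - lo + 1;
        for (int i = n / 2 - 1; i >= 0; i--) siftDown(a, lo, i, n);
        for (int end = n - 1; end > 0; end--) {
            swap(a, lo, lo + end);
            siftDown(a, lo, 0, end);
        }
    }

    static void siftDown(int[] a, int base, int i, int size) {
        while (true) {
            int child = 2 * i + 1;
            if (child >= size) return;
            if (child + 1 < size && a[base + child + 1] > a[base + child]) child++;
            if (a[base + i] >= a[base + child]) return;
            swap(a, base + i, base + child);
            i = child;
        }
    }

    static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

The depth limit is what guarantees the O(n log n) worst case: once quicksort has recursed too deeply on some range, that range is handed to heapsort instead.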
Hope this helps!