The collections.Count.most_common
function in Python uses the heapq
module to return the count of the most common word in a file, for instance.
Here's another variation based on Sedgewick
The heap is represented internally in an array where if a node is at k, it's children are at 2*k and 2*k + 1. The first element of the array is not used, to make the math more convenient.
To add a new element to the heap, you append it to the end of the array and then call swim repeatedly until the new element finds its place in the heap.
To delete the root, you swap it with the last element in the array, delete it and then call sink until the swapped element finds its place.
swim(k):
while k > 1 and less(k/2, k):
exch(k, k/2)
k = k/2
sink(k):
while 2*k <= N:
j = 2*k
if j < N and less(j, j+1):
j++
if not less(k, j):
break
exch(k, j)
k = j
Here's a visualization of heap insert, inserting the first 15 letters of the alphabet: [a-o]
Your confusion may come from the fact that the Python module heapq does not define a heap as a data type (a class) with its own methods (e.g. as in a deque or a list
). It instead provides functions that you can run on a Python list
.
It's best to think of heapq
as a module providing a set of algorithms (methods) to interpret lists as heaps and manipulate them accordingly. Note that it's common to represent heaps internally as arrays (as an abstract data structure), and Python already has lists serving that purpose, so it makes sense for heapq
to just provide methods to manipulate lists as heaps.
Let's see this with an example. Starting with a simple Python list:
>>> my_list = [2, -1, 4, 10, 0, -20]
To create a heap with heapq
from my_list
we just need to call heapify
which simply re-arranges the elements of the list to form a min-heap:
>>> import heapq
>>> # NOTE: This returns NoneType:
>>> heapq.heapify(my_list)
Note that you can still access the list underlying the heap, since all heapify
has done is change the value referenced by my_list:
>>> my_list
[-20, -1, 2, 10, 0, 4]
Popping elements from the heap held by my_list
:
>>> [heapq.heappop(my_list) for x in range(len(my_list))]
[-20, -1, 0, 2, 4, 10]
this is a slightly modified version of the code found here : http://code.activestate.com/recipes/577086-heap-sort/
def HeapSort(A,T):
def heapify(A):
start = (len(A) - 2) / 2
while start >= 0:
siftDown(A, start, len(A) - 1)
start -= 1
def siftDown(A, start, end):
root = start
while root * 2 + 1 <= end:
child = root * 2 + 1
if child + 1 <= end and T.count(A[child]) < T.count(A[child + 1]):
child += 1
if child <= end and T.count(A[root]) < T.count(A[child]):
A[root], A[child] = A[child], A[root]
root = child
else:
return
heapify(A)
end = len(A) - 1
while end > 0:
A[end], A[0] = A[0], A[end]
siftDown(A, 0, end - 1)
end -= 1
if __name__ == '__main__':
text = "the quick brown fox jumped over the the quick brown quick log log"
heap = list(set(text.split()))
print heap
HeapSort(heap,text)
print heap
Output
['brown', 'log', 'jumped', 'over', 'fox', 'quick', 'the']
['jumped', 'fox', 'over', 'brown', 'log', 'the', 'quick']
you can visualize the program here http://goo.gl/2a9Bh
In Python 2.X and 3.x, heaps are supported through an importable library, heapq. It supplies numerous functions to work with the heap data structure modelled in a Python list. Example:
>>> from heapq import heappush, heappop
>>> heap = []
>>> data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
>>> for item in data:
heappush(heap, item)
>>> ordered = []
>>> while heap:
ordered.append(heappop(heap))
>>> ordered
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> data.sort()
>>> data == ordered
True
You can find out more about Heap functions: heappush, heappop, heappushpop, heapify, heapreplace
in heap python docs.