What would you use the heapq Python module for in real life?

栀梦 2021-02-02 16:59

After reading Guido's "Sorting a million 32-bit integers in 2MB of RAM using Python", I discovered the heapq module, but the concept is pretty abstract to me.

3 Answers
  • 2021-02-02 17:32

    Comparing it to a self-balancing binary tree, a heap doesn't seem to gain you much if you just look at complexity:

    • Insertion: O(log N) for both
    • Remove max element: O(log N) for both
    • Build structure from an array of elements: O(N) for a heap, O(N log N) for a binary tree

    But whereas a binary tree tends to need each node pointing to its children for efficiency, a heap stores its data packed tightly into an array. This allows you to store much more data in a fixed amount of memory.

    So when you only need insertion and max-removal, a heap is a perfect fit: it can often use half as much memory as a self-balancing binary tree, and it is much easier to implement if you have to write one yourself. The standard use case is a priority queue.
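
    As a minimal sketch of that priority-queue use case (the task names and priorities are made up for illustration): heapq works on an ordinary list and always pops the smallest item first, so for max-removal you would store negated keys instead. The is_min_heap helper is hypothetical and only checks the array layout described above.

```python
import heapq

# Hypothetical task list: (priority, name); a lower number is more urgent.
tasks = [(3, "write report"), (1, "fix outage"), (2, "review patch")]

heap = list(tasks)
heapq.heapify(heap)                        # O(N) in-place build; still a plain list
heapq.heappush(heap, (0, "page on-call"))  # O(log N) insertion


def is_min_heap(items):
    """Check the implicit tree: the parent of index i lives at (i - 1) // 2."""
    return all(items[(i - 1) // 2] <= items[i] for i in range(1, len(items)))


heap_ok = is_min_heap(heap)

# Pop in priority order; each pop is O(log N).
order = [heapq.heappop(heap)[1] for _ in range(len(heap))]
print(order)
```

    No per-node child pointers are stored anywhere; the parent/child relationship is pure index arithmetic on the list, which is where the memory saving comes from.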

  • 2021-02-02 17:32

    I stumbled on it by accident while working out how to implement collections.Counter in Python 2.6. Have a look at the implementation and usage of collections.Counter: its most_common(n) method is implemented with heapq.nlargest.
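
    A small sketch of that relationship, using a made-up sentence as input. In current CPython, most_common(n) with an explicit n delegates to heapq.nlargest keyed on the count; treat that as an implementation detail rather than a documented guarantee.

```python
import heapq
from collections import Counter
from operator import itemgetter

# Illustrative input only.
words = "the quick brown fox jumps over the lazy dog the fox".split()
counts = Counter(words)

# Top two words by frequency, via the Counter API...
top_two = counts.most_common(2)

# ...and the same selection written directly with heapq.
via_heapq = heapq.nlargest(2, counts.items(), key=itemgetter(1))

print(top_two)
```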

  • 2021-02-02 17:44

    The heapq module is commonly used to implement priority queues.

    You see priority queues in event schedulers that are constantly adding new events and need to use a heap to efficiently locate the next scheduled event. Some examples include:

    • Python's own sched module: http://hg.python.org/cpython/file/2.7/Lib/sched.py#l106
    • The Tornado web server: https://github.com/facebook/tornado/blob/master/tornado/ioloop.py#L260
    • Twisted internet servers: http://twistedmatrix.com/trac/browser/trunk/twisted/internet/base.py#L712

    The heapq docs include priority queue implementation notes which address the common use cases.
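
    To make the scheduler idea concrete, here is a minimal sketch. It is not how sched, Tornado, or Twisted are actually written; EventQueue, schedule, and run_until are hypothetical names. The sequence counter used as a tie-breaker follows the approach suggested in the heapq priority queue implementation notes.

```python
import heapq
import itertools


class EventQueue:
    """Hypothetical minimal scheduler: run callbacks in timestamp order."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker for equal timestamps

    def schedule(self, when, callback):
        # The sequence number keeps insertion order for equal times and
        # stops heapq from ever comparing two callbacks with each other.
        heapq.heappush(self._heap, (when, next(self._counter), callback))

    def run_until(self, now):
        """Pop and run every event whose time is <= now; return the results."""
        results = []
        while self._heap and self._heap[0][0] <= now:
            _when, _seq, callback = heapq.heappop(self._heap)
            results.append(callback())
        return results


queue = EventQueue()
queue.schedule(5.0, lambda: "flush cache")
queue.schedule(1.0, lambda: "send heartbeat")
queue.schedule(1.0, lambda: "poll socket")
queue.schedule(9.0, lambda: "rotate logs")

ran = queue.run_until(5.0)
print(ran)
```

    Peeking at self._heap[0] is O(1), so the loop can cheaply check whether the next event is due without disturbing the rest of the queue.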

    In addition, heaps are great for implementing partial sorts. For example, heapq.nsmallest and heapq.nlargest can be much more memory efficient and do many fewer comparisons than a full sort followed by a slice:

    >>> from heapq import nlargest
    >>> from random import random
    >>> nlargest(5, (random() for i in range(1000000)))  # use xrange on Python 2
    [0.9999995650034837, 0.9999985756262746, 0.9999971934450994, 0.9999960394998497, 0.9999949126363714]
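
    As a quick check that the partial sort really matches "sort, then slice", here is a sketch with a fixed seed (the seed and the sample size are arbitrary choices so the run is repeatable):

```python
import heapq
import random

random.seed(0)  # fixed seed only so the sketch is repeatable
values = [random.random() for _ in range(100_000)]

# Partial sort: only the five largest values are tracked.
top5_heap = heapq.nlargest(5, values)

# Full sort followed by a slice: same answer, but sorts everything first.
top5_sort = sorted(values, reverse=True)[:5]

# nlargest also accepts a one-shot generator, so the input
# never has to be materialized as a list.
top5_stream = heapq.nlargest(5, (v for v in values))

print(top5_heap == top5_sort == top5_stream)
```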
    