Python collections.Counter: most_common complexity

前端 未结 2 1641
清酒与你
清酒与你 2020-12-01 10:32

What is the complexity of the function most_common provided by the collections.Counter object in Python?

More specifically, is Counte

相关标签:
2条回答
  • 2020-12-01 10:53

    The source shows exactly what happens:

    def most_common(self, n=None):
        '''List the n most common elements and their counts from the most
        common to the least.  If n is None, then list all element counts.
    
        >>> Counter('abracadabra').most_common(3)
        [('a', 5), ('r', 2), ('b', 2)]
    
        '''
        # Emulate Bag.sortedByCount from Smalltalk
        if n is None:
            return sorted(self.iteritems(), key=_itemgetter(1), reverse=True)
        return _heapq.nlargest(n, self.iteritems(), key=_itemgetter(1))
    

    heapq.nlargest is defined in heapq.py

    0 讨论(0)
  • 2020-12-01 11:00

    From the source code of collections.py, we see that if we don't specify a number of returned elements, most_common returns a sorted list of the counts. This is an O(n log n) algorithm.

    If we use most_common to return k > 1 elements, then we use heapq.nlargest. This is an O(k) + O((n - k) log k) + O(k log k) algorithm, which is very good for a small constant k, since it's essentialy linear. The O(k) part comes from heapifying the initial k counts, the second part from n - k calls to heappushpop method and the third part from sorting the final heap of k elements. Since k <= n we can conclude that the complexity is:

    O(n log k)

    If k = 1 then it's easy to show that the complexity is:

    O(n)

    0 讨论(0)
提交回复
热议问题