Surprising results with Python timeit: Counter() vs defaultdict() vs dict()

前端 未结 1 649
独厮守ぢ
独厮守ぢ 2020-11-27 08:36

I obtained very surprising results with timeit, can someone tell me if I am doing something wrong ? I am using Python 2.7.

This is the contents of file speedtest_ini

相关标签:
1条回答
  • 2020-11-27 09:00

    Yes, this is expected; the Counter() constructor uses Counter.update() which uses self.get() to load initial values rather than rely on __missing__.

    Moreover, the defaultdict __missing__ factory is handled entirely in C code, especially when using another type like int() that is itself implemented in C. The Counter source is pure Python and as such the Counter.__missing__ method requires a Python frame to execute.

    Because dict.get() is still handled in C, the constructor approach is the faster approach for a Counter(), provided you use the same trick Counter.update() uses and alias the self.get lookup as a local first:

    >>> import timeit
    >>> import random
    >>> import sys
    >>> sys.version_info
    sys.version_info(major=2, minor=7, micro=9, releaselevel='final', serial=0)
    >>> to_count = [random.randint(0, 100) for r in range(60)]
    >>> timeit.timeit('for i in to_count: c[i] += 1',
    ...               'from collections import Counter; from __main__ import to_count; c = Counter()',
    ...               number=10000)
    0.2510359287261963
    >>> timeit.timeit('for i in to_count: c[i] = c_get(i, 0) + 1',
    ...               'from collections import Counter; from __main__ import to_count; c = Counter(); c_get = c.get',
    ...               number=10000)
    0.20978617668151855
    

    Both defaultdict and Counter are helpful classes built for their functionality, not their performance; not relying on the __missing__ hook can be faster still:

    >>> timeit.timeit('for i in to_count: d[i] = d_get(i, 0) + 1',
    ...               'from __main__ import to_count; d = {}; d_get = d.get',
    ...               number=10000)
    0.11437392234802246
    

    That's a regular dictionary using an aliased dict.get() method for maximum speed. But then you'll also have to re-implement the bag behaviour of Counter, or the Counter.most_common() method. The defaultdict use cases go way beyond counting.

    In Python 3.2, updating a Counter() got a speed boost by adding a C library that handles this case; see issue 10667. Testing on Python 3.4, the Counter() constructor now beats the aliased dict.get case:

    >>> timeit.timeit('Counter(to_count)',
    ...               'from collections import Counter; from __main__ import to_count',
    ...               number=100000)
    0.8332311600097455
    >>> timeit.timeit('for i in to_count: d[i] = d_get(i, 0) + 1',
    ...               'from __main__ import to_count; d = {}; d_get = d.get',
    ...               number=100000)
    0.961191965994658
    >>> import sys
    >>> sys.version_info
    sys.version_info(major=3, minor=4, micro=2, releaselevel='final', serial=0)
    

    (Note: to get a meaningful timing result the number of iterations was increased from 10k to 100k; so if you compare these against the dict.get() case above you need to take the timing there times ten, at 1.144 seconds).

    0 讨论(0)
提交回复
热议问题