Can I speed up an iterable class when I know its length in advance?

Posted by 大憨熊 on 2020-04-10 23:23:03

Question


PEP 424 mentions in the "Rationale" that:

Being able to pre-allocate lists based on the expected size, as estimated by __length_hint__ , can be a significant optimization. CPython has been observed to run some code faster than PyPy, purely because of this optimization being present.

So I asked myself the question that I'm now asking here: is it possible to use this knowledge to speed up an iterable class that processes an iterator, when its "length" can be predicted correctly?
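For context, here is a minimal sketch of how the hint from PEP 424 is exposed. operator.length_hint(obj, default) returns len(obj) when that is available, otherwise the result of obj.__length_hint__(), otherwise the default. The Countdown class is a hypothetical example written for this illustration, not code from the question.

```python
from operator import length_hint


class Countdown:
    """Hypothetical iterator yielding n-1, n-2, ..., 0."""

    def __init__(self, n):
        self._remaining = n

    def __iter__(self):
        return self

    def __next__(self):
        if self._remaining <= 0:
            raise StopIteration
        self._remaining -= 1
        return self._remaining

    def __length_hint__(self):
        # Report how many items are still to come, not the original size.
        return self._remaining


it = Countdown(5)
hint_before = length_hint(it)   # hint for a fresh iterator
next(it)
hint_after = length_hint(it)    # hint after consuming one item

# Generators define neither __len__ nor __length_hint__,
# so the supplied default is returned instead.
hint_default = length_hint((x for x in range(3)), 10)
```

The hint is only an estimate: consumers such as list() may use it to size their initial allocation, but they still iterate until StopIteration.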


Answer 1:


Setting aside the generator/iterator terminology confusion, the __length_hint__ method is a really minor optimization I would only use in special circumstances. I wrote my own simple little test:

class Range:

    def __init__(self, n):
        self._n = n
        self._i = 0

    def __iter__(self):
        return self

    def __next__(self):
        i = self._i
        if i >= self._n:
            raise StopIteration
        self._i += 1
        return i

class RangeWithHint(Range):

    def __length_hint__(self):
        return self._n

If this is used to generate a list of values, the advantage of preallocating the list only becomes measurable with really large lists of about a million elements, and even then is very small:

Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from timeit import timeit
>>> timeit("xs = list(Range(1000000))", "from __main__ import Range", number=10)
5.068971888250076
>>> timeit("xs = list(RangeWithHint(1000000))", "from __main__ import RangeWithHint", number=10)
4.7962311912107225

Takeaway: Python is already really, really fast at reallocating lists as they grow. Don't assume that __length_hint__ is going to vastly improve that speed.
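One way to see why growing a list is cheap is to count how often its allocation actually changes. The sketch below assumes CPython, where sys.getsizeof on a list reflects the allocated capacity rather than only the used length; count_resizes is a hypothetical helper, and the exact count is an implementation detail that varies between versions.

```python
import sys


def count_resizes(n):
    """Count how often the list's allocated size changes over n appends."""
    xs = []
    last_size = sys.getsizeof(xs)
    resizes = 0
    for i in range(n):
        xs.append(i)
        size = sys.getsizeof(xs)
        if size != last_size:
            # The list grew its underlying buffer on this append.
            resizes += 1
            last_size = size
    return resizes


resizes = count_resizes(100_000)
print(resizes)
```

Because the list over-allocates on each growth step, the number of reallocations is far smaller than the number of appends, which leaves little for a length hint to save.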




Answer 2:


My conclusions after doing two experiments (the second one after receiving feedback from @TerryJanReedy):

There can be a significant (up to 50%) optimization in simple cases with long iterables, but in absolute terms the gain is negligible as soon as more complicated operations are performed with or on each item, or when the iterable is very short.

Setup

I implemented one class that just iterates over some iterator and a second, map-like class that applies a function to each item. Both classes come in two variants: one without __length_hint__ and one with it.

I chose Cython to remove as much Python overhead as possible:

from operator import length_hint

cdef class MyIter(object):
    cdef object it

    def __init__(self, iterable):
        self.it = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self.it)

cdef class MyIter2(object):
    cdef object it

    def __init__(self, iterable):
        self.it = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self.it)

    # --- This method is new ---
    def __length_hint__(self):
        return length_hint(self.it)

# Map-like classes

cdef class MyMap(object):
    cdef object func
    cdef object it

    def __init__(self, func, iterable):
        self.it = iter(iterable)
        self.func = func

    def __iter__(self):
        return self

    def __next__(self):
        return self.func(next(self.it))

cdef class MyMap2(object):
    cdef object func
    cdef object it

    def __init__(self, func, iterable):
        self.it = iter(iterable)
        self.func = func

    def __iter__(self):
        return self

    def __next__(self):
        return self.func(next(self.it))

    # --- This method is new ---
    def __length_hint__(self):
        return length_hint(self.it)
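The delegation pattern used in MyIter2 and MyMap2 can be tried without compiling anything. The following is a pure-Python stand-in with a hypothetical name (PyIterWithHint); it behaves the same way with respect to the hint, but it is not the Cython class that was timed, so its performance will differ.

```python
from operator import length_hint


class PyIterWithHint:
    """Pure-Python stand-in for MyIter2 (hypothetical, for illustration)."""

    def __init__(self, iterable):
        self.it = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self.it)

    def __length_hint__(self):
        # Forward the question to the wrapped iterator; length_hint falls
        # back to 0 when the wrapped iterator offers no hint.
        return length_hint(self.it)


wrapped = PyIterWithHint([10, 20, 30])
hint_full = length_hint(wrapped)   # list iterators report remaining items
next(wrapped)
hint_rest = length_hint(wrapped)

# A generator has no hint, so the wrapper reports the default of 0.
hint_gen = length_hint(PyIterWithHint(x for x in range(3)))
```

Delegating keeps the wrapper honest: it reports a useful size only when the underlying iterator can provide one.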

Timings

I did the timing using Python 3.5 with IPython's %timeit command:

import random

lengths1 = []
timing1 = []
timing2 = []

lengths2 = []
timing3 = []
timing4 = []

for _ in range(30):
    i = random.randint(1, 1000000)
    lengths1.append(i)
    lst = list(range(i))

    res1 = %timeit -o list(MyIter(lst))
    timing1.append(res1)
    res2 = %timeit -o list(MyIter2(lst))
    timing2.append(res2)

    i = random.randint(1, 100000)  # factor 10 less items
    lengths2.append(i)
    lst = list(range(i))

    res3 = %timeit -o list(MyMap(float, lst))
    timing3.append(res3)
    res4 = %timeit -o list(MyMap2(float, lst))
    timing4.append(res4)

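The results, shown as the time difference (timing1 - timing2) and the relative time difference (100 * (timing1 - timing2) / timing1):

MyIter

[Figure: time difference and relative time difference for MyIter vs. MyIter2]

This shows a significant optimization (up to 50%) for long iterables.

MyMap

[Figure: time difference and relative time difference for MyMap vs. MyMap2]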

So the one with the __length_hint__ is sometimes faster but not what I would call significant.
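As a worked example of the relative time difference formula, the sketch below applies it to the two list() timings reported in the first answer (Range vs. RangeWithHint). Those numbers come from a different machine and a pure-Python class, so they are not comparable to the Cython results here; relative_gain is a hypothetical helper name.

```python
def relative_gain(t_without, t_with):
    """Relative time difference in percent: 100 * (t1 - t2) / t1."""
    return 100 * (t_without - t_with) / t_without


# Timings quoted from the first answer (10 runs, one million items each).
t_plain = 5.068971888250076    # list(Range(1000000))
t_hint = 4.7962311912107225    # list(RangeWithHint(1000000))

gain = relative_gain(t_plain, t_hint)
print(f"{gain:.1f}%")
```

For that pure-Python case the gain comes out at roughly 5%, well below the up to 50% observed for the bare Cython iterator.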



Source: https://stackoverflow.com/questions/41582041/can-i-speedup-an-iterable-class-when-i-know-its-length-in-advance
