问题
PEP 424 mentions in the "Rationale" that:
Being able to pre-allocate lists based on the expected size, as estimated by
__length_hint__
, can be a significant optimization. CPython has been observed to run some code faster than PyPy, purely because of this optimization being present.
So I asked myself the question that I'm now asking here: Is it possible to speed up some iterable class processing an iterator (when it's possible to correctly predict it's "length") based on this knowledge?
回答1:
Setting aside the generator/iterator terminology confusion, the __length_hint__
method is a really minor optimization I would only use in special circumstances. I wrote my own simple little test:
class Range:
def __init__(self, n):
self._n = n
self._i = 0
def __iter__(self):
return self
def __next__(self):
i = self._i
if i >= self._n:
raise StopIteration
self._i += 1
return i
class RangeWithHint(Range):
def __length_hint__(self):
return self._n
If this is used to generate a list of values, the advantage of preallocating the list only becomes measurable with really large lists of about a million elements, and even then is very small:
Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> timeit("xs = list(Range(1000000))", "from __main__ import Range", number=10)
5.068971888250076
>>> timeit("xs = list(RangeWithHint(1000000))", "from __main__ import RangeWithHint", number=10)
4.7962311912107225
Takeaway: Python is already really, really fast at reallocating lists as they grow. Don't assume that __length_hint__
is going to vastly improve that speed.
回答2:
My conclusions about doing two experiments (one after receiving feedback of @TerryJanReedy):
There can be a significant (up to 50%) optimization in simple cases with long iterables but in absolute performance it's negligable as soon as some more complicated operations are performed with or on the item or the iterable is very short.
Setup
I implemented one class that just iterates over some iterator and one more map
-like that applies a function to each item. Both classes come in two variants, one without implementing a __length_hint__
and one with it.
I choose Cython
to remove as much Python overhead as possible:
from operator import length_hint
cdef class MyIter(object):
cdef object it
def __init__(self, iterable):
self.it = iter(iterable)
def __iter__(self):
return self
def __next__(self):
return next(self.it)
cdef class MyIter2(object):
cdef object it
def __init__(self, iterable):
self.it = iter(iterable)
def __iter__(self):
return self
def __next__(self):
return next(self.it)
# --- This method is new ---
def __length_hint__(self):
return length_hint(self.it)
# Map-like classes
cdef class MyMap(object):
cdef object func
cdef object it
def __init__(self, func, iterable):
self.it = iter(iterable)
self.func = func
def __iter__(self):
return self
def __next__(self):
return self.func(next(self.it))
cdef class MyMap2(object):
cdef object func
cdef object it
def __init__(self, func, iterable):
self.it = iter(iterable)
self.func = func
def __iter__(self):
return self
def __next__(self):
return self.func(next(self.it))
# --- This method is new ---
def __length_hint__(self):
return length_hint(self.it)
Timings
I did the timing using Python 3.5 with Ipythons %timeit
command:
import random
lengths1 = []
timing1 = []
timing2 = []
lengths2 = []
timing3 = []
timing4 = []
for _ in range(30):
i = random.randint(1, 1000000)
lengths1.append(i)
lst = list(range(i))
res1 = %timeit -o list(MyIter(lst))
timing1.append(res1)
res2 = %timeit -o list(MyIter2(lst))
timing2.append(res2)
i = random.randint(1, 100000) # factor 10 less items
lengths2.append(i)
lst = list(range(i))
res3 = %timeit -o list(MyMap(float, lst))
timing3.append(res3)
res4 = %timeit -o list(MyMap2(float, lst))
timing4.append(res4)
The results of the time difference (timing1 - timing2
) and relative time difference (100 * (timing1 - timing2) / timing1
):
MyIter
This shows a significant optimization (up to 50%) for long iterables.
MyMap
So the one with the __length_hint__
is sometimes faster but not what I would call significant.
来源:https://stackoverflow.com/questions/41582041/can-i-speedup-an-iterable-class-when-i-know-its-length-in-advance