Question
Just came across this awesome __length_hint__() method for iterators from PEP 424 (https://www.python.org/dev/peps/pep-0424/). Wow! A way to get the iterator length without exhausting the iterator.
My questions:
- Is there a simple explanation how does this magic work? I'm just curious.
- Are there limitations and cases where it wouldn't work? ("hint" just sounds a bit suspicious).
- Is there a way to get the hint for zips and generators as well? Or is it something fundamental only to iterators?
Edit: BTW, I see that __length_hint__() counts from the current position to the end, i.e. a partially consumed iterator will report the remaining length. Interesting.
Answer 1:
Wow! A way to get the iterator length without exhausting the iterator.
No. It's a way to get a vague hint about what the length might be. There is no requirement that it be in any way accurate.
Is there a simple explanation how does this magic work?
The iterator implements a __length_hint__ method that uses some sort of iterator-specific information to make a guess about how many elements it will output. This guess could be pretty decent, or it could suck horribly. For example, a list iterator knows where it is in the list and how long the list is, so it can report how many elements are left in the list.
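To make the list-iterator case concrete, here is a small sketch using operator.length_hint(), the standard-library function that PEP 424 added. The variable names are made up for the illustration.

```python
import operator

it = iter([10, 20, 30, 40])

# A fresh list iterator reports the full length of the list.
before = operator.length_hint(it)

# Consume two items; the hint now covers only what is left.
next(it)
next(it)
after = operator.length_hint(it)

print(before, after)
```

The second value is smaller because the list iterator tracks its own position, which is exactly the iterator-specific information described above.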
Are there limitations and cases where it wouldn't work?
If the iterator doesn't have enough information to guess when it will run out, it can't implement a useful __length_hint__. This is why generators don't have one, for example. Infinite iterators also can't implement a useful __length_hint__, as there is no way to signal an infinite length.
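A short check of the generator case, as a sketch: the generator function numbers() is a made-up example, and operator.length_hint() simply hands back whatever default the caller passes when the object offers no estimate.

```python
import operator

def numbers():
    # A generator function: Python cannot tell how many values
    # this will yield without actually running it.
    yield 1
    yield 2
    yield 3

gen = numbers()

has_hint = hasattr(gen, "__length_hint__")
default_hint = operator.length_hint(gen)       # no estimate, so the default 0
custom_hint = operator.length_hint(gen, 100)   # caller-supplied default instead

print(has_hint, default_hint, custom_hint)
```

Asking for the hint does not consume anything, so the generator still yields all three values afterwards.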
Is there a way to get the hint for zips and generators as well? Or is it something fundamental only to iterators?
zip instances and generators are both kinds of iterators. Neither zip nor the generator type provides a __length_hint__ method, though.
Answer 2:
The purpose of this is basically just to facilitate more performant allocation of memory in Cython/C code. For example, imagine that a Cython module exposes a function that takes an iterable of custom MyNetworkConnection() objects and, internally, needs to create and allocate memory for data structures to represent them in the Cython/C code. If we can get a rough estimate of the number of items in the iterator, we can allocate a large enough slab of memory in one operation to accommodate all of them with minimal resizing.
If __len__() is implemented, we know the exact length and can use that for memory allocation. But often we won't actually know the exact length, so the estimate helps us improve performance by giving us a "ballpark figure".
It's surely useful in pure-Python code as well, for example for a user-facing completion time estimate for an operation.
For question 2, well, it's a hint, so you can't rely on it to be exact. You must still account for allocating new memory if the hint is too low, or cleaning up if the hint is too high. I'm not personally aware of other limitations or potential problems.
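The "allocate from the hint, then correct" idea can be sketched in pure Python. This is only an illustration of the pattern, not how any particular C or Cython extension does it; the function name collect and its default of 8 are assumptions made up for the example.

```python
import operator

def collect(iterable, default=8):
    """Copy items into a pre-sized buffer, treating the hint as a guess only."""
    hint = operator.length_hint(iterable, default)
    buffer = [None] * hint          # one up-front allocation
    count = 0
    for item in iterable:
        if count < len(buffer):
            buffer[count] = item    # fits in the pre-allocated space
        else:
            buffer.append(item)     # hint was too low: grow
        count += 1
    del buffer[count:]              # hint was too high: trim the unused tail
    return buffer

print(collect(iter(range(5))))           # hint is exact
print(collect(x for x in range(20)))     # no hint, default 8 is too low
print(collect(x for x in range(3)))      # no hint, default 8 is too high
```

The result is correct in all three cases; the hint only changes how much resizing happens along the way.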
For question 3, I see no reason why it wouldn't work for Generators, since a Generator is an Iterator:
>>> import collections.abc
>>> def my_generator(): yield
...
>>> gen = my_generator()
>>> isinstance(gen, collections.abc.Iterator)
True
Answer 3:
Is there a way to get the hint for zips and generators as well? Or is it something fundamental only to iterators?
In the case of generators, I don't think there is an easy or automatic way of doing it: if you give me an arbitrary generator and I don't know how it was made, how can I determine whether it is finite or not? I would need to look at the code, and if it uses some other functions, I would need to look at those functions and how they are called, and so on. It gets messy pretty quickly, so for an automatic approach the effort needed looks much greater than the reward.
In the case of zip, I don't know why it doesn't have it. It looks easy enough to check the hint of each element and return the minimum among them. Perhaps they didn't add it because you can pass generators to it and there is no way to get a hint from them?
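The "minimum of the hints" idea could look roughly like this. HintedZip is a hypothetical wrapper written for this illustration, not part of the standard library, and it deliberately declines to guess (by returning NotImplemented) as soon as one input, such as a generator, has no __length_hint__.

```python
import operator

class HintedZip:
    """Hypothetical zip-like wrapper that reports the smallest hint of its inputs."""

    def __init__(self, *iterables):
        self._iterators = [iter(it) for it in iterables]

    def __iter__(self):
        return self

    def __next__(self):
        if not self._iterators:
            raise StopIteration
        items = []
        for it in self._iterators:
            items.append(next(it))   # StopIteration from any input ends the zip
        return tuple(items)

    def __length_hint__(self):
        hints = []
        for it in self._iterators:
            if not hasattr(type(it), "__length_hint__"):
                # e.g. a generator: no estimate available, so offer none either
                return NotImplemented
            hints.append(operator.length_hint(it))
        return min(hints, default=0)


pairs = HintedZip([1, 2, 3], [4, 5])
hint_pairs = operator.length_hint(pairs)

with_gen = HintedZip([1, 2, 3], (x for x in range(2)))
hint_with_gen = operator.length_hint(with_gen, 7)   # falls back to the default

print(hint_pairs, hint_with_gen)
```

This also shows the problem raised above: with a generator among the inputs, the wrapper has nothing to report and the caller's default is used.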
So it may be something that fits better in hand-written iterators, because they are built on the iterator protocol:
class MyIterator:
    def __iter__(self):
        return self
    def __next__(self):
        ...
        if condition_for_more_values:
            ...
            return next_value
        else:
            raise StopIteration
so it is easier here to add the logic for the __length_hint__ method when it makes sense, and that is why the built-in containers (list, tuple, str, set, etc.) offer it through their iterators, which are made something like this
class Container:
    ...
    def __len__(self):
        ...
    def __iter__(self):
        # Each call hands out a fresh iterator over this container.
        return Container_Iterator(self)
class Container_Iterator:
    def __init__(self, con):
        self.i = 0
        self.data = con
    def __iter__(self):
        return self
    def __next__(self):
        if self.i < len(self.data):
            self.i += 1
            return self.data[self.i - 1]
        else:
            raise StopIteration
As the Container_Iterator has access to all the relevant info of the Container, it knows where it is at each moment, so it can give a meaningful hint, and it can be as simple as
    def __length_hint__(self):
        return len(self.data) - self.i
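Putting the pieces together, a runnable version of the sketch might look like this. The list-backed storage, the __getitem__ method and the class name ContainerIterator are assumptions filled in so the example can run; the answer leaves those parts open.

```python
import operator

class Container:
    def __init__(self, items):
        self._items = list(items)   # assumed storage, for illustration only

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]

    def __iter__(self):
        return ContainerIterator(self)


class ContainerIterator:
    def __init__(self, con):
        self.i = 0
        self.data = con

    def __iter__(self):
        return self

    def __next__(self):
        if self.i < len(self.data):
            self.i += 1
            return self.data[self.i - 1]
        raise StopIteration

    def __length_hint__(self):
        # Remaining items = total length minus the current position.
        return len(self.data) - self.i


it = iter(Container("abc"))
hint_start = operator.length_hint(it)
first = next(it)
hint_after_one = operator.length_hint(it)

print(hint_start, first, hint_after_one)
```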
Answer 4:
There are several answers to the question, but they are slightly missing the point: __length_hint__ is not magic. It is a protocol. If an object does not implement the protocol, that's it.
Let's take a detour and look at a + b, as it is a simple example. The + operator relies on a.__add__ and b.__radd__ to actually do something. int implements __add__ to mean arithmetic addition (1 + 2 == 3), while list implements __add__ to mean content concatenation ([1] + [2] == [1, 2]). This is because __add__ is just a protocol, to which objects must adhere if they provide it. The definition for __add__ is basically just "take another operand and return an object".
There is no separate, universal meaning to +. If operands do not provide __add__ or __radd__, there is nothing Python can do about it.
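A tiny illustration of the + protocol, with two made-up classes: Meters opts in by defining __add__ and __radd__, while Opaque defines neither.

```python
class Meters:
    """Hypothetical value type, used only to illustrate the + protocol."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # Called for `self + other`.
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        if isinstance(other, (int, float)):
            return Meters(self.value + other)
        return NotImplemented

    def __radd__(self, other):
        # Called for `other + self` when `other` cannot handle a Meters.
        return self.__add__(other)


class Opaque:
    """Implements neither __add__ nor __radd__."""


total = Meters(2) + Meters(3)   # uses Meters.__add__
reflected = 1 + Meters(4)       # int.__add__ gives up, so Meters.__radd__ runs

try:
    Opaque() + Opaque()
    failed = False
except TypeError:
    failed = True               # no protocol, nothing Python can do

print(total.value, reflected.value, failed)
```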
Coming back to the actual question(s), what does this imply?
Is there a simple explanation how does this magic work? I'm just curious.
All the magic is listed in PEP 424 but it is basically: try len(obj), fall back to obj.__length_hint__, use the default. That is all the magic.
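Those three steps can be written out in plain Python. The function below is a sketch modelled on the description in PEP 424; the name length_hint_sketch is made up, and the real operator.length_hint() should be preferred in actual code.

```python
import operator

def length_hint_sketch(obj, default=0):
    """Rough pure-Python equivalent of operator.length_hint()."""
    # 1. An exact length wins if the object has one.
    try:
        return len(obj)
    except TypeError:
        pass
    # 2. Otherwise look for __length_hint__ on the type.
    try:
        get_hint = type(obj).__length_hint__
    except AttributeError:
        return default
    try:
        hint = get_hint(obj)
    except TypeError:
        return default
    # 3. The object may decline to guess; then the default is used.
    if hint is NotImplemented:
        return default
    if not isinstance(hint, int):
        raise TypeError("__length_hint__ must return an integer")
    if hint < 0:
        raise ValueError("__length_hint__() should return >= 0")
    return hint

samples = [[1, 2, 3], iter([1, 2, 3]), iter(range(10)), (x for x in "ab")]
for obj in samples:
    # Both functions should agree on every sample.
    print(length_hint_sketch(obj, 5), operator.length_hint(obj, 5))
```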
In practice, an object has to implement __length_hint__ depending on what it knows about itself. For example, take the range_iterator of the range backport (or the Py3.6 C code):
return self._stop - self._current
Here, the iterator knows how long it is at most, and how much it has provided. If it didn't keep track of the latter, it could still return how long it is at most. Either way, it must use internal knowledge about itself.
Are there limitations and cases where it wouldn't work? ("hint" just sounds a bit suspicious).
Obviously, objects that don't implement __length_hint__ or __len__ don't work. Fundamentally, any object that does not have enough knowledge about its state cannot implement it.
Chained generators usually do not implement it. For example, (a ** 2 for a in range(5)) will not forward the length hint from range. This is sensible if you consider that there may be an arbitrary chain of iterators: length_hint is only an optimization for pre-allocating space, and it may be faster to just fetch the content to put into that space.
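If forwarding the hint matters for a one-to-one transformation, it can be done by hand. HintedMap below is a hypothetical wrapper written for this illustration; it works only because it produces exactly one output per input, which a general generator cannot promise.

```python
import operator

class HintedMap:
    """Hypothetical one-to-one map wrapper that forwards its source's hint."""

    def __init__(self, func, iterable):
        self._func = func
        self._source = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        return self._func(next(self._source))

    def __length_hint__(self):
        # One output per input, so the source's estimate applies unchanged.
        return operator.length_hint(self._source)


def square(a):
    return a ** 2

plain = (a ** 2 for a in range(5))        # generator expression: no hint
wrapped = HintedMap(square, range(5))     # forwards the range iterator's hint

plain_hint = operator.length_hint(plain)
wrapped_hint = operator.length_hint(wrapped)

print(plain_hint, wrapped_hint)
```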
In other cases, it may be plain impossible. Infinite and random iterators fall into this category, but also iterators over external resources.
Is there a way to get the hint for zips and generators as well? Or is it something fundamental only to iterators?
If an object does not implement __length_hint__, then no. Zip and generators don't, probably for the efficiency reasons above.
Also note that zip and generator objects are their own iterators.
foo = zip([1,2,3], [1,2,3])
id(foo) == id(iter(foo)) # returns True in py3.5
Source: https://stackoverflow.com/questions/38385360/pep-424-length-hint-is-there-a-way-to-do-the-same-for-generators-or-zips