What is the 2nd argument for the iter function in Python?

Submitted by 帅比萌擦擦* on 2019-12-05 06:54:05

Question


Let's consider a file:

$ echo -e """This is a foo bar sentence .\nAnd this is the first txtfile in the corpus .""" > test.txt
$ cat test.txt 
This is a foo bar sentence .
And this is the first txtfile in the corpus .

When I want to read the file character by character, I can do this (https://stackoverflow.com/a/25071590/610569):

>>> fin = open('test.txt')
>>> while fin.read(1):
...     fin.seek(-1,1)
...     print fin.read(1),
... 
T h i s   i s   a   f o o   b a r   s e n t e n c e   . 
A n d   t h i s   i s   t h e   f i r s t   t x t f i l e   i n   t h e   c o r p u s   .

But using a while loop looks a little unpythonic, especially when I use fin.read(1) to check for EOF and then have to backtrack in order to read the current byte. So instead I can do something like this (from "How to read a single character at a time from a file in Python?"):

>>> import functools
>>> fin = open('test.txt')
>>> fin_1byte = iter(functools.partial(fin.read, 1), '')
>>> for c in fin_1byte:
...     print c,
... 
T h i s   i s   a   f o o   b a r   s e n t e n c e   . 
A n d   t h i s   i s   t h e   f i r s t   t x t f i l e   i n   t h e   c o r p u s   .
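For readers on Python 3, where print is a function, the same sentinel-based loop might look like the sketch below. It assumes io.StringIO as a stand-in for open('test.txt') so that it runs without a file on disk; the text is the same two-line sample as above.

```python
import functools
import io

text = "This is a foo bar sentence .\nAnd this is the first txtfile in the corpus .\n"

# io.StringIO stands in for open('test.txt') so the sketch is self-contained.
fin = io.StringIO(text)

# iter(callable, sentinel): keep calling fin.read(1) until it returns '' (end of file).
chars = list(iter(functools.partial(fin.read, 1), ''))

# Python 3 counterpart of the Python 2 `print c,` loop above.
print(' '.join(chars))
```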

But when I try it without the second argument, it raises a TypeError:

>>> fin = open('test.txt')
>>> fin_1byte = functools.partial(fin.read, 1)
>>> for c in iter(fin_1byte):
...     print c,
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'functools.partial' object is not iterable

What is the 2nd argument in iter? The docs don't say much either: https://docs.python.org/2/library/functions.html#iter and https://docs.python.org/3.6/library/functions.html#iter


As per the doc:

Return an iterator object. The first argument is interpreted very differently depending on the presence of the second argument. Without a second argument, object must be a collection object which supports the iteration protocol (the __iter__() method), or it must support the sequence protocol (the __getitem__() method with integer arguments starting at 0). If it does not support either of those protocols, TypeError is raised. If the second argument, sentinel, is given, then object must be a callable object. The iterator created in this case will call object with no arguments for each call to its next() method; if the value returned is equal to sentinel, StopIteration will be raised, otherwise the value will be returned.

I guess the docs require some "decrypting":

  • Without a second argument, object must be a collection object which supports the iteration protocol (the __iter__() method)

Does that mean it needs to come from collections? Or is it enough that the object has an __iter__() method?

  • , or it must support the sequence protocol (the __getitem__() method with integer arguments starting at 0)

That's rather cryptic. Does that mean it checks whether the object is indexed, and hence queryable, and that the index must start from 0? Does it also mean that the indices need to be sequential, i.e. 0, 1, 2, 3, ... and not something like 0, 2, 8, 13, ...?

  • If it does not support either of those protocols, TypeError is raised.

Yes, this part, I do understand =)

  • If the second argument, sentinel, is given, then object must be a callable object.

Okay, now this gets a little sci-fi. Is "sentinel" just Python terminology? What does sentinel mean, Pythonically? And does "callable object" mean something like a function, as opposed to a type object?

  • The iterator created in this case will call object with no arguments for each call to its next() method;

I don't really get this part; maybe an example would help.

  • if the value returned is equal to sentinel, StopIteration will be raised, otherwise the value will be returned.

Okay, so sentinel here refers to some stopping criterion?

Can someone help to decipher/clarify the meaning of the above points about iter?


Answer 1:


With one argument, iter must be given an object that has the __iter__ special method or the __getitem__ special method. If neither of them exists, iter raises a TypeError:

>>> iter(None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'NoneType' object is not iterable
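On the "does it need to come from collections?" part of the question: no, any object with __iter__ or __getitem__ works. The Python docs for collections.abc.Iterable note that an isinstance check against that ABC does not detect classes that iterate only via __getitem__, and that calling iter(obj) is the reliable test. A minimal Python 3 sketch; is_iterable and OnlyGetitem are hypothetical names used only for illustration:

```python
from collections.abc import Iterable


def is_iterable(obj):
    """Hypothetical helper: True if iter(obj) succeeds."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


class OnlyGetitem:
    # Sequence protocol only: there is no __iter__ method.
    def __getitem__(self, i):
        if i >= 3:
            raise IndexError
        return i


print(is_iterable(None))                      # False
print(is_iterable(OnlyGetitem()))             # True
print(isinstance(OnlyGetitem(), Iterable))    # False: the ABC only looks for __iter__
print(isinstance([1, 2], Iterable))           # True
```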

There are two protocols for iteration. The old protocol calls __getitem__ with successive integers starting at 0 until a call raises IndexError. The new protocol uses the iterator object returned by __iter__.
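This also answers whether the indices must be sequential: with the old protocol, iteration asks for 0, 1, 2, ... in order and stops at the first index that raises IndexError. The sketch below (Python 3; Sparse is a hypothetical class) holds values only at 0, 2 and 8 and records which indices were requested. Note that only IndexError ends the iteration; a KeyError from a dict lookup would propagate instead, which is why the class converts missing keys to IndexError.

```python
class Sparse:
    """Hypothetical class: only indices 0, 2 and 8 hold values."""

    def __init__(self):
        self.data = {0: 'a', 2: 'c', 8: 'i'}
        self.requested = []  # every index the iteration asks for

    def __getitem__(self, i):
        self.requested.append(i)
        if i not in self.data:
            # The sequence protocol stops only on IndexError.
            raise IndexError(i)
        return self.data[i]


s = Sparse()
print(list(s))        # ['a']: stops at the first missing index, 1
print(s.requested)    # [0, 1]
```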

In Python 2, str doesn't even have the __iter__ special method:

Python 2.7.12+ (default, Sep 17 2016, 12:08:02) 
[GCC 6.2.0 20160914] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 'abc'.__iter__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute '__iter__'

yet it is still iterable:

>>> iter('abc')
<iterator object at 0x7fcee9e89390>

To make your custom class iterable, you need to have either __iter__ or __getitem__ that raises IndexError for non-existent items:

class Foo:
    def __iter__(self):
        return iter(range(5))

class Bar:
    def __getitem__(self, i):
        if i >= 5:
            raise IndexError
        return i

Using these:

>>> list(iter(Foo()))
[0, 1, 2, 3, 4]
>>> list(iter(Bar()))
[0, 1, 2, 3, 4]
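Foo above delegates to the iterator of a range. A class can also be its own iterator by returning self from __iter__ and implementing __next__ (the Python 3 name; Python 2 calls it next). A minimal sketch, with CountUpTo as a hypothetical name:

```python
class CountUpTo:
    """Hypothetical iterator yielding 0 .. n-1 via the new protocol."""

    def __init__(self, n):
        self.n = n
        self.i = 0

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        if self.i >= self.n:
            raise StopIteration
        value = self.i
        self.i += 1
        return value


it = CountUpTo(5)
print(list(it))   # [0, 1, 2, 3, 4]
print(list(it))   # []: an iterator is exhausted after one pass
```

Unlike Foo, which hands out a fresh iterator on every iter() call, CountUpTo can only be traversed once.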

Usually an explicit iter call is not needed, because for loops and functions that expect iterables create an iterator implicitly:

>>> list(Foo())
[0, 1, 2, 3, 4]
>>> for i in Bar():
...     print(i)
...
0
1
2
3
4
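To make the implicit iterator creation concrete, here is a rough sketch of what a for loop does under the hood. for_each is a hypothetical helper name, and Bar is repeated from above so the block runs on its own:

```python
def for_each(iterable, action):
    """Hypothetical helper: roughly `for x in iterable: action(x)`."""
    it = iter(iterable)          # the implicit iter() call
    while True:
        try:
            x = next(it)         # ask the iterator for the next value
        except StopIteration:    # exhausted: leave the loop
            break
        action(x)


class Bar:
    def __getitem__(self, i):
        if i >= 5:
            raise IndexError
        return i


seen = []
for_each(Bar(), seen.append)
print(seen)   # [0, 1, 2, 3, 4]
```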

With the two-argument form, the first argument must be a function or an object that implements __call__. It is called without arguments, and its return values are yielded from the iterator. Iteration stops when the value returned by the call equals the given sentinel value, as if by:

while True:
    value = func()
    if value == sentinel:
        return
    yield value

For example, to collect die throws until we throw a 6:

>>> import random
>>> throw = lambda: random.randint(1, 6)
>>> list(iter(throw, 6))
[3, 2, 4, 5, 5]
>>> list(iter(throw, 6))
[1, 3, 1, 3, 5, 1, 4]

To clarify it further, the given function (or the given object with the __call__ special method) is called without arguments each time next() is called on the iterator:

>>> def throw_die():
...     die = random.randint(1, 6)
...     print("returning {}".format(die))
...     return die
...
>>> throws = iter(throw_die, 6)
>>> next(throws)
returning 2
2
>>> next(throws)
returning 4
4
>>> next(throws)
returning 6
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

(i.e. throw_die is called as throw_die(), and if the returned value is not equal to 6, it is yielded).
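The examples above use functions; the first argument can just as well be an instance of a class with __call__. A deterministic sketch, with Countdown as a hypothetical class:

```python
class Countdown:
    """Hypothetical callable object: each call returns the next lower number."""

    def __init__(self, start):
        self.current = start

    def __call__(self):
        value = self.current
        self.current -= 1
        return value


# iter(callable, sentinel): call the Countdown instance until it returns 0.
print(list(iter(Countdown(3), 0)))   # [3, 2, 1]
```

The sentinel value itself (0 here) is never yielded; the call that returns it simply ends the iteration.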

Or in the case of

>>> fin_1byte = iter(functools.partial(fin.read, 1), '')
>>> for c in fin_1byte:
...     print c,

reading from a file at end-of-file returns the empty string (or empty bytes for a binary file), which is why '' works as the sentinel:

>>> from io import StringIO
>>> fin = StringIO(u'ab')
>>> fin.read(1)
u'a'
>>> fin.read(1)
u'b'
>>> fin.read(1)
u''

If not yet at end of file, read(1) returns one character, so the iteration continues.
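The same idiom is commonly used to read binary data in fixed-size chunks. A sketch assuming io.BytesIO as a stand-in for a file opened with 'rb'. In Python 3, b'' == '' is False, so a binary file needs b'' as the sentinel; with '' the iterator would never stop.

```python
import functools
import io

# io.BytesIO stands in for open(path, 'rb') so the sketch is self-contained.
fin = io.BytesIO(b'abcdefg')

# Binary reads return b'' at end of file, so b'' (not '') is the sentinel.
chunks = list(iter(functools.partial(fin.read, 3), b''))
print(chunks)   # [b'abc', b'def', b'g']
```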

This can also be used to make an endless iterator from repeated function calls:

>>> dice = iter(throw, 7)

Since the value returned can never be equal to 7, the iterator runs forever. A common idiom is to use an anonymous object() as the sentinel, so that the comparison cannot succeed for any ordinary return value:

>>> dice = iter(throw, object())

This works because two separate object() instances never compare equal:

>>> object() != object()
True
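Because such an iterator never ends, calling list() on it would never return; it has to be consumed lazily. A sketch using itertools.islice to take only the first few values, with itertools.count standing in for a function like throw so that the output is deterministic:

```python
import itertools

counter = itertools.count()

# A fresh object() equals only itself, so this iterator never stops;
# islice takes just the first five values.
endless = iter(lambda: next(counter), object())
print(list(itertools.islice(endless, 5)))   # [0, 1, 2, 3, 4]
```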

Note that the word sentinel is commonly used for a value that serves as an end marker in some data and does not occur naturally within that data, as in this Java answer.
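That data-marker meaning maps directly onto iter's second argument. A sketch assuming a hypothetical record stream in which the string 'END' marks the end of the records:

```python
# Hypothetical data stream in which 'END' marks the end of the records.
records = iter(['alpha', 'beta', 'END', 'ignored'])

# Call records.__next__ until it returns the end marker 'END'.
before_marker = list(iter(records.__next__, 'END'))
print(before_marker)    # ['alpha', 'beta']
print(next(records))    # ignored  (items after the marker are still unread)
```

The marker itself is consumed but not yielded, and anything after it is left in the underlying iterator.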



Source: https://stackoverflow.com/questions/40297321/what-is-the-2nd-argument-for-the-iter-function-in-python
