Question
Let's consider a file:
$ echo -e """This is a foo bar sentence .\nAnd this is the first txtfile in the corpus .""" > test.txt
$ cat test.txt
This is a foo bar sentence .
And this is the first txtfile in the corpus .
To read the file one character at a time, I can do this (https://stackoverflow.com/a/25071590/610569):
>>> fin = open('test.txt')
>>> while fin.read(1):
...     fin.seek(-1,1)
...     print fin.read(1),
...
T h i s i s a f o o b a r s e n t e n c e .
A n d t h i s i s t h e f i r s t t x t f i l e i n t h e c o r p u s .
But using a while loop looks a little unpythonic, especially when I use fin.read(1) to check for EOF and then backtrack in order to read the current byte. So instead I can do something like this (from How to read a single character at a time from a file in Python?):
>>> import functools
>>> fin = open('test.txt')
>>> fin_1byte = iter(functools.partial(fin.read, 1), '')
>>> for c in fin_1byte:
...     print c,
...
T h i s i s a f o o b a r s e n t e n c e .
A n d t h i s i s t h e f i r s t t x t f i l e i n t h e c o r p u s .
But when I try it without the second argument, it throws a TypeError:
>>> fin = open('test.txt')
>>> fin_1byte = functools.partial(fin.read, 1)
>>> for c in iter(fin_1byte):
...     print c,
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'functools.partial' object is not iterable
What is the second argument of iter? The docs don't say much either: https://docs.python.org/2/library/functions.html#iter and https://docs.python.org/3.6/library/functions.html#iter
As per the doc:
Return an iterator object. The first argument is interpreted very differently depending on the presence of the second argument. Without a second argument, object must be a collection object which supports the iteration protocol (the __iter__() method), or it must support the sequence protocol (the __getitem__() method with integer arguments starting at 0). If it does not support either of those protocols, TypeError is raised. If the second argument, sentinel, is given, then object must be a callable object. The iterator created in this case will call object with no arguments for each call to its next() method; if the value returned is equal to sentinel, StopIteration will be raised, otherwise the value will be returned.
I guess the docs require some "decrypting":
- Without a second argument, object must be a collection object which supports the iteration protocol (the __iter__() method)
Does that mean it needs to come from collections? Or is it okay as long as the object has an __iter__() method?
- , or it must support the sequence protocol (the __getitem__() method with integer arguments starting at 0)
That's rather cryptic. Does that mean it checks whether the object is indexable (and hence queryable) and that the index must start from 0? Does it also mean that the indices need to be sequential, i.e. 0, 1, 2, 3, ... and not something like 0, 2, 8, 13, ...?
- If it does not support either of those protocols, TypeError is raised.
Yes, this part I do understand =)
- If the second argument, sentinel, is given, then object must be a callable object.
Okay, now this gets a little sci-fi. Is it just Python terminology to call something a sentinel? What does sentinel mean Pythonically? And does "callable object" mean something like a function, as opposed to a type object?
- The iterator created in this case will call object with no arguments for each call to its next() method;
I don't really get this part; maybe an example would help.
- if the value returned is equal to sentinel, StopIteration will be raised, otherwise the value will be returned.
Okay, so sentinel here refers to some stopping criterion?
Can someone help decipher/clarify the meaning of the above points about iter?
Answer 1:
With one argument, iter must be given an object that has the __iter__ special method or the __getitem__ special method. If neither of them exists, iter will raise an error:
>>> iter(None)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'NoneType' object is not iterable
There are two protocols for iteration. The old protocol relies on calling __getitem__ with successive integers starting from 0 until a call raises IndexError. The new protocol relies on the iterator that is returned from __iter__.
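To make the old protocol concrete, and to answer the question about indices like 0, 2, 8, 13, here is a minimal Python 3 sketch using a hypothetical class Sparse whose __getitem__ has a value only at indices 0 and 2 and records every index it is asked for. Iteration asks for 0, 1, 2, ... in order and stops at the first IndexError, so the value at index 2 is never reached.

```python
class Sparse:
    """Hypothetical class: only indices 0 and 2 hold values."""

    def __init__(self):
        self.requested = []  # every index that iteration asks for

    def __getitem__(self, i):
        self.requested.append(i)
        if i in (0, 2):
            return "item %d" % i
        raise IndexError(i)  # index 1 (and anything else) is "missing"


s = Sparse()
print(list(s))       # ['item 0']
print(s.requested)   # [0, 1]
```

So the indices are effectively required to be contiguous from 0: the first gap ends the iteration.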
In Python 2, str doesn't even have the __iter__ special method:
Python 2.7.12+ (default, Sep 17 2016, 12:08:02)
[GCC 6.2.0 20160914] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 'abc'.__iter__
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute '__iter__'
yet it is still iterable:
>>> iter('abc')
<iterator object at 0x7fcee9e89390>
To make your custom class iterable, you need to have either __iter__ or a __getitem__ that raises IndexError for non-existent items:
class Foo:
    def __iter__(self):
        return iter(range(5))

class Bar:
    def __getitem__(self, i):
        if i >= 5:
            raise IndexError
        return i
Using these:
>>> list(iter(Foo()))
[0, 1, 2, 3, 4]
>>> list(iter(Bar()))
[0, 1, 2, 3, 4]
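On the "does it need to come from collections" question: no, any object with __iter__ (or a suitable __getitem__) qualifies. A Python 3 sketch, where is_iterable is a hypothetical helper: collections.abc.Iterable only recognizes classes with __iter__, so it reports Bar as non-iterable even though iter() accepts it. Calling iter() and catching TypeError is the reliable test, and it also shows why the functools.partial object from the question fails.

```python
import functools
from collections.abc import Iterable


class Foo:
    def __iter__(self):
        return iter(range(5))


class Bar:
    def __getitem__(self, i):
        if i >= 5:
            raise IndexError
        return i


def is_iterable(obj):
    """Hypothetical helper: True if iter() accepts obj."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


print(isinstance(Foo(), Iterable), is_iterable(Foo()))  # True True
print(isinstance(Bar(), Iterable), is_iterable(Bar()))  # False True
# partial has neither __iter__ nor __getitem__, hence the TypeError in the question.
print(is_iterable(functools.partial(print, 1)))         # False
```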
Usually an explicit iter call is not needed, as for loops and functions that expect iterables will create an iterator implicitly:
>>> list(Foo())
[0, 1, 2, 3, 4]
>>> for i in Bar():
...     print(i)
...
0
1
2
3
4
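Roughly, a for loop calls iter() once and then next() repeatedly until StopIteration. A Python 3 sketch of that expansion, with manual_for as a hypothetical name that collects the items instead of running a loop body:

```python
class Bar:
    def __getitem__(self, i):
        if i >= 5:
            raise IndexError
        return i


def manual_for(iterable):
    """Hypothetical helper: roughly what `for x in iterable: results.append(x)` does."""
    results = []
    it = iter(iterable)          # the implicit iter() call
    while True:
        try:
            x = next(it)         # the implicit next() call on each pass
        except StopIteration:    # the iterator signals it is exhausted
            break
        results.append(x)
    return results


print(manual_for(Bar()))  # [0, 1, 2, 3, 4]
```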
With the 2-argument form, the first argument must be a function or an object that implements __call__. The first argument is called without arguments; the return values are yielded from the iterator. The iteration stops when the value returned from the function call on that iteration equals the given sentinel value, as if by:
value = func()
if value == sentinel:
    return
else:
    yield value
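Putting that step inside a loop gives a pure-Python approximation of iter(func, sentinel). This is a sketch, not CPython's implementation: iter_with_sentinel is a hypothetical name, and edge cases differ (for example, CPython also stops if func itself raises StopIteration, while inside this generator that would become a RuntimeError).

```python
def iter_with_sentinel(func, sentinel):
    """Hypothetical pure-Python approximation of the builtin iter(func, sentinel)."""
    if not callable(func):
        # Checked eagerly, like the builtin, rather than on the first next().
        raise TypeError("iter(v, w): v must be callable")

    def generate():
        while True:
            value = func()           # called with no arguments on every step
            if value == sentinel:    # sentinel reached: stop the iteration
                return
            yield value

    return generate()


numbers = [3, 1, 4, 0, 5]
print(list(iter_with_sentinel(iter(numbers).__next__, 0)))  # [3, 1, 4]
print(list(iter(iter(numbers).__next__, 0)))                # [3, 1, 4]
```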
For example, to get the values thrown on a die before we throw a 6:
>>> import random
>>> throw = lambda: random.randint(1, 6)
>>> list(iter(throw, 6))
[3, 2, 4, 5, 5]
>>> list(iter(throw, 6))
[1, 3, 1, 3, 5, 1, 4]
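The callable does not have to be a function or lambda. In this Python 3 sketch, Countdown is a hypothetical class whose instances define __call__; classes themselves are callable too, so a type object such as int also works (int() returns 0, which equals the sentinel immediately).

```python
class Countdown:
    """Hypothetical callable object: each call returns the next lower number."""

    def __init__(self, start):
        self.current = start

    def __call__(self):
        self.current -= 1
        return self.current


print(list(iter(Countdown(4), 0)))  # [3, 2, 1]
print(list(iter(int, 0)))           # []
```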
To clarify it further, the given function (or the given object with the __call__ special method) is called without arguments each time next() is used on the iterator:
>>> def throw_die():
... die = random.randint(1, 6)
... print("returning {}".format(die))
... return die
...
>>> throws = iter(throw_die, 6)
>>> next(throws)
returning 2
2
>>> next(throws)
returning 4
4
>>> next(throws)
returning 6
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
(i.e. throw_die is called as throw_die(), and if the returned value is not equal to 6, it is yielded.)
Or in the case of
>>> fin_1byte = iter(functools.partial(fin.read, 1), '')
>>> for c in fin_1byte:
...     print c,
...
reading from a file at end-of-file returns the empty string (or empty bytes if the file was opened in binary mode in Python 3), so '' works as the sentinel:
>>> from io import StringIO
>>> fin = StringIO(u'ab')
>>> fin.read(1)
u'a'
>>> fin.read(1)
u'b'
>>> fin.read(1)
u''
If not yet at the end of file, one character would be returned.
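The same idiom in Python 3, sketched with in-memory files (io.StringIO and io.BytesIO stand in for real files here). In binary mode read() returns bytes, and in Python 3 b'' != '', so the sentinel has to be b''; with '' the iterator would never stop.

```python
import functools
import io

text = io.StringIO("ab\ncd")
# Call text.read(1) until it returns '' (end of file).
chars = list(iter(functools.partial(text.read, 1), ''))
print(chars)   # ['a', 'b', '\n', 'c', 'd']

data = io.BytesIO(b"xyz")
# Binary reads return bytes, so the end-of-file sentinel is b''.
chunks = list(iter(functools.partial(data.read, 2), b''))
print(chunks)  # [b'xy', b'z']
```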
This can also be used to make an endless iterator from repeated function calls:
>>> dice = iter(throw, 7)
Since the value returned can never be equal to 7, the iterator runs forever. A common idiom is to use an anonymous object() as the sentinel to make sure that the comparison won't succeed for any conceivable value:
>>> dice = iter(throw, object())
Because
>>> object() != object()
True
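Because such an iterator never ends on its own, consuming it needs an explicit limit. A Python 3 sketch using itertools.islice to take the first ten throws:

```python
import itertools
import random

throw = lambda: random.randint(1, 6)
# A fresh object() compares equal only to itself, so throw() never matches it.
dice = iter(throw, object())
first_ten = list(itertools.islice(dice, 10))  # stop after 10 values
print(first_ten)
```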
Note that the word sentinel is commonly used for a value that serves as an end marker in some data and that does not occur naturally within the data, as in this Java answer.
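As a small illustration of a sentinel as an end marker in data, here is a Python 3 sketch where the hypothetical records list uses "END" to mark where the useful data stops; iter() reads until it sees that marker and leaves the rest unread.

```python
records = ["alice", "bob", "END", "carol"]
source = iter(records)

# "END" marks the end of the useful data; it never occurs as a real name.
names = list(iter(source.__next__, "END"))
print(names)         # ['alice', 'bob']
print(next(source))  # carol (still unread in the underlying iterator)
```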
Source: https://stackoverflow.com/questions/40297321/what-is-the-2nd-argument-for-the-iter-function-in-python