Why does len() not support iterators?

Question

Many of Python's built-in functions (any(), all(), and sum(), to name a few) take iterables, so why does len() not?

One could always use sum(1 for i in iterable) as an equivalent, but why is it that len() does not take iterables in the first place?

Answer 1:

Many iterables are defined by generator functions or generator expressions, which don't have a well-defined len. Take the following generator function, which iterates forever:

def sequence(i=0):
    while True:
        i+=1
        yield i

Basically, to have a well-defined length, you need to know the entire object up front. Contrast that with a function like sum. You don't need to know the entire object at once to sum it; just take one element at a time and add it to what you've already summed.
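A minimal sketch of that contrast, written for Python 3 and reusing the sequence generator from above. The helper name running_sum and the cutoff of five elements are illustrative assumptions, not part of the original answer.

```python
from itertools import islice

def sequence(i=0):
    # Infinite generator: yields i+1, i+2, ... forever.
    while True:
        i += 1
        yield i

def running_sum(iterable, limit):
    # Hypothetical helper: add up the first `limit` elements one at a time.
    # No length is ever needed, only the next element.
    total = 0
    for value in islice(iterable, limit):
        total += value
    return total

first_five_total = running_sum(sequence(), 5)  # 1 + 2 + 3 + 4 + 5

# A generator object defines no __len__, so len() rejects it with TypeError.
try:
    len(sequence())
    len_supported = True
except TypeError:
    len_supported = False
```

Summing works on a prefix of an endless stream, while len() fails immediately because there is nothing it could ask for a size.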

Be careful with idioms like sum(1 for i in iterable): often it will just exhaust the iterable so you can't use it anymore. It could also be slow to get the i'th element if there is a lot of computation involved. It might be worth asking yourself why you need to know the length a priori. This might give you some insight into what type of data structure to use (frequently list and tuple work just fine), or you may be able to perform your operation without needing to call len.
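The exhaustion problem can be seen in a short Python 3 sketch. The helper name count_items and the sample data are assumptions made for illustration.

```python
def count_items(iterable):
    # Hypothetical helper: counts elements by consuming the iterable.
    return sum(1 for _ in iterable)

squares = (n * n for n in range(4))   # a one-shot generator
count = count_items(squares)          # walks through all four elements
leftover = list(squares)              # the generator is now empty

squares_list = [n * n for n in range(4)]  # a list can be iterated again
list_count = count_items(squares_list)
still_there = list(squares_list)          # the elements are still available
```

Counting a generator leaves nothing behind, whereas counting a list leaves the data intact (and a list supports len directly anyway).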

Answer 2:

This is an iterable:

def forever():
    while True:
        yield 1

Yet it has no length. If you want to find the length of a finite iterable, the only way to do so, by definition of what an iterable is (something you can repeatedly ask for the next element until you reach the end), is to expand the iterable out fully, e.g.:

len(list(the_iterable))

As mgilson pointed out, you might want to ask yourself: why do you want to know the length of a particular iterable? Feel free to comment and I'll add a specific example.

If you want to keep track of how many elements you have processed, instead of doing:

num_elements = len(the_iterable)
for element in the_iterable:
    ...

do:

num_elements = 0
for element in the_iterable:
    num_elements += 1
    ...
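The same counting-while-processing idea can also be sketched with the built-in enumerate. This is a Python 3 sketch; process_all is a hypothetical name and the loop body is a placeholder.

```python
def process_all(the_iterable):
    # Returns how many elements were processed; works for one-shot iterators.
    num_elements = 0  # stays 0 if the iterable turns out to be empty
    for num_elements, element in enumerate(the_iterable, 1):
        pass  # process `element` here
    return num_elements
```

Starting enumerate at 1 means the loop variable holds the running count after each element, so no separate increment is needed.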

If you want a memory-efficient way of counting how many elements a comprehension produces, you might wish you could write the following, but it raises a TypeError because a generator object has no len():

num_relevant = len(x for x in xrange(100000) if x%14==0)

Building a list just to measure it works, but it is wasteful because you never need the whole list:

num_relevant = len([x for x in xrange(100000) if x%14==0])

sum would probably be the handiest way, but it looks quite weird and it isn't immediately clear what you're doing:

num_relevant = sum(1 for _ in (x for x in xrange(100000) if x%14==0))
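As a side note, the nested generator is not strictly required: the filter can sit directly in the generator expression passed to sum. A small sketch, assuming Python 3 (where range takes the place of xrange); the two variable names are illustrative.

```python
# Nested form, as in the answer: an inner generator feeds an outer one.
num_relevant_nested = sum(1 for _ in (x for x in range(100000) if x % 14 == 0))

# Flattened form: one generator expression that filters and counts.
num_relevant_flat = sum(1 for x in range(100000) if x % 14 == 0)
```

Both forms count lazily without building a list, and both give the same result.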

So, you should probably write your own function:

def exhaustive_len(iterable):
    length = 0
    for _ in iterable:
        length += 1
    return length

exhaustive_len(x for x in xrange(100000) if x%14==0)

The long name is to help remind you that it does consume the iterable. For example, this won't work as you might think:

def yield_numbers():
    yield 1; yield 2; yield 3; yield 5; yield 7

the_nums = yield_numbers()
total_nums = exhaustive_len(the_nums)  # consumes every element of the_nums
for num in the_nums:  # the_nums is already exhausted, so this loop never runs
    print(num)

because exhaustive_len has already consumed all the elements.
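If both the count and the elements are needed, one option (assuming the data fits in memory) is to store the elements in a list first. A minimal Python 3 sketch; the variable seen exists only to make the result easy to check.

```python
def yield_numbers():
    yield 1
    yield 2
    yield 3
    yield 5
    yield 7

the_nums = list(yield_numbers())  # materialize the generator once
total_nums = len(the_nums)        # a list knows its own length

seen = []
for num in the_nums:              # the list can still be iterated afterwards
    seen.append(num)
```

This trades memory for convenience, so it suits small or moderately sized inputs rather than large streams.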

EDIT: Ah, in that case (counting the lines in a file) you would use exhaustive_len(open("file.txt")), as you have to process all lines in the file one by one to see how many there are, and it would be wasteful to store the entire file in memory by calling list.
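A hedged sketch of that line-counting case in Python 3. The function name count_lines and the temporary demo file are assumptions for illustration; a with block is used so the file is closed once counting finishes.

```python
import os
import tempfile

def count_lines(path):
    # Iterating over a file object yields one line at a time,
    # so only a single line is held in memory at any moment.
    with open(path) as handle:
        return sum(1 for _ in handle)

# Demo on a throwaway file containing three lines.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    tmp_path = tmp.name

line_count = count_lines(tmp_path)
os.remove(tmp_path)  # clean up the demo file
```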

Source: https://stackoverflow.com/questions/11463086/why-does-len-not-support-iterators

Tags

python

iterable