How to have list() consume __iter__ without calling __len__?

Anonymous (unverified), submitted 2019-12-03 08:35:02

Question:

I have a class with both an __iter__ and a __len__ method. The latter uses the former to count all elements.

It works like the following:

    class A:
        def __iter__(self):
            print("iter")
            for _ in range(5):
                yield "something"

        def __len__(self):
            print("len")
            n = 0
            for _ in self:
                n += 1
            return n

Now if we take e.g. the length of an instance it prints len and iter, as expected:

    >>> len(A())
    len
    iter
    5

But if we call list() it calls both __iter__ and __len__:

    >>> list(A())
    len
    iter
    iter
    ['something', 'something', 'something', 'something', 'something']

It works as expected if we make a generator expression:

    >>> list(x for x in A())
    iter
    ['something', 'something', 'something', 'something', 'something']

I would assume list(A()) and list(x for x in A()) to work the same but they don’t.

Note that it appears to first call __iter__, then __len__, then loop over the iterator:

    class B:
        def __iter__(self):
            print("iter")

            def gen():
                print("gen")
                yield "something"

            return gen()

        def __len__(self):
            print("len")
            return 1

    print(list(B()))

Output:

    iter
    len
    gen
    ['something']

How can I get list() not to call __len__ so that my instance’s iterator is not consumed twice? I could define e.g. a length or size method and one would then call A().size() but that’s less pythonic.

I tried to compute the length in __iter__ and cache it so that subsequent calls to __len__ don’t need to iterate again, but list() calls __len__ before it starts iterating, so that doesn’t work.

Note that in my case I work on very large data collections so caching all items is not an option.

Answer 1:

It's a safe bet that the list() constructor is detecting that len() is available and calling it in order to pre-allocate storage for the list.
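To make that guess concrete: in CPython, this pre-sizing goes through the length-hint protocol, which the standard library exposes as operator.length_hint(). That function tries len() first, so a class that defines __len__ gets it called. The sketch below uses a hypothetical class Sized with a call counter in place of print. The behaviour of list() here is a CPython implementation detail, not a language guarantee.

```python
import operator


class Sized:
    """Hypothetical class that records how often __len__ is called."""

    def __init__(self):
        self.len_calls = 0

    def __iter__(self):
        yield from ("a", "b", "c")

    def __len__(self):
        self.len_calls += 1
        return 3


obj = Sized()

# operator.length_hint() tries len() first, then __length_hint__,
# and finally falls back to a default value.
hint = operator.length_hint(obj)
calls_after_hint = obj.len_calls

# In CPython, list() asks for the same kind of hint before filling the list,
# which is why __len__ runs even though nobody called len() explicitly.
items = list(obj)
calls_after_list = obj.len_calls
```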

Your implementation is pretty much completely backwards. You are implementing __len__() by using __iter__(), which is not what Python expects. The expectation is that len() is a fast, efficient way to determine the length in advance.

I don't think you can convince list(A()) not to call len. As you have already observed, you can create an intermediate step that prevents len from being called.
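One such intermediate step is to pass list() the iterator rather than the object itself. A generator has no __len__, so there is nothing for list() to call. A minimal sketch follows, using a hypothetical class Counted that mirrors class A from the question but counts __len__ calls instead of printing:

```python
class Counted:
    """Hypothetical stand-in for class A that counts __len__ calls."""

    def __init__(self):
        self.len_calls = 0

    def __iter__(self):
        for _ in range(5):
            yield "something"

    def __len__(self):
        self.len_calls += 1
        return sum(1 for _ in self)


obj = Counted()

# iter(obj) returns the generator; list() only ever sees that generator,
# which has no __len__, so obj.__len__ is never consulted.
via_iter = list(iter(obj))
calls_via_iter = obj.len_calls

# The generator expression from the question works for the same reason.
via_genexpr = list(x for x in obj)
calls_via_genexpr = obj.len_calls
```

The trade-off is that every caller has to remember to write list(iter(obj)); nothing in the class itself enforces it.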

You should definitely cache the result, if the sequence is immutable. If there are as many items as you speculate, there's no sense computing len more than once.
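A rough sketch of that caching idea, assuming the underlying collection never changes after construction. The class name CachedLen, the _length attribute and the passes counter are all made up for illustration:

```python
class CachedLen:
    """Hypothetical variant of class A that counts its elements only once."""

    def __init__(self):
        self._length = None  # unknown until the first len() call
        self.passes = 0      # number of iteration passes started, for illustration

    def __iter__(self):
        self.passes += 1
        for _ in range(5):
            yield "something"

    def __len__(self):
        # Walk the data once, then reuse the stored count.
        # Only valid if the collection is immutable (an assumption here).
        if self._length is None:
            self._length = sum(1 for _ in self)
        return self._length


obj = CachedLen()
first = len(obj)    # triggers one counting pass
second = len(obj)   # served from the cache, no new pass
passes_after_two_lens = obj.passes
```

This only helps with repeated len() calls. If the length has not been computed yet, a first list(obj) still pays for one counting pass in addition to the pass that fills the list.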



Answer 2:

You don't have to implement __len__. For a class to be iterable, it only needs to implement one of the following:

  • __iter__, which returns an iterator (or is a generator function, as in your classes A and B)
  • __getitem__, as long as it accepts integer indices starting at 0 and raises IndexError when the index is out of range

The code below still works:

    class A:
        def __iter__(self):
            print("iter")
            for _ in range(5):
                yield "something"

    print(list(A()))

Which outputs:

    iter
    ['something', 'something', 'something', 'something', 'something']
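The second option from the list above (__getitem__ only) can be sketched the same way. Squares is a hypothetical class, not something from the question. Python's fallback sequence iterator calls __getitem__ with 0, 1, 2, ... and stops at the first IndexError, so neither __iter__ nor __len__ is needed:

```python
class Squares:
    """Hypothetical iterable that defines only __getitem__."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        # Raising IndexError past the end is what terminates the iteration.
        if not 0 <= index < self.n:
            raise IndexError(index)
        return index * index


# list() falls back to the old-style sequence protocol here.
squares = list(Squares(4))
```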

