I have a class with both an __iter__ and a __len__ method. The latter uses the former to count all elements. It works like the following:

    class A:
        def __iter__(self):
            print("iter")
            for _ in range(5):
                yield "something"

        def __len__(self):
            print("len")
            n = 0
            for _ in self:
                n += 1
            return n

Now if we take e.g. the length of an instance, it prints len and iter, as expected:

    >>> len(A())
    len
    iter
    5

But if we call list(), it calls both __iter__ and __len__:

    >>> list(A())
    len
    iter
    iter
    ['something', 'something', 'something', 'something', 'something']

It works as expected if we make a generator expression:

    >>> list(x for x in A())
    iter
    ['something', 'something', 'something', 'something', 'something']

I would expect list(A()) and list(x for x in A()) to behave the same, but they don’t.
Note that it appears to first call __iter__, then __len__, then loop over the iterator:

    class B:
        def __iter__(self):
            print("iter")

            def gen():
                print("gen")
                yield "something"

            return gen()

        def __len__(self):
            print("len")
            return 1

    print(list(B()))

Output:

    iter
    len
    gen
    ['something']

How can I get list() not to call __len__, so that my instance’s iterator is not consumed twice? I could define e.g. a length or size method, and callers would then use A().size(), but that’s less Pythonic.
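For illustration, the size() alternative mentioned above could look roughly like the sketch below. The class name Sized and the calls attribute are hypothetical and exist only to record which methods ran. It assumes the behavior observed in this post: list() only asks for a length when the object defines __len__ (or a length hint).

```python
class Sized:
    """Hypothetical variant of A: counting lives in size(), not __len__."""

    def __init__(self):
        self.calls = []  # records which methods ran, for illustration only

    def __iter__(self):
        self.calls.append("iter")
        for _ in range(5):
            yield "something"

    def size(self):
        # Explicit counting pass; it only runs when the caller asks for it.
        self.calls.append("size")
        return sum(1 for _ in self)


s = Sized()
items = list(s)             # no __len__ defined, so only one pass is made
calls_after_list = list(s.calls)
n = s.size()                # a second pass, but only on explicit request
```

The trade-off is the one already noted: len(s) no longer works, so callers have to know about size().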
I tried to compute the length in __iter__ and cache it so that subsequent calls to __len__ don’t need to iterate again, but list() calls __len__ before it starts iterating, so the cache is still empty at that point and it doesn’t work.
Note that in my case I work on very large data collections, so caching all items is not an option.