python generator is too slow to use it. why should I use it? and when?

问题

Recently I got question about which one is the most fastest thing among iterator, list comprehension, iter(list comprehension) and generator. and then make simple code as below.

n = 1000000
iter_a = iter(range(n))
list_comp_a = [i for i in range(n)]
iter_list_comp_a = iter([i for i in range(n)])
gene_a = (i for i in range(n))

import time
import numpy as np

for xs in [iter_a, list_comp_a, iter_list_comp_a, gene_a]:
    start = time.time()
    np.sum(xs)
    end = time.time()
    print((end-start)*100)

the result is below.

0.04439353942871094 # iterator
9.257078170776367 # list_comprehension
0.006318092346191406 # iterator of list_comprehension
7.491207122802734 # generator

generator is so slower than other thing. and I don't know when it is useful?

回答1:

generators do not store all elements in a memory in one go. They yield one at a time, and this behavior makes them memory efficient. Thus you can use them when memory is a constraint.

回答2:

I think I asked a wrong question, maybe. in original code, it was not correct because the np.sum doesn't works well. np.sum(iterator) doesn't return correct answer. So, I changed my code like below.

n = 10000
iter_a = iter(range(n))
list_comp_a = [i for i in range(n)]
iter_list_comp_a = iter([i for i in range(n)])
gene_a = (i for i in range(n))

import time
import numpy as np
import timeit

for xs in [iter_a, list_comp_a, iter_list_comp_a, gene_a]:
    start = time.time()
    sum(xs)
    end = time.time()
    print("type: {}, performance: {}".format(type(xs), (end-start)*100))

and then, performance is like below. the performance of list is best and iterator is not good.

type: <class 'range_iterator'>, performance: 0.021791458129882812
type: <class 'list'>, performance: 0.013279914855957031
type: <class 'list_iterator'>, performance: 0.02429485321044922
type: <class 'generator'>, performance: 0.13570785522460938

and like @Kishor Pawar already mentioned, the list is better for performance, but when memory size is not enough, sum of list with too high n make the computer slower, but sum of iterator with too high n, maybe it it really a lot of time to compute, but didn't make the computer slow.

Thx for all. When I have to compute a lot of lot of data, generator is better. but,

回答3:

As a preamble : your whole benchmark is just plain wrong - the "list_comp_a" test doesn't test the construction time of a list using a list comprehension (nor does "iter_list_comp_a" fwiw), and the tests using iter() are mostly irrelevant - iter(iterable) is just a shortcut for iterable.__iter__() and is only of any use if you want to manipulate the iterator itself, which is practically quite rare.

If you hope to get some meaningful results, what you want to benchmark are the execution of a list comprehension, a generator expression and a generator function. To test their execution, the simplest way is to wrap all three cases in functions, one execution a list comprehension and the other two building lists from resp. a generator expression and a generator built from a generator function). In all cases I used xrange as the real source so we only benchmark the effective differences. Also we use timeit.timeit to do the benchmark as it's more reliable than manually messing with time.time(), and is actually the pythonic standard canonical way to benchmark small code snippets.

import timeit
# py2 / py3 compat
try:
    xrange
except NameError:
    xrange = range

n = 1000

def test_list_comp():
    return [x for x in xrange(n)]

def test_genexp():
    return list(x for x in xrange(n))

def mygen(n):
    for x in xrange(n):
        yield x

def test_genfunc():
    return list(mygen(n))

for fname in "test_list_comp", "test_genexp", "test_genfunc":
    result = timeit.timeit("fun()", "from __main__ import {} as fun".format(fname), number=10000)
    print("{} : {}".format(fname, result))

Here (py 2.7.x on a 5+ years old standard desktop) I get the following results:

test_list_comp : 0.254354953766
test_genexp : 0.401108026505
test_genfunc : 0.403750896454

As you can see, list comprehensions are faster, and generator expressions and generator functions are mostly equivalent with a very slight (but constant if you repeat the test) advantage to generator expressions.

Now to answer your main question "why and when would you use generators", the answer is threefold: 1/ memory use, 2/ infinite iterations and 3/ coroutines.

First point : memory use. Actually, you don't need generators here, only lazy iteration, which can be obtained by writing your own iterable / iterable - like for example the builtin file type does - in a way to avoid loading everything in memory and only generating values on the fly. Here generators expressions and functions (and the underlying generator class) are a generic way to implement lazy iteration without writing your own iterable / iterator (just like the builtin property class is a generic way to use custom descriptors without wrting your own descriptor class).

Second point: infinite iteration. Here we have something that you can't get from sequence types (lists, tuples, sets, dicts, strings etc) which are, by definition, finite). An example is the itertools.cycle iterator:

Return elements from the iterable until it is exhausted. Then repeat the sequence indefinitely.

Note that here again this ability comes not from generator functions or expressions but from the iterable/iterator protocol. There are obviously less use case for infinite iteration than for memory use optimisations, but it's still a handy feature when you need it.

And finally the third point: coroutines. Well, this is a rather complex concept specially the first time you read about it, so I'll let someone else do the introduction : https://jeffknupp.com/blog/2013/04/07/improve-your-python-yield-and-generators-explained/

Here you have something that only generators can offer, not a handy shortcut for iterables/iterators.

来源：https://stackoverflow.com/questions/49275927/python-generator-is-too-slow-to-use-it-why-should-i-use-it-and-when

标签

python

performance

generator

list-comprehension