Why is a `for` loop so much faster to count True values?

忘掉有多难 2020-12-08 13:20

I recently answered a question on a sister site which asked for a function that counts all even digits of a number. One of the other answers contained two functions (which t

5 answers
  •  醉梦人生
    2020-12-08 13:35

    There are a few differences that actually contribute to the observed performance differences. I aim to give a high-level overview of these differences but try not to go too much into the low-level details or possible improvements. For the benchmarks I use my own package simple_benchmark.

    Generators vs. for loops

    Generators and generator expressions are syntactic sugar that can be used instead of writing iterator classes.

    When you write a generator like:

    def count_even(num):
        s = str(num)
        for c in s:
            yield c in '02468'
    

    Or a generator expression:

    (c in '02468' for c in str(num))
    

    That will be transformed (behind the scenes) into a state machine that is accessible through an iterator class. In the end it will be roughly equivalent to (although the actual code generated around a generator will be faster):

    class Count:
        def __init__(self, num):
            self.str_num = iter(str(num))
    
        def __iter__(self):
            return self
    
        def __next__(self):
            c = next(self.str_num)
            return c in '02468'
    

    So a generator always adds one layer of indirection. Advancing the generator (or generator expression, or iterator) calls __next__ on the iterator created by the generator, which in turn calls __next__ on the object you actually want to iterate over. There is also some overhead from creating the additional "iterator instance". These overheads are typically negligible if you do anything substantial in each iteration.
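
To make the indirection concrete, here is a minimal sketch that runs the hand-written Count class from above next to the equivalent generator expression and checks that both yield the same values. The sample number is an arbitrary choice for illustration, not something from the original question.

```python
class Count:
    """Hand-written iterator that mirrors the generator expression."""

    def __init__(self, num):
        self.str_num = iter(str(num))

    def __iter__(self):
        return self

    def __next__(self):
        # Inner __next__ call on the string iterator: the extra layer.
        c = next(self.str_num)
        return c in '02468'


num = 1234567890  # arbitrary sample value

gen = (c in '02468' for c in str(num))
has_next = hasattr(gen, '__next__')  # generators implement the iterator protocol

from_generator = list(gen)
from_class = list(Count(num))

print(has_next)                      # True
print(from_generator == from_class)  # True
print(sum(from_class))               # 5 (the even digits 2, 4, 6, 8, 0)
```

Both objects are driven through __next__, so every value consumed by sum has to pass through that outer call before the inner string iterator is advanced.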

    Just to provide an example of how much overhead a generator imposes compared to a manual loop:

    import matplotlib.pyplot as plt
    from simple_benchmark import BenchmarkBuilder
    %matplotlib notebook
    
    bench = BenchmarkBuilder()
    
    @bench.add_function()
    def iteration(it):
        for i in it:
            pass
    
    @bench.add_function()
    def generator(it):
        it = (item for item in it)
        for i in it:
            pass
    
    @bench.add_arguments()
    def argument_provider():
        for i in range(2, 15):
            size = 2**i
            yield size, [1 for _ in range(size)]
    
    plt.figure()
    result = bench.run()
    result.plot()
    

    Generators vs. List comprehensions

    Generators have the advantage that they don't create a list; they "produce" the values one by one. So while a generator has the overhead of the "iterator class", it saves the memory needed for an intermediate list. It's a trade-off between speed (list comprehension) and memory (generators). This has been discussed in various posts around Stack Overflow, so I don't want to go into much more detail here.

    import matplotlib.pyplot as plt
    from simple_benchmark import BenchmarkBuilder
    %matplotlib notebook
    
    bench = BenchmarkBuilder()
    
    @bench.add_function()
    def generator_expression(it):
        it = (item for item in it)
        for i in it:
            pass
    
    @bench.add_function()
    def list_comprehension(it):
        it = [item for item in it]
        for i in it:
            pass
    
    @bench.add_arguments('size')
    def argument_provider():
        for i in range(2, 15):
            size = 2**i
            yield size, list(range(size))
    
    plt.figure()
    result = bench.run()
    result.plot()
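
The memory side of this trade-off can be sketched with sys.getsizeof. Note that getsizeof is shallow: it reports only the size of the container object itself, not of the items it references. The list length of 10,000 is an arbitrary choice for illustration.

```python
import sys

data = list(range(10_000))

gen = (item for item in data)   # produces values lazily, one at a time
lst = [item for item in data]   # materializes every value up front

gen_size = sys.getsizeof(gen)   # small and independent of len(data)
lst_size = sys.getsizeof(lst)   # grows with len(data)

print(f"generator expression: {gen_size} bytes")
print(f"list comprehension:   {lst_size} bytes")
```

The exact byte counts depend on the Python version and platform; the point is only that the generator object stays the same size regardless of how many items it will produce.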
    

    sum should be faster than manual iteration

    Yes, sum is indeed faster than an explicit for loop, especially if you iterate over integers.

    import matplotlib.pyplot as plt
    from simple_benchmark import BenchmarkBuilder
    %matplotlib notebook
    
    bench = BenchmarkBuilder()
    
    @bench.add_function()
    def my_sum(it):
        sum_ = 0
        for i in it:
            sum_ += i
        return sum_
    
    bench.add_function()(sum)
    
    @bench.add_arguments()
    def argument_provider():
        for i in range(2, 15):
            size = 2**i
            yield size, [1 for _ in range(size)]
    
    plt.figure()
    result = bench.run()
    result.plot()
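
If simple_benchmark is not installed, a rough version of the same comparison can be sketched with the standard library's timeit module. The list size and repetition count below are arbitrary assumptions, and the measured times will vary between machines and Python versions.

```python
import timeit


def my_sum(it):
    total = 0
    for i in it:
        total += i
    return total


data = [1] * 10_000

# Time 200 calls of each variant on the same input.
loop_time = timeit.timeit(lambda: my_sum(data), number=200)
builtin_time = timeit.timeit(lambda: sum(data), number=200)

print(f"explicit loop: {loop_time:.4f}s")
print(f"built-in sum:  {builtin_time:.4f}s")
```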
    

    String methods vs. Any kind of Python loop

    The key to understanding the performance difference between string methods like str.count and loops (explicit or implicit) is that strings in Python are stored as values in an (internal) array. A string method therefore doesn't need to call any __next__ methods; it can loop directly over the array, which is significantly faster. However, it also imposes a method lookup and a method call on the string, which is why it's slower for very short numbers.

    Just to provide a small comparison of how long it takes to iterate over a string with a Python for loop vs. how long it takes Python to iterate over the internal array:

    import matplotlib.pyplot as plt
    from simple_benchmark import BenchmarkBuilder
    %matplotlib notebook
    
    bench = BenchmarkBuilder()
    
    @bench.add_function()
    def string_iteration(s):
        # there is no "a" in the string, so this iterates over the whole string
        return 'a' in s  
    
    @bench.add_function()
    def python_iteration(s):
        for c in s:
            pass
    
    @bench.add_arguments('string length')
    def argument_provider():
        for i in range(2, 20):
            size = 2**i
            yield size, '1'*size
    
    plt.figure()
    result = bench.run()
    result.plot()
    

    In this benchmark it's ~200 times faster to let Python do the iteration over the string internally (via the in operator) than to iterate over the string with a for loop.
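
Applied to the even-digit problem, the same idea can be sketched as follows: instead of visiting each character in a Python-level loop, call str.count once per even digit and let the string scan itself. The function names and the sample string are illustrative assumptions.

```python
def count_even_loop(s):
    # Python-level loop: one membership test per character.
    count = 0
    for c in s:
        if c in '02468':
            count += 1
    return count


def count_even_str_count(s):
    # Five str.count calls; each scan runs inside the string method.
    return sum(s.count(d) for d in '02468')


digits = '1234567890' * 1000  # 10,000 characters, half of them even digits

loop_result = count_even_loop(digits)
count_result = count_even_str_count(digits)

print(loop_result, count_result)  # 5000 5000
```

The str.count version scans the string five times, yet only five method calls happen at the Python level no matter how long the string is.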

    Why do all of them converge for large numbers?

    This is actually because the number to string conversion will be dominant there. So for really huge numbers you're essentially just measuring how long it takes to convert that number to a string.

    You'll see the difference if you compare the versions that take a number and convert it to a string with the ones that take the already converted string (I use the functions from another answer here to illustrate that). The left plot is the number benchmark and the right plot is the benchmark that takes the strings; the y-axis is the same for both plots:

    As you can see, the benchmarks for the functions that take the string are significantly faster for large numbers than the ones that take a number and convert it to a string inside. This indicates that the string conversion is the "bottleneck" for large numbers. For convenience I also included a benchmark that only does the string conversion in the left plot (it becomes significant/dominant for large numbers).

    %matplotlib notebook
    
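    from simple_benchmark import BenchmarkBuilder
    import matplotlib.pyplot as plt
    import random
    import sys
    
    # Python 3.11+ limits int <-> str conversion to 4300 digits by default;
    # lift the limit so the arguments of up to 16384 digits below can be built.
    if hasattr(sys, 'set_int_max_str_digits'):
        sys.set_int_max_str_digits(0)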
    
    bench1 = BenchmarkBuilder()
    
    @bench1.add_function()
    def f1(x):
        return sum(c in '02468' for c in str(x))
    
    @bench1.add_function()
    def f2(x):
        return sum([c in '02468' for c in str(x)])
    
    @bench1.add_function()
    def f3(x):
        return sum([True for c in str(x) if c in '02468'])    
    
    @bench1.add_function()
    def f4(x):
        return sum([1 for c in str(x) if c in '02468'])
    
    @bench1.add_function()
    def explicit_loop(x):
        count = 0
        for c in str(x):
            if c in '02468':
                count += 1
        return count
    
    @bench1.add_function()
    def f5(x):
        s = str(x)
        return sum(s.count(c) for c in '02468')
    
    bench1.add_function()(str)
    
    @bench1.add_arguments(name='number length')
    def arg_provider():
        for i in range(2, 15):
            size = 2 ** i
            yield (2**i, int(''.join(str(random.randint(0, 9)) for _ in range(size))))
    
    
    bench2 = BenchmarkBuilder()
    
    @bench2.add_function()
    def f1(x):
        return sum(c in '02468' for c in x)
    
    @bench2.add_function()
    def f2(x):
        return sum([c in '02468' for c in x])
    
    @bench2.add_function()
    def f3(x):
        return sum([True for c in x if c in '02468'])    
    
    @bench2.add_function()
    def f4(x):
        return sum([1 for c in x if c in '02468'])
    
    @bench2.add_function()
    def explicit_loop(x):
        count = 0
        for c in x:
            if c in '02468':
                count += 1
        return count
    
    @bench2.add_function()
    def f5(x):
        return sum(x.count(c) for c in '02468')
    
    @bench2.add_arguments(name='number length')
    def arg_provider():
        for i in range(2, 15):
            size = 2 ** i
            yield (2**i, ''.join(str(random.randint(0, 9)) for _ in range(size)))
    
    f, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
    b1 = bench1.run()
    b2 = bench2.run()
    b1.plot(ax=ax1)
    b2.plot(ax=ax2)
    ax1.set_title('Number')
    ax2.set_title('String')
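
A quick way to check the conversion cost without plotting is to time the three pieces separately with timeit. This is a rough sketch under stated assumptions: the 2000-digit length is chosen only to stay below the default 4300-digit conversion limit in Python 3.11+, and at this length the conversion may not yet dominate the way it does for the largest inputs in the benchmark above.

```python
import random
import timeit

random.seed(0)  # fixed seed so repeated runs use the same number

# A leading '1' avoids a leading zero, so str(number) round-trips to digits.
digits = '1' + ''.join(random.choice('0123456789') for _ in range(1999))
number = int(digits)


def count_from_number(x):
    return sum(c in '02468' for c in str(x))


def count_from_string(s):
    return sum(c in '02468' for c in s)


t_number = timeit.timeit(lambda: count_from_number(number), number=200)
t_string = timeit.timeit(lambda: count_from_string(digits), number=200)
t_str_only = timeit.timeit(lambda: str(number), number=200)

print(f"count from number: {t_number:.4f}s")
print(f"count from string: {t_string:.4f}s")
print(f"str(number) only:  {t_str_only:.4f}s")
```

The gap between the first two timings is roughly the cost of str(number); increasing the digit count shows how that share changes.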
    
