问题
I wrote a simple script that test the speed and this is what I found out. Actually for loop was fastest in my case. That really suprised me, check out bellow (was calculating sum of squares). Is that because it holds list in memory or is that intended? Can anyone explain this.
from functools import reduce
import datetime
def time_it(func, numbers, *args):
start_t = datetime.datetime.now()
for i in range(numbers):
func(args[0])
print (datetime.datetime.now()-start_t)
def square_sum1(numbers):
return reduce(lambda sum, next: sum+next**2, numbers, 0)
def square_sum2(numbers):
a = 0
for i in numbers:
i = i**2
a += i
return a
def square_sum3(numbers):
sqrt = lambda x: x**2
return sum(map(sqrt, numbers))
def square_sum4(numbers):
return(sum([i**2 for i in numbers]))
time_it(square_sum1, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum2, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum3, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum4, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
0:00:00.302000 #Reduce
0:00:00.144000 #For loop
0:00:00.318000 #Map
0:00:00.290000 #List comprehension`
Update - when I tried longer loops there are the results.
time_it(square_sum1, 100, range(1000))
time_it(square_sum2, 100, range(1000))
time_it(square_sum3, 100, range(1000))
time_it(square_sum4, 100, range(1000))
0:00:00.068992
0:00:00.062955
0:00:00.069022
0:00:00.057446
回答1:
Python function calls have overheads which make them relatively slow, so code that uses a simple expression will always be faster than code that wraps that expression in a function; it doesn't matter whether it's a normal def
function or a lambda
. For that reason, it's best to avoid map
or reduce
if you are going to pass them a Python function if you can do the equivalent job with a plain expression in a for
loop or a comprehension or generator expression.
There are a couple of minor optimizations that will speed up some of your functions. Don't make unnecessary assignments. Eg,
def square_sum2a(numbers):
a = 0
for i in numbers:
a += i ** 2
return a
Also, i * i
is quite a bit faster than i ** 2
because multiplication is faster than exponentiation.
As I mentioned in the comments, it's more efficient to pass sum
a generator than a list comprehension, especially if the loop is large; it probably won't make difference with a small list of length 8, but it will be quite noticeable with large lists.
sum(i*i for i in numbers)
BTW, you shouldn't use sum
or next
as variable names as that masks the built-in functions with the same names. It won't hurt anything here, but it's still not a good idea, and it makes your code look odd in an editor with more comprehensive syntax highlighting than the SO syntax highlighter.
Here's a new version of your code that uses the timeit
module. It does 3 repetitions of 10,000 loops each and sorts the results. As explained in the timeit docs, the important figure to look at in the series of the repetitions is the minimum one.
In a typical case, the lowest value gives a lower bound for how fast your machine can run the given code snippet; higher values in the result vector are typically not caused by variability in Python’s speed, but by other processes interfering with your timing accuracy. So the
min()
of the result is probably the only number you should be interested in.
from timeit import Timer
from functools import reduce
def square_sum1(numbers):
return reduce(lambda total, u: total + u**2, numbers, 0)
def square_sum1a(numbers):
return reduce(lambda total, u: total + u*u, numbers, 0)
def square_sum2(numbers):
a = 0
for i in numbers:
i = i**2
a += i
return a
def square_sum2a(numbers):
a = 0
for i in numbers:
a += i * i
return a
def square_sum3(numbers):
sqr = lambda x: x**2
return sum(map(sqr, numbers))
def square_sum3a(numbers):
sqr = lambda x: x*x
return sum(map(sqr, numbers))
def square_sum4(numbers):
return(sum([i**2 for i in numbers]))
def square_sum4a(numbers):
return(sum(i*i for i in numbers))
funcs = (
square_sum1,
square_sum1a,
square_sum2,
square_sum2a,
square_sum3,
square_sum3a,
square_sum4,
square_sum4a,
)
data = [1, 2, 5, 3, 1, 2, 5, 3]
def time_test(loops, reps):
''' Print timing stats for all the functions '''
timings = []
for func in funcs:
fname = func.__name__
setup = 'from __main__ import data, ' + fname
cmd = fname + '(data)'
t = Timer(cmd, setup)
result = t.repeat(reps, loops)
result.sort()
timings.append((result, fname))
timings.sort()
for result, fname in timings:
print('{0:14} {1}'.format(fname, result))
loops, reps = 10000, 3
time_test(loops, reps)
output
square_sum2a [0.03815755599862314, 0.03817843700016965, 0.038571521999983815]
square_sum4a [0.06384095800240175, 0.06462285799716483, 0.06579178199899616]
square_sum3a [0.07395686000018031, 0.07405958899835241, 0.07463337299850537]
square_sum1a [0.07867341000019223, 0.0788448769999377, 0.07908406700153137]
square_sum2 [0.08781023399933474, 0.08803317899946705, 0.08846573399932822]
square_sum4 [0.10260082300010254, 0.10360279499946046, 0.10415067900248687]
square_sum3 [0.12363515399920288, 0.12434166299863136, 0.1273790529994585]
square_sum1 [0.1276186039976892, 0.13786184099808452, 0.16315817699796753]
The results were obtained on an old single core 32 bit 2GHz machine running Python 3.6.0 on Linux.
回答2:
This is almost independent of the underlying programming language, as in abstractions do not come for free.
Meaning: there is always certain cost for calling methods. A stack needs to be established; control flow needs to "jump". And when you think of lower levels, such as CPUs: probably the code for that method needs to be loaded, and so on.
In other words: when your primary requirement is hard-core number crunching, then you have to balance ease-of-use with the cost of the corresponding abstractions.
Beyond: if you focus on speed, then you should look beyond python, or at least beyond "ordinary" python. Instead you could turn to modules such as numpy.
来源:https://stackoverflow.com/questions/43677763/why-is-in-my-case-for-loop-faster-vs-map-reduce-and-list-comprehension