I sometimes used `map` when the function or method was written in C, to squeeze out a bit of extra performance. However, I recently revisited some of my benchmarks, and I now think a fair comparison requires using the same function and the same testing conditions, both when comparing Python 3.5 to 3.6 and when comparing `map` to a list comprehension within a chosen Python version.

In my initial answer I performed multiple tests showing that `map` was still faster by a factor of about two in both versions of Python when compared to a list comprehension. However, some results were inconclusive, so I performed some more tests.
First let me cite some of your points stated in the question:

"... [I] noticed that the relative performance [of `map`] (compared to a similar list comprehension) drastically changed between Python 3.5 and 3.6"
You also ask:
"My question is what happened in this case that made the list-comprehension faster and the map solution slower?"
It is not entirely clear whether you mean that `map` is slower than a list comprehension in Python 3.6, or that `map` is slower in Python 3.6 than in 3.5 while the list comprehension's performance increased (albeit not necessarily to the level of beating `map`).
Based on more extensive tests that I have performed after my first answer to this question, I think I have an idea of what is going on.
However, first let's create the conditions for a "fair" comparison. For this we need to do the following (a minimal template implementing these conditions is sketched right after the list):
1. Compare the performance of `map` in different Python versions using the same function;
2. Compare the performance of `map` to a list comprehension in the same version using the same function;
3. Run the tests on the same data;
4. Minimize the contribution from the timing functions.
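Here is a minimal sketch of what such a "fair" test looks like with `timeit.repeat` (it condenses the protocol used in the transcripts below, using as an example the `f = (5).__lt__` function discussed later; the file `lst.npy` is created just below):

import timeit
import numpy as np

# Same data: load the list saved once with numpy.save(), so every Python
# version and every test sees identical input.
SETUP = "import numpy; lst = numpy.load('lst.npy').tolist(); f = (5).__lt__"

# Same function f in both the map test and the list-comprehension test;
# number=1 keeps the timing machinery's own contribution small relative
# to the work being measured.
for stmt in ('list(map(f, lst))', '[f(i) for i in lst]'):
    t = timeit.repeat(stmt, setup=SETUP, repeat=20, number=1)
    print(stmt, min(t), max(t), np.mean(t), np.std(t))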
Here is version information about my system:
Python 3.5.3 |Continuum Analytics, Inc.| (default, Mar 6 2017, 12:15:08)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
IPython 5.3.0 -- An enhanced Interactive Python.
and
Python 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:14:59)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help.
Let's first address the issue of "same data". Unfortunately, because you are effectively using `seed(None)`, each data set `lst` is different in each of the two versions of Python. This probably contributes to the difference in performance seen between the two Python versions. One fix would be to set, e.g., `random.seed(0)` (or something like that). I chose instead to create the list once, save it using `numpy.save()`, and then load it in each version. This is especially important because I chose to modify your tests slightly (the number of "loops" and "repeats"), and I increased the length of your dataset to 100,000,000 elements:
import numpy as np
import random
lst = [random.randint(0, 10) for _ in range(100000000)]
np.save('lst', lst, allow_pickle=False)
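(For completeness, the seed-based alternative mentioned above would look something like the sketch below. The seed value 0 is arbitrary, and this assumes the generator produces the same sequence under 3.5 and 3.6, which is part of why I preferred saving the data to disk.)

import random

# Fixing the seed makes the generated list reproducible from run to run,
# without writing it to disk.
random.seed(0)
lst = [random.randint(0, 10) for _ in range(100000000)]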
Second, let's use the `timeit` module instead of IPython's `%timeit` magic command. The reason for doing this comes from the following test performed in Python 3.5:
In [11]: f = (5).__lt__
In [12]: %timeit -n1 -r20 [f(i) for i in lst]
1 loop, best of 20: 9.01 s per loop
Compare this to the result of `timeit` in the same version of Python:
>>> t = timeit.repeat('[f(i) for i in lst]', setup="f = (5).__lt__; import numpy; lst = numpy.load('lst.npy').tolist()", repeat=20, number=1); print(min(t), max(t), np.mean(t), np.std(t))
7.442819457995938 7.703615028003696 7.5105415405 0.0550515642854
For reasons unknown to me, IPython's `%timeit` magic adds some overhead compared to the `timeit` module. Therefore, I will use `timeit` exclusively in my testing.
NOTE: In the discussion that follows I will use only the minimum timing (`min(t)`).
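(Since each test below repeats the same `print(min(t), max(t), np.mean(t), np.std(t))` tail, one could wrap it in a small helper like the following; the transcripts keep the explicit form, so this is purely a convenience sketch.)

import numpy as np

def report(t):
    # Print min, max, mean and standard deviation of a list of timings.
    print(min(t), max(t), np.mean(t), np.std(t))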
Group 1 -- the tests in Python 3.5:

>>> import numpy as np
>>> import timeit
>>> t = timeit.repeat('list(map(f, lst))', setup="f = (5).__lt__; import numpy; lst = numpy.load('lst.npy').tolist()", repeat=20, number=1); print(min(t), max(t), np.mean(t), np.std(t))
4.666553302988177 4.811194089008495 4.72791638025 0.041115884397
>>> t = timeit.repeat('[f(i) for i in lst]', setup="f = (5).__lt__; import numpy; lst = numpy.load('lst.npy').tolist()", repeat=20, number=1); print(min(t), max(t), np.mean(t), np.std(t))
7.442819457995938 7.703615028003696 7.5105415405 0.0550515642854
>>> t = timeit.repeat('[5 < i for i in lst]', setup="import numpy; lst = numpy.load('lst.npy').tolist()", repeat=20, number=1); print(min(t), max(t), np.mean(t), np.std(t))
4.94656751700677 5.07807950800634 5.00670203845 0.0340474956945
>>> t = timeit.repeat('list(map(abs, lst))', setup="import numpy; lst = numpy.load('lst.npy').tolist()", repeat=20, number=1); print(min(t), max(t), np.mean(t), np.std(t))
4.167273573024431 4.320013975986512 4.2408865186 0.0378852782878
>>> t = timeit.repeat('[abs(i) for i in lst]', setup="import numpy; lst = numpy.load('lst.npy').tolist()", repeat=20, number=1); print(min(t), max(t), np.mean(t), np.std(t))
5.664627838006709 5.837686392012984 5.71560354655 0.0456700607748
Notice how the second test (the list comprehension using `f(i)`) is significantly slower than the third test (the list comprehension using `5 < i`), indicating that `f = (5).__lt__` is not identical (or almost identical) to `5 < i` from the code's perspective.
>>> t = timeit.repeat('f(1)', setup="f = (5).__lt__", repeat=20, number=1000000); print(min(t), max(t), np.mean(t), np.std(t))
0.052280781004810706 0.05500587198184803 0.0531139718529 0.000877649561967
>>> t = timeit.repeat('5 < 1', repeat=20, number=1000000); print(min(t), max(t), np.mean(t), np.std(t))
0.030931947025237605 0.033691533986711875 0.0314959864045 0.000633274658428
>>> t = timeit.repeat('abs(1)', repeat=20, number=1000000); print(min(t), max(t), np.mean(t), np.std(t))
0.04685414198320359 0.05405496899038553 0.0483296330043 0.00162837880358
Notice how, again, the first test (of `f(1)`) is significantly slower than the second test (of `5 < 1`), further supporting that `f = (5).__lt__` is not identical (or almost identical) to `5 < i` from the code's perspective.
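One way to see that the two forms are genuinely different code (this is my own illustration, not part of the original tests) is to disassemble them: `5 < i` compiles to a single comparison opcode, while `f(i)` must perform a full function call on every element:

import dis

# Output shown is from Python 3.6; Python 3.5 prints slightly different offsets.
dis.dis('5 < i')
#   1           0 LOAD_CONST               0 (5)
#               2 LOAD_NAME                0 (i)
#               4 COMPARE_OP               0 (<)
#               6 RETURN_VALUE

dis.dis('f(i)')
#   1           0 LOAD_NAME                0 (f)
#               2 LOAD_NAME                1 (i)
#               4 CALL_FUNCTION            1
#               6 RETURN_VALUE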
Group 2 -- the same tests in Python 3.6:

>>> import numpy as np
>>> import timeit
>>> t = timeit.repeat('list(map(f, lst))', setup="f = (5).__lt__; import numpy; lst = numpy.load('lst.npy').tolist()", repeat=20, number=1); print(min(t), max(t), np.mean(t), np.std(t))
4.599696700985078 4.743880658003036 4.6631793691 0.0425774678203
>>> t = timeit.repeat('[f(i) for i in lst]', setup="f = (5).__lt__; import numpy; lst = numpy.load('lst.npy').tolist()", repeat=20, number=1); print(min(t), max(t), np.mean(t), np.std(t))
7.316072431014618 7.572676292009419 7.3837024617 0.0574811241553
>>> t = timeit.repeat('[5 < i for i in lst]', setup="import numpy; lst = numpy.load('lst.npy').tolist()", repeat=20, number=1); print(min(t), max(t), np.mean(t), np.std(t))
4.570452399988426 4.679144663008628 4.61264215875 0.0265541828693
>>> t = timeit.repeat('list(map(abs, lst))', setup="import numpy; lst = numpy.load('lst.npy').tolist()", repeat=20, number=1); print(min(t), max(t), np.mean(t), np.std(t))
2.742673939006636 2.8282236389932223 2.78504617405 0.0260357089928
>>> t = timeit.repeat('[abs(i) for i in lst]', setup="import numpy; lst = numpy.load('lst.npy').tolist()", repeat=20, number=1); print(min(t), max(t), np.mean(t), np.std(t))
6.2177103200228885 6.428813881997485 6.28722427145 0.0493010620999
>>> t = timeit.repeat('f(1)', setup="f = (5).__lt__", repeat=20, number=1000000); print(min(t), max(t), np.mean(t), np.std(t))
0.051936342992121354 0.05764096099301241 0.0532974587506 0.00117079475737
>>> t = timeit.repeat('5 < 1', repeat=20, number=1000000); print(min(t), max(t), np.mean(t), np.std(t))
0.02675032999832183 0.032919151999522 0.0285137565021 0.00156522182488
>>> t = timeit.repeat('abs(1)', repeat=20, number=1000000); print(min(t), max(t), np.mean(t), np.std(t))
0.047831349016632885 0.0531779529992491 0.0482893927969 0.00112825297875
The same pattern holds here: the first test (of `f(1)`) is again significantly slower than the second test (of `5 < 1`), further supporting that `f = (5).__lt__` is not identical to `5 < i` from the code's perspective.
I do not know how reliable these timing tests are, and it is difficult to separate all the factors contributing to the results. However, we can notice from "Group 2" that the only "individual" test that significantly changed its timing is the test of `5 < 1`: it went down from 0.0309 s in Python 3.5 to 0.0268 s in Python 3.6. This makes the list comprehension test in Python 3.6 that uses `5 < i` run faster than the same test in Python 3.5. However, this does not mean that list comprehensions became faster in Python 3.6.
Let's compare the relative performance of `map` to the list comprehension for the same function in the same Python version. In Python 3.5 we get r(f) = 7.4428/4.6666 = 1.595 and r(abs) = 5.665/4.167 = 1.359, while in Python 3.6 we get r(f) = 7.316/4.5997 = 1.591 and r(abs) = 6.218/2.743 = 2.267. Based on these relative performances we can see that in Python 3.6 the performance of `map` relative to the list comprehension is at least the same as in Python 3.5 for the `f = (5).__lt__` function, and the ratio has even improved for a function such as `abs()`.
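For the record, here is the arithmetic behind those ratios (using the minimum timings quoted from the transcripts above):

# Ratio of list-comprehension time to map time, per function and version.
for version, name, lc, mp in [('3.5', 'f', 7.4428, 4.6666),
                              ('3.5', 'abs', 5.665, 4.167),
                              ('3.6', 'f', 7.316, 4.5997),
                              ('3.6', 'abs', 6.218, 2.743)]:
    print('Python %s: r(%s) = %.3f' % (version, name, lc / mp))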
In any case, I believe there is no evidence that list comprehensions became faster in Python 3.6, in either the relative or the absolute sense. The only performance improvement is for the `[5 < i for i in lst]` test, and that is because `5 < i` itself became faster in Python 3.6, not because the list comprehension did.
I think a fair comparison involves using the same function. In the case of your example, when the comparison is fair, `map` still wins:
>>> import sys
>>> print(sys.version)
3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:14:59)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
>>> import random
>>> lst = [random.randint(0, 10) for _ in range(100000)]
>>> assert list(map((5).__lt__, lst)) == [5 < i for i in lst]
>>> f = (5).__lt__
>>> %timeit list(map(f, lst))
4.63 ms ± 110 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit [f(i) for i in lst]
9.17 ms ± 177 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
While in Python 3.5 (at least on my system) `map` is faster than in Python 3.6, so is the list comprehension:
>>> print(sys.version)
3.5.3 |Continuum Analytics, Inc.| (default, Mar 6 2017, 12:15:08)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
>>> %timeit list(map(f, lst))
100 loops, best of 3: 4.36 ms per loop
>>> %timeit [f(i) for i in lst]
100 loops, best of 3: 8.12 ms per loop
Still, when using the same function, `map` is ~2x faster than the list comprehension in both Python 3.5 and 3.6.
It is my opinion that performing "fair" comparisons is important in answering the OP's question: "My question is what happened in this case that made the list-comprehension faster and the map solution slower?" (last paragraph). However, in the first paragraph, @MSeifert says: "... [I] noticed that the relative performance (compared to a similar list comprehension) drastically changed between Python 3.5 and 3.6". That is, the comparison is between `map` and a list comprehension. Yet @MSeifert's tests are set up as follows:
timing_map_35 = Timing(list(map(f, lst)))
timing_list_35 = Timing([g(i) for i in lst])
This kind of testing makes it difficult to pinpoint the cause of the timing differences: is it because the list comprehension became faster in 3.6, or because `map` became slower in 3.6, or because `f(i)` is slower in 3.6, or because `g(i)` is faster in 3.6...
Therefore, I proposed to introduce `f = (5).__lt__` and use the same function in both the `map` and the list comprehension tests. I also modified @MSeifert's test by increasing the number of elements in the list and reducing the number of "loops" in `timeit`:
import random
lst = [random.randint(0, 10) for _ in range(1000000)] # 10x more elements
f = (5).__lt__
%timeit -n1 -r1000 list(map(f, lst)) # f = (5).__lt__
%timeit -n1 -r1000 [f(i) for i in lst] # f(i) = (5).__lt__(i)
%timeit -n1 -r1000 [5 < i for i in lst] # g(i) = 5 < i
%timeit -n1 -r1000 [1 for _ in lst] # h(i) = 1
In Python 3.6 I get:
43.5 ms ± 1.79 ms per loop (mean ± std. dev. of 1000 runs, 1 loop each)
82.2 ms ± 2.39 ms per loop (mean ± std. dev. of 1000 runs, 1 loop each)
43.6 ms ± 1.64 ms per loop (mean ± std. dev. of 1000 runs, 1 loop each)
23.8 ms ± 1.27 ms per loop (mean ± std. dev. of 1000 runs, 1 loop each)
In Python 3.5 I get:
1 loop, best of 1000: 43.7 ms per loop
1 loop, best of 1000: 78.9 ms per loop
1 loop, best of 1000: 46 ms per loop
1 loop, best of 1000: 26.8 ms per loop
In my opinion this shows that the list comprehension is slightly faster in 3.6 than in 3.5, except when `f` is used. It is therefore difficult to conclude whether it is `map` that is slower in Python 3.6, or whether the first `timeit` above is slower because the call to `f` itself is slower. Therefore I performed two more tests:
%timeit -n1 -r1000 list(map(abs, lst))
%timeit -n1 -r1000 [abs(i) for i in lst]
%timeit -n1000000 -r1000 f(1)
In Python 3.6 I get:
25.8 ms ± 1.42 ms per loop (mean ± std. dev. of 1000 runs, 1 loop each)
67.1 ms ± 2.07 ms per loop (mean ± std. dev. of 1000 runs, 1 loop each)
64.7 ns ± 2.22 ns per loop (mean ± std. dev. of 1000 runs, 1000000 loops each)
In Python 3.5 I get:
1 loop, best of 1000: 38.3 ms per loop
1 loop, best of 1000: 56.4 ms per loop
1000000 loops, best of 1000: 59.6 ns per loop
This shows that `map` can be significantly faster than a list comprehension for some functions: specifically, for `abs(x)` the relative performance of `map` versus the list comprehension in Python 3.6 is 67.1/25.8 = 2.60, while in Python 3.5 it is 56.4/38.3 = 1.47. It is therefore interesting to know why @MSeifert's test shows that `map` is slower in Python 3.6. My last test above times `f(1)` "alone". I am not sure how valid this test is (I wanted to avoid using `map` or a `[... for ...]` comprehension in order to eliminate one variable), but it shows that in Python 3.6 `f = (5).__lt__` became slower than in Python 3.5. Therefore I conclude that it is the evaluation of this particular form of the function `f` (`(5).__lt__`) that slowed down, not the `map` function. I know this last "alone" test is likely a bad test; however, the fact that `map` is very fast (relatively and absolutely) when used with `abs` shows that the problem is in `f` and not in `map`.
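A further cross-check (a sketch only; I did not run this) would be to build the same predicate without the bound-method form, e.g. via `functools.partial` and `operator.lt`, and see whether it shows the same 3.5-to-3.6 slowdown as `(5).__lt__`:

import timeit

# f2 = partial(operator.lt, 5): f2(i) computes operator.lt(5, i), i.e. 5 < i,
# the same predicate as (5).__lt__(i) but built without the bound-method form.
setup = 'import operator; from functools import partial; f2 = partial(operator.lt, 5)'
t = timeit.repeat('f2(1)', setup=setup, repeat=20, number=1000000)
print(min(t))  # run under both 3.5 and 3.6 and compare with the f(1) timings above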
NOTE: Python 3.5 uses IPython 5.3.0 and Python 3.6 uses IPython 6.1.0.