In an answer to my question "Find the 1-based position to which two lists are the same", I got the hint to use itertools, which is implemented in C, to speed things up.
To verify this, I wrote a small timing comparison.
I imagine the issue here is that your test lists are tiny, meaning any difference is likely to be minimal, and the cost of creating the iterators outweighs the gains they give.
In larger tests (where performance is more likely to matter), the version using sum() will likely outperform the other version.
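A minimal sketch for checking the larger-lists claim yourself, assuming Python 3 (where zip is lazy and takes the place of izip). The function names, the list size of 100,000 and the position of the mismatch are hypothetical choices for illustration, and no timings are claimed, since they depend on the machine. It uses timeit.repeat and keeps the minimum, which the timeit documentation suggests is a better estimate than a mean.

```python
import timeit
from functools import partial
from itertools import takewhile


def match_takewhile(seq, other):
    # itertools version: count pairs while the two elements are equal
    return sum(1 for _ in takewhile(lambda pair: pair[0] == pair[1], zip(seq, other)))


def match_index(seq, other):
    # index-based loop, similar in spirit to the manual version
    count = 0
    for i in range(min(len(seq), len(other))):
        if seq[i] != other[i]:
            break
        count += 1
    return count


# Hypothetical test data: two long lists that differ only in their last element
n = 100_000
a = list(range(n))
b = list(range(n - 1)) + [-1]

for func in (match_takewhile, match_index):
    # best of 3 runs, each run calling the function 10 times
    best = min(timeit.repeat(partial(func, a, b), repeat=3, number=10))
    print(func.__name__, best)
```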
Also, there is the matter of style: the manual version is longer and relies on iterating by index, which also makes it less flexible.
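To illustrate the flexibility point, here is a small sketch (Python 3; both function names are hypothetical). An index-based loop needs len() and integer indexing on both arguments, so it rejects plain iterators such as generators, while a loop over zip accepts any iterable.

```python
def match_by_index(seq, other):
    # Requires real sequences: uses len() and seq[i]
    count = 0
    for i in range(min(len(seq), len(other))):
        if seq[i] != other[i]:
            break
        count += 1
    return count


def match_by_zip(seq, other):
    # Works with any iterables, including generators
    count = 0
    for this, that in zip(seq, other):
        if this != that:
            break
        count += 1
    return count


squares = (x * x for x in range(10))
other = iter([0, 1, 4, 9, 0])
print(match_by_zip(squares, other))  # → 4

try:
    match_by_index((x for x in range(3)), (x for x in range(3)))
except TypeError as exc:
    # generators have no len(), so the index-based version cannot start
    print("index-based version failed:", exc)
```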
I would argue the most readable solution would be something like this:
def while_equal(seq, other):
    for this, that in zip(seq, other):
        if this != that:
            return
        yield this

def match(seq, other):
    return sum(1 for _ in while_equal(seq, other))
Interestingly, on my system a slightly modified version of this:
def while_equal(seq, other):
    for this, that in zip(seq, other):
        if this != that:
            return
        yield 1

def match(seq, other):
    return sum(while_equal(seq, other))
performs better than the pure loop version:
a = [0, 1, 2, 3, 4]
b = [0, 1, 2, 3, 4, 0]
import timeit
print(timeit.timeit('match_loop(a,b)', 'from __main__ import a, b, match_loop'))
print(timeit.timeit('match(a,b)', 'from __main__ import match, a, b'))
Giving:
1.3171300539979711
1.291257290984504
That said, if we improve the pure loop version to be more Pythonic:
def match_loop(seq, other):
    count = 0
    for this, that in zip(seq, other):
        if this != that:
            return count
        count += 1
    return count
This times (using the same method as above) at 0.8548871780512854 for me, significantly faster than any other method, while still being readable. The speed-up is probably because the original version loops by index, which is generally slow in Python. I would, however, still go for the first version in this post, as I feel it is the most readable.
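A short usage sketch of that first version (Python 3), repeated here so the snippet runs on its own. Because while_equal yields the shared elements themselves, the same generator can also recover the common prefix, not just its length; the empty-input and no-match cases both give 0.

```python
def while_equal(seq, other):
    # Yield elements of seq for as long as both inputs agree
    for this, that in zip(seq, other):
        if this != that:
            return
        yield this


def match(seq, other):
    # Length of the common prefix
    return sum(1 for _ in while_equal(seq, other))


a = [0, 1, 2, 3, 4]
b = [0, 1, 2, 3, 4, 0]

print(match(a, b))              # → 5
print(list(while_equal(a, b)))  # → [0, 1, 2, 3, 4]
print(match([1, 2], [3, 4]))    # → 0 (first elements differ)
print(match([], [1, 2, 3]))     # → 0 (empty input)
```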
I used timeit to time small bits of code. I find that approach to be a little easier than using profile. (profile is good for finding bottlenecks, though.)

In general, itertools is pretty fast. In this case, however, your takewhile is going to slow things down, because it has to call a Python function (your lambda) for every element along the way. Each function call in Python carries a fair amount of overhead, so that might be slowing you down a bit (there is also the cost of creating the lambda function in the first place). Notice that sum with the generator expression also adds a little overhead. Ultimately, though, it appears that a basic loop consistently wins in this situation.
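As a sketch of where that per-element cost comes from, here is a variant that is not from either post and is not benchmarked here. It keeps takewhile but replaces the Python-level lambda with callables implemented in C: map(operator.eq, ...) compares the pairs and takewhile(bool, ...) stops at the first False, so summing the True values gives the count. This assumes Python 3 and elements whose == returns a plain bool (ints, strings and so on); the function names are hypothetical.

```python
import operator
from itertools import takewhile


def match_lambda(seq, other):
    # Calls a Python-level lambda once per pair
    return sum(1 for _ in takewhile(lambda pair: pair[0] == pair[1], zip(seq, other)))


def match_c_callables(seq, other):
    # operator.eq and bool are built-ins, so for built-in element types
    # no Python-level function is called per element;
    # map stops at the shorter input, like zip
    return sum(takewhile(bool, map(operator.eq, seq, other)))


a = [0, 1, 2, 3, 4]
b = [0, 1, 2, 3, 4, 0]
print(match_lambda(a, b), match_c_callables(a, b))  # → 5 5
```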
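# Python 2 code: itertools.izip and the print statement do not exist in Python 3
from itertools import takewhile, izip

def match_iter(self, other):
    # takewhile calls the lambda once per pair of elements
    return sum(1 for x in takewhile(lambda x: x[0] == x[1],
                                    izip(self, other)))

def match_loop(self, other):
    cmp = lambda x1,x2: x1 == x2  # created but never called in this version
    element = 0  # so that an empty input returns 0 instead of raising
    for element in range(min(len(self), len(other))):
        if self[element] == other[element]:
            element += 1
        else:
            break
    return element

def match_loop_lambda(self, other):
    cmp = lambda x1,x2: x1 == x2
    element = 0
    for element in range(min(len(self), len(other))):
        # same index-based loop, but compares through the lambda
        if cmp(self[element],other[element]):
            element += 1
        else:
            break
    return element

def match_iter_nosum(self,other):
    # takewhile without sum(): count the matching pairs manually
    element = 0
    for _ in takewhile(lambda x: x[0] == x[1],
                       izip(self, other)):
        element += 1
    return element

def match_iter_izip(self,other):
    # explicit loop over izip, no takewhile or lambda
    element = 0
    for x1,x2 in izip(self,other):
        if x1 == x2:
            element += 1
        else:
            break
    return element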
a = [0, 1, 2, 3, 4]
b = [0, 1, 2, 3, 4, 0]
import timeit
print timeit.timeit('match_iter(a,b)','from __main__ import a,b,match_iter')
print timeit.timeit('match_loop(a,b)','from __main__ import a,b,match_loop')
print timeit.timeit('match_loop_lambda(a,b)','from __main__ import a,b,match_loop_lambda')
print timeit.timeit('match_iter_nosum(a,b)','from __main__ import a,b,match_iter_nosum')
print timeit.timeit('match_iter_izip(a,b)','from __main__ import a,b,match_iter_izip')
Notice, however, that the fastest version is a hybrid of a loop and itertools: an explicit loop over izip, which also happens to be easier to read (in my opinion). So we can conclude from this that takewhile is the slow-ish part, not necessarily itertools in general.
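The benchmark above is Python 2 code: itertools.izip and the print statement were removed in Python 3, where the built-in zip is already lazy. A minimal Python 3 port of two of the functions, assuming you only want to rerun the takewhile-versus-loop comparison, might look like this. match_iter_zip is a hypothetical name for the ported izip loop, and number=100_000 (instead of timeit's default of 1,000,000) just keeps the run short; timings vary by machine and interpreter, so none are shown.

```python
import timeit
from itertools import takewhile


def match_iter(seq, other):
    # takewhile + lambda: one Python function call per pair
    return sum(1 for _ in takewhile(lambda x: x[0] == x[1], zip(seq, other)))


def match_iter_zip(seq, other):
    # explicit loop over zip, the Python 3 counterpart of the izip loop
    element = 0
    for x1, x2 in zip(seq, other):
        if x1 != x2:
            break
        element += 1
    return element


a = [0, 1, 2, 3, 4]
b = [0, 1, 2, 3, 4, 0]

for name in ('match_iter', 'match_iter_zip'):
    # globals=globals() lets the timed statement see a, b and the functions
    print(name, timeit.timeit(f'{name}(a, b)', globals=globals(), number=100_000))
```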