If I have a function to, for example, check if list1 is a sublist of list2, which option is better:
Option 1:
Storing the value in a local variable is better, because the local name lookup Python does is faster. It even pays off to store functions locally.
Performance questions are best answered by timing them - you can measure this using timeit:
import timeit

def noTempFunc():
    for _ in range(200):
        max([1, 4, 5, 6])

def tempFunc():
    m = max
    for _ in range(200):
        m([1, 4, 5, 6])

print(timeit.timeit(noTempFunc, number=1000))  # 0.055301458000030834
print(timeit.timeit(tempFunc, number=1000))    # 0.049811941999905684 : ~11% faster
In this case the global max() only has to be looked up once; every further lookup goes through the local name, which is ~11% faster based on these numbers. It pays off if you use your local m() multiple times.
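If you want to see where that difference comes from, the dis module makes it visible (just a sketch, reusing the two functions above): the plain version does a global name lookup for max on every iteration, while the cached version pays for one global lookup up front and then uses a cheaper local lookup for m inside the loop.

import dis

dis.dis(noTempFunc)  # the loop body contains a LOAD_GLOBAL for max
dis.dis(tempFunc)    # the loop body contains a LOAD_FAST for m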
In your case it would be sensible to cache len_list1 = len(list1), since it is used many times and does not change.
To make it more readable, you can consider:
def isSublist(list1, list2):
    """Checks if list2 is a (contiguous) sublist of list1."""
    len_part = len(list2)  # reused inside the generator expression, only calculated once
    return any(x == list2
               for x in (list1[i:i + len_part]
                         for i in range(len(list1) - len_part + 1)))

print(isSublist([1, 2, 3, 4], [1]))
print(isSublist([1, 2, 3, 4], [2, 3]))
print(isSublist([1, 2, 3, 4], [1, 2, 4]))
print(isSublist([1, 2, 3, 4], [1, 2, 3, 4]))
Output:
True
True
False
True
Lookups:
Your faster version, with the length cached as well (based on Scott Mermelstein's answer):
def isSublist1a(list1, list2):  # cached length as well
    l1 = len(list1)
    for i in range(len(list2)):
        part = list2[i:]
        if len(part) < l1:
            return False
        if list1 == part[:l1]:
            return True
    return False
list1 = list(range(1000))
list2 = list(range(400, 420))

import timeit
# isSublist1 and isSublist2 are the original versions from Scott Mermelstein's answer below
print(timeit.timeit('isSublist1(list2, list1)', globals=globals(), number=1000))
print(timeit.timeit('isSublist1a(list2, list1)', globals=globals(), number=1000))
print(timeit.timeit('isSublist2(list2, list1)', globals=globals(), number=1000))
delivers (two separate runs):
0.08652938600062043  # cached part
0.08017484299944044  # cached part + cached list1 length - slightly faster
0.15090413599955355  # non-cached version

0.8882850420004615   # cached part
0.8294611960000111   # cached part + cached list1 length - slightly faster
1.5524438030006422   # non-cached version
Supplementary to Patrick's excellent answer, let's try timeit with your actual code:
>>> def isSublist1(list1,list2):
... for i in range(len(list2)):
... part=list2[i:]
... if len(part)<len(list1):
... return False
... if list1==part[:len(list1)]:
... return True
... return False
...
>>> def isSublist2(list1,list2):
... for i in range(len(list2)):
... if len(list2[i:])<len(list1):
... return False
... if list1==list2[i:][:len(list1)]:
... return True
... return False
...
>>> list1=list(range(10000))
>>> list2=list(range(4000,4020))
>>> import timeit
>>> timeit.timeit('isSublist1(list2,list1)', globals=globals(),number=100)
6.420147094002459
>>> timeit.timeit('isSublist2(list2,list1)', globals=globals(),number=100)
12.455138996010646
So on my system, the amount of time needed with your temporary variable is just about half the time needed without it.
I don't know the nature of your lists and sublists; you may want to change what list1 and list2 are to better reflect how your code will actually be used, but at least on my machine, saving that temporary variable looks like a very good idea.
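If it helps, here is a minimal sketch of re-running the same comparison with list shapes that match your real data (the compare helper and the example shapes are purely illustrative, not part of the original code):

import timeit

def compare(needle, haystack, number=100):
    # The candidate sublist goes first, matching the isSublist1(list2, list1) calls above.
    for fn in (isSublist1, isSublist2):
        t = timeit.timeit(lambda: fn(needle, haystack), number=number)
        print(fn.__name__, t)

compare(list(range(4000, 4020)), list(range(10000)))  # short sublist, long list
compare(list(range(50)), list(range(100)))            # similarly sized lists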
Incidentally, let's do another interesting experiment:
>>> def isSublist3(list1,list2):
... ln = len(list1)
... for i in range(len(list2)):
... part=list2[i:]
... if len(part)<ln:
... return False
... if list1==part[:ln]:
... return True
... return False
...
>>> timeit.timeit('isSublist1(list2,list1)',globals=globals(),number=100); timeit.timeit('isSublist3(list2,list1)',globals=globals(),number=100)
6.549526696035173
6.481004184985068
I ran it a few more times to see what I'd get (isSublist1 first, then isSublist3 on each line):
6.470875242026523 6.463623657007702
6.151073662971612 5.787795798969455
5.685607994964812 5.655005165026523
6.439315696014091 6.372227535001002
Note that each time, the cached length takes less time than the non-cached length, though you don't get nearly the performance improvement you had by caching the slicing.
Note also that it's important not to draw too many conclusions from a single run of timeit. There are a lot of other variables affecting the timing (in my case, pretty clearly, something happened to make it drop from 6.4 to 5.7 in the middle of a single test!), so if you want a rule you can count on, test several times and make sure you get consistent results.
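One convenient way to do that (a small sketch; repeat=5 and number=100 are arbitrary choices) is timeit.repeat, which runs the same measurement several times in one go; the minimum is usually the most stable figure, since background noise only ever adds time:

import timeit

times = timeit.repeat('isSublist3(list2, list1)',
                      globals=globals(), number=100, repeat=5)
print(min(times), times)  # the minimum is least affected by other load on the machine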