Should I store a calculation in a variable if it will be used a lot?

后端 未结 2 453
死守一世寂寞
死守一世寂寞 2021-01-22 00:42

If I have a function to, for example, check if list1 is a sublist of list2, which option is better:

Option 1:



        
相关标签:
2条回答
  • 2021-01-22 01:18

    Storing local is better because the lookup that python does is faster. It even pays off to store functions locally.

    Performance questions are best answered by timing them - you can measure this using timeit:

    import timeit
    
    def noTempFunc():
        for _ in range(200):
            max([1,4,5,6])
    
    def tempFunc():
        m = max
        for _ in range(200):
            m([1,4,5,6])
    
    
    print(timeit.timeit(noTempFunc, number=1000))   #  0.055301458000030834
    print(timeit.timeit(tempFunc, number=1000))     #  0.049811941999905684 : 11% faster
    

    In this case the max() of the global context only has to be looked up once, and further lookups are done locally which is ~ 11% faster based on these numbers.

    It pays up if you use your local m() multiple times.


    In your case it would be sensible to cache the len_list1 = len(list1) - as it is used lots of time and does not change.

    To make it more readable, you can consider:

    def isSublist(list1, list2):
        """Checks if list2 is a sublist of list1"""
        len_part = len(list2)  # reused inside the list comp, only "calulated" once
        return any( x == list2 for x in (list1[i:i+len_part] 
                                         for i in range(len(list1)-len_part+1) ))
    
    
    print(isSublist([1,2,3,4],[1]))
    print(isSublist([1,2,3,4],[2,3]))
    print(isSublist([1,2,3,4],[1,2,4]))
    print(isSublist([1,2,3,4],[1,2,3,4]))
    

    Output:

    True
    True
    False
    True
    

    Lookups:

    • any() && How exactly does the python any() function work?
    • Split a python list into other "sublists" i.e smaller lists

    Your faster version with cached length as well (based on Scott Mermelstein answer):

    def isSublist1a(list1,list2): # cached length as well
        l1 = len(list1)
        for i in range(len(list2)):
            part=list2[i:]
            if len(part)<l1:
                return False
            if list1==part[:l1]:
                return True
        return False
    
    list1=list(range(1000))
    list2=list(range(400,420))
    import timeit
    print(timeit.timeit('isSublist1(list2,list1)', globals=globals(),number=1000))
    print(timeit.timeit('isSublist1a(list2,list1)', globals=globals(),number=1000))
    print(timeit.timeit('isSublist2(list2,list1)', globals=globals(),number=1000))
    

    delivers (2 executions):

    0.08652938600062043  # cached part
    0.08017484299944044  # cached part + cached list1 lenght - slightly faster
    0.15090413599955355  # non-cached version
    
    0.8882850420004615   # cached part
    0.8294611960000111   # cached part + cached list1 lenght - slightly faster
    1.5524438030006422   # non-cached version
    
    0 讨论(0)
  • 2021-01-22 01:18

    Supplementary to Patrick's excellent answer, let's try timeit with your actual code:

    >>> def isSublist1(list1,list2):                                                                                                                                                                                                                              
    ...   for i in range(len(list2)):                                                                                                                                                                                                                             
    ...     part=list2[i:]                                                                                                                                                                                                                                        
    ...     if len(part)<len(list1):                                                                                                                                                                                                                              
    ...       return False                                                                                                                                                                                                                                        
    ...     if list1==part[:len(list1)]:
    ...       return True                                                                                                                                                                                                                                         
    ...   return False                                                                                                                                                                                                                                            
    ... 
    >>> def isSublist2(list1,list2):
    ...   for i in range(len(list2)):
    ...     if len(list2[i:])<len(list1):
    ...       return False
    ...     if list1==list2[i:][:len(list1)]:
    ...       return True
    ...   return False
    ... 
    >>> list1=list(range(10000))
    >>> list2=list(range(4000,4020))
    >>> import timeit
    >>> timeit.timeit('isSublist1(list2,list1)', globals=globals(),number=100)
    6.420147094002459
    >>> timeit.timeit('isSublist2(list2,list1)', globals=globals(),number=100)
    12.455138996010646
    

    So on my system, the amount of time needed with your temporary variable is just about half the time needed without it.

    I don't know the nature of your lists and sublists; you may want to change what list1 and list2 are to be a good reflection of how your code will be used, but at least on my side, it looks like saving that temporary variable is a very good idea.

    Incidentally, let's do another interesting experiment:

    >>> def isSublist3(list1,list2):
    ...   ln = len(list1)
    ...   for i in range(len(list2)):
    ...     part=list2[i:]
    ...     if len(part)<ln:
    ...       return False
    ...     if list1==part[:ln]:
    ...       return True
    ...   return False
    ... 
    >>> timeit.timeit('isSublist1(list2,list1)',globals=globals(),number=100); timeit.timeit('isSublist3(list2,list1)',globals=globals(),number=100)
    6.549526696035173
    6.481004184985068
    

    I ran it a few more times to see what I'd get:

    6.470875242026523 6.463623657007702

    6.151073662971612 5.787795798969455

    5.685607994964812 5.655005165026523

    6.439315696014091 6.372227535001002

    Note that each time, the cached length takes less time than the non-cached length, though you don't get nearly the performance improvement you had by caching the slicing.

    Note also that it's important not to draw too many conclusions from a single run of timeit. There are a lot of other variables affecting the timing, (in my case, pretty clearly, something happened to make it drop from 6.4 to 5.7 - in the middle of a single test!) so if you want to come up with a good rule you can count on, test several times to make sure you get consistent results.

    0 讨论(0)
提交回复
热议问题