Speed differences between intersection() and 'object for object in set if object in other_set'

谎友^ 2021-02-09 10:02

Which one of these is faster? Is one "better"? Basically I'll have two sets and I want to eventually get one match from between the two lists. So really I suppose th

4 Answers
  • 2021-02-09 10:45

    Your code is fine. The membership test 'object in other_set' is quite efficient when other_set is a set.
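As a minimal sketch of that idea for the "one match" case: the helper below iterates over the smaller set and probes the larger one, stopping at the first hit. The name first_common and the sample data are illustrative assumptions, not part of the original question.

```python
def first_common(a, b):
    """Return one element present in both sets, or None if there is none."""
    # Iterate over the smaller set and probe the larger one; each probe is
    # an average-case O(1) hash lookup.
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    return next((item for item in small if item in large), None)


users = {"ann", "bob", "cy"}
admins = {"zed", "bob"}
match = first_common(users, admins)
print(match)
```

The generator expression is lazy, so no intermediate collection of matches is built.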

  • 2021-02-09 10:46

    I realize this is an older post, but I arrived here looking for a performance comparison of intersection versus in, and thought it would be worth adding more information. The answers above were great, but they left me unclear about the best path forward.

    The "first result" solution doesn't solve for my use case specifically.

    Instead, I wanted to know how the different implementations perform when each one produces the identical result set, not just the first intersected value. Below I've included code that evaluates the options with a 1000 loop test. Contrary to what @agf posted, using sets is far faster when the desired output is a list of matches.

    My results were:

    runForin took 132851.600ms
    runForinBlist took 37700.916ms
    True
    runForInListComp took 132783.147ms
    True
    runForinSet took 780.919ms
    True
    runSetIntersection took 760.980ms (WINNER)
    True
    runSetin took 771.921ms
    True
    
    

    Here's the code. Hope it helps someone. Note: I also evaluated the blist (http://stutzbachenterprises.com/blist/blist.html) library as it performs quite well in other use cases.

    import time
    from random import sample, shuffle
    from blist import blist
    
    a = list(range(100000))  # a real list, so a.reverse() also works on Python 3
    aBlist = blist(a)
    
    b = sample(a, 1000)
    a.reverse()
    
    def print_timing(func):
        def wrapper(*arg):
            t1 = time.time()
            res = func(*arg)
            t2 = time.time()
            print('%s took %0.3fms' % (func.__name__, (t2-t1)*1000.0))
            return res
        return wrapper
    
    
    def forIn():
        ret = []
        for obj in b:
            if obj in a:
                ret.append(obj)
        return ret
    
    def forInBlist():
        ret = []
        for obj in b:
            if obj in aBlist:
                ret.append(obj)
        return ret
    
    
    def forInListComp():
        return [value for value in b if value in a] 
    
    
    def forInSet():
        ret = []
        aSet = set(a)  # build the set once, not on every loop iteration
        for obj in b:
            if obj in aSet:
                ret.append(obj)
        return ret
    
    
    def setIntersection(): 
        return set(a).intersection(b) 
    
    
    def setIn():
        return list(set(a) & set(b))
    
    
    @print_timing
    def runForIn(times):
        for i in range(times):
            ret = forIn()
        return ret
            
    @print_timing
    def runForInBlist(times):
        for i in range(times):
            ret = forInBlist()
        return ret
    
    @print_timing
    def runForInListComp(times):
        for i in range(times):
            ret = forInListComp()
        return ret
    
    @print_timing
    def runForInSet(times):
        for i in range(times):
            ret = forInSet()
        return ret
    
    @print_timing
    def runSetIntersection(times):
        for i in range(times):
            ret = setIntersection()
        return ret
    
    @print_timing
    def runSetIn(times):
        for i in range(times):
            ret = setIn()
        return ret
    
    def checkResults(results):
        # Compare every result with the first one, ignoring order.
        master = None
        for resultSet in results:
            current = sorted(resultSet)
            if master is None:
                master = current
                continue
            if master != current:
                print(resultSet)
                return False
        return True
    
    iterations = 5
    results = []
    runForInResults = runForIn(iterations)
    results.append(runForInResults)
    
    runForInBlistResults = runForInBlist(iterations)
    results.append(runForInBlistResults)
    print(checkResults(results))
    
    runForInListCompResults = runForInListComp(iterations)
    results.append(runForInListCompResults)
    print(checkResults(results))
    
    runForInSetResults = runForInSet(iterations)
    results.append(runForInSetResults)
    print(checkResults(results))
    
    runSetIntersectionResults = runSetIntersection(iterations)
    results.append(runSetIntersectionResults)
    print(checkResults(results))
    
    runSetInResults = runSetIn(iterations)
    results.append(runSetInResults)
    print(checkResults(results))
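A more compact way to run a comparison like this is the standard timeit module. The sketch below is a variant written under assumptions, not the code that produced the numbers above: it drops blist, uses smaller inputs so it finishes quickly, and all function names are illustrative.

```python
import timeit
from random import sample

# Smaller sizes than the benchmark above so the sketch finishes quickly.
haystack = list(range(20000))
needles = sample(haystack, 200)


def matches_list_scan():
    # Each "in" test scans the list: O(len(haystack)) per lookup.
    return [x for x in needles if x in haystack]


def matches_set_lookup():
    # Build the set once, then each lookup is O(1) on average.
    haystack_set = set(haystack)
    return [x for x in needles if x in haystack_set]


def matches_intersection():
    return list(set(haystack).intersection(needles))


candidates = [matches_list_scan, matches_set_lookup, matches_intersection]

# All three approaches must agree before their timings mean anything.
reference = sorted(candidates[0]())
all_agree = all(sorted(fn()) == reference for fn in candidates)
print("results agree:", all_agree)

for fn in candidates:
    seconds = timeit.timeit(fn, number=3)
    print("%-22s %8.2f ms" % (fn.__name__, seconds * 1000.0))
```

Absolute timings will vary by machine and Python version; the relative gap between the list scan and the two set-based variants is what the sketch is meant to expose.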
    
  • 2021-02-09 10:48

    I wrote a simple utility that checks if two sets have at least one element in common. I had the same optimization problem today and your post saved my day. This is just a way to thank you for pointing this out, hope this will help other people too :)

    Note: the utility does NOT return the first common element. It returns True if the two sets have at least one element in common and False otherwise. It can easily be adapted to return the element instead.

    def nonEmptyIntersection(A, B):
        """
        Returns true if set A intersects set B.
        """
        smaller, bigger = A, B
        if len(B) < len(A):
            smaller, bigger = bigger, smaller
        for e in smaller:
            if e in bigger:
                return True
        return False
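For comparison, Python's built-in set type offers isdisjoint(), which answers the same yes/no question. The wrapper below is a minimal sketch; the name have_common_element is an assumption chosen for illustration.

```python
def have_common_element(a, b):
    """Return True if sets a and b share at least one element."""
    # isdisjoint() is True when the two collections have no element in common,
    # so its negation means "the intersection is non-empty".
    return not a.isdisjoint(b)


print(have_common_element({1, 2, 3}, {3, 4}))   # True
print(have_common_element({1, 2, 3}, {4, 5}))   # False
print(have_common_element(set(), {1}))          # False
```

Like the hand-written loop, this avoids building the full intersection just to test whether it is empty.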
    
  • 2021-02-09 10:57

    from timeit import timeit
    
    setup = """
    from random import sample, shuffle
    a = list(range(100000))  # a real list, so a.reverse() also works on Python 3
    b = sample(a, 1000)
    a.reverse()
    """
    
    forin = setup + """
    def forin():
        # a = set(a)
        for obj in b:
            if obj in a:
                return obj
    """
    
    setin = setup + """
    def setin():
        # original method:
        # return tuple(set(a) & set(b))[0]
        # suggested in comment, doesn't change conclusion:
        return next(iter(set(a) & set(b)))
    """
    
    print(timeit("forin()", forin, number = 100))
    print(timeit("setin()", setin, number = 100))
    

    Times:

    >>>
    0.0929054012768
    0.637904308732
    >>>
    0.160845057616
    1.08630760484
    >>>
    0.322059185123
    1.10931801261
    >>>
    0.0758695262169
    1.08920981403
    >>>
    0.247866360526
    1.07724461708
    >>>
    0.301856152688
    1.07903130641
    

    Converting both to sets in the setup and using 10000 runs instead of 100 yields:

    >>>
    0.000413064976328
    0.152831597075
    >>>
    0.00402408388788
    1.49093627898
    >>>
    0.00394538156695
    1.51841512101
    >>>
    0.00397715579584
    1.52581949403
    >>>
    0.00421472926155
    1.53156769646
    

    So your version, the loop that returns at the first match, is much faster whether or not the inputs are converted to sets first.
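The same first-match comparison can be sketched for Python 3 by passing callables to timeit directly. This is an illustrative variant under assumptions (both inputs are already sets, and the function names are made up here), not the script that produced the timings above.

```python
import timeit
from random import sample

a = set(range(100000))
b = set(sample(range(100000), 1000))


def first_match_loop():
    # Stops at the first hit, so it usually performs very few lookups.
    for obj in b:
        if obj in a:
            return obj
    return None


def first_match_intersection():
    # Builds the complete intersection before taking one element from it.
    return next(iter(a & b), None)


for fn in (first_match_loop, first_match_intersection):
    print("%-26s %.4f s" % (fn.__name__, timeit.timeit(fn, number=1000)))
```

The two functions may return different elements, since set iteration order is arbitrary; both results are valid members of the intersection.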
