Pairwise Set Intersection in Python

南旧 2021-02-04 06:47

If I have a variable number of sets (let's call the number n), each with at most m elements, what's the most efficient way to calculate the pairwise intersections of all pairs of sets?

3 answers
  • 2021-02-04 07:20

    If we can assume that the input sets are sorted, a pseudo-mergesort approach seems promising. Treat each set as a sorted stream and advance the streams in parallel, always advancing only those whose current value is the lowest among all current iterators. Each time you find the current minimum, every stream whose current value equals it contains that element, so add the element to the intersection of each pair of those streams.
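    A minimal sketch of that idea, assuming the set elements are mutually comparable (so each set can be sorted first). The function name `pairwise_intersections_merge` is hypothetical, and `heapq` is used here simply as one convenient way to find the streams holding the current minimum:

```python
import heapq
from itertools import combinations

_DONE = object()  # sentinel marking an exhausted stream


def pairwise_intersections_merge(sets):
    """Hypothetical helper: {(i, j): common elements} for every i < j.

    Assumes all elements are mutually comparable so each set can be sorted.
    """
    result = {pair: set() for pair in combinations(range(len(sets)), 2)}
    iters = [iter(sorted(s)) for s in sets]

    # Heap of (current head value, stream index): one entry per live stream.
    heap = []
    for i, it in enumerate(iters):
        value = next(it, _DONE)
        if value is not _DONE:
            heap.append((value, i))
    heapq.heapify(heap)

    while heap:
        current_min = heap[0][0]
        # Collect every stream whose head equals the current minimum.
        holders = []
        while heap and heap[0][0] == current_min:
            _, i = heapq.heappop(heap)
            holders.append(i)
        holders.sort()
        # All of these streams share current_min, so it belongs to each pair.
        for pair in combinations(holders, 2):
            result[pair].add(current_min)
        # Advance only the streams that held the minimum.
        for i in holders:
            value = next(iters[i], _DONE)
            if value is not _DONE:
                heapq.heappush(heap, (value, i))
    return result


print(pairwise_intersections_merge([{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]))
```

    Each element is popped once per set containing it, so the extra work is proportional to the number of (pair, shared element) matches rather than to n(n-1)/2 full set comparisons.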

  • 2021-02-04 07:28

    This should do what you want:

    import random as RND
    import string
    import itertools as IT
    

    Mock some data:

    fnx = lambda: set(RND.sample(string.ascii_uppercase, 7))
    S = [fnx() for _ in range(5)]
    

    Generate a list of indices into S so that the sets can be referenced more concisely below:

    idx = range(len(S))
    

    Get all unique pairs of indices into S. Since set intersection is commutative, we want combinations rather than permutations:

    pairs = IT.combinations(idx, 2)
    

    Write a function to perform the set intersection:

    nt = lambda a, b: S[a].intersection(S[b])
    

    Map this function over the pairs and key each result to its arguments:

    res = {t: nt(*t) for t in pairs}
    

    The result, formatted per the first option listed in the OP, is a dictionary whose values are the intersections of two sets; each value is keyed to a tuple of the two indices of those sets.

    This solution is really just two lines of code: (i) compute the combinations; (ii) apply a function to each combination, storing the returned value in a key-value container (a dict).
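    To make the key/value layout concrete, here is the same two-step recipe run on three small fixed sets (the sets are made up for illustration; the random mock data above would give different output):

```python
from itertools import combinations

# Illustrative fixed data instead of random samples
S = [{"A", "B", "C"}, {"B", "C", "D"}, {"C", "D", "E"}]

# Step (i): all unordered pairs of indices into S
pairs = combinations(range(len(S)), 2)
# Step (ii): intersect each pair, keyed by its index tuple
res = {(a, b): S[a] & S[b] for a, b in pairs}

for key in sorted(res):
    print(key, sorted(res[key]))
# (0, 1) ['B', 'C']
# (0, 2) ['C']
# (1, 2) ['C', 'D']
```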

    The dict version keeps every intersection in memory at once. If you only need to process the results one at a time, you can use a generator expression in the last step instead, i.e.:

    res = ((t, nt(*t)) for t in IT.combinations(idx, 2))  # fresh iterator: pairs was already consumed by the dict above
    

    Notice that with this approach neither the sequence of pairs nor the corresponding intersections are materialized in memory up front, i.e. both the pair sequence and res are iterators, so each intersection is computed only when res is consumed.
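    One caveat with the lazy version: both the combinations object and the generator are one-shot iterators, so they can be consumed only once. A small illustration with made-up sets:

```python
import itertools as IT

S = [{1, 2}, {2, 3}, {2, 4}]  # illustrative data
nt = lambda a, b: S[a].intersection(S[b])

pairs = IT.combinations(range(len(S)), 2)
res = ((t, nt(*t)) for t in pairs)  # nothing has been computed yet

first_pass = list(res)   # intersections are computed here, one pair at a time
second_pass = list(res)  # the generator is already exhausted

print(len(first_pass), len(second_pass))  # 3 0
```

    If you need the results more than once, either rebuild the generator from a new `IT.combinations(...)` call or store them in the dict form.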

  • 2021-02-04 07:29

    How about using the set intersection method? See below:

    A={"a","b","c"}
    B={"c","d","e"}
    C={"a","c","e"}
    
    intersect_AB = A.intersection(B)
    intersect_BC = B.intersection(C)
    intersect_AC = A.intersection(C)
    
    print(intersect_AB, intersect_BC, intersect_AC)
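    Writing each pair out by hand works for three sets, but the number of pairs grows as n(n-1)/2. A sketch that generalizes the same idea to any number of named sets, assuming they are kept in a dict (the `named_sets` name is just for illustration):

```python
from itertools import combinations

named_sets = {
    "A": {"a", "b", "c"},
    "B": {"c", "d", "e"},
    "C": {"a", "c", "e"},
}

# One intersection per unordered pair of names;
# & is the operator form of set.intersection()
intersections = {
    (x, y): named_sets[x] & named_sets[y]
    for x, y in combinations(named_sets, 2)
}

for pair, common in intersections.items():
    print(pair, sorted(common))
# ('A', 'B') ['c']
# ('A', 'C') ['a', 'c']
# ('B', 'C') ['c', 'e']
```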
    