Pairwise Set Intersection in Python

南旧 2021-02-04 06:47

If I have a variable number of sets (let's call the number n), which have at most m elements each, what's the most efficient way to calculate the pairwise in

3 Answers
  •  谎友^ (OP)
    2021-02-04 07:28

    this ought to do what you want

    import random as RND
    import string
    import itertools as IT
    

    mock some data

    fnx = lambda: set(RND.sample(string.ascii_uppercase, 7))
    S = [fnx() for c in range(5)]
    

    generate an index list of the sets in S so the sets can be referenced more concisely below

    idx = range(len(S))
    

    get all possible unique pairs of the items in S; however, since set intersection is commutative, we want the combinations rather than permutations

    pairs = IT.combinations(idx, 2)
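
    To make the combinations-versus-permutations point concrete, here is a small sketch that counts both for five indices. The names `unordered` and `ordered` are only illustrative and are not part of the answer's code.

```python
from itertools import combinations, permutations

idx = range(5)  # indices of five hypothetical sets

# combinations: each unordered pair appears once, e.g. (0, 1) but not (1, 0)
unordered = list(combinations(idx, 2))

# permutations: every ordered pair appears, so (0, 1) and (1, 0) are both present
ordered = list(permutations(idx, 2))

print(len(unordered))  # 10, i.e. 5 * 4 / 2
print(len(ordered))    # 20, i.e. 5 * 4
```

    Because S[a] & S[b] equals S[b] & S[a], using permutations would compute every intersection twice.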
    

    write a function to perform the set intersection

    nt = lambda a, b: S[a].intersection(S[b])
    

    map this function over the pairs and key the result of each call to its arguments

    res = dict([ (t, nt(*t)) for t in pairs ])
    

    the result, formatted per the first option recited in the OP, is a dictionary in which each value is the intersection of two sets, keyed to a tuple of the two indices of those sets in S

    this solution is really just two lines of code: (i) calculate the combinations; (ii) apply some function over each combination, storing the returned value in a key-value container
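
    The two steps can be sketched as one self-contained function. This is a minimal variant, not the answer's exact code: the name `pairwise_intersections` and the sample data are made up for illustration, and it uses the `&` operator in place of `.intersection()`.

```python
from itertools import combinations

def pairwise_intersections(sets):
    """Map each index pair (i, j) with i < j to sets[i] & sets[j]."""
    return {
        (i, j): sets[i] & sets[j]
        for i, j in combinations(range(len(sets)), 2)
    }

# small hand-made input instead of random data, so the result is predictable
S = [{"A", "B", "C"}, {"B", "C", "D"}, {"C", "E"}]
res = pairwise_intersections(S)

for pair, common in res.items():
    print(pair, sorted(common))
```

    For n sets this visits n(n-1)/2 pairs, and each intersection takes time roughly proportional to the smaller of the two sets on average.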

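    the dictionary above holds all n(n-1)/2 intersections at once; if you only need to iterate over the results, you can reduce the memory footprint by using a generator expression in the last step (rebuild pairs first, because building the dictionary consumes that iterator), ie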

    res = ( (t, nt(*t)) for t in pairs )
    

    notice that with this approach, neither the sequence of pairs nor the corresponding intersections are held in memory all at once; both pairs and res are lazy iterators, so each can be traversed only once.
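
    A short sketch of the lazy version, assuming a hypothetical generator function `iter_pairwise_intersections` (not part of the answer) and hand-made sample sets. It also shows the one-pass behaviour of iterators:

```python
from itertools import combinations

def iter_pairwise_intersections(sets):
    """Yield ((i, j), sets[i] & sets[j]) one pair at a time."""
    for i, j in combinations(range(len(sets)), 2):
        yield (i, j), sets[i] & sets[j]

S = [{"A", "B", "C"}, {"B", "C", "D"}, {"C", "E"}]
gen = iter_pairwise_intersections(S)

first = next(gen)   # only the first intersection has been computed so far
rest = list(gen)    # the remaining pairs are computed here
again = list(gen)   # the generator is now exhausted, so this is empty

print(first)
print(len(rest), len(again))
```

    If the results need to be read more than once, either call the function again or store them in a dictionary as in the earlier step.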
