Pairwise Set Intersection in Python

南旧 2021-02-04 06:47

If I have a variable number of sets (let's call the number n), which have at most m elements each, what's the most efficient way to calculate the pairwise in

3 answers

谎友^ (OP)

2021-02-04 07:28
this ought to do what you want
```
import random as RND
import string
import itertools as IT
```
mock some data
```
fnx = lambda: set(RND.sample(string.ascii_uppercase, 7))
S = [fnx() for c in range(5)]
```
generate an index list of the sets in S so the sets can be referenced more concisely below
```
idx = range(len(S))
```
get all possible unique pairs of the items in S; however, since set intersection is commutative, we want the combinations rather than permutations
```
pairs = IT.combinations(idx, 2)
```
write a function to perform the set intersection
```
nt = lambda a, b: S[a].intersection(S[b])
```
fold this function over the pairs & key the result from each function call to its arguments
```
res = dict([ (t, nt(*t)) for t in pairs ])
```
the result, formatted per the first option recited in the OP, is a dictionary whose values are the intersections of two sets; each value is keyed by a tuple of the indices of those two sets in S
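
to make the shape of that dictionary concrete, here is a minimal sketch of the same steps on fixed data instead of random samples. the three sets in S are hypothetical values picked only so the output is predictable, and the lambda is replaced by the equivalent `&` operator
```python
import itertools as IT

# hypothetical fixed data, chosen so the result is easy to check by eye
S = [{"A", "B", "C"}, {"B", "C", "D"}, {"C", "D", "E"}]

# unordered index pairs: (0, 1), (0, 2), (1, 2)
pairs = IT.combinations(range(len(S)), 2)

# key each intersection by the index pair that produced it
res = {(a, b): S[a] & S[b] for a, b in pairs}

for key, common in res.items():
    print(key, sorted(common))
# (0, 1) ['B', 'C']
# (0, 2) ['C']
# (1, 2) ['C', 'D']
```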

this solution is really just two lines of code: (i) calculate the combinations; (ii) apply some function over each combination, storing the returned value in a key-value container
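
those two steps can be folded into one small function. this is only a sketch: the name `pairwise_intersections` and the sample data are made up for illustration. it also makes the cost visible: n sets give n*(n-1)/2 pairs, so the work grows quadratically with the number of sets
```python
import itertools as IT
import math


def pairwise_intersections(sets):
    """Map each index pair (i, j) with i < j to sets[i] & sets[j]."""
    return {
        (i, j): sets[i] & sets[j]
        for i, j in IT.combinations(range(len(sets)), 2)
    }


# hypothetical data: 5 sliding windows of 4 consecutive integers
sets = [set(range(k, k + 4)) for k in range(5)]
res = pairwise_intersections(sets)

# one entry per unordered pair of sets
print(len(res), math.comb(len(sets), 2))
```
pairs of sets with nothing in common still get an entry; their value is simply the empty set.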

the memory footprint of this solution is small, since pairs is an iterator and only the result dictionary is built in full; you can do even better by using a generator expression in the last step, i.e.
```
res = ( (t, nt(*t)) for t in pairs )
```
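notice that with this approach, neither the sequence of pairs nor the corresponding intersections are written out in memory; both pairs and res are iterators. one caveat: an iterator can only be consumed once, so if pairs was already used to build the dictionary above, recreate it with IT.combinations(idx, 2) before building the generator.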
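
a short sketch of what that laziness looks like in practice, assuming a small hypothetical S. each intersection is computed only when the generator is asked for it, and once the generator is exhausted it yields nothing more
```python
import itertools as IT

# hypothetical sample data
S = [{"A", "B"}, {"B", "C"}, {"A", "C"}]

pairs = IT.combinations(range(len(S)), 2)
res = ((t, S[t[0]] & S[t[1]]) for t in pairs)

first = next(res)      # computes only the intersection for pair (0, 1)
rest = list(res)       # computes the remaining intersections
leftover = list(res)   # empty: the generator has been used up

print(first)
print(rest)
print(leftover)
```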