Question
I need to find all the different intersections between two partitions of the same set. For example, given the following two partitions of the same set:
x = [[1, 2], [3, 4, 5], [6, 7, 8, 9, 10]]
y = [[1, 3, 6, 7], [2, 4, 5, 8, 9, 10]]
the required result is
[[1], [2], [3], [4, 5], [6, 7], [8, 9, 10]].
In detail, we take the Cartesian product of the subsets of x with the subsets of y, and for each pair of subsets we collect the elements that belong to both into a new subset.
What is the most efficient / most Pythonic way to do this? Thanks in advance!
PERFORMANCE COMPARISON OF THE CURRENT ANSWERS:
import numpy as np

def partitioning(alist, indices):
    # Split alist into consecutive chunks at the given cut positions.
    return [alist[i:j] for i, j in zip([0]+indices, indices+[None])]

total = 1000
sample1 = np.sort(np.random.choice(total, int(total/10), replace=False))
sample2 = np.sort(np.random.choice(total, int(total/2), replace=False))
a = partitioning(np.arange(total), list(sample1))
b = partitioning(np.arange(total), list(sample2))
def partition_decomposition_product_1(x, y):
    out = []
    for sublist1 in x:
        d = {}
        for val in sublist1:
            for i, sublist2 in enumerate(y):
                if val in sublist2:
                    d.setdefault(i, []).append(val)
        out.extend(d.values())
    return out

def partition_decomposition_product_2(x, y):
    all_s = []
    for sx in x:
        for sy in y:
            ss = list(filter(lambda x:x in sx, sy))
            if ss:
                all_s.append(ss)
    return all_s

def partition_decomposition_product_3(x, y):
    return [np.intersect1d(i,j) for i in x for j in y]
Measuring the execution time with IPython's %timeit:
%timeit partition_decomposition_product_1(a, b)
%timeit partition_decomposition_product_2(a, b)
%timeit partition_decomposition_product_3(a, b)
we find
2.16 s ± 246 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
620 ms ± 84.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.03 s ± 111 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Thus, the second solution is the fastest of the three.
Answer 1:
The fact that the two lists are partitions of the same set is not relevant to the choice of algorithm. The task boils down to iterating over two lists of lists and computing the intersection of each pair of sublists. (You can add an assertion at the beginning of the function to check that both lists partition the same set, using this answer to flatten the lists efficiently.) With this in mind, the following function accomplishes the task, using this answer to calculate the list intersection:
def func2(x, y):
    # check that they partition the same set
    checkx = sorted([item for sublist in x for item in sublist])
    checky = sorted([item for sublist in y for item in sublist])
    assert checkx == checky
    # get all intersections
    all_s = []
    for sx in x:
        for sy in y:
            ss = list(filter(lambda x:x in sx, sy))
            if ss:
                all_s.append(ss)
    return all_s
Then, using this time comparison method, we can see that this new function is roughly 100x faster than your original implementation.
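In the function above, the `filter` call scans the list `sx` once for every element of `sy`. As a minimal sketch, assuming all elements are hashable, each `sx` could be converted to a set once so that every membership check takes constant time on average. The name `intersections_with_sets` is hypothetical and not part of the answer above; no timing is claimed for it here.

```python
def intersections_with_sets(x, y):
    """Return the non-empty intersections of every (sx, sy) pair.

    Assumes the elements are hashable. The order inside each
    intersection follows the order of the elements in sy.
    """
    result = []
    for sx in x:
        sx_set = set(sx)  # built once per subset of x
        for sy in y:
            common = [item for item in sy if item in sx_set]
            if common:  # skip pairs that do not overlap
                result.append(common)
    return result


x = [[1, 2], [3, 4, 5], [6, 7, 8, 9, 10]]
y = [[1, 3, 6, 7], [2, 4, 5, 8, 9, 10]]
print(intersections_with_sets(x, y))
# [[1], [2], [3], [4, 5], [6, 7], [8, 9, 10]]
```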
Answer 2:
I'm not sure if I understand you correctly, but this script produces the result you have in your question:
x = [[1, 2], [3, 4, 5], [6, 7, 8, 9, 10]]
y = [[1, 3, 6, 7], [2, 4, 5, 8, 9, 10]]
out = []
for sublist1 in x:
    d = {}
    for val in sublist1:
        for i, sublist2 in enumerate(y):
            if val in sublist2:
                d.setdefault(i, []).append(val)
    out.extend(d.values())
print(out)
Prints:
[[1], [2], [3], [4, 5], [6, 7], [8, 9, 10]]
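Because y is a partition, each value belongs to exactly one subset of y, so the innermost loop over y could in principle be replaced by a single dictionary lookup. The following is only a sketch of that idea, assuming hashable elements and that x and y really cover the same set; the names `partition_intersections` and `block_of` are illustrative and do not come from the script above.

```python
def partition_intersections(x, y):
    # Map every element to the index of the subset of y that contains it.
    # Assumption: y is a partition, so each element appears exactly once.
    block_of = {val: i for i, sublist in enumerate(y) for val in sublist}

    out = []
    for sublist1 in x:
        groups = {}
        for val in sublist1:
            # Raises KeyError if val is missing from y, i.e. if x and y
            # do not partition the same set.
            groups.setdefault(block_of[val], []).append(val)
        out.extend(groups.values())
    return out


x = [[1, 2], [3, 4, 5], [6, 7, 8, 9, 10]]
y = [[1, 3, 6, 7], [2, 4, 5, 8, 9, 10]]
print(partition_intersections(x, y))
# [[1], [2], [3], [4, 5], [6, 7], [8, 9, 10]]
```

With this layout each element of x and y is visited once, rather than once per subset of y.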
Answer 3:
I may be missing some details, but it seems a bit too easy:
[np.intersect1d(a,b) for a in x for b in y]
Output:
[array([1]),
array([2]),
array([3]),
array([4, 5]),
array([6, 7]),
array([ 8, 9, 10])]
The above includes duplicates. For example, x=[[1,2,3],[1,4,5]] and y=[[1,6,7]] would give [[1],[1]].
If you want to find the unique intersections:
[list(i) for i in {tuple(np.intersect1d(a,b)) for a in x for b in y}]
Output:
[[8, 9, 10], [6, 7], [1], [4, 5], [2], [3]]
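Note that `np.intersect1d` returns an empty array when two subsets do not overlap, which does not happen in the example above but can happen for other partitions. A small sketch of how those empty results could be dropped, assuming NumPy is installed; the helper name `nonempty_intersections` is hypothetical.

```python
import numpy as np


def nonempty_intersections(x, y):
    # np.intersect1d returns the sorted, unique common values of two arrays.
    pairs = (np.intersect1d(a, b) for a in x for b in y)
    # Keep only pairs that overlap and convert each result to a plain list.
    return [common.tolist() for common in pairs if common.size > 0]


x = [[1, 2], [3, 4, 5]]
y = [[1, 2, 3], [4, 5]]
print(nonempty_intersections(x, y))
# [[1, 2], [3], [4, 5]]
```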
Source: https://stackoverflow.com/questions/61647198/pythonic-and-efficient-way-to-find-all-the-different-intersections-between-two-p