问题
I want to efficiently find the intersection of two lists , keeping duplicates from both, e.g. A=[1,1,2,3], B=[1,1,2,4] should return [1,1,1,1,2]
I know a similar question was asked previously (Python intersection of two lists keeping duplicates) however this does not help me because only the duplicates from one list are retained.
The following works
def intersect(A,B):
C=[]
for a in A:
for b in B:
if a==b:
C.append(a)
return C
however it isn't efficient enough for what I'm doing! To speed things up I tried sorting the lists
def intersect(A,B):
A.sort()
B.sort()
C=[]
i=0
j=0
while i<len(A) and j<len(B):
if A[i]<=B[j]:
if A[i]==B[j]:
C.append(A[i])
i+=1
else:
j=j+1
return C
however this only keeps the duplicates from list B. Any suggestions?
回答1:
Here is the answer to your question as asked:
import collections
for A,B,expected_output in (
([1,1,2,3], [1,1,2,4], [1,1,1,1,2]),
([1,1,2,3], [1,2,4], [1,1,2])):
cntA = collections.Counter(A)
cntB = collections.Counter(B)
output = [
x for x in sorted(set(A) & set(B)) for i in range(cntA[x]*cntB[x])]
assert output == expected_output
Here is the answer to the question as originally interpreted by myself and two others:
import collections
A=[1,1,2,3]
B=[1,1,2,4]
expected_output = [1,1,1,1,2,2]
cntA = collections.Counter(A)
cntB = collections.Counter(B)
cnt_sum = collections.Counter(A) + collections.Counter(B)
output = [x for x in sorted(set(A) & set(B)) for i in range(cnt_sum[x])]
assert output == expected_output
You can find the collections.Counter()
documentation here. collections
is a great module and I highly recommend giving the documentation on the whole module a read.
I realized you don't actually need to find the intersection of the sets, because the "count of a missing element is zero" according to the documentation:
import collections
for A,B,expected_output in (
([1,1,2,3], [1,1,2,4], [1,1,1,1,2]),
([1,1,2,3], [1,2,4], [1,1,2])):
cntA = collections.Counter(A)
cntB = collections.Counter(B)
output = [
x for x in sorted(set(A)) for i in range(cntA[x]*cntB[x])]
assert output == expected_output
回答2:
How about this:
a_set = set(A)
b_set = set(B)
intersect = [i for i in A if i in b_set] + [j for j in B if j in a_set]
Two list comprehensions concatenated. A bit of extra time and memory is used to create sets of A and B, but that will be more than offset by the efficiency of checking membership of items in a set vs list.
You could also spruce it up a bit:
set_intersect = set(A) & set(B)
list_intersect = [ele for ele in A+B if ele in set_intersect]
Coerce both lists to sets, take their intersection, then use a list comprehension to add all elements from both lists A and B if they appear in the set intersection.
回答3:
I'm having a hard time speeding your code since I don't know what are you running it on. It makes a lot of difference whether you run it on small or large lists and how many distinct elements are there. Anyway, here are some suggestions:
1.
def intersect(a, b):
count_a = Counter(a)
count_b = Counter(b)
count_mul = []
for i in count_a:
count_mul.extend([i] * (count_a[i] * count_b[i]))
return count_mul
2.
This returns an iterator, you can use list(iterator)
to turn it into a list
def intersect(a, b):
count_a = Counter(a)
count_b = Counter(b)
count_mul = Counter()
for i in count_a:
count_mul[i] += count_a[i] * count_b[i]
return count_mul.elements()
3.
Very similar to your way but without changing the size of the list which takes time.
def intersect(A, B):
return [a for a in A for b in B if a == b]
I'm not sure this gives any improvement to your original way, it really depends on the inputs, but your way is O(n*m)
and mine is O(n+m)
.
You can use the module timeit
to check how fast it runs on your input:
from timeit import timeit
timeit('test.intersect(A, B)', 'import test; A = [1,1,2,3]; B = [1,1,2,4]')
来源:https://stackoverflow.com/questions/29830223/python-intersection-of-2-lists-keeping-duplicates-from-both-lists