Remove elements from one array if present in another array, keep duplicates - NumPy / Python

Submitted by a 夏天 on 2020-06-21 19:07:33

Question


I have two arrays, A (length 3.8 million) and B (length 20k). For a minimal example, let's take this case:

import numpy as np
A = np.array([1,1,2,3,3,3,4,5,6,7,8,8])
B = np.array([1,2,8])

Now I want the resulting array to be:

C = np.array([3,3,3,4,5,6,7])

i.e. remove from A every element whose value appears in B, and keep all other elements, including their duplicates.

I would like to know if there is a way to do this without a for loop, because A is a long array and looping over it takes a long time.


Answer 1:


Using searchsorted

With sorted B, we can use searchsorted -

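A[B[np.minimum(np.searchsorted(B,A), len(B)-1)] != A]

From the docs, searchsorted(a,v) finds the indices into a sorted array a such that, if the corresponding elements in v were inserted before those indices, the order of a would be preserved. So let idx = searchsorted(B,A) and index into B with it: B[idx] gives, for every element of A, the element of B at its insertion point, which equals that A element exactly when the value is present in B. Comparing this mapped version against A therefore tells us, for every element in A, whether there is a match in B. Finally, index into A to select the non-matching ones. The np.minimum clamp is needed because searchsorted returns len(B) for any value larger than B[-1]; clamping that index to len(B)-1 keeps B[idx] in range, and since B[-1] is smaller than such a value, the element is correctly kept.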
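A minimal sketch of the searchsorted approach as a function, assuming B is sorted and non-empty. The function name remove_present is hypothetical; the last call shows an input with values above max(B), which is where the clamp matters.

```python
import numpy as np

def remove_present(A, B):
    """Return the elements of A whose values do not occur in sorted, non-empty B.

    Duplicates in A are kept. remove_present is a hypothetical helper name.
    """
    # Insertion point of every element of A into sorted B (side='left').
    idx = np.searchsorted(B, A)
    # Values larger than B[-1] get idx == len(B); clamp so B[idx] stays in range.
    # The clamped lookup returns B[-1], which differs from such values, so they are kept.
    idx = np.minimum(idx, len(B) - 1)
    # B[idx] is the candidate match for each element of A; keep the mismatches.
    return A[B[idx] != A]

A = np.array([1, 1, 2, 3, 3, 3, 4, 5, 6, 7, 8, 8])
B = np.array([1, 2, 8])

print(np.searchsorted(B, A))  # [0 0 1 2 2 2 2 2 2 2 2 2]
print(remove_present(A, B))   # [3 3 3 4 5 6 7]

# 9 is above max(B): without the clamp, B[idx] would raise IndexError.
print(remove_present(np.array([0, 1, 9, 9]), B))  # [0 9 9]
```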

Generic case (B is not sorted):

If B is not already sorted (a prerequisite of the method above), sort it first and then apply the same approach.

Alternatively, we can use the sorter argument of searchsorted, with the same clamp -

sidx = B.argsort()
idx = np.minimum(np.searchsorted(B,A,sorter=sidx), len(B)-1)
out = A[B[sidx[idx]] != A]
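A sketch of the sorter variant wrapped in a function and run on a deliberately unsorted B, assuming B is non-empty. The helper name remove_present_unsorted is hypothetical. searchsorted returns positions in the sorted view of B, and sidx maps those positions back to indices of the original B.

```python
import numpy as np

def remove_present_unsorted(A, B):
    """Keep elements of A not found in B; B may be in any order (must be non-empty).

    remove_present_unsorted is a hypothetical helper name.
    """
    sidx = B.argsort()                        # permutation that sorts B
    idx = np.searchsorted(B, A, sorter=sidx)  # positions in the sorted view of B
    idx = np.minimum(idx, len(B) - 1)         # clamp values above max(B)
    return A[B[sidx[idx]] != A]               # map back to original B, keep mismatches

A = np.array([1, 1, 2, 3, 3, 3, 4, 5, 6, 7, 8, 8, 9])
B = np.array([8, 1, 2])  # deliberately unsorted

print(remove_present_unsorted(A, B))  # [3 3 3 4 5 6 7 9]
```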

Using in1d/isin

We can also use np.in1d, which is straightforward: it checks, for every element in A, whether it has any match in B. We can then use boolean indexing with the inverted mask to select the non-matching ones -

A[~np.in1d(A,B)]

Same with isin (the recommended function; np.in1d is deprecated as of NumPy 2.0) -

A[~np.isin(A,B)]

With the invert flag -

A[np.in1d(A,B,invert=True)]

A[np.isin(A,B,invert=True)]

These handle the generic case, where B is not necessarily sorted.
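A short sketch of the isin route on the example data, using an unsorted B to show that order does not matter. The mask is computed per element of A, so duplicates in A that are not in B are all kept.

```python
import numpy as np

A = np.array([1, 1, 2, 3, 3, 3, 4, 5, 6, 7, 8, 8])
B = np.array([8, 1, 2])  # order of B does not matter for isin

mask = np.isin(A, B)                # True where the element of A occurs anywhere in B
C1 = A[~mask]                       # invert the mask with ~
C2 = A[np.isin(A, B, invert=True)]  # or let isin do the inversion

print(C1)  # [3 3 3 4 5 6 7]
assert np.array_equal(C1, C2)
```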




Answer 2:


I am not very familiar with numpy, but how about using sets:

C = set(A.flat) - set(B.flat)

EDIT: as pointed out in the comments, sets cannot hold duplicate values, so this drops the duplicates the question asks to keep.

So another solution would be to filter with a lambda expression:
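A quick check of why the set approach does not answer the question: every surviving value appears only once. The example converts to Python ints with tolist() so the printed result is plain; np.setdiff1d, shown for comparison, likewise returns only the unique values of A not in B.

```python
import numpy as np

A = np.array([1, 1, 2, 3, 3, 3, 4, 5, 6, 7, 8, 8])
B = np.array([1, 2, 8])

# Set difference collapses the three 3s into one (and a set has no defined order).
print(sorted(set(A.tolist()) - set(B.tolist())))  # [3, 4, 5, 6, 7]

# setdiff1d also returns unique values, so it has the same limitation.
print(np.setdiff1d(A, B))  # [3 4 5 6 7]
```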

C = np.array(list(filter(lambda x: x not in B, A)))
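The filter above works, but `x not in B` on a NumPy array scans all of B for every element of A, i.e. roughly len(A) × len(B) comparisons (3.8 million × 20k here). A sketch of a swapped-in variant, not the answerer's code: convert B to a Python set first so each membership test is a hash lookup, and build the result with np.fromiter. It is still a Python-level loop, so the vectorized options in Answer 1 are typically much faster.

```python
import numpy as np

A = np.array([1, 1, 2, 3, 3, 3, 4, 5, 6, 7, 8, 8])
B = np.array([1, 2, 8])

exclude = set(B.tolist())  # hash set: average O(1) membership test
# Generator keeps every element of A not in B, in order, duplicates included.
C = np.fromiter((x for x in A.tolist() if x not in exclude), dtype=A.dtype)

print(C)  # [3 3 3 4 5 6 7]
```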


Source: https://stackoverflow.com/questions/52417929/remove-elements-from-one-array-if-present-in-another-array-keep-duplicates-nu
