Remove elements from one array if present in another array, keep duplicates - NumPy / Python

无人共我 2020-12-17 21:26

I have two arrays, A (length 3.8 million) and B (length 20k). For a minimal example, let's take this case:

A = np.array([1,1,2,3,3,         


        
3 answers
  •  有刺的猬
    2020-12-17 21:47

    Adding to Divakar's answer above:

    If the original array A has a wider range than B, that approach raises an 'index out of bounds' error. See:

    import numpy as np

    A = np.array([1,1,2,3,3,3,4,5,6,7,8,8,10,12,14])
    B = np.array([1,2,8])
    
    A[B[np.searchsorted(B,A)] !=  A]
    >> IndexError: index 3 is out of bounds for axis 0 with size 3
    
    

    This happens because np.searchsorted returns index 3 (one past the last valid index of B) as the insertion position for the elements 10, 12 and 14 of A in this example. Indexing B with that value in B[np.searchsorted(B,A)] then raises the IndexError.
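The out-of-range index can be inspected directly. The snippet below is a minimal sketch using the same example arrays; the names `idx`, `out_of_range` and `overflowing` are introduced here only for illustration.

```python
import numpy as np

A = np.array([1, 1, 2, 3, 3, 3, 4, 5, 6, 7, 8, 8, 10, 12, 14])
B = np.array([1, 2, 8])

# Insertion position of every element of A in the sorted array B
# (default side='left').
idx = np.searchsorted(B, A)

# Elements larger than B.max() get position len(B), which is not a
# valid index into B.
out_of_range = idx == len(B)
overflowing = A[out_of_range]

print(idx)          # positions, including the invalid index 3
print(overflowing)  # the elements of A responsible for the IndexError
```

Here the last three positions equal len(B), and they correspond to the elements 10, 12 and 14.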

    To circumvent that, a possible approach is:

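    def subset_sorted_array(A,B):
        # Assumes A and B are both sorted in ascending order
        Aa = A[np.where(A <= np.max(B))]
        Bb = (B[np.searchsorted(B,Aa)] !=  Aa)
        # Pad the mask with True for the elements of A larger than max(B)
        Bb = np.pad(Bb,(0,A.shape[0]-Aa.shape[0]), mode='constant', constant_values=True)
        return A[Bb]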
    

    Which works as follows:

    # Take only the elements of A that do not exceed max(B), so that
    # searchsorted never returns an index past the end of B
    Aa = A[np.where(A <= np.max(B))]
    
    # Pad the resulting filter with 'Trues' - I split this in two operations for
    # easier reading
    Bb = (B[np.searchsorted(B,Aa)] !=  Aa)
    Bb = np.pad(Bb,(0,A.shape[0]-Aa.shape[0]), mode='constant', constant_values=True)
    
    # Then you can filter A by Bb
    A[Bb]
    # For the input arrays above:
    >> array([ 3,  3,  3,  4,  5,  6,  7, 10, 12, 14])
    

    Note that this also works for arrays of strings and other types, provided the <= comparison operator is defined for them.
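As a sanity check, the result can be compared against a mask built with np.isin. That is a different technique from the searchsorted approach in this answer and is used here only as a reference. The sketch restates the function with np.pad's `mode` keyword and assumes both arrays are sorted in ascending order; the names `keep`, `result` and `reference` are illustrative.

```python
import numpy as np

def subset_sorted_array(A, B):
    # Assumption: A and B are both sorted in ascending order.
    Aa = A[A <= np.max(B)]
    # True where the element of Aa is not present in B.
    keep = B[np.searchsorted(B, Aa)] != Aa
    # Elements of A above max(B) cannot be in B, so they are always kept.
    keep = np.pad(keep, (0, A.shape[0] - Aa.shape[0]),
                  mode='constant', constant_values=True)
    return A[keep]

A = np.array([1, 1, 2, 3, 3, 3, 4, 5, 6, 7, 8, 8, 10, 12, 14])
B = np.array([1, 2, 8])

result = subset_sorted_array(A, B)

# Reference result from a membership test (works on unsorted input too).
reference = A[~np.isin(A, B)]

print(result)
print(np.array_equal(result, reference))
```

Because the padding is appended at the end of the mask, the function relies on the elements larger than max(B) sitting at the end of A, which holds only when A is sorted.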
