Check for identical rows in different NumPy arrays

无人共我 2020-11-28 15:14

How do I compare two arrays row by row and get a boolean array that says, for each row of the first array, whether that row also appears in the second?

Given the data:

a = np.array([[1,0],[2,0],[3,1],[         


        
6 Answers
  • 2020-11-28 15:46

    You can do it as a list comp via:

    # "row in b" is True when any single element matches, so compare whole rows
    c = np.array([(b == row).all(axis=1).any() for row in a])
    

    though this approach will be slower than a pure numpy approach (if it exists).
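One caveat with the list-comprehension idea: on NumPy arrays, `row in b` checks whether any element of `b == row` is true, not whether a whole row matches. The sketch below uses hypothetical data (the row `[1, 5]` is not from the question) to show the difference between the plain `in` test and an explicit whole-row comparison.

```python
import numpy as np

# Hypothetical data: [1, 5] shares its first element with a row of b,
# but is not itself a row of b.
a = np.array([[1, 0], [1, 5], [3, 1], [4, 2]])
b = np.array([[1, 0], [2, 0], [4, 2]])

# ndarray.__contains__ reduces with any(), so a partial match counts.
naive = np.array([row in b for row in a])

# Require every element of a row to match before reducing with any().
strict = np.array([(b == row).all(axis=1).any() for row in a])

print(naive)
print(strict)
```

With the question's own sample data the two versions happen to agree, which is why the problem is easy to miss.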

  • 2020-11-28 16:00

    Here's a vectorised solution:

    res = (a[:, None] == b).all(-1).any(-1)
    
    print(res)
    
    [ True  True False  True]
    

    Note that a[:, None] == b broadcasts, so each row of a is compared element-wise with every row of b. We then use all(-1) to mark the row pairs whose elements all match, and any(-1) to check whether each row of a matches at least one row of b:

    print(a[:, None] == b)
    
    [[[ True  True]
      [False  True]
      [False False]]
    
     [[False  True]
      [ True  True]
      [False False]]
    
     [[False False]
      [False False]
      [False False]]
    
     [[False False]
      [False False]
      [ True  True]]]
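To make the broadcasting step easier to follow, here is a small sketch that splits the one-liner into its intermediate arrays and prints their shapes. The variable names `eq` and `row_match` are only illustrative.

```python
import numpy as np

a = np.array([[1, 0], [2, 0], [3, 1], [4, 2]])
b = np.array([[1, 0], [2, 0], [4, 2]])

# a[:, None] has shape (4, 1, 2); comparing with b (3, 2) broadcasts to (4, 3, 2).
eq = a[:, None] == b

# Collapse the last axis: True where a[i] and b[j] agree in every column.
row_match = eq.all(-1)          # shape (4, 3)

# Collapse again: True where a[i] equals at least one row of b.
res = row_match.any(-1)         # shape (4,)

print(eq.shape, row_match.shape, res.shape)
print(res)
```

The intermediate array holds len(a) * len(b) * ncols booleans, so this approach trades memory for speed on large inputs.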
    
  • 2020-11-28 16:00

    You can use scipy's cdist which has a few advantages:

    import numpy as np
    from scipy.spatial.distance import cdist
    
    a = np.array([[1,0],[2,0],[3,1],[4,2]])
    b = np.array([[1,0],[2,0],[4,2]])
    
    c = cdist(a, b)==0
    print(c.any(axis=1))
    
    [ True  True False  True]
    
    print(a[c.any(axis=1)])
    
    [[1 0]
     [2 0]
     [4 2]]
    

    Also, cdist accepts a callable as the metric, so you can supply your own distance function to do whatever comparison you need:

    c = cdist(a, b, lambda u, v: (u==v).all())
    print(c)
    
    [[1. 0. 0.]
     [0. 1. 0.]
     [0. 0. 0.]
     [0. 0. 1.]]
    

    You can now find which indices match, which also shows whether there are multiple matches. In idx below, idx[0] holds row indices into a2 and idx[1] holds the matching row indices into b:

    # Array with multiple instances
    a2 = np.array([[1,0],[2,0],[3,1],[4,2],[3,1],[4,2]])
    
    c2 = cdist(a2, b, lambda u, v: (u==v).all())
    print(c2)
    
    idx = np.where(c2==1)
    print(idx)
    
    print(idx[0][idx[1]==2])
    
    [[1. 0. 0.]
     [0. 1. 0.]
     [0. 0. 0.]
     [0. 0. 1.]
     [0. 0. 0.]
     [0. 0. 1.]]
    (array([0, 1, 3, 5], dtype=int64), array([0, 1, 2, 2], dtype=int64))
    [3 5]
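If SciPy is not available, the same index pairs can probably be recovered with NumPy alone. This sketch assumes the broadcast comparison from the vectorised answer and uses `np.nonzero` in place of `cdist` plus `np.where`; it is an alternative technique, not what the answer above does.

```python
import numpy as np

a2 = np.array([[1, 0], [2, 0], [3, 1], [4, 2], [3, 1], [4, 2]])
b = np.array([[1, 0], [2, 0], [4, 2]])

# match[i, j] is True when row i of a2 equals row j of b.
match = (a2[:, None] == b).all(-1)

# Row indices into a2 and the corresponding row indices into b.
rows_a2, rows_b = np.nonzero(match)

# Rows of a2 that equal b[2].
hits_for_b2 = rows_a2[rows_b == 2]

print(rows_a2, rows_b)
print(hits_for_b2)
```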
    
  • 2020-11-28 16:06

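    You can use NumPy's apply_along_axis, which calls a function on every 1-D slice taken along the given axis. For a 2-D array, axis=0 passes each column and axis=1 passes each row; extra arguments (here b) are forwarded to the function:

    import numpy as np
    a = np.array([[1,0],[2,0],[3,1],[4,2]])
    b = np.array([[1,0],[2,0],[4,2]])
    # "x in y" would match on any single element, so compare whole rows instead
    c = np.apply_along_axis(lambda row, other: (other == row).all(axis=1).any(), 1, a, b)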
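As a quick check of what the axis argument means for apply_along_axis, the sketch below sums a small hypothetical matrix `m` along each axis. On a 2-D array, axis=0 hands the function each column and axis=1 hands it each row.

```python
import numpy as np

# Hypothetical 2x3 matrix used only to illustrate the axis argument.
m = np.array([[1, 2, 3],
              [4, 5, 6]])

# axis=0: the function receives each column as a 1-D slice.
col_sums = np.apply_along_axis(np.sum, 0, m)

# axis=1: the function receives each row as a 1-D slice.
row_sums = np.apply_along_axis(np.sum, 1, m)

print(col_sums)
print(row_sums)
```

Note that apply_along_axis still loops in Python under the hood, so it is a convenience rather than a speed-up over a list comprehension.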
    
  • 2020-11-28 16:11

    Approach #1

    We could use a view based vectorized solution -

    # https://stackoverflow.com/a/45313353/ @Divakar
    def view1D(a, b): # a, b are arrays
        a = np.ascontiguousarray(a)
        b = np.ascontiguousarray(b)
        void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
        return a.view(void_dt).ravel(),  b.view(void_dt).ravel()
    
    A,B = view1D(a,b)
    out = np.isin(A,B)
    

    Sample run -

    In [8]: a
    Out[8]: 
    array([[1, 0],
           [2, 0],
           [3, 1],
           [4, 2]])
    
    In [9]: b
    Out[9]: 
    array([[1, 0],
           [2, 0],
           [4, 2]])
    
    In [10]: A,B = view1D(a,b)
    
    In [11]: np.isin(A,B)
    Out[11]: array([ True,  True, False,  True])
    

    Approach #2

    Alternatively, when every row of b occurs in a and the rows of a are lexicographically sorted, the same views can be used with searchsorted -

    out = np.zeros(len(A), dtype=bool)
    out[np.searchsorted(A,B)] = 1
    

    If the rows are not necessarily lexicographically sorted, search through an argsort index instead (out is initialised as above) -

    sidx = A.argsort()
    out[sidx[np.searchsorted(A,B,sorter=sidx)]] = 1
    
  • 2020-11-28 16:13
    import numpy as np
    a = np.array([[1,0],[2,0],[3,1],[4,2]])
    b = np.array([[1,0],[2,0],[4,2]])
    
    i = 0
    j = 0
    result = []
    

    Because the rows of both arrays are sorted, we can do this in O(len(a) + len(b)) time. Using two pointers, we advance whichever pointer has fallen behind:

    while i < len(a) and j < len(b):
        if tuple(a[i])== tuple(b[j]):
            result.append(True)
            i += 1
            j += 1 # get rid of this depending on how you want to handle duplicates
        elif tuple(a[i]) > tuple(b[j]):
            j += 1
        else:
            result.append(False)
            i += 1
    

    Pad with False if the loop ends before every row of a has been visited.

    if len(result) < len(a):
        result.extend([False] * (len(a) - len(result)))
    
    print(result) # [True, True, False, True]
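For reuse, the two-pointer loop could be wrapped in a function along these lines. This is a minimal sketch, not the answer's own code: the name `rows_in_sorted` is hypothetical, it assumes the rows of both arrays are sorted lexicographically, and it keeps `j` in place on a match so that duplicate rows in `a` are all flagged.

```python
import numpy as np

def rows_in_sorted(a, b):
    """Two-pointer membership test for row-sorted 2-D arrays (hypothetical helper)."""
    result = np.zeros(len(a), dtype=bool)  # rows never reached stay False
    i = j = 0
    while i < len(a) and j < len(b):
        row_a, row_b = tuple(a[i]), tuple(b[j])
        if row_a == row_b:
            result[i] = True
            i += 1      # keep j so a repeated row in a matches again
        elif row_a > row_b:
            j += 1      # b is behind, advance it
        else:
            i += 1      # a is behind, so this row has no match
    return result

a = np.array([[1, 0], [2, 0], [3, 1], [4, 2]])
b = np.array([[1, 0], [2, 0], [4, 2]])
mask = rows_in_sorted(a, b)
print(mask)
```

Pre-allocating the result with False removes the need for a separate padding step after the loop.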
    

    This answer is adapted from "Better way to find matches in two sorted lists than using for loops?" (Java).
