Check for identical rows in different NumPy arrays

无人共我 2020-11-28 15:14

How do I compare two arrays row by row, so that the result is a boolean array saying, for each row of the first array, whether that row also appears in the second?

Given data:

a = np.array([[1,0],[2,0],[3,1],[         

  • 2020-11-28 15:46

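    You can do it with a list comprehension:

    # compare whole rows: a plain `row in b` evaluates (b == row).any(), which matches single elements
    c = np.array([(b == row).all(axis=1).any() for row in a])

    This is easy to read, but it is slower than a vectorized NumPy approach because it loops over the rows of a in Python.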
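A minimal sketch of why the plain `row in b` test is not a row test, plus a hash-based variant that looks each row up in a set of row tuples. The set approach is swapped in here as an alternative; it is not part of the answer above, and `b_rows` is an illustrative name. The data is the sample from this thread.

```python
import numpy as np

a = np.array([[1, 0], [2, 0], [3, 1], [4, 2]])
b = np.array([[1, 0], [2, 0], [4, 2]])

# ndarray's `in` is (b == row).any(): it matches single elements, not whole rows.
print(np.array([1, 2]) in b)  # True, although [1, 2] is not a row of b

# Row-wise membership via a set of row tuples (illustrative name b_rows).
b_rows = set(map(tuple, b.tolist()))
c = np.array([tuple(row) in b_rows for row in a.tolist()])
print(c)  # [ True  True False  True]
```

The set lookup avoids scanning all of b for every row of a, at the cost of converting both arrays to Python tuples.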

  • 2020-11-28 16:00

    Here's a vectorised solution:

    res = (a[:, None] == b).all(-1).any(-1)
    # array([ True,  True, False,  True])

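    Note that a[:, None] == b compares each row of a with every row of b element-wise: a[:, None] has shape (4, 1, 2), which broadcasts against b's shape (3, 2) to give a (4, 3, 2) boolean array. .all(-1) then marks the (row of a, row of b) pairs that agree in every column, and .any(-1) reports whether each row of a matched at least one row of b: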

    print(a[:, None] == b)
    [[[ True  True]
      [False  True]
      [False False]]
     [[False  True]
      [ True  True]
      [False False]]
     [[False False]
      [False False]
      [False False]]
     [[False False]
      [False False]
      [ True  True]]]
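The same broadcasting idea wrapped in a small helper, with the intermediate shapes spelled out. `rows_in` is a hypothetical helper name. Keep in mind that the intermediate comparison array has len(a) * len(b) * ncols elements, so memory grows with the product of the two row counts.

```python
import numpy as np

def rows_in(a, b):
    """Hypothetical helper: True where a row of `a` equals some row of `b`."""
    a = np.asarray(a)
    b = np.asarray(b)
    eq = a[:, None, :] == b[None, :, :]  # shape (len(a), len(b), ncols)
    pair_match = eq.all(axis=-1)         # (len(a), len(b)): a[i] equals b[j] in every column
    return pair_match.any(axis=-1)       # (len(a),): a[i] equals at least one row of b

a = np.array([[1, 0], [2, 0], [3, 1], [4, 2]])
b = np.array([[1, 0], [2, 0], [4, 2]])
print(rows_in(a, b))  # [ True  True False  True]
```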
  • 2020-11-28 16:00

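    You can use SciPy's cdist: a pairwise distance of 0 means two rows are identical, and you get the full pairwise matrix if you need more than a yes/no answer.

    import numpy as np
    from scipy.spatial.distance import cdist
    a = np.array([[1,0],[2,0],[3,1],[4,2]])
    b = np.array([[1,0],[2,0],[4,2]])
    c = (cdist(a, b) == 0).any(axis=1)  # zero Euclidean distance to some row of b
    print(c)     # [ True  True False  True]
    print(a[c])  # rows of a that also occur in b
    # [[1 0]
    #  [2 0]
    #  [4 2]]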

    cdist also accepts a Python callable as the metric, so you can define whatever pairwise comparison you need. Here it returns 1.0 for identical rows and 0.0 otherwise:

    c = cdist(a, b, lambda u, v: (u==v).all())
    # [[1. 0. 0.]
    #  [0. 1. 0.]
    #  [0. 0. 0.]
    #  [0. 0. 1.]]

    The matrix also tells you which row of b each row of a matches, and shows when a row of b is matched more than once:

    # Array with repeated rows
    a2 = np.array([[1,0],[2,0],[3,1],[4,2],[3,1],[4,2]])
    c2 = cdist(a2, b, lambda u, v: (u==v).all())
    print(c2)
    # [[1. 0. 0.]
    #  [0. 1. 0.]
    #  [0. 0. 0.]
    #  [0. 0. 1.]
    #  [0. 0. 0.]
    #  [0. 0. 1.]]
    idx = np.where(c2 == 1)  # (rows of a2, matching rows of b)
    print(idx)
    # (array([0, 1, 3, 5], dtype=int64), array([0, 1, 2, 2], dtype=int64))
    print(idx[0][idx[1] == 2])  # rows of a2 equal to b[2]
    # [3 5]
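A sketch of the same match matrix using cdist's built-in 'hamming' metric instead of a Python lambda; this is a substitution, not part of the answer above. The Hamming distance is the fraction of columns that differ, so 0.0 means identical rows, and a built-in metric avoids calling a Python function once per pair. `match`, `rows_a` and `rows_b` are illustrative names.

```python
import numpy as np
from scipy.spatial.distance import cdist

a2 = np.array([[1, 0], [2, 0], [3, 1], [4, 2], [3, 1], [4, 2]])
b = np.array([[1, 0], [2, 0], [4, 2]])

# 'hamming' returns the fraction of columns that differ; 0.0 means identical rows.
match = cdist(a2, b, metric="hamming") == 0  # shape (len(a2), len(b))

print(match.any(axis=1))        # [ True  True False  True False  True]
rows_a, rows_b = np.nonzero(match)
print(rows_a[rows_b == 2])      # rows of a2 equal to b[2]: [3 5]
```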
  • 2020-11-28 16:06

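    You can use np.apply_along_axis, which calls a function on every 1-D slice of an array taken along the given axis: for a 2-D array, axis=1 passes each row and axis=0 passes each column. Extra positional arguments (here b) are forwarded to the function:

    import numpy as np
    a = np.array([[1,0],[2,0],[3,1],[4,2]])
    b = np.array([[1,0],[2,0],[4,2]])
    # `x in y` would test single elements, so check for a full-row match instead
    c = np.apply_along_axis(lambda x, y: (y == x).all(axis=1).any(), 1, a, b)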
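A small sketch of which slices np.apply_along_axis hands to the function, using np.sum on the sample array a (`row_sums` and `col_sums` are illustrative names). Note that apply_along_axis still loops in Python internally, so it is not faster than a list comprehension.

```python
import numpy as np

a = np.array([[1, 0], [2, 0], [3, 1], [4, 2]])

# axis=1: the function receives each row: [1, 0], then [2, 0], ...
row_sums = np.apply_along_axis(np.sum, 1, a)
# axis=0: the function receives each column: [1, 2, 3, 4], then [0, 0, 1, 2]
col_sums = np.apply_along_axis(np.sum, 0, a)

print(row_sums)  # [1 2 4 6]
print(col_sums)  # [10  3]
```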
  • 2020-11-28 16:11

    Approach #1

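    We could use a view-based vectorized solution: each row is viewed as a single np.void element, so whole rows can be compared as scalars with np.isin: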

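    # @Divakar
    def view1D(a, b): # a, b are arrays with the same dtype and number of columns
        # view() with a wider dtype needs each row to be contiguous in memory
        a = np.ascontiguousarray(a)
        b = np.ascontiguousarray(b)
        # a void dtype as wide as one whole row, so each row becomes a single element
        void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
        return a.view(void_dt).ravel(),  b.view(void_dt).ravel()
    A,B = view1D(a,b)
    out = np.isin(A,B)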

    Sample run -

    In [8]: a
    array([[1, 0],
           [2, 0],
           [3, 1],
           [4, 2]])
    In [9]: b
    array([[1, 0],
           [2, 0],
           [4, 2]])
    In [10]: A,B = view1D(a,b)
    In [11]: np.isin(A,B)
    Out[11]: array([ True,  True, False,  True])
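A minimal sketch of the same view trick as one self-contained function. The byte-level views only line up when both arrays have the same dtype and column count, so this version (`rows_in_via_void` is a hypothetical helper name) casts b to a's dtype and checks the column count first; the int32/int64 mix below is made up to exercise that cast.

```python
import numpy as np

def rows_in_via_void(a, b):
    """Hypothetical helper: row membership via void views."""
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b, dtype=a.dtype)  # byte views only line up for one shared dtype
    if a.shape[1] != b.shape[1]:
        raise ValueError("a and b must have the same number of columns")
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    A = a.view(void_dt).ravel()  # one opaque element per row of a
    B = b.view(void_dt).ravel()  # one opaque element per row of b
    return np.isin(A, B)

a = np.array([[1, 0], [2, 0], [3, 1], [4, 2]], dtype=np.int64)
b = np.array([[1, 0], [2, 0], [4, 2]], dtype=np.int32)  # made-up dtype mismatch
print(rows_in_via_void(a, b))  # [ True  True False  True]
```

Because the comparison is byte-wise, floating-point rows containing 0.0 and -0.0 count as different, while NaNs with identical bit patterns count as equal.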

    Approach #2

    Alternatively, if every row of b is known to occur in a and A is already sorted, you can use the same views with searchsorted instead of isin:

    out = np.zeros(len(A), dtype=bool)
    out[np.searchsorted(A,B)] = True

    If A is not necessarily sorted, pass its argsort as the sorter:

    sidx = A.argsort()
    out = np.zeros(len(A), dtype=bool)
    out[sidx[np.searchsorted(A,B,sorter=sidx)]] = True
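A self-contained sketch of the argsort + searchsorted variant on a made-up, deliberately unsorted copy of a, cross-checked against Approach #1. It relies on the stated precondition that every row of b occurs in a: for a missing row, searchsorted returns an insertion point, so the wrong row would be marked (or indexing would fail if that point equals len(A)).

```python
import numpy as np

def view1D(a, b):
    # one np.void element per row; a and b must share dtype and column count
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(), b.view(void_dt).ravel()

a = np.array([[4, 2], [1, 0], [3, 1], [2, 0]])  # made-up: rows deliberately unsorted
b = np.array([[1, 0], [2, 0], [4, 2]])          # every row of b occurs in a
A, B = view1D(a, b)

sidx = A.argsort()                               # sorted order of A's row elements
out = np.zeros(len(A), dtype=bool)
out[sidx[np.searchsorted(A, B, sorter=sidx)]] = True  # map sorted positions back to rows
print(out)            # [ True  True False  True]
print(np.isin(A, B))  # same result via Approach #1
```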
  • 2020-11-28 16:13

    import numpy as np
    a = np.array([[1,0],[2,0],[3,1],[4,2]])
    b = np.array([[1,0],[2,0],[4,2]])
    i = 0
    j = 0
    result = []

    If both arrays are sorted lexicographically by row, as they are here, we can do this in O(len(a) + len(b)) time with two pointers, always advancing the pointer that has fallen behind:

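    while i < len(a) and j < len(b):
        if tuple(a[i]) == tuple(b[j]):
            result.append(True)
            i += 1
            j += 1  # get rid of this depending on how you want to handle duplicates
        elif tuple(a[i]) > tuple(b[j]):
            j += 1  # b is behind: skip rows of b smaller than a[i]
        else:
            result.append(False)  # a[i] is smaller than every remaining row of b
            i += 1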

    If b runs out first, the remaining rows of a have no match, so pad the result with False:

    if len(result) < len(a):
        result.extend([False] * (len(a) - len(result)))
    print(result) # [True, True, False, True]

    This answer is adapted from "Better way to find matches in two sorted lists than using for loops? (Java)".
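A sketch of the two-pointer loop as a reusable function, taking the other branch of the duplicates comment: j is not advanced on a match, so repeated rows in a can all match the same row of b. `sorted_rows_in` is a hypothetical helper name; it assumes both inputs are already sorted lexicographically by row, and the duplicated [2, 0] in the sample is added for illustration.

```python
import numpy as np

def sorted_rows_in(a, b):
    """Hypothetical helper: two-pointer row membership for row-sorted a and b."""
    rows_a = [tuple(r) for r in np.asarray(a).tolist()]
    rows_b = [tuple(r) for r in np.asarray(b).tolist()]
    result = []
    i = j = 0
    while i < len(rows_a) and j < len(rows_b):
        if rows_a[i] == rows_b[j]:
            result.append(True)
            i += 1  # j stays put, so a repeated row in a can match the same row of b
        elif rows_a[i] > rows_b[j]:
            j += 1  # b is behind: skip rows of b smaller than a[i]
        else:
            result.append(False)  # a[i] is smaller than every remaining row of b
            i += 1
    # If b runs out first, the remaining rows of a have no match.
    result.extend([False] * (len(rows_a) - len(result)))
    return result

a = np.array([[1, 0], [2, 0], [2, 0], [3, 1], [4, 2]])
b = np.array([[1, 0], [2, 0], [4, 2]])
print(sorted_rows_in(a, b))  # [True, True, True, False, True]
```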
