Test for membership in a 2D NumPy array

花落未央 2020-12-03 08:05

I have two 2D arrays of the same size

a = array([[1,2],[3,4],[5,6]])
b = array([[1,2],[3,4],[7,8]])

I want to know which rows of b are in a.

4 Answers
  • 2020-12-03 08:47

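    What we'd really like to do is use np.in1d, except that np.in1d flattens its inputs and compares individual elements, so it cannot compare whole rows. However, we can view each row as a single opaque item of dtype np.void (a fixed-size block of raw bytes), which turns the 2-D array into an array with one item per row: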

    arr.view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1])))
    

    For example,

    In [15]: arr = np.array([[1, 2], [2, 3], [1, 3]])
    
    In [16]: arr = arr.view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1])))
    
    In [30]: arr.dtype
    Out[30]: dtype('V16')
    
    In [31]: arr.shape
    Out[31]: (3, 1)
    
    In [37]: arr
    Out[37]: 
    array([[b'\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00'],
           [b'\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00'],
           [b'\x01\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00']],
          dtype='|V16')
    

    This makes each row of arr a single np.void item (raw bytes). Now it is just a matter of hooking this up to np.in1d:

    import numpy as np
    
    def asvoid(arr):
        """
        Based on http://stackoverflow.com/a/16973510/190597 (Jaime, 2013-06)
        View the array as dtype np.void (bytes). The items along the last axis are
        viewed as one value. This allows comparisons to be performed on the entire row.
        """
        arr = np.ascontiguousarray(arr)
        if np.issubdtype(arr.dtype, np.floating):
            # Care needs to be taken here since
            # np.array([-0.]).view(np.void) != np.array([0.]).view(np.void).
            # Adding 0. converts -0. to 0.; build a new array instead of using
            # += so that the caller's data is never modified in place.
            arr = arr + 0.
        return arr.view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1])))
    
    
    def inNd(a, b, assume_unique=False):
        a = asvoid(a)
        b = asvoid(b)
        return np.in1d(a, b, assume_unique)
    
    
    tests = [
        (np.array([[1, 2], [2, 3], [1, 3]]),
         np.array([[2, 2], [3, 3], [4, 4]]),
         np.array([False, False, False])),
        (np.array([[1, 2], [2, 2], [1, 3]]),
         np.array([[2, 2], [3, 3], [4, 4]]),
         np.array([True, False, False])),
        (np.array([[1, 2], [3, 4], [5, 6]]),
         np.array([[1, 2], [3, 4], [7, 8]]),
         np.array([True, True, False])),
        (np.array([[1, 2], [5, 6], [3, 4]]),
         np.array([[1, 2], [5, 6], [7, 8]]),
         np.array([True, True, False])),
        (np.array([[-0.5, 2.5, -2, 100, 2], [5, 6, 7, 8, 9], [3, 4, 5, 6, 7]]),
         np.array([[1.0, 2, 3, 4, 5], [5, 6, 7, 8, 9], [-0.5, 2.5, -2, 100, 2]]),
         np.array([False, True, True]))
    ]
    
    for a, b, answer in tests:
        result = inNd(b, a)
        try:
            assert np.all(answer == result)
        except AssertionError:
            print('''\
    a:
    {a}
    b:
    {b}
    
    answer: {answer}
    result: {result}'''.format(**locals()))
            raise
    else:
        print('Success!')
    

    yields

    Success!
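
    The floating-point branch in asvoid is easier to appreciate with a small demonstration. The following is only a sketch with made-up variable names: it shows that -0.0 and 0.0 compare equal numerically but differ byte for byte, and that adding 0.0 removes the difference.

```python
import numpy as np

neg = np.array([[-0.0, 1.0]])
pos = np.array([[0.0, 1.0]])

# Numerically, -0.0 == 0.0, so the rows are equal element by element.
numerically_equal = bool((neg == pos).all())

# The raw bytes differ, because -0.0 has its sign bit set.
raw_bytes_equal = neg.tobytes() == pos.tobytes()

# Adding 0.0 turns -0.0 into 0.0, so the bytes agree afterwards.
normalized_bytes_equal = (neg + 0.0).tobytes() == pos.tobytes()

print(numerically_equal, raw_bytes_equal, normalized_bytes_equal)
# True False True
```

    Any comparison that works on raw bytes, such as the np.void view above, would otherwise treat these two rows as different.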
    
  • 2020-12-03 08:53
    In [1]: import numpy as np
    
    In [2]: a = np.array([[1,2],[3,4]])
    
    In [3]: b = np.array([[3,4],[1,2]])
    
    In [5]: a = a[a[:,1].argsort(kind='mergesort')]
    
    In [6]: a = a[a[:,0].argsort(kind='mergesort')]
    
    In [7]: b = b[b[:,1].argsort(kind='mergesort')]
    
    In [8]: b = b[b[:,0].argsort(kind='mergesort')]
    
    In [9]: bInA1 = b[:,0] == a[:,0]
    
    In [10]: bInA2 = b[:,1] == a[:,1]
    
    In [11]: bInA = bInA1*bInA2
    
    In [12]: bInA
    Out[12]: array([ True,  True], dtype=bool)
    

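    This should do it, though I am not sure how efficient it is. The sorts must be stable for the two-pass sort to work, which is why kind='mergesort' is used; the default quicksort is not stable. Note that the result compares the sorted arrays row by row, so it reports membership only when matching rows end up at the same position in both sorted arrays.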
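
    As a possible shortcut for the two stable argsort passes, np.lexsort can order the rows by all columns in one call. This is a sketch rather than part of the answer; sort_rows is a hypothetical helper name, and the positional comparison at the end has the same limitation as the session above.

```python
import numpy as np

def sort_rows(arr):
    """Return a copy of arr with its rows in lexicographic order (column 0 first)."""
    # np.lexsort treats the LAST key as the primary one, so the columns
    # are passed in reversed order to make column 0 the primary key.
    order = np.lexsort(arr.T[::-1])
    return arr[order]

a = np.array([[3, 4], [1, 2]])
b = np.array([[1, 2], [3, 4]])

# Compare the sorted arrays position by position, as in the session above.
same_row = (sort_rows(a) == sort_rows(b)).all(axis=1)
print(same_row)
# [ True  True]
```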

    Edit:

    If you have more than 2 columns and if the rows are sorted already, you can do

    In [24]: bInA = np.array([True,]*a.shape[0])
    
    In [25]: bInA
    Out[25]: array([ True,  True], dtype=bool)
    
    In [26]: for k in range(a.shape[1]):
       ....:     bInAk = b[:,k] == a[:,k]
       ....:     bInA = bInAk*bInA
       ....:     
    
    In [27]: bInA
    Out[27]: array([ True,  True], dtype=bool)
    

    There is still room for a speed-up: in each iteration you do not have to check the entire column, only the entries where the current bInA is still True.
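
    One way the suggested speed-up might look is sketched below. The function name rows_equal_pruned and the sample arrays are illustrative assumptions; like the loop above, it compares rows position by position and expects a and b to have the same shape.

```python
import numpy as np

def rows_equal_pruned(a, b):
    """Positional row equality, checking each column only for surviving rows."""
    candidates = np.arange(a.shape[0])  # indices of rows that still match
    for k in range(a.shape[1]):
        keep = a[candidates, k] == b[candidates, k]
        candidates = candidates[keep]
        if candidates.size == 0:
            break  # no row can match any more
    result = np.zeros(a.shape[0], dtype=bool)
    result[candidates] = True
    return result

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[1, 2, 3], [4, 0, 6], [7, 8, 0]])
print(rows_equal_pruned(a, b))
# [ True False False]
```

    Whether the pruning pays off depends on the data: the fancy indexing has its own cost, so it mainly helps when most rows are ruled out by the first few columns.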

  • 2020-12-03 08:59

    NumPy can compare the two arrays element by element and return a boolean array that is True where the entries are equal and False where they are not:

    import numpy as np
    a = np.array(([1,2],[3,4],[5,6])) #converting to a numpy array
    b = np.array(([1,2],[3,4],[7,8])) #converting to a numpy array
    new_array = a == b #creating a new boolean array from comparing a and b
    

    Now new_array looks like this:

    [[ True  True]
     [ True  True]
     [False False]]
    

    But that is not what you want yet. So you can transpose the array (swap its axes) and then combine its two rows with the & operator. This produces a 1-D array that is True only where both columns of the original row matched:

    new_array = new_array.T #transposing
    result = new_array[0] & new_array[1] #comparing rows
    

    When you print result, you now get what you're looking for:

    [ True  True False]
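
    The comparison above checks row i of a against row i of b only. The sketch below, which is an addition and not part of the answer, shows the same reduction for any number of columns, followed by a broadcasting variant that tests each row of b against every row of a. The sample b is chosen so that the two results differ.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([[7, 8], [1, 2], [3, 4]])

# Positional comparison: row i of a against row i of b, any number of columns.
same_position = (a == b).all(axis=1)

# Membership: compare every row of b with every row of a via broadcasting.
# The intermediate array has shape (len(b), len(a), ncols).
b_in_a = (b[:, None, :] == a[None, :, :]).all(axis=2).any(axis=1)

print(same_position)  # [False False False]
print(b_in_a)         # [False  True  True]
```

    The broadcast version needs memory proportional to len(a) * len(b) * ncols, so it is only reasonable for fairly small arrays.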
    
  • 2020-12-03 09:01

    If you have something like a=np.array([[1,2],[3,4],[5,6]]) and b=np.array([[5,6],[1,2],[7,6]]), you can convert each of them into a 1-D complex array, with the first column as the real part and the second as the imaginary part:

    c=a[:,0]+a[:,1]*1j
    d=b[:,0]+b[:,1]*1j
    

    In my interpreter this looks like:

    >>> c=a[:,0]+a[:,1]*1j
    >>> c
    array([ 1.+2.j,  3.+4.j,  5.+6.j])
    >>> d=b[:,0]+b[:,1]*1j
    >>> d
    array([ 5.+6.j,  1.+2.j,  7.+6.j])
    

    Now that you have 1-D arrays, you can simply call np.in1d(c,d), and NumPy will give you:

    >>> np.in1d(c,d)
    array([ True, False,  True], dtype=bool)
    

    With this you don't need any loops, at least for two-column numeric data like this.
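
    The trick can be wrapped in a small function, sketched here under a few assumptions: rows_in_2col is a hypothetical name, the inputs are 2-D arrays with exactly two real-valued columns, and np.isin is used in place of np.in1d because it is the form the NumPy documentation recommends for new code.

```python
import numpy as np

def rows_in_2col(a, b):
    """For each row of a, report whether it also appears as a row of b."""
    if a.shape[1] != 2 or b.shape[1] != 2:
        raise ValueError("the complex-number trick needs exactly two columns")
    # Pack each row into one complex number: column 0 -> real, column 1 -> imag.
    c = a[:, 0] + a[:, 1] * 1j
    d = b[:, 0] + b[:, 1] * 1j
    return np.isin(c, d)

a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([[5, 6], [1, 2], [7, 6]])
print(rows_in_2col(a, b))
# [ True False  True]
```

    Because the values are stored as float64 inside the complex numbers, integers larger than 2**53 can lose precision and produce false matches.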
