Check for identical rows in different NumPy arrays

后端未结

How do I compare two arrays row by row and get the result as a boolean array, with one True/False entry per row of the first array?

Given the data:

a = np.array([[1,0],[2,0],[3,1],[


                      
6 Answers

旧巷少年郎 (2020-11-28 15:46)

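You can do it with a list comprehension:

b_rows = b.tolist()
c = np.array([row in b_rows for row in a.tolist()])

The rows are converted to plain lists because, for NumPy arrays, row in b evaluates (b == row).any(), which is True as soon as a single element matches rather than a whole row. This approach will be slower than a pure NumPy approach on large arrays.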
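
The difference between array membership and list membership can be checked with a small sketch. The probe row below is a made-up value chosen for illustration: it is not a row of b, but each of its elements appears somewhere in b.

```python
import numpy as np

a = np.array([[1, 0], [2, 0], [3, 1], [4, 2]])
b = np.array([[1, 0], [2, 0], [4, 2]])

# Hypothetical probe row: not a row of b, but it shares elements with rows of b.
probe = np.array([1, 2])

# Array membership is element-wise: it evaluates (b == probe).any().
elementwise_hit = probe in b

# List membership compares whole rows.
b_rows = b.tolist()
row_hit = probe.tolist() in b_rows

# Row-wise mask for a, built from whole-row comparisons.
mask = np.array([row in b_rows for row in a.tolist()])

print(elementwise_hit, row_hit)  # True False
print(mask)                      # [ True  True False  True]
```

The element-wise test reports a hit for the probe even though no row of b equals it, which is why the rows are compared as lists here.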
                                                                        
                                                        
            
            
              
                
野性不改 (2020-11-28 16:00)

Here's a vectorised solution:

res = (a[:, None] == b).all(-1).any(-1)

print(res)

[ True  True False  True]


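Note that a[:, None] == b compares each row of a with every row of b element-wise, giving an array of shape (len(a), len(b), number of columns). all(-1) then marks the row pairs that agree in every column, and any(-1) checks whether each row of a has at least one such match: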

print(a[:, None] == b)

[[[ True  True]
  [False  True]
  [False False]]

 [[False  True]
  [ True  True]
  [False False]]

 [[False False]
  [False False]
  [False False]]

 [[False False]
  [False False]
  [ True  True]]]
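
The broadcasting steps can be wrapped in a small helper. This is a minimal sketch; the function name rows_in is made up for illustration and is not part of NumPy.

```python
import numpy as np

def rows_in(a, b):
    """Return a boolean mask: mask[i] is True when a[i] equals some row of b."""
    # a[:, None, :] has shape (n, 1, k) and b has shape (m, k).
    # Broadcasting yields an (n, m, k) array of element-wise comparisons.
    eq = a[:, None, :] == b
    # all(-1) -> (n, m): True where row i of a equals row j of b.
    # any(-1) -> (n,):   True where row i of a matches at least one row of b.
    return eq.all(-1).any(-1)

a = np.array([[1, 0], [2, 0], [3, 1], [4, 2]])
b = np.array([[1, 0], [2, 0], [4, 2]])

eq = a[:, None, :] == b
print(eq.shape)       # (4, 3, 2)
print(rows_in(a, b))  # [ True  True False  True]
```

The intermediate array holds len(a) * len(b) * columns booleans, so memory use grows with the product of the two row counts.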

                                                                        
                                                        
            
            
              
                
后悔当初 (2020-11-28 16:00)

You can use scipy's cdist, which has a few advantages:

import numpy as np
from scipy.spatial.distance import cdist

a = np.array([[1,0],[2,0],[3,1],[4,2]])
b = np.array([[1,0],[2,0],[4,2]])

c = cdist(a, b)==0
print(c.any(axis=1))


[ True  True False  True]


print(a[c.any(axis=1)])


[[1 0]
 [2 0]
 [4 2]]


cdist also accepts a callable as the metric, so you can specify your own function to do whatever comparison you need. The result is a float array in which 1.0 marks a matching pair of rows:

c = cdist(a, b, lambda u, v: (u==v).all())
print(c)


[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 0.]
 [0. 0. 1.]]


Now you can find which indices match, which also reveals multiple matches.

# Array with multiple instances
a2 = np.array([[1,0],[2,0],[3,1],[4,2],[3,1],[4,2]])

c2 = cdist(a2, b, lambda u, v: (u==v).all())
print(c2)

# idx[0] holds row indices in a2, idx[1] the matching row indices in b
idx = np.where(c2==1)
print(idx)

# rows of a2 that are equal to b[2]
print(idx[0][idx[1]==2])


[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 0.]
 [0. 0. 1.]
 [0. 0. 0.]
 [0. 0. 1.]]
(array([0, 1, 3, 5], dtype=int64), array([0, 1, 2, 2], dtype=int64))
[3 5]
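
If SciPy is not available, the same index lookup can be sketched with NumPy alone. This swaps cdist for a broadcast comparison, so it is an alternative rather than the method above; the variable names are illustrative.

```python
import numpy as np

a2 = np.array([[1, 0], [2, 0], [3, 1], [4, 2], [3, 1], [4, 2]])
b = np.array([[1, 0], [2, 0], [4, 2]])

# match[i, j] is True when a2[i] equals b[j] in every column.
match = (a2[:, None, :] == b).all(-1)

# Index pairs (row in a2, row in b) for every match.
a_idx, b_idx = np.nonzero(match)
pairs = list(zip(a_idx.tolist(), b_idx.tolist()))
print(pairs)  # [(0, 0), (1, 1), (3, 2), (5, 2)]

# Rows of a2 that are equal to b[2].
rows_matching_last = a_idx[b_idx == 2]
print(rows_matching_last)  # [3 5]
```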

                                                                        
                                                        
            
            
              
                
误落风尘 (2020-11-28 16:06)

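You can use NumPy's apply_along_axis, which applies a function to every 1-D slice taken along the given axis. With axis=1 on a 2-D array, the function is called once per row:

import numpy as np
a = np.array([[1,0],[2,0],[3,1],[4,2]])
b = np.array([[1,0],[2,0],[4,2]])
# (other == row).all(axis=1) marks the rows of b equal to row;
# a plain "row in other" would already match on a single shared element
c = np.apply_along_axis(lambda row, other: (other == row).all(axis=1).any(), 1, a, b)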

                                                                        
                                                        
            
            
              
                
青春惊慌失措 (2020-11-28 16:11)

Approach #1

We could use a view based vectorized solution -

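import numpy as np

# https://stackoverflow.com/a/45313353/ @Divakar
def view1D(a, b): # a, b are 2D arrays with the same dtype and number of columns
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    # one void element per row, as wide as the whole row in bytes
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(),  b.view(void_dt).ravel()

A,B = view1D(a,b)
out = np.isin(A,B) # True where a row of a also occurs in b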


Sample run -

In [8]: a
Out[8]: 
array([[1, 0],
       [2, 0],
       [3, 1],
       [4, 2]])

In [9]: b
Out[9]: 
array([[1, 0],
       [2, 0],
       [4, 2]])

In [10]: A,B = view1D(a,b)

In [11]: np.isin(A,B)
Out[11]: array([ True,  True, False,  True])
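
To make the view idea more concrete, here is a minimal sketch that builds the same kind of void view inline and then compares rows through their raw bytes. Using a Python set of byte strings in place of np.isin is a substitution made for illustration, not the approach above, and it assumes a and b share the same dtype and column count.

```python
import numpy as np

a = np.array([[1, 0], [2, 0], [3, 1], [4, 2]])
b = np.array([[1, 0], [2, 0], [4, 2]])

# One void element per row, spanning all the bytes of that row.
row_bytes = a.dtype.itemsize * a.shape[1]
void_dt = np.dtype((np.void, row_bytes))
A = np.ascontiguousarray(a).view(void_dt).ravel()
B = np.ascontiguousarray(b).view(void_dt).ravel()

print(A.shape)  # (4,)

# Equal integer rows have equal bytes, so byte strings can stand in for rows.
b_keys = {row.tobytes() for row in B}
mask = np.array([row.tobytes() in b_keys for row in A])
print(mask)  # [ True  True False  True]
```

Byte equality matches value equality for integers. For floats it can differ in corner cases such as 0.0 versus -0.0, so the view trick is safest on integer data.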


Approach #2

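Alternatively, when every row of b also occurs in a and the rows of a are lexicographically sorted, we could use the same views with searchsorted. The void views are compared by their raw bytes, which is assumed here to agree with the row order (it does for the small non-negative integers in the sample) -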

out = np.zeros(len(A), dtype=bool)
out[np.searchsorted(A,B)] = 1


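If the rows of a are not necessarily lexicographically sorted, sort them through an index array and start from a fresh output array -

sidx = A.argsort()
out = np.zeros(len(A), dtype=bool)
out[sidx[np.searchsorted(A,B,sorter=sidx)]] = 1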
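
To see how the sorter-based indexing maps back to the original positions, here is a minimal sketch on plain 1-D integers, which stand in for the void views A and B. The sample values are made up for illustration, and every element of B is assumed to occur in A.

```python
import numpy as np

# Stand-ins for the 1-D views: A is unsorted, every element of B occurs in A.
A = np.array([30, 10, 40, 20])
B = np.array([10, 40])

# sidx lists the positions of A in ascending order of value.
sidx = A.argsort()

# Positions of B's elements within the sorted version of A.
pos_sorted = np.searchsorted(A, B, sorter=sidx)

# Map those positions back to positions in the original, unsorted A.
pos_original = sidx[pos_sorted]

out = np.zeros(len(A), dtype=bool)
out[pos_original] = True
print(out)  # [False  True  True False]
```

If some element of B were missing from A, searchsorted would still return an insertion position and the wrong entry of out would be set, which is why the approach is limited to the case where all rows of b are in a.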

                                                                        
                                                        
            
            
              
                
旧时难觅i (2020-11-28 16:13)

import numpy as np

a = np.array([[1,0],[2,0],[3,1],[4,2]])
b = np.array([[1,0],[2,0],[4,2]])

i = 0
j = 0
result = []


We can take advantage of the fact that the rows of both arrays are sorted and do this in O(len(a) + len(b)) time. Using two pointers, we advance whichever pointer has fallen behind:

while i < len(a) and j < len(b):
    if tuple(a[i])== tuple(b[j]):
        result.append(True)
        i += 1
        j += 1 # get rid of this depending on how you want to handle duplicates
    elif tuple(a[i]) > tuple(b[j]):
        j += 1
    else:
        result.append(False)
        i += 1


Pad with False if the loop ends before every row of a has been visited.

if len(result) < len(a):
    result.extend([False] * (len(a) - len(result)))

print(result) # [True, True, False, True]
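
The two-pointer steps can also be collected into one function. This is a minimal sketch under the same assumption that both arrays are sorted lexicographically by row; the function name sorted_rows_in is made up for illustration. Unlike the loop above, it keeps j in place after a match, so duplicate rows in a are also reported as matches.

```python
import numpy as np

def sorted_rows_in(a, b):
    """Row-wise membership of a in b, assuming both are lexicographically sorted."""
    a_rows = [tuple(r) for r in a.tolist()]
    b_rows = [tuple(r) for r in b.tolist()]
    result = [False] * len(a_rows)  # rows never matched stay False
    i = j = 0
    while i < len(a_rows) and j < len(b_rows):
        if a_rows[i] == b_rows[j]:
            result[i] = True
            i += 1  # j stays, so a repeated row in a can match again
        elif a_rows[i] > b_rows[j]:
            j += 1  # b has fallen behind
        else:
            i += 1  # a[i] is smaller than every remaining row of b
    return np.array(result, dtype=bool)

a = np.array([[1, 0], [2, 0], [2, 0], [3, 1], [4, 2]])
b = np.array([[1, 0], [2, 0], [4, 2]])
print(sorted_rows_in(a, b))  # [ True  True  True False  True]
```

Pre-filling the result with False removes the need for a separate padding step when b runs out first.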


This answer is adapted from "Better way to find matches in two sorted lists than using for loops?" (Java)
                                                                        
                                                        
            
            
              
                