Python: check if a NumPy array contains any element of another array

长情又很酷 2021-02-13 12:08

What is the best way to check if a NumPy array contains any element of another array?

Example:

array1 = [10,5,4,13,10,1,1,22,7,3,15,9]
array2 = [3,4,9,1
3 answers
  • 2021-02-13 12:53

    You can use the built-in any function with a generator expression:

    >>> array1 = [10,5,4,13,10,1,1,22,7,3,15,9]
    >>> array2 = [3,4,9,10,13,15,16,18,19,20,21,22,23]
    >>> any(i in array2 for i in array1)
    True
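
The generator above scans array2 from the start for every element of array1. A minimal sketch of the same check with a set lookup instead, assuming the elements are hashable (the helper name contains_any is hypothetical, not from the answer):

```python
def contains_any(haystack, needles):
    """Return True if any element of `needles` occurs in `haystack`."""
    lookup = set(haystack)  # one pass to build a hash-based membership table
    # any() stops at the first hit, so later elements are never checked
    return any(item in lookup for item in needles)


array1 = [10, 5, 4, 13, 10, 1, 1, 22, 7, 3, 15, 9]
array2 = [3, 4, 9, 10, 13, 15, 16, 18, 19, 20, 21, 22, 23]
print(contains_any(array2, array1))  # True
print(contains_any(array2, [0, 2]))  # False
```

Building the set costs one extra pass, so this is mainly worth it when the lists are long.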
    
  • 2021-02-13 13:06

    You can try a set intersection:

    >>> array1 = [10,5,4,13,10,1,1,22,7,3,15,9]
    >>> array2 = [3,4,9,10,13,15,16,18,19,20,21,22,23]
    >>> set(array1) & set(array2)
    set([3, 4, 9, 10, 13, 15, 22])
    

    A non-empty result means the two arrays have elements in common.

    An empty set means they share no elements.
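
If only a yes/no answer is needed, the intersection itself does not have to be built. A small sketch using set.isdisjoint, which accepts any iterable as its argument (the variable names has_common and common are illustrative):

```python
array1 = [10, 5, 4, 13, 10, 1, 1, 22, 7, 3, 15, 9]
array2 = [3, 4, 9, 10, 13, 15, 16, 18, 19, 20, 21, 22, 23]

# isdisjoint returns True when the two collections share no element,
# so negating it answers "is there any common element?"
has_common = not set(array1).isdisjoint(array2)
print(has_common)  # True

# The full intersection is still available when the shared values matter
common = set(array1) & set(array2)
print(sorted(common))  # [3, 4, 9, 10, 13, 15, 22]
```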

  • 2021-02-13 13:08

    With pandas, you can use Series.isin:

    import numpy as np
    import pandas as pd

    a1 = np.array([10,5,4,13,10,1,1,22,7,3,15,9])
    a2 = np.array([3,4,9,10,13,15,16,18,19,20,21,22,23])
    
    >>> pd.Series(a1).isin(a2).any()
    True
    

    And using NumPy's in1d function (per the comment from @Norman):

    >>> np.any(np.in1d(a1, a2))
    True
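
Recent NumPy releases also provide np.isin, and np.in1d is deprecated as of NumPy 2.0, so on a current install the same check might look like the sketch below (the name mask is illustrative):

```python
import numpy as np

a1 = np.array([10, 5, 4, 13, 10, 1, 1, 22, 7, 3, 15, 9])
a2 = np.array([3, 4, 9, 10, 13, 15, 16, 18, 19, 20, 21, 22, 23])

# Boolean array with the shape of a1: True where that element occurs in a2
mask = np.isin(a1, a2)

print(bool(mask.any()))  # True
print(a1[mask])          # the elements of a1 that also occur in a2
```

The mask is useful beyond the yes/no answer, since indexing with it returns the matching elements in their original order.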
    

    For small arrays such as those in this example, the set solution is the fastest in the timings below, narrowly ahead of any. For larger arrays with no overlap, the Pandas and NumPy solutions are faster than the pure-Python ones, and np.intersect1d is the fastest of them.

    Small arrays (12-13 elements):

    %timeit set(array1) & set(array2)
    The slowest run took 4.22 times longer than the fastest. This could mean that an intermediate result is being cached 
    1000000 loops, best of 3: 1.69 µs per loop
    
    %timeit any(i in a1 for i in a2)
    The slowest run took 12.29 times longer than the fastest. This could mean that an intermediate result is being cached 
    100000 loops, best of 3: 1.88 µs per loop
    
    %timeit np.intersect1d(a1, a2)
    The slowest run took 10.29 times longer than the fastest. This could mean that an intermediate result is being cached 
    100000 loops, best of 3: 15.6 µs per loop
    
    %timeit np.any(np.in1d(a1, a2))
    10000 loops, best of 3: 27.1 µs per loop
    
    %timeit pd.Series(a1).isin(a2).any()
    10000 loops, best of 3: 135 µs per loop
    

    Using arrays with 100k elements (no overlap):

    a3 = np.random.randint(0, 100000, 100000)
    a4 = a3 + 100000
    
    %timeit np.intersect1d(a3, a4)
    100 loops, best of 3: 13.8 ms per loop    
    
    %timeit pd.Series(a3).isin(a4).any()
    100 loops, best of 3: 18.3 ms per loop
    
    %timeit np.any(np.in1d(a3, a4))
    100 loops, best of 3: 18.4 ms per loop
    
    %timeit set(a3) & set(a4)
    10 loops, best of 3: 23.6 ms per loop
    
    %timeit any(i in a3 for i in a4)
    1 loops, best of 3: 34.5 s per loop
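
The %timeit lines above only run inside IPython. A rough sketch of a similar comparison with the standard timeit module follows. The seed, the candidates dictionary, and the use of np.isin and .tolist() are choices made for this sketch rather than part of the original benchmark, and the absolute numbers will differ by machine and library version.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)          # fixed seed, chosen arbitrarily
a3 = rng.integers(0, 100_000, 100_000)  # values in [0, 100000)
a4 = a3 + 100_000                       # shifted copy, so no overlap with a3

# Each candidate returns a plain bool: "do a3 and a4 share any element?"
candidates = {
    "np.intersect1d": lambda: np.intersect1d(a3, a4).size > 0,
    "np.isin": lambda: bool(np.isin(a3, a4).any()),
    # .tolist() converts to Python ints first (the original used set(a3))
    "set": lambda: bool(set(a3.tolist()) & set(a4.tolist())),
}

for name, func in candidates.items():
    assert func() is False              # every variant must agree: no overlap
    best = min(timeit.repeat(func, number=1, repeat=3))
    print(f"{name}: {best * 1e3:.1f} ms")
```

The any(...) generator variant is left out here because, as the last timing above shows, it takes tens of seconds at this size.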
    