NumPy: Comparing Elements in Two Arrays

后端 未结 6 376
不思量自难忘°
不思量自难忘° 2020-12-02 12:57

Anyone ever come up to this problem? Let\'s say you have two arrays like the following

a = array([1,2,3,4,5,6])
b = array([1,4,5])

Is there

相关标签:
6条回答
  • 2020-12-02 13:41

    Use np.intersect1d.

    #!/usr/bin/env python
    import numpy as np
    a = np.array([1,2,3,4,5,6])
    b = np.array([1,4,5])
    c=np.intersect1d(a,b)
    print(c)
    # [1 4 5]
    

    Note that np.intersect1d gives the wrong answer if a or b have nonunique elements. In that case use np.intersect1d_nu.

    There is also np.setdiff1d, setxor1d, setmember1d, and union1d. See Numpy Example List With Doc

    0 讨论(0)
  • 2020-12-02 13:49

    Your example implies set-like behavior, caring more about existance in the array than having the right element at the right place. Numpy does this differently with its mathematical arrays and matrices, it will tell you only about items at the exact right spot. Can you make that work for you?

    >>> import numpy
    >>> a = numpy.array([1,2,3])
    >>> b = numpy.array([1,3,3])
    >>> a == b
    array([ True, False,  True], dtype=bool)
    
    0 讨论(0)
  • 2020-12-02 13:51

    Thanks for your reply kaizer.se. It's not quite what I was looking for, but with a suggestion from a friend and what you said I came up with the following.

    import numpy as np
    
    a = np.array([1,4,5]).astype(np.float32)
    b = np.arange(10).astype(np.float32)
    
    # Assigning matching values from a in b as np.nan
    b[b.searchsorted(a)] = np.nan
    
    # Now generating Boolean arrays
    match = np.isnan(b)
    nonmatch = match == False
    

    It's a bit of a cumbersome process, but it beats writing loops or using weave with loops.

    Cheers

    0 讨论(0)
  • 2020-12-02 13:52

    ebresset, your answer won't work unless a is a subset of b (and a and b are sorted). Otherwise the searchsorted will return false indices. I had to do something similar, and combining that with your code:

    # Assume a and b are sorted
    idxs = numpy.mod(b.searchsorted(a),len(b))
    idxs = idxs[b[idxs]==a]
    b[idxs] = numpy.nan
    match = numpy.isnan(b)
    
    0 讨论(0)
  • 2020-12-02 13:55

    Actually, there's an even simpler solution than any of these:

    import numpy as np
    
    a = array([1,2,3,4,5,6])
    b = array([1,4,5])
    
    c = np.in1d(a,b)
    

    The resulting c is then:

    array([ True, False, False,  True,  True, False], dtype=bool)
    
    0 讨论(0)
  • 2020-12-02 13:56

    Numpy has a set function numpy.setmember1d() that works on sorted and uniqued arrays and returns exactly the boolean array that you want. If the input arrays don't match the criteria you'll need to convert to the set format and invert the transformation on the result.

    import numpy as np
    a = np.array([6,1,2,3,4,5,6])
    b = np.array([1,4,5])
    
    # convert to the uniqued form
    a_set, a_inv = np.unique1d(a, return_inverse=True)
    b_set = np.unique1d(b)
    # calculate matching elements
    matches = np.setmea_set, b_set)
    # invert the transformation
    result = matches[a_inv]
    print(result)
    # [False  True False False  True  True False]
    

    Edit: Unfortunately the setmember1d method in numpy is really inefficient. The search sorted and assign method you proposed works faster, but if you can assign directly you might as well assign directly to the result and avoid lots of unnecessary copying. Also your method will fail if b contains anything not in a. The following corrects those errors:

    result = np.zeros(a.shape, dtype=np.bool)
    idxs = a.searchsorted(b)
    idxs = idxs[np.where(idxs < a.shape[0])] # Filter out out of range values
    idxs = idxs[np.where(a[idxs] == b)] # Filter out where there isn't an actual match
    result[idxs] = True
    print(result)
    

    My benchmarks show this at 91us vs. 6.6ms for your approach and 109ms for numpy setmember1d on 1M element a and 100 element b.

    0 讨论(0)
提交回复
热议问题