Removing duplicate columns and rows from a NumPy 2D array

别那么骄傲 2020-11-29 04:55

I'm using a 2D array to store pairs of longitudes+latitudes. At one point, I have to merge two of these 2D arrays, and then remove any duplicated entry. I've been se

6 Answers
  • 2020-11-29 05:08
    >>> import numpy as NP
    >>> # create a 2D NumPy array with some duplicate rows
    >>> A
        array([[1, 1, 1, 5, 7],
               [5, 4, 5, 4, 7],
               [7, 9, 4, 7, 8],
               [5, 4, 5, 4, 7],
               [1, 1, 1, 5, 7],
               [5, 4, 5, 4, 7],
               [7, 9, 4, 7, 8],
               [5, 4, 5, 4, 7],
               [7, 9, 4, 7, 8]])
    
    >>> # first, sort the rows lexicographically so duplicates become contiguous;
    >>> # whole rows are moved, so each row's contents are preserved
    >>> a, b, c, d, e = A.T    # the columns serve as the sort keys passed to lexsort
    >>> ndx = NP.lexsort((a, b, c, d, e))
    >>> ndx
        array([1, 3, 5, 7, 0, 4, 2, 6, 8])
    >>> A = A[ndx,]
    
    >>> # now diff by row: an all-zero row means "same as the previous row"
    >>> A1 = NP.diff(A, axis=0)
    >>> A1
        array([[ 0,  0,  0,  0,  0],
               [ 0,  0,  0,  0,  0],
               [ 0,  0,  0,  0,  0],
               [-4, -3, -4,  1,  0],
               [ 0,  0,  0,  0,  0],
               [ 6,  8,  3,  2,  1],
               [ 0,  0,  0,  0,  0],
               [ 0,  0,  0,  0,  0]])
    
    >>> # boolean mask: True where a row differs from the one before it,
    >>> # i.e. where a new unique row starts; the first row is always kept
    >>> ndx = NP.r_[True, NP.any(A1, axis=1)]
    >>> ndx
        array([ True, False, False, False,  True, False,  True, False, False])
    
    >>> # retrieve the unique rows:
    >>> A[ndx]
        array([[5, 4, 5, 4, 7],
               [1, 1, 1, 5, 7],
               [7, 9, 4, 7, 8]])
    
  • 2020-11-29 05:12

    This should do the trick:

    import numpy as np

    def unique_rows(a):
        # view each row as a single structured element so np.unique compares whole rows
        a = np.ascontiguousarray(a)
        unique_a = np.unique(a.view([('', a.dtype)]*a.shape[1]))
        return unique_a.view(a.dtype).reshape((unique_a.shape[0], a.shape[1]))
    

    Example:

    >>> a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])
    >>> unique_rows(a)
    array([[1, 1],
           [2, 3],
           [5, 4]])
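
On NumPy 1.13 or later, `np.unique` accepts an `axis` argument, which should make the structured-view trick unnecessary for this task. A minimal sketch, reusing the example array above (the name `unique_a` is only illustrative):

```python
import numpy as np

a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])

# axis=0 treats each row as one item; the result is sorted lexicographically
unique_a = np.unique(a, axis=0)
print(unique_a)
```

Like the function above, this returns the rows in sorted order rather than in their original order.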
    
  • 2020-11-29 05:18

    Here's one idea. It'll take a little bit of work but could be quite fast. I'll give you the 1D case and let you figure out how to extend it to 2D. The following function finds the unique elements of a 1D array:

    import numpy as np
    def unique(a):
        a = np.sort(a)
        b = np.diff(a)
        b = np.r_[1, b]
        return a[b != 0]
    

    To extend it to 2D you need to change two things. First, you will need to do the sort yourself; the important thing about the sort is that two identical rows end up next to each other. Second, you'll need something like (b != 0).any(axis), because a row/column differs from its neighbour as soon as any one of its entries differs. Let me know if that's enough to get you started.

    Update: with some help from doug, I think this should work for the 2D case.

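    import numpy as np
    def unique(a):
        # sort rows lexicographically so identical rows end up adjacent
        order = np.lexsort(a.T)
        a = a[order]
        # a row is new if it differs from the previous row in any column
        diff = np.diff(a, axis=0)
        ui = np.ones(len(a), dtype=bool)
        ui[1:] = (diff != 0).any(axis=1)
        return a[ui]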
    
  • 2020-11-29 05:19

    Since you refer to numpy.unique, you don't care about maintaining the original order, correct? Converting to a set, which removes duplicates, and then back to a list is a commonly used idiom:

    >>> x = [(1, 1), (2, 3), (1, 1), (5, 4), (2, 3)]
    >>> y = list(set(x))
    >>> y
    [(5, 4), (2, 3), (1, 1)]
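
A set does not keep the original order. If order matters, one possible variant (a sketch, not part of the answer above) uses `dict.fromkeys`, which keeps the first occurrence of each key in insertion order on Python 3.7+. Rows of a NumPy array are converted to tuples first because arrays are not hashable; the sample data and the name `unique_rows` are illustrative:

```python
import numpy as np

a = np.array([[5, 4], [1, 1], [2, 3], [1, 1], [5, 4]])

# tuples are hashable, so each row can serve as a dict key;
# dict.fromkeys keeps the first occurrence of each key, in order
unique_rows = np.array(list(dict.fromkeys(map(tuple, a))))
print(unique_rows)
```

This loops over the rows in Python, so it is likely to be slower than the sort-based approaches on large arrays.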
    
  • 2020-11-29 05:33

    The numpy_indexed package (disclaimer: I am its author) wraps the solution posted by user545424 in a nice and tested interface, plus many related features:

    import numpy_indexed as npi
    npi.unique(coordskeys)
    
  • 2020-11-29 05:34

    My method turns the 2D array into a 1D complex array, where the real part is the first column and the imaginary part is the second column, and then uses np.unique. Note that this only works with 2 columns.

    import numpy as np 
    def unique2d(a):
        x, y = a.T
        b = x + y*1.0j 
        idx = np.unique(b,return_index=True)[1]
        return a[idx] 
    

    Example:

    >>> a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])
    >>> unique2d(a)
    array([[1, 1],
           [2, 3],
           [5, 4]])
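
The two-column limit can be lifted by letting `np.unique` work on rows directly (the `axis` argument, NumPy 1.13+). The sketch below also sorts the returned first-occurrence indices so that rows come back in their original order; the function name `unique_rows_ordered` and the sample data are hypothetical:

```python
import numpy as np

def unique_rows_ordered(a):
    # return_index gives, for each unique row, the index of its first occurrence
    _, idx = np.unique(a, axis=0, return_index=True)
    # sorting those indices restores the original row order
    return a[np.sort(idx)]

a = np.array([[5, 4, 0], [1, 1, 9], [5, 4, 0], [2, 3, 7], [1, 1, 9]])
print(unique_rows_ordered(a))
```

Dropping the `np.sort` call would return the rows in lexicographic order instead.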
    