Find unique rows in numpy.array

独厮守ぢ 2020-11-21 10:57

I need to find unique rows in a numpy.array.

For example:

>>> a # I have
array([[1, 1, 1, 0, 0, 0],
       [0, 1, 1, 1, 0, 0],
         


        
20 answers
  • 2020-11-21 11:56

    I didn’t like any of these answers because none handle floating-point arrays in a linear algebra or vector space sense, where two rows being “equal” means “within some

  • 2020-11-21 11:56

    Let's convert the entire NumPy matrix to a list, drop the duplicates from that list, and finally turn the unique list back into a NumPy matrix:

    import numpy as np

    # data is the 2-D array to deduplicate
    matrix_as_list = data.tolist()
    # matrix_as_list:
    # [[1, 1, 1, 0, 0, 0], [0, 1, 1, 1, 0, 0], [0, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 1, 0]]

    # Keep only the first occurrence of each row, preserving the original order.
    uniq_list = []
    for item in matrix_as_list:
        if item not in uniq_list:
            uniq_list.append(item)

    unique_matrix = np.array(uniq_list)
    # unique_matrix:
    # array([[1, 1, 1, 0, 0, 0],
    #        [0, 1, 1, 1, 0, 0],
    #        [1, 1, 1, 1, 1, 0]])
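
The membership test `item not in uniq_list` scans the whole list for every row, so the loop above is quadratic in the number of distinct rows. A minimal sketch of the same order-preserving idea with a set of tuples for the lookup follows; the function name `unique_rows_ordered` and the sample `data` array are assumptions made for illustration, not part of the original answer.

```python
import numpy as np


def unique_rows_ordered(data):
    """Return the distinct rows of a 2-D array in first-seen order."""
    seen = set()   # row tuples that have already been kept
    kept = []      # first occurrence of each distinct row
    for row in data.tolist():
        key = tuple(row)  # lists are unhashable, tuples can go in a set
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return np.array(kept)


# Sample input matching the list shown in the answer above.
data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 1, 1, 1, 0, 0],
                 [0, 1, 1, 1, 0, 0],
                 [1, 1, 1, 0, 0, 0],
                 [1, 1, 1, 1, 1, 0]])
unique_matrix = unique_rows_ordered(data)
```

Set lookups take constant time on average, so this variant should scale roughly linearly with the number of rows while returning the same rows in the same order.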
    
  • 2020-11-21 11:57

    Why not use drop_duplicates from pandas? In the timings below it is considerably faster than the set-of-tuples approach:

    >>> timeit pd.DataFrame(image.reshape(-1,3)).drop_duplicates().values
    1 loops, best of 3: 3.08 s per loop
    
    >>> timeit np.vstack({tuple(r) for r in image.reshape(-1,3)})
    1 loops, best of 3: 51 s per loop
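
The timings above use an `image` array that is not shown. As a small self-contained sketch of the same pandas call, here it is applied to a five-row sample array; the array `a` and the variable name `unique_rows` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample input: rows 1 and 2 are duplicates, as are rows 0 and 3.
a = np.array([[1, 1, 1, 0, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0, 0],
              [1, 1, 1, 1, 1, 0]])

# drop_duplicates keeps the first occurrence of each row by default,
# so the original row order is preserved.
unique_rows = pd.DataFrame(a).drop_duplicates().to_numpy()
```

Unlike `numpy.unique`, this does not sort the rows, which may or may not be what you want.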
    
  • 2020-11-21 11:58

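    Yet another possible solution (the set is wrapped in list() because recent NumPy versions require a sequence for np.vstack):

    np.vstack(list({tuple(row) for row in a}))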
    
  • 2020-11-21 12:01

    I've compared the suggested alternatives for speed and found that, surprisingly, the void-view unique solution is even a bit faster than numpy's native unique with the axis argument. If you're looking for speed, you'll want

    numpy.unique(
        a.view(numpy.dtype((numpy.void, a.dtype.itemsize*a.shape[1])))
        ).view(a.dtype).reshape(-1, a.shape[1])
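
To unpack the one-liner: each row is reinterpreted as a single opaque block of bytes (a `numpy.void` item), so `numpy.unique` only has to deduplicate a 1-D array. The sketch below spells out the steps; the function name `unique_rows_void_view`, the added `ascontiguousarray` call, and the sample array are assumptions made for illustration. Rows are compared byte for byte, so for floating-point data values such as `0.0` and `-0.0` would count as different.

```python
import numpy as np


def unique_rows_void_view(a):
    """Deduplicate the rows of a 2-D array by viewing each row as raw bytes."""
    # The view trick needs each row to occupy one contiguous run of memory.
    a = np.ascontiguousarray(a)
    # One void item spans exactly one row: itemsize * number of columns bytes.
    row_dtype = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    packed = a.view(row_dtype)      # shape (n_rows, 1), one item per row
    unique_packed = np.unique(packed)
    # Reinterpret the surviving byte blocks as the original dtype and shape.
    return unique_packed.view(a.dtype).reshape(-1, a.shape[1])


# Hypothetical sample input with three distinct rows.
a = np.array([[1, 1, 1, 0, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0, 0],
              [1, 1, 1, 1, 1, 0]])
result = unique_rows_void_view(a)
```

The rows come back sorted by their byte representation, which need not match the numerical row order that `numpy.unique(a, axis=0)` produces.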
    


    Code to reproduce the plot:

    import numpy
    import perfplot
    
    
    def unique_void_view(a):
        return numpy.unique(
            a.view(numpy.dtype((numpy.void, a.dtype.itemsize*a.shape[1])))
            ).view(a.dtype).reshape(-1, a.shape[1])
    
    
    def lexsort(a):
        ind = numpy.lexsort(a.T)
        return a[ind[
            numpy.concatenate((
                [True], numpy.any(a[ind[1:]] != a[ind[:-1]], axis=1)
                ))
            ]]
    
    
    def vstack(a):
        # Wrap the set in list(): recent NumPy versions require a sequence here.
        return numpy.vstack(list({tuple(row) for row in a}))
    
    
    def unique_axis(a):
        return numpy.unique(a, axis=0)
    
    
    perfplot.show(
        setup=lambda n: numpy.random.randint(2, size=(n, 20)),
        kernels=[unique_void_view, lexsort, vstack, unique_axis],
        n_range=[2**k for k in range(15)],
        logx=True,
        logy=True,
        xlabel='len(a)',
        equality_check=None
        )
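
The benchmark passes `equality_check=None`, presumably because the kernels return the same rows in different orders. As a hedged sketch of how one might check agreement separately, the snippet below restates the lexsort kernel step by step and compares it with `numpy.unique(a, axis=0)` as sets of rows; the names `unique_rows_lexsort` and `as_row_set` and the random test input are assumptions made for illustration.

```python
import numpy as np


def unique_rows_lexsort(a):
    """Sort the rows, then keep each row that differs from its predecessor."""
    # lexsort treats the last key as primary, so rows end up ordered
    # by the last column first.
    order = np.lexsort(a.T)
    sorted_rows = a[order]
    # After sorting, duplicates are adjacent: a row starts a new group
    # when any element differs from the row before it.
    is_new = np.concatenate(
        ([True], np.any(sorted_rows[1:] != sorted_rows[:-1], axis=1))
    )
    return sorted_rows[is_new]


def as_row_set(rows):
    """Order-insensitive representation of a 2-D array's rows."""
    return {tuple(r) for r in rows.tolist()}


rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=(200, 4))  # many duplicates among 200 binary rows

same_rows = as_row_set(unique_rows_lexsort(a)) == as_row_set(np.unique(a, axis=0))
```

Comparing sets rather than arrays sidesteps the ordering difference between the two methods.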
    
  • 2020-11-21 12:01

    The numpy_indexed package (disclaimer: I am its author) wraps the solution posted by Jaime in a nice and tested interface, plus many more features:

    import numpy_indexed as npi
    new_a = npi.unique(a)  # unique elements over axis=0 (rows) by default
    