Numpy: Row Wise Unique elements

Tag: backend · 5 answers · 993 views
Asked by 名媛妹妹, 2020-12-18 10:43

Does anyone know how to get the unique elements of each row in a matrix? For example, the input matrix may look like:

a = [[1,2,1,3,4,1,3],
     [5,5,3,1,5,1,2],
     [1,2,3,4,5,6,7],
     [9,3,8,2,9,8,4],
     [4,6,7,4,2,3,5]]
5 Answers
  • 2020-12-18 11:21

    The most efficient method may depend on the number of rows in the array. Methods that use map or a for-loop to handle each row separately are good when the number of rows is not too large, but if there are many rows you can do better by using a NumPy trick to handle the entire array with one call to np.unique.

    The trick is to add a unique imaginary number to each row. That way, when you call np.unique, the floats in the original array will be recognized as different values if they occur in different rows, but treated as the same value if they occur in the same row.

    Assuming the values in a are floats, the trick is implemented in the function using_complex:

    def using_complex(a):
        # one imaginary offset per row, so np.unique only collapses
        # duplicates that share a row
        weight = 1j*np.linspace(0, a.shape[1], a.shape[0], endpoint=False)
        b = a + weight[:, np.newaxis]
        u, ind = np.unique(b, return_index=True)
        b = np.zeros_like(a)
        np.put(b, ind, a.flat[ind])
        return b
    
    In [46]: using_complex(a)
    Out[46]: 
    array([[1, 2, 0, 3, 4, 0, 0],
           [5, 0, 3, 1, 0, 0, 2],
           [1, 2, 3, 4, 5, 6, 7],
           [9, 3, 8, 2, 0, 0, 4],
           [4, 6, 7, 0, 2, 3, 5]])
    

    Note that using_complex does not return the unique values in the same order as rowWiseUnique; per the comments underneath the question, sorting the values is not required.

    Here is a benchmark comparing rowWiseUnique (the original method) with using_complex and solve:

    In [87]: arr = np.random.randint(10, size=(100000, 10))
    
    In [88]: %timeit rowWiseUnique(arr)
    1 loops, best of 3: 1.34 s per loop
    
    In [89]: %timeit solve(arr)
    1 loops, best of 3: 1.78 s per loop
    
    In [90]: %timeit using_complex(arr)
    1 loops, best of 3: 206 ms per loop
    

    The full code used for the benchmark:
    

    import numpy as np
    
    a = np.array([[1,2,1,3,4,1,3],
         [5,5,3,1,5,1,2],
         [1,2,3,4,5,6,7],
         [9,3,8,2,9,8,4],
         [4,6,7,4,2,3,5]])
    
    def using_complex(a):
        weight = 1j*np.linspace(0, a.shape[1], a.shape[0], endpoint=False)
        b = a + weight[:, np.newaxis]
        u, ind = np.unique(b, return_index=True)
        b = np.zeros_like(a)
        np.put(b, ind, a.flat[ind])
        return b
    
    def rowWiseUnique(a):
        # wrap map in list() so this also works on Python 3,
        # where map returns an iterator
        b = np.asarray(list(map(uniqueRowElements, a)))
        return b
    
    def uniqueRowElements(row):
        length = row.shape[0]
        newRow = np.unique(row)
        zerosNumb = length-newRow.shape[0]
        zeros = np.zeros(zerosNumb)
        nR = np.concatenate((newRow,zeros),axis=0)
        return nR    
    
    def solve(arr):
        n = arr.shape[1]
        new_arr = np.empty(arr.shape)
        for i, row in enumerate(arr):
            new_row = np.unique(row)
            new_arr[i] = np.hstack((new_row, np.zeros(n - len(new_row))))
        return new_arr
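
    To make the imaginary-offset trick concrete, here is a minimal two-row sketch (the 2x3 array is made up for illustration):

```python
import numpy as np

# Toy example: the value 7 appears in both rows.
a = np.array([[7, 7, 1],
              [7, 2, 2]])

# Give each row its own imaginary offset.
weight = 1j * np.arange(a.shape[0])
b = a + weight[:, np.newaxis]

# Row 0 holds 7+0j, row 1 holds 7+1j: np.unique now keeps one 7
# per row, while duplicates within a row still collapse.
print(np.unique(b))   # four values: 1.+0.j, 2.+1.j, 7.+0.j, 7.+1.j
```

    Four values survive: both rows keep their 7, but the duplicated 7 in row 0 and the duplicated 2 in row 1 are removed.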
    
  • 2020-12-18 11:24

    A variation on the OP's solution with a slight improvement: using numpy.apply_along_axis is ~3% faster on large (1000x1000) arrays, but still a bit slower than @Ashwini's solution.

    def foo(row):
        b = np.zeros(row.shape)
        u = np.unique(row)
        b[:u.shape[0]] = u
        return b
    
    b = np.apply_along_axis(foo, 1, a)
    

    The timing ratios are somewhat closer when the rows contain duplicates, e.g. a = np.random.randint(0, 501, (1000, 1000)) (the original test used np.random.random_integers, which has since been removed from NumPy).

  • 2020-12-18 11:24

    The fastest way should be to set all duplicates to zero by sorting each row and comparing adjacent elements:

    def row_unique(a):
        unique = np.sort(a)
        duplicates = unique[:,  1:] == unique[:, :-1]
        unique[:, 1:][duplicates] = 0
        return unique
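
    A self-contained sketch of this approach on the question's first two rows; note that the zeros land where the duplicates were after sorting, not at the end of the row:

```python
import numpy as np

def row_unique(a):
    unique = np.sort(a)                           # sorted copy of each row
    duplicates = unique[:, 1:] == unique[:, :-1]  # equal neighbors = repeats
    unique[:, 1:][duplicates] = 0                 # zero out the repeats
    return unique

a = np.array([[1, 2, 1, 3, 4, 1, 3],
              [5, 5, 3, 1, 5, 1, 2]])
print(row_unique(a))
# [[1 0 0 2 3 0 4]
#  [1 0 2 3 5 0 0]]
```

    If the zeros must be pushed to the end of each row, as in the OP's rowWiseUnique, an extra per-row rearrangement is needed on top of this.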
    

    This is about 3 times as fast as unutbu's solution on my computer:

    In [26]: a = np.random.randint(1, 101, size=100000).reshape(1000, 100)
    
    In [27]: %timeit row_unique(a)
    100 loops, best of 3: 3.18 ms per loop
    
    In [28]: %timeit using_complex(a)
    100 loops, best of 3: 15.4 ms per loop
    
    In [29]: assert np.all(np.sort(using_complex(a)) == np.sort(row_unique(a)))
    
  • 2020-12-18 11:37

    This is not very efficient, because moving all the zeros to the end of each row cannot be done cheaply, but it is simple:

    import numpy as np
    
    a = np.array([[1,2,1,3,4,1,3],
         [5,5,3,1,5,1,2],
         [1,2,3,4,5,6,7],
         [9,3,8,2,9,8,4],
         [4,6,7,4,2,3,5]])
    
    row_len = len(a[0])
    
    for r in range(len(a)):
        found = set()
        for i in range(row_len):
            if a[r, i] not in found:
                found.add(a[r, i])
            else:
                a[r, i] = 0      # zero out duplicates
        a[r].sort()              # ascending sort puts the zeros first...
        a[r] = a[r][::-1]        # ...then reverse so they end up last
    
    print(a)
    

    Output:

    [[4 3 2 1 0 0 0]
     [5 3 2 1 0 0 0]
     [7 6 5 4 3 2 1]
     [9 8 4 3 2 0 0]
     [7 6 5 4 3 2 0]]
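
    The same zero-out-then-push-to-the-end result can be had without the Python loop; a sketch combining the sort/compare deduplication from the answer above with a descending sort:

```python
import numpy as np

def unique_desc(a):
    s = np.sort(a, axis=1)                  # ascending sort per row
    s[:, 1:][s[:, 1:] == s[:, :-1]] = 0     # zero out repeated neighbors
    return -np.sort(-s, axis=1)             # descending sort pushes zeros last

a = np.array([[1, 2, 1, 3, 4, 1, 3],
              [5, 5, 3, 1, 5, 1, 2]])
print(unique_desc(a))
# [[4 3 2 1 0 0 0]
#  [5 3 2 1 0 0 0]]
```

    This reproduces the first two rows of the output above, with the whole array handled by three vectorized NumPy calls.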
    
  • 2020-12-18 11:45

    You can do something like this:

    def solve(arr):
        n = arr.shape[1]
        new_arr = np.empty(arr.shape)
        for i, row in enumerate(arr):
            new_row = np.unique(row)
            new_arr[i] = np.hstack((new_row, np.zeros(n - len(new_row))))
        return new_arr
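
    One caveat worth flagging: np.empty(arr.shape) defaults to float64, so solve returns floats even for integer input. A sketch of a variant (here named solve_like for illustration) that preserves the input dtype via np.empty_like:

```python
import numpy as np

def solve_like(arr):
    n = arr.shape[1]
    new_arr = np.empty_like(arr)   # keeps arr's dtype (e.g. int64)
    for i, row in enumerate(arr):
        new_row = np.unique(row)
        pad = np.zeros(n - len(new_row), dtype=arr.dtype)
        new_arr[i] = np.hstack((new_row, pad))
    return new_arr

a = np.array([[1, 2, 1, 3, 4, 1, 3],
              [5, 5, 3, 1, 5, 1, 2]])
print(solve_like(a))
# [[1 2 3 4 0 0 0]
#  [1 2 3 5 0 0 0]]
```

    The integer output avoids a surprise float conversion when the result is used for indexing or comparison later.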
    

    This is around 4x faster than the OP's current code for a 1000 x 1000 array:

    >>> arr = np.arange(1000000).reshape(1000, 1000)
    >>> %timeit b = map(uniqueRowElements, arr); b = np.asarray(b)
    10 loops, best of 3: 71.2 ms per loop
    >>> %timeit solve(arr)
    100 loops, best of 3: 16.6 ms per loop
    