Delete element from multi-dimensional numpy array by value

抹茶落季 2020-12-18 10:06

Given a numpy array

a = np.array([[0, -1, 0], [1, 0, 0], [1, 0, -1]])

What's the fastest way to delete all elements with the value -1?

5 answers
  • 2020-12-18 10:55

    How about this?

    print([[y for y in x if y != -1] for x in a.tolist()])
    [[0, 0], [1, 0, 0], [1, 0]]
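The comprehension works row by row because a single vectorized deletion cannot keep the 2-D shape once the rows end up with different lengths. A minimal sketch on the question's array (the names flat and per_row are just illustrative) showing that a plain boolean mask flattens the result, while masking each row separately keeps the rows apart:

```python
import numpy as np

a = np.array([[0, -1, 0], [1, 0, 0], [1, 0, -1]])

# A 2-D boolean mask selects the kept values in row-major order and returns
# them as a 1-D array, so the row structure is lost.
flat = a[a != -1]
print(flat.tolist())  # [0, 0, 1, 0, 0, 1, 0]
print(flat.shape)     # (7,)

# Applying the mask one row at a time keeps each row separate.
per_row = [row[row != -1].tolist() for row in a]
print(per_row)        # [[0, 0], [1, 0, 0], [1, 0]]
```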
    
  • 2020-12-18 10:56

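    Approach #1: Using NumPy array splitting -

    import numpy as np

    def split_based(a, val):
        mask = a != val
        # Split the kept values at the cumulative per-row counts
        p = np.split(a[mask], mask.sum(1)[:-1].cumsum())
        # dtype=object is required for ragged rows in NumPy >= 1.24
        out = np.array(list(map(list, p)), dtype=object)
        return out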
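The split points used by split_based can be traced on the question's 3x3 array. A minimal sketch of the same steps, with the intermediate values spelled out (counts, split_points and pieces are illustrative names, not part of the answer):

```python
import numpy as np

a = np.array([[0, -1, 0], [1, 0, 0], [1, 0, -1]])
mask = a != -1

# Number of kept values in each row: [2, 3, 2]
counts = mask.sum(axis=1)

# Split points into the flattened kept values: running total of every row
# except the last, i.e. [2, 5]
split_points = counts[:-1].cumsum()

# a[mask] is the row-major sequence [0, 0, 1, 0, 0, 1, 0]; cutting it at the
# split points gives back one piece per row.
pieces = np.split(a[mask], split_points)

print(counts.tolist())               # [2, 3, 2]
print(split_points.tolist())         # [2, 5]
print([p.tolist() for p in pieces])  # [[0, 0], [1, 0, 0], [1, 0]]
```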
    

    Approach #2: Using a list comprehension, with minimal work inside the loop -

    def loop_compr_based(a, val):
        mask = a != val
        # End and start offsets of each row's kept values
        stop = mask.sum(1).cumsum()
        start = np.append(0, stop[:-1])
        # Convert once to a Python list so the loop only slices
        am = a[mask].tolist()
        out = np.array([am[start[i]:stop[i]] for i in range(len(start))], dtype=object)
        return out
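The start/stop offsets in loop_compr_based can likewise be checked on the question's array. A minimal sketch of the same logic, assuming it is acceptable to iterate with zip instead of an index and to return a plain list of lists rather than an object array:

```python
import numpy as np

a = np.array([[0, -1, 0], [1, 0, 0], [1, 0, -1]])
mask = a != -1

# End offset of each row's kept values in the flattened result: [2, 5, 7]
stop = mask.sum(axis=1).cumsum()
# Start offset of each row: 0 followed by every end offset but the last: [0, 2, 5]
start = np.append(0, stop[:-1])

# Converting once to a Python list keeps the per-row work to plain slicing.
kept = a[mask].tolist()
rows = [kept[s:e] for s, e in zip(start, stop)]

print(stop.tolist(), start.tolist())  # [2, 5, 7] [0, 2, 5]
print(rows)                           # [[0, 0], [1, 0, 0], [1, 0]]
```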
    

    Sample run -

    In [391]: a
    Out[391]: 
    array([[ 0, -1,  0],
           [ 1,  0,  0],
           [ 1,  0, -1],
           [-1, -1,  8],
           [ 3,  7,  2]])
    
    In [392]: split_based(a, val=-1)
    Out[392]: array([[0, 0], [1, 0, 0], [1, 0], [8], [3, 7, 2]], dtype=object)
    
    In [393]: loop_compr_based(a, val=-1)
    Out[393]: array([[0, 0], [1, 0, 0], [1, 0], [8], [3, 7, 2]], dtype=object)
    

    Runtime test -

    In [387]: a = np.random.randint(-2,10,(1000,1000))
    
    In [388]: %timeit split_based(a, val=-1)
    10 loops, best of 3: 161 ms per loop
    
    In [389]: %timeit loop_compr_based(a, val=-1)
    10 loops, best of 3: 29 ms per loop
    
  • 2020-12-18 10:57

    Another method you might consider:

    def iterative_numpy(a):
        mask = a != -1
        # Boolean-index each row with its own mask; rows may differ in length
        out = np.array([a[i, mask[i]] for i in range(a.shape[0])], dtype=object)
        return out
    

    Divakar's method loop_compr_based computes sums along the rows of mask and then a cumulative sum of that result. This method avoids those summations but still has to iterate through the rows of a. It also returns an array of arrays, which has the annoyance that out must be indexed as out[1][2] rather than out[1,2]. Comparing timings on random integer matrices of different shapes:
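The indexing annoyance can be seen directly. A minimal sketch, assuming an object array built element by element (the names rows, out and two_index_ok are illustrative); filling it one slot at a time stops NumPy from stacking the rows into a 2-D array when they happen to have equal lengths:

```python
import numpy as np

a = np.array([[0, -1, 0], [1, 0, 0], [1, 0, -1]])

# Per-row arrays with -1 removed; their lengths differ (2, 3, 2).
rows = [row[row != -1] for row in a]

# A 1-D object array holding one NumPy array per row.
out = np.empty(len(rows), dtype=object)
for i, row in enumerate(rows):
    out[i] = row

print(out.ndim)   # 1
print(out[1][2])  # 0  (out[1] is the row [1, 0, 0])

# Two-index syntax fails because out has only one dimension.
try:
    out[1, 2]
    two_index_ok = True
except IndexError:
    two_index_ok = False
print(two_index_ok)  # False
```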

    In [4]: a = np.random.random_integers(-1,1, size = (3,30))
    
    In [5]: %timeit iterative_numpy(a)
    100000 loops, best of 3: 11.1 us per loop
    
    In [6]: %timeit loop_compr_based(a)
    10000 loops, best of 3: 20.2 us per loop
    
    In [7]: a = np.random.random_integers(-1,1, size = (30,3))
    
    In [8]: %timeit iterative_numpy(a)
    10000 loops, best of 3: 59.5 us per loop
    
    In [9]: %timeit loop_compr_based(a)
    10000 loops, best of 3: 30.8 us per loop
    
    In [10]: a = np.random.random_integers(-1,1, size = (30,30))
    
    In [11]: %timeit iterative_numpy(a)
    10000 loops, best of 3: 64.6 us per loop
    
    In [12]: %timeit loop_compr_based(a)
    10000 loops, best of 3: 36 us per loop
    

    When there are more columns than rows, iterative_numpy wins. When there are more rows than columns, loop_compr_based wins, although transposing a first (which makes each output piece a column of the original rather than a row) improves the performance of both methods. When the dimensions are comparable, loop_compr_based is best.

    Important Side Discussion

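    Outside of the implementation, it's important to note that a NumPy array built from rows of unequal length is not a true multidimensional array. It is a 1-D array of dtype=object whose elements are references to separate Python objects, so the values do not occupy one contiguous block of memory, and the usual array operations will not work as expected. (Recent NumPy versions, 1.24 and later, require dtype=object to be passed explicitly when building such an array.)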

    As an example:

    >>> a = np.array([[1,2,3],[1,2],[1]], dtype=object)
    >>> a*2
    array([list([1, 2, 3, 1, 2, 3]), list([1, 2, 1, 2]), list([1, 1])], dtype=object)
    

    Notice that numpy actually informs us that this is not the usual numpy array with the note dtype=object.

    Thus it might be best to just make a list of numpy arrays and use them accordingly.
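A list of NumPy arrays, as suggested above, might look like the following minimal sketch (rows, doubled and row_sums are illustrative names). Each row is an ordinary numeric array, so element-wise arithmetic and reductions behave as usual instead of falling back to Python list semantics:

```python
import numpy as np

a = np.array([[0, -1, 0], [1, 0, 0], [1, 0, -1]])

# One ordinary numeric NumPy array per row, with the -1 values removed.
rows = [row[row != -1] for row in a]

# Element-wise arithmetic and reductions work normally on each row.
doubled = [r * 2 for r in rows]
row_sums = [int(r.sum()) for r in rows]

print([r.tolist() for r in doubled])  # [[0, 0], [2, 0, 0], [2, 0]]
print(row_sums)                       # [0, 1, 1]
print(rows[0].dtype == a.dtype)       # True
```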

  • 2020-12-18 11:00

    For almost everything you might want to do with such an array, you can use a masked array:

    a = np.array([[0, -1, 0], [1, 0, 0], [1, 0, -1]])
    
    b=np.ma.masked_equal(a,-1)
    
    b
    Out[5]: 
    masked_array(data =
     [[0 -- 0]
     [1 0 0]
     [1 0 --]],
                 mask =
     [[False  True False]
     [False False False]
     [False False  True]],
           fill_value = -1)
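What "almost everything" covers can be illustrated with a few reductions. A minimal sketch on the same masked array b (the variable names are illustrative): masked entries are skipped, so each row behaves as if the -1 values had been deleted, while the 2-D shape and 2-D indexing are kept:

```python
import numpy as np

a = np.array([[0, -1, 0], [1, 0, 0], [1, 0, -1]])
b = np.ma.masked_equal(a, -1)

# Reductions ignore masked entries.
kept_per_row = b.count(axis=1)  # number of unmasked values per row
row_sums = b.sum(axis=1)        # sum of the unmasked values per row
row_means = b.mean(axis=1)      # mean of the unmasked values per row

print(kept_per_row.tolist())    # [2, 3, 2]
print(row_sums.tolist())        # [0, 1, 1]
print(row_means.tolist())

# The 2-D shape is preserved, so ordinary 2-D indexing still works.
print(b.shape)                  # (3, 3)
print(b[1, 2])                  # 0
```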
    

    If you really want the ragged array, call .compressed() on each row:

    c = np.array([b[i].compressed() for i in range(b.shape[0])], dtype=object)
    
    c
    Out[10]: array([array([0, 0]), array([1, 0, 0]), array([1, 0])], dtype=object)
    
  • 2020-12-18 11:08

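    Use indexes = np.where(a == -1) to get the indices of the matching elements (see "Find indices of elements equal to zero from numpy array").

    Then delete those elements by index with np.delete (see "How to remove specific elements in a numpy array"). Note that without an axis argument np.delete works on the flattened array, so it needs flat indices such as np.flatnonzero(a == -1) rather than the (row, column) tuple returned by np.where, and the result is 1-D.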
