Slicing numpy array with another array

青春惊慌失措 2021-01-17 16:12

I've got a large one-dimensional array of integers I need to take slices off. That's trivial, I'd just do a[start:end]. The problem is that I need more of th

5 answers
  • 2021-01-17 16:41

    It's not a "pure" numpy solution (although as @mgilson's comment notes, it's hard to see how the irregular output could be a numpy array), but:

    import numpy

    a = numpy.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], numpy.int16)
    start = numpy.array([1, 5, 7], numpy.int16)
    end   = numpy.array([2, 10, 9], numpy.int16)
    
    # On Python 3, map() is lazy, so wrap it in list() to get the slices.
    list(map(lambda bounds: a[bounds[0]:bounds[1]], zip(start, end)))
    

    gets you:

    [array([1], dtype=int16), array([5, 6, 7, 8, 9], dtype=int16),  array([7, 8], dtype=int16)]
    

    as required.

  • 2021-01-17 16:44

    NumPy has no built-in method for this, and because the result is irregular it could only ever be a list of arrays or slices anyway. However, every binary ufunc (and almost all NumPy functions are ufuncs, or at least based on them) has a reduceat method. It may let you avoid building the list of slices at all, which also speeds up the calculation when the slices are small:

    In [1]: a = np.arange(10)
    
    In [2]: np.add.reduceat(a, [0,4,7]) # add up 0:4, 4:7 and 7:end
    Out[2]: array([ 6, 15, 24])
    
    In [3]: np.maximum.reduceat(a, [0,4,7]) # maximum of each of those slices
    Out[3]: array([3, 6, 9])
    
    In [4]: w = np.asarray([0,4,7,10]) # 10 for the total length
    
    In [5]: np.add.reduceat(a, w[:-1]).astype(float)/np.diff(w) # equivalent to mean
    Out[5]: array([ 1.5,  5. ,  8. ])
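
    As a rough cross-check of the session above, the sketch below compares reduceat with an explicit loop over the same consecutive slices. The names bounds, fast and slow are only illustrative and do not come from the original answer.

```python
import numpy as np

a = np.arange(10)
bounds = np.array([0, 4, 7, 10])  # slice boundaries; the last entry is len(a)

# One vectorized call: sums of a[0:4], a[4:7] and a[7:10]
fast = np.add.reduceat(a, bounds[:-1])

# The same sums written as an explicit Python loop over the slices
slow = np.array([a[s:e].sum() for s, e in zip(bounds[:-1], bounds[1:])])

print(fast)
print(slow)
```

    Both arrays should hold the same three sums; the reduceat version simply never materializes the individual slices.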
    

    EDIT: Since your slices overlap, I will add that this is OK too:

    # I assume that start is sorted for performance reasons.
    reductions = np.column_stack((start, end)).ravel()
    sums = np.add.reduceat(a, reductions)[::2]
    

    The [::2] should be no big deal here normally, since no real extra work is done for overlapping slices.

    There is one problem here: slices for which stop == len(a). reduceat requires every index to be smaller than len(a), so this case must be avoided. If exactly one slice ends there and it is the last one, you can just do reductions = reductions[:-1]; otherwise you need to append a value to a to trick reduceat:

     a = np.concatenate((a, [0]))
    

    Appending one value to the end does not change the results, since you only work on the slices anyway.
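
    Putting the overlap trick and the padding together might look like the sketch below. slice_sums is a hypothetical helper name, not something from NumPy or from the answer above, and it assumes every slice is non-empty (start < end): for an empty slice reduceat returns the single element at that index rather than 0.

```python
import numpy as np

def slice_sums(a, start, end):
    """Sum a[s:e] for each (s, e) pair via np.add.reduceat (hypothetical helper)."""
    a = np.asarray(a)
    # Pad with one zero so that end == len(a) is still a valid reduceat index.
    padded = np.concatenate((a, [0]))
    # Interleave the bounds: [s0, e0, s1, e1, ...]
    idx = np.column_stack((start, end)).ravel()
    # Even positions hold the sum over a[s_i:e_i]; odd positions are discarded.
    return np.add.reduceat(padded, idx)[::2]

a = np.arange(12, dtype=np.int16)
start = np.array([1, 5, 7])
end = np.array([2, 10, 9])

print(slice_sums(a, start, end))   # sums of a[1:2], a[5:10], a[7:9]
print(slice_sums(a, [10], [12]))   # a slice that ends exactly at len(a)
```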

  • 2021-01-17 16:45

    If you want it in one line, it would be:

    x=[list(a[s:e]) for (s,e) in zip(start,end)]
    
  • 2021-01-17 16:51

    This can (almost?) be done in pure numpy using masked arrays and stride tricks. First, we create our mask:

    >>> indices = numpy.arange(a.size)
    >>> mask = ~((indices >= start[:,None]) & (indices < end[:,None]))
    

    Or more simply:

    >>> mask = (indices < start[:,None]) | (indices >= end[:,None])
    

    The mask is False (i.e. the value is not masked) at those indices that are >= the start value and < the end value. (Indexing with None (aka numpy.newaxis) adds a new dimension, enabling broadcasting.) Now our mask looks like this:

    >>> mask
    array([[ True, False,  True,  True,  True,  True,  True,  True,  True,
             True,  True,  True],
           [ True,  True,  True,  True,  True, False, False, False, False,
            False,  True,  True],
           [ True,  True,  True,  True,  True,  True,  True, False, False,
             True,  True,  True]], dtype=bool)
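
    To make the broadcasting step concrete, here is a small self-contained sketch using the a, start and end values from the first answer. It only checks the shapes involved and which positions each row leaves unmasked.

```python
import numpy as np

a = np.arange(12, dtype=np.int16)
start = np.array([1, 5, 7])
end = np.array([2, 10, 9])

indices = np.arange(a.size)   # shape (12,)
# start[:, None] has shape (3, 1); comparing it with shape (12,) broadcasts to (3, 12)
mask = (indices < start[:, None]) | (indices >= end[:, None])

print(mask.shape)             # (3, 12)
# Row i is False exactly at positions start[i] .. end[i] - 1
unmasked = [np.flatnonzero(~row).tolist() for row in mask]
print(unmasked)
```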
    

    Now we have to stretch the array to fit the mask using stride_tricks:

    >>> as_strided = numpy.lib.stride_tricks.as_strided
    >>> strided = as_strided(a, mask.shape, (0, a.strides[0]))
    >>> strided
    array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
           [ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
           [ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11]], dtype=int16)
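
    As a side note, numpy.broadcast_to can produce the same kind of zero-stride view without computing strides by hand. This is a swapped-in alternative rather than what the answer uses, and the view it returns is read-only. A minimal sketch:

```python
import numpy as np

a = np.arange(12, dtype=np.int16)

# Read-only view with 3 identical rows; no data is copied.
rows = np.broadcast_to(a, (3, a.size))

print(rows.shape)                  # (3, 12)
print(rows.strides)                # (0, 2): moving to the next row advances 0 bytes
print(np.shares_memory(rows, a))   # True
```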
    

    This looks like a 3x12 array, but each row points at the same memory. Now we can combine them into a masked array:

    >>> numpy.ma.array(strided, mask=mask)
    masked_array(data =
     [[-- 1 -- -- -- -- -- -- -- -- -- --]
     [-- -- -- -- -- 5 6 7 8 9 -- --]
     [-- -- -- -- -- -- -- 7 8 -- -- --]],
                 mask =
     [[ True False  True  True  True  True  True  True  True  True  True  True]
     [ True  True  True  True  True False False False False False  True  True]
     [ True  True  True  True  True  True  True False False  True  True  True]],
           fill_value = 999999)
    

    This isn't quite the same as what you asked for, but it should behave similarly.
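
    One way to use the masked array, sketched here under the assumption that per-slice reductions (or the slices themselves) are what you are after: reduce along axis 1, or call compressed() on a row to get the unmasked values back. The names sums, means and pieces are illustrative only.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(12, dtype=np.int16)
start = np.array([1, 5, 7])
end = np.array([2, 10, 9])

indices = np.arange(a.size)
mask = (indices < start[:, None]) | (indices >= end[:, None])
strided = as_strided(a, mask.shape, (0, a.strides[0]))
masked = np.ma.array(strided, mask=mask)

sums = masked.sum(axis=1)     # one sum per slice, masked values ignored
means = masked.mean(axis=1)   # one mean per slice
# compressed() drops the masked entries, recovering each irregular slice
pieces = [masked[i].compressed() for i in range(masked.shape[0])]

print(sums.tolist())          # [1, 35, 15]
print(means.tolist())         # [1.0, 7.0, 7.5]
print([p.tolist() for p in pieces])
```

    Note that the mask has len(start) * len(a) entries, so this approach trades memory for vectorization and is probably only practical when that product stays small.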

  • 2021-01-17 17:02

    A solution similar to timday's, with similar speed:

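    import numpy as np
    
    # Sizes must be integers; recent NumPy versions reject floats such as 1e6.
    a = np.random.randint(0, 20, 10**6)
    start = np.random.randint(0, 20, 10**4)
    end = np.random.randint(0, 20, 10**4)
    
    def my_fun(arr, start, end):
        return arr[start:end]
    
    %timeit [my_fun(a, i[0], i[1]) for i in zip(start, end)]
    # list() forces evaluation on Python 3, where map() is lazy
    %timeit list(map(lambda bounds: a[bounds[0]:bounds[1]], zip(start, end)))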
    

    100 loops, best of 3: 7.06 ms per loop
    100 loops, best of 3: 6.87 ms per loop
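
    Outside IPython the %timeit magic is not available, so a comparable measurement could be sketched with the standard timeit module. The function names with_comprehension and with_map are made up for this example, and the timings will differ from the figures quoted above depending on machine and NumPy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)          # seeded so the run is repeatable
a = rng.integers(0, 20, size=10**6)
start = rng.integers(0, 20, size=10**4)
end = rng.integers(0, 20, size=10**4)

def with_comprehension():
    return [a[s:e] for s, e in zip(start, end)]

def with_map():
    # list() forces the lazy map object to actually produce the slices
    return list(map(lambda bounds: a[bounds[0]:bounds[1]], zip(start, end)))

for fn in (with_comprehension, with_map):
    seconds = timeit.timeit(fn, number=10) / 10
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```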
