Extend numpy mask by n cells to the right for each bad value, efficiently

[愿得一人] 2021-02-15 15:39

Let's say I have a length 30 array with 4 bad values in it. I want to create a mask for those bad values, but since I will be using rolling window functions, I'd also like a f…

7 answers
  •  时光说笑
    2021-02-15 15:42

    A few years late, but I've come up with a fully vectorized solution that requires no loops or copies (besides the mask itself). It is potentially dangerous because it relies on numpy.lib.stride_tricks.as_strided, and it is also not as fast as @swentzel's solution.

    The idea is to take the mask and create a 2D view of it in which column j holds mask[j] followed by the n elements after it. Then, for every column whose head (mask[j]) is True, you set the entire column to True. Since you are dealing with a view, setting a column actually sets the following elements of the mask.
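    The aliasing described above can be checked on a toy mask. This is a minimal sketch, assuming a 5-element mask padded with n = 2 trailing False cells (sizes chosen for illustration, not taken from the question); it confirms that view[k, j] and mask[j + k] are the same cell, and that writing one column of the view sets the cells after that column's head.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Toy mask: 5 "real" cells plus n trailing False cells of padding,
# so every row of the view stays inside the buffer.
n = 2
mask = np.array([False, True, False, False, False] + [False] * n)
size = mask.size - n

# Row k of the view starts k elements further into the same buffer,
# so column j holds mask[j], mask[j + 1], ..., mask[j + n].
view = as_strided(mask, shape=(n + 1, size), strides=mask.strides * 2)

for k in range(n + 1):
    for j in range(size):
        assert view[k, j] == mask[j + k]

# Writing below the head of column 1 changes mask[2] and mask[3].
view[1:, 1] = True
print(mask.astype(int))  # [0 1 1 1 0 0 0]
```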

    Start with the data:

    import numpy as np
    a = np.array([4, 0, 8, 5, 10, 9, np.nan, 1, 4, 9, 9, np.nan, np.nan, 9,
                  9, 8, 0, 3, 7, 9, 2, 6, 7, 2, 9, 4, 1, 1, np.nan, 10])
    n = 3
    

    Now, we will make the mask a.size + n elements long, so that you don't have to process the last n elements manually:

    mask = np.empty(a.size + n, dtype=bool)
    np.isnan(a, out=mask[:a.size])
    mask[a.size:] = False
    

    Now the cool part:

    view = np.lib.stride_tricks.as_strided(mask, shape=(n + 1, a.size),
                                           strides=mask.strides * 2)
    

    That last part is crucial. mask.strides is a tuple like (1,), since a NumPy bool is one byte wide. Multiplying the tuple by 2 repeats it to (1, 1), so a 1-byte step moves one element along either dimension of the view, which makes view[k, j] the same cell as mask[j + k].
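    The "* 2" on the strides is tuple repetition, not arithmetic on the stride value. A quick check with NumPy's default dtypes (the float64 line is only for comparison and is not part of the answer):

```python
import numpy as np

mask = np.zeros(5, dtype=bool)
# A NumPy bool occupies one byte, so moving to the next element moves 1 byte.
bool_strides = mask.strides        # (1,)
# Multiplying a tuple by 2 repeats it: one stride per dimension of the 2-D view.
view_strides = mask.strides * 2    # (1, 1)

# float64 elements are 8 bytes wide, so the same trick gives (8, 8).
float_view_strides = np.zeros(5).strides * 2

print(bool_strides, view_strides, float_view_strides)  # (1,) (1, 1) (8, 8)
```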

    Now all you need to do is expand the mask:

    view[1:, view[0]] = True
    

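    That's it: mask now holds what you want. Keep in mind that this only works because the index view[0] is evaluated (turned into a set of positions) before any of the cells it refers to are changed, and the assignment only ever writes True. You could not get away with view[1:] |= view[0], because many cells of view alias the same element of mask, so an in-place update like that writes each element several times and its result cannot be relied on.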
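    One way to sanity-check the strided version is to compare it with a plain loop that never reads from the array it writes to. The helper name extend_mask_reference is hypothetical; it ORs shifted slices of the original NaN mask into a separate output array, so a newly marked cell never marks further cells.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def extend_mask_reference(bad, n):
    """Hypothetical helper: True at every bad cell and the n cells after it."""
    out = bad.copy()
    for k in range(1, n + 1):
        # Always read from the untouched input `bad`, never from `out`,
        # so nothing cascades from one shift to the next.
        out[k:] |= bad[:-k]
    return out


a = np.array([4, 0, 8, 5, 10, 9, np.nan, 1, 4, 9, 9, np.nan, np.nan, 9,
              9, 8, 0, 3, 7, 9, 2, 6, 7, 2, 9, 4, 1, 1, np.nan, 10])
n = 3

# The strided approach from this answer, using dtype=bool.
mask = np.empty(a.size + n, dtype=bool)
np.isnan(a, out=mask[:a.size])
mask[a.size:] = False
view = as_strided(mask, shape=(n + 1, a.size), strides=mask.strides * 2)
view[1:, view[0]] = True

expected = extend_mask_reference(np.isnan(a), n)
assert np.array_equal(mask[:a.size], expected)
print(np.flatnonzero(expected).tolist())
# [6, 7, 8, 9, 11, 12, 13, 14, 15, 28, 29]
```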

    For benchmarking purposes, it appears that the definition of n has changed from the question: here n counts the bad cell itself, so each bad value is extended by n - 1 cells. The following function takes that into account:

    def madphysicist0(a, n):
        m = np.empty(a.size + n - 1, dtype=bool)
        np.isnan(a, out=m[:a.size])
        m[a.size:] = False
    
        v = np.lib.stride_tricks.as_strided(m, shape=(n, a.size), strides=m.strides * 2)
        v[1:, v[0]] = True
        return v[0]
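    Since the question is about rolling windows, one reading of this convention (my interpretation, assuming trailing windows of length n) is that madphysicist0(a, n)[j] is True exactly when the window of length n ending at a[j] contains a NaN. The sketch below checks that against numpy.lib.stride_tricks.sliding_window_view (NumPy 1.20 or later). trailing_window_has_nan is a hypothetical helper, and the function body is repeated with dtype=bool so the block runs on its own.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view


def madphysicist0(a, n):
    # Same as the function above, with dtype=bool; n counts the bad cell itself.
    m = np.empty(a.size + n - 1, dtype=bool)
    np.isnan(a, out=m[:a.size])
    m[a.size:] = False
    v = as_strided(m, shape=(n, a.size), strides=m.strides * 2)
    v[1:, v[0]] = True
    return v[0]


def trailing_window_has_nan(a, n):
    """Hypothetical helper: True at j if a[max(0, j - n + 1) : j + 1] has a NaN."""
    # Left-pad with n - 1 zeros so that window j ends exactly at a[j].
    padded = np.concatenate([np.zeros(n - 1), a])
    return np.isnan(sliding_window_view(padded, n)).any(axis=1)


a = np.array([4, 0, 8, 5, 10, 9, np.nan, 1, 4, 9, 9, np.nan, np.nan, 9,
              9, 8, 0, 3, 7, 9, 2, 6, 7, 2, 9, 4, 1, 1, np.nan, 10])
for w in range(1, 8):
    assert np.array_equal(madphysicist0(a, w), trailing_window_has_nan(a, w))
```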
    

    V2

    Taking a leaf out of the existing fastest answer, we only need to copy log2(n) rows, not n rows:

    def madphysicist1(a, n):
        m = np.empty(a.size + n - 1, dtype=bool)
        np.isnan(a, out=m[:a.size])
        m[a.size:] = False
    
        v = np.lib.stride_tricks.as_strided(m, shape=(n, a.size), strides=m.strides * 2)
    
        stop = int(np.log2(n))
        for k in range(1, stop + 1):
            v[k, v[0]] = True
        if (1<

    Since this doubles the size of the mask at every iteration, it works a bit faster than Fibonacci: https://math.stackexchange.com/q/894743/295281
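    The V2 listing above is cut off, so the following is not the author's code but a separate sketch of the doubling idea, assuming the same convention as madphysicist0 (n counts the bad cell). Each pass marks the cell "run" places after every masked cell, so runs grow 1, 2, 4, ...; one final shift of n - run cells brings them to exactly n. The helper name extend_doubling is hypothetical, and it relies on the same property as the original answer: the boolean index v[0] is evaluated before the pass writes anything.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def extend_doubling(a, n):
    """Hypothetical helper: mask each NaN in `a` plus the n - 1 cells after it."""
    m = np.empty(a.size + n - 1, dtype=bool)
    np.isnan(a, out=m[:a.size])
    m[a.size:] = False
    # Same strided view as madphysicist0: v[k, j] is m[j + k].
    v = as_strided(m, shape=(n, a.size), strides=m.strides * 2)

    run = 1  # every masked run is currently `run` cells long
    while 2 * run <= n:
        # Mark the cell `run` places after each masked cell: runs double in length.
        v[run, v[0]] = True
        run *= 2
    if run < n:
        # A final shift of n - run (< run) cells fills the runs out to exactly n.
        v[n - run, v[0]] = True
    return v[0]


a = np.array([4, 0, 8, 5, 10, 9, np.nan, 1, 4, 9, 9, np.nan, np.nan, 9,
              9, 8, 0, 3, 7, 9, 2, 6, 7, 2, 9, 4, 1, 1, np.nan, 10])
print(np.flatnonzero(extend_doubling(a, 4)).tolist())
# [6, 7, 8, 9, 11, 12, 13, 14, 15, 28, 29]
```

    This takes about log2(n) + 1 passes instead of n - 1; whether that is actually faster depends on n and the array size.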
