Let's say I have a length 30 array with 4 bad values in it. I want to create a mask for those bad values, but since I will be using rolling window functions, I'd also like a f
A few years late, but I've come up with a fully vectorized solution that requires no loops or copies (besides the mask itself). This solution is potentially a bit dangerous because it uses numpy.lib.stride_tricks.as_strided. It's also not as fast as @swentzel's solution.
The idea is to take the mask and create a 2D view of it, where the second dimension is just the elements that follow the current element. Then you can just set an entire column to True if the head is True. Since you are dealing with a view, setting a column will actually set the following elements in the mask.
Start with the data:
import numpy as np
a = np.array([4, 0, 8, 5, 10, 9, np.nan, 1, 4, 9, 9, np.nan, np.nan, 9,
              9, 8, 0, 3, 7, 9, 2, 6, 7, 2, 9, 4, 1, 1, np.nan, 10])
n = 3
Now, we will make the mask a.size + n elements long, so that you don't have to process the last n elements manually:
mask = np.empty(a.size + n, dtype=bool)
np.isnan(a, out=mask[:a.size])
mask[a.size:] = False
Now the cool part:
view = np.lib.stride_tricks.as_strided(mask, shape=(n + 1, a.size),
                                       strides=mask.strides * 2)
That last part is crucial. mask.strides is a tuple like (1,), since a NumPy bool is one byte wide. Repeating the tuple with mask.strides * 2 gives (1, 1), which means that you take a 1-byte step to move one element in either dimension.
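To make the aliasing concrete, here is a minimal sketch on a toy mask (the 5-element size and the single True value are made up for illustration). It shows that view[k, j] refers to the same byte as mask[j + k]:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

n = 2
# Toy mask: 5 "real" elements plus n padding elements at the end.
mask = np.zeros(5 + n, dtype=bool)
mask[1] = True

# Both strides are one byte, so view[k, j] aliases mask[j + k].
view = as_strided(mask, shape=(n + 1, 5), strides=mask.strides * 2)

print(mask.strides)        # (1,)
print(view.astype(int))
# [[0 1 0 0 0]
#  [1 0 0 0 0]
#  [0 0 0 0 0]]

# Writing through the view changes the underlying mask.
view[2, 1] = True          # same memory as mask[3]
print(mask[3])             # True
```

Each row is the mask shifted left by one more element, which is why setting a whole column marks the elements that follow a bad value.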
Now all you need to do is expand the mask:
view[1:, view[0]] = True
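That's it. Now mask has what you want in its first a.size elements (the rest is padding). Keep in mind that this only works because the boolean index view[0] is evaluated before any elements are written. You could not get away with view[1:] |= view[0], because both operands are overlapping views of the same buffer.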
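As a sanity check, the steps above can be wrapped in a function and compared against a plain loop. This is only a sketch: the names expand_mask_strided and expand_mask_loop and the short test array are mine, not part of the question or the benchmark.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def expand_mask_strided(a, n):
    """Mask NaNs in `a` plus the n elements following each NaN."""
    mask = np.zeros(a.size + n, dtype=bool)   # n padding elements at the end
    np.isnan(a, out=mask[:a.size])
    view = as_strided(mask, shape=(n + 1, a.size), strides=mask.strides * 2)
    view[1:, view[0]] = True                  # index is computed before writing
    return mask[:a.size]

def expand_mask_loop(a, n):
    """Reference implementation: one explicit slice assignment per NaN."""
    mask = np.isnan(a)
    for i in np.flatnonzero(np.isnan(a)):
        mask[i:i + n + 1] = True
    return mask

a = np.array([1.0, np.nan, 2.0, 3.0, 4.0, 5.0, np.nan, 6.0])
print(expand_mask_strided(a, 2).astype(int))   # [0 1 1 1 0 0 1 1]
print(expand_mask_loop(a, 2).astype(int))      # [0 1 1 1 0 0 1 1]
```

The padding absorbs the writes that would otherwise run past the end of the array, which is the main thing to get right when using as_strided this way.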
For benching purposes, it appears that the definition of n has changed from the question, so the following function takes that into account:
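def madphysicist0(a, n):
    # Pad with n - 1 elements so the view never reads past the buffer.
    m = np.empty(a.size + n - 1, dtype=bool)
    np.isnan(a, out=m[:a.size])
    m[a.size:] = False
    v = np.lib.stride_tricks.as_strided(m, shape=(n, a.size), strides=m.strides * 2)
    v[1:, v[0]] = True
    # v[0] is the unpadded part of the mask, i.e. m[:a.size].
    return v[0]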
V2
Taking a leaf out of the existing fastest answer, we only need to copy log2(n) rows, not n rows:
def madphysicist1(a, n):
    m = np.empty(a.size + n - 1, dtype=bool)
    np.isnan(a, out=m[:a.size])
    m[a.size:] = False
    v = np.lib.stride_tricks.as_strided(m, shape=(n, a.size), strides=m.strides * 2)
    stop = int(np.log2(n))
    for k in range(1, stop + 1):
        v[k, v[0]] = True
if (1<
Since this doubles the length of each masked run at every iteration, it works a bit faster than Fibonacci: https://math.stackexchange.com/q/894743/295281
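For reference, the doubling idea on its own can be sketched with plain slices. This is not the function above: extend_mask_doubling is a hypothetical name, it does not use as_strided, and it assumes the benchmark convention where n counts the bad value plus the n - 1 elements after it.

```python
import numpy as np

def extend_mask_doubling(a, n):
    """Mask each NaN in `a` together with the n - 1 elements after it."""
    m = np.isnan(a)
    covered = 1  # every NaN currently accounts for a run of `covered` Trues
    while covered < n:
        # Shift by at most `covered` so runs stay contiguous, and by no
        # more than what is still missing so runs never exceed n.
        shift = min(covered, n - covered)
        # Copy the source slice: it overlaps the destination in memory.
        m[shift:] |= m[:-shift].copy()
        covered += shift
    return m

a = np.array([1.0, np.nan, 2.0, 3.0, 4.0, 5.0, np.nan, 6.0])
print(extend_mask_doubling(a, 3).astype(int))   # [0 1 1 1 0 0 1 1]
```

The run length goes 1, 2, 4, ... until the last step tops it up to exactly n, so the loop body runs about log2(n) times instead of n - 1 times. The explicit copy trades some memory for not relying on how in-place operations handle overlapping operands.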