I'm processing data by looping over vectors along an axis (could be any axis) of a numpy ndarray (could be of any dimensions).
I didn't work on array directly becaus
The best way to test efficiency is to do time tests on realistic examples. But %timeit (ipython) tests on toy examples are a start.
Based on experience from answering similar 'if you must iterate' questions, there isn't much difference in times. np.frompyfunc has a modest speed edge, but its pyfunc takes scalars, not arrays or slices. (np.vectorize is a nicer API to this function, and a bit slower.)
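To make the scalar restriction concrete, here is a minimal sketch. The function add_one and the small test array are made up for illustration; the point is only that both wrappers call the Python function once per element, and that frompyfunc returns an object dtype array that usually needs a cast.

```python
import numpy as np

def add_one(v):
    # hypothetical scalar function: receives one element at a time
    return v + 1

x = np.arange(6).reshape(2, 3)

# frompyfunc: 1 input, 1 output; the result has object dtype
ufunc = np.frompyfunc(add_one, 1, 1)
res_obj = ufunc(x)
res = res_obj.astype(int)      # cast back to a numeric dtype

# vectorize: same idea, but otypes lets you pick the result dtype
vfunc = np.vectorize(add_one, otypes=[int])
res_v = vfunc(x)

print(res_obj.dtype, res.dtype, res_v.dtype)
```

Neither wrapper hands add_one a 1d slice, which is why they don't fit the 'vector along an axis' case directly.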
But here you want to pass a 1d slice of an array to your function, while iterating over all the other dimensions. I don't think there's much difference in the alternative iteration methods.
Actions like swapaxes, transpose and ravel are fast, often just creating a new view with different shape and strides.
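A quick way to check the 'view' claim is np.shares_memory. This is just a sketch on a small made-up array; note the 'often': ravel on a non-contiguous array has to make a copy.

```python
import numpy as np

x = np.arange(2 * 3 * 4).reshape(2, 3, 4)

swapped = x.swapaxes(1, 2)        # shape (2, 4, 3), same data buffer
transposed = x.transpose(2, 0, 1) # shape (4, 2, 3), same data buffer
flat = x.ravel()                  # x is contiguous, so this is a view

# Only shape and strides change for the views
print(x.shape, x.strides)
print(swapped.shape, swapped.strides)
print(np.shares_memory(x, swapped), np.shares_memory(x, transposed))

# Raveling the swapped (non-contiguous) array cannot be done with
# strides alone, so numpy copies the data here
flat_swapped = swapped.ravel()
print(np.shares_memory(x, flat), np.shares_memory(x, flat_swapped))
```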
np.ndindex uses np.nditer (with the multi_index flag) to iterate over a range of dimensions. nditer is fast when used in C code, but isn't anything special when used in Python code.
np.apply_along_axis creates a (i,j,:,k) indexing tuple, and steps the variables. It's a nice general approach, but isn't doing anything special to speed things up. itertools.product is another way of generating the indices.
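Here is a rough sketch of that indexing-tuple idea done by hand with itertools.product, next to apply_along_axis. It is not the actual numpy implementation, just the same pattern; func1d, axis and the test array are placeholders chosen for illustration.

```python
import itertools
import numpy as np

x = np.arange(2 * 3 * 4).reshape(2, 3, 4)
axis = 1

def func1d(v):
    # placeholder for your function; gets a 1d slice of x
    return v.sum()

# Let numpy build and step the index tuple
res_a = np.apply_along_axis(func1d, axis, x)

# Do the same by hand: iterate over every axis except `axis`
other_shape = x.shape[:axis] + x.shape[axis + 1:]
res_p = np.empty(other_shape, dtype=x.dtype)
for idx in itertools.product(*(range(n) for n in other_shape)):
    # insert a full slice at position `axis`, e.g. (i, slice(None), k)
    full = idx[:axis] + (slice(None),) + idx[axis:]
    res_p[idx] = func1d(x[full])

print(np.array_equal(res_a, res_p))
```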
But usually it isn't the iteration mechanism that slows things down, it's the repeated call to your function. You can test the iteration mechanism by using a trivial function, e.g.
def foo(x):
    return x
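Outside of ipython, the same kind of test can be sketched with the timeit module. The array size and repeat count below are arbitrary assumptions, and the timings will vary by machine, so no numbers are shown.

```python
import timeit
import numpy as np

def foo(x):
    # trivial function, so the time is mostly iteration overhead
    return x

x = np.arange(20 * 30 * 40).reshape(20, 30, 40)

def loop_ndindex():
    # pass each 1d slice along axis 1 to foo
    return [foo(x[i, :, k]) for i, k in np.ndindex(20, 40)]

def loop_apply():
    return np.apply_along_axis(foo, 1, x)

t_ndindex = timeit.timeit(loop_ndindex, number=5)
t_apply = timeit.timeit(loop_apply, number=5)
print("ndindex loop:     ", t_ndindex)
print("apply_along_axis: ", t_apply)
```

Swapping foo for your real function then shows how much of the total time is the function call rather than the iteration.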
===================
You don't need swapaxes to use ndindex; you can use it to iterate on any combination of axes.
For example, make a 3d array, and sum along the middle dimension:
In [495]: x=np.arange(2*3*4).reshape(2,3,4)
In [496]: N=np.ndindex(2,4)
In [497]: [x[i,:,k].sum() for i,k in N]
Out[497]: [12, 15, 18, 21, 48, 51, 54, 57]
In [498]: x.sum(1)
Out[498]:
array([[12, 15, 18, 21],
       [48, 51, 54, 57]])
I don't think it makes a difference in speed; the code's just simpler.
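The same pattern works for other combinations of axes. A small sketch on the same 3d array: one loop over the first two axes (summing along the last), and one loop over the middle axis only, where each item is a 2d slice.

```python
import numpy as np

x = np.arange(2 * 3 * 4).reshape(2, 3, 4)

# Iterate over axes 0 and 1; each x[i, j, :] is a 1d slice of length 4
last = [int(x[i, j, :].sum()) for i, j in np.ndindex(2, 3)]

# Iterate over axis 1 only; each x[:, j, :] is a 2d slice of shape (2, 4)
mid = [int(x[:, j, :].sum()) for (j,) in np.ndindex(3)]

# Compare with the direct numpy reductions
print(np.array(last).reshape(2, 3))
print(x.sum(axis=2))
print(mid, x.sum(axis=(0, 2)))
```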
===================
Another possible tool is np.ma, masked arrays. With those you mark individual elements as masked (because they are nan or 0). It has code that evaluates things like sum, mean, product in such a way that the masked values don't harm the solution.
The 3d array again:
In [517]: x=np.arange(2*3*4).reshape(2,3,4)
add in some bad values:
In [518]: x[1,1,2]=99
In [519]: x[0,0,:]=99
those values mess up the normal sum:
In [520]: x.sum(axis=1)
Out[520]:
array([[111, 113, 115, 117],
       [ 48,  51, 135,  57]])
but if we mask them, they are 'filtered out' of the solution (in this case, they are set temporarily to 0)
In [521]: xm=np.ma.masked_greater(x,50)
In [522]: xm
Out[522]:
masked_array(data =
[[[-- -- -- --]
[4 5 6 7]
[8 9 10 11]]
[[12 13 14 15]
[16 17 -- 19]
[20 21 22 23]]],
mask =
[[[ True True True True]
...
[False False False False]]],
fill_value = 999999)
In [523]: xm.sum(1)
Out[523]:
masked_array(data =
[[12 14 16 18]
[48 51 36 57]],
...)
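If the bad values are nan rather than large numbers, np.ma.masked_invalid does the marking for you. A minimal sketch, assuming a float version of the same array with nan at the same positions as the 99s above:

```python
import numpy as np

x = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
x[1, 1, 2] = np.nan
x[0, 0, :] = np.nan

# Mask every nan (and inf) element
xm = np.ma.masked_invalid(x)

s = xm.sum(axis=1)    # masked elements are skipped
m = xm.mean(axis=1)   # mean over the unmasked elements only

print(s)
print(m)
```

For this nan-only case np.nansum and np.nanmean give the same numbers; the masked array is the more general tool when 'bad' is defined by some other condition.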