What is the most efficient way to map a function over a numpy array? The way I've been doing it in my current project is as follows:
import numpy as np
x
Use numpy.fromfunction(function, shape, **kwargs)
See "https://docs.scipy.org/doc/numpy/reference/generated/numpy.fromfunction.html"
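As a rough illustration of what numpy.fromfunction does (a sketch, with index_sum being a made-up example function): it builds an array from index coordinates rather than mapping over existing values, and it calls the function once with whole index arrays, so the function has to work elementwise on arrays already.

```python
import numpy as np

# Hypothetical example function: receives index ARRAYS i and j, not scalars.
def index_sum(i, j):
    return i + j

# The function is called a single time with the row and column index grids.
grid = np.fromfunction(index_sum, (3, 3), dtype=int)
print(grid)
```

Because of this, fromfunction only helps when the value of each element depends on its position.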
squares = squarer(x)
Arithmetic operations on arrays are automatically applied elementwise, with efficient C-level loops that avoid all the interpreter overhead that would apply to a Python-level loop or comprehension.
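A minimal sketch of that point, assuming x is a small integer array and squarer is a plain squaring lambda (both are filled in here for illustration only):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])   # assumed example input
squarer = lambda t: t ** 2       # assumed definition of squarer

# The whole array is passed in; ** is applied elementwise in C.
squares = squarer(x)
print(squares)
```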
Most of the functions you'd want to apply to a NumPy array elementwise will just work, though some may need changes. For example, an if statement doesn't work elementwise. You'd want to convert those to use constructs like numpy.where:
def using_if(x):
    if x < 5:
        return x
    else:
        return x**2
becomes
def using_where(x):
    return numpy.where(x < 5, x, x**2)
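To check that the two versions agree, here is a small sketch (the values array and the variable names are chosen for illustration). Note that numpy.where evaluates both branches for every element and then selects between them, so it is not a drop-in replacement when one branch is invalid for some inputs.

```python
import numpy as np

def using_if(x):
    if x < 5:
        return x
    else:
        return x ** 2

def using_where(x):
    return np.where(x < 5, x, x ** 2)

values = np.arange(8)

# Calling using_if on the whole array fails: the truth value of an
# array with more than one element is ambiguous.
try:
    using_if(values)
except ValueError as exc:
    print("using_if on an array fails:", exc)

looped = np.array([using_if(v) for v in values])   # Python-level loop
vectorized = using_where(values)                   # single elementwise call
print(vectorized)
```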
As mentioned in this post, just use generator expressions like so:
numpy.fromiter((<some_func>(x) for x in <something>),<dtype>,<size of something>)
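Filling in the placeholders, this might look like the following sketch, where some_func and something are hypothetical stand-ins. Passing count lets NumPy allocate the output once; the result is always one-dimensional.

```python
import numpy as np

def some_func(v):          # hypothetical stand-in for <some_func>
    return v * v

something = range(10)      # hypothetical stand-in for <something>

result = np.fromiter(
    (some_func(v) for v in something),
    dtype=np.int64,
    count=len(something),  # optional, but avoids repeated resizing
)
print(result)
```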
It seems no one has mentioned a built-in factory method for producing a ufunc in the numpy package: np.frompyfunc, which I have tested against np.vectorize and which outperformed it by about 20~30%. Of course it will not perform as well as purpose-written C code or even numba (which I have not tested), but it can be a better alternative than np.vectorize:
f = lambda x, y: x * y
f_arr = np.frompyfunc(f, 2, 1)
vf = np.vectorize(f)
arr = np.linspace(0, 1, 10000)
%timeit f_arr(arr, arr) # 307ms
%timeit vf(arr, arr) # 450ms
I have also tested larger samples, and the improvement is proportional. See also the documentation for np.frompyfunc.
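One thing to keep in mind when trying this: a ufunc created by np.frompyfunc always returns an array of dtype object, so you will usually want to convert the result back to a numeric dtype. A small sketch (the names raw and product are chosen here for illustration):

```python
import numpy as np

f = lambda x, y: x * y
f_arr = np.frompyfunc(f, 2, 1)   # 2 inputs, 1 output

arr = np.linspace(0, 1, 5)
raw = f_arr(arr, arr)
print(raw.dtype)                 # object dtype, not float64

# Convert back to a numeric dtype before doing further array math.
product = raw.astype(float)
print(product.dtype)
```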
The answers above all compare well, but suppose you need to map a custom function over a numpy.ndarray and retain the shape of the array.
I have compared just two approaches, and both retain the shape of the ndarray. I used an array with 1 million entries for the comparison. Here I use the square function, which is also built into numpy and gives a great performance boost; it is only a stand-in, since something was needed for the test, and you can use a function of your choice.
import numpy, time
def timeit():
    y = numpy.arange(1000000)
    now = time.time()
    numpy.array([x * x for x in y.reshape(-1)]).reshape(y.shape)
    print(time.time() - now)
    now = time.time()
    numpy.fromiter((x * x for x in y.reshape(-1)), y.dtype).reshape(y.shape)
    print(time.time() - now)
    now = time.time()
    numpy.square(y)
    print(time.time() - now)
Output
>>> timeit()
1.162431240081787 # list comprehension and then building numpy array
1.0775556564331055 # from numpy.fromiter
0.002948284149169922 # using inbuilt function
Here you can clearly see that numpy.fromiter works well compared with the simple list-comprehension approach, and if a built-in function is available, please use that instead, since it is far faster.
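The shape-retaining pattern is easier to see on a multi-dimensional input. A sketch, assuming a small 3x4 array instead of the 1-million-entry one: flatten, map, then reshape back, and compare with the built-in.

```python
import numpy as np

y = np.arange(12).reshape(3, 4)   # small 2-D example array

# Flatten, map element by element, then restore the original shape.
mapped = np.fromiter(
    (v * v for v in y.reshape(-1)),
    dtype=y.dtype,
    count=y.size,
).reshape(y.shape)

print(mapped.shape)
print(np.array_equal(mapped, np.square(y)))
```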