Question
I'm trying to write a function that performs a mathematical operation on an array and returns the result. A simplified example could be:
def original_func(A):
    return A[1:] + A[:-1]
For speed-up and to avoid allocating a new output array for each function call, I would like to have the output array as an argument, and alter it in place:
def inplace_func(A, out):
    out[:] = A[1:] + A[:-1]
However, when calling these two functions in the following manner,
import numpy

A = numpy.random.rand(1000, 1000)
out = numpy.empty((999, 1000))

C = original_func(A)
inplace_func(A, out)
the original function seems to be twice as fast as the in-place function. How can this be explained? Shouldn't the in-place function be quicker since it doesn't have to allocate memory?
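For reference, a minimal sketch to reproduce the comparison (shapes taken from the question; number=100 is an arbitrary repeat count, and absolute timings vary by machine and NumPy build):

import timeit
import numpy

def original_func(A):
    return A[1:] + A[:-1]

def inplace_func(A, out):
    out[:] = A[1:] + A[:-1]

A = numpy.random.rand(1000, 1000)
out = numpy.empty((999, 1000))

# Typically the in-place version takes roughly twice as long,
# matching the observation above.
print(timeit.timeit(lambda: original_func(A), number=100))
print(timeit.timeit(lambda: inplace_func(A, out), number=100))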
Answer 1:
I think that the answer is the following:
In both cases, you compute A[1:] + A[:-1], and in both cases you actually create an intermediate array.
What happens in the second case, though, is that you then explicitly copy that whole, big, newly allocated array into the preallocated output. Copying such an array takes about the same time as the original operation, so you in fact double the time.
To sum up, in the first case you do:
    compute A[1:] + A[:-1]  (~10 ms)
In the second case, you do:
    compute A[1:] + A[:-1]  (~10 ms)
    copy the result into out  (~10 ms)
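A quick sketch to check this account: timing the copy step in isolation (numpy.copyto does the same work here as the slice assignment out[:] = tmp) shows it costs on the same order as the addition itself:

import timeit
import numpy

A = numpy.random.rand(1000, 1000)
tmp = A[1:] + A[:-1]          # the temporary that both versions create
out = numpy.empty_like(tmp)

# compute step vs. copy step; both are memory-bound and comparable in cost
print(timeit.timeit(lambda: A[1:] + A[:-1], number=100))
print(timeit.timeit(lambda: numpy.copyto(out, tmp), number=100))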
Answer 2:
If you want to perform the operation in-place, do
import numpy as np

def inplace_func(A, out):
    np.add(A[1:], A[:-1], out)

This does not create any temporaries, which A[1:] + A[:-1] does.
All NumPy binary operations have corresponding ufuncs; check the list here: http://docs.scipy.org/doc/numpy/reference/ufuncs.html#available-ufuncs
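A small usage sketch: passing out as the ufunc's output argument writes the result directly into the preallocated array, and the result matches the temporary-creating expression:

import numpy as np

A = np.random.rand(1000, 1000)
out = np.empty((999, 1000))

np.add(A[1:], A[:-1], out)    # third argument is the output array; no temporary
assert np.allclose(out, A[1:] + A[:-1])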
Answer 3:
I agree with Oliver's explanation. If you want to perform the operation truly in place, you have to loop over your array manually. In pure Python this will be much slower, but if you need speed you can resort to Cython, which gives you the speed of a pure C implementation.
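For illustration, this is the manual loop in pure Python (inplace_loop is a hypothetical name, not from the answers); essentially this loop, with statically typed indices and array buffers, is what one would compile with Cython to recover C speed:

import numpy as np

def inplace_loop(A, out):
    # Element-wise loop writing straight into out: no temporary is created,
    # but interpreted Python makes this orders of magnitude slower than NumPy.
    n, m = out.shape
    for i in range(n):
        for j in range(m):
            out[i, j] = A[i + 1, j] + A[i, j]

A = np.random.rand(1000, 1000)
out = np.empty((999, 1000))
inplace_loop(A, out)
assert np.allclose(out, A[1:] + A[:-1])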
Source: https://stackoverflow.com/questions/7529786/altering-numpy-function-output-array-in-place