Question
For a benchmark comparison, I consider the simple function:
function dealiasing2d(where_dealiased, data)
    [n1, n0, nk] = size(data);
    for i0=1:n0
        for i1=1:n1
            if where_dealiased(i1, i0)
                data(i1, i0, :) = 0.;
            end
        end
    end
It can be useful in pseudo-spectral simulations (where data is a 3d array of complex numbers) but basically it applies a mask to a set of images, setting to zero the elements for which where_dealiased is true.
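For reference, the same masking operation can be written in NumPy with boolean indexing. This is a minimal sketch, assuming the NumPy layout used later in the question, where data has shape (nk, n0, n1) and where has shape (n0, n1); the function name dealiasing_mask is only illustrative.

```python
import numpy as np

def dealiasing_mask(where, data):
    """Zero data[k, i0, i1] for every k wherever where[i0, i1] is True (in place)."""
    # The 2d boolean mask indexes the last two axes; the leading axis is kept whole.
    data[:, where] = 0
    return data

# Small illustrative arrays (sizes are arbitrary, not the benchmark sizes).
where = np.zeros((4, 4), dtype=bool)
where[:2, :2] = True
data = (5.0 + 1.0j) * np.ones((3, 4, 4))
dealiasing_mask(where, data)
```

The assignment modifies the caller's array directly, so nothing needs to be returned for the effect to be visible.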
I compare the performance of different languages (and implementations, compilers, ...) on this simple case. For Matlab, I time the function with timeit. Since I don't want to benchmark my ignorance in Matlab, I would like to really optimize this function with this language. What would be the fastest way to do this in Matlab?
The simple solution I use now is:
function dealiasing2d(where_dealiased, data)
    [n1, n0, nk] = size(data);
    N = n0*n1;
    ind_zeros = find(reshape(where_dealiased, 1, []));
    for ik=1:nk
        data(ind_zeros + N*(ik-1)) = 0;
    end
I suspect this is not the right way to do it since the equivalent Numpy solution is approximately 10 times faster.
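import numpy as np
from functools import reduce

def dealiasing(where, data):
    nk = data.shape[0]
    # number of elements in one 2d slice
    N = reduce(lambda x, y: x*y, data.shape[1:])
    # linear indices of the masked elements within one slice
    inds, = np.nonzero(where.flat)
    for ik in range(nk):
        data.flat[inds + N*ik] = 0.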
Finally, if someone tells me something like "When you want to be very fast with a particular function in Matlab, you should compile it like that: [...]", I would include such solution in the benchmark.
Edit:
After two answers, I've benchmarked the proposals and there seems to be no noticeable performance improvement. This is strange since the simple Python-Numpy solution is about one order of magnitude faster, so I am still looking for a better solution in Matlab...
Answer 1:
If I understand correctly, this can be done easily and quickly with bsxfun:
data = bsxfun(@times, data, ~where_dealiased);
This sets to 0 all third-dimension components of the entries for which where_dealiased is true (it multiplies them by 0), and leaves the rest as they were (it multiplies them by 1).
Of course, this assumes [size(data,1) size(data,2)]==size(where_dealiased).
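The multiply-by-mask idea has a direct NumPy counterpart through broadcasting. The sketch below is only an illustration, assuming data has shape (nk, n0, n1) and where has shape (n0, n1); the name dealiasing_multiply is hypothetical. It also shows one caveat of this approach: a NaN under the mask stays NaN, because NaN times 0 is NaN, whereas an assignment would overwrite it.

```python
import numpy as np

def dealiasing_multiply(where, data):
    # ~where is True (1) where a value is kept and False (0) where it is removed;
    # broadcasting stretches the (n0, n1) mask over the leading nk axis.
    data *= ~where
    return data

where = np.zeros((3, 3), dtype=bool)
where[0, 0] = True

out = np.ones((2, 3, 3))
out[0, 0, 0] = np.nan          # a NaN sitting under the mask
dealiasing_multiply(where, out)
# out[1, 0, 0] is now 0, but out[0, 0, 0] is still NaN
```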
Your solution with linear indexing is probably very fast too. To save some time there, you can remove the reshape, because find already returns linear indices:
ind_zeros = find(where_dealiased);
Answer 2:
Approach #1: Logical indexing with repmat:
data(repmat(where_dealiased,1,1,size(data,3))) = 0;
Approach #2: Linear indexing with bsxfun(@plus):
[m,n,r] = size(data);
idx = bsxfun(@plus,find(where_dealiased),[0:r-1]*m*n); %// linear indices
data(idx) = 0;
This should be fast if you have few nonzero elements in where_dealiased.
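The linear-index construction of Approach #2 can be mirrored in NumPy with broadcasting instead of bsxfun. This is a sketch under assumptions: data has shape (nk, n0, n1), so each 2d slice is contiguous in the flattened C-order view, and the name dealiasing_linear is illustrative only.

```python
import numpy as np

def dealiasing_linear(where, data):
    nk = data.shape[0]
    N = where.size                          # elements in one 2d slice
    inds = np.flatnonzero(where)            # linear indices inside one slice
    offsets = N * np.arange(nk)             # start of each slice in the flat view
    # One row of indices per slice: shape (nk, len(inds)).
    idx = inds[np.newaxis, :] + offsets[:, np.newaxis]
    data.flat[idx.ravel()] = 0
    return data

where = np.zeros((4, 5), dtype=bool)
where[1, 2] = True
where[3, 0] = True
data = np.arange(3 * 4 * 5, dtype=float).reshape(3, 4, 5) + 1.0
expected = data.copy()
expected[:, where] = 0                      # reference result via boolean indexing
dealiasing_linear(where, data)
```

The index array holds one entry per zeroed element, so its size grows with the number of true entries in the mask times nk, not with the full size of data.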
Answer 3:
No optimization without benchmark! So here are some proposed solutions and the performance measurements. The initialization code is:
N = 2000;
nk = 10;
where = false([N, N]);
where(1:100, 1:100) = 1;
data = (5.+j)*ones([N, N, nk]);
and I time the functions with the function timeit like this:
timeit(@() dealiasing2d(where, data))
For comparison, when I do exactly the same with the Numpy function given in the question, it runs in 0.0167 s.
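The source does not show how the NumPy timing was taken. Below is a sketch of one way to do it with Python's standard timeit module, assuming a Python 3 port of the question's function (here called dealiasing_numpy, a hypothetical name) and a smaller N than the N = 2000 used above so that it runs quickly.

```python
import timeit
import numpy as np

def dealiasing_numpy(where, data):
    nk = data.shape[0]
    N = where.size
    inds, = np.nonzero(where.ravel())
    for ik in range(nk):
        data.flat[inds + N * ik] = 0.0

# Assumption: reduced problem size; the source uses N = 2000, nk = 10.
N, nk = 200, 10
where = np.zeros((N, N), dtype=bool)
where[:100, :100] = True
data = (5.0 + 1.0j) * np.ones((nk, N, N))

number = 10
# Best of several repeats, divided by the calls per repeat: seconds per call.
best = min(timeit.repeat(lambda: dealiasing_numpy(where, data),
                         number=number, repeat=5)) / number
print("best time per call: %.3g s" % best)
```

Taking the minimum over repeats is a choice made for this sketch; Matlab's timeit reports a different summary of its runs, so the two numbers are comparable only roughly.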
The initial Matlab function with the 2 loops runs in approximately 0.34 s and the equivalent Numpy function (with 2 loops) is slower and runs in 0.42 s. It could be because Matlab uses JIT compilation.
Luis Mendo mentions that I can remove the reshape because find already returns linear indices. I like it since the code is much cleaner, but a reshape is very cheap anyway, so it does not really improve the performance of the function:
function dealiasing2d(where, data)
    [n1, n0, nk] = size(data);
    N = n0*n1;
    ind_zeros = find(where);
    for ik=1:nk
        data(ind_zeros + N*(ik-1)) = 0;
    end
This function takes 0.23 s, which is faster than the solution with the 2 loops but really slow compared to the Numpy solution (~14 times slower!). That was the reason why I wrote my question.
Luis Mendo also proposes a solution based on the function bsxfun, which gives:
function dealiasing2d_bsxfun(where, data)
    data = bsxfun(@times, data, ~where);
This solution involves N*N*nk multiplications (by 1 or 0), which is clearly too much work since we only have to set 100*100*nk values of the array data to zero. However, these multiplications can be vectorized so it is "quite fast" compared to the other Matlab solutions: 0.23 s, i.e. the same as the first solution using find!
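To put numbers on the amount of work, here is the arithmetic for the benchmark sizes given in the initialization code (N = 2000, nk = 10, a 100 by 100 masked block). The variable names are only for this illustration.

```python
N, nk = 2000, 10
masked_per_slice = 100 * 100

multiplications = N * N * nk             # elements touched by the bsxfun product
zeroed = masked_per_slice * nk           # elements that actually need to change
ratio = multiplications // zeroed

print(multiplications, zeroed, ratio)    # 40000000 100000 400
```

So the multiplication touches 400 times more elements than strictly needed, yet it is timed at about the same speed as the indexed assignment.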
Both solutions proposed by Divakar involve the creation of a large array of size N*N*nk. There is no Matlab loop so we can hope for better performance but...
function dealiasing2d_bsxfun2(where, data)
    [n1, n0, nk] = size(data);
    idx = bsxfun(@plus, find(where), [0:nk-1]*n1*n0);
    data(idx) = 0;
takes 0.23 s (still same amount of time as the other functions!) and
function dealiasing2d(where, data)
    data(repmat(where,[1,1,size(data,3)])) = 0;
takes 0.30 s (~30% more than the other Matlab solutions).
To conclude, it seems that there is something that limits the performance of Matlab in this case. It could also be that there is a better solution in Matlab or that I am doing something wrong with the benchmark... It would be great if someone with Matlab and Python-Numpy can provide other timings.
Edit:
Some more data regarding Divakar's comment:
For N = 500 ; nk = 500:
Method | time (s) | normalized
----------------|----------|------------
Numpy | 0.05 | 1.0
Numpy loop | 0.05 | 1.0
Matlab bsxfun | 0.70 | 14.0
Matlab find | 0.75 | 15.0
Matlab bsxfun2 | 0.76 | 15.2
Matlab loop | 0.77 | 15.4
Matlab repmat | 0.96 | 19.2
For N = 500 ; nk = 100:
Method | time (s) | normalized
----------------|----------|------------
Numpy | 0.01 | 1.0
Numpy loop | 0.03 | 3.0
Matlab bsxfun | 0.14 | 12.7
Matlab find | 0.15 | 13.6
Matlab bsxfun2 | 0.16 | 14.5
Matlab loop | 0.16 | 14.5
Matlab repmat | 0.20 | 18.2
For N = 2000 ; nk = 10:
Method | time (s) | normalized
----------------|----------|------------
Numpy | 0.02 | 1.0
Matlab find | 0.23 | 13.8
Matlab bsxfun2 | 0.23 | 13.8
Matlab bsxfun | 0.24 | 14.4
Matlab repmat | 0.30 | 18.0
Matlab loop | 0.34 | 20.4
Numpy loop | 0.42 | 25.1
I really wonder why Matlab seems so slow compared to Numpy...
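One factor that might contribute, which the measurements above do not test: the Matlab functions modify their input argument data and return nothing. Matlab arrays have value semantics with copy-on-write, so writing into an argument inside a function generally forces a copy of the whole array, while the NumPy function writes into the caller's array in place. The sketch below only illustrates, in NumPy, how much a full copy can add to a cheap masked assignment; the function names and sizes are assumptions, and it proves nothing about Matlab itself.

```python
import timeit
import numpy as np

def mask_in_place(where, data):
    data[:, where] = 0             # touches only the masked elements
    return data

def mask_with_copy(where, data):
    out = data.copy()              # stands in for a copy-on-write duplication
    out[:, where] = 0
    return out

# Assumption: reduced problem size to keep the sketch quick.
N, nk = 400, 10
where = np.zeros((N, N), dtype=bool)
where[:20, :20] = True
data = (5.0 + 1.0j) * np.ones((nk, N, N))

number = 5
t_copy = min(timeit.repeat(lambda: mask_with_copy(where, data),
                           number=number, repeat=3)) / number
t_in_place = min(timeit.repeat(lambda: mask_in_place(where, data),
                               number=number, repeat=3)) / number
print("with copy: %.3g s, in place: %.3g s" % (t_copy, t_in_place))
```

If this is indeed the cause, a fairer Matlab benchmark would return the array (function data = dealiasing2d(where, data)) and call it as data = dealiasing2d(where, data) from inside another function, which is the pattern Matlab can sometimes optimize into an in-place update.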
Source: https://stackoverflow.com/questions/28644866/optimize-a-mask-function-in-matlab