I have some Matlab code which needs to be sped up. Through profiling, I've identified a particular function as the culprit in slowing down the execution. This function i
You should absolutely, without any hesitation, move the loop inside the mex file. The example below demonstrates a roughly 1000-fold speedup for a virtually empty work unit in a for loop. Obviously, as the amount of work per iteration grows, the call overhead matters less and this speedup will decrease.
Here is an example of the difference:
Mex function without internal loop:
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[ ], int nrhs, const mxArray *prhs[ ])
{
    /* Virtually empty work unit: return the scalar 1. */
    int i = 1;
    plhs[0] = mxCreateDoubleScalar(i);
}
Called in Matlab:
tic;for i=1:1000000;donothing();end;toc
Elapsed time is 3.683634 seconds.
Mex function with internal loop:
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[ ], int nrhs, const mxArray *prhs[ ])
{
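    /* Number of elements requested by the caller; mxGetScalar returns a double. */
    int M = (int)mxGetScalar(prhs[0]);
    plhs[0] = mxCreateNumericMatrix(M, 1, mxDOUBLE_CLASS, mxREAL);
    double *mymat = mxGetPr(plhs[0]);
    /* The loop now runs inside the MEX file instead of in Matlab. */
    for (int i = 0; i < M; i++)
        mymat[i] = M - i;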
}
Called in Matlab:
tic; a = donothing(1000000); toc
Elapsed time is 0.003350 seconds.
Well, this is the fastest I can make it in Matlab:
%#eml
function L = test(s,t)
    m = numel(s);
    n = numel(t);
    % trivial cases
    if m==0 && n==0
        L = 0; return; end
    if n==0
        L = m; return; end
    if m==0
        L = n; return; end
    % non-trivial cases
    M = zeros(m+1,n+1);
    M(:,1) = 0:m;
    for j = 2:n+1
        for i = 2:m+1
            M(i,j) = min([
                M(i-1,j) + 1
                M(i,j-1) + 1
                M(i-1,j-1) + (s(i-1)~=t(j-1));
                ]);
        end
    end
    L = min(M(end,:));
end
Can you compile this and run some tests? (For some weird reason, compilation fails to work on my installation...) Perhaps change %#eml to %#codegen first, if you think that's easier.
NOTE: for the C version, you should also interchange the for-loops, so that the loop over j is the inner one.
Also, the row1 and row2 approach is a lot more memory efficient. If you're going to compile anyway, I'd use that approach.
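The row1 and row2 code referred to here is not shown in this thread excerpt, so the following is only a minimal sketch of what such a version might look like in plain C. It assumes the same recurrence and boundary conditions as the Matlab function above (first row all zeros, first column 0:m, result taken as the minimum of the last row) and keeps just two rolling rows, with the loop over j as the inner one. The names fuzzy_distance and min3 are made up for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: smallest of three ints. */
static int min3(int a, int b, int c)
{
    int m = a < b ? a : b;
    return m < c ? m : c;
}

/* Same recurrence as the Matlab function above, but with two rolling
   rows instead of the full (m+1)-by-(n+1) matrix.
   Returns -1 if memory allocation fails. */
int fuzzy_distance(const char *s, const char *t)
{
    size_t m = strlen(s), n = strlen(t);

    /* Trivial cases, mirroring the Matlab version. */
    if (n == 0) return (int)m;
    if (m == 0) return (int)n;

    int *prev = calloc(n + 1, sizeof *prev);    /* row 0: all zeros */
    int *curr = malloc((n + 1) * sizeof *curr);
    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return -1;
    }

    for (size_t i = 1; i <= m; i++) {
        curr[0] = (int)i;                       /* first column is 0:m */
        for (size_t j = 1; j <= n; j++) {       /* inner loop over j */
            curr[j] = min3(prev[j] + 1,
                           curr[j - 1] + 1,
                           prev[j - 1] + (s[i - 1] != t[j - 1]));
        }
        int *tmp = prev;                        /* swap the two rows */
        prev = curr;
        curr = tmp;
    }

    /* After the final swap, prev holds the last row; take its minimum. */
    int best = prev[0];
    for (size_t j = 1; j <= n; j++)
        if (prev[j] < best)
            best = prev[j];

    free(prev);
    free(curr);
    return best;
}
```

Memory use drops from (m+1)*(n+1) entries to 2*(n+1). Wrapping this in a mexFunction would still require converting the Matlab inputs to C arrays, which is left out here.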
As usual, it all depends on the amount of work you do in the MEX file. The overhead of calling a MEX function is constant and does not depend on, e.g., the problem size. That is, arguments are not copied to new, temporary arrays. Hence, if the MEX file does enough work, the MATLAB overhead of calling it will not show. Anyway, in my experience the MEX call overhead is significant only the first time the MEX function is called: the dynamic library has to be loaded, symbols resolved, etc. Subsequent MEX calls have very little overhead and are very efficient.
Almost everything in MATLAB comes with some overhead due to the nature of this high-level language, unless you have code which you are sure is fully compiled by the JIT (but then you do not need a MEX file :)). So you have a choice of one overhead over the other.
So, to sum up: I would not be too scared of the MEX calling overhead.
Edit: As is often said here and elsewhere, the only reasonable thing to do in any particular case is of course to BENCHMARK and check it for yourself. You can easily estimate the MEX call overhead by writing a trivial MEX function:
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[ ], int nrhs, const mxArray *prhs[ ])
{
}
On my computer I get
tic; for i=1:1000000; mexFun; end; toc
Elapsed time is 2.104849 seconds.
That is about 2e-6 s of overhead per MEX call. Add your code, time it, and see whether the overhead is at an acceptable level.
As Andrew Janke noted below (thanks!), the MEX function overhead apparently depends on the number of arguments you pass to the MEX function. It is a small dependence, but it is there:
a = ones(1000,1);
tic; for i=1:1000000; mexFun(a); end; toc
Elapsed time is 2.41 seconds.
It is not related to the size of a:
a = ones(1000000,1);
tic; for i=1:1000000; mexFun(a); end; toc
Elapsed time is 2.41805 seconds.
But it is related to the number of arguments:
a = ones(1000000,1);
b = ones(1000000,1);
tic; for i=1:1000000; mexFun(a, b); end; toc
Elapsed time is 2.690237 seconds.
So you might want to take that into account in your tests.
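To put the timings above into per-call terms, here is a small sketch in plain C. It only does the arithmetic on the numbers quoted in this answer; the function names are made up for illustration, and work_us is a hypothetical figure you would have to measure for your own code.

```c
#include <stdio.h>

/* Per-call cost in microseconds, from a total loop time and call count. */
double per_call_us(double total_seconds, double calls)
{
    return total_seconds / calls * 1e6;
}

/* Share of each call spent on overhead, given the useful work per call.
   work_us is a hypothetical value that you measure yourself. */
double overhead_fraction(double overhead_us, double work_us)
{
    return overhead_us / (overhead_us + work_us);
}

/* Prints the per-call figures implied by the timings quoted above. */
void print_overhead_report(void)
{
    const double calls = 1e6;                     /* loop count used above */
    double none = per_call_us(2.104849, calls);   /* mexFun       */
    double one  = per_call_us(2.41, calls);       /* mexFun(a)    */
    double two  = per_call_us(2.690237, calls);   /* mexFun(a, b) */

    printf("no arguments : %.2f us per call\n", none);
    printf("one argument : %.2f us per call (+%.2f)\n", one, one - none);
    printf("two arguments: %.2f us per call (+%.2f)\n", two, two - one);
}
```

With these numbers, each extra argument adds roughly 0.3 microseconds per call. Calling overhead_fraction with the measured overhead and your own per-call work time gives a quick feel for whether the overhead matters: if the work takes about as long as the overhead, half of every call is overhead.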