Anybody know about the fastest method for calculating convolution? Unfortunately the matrix which I deal with is very large (500x500x200) and if I use convn
in MATL
If your kernel is separable, the greatest speed gains will be realized by performing multiple sequential 1D convolutions.
Steve Eddins of MathWorks describes how to take advantage of the associativity of convolution to speed up convolution when the kernel is separable in a MATLAB context on his blog. For a P-by-Q
kernel, the computational advantage of performing two separate and sequential convolutions vs. 2D convolution is PQ/(P+Q)
, which corresponds to 4.5x for a 9x9 kernel and ~11x for a 15x15 kernel. EDIT: An interesting unwitting demonstration of this difference was given in this Q&A.
To figure out if the kernel is separable (i.e. the outer product of two vectors) the blog goes on to describe how to check if your kernel is separable with SVD and how to get the 1D kernels. Their example is for a 2D kernel. For a solution for N-dimensional separable convolution, check this FEX submission.
Another resource worth pointing out is this SIMD (SSE3/SSE4) implementation of 3D convolution by Intel, which includes both source and a presentation. The code is for 16 bit integers. Unless you move to GPU (e.g. cuFFT), it is probably hard to get faster than Intel's implementations, which also includes Intel MKL. There is an example of 3D convolution (single-precision float) at the bottom of this page of the MKL documentation (link fixed, now mirrored in https://stackoverflow.com/a/27074295/2778484).
i have 2 way to calc fastconv
and 2 betther than 1
1- armadillo you can use armadillo library for calcing conv with this code
cx_vec signal(1024,fill::randn);
cx_vec code(300,fill::randn);
cx_vec ans = conv(signal,code);
2-use fftw ans sigpack and armadillo library for calcing fast conv in this way you must init fft of your code in constructor
FastConvolution::FastConvolution(cx_vec inpCode)
{
filterCode = inpCode;
fft_w = NULL;
}
cx_vec FastConvolution::filter(cx_vec inpData)
{
int length = inpData.size()+filterCode.size();
if((length & (length - 1)) == 0)
{
}
else
{
length = pow(2 , (int)log2(length) + 1);
}
if(length != fftCode.size())
initCode(length);
static cx_vec zeroPadedData;
if(length!= zeroPadedData.size())
{
zeroPadedData.resize(length);
}
zeroPadedData.fill(0);
zeroPadedData.subvec(0,inpData.size()-1) = inpData;
cx_vec fftSignal = fft_w->fft_cx(zeroPadedData);
cx_vec mullAns = fftSignal % fftCode;
cx_vec ans = fft_w->ifft_cx(mullAns);
return ans.subvec(filterCode.size(),inpData.size()+filterCode.size()-1);
}
void FastConvolution::initCode(int length)
{
if(fft_w != NULL)
{
delete fft_w;
}
fft_w = new sp::FFTW(length,FFTW_ESTIMATE);
cx_vec conjCode(length,fill::zeros);
fftCode.resize(length);
for(int i = 0; i < filterCode.size();i++)
{
conjCode.at(i) = filterCode.at(filterCode.size() - i - 1);
}
conjCode = conj(conjCode);
fftCode = fft_w->fft_cx(conjCode);
}
You could try the overlap-add and overlap-save methods. They involve breaking up your input signal into smaller chunks and then using either of the above methods.
An FFT is most likely - and I could be wrong - the fastest method, especially if you are using built-in routines in MATLAB or a library in C++. Apart from that, breaking the input signal into smaller chunks should be a good bet.