问题
i have a cufftcomplex data block which is the result from cuda fft(R2C). i know the data is save as a structure with a real number followed by image number. now i want to get the amplitude=sqrt(R*R+I*I), and phase=arctan(I/R) of each complex element by a fast way(not for loop). Is there any good way to do that? or any library could do that?
回答1:
Since cufftExecR2C
operates on data that is on the GPU, the results are already on the GPU, (before you copy them back to the host, if you are doing that.)
It should be straightforward to write your own cuda kernel to accomplish this. The amplitude you're describing is the value returned by cuCabs
or cuCabsf
in cuComplex.h
header file. By looking at the functions in that header file, you should be able to figure out how to write your own that computes the phase angle. You'll note that cufftComplex
is just a typedef of cuComplex
.
let's say your cufftExecR2C call left some results of type cufftComplex
in array data
of size sz
. Your kernel might look like this:
#include <math.h>
#include <cuComplex.h>
#include <cufft.h>
#define nTPB 256 // threads per block for kernel
#define sz 100000 // or whatever your output data size is from the FFT
...
__host__ __device__ float carg(const cuComplex& z) {return atan2(cuCimagf(z), cuCrealf(z));} // polar angle
__global__ void magphase(cufftComplex *data, float *mag, float *phase, int dsz){
int idx = threadIdx.x + blockDim.x*blockIdx.x;
if (idx < dsz){
mag[idx] = cuCabsf(data[idx]);
phase[idx] = carg(data[idx]);
}
}
...
int main(){
...
/* Use the CUFFT plan to transform the signal in place. */
/* Your code might be something like this already: */
if (cufftExecR2C(plan, (cufftReal*)data, data) != CUFFT_SUCCESS){
fprintf(stderr, "CUFFT error: ExecR2C Forward failed");
return;
}
/* then you might add: */
float *h_mag, *h_phase, *d_mag, *d_phase;
// malloc your h_ arrays using host malloc first, then...
cudaMalloc((void **)&d_mag, sz*sizeof(float));
cudaMalloc((void **)&d_phase, sz*sizeof(float));
magphase<<<(sz+nTPB-1)/nTPB, nTPB>>>(data, d_mag, d_phase, sz);
cudaMemcpy(h_mag, d_mag, sz*sizeof(float), cudaMemcpyDeviceToHost);
cudaMemcpy(h_phase, d_phase, sz*sizeof(float), cudaMemcpyDeviceToHost);
You can also do this using thrust by creating functors for the magnitude and phase functions, and passing these functors along with data
, mag
and phase
to thrust::transform.
I'm sure you can probably do it with CUBLAS as well, using a combination of vector add and vector multiply operations.
This question/answer may be of interest as well. I lifted my phase function carg
from there.
来源:https://stackoverflow.com/questions/18502087/how-to-get-cufftcomplex-magnitude-and-phase-fast