What is the fastest FFT implementation in Python?
It seems numpy.fft and scipy.fftpack are both based on fftpack, not FFTW. Is fftpack as fast as FFTW? What about using a multithreaded FFT, or a distributed (MPI) FFT?
You could certainly wrap whichever FFT implementation you want to test using Cython or similar tools that let you access external libraries.
GPU-based
If you're going to test FFT implementations, you might also take a look at GPU-based codes (if you have access to the proper hardware). There are several: reikna.fft, scikits.cuda.
CPU-based
There's also pyFFTW, a CPU-based Python wrapper for FFTW.
(There is pyFFTW3 as well, but it is not as actively maintained as pyFFTW, and it does not work with Python 3. (source))
I don't have experience with any of these. It's probably going to fall to you to do some digging around and benchmark different codes for your particular application if speed is important to you.
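As a starting point for that kind of comparison, here is a minimal sketch of pyFFTW's drop-in NumPy-style interface. It assumes pyFFTW may or may not be installed and falls back to numpy.fft if it is not; the helper name `spectrum` and the `BACKEND` variable are made up for illustration.

```python
import numpy as np

try:
    # pyFFTW ships interfaces that mirror numpy.fft; enabling the cache
    # lets repeated calls with the same shape reuse the FFTW plan.
    from pyfftw.interfaces import cache, numpy_fft as fft_backend
    cache.enable()
    BACKEND = "pyfftw"
except ImportError:
    # Assumption: fall back to NumPy so the sketch still runs without pyFFTW.
    from numpy import fft as fft_backend
    BACKEND = "numpy"


def spectrum(signal):
    """Forward 1-D FFT of `signal` using whichever backend was imported."""
    return fft_backend.fft(np.asarray(signal, dtype=np.complex128))


x = np.random.default_rng(0).standard_normal(1024)
X = spectrum(x)
print(BACKEND, X.shape)
```

Because both backends expose the same `fft` call, swapping one for the other in a benchmark only requires changing the import.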
For a test detailed at https://gist.github.com/fnielsen/99b981b9da34ae3d5035 I find that scipy.fftpack performs fine compared to my simple application of pyfftw via pyfftw.interfaces.scipy_fftpack, except for data with a length corresponding to a prime number.
There seems to be some setup cost associated with invoking pyfftw.interfaces.scipy_fftpack.fft the first time. The second time it is faster. NumPy's and SciPy's fftpack-based FFTs perform terribly on prime-length data of the sizes I tried. CZT is faster in that case. Some months ago an issue about the problem was opened on SciPy's GitHub, see https://github.com/scipy/scipy/issues/4288
20000 prime=False
padded_fft : 0.003116
numpy_fft : 0.003502
scipy_fft : 0.001538
czt : 0.035041
fftw_fft : 0.004007
------------------------------------------------------------
20011 prime=True
padded_fft : 0.001070
numpy_fft : 1.263672
scipy_fft : 0.875641
czt : 0.033139
fftw_fft : 0.009980
------------------------------------------------------------
21803 prime=True
padded_fft : 0.001076
numpy_fft : 1.510341
scipy_fft : 1.043572
czt : 0.035129
fftw_fft : 0.011463
------------------------------------------------------------
21804 prime=False
padded_fft : 0.001108
numpy_fft : 0.004672
scipy_fft : 0.001620
czt : 0.033854
fftw_fft : 0.005075
------------------------------------------------------------
21997 prime=True
padded_fft : 0.000940
numpy_fft : 1.534876
scipy_fft : 1.058001
czt : 0.034321
fftw_fft : 0.012839
------------------------------------------------------------
32768 prime=False
padded_fft : 0.001222
numpy_fft : 0.002410
scipy_fft : 0.000925
czt : 0.039275
fftw_fft : 0.005714
------------------------------------------------------------
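The czt timings above stay roughly constant for prime lengths because a chirp z-transform rewrites the DFT as a convolution, which can then be evaluated with power-of-two FFTs. The following is a minimal sketch of that idea (Bluestein's algorithm) using only NumPy. It is not the code from the gist, and the function name `bluestein_dft` is made up for illustration.

```python
import numpy as np


def bluestein_dft(x):
    """Exact DFT of a 1-D array of any length via power-of-two FFTs."""
    x = np.asarray(x, dtype=np.complex128)
    n = x.size
    k = np.arange(n)
    # Chirp w[k] = exp(-i*pi*k^2/n); reduce k^2 modulo 2n to keep the phase accurate.
    w = np.exp(-1j * np.pi * ((k * k) % (2 * n)) / n)
    # Power-of-two length of at least 2n-1 so the circular convolution does not alias.
    m = 1 << (2 * n - 1).bit_length()
    a = np.zeros(m, dtype=np.complex128)
    a[:n] = x * w
    b = np.zeros(m, dtype=np.complex128)
    b[:n] = np.conj(w)                        # lags 0 .. n-1
    b[m - n + 1:] = np.conj(w[1:])[::-1]      # lags -(n-1) .. -1, wrapped around
    conv = np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))
    return w * conv[:n]


# 20011 is one of the prime lengths from the timings above.
signal = np.random.default_rng(0).standard_normal(20011)
error = np.max(np.abs(bluestein_dft(signal) - np.fft.fft(signal)))
print("max abs difference from numpy.fft:", error)
```

Unlike zero-padding to a convenient length, this returns the same coefficients as a direct FFT of the original length, at the cost of three longer power-of-two transforms.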
The pyFFTW3 package is inferior to the pyFFTW library, at least implementation-wise. Since they both wrap the FFTW3 library, I guess speed should be the same.
Where I work, some researchers have compiled a Fortran library that sets up and calls FFTW for a particular problem. This Fortran library (a module with some subroutines) expects some input data (2D lists) from my Python program.
What I did was create a little C extension for Python wrapping the Fortran library, where I basically call "init" to set up an FFTW planner, another function to feed in my 2D lists (arrays), and a "compute" function.
Creating a C extension is a small task, and there are a lot of good tutorials out there for it.
The good thing about this approach is that we get speed, a lot of speed. The only drawback is in the C extension, where we must iterate over the Python list and extract all the Python data into a memory buffer.
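One way to limit that extraction cost, assuming NumPy is available on the Python side, is to convert the nested lists into one contiguous array before calling the extension, so that the extension can read a flat buffer instead of walking Python objects. The sketch below only shows the Python half; the helper name `to_buffer` is hypothetical and the extension itself is not shown.

```python
import numpy as np


def to_buffer(rows):
    """Convert a 2D list of numbers into a C-contiguous float64 array."""
    arr = np.ascontiguousarray(rows, dtype=np.float64)
    if arr.ndim != 2:
        raise ValueError("expected a rectangular 2D list")
    return arr


data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
buf = to_buffer(data)

# The array exposes the buffer protocol, so C code can obtain a pointer to
# the raw doubles without touching individual Python float objects.
view = memoryview(buf)
print(view.shape, view.format, view.c_contiguous)
```

The conversion still visits every element once, but it happens inside NumPy, and later calls can reuse the same array without copying.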
The FFTW site shows fftpack running about 1/3 as fast as FFTW, but that's with a mechanically translated Fortran-to-C step followed by C compilation, and I don't know if numpy/scipy uses a more direct Fortran compilation. If performance is critical to you, you might consider compiling FFTW into a DLL/shared library and using ctypes to access it, or building a custom C extension.
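A rough sketch of the ctypes route, assuming a double-precision FFTW3 shared library is installed where `ctypes.util.find_library` can find it. The helper names `load_fftw` and `fftw_fft` are made up; the FFTW functions and constants used (`fftw_plan_dft_1d`, `fftw_execute`, `fftw_destroy_plan`, `FFTW_FORWARD`, `FFTW_ESTIMATE`) come from FFTW's basic C interface.

```python
import ctypes
import ctypes.util

import numpy as np

FFTW_FORWARD = -1        # sign of the exponent for a forward transform
FFTW_ESTIMATE = 1 << 6   # pick a plan heuristically instead of measuring


def load_fftw():
    """Return a ctypes handle to libfftw3, or None if it cannot be loaded."""
    name = ctypes.util.find_library("fftw3")
    if name is None:
        return None
    try:
        lib = ctypes.CDLL(name)
    except OSError:
        return None
    lib.fftw_plan_dft_1d.restype = ctypes.c_void_p
    lib.fftw_plan_dft_1d.argtypes = [
        ctypes.c_int, ctypes.c_void_p, ctypes.c_void_p,
        ctypes.c_int, ctypes.c_uint,
    ]
    lib.fftw_execute.restype = None
    lib.fftw_execute.argtypes = [ctypes.c_void_p]
    lib.fftw_destroy_plan.restype = None
    lib.fftw_destroy_plan.argtypes = [ctypes.c_void_p]
    return lib


def fftw_fft(x, lib):
    """Out-of-place forward complex FFT of a 1-D array through FFTW."""
    src = np.ascontiguousarray(x, dtype=np.complex128).ravel()
    dst = np.empty_like(src)
    plan = lib.fftw_plan_dft_1d(
        src.size, src.ctypes.data, dst.ctypes.data, FFTW_FORWARD, FFTW_ESTIMATE
    )
    if not plan:
        raise RuntimeError("FFTW could not create a plan")
    try:
        lib.fftw_execute(plan)
    finally:
        lib.fftw_destroy_plan(plan)
    return dst


lib = load_fftw()
if lib is None:
    print("libfftw3 not found; nothing to run")
else:
    print(fftw_fft(np.arange(8.0), lib))
```

This version creates and destroys a plan on every call, which is simple but wasteful; a real wrapper would presumably keep plans around for repeated transforms of the same size.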
FFTW3 seems to be the fastest implementation available that's nicely wrapped. The PyFFTW bindings in the first answer work. Here's some code that compares execution times: test_ffts.py
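For readers who want to run a comparison of their own, here is a minimal timing harness. It is not the original test_ffts.py; the function names `candidates` and `benchmark` are made up, and scipy and pyfftw are treated as optional. The two sizes are taken from the timings earlier in this thread, and results will depend heavily on library versions and hardware.

```python
import timeit

import numpy as np


def candidates():
    """Collect the FFT callables that can be imported on this machine."""
    found = {"numpy.fft": np.fft.fft}
    try:
        import scipy.fftpack
        found["scipy.fftpack"] = scipy.fftpack.fft
    except ImportError:
        pass
    try:
        from pyfftw.interfaces import cache, numpy_fft
        cache.enable()  # reuse FFTW plans between calls
        found["pyfftw"] = numpy_fft.fft
    except ImportError:
        pass
    return found


def benchmark(n, repeat=3):
    """Return the best-of-`repeat` time in seconds per backend for length n."""
    rng = np.random.default_rng(0)
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    results = {}
    for name, fft in candidates().items():
        fft(x)  # warm-up call so one-time setup cost is not timed
        results[name] = min(timeit.repeat(lambda: fft(x), number=1, repeat=repeat))
    return results


for size in (20000, 20011):
    for name, seconds in benchmark(size).items():
        print(f"{size:6d} {name:14s} {seconds:.6f}")
```

The warm-up call matters mostly for FFTW-based backends, where the first call for a given size includes planning.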
Source: https://stackoverflow.com/questions/6365623/improving-fft-performance-in-python