I've compared many NEON-optimized FFT libraries on ARM Cortex-A9, and "libav" is certainly the fastest FFT code, but it has several limitations:
- it is single-threaded,
- it only supports 1D FFTs,
- it only supports power-of-2 dimensions,
- and it has no optimizations for real input/output (it is only a complex-to-complex FFT).
On the other hand, "FFTW" (either the official version or the Vesperix version) is multi-threaded, supports 2D FFTs, supports non-power-of-2 dimensions with very little penalty, and has full optimizations for real input/output instead of just complex input/output.
So depending on your FFT requirements, FFTW might be faster for your project due to the extra features, but if you only need the FFT that libav provides (or you write the extra features yourself using NEON and multi-threading), then libav is the fastest 1D complex-to-complex FFT code.
To give you an indication, it seems that the FFTW NEON optimizations were performed by a student of the guy who performed the libav NEON optimizations. So would you rather have the code from the student or the mentor ;-)
Another issue is that libav uses an LGPL license whereas FFTW uses a GPL license, which is more restrictive unless you are willing to pay a large sum of money to purchase a commercial license for FFTW.
(Personally, I ended up writing my own 2D & real-data features using NEON & multi-threading on top of libav's 1D FFT, but it was a lot of effort since I wasn't an FFT expert!)