numpy fft is fast for lengths that are products of small primes, but how small?

Submitted by 耗尽温柔 on 2020-01-15 03:50:11

Question


I've seen several examples showing that numpy's FFT implementation is fast when the input length is a product of small primes like 2, 3, 5, and 7. But what is the largest prime that is still considered "small" here?


Answer 1:


Note that scipy's FFT implementation uses radices of 2, 3, 4, and 5. I assume numpy has a similar implementation, which would make 5 the largest efficient prime factor in FFT lengths.


Empirically, the largest prime I'd consider "small" for the purpose of FFT performance is 11. For any input length below about 30, the FFT is going to be fast for practical purposes anyway; algorithmic gains at those sizes are dwarfed by Python's execution overhead. Things get more interesting at larger input lengths.
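At larger sizes the effect is easy to demonstrate. As a quick, machine-dependent sketch (4099 is a nearby prime; absolute timings will vary):

import numpy as np
import timeit

for n in (4096, 4099):  # 2**12 vs. a nearby prime
    x = np.random.randn(n)
    t = timeit.timeit(lambda: np.fft.fft(x), number=200)
    print(n, t)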

Here are some performance results for small FFTs (median execution time over 500 batches of 1000 FFTs each):

In the plot produced by the code below, I have marked prime-valued n in red and powers of two in green.

Note the following observations:

  • In general, the FFT is slow for primes but fast for powers of two. This is expected and validates the measurement setup.

  • No measurable performance difference for n <= 11. This may be due to the FFT implementation or to Python's execution overhead.

  • Primes of 31 (and possibly 29) and higher are clearly slower than nearby composite values.

  • Some non-power-of-two values also give good performance. These are probably highly composite (smooth) numbers; see the sketch after this list.
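To cross-check that last observation, you can list which tested lengths are 7-smooth, i.e. products of primes up to 7 (is_smooth is a hypothetical helper of mine, not part of numpy):

def is_smooth(n, primes=(2, 3, 5, 7)):
    # True if n factors completely over the given small primes
    for p in primes:
        while n % p == 0:
            n //= p
    return n == 1

print([n for n in range(2, 65) if is_smooth(n)])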

The measurements were performed like this:

import numpy as np
import matplotlib.pyplot as plt
from time import perf_counter  # higher-resolution timer than time.time()


N = np.arange(2, 65)                 # FFT lengths under test
times = np.empty((500, N.size))      # 500 timing repetitions per length
for i, n in enumerate(N):
    for r in range(times.shape[0]):
        x = np.random.randn(1000, n)  # batch of 1000 signals of length n
        t = perf_counter()
        y = np.fft.fft(x, axis=-1)
        times[r, i] = perf_counter() - t


med = np.median(times, axis=0)       # median over repetitions
plt.plot(N, med, 'k')

# mark primes (red) and powers of two (green), offset slightly for visibility
primes = np.array([2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61])
plt.plot(primes, med[primes-2]+0.0005, 'rx', label='n = prime')

ptwos = np.array([2, 4, 8, 16, 32, 64])
plt.plot(ptwos, med[ptwos-2]-0.0005, 'gx', label='n = 2**k')

plt.legend(loc='best')
plt.xlabel('n')
plt.ylabel('time [s]')
plt.grid()
plt.show()



Answer 2:


numpy.fft is fast for composite lengths but not for primes. Use pyFFTW for the highest-performance DFTs in Python.

Explanation:

According to an old numpy issue, the Bluestein algorithm is not implemented for DFTs on arrays of prime length, so such inputs fall back to a slower code path. Wikipedia notes that Bluestein's algorithm has performance equivalent to that of a fast FFT applied to a zero-padded input:

The key point is that these FFTs are not of the same length N: such a convolution can be computed exactly from FFTs only by zero-padding it to a length greater than or equal to 2N–1. In particular, one can pad to a power of two or some other highly composite size, for which the FFT can be efficiently performed by e.g. the Cooley–Tukey algorithm in O(N log N) time. Thus, Bluestein's algorithm provides an O(N log N) way to compute prime-size DFTs, albeit several times slower than the Cooley–Tukey algorithm for composite sizes.
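For concreteness, here is a minimal sketch of Bluestein's reduction (my own illustration, not numpy's or scipy's actual implementation). The prime-length DFT becomes a chirp modulation plus one circular convolution, carried out with power-of-two FFTs:

import numpy as np

def bluestein_dft(x):
    # DFT of arbitrary length N via the identity n*k = (n**2 + k**2 - (k-n)**2) / 2:
    # X_k = chirp_k * sum_n (x_n * chirp_n) * conj(chirp_{k-n}), chirp_m = exp(-1j*pi*m**2/N)
    x = np.asarray(x, dtype=complex)
    N = x.size
    chirp = np.exp(-1j * np.pi * np.arange(N)**2 / N)
    a = x * chirp
    M = 1 << (2 * N - 1).bit_length()         # power of two >= 2N - 1
    b = np.zeros(M, dtype=complex)
    b[:N] = np.conj(chirp)                    # conj(chirp) at indices 0..N-1
    b[M - N + 1:] = np.conj(chirp[1:][::-1])  # ...and wrapped negative indices
    conv = np.fft.ifft(np.fft.fft(a, M) * np.fft.fft(b))  # circular convolution
    return chirp * conv[:N]

x = np.random.randn(31)  # prime length
print(np.allclose(bluestein_dft(x), np.fft.fft(x)))  # True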

I'd generally recommend avoiding numpy's implementation for these degenerate cases. Use https://pypi.python.org/pypi/pyFFTW instead. My intuition is that the performance difference will be roughly constant (say, half as fast) until the padded array no longer fits in your processor's cache; beyond that it will be 10-100x slower.
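A minimal usage sketch, assuming pyFFTW is installed (its interfaces submodule mirrors numpy.fft, so it works as a drop-in replacement):

import numpy as np
import pyfftw

pyfftw.interfaces.cache.enable()  # cache FFTW plans between calls
x = np.random.randn(4099)         # prime length
y = pyfftw.interfaces.numpy_fft.fft(x)
print(np.allclose(y, np.fft.fft(x)))  # matches numpy's result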



Source: https://stackoverflow.com/questions/46357589/numpy-fft-is-fast-for-lengths-that-are-products-of-small-primes-but-how-small
