repeatedly computing the fft over a varying number of rows

问题

I am interested in computing the fft of the first rows of a matrix, but I do not know in advance how many rows I need. I need to do this repeatedly but the number of rows I need to transform can change.

I will illustrate with the following example. Suppose I have a 100 by 128 array. If I plan for 1-dimensional fft's on each row, FFTW produces the following plan:

(dft-ct-dit/8
  (dftw-direct-8/28-x100 "t2fv_8_sse2")
  (dft-vrank>=1-x8/1
    (dft-direct-16-x100 "n1fv_16_sse2")))

Although I don't fully understand this output, I do see the key ingredients: 1) This is a single Cooley-Tucker pass, note that 8*16=128. 2) Because of the x100 postfix on two lines, the plan states that this needs to happen for 100 rows.

I see three possibilities:

One-size-fits-all planning: plan for the 100 by 128 array, and execute this big plan even when only the first (say) 20 rows are needed. Pros: we need only one plan so there is little planning overhead. Cons: potentially substantial performance loss in the execution phase (transforming more than I need).
Exhaustive planning: obtain plans using the same input/output array but for a all possible number of rows. In the example I would have 100 plans, where plan i carries out the fft for each of the first i rows. Pros: Transforming exactly what I need. Cons: Experiments show that I have to pay the planning penalty over and over, even though say for i=50 the plan will be the same as above but with x50 instead of x100. (I suppose there is no guarantee this will indeed be the plan identified by the FFTW planner, but I wouldn't mind losing "optimality" if I can cut out the planning time.)
Single-row planning: plan for a single row and use a loop to move data into input, transform, and move data out of output. Pros: I'm transforming exactly what I need. Cons: it seems to me this removes a lot of the FFTW optimizations, for instance when I use multiple threads. (I'm generally confused how multithreading works in FFTW since it is ill-documented... I know threading information is part of the plan, but printing the plan doesn't display any of it. This is off-topic though.)

I was thinking that I would combine all three ideas by first creating the one-size-fits-all plan, modifying this plan 99 times in a for loop instead of planning for the different sizes, and executing as under the exhaustive-planning approach. However I can't find any documentation on the plan/wisdom format, the output of wisdom with hexadecimal numbers is impenetrable. So I am wondering how I can carry out this hybrid approach.

来源：https://stackoverflow.com/questions/44503726/repeatedly-computing-the-fft-over-a-varying-number-of-rows

标签

fftw