Further understanding of fftw processing of portaudio signals

Question

I want to analyze a signal I get from my microphone port using portaudio and fftwpp. For that I followed the explanation provided here. My questions about it are:
There it is stated that I should chunk a window out of the incoming data. My data is already chunked, since I only record for a short time and process it afterwards. Thus I am assuming that a rectangular window is already applied to my data. Is that correct?
I am getting 200k data points. Should I put them directly into one array:

    Array::array1<Complex> F(np,align);
    Array::array1<double> f(n,align);               // For out-of-place transforms
    //  array1<double> f(2*np,(double *) F()); // For in-place transforms

    fftwpp::rcfft1d Forward(n,f,F);
    fftwpp::crfft1d Backward(n,F,f);
    qDebug() << "Putting " << numSamples << " into an array!";
    for(int i = 0; i < numSamples; i++)
        f[i] = this->data.recordedSamples[i];

or should I split them up? If I put them all in one array, what resolution do I get? My sample rate is set to 44.1 kHz.

Answer 1:

Assuming your data is not stationary (in other words, the spectral content is time-varying, as would be the case for e.g. speech or music), you would typically want to pick a window size during which the data can be considered somewhat stationary. For speech and music a typical window size might be of the order of 20 ms. For a sample rate of 44.1 kHz this corresponds to 882 samples, so an FFT size of 1024 might be a good starting point.
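The arithmetic above can be sketched in a few lines. This is only an illustration: the helper names `samplesForDuration` and `nextPowerOfTwo` are hypothetical and not part of portaudio or fftwpp, and rounding up to a power of two is a convention rather than a requirement (FFTW also handles other sizes).

```cpp
#include <cassert>
#include <cstddef>

// Number of samples covered by a window of the given duration
// (rounded to the nearest whole sample).
std::size_t samplesForDuration(double sampleRateHz, double durationSec) {
    assert(sampleRateHz > 0.0 && durationSec > 0.0);
    return static_cast<std::size_t>(sampleRateHz * durationSec + 0.5);
}

// Smallest power of two that is >= n, a convenient FFT size.
std::size_t nextPowerOfTwo(std::size_t n) {
    std::size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}
```

With a sample rate of 44100 Hz and a 20 ms window, `samplesForDuration` gives 882 and `nextPowerOfTwo(882)` gives 1024, matching the numbers in the answer.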

It's also common to overlap successive windows, to get better time resolution for the time-varying components of your signal. A 50% overlap is commonly used, so your first block of samples would be 0..1023, the second block would be 512..1535, etc.
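A minimal sketch of how the overlapping block positions could be generated, assuming the whole recording is already in memory. The function name `frameStarts` and its parameters are made up for this example; a hop size of half the frame size gives the 50% overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Start index of every analysis frame for a signal of numSamples samples.
// hopSize = frameSize / 2 corresponds to 50% overlap.
std::vector<std::size_t> frameStarts(std::size_t numSamples,
                                     std::size_t frameSize,
                                     std::size_t hopSize) {
    assert(frameSize > 0 && hopSize > 0);
    std::vector<std::size_t> starts;
    for (std::size_t start = 0; start < numSamples; start += hopSize) {
        starts.push_back(start);
        // Stop once a frame reaches (or runs past) the end of the data,
        // so at most one trailing frame is incomplete.
        if (start + frameSize >= numSamples) break;
    }
    return starts;
}
```

For `frameSize = 1024` and `hopSize = 512` the returned starts are 0, 512, 1024, and so on, i.e. the blocks 0..1023, 512..1535, etc. The last frame may extend past the end of the data and then needs separate handling.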

As has already been suggested in @Stefan's answer, you should apply a suitable window function to each block of samples, prior to the FFT. Commonly used windows are Hamming and von Hann (aka Hanning). Obviously the window function needs to be the same size as the FFT (e.g. N = 1024).
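As an illustration, a Hann window can be computed once and then multiplied into each block before the forward transform. The sketch below uses the symmetric textbook form w[i] = 0.5 * (1 - cos(2*pi*i / (N - 1))); the helper names `hannWindow` and `applyWindow` are hypothetical, and some implementations divide by N instead of N - 1 (the "periodic" form), so treat the exact formula as an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Symmetric Hann window of length N.
std::vector<double> hannWindow(std::size_t N) {
    assert(N > 1);
    const double pi = std::acos(-1.0);
    std::vector<double> w(N);
    for (std::size_t i = 0; i < N; ++i)
        w[i] = 0.5 * (1.0 - std::cos(2.0 * pi * i / (N - 1)));
    return w;
}

// Multiply one block of samples by the window, in place.
// The window must have the same length as the block (the FFT size).
void applyWindow(std::vector<double>& block, const std::vector<double>& window) {
    assert(block.size() == window.size());
    for (std::size_t i = 0; i < block.size(); ++i)
        block[i] *= window[i];
}
```

The window tapers to zero at both ends of the block, which removes the abrupt edges that a plain rectangular cut would leave.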

For any remaining block of samples of size < N at the end of your data you can just pad with zeroes.
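One way to handle the zero padding is to copy each block into a buffer that is pre-filled with zeroes, so a short final block is padded automatically. This is a sketch with a hypothetical helper `extractFrame`; it assumes the samples are held in a `std::vector<double>`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Copy frameSize samples starting at `start`. Positions that lie past
// the end of the input stay at zero (zero padding).
std::vector<double> extractFrame(const std::vector<double>& samples,
                                 std::size_t start, std::size_t frameSize) {
    assert(start <= samples.size());
    std::vector<double> frame(frameSize, 0.0);
    const std::size_t available = std::min(frameSize, samples.size() - start);
    std::copy(samples.begin() + start,
              samples.begin() + start + available,
              frame.begin());
    return frame;
}
```

For example, extracting a frame of size 4 at offset 3 from the five samples {1, 2, 3, 4, 5} yields {4, 5, 0, 0}.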

The commonly used term for the above operation is generating a spectrogram. It's essentially a 3D data structure of time vs. frequency vs. magnitude/phase, which can be displayed in various ways or used for further frequency-domain processing.

See also these closely related StackOverflow questions and answers:

Using Apple's Accelerate framework, FFT, Hann windowing and Overlapping
Giving large no. of samples to KissFFT.
Accelerate framework vDSP, FFT framing

Answer 2:

"Thus I am assuming that a rectangular window is already applied to my data. Is that correct?"

In a way, yes: cutting a finite chunk out of the signal is equivalent to applying a rectangular window. A window function is commonly used to suppress the high-frequency distortion caused by the abrupt on/off edges of the signal, or to reduce or redistribute spectral leakage (https://en.wikipedia.org/wiki/Spectral_leakage).

It is recommended to apply a non-rectangular window, especially if you want to visualize the FFT. See https://en.wikipedia.org/wiki/Window_function#Hann_.28Hanning.29_window for options.

Note that the window must be applied before the FFT.

"or should I split them up?"

Well, that depends on your requirements. In general it's better not to: because of the windowing, the longer the sample, the more accurate the FFT for that period of time. That said, splitting the data up is not uncommon as a way to speed things up.

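"which resolution do I get then?"

The frequency resolution (the spacing between FFT bins) is the sample rate divided by the number of samples in the FFT. For 200,000 samples at 44.1 kHz that is about 0.22 Hz per bin.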
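To compare different FFT sizes, the relationship can be written as a small helper. The function names `binWidthHz` and `binFrequencyHz` are hypothetical and only serve this example.

```cpp
#include <cassert>
#include <cstddef>

// Spacing between adjacent FFT bins, in Hz.
double binWidthHz(double sampleRateHz, std::size_t fftSize) {
    assert(fftSize > 0);
    return sampleRateHz / static_cast<double>(fftSize);
}

// Centre frequency of bin k, in Hz.
double binFrequencyHz(std::size_t k, double sampleRateHz, std::size_t fftSize) {
    return static_cast<double>(k) * binWidthHz(sampleRateHz, fftSize);
}
```

At 44.1 kHz, a single FFT over 200,000 samples gives bins roughly 0.22 Hz apart, while a 1024-point FFT gives bins roughly 43 Hz apart. For a real-to-complex transform of size n, the output has n/2 + 1 bins running from 0 Hz up to half the sample rate.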

Source: https://stackoverflow.com/questions/35701407/further-understanding-of-fftw-processing-of-portaudio-signals

Tags: c++, audio, fft, fftw, portaudio