How to extract semi-precise frequencies from a WAV file using Fourier Transforms

陌路散爱 提交于 2019-11-28 18:26:43

To get the power spectrum of a section of your file:

  • collect N samples, where N is a power of 2 - if your sample rate is 44.1 kHz for example and you want to sample approx every second then go for say N = 32768 samples.

  • apply a suitable window function to the samples, e.g. Hanning

  • pass the windowed samples to an FFT routine - ideally you want a real-to-complex FFT but if all you have a is complex-to-complex FFT then pass 0 for all the imaginary input parts

  • calculate the squared magnitude of your FFT output bins (re * re + im * im)

  • (optional) calculate 10 * log10 of each magnitude squared output bin to get a magnitude value in dB

Now that you have your power spectrum you just need to identify the peak(s), which should be pretty straightforward if you have a reasonable S/N ratio. Note that frequency resolution improves with larger N. For the above example of 44.1 kHz sample rate and N = 32768 the frequency resolution of each bin is 44100 / 32768 = 1.35 Hz.

You are basically interested in estimating a Spectrum -assuming you've already gone past the stage of reading the WAV and converting it into a discrete time signal.

Among the various methods, the most basic is the Periodogram, which amounts to taking a windowed Discrete Fourier Transform (with a FFT) and keeping its squared magnitude. This correspond to Paul's answer. You need a window which spans over several periods of the lowest frequency you want to detect. Example: if your sinusoids can be as low as 10 Hz (period = 100ms), you should take a window of 200ms o 300ms or so (or more). However, the periodogram has some disadvantages, though it's simple to compute and it's more than enough if high precision is not required:

The raw periodogram is not a good spectral estimate because of spectral bias and the fact that the variance at a given frequency does not decrease as the number of samples used in the computation increases.

The periodogram can perform better by averaging several windows, with a judious choosing of the widths (Bartlet method). And there are many other methods for estimating the spectrum (AR modelling).

Actually, you are not exactly interested in estimating a full spectrum, but only the location of a single frequency. This can be done seeking a peak of an estimated spectrum (done as explained), but also by more specific and powerful (and complicated) methods (Pisarenko, MUSIC algorithm). They would probably be overkill in your case.

WAV files contain linear pulse code modulated (LPCM) data. That just means that it is a sequence of amplitude values at a fixed sample rate. A RIFF header is contained at the beginning of the file to convey information like sampling rate and bits per sample (e.g. 8 kHz signed 16-bit).

The format is very simple and you could easily roll your own. However, there are several libraries available to speed the process such as libsndfile. Simple Direct-media Layer (SDL)/SDL_mixer and PortAudio are two nice libraries for playback.

As for feeding the data into FFTW, you would need to buffer 1 second chunks (determine size by the sample rate and bits per sample). Then convert all of the samples to IEEE floating-point (i.e. float or double depending on the FFTW configuration--libsndfile can do this for you). Next create another array to hold the frequency domain output. Finally, create and execute an FFTW plan by passing both buffers to fftw_plan_dft_r2c_1d and calling fftw_execute with the returned fftw_plan handle.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!