I want to do the FFT of an audio signal in real time, meaning while the person is speaking in the microphone. I will fetch the data (I do this with portaudio, if it would be eas
If you need amplitude, frequency and time in one graph, then the transform is known as a Time-Frequency decomposition. The most popular one is called the Short Time Fourier Transform. It works as follows:
1. Take a small portion of the signal (say 1 second)
2. Window it with a small window (say 5 ms)
3. Compute the 1D Fourier transform of the windowed signal.
4. Move the window by a small amount (2.5 ms)
5. Repeat above steps until end of signal.
6. All of this data is entered into a matrix that is then used to create the kind of 3D representation of the signal that shows its decomposition along frequency, amplitude and time.
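The six steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `stft`, the 8 kHz sampling rate, the Hann window and the 1 kHz test tone are all made up for the example, and only full windows are processed.

```python
import numpy as np

def stft(x, fs, win_ms=5.0, hop_ms=2.5):
    """Return (magnitudes, times, freqs) for a 1-D signal x sampled at fs Hz."""
    win_len = int(round(fs * win_ms / 1000.0))   # samples per window
    hop = int(round(fs * hop_ms / 1000.0))       # samples between window starts
    window = np.hanning(win_len)                 # window choice is an assumption
    n_frames = 1 + (len(x) - win_len) // hop     # only full windows are used
    mags = np.empty((n_frames, win_len // 2 + 1))
    for i in range(n_frames):
        start = i * hop                                # step 4: move the window
        segment = x[start:start + win_len] * window    # step 2: window the portion
        mags[i] = np.abs(np.fft.rfft(segment))         # step 3: 1-D FFT magnitude
    times = (np.arange(n_frames) * hop + win_len / 2) / fs   # window centres, seconds
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)             # bin frequencies, Hz
    return mags, times, freqs                          # step 6: the matrix

fs = 8000                              # assumed sampling rate
t = np.arange(fs) / fs                 # step 1: one second of signal
x = np.sin(2 * np.pi * 1000 * t)       # hypothetical 1 kHz test tone
S, times, freqs = stft(x, fs)
print(S.shape)                         # rows are time frames, columns are frequency bins
```

Each row of `S` is one spectrum; plotting the rows against `times` and `freqs` gives the time-frequency-amplitude picture.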
The length of the window determines the resolution you can obtain in the frequency and time domains: a longer window gives finer frequency resolution but coarser time resolution, and a shorter window does the opposite. Check here for more details on STFT and search for "Robi Polikar"'s tutorials on wavelet transforms for a layman's introduction to the above.
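To put rough numbers on that trade-off, here is a small sketch. The helper name `stft_resolution` and the 8 kHz sampling rate are assumptions for illustration, and the figure it reports is the spacing between FFT bins (sampling rate divided by window length in samples); the resolution you actually see also depends on the window shape.

```python
def stft_resolution(fs, win_ms, hop_ms):
    """Bin spacing and frame step for a given window and hop (both in ms)."""
    win_len = int(round(fs * win_ms / 1000.0))   # window length in samples
    hop = int(round(fs * hop_ms / 1000.0))       # hop length in samples
    return {
        "window_samples": win_len,
        "bin_spacing_hz": fs / win_len,          # distance between FFT bins
        "frame_step_ms": 1000.0 * hop / fs,      # time between successive spectra
    }

short = stft_resolution(8000, win_ms=5.0, hop_ms=2.5)
long_ = stft_resolution(8000, win_ms=50.0, hop_ms=25.0)
print(short)
print(long_)
```

With these assumed values the 5 ms window gives bins 200 Hz apart and a new spectrum every 2.5 ms, while the 50 ms window gives bins 20 Hz apart but only one spectrum every 25 ms.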
Edit 1:
You take a windowing function (there are innumerable functions out there - here is a list). The most intuitive is a rectangular window, but the most commonly used are the Hamming and Hann (often called "Hanning") window functions. You can follow the steps below if you have paper and pencil in hand and draw along.
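As a quick way to see how these windows differ, the sketch below builds all three with NumPy. The 40-sample length is an assumption (5 ms at an assumed 8 kHz sampling rate); `np.hamming` and `np.hanning` are NumPy's built-in window generators.

```python
import numpy as np

win_len = 40                               # assumed: 5 ms at 8 kHz
windows = {
    "rectangular": np.ones(win_len),       # no tapering at all
    "hamming": np.hamming(win_len),        # tapers to a small non-zero edge value
    "hann": np.hanning(win_len),           # tapers all the way to zero
}

for name, w in windows.items():
    # Edge and centre values show how strongly each window fades the frame ends.
    print(f"{name:12s} edge={w[0]:.2f} centre={w[win_len // 2]:.2f}")
```

The tapered windows fade the ends of each frame, which reduces the spectral leakage that the abrupt edges of a rectangular window cause.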
Assume that the signal that you have obtained is 1 sec long and is named x[n]. The windowing function is 5 msec long and is named w[n]. Place the window at the start of the signal (so the end of the window coincides with the 5 ms point of the signal) and multiply x[n] and w[n] like so:
y[n] = x[n] * w[n]
- point by point multiplication of the signals.
Take an FFT of y[n].
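For a single window position, the multiply-then-FFT step might look like this in NumPy. Note that `*` on NumPy arrays is element-wise multiplication, not convolution, which is exactly the point-by-point product meant here. The sampling rate, the Hamming window and the 1 kHz tone standing in for x[n] are assumptions for the example.

```python
import numpy as np

fs = 8000                                  # assumed sampling rate in Hz
n = np.arange(fs)                          # sample indices for 1 second
x = np.sin(2 * np.pi * 1000 * n / fs)      # hypothetical x[n]: a 1 kHz tone

win_len = int(round(0.005 * fs))           # 5 ms window -> 40 samples
w = np.hamming(win_len)                    # w[n]

y = x[:win_len] * w                        # y[n] = x[n] * w[n], element by element
Y = np.fft.rfft(y)                         # FFT of the windowed frame

peak_bin = int(np.argmax(np.abs(Y)))       # strongest frequency bin
peak_hz = peak_bin * fs / win_len          # convert bin index to Hz
print(len(y), len(Y), peak_hz)
```

The frame `y` has the same length as the window, and the strongest bin lands at the frequency of the test tone.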
Then you shift the window by a small amount (say 2.5 msec). So now the window stretches from 2.5 ms to 7.5 ms of the signal x[n]. Repeat the multiplication and FFT generation steps. In other words, you have an overlap of 2.5 msec. You will see that changing the length of the window and the overlap gives you different resolutions on the time and frequency axes.
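If you would rather tabulate the window positions than draw them, a small helper can list them. The function name `frame_bounds_ms` is hypothetical; the sketch only does the bookkeeping of where each window starts and ends, using the 1 sec, 5 msec and 2.5 msec figures from above.

```python
def frame_bounds_ms(signal_ms, win_ms, hop_ms):
    """List (start, end) in ms for every window that fits fully inside the signal."""
    n_frames = int((signal_ms - win_ms) // hop_ms) + 1
    return [(i * hop_ms, i * hop_ms + win_ms) for i in range(n_frames)]

bounds = frame_bounds_ms(signal_ms=1000.0, win_ms=5.0, hop_ms=2.5)
overlap_ms = 5.0 - 2.5                     # window length minus shift
print(bounds[:3])                          # first three window positions
print(len(bounds), overlap_ms)             # number of frames and overlap in ms
```

Each entry is one row of the final matrix, so a smaller shift means more rows for the same signal.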
Once you do this, you need to feed all the data into a matrix and then have it displayed. The overlap is for minimising the errors that might arise at boundaries and also to get more consistent measurements over such short time frames.
P.S: If you had understood STFT and other time-frequency decompositions of a signal, then you should have had no problems with steps 2 and 4. That you have not understood the above mentioned steps makes me feel like you should revisit time-frequency decompositions also.