Doing FFT in realtime

小鲜肉 2021-01-30 05:56

I want to do the FFT of an audio signal in real time, meaning while the person is speaking in the microphone. I will fetch the data (I do this with portaudio, if it would be eas

4 answers
  • 2021-01-30 06:14

    I use a sliding DFT, which is many times faster than an FFT in the case where you need a Fourier transform each time a new sample arrives in the input buffer.

    It's based on the fact that once you have the Fourier transform of the last N samples and a new sample arrives, you can "undo" the effect of the oldest sample and apply the effect of the newest one in a single pass through the Fourier data. This makes each sliding DFT update O(n), compared with O(n log n) for recomputing a full FFT. Also, the buffer size does not need to be a power of two to maintain performance.
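    Written out, the update looks roughly like this. This is a sketch in my own notation (the symbols b, m, X_k, x_old and x_new do not appear in the answer), assuming the circular-buffer layout used by the test program, where the new sample overwrites the oldest one at buffer position m:

```latex
% DFT of the N samples currently stored in the circular buffer b[0..N-1]
X_k = \sum_{n=0}^{N-1} b[n]\, e^{-2\pi i k n / N}, \qquad k = 0, \dots, N-1

% Overwriting b[m] = x_{\mathrm{old}} with x_{\mathrm{new}} changes exactly one term of each sum:
X_k \leftarrow X_k + \left( x_{\mathrm{new}} - x_{\mathrm{old}} \right) e^{-2\pi i k m / N}
```

    Each of the N bins needs one complex multiply-add, hence O(n) per sample. Note that this is the DFT of the buffer as stored, not of the time-ordered window; the two differ only by a linear phase factor, so the magnitudes are the same.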

    The complete test program below compares the sliding DFT with FFTW. In my production code I've optimized this code to the point of unreadability, which made it three times faster.

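    #include <complex>
    #include <iostream>
    #include <cmath>    // cos, sin, sqrt, fabs
    #include <cstdio>   // printf
    #include <cstdlib>  // rand, srand
    #include <time.h>
    #include <float.h>

    // The original relied on a non-standard header (math_defines.h) for PI
    const double PI = 3.14159265358979323846;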
    
    #define DO_FFTW // libfftw
    #define DO_SDFT
    
    #if defined(DO_FFTW)
    #pragma comment( lib, "d:\\projects\\common\\fftw\\libfftw3-3.lib" )
    
    namespace fftw {
    #include <fftw/fftw3.h>
    }
    
    fftw::fftw_plan plan_fwd;
    fftw::fftw_plan plan_inv;
    
    #endif
    
    typedef std::complex<double> complex;
    
    // Buffer size, make it a power of two if you want to improve fftw
    const int N = 750;
    
    
    // input signal
    complex in[N];
    
    // frequencies of input signal after ft
    // Size increased by one because the optimized sdft code writes data to freqs[N]
    complex freqs[N+1];
    
    // output signal after inverse ft of freqs
    complex out1[N];
    complex out2[N];
    
    // forward coeffs -2 PI e^iw -- normalized (divided by N)
    complex coeffs[N];
    // inverse coeffs 2 PI e^iw
    complex icoeffs[N];
    
    // global index for input and output signals
    int idx;
    
    
    // these are just there to optimize (get rid of index lookups in sdft)
    complex oldest_data, newest_data;
    
    // initialize e-to-the-i-thetas for theta = 0..2PI in increments of 2PI/N
    void init_coeffs()
    {
        for (int i = 0; i < N; ++i) {
            double a = -2.0 * PI * i  / double(N);
            coeffs[i] = complex(cos(a)/* / N */, sin(a) /* / N */);
        }
        for (int i = 0; i < N; ++i) {
            double a = 2.0 * PI * i  / double(N);
            icoeffs[i] = complex(cos(a),sin(a));
        }
    }
    
    
    // initialize all data buffers
    void init()
    {
        // clear data
        for (int i = 0; i < N; ++i)
            in[i] = 0;
        // seed rand()
        srand(857);
        init_coeffs();
        oldest_data = newest_data = 0.0;
        idx = 0;
    }
    
    // simulating adding data to circular buffer
    void add_data()
    {
        oldest_data = in[idx];
        newest_data = in[idx] = complex(rand() / double(N));
    
    }
    
    
    // sliding dft
    void sdft()
    {
        complex delta = newest_data - oldest_data;
        int ci = 0;
        for (int i = 0; i < N; ++i) {
            freqs[i] += delta * coeffs[ci];
            if ((ci += idx) >= N)
                ci -= N;
        }
    }
    
    // sliding inverse dft
    void isdft()
    {
        complex delta = newest_data - oldest_data;
        int ci = 0;
        for (int i = 0; i < N; ++i) {
            freqs[i] += delta * icoeffs[ci];
            if ((ci += idx) >= N)
                ci -= N;
        }
    }
    
    // "textbook" slow dft, nested loops, O(N*N)
    void ft()
    {
        for (int i = 0; i < N; ++i) {
            freqs[i] = 0.0;
            for (int j = 0; j < N; ++j) {
                double a = -2.0 * PI * i * j / double(N);
                freqs[i] += in[j] * complex(cos(a),sin(a));
            }
        }
    }
    
    double mag(complex& c)
    {
        return sqrt(c.real() * c.real() + c.imag() * c.imag());
    }
    
    void powr_spectrum(double *powr)
    {
        for (int i = 0; i < N/2; ++i) {
            powr[i] = mag(freqs[i]);
        }
    
    }
    
    int main(int argc, char *argv[])
    {
        const int NSAMPS = N*10;
        clock_t start, finish;
    
    #if defined(DO_SDFT)
    // ------------------------------ SDFT ---------------------------------------------
        init();
    
        start = clock();
        for (int i = 0; i < NSAMPS; ++i) {
    
            add_data();
    
            sdft();
            // Mess about with freqs[] here
            //isdft();
    
            if (++idx == N) idx = 0; // bump global index
    
            if ((i % 1000) == 0)
                std::cerr << i << " iters..." << '\r';
        }
        finish = clock();
    
        std::cout << "SDFT: " << NSAMPS / ((finish-start) / (double)CLOCKS_PER_SEC) << " fts per second." << std::endl;
    
        double powr1[N/2];
        powr_spectrum(powr1);
    #endif
    
    #if defined(DO_FFTW)
    
    // ------------------------------ FFTW ---------------------------------------------
        plan_fwd = fftw::fftw_plan_dft_1d(N, (fftw::fftw_complex *)in, (fftw::fftw_complex *)freqs, FFTW_FORWARD, FFTW_MEASURE);
        plan_inv = fftw::fftw_plan_dft_1d(N, (fftw::fftw_complex *)freqs, (fftw::fftw_complex *)out2, FFTW_BACKWARD, FFTW_MEASURE);
    
        init();
    
        start = clock();
        for (int i = 0; i < NSAMPS; ++i) {
    
            add_data();
    
            fftw::fftw_execute(plan_fwd);
            // mess about with freqs here
            //fftw::fftw_execute(plan_inv);
    
            if (++idx == N) idx = 0; // bump global index
    
            if ((i % 1000) == 0)
                std::cerr << i << " iters..." << '\r';
        }
        // normalize fftw's output 
        for (int j = 0; j < N; ++j)
            out2[j] /= N;
    
        finish = clock();
    
        std::cout << "FFTW: " << NSAMPS / ((finish-start) / (double)CLOCKS_PER_SEC) << " fts per second." << std::endl;
        fftw::fftw_destroy_plan(plan_fwd);
        fftw::fftw_destroy_plan(plan_inv);
    
        double powr2[N/2];
        powr_spectrum(powr2);
    #endif
    #if defined(DO_SDFT) && defined(DO_FFTW)
    // ------------------------------      ---------------------------------------------
        const double MAX_PERMISSIBLE_DIFF = 1e-11; // DBL_EPSILON;
        double diff;
        // check my ft gives same power spectrum as FFTW
        for (int i = 0; i < N/2; ++i)
            if ( (diff = std::fabs(powr1[i] - powr2[i])) > MAX_PERMISSIBLE_DIFF) // fabs: plain abs may pick the int overload
                printf("Values differ by more than %g at index %d.  Diff = %g\n", MAX_PERMISSIBLE_DIFF, i, diff);
    
    #endif
        return 0;
    }
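
    If FFTW is not at hand, the same consistency check can be run against a direct O(n^2) DFT instead. The following is only a minimal sketch under stated assumptions: `SlidingDft`, `direct_dft` and `sdft_max_error` are names made up for this example, the input is a synthetic test signal, and the class is a plain restatement of the `sdft()` loop above rather than the optimized production version mentioned earlier.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

typedef std::complex<double> cplx;

// Hypothetical helper: keeps the DFT of a circular buffer of n samples
// up to date with one O(n) pass per incoming sample.
class SlidingDft {
public:
    explicit SlidingDft(std::size_t n)
        : buf_(n, 0.0), spec_(n, cplx(0.0, 0.0)), twiddle_(n), pos_(0) {
        assert(n > 0);
        const double pi = std::acos(-1.0);
        for (std::size_t j = 0; j < n; ++j)
            twiddle_[j] = std::polar(1.0, -2.0 * pi * double(j) / double(n));
    }

    // Overwrite the oldest sample with x and patch every frequency bin.
    void push(double x) {
        const std::size_t n = buf_.size();
        const double delta = x - buf_[pos_];
        buf_[pos_] = x;
        std::size_t ci = 0;  // invariant: ci == (k * pos_) % n
        for (std::size_t k = 0; k < n; ++k) {
            spec_[k] += delta * twiddle_[ci];
            ci += pos_;
            if (ci >= n) ci -= n;
        }
        if (++pos_ == n) pos_ = 0;
    }

    const std::vector<cplx>& spectrum() const { return spec_; }
    const std::vector<double>& buffer() const { return buf_; }

private:
    std::vector<double> buf_;
    std::vector<cplx> spec_;
    std::vector<cplx> twiddle_;
    std::size_t pos_;
};

// Reference "textbook" DFT of the buffer contents, O(n^2).
std::vector<cplx> direct_dft(const std::vector<double>& x) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    std::vector<cplx> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cplx acc(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j)
            acc += x[j] * std::polar(1.0, -2.0 * pi * double((k * j) % n) / double(n));
        out[k] = acc;
    }
    return out;
}

// Feed `count` synthetic samples, then return the largest per-bin
// difference between the sliding result and the direct DFT.
double sdft_max_error(std::size_t n, std::size_t count) {
    SlidingDft s(n);
    for (std::size_t i = 0; i < count; ++i)
        s.push(std::sin(0.05 * double(i)) + 0.25 * std::cos(0.31 * double(i)));
    const std::vector<cplx> ref = direct_dft(s.buffer());
    double worst = 0.0;
    for (std::size_t k = 0; k < n; ++k)
        worst = std::max(worst, std::abs(s.spectrum()[k] - ref[k]));
    return worst;
}
```

    Calling `sdft_max_error(750, 7500)` mirrors the buffer size and sample count of the test program. Because every update adds to the previous result, rounding errors can accumulate over very long runs, so it may be worth recomputing the spectrum from scratch occasionally.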
    
  • 2021-01-30 06:17

    If you need amplitude, frequency and time in one graph, the transform you want is known as a time-frequency decomposition. The most popular one is the Short-Time Fourier Transform (STFT). It works as follows:
    1. Take a small portion of the signal (say 1 second).
    2. Window it with a small window (say 5 ms).
    3. Compute the 1D Fourier transform of the windowed signal.
    4. Move the window by a small amount (say 2.5 ms).
    5. Repeat the above steps until the end of the signal.
    6. Enter all of this data into a matrix, which is then used to create the kind of 3D representation that shows the signal's decomposition along frequency, amplitude and time.

    The length of the window determines the resolution you can obtain in the frequency and time domains: a longer window gives finer frequency resolution but coarser time resolution. Look up the STFT for more details, and search for "Robi Polikar"'s tutorials on wavelet transforms for a layman's introduction to the above.
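
    As a rough worked example of that trade-off (my own notation: f_s is the sampling rate, T the window duration and N = f_s T the number of samples in the window, none of which are named in the answer), the spacing between DFT bins is:

```latex
\Delta f = \frac{f_s}{N} = \frac{1}{T}

% 5 ms window:   \Delta f = 1 / 0.005\,\mathrm{s} = 200\,\mathrm{Hz}
% 100 ms window: \Delta f = 1 / 0.1\,\mathrm{s}   = 10\,\mathrm{Hz}
```

    So a 5 ms window localizes events well in time but cannot separate components closer than a couple of hundred hertz. How well neighbouring components are actually resolved also depends on the shape of the window function.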

    Edit 1:
    You take a windowing function (there are innumerable functions out there; see any list of window functions). The most intuitive is the rectangular window, but the most commonly used are the Hamming and Hann ("Hanning") window functions. You can follow the steps below with pencil and paper and draw along.
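
    For reference, the two windows named above could be generated like this. It is only a sketch: the function names `cosine_window`, `hann_window`, `hamming_window` and `apply_window` are made up here, and it assumes the symmetric form of the windows with the classic 0.54/0.46 Hamming coefficients.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Generic two-term cosine window: w[i] = a0 - a1 * cos(2*pi*i / (n-1)).
std::vector<double> cosine_window(std::size_t n, double a0, double a1) {
    std::vector<double> w(n, 1.0);
    if (n < 2) return w;  // a window of 0 or 1 samples is trivially flat
    const double pi = std::acos(-1.0);
    for (std::size_t i = 0; i < n; ++i)
        w[i] = a0 - a1 * std::cos(2.0 * pi * double(i) / double(n - 1));
    return w;
}

// Hann window: tapers to exactly zero at both ends.
std::vector<double> hann_window(std::size_t n) {
    return cosine_window(n, 0.5, 0.5);
}

// Hamming window: similar shape, but the ends stay at 0.08.
std::vector<double> hamming_window(std::size_t n) {
    return cosine_window(n, 0.54, 0.46);
}

// Point-by-point product y[i] = x[i] * w[i] of a signal frame and a window.
std::vector<double> apply_window(const std::vector<double>& x,
                                 const std::vector<double>& w) {
    assert(x.size() == w.size());
    std::vector<double> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        y[i] = x[i] * w[i];
    return y;
}
```

    A rectangular window is simply all ones, which is what you get implicitly when you take an FFT of a raw block without multiplying by anything.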

    Assume that the signal that you have obtained is 1 sec long and is named x[n]. The windowing function is 5 msec long and is named w[n]. Place the window at the start of the signal (so the end of the window coincides with the 5ms point of the signal) and multiply the x[n] and w[n] like so:
    y[n] = x[n] * w[n] - point by point multiplication of the signals.
    Take an FFT of y[n].

    Then shift the window by a small amount (say 2.5 ms), so that it now stretches from 2.5 ms to 7.5 ms of the signal x[n], and repeat the multiplication and FFT steps. In other words, consecutive windows overlap by 2.5 ms. You will see that changing the length of the window and the overlap gives you different resolutions on the time and frequency axes.

    Once you do this, you need to feed all the data into a matrix and then have it displayed. The overlap is for minimising the errors that might arise at boundaries and also to get more consistent measurements over such short time frames.
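
    Put together, the procedure described above might look like the sketch below. Assumptions: `stft_magnitudes` and `peak_bin` are hypothetical names, the window is a Hann window, and a direct DFT stands in for the FFT so that the example has no library dependency (in practice you would call an FFT library for each frame). Lengths are given in samples, so "5 ms" has to be converted using your sampling rate.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Returns a matrix with one row per window position (time) and one column
// per frequency bin 0..win_len/2, holding the magnitude of each bin.
std::vector<std::vector<double>> stft_magnitudes(const std::vector<double>& x,
                                                 std::size_t win_len,
                                                 std::size_t hop) {
    assert(win_len >= 2 && hop >= 1);
    const double pi = std::acos(-1.0);

    // Hann window w[n], computed once and reused for every frame.
    std::vector<double> w(win_len);
    for (std::size_t n = 0; n < win_len; ++n)
        w[n] = 0.5 - 0.5 * std::cos(2.0 * pi * double(n) / double(win_len - 1));

    std::vector<std::vector<double>> rows;
    // Slide the window over the signal; frames that would run past the end are dropped.
    for (std::size_t start = 0; start + win_len <= x.size(); start += hop) {
        std::vector<double> row(win_len / 2 + 1);
        for (std::size_t k = 0; k < row.size(); ++k) {
            std::complex<double> acc(0.0, 0.0);
            for (std::size_t n = 0; n < win_len; ++n) {
                const double angle =
                    -2.0 * pi * double((k * n) % win_len) / double(win_len);
                // y[n] = x[n] * w[n], then one term of the DFT sum
                acc += x[start + n] * w[n] * std::polar(1.0, angle);
            }
            row[k] = std::abs(acc);
        }
        rows.push_back(row);
    }
    return rows;
}

// Index of the strongest bin in one row of the matrix.
std::size_t peak_bin(const std::vector<double>& row) {
    return std::size_t(std::max_element(row.begin(), row.end()) - row.begin());
}
```

    For instance, at an assumed sampling rate of 44.1 kHz, the 5 ms window and 2.5 ms shift from the text correspond to roughly `win_len = 220` and `hop = 110`. Plotting the returned matrix with time on one axis and bin index on the other gives the spectrogram-style display.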

    P.S: If you had understood STFT and other time-frequency decompositions of a signal, then you should have had no problems with steps 2 and 4. That you have not understood the above mentioned steps makes me feel like you should revisit time-frequency decompositions also.

  • 2021-01-30 06:23

    You can create a realtime FFT by choosing a short time-span and analysing (FFT'ing) just that time-span. You can probably get away with just selecting non-overlapping timespans of say 100-500 milliseconds; the analytically purer way to do this would be using a sliding-window (again of e.g. 100-500 ms), but that is often unnecessary and you can show nice graphics with the non-overlapping timespans without much processing power.
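
    The non-overlapping variant mostly comes down to collecting incoming samples into fixed-size blocks. A minimal sketch, assuming a hypothetical `BlockAccumulator` class (not part of portaudio or any other library) that is fed whatever chunk sizes the capture side happens to deliver:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: gathers samples that arrive in arbitrary-sized chunks
// and hands them out as complete, non-overlapping blocks of block_len samples.
class BlockAccumulator {
public:
    explicit BlockAccumulator(std::size_t block_len) : block_len_(block_len) {
        assert(block_len > 0);
        pending_.reserve(block_len);
    }

    // Append `count` samples; each time a block fills up it is queued as ready.
    void push(const float* samples, std::size_t count) {
        for (std::size_t i = 0; i < count; ++i) {
            pending_.push_back(samples[i]);
            if (pending_.size() == block_len_) {
                ready_.push_back(pending_);
                pending_.clear();
            }
        }
    }

    // Remove and return all completed blocks; the caller would run one FFT
    // per block and update the display.
    std::vector<std::vector<float>> take_ready() {
        std::vector<std::vector<float>> out;
        out.swap(ready_);
        return out;
    }

private:
    std::size_t block_len_;
    std::vector<float> pending_;             // partially filled block
    std::vector<std::vector<float>> ready_;  // completed blocks awaiting analysis
};
```

    At an assumed 44.1 kHz sampling rate, a 100 ms block would be 4410 samples. This sketch allocates memory inside `push`, which is usually something to avoid in a real audio callback; a preallocated ring buffer drained by a separate analysis thread would be the more careful design.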

  • 2021-01-30 06:38

    A real-time FFT means something completely different from what you just described. It means that for a given N and X[N], the algorithm produces Fx[i] as the index i increments; that is, the next value is not computed until the computation of the current value has completed.

    Hardware usually uses an FFT with around 1k-16k points. That is a fixed-N computation, not a real-time one: a moving-window FFT, as described in the previous answers.
