Extracting precise frequencies from FFT Bins using phase change between frames

前端 未结 6 2348
南方客
南方客 2020-12-12 16:05

I have been looking through this fantastic article: http://blogs.zynaptiq.com/bernsee/pitch-shifting-using-the-ft/

While being fantastic, it is extremely hard and he

相关标签:
6条回答
  • 2020-12-12 16:16

    I have implemented this algorithm for Performous myself. When you take another FFT at a time offset, you expect the phase to change according to the offset, i.e. two FFTs taken 256 samples apart should have a phase difference of 256 samples for all frequencies present in the signal (this assumes that the signals themselves are steady, which is a good assumption for short periods like 256 samples).

    Now, the actual phase values you get from FFT are not in samples but in phase angle, so they will be different depending on the frequency. In the following code the phaseStep value is the conversion factor needed per bin, i.e. for frequency corresponding to bin x, the phase shift will be x * phaseStep. For bin center frequencies x would be an integer (the bin number) but for actual detected frequencies it may be any real number.

    const double freqPerBin = SAMPLE_RATE / FFT_N;
    const double phaseStep = 2.0 * M_PI * FFT_STEP / FFT_N;
    

    The correction works by assuming that the signal in a bin has the bin center frequency and then calculating the expected phase shift for that. This expected shift is substracted from the actual shift, leaving the error. A remainder (modulo 2 pi) is taken (-pi to pi range) and the final frequency is calculated with bin center + correction.

    // process phase difference
    double delta = phase - m_fftLastPhase[k];
    m_fftLastPhase[k] = phase;
    delta -= k * phaseStep;  // subtract expected phase difference
    delta = remainder(delta, 2.0 * M_PI);  // map delta phase into +/- M_PI interval
    delta /= phaseStep;  // calculate diff from bin center frequency
    double freq = (k + delta) * freqPerBin;  // calculate the true frequency
    

    Notice that many adjacent bins often end up corrected to the same frequency because the delta correction can be up to 0.5 * FFT_N / FFT_STEP bins either way so the smaller FFT_STEP you use, the further away will corrections be possible (but this increases the processing power needed as well as imprecision due to inaccuracies).

    I hope this helps :)

    0 讨论(0)
  • 2020-12-12 16:16

    Finally I have figured this out; really I had to derive it from scratch. I knew there would be some simple way to derive it, my (usual) mistake was to attempt to follow other people's logic rather than use my own common sense.

    This puzzle takes two keys to unlock it.

    • The first key is to understand how over-sampling introduces a rotation on bin phase.

    • The second key comes from Graph 3.3 & 3.4 here: http://www.dspdimension.com/admin/pitch-shifting-using-the-ft/

    ...

    for (int k = 0; k <= fftFrameSize/2; k++) 
    {
        // compute magnitude and phase 
        bins[k].mag = 2.*sqrt(fftBins[k].real*fftBins[k].real + fftBins[k].imag*fftBins[k].imag);
        bins[k].phase = atan2(fftBins[k].imag, fftBins[k].real);
    
        // Compute phase difference Δϕ fo bin[k]
        double deltaPhase;
        {
            double measuredPhaseDiff = bins[k].phase - gLastPhase[k];
            gLastPhase[k] = bins[k].phase;
    
            // Subtract expected phase difference <-- FIRST KEY
            // Think of a single wave in a 1024 float frame, with osamp = 4
            //   if the first sample catches it at phase = 0, the next will 
            //   catch it at pi/2 ie 1/4 * 2pi
            double binPhaseExpectedDiscrepancy = M_TWOPI * (double)k / (double)osamp;
            deltaPhase = measuredPhaseDiff - binPhaseExpectedDiscrepancy;
    
            // Wrap delta phase into [-Pi, Pi) interval 
            deltaPhase -= M_TWOPI * floor(deltaPhase / M_TWOPI + .5);
        }
    
        // say sampleRate = 40K samps/sec, fftFrameSize = 1024 samps in FFT giving bin[0] thru bin[512]
        // then bin[1] holds one whole wave in the frame, ie 44 waves in 1s ie 44Hz ie sampleRate / fftFrameSize
        double bin0Freq = (double)sampleRate / (double)fftFrameSize;
        bins[k].idealFreq = (double)k * bin0Freq;
    
        // Consider Δϕ for bin[k] between hops.
        // write as 2π / m.
        // so after m hops, Δϕ = 2π, ie 1 extra cycle has occurred   <-- SECOND KEY
        double m = M_TWOPI / deltaPhase;
    
        // so, m hops should have bin[k].idealFreq * t_mHops cycles.  plus this extra 1.
        // 
        // bin[k].idealFreq * t_mHops + 1 cycles in t_mHops seconds 
        //   => bins[k].actualFreq = bin[k].idealFreq + 1 / t_mHops
        double tFrame = fftFrameSize / sampleRate;
        double tHop = tFrame / osamp;
        double t_mHops = m * tHop;
    
        bins[k].freq = bins[k].idealFreq + 1. / t_mHops;
    }
    
    0 讨论(0)
  • 2020-12-12 16:17

    The basic principle is very simple. If a given component exactly matches a bin frequency then its phase will not change from one FT to the next. However if the frequency does not correspond exactly with the bin frequency then there will be a phase change between successive FTs. The frequency delta is just:

    delta_freq = delta_phase / delta_time
    

    and the refined estimate of the frequency of the component will then be:

    freq_est = bin_freq + delta_freq
    
    0 讨论(0)
  • 2020-12-12 16:19

    This is the frequency estimation technique used by phase vocoder methods.

    If you look at a single point on a (fixed frequency and fixed amplitude) sine wave in time, the phase will advance with time by an amount proportional to the frequency. Or you can do the converse: if you measure how much the phase of a sinusoid changes over any unit of time, you can calculate the frequency of that sinusoid.

    A phase vocoder uses two FFTs to estimate phase with reference to two FFT windows, and the offset of the two FFTs is the distance between the 2 phase measurements in time. From thence, you have your frequency estimate for that FFT bin (an FFT bin being roughly a filter to isolate a sinusoidal component or other sufficiently narrowband signal that fits within that bin).

    For this method to work, the spectrum near the FFT bin in use has to be fairly stationary, e.g. not changing in frequency, etc. That's the assumption a phase vocoder requires.

    0 讨论(0)
  • 2020-12-12 16:22

    Signal frequencies that fall exactly on a bin frequency advance bin phase by integer multiples of 2π. Since the bin phases that correspond to the bin frequencies are multiples of 2π due to the periodic nature of the FFT there is no phase change in this case. The article you mention also explains this.

    0 讨论(0)
  • 2020-12-12 16:26

    Maybe this will help. Think of the FFT bins as specifying little clocks or rotors, each spinning at the frequency of the bin. For a stable signal, the (theoretical) next position of the rotor can be predicted using the math in the bit you don't get. Against this "should be" (ideal) position, you can compute several useful things: (1) the difference with the phase in a bin of an adjacent frame, which is used by a phase vocoder to better estimate of the bin frequency, or (2) more generally phase deviation, which is a positive indicator of a note onset or some other event in the audio.

    0 讨论(0)
提交回复
热议问题