How to filter FFT data (for audio visualisation)?

后端未结

关注

 2  531

I was looking at this Web Audio API demo, part of this nice book

If you look at the demo, the fft peaks fall smoothly. I\'m trying to do same with Processing in Java

Scaling FFT results for display

Generally, when you take the Fourier transform and you want to display a graph of it, you need (as you mention) to scale it logarithmically. This is because the magnitude of the values will vary over a huge range - many orders of magnitude - and compressing this into the small space observable on a graph will cause the main peaks to dwarf the rest of the information.

To actually do this scaling, we convert the values to decibels. It is important to note that decibels is a scale and not a unit - it represents a ratio between two numbers: usually a measured value and some reference. The general formula for decibels is

x_dB = 10 * log10((x ^ 2) / (ref ^ 2))

where log10 is logarithm to base 10, ^ is the power operator, and x_ref is your chosen reference value. Since FFT'd values from an audio file don't (usually) have any meaningful units,x_ref is commonly chosen to be simply 1 for this application. Further, since x is complex, you need to take the absolute value. So the formula will be

x_dB = 10 * log10(abs(x) ^ 2)

There is a small (numerical and speed) optimisation possible here, since you're squaring the result of a square-root:

x_dB = 10 * log10(real(x) ^ 2 + imag(x) ^ 2)

Perceptual weighting

Scaling of frequency-domain measurements is commonly done when measuring sound pressure and power levels: a specific measurement type is chosen for the given application (I won't go into the types here), and a recording of sound is made according to this measurement type. The result is FFT'd and then multiplied by a given weighting at each frequency depending on what the result will be used for and what type of sound has been recorded. There are two weightings in common use: A, and C. C is generally used only for extremely high amplitude sounds.

Note that this kind of weighting is not really necessary if you just want to display a nice-looking graph: it is used to make sure everyone in the world can make measurements (and measurement equipment) that follow the same standard. If you do decide to include this, it must be performed as a multiplication before conversion to decibels (or as an addition of the decibel value of the weighting - which is mathematically equivalent).

Info on the A-weighting is on wikipedia.

Windowing

Windowing is performed primarily to reduce the effect of the Gibbs phenomenon. We can never get rid of it completely but windowing does help. Unfortunately it has other effects: sharp peaks are broadened and "side-lobes" introduced; there is always a compromise between peak sharpness and side-lobe height. I am not going to go into all the details here unless you specifically ask for it; there is a fairly lengthy explanation of windowing in this free online book.

Time-domain smoothing of individual frequency bins

As for making the line in each frequency bin decay slowly, here's a simple idea that might do the trick: in each frequency bin, apply a simple exponential moving average. Say your FFT results are stored in X[k], where k is the frequency index. Let your display value be Y[k] such that

Y[k] = alpha * Y_(t-1)[k] + (1 - alpha) * X[k]

where 0 < alpha < 1 is your smoothing factor, and Y_(t-1)[k] is the value of Y[k] at the last time step (t-1). This is actually a simple low-pass IIR (infinite impulse response) filter, and hopefully should do basically what you want (perhaps with a little tweaking). The closer alpha is to zero, the more quickly new observations (input X[k]) will affect the result. The closer it is to one, the more slowly the result will decay, but the input will also affect the result more slowly, so it may appear "sluggish". You may want to add a conditional around it to take the new value immediately if it's higher than the current value.

Note that, again, this should be performed prior to conversion to decibels.

(edit) Having looked at the code you posted a little more clearly, this does appear to be the method used in the example you're trying to reproduce. Your initial attempt was close, but note that the first term is the smoothing coefficient multiplied by the last display value, not the current input.

(edit 2) Your third update is, again, close, but there is a slight mistranslation of the formula in the following lines

fftSmooth[i] = smoothing * fftPrev[i] + ((1 - smoothing) * fftCurr[i]);

fftPrev[i] = fftCurr[i];//

Instead of the previous value of the FFT coefficients before smoothing, you want to take the value after smoothing. (note that this means you don't actually need another array to store the previous value)

fftSmooth[i] = smoothing * fftSmooth[i] + ((1 - smoothing) * fftCurr[i]);

If smoothing == 0, this line should have little effect other than to multiply the result by a scalar.

Normalization in the absolute value computation

Looking more closely at the way they compute the absolute value, they have a normalization in there, so that whichever of the two complex values is the maximum, becomes 1, and the other is scaled accordingly. This means you will always get an absolute value between 0 and 1, and is probably their alternative to decibel conversion. Really, this is not quite what the documentation of their abs function suggests, which is a little annoying... but anyway, if you replicate this it will guarantee that your values are always in a sensible range.

To do this simply in your code, you could do something like

float maxVal = Math.max(Math.abs(fftReal[i]), Math.abs(fftImag[i]));
if (maxVal != 0.0f) { // prevent divide-by-zero
    // Normalize
    fftReal[i] = fftReal[i] / maxVal;
    fftImag[i] = fftImag[i] / maxVal;
}

fftCurr[i] = scale * (float)Math.log10(fftReal[i]*fftReal[i] + fftImag[i]*fftImag[i]);
// ...

Putting it all together: Some code

Having messed around with it for a while in Processing 2.1, I have a solution that I believe you will be happy with:

import ddf.minim.analysis.*;
import ddf.minim.*;

Minim       minim;
//AudioInput  in;
AudioPlayer in;
FFT         fft;

float smoothing = 0.60;
final boolean useDB = true;
final int minBandwidthPerOctave = 200;
final int bandsPerOctave = 10;
float[] fftSmooth;
int avgSize;

float minVal = 0.0;
float maxVal = 0.0;
boolean firstMinDone = false;

void setup(){
  minim = new Minim(this);
  //in = minim.getLineIn(Minim.STEREO, 512);
  in = minim.loadFile("C:\\path\\to\\some\\audio\\file.ext", 2048);

  in.loop();

  fft = new FFT(in.bufferSize(), in.sampleRate());

  // Use logarithmically-spaced averaging
  fft.logAverages(minBandwidthPerOctave, bandsPerOctave);

  avgSize = fft.avgSize();
  fftSmooth = new float[avgSize];

  int myWidth = 500;
  int myHeight = 250;
  size(myWidth, myHeight);
  colorMode(HSB,avgSize,100,100);

}

float dB(float x) {
  if (x == 0) {
    return 0;
  }
  else {
    return 10 * (float)Math.log10(x);
  }
}

void draw(){
  background(0);
  stroke(255);

  fft.forward( in.mix);

  final int weight = width / avgSize;
  final float maxHeight = (height / 2) * 0.75;

  for (int i = 0; i < avgSize; i++) {
    // Get spectrum value (using dB conversion or not, as desired)
    float fftCurr;
    if (useDB) {
      fftCurr = dB(fft.getAvg(i));
    }
    else {
      fftCurr = fft.getAvg(i);
    }

    // Smooth using exponential moving average
    fftSmooth[i] = (smoothing) * fftSmooth[i] + ((1 - smoothing) * fftCurr);

    // Find max and min values ever displayed across whole spectrum
    if (fftSmooth[i] > maxVal) {
      maxVal = fftSmooth[i];
    }
    if (!firstMinDone || (fftSmooth[i] < minVal)) {
      minVal = fftSmooth[i];
    }
  }

  // Calculate the total range of smoothed spectrum; this will be used to scale all values to range 0...1
  final float range = maxVal - minVal;
  final float scaleFactor = range + 0.00001; // avoid div. by zero

  for(int i = 0; i < avgSize; i++)
  {
    stroke(i,100,100);
    strokeWeight(weight);

    // Y-coord of display line; fftSmooth is scaled to range 0...1; this is then multiplied by maxHeight
    // to make it within display port range
    float fftSmoothDisplay = maxHeight * ((fftSmooth[i] - minVal) / scaleFactor);

    // X-coord of display line
    float x = i * weight;

    line(x, height / 2, x, height / 2 - fftSmoothDisplay);
  }
  text("smoothing: " + (int)(smoothing*100)+"\n",10,10);
}
void keyPressed(){
  float inc = 0.01;
  if(keyCode == UP && smoothing < 1-inc) smoothing += inc;
  if(keyCode == DOWN && smoothing > inc) smoothing -= inc;
}

The above uses a slightly different approach - averaging the spectrum in a series of bins that is smaller than the total spectrum size - that produces a result closer to WMP's than your original.

Result example

Enhancement: Now with A-weighting

I have an updated version of the code that applies the A-weighting in each frequency band (though only when dB mode is on, because the table I had was in dB :). Turn A-weighting on for a result closer to WMP's, or off for one closer to VLC's.

There are also some minor tweaks to the way it is displayed: it is now centred in the display and it will display only up to a maximum band centre frequency.

Here's the code - enjoy!

import ddf.minim.analysis.*;
import ddf.minim.*;

Minim       minim;
//AudioInput  in;
AudioPlayer in;
FFT         fft;

float smoothing = 0.73;
final boolean useDB = true;
final boolean useAWeighting = true; // only used in dB mode, because the table I found was in dB 
final boolean resetBoundsAtEachStep = false;
final float maxViewportUsage = 0.85;
final int minBandwidthPerOctave = 200;
final int bandsPerOctave = 10;
final float maxCentreFrequency = 18000;
float[] fftSmooth;
int avgSize;

float minVal = 0.0;
float maxVal = 0.0;
boolean firstMinDone = false;

final float[] aWeightFrequency = { 
  10, 12.5, 16, 20, 
  25, 31.5, 40, 50, 
  63, 80, 100, 125, 
  160, 200, 250, 315, 
  400, 500, 630, 800, 
  1000, 1250, 1600, 2000, 
  2500, 3150, 4000, 5000,
  6300, 8000, 10000, 12500, 
  16000, 20000 
};

final float[] aWeightDecibels = {
  -70.4, -63.4, -56.7, -50.5, 
  -44.7, -39.4, -34.6, -30.2, 
  -26.2, -22.5, -19.1, -16.1, 
  -13.4, -10.9, -8.6, -6.6, 
  -4.8, -3.2, -1.9, -0.8, 
  0.0, 0.6, 1.0, 1.2, 
  1.3, 1.2, 1.0, 0.5, 
  -0.1, -1.1, -2.5, -4.3, 
  -6.6, -9.3 
};

float[] aWeightDBAtBandCentreFreqs;

void setup(){
  minim = new Minim(this);
  //in = minim.getLineIn(Minim.STEREO, 512);
  in = minim.loadFile("D:\\Music\\Arthur Brown\\The Crazy World Of Arthur Brown\\1-09 Fire.mp3", 2048);

  in.loop();

  fft = new FFT(in.bufferSize(), in.sampleRate());

  // Use logarithmically-spaced averaging
  fft.logAverages(minBandwidthPerOctave, bandsPerOctave);
  aWeightDBAtBandCentreFreqs = calculateAWeightingDBForFFTAverages(fft);

  avgSize = fft.avgSize();
  // Only use freqs up to maxCentreFrequency - ones above this may have
  // values too small that will skew our range calculation for all time
  while (fft.getAverageCenterFrequency(avgSize-1) > maxCentreFrequency) {
    avgSize--;
  }

  fftSmooth = new float[avgSize];

  int myWidth = 500;
  int myHeight = 250;
  size(myWidth, myHeight);
  colorMode(HSB,avgSize,100,100);

}

float[] calculateAWeightingDBForFFTAverages(FFT fft) {
  float[] result = new float[fft.avgSize()];
  for (int i = 0; i < result.length; i++) {
    result[i] = calculateAWeightingDBAtFrequency(fft.getAverageCenterFrequency(i));
  }
  return result;    
}

float calculateAWeightingDBAtFrequency(float frequency) {
  return linterp(aWeightFrequency, aWeightDecibels, frequency);    
}

float dB(float x) {
  if (x == 0) {
    return 0;
  }
  else {
    return 10 * (float)Math.log10(x);
  }
}

float linterp(float[] x, float[] y, float xx) {
  assert(x.length > 1);
  assert(x.length == y.length);

  float result = 0.0;
  boolean found = false;

  if (x[0] > xx) {
    result = y[0];
    found = true;
  }

  if (!found) {
    for (int i = 1; i < x.length; i++) {
      if (x[i] > xx) {
        result = y[i-1] + ((xx - x[i-1]) / (x[i] - x[i-1])) * (y[i] - y[i-1]);
        found = true;
        break;
      }
    }
  }

  if (!found) {
    result = y[y.length-1];
  }

  return result;     
}

void draw(){
  background(0);
  stroke(255);

  fft.forward( in.mix);

  final int weight = width / avgSize;
  final float maxHeight = height * maxViewportUsage;
  final float xOffset = weight / 2 + (width - avgSize * weight) / 2;

  if (resetBoundsAtEachStep) {
    minVal = 0.0;
    maxVal = 0.0;
    firstMinDone = false;
  }

  for (int i = 0; i < avgSize; i++) {
    // Get spectrum value (using dB conversion or not, as desired)
    float fftCurr;
    if (useDB) {
      fftCurr = dB(fft.getAvg(i));
      if (useAWeighting) {
        fftCurr += aWeightDBAtBandCentreFreqs[i];
      }
    }
    else {
      fftCurr = fft.getAvg(i);
    }

    // Smooth using exponential moving average
    fftSmooth[i] = (smoothing) * fftSmooth[i] + ((1 - smoothing) * fftCurr);

    // Find max and min values ever displayed across whole spectrum
    if (fftSmooth[i] > maxVal) {
      maxVal = fftSmooth[i];
    }
    if (!firstMinDone || (fftSmooth[i] < minVal)) {
      minVal = fftSmooth[i];
    }
  }

  // Calculate the total range of smoothed spectrum; this will be used to scale all values to range 0...1
  final float range = maxVal - minVal;
  final float scaleFactor = range + 0.00001; // avoid div. by zero

  for(int i = 0; i < avgSize; i++)
  {
    stroke(i,100,100);
    strokeWeight(weight);

    // Y-coord of display line; fftSmooth is scaled to range 0...1; this is then multiplied by maxHeight
    // to make it within display port range
    float fftSmoothDisplay = maxHeight * ((fftSmooth[i] - minVal) / scaleFactor);
    // Artificially impose a minimum of zero (this is mathematically bogus, but whatever)
    fftSmoothDisplay = max(0.0, fftSmoothDisplay);

    // X-coord of display line
    float x = xOffset + i * weight;

    line(x, height, x, height - fftSmoothDisplay);
  }
  text("smoothing: " + (int)(smoothing*100)+"\n",10,10);
}
void keyPressed(){
  float inc = 0.01;
  if(keyCode == UP && smoothing < 1-inc) smoothing += inc;
  if(keyCode == DOWN && smoothing > inc) smoothing -= inc;
}

result 2

0 讨论(0)