How to filter FFT data (for audio visualisation)?

问题

I was looking at this Web Audio API demo, part of this nice book

If you look at the demo, the fft peaks fall smoothly. I'm trying to do same with Processing in Java mode using the minim library. I've looked at how this is done with the web audio api in the doFFTAnalysis() method and tried to replicate this with minim. I also tried to port how abs() works with the complex type:

/ 26.2.7/3 abs(__z):  Returns the magnitude of __z.
00565   template<typename _Tp>
00566     inline _Tp
00567     __complex_abs(const complex<_Tp>& __z)
00568     {
00569       _Tp __x = __z.real();
00570       _Tp __y = __z.imag();
00571       const _Tp __s = std::max(abs(__x), abs(__y));
00572       if (__s == _Tp())  // well ...
00573         return __s;
00574       __x /= __s; 
00575       __y /= __s;
00576       return __s * sqrt(__x * __x + __y * __y);
00577     }
00578

I'm currently doing a quick prototype using Processing(a java framework/library). My code looks like this:

import ddf.minim.*;
import ddf.minim.analysis.*;

private int blockSize = 512;
private Minim minim;
private AudioInput in;
private FFT         mfft;
private float[]    time = new float[blockSize];//time domain
private float[]    real = new float[blockSize];
private float[]    imag = new float[blockSize];
private float[]    freq = new float[blockSize];//smoothed freq. domain

public void setup() {
  minim = new Minim(this);
  in = minim.getLineIn(Minim.STEREO, blockSize);
  mfft = new FFT( in.bufferSize(), in.sampleRate() );
}
public void draw() {
  background(255);
  for (int i = 0; i < blockSize; i++) time[i] = in.left.get(i);
  mfft.forward( time);
  real = mfft.getSpectrumReal();
  imag = mfft.getSpectrumImaginary();

  final float magnitudeScale = 1.0 / mfft.specSize();
  final float k = (float)mouseX/width;

  for (int i = 0; i < blockSize; i++)
  {      
      float creal = real[i];
      float cimag = imag[i];
      float s = Math.max(creal,cimag);
      creal /= s;
      cimag /= s; 
      float absComplex = (float)(s * Math.sqrt(creal*creal + cimag*cimag));
      float scalarMagnitude = absComplex * magnitudeScale;        
      freq[i] = (k * mfft.getBand(i) + (1 - k) * scalarMagnitude);

      line( i, height, i, height - freq[i]*8 );
  }
  fill(0);
  text("smoothing: " + k,10,10);
}

I'm not getting errors, which is good, but I'm not getting the expected behaviour which is bad. I expected the peaks to fall slower when smoothing(k) is close 1, but as far as I can tell my code only scales the bands.

Unfortunately math and sound isn't my strong point, so I'm stabbing in the dark. How can I replicate the nice visualisation from the Web Audio API demo ?

I would be tempted to say this can be language agnostic, but using javascript for example wouldn't apply :). However, I'm happy to try any other java library that does FFT analysis.

UPDATE

I've got a simple solution for smoothing (continuously diminish values of each previous fft band if the current fft band is not higher:

import ddf.minim.analysis.*;
import ddf.minim.*;

Minim       minim;
AudioInput  in;
FFT         fft;

float smoothing = 0;
float[] fftReal;
float[] fftImag;
float[] fftSmooth;
int specSize;
void setup(){
  size(640, 360, P3D);
  minim = new Minim(this);
  in = minim.getLineIn(Minim.STEREO, 512);
  fft = new FFT(in.bufferSize(), in.sampleRate());
  specSize = fft.specSize();
  fftSmooth = new float[specSize];
  fftReal   = new float[specSize];
  colorMode(HSB,specSize,100,100);
}

void draw(){
  background(0);
  stroke(255);

  fft.forward( in.left);
  fftReal = fft.getSpectrumReal();
  fftImag = fft.getSpectrumImaginary();
  for(int i = 0; i < specSize; i++)
  {
    float band = fft.getBand(i);

    fftSmooth[i] *= smoothing;
    if(fftSmooth[i] < band) fftSmooth[i] = band;
    stroke(i,100,50);
    line( i, height, i, height - fftSmooth[i]*8 );
    stroke(i,100,100);
    line( i, height, i, height - band*8 );


  }
  text("smoothing: " + (int)(smoothing*100),10,10);
}
void keyPressed(){
  float inc = 0.01;
  if(keyCode == UP && smoothing < 1-inc) smoothing += inc;
  if(keyCode == DOWN && smoothing > inc) smoothing -= inc;
}

The faded graph is the smoothed one and the fully saturated one is the live one.

I am however still missing something, in comparison to the Web Audio API demo:

I think the Web Audio API might take into account that the medium and higher frequencies will need to be scaled to be closer to what we perceive, but I'm not sure how to tackle that.

I was trying to read more on how the RealtimeAnalyser class does this for the WebAudioAPI, but it seems FFTFrame class's doFFT method might do the logarithmic scaling. I haven't figured out how doFFT works yet.

How can I scale a raw FFT graph with a logarithmic scale to account for perception ? My goal is to do a decent looking visualisation and my guess is i will need to:

smooth values, otherwise elements will animate to fast/twitchy
scale the FFT bins/bands to get better data for medium/high frequencies
map process FFT values to visual elements (find the maximum values/bounds)

Any hints on how I can achieve this ?

UPDATE 2

I'm guessing this part does the smoothing and scaling I'm after in the Web Audio API: // Normalize so than an input sine wave at 0dBfs registers as 0dBfs (undo FFT scaling factor). const double magnitudeScale = 1.0 / DefaultFFTSize;

// A value of 0 does no averaging with the previous result.  Larger values produce slower, but smoother changes.
double k = m_smoothingTimeConstant;
k = max(0.0, k);
k = min(1.0, k);    

// Convert the analysis data from complex to magnitude and average with the previous result.
float* destination = magnitudeBuffer().data();
size_t n = magnitudeBuffer().size();
for (size_t i = 0; i < n; ++i) {
    Complex c(realP[i], imagP[i]);
    double scalarMagnitude = abs(c) * magnitudeScale;        
    destination[i] = float(k * destination[i] + (1 - k) * scalarMagnitude);
}

It seems the scaling is done by taking the absolute of the complex value. This post points in the same direction. I've tried using the abs of the complex number using Minim and using various window functions but it still doesn't look like what I'm aiming for(the Web Audio API demo):

import ddf.minim.analysis.*;
import ddf.minim.*;

Minim       minim;
AudioInput  in;
FFT         fft;

float smoothing = 0;
float[] fftReal;
float[] fftImag;
float[] fftSmooth;
int specSize;

WindowFunction[] window = {FFT.NONE,FFT.HAMMING,FFT.HANN,FFT.COSINE,FFT.TRIANGULAR,FFT.BARTLETT,FFT.BARTLETTHANN,FFT.LANCZOS,FFT.BLACKMAN,FFT.GAUSS};
String[] wlabel = {"NONE","HAMMING","HANN","COSINE","TRIANGULAR","BARTLETT","BARTLETTHANN","LANCZOS","BLACKMAN","GAUSS"};
int windex = 0;

void setup(){
  size(640, 360, P3D);
  minim = new Minim(this);
  in = minim.getLineIn(Minim.STEREO, 512);
  fft = new FFT(in.bufferSize(), in.sampleRate());
  fft.window(window[windex]);
  specSize = fft.specSize();
  fftSmooth = new float[specSize];
  fftReal   = new float[specSize];
  colorMode(HSB,specSize,100,100);
}

void draw(){
  background(0);
  stroke(255);

  fft.forward( in.mix);
  fftReal = fft.getSpectrumReal();
  fftImag = fft.getSpectrumImaginary();
  for(int i = 0; i < specSize; i++)
  {
    float band = fft.getBand(i);

    //Sw = abs(Sw(1:(1+N/2))); %# abs is sqrt(real^2 + imag^2)
    float abs = sqrt(fftReal[i]*fftReal[i] + fftImag[i]*fftImag[i]);

    fftSmooth[i] *= smoothing;
    if(fftSmooth[i] < abs) fftSmooth[i] = abs;

    stroke(i,100,50);
    line( i, height, i, height - fftSmooth[i]*8 );
    stroke(i,100,100);
    line( i, height, i, height - band*8 );


  }
  text("smoothing: " + (int)(smoothing*100)+"\nwindow:"+wlabel[windex],10,10);
}
void keyPressed(){
  float inc = 0.01;
  if(keyCode == UP && smoothing < 1-inc) smoothing += inc;
  if(keyCode == DOWN && smoothing > inc) smoothing -= inc;
  if(key == 'W' && windex < window.length-1) windex++;
  if(key == 'w' && windex > 0) windex--;
  if(key == 'w' || key == 'W') fft.window(window[windex]);
}

I'm not sure I'm using the window functions correctly because I don't notice a huge difference between them. Is the abs of the complex value correct ? How can I get a visualisation closer to my aim ?

UPDATE 3

I've tried to apply @wakjah's helpful tips like so:

import ddf.minim.analysis.*;
import ddf.minim.*;

Minim       minim;
AudioInput  in;
FFT         fft;

float smoothing = 0;
float[] fftReal;
float[] fftImag;
float[] fftSmooth;
float[] fftPrev;
float[] fftCurr;
int specSize;

WindowFunction[] window = {FFT.NONE,FFT.HAMMING,FFT.HANN,FFT.COSINE,FFT.TRIANGULAR,FFT.BARTLETT,FFT.BARTLETTHANN,FFT.LANCZOS,FFT.BLACKMAN,FFT.GAUSS};
String[] wlabel = {"NONE","HAMMING","HANN","COSINE","TRIANGULAR","BARTLETT","BARTLETTHANN","LANCZOS","BLACKMAN","GAUSS"};
int windex = 0;

int scale = 10;

void setup(){
  minim = new Minim(this);
  in = minim.getLineIn(Minim.STEREO, 512);
  fft = new FFT(in.bufferSize(), in.sampleRate());
  fft.window(window[windex]);
  specSize = fft.specSize();
  fftSmooth = new float[specSize];
  fftPrev   = new float[specSize];
  fftCurr   = new float[specSize];
  size(specSize, specSize/2);
  colorMode(HSB,specSize,100,100);
}

void draw(){
  background(0);
  stroke(255);

  fft.forward( in.mix);
  fftReal = fft.getSpectrumReal();
  fftImag = fft.getSpectrumImaginary();
  for(int i = 0; i < specSize; i++)
  {
    //float band = fft.getBand(i);
    //Sw = abs(Sw(1:(1+N/2))); %# abs is sqrt(real^2 + imag^2)
    //float abs = sqrt(fftReal[i]*fftReal[i] + fftImag[i]*fftImag[i]);
    //fftSmooth[i] *= smoothing;
    //if(fftSmooth[i] < abs) fftSmooth[i] = abs;

    //x_dB = 10 * log10(real(x) ^ 2 + imag(x) ^ 2);
    fftCurr[i] = scale * (float)Math.log10(fftReal[i]*fftReal[i] + fftImag[i]*fftImag[i]);
    //Y[k] = alpha * Y_(t-1)[k] + (1 - alpha) * X[k]
    fftSmooth[i] = smoothing * fftPrev[i] + ((1 - smoothing) * fftCurr[i]);

    fftPrev[i] = fftCurr[i];//

    stroke(i,100,100);
    line( i, height, i, height - fftSmooth[i]);

  }
  text("smoothing: " + (int)(smoothing*100)+"\nwindow:"+wlabel[windex]+"\nscale:"+scale,10,10);
}
void keyPressed(){
  float inc = 0.01;
  if(keyCode == UP && smoothing < 1-inc) smoothing += inc;
  if(keyCode == DOWN && smoothing > inc) smoothing -= inc;
  if(key == 'W' && windex < window.length-1) windex++;
  if(key == 'w' && windex > 0) windex--;
  if(key == 'w' || key == 'W') fft.window(window[windex]);
  if(keyCode == LEFT && scale > 1) scale--;
  if(keyCode == RIGHT) scale++;
}

I'm not sure I've applied the hints as intended. Here's how my output looks:

but I don't think I'm there yet if I compare this with visualisations I'm aiming for:

spectrum in windows media player

spectrum in VLC player

I'm not sure I've applied the log scale correctly. My assumptions was, that I would a plot similar to what I'm aiming for after using log10 (ignoring smoothing for now).

UPDATE 4:

import ddf.minim.analysis.*;
import ddf.minim.*;

Minim       minim;
AudioInput  in;
FFT         fft;

float smoothing = 0;
float[] fftReal;
float[] fftImag;
float[] fftSmooth;
float[] fftPrev;
float[] fftCurr;
int specSize;

WindowFunction[] window = {FFT.NONE,FFT.HAMMING,FFT.HANN,FFT.COSINE,FFT.TRIANGULAR,FFT.BARTLETT,FFT.BARTLETTHANN,FFT.LANCZOS,FFT.BLACKMAN,FFT.GAUSS};
String[] wlabel = {"NONE","HAMMING","HANN","COSINE","TRIANGULAR","BARTLETT","BARTLETTHANN","LANCZOS","BLACKMAN","GAUSS"};
int windex = 0;

int scale = 10;

void setup(){
  minim = new Minim(this);
  in = minim.getLineIn(Minim.STEREO, 512);
  fft = new FFT(in.bufferSize(), in.sampleRate());
  fft.window(window[windex]);
  specSize = fft.specSize();
  fftSmooth = new float[specSize];
  fftPrev   = new float[specSize];
  fftCurr   = new float[specSize];
  size(specSize, specSize/2);
  colorMode(HSB,specSize,100,100);
}

void draw(){
  background(0);
  stroke(255);

  fft.forward( in.mix);
  fftReal = fft.getSpectrumReal();
  fftImag = fft.getSpectrumImaginary();
  for(int i = 0; i < specSize; i++)
  {    
    float maxVal = Math.max(Math.abs(fftReal[i]), Math.abs(fftImag[i]));
    if (maxVal != 0.0f) { // prevent divide-by-zero
        // Normalize
        fftReal[i] = fftReal[i] / maxVal;
        fftImag[i] = fftImag[i] / maxVal;
    }

    fftCurr[i] = -scale * (float)Math.log10(fftReal[i]*fftReal[i] + fftImag[i]*fftImag[i]);
    fftSmooth[i] = smoothing * fftSmooth[i] + ((1 - smoothing) * fftCurr[i]);

    stroke(i,100,100);
    line( i, height/2, i, height/2 - (mousePressed ? fftSmooth[i] : fftCurr[i]));

  }
  text("smoothing: " + (int)(smoothing*100)+"\nwindow:"+wlabel[windex]+"\nscale:"+scale,10,10);
}
void keyPressed(){
  float inc = 0.01;
  if(keyCode == UP && smoothing < 1-inc) smoothing += inc;
  if(keyCode == DOWN && smoothing > inc) smoothing -= inc;
  if(key == 'W' && windex < window.length-1) windex++;
  if(key == 'w' && windex > 0) windex--;
  if(key == 'w' || key == 'W') fft.window(window[windex]);
  if(keyCode == LEFT && scale > 1) scale--;
  if(keyCode == RIGHT) scale++;
}

produces this:

In the draw loop I'm drawing from the centre since scale is now negative. If I scale the values up the result starts to look random.

UPDATE6

import ddf.minim.analysis.*;
import ddf.minim.*;

Minim       minim;
AudioInput  in;
FFT         fft;

float smoothing = 0;
float[] fftReal;
float[] fftImag;
float[] fftSmooth;
float[] fftPrev;
float[] fftCurr;
int specSize;

WindowFunction[] window = {FFT.NONE,FFT.HAMMING,FFT.HANN,FFT.COSINE,FFT.TRIANGULAR,FFT.BARTLETT,FFT.BARTLETTHANN,FFT.LANCZOS,FFT.BLACKMAN,FFT.GAUSS};
String[] wlabel = {"NONE","HAMMING","HANN","COSINE","TRIANGULAR","BARTLETT","BARTLETTHANN","LANCZOS","BLACKMAN","GAUSS"};
int windex = 0;

int scale = 10;

void setup(){
  minim = new Minim(this);
  in = minim.getLineIn(Minim.STEREO, 512);
  fft = new FFT(in.bufferSize(), in.sampleRate());
  fft.window(window[windex]);
  specSize = fft.specSize();
  fftSmooth = new float[specSize];
  fftPrev   = new float[specSize];
  fftCurr   = new float[specSize];
  size(specSize, specSize/2);
  colorMode(HSB,specSize,100,100);
}

void draw(){
  background(0);
  stroke(255);

  fft.forward( in.mix);
  fftReal = fft.getSpectrumReal();
  fftImag = fft.getSpectrumImaginary();
  for(int i = 0; i < specSize; i++)
  {
    fftCurr[i] = scale * (float)Math.log10(fftReal[i]*fftReal[i] + fftImag[i]*fftImag[i]);
    fftSmooth[i] = smoothing * fftSmooth[i] + ((1 - smoothing) * fftCurr[i]);

    stroke(i,100,100);
    line( i, height/2, i, height/2 - (mousePressed ? fftSmooth[i] : fftCurr[i]));

  }
  text("smoothing: " + (int)(smoothing*100)+"\nwindow:"+wlabel[windex]+"\nscale:"+scale,10,10);
}
void keyPressed(){
  float inc = 0.01;
  if(keyCode == UP && smoothing < 1-inc) smoothing += inc;
  if(keyCode == DOWN && smoothing > inc) smoothing -= inc;
  if(key == 'W' && windex < window.length-1) windex++;
  if(key == 'w' && windex > 0) windex--;
  if(key == 'w' || key == 'W') fft.window(window[windex]);
  if(keyCode == LEFT && scale > 1) scale--;
  if(keyCode == RIGHT) scale++;
  if(key == 's') saveFrame("fftmod.png");
}

This feels so close:

This looks much better than the previous version, but some values on the lower/left side of the spectrum look a bit off and the shape seems to change very fast. (smoothed values plot zeroes)

回答1:

I'm a little unclear on exactly what kind of smoothing you want to do, but I will try to provide some information that might help you.

Scaling FFT results for display

Generally, when you take the Fourier transform and you want to display a graph of it, you need (as you mention) to scale it logarithmically. This is because the magnitude of the values will vary over a huge range - many orders of magnitude - and compressing this into the small space observable on a graph will cause the main peaks to dwarf the rest of the information.

To actually do this scaling, we convert the values to decibels. It is important to note that decibels is a scale and not a unit - it represents a ratio between two numbers: usually a measured value and some reference. The general formula for decibels is

x_dB = 10 * log10((x ^ 2) / (ref ^ 2))

where log10 is logarithm to base 10, ^ is the power operator, and x_ref is your chosen reference value. Since FFT'd values from an audio file don't (usually) have any meaningful units,x_ref is commonly chosen to be simply 1 for this application. Further, since x is complex, you need to take the absolute value. So the formula will be

x_dB = 10 * log10(abs(x) ^ 2)

There is a small (numerical and speed) optimisation possible here, since you're squaring the result of a square-root:

x_dB = 10 * log10(real(x) ^ 2 + imag(x) ^ 2)

Perceptual weighting

Scaling of frequency-domain measurements is commonly done when measuring sound pressure and power levels: a specific measurement type is chosen for the given application (I won't go into the types here), and a recording of sound is made according to this measurement type. The result is FFT'd and then multiplied by a given weighting at each frequency depending on what the result will be used for and what type of sound has been recorded. There are two weightings in common use: A, and C. C is generally used only for extremely high amplitude sounds.

Note that this kind of weighting is not really necessary if you just want to display a nice-looking graph: it is used to make sure everyone in the world can make measurements (and measurement equipment) that follow the same standard. If you do decide to include this, it must be performed as a multiplication before conversion to decibels (or as an addition of the decibel value of the weighting - which is mathematically equivalent).

Info on the A-weighting is on wikipedia.

Windowing

Windowing is performed primarily to reduce the effect of the Gibbs phenomenon. We can never get rid of it completely but windowing does help. Unfortunately it has other effects: sharp peaks are broadened and "side-lobes" introduced; there is always a compromise between peak sharpness and side-lobe height. I am not going to go into all the details here unless you specifically ask for it; there is a fairly lengthy explanation of windowing in this free online book.

Time-domain smoothing of individual frequency bins

As for making the line in each frequency bin decay slowly, here's a simple idea that might do the trick: in each frequency bin, apply a simple exponential moving average. Say your FFT results are stored in X[k], where k is the frequency index. Let your display value be Y[k] such that

Y[k] = alpha * Y_(t-1)[k] + (1 - alpha) * X[k]

where 0 < alpha < 1 is your smoothing factor, and Y_(t-1)[k] is the value of Y[k] at the last time step (t-1). This is actually a simple low-pass IIR (infinite impulse response) filter, and hopefully should do basically what you want (perhaps with a little tweaking). The closer alpha is to zero, the more quickly new observations (input X[k]) will affect the result. The closer it is to one, the more slowly the result will decay, but the input will also affect the result more slowly, so it may appear "sluggish". You may want to add a conditional around it to take the new value immediately if it's higher than the current value.

Note that, again, this should be performed prior to conversion to decibels.

(edit) Having looked at the code you posted a little more clearly, this does appear to be the method used in the example you're trying to reproduce. Your initial attempt was close, but note that the first term is the smoothing coefficient multiplied by the last display value, not the current input.

(edit 2) Your third update is, again, close, but there is a slight mistranslation of the formula in the following lines

fftSmooth[i] = smoothing * fftPrev[i] + ((1 - smoothing) * fftCurr[i]);

fftPrev[i] = fftCurr[i];//

Instead of the previous value of the FFT coefficients before smoothing, you want to take the value after smoothing. (note that this means you don't actually need another array to store the previous value)

fftSmooth[i] = smoothing * fftSmooth[i] + ((1 - smoothing) * fftCurr[i]);

If smoothing == 0, this line should have little effect other than to multiply the result by a scalar.

Normalization in the absolute value computation

Looking more closely at the way they compute the absolute value, they have a normalization in there, so that whichever of the two complex values is the maximum, becomes 1, and the other is scaled accordingly. This means you will always get an absolute value between 0 and 1, and is probably their alternative to decibel conversion. Really, this is not quite what the documentation of their abs function suggests, which is a little annoying... but anyway, if you replicate this it will guarantee that your values are always in a sensible range.

To do this simply in your code, you could do something like

float maxVal = Math.max(Math.abs(fftReal[i]), Math.abs(fftImag[i]));
if (maxVal != 0.0f) { // prevent divide-by-zero
    // Normalize
    fftReal[i] = fftReal[i] / maxVal;
    fftImag[i] = fftImag[i] / maxVal;
}

fftCurr[i] = scale * (float)Math.log10(fftReal[i]*fftReal[i] + fftImag[i]*fftImag[i]);
// ...

Putting it all together: Some code

Having messed around with it for a while in Processing 2.1, I have a solution that I believe you will be happy with:

import ddf.minim.analysis.*;
import ddf.minim.*;

Minim       minim;
//AudioInput  in;
AudioPlayer in;
FFT         fft;

float smoothing = 0.60;
final boolean useDB = true;
final int minBandwidthPerOctave = 200;
final int bandsPerOctave = 10;
float[] fftSmooth;
int avgSize;

float minVal = 0.0;
float maxVal = 0.0;
boolean firstMinDone = false;

void setup(){
  minim = new Minim(this);
  //in = minim.getLineIn(Minim.STEREO, 512);
  in = minim.loadFile("C:\\path\\to\\some\\audio\\file.ext", 2048);

  in.loop();

  fft = new FFT(in.bufferSize(), in.sampleRate());

  // Use logarithmically-spaced averaging
  fft.logAverages(minBandwidthPerOctave, bandsPerOctave);

  avgSize = fft.avgSize();
  fftSmooth = new float[avgSize];

  int myWidth = 500;
  int myHeight = 250;
  size(myWidth, myHeight);
  colorMode(HSB,avgSize,100,100);

}

float dB(float x) {
  if (x == 0) {
    return 0;
  }
  else {
    return 10 * (float)Math.log10(x);
  }
}

void draw(){
  background(0);
  stroke(255);

  fft.forward( in.mix);

  final int weight = width / avgSize;
  final float maxHeight = (height / 2) * 0.75;

  for (int i = 0; i < avgSize; i++) {
    // Get spectrum value (using dB conversion or not, as desired)
    float fftCurr;
    if (useDB) {
      fftCurr = dB(fft.getAvg(i));
    }
    else {
      fftCurr = fft.getAvg(i);
    }

    // Smooth using exponential moving average
    fftSmooth[i] = (smoothing) * fftSmooth[i] + ((1 - smoothing) * fftCurr);

    // Find max and min values ever displayed across whole spectrum
    if (fftSmooth[i] > maxVal) {
      maxVal = fftSmooth[i];
    }
    if (!firstMinDone || (fftSmooth[i] < minVal)) {
      minVal = fftSmooth[i];
    }
  }

  // Calculate the total range of smoothed spectrum; this will be used to scale all values to range 0...1
  final float range = maxVal - minVal;
  final float scaleFactor = range + 0.00001; // avoid div. by zero

  for(int i = 0; i < avgSize; i++)
  {
    stroke(i,100,100);
    strokeWeight(weight);

    // Y-coord of display line; fftSmooth is scaled to range 0...1; this is then multiplied by maxHeight
    // to make it within display port range
    float fftSmoothDisplay = maxHeight * ((fftSmooth[i] - minVal) / scaleFactor);

    // X-coord of display line
    float x = i * weight;

    line(x, height / 2, x, height / 2 - fftSmoothDisplay);
  }
  text("smoothing: " + (int)(smoothing*100)+"\n",10,10);
}
void keyPressed(){
  float inc = 0.01;
  if(keyCode == UP && smoothing < 1-inc) smoothing += inc;
  if(keyCode == DOWN && smoothing > inc) smoothing -= inc;
}

The above uses a slightly different approach - averaging the spectrum in a series of bins that is smaller than the total spectrum size - that produces a result closer to WMP's than your original.

Enhancement: Now with A-weighting

I have an updated version of the code that applies the A-weighting in each frequency band (though only when dB mode is on, because the table I had was in dB :). Turn A-weighting on for a result closer to WMP's, or off for one closer to VLC's.

There are also some minor tweaks to the way it is displayed: it is now centred in the display and it will display only up to a maximum band centre frequency.

Here's the code - enjoy!

import ddf.minim.analysis.*;
import ddf.minim.*;

Minim       minim;
//AudioInput  in;
AudioPlayer in;
FFT         fft;

float smoothing = 0.73;
final boolean useDB = true;
final boolean useAWeighting = true; // only used in dB mode, because the table I found was in dB 
final boolean resetBoundsAtEachStep = false;
final float maxViewportUsage = 0.85;
final int minBandwidthPerOctave = 200;
final int bandsPerOctave = 10;
final float maxCentreFrequency = 18000;
float[] fftSmooth;
int avgSize;

float minVal = 0.0;
float maxVal = 0.0;
boolean firstMinDone = false;

final float[] aWeightFrequency = { 
  10, 12.5, 16, 20, 
  25, 31.5, 40, 50, 
  63, 80, 100, 125, 
  160, 200, 250, 315, 
  400, 500, 630, 800, 
  1000, 1250, 1600, 2000, 
  2500, 3150, 4000, 5000,
  6300, 8000, 10000, 12500, 
  16000, 20000 
};

final float[] aWeightDecibels = {
  -70.4, -63.4, -56.7, -50.5, 
  -44.7, -39.4, -34.6, -30.2, 
  -26.2, -22.5, -19.1, -16.1, 
  -13.4, -10.9, -8.6, -6.6, 
  -4.8, -3.2, -1.9, -0.8, 
  0.0, 0.6, 1.0, 1.2, 
  1.3, 1.2, 1.0, 0.5, 
  -0.1, -1.1, -2.5, -4.3, 
  -6.6, -9.3 
};

float[] aWeightDBAtBandCentreFreqs;

void setup(){
  minim = new Minim(this);
  //in = minim.getLineIn(Minim.STEREO, 512);
  in = minim.loadFile("D:\\Music\\Arthur Brown\\The Crazy World Of Arthur Brown\\1-09 Fire.mp3", 2048);

  in.loop();

  fft = new FFT(in.bufferSize(), in.sampleRate());

  // Use logarithmically-spaced averaging
  fft.logAverages(minBandwidthPerOctave, bandsPerOctave);
  aWeightDBAtBandCentreFreqs = calculateAWeightingDBForFFTAverages(fft);

  avgSize = fft.avgSize();
  // Only use freqs up to maxCentreFrequency - ones above this may have
  // values too small that will skew our range calculation for all time
  while (fft.getAverageCenterFrequency(avgSize-1) > maxCentreFrequency) {
    avgSize--;
  }

  fftSmooth = new float[avgSize];

  int myWidth = 500;
  int myHeight = 250;
  size(myWidth, myHeight);
  colorMode(HSB,avgSize,100,100);

}

float[] calculateAWeightingDBForFFTAverages(FFT fft) {
  float[] result = new float[fft.avgSize()];
  for (int i = 0; i < result.length; i++) {
    result[i] = calculateAWeightingDBAtFrequency(fft.getAverageCenterFrequency(i));
  }
  return result;    
}

float calculateAWeightingDBAtFrequency(float frequency) {
  return linterp(aWeightFrequency, aWeightDecibels, frequency);    
}

float dB(float x) {
  if (x == 0) {
    return 0;
  }
  else {
    return 10 * (float)Math.log10(x);
  }
}

float linterp(float[] x, float[] y, float xx) {
  assert(x.length > 1);
  assert(x.length == y.length);

  float result = 0.0;
  boolean found = false;

  if (x[0] > xx) {
    result = y[0];
    found = true;
  }

  if (!found) {
    for (int i = 1; i < x.length; i++) {
      if (x[i] > xx) {
        result = y[i-1] + ((xx - x[i-1]) / (x[i] - x[i-1])) * (y[i] - y[i-1]);
        found = true;
        break;
      }
    }
  }

  if (!found) {
    result = y[y.length-1];
  }

  return result;     
}

void draw(){
  background(0);
  stroke(255);

  fft.forward( in.mix);

  final int weight = width / avgSize;
  final float maxHeight = height * maxViewportUsage;
  final float xOffset = weight / 2 + (width - avgSize * weight) / 2;

  if (resetBoundsAtEachStep) {
    minVal = 0.0;
    maxVal = 0.0;
    firstMinDone = false;
  }

  for (int i = 0; i < avgSize; i++) {
    // Get spectrum value (using dB conversion or not, as desired)
    float fftCurr;
    if (useDB) {
      fftCurr = dB(fft.getAvg(i));
      if (useAWeighting) {
        fftCurr += aWeightDBAtBandCentreFreqs[i];
      }
    }
    else {
      fftCurr = fft.getAvg(i);
    }

    // Smooth using exponential moving average
    fftSmooth[i] = (smoothing) * fftSmooth[i] + ((1 - smoothing) * fftCurr);

    // Find max and min values ever displayed across whole spectrum
    if (fftSmooth[i] > maxVal) {
      maxVal = fftSmooth[i];
    }
    if (!firstMinDone || (fftSmooth[i] < minVal)) {
      minVal = fftSmooth[i];
    }
  }

  // Calculate the total range of smoothed spectrum; this will be used to scale all values to range 0...1
  final float range = maxVal - minVal;
  final float scaleFactor = range + 0.00001; // avoid div. by zero

  for(int i = 0; i < avgSize; i++)
  {
    stroke(i,100,100);
    strokeWeight(weight);

    // Y-coord of display line; fftSmooth is scaled to range 0...1; this is then multiplied by maxHeight
    // to make it within display port range
    float fftSmoothDisplay = maxHeight * ((fftSmooth[i] - minVal) / scaleFactor);
    // Artificially impose a minimum of zero (this is mathematically bogus, but whatever)
    fftSmoothDisplay = max(0.0, fftSmoothDisplay);

    // X-coord of display line
    float x = xOffset + i * weight;

    line(x, height, x, height - fftSmoothDisplay);
  }
  text("smoothing: " + (int)(smoothing*100)+"\n",10,10);
}
void keyPressed(){
  float inc = 0.01;
  if(keyCode == UP && smoothing < 1-inc) smoothing += inc;
  if(keyCode == DOWN && smoothing > inc) smoothing -= inc;
}

回答2:

Inside your loop: you need to add a logarithmic computation for a lg scale:

stroke(i,100,50);
line( i, height, i, height - fftSmooth[i]*8 );
stroke(i,100,100);
line( i, height, i, height - band*8 );

Should be changed to:

int l = map(log(map(i ,0 ,specSize,0,100),0,2,0,width).  // an estimation, may have to calibrate
stroke(i,100,50);
line( l, height, l, height - fftSmooth[i]*8 );
stroke(i,100,100);
line( l, height, l, height - band*8 );

来源：https://stackoverflow.com/questions/20408388/how-to-filter-fft-data-for-audio-visualisation

标签

java

audio

fft

visualization

processing