Creating a spectrogram from a .wav file using FFT in Java

旧时难觅i 2021-02-04 10:58

After researching and a lot of trial and error, I have reached a point where I can construct a spectrogram, which I think has elements of both rights and wrongs.

1 Answer
  •  遇见更好的自我
    2021-02-04 11:48

    Fortunately it seems you have more rights than wrongs.

    The first and main issue, which causes the extra red lines, is how you decode the data in readWAV2Array.getByteArray. Since each stereo sample frame spans 4 bytes (a 16-bit left sample followed by a 16-bit right sample), you must index in multiples of 4 (e.g. bytes 0,1,2,3 for sample 0, bytes 4,5,6,7 for sample 1); otherwise you read overlapping blocks of 4 bytes (e.g. bytes 0,1,2,3 for sample 0, bytes 1,2,3,4 for sample 1).

    The other issue with this conversion is the sign. After masking each byte with & 0xff, the combined 16-bit value is an unsigned int in the range 0 to 65535, so you must explicitly cast it to short before assigning it to left and right (which are of type double) to get a signed 16-bit result. This gives a conversion loop like:

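    // Each 4-byte frame holds one stereo sample: L-low, L-high, R-low, R-high
    for (int i = 0; 4*i+3 < totalLength; i++){
      // Combine the two bytes (little-endian), then cast to short so the
      // 16-bit value is sign-extended
      left = (short)(((data_raw[4*i+1] & 0xff) << 8) | (data_raw[4*i] & 0xff));
      right = (short)(((data_raw[4*i+3] & 0xff) << 8) | (data_raw[4*i+2] & 0xff));
      // Average the two channels into one mono sample
      data_mono[i] = (left+right)/2.0;
    }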
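
    For reference, here is a minimal self-contained sketch of the same conversion. It assumes the byte array holds only the PCM sample bytes (header already stripped) in the usual 16-bit little-endian interleaved stereo layout. The names PcmDecode, stereo16ToMonoManual and stereo16ToMonoNio are hypothetical; the second method is an alternative using java.nio.ByteBuffer, which handles the byte order and sign extension for you.

    ```java
    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;

    // Sketch (hypothetical names): interleaved 16-bit little-endian stereo PCM
    // (L-low, L-high, R-low, R-high per 4-byte frame) to mono by averaging.
    final class PcmDecode {

        // Mirrors the loop above: one 4-byte frame per output sample.
        static double[] stereo16ToMonoManual(byte[] raw) {
            int frames = raw.length / 4; // a trailing partial frame is ignored
            double[] mono = new double[frames];
            for (int i = 0; i < frames; i++) {
                short left = (short) (((raw[4 * i + 1] & 0xff) << 8) | (raw[4 * i] & 0xff));
                short right = (short) (((raw[4 * i + 3] & 0xff) << 8) | (raw[4 * i + 2] & 0xff));
                mono[i] = (left + right) / 2.0;
            }
            return mono;
        }

        // Same conversion with java.nio: getShort() reads 2 bytes in the
        // buffer's byte order and returns a signed value.
        static double[] stereo16ToMonoNio(byte[] raw) {
            ByteBuffer buf = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
            int frames = raw.length / 4;
            double[] mono = new double[frames];
            for (int i = 0; i < frames; i++) {
                short left = buf.getShort();
                short right = buf.getShort();
                mono[i] = (left + right) / 2.0;
            }
            return mono;
        }

        public static void main(String[] args) {
            // One frame: left = -2 (0xFFFE), right = 258 (0x0102)
            byte[] raw = { (byte) 0xFE, (byte) 0xFF, 0x02, 0x01 };
            System.out.println(stereo16ToMonoManual(raw)[0]); // prints 128.0
            System.out.println(stereo16ToMonoNio(raw)[0]);    // prints 128.0
        }
    }
    ```

    Both methods step through whole frames, so the overlapping-read problem cannot occur.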
    

    At this point you should start to get a plot that has strong lines representing your 20Hz-20kHz chirp:

    But you should notice that you actually get two lines. This is because the frequency spectrum of a real-valued signal has Hermitian symmetry. The magnitude of the spectrum above the Nyquist frequency (half the sampling rate, in this case 44100 Hz / 2 = 22050 Hz) is thus a redundant mirror image of the spectrum below it. You can plot only the non-redundant part, up to the Nyquist frequency, by changing the definition of nY in main to:

    int nY = WS/2 + 1;
    

    which gives you:

    This is almost what we're looking for, but the sweep with increasing frequency produces a line that decreases. That's because your indexing puts 0 Hz at index 0, which is the top of the figure, and 22050 Hz at index nY-1, which is the bottom. To flip the figure and get the more usual layout, with 0 Hz at the bottom and 22050 Hz at the top, change the indexing to:

    plotData[i][nY-j-1] = 10 * Math.log10(amp_square);
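
    To see the two index changes (nY = WS/2 + 1 and the flipped row) in isolation, here is a small sketch for a single frame of WS real samples. It uses a naive DFT, not the FFT from the question, only to stay self-contained, and the names SpectrogramColumn, binPower and fillColumn are hypothetical. For a real frame, X[WS-k] is the complex conjugate of X[k], so bins above WS/2 repeat the power of the bins below; only WS/2 + 1 rows are kept, and bin j goes to row nY-j-1 so that 0 Hz lands in the last row. Values are left as linear power rather than dB to keep it short.

    ```java
    // Sketch (hypothetical names): one spectrogram column from a frame of WS
    // real samples. A naive O(WS^2) DFT stands in for the question's FFT.
    final class SpectrogramColumn {

        // Squared magnitude |X[k]|^2 of DFT bin k for the real-valued frame x.
        static double binPower(double[] x, int k) {
            int n = x.length;
            double re = 0.0, im = 0.0;
            for (int t = 0; t < n; t++) {
                double phase = -2.0 * Math.PI * k * t / n;
                re += x[t] * Math.cos(phase);
                im += x[t] * Math.sin(phase);
            }
            return re * re + im * im;
        }

        // Stores bins 0..WS/2 in column i, flipped so that bin 0 (0 Hz) goes to
        // the last row (bottom of the image) and bin WS/2 (Nyquist) to row 0 (top).
        static void fillColumn(double[][] plotData, int i, double[] frame) {
            int nY = frame.length / 2 + 1; // non-redundant bins of a real signal
            for (int j = 0; j < nY; j++) {
                plotData[i][nY - j - 1] = binPower(frame, j);
            }
        }

        public static void main(String[] args) {
            int ws = 16;
            double[] frame = new double[ws];
            for (int t = 0; t < ws; t++) {
                frame[t] = Math.cos(2.0 * Math.PI * 2 * t / ws); // tone at bin 2
            }
            // Hermitian symmetry: bin k and bin WS-k have the same power
            System.out.println(binPower(frame, 2) + " vs " + binPower(frame, ws - 2));

            double[][] plotData = new double[1][ws / 2 + 1];
            fillColumn(plotData, 0, frame);
            int peakRow = 0;
            for (int r = 1; r < plotData[0].length; r++) {
                if (plotData[0][r] > plotData[0][peakRow]) peakRow = r;
            }
            System.out.println("peak row = " + peakRow); // prints "peak row = 6"
        }
    }
    ```

    With nY = 9, the bin-2 tone ends up in row 9 - 2 - 1 = 6, i.e. near the bottom, as expected for a low frequency.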
    

    Now you should have a plot which looks like the one you were expecting (although with a different color map):

    A final note: while I understand your intention to avoid taking the log of 0 in your conversion to decibels, setting the output to the linear-scale amplitude in that specific case could produce unexpected results. Instead, I would clamp the values at a cutoff threshold:

    // select threshold based on the expected spectrum amplitudes
    // e.g. 80dB below your signal's spectrum peak amplitude
    double threshold = 1.0;
    // limit values and convert to dB
    plotData[i][nY-j-1] = 10 * Math.log10(Math.max(amp_square,threshold));
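
    If you would rather tie the threshold to the data, as the 80 dB comment suggests, one option is to derive it from the spectrogram's peak. This sketch assumes the stored values are squared amplitudes (power, like amp_square), so a range of R dB corresponds to a factor of 10^(-R/10); the class and method names are hypothetical.

    ```java
    import java.util.Arrays;

    // Sketch (hypothetical names): dB conversion with a floor placed a fixed
    // number of dB below the spectrogram's peak instead of a fixed constant.
    final class DbScale {

        // 10*log10 of a power value, clamped at `floor` so log10(0) never occurs.
        static double powerToDb(double power, double floor) {
            return 10.0 * Math.log10(Math.max(power, floor));
        }

        // Floor that sits `rangeDb` below the largest power value. Since the
        // values are squared amplitudes, 80 dB below the peak is peak * 1e-8.
        static double floorBelowPeak(double[][] power, double rangeDb) {
            double peak = 0.0;
            for (double[] column : power) {
                for (double p : column) {
                    peak = Math.max(peak, p);
                }
            }
            // All-silent input: fall back to the answer's example threshold of 1.0
            return peak > 0.0 ? peak * Math.pow(10.0, -rangeDb / 10.0) : 1.0;
        }

        // Converts a whole [time][frequency] power array to dB in a new array.
        static double[][] spectrogramToDb(double[][] power, double rangeDb) {
            double floor = floorBelowPeak(power, rangeDb);
            double[][] db = new double[power.length][];
            for (int i = 0; i < power.length; i++) {
                db[i] = new double[power[i].length];
                for (int j = 0; j < power[i].length; j++) {
                    db[i][j] = powerToDb(power[i][j], floor);
                }
            }
            return db;
        }

        public static void main(String[] args) {
            double[][] power = { { 1e4, 0.0 }, { 1.0, 1e-6 } };
            // Roughly [[40, -40], [0, -40]]: the floor is 1e4 * 1e-8 = 1e-4
            System.out.println(Arrays.deepToString(spectrogramToDb(power, 80.0)));
        }
    }
    ```

    Computing the floor once for the whole spectrogram keeps the dB scale consistent across all columns, so the color map is not rescaled frame by frame.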
    
