Librosa melspectrogram times don't match actual times in audio file

Submitted by 时间秒杀一切 on 2020-01-25 07:20:10


I'm trying to calculate MFCCs using librosa.feature, but when I plot them with specshow, the times on the specshow plot don't match the actual times in my audio file.

I tried the example from the librosa docs that computes MFCCs from a pre-computed log-power Mel spectrogram:

import librosa
import librosa.display

WINDOW_HOP = 0.01       # [sec]
WINDOW_SIZE = 0.025     # [sec]

y, fs = librosa.load('audio_dataset/0f39OWEqJ24.wav', sr=None)  # sr=None keeps the native rate; here fs is 22000

# With fs = 22000: win_length = 550 samples (25 ms), hop_length = 220 samples (10 ms)
mel_specgram = librosa.feature.melspectrogram(y=y[:550], sr=fs, n_mels=20,
    hop_length=int(WINDOW_HOP * fs), win_length=int(WINDOW_SIZE * fs))

mfcc_s = librosa.feature.mfcc(S=librosa.power_to_db(mel_specgram), n_mfcc=12)

librosa.display.specshow(mfcc_s, x_axis='s')

Now look at the time axis of the specshow plot: the second frame (window) should start at sample 220, which is 10 ms, but it doesn't.


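You should pass the sample rate to specshow; otherwise it assumes 22050 Hz. (librosa.feature.mfcc also accepts sr, but it only uses it when it computes the Mel spectrogram from y itself, not when you pass S.) Also tell specshow which hop length you used; its default is 512 samples, so without it consecutive frames are drawn 512 / 22050 ≈ 23.2 ms apart instead of 10 ms: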

hop_length = int(WINDOW_HOP * fs)
mel_specgram = librosa.feature.melspectrogram(y=y[:550], sr=fs,
    n_mels=20, hop_length=hop_length,
    win_length=int(WINDOW_SIZE * fs))

mfcc_s = librosa.feature.mfcc(S=librosa.power_to_db(mel_specgram), n_mfcc=12, sr=fs)

librosa.display.specshow(mfcc_s, x_axis='s', sr=fs, hop_length=hop_length)

specshow needs both values to convert frame indices into seconds, and neither is stored in mfcc_s, which is just a NumPy array of coefficients.

