Librosa melspectrogram times don't match actual times in audio file

Submitted by 时间秒杀一切 on 2020-01-25 07:20:10

Question


I'm trying to calculate MFCC coefficients using librosa.feature, but when I plot them with specshow, the times on the specshow graph don't match the actual times in my audio file.

I tried the code from the librosa docs, https://librosa.github.io/librosa/generated/librosa.feature.mfcc.html, which computes MFCCs from a pre-computed log-power Mel spectrogram:

import librosa
import librosa.display

WINDOW_HOP = 0.01       # [sec]
WINDOW_SIZE = 0.025     # [sec]

y, fs = librosa.load('audio_dataset/0f39OWEqJ24.wav', sr=None) # fs is 22000

# according to WINDOW_SIZE and fs, win_length is 550, and hop_length is 220
mel_specgram = librosa.feature.melspectrogram(y=y[:550], sr=fs, n_mels=20, hop_length=int(WINDOW_HOP * fs), win_length=int(WINDOW_SIZE * fs))

mfcc_s = librosa.feature.mfcc(S=librosa.power_to_db(mel_specgram), n_mfcc=12)

librosa.display.specshow(mfcc_s, x_axis='s')

Now look at the time scale in the specshow image: the second frame (window) should start at sample 220, which is 10 ms, but it doesn't.


Answer 1:


You should specify the sample rate when calling specshow or librosa.feature.mfcc; otherwise 22050 Hz is assumed. Also tell specshow which hop length you used; otherwise it falls back to its default of 512 samples:

[...]
hop_length = int(WINDOW_HOP * fs)
mel_specgram = librosa.feature.melspectrogram(y=y[:550], sr=fs,
    n_mels=20, hop_length=hop_length,
    win_length=int(WINDOW_SIZE * fs))

mfcc_s = librosa.feature.mfcc(S=librosa.power_to_db(mel_specgram), n_mfcc=12, sr=fs)

librosa.display.specshow(mfcc_s, x_axis='s', sr=fs, hop_length=hop_length)

These details are essential for a correct time axis and are not contained in mfcc_s, which is only an array of coefficients.
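To see why the axis was off, it may help to work out the frame times by hand. The sketch below is plain Python and does not call librosa; frame_times is a hypothetical helper introduced only for illustration. It assumes each frame is labeled with the time index * hop_length / sr, which is the same conversion librosa.frames_to_time performs.

```python
def frame_times(n_frames, sr, hop_length):
    """Time in seconds assigned to each frame: index * hop_length / sr."""
    return [i * hop_length / sr for i in range(n_frames)]

fs = 22000        # sample rate of the file in the question
hop_length = 220  # 10 ms hop at 22000 Hz

# What the axis shows when sr and hop_length are left at the defaults
assumed = frame_times(3, sr=22050, hop_length=512)

# What the axis shows once the real values are passed in
actual = frame_times(3, sr=fs, hop_length=hop_length)

print("defaults:", assumed)
print("actual:  ", actual)
```

With the defaults the second frame is labeled at roughly 23 ms instead of 10 ms, which is the mismatch described in the question.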



Source: https://stackoverflow.com/questions/58354334/librosa-melspectrogram-times-dont-match-actual-times-in-audio-file
