Question
I am using the mel spectrogram function, which can be found here: Mel Spectrogram Librosa
I use it as follows:
signal = librosa.feature.melspectrogram(y=waveform, sr=sample_rate, n_fft=512, n_mels=128)
Why are 128 mel bands used? I understand that the mel filterbank is meant to simulate the "filterbank" in the human ear, which is why it resolves higher frequencies more coarsely.
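To make the band layout concrete, here is a minimal sketch of how the filter center frequencies move when n_mels changes. It is not librosa's code: it assumes the HTK-style mel formula (librosa's default is, as far as I know, the Slaney variant), and the sample rate of 16000 Hz is a hypothetical value, since my actual sample_rate is not shown above. The helper names are made up for illustration.

```python
import math

def hz_to_mel(f_hz):
    # HTK-style mel formula (an assumption; librosa's default differs slightly)
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_center_frequencies(n_mels, f_min, f_max):
    # n_mels triangular filters need n_mels + 2 equally spaced points on the
    # mel axis; the interior points are the filter centers.
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_mels + 1)
    return [mel_to_hz(m_min + step * i) for i in range(1, n_mels + 1)]

sample_rate = 16000                 # hypothetical value, not from the question
n_fft = 512                         # same n_fft as in the call above
bin_width = sample_rate / n_fft     # spacing of the FFT bins in Hz

for n_mels in (64, 128):
    centers = mel_center_frequencies(n_mels, 0.0, sample_rate / 2)
    lowest_gap = centers[1] - centers[0]      # spacing between the two lowest bands
    highest_gap = centers[-1] - centers[-2]   # spacing between the two highest bands
    print(f"n_mels={n_mels}: lowest gap {lowest_gap:.1f} Hz, "
          f"highest gap {highest_gap:.1f} Hz, FFT bin width {bin_width:.2f} Hz")
```

Under these assumptions the bands are packed tightly at low frequencies and spread out at high frequencies, and doubling n_mels roughly halves the spacing everywhere. Comparing the lowest gap with the FFT bin width shows whether the lowest filters are narrower than what n_fft=512 can resolve.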
I am designing and implementing a speech-to-text system with deep learning. When I used n_mels=64, it didn't work at all; it only works with n_mels=128.
Could it be because I am normalizing the spectrogram before feeding it to the network? I am using the librosa.util.normalize function, and it normalizes the mel spectrogram to between -1 and 1.
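For clarity about what the normalization does, here is a small pure-Python stand-in. It is not librosa's implementation; it mirrors what I understand the defaults of librosa.util.normalize to be (max-abs scaling along axis 0, so each frame is divided by its largest band value). The function name and the toy values are hypothetical.

```python
def normalize_columns(matrix):
    """Divide each column by its largest absolute value (max-abs scaling)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for c in range(n_cols):
        peak = max(abs(matrix[r][c]) for r in range(n_rows))
        for r in range(n_rows):
            # An all-zero column is left as zeros to avoid dividing by zero
            out[r][c] = matrix[r][c] / peak if peak > 0 else 0.0
    return out

# Toy "mel power spectrogram": rows are mel bands, columns are frames
toy = [
    [4.0, 0.5],
    [2.0, 1.0],
    [1.0, 0.25],
]
scaled = normalize_columns(toy)
for row in scaled:
    print(row)
```

Because each frame is scaled independently, the absolute level differences between frames are lost. For non-negative input such as a power spectrogram, the result lies in [0, 1] rather than [-1, 1]; negative values only appear if the input has them, for example after a dB conversion.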
I tried to find material that explains the reasoning, but the only paper I found was this one, which uses between 128 and 512 mel bands: Comparison of Time-Frequency Representations for Environmental Sound Classification using Convolutional Neural Networks
Output examples when n_mels=128:
Output examples when n_mels=64:
Thanks.
Source: https://stackoverflow.com/questions/62623975/why-128-mel-bands-are-used-in-mel-spectrograms