Why are 128 mel bands used in mel spectrograms?

Question

I am using the mel spectrogram function, which can be found here: Mel Spectrogram Librosa

I use it as follows:

signal = librosa.feature.melspectrogram(y=waveform, sr=sample_rate, n_fft=512, n_mels=128)

Why are 128 mel bands used? I understand that the mel filterbank is meant to simulate the "filterbank" in the human ear, which is why it resolves higher frequencies more coarsely than lower ones.
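To make the band-count question concrete, here is a minimal sketch of how mel band edges can be laid out for 64 and 128 bands and compared with the FFT bin spacing for n_fft=512. It is an illustration under assumptions, not librosa's implementation: it uses the HTK mel formula (librosa's default is the Slaney variant), the sample rate of 16000 Hz is assumed because the question does not state it, and the helper names are hypothetical.

```python
import math

def hz_to_mel(f_hz):
    # HTK-style mel formula (assumption: librosa defaults to the Slaney variant)
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_edges_hz(n_mels, sr, fmin=0.0, fmax=None):
    # n_mels triangular filters need n_mels + 2 edge points,
    # spaced evenly on the mel axis between fmin and fmax
    if fmax is None:
        fmax = sr / 2.0
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]

sample_rate = 16000  # assumed; not given in the question
n_fft = 512
bin_spacing_hz = sample_rate / n_fft  # distance between adjacent FFT bins

results = {}
for n_mels in (64, 128):
    edges = mel_edges_hz(n_mels, sample_rate)
    spacings = [b - a for a, b in zip(edges, edges[1:])]
    results[n_mels] = (edges, spacings)
    print(f"n_mels={n_mels}: narrowest edge spacing {min(spacings):.1f} Hz, "
          f"widest {max(spacings):.1f} Hz, FFT bin spacing {bin_spacing_hz} Hz")
```

The edge spacing grows with frequency, so the lowest bands are the narrowest. Comparing that spacing with the FFT bin spacing shows how much of the FFT's resolution a given n_mels can actually use.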

I am designing and implementing a Speech-to-Text system with Deep Learning. When I used n_mels=64, it didn't work at all; it only works with n_mels=128.

Could it be because I am normalizing it before feeding it to the network? I am using the librosa.util.normalize function, and it normalizes the mel spectrogram to between -1 and 1.
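For reference, a minimal sketch of the kind of peak normalization described above, written in plain Python so it runs without librosa. It assumes that the default behaviour of librosa.util.normalize is to divide each column (axis 0, one time frame of a bands-by-frames matrix) by its largest absolute value; the function name and the toy data are hypothetical.

```python
def peak_normalize_columns(matrix):
    """Scale each column so its largest absolute value is 1.

    Assumed to mirror librosa.util.normalize with its defaults
    (max-abs norm along axis 0). All-zero columns are left as zeros.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    # Largest absolute value in each column (one column = one time frame)
    peaks = [max(abs(matrix[r][c]) for r in range(n_rows)) for c in range(n_cols)]
    return [
        [matrix[r][c] / peaks[c] if peaks[c] > 0 else 0.0 for c in range(n_cols)]
        for r in range(n_rows)
    ]

# Toy "mel spectrogram": 3 bands x 2 frames, second frame 100x louder
toy = [
    [1.0, 100.0],
    [2.0, 200.0],
    [4.0, 400.0],
]
normalized = peak_normalize_columns(toy)
print(normalized)  # [[0.25, 0.25], [0.5, 0.5], [1.0, 1.0]]
```

Two things show up in this toy case. A power mel spectrogram is non-negative, so the result lies in [0, 1] rather than [-1, 1]. And because each frame is scaled separately, the loudness difference between frames is removed.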

I tried to find material that explains the reasoning, but the only paper I found was this one, in which between 128 and 512 mel bands are used: Comparison of Time-Frequency Representations for Environmental Sound Classification using Convolutional Neural Networks

Output examples when n_mels=128

Output examples when n_mels=64

Thanks.

Source: https://stackoverflow.com/questions/62623975/why-128-mel-bands-are-used-in-mel-spectrograms

Tags

speech-recognition

speech

mel

librosa
