How do you determine the correct dimension of Mel Spectrogram Feature Extraction for NN

末鹿安然 submitted on 2021-02-11 12:26:34

Question


I am trying to implement Mel spectrogram feature extraction:

import librosa
import numpy as np

n_mels = 128

# Extract the Mel spectrogram for one audio file
def extract_features(file_name):
    try:
        audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
        mely = librosa.feature.melspectrogram(y=audio, sr=sample_rate, n_mels=n_mels)

    except Exception as e:
        print("Error encountered while parsing file: ", file_name)
        return None

    # Transposed shape is (n_frames, n_mels); n_frames depends on clip length
    return mely.T

It appears that I am implementing this feature extraction incorrectly: when I check the arrays, x_test has shape (353,) and x_train has shape (1408,). The data is not being parsed correctly and an error is raised.

ERROR BEGIN

    v = format % tuple(row) + newline
TypeError: only size-1 arrays can be converted to Python scalars
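A shape such as (1408,) is what NumPy reports when each element is itself an array of a different size. The sketch below reproduces that situation with random data standing in for the transposed spectrograms; the frame counts (44, 87, 130) are made-up values, and the assumption is that the audio clips differ in duration.

```python
import numpy as np

n_mels = 128
rng = np.random.default_rng(0)

# Stand-ins for mely.T from three clips of different duration (hypothetical
# frame counts): each array has shape (n_frames, n_mels).
features = [rng.random((n_frames, n_mels)) for n_frames in (44, 87, 130)]

# Arrays with different first dimensions cannot be stacked into one numeric
# array, so they can only be held as a 1-D array of Python objects.
ragged = np.empty(len(features), dtype=object)
for i, feat in enumerate(features):
    ragged[i] = feat

print(ragged.shape)     # (3,)  one entry per file, like (1408,) above
print(ragged.dtype)     # object
print(ragged[0].shape)  # (44, 128)
```

Each "row" of such an array is a whole 2-D matrix rather than a list of scalars, which would be consistent with the "only size-1 arrays can be converted to Python scalars" message when a row is formatted as text.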

When I modify the extract_features code to:

def extract_features(file_name):
    try:
        audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
        mely = librosa.feature.melspectrogram(y=audio, sr=sample_rate, n_mels=n_mels)
        melyscaled = np.mean(mely.T, axis=0)

    except Exception as e:
        print("Error encountered while parsing file: ", file_name)
        return None

    return melyscaled

The program works.

How can I get the correct dimensions from the original function without doing any scaling? What does np.mean do to the extracted features?
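To make the effect of np.mean concrete, here is a small sketch on random data shaped like a melspectrogram output, (n_mels, n_frames). It shows that averaging over axis 0 of the transpose collapses the time axis into one value per Mel band, and it contrasts that with one possible alternative: padding or truncating every spectrogram to a common number of frames. The helper name pad_or_truncate and the frame counts are hypothetical and chosen only for illustration.

```python
import numpy as np

n_mels = 128
rng = np.random.default_rng(0)

# Stand-in for one melspectrogram result: shape (n_mels, n_frames).
mely = rng.random((n_mels, 87))

# Averaging mely.T over axis 0 averages across time frames, leaving a
# single mean energy per Mel band. Time information is discarded.
pooled = np.mean(mely.T, axis=0)
print(pooled.shape)  # (128,)

def pad_or_truncate(mel, max_frames):
    """Zero-pad or cut the time axis so every clip has max_frames columns."""
    n_frames = mel.shape[1]
    if n_frames >= max_frames:
        return mel[:, :max_frames]
    return np.pad(mel, ((0, 0), (0, max_frames - n_frames)), mode="constant")

fixed = pad_or_truncate(mely, 100)  # shorter clip: zero-padded
short = pad_or_truncate(mely, 50)   # longer clip: truncated
print(fixed.shape, short.shape)     # (128, 100) (128, 50)
```

With every clip brought to the same (n_mels, max_frames) shape, the per-file results can be stacked with np.stack into a regular 3-D array. librosa also ships librosa.util.fix_length, which may serve the same purpose as the hand-written helper here.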

Also, how do you determine the correct value for n_mels?

Source: https://stackoverflow.com/questions/65328571/how-do-you-determing-the-correct-dimension-of-mel-spectrogram-feature-extraction
