How do you determine the correct dimensions of Mel Spectrogram feature extraction for a NN?

Posted by 末鹿安然 on 2021-02-11 12:26:34


I am trying to implement a Mel Spectrogram feature extraction:

import librosa

n_mels = 128

# Extract the mel spectrogram for a single audio file
def extract_features(file_name):
    try:
        audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
        mely = librosa.feature.melspectrogram(y=audio, sr=sample_rate, n_mels=n_mels)
    except Exception as e:
        print("Error encountered while parsing file: ", file_name)
        return None

    # Transposed shape is (n_frames, n_mels); n_frames depends on the clip length
    return mely.T

It appears that I am implementing this feature extraction incorrectly: when I check the arrays, x_test has shape (353,) and x_train has shape (1408,). The data is not parsed correctly and an error is raised:


    v = format % tuple(row) + newline
TypeError: only size-1 arrays can be converted to Python scalars
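
The 1-D shapes reported above can be reproduced with a minimal sketch, assuming the audio clips differ in length. Each file then yields an (n_frames, n_mels) matrix with its own n_frames, and NumPy can only hold such ragged matrices in a 1-D object array. The random matrices and the `frame_counts` values below are hypothetical stand-ins for real spectrograms, not librosa output.

```python
import numpy as np

n_mels = 128
rng = np.random.default_rng(0)

# Hypothetical stand-ins for mely.T from three clips of different length
frame_counts = [44, 87, 130]
features = [rng.random((n_frames, n_mels)) for n_frames in frame_counts]

# Ragged matrices cannot be stacked into one 3-D array,
# so they end up as elements of a 1-D object array
x = np.empty(len(features), dtype=object)
for i, f in enumerate(features):
    x[i] = f

print(x.shape)     # (3,): one entry per file, no feature axes
print(x[0].shape)  # (44, 128): each entry is still a 2-D matrix
```

If this is what is happening, a routine that expects one number per cell would likely fail on such an array, because every cell holds a whole matrix.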

When I modify the extract_features code to average over the time frames:

import numpy as np

def extract_features(file_name):
    try:
        audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
        mely = librosa.feature.melspectrogram(y=audio, sr=sample_rate, n_mels=n_mels)
        # Average over the time axis: (n_frames, n_mels) -> (n_mels,)
        melyscaled = np.mean(mely.T, axis=0)
    except Exception as e:
        print("Error encountered while parsing file: ", file_name)
        return None

    return melyscaled

The program works.

How can I get the correct dimensions from the first definition without doing any scaling? What does np.mean do to the extracted features?

Also, how do you determine the correct value for n_mels?
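
On the np.mean question, a small sketch may help. `np.mean(..., axis=0)` on an (n_frames, n_mels) matrix averages over the time frames and leaves one value per mel band, so every file becomes a vector of length n_mels and the files stack into a regular 2-D array. The time structure is discarded in the process. One possible way to keep the time axis, shown here only as an assumption and not as the recommended method, is to pad or truncate every matrix to a fixed number of frames. The helper `pad_or_truncate` and the value `max_frames` are hypothetical, and random data stands in for real spectrograms.

```python
import numpy as np

n_mels = 128
max_frames = 100  # hypothetical fixed length, chosen only for illustration


def pad_or_truncate(mel_t, n_frames=max_frames):
    """Zero-pad or cut an (n_frames, n_mels) array along the time axis."""
    mel_t = mel_t[:n_frames]                 # truncate clips that are too long
    pad = n_frames - mel_t.shape[0]          # frames missing in short clips
    return np.pad(mel_t, ((0, pad), (0, 0)), mode="constant")


rng = np.random.default_rng(0)
# Stand-ins for mely.T from three clips of different length
clips = [rng.random((t, n_mels)) for t in (44, 87, 130)]

# Averaging over time: each clip collapses to a vector of length n_mels
averaged = np.stack([np.mean(c, axis=0) for c in clips])

# Padding/truncating: each clip keeps a time axis of fixed length
padded = np.stack([pad_or_truncate(c) for c in clips])

print(averaged.shape)  # (3, 128)
print(padded.shape)    # (3, 100, 128)
```

Which of the two shapes is appropriate depends on the network: a dense model can take the averaged vectors, while a convolutional or recurrent model would typically expect the version with the time axis.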

