librosa

librosa can't open .wav created by librosa?

心已入冬 submitted on 2020-08-03 02:19:50
Question: I'm trying to use librosa to generate some data by cutting 1-second pieces from a .wav file with a duration of 60 seconds. This part works: I create all my files and can also listen to them in any player, but if I try to open them with librosa.load I receive this error:

>>> librosa.load('.\\train\\audio\\silence\\0doing_the_dishes.wav', sr=None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\gionata\AppData\Local\Programs\Python\Python36\lib\site-packages
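
A likely cause, hedged: older librosa versions wrote .wav files as 32-bit float via librosa.output.write_wav, which some decoding backends refuse to read even though media players handle them fine. A minimal sketch of the slicing workflow that writes 16-bit PCM with the soundfile package instead (the file names here are illustrative):

import librosa
import soundfile as sf

# Load the 60-second source file at its native sampling rate
waveform, sr = librosa.load('doing_the_dishes.wav', sr=None)

# Cut the recording into consecutive 1-second clips
for i in range(len(waveform) // sr):
    clip = waveform[i * sr:(i + 1) * sr]
    # PCM_16 output keeps the clips readable by librosa.load and other tools
    sf.write(f'clip_{i}.wav', clip, sr, subtype='PCM_16')

# The clips now load back without errors
clip, sr = librosa.load('clip_0.wav', sr=None)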

Find sound effect inside an audio file

爷，独闯天下 submitted on 2020-07-16 08:15:13
Question: I have a number of 3-hour MP3 files, and every ~15 minutes a distinct 1-second sound effect is played, which signals the beginning of a new chapter. Is it possible to identify each time this sound effect is played, so I can note the time offsets? The sound effect is similar every time, but because it's been encoded in a lossy file format, there will be a small amount of variation. The time offsets will be stored in the ID3 Chapter Frame metadata. Example source, where the sound effect plays
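
One standard approach to this kind of problem is template matching by cross-correlation: slide an isolated copy of the effect along the recording and flag correlation peaks. A sketch assuming the effect is saved as chapter_effect.wav; the recording name, sampling rate, and threshold are illustrative and would need tuning per recording:

import numpy as np
import librosa
from scipy.signal import correlate

# A reduced sampling rate keeps correlation over 3 h of audio tractable
sr = 8000
effect, _ = librosa.load('chapter_effect.wav', sr=sr)
audio, _ = librosa.load('episode.mp3', sr=sr)

# Cross-correlate the template against the full recording (FFT-based)
corr = correlate(audio, effect, mode='valid', method='fft')
corr /= np.max(np.abs(corr))

# Keep strong peaks, suppressing duplicates closer than one second
hits = []
for idx in np.flatnonzero(corr > 0.6):
    if not hits or idx - hits[-1] > sr:
        hits.append(idx)

for idx in hits:
    print(f'sound effect at {idx / sr:.1f} s')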

Why are 128 mel bands used in mel spectrograms?

六月ゝ 毕业季﹏ submitted on 2020-06-29 06:42:19
Question: I am using the mel spectrogram function, which can be found here: Mel Spectrogram Librosa. I use it as follows: signal = librosa.feature.melspectrogram(y=waveform, sr=sample_rate, n_fft=512, n_mels=128). Why are 128 mel bands used? I understand that the mel filterbank is used to simulate the "filterbank" in human ears, which is why it discriminates higher frequencies. I am designing and implementing a Speech-to-Text system with Deep Learning, and when I used n_mels=64 it didn't work at all; it only works
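
For what it's worth, 128 is librosa's default n_mels and a convenient power of two rather than a perceptual requirement; a workable count depends on the sample rate and n_fft, since too many bands over too few FFT bins leaves some mel filters empty (librosa warns when this happens). A quick check, using noise as a stand-in signal and an illustrative sample rate:

import numpy as np
import librosa

sr = 16000
waveform = np.random.randn(sr)  # one second of noise as a stand-in signal

for n_mels in (64, 128):
    mel = librosa.feature.melspectrogram(y=waveform, sr=sr, n_fft=512, n_mels=n_mels)
    # A filterbank row that sums to zero covered no FFT bins at all
    fb = librosa.filters.mel(sr=sr, n_fft=512, n_mels=n_mels)
    print(f'n_mels={n_mels}: output shape {mel.shape}, '
          f'empty filters: {int(np.sum(fb.sum(axis=1) == 0))}')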