librosa

FileNotFoundError: [WinError 2] The system cannot find the file specified

Deadly · submitted on 2019-12-11 14:25:49
Question: I tried to load a waveform file with librosa but it failed. I tried checking the kernel.json file to fix it.

import librosa
import matplotlib.pyplot as plt
import numpy as np
import librosa.display
import IPython.display as ipd
%matplotlib inline

filepath = './audio_train/happy/'
filename = filepath + '00334.wav'
y, sr = librosa.load(filename, sr=None)

FileNotFoundError: [WinError 2] The system cannot find the file specified.

Answer 1: I fixed this problem by running conda install -c conda-forge librosa.
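
As a quick sanity check, the same WinError 2 can also be raised when the .wav path itself is wrong (for example, when the notebook's working directory is not where the audio_train folder lives). A minimal sketch for narrowing this down, reusing the path from the question; the explanation in the comments is an assumption, not part of the original answer.

import os
import librosa

filename = os.path.join('./audio_train/happy', '00334.wav')

# Rule out a plain path problem first: WinError 2 is also raised
# when the file simply does not exist relative to the working directory.
if not os.path.isfile(filename):
    raise FileNotFoundError('No such file: ' + os.path.abspath(filename))

# If the path exists but loading still fails, the error is usually raised
# while librosa's fallback decoder spawns an external tool it cannot find;
# reinstalling from conda-forge (as in the answer above) brings in a working backend.
y, sr = librosa.load(filename, sr=None)
print(sr, y.shape)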

Recorded audio of one note produces multiple onset times

谁说胖子不能爱 · submitted on 2019-12-08 16:39:14
Question: I am using the Librosa library for pitch and onset detection. Specifically, I am using onset_detect and piptrack. This is my code:

def detect_pitch(y, sr, onset_offset=5, fmin=75, fmax=1400):
    y = highpass_filter(y, sr)
    onset_frames = librosa.onset.onset_detect(y=y, sr=sr)
    pitches, magnitudes = librosa.piptrack(y=y, sr=sr, fmin=fmin, fmax=fmax)
    notes = []
    for i in range(0, len(onset_frames)):
        onset = onset_frames[i] + onset_offset
        index = magnitudes[:, onset].argmax()
        pitch = pitches[index,
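
The excerpt cuts off inside the loop. As a rough sketch of how such a per-onset pitch lookup is usually completed (the asker's highpass_filter and the rest of their loop are not shown above, so this continuation is an assumption, not their code):

import librosa

def detect_pitch(y, sr, onset_offset=5, fmin=75, fmax=1400):
    # Frame indices at which note onsets were detected.
    onset_frames = librosa.onset.onset_detect(y=y, sr=sr)
    # pitches and magnitudes both have shape (n_bins, n_frames).
    pitches, magnitudes = librosa.piptrack(y=y, sr=sr, fmin=fmin, fmax=fmax)
    notes = []
    for onset in onset_frames:
        # Look a few frames after the onset, clipped to the last frame.
        frame = min(onset + onset_offset, magnitudes.shape[1] - 1)
        bin_index = magnitudes[:, frame].argmax()   # strongest candidate bin in that frame
        notes.append(pitches[bin_index, frame])     # its pitch estimate in Hz
    return notes

# Hypothetical usage:
# y, sr = librosa.load('recording.wav')
# print(detect_pitch(y, sr))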

Spectrograms generated using Librosa don't look consistent with Kaldi?

点点圈 · submitted on 2019-12-06 12:59:27
I generated a spectrogram of a "seven" utterance using the "egs/tidigits" code from Kaldi, with 23 bins, a 20 kHz sampling rate, a 25 ms window, and a 10 ms shift. The spectrogram appears as below, visualized via MATLAB's imagesc function. I am experimenting with using Librosa as an alternative to Kaldi. I set up my code as below, using the same number of bins, sampling rate, and window length/shift as above.

time_series, sample_rate = librosa.core.load("7a.wav", sr=20000)
spectrogram = librosa.feature.melspectrogram(time_series, sr=20000, n_mels=23, n_fft=500, hop_length=200)
log_S = librosa.core.logamplitude
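
For reference, the question's parameters translate to n_fft = 500 samples (25 ms at 20 kHz) and hop_length = 200 samples (10 ms). A minimal sketch of the same pipeline follows; librosa.core.logamplitude was removed in later librosa releases, so the dB conversion below uses power_to_db instead, and "7a.wav" is simply the filename from the question.

import numpy as np
import librosa

# 20 kHz audio: 25 ms window -> 500 samples, 10 ms shift -> 200 samples.
time_series, sample_rate = librosa.load("7a.wav", sr=20000)
spectrogram = librosa.feature.melspectrogram(
    y=time_series, sr=sample_rate, n_mels=23, n_fft=500, hop_length=200)
# melspectrogram returns a power spectrogram; convert it to decibels,
# which is roughly what logamplitude did in older librosa versions.
log_S = librosa.power_to_db(spectrogram, ref=np.max)
print(log_S.shape)  # (23, n_frames)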

How do I apply a binary mask and STFT to produce an audio file?

依然范特西╮ · submitted on 2019-12-04 20:42:25
So here's the idea: you can generate a spectrogram from an audio file using the short-time Fourier transform (STFT). Some people have then generated something called a "binary mask" to produce different audio (i.e. with background noise removed, etc.) from the inverse STFT. Here's what I understand: the STFT is a simple equation that is applied to the audio file, which generates the information that can easily be displayed as a spectrogram. By taking the inverse of the STFT matrix, and multiplying it by a matrix of the same size (the binary matrix), you can create a new matrix with information to generate
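
In the usual masking pipeline the order is the other way around: the binary mask is multiplied element-wise with the STFT matrix first, and the inverse STFT is taken afterwards to get audio back. A minimal sketch under that assumption; the filenames and the magnitude threshold used to build the mask are purely illustrative.

import numpy as np
import librosa
import soundfile as sf

y, sr = librosa.load("noisy.wav", sr=None)   # hypothetical input file
D = librosa.stft(y)                          # complex STFT, shape (1 + n_fft//2, n_frames)

# A binary mask of the same shape; here a crude magnitude threshold,
# chosen only to illustrate the mechanics.
mask = (np.abs(D) > np.median(np.abs(D))).astype(float)

# Apply the mask to the STFT, then invert back to a time-domain signal.
y_masked = librosa.istft(D * mask, length=len(y))
sf.write("masked.wav", y_masked, sr)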

Unable to use Multithread for librosa melspectrogram

自古美人都是妖i · submitted on 2019-12-04 20:21:23
I have over 1000 audio files (this is just initial development; in the future there will be even more), and I would like to convert them to mel spectrograms. Since my workstation has an Intel® Xeon® Processor E5-2698 v3, which has 32 threads, I would like to use multithreading to do the job. My code:

import os
import librosa
from librosa.display import specshow
from natsort import natsorted
import numpy as np
import sys
# Libraries for multi thread
from multiprocessing.dummy import Pool as ThreadPool
import subprocess

pool = ThreadPool(20)
songlist = os.listdir('../opensmile/devset_2015/')
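
multiprocessing.dummy only wraps a thread pool, and feature extraction like this is CPU-bound, so a process pool is the more common way to spread it across many cores. A rough sketch under that assumption, reusing the directory from the question; the worker count and the saving step at the end are illustrative only.

import os
import numpy as np
import librosa
from multiprocessing import Pool   # real worker processes, not threads

AUDIO_DIR = '../opensmile/devset_2015/'

def melspec_for(filename):
    # Each worker loads and transforms one file independently.
    y, sr = librosa.load(os.path.join(AUDIO_DIR, filename), sr=None)
    S = librosa.feature.melspectrogram(y=y, sr=sr)
    return filename, librosa.power_to_db(S, ref=np.max)

if __name__ == '__main__':
    songlist = sorted(os.listdir(AUDIO_DIR))
    with Pool(processes=20) as pool:
        for name, log_S in pool.imap_unordered(melspec_for, songlist):
            np.save(name + '.npy', log_S)   # one array per input file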

Multiprocessing Pool slow when calling external module

六眼飞鱼酱① · submitted on 2019-12-02 02:12:21
Question: My script calls the librosa module to compute Mel-frequency cepstral coefficients (MFCCs) for short pieces of audio. After loading the audio, I'd like to compute these (along with some other audio features) as fast as possible, hence multiprocessing. Problem: the multiprocessing variant is much slower than the sequential one. Profiling says my code spends over 90% of the time on <method 'acquire' of '_thread.lock' objects>. That would not be surprising if there were many small tasks, but in one test case, I am
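
For comparison, a small sketch of the process-pool pattern this kind of workload usually uses: workers receive file paths rather than already-loaded audio arrays, so little data has to be pickled through the pool's internal queues, which is one common source of lock-heavy profiles. The file list, worker count and n_mfcc value below are made up for illustration and are not taken from the question.

import librosa
from multiprocessing import Pool

def mfcc_from_file(path, n_mfcc=20):
    # Each worker loads its own audio, so only a short path string
    # crosses the process boundary.
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

if __name__ == '__main__':
    paths = ['clip_000.wav', 'clip_001.wav']   # hypothetical file list
    with Pool(processes=4) as pool:
        # chunksize batches tasks so dispatch overhead is amortised
        # across many short clips.
        mfccs = pool.map(mfcc_from_file, paths, chunksize=8)
    print(len(mfccs))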

Librosa pitch tracking - STFT

柔情痞子 · submitted on 2019-11-30 18:21:54
I am using this algorithm to detect the pitch of this audio file. As you can hear, it is an E2 note played on a guitar with a bit of noise in the background. I generated this spectrogram using STFT. And I am using the algorithm linked above like this:

y, sr = librosa.load(filename, sr=40000)
pitches, magnitudes = librosa.core.piptrack(y=y, sr=sr, fmin=75, fmax=1600)
np.set_printoptions(threshold=np.nan)
print pitches[np.nonzero(pitches)]

As a result, I am getting pretty much every possible frequency between my fmin and fmax. What do I have to do with the output of the piptrack method to
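
piptrack returns a pitch candidate for every frequency bin in every frame, which is why printing all non-zero entries spans the whole fmin-to-fmax range. The usual reduction is to keep only the strongest bin per frame. A short sketch of that step; the filename is a stand-in for the question's file, and the median at the end is just one simple way to summarise the track (E2 is roughly 82 Hz).

import numpy as np
import librosa

y, sr = librosa.load('e2_guitar.wav', sr=40000)   # stand-in filename
pitches, magnitudes = librosa.piptrack(y=y, sr=sr, fmin=75, fmax=1600)

# For each frame, take the pitch of the bin with the largest magnitude.
best_bins = magnitudes.argmax(axis=0)
pitch_track = pitches[best_bins, np.arange(pitches.shape[1])]

# Frames with no strong candidate come back as 0 Hz; drop them.
pitch_track = pitch_track[pitch_track > 0]
print(np.median(pitch_track))   # should land near 82 Hz for an E2 note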
