mfcc

Building Speech Dataset for LSTM binary classification

Submitted by 亡梦爱人 on 2019-12-17 14:49:17
Question: I'm trying to do binary LSTM classification using Theano. I have gone through the example code, but I want to build my own. I have a small set of "Hello" and "Goodbye" recordings that I am using. I preprocess these by extracting their MFCC features and saving the features to a text file. I have 20 speech files (10 per word) and I generate a text file for each recording, so 20 text files containing the MFCC features. Each file is a 13x56 matrix. My problem now is: how do I use this…
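
The dataset assembly the question describes can be sketched as follows — a minimal Python/NumPy example, with random matrices standing in for the 20 saved text files (the function name is illustrative; `np.loadtxt` would read the real files):

```python
import numpy as np

def build_dataset(feature_matrices, labels):
    """Stack per-utterance 13x56 MFCC matrices into an (N, timesteps, features)
    array suitable for an LSTM, transposing so time is the first axis."""
    X = np.stack([m.T for m in feature_matrices])  # (N, 56, 13)
    y = np.asarray(labels, dtype=np.int64)         # 0 = "Hello", 1 = "Goodbye"
    return X, y

# Simulate the 20 saved text files with random 13x56 matrices.
rng = np.random.default_rng(0)
mats = [rng.standard_normal((13, 56)) for _ in range(20)]
X, y = build_dataset(mats, [0] * 10 + [1] * 10)
print(X.shape, y.shape)  # (20, 56, 13) (20,)
```

The (samples, timesteps, features) layout is what most recurrent-network APIs expect; each 56-step sequence of 13-dimensional frames is one utterance.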

Why do MFCC extraction libs return different values?

Submitted by ぃ、小莉子 on 2019-12-14 00:51:30
Question: I am extracting MFCC features using two different libraries: the python_speech_features lib and the BOB lib. However, the outputs of the two differ, and even the shapes are not the same. Is that normal, or is there a parameter I am missing? The relevant section of my code is the following:

import bob.ap
import numpy as np
from scipy.io.wavfile import read
from sklearn import preprocessing
from python_speech_features import mfcc, delta, logfbank

def bob_extract_features(audio, rate):
    …
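
One common reason two MFCC libraries disagree even on shape is that their framing defaults differ: window length, hop size, number of mel filters, pre-emphasis, and liftering. A small sketch of just the frame-count arithmetic, assuming the usual 25 ms window / 10 ms hop defaults of python_speech_features against a hypothetical library using a 20 ms window:

```python
import numpy as np

def n_frames(n_samples, rate, win_len_s, hop_s):
    """Number of analysis frames produced for the given windowing defaults."""
    win = int(round(win_len_s * rate))
    hop = int(round(hop_s * rate))
    return 1 + max(0, (n_samples - win) // hop)

rate, n_samples = 16000, 16000  # one second of audio
a = n_frames(n_samples, rate, 0.025, 0.010)  # 25 ms window, 10 ms hop
b = n_frames(n_samples, rate, 0.020, 0.010)  # 20 ms window, 10 ms hop
print(a, b)  # 98 99
```

So before comparing coefficient values, it is worth aligning both libraries' window, hop, filter-count, and DCT-scaling parameters explicitly rather than relying on defaults.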

generate mfcc's for audio segments based on annotated file

Submitted by こ雲淡風輕ζ on 2019-12-13 12:36:33
Question: My main goal is feeding MFCC features to an ANN. However, I am stuck at the data preprocessing step, and my question has two parts. BACKGROUND: I have an audio file, and a txt file with annotations and timestamps like this:

0.0 2.5 Music
2.5 6.05 silence
6.05 8.34 notmusic
8.34 12.0 silence
12.0 15.5 music

I know that for a single audio file I can calculate the MFCCs using librosa like this:

import librosa
y, sr = librosa.load('abcd.wav')
mfcc = librosa.feature.mfcc(y=y, sr=sr)

Part 1: I'm…
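
A minimal sketch of the preprocessing step the question is stuck on — parsing the annotation file and slicing the signal into per-label segments, after which MFCCs would be computed per chunk. A zero signal stands in for the real audio, and the helper names are illustrative:

```python
import numpy as np

def parse_annotations(text):
    """Parse 'start stop label' lines into (start, stop, label) tuples."""
    segments = []
    for line in text.strip().splitlines():
        start, stop, label = line.split()
        segments.append((float(start), float(stop), label.lower()))
    return segments

def slice_segments(y, sr, segments):
    """Cut the signal into per-annotation chunks by converting times to
    sample indices; MFCCs would then be computed on each chunk."""
    return [(y[int(s * sr):int(e * sr)], label) for s, e, label in segments]

ann = """0.0 2.5 Music
2.5 6.05 silence
6.05 8.34 notmusic"""
sr = 22050
y = np.zeros(int(8.34 * sr))  # placeholder for the loaded audio
chunks = slice_segments(y, sr, parse_annotations(ann))
print([(len(c), lab) for c, lab in chunks])
```

Note the labels are lower-cased during parsing, since the annotation file mixes "Music" and "music".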

Converting from one MFCC type to another - HTK

Submitted by 北战南征 on 2019-12-13 02:10:21
Question: I am working with the HTK toolkit on a word-spotting task and have a classic training/testing data mismatch. The training data consisted only of "clean" data (recorded over a mic). The data was converted to MFCC_E_D_A parameters, which were then modelled by phone-level HMMs. My test data was recorded over landline and mobile phone channels (inviting distortions and the like). Using the MFCC_E_D_A parameters with HVite results in incorrect output. I want to make use of cepstral mean…
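
Cepstral mean normalisation, which the question is heading towards, amounts to subtracting each coefficient's per-utterance mean, which removes stationary channel effects such as a telephone channel's spectral tilt. HTK applies this through its own configuration (target kinds with the _Z qualifier); the NumPy sketch below shows only the underlying arithmetic:

```python
import numpy as np

def cepstral_mean_norm(feats):
    """Per-utterance cepstral mean normalisation (CMN).
    feats: (n_frames, n_coeffs). A constant channel offset in the cepstral
    domain (e.g. telephone-channel colouration) is removed by subtracting
    the mean of each coefficient over the utterance."""
    return feats - feats.mean(axis=0, keepdims=True)

rng = np.random.default_rng(1)
feats = rng.standard_normal((100, 13)) + 5.0  # simulate a constant channel offset
normed = cepstral_mean_norm(feats)
print(np.allclose(normed.mean(axis=0), 0.0))  # True
```

Because the offset is constant per utterance, the normalised features have zero mean regardless of what the channel added.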

How to combine mfcc vector with labels from annotation to pass to a neural network

Submitted by 人走茶凉 on 2019-12-12 08:58:06
Question: Using librosa, I created MFCCs for my audio file as follows:

import librosa
y, sr = librosa.load('myfile.wav')
print y
print sr
mfcc = librosa.feature.mfcc(y=y, sr=sr)

I also have a text file that contains manual annotations [start, stop, tag] corresponding to the audio, as follows:

0.0 2.0 sound1
2.0 4.0 sound2
4.0 6.0 silence
6.0 8.0 sound1

QUESTION: How do I combine the MFCCs generated by librosa with the annotations from the text file? My final goal is, I want to combine mfcc…
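
One way to combine the two, sketched in NumPy: convert each MFCC frame index to a time via the hop length (librosa's default hop_length is 512), then look that time up in the annotation intervals. The helper name and the frame count are illustrative:

```python
import numpy as np

def frame_labels(n_frames, sr, hop_length, annotations):
    """Assign each MFCC frame the label whose [start, stop) interval
    contains the frame's start time."""
    times = np.arange(n_frames) * hop_length / sr
    labels = []
    for t in times:
        tag = next((lab for s, e, lab in annotations if s <= t < e), None)
        labels.append(tag)
    return labels

annotations = [(0.0, 2.0, "sound1"), (2.0, 4.0, "sound2"),
               (4.0, 6.0, "silence"), (6.0, 8.0, "sound1")]
# librosa defaults: sr=22050 after load, hop_length=512 for feature.mfcc
labels = frame_labels(n_frames=300, sr=22050, hop_length=512,
                      annotations=annotations)
print(labels[0], labels[100], labels[-1])  # sound1 sound2 sound1
```

The resulting per-frame label list lines up one-to-one with the columns of the librosa MFCC matrix, which is exactly the pairing a frame-level classifier needs.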

What are the 13 MFCC features

Submitted by 我的未来我决定 on 2019-12-11 17:23:36
Question: I have seen many places that use 13 MFCC features to analyse wav files, but I couldn't find any explanation of what each feature means — e.g. what the first MFCC feature is, the second, etc. In particular, I couldn't find how to get the pitch (F0) from the MFCC features. Thanks.
Source: https://stackoverflow.com/questions/54530052/what-are-the-13-mfcc-features
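
For background: the 13 coefficients are the low-order terms of a DCT-II applied to log mel-filterbank energies. Coefficient 0 tracks overall log energy, and the following coefficients describe progressively finer detail of the spectral envelope; pitch (F0) is largely discarded in the process, which is why it cannot be read off the MFCCs directly. A NumPy sketch of the DCT step (an unscaled DCT-II, omitting the orthonormal factors some libraries apply):

```python
import numpy as np

def dct2(log_mel):
    """Unscaled type-II DCT of log mel-filterbank energies: coefficient 0
    is proportional to the summed log energy; higher coefficients capture
    the shape of the spectral envelope."""
    n = len(log_mel)
    k = np.arange(n)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))
    return (basis * log_mel[None, :]).sum(axis=1)

log_mel = np.log(np.full(26, 2.0))  # perfectly flat spectrum
c = dct2(log_mel)[:13]              # keep the usual 13 coefficients
print(np.allclose(c[1:], 0.0))      # flat spectrum -> only c0 is non-zero
```

A flat spectrum yields only a non-zero c0, illustrating that everything beyond c0 encodes deviations of the envelope from flatness, not pitch.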

Are MFCC features required for speech recognition

Submitted by 浪子不回头ぞ on 2019-12-11 06:07:08
Question: I'm currently developing a speech recognition project and trying to select the most meaningful features. Most of the relevant papers suggest using zero-crossing rates, F0, and MFCC features, so I'm using those. My question is: a training sample with a duration of 00:03 has 268 features. Considering I'm doing a multi-class classification project with 50+ training samples per class, may including all MFCC features make the project suffer from the curse of dimensionality or 'reduce the importance'…
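
If dimensionality does become a problem at 268 features versus ~50 samples per class, projecting onto a few principal components is one standard remedy. A NumPy sketch via SVD (the target dimension of 40 is an arbitrary choice for illustration):

```python
import numpy as np

def pca_reduce(X, k):
    """Project rows of X onto the top-k principal components, computed
    from the SVD of the mean-centred data matrix."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T  # (n_samples, k)

rng = np.random.default_rng(2)
X = rng.standard_normal((250, 268))  # e.g. 5 classes x 50 samples, 268 features
Z = pca_reduce(X, 40)
print(Z.shape)  # (250, 40)
```

Whether the reduction helps depends on the classifier: margin-based methods like SVMs tolerate high dimensionality better than distance-based ones, so it is worth validating both with and without the projection.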

How to train a machine learning algorithm using MFCC coefficient vectors?

Submitted by 点点圈 on 2019-12-08 17:51:33
Question: For my final-year project I am trying to identify dog/bark/bird sounds in real time (by recording sound clips). I am using MFCCs as the audio features. Initially I extracted 12 MFCC vectors from a sound clip using the jAudio library. Now I'm trying to train a machine learning algorithm (I have not yet decided on the algorithm, but it will most probably be an SVM). The sound clips are around 3 seconds long. I need to clarify some information about this process: Do I have to…
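
One common way to feed variable-length clips to a non-sequential classifier such as an SVM is to pool the per-frame MFCC vectors into a single fixed-length utterance vector. A NumPy sketch using mean-and-standard-deviation pooling (random data stands in for real clips):

```python
import numpy as np

def pool_mfcc(mfcc):
    """Collapse a variable-length (n_frames, n_coeffs) MFCC sequence into a
    fixed-length vector: the per-coefficient mean and standard deviation.
    Clips of any duration map to the same vector size."""
    return np.concatenate([mfcc.mean(axis=0), mfcc.std(axis=0)])

rng = np.random.default_rng(3)
clip_a = rng.standard_normal((300, 13))  # roughly a 3 s clip
clip_b = rng.standard_normal((150, 13))  # a shorter clip
print(pool_mfcc(clip_a).shape, pool_mfcc(clip_b).shape)  # (26,) (26,)
```

Both clips yield 26-dimensional vectors, so the classifier never sees the difference in clip length.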

Audio descriptor MFCC in C#

Submitted by 99封情书 on 2019-12-08 09:07:15
Question: I'm doing primitive speech recognition and need a simple descriptor for my audio signals. So far I only have the FFT of my audio signal, and I don't know what I should do after that. When I tried using Hidden Markov Models with only the FFT of my training signals, they gave me wrong answers. Could you tell me about any C# libraries that would help me turn my FFT output into MFCCs (Mel-Frequency Cepstral Coefficients)?
Answer 1: I don't know of such libraries for C#, but I can show you my implementation of extracting 20…
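
The missing step after the FFT is: power spectrum → triangular mel filterbank → log → DCT-II → keep the first coefficients. The pipeline is language-agnostic, so a NumPy sketch is given below; bin-placement details vary between implementations, so treat this as one plausible variant rather than any particular library's code:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, rate):
    """Triangular filters with centres evenly spaced on the mel scale."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / rate).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for j in range(l, c):                     # rising edge
            fb[i - 1, j] = (j - l) / max(c - l, 1)
        for j in range(c, r):                     # falling edge
            fb[i - 1, j] = (r - j) / max(r - c, 1)
    return fb

def fft_to_mfcc(power_spectrum, rate, n_filters=26, n_coeffs=13):
    """Power spectrum (n_fft//2 + 1,) -> log mel energies -> DCT-II -> MFCCs."""
    n_fft = (len(power_spectrum) - 1) * 2
    fb = mel_filterbank(n_filters, n_fft, rate)
    energies = np.log(fb @ power_spectrum + 1e-10)  # epsilon avoids log(0)
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n_filters) + 1) / (2 * n_filters))
    return basis @ energies

spec = np.abs(np.fft.rfft(np.random.default_rng(4).standard_normal(512))) ** 2
mfcc = fft_to_mfcc(spec, rate=16000)
print(mfcc.shape)  # (13,)
```

The same arithmetic translates line-for-line to C#; in practice a windowed, pre-emphasised frame would replace the raw random signal used here for illustration.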