Question
I am extracting the MFCC features using two different libraries:
- The python_speech_features lib
- The BOB lib
However, the output of the two is different, and even the shapes are not the same. Is that normal? Or is there a parameter that I am missing?
The relevant section of my code is the following:
import bob.ap
import numpy as np
from scipy.io.wavfile import read
from sklearn import preprocessing
from python_speech_features import mfcc, delta, logfbank
def bob_extract_features(audio, rate):
    #get MFCC
    rate = 8000 # rate
    win_length_ms = 30 # The window length of the cepstral analysis in milliseconds
    win_shift_ms = 10 # The window shift of the cepstral analysis in milliseconds
    n_filters = 26 # The number of filter bands
    n_ceps = 13 # The number of cepstral coefficients
    f_min = 0. # The minimal frequency of the filter bank
    f_max = 4000. # The maximal frequency of the filter bank
    delta_win = 2 # The integer delta value used for computing the first and second order derivatives
    pre_emphasis_coef = 0.97 # The coefficient used for the pre-emphasis
    dct_norm = True # A factor by which the cepstral coefficients are multiplied
    mel_scale = True # Tell whether cepstral features are extracted on a linear (LFCC) or Mel (MFCC) scale
    c = bob.ap.Ceps(rate, win_length_ms, win_shift_ms, n_filters, n_ceps, f_min,
                    f_max, delta_win, pre_emphasis_coef, mel_scale, dct_norm)
    c.with_delta = False
    c.with_delta_delta = False
    c.with_energy = False
    signal = np.cast['float'](audio) # vector should be in **float**
    example_mfcc = c(signal) # mfcc + mfcc' + mfcc''
    return example_mfcc
def psf_extract_features(audio, rate):
    signal = np.cast['float'](audio) #vector should be in **float**
    mfcc_feature = mfcc(signal, rate, winlen = 0.03, winstep = 0.01, numcep = 13,
                        nfilt = 26, nfft = 512,appendEnergy = False)
    #mfcc_feature = preprocessing.scale(mfcc_feature)
    deltas = delta(mfcc_feature, 2)
    fbank_feat = logfbank(audio, rate)
    combined = np.hstack((mfcc_feature, deltas))
    return mfcc_feature
track = 'test-sample.wav'
rate, audio = read(track)
features1 = psf_extract_features(audio, rate)
features2 = bob_extract_features(audio, rate)
print("--------------------------------------------")
t = (features1 == features2)
print(t)
Answer 1:
"However, the output of the two is different, and even the shapes are not the same. Is that normal?"
Yes. There are several varieties of the algorithm, and each implementation chooses its own flavor.
"Or is there a parameter that I am missing?"
It is not just about parameters. There are algorithmic differences too, such as the window shape (Hamming vs. Hanning), the shape, starting points, and normalization of the mel filters, liftering, the DCT flavor, and so on.
If you want the same results, use a single library for extraction; trying to sync the two is pretty hopeless.
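To illustrate just one of these differences, here is a minimal sketch that applies two window functions to the same frame and compares the log power spectra. It uses NumPy only, a synthetic 440 Hz tone stands in for real audio, and `log_power_spectrum` is a hypothetical helper. It does not reproduce the pipeline of either library; it only shows that the window choice alone already changes the numbers that feed the mel filter bank.

```python
import numpy as np


def log_power_spectrum(frame, window, nfft=512):
    """Log power spectrum of a single frame after applying a window."""
    spectrum = np.fft.rfft(frame * window, nfft)
    power = (np.abs(spectrum) ** 2) / nfft
    return np.log(power + 1e-10)  # small offset avoids log(0)


rate = 8000
frame_len = int(0.03 * rate)  # 30 ms frame -> 240 samples
t = np.arange(frame_len) / rate
frame = np.sin(2 * np.pi * 440.0 * t)  # synthetic tone (assumption, not real data)

spec_hamming = log_power_spectrum(frame, np.hamming(frame_len))
spec_hanning = log_power_spectrum(frame, np.hanning(frame_len))

# Largest absolute difference between the two spectra
max_diff = np.max(np.abs(spec_hamming - spec_hanning))
print("max difference:", max_diff)
```

Every later stage (mel filtering, log, DCT, liftering) adds its own implementation choices on top of this, which is why matching two libraries exactly is hard.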
Answer 2:
Have you tried comparing the two with some tolerance? I believe the two MFCCs are arrays of floating-point numbers, so testing for exact equality might not be wise. Try using numpy.testing.assert_allclose with some tolerance, and decide whether that tolerance is good enough.
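A minimal sketch of such a comparison is shown here. `compare_features` is a hypothetical helper, the arrays are made-up stand-ins for the two feature matrices, and the default tolerances are arbitrary. It checks the shapes first, because an elementwise comparison of arrays with different shapes is not meaningful.

```python
import numpy as np


def compare_features(a, b, rtol=1e-5, atol=1e-8):
    """Return (ok, message) for a tolerance-based comparison of two arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        return False, "shape mismatch: {} vs {}".format(a.shape, b.shape)
    try:
        np.testing.assert_allclose(a, b, rtol=rtol, atol=atol)
    except AssertionError as err:
        return False, str(err)
    return True, "match within tolerance"


# Made-up data standing in for the two MFCC matrices
x = np.array([[1.0, 2.0], [3.0, 4.0]])
y = x + 1e-9  # tiny floating-point noise

print(compare_features(x, y))        # values agree within tolerance
print(compare_features(x, x[:1]))    # different number of frames
print(compare_features(x, x + 1.0))  # same shape, different values
```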
That said, I initially overlooked that even the shapes mismatch, and I am not experienced enough with bob.ap to comment on that confidently. However, libraries often pad the input with zeros at the beginning or the end of the array for windowing reasons, and that may be responsible if the two libraries do it differently.
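As a rough sketch of how padding can change the number of frames, the two hypothetical functions here count frames under two common conventions: dropping the incomplete last frame, or zero-padding the signal so the last frame is kept. Which convention each library actually follows is an assumption you would need to check in its source; the signal length is made up.

```python
import math


def frames_without_padding(n_samples, frame_len, frame_step):
    """Count only frames that fit entirely inside the signal."""
    if n_samples < frame_len:
        return 0
    return 1 + (n_samples - frame_len) // frame_step


def frames_with_padding(n_samples, frame_len, frame_step):
    """Count frames when the signal is zero-padded to complete the last frame."""
    if n_samples <= frame_len:
        return 1
    return 1 + math.ceil((n_samples - frame_len) / frame_step)


rate = 8000
frame_len = int(0.03 * rate)   # 30 ms window -> 240 samples
frame_step = int(0.01 * rate)  # 10 ms shift  -> 80 samples
n_samples = 8050               # hypothetical signal length

print(frames_without_padding(n_samples, frame_len, frame_step))
print(frames_with_padding(n_samples, frame_len, frame_step))
```

With these values the two conventions differ by one frame, which would already make the first dimension of the two feature matrices disagree.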
Source: https://stackoverflow.com/questions/52112204/why-do-mfcc-extraction-libs-return-different-values