Question
I am trying to extract features from .wav files using the MFCCs of the sound files. I am getting an error when I try to convert my list of MFCCs to a numpy array. I am quite sure this error occurs because the list contains MFCC arrays with different shapes (but I am unsure how to solve the issue).
I have looked at two other Stack Overflow posts, but they don't solve my problem because they are too specific to a certain task:
ValueError: could not broadcast input array from shape (128,128,3) into shape (128,128)
ValueError: could not broadcast input array from shape (857,3) into shape (857)
Full Error Message:
Traceback (most recent call last):
  File "/..../.../...../Batch_MFCC_Data.py", line 68, in <module>
    X = np.array(MFCCs)
ValueError: could not broadcast input array from shape (20,590) into shape (20)
Code Example:
import glob
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split

all_wav_paths = glob.glob('directory_of_wav_files/**/*.wav', recursive=True)
np.random.shuffle(all_wav_paths)

MFCCs = []   # list to hold all MFCCs
labels = []  # list to hold all labels

for i, wav_path in enumerate(all_wav_paths):
    individual_MFCC = MFCC_from_wav(wav_path)
    # MFCC_from_wav() -> returns the MFCC coefficients
    label = get_class(wav_path)
    # get_class() -> returns the label of the wav file, either 0 or 1

    # add features and label to the lists
    MFCCs.append(individual_MFCC)
    labels.append(label)

# Must convert the training data to a NumPy array for
# train_test_split and saving to the local drive
X = np.array(MFCCs)  # THIS LINE CRASHES WITH THE ABOVE ERROR

# binary encode labels (OneHotEncoder expects a 2-D array)
onehot_encoder = OneHotEncoder(sparse=False)
Y = onehot_encoder.fit_transform(np.array(labels).reshape(-1, 1))

# create train/test data
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.25, random_state=0)

# saving data to the local drive
np.save("LABEL_SAVE_PATH", Y)
np.save("TRAINING_DATA_SAVE_PATH", X)
Here is a snapshot of the shapes of the MFCCs (from the .wav files) in the MFCCs list. The list contains arrays with the following shapes:
...More above...
(20, 423) #shape of returned MFCC from one of the .wav files
(20, 457)
(20, 1757)
(20, 345)
(20, 835)
(20, 345)
(20, 687)
(20, 774)
(20, 597)
(20, 719)
(20, 1195)
(20, 433)
(20, 728)
(20, 939)
(20, 345)
(20, 1112)
(20, 345)
(20, 591)
(20, 936)
(20, 1161)
....More below....
As you can see, the MFCCs in the list don't all have the same shape, because the recordings don't all have the same duration. Is this why I can't convert the list to a numpy array? If so, how do I fix it so that every MFCC in the list has the same shape?
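For reference, the crash can be reproduced with two dummy arrays whose second dimensions differ (a minimal sketch; the exact error text varies with the NumPy version, but a ValueError is raised either way):

```python
import numpy as np

a = np.zeros((20, 590))
b = np.zeros((20, 423))  # different number of columns

try:
    X = np.array([a, b])  # ragged list -> cannot form a regular 3-D array
except ValueError as e:
    print("ValueError:", e)
```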
Any code snippets for accomplishing this and advice would be greatly appreciated!
Thanks!
Answer 1:
Use the following logic to downsample the arrays to min_shape, i.e. reduce the larger arrays to min_shape:

min_shape = (20, 345)
MFCCs = [arr1, arr2, arr3, ...]

for idx, arr in enumerate(MFCCs):
    MFCCs[idx] = arr[:, :min_shape[1]]

batch_arr = np.array(MFCCs)
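If you'd rather not hard-code min_shape, the minimum column count can be computed from the data itself (a sketch using random stand-in arrays in place of your real MFCCs):

```python
import numpy as np

# Stand-in MFCC arrays of varying widths
MFCCs = [np.random.randn(20, 423), np.random.randn(20, 345), np.random.randn(20, 835)]

min_cols = min(arr.shape[1] for arr in MFCCs)              # narrowest array: 345
batch_arr = np.array([arr[:, :min_cols] for arr in MFCCs])  # truncate, then stack
print(batch_arr.shape)  # -> (3, 20, 345)
```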
And then you can stack these arrays in a batch array as in the below minimal example:
In [33]: a1 = np.random.randn(2, 3)
In [34]: a2 = np.random.randn(2, 5)
In [35]: a3 = np.random.randn(2, 10)
In [36]: MFCCs = [a1, a2, a3]
In [37]: min_shape = (2, 2)
In [38]: for idx, arr in enumerate(MFCCs):
    ...:     MFCCs[idx] = arr[:, :min_shape[1]]
    ...:
In [42]: batch_arr = np.array(MFCCs)
In [43]: batch_arr.shape
Out[43]: (3, 2, 2)
Now for the second strategy, upsampling the smaller arrays to max_shape, follow similar logic but fill the missing values with either zeros or NaN values, as you prefer. Then again you can stack the arrays into a batch array of shape (num_arrays, dim1, dim2); so, for your case, the shape should be (num_wav_files, 20, max_column).
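A sketch of that zero-padding approach using np.pad, again with random stand-in arrays (swap in np.nan for constant_values if you prefer NaN-fill):

```python
import numpy as np

# Stand-in MFCC arrays of varying widths
MFCCs = [np.random.randn(20, 345), np.random.randn(20, 423), np.random.randn(20, 1757)]

max_cols = max(arr.shape[1] for arr in MFCCs)  # widest array: 1757
padded = [np.pad(arr, ((0, 0), (0, max_cols - arr.shape[1])),
                 mode='constant', constant_values=0)  # pad columns on the right
          for arr in MFCCs]
batch_arr = np.stack(padded)
print(batch_arr.shape)  # -> (3, 20, 1757)
```

Truncation throws away information from the longer clips, while padding preserves everything at the cost of larger arrays; which trade-off is better depends on your model.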
Source: https://stackoverflow.com/questions/48001383/valueerror-could-not-broadcast-input-array-from-shape-20-590-into-shape-20