Question
I want to resample my dataset. It consists of categorically transformed data with labels for 3 classes. The number of samples per class is:
- counts of class A: 6945
- counts of class B: 650
- counts of class C: 9066
- Total samples: 16661
The data shape without labels is (16661, 1000, 256), that is, 16661 samples of shape (1000, 256). I would like to up-sample the data to the number of samples in the majority class, that is, class A -> (6945)
However, when calling:
from imblearn.over_sampling import SMOTE
print(categorical_vector.shape)
sm = SMOTE(random_state=2)
X_train_res, y_labels_res = sm.fit_sample(categorical_vector, labels.ravel())
It keeps saying ValueError: Found array with dim 3. Estimator expected <= 2.
How can I flatten the data so that the estimator can fit it, in a way that still makes sense? Furthermore, how can I unflatten it (back to 3 dimensions) after getting X_train_res?
Answer 1:
I am using a dummy 3D array and choosing the 2D array size myself:
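import numpy as np
arr = np.random.rand(160, 10, 25)
orig_shape = arr.shape
print(orig_shape)
Output: (160, 10, 25)
# Flatten to 2D: keep one row per sample and merge the last two axes
arr = np.reshape(arr, (arr.shape[0], arr.shape[1] * arr.shape[2]))
print(arr.shape)
Output: (160, 250)
# Restore the original 3D shape
arr = np.reshape(arr, orig_shape)
print(arr.shape)
Output: (160, 10, 25)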
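The same flatten and unflatten round trip can be wrapped around SMOTE itself. The following is a minimal sketch, assuming a recent imbalanced-learn release in which the method is named fit_resample (older releases used fit_sample, as in the question). The helper names flatten_samples, unflatten_samples and smote_3d are hypothetical, and the small random array and labels only stand in for the real data. Because resampling changes the number of rows, the sketch reshapes back with -1 on the first axis instead of reusing the original shape.

```python
import numpy as np


def flatten_samples(X):
    """Collapse every axis after the first so each sample becomes one row."""
    return X.reshape(X.shape[0], -1)


def unflatten_samples(X_flat, sample_shape):
    """Restore the per-sample shape; -1 lets NumPy infer the sample count."""
    return X_flat.reshape(-1, *sample_shape)


def smote_3d(X, y, random_state=2):
    """Oversample a 3D array with SMOTE by round-tripping through 2D."""
    # Imported here so the two helpers above work without imbalanced-learn.
    from imblearn.over_sampling import SMOTE

    sm = SMOTE(random_state=random_state)
    # Assumption: fit_resample is available (it replaced fit_sample).
    X_res, y_res = sm.fit_resample(flatten_samples(X), y)
    return unflatten_samples(X_res, X.shape[1:]), y_res


# Toy stand-in for the (16661, 1000, 256) array from the question.
rng = np.random.default_rng(0)
X = rng.random((160, 10, 25))
y = np.array([0] * 100 + [1] * 40 + [2] * 20)

X_flat = flatten_samples(X)
print(X_flat.shape)  # (160, 250)

X_back = unflatten_samples(X_flat, X.shape[1:])
print(X_back.shape)  # (160, 10, 25)
```

With these helpers, a call such as X_train_res, y_labels_res = smote_3d(categorical_vector, labels.ravel()) should return a 3D array whose first axis is the resampled sample count and whose remaining axes match the input.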
Source: https://stackoverflow.com/questions/56125380/resampling-data-using-smote-from-imblearn-with-3d-numpy-arrays