resampling data - using SMOTE from imblearn with 3D numpy arrays

蹲街弑〆低调 submitted on 2019-12-11 06:27:36

Question


I want to resample my dataset. It consists of categorically transformed data with labels for 3 classes. The number of samples per class is:

  • counts of class A: 6945
  • counts of class B: 650
  • counts of class C: 9066
  • total samples: 16661

The data shape without labels is (16661, 1000, 256), that is, 16661 samples of shape (1000, 256) each. I would like to up-sample the data until every class has as many samples as the majority class, that is, class C (9066).
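For a rough idea of what that target means, the arithmetic can be sketched as below. It uses only the class counts and the per-sample shape listed above; the assumption that the resampled array is stored as float64 (8 bytes per value) is mine, as are the variable names.

```python
from collections import Counter

# Class counts as listed in the question.
counts = Counter({"A": 6945, "B": 650, "C": 9066})

# The majority class sets the target size for every other class.
majority_label, majority_count = counts.most_common(1)[0]

# Number of synthetic samples each class would need to reach the target.
to_generate = {label: majority_count - n for label, n in counts.items()}
total_after = majority_count * len(counts)

# Each sample has 1000 * 256 values once flattened.
features_per_sample = 1000 * 256

# Assumption: the resampled array is float64, i.e. 8 bytes per value.
bytes_float64 = total_after * features_per_sample * 8

print(majority_label, majority_count)
print(to_generate)
print(total_after, "samples,", round(bytes_float64 / 1e9, 1), "GB")
```

Under these assumptions a flattened, fully up-sampled array would be large (tens of gigabytes), so memory may need to be considered before resampling.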

However, when calling:

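from imblearn.over_sampling import SMOTE

print(categorical_vector.shape)  # (16661, 1000, 256)
sm = SMOTE(random_state=2)
# fit_resample is the current method name; older imblearn releases called it fit_sample
X_train_res, y_labels_res = sm.fit_resample(categorical_vector, labels.ravel())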

It keeps saying ValueError: Found array with dim 3. Estimator expected <= 2.

How can I flatten the data so that the estimator can fit it, in a way that still makes sense? And how can I restore the 3D shape after getting X_train_res?


Answer 1:


Consider a dummy 3D array, which I reshape to 2D and then back to its original shape:

import numpy as np

arr = np.random.rand(160, 10, 25)
orig_shape = arr.shape  # remember the 3D shape so it can be restored later
print(orig_shape)

Output: (160, 10, 25)

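# Flatten each sample into one row, so the rows still line up with the labels:
# (160, 10, 25) -> (160, 10 * 25)
arr = np.reshape(arr, (arr.shape[0], -1))
print(arr.shape)

Output: (160, 250)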

arr = np.reshape(arr, orig_shape)
print(arr.shape)

Output: (160, 10, 25)



Source: https://stackoverflow.com/questions/56125380/resampling-data-using-smote-from-imblearn-with-3d-numpy-arrays
