imblearn

SMOTE is giving array size / ValueError for all-categorical dataset

自闭症网瘾萝莉.ら submitted on 2021-02-08 07:39:56
Question: I am using SMOTE-NC to oversample my categorical data. I have only 1 feature and 10500 samples. When I run the code below, I get this error:

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-151-a261c423a6d8> in <module>()
         16 print(X_new.shape) # (10500, 1)
         17 print(X_new)
    ---> 18 sm.fit_sample(X_new, Y_new)

    ~\AppData\Local\Continuum\Miniconda3\envs\data-science\lib\site-packages\imblearn\base.py
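The imbalanced-learn documentation notes that SMOTE-NC is meant for data mixing continuous and categorical features and is not designed for purely categorical input. One fallback, offered here only as a sketch, is plain random oversampling, which duplicates existing rows instead of interpolating. The function `random_oversample` and the toy data below are hypothetical and use only the standard library; in imbalanced-learn the comparable tool is `RandomOverSampler`.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        # Rows of this class in the original data; duplicates are drawn from these only.
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out

# Hypothetical toy data: a single categorical feature, 8 samples of class 0 and 2 of class 1.
X_toy = [["red"], ["blue"], ["red"], ["green"], ["blue"],
         ["red"], ["green"], ["blue"], ["red"], ["green"]]
y_toy = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

X_res, y_res = random_oversample(X_toy, y_toy)
balanced_counts = Counter(y_res)
print(balanced_counts)
```

Because rows are copied rather than synthesised, no invalid category values can appear, at the cost of exact duplicates in the training data.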

Random forest: balancing test set?

戏子无情 submitted on 2020-12-16 03:53:27
Question: I am trying to run a Random Forest classifier on an imbalanced dataset (~1:4). I am using the method from imblearn as follows:

    from imblearn.ensemble import BalancedRandomForestClassifier
    rf = BalancedRandomForestClassifier(n_estimators=1000, random_state=42, class_weight='balanced', sampling_strategy='not minority')
    rf.fit(train_features, train_labels)
    predictions = rf.predict(test_features)

The split into training and test sets is performed within a cross-validation approach using
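The excerpt is cut off before the actual question, so the following only illustrates one consideration the title raises: scoring the same classifier on a rebalanced test set changes class-dependent metrics such as precision. The true-positive and false-positive rates below are hypothetical numbers chosen for the arithmetic, not results from the question.

```python
def precision(tpr, fpr, n_pos, n_neg):
    """Precision implied by a true-positive rate, a false-positive rate, and class counts."""
    true_pos = tpr * n_pos
    false_pos = fpr * n_neg
    return true_pos / (true_pos + false_pos)

# Hypothetical classifier quality, chosen only for illustration.
TPR, FPR = 0.8, 0.1

# Test set with the ~1:4 imbalance from the question (100 minority, 400 majority).
natural = precision(TPR, FPR, n_pos=100, n_neg=400)

# The same classifier scored on a test set rebalanced to 1:1.
rebalanced = precision(TPR, FPR, n_pos=100, n_neg=100)

print(f"precision on natural test set:    {natural:.3f}")
print(f"precision on rebalanced test set: {rebalanced:.3f}")
```

Under these assumed rates the rebalanced test set reports a higher precision than the classifier would achieve on data with the original class ratio.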

Jupyter: No module named 'imblearn' after installation

帅比萌擦擦* submitted on 2020-01-13 08:59:07
Question: I installed "imbalanced-learn" (version 0.3.1) through Anaconda Navigator. When I ran an example from the imbalanced-learn website in Jupyter (Python 3), I got a "ModuleNotFoundError: No module named 'imblearn'" message for these imports:

    from imblearn.datasets import make_imbalance
    from imblearn.under_sampling import NearMiss
    from imblearn.pipeline import make_pipeline
    from imblearn.metrics import classification_report_imbalanced

How can I resolve this?

Answer 1: Problems importing imblearn python
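A frequent cause of this symptom, though not confirmed by the excerpt, is that the package was installed into a different environment from the one the Jupyter kernel runs in. The sketch below, run in a notebook cell, reports which interpreter the kernel uses and whether it can see `imblearn`. The helper `module_available` is a hypothetical name; everything else is standard library.

```python
import importlib.util
import sys

def module_available(name):
    """Return True if the top-level module `name` can be found by the running interpreter."""
    return importlib.util.find_spec(name) is not None

# The interpreter behind the current kernel; packages must be installed for this one.
kernel_python = sys.executable or "python"
print("kernel interpreter:", kernel_python)
print("imblearn importable:", module_available("imblearn"))

# Command that would install into this exact interpreter (built here, not executed).
install_cmd = [kernel_python, "-m", "pip", "install", "imbalanced-learn"]
print("suggested install command:", " ".join(install_cmd))
```

If the printed interpreter path is not inside the environment where imbalanced-learn was installed, either install the package for that interpreter or switch the notebook to a kernel from the right environment.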

SMOTE initialisation expects n_neighbors <= n_samples, but n_samples < n_neighbors

大兔子大兔子 submitted on 2019-12-30 11:28:05
Question: I have already pre-cleaned the data. The first 5 rows look like this:

    [IN] df.head()
    [OUT]
       Year                                            cleaned
    0  1909  acquaint hous receiv follow letter clerk crown...
    1  1909  ask secretari state war whether issu statement...
    2  1909  i beg present petit sign upward motor car driv...
    3  1909  i desir ask secretari state war second lieuten...
    4  1909  ask secretari state war whether would introduc...

I have called train_test_split() as follows:

    [IN] X_train, X_test, y_train, y_test = train_test
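The error in the title typically appears when some class has fewer samples than SMOTE needs for its neighbour search: SMOTE looks up `k_neighbors` neighbours (default 5) for each minority sample, so each class to be oversampled needs at least `k_neighbors + 1` samples. As a sketch under that assumption, the hypothetical helper below picks the largest usable value from the class counts; the labels are made-up toy data, since the excerpt does not show the real ones.

```python
from collections import Counter

def safe_k_neighbors(labels, default_k=5):
    """Largest k <= default_k such that every class has at least k + 1 samples."""
    smallest = min(Counter(labels).values())
    if smallest < 2:
        raise ValueError("a class with a single sample has no neighbours to interpolate with")
    return min(default_k, smallest - 1)

# Hypothetical labels: years as classes, one of them very rare.
y_train_toy = [1909] * 40 + [1910] * 25 + [1911] * 3

k = safe_k_neighbors(y_train_toy)
print("usable k_neighbors:", k)
```

The result could then be passed as `SMOTE(k_neighbors=k)`. With very small classes, collecting more data or merging or dropping the rare classes may be more sound than interpolating between two or three points.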

resampling data - using SMOTE from imblearn with 3D numpy arrays

蹲街弑〆低调 submitted on 2019-12-11 06:27:36
Question: I want to resample my dataset. It consists of transformed categorical data with labels for 3 classes. The number of samples per class is:

    class A: 6945
    class B: 650
    class C: 9066
    Total samples: 16661

The data shape without labels is (16661, 1000, 256), that is, 16661 samples of shape (1000, 256). I would like to up-sample the data to the number of samples in the majority class, that is, class A -> (6945). However, when calling:

    from imblearn.over
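imbalanced-learn samplers expect a 2D array of shape (n_samples, n_features), so a 3D array is usually flattened per sample before resampling and reshaped afterwards. The sketch below shows only that reshape round trip on a small stand-in array; the resampling step itself is left as a placeholder, and all variable names are illustrative.

```python
import numpy as np

# Small stand-in for the (16661, 1000, 256) array in the question.
n_samples, n_steps, n_features = 6, 4, 3
X_3d = np.arange(n_samples * n_steps * n_features, dtype=float).reshape(
    n_samples, n_steps, n_features
)

# Flatten each sample into one row so a 2D-only resampler can accept it.
X_2d = X_3d.reshape(n_samples, -1)

# Placeholder: a resampler would be applied to X_2d here and may add rows.
X_resampled_2d = X_2d

# Restore the per-sample shape; -1 lets NumPy infer the new number of samples.
X_restored = X_resampled_2d.reshape(-1, n_steps, n_features)

print(X_2d.shape, X_restored.shape)
```

At the real size each flattened row has 1000 * 256 = 256000 values, so memory use and the meaningfulness of interpolating between flattened samples are worth checking before applying SMOTE this way.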