Using SMOTE with GridSearchCV in scikit-learn

醉酒成梦 2020-12-05 15:26

I'm dealing with an imbalanced dataset and want to do a grid search to tune my model's parameters using scikit-learn's GridSearchCV. To oversample the data, I want to use SMOTE.

1 Answer
  • 2020-12-05 15:53

    Yes, it can be done, but you need to use imblearn's Pipeline rather than scikit-learn's.

    You see, imblearn has its own Pipeline to handle the samplers correctly. I described this in a similar question here.

    When predict() is called on an imblearn.pipeline.Pipeline object, it skips the sampling step and passes the data unchanged to the next transformer. You can confirm that by looking at the source code here:

            if hasattr(transform, "fit_sample"):
                pass
            else:
                Xt = transform.transform(Xt)
    

    So for this to work correctly, you need the following:

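    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV

    # SMOTE is applied only while fitting; it is skipped at predict time
    model = Pipeline([
            ('sampling', SMOTE()),
            ('classification', LogisticRegression())
        ])
    
    # Keys in params are prefixed with the step name, e.g. 'classification__C'
    grid = GridSearchCV(model, params, ...)
    grid.fit(X, y)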
    

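    Fill in the details as necessary, and the pipeline will take care of the rest: within each cross-validation split, SMOTE resamples only the training folds, and the validation fold is scored as-is.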
