How can I use a custom feature selection function in scikit-learn's `pipeline`

闹比i 2021-01-30 18:47

Let's say that I want to compare different dimensionality reduction approaches for a particular (supervised) dataset that consists of n>2 features via cross-validation and by u

5 answers
  •  时光说笑
    2021-01-30 19:42

    If you want to use the Pipeline object, then yes, the clean way is to write a transformer object. The dirty way is to attach the methods Pipeline expects to your existing callable:

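    # Quick hack: expose the callable itself as the transform step.
    select_3_and_4.transform = select_3_and_4.__call__
    # Pipeline calls fit(X, y), so accept both and return the object itself.
    select_3_and_4.fit = lambda X, y=None, **fit_params: select_3_and_4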
    

    and use select_3_and_4 as you had it in your pipeline. You can evidently also write a class.
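
For the class-based route, a minimal sketch might look like the following. The name ColumnSelector, its cols parameter, the Iris dataset, and the KNeighborsClassifier step are all assumptions made for illustration; none of them come from the question. The idea is only that fit learns nothing and transform slices out the chosen columns.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline


class ColumnSelector(BaseEstimator, TransformerMixin):
    """Hypothetical transformer that keeps only the given column indices."""

    def __init__(self, cols=(2, 3)):
        # Store the parameter unchanged so get_params/clone keep working.
        self.cols = cols

    def fit(self, X, y=None):
        # Nothing to learn; return self so Pipeline can chain the call.
        return self

    def transform(self, X):
        # Slice out the selected columns (0-based indices).
        return np.asarray(X)[:, list(self.cols)]


# Stand-in data: Iris is assumed here, not taken from the question.
X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ("select", ColumnSelector(cols=(2, 3))),  # 3rd and 4th features
    ("clf", KNeighborsClassifier(n_neighbors=3)),
])
pipe.fit(X, y)

# The selector on its own returns just the two chosen columns.
X_sel = pipe.named_steps["select"].transform(X)
```

Inheriting from BaseEstimator and TransformerMixin provides get_params, set_params, and fit_transform, which is what lets the step be cloned during cross-validation or tuned in a grid search.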

    Otherwise, you could also just give X_train[:, 2:4] to your pipeline if you know that the other features are irrelevant.

    Data-driven feature selection tools may be off-topic here, but they are always useful: see, e.g., sklearn.feature_selection.SelectKBest with sklearn.feature_selection.f_classif or sklearn.feature_selection.f_regression as the score function, with e.g. k=2 in your case.
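
As a rough sketch of the data-driven alternative, SelectKBest can be dropped into a pipeline like any other transformer. The Iris dataset, the KNeighborsClassifier step, and cv=5 are assumptions chosen only to make the example runnable; only SelectKBest, f_classif, and k=2 come from the answer.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

# Stand-in data: Iris is assumed here, not taken from the question.
X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    # Keep the 2 features with the highest ANOVA F-score.
    ("select", SelectKBest(score_func=f_classif, k=2)),
    ("clf", KNeighborsClassifier(n_neighbors=3)),
])

# Selection is re-fitted inside each fold, so it never sees the held-out data.
scores = cross_val_score(pipe, X, y, cv=5)

# Fit once on all data to inspect which column indices were kept.
pipe.fit(X, y)
kept = pipe.named_steps["select"].get_support(indices=True)
```

Because the selector sits inside the pipeline, it is fitted on the training part of each fold only, which avoids leaking information from the validation split into the feature choice.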
