How can I use a custom feature selection function in scikit-learn's `pipeline`?


闹比i 2021-01-30 18:47

Let's say that I want to compare different dimensionality reduction approaches for a particular (supervised) dataset that consists of n>2 features via cross-validation and by u

5 answers

时光说笑 (original poster)

2021-01-30 19:42
If you want to use the `Pipeline` object, then yes, the clean way is to write a transformer object. The dirty way is to attach the required methods to your function directly:
```
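# A plain function can carry attributes, so bolt on the two methods Pipeline needs.
select_3_and_4.transform = select_3_and_4.__call__
# Pipeline calls fit(X, y), so the lambda must accept y as well as X.
select_3_and_4.fit = lambda X, y=None: select_3_and_4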
```
Then use `select_3_and_4` in your pipeline as you had it. You can, of course, also write a class instead.
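
For the class route, here is a minimal sketch. The name `ColumnSelector`, its `cols` parameter, and the synthetic data are made up for illustration and are not part of scikit-learn. Inheriting from `BaseEstimator` also provides `get_params`, so the step can be cloned by cross-validation utilities, which the function hack above does not support.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class ColumnSelector(BaseEstimator, TransformerMixin):
    """Keep only the given column indices (hypothetical helper)."""

    def __init__(self, cols=(2, 3)):
        self.cols = cols

    def fit(self, X, y=None):
        # Nothing is learned: the columns are fixed up front.
        return self

    def transform(self, X):
        return np.asarray(X)[:, list(self.cols)]


# Synthetic data (an assumption): only columns 2 and 3 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = (X[:, 2] + X[:, 3] > 0).astype(int)

pipe = Pipeline([
    ("select", ColumnSelector(cols=(2, 3))),
    ("clf", LogisticRegression()),
])
pipe.fit(X, y)

# What the classifier actually sees after the selection step.
selected = pipe.named_steps["select"].transform(X)
```

The classifier is only a placeholder; any estimator can follow the selection step.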

Otherwise, you could also just give X_train[:, 2:4] to your pipeline if you know that the other features are irrelevant.

Data-driven feature selection tools may be off-topic here, but they are always useful: see, for example, `sklearn.feature_selection.SelectKBest` with `sklearn.feature_selection.f_classif` or `sklearn.feature_selection.f_regression` as the score function and, in your case, `k=2`.
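
A rough sketch of that data-driven variant, assuming a classification target. The synthetic data and the `LogisticRegression` step are placeholders, not something from the question.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic data (an assumption): the label depends on columns 2 and 3 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))
y = (X[:, 2] + X[:, 3] > 0).astype(int)

pipe = Pipeline([
    # Keep the k=2 features with the highest ANOVA F-score.
    ("select", SelectKBest(score_func=f_classif, k=2)),
    ("clf", LogisticRegression()),
])

# The selector is refit inside each fold, so the held-out data
# never influences which features are kept.
scores = cross_val_score(pipe, X, y, cv=5)

pipe.fit(X, y)
kept = pipe.named_steps["select"].get_support(indices=True)
```

For a regression target, `f_regression` would replace `f_classif` as the score function.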