Let's say that I want to compare different dimensionality reduction approaches for a particular (supervised) dataset that consists of n>2 features via cross-validation and by u
If you want to use the Pipeline object, then yes, the clean way is to write a transformer object. The dirty way to do this is:
select_3_and_4.transform = select_3_and_4.__call__
select_3_and_4.fit = lambda X, y=None: select_3_and_4
and then use select_3_and_4 in your pipeline as you had it. You can of course also write a class.
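For the class route, here is a minimal sketch. The name ColumnSelector and its cols parameter are made up for illustration, and the random data and the PCA step only stand in for whatever your pipeline actually contains:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline


class ColumnSelector(BaseEstimator, TransformerMixin):
    """Hypothetical transformer that keeps only the given column indices."""

    def __init__(self, cols=(2, 3)):
        self.cols = cols

    def fit(self, X, y=None):
        # Nothing to learn; return self so the Pipeline can chain calls.
        return self

    def transform(self, X):
        # Select the requested columns (0-based, so 2 and 3 are the 3rd and 4th).
        return np.asarray(X)[:, list(self.cols)]


# Placeholder data: 20 samples, 6 features.
rng = np.random.RandomState(0)
X = rng.rand(20, 6)

pipe = Pipeline([
    ("select", ColumnSelector(cols=(2, 3))),
    ("pca", PCA(n_components=1)),
])
X_reduced = pipe.fit_transform(X)
```

Inheriting from BaseEstimator gives you get_params/set_params, so the object can be cloned during cross-validation or grid search, and TransformerMixin supplies fit_transform.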
Otherwise, you could also just pass X_train[:, 2:4] to your pipeline if you know that the other features are irrelevant.
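A rough sketch of the slicing approach, with placeholder random data and an arbitrary scaler + PCA pipeline (both are assumptions, not your actual setup):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data: 30 training and 10 test samples with 6 features each.
rng = np.random.RandomState(0)
X_train = rng.rand(30, 6)
X_test = rng.rand(10, 6)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=1)),
])

# Slice outside the pipeline: only columns 2 and 3 are ever seen by it.
X_train_reduced = pipe.fit_transform(X_train[:, 2:4])

# The same slice has to be applied by hand to any data passed in later.
X_test_reduced = pipe.transform(X_test[:, 2:4])
```

The drawback is that the column choice now lives outside the pipeline, so you have to remember to repeat it everywhere.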
Data-driven feature selection tools may be off-topic here, but they are always useful: see, e.g., sklearn.feature_selection.SelectKBest using sklearn.feature_selection.f_classif or sklearn.feature_selection.f_regression, with e.g. k=2 in your case.
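As an illustration of the SelectKBest suggestion, here is a small sketch on the iris dataset. The dataset, the LogisticRegression classifier and cv=5 are arbitrary choices for the example:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)

# Keep the 2 features with the highest ANOVA F-score, then classify.
pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif, k=2)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# The selection is re-fitted inside every fold, so the held-out fold
# never influences which features are kept.
scores = cross_val_score(pipe, X, y, cv=5)

# To inspect which columns are chosen on the full data:
selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
kept = selector.get_support(indices=True)
```

For a regression target you would swap f_classif for f_regression.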