The main goals are as follows:
1) Apply StandardScaler to continuous variables
2) Apply LabelEncoder and OneHotEncoder to categorical variables
Check out the sklearn_pandas.DataFrameMapper meta-transformer. Use it as the first step in your pipeline to perform column-wise data engineering operations:
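from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelBinarizer, StandardScaler
from sklearn_pandas import DataFrameMapper

# continuous_cols, categorical_cols, estimator and df are defined elsewhere
mapper = DataFrameMapper(
    # A list selector passes a 2-D array, which StandardScaler expects
    [([continuous_col], StandardScaler()) for continuous_col in continuous_cols] +
    # A string selector passes a 1-D array, which LabelBinarizer expects
    [(categorical_col, LabelBinarizer()) for categorical_col in categorical_cols]
)
pipeline = Pipeline(
    [("mapper", mapper),
     ("estimator", estimator)]
)
# Use fit() here; fit_transform() only works if the final step is a transformer
pipeline.fit(df, df["y"])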
Also, you should be using sklearn.preprocessing.LabelBinarizer instead of a list of [LabelEncoder(), OneHotEncoder()].
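To illustrate why the single transformer can stand in for the two-step list, here is a minimal sketch that encodes the same column both ways and compares the results. The `colors` array is made-up data for illustration only, and the comparison assumes a column with three or more distinct values.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer, LabelEncoder, OneHotEncoder

# Hypothetical categorical column, used only for this illustration
colors = np.array(["red", "green", "blue", "green"])

# One step: strings go straight to indicator columns
one_step = LabelBinarizer().fit_transform(colors)

# Two steps: strings -> integer codes -> indicator columns
codes = LabelEncoder().fit_transform(colors)
two_step = OneHotEncoder().fit_transform(codes.reshape(-1, 1)).toarray()

# Both routes order the columns by sorted class label (blue, green, red)
same = np.array_equal(one_step, two_step)
print(one_step)
print("identical encodings:", same)
```

One caveat: when a column has exactly two distinct values, LabelBinarizer returns a single 0/1 column rather than two, so the outputs of the two approaches differ in shape in that case.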