Feature preprocessing of both continuous and categorical variables (of integer type) with scikit-learn

盖世英雄少女心 2021-02-01 09:20

The main goals are as follows:

1) Apply StandardScaler to continuous variables

2) Apply LabelEncoder and OnehotEncoder to

1 answer

迷失自我 (original poster)

2021-02-01 10:01
Check out the sklearn_pandas.DataFrameMapper meta-transformer. Use it as the first step of your pipeline to apply a different preprocessing transformer to each column:
```
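from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelBinarizer, StandardScaler
from sklearn_pandas import DataFrameMapper

# continuous_cols, categorical_cols, estimator and df are defined by the caller.
mapper = DataFrameMapper(
  # [col] selects a 2-D column, which StandardScaler requires
  [([continuous_col], StandardScaler()) for continuous_col in continuous_cols] +
  # a bare column name selects a 1-D column, which LabelBinarizer expects
  [(categorical_col, LabelBinarizer()) for categorical_col in categorical_cols]
)
pipeline = Pipeline(
  [("mapper", mapper),
  ("estimator", estimator)]
)
# fit_transform() only works if the final step is itself a transformer
pipeline.fit(df, df["y"])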
```
Also, you should be using sklearn.preprocessing.LabelBinarizer instead of a list of [LabelEncoder(), OneHotEncoder()].
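If you would rather not depend on sklearn_pandas, scikit-learn's own ColumnTransformer can do the same column-wise routing. This is a different technique from the DataFrameMapper approach above, not a drop-in equivalent. The sketch below is only an illustration: the toy DataFrame, the column names (age, income, city_id) and the choice of LogisticRegression are all made up, and OneHotEncoder is used on the integer-coded categorical column directly.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: names and values are invented for illustration.
df = pd.DataFrame({
    "age": [22.0, 35.0, 58.0, 41.0, 29.0, 63.0],
    "income": [30.0, 52.0, 75.0, 61.0, 38.0, 80.0],
    "city_id": [0, 1, 2, 1, 0, 2],  # integer-coded categorical variable
    "y": [0, 0, 1, 1, 0, 1],
})
continuous_cols = ["age", "income"]
categorical_cols = ["city_id"]

# Route each group of columns to its own transformer.
preprocessor = ColumnTransformer(
    [
        ("scale", StandardScaler(), continuous_cols),
        ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0,  # always return a dense array
)

pipeline = Pipeline([
    ("preprocessor", preprocessor),
    ("estimator", LogisticRegression()),  # assumed estimator, swap in your own
])

X = df[continuous_cols + categorical_cols]
pipeline.fit(X, df["y"])

# Inspect the preprocessed features and the fitted model's predictions.
features = pipeline.named_steps["preprocessor"].transform(X)
predictions = pipeline.predict(X)
```

Here features has one scaled column per continuous variable followed by one indicator column per distinct city_id value. Passing only the feature columns as X keeps the target y out of the preprocessing step.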