Feature preprocessing of both continuous and categorical variables (of integer type) with scikit-learn

盖世英雄少女心 2021-02-01 09:20

The main goals are as follows:

1) Apply StandardScaler to continuous variables

2) Apply LabelEncoder and OneHotEncoder to

1 answer
  • 2021-02-01 10:01

    Check out the sklearn_pandas.DataFrameMapper meta-transformer. Use it as the first step in your pipeline to perform column-wise data engineering operations:

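    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import LabelBinarizer, StandardScaler
    from sklearn_pandas import DataFrameMapper

    # Scale each continuous column; binarize each categorical column.
    # StandardScaler expects 2-D input, so those columns are selected as [col].
    mapper = DataFrameMapper(
      [([continuous_col], StandardScaler()) for continuous_col in continuous_cols] +
      [(categorical_col, LabelBinarizer()) for categorical_col in categorical_cols]
    )
    pipeline = Pipeline(
      [("mapper", mapper),
      ("estimator", estimator)]
    )
    # fit_transform only works if the final step is a transformer, so call fit.
    pipeline.fit(df, df["y"])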
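
For readers who would rather not depend on sklearn_pandas, roughly the same column-wise setup can be sketched with scikit-learn's own ColumnTransformer. This is a swapped-in alternative, not the method from the answer. The toy DataFrame, its column names, and the choice of LogisticRegression as the estimator are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: two continuous columns and one integer-coded category.
df = pd.DataFrame({
    "height": [1.6, 1.7, 1.8, 1.9, 1.5, 1.75],
    "weight": [55.0, 68.0, 80.0, 90.0, 50.0, 72.0],
    "city": [1, 2, 3, 1, 2, 3],
    "y": [0, 0, 1, 1, 0, 1],
})
continuous_cols = ["height", "weight"]
categorical_cols = ["city"]

# Route each group of columns to its own transformer.
preprocess = ColumnTransformer([
    ("scale", StandardScaler(), continuous_cols),
    ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("estimator", LogisticRegression()),  # assumed estimator, any classifier works
])

X = df.drop(columns="y")
pipeline.fit(X, df["y"])

# Inspect the preprocessed feature matrix: 2 scaled + 3 one-hot columns.
features = pipeline.named_steps["preprocess"].transform(X)
print(features.shape)
```

OneHotEncoder accepts integer-coded categories directly, so no separate LabelEncoder step is needed in this variant.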
    

    Also, use sklearn.preprocessing.LabelBinarizer rather than chaining [LabelEncoder(), OneHotEncoder()]; it produces the 0/1 indicator columns in a single step.
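
A minimal sketch of why the single transformer is enough: on an integer-coded column with three categories, LabelBinarizer gives the same indicator matrix as LabelEncoder followed by OneHotEncoder. The sample array `codes` is made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer, LabelEncoder, OneHotEncoder

# Hypothetical integer-coded categorical column with three distinct values.
codes = np.array([3, 1, 2, 1, 3])

# One step: each category becomes its own 0/1 column.
one_step = LabelBinarizer().fit_transform(codes)

# Two steps: map categories to 0..n-1 labels, then one-hot encode the labels.
labels = LabelEncoder().fit_transform(codes)
two_step = OneHotEncoder().fit_transform(labels.reshape(-1, 1)).toarray()

print(one_step)
print(np.array_equal(one_step, two_step))
```

One caveat: when a column has exactly two categories, LabelBinarizer returns a single 0/1 column rather than two, so the equivalence shown here holds for three or more categories.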
