Sklearn Pipeline: Get feature names after OneHotEncode In ColumnTransformer

梦毁少年i 2021-01-30 07:26

I want to get feature names after I fit the pipeline.

categorical_features = ['brand', 'category_name', 'sub_category']
categorical_transformer = Pipeline(s

2 Answers
  •  说谎 (OP)
    2021-01-30 07:54

    You can access the feature names with the following snippet:

    clf.named_steps['preprocessor'].transformers_[1][1]\
       .named_steps['onehot'].get_feature_names(categorical_features)
    

    With sklearn >= 0.21, where a Pipeline can be indexed by step name, this becomes simpler:

    clf['preprocessor'].transformers_[1][1]['onehot']\
                       .get_feature_names(categorical_features)
    

    Reproducible example:

    import numpy as np
    import pandas as pd
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import OneHotEncoder, StandardScaler
    from sklearn.pipeline import Pipeline
    from sklearn.compose import ColumnTransformer
    from sklearn.linear_model import LinearRegression
    
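    # 'NaN' below is the literal string 'NaN', not a missing value, so it is
    # kept as its own category ('brand_NaN') instead of being imputed.
    df = pd.DataFrame({'brand': ['aaaa', 'asdfasdf', 'sadfds', 'NaN'],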
                       'category': ['asdf', 'asfa', 'asdfas', 'as'],
                       'num1': [1, 1, 0, 0],
                       'target': [0.2, 0.11, 1.34, 1.123]})
    
    numeric_features = ['num1']
    numeric_transformer = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='median')),
        ('scaler', StandardScaler())])
    
    categorical_features = ['brand', 'category']
    categorical_transformer = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
        ('onehot', OneHotEncoder(handle_unknown='ignore'))])
    
    preprocessor = ColumnTransformer(
        transformers=[
            ('num', numeric_transformer, numeric_features),
            ('cat', categorical_transformer, categorical_features)])
    
    clf = Pipeline(steps=[('preprocessor', preprocessor),
                          ('regressor',  LinearRegression())])
    # Passing axis positionally to drop() no longer works in pandas 2.x.
    clf.fit(df.drop(columns='target'), df['target'])
    
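    # On scikit-learn >= 1.0 use get_feature_names_out(categorical_features);
    # get_feature_names was removed in 1.2.
    clf.named_steps['preprocessor'].transformers_[1][1]\
       .named_steps['onehot'].get_feature_names(categorical_features)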
    
    # ['brand_NaN' 'brand_aaaa' 'brand_asdfasdf' 'brand_sadfds' 'category_as'
    #  'category_asdf' 'category_asdfas' 'category_asfa']
    
