Python feature selection in a pipeline: how to determine feature names?

眼角桃花 2021-02-03 12:11

I used a pipeline and grid_search to select the best parameters, and then used these parameters to fit the best pipeline ('best_pipe'). However since the feature_selection (Selec

3 answers
  • 2021-02-03 12:54

    You can access the feature selector by name in best_pipe:

    features = best_pipe.named_steps['feat']
    

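    Then you can call transform() on an index array to get the names of the selected columns. Current scikit-learn versions require a 2D input, so pass the column positions as a single row and take the first row of the result:

    X.columns[features.transform(np.arange(len(X.columns)).reshape(1, -1))[0]]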
    

    The output here will be the eighty column names selected in the pipeline.
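
For context, here is a minimal end-to-end sketch of this approach. The synthetic data, the column names (`col_0` ... `col_9`), the `clf` step, and the small `k` grid are assumptions made for illustration; only the step name `feat` and the `best_pipe` / `features` names come from the thread. Because the selector is fitted on a DataFrame and then given a plain array, recent scikit-learn versions may emit a feature-name warning here, which does not affect the result.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the asker's data (hypothetical column names).
X_arr, y = make_classification(
    n_samples=200, n_features=10, n_informative=4, random_state=0
)
X = pd.DataFrame(X_arr, columns=[f"col_{i}" for i in range(X_arr.shape[1])])

# The selector step is named "feat", as in the answer above.
pipe = Pipeline([
    ("feat", SelectKBest(f_classif)),
    ("clf", LogisticRegression(max_iter=1000)),
])
gs = GridSearchCV(pipe, {"feat__k": [3, 5]}, cv=3)
gs.fit(X, y)
best_pipe = gs.best_estimator_

features = best_pipe.named_steps["feat"]

# transform() expects 2D input: one row holding the column positions.
positions = np.arange(len(X.columns)).reshape(1, -1)
selected_positions = features.transform(positions)[0]

# Map the surviving positions back to column labels.
selected_names = X.columns[selected_positions]
print(list(selected_names))
```

The trick works because the selector keeps or drops whole columns, so pushing the column positions through `transform()` returns exactly the positions that survive.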

  • 2021-02-03 13:05

    This could be an instructive alternative. I had a need similar to the OP's. To get the indices of the k best features directly from GridSearchCV:

    finalFeatureIndices = gs.best_estimator_.named_steps["feat"].get_support(indices=True)
    

    Then index into your initial feature list to get finalFeatureList:

    finalFeatureList = [initialFeatureList[i] for i in finalFeatureIndices]
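
A short sketch of this variant for the case where X is a plain NumPy array and the names live in a separate list. The iris dataset, the decision-tree `clf` step, and the `k` grid are assumptions chosen only to make the snippet runnable; `gs`, `initialFeatureList`, `finalFeatureIndices`, and `finalFeatureList` follow the answer's naming.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target  # plain NumPy arrays, no column labels
initialFeatureList = list(iris.feature_names)

pipe = Pipeline([
    ("feat", SelectKBest(f_classif)),
    ("clf", DecisionTreeClassifier(random_state=0)),
])
gs = GridSearchCV(pipe, {"feat__k": [1, 2, 3]}, cv=3)
gs.fit(X, y)

# Integer positions of the columns kept by the refitted best pipeline.
finalFeatureIndices = gs.best_estimator_.named_steps["feat"].get_support(indices=True)

# Look the names up in the list that matches the training column order.
finalFeatureList = [initialFeatureList[i] for i in finalFeatureIndices]
print(finalFeatureList)
```

This only gives correct names if `initialFeatureList` is in the same order as the columns of the array passed to `fit`.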
    
  • 2021-02-03 13:14

    Jake's answer totally works. But depending on which feature selector you're using, there's another option that I think is cleaner. This one worked for me:

    X.columns[features.get_support()]
    

    It gave me the same result as Jake's answer. You can read more about it in the docs, but get_support() returns a boolean array with one entry per input column, True where the column was kept. Also, note that X must have the same columns, in the same order, as the training data used to fit the feature selector; otherwise the mask will not line up with X.columns.
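
As a rough illustration of the mask and of the column-alignment caveat, the sketch below fits a selector directly rather than inside a pipeline. The iris data and `k=2` are assumptions for the example; the explicit length check is an added safeguard, not something the thread prescribes.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = iris.target

# Stand-in for the fitted "feat" step taken from the pipeline.
features = SelectKBest(f_classif, k=2).fit(X, y)

# Boolean array with one entry per training column (True = kept).
mask = features.get_support()

# The mask is only meaningful against the columns the selector was fitted on.
if mask.shape[0] != X.shape[1]:
    raise ValueError("X does not have the same number of columns as the training data")

selected = X.columns[mask]
print(list(selected))
```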
