Python feature selection in pipeline: how to determine feature names?

眼角桃花 2021-02-03 12:11

I used a pipeline and grid search to select the best parameters, and then used these parameters to fit the best pipeline ('best_pipe'). However, since the feature selection step (SelectKBest) is part of the pipeline, how can I determine the names of the features it selected?

3 Answers
  • 2021-02-03 12:54

    You can access the feature selector by name in best_pipe:

    features = best_pipe.named_steps['feat']
    

    Then you can call transform() on an array of column indices; the result gives the positions of the selected columns, which you can use to index X.columns and get their names:

    X.columns[features.transform(np.arange(len(X.columns)))]
    

    The output here will be the eighty column names selected in the pipeline.

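    A minimal, runnable sketch of this approach (the DataFrame, column names, and SelectKBest step are assumptions, since the original pipeline isn't shown; recent scikit-learn versions expect a 2D input to transform(), hence the reshape):

    import numpy as np
    import pandas as pd
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline

    # Assumed stand-in for the OP's data and fitted pipeline
    X_arr, y = make_classification(n_samples=200, n_features=10, random_state=0)
    X = pd.DataFrame(X_arr, columns=[f"col_{i}" for i in range(10)])
    best_pipe = Pipeline([
        ("feat", SelectKBest(f_classif, k=5)),
        ("clf", LogisticRegression()),
    ]).fit(X, y)

    features = best_pipe.named_steps["feat"]

    # transform() drops the unselected positions from the index array,
    # leaving the positions of the columns that were kept
    selected_idx = features.transform(np.arange(len(X.columns)).reshape(1, -1))[0]
    print(X.columns[selected_idx].tolist())
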
  • 2021-02-03 13:05

    An instructive alternative: I ran into a similar need to what the OP asked. If you want the indices of the k best features directly from the GridSearchCV object:

    finalFeatureIndices = gs.best_estimator_.named_steps["feat"].get_support(indices=True)
    

    Then, with a bit of index manipulation, you can get your finalFeatureList:

    finalFeatureList = [initialFeatureList[i] for i in finalFeatureIndices]
    
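    A short sketch of this variant, reusing the assumed X, y, and column names from the sketch above and wrapping the same kind of pipeline in GridSearchCV (the step name "feat" and the k grid are assumptions):

    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline

    pipe = Pipeline([("feat", SelectKBest(f_classif)), ("clf", LogisticRegression())])
    gs = GridSearchCV(pipe, param_grid={"feat__k": [3, 5, 7]}, cv=3)
    gs.fit(X, y)

    initialFeatureList = list(X.columns)
    # get_support(indices=True) returns the integer positions of the kept columns
    finalFeatureIndices = gs.best_estimator_.named_steps["feat"].get_support(indices=True)
    finalFeatureList = [initialFeatureList[i] for i in finalFeatureIndices]
    print(finalFeatureList)
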
  • 2021-02-03 13:14

    Jake's answer totally works. But depending on which feature selector you're using, there's another option that I think is cleaner. This one worked for me:

    X.columns[features.get_support()]
    

    It gave me a result identical to Jake's answer. As the docs describe, get_support returns a boolean array indicating, for each column, whether it was selected. Also, note that X must have the same columns as the training data used to fit the feature selector.

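    For completeness, a sketch using the same assumed X and best_pipe as in the first sketch, with the boolean mask in place of the transform() trick:

    features = best_pipe.named_steps["feat"]
    mask = features.get_support()      # one True/False per original column
    print(X.columns[mask].tolist())    # names of the selected columns
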