sklearn: get feature names after L1-based feature selection

前端未结


This question and answer demonstrate that when feature selection is performed using one of scikit-learn's dedicated feature selection routines, then the names of the selected f


2 Answers

攒了一身酷

2021-02-10 08:40

I've been using sklearn 0.15.2, and according to the LinearSVC documentation, `coef_` is an array of shape [n_features] if n_classes == 2, else [n_classes, n_features]. So first, `np.flatnonzero` doesn't work for the multi-class case: it returns positions in the flattened array, which can exceed n_features, so you get an index-out-of-range error. Second, it should be `np.where(svc.coef_ != 0)[1]` instead of `np.where(svc.coef_ != 0)[0]`, because axis 0 indexes the classes, not the features. I ended up using `np.asarray(vectorizer.get_feature_names())[list(set(np.where(svc.coef_ != 0)[1]))]`
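To make the indexing issue concrete, here is a minimal NumPy-only sketch. The `feature_names` array and the `coef` matrix are made-up stand-ins for `vectorizer.get_feature_names()` and a fitted multi-class `svc.coef_`; they are assumptions for illustration, not output from a real model.

```python
import numpy as np

# Hypothetical stand-ins: 4 features, 3 classes (not taken from a fitted model)
feature_names = np.asarray(["apple", "banana", "cherry", "date"])
coef = np.array([
    [0.0, 1.2, 0.0, 0.0],   # class 0
    [0.0, 0.0, 0.0, -0.7],  # class 1
    [0.0, 0.3, 0.0, 0.0],   # class 2
])

# flatnonzero indexes the flattened (3 * 4 = 12 element) array,
# so some indices are >= n_features and cannot index feature_names
flat_idx = np.flatnonzero(coef)
print(flat_idx)  # [1 7 9]

# np.nonzero (like np.where(coef != 0)) returns (row indices, column indices);
# the column indices are the feature indices
class_idx, feature_idx = np.nonzero(coef)

# np.unique drops duplicates (a feature used by several classes) and sorts
selected = feature_names[np.unique(feature_idx)]
print(selected)  # ['banana' 'date']
```

Using `np.unique` rather than `list(set(...))` is only a stylistic choice here: both remove duplicate column indices, but `np.unique` also guarantees a sorted order.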

灰色年华

2021-02-10 08:51
For sparse estimators you can generally find the support by checking where the non-zero entries are in the coefficient vector (provided the coefficient vector exists, which is the case for e.g. linear models):
```
support = np.flatnonzero(estimator.coef_)
```
For your LinearSVC with l1 penalty it would accordingly be
```
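import numpy as np
from sklearn.svm import LinearSVC

# The L1 penalty requires the primal formulation (dual=False)
svc = LinearSVC(C=1., penalty='l1', dual=False)
svc.fit(X, y)
# Names of the features whose coefficient is non-zero
selected_feature_names = np.asarray(vectorizer.get_feature_names())[np.flatnonzero(svc.coef_)]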
```