How to get feature names selected by feature elimination in sklearn pipeline?

逝去的感伤 2021-02-13 12:18

I am using recursive feature elimination in my sklearn pipeline. The pipeline looks something like this:

from sklearn.pipeline import FeatureUnion, Pipeline
from


      
      
        
1 answer

囚心锁ツ (original poster) 2021-02-13 12:26

You can access each step of the Pipeline through its named_steps attribute. Here is an example on the iris dataset that selects only 2 features, but the same approach scales to larger feature sets.

from sklearn import datasets
from sklearn import feature_selection
from sklearn.pipeline import Pipeline  # required by the Pipeline(...) call below
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris.data
y = iris.target

# classifier (also used as the estimator that RFE ranks features with)
LinearSVC1 = LinearSVC(tol=1e-4, C=0.1)
f5 = feature_selection.RFE(estimator=LinearSVC1, n_features_to_select=2, step=1)

pipeline = Pipeline([
    ('rfe_feature_selection', f5),
    ('clf', LinearSVC1)
    ])

pipeline.fit(X, y)


With named_steps you can access the attributes and methods of any transformer in the pipeline. The RFE attribute support_ (or the method get_support()) returns a boolean mask of the selected features:

support = pipeline.named_steps['rfe_feature_selection'].support_
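
As a rough sketch of what the fitted RFE step exposes, the snippet below rebuilds the same iris pipeline and compares support_ with get_support(). The random_state and max_iter values are assumptions added here for reproducibility and are not part of the original answer.

```python
import numpy as np
from sklearn import datasets
from sklearn.feature_selection import RFE
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

X, y = datasets.load_iris(return_X_y=True)

# random_state and max_iter are illustrative choices, not from the original answer
svc = LinearSVC(tol=1e-4, C=0.1, random_state=0, max_iter=10000)

pipeline = Pipeline([
    ("rfe_feature_selection", RFE(estimator=svc, n_features_to_select=2, step=1)),
    ("clf", svc),
])
pipeline.fit(X, y)

rfe = pipeline.named_steps["rfe_feature_selection"]

mask = rfe.get_support()                 # boolean mask, equivalent to rfe.support_
indices = rfe.get_support(indices=True)  # integer column positions instead of a mask
ranking = rfe.ranking_                   # rank 1 marks the selected features

print(mask, indices, ranking)
```

get_support(indices=True) can be handy when the feature names live in a plain list, since integer positions can index a list one at a time while a boolean mask cannot.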


support is now a boolean array, which you can use to efficiently extract the names of your selected features (columns). Make sure your feature names are in a NumPy array, not a Python list, because a list cannot be indexed with a boolean mask.

import numpy as np
feature_names = np.array(iris.feature_names) # transformed list to array

feature_names[support]

array(['sepal width (cm)', 'petal width (cm)'], dtype='<U17')
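
To illustrate why the list needs converting, here is a small sketch with a hard-coded mask (a stand-in chosen to mirror the output above, not a value computed by RFE). It shows the list indexing failure, the NumPy route, and an alternative using itertools.compress for cases where you would rather keep a list.

```python
from itertools import compress

import numpy as np

feature_names = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
]
# Hypothetical mask standing in for rfe.support_
support = np.array([False, True, False, True])

# A plain list rejects a boolean array as an index
try:
    feature_names[support]
    list_indexing_failed = False
except TypeError:
    list_indexing_failed = True

# Option 1: convert to a NumPy array and apply the mask
selected_array = np.array(feature_names)[support]

# Option 2: stay with a list and filter it with the mask
selected_list = list(compress(feature_names, support))

print(list_indexing_failed, selected_array, selected_list)
```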


EDIT

Per my comment above, here is your example with the CustomFeatures() function removed:

from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn import feature_selection
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
import numpy as np

X = ['I am a sentence', 'an example']
Y = [1, 2]
X_dev = ['another sentence']

# classifier (also used as the estimator that RFE ranks features with)
LinearSVC1 = LinearSVC(tol=1e-4, C=0.1)
f5 = feature_selection.RFE(estimator=LinearSVC1, n_features_to_select=500, step=1)

pipeline = Pipeline([
    ('features', FeatureUnion([
       ('tfidf', TfidfVectorizer(ngram_range=(1, 3), max_features= 4000))])), 
    ('rfe_feature_selection', f5),
    ('clf', LinearSVC1),
    ])

pipeline.fit(X, Y)
y_pred = pipeline.predict(X_dev)

support = pipeline.named_steps['rfe_feature_selection'].support_
# scikit-learn >= 1.0; get_feature_names() was removed in scikit-learn 1.2
feature_names = pipeline.named_steps['features'].get_feature_names_out()
np.array(feature_names)[support]

    
             
                                                        
            
            
              
                