How to get feature names selected by feature elimination in sklearn pipeline?

逝去的感伤 2021-02-13 12:18

I am using recursive feature elimination in my sklearn pipeline. The pipeline looks something like this:

from sklearn.pipeline import FeatureUnion, Pipeline
from         


        
1 Answer
  •  囚心锁ツ
    2021-02-13 12:26

    You can access each step of the Pipeline through its named_steps attribute. Here is an example on the iris dataset that selects only 2 features, but the same approach scales to larger feature sets.

    from sklearn import datasets
    from sklearn import feature_selection
    from sklearn.pipeline import Pipeline  # needed for the Pipeline used below
    from sklearn.svm import LinearSVC
    
    iris = datasets.load_iris()
    X = iris.data
    y = iris.target
    
    # classifier, used both inside RFE (which clones it) and as the final step
    LinearSVC1 = LinearSVC(tol=1e-4, C=0.1)
    f5 = feature_selection.RFE(estimator=LinearSVC1, n_features_to_select=2, step=1)
    
    pipeline = Pipeline([
        ('rfe_feature_selection', f5),
        ('clf', LinearSVC1)
        ])
    
    pipeline.fit(X, y)
    

    With named_steps you can access the attributes and methods of any transformer in the pipeline. The RFE attribute support_ (or the method get_support()) returns a boolean mask of the selected features:

    support = pipeline.named_steps['rfe_feature_selection'].support_
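
    As a minimal sketch of what the fitted RFE step exposes, the snippet below rebuilds the iris pipeline and compares support_, get_support() and ranking_. The max_iter value and the variable names are illustrative choices, not part of the original answer.

```python
from sklearn import datasets
from sklearn.feature_selection import RFE
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X, y = iris.data, iris.target

# max_iter is raised here only to reduce convergence warnings (an assumption).
pipeline = Pipeline([
    ("rfe_feature_selection",
     RFE(estimator=LinearSVC(C=0.1, max_iter=10000), n_features_to_select=2, step=1)),
    ("clf", LinearSVC(C=0.1, max_iter=10000)),
])
pipeline.fit(X, y)

rfe = pipeline.named_steps["rfe_feature_selection"]
mask = rfe.get_support()                 # boolean mask, same values as rfe.support_
indices = rfe.get_support(indices=True)  # integer column positions instead of a mask
ranking = rfe.ranking_                   # rank 1 marks a selected feature

print(mask)
print(indices)
print(ranking)
```

    The integer form from get_support(indices=True) is convenient when you need column positions rather than a mask.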
    

    Now that support is a boolean array, you can use it to efficiently extract the names of the selected features (columns). Make sure your feature names are in a numpy array, not a Python list, because a list cannot be indexed with a boolean mask.

    import numpy as np
    feature_names = np.array(iris.feature_names) # transformed list to array
    
    feature_names[support]
    
    # Output (from Python 2; on Python 3 the dtype is shown as '<U17'):
    # array(['sepal width (cm)', 'petal width (cm)'],
    #       dtype='|S17')
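
    To illustrate why the array conversion matters, here is a small sketch. The mask is hard-coded to mirror the output reported above rather than computed, so treat it as hypothetical. It also shows itertools.compress as one way to stay with a plain list.

```python
from itertools import compress

import numpy as np

feature_names = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
]
# Hypothetical mask for illustration; in practice it comes from RFE's support_.
support = np.array([False, True, False, True])

# A plain list cannot be indexed with a boolean array.
try:
    feature_names[support]
except TypeError as exc:
    print("list indexing failed:", exc)

# Option 1: convert the names to a numpy array and apply the mask.
selected_from_array = np.array(feature_names)[support]

# Option 2: keep the list and filter it with itertools.compress.
selected_from_list = list(compress(feature_names, support))

print(selected_from_array)
print(selected_from_list)
```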
    

    EDIT

    Per my comment above, here is your example with the CustomFeatures() function removed:

    from sklearn.pipeline import FeatureUnion, Pipeline
    from sklearn import feature_selection
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC
    import numpy as np
    
    X = ['I am a sentence', 'an example']
    Y = [1, 2]
    X_dev = ['another sentence']
    
    # classifier
    LinearSVC1 = LinearSVC(tol=1e-4, C=0.1)
    # This tiny corpus yields fewer than 500 features, so RFE keeps all of them here.
    f5 = feature_selection.RFE(estimator=LinearSVC1, n_features_to_select=500, step=1)
    
    pipeline = Pipeline([
        ('features', FeatureUnion([
           ('tfidf', TfidfVectorizer(ngram_range=(1, 3), max_features= 4000))])), 
        ('rfe_feature_selection', f5),
        ('clf', LinearSVC1),
        ])
    
    pipeline.fit(X, Y)
    y_pred = pipeline.predict(X_dev)
    
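    support = pipeline.named_steps['rfe_feature_selection'].support_
    # get_feature_names() was removed in scikit-learn 1.2;
    # get_feature_names_out() (available since 1.0) replaces it.
    feature_names = pipeline.named_steps['features'].get_feature_names_out()
    np.array(feature_names)[support]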
    
