Mapping column names to random forest feature importances

-上瘾入骨i 2021-02-04 12:28

I am trying to plot feature importances for a random forest model and map each feature importance back to the original coefficient. I've managed to create a plot that shows the

4 Answers
  • 2021-02-04 13:02

    Another simple way to get a sorted list

    # Pair each importance with its column name, then sort descending by importance
    importances = list(zip(xgb_classifier.feature_importances_, df.columns))
    importances.sort(reverse=True)
    

    The following code adds a visualization if needed:

    # Build a DataFrame indexed by column name and plot the importances as bars
    pd.DataFrame(importances, index=[name for (_, name) in importances]).plot(kind='bar')
    
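    Since the question is about a random forest, the same zip/sort pattern works with any scikit-learn estimator that exposes feature_importances_. A minimal sketch, assuming a RandomForestClassifier and placeholder names df/y for the training data (these names are illustrative, not from the original post):

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    # Hypothetical setup: fit a random forest on a feature DataFrame `df` and target `y`
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    rf.fit(df, y)

    # Same pattern as above: pair importances with column names, sort descending, plot
    importances = sorted(zip(rf.feature_importances_, df.columns), reverse=True)
    pd.DataFrame(importances, index=[name for (_, name) in importances]).plot(kind='bar')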
  • 2021-02-04 13:15

    A sort of generic solution would be to throw the features/importances into a dataframe and sort them before plotting:

    import pandas as pd
    %matplotlib inline
    # Fit your model first: "data" is the feature DataFrame (X) and
    # "model" is the fitted sklearn estimator
    
    feats = {} # a dict to hold feature_name: feature_importance
    for feature, importance in zip(data.columns, model.feature_importances_):
        feats[feature] = importance #add the name/value pair 
    
    importances = pd.DataFrame.from_dict(feats, orient='index').rename(columns={0: 'Gini-importance'})
    importances.sort_values(by='Gini-importance').plot(kind='bar', rot=45)
    
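    For completeness, a minimal setup sketch for the snippet above, assuming a RandomForestClassifier and placeholder names data/y for the training data (these are illustrative assumptions):

    from sklearn.ensemble import RandomForestClassifier

    # Hypothetical setup: fit a random forest so that model.feature_importances_
    # lines up with data.columns in the loop above
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(data, y)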
  • 2021-02-04 13:18

    I use a similar solution to Sam:

    import pandas as pd
    # Build a Series of importances indexed by column name and sort it descending
    important_features = pd.Series(data=brf.feature_importances_, index=x_dummies.columns)
    important_features.sort_values(ascending=False, inplace=True)
    

    I always just print the list with print(important_features), but to plot it you could always use Series.plot (a minimal sketch is shown below).

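    A minimal sketch of that plotting step (the figure size and labels are just illustrative choices):

    import matplotlib.pyplot as plt

    # Plot the already-sorted Series as a bar chart via Series.plot
    important_features.plot(kind='bar', figsize=(10, 4))
    plt.ylabel('Feature importance')
    plt.tight_layout()
    plt.show()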
  • 2021-02-04 13:18

    It is simple; I plotted it like this:

    import pandas as pd
    import matplotlib.pyplot as plt

    # Series of importances indexed by column name; plot the 15 largest
    feat_importances = pd.Series(extraTree.feature_importances_, index=X.columns)
    feat_importances.nlargest(15).plot(kind='barh')
    plt.title("Top 15 important features")
    plt.show()
    
    
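    A minimal sketch of the setup this snippet assumes: an ExtraTreesClassifier fitted on a feature DataFrame X and target y (the variable names are placeholders, not from the original answer):

    from sklearn.ensemble import ExtraTreesClassifier

    # Hypothetical setup: fit the estimator so extraTree.feature_importances_
    # aligns with the columns of X
    extraTree = ExtraTreesClassifier(n_estimators=100, random_state=0)
    extraTree.fit(X, y)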