Feature Importance with XGBClassifier

Asked by 隐瞒了意图╮ on 2020-12-14 10:03 · backend · unresolved · 9 answers · 1142 views

Hopefully I'm reading this wrong, but in the XGBoost library documentation there is note of extracting the feature importance attributes using feature_importances_.
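
For reference, on recent xgboost releases the sklearn wrapper does expose this attribute; a minimal sketch (the synthetic data is purely for illustration):

    from sklearn.datasets import make_classification
    from xgboost import XGBClassifier

    # Synthetic data just to demonstrate the attribute
    X, y = make_classification(n_samples=100, n_features=5, random_state=0)
    model = XGBClassifier().fit(X, y)
    print(model.feature_importances_)  # one importance score per input feature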

9 Answers
  • 2020-12-14 10:44

For xgboost, if you fit the model via xgb.fit(x, y), you can use the following to get feature importance.

    import pandas as pd

    # xgb is an XGBClassifier instance; fit() returns the fitted model
    xgb_model = xgb.fit(x, y)
    # get_fscore() reports how often each feature is used to split
    xgb_fea_imp = pd.DataFrame(
        list(xgb_model.get_booster().get_fscore().items()),
        columns=['feature', 'importance']
    ).sort_values('importance', ascending=False)
    print(xgb_fea_imp)
    xgb_fea_imp.to_csv('xgb_fea_imp.csv')

    from xgboost import plot_importance
    plot_importance(xgb_model)
    
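    If you want gain-based importance instead of raw split counts, the booster's get_score method accepts an importance_type argument on the same versions that provide get_booster(); a small sketch, reusing xgb_model from above:

    gain_imp = xgb_model.get_booster().get_score(importance_type='gain')
    # sort features by average gain, highest first
    print(sorted(gain_imp.items(), key=lambda kv: kv[1], reverse=True))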
  • 2020-12-14 10:46

It seems like the API keeps changing. For xgboost version 1.0.2, just changing from imp_vals = xgb.booster().get_fscore() to imp_vals = xgb.get_booster().get_fscore() in @David's answer does the trick. The updated code is:

    from numpy import array

    def get_xgb_imp(xgb, feat_names):
        # get_booster() replaces the older booster() accessor
        imp_vals = xgb.get_booster().get_fscore()
        # map booster keys ('f0', 'f1', ...) back to the original feature names,
        # defaulting to 0 for features never used in a split
        imp_dict = {feat_names[i]: float(imp_vals.get('f' + str(i), 0.)) for i in range(len(feat_names))}
        # wrap values() in list() so this also works on Python 3
        total = array(list(imp_dict.values())).sum()
        return {k: v / total for k, v in imp_dict.items()}
    
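    A hedged usage sketch (the synthetic data and feature names below are illustrative, not from the original answer):

    from sklearn.datasets import make_classification
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=200, n_features=4, random_state=0)
    model = XGBClassifier().fit(X, y)
    # returns normalized importances that sum to 1
    print(get_xgb_imp(model, ['f0', 'f1', 'f2', 'f3']))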
  • 2020-12-14 10:47

I found out the answer. It appears that version 0.4a30 does not have the feature_importances_ attribute. Therefore, if you install the xgboost package using pip install xgboost, you will be unable to conduct feature extraction from the XGBClassifier object; refer to @David's answer if you want a workaround.

However, what I did was build it from source by cloning the repo and running . ./build.sh, which installs version 0.4, where the feature_importances_ attribute works.

    Hope this helps others!

  • 2020-12-14 10:57

    For those having the same problem as Luís Bianchin, "TypeError: 'str' object is not callable", I found a solution (that works for me at least) here.

    In short, I found modifying David's code from

    imp_vals = xgb.booster().get_fscore()
    

    to

    imp_vals = xgb.get_fscore()
    

    worked for me.

    For more detail I would recommend visiting the link above.

Big thanks to David and ianozsvald.
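
    For context, a minimal sketch of the modified function, assuming xgb is a raw Booster returned by xgboost.train (where get_fscore() is available directly, unlike on the sklearn wrapper):

    def get_xgb_imp(xgb, feat_names):
        # get_fscore() works directly on a Booster from xgboost.train
        imp_vals = xgb.get_fscore()
        imp_dict = {feat_names[i]: float(imp_vals.get('f' + str(i), 0.)) for i in range(len(feat_names))}
        # normalize with the built-in sum(), avoiding the numpy/dict-view pitfall
        total = sum(imp_dict.values())
        return {k: v / total for k, v in imp_dict.items()}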

  • 2020-12-14 10:58

    Get Feature Importance as a sorted data frame

    import pandas as pd

    def get_xgb_imp(xgb, feat_names):
        # use get_booster() on current xgboost versions (older ones used booster());
        # feat_names is unused here since get_fscore() keys already name the features
        imp_vals = xgb.get_booster().get_fscore()
        # two columns: feature key and its split count, sorted descending
        feats_imp = pd.DataFrame(list(imp_vals.items()), columns=['feature', 'importance'])
        feats_imp.sort_values('importance', inplace=True, ascending=False)
        feats_imp.reset_index(drop=True, inplace=True)
        return feats_imp

    feature_importance_df = get_xgb_imp(xgb, feat_names)
    
  • 2020-12-14 11:02

You can also use the built-in plot_importance function:

    from xgboost import XGBClassifier, plot_importance
    import matplotlib.pyplot as plt

    fit = XGBClassifier().fit(X, Y)  # X, Y: your training data
    plot_importance(fit)
    plt.show()
    
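    plot_importance also takes an importance_type argument ('weight', 'gain', or 'cover') if you want something other than split counts, e.g.:

    # rank features by average gain instead of number of splits
    plot_importance(fit, importance_type='gain')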
