Hopefully I'm reading this wrong, but the XGBoost library documentation notes that you can extract the feature importance attributes using feature_importances_
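For reference, the access pattern the documentation describes looks like this (a minimal sketch with synthetic stand-in data; nothing below comes from the original post):

from sklearn.datasets import make_classification
from xgboost import XGBClassifier

# Synthetic data purely for illustration; the attribute access is the point
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
model = XGBClassifier().fit(X, y)
print(model.feature_importances_)  # AttributeError on versions lacking the attribute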
For xgboost, if you use xgb.fit(), then you can use the following method to get feature importance:
import pandas as pd

# xgb here is an already-created XGBClassifier/XGBRegressor instance
xgb_model = xgb.fit(x, y)

# get_fscore() returns {feature: split count}; build a sorted frame from it
xgb_fea_imp = pd.DataFrame(list(xgb_model.get_booster().get_fscore().items()),
                           columns=['feature', 'importance']).sort_values('importance', ascending=False)
print(xgb_fea_imp)
xgb_fea_imp.to_csv('xgb_fea_imp.csv')

from xgboost import plot_importance
plot_importance(xgb_model)
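plot_importance returns a matplotlib Axes, so if no plot appears you likely need an explicit draw call; a small follow-up sketch (assumes matplotlib is installed):

import matplotlib.pyplot as plt

ax = plot_importance(xgb_model)  # returns a matplotlib Axes
plt.tight_layout()
plt.show()  # or plt.savefig('xgb_importance.png')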
It seems like the API keeps changing. For xgboost version 1.0.2, just changing from imp_vals = xgb.booster().get_fscore() to imp_vals = xgb.get_booster().get_fscore() in @David's answer does the trick. The updated code is:
from numpy import array

def get_xgb_imp(xgb, feat_names):
    # Booster scores are keyed 'f0', 'f1', ...; map them back to real names
    imp_vals = xgb.get_booster().get_fscore()
    imp_dict = {feat_names[i]: float(imp_vals.get('f' + str(i), 0.)) for i in range(len(feat_names))}
    # list() matters on Python 3, where dict.values() is a view, not a sequence
    total = array(list(imp_dict.values())).sum()
    return {k: v / total for k, v in imp_dict.items()}
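A quick usage sketch with synthetic data (the feature names and model below are made up for illustration):

from sklearn.datasets import make_classification
from xgboost import XGBClassifier

feat_names = ['age', 'income', 'tenure', 'visits', 'score']  # hypothetical names
X, y = make_classification(n_samples=200, n_features=5, random_state=1)
clf = XGBClassifier().fit(X, y)
print(get_xgb_imp(clf, feat_names))  # normalized importances keyed by name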
I found out the answer. It appears that version 0.4a30 does not have the feature_importances_ attribute. Therefore, if you install the xgboost package using pip install xgboost, you will be unable to conduct feature extraction from the XGBClassifier object; you can refer to @David's answer if you want a workaround.
However, what I did was build it from source by cloning the repo and running ./build.sh, which installs version 0.4, where the feature_importances_ attribute works.
Hope this helps others!
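If rebuilding from source is not an option, a defensive sketch along these lines can bridge versions; this is an assumed workaround, not an official API:

def importances_any_version(model, feat_names):
    # Prefer the sklearn-style attribute when the installed version has it
    if hasattr(model, 'feature_importances_'):
        return dict(zip(feat_names, model.feature_importances_))
    # Otherwise fall back to booster split counts, as in @David's answer
    # (very old versions used booster() rather than get_booster())
    scores = model.get_booster().get_fscore()
    return {name: scores.get('f%d' % i, 0.0) for i, name in enumerate(feat_names)}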
For those having the same problem as Luís Bianchin, "TypeError: 'str' object is not callable", I found a solution (that works for me at least) here.
In short, I found that modifying David's code from
imp_vals = xgb.booster().get_fscore()
to
imp_vals = xgb.get_fscore()
worked for me.
For more detail I would recommend visiting the link above.
Big thanks to David and ianozsvald.
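For context, the fully modified helper would read as follows (David's function with only that one line swapped; get_fscore() is a Booster method, so this assumes xgb came from xgb.train rather than the sklearn wrapper):

from numpy import array

def get_xgb_imp(xgb, feat_names):
    imp_vals = xgb.get_fscore()  # called on the Booster directly
    imp_dict = {feat_names[i]: float(imp_vals.get('f' + str(i), 0.)) for i in range(len(feat_names))}
    total = array(list(imp_dict.values())).sum()
    return {k: v / total for k, v in imp_dict.items()}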
Get Feature Importance as a sorted data frame
import pandas as pd
import numpy as np

def get_xgb_imp(xgb, feat_names):
    # Note: newer xgboost releases renamed booster() to get_booster();
    # see the other answers for that one-line change.
    imp_vals = xgb.booster().get_fscore()
    # Two identical rows transposed give one score column per feature;
    # the first column is then overwritten with the feature names.
    feats_imp = pd.DataFrame(imp_vals, index=np.arange(2)).T
    feats_imp.iloc[:, 0] = feats_imp.index
    feats_imp.columns = ['feature', 'importance']
    feats_imp.sort_values('importance', inplace=True, ascending=False)
    feats_imp.reset_index(drop=True, inplace=True)
    return feats_imp

feature_importance_df = get_xgb_imp(xgb, feat_names)
You can also use the built-in plot_importance function:
from xgboost import XGBClassifier, plot_importance

model = XGBClassifier().fit(X, Y)
plot_importance(model)
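plot_importance also accepts an importance_type argument ('weight' by default, or 'gain'/'cover'), which can reorder features considerably:

plot_importance(model, importance_type='gain')  # average gain per split rather than split counts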