Feature Importance for XGBoost in Sagemaker

Posted by 三世轮回 on 2020-06-27 12:51:27

Question


I have built an XGBoost model using Amazon SageMaker, but I have been unable to find anything that helps me interpret the model and validate whether it has learned the right dependencies.

Normally, feature importance for an XGBoost model is available through the get_fscore() function in the Python API (https://xgboost.readthedocs.io/en/latest/python/python_api.html). I see nothing of that sort in the SageMaker API (https://sagemaker.readthedocs.io/en/stable/estimators.html).

I know I can build my own model and then deploy it using SageMaker, but I am curious whether anyone has faced this problem and how they overcame it.

Thanks.


Answer 1:


SageMaker XGBoost currently does not provide an interface for retrieving feature importance from the model. You can, however, write some code to get it from the XGBoost model yourself: download the model artifact from S3, extract the pickled Booster object, and then use the following snippet:

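import pickle as pkl
import xgboost  # must be importable so pickle can rebuild the Booster

# Assumed local path to the model file extracted from the S3 artifact
model_file = 'xgboost-model'
with open(model_file, 'rb') as f:
    booster = pkl.load(f)
print(booster.get_score())   # importance per feature (default type: 'weight')
print(booster.get_fscore())  # same as get_score(importance_type='weight')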

Refer to the XGBoost documentation for the methods that return feature importance from the Booster object, such as get_score() or get_fscore().
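The dictionary returned by get_score() is keyed by feature name. When the training data carries no column names, XGBoost falls back to positional names such as f0, f1, and so on, and features that were never used in a split are left out of the dictionary. A minimal sketch of turning such a dictionary into a ranked, readable list follows. The helper rank_importance and the column names are hypothetical, and the scores dictionary is made-up sample data, not output from a real model.

```python
def rank_importance(scores, feature_names):
    """Map positional keys ('f0', 'f1', ...) to column names and sort by score.

    scores: dict shaped like the return value of Booster.get_score()
    feature_names: column names in the order used for training (assumed known)
    """
    ranked = []
    for index, name in enumerate(feature_names):
        # Features that were never used in a split are missing from the dict.
        ranked.append((name, scores.get(f"f{index}", 0)))
    # Highest score first; sorted() is stable, so ties keep column order.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Made-up sample data for illustration only.
sample_scores = {"f0": 12, "f2": 30}
columns = ["age", "income", "tenure"]
for name, score in rank_importance(sample_scores, columns):
    print(f"{name}: {score}")
```

Filling in a zero for absent keys keeps every column visible in the ranking, which makes it easier to spot features the model ignored entirely.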




Answer 2:


As of 2019-06-17, a SageMaker XGBoost model is stored on S3 as an archive named model.tar.gz. This archive consists of a single pickled model file named xgboost-model.

To load the model directly from S3 without saving it to disk first, you can use the following code:

import s3fs
import pickle
import tarfile
import xgboost

model_path = 's3://<bucket>/<path_to_model_dir>/xgboost-2019-06-16-09-56-39-854/output/model.tar.gz'

fs = s3fs.S3FileSystem()

with fs.open(model_path, 'rb') as f:
    with tarfile.open(fileobj=f, mode='r') as tar_f:
        with tar_f.extractfile('xgboost-model') as extracted_f:
            xgbooster = pickle.load(extracted_f)

xgbooster.get_fscore()
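If the archive has already been copied to local disk, the same extraction works without s3fs. The sketch below assumes the archive layout described above (a single pickled member named xgboost-model); the function name load_pickled_member and the local path are hypothetical. Unpickling a Booster requires the xgboost package to be importable, and pickle should only be used on artifacts you trust.

```python
import pickle
import tarfile


def load_pickled_member(archive_path, member="xgboost-model"):
    """Unpickle one member of a tar archive without extracting it to disk."""
    # "r:*" lets tarfile detect the compression (gzip here) automatically.
    with tarfile.open(archive_path, mode="r:*") as tar_f:
        extracted_f = tar_f.extractfile(member)
        if extracted_f is None:
            raise ValueError(f"{member!r} is not a regular file in the archive")
        with extracted_f:
            return pickle.load(extracted_f)


# Usage (hypothetical local path):
# booster = load_pickled_member("model.tar.gz")
# print(booster.get_score(importance_type="gain"))
```

If the member name differs in your archive, tarfile raises a KeyError, which is a quick hint that the artifact layout is not the one assumed here.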



Answer 3:


Although you can write a custom script, as rajesh and Lukas suggested, and use XGBoost as a framework to run it (see "How to Use Amazon SageMaker XGBoost" for how to use "script mode"), SageMaker has recently launched SageMaker Debugger, which allows you to retrieve feature importance from XGBoost in real time.

The following example notebook demonstrates how to use SageMaker Debugger to retrieve feature importance: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-debugger/xgboost_builtin_rules/xgboost-regression-debugger-rules.ipynb.



Source: https://stackoverflow.com/questions/55621967/feature-importance-for-xgboost-in-sagemaker
