How to use XGBoost in a PySpark Pipeline

面向向阳花 2021-02-09 04:23

I want to update my PySpark code. In PySpark, the base model must be placed in a pipeline, and the official pipeline demo uses LogisticRegression as the base model. However,

3 Answers
  •  别跟我提以往
    2021-02-09 05:15

    There is a maintained distributed XGBoost library, used in production by several companies, as mentioned above (https://github.com/dmlc/xgboost). However, using it from PySpark is a bit tricky. Someone made a working PySpark wrapper for version 0.72 of the library, with 0.8 support in progress.

    See https://medium.com/@bogdan.cojocar/pyspark-and-xgboost-integration-tested-on-the-kaggle-titanic-dataset-4e75a568bdb and https://github.com/dmlc/xgboost/issues/1698 for the full discussion.

    Make sure the XGBoost jars are on your PySpark jar path.
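
One way to get the jars onto the path, as a rough sketch: set the `PYSPARK_SUBMIT_ARGS` environment variable before any Spark session is created. The jar file names and locations below are placeholders I am assuming for illustration (matching the 0.72 version mentioned above), and `build_submit_args` is a hypothetical helper, not part of PySpark or XGBoost. Adjust the paths to wherever your jars actually live.

```python
import os


def build_submit_args(jar_paths):
    """Return a PYSPARK_SUBMIT_ARGS value that ships the given jars.

    Hypothetical helper for illustration only.
    """
    if not jar_paths:
        raise ValueError("at least one jar path is required")
    # --jars takes a comma-separated list. The trailing "pyspark-shell"
    # token is needed when Spark is launched from a plain Python process.
    return "--jars " + ",".join(jar_paths) + " pyspark-shell"


# Placeholder paths (assumption): replace with your real jar locations.
XGBOOST_JARS = [
    "/path/to/xgboost4j-0.72.jar",
    "/path/to/xgboost4j-spark-0.72.jar",
]

# This has to run before the first SparkSession / SparkContext is created,
# otherwise the JVM is already up and the jars are not picked up.
os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args(XGBOOST_JARS)
```

If you launch with `spark-submit` or the `pyspark` shell instead, passing the same comma-separated list via `--jars` on the command line should have the same effect.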
