Using XGBoost in C++

谎友^ 2020-12-13 01:14

How can I use the XGBoost library (https://github.com/dmlc/xgboost/) in C++? I have found the Python and Java APIs, but I can't find an API for C++.

6 Answers
  • 2020-12-13 01:27

    Use the XGBoost C API.

      #include <xgboost/c_api.h>
      #include <cassert>
      #include <cmath>     // for NAN
      #include <iostream>

      BoosterHandle booster;
      const char *model_path = "/path/of/model";
    
      // create booster handle first
      XGBoosterCreate(NULL, 0, &booster);
    
      // by default, the seed will be set to 0
      XGBoosterSetParam(booster, "seed", "0");
    
      // load model
      XGBoosterLoadModel(booster, model_path);
    
      const int feat_size = 100;
      const int num_row = 1;
      float feat[num_row][feat_size];
    
      // create some fake data for predicting
      for (int i = 0; i < num_row; ++i) {
        for(int j = 0; j < feat_size; ++j) {
          feat[i][j] = (i + 1) * (j + 1);
        }
      }
    
      // convert 2d array to DMatrix
      DMatrixHandle dtest;
      XGDMatrixCreateFromMat(reinterpret_cast<float*>(feat),
                             num_row, feat_size, NAN, &dtest);
    
      // predict
      bst_ulong out_len;
      const float *f;
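      // note: in XGBoost 1.0 and later, XGBoosterPredict takes an extra
      // `training` argument before `out_len`; adjust the call for your version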
      XGBoosterPredict(booster, dtest, 0, 0, &out_len, &f);
      assert(out_len == num_row);
      std::cout << f[0] << std::endl;
    
      // free memory
      XGDMatrixFree(dtest);
      XGBoosterFree(booster);
    

    Note that when you load an existing model (as the code above does), you have to make sure the data format used for training matches the one used for prediction. So if you predict with XGBoosterPredict, which takes a dense matrix as its argument, you have to train on a dense matrix as well.

    Training with the libsvm format and then predicting with a dense matrix can produce wrong predictions, as the XGBoost FAQ says:

    “Sparse” elements are treated as if they were “missing” by the tree booster, and as zeros by the linear booster. For tree models, it is important to use consistent data formats during training and scoring.
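
    In practice that means picking one loading path and sticking to it for both training and scoring. A minimal sketch of the two paths in the C API (the data pointer, dimensions, and file name are placeholders):

      #include <xgboost/c_api.h>
      #include <cmath>  // NAN

      void build_dmatrices(const float *train_data, bst_ulong num_row, bst_ulong num_col) {
        // dense path: DMatrix from an in-memory row-major float array,
        // with NAN explicitly marking missing values
        DMatrixHandle dtrain_dense;
        XGDMatrixCreateFromMat(train_data, num_row, num_col, NAN, &dtrain_dense);

        // sparse path: DMatrix from a libsvm-format text file; entries
        // absent from the file are "missing" to the tree booster, not zero
        DMatrixHandle dtrain_svm;
        XGDMatrixCreateFromFile("train.libsvm", 0, &dtrain_svm);
      }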

  • 2020-12-13 01:35

    If training in Python is fine for you and you only need to run prediction in C++, there is a nice tool for generating static if/else code from a trained model:

    https://github.com/popcorn/xgb2cpp

    I ended up using this after spending a day trying to load and use an XGBoost model in C++ without success. The code generated by xgb2cpp worked instantly and has the added benefit of having no dependencies whatsoever.
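
    The generated code is essentially each tree unrolled into nested if/else tests on feature thresholds, with the per-tree leaf values summed. A rough, hypothetical sketch of the shape of such generated code (the function name, feature indices, and constants are illustrative, not actual xgb2cpp output):

    #include <vector>

    // one cascade of threshold tests per tree; leaf values accumulate
    float xgb_predict(const std::vector<float>& f) {
        float sum = 0.0f;
        // tree 0
        if (f[2] < 2.45f) {
            sum += 0.43f;
        } else if (f[3] < 1.75f) {
            sum += 0.11f;
        } else {
            sum += -0.22f;
        }
        // ... one such block per tree in the ensemble ...
        return sum;
    }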

  • 2020-12-13 01:35

    To solve this problem, we run the xgboost program from our C++ code.
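
    One way to do that is to invoke the standalone xgboost CLI binary from C++. A minimal sketch, assuming the binary is on the PATH and is driven by a .conf file as in the official CLI demo (the file and model names are illustrative):

    #include <cstdlib>

    int main() {
        // train from a CLI config file; this writes model checkpoints
        std::system("xgboost train.conf");
        // predict with a saved model; output goes to the file named in the config
        std::system("xgboost train.conf task=pred model_in=0002.model");
        return 0;
    }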

  • 2020-12-13 01:40

    Here is what you need: https://github.com/EmbolismSoil/xgboostpp

    #include "xgboostpp.h"
    #include <algorithm>
    #include <iostream>
    
    int main(int argc, const char* argv[])
    {
        auto nsamples = 2;
        // there are 4 feature columns and 3 labels; in the iris example the
        // labels are the three flower species. For a regression task, nlabel = 1.
        auto xgb = XGBoostPP(argv[1], 3);
    
        //result = array([[9.9658281e-01, 2.4966884e-03, 9.2058454e-04],
        //       [9.9608469e-01, 2.4954407e-03, 1.4198524e-03]], dtype=float32)
        XGBoostPP::Matrix features(nsamples, 4);
        features <<
            5.1, 3.5, 1.4, 0.2,
            4.9, 3.0, 1.4, 0.2;
    
        XGBoostPP::Matrix y;
        auto ret = xgb.predict(features, y);
        if (ret != 0) {
            std::cout << "predict error" << std::endl;
            return -1;
        }
    
        std::cout << "intput : \n" << features << std::endl << "output: \n" << y << std::endl;
    }
    
  • 2020-12-13 01:43

    There is no example I am aware of. There is a c_api.h file that contains a C/C++ API for the package, and you'll have to find your way using it. I just did that: it took me a few hours of reading the code and trying a few things out, but eventually I managed to create a working C++ example of xgboost.
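
    One thing that helps when finding your way through c_api.h: every call returns 0 on success and a nonzero value on failure, and XGBGetLastError() returns the last error message. A minimal checking wrapper, similar in spirit to the safe_xgboost macro in the official C API demo (the macro name here is my own):

    #include <xgboost/c_api.h>
    #include <cstdio>
    #include <cstdlib>

    // abort with the library's error message if an XGBoost call fails
    #define SAFE_XGBOOST(call) do {                                      \
        int err = (call);                                                \
        if (err != 0) {                                                  \
            std::fprintf(stderr, "%s:%d: %s failed: %s\n",               \
                         __FILE__, __LINE__, #call, XGBGetLastError());  \
            std::exit(1);                                                \
        }                                                                \
    } while (0)

    // usage:
    // SAFE_XGBOOST(XGBoosterCreate(NULL, 0, &booster));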

  • 2020-12-13 01:46

    I ended up using the C API; see an example below:

    #include <xgboost/c_api.h>
    #include <iostream>

    // create the train data
    const int cols = 3, rows = 5;
    float train[rows][cols];
    for (int i=0;i<rows;i++)
        for (int j=0;j<cols;j++)
            train[i][j] = (i+1) * (j+1);
    
    float train_labels[rows];
    for (int i=0;i<rows;i++)
        train_labels[i] = 1+i*i*i;
    
    
    // convert to DMatrix
    DMatrixHandle h_train[1];
    XGDMatrixCreateFromMat((float *) train, rows, cols, -1, &h_train[0]);
    
    // load the labels
    XGDMatrixSetFloatInfo(h_train[0], "label", train_labels, rows);
    
    // read back the labels, just a sanity check
    bst_ulong bst_result;
    const float *out_floats;
    XGDMatrixGetFloatInfo(h_train[0], "label" , &bst_result, &out_floats);
    for (unsigned int i=0;i<bst_result;i++)
        std::cout << "label[" << i << "]=" << out_floats[i] << std::endl;
    
    // create the booster and load some parameters
    BoosterHandle h_booster;
    XGBoosterCreate(h_train, 1, &h_booster);
    XGBoosterSetParam(h_booster, "booster", "gbtree");
    XGBoosterSetParam(h_booster, "objective", "reg:linear");
    XGBoosterSetParam(h_booster, "max_depth", "5");
    XGBoosterSetParam(h_booster, "eta", "0.1");
    XGBoosterSetParam(h_booster, "min_child_weight", "1");
    XGBoosterSetParam(h_booster, "subsample", "0.5");
    XGBoosterSetParam(h_booster, "colsample_bytree", "1");
    XGBoosterSetParam(h_booster, "num_parallel_tree", "1");
    
    // perform 200 learning iterations
    for (int iter=0; iter<200; iter++)
        XGBoosterUpdateOneIter(h_booster, iter, h_train[0]);
    
    // predict
    const int sample_rows = 5;
    float test[sample_rows][cols];
    for (int i=0;i<sample_rows;i++)
        for (int j=0;j<cols;j++)
            test[i][j] = (i+1) * (j+1);
    DMatrixHandle h_test;
    XGDMatrixCreateFromMat((float *) test, sample_rows, cols, -1, &h_test);
    bst_ulong out_len;
    const float *f;
    XGBoosterPredict(h_booster, h_test, 0,0,&out_len,&f);
    
    for (unsigned int i=0;i<out_len;i++)
        std::cout << "prediction[" << i << "]=" << f[i] << std::endl;
    
    
    // free xgboost internal structures
    XGDMatrixFree(h_train[0]);
    XGDMatrixFree(h_test);
    XGBoosterFree(h_booster);
    
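    If you want to reuse the trained booster later (e.g., loading it with XGBoosterLoadModel as in the first answer), save it before the XGBoosterFree call above; the path is illustrative:

    // persist the trained model for later XGBoosterLoadModel calls
    XGBoosterSaveModel(h_booster, "/path/of/model");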