Improving model training speed in caret (R)

轮回少年 2021-01-31 12:30

I have a dataset consisting of 20 features and roughly 300,000 observations. I'm using caret with doParallel and four cores to train models. Even training on 10% of my data ta
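The memory point raised in the answer below can be put into numbers for a dataset of this size. The following is a rough base-R estimate. It assumes that all 20 features are stored as 8-byte doubles, and it applies the answer's rule of thumb that X workers means X extra copies of the data. Factor or integer columns, and whatever each model fit allocates on its worker, are not covered.

```r
# Rough memory estimate for the dataset described in the question.
# Assumption: all 20 features are stored as 8-byte doubles.
n_rows     <- 300000
n_features <- 20
n_workers  <- 4L

bytes_per_copy <- n_rows * n_features * 8
mb_per_copy    <- bytes_per_copy / 1e6       # decimal megabytes

# Rule of thumb from the answer: X workers -> X extra copies of the data
extra_mb <- n_workers * mb_per_copy

cat(sprintf("One copy: %.0f MB; extra for %d workers: ~%.0f MB\n",
            mb_per_copy, n_workers, extra_mb))
# One copy: 48 MB; extra for 4 workers: ~192 MB

# Cross-check with R's own accounting for a numeric matrix of that shape
m <- matrix(0, nrow = n_rows, ncol = n_features)
print(format(object.size(m), units = "MB"))
```

This only bounds the cost of the data copies. Fitted ensemble models held on each worker can add to it.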

3 Answers
  •  栀梦
    2021-01-31 12:37

    @phiver hits the nail on the head, but for this situation there are a few things I would suggest:

    • Make sure that parallel processing is not exhausting your system memory. With X workers, you are making X extra copies of the data in memory.
    • If you have a class imbalance, additional sampling can help. Down-sampling might improve performance, and it also takes less time because each model is fit on fewer rows.
    • Use different libraries: ranger instead of randomForest, and xgboost or C5.0 instead of gbm. Keep in mind that ensemble methods fit a ton of constituent models and are bound to take a while.
    • The package has a racing-type algorithm for tuning parameters in less time (adaptive resampling, e.g. method = "adaptive_cv" in trainControl()).
    • The development version on GitHub has random search methods for models with a lot of tuning parameters.
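    Several of these suggestions can be combined in a single trainControl() call. The sketch below uses down-sampling inside each resample (sampling = "down"), caret's adaptive, racing-type resampling (method = "adaptive_cv"), random search (search = "random"), and the ranger backend in place of randomForest. The simulated data, fold and repeat counts, adaptive settings and num.trees = 100 are illustrative assumptions, not recommendations. The sketch also assumes a caret release that includes random search, which current CRAN versions do.

```r
library(caret)   # train(), trainControl(), twoClassSummary()
# method = "ranger" also needs the ranger package installed

set.seed(2021)

# Hypothetical stand-in for the real data: 2,000 rows, 20 numeric features,
# and an imbalanced two-class outcome (roughly 8% in the "minority" class)
n   <- 2000
dat <- as.data.frame(matrix(rnorm(n * 20), ncol = 20))
dat$Class <- factor(ifelse(dat$V1 + rnorm(n) > 2, "minority", "majority"))

ctrl <- trainControl(
  method   = "adaptive_cv",   # racing-type resampling: drops weak candidates early
  number   = 5,               # 5-fold CV ...
  repeats  = 3,               # ... repeated 3 times (up to 15 resamples per candidate)
  adaptive = list(min = 5, alpha = 0.05, method = "gls", complete = FALSE),
  search   = "random",        # sample tuning parameter values at random
  sampling = "down",          # down-sample the majority class within each resample
  classProbs      = TRUE,
  summaryFunction = twoClassSummary
)

fit <- train(
  Class ~ ., data = dat,
  method     = "ranger",      # ranger instead of randomForest
  metric     = "ROC",
  tuneLength = 5,             # number of random candidates to try
  trControl  = ctrl,
  num.trees  = 100            # passed through to ranger::ranger()
)

print(fit)
```

    If a parallel backend such as doParallel is registered before calling train(), the resamples are spread across the workers, with the memory cost described in the first point.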

    Max
