Does random forest in R have a limit on the size of the training data?

忘掉有多难 2021-02-04 20:17

I am training a randomForest model on training data that has 114954 rows and 135 columns (predictors), and I am getting the following error.

model <- randomForest         


        
2 answers
  • 2021-02-04 20:57

    If you can't use a machine with more memory, one alternative is to train separate models on subsets of the data (say, 10 subsets) and then combine the outputs of the models in a sensible way. The easiest way is to average the predictions of the 10 models, but there are other ways to ensemble models: http://en.wikipedia.org/wiki/Ensemble_learning

    Technically you would be using all of your data without hitting the memory restriction, but if the subsets end up too small, the resulting models might be too weak to be of any use.
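
    A minimal sketch of the subset-and-average idea, assuming a regression problem. The simulated data, the column names, and the choices k = 10 and ntree = 100 are illustrative assumptions, not taken from the question:

    ```r
    library(randomForest)

    set.seed(42)
    # Hypothetical stand-in data; replace with your own training and test frames.
    n <- 2000
    train <- data.frame(matrix(rnorm(n * 10), ncol = 10))
    train$y <- train$X1 + 2 * train$X2 + rnorm(n)
    test <- data.frame(matrix(rnorm(200 * 10), ncol = 10))

    # Randomly assign each training row to one of k subsets.
    k <- 10
    fold <- sample(rep(seq_len(k), length.out = nrow(train)))

    # Fit one forest per subset, so each fit only sees about 1/k of the rows.
    models <- lapply(seq_len(k), function(i) {
      randomForest(y ~ ., data = train[fold == i, ], ntree = 100)
    })

    # One column of predictions per model, then average across models.
    preds <- sapply(models, predict, newdata = test)
    avg_pred <- rowMeans(preds)
    ```

    For classification, the analogous step would be to average class probabilities from predict(..., type = "prob") rather than the predicted labels.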

  • 2021-02-04 21:09

    As stated in an answer to a previous question (which I can't find now), increasing the sample size raises the memory requirements of RF nonlinearly. Not only is the model matrix larger, but each tree is also larger by default, because tree size is governed by the number of points per leaf.

    To fit the model given your memory constraints, you can do the following:

    1. Increase the nodesize parameter to something bigger than the default, which is 5 for a regression RF (and 1 for classification). With 114k observations, you should be able to increase this significantly without hurting performance.

    2. Reduce the number of trees per RF with the ntree parameter. Fit several small RFs, then merge them with the combine function to produce the entire forest.
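
    Both steps can be sketched together as follows. This is only an illustration under assumed values: the simulated data, nodesize = 50, and four forests of 50 trees each are hypothetical choices that you would tune for your own data:

    ```r
    library(randomForest)

    set.seed(1)
    # Hypothetical stand-in data for a regression problem.
    n <- 2000
    x <- data.frame(matrix(rnorm(n * 10), ncol = 10))
    y <- x$X1 - x$X2 + rnorm(n)

    # Step 1: a larger nodesize stops splitting earlier, giving smaller trees.
    # Step 2: a small ntree keeps each individual fit cheap.
    fit_small <- function(ntree) {
      randomForest(x, y, ntree = ntree, nodesize = 50)
    }

    # Fit four small forests, then merge them into a single forest object.
    forests <- lapply(rep(50, 4), fit_small)
    rf <- do.call(randomForest::combine, forests)

    # The merged forest predicts like any other randomForest object.
    pred <- predict(rf, x)
    ```

    Calling randomForest::combine with its namespace avoids clashes with other packages that export a function named combine. Note that the merged object no longer carries the out-of-bag error components of the individual fits.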
