Does random forest in R have a limit on the size of the training data?

忘掉有多难 2021-02-04 20:17

I am training a random forest on my training data, which has 114954 rows and 135 columns (predictors), and I am getting the following error:

model <- randomForest         


        
2 Answers
  •  一生所求
    2021-02-04 21:09

    As was stated in an answer to a previous question (which I can't find now), increasing the sample size affects the memory requirements of RF in a nonlinear way. Not only is the model matrix larger, but each tree is also larger by default: with the default settings, tree growth is limited only by the minimum number of points per leaf (nodesize), so more observations produce more nodes per tree.
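The growth in tree size can be observed directly. The following is a minimal sketch, assuming the randomForest package is installed; it uses simulated regression data (the asker's data is not shown), and the helper make_data() is hypothetical. Two forests are fitted with identical default settings, differing only in the number of rows, and their average number of terminal nodes (via treesize()) and in-memory size (via object.size()) are compared.

```r
library(randomForest)

set.seed(1)

# Hypothetical helper: simulated regression data with n rows and p predictors
make_data <- function(n, p = 10) {
  x <- as.data.frame(matrix(rnorm(n * p), n, p))
  list(x = x, y = x[[1]] + rnorm(n))
}

small <- make_data(1000)
large <- make_data(4000)

# Same settings (default nodesize = 5 for regression); only the sample size differs
rf_small <- randomForest(small$x, small$y, ntree = 20)
rf_large <- randomForest(large$x, large$y, ntree = 20)

# treesize() returns the number of terminal nodes in each tree of the forest
nodes_small <- mean(treesize(rf_small))
nodes_large <- mean(treesize(rf_large))
print(c(small = nodes_small, large = nodes_large))

# Memory footprint of each fitted forest object
print(format(object.size(rf_small), units = "Kb"))
print(format(object.size(rf_large), units = "Kb"))
```

With more rows, each tree has more leaves, so the stored forest grows along with the data even though ntree is unchanged.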

    To fit the model given your memory constraints, you can do the following:

    1. Increase the nodesize parameter beyond its default, which is 5 for a regression RF (1 for classification). With 114k observations, you should be able to increase this significantly without hurting performance.

    2. Reduce the number of trees per forest with the ntree parameter: fit several small forests, then merge them with the combine() function to produce the full forest.
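Both suggestions can be sketched as follows. This is a minimal illustration, assuming the randomForest package is installed and using simulated data as a stand-in for the asker's 114954 x 135 data set; the values nodesize = 50, four chunks and 25 trees per chunk are arbitrary choices for illustration, not tuned recommendations.

```r
library(randomForest)

set.seed(42)

# Simulated stand-in for the real training data (hypothetical)
n <- 5000
p <- 20
x <- as.data.frame(matrix(rnorm(n * p), n, p))
y <- x[[1]] + 0.5 * x[[2]]^2 + rnorm(n)

# 1. A larger nodesize stops splitting earlier, so each tree has fewer nodes
rf_default <- randomForest(x, y, ntree = 20)                 # nodesize = 5 (regression default)
rf_coarse  <- randomForest(x, y, ntree = 20, nodesize = 50)  # illustrative larger value

print(c(default = mean(treesize(rf_default)),
        coarse  = mean(treesize(rf_coarse))))

# 2. Grow several small forests separately, then merge them with combine()
chunks <- lapply(1:4, function(i) {
  randomForest(x, y, ntree = 25, nodesize = 50)
})
rf_all <- do.call(combine, chunks)

# The merged forest contains the trees of all chunks
print(rf_all$ntree)

# It can be used for prediction like a single forest
pred <- predict(rf_all, newdata = x[1:5, ])
```

Note that combine() expects forests fitted on the same predictors, and the merged object does not keep the mse and rsq components of the individual forests.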
