Problematic Random Forest training runtime when using formula interface

挽巷 2021-02-09 01:50

Running the Random Forest example from http://www.kaggle.com/c/icdar2013-gender-prediction-from-handwriting/data, the following line:

forest_model <- randomFo         


        
2 Answers
  • 2021-02-09 02:19

    Found the problem: using the formula interface in randomForest caused a tremendous performance degradation.

    More on this, and on how to estimate random forest running time, can be found at https://stats.stackexchange.com/questions/37370/random-forest-computing-time-in-r and at http://www.gregorypark.org/?p=286

    Here is the final code:

    forest_model <- randomForest(y = train$male, x = train[, -2], ntree = 10000, do.trace = TRUE)
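
    To check the gap between the two interfaces on your own machine, a minimal sketch on synthetic data might look like the following. The sizes `n` and `p`, the `ntree` value, and the column layout (here `male` is column 1, not column 2 as in the Kaggle data) are assumptions made up for illustration.

    ```r
    library(randomForest)  # assumes the randomForest package is installed

    set.seed(42)
    n <- 200   # hypothetical number of rows
    p <- 500   # hypothetical number of predictors (wide data exaggerates the gap)

    # Synthetic stand-in for the training set: a factor response plus p numeric columns
    x <- as.data.frame(matrix(rnorm(n * p), nrow = n))
    y <- factor(sample(c("0", "1"), n, replace = TRUE))
    train <- cbind(male = y, x)

    # Formula interface: R builds a model frame from the formula before fitting
    t_formula <- system.time(
      fit_formula <- randomForest(male ~ ., data = train, ntree = 50)
    )["elapsed"]

    # x/y interface: predictors and response are passed directly
    t_xy <- system.time(
      fit_xy <- randomForest(x = train[, -1], y = train$male, ntree = 50)
    )["elapsed"]

    # Elapsed seconds for each call; values depend on the machine and data shape
    print(c(formula = unname(t_formula), xy = unname(t_xy)))
    ```

    Timing a small `ntree` first and scaling up is a cheap way to estimate how long the full 10000-tree run would take, since the tree-growing time grows roughly linearly with the number of trees.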
    
  • 2021-02-09 02:28

    One idea: to monitor convergence, use the do.trace argument, which turns on verbose output.

    iris.rf <- randomForest(Species ~ ., data=iris, importance=TRUE,
                            proximity=TRUE, do.trace=TRUE)
    ntree      OOB      1      2      3
        1:   8.62%  0.00%  9.52% 15.00%
        2:   5.49%  0.00%  3.45% 13.79%
        3:   5.45%  0.00%  5.41% 11.76%
        4:   4.72%  0.00%  4.88%  9.30%
        5:   5.11%  0.00%  6.52%  8.89%
        6:   5.56%  2.08%  6.25%  8.33%
        7:   4.76%  0.00%  6.12%  8.16%
        8:   5.41%  0.00%  8.16%  8.16%
     .......
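
    Printing every tree gets noisy for large forests. As a sketch of a quieter variant, `do.trace` also accepts an integer, in which case progress is printed once per that many trees, and the same OOB error curve is stored in the fitted object's `err.rate` matrix. The `ntree` value and the trace interval of 100 below are arbitrary choices for illustration.

    ```r
    library(randomForest)  # assumes the randomForest package is installed

    set.seed(1)
    # Print progress every 100 trees instead of after every single tree
    iris.rf <- randomForest(Species ~ ., data = iris, ntree = 500, do.trace = 100)

    # err.rate has one row per tree; the "OOB" column is the running out-of-bag error
    oob <- iris.rf$err.rate[, "OOB"]

    # Final OOB error and the forest size at which the OOB error was lowest
    print(tail(oob, 1))
    print(which.min(oob))
    ```

    If the OOB column has flattened out well before the last tree, adding more trees is unlikely to change the error much.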
    