I feel uncomfortable with the meaning of the stepFactor parameter of the tuneRF function, which is used for tuning the mtry parameter that is then passed to randomForest. Below is a summary of how tuneRF works (a small code sketch of this procedure follows the list):
1a. Set mtry to the default value of sqrt(p) for classification, and p/3 for regression (where p = total number of variables).
1b. Compute the out-of-bag (OOB) error (say error_default) for a random forest with mtry set to the default value found above.
2a. Look to the left: set mtry = default value / stepFactor. For instance, if stepFactor = 1.5 and your default starting value is 8, mtry would be set to 8/1.5 = 5.33, rounded up to an integer, which gives 6.
2b. Compute the OOB error, say error_left.
3a. Look to the right: set mtry = default value * stepFactor. To continue with my example, mtry would be set to 8*1.5 = 12.
3b. Compute the OOB error, say error_right.
4.i. If (error_default < error_right) AND (error_default < error_left), the best mtry is the default value.
4.ii. If the previous condition is not met, but the (relative) improvement of error_right/error_left over error_default is less than the improve parameter, the best mtry is the default value.
4.iii. Otherwise, assuming without loss of generality that error_right < error_left: if the relative improvement (error_default - error_right)/error_default is greater than improve, set mtry to mtry_right (12). From now on, always go to the right.
5. If 4.iii. holds, iterate: set mtry to mtry_right * stepFactor (in my example, 12*1.5 = 18), compute the OOB error and compare it with the error obtained at the previous step (in my example, for mtry = 12). If the new error is smaller, and if the gain in error reduction is large enough (i.e., > improve), select the new mtry and repeat these steps; otherwise stop and return the current mtry as the best mtry.
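To make the steps above concrete, here is a minimal R sketch of the search as I have described it. This is my own illustration, not the actual tuneRF source: the helper oob_error() and the function tune_mtry_sketch() are names I made up, and I am assuming that improve is a relative improvement in OOB error, as the package documentation suggests.

    library(randomForest)

    ## OOB error of a single forest: last OOB error rate for classification,
    ## last MSE for regression (oob_error is a helper made up for this sketch)
    oob_error <- function(x, y, mtry, ntree = 50) {
      rf <- randomForest(x, y, mtry = mtry, ntree = ntree)
      if (is.factor(y)) rf$err.rate[ntree, "OOB"] else rf$mse[ntree]
    }

    ## Sketch of the search described above (NOT the actual tuneRF code)
    tune_mtry_sketch <- function(x, y, stepFactor = 2, improve = 0.05, ntree = 50) {
      p <- ncol(x)
      mtry_default  <- if (is.factor(y)) floor(sqrt(p)) else max(1, floor(p / 3))  # step 1a
      error_default <- oob_error(x, y, mtry_default, ntree)                        # step 1b
      mtry_best  <- mtry_default
      error_best <- error_default

      for (direction in c("left", "right")) {       # steps 2-4: try each side of the default
        mtry_cur  <- mtry_default
        error_old <- error_default
        repeat {
          if (direction == "left") {
            mtry_new <- max(1, ceiling(mtry_cur / stepFactor))  # e.g. ceiling(8 / 1.5) = 6
          } else {
            mtry_new <- min(p, floor(mtry_cur * stepFactor))    # e.g. floor(8 * 1.5) = 12
          }
          if (mtry_new == mtry_cur) break           # step too small to move
          error_new <- oob_error(x, y, mtry_new, ntree)
          gain <- 1 - error_new / error_old         # relative improvement in OOB error
          if (gain < improve) break                 # not enough gain: stop on this side
          mtry_cur  <- mtry_new                     # step 5: keep walking in this direction
          error_old <- error_new
          if (error_new < error_best) {             # remember the overall best so far
            error_best <- error_new
            mtry_best  <- mtry_new
          }
        }
      }
      c(mtry = mtry_best, OOBError = error_best)
    }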
The smaller the stepFactor (e.g., 1.1 or 1.2), the more values of mtry you try (a fine search); the larger the stepFactor (e.g., 2 or 2.5), the fewer values you try (a rough search). Also, with low values of improve, the search will continue for longer.
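For what it's worth, here is how one might call tuneRF with a fine versus a rough search. The simulated data set is made up purely to give the search some room to move, and plot = FALSE simply suppresses the diagnostic plot:

    library(randomForest)

    set.seed(1)
    ## Simulated data, made up for illustration: 300 rows, 100 predictors
    x <- data.frame(matrix(rnorm(300 * 100), nrow = 300))
    y <- factor(ifelse(x[[1]] + x[[2]] + rnorm(300) > 0, "A", "B"))

    ## Fine search: small stepFactor and small improve -> more mtry values tried
    fine <- tuneRF(x, y, ntreeTry = 200, stepFactor = 1.2, improve = 0.01, plot = FALSE)

    ## Rough search: large stepFactor and the default improve -> fewer values tried
    rough <- tuneRF(x, y, ntreeTry = 200, stepFactor = 2, improve = 0.05, plot = FALSE)

    fine    # matrix of the (mtry, OOBError) pairs that were evaluated
    rough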