How can I get the OOB samples used for each tree in a random forest model in R?

落花浮王杯 submitted on 2021-01-04 05:53:50

Question


Is it possible to get the OOB samples that the random forest algorithm uses for each tree? I'm using R. I know that random forest grows each tree on roughly two-thirds of the data (selected at random) and uses the remaining third or so as OOB samples to measure the OOB error, but I don't know how to get those OOB samples for each tree.

Any ideas?


Answer 1:


Assuming you are using the randomForest package, you just need to set the keep.inbag argument to TRUE.

library(randomForest)
set.seed(1)
rf <- randomForest(Species ~ ., iris, keep.inbag = TRUE)

The output list will contain an n by ntree matrix that can be accessed by the name inbag.

dim(rf$inbag)
# [1] 150 500

rf$inbag[1:5, 1:3]
#   [,1] [,2] [,3]
# 1    0    1    0
# 2    1    1    0
# 3    1    0    1
# 4    1    0    1
# 5    0    0    2

The values in the matrix tell you how many times a sample was in-bag. For example, the value of 2 in row 5 column 3 above says that the 5th observation was included in-bag twice for the 3rd tree.

As a bit of background here, a sample can show up in-bag more than once (hence the 2) because by default the sampling is done with replacement.
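Since the question asks for the OOB samples rather than the in-bag ones, here is a minimal sketch of how they might be derived from the same matrix: a row is out-of-bag for a tree exactly when its inbag count for that tree is 0. The names oob_rows and oob_data_tree1 are made up for illustration, and the final check assumes the fitted object also carries the oob.times component (the number of trees for which each observation was OOB).

```r
library(randomForest)

set.seed(1)
rf <- randomForest(Species ~ ., iris, keep.inbag = TRUE)

# For each tree (one column of inbag), the OOB rows are those drawn zero times.
# oob_rows is a hypothetical name: a list with one vector of row indices per tree.
oob_rows <- lapply(seq_len(rf$ntree), function(t) which(rf$inbag[, t] == 0))

# Row indices of iris that were out-of-bag for the first tree
head(oob_rows[[1]])

# The OOB observations themselves for the first tree
oob_data_tree1 <- iris[oob_rows[[1]], ]

# Sanity check: per-observation OOB counts should agree with rf$oob.times
all(rowSums(rf$inbag == 0) == rf$oob.times)
```

Keeping the result as a list (rather than a matrix) is deliberate, because the number of OOB rows generally differs from tree to tree.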

You can also sample without replacement via the replace parameter.

set.seed(1)
rf2 <- randomForest(Species ~ ., iris, keep.inbag = TRUE, replace = FALSE)

Now we can verify that, without replacement, no sample is included more than once in any tree.

# with replacement, the maximum number of times a sample is included in a tree is 7
max(rf$inbag)
# [1] 7

# without replacement, the maximum number of times a sample is included in a tree is 1
max(rf2$inbag)
# [1] 1
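The in-bag/OOB split mentioned in the question can also be checked from these matrices. The sketch below computes the average fraction of rows that are OOB per tree for both forests; oob_frac_with and oob_frac_without are illustrative names. With replacement, drawing n rows out of n leaves each row out with probability (1 - 1/n)^n, which is about 0.368, so the first value should land near that. Without replacement the fraction depends on sampsize, whose default is, as far as I know, ceiling(0.632 * n), so the second value should be similar.

```r
library(randomForest)

set.seed(1)
rf  <- randomForest(Species ~ ., iris, keep.inbag = TRUE)
set.seed(1)
rf2 <- randomForest(Species ~ ., iris, keep.inbag = TRUE, replace = FALSE)

# Fraction of rows that are OOB in each tree, averaged over all trees
oob_frac_with    <- mean(colMeans(rf$inbag == 0))
oob_frac_without <- mean(colMeans(rf2$inbag == 0))

oob_frac_with
oob_frac_without
```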


Source: https://stackoverflow.com/questions/47728851/how-can-i-get-the-oob-samples-used-for-each-tree-in-random-forest-model-r
