Using randomForest package in R, how to get probabilities from classification model?

前端未结


说谎

TL;DR:

Is there something I can flag in the original randomForest call to avoid having to re-run the


1 Answer

不知归路

2021-01-31 19:31
`model$predicted` is NOT the same thing that `predict()` returns when you give it new data. If you want the probability of the TRUE or FALSE class, you must either run `predict()` with `type="prob"`, or pass x, y, xtest and ytest like this:
```
randomForest(x, y, xtest = x, ytest = y)
```
where `x = out.data[, feature.cols]` and `y = out.data[, response.col]`.

`model$predicted` holds, for each record, the class with the largest value in `model$votes`. As @joran pointed out, `votes` is the proportion of out-of-bag (OOB) "votes" from the random forest: a tree's vote only counts for a record when that record was out of bag for that tree, i.e. not in the bootstrap sample used to grow it. On the other hand, `predict()` on new data returns, for each class, the fraction of votes cast by all the trees.
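A minimal sketch of the OOB-versus-all-trees difference, assuming the built-in `iris` data as a stand-in for `out.data` (the question's actual `feature.cols` and `response.col` are not shown) and an arbitrary seed. Note that `y` must be a factor; with a numeric or logical response, `randomForest` fits a regression forest instead.

```r
library(randomForest)

set.seed(42)                 # arbitrary seed, for reproducibility only
x <- iris[, 1:4]             # stand-in for out.data[, feature.cols]
y <- iris$Species            # stand-in for out.data[, response.col]; must be a factor

model <- randomForest(x, y, ntree = 500)

# OOB vote fractions: each row only counts trees for which that record was out of bag
oob_votes <- model$votes

# The OOB predicted class is a class with the largest OOB vote fraction
n <- nrow(oob_votes)
pred_idx <- as.integer(model$predicted)
is_argmax <- oob_votes[cbind(seq_len(n), pred_idx)] == apply(oob_votes, 1, max)

# Without newdata, predict() simply returns the stored OOB vote fractions
oob_prob <- predict(model, type = "prob")

# With newdata, every tree votes, so training rows typically look much more confident
all_prob <- predict(model, newdata = x, type = "prob")

head(oob_votes)
head(all_prob)
```

Comparing the two `head()` outputs shows why `model$votes` is the better honest estimate on training data: `all_prob` includes trees that saw each row during training.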

Calling `randomForest(x,y,xtest=x,ytest=y)` works a little differently from passing a formula or simply calling `randomForest(x,y)`, as in the example given above. It DOES return the probability for each class; perhaps surprisingly, it is stored under `model$test$votes`, with the predicted class under `model$test$predicted`, which is simply the class with the largest value in `model$test$votes`. Because every tree votes on the test set, these are not OOB votes. Meanwhile, `model$predicted` and `model$votes` keep the same OOB meaning described above.
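A sketch of the `xtest`/`ytest` route, again assuming `iris` stands in for `out.data` and using an arbitrary seed. It shows where the test-set probabilities and classes end up, and that no forest is stored by default in this mode.

```r
library(randomForest)

set.seed(42)                 # arbitrary seed, for reproducibility only
x <- iris[, 1:4]             # stand-in for out.data[, feature.cols]
y <- iris$Species            # stand-in for out.data[, response.col]; must be a factor

# Pass the training data again as the test set
model <- randomForest(x, y, xtest = x, ytest = y)

# Class probabilities for each row of xtest, from all trees
test_prob <- model$test$votes

# Predicted class for each row of xtest: a column with the largest value in test_prob
test_pred <- model$test$predicted

# The OOB components keep their usual meaning
oob_votes <- model$votes

# When xtest is given, the forest itself is not kept unless keep.forest = TRUE
forest_kept <- !is.null(model$forest)

head(test_prob)
forest_kept
```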

Finally, note that when `randomForest(x,y,xtest=x,ytest=y)` is used, the forest is not kept by default, so in order to call `predict()` afterwards the `keep.forest` flag must be set to TRUE:
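```
library(randomForest)
# keep.forest = TRUE is required here because xtest is supplied
model <- randomForest(x, y, xtest = x, ytest = y, keep.forest = TRUE)
prob <- predict(model, x, type = "prob")
```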
`prob` will then be equal to `model$test$votes`, since both are computed by the same forest on the same input `x`.
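A sketch that checks both points, assuming `iris` as a stand-in for `out.data` and an arbitrary seed: without `keep.forest = TRUE`, `predict()` fails because there are no trees to use, and with it, `predict(..., type = "prob")` reproduces `model$test$votes`.

```r
library(randomForest)

set.seed(42)                 # arbitrary seed, for reproducibility only
x <- iris[, 1:4]             # stand-in for out.data[, feature.cols]
y <- iris$Species            # stand-in for out.data[, response.col]; must be a factor

# Without keep.forest = TRUE, predict() has no forest to work with and raises an error
no_forest <- randomForest(x, y, xtest = x, ytest = y)
err <- tryCatch(predict(no_forest, x, type = "prob"), error = function(e) e)
inherits(err, "error")

# With the forest kept, predict() can score new data as usual
model <- randomForest(x, y, xtest = x, ytest = y, keep.forest = TRUE)
prob <- predict(model, x, type = "prob")

# Same forest, same input rows: the two probability matrices should agree
all.equal(as.numeric(prob), as.numeric(model$test$votes))
```

Comparing via `as.numeric()` sidesteps any differences in attributes (class or dimnames) between the two matrices and checks only the probabilities themselves.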