I have a highly imbalanced data set with target class instances in the following ratio 60000:1000:1000:50
(i.e. a total of 4 classes). I want to use randomFor
You can pass a named vector to classwt.
But how the weights should be calculated is tricky.
For example, if your target variable y has two classes "Y" and "N", and you want to set balanced weights, you should do:
wn = sum(y == "N") / length(y)
wy = 1
Then set classwt = c("N"=wn, "Y"=wy)
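For the four-class case in the question, the same idea can be sketched as below. This is only a minimal sketch under stated assumptions: it uses inverse class frequency as the weight (a common convention, not the two-class formula above), the class labels "A" to "D" and the feature x1 are made up for illustration, and the counts are scaled down by 10 (same ratio) to keep it quick. The randomForest call is skipped if the package is not installed.

```r
# Simulated labels in the question's 60000:1000:1000:50 ratio, scaled down by 10.
# The class names "A".."D" are hypothetical.
counts <- c(A = 6000, B = 100, C = 100, D = 5)
y <- factor(rep(names(counts), times = counts), levels = names(counts))

# Inverse-frequency weights, rescaled so the largest class gets weight 1.
freq <- table(y)
wt <- as.numeric(max(freq) / freq)
names(wt) <- names(freq)
print(wt)  # A = 1, B = 60, C = 60, D = 1200

# Pass the named vector to classwt; the names must match levels(y).
if (requireNamespace("randomForest", quietly = TRUE)) {
  set.seed(1)
  x <- data.frame(x1 = rnorm(length(y)))  # placeholder feature, pure noise
  fit <- randomForest::randomForest(x = x, y = y, classwt = wt, ntree = 10)
}
```

Whether these particular weights help is something to check on held-out data, for example by comparing per-class recall with and without classwt.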
Alternatively, you may want to use the ranger package. It offers flexible builds of random forests, and specifying class or sample weights is easy. ranger is also supported by the caret package.
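A minimal sketch of both weighting options in ranger follows. The toy data, the variable names, and the inverse-frequency weights are assumptions made for illustration. class.weights is given in the order of the factor levels, while case.weights takes one weight per row and affects how observations are sampled for each tree. The ranger calls are skipped if the package is not installed.

```r
# Toy two-class data; every name here is hypothetical.
set.seed(1)
df <- data.frame(
  y  = factor(rep(c("N", "Y"), times = c(950, 50)), levels = c("N", "Y")),
  x1 = rnorm(1000)
)

# Class weights in the order of levels(df$y): "N" first, then "Y".
freq <- table(df$y)
cw <- as.numeric(max(freq) / freq)

# Alternative: one weight per observation, looked up from its class.
sw <- cw[as.integer(df$y)]

if (requireNamespace("ranger", quietly = TRUE)) {
  # Cost-sensitive splitting via class weights.
  fit_cw <- ranger::ranger(y ~ ., data = df, class.weights = cw, num.trees = 100)
  # Weighted sampling of training rows via case weights.
  fit_sw <- ranger::ranger(y ~ ., data = df, case.weights = sw, num.trees = 100)
}
```

The two options are not equivalent, so it is worth trying both and comparing per-class error.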