I have a data set where the classes are unbalanced. The classes are either 0, 1 or 2.
How can I calculate the prediction error fo
If the frequency of class A is 10% and the frequency of class B is 90%, then class B becomes the dominant class and your decision tree will be biased toward it.
In this case, you can pass a dict such as {A: 9, B: 1} (the inverse of the 10%/90% frequency ratio) to the model to specify the weight of each class, like this:
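```python
from sklearn import tree

# "A" and "B" stand for your actual class labels (e.g. 0, 1 and 2 in the question)
clf = tree.DecisionTreeClassifier(class_weight={"A": 9, "B": 1})
```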
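Here is a minimal runnable sketch of the same idea for the three labels in the question (0, 1, 2). The synthetic data from `make_classification`, the 80%/15%/5% class split, and the weights `{0: 1, 1: 5, 2: 16}` are hypothetical choices for illustration. It also measures the per-class error as 1 minus the per-class recall (the share of each true class that gets misclassified), which is one reasonable way to check whether the weighting helps the minority classes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical imbalanced 3-class data: roughly 80% / 15% / 5%.
X, y = make_classification(
    n_samples=5000, n_features=10, n_informative=5, n_classes=3,
    weights=[0.80, 0.15, 0.05], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Illustrative weights, roughly inverse to the class frequencies above.
clf = DecisionTreeClassifier(class_weight={0: 1, 1: 5, 2: 16}, random_state=0)
clf.fit(X_train, y_train)

# Rows of the confusion matrix are true classes, columns are predictions.
cm = confusion_matrix(y_test, clf.predict(X_test), labels=[0, 1, 2])

# Per-class error = 1 - recall = misclassified / total for each true class.
per_class_error = 1 - np.diag(cm) / cm.sum(axis=1)
for label, err in zip([0, 1, 2], per_class_error):
    print(f"class {label}: error = {err:.3f}")
```

Refitting with `class_weight=None` and comparing `per_class_error` is a simple way to see the effect of the weights on each class.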
Setting class_weight='balanced' also works: it automatically sets each class's weight inversely proportional to that class's frequency in the training data.
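As a sketch of what 'balanced' computes, scikit-learn documents the formula as `n_samples / (n_classes * np.bincount(y))`. The labels below are hypothetical, reusing the 10%/90% example with class A as 0 and class B as 1; the resulting weights come out in the same 9:1 ratio as the hand-written dict.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels: 10% class 0 ("A"), 90% class 1 ("B").
y = np.array([0] * 10 + [1] * 90)
classes = np.unique(y)

# Weights that class_weight='balanced' would use.
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)

# The same formula written out by hand.
manual = len(y) / (len(classes) * np.bincount(y))

print(dict(zip(classes.tolist(), weights.tolist())))
print(f"ratio A:B = {weights[0] / weights[1]:.1f} : 1")
```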
After I used class_weight='balanced', the number of records of each class became the same (around 88923).
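A possible explanation, assuming the 88923 figure is a weighted count (for example the per-class value shown for a fitted tree node) rather than a count of rows: class_weight does not duplicate or drop records, it only reweights them. A class c with n_c records gets weight n / (k · n_c), where n is the number of samples and k the number of classes, so its total weight is n_c · n / (k · n_c) = n / k, the same for every class. Under that assumption, three classes at about 88923 each would correspond to roughly 3 × 88923 = 266,769 training rows. The sketch below checks this on hypothetical labels using `compute_sample_weight`.

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

# Hypothetical imbalanced labels for classes 0, 1 and 2.
y = np.array([0] * 700 + [1] * 250 + [2] * 50)

# Per-sample weights implied by class_weight='balanced'.
sample_weight = compute_sample_weight(class_weight="balanced", y=y)

# Actual records per class: unchanged by the weighting.
raw_counts = np.bincount(y)

# Total weight per class: n / k for every class.
weighted_counts = np.bincount(y, weights=sample_weight)

print(raw_counts)
print(weighted_counts)
```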