问题
I'm using RandomForest from Weka 3.7.11 which in turn is bagging Weka's RandomTree. My input attributes are numerical and the output attribute(label) is also numerical.
When training the RandomTree, K attributes are chosen at random for each node of the tree. Several splits based on those attributes are attempted and the "best" one is chosen. How does Weka determine what split is best in this (numerical) case?
For nominal attributes I believe Weka is using the information gain criterion which is based on conditional entropy.
IG(T|a) = H(T) - H(T|a)
Is something similar used for numerical attributes? Maybe differential entropy?
回答1:
When tree is split on numerical attribute, it is split on the condition like a>5
. So, this condition effectively becomes binary variable and the criterion (information gain) is absolutely the same.
P.S. For regression commonly used is the sum of squared errors (for each leaf, then sum over leaves). But I do not know specifically about Weka
来源:https://stackoverflow.com/questions/30150970/what-splitting-criterion-does-random-tree-in-weka-3-7-11-use-for-numerical-attri