What splitting criterion does Random Tree in Weka 3.7.11 use for numerical attributes?

Question

I'm using RandomForest from Weka 3.7.11, which in turn bags Weka's RandomTree. My input attributes are numerical, and the output attribute (label) is also numerical.

When training the RandomTree, K attributes are chosen at random for each node of the tree. Several splits based on those attributes are attempted and the "best" one is chosen. How does Weka determine what split is best in this (numerical) case?

For nominal attributes I believe Weka is using the information gain criterion which is based on conditional entropy.

IG(T|a) = H(T) - H(T|a)

Is something similar used for numerical attributes? Maybe differential entropy?

Answer 1:

When a tree splits on a numerical attribute, it splits on a condition such as a > 5. That condition effectively becomes a binary variable, so the criterion (information gain) is exactly the same as for a nominal attribute.
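To make the threshold idea concrete, here is a minimal sketch of information gain for a binary test `value > threshold` on a numerical attribute with class labels. It is a generic illustration, not Weka's code: the function names, the toy data, and the choice of midpoints between consecutive distinct values as candidate thresholds are all assumptions made for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain_for_threshold(values, labels, threshold):
    """Information gain H(T) - H(T|a) for the binary test `value > threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    # Conditional entropy: size-weighted entropy of the two branches.
    conditional = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - conditional


def best_threshold(values, labels):
    """Return (threshold, gain) with the highest gain.

    Candidate thresholds are midpoints between consecutive distinct values
    (an assumption of this sketch). Needs at least two distinct values.
    """
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    scored = ((t, info_gain_for_threshold(values, labels, t)) for t in candidates)
    return max(scored, key=lambda pair: pair[1])


# Hypothetical toy data: the classes separate cleanly between 3.0 and 6.0.
values = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
labels = ["no", "no", "no", "yes", "yes", "yes"]
threshold, gain = best_threshold(values, labels)
print(threshold, gain)
```

On this toy data the split at 4.5 leaves both branches pure, so the gain equals the full entropy of the labels (1 bit).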

P.S. For regression, the commonly used criterion is the sum of squared errors (computed for each leaf, then summed over the leaves). I don't know what Weka does specifically, though.
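For a numerical label, the sum-of-squared-errors criterion can be sketched the same way: each candidate threshold is scored by the SSE of the two resulting leaves, and the lowest total wins. Again, this is only an illustration of the general technique under the same assumptions as before (hypothetical function names, toy data, midpoint candidates), not a description of what Weka's RandomTree actually does.

```python
def sse(targets):
    """Sum of squared errors of `targets` around their own mean."""
    if not targets:
        return 0.0
    mean = sum(targets) / len(targets)
    return sum((y - mean) ** 2 for y in targets)


def sse_after_split(values, targets, threshold):
    """Total SSE over the two leaves produced by the test `value > threshold`."""
    left = [y for x, y in zip(values, targets) if x <= threshold]
    right = [y for x, y in zip(values, targets) if x > threshold]
    return sse(left) + sse(right)


def best_regression_threshold(values, targets):
    """Return (threshold, total_sse) with the lowest total SSE.

    Candidate thresholds are midpoints between consecutive distinct values
    (an assumption of this sketch). Needs at least two distinct values.
    """
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    scored = ((t, sse_after_split(values, targets, t)) for t in candidates)
    return min(scored, key=lambda pair: pair[1])


# Hypothetical toy data: targets jump from about 11 to about 31 between 3.0 and 6.0.
reg_values = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
reg_targets = [10.0, 11.0, 12.0, 30.0, 31.0, 32.0]
reg_threshold, reg_sse = best_regression_threshold(reg_values, reg_targets)
print(reg_threshold, reg_sse)
```

Minimizing the summed SSE is equivalent to maximizing the reduction in SSE relative to the unsplit node, since the parent's SSE is the same for every candidate threshold.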

Source: https://stackoverflow.com/questions/30150970/what-splitting-criterion-does-random-tree-in-weka-3-7-11-use-for-numerical-attri

Tags

machine-learning

weka

random-forest

decision-tree
