Regarding RandomTree in Weka

Posted by 一笑奈何 on 2019-12-10 21:09:58

Question


I was playing around with Weka when I noticed a minNum field in the RandomTree configuration. Its description reads "The minimum total weight of the instances in a leaf", but I couldn't really understand what that means.

I experimented with that number and found that when I increase it, the generated tree gets smaller. I couldn't work out why this happens.

Any help/references will be appreciated.


Answer 1:


This is the minimum number of instances (more precisely, the total instance weight) required at a leaf node; in decision trees such as J48 the equivalent setting often defaults to 2. Raising it stops small nodes from being split any further, so the tree ends up with fewer nodes. The higher you set this parameter, the more general the tree will be, since having many leaves that each cover only a few instances yields an overly granular tree structure.

Here are two examples on the iris dataset, which show how the -M option can affect the size of the resulting tree:

$ weka weka.classifiers.trees.RandomTree -t iris.arff -i

petallength < 2.45 : Iris-setosa (50/0)
petallength >= 2.45
|   petalwidth < 1.75
|   |   petallength < 4.95
|   |   |   petalwidth < 1.65 : Iris-versicolor (47/0)
|   |   |   petalwidth >= 1.65 : Iris-virginica (1/0)
|   |   petallength >= 4.95
|   |   |   petalwidth < 1.55 : Iris-virginica (3/0)
|   |   |   petalwidth >= 1.55
|   |   |   |   sepallength < 6.95 : Iris-versicolor (2/0)
|   |   |   |   sepallength >= 6.95 : Iris-virginica (1/0)
|   petalwidth >= 1.75
|   |   petallength < 4.85
|   |   |   sepallength < 5.95 : Iris-versicolor (1/0)
|   |   |   sepallength >= 5.95 : Iris-virginica (2/0)
|   |   petallength >= 4.85 : Iris-virginica (43/0)

Size of the tree : 17

$ weka weka.classifiers.trees.RandomTree -M 6 -t iris.arff -i

petallength < 2.45 : Iris-setosa (50/0)
petallength >= 2.45
|   petalwidth < 1.75
|   |   petallength < 4.95
|   |   |   petalwidth < 1.65 : Iris-versicolor (47/0)
|   |   |   petalwidth >= 1.65 : Iris-virginica (1/0)
|   |   petallength >= 4.95 : Iris-virginica (6/2)
|   petalwidth >= 1.75
|   |   petallength < 4.85 : Iris-virginica (3/1)
|   |   petallength >= 4.85 : Iris-virginica (43/0)

Size of the tree : 11
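
To make the effect of -M concrete, here is a minimal Python sketch of a one-attribute tree grower with a minimum-weight stopping rule. It is a toy, not Weka's code: the names `grow`, `tree_size` and `DATA` are made up for illustration, every instance has weight 1, and the rule "do not split a node holding fewer than 2 * min_num instances" is an assumption, chosen because it is consistent with the output above (a leaf with a single instance still appears under -M 6, while the nodes with 6 and 3 instances are no longer split).

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def make_leaf(points):
    labels = [y for _, y in points]
    # sorted() makes the majority vote deterministic when there is a tie
    majority = max(sorted(set(labels)), key=labels.count)
    return {"leaf": majority, "n": len(points)}


def grow(points, min_num):
    """Grow a tree on (x, label) pairs; each instance has weight 1.

    Assumed stopping rule (not taken from Weka's source): a node is
    turned into a leaf if it is pure or holds fewer than 2 * min_num
    instances.
    """
    labels = [y for _, y in points]
    if len(set(labels)) == 1 or len(points) < 2 * min_num:
        return make_leaf(points)

    best = None
    xs = sorted(set(x for x, _ in points))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2  # candidate threshold between two distinct values
        left = [p for p in points if p[0] < t]
        right = [p for p in points if p[0] >= t]
        score = (len(left) * gini([y for _, y in left])
                 + len(right) * gini([y for _, y in right])) / len(points)
        if best is None or score < best[0]:
            best = (score, t, left, right)

    if best is None:  # all x values identical, nothing to split on
        return make_leaf(points)

    _, t, left, right = best
    return {"threshold": t,
            "left": grow(left, min_num),
            "right": grow(right, min_num)}


def tree_size(tree):
    """Number of nodes (internal nodes plus leaves)."""
    if "leaf" in tree:
        return 1
    return 1 + tree_size(tree["left"]) + tree_size(tree["right"])


# Twelve made-up instances whose classes are interleaved along x.
DATA = [(i, label) for i, label in enumerate("aaabaabbabbb")]

if __name__ == "__main__":
    for m in (1, 2, 4, 6, 7):
        print("min_num =", m, "-> size of the tree:", tree_size(grow(DATA, m)))
```

Because the split chosen at a node depends only on the instances that reach it, a larger `min_num` can only cut branches off the tree grown with a smaller value, which is why the size never increases as the parameter goes up.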

As a side note, RandomTree considers only K randomly chosen attributes when splitting each node. This is attribute subsampling rather than bagging (which resamples instances); RandomForest combines the two by bagging such trees. Unlike REPTree, RandomTree does no pruning, so you may end up with very noisy trees.
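
The "K randomly chosen attributes" step can be sketched in Python as follows. This is an illustration under assumptions, not Weka's implementation: the helper names are hypothetical, the four rows are just iris-style sample values, Gini impurity stands in for whatever split criterion Weka uses, and `default_k` encodes what I understand to be the documented default for -K (int(log2(#predictors) + 1) when K is below 1), so check the RandomTree documentation before relying on it.

```python
import math
import random


def default_k(num_predictors):
    # Assumption: the documented default when -K is below 1.
    return int(math.log2(num_predictors) + 1)


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split_among(rows, labels, candidate_attrs):
    """Best (weighted_gini, attribute_index, threshold) using only the
    given attribute indices, or None if none of them can be split."""
    best = None
    for a in candidate_attrs:
        values = sorted(set(r[a] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[a] < t]
            right = [y for r, y in zip(rows, labels) if r[a] >= t]
            score = (len(left) * gini(left)
                     + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, a, t)
    return best


def random_split(rows, labels, k, rng):
    """Draw k attribute indices at random, then split on the best of them."""
    candidates = rng.sample(range(len(rows[0])), k)
    return candidates, best_split_among(rows, labels, candidates)


# Columns: sepallength, sepalwidth, petallength, petalwidth
ROWS = [
    (5.1, 3.5, 1.4, 0.2),
    (4.9, 3.0, 1.4, 0.2),
    (7.0, 3.2, 4.7, 1.4),
    (6.4, 3.2, 4.5, 1.5),
]
LABELS = ["setosa", "setosa", "versicolor", "versicolor"]

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the draw is repeatable
    k = default_k(len(ROWS[0]))
    print("K =", k)
    print(random_split(ROWS, LABELS, k, rng))
```

When the draw happens to leave out the most informative attribute, the node has to settle for a weaker split, and with no pruning afterwards those weaker splits stay in the final tree.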



Source: https://stackoverflow.com/questions/4845812/regarding-randomtree-in-weka
