Understanding tree structure in R gbm package

夙愿已清 submitted on 2019-11-30 15:21:02

The first column printed by pretty.gbm.tree is the row.names, which are assigned in the script pretty.gbm.tree.R as row.names(temp) <- 0:(nrow(temp)-1), where temp is the tree information stored as a data.frame. The right way to interpret the row.names is as the node id, with the root node assigned the value 0.
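The effect of that assignment can be reproduced on a toy data.frame. This is only a sketch in base R: the contents of temp here are made up for illustration, and node_ids is a hypothetical helper variable, not something the package creates.

```r
# Toy stand-in for the tree data frame; the values are illustrative only.
temp <- data.frame(SplitVar  = c(9, -1, -1, -1),
                   LeftNode  = c(1, -1, -1, -1),
                   RightNode = c(2, -1, -1, -1))

# Same assignment as quoted above: zero-based row names act as node ids.
row.names(temp) <- 0:(nrow(temp) - 1)
print(temp)

# Recover the node ids as integers (row names are stored as character).
node_ids <- as.integer(row.names(temp))
```

Because the ids are row names rather than a real column, a node is looked up by name, for example temp["0", ] for the root.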

In your example:

    Id  SplitVar  SplitCodePred  LeftNode  RightNode  MissingNode  ErrorReduction  Weight  Prediction
    0   9         6.250000e+01   1         2          21           0.6634681       5981    0.005000061

means that the root node (indicated by the row number 0) is split on split variable 9 (the numbering of the split variables starts from 0, so this is the 10th column in the training set x). A SplitCodePred of 6.250000e+01, that is 62.5, denotes that all points with a value less than 62.5 went to LeftNode 1 and all points with a value greater than 62.5 went to RightNode 2. All points that had a missing value in this column were assigned to MissingNode 21. The ErrorReduction due to this split was 0.6634, and there were 5981 observations (Weight) in the root node. A Prediction of 0.005 denotes the value assigned to all points at this node before the node was split. In the case of terminal nodes (or leaves), denoted by -1 in SplitVar, LeftNode, RightNode, and MissingNode, the Prediction denotes the value predicted for all the points belonging to this leaf node, multiplied by the shrinkage.
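As a rough illustration of how this row is read, the sketch below routes a single observation through the root node. It uses base R only. The function next_node and the vector obs are hypothetical names introduced here, the sketch covers a continuous split variable only, and sending a value exactly equal to the split point to the right is an assumption of this sketch.

```r
# The root row from the example, rebuilt by hand.
root <- data.frame(SplitVar = 9, SplitCodePred = 62.5,
                   LeftNode = 1, RightNode = 2, MissingNode = 21)

# Hypothetical helper: return the id of the node an observation moves to.
next_node <- function(node, x) {
  # SplitVar is zero-based, R vectors are one-based, hence the + 1.
  value <- x[[node$SplitVar + 1]]
  if (is.na(value)) return(node$MissingNode)
  # Assumption: ties (value equal to the split point) go to the right.
  if (value < node$SplitCodePred) node$LeftNode else node$RightNode
}

obs <- rep(0, 10)      # one observation with 10 predictors
obs[10] <- 40          # the 10th column is split variable 9
to_left <- next_node(root, obs)

obs[10] <- 80
to_right <- next_node(root, obs)

obs[10] <- NA
to_missing <- next_node(root, obs)
```

Here to_left, to_right and to_missing hold the ids of the left, right and missing child of the root, matching the LeftNode, RightNode and MissingNode columns.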

To understand the tree structure, it's important to note that the nodes of the tree are numbered in a depth-first fashion. When the root node (with node id 0) is split into its left node and right node, the left side is processed until no further splits are possible before returning and labeling the right node. In both of the trees in your example, the RightNode gets a value of 2. This is because in both cases the LeftNode turns out to be a leaf node.
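The numbering can be mimicked with a small recursive sketch in base R. Everything here is hypothetical: leaf, split_node and number_nodes are names made up for illustration, not gbm functions. The visiting order (left, then right, then missing) is an assumption inferred from the ids 1, 2 and 21 in the example row, not taken from the package source.

```r
# Hypothetical toy tree: a leaf is an empty list, a split has three children.
leaf <- function() list()
split_node <- function(left, right, missing) {
  list(left = left, right = right, missing = missing)
}

# Assign ids depth first: a node, then its left, right and missing subtrees.
number_nodes <- function(tree) {
  counter <- 0
  rows <- list()
  visit <- function(node) {
    id <- counter
    counter <<- counter + 1
    kids <- c(LeftNode = -1, RightNode = -1, MissingNode = -1)
    if (!is.null(node$left)) {
      kids["LeftNode"] <- visit(node$left)
      kids["RightNode"] <- visit(node$right)
      kids["MissingNode"] <- visit(node$missing)
    }
    rows[[id + 1]] <<- kids
    id
  }
  visit(tree)
  out <- as.data.frame(do.call(rbind, rows))
  row.names(out) <- 0:(nrow(out) - 1)
  out
}

# The left child of the root is a leaf, so the right child receives id 2.
toy <- split_node(leaf(), split_node(leaf(), leaf(), leaf()), leaf())
tree_df <- number_nodes(toy)
print(tree_df)
```

In this toy tree the right subtree takes ids 2 through 5, so the missing child of the root is numbered only after it, which mirrors the jump from RightNode 2 to MissingNode 21 in the example.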
