Search for corresponding node in a regression tree using rpart

谁都会走 提交于 2019-12-18 04:15:33

问题


I'm pretty new to R and I'm stuck with a pretty dumb problem.

I'm calibrating a regression tree using the rpart package in order to do some classification and some forecasting.

Thanks to R the calibration part is easy to do and easy to control.

#the package rpart is needed
library(rpart)

# Loading of a big data file used for calibration
my_data <- read.csv("my_file.csv", sep=",", header=TRUE)

# Regression tree calibration
tree <- rpart(Ratio ~ Attribute1 + Attribute2 + Attribute3 + 
                      Attribute4 + Attribute5, 
                      method="anova", data=my_data, 
                      control=rpart.control(minsplit=100, cp=0.0001))

After having calibrated a big decision tree, I want, for a given data sample to find the corresponding cluster of some new data (and thus the forecasted value).
The predict function seems to be perfect for the need.

# read validation data
validationData <-read.csv("my_sample.csv", sep=",", header=TRUE)

# search for the probability in the tree
predict <- predict(tree, newdata=validationData, class="prob")

# dump them in a file
write.table(predict, file="dump.txt") 

However with the predict method I just get the forecasted ratio of my new elements, and I can't find a way get the decision tree leaf where my new elements belong.

I think it should be pretty easy to get since the predict method must have found that leaf in order to return the ratio.

There are several parameters that can be given to the predict method through the class= argument, but for a regression tree all seem to return the same thing (the value of the target attribute of the decision tree)

Does anyone know how to get the corresponding node in the decision tree?

By analyzing the node with the path.rpart method, it would help me understanding the results.


回答1:


Benjamin's answer unfortunately doesn't work: type="vector" still returns the predicted values.

My solution is pretty klugy, but I don't think there's a better way. The trick is to replace the predicted y values in the model frame with the corresponding node numbers.

tree2 = tree
tree2$frame$yval = as.numeric(rownames(tree2$frame))
predict = predict(tree2, newdata=validationData)

Now the output of predict will be node numbers as opposed to predicted y values.

(One note: the above worked in my case where tree was a regression tree, not a classification tree. In the case of a classification tree, you probably need to omit as.numeric or replace it with as.factor.)




回答2:


You can use the partykit package:

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

library("partykit")
fit.party <- as.party(fit)
predict(fit.party, newdata = kyphosis[1:4, ], type = "node")

For your example just set

predict(as.party(tree), newdata = validationData, type = "node")



回答3:


I think what you want is type="vector" instead of class="prob" (I don't think class is an accepted parameter of the predict method), as explained in the rpart docs:

If type="vector": vector of predicted responses. For regression trees this is the mean response at the node, for Poisson trees it is the estimated response rate, and for classification trees it is the predicted class (as a number).



来源:https://stackoverflow.com/questions/5102754/search-for-corresponding-node-in-a-regression-tree-using-rpart

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!