Question
I am trying to understand how rpart works for a project I am working on. I am relatively new to R, but I have a lot of experience building a variety of analytical models in SAS.
First, I ran this code:
library(rpart)
mtree1 <- rpart(X17 ~ ., data = mydata, method = "class", control = rpart.control(minsplit = 20, minbucket = 7, maxdepth = 10, usesurrogate = 2, xval = 10))
This gives a tree with X12 as the top split, X10 as the next split on the left-hand branch, X69 on the right-hand branch, and then X68 and X70 further down the right-hand branch.
Next, I ran the same model using only those five variables:
mtree1 <- rpart(X17 ~ X12 + X10 + X69 + X68 + X70, data = mydata, method = "class", control = rpart.control(minsplit = 20, minbucket = 7, maxdepth = 10, usesurrogate = 2, xval = 10))
I get exactly the same tree.
Finally, I ran it again without X10:
mtree1 <- rpart(X17 ~ X12 + X69 + X68 + X70, data = mydata, method = "class", control = rpart.control(minsplit = 20, minbucket = 7, maxdepth = 10, usesurrogate = 2, xval = 10))
Now I get no splits at all. (For reference, my data set has 234144 observations and 90 independent variables, with 210205 goods and 23839 bads.)
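One way to probe this, assuming the default complexity parameter (cp = 0.01 in rpart.control, which the calls above do not override) is what stops the four-variable model from splitting, is to refit with cp = 0 and inspect the cost-complexity table. This is a minimal diagnostic sketch, not a definitive explanation; the kyphosis data set that ships with rpart stands in for mydata, which is not available here.

```r
library(rpart)

# Default control: cp = 0.01, so splits that do not lower the overall
# lack of fit by at least that factor are not pursued.
fit_default <- rpart(Kyphosis ~ Age + Start, data = kyphosis,
                     method = "class")

# cp = 0: grow the tree as far as minsplit/minbucket/maxdepth allow,
# then look at the CP column to see which splits cp = 0.01 would remove.
fit_full <- rpart(Kyphosis ~ Age + Start, data = kyphosis,
                  method = "class",
                  control = rpart.control(cp = 0, xval = 10))

printcp(fit_default)
printcp(fit_full)
```

With the original data, the same comparison would mean refitting X17 ~ X12 + X69 + X68 + X70 with cp = 0 and checking whether the splits exist but fall below the default threshold once X10 is unavailable.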
Here is an image of the code and output
Why does removing X10, which is not the top split, leave the tree with no splits at all? I would appreciate any help. Thanks, KK
Source: https://stackoverflow.com/questions/46551031/r-rpart-no-splits-if-i-remove-less-important-variables