Question
If my factor variable is Climate, with four possible values (Tropical, Arid, Temperate, Snow), and a node in my rpart tree is labeled "Climate:ab", what is the split?
Answer 1:
I assume you plot the tree the standard way:
plot(f)
text(f)
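As the help page for text.rpart explains, with the default value of its argument pretty (NULL), factor levels are shown as letters: a means levels(Climate)[1], b means levels(Climate)[2], and so on. A label such as "Climate:ab" therefore means that the left node holds the observations with Climate equal to levels(Climate)[1] or levels(Climate)[2], and the right node holds the others. If Climate was created with factor() and keeps its default alphabetical levels, that is Arid and Snow on the left and Temperate and Tropical on the right.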
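The letter code can also be decoded by hand, since each letter is simply an index into levels(). A minimal sketch, assuming Climate is a factor with the default alphabetical levels; decode_split is a hypothetical helper written for this illustration, not part of rpart:

```r
# Hypothetical helper (not part of rpart): turn a letter code such as "ab"
# into the factor levels it stands for ("a" = level 1, "b" = level 2, ...).
decode_split <- function(code, lvls) {
  idx <- match(strsplit(code, "")[[1]], letters)
  lvls[idx]
}

# factor() sorts levels alphabetically unless levels= is given
climate <- factor(c("Tropical", "Arid", "Temperate", "Snow"))
levels(climate)                      # "Arid" "Snow" "Temperate" "Tropical"
decode_split("ab", levels(climate))  # "Arid" "Snow"
```

If the levels were set in a different order (for example with factor(x, levels = ...)), the same letters map to different names, which is why it is safer to pass levels(Climate) from your own data rather than assume an order.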
To print the level names instead of letters, set pretty = 0, which the text.rpart help documents as "no abbreviation":
plot(f)
text(f, pretty = 0)
However, I recommend draw.tree from the maptree package instead:
library(maptree)
draw.tree(f)
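I used fake data for the plots:
library(rpart)
# Make Climate a factor explicitly; since R 4.0.0, data.frame() no longer converts strings to factors by default
X <- data.frame(
  y = rep(1:4, 25),
  Climate = factor(rep(c("Tropical", "Arid", "Temperate", "Snow"), 25))
)
f <- rpart(y ~ Climate, data = X)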
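Printing the fitted tree is another way to avoid the letter codes: print.rpart defaults to minlength = 0, so its split descriptions use full level names, and labels() on an rpart object with minlength = 0 returns the node labels without abbreviation. A minimal sketch, assuming the same fake data as above (the exact splits depend on the data, so no output is shown):

```r
library(rpart)

# Same fake data as above, with Climate as an explicit factor
X <- data.frame(
  y = rep(1:4, 25),
  Climate = factor(rep(c("Tropical", "Arid", "Temperate", "Snow"), 25))
)
f <- rpart(y ~ Climate, data = X)

# Full level names in the printed splits (print.rpart uses minlength = 0 by default)
print(f)

# One label per node; minlength = 0 means no abbreviation, minlength = 1 gives letters
labels(f, minlength = 0)
labels(f, minlength = 1)
```

Comparing the two labels() calls shows directly which letters correspond to which climates in your plot.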
Source: https://stackoverflow.com/questions/2597310/how-do-i-interpret-rpart-splits-on-factor-variables-when-building-classification