Using ordinal variables in rpart and caret without converting to dummy categorical variables

核能气质少年 submitted on 2019-12-06 07:52:35

Question


I am trying to create an ordinal regression tree in R using rpart, with the predictors mostly being ordinal data, stored as factors in R.

When I create the tree using rpart, I get something like this:

where the values are the factor values (E.g. A170 has labels ranging from -5 to 10).

However, when I use caret to train an rpart model on the same data and extract the final model, the tree no longer has ordinal predictors. See below for a sample output tree:

As you see above, it seems the ordinal variable A170 has now been converted into multiple dummy categorical variables, i.e. A17010 in the second tree is a dummy for A170 having the value 10.

So, is it possible to retain ordinal variables instead of converting factor variables into multiple binary indicator variables when fitting trees with the caret package?


Answer 1:


Let's start with a reproducible example:

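# Fix the seed so the simulated data are reproducible
set.seed(144)
# One factor predictor with six levels
dat <- data.frame(x=factor(sample(1:6, 10000, replace=TRUE)))
# Binary outcome: P(y=1) is 0.1 for levels 1-2, 0.4 for levels 3-4, 0.7 for levels 5-6
dat$y <- ifelse(dat$x %in% 1:2, runif(10000) < 0.1,
                ifelse(dat$x %in% 3:4, runif(10000) < 0.4, runif(10000) < 0.7))*1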

As you note, training with the rpart function groups the factor levels together:

library(rpart)
rpart(y~x, data=dat)

I was able to reproduce the caret package splitting up the factors into their individual levels using the formula interface to the train function:

library(caret)
train(y~x, data=dat, method="rpart")$finalModel
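The per-level variables most likely come from the way a formula is expanded into a numeric design matrix. As far as I understand, the formula method of train builds its predictors in the same way base R's model.matrix does, which would explain names such as A17010 (variable name followed by level). A minimal sketch using only base R, where toy and mm are hypothetical names introduced for illustration:

```r
# Hypothetical toy data: one factor predictor with six levels
set.seed(144)
toy <- data.frame(x = factor(sample(1:6, 20, replace = TRUE), levels = 1:6))
toy$y <- rnorm(20)

# Expanding the formula turns the factor into one indicator column per
# non-reference level; the column name is the variable name plus the level
mm <- model.matrix(y ~ x, data = toy)
colnames(mm)
# "(Intercept)" "x2" "x3" "x4" "x5" "x6"
```

A tree fitted on such a matrix can only split on one indicator at a time, so it cannot group several levels in a single split.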

The solution I found to avoid splitting factors by level is to input raw data frames to the train function instead of using the formula interface:

train(x=data.frame(dat$x), y=dat$y, method="rpart")$finalModel
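Since the question is about ordinal predictors, it may also be worth storing them as ordered factors. My understanding is that rpart treats an ordered factor much like a numeric variable when searching for splits, so each split separates lower levels from higher ones. The sketch below uses rpart directly; toy, x_ord, fit_nominal and fit_ordinal are hypothetical names, and I have not checked how caret handles the ordered column beyond passing the data frame through the x/y interface.

```r
library(rpart)

# Hypothetical toy data mirroring the example above
set.seed(144)
n <- 10000
toy <- data.frame(x = factor(sample(1:6, n, replace = TRUE), levels = 1:6))
# P(y = 1) depends on the level: 0.1 for 1-2, 0.4 for 3-4, 0.7 for 5-6
p <- c(0.1, 0.1, 0.4, 0.4, 0.7, 0.7)[as.integer(toy$x)]
toy$y <- as.numeric(runif(n) < p)

# Same predictor stored as an ordered factor
toy$x_ord <- as.ordered(toy$x)

fit_nominal <- rpart(y ~ x, data = toy)      # unordered factor
fit_ordinal <- rpart(y ~ x_ord, data = toy)  # ordered factor

# Both trees split on the original variable, not on per-level dummies
unique(as.character(fit_nominal$frame$var))
unique(as.character(fit_ordinal$frame$var))
```

Printing the two fitted objects side by side shows how each kind of factor is split.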



Source: https://stackoverflow.com/questions/30819407/using-ordinal-variables-in-rpart-and-caret-without-converting-to-dummy-categoric
