R caret: predict returns fewer outputs than inputs

Submitted by ε祈祈猫儿з on 2019-12-07 06:36:30

Question


I used caret to train the rpart model below:

library(caret)
# d is the question's data frame; Happiness is the outcome column
trainIndex <- createDataPartition(d$Happiness, p = .8, list = FALSE)
dtrain <- d[trainIndex, ]
dtest  <- d[-trainIndex, ]
fitControl <- trainControl(## 10-fold CV, repeated 10 times
  method = "repeatedcv", number = 10, repeats = 10)
fitRpart <- train(Happiness ~ ., data = dtrain, method = "rpart",
                  trControl = fitControl)
testRpart <- predict(fitRpart, newdata = dtest)

dtest contains 1296 observations, so I expected testRpart to produce a vector of length 1296. Instead it's 1077 long, i.e. 219 short.

When I ran the prediction on only the first 220 rows of dtest, I got a single prediction back, so the output is consistently 219 rows short.

Why does this happen, and what can I do to get one prediction per input row?

Edit: d can be loaded from here to reproduce the above.


Answer 1:


I downloaded your data and found what explains the discrepancy.

If you simply remove the rows with missing values from your test set, the lengths match:

testRpart <- predict(fitRpart, newdata = na.omit(dtest))

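Note that nrow(na.omit(dtest)) is 1103, and length(testRpart) is also 1103. So you need a strategy for handling missing values. caret's predict.train uses na.action = na.omit by default, which silently drops incomplete rows before predicting; see ?predict.train and ?predict.rpart for the na.action parameter and choose the behaviour you want.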
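The effect can be sketched with rpart alone on synthetic data (the data frame, the columns x1, x2 and y, and the NA positions below are made up for illustration, not taken from the question). Passing na.action = na.omit to predict reproduces the shortfall, while na.pass returns one prediction per row, with rpart routing missing predictor values through surrogate splits (or the majority direction). On the question's caret model, the analogous call would presumably be predict(fitRpart, newdata = dtest, na.action = na.pass).

```r
library(rpart)

set.seed(42)

# Hypothetical stand-in for the question's data: 200 rows, 2 predictors
n  <- 200
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$y <- 2 * df$x1 - df$x2 + rnorm(n, sd = 0.5)

# Make a few rows incomplete: 2 in the training part, 3 in the test part
df$x1[c(5, 20, 160, 175, 190)] <- NA

dtrain <- df[1:150, ]
dtest  <- df[151:200, ]

# rpart's default na.action (na.rpart) keeps rows with missing predictors
fit <- rpart(y ~ x1 + x2, data = dtrain)

# Number of test rows with at least one NA
n_incomplete <- sum(!complete.cases(dtest))

# na.omit drops incomplete rows before predicting (caret's default behaviour)
pred_omit <- predict(fit, newdata = dtest, na.action = na.omit)

# na.pass keeps every row; rpart sends missing values down the tree
# via surrogate splits or the majority direction
pred_pass <- predict(fit, newdata = dtest, na.action = na.pass)

cat("test rows:          ", nrow(dtest), "\n")
cat("incomplete rows:    ", n_incomplete, "\n")
cat("na.omit predictions:", length(pred_omit), "\n")
cat("na.pass predictions:", length(pred_pass), "\n")
```

Which option is appropriate depends on whether you want a prediction for every row (na.pass) or would rather impute or drop incomplete rows explicitly before calling predict.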




Answer 2:


I had a similar issue when I passed "newx" instead of "newdata" to the predict function. Using "newdata" (or no argument name at all) solved my problem. I hope this helps anyone else who used newx and has the same issue.
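A plausible explanation: newx is not an argument of predict.rpart, so it is silently absorbed by ... and newdata is treated as missing, in which case predictions are returned for the training rows instead. caret's predict.train likewise documents that a NULL newdata means the original training data are used. The sketch below shows this with rpart on made-up data (the column names and the 150/50 split are hypothetical).

```r
library(rpart)

# Hypothetical data: 150 training rows, 50 test rows
set.seed(1)
df <- data.frame(x = rnorm(200))
df$y <- 3 * df$x + rnorm(200)
dtrain <- df[1:150, ]
dtest  <- df[151:200, ]

fit <- rpart(y ~ x, data = dtrain)

# Wrong: `newx` is not an argument of predict.rpart, so it lands in `...`,
# newdata counts as missing, and we get predictions for the training rows
pred_wrong <- predict(fit, newx = dtest)

# Right: name the argument newdata, or pass the data as the second argument
pred_named      <- predict(fit, newdata = dtest)
pred_positional <- predict(fit, dtest)

cat("newx:", length(pred_wrong),
    " newdata:", length(pred_named),
    " positional:", length(pred_positional), "\n")
```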



Source: https://stackoverflow.com/questions/30689801/r-caret-predict-returns-fewer-output-than-input
