Question
I used caret to train an rpart model, as shown below.
library(caret)

# Hold out 20% of the rows for testing
trainIndex <- createDataPartition(d$Happiness, p = .8, list = FALSE)
dtrain <- d[trainIndex, ]
dtest <- d[-trainIndex, ]

# 10-fold CV, repeated 10 times
fitControl <- trainControl(method = "repeatedcv", number = 10, repeats = 10)
fitRpart <- train(Happiness ~ ., data = dtrain, method = "rpart",
                  trControl = fitControl)
testRpart <- predict(fitRpart, newdata = dtest)
dtest contains 1296 observations, so I expected testRpart to be a vector of length 1296. Instead it has length 1077, i.e. it is 219 short.
When I ran the prediction on only the first 220 rows of dtest, I got a single prediction back, so the output is consistently 219 short.
Can anyone explain why this happens, and what I can do to get one prediction for every input row?
Edit: d can be loaded from here to reproduce the above.
Answer 1:
I downloaded your data and found what explains the discrepancy.
If you simply remove the rows with missing values from your test set, the lengths of the input and the output match:
testRpart <- predict(fitRpart, newdata = na.omit(dtest))
Note that nrow(na.omit(dtest)) is 1103, and length(testRpart) is 1103. So you need a strategy to address missing values. See ?predict.rpart and the options for the na.action parameter to choose what you want.
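To illustrate the na.action option, here is a minimal sketch on made-up data. It is not the asker's dataset: toy, toy_train, toy_test, x1, x2 and y are hypothetical names chosen for the example. It relies on predict() for a caret train object accepting an na.action argument that defaults to na.omit, and on rpart being able to predict for rows with missing predictors. Passing na.action = na.pass should therefore return one prediction per row of newdata.

```r
library(caret)  # the rpart package must also be installed

set.seed(1)

# Hypothetical toy data standing in for d
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- factor(ifelse(toy$x1 + toy$x2 > 0, "yes", "no"))

toy_train <- toy[1:150, ]
toy_test  <- toy[151:200, ]
toy_test$x2[1:5] <- NA  # inject missing values into 5 of the 50 test rows

fit <- train(y ~ ., data = toy_train, method = "rpart",
             trControl = trainControl(method = "cv", number = 5))

# Default behaviour: rows with missing predictors are dropped before predicting
pred_default <- predict(fit, newdata = toy_test)

# na.pass keeps those rows and leaves the missing values to rpart
pred_pass <- predict(fit, newdata = toy_test, na.action = na.pass)

length(pred_default)
length(pred_pass)
nrow(toy_test)
```

Whether na.pass is the right choice depends on how you want missing values treated. Imputing them before prediction, or predicting only on complete.cases(dtest) and keeping track of which rows were dropped, are alternatives.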
Answer 2:
I had a similar issue because I passed "newx" instead of "newdata" to the predict function. Using "newdata" (or passing the data unnamed) solved my problem. I hope this helps anyone else who used newx and ran into the same issue.
Source: https://stackoverflow.com/questions/30689801/r-caret-predict-returns-fewer-output-than-input