Question
I am using the caret package to predict the improvementNoticed variable:
library(caret)
head(trainData)
     improvementNoticed         V1          V2          V3         V4 V5
681                   0 0.06451613 0.006060769 0.008182531 0.05263158  0
1484                  0 0.77924586 0.331009145 0.316603794 0.88825188  0
1356                  0 0.22222222 0.017538684 0.016182822 0.20000000  0
541                   0 0.21505376 0.011102470 0.012665610 0.10000000  0
2214                  1 0.59195217 0.064764408 0.051008693 0.55000000  0
1111                  0 0.97979798 0.036445064 0.034643632 0.93333333  0
I then run:
# note: repeats is only used by method='repeatedcv'; it is ignored for method='cv'
myControl = trainControl(method='cv', number=5, repeats=2, returnResamp='none')
model1 = train(improvementNoticed ~ ., data=trainData, method='glm', trControl=myControl)
This produces the following error:
Error in names(out) <- paste("Fold", gsub(" ", "0", format(seq(along = out))), :
'names' attribute [1] must be the same length as the vector [0]
This error occurs when trainData[,1] is a factor (the other columns are numeric). Previously, when trainData[,1] was numeric, I got a different error:
Error in cut.default(y, unique(quantile(y, probs = seq(0, 1, length = cuts))), :
invalid number of intervals
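The second error message shows the call that failed: the outcome y was binned with cut() on its unique quantiles while the folds were being created. A minimal base R sketch of when that call fails, rebuilt from the error text; the toy vectors are hypothetical, and the number of cuts (5) and include.lowest = TRUE are assumptions, since the message is truncated. It suggests that a 0/1 outcome containing both values is not the problem by itself; the call only fails when the quantiles collapse to a single value, as happens for an empty or constant outcome.

```r
# Rebuild the failing call from the error message: cut(y, unique(quantile(y, ...)))
# cuts = 5 and include.lowest = TRUE are assumptions, not taken from caret's source.
bin_outcome <- function(y, cuts = 5) {
  breaks <- unique(quantile(y, probs = seq(0, 1, length = cuts)))
  cut(y, breaks, include.lowest = TRUE)
}

# A binary outcome with both classes present gives two distinct breaks (0 and 1): no error.
ok <- bin_outcome(c(0, 0, 0, 1, 1))

# An empty outcome (e.g. every row was dropped) gives a single NA break, so cut() stops.
err_empty <- tryCatch(bin_outcome(numeric(0)), error = conditionMessage)

# A constant outcome also collapses to a single break.
err_const <- tryCatch(bin_outcome(c(0, 0, 0)), error = conditionMessage)

err_empty  # expected: "invalid number of intervals"
```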
Note that improvementNoticed is a binary variable. If I convert trainData[,1] to integer, I get the same error as when it is numeric.
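caret decides between regression and classification from the type of the outcome, so a 0/1 outcome intended for classification is normally stored as a factor. A minimal base R sketch with a hypothetical outcome vector; the labels "no"/"yes" are an arbitrary choice, picked to be syntactically valid R names because newer caret versions reject labels such as "0" and "1" when class probabilities are requested.

```r
# Hypothetical 0/1 outcome, shaped like the improvementNoticed column
y01 <- c(0, 0, 0, 0, 1, 0)

# Store it as a two-level factor so it is treated as a class label, not a number.
# The labels "no"/"yes" are a hypothetical choice.
y <- factor(y01, levels = c(0, 1), labels = c("no", "yes"))

table(y)

# make.names() leaves valid R names unchanged, so this checks the labels are valid
all(make.names(levels(y)) == levels(y))
```

Applied to the question's data, this would be trainData$improvementNoticed <- factor(trainData$improvementNoticed, levels = c(0, 1), labels = c("no", "yes")).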
Two last things. First, the output of traceback():
5: createFolds(y, trControl$number, returnTrain = TRUE)
4: train.default(x, y, weights = w, ...)
3: train(x, y, weights = w, ...)
2: train.formula(improvementNoticed ~ ., data = trainData, method = "glm",
trControl = myControl)
1: train(improvementNoticed ~ ., data = trainData, method = "glm",
trControl = myControl)
Second, the output of sessionInfo():
R version 3.0.1 (2013-05-16)
Platform: x86_64-redhat-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] elasticnet_1.1 lars_1.2 klaR_0.6-9 MASS_7.3-26
[5] kernlab_0.9-18 nnet_7.3-6 randomForest_4.6-7 doMC_1.3.0
[9] iterators_1.0.6 caret_5.17-7 reshape2_1.2.2 plyr_1.8
[13] lattice_0.20-15 foreach_1.4.1 cluster_1.14.4
loaded via a namespace (and not attached):
[1] codetools_0.2-8 grid_3.0.1 stringr_0.6.2 tools_3.0.1
Answer 1:
As it happens, the error was a really basic one.
I was normalizing the data (which I did not suspect would cause the issue), but it turned out that one of the variables contained only 0s. Normalizing a constant column divides 0 by 0, so that variable became entirely NaN, which caused the model to fail.
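A minimal base R sketch of the failure described above, using a hypothetical toy data frame shaped like trainData and assuming min-max normalization (the answer does not say which normalization was used; z-scoring a constant column also gives 0/0 = NaN). The second half shows one way to drop constant predictors before scaling; the helper name minmax is hypothetical.

```r
# Hypothetical toy data mimicking the question: V5 is constant (all 0).
toy <- data.frame(
  improvementNoticed = c(0, 0, 1, 0, 1, 0),
  V1 = c(0.06, 0.78, 0.22, 0.22, 0.59, 0.98),
  V5 = c(0, 0, 0, 0, 0, 0)
)

# Assumed min-max normalization: (x - min) / (max - min).
# For a constant column max - min is 0, so every entry becomes 0/0 = NaN.
minmax <- function(x) (x - min(x)) / (max(x) - min(x))

scaled <- toy
scaled[-1] <- lapply(toy[-1], minmax)

all(is.nan(scaled$V5))   # TRUE
nrow(na.omit(scaled))    # 0: every row contains a NaN, so every row is dropped

# Safer sketch: drop predictors with a single unique value before normalizing.
predictors <- setdiff(names(toy), "improvementNoticed")
is_constant <- vapply(toy[predictors], function(x) length(unique(x)) <= 1, logical(1))
keep <- c("improvementNoticed", predictors[!is_constant])

cleaned <- toy[keep]
cleaned[-1] <- lapply(cleaned[-1], minmax)
```

caret's nearZeroVar() can also flag such predictors. Whether NaN rows are silently dropped depends on the na.action in effect (R's default, getOption("na.action"), is "na.omit"); if they were, the model would receive no rows at all, which would be consistent with both error messages in the question, although the answer does not confirm this.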
Source: https://stackoverflow.com/questions/18510492/train-in-caret-package-returns-an-error-about-names-gsub