Question
I would like to select variables on the basis of the training error, so I set method in trainControl to "none". However, if I run the call below twice, I get two different sets of correctness rates. In this example the difference is negligible, but I would not have expected any difference at all.
Does anybody know where this difference comes from?
library(caret)
c_1 <- trainControl(method = "none")
maxvar <- 4
direction <- "forward"
tune_1 <- data.frame(maxvar, direction)
tr <- train(Species ~ ., data = iris, method = "stepLDA", trControl = c_1, tuneGrid = tune_1)
1st run:
`stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
150 observations of 4 variables in 3 classes; direction: forward
stop criterion: assemble 4 best variables.
correctness rate: 0.96; in: "Petal.Width"; variables (1): Petal.Width
correctness rate: 0.96667; in: "Sepal.Width"; variables (2): Petal.Width, Sepal.Width
correctness rate: 0.97333; in: "Petal.Length"; variables (3): Petal.Width, Sepal.Width, Petal.Length
correctness rate: 0.98; in: "Sepal.Length"; variables (4): Petal.Width, Sepal.Width, Petal.Length, Sepal.Length
hr.elapsed min.elapsed sec.elapsed
0.00 0.00 0.28
2nd run:
> train(Species~., data=iris, method = "stepLDA", trControl=c_1, tuneGrid=tune_1)->tr
`stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
150 observations of 4 variables in 3 classes; direction: forward
stop criterion: assemble 4 best variables.
correctness rate: 0.96; in: "Petal.Width"; variables (1): Petal.Width
correctness rate: 0.96; in: "Sepal.Width"; variables (2): Petal.Width, Sepal.Width
correctness rate: 0.96667; in: "Petal.Length"; variables (3): Petal.Width, Sepal.Width, Petal.Length
correctness rate: 0.98; in: "Sepal.Length"; variables (4): Petal.Width, Sepal.Width, Petal.Length, Sepal.Length
hr.elapsed min.elapsed sec.elapsed
0.0 0.0 0.3
Answer 1:
You are still doing 10-fold cross-validation. Unless you set the seed, the observations are assigned to folds differently each time, so you will get a slightly different answer whenever you train the model again.
If you run this piece of code, including the set.seed call, you will get the same correctness rates every time.
set.seed(42)
tr <- train(Species~., data=iris, method = "stepLDA", trControl=c_1, tuneGrid=tune_1)
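The effect can be reproduced outside caret. The following is a minimal sketch, not klaR's actual implementation: cv_rate is a hypothetical helper that assigns the observations to 10 folds with sample() and returns the cross-validated correctness rate of MASS::lda. Re-seeding before each call fixes the fold assignment and therefore the result.

```r
library(MASS)  # provides lda()

# Hypothetical helper: k-fold cross-validated correctness rate of lda
# with randomly assigned folds (illustration only, not klaR's code).
cv_rate <- function(formula, data, k = 10) {
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))  # random fold labels
  response <- all.vars(formula)[1]
  correct <- 0
  for (i in seq_len(k)) {
    test_idx <- folds == i
    fit  <- lda(formula, data = data[!test_idx, ])      # fit on the other folds
    pred <- predict(fit, data[test_idx, ])$class        # predict the held-out fold
    correct <- correct + sum(pred == data[[response]][test_idx])
  }
  correct / nrow(data)
}

set.seed(42)
r1 <- cv_rate(Species ~ Petal.Width, iris)
set.seed(42)
r2 <- cv_rate(Species ~ Petal.Width, iris)
identical(r1, r2)  # TRUE: same seed, same folds, same rate

# Without re-seeding, the folds (and possibly the rate) change between calls
cv_rate(Species ~ Petal.Width, iris)
```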
Edit based on comment:
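The 10-fold cross-validated correctness rate does not come from caret but from the stepclass function in the klaR package, which the stepLDA method calls. Setting method = "none" in trainControl only switches off caret's own resampling.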
stepclass(x, grouping, method, improvement = 0.05, maxvar = Inf, start.vars = NULL, direction = c("both", "forward", "backward"), criterion = "CR", fold = 10, cv.groups = NULL, output = TRUE, min1var = TRUE, ...)
fold: parameter for cross-validation; omitted if 'cv.groups' is specified.
You can adjust this if you want to by adding the fold parameter to the train call, which passes it on to stepclass:
tr <- train(Species~., data=iris, method = "stepLDA", trControl=c_1, tuneGrid=tune_1, fold = 1)
But a fold of 1 is meaningless; you will get a bunch of warnings and errors.
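If the aim really is to rank variable subsets by training error rather than by cross-validation, one option is to bypass stepclass and compute the resubstitution correctness rate directly. The following is a minimal sketch, assuming a plain greedy forward search with MASS::lda; train_rate and forward_select are hypothetical helpers, not part of caret or klaR. Because no random folds are involved, repeated runs give identical results.

```r
library(MASS)  # provides lda()

# Hypothetical helper: correctness rate of lda on the data it was fitted to
train_rate <- function(vars, data, response) {
  fit <- lda(reformulate(vars, response = response), data = data)
  mean(predict(fit, data)$class == data[[response]])
}

# Hypothetical helper: greedy forward selection by training correctness rate
forward_select <- function(data, response, maxvar) {
  candidates <- setdiff(names(data), response)
  chosen <- character(0)
  rates <- numeric(0)
  while (length(chosen) < maxvar && length(candidates) > 0) {
    # score every remaining variable when added to the current selection
    scores <- sapply(candidates,
                     function(v) train_rate(c(chosen, v), data, response))
    best <- names(which.max(scores))  # first maximum wins on ties
    chosen <- c(chosen, best)
    rates <- c(rates, unname(scores[best]))
    candidates <- setdiff(candidates, best)
  }
  data.frame(step = seq_along(chosen), added = chosen, rate = rates)
}

sel1 <- forward_select(iris, "Species", maxvar = 4)
sel2 <- forward_select(iris, "Species", maxvar = 4)
identical(sel1, sel2)  # TRUE: nothing random is involved
print(sel1)
```

Keep in mind that the training error is an optimistic estimate of how the model performs on new data, which is why stepclass cross-validates by default.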
Source: https://stackoverflow.com/questions/32159649/steplda-without-cross-validation