StepLDA without Cross Validation

Posted by 自作多情 on 2019-12-11 10:06:06

Question


I would like to select the variables on the basis of the training error. For that reason I set method in trainControl to "none". However, if I run the code below twice, I get two different sets of errors (correctness rates). In this example the difference is hardly worth mentioning. Even so, I would not have expected any difference at all.

Does somebody know where this difference comes from?

library(caret)

# Ask caret not to resample
c_1 <- trainControl(method = "none")

# Single-row tuning grid for stepLDA
maxvar    <- 4
direction <- "forward"
tune_1    <- data.frame(maxvar, direction)

tr <- train(Species ~ ., data = iris, method = "stepLDA", trControl = c_1, tuneGrid = tune_1)

First run:

`stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
150 observations of 4 variables in 3 classes; direction: forward
stop criterion: assemble 4 best variables.
correctness rate: 0.96;  in: "Petal.Width";  variables (1): Petal.Width 
correctness rate: 0.96667;  in: "Sepal.Width";  variables (2): Petal.Width, Sepal.Width 
correctness rate: 0.97333;  in: "Petal.Length";  variables (3): Petal.Width, Sepal.Width, Petal.Length 
correctness rate: 0.98;  in: "Sepal.Length";  variables (4): Petal.Width, Sepal.Width, Petal.Length, Sepal.Length 

 hr.elapsed min.elapsed sec.elapsed 
       0.00        0.00        0.28 

Second run:

> train(Species~., data=iris, method = "stepLDA", trControl=c_1, tuneGrid=tune_1)->tr
 `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
150 observations of 4 variables in 3 classes; direction: forward
stop criterion: assemble 4 best variables.
correctness rate: 0.96;  in: "Petal.Width";  variables (1): Petal.Width 
correctness rate: 0.96;  in: "Sepal.Width";  variables (2): Petal.Width, Sepal.Width 
correctness rate: 0.96667;  in: "Petal.Length";  variables (3): Petal.Width, Sepal.Width, Petal.Length 
correctness rate: 0.98;  in: "Sepal.Length";  variables (4): Petal.Width, Sepal.Width, Petal.Length, Sepal.Length 

 hr.elapsed min.elapsed sec.elapsed 
        0.0         0.0         0.3 

Answer 1:


You are still doing 10-fold cross-validation, as the first line of the output shows. The observations are assigned to folds at random, so as long as you do not set the seed you will get slightly different correctness rates each time you train the model.

If you run this piece of code, including the set.seed call, you will get the same correctness rates every time.

set.seed(42)
tr <- train(Species~., data=iris, method = "stepLDA", trControl=c_1, tuneGrid=tune_1)
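The effect of the seed can be illustrated without caret or klaR. The following is only a minimal sketch, assuming the MASS package is available; cv_correctness is a hypothetical helper written for this illustration and is not the code that stepclass runs internally. It assigns the rows of iris to 10 random folds and computes a cross-validated correctness rate for lda.

```r
library(MASS)  # provides lda()

# Hypothetical helper (not part of caret or klaR):
# k-fold cross-validated correctness rate of lda on iris.
cv_correctness <- function(k = 10) {
  n <- nrow(iris)
  # Random fold assignment: the only step that depends on the RNG state.
  folds <- sample(rep(seq_len(k), length.out = n))
  hits <- 0
  for (i in seq_len(k)) {
    test_idx <- which(folds == i)
    fit  <- lda(Species ~ ., data = iris[-test_idx, ])
    pred <- predict(fit, iris[test_idx, ])$class
    hits <- hits + sum(pred == iris$Species[test_idx])
  }
  hits / n
}

set.seed(42)
rate_a <- cv_correctness()
set.seed(42)
rate_b <- cv_correctness()

# Same seed, same folds, same rate.
identical(rate_a, rate_b)
```

Calling cv_correctness() repeatedly without resetting the seed draws new folds each time, so the rate can change from call to call, which is the kind of variation seen in the two runs in the question.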

Edit based on comment:

The 10-fold cross-validated correctness rate does not come from caret, but from the stepclass function in the klaR package, which caret calls for method = "stepLDA". Its signature is:

stepclass(x, grouping, method, improvement = 0.05, maxvar = Inf, start.vars = NULL, direction = c("both", "forward", "backward"), criterion = "CR", fold = 10, cv.groups = NULL, output = TRUE, min1var = TRUE, ...)

fold: parameter for cross-validation; omitted if ‘cv.groups’ is specified.

You can adjust this if you want to by adding the fold parameter to the train call:

tr <- train(Species~., data=iris, method = "stepLDA", trControl=c_1, tuneGrid=tune_1, fold = 1)

But a fold of 1 is meaningless, because a single fold leaves nothing to hold out. You will get a bunch of warnings and errors.
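If the goal is the plain training (resubstitution) error, one option is to compute it directly rather than through stepclass. The following is only a sketch under that assumption, using lda from the MASS package; train_correctness is a hypothetical helper introduced here for illustration and is not part of caret or klaR.

```r
library(MASS)  # provides lda()

# Hypothetical helper: fit lda on the chosen predictors using all rows of iris,
# then score the fit on the same rows (training / resubstitution rate).
train_correctness <- function(vars) {
  form <- reformulate(vars, response = "Species")
  fit  <- lda(form, data = iris)
  pred <- predict(fit, iris)$class
  mean(pred == iris$Species)
}

predictors <- setdiff(names(iris), "Species")

# Training correctness rate of each single-variable model
single_rates <- sapply(predictors, train_correctness)

# Training correctness rate of the model with all four variables
all_rate <- train_correctness(predictors)
```

No random split is involved here, so repeated calls return the same value. A forward selection could be built on top of this by adding, at each step, the variable that raises the training rate the most, although a training-error criterion tends to favour larger models.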



Source: https://stackoverflow.com/questions/32159649/steplda-without-cross-validation
