Question
I have a number of caret model objects trained on the same data with the same tuning parameters. As a sanity check, I want to confirm that each run gives me the same model object. (This is part of a broader plan to run the training in parallel and make sure the resulting models are the same.)
For example, below I train the same model twice and want to compare the two results.
When I compare the caret objects it returns FALSE.
> library(caret)
>
> set.seed(0)
> myControl <- trainControl(method='cv', index=createFolds(iris$Species))
>
> set.seed(0)
> model1 <- train(Species~., iris, method='rf', trControl=myControl)
>
> set.seed(0)
> model2 <- train(Species~., iris, method='rf', trControl=myControl)
>
> identical(model1,model2)
[1] FALSE
> all.equal(model1,model2)
[1] "Component “times”: Component “everything”: Mean relative difference: 0.09036145"
[2] "Component “times”: Component “final”: Mean relative difference: 0.75"
> compare_models(model1, model2)
One Sample t-test
data: x
t = NaN, df = 9, p-value = NA
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
NaN NaN
sample estimates:
mean of x
0
If I compare the final model instead of the caret object, it returns TRUE.
> identical(model1$finalModel,model2$finalModel)
[1] TRUE
> all.equal(model1$finalModel,model2$finalModel)
[1] TRUE
So why are the caret objects different? Or am I using the wrong function?
I have also set the seeds (as in this example: https://stackoverflow.com/a/21988897/8799325) and still have the same issue.
UPDATE: When I switch to other methods (e.g. rpart, lm), comparing the finalModel components gives FALSE for identical() but TRUE for all.equal(). Is there something about the different models that causes this?
> set.seed(0)
> myControl <- trainControl(method='cv', index=createFolds(iris$Species))
>
> set.seed(0)
> model3 <- train(Species~., iris, method='rpart', trControl=myControl)
>
> set.seed(0)
> model4 <- train(Species~., iris, method='rpart', trControl=myControl)
>
> identical(model3,model4)
[1] FALSE
> all.equal(model3,model4)
[1] "Component “times”: Component “everything”: Mean relative difference: 0.05063291"
[2] "Component “times”: Component “final”: Mean relative difference: 1"
> compare_models(model3, model4)
One Sample t-test
data: x
t = NaN, df = 9, p-value = NA
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
NaN NaN
sample estimates:
mean of x
0
>
> identical(model3$finalModel,model4$finalModel)
[1] FALSE
> all.equal(model3$finalModel,model4$finalModel)
[1] TRUE
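identical() and all.equal() test different things: identical() requires exact equality of every value and every attribute, and it compares attached environments by reference, while all.equal() is a looser check that, among other things, tolerates small numeric differences (default tolerance sqrt(.Machine$double.eps), about 1.5e-8). The base-R sketch below shows both effects without caret; make_formula() is a hypothetical helper. One plausible, unverified explanation for the rpart and lm results is that those fits contain objects such as formulas or terms that capture an environment, which is different on every run; that is an assumption worth checking component by component.

```r
# identical() demands bit-for-bit equality; all.equal() allows a small tolerance.
x <- 0.1 + 0.2
y <- 0.3
identical(x, y)           # FALSE: the two doubles differ in their last bits
isTRUE(all.equal(x, y))   # TRUE: the difference is below the default tolerance

# Formulas record the environment they were created in as an attribute.
# Each call to this hypothetical helper creates a new environment, so the
# two formulas have the same text but are not identical().
make_formula <- function() Species ~ .
f1 <- make_formula()
f2 <- make_formula()
identical(f1, f2)                      # FALSE: different .Environment attributes
identical(deparse(f1), deparse(f2))    # TRUE: the formula text is the same
```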
Answer 1:
train() stores the execution time it took to run; see model1$times and ?train. I think these times are irrelevant for your purpose, so you can safely ignore them:
all.equal(model1[!names(model1) %in% "times"], model2[!names(model2) %in% "times"])
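The same idea can be wrapped in a small reusable helper. This is a hedged sketch: all_equal_ignoring() and differing_components() are hypothetical helpers (not part of caret), and m1/m2 are made-up lists standing in for two train objects so the example runs in base R. differing_components() lists the top-level components that are not identical(), which helps locate what differs before deciding what is safe to ignore.

```r
# Hypothetical helper: all.equal() after dropping components expected to differ
# (by default the "times" component that train() fills with run timings).
all_equal_ignoring <- function(x, y, ignore = "times", ...) {
  x <- unclass(x)
  y <- unclass(y)
  all.equal(x[setdiff(names(x), ignore)], y[setdiff(names(y), ignore)], ...)
}

# Hypothetical helper: names of top-level components that are not identical().
differing_components <- function(x, y) {
  nms <- union(names(x), names(y))
  nms[!vapply(nms, function(n) identical(x[[n]], y[[n]]), logical(1))]
}

# Made-up stand-ins for two train objects: same results, different timings.
m1 <- list(method = "rf",
           results = data.frame(mtry = c(2, 3, 4), Accuracy = c(0.95, 0.96, 0.95)),
           times = list(everything = 1.66, final = 0.04))
m2 <- m1
m2$times <- list(everything = 1.81, final = 0.07)

differing_components(m1, m2)        # "times"
isTRUE(all.equal(m1, m2))           # FALSE
isTRUE(all_equal_ignoring(m1, m2))  # TRUE
```

With real models you would call, for example, differing_components(model1, model2) first, then all_equal_ignoring(model1, model2) once you know which components are expected to differ.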
Source: https://stackoverflow.com/questions/61493218/best-function-to-compare-caret-model-objects