Why do results using caret::train(…, method = "rpart") differ from rpart::rpart(…)?

Posted by 拜拜、爱过 on 2019-12-04 03:12:11

caret does quite a bit more under the hood. In particular, it uses resampling to tune the model's hyperparameters. In your case, it tries three values of cp (type modFit and you'll see accuracy results for each value), whereas rpart just uses cp = 0.01 unless you tell it otherwise (see ?rpart.control). The tuning also makes caret slower, especially since its default resampling scheme is the bootstrap (25 resamples) rather than k-fold cross-validation.
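To see the tuning for yourself, here is a minimal sketch. It uses the built-in iris data as a stand-in for your training set (an assumption, since your data isn't shown), so the cp values it prints will not match yours.

```r
library(caret)
library(rpart)

set.seed(1)
# iris stands in for the asker's `training` data (assumption)
modFit <- caret::train(Species ~ ., method = "rpart", data = iris)

# One row per candidate cp value, with resampled Accuracy and Kappa
print(modFit$results)

# The cp value caret picked for the final model
print(modFit$bestTune)

# The resampling scheme caret used by default
print(modFit$control$method)

# rpart's own default complexity parameter
print(rpart::rpart.control()$cp)
```

The results table is what you get a summary of when you type modFit at the console.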

To get similar results, you need to turn off caret's resampling and fix cp at the value rpart uses:

modFit <- caret::train(y ~ ., method = "rpart", data = training,
                       trControl = caret::trainControl(method = "none"), tuneGrid = data.frame(cp = 0.01))

In addition, you should use the same random seed for both models.
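Putting the pieces together, a rough sketch of the comparison might look like this. It uses iris as a placeholder for training and Species as the outcome (both assumptions), and checks whether the two fits predict the same classes.

```r
library(caret)
library(rpart)

# iris is a placeholder for the `training` data frame (assumption)
training <- iris

# Plain rpart with its default cp = 0.01
set.seed(42)
fit_rpart <- rpart::rpart(Species ~ ., data = training)

# caret with resampling switched off and cp pinned to the same value
set.seed(42)
fit_caret <- caret::train(Species ~ ., method = "rpart", data = training,
                          trControl = caret::trainControl(method = "none"),
                          tuneGrid = data.frame(cp = 0.01))

# Compare class predictions on the training data
pred_rpart <- predict(fit_rpart, newdata = training, type = "class")
pred_caret <- predict(fit_caret, newdata = training)
agreement <- mean(as.character(pred_rpart) == as.character(pred_caret))
print(agreement)
```

If the agreement is below 1 on your own data, compare fit_rpart with fit_caret$finalModel to see where the two trees diverge.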

That said, the extra functionality that caret provides is a Good Thing, and you should probably just go with caret. If you want to learn more, it's well-documented, and the author has a stellar book, Applied Predictive Modeling.
