Question
GridSearchCV and RandomizedSearchCV have best_estimator_, which:
- Returns only the best estimator/model
- Finds the best estimator via one of the simple scoring methods: accuracy, recall, precision, etc.
- Evaluates based on the training set only
I would like to get past those limitations with:
- My own definition of scoring methods
- Evaluation on the test set rather than the training set, as done by GridSearchCV. Ultimately it is the test-set performance that counts; the training set tends to give almost perfect accuracy in my grid search.
I was thinking of achieving this by:
- Getting the individual estimators/models from GridSearchCV and RandomizedSearchCV
- Predicting on the test set with every estimator/model and evaluating with my customized score
My questions are:
- Is there a way to get all individual models from GridSearchCV?
- If not, what are your thoughts on achieving the same thing? Initially I wanted to exploit the existing GridSearchCV because it automatically handles multiple parameter grids, CV, and multi-threading. Any other recommendation that achieves a similar result is welcome.
Answer 1:
You can already use custom scoring methods in the XYZSearchCVs: see the scoring parameter and the documentation's links to the User Guide for how to write a custom scorer.
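For instance, a minimal sketch using make_scorer (my_score here is a hypothetical metric standing in for your own definition):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV

def my_score(y_true, y_pred):
    # Hypothetical custom metric: replace with your own definition.
    return (y_true == y_pred).mean()

search = GridSearchCV(
    RandomForestClassifier(),
    param_grid={"n_estimators": [50, 100]},
    scoring=make_scorer(my_score),  # greater_is_better=True by default
)
```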
You can use a fixed train/validation split to evaluate the hyperparameters (see the cv parameter), but this will be less robust than a k-fold cross-validation. The test set should be reserved for scoring only the final model; if you use it to select hyperparameters, then the scores you receive will not be unbiased estimates of future performance.
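A sketch of such a fixed split via PredefinedSplit, assuming (illustratively) that the last 200 of 1000 training samples serve as the single validation fold:

```python
import numpy as np
from sklearn.model_selection import PredefinedSplit

# -1 marks samples that always stay in the training fold;
# 0 marks the single validation fold.
test_fold = np.concatenate([np.full(800, -1), np.zeros(200, dtype=int)])
cv = PredefinedSplit(test_fold)

# Then pass it to the search:
# search = GridSearchCV(estimator, param_grid, cv=cv, ...)
```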
There is no easy way to retrieve all the models built by GridSearchCV. (It would generally be a lot of models, and saving them all would generally be a waste of memory.)
The parallelization and parameter-grid parts of GridSearchCV are surprisingly simple; if you need to, you can copy out the relevant parts of the source code to produce your own approach.
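If you go that route, here is a rough sketch, assuming estimator, param_grid, the train/test arrays, and the my_score function above are already defined; it reuses ParameterGrid for the grid handling and joblib for the parallelism that GridSearchCV would otherwise provide:

```python
from joblib import Parallel, delayed
from sklearn.base import clone
from sklearn.model_selection import ParameterGrid

def fit_and_score(params):
    # Fresh copy of the estimator with this parameter combination.
    model = clone(estimator).set_params(**params)
    model.fit(X_train, y_train)
    # Score on the held-out set with the custom metric.
    return params, my_score(y_test, model.predict(X_test)), model

results = Parallel(n_jobs=-1)(
    delayed(fit_and_score)(params) for params in ParameterGrid(param_grid)
)
# results is a list of (params, score, fitted_model) triples,
# so every individual model is retained.
```

Keep in mind the caveat above: scoring every candidate on the test set means the test scores are no longer unbiased estimates of future performance.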
"Training set tends to give almost perfect accuracy on my Grid Search."
That's a bit surprising, since the CV part of the searches means the models are being scored on unseen data. If you get a very high best_score_ but low performance on the test set, then I would suspect your training set is not actually a representative sample, and that will require a much more nuanced understanding of the situation.
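As a quick diagnostic, assuming search is a fitted GridSearchCV and X_test/y_test are held out, you can compare the two directly:

```python
print("CV best score: ", search.best_score_)           # cross-validated
print("Test-set score:", search.score(X_test, y_test))  # held-out data
```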
Source: https://stackoverflow.com/questions/62864193/get-individual-models-and-customized-score-in-gridsearchcv-and-randomizedsearchc