Get individual models and customized score in GridSearchCV and RandomizedSearchCV [duplicate]

邮差的信 submitted on 2020-07-20 04:33:46

Question


GridSearchCV and RandomizedSearchCV have a best_estimator_ attribute that:

  • Returns only the best estimator/model
  • Finds the best estimator via one of the standard scoring methods: accuracy, recall, precision, etc.
  • Evaluates on the training set only

I would like to go beyond those limitations with:

  • My own definition of the scoring method
  • Further evaluation on the test set, rather than on the training set as GridSearchCV does. Ultimately it is the test-set performance that counts; the training set tends to give almost perfect accuracy in my grid search.

I was thinking of achieving this by:

  • Getting the individual estimators/models from GridSearchCV and RandomizedSearchCV
  • Predicting on the test set with every estimator/model and evaluating it with my customized score

My question is:

  • Is there a way to get all the individual models from GridSearchCV?
  • If not, how would you achieve the same thing? Initially I wanted to use the existing GridSearchCV because it automatically handles multiple parameter grids, cross-validation, and multi-threading. Any other recommendation that achieves a similar result is welcome.

Answer 1:


You can use custom scoring methods already in the XYZSearchCVs: see the scoring parameter and the documentation's links to the User Guide for how to write a custom scorer.
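
For example, here is a minimal sketch of a custom scorer wired into GridSearchCV; the F2 metric and the random-forest parameter grid are just illustrative choices, not anything prescribed:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import make_scorer, fbeta_score
    from sklearn.model_selection import GridSearchCV

    # Illustrative custom score: F2 weights recall above precision.
    # Any callable with signature (y_true, y_pred) can be wrapped this way.
    def f2_score(y_true, y_pred):
        return fbeta_score(y_true, y_pred, beta=2)

    X, y = make_classification(random_state=0)

    search = GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid={"max_depth": [3, 5, None]},
        scoring=make_scorer(f2_score),  # custom scorer instead of accuracy
        cv=5,
    )
    search.fit(X, y)
    print(search.best_params_, search.best_score_)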

You can use a fixed train/validation split to evaluate the hyperparameters (see the cv parameter), but this will be less robust than a k-fold cross-validation. The test set should be reserved for scoring only the final model; if you use it to select hyperparameters, then the scores you receive will not be unbiased estimates of future performance.
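
If you do want a single fixed split, PredefinedSplit is one way to express it through the cv parameter; the 75/25 split below is an arbitrary illustration:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV, PredefinedSplit

    X, y = make_classification(n_samples=200, random_state=0)

    # -1 marks rows that are always in the training split; 0 marks the
    # single validation fold, here the last 25% of the rows.
    test_fold = np.full(len(X), -1)
    test_fold[150:] = 0

    search = GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid={"max_depth": [3, 5, None]},
        cv=PredefinedSplit(test_fold),  # one fixed train/validation split
    )
    search.fit(X, y)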

There is no easy way to retrieve all the models built by GridSearchCV. (It would generally be a lot of models, and saving them all would generally be a waste of memory.)

The parallelization and parameter grid parts of GridSearchCV are surprisingly simple; if you need to, you can copy out the relevant parts of the source code to produce your own approach.
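
Here is a minimal sketch of that do-it-yourself approach, using ParameterGrid to expand the grid and joblib for the parallelism. The parameter values and the F2 metric are illustrative, and note that it scores every candidate on the test set, which carries exactly the bias caveat above:

    from joblib import Parallel, delayed
    from sklearn.base import clone
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import fbeta_score
    from sklearn.model_selection import ParameterGrid, train_test_split

    X, y = make_classification(random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    param_grid = {"max_depth": [3, 5, None], "n_estimators": [50, 100]}
    base = RandomForestClassifier(random_state=0)

    def fit_and_score(params):
        # clone() returns a fresh, unfitted copy with the candidate parameters.
        model = clone(base).set_params(**params)
        model.fit(X_train, y_train)
        score = fbeta_score(y_test, model.predict(X_test), beta=2)
        return params, model, score

    # ParameterGrid expands the grid; joblib handles the multi-processing.
    results = Parallel(n_jobs=-1)(
        delayed(fit_and_score)(params) for params in ParameterGrid(param_grid)
    )
    for params, model, score in sorted(results, key=lambda r: -r[2]):
        print(score, params)

This keeps every fitted model around, so you can inspect or reuse any of them, at the memory cost noted above.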


"Training set tends to give almost perfect accuracy on my Grid Search."

That's a bit surprising, since the CV part of the searches means the models are being scored on unseen data. If you get very high best_score_ but low performance on the test set, then I would suspect your training set is not actually a representative sample, and that'll require a much more nuanced understanding of the situation.



Source: https://stackoverflow.com/questions/62864193/get-individual-models-and-customized-score-in-gridsearchcv-and-randomizedsearchc
