How to save GridSearchCV object?

Submitted by 半腔热情 on 2020-02-05 13:53:08

Question


Lately, I have been working on applying grid search cross validation (sklearn GridSearchCV) for hyper-parameter tuning in Keras with a Tensorflow backend. As soon as my model is tuned, I try to save the GridSearchCV object for later use, without success.

The hyper-parameter tuning is done as follows:

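import numpy as np
from keras.callbacks import History
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

x_train, x_val, y_train, y_val = train_test_split(NN_input, NN_target, train_size = 0.85, random_state = 4)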

history = History() 
kfold = 10


regressor = KerasRegressor(build_fn = create_keras_model, epochs = 100, batch_size=1000, verbose=1)

neurons = np.arange(10,101,10) 
hidden_layers = [1,2]
optimizer = ['adam','sgd']
activation = ['relu'] 
dropout = [0.1] 

parameters = dict(neurons = neurons,
                  hidden_layers = hidden_layers,
                  optimizer = optimizer,
                  activation = activation,
                  dropout = dropout)

gs = GridSearchCV(estimator = regressor,
                  param_grid = parameters,
                  scoring='neg_mean_squared_error',  # the 'mean_squared_error' scorer name was removed from sklearn
                  n_jobs = 1,
                  cv = kfold,
                  verbose = 3,
                  return_train_score=True)

grid_result = gs.fit(NN_input,
                    NN_target,
                    callbacks=[history],
                    verbose=1,
                    validation_data=(x_val, y_val))

Remark: the create_keras_model function initializes and compiles a Keras Sequential model.

After the cross validation is performed I am trying to save the grid search object (gs) with the following code:

import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23

joblib.dump(gs, 'GS_obj.pkl')

The error I am getting is the following:

TypeError: can't pickle _thread.RLock objects

Could you please let me know what might be the reason for this error?

Thank you!

P.S.: The joblib.dump method works well for saving GridSearchCV objects that were used to train sklearn MLPRegressor models.


Answer 1:


Try this:

import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
joblib.dump(gs.best_estimator_, 'filename.pkl')

If you want to dump your object into a single compressed file, use:

joblib.dump(gs.best_estimator_, 'filename.pkl', compress = 1)

Simple Example:

from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23

iris = datasets.load_iris()
parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}
svc = svm.SVC()
gs = GridSearchCV(svc, parameters)
gs.fit(iris.data, iris.target)

joblib.dump(gs.best_estimator_, 'filename.pkl')

#['filename.pkl']
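When the fitted estimator itself cannot be pickled, as with the Keras wrapper in the question, one option is to persist only the plain-data attributes of the search. The following is a minimal sketch on the same iris/SVC setup; the `summary` dictionary and the temporary file path are illustrative assumptions, not part of the original answer.

```python
import os
import tempfile

import joblib
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
gs = GridSearchCV(svm.SVC(), parameters)
gs.fit(iris.data, iris.target)

# Keep only attributes made of dicts, floats and NumPy arrays.
# 'summary' is a hypothetical container chosen for this sketch.
summary = {
    'best_params': gs.best_params_,
    'best_score': gs.best_score_,
    'cv_results': gs.cv_results_,
}

# Write to a temporary directory so the sketch leaves no files behind.
path = os.path.join(tempfile.mkdtemp(), 'gs_summary.pkl')
joblib.dump(summary, path)

restored = joblib.load(path)
print(restored['best_params'])
```

For a Keras-backed search, the trained network would then have to be stored separately with Keras's own saving mechanism (for example the model's `save` method), since this summary holds no model weights.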

EDIT 1:

You can also save the whole object (this works for the SVC example above):

joblib.dump(gs, 'gs_object.pkl')
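For the Keras-backed search in the question, dumping the whole object is where the TypeError appears. A likely cause (an assumption, since the question does not confirm it) is that the fitted Keras/Tensorflow model holds thread locks internally, and pickle cannot serialize those. The sketch below reproduces the same class of error using only the standard library; `HoldsLock` and `try_pickle` are hypothetical names for illustration.

```python
import pickle
import threading


class HoldsLock:
    """Hypothetical stand-in for an object that keeps a lock internally."""

    def __init__(self):
        self.weights = [0.1, 0.2]
        self._lock = threading.RLock()


def try_pickle(obj):
    """Return None if pickling succeeds, else the error message."""
    try:
        pickle.dumps(obj)
        return None
    except TypeError as exc:
        return str(exc)


# The lock attribute makes the whole object unpicklable.
error = try_pickle(HoldsLock())
print(error)

# Plain data without a lock pickles without complaint.
plain_ok = try_pickle({'weights': [0.1, 0.2]})
print(plain_ok)
```

The exact wording of the message differs between Python versions, but it names `_thread.RLock` in each case. This is why pickling succeeds for pure sklearn estimators such as SVC or MLPRegressor and can fail once a Keras model is attached to the search object.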



Answer 2:


Subclass the sklearn.model_selection._search.BaseSearchCV class. Override the fit(self, X, y=None, groups=None, **fit_params) method, and modify its internal evaluate_candidates(candidate_params) function: instead of immediately returning the results dictionary, perform your serialization there (or in the _run_search method, depending on your use case). Note that the results dictionary returned by evaluate_candidates(candidate_params) is the same as the cv_results_ dictionary.

With some additional modifications, this approach has the added benefit of allowing you to execute the grid search sequentially (see the comment in the source code in _search.py). This approach worked for me, but I was also attempting to add save-and-restore functionality for interrupted grid search executions.
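The save-and-restore idea can be illustrated without touching sklearn internals. The sketch below is a hand-rolled grid loop, not a BaseSearchCV subclass: it serializes the accumulated results after every candidate and skips candidates already present in the checkpoint. `score_candidate`, `run_search`, and the checkpoint file name are hypothetical, and the scoring function is a toy stand-in for training a model.

```python
import itertools
import os
import pickle
import tempfile


def score_candidate(params):
    """Hypothetical stand-in for training and scoring one model."""
    return -abs(params['neurons'] - 30) - params['hidden_layers']


def run_search(grid, checkpoint_path):
    # Reload earlier results so that finished candidates are skipped.
    results = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, 'rb') as fh:
            results = pickle.load(fh)
    done = [r['params'] for r in results]

    names = sorted(grid)
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        if params in done:
            continue
        results.append({'params': params, 'score': score_candidate(params)})
        # Persist after every candidate so an interruption loses little work.
        with open(checkpoint_path, 'wb') as fh:
            pickle.dump(results, fh)
    return results


grid = {'neurons': [10, 20, 30], 'hidden_layers': [1, 2]}
path = os.path.join(tempfile.mkdtemp(), 'search_checkpoint.pkl')
results = run_search(grid, path)
best = max(results, key=lambda r: r['score'])
print(best['params'])
```

Calling `run_search` again with the same checkpoint path returns the stored results without re-scoring anything, which is the behaviour an interrupted search needs when it is restarted.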



Source: https://stackoverflow.com/questions/51424312/how-to-save-gridsearchcv-object
