问题
Lately, I have been working on applying grid search cross validation (sklearn GridSearchCV) for hyper-parameter tuning in Keras with Tensorflow backend. An soon as my model is tuned I am trying to save the GridSearchCV object for later use without success.
The hyper-parameter tuning is done as follows:
x_train, x_val, y_train, y_val = train_test_split(NN_input, NN_target, train_size = 0.85, random_state = 4)
history = History()
kfold = 10
regressor = KerasRegressor(build_fn = create_keras_model, epochs = 100, batch_size=1000, verbose=1)
neurons = np.arange(10,101,10)
hidden_layers = [1,2]
optimizer = ['adam','sgd']
activation = ['relu']
dropout = [0.1]
parameters = dict(neurons = neurons,
hidden_layers = hidden_layers,
optimizer = optimizer,
activation = activation,
dropout = dropout)
gs = GridSearchCV(estimator = regressor,
param_grid = parameters,
scoring='mean_squared_error',
n_jobs = 1,
cv = kfold,
verbose = 3,
return_train_score=True))
grid_result = gs.fit(NN_input,
NN_target,
callbacks=[history],
verbose=1,
validation_data=(x_val, y_val))
Remark: create_keras_model function initializes and compiles a Keras Sequential model.
After the cross validation is performed I am trying to save the grid search object (gs) with the following code:
from sklearn.externals import joblib
joblib.dump(gs, 'GS_obj.pkl')
The error I am getting is the following:
TypeError: can't pickle _thread.RLock objects
Could you please let me know what might be the reason for this error?
Thank you!
P.S.: joblib.dump method works well for saving GridSearchCV objects that are used for the training MLPRegressors from sklearn.
回答1:
Try this:
from sklearn.externals import joblib
joblib.dump(gs.best_estimator_, 'filename.pkl')
If you want to dump your object into one file - use:
joblib.dump(gs.best_estimator_, 'filename.pkl', compress = 1)
Simple Example:
from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV
from sklearn.externals import joblib
iris = datasets.load_iris()
parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}
svc = svm.SVC()
gs = GridSearchCV(svc, parameters)
gs.fit(iris.data, iris.target)
joblib.dump(gs.best_estimator_, 'filename.pkl')
#['filename.pkl']
EDIT 1:
you can also save the whole object:
joblib.dump(gs, 'gs_object.pkl')
回答2:
Subclass the sklearn.model_selection._search.BaseSearchCV
class. Override the fit(self, X, y=None, groups=None, **fit_params)
method, and modify its internal evaluate_candidates(candidate_params)
function. Instead of immediately returning the results
dictionary from evaluate_candidates(candidate_params)
, perform your serialization here (or in the _run_search
method depending on your use case). With some additional modifications, this approach has the added benefit of allowing you to execute the grid search sequentially (see the comment in the source code here: _search.py). Note that the results
dictionary returned by evaluate_candidates(candidate_params)
is the same as the cv_results
dictionary. This approach worked for me, but I was also attempting to add save-and-restore functionality for interrupted grid search executions.
来源:https://stackoverflow.com/questions/51424312/how-to-save-gridsearchcv-object