I want to build a Pipeline in sklearn and test different models using GridSearchCV.
Just an example (please do not pay attention on what particular models are chosen):>
Lets assume you want to use PCA and TruncatedSVD as your dimesionality reduction step.
pca = decomposition.PCA()
svd = decomposition.TruncatedSVD()
svm = SVC()
n_components = [20, 40, 64]
You can do this:
pipe = Pipeline(steps=[('reduction', pca), ('svm', svm)])
# Change params_grid -> Instead of dict, make it a list of dict
# In the first element, pass parameters related to pca, and in second related to svd
params_grid = [{
'svm__C': [1, 10, 100, 1000],
'svm__kernel': ['linear', 'rbf'],
'svm__gamma': [0.001, 0.0001],
'reduction':pca,
'reduction__n_components': n_components,
},
{
'svm__C': [1, 10, 100, 1000],
'svm__kernel': ['linear', 'rbf'],
'svm__gamma': [0.001, 0.0001],
'reduction':svd,
'reduction__n_components': n_components,
'reduction__algorithm':['randomized']
}]
and now just pass the pipeline object to gridsearchCV
grd = GridSearchCV(pipe, param_grid = params_grid)
Calling grd.fit()
will search the parameters over both the elements of the params_grid list, using all values from one
at a time.
Please look at my other answer for more details: "Parallel" pipeline to get best model using gridsearch