Question
I'd like to perform a hyperparameter search for selecting preprocessing steps and models in sklearn as follows:
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

pipeline = Pipeline([("combiner", PolynomialFeatures()),
                     ("dimred", PCA()),
                     ("classifier", RandomForestClassifier())])
parameters = [{"combiner": [None]},
              {"combiner": [PolynomialFeatures()], "combiner__degree": [2], "combiner__interaction_only": [False, True]},
              {"dimred": [None]},
              {"dimred": [PCA()], "dimred__n_components": [.95, .75]},
              {"classifier": [RandomForestClassifier(n_estimators=100, class_weight="balanced")],
               "classifier__max_depth": [5, 10, None]},
              {"classifier": [KNeighborsClassifier(weights="distance")],
               "classifier__n_neighbors": [3, 7, 11]}]
CV = GridSearchCV(pipeline, parameters, cv=5, scoring="f1_weighted", refit=True, n_jobs=-1)
CV.fit(train_X, train_y)  # train_X, train_y: my training data
Of course, I need the results for the best pipeline with its best parameters. However, when I request the best estimator with CV.best_estimator_, I get only the winning components, not their hyperparameters:
Pipeline(steps=[('combiner', None), ('dimred', PCA()),
                ('classifier', RandomForestClassifier())])
When I print CV.best_params_, I get even less information: only the first element of the Pipeline, the combiner, with no information about dimred or classifier whatsoever:
{'combiner': None}
How can I get the best pipeline combination, with both its components and their hyperparameters?
Answer 1:
Pipeline objects have a get_params() method that returns the parameters of the pipeline, including the parameters of the individual steps. Based on your example, the command
CV.best_estimator_.get_params()
will retrieve all pipeline parameters of the best estimator, including those you are looking for.
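A minimal sketch of the difference between best_params_ and get_params(). It assumes a small synthetic dataset from make_classification in place of train_X/train_y and a reduced two-step pipeline with a tiny grid; these are illustrative stand-ins, not the asker's setup.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic stand-in for train_X / train_y (assumption, not the asker's data).
X, y = make_classification(n_samples=120, n_features=6, random_state=0)

pipeline = Pipeline([("dimred", PCA()),
                     ("classifier", RandomForestClassifier(n_estimators=10, random_state=0))])
param_grid = {"dimred__n_components": [2, 4],
              "classifier__max_depth": [3, None]}
search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X, y)

# best_params_ holds only the keys that were actually searched.
print(search.best_params_)

# get_params() (deep=True by default) returns every parameter of the refitted
# best pipeline, with step parameters keyed as "<step>__<param>".
all_params = search.best_estimator_.get_params()
print(all_params["dimred__n_components"], all_params["classifier__n_estimators"])

# Per-step view; steps set to None or "passthrough" have no parameters to show.
for name, step in search.best_estimator_.named_steps.items():
    if step is None or step == "passthrough":
        continue
    print(name, step.get_params())
```

The searched values appear in both places; get_params() additionally reports the parameters that were left at their defaults or fixed in the constructor, such as classifier__n_estimators here.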
Answer 2:
Since your param_grid is a list of dictionaries, each dictionary defines a separate grid, and the search takes place over the disjoint union of those grids. So best_estimator_ and best_params_ in your case correspond to the single-point grid with combiner=None and everything else as defined in the original pipeline (the default PCA() and RandomForestClassifier()). The search never explored combiner=None together with any other hyperparameters.
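The disjoint union can be checked with ParameterGrid, which is the expansion GridSearchCV applies to param_grid. The sketch below also shows one way to make the search cover combinations across steps, assuming you want the full Cartesian product of step choices. The names combiner_options, dimred_options and classifier_options are hypothetical, introduced only for this illustration.

```python
from itertools import product

from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ParameterGrid
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical per-step option lists, holding the same dicts as in the question.
combiner_options = [
    {"combiner": [None]},
    {"combiner": [PolynomialFeatures()], "combiner__degree": [2],
     "combiner__interaction_only": [False, True]},
]
dimred_options = [
    {"dimred": [None]},
    {"dimred": [PCA()], "dimred__n_components": [.95, .75]},
]
classifier_options = [
    {"classifier": [RandomForestClassifier(n_estimators=100, class_weight="balanced")],
     "classifier__max_depth": [5, 10, None]},
    {"classifier": [KNeighborsClassifier(weights="distance")],
     "classifier__n_neighbors": [3, 7, 11]},
]

# Concatenating them reproduces the question's list: six separate grids.
parameters = combiner_options + dimred_options + classifier_options
# Sizes add up (1 + 2 + 1 + 2 + 3 + 3), and every candidate sets only one step.
print(len(ParameterGrid(parameters)))  # 12

# Merge one option per step into a single dict, for every combination.
combined = [{**c, **d, **k}
            for c, d, k in product(combiner_options, dimred_options, classifier_options)]
print(len(combined))                 # 8 grids
print(len(ParameterGrid(combined)))  # (1 + 2) * (1 + 2) * (3 + 3) = 54
```

Passing combined as the param_grid of GridSearchCV, in place of parameters, makes every candidate set all three steps, so best_params_ then names the chosen combiner, dimred and classifier together with their tuned hyperparameters. The cost is more fits: 54 candidates with cv=5 is 270 fits, versus 60 for the original grid.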
Source: https://stackoverflow.com/questions/63212770/gridsearchcv-output-problems-in-scikit-learn