GridSearchCV output problems in Scikit-learn

Posted by 不问归期 on 2021-01-27 20:57:03

Question


I'd like to perform a hyperparameter search for selecting preprocessing steps and models in sklearn as follows:

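from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

pipeline = Pipeline([("combiner", PolynomialFeatures()),
                     ("dimred", PCA()),
                     ("classifier", RandomForestClassifier())])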

parameters = [{"combiner": [None]},
              {"combiner": [PolynomialFeatures()], "combiner__degree": [2], "combiner__interaction_only": [False, True]},

              {"dimred": [None]},
              {"dimred": [PCA()], "dimred__n_components": [.95, .75]},

              {"classifier": [RandomForestClassifier(n_estimators=100, class_weight="balanced")],
               "classifier__max_depth": [5, 10, None]},
              {"classifier": [KNeighborsClassifier(weights="distance")],
               "classifier__n_neighbors": [3, 7, 11]}]

CV = GridSearchCV(pipeline, parameters, cv=5, scoring="f1_weighted", refit=True, n_jobs=-1)
CV.fit(train_X, train_y)

I need the best pipeline together with its best hyperparameters. However, when I inspect CV.best_estimator_, I see only the winning components, not their hyperparameters:

Pipeline(steps=[('combiner', None), ('dimred', PCA()),
                ('classifier', RandomForestClassifier())])

When I print CV.best_params_, I get even less information: only the first step of the Pipeline (the combiner), with nothing about dimred or the classifier:

{'combiner': None}

How can I get the best pipeline combination, including both the components and their hyperparameters?


Answer 1:


Pipeline objects have a get_params() method which returns the parameters of the pipeline. This includes the parameters of the individual steps as well. Based on your example, the command

CV.best_estimator_.get_params()

will retrieve all pipeline parameters of the best estimator, including those you are looking for.
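As a minimal sketch of this, the snippet below fits a small search and then reads the parameters back from the refitted pipeline. The synthetic dataset, the reduced grid, and the variable names all_params and searched are assumptions made for illustration; they are not part of the original question.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Small synthetic dataset (an assumption; stands in for train_X, train_y).
X, y = make_classification(n_samples=120, n_features=8, random_state=0)

pipeline = Pipeline([("dimred", PCA()),
                     ("classifier", RandomForestClassifier(n_estimators=10,
                                                           random_state=0))])

# A single dictionary, so every combination of these values is searched.
parameters = {"dimred__n_components": [2, 4],
              "classifier__max_depth": [3, None]}

CV = GridSearchCV(pipeline, parameters, cv=3)
CV.fit(X, y)

# Every parameter of the refitted pipeline, including nested step
# parameters that were left at their defaults.
all_params = CV.best_estimator_.get_params()

# Only the parameters that were part of the search, read back from the
# best estimator itself.
searched = {name: all_params[name] for name in CV.best_params_}
print(searched)
```

Note that get_params() reports the settings of whichever point the search actually selected; it cannot show combinations that the grid never contained.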




Answer 2:


Since your param_grid is a list of dictionaries, each dictionary defines a separate grid, and the search runs over the disjoint union of those grids rather than their product. So the best_estimator_ and best_params_ in your case correspond to the single-point grid with combiner=None and everything else left as defined in the original pipeline. (The search never evaluated combiner=None together with any other dimred or classifier setting.)
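The difference can be checked without fitting anything by counting candidates with ParameterGrid. The sketch below also shows one possible way to get the full cross-combination: merge one dictionary per step using itertools.product. The names combiner_grids, dimred_grids, classifier_grids, union_grid and combined_grid are illustrative assumptions, and this is only one way to build such a grid.

```python
from itertools import product

from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ParameterGrid
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import PolynomialFeatures

# The question's sub-grids, grouped by pipeline step.
combiner_grids = [
    {"combiner": [None]},
    {"combiner": [PolynomialFeatures()], "combiner__degree": [2],
     "combiner__interaction_only": [False, True]},
]
dimred_grids = [
    {"dimred": [None]},
    {"dimred": [PCA()], "dimred__n_components": [.95, .75]},
]
classifier_grids = [
    {"classifier": [RandomForestClassifier(n_estimators=100,
                                           class_weight="balanced")],
     "classifier__max_depth": [5, 10, None]},
    {"classifier": [KNeighborsClassifier(weights="distance")],
     "classifier__n_neighbors": [3, 7, 11]},
]

# What the question passed: one flat list, i.e. a union of six grids.
union_grid = combiner_grids + dimred_grids + classifier_grids
n_union = len(ParameterGrid(union_grid))

# Cross-combination: merge one dictionary per step into a single grid,
# for every choice of (combiner, dimred, classifier) sub-grid.
combined_grid = [{**c, **d, **m}
                 for c, d, m in product(combiner_grids, dimred_grids,
                                        classifier_grids)]
n_combined = len(ParameterGrid(combined_grid))

print(n_union)     # 12 candidates, each varying a single step
print(n_combined)  # 54 candidates, each setting all three steps
```

Passing combined_grid as the param_grid of GridSearchCV should then make best_params_ report a value for all three steps, at the cost of evaluating 54 candidates instead of 12.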



Source: https://stackoverflow.com/questions/63212770/gridsearchcv-output-problems-in-scikit-learn
