Question
This is a follow-up to a question answered here, but I believe it deserves its own thread.
In the previous question, we were dealing with “an Ensemble of Ensemble classifiers, where each has its own parameters.” Let's start with the example provided by MaximeKan in his answer:
my_est = BaggingClassifier(RandomForestClassifier(n_estimators = 100, bootstrap = True,
max_features = 0.5), n_estimators = 5, bootstrap_features = False, bootstrap = False,
max_features = 1.0, max_samples = 0.6 )
Now say I want to go one level above that. Setting aside considerations like efficiency and computational cost, and as a general concept: how would I run a grid search with this kind of setup?
I can set up two parameter grids along these lines:
One for the BaggingClassifier:
BC_param_grid = {
'bootstrap': [True, False],
'bootstrap_features': [True, False],
'n_estimators': [5, 10, 15],
'max_samples' : [0.6, 0.8, 1.0]
}
And one for the RandomForestClassifier:
RFC_param_grid = {
'bootstrap': [True, False],
'n_estimators': [100, 200, 300],
'max_features' : [0.6, 0.8, 1.0]
}
Now I can call grid search with my estimator:
grid_search = GridSearchCV(estimator = my_est, param_grid = ???)
What do I do with the param_grid parameter in this case? Or, more specifically, how do I use both of the parameter grids I set up?
I have to say, it feels like I’m playing with matryoshka dolls.
Answer 1:
Following @James Dellinger's comment above, and expanding from there, I was able to get it done. It turns out the "secret sauce" is a lightly documented feature: the __ (double underscore) separator, which gets a passing mention in the Pipeline documentation. Take the name of the outer estimator's parameter that holds the inner estimator (here, base_estimator), append __, and then append the name of the inner estimator's parameter. Keys built this way let a single param_grid cover parameters of both the outer and inner estimators.
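One way to see which nested names are available is to ask the outer estimator itself. The snippet below is a minimal sketch rather than part of the original answer: it passes the inner estimator positionally and reads the prefix back from get_params(), because, to the best of my knowledge, scikit-learn 1.2 renamed BaggingClassifier's base_estimator argument to estimator and the old name was removed in 1.4.

```python
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier

# Pass the inner estimator positionally so the snippet does not depend on
# whether this scikit-learn release calls the argument `base_estimator`
# or `estimator`.
outer = BaggingClassifier(RandomForestClassifier())

# get_params(deep=True) returns the outer parameters plus the inner ones,
# the latter under keys of the form "<outer parameter name>__<inner name>".
nested_keys = sorted(k for k in outer.get_params(deep=True) if "__" in k)
prefix = nested_keys[0].split("__", 1)[0]

print("prefix used by this release:", prefix)
print("some nested keys:", nested_keys[:5])

# set_params understands the same keys, which is what GridSearchCV relies on
# when it applies each candidate from param_grid.
outer.set_params(**{f"{prefix}__n_estimators": 200})
print(outer.get_params()[f"{prefix}__n_estimators"])
```

Any key printed here can be used as a key in param_grid.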
So for the example in the question, the outside estimator is BaggingClassifier and the inside/base estimator is RandomForestClassifier. The first step is to import what you need:
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV
Then comes the param_grid assignment (in this case, the two grids from the question merged into one):
param_grid = {
'bootstrap': [True, False],
'bootstrap_features': [True, False],
'n_estimators': [5, 10, 15],
'max_samples' : [0.6, 0.8, 1.0],
'base_estimator__bootstrap': [True, False],
'base_estimator__n_estimators': [100, 200, 300],
'base_estimator__max_features' : [0.6, 0.8, 1.0]
}
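If the two separate grids from the question already exist, the combined dictionary can be built instead of typed out by hand. The helper below, merge_grids, is a hypothetical name introduced only for illustration. Its default prefix, base_estimator, matches the answer above; on newer scikit-learn releases the prefix may need to be estimator instead.

```python
# The two grids exactly as defined in the question.
BC_param_grid = {
    "bootstrap": [True, False],
    "bootstrap_features": [True, False],
    "n_estimators": [5, 10, 15],
    "max_samples": [0.6, 0.8, 1.0],
}
RFC_param_grid = {
    "bootstrap": [True, False],
    "n_estimators": [100, 200, 300],
    "max_features": [0.6, 0.8, 1.0],
}


def merge_grids(outer_grid, inner_grid, prefix="base_estimator"):
    """Hypothetical helper: copy the outer grid and add the inner grid's
    entries under '<prefix>__<name>' keys, so same-named parameters such as
    'bootstrap' and 'n_estimators' do not collide."""
    merged = dict(outer_grid)
    merged.update(
        {f"{prefix}__{name}": values for name, values in inner_grid.items()}
    )
    return merged


param_grid = merge_grids(BC_param_grid, RFC_param_grid)
for key in sorted(param_grid):
    print(key, param_grid[key])
```

The result has the same seven keys as the hand-written param_grid above.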
And, finally, your grid search:
grid_search = GridSearchCV(BaggingClassifier(base_estimator=RandomForestClassifier()), param_grid=param_grid, cv=5)
And you're off to the races.
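For completeness, here is a small end-to-end sketch of the search actually being fitted. Everything beyond the answer's own code is an assumption made for illustration: the synthetic data from make_classification, the much smaller grid, cv=3, and the random_state values. The prefix is looked up at run time because, as far as I know, recent scikit-learn releases call the argument estimator rather than base_estimator.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data, used only so the example can run on its own.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

bagging = BaggingClassifier(RandomForestClassifier(random_state=0), random_state=0)

# Choose whichever prefix this scikit-learn release exposes.
prefix = "estimator" if "estimator" in bagging.get_params() else "base_estimator"

# A deliberately tiny grid so the search finishes quickly:
# 2 * 2 * 2 * 2 = 16 candidate combinations.
small_grid = {
    "n_estimators": [2, 3],
    "max_samples": [0.6, 1.0],
    f"{prefix}__n_estimators": [5, 10],
    f"{prefix}__max_features": [0.6, 1.0],
}

grid_search = GridSearchCV(bagging, param_grid=small_grid, cv=3)
grid_search.fit(X, y)

print(len(grid_search.cv_results_["params"]), "candidates evaluated")
print("best parameters:", grid_search.best_params_)
```

For scale, the full param_grid from the answer spans 2 × 2 × 3 × 3 × 2 × 3 × 3 = 648 combinations, which at cv=5 means 3,240 fits of a bagged ensemble of random forests, so trying a reduced grid first is probably worthwhile.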
Source: https://stackoverflow.com/questions/54543612/grid-search-on-parameters-inside-the-parameters-of-a-baggingclassifier