Alternate different models in Pipeline for GridSearchCV


隐瞒了意图╮ 2021-02-09 03:11

I want to build a Pipeline in sklearn and test different models using GridSearchCV.

Just an example (please do not pay attention to which particular models are chosen):

2 answers

栀梦 (original poster)

2021-02-09 03:18

An alternative solution that does not require prefixing the parameter names in the grid with the estimator's name is the following:

import numpy as np  # needed for np.arange in the KNN grid below

from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

# the models that you want to compare
models = {
    'RandomForestClassifier': RandomForestClassifier(),
    'KNeighboursClassifier': KNeighborsClassifier(),
    'LogisticRegression': LogisticRegression()
}

# the optimisation parameters for each of the above models
params = {
    'RandomForestClassifier':{ 
            "n_estimators"      : [100, 200, 500, 1000],
            "max_features"      : ["sqrt", "log2"],  # "auto" (same as "sqrt" for classifiers) was removed in scikit-learn 1.3
            "bootstrap": [True],
            "criterion": ['gini', 'entropy'],
            "oob_score": [True, False]
            },
    'KNeighboursClassifier': {
        'n_neighbors': np.arange(3, 15),
        'weights': ['uniform', 'distance'],
        'algorithm': ['ball_tree', 'kd_tree', 'brute']
        },
    'LogisticRegression': {
        'solver': ['newton-cg', 'sag', 'lbfgs'],
        'multi_class': ['ovr', 'multinomial']
        }  
}

You can then define:

from sklearn.model_selection import GridSearchCV

def fit(train_features, train_actuals):
    """
    Fits each model to the training data with GridSearchCV and reports
    the best parameters and cross-validated score found for each one.
    """
    for name, est in models.items():
        est_params = params[name]
        gscv = GridSearchCV(estimator=est, param_grid=est_params, cv=5)
        gscv.fit(train_features, train_actuals)
        print("{}: best parameters are {} (CV score {:.3f})".format(
            name, gscv.best_params_, gscv.best_score_))

This simply loops over the models, with each model looking up its own parameter grid in the params dictionary by name. If models and params are not global variables, remember to pass both dictionaries to the fit function. Have a look at this GitHub project for a more complete overview.
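If you would rather not rely on globals, here is a minimal sketch of the same loop with the two dictionaries passed in explicitly. The function name `fit_all`, the `toy_models` / `toy_params` dictionaries, and the synthetic data are all made up for illustration; the grids are kept tiny only so that it runs quickly.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier


def fit_all(models, params, train_features, train_actuals, cv=5):
    """Run one GridSearchCV per model and return the fitted searches by name."""
    searches = {}
    for name, est in models.items():
        # Each model looks up its own grid under the same dictionary key.
        gscv = GridSearchCV(estimator=est, param_grid=params[name], cv=cv)
        gscv.fit(train_features, train_actuals)
        searches[name] = gscv
    return searches


# Hypothetical toy data and deliberately small grids (assumptions, not from the question).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
toy_models = {
    "KNeighborsClassifier": KNeighborsClassifier(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}
toy_params = {
    "KNeighborsClassifier": {"n_neighbors": [3, 5, 7]},
    "LogisticRegression": {"C": [0.1, 1.0, 10.0]},
}

searches = fit_all(toy_models, toy_params, X, y, cv=3)

# Pick the model whose best cross-validated score is highest.
best_name = max(searches, key=lambda n: searches[n].best_score_)
print(best_name, searches[best_name].best_params_)
```

Returning the fitted GridSearchCV objects instead of only printing lets you compare `best_score_` across models afterwards, or reuse `best_estimator_` for prediction.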
